In November 2021, we began working with a major gaming publisher to reimagine their approach to performance testing. In this article, we’ll look back on how we helped them turn a fragmented approach to performance testing into a comprehensive strategy that allows them to simulate up to 25M concurrent users on their servers.
Understanding the Challenges
During initial discussions, we quickly realized the publisher had an even more complex landscape than we were expecting. Like other major publishers, they have made a variety of significant acquisitions over the years - which added extra titles, teams, and studios to their ranks. As tends to happen in complex organizations, each line of business gradually drifted into using its own tools and processes for developing, testing, and releasing. In late 2021, this fragmentation reached a breaking point, which prompted their leadership team to explore ways of centralizing their performance testing strategy.
Aside from fragmentation, there were a few other obstacles to solve:
- Scalability Issues: Existing tools like BlazeMeter and JMeter could not handle the massive volume of concurrent users, leading to scalability bottlenecks and high costs (largely for cloud consumption and operational overhead).
- Operational Inefficiencies: Preparing for performance tests was very labor-intensive, taking 2-3 days of setup before a single test could run. This preparation time significantly hindered the agility needed for frequent testing.
- Custom Protocols: The publisher used proprietary protocols, customized from Google’s protocols, that commercial performance testing tools did not support, so a unique solution was needed.
Given the sheer scope of the tasks at hand, the publisher decided to engage outside help - which is where the team at Perform came into the picture.
Initial Steps and Gap Analysis
As a starting point, the Perform team ran a comprehensive gap analysis and intake exercise with the various business units, covering four areas:
- Tooling and Monitoring Assessment: Evaluated the various tools, monitoring systems, and dashboards currently in use across different teams.
- Release Frequency and Test Cycle Time: Analyzed how often each team released new updates and how long test cycles took for each release.
- Platform and Technology Evaluation: Inventoried the technologies and platforms used to develop and run the games.
- Team and Skill-Set Analysis: Assessed the strengths and weaknesses of each team to identify areas where hiring or training could add new capabilities.
Once this was done, we had everything we needed to build and present a blueprint for a unified performance framework to the publisher’s leadership team. Shortly after the final approvals to get started were granted, the real work began.
Building the Unified Platform
Now, as a general practice, we typically don’t recommend building custom tools for performance testing. However, the gaming industry can be an exception to the rule - especially for complex publishers and studios that service millions of gamers all over the world.
With our unified custom platform, the goal was to address six key areas:
- Consolidation: Standardize performance testing strategies across different projects and products to ensure consistent performance standards and create a ‘north star’ for measuring performance.
- Scalability: Handle tests for up to 25 million concurrent users to ensure the platform could meet current (and future) demands.
- Efficiency: Reduce performance test preparation time from 2-3 days to a few hours, enabling more frequent and agile testing cycles - while also reducing operational strain.
- Dynamic Load Management: Provide the capability to scale up or down specific transactions dynamically during tests to enhance testing flexibility and reliability.
- Granular Control: Allow for the adjustment of individual calls without affecting the entire system, which is particularly beneficial in gaming environments where different transactions have varying performance requirements.
- Custom Protocol Support: Enable performance testing against the publisher’s proprietary protocols, which would overcome a significant barrier that commercial tools could not address.
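To make the Dynamic Load Management and Granular Control goals more concrete, here is a minimal sketch of how per-transaction rates might be adjusted while a test is running. It is an illustration only, not the publisher's actual platform: the `TransactionMix` class, the transaction names, and the rates are all hypothetical.

```python
import threading


class TransactionMix:
    """Thread-safe table of per-transaction target rates (requests/second)."""

    def __init__(self, base_rates):
        self._lock = threading.Lock()
        self._base = dict(base_rates)
        self._scale = {name: 1.0 for name in base_rates}

    def set_scale(self, name, factor):
        """Scale one transaction up or down without touching the others."""
        if factor < 0:
            raise ValueError("factor must be non-negative")
        with self._lock:
            if name not in self._base:
                raise KeyError(name)
            self._scale[name] = factor

    def current_rates(self):
        """Snapshot of effective rates; load generators would poll this."""
        with self._lock:
            return {n: self._base[n] * self._scale[n] for n in self._base}


# Hypothetical transactions and base rates, for illustration only.
mix = TransactionMix({"login": 1000.0, "matchmaking": 400.0, "store_purchase": 50.0})
mix.set_scale("matchmaking", 0.25)  # throttle one call mid-test
mix.set_scale("login", 2.0)         # ramp another up at the same time
rates = mix.current_rates()
```

Keeping the base rate and the scale factor separate means an operator can return any transaction to its planned load by setting the factor back to 1.0, while the other transactions keep running unchanged.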
Since the first version of the custom solution was built, we’ve continued to collaborate with the publisher’s team to further enhance the platform. This has led to additional innovations over the last few years.
Key Innovations and Features
As of 2024, the platform is still gaining new capabilities. Here are a few (of many) recent enhancements:
- Production Testing: Enabled testing in live production environments with the ability to isolate and reduce load on problematic areas without halting the entire test. This approach mitigated risks and ensured continuous testing without disrupting live services.
- Scheduler Integration: Integrated a scheduler that automatically created test patterns based on real production data. This feature simplified workload modeling and eliminated guesswork, ensuring tests were realistic and relevant.
- Cloud Deployment: Transitioned from Jenkins to direct cloud API deployments for load generators, further reducing preparation time and enhancing scalability. This shift leveraged the flexibility and power of cloud computing to meet testing demands efficiently.
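The idea behind the scheduler, deriving test patterns from real production data, can be sketched roughly as follows. This is an assumption about one plausible approach, not a description of the actual scheduler: the `build_load_profile` function and the sample traffic figures are hypothetical.

```python
def build_load_profile(hourly_requests, target_peak_users):
    """Scale an observed traffic shape so its busiest hour hits target_peak_users."""
    if not hourly_requests:
        raise ValueError("hourly_requests must not be empty")
    peak = max(hourly_requests.values())
    if peak <= 0:
        raise ValueError("production data contains no traffic")
    # Preserve the relative shape of production traffic, hour by hour.
    return {
        hour: round(target_peak_users * count / peak)
        for hour, count in sorted(hourly_requests.items())
    }


# Hypothetical production sample: hour of day -> observed request count.
observed = {18: 600_000, 19: 900_000, 20: 1_200_000, 21: 800_000}
profile = build_load_profile(observed, target_peak_users=25_000_000)
```

Because the test schedule inherits its shape from observed traffic, the ramp-up and peak timing reflect how players actually arrive, which is the kind of guesswork the scheduler is meant to remove.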
Overall, this project was one of the most complex and demanding engagements we’ve worked on in recent years - but it was easily one of the most rewarding!
Results and Impact
The unified performance testing platform delivered - and continues to deliver - significant and measurable benefits for the publisher:
- Increased Scalability: Successfully supports up to 25 million concurrent users during testing, ensuring the platform can handle future growth.
- Operational Efficiency & Increased Quality: Dramatically reduced the test preparation phase from 2-3 days to just a few hours, enabling the teams to conduct tests more frequently and efficiently. This has led to a higher degree of quality and reliability across their various games and systems.
- Enhanced Flexibility: Provided granular control over individual transactions during load tests, improving test accuracy and reliability. This flexibility allowed the teams to focus on specific areas of the application and resolve issues dynamically.
- Real-time Issue Management: Enabled real-time capabilities to manage load while tests are in progress. This prevents test failures and eliminates the need to start tests over from the beginning - which in turn facilitates continuous performance testing.
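One way to picture real-time issue management is a per-transaction guard that watches recent error rates and suggests reducing load on the struggling transaction only, leaving the rest of the test running. The sketch below is a simplified assumption, not the platform's real logic: the `ErrorRateGuard` class, its thresholds, and the simulated failures are all hypothetical.

```python
from collections import deque


class ErrorRateGuard:
    """Tracks recent outcomes for one transaction and suggests a load factor."""

    def __init__(self, window=100, threshold=0.05, backoff=0.5):
        self._outcomes = deque(maxlen=window)  # sliding window of True/False
        self.threshold = threshold
        self.backoff = backoff

    def record(self, ok):
        self._outcomes.append(bool(ok))

    def error_rate(self):
        if not self._outcomes:
            return 0.0
        return self._outcomes.count(False) / len(self._outcomes)

    def suggested_factor(self):
        """1.0 while healthy; the backoff factor once errors exceed the threshold."""
        return self.backoff if self.error_rate() > self.threshold else 1.0


# Hypothetical transactions; matchmaking is simulated with 20% failures.
guards = {"login": ErrorRateGuard(), "matchmaking": ErrorRateGuard()}
for i in range(100):
    guards["login"].record(True)
    guards["matchmaking"].record(i % 5 != 0)

factors = {name: g.suggested_factor() for name, g in guards.items()}
```

Here only the failing transaction is asked to back off, so the test can continue instead of being restarted from the beginning.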
Conclusion
This project exemplifies how a tailored approach to performance engineering can overcome the inherent challenges of large-scale, fragmented testing environments. By developing a unified platform, we enabled a major gaming publisher to scale their load testing capabilities, ensuring their games could handle massive user loads efficiently and reliably.
This transformative solution not only addressed the immediate pain points but also set a new standard for performance testing in the gaming industry, demonstrating the power of customized, scalable performance engineering solutions.