October 10, 2024

Increases in traffic from holiday surges and seasonal spikes can quickly push applications to their limits. Whether it’s a retail site handling a large wave of online shoppers or a streaming platform serving a new episode to millions of viewers, application performance can make or break the experience. The same holds true for tech companies releasing a new product or a gaming studio launching a new game. In other sectors, such as finance, government, and healthcare, the stakes can be even higher.

We can be certain of a few things. 

  • First, applications can rarely (if ever) afford extended periods of downtime or diminished performance.
  • Second, spikes in usage are bound to happen at some point.

In this article, we’ll explore the key strategies and best practices for preparing your application to handle seasonal spikes smoothly. From optimizing performance testing to scaling infrastructure, we'll provide actionable insights to help you ensure a seamless user experience during the busiest times of the year.

1. Business Preparation & Test Planning

Identify Business Goals and Establish Technical Objectives

First off, it’s important for performance architects to serve as a bridge between technical and business stakeholders. Think of it this way: if non-technical business stakeholders and technical team members are ‘speaking two different languages’, then performance teams need to be ‘bilingual’. This ensures nothing is lost in translation when non-technical business stakeholders articulate their objectives to technical teams.

  • Understanding Business Objectives: Grasping the key goals and metrics that the business aims to achieve during the peak period.
  • Communicating Technical Needs: Clearly explaining the technical implications of these goals to both technical and non-technical stakeholders.
  • Feedback Loop to Business Stakeholders: For example, the business may ask for what seems like a simple change in functionality. In some cases, such a change can have significant downstream technical impacts and take longer than expected. Factors like these need to be communicated back to business stakeholders in relatable terms.

Aligning Business and Technical Stakeholders

Here are a few practical examples of how a performance engineer might translate a set of business needs into technical requirements. 

Example 1: Driving Online Revenue With a Flash Sale

  • Business Need: A marketing team for a retail brand wants to hold a flash sale to drive a 70% increase in sales over a 48-hour period.
  • Technical Translation: This requires the application to handle at least 1.7 times the usual traffic if conversion rates hold steady, and potentially more than twice the usual traffic if they dip during the sale. That calls for extensive load tests that simulate the increased volume. It also means ensuring that the payment gateway, inventory management system, and user interface can handle the increased load without degradation in performance.
  • Communicating Implications: A performance engineer must communicate to non-technical stakeholders that achieving this goal involves significant testing and possible infrastructure upgrades, explaining in simple terms the necessity of each technical step to ensure the sale's success.
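
One way to sanity-check the traffic target behind a sales goal like this is to treat sales as traffic multiplied by conversion rate. The sketch below is a minimal illustration under that assumption; `required_traffic_multiplier` and the conversion figures are hypothetical, not numbers from a real campaign.

```python
def required_traffic_multiplier(sales_uplift: float, conversion_ratio: float = 1.0) -> float:
    """Traffic multiplier needed to reach a sales uplift.

    sales_uplift: 0.70 means a 70% increase in sales.
    conversion_ratio: conversion rate during the sale divided by the normal
                      rate (1.0 means visitors convert as usual).
    """
    if conversion_ratio <= 0:
        raise ValueError("conversion_ratio must be positive")
    return (1 + sales_uplift) / conversion_ratio


# Same conversion rate as usual (assumed).
steady = required_traffic_multiplier(0.70)
# Conversion drops to 75% of normal because more visitors only browse (assumed).
dipped = required_traffic_multiplier(0.70, conversion_ratio=0.75)

print(f"steady conversion: {steady:.2f}x usual traffic")
print(f"lower conversion:  {dipped:.2f}x usual traffic")
```

The second case shows why load targets are often set well above the headline sales figure: any drop in conversion has to be made up with additional traffic.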

Example 2: Enhanced User Experience During Signup 

  • Business Need: The product team wants to ensure that page load times do not exceed 2 seconds, even during peak traffic. The marketing team is running ads during a major TV event and is anticipating a spike in traffic from web, mobile web, and native mobile.
  • Technical Translation: This involves performance tests focused on page load times under various load conditions, optimizing server response times, and possibly implementing a content delivery network (CDN) to distribute load efficiently. Various APIs will also need to stay within defined response-time thresholds to meet the target.
  • Communicating Implications: The performance engineer should explain to non-technical stakeholders how optimizing server response times and using a CDN will help achieve faster load times. Using relatable analogies or visual aids to illustrate the concepts always helps during cross-functional meetings. 
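
A target like "page loads within 2 seconds" has to be turned into something a test can check. The sketch below assumes the team agrees to judge the target at the 95th percentile of measured load times; that percentile, the helper names, and the sample data are all illustrative assumptions rather than part of the original requirement.

```python
def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_page_load_target(load_times_s, target_s: float = 2.0, pct: int = 95) -> bool:
    """True when the chosen percentile of page load times is within the target."""
    return percentile(load_times_s, pct) <= target_s


# Hypothetical page load times (seconds) collected during a test run.
samples = [0.8] * 18 + [1.9, 3.5]
print("p95:", percentile(samples, 95))
print("meets 2s target:", meets_page_load_target(samples))
```

Checking a percentile rather than the average keeps a handful of very slow requests from hiding behind many fast ones, while still tolerating the occasional outlier.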

Example 3: Launching a New Feature

  • Business Need: The business plans to introduce a new feature that allows users to customize products, expecting high engagement during the holidays.
  • Technical Translation: This includes load testing the new feature to ensure it can handle expected usage. It also means integrating the new functionality with existing systems without performance impacts and ensuring it doesn’t introduce new bottlenecks.
  • Communicating Implications: The performance engineer needs to communicate to non-technical stakeholders that thorough testing and careful integration are crucial to prevent performance issues. They should also emphasize the user benefits and potential risks if not properly managed.

Divide Traffic Estimates into Channels and Build Tests Accordingly

Unless you’re launching a brand-new product or lack an APM tool, start by investigating historical data. How much total traffic did your application handle during its last period of peak usage? Which channels did that traffic come from (e.g., web, mobile web, native mobile)? This matters because it helps performance engineers construct more realistic tests.

For example, if the business objective is to handle one million transactions in an hour, the performance team would need to create test scenarios that simulate load based on estimates for each channel (e.g., 500,000 transactions on mobile web, 300,000 on the web, and 200,000 on native mobile).
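
As a rough sketch, the hourly target can be converted into per-channel rates that load scripts can use directly. `split_target` is a hypothetical helper, and the channel names and shares simply mirror the illustrative 50/30/20 split in the example; real shares should come from your own historical data.

```python
def split_target(total_per_hour: int, shares: dict) -> dict:
    """Split an hourly transaction target into per-channel hourly and per-second rates."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("channel shares must sum to 1.0")
    plan = {}
    for channel, share in shares.items():
        per_hour = total_per_hour * share
        plan[channel] = {"per_hour": per_hour, "per_second": per_hour / 3600}
    return plan


# Illustrative shares (assumed, not measured).
plan = split_target(1_000_000, {"mobile_web": 0.5, "web": 0.3, "native_mobile": 0.2})
for channel, rates in plan.items():
    print(f"{channel}: {rates['per_hour']:,.0f}/hour, about {rates['per_second']:.1f}/second")
```

Note that an even per-second rate understates real peaks, since traffic within the hour is rarely uniform; treat these figures as a floor for each channel's scenario.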

Ensure Team Readiness 

It’s one thing to remediate a functional bug that leaks into production. Non-functional performance bugs can be far more difficult and time-consuming to fix. Oftentimes, fixing them requires close collaboration across DevOps, database, and development teams to find the root cause. For this very reason, it’s essential to have well-defined procedures for team members to follow during an unexpected outage.

Of course, ‘Plan A’ is to make sure a critical defect never leaks to production in the first place. ‘Plan B’ is to make sure that incidents are contained quickly and cause only minimal disruption to the business.

Action Points:

  • War Room Setup: Establish a virtual or physical war room for proactive, coordinated responses. 
  • Team Training: Regularly train your team on alert management and issue resolution.
  • Shift Planning: Plan shifts and ensure key personnel are available during critical times.

Establish a Backup Plan

Incorporate failover tests to ensure your system can handle unexpected issues. For example, if you have two database instances, test whether the secondary instance can seamlessly take over if the primary one fails under heavy load. Additionally, prepare for potential traffic diversion if issues arise on specific platforms, such as mobile or web. This can also buy time for team members to fix issues in impacted areas without outages or performance degradations for end users. 

Scenarios to Consider:

  • Throttling: Can your system throttle traffic to handle sudden spikes?
  • Alerting and Response: Are alerts configured to promptly notify the right individuals? Are teams prepared to act on alerts? 
  • Team Coordination: Can various team members quickly and easily communicate with one another?
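
To make the throttling question concrete, here is a minimal token-bucket sketch, one common way to cap request rates while still allowing short bursts. It is an illustration under assumed limits, not a description of any particular gateway or product; the `TokenBucket` class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so tests can control time
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would reject or queue the request


# Assumed limits: 100 requests/second sustained, bursts of up to 200.
bucket = TokenBucket(rate=100, capacity=200)
accepted = sum(bucket.allow() for _ in range(250))
print(f"accepted {accepted} of 250 back-to-back requests")
```

A failover or spike test can then verify that rejected requests fail fast and that accepted ones stay within response-time targets.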

2. Test Execution & Operational Response

Comprehensive load tests are perfect for conducting a final 'dress rehearsal' prior to anticipated traffic spikes. This phase also provides stakeholders with the opportunity to validate their monitoring and response capabilities under simulated peak conditions, ensuring the application and team are prepared to handle real-world increases in traffic and workload.

Pre-Execution Coordination

Before executing large-scale performance tests, ensure all teams are aligned and have a clear understanding of their roles and responsibilities. This phase sets the stage for smooth and effective test execution.

Steps to Follow:

  • Kickoff Meetings: Organize kickoff meetings with all relevant stakeholders to review the test plan and objectives.
  • Resource Allocation: Confirm that all necessary resources (tools, environments, personnel) are available and ready.
  • Timeline Agreement: Agree on a timeline for the test execution phase, including key milestones and deadlines.
  • Code Freeze: Implement a code freeze period leading up to peak traffic times to ensure stability. This involves halting all non-critical changes to the codebase to prevent new issues from being introduced.

Comprehensive Performance Testing

Execute a variety of performance tests to cover different aspects of your application’s performance under peak load conditions. Here are the four main types:

  • Load Testing: Simulate expected peak traffic to measure how the system performs under anticipated load conditions.
  • Stress Testing: Push the system beyond its usual limits to identify breaking points and understand how it behaves under extreme conditions.
  • Endurance Testing: Run tests over extended periods to check for memory leaks, performance degradation, and stability over time.
  • Spike Testing: Simulate sudden, extreme increases in load to see how the system handles unexpected spikes.
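
The four test types differ mainly in the shape of the load over time. The sketch below generates a per-minute target rate for each shape; the multipliers (twice the peak for stress, 70% for endurance, a 3x burst for spike) are arbitrary assumptions chosen for illustration, and `load_profile` is a hypothetical helper rather than part of any load testing tool.

```python
def load_profile(kind: str, peak: int, minutes: int) -> list:
    """Return the target requests-per-second for each minute of a test run."""
    if kind == "load":       # hold the expected peak for the whole run
        return [peak] * minutes
    if kind == "stress":     # climb steadily until reaching twice the expected peak
        return [round(peak * (1 + (m + 1) / minutes)) for m in range(minutes)]
    if kind == "endurance":  # sustained moderate load; use a long duration
        return [round(peak * 0.7)] * minutes
    if kind == "spike":      # low baseline with one sudden burst mid-run
        base, mid = round(peak * 0.2), minutes // 2
        return [peak * 3 if m == mid else base for m in range(minutes)]
    raise ValueError(f"unknown test kind: {kind}")


for kind in ("load", "stress", "endurance", "spike"):
    print(kind, load_profile(kind, peak=1000, minutes=5))
```

The resulting lists can be fed into whichever load generator the team already uses as its ramp schedule.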

Tips for Performance Test Execution

  • Use Realistic Data: Ensure that test data accurately represents real-world usage scenarios.
  • Monitor During Tests: Continuously monitor system performance during tests to capture real-time data and identify issues as they occur.
  • Adjust and Repeat: Based on initial test results, adjust configurations and retest as necessary to fine-tune performance.

Real-Time Monitoring and Alerts

This is also a time to put your monitoring and alerting practices to the test. Here are a few key areas to consider:

  • Invest in APM: Choose appropriate monitoring tools (e.g., Splunk, Dynatrace, New Relic) that integrate well with your infrastructure.
  • Define Metrics: Identify and define critical performance metrics such as response times, error rates, throughput, and resource utilization.
  • Configure Alerts: Set up alerts for critical metrics. Ensure alerts are fine-tuned to avoid unnecessary noise and have a clear escalation path. 
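
A simple way to reason about alert configuration is as a set of rules with a warning level and a paging level per metric. The sketch below is tool-agnostic and does not use any vendor's API; the `AlertRule` structure, metric names, and thresholds are hypothetical examples.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str
    warn_at: float  # notify the team channel
    page_at: float  # escalate to the on-call engineer


def evaluate(rules, snapshot: dict) -> list:
    """Return (metric, severity) pairs for every rule the snapshot breaches."""
    alerts = []
    for rule in rules:
        value = snapshot.get(rule.metric)
        if value is None:
            continue  # metric not reported in this snapshot
        if value >= rule.page_at:
            alerts.append((rule.metric, "page"))
        elif value >= rule.warn_at:
            alerts.append((rule.metric, "warn"))
    return alerts


# Assumed thresholds for illustration only.
rules = [
    AlertRule("error_rate", warn_at=0.01, page_at=0.05),
    AlertRule("p95_response_s", warn_at=1.5, page_at=2.0),
]
print(evaluate(rules, {"error_rate": 0.02, "p95_response_s": 2.4}))
```

Separating the two levels is one way to keep noise down: only breaches that threaten the business objective wake someone up, while lesser ones are logged for review.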

Operational Response and Coordination

See how various teams respond to potential issues found in a pre-production environment. If performance tests do flag an issue in a stable environment (like staging), it’s at least an opportunity to see how cross-functional teams work together in a lower-stakes setting.

Team Members to Include for Operational Readiness Drills:

  • Performance Engineers: To monitor system performance and identify any potential issues.
  • DevOps Team: To manage infrastructure and deploy necessary changes swiftly.
  • Database Administrators: To handle database performance and failover scenarios.
  • Application Developers: To fix any code-related issues that may arise.
  • QA/Testers: To validate that the system meets performance criteria and to help identify bugs.
  • Business Stakeholders: To ensure that the system meets business requirements and to provide feedback on performance.

The Importance of Final Code Freeze

For proper readiness, it’s crucial to allow at least 1-2 weeks for a final code freeze before a large anticipated spike in traffic. If changes do need to be made, they should require a change order that’s carefully reviewed and tested before deployment. All too often, companies fail to start planning early and don’t have a chance to stabilize their application before a traffic surge. When this occurs, applications can be particularly vulnerable to outages or performance degradation in production.
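
A freeze policy like this can be enforced with a simple gate in the deployment pipeline. The sketch below assumes a 14-day freeze that also covers the peak event itself and allows exceptions only with an approved change order; the function name, parameters, and dates are hypothetical.

```python
from datetime import date, timedelta


def deploy_allowed(deploy_on: date, peak_starts: date, peak_ends: date,
                   freeze_days: int = 14, approved_change_order: bool = False) -> bool:
    """Block deployments from `freeze_days` before the peak until the peak ends,
    unless an approved change order exists."""
    freeze_starts = peak_starts - timedelta(days=freeze_days)
    in_freeze = freeze_starts <= deploy_on <= peak_ends
    return (not in_freeze) or approved_change_order


# Illustrative dates only.
peak_starts, peak_ends = date(2024, 11, 29), date(2024, 12, 2)
print(deploy_allowed(date(2024, 11, 1), peak_starts, peak_ends))   # before the freeze
print(deploy_allowed(date(2024, 11, 20), peak_starts, peak_ends))  # inside the freeze
print(deploy_allowed(date(2024, 11, 20), peak_starts, peak_ends, approved_change_order=True))
```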

3. Review

Conducting a Postmortem Analysis

Teams have a tendency to drift back into silos after a peak event is completed. However, this is an important opportunity to hold a productive postmortem meeting that compares real-world results to the assumptions made during the planning and preparation phases.

Steps for Postmortem:

  • Data Collection: Gather data on system performance, user activity, any incidents that occurred, and business results.
  • Analysis: Analyze the data to identify what went well and what needs improvement.
  • Feedback: Collect feedback from all stakeholders to get a comprehensive view of the performance.

Validating Assumptions

Review the assumptions made during the preparation phase and validate them against the actual outcomes. For example, if you expected 50% of traffic from mobile users, was that the case? Or was it around 70%? A finding like this can help guide future performance tests and make them more reliable. 

Key Steps:

  • Assumption Review: List all the baselines and assumptions and compare them with actual data.
  • Impact Assessment: Assess the impact of any incorrect assumptions on overall performance.
  • Adjust Future Strategies: Based on the findings, update and refine your performance testing and preparation strategies to better align with new observations.
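
The assumption review can be as simple as comparing the planned channel mix with what actually happened. The sketch below flags any channel whose real share differs from the plan by more than a chosen tolerance; `compare_mix`, the 5-point tolerance, and the traffic counts are illustrative assumptions.

```python
def compare_mix(assumed_shares: dict, actual_counts: dict, tolerance: float = 0.05) -> dict:
    """Compare assumed channel shares with observed traffic counts.

    Only channels listed in `assumed_shares` are reported.
    """
    total = sum(actual_counts.values())
    if total == 0:
        raise ValueError("no actual traffic recorded")
    report = {}
    for channel, assumed in assumed_shares.items():
        actual = actual_counts.get(channel, 0) / total
        report[channel] = {
            "assumed": assumed,
            "actual": actual,
            "revisit": abs(actual - assumed) > tolerance,  # assumption was off
        }
    return report


# Planned a 50/50 split; hypothetical observed counts show 70% mobile.
report = compare_mix({"mobile": 0.5, "web": 0.5}, {"mobile": 700_000, "web": 300_000})
for channel, row in report.items():
    print(channel, f"assumed {row['assumed']:.0%}, actual {row['actual']:.0%}, revisit={row['revisit']}")
```

Channels flagged for revisiting are the ones whose test scenarios should be reweighted before the next peak event.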

Incorporating Learnings into Future Plans

Use the insights gained from the postmortem analysis to improve future performance testing plans. This includes updating test scenarios, refining response procedures, and enhancing monitoring tools.

Improvement Plan:

  • Scenario Updates: Modify test scenarios based on the actual performance data.
  • Procedure Refinement: Improve response procedures to address any gaps identified during the review.
  • Tool Enhancements: Upgrade monitoring and alerting tools to better capture performance metrics.
  • Training: Ensure that the team stays updated with the latest performance engineering practices and tools.

Continuous Improvement

By conducting thorough reviews and incorporating learnings into future plans, you can ensure continuous improvement in your application’s performance. This iterative approach helps build a robust system that can handle peak traffic efficiently, ensuring a smooth experience for users during critical periods.

To sum everything up, preparing your application for peak traffic involves meticulous preparation, effective execution, and thorough review. By following these best practices, you can ensure that your application remains reliable and performs optimally, providing a seamless experience for your users during the busiest times of the year.

If you’re looking for any support in your planning process, you’re more than welcome to book a time with our performance architects to learn how we enable clients to navigate seasonal spikes. 
