Inauguration Day – DevOps in the White House!


This isn’t a political blog. No, it’s about change. Americans are getting a new President. President Trump will appoint new members of the presidential staff, and these new leaders will surely initiate changes to our government’s current laws, policies, and processes. In technology especially, we need to keep up and be ready for change to ensure successful releases of web, mobile, and IoT applications.

Perhaps one of the most famous blunders in web history was the 2013 crash of the HealthCare.gov site. Years of planning and millions of dollars went into creating, organizing, and rolling out the new healthcare website. I don’t know the exact legal mumbo jumbo of how a bill becomes a law, but it certainly took a lot of effort, and the cause was great: to give people affordable healthcare options. These independent plans were immensely helpful to those of us who are entrepreneurs and pay our premiums entirely out of pocket. When Americans finally went to sign up for their new healthcare plans, page response times slowed to a crawl, too slow to be usable, and then the site crashed. It’s estimated that approximately 2 million people who attempted to sign up at rollout were affected by this blunder. The root cause? A lack of methodical performance testing. Change is good, and in web and mobile application technology, every change needs to be performance tested to ensure success.

Could this blunder have been prevented? Yes, absolutely. This website failure could have been prevented by methodical performance testing and analysis of the results. Designing a realistic performance test is a mathematical exercise. Take the potential population of users (peak load) and add some extra headroom, just in case. Then create load scripts that mimic the transaction flows of real users. Add some random yet reasonable think/pause times to make those automated users act like human beings. Define a scenario that ramps up the users slowly to peak load. Isolate bottlenecks and tune for scalability. Additionally, execute spike tests, where larger-than-expected volumes of users hit your site, and endurance tests, where you run your expected load for a prolonged period to ensure your application remains stable over time. You know, the stuff we do at TPC every week.
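The arithmetic behind that recipe can be sketched in a few lines of Python. This is a minimal illustration under assumed numbers, not TPC’s actual tooling: the function names are hypothetical, and the 25% headroom, five-step linear ramp, 3 to 10 second think-time range, and 2x spike factor are placeholder values you would replace with your own.

```python
import math
import random

def target_users(peak_load: int, headroom: float = 0.25) -> int:
    """Expected peak concurrent users plus a safety margin (assumed 25%)."""
    return math.ceil(peak_load * (1 + headroom))

def ramp_schedule(target: int, steps: int) -> list[int]:
    """Active virtual users at each step of a slow, linear ramp to the target."""
    return [math.ceil(target * i / steps) for i in range(1, steps + 1)]

def think_time(rng: random.Random, low: float = 3.0, high: float = 10.0) -> float:
    """Random but reasonable pause (seconds) between a virtual user's actions."""
    return rng.uniform(low, high)

def spike_users(target: int, factor: float = 2.0) -> int:
    """Load for a spike test: a larger-than-expected burst (assumed 2x target)."""
    return math.ceil(target * factor)

# Hypothetical example: 1,000 expected peak users
target = target_users(1000)            # 1250 with 25% headroom
print(ramp_schedule(target, 5))        # [250, 500, 750, 1000, 1250]
print(spike_users(target))             # 2500
print(round(think_time(random.Random(42)), 2))
```

Seeding the random generator keeps the think times reproducible between runs, which makes it easier to compare one test result against the next.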

Did the government learn from this blunder? Re-enrollment came around, and it was time to sign up for your health plan again. We hoped they had used the time to methodically load test the site and tune it for scalability. After all, they had a year of web logs and statistics available for analysis. All the required statistics were there: arrival rates, session durations, transaction mixes, and peak usage patterns. But nope, upon re-enrollment, users experienced slow response times and browser timeouts again (not enough to make the news this time, but I did take some screenshots as proof). Unfortunately, I hadn’t anticipated writing this blog, and I didn’t save those screenshots after sharing a laugh with a fellow performance engineer, or I would have included them! Trump is already making moves to repeal the Obama healthcare policy. We hope that the replacement options will be even better and even more affordable. The question remains whether the new administration will thoroughly load test the site for peak load to ensure that Americans can access it and successfully sign up for new healthcare options.
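A year of logs is exactly what a load model needs. One common way to turn arrival rates and session durations into a concurrency target is Little’s Law (average concurrent users = arrival rate × average session time); the transaction mix then divides those users among scripts. The sketch below is our illustration, not a description of how the site was actually tested, and the figures and transaction names are hypothetical.

```python
def concurrent_users(arrivals_per_hour: float, avg_session_seconds: float) -> float:
    """Little's Law: average concurrency = arrival rate x average time in system."""
    return arrivals_per_hour * avg_session_seconds / 3600

def split_by_mix(total_users: int, mix: dict[str, float]) -> dict[str, int]:
    """Allocate virtual users across transaction types by their observed share."""
    return {name: round(total_users * share) for name, share in mix.items()}

# Hypothetical figures, as if read from a year of web logs
users = concurrent_users(arrivals_per_hour=10_000, avg_session_seconds=900)
print(users)  # 2500.0
print(split_by_mix(round(users), {"browse_plans": 0.6, "enroll": 0.3, "login_only": 0.1}))
# {'browse_plans': 1500, 'enroll': 750, 'login_only': 250}
```

Because each share is rounded independently, the allocated users may not sum exactly to the total for less tidy percentages; check the sum before building the scenario.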

Change is constant. Nowhere is that truth more relevant than in the deployment of applications. Every release that brings new code, architectural changes, or increased workloads must be load tested to ensure success. We used the example of a government website, but every web application, every mobile application, and every Internet of Things application requires load testing prior to release. People become more impatient every year, and our standards for a satisfactory user experience have changed along with technology. We might once have waited patiently 8 or 10 seconds for a page to load, but now we wait 3 seconds and then start muttering about response times. If we mutter to ourselves for more than a few seconds, we give up, and the result has a technical name: a “bounce”. Bounces represent loss: loss of productivity, customers, revenue, branding, reputation, and the list goes on. Integrating load testing into continuous integration is vital to every application’s success. Because change is constant, application owners must take responsibility for load testing each and every release.
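One way to make load testing part of continuous integration is a pass/fail gate on response times. This sketch assumes a 90th-percentile target of 3 seconds (echoing the patience threshold above) and uses hypothetical function names; a real pipeline would feed it the results exported by whatever load testing tool it uses.

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of response times (seconds)."""
    if not samples:
        raise ValueError("no response-time samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]

def passes_gate(samples: list[float], threshold_s: float = 3.0, pct: float = 90.0) -> bool:
    """True if the chosen percentile response time is within the threshold."""
    return percentile(samples, pct) <= threshold_s

# Hypothetical response times (seconds) from one load test run
run = [1.2, 0.8, 2.5, 1.9, 2.9, 1.1, 0.9, 1.4, 2.2, 4.8]
print(percentile(run, 90))   # 2.9
print(passes_gate(run))      # True
```

In a CI job, a `False` result would fail the build (for example, by exiting with a non-zero status), so a release that pushes the 90th percentile past 3 seconds doesn’t ship unnoticed.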

We hope that the new President’s staff will involve us performance engineers in the release of the newly anticipated healthcare site! We also hope that businesses will reach out to TPC for any of their load testing needs.



Categories: Load Testing