We live in tumultuous but interesting times. The rich have gotten richer, the poor poorer, and innovators have devised new ways to work through the disruption brought about by the coronavirus pandemic. The pandemic has also changed our lifestyles: many of us have learned to cook complex dishes from scratch, while others have found new hobbies or spent time learning something new about themselves. Many of us have also finally found the time to curl up on our couches, turn into couch potatoes and binge-watch Netflix originals till we run out of bandwidth.
Sudden surges
Video and audio streaming providers such as Netflix and Amazon Prime Video run highly scalable systems that can withstand sudden surges and spikes in usage. Even so, these services can experience outages, which frustrate users and, in extreme cases of long-term outages, lead them to abandon the platform. Complex, large-scale distributed systems with potentially millions of users must therefore be tested effectively and extensively with surges and spikes in mind.
However, unusually heavy spikes such as those caused by the pandemic were unprecedented and were probably not in any company's test plans.
Continuous integration, delivery and production
The problems of CI/CD and resolving the problems of constantly engaged systems
Companies like Netflix have constant updates to their system, which are continuously tested and delivered to their live platforms. For this, Netflix testing teams create hundreds of thousands of tester accounts every day, each being used in thousands of test scenarios to avoid any shortfalls.
As a result, testing at Netflix has moved from a low-volume manual regimen, run on a test system before going live, to continuous, large-scale, distributed automated testing of Netflix client and server applications running at scale in production, where nothing is left to chance.
An imaginary scenario with real implications
Imagine this: you and millions of others are at a nail-biting, suspenseful climax in the story and suddenly, boom! Netflix is offline. This would set alarm bells ringing at Netflix HQ, and testing SWAT teams would scramble to analyse what went wrong. Thankfully, this does not happen often.
The Goal
The goal at Netflix is simple—to be online for their users 99.99% of the time. Although Netflix has a pretty decent track record of staying online, they do occasionally encounter glitches that put the system off track. One of these incidents occurred when a development team at Netflix deployed software that impacted the large infrastructure at Netflix negatively, causing widespread disruption in services and thousands of unhappy customers.
Netflix scrambled to create a fix that resolved the issue within a few hours, but the incident also gave Netflix some food for thought: their testing regimen was inadequate and ineffective for such a large, distributed, user-facing system.
What could go wrong?
What happened at Netflix was an oversight on various levels. A new piece of code that was designed to clean up unused resources was actually being tested on the production server. This oversight caused two major problems due to bugs in the code:
- The first bug caused the dry run mode flag in the cleanup code, which was meant to guard the actual cleanup, to be interpreted incorrectly, reversing its effect. A poorly written unit test let this through; a better test could have caught the issue in development.
- The second bug was in a piece of code that checked if a resource was actually unused. The conclusion of this check overlooked some cases that existed only in production.
The combination of these two bugs caused a removal of key resources in production—resulting in the actual outage at Netflix.
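To make the first bug concrete, here is a minimal sketch of a cleanup routine with a dry run flag and the kind of unit tests that pin down its meaning. This is not Netflix's code: `FakeStore`, `cleanup` and the resource names are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for a resource store."""
    resources: dict = field(default_factory=dict)  # resource name -> in_use flag
    deleted: list = field(default_factory=list)

    def delete(self, name: str) -> None:
        self.deleted.append(name)
        del self.resources[name]


def cleanup(store: FakeStore, dry_run: bool = True) -> list:
    """Return the unused resource names; delete them only when dry_run is False."""
    candidates = [name for name, in_use in store.resources.items() if not in_use]
    if not dry_run:
        for name in candidates:
            store.delete(name)
    return candidates


def test_dry_run_deletes_nothing() -> None:
    store = FakeStore(resources={"a": True, "b": False})
    assert cleanup(store, dry_run=True) == ["b"]
    assert store.deleted == []          # the flag must protect the data
    assert "b" in store.resources


def test_real_run_deletes_only_unused() -> None:
    store = FakeStore(resources={"a": True, "b": False})
    assert cleanup(store, dry_run=False) == ["b"]
    assert store.deleted == ["b"]
    assert "a" in store.resources       # in-use resources survive
```

Testing both values of the flag, and asserting on what was actually deleted rather than only on the return value, is what would expose an inverted flag. The second bug (cases that exist only in production) cannot be caught this way, which is part of why the article goes on to discuss testing in production.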
Preventing these problems
Preventing or reducing the incidence of these problems leads to a common dilemma:
Should testing be done in a test environment or in a production environment? Most of us would advocate testing in pre-production so that actual customers are not impacted, while some would advocate testing in production as well, to ensure that code runs well in both test and prod. In reality, code should be tested in all three environments: dev, test and prod. The challenge faced by Netflix was to devise an effective methodology for deciding why, when and how to test in each of them.
This also led to another set of questions:
- Is the test environment a safe and complete mirror of our production environment?
OR
- Is the test environment the latest build with features that others might need to integrate with?
The result of this was the common scenario of having overly complex and numerous test environments.
The answer
The answer that emerged from thinking about a fix to the existing problem was simple: end-to-end automation that could replicate thousands of scenarios reliably.
This answer, however, came with its own set of problems. The first was finding a scalable way to create a production-like pre-production environment without cloning production entirely, which would require a massive investment.
Another problem was that pre-production and production usage patterns could be completely different from each other. Pre-production traffic is also thousands of times lower than production traffic.
Testing payments
Testing payments was another colossus altogether. Instead of testing payments in production using real money, it is better to create fake methods of payment (MOPs) and exercise fake transactions on them in sandbox accounts that do not overburden the existing payment systems.
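As a rough illustration of the idea, the sketch below models a fake method of payment and a sandbox gateway that records transactions in memory instead of calling a real payment processor. `FakeMop`, `SandboxGateway` and the token strings are hypothetical names made up for this example, not part of any real payment API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FakeMop:
    """Hypothetical fake method of payment; no real card data involved."""
    token: str
    should_decline: bool = False  # lets a test script the decline path


@dataclass
class SandboxGateway:
    """Hypothetical sandbox gateway: records charges, moves no money."""
    ledger: list = field(default_factory=list)

    def charge(self, mop: FakeMop, amount_cents: int) -> dict:
        status = "declined" if mop.should_decline else "approved"
        record = {
            "id": str(uuid.uuid4()),
            "token": mop.token,
            "amount_cents": amount_cents,
            "status": status,
        }
        self.ledger.append(record)
        return record


gateway = SandboxGateway()
ok = gateway.charge(FakeMop(token="fake-visa"), 999)
bad = gateway.charge(FakeMop(token="fake-expired", should_decline=True), 999)
```

Because the outcome is scripted on the fake MOP itself, both the success and the failure paths of a sign-up or renewal flow can be exercised repeatedly without touching a live payment system.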
The approach
Of the thousands of possible approaches, Netflix chose production capture and replay to bring their tests as close as possible to prod.
A large number of requests from customer devices were taken from persistence, stripped of personally identifiable information, and duplex-replayed in test. This turned the tests into real-world scenarios and helped to identify numerous corner-case bugs that were previously unknown.
The bugs identified were routed back into functional and integrated tests via a schema. This built confidence in the quality of feature migrations and helped to accelerate change velocity. It also led to an interesting learning:
All the basic duplex tests could be run in production through tester accounts. However, prod capture and replay duplex tests were limited to the test environment, because replaying in production would reissue requests and harm actual customer data.
Hastings says. “And instead tragically it is a biological one, so everybody is locked up and we had the greatest growth in the first half of this year that we ever had.” With a market capitalization of around US$230 billion, it has been vying with Walt Disney since March for the title of the world’s most valuable entertainment group.
Masked and refreshed data could safely be used to replay requests in the test environment after a time delay. This shifted the focus to the data set rather than the production environment. Although the result was not quite as stable as production, it gave a good idea of how production would behave.
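The capture-and-replay idea with masking can be sketched in a few lines. This is only an illustration under stated assumptions: the field names in `PII_FIELDS`, the request shape, and the `handler` callable standing in for the system under test are all hypothetical, and a truncated hash is used here simply as a stand-in for whatever masking scheme a real pipeline would apply.

```python
import hashlib
from typing import Callable, Iterable

# Hypothetical names of fields that carry personal data in a captured request.
PII_FIELDS = {"email", "name", "card_number"}


def mask(request: dict) -> dict:
    """Return a copy of the request with PII fields replaced by opaque tokens."""
    masked = {}
    for key, value in request.items():
        if key in PII_FIELDS:
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            masked[key] = "masked-" + digest[:12]
        else:
            masked[key] = value
    return masked


def replay(captured: Iterable[dict], handler: Callable[[dict], dict]) -> list:
    """Mask each captured request, send it to the handler, and record the outcome."""
    results = []
    for request in captured:
        safe = mask(request)
        try:
            results.append((safe, handler(safe), None))
        except Exception as exc:  # keep failures for later triage
            results.append((safe, None, exc))
    return results
```

A run would pass in requests read from storage and a function that calls the test environment. Every tuple whose third element is not `None` is a candidate corner-case bug to feed back into the functional test suite.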
Failing is important in testing. Failures help test teams to identify real issues in downstream implementations. To surface such failures safely, functional validations were run as real canaries in production, exposing a small percentage of actual customer traffic to both versions of the API under test.
Canary analysis algorithms were run on the metrics gathered from these canaries, and a compare-and-verify regimen checked whether client and server metrics were equivalent. Failing request logs captured from the canaries helped to debug and triage issues better.
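A deliberately naive version of such a comparison is sketched below: it compares the server error rate of the canary against the baseline and fails the canary if the increase exceeds a tolerance. The function names and the 1% tolerance are assumptions for illustration; real canary analysis typically uses statistical tests across many metrics rather than a single threshold.

```python
def error_rate(statuses: list) -> float:
    """Fraction of responses with a 5xx status code; 0.0 for an empty sample."""
    if not statuses:
        return 0.0
    return sum(1 for status in statuses if status >= 500) / len(statuses)


def canary_passes(baseline: list, canary: list, max_increase: float = 0.01) -> bool:
    """Pass the canary if its error rate is at most max_increase above the baseline's."""
    return error_rate(canary) - error_rate(baseline) <= max_increase


# Hypothetical samples: 1% errors on the baseline, 5% on the canary.
baseline_statuses = [200] * 99 + [500]
canary_statuses = [200] * 95 + [500] * 5

healthy = canary_passes(baseline_statuses, baseline_statuses)   # identical behaviour
regressed = canary_passes(baseline_statuses, canary_statuses)   # four points worse
```

With these samples `healthy` is `True` and `regressed` is `False`, so the second deployment would be rolled back and its failing request logs pulled for triage.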
Learnings
Learnings from such an approach are manifold.
- The first one would be to understand that test and prod are different, but their differences must be embraced to utilize the capability of both.
- Although testing is good in a sandboxed environment, testing in production is important for such implementations.
- Solving the problems in either environment can go a long way in ensuring test success.
- Stay on the lookout for rethinking your testing strategy. Even if it may come at an extra cost, the end result would be worth it.
- Find a pragmatic testing shape that is right for your company—do not look for a textbook shape that fits in.
- Start production simulation and chaos experiments—these will help to validate your functional and resiliency testing capabilities for the future.
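For the last point, a chaos experiment can start very small. The sketch below wraps a dependency call so that it fails at a configurable rate, then checks that a simple retry policy copes. All names here (`InjectedFault`, `with_chaos`, `call_with_retry`) are hypothetical, and this is a toy in-process illustration, not how Netflix runs chaos testing in production.

```python
import random


class InjectedFault(Exception):
    """Raised by the chaos wrapper to simulate a failing dependency."""


def with_chaos(func, failure_rate: float, rng: random.Random):
    """Wrap func so that each call fails with probability failure_rate."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)
    return wrapper


def call_with_retry(func, attempts: int = 3):
    """Call func up to `attempts` times, re-raising the last injected fault."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as exc:
            last_error = exc
    raise last_error


def fetch_title() -> str:
    return "ok"  # stand-in for a real downstream call


rng = random.Random(42)
flaky_fetch = with_chaos(fetch_title, failure_rate=0.3, rng=rng)
```

Calling `call_with_retry(flaky_fetch)` repeatedly shows how often the retry budget is enough at a given failure rate; setting `failure_rate=1.0` verifies that the caller fails loudly instead of hanging.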
At Netflix, chaos testing is done at scale in production. Testing everything from fire raining from the sky to aliens killing their servers, they leave nothing to chance. If they don’t, why should you? The testing teams at Volumetree are experienced and reliable, and know where to raise the red flags. Give your software the quality edge it needs. Schedule a consultation with our test consultants today!



