
We live in tumultuous but interesting times. The rich have gotten richer, the poor poorer, and innovators have devised new ways to work through the disruption brought about by the coronavirus pandemic. The pandemic has also changed our lifestyles in many ways: many of us have learned to cook complex dishes from scratch, others have found new hobbies, and some have spent the time learning something new about themselves. Many of us have also finally found the time to curl up on the couch, turn into couch potatoes and binge-watch Netflix originals until we run out of bandwidth.

Sudden surges

Although Netflix, Amazon Prime Video and many other video and audio streaming services run highly scalable systems that can withstand sudden surges and spikes in usage, they can still experience outages. Outages frustrate users and, in extreme cases of long-term outages, can even drive them to abandon the platform. Complex, large-scale distributed systems like these, with potentially millions of users, must be tested effectively and extensively with surges and spikes in mind.

However, spikes as heavy as those caused by the pandemic were unprecedented and were probably not part of any company's test plans.
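Spike testing usually starts from a load profile that holds a steady baseline and then jumps abruptly. The minimal sketch below only generates such a per-second target schedule. The function name `spike_profile` and its parameters are hypothetical, and wiring the values into an actual load generator is left out.

```python
def spike_profile(baseline_rps, spike_multiplier, duration_s, spike_start_s, spike_len_s):
    """Return per-second target request rates: a steady baseline with one sudden spike.

    All names and parameters here are illustrative, not taken from any real tool.
    """
    if baseline_rps < 0 or spike_multiplier < 1 or spike_len_s < 0:
        raise ValueError("invalid load parameters")
    if not 0 <= spike_start_s <= duration_s:
        raise ValueError("spike must start inside the test window")
    profile = []
    for t in range(duration_s):
        in_spike = spike_start_s <= t < spike_start_s + spike_len_s
        profile.append(baseline_rps * spike_multiplier if in_spike else baseline_rps)
    return profile


if __name__ == "__main__":
    # Ten seconds at 100 rps with a 10x spike during seconds 3 and 4.
    print(spike_profile(100, 10, 10, 3, 2))
```

A real spike test would also ramp down and watch error rates and latency during the spike, not only whether the service stays up.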


Continuous integration, delivery and production

The problems of CI/CD and how to resolve them in always-on systems

Companies like Netflix constantly update their systems, and these updates are continuously tested and delivered to the live platform. To support this, Netflix testing teams create hundreds of thousands of tester accounts every day, each used in thousands of test scenarios to avoid any shortfalls.
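One way to keep that many tester accounts apart from real members is to generate them with predictable IDs and an explicit tester flag that downstream systems can filter on. This is a minimal sketch under that assumption. `TesterAccount`, `make_tester_accounts` and the ID format are hypothetical illustrations, not Netflix's actual scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TesterAccount:
    account_id: str
    email: str
    # Hypothetical flag so production systems can exclude tester traffic
    # from billing, recommendations and business metrics.
    is_tester: bool = True


def make_tester_accounts(run_id, count, start=0):
    """Create `count` tester accounts whose IDs are unique per test run."""
    return [
        TesterAccount(
            account_id=f"tester-{run_id}-{i:06d}",
            # .test is a reserved top-level domain, so these addresses can never reach a real inbox.
            email=f"tester+{run_id}-{i:06d}@example.test",
        )
        for i in range(start, start + count)
    ]


if __name__ == "__main__":
    for account in make_tester_accounts("r1", 3):
        print(account)
```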

As a result, testing at Netflix has moved away from a manual regimen, run on a test system before changes went live, toward large-scale, distributed, automated testing of Netflix client and server applications running in production. In other words, testing at Netflix has gone from a low-volume manual mode to a continuous, fully automated, high-volume mode where nothing is left to chance.

An imaginary scenario with real implications

Imagine this: you and millions of others are at the nail-biting, suspenseful climax of a story when, suddenly, boom! Netflix goes offline. Alarm bells would ring at Netflix HQ, and testing SWAT teams would come flying in through your windows to analyse what went wrong. Thankfully, this does not happen often.

The Goal

The goal at Netflix is simple: to be online for its users 99.99% of the time. Although Netflix has a good track record of staying online, it does occasionally encounter glitches that knock the system off track. In one such incident, a development team at Netflix deployed software that negatively affected Netflix's large infrastructure, causing widespread service disruption and thousands of unhappy customers.

Netflix scrambled to create a fix that resolved the issue within a few hours. The incident also gave Netflix food for thought: its testing regimen was inadequate and ineffective for such a large, distributed, user-facing system.
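To put the 99.99% availability target in perspective, the allowed downtime in a period is (1 - availability) × period length. The helper below is an illustrative calculation that assumes a 365-day year and a 30-day month.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes; ignores leap years
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes; assumes a 30-day month


def downtime_budget_minutes(availability, period_minutes):
    """Minutes of downtime allowed in a period for an availability given as a fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    print(f"{downtime_budget_minutes(0.9999, MINUTES_PER_YEAR):.2f} min/year")    # 52.56
    print(f"{downtime_budget_minutes(0.9999, MINUTES_PER_MONTH):.2f} min/month")  # 4.32
```

At "four nines", a single outage lasting a few hours would use up several years' worth of downtime budget.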

What could go wrong?

What happened at Netflix was an oversight at several levels. A new piece of code designed to clean up unused resources was effectively being tested on the production server. Two bugs in that code caused two major problems:

  1. The first bug caused the dry-run flag, which was meant to guard the actual cleanup, to be interpreted incorrectly, reversing its effect. This slipped through because of a poorly written unit test; a proper test could have caught the issue during development.
  2. The second bug was in the code that checked whether a resource was actually unused. The check overlooked some cases that existed only in production.

The combination of these two bugs caused key resources to be removed in production, resulting in the actual outage at Netflix.

Preventing these problems

Preventing or reducing such incidents leads to a common dilemma:

Should testing be done in a test environment or in production? Most of us would advocate testing in pre-production so that actual customers are not affected, while some would advocate testing in production to make sure the code runs well in both test and prod. In reality, code should be tested in all three environments: dev, test and prod. The challenge Netflix faced was to devise an effective methodology for deciding why, when and how to test in each of these environments.

This also led to another set of questions:

  • Is the test environment a safe and complete mirror of our production environment?

OR

  • Is the test environment the latest build with features that others might need to integrate with?

The result was the familiar situation of having numerous, overly complex test environments.

The answer

The answer that emerged from thinking about how to fix the existing problem was simple: end-to-end automation that could replicate thousands of scenarios reliably.

This answer, however, came with its own set of problems, chiefly finding a scalable way to create a production-like pre-production environment without cloning production entirely, which would require a massive investment.

Another problem was that pre-production and production usage patterns can be completely different, and pre-production traffic is thousands of times lower than production traffic.

Testing payments

Testing payments was another colossus altogether. Rather than testing payments in production with real money, it is better to create fake methods of payment (MOPs) and exercise fake transactions on them in sandbox accounts, so the existing payment systems are not overburdened.
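A fake MOP only helps if the path that charges it can never reach a real processor. The minimal sketch below assumes an in-memory sandbox gateway that accepts nothing but fake MOPs. `FakeMop`, `make_fake_mop` and `SandboxPaymentGateway` are hypothetical names, not any real payment API.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeMop:
    """Hypothetical fake method of payment tied to a sandbox tester account."""
    mop_id: str
    account_id: str
    sandbox: bool = True


def make_fake_mop(account_id):
    return FakeMop(mop_id=f"fake-mop-{uuid.uuid4().hex[:8]}", account_id=account_id)


class SandboxPaymentGateway:
    """In-memory stand-in for a payment processor: it records transactions and moves no money."""

    def __init__(self):
        self.ledger = []

    def charge(self, mop, amount_cents):
        if not mop.sandbox:
            raise PermissionError("sandbox gateway only accepts fake MOPs")
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        txn = {
            "txn_id": str(uuid.uuid4()),
            "mop_id": mop.mop_id,
            "amount_cents": amount_cents,
            "status": "APPROVED",
        }
        self.ledger.append(txn)
        return txn


if __name__ == "__main__":
    gateway = SandboxPaymentGateway()
    mop = make_fake_mop("tester-000001")
    print(gateway.charge(mop, 1599))
```

A fuller sandbox would also simulate declines and timeouts, so that failure paths get exercised as well as the happy path.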


The approach

Of the many possible approaches, Netflix chose production capture and replay to bring its tests as close as possible to prod.

A large number of requests from customer devices were read back from persistent storage, stripped of personally identifiable information (PII), and duplex-replayed in test. This turned the tests into real-world scenarios and helped identify numerous corner-case bugs that had previously been unknown.
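The PII-stripping step before replay can be sketched as below. Several details are assumptions, not Netflix's pipeline: the set of PII field names, salted SHA-256 pseudonyms (so the same user still maps to the same token across requests), and a caller-supplied `send` function standing in for the test endpoint.

```python
import hashlib

# Hypothetical list of PII keys; a real capture pipeline would need a reviewed schema.
PII_FIELDS = {"email", "name", "phone", "ip_address", "card_number"}


def pseudonymize(value, salt):
    """Stable, non-reversible stand-in for a PII value."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()[:16]


def scrub(request, salt="test-env-salt"):
    """Return a new dict with PII values replaced; nested dicts are scrubbed recursively.

    Lists and other values are passed through unchanged in this sketch.
    """
    clean = {}
    for key, value in request.items():
        if key in PII_FIELDS:
            clean[key] = pseudonymize(value, salt)
        elif isinstance(value, dict):
            clean[key] = scrub(value, salt)
        else:
            clean[key] = value
    return clean


def replay(captured_requests, send):
    """Send scrubbed copies of captured requests to a test endpoint via send() and collect the responses."""
    return [send(scrub(req)) for req in captured_requests]


if __name__ == "__main__":
    captured = [{"email": "a@b.com", "profile": {"name": "Ann"}, "title_id": 42}]
    print(replay(captured, send=lambda r: r))
```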

The bugs identified were routed back into functional and integration tests via a schema. This increased confidence in the quality of feature migrations and helped accelerate change velocity. It also led to an interesting learning:

All the basic duplex tests could be run in PRODUCTION through tester accounts. However, prod capture-and-replay duplex tests were limited to the test environment, because replaying captured requests in production would reissue them and could harm actual customer data.
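The rule above is simple enough to enforce in code before any test run starts. The policy gate below is hypothetical (`check_run_allowed` and `RunNotAllowed` are illustrative names). It encodes exactly two constraints: captured-traffic replay never runs in prod, and duplex tests run in prod only through tester accounts.

```python
class RunNotAllowed(RuntimeError):
    pass


def check_run_allowed(env, kind, tester_account=False):
    """Hypothetical policy gate.

    env:  "test" or "prod"
    kind: "duplex" (basic duplex test) or "replay" (captured production traffic)
    """
    if env not in ("test", "prod") or kind not in ("duplex", "replay"):
        raise ValueError(f"unknown env/kind: {env}/{kind}")
    if env == "prod" and kind == "replay":
        # Replayed requests would be reissued against real customer data.
        raise RunNotAllowed("captured-traffic replay is limited to the test environment")
    if env == "prod" and not tester_account:
        raise RunNotAllowed("duplex tests in prod must run through tester accounts")
    return True


if __name__ == "__main__":
    print(check_run_allowed("prod", "duplex", tester_account=True))  # True
    print(check_run_allowed("test", "replay"))                       # True
```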


As Netflix co-founder Reed Hastings put it: "And instead tragically it is a biological one, so everybody is locked up and we had the greatest growth in the first half of this year that we ever had." With a market capitalization of around US$230 billion, Netflix has been vying with Walt Disney since March for the title of the world's most valuable entertainment group.

Masked and refreshed data could safely be used to replay requests in the test environment after a time delay. This shifted the team's focus to the data set rather than the production environment. Although the test environment was not as stable as production, this approach gave a good idea of how production would behave.
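Replaying "after a time delay" can be modelled as a queue that holds each masked request until its release time. In this minimal sketch, timestamps are passed in explicitly so that behaviour is deterministic. `DelayedReplayQueue` and the delay value are hypothetical.

```python
import heapq


class DelayedReplayQueue:
    """Holds masked requests and releases them for replay only after delay_seconds have passed."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self._heap = []
        self._seq = 0  # tie-breaker so two requests due at the same time are never compared directly

    def capture(self, request, captured_at):
        heapq.heappush(self._heap, (captured_at + self.delay, self._seq, request))
        self._seq += 1

    def due(self, now):
        """Pop and return every request whose release time is <= now, oldest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


if __name__ == "__main__":
    q = DelayedReplayQueue(delay_seconds=60)
    q.capture({"id": 1}, captured_at=0)
    q.capture({"id": 2}, captured_at=30)
    print(q.due(now=59))   # []
    print(q.due(now=60))   # [{'id': 1}]
    print(q.due(now=100))  # [{'id': 2}]
```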

Failure is important in testing: failures help test teams identify real issues in downstream implementations. To make up for the limits of the test environment, all functional validations also ran as real canaries in production, essentially exposing a small percentage of actual customer traffic to both versions of the API under test.
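The source does not describe Netflix's actual canary tooling, so the routing function below is only a hypothetical illustration. Customers are assigned deterministically by hashing their ID. A small canary slice runs the new version, and an equally sized baseline slice runs the current version. Treating the baseline as a separate group from the rest of production is an assumption about how "both versions" get comparable traffic.

```python
import hashlib

BUCKETS = 10_000  # each bucket is 0.01% of customers


def bucket(customer_id):
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def route(customer_id, canary_percent):
    """Deterministically assign a customer to "canary", "baseline" or "production".

    Canary and baseline each receive canary_percent of customers, so both versions
    see comparable slices of real traffic; everyone else stays on production.
    """
    if not 0 <= canary_percent <= 50:
        raise ValueError("canary_percent must be between 0 and 50")
    slice_size = canary_percent * BUCKETS / 100
    b = bucket(customer_id)
    if b < slice_size:
        return "canary"
    if b < 2 * slice_size:
        return "baseline"
    return "production"


if __name__ == "__main__":
    groups = [route(f"user-{i}", 1) for i in range(20_000)]
    print({g: groups.count(g) for g in ("canary", "baseline", "production")})
```

Hashing keeps each customer on the same version across requests, which makes per-customer failures easier to trace.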

Canary analysis algorithms were run on the metrics gathered from these deployments, and a compare-and-verify step checked whether client and server metrics were equivalent. This made it possible to capture failing request logs from the canaries and to debug and triage issues more effectively.
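The compare-and-verify step can be sketched as a per-metric comparison between baseline and canary. This sketch is deliberately simplified and is not Netflix's algorithm: it flags any metric whose canary mean deviates from the baseline mean by more than a relative tolerance. Real canary analysis tools generally apply statistical tests instead of a fixed threshold. The metric names and the 10% tolerance are hypothetical.

```python
from statistics import mean


def compare_metrics(baseline, canary, tolerance=0.10):
    """Return the names of metrics whose canary mean deviates from the baseline mean by more than tolerance (relative)."""
    failing = []
    for name, base_samples in baseline.items():
        base = mean(base_samples)
        can = mean(canary[name])
        if base == 0:
            deviated = can != 0
        else:
            deviated = abs(can - base) / abs(base) > tolerance
        if deviated:
            failing.append(name)
    return sorted(failing)


def verdict(baseline, canary, tolerance=0.10):
    failing = compare_metrics(baseline, canary, tolerance)
    return ("FAIL", failing) if failing else ("PASS", [])


if __name__ == "__main__":
    baseline = {"latency_ms": [100, 110, 90], "error_rate": [0.01, 0.01, 0.01]}
    canary = {"latency_ms": [102, 108, 95], "error_rate": [0.05, 0.04, 0.06]}
    print(verdict(baseline, canary))  # ('FAIL', ['error_rate'])
```

When the verdict is FAIL, the failing metric names indicate which request logs from the canary to pull for triage.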

Learnings

Learnings from such an approach are manifold. 

  • The first is to understand that test and prod are different, and that their differences must be embraced to use the capabilities of both.
  • Although testing in a sandboxed environment is good, testing in production is important for implementations like these.
  • Solving problems in either environment can go a long way toward ensuring test success.
  • Keep rethinking your testing strategy. Even if it comes at an extra cost, the end result will be worth it.
  • Find a pragmatic testing shape that is right for your company; do not look for a textbook shape to fit into.
  • Start production simulation and chaos experiments. These will help you validate your functional and resiliency testing capabilities for the future.
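A chaos experiment deliberately injects failures and checks that the system degrades gracefully instead of breaking. Netflix's own tooling, such as Chaos Monkey, works at the infrastructure level. The in-process sketch below only illustrates the idea. `ChaosProxy`, the retry count and the fallback row are hypothetical, and the random generator is seeded so that runs are repeatable.

```python
import random


class ChaosProxy:
    """Wraps a dependency call and fails a configurable fraction of calls."""

    def __init__(self, target, failure_rate, seed=None):
        self.target = target
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)
        self.injected = 0

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            self.injected += 1
            raise ConnectionError("chaos: injected failure")
        return self.target(*args, **kwargs)


def get_recommendations(fetch, user_id, retries=2):
    """Client under test: retry, then fall back to a default row instead of failing."""
    for _ in range(retries + 1):
        try:
            return fetch(user_id)
        except ConnectionError:
            continue
    return ["popular-titles"]  # graceful fallback keeps the page usable


if __name__ == "__main__":
    proxy = ChaosProxy(lambda uid: [f"picks-for-{uid}"], failure_rate=0.5, seed=42)
    results = [get_recommendations(proxy, "u1") for _ in range(200)]
    print(f"injected={proxy.injected}, fallbacks={results.count(['popular-titles'])}")
```

The experiment passes when every call returns either real recommendations or the fallback, with no errors surfacing to the user.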

At Netflix, chaos testing is done at scale in production. Testing everything from fire raining from the sky to aliens killing their servers, they leave nothing to chance. If they don't, why should you? The testing teams at Volumetree are experienced and reliable, and they know where to raise the red flags. Give your software the quality edge it needs. Schedule a consultation with our test consultants today!
