
We live in tumultuous but interesting times. The rich have gotten richer, the poor poorer, and innovators have devised new ways to work through the disruption brought about by the coronavirus pandemic. The pandemic has also brought a battery of changes to our lifestyles: many of us have learned to cook complex dishes from scratch, others have found new hobbies, and some have spent time learning something new about themselves. During the pandemic, many of us have also finally found the time to curl up on our couches, turn into couch potatoes and binge-watch Netflix originals until we run out of bandwidth.

Sudden surges

Although most streaming services, such as Netflix, Amazon Prime Video and many other video and audio providers, have highly scalable systems that can withstand sudden surges and spikes in usage, they can still suffer outages. Outages frustrate users and, in extreme cases of long-term downtime, can even drive them to abandon the platform. Complex, large-scale distributed systems such as Netflix and Amazon Prime Video, which may serve millions of users, must be tested effectively and extensively with surges and spikes in mind.

However, spikes as heavy as those caused by the pandemic were unprecedented and were probably not part of any company's testing playbook.


Continuous integration, delivery and production

The challenges of CI/CD in systems that are always in use

Companies like Netflix update their systems constantly, and these updates are continuously tested and delivered to the live platform. To support this, Netflix testing teams create hundreds of thousands of tester accounts every day, each used in thousands of test scenarios to avoid any shortfalls.
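Netflix has not published how its tester accounts are provisioned. As a minimal sketch only, the Python below assumes a hypothetical `TesterAccount` record and a `test-` ID prefix, and shows one way a harness might mint many uniquely identifiable accounts per scenario while flagging them so they can never be mistaken for real customers.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class TesterAccount:
    """A synthetic account used only by automated tests (hypothetical model)."""
    account_id: str
    email: str
    scenario: str
    is_tester: bool = True  # lets downstream systems tell it apart from customers


def create_tester_accounts(count: int, scenario: str) -> list[TesterAccount]:
    """Mint `count` uniquely identifiable tester accounts for one test scenario."""
    accounts = []
    for _ in range(count):
        account_id = f"test-{uuid.uuid4().hex}"
        accounts.append(
            TesterAccount(
                account_id=account_id,
                # example.com is a reserved domain, so mail never reaches a real person.
                email=f"{account_id}@example.com",
                scenario=scenario,
            )
        )
    return accounts
```

Tagging each account with its scenario makes it easy to clean up or audit everything a given test run created.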

As a result, testing at Netflix has moved from a manual regimen, run on a test system before code went live, to large-scale, distributed, automated testing of Netflix client and server applications running in production. In other words, testing at Netflix has gone from a low-volume manual mode to a continuous, fully automated, high-volume mode where nothing is left to chance.

An imaginary scenario with real implications

Imagine this: you and millions of others are at the nail-biting climax of a story when, suddenly, boom! Netflix goes offline. Alarm bells would ring at Netflix HQ, and testing SWAT teams would fly in through your windows to analyse what went wrong. Thankfully, this does not happen often.

The Goal

The goal at Netflix is simple: to be available to its users 99.99% of the time. Although Netflix has a strong track record of staying online, it does occasionally hit glitches that knock the system off track. In one such incident, a development team at Netflix deployed software that harmed Netflix's large infrastructure, causing widespread service disruption and thousands of unhappy customers.
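To make the 99.99% target concrete, the short calculation below (a generic availability formula, nothing Netflix-specific) turns an availability target into a downtime budget.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Minutes of allowed downtime for an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return (1 - availability) * total_minutes


print(round(downtime_budget_minutes(0.9999, 365), 2))  # one 365-day year
print(round(downtime_budget_minutes(0.9999, 30), 2))   # one 30-day month
```

For 0.9999 this gives 52.56 minutes over a 365-day year and 4.32 minutes over a 30-day month: well under an hour of total outage per year.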

Netflix scrambled to create a fix that resolved the issue within a few hours, but the incident also gave it food for thought: its testing regimen was inadequate and ineffective for such a large, distributed, user-facing system.

What could go wrong?

What happened at Netflix was an oversight at several levels. New code designed to clean up unused resources was being tested on production servers, and two bugs in that code caused major problems:

  1. The first bug caused the cleanup's dry-run flag, which was meant to prevent any actual deletion, to be interpreted incorrectly, reversing its effect. It slipped through because of a poorly written unit test; a better test would have caught it during development.
  2. The second bug was in the code that checked whether a resource was actually unused. The check overlooked some cases that existed only in production.
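The dry-run bug is easy to reproduce in miniature. The sketch below uses hypothetical `cleanup`, `is_unused` and `delete` names, not Netflix's code. Both branches return the same candidate list, so a unit test that only checks the return value would pass even if the `dry_run` check were inverted; the test that matters asserts that a dry run never calls `delete`.

```python
from typing import Callable, Iterable, List


def cleanup(
    resources: Iterable[str],
    is_unused: Callable[[str], bool],
    delete: Callable[[str], None],
    *,
    dry_run: bool = True,
) -> List[str]:
    """Delete unused resources, or in dry-run mode only report what would go."""
    candidates = [r for r in resources if is_unused(r)]
    if dry_run:
        # Dry run: report the candidates but never touch anything.
        return candidates
    for r in candidates:
        delete(r)
    return candidates


def test_dry_run_deletes_nothing():
    deleted = []
    reported = cleanup(["a", "b"], lambda r: True, deleted.append, dry_run=True)
    assert reported == ["a", "b"]
    # The assertion that catches an inverted flag: no side effects at all.
    assert deleted == []


def test_real_run_deletes_only_candidates():
    deleted = []
    cleanup(["a", "b"], lambda r: r == "a", deleted.append, dry_run=False)
    assert deleted == ["a"]
```

Defaulting `dry_run` to `True` and making it keyword-only is a further, assumed safeguard: a caller has to ask for destructive behaviour explicitly.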

The combination of these two bugs caused the removal of key resources in production, resulting in the actual outage at Netflix.
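For the second bug, one defensive pattern (assumed here, not described in the source) is to make the usage check fail safe: only positive evidence of non-use marks a resource as deletable, and any state the check does not recognise, such as a production-only case, is treated as in use. The metadata fields and the 30-day threshold below are hypothetical.

```python
from enum import Enum


class Usage(Enum):
    IN_USE = "in_use"
    UNUSED = "unused"
    UNKNOWN = "unknown"


# Hypothetical threshold: anything accessed in the last 30 days counts as in use.
IDLE_DAYS_REQUIRED = 30


def classify(resource: dict) -> Usage:
    """Classify a resource from a hypothetical metadata dict."""
    refs = resource.get("reference_count")
    idle_days = resource.get("days_since_last_access")
    if refs is None or idle_days is None:
        # Missing evidence (e.g. a case that only exists in production) is UNKNOWN.
        return Usage.UNKNOWN
    if refs > 0 or idle_days < IDLE_DAYS_REQUIRED:
        return Usage.IN_USE
    return Usage.UNUSED


def is_safe_to_delete(resource: dict) -> bool:
    # Fail safe: UNKNOWN is treated exactly like IN_USE.
    return classify(resource) is Usage.UNUSED
```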

Preventing these problems

Preventing or reducing such incidents leads to a common dilemma

Should testing be done in a test environment or in a production environment? Although most of us would advocate testing in pre-production so that actual customers are not affected, some advocate testing in production to ensure that code runs well in both test and prod. In reality, code should be tested in all three environments: dev, test and prod. The challenge Netflix faced was to devise an effective methodology for deciding why, when and how to test in each of these environments.

This also raised another set of questions:

  • Is the test environment a safe and complete mirror of our production environment?

OR

  • Is the test environment the latest build with features that others might need to integrate with?

The result was a common scenario: numerous, overly complex test environments.

The answer

The answer that emerged from thinking through a fix was simple: end-to-end automation that could reliably replicate thousands of scenarios.

This answer, however, came with its own problems. The first was finding a scalable way to create a production-like pre-production environment without cloning production entirely, which would require a massive investment.

Another problem was that pre-production and production usage patterns could be completely different from each other. Pre-production traffic is also thousands of times lower than production traffic.

Testing payments

Testing payments was another colossus altogether. Instead of testing payments in production with real money, it is better to create fake methods of payment (MOPs) and exercise fake transactions on them in sandbox accounts, so that the existing payment systems are not overburdened.
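A sandbox gateway of this kind can be sketched in a few lines. This is a hypothetical in-memory stand-in, not Netflix's payment system: `FakeMOP.behaviour` lets a test choose approval, decline or a processor error, and the gateway assumes tester account IDs start with `test-`, so it refuses anything that looks like a real customer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeMOP:
    """A fake method of payment that exists only in the sandbox."""
    mop_id: str
    behaviour: str  # "approve", "decline" or "error" (hypothetical scenario knob)


class SandboxPaymentGateway:
    """In-memory stand-in for a payment processor; never moves real money."""

    def __init__(self) -> None:
        self.transactions = []

    def charge(self, account_id: str, mop: FakeMOP, amount_cents: int) -> str:
        if not account_id.startswith("test-"):
            raise PermissionError("sandbox gateway only accepts tester accounts")
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if mop.behaviour == "error":
            raise TimeoutError("simulated processor timeout")
        status = "approved" if mop.behaviour == "approve" else "declined"
        # Record every fake transaction so tests can assert on it afterwards.
        self.transactions.append((account_id, mop.mop_id, amount_cents, status))
        return status
```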


The approach

Of the thousands of possible approaches, Netflix chose production capture and replay to bring its tests as close as possible to prod.

A large number of requests from customer devices were captured and persisted, stripped of their personally identifiable information, and then duplex-replayed in test. This turned tests into real-world scenarios and helped identify numerous corner-case bugs that were previously unknown.
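The PII-stripping step can be sketched as a recursive mask applied to each captured request before replay. The field list, the `<redacted>` placeholder and the `send` callback are assumptions for illustration, not details from the source.

```python
import copy

# Hypothetical set of keys treated as personally identifiable information.
PII_FIELDS = {"email", "name", "ip_address", "payment_token", "phone"}


def scrub(request: dict) -> dict:
    """Return a deep copy of a captured request with PII fields masked."""
    clean = copy.deepcopy(request)  # never mutate the captured original

    def _mask(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in PII_FIELDS:
                    node[key] = "<redacted>"
                else:
                    _mask(value)
        elif isinstance(node, list):
            for item in node:
                _mask(item)

    _mask(clean)
    return clean


def replay(captured: list, send) -> list:
    """Scrub each captured request, then hand it to `send` (the test environment)."""
    return [send(scrub(req)) for req in captured]
```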

The bugs identified were routed back into functional and integration tests via a schema. This also built confidence in the quality of feature migrations and helped accelerate change velocity. It also led to an interesting learning:

All the basic duplex tests could be run in production through tester accounts. However, production capture-and-replay duplex tests were limited to the test environment, because replaying them in production would re-issue requests and could harm actual customer data.
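The rule above amounts to a small policy check. In this sketch the test kinds `"duplex"` and `"replay"` and the `may_run` function are hypothetical labels; the logic mirrors the paragraph: basic tests may run in production only against tester accounts, while replays of captured traffic are confined to the test environment.

```python
from enum import Enum


class Env(Enum):
    TEST = "test"
    PROD = "prod"


def may_run(test_kind: str, env: Env, account_is_tester: bool) -> bool:
    """Decide whether a test may run in `env` (hypothetical policy sketch)."""
    if test_kind == "replay":
        # Replaying real requests in prod could re-issue writes against
        # customer data, so replay is confined to the test environment.
        return env is Env.TEST
    if test_kind == "duplex":
        # Basic tests may run in prod, but only against tester accounts.
        return env is Env.TEST or account_is_tester
    raise ValueError(f"unknown test kind: {test_kind}")
```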

As Netflix co-founder Reed Hastings put it: "And instead tragically it is a biological one, so everybody is locked up and we had the greatest growth in the first half of this year that we ever had." With a market capitalization of around US$230 billion, Netflix has been vying with Walt Disney since March for the title of the world's most valuable entertainment group.

Masked and refreshed data could safely be used to replay requests in the test environment after a time delay. This shifted the focus to the data set rather than the production environment. Although this setup was not as stable as production, it gave a good idea of how production would behave.
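The time delay can be modelled as a simple hold-back filter. The sketch assumes each captured request is stored as a `(capture_timestamp, request)` pair and that the delay exists to let masked, refreshed data reach the test environment first; both the data shape and that rationale are assumptions, since the source gives no details.

```python
import time


def due_for_replay(captured, delay_seconds, now=None):
    """Return the captured requests that are old enough to replay.

    `captured` holds (captured_at_epoch_seconds, request) pairs. Requests
    younger than `delay_seconds` are held back for a later pass.
    """
    if now is None:
        now = time.time()
    ready = [(ts, req) for ts, req in captured if now - ts >= delay_seconds]
    ready.sort(key=lambda pair: pair[0])  # replay in original capture order
    return [req for _, req in ready]
```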

Failure is important in testing: failures help test teams identify real issues in downstream implementations. To limit the impact of those failures on customers, all functional validations ran as real canaries in production, exposing only a small percentage of actual customer traffic to both versions of the API under test.
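Canary routing is commonly done by hashing a stable key so that each user consistently lands in the same group. The sketch below follows a common canary pattern (assumed, not confirmed by the source): an equal small share of traffic goes to a canary running the new API version and to a baseline running the current one, so the two can be compared like for like. The salt and the 10,000-bucket granularity are arbitrary choices.

```python
import hashlib


def assign_cluster(user_id: str, percent: float, salt: str = "api-canary") -> str:
    """Route a small, stable share of users to a canary and an equal share to a baseline.

    `percent` (0 to 50) is the share of traffic sent to EACH of the canary
    and the baseline; everyone else stays on the untouched production cluster.
    """
    if not 0 <= percent <= 50:
        raise ValueError("percent must be between 0 and 50")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # stable bucket in [0, 10000)
    threshold = round(percent * 100)        # e.g. 5% -> 500 of 10,000 buckets
    if bucket < threshold:
        return "canary"       # new API version
    if bucket < 2 * threshold:
        return "baseline"     # current API version, same traffic share as canary
    return "production"       # rest of the fleet, unaffected by the test
```

With `percent=5`, roughly 5% of users hit the canary, 5% hit the baseline and about 90% stay on production. Changing the salt reshuffles which users land in each group.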

Canary analysis algorithms were run on the metrics gathered from these deployments, and a compare-and-verify regimen checked whether client and server metrics were equivalent. This also made it possible to capture failing request logs from the canaries, making issues easier to debug and triage.
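Production canary-analysis tools typically apply statistical tests across many metrics. Purely for illustration, the sketch below swaps in a much simpler compare-verify rule: for each metric where lower is better, the canary fails if its mean exceeds the baseline's mean by more than a tolerance. The metric names and the 10% tolerance are assumptions.

```python
from statistics import mean


def compare_metrics(baseline: dict, canary: dict, tolerance: float = 0.10) -> dict:
    """Compare per-metric samples from baseline and canary.

    Each argument maps a metric name (e.g. "error_rate", "latency_ms") to a
    list of samples where lower is better. A metric fails when the canary's
    mean exceeds the baseline's mean by more than `tolerance`.
    """
    results = {}
    for name, base_samples in baseline.items():
        limit = mean(base_samples) * (1 + tolerance)
        results[name] = "pass" if mean(canary[name]) <= limit else "fail"
    return results


def verdict(results: dict) -> str:
    """Promote only if every metric passed; otherwise roll back."""
    return "promote" if all(v == "pass" for v in results.values()) else "rollback"
```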

Learnings

Learnings from such an approach are manifold. 

  • The first lesson is that test and prod are different, and their differences must be embraced to use the capabilities of both.
  • Although testing in a sandboxed environment is good, testing in production is important for systems like these.
  • Solving the problems in either environment can go a long way toward ensuring test success.
  • Stay open to rethinking your testing strategy. Even if it comes at an extra cost, the end result is worth it.
  • Find a pragmatic testing shape that is right for your company; do not force-fit a textbook shape.
  • Start production simulations and chaos experiments; these will help validate your functional and resiliency testing capabilities for the future.
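Netflix's Chaos Monkey is known for randomly terminating production instances to prove that services survive the loss. The toy simulation below is not Chaos Monkey; it uses a hypothetical in-memory `Cluster` to illustrate the shape of a chaos experiment: inject failures at random, then check whether a steady-state hypothesis (every request is still served) holds.

```python
import random


class Cluster:
    """A toy service with several replicas (hypothetical model)."""

    def __init__(self, replicas: int) -> None:
        self.healthy = set(range(replicas))

    def terminate(self, replica: int) -> None:
        self.healthy.discard(replica)

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError("service unavailable")
        replica = min(self.healthy)  # route to any remaining healthy replica
        return f"{request} served by replica {replica}"


def chaos_experiment(cluster: Cluster, kills: int, requests: int, rng: random.Random) -> bool:
    """Randomly terminate `kills` replicas, then check the service still answers.

    Returns True if every request succeeded, i.e. the steady-state hypothesis held.
    """
    for _ in range(kills):
        if cluster.healthy:
            cluster.terminate(rng.choice(sorted(cluster.healthy)))
    try:
        for i in range(requests):
            cluster.handle(f"req-{i}")
    except RuntimeError:
        return False
    return True
```

Passing a seeded `random.Random` keeps each experiment reproducible, which helps when a failure needs to be triaged.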

At Netflix, chaos testing is done at scale in production. Testing everything from fire raining from the sky to aliens killing their servers, they leave nothing to chance. If Netflix leaves nothing to chance, why should you? The testing teams at Volumetree are experienced, reliable and know when to raise the red flags. Give your software the quality edge it needs. Schedule a consultation with our test consultants today!
