{"id":714,"date":"2017-04-27T00:00:00","date_gmt":"2017-04-27T00:00:00","guid":{"rendered":"https:\/\/nl1g1e2381-staging.onrocket.site\/seeking-truth-in-networking-from-testing-to-verification\/"},"modified":"2025-02-19T10:08:47","modified_gmt":"2025-02-19T18:08:47","slug":"seeking-truth-in-networking-from-testing-to-verification","status":"publish","type":"post","link":"https:\/\/www.forwardnetworks.com\/blog\/2017\/04\/27\/seeking-truth-in-networking-from-testing-to-verification\/","title":{"rendered":"Seeking Truth in Networking: From Testing To Verification"},"content":{"rendered":"\n<p>Sharp network admins already&nbsp;<em>verify<\/em>&nbsp;the network in a variety of ways, right? Pings, traceroutes, and custom scripts&nbsp;<em>verify&nbsp;<\/em>expected connectivity. Link and CPU utilization monitoring programs&nbsp;<em>verify<\/em>&nbsp;normal operation. Maybe pushed configs are read back in to&nbsp;<em>verify<\/em>&nbsp;that the device accepted them. And isn\u2019t verification just another term for testing, anyway?<\/p>\n\n\n\n<p><strong>No. Just like data ain\u2019t knowledge, testing ain\u2019t verification!<\/strong><\/p>\n\n\n\n<p>In this blog post, you\u2019ll see how stories from other technology domains (like semiconductors and software) point toward verification as a natural next step in any critical environment, especially one like networking.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Learning from history: verification in other domains<\/strong><\/h3>\n\n\n\n<p><strong>Computer Hardware<\/strong><\/p>\n\n\n\n<p>Manufacturing the next great computer chip begins with producing a mask set \u2013 a stack of extremely detailed circuit patterns \u2013 which is then used to etch circuits onto a silicon wafer. If a mistake slips into the circuit design and is discovered in the lab, a new mask set is required. This is a&nbsp;<strong>big<\/strong>&nbsp;deal! 
A single bad mask set can cost millions of dollars and delay a product\u2019s introduction.<\/p>\n\n\n\n<p>But even worse is when a bug escapes the lab and customers find it. Remember the&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Pentium_FDIV_bug\" target=\"_blank\" rel=\"noreferrer noopener\">Intel Pentium FDIV bug<\/a>? A rare but easy-to-replicate flaw in the chip\u2019s floating-point division logic returned wrong results. Recalling and replacing the affected processors cost Intel nearly $500M.<\/p>\n\n\n\n<p>It\u2019s not as if Intel didn\u2019t test its chip in advance. Skilled engineers wrote unit tests to confirm the expected behavior of their own circuits, created integration tests to confirm the behavior of entire modules, and ran extensive test patterns in the lab. But it\u2019s extremely hard to think of every test worth running.<\/p>\n\n\n\n<p>What if \u2013 instead of having more people write more tests and still having to cross their fingers \u2013 we could write a program that would&nbsp;<em>know<\/em>&nbsp;whether the chip logic was correct, and so rule out errors like the FDIV bug?<\/p>\n\n\n\n<p>Such programs fall under the category of&nbsp;<em>verification<\/em>. One particular verification technique, called model checking, could have helped&nbsp;<a href=\"http:\/\/ieeexplore.ieee.org\/document\/545654\/\" target=\"_blank\" rel=\"noreferrer noopener\">avoid the FDIV error<\/a>. 
It works by representing the circuit\u2019s possible states as a graph and exhaustively checking every transition against the desired property; the trick is to do this fast enough, and on large enough circuits.&nbsp;In fact, this development was so important that it led to a&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Turing_Award\" target=\"_blank\" rel=\"noreferrer noopener\">Turing Award<\/a>&nbsp;for its creators.<\/p>\n\n\n\n<p>Verification techniques can\u2019t prevent chip fabrication errors and won\u2019t design the chip for you, but they provide extremely valuable assurance that you can\u2019t get any other way.<\/p>\n\n\n\n<p><strong>Computer Software<\/strong><\/p>\n\n\n\n<p>In the software world, correctness is just as important, especially when a bug has real-world effects. Think of software that operates an&nbsp;<a href=\"https:\/\/raygun.com\/blog\/10-costly-software-errors-history\/\" target=\"_blank\" rel=\"noreferrer noopener\">airport baggage system<\/a>,&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Therac-25\" target=\"_blank\" rel=\"noreferrer noopener\">controls a radiation dosage<\/a>, or guides a rocket into space, like the Ariane 5 on its maiden flight on June 4, 1996.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-embed-handler wp-block-embed-embed-handler wp-embed-aspect-4-3 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Longer video of &#039;Ariane 5&#039; Rocket first launch failure\/explosion\" width=\"500\" height=\"375\" src=\"https:\/\/www.youtube.com\/embed\/gp_D8r-2hwk?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>Thirty-seven seconds into the flight, the rocket exploded. &nbsp;Why? 
&nbsp;To track its position, the rocket\u2019s inertial reference system integrates sensor readings over time. &nbsp;One routine in that software converted a 64-bit floating-point value, related to the rocket\u2019s horizontal velocity, into a 16-bit signed integer. When the value grew too large to fit, the conversion overflowed, nothing handled the error, the inertial reference system shut down, and&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cluster_(spacecraft)#Launch_failure\" target=\"_blank\" rel=\"noreferrer noopener\">the rocket lost control<\/a>.<\/p>\n\n\n\n<p>ESA extensively tested its software in advance and even reused known-good guidance code from the Ariane 4. &nbsp;But the Ariane 5 followed a different early trajectory, with much higher horizontal velocity, and the engineers had never tested the code with sensor inputs matching the rocket\u2019s actual flight profile.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cTesting shows the presence, not the absence of bugs\u201d<br><a href=\"https:\/\/en.wikiquote.org\/wiki\/Edsger_W._Dijkstra\" target=\"_blank\" rel=\"noreferrer noopener\">Dijkstra<\/a><\/p>\n<\/blockquote>\n\n\n\n<p>The standard approach to testing software is familiar \u2013 test code by itself, test it with other pieces, and then test with realistic inputs to all the pieces bundled together. &nbsp;Beyond this, many developers employ&nbsp;<em>coverage<\/em>&nbsp;tools to measure the completeness of their testing, confirming that every line of code is actually exercised. But coverage testing would likely not have revealed this fault either: a test suite can execute every line of code and still never feed it the input values that trigger an overflow.<\/p>\n\n\n\n<p>What if a tool could trace through every execution path in a program, for every possible input, to confirm the absence of this bug?<\/p>\n\n\n\n<p>Software verification tools exist, and&nbsp;<a href=\"http:\/\/www.cas.mcmaster.ca\/~baber\/TechnicalReports\/Ariane5\/Ariane5.htm\" target=\"_blank\" rel=\"noreferrer noopener\">Ariane 5 became a poster child for them<\/a>. 
&nbsp;Static analyzers, for example, can prove that entire classes of problems are absent from a piece of software, including data leaks, race conditions, infinite loops, and \u2013 relevant to more than just rocket builders \u2013 unhandled overflows.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Network Testing Today<\/strong><\/h3>\n\n\n\n<p>A single network error can have real-world consequences, too. &nbsp;Remember when a network configuration change&nbsp;<a href=\"https:\/\/aws.amazon.com\/message\/65648\/\">brought down Amazon EBS<\/a>&nbsp;and, with it, many popular websites? &nbsp;Just look through the news for examples of planes, trains, and stock-trading terminals brought to a standstill by network issues. &nbsp;Why are we not doing more to keep companies off the front page?<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Limitations of today\u2019s testing and how to move past them<\/strong><\/h4>\n\n\n\n<p><em>Partial Coverage vs Complete Coverage<\/em><\/p>\n\n\n\n<p>Network testing today is partial. &nbsp;We run pings, traceroutes, and maybe custom monitoring tools, but do these tests accurately reflect&nbsp;<em>all&nbsp;<\/em>the real paths used by all the applications? &nbsp;You don\u2019t know, and you can\u2019t even measure how much you\u2019re missing.<\/p>\n\n\n\n<p><em>Indirect Evidence vs Direct Proof<\/em><br>Network testing today is indirect. &nbsp;We check utilizations and counters against thresholds, but is a real problem present? &nbsp;Over-threshold links, ping failures, and non-zero drop counters can fire on normal transient behavior, while under-threshold links may actually be unavailable for application traffic. 
&nbsp;It\u2019s important not just to identify that the network may be broken, but to point out where and why, so you can actually fix it.<\/p>\n\n\n\n<p><em>Single-Element View vs Whole-Network View<\/em><br>Another common tactic is device-level conformance checking. &nbsp;An auditing tool gives a big thumbs-up to a configuration change that shifts traffic, because the change matches the master template. It\u2019s better than nothing, but it doesn\u2019t answer your real question: will the network behave, end-to-end, as it should?<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The Aha Moment: Verifying Networks<\/strong><\/h3>\n\n\n\n<p>At this point, your spider-sense should be tingling. When other industries experienced critical, front-page errors, those failures spurred the development of new ways to address the limitations of human-driven testing. We\u2019re now in a world where&nbsp;<a href=\"http:\/\/www.computerweekly.com\/news\/2240242478\/Satya-Nadella-Every-business-will-be-a-software-business\" target=\"_blank\" rel=\"noreferrer noopener\">every business is becoming a software business<\/a>&nbsp;and the network is a crucial enabler.<\/p>\n\n\n\n<p>Other industries acted.&nbsp;<a href=\"https:\/\/www.youtube.com\/watch?v=Ho239zpKMwQ\" target=\"_blank\" rel=\"noreferrer noopener\">We can act too<\/a>. We\u2019ll focus on one specific technology.<\/p>\n\n\n\n<p><strong>Network Data-plane Verification<\/strong>&nbsp;provides a way to verify that the intent of the network matches its implementation, as gleaned from forwarding state. For example, you can verify that all hosts in a subnet can reach each other on any port, even when they are spread across multiple data centers. &nbsp;You can confirm general properties, like freedom from loops. 
&nbsp;You can even use it to catch issues that would cause future outages, like mismatched VLANs defined on the two ends of a link. Unlike today\u2019s testing, it is comprehensive, it is direct, and it analyzes end-to-end behavior.<\/p>\n\n\n\n<p>Why did it take so long to get here, when other industries had verification tools 20 years ago?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>One possible answer: Systems that aren\u2019t clearly defined. RFCs and vendor manuals are not formal, unambiguous specifications; interactions&nbsp;<em>between&nbsp;<\/em>protocols are frequently undefined.<\/li>\n\n\n\n<li>A second possible answer: Overwhelming diversity. Who has the time to model the multi-dimensional matrix of protocols, OSes, devices, and versions in every modern network?<\/li>\n\n\n\n<li>A third possible answer: Modern networks are huge. Any analysis must scale to an enormous number of potential paths and packets.<\/li>\n<\/ul>\n\n\n\n<p>We know the answer \u2013 all of the above, and more! \u2013 because we\u2019ve lived with these challenges for the last few years while bringing the\u00a0<a href=\"\/forward-enterprise\/\" title=\"\">Forward Platform<\/a>\u00a0to market. \u00a0We address ambiguity and diversity through careful testing, and we now support a majority of the world\u2019s networking equipment. For scaling, as in other verification contexts, the key is to build a model that captures the important behavioral details but abstracts away the rest; we represent sets of packets as wildcard expressions, using the\u00a0<a href=\"https:\/\/www.usenix.org\/node\/162856\" target=\"_blank\" rel=\"noreferrer noopener\">Header Space Analysis framework<\/a>. 
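To make the wildcard idea concrete, here is a minimal Python sketch in the spirit of header space analysis. It is not Forward Networks' implementation. The 4-bit headers, the three-device network, the rule format, and the helper names (intersect, subtract, propagate) are all hypothetical. The sketch assumes that devices never rewrite headers and that each device matches its rules in priority order. Instead of probing one packet at a time, it pushes the set of all headers (the wildcard xxxx) through the forwarding rules, so a single run reports every header that gets delivered and every header that loops.

```python
from typing import Dict, List, Optional, Tuple

# A wildcard header is a fixed-length string over '0', '1', 'x' (don't care).
# A header space is a union (list) of wildcards.
Wildcard = str
HeaderSpace = List[Wildcard]
# Hypothetical rule format: (match wildcard, action), checked in priority order.
# The action is a neighbor device name, 'deliver' (local host), or 'drop'.
Rules = List[Tuple[Wildcard, str]]
Result = List[Tuple[Tuple[str, ...], Wildcard]]


def intersect(a: Wildcard, b: Wildcard) -> Optional[Wildcard]:
    # Bitwise intersection; None if some bit would have to be both 0 and 1.
    out = []
    for x, y in zip(a, b):
        if x == 'x':
            out.append(y)
        elif y == 'x' or x == y:
            out.append(x)
        else:
            return None
    return ''.join(out)


def subtract(a: Wildcard, b: Wildcard) -> HeaderSpace:
    # Headers in a but not in b: for each bit that b fixes, keep the headers
    # of a that disagree with b on that bit (the pieces may overlap).
    if intersect(a, b) is None:
        return [a]
    pieces = []
    for i, bit in enumerate(b):
        if bit != 'x':
            flipped = 'x' * i + ('1' if bit == '0' else '0') + 'x' * (len(b) - i - 1)
            piece = intersect(a, flipped)
            if piece is not None:
                pieces.append(piece)
    return pieces


def unique(space: HeaderSpace) -> HeaderSpace:
    return list(dict.fromkeys(space))  # drop exact duplicates, keep order


def propagate(net: Dict[str, Rules], device: str, space: HeaderSpace,
              path: Tuple[str, ...] = ()) -> Tuple[Result, Result]:
    # Push a whole header space through the network at once, returning
    # (delivered, loops) as lists of (path, wildcard) pairs.
    delivered: Result = []
    loops: Result = []
    path = path + (device,)
    remaining = unique(space)
    for match, action in net[device]:
        # A rule only sees what higher-priority rules have left over.
        hit = unique([w for w in (intersect(r, match) for r in remaining) if w is not None])
        remaining = unique([p for r in remaining for p in subtract(r, match)])
        if not hit or action == 'drop':
            continue
        if action == 'deliver':
            delivered += [(path, w) for w in hit]
        elif action in path:
            # No header rewriting here, so returning to a device means a loop.
            loops += [(path + (action,), w) for w in hit]
        else:
            more_delivered, more_loops = propagate(net, action, hit, path)
            delivered += more_delivered
            loops += more_loops
    return delivered, loops  # headers matching no rule are implicitly dropped


# Hypothetical 3-device network with 4-bit headers; C wrongly sends 11xx back to A.
net = {
    'A': [('00xx', 'deliver'), ('01xx', 'B'), ('1xxx', 'C')],
    'B': [('01xx', 'deliver')],
    'C': [('10xx', 'deliver'), ('11xx', 'A')],
}
delivered, loops = propagate(net, 'A', ['xxxx'])
print(delivered)  # [(('A',), '00xx'), (('A', 'B'), '01xx'), (('A', 'C'), '10xx')]
print(loops)      # [(('A', 'C', 'A'), '11xx')]
```

In this toy network, the check flags the 11xx traffic bouncing between A and C. A ping to a host in 10xx would never reveal that loop. Real header space analysis also models header rewrites and needs more careful set representations to scale, because this naive union of wildcards can grow quickly as rules accumulate.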
\u00a0A range of practical challenges pops up too; I highly recommend\u00a0<a href=\"http:\/\/cacm.acm.org\/magazines\/2010\/2\/69354-a-few-billion-lines-of-code-later\/fulltext\" target=\"_blank\" rel=\"noreferrer noopener\">A Few Billion Lines of Code Later<\/a>, which describes the equivalent challenges for static program analysis.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>At Forward Networks, verification is at the core of what we do and how we think about networking.<\/p>\n<\/blockquote>\n\n\n\n<p>Traditional testing will always have its value, and verification is not a panacea for all network issues. &nbsp;But it offers something uniquely valuable: confidence that your network actually works as intended.<\/p>\n\n\n\n<p>I hope this post gave you a glimpse of that, and introduced you to some fascinating case studies of engineering failures along the way.<\/p>\n\n\n\n<p>Many thanks to those who contributed to this post, including Andi Voellmy, Nikhil Handigol, and Siva Radhakrishnan.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Learn the modern way that companies&nbsp;stay off the front page<\/strong><\/h4>\n\n\n\n<p><a href=\"\/request-a-demo\/\" title=\"\">Experience a demo of the Forward Platform<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sharp network admins already&nbsp;verify&nbsp;the network in a variety of ways, right? Pings, traceroutes, and custom scripts&nbsp;verify&nbsp;expected connectivity. Link and CPU utilization monitoring programs&nbsp;verify&nbsp;normal operation. Maybe pushed configs are read back in to&nbsp;verify&nbsp;that the device accepted them. And isn\u2019t verification just another term for testing, anyway? No. Just like data ain\u2019t knowledge, testing ain\u2019t verification! 
In [&hellip;]<\/p>\n","protected":false},"author":9,"featured_media":715,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","inline_featured_image":false,"footnotes":""},"categories":[17],"tags":[],"ppma_author":[27],"class_list":["post-714","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":[],"aioseo_notices":[],"authors":[{"term_id":27,"user_id":9,"is_guest":0,"slug":"brandonheller","display_name":"Brandon Heller","avatar_url":{"url":"https:\/\/www.forwardnetworks.com\/wp-content\/uploads\/2023\/08\/brandon-heller.webp","url2x":"https:\/\/www.forwardnetworks.com\/wp-content\/uploads\/2023\/08\/brandon-heller.webp"},"author_category":"","user_url":"","last_name":"Heller","first_name":"Brandon","job_title":"","description":"Brandon Heller is a co-founder and the CTO of Forward Networks. Brandon holds a Ph.D. from Stanford University in Computer 
Science."}],"_links":{"self":[{"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/posts\/714","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/users\/9"}],"replies":[{"embeddable":true,"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/comments?post=714"}],"version-history":[{"count":3,"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/posts\/714\/revisions"}],"predecessor-version":[{"id":3653,"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/posts\/714\/revisions\/3653"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/media\/715"}],"wp:attachment":[{"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/media?parent=714"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/categories?post=714"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/tags?post=714"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.forwardnetworks.com\/wp-json\/wp\/v2\/ppma_author?post=714"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}