Why unemployment sites crash but Netflix doesn’t

Some of the government's most important websites are crashing when we need them the most. More than 22 million people have filed for unemployment in the last month, an unprecedented number driven by the global coronavirus outbreak. Now Congress has put aside an extra 250 billion dollars to handle the new applicants, but as people go to the state-level systems to file, a lot of those websites are just timing out.

- Some say that applying for unemployment benefits is nearly impossible.

- The state computer system is having some trouble.

- They need to fix the website.

- This isn't how the internet usually works. Services like Netflix and Zoom have seen a huge surge in traffic too, but aside from a few hiccups, you'd never know the difference. Most web engineers plan to be able to handle ten times the regular traffic without breaking a sweat. But government systems don't work that way. And it's surprisingly hard to shift them over. A lot of that is because of the backend programming, most of which is written in a coding language called COBOL that dates all the way back to the 50s. But to understand why they're still using COBOL and why it's such a problem, you have to see how these sites were originally built. And most importantly, you have to look at the big picture.

The story of COBOL starts in 1959, way before personal computers or the internet. A corporation or university might have a computer network, but you were really only going to run programs within your specific system.


So each network developed slightly different rules and it became really hard to transfer programs or data from one network to another. So a group of engineers, including legendary Navy programmer Grace Hopper, started working on a common programming language that could bridge those networks and be the main language for businesses going forward. They called it the Common Business Oriented Language, or COBOL. By the 70s, COBOL was the standard. If you were managing a huge database system, you wrote all your code in COBOL. And that dominance is a big part of why it's still in use today. This is by no means a dead language. It's something that certainly millions, possibly billions of financial transactions rely on COBOL on a daily basis.

- If you want to switch off COBOL, you basically have to start from scratch. So a lot of people just stuck with it. It also locks you into a particular kind of server architecture. Running COBOL code meant you were running everything off a handful of servers on your internal network. When it was developed, that was the only option. And even later there were real advantages to it. You could teach your server special tricks for handling your specific kind of data. And deploy programs to the whole network without having to install them on every specific machine. But it was also putting a lot of weight on that one server. If that server goes down, the whole network goes down. And if you try to bring in a replacement, you'll need to teach it all those special tricks.

But when the internet happened, you suddenly had to worry about keeping your service running in the face of huge shifts in usage and constant code updates. That meant treating your servers in a completely different way. As engineers started to put it, they're not pets anymore, now they're cattle. When you've got 50 servers running, it doesn't matter if one of them goes down.
You just bring in another one and you make sure they're all so dumb and interchangeable that you can cycle them in and out without anyone noticing. You don't train them, you just herd them. And because these are global web services, that also means you can distribute your herd all around the world, scaling up or down depending on how many people are visiting the site that morning.
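The cattle model described above can be sketched in a few lines of Python. This is only an illustration, not how any real cloud provider or unemployment system works: the `Herd` class, the `web-N` server names, and the figure of 100 requests per second per server are all hypothetical. The sketch shows the two ideas from the transcript: a failed server is swapped for an identical one, and the number of servers follows the traffic.

```python
import math
from itertools import count


class Herd:
    """A pool of identical, stateless servers (hypothetical 'cattle' model)."""

    def __init__(self, requests_per_server):
        # Assumed capacity of one server, in requests per second.
        self.requests_per_server = requests_per_server
        self._next_id = count(1)
        self.servers = set()

    def _launch(self):
        # Every server is the same; only its name differs.
        name = f"web-{next(self._next_id)}"
        self.servers.add(name)
        return name

    def replace(self, failed):
        """Drop a failed server and bring in a fresh, identical one."""
        self.servers.discard(failed)
        return self._launch()

    def scale_to(self, requests_per_second):
        """Grow or shrink the herd to match the current traffic."""
        needed = max(1, math.ceil(requests_per_second / self.requests_per_server))
        while len(self.servers) < needed:
            self._launch()
        while len(self.servers) > needed:
            self.servers.pop()  # any server will do; none is special
        return len(self.servers)


herd = Herd(requests_per_server=100)
herd.scale_to(5_000)    # a normal morning: 50 servers
herd.replace("web-7")   # one goes down; a new one takes its place
herd.scale_to(50_000)   # ten times the traffic: 500 servers
```

Nothing in the sketch needs to know which server is which. A "pet" server, by contrast, would carry its own special setup, so replacing it would mean rebuilding that setup by hand.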


With cloud hosts like Amazon Web Services or Microsoft Azure, you don't even need to buy a whole server. You can just rent one percent of a server for a few hours, just to make it through that morning's spike in demand. Name any online service that's launched in the last 20 years. They basically all work on the cattle model. That means lots of basically disposable servers cycling in and out. But a lot of these state unemployment systems have been running continuously for 40 years, processing thousands of applications every week, all on COBOL. They never switched over to disposable servers. Which makes it hard to process the kind of traffic surge that YouTube or Netflix would take in stride. It's not that COBOL is a bad programming language, but it locks you into a bad way of managing your network. It forces you to treat your servers like pets. And because switching off of COBOL is so much work, a lot of government systems have never been able to make the leap to the cattle model.

- It's incredibly difficult to even find workers who know COBOL. The language is old and some of the people still fluent in it are even older, with many approaching retirement age. This has become a recipe for disaster in states that still operate under COBOL. Governors like New Jersey's Phil Murphy have called for programmers to come out of retirement to help maintain their overwhelmed systems.


You can't really move a COBOL program to the AWS cloud. So it just sits there getting older and a little harder to maintain each year. Programmers call this technical debt. And if you aren't spending money on upgrades every year, it piles up fast.

- For more than 10 years, the federal government has been pressuring state Medicaid programs to update their aging systems. They've been handing them large sums of money to modernize, but it's still an enormous lift.

- Before these folks retired, many of them had been fired, they'd been laid off. And then they'd actually been brought back in in crisis moments to fix and upgrade the COBOL systems, which ideally they should have just been kept on to maintain the entire time.

- The real problem is, we just haven't been spending money maintaining these systems. We haven't wanted to or we thought we could skate by without it. And then when millions of people suddenly need unemployment checks, the entire system is buried in technical debt. It's a hard lesson, but if we want the reliability that we expect from web services, we're gonna have to pay for it.

Thanks for watching. If you want to know more about COBOL and this whole saga, my colleague Makena Kelly wrote a great article in the description. And let us know in the comments if there's anything else you think we should be covering.
