Was anyone else surprised by how little disruption they personally experienced? I had braced for impact that weekend, but all my flights were perfectly on time, my banking worked, my providers were up, and the sites and resources I needed were available.
I don’t know if I somehow just have little exposure to Windows in my life or if there’s an untold resiliency story for the global internet in the face of such a massive outage.
All I can say is THANK YOU to all the unsung heroes who answered the call and worked their butts off. Infrastructure doesn’t work without you. We see you & we thank you!
This is a good writeup, but to be fair it's not just a matter of banking regulations. Basically all big companies are under similar obligations regarding endpoint protection.
Any explanation that doesn't boil this down to "software required by a corporate policy checklist that wasn't written by the technical team" is almost certainly missing something. This is almost definitionally policy capture by a security team, with the all-too-common consequences that come with it.
The section that goes over why this wasn't federally pushed is largely accurate, mind. Not all capture happens at the federal level. It's why you can get frustrated with customer support asking you a checklist of questions unrelated to the problem you called in about.
And the super frustrating thing is that these checklists are often very effective at the thing they exist to do.
Just as a general comment on this whole affair:
This is the third incident I'm familiar with in which a file of entirely zeroes broke something big.
Folks, as much as we wish it weren't true, null comes up all the damn time, and if you don't have tests trying to force-feed null into your system in novel and exciting ways, production will demonstrate them for you.
Never assume 'zero' (for whatever form zero takes in context) can't be an input.
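To make that concrete, here's a minimal sketch of the kind of test meant above, in Python, with a hypothetical parse_channel_file standing in for whatever actually consumes the file. The point is to force-feed all-zero input and assert that the failure mode is a controlled error rather than a crash:

```python
import pytest

def parse_channel_file(data: bytes) -> dict:
    """Hypothetical parser standing in for whatever consumes the file."""
    if len(data) < 4 or data[:4] == b"\x00\x00\x00\x00":
        raise ValueError("invalid or empty header")
    return {"header": data[:4]}

@pytest.mark.parametrize("blob", [
    b"",                    # zero-length file
    b"\x00" * 16,           # all-zero, header-sized file
    b"\x00" * 1024 * 1024,  # a whole megabyte of nothing
])
def test_all_zero_input_is_rejected_cleanly(blob):
    # The goal isn't that parsing succeeds; it's that the failure mode
    # is a raised error the caller can handle, not a crash.
    with pytest.raises(ValueError):
        parse_channel_file(blob)
```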
> This created a minor emergency for me, because it was an other-than-minor emergency for some contractors I was working with.
> Many contractors are small businesses. Many small businesses are very thinly capitalized. Many employees of small businesses are extremely dependent on receiving compensation exactly on payday and not after it. And so, while many people in Chicago were basically unaffected on that Friday because their money kept working (on mobile apps, via Venmo/Cash App, via credit cards, etc), cash-dependent people got an enormous wrench thrown into their plans.
I never really thought about not having to worry about cashflow problems as a privilege before, but it makes sense, considering having access to the banking system to begin with is a privilege. I remember my bank's website and app were offline, but card processing was unaffected - you could still swipe your cards at retailers. For me, the disruption was a minor annoyance since I couldn't check my balance, but I imagine many people were probably panicking about making rent and buying groceries while everything was playing out.
While reading this I was struck with an interesting question: What risk does any particular software vendor pose to an industry at large?
For example (making up numbers here): if 75% of all airline computers have CrowdStrike Falcon installed, that seems like a very concentrated risk.
I actually wouldn't be surprised if, were someone to actually measure this, we'd see really high concentration among a small number of vendors in pretty much any industry.
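One crude way to put a number on that kind of concentration is a Herfindahl-Hirschman-style index over vendor install shares within an industry. A rough sketch in Python, with shares every bit as made up as the 75% above:

```python
# Made-up endpoint-protection install shares for one industry.
airline_endpoint_share = {
    "CrowdStrike Falcon": 0.75,
    "Vendor B": 0.15,
    "Vendor C": 0.10,
}

def concentration_index(shares: dict[str, float]) -> float:
    """Sum of squared shares: 1.0 means one vendor everywhere, near 0 means fragmented."""
    return sum(s ** 2 for s in shares.values())

print(f"HHI = {concentration_index(airline_endpoint_share):.3f}")
# Prints 0.595; US merger guidelines treat anything above 0.25 as highly concentrated.
```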
Did it really hit banks hard? Core banking systems don't run Windows; they typically run on mainframes under IBM z/OS. I know it hit financial firms hard and knocked out their trading systems, but I don't know of any major bank losing its core banking system to CrowdStrike.
Australia got hit hard because they modernized their banking systems and most are now cloud-based. Beyond that, I am not aware of any major bank running its core systems in the cloud or on Windows.
> Configuration bugs are a disturbingly large portion of engineering decisions which cause outages
I work in medical device software -- the stuff that runs on machines in hospital labs, ERs, or at the patient bedside.
The first "ohmigod do we need to recall this?" bug I remember was an innocuous piece of code that was inserted to debug a specific problem, but which was supposed to be disabled in the "non-debug" configuration.
Then somehow, the software update shipped with a change to the configuration file that enabled that code to run. Timing-critical debug code running on a real-time system with a hard deadline is a recipe for disaster.
Thankfully, we got out of that pretty easily before it affected more than a small handful of users, but things could have been a lot worse.
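One cheap guardrail against exactly this class of mistake is a release gate that refuses to ship if any debug-only switch is enabled in the configuration going out the door. A minimal sketch, assuming a JSON config and hypothetical key names:

```python
import json
import sys

# Hypothetical switches that must never be enabled in a shipping configuration.
FORBIDDEN_IN_RELEASE = ("debug_timing_trace", "verbose_logging", "test_hooks_enabled")

def debug_switches_enabled(config_path: str) -> list[str]:
    """Return the names of any debug-only switches that are turned on."""
    with open(config_path) as f:
        config = json.load(f)
    return [key for key in FORBIDDEN_IN_RELEASE if config.get(key)]

if __name__ == "__main__":
    violations = debug_switches_enabled(sys.argv[1])
    if violations:
        print(f"Refusing to ship: debug switches enabled: {violations}")
        sys.exit(1)
    print("Configuration is clean for release.")
```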
The article specifically mentions US banks, and I personally didn't see any disruption over here in the EU. Is there any (anec)data on how popular CrowdStrike is in the US vs the EU?
I feel like this only impacted the larger banks. I've heard absolutely no explosion noises coming from smaller institutions. The effect of regulations and their enforcement is felt differently across the spectrum.
There is something to be said for a diverse banking industry when it comes to this kind of problem. This event is also a powerful argument for keeping core systems on unusual mainframe architectures. I think building a bank core on Windows would be a really bad choice, but some vendors have already done it.
Regulations are a big reason why this happened, sure, but it also hit the companies with big security budgets harder.
Hospitals, for instance, weren't that widely affected because they barely have any money for security tooling.
Silver linings and all that, I guess.
We blame car manufacturers for defects from suppliers, but we don't blame platform manufacturers (Microsoft) for holes in their architecture?
> For historical reasons, that area where almost everything executes is called “userspace.”
It's an old term at this point, but I don't think the reasons for calling it "userspace" have changed or become outdated since then, so I wouldn't call them historical per se.
The takeaway from this article seems to be: buy CrowdStrike shares, because major corps are unable to make any changes and will continue to pay licensing fees for this "service" for the foreseeable future.
I'm still amazed at how the blame shifted from Microsoft to CrowdStrike. Yes, the CrowdStrike update caused it -- but applications fail all the time. It was Microsoft's oversight to put it on Windows' critical path.
And banks/airlines etc. were hit hard because their _Windows_ didn't boot, not because of an application crash on a perfectly working Windows.
> Another way is if it has recently joined a botnet orchestrated from a geopolitical adversary of the United States after one of your junior programmers decided to install warez because the six figure annual salary was too little to fund their video game habit.
Fictional statements like this make me reluctant to read further, and to ignore the source of such "news" in the future.
I like the technical stuff here.
I'm not so sure about this:
> money is core societal infrastructure, like the power grid and transportation systems are. It would be really bad if hackers working for a foreign government could just turn off money.
Sure, it would be inconvenient in the short term. But I think the current design is holding us back.
I suspect that most of us would have more to gain than to lose if we managed to shut off money-as-we-know-it and keep it off for long enough to iterate on alternatives. Any design that even tried to step beyond "well that's how we've always done it" would likely land somewhere better than what we're doing. Much has changed since Alexander Hamilton.
Maybe the IT departments at the affected orgs take solace in the fact that so many other orgs had issues that the heat is off, but in my opinion this was still a failure of IT itself. There's no reason that update should have been pushed automatically to the entire fleet. If CrowdStrike's software doesn't give you a way to roll out updates to a portion of your network before the entire fleet, it shouldn't be used.
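For what it's worth, staged rollout doesn't have to be sophisticated; even a simple ring policy that gates each wave on the health of the previous one covers "a portion of the network before the entire fleet." A rough sketch, where the host groups, the push, and the health check are all placeholders:

```python
import time

# Hypothetical deployment rings, smallest and least critical first.
RINGS = [
    ["canary-01", "canary-02"],        # a handful of sacrificial machines
    ["branch-office-fleet"],           # a small, low-impact slice
    ["hq-fleet", "datacenter-fleet"],  # the rest, only if earlier rings stay healthy
]

def push_update(host_group: str) -> None:
    print(f"pushing update to {host_group}")  # placeholder for the real deployment call

def ring_is_healthy(host_group: str) -> bool:
    return True  # placeholder: check crash telemetry, boot loops, agent check-ins

def staged_rollout(soak_minutes: int) -> None:
    for ring in RINGS:
        for group in ring:
            push_update(group)
        time.sleep(soak_minutes * 60)  # let the ring soak before widening the blast radius
        if not all(ring_is_healthy(group) for group in ring):
            raise RuntimeError(f"halting rollout: ring {ring} is unhealthy")

if __name__ == "__main__":
    staged_rollout(soak_minutes=0)  # zero soak just to demonstrate the control flow
```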