Fastly Outage

by pcr0on 6/8/2021, 9:57 AM with 694 comments

by lpmitchellon 6/8/2021, 10:00 AM

This seems to be impacting a number of huge sites, including the UK government website[0].

[0] https://www.gov.uk/

https://m.media-amazon.com/

https://pages.github.com/

https://www.paypal.com/

https://stackoverflow.com/

https://nytimes.com/

Edit:

Fastly's incident report status page: https://status.fastly.com/incidents/vpk0ssybt3bj

by austinjpon 6/8/2021, 10:18 AM

Yeah so it's been mentioned in the comments already, but to everyone in Fastly right now: I feel for you. Something like this must be insanely stressful, and not just during the outage. There will be (should be) a massive post-mortem. People will be losing sleep over this for days, weeks, months.

:(

Edit: There seems to be a major empathy outage in this thread. Disgusted but not surprised, unfortunately.

by iso1631on 6/8/2021, 10:05 AM

https://easydns.com/blog/2020/07/20/turns-out-half-the-inter...

The whole idea of the internet was a distributed network impervious to most attacks.

The reality is that a single failure can knock out 90% of the services people use.

by mrzoolon 6/8/2021, 10:04 AM

Why is this a link to the Fastly homepage, where absolutely no information is provided?

This is the page that should be linked:

https://status.fastly.com

by baroslon 6/8/2021, 10:04 AM

I didn't know so many sites were depending on Fastly. Stack Overflow, GitHub, reddit, .... Even pip is unavailable. My development workflow is completely janked up. It is a bit scary that we are putting too many eggs in one basket.

by csmattryderon 6/8/2021, 10:00 AM

Here's the status page incident for this.

https://status.fastly.com/incidents/vpk0ssybt3bj

by optiomal_isgoodon 6/8/2021, 10:23 AM

Amazon.com was completely broken here (Europe) and is now back. I was watching where the assets were loaded from, and they switched from EU to NA as a failover. Homework well done.

by creamyhorroron 6/8/2021, 10:00 AM

basically the internet is down

reddit, stackoverflow, github, paypal, pypi, twitter, twitch, NYT, CNN, BBC, the Guardian...

edit: wow, even Amazon.com relies on Fastly for some of its edge caches!

by atymicon 6/8/2021, 10:26 AM

This has got to be even bigger than when cloudflare went offline, in terms of big companies affected. Clearly they have way more F500 customers than CF.

Good luck to the on call engineers!

by omkon 6/8/2021, 1:02 PM

This outage made me realize that GitHub is served from a single IP address (one A record) for my point of origin (India). Stack Overflow has 4 A records listed, but all of them belong to Fastly.

The internet is designed for redundancy. I wonder why these companies don't have a failover network. Makes me wonder if cost is a factor, considering their already massive infra. But a single point of failure ... <confused>.
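
The check described above (do all of a site's A records sit behind one provider?) can be sketched roughly as follows. This is a minimal illustration, not a definitive tool: the prefix table is an assumption (151.101.0.0/16 is a range commonly attributed to Fastly), the function names are made up for this example, and the sample addresses are illustrative rather than a record of what any site resolved to that day.

```python
import ipaddress

# Assumed, illustrative prefix table. A real check would load the provider's
# published ranges instead of hard-coding one prefix.
PROVIDER_PREFIXES = {
    "fastly": [ipaddress.ip_network("151.101.0.0/16")],
}


def provider_for(ip):
    """Return the provider whose prefix contains `ip`, or "unknown"."""
    addr = ipaddress.ip_address(ip)
    for provider, networks in PROVIDER_PREFIXES.items():
        if any(addr in net for net in networks):
            return provider
    return "unknown"


def providers_behind(a_records):
    """Map a list of A-record addresses to the set of providers serving them."""
    return {provider_for(ip) for ip in a_records}


def depends_on_single_provider(a_records):
    """True only if every address maps to the same *known* provider."""
    providers = providers_behind(a_records)
    return len(providers) == 1 and "unknown" not in providers


# Illustrative addresses only (hypothetical lookup result).
sample = ["151.101.1.69", "151.101.65.69", "151.101.129.69", "151.101.193.69"]
print(providers_behind(sample), depends_on_single_provider(sample))
```

Four A records give redundancy at the address level, but if they all map to one provider, that provider is still the single point of failure.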

by k_on 6/8/2021, 10:50 AM

Update: The issue has been identified and a fix is being implemented. Posted Jun 08, 2021 - 10:44 UTC

Seems like this is being resolved; curious to see the details afterwards

(from https://status.fastly.com/incidents/vpk0ssybt3bj)

by permbon 6/8/2021, 10:27 AM

Made my alpine linux docker builds fail as well (varnish) - but shouldn’t it use a mirror when the primary download site is gone?

    fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKIN...
    fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/...
    ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.12/main: temporary error (try again later)
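
For what "use a mirror when the primary is gone" could look like in a build script, here is a rough sketch. It does not describe how apk itself behaves; `fetch_with_fallback`, `AllMirrorsFailed`, and the `mirror.example.org` host are hypothetical names for illustration, and the actual download function is passed in so the logic can be tried without a network.

```python
class AllMirrorsFailed(Exception):
    """Raised when every mirror in the list failed; carries (url, error) pairs."""


def fetch_with_fallback(path, mirrors, fetch):
    """Try each mirror base URL in order and return (url, payload) from the first that works.

    `fetch` is any callable taking a URL and returning bytes, raising OSError
    on failure. urllib.error.URLError and HTTPError are OSError subclasses,
    so a thin wrapper around urllib.request.urlopen fits this contract.
    """
    errors = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return url, fetch(url)
        except OSError as exc:
            errors.append((url, exc))  # remember why this mirror was skipped
    raise AllMirrorsFailed(errors)


# Hypothetical mirror list: the primary from the log above, then a made-up mirror.
MIRRORS = [
    "http://dl-cdn.alpinelinux.org/alpine",
    "http://mirror.example.org/alpine",
]
```

In real use `fetch` might be something like `lambda u: urllib.request.urlopen(u, timeout=10).read()`.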

by ClearAndPresenton 6/8/2021, 10:06 AM

What conclusions can we draw about concentrating web content in a few CDNs?

by oneeyedpigeonon 6/8/2021, 10:31 AM

Good marketing for Fastly! I had no idea so much of the internet relied on it...

by threeseedon 6/8/2021, 10:03 AM

Shopify's CDN is down.

Which is causing $15+ million in lost product sales for every hour of outage.

Not to mention the loss of any new customers.

by Haydos585x2on 6/8/2021, 10:05 AM

Such a huge number of sites. It seems like it's mostly US based sites and Australians are okay. Sending good vibes to whatever poor person is on support right now.

by jujodion 6/8/2021, 10:52 AM

Would be fascinating if Fastly is not able to use GitHub, Travis, Terraform, pip, etc. to deploy their fix

by csomaron 6/8/2021, 10:05 AM

So I'm wondering where in the "hundreds of servers around the world" did they exactly go wrong.

This happened with Cloudflare before too. I think we are a little too dependent on these services.

by alexchamberlainon 6/8/2021, 11:20 AM

Stupid question: why didn't sites "just" fail over to their actual servers to handle the traffic, albeit slowly? I guess they won't be sized to handle the load in a lot of cases, and Fastly was responding, so DNS fail over didn't work?
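
One way to read the "Fastly was responding" point: a health check that only asks "did something answer?" stays green while the edge returns 503s. A rough sketch of a check that treats 5xx as failure and flips to the origin after a few consecutive failures is below. `FailoverMonitor`, `tcp_only_healthy`, and the threshold are assumptions for illustration, and the sketch ignores whether the origin could carry the load.

```python
def tcp_only_healthy(status):
    """Naive check: anything that answered at all counts as healthy."""
    return status is not None


class FailoverMonitor:
    """Track probe results and decide where DNS should point."""

    def __init__(self, threshold=3):
        self.threshold = threshold          # consecutive failures before failing over
        self.consecutive_failures = 0

    def record(self, status):
        """Record one probe. `status` is an HTTP status code, or None if the
        connection itself failed. Returns the current target."""
        healthy = status is not None and status < 500
        self.consecutive_failures = 0 if healthy else self.consecutive_failures + 1
        return self.target()

    def target(self):
        return "origin" if self.consecutive_failures >= self.threshold else "cdn"


monitor = FailoverMonitor(threshold=3)
for status in (200, 503, 503, 502):
    print(status, tcp_only_healthy(status), monitor.record(status))
```

The threshold trades speed of failover against flapping on a single bad probe.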

by sjaakon 6/8/2021, 10:42 AM

Perhaps Fastly is simply taking their commitment to reducing CO2 seriously? Three hurrays for the climate!

by snookdebookon 6/8/2021, 10:43 AM

I gave it about 10 tries, and it seems a very small percentage of transactions do go through.

A decent number of tries are rejected right at the Varnish front door:

    < HTTP/2 503
    < server: Varnish
    < retry-after: 0
    < date: Tue, 08 Jun 2021 10:11:41 GMT
    < x-varnish: 271470009
    < via: 1.1 varnish
    < fastly-debug-path: (D cache-bma1666-BMA 1623147101)
    < fastly-debug-ttl: (M cache-bma1666-BMA - - -)
    < content-length: 450
    <
    Service Unavailable
    Guru Mediation:
    Details: cache-bma1666-BMA 1623147101 271470009

Many more reach some backend system that just dumps "connection failure":

    < HTTP/2 502
    < content-type: text/plain; charset=utf-8
    < content-length: 18
    <
    connection failure

And a tiny few do get through:

    < HTTP/2 200
    < content-type: text/html; charset=UTF-8
    < cache-control: max-age=0, must-revalidate
    < date: Tue, 08 Jun 2021 10:11:43 GMT
    < via: 1.1 varnish
    < vary: accept-encoding
    < set-cookie: ...snip...
    < server: snooserv
    < content-length: 275036
    <
    <!doctype html><html>...snip...
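
The three buckets above can be tallied over repeated tries with something like the sketch below. The bucket names and the `classify` and `tally` helpers are made up for this example, and the sample tuples only echo the status lines, `server` headers, and bodies quoted above.

```python
from collections import Counter


def classify(status, headers, body=""):
    """Bucket one response by where it appears to have failed.

    `headers` is a dict with lowercase names, as HTTP/2 delivers them.
    """
    server = headers.get("server", "").lower()
    if 200 <= status < 300:
        return "ok"
    if status == 503 and server == "varnish":
        return "rejected_at_edge"
    if status == 502 and "connection failure" in body:
        return "backend_connection_failure"
    return "other"


def tally(responses):
    """Count buckets over an iterable of (status, headers, body) tuples."""
    return Counter(classify(s, h, b) for s, h, b in responses)


samples = [
    (503, {"server": "Varnish", "retry-after": "0"}, "Service Unavailable"),
    (502, {"content-type": "text/plain; charset=utf-8"}, "connection failure"),
    (200, {"server": "snooserv"}, "<!doctype html>"),
]
print(tally(samples))
```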

by pimterryon 6/8/2021, 10:14 AM

This is one of the things that excites me about IPFS: in a world of decentralized data storage, yes self-hosting and control over your data is nice and all, but serious resilience to most random infrastructure outages is a much bigger deal.

It's still early days, but I'm hopeful that it can provide a real solution to today's CDN centralization.

by aero-glide2on 6/8/2021, 10:18 AM

isitdownrightnow.com is down

by DoreenMicheleon 6/8/2021, 10:27 AM

I'm having intermittent Reddit issues, as one more data point.

I'm grateful for HN. I rebooted my computer. I thought it was my device and then saw this on my phone while rebooting.

by monkeyduston 6/8/2021, 10:29 AM

Just occurring to me how CDNs are a major point of failure now for the internet

by unfuncoon 6/8/2021, 10:14 AM

Amazon being down surely points to something other than Fastly being the cause?

by Jamie9912on 6/8/2021, 9:59 AM

Yep, seems like:

Reddit, BBC News, Twitch.tv, Twitter emoji CDN?

are all down with a 503 service error

by kyproon 6/8/2021, 11:04 AM

Some people are claiming online that this is a cyber attack. I contract for the UK Gov and I'm hearing reports that traffic is going through the roof right now.

Anyone know if there is any legitimacy to this?

by cph-won 6/8/2021, 11:05 AM

I did not realise Fastly adoption was so widespread. Can anyone more enlightened tell me why, or point to some resource on the use cases where Fastly is superior to other CDNs such as Cloudflare?

by simonbarker87on 6/8/2021, 10:13 AM

how will their devs fix it if stackoverflow has gone down?!

by lyspon 6/8/2021, 10:14 AM

This incident affects: Europe (Amsterdam (AMS), Dublin (DUB), Frankfurt (FRA), Frankfurt (HHN), London (LCY)), North America (Ashburn (BWI), Ashburn (DCA), Ashburn (IAD), Ashburn (WDC), Atlanta (FTY), Atlanta (PDK), Boston (BOS), Chicago (ORD), Dallas (DAL), Los Angeles (LAX)), and Asia/Pacific (Hong Kong (HKG), Tokyo (HND), Tokyo (TYO), Singapore (QPG)).

by modshaterealityon 6/8/2021, 7:46 PM

This post is suspiciously ranked much lower than it should be (1216 points, 9 hours ago), lower than posts with < 100 points.

by sleepyshifton 6/8/2021, 9:59 AM

Looks like this has taken out Reddit at least.

by optiomal_isgoodon 6/8/2021, 10:42 AM

FWIW, Fastly ~8 hours ago (3am UTC) reported another incident: https://status.fastly.com/incidents/1glxxb8sf2zv and deployed a fix—either the fix made it worse or wasn't sufficient to mitigate the problem.

by marmot777on 6/9/2021, 12:55 AM

I think the honorable thing would be for them to have a statement easily findable.

So many companies sweep this sort of thing under the rug if it’s only customer data that’s been breached. If they can’t sweep, they have a high-priced PR agency do the communicating.

I do not trust companies who handle things this way.

by ZoomStopon 6/8/2021, 10:40 AM

The outage has already been added to the Fastly Wikipedia page

by choulton 6/8/2021, 10:31 AM

My money is on an expired internal certificate or CA.

by dkarpon 6/8/2021, 10:10 AM

Before the "Error 503 Service Unavailable" messages appeared, there were a few minutes where the error was a single line:

    connection failure

Not sure if that provides anyone here with more insight into what might have caused this!

by tommooron 6/8/2021, 10:26 AM

Hands up if you're also here after being woken up by downtime alerts on the west coast

by i386on 6/8/2021, 3:04 PM

Anyone want to talk about half the internet going out because one provider couldn’t keep their service up, instead of SO jokes and feels for the engineers? The entire internet is like a stack of cards, from the protocol to the economic model.

by gansaion 6/8/2021, 10:26 AM

Wouldn't websites have alternate CDNs managing their traffic? Why should they have a single point of failure?

I was assuming there are a couple of services like Fastly and that companies would have architected with the alternatives in mind too, I guess.

by fagnerbrackon 6/8/2021, 10:04 AM

https://dashboard.stripe.com/ is down

https://github.com/ is defaced

by fullstackwifeon 6/8/2021, 10:59 AM

No mention of outage on https://status.cloud.google.com/, and I wonder why, because apparently this is a GCP problem.

by mschuster91on 6/8/2021, 10:07 AM

Ah yes, the wonders of centralized internet infrastructure.

Let's use a handful of providers for everything, they said. It will be cheaper, they said. It will be easier to manage, they said.

And it was cheaper, until downtimes began to affect more and more sites when central SPOFs got hit.

And I wonder how much of that need for these centralized SPOFs actually comes from the sheer absurd amount of bloat, ads, code and assets that sites these days "have" to deliver to the customer. I 'member times when pages had 100kb total size, loaded in an instant and were perfectly usable.

by evougaon 6/8/2021, 10:34 AM

Since Fastly’s own website is currently down:

What is fastly? Why are a huge number of web sites dependent on them? They are some kind of web host for companies that don’t want to run their own servers/data centers?

by devops000on 6/8/2021, 10:28 AM

BTC/USD is down too.

by ysaviron 6/8/2021, 10:38 AM

Tangential question, but with services like these, is there a known way to handle failure gracefully? Some way to automatically bypass these services if they are known to be down?
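
One known pattern for this is a circuit breaker: after a few failures, stop calling the dependency for a cool-down period and go straight to a fallback, then let a single trial call through. The sketch below is a minimal, assumed implementation (the class, its parameters, and the defaults are illustrative, not any particular library's API). `primary` and `fallback` are plain callables, for example "fetch via CDN" and "fetch from origin".

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # failures before the circuit opens
        self.reset_after = reset_after      # cool-down in seconds while open
        self.clock = clock                  # injectable for testing
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, primary, fallback):
        """Run `primary`, or skip straight to `fallback` while the circuit is open."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = primary()
        except Exception:
            # Broad catch keeps the sketch short; narrow it in real code.
            self.failures += 1
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()   # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None
        return result
```

The fallback still has to exist and be sized for the traffic, which is the expensive part and the one no code pattern solves.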

by sergiomatteion 6/8/2021, 10:23 AM

Yikes, seems like a massive outage.

EDIT: Hexdocs is down, elixir-lang.org is down

by angledon 6/8/2021, 10:46 AM

None of the ES/NQ/RTY/YM futures contracts took kindly to the outage! This could have had a much wider financial impact. Most seem to have recovered now.

by asicspon 6/8/2021, 10:01 AM

Related thread: https://news.ycombinator.com/item?id=27432397

by hypnoscriptoon 6/8/2021, 10:06 AM

Looks like fastly.com uses fastly…

by mcintyre1994on 6/8/2021, 10:01 AM

Do they have an official status page? Googling gets https://docs.fastly.com/en/guides/fastlys-network-status which is 503

Edit: Elsewhere in the comments: https://status.fastly.com/incidents/vpk0ssybt3bj

by devops000on 6/8/2021, 10:19 AM

Hacker News is the only one UP!

by john37386on 6/8/2021, 10:46 AM

It should be resolved soon. From the Fastly status page:

The issue has been identified and a fix is being implemented. Posted 1 minute ago. Jun 08, 2021 - 10:44 UTC

by willvarfaron 6/8/2021, 10:41 AM

https://www.bbc.com/news/technology-57399628 is rendering and reporting on the story, but BBC itself was down at the start of the outage, with the same 503 varnish error message.

Presumably the BBC has some kind of fallback in place.

The journalists ought to interview their own techies :)

by jchandraon 6/8/2021, 10:35 AM

https://www.greenhouse.io/ down as well.

by hestefiskon 6/8/2021, 11:36 AM

The Guardian summarised this as well: https://www.theguardian.com/technology/2021/jun/08/massive-i...

by perinoon 6/8/2021, 10:07 AM

Anything hosted on Firebase seems to be down

by easytigeron 6/8/2021, 10:23 AM

I will NEVER understand why people put so much trust in single provider solutions for anything critical.

by vfclistson 6/8/2021, 10:07 AM

What happens when there is excessive centralization.

I thought that one of the principles behind the Internet is to be able to reroute around failures, but neither these service providers nor their clients ever seem to learn.

I guess in their mind that only applies to packet routing not services. SMH

by MrGilberton 6/8/2021, 10:28 AM

Interestingly, https://www.fastly.com/ works for me, whereas https://fastly.com/ doesn't.

by Omnious58on 6/8/2021, 10:32 AM

I was wondering why my Tidal app just stopped mid-song and won't connect. After much googling, and absolutely no help or even a notification from Tidal explaining there's an issue, it seems this outage is the culprit. Bugger.

by diveanonon 6/8/2021, 11:01 AM

Time to develop CDN for CDNs.

It seems like a pattern that CDNs have overly centralized the web and led to issues like this.

Maybe it's time to build a CDN that distributes your static assets to multiple CDNs and has a set of fallback states for service outages.
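
A very rough sketch of the selection half of that idea: rank the CDNs per asset with rendezvous (highest-random-weight) hashing so load spreads across providers, and skip any provider currently marked down. The hostnames and function names are hypothetical, and the sketch assumes the asset has already been uploaded to every CDN in the list.

```python
import hashlib


def cdn_order(asset_path, cdns):
    """Rank CDNs for one asset by rendezvous hashing (stable per asset)."""
    def score(cdn):
        digest = hashlib.sha256(f"{cdn}|{asset_path}".encode()).hexdigest()
        return int(digest, 16)
    return sorted(cdns, key=score, reverse=True)


def asset_url(asset_path, cdns, down=frozenset()):
    """Return a URL on the highest-ranked CDN that is not marked down."""
    for cdn in cdn_order(asset_path, cdns):
        if cdn not in down:
            return f"https://{cdn}/{asset_path.lstrip('/')}"
    raise RuntimeError("no CDN available")


# Hypothetical hostnames, for illustration only.
CDNS = ["cdn-a.example.com", "cdn-b.example.com", "cdn-c.example.com"]
print(asset_url("/static/app.js", CDNS))
print(asset_url("/static/app.js", CDNS, down={cdn_order("/static/app.js", CDNS)[0]}))
```

Because each asset's ranking is independent of which CDNs are up, marking one provider down only moves the assets that preferred it; everything else keeps its URL and its warm cache.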

by tfaron 6/8/2021, 10:03 AM

https://flutter.dev/ and https://fastlane.tools/ as well.

by Dobbson 6/8/2021, 11:03 AM

I got a push notification from the CNN app telling me a bunch of the internet was down due to a cloud provider. I clicked the link only for the app to open to a 503. In hindsight not surprising, but quite amusing.

by misnomeon 6/8/2021, 10:01 AM

pypi.org, but not https://status.python.org/ - I'm impressed that they actually hosted the status page differently!

by lopatinon 6/8/2021, 10:20 AM

Their status page keeps claiming that my region, Chicago (ORD), is either Degraded Performance, or Operational. But clearly it's down. Is fuzzing metrics like this how they hit their SLA targets?

by abhiminatoron 6/8/2021, 10:46 AM

Looks like they're currently applying a fix.

https://status.fastly.com/incidents/vpk0ssybt3bj

by montagon 6/8/2021, 10:20 AM

It's funny, I searched Twitter for "Ebay down" and the top result was an Ebay tweet with some not coincidentally broken Twitter emoji SVGs (as another person mentioned)...

by thegingeron 6/8/2021, 10:04 AM

GitHub? I had some issues and checked the service status page, which said no issues, but images were returning a 503. Maybe they host their service status page elsewhere including using fastly.

by monkeyduston 6/8/2021, 10:16 AM

Pretty bad www.gov.uk is down as more services move to digital.

by plasmaon 6/8/2021, 10:16 AM

I briefly saw an output error about "domain not found" when hitting fastly.com, wonder if some list of domains has hit a limit/flushed/etc.

by fareeshon 6/8/2021, 3:55 PM

How does one design a system that has a redundancy for when the CDN goes down? Paying for more than one CDN is probably too expensive isn't it?

by grumpleon 6/8/2021, 11:06 AM

Good job Fastly for getting the issue identified and resolved so quickly. < 1 hour to identify, <13 minutes to fix (assuming status is accurate).

by an0n4uon 6/8/2021, 10:03 AM

numpy docs, too. I think it's cloudflare related as well. At least, I keep seeing some cloudflare errors interspersed with the 503 varnish error.

by MyOnePieceon 6/8/2021, 10:43 AM

Quick question: if the CDNs are down, why can't traffic be routed to the central web servers the company owns?

I thought CDNs had fallback configured?

by _kyranon 6/8/2021, 10:47 AM

Those of you that work in DevOps, SRE or are CTOs.

What kind of things do you put in place to manage these kinds of centralised issues that are beyond your control?

by devops000on 6/8/2021, 10:15 AM

Heroku is down https://dashboard.heroku.com/

by JCWasmx86on 6/8/2021, 11:02 AM

>The issue has been identified and a fix has been applied. Customers may experience increased origin load as global services return.

Is fixed

by Nilefon 6/8/2021, 10:01 AM

Ironically, even this Outage page is out for me

by ur-whaleon 6/8/2021, 10:16 AM

Wow, talk about a brutal SPOF, most of the things I had planned to work with today are broken: reddit, github, stack overflow.

by taosxon 6/8/2021, 10:46 AM

I̶n̶ ̶r̶o̶m̶a̶n̶i̶a̶ ̶e̶v̶e̶r̶y̶t̶h̶i̶n̶g̶ ̶s̶e̶e̶m̶s̶ ̶b̶a̶c̶k̶ ̶t̶o̶ ̶n̶o̶r̶m̶a̶l̶.̶.̶.̶?̶

Edit: nope, just worked for 2-3 requests (10 secs)

by anotheryouon 6/8/2021, 10:53 AM

Looks fixed: https://downdetector.com/

by jl6on 6/8/2021, 10:43 AM

Worrying that this is impacting so many dev toolchains and services, which will hinder the ability to respond to the issue.

by timviseeon 6/8/2021, 10:19 AM

This seems to be a bigger issue. BGP failure?

by _kyranon 6/8/2021, 10:46 AM

Things seem to have come back online in Australia, although not sure if that's just sites switching over their DNS?

by LightGon 6/8/2021, 10:48 AM

"The internet will just route around a local / centralised problem ... like water around an object"

Obligatory LOL ...

by graphmanon 6/8/2021, 10:11 AM

Firebase Dynamic Links is affected too. Checking the IP looks like they are using Fastly which is quite surprising.

by taurathon 6/8/2021, 10:20 AM

I’ve noticed lots of social media content is tied to this - Reddit and Twitter images and some videos, for one.

by loriverkutyaon 6/8/2021, 10:48 AM

The issue has been identified and a fix is being implemented. Posted 3 minutes ago. Jun 08, 2021 - 10:44 UTC

by ilakshon 6/8/2021, 10:37 AM

Let's make all of the main internet sites dependent upon one central private service. Great idea guys.

by artembugaraon 6/8/2021, 11:01 AM

Seems like another single point of failure. What is a solution to not be affected by such an outage?

by toongon 6/8/2021, 10:57 AM

It is time to remove that "100% uptime guarantee" claim from the website :grimacing:

by classicflavouron 6/8/2021, 10:48 AM

My work's website is down too, and so are the regular sites I use to escape work boredom

by gansaion 6/8/2021, 10:49 AM

Fastly is back now. (The issue has been identified and a fix is being implemented.)

by pattyjon 6/8/2021, 10:53 AM

It would be interesting to see estimations on the man-hour cost of this outage.

by mothersheshaon 6/8/2021, 10:20 AM

Got the same here (Australia)

by johnstonnorthon 6/8/2021, 10:00 AM

rubygems.org affected too

by vincentmarleon 6/8/2021, 10:36 AM

Well I know where to go next time if I were to be a Russian hacker

by clawphantomon 6/8/2021, 10:27 AM

Twitch isn’t working and not responding and also the web dashboard

by luke2mon 6/8/2021, 10:49 AM

When this happens to cloudflare, it will be even more impactful.

by colesantiagoon 6/8/2021, 10:59 AM

Looks like Fastly did not work as advertised, very misleading.

by reuben_scrattonon 6/8/2021, 11:10 AM

I'm sure it's just a coincidence that today is Patch Tuesday.

:-|

by zwirblon 6/8/2021, 10:49 AM

Spotify is also hit, though it still works without images

by ddtayloron 6/8/2021, 10:40 AM

Someone must have 51% attacked the Pied Piper blockchain!

by vlan121on 6/8/2021, 12:04 PM

Damn, I thought I could blame myself or the provider..

by ronyfadelon 6/8/2021, 10:01 AM

Ten Percent Happier is down, and now my day is ruined.

by fsnowdinon 6/8/2021, 10:23 AM

just had my own site down because of this. glad to see it wasn't my fault lol but good luck to the Fastly people on fixing the issue.

by clawphantomon 6/8/2021, 10:28 AM

Twitch isn’t responding and also the web dashboard

by 8K832d7tNmiQon 6/8/2021, 10:09 AM

That explains why I couldn't access reddit

by navanchauhanon 6/8/2021, 10:02 AM

No wonder, The Verge and NYT are down too.

by rich_sashaon 6/8/2021, 10:20 AM

www.python.org down as well, with the shortest of messages: 'connection failure'. Probably related?

by NewLogicon 6/8/2021, 10:06 AM

Even amazon.com styling is borked for me

by dilawaron 6/8/2021, 10:03 AM

I think reddit in India is down as well.

by JosephKon 6/8/2021, 10:56 AM

Extremely long call, but what are the chances this turns out connected to the raids on organised crime using the An0m app that started today?

by john37386on 6/8/2021, 10:52 AM

It's probably a DDoS attack.

by dragosbulugeanon 6/8/2021, 10:22 AM

And all Webflow sites it seems...

by alixaxelon 6/8/2021, 10:02 AM

Indeed, part of GitHub (.io) too.

by ur-whaleon 6/8/2021, 10:19 AM

Looks like HN is working ;-)

by jfnyon 6/8/2021, 10:48 AM

Do companies really not run test suites / do manual testing before deploying to production?

by timetosleepon 6/8/2021, 10:55 AM

Seems to be back online

by rvzon 6/8/2021, 10:04 AM

Basically everything is broken. "Centralising Everything" huh

by dragosbulugeanon 6/8/2021, 10:22 AM

All Webflow sites?

by mlnjon 6/8/2021, 10:00 AM

StackOverflow too.

by schappimon 6/8/2021, 10:02 AM

Parts of Shopify

by ur-whaleon 6/8/2021, 10:18 AM

Looks like an SRE team rolled out buggy software.

by rottc0ddon 6/8/2021, 11:11 AM

github is back online. SSO too.

by rayluson 6/8/2021, 10:14 AM

Whew, DevOps fire alarms are going off!

by rayluson 6/8/2021, 10:05 AM

github.com is pretty broken

by schappimon 6/8/2021, 10:00 AM

SMH.com.au

by heavyduston 6/8/2021, 10:48 AM

the problem has been fixed

by heavyduston 6/8/2021, 10:06 AM

reddit.com is affected too

by alexannicon 6/8/2021, 10:01 AM

cnn.com is down as well.

by cwenon 6/8/2021, 10:56 AM

A real-world Chaos experiment!

by cdevon 6/8/2021, 11:08 AM

it seems to be up now

by magicturtleon 6/8/2021, 10:03 AM

reddit down as well

by Metacelsuson 6/8/2021, 10:18 AM

I first noticed that xkcd was down. Then I went to post about it on reddit . . . also down! Good thing HN is up.

by nindalfon 6/8/2021, 10:00 AM

Taken out xkcd as well.

by pts_on 6/8/2021, 10:01 AM

Are these sites on the same cloud or CDN?

by colesantiagoon 6/8/2021, 11:09 AM

Also, why has this been allowed to happen? Billions of dollars lost because of this one company?

I don't understand this.

by ramraj07on 6/8/2021, 10:10 AM

For a moment I thought all of Western internet was cut off from India. Says how siloed my browsing habits are!

by raphaeljon 6/8/2021, 10:17 AM

Couldn't be happier I moved https://noisycamp.com to BunnyCDN.com.

by TheRealDunkirkon 6/8/2021, 10:47 AM

Every other comment about what's down in this thread -- as if we needed dozens of site-by-site accountings of this outage in the first place -- is a bitch about reddit. Why is reddit so important to this crowd? The specific topics I used to read the site for (half a dozen years ago) have all been overrun by "bucket people," there is literally never an answer to any question I find a google link to there, and the site's design is actively user-hostile. Seriously: what's keeping that place afloat? Porn, I suppose.