Oh, I know that problem, we changed our Puppet root CA due to a mishap by one of the admins while updating to sha256 certs. But IIRC (it was a long time ago) Puppet CA certs are issued for something like 10-20 years by default, so it would be a bit weird if true. Also, old versions didn't have a trust chain, "just" a root CA, so the puppet master would have to have the key for it on disk anyway; a proper "root CA + leaf CA for puppetmasters" setup has only been a thing in Puppet for a few years.
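If someone still has a shell on the CA box, checking the actual lifetime takes a few lines. A minimal sketch, assuming the newer Puppet CA cert path (it moves around between versions, so treat the path as a placeholder):

```python
# Sketch: print the validity window of a Puppet CA certificate.
# The path is an assumption; older setups keep it under
# /etc/puppetlabs/puppet/ssl/ca/ca_crt.pem instead.
from cryptography import x509

CA_CERT_PATH = "/etc/puppetlabs/puppetserver/ca/ca_crt.pem"

with open(CA_CERT_PATH, "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

print("Subject:   ", cert.subject.rfc4514_string())
print("Not before:", cert.not_valid_before)
print("Not after: ", cert.not_valid_after)
print("Lifetime:  ", cert.not_valid_after - cert.not_valid_before)
```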
It would only be really problematic if they also lost SSH access to those machines using Puppet. If you have root access the fix is not exactly hard.
But then they fired the people that did have access, so that might also be a problem
We made sure all of our machines can be accessed both by Puppet and by SSH, kinda for that reason; we had both an accident of someone fucking up Puppet and one of someone fucking up the SSH config, rendering machines un-loggable (the lessons were learned and etched in stone).
So really, depending on who has access to what, it can be anything from "just pipe a list of hosts to a few ssh commands fixing it" to "get access to the server manually and change stuff, or redeploy the machine from scratch". Again, assuming muski boy didn't fire the wrong people.
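For the "pipe a list of hosts to a few ssh commands" end of that spectrum, a rough sketch of the shape of it, assuming working root SSH, the default Puppet 4+ SSL directory, and a rebuilt CA already in place on the master (all of those are assumptions, not facts from the thread):

```python
# Sketch: wipe each agent's old SSL state over SSH and re-run the agent so it
# requests a fresh cert from the rebuilt CA. Assumes root SSH works and the
# default Puppet 4+ ssl dir; adjust for your layout. Hosts come in on stdin.
# The new CSRs still need signing on the CA side unless autosign is enabled.
import subprocess
import sys

FIX_CMD = "rm -rf /etc/puppetlabs/puppet/ssl && puppet agent -t"

def fix_host(host: str) -> bool:
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", f"root@{host}", FIX_CMD],
            capture_output=True, text=True, timeout=600,
        )
    except subprocess.TimeoutExpired:
        print(f"{host}: timed out", file=sys.stderr)
        return False
    # `puppet agent -t` uses detailed exit codes: 0 = no changes, 2 = changes applied.
    if result.returncode not in (0, 2):
        print(f"{host}: FAILED\n{result.stderr}", file=sys.stderr)
        return False
    print(f"{host}: ok")
    return True

if __name__ == "__main__":
    hosts = [line.strip() for line in sys.stdin if line.strip()]
    failed = [h for h in hosts if not fix_host(h)]
    sys.exit(1 if failed else 0)
```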
> Musk fired everyone with access to the private key to their internal root CA,
The way forward is to generate a new CA root certificate.
> and they can no longer run puppet because the puppet master's CA cert expired
They can reconfigure internal tools to use the new CA root certificate, or rather one of the signed intermediate certificates.
> and they can't get a new one because no one has access.
They can simply generate new CA root certificates, and sign or create new intermediate certificates (roughly along the lines of the sketch at the end of this comment).
> They no longer can mint certs.
Yes, they, can...
> My limited understanding in this area is that this is...very bad
No, it, is, not...
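On the "generate a new root and new intermediates" point: newer Puppet versions ship tooling for this (`puppetserver ca setup` builds the root plus intermediate for you), but the idea itself is nothing exotic. A minimal sketch of "new root + one signed intermediate" using Python's cryptography library; the names, key sizes and lifetimes below are placeholders, not anything from Twitter:

```python
# Sketch: regenerate an internal root CA and sign one intermediate with it.
# All names, lifetimes and key sizes here are assumptions for illustration.
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa

def make_name(cn: str) -> x509.Name:
    return x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, cn)])

def make_ca_cert(subject, issuer, subject_key, signing_key, days, path_len):
    now = datetime.datetime.now(datetime.timezone.utc)
    return (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(issuer)
        .public_key(subject_key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=days))
        .add_extension(x509.BasicConstraints(ca=True, path_length=path_len), critical=True)
        .sign(signing_key, hashes.SHA256())
    )

# New offline root, then an intermediate that actually issues leaf certs.
root_key = rsa.generate_private_key(public_exponent=65537, key_size=4096)
root_cert = make_ca_cert(make_name("Example Internal Root CA"),
                         make_name("Example Internal Root CA"),
                         root_key, root_key, days=3650, path_len=1)

inter_key = rsa.generate_private_key(public_exponent=65537, key_size=4096)
inter_cert = make_ca_cert(make_name("Example Internal Intermediate CA"),
                          root_cert.subject,
                          inter_key, root_key, days=1825, path_len=0)

for fname, cert in [("root_ca.pem", root_cert), ("intermediate_ca.pem", inter_cert)]:
    with open(fname, "wb") as f:
        f.write(cert.public_bytes(serialization.Encoding.PEM))
```

The hard part is not minting the new chain, it's rolling the new root out to everything that trusted the old one.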
There are two immediate issues that come to mind.
* Twitter was so awful before that it relied on people to safeguard the keys to the kingdom. This is very bad practice, and one of the many things Musk will no doubt be fixing. For any mission-critical assets, especially certificates but also passwords, current corporate practice is to have a secure ledger of these that can be accessed by the board of directors, the executive managers, and designated maintainers. At no point should a password ever be entrusted to a particular person, but rather to a "role" that functions as the one who has access; say, for example, the CIO/CTO and their subordinates.
* The second issue is the one everyone is fixating upon, and that's firing important people who put the company at risk. This is a big issue, and certainly Musk could have done a better job of scoping out who represents a single point of failure at Twitter, eliminating that risk, and then proceeding with the culling. In a modern enterprise no single person should be capable of putting the entire operation at risk. It's just that simple. So in a way, Musk accelerated what was probably inevitable at Twitter already. They were probably precariously close to destruction already, and now they can learn the hard way not to repeat these mistakes.
I'll take the rumor with a grain of salt, but can anyone unpack what the recovery plan would be for something like this? It would obviously be a big problem, but where would you even start?
The certificate for Twitter's hidden service expired a full week ago and they still haven't replaced it.
Taking it with a pinch of salt, but this stuff does happen.
I've received calls from past employers, usually when they migrate a site I worked on to a new CMS or platform. There's some critical service (AWS, CDN credentials, domain-related, etc.) that no one knows who has access to... Happily those appear to get resolved... but this... yikes (if true)
The really interesting part of this is what else is tied to that CA. If it's just Puppet, it's bad enough; internal PKIs have a habit of metastasizing into lots of other places, though, precisely because everything internal trusts them. Worst case here is that some piece of the internals of the Twitter app relies on things from that CA; for instance, it relies on packages to do app config changes or updates, and the packages have to be signed from that chain or served from something with a cert from it. In that case they'd be hosed: you'd have to replace every copy of the Twitter app. Fairly unlikely, but wouldn't be the first time I've seen it happen.
Beyond that, though: Internal build systems? Data encryption? User client auth to critical services? Internal app mTLS for data exchanges? The list of possibilities goes on and on…
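One way to start answering "what else is tied to that CA" is simply walking the internal endpoints and looking at who issued their certs. A minimal sketch, covering TLS listeners only; the host list and CA name are made-up placeholders:

```python
# Sketch: rough inventory of which internal endpoints still present certs
# issued under a given CA. Hosts and the CA subject string are assumptions.
import socket
import ssl
from cryptography import x509

OLD_CA_SUBJECT_CN = "Puppet CA: puppetmaster.internal.example"  # placeholder
ENDPOINTS = [("puppetdb.internal.example", 8081), ("build.internal.example", 443)]

def issuer_of(host: str, port: int) -> str:
    # We only want the presented cert's issuer, not a validated chain,
    # so skip verification entirely.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return x509.load_der_x509_certificate(der).issuer.rfc4514_string()

for host, port in ENDPOINTS:
    try:
        issuer = issuer_of(host, port)
    except OSError as exc:
        print(f"{host}:{port} unreachable ({exc})")
        continue
    flag = "  <-- tied to old CA" if OLD_CA_SUBJECT_CN in issuer else ""
    print(f"{host}:{port} issuer={issuer}{flag}")
```

That still misses the nastier cases (package signing, data at rest, client certs baked into apps), which is exactly why the list of possibilities is so worrying.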
It's pretty amazing that a formerly public company like Twitter had such shitty documentation/processes/infrastructure.
I thought SOX mandated this sort of internal control - after all, Twitter basically seems to be full of infrastructure risks that would (and have) negatively impacted them financially in a material way.
No key access? Why didn't they print it out and stick it in a safe deposit box, which is what a couple of startups I've been with have done...along with a couple of other key pieces of paper. Physical backup.
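The mechanics of that are trivial, too. A minimal sketch that re-encrypts a CA key under a passphrase before it goes anywhere near a printer; the path and passphrase handling are assumptions, and in practice you'd want the backup split or escrowed rather than sitting as a single printout:

```python
# Sketch: dump a CA private key as passphrase-protected PEM so it can be
# printed and stored offline. Path is an assumption (unencrypted PEM on disk).
import getpass
from cryptography.hazmat.primitives import serialization

KEY_PATH = "root_ca_key.pem"

with open(KEY_PATH, "rb") as f:
    key = serialization.load_pem_private_key(f.read(), password=None)

passphrase = getpass.getpass("Passphrase for the printed backup: ").encode()
pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.BestAvailableEncryption(passphrase),
)
print(pem.decode())  # pipe to a printer or paper-backup tool of your choice
```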
very possibly bullshit but huge if true: https://twitter.com/davidgerard/status/1634633886712954881
Wonder if their servers all share a common NTP server/pool (that they control).
For about 3-4 days now I've had issues using TweetDeck, which doesn't load in Firefox due to a key pinning failure. I'm not sure it's related, but it seems like too large of a coincidence not to be.
That Mastodon server has a load time problem... it took a solid 30 seconds to load for me.
If this is true - who knows - then it reflects rather badly on the people who were fired, as they didn't implement safeguards for a 'run over by a bus' scenario when they were in charge.
I think people who work in reliability see this type of thing as the real existential threat to twitter. It's unrealistic that a large infrastructure would fall over overnight, but what is very realistic is small problems being neglected until they become big problems, or multiple problems happening at the same time.
This alone is probably manageable, it might even be simple but painful to handle for 2-15 of twitters employees (pre-firing) with specialized knowledge. If 3 people knew the disaster recovery plan and they all got fired because they were so busy maintaining things and fighting fires that they failed to get good reviews by building things, well I wouldn't be surprised. Likewise the employees trusted with extreme disaster recovery mechanisms are not the poor souls on H1Bs who don't have the option of leaving easily, so the people trusted with access might have already jumped ship since they aren't being coerced into staying on board with a mad man.
The real existential threat is another problem compounding on top of this or a disastrous recovery effort. Auto-remediation systems could do something awful. A master database could fall over and a replica be promoted, but if that happens twice, 4 times? Without puppet to configure replacement machines appropriately, there could be a very real problem very quickly. Similarly, extremely powerful tools, like a root ssh key, might be taken out, but those keys do not have seat-belts and one command typed wrong could be catastrophic. Sometimes bigger disasters are made trying to fix smaller ones.
Puppet can be in the critical path of both recovery (via config change) and capacity.