It's worth remembering that the main reason this kind of data breach is a real problem is mostly due to the incompetence of the IRS. For any serious financial organization, knowing a person's SSN, name, address, etc doesn't allow you to access or withdraw that person's finances.
But the stupidity of the IRS means that people are easily targeted by false tax return attacks. File a fake tax return for someone, using their SSN/name/address, but tell the IRS you changed address. Then the IRS sends your tax refund to the new address, and boom, you just collected some poor sod's refund. To add insult to injury, the IRS is probably going to audit the person whose refund you stole.
Troy mentions "data opt-out services. Every person who used some sort of data opt-out service was not present."
Anyone have experience with these sort of services? A search brings up a lot of scammy looking results. But if services exist to reduce my profile id be interested.
Extreme Privacy by Michael Bazzell is a great resource to learn how to limit exposure to these aggregator services.
It is crazy to me that data brokers are even a legal form of business. All of these services should be opt in at minimum. If they are obtaining publicly available information and making it easier to access, they should have to maintain insurance or a deposit with the government to compensate victims of cybersecurity incidents. Telling people to get credit monitoring is in NO WAY an acceptable way to make us whole. They need to pay for a lifetime of monitoring and INSURANCE up to the net worth of affected individuals. This needs to become law ASAP.
"there were no email addresses in the social security number files. If you find yourself in this data breach via HIBP, there's no evidence your SSN was leaked, and if you're in the same boat as me, the data next to your record may not even be correct. "
Seems like Troy is skeptical about this being a real full breach?
For years I've said the entire SSN database just needs to be published alongside legislation strictly assigning liability to any company who defrauded as a result of using the SSN as a "secret". That would fix the problem with SSN's and "identity theft" quickly.
Part 1 has been accomplished. Let's get part 2 going!
Aside: It amazes me how the American public has allowed defrauded companies to assign the company's loss as a liability to innocent individuals (in the form of "identity theft"). It would be great if we could get that changed in the minds of the public. A well-informed public could collectively turn "identity theft" into the "bank's problem" (from the old adage "If you owe the bank a billion dollars they have a problem..."). The insurance industry would swoop in as the defrauded parties start making claims and shoddy security practices would get tightened-up.
(Edit: I fear insurance companies coming in to "fix this" to some extent-- citing my experiences with PCI DSS compliance auditing and Customers who have had 'cyber insurance' policies coming with ridiculous security theatre requirements. Maybe we can end up with something like a 'cyber' Underwriters Labs in the end.)
(Also: Yikes! I hate that I just typed 'cyber' un-ironically.)
For non-Americans (and Americans) that don't quite understand what SSN is and why it's a problem, CGP Grey [1] has a great (and short) video about the history and why it's not technically an identifier, but has become one.
> The problem with verifying breaches sourced from data aggregators is that nobody willingly - knowingly - provides their data to them
This is a bit of a tangent but I feel like if we can prove this statement then these data aggregators should be made illegal. How can you consent to something that you don’t know you’re consenting to? Likewise why do these entities have the right to collect detailed personal information like SSN without your explicit, beyond reasonable doubt, consent? To me this is the most obvious failure of the legal system, it clearly goes against well established legal principles that a basic requirement of an agreement is that all parties know what they are agreeing to.
Obviously there is some leeway with agreements where it’s not possible to clarify every eventuality but lets say if you’re applying to rent a place through an online form and that form shares your SSN to a data aggregator, it should be extremely clear about that, and possible to out out while still allowing you to complete the rental application without discrimination.
It’s like, it should be possible to show that no one, with in reason, consented to sharing their data with this aggregator because no one is able to confirm that they did. Sure one person could forget, or lie, but 100s of millions of people? No. Clearly almost zero people knowingly consents.
I was wondering why Google suddenly turned on "prompt authentication" on zero-security feature accounts yesterday. Now I "must" have a phone nearby to use Gmail... Tap to authenticate every time you want to look at ... ad spam.
With this, Ticketmaster, and the CDK Global car theft, is there anybody on Earth who doesn't need data protection? Poor people in Somalia need data breach notices. People who are not even on the WWW need data breach notices...
I recently hired the experts of {hacker11tech (@) gmail com} to help me track my spouse's GPS location, as I suspected infidelity. They provided me with accurate and timely information, revealing that my spouse was frequently visiting another person's location instead of going to work as claimed. Their expertise and professionalism were very impressive, and their ethical approach ensured a discreet and confidential process. The evidence gathered was comprehensible and reliable, giving me clarity that I needed to address the situation. I highly appreciate the {hacker11tech (@) gmail com} dedication helping to uncover the truth while maintaining ethical standards, their services was valuable in helping me make decisions about my relationship. I highly recommend this team {hacker11tech (@) gmail com} for anyone seeking reliable ethical practices and their commitment is reassuring.
Anything the average SSN holder should be doing proactively?
Why are data aggregators legal? In California can we create a proposition to shut them down in the state?
This sort of stuff will continue happening until the regulatory framework acknowledges a fundamental consumer right to privacy.
If a data broker collects data without the consent of the consumer, then their only real risk is a class action lawsuit which drags on for six years, gets settled for a few days profit, and the consumer gets $13.50 after the legal fees. This massive skew in the risk reward calculus of data brokers is why we have the problem. Because there's little to no real downside, the trend is automatically collect as much data on as many people as possible.
Fixing this means big, mandatory, cash penalties in the law code - say $5k per consumer data leak, directly to the affected consumer, with added penalties if the company lies about the leak or delays payment. The fine must be big, mandatory, and paid directly to the consumer. Only that changes the risk reward ratio.
In that new world, companies would have to re assess their risks. They'd either build invulnerable systems and hire a lot more people reading HN to protect their golden goose, or better still they'd decide to exit the business entirely. That sounds bad, but the only reason the industry exists is because regulators failed to foresee massive leaks like this happening every three months.
We need a consumer data privacy law, with massive fines, to force companies to change their behavior. What we're doing now clearly does not work.
I used Robokiller to remove myself from data broker lists. I'm extremely impressed with it. I pay yearly. My only annoyance with Robokiller is that
A) It's necessary. When is the government going to start creating laws to help us and prosecute this?
B) It's expensive. Most people cannot afford this. I can barely afford it but my information has been leaked online.
C) It's inconvenient. A majority of calls are spam, but I'll often miss important calls from unknown numbers because Robokiller acts as a proxy and for some reason the call is routed through the Internet.
Anyhow, my wife and I are not on this list. I'm wondering if using Robokiller saved us from a lot of pain here.
Even before this, anyone operating a service who isn't treating SSNs as public knowledge in 2024 needs to be, well, shamed or penalized or something.
I’ve finally figured out the play: war of attrition.
Eventually enough data will be leaked to make moot the benefits of securing any personal data. At that point everyone stops trying and moves on to more financially rewarding activities.
I mean even if I’m an elephant, and data breaches are blind men, eventually enough blind men will draw a true comprehensive picture.
Several other commenters have brought about the sneaky wordplay involved in saying "identity theft" instead of simply calling it "fraud on the bank", and somehow turning the person into the victim rather than the bank that has been defrauded.
Has anyone tried to argue this point in court? Has this survived / how did this terminology shift survive judicial scrutiny?
From the NPD website:
> Please be advised that we will not collect, use, disclose, sell, or share the sensitive personal information or sensitive data of California, Virginia, Colorado, or Connecticut residents as those terms are defined by the CCPA/CPRA, VCDPA, CPA, or CTDPA, respectively.
Does anyone else just not give a fuck at this point about their SSN? I feel like maybe early 00s this would be scary but it's clear that everyone's SSN is out there already or waiting to get breached from a shady private data broker.
The problem lies in how institutions treat the SSN, not the number itself.
Are there any ways to check the breach to see if my information is there, other than downloading it myself? I’m not sure of the legality of doing so.
Time for services everywhere to stop using SSNs for identification and for the US to move on to a more advanced form of identification.
And lock your credit.
Is there a straightforward way to download this file for research purposes?
Downloaded the torrent, and it's a 164GB text file.
What's a quick way to search if my SSN is in the file? I ask before diving in, it's currently extracting and ETA is 40 minutes.
Can't the SSA just issue 330 million new social security numbers, and tell people to be more careful with them from this point forward?
Discussion from last week: https://news.ycombinator.com/item?id=41184420
What if we just made all this data free , some AI is going to compile them anyway (and probably already has). Deterrence is the best defense, right ?
“The database DOES NOT contain information from individuals who use data opt-out services. Every person who used some sort of data opt-out service was not present.”
Like what?
i.m.o. "National Public Data" in title should be capitalized; it is a proper noun https://en.wikipedia.org/wiki/National_Public_Data
And where is this information that this random group supposedly has? I have yet to see proof of that being real
the government should have put out honey pots or something, or maybe it’s time to get new numbers and just invalidate all the stolen data, there is clearly money for fixing this kind of thing but they’re using it to spy on us and do who knows what else instead
Does anyone know the correct password?
I worked incident response for years, logging thousands of hours of actual on site work with impacted clients.
No on cares.
Clients see this as the cost of doing business and have no incentive to do better. Even after Equifax and OPM.
Until we have a GDPR style law in the U.S. it will continue to be status quo.
I sure wish the US had a version of GDPR.
I get a data breach notice at least a few times a year. I got one for my kids two months ago for their medical data. I thought HIPPA had huge penalties but I guess not.
Perhaps HN readers would appreciate a detailed account of what the NPD torrents contain.
The torrent deliver two files like so:
NPD202401.7z 33,456,912,010 bytes (32GB)
NPD202402.7z 20,548,499,322 bytes (20GB)
Uncompressing NPD202401.7z results in: ssn.txt 176,806,109,779 bytes (165GB)
wc -l ssn.txt ==>> 1,698,302,005 lines
Uncompressing NPD202402.7z results in: ssn2.txt 120,722,361,611 bytes (113GB)
wc -l ssn2.txt ==>> 997,379,508 lines
This is a total of 1698302005+997379508 = 2,695,681,513 lines.Each line is a comma separated record with these fields:
ID,firstname,lastname,middlename,name_suff,dob,address,city,county_name,st,zip,phone1,aka1fullname,aka2fullname,aka3fullname,StartDat,alt1DOB,alt2DOB,alt3DOB,ssn
Generally records have ID, firstname, lastname, middlename, address, city, county_name, st, zip, and ssn. Most records do not have the fields for name_suff (name suffix), phone1, aka1fullname, aka2fullname, aka3fullname, StartDat, alt1DOB, alt2DOB, and alt3DOB.
There are no emails at all. There is no "@" in the files anywhere. Phone numbers are very rare.
I don't know what the ID number at the head of each line represents. I presume it is an internal index used by the organization that compiled the data. The SSN is at the end of each line.
The files have U.S. addresses only as far as I can tell. Nothing from Mexico, Canada, or other foreign countries.
Many of the lines (records) concern the same person at various addresses. Of 7 random people who I personally know that I checked on, all had entries. There were between 3 and 20 lines (records) for these 7 persons, averaging about 10. They usually differed only in the address field. Going by an estimate of 10 records per person, the 2.6 billion lines represents about 2695681513/10 = 269,568,151 distinct persons in the U.S.
The U.S. population is about 337M where 78% is over 18 years of age. In other words, 337000000*0.78 = 262,860,000 Americans are adults. This is pretty close to my estimate of 269,568,151 distinct individuals in the NPD data files.
Of the 7 persons I checked on, the names were spelled correctly, although the middle name was sometimes just an initial. I searched each person by multiple methods (address, last name, birth date) so I believe I would have detected names that were spelled slightly wrong.
The addresses appeared correct but there was no way to tell which was the current address and the order in which they lived at each address. There is a StartDat field but it was almost never filled in. The latest entry was not always the most current address. In a couple cases, the current address, where the person has been living for several years, was absent.
The birth dates were correct in a couple cases, were abbreviated in three cases (that is, instead of showing 19800704, meaning July 4 1980, it showed 19800700, meaning July 1980 without an exact day), and was wrong for one person by a wide margin.
All 7 persons I checked had SSN numbers. It was correct for 1 person but I don't know for the other 6. The SSN numbers were consistent for each of the 7 persons I checked on. By this I mean that a person did not have more than 1 SSN number, at least among the 7 persons I checked on.
off topic
does HIBP automatically cover plus addressing variants of an email
example I submit johndoe@example.com
but a breach had johndoe+verizon@example.com
will it match
Ahh, cool, pour the corpus through GPTs and start tweeting Congressional rep personal info at them until they pass a law to outlaw data brokers (in keeping with historical precedent [1] [2]).
[1] https://en.wikipedia.org/wiki/Video_Privacy_Protection_Act
[2] https://jolt.law.harvard.edu/digest/dodging-the-thought-poli...
I am just dreading the day when a near simultaneous cyberattack on a high number of(more vulnerable like middle-lower income individuals) start in a DDoS fashion:
1. Credit histories will be(unlocked) used to file multiple credit applications and tax credits will be applied for.
2. Multiple Cell phones will be hijacked through Sim Hijacking or other zeroday attacks to make it very difficult to get back in.
3. A person's profile will be used to attack the most vulnerable things: - Their families will get fake calls to create confusion. - Their financial services will be frozen or worst weak 2fac auth ones will be compromised.
4. Deep fake image and videos will be created from compromised accounts to sow further mayhem.
This already happens in targeted and one startegy of teh other fashion. Imagine what one could do with a bit more compute and completed profiles and orchestrate this kind of terrible vengeance.
TL;DR:
> an intriguing story that doesn't require any further action.
> While the specifics of the data breach remain unclear, the trove of data was put up for sale on the dark web for $3.5 million in April, the complaint reads.
I guess they failed to sell it because links to the leaked data on usdod.io have been available on Breachforum/Leakbase for over a week now. Someone created a magnet link yesterday and it's fully seeded so speeds are fast.
The data in the breach is irreversibly public now.