Hacker News

by m-hodges on 6/19/2025, 4:11 PM with 52 comments

by karel-3d on 6/19/2025, 11:50 PM

I was thinking LLMs can be long-term regressive?

The "proper solution" here is of course not to use hard-to-parse PDFs, but to force election authorities to publish machine-parseable outputs. And LLMs can "fix in place" stupid solutions.

That's not hate on the author, though. I needed to do some PDF parsing for bank statements before; but again, the proper long-term solution is to force banks (by law or by public interest) to provide parseable statements, not to parse them!

Likewise, putting LLMs to work on understanding a bad codebase will not fix the bad codebase; it will just build on top of it.

oh well c'est la vie

by Normal_gaussian on 6/20/2025, 2:00 AM

I'm not convinced.

I had Gemini convert a bunch of charity forms yesterday, and the deviation was significant and problematic. Rephrasing questions, inventing new questions, changing the emphasis; it might be performing a lot better for numerical data sets, but it's rare to have one without a meaningful textual component.

by fasthands9 on 6/19/2025, 9:06 PM

In college (about 15 years ago) I worked for a professor who was compiling precinct-level results for old elections. My job was just to request the info and then do manual data entry. It was abysmally slow.

This application seems very good, but it's still a bit amazing that lawmakers haven't just required that all data be uploaded as CSV! Even if every CSV had a slightly different format, it would be way easier for everyone (LLM or not).

by simonw on 6/19/2025, 5:43 PM

This is such an excellent example of a responsible and thorough application of vision LLMs to a gnarly data entry problem.

by o11c on 6/20/2025, 1:48 AM

You know, not ignoring the percentage column would mean you can do math checks yourself.
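A minimal sketch of what such a math check could look like, assuming each extracted row carries a candidate name, a vote count, and the reported percentage, and assuming the percentages are computed over the sum of the listed votes (real reports may use a different denominator). The function name, row layout, and tolerance are hypothetical, not taken from the OpenElections pipeline:

```python
def check_percentages(rows, tolerance=0.05):
    """Return the names whose reported percentage disagrees with their vote count.

    rows: iterable of (name, votes, reported_pct) tuples (hypothetical layout).
    tolerance: allowed absolute gap in percentage points, to absorb rounding
    in the printed report.
    """
    rows = list(rows)
    total = sum(votes for _, votes, _ in rows)
    mismatches = []
    for name, votes, reported_pct in rows:
        # Recompute the percentage from the extracted vote counts.
        expected_pct = 100.0 * votes / total if total else 0.0
        if abs(expected_pct - reported_pct) > tolerance:
            mismatches.append(name)
    return mismatches


# Consistent rows pass; a misread digit (600 extracted as 690) gets flagged.
clean = [("A", 600, 60.0), ("B", 400, 40.0)]
misread = [("A", 690, 60.0), ("B", 400, 40.0)]
print(check_percentages(clean))
print(check_percentages(misread))
```

Because a single misread count shifts the total, every row in the affected contest tends to get flagged, which points at the contest rather than the exact cell.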

by antonkar on 6/20/2025, 12:31 AM

Related: Interesting mockups to turn X/open source Bsky into direct democratic massive "prothetic" polls in each post.

And paid polls that the author claims will replace prediction markets:

https://x.com/MelonUsks/status/1929660387995115713

by GardenLetter27 on 6/19/2025, 8:07 PM

Why is the original source data not available anywhere digitally?

Since it's printed, it is clearly already in a database somewhere. Why can't that just be made public too?

Seems bizarre to OCR printed documents (although I am aware of many companies doing this to parse invoices, etc.)

by nxrabl on 6/19/2025, 6:13 PM

Very interesting! Is this the state of the art for accurate OCR of tabular PDFs, or is there other work in the space to compare against?

by benob on 6/19/2025, 7:29 PM

I wonder how difficult it would be to bias a model so that it subtly corrupts election results when performing OCR.

How OpenElections uses LLMs