Given that the schema is known, it should be possible to avoid general JSON parsing, which would be much faster.
How does it compare with serde, which AFAIK uses the same approach?
The benchmark section ("But is it fast?") contains a common error when trying to represent ratios as percentages.
For the "Tweets" case, it reports a speedup of 229%. The old value is 11.73 and the new is 5.108. That is a speedup of about 2.3 (i.e. the new measurement is about 2.3 times as fast as the old one), but the difference in time is -56%, not 229%. If you really want to use a comparative percentage, it's about 130% faster.
Because using percentages to express ratio of change can be confusing or misleading, I always recommend using speedup instead, which is a simple ratio. A speedup of 2 is twice as fast. A speedup of 1 is the same. 0.5 is half as fast.
Formulas:
speedup(old, new) = old / new
relativePercent(old, new) = ((old / new) - 1) * 100
differenceInPercent(old, new) = (new - old) / old * 100
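
For anyone who wants to check the numbers, here is a minimal Python sketch of the three formulas. The snake_case function names are just my transliteration of the ones above, and I am assuming relative_percent is meant as (old / new - 1) * 100, i.e. the reading that produces the "N% faster" figure; old and new are timings where lower is better.

```python
def speedup(old: float, new: float) -> float:
    """Ratio of old time to new time: 2.0 means twice as fast, 0.5 half as fast."""
    return old / new


def relative_percent(old: float, new: float) -> float:
    """How much faster the new measurement is, in percent (assumed reading)."""
    return (old / new - 1) * 100


def difference_in_percent(old: float, new: float) -> float:
    """Change in the measured time relative to the old time, in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # Timings quoted above for the "Tweets" case.
    old, new = 11.73, 5.108
    print(f"speedup:               {speedup(old, new):.2f}")
    print(f"relative percent:      {relative_percent(old, new):.0f}%")
    print(f"difference in percent: {difference_in_percent(old, new):.0f}%")
```

With the rounded timings quoted above this prints roughly 2.30, 130% and -56%, which shows how the same pair of measurements gives three quite different-looking numbers.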
It would be great if someone could implement the schema discovery algorithm from the DB research GOAT, Thomas Neumann, and add it to Apache Arrow: https://db.in.tum.de/~durner/papers/json-tiles-sigmod21.pdf