I'm only vaguely aware of how other people do perfect hashing (generators I've used always seem to produce arrays to load from), but I dabbled in a purely-arithmetic toy problem recently.
As an exercise for the reader:
There are exactly 32 punctuation symbols in printable ASCII:
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Taking input in a register, uniquely hash them to the range 0-31.
Any other input values may return any number, but must not
trap or exhibit undefined behavior. The obvious approach of
"just make a 256-element array" isn't allowed.
This can be done in few enough cycles that you need to
carefully consider if there's any point to including:
loads (even hitting L1)
branches (even ones that are predicted perfectly)
multiplication (unless just using lea/add/shift)
I found that out-of-order execution only helps a little; it's
difficult to scatter-gather very wide in so few cycles.
Writing C code mostly works if you can persuade the compiler
not to emit an unwanted branch.
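For concreteness, here is one possible branch-free sketch of the exercise (not claimed to be optimal, just an existence proof). It exploits the fact that the 32 symbols sit in four contiguous runs (33-47, 58-64, 91-96, 123-126), so only the three gaps between runs need to be subtracted out:

    // One possible branchless arithmetic perfect hash for the 32 ASCII
    // punctuation symbols; a sketch, not necessarily the fastest version.
    static inline unsigned punct_hash(unsigned c) {
        unsigned h = c - 33;             // '!'..'/'  -> 0..14
        h -= (0u - (c >= 58)) & 10;      // skip the 10 digits '0'..'9'
        h -= (0u - (c >= 91)) & 26;      // skip the 26 letters 'A'..'Z'
        h -= (0u - (c >= 123)) & 26;     // skip the 26 letters 'a'..'z'
        return h & 31;                   // valid symbols land uniquely in 0..31;
                                         // anything else just returns some value
    }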
I’ve always considered this a bit of an academic conversation because, as others have pointed out, the up-front costs are often too much to bear. However, we now have languages that can run functions at build time, so a static lookup table becomes possible.
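As a rough illustration (C++ constexpr here, standing in for whatever build-time evaluation your language offers), the table for the exercise above can be baked into the binary so there is no run-time construction cost:

    // Sketch: build a 256-entry lookup table at compile time so it ships
    // as read-only static data; C++ constexpr is just a stand-in for any
    // language with build-time function evaluation.
    #include <array>
    #include <cstdint>

    constexpr std::array<std::uint8_t, 256> make_symbol_table() {
        std::array<std::uint8_t, 256> t{};
        const char symbols[] = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
        for (unsigned i = 0; symbols[i] != '\0'; ++i)
            t[static_cast<unsigned char>(symbols[i])] = static_cast<std::uint8_t>(i);
        return t;   // unlisted bytes map to 0, which is fine for this exercise
    }

    // Evaluated at compile time; a lookup is then a single load at run time.
    constexpr auto kSymbolIndex = make_symbol_table();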
And where else would one use a static lookup table to great effect? When I actively followed the SIGPLAN (programming languages) proceedings, one of the papers that really stood out for me was about making interface/trait-based function calls as fast as inheritance-based ones by using perfect hashing on all vtable entries.
This is a really good article. The subject area has a lot of new developments in the past few years (wonder why) and the survey discusses them. I always thought of perfect hashing as a niche optimization good for some things and of theoretical interest, but it's apparently more important than I thought.
Interesting that they don’t cover boomphf, which is the fastest MPHF I’ve encountered.
They completely ignore the startup time when reporting query time, even though it dominates by a factor of 1000.
Some PHFs can be pre-compiled, while most need to be deserialized at run time. I worked on a pre-compiled pthash variant but got stuck on C++ bugs.
In some of them there's a huge overhead for the ordered variants, which have to check for false positives.
For small n, gperf is still by far the fastest. And it is pre-compiled only.
We've made extensive use of perfect hashing in HeavyDB (formerly MapD/OmniSciDB), and it has definitely been a core part of achieving strong group by and join performance.
You can use perfect hashes not only for the usual suspects of contiguous integer and dictionary-encoded string ranges, but also for use cases like binned numeric and date ranges (epoch seconds binned per year can use a perfect hash range of one bin per year for a very wide range of timestamps), and they can even handle arbitrary expressions if you propagate the ranges correctly.
Obviously you need a good "baseline" hash path to fall back on, but it's surprising how many real-world use cases you can profitably cover with perfect hashing.
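As a hedged illustration of the binned-range idea (not HeavyDB's actual code), a per-year group-by over a timestamp range that is bounded up front can use the year index itself as a collision-free slot:

    // Sketch only: a "perfect hash" for group-by over per-year bins.
    // Assumes the planner knows (e.g. from metadata) that all timestamps
    // fall within [min_year, max_year], so every year owns exactly one
    // slot and the aggregation table needs no collision handling.
    #include <ctime>
    #include <cstdint>

    inline std::int64_t year_bin(std::int64_t epoch_seconds, int min_year) {
        std::time_t t = static_cast<std::time_t>(epoch_seconds);
        std::tm* utc = std::gmtime(&t);          // not thread-safe; fine for a sketch
        return utc ? (utc->tm_year + 1900) - min_year   // 0-based slot index
                   : 0;
    }

The same shape works for any binned numeric range: as long as the range of the expression can be bounded ahead of time, the bin index is itself a minimal perfect hash over the groups.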