The simhash patent has expired and is now free to use

by ubutler on 9/11/2024, 2:51 PM with 1 comment

Simhash is an extremely fast and simple algorithm for detecting near-duplicate text at scale, which makes it particularly useful for deduplicating AI training datasets.
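To show why it is so cheap, here is a minimal Python sketch of the classic Charikar-style simhash construction. Each feature of a document is hashed to 64 bits. Every bit position then accumulates +weight or -weight depending on that bit. The fingerprint keeps a 1 wherever the total is positive. Near-duplicate texts share most features, so their fingerprints differ in only a few bits, and comparing two documents is a single XOR plus a popcount. The feature choice (lowercase word 3-shingles), the hash (64-bit BLAKE2b), the weighting (raw shingle counts) and the function names `simhash` and `hamming` are illustrative assumptions, not a reference implementation.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # fingerprint width; matches the 8-byte hash below


def _features(text: str, k: int = 3) -> Counter:
    """Assumed feature choice: lowercase word k-shingles, weighted by count."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return Counter()
    if len(words) < k:
        # Too short to shingle: treat the whole text as one feature.
        return Counter([" ".join(words)])
    return Counter(" ".join(words[i:i + k]) for i in range(len(words) - k + 1))


def _hash64(feature: str) -> int:
    """Stable 64-bit hash of a feature (assumed: BLAKE2b with an 8-byte digest)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(text: str) -> int:
    """Return a 64-bit simhash fingerprint of `text` (hypothetical helper)."""
    totals = [0] * BITS
    for feature, weight in _features(text).items():
        h = _hash64(feature)
        for i in range(BITS):
            # Add the weight where the feature's hash bit is 1, subtract where it is 0.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(BITS):
        if totals[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints (hypothetical helper)."""
    return bin(a ^ b).count("1")
```

In a deduplication pass, two documents would be flagged as near duplicates when `hamming(simhash(x), simhash(y))` falls below a small threshold. The threshold is a tuning choice, and finding candidate pairs across a large corpus without comparing every pair needs an additional index that this sketch does not cover.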