Cool.
But a warning, based on doing quite a lot of crawling from home for my own search engine: it's very easy to get your IP or IP block onto annoying graylists, where basically every other website you visit throws a CAPTCHA in your face. I'm aware of the risk and use a VPN for most of my private web surfing anyway, so it's not much of a bother for me, but it's a bit sketchy to expose other people to that risk through something like this.
It would probably be wise to use canned crawls for major websites, maybe something like trading WARCs <https://en.wikipedia.org/wiki/Web_ARChive> over BitTorrent or whatever. Most of these types of websites don't change that often in the places that matter.
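To make the canned-crawl idea a bit more concrete, here is a rough Python sketch of reading records out of an uncompressed WARC file using only the standard library. The function name `iter_warc_records` and the helper `make_record` are made up for illustration. Real archives are usually gzip-compressed per record and carry more headers (record IDs, dates), so a proper WARC library would be the sensible choice in practice.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC stream.

    Sketch only: assumes well-formed records and no gzip compression.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separate one record from the next
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The body is exactly Content-Length bytes long.
        body = stream.read(int(headers.get("Content-Length", "0")))
        yield headers, body


def make_record(url, payload):
    """Build a minimal record for the demo (real records carry more headers)."""
    head = [
        b"WARC/1.0",
        b"WARC-Type: resource",
        b"WARC-Target-URI: " + url.encode("utf-8"),
        b"Content-Length: " + str(len(payload)).encode("ascii"),
    ]
    return b"\r\n".join(head) + b"\r\n\r\n" + payload + b"\r\n\r\n"


# Hypothetical two-page archive held in memory instead of on disk.
archive = io.BytesIO(
    make_record("https://example.com/a", b"<p>page a</p>")
    + make_record("https://example.com/b", b"<p>page b</p>")
)
records = list(iter_warc_records(archive))
for headers, body in records:
    print(headers["WARC-Target-URI"], len(body))
```

An indexer could then feed each body to the same pipeline it uses for live fetches, without touching the origin site at all.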
Hey HN, I'm building an open source search platform that lives on your device, indexing what you want and exposing it to you in a super simple & super fast interface.
I took the idea of adding "site:reddit.com" to your Google searches and expanded on it with the idea of "lenses", which add context to your search query and give the crawler direction in terms of what to crawl & index. Because everything is crawled and indexed on your device, all queries run locally; your searches are never relayed to a 3rd-party search engine. Think of it as your personal bookcase at home vs. the Library of Congress.
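For a rough idea of what a lens boils down to, here is a minimal Python sketch. The `Lens` class, its fields, and `filter_frontier` are hypothetical names made up for illustration; this is not the project's actual lens format or API, just the "site:" idea generalized to a list of domains that scopes what the crawler is allowed to fetch.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Lens:
    """A named set of domains that scopes crawling and searching (hypothetical)."""
    name: str
    domains: list = field(default_factory=list)

    def allows(self, url: str) -> bool:
        """True if the URL's host is one of the lens domains or a subdomain."""
        host = (urlparse(url).hostname or "").lower()
        return any(host == d or host.endswith("." + d) for d in self.domains)


def filter_frontier(lens: Lens, urls: list) -> list:
    """Keep only the candidate URLs the lens says are worth crawling."""
    return [u for u in urls if lens.allows(u)]


# Example: a lens roughly equivalent to "site:reddit.com".
reddit = Lens(name="reddit", domains=["reddit.com"])
candidates = [
    "https://old.reddit.com/r/rust",
    "https://reddit.com/r/programming",
    "https://notreddit.com/",
    "https://example.org/reddit.com",
]
to_crawl = filter_frontier(reddit, candidates)
print(to_crawl)
```

Matching on the parsed hostname rather than a substring keeps look-alike domains and paths that merely mention the domain out of the crawl.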
It's still in a super early state, but I would love for people to start using it and give some feedback, and to see what sort of lenses people want to build and search through!
Some details about the stack for the interested:
Thanks in advance!