this post was submitted on 09 Oct 2024
So what about open-source, self-hosted search engines? If it requires some hardware, I'd gladly team up with a small group of people to finance a bigass server that just runs our own personal search engine.
Any good ones out there?
There's stuff like SearXNG or Whoogle, but these aren't "real" search engines, merely "search aggregators" - they relay requests to a bunch of actual search engines, like Bing or Google, and aggregate the results. That's why they don't need tons of compute or their own crawling, and also why they often fail to work (the upstream search engines don't like or allow this). I believe it's not feasible to run a "real" search engine alone or even as a small group of people - according to this comment you need a powerful server with terabytes* of storage, hundreds of gigabytes of RAM and a lot of compute - and all of that only lets you crawl some top domains, nowhere near a good chunk of the internet.
*Which actually sounds low; I would have expected more.
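To make the "aggregator" idea concrete, here's a rough sketch of the relay-and-merge step. It is not how SearXNG or Whoogle actually rank things; the merge rule (reciprocal rank fusion), the `aggregate` function and the `engine_*` stand-ins are all assumptions made up for illustration, and no real search engine is contacted.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A backend takes a query and returns result URLs, best first.
Backend = Callable[[str], List[str]]


def aggregate(query: str, backends: Dict[str, Backend], k: int = 60) -> List[str]:
    """Merge ranked lists: each URL scores sum(1 / (k + rank)) over backends."""
    scores: Dict[str, float] = defaultdict(float)
    for name, search in backends.items():
        try:
            results = search(query)
        except Exception:
            # Upstream engines may block or rate-limit relayed requests,
            # so a failing backend is skipped instead of failing the search.
            continue
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical stand-ins for real engines (no network access here).
def engine_a(query: str) -> List[str]:
    return ["https://a.example/1", "https://shared.example/x"]


def engine_b(query: str) -> List[str]:
    return ["https://shared.example/x", "https://b.example/2"]


def engine_blocked(query: str) -> List[str]:
    raise ConnectionError("upstream refused the request")


merged = aggregate(
    "self hosted search",
    {"a": engine_a, "b": engine_b, "blocked": engine_blocked},
)
print(merged)
```

The URL both stand-in engines return ends up on top, and the "blocked" engine just drops out, which is roughly the failure mode you see when an upstream engine stops cooperating. None of this needs an index or a crawler, which is why it runs on tiny hardware.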
SearXNG, but there are plenty of public instances already.
Perplexica is interesting too, but it uses a moderate amount of RAM because of Elasticsearch.
And of course you need to have Ollama running.
Very cool