this post was submitted on 19 May 2024
68 points (82.7% liked)

Technology

59296 readers
4550 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 7 points 5 months ago

I had a similar idea: Could search engines be broken up and distributed instead of being just a couple of monoliths?

Reading the HN thread, the short answer is: NO.

Still, its fun to imagine what it might look like if only......

I think the OP is looking for an answer to the problem of Google having a monopoly that gives them the power to make it impossible to be challenged. The cost to replicate their search service is just so astronomical that its basically impossible to replace them. Would the OP be satisfied if we could make cheaper components that all fit together to make a competing but decentralized search service? Breaking down the technical problems is just the first step, the basic concepts for me are:

Crawling -> Indexing -> Storing/host index -> Ranking

All of them are expensive because the internet is massive! If each of these were isolated but still interoperable then we get some interesting possibilities: Basically you could have many smaller specialized companies that can focus on better ranking algorithms for example.

  • What if crawling was done by the owners of each website and then submitted to an index database of their choice? This flips the model around so things like robots.txt might become less relevant. Bad actors and spam however now don't need any SEO tricks to flood a database or mislead as to their actual content, they can just submit whatever they like!. These concerns feed into the next step:
  • What if there were standard indexing functions similar to how you have many standard hash functions. How a site is indexed plays an important role in how ranking will work (or not) later. You could have a handful of popular general purpose index algorithms that most sites would produce and then submit (e.g. keywords, images, podcasts, etc.) combined with many more domain specific indexing algorithms (e.g. product listings, travel data, mapping, research). Also if the functions were open standards then it would be possible for a browser to run the index function on the current page and compare the result to the submitted index listing. It could warn users that the page they are viewing is probably either spam or misconfigured in some way to make the index not match what was submitted.
  • What if the stored indexes were hosted in a distributed way similar to DNS? Sharing the database would lower individual costs. Companies with bigger budgets could replicate the database to provide their users with a faster service. Companies with fewer resources would be able to use the publicly available indexes yet still be competitive.
  • Enabling more competition between different ranking methods will hopefully reduce the effectiveness of SEO gaming (or maybe make it worse as the same content is repackaged for each and every index/rank combination). Ranking could happen locally (although this would probably not be efficient at all but that fact that it might even be possible at all is quite a novel thought)

Sigh enough daydreaming already........