I use DuckDuckGo, but I've realised these big(ish) search engines give me all the commercialised results. DuckDuckGo has been going downhill for years, though not as fast as Google or Bing have.
I want to have a search engine that gives me all the small blogs and personal sites.
Does something like this exist?
I’m intrigued. The search results are more akin to how they used to be 25 years ago, on the internet that I loved.
https://search.marginalia.nu is definitely something I’ll be exploring going forward!
Try this engine
Or a SearXNG instance
https://search.disroot.org/search
You may also be interested in the Indie Web movement. This site is a great resource for it, with yet more links to indie sites and blogs.
Finally, not quite what you asked but here’s a freebie, in case you didn’t know about it:
It’s an old web search engine. It only indexes pages from the 00s and earlier.
Off-topic, but DDG is a Bing frontend, so they should share the same results.
This is a great question, in that it made me wonder why the Fediverse hasn’t come up with a distributed search engine yet. I can see the general shape of a system, and it’d require some novel solutions to keep it scalable while still allowing reasonably complex queries. The biggest problems with search engines are that they’re all scanning the entire internet and generating a huge share of all internet traffic; they’re all creating their own indexes, which is computationally expensive; their indexes are huge, which is space-expensive; and quality query results require a fair amount of computing resources.
What I have in mind is a distributed search engine with something like a DHT for the index, with partitioning and replication, and a moderation system to control bad actors and trojan nodes. DDG and SearX are sort of front ends for a system like this, except that they just hand off the queries to one (or two) of the big monolithic engines.
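To make the DHT idea a bit more concrete, here is a rough sketch of how an index could be partitioned by search term and replicated across nodes using a consistent-hash ring. Everything in it is an assumption for illustration: the `TermRing` class, the node names, and the whitespace tokenizer are made up, the "nodes" are just dicts in one process, and the moderation side is not covered at all.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the hash ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class TermRing:
    """Toy consistent-hash ring (hypothetical) assigning each term to `replicas` nodes."""

    def __init__(self, nodes, replicas=2):
        unique = set(nodes)
        if not 1 <= replicas <= len(unique):
            raise ValueError("replicas must be between 1 and the number of nodes")
        self.replicas = replicas
        # Sorted (position, node) pairs form the ring.
        self.ring = sorted((ring_position(n), n) for n in unique)
        self.positions = [pos for pos, _ in self.ring]
        # A real DHT node would hold its own shard; one dict per node stands in here.
        self.shards = {n: {} for n in unique}

    def owners(self, term):
        """First node clockwise from the term's position, plus the next replicas - 1."""
        start = bisect.bisect_right(self.positions, ring_position(term))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.replicas)]

    def index(self, url, text):
        """Add `url` to the posting set of every term in `text`, on each owner node."""
        for term in set(text.lower().split()):
            for node in self.owners(term):
                self.shards[node].setdefault(term, set()).add(url)

    def search(self, query, down=()):
        """AND-query: read each term from its first owner not listed in `down`."""
        result = None
        for term in query.lower().split():
            alive = [n for n in self.owners(term) if n not in down]
            postings = set(self.shards[alive[0]].get(term, set())) if alive else set()
            result = postings if result is None else result & postings
        return result or set()


ring = TermRing(["node-a", "node-b", "node-c"], replicas=2)
ring.index("https://example.org/a", "small web blogs")
ring.index("https://example.org/b", "small personal sites")
hits = ring.search("small blogs")
```

Because every term lives on two nodes here, a query still works if one owner of a term is unreachable (pass it in `down`). Multi-term queries have to contact the owners of each term, which is one of the places where keeping complex queries scalable gets hard.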
YaCy
Yah, it does. I’ve come across it before, but it rode in on a wave of alternative search engines and got lost in the shuffle.
Thanks.
I thought Gigablast was a one-man company? Yet it had good search results and it was expansive.
Yes, it was. Matt Wells closed it down just over one year ago.
You’re looking for Kagi.com
Not only does it give better search results quality wise on “the big web” - you can select to search specific parts, like blogs.
Best part - it’s completely ad and spam free. You pay for it with actual money instead of with your data.
Why not run a SearXNG instance and help everyone instead? Y’know, Kagi is pretty expensive and they are also getting into AI shit.
Can you expand on how running your own SearXNG helps others? Does it contribute to some shared index or something?
SearXNG is a metasearch engine, which means it gets the search results from other search engines (Google, Bing, Qwant, etc.) and shows them to you. It acts as a proxy, thus hiding the user’s IP. This means Google can’t target ads based on your IP and also can’t build a profile of you.
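As a rough illustration of the "gets results from other engines and shows them to you" part, here is a minimal sketch of merging ranked result lists from several upstream engines. This is not how SearXNG actually scores results; the `merge_results` function and the engine names are hypothetical, and the step that fetches results from the upstream engines is left out entirely.

```python
from collections import defaultdict


def merge_results(results_by_engine, limit=10):
    """Merge ranked URL lists from several engines into one ranked list.

    Each URL scores 1/rank for every engine that returned it, so URLs that
    several engines agree on rise to the top.
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:  # ignore repeats within one engine's list
                continue
            seen.add(url)
            scores[url] += 1.0 / rank
    # Sort by score (highest first), then by URL for a reproducible order.
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return ranked[:limit]


# Hypothetical upstream responses, already reduced to ranked URL lists.
upstream = {
    "engine_a": ["https://a.example", "https://b.example", "https://c.example"],
    "engine_b": ["https://b.example", "https://d.example"],
}
merged = merge_results(upstream)
```

In a real instance the requests to the upstream engines are made by the server running the metasearch software, so the upstream engines see that server's address rather than the address of whoever typed the query.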
What IP is Google getting if I self host the instance?
The instance’s IP
Right. So, my IP. Which is the same (IP-wise) as if I’d just searched Google directly, leaving aside the benefits of searching other engines simultaneously.
I’ve also seen people suggest we should open our self-hosted SearXNG instances to others and let random people submit searches to it thereby causing searches to appear to come from my home IP address. That strikes me as a terrible idea given what some people search on the web. I have also never run a TOR exit node.
I use Kagi myself and I was hooked after using their free trial so I’m comparing to that.
When I submit a search to Kagi, Google (and their other downstream search engines) gets the search from Kagi. Yes, that means I have to trust Kagi to some extent but as we can see, there are obvious problems with SearXNG whether using it myself or opening it to others.
The AI features are mentioned further up the thread as a negative but I disagree. I recently cancelled my subscription to ChatGPT ($20/mo) and upgraded my Kagi subscription ($25/mo) which gives me searching and access to all the most popular LLMs which I do use from time to time, mostly for code help. Personally, it’s a great value.
I didn’t even know about the AI features when I started paying for it. That “side” of Kagi is fully optional and very unobtrusive.
That strikes me as a terrible idea given what some people search on the web. I have also never run a TOR exit node.
It is somewhat like a Tor exit node indeed. Though you can easily explain it by saying that you did not make these searches yourself but merely run a metasearch engine that helps others protect their privacy. The Tor Project even has templates that exit node operators can send to the government or whoever is contacting them in those cases.
https://community.torproject.org/relay/community-resources/tor-abuse-templates/
Teclis - Includes search results from Marginalia, free to use at the moment. This search index has been closed down in the past due to abuse.
Kagi, which created Teclis, is a paid search engine (a metasearch engine, to be more precise) that also incorporates these search results in its normal searches. I warmly recommend giving Kagi a try, it’s great, I’ve been enjoying it a lot.
–
Other options I can recommend: you could always try to host your own search engine if you have a list of small-web sites in mind or don’t mind spending some effort collecting one. I personally host YaCy [github link] (and SearXNG to interface with YaCy and several other self-hosted indexes and search engines, such as Kiwix wikis). Crawling and indexing your own search results is surprisingly light on resources, and it can run on your personal machine in the background.
I’m building my own. Keep you posted.
Before Google existed I used https://www.metacrawler.com, and it appears to still be around. I have not used it in a long time, so I know nothing about its current state.
https://system1.com/ adtech company syndicating Bing and/or Google
They own metacrawler now?
yep, in footer “© 2024 Infospace Holdings LLC, A System1 Company”
Google are the ones who have really gone down the toilet in recent years. They ditched cached pages, soured search results with paid ads and even their image search is as bad as Tineye for reverse image searching these days. Literally the only thing Alphabet really have going for them anymore is Android and YouTube.
It’s baffling that a company which was once so dominant in the web search space that their name was literally used as a verb for looking things up for decades have now enshittified their flagship product so much that they’re making rivals like Bing, Lycos, DuckDuckGo, etc. look like viable alternatives.
Every company is going down the drain just at different speeds.
If you want blogs, I recommend you use gemini: https://en.wikipedia.org/wiki/Gemini_(protocol)
Download Lagrange and begin browsing. It’s basically a small-web of personal blogs.
Don’t know if this fits your criteria, but I’ve been using Gruble a lot recently. You can personalise the look and language in the settings, plus it’s open source.
The link should be https://gruble.de/. But as stated, it’s “just” a SearXNG instance. See the full list: https://searx.space/
I make one for web dev and mastodon.
Maybe try Mojeek; it uses a completely independent index.