When I first got into self-hosting, I originally wanted to join the Fediverse by hosting my own instance. After realizing I wasn't that committed to the idea, I went in a simpler direction.
Originally I was using Cloudflare’s tunnel service. Watching the logs, I would get traffic from random corporations and places.
Having grown uncomfortable with Cloudflare after pivoting away from social media, I learned how to secure the device myself and started serving everything through a reverse proxy on an uncommon port. My logs now only ever show activity when I am connecting to my own site.
Which is what led me to this question.
What do bots and scrapers look for when they come to a site? Do they mainly target well-known ports like 80 or 22, looking for vulnerabilities? Do they ever scan other ports for common services that may be insecure? Is it even worth their time scanning for open ports?
Seeing as I am tiny and obscure, I most likely won't need to do much research into protecting myself from such threats, but I am still curious about the threats that bots pose to other self-hosters or larger platforms.
Oof, sorry, that sucks. I think you could still go the route I described though: for your domain example.com and example service myservice, listen on port :12345 and drop everything that isn't requesting myservice.example.com:12345. Then forward the matching requests to your service's actual port, e.g. :23456, which is closed to the internet.

Edit: and just to clarify, for service otherservice, you do not need to open a second port; stick with the one, but in addition to myservice.example.com:12345, also accept requests for otherservice.example.com:12345, and proxy those to the (again, closed-to-the-internet) port :34567.

The advantage here is that bots cannot guess from your ports what software you are running, and since Caddy (or any of the mature reverse proxies) can be expected to be reasonably secure, I would not worry about bots being able to exploit the reverse proxy's port. Bots also no longer have a direct line of communication to your services. In short, the routine of "let's scan ports; ah, port x is open, indicating use of service y; try automated exploit z" gets prevented.
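To make that concrete, here's a minimal Caddyfile sketch of the idea (the domains and ports are just the placeholder values from above; note that getting certificates for these hostnames is its own topic, since the usual ACME challenges want ports 80/443 open and otherwise you need the DNS challenge):

```
# Sketch only: one public port, host-based routing, backends on loopback.

myservice.example.com:12345 {
    # Forward matching requests to the service's real port, which is
    # bound to localhost and not reachable from the internet.
    reverse_proxy 127.0.0.1:23456
}

otherservice.example.com:12345 {
    # Second service on the same public port, different Host header.
    reverse_proxy 127.0.0.1:34567
}

# Requests that hit the port without one of these hostnames (e.g. a bot
# scanning by IP) match no site and no certificate, so they get nowhere.
```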
I think I am already doing that. My Kiwix Docker container's port is set to 127.0.0.1:8080:8080, and my reverse proxy only listens on port 12345 but proxies kiwi.example.com:12345 to port 8080 on the local machine.
I've learned that Docker likes to manipulate iptables directly without telling firewall front-ends like UFW, so I have to be explicit about binding container ports to the local machine only.
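Roughly, that binding looks like this in a compose file (just a sketch, not my exact config; the image reference is illustrative and volumes/arguments are omitted):

```yaml
# Sketch: publish the container port on the host's loopback interface only.
# Docker's own iptables rules then expose 8080 on 127.0.0.1, not on the
# public interface, regardless of what UFW allows or blocks.
services:
  kiwix:
    image: ghcr.io/kiwix/kiwix-serve   # illustrative image reference
    ports:
      - "127.0.0.1:8080:8080"          # host loopback only -> container port
```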
I've also used this guide to harden Caddy and adjusted it to my needs. I took the advice from another user and use a wildcard domain cert instead of issuing certs for each subdomain, so only the wildcard domain is visible when I look it up at https://crt.sh/ and I'm not advertising which subdomains I'm using.
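The wildcard setup is roughly this shape in a Caddyfile (sketch only: wildcard certs require the ACME DNS challenge, which needs a Caddy build that includes your DNS provider's module; the provider and credentials below are placeholders):

```
# Sketch: one wildcard site on the uncommon port, so crt.sh only ever shows
# *.example.com instead of one certificate entry per subdomain.
*.example.com:12345 {
    tls {
        dns <provider> <credentials>   # placeholder: depends on your DNS plugin
    }

    @kiwix host kiwi.example.com
    handle @kiwix {
        reverse_proxy 127.0.0.1:8080   # Kiwix, bound to loopback only
    }

    # Any other subdomain gets the connection closed without a response.
    handle {
        abort
    }
}
```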
TBH, it sounds like you have nothing to worry about then! Open ports aren't really an issue in and of themselves; they are problematic because the software listening on them might be vulnerable, and the (standard) ports can reveal the nature of the application, making it easier to target specific software with an exploit.
Since a bot has no way of finding out what services you are running, it could only attack Caddy, which I'd put down as a negligible danger.