When I first got into self-hosting, I originally wanted to join the Fediverse by hosting my own instance. After realizing I wasn't that committed to the idea, I went in a simpler direction.
Originally I was using Cloudflare’s tunnel service. Watching the logs, I would get traffic from random corporations and places.
Being uncomfortable with Cloudflare after pivoting away from social media, I learned how to secure my device myself and started using an uncommon port with a reverse proxy. My logs now only ever show activity when I am connecting to my own site.
Which is what led me to this question.
What do bots and scrapers look for when they come to a site? Do they mainly target known ports like 80 or 22 for insecurities? Do they ever scan other ports looking for other common services that may be insecure? Is it even worth their time scanning for open ports?
Seeing as I am tiny and obscure, I most likely won't need to do much research into protecting myself from such threats, but I am still curious about the threats that bots pose to other self-hosters or larger platforms.
My ISP blocks incoming traffic on common ports unless you get a business account. That's why I used Cloudflare's tunnel service initially. My plans for the domain I currently own have since changed, and I don't feel comfortable giving more power and data to an American tech company, so this is my alternative path.
I use Caddy as my reverse proxy, so I only have one uncommon port open. My plans changed from many people accessing my site to just me and a few select friends, which doesn't require a business account.
Oof, sorry, that sucks. I think you could still go the route I described though: for your domain `example.com` and example service `myservice`, listen on port `:12345` and drop everything that isn't requesting `myservice.example.com:12345`. Then forward the matching requests to your service's actual port, e.g. `:23456`, which is closed to the internet.

Edit: and just to clarify, for service `otherservice` you do not need to open a second port; stick with the one, but in addition to `myservice.example.com:12345`, also accept requests for `otherservice.example.com:12345` and proxy those to the (again, closed-to-the-internet) port `:34567`.

The advantage here is that bots cannot guess from your ports what software you are running, and since Caddy (or any of the mature reverse proxies) can be expected to be reasonably secure, I would not worry about bots being able to exploit the reverse proxy's port. Bots also no longer have a direct line of communication to your services. In short, the routine of "let's scan ports; ah, port x is open, indicating use of service y; try automated exploit z" gets prevented.
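In Caddyfile terms, that's something like the sketch below. The hostnames and ports are the placeholders from above, and I'm hand-waving cert issuance (with 80/443 blocked you'd want the ACME DNS challenge), so treat it as a starting point rather than a drop-in config:

```
# One open port (:12345), routed by hostname.
myservice.example.com:12345 {
	# Only requests presenting this hostname (via TLS SNI / Host) land here;
	# a bare scan of :12345 fails the TLS handshake and learns nothing.
	reverse_proxy 127.0.0.1:23456   # the service's real port, closed to the internet
}

otherservice.example.com:12345 {
	reverse_proxy 127.0.0.1:34567
}
```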
I think I am already doing that. My Kiwix Docker container's port is set to 127.0.0.1:8080:8080, and my reverse proxy only listens on port 12345 but proxies kiwi.example.com:12345 to port 8080 on the local machine.
I've learned that Docker likes to manipulate iptables without giving any notice to other programs like UFW. I have to be explicit about making sure Docker containers announce themselves to the local machine only.
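For reference, the loopback-only publishing looks roughly like this in a Compose file (the image name and volume path here are just stand-ins for my actual setup):

```yaml
services:
  kiwix:
    image: ghcr.io/kiwix/kiwix-serve   # stand-in; use whatever image/tag you actually run
    command: "*.zim"                   # serve every ZIM file found in /data
    volumes:
      - ./zim:/data
    ports:
      # The leading 127.0.0.1 is the important part: without it, Docker
      # publishes on 0.0.0.0 and bypasses UFW via its own iptables rules.
      - "127.0.0.1:8080:8080"
```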
I've also used this guide to harden Caddy and adjusted it to my needs. I took another user's advice and use a wildcard domain cert instead of issuing certs for each subdomain, so that only the wildcard domain is visible when I look it up at https://crt.sh/ . That way I'm not advertising the subdomains I am using.
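In Caddyfile form that ends up looking something like this; wildcard certs can only be issued via the ACME DNS challenge, and the provider line is a placeholder for whichever DNS plugin you build Caddy with:

```
*.example.com:12345 {
	tls {
		# <provider> / <api_token> are placeholders for your DNS plugin;
		# only one wildcard cert gets issued, so crt.sh shows a single entry.
		dns <provider> <api_token>
	}

	@kiwi host kiwi.example.com
	handle @kiwi {
		reverse_proxy 127.0.0.1:8080   # the Kiwix container, bound to loopback
	}

	# Unknown subdomains get the connection closed without a response.
	handle {
		abort
	}
}
```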
TBH, it sounds like you have nothing to worry about then! Open ports aren't really an issue in and of themselves; they are problematic because the software listening on them might be vulnerable, and the (standard) ports can provide knowledge about the nature of the application, making it easier to target specific software with an exploit.
Since a bot has no way of finding out what services you are running, it could only attack Caddy, which I'd put down as a negligible danger.
Yeah, a few weeks ago I achieved my state of "secure" for my server. I just happened to notice a dramatic decrease in activity, and that's what prompted this question that's been sitting in the back of my mind for weeks now.
I do think it's important to talk about, though, because there seems to be a lack of discussion about security in general for self-hosting. So many guides focus on getting services up and running as fast as possible but don't give security much thought.
I just so happened to have gained more interest in the security aspect of self-hosting than in hosting actual services. My risk from self-hosting is extremely low, so I've reached a point of diminishing returns on security, but the mind is still curious and wants to know more.
I might write up a guide/walkthrough of my setup in the future, but that's low priority. I have some other, non-self-hosting things I want to focus on first.