When I first got into self-hosting, I originally wanted to join the Fediverse by hosting my own instance. After realizing I wasn't that committed to the idea, I went in a simpler direction.
Originally I was using Cloudflare’s tunnel service. Watching the logs, I would get traffic from random corporations and places.
Being uncomfortable with Cloudflare after pivoting away from social media, I learned how to secure my device myself and started using an uncommon port with a reverse proxy. My logs now only ever show activity when I am connecting to my own site.
Which is what led me to this question.
What do bots and scrapers look for when they come to a site? Do they mainly target known ports like 80 or 22 for insecurities? Do they ever scan other ports looking for other common services that may be insecure? Is it even worth their time scanning for open ports?
Seeing as I am tiny and obscure, I most likely won’t need to do much research into protecting myself from such threats but I am still curious about the threats that bots pose to other self-hosters or larger platforms.
I am scratching my head here: why open up ports at all? Is it just to avoid having to pay for a domain? The usual way to go about this is to only proxy 443 traffic to the intended host/VM/port based on the (sub)domain, and just drop everything else, including requests on 443 that do not match your subdomains.
Granted, there are some services actually requiring open ports, but the majority don’t (and you mention a webserver, where we’re definitely back to: why open anything beyond 443?).
My ISP blocks incoming data on common ports unless you get a business account. That's why I used Cloudflare's tunnel service initially. My plans for the domain I currently own have changed, and I don't feel comfortable giving more power and data to an American tech company, so this is my alternative path.
I use Caddy as my reverse proxy, so I only have one uncommon port open. My plans changed from many people accessing my site to just me and a very few select friends, which does not need a business account.
> My ISP blocks incoming data to common ports unless you get a business account.
Oof, sorry, that sucks. I think you could still go the route I described, though: for your domain `example.com` and an example service `myservice`, listen on port `:12345` and drop everything that isn't requesting `myservice.example.com:12345`. Then forward the matching requests to your service's actual port, e.g. `:23456`, which is closed to the internet.

Edit: and just to clarify, for service `otherservice`, you do not need to open a second port; stick with the one, but in addition to `myservice.example.com:12345`, also accept requests for `otherservice.example.com:12345`, and proxy those to the (again, closed-to-the-internet) port `:34567`.

The advantage here is that bots cannot guess from your ports what software you are running, and since Caddy (or any of the mature reverse proxies) can be expected to be reasonably secure, I would not worry about bots being able to exploit the reverse proxy's port. Bots also no longer have a direct line of communication to your services. In short, the routine of "let's scan ports; ah, port x is open, indicating use of service y; try automated exploit z" gets prevented.
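Since Caddy comes up in this thread, here's a minimal Caddyfile sketch of that pattern, using the placeholder domain, service names, and ports from above (not a drop-in config, just the shape of it):

```caddyfile
# One nonstandard public port for everything; the real service
# ports stay bound to loopback and are never exposed.
myservice.example.com:12345 {
	reverse_proxy 127.0.0.1:23456
}

otherservice.example.com:12345 {
	reverse_proxy 127.0.0.1:34567
}

# Catch-all: any other hostname hitting this port gets the
# connection closed outright.
:12345 {
	abort
}
```

A scanner that finds port 12345 open learns nothing about which services sit behind it, and requests without a matching subdomain never reach them.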
I think I am already doing that. My Kiwix docker container's port is set to `127.0.0.1:8080:8080`, and my reverse proxy is only open on port 12345 but proxies `kiwi.example.com:12345` to port 8080 on the local machine.
I've learned that Docker likes to manipulate iptables without notifying other programs like UFW. I have to be explicit in making sure Docker containers only bind to the local machine.
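For anyone following along, a sketch of what that binding looks like in a `docker-compose.yml` (the image name and volume path are assumptions; the key part is the `127.0.0.1:` prefix, since without it Docker publishes the port on all interfaces via its own iptables rules, bypassing UFW):

```yaml
services:
  kiwix:
    image: ghcr.io/kiwix/kiwix-serve   # assumed image; adjust to your setup
    ports:
      # loopback-only binding: reachable by the reverse proxy on the
      # same host, but never opened on the public interface
      - "127.0.0.1:8080:8080"
    volumes:
      - ./zim:/data                    # hypothetical path to your .zim files
    command: ["*.zim"]
```

If the `127.0.0.1:` prefix were dropped, `8080` would be world-reachable regardless of what UFW says.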
I've also used this guide to harden Caddy and adjusted it to my needs. I took the advice from another user and use a wildcard domain cert instead of issuing certs for each subdomain; that way only the wildcard entry is visible when I look my domain up at https://crt.sh/ , and I'm not advertising which subdomains I'm using.
Yes, they do. Most just search the common ports, but some scan all.
Being tiny and obscure doesn’t mean they won’t find you, it might just take longer.
That's been my main goal throughout securing my personal devices, including my web-facing server: to make things as inconvenient as possible for potential outside interference, even if it means simply wasting their time.
With how complex computers and other electronic devices have become, I never expect anything I own to be 100% secure even if I take steps I think will make me secure.
I've been on the internet long enough to have built a habit of obscuring my online and digital presence. It won't save me, but it makes me less of a target.
There’s no “wasting their time”. These attacks are all automated, not some guy sitting at a keyboard running stuff interactively.
I get that.
I was generally (in my head) speaking about all my devices. If someone stole my computer, the full-disk encryption is more of a deterrent than a guarantee that my data is fully secure. My hope is that the third party is more likely to delete my data than to access it. If I catch the attention of someone who actually wants my data, I have bigger issues to worry about than the security of my electronic devices.
There is no hiding in that sense. Bots will scan all IPs on all ports over time.
Will it be less on nonstandard ports? Likely. Will it matter? Not really, the attack vectors would be exactly the same.
Secure your systems and running on default or nonstandard ports won’t be an issue.
Read up on shodan.io. Bot networks and scrapers can use its database as a seed to find open ports.
The CLI tool masscan can (under reasonable conditions) scan the entire IPv4 address space for a single port in about 3 minutes. It would take an estimated 74 years for masscan to scan all 64k ports across the entire IPv4 network.
So a seed like Shodan can complement scanners/scrapers, narrowing down IP addresses for further recon.
I honestly don't know if this answers your question, and I don't actually know how services in general deal with nonstandard ports, but I've written a lot of scanning agents (not AI, old-school agents) for red/blue team recon. I never started with raw internet guesses; I always used a seed: Shodan, or other scan results.
Thanks for the insight. It's useful to know what tools are out there and what they can do. I was only aware of `nmap` before, which I use to make sure the only ports open are the ports I want open.

My web-facing device only serves static sites and a file server with non-identifiable data I feel indifferent about being on the internet. No databases, and no stress if it gets targeted or goes down.
Even then, I still like to know how things work. Technology today is built on so many layers of abstraction, it all feels like an infinite rabbit hole now. It’s hard to look at any piece of technology as secure these days.
You can meaningfully portscan the entire internet in a trivial amount of time. Security by obscurity doesn’t work. You just get blindsided. Switching to a non-standard port cleans the logs up because most of the background noise targets standard ports.
It sounds like you’re doing alright so far. Trying not to get got is only part of the puzzle though. You also ought to have a backup and recovery strategy (one tactic is not a strategy). Figuring out how to turn worst-case scenarios into solvable annoyances instead of apocalypse is another (and almost equally as important). If you’re trying to increase your resiliency, and if your Disaster Recovery isn’t fully baked yet, then I’d toss effort that way.
Also do basic things like running your webserver in a VM, and I'm pretty sure you can write some script to block any IP that is port scanning. I would do that if I was hosting. Also remember to block port scanning in Firefox; it's not enabled by default. This helps keep you safe when you land on a webpage that scans your local ports.
Early when I was learning self hosting, I lost my work and progress a lot. Through all that I learned how to make a really solid backup/restore system that works consistently.
Each device I own has its own local backup. I copy those backups to a partition on my computer dedicated to backups, and that partition gets copied again to an external SSD which can be disconnected. Restoring from the external SSD to my computer's backup partition to each device all works to my liking. I feel quite confident in my setup; it took a lot of failure to gain that confidence.
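The three-tier copy described above can be sketched roughly like this. All paths here are throwaway placeholders, and `cp -a` stands in for whatever you'd actually use between real mount points (rsync, restic, etc.):

```shell
set -e
# hypothetical stand-ins for the device, the backup partition, and the SSD
demo=$(mktemp -d)
mkdir -p "$demo/device" "$demo/backup_partition" "$demo/external_ssd"
echo "important data" > "$demo/device/notes.txt"

# tier 1 -> tier 2: copy the device's local backup to the backup partition
cp -a "$demo/device/." "$demo/backup_partition/"

# tier 2 -> tier 3: mirror the partition to the detachable external SSD
cp -a "$demo/backup_partition/." "$demo/external_ssd/"

# the restore path is the same chain in reverse: SSD -> partition -> device
cat "$demo/external_ssd/notes.txt"
```

The nice property of this layout is that the external disk can sit unplugged most of the time, so a compromise of the live machines can't touch the last tier.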
I also spent time hardening my system. I went through this Linux hardening guide and applied what I thought would be appropriate for my web facing server. Since the guide seems more for a personal computer (I think), the majority of it didn’t apply to my use case. I also use Alpine Linux so there was even less I could do for my system but it was still helpful in understanding how much effort it is to secure a computer.
Exactly. Using nonstandard ports will clean up the logs a bit, but an actual attacker doesn't care what ports you use.
Given enough time, yes. Just look at shodan.
When I used to have SSH on a nonstandard port, I got login failures from bots. It really depends on the bot and how aggressive they have set it up.
There are a few very simple things that don’t improve security per se but help break the onslaught. One of them would be to not use standard ports for ssh etc. Another could be to use non-standard usernames (not “admin”). Or rename URLs from the standard “admin.php” or “/contact” to something else.
I use a different port for SSH, and I also use authorized keys. My sshd is set up to only accept keys, with no passwords and no keyboard-interactive input. Also, when I run `nmap` on my server, the SSH port does not show up. I've never been too sure how hidden the SSH port really is beyond the nmap scan, but I assume it would be discovered somehow if someone were determined enough.

In the past month I renamed my devices and account names to things less obvious. I also took the suggestion from someone in this community and set up my TLS to use wildcard domain certs. That way my subdomains aren't being advertised on the public lists used by Certificate Authorities. I simply don't use the base domain name anymore.
SSH keys are absolutely essential, but those are actual security as opposed to what I wrote above. I should’ve made that clearer.
> My SSHD is setup to only accept keys with no passwords and no keyboard input.
I don’t see how that improves security. Surely an SSH key with an additional passphrase is more secure than one without.
I agree with the last point. I only mentioned it because I don't really know which setting in my sshd config is hiding my SSH port from nmap scans; that just happened to be the last change I remember making before running an nmap scan again and finding my SSH port no longer showed up.
Accessing SSH still works as expected with my keys and for my use case, I don’t believe I need an additional passphrase. Self hosting is just a hobby for me and I am very intentional with what I place on my web facing server.
I want to be secure enough but I’m also very willing to unplug and walk away if I happen to catch unwanted attention.
Sounds like a healthy attitude towards online security.
I’m doing my first ever nmap scan right now, thanks for the inspiration. It’s taking a long time - either my ISP does not like what I’m doing there or I’m being too thorough - but it looks like it does not see my SSH port either.
I started with a local scan first: something like `nmap 192.168.40.xxx` for a specific device, or `nmap 192.168.40.0/24` for everything in your current network.

Nmap is quite complex, with lots of options, but there are plenty of guides online to help with the basics. You can press Enter in your terminal while a scan is running and it will print a progress report.
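As an aside on why a moved SSH port doesn't show up: a plain `nmap host` only probes roughly the 1,000 most common TCP ports, so a nonstandard SSH port is usually never checked at all. Illustrative invocations (addresses and port are placeholders):

```shell
# default scan: only the ~1,000 most common ports, so a daemon on
# an unusual port is simply never probed
nmap 192.168.40.xxx

# scan all 65,535 TCP ports -- much slower, but finds unusual ports
nmap -p- 192.168.40.xxx

# once a port is found, version detection identifies it as SSH anyway
nmap -sV -p 23456 192.168.40.xxx
```

So the port is hidden from casual scans, but not from anyone willing to spend the time on a full `-p-` sweep.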
Some attackers check services that have already cataloged the services you are running, even on uncommon ports. You won’t hear from them unless you are running a potentially vulnerable service.
Moving your port to a nonstandard one is not a solution (unless the problem you're experiencing is too many logs from sshd, and even then, logrotate exists); it's security by obscurity, which doesn't really solve anything at all. The only way your server will be safe is by ensuring the packages on it are up to date and hardening it up to the point where it doesn't become too much of a nuisance.
It helps; honestly, one of the best defensive strategies is defence in depth.