When I first got into self-hosting, I originally wanted to join the Fediverse by hosting my own instance. After realizing I wasn’t that committed to the idea, I went in a simpler direction.

Originally I was using Cloudflare’s tunnel service. Watching the logs, I would get traffic from random corporations and places.

Being uncomfortable with Cloudflare after pivoting away from social media, I learned how to secure my device myself and started using an uncommon port with a reverse proxy. My logs now only ever show activity when I am connecting to my own site.

Which is what led me to this question.

What do bots and scrapers look for when they come to a site? Do they mainly probe well-known ports like 80 or 22 for vulnerabilities? Do they ever scan other ports looking for common services that may be insecure? Is it even worth their time to scan for open ports?

Seeing as I am tiny and obscure, I most likely won’t need to do much research into protecting myself from such threats, but I am still curious about the threats that bots pose to other self-hosters or larger platforms.

  • smiletolerantly@awful.systems · 3 hours ago

    My ISP blocks incoming data to common ports unless you get a business account.

    Oof, sorry, that sucks. I think you could still go the route I described, though: for your domain example.com and an example service myservice, listen on port :12345 and drop everything that isn’t requesting myservice.example.com:12345. Then forward the matching requests to your service’s actual port, e.g. 23456, which is closed to the internet.

    Edit: and just to clarify, for a second service otherservice you do not need to open a second port; stick with the one, but in addition to myservice.example.com:12345, also accept requests for otherservice.example.com:12345 and proxy those to the (again, closed-to-the-internet) port :34567.
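
    A rough Caddyfile sketch of what I mean (the domain, the ports, and the service names are just the placeholders from above, not a tested config):

        # only hostnames you know about get proxied, each to a loopback-only port
        myservice.example.com:12345 {
            reverse_proxy 127.0.0.1:23456
        }

        otherservice.example.com:12345 {
            reverse_proxy 127.0.0.1:34567
        }

        # catch-all on the same port: anything else just gets dropped
        :12345 {
            abort
        }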

    The advantage here is that bots cannot guess from your ports what software you are running, and since caddy (or any of the mature reverse proxies) can be expected to be reasonably secure, I would not worry about bots being able to exploit the reverse proxy’s port. Bots also no longer have a direct line of communication to your services. In short, the routine of “let’s scan ports; ah, port x is open indicating use of service y; try automated exploit z” gets prevented.

    • confusedpuppy@lemmy.dbzer0.com (OP) · 2 hours ago

      I think I am already doing that. My Kiwix docker container’s port is set to 127.0.0.1:8080:8080 and my reverse proxy is only open on port 12345, but it will proxy kiwi.example.com:12345 to port 8080 on the local machine.

      I’ve learned that Docker likes to manipulate iptables without giving any notice to other programs like UFW. I have to be explicit about making sure Docker containers only announce themselves to the local machine.
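
      For reference, the loopback-only binding looks something like this in docker-compose (the image name is just an example; the ports line is what keeps the container off the public interface):

          services:
            kiwix:
              image: ghcr.io/kiwix/kiwix-serve   # example image, use whatever you actually run
              ports:
                - "127.0.0.1:8080:8080"          # reachable only from the local machine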

      I’ve also used this guide to harden Caddy and adjusted it to my needs. I took the advice from another user and use a wildcard domain cert instead of issuing certs for each subdomain, so only the wildcard entry is visible when I look my domain up at https://crt.sh/ and I’m not advertising the subdomains I’m actually using.
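
      In Caddyfile terms, the wildcard setup is roughly this (a wildcard cert needs the DNS challenge, so this assumes a DNS provider plugin is compiled in and an API token in an environment variable; the provider name and token are placeholders):

          *.example.com:12345 {
              tls {
                  dns yourprovider {env.DNS_API_TOKEN}   # placeholder provider plugin + token
              }

              @kiwi host kiwi.example.com
              handle @kiwi {
                  reverse_proxy 127.0.0.1:8080           # the loopback-only container port
              }

              # unknown subdomains get dropped instead of leaking anything
              handle {
                  abort
              }
          }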

      • smiletolerantly@awful.systems · 1 hour ago

        TBH, it sounds like you have nothing to worry about then! Open ports aren’t really an issue in and of themselves; they are problematic because the software listening on them might be vulnerable, and the (standard) ports can reveal the nature of the application, making it easier to target specific software with an exploit.

        Since bots have no way of finding out what services you are running, they could only attack Caddy, which I’d put down as a negligible danger.