When I first got into self-hosting, I originally wanted to join the Fediverse by hosting my own instance. After realizing I wasn’t that committed to the idea, I went in a simpler direction.

Originally I was using Cloudflare’s tunnel service. Watching the logs, I would get traffic from random corporations and places.

Having grown uncomfortable with Cloudflare after pivoting away from social media, I learned how to secure my device myself and started serving through a reverse proxy on an uncommon port. My logs now only ever show activity when I am connecting to my own site.

Which is what led me to this question.

What do bots and scrapers look for when they come to a site? Do they mainly target well-known ports like 80 or 22, looking for vulnerabilities? Do they ever scan other ports for common services that may be insecure? Is it even worth their time to scan for open ports?

Seeing as I am tiny and obscure, I most likely won’t need to do much research into protecting myself from such threats, but I am still curious about the threats that bots pose to other self-hosters and larger platforms.

  • derek@infosec.pub · 7 points · 1 day ago

    You can meaningfully portscan the entire internet in a trivial amount of time. Security by obscurity doesn’t work. You just get blindsided. Switching to a non-standard port cleans the logs up because most of the background noise targets standard ports.
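
    To make that concrete, a tool like masscan will sweep one port across the whole IPv4 space from a single command. Everything below (the port, rate, and filenames) is an arbitrary illustration, not something specific to this thread:

    ```
    # Sweep all of IPv4 for a single TCP port. At 100k packets/sec this takes
    # roughly half a day; masscan's docs claim minutes at its top rates.
    # Always supply an exclude list so you skip networks you must not touch.
    masscan -p8443 0.0.0.0/0 --rate 100000 --excludefile exclude.conf -oL hits.txt
    ```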

    It sounds like you’re doing alright so far. Trying not to get got is only part of the puzzle though. You also ought to have a backup and recovery strategy (one tactic is not a strategy). Figuring out how to turn worst-case scenarios into solvable annoyances instead of an apocalypse is another part, and almost as important. If you’re trying to increase your resiliency and your disaster recovery isn’t fully baked yet, I’d toss effort that way.

    • DarkAri@lemmy.blahaj.zone · 2 points · 11 hours ago

      Also do basic things like running your webserver in a VM. You can also write a script that automatically blocks any IP that port-scans you (a rough sketch follows). I would do that if I were hosting. Also remember to block port scanning in Firefox; it isn’t enabled by default, and it helps keep you safe when you land on a webpage that scans your local ports.
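
      (Not from the thread, just one common way to implement that idea: the iptables “recent” module can flag and drop scanners with no extra daemon. The canary port below is an arbitrary choice; anything no real service listens on works.)

      ```
      # Drop everything from IPs flagged as scanners within the last hour
      iptables -A INPUT -m recent --name portscan --rcheck --seconds 3600 -j DROP
      # Flag any IP that probes an unused "canary" port (2222 is arbitrary)
      iptables -A INPUT -p tcp --dport 2222 -m recent --name portscan --set -j DROP
      ```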

      • derek@infosec.pub · 1 point · 2 hours ago

        Absolutely. VMs and containers are the wise sysadmin’s friends. Instead of rolling my own IP blocker I use Fail2Ban on public-facing machines. It’s invaluable.
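
        For reference, a minimal Fail2Ban setup is a single drop-in file; the thresholds here are illustrative, not anything in particular from this thread:

        ```
        # /etc/fail2ban/jail.local -- minimal sketch; tune to taste
        [sshd]
        enabled  = true
        maxretry = 3      # ban after 3 failed logins...
        findtime = 600    # ...within 10 minutes
        bantime  = 3600   # ban lasts one hour
        ```

        Fail2Ban tails the auth log and manages the firewall rules itself, which is the “script that blocks bad IPs” idea without the DIY maintenance.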

    • confusedpuppy@lemmy.dbzer0.com (OP) · 2 points · 11 hours ago

      Early on, when I was learning self-hosting, I lost my work and progress a lot. Through all of that I learned how to make a really solid backup/restore system that works consistently.

      Each device I own has its own local backup. I copy those backups to a partition on my computer dedicated to backups, and that partition gets copied again to an external SSD which can be disconnected. Restoring from the external SSD to my computer’s backup partition to each device all works to my liking. I feel quite confident with my setup. It took a lot of failure to gain that confidence.
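
      (For anyone who wants a concrete shape for that pipeline: with invented mount points, since the actual paths weren’t shared, the two-hop copy is just two rsync invocations.)

      ```
      # Hop 1: per-device backups -> dedicated backup partition (paths are hypothetical)
      rsync -a --delete /srv/device-backups/ /mnt/backup-partition/
      # Hop 2: backup partition -> detachable external SSD
      rsync -a --delete /mnt/backup-partition/ /mnt/external-ssd/
      ```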

      I also spent time hardening my system. I went through this Linux hardening guide and applied what I thought would be appropriate for my web-facing server. Since the guide seems aimed more at a personal computer (I think), the majority of it didn’t apply to my use case. I also use Alpine Linux, so there was even less I could do for my system, but it was still helpful in understanding how much effort it takes to secure a computer.

      • derek@infosec.pub · 1 point · 2 hours ago

        That sounds pretty good to me for self-hosted services you’re running just for you and yours. The only addition I have on the DR front is implementing an off-site backup as well. I prefer restic for file-level backups, Proxmox Backup Server for image backups (Clonezilla works in a pinch), and Backblaze B2 for off-site storage. They’re reliable and reasonably priced. If a third-party service isn’t in the cards, then get a second SSD and put it in a safety deposit box, or bury it on the other side of town or something, and swap the two backup disks once a month.
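
        As a sketch of the restic-to-B2 leg (the bucket and path names are made up, and restic expects the B2 credentials and repository password in the environment):

        ```
        # One-time setup; assumes B2_ACCOUNT_ID, B2_ACCOUNT_KEY, and
        # RESTIC_PASSWORD are exported. Bucket name is hypothetical.
        restic -r b2:my-backups:host1 init
        # Recurring: back up, then expire old snapshots
        restic -r b2:my-backups:host1 backup /srv/device-backups
        restic -r b2:my-backups:host1 forget --keep-daily 7 --keep-weekly 4 --prune
        ```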

        The point is to make sure you’re following the 3-2-1 principle: three copies of your data, on two different storage media, with at least one copy at a remote location. If disaster strikes and your home disappears, you want something to restore from rather than losing absolutely everything.

        Extending your current setup to ship the external SSD’s contents out to B2 would likely just be a matter of pointing rclone at your B2 bucket (rsync can’t speak B2’s API, but rclone fills the same role) and scheduling a cron job or systemd timer to run it.
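
        A minimal version of that schedule might look like this; the remote and bucket names are assumptions, and `rclone config` has to be run once beforehand:

        ```
        # crontab entry: mirror the SSD's contents to B2 every night at 03:00
        0 3 * * * rclone sync /mnt/external-ssd b2remote:my-backup-bucket
        ```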

        After that, if you’re itching for more, I’d suggest reading/watching some Red Team content like the material at hacker101.com and sans.org. OWASP (owasp.org) is also building some neat educational tools. Getting a better understanding of the what and why behind internet background noise and threat-actor patterns is powerful.

        You could also play around with Wazuh if you want to launch straight into the Blue Team weeds. Understanding the attacking side is essential to being an effective defender, but deeper learning anywhere across the spectrum is always a good thing. Standing up a full-blown SIEM/XDR, for free, offers a lot of education.

        P.S. I realize this is all tangential to your OP. I don’t care for the grizzled killjoys who chime in with “that’s dumb, don’t do that” or similar, offer little helpful insight, and trot off arrogantly over the horizon on their high horse. I wanted to be sure I offered actionable suggestions for improvement and was tangibly helpful.