I just wanted to see what other selfhosters’ emergency backup plans are if the primary internet router goes offline but the internet isn’t out (i.e., a router reboot would fix the problem), leaving you without access to your stuff, even via VPN.
The options I’ve considered so far:
- Cellular smart plug to reboot the router
I tried a Ubibot smart plug (SP1) that’s supposed to work over cellular, but the device or SIM is bad; I’m currently troubleshooting. The problem with this one is that it requires a proprietary cloud service. It’s supposedly self-hostable, but it’s a pain to set up, and the app’s port can’t easily be changed to allow a reverse proxy setup on a VPS.
- Cellular WiFi router, with a WiFi smart plug connected to it to reboot the main router
What other options have I overlooked? Also, specific device models from anyone already doing this would be helpful.
TIA!
Edit: also just thought of possibly adding a cellular internet backup on my OPNsense box, but from everything I’ve read that’s also very involved to set up.
Edit2: I’ve set up a Home Assistant automation to power cycle the router via a Zigbee smart plug if 2 external hosts are down for 15 minutes; I’ll try this out for a bit. I still need to troubleshoot why the device goes down in general. Thanks for all the responses and ideas!
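For reference, here’s roughly the same check-and-power-cycle logic as a standalone shell script driving the plug through Home Assistant’s REST API, in case the automation route doesn’t pan out. The URL, token, and entity ID below are hypothetical placeholders:

#!/bin/sh
# Sketch only: check two external hosts and, if BOTH are down, power cycle
# the router via the Zigbee plug using Home Assistant's REST API.
# HA_URL, HA_TOKEN, and the entity ID are placeholders; the token is a
# long-lived access token from your HA user profile. Run from cron.
HA_URL="http://homeassistant.local:8123"
HA_TOKEN="YOUR_LONG_LIVED_TOKEN"
PLUG="switch.router_plug"

ha_switch() {  # $1 = turn_on | turn_off
  curl -fsS -X POST "$HA_URL/api/services/switch/$1" \
    -H "Authorization: Bearer $HA_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"entity_id\": \"$PLUG\"}" >/dev/null
}

# Only act when both external hosts are unreachable.
if ! ping -c 3 -W 2 9.9.9.9 >/dev/null 2>&1 \
   && ! ping -c 3 -W 2 1.1.1.1 >/dev/null 2>&1; then
  ha_switch turn_off
  sleep 10
  ha_switch turn_on
fi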
I use a cheap Mikrotik LTE router as a second route. It has the smallest data plan my provider offers, but it’s enough for maintenance, and if I need more because the main line is faulty, it’s the same provider’s fault and they pay the bill anyway.
It mainly goes into the OPNsense as a second gateway, but it also allows me to VPN in and reboot the OPN if needed.
If the OPN were totally fucked, in theory I could run the network directly over it, but that would be nasty.
A friend of mine actually has a pretty nifty solution, but he’s an absolute pro at these things. He has a small device (don’t ask me which SBC exactly) ping and check (I think DNS and an HTTP check are included as well) various stages of his network, including his core switch, firewall, and DSL modem. If one of them freezes, the device sends a data packet via LoRaWAN. He can then send a downstream command to reboot the devices.
I have a Shelly plug that is programmed internally to turn itself back on after being off for 20 seconds. Home Assistant turns the plug off for 10 seconds if curl ip.me fails for 15 minutes.
My modem is plugged into that.
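If you’d rather not lean on Home Assistant for the watchdog part, the same logic fits in a small script run every minute from cron on any always-on LAN box. A minimal sketch, assuming a Gen1-style Shelly local HTTP API and a made-up plug address; the plug’s own auto-on rule restores power, so the script only ever switches it off:

#!/bin/sh
# Minimal watchdog sketch; run every minute from cron on a LAN host.
# SHELLY_IP is a hypothetical address; /relay/0?turn=off is the Gen1-style
# Shelly local API. The plug's internal "on after 20 s off" rule turns the
# modem back on, so we never have to turn it on ourselves.
SHELLY_IP="192.168.1.50"
STATE=/tmp/inet-fail-count

if curl -fsS --max-time 10 http://ip.me >/dev/null 2>&1; then
  echo 0 > "$STATE"
  exit 0
fi

FAILS=$(( $(cat "$STATE" 2>/dev/null || echo 0) + 1 ))
echo "$FAILS" > "$STATE"

# 15 consecutive one-minute failures ~= 15 minutes offline
if [ "$FAILS" -ge 15 ]; then
  curl -fsS "http://$SHELLY_IP/relay/0?turn=off" >/dev/null
  echo 0 > "$STATE"
fi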
I’ve never had a router or firewall crash unless I was fucking with it and did something ill-advised, so I don’t try that kind of stuff unless I’m home.
The term to look for is out-of-band management. Typically this provides serial/console access to a device and can often perform actions like power cycling. A lot of server hardware has this built in (e.g., iDRAC for Dell, IPMI generically). Some users run a separate OOBM network for remotely accessing/managing everything else.
I made an 8-outlet box with relays connected to each outlet (might post a how-to). That’s connected to a Pi via GPIO.
The Pi runs PiKVM, but also has a service that:
- Checks if the router can be pinged
- Checks if the internet can be pinged
- Checks if the router webUI is up
If any of those fail, it toggles the plugs for modem and router.
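A minimal sketch of that check, assuming raspi-gpio is available on the Pi and that the modem and router relays sit on hypothetical GPIO lines 17 and 27, wired active-high:

#!/bin/sh
# Watchdog sketch for the relay box above; run every few minutes from cron.
# ROUTER_IP and the GPIO line numbers are assumptions for illustration.
ROUTER_IP="192.168.1.1"

cycle_outlet() {
  raspi-gpio set "$1" op dl   # drive relay low: outlet off
  sleep 10
  raspi-gpio set "$1" dh      # drive high: outlet back on
}

fail=0
ping -c 3 -W 2 "$ROUTER_IP" >/dev/null 2>&1 || fail=1             # router pingable?
ping -c 3 -W 2 1.1.1.1 >/dev/null 2>&1 || fail=1                  # internet pingable?
curl -fsS --max-time 10 "http://$ROUTER_IP/" >/dev/null || fail=1 # webUI up?

if [ "$fail" -ne 0 ]; then
  cycle_outlet 17   # modem
  cycle_outlet 27   # router
fi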
I run OPNsense on a 5V mini PC. I have a second one and will be setting up CARP, too.
Note: Cellular backup is more involved, but a separate Cellular inbound might not be. I’ve considered putting one on the Pi above.
If my lab goes down, it sucks, but that’s it. I have no critical services running there.
I have some recoverability, but it requires the main router to be running. If it isn’t, it’s either a HW failure, which I won’t fix remotely anyway, or the power is down, in which case there’s not much I can do either.
I have a router running OpenWRT with WireGuard, and my main server (a NUC) is on a smart plug. If the router is up and the server is misbehaving to the point where I can’t reboot it, I can power cycle it via the smart plug connected to the router.
You mentioned your brother lives 30 mins away; well, put some tiny server in his house. Having everything at your home isn’t built for redundancy at all. That’s just risk management: if you absolutely need access to your server, then 1 site is not going to cut it.
I have my ‘incident recovery’ docs on my server.
It went down once, and when that clicked, my single thought was ‘fuck’ haha.
For me I’d just say oh well, gotta fix it when I’m home again.
Otherwise I’d probably write a small script on the server that reboots my router when the server either loses internet or can’t reach the router anymore.
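A minimal sketch of that, run from cron on the server; the router address and SSH user are placeholders, and it assumes the router accepts key-based SSH:

#!/bin/sh
# Sketch: reboot the router over SSH when the server loses internet or
# can't reach the router. Only helps while the router still answers SSH;
# a fully frozen router needs a power cycle instead.
ROUTER="192.168.1.1"   # placeholder address

if ! ping -c 3 -W 2 1.1.1.1 >/dev/null 2>&1 \
   || ! ping -c 3 -W 2 "$ROUTER" >/dev/null 2>&1; then
  ssh -o ConnectTimeout=5 -o BatchMode=yes root@"$ROUTER" reboot
fi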
This only works if you’re planning on being home within a reasonable time. The situation that got me thinking about it in the first place: I was out of state for several weeks, my router went down a couple of days into my trip, and I had no access to Jellyfin (mind you, at the time this was really the only service I wanted). So I had to call my brother, who lives 30 mins away, to go reboot my router.
It hasn’t happened in more than 5 years…
For critical equipment you need to spend the extra dollar to minimize this kind of stuff
I don’t understand this comment? Your router hasn’t gone down in 5 years, is that what you’re saying?
Yep, at least not from a mistake of their own. I’ve rebooted them a few times over those years for updates and important config changes. They’re even resilient to power outages (2, iirc); they’re set to boot up automatically when the power comes back.
Grammar aside, not having to reboot a router in 5 years isn’t unreasonable.
If you expect it to be flaky you could get one of those old school mechanical time switches with the clicky pegs (or a more modern digital equivalent) and just have it set to power down for 1 click, normally 15 mins, at 4am or whenever suits you - minimally technically complicated and guaranteed stability through planned instability!
You have the potential to run into issues if the device is externally managed. AT&T likes to push firmware updates in the early hours, and cutting power during one of those would be problematic.
I’ve got one of those KeepConnect smart plugs, which monitors a few different external servers plus their own cloud and automatically power cycles its outlet if things don’t work. They’ve damn near doubled in price since I bought mine, but it works very well for me. The annual fee is reasonable too.
I could build something similar, but I have too many projects as it is, and I feel I’d be fiddling with it endlessly just because I can. This is literally set and forget, and in the last 2 years it’s cycled the outlet 48 times, most of them in the middle of the night, presumably during my cable provider’s maintenance windows.
I’m thinking this is probably my best option if I can get past the cloud-service issue in my head lol.
I do recall seeing the KeepConnect a while ago but completely forgot about it. Will definitely look into this! I guess the main issue I see is that it uses a cloud service; what happens when that service goes offline permanently?
If you were able to capture some traffic, you could probably figure out what it’s hitting and the response it’s looking for, then override that DNS entry and fake it from your homelab or your own cloud-hosted app/lambda/API.
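A rough sketch of that idea, assuming the plug does a plain-HTTP poll against a fixed hostname; the hostname, IP, and expected reply here are all made up, and you’d get the real ones from the packet capture:

# On a dnsmasq-based resolver (Pi-hole, OpenWrt, etc.), point the plug's
# check-in hostname (hypothetical) at a LAN host:
echo 'address=/check.vendor.example/192.168.1.10' > /etc/dnsmasq.d/keepconnect.conf
/etc/init.d/dnsmasq restart   # or: systemctl restart dnsmasq

# Dead-simple responder on 192.168.1.10 faking a healthy reply
# (traditional netcat flags; BSD nc would be "nc -l 80"):
while true; do
  printf 'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nOK' | nc -l -p 80 -q 1
done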
On holiday, I’ll turn on my router’s auto-reboot option to reboot daily.
That’s not a bad idea, I def didn’t think of that lol. Thanks!
Redundancy. I have two independent firewalls, each separately routing traffic out through two totally independent multi-homed network connections (one cable, one DSL, please god somebody give me fiber someday) that both firewalls have access to. For a while I was thinking of replacing the DSL with Starlink until Elon turned out to be such a pile of nazi garbage, so for now DSL remains the backup link.
To make things as transparent as possible, the firewalls manage their IPs with CARP. Obviously there’s no way to have a single public IP that ports itself magically from one ISP to another, but on the LAN side it works great and on the WAN side it at least smooths out a lot of possible failure scenarios. Some useful discussions of this setup are here.
I had a 4G modem with a web interface many years ago. It was flaky and would often hang. I just had a raspberry pi on my network pinging some known address, if it failed for long enough it’d replay the commands to restart the web interface.
If I’d have the same problem today I’d probably have home assistant power cycle the router with a smart plug.
Home Assistant access would require internet, wouldn’t it?
Nah, you can use an HA Ping trigger (Settings > Add Integration > Ping) against 9.9.9.9 or whatever and run a script if it comes back false for X minutes.
I already run homeassistant, that’s def something to look into. Thanks!
I buy better gear that doesn’t regularly require a reboot
My mikrotik has not NEEDED a reboot ever, except when I run upgrades. Everything is set up to auto recover when disconnects happen, and power up properly if there’s an extended power failure that causes UPS shutdowns.
I will never understand why people think rebooting their router regularly is a normal thing. That just means your gear or setup is crap.
I get what you mean, I only use L3 top-of-rack data center switches, what a bunch of amateur peasants!
That’s called unnecessary overkill and you’ll introduce failures from excess complexity.
Actually, they’re a reduction in complexity. Yes, I’m not using most of the features, and they run in L2, but their backbone runs off an x86 single-board computer and they run a mostly hardware-agnostic OS (SONiC).
This is what I mean by a reduction in complexity: it’s basically running Debian with PCIe switchdev interfaces on a PC. It’s familiar and stable, not locked in to proprietary hardware, and they’re cheap and plentiful.
My Mikrotik routers and switches also reboot in seconds (even for upgrades), which I’ve never seen consumer gear do!
Even my Ubiquiti switches seem to take a minute or so to start forwarding traffic after a reboot; whilst my Mikrotik switches reboot faster than any of my unmanaged switches start up.
Cisco, HP, and many other “Enterprise” switches will take a minute or two to start forwarding frames after boot.
Doesn’t really excuse Ubiquiti but that’s what they’re trying for.
You have to say what your installation is like. If it’s typical consumer cable modem crap that locks up and needs a power cycle now and then, the simplest approach might just be to add a remote power cycle mechanism:
https://www.adafruit.com/product/2935
isn’t the cheapest but it’s nicely packaged. That’s just a switchable power strip, so yes you’d need some kind of cellular internet or meshtastic or something to operate it if you want to do it manually, or else just have something automatically power cycle the router if it notices the internet down for more than 3 minutes or something.
In the more serious case where your box is at a data center, you generally open a ticket with the data center and ask them to reboot the box (“remote hands”). Sometimes they will do that for free, other times they charge you.
0 4 * * * /usr/sbin/reboot
Adjust interval as needed.
Or if you want something a bit faster and less disruptive:
#!/bin/sh
NAME="$0"

logger_cmd () {
  echo "$@"
  logger -p daemon.info -t "$NAME[$$]" "$@"
}

# Make sure ncat is available (installed via OpenWrt's opkg).
if ! which ncat 1>/dev/null
then
  logger_cmd "ncat not found, installing..."
  opkg update && opkg install ncat
fi

# Try a quick TCP connection to host/port; succeed quietly, log failures.
chk_conn () {
  echo "Checking connectivity to $@"
  if ncat --send-only --recv-only -w 334ms "$@" 2>/dev/null; then
    return 0
  fi
  logger_cmd "Cannot reach $@"
  return 1
}

restart_network_iface() {
  # Cooldown lock so we don't restart the interface every run.
  COOLDOWN_LOCK=/tmp/internet-connectivity-watchcat.tmp
  COOLDOWN_SECONDS=300
  cooldown_time_end=$(cat "$COOLDOWN_LOCK" 2>/dev/null || echo 0)
  time_now="$(cat /proc/uptime)"
  time_now="${time_now%%.*}"
  cooldown_time_left=$((cooldown_time_end - time_now))
  if [ "$cooldown_time_left" -lt 1 ]
  then
    logger_cmd "Restarting network interface: \"$1\"."
    ifdown "$1"
    ifup "$1"
    cooldown_time_end=$((time_now + COOLDOWN_SECONDS))
    echo "$cooldown_time_end" > "$COOLDOWN_LOCK"
  else
    logger_cmd "Skipping interface \"$1\" restart due to cooldown. Cooldown left: $cooldown_time_left seconds"
  fi
}

# One reachable host out of five is enough to call the link up.
logger_cmd "Checking internet connectivity..."
if chk_conn google.com 443 \
  || chk_conn amazon.com 443 \
  || chk_conn facebook.com 443 \
  || chk_conn cloudflare.com 443 \
  || chk_conn telekom.de 443
then
  logger_cmd "Connected to internet."
else
  logger_cmd "Not connected to internet."
  restart_network_iface "$1"
fi
In restart_network_iface, use /usr/sbin/reboot instead of the interface down/up, and run the script every few minutes via cron or a systemd timer. This was written for OpenWrt, so if you use that you can use it as-is; for other systems you’d also have to adjust the logger_cmd. You can place it on another machine and have it signal a smart plug instead if you’re worried about a locked-up/frozen router. That said, if your router freezes like that you should probably replace it, and whatever you replace it with should be able to run this script itself.
I will give this a shot. It hasn’t happened in a couple of weeks, so I can’t remember if the device freezes completely or if an interface restart would do the trick.
Even if it isn’t an OpenWRT router, if you have a hardwired server it can probably do a soft reset of the router or even the modem (most modems I’ve used have had a web interface). If your router is in such a bad state that it only responds to a hard reset, it’s probably reaching EoL.