Hey everybody. I found this interesting. It’s likely not a game changer for anyone, but “hardware” watchdogs in Proxmox were new to me, and they turned out to be a cheap, easy, hacky fix for a low-value VM that was periodically hanging. This is a nice tool to add to the belt. Hope you all enjoy!
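For anyone curious what the guest side of a watchdog looks like, here is a minimal sketch, not the exact setup I used. It assumes a Linux guest where the emulated device shows up as /dev/watchdog (the usual Linux convention). The function name `feed_watchdog` and the `is_healthy` check are hypothetical, made up for illustration. The idea: keep writing to the device while things look healthy, and simply stop writing when they don't, so the timer expires and the hypervisor takes whatever action it was configured with.

```python
import time
from typing import BinaryIO, Callable, Optional


def feed_watchdog(
    dev: BinaryIO,
    is_healthy: Callable[[], bool],
    interval_s: float = 10.0,
    max_beats: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write one keepalive byte per interval while is_healthy() returns True.

    Returns the number of keepalives written. All names here are
    illustrative; this is a sketch, not a replacement for a real
    watchdog daemon.
    """
    beats = 0
    while max_beats is None or beats < max_beats:
        if not is_healthy():
            # Stop feeding and skip the magic close below, so the timer
            # is left running and eventually expires.
            return beats
        dev.write(b"\0")  # any write counts as a keepalive on Linux
        dev.flush()
        beats += 1
        sleep(interval_s)
    # Deliberate, clean stop: writing 'V' just before closing is the Linux
    # "magic close" that asks the driver to disarm the timer (drivers
    # loaded with nowayout ignore it).
    dev.write(b"V")
    dev.flush()
    return beats


# Assumed usage inside the guest (opening the device arms the timer):
#
#     with open("/dev/watchdog", "wb", buffering=0) as dev:
#         feed_watchdog(dev, is_healthy=my_check)
```

In practice a packaged watchdog daemon does this job for you. The sketch is only meant to show why a hung guest gets reset: nothing is left running to write the keepalive.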



Nice.
Other actions are possible with watchdog timers, especially with hypervisors. They can invoke a script or use an agent to kill a misbehaving process.
Ultimately, the best solution is not to need the timers at all, so finding the culprit within the client is ideal, though not always possible.
VMs that hang on memory often have incorrect caching policies, so you may want to investigate that.
You’re absolutely right! I’d point you back to my notion of cost-benefit analysis. Anything more than the 20 minutes I’ve spent on analysis so far isn’t worth my time. If the VM falls over permanently, that was a risk I accepted, and the time I’ve saved has already been worth it. If I were looking at something like a production file server or domain controller, sure, I’d spend more time on it. More likely, though, I’d just have engineered it better in the first place. Not every problem warrants a high-precision solution. 🙂