Watchdog Timers
Using hardware watchdogs to work around hardware and software lockups.
Talos Linux now supports configuring hardware watchdog timers.
A hardware watchdog timer resets (reboots) the system if the software stack becomes unresponsive.
Consult your hardware or VM documentation to check whether a hardware watchdog timer is available.
The implementation of the watchdog device can be queried with:
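A minimal sketch of such a query, assuming `talosctl` is already configured for the target node and that the first watchdog device is exposed as `watchdog0` (the device name is an assumption and may differ on your hardware):

```shell
# List the watchdog devices the kernel has registered.
talosctl ls /sys/class/watchdog/

# Print the driver identity of the first device (watchdog0 is assumed).
talosctl read /sys/class/watchdog/watchdog0/identity
```

If the directory listing is empty, the kernel has not detected a watchdog device on that node.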
To enable the watchdog timer, patch the machine configuration with the following:
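As an illustration, a patch of this kind might look like the following, assuming the device is `/dev/watchdog0` and using the arbitrary file name `watchdog.yaml` (both are assumptions, not requirements):

```shell
# Write the watchdog patch to a file; the file name is arbitrary.
cat > watchdog.yaml <<'EOF'
apiVersion: v1alpha1
kind: WatchdogTimerConfig
device: /dev/watchdog0   # assumed device path; adjust to your hardware
timeout: 5m              # the 5-minute timeout described in this section
EOF
```

The file can then be applied to the node with `talosctl patch mc --patch @watchdog.yaml`.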
Talos Linux will set up the watchdog timer with a 5-minute timeout and will keep resetting the timer to prevent the system from rebooting.
If the software becomes unresponsive, the watchdog timer will expire, and the system will be reset by the watchdog hardware.
To inspect the watchdog timer status, run:
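A sketch of the inspection commands follows. The resource type names are assumptions derived from the configuration document kind; `talosctl get rd` lists the resource definitions actually available on the node.

```shell
# Show the watchdog configuration Talos has accepted (resource name assumed).
talosctl get watchdogtimerconfig

# Show the runtime status of the watchdog timer (resource name assumed).
talosctl get watchdogtimerstatus
```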
The current status of the watchdog timer can also be inspected via Linux sysfs:
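For example, assuming the device is `watchdog0` and the kernel exposes the standard watchdog sysfs attributes, the state and timeout can be read directly:

```shell
# Reports whether the timer is currently armed ("active") or not ("inactive").
talosctl read /sys/class/watchdog/watchdog0/state

# Reports the configured timeout, in seconds.
talosctl read /sys/class/watchdog/watchdog0/timeout
```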