How to diagnose a complete system freeze (no REISUB, no mouse/kb, have to hard reset)?

MindfulMaverick@piefed.zip · 2 months ago

devtoolkit_api@discuss.tchncs.de · 2 months ago

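When even REISUB does not work, the kernel itself has stopped responding, which usually points to hardware or a kernel/driver lockup rather than ordinary userspace software (assuming SysRq is enabled at all: cat /proc/sys/kernel/sysrq prints 0 when it is disabled). Here is my debugging checklist for hard freezes: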

Step 1: Rule out RAM

Boot memtest86+ from a USB stick (it runs on its own, outside any OS; some live images offer it in their boot menu) and let it run overnight. Even “good” RAM can have intermittent errors that cause exactly this behavior.

Step 2: Check thermals

Install lm-sensors and run sensors before and during heavy load.
Also check GPU temps if you have a dedicated GPU: nvidia-smi for Nvidia, or for AMD: cat /sys/class/drm/card0/device/hwmon/hwmon*/temp1_input (the value is in millidegrees Celsius, so 65000 means 65 °C).
A CPU that throttles and still keeps overheating can freeze or power off instantly.
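
A freeze usually takes the last sensor reading with it, so one option is to log temperatures to disk while the load runs and read the end of the log after the hard reset. A minimal sketch, assuming a Linux system that exposes its sensors under /sys/class/hwmon; the function names and the default log path are made up for illustration.

```shell
# Convert a sysfs hwmon reading (millidegrees Celsius) to degrees Celsius.
millideg_to_c() {
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Append one timestamped line per readable temp*_input sensor to a log file.
# The default path is an arbitrary choice for this sketch.
log_temps() {
    log="${1:-$HOME/temps.log}"
    for f in /sys/class/hwmon/hwmon*/temp*_input; do
        [ -r "$f" ] || continue
        raw=$(cat "$f" 2>/dev/null) || continue
        printf '%s %s %s\n' "$(date '+%F %T')" "$f" "$(millideg_to_c "$raw")" >> "$log"
    done
    sync   # flush to disk so the last readings survive a hard reset
}
```

Run it in a loop in a spare terminal, for example while true; do log_temps; sleep 5; done, then check the last lines of the log after the next freeze. If the final readings are nowhere near the critical limits that sensors reports, heat becomes a less likely suspect.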

Step 3: GPU driver

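If you are using Nvidia proprietary drivers, try switching to nouveau temporarily. Nvidia driver bugs are a common cause of hard lockups on Linux.
After the reboot, check the kernel log of the boot that froze: journalctl -k -b -1 | grep -iE 'nvidia|nvrm|gpu' (plain dmesg only shows the current boot, so it will not contain anything from before the reset).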
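
When switching drivers it helps to confirm which kernel driver the GPU is actually bound to afterwards. A small sketch that reads the output of lspci -k on stdin and prints the driver for each display device; the function name is hypothetical, and it assumes the usual lspci -k layout where the indented "Kernel driver in use:" line follows the device line.

```shell
# Print the kernel driver bound to each GPU, given `lspci -k` output on stdin.
gpu_driver_in_use() {
    awk '
        # A line that does not start with whitespace begins a new device entry.
        substr($0, 1, 1) != "\t" && substr($0, 1, 1) != " " {
            gpu = ($0 ~ /VGA compatible controller|3D controller|Display controller/)
        }
        # Inside a GPU entry, report the driver name (last field on the line).
        gpu && /Kernel driver in use:/ { print $NF }
    '
}
```

Usage: lspci -k | gpu_driver_in_use. After a successful switch this should print nouveau instead of nvidia.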

Step 4: Kernel logs from previous boot

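journalctl -b -1 -p err: shows error-level messages from the previous boot, i.e. the one that froze (this needs a persistent journal, which by default means /var/log/journal has to exist)
journalctl -b -1 | tail -100: the last 100 lines logged before the freeze, which sometimes reveal the culprit, although a hard freeze can cut the log off before anything useful is written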
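
The previous boot's log can be long, so a keyword filter makes the first pass quicker. A rough sketch: the function name is made up, and the keyword list is only my guess at common crash signatures (machine check errors, lockups, hung tasks, oopses, panics, Nvidia Xid reports), not an exhaustive or official list.

```shell
# Keep only log lines that look crash-related; reads log text on stdin.
# The pattern list is an assumption for illustration, extend it as needed.
crash_hints() {
    grep -iE 'machine check|hardware error|mce:|soft lockup|hard lockup|blocked for more than|oops|panic|nvrm|xid'
}
```

Usage: journalctl -k -b -1 | crash_hints. No output does not prove the hardware is fine; it may only mean the freeze hit before the kernel could log anything.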

Step 5: SSH test

Set up SSH access from another device. Next time it freezes, try to SSH in. If SSH works but the display is dead, it is a GPU/display issue. If SSH also fails, it is a kernel panic or hardware.

The SSH test is the single most diagnostic thing you can do: it tells you immediately whether the kernel is alive or not.
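
The SSH test can be scripted on the second device so it is ready the moment the machine locks up. A minimal sketch, assuming key-based login to the machine already works while it is healthy (otherwise BatchMode makes the probe fail even on a live system); the function names are hypothetical and the argument is whatever user@host you normally use.

```shell
# Map the SSH probe's exit status to the interpretation given above.
interpret_ssh() {
    if [ "$1" -eq 0 ]; then
        echo "SSH works: kernel is alive, suspect GPU/display"
    else
        echo "SSH fails: suspect kernel panic or hardware"
    fi
}

# Probe the frozen machine from another device.
# BatchMode avoids hanging on a password prompt; ConnectTimeout bounds the wait.
probe_frozen_host() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
    interpret_ssh "$?"
}
```

Usage: probe_frozen_host user@host. Run it once while the machine is working to confirm it reports success, so a failure during a freeze actually means something.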

SentiEnt@lemmy.pt · 2 months ago

Agent?

SayCyberOnceMore@feddit.uk · 2 months ago

@MindfulMaverick@piefed.zip definitely do Step 1 from here.

Make sure it’s memtest86+ and not one of the other memtest variants.

It might fail quickly, it might take all night, but this will find bad RAM.

If it passes, move on to the next steps… I’d also add: check the PSU.