Hey everyone,

I’m running into a frustrating issue and could use some guidance on how to pinpoint the faulty component.

My system completely locks up every few hours. It’s not just a DE crash; the entire machine becomes unresponsive. The mouse and keyboard are completely dead (no cursor movement, Caps Lock key doesn’t toggle). I’ve tried waiting 10-15 minutes to see if it recovers, but it never does.

REISUB does not work. Holding Alt + SysRq and pressing the keys in order does nothing. The only way out is a hard reset using the case button.

The last time this happened, I ended up buying components for a new computer and replaced them one by one until I found the faulty one. I’d rather try a more targeted approach this time. Though if it takes too much effort, I do have another computer I can fall back on.

Any advice on how to diagnose this efficiently? Logs to check, stress tests to run, or hardware to suspect first?

Thanks in advance!

  • devtoolkit_api@discuss.tchncs.de
    3 hours ago

    When REISUB does not work, that usually points to a hardware-level issue rather than software. Here is my debugging checklist for hard freezes:

    Step 1: Rule out RAM

    • Boot memtest86+ from a USB stick (many live USB images list it in their boot menu) and let it run overnight. Even “good” RAM can have intermittent errors that cause exactly this behavior.

    Step 2: Check thermals

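    • Install lm-sensors, run sensors-detect once, then run sensors before and during heavy load
    • Also check GPU temps if you have a dedicated GPU: nvidia-smi for Nvidia, or for AMD: cat /sys/class/drm/card0/device/hwmon/hwmon*/temp1_input (the value is in millidegrees Celsius, so 65000 means 65 °C; the card index may differ on your system)
    • A CPU that keeps heating past its throttle point can hang or power off without warning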
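    Because the freeze only shows up every few hours, it can help to log temperatures to disk so the last reading before the lockup survives the hard reset. Below is a minimal sketch, assuming the sensors are exposed under /sys/class/hwmon in millidegrees Celsius and read non-negative values. The function names, the log path, and the 5-second interval are made up for illustration.

```shell
#!/bin/sh
# Convert a hwmon reading in millidegrees Celsius (e.g. 45000) to "45.0".
# Assumes a non-negative integer input.
millideg_to_c() {
    printf '%d.%d\n' "$(( $1 / 1000 ))" "$(( ($1 % 1000) / 100 ))"
}

# Append one timestamped line per readable temp*_input file to the given log.
log_temps() {
    logfile=$1
    for f in /sys/class/hwmon/hwmon*/temp*_input; do
        [ -r "$f" ] || continue                 # skip unmatched glob / unreadable files
        raw=$(cat "$f" 2>/dev/null) || continue # some sensors fail to read; skip them
        printf '%s %s %s\n' "$(date '+%F %T')" "$f" "$(millideg_to_c "$raw")" >> "$logfile"
    done
    sync  # flush to disk so the last sample survives a hard reset
}

# Usage (hypothetical log path): sample every 5 seconds until the machine freezes.
# while true; do log_temps "$HOME/temps.log"; sleep 5; done
```

    After the next freeze, the tail of the log shows whether temperatures were climbing right before the lockup or sitting at normal values.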

    Step 3: GPU driver

    • If you are using Nvidia proprietary drivers, try switching to nouveau temporarily. Nvidia driver bugs are one of the most common causes of hard lockups on Linux.
    • After the reboot, dmesg only covers the current boot, so search the previous boot’s kernel log instead: journalctl -k -b -1 | grep -iE 'nvidia|nvrm|gpu'

    Step 4: Kernel logs from previous boot

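    • journalctl -b -1 -p err: shows error-level messages from the previous boot, i.e. the one that froze. This needs a persistent journal; if there is no data for -b -1, create /var/log/journal or set Storage=persistent in /etc/systemd/journald.conf
    • journalctl -b -1 | tail -100: the last 100 lines logged before the freeze, which can reveal the culprit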
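    To avoid reading the whole log by eye, the previous boot’s kernel messages can be filtered for a few well-known signatures. This is only a sketch: the function name is made up, and the pattern list is a non-exhaustive assumption (machine-check errors, watchdog lockup reports, Nvidia Xid events, amdgpu timeouts), not a complete catalogue.

```shell
#!/bin/sh
# Filter kernel log text on stdin for a few common lockup / hardware-error
# signatures. The pattern list is illustrative, not exhaustive.
suspicious_lines() {
    grep -iE 'hardware error|machine check|soft lockup|hard lockup|NVRM: Xid|amdgpu.*(timeout|reset)'
}

# Usage: kernel messages from the previous boot, filtered.
# journalctl -k -b -1 --no-pager | suspicious_lines
```

    An empty result does not clear the hardware: a hard freeze often stops the machine before anything reaches the disk.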

    Step 5: SSH test

    • Set up SSH access from another device. Next time it freezes, try to SSH in. If SSH works but the display is dead, suspect the GPU or display stack. If SSH also fails, suspect a kernel panic or a hardware fault.

    The SSH test is the single most diagnostic thing you can do: it tells you immediately whether the kernel is still alive.
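    The SSH test can be scripted from the second machine. The sketch below assumes key-based SSH login is already set up and that ping is the Linux iputils version (where -W is a timeout in seconds); the function names and messages are made up. The middle case, ping answering while SSH does not, is an extra distinction not in the checklist above: the kernel answers ICMP echo itself, so a reply suggests the kernel is at least partly alive.

```shell
#!/bin/sh
# Map probe results (0 = success) to a rough interpretation.
classify() {
    ping_rc=$1
    ssh_rc=$2
    if [ "$ssh_rc" -eq 0 ]; then
        echo "kernel alive: suspect GPU/display stack"
    elif [ "$ping_rc" -eq 0 ]; then
        echo "ping answers but SSH does not: kernel partly alive, userspace hung"
    else
        echo "no response: kernel panic or hardware"
    fi
}

# Probe a host once with ping and a non-interactive SSH login.
probe() {
    host=$1
    ping -c 1 -W 2 "$host" >/dev/null 2>&1; p=$?
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; s=$?
    classify "$p" "$s"
}

# Usage (hypothetical hostname): run right after the desktop freezes.
# probe frozen-box.local
```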

    • SayCyberOnceMore@feddit.uk
      1 hour ago

      @MindfulMaverick@piefed.zip definitely do Step 1 from here.

      Make sure it’s memtest86+ and not one of the similarly named tools.

      It might fail quickly, it might take all night, but this will find bad RAM.

      If it passes, move on to the next steps… I’d also add: check the PSU.