How do you debug system issues on Linux?

PumpkinDrama@reddthat.com · 1 day ago

paequ2@lemmy.today · 18 hours ago

On Guix, I could bisect my OS (like git bisect). So usually what would happen is:

I’m running in a good state
I accidentally mess something up
oh no
sudo guix system switch-generation $n, where $n is the last known-good generation, then reboot into it
then binary search until I find the first bad generation
look at the config changes I made
fix them
back to good state
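Each bisect step needs a reboot, so the loop stays manual, but the bookkeeping can be sketched as a small shell helper. `next_generation` is a hypothetical function, not part of Guix; only `guix system list-generations` and `guix system switch-generation` are real commands.

```shell
# Hypothetical helper for bisecting Guix system generations by hand.
# Usage: next_generation GOOD BAD
#   GOOD = a generation known to work, BAD = a later one known to be broken.
# Prints the generation to test next, or the first bad one once they are adjacent.
next_generation() {
  local good=$1
  local bad=$2
  if [ $((bad - good)) -le 1 ]; then
    echo "first bad generation: $bad"
    return 1
  fi
  # Integer midpoint between the two bounds.
  echo $(( (good + bad) / 2 ))
}
```

For example, if generation 12 worked and 20 is broken, `next_generation 12 20` prints 16; run `sudo guix system switch-generation 16`, reboot, and repeat with 16 as the new good or bad bound. Deleted generations leave gaps, so check `guix system list-generations` first; switch-generation fails if the number does not exist.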

Unfortunately, my laptop is too new so Guix isn’t fully compatible with all my hardware. (Yes, I was using nonguix)

But that was a pretty neat experience compared to debugging something on Arch.

DigitalDilemma@lemmy.ml · 1 day ago

Sysadmin here. This is my usual flow across various distros:

As /u/FigMcLargeHuge mentions, recent log files in /var/log. Notably /var/log/messages (EL) and /var/log/syslog (Debian), but anything that’s been written to recently.
journalctl - More and more things are moving to binary logging. If you know the service, then journalctl -u unitname (the systemd unit, e.g. sshd.service) restricts output to just that. Also add -f to follow it for ongoing logs.
dmesg -T - especially for system-level problems, this shows hardware and other low-level kernel messages. (-T prints human-readable timestamps, not just seconds since boot.)
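The journal and kernel-log checks above can be wrapped in one small shell function. `show_errors` is a hypothetical name; the sketch assumes systemd’s `journalctl` and util-linux’s `dmesg`, and reading the kernel buffer may need root on systems where `kernel.dmesg_restrict` is set.

```shell
# Hypothetical triage helper: recent errors from the journal and kernel log.
# Usage: show_errors [UNIT]
show_errors() {
  if [ -n "$1" ]; then
    # Current boot only, one systemd unit, priority "err" and worse.
    journalctl -b -u "$1" -p err --no-pager
  else
    journalctl -b -p err --no-pager
  fi
  # Kernel messages with readable timestamps, errors and warnings only.
  dmesg -T --level=err,warn
}
```

`show_errors sshd.service` narrows the journal to one unit; to watch new entries as they arrive, `journalctl -u sshd.service -f` is still the simpler tool.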
Once you have some logs that you think are related, but don’t know WTF they actually mean, you have two options. The first is to google likely strings. This is… ineffective much of the time - accidental misinformation and outdated advice are increasingly common. The answer might be there, but it takes time and can be frustrating to weed out the cruft.

The better way (IMO, and people downvote me for saying this) is to use AI. Get a few lines of logs with the errors, check them for confidential information, and simply paste the suspect lines into chatgpt, gemini, claude, co-pilot, whatever. No need for context, it’ll figure that out. The LLM will, 4 times out of 5, identify the problem very quickly.

Now, once it’s identified that, it will offer to fix it for you. This is where you’ve got to be on your toes, as LLMs are really, really quick to give bad advice at this level. But that first triage is nearly always worth doing and helps shape your own mind as to what’s going on. AI is still useful for fixing it, but do understand what it’s telling you to do.
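The “check them for confidential information” step can be partly automated. A rough `sed` sketch, assuming IPv4 addresses, e-mail addresses and the hostname are what you want masked; `scrub_log` is a hypothetical helper and will not catch usernames, tokens or paths, so still read the result before pasting.

```shell
# Hypothetical filter: masks e-mail addresses, IPv4 addresses and (optionally)
# a hostname in log lines read from stdin. Not exhaustive; review the output.
# Usage: some-command | scrub_log [HOSTNAME]
scrub_log() {
  sed -E \
    -e 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/<email>/g' \
    -e 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/<ip>/g' |
  if [ -n "$1" ]; then
    # Escape dots so the hostname is matched literally, not as a regex.
    host_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed "s/$host_re/<host>/g"
  else
    cat
  fi
}
```

For example: `journalctl -u sshd.service -n 20 --no-pager | scrub_log "$(hostname)"`.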

MangoCats@feddit.it · 11 hours ago

use AI. Get a few lines of logs with the errors, check them for confidential information, and simply paste the suspect lines into chatgpt, gemini, claude, co-pilot, whatever

Concur. I used to put smaller snippets of the logs into Google search, hoping to bring up pages from fellow sufferers of the same malaise; that usually worked, but AI is doing it better - now.

BCsven@lemmy.ca · 1 day ago

I have resorted to the AI step also, if Stract.com doesn’t give me a good link, because if I paste a minidlna crash log Google responds with:

Mini Cooper on sale
Buy your DAC device here
want to sign up to streaming music
network and NAS comparisons

Useless.

At least AI said: based on your error, it appears a file in your database has metadata tags it cannot parse properly. Sure enough, the tagger I used had applied a tag to a WMV file, and MiniDLNA couldn’t deal with tag1 areas vs the tag2 areas used in other file formats.

ZeDoTelhado@lemmy.world · 1 day ago

Did you try to do this workflow with local models? If so, in your experience what are the better models for this?

DigitalDilemma@lemmy.ml · 17 hours ago

We did experiment with local models. They were okay, if a little slow with the resources we allocated for testing. Ultimately though, we paid for copilot. I’m still a little sceptical that it won’t leak data, despite the assurances, so I do clean anything sensitive before pasting.

As for best models - generally GPT-4 or GPT-5 is my go-to, but the others have their uses. I tend to stick with one until it annoys me, then move on. Claude’s pretty good for code help, imo, but there’s not really a huge difference between them.

What’s your experience?

ZeDoTelhado@lemmy.world · 18 minutes ago

I don’t generally use online models, but my needs are also much smaller. The most I use my local model (via Ollama) for is translations. I’m always interested in seeing more focused models that we can run on lower-end hardware.

silly goose meekah@lemmy.world · 1 day ago

If you’re using systemd, you should get to know journald. There are UIs to make searching the journal logs easier, like journald browser.
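On the command line, journalctl can do the same kind of time-window search as the `find /var/log -mmin` trick. A sketch, assuming a systemd whose journalctl supports `--grep` (it needs PCRE2 support built in); `journal_recent` is a hypothetical helper.

```shell
# Hypothetical helper: journal entries from the last N minutes,
# optionally filtered by a pattern. Usage: journal_recent [MINUTES] [PATTERN]
journal_recent() {
  local mins=${1:-10}
  if [ -n "$2" ]; then
    journalctl --since "${mins} min ago" --grep "$2" --no-pager
  else
    journalctl --since "${mins} min ago" --no-pager
  fi
}
```

For example, `journal_recent 30 oom` looks for out-of-memory messages from the last half hour.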

buckykat [none/use name]@hexbear.net · 1 day ago

I simply stopped using Manjaro, this resolved all system-level issues I had encountered.

FigMcLargeHuge@sh.itjust.works · 1 day ago

Just as a general rule, I would start by checking log files. A good first step is searching /var/log for files that have been modified in the last few minutes with something like “sudo find /var/log -mmin -10 -ls 2>/dev/null”. That will list everything in /var/log changed within the last 10 minutes. Then you can tail those or grep them looking for clues. I have done searches of the entire file system looking for log files that were recently modified to find clues. It might also help to send the output to a file so you can view it and scroll up and down rather than just trying to read the output of the find, tail or grep commands. Put a “1>/{path}/filenameyouwanttouse.out” at the end of the command, or pipe it to the tee command, which will show it on the screen and write it to the file you specify.
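The find-then-tail approach above can be sketched as two shell functions (hypothetical names) whose defaults mirror the post: /var/log and 10 minutes. Shell functions can’t be passed to `sudo` directly, so run them from a root shell (e.g. after `sudo -s`) to read protected logs.

```shell
# Hypothetical helpers for the "recently modified logs" approach.
# recent_logs [DIR] [MINUTES]: regular files modified in the last MINUTES.
recent_logs() {
  find "${1:-/var/log}" -type f -mmin "-${2:-10}" 2>/dev/null
}

# tail_recent_logs [DIR] [MINUTES] [LINES]: last LINES lines of each such file.
# When several files match, tail prints a "==> name <==" header before each.
tail_recent_logs() {
  find "${1:-/var/log}" -type f -mmin "-${2:-10}" \
    -exec tail -n "${3:-20}" {} + 2>/dev/null
}
```

`-type f` skips directories, whose timestamps also change when files are added. Binary journal files under /var/log/journal will show up as noise; read those with journalctl instead. To keep a copy you can scroll through, pipe either one into tee, e.g. `recent_logs | tee ~/recent-logs.out`.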

rozodru@piefed.social · 1 day ago

journalctl and log files are very valuable. If it’s specific to an application, running that application from a terminal with verbose output enabled can also give you a clear indication of what’s going wrong.

I’m dyslexic, so I get syntax errors all the damn time, and thankfully NixOS likes to remind me on every rebuild how much of an idiot I am.

Worst comes to worst, you can always plug the error into an LLM like Claude or ChatGPT. But take that with a grain of salt. It’ll give you a good base to start from for debugging, but never trust something like Claude that will constantly tell you “it’s a known issue” when it isn’t.

All this being said, I’ve had the best experience getting help via the relevant application/distro/whatever IRC channels on Libera Chat.

TimeSquirrel@kbin.melroy.org · 1 day ago

There’s an entire step between trying to figure it out yourself and resorting to an LLM, which is likely to tell you to shove cheese into the USB ports: regular web searching. Forum and social media posts. The distro’s wiki itself or other such resources. You know, the stuff the AI originally sucked up, mashed together, mixed around, and spat back out.

rozodru@piefed.social · 1 day ago

Right, that’s why I said worst comes to worst and to take it with a grain of salt. For very simple issues it’s fine; beyond that it’s a coin toss. It’s a fine rubber duck. Like if I missed something obvious but I’m just not seeing it, then it might point that out for me. For example, I recently reinstalled my OS and couldn’t get WireGuard to work, so as a last-ditch effort I plugged it into Claude and it told me that I had forgotten to replace a PrivateKey on one of the peers. I had just completely missed it.

Sophienomenal@lemmy.blahaj.zone · 1 day ago

Generally, it depends on the issue. The first thing I’d check is journalctl, and if there are any errors, I usually look up “[pasted error] [distro name]” and go from there. If I’m unable to find errors, then my next bet is to look up “[description of issue] [distro name]”. Unless I am directly familiar with the component that is having an issue, I try to see if I can find a solution online first. Of course, I never recommend running commands you read online that you don’t understand, so take it as a learning experience and pull up some man pages to see what everything is doing. By doing that, you can even begin to learn how to debug and fix these issues by yourself. Even just finding issues other people have and proving it isn’t your issue helps narrow it down.

What I will never under any circumstances recommend is using an LLM. Please, just use a normal search engine (I prefer DDG), and find forum posts from real people. Those people are generally capable of understanding what they’re saying, so they won’t give completely made-up information based on generating the most likely next word from the data an LLM was trained on. Besides, chances are that the LLMs are trained on the data you would find by searching anyway, so why not go straight to the source?

I do find myself having to troubleshoot issues entirely on my own sometimes, but usually those are of my own doing, and I can likely figure out what I did wrong (I host my own server and tinker with it quite often). Of course, since switching to atomic distros on my desktop, I haven’t had any system issues to troubleshoot with it in years. Running Manjaro is practically a guarantee that you’ll have system issues, though. I’ve never had a worse experience with my system than when I ran it, and I’m not alone in that.

Otherwise, if you find yourself unable to find an easy solution, backups are a wonderful thing. My server recently had part of its boot corrupted, and it was just a case of recovering from a backup to restore it. Remember, with backups: 2 is 1 and 1 is none. Data can (and will) get corrupted eventually.

BCsven@lemmy.ca · 1 day ago

Stract.com, if you remember the good Google times before 2010.

stonkage@aussie.zone · 1 day ago

I plug the error into ChatGPT and usually get pointed in the right direction. I mean, I no longer have a functional laptop and was given instructions on how to build a really good toaster. But hey, I’m learning!

d00phy@lemmy.world · 1 day ago

What distro is the toaster running?

stonkage@aussie.zone · 19 hours ago

As a new user GPT said it would be best to install a beginner level distro like arch.

Admetus@sopuli.xyz · 14 hours ago

“beginner level”

tangeli@piefed.social · 1 day ago

There are guides available. Search for ‘Linux kernel debugging’ or ‘Linux module debugging’, depending on which you are more interested in. And, of course, learn about the relevant parts of the kernel.

You might have a look at “Debugging kernel and modules via gdb”. The kernel.org site has a wealth of information.

Eugenia@lemmy.ml · 1 day ago

I use a distro that doesn’t fall into such things all the time. Linux Mint works great for me, as does Debian.