I have spent a few days tweaking this setup to attain these results:
| Model | Prompt (tok/s) | Generation (tok/s) |
|---|---|---|
gemma-26b-moe |
8.9 | 6.4 |
qwen3.5-4b-no-think |
21.5 | 8.4 |
Although modest, It is great for local parsing and analysis of my self-hosted homelab data where sending logs to external APIs is not desirable.
Typical workflows:
- Log analysis: Piping
journalctloutput to the API for error triage and root cause hypothesis generation. - Configuration synthesis: Generating AdGuard Home rewrite rules, nginx location blocks, or
fstabentries based on defined parameters. - Troubleshooting constraints: Querying for failure modes specific to the local topology (e.g., NFS mount failures over a 1 Gbps unmanaged switch, Tailscale DERUP routing behind CGNAT).
- Alert context: Correlating Beszel/Uptime Kuma notifications with service-specific knowledge (e.g., “mediabox CPU spike while SabNZBd is extracting”).



Why would anyone laugh? That’s a perfectly cromulent result.