I have spent a few days tweaking this setup to attain these results:

Model Prompt (tok/s) Generation (tok/s)
gemma-26b-moe 8.9 6.4
qwen3.5-4b-no-think 21.5 8.4

Although modest, It is great for local parsing and analysis of my self-hosted homelab data where sending logs to external APIs is not desirable.

Typical workflows:

  • Log analysis: Piping journalctl output to the API for error triage and root cause hypothesis generation.
  • Configuration synthesis: Generating AdGuard Home rewrite rules, nginx location blocks, or fstab entries based on defined parameters.
  • Troubleshooting constraints: Querying for failure modes specific to the local topology (e.g., NFS mount failures over a 1 Gbps unmanaged switch, Tailscale DERUP routing behind CGNAT).
  • Alert context: Correlating Beszel/Uptime Kuma notifications with service-specific knowledge (e.g., “mediabox CPU spike while SabNZBd is extracting”).