xcjs

xcjs@programming.dev · 2 months ago

The client is open source and can be administered using the open source Headscale server. I use it with Keycloak as an auth gateway.

xcjs@programming.dev · 5 months ago

It is! It’s a port of OpenSSH. The server has been ported as well, but requires installation as a “Windows Feature”.

xcjs@programming.dev · 5 months ago

Windows now has an SSH client built in.

xcjs@programming.dev · 9 months ago

Getting Keycloak and Headscale working together.

But I did it after three weeks.

I captured my efforts in a set of interdependent Ansible roles so I never have to do it again.

xcjs@programming.dev · 10 months ago

It would be extremely barebones, but you can do something like this with Pandoc.

xcjs@programming.dev · 11 months ago

That I agree with. Microsoft drafted the recommendation to use it for local networks, and Apple ignored it or co-opted it for mDNS.

xcjs@programming.dev · 11 months ago

Macs aren’t the only things that use mDNS, either. I wrote a host-monitoring solution that uses it.

xcjs@programming.dev · 11 months ago

Yeah, that’s why I started using .lan.

xcjs@programming.dev · 11 months ago

I was using .local, but it ran into too many conflicts with an mDNS service I host and vice versa. I switched to .lan, but I’m certainly not going to switch to .internal unless another conflict surfaces.

I’ve also developed a host-monitoring solution that uses mDNS, so I’m not about to break my own software. 😅
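
To illustrate the conflict: names ending in .local are reserved for mDNS (RFC 6762), so a regular DNS zone with the same suffix ends up competing with mDNS resolvers. A minimal sketch that flags such names; the helper name and the example hostnames are made up for illustration:

```python
# Hypothetical helper, not part of any library.
MDNS_SUFFIX = "local"  # last label reserved for mDNS by RFC 6762


def uses_mdns_namespace(hostname: str) -> bool:
    """Return True if the hostname's last label is 'local'."""
    # Drop a trailing root dot and compare case-insensitively.
    labels = hostname.rstrip(".").lower().split(".")
    return labels[-1] == MDNS_SUFFIX


# Example hostnames are assumptions, not taken from a real network.
for name in ("nas.local", "nas.lan", "nas.internal"):
    print(name, uses_mdns_namespace(name))
```

Only the first name is flagged, which is roughly why moving the unicast zone to another suffix avoids the clash.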

xcjs@programming.dev · edited 1 year ago

Coincidentally, I just found this other thread that mentions EasyEffects: https://programming.dev/post/17612973

You might be able to use a virtual device to get it working for your use case.

xcjs@programming.dev · edited 1 year ago

I just wanted to update this to mention that Llamafile includes a lot of custom low-level performance improvements for CPU-based inference: https://justine.lol/matmul/

xcjs@programming.dev · 1 year ago

It’s just a different use case: a single-file large language model engine that automatically chooses the “best” parameters to run with. It uses llama.cpp under the hood.

The intent is to make it as easy as double-clicking a binary to get up and running.

xcjs@programming.dev · 1 year ago

It depends on the model you run. Mistral, Gemma, or Phi are great for a majority of devices, even with CPU or integrated graphics inference.

xcjs@programming.dev · 1 year ago

I’m also going to put forward Tilda, which has been my preferred one for a while due to how minimal the UI is.

xcjs@programming.dev · 1 year ago

We all mess up! I hope that helps - let me know if you see improvements!

xcjs@programming.dev · edited 1 year ago

I think there was a special process to get Nvidia working in WSL. Let me check… (I’m running natively on Linux, so my experience doing it with WSL is limited.)

https://docs.nvidia.com/cuda/wsl-user-guide/index.html - I’m sure you’ve followed this already, but according to it, you don’t want to install the Nvidia Linux drivers inside WSL, only the cuda-toolkit metapackage. I’d follow the instructions from that link closely.

You may also run into performance issues within WSL due to the virtual machine overhead.
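
As a quick sanity check after following the guide, something like the sketch below can tell you whether the GPU is visible from inside WSL at all. It assumes `nvidia-smi` is on the PATH and only looks at its `-L` (list GPUs) output; the function name is made up for illustration:

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is found and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")  # assumption: the tool is on PATH
    if exe is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints False, inference is most likely falling back to the CPU.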

xcjs@programming.dev · 1 year ago

Good luck! I’m definitely willing to spend a few minutes offering advice/double checking some configuration settings if things go awry again. Let me know how things go. :-)

xcjs@programming.dev · edited 1 year ago

It should be split between VRAM and regular RAM, at least if it’s a GGUF model. Maybe it’s not, and that’s what’s wrong?
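
For a rough idea of how that split works out, here is a back-of-the-envelope sketch. It assumes all layers are the same size and sets aside a fixed amount of VRAM for everything else (context, buffers), which is a crude simplification. The function name and every number in the example are illustrative assumptions, not measurements:

```python
def layers_on_gpu(
    model_gib: float, n_layers: int, vram_gib: float, reserve_gib: float = 1.0
) -> int:
    """Estimate how many equally sized layers fit in VRAM."""
    per_layer = model_gib / n_layers           # assumed uniform layer size
    usable = max(vram_gib - reserve_gib, 0.0)  # keep some VRAM free
    return min(n_layers, int(usable // per_layer))


# Assumed example: a 40 GiB model file with 80 layers on an 8 GiB card.
on_gpu = layers_on_gpu(40.0, 80, 8.0)
print(on_gpu, "layers on GPU,", 80 - on_gpu, "layers in regular RAM")
# prints: 14 layers on GPU, 66 layers in regular RAM
```

In a case like this, most of the model still runs from regular RAM, so it will be slow, but it should still be noticeably faster than CPU-only inference.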

xcjs@programming.dev · 1 year ago

Ok, so using my “older” 2070 Super, I was able to get a response from a 70B parameter model in 9-12 minutes. (Llama 3 in this case.)

I’m fairly certain that you’re running on your CPU or hitting another issue. Would you like to try to debug your configuration together?

xcjs@programming.dev · 1 year ago

Unfortunately, I don’t expect it to remain free forever.