Perhaps give Ramalama a try?
Indeed, Ollama is going a shady route. https://github.com/ggml-org/llama.cpp/pull/11016#issuecomment-2599740463
I started playing with Ramalama (the name is a mouthful) and it works great. There are one or two more steps in the setup, but I’ve achieved great performance and the project makes good use of standards (OCI, Jinja, unmodified llama.cpp, from what I understand).
Go and check it out, they are compatible with models from HF and Ollama too.
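If you want to try it, the basic flow I used looked roughly like this (model name and source prefixes are just examples; check their docs for the exact syntax):

# pull a model from the Ollama registry (hf:// style sources work too, per their docs)
ramalama pull ollama://llama3.1:8b
# chat with it in the terminal
ramalama run ollama://llama3.1:8b
# or serve it over HTTP for other clients
ramalama serve ollama://llama3.1:8b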
Sorry, I didn’t mean to sound condescending, but capacitors can indeed output their charge at extremely high rates; they just have terrible energy storage capacity. You would need an unreasonably large capacitor bank, though it is technically feasible, and that’s essentially what CERN has. In this case, though, batteries are the more suitable option: they can be tuned between energy density and power density to fit the exact use case.
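For a rough sense of scale, here’s a back-of-envelope number using a common 3000 F, 2.7 V supercapacitor as the example:

python3 -c 'print(0.5 * 3000 * 2.7**2 / 3600)'   # E = 1/2·C·V², in Wh: about 3

That’s roughly 3 Wh from a cell the size of a soda can, versus roughly 10 Wh from a single 18650 Li-ion cell a fraction of that size, which is why the bank gets unreasonably large.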
Capacitors, lol
Isn’t that something you solve with snooze? Like set the alarm for the earlier time, set the snooze interval to 15 min, and hit snooze until you want to wake up?
Remove unused conda packages and caches:
conda clean --all
If you are a Python developer, this can easily be several GB, or even tens of GB.
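If you want to preview what would be freed before committing, conda has a dry-run flag:

conda clean --all --dry-run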
I think it has potential but I would like to see benchmarks to determine how much. The fact that they have 5Gbps Ethernet and TB4 (or was it 5?) is also interesting for clusters.
Would you be able to share more info? I remember reading their issues with docker, but I don’t recall reading about whether or what they switched to. What is it now?
Well, in the case of legacy GPUs you are forced to downgrade drivers, and in that case you can no longer use your recent and legacy GPUs simultaneously, if that’s what you were hoping for.
But if you do go the route of legacy drivers, they work fine.
I can’t speak about Vulkan, but I had an old GTX 680 from 2012 that worked without issue until a year or so ago, and I was able to get it recognized by nvidia-smi.
I had it running using the proprietary drivers, with the instructions from here, using the legacy method: https://rpmfusion.org/Howto/NVIDIA#Legacy_GeForce_600.2F700
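If I remember right, the install boiled down to something like this (the 470xx legacy branch packages from RPM Fusion; double-check the linked page for your card and distro version):

sudo dnf install akmod-nvidia-470xx xorg-x11-drv-nvidia-470xx
# optional, provides nvidia-smi and CUDA support
sudo dnf install xorg-x11-drv-nvidia-470xx-cuda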
Is that what you did?
PS: By working without issue, I mean gaming on it using Proton.
Deepseek is good at reasoning, qwen is good at programming, but I find llama3.1 8b to be well suited for creativity, writing, translations and other tasks which fall outside the scope of your two models. It’s a decent all-rounder. It’s about 4.9GB in q4_K_M.
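If you want to give it a spin (tag name as listed on the Ollama library):

ollama pull llama3.1:8b
ollama run llama3.1:8b "Translate to French: the weather is lovely today."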
I think the requested salary amount plays a big role. If someone asking 60k for a role that typically pays 100k a year had been rejected over salary misalignment, I would be much more critical of the company.
Regarding photos, and videos specifically:
I know you said you are starting with selfhosting so your question was focused on that, but I would also like to share my experience with ente, which has been working beautifully for my family, partner and myself. It is truly end-to-end encrypted, with the source code available on GitHub.
They have reasonable prices, and if you feel adventurous you can even host it yourself. The advanced search and face recognition features all run on-device (since they can’t access your data) and work very well. Sharing and collaboration are great, and features aren’t locked behind accounts, so you can gather memories from other people onto your quota just by sharing a link. You can also have a shared family plan.
To run the full 671B-parameter model (404GB in size), you would need more than 404GB of combined GPU and system memory just to load it, and you would most probably want all of it to be GPU memory to make it run fast.
With 24GB of GPU memory, the largest model from the R1 series that would fit is the 32b-qwen-distill-q4_K_M (20GB in size), available on Ollama (and possibly elsewhere).
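Those sizes line up with a quick back-of-envelope estimate, assuming q4_K_M averages roughly 4.8 bits per weight (exact figures vary a bit by quant):

python3 -c 'print(671e9 * 4.8 / 8 / 1e9)'   # full R1: ~403 GB
python3 -c 'print(33e9 * 4.8 / 8 / 1e9)'    # 32b qwen distill: ~20 GB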
Ollama is very useful but also rather barebones. I recommend installing Open WebUI to manage models and conversations. It will also be useful if you want to tweak more advanced settings like system prompts, seed, temperature and others.
You can install open-webui using docker or just pip, which is enough if you only care about serving yourself.
Edit: open-webui also renders markdown, which makes formatting and reading much more appealing and useful.
Edit2: you can also plug ollama into continue.dev, an extension for VS Code which brings LLM capabilities into your IDE.
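The docker route is roughly the one-liner from their quick start (port and volume name are up to you; this assumes Ollama is already running on the host):

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main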
Alternatively, you don’t even need podman or any containers, as open-webui can be installed simply using python/conda/pip, if you only care about serving yourself:
https://docs.openwebui.com/getting-started/quick-start/
Much easier to run and maintain IMO. Works wonderfully.
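For reference, the pip route from that quick start boils down to the following (a dedicated venv or conda env is a good idea; check the page for the supported Python version):

pip install open-webui
open-webui serve

It then serves the UI on port 8080 by default, if I recall correctly.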
Thank you for clearing that up. Important detail indeed. You get what you pay for, I guess?
Seems the chapter for Jellyfin has been “coming soon” for 3 years, too bad.
I’m not saying it’s not true, but nowhere on that page is there the word donation. And even if that’s what it is, the fact that it is described as a license, tied to a server or a user, causes a lot of confusion for me, especially combined with the fact that there is no paywall but registration is required.
Why use the terms license, server and user? Why not simply call it a donation, with the option of displaying support by getting exclusive access to a badge, like Signal does?
Again, I’m very happy immich is free, it is great software and it deserves support, but this is just super confusing to me, and the buy.immich.app link does not clarify things, nor does that blog post.
Edit: typo
Congrats! Amazing project, exciting interface and you went the extra mile on the integration side with third parties. Kudos!
Edit: I’ll definitely have to try it out!