  • Q4 will give you roughly 98% of the quality of Q8 at about twice the speed, plus much longer context lengths.

    If you don’t need the full context length, you can try loading the model at a shorter context length, so more layers fit on the GPU and it runs faster.

    And you can usually configure your inference engine to keep the model loaded at all times, so you’re not losing so much time when you first start the model up.

    Ollama attempts to dynamically load the right context length for your request, but in my experience that just results in really inconsistent and long times to first token.

    The nice thing about vLLM is that your model is always loaded, so you don’t have to worry about that. But then again, it needs much more VRAM.
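
    To make that concrete, here’s a minimal sketch against Ollama’s HTTP API; the model tag and values are just examples, but num_ctx (context length) and keep_alive (keep the model loaded) are the relevant options:

    ```python
    import requests

    # Load the model with a smaller context window (num_ctx) and keep it
    # resident in memory for an hour (keep_alive), so follow-up requests
    # skip the cold-start load time.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5:32b-instruct-q4_K_M",  # example Q4 quant
            "prompt": "Hello!",
            "stream": False,
            "keep_alive": "1h",
            "options": {"num_ctx": 8192},  # shorter context = more layers on GPU
        },
    )
    print(resp.json()["response"])
    ```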


  • In my experience, anything similar to qwen-2.5:32B comes closest to gpt-4o, and I think it should run on your setup. The 14b model is alright too, but definitely inferior. Mistral Small 3 also seems really good. Anything smaller is usually really dumb, and I doubt it would work for you.

    You could probably run some larger 70b models at a snail’s pace too.

    Try the Deepseek R1 qwen 32b distill, something like deepseek-r1:32b-qwen-distill-q4_K_M (the name on ollama), or some fine-tune of it. It’ll be by far the smartest model you can run.

    There are various fine-tunes that remove some of the censorship (ablated/abliterated) or are optimized for RP, which might do better for your use case. But I personally haven’t used them, so I can’t promise anything.
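
    If you want to try it quickly, a minimal sketch using the ollama Python client (pip install ollama; the model tag is the one mentioned above):

    ```python
    import ollama

    # One-off chat with the R1 qwen distill mentioned above.
    reply = ollama.chat(
        model="deepseek-r1:32b-qwen-distill-q4_K_M",
        messages=[{"role": "user", "content": "What's 17 * 23? Think it through."}],
    )
    print(reply["message"]["content"])
    ```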


  • Maybe if they make a watch with a camera cover and a laser that draws a little box around what it can see, and it all runs locally, then I might be interested.

    Mainly to identify plants and mushrooms.

    Not a fan of the idea of everyone pointing AI-powered cameras at me all the time, like with this weird pin or smart glasses.

    Such products should have a legally mandated camera cover, a microphone shutoff, and a REALLY OBVIOUS tell to everyone around you if you are using the camera or mic.

    Bonus points if it screams a really loud “PERVERT” alarm if you’re doing something creepy.

    If only that were true for smartphones too…


  • It is kind of interesting how open machine learning already is without much explicit advocacy for it.

    It’s the only field in all of IT I can think of where the open version is just a few months behind SOTA.

    Open training pipelines and open data are the only aspects of ML that could still use improvement, but there are plenty of projects that are near-SOTA and fully open.

    ML is extremely open compared to consumer mobile or desktop apps, which are always ~10 years behind SOTA.


  • I think you are looking at workhorse distros, like Ubuntu, Fedora, etc., which by now are heavily used for productive work, not personal use. So they favor stability and minor quality-of-life improvements over shiny new updates.

    There are plenty of shiny new cutting-edge distros out there that are innovating, e.g. Nix, Silverblue, VanillaOS, all the container-focused ones (CoreOS, Container OS, Flatcar Container Linux), and probably dozens more newer ones I am not aware of.


  • If the generation temperature is non-zero (which it often is), there is inherent randomness in the output. So even if the first number in a statistic should be 1, sometimes it will just randomly pick some other plausible number. Even if the network always assigns the highest probability to the correct token, it’s basically doing a coin toss for every token to make answers more creative.

    That’s on top of hoping the LLM has even seen that data during training, AND managed to memorize it, AND that the network just happens to be able to reproduce the correct data given your prompt (it might not be able to for a different prompt).

    If you want any reliability at all, you need to use RAG, AND you yourself have to double-check all the references it quotes (if it even has that capability).

    Even if it has all the necessary information to answer correctly in its context window, it can still answer incorrectly.

    None of the current models are anywhere close to producing trustworthy output 100% of the time.
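
    A toy sketch of why non-zero temperature makes this unavoidable (the logits are made up for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical logits for 4 candidate tokens; token 0 is the "correct"
    # one and already has the highest score.
    logits = np.array([4.0, 3.2, 2.8, 1.5])

    def sample(logits, temperature):
        # Softmax with temperature: as T -> 0 this approaches argmax
        # (deterministic); larger T flattens the distribution.
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(logits), p=probs)

    # Even with the correct token on top, at T=0.8 the sampler still picks
    # some other token a noticeable fraction of the time.
    draws = np.array([sample(logits, 0.8) for _ in range(10_000)])
    print("picked a wrong token:", (draws != 0).mean())
    ```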