llama.cpp for GPU only

bia@lemmy.ml · 3 years ago

Hudsonius@lemmy.ml · 3 years ago

GPTQ-for-LLaMa with oobabooga works pretty well. I'm not sure how much it uses the CPU, but my GPU sits at 100% during inference, so it seems to run mainly on that.

bia@lemmy.ml · 3 years ago

I’ve looked at that before. Do you use it with any UI?

Hudsonius@lemmy.ml · 3 years ago

Yeah, it's called Text Generation Web UI. If you check out the oobabooga Git repo, it goes into good detail. From what I can tell, it's based on the AUTOMATIC1111 UI for Stable Diffusion.

dragonfyre13@sh.itjust.works · 3 years ago

It's using Gradio, which is what auto1111 also uses. Both of these are pretty heavy modifications/extensions that do a lot to push Gradio to its limits, but that's the package being used in both. Note that it also has an API (check out the --api flag, I believe), and depending on what you want to do, there are various UIs that can hook into the Text Gen Web UI (oobabooga) API in various ways.
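
For anyone who wants to hit that API from a script instead of a UI, here's a rough sketch. It assumes the web UI was started with --api and that it exposes an OpenAI-style completions endpoint at `http://127.0.0.1:5000/v1/completions`; the address, port, and response shape are assumptions that may differ by version, and the helper names (`build_request`, `extract_text`, `generate`) are just made up for illustration.

```python
import json
import urllib.request

# Assumption: the server was launched with --api and serves an
# OpenAI-style completions route here. Adjust host/port/path as needed.
API_URL = "http://127.0.0.1:5000/v1/completions"


def build_request(prompt, max_tokens=200, url=API_URL):
    """Build a JSON POST request carrying the prompt (hypothetical helper)."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body):
    """Pull the generated text out of an OpenAI-style response body.

    Assumes the shape {"choices": [{"text": "..."}]}.
    """
    data = json.loads(response_body)
    return data["choices"][0]["text"]


def generate(prompt, max_tokens=200, url=API_URL):
    """Send the prompt to the running server and return the completion."""
    req = build_request(prompt, max_tokens, url)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_text(resp.read())
```

With the server running, `generate("Why run inference on the GPU?")` should return the completion text. It only uses the standard library, so there's nothing extra to install.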

Equality_for_apples@sh.itjust.works · 3 years ago

Personally, I have nothing but issues with Ooba's UI, so I connect SillyTavern to it or KoboldCpp. Works great.