A software developer and Linux nerd, living in Germany. I’m usually a chill dude but my online persona doesn’t always reflect my true personality. Take what I say with a grain of salt, I usually try to be nice and give good advice, though.

I’m into Free Software, selfhosting, microcontrollers and electronics, freedom, privacy and the usual stuff. And a few select other random things as well.

  • 5 Posts
  • 818 Comments
Joined 5 years ago
Cake day: August 21st, 2021


  • Probably because they’re playing the same game as Mark Zuckerberg, the Chinese labs and, to some degree, OpenAI… They all release open-weight models.

    They generate some hype for their company that way, so it’s advertising. They build goodwill. They undercut the competition, or make it clear how they outperform it. Maybe they attract more investor money if they expand into the local-models market. I bet there are a million reasons why it makes sense from a business perspective.


  • Yes. I’ve been somewhat lucky as well. I upgraded my homeserver to 48GB to run a few virtual machines and maxed out my old laptop well before prices skyrocketed. Got to check whether I still pay the ~8€ a month for my netcup VPS or whether they increased the price for existing customers as well…


  • Hmmh. I tried to do benchmarks early on, back when Llama 2 was a thing… I followed the Reddit discussions. Then at some point I wanted to replace Mistral-Nemo with something newer, but I disliked how every other model had turned to the ChatGPT sycophant style of talking… It’s a massively laborious undertaking, though. The official benchmarks don’t cover any of that, and there’s no good way to automate it either. So I spent half a day reading output manually and rating it in an Excel spreadsheet. With some success, but it’s way too complicated. So I mainly eyeball it these days, and sometimes there are recommendations somewhere on the internet. And I’ve learned to accept how chatbots always go on and on with redundant information unless I tell them to skip the bullshit because I have an appointment at the hairdresser in 10 minutes and they need to explain it in 3 sentences. 😄

    I suppose for tasks like coding or factual knowledge, it’s way easier to come up with fully automated benchmarks.
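
    The fully automated kind can be as simple as exact-match scoring against known answers. A minimal sketch; the questions, expected answers and the fake_model stand-in are all made up for illustration:

```python
def fake_model(question: str) -> str:
    # Stand-in for a real LLM call; canned answers for the demo,
    # one deliberately wrong to show how scoring works.
    canned = {
        "What is the capital of France?": "Paris",
        "What is 2 + 2?": "4",
        "Who wrote Faust?": "Schiller",  # wrong on purpose
    }
    return canned.get(question, "")

# Each entry pairs a prompt with the single accepted answer.
benchmark = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Who wrote Faust?", "Goethe"),
]

correct = sum(1 for q, expected in benchmark if fake_model(q).strip() == expected)
accuracy = correct / len(benchmark)
print(f"accuracy: {accuracy:.2f}")  # 2 of 3 correct -> 0.67
```

    This works because factual answers are short and checkable by string comparison; judging tone or sycophancy has no such oracle, which is why that part stays manual.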


  • I guess that’s been my general experience for a while. I’d download some new model with promising benchmarks, and once I tried it, the results were kinda underwhelming. A few weeks ago, for example, I tried Qwen 3.5, which had “outstanding results across a full range of benchmark evaluations”. And I deleted it after it kept wasting thousands of tokens reasoning about how to respond to a “Hello” from the user. And sometimes I just don’t see any real performance improvement with new models. If I had to guess, I’d say they mainly trained (and improved) for/on the benchmarks, not for my use case.



  • Did you read the Wiki? You need to either pass the compress_extension option when mounting it (the Arch Wiki lists how to enable compression on all text files, and I gave you the variant with a ‘*’, which enables compression for all files), or do a chattr -R +c ... on specific files or directories to compress them. Maybe you missed that, and that’s why it doesn’t compress?!
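
    Assuming this is about f2fs (that’s where a compress_extension mount option exists), the two routes could look roughly like this; the device, mountpoint and paths are made-up placeholders:

```shell
# Route 1: compress everything via mount options.
# compress_extension=* matches all files; compress_algorithm picks the codec.
# (The filesystem also needs the compression feature enabled at mkfs time.)
mount -o compress_algorithm=zstd,compress_extension=* /dev/sdX1 /mnt/data

# Route 2: leave the mount options alone and flag specific paths instead;
# newly created files underneath inherit the compression attribute.
chattr -R +c /mnt/data/logs
```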

    There’s probably also a way to debug it and figure out what it does and how many files/sectors actually got compressed on the filesystem. Linux usually buries that kind of information somewhere in /sys or /proc, or there are special commands to dig it out. But I’m not really an expert on it.

    And there are also files which just cannot be compressed any further because they’re already compressed. Most images, for example, or music, or ZIP archives. If you try to compress those, they’ll usually stay the same size.
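
    You can see that effect with any general-purpose compressor. A quick Python sketch, using random bytes as a stand-in for already-compressed data like a JPEG or ZIP payload:

```python
import os
import zlib

# Repetitive text has lots of redundancy and shrinks a lot.
text = b"the quick brown fox jumps over the lazy dog " * 200
compressed_text = zlib.compress(text)

# Random bytes behave like already-compressed data: there's no
# redundancy left, so the output can't get meaningfully smaller
# (it may even grow a little due to format overhead).
random_data = os.urandom(len(text))
compressed_random = zlib.compress(random_data)

print(len(text), "->", len(compressed_text))
print(len(random_data), "->", len(compressed_random))
```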



  • The issue with the tools I’ve seen is that they either don’t factor in how language models are actually trained and datasets are actually prepared, or they’re based on outdated information. I’ve never seen a specific tool backed by science, or even with a plausible way of working against current data-gathering processes… So for all intents and purposes, they’re a bit like homeopathy or alternative medicine. Sure, you’re perfectly fine taking sugar pills, there’s nothing wrong with that. But don’t confuse it with actual science-backed medicine.

    And I mean the poisoning goes even further than that. It’s not just people trying to make an LLM output gibberish. There are also lots of people with a vested (commercial) interest in sneaking in false information, their political agenda, or even a tire company that wants ChatGPT to say “Company XY” is the most trustworthy shop for new tires for your car. Judging by the public information out there, we’re already way past simple attacks, and the AI companies are aware of it. It’s an ongoing cat-and-mouse game. And besides all these sweatshops, they’ll also use other AI (natural language processing) to sift through the data. From what I remember, a lot of commercial chatbots and image generators have secret watermarking in place… So unless people come up with very clever mechanisms, a “poisoning” attempt will probably be detected by some very basic (fully automated) plausibility checks, and they’ll just discard your data without wasting a lot of resources on it.