• brucethemoose@lemmy.world

    What model size/family? What GPU? What context length? There are many different backends with different strengths, but with a bit more specificity I can tell you the optimal way to run it and which quantization to use, heh.