What’s higher precision for you? What I remember from the old ggml measurements is that going below Q3 rarely makes sense, and roughly at Q3 you’d think about switching to a smaller model variant. On the other hand, everything above Q6 shows only marginal differences in perplexity, so Q6, Q8, and full precision are basically the same thing.
As a memory-poor user (hence the 8GB VRAM card), I consider Q8+ to be higher precision, Q4–Q5 mid-to-low precision (what I typically use), and anything below that low precision.
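For context, the trade-off between quant level and an 8GB VRAM budget can be sketched with a rough size estimate. The bits-per-weight figures below are approximate assumptions (illustrative of commonly cited llama.cpp quant types), and the calculation covers weight storage only, ignoring KV cache and runtime overhead:

```python
# Rough back-of-envelope size estimate for GGML/GGUF quantization levels.
# Bits-per-weight values are approximate assumptions, not exact figures.
BITS_PER_WEIGHT = {
    "F16":    16.0,
    "Q8_0":    8.5,
    "Q6_K":    6.6,
    "Q5_K_M":  5.7,
    "Q4_K_M":  4.8,
    "Q3_K_M":  3.9,
}

def model_size_gib(n_params: float, quant: str) -> float:
    """Approximate weight storage in GiB (ignores KV cache and overhead)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30

# A hypothetical 7B-parameter model at each quant level:
for q in BITS_PER_WEIGHT:
    print(f"7B @ {q:7s}: {model_size_gib(7e9, q):5.1f} GiB")
```

Under these assumptions, a 7B model at Q4 comes in around 4 GiB and fits comfortably in 8GB of VRAM, while F16 at roughly 13 GiB does not, which matches the Q4–Q5 sweet spot described above.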
Thanks, that sounds reasonable. Btw, you’re not the only poor person around; I don’t even own a graphics card… I’m not a gamer, so I never saw any reason to buy one before I took an interest in AI. I do inference on my CPU, which has access to more than 8GB of memory. It’s just slow 😉 But I guess I’m fine with that. I don’t rely on AI, it’s just tinkering, and I’m patient. A few times a year I’ll rent a cloud GPU by the hour. Maybe one day I’ll buy one myself.