Recently I’ve been experimenting with Claude and feeling the burn of premium API usage. I wanted to know how much cheaper my local LLM is in terms of cost per output token.

Claude Sonnet is a good reference point at $15 per 1 million output tokens, so I wanted to know how many tokens $15 worth of electricity powering my rig would generate by comparison.

(These calculations cover simple raw token generation, by the way; in the real world there’s the cost of the initial hardware, ongoing maintenance as parts fail, and the human time to set everything up, which is much harder to factor into the equation.)

So how does one even calculate such a thing? Well, you need to know:

  1. how many watts your inference rig consumes at load
  2. how many tokens per second it can generate on average during inference (with the context relatively filled up, since we want conservative estimates)
  3. the electricity rate you pay on your utility bill, in dollars per kilowatt-hour

Once you have those constants you can work out how many kilowatt-hours of runtime $15 of electricity buys, then figure out the total number of tokens you’d expect to generate over that time at your TPS.
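For anyone who wants to plug in their own numbers, here’s a minimal sketch of that arithmetic. The wattage, tokens-per-second, and electricity rate below are placeholder assumptions, not measurements from my rig:

```python
# Rough cost-per-token sketch for a local inference rig.
# All constants below are assumed placeholders -- substitute your own.

WATTS_AT_LOAD = 180      # assumed rig power draw while inferencing, in watts
TOKENS_PER_SECOND = 20   # assumed conservative generation speed with context filled
PRICE_PER_KWH = 0.15     # assumed utility rate, in $ per kilowatt-hour
BUDGET = 15.00           # dollars of electricity, mirroring Claude Sonnet's $15/M output tokens

kwh_bought = BUDGET / PRICE_PER_KWH                  # kWh that $15 buys
runtime_hours = kwh_bought / (WATTS_AT_LOAD / 1000)  # hours of inference that energy powers
total_tokens = runtime_hours * 3600 * TOKENS_PER_SECOND

print(f"{kwh_bought:.1f} kWh -> {runtime_hours:.1f} h of runtime")
print(f"~{total_tokens / 1e6:.1f} million tokens for ${BUDGET:.2f} of electricity")
print(f"local cost: ${BUDGET / (total_tokens / 1e6):.2f} per million output tokens")
```

With those assumed numbers, $15 buys 100 kWh, which powers roughly 555 hours of generation and about 40 million tokens, i.e. on the order of $0.38 per million output tokens.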

The numbers shown in the screenshot are for a model fully loaded into VRAM on the ol’ 1070 Ti 8 GB. But even with the partially offloaded numbers for 22-32B models at 1-3 tps, it’s still a better deal overall.

I plan to offer the calculator as a tool on my site and release it under an open-source license like the GPL if anyone is interested.

  • rebelsimile@sh.itjust.works · 2 days ago
    I do all my local LLM-ing on an M1 Max macbook pro with a power draw of around 40-60 Watts (which for my use cases is probably about 10 minutes a day in total). I definitely believe we can be more efficient running these models at home.

    • wise_pancake@lemmy.ca · 2 days ago
      I wish I’d sprung for the Max when I bought my M1 Pro, but I am glad I splurged on memory. Even aside from LLM workloads, this thing is still excellent.

      Agree we can be doing a lot more, the recent generation of local models are fantastic.

      Gemma 3n and Phi 4 (non-reasoning) are my local workhorses lately.