I have some use cases where I need to do some large-scale data cleanup, but using an LLM is overkill, and I already get good results with smaller embeddings.
I want to try this model and take advantage of its Matryoshka dimension reduction to handle progressively more complex use cases.
Not really sure I understand how these work. Do you just feed it a large textual document, like a transcript, and it turns it into a more machine-readable vector format?
Or is it just a much smaller LLM that's optimized for reading rather than generating?
I’ve been looking forward to trying this one.
Basically, yes.
I've only built my own systems that use sentence transformers.
You pass in a list of strings, it generates a list of vectors, and those vectors can be used for all sorts of similarity analysis and retrieval.
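
A minimal sketch of that workflow using the sentence-transformers library (the model name and sentences here are just illustrative, not anything specific to the model in the thread):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    sentences = [
        "The invoice was paid on March 3rd.",
        "Payment for the invoice cleared in early March.",
        "The cat sat on the mat.",
    ]

    # encode() returns one vector per input string (a 2-D numpy array here).
    embeddings = model.encode(sentences, normalize_embeddings=True)

    # With unit-length vectors, cosine similarity reduces to a dot product.
    similarities = embeddings @ embeddings.T
    print(similarities)  # the two invoice sentences should score highest together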
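
On the Matryoshka part: models trained with Matryoshka representation learning pack the most useful information into the leading dimensions, so you can truncate a vector to its first k components and renormalize, trading a little accuracy for a lot of speed and storage. A rough sketch (the dimension counts are assumptions, not tied to any particular model):

    import numpy as np

    def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
        """Keep the first `dims` components and re-normalize to unit length."""
        truncated = vec[:dims]
        return truncated / np.linalg.norm(truncated)

    # Stand-in for a full-size embedding; 768 dims is illustrative.
    full = np.random.randn(768)
    full /= np.linalg.norm(full)

    small = truncate_embedding(full, 128)   # cheap first-pass vector
    medium = truncate_embedding(full, 384)  # higher-fidelity pass when needed

In practice you'd typically use the small vectors for a cheap first-pass search and re-rank the survivors with the larger ones.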