I'm not really sure I understand how these work. Do you just feed it a large text document, like a transcript, and it turns that into a more machine-readable vector format?
Or is it just a much smaller LLM that’s more optimized for reading than generating?
Basically, yes.
I’ve only built my own systems that use sentence transformers.
You pass in a list of strings and it generates a list of vectors, one per string. Those vectors can then be used for all sorts of similarity analysis and retrieval.
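To make the "list of strings in, list of vectors out" idea concrete, here is a rough sketch of that flow in plain Python. The `embed` function below is a toy stand-in that I made up for illustration (it just counts words against a fixed vocabulary), not what a sentence transformer actually does; a real model produces dense learned vectors that capture meaning rather than word overlap. The sample documents, the query, and the helper names are all assumptions. With the sentence-transformers library, the toy `embed` would typically be replaced by something like `SentenceTransformer(model_name).encode(docs)`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts):
    # Fixed word list so every vector has the same length.
    return sorted({tok for t in texts for tok in tokenize(t)})


def embed(texts, vocab):
    """Toy stand-in for an embedding model: list of strings -> list of vectors."""
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for text in texts:
        vec = [0.0] * len(vocab)
        for word, count in Counter(tokenize(text)).items():
            if word in index:  # words outside the vocabulary are ignored
                vec[index[word]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec))
        # Normalize to unit length so a dot product equals cosine similarity.
        vectors.append([x / norm for x in vec] if norm else vec)
    return vectors


def cosine(a, b):
    # Both inputs are unit length (or all zeros), so the dot product suffices.
    return sum(x * y for x, y in zip(a, b))


# Hypothetical documents and query, purely for illustration.
docs = [
    "The meeting transcript covers the quarterly budget",
    "How to bake sourdough bread at home",
    "Budget planning notes for the next quarter",
]
vocab = build_vocab(docs)
doc_vectors = embed(docs, vocab)
query_vector = embed(["quarterly budget meeting"], vocab)[0]

# Retrieval: score every document against the query and pick the closest.
scores = [cosine(query_vector, v) for v in doc_vectors]
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

The retrieval step is the same whatever produces the vectors: embed the documents once, embed the query, then rank by similarity. Swapping the toy function for a real model changes the quality of the matches, not the shape of the code.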