• panda_abyss@lemmy.ca
    10 days ago

    I’ve been looking forward to trying this one.

    I have some use cases that need large-scale data cleanup, where an LLM is overkill and I already get good results with smaller embeddings.

    I want to try this model and use its Matryoshka dimension reduction to handle progressively more complex use cases.
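
For anyone unfamiliar with the Matryoshka idea, here is a rough sketch of how truncation is usually applied: keep only the first few components of an embedding and rescale to unit length, so cheap tasks use short vectors and harder ones use more of the vector. The vectors below are made up for illustration, and `truncate_embedding` and `cosine` are hypothetical helper names, not part of any library.

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up 8-dimensional vectors standing in for real model output.
full_a = [0.6, 0.5, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
full_b = [0.5, 0.6, 0.2, 0.3, 0.0, 0.10, 0.01, 0.03]

# Compare the same pair at increasing sizes.
for dim in (2, 4, 8):
    a = truncate_embedding(full_a, dim)
    b = truncate_embedding(full_b, dim)
    print(dim, round(cosine(a, b), 3))
```

Truncation like this is only meaningful for models trained with a Matryoshka-style objective. On an ordinary embedding model, the leading components are not guaranteed to carry the most information.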

    • mierdabird@lemmy.dbzer0.com
      10 days ago

      I'm not really sure I understand how these work. Do you just feed it a large text document, like a transcript, and it turns it into a more machine-readable vector format?

      Or is it just a much smaller LLM that’s more optimized for reading than generating?

      • panda_abyss@lemmy.ca
        10 days ago

        Basically yes.

        I’ve only built my own systems that use sentence transformers.

        You pass in a list of strings and it generates a list of vectors. Those vectors can be used for all sorts of similarity analysis and retrieval.
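
To make the strings-in, vectors-out flow concrete, here is a minimal sketch that runs without downloading anything. `toy_embed` is a hypothetical stand-in (a plain bag-of-words counter) for a real embedding model. With sentence transformers you would call the model's `encode` method on the list of strings instead, and the vectors would capture meaning rather than just shared words.

```python
import math

def toy_embed(texts):
    """Hypothetical stand-in for a real model: one unit-length word-count vector per string."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for text in texts:
        vec = [0.0] * len(vocab)
        for word in text.lower().split():
            vec[index[word]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm else vec)
    return vectors

def rank_by_similarity(query, documents):
    """Return (score, document) pairs, most similar to the query first."""
    # Embed everything in one call: a list of strings in, a list of vectors out.
    vectors = toy_embed(documents + [query])
    doc_vectors, query_vector = vectors[:-1], vectors[-1]
    # The vectors are unit length, so the dot product is the cosine similarity.
    scored = [
        (sum(q * d for q, d in zip(query_vector, doc_vector)), document)
        for doc_vector, document in zip(doc_vectors, documents)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = [
    "invoice number missing from customer record",
    "customer record has duplicate email address",
    "shipping address contains invalid postal code",
]

for score, doc in rank_by_similarity("duplicate email in customer record", docs):
    print(round(score, 3), doc)
```

The retrieval step stays the same whichever model produces the vectors: embed, compare, sort.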

  • ryokimball@infosec.pub
    10 days ago

    Looking at the website, I do not see a way to directly download the model. I am on my phone and want to try it out on PocketPal. Anyone see a way to do this?

    Edit: Oh, it’s on Hugging Face already.