• NerdsGonnaNerd@sh.itjust.works
    1 month ago

    What hardware are you running them on? I am interested in self-hosting an LLM myself, but I am not sure which hardware I need. How do you think these self-hosted variants compare to, for example, Claude Sonnet 4.6?

            • pishadoot@sh.itjust.works
              16 days ago

              Rookie question, forgive me:

              How are the scores generated? How do you get 7/8.5 on a complicated ethical question? How are these scales even defined?

                • pishadoot@sh.itjust.works
                  15 days ago

                  Ok, I really really appreciate the depth you’ve put into your answers.

                  I always look at these grading rubrics people post for models and I’ve never seen an example of how they get ranked.

                  At this point I don’t think I’ll be ranking models myself. I’m not an enthusiast (yet), just running some ~30B models at home for various things and trying to stay afloat in what is a significantly more complicated ecosystem than I had imagined when I started.

                  But I really appreciate what you’ve written and I’m going to save all this.

                  Last questions: I see that you used Claude to come up with your test questions, right? How do you validate the anchor answers if you’re not an expert in the field?

                  Do you do this professionally?