• 0 Posts
  • 42 Comments
Joined 1 year ago
Cake day: June 4th, 2023


  • mathematically “correct” sounding output

    It’s hard to say because that’s a rather ambiguous way of describing it (“correct” could mean anything), but it is a valid way of describing its mechanisms.

    “Correct” in the context of LLMs would be a token that is likely to follow the preceding sequence of tokens. In fact, the model computes a probability for every possible token, takes a random sample according to that distribution* to choose the next token, and repeats until some termination condition (e.g. an end-of-sequence token). The training side of this is what we call maximum likelihood estimation (MLE) in machine learning (ML): we learn a distribution that makes the training data as likely as possible. MLE is indeed the basis of a lot of ML, but not all.

    *Oversimplification.
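    To make the loop concrete, here’s a toy sketch of “compute a distribution, then sample”: a softmax over made-up logits for a 3-token vocabulary. This is just an illustration of the mechanism, not any particular model’s implementation:

    ```python
    import numpy as np

    def sample_next_token(logits, rng):
        # Softmax turns raw scores into a probability distribution over the vocabulary.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Draw the next token at random according to that distribution.
        return rng.choice(len(probs), p=probs)

    rng = np.random.default_rng(0)
    logits = np.array([2.0, 1.0, 0.1])  # toy scores for a 3-token "vocabulary"
    token = sample_next_token(logits, rng)
    ```

    Real systems layer things like temperature scaling and top-k/top-p filtering on top of this step, which is part of what the asterisk is glossing over.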



  • I don’t understand the image. Is that supposed to be a Venn diagram?

    Anyway, to answer your question, I use GitHub Copilot for all of my coding work, and ChatGPT here and there throughout the week. They’ve both been great productivity boosters. Sometimes AI also gets foisted on me when I don’t want it, like when I’m trying to talk to customer service, or when Notion tries to put words in my mouth because I accidentally hit the wrong keyboard shortcut.




  • It’s not completely subjective. Think about it from an information theory perspective. We want a word that maximizes the amount of information conveyed, and there are many situations where you need a word that distinguishes AGI, LLMs, deep learning, reinforcement learning, pathfinding, decision trees and the like from the outputs of other computer science subfields. “AI” has historically been that word, so redefining it without a replacement means we don’t have a word for this thing we want to talk about anymore.

    I refuse to replace a single commonly used word in my vocabulary with a full sentence. If anyone wants to see this changed, then offer an alternative.







  • This article got me curious about how these 1-bit models worked so I read up on it a bit.

    https://arxiv.org/html/2402.11295v3

    The model parameters aren’t completely converted to 1-bit. Each weight matrix is decomposed into a sign matrix (the 1-bit part) and two full-precision vectors, which together form a rank-1 approximation of the original matrix’s magnitudes. So if I understand correctly, everything still functions the same way as a regular transformer: input vectors, intermediate values, and outputs are all full precision and have no problem going through nonlinearities.
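    If I’ve read the paper right, the decomposition can be sketched like this: split a weight matrix into its signs and its magnitudes, then take the best rank-1 fit of the magnitude matrix via SVD. This is my own rough NumPy illustration of the idea, not the paper’s actual code:

    ```python
    import numpy as np

    def sign_rank1_approx(W):
        # Split W into its sign pattern (the "1-bit" part) and its magnitudes.
        S = np.sign(W)
        A = np.abs(W)
        # Best rank-1 approximation of the magnitude matrix (Eckart-Young).
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        a = U[:, 0] * np.sqrt(s[0])   # full-precision column vector
        b = Vt[0, :] * np.sqrt(s[0])  # full-precision row vector
        # Reconstruct: sign matrix times the rank-1 magnitude approximation.
        return S * np.outer(a, b)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 16))
    W_hat = sign_rank1_approx(W)
    ```

    Because the SVD gives the best rank-1 fit to the magnitudes, this reconstruction is never worse than keeping the signs alone, and the two extra vectors cost only m + n full-precision values on top of the m × n bits.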





  • Personally, I can’t use bookmarks because if they’re out of sight, they get forgotten. Keeping things in an open tab is like having the browser constantly bug me with a reminder that I have to do this thing. It doesn’t guarantee the thing gets addressed in a timely manner, but the alternative guarantees it never gets done at all.

    It also helps me keep my place in my work. There are things I’ll always have open because I need quick access to them and don’t want the friction of hunting for the page to turn into procrastination. The same goes for anything relevant to work in progress.


  • Responding to your first two paragraphs:

    The enjoyability of a piece of art isn’t independent of its creator. I’ll only speak for myself since I don’t know other people’s experiences. When you see something that tickles the happy part of your brain, part of that emotional response is knowing that there’s another person out there who probably felt the same way and wanted to share those feelings with you. In experiencing those emotions, you also experience a connection with another human being: the knowledge that you’re not alone, that someone else out there has felt the same thing. I wouldn’t read through the credits, because I don’t care who that person is; I just care that this person existed. When you look at AI-generated work and it feels empty despite the surface beauty, this is the missing piece. It’s the human connection.