A jargon-free explanation of how AI large language models work

Gaywallet (they/it)@beehaw.org · 1 year ago

pezhore@lemmy.ml · 1 year ago

Does anyone else start freaking out when we have such complex programs that researchers don’t fully understand how they work?

Gaywallet (they/it)@beehaw.org · 1 year ago

For what it’s worth, a lot of medicine works this way. I’m fairly certain it isn’t the only field, either. I’d imagine studying ecology or space feels similar.

Czorio@kbin.social · 1 year ago

We know how they work, otherwise we couldn’t design and implement them. What we don’t really know, and don’t really have to know, is the exact parameter values the model ends up with after training.

The issue you’re thinking of is that any one parameter does not necessarily map to one aspect of the model’s behavior; the parameters only make the whole work as a coherent collection. Some interesting insights can be gleaned from trying to figure out these relationships, but with the sheer number of parameters (billions!) it gets a little much to get your head around.
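
To make that concrete, here is a rough Python sketch with a toy network (2 inputs, 2 hidden units, 1 output). The two parameter sets are hand-picked and hypothetical, not trained, and the names `tiny_net`, `PARAMS_A` and `PARAMS_B` are made up for this example. Both sets compute XOR, yet almost no individual number matches between them, so no single parameter "is" the XOR behavior.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tiny_net(params, x1, x2):
    """2 inputs -> 2 sigmoid hidden units -> 1 sigmoid output."""
    (w11, w12, b1), (w21, w22, b2), (v1, v2, c) = params
    h1 = sigmoid(w11 * x1 + w12 * x2 + b1)
    h2 = sigmoid(w21 * x1 + w22 * x2 + b2)
    return sigmoid(v1 * h1 + v2 * h2 + c)


# Hypothetical, hand-picked parameters (assumption: not from any real model).
# Each tuple is (weight, weight, bias) for hidden unit 1, hidden unit 2, output.
PARAMS_A = ((20, 20, -10), (20, 20, -30), (20, -20, -10))
PARAMS_B = ((-20, -20, 30), (20, 20, -10), (20, 20, -30))


def truth_table(params):
    """Rounded outputs for inputs (0,0), (0,1), (1,0), (1,1)."""
    return [round(tiny_net(params, a, b)) for a in (0, 1) for b in (0, 1)]


if __name__ == "__main__":
    print("A:", truth_table(PARAMS_A))
    print("B:", truth_table(PARAMS_B))
```

With only nine parameters you can still reason your way through both versions by hand. The point of the comment above is that this stops being practical long before you reach billions.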

PlatinumPangolin@kbin.social · 1 year ago

The whole “we don’t know how they work” thing is a bit overblown. We have all the formulas, and we know exactly how the math and code work. You can go and look at the weights for every node; you’re just not going to derive any meaning from them, or necessarily be able to explain why one number works better than another.
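
As a small illustration, here is a sketch in plain Python of what "looking at the weights" amounts to. The numbers in `LAYERS` are made up for this example, and the function names are hypothetical. The forward pass is just multiply, add, and squash, and every parameter can be printed, but the printout is only a list of numbers.

```python
import math

# Made-up weights for illustration only (assumption: not from a real model).
# Each layer has one row of weights and one bias per output node.
LAYERS = [
    {"weights": [[0.42, -1.37], [2.05, 0.11], [-0.78, 0.93]],
     "biases": [0.10, -0.55, 0.31]},
    {"weights": [[1.26, -0.64, 0.87]],
     "biases": [-0.20]},
]


def forward(layers, inputs):
    """Run the inputs through each layer: weighted sum plus bias, then tanh."""
    activations = list(inputs)
    for layer in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(layer["weights"], layer["biases"])
        ]
    return activations


def count_parameters(layers):
    """Total number of weights and biases."""
    n_weights = sum(len(row) for layer in layers for row in layer["weights"])
    n_biases = sum(len(layer["biases"]) for layer in layers)
    return n_weights + n_biases


def dump_parameters(layers):
    """Print every weight and bias; fully inspectable, not self-explanatory."""
    for i, layer in enumerate(layers):
        for j, (row, b) in enumerate(zip(layer["weights"], layer["biases"])):
            print(f"layer {i} node {j}: weights={row} bias={b}")


if __name__ == "__main__":
    dump_parameters(LAYERS)
    print("parameters:", count_parameters(LAYERS))
    print("output:", forward(LAYERS, [1.0, 2.0]))
```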

u_tamtam@programming.dev · edit-2 1 year ago

This is the definition of complexity, isn’t it? The fact here is that we can’t scale up our understanding of the small-scale pieces to make sense of the bigger picture. Having worked myself with (much simpler) artificial neural networks, I think it’s very much correct and to the point to say that “we don’t know how it works”. I would even go further and claim that we will never fully know how it works: the weights in the network in essence form structures that do what they do, and that we can recognize by analogy (e.g. logic gates, contour extractors, …), but this is an anthropomorphic approximation, which moreover only holds in a certain range of values/set of conditions. Had we a formal definition of what the weights represent, we would then be dealing with a (much simpler and more efficient) algorithm in the traditional sense (with cleanly delineated and rigorously defined specialized functions).
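
A toy sketch of the "logic gate by analogy" point, in Python. The weights here are hand-picked and hypothetical, and `neuron` is a name made up for this example. On clean 0/1 inputs this single sigmoid neuron looks like an AND gate, but outside that range the analogy stops holding.

```python
import math


def neuron(x1, x2, w1=20.0, w2=20.0, bias=-30.0):
    """One sigmoid neuron; the default weights are an illustrative assumption."""
    return 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + bias)))


if __name__ == "__main__":
    # On binary inputs it behaves like AND: near 1 only when both inputs are 1.
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(neuron(a, b)))

    # Off the binary grid it is no longer a gate in any crisp sense:
    # the weighted sum is exactly 0 here, so the output sits at 0.5.
    print(neuron(0.75, 0.75))
```

Calling it "an AND gate" is a label we put on it for the inputs we happened to test, not something the weights themselves declare.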