• 10 Posts
  • 406 Comments
Joined 1 year ago
Cake day: July 4th, 2023


  • Yeah, I agree. And I appreciate your perspective.

    I don’t think growing Lemmy and funnelling in users works out. We don’t grow. We’re somewhere between 40k and 50k active users and there is no trend in either direction.

    Last year, I despised beehaw for doing their own thing and not respecting how federation is supposed to work: connecting people, rather than being a patchwork of small spaces that don’t talk to each other because of narrow perspectives… I think I’ve changed my mind a bit. Their way of doing things turned out to foster better behaviour than on other instances. It’s still detrimental to the idea of a federated platform, but still… the effects aren’t purely negative.

    I think we have lots of issues here. The culture is a bit different from what I’d like it to be. The atmosphere is a tiny bit above Reddit, but on the downside it lacks the (niche) experts; it’s mostly average people here voicing the most predominant opinions. Furthermore, there’s too much discussing of the news and not much else that’d be meaningful for my life. It’s too small for lots of things this place could excel at and that you won’t find anywhere else.

    And the technology really isn’t that good. Progress is super slow, and they don’t implement the things users need and wish for. And it doesn’t foster growth or nice behaviour.

    And I think that’s the main issue. We’d need a solid basis to build something upon. It needs to be shiny and have excellent moderation tools and user-facing features. All of this has been requested, but apart from things like user-level instance blocking (which doesn’t even block that instance’s users), we didn’t get much.

    My personal wish is that new approaches like PieFed will go ahead and provide that to us. I think I’d like to host an instance with that and then invite some people. So far I haven’t advertised for Lemmy, because I think neither the software, nor the atmosphere/community, nor the content here is worth convincing anyone to join. At this point I’m just waiting for one of the three to get anywhere. But I think I’d also like to defederate from a few people. And force them to be nice, upvote replies, not just dump random links but provide some text in a post, and have some niche-interest communities, because just dumping links to news and posting memes isn’t cutting it. We already have X and Mastodon for that…

    And a little disclaimer: I’m being negative in this comment. But that’s not all there is to it. There is a reason why I’m here. I regularly have nice interactions, learn new things and have good conversations. It’s just that those moments are few and far between, and I see lots of potential for more. And I’d really like that to become reality.



  • I think most people use something like exllamav2 or vllm, or do GGUF inference (e.g. with llama.cpp), and it seems none of those projects has properly implemented multimodality or this specific model architecture yet.

    You might just be at the forefront of things and there isn’t yet any beaten path you could follow.

    The easiest thing you could do is use something that already exists (like existing 4-bit models), wait a few weeks, and then upgrade. You can also always quantize models yourself and set the parameters however you like, provided you have an inference framework that supports your model (including the adapters for vision) and offers the quantization levels you’re interested in…
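    If you do quantize yourself, the core idea is simpler than the tooling suggests. Here’s a toy sketch of symmetric block-wise 4-bit quantization, purely illustrative; it’s not how any particular framework (llama.cpp, exllamav2, bitsandbytes) actually lays out its quants:

    ```python
    import numpy as np

    def quantize_4bit(block):
        """Symmetric 4-bit quantization: store one float scale per block
        and the weights as small integers in [-7, 7]."""
        scale = max(float(np.abs(block).max()), 1e-12) / 7.0
        q = np.clip(np.round(block / scale), -7, 7).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover approximate float weights from the integers + scale.
        return q.astype(np.float32) * scale

    weights = np.random.randn(64).astype(np.float32)
    q, s = quantize_4bit(weights)
    restored = dequantize(q, s)
    # Rounding error is bounded by half the scale per weight.
    max_err = float(np.abs(weights - restored).max())
    ```

    Real formats mostly differ in block size, whether the scale itself is quantized, and how outlier weights are handled, which is where the quality differences between quant levels come from.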


  • Well, I’d say there is information in language. That’s kinda the point of it and why we use it. And language is powerful. We can describe and talk about a lot of things. (And it’s an interesting question what can not be described with language.)

    I don’t think the stochastic parrot thing is a proper debate. It’s just that lots of people don’t know what AI is and what it can and cannot do. And it’s neither easy to understand, nor are the consequences always that obvious.

    Training LLMs involves some clever trickery, like limiting their size, so they can’t just memorize everything but are instead forced to learn the concepts behind those texts.

    I think they form models of the world inside of them, at least of things they’ve learned from the dataset. That’s why they can, for example, translate text: they have some concept of a cat stored inside them and can apply it in a different language that uses entirely different characters to name that animal.

    I wouldn’t say they are “tools to learn more aspects about nature”. They aren’t a sensor or something. And they can infer things, but not ‘measure’ things like an X-ray.






  • I’m pretty sure he did this out of his own motivation, because he thinks (or thought) it’s a fascinating topic. So sure, this doesn’t align with popularity. But it’s remarkable anyway, you’re right. And I always like to watch the progression. As far as I remember, the early videos lacked the professional audio and video standards that are nowadays the norm on YouTube. At some point he must have bought better equipment, but his content has been compelling since the start of his YouTube ‘career’. 😊

    And I quite like the science content on YouTube. There are lots of people making really good videos, from professional video producers as well as from scientists (or hobbyists) who just share their insights and interesting perspectives.




  • Yeah, doesn’t really work. I mean, it has a rough idea that it needs to go east. And I’m surprised that it knows which interstates are in an area, and even a few street names in the cities. I’m really surprised. But I told it to get me from Houston to Montgomery, as in your example. And in Houston it just lists random street names that aren’t even connected and are in different parts of the city. Then it drives north on the I-45 and somehow ends up in the south on the I-610-E and finally the I-10-E. But then it makes up some shit, somehow drives to New Orleans, then a bit back, and zig-zags its way back onto the I-10. Then come some more instructions I didn’t fact-check, and it gets that it needs to go through Mobile and then north on the I-65.

    I’ve tested ChatGPT on Germany, and it also gets which Autobahn connects to the next. It still does occasional zig-zags, and in between it likes to do an entire 50 km (30 mile) loop that ends up two cities back the way it came… then it drives east again and takes a different exit on the second try.

    However: I’m really surprised by the level of spatial awareness. I wouldn’t have expected it to come up with mostly correct cardinal directions, interstates that are actually connected and run through the mentioned cities, and plausible cities in between.

    I don’t think I need to try “phi”. Small models have very limited knowledge stored inside of them. They’re too small to remember lots of things.

    So, you were right. Consider me impressed. But I don’t think there is a real-world application for this unless your car has a teleporter built in to deal with the inconsistencies.
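    For contrast, actual routing doesn’t need a language model at all: it’s just a shortest-path search over a road graph. A minimal sketch, with a made-up highway graph whose city names roughly follow the I-10 corridor but whose distances are illustrative, not real figures:

    ```python
    import heapq

    # Toy undirected highway graph; edge weights are rough, made-up km values.
    GRAPH = {
        "Houston": {"Beaumont": 135},
        "Beaumont": {"Lafayette": 160},
        "Lafayette": {"Baton Rouge": 90},
        "Baton Rouge": {"Mobile": 290},
        "Mobile": {"Montgomery": 270},
    }

    def shortest_route(graph, start, goal):
        """Dijkstra's algorithm over an undirected road graph."""
        adj = {}
        for a, nbrs in graph.items():
            for b, d in nbrs.items():
                adj.setdefault(a, {})[b] = d
                adj.setdefault(b, {})[a] = d
        dist, prev = {start: 0}, {}
        pq = [(0, start)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == goal:
                break
            if d > dist.get(u, float("inf")):
                continue  # stale queue entry
            for v, w in adj.get(u, {}).items():
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
        # Walk the predecessor chain back from the goal.
        path = [goal]
        while path[-1] != start:
            path.append(prev[path[-1]])
        return path[::-1], dist[goal]

    route, km = shortest_route(GRAPH, "Houston", "Montgomery")
    ```

    Unlike an LLM recalling road trivia from training data, this always returns a connected route or fails loudly, which is exactly the consistency the teleporter would otherwise have to provide.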