StackOverflow activity down to 2008 numbers

ikt@aussie.zone · 10 months ago

StackOverflow activity down to 2008 numbers

Elgenzay@lemmy.ml · 10 months ago

Think in the future LLMs will perform worse on modern problems due to the lack of recent StackOverflow training data?

HelloRoot@lemy.lol · edit-2 10 months ago

StackOverflow training data

Q: detailed problem description with research and links explaining how problem is different from existing posts and that the mentioned solutions did not work for this case.

A: duplicate. (links to same url Q explicitly mentioned and explained)

mindbleach@sh.itjust.works · 10 months ago

Don’t need eight billion parameters to go “But why do you want that?”

atzanteol@sh.itjust.works · 10 months ago

I suspect it may be a self-balancing problem. For topics that llms don’t do well there will be discussions in forums. Then the AI will have training data and catch up.

ikt@aussie.zone · edit-2 1 month ago

At the current rate yeah, it simply isn’t good enough, my go to question is print Hello World in brainfuck and then it passes that have it print Hello <random other place>

In this case I just asked it ‘I have a question about brainfuck’ and it gave an example of Hello World! Great!

Unfortunately it just outputs “HhT”

So I know that they are trying hard with synthetic data:

https://www.youtube.com/watch?v=m1CH-mgpdYg

but I think fundamentally they just need to be straight better at absorbing the data that they’ve already got

cevn@lemmy.world · 10 months ago

I think the disconnect we are experiencing is how the AI will write some code and never execute it. It should absolutely be trying to compile it in some sandbox if we had a really smart AI , thru installing it on some box. Maybe someone has already come up with this.

ikt@aussie.zone · 1 month ago

it’s learning 😃

markovs_gun@lemmy.world · 10 months ago

I think so. I am legitimately worried about what happens in 10 years with everyone relying on llms to code when nobody seems to be planning for how things will work when LLM coding is nearly universal

mindbleach@sh.itjust.works · 10 months ago

2005 post, s/LLM/Google/g.

vrighter@discuss.tchncs.de · 10 months ago

there’s nothing to plan for. Shit will be broken, shit is already expected to be broken nowadays, business as usual. I hate what programming has become.

ikt@aussie.zone · 10 months ago

Do you realise what sub you’re in?

ikt@aussie.zone · 10 months ago

I do wonder if a new programming language will be invented that is ‘ai friendly’ and far more better integrated

markovs_gun@lemmy.world · 10 months ago

The main concern for me is how that would even work. LLMs struggle to come up with anything truly novel, and are mostly copying from their training set. What happens when 99% of the training corpus for a programming language is AI code or at least partially AI code? Without human data to start with how do LLMs continue to get better? This is kind of an issue with everything LLMs do but especially programming.

ikt@aussie.zone · edit-2 10 months ago

I’m thinking more along the lines of a new programming language unlike any programming language ever made, simply made for an LLM to produce, like machine generation of machine code (but who knows, LLM’s in themselves are frankly magic to me, last thing I want to do is be like someone in the early 1900’s predicating in the year 2000 we’ll all use advanced hot air balloons to move about)

mindbleach@sh.itjust.works · 10 months ago

2035: BASIC supremacy.

supersquirrel@sopuli.xyz · edit-2 9 months ago

imagine if there were a plethora of them already lurking out there in the deep?

isn’t it strange then that you don’t already have an AI overlord then?

Rexios@lemm.ee · 10 months ago

Maybe but a lot of StackOverflow answers come straight from documentation anyways so it might not matter

brucethemoose@lemmy.world · 10 months ago

LLMs with more native access to documentation should do OK.

There’s potential for a sea of “niche” LLMs too. A good example is this version of Qwen finetuned just to write CUDA: https://huggingface.co/cognition-ai/Kevin-32B

Kühlschrank@lemmy.world · 10 months ago

Do llms get the bulk of their training date from Stack? Legitimately curious as I am sure they do get at least some training from non Q&A style sources

Vince@lemmy.world · 10 months ago

Is the drop all due to AI?

Pennomi@lemmy.world · 10 months ago

The fast drop yes, but really it’s been in decline for around a decade before that.

MrZee@lemm.ee · 10 months ago

Interesting! When I first read your comment, I looked at the chart and thought “it looks to me like the drop starts at the end of 2022. Isn’t that before LLMs started being used broadly?”

Nope. Looks like ChatGPT was released in November 2022. It doesnt feel like it’s been around that long, but I guess it has.

unexposedhazard@discuss.tchncs.de · 10 months ago

They also announced their AI stuff in July 2023 https://stackoverflow.blog/2023/07/27/announcing-overflowai/

MudMan@fedia.io · 10 months ago

The drop starts in 2013, but people were certainly ready to all bail at once by the time LLMs came around.

Vince@lemmy.world · 10 months ago

That sucks, is there an alternative people are using? seems like it would still be a useful knowledge base to have

HellieSkellie@lemmy.dbzer0.com · 10 months ago

The common alternative is to just ask ChatGPT your software questions, get false information from the AI, and then try and push that horrible code to production anyway if my past two jobs are any indicator.

Stack Overflow is still useful to find old answers, but fucking sucks to ask new questions on. If you aren’t getting an AI answer to your question, then you’re getting your question deleted for some made up reason.

The real answer that everyone hates is: If you have a question about something, read the documentation and experiment with it to figure that something out. If the documentation seems wrong, submit an issue report to the devs (usually on GitHub) and see what they say.

The secondary answer is that almost everything FOSS has a slack channel or even sometimes discord channels. Go to the channels and ask people who use/make whatever tool you need help with.

atzanteol@sh.itjust.works · 10 months ago

The common alternative is to just ask ChatGPT your software questions, get false information from the AI, and then try and push that horrible code to production anyway if my past two jobs are any indicator.

If you have developers pushing bad and broken code to production your problem isn’t AI.

HellieSkellie@lemmy.dbzer0.com · 10 months ago

deleted by creator

Psaldorn@lemmy.world · 10 months ago

That and they cover up half the fucking page when you try to view it. Google login, giant cookie popup etc

magic_lobster_party@fedia.io · 10 months ago

I believe it’s more of a generational shift.

The age groups who used to rely on SO are now skilled enough not to rely on it as much (or they more often have the types of questions SO can’t answer).

Younger age groups probably prefer other means of learning (like ChatGPT, Discord and YouTube videos).

shaserlark@sh.itjust.works · edit-2 10 months ago

Yeah I’m working in some niche and there is a stackoverflow that they refer newbies to because "no developer support on their discord“. But if you ask a question there no one will ever answer, otoh if you know where and how to ask you’ll actually get help on discord. I feel like SO is pretty much dead with anything where change happens quickly.

errer@lemmy.world · 10 months ago

There’s also only so many ways to ask how to sort a list or whatever and SO removes duplicate questions. So at some point the number of unique questions asked begins to plateau. I think that explains the slow drop before LLMs came on the scene.

Korhaka@sopuli.xyz · 10 months ago

I assumed it was because stackoverflow already had all the answers I needed except for the things too obscure to search for that result in my crying and trying to piece it together from scraps of info on 50 different tabs.

calcopiritus@lemmy.world · 10 months ago

Yes. But not just in the “obvious” way.

I first started to contribute back when LLMs first appeared. Then SO allowed became LLM training grounds. Which made me stop contributing instantly.

I guess a not-insignificant amount of people stopped answering questions, which means less search results, which ends in less traffic.

I’m sure the fall wouldn’t be as big as it is if they didn’t allow LLMs to train on their data.

FaceDeer@fedia.io · 10 months ago

How do you disallow LLMs to train on their data while still allowing humans to train on their data?

calcopiritus@lemmy.world · 10 months ago

If they can charge for it. It means they can block it. https://www.wired.com/story/stack-overflow-will-charge-ai-giants-for-training-data/

You can also rate-limit. Blacklist known scrapper IPs.

And if it doesn’t work. You make signing-in not optional. Which makes rate-limiting way easier.

The rate of human data consumption is much lower than LLM’s. The humans won’t even notice that they have a rate limit. At most they would only notice the need to create a stack overflow account.

juli@lemmy.world · 10 months ago

It probably started off when reddit/discord became a friendly place for troubleshooting (code among other things), then the AI dropped it off the cliff.

dragon-donkey3374@sh.itjust.works · 10 months ago

Rammed it off the cliff

JackbyDev@programming.dev · 10 months ago

Nah, that drop comes WELL before AI answers. Look at the dates. They’ve had a culture of people overly aggressively closing new questions for pointless/irrelevant reasons as well as being generally nasty to new users for ages. Sure, it started dropping way faster post 2020 because of AI, but the problem was already there.

resipsaloquitur@lemm.ee · edit-2 10 months ago

It’s not LLMs — see the peak at 2013. They aggressively started closing any “duplicate” questions around then. The whole premise was that experts were supposed to answer questions for clout that would bolster their resume, but after getting silenced a few times, why would they come back? And anyone with the temerity to ask a question that was asked (with or without a good answer, ten years ago) would also never come back after getting shut down.

They couldn’t decide if they were a forum or Wikipedia and became neither.

JackbyDev@programming.dev · 10 months ago

Note that the decline began well before “AI” stuff became a thing. Stack Overflow has had a major culture problem as well as not treating their users with respect for ages.

For the part about respecting users, they have a history of ignoring Meta (their site specifically for talking about Stack Overflow site itself) while acting like they use it.

Zarathustra@lemmy.world · edit-2 10 months ago

In the future we will be dependent on LLMs for everything because the only people with enough money to maintain libraries of data which are untainted by LLMs will be the people who own the LLMs.

Step 1: Steal all of the data (including copyrighted stuff)

Step 2: Poison the well

Step 3: Profit

zr0@lemmy.dbzer0.com · 10 months ago

Not surprised, even without the LLM boom, StackOverflow was doomed for the same reason reddit is doomed: power tripping bastards, gatekeeping everything which is not part of their narrow minded world.

LaunchesKayaks@lemmy.world · 10 months ago

Posted once in stack overflow in college and got absolutely destroyed. I was not let down lmao

letsgo@lemm.ee · 10 months ago

I’m not surprised. StackOverflow has moderated itself out of relevance. Ask a question and get flamed. DDG a question plus “stackoverflow” and get something that may well have been correct and useful in 2012 but tech moves on and it’s now archaic trivia, somewhat akin to facts about punched cards. “Help me StackOverflow, you’re my only hope” hasn’t been true for quite some time now.

JackbyDev@programming.dev · 10 months ago

They moderated themselves out of relevance because when you ask new questions that aren’t duplicates they still close them as duplicates.

Possibly linux@lemmy.zip · 10 months ago

I keep getting the Cloudflare checks