ProgramBench, a new benchmark from Facebook/Meta (by the SWE-Bench creators), tests whether LLMs can recreate real executable programs (ffmpeg, SQLite) from scratch with no internet access. They all score 0%.

beep@piefed.world · 3 hours ago

Jiral@lemmy.org · 59 minutes ago

I am not a fan of AI slop and low-effort vibe coding, and I hate AI images, music, and videos with a passion, but it should be pointed out that this study was about re-programming a complete program from scratch, from documentation.

On that task, all models failed miserably. The study was not about smaller work packages or narrower tasks.

./ProgramBench

Can language models rebuild programs from scratch?