Intro

Some claim that fully autonomous software engineering is around the corner, that developers will be obsolete within a few months.

“we just killed software engineering”

Their sales pitch? Saturated benchmarks. But these evaluations don’t represent the work that great engineers do every day on a complex codebase.

Coding is not just vibecoding.

It’s a great narrative to push when you are a coding-agent company raising money, but it’s simply not true.

The best engineers in the world can perform tasks that are far beyond the current capabilities of LLMs.

And to prove it, we need to set up a fair game. Code that has never been seen by LLMs. Well-scoped and feasible tasks.

We will award grants to 100 incredible engineers working on 100 hardcore closed-source projects. From this work, we will create the ultimate benchmark.

A coding benchmark on which LLMs have a 0% success rate.

Never bet against humanity.
