On Coding Agents

It's the end of software engineering as we know it


Here’s today at a glance:

🔮 Devin-ing the Future

Cognition Labs, backed by Founders Fund among others, launched Devin, an AI coding agent.

There is no open demo, but the pre-recorded one shows Devin building websites from single-line prompts, including:

  • investigating which libraries to use

  • recovering from errors

  • using print statements for debugging

  • reading API documentation

  • keeping track of various steps in the process

  • providing audit access

Devin was built by a team led by Scott Wu, an International Olympiad in Informatics (IOI) legend.

Scott was always legendary by the way

Devin’s team has 10 IOI gold medals between the 9 co-founders, which is absolutely insane if you think about it.

Andrej Karpathy, former Tesla AI head, had this to say:

# automating software engineering

In my mind, automating software engineering will look similar to automating driving. E.g. in self-driving the progression of increasing autonomy and higher abstraction looks something like:

1. first the human performs all driving actions manually

2. then the AI helps keep the lane

3. then it slows for the car ahead

4. then it also does lane changes and takes forks

5. then it also stops at signs/lights and takes turns

6. eventually you take a feature complete solution and grind on the quality until you achieve full self-driving.

There is a progression of the AI doing more and the human doing less, but still providing oversight. In Software engineering, the progression is shaping up similar:

1. first the human writes the code manually

2. then GitHub Copilot autocompletes a few lines

3. then ChatGPT writes chunks of code

4. then you move to larger and larger code diffs (e.g. Cursor copilot++ style, nice demo here https://youtube.com/watch?v=Smklr44N8QU)

5....

Devin is an impressive demo of what perhaps follows next: coordinating a number of tools that a developer needs to string together to write code: a Terminal, a Browser, a Code editor, etc., and human oversight that moves to increasingly higher level of abstraction.

There is a lot of work not just on the AI part but also the UI/UX part. How does a human provide oversight? What are they looking at? How do they nudge the AI down a different path? How do they debug what went wrong? It is very likely that we will have to change up the code editor, substantially.

In any case, software engineering is on track to change substantially. And it will look a lot more like supervising the automation, while pitching in high-level commands, ideas or progression strategies, in English.

Good luck to the team!

This is not actually a promising quote, as Karpathy seems to imply a 10-year or more ramp to automating code generation fully.

What is surprising to me is that Devin wraps GPT-4, meaning costs right now are unbearably high, somewhere between $120 and $300 an hour.

So what’s really going on?

  • Cognition has a ridiculous team

  • Founders Fund funds them

  • Devin is a demo product, and examples of usage online are cherry-picked but not doctored

  • Scaffolded agentic systems making calls to LLMs are going to be a thing

  • Devin aims to be the coding agent that does that
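For readers who have not seen one, here is a minimal sketch of what a "scaffolded agentic system" can look like: a loop that asks a model for the next action, runs the matching tool, feeds errors back in as observations, and keeps a log a human can audit. Cognition has not published how Devin works, so everything here is an assumption made for illustration: the names `run_agent`, `Step`, and `AgentRun` are hypothetical, and `scripted_llm` is a hard-coded stand-in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One entry in the audit trail: what was tried and what came back."""
    action: str
    argument: str
    observation: str


@dataclass
class AgentRun:
    task: str
    log: List[Step] = field(default_factory=list)
    result: Optional[str] = None


def run_agent(
    task: str,
    llm: Callable[[str, List[Step]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> AgentRun:
    """Ask the model for an action, run the tool, record the outcome, repeat."""
    run = AgentRun(task)
    for _ in range(max_steps):
        action, argument = llm(task, run.log)
        if action == "finish":
            run.result = argument
            return run
        tool = tools.get(action)
        try:
            observation = tool(argument) if tool else f"error: unknown tool {action!r}"
        except Exception as exc:
            # Errors become observations so the model can try to recover.
            observation = f"error: {exc}"
        run.log.append(Step(action, argument, observation))
    return run  # step budget exhausted; result stays None


def scripted_llm(task: str, log: List[Step]) -> Tuple[str, str]:
    """Hypothetical stand-in for a model call: a fixed three-move policy."""
    if not log:
        return ("shell", "run tests")
    last = log[-1]
    if last.observation.startswith("error"):
        return ("editor", "fix import")
    if last.action == "editor":
        return ("shell", "run tests")
    return ("finish", "tests pass")


def demo() -> AgentRun:
    """Simulated tools: the test run fails until the editor applies a fix."""
    state = {"fixed": False}

    def shell(command: str) -> str:
        if not state["fixed"]:
            raise RuntimeError("ImportError in app.py")
        return "all tests passed"

    def editor(change: str) -> str:
        state["fixed"] = True
        return f"applied: {change}"

    return run_agent("make the tests pass", scripted_llm, {"shell": shell, "editor": editor})
```

Calling `demo()` walks through a failed test run, an edit, and a passing re-run before finishing. Note that every pass through the loop is another model call, which is part of why wrapping a large model this way gets expensive.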

My guess? GPT-5 is likely to have much of this capability built in. The unfortunate fact of the matter is that OpenAI owes Microsoft 2 trillion in revenue in order to free AGI. That means that OpenAI will have to continuously tackle larger markets. Coding is definitely on the target list. Every team building connective tissue to overcome GPT-4’s limitations is probably in for a surprise.

I can kind of sense that the OpenAI team is almost apologetic at this point, hoping not to destroy too many friendships in the future.

🗞️ Things Happen

  • We finally found out how they named MAMBA:

  • Perplexity integrates with Maps and Yelp. It’s a pleasure to watch them ship, and ship, and ship. If they ship fast and hard enough, they may be able to severely wound Google.

🖼️ AI Artwork Of The Day

A mix of Caucasian - African - Asian - Indian - Middle Eastern - Native American - Latin American in equal parts - u/9m2m from r/midjourney
