On Coding Agents
It's the end of software engineering as we know it
Here’s today at a glance:
🔮 Devin-ing the Future
Cognition Labs, backed by Founders Fund among others, launched Devin, an AI coding agent.
Today we're excited to introduce Devin, the first AI software engineer.
Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.
Devin is… twitter.com/i/web/status/1…
— Cognition (@cognition_labs)
1:50 PM • Mar 12, 2024
There is no open demo, but the pre-recorded demo shows Devin building websites from single-line prompts, including:
investigating which libraries to use
recovering from errors
using print statements for debugging
reading API documentation
keeping track of various steps in the process
providing audit access
Devin was built by a team led by Scott Wu, an International Olympiad in Informatics (IOI) legend.
I hope competitive programming will get even more popular now that competitive programmers get credits for the products they deliver at top tier ML startups.
btw, here is Scott Wu, founder of Cognition, securing 1st place at IOI with a perfect score, not even 10 years ago:
— Iulia Groza (@_iuliagroza)
7:55 PM • Mar 13, 2024
Scott was always legendary by the way
this is the ceo of cognition 14 years ago
the idea that 10x/100x engineers don’t exist is such a cope
— Siqi Chen (@blader)
12:22 AM • Mar 13, 2024
Devin’s team has 10 IOI gold medals among the 9 co-founders, which is absolutely insane if you think about it.
Andrej Karpathy, former Tesla AI head, had this to say:
# automating software engineering
In my mind, automating software engineering will look similar to automating driving. E.g. in self-driving the progression of increasing autonomy and higher abstraction looks something like:
1. first the human performs all driving actions manually
2. then the AI helps keep the lane
3. then it slows for the car ahead
4. then it also does lane changes and takes forks
5. then it also stops at signs/lights and takes turns
6. eventually you take a feature complete solution and grind on the quality until you achieve full self-driving.
There is a progression of the AI doing more and the human doing less, but still providing oversight. In Software engineering, the progression is shaping up similar:
1. first the human writes the code manually
2. then GitHub Copilot autocompletes a few lines
3. then ChatGPT writes chunks of code
4. then you move to larger and larger code diffs (e.g. Cursor copilot++ style, nice demo here https://youtube.com/watch?v=Smklr44N8QU)
5....
Devin is an impressive demo of what perhaps follows next: coordinating a number of tools that a developer needs to string together to write code: a Terminal, a Browser, a Code editor, etc., and human oversight that moves to increasingly higher level of abstraction.
There is a lot of work not just on the AI part but also the UI/UX part. How does a human provide oversight? What are they looking at? How do they nudge the AI down a different path? How do they debug what went wrong? It is very likely that we will have to change up the code editor, substantially.
In any case, software engineering is on track to change substantially. And it will look a lot more like supervising the automation, while pitching in high-level commands, ideas or progression strategies, in English.
Good luck to the team!
This is not actually a promising quote, as Karpathy seems to imply a ramp of ten years or more to fully automating code generation.
What is surprising to me is that Devin wraps GPT-4, meaning costs right now are unbearably high, somewhere between $120 and $300 an hour:
sanity check on @cognition_labs
Per task, Devin is likely doing 10-20 maxed out GPT4-32k calls per minute, thats ~$2-$5/min, or $120-$300/hr depending on Input/Output token ratio.
my fellow waterloo interns will gladly outperform Devin for that hourly cost :)
— brian-machado-finetuned-7b (e/snack) (@sincethestudy)
1:52 PM • Mar 13, 2024
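The hourly figure in the tweet above is a unit conversion from its per-minute estimate. Here is a minimal Python sketch of that arithmetic, using only the numbers quoted in the tweet. The function and variable names are illustrative, and no actual API pricing is assumed:

```python
def hourly_cost(cost_per_minute: float) -> float:
    """Convert a per-minute spend in dollars to a per-hour spend."""
    return cost_per_minute * 60


def implied_cost_per_call(cost_per_minute: float, calls_per_minute: int) -> float:
    """Back out the per-call cost implied by a per-minute spend and call rate."""
    return cost_per_minute / calls_per_minute


# Figures taken from the quoted tweet (estimates, not measurements).
LOW_PER_MIN, HIGH_PER_MIN = 2.0, 5.0   # dollars per minute
LOW_CALLS, HIGH_CALLS = 10, 20         # GPT-4 calls per minute

hourly_range = (hourly_cost(LOW_PER_MIN), hourly_cost(HIGH_PER_MIN))

# Cheapest case: low spend spread over many calls.
# Priciest case: high spend spread over few calls.
per_call_range = (
    implied_cost_per_call(LOW_PER_MIN, HIGH_CALLS),
    implied_cost_per_call(HIGH_PER_MIN, LOW_CALLS),
)

print(f"Hourly cost: ${hourly_range[0]:.0f} to ${hourly_range[1]:.0f}")
print(f"Implied cost per call: ${per_call_range[0]:.2f} to ${per_call_range[1]:.2f}")
```

The implied per-call cost is just the quoted per-minute spend divided by the quoted call rate, so it inherits whatever uncertainty those estimates carry.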
So what’s really going on?
Cognition has a ridiculous team
Founders Fund funds them
Devin is a demo product, and examples of usage online are cherry-picked but not doctored
Scaffolded agentic systems making calls to LLMs are going to be a thing
Devin aims to be the coding agent that does that
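To make "scaffolded agentic systems making calls to LLMs" concrete, here is a toy Python sketch of the general pattern: a loop that asks a model for the next action, runs the matching tool, and feeds the observation back. This is not Devin's implementation, which has not been published. Every name here (`scripted_model`, `run_agent`, the three tool stubs, the example URL) is hypothetical, and the "model" is a fixed script standing in for a real LLM call:

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tool stubs: each takes a string and returns an observation.
def run_terminal(command: str) -> str:
    return f"(pretend output of `{command}`)"


def read_docs(url: str) -> str:
    return f"(pretend contents of {url})"


def edit_file(instruction: str) -> str:
    return f"(pretend edit applied: {instruction})"


TOOLS: Dict[str, Callable[[str], str]] = {
    "terminal": run_terminal,
    "browser": read_docs,
    "editor": edit_file,
}


def scripted_model(task: str, history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call: returns the next (tool, argument) pair
    from a fixed plan, indexed by how many steps have already run."""
    plan = [
        ("browser", "https://example.com/api-docs"),
        ("editor", "add a print statement before the failing call"),
        ("terminal", "python app.py"),
        ("done", "task finished"),
    ]
    return plan[min(len(history), len(plan) - 1)]


def run_agent(
    task: str,
    model: Callable[[str, List[str]], Tuple[str, str]] = scripted_model,
    max_steps: int = 10,
) -> List[str]:
    """Loop: ask the model for an action, run the tool, record the result."""
    history: List[str] = []
    for _ in range(max_steps):
        tool, argument = model(task, history)
        if tool == "done":
            break
        observation = TOOLS[tool](argument)
        # The history doubles as the audit log a human can review.
        history.append(f"{tool}({argument!r}) -> {observation}")
    return history


log = run_agent("build a website from a one-line prompt")
for entry in log:
    print(entry)
```

In a real system the scripted plan would be replaced by a model call that sees the task and the history, which is where the per-call costs discussed earlier would come from.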
My guess? GPT-5 is likely to have much of this capability built in. The unfortunate fact of the matter is that OpenAI owes Microsoft 2 trillion in revenue in order to free AGI. That means that OpenAI will have to continuously tackle larger markets. Coding is definitely on the target list. Every team building connective tissue to overcome GPT-4’s limitations is probably in for a surprise.
I can kind of sense that the OpenAI team is almost apologetic at this point, hoping not to destroy too many friendships in the future.
2024 is going to be an exciting year for AI
— Noam Brown (@polynoamial)
11:43 PM • Mar 12, 2024
🌠Enjoying this edition of Emergent Behavior? Send this web link with a friend to help spread the word of technological progress and positive AI to the world!
Or send them the below subscription link:
🗞️ Things Happen
We finally found out how they named MAMBA:
Mamba is such a great name for an AI model. Its predecessor was called the Structured State Space for Sequence Modeling (S4). Mamba adds selective scanning, making it the Selective Scan Structured State Space for Sequence Modeling (S6). So they named it after a snake: SSSSSS.
— Timothy B. Lee (@binarybits)
3:20 PM • Mar 13, 2024
Perplexity integrates with Maps and Yelp. It’s a pleasure to watch them ship, and ship, and ship. If they ship fast and hard enough, they may be able to severely wound Google.
Simple daily needs like findjng good coffee requires doing some research. And to be a good answer engine and a research buddy, the product needs to evolve to get the right information and present it in the most accessible way. We are taking the first steps towards that with Maps… twitter.com/i/web/status/1…
— Aravind Srinivas (@AravSrinivas)
4:07 PM • Mar 14, 2024
🖼️ AI Artwork Of The Day
A mix of Caucasian - African - Asian - Indian - Middle Eastern - Native American - Latin American in equal parts - u/9m2m from r/midjourney