
2024-11-22: The Race to the Top: Dario Amodei on AGI, Risks, and the Future of Anthropic


Here’s today at a glance:

🏁 The Race to the Top: Dario Amodei on AGI, Risks, and the Future of Anthropic

Dario Amodei, CEO of Anthropic, recently appeared on the Lex Fridman podcast to discuss his vision for artificial general intelligence (AGI) and how to shape its development responsibly. The conversation touched on critical milestones, risks, and Anthropic’s unique approach to advancing the field. Here’s a deeper dive into the discussion, highlighting Anthropic’s strategies, Dario’s AGI timeline, and his perspective on safety and innovation.

To recap, Dario’s definition of AGI (which he calls “strong AI”) is:

Strong AI is a “country of geniuses in a datacenter”:

* Smarter than a Nobel prize winner across many fields

* Has complete access to digital interfaces, i.e., audio, video, and search, for both input and output. It can communicate with and instruct humans

* It can autonomously plan and carry out tasks over long periods of time

* Does not have a physical embodiment but can control any robots it’s connected to

* The training resources can be redeployed to run millions of its instances, and the model can absorb and generate information at 10x-100x human speed

* Each of these millions of instances can act independently, or can collaborate with other instances as necessary

Dario Amodei, Machines of Loving Grace

The Race to the Top

Anthropic is committed to driving innovation responsibly through what Dario calls the “Race to the Top.” This strategy involves pushing competitors to adopt best practices by setting an example, particularly in mechanistic interpretability—a technique Anthropic excels at for improving AI safety and transparency.

Anthropic often wins at hiring against its competitors because of its interpretability work, and Dario tells new employees: “The other places you didn’t go, tell them why you came here.” In this way, Anthropic shapes the incentives for the entire field.

Addressing Complaints About AI Models “Getting Dumber”

Dario addressed claims that AI models are becoming less intelligent, attributing this perception to a form of hedonic adjustment—users growing accustomed to the model’s capabilities over time. Anthropic rarely alters its models after launch, apart from minimal A/B testing just before release, suggesting that complaints reflect shifting expectations rather than actual performance degradation.

Risks on the Horizon: ASL Levels

The two key risks Dario is concerned about are:

a) catastrophic misuse, e.g., cyberattacks and chemical, biological, radiological, and nuclear (CBRN) weapons

b) model autonomy

These risks are captured in Anthropic’s framework for understanding AI Safety Levels (ASL):

1. ASL-1: Narrow-task AI like Deep Blue (no autonomy, minimal risk).

2. ASL-2: Current systems like ChatGPT/Claude, which lack autonomy and don’t pose significant risks beyond information already accessible via search engines.

3. ASL-3: Agents arriving soon (potentially next year) that can meaningfully assist non-state actors in dangerous activities like cyber or CBRN (chemical, biological, radiological, nuclear) attacks. Security and filtering are critical at this stage to prevent misuse.

4. ASL-4: AI smart enough to evade detection, deceive testers, and assist state actors with dangerous projects. At this level, the model itself is capable enough to be the primary tool for dangerous work. Mechanistic interpretability becomes crucial for verifying AI behavior.

5. ASL-5: AGI surpassing human intelligence in all domains, posing unprecedented challenges.

Anthropic’s if/then framework ensures proactive responses: if a model demonstrates danger, the team clamps down hard, enforcing strict controls.

Building the Future: Talent and Research Priorities

Dario emphasized Anthropic’s preference for “talent density” over “talent mass,” favoring small, elite teams aligned with the mission. Promising research areas he identified include:

* Mechanistic interpretability: Making AI models transparent and predictable.

* Long-horizon learning: Teaching models to plan for and execute complex tasks over extended timelines.

* Evaluations of dynamic systems: Understanding how models behave in multi-agent scenarios.

* Multi-agent coordination: Ensuring AI systems collaborate effectively and safely.

Conversely, he sees diminishing returns in new architecture searches, as the field is already saturated with researchers.

How Fast Will AGI Transform Society?

Dario adopts a middle-ground perspective on AGI’s impact. He dismisses scenarios of rapid, world-altering change within days, citing two factors:

1. Physical systems take time to build.

2. AGI will likely adhere to human laws and navigate regulatory systems.

However, he also pushes back against estimates like Tyler Cowen’s 50-100 year timeline, pointing out that competitive forces can accelerate progress. Most economists follow Robert Solow’s quip, “You can see the computer age everywhere but in the productivity statistics,” in doubting the impact of recent technology on the economy.

Drawing on his experience with governments and corporations, Dario notes that a few visionaries can often initiate deployments, which then trigger a “competitive tailwind”: as one player adopts AGI, others follow to stay competitive. If Goldman does it, the rest of the investment banks try to catch up; if the US does it, China then tries to exceed the US. “The spectre of competition plus a few visionaries” is all you need to start a productivity cascade that pulls humanity forward.

The AGI Timeline

On a straight-line extrapolation of current progress, Dario predicts AGI by 2026-2027, with a 1-2 year buffer for potential stumbling blocks. While barriers in the field persist, they are receding, making AGI an increasingly plausible reality.

Conclusion

Dario seems sincere. We are on the cusp of AGI; risks exist, but if we avoid them, the future could be very bright.


🖼️ AI Artwork Of The Day

Thought I’d make some Minecraft environs also - 12washingbeard from Midjourney
