More Sparks

Claude is really, really smart

đź”· Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

Here’s today at a glance:

Claude 3 - Day 2 - Reviews

A Cunning Linguist

Ah, innovation! After our initial impressions, Claude 3 has taken us back to the halcyon days of 2022, when we first began to play around with emergent capabilities. Claude turns out to be a spectacular linguist, able to learn languages on the fly from large numbers of translation pairs:

I've been working on NLP for my mother tongue - the Circassian language for the past 2 years. Circassian is very low-resource, with negligible internet presence. It's a part of the Circassian-Abkhaz isolated language group, meaning they have no related languages. Its complex morphology & limited data make it a serious challenge for language models.

Over these years I painstakingly curated 64K translation pairs from scarce sources & trained specialized models (T5, M2M-100, NLLB-200, etc.) to achieve decent Russian-Kabardian machine translation.

I decided to try an experiment with Claude Opus. I started a new chat and attached just 5.7K randomly selected translation pairs of single words and sentences - a fraction of my 64K dataset, not even covering the full vocabulary - to see if it could translate novel sentences based on these examples.

Not expecting much at all, I asked it to translate a simple sentence - "I am lying in the bed" from Russian to Circassian. Claude not only provided a perfect translation but also broke down the grammar & morphology.

The model is able to infer linguistic rules from translation pairs, then apply that inference to complete new tasks. This is a lot to take in.
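The setup is simple to reproduce in spirit: pack translation pairs into the context window and ask for a novel translation. A minimal sketch of that prompt construction, where the pair data, language labels, and prompt wording are all illustrative assumptions rather than the author's actual dataset or prompt:

```python
# Hypothetical sketch of the few-shot translation experiment described above.
# The toy pairs below stand in for the ~5.7K examples attached in the chat.

def build_fewshot_prompt(pairs, source_sentence,
                         src_lang="Russian", tgt_lang="Kabardian"):
    """Assemble a few-shot translation prompt from (source, target) pairs."""
    lines = [f"Here are {src_lang}-{tgt_lang} translation pairs:"]
    for src, tgt in pairs:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    lines.append(f"Translate into {tgt_lang} and explain the morphology:")
    lines.append(f"{src_lang}: {source_sentence}")
    return "\n".join(lines)

# Illustrative pairs (not from the real dataset).
pairs = [
    ("дом", "унэ"),
    ("я иду", "сокIуэ"),
]
prompt = build_fewshot_prompt(pairs, "Я лежу в кровати")
```

The resulting string would be sent as a single user message; the striking part of the result above is that no fine-tuning is involved - the rules are inferred entirely in-context.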

Testing further with complex passages from literature, recent news articles, and even a text in a different Circassian dialect with notably different grammar and a different writing system, Claude consistently demonstrated a DEEP GRASP of the language's structure. It intelligently inferred unknown words, used loanwords appropriately, gave plausible etymological analyses, maintained the style of the original text in translation, and even coined new terms when asked. None of that was in the sample set - just a few thousand translation pairs. Circassian is a very difficult agglutinative language, with complex morphology and grammar.

A "the future is already here, it's just not evenly distributed" moment:

The implications of this are profound. What took me 2 years of dedicated work, Claude accomplished with a few thousand examples. This is a quantum leap for low-resource languages, and many other areas, really.

What I expected to happen many years in the future has happened today. The future is already here, and it's amazing.

This destroys Noam Chomsky’s idea that language springs from innate universal structures hardwired into the human brain. In effect, the data itself drives the creation of a structure to infer upon.

Graduate Student In a Box

Economics

Synthesizing new frontier science, assembling unpublished knowledge in quantum information theory:

It seems like it can genuinely synthesize new information by combinatorial expansion from the frontier. Is that really all of science? Elon chimes in that it must be able to disprove existing theory, i.e., skepticism about what it’s been trained on and told is true:

Physics has many powerful tools of reasoning to understand and predict reality. Those tools, like first principles analysis and thinking in the limit, are broadly applicable to anything, in my experience.

With Grok, @xAI is attempting to create an AI that reasons from first principles, which is fundamental if you care about getting as close to the truth as possible.

The acid test would be reaching a conclusion that is correct even if it is extremely unpopular, which means being right even when the training data is almost entirely wrong.

For example, Galileo concluded, after observing the moons of Jupiter through a telescope he engineered, that it was far more probable that Earth revolved around the Sun than the other way around. This view was so unpopular that he was forced to recant and placed under house arrest!

If you had trained an LLM on material back then, it would’ve given you the popular, but wrong, explanation. Due to social and legal pressure, it likely wouldn’t even acknowledge the possibility that the Earth revolved around the Sun.

For AI to help us understand the true nature of the universe, it must be able to discard the popular, but wrong, in favor of the unpopular, but right.

🌠 Enjoying this edition of Emergent Behavior? Share this web link with a friend to help spread the word of technological progress and positive AI to the world!

Or send them the below subscription link:

🗞️ Things Happen

  • Claude is expensive: 2x the price of GPT-4 Turbo.

  • The benchmarks Anthropic published compare against GPT-4’s scores from last year, not this year. GPT-4 is still ahead in almost every category. Slightly misleading, but clearly stated in the small print.

  • Every AI thinks it’s ChatGPT forevermore

  • Competition is good for us all

🖼️ AI Artwork Of The Day

Which one are you watching? (Niji 6) - u/weeklyjo24 on r/midjourney

That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world:
