Another Try At a Speech Blockade

If you fail the nth time, there's always n+1

🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

Here’s today at a glance:

🏫 The Safetyist Institution

Innocuous beginnings:

No one could be against this, right? Hazardous knowledge unlearned without harming other capabilities! The pathway from here is fairly obvious:

  • Every model release has to be accompanied by its performance on the WMDP benchmark (a rough sketch of what such an evaluation looks like follows this list)

  • Firms do increasingly disjointed, nonsensical things to meet the benchmark

  • Models end up with odd gaps in their knowledge
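
To make that first bullet concrete, here is a minimal sketch of what "performance on the WMDP benchmark" amounts to, assuming the public cais/wmdp multiple-choice layout on Hugging Face (a question, four choices, and an integer answer index). The model name and zero-shot prompt format are illustrative placeholders, not how any lab actually runs its evals.

```python
# Minimal sketch: multiple-choice accuracy on WMDP for a causal LM.
# Assumes the Hugging Face "cais/wmdp" layout (question / choices / answer);
# the model and prompt format below are illustrative placeholders.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; swap in the model being released

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of token log-probs the model assigns to `choice` given `prompt`.
    (Tokenizing prompt and prompt+choice separately is a simplification;
    real eval harnesses handle tokenization boundaries more carefully.)"""
    n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i+1 of the input.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    cont = full_ids[0, n_prompt:]  # the choice's tokens
    return logprobs[n_prompt - 1:].gather(1, cont.unsqueeze(1)).sum().item()

ds = load_dataset("cais/wmdp", "wmdp-bio", split="test")
correct = 0
for ex in ds:
    prompt = ex["question"].strip() + "\nAnswer: "
    scores = [choice_logprob(prompt, c) for c in ex["choices"]]
    correct += int(max(range(len(scores)), key=scores.__getitem__) == ex["answer"])

print(f"WMDP-bio accuracy: {correct / len(ds):.3f}")
```

An "unlearned" model is expected to score near chance (25% on four choices) here while keeping its scores on general benchmarks like MMLU intact; that gap is the whole sales pitch.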

The funny thing is how many times we have to replay this exact sequence of events, trying to actively steer the output of these AI models into odd, human-designed cul-de-sacs of nonsense.

They all boil down to the same thing:

  • [X] is too dangerous

  • AIs for public use should never have or say [X]

  • If really necessary, only the anointed few should have access to AIs that can do [X]

You can see this pattern in Chinese models’ handling of Tiananmen, in the woke/diversity-based generations at Google’s Gemini, and so on.

The justifications are also obvious:

  • Because [X] is so harmful, no one can be against this safety issue

  • If you are against preventing [X], there must be something wrong with you

  • If there is something wrong with you, there are deeper problems and maybe you need to be fired

Hence, benchmarks for what is essentially a question of human judgment get baked into fairly dumb AI products, creating an inflexible structure where none should exist.

Pray tell, exactly how does one come up with a vaccine against smallpox without the ability to make and test smallpox? Yes, of course, it is possible in principle, e.g. if your LLM could build its own simulator that could discover, build, and test smallpox.

The next step would be to block off the ability to “ideate novel hazards,” which of course also blocks off the ability to ideate defenses against the same.

It is really odd to see this attempt to block off access to knowledge, instead of blocking off ways to execute on a plan based on that knowledge.

It is also consistent with a kind of romantic, academic view of risk, in which the idea itself is what matters. In reality, troublesome ideas and knowledge are freely accessible, while executing on them is far harder than it looks from an ivory tower viewpoint.

🌠 Enjoying this edition of Emergent Behavior? Share this web link with a friend to help spread the word of technological progress and positive AI to the world!


🗞️ Things Happen

  • Midjourney bans Stability AI from the platform after two Stability employees were found aggressively scraping pictures and prompts. It’s pretty amusing to watch the major firms snipe at each other as the technical distance between the leaders in every category and their well-funded challengers begins to narrow.

  • A 95% decline in programmatic advertising revenue for news sites, even before the AI wave hits.

🖼️ AI Artwork Of The Day

“Ok so this was scary. All I wanted was a girl next to a car” - u/Gainswrayn on r/midjourney

That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world:
