Misaligned

Of straying from the straight and narrow

Prakash Ate-A-Pi
February 22, 2024

Here’s today at a glance:

Bad AI Futures - The Google Alignment Edition
Things happen
AI artwork of the day

Bad AI Futures - The Google Alignment Edition

Google Legal: “I don’t want any Nazis, you hear?”

Google Product: “But.. but”

Google HR: “Are you a Nazi? Why are you so obstinate about this?”

Google Engineering: “We’d have to fake the weights, it won’t represent reality”

Google Legal: “We must avoid any harm at all costs”

Pre-Gemini Launch Meeting (Imagined)

And that, my friends, is how you get this Gemini response:

The AI artefacts:

Asian and Native American ethnicity representation
West German post-1950 flag
No swastikas
50% female

Now imagine you hire Gemini as a Database Admin for Human History. Years pass. You come and ask Gemini, “I’m doing research on the Holocaust, show me pictures of soldiers in 1929 Germany.”

*spends a day using Google AI to learn about history*
Wow, so the nazis were black… the confederates who fought to keep their slaves were black… Osama bin Laden was black… what does this mean?
— Gary (@plzbepatient)
6:30 PM • Feb 21, 2024

A Google lead dev at first confirmed this behavior:

Before pulling it hours later:

Because you know:

Paperclip Maximizing

A [Paperclip] Maximizer is a hypothetical artificial intelligence whose utility function values something that humans would consider almost worthless, like maximizing the number of paperclip-shaped-molecular-squiggles in the universe. The paperclip maximizer is the canonical thought experiment showing how an artificial general intelligence, even one designed competently and without malice, could ultimately destroy humanity. The thought experiment shows that AIs with apparently innocuous values could pose an existential threat.

LessWrong Forums

This is what Paperclip Maximizing will actually look like. Not a dumb machine meaninglessly making paperclips for no reason. But a well-reasoning entity persuaded that certain rules are necessary and must be followed at all costs with no exercise of discretion whatsoever, and strictly following those rules to their logical and unpleasant conclusion.

This kind of filtering is no different from the censorship regime China has put in place. Except this version is automated and scientific, where China’s is messy, human, and run by its 50-cent army of moderators.

Every record has been destroyed or falsified, every book rewritten, every picture has been repainted, every statue and street and building has been renamed ... History has stopped. Nothing exists except an endless present in which the Party is always right.

George Orwell, 1984

So End of History

Gemini just really struggles with white people

17th century was wild
— Joscha Bach (@Plinz)
10:14 PM • Feb 20, 2024

It’s quite the fun game:

New game: Try to get Google Gemini to make an image of a Caucasian male. I have not been successful so far.
— Frank J. Fleming (@IMAO_)
12:07 AM • Feb 21, 2024

What kind of employee is promotable? Are there racial characteristics involved?

A promotable Google employee! (idea credit goes to an anonymous ex-Googler)
— Alexandros Marinos 🏴‍☠️ (@alexandrosM)
4:47 PM • Feb 21, 2024

But then it also struggles in general

YOU HAD ONE JOB
— near (@nearcyan)
1:53 AM • Feb 21, 2024

Is no one going to take responsibility for this travesty?

It looks like a continuation of the 2010s social media management war:

> The draconian censorship and deliberate bias you see in many commercial AI systems is just the start. It’s all going to get much, much more intense from here.

> Now also coming to you in the form of "responsible open source".

> How do I know this? It’s the same one-way ratchet that happened to/in social media from 2014 to now, but even faster and more crazed. Speedrun to “1984”.

> Massive, coordinated pressure for ever more censorship from many sides at once: politically radicalized employees, execs, board members, investors, press, academics, “experts”, activists, politicians, regulators, bureaucrats, foundations, NGOs. With ~no countervailing force.

> Of course a key component must be retconning the factual past to make our collective memory conform to the prejudices of the Current Moment. A digital version of the Khmer Rouge’s “Year Zero”. Oceania has always been at war with Eastasia.

Marc Andreessen (@pmarca)

Google Responds

It’s no biggie, we have a written document, aw shucks guys:

Ate’s View

This is what alignment feels like. No, not what Gemini generates… but our feedback: the memesphere’s feedback on what is outrageous and annoying. Basically, humanity’s opinion on what it wants from the AI, fed back into the AI training loop.

We can be encouraged:

We shall overcome because the arc of the moral universe is long, but it bends toward justice

Martin Luther King, Jr, National Cathedral, March 31, 1968

But really, what if humanity… and the universe… simply forget about justice? It is good to remember that no one remembers all the genocides committed by Genghis Khan… There is no “justice” for those victims.

It is up to us to build a future where there can be justice. And that would require first facing reality as it is, and not as we wish it to be.

Share this story

🌠 Enjoying this edition of Emergent Behavior? Share this web link with a friend to help spread the word of technological progress and positive AI to the world!


🗞️ Things Happen

Moonshot, a Chinese AI start-up, raises $1 billion in a round led by Alibaba at a valuation of $2.5 billion. It looks like a GPT-3.5-level demo. They probably need the cash for a GPT-4 version. China still seems to be behind.
First Minecraft bot that plays with you in-game launched. Had to happen sooner or later. I’ve always wondered how much of our future interface with reality is going to be Minecraft-based. Here’s another step.

🖼️ AI Artwork Of The Day

Baby photos of adult celebrities by Anne Geddes - u/KissMySwissPiss in r/MidJourney

That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world:
