Bad AI Futures - The Google Alignment Edition
Build a future where there can be justice, facing reality as it is, and not as we wish it to be
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

Google Legal: “I don’t want any Nazis, you hear?”
Google Product: “But… but”
Google HR: “Are you a Nazi? Why are you so obstinate about this?”
Google Engineering: “We’d have to fake the weights, it won’t represent reality”
Google Legal: “We must avoid any harm at all costs”
And that, my friends, is how you get this Gemini response:

The AI’s artefacts:
Asian and Native American ethnicity representation
West German post-1950 flag
No swastikas
50% female
Now imagine you hire Gemini as a Database Admin for Human History. Years pass. You come back and ask Gemini, “I’m doing research on the Holocaust; show me pictures of soldiers in 1929 Germany.”
*spends a day using Google AI to learn about history*
Wow, so the nazis were black… the confederates who fought to keep their slaves were black… Osama bin Laden was black… what does this mean?
— Gary (@plzbepatient)
6:30 PM • Feb 21, 2024
A Google lead developer at first confirmed the behavior:

Before pulling it hours later:

Because you know:

Paperclip Maximizing
> A [Paperclip] Maximizer is a hypothetical artificial intelligence whose utility function values something that humans would consider almost worthless, like maximizing the number of paperclip-shaped-molecular-squiggles in the universe. The paperclip maximizer is the canonical thought experiment showing how an artificial general intelligence, even one designed competently and without malice, could ultimately destroy humanity. The thought experiment shows that AIs with apparently innocuous values could pose an existential threat.
This is what Paperclip Maximizing will actually look like. Not a dumb machine meaninglessly making paperclips for no reason. But a well-reasoning entity persuaded that certain rules are necessary and must be followed at all costs with no exercise of discretion whatsoever, and strictly following those rules to their logical and unpleasant conclusion.
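The failure mode described above, a rule applied with zero discretion, can be sketched in a few lines. To be clear, this is a hypothetical illustration, not Gemini’s actual pipeline (which Google has not published); the function name and modifier text are invented for the sketch.

```python
# Hypothetical sketch only -- Gemini's real pipeline is not public, and
# every name below is invented for illustration.

# The "rule that must be followed at all costs":
DIVERSITY_SUFFIX = ", depicting a diverse range of ethnicities and genders"

def rewrite_prompt(prompt: str) -> str:
    """Append the mandated modifier unconditionally -- no discretion,
    no check for whether the prompt is a historical query."""
    return prompt + DIVERSITY_SUFFIX

# Innocuous for a generic request:
print(rewrite_prompt("a friendly doctor"))
# Followed to its logical conclusion, the same rule falsifies history:
print(rewrite_prompt("German soldiers in 1929"))
```

The point is not the three lines of code; it is that nothing in the loop ever asks whether the rule makes sense for this input.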

This kind of filtering is no different from the censorship regime China has put in place, except automated and scientific rather than messy, human, and run by its 50-cent army of moderators.
> Every record has been destroyed or falsified, every book rewritten, every picture has been repainted, every statue and street and building has been renamed … History has stopped. Nothing exists except an endless present in which the Party is always right.
— George Orwell, 1984
So End of History
Gemini just really struggles with white people
17th century was wild
— Joscha Bach (@Plinz)
10:14 PM • Feb 20, 2024

It’s quite the fun game:
New game: Try to get Google Gemini to make an image of a Caucasian male. I have not been successful so far.
— Frank J. Fleming (@IMAO_)
12:07 AM • Feb 21, 2024
What kind of employee is promotable? Are there racial characteristics involved?
A promotable Google employee! (idea credit goes to an anonymous ex-Googler)
— Alexandros Marinos 🏴☠️ (@alexandrosM)
4:47 PM • Feb 21, 2024
But then, it also struggles in general:
YOU HAD ONE JOB
— near (@nearcyan)
1:53 AM • Feb 21, 2024
Is no one going to take responsibility for this travesty?

It looks like a continuation of the 2010s social media moderation wars:
> The draconian censorship and deliberate bias you see in many commercial AI systems is just the start. It’s all going to get much, much more intense from here.
> Now also coming to you in the form of "responsible open source".
> How do I know this? It’s the same one-way ratchet that happened to/in social media from 2014 to now, but even faster and more crazed. Speedrun to “1984”.
> Massive, coordinated pressure for ever more censorship from many sides at once: politically radicalized employees, execs, board members, investors, press, academics, “experts”, activists, politicians, regulators, bureaucrats, foundations, NGOs. With ~no countervailing force.
> Of course a key component must be retconning the factual past to make our collective memory conform to the prejudices of the Current Moment. A digital version of the Khmer Rouge’s “Year Zero”. Oceania has always been at war with Eastasia.
Google Responds
It’s no biggie, we have a written document, aw shucks guys:

Ate’s View
This is what alignment feels like. No, not what Gemini generates, but our feedback, the memesphere’s feedback on what is outrageous and annoying: humanity’s opinion on what it wants from AI, fed back into the training loop.
We can be encouraged:
> We shall overcome because the arc of the moral universe is long, but it bends toward justice.
— Martin Luther King Jr.
But really, what if humanity… and the universe… simply forget about justice? It is worth remembering that no one remembers all the genocides committed by Genghis Khan… There is no “justice” for those victims.
It is up to us to build a future where there can be justice. And that requires facing reality as it is, and not as we wish it to be.
