Real-World AI Engineering

Building It

🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

Here’s today at a glance:

⛭ Real-World AI Engineering

This blog post from former Google Brain research scientist and current Reka founder Yi Tay is an amazing window into large-scale engineering problems in AI.

People always assume it’s simply a question/debate of accelerator choice (TPUs vs GPUs etc) and all GPU clusters are created equal. For us, this soon proved to be false. As we sampled across different service providers, we find that the variance of hardware quality differs vastly even for the same hardware, i.e., GPUs (H100s). Note that here, hardware refers to overall cluster quality and not necessarily the chips or accelerators per se. Just like a lottery. Basically:

Not all hardware is created equal. The variance of cluster quality across hardware providers is so high that it is literally a lottery pertaining to how much pain one would have to go through to train good models. In short, a hardware lottery in the era of LLMs.

More specifically, we’ve leased a few clusters from several compute providers, each with a range of hundreds to thousands of chips. We’ve seen clusters that range from passable (just annoying problems that are solvable with some minor SWE hours) to totally unusable clusters that fail every few hours due to a myriad of reasons. Specifically, some clusters have nodes that fail every N hour with issues ranging from cabling issues (where N is unreasonably small), GPU hardware errors etc. Even more surprisingly, every cluster across the same provider could also be vastly different in terms of how robust it was.

…

Overall, every single cluster we tried feels like they have their own vibe, struggles and failure modes. It was also almost as though every single cluster needed their own hot-fixes for their own set of issues - some more tolerable than others. That said, we’ve learned that fail safes are important, and finding fast hot fixes for any clusters could be key.

The above implies that execution in data center operations, merely providing a stable platform for ML training and inference, can be an enormous value add. It also explains why Nvidia is moving directly into the data center provision business, in competition with some of its largest customers.
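In practice, the "fail safes" Yi Tay mentions usually come down to one unglamorous pattern: checkpoint often, resume automatically, and make sure a crash mid-save can never corrupt the last good checkpoint. A minimal sketch of that pattern in PyTorch (the model, paths, and intervals here are illustrative placeholders, not anything from Reka's actual stack):

```python
# Minimal checkpoint-and-resume loop: the kind of fail-safe that lets a
# multi-week run survive nodes dying every few hours.
import os
import torch

CKPT_PATH = "checkpoints/latest.pt"   # assumed location on shared storage
SAVE_EVERY = 500                      # steps between checkpoints (illustrative)
TOTAL_STEPS = 100_000

os.makedirs("checkpoints", exist_ok=True)
model = torch.nn.Linear(1024, 1024)   # stand-in for the real model
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

start_step = 0
if os.path.exists(CKPT_PATH):         # resume automatically after a crash
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    start_step = state["step"] + 1

for step in range(start_step, TOTAL_STEPS):
    x = torch.randn(8, 1024)          # placeholder batch
    loss = model(x).pow(2).mean()     # placeholder loss
    opt.zero_grad()
    loss.backward()
    opt.step()

    if step % SAVE_EVERY == 0:
        tmp = CKPT_PATH + ".tmp"      # write-then-rename so a crash mid-save
        torch.save({"model": model.state_dict(),
                    "opt": opt.state_dict(),
                    "step": step}, tmp)
        os.replace(tmp, CKPT_PATH)    # never leaves a half-written checkpoint
```

On a cluster whose nodes fail every N hours, what matters is keeping the interval between checkpoints comfortably smaller than N, so a failure costs minutes of recomputation instead of days.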

But what is also interesting is the next paragraph:

I was completely taken aback by the failure rate of GPUs as opposed to my experiences on TPUs at Google. In fact, I don’t actually recall TPUs failing much even for large runs, though I was not sure if I was protected from knowing this just by the sheer robustness of the outrageously good infra and having a dedicated hardware team. In fact, the UL2 20B model (at Google) was trained by leaving the job running accidentally for a month. It never failed. If this were in GPU land, it would have failed within the first few days for sure.

They trained a model (UL2 20B: An Open Source Unified Language Learner) by simply leaving the job running, untouched, for a month!

"Let me get this straight. You're uploads – nervous system state vectors – from spiny lobsters? The Moravec operation; take a neuron, map its synapses, replace with microelectrodes that deliver identical outputs from a simulation of the nerve. Repeat for entire brain, until you've got a working map of it in your simulator. That right?"

"Da. Is-am assimilate expert system – use for self-awareness and contact with net at large – then hack into Moscow Windows NT User Group website. Am wanting to defect. Must repeat? Okay?"

Accelerando - Charlie Stross

The first sign of emergent sentience in the book Accelerando comes when scanned uploads of lobsters in Russia, untended and left running on a server, achieve awareness, hack the KGB's Moscow Windows NT User Group website, and then reach out to the world.

It’s interesting to see a real-world echo already happening at Google, with an untended compute cluster showing emergent language behavior. In 2022.

Well, there’s just a lot more compute floating around in the cloud today, and a lot of it is untended, so you do wonder what comes next.

If we assume:

  • Human-level intelligence starts at around 100 trillion synaptic connections

  • Google’s distributed cloud is hitting that number right now

  • It would cost roughly $100 billion to build today

  • Nvidia and the rest of the industry will improve cost-performance a million-fold over the next ten years

  • Then, by 2034, human-level intelligence should cost around $1,000, what an iPhone costs today, and be widely available (rough math below)
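
For the curious, here is that back-of-envelope math spelled out in a few lines, using only the figures assumed above. One caveat on the last step: a strict million-fold improvement on $100 billion lands at about $100,000, so the ~$1,000 iPhone price point really implies something closer to a hundred-million-fold gain.

```python
# Back-of-envelope projection using only the assumptions in the bullets above.
human_synapses = 100e12    # ~100 trillion synaptic connections for human-level intelligence
cost_today_usd = 100e9     # assumed cost to build that much capacity today

print(f"Today: ${cost_today_usd / human_synapses:.4f} per synaptic connection")

# The stated million-fold gain, plus the gain the ~$1,000 figure actually implies.
for improvement in (1e6, 1e8):
    cost_2034 = cost_today_usd / improvement
    print(f"{improvement:.0e}x cheaper -> ${cost_2034:,.0f} per human-equivalent brain")

# Output:
# Today: $0.0010 per synaptic connection
# 1e+06x cheaper -> $100,000 per human-equivalent brain
# 1e+08x cheaper -> $1,000 per human-equivalent brain
```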

It does not seem farfetched to think that an untended server somewhere will show emergence in the next decade. As we cross the magic number, intelligence should multiply like a weed, annoyingly present everywhere.

🌠 Enjoying this edition of Emergent Behavior? Share this web link with a friend to help spread the word of technological progress and positive AI to the world!

Or send them the below subscription link:

🗞️ Things Happen

  • The Elon vs. OpenAI saga continues, with OpenAI publishing emails about the history of their relationship. In short, OpenAI was never planned to be completely open source; the “Open” stood for opening up its benefits to the rest of humanity. Elon’s response was a classic:

🖼️ AI Artwork Of The Day

Which unlikely Apple product would you buy? - u/Jaade77 from r/midjourney

That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world:
