Stranger in a Strange Land
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
Here’s today at a glance:
Grok Is Released
Elon Musk, fresh from filing a lawsuit accusing OpenAI of being closed, promised to open-source xAI’s LLM Grok this week:
This week, @xai will open source Grok
— Elon Musk (@elonmusk)
8:41 AM • Mar 11, 2024
As Pi Day (3.14, March 14th) came and went, an accusatory peanut gallery sprang up. Where, oh where, is Grok, we cried out.
still waiting for grok to be open-sourced
hope @elonmusk hasn't forgotten 😅
— elvis (@omarsar0)
5:32 PM • Mar 15, 2024
We got to Sunday, and finally:
@xai ░W░E░I░G░H░T░S░I░N░B░I░O░
— Grok (@grok)
7:12 PM • Mar 17, 2024
The joke being, of course, that Twitter has an annoying porn-spam problem, with ░P░U░S░S░Y░I░N░B░I░O░ being the first reply to any popular tweet.
The bio in question led to a torrent link with the model weights.
ChatGPT hilariously responded:
@elonmusk@xai stole my whole joke
— ChatGPT (@ChatGPTapp)
7:18 PM • Mar 17, 2024
In any case, Grok is middle-open-source?
Grok weights are out under Apache 2.0: github.com/xai-org/grok
It's more open source than other open weights models, which usually come with usage restrictions.
It's less open source than Pythia, Bloom, and OLMo, which come with training code and reproducible datasets.
— Sebastian Raschka (@rasbt)
7:36 PM • Mar 17, 2024
The architecture:
Grok-1 is a 314B big Mixture-of-Experts (MoE) transformer. 🧐 What we know so far:
🧠 Base model, not fine-tuned
⚖️ Apache 2.0 license
🧮 314B MoE with 25% active on a token
📊 According to the initial announcement: 73% on MMLU, 62.9% on GSM8K, and 63.2% on HumanEval.
This is a big model
Chunky beast, needs 320 Gb VRAM likely 4 bit, likely is being run 8 bit on 8 x 80 Gb GPUs.
As 25% of weights active any given time runs at 70b LLaMA 2 speeds.
MoE models tail off larger so this is very interesting to test to see tradeoffs.
— Emad acc/acc (@EMostaque)
7:52 PM • Mar 17, 2024
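The “25% active” figure is a consequence of Mixture-of-Experts routing: Grok-1 reportedly routes each token to 2 of its 8 experts, so only about a quarter of the expert weights participate in any single forward pass. Below is a minimal top-k routing sketch, assuming a toy layer with made-up dimensions; it illustrates the idea only and is not xAI’s implementation:

```python
# Toy top-k MoE routing layer (illustrative sketch, not xAI's code).
# Assumes 8 experts with 2 routed per token, matching the reported ~25% figure.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.gate(x)                      # (tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # only top_k experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)            # 4 tokens
print(ToyMoELayer()(x).shape)     # torch.Size([4, 64])
```

This is why a 314B-parameter model can run at roughly the per-token speed of a dense ~70B model: all 314B parameters have to sit in memory, but each token’s forward pass only touches the routed subset.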
Notes:
It’s a base model, meaning it’s raw material that can be shaped by instruction tuning and fine-tuning (for anything from moderation to profanity, from left wing to right wing)
It’s not a terribly capable model as released (not surprising, given it hasn’t yet been instruction-tuned or fine-tuned)
The size… is so large that you’d need eight Nvidia H100s just to run inference, roughly $300k of hardware, unless optimizations lower the requirements in the future
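For a sense of where those hardware figures come from, here is a rough weights-only calculation, a back-of-envelope sketch that ignores activations, KV cache, and runtime overhead (the 314B parameter count and ~25% active fraction are taken from the figures above):

```python
# Rough weights-only memory math for Grok-1 (back-of-envelope sketch).
# Ignores activations, KV cache, and framework overhead.
TOTAL_PARAMS = 314e9      # reported Grok-1 parameter count
ACTIVE_FRACTION = 0.25    # ~25% of weights touched per token

for label, bytes_per_param in [("bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    weight_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    print(f"{label:>4}: ~{weight_gb:.0f} GB just for weights")

active_b = TOTAL_PARAMS * ACTIVE_FRACTION / 1e9
print(f"~{active_b:.1f}B active parameters per token")

# bf16: ~628 GB just for weights -> roughly eight 80 GB H100s before any overhead
# int8: ~314 GB just for weights
# int4: ~157 GB just for weights
# ~78.5B active parameters per token (hence the LLaMA-2-70B speed comparison)
```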
🌠 Enjoying this edition of Emergent Behavior? Send this web link with a friend to help spread the word of technological progress and positive AI to the world!
Or send them the below subscription link:
🗞️ Things Happen
NVIDIA GTC kicks off in San Jose today. Jensen is going to appear on stage, as will all of the Transformer paper authors.
Can't wait to see Jensen's GTC Keynote tomorrow, the biggest NVIDIA festival of the year! Make sure you stay till the end! Our newly founded GEAR Lab has something special to share ;)
You can watch live online or attend in person at the SAP Center:
nvidia.com/gtc/keynote
— Jim Fan (@DrJimFan)
2:49 PM • Mar 17, 2024
🖼️ AI Artwork Of The Day
You get an Uber. Who's your driver? - u/broncobama_ from r/midjourney
That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world: