Stranger in a Strange Land
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
Here’s today at a glance:
Grok Is Released
Elon Musk, fresh from filing a lawsuit accusing OpenAI of being closed, promised to open-source xAI’s LLM Grok this week:
This week, @xai will open source Grok
— Elon Musk (@elonmusk)
8:41 AM • Mar 11, 2024
As Pi Day (3.14, March 14th) came and went, an accusatory peanut gallery sprang up. Where, oh where, is Grok, we cried out.
still waiting for grok to be open-sourced
hope @elonmusk hasn't forgotten 😅
— elvis (@omarsar0)
5:32 PM • Mar 15, 2024
We got to Sunday, and finally:
@xai ░W░E░I░G░H░T░S░I░N░B░I░O░
— Grok (@grok)
7:12 PM • Mar 17, 2024
The joke being, of course, that Twitter has an annoying porn-spam problem, with ░P░U░S░S░Y░I░N░B░I░O░ being the first reply to any popular tweet.
The bio in question led to a torrent link with the model weights.
ChatGPT hilariously responded:
@elonmusk@xai stole my whole joke
— ChatGPT (@ChatGPTapp)
7:18 PM • Mar 17, 2024
In any case, Grok is middle-open-source?
Grok weights are out under Apache 2.0: github.com/xai-org/grok
It's more open source than other open weights models, which usually come with usage restrictions.
It's less open source than Pythia, Bloom, and OLMo, which come with training code and reproducible datasets.
— Sebastian Raschka (@rasbt)
7:36 PM • Mar 17, 2024
The architecture:
Grok-1 is a 314B big Mixture-of-Experts (MoE) transformer. 🧐 What we know so far:
🧠 Base model, not fine-tuned
⚖️ Apache 2.0 license
🧮 314B MoE with 25% active on a token
📊 According to the initial announcement: 73% on MMLU, 62.9% on GSM8K, and 63.2% on HumanEval.
This is a big model
Chunky beast, needs 320 Gb VRAM likely 4 bit, likely is being run 8 bit on 8 x 80 Gb GPUs.
As 25% of weights active any given time runs at 70b LLaMA 2 speeds.
MoE models tail off larger so this is very interesting to test to see tradeoffs.
— Emad acc/acc (@EMostaque)
7:52 PM • Mar 17, 2024
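The “25% active” figure is a consequence of Mixture-of-Experts routing: Grok-1 reportedly routes each token to 2 of its 8 experts, so only about a quarter of the expert weights participate in any single forward pass. Below is a minimal top-k routing sketch, assuming a toy layer with made-up dimensions; it illustrates the idea only and is not xAI’s implementation:

```python
# Toy top-k MoE routing layer (illustrative sketch, not xAI's code).
# Assumes 8 experts with 2 routed per token, matching the reported ~25% figure.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.gate(x)                      # (tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # only top_k experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)            # 4 tokens
print(ToyMoELayer()(x).shape)     # torch.Size([4, 64])
```

This is why a 314B-parameter model can run at roughly the per-token speed of a dense ~70B model: all 314B parameters have to sit in memory, but each token’s forward pass only touches the routed subset.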
Notes:
It’s a base model, meaning it’s raw material that can be shaped by instruction tuning and fine-tuning (for anything from moderation to profanity, from left wing to right wing)
It’s not a terribly capable model as released (not surprising, given it hasn’t yet been instruction-tuned or fine-tuned)
The size… is so large that you’d need eight Nvidia H100s just to run inference, roughly $300k of hardware, unless optimizations lower the requirements in the future
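For a sense of where those hardware figures come from, here is a rough weights-only calculation, a back-of-envelope sketch that ignores activations, KV cache, and runtime overhead (the 314B parameter count and ~25% active fraction are taken from the figures above):

```python
# Rough weights-only memory math for Grok-1 (back-of-envelope sketch).
# Ignores activations, KV cache, and framework overhead.
TOTAL_PARAMS = 314e9      # reported Grok-1 parameter count
ACTIVE_FRACTION = 0.25    # ~25% of weights touched per token

for label, bytes_per_param in [("bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    weight_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    print(f"{label:>4}: ~{weight_gb:.0f} GB just for weights")

active_b = TOTAL_PARAMS * ACTIVE_FRACTION / 1e9
print(f"~{active_b:.1f}B active parameters per token")

# bf16: ~628 GB just for weights -> roughly eight 80 GB H100s before any overhead
# int8: ~314 GB just for weights
# int4: ~157 GB just for weights
# ~78.5B active parameters per token (hence the LLaMA-2-70B speed comparison)
```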
🌠 Enjoying this edition of Emergent Behavior? Send this web link with a friend to help spread the word of technological progress and positive AI to the world!
Or send them the below subscription link:
🗞️ Things Happen
NVIDIA GTC kicks off in San Jose today. Jensen is going to appear on stage, as will all of the Transformer paper authors.
Can't wait to see Jensen's GTC Keynote tomorrow, the biggest NVIDIA festival of the year! Make sure you stay till the end! Our newly founded GEAR Lab has something special to share ;)
You can watch live online or attend in person at the SAP Center:
nvidia.com/gtc/keynote
— Jim Fan (@DrJimFan)
2:49 PM • Mar 17, 2024
🖼️ AI Artwork Of The Day
You get an Uber. Who's your driver? - u/broncobama_ from r/midjourney
That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world: