2024-03-29: Mix and Match: Fish Edition
merger of the unequal
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
Here’s today at a glance:
Mix & Match
Sakana AI, the Tokyo-based AI startup run by Google and Stability AI alum David Ha (@hardmaru), who moderated r/MachineLearning for many years, and former Google researcher Llion Jones, finally had a grand unveiling of their project:
Model merging
Model merging explained in one image
— Omar Sanseviero (@osanseviero)
8:03 PM • Jan 11, 2024
In essence, they took:
A language model that could do math; and
A language model that could output Japanese
Merged them, either by summing their weights (i.e. in Parameter Space) or by changing the inference path the tokens take through the layers of the model (i.e. in Data Flow Space), or both
Evolved the merge process toward a better outcome
Until they got to a single language model that could do Japanese math, something it had never specifically been trained for. (A rough sketch of the parameter-space half of this idea follows below.)
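To make the parameter-space half concrete, here is a minimal sketch, not Sakana's actual code: interpolate two checkpoints' weights tensor by tensor, then let a toy evolutionary loop search for the per-tensor mixing coefficients. The tiny state dicts, the `evaluate` stub, and the hyperparameters below are all hypothetical stand-ins for the real 7B checkpoints and a Japanese-math dev-set score.

```python
# Sketch of parameter-space merging + evolutionary search (illustrative only).
import random

import torch


def merge_state_dicts(sd_a, sd_b, coeffs):
    """Interpolate two state dicts tensor by tensor: w = c * w_a + (1 - c) * w_b."""
    return {k: coeffs[i] * sd_a[k] + (1.0 - coeffs[i]) * sd_b[k]
            for i, k in enumerate(sd_a)}


def evolve_merge(sd_a, sd_b, evaluate, generations=50, pop=8, sigma=0.1):
    """Toy evolutionary search over one mixing coefficient per tensor."""
    best = [0.5] * len(sd_a)  # start from a plain 50/50 average
    best_score = evaluate(merge_state_dicts(sd_a, sd_b, best))
    for _ in range(generations):
        for _ in range(pop):
            # Mutate the current best coefficients and clamp them to [0, 1].
            cand = [min(1.0, max(0.0, c + random.gauss(0, sigma))) for c in best]
            score = evaluate(merge_state_dicts(sd_a, sd_b, cand))
            if score > best_score:
                best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    # Two toy "models" sharing the same two-tensor architecture.
    sd_math = {"layer1": torch.randn(4, 4), "layer2": torch.randn(4)}
    sd_japanese = {"layer1": torch.randn(4, 4), "layer2": torch.randn(4)}

    # Stand-in score: in reality this would run the merged model on a benchmark.
    def evaluate(sd):
        return -float(sd["layer1"].mean() ** 2 + sd["layer2"].mean() ** 2)

    coeffs, score = evolve_merge(sd_math, sd_japanese, evaluate)
    print(f"best score {score:.4f} with coefficients {coeffs}")
```

The real system also evolves the data-flow side (which layers the tokens are routed through), but the same pattern applies: propose a merge, score it, keep the winners.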
Along the way, the team also:
Figured out a number of hacks to shrink the search space to something small enough to be feasible
Translated one of the most widely used math datasets (GSM8K) into Japanese
Created a new benchmark dataset for Japanese visual language models
And they ended up with a 7-billion-parameter model that is state of the art, outperforming 70-billion-parameter Japanese language models.
This is pretty exciting: there are some 500,000 models of various kinds sitting on Hugging Face, a large number of which, in typical Pareto fashion, barely get any use. It is quite possible that, beyond these models themselves, Sakana has added a toolkit for patching various model capabilities together to make something new.
Notably, the lead researcher on the project was Takuya Akiba, who previously worked with David Ha at Stability. It strikes me that David was able to hire local talent in Tokyo while at Stability (and though Akiba-san holds three degrees from Todai, he had been working for a Japanese firm before that), saw the writing on the wall 8 months ago, and struck out on his own. It does make one wonder how much other trapped talent there is in the world that we just don't really know how to find or hire.
Sakana also built a culturally aware Japanese image generation model in the same manner.
A generated image from Sakana’s diffusion model
🌠 Enjoying this edition of Emergent Behavior? Send this web link to a friend to help spread the word of technological progress and positive AI to the world!
🗞️ Things Happen
Google DeepMind co-founder Demis Hassabis gets knighted.
Delighted and honoured to receive a Knighthood for services to AI. It’s been an incredible journey so far building @GoogleDeepMind over the past 15 years, helping accelerate the field and grow the UK & global AI ecosystems. Thanks to everyone who helped make this dream possible!
— Demis Hassabis (@demishassabis)
6:48 PM • Mar 28, 2024
TSMC hints that Nvidia's targeted million-fold performance increase might be possible.
TSMC forecasting a 1000x improvement in GPU performance per watt over the next 15 years. Coupled with major algorithmic improvements we're quickly seeing every week, it isn't crazy to expect 100,000 to 1,000,000x increase in AI performance per dollar in the next decade and a half
— JD Ross (@justindross)
8:02 PM • Mar 28, 2024
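The implied arithmetic here is just compounding. A back-of-the-envelope sketch, assuming roughly 1,000x from hardware (TSMC's perf-per-watt forecast) multiplied by an assumed 100x to 1,000x from algorithmic improvements over the same 15 years:

```python
# Back-of-the-envelope compounding; the algorithmic-gain range is an assumption.
hardware_gain = 1_000                        # TSMC forecast: perf/watt over ~15 years
algo_gain_low, algo_gain_high = 100, 1_000   # assumed range for algorithmic gains

low = hardware_gain * algo_gain_low          # 100,000x
high = hardware_gain * algo_gain_high        # 1,000,000x
print(f"Implied AI performance per dollar: {low:,}x to {high:,}x")

# Equivalent annual rates over 15 years (~2.2x/yr to ~2.5x/yr compounded).
years = 15
print(f"~{low ** (1 / years):.1f}x/yr to ~{high ** (1 / years):.1f}x/yr")
```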
🖼️ AI Artwork Of The Day
Bizarre discoveries popping up globally - u/shaner4042 from r/midjourney
That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world: