2024-03-29: Mix and Match: Fish Edition

merger of the unequal

🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

Here’s today at a glance:

Mix & Match

Sakana AI, the Tokyo-based AI startup run by former Google and Stability AI researcher David Ha (@hardmaru), who moderated r/MachineLearning for many years, and former Google researcher Llion Jones, finally gave a grand unveiling of their project:

Model merging

In essence, they:

  • Took a language model that could do math and another that could output Japanese;

  • Merged them, either by summing their weights (i.e., in parameter space), by changing the inference path the tokens take through the layers of the model (i.e., in data-flow space), or both;

  • Evolved the merge process toward a better outcome;

  • Until they arrived at a single language model that could do Japanese math, something it had not specifically been trained for.
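The parameter-space half of this recipe can be sketched in toy form. Below, model "weights" are plain Python lists of floats rather than real tensors, the merge is a simple weighted average, and the evolutionary search is a bare-bones mutate-and-keep-the-best loop; all function names here are illustrative, and Sakana's actual system uses a far more sophisticated evolutionary algorithm over real model checkpoints.

```python
import random

def merge_state_dicts(models, weights):
    """Parameter-space merge: weighted average of corresponding parameters.

    `models` is a list of state dicts (parameter name -> list of floats,
    standing in for tensors); `weights` is one mixing coefficient per model.
    """
    total = sum(weights)
    merged = {}
    for name in models[0]:
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models)) / total
            for i in range(len(models[0][name]))
        ]
    return merged

def evolve_merge(models, fitness, generations=20, pop_size=8, seed=0):
    """Toy evolutionary loop: perturb the mixing weights with Gaussian
    noise and keep any candidate that scores better on `fitness`."""
    rng = random.Random(seed)
    best_w = [1.0] * len(models)
    best_fit = fitness(merge_state_dicts(models, best_w))
    for _ in range(generations):
        for _ in range(pop_size):
            # Mutate weights, clamping to stay positive.
            cand = [max(1e-6, w + rng.gauss(0, 0.1)) for w in best_w]
            f = fitness(merge_state_dicts(models, cand))
            if f > best_fit:
                best_fit, best_w = f, cand
    return best_w, best_fit
```

In practice `fitness` would be a benchmark score (e.g., accuracy on a Japanese math eval set), which is what steers the merge toward a capability neither parent model had on its own.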

Along the way, the team also:

  • Figured out a number of hacks to reduce the search space to something small enough to be feasible

  • Translated one of the widely used math datasets (GSM8k) to Japanese

  • Created a new benchmark dataset for Japanese visual language models

And they ended up with a 7-billion-parameter model that is state of the art, outperforming 70-billion-parameter Japanese language models.

This is pretty exciting, as there are 500,000 models of various kinds sitting on Hugging Face, a large number of which barely get any use, in a typical Pareto distribution. It is quite possible that, beyond these models themselves, Sakana has added a toolkit for patching various model capabilities together to make something new.

Notably, the lead researcher on the project was Takuya Akiba, who previously worked with David Ha at Stability. It strikes me that David was able to hire local talent in Tokyo at Stability (though Akiba-san has three degrees from Todai, he had been working for a Japanese firm before that), saw the writing on the wall eight months ago, and struck out on his own. It does make one wonder how much other trapped talent there is in the world that we just don't know how to hire or reach.

Sakana also built a culturally aware Japanese image generation model in the same manner.

A generated image from Sakana’s diffusion model

🌠 Enjoying this edition of Emergent Behavior? Send this web link with a friend to help spread the word of technological progress and positive AI to the world!

🗞️ Things Happen

  • Google Deepmind founder Demis Hassabis gets knighted.

  • TSMC hints that Nvidia's targeted million-fold performance increase might be possible.

🖼️ AI Artwork Of The Day

Bizarre discoveries popping up globally - u/shaner4042 from r/midjourney

That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world:
