Beating GPT-4(V) Finally
And we have a contender from the Far East
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
Here’s today at a glance:
🦾 GPT-4 Beater
Yesterday the team at Alibaba released the first model to surpass GPT-4 on some benchmarks. Qwen-VL-Max matches or surpasses GPT-4 and Gemini Ultra on some metrics… and is open source, so you can try it for yourself on Hugging Face (overloaded since launch, so maybe in a few days!)
Notably:
It’s primarily an image comprehension model - which makes sense as the Chinese language is far more vision-centric with much more complex UI design than
It can annotate and respond with images. In response to an input image and the prompt “Locate the red car”
Qwen responds with an annotated image and the text “The red car is located in the bottom right corner of the image.”
It can understand the significance of parts of an image. One can only imagine an optimized model deployed locally, for autonomous driving, like the example below:
It understands flowcharts, diagrams, charts, and graphs. And can reason about them. It can solve grade school math problems (who needs AGI after this)
User: “read the image and solve it step by step.” This prompt is enough for Qwen to find the surface area and volume of both objects, as the question asks
Understands and can explain flowcharts
It can understand and parse and transform chart data. The below would replace the McKinsey analyst class of 2025.
It can reason from diagrams. The below is analogous to Raven’s Standard Progressive Matrices, a widely used intelligence test.. so we are not far from one of these models coolly beating humans at that test
It is very good at extracting structured data from images.. the below is a product many startups worked on in the last decade
Qwen has also been trained on dense text information retrieval on unusual aspect ratios, like the below:
All in all, Qwen is going to be a very useful model for a lot of enterprise tasks and seems to have beaten GPT-4V on some of these. The language capability is not yet GPT-4 standard… but the intelligence is, so maybe it’s not that far behind.
Benchmark performance
This is definitely acceleration, and we are happy China is open-sourcing it.
🌠 Enjoying this edition of Emergent Behavior? Share this web link with a friend to help spread the word of technological progress and positive AI to the world!
Or send them the below subscription link:
🛍️ Buy Anything Now
What is the purpose of technology except the fulfillment of human desire? And what better use of technology than to identify and fulfill those desires in the shortest amount of time possible?
This is what TikTok’s new Shop feature does, reports Bloomberg, though details are otherwise sparse. Reportedly:
you can click on any item in a TikTok video
and find similar products on TikTok Shop
turning every video into an infomercial
a million-fold increase in monetizable ad content
Every video is an infomercial
TikTok, of course, is the most seamless, user-friendly, product-focused innovator in AI, and I have to say Zhang Yiming taking a step back from founder-CEO to product lead definitely shows in the velocity of AI product and execution.
The number of nascent technologies that have to be stitched together, debugged, and deployed at billion-fold scale on video (video!) is simply mind-boggling. Finally, the rumors of multi-billion-dollar backdated Nvidia chip orders and secret deals for data center capacity in Malaysia, adjacent to TikTok’s Singapore headquarters, start to make sense.
🗞️ Things Happen
Neuralink’s brain-computer interface was implanted in a human for the first time yesterday, according to Elon. This is our best bet for the Merge, and Elon has gotten to the crux of the issue by attacking the problem from the data acquisition funnel entrance rather than the midpoint model that most academics work on. Good luck!
A movie studio’s hiring brief for a VFX role reveals that the tools currently achieving prominence amongst professional users are all based on the Stable Diffusion ecosystem, including Automatic1111, ComfyUI, and InvokeAI, used to deploy techniques including ControlNet, IP-Adapter, T2I-Adapters, AnimateDiff, and LoRAs.
“AI-generated code resembles an itinerant contributor, prone to violate the DRY-ness [don't repeat yourself] of the repos visited” - reports a study of GitHub Copilot users, indicating that code quality declines after AI code generators are adopted. Worrying.
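For anyone who hasn't run into the term, here is a minimal sketch of what a DRY violation looks like. The function names and the 10% discount rule are hypothetical, made up for illustration, and not taken from the study.

```python
# Hypothetical example: all names and the discount rule are illustrative only.

# Repeated logic (violates DRY): the same discount rule is pasted into two places,
# so a future change has to be made twice and the copies can drift apart.
def cart_total_repeated(prices):
    total = 0.0
    for p in prices:
        total += p * 0.9 if p > 100 else p
    return total


def invoice_total_repeated(prices):
    total = 0.0
    for p in prices:
        total += p * 0.9 if p > 100 else p
    return total


# DRY version: the rule lives in one function that every caller reuses.
def discounted(price):
    return price * 0.9 if price > 100 else price


def total_dry(prices):
    return sum(discounted(p) for p in prices)
```

Both versions return the same totals today; the difference only shows up when the rule changes and one of the pasted copies gets missed.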
🖼️ AI Artwork of The Day
Comic Panel - u/UnlimitedDuck on r/StableDiffusion
That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world: