GPT-4 Beater

Yesterday, the team at Alibaba released the first model to surpass GPT-4 on some benchmarks.

Qwen-VL-Max matches or surpasses GPT-4 and Gemini Ultra on some metrics, and it is open source, so you can try it for yourself on Hugging Face (overloaded since launch, so maybe in a few days!).

Notably:

  • It’s primarily an image comprehension model - which makes sense as the Chinese language is far more vision-centric with much more complex UI design than

  • It can annotate and respond with images. Given an input image and the prompt “Locate the red car”, Qwen responds with an annotated image and the text “The red car is located in the bottom right corner of the image.”

  • It can understand the significance of parts of an image. One can only imagine an optimized model deployed locally, for autonomous driving, like the example below:

  • It understands flowcharts, diagrams, charts, and graphs, and can reason about them. It can solve grade school math problems (who needs AGI after this?)

The prompt “read the image and solve it step by step.” is enough for Qwen to find the surface area and volume of both objects, as the question asks.

  • It understands and can explain flowcharts

  • It can understand, parse, and transform chart data. The example below would replace the McKinsey analyst class of 2025.

  • It can reason from diagrams. The example below is analogous to Raven’s Standard Progressive Matrices, a widely used intelligence test, so we are not far from one of these models coolly beating humans at that test

  • It is very good at extracting structured data from images. The example below is a product many startups worked on in the last decade

  • Qwen has also been trained on dense text information retrieval on unusual aspect ratios, like the below:

All in all, Qwen is going to be a very useful model for a lot of enterprise tasks and seems to have beaten GPT-4V on some of these. The language capability is not yet at GPT-4 standard, but the intelligence is, so maybe it’s not that far behind.

Benchmark Performance

This is definitely acceleration, and we are happy China is open-sourcing it.
