Qwen3-VL

by natrys on 9/23/2025, 8:59 PM with 160 comments

by richardlblair on 9/23/2025, 11:54 PM

As I mentioned yesterday: I recently needed to process hundreds of low-quality images of invoices (for a construction project). I had a script that used PIL/OpenCV, pytesseract, and OpenAI as a fallback. It still had a staggering number of failures.

Today I tried a handful of the really poor-quality invoices, and Qwen spat out all the information I needed without an issue. What's crazier is that it gave me bounding boxes I could use to improve the tesseract pass.
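One way to use those boxes: feed them back into a targeted tesseract pass over each cropped region. A minimal sketch, assuming the model returns [x1, y1, x2, y2] boxes normalized to a 0-1000 grid (some Qwen-VL releases document this convention; check your model card) and that the field names below are hypothetical:

```python
# Sketch: convert VLM-returned normalized boxes to pixel coordinates
# so each region can be cropped and re-OCRed with tesseract.
# Assumes boxes are [x1, y1, x2, y2] on a 0-1000 grid (model-dependent).

def to_pixels(box, width, height):
    """Map a 0-1000 normalized box onto an image of the given size."""
    x1, y1, x2, y2 = box
    return (round(x1 / 1000 * width), round(y1 / 1000 * height),
            round(x2 / 1000 * width), round(y2 / 1000 * height))

# Hypothetical fields and boxes as a VLM might return them.
boxes = {"invoice_no": [62, 40, 310, 78], "total": [700, 905, 960, 950]}

for field, box in boxes.items():
    px = to_pixels(box, 2480, 3508)  # A4 scanned at 300 dpi
    print(field, px)
    # In the real pipeline:
    #   crop = image.crop(px)
    #   text = pytesseract.image_to_string(crop)
```

Cropping to a tight region like this often helps tesseract, since page-level layout analysis is what tends to fail on noisy scans.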

by deepdarkforest on 9/23/2025, 10:22 PM

The Chinese are doing what they have been doing to the manufacturing industry as well: take the core technology and just optimize, optimize, optimize for 10x the cost efficiency. As simple as that. Super impressive. These models might be benchmaxxed, but as another comment said, I see so many benchmarks that it might as well be the most impressive benchmaxxing today, if not just a genuinely SOTA open-source model. They even released a closed-source 1-trillion-parameter model today that is sitting at No. 3(!) on LM Arena. Even their 80B model is 17th; gpt-oss-120b is 52nd. https://qwen.ai/blog?id=241398b9cd6353de490b0f82806c7848c5d2...

by helloericsf on 9/23/2025, 11:15 PM

If you're in SF, you don't want to miss this. The Qwen team is making their first public appearance in the United States, with the VP of Qwen Lab speaking at the meetup below during SF Tech Week. https://partiful.com/e/P7E418jd6Ti6hA40H6Qm A rare opportunity to engage directly with Qwen team members.

by be7a on 9/23/2025, 9:34 PM

The biggest takeaway is that they claim SOTA for multi-modal tasks, even ahead of proprietary models, and still released it as open weights. My first tests suggest this might actually be true; will continue testing. Wow.

by Workaccount2 on 9/24/2025, 3:32 AM

Sadly it still fails the "extra limb" test.

I have a few images of animals with an extra limb photoshopped onto them: a dog with a leg coming out of its stomach, or a cat with two front-right legs.

Like every other model I have tested, it insists that the animals have the anatomically correct number of limbs. Even when I point out that there is a leg coming from the dog's stomach, it pushes back and insists I am confused, insisting it counted again and there are definitely only four. Qwen took it a step further: even after I told it the image was edited, it told me it wasn't and that there were only four limbs.

by willahmad on 9/23/2025, 10:20 PM

China is winning the hearts of developers in this race so far. At least, they won mine already.

by sergiotapia on 9/23/2025, 10:20 PM

Thank you, Qwen team, for your generosity. I'm already using their thinking model to build some cool workflows that help with boring tasks within my org.

https://openrouter.ai/qwen/qwen3-235b-a22b-thinking-2507

Now I will use this to identify and caption meal pictures and user pictures for other workflows. Very cool!

by causal on 9/23/2025, 9:24 PM

That has got to be the most benchmarks I've ever seen posted with an announcement. Kudos for not just cherry-picking a favorable set.

by BUFU on 9/23/2025, 10:34 PM

The open source models are no longer catching up. They are leading now.

by vardump on 9/24/2025, 8:31 AM

So the 235B-parameter Qwen3-VL is FP16, meaning it practically requires at least 512 GB of RAM to run? Possibly even more for a reasonable context window?

Assuming I don’t want to run it on a CPU, what are my options to run it at home under $10k?

Or if my only option is to run the model on a CPU (vs. a GPU or other specialized hardware), what would be the best way to spend that $10k? vLLM plus multiple networked (10/25/100 Gbit) systems?
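The 512 GB figure checks out on the back of an envelope. Weights alone at 2 bytes per parameter come to roughly 470 GB, and the KV cache for a long context sits on top of that. A rough sketch (the quantized rows assume straight byte-per-parameter scaling and ignore per-format overhead):

```python
# Back-of-the-envelope memory math for a 235B-parameter model.
# Weights only; the KV cache adds more and scales with context
# length, batch size, and layer count.
params = 235e9
bytes_per_param = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

weights_gb = {fmt: params * b / 1e9 for fmt, b in bytes_per_param.items()}
for fmt, gb in weights_gb.items():
    print(f"{fmt}: ~{gb:.0f} GB for weights alone")
```

Which is why most home setups for models this size lean on 4-bit quantization: ~118 GB of weights is at least within reach of a large unified-memory machine.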

by isoprophlex on 9/24/2025, 9:31 AM

Extremely impressive, but can one really run these >200B-param models on-prem in any cost-effective way? Even if you get your hands on cards with 80 GB of RAM, you still need to tie them together in a low-latency, high-bandwidth manner.

It seems to me that small and medium-sized players would still need a third party to get inference going on these frontier-quality models; we're not in a fully self-owned, self-hosted place yet. I'd love to be proven wrong, though.

by vessenes on 9/24/2025, 3:01 AM

Roughly 1/10 the cost of Opus 4.1 and 1/2 the cost of Sonnet 4 on a per-token inference basis. Impressive. I'd love to see a fast (Groq-style) version of this served. I wonder if the architecture is amenable.

by vessenes on 9/24/2025, 3:24 AM

I spent a little time with the thinking model today. It's good. It's not better than GPT-5 Pro. It might be better than the smallest GPT-5, though.

My current go-to test is to ask the LLM to construct a charging solution for my MacBook Pro with the model on it, but sadly, I and the Pro have been sent to 15th-century Florence with no money and no charger. I explain that I have only two to three hours of inference time, which can be spread out, but in that time I need to construct a working charging solution.

So far GPT-5 Pro has been by far the best, not just in its electrical specifications (drawings of a commutator): it generated instructions for jewelers and blacksmiths in what it claims is 15th-century Florentine Italian, and furnished a year-by-year set of events with trading/banking predictions, a short rundown of how to get to the right people in the Medici family... it was comprehensive.

Generally, models suggest building an alternating-current setup, rectifying it to 5 V DC, and trickle charging over the USB-C pins that allow it. There's a lot of variation in how they suggest getting to DC power, and oftentimes not a lot of help on key questions like, say, "how do I know I don't have too much voltage using only 15th-century tools?"

Qwen3-VL is a mixed bag. It's the only model other than GPT-5 I've talked to that suggested building a voltaic pile, estimated the voltage generated by the number of plates, gave me tests to check voltage (lick a lemon, touch your tongue: mild tingling, good; strong tingling, remove a few plates), and was overall helpful.
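The plate-count estimate is easy to sanity-check. A minimal sketch, assuming roughly 0.76 V per copper-zinc cell (a textbook open-circuit figure; a real pile with brine-soaked cloth delivers noticeably less under load):

```python
import math

# Rough check of the "voltage by number of plates" estimate for a
# voltaic pile. Assumes ~0.76 V per Cu-Zn cell, open-circuit, no load.
CELL_VOLTAGE = 0.76
target = 5.0  # the USB trickle-charge rail the commenter is after

cells_needed = math.ceil(target / CELL_VOLTAGE)
stack_voltage = cells_needed * CELL_VOLTAGE
print(f"{cells_needed} cells -> ~{stack_voltage:.2f} V open-circuit")
```

Which is consistent with the "remove a few plates if the tingling is strong" advice: each cell removed drops the stack by under a volt.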

On the other hand, its money-making strategy was laughable: predicting Halley's comet and, in exchange, demanding a workshop and 20 copper pennies from the Medicis.

Anyway, interesting showing, definitely real, and definitely useful.

by mythz on 9/24/2025, 12:50 AM

Team Qwen keeps cooking! Qwen2.5-VL was already my preferred vision model for querying images; I'll look at upgrading if they release a smaller model we can run locally.

by fareesh on 9/24/2025, 7:53 AM

Can't seem to connect to qwen.ai with DNSSEC enabled:

$ resolvectl query qwen.ai
qwen.ai: resolve call failed: DNSSEC validation failed: no-signature

And

https://dnsviz.net/d/qwen.ai/dnssec/ shows

aliyunga0019.com/DNSKEY: No response was received from the server over UDP (tried 4 times). See RFC 1035, Sec. 4.2. (8.129.152.246, UDP_-_EDNS0_512_D_KN)

by mountainriver on 9/24/2025, 12:37 AM

Incredible release! Qwen has been leading the open source vision models for a while now. Releasing a really big model is amazing for a lot of use cases.

I would love to see a comparison to the latest GLM model. I would also love to see no one use OSWorld ever again; it's a deeply flawed benchmark.

by drapado on 9/23/2025, 10:10 PM

Cool! Pity they are not releasing a smaller A3B MoE model.

by jadbox on 9/23/2025, 10:45 PM

How does it compare to Omni?

by ramon156 on 9/24/2025, 11:39 AM

One downside is that it has less knowledge of lesser-known tools like orpc, which is easily fixed by something like context7.

by ashvardanian on 9/24/2025, 3:34 PM

Qwen models have historically been pretty good, but there seems to be no architectural novelty here, unless I'm missing it. It looks like another vision encoder with a projection into a large autoregressive model. Have there been any better ideas in the VLM space recently? I've been away for a couple of years :(

by clueless on 9/24/2025, 1:52 AM

This demo is crazy: "At what time was the goal scored in this match, who scored it, and how was it scored?"

by michaelanckaert on 9/24/2025, 10:43 AM

Qwen has some really great models. I recently used qwen/qwen3-next-80b-a3b-thinking as a drop-in replacement for GPT-4.1-mini in an agent workflow. It costs 4 times less for input tokens and half as much for output: instant cost savings. As far as I can measure, system output has kept the same quality.
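The savings are easy to project once you know your token volumes. A sketch with placeholder prices (the per-million-token rates below are illustrative assumptions consistent with the 4x/2x ratios above, not actual quotes; substitute your provider's current rates):

```python
# Illustrative token-cost comparison between two models.
# Prices are placeholder assumptions in $ per million tokens.
prices = {
    "gpt-4.1-mini": {"in": 0.40, "out": 1.60},
    "qwen3-next-80b-a3b-thinking": {"in": 0.10, "out": 0.80},
}
tokens_in, tokens_out = 50e6, 10e6  # hypothetical monthly volume

costs = {
    model: tokens_in / 1e6 * p["in"] + tokens_out / 1e6 * p["out"]
    for model, p in prices.items()
}
for model, cost in costs.items():
    print(f"{model}: ${cost:.2f}/month")
```

Note that for agent workflows with long prompts and short replies, the input-token price dominates, so the 4x input discount is where most of the savings come from.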

by am17an on 9/24/2025, 1:30 AM

This model is literally amazing. Everyone should try to get their hands on a H100 and just call it a day.

by whitehexagon on 9/24/2025, 6:39 AM

Imagine the demand for a 128 GB/256 GB/512 GB unified-memory Linux box shipping with Qwen models already up and running.

Although I'm agAInst steps towards AGI, it feels safer to have these things running locally and disconnected from each other than some giant gigawatt cloud agentic data centers connected to everyone and everything.

by Alifatisk on 9/24/2025, 9:02 AM

Wow, the Qwen team doesn't stop and keeps coming up with surprises. Not only did they release this, but also the new Qwen3-Max model.

by buyucu on 9/24/2025, 11:25 AM

The Chinese are great. They are making major contributions to human civilization by open sourcing these models.

by youssefarizk on 9/24/2025, 7:39 PM

Another day another Qwen model