Hacker News

by IdealeZahlen on 8/6/2025, 3:50 PM with 60 comments

by nisten on 8/6/2025, 6:59 PM

If you want to have an opinion on it, just install LM Studio and run the q8_0 version of it, e.g. from here: https://huggingface.co/bartowski/Qwen_Qwen3-4B-Instruct-2507....

You can even run it on a 4 GB Raspberry Pi with Qwen_Qwen3-4B-Instruct-2507-Q4_K_L.gguf. LM Studio is at https://lmstudio.ai/

Keep in mind that if you run it at the full 262144 tokens of context, you'll need ~65 GB of RAM.
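
For a rough feel of why long context eats so much memory, here is a back-of-the-envelope sketch of the KV-cache size in Python. The layer count, KV-head count, and head dimension below are illustrative placeholders, not values taken from the model card, so check the model's config before trusting any number. The estimate also leaves out the weights and compute buffers, and it shifts with cache precision, so it is not expected to reproduce the figure above exactly.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV-cache size in bytes.

    Each token stores one key vector and one value vector (hence the
    factor of 2) per layer and per KV head, each of length head_dim.
    bytes_per_value=2 assumes a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


if __name__ == "__main__":
    # ASSUMPTION: placeholder architecture values for illustration only.
    n_layers, n_kv_heads, head_dim = 36, 8, 128
    full_context = 262144

    size = kv_cache_bytes(full_context, n_layers, n_kv_heads, head_dim)
    print(f"KV cache at {full_context} tokens: {size / 2**30:.1f} GiB")
```

The cost is linear in context length, so halving the context halves the cache, which is usually the easiest knob to turn on a small machine.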

Anyway, if you're on a Mac you can search for "qwen3 4b 2507 mlx 4bit" and run the MLX version, which is often faster on M-series chips. Crazy impressive what you get from a 2 GB file, in my opinion.

It's pretty good for summaries etc. and can even make simple index.html sites if you're teaching students, but it can't really vibecode, in my opinion. However, for local automation tasks like summarizing your emails, home automation, or whatever, it is excellent.

It's crazy that we're at this point now.

by film42 on 8/6/2025, 5:54 PM

Is there a crowd-sourced sentiment score for models? I know all these scores are juiced like crazy. I stopped taking them at face value months ago. What I want to know is if other folks out there actually use them or if they are unreliable.

by esafak on 8/6/2025, 4:45 PM

This one should work on personal computers! I'm thankful for Chinese companies raising the floor.

by frontsideair on 8/6/2025, 4:50 PM

According to the benchmarks, this one improves on the previous version in every one of them, and in some it scores better than 30B-A3B. Definitely worth a try: it’ll easily fit into memory and token generation speed will be pleasantly fast.

by gok on 8/6/2025, 4:41 PM

So this 4B dense model gets very similar performance to the 30B MoE variant with a 7.5x smaller footprint.

by svnt on 8/6/2025, 6:54 PM

It is interesting to think about how they are achieving these scores. The evals are rated by GPT-4.1. Beyond just overfitting to benchmarks, is it possible the models are internalizing how to manipulate the ratings model/agent? Is anyone manually auditing these performance tables?

by tolerance on 8/6/2025, 4:55 PM

Is there like a leaderboard or power rankings sort of thing that tracks these small open models and assigns ratings or grades to them based on particular use cases?

by jampa on 8/6/2025, 5:46 PM

Am I reading this right? Is this model way better than Gemma 3n[1]? (Only for the benchmarks the models have in common.)

===== LiveCodeBench

E4B IT: 13.2

Qwen: 55.2

===== AIME25

E4B IT: 11.6

Qwen: 81.3

[1]: https://huggingface.co/google/gemma-3n-E4B

by Demiurge on 8/7/2025, 12:51 AM

I've been trying this today, and I'm getting a lot of hallucinations in its suggestions. However, its analysis of problems is really quite good.

Qwen3-4B-Thinking-2507