The model meets/beats Llama despite having roughly an order of magnitude fewer active parameters (52B vs. 405B). Absolutely bonkers. AI is moving so fast with these breakthroughs -- synthetic data, distillation, alternative architectures (e.g. MoE/SSM), LoRA, RAG, curriculum learning, etc.
We've come so astonishingly far in like two years. I have no idea what AI will do in another year, and it's thrilling.
the paper with details: https://arxiv.org/pdf/2411.02265
They use
- 16 experts, of which one is activated per token
- 1 shared expert that is always active
In total that comes to around 52B active parameters per token, instead of the 405B of Llama 3.1. A rough sketch of that routing scheme is below.
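Not Tencent's code, just a minimal illustration of the routing described above: a shared FFN expert that every token passes through, plus a router that sends each token to exactly one of 16 specialized experts. Module and dimension names here are hypothetical; the real model's gating, load balancing, and expert sizes differ.

```python
# Minimal sketch (illustrative only): shared expert + top-1 routing over 16 experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPlusTop1MoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        # Shared expert: applied to every token.
        self.shared = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                    nn.Linear(d_ff, d_model))
        # 16 specialized experts: each token is routed to exactly one of them.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                        # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        top_p, top_idx = scores.max(dim=-1)      # pick 1 of 16 experts per token
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():                       # only the chosen expert runs
                routed[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return self.shared(x) + routed           # shared expert is always active

moe = SharedPlusTop1MoE()
y = moe(torch.randn(8, 512))  # 8 tokens; each one only touches 2 of the 17 expert FFNs
```

That "only a fraction of the weights run per token" is where the 52B-active vs. 389B-total gap comes from.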
> “Territory” shall mean the worldwide territory, excluding the territory of the European Union.
Anyone have some background on this?
- 389 billion total parameters and 52 billion active parameters, capable of handling up to 256K tokens
- outperforms Llama 3.1-70B and exhibits performance comparable to the significantly larger Llama 3.1-405B model
Definitely not trained on Nvidia or AMD GPUs.
I'm no expert on these MoE models with "a total of 389 billion parameters and 52 billion active parameters". Do hobbyists stand a chance of running this model (quantized) at home? For example on something like a PC with 128GB (or 512GB) RAM and one or two RTX 3090 24GB VRAM GPUs?
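For a rough sense of scale (my own back-of-envelope arithmetic, not from the thread): even though only 52B parameters are active per token, all 389B weights still have to sit in RAM/VRAM (or be offloaded from disk), so the quantization level largely decides whether it fits.

```python
# Rough estimate of memory needed just to hold all 389B weights
# (ignores KV cache, activations, and runtime overhead).
TOTAL_PARAMS = 389e9

for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gib = TOTAL_PARAMS * bytes_per_param / 1024**3
    print(f"{name:>5}: ~{gib:.0f} GiB for weights alone")

# Prints roughly: fp16 ~725 GiB, 8-bit ~362 GiB, 4-bit ~181 GiB
```

By that count a 4-bit quant would not fit in 128GB of system RAM but should fit in 512GB with the GPUs holding a slice, assuming an inference stack that supports CPU offload; actual feasibility and speed depend heavily on the runtime.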
How does it compare with Llama 3.2?
Not open source. Even if we accept model weights as source code, which is highly dubious, this clearly violates clauses 5 and 6 of the Open Source Definition. It discriminates between users (clause 5) by refusing to grant any rights to users in the European Union, and it discriminates between uses (clause 6) by requiring agreement to an Acceptable Use Policy.
EDIT: The HN title, which previously made the claim, has since been changed. But as HN user swyx pointed out, Tencent is also claiming this is open source, e.g.: "The currently unveiled Hunyuan-Large (Hunyuan-MoE-A52B) model is the largest open-source Transformer-based MoE model in the industry".