The model meets/beats Llama despite having roughly an order of magnitude fewer active parameters (52B vs. 405B). Absolutely bonkers. AI is moving so fast with these breakthroughs -- synthetic data, distillation, alternative architectures (e.g. MoE/SSM), LoRA, RAG, curriculum learning, etc.
We've come so astonishingly far in like two years. I have no idea what AI will do in another year, and it's thrilling.
the paper with details: https://arxiv.org/pdf/2411.02265
They use
- 16 experts, of which one is activated per token
- 1 shared expert that is always active
In total that comes to around 52B active parameters per token, instead of the 405B of Llama 3.1. A rough sketch of that routing scheme is below.
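Not Tencent's code, just a minimal illustration of the routing described above: a shared FFN expert that every token passes through, plus a router that sends each token to exactly one of 16 specialized experts. Module and dimension names here are hypothetical; the real model's gating, load balancing, and expert sizes differ.

```python
# Minimal sketch (illustrative only): shared expert + top-1 routing over 16 experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPlusTop1MoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        # Shared expert: applied to every token.
        self.shared = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                    nn.Linear(d_ff, d_model))
        # 16 specialized experts: each token is routed to exactly one of them.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                        # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        top_p, top_idx = scores.max(dim=-1)      # pick 1 of 16 experts per token
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():                       # only the chosen expert runs
                routed[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return self.shared(x) + routed           # shared expert is always active

moe = SharedPlusTop1MoE()
y = moe(torch.randn(8, 512))  # 8 tokens; each one only touches 2 of the 17 expert FFNs
```

That "only a fraction of the weights run per token" is where the 52B-active vs. 389B-total gap comes from.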
> “Territory” shall mean the worldwide territory, excluding the territory of the European Union.
Anyone have some background on this?
- 389 billion total parameters and 52 billion active parameters, capable of handling up to 256K tokens
- outperforms Llama 3.1-70B and exhibits performance comparable to the significantly larger Llama 3.1-405B model
Definitely not trained on Nvidia or AMD GPUs.
I'm no expert on these MoE models with "a total of 389 billion parameters and 52 billion active parameters". Do hobbyists stand a chance of running this model (quantized) at home? For example on something like a PC with 128GB (or 512GB) RAM and one or two RTX 3090 24GB VRAM GPUs?
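For a rough sense of scale (my own back-of-envelope arithmetic, not from the thread): even though only 52B parameters are active per token, all 389B weights still have to sit in RAM/VRAM (or be offloaded from disk), so the quantization level largely decides whether it fits.

```python
# Rough estimate of memory needed just to hold all 389B weights
# (ignores KV cache, activations, and runtime overhead).
TOTAL_PARAMS = 389e9

for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gib = TOTAL_PARAMS * bytes_per_param / 1024**3
    print(f"{name:>5}: ~{gib:.0f} GiB for weights alone")

# Prints roughly: fp16 ~725 GiB, 8-bit ~362 GiB, 4-bit ~181 GiB
```

By that count a 4-bit quant would not fit in 128GB of system RAM but should fit in 512GB with the GPUs holding a slice, assuming an inference stack that supports CPU offload; actual feasibility and speed depend heavily on the runtime.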
How does it compare with Llama 3.2?
Not open source. Even if we accept model weights as source code, which is highly dubious, this clearly violates clauses 5 and 6 of the Open Source Definition. It discriminates between users (clause 5) by refusing to grant any rights to users in the European Union, and it discriminates between uses (clause 6) by requiring agreement to an Acceptable Use Policy.
EDIT: The HN title, which previously made the claim, has since been changed. But as HN user swyx pointed out, Tencent is also claiming this is open source, e.g.: "The currently unveiled Hunyuan-Large (Hunyuan-MoE-A52B) model is the largest open-source Transformer-based MoE model in the industry".