This is (possibly) a GPT-4 level dense model with an open source license. Nvidia has released models with issues before, but reports on this so far indicate it's a solid contender without any of the hiccups of previous releases.
A 340B model should need around 700GB of VRAM or RAM to run inference. To train or finetune, you're looking at almost double that, which is probably why Nvidia recommends 2x A100 nodes with 1.28TB of VRAM.
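Rough back-of-the-envelope math (the bytes-per-parameter figures are standard rules of thumb I'm assuming, not NVIDIA's numbers):

```python
# Back-of-the-envelope memory math for a 340B-parameter dense model.
# Bytes-per-parameter values are common rules of thumb, not official figures.

params = 340e9

# Inference at FP16/BF16: 2 bytes per parameter, weights only.
# KV cache and activations add more on top.
fp16_weights_gb = params * 2 / 1e9            # ~680 GB

# Full finetuning is much heavier: FP16 weights + FP16 grads + FP32 master
# weights + two FP32 Adam moments ~= 2 + 2 + 4 + 8 = 16 bytes per parameter.
adam_finetune_gb = params * 16 / 1e9          # ~5,440 GB

print(f"FP16 inference weights: ~{fp16_weights_gb:,.0f} GB")
print(f"Full Adam finetune:     ~{adam_finetune_gb:,.0f} GB")
```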
Jensen Huang is the king of AI summer.
The "open" and "permissive" license has an interesting section on "AI Ethics":
> AI Ethics. NVIDIA is committed to safety, trust and transparency in AI development. NVIDIA encourages You to (a) ensure that the product or service You develop, use, offer as a service or distributes meets the legal and ethical requirements of the relevant industry or use case, (b) take reasonable measures to address unintended bias and to mitigate harm to others, including underrepresented or vulnerable groups, and (c) inform users of the nature and limitations of the product or service. NVIDIA expressly prohibits the use of its products or services for any purpose in violation of applicable law or regulation, including but not limited to (a) illegal surveillance, (b) illegal collection or processing of biometric information without the consent of the subject where required under applicable law, or (c) illegal harassment, abuse, threatening or bullying of individuals or groups of individuals or intentionally misleading or deceiving others
https://developer.download.nvidia.com/licenses/nvidia-open-m...
Besides limiting the freedom of use (making it less "open" in my eyes), it's interesting that they tell you to meet "ethical requirements of the relevant industry or use case". Seems like that'd be super hard to pin down in a precise way.
It's 5x the price of Llama 3 / Qwen2 70B, and benchmark performance is similar. But with a 70B you can break a task into steps and do 5+ steps, so it doesn't seem worth it in general cases at that price. Is 340B better for synthetic data generation (which is my primary use case)? Are there tests for that? Synthetic data seems like it would benefit from multi-step reasoning and reduced hallucination, and on those tests the difference is small.
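A quick sanity check on that break-even, with placeholder prices based on the 5x ratio above:

```python
# Hypothetical cost comparison: one 340B call vs. a multi-step 70B pipeline.
# Prices are arbitrary placeholder units reflecting the 5x ratio claimed above.

price_70b = 1.0    # cost per 70B call
price_340b = 5.0   # 5x the 70B price

steps = 5          # a 70B pipeline that decomposes the task into 5 calls

cost_70b_pipeline = steps * price_70b   # 5.0 units
cost_340b_single = price_340b           # 5.0 units

# At exactly 5x the price, a single 340B call costs the same as a 5-step
# 70B pipeline -- so the 340B only wins if it beats the pipeline on quality.
print(f"5-step 70B pipeline: {cost_70b_pipeline} units")
print(f"Single 340B call:    {cost_340b_single} units")
```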
3 models are included: base, instruct, and reward, all under a license permitting synthetic data generation and commercial use.
Has anyone run evaluations to compare the instruct version with GPT-4o or Llama 3 70B, etc.? It's so much larger than the leading open source models, so one would hope it performs significantly better?
Or is this in one of the chat arenas or whatever? Very curious to see some numbers related to the performance.
But if it's at least somewhat better than the existing open source models then that is a big boost for open source training and other use cases.
https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_...
"...Nemotron-4-340B-Base was trained using 768 DGX H100 nodes"
That's 350 million dollars for you... Poor startups better have a rich sponsor.
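Rough math behind that figure, assuming a street price of around $450k per DGX H100 node (my assumption, not a quoted price):

```python
# Rough cost estimate for the quoted training cluster.
# The per-node price is an assumed street-price figure, not NVIDIA list price.

nodes = 768                    # DGX H100 nodes, per the technical report
gpus_per_node = 8              # each DGX H100 carries 8 H100 GPUs
est_price_per_node = 450_000   # USD, assumed

total_gpus = nodes * gpus_per_node          # 6,144 H100s
cluster_cost = nodes * est_price_per_node   # ~$346M

print(f"{total_gpus:,} H100s, cluster ~${cluster_cost / 1e6:.0f}M")
```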
I'm so confused.
Isn't "training LLMs on LLM output" the very definition of "model collapse" or "model poisoning"?
"...and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision"
OK, I see: the goal is to sell more H100s. They made it big enough that it won't fit on cheaper GPUs.
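The sizing arithmetic, assuming a weights-only FP8 footprint of 1 byte per parameter:

```python
# Why 340B at FP8 "just fits" on one 8-GPU DGX H100.

params = 340e9
fp8_bytes_per_param = 1                   # FP8: 1 byte per parameter

weights_gb = params * fp8_bytes_per_param / 1e9   # ~340 GB of weights
dgx_h100_gb = 8 * 80                              # 8x H100 80GB = 640 GB

# The remaining ~300 GB covers KV cache, activations, and parallelism overhead.
headroom_gb = dgx_h100_gb - weights_gb
print(f"weights ~{weights_gb:.0f} GB of {dgx_h100_gb} GB, "
      f"~{headroom_gb:.0f} GB headroom")
```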
"Nemotron-4-340B-Instruct is a chat model intended for use for the English language" - frustrating
What is it? Is it an LLM or what?
Why does nvidia release models that compete with its customers businesses but don’t make any money for nvidia?
Are they commoditising their complements?
> The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs.
I feel like everyone is missing this from the announcement. They explicitly are releasing this to help generate synthetic training data. Most big models and APIs have clauses that ban their use to improve other models. Sure, it may be able to compete with other big commercial models at normal tasks, but this is a huge opportunity for ML labs and startups to expand the training data of smaller models.
Nvidia must see a limit to the growth of new models (and new demand for training with their GPUs) based on the availability of training data, so they're seeking to provide a tool to bypass those restrictions.
All for the low price of 2x A100 nodes...
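For what it's worth, a minimal sketch of what that generate-then-score pipeline could look like; `generate` and `score` here are hypothetical placeholders for however you serve the instruct and reward models:

```python
# Hypothetical sketch of a synthetic-data pipeline built on the three models:
# the instruct model drafts candidate responses, the reward model scores them,
# and only the best-scoring pairs are kept as training data for a smaller model.
# `generate` and `score` stand in for your actual serving calls.

from typing import Callable

def make_synthetic_pairs(
    prompts: list[str],
    generate: Callable[[str, int], list[str]],  # instruct model: prompt -> n candidates
    score: Callable[[str, str], float],         # reward model: (prompt, response) -> score
    candidates_per_prompt: int = 4,
    min_score: float = 0.0,
) -> list[tuple[str, str]]:
    pairs = []
    for prompt in prompts:
        candidates = generate(prompt, candidates_per_prompt)
        # Keep the single best-scoring response, if it clears the threshold.
        best = max(candidates, key=lambda r: score(prompt, r))
        if score(prompt, best) >= min_score:
            pairs.append((prompt, best))
    return pairs
```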