That’s an impressive result
The OpenRAIL license seems to reference some limitations on safety and unethical use, but I can't see where in the repo it's spelled out precisely what the authors have in mind?
One misleading thing is the notion that you need a 1-2B model to run on commodity hardware.
This is not really true. Llama 7B runs with Vulkan/llama.cpp on ~8GB smartphones and ~12GB laptops. That's only going to get easier over time, as lower-RAM hardware drops out of the market and Vulkan implementations become more widespread.
For users trying to run LLMs on machines with 8GB or less, the AI Horde approach of distributed models seems much more practical anyway.
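To make the point concrete, here is a minimal sketch of running a quantized 7B model with llama-cpp-python on a modest machine. It assumes you've already downloaded a 4-bit GGUF quantization; the file name, context size, and generation parameters below are placeholders, not recommendations:

    # Minimal sketch: run a 4-bit quantized 7B model with llama-cpp-python.
    # The model file name is a placeholder for whatever GGUF you downloaded.
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-2-7b.Q4_K_M.gguf",  # roughly 4GB on disk when 4-bit quantized
        n_ctx=2048,        # context window; larger values need more RAM
        n_gpu_layers=0,    # CPU-only; raise this if a Vulkan/Metal/CUDA build is available
    )

    out = llm("def fibonacci(n):", max_tokens=64, temperature=0.2)
    print(out["choices"][0]["text"])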
Hey, I have a genuine question:
What is the point of a new model that isn’t better than the best possible model (example: OpenAI GPT-4)?
What’s the point in having a smaller model? Who cares?
---
This is a real, genuine question that I don’t have a clear answer to. Excuse my ignorance, plz enlighten your boi.
Just trying out the official container image for self-hosting alongside the VSCode extension - I've got to say I'm really impressed with the scaffolding, especially for an early-stage project.
The web interface for the LLM server is especially nice and clean compared to many of the others I've tried - and it "just works". Very interested to see how this evolves.
What's the difference between 1% and 99% on HumanEval? What does it really tell you?
I don't trust any benchmark for an LLM that isn't coming from FB, Google, OpenAI, Anthropic, or Microsoft. These models are so dynamic that simple benchmark numbers never tell the whole story about a model's quality. Take, for instance, a recent post by Anyscale claiming their fine-tuning of Llama 2 was competitive with OpenAI's model. The reality is that their fine-tuned model is basically worthless, and was only competitive on a single metric / very narrow commoditized task. It's a great way to get clicks by posting these metrics, though.
Congrats on your achievement! I'm curious about your end goal. Do you aim to beat GitHub Copilot's performance and convince devs to use Refact for code completion instead of GitHub Copilot? I want to understand the motivation behind these different code-completion models that are not solely for academic research.
The title is misleading. This model is not "SOTA for the size": there are smaller models that do 10-18% better in absolute score. The text says it's SOTA "among similar models", where they presumably compare only with other permissively licensed models.
License text: https://drive.google.com/file/d/16NqKiAkzyZ55NClubCIFup8pT2j... [PDF]
See last page for restrictions
Say I want to fine-tune a Golang-specific model. How much money and effort would I have to put in? Would using this as a base help in any way compared to starting from Llama?
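Not one of the authors, but for a rough sense of the effort: a common low-cost route is LoRA fine-tuning on a Go corpus rather than training from scratch. Here's a rough sketch with transformers + peft, where the base checkpoint, data file, and hyperparameters are placeholders and the adapter target-module names assume a Llama-style layout:

    # Hypothetical sketch: LoRA fine-tuning a code model on a Go corpus.
    # Base checkpoint, data file, and hyperparameters are placeholders.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    base = "codellama/CodeLlama-7b-hf"   # or any other permissively licensed code model
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token        # Llama tokenizers ship without a pad token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Train small low-rank adapters instead of all base weights.
    # Module names follow the Llama layout; adjust for other architectures.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, task_type="CAUSAL_LM",
        target_modules=["q_proj", "v_proj"]))

    # "go_corpus.txt" stands in for whatever Go source you collect and clean.
    ds = load_dataset("text", data_files={"train": "go_corpus.txt"})["train"]
    ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048),
                remove_columns=["text"])

    Trainer(
        model=model,
        args=TrainingArguments("go-lora", per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               num_train_epochs=1, bf16=True),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    ).train()

With adapters like this the compute is usually a single 24-48GB GPU for hours to a couple of days, so most of the effort tends to go into assembling and deduplicating the Go data rather than the training itself.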
All these LLMs are pretty general, if I understand correctly. Are there any efforts to create specialized models (other than for coding)? Or, even better, to "extract" certain areas from existing LLMs as a way of specializing them? The goal would be to drastically reduce model size so they can run on less powerful devices.
E.g. a model specializing in chemistry doesn't need to include data on world history or be able to write poetry.
Another model that we'll soon forget ever existed.
For the sake of not giving Microsoft and a few other tech giants immense power over the world, I really do hope the cost and efficiency of LLMs improve dramatically, until we can get GPT-4-equivalent models trained on a few graphics cards and running offline on an iPhone. Really rooting for these kinds of projects until someone makes the breakthrough.
We've finished training a new code model, Refact LLM, which took us about a month. The main use case is blazing-fast code completion with fill-in-the-middle; additionally, the model can reply to chat prompts.
It performs much better than all code models of similar size, and almost reaches the same HumanEval score as StarCoder while being 10x smaller.
Thanks to the small size, it works with most modern GPUs, requiring just 3GB of RAM.
You can self-host it with Refact https://github.com/smallcloudai/refact/ and get a fast local Copilot alternative with decent suggestions.
Weights and model card: https://huggingface.co/smallcloudai/Refact-1_6B-fim
We would love to hear your feedback!
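In case it helps anyone kick the tires, here's a minimal completion sketch with transformers. The FIM special tokens below follow the StarCoder-style convention and the generation settings are just examples, so check the model card for the exact token names and loading arguments:

    # Minimal sketch: fill-in-the-middle completion with the published weights.
    # FIM token names assume the StarCoder convention; verify on the model card.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "smallcloudai/Refact-1_6B-fim"
    tok = AutoTokenizer.from_pretrained(checkpoint)
    # trust_remote_code is needed if the checkpoint ships custom modeling code
    model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

    # The model is asked to fill in the code between prefix and suffix.
    prompt = ("<fim_prefix>def print_hello_world():\n"
              "<fim_suffix>\n\nprint_hello_world()\n<fim_middle>")

    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64,
                         do_sample=True, temperature=0.2)
    print(tok.decode(out[0], skip_special_tokens=False))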
tangentially related: refact recently shared 4 bounties worth $9,000 to help improve their tech!
https://algora.io/org/smallcloudai/bounties
disclaimer: i'm a cofounder of algora, the platform enabling these bounties
Model Stats:
- Architecture: LLaMA-like model with multi-query attention
- Objectives: fill-in-the-middle, chat
- Context: 4096 tokens
- Pretraining tokens: 1.2T
- Finetuning tokens: 40B
- Precision: bfloat16
- GPUs: 64 NVIDIA A5000
- Training time: 28 days
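The multi-query attention entry is a big part of why a model this size is cheap to serve: all query heads share a single key/value head, so the KV cache shrinks by roughly the head count. A shape-level sketch of the idea (illustration only, with made-up dimensions, not the actual implementation):

    # Shape-level illustration of multi-query attention: many query heads,
    # one shared key/value head (no masking, caching, or other details).
    import torch

    B, T, n_heads, d_head = 1, 16, 32, 64
    q = torch.randn(B, n_heads, T, d_head)  # per-head query projections
    k = torch.randn(B, 1, T, d_head)        # single shared key head
    v = torch.randn(B, 1, T, d_head)        # single shared value head

    # The shared K/V broadcast across the query heads, so the KV cache is
    # n_heads times smaller than in standard multi-head attention.
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (B, n_heads, T, T)
    out = torch.softmax(scores, dim=-1) @ v            # (B, n_heads, T, d_head)
    print(out.shape)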
This post is misleading, in a way that is hard to do accidentally.
This is interesting work, and a good contribution, but it's important to compare similar models.
[1] https://github.com/nlpxucan/WizardLM
[2] https://huggingface.co/vikp/llama_coder
[3] https://stability.ai/blog/stablecode-llm-generative-ai-codin...
[4] https://github.com/huggingface/blog/blob/main/starcoder.md