This is great. Increased context does seem like the biggest current issue with Copilot. It will take a lot of work to understand an entire codebase, but if we could get to understanding 2-5 files and how they interact... coding assistants would be a lot more useful.
This product looks interesting, but very expensive ($60/month) compared to class leaders like Claude.
Given that I think those alternatives are extremely cheap, that's not necessarily a problem. But it would be useful to understand more about how the results compare when the service is used in the real world, and what exactly is being provided. It seems to be saying this is a direct competitor to other LLMs, enhanced with contextual magic, i.e. it doesn't sit on top of a third-party LLM endpoint the way, say, Cursor does.
But if it is three times the price of services like Cursor, Claude, or ChatGPT, it really needs to give evidence for that value in its outreach efforts, rather than just saying "this is how context makes things better".
Balancing the compute-bound phase (prefill) against the memory-bound phase (decode) is a fine art. Luckily there are a lot of gains to be had if you can tune it to your use case (in this case, coding assistants), but it is generally a lonely journey. Good to see you partnered with Colfax International.
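For anyone curious what that balancing act looks like in practice, here is a minimal sketch of a Sarathi-style scheduler under an assumed per-step token budget (the names, the `Request` shape, and the budget value are all hypothetical, not any particular engine's API): decode requests are packed first to keep generation latency low, then the leftover budget is topped up with a chunk of a pending prefill so the batch stays compute-bound.

```python
from dataclasses import dataclass

@dataclass
class Request:
    remaining_prompt_tokens: int  # >0 means still prefilling; 0 means decoding

TOKEN_BUDGET = 512  # tokens processed per forward pass; a tunable knob, value assumed

def build_batch(decode_reqs, prefill_reqs):
    """Return (request, num_tokens) pairs for one scheduling step."""
    batch, used = [], 0
    # Decode requests contribute one token each and are scheduled first,
    # keeping per-token latency low for in-flight generations.
    for r in decode_reqs:
        if used >= TOKEN_BUDGET:
            break
        batch.append((r, 1))
        used += 1
    # The remaining budget is filled with chunks of pending prefills, so the
    # GPU does useful compute-bound work alongside the memory-bound decodes.
    for r in prefill_reqs:
        if used >= TOKEN_BUDGET:
            break
        chunk = min(TOKEN_BUDGET - used, r.remaining_prompt_tokens)
        batch.append((r, chunk))
        used += chunk
    return batch
```

The key design choice is the token budget: too small and prefill throughput suffers, too large and decode latency spikes, which is exactly the use-case-specific tuning the parent comment is describing.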
Well, a bigger context is important for understanding the codebase as a whole and for predicting the next piece of code or design decision.
Pretty good job
> The academic literature has recently caught on to this approach as well and we want to highlight the excellent papers on Sarathi (link 1, link 2) and DeepSpeed-FastGen (link). The academic literature calls the technique “chunked prefill”.
The Sarathi paper is more than a year old and the technique is already implemented in both vLLM and TRT-LLM. There doesn't seem to be much else of substance in the article. Edit: to be fair, it seems the article was written some time ago and only "updated" two weeks ago, so maybe it was more novel at the time.
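For anyone who wants to try it, something like the following enables it in vLLM (a sketch only: the model name is just an example, and the exact flags may differ across versions, since I believe newer releases turn chunked prefill on by default):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, swap in your own
    enable_chunked_prefill=True,               # split long prompts into chunks
    max_num_batched_tokens=2048,               # per-step budget shared by prefill chunks and decodes
)

outputs = llm.generate(
    ["Explain chunked prefill in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```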