Ollama and gguf

by indigodaddy on 8/11/2025, 5:54 PM with 85 comments

by tarruda on 8/11/2025, 9:05 PM

I recently discovered that ollama no longer uses llama.cpp as a library; instead, they link to the low-level library (ggml), which requires them to reinvent a lot of wheels for absolutely no benefit (if there's some benefit I'm missing, please let me know).

Even using llama.cpp as a library seems like overkill for most use cases. Ollama could make its life much easier by spawning llama-server as a subprocess listening on a unix socket and forwarding requests to it.
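A minimal sketch of that approach, using a localhost TCP port rather than a unix socket (I haven't checked whether llama-server can bind to one); the model path and port are placeholders. It spawns the server, polls its /health endpoint until the model has loaded, then forwards completion requests:

```python
import json
import subprocess
import time
import urllib.request

PORT = 8089

# Spawn llama-server as a child process; -m and --port are standard flags.
server = subprocess.Popen(
    ["llama-server", "-m", "model.gguf", "--host", "127.0.0.1", "--port", str(PORT)],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

def wait_until_ready(timeout: float = 60.0) -> None:
    # llama-server exposes /health; it returns 200 once the model is loaded.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"http://127.0.0.1:{PORT}/health") as r:
                if r.status == 200:
                    return
        except OSError:
            time.sleep(0.5)
    raise RuntimeError("llama-server did not become ready in time")

def completion(prompt: str) -> str:
    # Forward a request to llama-server's native /completion endpoint.
    body = json.dumps({"prompt": prompt, "n_predict": 64}).encode()
    req = urllib.request.Request(
        f"http://127.0.0.1:{PORT}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())["content"]

try:
    wait_until_ready()
    print(completion("The capital of France is"))
finally:
    server.terminate()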

One thing I'm curious about: does ollama support strict structured output or strict tool calls adhering to a JSON schema? It would be insane to rely on a server for agentic use unless that server can guarantee the model will only produce valid JSON. AFAIK this feature is implemented by llama.cpp, which they no longer use.
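For reference, llama.cpp enforces this by compiling a JSON schema into a GBNF grammar that constrains sampling, so the output is guaranteed to parse. A sketch against llama-server's /completion endpoint, reusing the hypothetical server above (the schema and prompt are made up):

```python
import json
import urllib.request

# Constrained decoding: the server compiles this schema to a grammar and
# only samples tokens that keep the output inside it.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

body = json.dumps({
    "prompt": "Extract: 'Ada Lovelace, 36 years old.' as JSON:",
    "n_predict": 128,
    "json_schema": schema,
}).encode()

req = urllib.request.Request(
    "http://127.0.0.1:8089/completion",
    data=body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    out = json.loads(r.read())["content"]

print(json.loads(out))  # parses, because sampling never left the grammar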

by indigodaddy on 8/11/2025, 5:55 PM

ggerganov explains the issue: https://github.com/ollama/ollama/issues/11714#issuecomment-3...

by llmthrowaway on 8/11/2025, 9:02 PM

Confusing title - I thought this was about Ollama finally supporting sharded GGUF (i.e. the Hugging Face default for GGUF files over 48 GB).

https://github.com/ollama/ollama/issues/5245

Sadly it is not, and the issue remains open after over a year, meaning ollama cannot run the latest SOTA open-source models unless they convert them to their proprietary format, which they do not consistently do.
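For context, a sharded GGUF is the same model split into numbered parts (model-00001-of-00003.gguf, and so on), and llama.cpp loads the whole set when pointed at the first shard. A small sketch (the helper name and regex are mine) that verifies a shard set is complete before launching:

```python
import re
from pathlib import Path

def check_shards(first_shard: Path) -> list[Path]:
    # Sharded GGUF names follow <base>-NNNNN-of-TTTTT.gguf.
    m = re.match(r"(.+)-(\d{5})-of-(\d{5})\.gguf$", first_shard.name)
    if not m:
        raise ValueError("not a sharded GGUF filename")
    base, total = m.group(1), int(m.group(3))
    shards = [
        first_shard.with_name(f"{base}-{i:05d}-of-{total:05d}.gguf")
        for i in range(1, total + 1)
    ]
    missing = [s for s in shards if not s.exists()]
    if missing:
        raise FileNotFoundError(f"missing shards: {missing}")
    return shards

# Usage: check_shards(Path("model-00001-of-00003.gguf")), then point
# llama-server at the first shard with -m; it picks up the rest.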

No surprise, I guess, given they've taken VC money, refuse to properly attribute their use of things like llama.cpp and ggml, have their own model format for... reasons, and have over 1800 open issues...

Llama-server, ramalama, or whatever model switcher ggerganov is working on (he showed previews recently) feel like the way forward.

by iamshrimpy on 8/12/2025, 1:07 AM

This title makes no sense and it links nowhere helpful.

It's "Ollama's forked ggml is incompatible with other gpt-oss GGUFs"

and it should link to GG's comment[0]

[0] https://github.com/ollama/ollama/issues/11714#issuecomment-3...

by clarionbell on 8/12/2025, 9:17 AM

Why is anyone still using this? You can spin up a llama.cpp server and have a more optimized runtime. And if you insist on containers, you can go for ramalama: https://ramalama.ai/

by dcreater on 8/11/2025, 7:00 PM

I think the title buries the lede? It's specific to GPT-OSS and exposes the shady stuff Ollama is doing to acquiesce to / curry favor with / partner with / get paid by corporate interests.

by am17an on 8/12/2025, 2:44 AM

There's a GitHub issue that has been open since last year about the missing license in ollama. They have not bothered to reply, which goes to show how much they care. Also, it's a YC company; I see more and more morally bankrupt companies making the cut recently. Why is that?

by 12345hn6789 on 8/11/2025, 9:40 PM

Just days ago, ollama devs claimed[0] that ollama no longer relies on ggml / llama.cpp. Here is their pull request (+165,966 −47,980) to reimplement (copy) llama.cpp code in their repository.

https://github.com/ollama/ollama/pull/11823

[0] https://news.ycombinator.com/item?id=44802414#44805396

by buyucu on 8/12/2025, 6:05 AM

ollama is a lost cause. they are going through a very aggressive phase of enshittification right now.

by diimdeep on 8/12/2025, 6:56 AM

ollama is certain to become a rent-seeking wrapper, from the folks of Docker fame.

classic Docker Hub playbook: spread the habit for free → capture workflows → charge for scale.

the moat isn't in inference speed; it's in controlling the distribution and default UX. once they own that, they can start rent-gating it.

by om8 on 8/11/2025, 11:22 PM

llama.cpp is a mess and ollama is right to move on from it

by sunnycoder5 on 8/12/2025, 4:29 AM

for folks wrestling with Ollama, llama.cpp, or local LLM versioning: did you guys check out Docker's new feature, Docker Model Runner?

Docker Model Runner makes it easy to manage, run, and deploy AI models using Docker. Designed for developers, Docker Model Runner streamlines the process of pulling, running, and serving large language models (LLMs) and other AI models directly from Docker Hub or any OCI-compliant registry.

Whether you're building generative AI applications, experimenting with machine learning workflows, or integrating AI into your software development lifecycle, Docker Model Runner provides a consistent, secure, and efficient way to work with AI models locally.
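As a rough illustration, Model Runner exposes an OpenAI-compatible API. The sketch below assumes host-side TCP access has been enabled on port 12434 in Docker Desktop and that a model has already been pulled (the model name is a placeholder):

```python
import json
import urllib.request

# Chat completion against Docker Model Runner's OpenAI-compatible endpoint.
# Endpoint and model name are assumptions; adjust to your setup.
body = json.dumps({
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}).encode()

req = urllib.request.Request(
    "http://localhost:12434/engines/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.loads(r.read())["choices"][0]["message"]["content"])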

For more details, check this out: https://docs.docker.com/ai/model-runner/