Ask HN: On-Device vs. Cloud Based LLMs

by zahirbmirza on 2/20/2026, 3:20 PM with 0 comments

According to Claude, "the underlying infrastructure uses shared compute resources — many users' requests are handled across a pool of GPUs"

What is the size of this pool, i.e. how many GPUs would it take for an individual user to run their own equivalent today? Let's assume the LLM is fully downloadable.
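For a rough sense of the answer, the dominant constraint is fitting the un-quantised weights in GPU memory. A minimal sketch, assuming FP16 weights (2 bytes/parameter), 80 GB of VRAM per GPU, and an illustrative parameter count (no hosted model's real size is implied):

```python
import math

def gpus_needed(params_billion: float, bytes_per_param: int = 2,
                vram_gb: int = 80, overhead: float = 1.2) -> int:
    """GPUs needed just to hold the weights, with ~20% headroom
    for KV cache and activations. All figures are assumptions."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes ~ GB
    return math.ceil(weights_gb * overhead / vram_gb)

# e.g. a hypothetical 400B-parameter model on 80 GB GPUs:
print(gpus_needed(400))  # 400 * 2 * 1.2 / 80 = 12 GPUs
```

This only covers serving one user's weights; a provider's shared pool is far larger because it batches many users' requests across replicas.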

I ask because, if LLMs stop improving exponentially, surely before long we will ALL be able to run un-quantised local LLMs of sufficient quality for day-to-day tasks.
