I couldn't find anything online, but I wondered what the chances would be if, much like SETI@home back in the day, people shared some of their GPU processing time (small slots here and there, when underutilized) to train an open-source, community-owned model. Basically a model by the people, for the people :)
I wonder how much longer we'll get to see high-quality open models shared freely; it's already quite a limited landscape.
Would there even be enough GPU power? Would something like this be doable with a reasonably realistic community effort? The human part of training could also be part of this effort, which is yet another thing a single company has to pay a lot of money for (well, I guess not anymore, since everyone now helps out for free by using the hosted LLMs).
Then, of course, there's the problem of the data: I'm not sure how far the data private companies have bought, stolen, ...found is from what's publicly available.
Curious if anyone more involved with the topic has a thought or two about it.
This has definitely been discussed. There have even been some projects, although I haven't checked on the status of any of them lately. As best I can recall, there are some specific structural reasons why it's hard to train LLMs this way, but I don't recall all the details offhand.
https://www.google.com/search?q=distributed+model+training+s...
https://news.ycombinator.com/item?id=35799843
https://www.reddit.com/r/ArtificialInteligence/comments/18q6...
https://www.reddit.com/r/slatestarcodex/comments/1gtnxgd/wha...
https://github.com/BOINC/boinc/wiki/Using-BOINC-for-AI
https://the-decoder.com/ai-startup-prime-intellect-trains-fi...
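One structural reason that usually comes up is communication: in ordinary data-parallel training, workers exchange gradients roughly every step, and home internet links are far slower than datacenter interconnects. Here is a rough back-of-envelope sketch of that gap. The model size, precision, and link speeds are hypothetical numbers picked for illustration, and the calculation ignores compression, overlap with compute, and the extra traffic a real all-reduce needs.

```python
def sync_seconds(num_params: float, bytes_per_param: int, link_mbit_per_s: float) -> float:
    """Seconds to send one full copy of the gradients over a single link."""
    payload_bits = num_params * bytes_per_param * 8
    return payload_bits / (link_mbit_per_s * 1e6)


# Hypothetical figures, for illustration only.
NUM_PARAMS = 7e9          # assumed 7B-parameter model
BYTES_PER_PARAM = 2       # assumed 16-bit gradients

home = sync_seconds(NUM_PARAMS, BYTES_PER_PARAM, 100)             # assumed 100 Mbit/s home uplink
datacenter = sync_seconds(NUM_PARAMS, BYTES_PER_PARAM, 400_000)   # assumed 400 Gbit/s interconnect

print(f"home uplink: {home:.0f} s per sync")
print(f"datacenter link: {datacenter:.2f} s per sync")
```

Under those assumptions a single gradient exchange takes minutes at home versus a fraction of a second in a datacenter, which is why volunteer-style efforts tend to look at ways of synchronizing much less often or sending much less data.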
I have been working on a special licence and a library project for years, and it is finally getting close to being fully released to the public. It comes with 3 new licences: the D.I.C Licence (pronounced "diss licence"), the D.I.C.K Licence (say "the D-Licence"), and the D-Code open sovereign licence, which has a special clause for AI bots and LLM training that forces any LLM trained on this subset of data to also be published under the D-Code open sovereign licence.
Read more at https://datapond.earth and https://dsafe.us
The project is not officially launched yet; in two weeks I will have much more information on how to join and grow this public-domain D-initiative.