I mean, it has many diagrams and logical explanations of goroutines and concurrency concepts in general, but it is definitely not an under-the-hood description.
It didn't really lift the hood at all, unfortunately. Luckily for us the runtime is extensively commented, e.g. https://github.com/golang/go/blob/master/src/runtime/proc.go
I love Go and goroutines, but...
> A newly minted goroutine is given a few kilobytes
a line later
> It is practical to create hundreds of thousands of goroutines in the same address space
So it's not practical to create 100s of Ks of goroutines. It's possible, sure, but the GBs of memory overhead you incur by actually creating that many goroutines mean that for any practical problem you are going to want to stick to a few thousand goroutines. I can almost guarantee you have something better to do with those GBs of memory than store goroutine stacks.
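The back-of-envelope arithmetic behind that claim is easy to sketch. Assuming the runtime's ~2 KiB initial stack (the current `stackMin` in the runtime source) and a more realistic grown stack of ~8 KiB, the stack memory alone for half a million goroutines works out to:

```go
package main

import "fmt"

// stackBytes estimates the memory consumed by goroutine stacks alone:
// number of goroutines times an assumed per-goroutine stack size.
func stackBytes(goroutines, stackSize int) int {
	return goroutines * stackSize
}

func main() {
	const n = 500_000
	// ~2 KiB initial stacks: just under 1 GiB
	fmt.Printf("%d goroutines x 2 KiB = %d MiB\n", n, stackBytes(n, 2<<10)>>20)
	// ~8 KiB grown stacks: nearly 4 GiB
	fmt.Printf("%d goroutines x 8 KiB = %d MiB\n", n, stackBytes(n, 8<<10)>>20)
}
```

And that is a lower bound: any goroutine doing real work grows its stack beyond the initial allocation, and each one also carries scheduler bookkeeping on top of the stack itself.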
Asking the scheduler to handle 100s of Ks of goroutines is not a great idea in my experience either.
While the blog post is a great introduction, https://www.youtube.com/watch?v=KBZlN0izeiY is a great watch if you're interested in the magical optimizations in goroutine scheduling.
I've been in love with Python's asyncio for some time now, but I know that Go has coroutines integrated as first-class citizens.
This article (which I have not read but just skimmed) made me search for a simple example, and I landed at "A Tour of Go - Goroutines"[0]
That is one of the cleanest examples I've ever seen on this topic, and it shows just how well integrated they are in the language.
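The Tour's example boils down to spawning a function with the `go` keyword and letting it interleave with `main`. A minimal sketch in the same spirit (my own channel-based variant for deterministic output, not the Tour's exact code):

```go
package main

import "fmt"

// greet sends a few messages on ch from its own goroutine, then closes it.
func greet(ch chan<- string) {
	for i := 0; i < 3; i++ {
		ch <- fmt.Sprintf("hello %d", i)
	}
	close(ch)
}

// collect receives everything greet sends. Ranging over the channel
// blocks until the goroutine closes it, so no explicit join is needed.
func collect() []string {
	ch := make(chan string)
	go greet(ch) // runs concurrently with the receiver below
	var msgs []string
	for msg := range ch {
		msgs = append(msgs, msg)
	}
	return msgs
}

func main() {
	for _, m := range collect() {
		fmt.Println(m)
	}
}
```

That's the whole integration story in a few lines: one keyword to spawn, and channels double as both communication and synchronization.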
In the conclusion the author states:
>"Go run-time scheduler multiplexes goroutines onto threads and when a thread blocks, the run-time moves the blocked goroutines to another runnable kernel thread to achieve the highest efficiency possible."
Why would the Go run-time move the blocked goroutines to another runnable kernel thread? If a goroutine is currently blocked, it won't be schedulable regardless, no?
So it's just coroutines on top of n:m scheduling, similar to what SysV offered a while ago?
(2020) But so high level it's still relevant.
One thing that really goes against my intuition is that user-space threads (lightweight threads, goroutines) are faster than kernel threads. Without knowing too much assembly, I would assume any modern processor would make a context switch a one-instruction affair: interrupt -> small scheduler code picks the thread to run -> LOAD THREAD instruction, and the processor swaps in all the registers and the instruction pointer.
You probably can't beat that in user space, especially if you want to preempt threads yourself. You'd have to check after every step, or profile your own process, or something like that. And indeed, Go's scheduler is largely cooperative (though since Go 1.14 it can also preempt goroutines asynchronously via signals).
But then, why can't you get the performance of Goroutines with OS threads? Is it just because of legacy issues? Or does it only work with cooperative threading, which requires language support?
One thing I'm missing from that article is how the cooperativeness is implemented. I think in Go (and in Java's Project Loom), you have "normal code", but deep down in the network and IO functions there are magic "yield" instructions. So all the layers above can pretend they are running on regular threads, you avoid the "colored function" problem, and you get runtime behavior similar to coroutines. This only works if every blocking IO call really is modified to include yielding behavior. If you call a blocking OS function directly, I assume something bad will happen.
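Those yield points can be made visible with `runtime.Gosched`, which is essentially an explicit version of the rescheduling opportunity the runtime gets around blocking operations. A sketch (pinned to `GOMAXPROCS(1)` so the interleaving is easy to see; the event-recording is my own scaffolding):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// run spawns two goroutines that each record three steps, explicitly
// yielding to the scheduler between steps with runtime.Gosched.
func run() []string {
	runtime.GOMAXPROCS(1) // one running OS thread, so yields decide the interleaving
	var (
		mu     sync.Mutex
		events []string
		wg     sync.WaitGroup
	)
	for _, name := range []string{"A", "B"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for i := 0; i < 3; i++ {
				mu.Lock()
				events = append(events, fmt.Sprintf("%s%d", name, i))
				mu.Unlock()
				runtime.Gosched() // explicit yield, like the implicit ones in blocking I/O
			}
		}(name)
	}
	wg.Wait()
	return events
}

func main() {
	fmt.Println(run())
}
```

As for blocking OS calls: in practice the Go runtime notices a goroutine stuck in a blocking syscall and hands that thread's work off to another OS thread, so the cost is an extra kernel thread rather than a stalled scheduler. A blocking call made through cgo behaves similarly.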