Discussed at the time:
Why Momentum Works - https://news.ycombinator.com/item?id=14034426 - April 2017 (95 comments)
Distill.pub has such high quality content consistently. It's a shame they don't seem to be active anymore.
While I'm interested in the topic of the post and have seen plenty of visualisations of balls rolling around hills, I was a little disappointed that it didn't cover the thing that has been bugging me for years.
Momentum, or specifically inertia, in physics, what the hell is it? There's a Feynman tale where he asked his father why the ball rolled to the back of a trolley when he pulled the trolley. The answer he received was the usual description of inertia, but also the rarely given insight that describing something and giving it a name is completely different from knowing why it happens.
It's one of those things that I lie in bed thinking about. The other one is position. I can grasp the notion of spacetime and the idea of movement and speed as changes in position in space relative to position in time. I really don't have a grasp of what position is, though. I know the name, and I can attach numbers to it, but that doesn't really cover what the numbers are numbers of.
Perhaps it is an elementary doubt, but does it all apply to rotational motion? Does a wheel rotating about its own axis continue to rotate in perpetuum, in the absence of friction, air resistance, etc.?
Only skimmed through the article for now, but have to give props to the author - it's beautifully made.
Geez. What a dithering article.
I was curious how well the simple momentum step-size approach shown in the first interactive example compares to alternative methods. The example function featured in the first interactive example is named bananaf ("Rosenbrok Function banana function"), defined as
The interactive example uses an initial guess of [-1.21, 0.853] and a fixed 150 iterations, with no convergence test. From manually fiddling with the (step-size) alpha & (momentum) beta parameters, and editing the code to specify a smaller number of iterations, it seems quite difficult to tune this momentum-based approach to get near the minimum and stay there without bouncing away in 50 iterations or fewer.
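For anyone who wants to fiddle with alpha and beta outside the browser, here is a minimal sketch of the fixed-step momentum iteration with a fixed iteration count and no convergence test. The objective below is a stand-in ill-conditioned quadratic that I made up, not the article's bananaf, and momentum_descent is a hypothetical helper name, so swap in your own function and gradient.

```python
def momentum_descent(grad, w0, alpha, beta, iterations=150):
    """Fixed-step momentum: z <- beta*z + grad(w); w <- w - alpha*z."""
    w = list(w0)
    z = [0.0] * len(w)
    for _ in range(iterations):  # fixed iteration count, no convergence test
        g = grad(w)
        z = [beta * zi + gi for zi, gi in zip(z, g)]
        w = [wi - alpha * zi for wi, zi in zip(w, z)]
    return w


# Stand-in objective (an assumption, NOT the article's bananaf):
# a quadratic whose curvature differs by a factor of 100 between axes.
def f(w):
    return 0.5 * (w[0] ** 2 + 100.0 * w[1] ** 2)


def grad_f(w):
    return [w[0], 100.0 * w[1]]


start = [-1.21, 0.853]
w_plain = momentum_descent(grad_f, start, alpha=0.015, beta=0.0)
w_momentum = momentum_descent(grad_f, start, alpha=0.015, beta=0.5)
print("beta=0.0:", w_plain, "f =", f(w_plain))
print("beta=0.5:", w_momentum, "f =", f(w_momentum))
```

Setting beta=0.0 reduces this to plain gradient descent, which makes it easy to see what the momentum term buys on a given function for the same alpha.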
Out of curiosity, I compared minimising this bananaf function with scipy.optimize.minimize, using the same initial guess.
If we force scipy.optimize.minimize to use method='cg', leaving all other parameters as defaults, it converges to the optimal solution of [1.0, 1./3.] after 43 evaluations of fx and dfx.
If we allow scipy.optimize.minimize to use all defaults, including the default method='bfgs', it converges to the optimal solution after only 34 evaluations of fx and dfx.
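A rough sketch of that kind of comparison, for anyone who wants to reproduce the shape of it. Note the assumption: this uses scipy's built-in rosen and rosen_der (the standard Rosenbrock function, minimum at [1, 1]) as a stand-in, not the article's bananaf, so the evaluation counts and the optimum will not match the numbers quoted above.

```python
from scipy.optimize import minimize, rosen, rosen_der

x0 = [-1.21, 0.853]  # same initial guess as the interactive example

results = {}
for method in ("CG", "BFGS"):
    # rosen / rosen_der are a stand-in objective and gradient (assumption)
    results[method] = minimize(rosen, x0, jac=rosen_der, method=method)
    r = results[method]
    print(method, r.x, "f evals:", r.nfev, "grad evals:", r.njev)
```

The nfev and njev fields on the result are the function and gradient evaluation counts, which is the fair thing to compare against a fixed iteration budget.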
Under the hood, scipy's method='cg' and method='bfgs' solvers do not use a fixed step size or momentum to determine the step size, but instead solve a line search problem. The line search problem is to identify a step size that satisfies a sufficient decrease condition and a curvature condition - see Wolfe conditions [1]. Scipy's default line search method -- used for cg and bfgs -- is a Python port [2] of the dcsrch routine from MINPACK2. A good reference covering line search methods & BFGS is Nocedal & Wright's 2006 book Numerical Optimization.
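To make the two conditions concrete, here is a small checker for the strong form of the Wolfe conditions on a trial step size. This is only an illustration, not scipy's dcsrch port: satisfies_strong_wolfe is a hypothetical helper, the c1 and c2 values are the usual textbook choices rather than anything taken from scipy, and the 1-D quadratic is a toy objective.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def satisfies_strong_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Return (sufficient_decrease, curvature) for step size alpha along p."""
    x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
    slope0 = dot(grad(x), p)         # directional derivative at the start
    slope_new = dot(grad(x_new), p)  # directional derivative at the trial point
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope0
    curvature = abs(slope_new) <= c2 * abs(slope0)
    return sufficient_decrease, curvature


# Toy objective (assumption): f(x) = x^2, starting at x = 1,
# searching along the steepest-descent direction p = -grad = -2.
def f(x):
    return x[0] ** 2


def grad(x):
    return [2.0 * x[0]]


x, p = [1.0], [-2.0]
for alpha in (0.01, 0.5, 1.0):
    print(alpha, satisfies_strong_wolfe(f, grad, x, p, alpha))
```

On this toy problem a tiny step passes sufficient decrease but fails the curvature test, an overly long step fails sufficient decrease, and the step that lands on the minimiser passes both. A line search hunts for a step in that acceptable band instead of relying on a hand-tuned alpha.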
[1] https://en.wikipedia.org/wiki/Wolfe_conditions
[2] https://github.com/scipy/scipy/blob/main/scipy/optimize/_dcs...