Author here! What a surprise. This was an abandoned project from 2019, that we never linked or advertised anywhere as far as I know. Anyways, happy to answer questions.
What frustrates me about Bayesian NNs is that talking about "priors" doesn't make nearly as much sense as it does in a regression context. A prior over parameter weights has no interpretation in the way that a prior over a regression coefficient, or even a spline smoothness, does. What you really want -- and what natural intelligence probably has -- are priors over aspects of the world.
Francois Chollet's paper on measuring intelligence was really informative for me on this front; the "priors" you should have about the world are not half-cauchys over certain hyperparameters or whatever, but priors about agent-ness, object-ness, goal-oriented-ness, and so on. How to encode that in a network...well, that's the real trick, right?
I like Bayesian inference for few-parameter models where I have solid grounds for choosing my priors. For neural networks, I like to ask people "what's your prior for ReLU versus LeakyReLU versus sigmoid?" and I've never gotten a convincing answer.
https://publications.aston.ac.uk/id/eprint/373/1/NCRG_94_004...
mixture density networks are quite interesting if you want probabilistic estimates of neural. here, your model learns to output and array of gaussian distribution coefficient distributions, and mixture weights.
these weights are specific to individual observations, and trained to maximise likelihood.
BNNs were an attractive choice in scenarios where the data is expensive to collect, like actual physical experiments. But boosting and other tree-based regression methods give you similar performance with a more straightforward framework for limited tabular data.
I like Bayes, but I thought the "surprising" result is that double descent is supposed to prevent nns from overfitting?
Bayesian Neural Networks just seem like a failed approach, unfortunately. For one, Bayesian inference and UQ fundamentally depends on the choice of the prior, but this is rarely discussed in the Bayesian NN literature and practice, and is further compounded by how fundamentally hard to interpret and choose these priors are (what is the intuition behind a NN's parameters?). Add to that the fact that the Bayesian inference is very much approximate, and you should see the trouble.
If you want UQ, 'frequentist nonparametric' approaches like Conformal Prediction and Calibration/Multi-Calibration methods seem to work quite well (especilly when combined with the standard ML machinery of taking a log-likelihood as your loss), and do not suffer from any of the issues above while also giving you formal guarantees of correctness. They are a strict improvement over Bayesian NNs, IMO.