## Where Bayes Falls Short

You're conflating naive Bayes with Bayesian statistics as a whole. In the example given, you're right that naive Bayes doesn't apply. But you can still write down a Bayesian regression model and apply Bayes' rule; you're just assuming a model. There's nothing special about deep learning here: it assumes a very complex and flexible model, but you can still frame it as a regression.
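To make "assume a model, then apply Bayes' rule" concrete, here is a minimal sketch of conjugate Bayesian linear regression on hand-picked nonlinear features. Everything in it is an assumption chosen for illustration, not something from the thread: the function name `bayes_linear_posterior`, the zero-mean Gaussian prior with precision `alpha`, the known noise precision `beta`, the quadratic feature map, and the synthetic data.

```python
import numpy as np

def bayes_linear_posterior(Phi, y, alpha=1.0, beta=25.0):
    """Return the Gaussian posterior N(m, S) over weights w for y = Phi @ w + noise.

    Illustrative assumptions: prior w ~ N(0, I / alpha), and Gaussian noise
    with known precision beta (beta = 1 / noise_variance).
    """
    d = Phi.shape[1]
    # Bayes' rule with a Gaussian prior and likelihood gives a Gaussian posterior.
    S_inv = alpha * np.eye(d) + beta * Phi.T @ Phi   # posterior precision
    S = np.linalg.inv(S_inv)                         # posterior covariance
    m = beta * S @ Phi.T @ y                         # posterior mean
    return m, S

# Synthetic data from a quadratic with noise standard deviation 0.2 (so beta = 25).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 0.5 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 0.2, size=200)

# "Assuming a model" here means choosing the features: 1, x, x^2.
Phi = np.column_stack([np.ones_like(x), x, x**2])
m, S = bayes_linear_posterior(Phi, y)

print("posterior mean:", m)
print("posterior std: ", np.sqrt(np.diag(S)))
```

Swapping the feature map for something richer changes the assumed model but not the use of Bayes' rule, which is the point being made above. The closed form only holds for this linear-Gaussian setup; a neural network in place of `Phi @ w` would need approximate inference instead.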

But isn't there a lot of arbitrariness in deep learning models?

We could train different models with different architectures and different hyperparameters and get different results, right? Why is this better than the "ad-hoc-ness" of the Bayesian approach?

I think it's important here to distinguish between normative ideals and practical algorithms. Bayes' rule is the normative ideal for prediction because it follows from the axioms of probability, and if your beliefs violate those axioms you can be Dutch-booked (offered a set of bets that each look fair to you but together guarantee a loss). Solomonoff's universal induction solves, in principle, the problem of where your initial prior comes from, although it is uncomputable. However, there are no practical algorithms that are explicitly Bayesian, can handle lots of nonlinear features, and still generalize well, so we resort to deep learning, which is a good enough approximation to Solomonoff induction. NB: https://www.inference.vc/everything-that-works-works-because-its-bayesian-2/
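For anyone unfamiliar with the Dutch book argument, here is a tiny arithmetic sketch. The numbers and the helper `net_payoff` are made up for illustration: someone states P(A) = 0.6 and P(not A) = 0.6, which sum to more than 1, and is willing to buy a $1 ticket on each outcome at those prices.

```python
def net_payoff(price_a, price_not_a, a_occurs):
    """Net payoff to a bettor who buys a $1 ticket on A and a $1 ticket on not-A.

    Each ticket pays $1 if its event happens and $0 otherwise; the prices are
    the bettor's own stated probabilities (an illustrative setup).
    """
    payout = (1.0 if a_occurs else 0.0) + (0.0 if a_occurs else 1.0)
    return payout - (price_a + price_not_a)

# Incoherent beliefs: P(A) + P(not A) = 1.2, violating the axioms of probability.
incoherent = [net_payoff(0.6, 0.6, a_occurs) for a_occurs in (True, False)]

# Coherent beliefs: P(A) + P(not A) = 1, so no guaranteed loss is possible here.
coherent = [net_payoff(0.6, 0.4, a_occurs) for a_occurs in (True, False)]

print("incoherent payoffs:", incoherent)  # negative whether or not A occurs
print("coherent payoffs:  ", coherent)
```

Whichever way A turns out, the incoherent bettor pays about $1.20 and gets back $1, so the loss is guaranteed. That is the sense in which violating the axioms is exploitable.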