## Deep Learning Is Better Than Linear Regression

A footnote mentions a code repository linked at the end of the essay, but I can't seem to find one. Can you provide a link?

I'm confused about the first graph (and some of the subsequent ones): how can 1 - R^2 ever be greater than 1? Are you actually plotting mean squared error instead?
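For anyone else puzzling over this, here is a minimal sketch of the arithmetic. It assumes the plots use the usual definition R^2 = 1 - SS_res / SS_tot, possibly evaluated on held-out data; the essay does not confirm this. The function name `one_minus_r2` and the toy numbers are made up for illustration. Under that definition, 1 - R^2 is the ratio SS_res / SS_tot, which goes above 1 whenever the predictions do worse than always guessing the mean of the targets.

```python
def one_minus_r2(y_true, y_pred):
    """Return 1 - R^2 = SS_res / SS_tot for two equal-length lists of numbers."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return ss_res / ss_tot


# Hypothetical toy data, not taken from the essay.
y_true = [1.0, 2.0, 3.0]
mean_pred = [2.0, 2.0, 2.0]   # always predict the mean of y_true
bad_pred = [3.0, 2.0, 1.0]    # predictions that are anti-correlated with y_true

print(one_minus_r2(y_true, mean_pred))  # 1.0: no better than the mean
print(one_minus_r2(y_true, bad_pred))   # 4.0: worse than the mean, so R^2 is negative
```

So a value above 1 would not by itself mean the plot shows mean squared error; it could also be a negative R^2.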

As you mentioned, one of the issues with deep learning is interpretability. In predictive modelling or similar settings, is there any way to determine what features a deep learning model has actually learned?

For example, in the regression example where you engineered new features yourself, you could determine which of those features were 'important' or statistically significant. You also mention that DL is better than me at _finding_ good features, but is there any way it can tell me what the features it has learned are?

After gradient descent "memorizes the training set", why would it then move to a more elegant explanation as described here? It seems this would not improve the loss function over the training set, which is what gradient descent is optimizing.