I am a fan of simple and concise explanations. When it comes to Deep Learning and Machine Learning, it is often very useful to understand what a concept really is from a higher level before diving in deeper. This post will give you a higher-level understanding of two very important concepts: bias and variance.
If a model's predictions change drastically on unfamiliar data points, the model varies too much between datasets. We want our model's predictions to be correct across multiple datasets, even if the underlying properties of those datasets are different. What we basically want is a model that can generalize across all the data on the planet. That means we want low variance: the model's predictions should not vary drastically on unfamiliar datasets. If they do, the model has memorized the specifics of the dataset it was trained on. This is called high variance.
On the other hand, the model can perform badly even on the dataset it was trained on. This means the model has high bias: it makes incorrect predictions because it has learned a model that is too simple for the complexity of the dataset. High bias happens because the model is oversimplified and not expressive enough for the given task.
High Bias = Underfitting = Model is too simple
High Variance = Overfitting = Model is too complex
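A quick way to see both failure modes is to fit polynomials of different degrees to a small noisy dataset. The sketch below is my own illustrative example (the sine data, noise level, and degrees are assumptions, not from any particular experiment): a degree that is too low underfits (high bias), while a degree that is too high memorizes the training points and fails on held-out data (high variance).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Noisy samples of a sine curve; the held-out set plays the role of
    # "unfamiliar" data the model was not trained on.
    x = rng.uniform(0, 2 * np.pi, n)
    return x, np.sin(x) + rng.normal(0, 0.2, n)

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

def train_and_test_mse(degree):
    # Fit a polynomial of the given degree on the training set only.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in (1, 4, 15):
    train_mse, test_mse = train_and_test_mse(degree)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# degree 1:  both errors high          -> too simple, high bias (underfitting)
# degree 4:  both errors low           -> a reasonable fit
# degree 15: train error tiny, test high -> memorized the training data,
#            high variance (overfitting)
```

The exact numbers depend on the random seed, but the pattern, training error shrinking while held-out error grows as the model gets more complex, is exactly the high-variance behaviour described above.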
High bias solutions
- Bigger, more complex network
- Train for longer
High variance solutions
- Get more training data
- Regularization (both lists of fixes are sketched in code below)
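As a rough illustration of where these fixes live in code, here is a minimal Keras sketch. It is my own example, not a prescribed recipe: the layer sizes, regularization strengths, feature count, and the `x_train` / `x_dev` variables in the commented-out calls are all assumptions.

```python
import tensorflow as tf

def build_model(hidden_units=64, n_layers=2, l2_strength=0.0, dropout_rate=0.0):
    # Small fully connected binary classifier; 20 input features is an
    # arbitrary assumption for the sake of the example.
    layers = [tf.keras.Input(shape=(20,))]
    for _ in range(n_layers):
        layers.append(tf.keras.layers.Dense(
            hidden_units, activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(l2_strength)))
        if dropout_rate > 0:
            layers.append(tf.keras.layers.Dropout(dropout_rate))
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# High bias fixes: a bigger, more complex network and more training epochs.
bigger_model = build_model(hidden_units=256, n_layers=4)
# bigger_model.fit(x_train, y_train, epochs=200, validation_data=(x_dev, y_dev))

# High variance fixes: regularization (L2 weight decay, dropout) and,
# above all, more training data (x_train_big / y_train_big are hypothetical).
regularized_model = build_model(l2_strength=1e-4, dropout_rate=0.3)
# regularized_model.fit(x_train_big, y_train_big, epochs=50,
#                       validation_data=(x_dev, y_dev))
```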
The bias-variance tradeoff is not a big problem in Deep Learning because there are tools to drive down just the bias or just the variance without hurting the other one.
Training a bigger network will drive down bias without affecting the variance too much.
Getting more data will drive down variance without affecting bias too much.
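In practice, the way to know which knob to turn is the comparison described above: bad performance on the training data itself points to high bias, while a large gap between training data and unfamiliar data points to high variance. A tiny sketch of that decision, where the target error and gap tolerance are arbitrary illustrative thresholds:

```python
def diagnose(train_error, dev_error, target_error=0.02, gap_tolerance=0.05):
    # Thresholds are illustrative; in practice they depend on the task and
    # on what error a human (or the best known system) can achieve.
    if train_error - target_error > gap_tolerance:
        return "high bias: try a bigger network or train for longer"
    if dev_error - train_error > gap_tolerance:
        return "high variance: get more training data or add regularization"
    return "bias and variance both look fine"

print(diagnose(train_error=0.15, dev_error=0.16))  # underfitting -> high bias
print(diagnose(train_error=0.01, dev_error=0.12))  # overfitting  -> high variance
```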