When would you want to use L1 regularization as opposed to L2 regularization?

The main difference between L1 and L2 regularization is that L1 can yield sparse models, while L2 does not. Sparsity is a valuable property when dealing with high-dimensional data, for at least two reasons:

- Model compression: a sparse model has fewer non-zero coefficients to store and evaluate, which is increasingly important as more models run on mobile devices.
- Feature selection: it helps to know which features are important and which are unimportant or redundant.
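
The sparsity claim can be checked on a small synthetic problem. The sketch below is illustrative only: the data, the regularization strength `lam`, and the helper names `ridge_fit` and `lasso_fit` are assumptions made for this example, and the Lasso solver is a plain proximal-gradient (ISTA) loop rather than any particular library's implementation. Only the first 3 of 10 features influence the target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 10 features, only the first 3 matter.
n_samples, n_features = 200, 10
X = rng.standard_normal((n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [3.0, -2.0, 1.5]
y = X @ true_w + 0.1 * rng.standard_normal(n_samples)

lam = 0.1  # regularization strength (illustrative value)


def ridge_fit(X, y, lam):
    """Minimize (1/2n)||y - Xw||^2 + (lam/2)||w||_2^2 in closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def lasso_fit(X, y, lam, n_iter=1000):
    """Minimize (1/2n)||y - Xw||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    w = np.zeros(p)
    for _ in range(n_iter):
        # Gradient step on the squared-error term.
        z = w - step * (X.T @ (X @ w - y) / n)
        # Soft-thresholding: the proximal operator of the L1 penalty.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w


w_ridge = ridge_fit(X, y, lam)
w_lasso = lasso_fit(X, y, lam)

print("non-zero ridge coefficients:", np.count_nonzero(w_ridge))
print("non-zero lasso coefficients:", np.count_nonzero(w_lasso))
print("features kept by lasso:", np.flatnonzero(w_lasso))
```

The ridge fit keeps every coefficient non-zero, whereas the soft-thresholding step lets the L1 fit set coefficients exactly to zero, which is what makes it usable for feature selection and compression.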

The differences between L1 and L2 regularization are as follows:

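- L1 corresponds to a Laplace prior on the coefficients and L2 to a Gaussian prior. The Laplace prior has a sharper peak at zero and heavier tails, so L1 tolerates both very small and large coefficient values more readily than L2.
- L1 can yield sparse models, while L2 does not.
- Both L1 and L2 regularization prevent overfitting by shrinking the coefficients.
- L2 (Ridge) shrinks all the coefficients but eliminates none (with orthonormal features, every coefficient shrinks by the same proportion), while L1 (Lasso) can shrink some coefficients exactly to zero, performing variable selection.
- The L1 norm is the sum of absolute values, so the L1 distance between two points is sum_i |x1_i - x2_i|, which is simply the absolute distance |x1 - x2| in one dimension. The L2 norm corresponds to Euclidean distance, sqrt(sum_i (x1_i - x2_i)^2). As penalties, L1 adds the sum of the absolute coefficient values to the loss, while L2 usually adds the squared L2 norm, that is, the sum of the squared coefficients.
- L2 regularization tends to spread error among all the terms, leaving many small non-zero coefficients, while L1 is more binary/sparse: a coefficient is typically either retained or set exactly to zero.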
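
The contrast between proportional shrinkage and shrinkage to zero is easiest to see in the special case of orthonormal features, where both penalties have a per-coefficient closed form. The sketch below assumes that special case and the objectives (1/2)||y - Xw||^2 + (lam/2)||w||_2^2 for Ridge and (1/2)||y - Xw||^2 + lam*||w||_1 for Lasso; the function names and the example values are hypothetical, chosen only for illustration.

```python
def ridge_shrink(w_ols, lam):
    """Ridge under an orthonormal design: scale every coefficient by 1 / (1 + lam)."""
    return [w / (1.0 + lam) for w in w_ols]


def soft_threshold(w, lam):
    """Move w toward zero by lam, and clamp to exactly zero if |w| <= lam."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def lasso_shrink(w_ols, lam):
    """Lasso under an orthonormal design: soft-threshold every coefficient at lam."""
    return [soft_threshold(w, lam) for w in w_ols]


w_ols = [4.0, -2.0, 0.5, -0.25]  # unregularized least-squares coefficients (made up)
lam = 1.0

w_ridge = ridge_shrink(w_ols, lam)
w_lasso = lasso_shrink(w_ols, lam)

print(w_ridge)  # [2.0, -1.0, 0.25, -0.125]
print(w_lasso)  # [3.0, -1.0, 0.0, 0.0]
```

Ridge halves every coefficient here and eliminates none, while Lasso subtracts a fixed amount from each magnitude and removes the two coefficients whose magnitude is below `lam`. With correlated features these closed forms no longer hold exactly, but the qualitative behavior is the same.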

