L1 & L2 regularization
L1 (Lasso) and L2 (Ridge) regularization address overfitting by penalizing large weights in the model.
- L1 Regularization (Lasso): Adds the sum of the absolute values of all weights to the loss function.
  - Encourages sparsity: some weights can be driven exactly to zero, effectively removing unimportant features.
  - Preferred when feature selection is desired.
- L2 Regularization (Ridge): Adds the sum of the squared values of all weights to the loss function.
  - Discourages excessively large weights but generally doesn't drive them to zero, resulting in a smoother weight distribution.
  - More stable and efficient; a common default choice.
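To make the two penalties concrete, here is a minimal sketch in plain Python; the weight values, base loss, and strength `lam` are all hypothetical:

```python
# Sketch: how L1 and L2 penalties modify a loss value.
weights = [0.5, -1.5, 0.0, 2.0]  # hypothetical model weights
base_loss = 1.0                  # hypothetical data loss
lam = 0.1                        # hypothetical regularization strength

l1_penalty = lam * sum(abs(w) for w in weights)  # lam * sum of |w|
l2_penalty = lam * sum(w * w for w in weights)   # lam * sum of w^2

loss_l1 = base_loss + l1_penalty  # Lasso-style total loss
loss_l2 = base_loss + l2_penalty  # Ridge-style total loss
```

Note how the zero weight contributes nothing to either penalty, while the large weight (2.0) dominates the L2 penalty because it is squared.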
Implementing in PyTorch
Using weight_decay
Some PyTorch optimizers, such as SGD and Adam, implement L2 regularization internally; the weight_decay parameter controls the strength of the L2 penalty.
A higher value of weight_decay leads to stronger regularization.