Most of the regression problems I have dealt with focused on minimizing the L2 norm of the difference between a model's predictions and the true values. However, from a business point of view, it may make more sense to minimize the L1 norm directly.
Indeed, a least squares approach mainly penalizes large errors. But if the cost of an error is constant (once again, from the business perspective), then the L1 penalty makes just as much sense.
Note, however, that the resulting estimators can be vastly different: for a constant model, the optimal values differ, one being the mean of the observations, the other the median.
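As a quick numerical illustration (my own sketch, not part of the original argument): minimizing the squared error of a constant model over a grid of candidate values recovers the sample mean, while minimizing the absolute error recovers the median.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(size=101)  # skewed sample, so mean and median differ

# Evaluate both losses for every candidate constant on a fine grid
grid = np.linspace(y.min(), y.max(), 20001)
l2_loss = ((y[None, :] - grid[:, None]) ** 2).sum(axis=1)
l1_loss = np.abs(y[None, :] - grid[:, None]).sum(axis=1)

print(grid[np.argmin(l2_loss)], y.mean())     # L2 minimizer matches the mean
print(grid[np.argmin(l1_loss)], np.median(y)) # L1 minimizer matches the median
```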
So far, so good. However, things become complex quite quickly: a theoretical advantage of the L2 penalty is that it is differentiable, and we enjoy closed formulas for many problems relying on L2 penalties.
I will focus on implementing the univariate case, without intercept. Including an intercept, or handling the multivariate case, requires a (much) more complex optimization.
Calling $f(b) = \sum_i |y_i - b x_i|$ the total penalty, we note that $f$ is a convex function, whose derivative is not continuous.

Its derivative, where it is defined, is:

$$f'(b) = -\sum_i x_i \,\operatorname{sign}(y_i - b x_i)$$

Given that $f$ is convex, $f'$ must be monotonic (non-decreasing). Besides, $f'(b) = -\sum_i x_i < 0$ for $b < \min_i y_i/x_i$ and $f'(b) = \sum_i x_i > 0$ for $b > \max_i y_i/x_i$ (assuming the $x_i$ are positive).

Therefore, we will look for a zero of $f'$ using dichotomy on the interval $[\min_i y_i/x_i, \max_i y_i/x_i]$.
As detailed above, once we have the penalty's derivative and the dichotomy algorithm implemented, there is nothing else to do:
```python
import numpy as np


def dichotomy(n, f, l, r):
    """Perform n bisection steps on [l, r], assuming f is non-decreasing."""
    c = (l + r) / 2
    if n == 0:
        return c
    if f(c) < 0:
        return dichotomy(n - 1, f, c, r)
    return dichotomy(n - 1, f, l, c)


def penalty(x, y, b):
    """Derivative of the L1 loss b -> sum_i |y_i - b * x_i|."""
    return -np.sum(x * np.sign(y - b * x))


class L1Regressor:
    def __init__(self, n_iterations=20):
        self.n_iterations = n_iterations
        self.b = None

    def fit(self, x, y):
        ratios = y / x
        l, r = np.min(ratios), np.max(ratios)
        self.b = dichotomy(self.n_iterations, lambda b: penalty(x, y, b), l, r)
        return self
```
If we append the following:
```python
if __name__ == "__main__":
    import matplotlib.pyplot as plt

    x = np.random.rand(100)
    y = x * 0.5 + 0.1 * np.random.normal(size=100)
    slope = L1Regressor().fit(x, y).b
    plt.scatter(x, y)
    plt.plot(x, x * slope, c="r")
    plt.savefig("./illustration.svg")
```
Comparison with L2
We can now add some outliers to the observations; we expect the regression with the L1 penalty to be more robust to them.
Below are plotted the two slopes: in red, the one minimizing the L1 penalty; in green, the L2 penalty.
```python
from sklearn.linear_model import LinearRegression

x = np.random.rand(100)
y = x * 0.5 + 0.1 * np.random.normal(size=100)

# Turn the first 10 observations into outliers
x[:10] = 0.9 + 0.1 * np.random.rand(10)
y[:10] = 2 + x[:10] * 0.5 + np.random.normal(size=10)

slope = L1Regressor().fit(x, y).b
slope_l2 = LinearRegression(fit_intercept=False).fit(x.reshape(-1, 1), y).coef_[0]

plt.scatter(x, y)
plt.plot(x, x * slope, c="r")
plt.plot(x, x * slope_l2, c="g")
plt.savefig("./illustration_noise.svg")
```
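As a sanity check (my addition, not part of the original post), the bisection estimate can be compared against a brute-force grid minimization of the L1 loss itself. The block below is self-contained, so it restates the dichotomy routine; the data-generation step mirrors the examples above, except that the $x_i$ are kept away from zero so the ratios $y_i/x_i$ stay bounded.

```python
import numpy as np


def dichotomy(n, f, l, r):
    # n bisection steps on [l, r], assuming f is non-decreasing
    c = (l + r) / 2
    if n == 0:
        return c
    if f(c) < 0:
        return dichotomy(n - 1, f, c, r)
    return dichotomy(n - 1, f, l, c)


rng = np.random.default_rng(0)
x = 0.5 + rng.random(100)  # bounded away from zero
y = x * 0.5 + 0.1 * rng.normal(size=100)

ratios = y / x
deriv = lambda b: -np.sum(x * np.sign(y - b * x))
b_bisect = dichotomy(50, deriv, np.min(ratios), np.max(ratios))

# Brute force: evaluate the L1 loss on a fine grid of candidate slopes
grid = np.linspace(np.min(ratios), np.max(ratios), 100001)
losses = np.abs(y[None, :] - grid[:, None] * x[None, :]).sum(axis=1)
b_grid = grid[np.argmin(losses)]

print(b_bisect, b_grid)  # the two estimates should agree closely
```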