Iterative Loss Minimization with $\ell_1$-Norm Constraint and Guarantees on Sparsity

We study the problem of minimizing the loss of a linear predictor subject to a constraint on the $\ell_1$ norm of the predictor. We describe a forward greedy selection algorithm for this task and analyze its rate of convergence. As a direct corollary of our convergence analysis we obtain a bound on the sparsity of the predictor as a function of the desired optimization accuracy, the bound on the $\ell_1$ norm, and the Lipschitz constant of the loss function.

1 Outline of main results

We consider the problem of searching for a linear predictor with low loss and low $\ell_1$ norm. Formally, let $X$ be an instance space, $Y$ be a target space, and $D$ be a distribution over $X \times Y$. Our goal is to approximately solve the following optimization problem
$$
\min_{w} \; \mathbb{E}_{(x,y)\sim D}\left[L(\langle w, x\rangle, y)\right] \quad \text{s.t.} \quad \|w\|_1 \le B , \qquad (1)
$$
where $L : \mathbb{R} \times Y \to \mathbb{R}$ is a loss function. Furthermore, we would like to find an approximate solution to Eq. (1) that is also sparse, namely, one for which $\|w\|_0 = |\{i : w_i \ne 0\}|$ is small.

We describe an iterative algorithm for solving Eq. (1) that alters a single element of $w$ at each iteration (a schematic instantiation is sketched at the end of this section). Assuming that $L$ is convex and $\lambda$-Lipschitz with respect to its first argument, we prove that after performing $T$ iterations the algorithm finds a solution with accuracy $O(\lambda B/\sqrt{T})$. Our analysis therefore implies, via the short calculation sketched below, that for any accuracy $\epsilon > 0$ we can find $w$ such that

• $\|w\|_0 = O\big((\lambda B/\epsilon)^2\big)$
• For all $w^\star$ with $\|w^\star\|_1 \le B$ we have $\mathbb{E}[L(\langle w,x\rangle, y)] \le \mathbb{E}[L(\langle w^\star,x\rangle, y)] + \epsilon$

In a separate technical report, we show that this relation between $\|w\|_0$, $B$, and $\epsilon$ is tight.
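The sparsity guarantee in the first bullet follows from the convergence rate by a short calculation, sketched here from the two facts stated above: each iteration alters a single element of $w$, so the iterate after $T$ iterations satisfies $\|w\|_0 \le T$, and it suffices to choose $T$ large enough that the accuracy bound drops below $\epsilon$:
$$
O\!\left(\frac{\lambda B}{\sqrt{T}}\right) \le \epsilon
\;\;\Longrightarrow\;\;
T = O\!\left(\left(\frac{\lambda B}{\epsilon}\right)^{2}\right)
\;\;\Longrightarrow\;\;
\|w\|_0 \le T = O\!\left(\left(\frac{\lambda B}{\epsilon}\right)^{2}\right).
$$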
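For concreteness, the following Python sketch shows one plausible instantiation of a forward greedy selection scheme over the $\ell_1$ ball: a Frank-Wolfe-style update that moves toward a single signed coordinate of magnitude $B$, so each iteration introduces at most one new nonzero coordinate. This is only an illustration under stated assumptions, not the exact update rule analyzed above; the function names, the step size $\eta_t = 2/(t+2)$, and the use of an empirical (sub)gradient are assumptions made for the sketch.

```python
import numpy as np

def forward_greedy_l1(X, y, loss_grad, B, T):
    """Illustrative sketch of forward greedy selection for
    min_w E[L(<w,x>, y)]  s.t.  ||w||_1 <= B.

    Frank-Wolfe-style variant: each iteration moves toward a vertex of the
    l1 ball (a single signed coordinate of magnitude B), so at most one new
    nonzero coordinate is introduced per iteration and ||w||_0 <= T.

    loss_grad(p, y) should return a (sub)gradient of L with respect to the
    prediction p = <w, x> (L assumed convex and Lipschitz in p).
    """
    n, d = X.shape
    w = np.zeros(d)
    for t in range(T):
        preds = X @ w
        # (Sub)gradient of the empirical loss with respect to w.
        g = X.T @ loss_grad(preds, y) / n
        # Greedy choice: coordinate with the largest-magnitude gradient.
        i = int(np.argmax(np.abs(g)))
        s = np.zeros(d)
        s[i] = -B * np.sign(g[i])
        # Convex combination keeps w inside the l1 ball of radius B.
        eta = 2.0 / (t + 2.0)
        w = (1.0 - eta) * w + eta * s
    return w

# Example usage with the absolute loss L(p, y) = |p - y|, which is
# 1-Lipschitz in its first argument (lambda = 1).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
    w = forward_greedy_l1(X, y, lambda p, y: np.sign(p - y), B=4.0, T=30)
    print("nonzero coordinates:", np.count_nonzero(w))
```

In this sketch the returned predictor has at most $T$ nonzero coordinates by construction, which is the mechanism behind the sparsity bound discussed above; the precise accuracy guarantee depends on the analysis in the paper, not on this particular implementation.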