YES VII: “Stochastic Optimization and Online Learning”
Mar 26, 2014 – Mar 28, 2014
Summary
Stochastic optimization embodies a collection of methodologies and theory that aim at devising optimal solutions to countless real-world inference problems, particularly when these involve uncertain or missing data. At the heart of stochastic optimization is the idea that many deterministic optimization problems can be addressed in a more powerful and convenient way by introducing intrinsic randomness into the optimization algorithms. Furthermore, this generalization gives rise to a set of techniques that are well suited for settings involving uncertain, incomplete, or missing data.
Online (machine) learning is concerned with learning and prediction in a sequential, online fashion. In many settings the goal of online learning is the optimal prediction of a sequence of instances, possibly given a sequence of side-information. For example, the instances might correspond to the daily value of a financial asset or the daily meteorological conditions, and one wants to predict tomorrow’s value of the asset or weather conditions. Interestingly, it is possible to devise very powerful online learning algorithms able to cope with adversarial settings, meaning that a powerful adversary can generate a sequence of instances that attempts to “break” the algorithm’s strategy. However, one can show that these algorithms must necessarily incorporate a randomization of the predictions, and they can often be cast as stochastic optimization algorithms. One of the goals of this workshop is to make such connections between online learning and stochastic optimization more transparent.
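The randomized prediction against an adversary mentioned above can be made concrete with the classical exponential-weights (Hedge) forecaster for prediction with expert advice. The following is a minimal sketch: the algorithm is standard, but the function name and the expected-loss bookkeeping are illustrative choices of ours, not drawn from the workshop material.

```python
import math

def hedge(expert_losses, eta=0.5):
    """Exponential-weights (Hedge) forecaster for prediction with expert advice.

    At each round the forecaster predicts by drawing an expert with
    probability proportional to exp(-eta * cumulative loss of that expert);
    here we track the *expected* loss of that randomized prediction.
    expert_losses is a T x K sequence of loss vectors in [0, 1], revealed
    one round at a time (possibly by an adversary). Returns the
    forecaster's expected cumulative loss over the T rounds.
    """
    k = len(expert_losses[0])
    weights = [1.0] * k          # one weight per expert, uniform at the start
    total = 0.0
    for losses in expert_losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        # expected loss of the randomized prediction this round
        total += sum(p * l for p, l in zip(probs, losses))
        # multiplicative update: downweight experts that suffered loss
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total
```

Even against an adversarial loss sequence, the expected cumulative loss of this forecaster exceeds that of the best single expert by at most ln(K)/eta + eta*T/8, which is the kind of regret guarantee discussed in the online learning literature.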
Particularly important in the present workshop is the quantification of the performance of a given stochastic optimization or online learning procedure. Answering this challenging question adequately requires several ingredients; in particular, one needs to develop a proper optimality framework. Here parallels with modern statistical theory emerge, as notions such as consistency, convergence rates, and minimax bounds, common in statistical theory, all have counterparts in stochastic optimization and online learning. There is therefore plenty of room for cross-fertilization between all these fields, which is the main motivation for this workshop.
The aim of the workshop “Stochastic Optimization and Online Learning” is to introduce these broad fields to young researchers, in particular Ph.D. students, postdoctoral fellows and junior researchers, who are interested and eager to tackle new challenges in the fields of stochastic optimization and online learning.
Sponsors
Organizers
Rui Castro (TU Eindhoven)
Eduard Belitser (VU Amsterdam)
Tutorial Speakers
Nicolò Cesa-Bianchi (Università degli Studi di Milano)
Francis Bach (INRIA – Laboratoire d’Informatique de l’École Normale Supérieure, Paris)
Anatoli Juditsky (Laboratoire Jean Kuntzmann, Université Joseph Fourier, Grenoble)
Programme
Abstracts
Francis Bach
Beyond stochastic gradient descent for large-scale machine learning
Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data: there are many observations (“large n”) and each of them is high-dimensional (“large p”). In this setting, online algorithms such as stochastic gradient descent, which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rate of these algorithms is O(1/√n) for general convex functions and O(1/n) for strongly convex functions.
In this tutorial, I will first present the classical results in stochastic approximation and relate them to classical optimization and statistics results. I will then show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of O(1/n) without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent. (Joint work with Nicolas Le Roux, Eric Moulines and Mark Schmidt.)
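As a toy illustration of the stochastic approximation setting (not of the Newton-based or finite-data algorithms from the talk), here is a sketch of plain SGD with Polyak–Ruppert averaging on a one-dimensional strongly convex problem; the objective, step-size schedule, and names are illustrative assumptions of ours.

```python
import random

def averaged_sgd(sample, x0=0.0, n=20000, seed=0):
    """Minimize f(x) = E[(x - Z)^2]/2 from a stream of samples of Z using
    stochastic gradient descent with Polyak-Ruppert averaging.

    The stochastic gradient at step t is (x - z_t), an unbiased estimate of
    f'(x) = x - E[Z]. The step size decays like 1/sqrt(t), as in the general
    convex case; averaging the iterates then recovers faster convergence on
    this strongly convex problem. Returns the averaged iterate.
    """
    rng = random.Random(seed)
    x = x0
    avg = 0.0
    for t in range(1, n + 1):
        z = sample(rng)
        g = x - z                # unbiased stochastic gradient at x
        x -= g / t ** 0.5        # SGD step with step size 1/sqrt(t)
        avg += (x - avg) / t     # running Polyak-Ruppert average of iterates
    return avg

# Z ~ Normal(mean=2, sd=1); the minimizer of E[(x - Z)^2]/2 is E[Z] = 2
est = averaged_sgd(lambda rng: rng.gauss(2.0, 1.0))
```

The averaged iterate `est` lands close to the true minimizer 2, while the raw final iterate keeps fluctuating at the scale of the step size; this gap is exactly the benefit of averaging that the tutorial's rate results formalize.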
Anatoli Juditsky
Deterministic and stochastic first-order algorithms for large-scale convex optimization
Syllabus:
1. First-Order Methods of proximal type. Nonsmooth Black-Box setting: deterministic and stochastic Mirror Descent algorithm (MD). Convex-Concave Saddle Point problems via MD.
2. Utilizing the problem’s structure: the Mirror Prox algorithm (MP). Smooth/Bilinear Saddle Point reformulations of convex problems: calculus and examples. Favorable-geometry domains and good proximal setups.
3. Conditional Gradient-type First-Order Methods for problems with difficult geometry: convex problems with difficult geometry; smooth minimization; norm-regularized smooth minimization; nonsmooth minimization.
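As a small illustration of item 1, the following sketches Mirror Descent with the entropy prox-function on the probability simplex, where the mirror step becomes a multiplicative (exponentiated-gradient) update. The objective, step size, and names below are illustrative choices of ours, not taken from the tutorial.

```python
import math

def mirror_descent_simplex(grad, d, steps, eta):
    """Mirror Descent on the probability simplex with the entropy
    prox-function. grad(x) returns a (sub)gradient of the objective at x.
    With this setup the mirror step is the multiplicative update
    x_i <- x_i * exp(-eta * g_i), followed by renormalization.
    Returns the average iterate (the standard MD output).
    """
    x = [1.0 / d] * d            # start at the uniform distribution
    avg = [0.0] * d
    for t in range(1, steps + 1):
        g = grad(x)
        # entropy mirror step = exponentiated-gradient update
        x = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        z = sum(x)
        x = [xi / z for xi in x]  # project back to the simplex (normalize)
        # running average of the iterates
        avg = [a + (xi - a) / t for a, xi in zip(avg, x)]
    return avg
```

For a linear objective ⟨c, x⟩ over the simplex, the average iterate concentrates on the coordinate with the smallest cost, which is the unique minimizer; the entropy setup is the textbook example of a “good proximal setup” for simplex-constrained problems, since its guarantees scale only logarithmically with the dimension.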