Chih-Jen Lin

National Taiwan University

Optimization plays an important role in many machine learning methods, yet the two areas have very different focuses. As a result, some machine learning tasks do not make suitable use of optimization techniques, while optimization researchers may focus on irrelevant issues when applying their methodology to machine learning. In this course I will discuss my past experience in developing optimization methods for kernel methods, linear classification, and matrix factorization. We will discuss how to incorporate properties of machine learning problems when designing useful optimization methods.

The new wave of applications of neural networks to textual problems; word vectors; applications of word vectors; sentence and document representations; recurrent neural network models; applications.


Laura Palagi

Sapienza University of Rome

Online learners incorporate the information from observed training data into the model through incremental updates, without needing the entire training data set. This approach is particularly well suited to large-scale machine learning, a setting in which online learning may offer significant computational advantages over batch learning algorithms. Nevertheless, like any model in machine learning, online update models must balance conflicting requirements: they must be rich enough to capture the characteristics of the training data, yet avoid overfitting when only part of the training data is available. We review theoretical and practical aspects of incremental gradient and stochastic gradient methods, together with recent advances and trends specific to online machine learning. Progress towards distributed and parallel computing is also considered.
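As a rough illustration of the incremental updates described in this abstract, the following minimal sketch applies one stochastic gradient step per observed example to a linear least-squares model. It is not taken from the course material: the function names `online_sgd` and `make_stream`, the step size, and the synthetic noise-free data are all assumptions chosen for illustration.

```python
import random


def online_sgd(stream, dim, lr=0.05):
    """Fit a linear model by taking one gradient step per observed example.

    Illustrative sketch only: squared loss, fixed step size (assumed values).
    """
    w = [0.0] * dim
    for x, y in stream:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # Gradient of 0.5 * err**2 with respect to w is err * x.
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def make_stream(n, true_w, seed=0):
    """Yield n synthetic (x, y) pairs one at a time (hypothetical data source)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(a * b for a, b in zip(true_w, x))
        yield x, y


# The learner never holds the full data set, only the current example.
w = online_sgd(make_stream(5000, [2.0, -1.0]), dim=2)
```

Each example is used once and then discarded, so memory use does not grow with the size of the stream, which is the computational advantage over batch methods that the abstract refers to.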


Massimiliano Pontil

Istituto Italiano di Tecnologia & University College London

Machine learning studies the problem of learning to perform a given task from a dataset of examples. A fundamental limitation of standard machine learning methods is the cost incurred in preparing large training datasets. In applications, often only a limited number of examples is available and the task cannot be solved in isolation. A potential remedy is offered by multitask learning, which aims to learn several related tasks simultaneously. If the tasks share some constraining or generative property that is sufficiently simple, this should allow for better learning of the individual tasks even when the individual training datasets are small. In the course, I will present a wide class of multitask learning methods which encourage different forms of task relatedness, and illustrate the performance of these methods in applications arising in affective computing, computer vision and user modelling.