Scalable Large-Margin Structured Learning: Theory and Algorithms, Kai Zhao (CUNY)
Much of NLP tries to map structured input (sentences) to some form of structured output (tag sequences, parse trees, semantic graphs, or translated/paraphrased/compressed sentences). Thus structured prediction and its learning algorithm are of central importance to us NLP researchers. However, when applying machine learning to structured domains, we often face scalability issues for two reasons:
1. Even the fastest exact search algorithms for most NLP problems (such as parsing and translation) is too slow for repeated use on the training data, but approximate search (such as beam search) unfortunately breaks down the nice theoretical properties (such as convergence) of existing machine learning algorithms.
2. Even with inexact search, the scale of the training data in NLP still makes pure online learning (such as perceptron and MIRA) too slow on a single CPU.
This talk reviews recent advances that address these two challenges. In particular, we will cover principled machine learning methods that are designed to work under vastly inexact search, and parallelization algorithms that speed up learning on multiple CPUs. We will also extend structured learning to the latent variable setting, where in many NLP applications such as translation and semantic parsing the gold-standard derivation is hidden.
Kai Zhao is a Ph.D. student at the City University of New York (CUNY), working with Professor Liang Huang. He received his B.S. from the University of Science and Technology in China (USTC). He has published on structured prediction, online learning, machine translation, and parsing algorithms. He worked as summer interns at IBM TJ Watson Research Center in 2013 and Microsoft Research in 2014.
Practical Learning Algorithms for Structured Prediction, Kai-Wei Chang (UIUC)
In many prediction problems, decisions are structured, in that the goal is to assign values to multiple inter-dependent variable, where the relations among the output variables could be modeled as a sequence, as set of clusters or a graph. When solving these problems, it is important to make coherent decisions that take the inter-dependencies among output variables into account. Such problems are often referred to as structured prediction problems. In this tutorial, we will focus on recent developments of discriminative structured prediction models such as structured SVMs and Structured Perceptron. Beyond introducing the algorithmic approaches in this domain, we will discuss ideas that result in significant improvements both in the learning and in the inference stages of these algorithms. In particular, we will discuss the use of selecting and caching techniques to reuse computation. Participants will learn about existing trends in learning and the inference for the structured prediction models and how they can be applied in NLP applications.
Kai-Wei Chang is a doctoral candidate in Computer Science at the University of Illinois at Urbana-Champaign. His research interests lie in designing practical machine learning techniques for large and complex data and applying them to real world applications. He has been working on various topics in Machine learning and Natural Language Processing, including large-scale learning, structured learning, coreference resolution, and relation extraction. Kai-Wei was awarded the KDD Best Paper Award in 2010 and won the Yahoo! Key Scientific Challenges Award in 2011. He was one of the main contributors of a popular linear classification library, LIBLINEAR.
Statistical natural language processing relies on probabilistic models of linguistic structure. More complex models can help capture our intuitions about language, by adding linguistically meaningful interactions and latent variables. However, inference and learning in the models we want often poses a serious computational challenge.
Belief propagation (BP) and its variants provide an attractive approximate solution, especially using recent training methods. These approaches can handle joint models of interacting components, are computationally efficient, and have extended the state-of-the-art on a number of common NLP tasks, including dependency parsing, modeling of morphological paradigms, CCG parsing, phrase extraction, semantic role labeling, and information extraction (Smith and Eisner, 2008; Dreyer and Eisner, 2009; Auli and Lopez, 2011; Burkett and Klein, 2012a; Naradowsky et al., 2012; Stoyanov and Eisner, 2012).
This breakout session delves into BP with an emphasis on recent advances that enable state-of-the-art performance in a variety of tasks. Our goal is to elucidate how these approaches can easily be applied to new problems. We also cover the theory underlying them.
Matt Gormley is a PhD student at Johns Hopkins University working with Mark Dredze and Jason Eisner. His current research focuses on joint modeling of multiple linguistic strata in learning settings where supervised resources are scarce. He has authored papers in a variety of areas including topic modeling, global optimization, semantic role labeling, and grammar induction.