From Text to Knowledge: Advances in Information Extraction and Social Media Analysis

Aron Culotta
Northeastern Illinois University

Date and Location: Friday, March 1, 2013, 11:25am – 12:25pm, Stuart Building, Room 111.

Abstract

The continued growth of online text data presents exciting opportunities for automated knowledge discovery. In this talk, I will present two lines of research that develop machine learning algorithms to convert large text collections into actionable knowledge. First, I will discuss information extraction (IE), which infers a relational database from unstructured text. After giving an overview of several IE tasks, including entity extraction, coreference resolution, and relation extraction, I will describe a new learning algorithm, SampleRank, that efficiently models the complex statistical dependencies inherent in IE, and present state-of-the-art results on extracting information from news stories. Second, I will turn to the analysis of informal texts, specifically Twitter data. What can we infer about society from this data? I will outline the fundamental challenges in this line of research and present our work on monitoring flu activity, alcohol consumption, and anxiety about Hurricane Irene, as well as recent research on inferring the geographical origin of Twitter messages.

Biography

Aron Culotta obtained his Ph.D. in Computer Science from the University of Massachusetts, Amherst, in 2008, advised by Dr. Andrew McCallum. He was a Microsoft Live Labs Fellow from 2006 to 2008 and completed research internships at IBM, Google, and Microsoft Research. He is currently an Assistant Professor of Computer Science at Northeastern Illinois University.