Machine learning and data mining methods have emerged as
cornerstone technologies for transforming the deluge of data
generated by modern society into actionable intelligence. For
applications ranging from business intelligence to public policy
to clinical guidelines, the overarching goal of “big data”
analytics is to identify, analyze, and summarize the available
evidence to support decision makers. While ubiquitous computing
has greatly simplified data collection, successful deployment of
machine learning techniques is also generally predicated on
obtaining sufficient quantities of human-supplied annotations.
Accordingly, judicious use of human effort in these settings is
crucial to building high-performance systems in a cost-effective
manner.
In this talk, I will describe methods for reducing annotation costs and improving system performance via interactive learning protocols. Specifically, I will present models capable of exploiting domain-expert knowledge through the use of labeled features -- both within the active learning framework, to explicitly reduce the need for labeled data during training, and in the more general setting of improving classifier performance in high-expertise domains. Furthermore, I will contextualize this work within the scientific systematic review process, highlighting the importance of interactive learning protocols in a scenario where information must be reliably extracted from multiple sources, synthesized into a cohesive report, and updated as new evidence becomes available in the scientific literature. I will demonstrate that we can partially automate many aspects of this important task, thus reducing the costs incurred when interacting with highly trained experts.
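For concreteness, the sketch below illustrates the general kind of interactive protocol involved: plain pool-based active learning with uncertainty sampling over instances, implemented with scikit-learn on synthetic data. The classifier, seed-set size, and query budget are arbitrary assumptions chosen for illustration, and the sketch does not include the labeled-feature models discussed in the talk.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in for a real annotation task.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    rng = np.random.default_rng(0)
    labeled = list(rng.choice(len(X), size=10, replace=False))   # small seed set
    unlabeled = [i for i in range(len(X)) if i not in labeled]

    clf = LogisticRegression(max_iter=1000)
    for _ in range(20):                     # simulated budget of 20 expert queries
        clf.fit(X[labeled], y[labeled])
        # Uncertainty sampling: query the pool instance whose predicted
        # positive-class probability is closest to 0.5 (most ambiguous).
        probs = clf.predict_proba(X[unlabeled])[:, 1]
        query = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
        labeled.append(query)               # the expert "oracle" reveals this label
        unlabeled.remove(query)

    clf.fit(X[labeled], y[labeled])
    print("Accuracy with %d labels: %.3f" % (len(labeled), clf.score(X, y)))

Uncertainty sampling is only one possible query strategy; the point is that each expert interaction is spent on the example the current model finds most ambiguous, rather than on a randomly chosen one.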