Course Objective


This course provides an overview of fundamental concepts, methodologies, and issues in information retrieval, covering both the relevant theory and its applications. The core focus is on algorithms and methodologies for finding documents relevant to user queries accurately and efficiently. Students will learn the basic components of a retrieval system and study the challenges of designing and implementing these components. Time permitting, the course will also cover additional topics such as dynamic information retrieval and an introduction to image retrieval. Programming experience is expected. Please refer to Blackboard for lecture notes, assignments, and project details.

Prerequisites


CS 331 or CS 401; strong programming knowledge expected.

Recommended textbook


  • Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Cambridge University Press, 2008.
For reference:
  • Information Retrieval: Algorithms and Heuristics, D. A. Grossman and O. Frieder. Springer, 2004.
Lecture slides, reading assignments, and homework assignments will be posted on the course website.

Course Schedule


Week      Topics
Week 1    Introduction to IR; Search architecture
Week 2    Indexing; Dictionaries
Week 3    Scalable indexing; Index compression
Week 4    Vector space model
Week 5    Performance metrics; Query optimization
Week 6    Probabilistic IR
Week 7    Language models
Week 8    Data mining techniques: Classification
Week 9    Data mining techniques: Clustering
Week 10   Data mining techniques: Classification
Week 11   Link analysis; PageRank algorithm
Week 12   Advanced topics in IR
Week 13   Advanced topics in IR
Week 14   Final class; Review of topics

Grading

Assessment              Comments              Weight (%)
Homework assignments    Around 4-6            50
Midterm exam                                  20
Final exam                                    20
Class quizzes           Around 4-6 quizzes    10

Course Outcomes


  • Explain information retrieval storage methods (inverted indexes and signature files).
  • Explain retrieval models, such as the Boolean, vector space, and probabilistic models, inference networks, and neural networks.
  • Explain retrieval utilities such as stemming, relevance feedback, n-grams, clustering, thesauri, and parsing and token recognition.
  • Design and implement a search engine prototype using the storage methods, retrieval models and utilities.
  • Apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices.
  • Apply research ideas in their experiments when building a search engine prototype.

Program Outcomes


  • An ability to apply knowledge of computing and mathematics appropriate to the program's student outcomes and to the discipline.
  • An ability to analyze a problem, and identify and define the computing requirements appropriate to its solution.
  • An ability to design, implement and evaluate a computer-based system, process, component, or program to meet desired needs.
  • An ability to use current techniques, skills, and tools necessary for computing practices.
  • An ability to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices.
  • An ability to apply design and development principles in the construction of software systems of varying complexity.

Honor Code


The university's academic dishonesty policies are in force for this course. Please refer to the handbook for details. Students may not collaborate on assignments or homework unless collaboration is explicitly allowed. Students are also required to read the College of Science academic integrity pledge.