Course Objective
This course provides an overview of fundamental concepts, methodologies and issues in information retrieval, covering both theory and applications. The core focus is on algorithms and methodologies for finding documents relevant to user queries, accurately and efficiently. Students will learn the basic components of a retrieval system and study the challenges behind designing and implementing these components. Time permitting, the course will look at additional topics such as dynamic information retrieval and an introduction to image retrieval. Programming experience is expected.
Please refer to Blackboard for lecture notes, assignments and project details.
Prerequisites
CS 331 or CS 401; strong programming knowledge expected.
Recommended textbook
- Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Cambridge University Press, 2008.
For reference:
- Information Retrieval: Algorithms and Heuristics, D.A. Grossman, O. Frieder. Springer, 2004.
Lecture slides, reading assignments and homework assignments will be posted on the course website.
Course Schedule
Week | Topics |
Week 1 | Introduction to IR; Search architecture |
Week 2 | Indexing; Dictionaries |
Week 3 | Scalable indexing; Index compression |
Week 4 | Vector space model |
Week 5 | Performance metrics; Query optimization |
Week 6 | Probabilistic IR |
Week 7 | Language Models |
Week 8 | Data mining techniques: Classification |
Week 9 | Data mining techniques: Clustering |
Week 10 | Data mining techniques: Classification |
Week 11 | Link analysis/Page Rank algorithm |
Week 12 | Advanced topics in IR |
Week 13 | Advanced topics in IR |
Week 14 | Final class: Review of topics |
Grading
Assessment | Comments | % |
Homework Assignments | Around 4-6 | 50% |
Midterm Exam | | 20% |
Final Exam | | 20% |
Class Quiz | Around 4-6 quizzes | 10% |
Course Outcomes
- Explain information retrieval storage methods (Inverted Index and Signature Files).
- Explain retrieval models, such as Boolean model, Vector Space model, Probabilistic model, Inference Networks, and Neural Networks.
- Explain retrieval utilities such as Stemming, Relevance Feedback, N-grams, Clustering, Thesauri, Parsing, and Token recognition.
- Design and implement a search engine prototype using the storage methods, retrieval models and utilities.
- An ability to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices.
- Apply research ideas in experiments while building a search engine prototype.
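To give a sense of the kind of prototype these outcomes refer to, here is a minimal sketch of an inverted index with Boolean AND retrieval. It is an illustration only, not the course's required design: the function names (tokenize, build_inverted_index, intersect, boolean_and), the naive tokenizer, and the sample documents are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical sample collection, used only for illustration.
DOCS = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits (a naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a sorted list of the document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping IDs present in both."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # Process the shortest postings lists first to keep intermediate results small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(boolean_and(index, "home july"))
```

Keeping postings lists sorted is what allows the linear-time merge in intersect; a real system would add the utilities listed above (stemming, parsing, and so on) in front of the indexer.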
Program Outcomes
- An ability to apply knowledge of computing and mathematics appropriate to the program's student outcomes and to the discipline.
- An ability to analyze a problem, and identify and define the computing requirements appropriate to its solution.
- An ability to design, implement and evaluate a computer-based system, process, component, or program to meet desired needs.
- An ability to use current techniques, skills, and tools necessary for computing practices.
- An ability to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices.
- An ability to apply design and development principles in the construction of software systems of varying complexity.
Honor Code
The university's academic dishonesty policies are in force for the course. Please refer to the handbook for details. Students may not collaborate on homework or other assignments unless it is explicitly allowed. Students are also expected to read the College of Science academic integrity pledge.