UMBC CMSC 491/691-I Fall 2002
Last updated: 11 December 2002

Information Retrieval

T-Th 5:30-6:45pm
PHYS 201
Dr. Ian Soboroff
ian@cs.umbc.edu

This course is an introduction to the theory and implementation of software systems designed to search through large collections of text. Ever wondered how World-Wide Web search engines work? Ever wondered why they don't? You'll learn about it here. Information retrieval (IR) is one of the oldest branches of computer science, and has influenced nearly every aspect of computer usage: "search and replace" in a word processor, querying a card catalog, grep'ing through your source code, filtering the spam out of your email, and searching the Web.

This course will have two main thrusts. The first is to cover the fundamentals of IR: retrieval models, search algorithms, and IR evaluation. The second is to give a taste of the implementation issues by having you write (a good chunk of) your own text search engine and test it out on a sample text collection. This will be a semester-long project, details TBA.

You will need to have taken the equivalent of CMSC 341 (Data Structures), and an algorithms course (441 or 641) is recommended. Linear algebra (MATH 221) and Statistics (STAT 355) are recommended but not required; they give background which will be helpful in understanding many IR concepts. Undergraduates will be expected to cover the basic material in the textbook and the programming project. Graduate students will also be expected to read additional papers (indicated in class), and implement something in their project from at least one of them.

News

11 Dec

Class on Thursday, 12/12, will run from 6 to 8pm. This is the time scheduled for a final exam, but we will use it for class presentations.


5 Dec

Classes are cancelled today at UMBC, so today's presentations will take place on Tuesday, and Tuesday's presentations will take place next Thursday. See the presentation page for details.


8 Nov

We will have class on Tuesday night, 12 November. Dr. Scott Cost will be speaking on agent-based IR and the CARROT-II project.

The schedule of project presentations has been posted.

Both umbc-crawl-2 and the new Reuters Corpus rcv1 are now available under /data/nicholas2/ian


7 Nov

Homework 6, due on Thursday, Nov 14, is finally out and will be discussed in class tonight. Note that this homework involves the new umbc-crawl-2 collection!


22 Oct

Homework 5 is out and will be discussed in class tonight. On Thursday, Dr. Charles Nicholas will be giving a talk on latent semantic indexing.


17 Oct

Homework 4 and Phase III of the project will both be discussed this evening. HW4 is due on Tuesday, 22 October.


15 Oct

The syllabus has been rearranged. Some topics have been shuffled, and Phase III project proposals will be due 10/31.


24 Sep

Homework 3 is out, due Tuesday, 1 October.


19 Sep

Solutions for Homework 2 have been posted.


12 Sep

Homework 2 is out and is due on Tuesday, 17 September.


11 Sep

Solutions for Homework 1 have been posted. There is a bug fix in my solution to question 4.


3 Sep

Homework 1 is out and is due next Tuesday, 10 Sep. I will go over the instructions in class tonight. I am planning to release Phase I of the project on Thursday.

The permissions should now be set properly for you to access the collections in /data/nicholas2/ian. Please let me know if there are any problems.


27 Aug

The course is fully enrolled (with more than 30 students), and last I checked the hold list had at least 20 more students on it. If you are on the hold list, please come to the first class and speak with me.