UMBC CMSC 491/691-I Fall 2002
Last updated: 7 November 2002

Homework 6

Assignment: Finalize relevance judgments for umbc-crawl-2 topics.

Goal: To turn umbc-crawl-2 from a web crawl into a usable test collection. From the relevant documents found in Homework 5, you will define the final relevance judgments with which we will measure our search systems' performance.

Due date: Thursday, November 14, 2002.

Description

For Homework 4, you designed three search topics for the UMBC collection. For Homework 5, you distributed each topic to three people and yourself received nine different topics from others. Now, you will get back the results of those searches for your topics.

By class on October 29th, you should have received the results of three students' search efforts on each of your three topics. Each result is a list of the documents that the searcher judged relevant to that topic. For this homework, you will create a final set of relevance judgments for each topic from these searches.

For each topic, go through every page your searchers proposed as relevant. First, determine whether the page actually exists in umbc-crawl-2, using the docs file in the umbc-crawl-2 directory. Each line of this file has a document ID, the page's actual URL, and the filename where that page is stored (under umbc-crawl-2/test). If we have the page in the crawl, note its document identifier (e.g., UC-4367). Then, examine the proposed relevant page according to the standard of relevance defined in Homework 5. Look at the page in the crawl (that is, under /data/nicholas2/ian/umbc-crawl-2/test/...), not on the live web. Decide whether the page is truly relevant to your search topic.
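The lookup step can be scripted. The following is a minimal sketch, not a provided course tool: it assumes each line of the docs file holds exactly three whitespace-separated fields in the order document ID, URL, filename, which may not match the real file. The function names and the sample lines are made up for illustration.

```python
def load_docs_index(lines):
    """Map each URL to its (doc_id, filename) pair.

    Assumes every line holds three whitespace-separated fields in the
    order doc ID, URL, filename; the real docs file may differ.
    """
    index = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        doc_id, url, filename = fields
        index[url] = (doc_id, filename)
    return index


def lookup(index, url):
    """Return (doc_id, filename) for a URL, or None if it is not in the crawl."""
    return index.get(url)


# Made-up sample lines for illustration only (not real crawl data).
sample = [
    "UC-0001 http://www.example.edu/ test/0001.html",
    "UC-0002 http://www.example.edu/about.html test/0002.html",
]
index = load_docs_index(sample)
```

With the real file, you would pass an open file object to load_docs_index instead of the sample list. The match is an exact string comparison, so a URL that a searcher recorded with a different trailing slash or capitalization would need to be checked by hand.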

(You will probably find that you disagree with your searchers about what pages are really relevant to the topic. Avoid changing your search topic, unless it's obvious that the searchers completely misunderstood your intention.)

Record your judgments in the following format:

          <topic-id> <doc-id> <rel?>
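Where <topic-id> is the identifier you assigned to the topic, <doc-id> is the page's document identifier from the docs file (e.g., UC-4367), and <rel?> records whether you judged the page relevant.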
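Judgment lines in this three-field format could be generated with a short script. This is a sketch under stated assumptions: the assignment text here does not spell out how <rel?> is encoded, so the 1/0 encoding is a guess, and the topic IDs T1 and T2 and the document ID UC-0012 are hypothetical. Only UC-4367 comes from the example above.

```python
def format_judgment(topic_id, doc_id, relevant):
    """Return one '<topic-id> <doc-id> <rel?>' line.

    Assumption: relevance is written as 1 (relevant) or 0 (not relevant);
    the encoding is not specified in the assignment text.
    """
    return f"{topic_id} {doc_id} {1 if relevant else 0}"


def format_judgments(judgments):
    """Format (topic_id, doc_id, relevant) tuples, sorted, one per line."""
    lines = [format_judgment(t, d, r) for t, d, r in sorted(judgments)]
    text = "\n".join(lines) + "\n"
    text.encode("ascii")  # raises UnicodeEncodeError if not plain ASCII
    return text


# Hypothetical topic IDs; UC-0012 is made up for illustration.
example = format_judgments([
    ("T2", "UC-4367", True),
    ("T1", "UC-0012", False),
])
```

Writing the returned text to one file keeps all three topics in a single ASCII file, and sorting groups each topic's judgments together.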

NOTE - use umbc-crawl-2!

Please remember to check your pages against umbc-crawl-2. This is the crawl we will be using for web documents. DO NOT codify your relevance judgments against the original umbc-crawl!

There are likely to be some issues with relevant pages appearing after the searches were done. If you run across a relevant page not referenced by the searches, feel free to add it to the relevance judgments, provided the page is present in umbc-crawl-2.

What to turn in

Submit (via the Blackboard dropbox) your relevance judgments (in a single ASCII text file), and your final topic statements. Add a <topicnum> field to each topic statement with the identifier you assigned it.