CMSC 491W/691W

Web and Data Mining

Description

This 3-credit course will cover the fundamentals of data mining from both the database and pattern recognition perspectives. We will also study how mining can work in semi-structured environments such as the web. Emphasis will be placed on understanding both the basics and the new and emerging technologies. While a text may be prescribed, most of the reading will be from papers. There will be a significant group project.

Course Organization

 
Instructor: Anupam Joshi
Office: ECS 225 E
Office Phone: 455-2590
E-Mail Address: joshi@cs.umbc.edu
Office Hours: Walk-in / By appointment.
TA: Jianfeng Wang
Office: ECS 202 G
Office Phone: 455-3641
E-Mail Address: jwang6@cs.umbc.edu
Office Hours: TBA

Texts:  None at this time

References:

  1. KDNuggets Directory: Data Mining and Knowledge Discovery [http://www.kdnuggets.com/]
  2. The Corporate KDD Bookmark [http://www.cs.su.oz.au/~thierry/ckdd.html]
Other sites will be added. You will also be assigned to read some additional materials.

Prerequisites:

CSEE senior or graduate (or equivalent). Must have undergraduate-level background in either databases (e.g. CMSC 461) or AI/Pattern Recognition (e.g. CMSC 471). Talk to the instructor if you would like to take the course but are not sure of your background.
 

Course Information (or Stuff you should know up front ...)

A few noteworthy points. First, this is a 400/600 level course, meant for CSEE seniors and graduate students. The course will assume that you are largely familiar with most of the prerequisite topics; in any case, it is your responsibility to catch up. In class, I will assume that all students have the requisite background. Second, this is a course in the "systems" area. That means that hands-on work is almost as important as theoretical knowledge, and projects will account for almost 60% of your grade. Expect to put in significant effort! I will expect you to follow the good programming practices (commenting, headers, version control, makefiles, etc.) that you have learnt in previous classes.

We will use the World Wide Web as a convenient tool for distributing course material and presenting other information. The URL for the class web server is http://www.cs.umbc.edu/~ajoshi/courses/cmsc491w/. A class newsgroup will also be created.

Homeworks will be mostly in the form of readings. Projects will be interdisciplinary in nature, be done in a group, and be somewhat tailored to each group based on its members' prior background and training. Quizzes may be given if needed to "encourage" students to read the assigned material. A midterm exam is planned; a final exam will likely not be given.

The Important Stuff (i.e. grades)

Given the format of this course, attendance and class discussion are essential for the learning process. While I cannot require attendance, your regular attendance will be needed in order to participate in class and in order to take the (unannounced) quizzes. I will not give any makeups for these quizzes. Course grades will be a function of your performance in the projects, paper presentations, quizzes, and exams, as well as of your participation in class. The grades will be based on a curve. A tentative breakdown of grades, likely to change as we go along, is

Academic Dishonesty

As you have probably been told umpteen times by now, violating the academic honesty policy is a strict no-no! If we catch anyone cheating, we will take the maximum action possible against them, including reporting the matter to the appropriate university authorities. Please cooperate by doing your own work and not seeking inappropriate help from your classmates. You may, of course, discuss homeworks and assignments amongst yourselves, as long as that discussion does not lead to an exchange of solutions.

ADA Compliance

We recognize that some of you may have disabilities that require special attention from the instructional staff. Please make us aware of them at your earliest convenience so that UMBC can make suitable arrangements.