Questions About the Exam

  1. In Q2, what if more than one large itemset of the same size has the required support?
    Please list all such itemsets.
  2. In Q1, when clustering, are we to assume a number of clusters to work towards, or is there some way of determining a satisfactory number?
    There really isn't a good way to predetermine the number of clusters. If you use approaches such as FCM or K-means, you probably want to try 2-3 possible values. For example, as the number of data points is 14, you may want to try 3, 5, or 7 as the number of clusters and see where you get the best results. If you use SAHN, then of course you will start by assuming that each point forms its own cluster and build upwards.
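
A minimal sketch of the "try a few values and compare" idea for K-means follows. Everything in it is an assumption made for illustration: the 14 two-dimensional points are made up (they are not the exam data), the starting centers are chosen by a simple farthest-point rule so that the run is repeatable, and "best results" is judged by the mean silhouette width, which is only one of several reasonable criteria.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    # Assumed initialization: start from the first point, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette width; higher means tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical data: 14 points in three visible groups (NOT the exam data).
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3), (0.9, 0.9),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3), (4.9, 4.7),
        (9.0, 1.0), (9.2, 1.1), (8.8, 0.9), (9.1, 0.8)]

# Try each candidate number of clusters and keep the best-scoring one.
scores = {k: silhouette(data, kmeans(data, k)) for k in (3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On data with less obvious grouping the scores may be close together, in which case it is worth inspecting the clusters themselves rather than relying on a single number.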

Anupam Joshi
Last modified: Wed Apr 28 09:57:48 EDT 1999