{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## CMSC 691: Homework 2\n", "\n", "In this homework you will work on clustering and analyze the data using various clustering algorithms from the scikit-learn module.\n", "\n", "You may use the Clustering Jupyter Notebook Dr. Kalpakis prepared for the class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dataset: 'Diabetic Data'\n", "\n", "This dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. Information was extracted from the database for encounters that satisfied the following criteria.\n", "\n", "1.\tIt is an inpatient encounter (a hospital admission).\n", "2.\tIt is a diabetic encounter, that is, one during which any kind of diabetes was entered into the system as a diagnosis.\n", "3.\tThe length of stay was at least 1 day and at most 14 days.\n", "4.\tLaboratory tests were performed during the encounter.\n", "5.\tMedications were administered during the encounter.\n", "\n", "The data contains such attributes as patient number, race, gender, age, admission type, time in hospital, medical specialty of admitting physician, number of lab tests performed, HbA1c test result, diagnosis, number of medications, diabetic medications, number of outpatient, inpatient, and emergency visits in the year before the hospitalization, etc.\n", "\n", "https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd  # required by pd.read_csv below\n", "df = pd.read_csv('diabetic_data.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q1. 
K-Means (10 pts):\n", "\n", "Use the k-means clustering algorithm from sklearn.\n", "Cluster the data over any numeric attribute by varying any 2 combinations of parameters of the KMeans algorithm,\n", "and discuss the results obtained.\n", "\n", "Documentation:\n", "http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.cluster import KMeans\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Q2. Use the k-medoids clustering algorithm (30 pts):\n", "\n", "k-medoids is a variant of k-means in which each cluster center (medoid) is chosen from among the data points in the dataset.\n", "\n", "Using k-medoids, create 2 different clusterings of the above dataset.\n", "1. Clustering over some numeric attributes of your choice.\n", "2. Clustering over some nominal and numerical attributes of your choice.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q3. Agglomerative Clustering (10 pts)\n", "\n", "Use agglomerative clustering to build a hierarchical clustering of the dataset. Plot a dendrogram for your clustering." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q4. Spectral Clustering (10 pts)\n", "\n", "Perform spectral clustering on attributes of your choice." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Q5. Plotting clusters and discussion of results (40 pts)\n", "\n", "Using tools from the matplotlib, seaborn, and pandas packages, illustrate the generated clusters for each of the clustering methods used above and discuss the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }