This page will be updated during the semester.
Topics | Readings | Notes | Final exam reading list | |
---|---|---|---|---|
Course overview and logistics What is data science? Data Science process |
Chapter 1 of O'Neil and Schutt There's More Than One Kind of Data Scientist by Harlan Harris. CRISP-DM |
Intro to data science (PPT) | slides 1-10 | |
Databases - SQL | SQL PPT Slides | slides 1-41 | ||
Databases - NoSQL | PPT | slides 1-22 | ||
A Crash Course in Python |
Chapter 2 of Grus Appendix A of McKinney |
4 min overview of jupyter Try jupyter on your browser |
Python Crash course | |
Pandas |
Pandas Library Documentation McKinney's book on pandas |
10 Minutes to pandas Pandas Cookbook |
Pandas I Pandas II | |
Getting to Know your data | Chapter 2 of DMCT | PPT PDF | slides 1-41, 55-68 | |
Data Preprocessing | Chapter 3 of DMCT | PPT PDF | Slides 1-15, 23, 25, 26, 31, 34, 35, 42-48, 54-61 | |
Clustering | Chapter 10 of DMCT | PPT PDF | Slides 1-27 | |
Linear regression | Chapter 3 of ISLA | slides 1-26 | ||
Classification I Decision Trees Naive Bayes Logistic regression Model evaluation and Selection Bagging and Boosting Random Forests |
Chapter 4 of ISLA Chapter 8 of DMCT |
Logistic regression PPT PDF |
slides 1-11 of the logistic regression slides
slides 1-13, 28-30, 48-55, 58-62, 68-70 of the PPT |
|
Classification II Bayesian Belief Nets Neural Nets and Backpropagation Support Vector Machines k-Nearest Neighbors Active Learning Transfer Learning |
Chapter 9 of DMCT | PPT PDF | Slides 1-6, 12-17, 26-30, 35-40, 66-67 | |
Outlier Detection | Chapter 12 of DMCT | PPT PDF | slides 1-13 | |
Neural Nets and Deep Learning | ||||
Dimensionality reduction (SVD and CUR) | Chapter 11 of MMDS | PPT PDF | slides 7-18, 46-54 | |
Recommendation Systems | Chapter 9 of MMDS | PPT PDF | slides 9-23 | |
Mining Social-Network Graphs | Chapter 10 of MMDS | PPT PDF | slides 7-14, 19-30, 49-52 | |
Distributed/Cloud computing, scaling up | Amazon EC2 Tutorial | |||
Hadoop and Map-Reduce | Chapter 2 of MMDS
MapReduce Tutorial by Yahoo Hadoop Streaming Framework "Writing an Hadoop MapReduce Program in Python" by Michael Noll "A Guide to Python Frameworks for Hadoop" by Uri Laserson Making Python on Apache Hadoop Easier with Anaconda and CDH |
PPT PDF | slides 1-20, 23-25 | |
Spark | Apache Spark Tutorial: Machine Learning with PySpark |
An Overview of Spark by Jim Scott. Intro to Spark by Matei Zaharia |
slides 1-13, 15-26, 29-31, 38-51, 58 from Zaharia's PPT |
© Copyright 2017- Dr. K. Kalpakis |