

Machine Learning 101 & 102 Fall 2010

 

Overview of the course

Machine Learning 101 deals primarily with supervised learning problems.  Machine Learning 102 covers unsupervised learning and fault detection.

 

Both 101 and 102 begin at the level of elementary probability and statistics and from that background survey a broad array of machine learning techniques.  The classes will give participants a working knowledge of these techniques and will leave them prepared to apply those techniques to real problems.  To get the most out of the class, participants will need to work through the homework assignments. 

 

Prerequisites

This class assumes a moderate level of computer programming proficiency.  We will use R (the open source statistics language) for the homework and for the examples in class.  We will cover some of the basics of R and do not assume any prior knowledge of it.  You can find references on how to use R on this website, and we will give out sample code during classes to help you get started.

 

You'll need some general beginner-level background in probability, calculus, linear algebra and vector calculus.  We will cover most of what is required during the lectures.  The appendices in the back of the Tan text provide more than enough background for this class.

 

Machine Learning 101 and 102 can be taken in any order.  The prerequisites for the two classes are the same.  The second five-week session (Machine Learning 102) will culminate in the students giving presentations on papers they have read.

 

Why use R?

We're going to use R as our lingua franca for looking at homework problems, discussing them, and comparing different solution approaches.  Load R onto your laptop or desktop computer before you come to the first class: http://cran.r-project.org/

We will include some descriptive material on using R in the first two lectures in order to get everyone up to speed on it.  To integrate R with Eclipse, click here.

References for R are here: References for R

Comment on these references here: Reference for R Comments

More R references

 

Please note that anyone can read this web site; however, only the instructors have permission to write on the site.  We welcome new members to the class, but we are not granting permissions to edit this site.

 

General Sequence of Classes:

 

Machine Learning 101:   Supervised Learning

Machine Learning 102:   Unsupervised Learning and Fault Detection

Text: "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach and Vipin Kumar

 

Machine Learning 201:    Advanced Regression Techniques, Generalized Linear Models, and Generalized Additive Models    

Machine Learning 202:   Collaborative Filtering, Bayesian Belief Networks, and Advanced Trees

Text:  "The Elements of Statistical Learning - Data Mining, Inference, and Prediction"  by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

 

Future Topics 

     Data Mining Social Networks

     Text Mining

     Recommender Methods

     Big Data

 

 

Outline for Fall 2010

Machine Learning 101

1st Week

22-Sep
     Chapter 1 & 2
     Notes For First Week

23-Sep
     Chapter 3

2nd Week

29-Sep
     Chapter 4
     Notes for 2nd Week
     HW #1 Due: HomeworkAssignment01.doc, HW1.pdf

30-Sep
     Chapter 4

3rd Week

6-Oct
     Simple Regression
     Notes for Week 03
     HW #2 Due: Homework02.doc, Homework02.pdf

7-Oct
     Ridge Regression & k Nearest Neighbors

4th Week

13-Oct
     Chapter 5
     Week04
     Finish k Nearest Neighbors
     Naive Bayes
     HW #3 Due: Homework03.doc, Homework03.pdf

14-Oct
     Chapter 5
     Support Vector Machines

5th Week

20-Oct
     Chapter 5
     Week05
     Finish SVM
     Start Ensemble Methods
     HW #4 Due: Homework04.doc, Homework04.pdf

21-Oct
     Chapter 5
     Finish Ensemble Methods

Machine Learning 102

6th Week

27-Oct
     Chapter 5
     Week06
     Class Imbalance
     HW #5 Due: Homework05.doc, Homework05.pdf

28-Oct
     Chapter 6

7th Week

3-Nov
     Chapter 8
     Week07
     HW #6 Due: Homework06.doc, Homework06.pdf

4-Nov
     Chapter 8
     Cluster Analysis

8th Week

10-Nov
     Papers
     Week08
     Group 3, Group 4
     Work Hard on your Presentations

11-Nov
     Papers
     Group 1, Group 2

13-Nov
     Data Mining Camp Saturday Instructions

9th Week

17-Nov
     Chapter 9
     Week09
     HW #7 on Chapter 8 Due

18-Nov
     Chapter 9

10th Week

1-Dec
     Chapter 10
     Week10
     HW #8 on Chapter 9 Due

2-Dec
     Chapter 10
 
 

 

Lectures are in the Lectures Folder

Homework assignments are in the Homework Folder

DataFiles

 

There are more machine learning references on Patricia's web site: http://patriciahoffmanphd.com/