Introduction to Statistical Learning and Machine Learning (Fall 2018)

Lectures:  Tuesday (evening, H4206)

Office hours: Wed. 4:10-5:30pm (N210, Zibin building)

Instructor: Yanwei Fu   (yanweifu@fudan.edu.cn)

Teaching Assistants:  (Please email or WeChat either of the two TAs if you have any questions or problems.)

(1)谢宇    wechat: Y1314941    email:  15955038579@163.com

(2)孙强    wechat: sunqiang6861    email: sunqiang85@gmail.com

Synopsis: As an introduction to statistical learning and machine learning, this course is about learning from data. Statistical learning refers to a set of tools for modeling and understanding complex datasets; machine learning is defined as a set of methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data or to perform other kinds of decision making under uncertainty. The main objective is therefore to present students with a unified view of both fields by teaching the methodology, the applications, and the key ideas behind the methods. The course is illustrated with R as well as other statistical programming languages such as Matlab and Python. We aim to gradually cultivate students' abilities in both theoretical analysis and practical problem solving.

Textbooks

1. James, Witten, Hastie and Tibshirani (2013) An Introduction to Statistical Learning, with Applications in R. Springer.

2. Bishop, C.M. (2006), Pattern Recognition and Machine Learning, Springer.

3. Hastie, T., Tibshirani, R. and Friedman, J. (2011) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition. Springer.

Prerequisites:

Registration:

Grading:

(1) Class attendance (10%): includes class performance, class discussion, and critical thinking.

Each absence: -1%

(2) Weekly homework (20%): five assignments. We expect students to be able to finish each one within 1.5-2.5 hours.

Each: 4% x 5.

Late submissions (after Dec. 14th, 2017) will be penalized by 50% of the total score of each homework; in other words, the highest possible score for each late homework is 2%.

(3) Monthly mini-projects (50%): 3-4 projects selected from real-world Big-Data problems, including but not limited to computer vision, pattern recognition, recommendation systems, social networks, financial data analysis, and bioinformatics. In general, the reports should be written in English and include algorithm sketches (3%), key code (2%), experimental analysis (3%), and a discussion of the proposed method (2%).

About the Submission of mini-projects.

The report can be written in Word or LaTeX. Generate a single PDF file for your mini-project. The file name should be SLML_yourname_student-id.pdf. Also include your name and student ID in the report. To submit the report, email the PDF file to 15955038579@163.com

About the deadline and penalty.

In general, you should submit the report by the deadline of each mini-project. Late submissions are also accepted; however, you will be penalized 10% of the score for each week of delay.

(4) Final project (20%): completed in teams. Each team may have up to 3 students and will solve a real-world Big-Data problem. In general, the final report should be written in English. The main components of the report are (1) an introduction to the background and potential applications (2%); (2) a review of the state of the art (3%); (3) the algorithms and key code in a nutshell (10%); (4) experimental analysis and discussion of the proposed methodology (5%).

Reference books:

Math cookbook   Linear Algebra Review

Note that:

(1) In the mini-projects, you are not allowed to use any existing toolbox; you have to write every line of code yourself.

(2) In the final project, you may use toolboxes.

(3) We will also randomly check some students' projects by asking them a few questions, in order to verify that they did the projects themselves.



Timetable


No. Topic: Slides / Exercises & Notes / Other material / Web Videos
1 Overview: Introduction; ex1; Notes; Rcode; Intro1; Intro2
2 Linear regression: linear regression
3 Project-1: Oct-13 5:00pm; project1
4 Linear classification: linear classification; ex2
5 Linear SVM: linear_svm; Chap 4 (Page 170) 6, 7; Chap 9 (Page 368) 1, 2, 3; tutorial: Latex; Latex Example; Chinese Intro
6 SVM: svm
7 Project-2: deadline: 5:00pm, Nov 19, 2018; project2; Naive Bayes
8 neural_network(1): nn1; NN1
9 neural_network(2): nn2
10 neural_network(3): nn3
11 Learning theory: learning theory
12 Project-3: deadline: 5:00pm, Jan 5th, 2018; project3
13 Mid-term: slides; ex4; Notes_Andrew_Ng
14 Unsupervised Learning: slides; EM_GMM
15 Final projects: final_project
16 Semi-supervised learning: slides
17 Tree-based method: slides
18




Good Reading Material: