Introduction to Statistical Learning and Machine Learning (Fall 2018)

Lectures: Tuesday (evening, H4206)

Office hours: Wed. 4:10-5:30pm (N210, Zibin building)

Instructor: Yanwei Fu (yanweifu@fudan.edu.cn)

Teaching Assistants: (Please email or WeChat either of the two TAs if you have any problems.)

（1）谢宇 wechat: Y1314941 email: 15955038579@163.com

（2）孙强 wechat: sunqiang6861 email: sunqiang85@gmail.com

Synopsis: This course is an introduction to statistical learning and machine learning, and is about learning from data. Statistical learning refers to a set of tools for modeling and understanding complex datasets; machine learning is defined as a set of methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data or to perform other kinds of decision making under uncertainty. The main objective is therefore to give students a unified view of the two fields by teaching the methodology, the applications, and the key ideas behind the methods. The course is illustrated with R, as well as other statistical programming languages such as Matlab and Python. We aim to gradually build students' abilities in both theoretical analysis and practical problem solving.

Textbook:

1. James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013) An Introduction to Statistical Learning: with Applications in R. Springer.

2. Bishop, C.M. (2006) Pattern Recognition and Machine Learning. Springer.

3. Hastie, T., Tibshirani, R. and Friedman, J. (2011) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition. Springer.

Prerequisites:

Registration:

Grading:

(1) Class attendance (10%): includes class performance, class discussion, and critical thinking.

Each absence: -1%

(2) Weekly homework (20%): 5 assignments. We expect students to finish each one within 1.5-2.5 hours.

Each: 4% × 5.

Late submissions (after Dec. 14th, 2017) will be penalized by 50% of the total score of each homework; in other words, the highest possible score for each late homework is 2%.

(3) Monthly mini-projects (50%): 3-4 projects selected from real-world Big-Data problems, including but not limited to computer vision, pattern recognition, recommendation systems, social networks, financial data analysis, and bioinformatics. In general, the reports should be written in English and include an outline of the algorithm (3%), critical code (2%), experimental analysis (3%), and a discussion of the proposed method (2%).

About the submission of mini-projects:

The report can be written in Word or LaTeX. Generate a single PDF file of your mini-project. The file name should be SLML_yourname_student-id.pdf. Also put your name and student ID in the report. To submit the report, email the PDF file to 15955038579@163.com. About the deadline and penalty: in general, you should submit the report by the deadline of each mini-project. Late submissions are also accepted; however, you will be penalized 10% of the score for each week of delay.

(4) Final project (20%): completed in teams. Each team may have up to 3 students and will solve a real-world Big-Data problem. In general, the final report should be written in English. The main components of the report are: (1) introduction to the background and potential applications (2%); (2) review of the state of the art (3%); (3) algorithms and critical code in a nutshell (10%); (4) experimental analysis and discussion of the proposed methodology (5%).
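The late-submission rules for homework and mini-projects above can be sketched in a few lines of Python. This is only an illustration, not the official grading procedure: the function names are hypothetical, and the treatment of partial weeks (each started week counts as a full week), the linear accumulation of the 10% penalty, and the floor at zero are assumptions that the policy text does not spell out. Ask the TAs if your case is unclear.

```python
import math

HOMEWORK_WEIGHT = 4.0  # percent of the course grade per homework (5 x 4% = 20%)


def homework_credit(fraction_correct: float, late: bool) -> float:
    """Course-grade percentage earned for one homework.

    fraction_correct is the share of the homework answered correctly (0.0-1.0).
    A late homework keeps only 50% of its score, so it is capped at 2%.
    """
    credit = HOMEWORK_WEIGHT * fraction_correct
    return credit * 0.5 if late else credit


def mini_project_credit(points: float, days_late: float) -> float:
    """Points kept for a mini-project after the 10%-per-week late penalty.

    Assumptions (not stated in the syllabus): every started week counts as
    one full week, the penalty adds up linearly, and it never goes below zero.
    """
    weeks_late = math.ceil(max(days_late, 0) / 7)
    penalty_percent = min(10 * weeks_late, 100)
    return points * (100 - penalty_percent) / 100


if __name__ == "__main__":
    print(homework_credit(1.0, late=True))        # perfect but late homework
    print(mini_project_credit(10, days_late=8))   # 8 days late = 2 started weeks
```

Under these assumptions, a perfect homework handed in late is worth 2% of the course grade, and a mini-project handed in 8 days late keeps 80% of its points.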

Reference books:

Math cookbook; Linear Algebra Review

Note that:

(1) Mini-projects may not use any existing toolbox; you have to write every line of code yourself.

(2) In the final project, you may use toolboxes.

(3) We will also randomly check some students' projects by asking them questions, in order to verify that they did the projects themselves.

Timetable

	Topic	Slides	Exercises & Notes	Other material	Web Videos
1	Overview	Introduction	ex1 Notes	Rcode	Intro1 Intro2
2	Linear regression	linear regression
3	Project-1	deadline: 5:00pm, Oct 13	project1
4	Linear classification	linear classification	ex2
5	Linear SVM	linear_svm	Chap4(Page170) 6, 7; Chap 9 (Page 368) 1, 2,3	tutorial: Latex Latex Example Chinese Intro
6	SVM	svm
7	Project-2	deadline: 5:00pm, Nov 19, 2018	project2	Naive Bayes
8	neural_network(1)	nn1			NN1
9	neural_network(2)	nn2
10	neural_network(3)	nn3
11	Learning theory	learning theory
12	Project-3	deadline: 5:00pm, Jan 5th, 2018	project3
13	Mid-term	slides	ex4	Notes_Andrew_Ng
14	Unsupervised Learning	slides		EM_GMM
15	final projects		final_project
16	semi-supervised learning	slides
17	tree-based method	slides
18

Good Reading Material:

Creativity in Machine Learning

Selected Topics in Advanced Statistics (高等统计选讲)

Foundation_of_data_science