SDS 383C: Statistical Modeling I
The course syllabus can be found here.
The scribing material can be found here.
08/25 : Different types of models and convergence of random variables. [Scribe notes]. Sections 6.2 and 5.1-5.2 from All of Statistics.
08/30 : Properties of the MLE. Sections 5.3 and 9.3-9.10 from All of Statistics. [Scribe notes].
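As a quick illustration of maximum likelihood in action (a toy NumPy sketch, not part of the course materials), here the closed-form MLE for an exponential rate is checked against a brute-force grid search over the log-likelihood:

    import numpy as np

    # Toy example: MLE for an Exponential(rate) model.
    # The closed-form MLE is 1/mean(x); we sanity-check it numerically.
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1 / 2.0, size=1000)  # true rate = 2

    def neg_log_lik(rate):
        return -(len(x) * np.log(rate) - rate * x.sum())

    grid = np.linspace(0.1, 5, 1000)
    rate_hat = grid[np.argmin([neg_log_lik(r) for r in grid])]
    print(rate_hat, 1 / x.mean())  # both close to 2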
09/01 : Multivariate random variables and the parametric bootstrap, from All of Statistics. [Scribe notes].
09/03 : First homework is out. [HW1]. Dataset [dir1.txt]. [Solutions, thanks to Giorgio Paulon!].
09/13 : Failings of the MLE. Shrinkage estimators and empirical Bayes. Sections 5.3 and 9.3-9.10 from All of Statistics. [Scribe notes].
09/15 : A closer look at empirical Bayes and James-Stein estimators; linear regression. Hastie-Tibshirani-Friedman (HTF) 3.1-3.2.2. [Scribe notes].
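For a concrete feel for the James-Stein estimator from this lecture, a minimal NumPy sketch (an illustration, not taken from the scribe notes): for X_i ~ N(theta_i, 1) with p >= 3, shrinking all coordinates toward zero beats the MLE X in total squared error.

    import numpy as np

    # James-Stein: shrink the vector of sample means toward 0.
    rng = np.random.default_rng(1)
    p = 50
    theta = rng.normal(0, 1, size=p)       # unknown means
    X = theta + rng.normal(0, 1, size=p)   # one observation per mean

    js = (1 - (p - 2) / np.sum(X**2)) * X  # James-Stein shrinkage
    print(np.sum((X - theta) ** 2))        # loss of the MLE
    print(np.sum((js - theta) ** 2))       # typically smaller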
09/20 : Linear regression continued. HTF 3.1-3.2.2. [Scribe notes].
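A minimal least-squares fit in NumPy (illustrative only; np.linalg.lstsq is the numerically stable route, rather than inverting X^T X directly):

    import numpy as np

    # Ordinary least squares on simulated data.
    rng = np.random.default_rng(2)
    n = 100
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 covariates
    beta = np.array([1.0, 2.0, -1.0, 0.5])
    y = X @ beta + rng.normal(0, 0.1, size=n)

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)  # close to beta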
09/22 : Model selection. Optimism of the training error. HTF 7.1, 7.4, 7.5. [scribe notes].
09/23 : Model selection continued. AIC, BIC, variable selection. HTF 7.10, 3.4. [scribe notes].
09/23 : Cross-validation. Ridge regression and the lasso. HTF 7.10, 3.4. [scribe notes].
09/25 : Second homework is out. [HW2]. [Solutions, thanks to Spencer Woody!].
09/27 : Ridge regression and the lasso: the orthogonal-design case. HTF 7.10, 3.4. [scribe notes].
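In the orthonormal-design case (X^T X = I) both shrinkage rules have closed forms, which this small sketch (illustrative, with z standing in for the OLS coefficients) makes explicit: ridge scales every coefficient toward zero, while the lasso soft-thresholds and can return exact zeros.

    import numpy as np

    # Orthonormal design, loss (1/2)||y - Xb||^2, z = X^T y = OLS solution:
    #   ridge penalty (lam/2)||b||^2  ->  b = z / (1 + lam)
    #   lasso penalty  lam ||b||_1    ->  b = sign(z) * max(|z| - lam, 0)
    z = np.array([-3.0, -0.5, 0.2, 1.0, 4.0])
    lam = 1.0
    print(z / (1 + lam))                                  # ridge: uniform shrinkage
    print(np.sign(z) * np.maximum(np.abs(z) - lam, 0.0))  # lasso: exact zeros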
09/29 : Logistic regression. HTF 4.4. [scribe notes].
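A compact Newton/IRLS fit for logistic regression (a self-contained NumPy sketch, not the scribe notes' code):

    import numpy as np

    # Logistic regression by Newton's method (equivalently, IRLS).
    rng = np.random.default_rng(3)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta_true = np.array([-1.0, 2.0])
    y = rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))  # Bernoulli draws

    beta = np.zeros(2)
    for _ in range(25):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)                        # IRLS weights
        grad = X.T @ (y - p)                   # score
        H = X.T @ (X * W[:, None])             # Fisher information
        beta = beta + np.linalg.solve(H, grad)
    print(beta)  # close to beta_true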
10/03 : Third homework is out. [HW3]. Dataset [Breast cancer data]. [Solutions].
10/04 : Collinearity in the lasso/ridge and an introduction to LDA. [scribe notes].
10/11 : LDA and Fisher's discriminant analysis. HTF 4.3. [scribe notes].
10/13 : Old Midterm [Solutions].
10/13 : Some sample midterm questions [sample questions].
10/18 : Naive Bayes. Generalized linear models and how to fit them. [scribe notes]. Tom Mitchell's book chapter on Naive Bayes can be found [here].
10/20 : Robust statistics: the sensitivity curve, influence function, and breakdown point. Huber loss. M-estimation. [scribe notes].
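A small sketch of Huber's M-estimator of location, fit by iterative reweighting (illustrative; the tuning constant c = 1.345 is the usual 95%-efficiency choice):

    import numpy as np

    # Huber M-estimate of location: downweight large residuals.
    rng = np.random.default_rng(4)
    x = np.concatenate([rng.normal(0, 1, 95), np.full(5, 20.0)])  # 5 gross outliers
    c = 1.345
    mu = np.median(x)  # robust starting point
    for _ in range(50):
        r = np.abs(x - mu)
        w = np.minimum(1.0, c / np.maximum(r, 1e-12))  # Huber weights psi(r)/r
        mu = (w * x).sum() / w.sum()
    print(x.mean(), mu)  # the mean is dragged toward 20; the M-estimate is not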
10/22 : M-estimation continued. Models for clustering: k-means. [scribe notes].
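Lloyd's algorithm for k-means in a few lines (a toy sketch on two well-separated clusters; no empty-cluster guard):

    import numpy as np

    # k-means via Lloyd's algorithm: assign points, then recompute centers.
    rng = np.random.default_rng(5)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
    k = 2
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(20):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    print(centers)  # near (0, 0) and (5, 5)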
10/25 : Midterm [Solutions].
10/27 : EM, missing data, and censoring. [scribe notes].
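The mechanics of EM on the simplest interesting case (a toy sketch, not from the lecture: a two-component 1-D Gaussian mixture with known unit variances):

    import numpy as np

    # EM for a mixture (1 - pi) N(mu0, 1) + pi N(mu1, 1).
    rng = np.random.default_rng(6)
    x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])
    pi, mu = 0.5, np.array([-1.0, 1.0])
    for _ in range(100):
        # E-step: responsibility of component 1 for each point
        d0 = np.exp(-0.5 * (x - mu[0]) ** 2)
        d1 = np.exp(-0.5 * (x - mu[1]) ** 2)
        r = pi * d1 / ((1 - pi) * d0 + pi * d1)
        # M-step: weighted updates of the mixing weight and means
        pi = r.mean()
        mu = np.array([((1 - r) * x).sum() / (1 - r).sum(),
                       (r * x).sum() / r.sum()])
    print(pi, mu)  # near 0.7 and (-2, 3)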
10/27 : Fourth homework is out. [HW4].
10/28 : Project guidelines. The project reports are due on the last day of class. [project guidelines].
11/08 : When EM may not work: the lightbulb example. [scribe notes]. Flury and Zoppe's word of caution on EM.
11/10 : d-separation demystified [slides].
11/15 : Gibbs and collapsed Gibbs sampling for a Naive Bayes-style document model; Gibbs for the Latent Dirichlet Allocation model. [scribe notes]. A great primer on Gibbs sampling for the uninitiated is here. See Sections 2.4.3 and 2.5.1 for collapsed Gibbs sampling, and Section 2.6 for the trouble with integrating out other continuous parameters.
11/15 : Fifth homework is out; it is due Dec 5th. [HW5].
11/17 : Gibbs and collapsed Gibbs for the Latent Dirichlet Allocation model. [scribe notes]. A great source is here.
11/17 : More project guidelines: projects should be written in the NIPS format, which can be downloaded [Here]. Reports can be at most 8 pages plus 1 page of references. In the introduction, tell us what problem you intend to solve and why it is important. In the related-work section, discuss in detail the other works that are relevant. In the proposed-work section, state concretely what you are trying to solve, or which methods you are surveying. In the experiments section, compare the different methods on simulated and real datasets. Finally, in the discussion, tell us what you learned, what works better, and when or why. The deadline is Dec 5th at midnight, but I will not start grading until the morning of Dec 9th.
11/22 : Bootstrap and subsampling. [scribe notes]. A great source for the bootstrap is here; the second part of the blog post talks about subsampling. Chapter 8 of All of Statistics is also a good resource.
11/24 : Bootstrap and subsampling continued. [scribe notes].
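The nonparametric bootstrap in one loop (an illustration, not from the notes: estimating the standard error of a sample median, for which no simple formula exists):

    import numpy as np

    # Bootstrap SE of the median: resample with replacement, recompute.
    rng = np.random.default_rng(7)
    x = rng.exponential(size=200)
    B = 2000
    meds = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                     for _ in range(B)])
    print(np.median(x), meds.std())  # the estimate and its bootstrap SE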
12/01 : Density estimation. Chapter 6 of All of Nonparametric Statistics. This book is available online through lib.utexas.edu.
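A Gaussian kernel density estimate from scratch (a toy sketch; the bandwidth h = 0.3 is hand-picked here, not chosen by any of the rules from the chapter):

    import numpy as np

    # KDE: fhat(t) = (1/(n*h)) * sum_i K((t - x_i)/h), K = standard normal density.
    rng = np.random.default_rng(8)
    x = rng.normal(size=500)
    h = 0.3
    grid = np.linspace(-4, 4, 200)
    fhat = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2).sum(axis=1)
    fhat /= len(x) * h * np.sqrt(2 * np.pi)
    print(fhat.max())  # close to the N(0, 1) peak of about 0.40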
12/02 : Non-parametric classification: k-NN. Check out these related notes from CMU. Chapter 13 of HTF is a great source for learning about nearest neighbors and adaptive nearest neighbors.
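And a k-NN classifier from scratch to close out the semester (a toy sketch: Euclidean distances plus a majority vote):

    import numpy as np

    # k-nearest-neighbor classification on two simulated classes.
    rng = np.random.default_rng(9)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    def knn_predict(x_new, k=5):
        dists = ((X - x_new) ** 2).sum(axis=1)
        nearest = y[np.argsort(dists)[:k]]
        return np.bincount(nearest).argmax()  # majority vote

    print(knn_predict(np.array([0.0, 0.0])))  # 0
    print(knn_predict(np.array([3.0, 3.0])))  # 1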