Machine Learning in R

This course was developed by Jeffrey Girard and Shirley Wang for SMaRT Workshops (May 28–31, 2024).


Jeffrey Girard, University of Kansas
Shirley Wang, Yale University

Whereas statistical methods traditionally used in the social and behavioral sciences emphasize interpretability and quantification of uncertainty, machine learning methods emphasize complexity and accuracy of predictions. Machine learning methods are thus particularly well suited to applications where (1) there are nonlinear and complex relationships among a large number of predictor variables and (2) accurately predicting the outcome variable is more important than fully understanding the relationships between variables.

This workshop will provide a hands-on introduction to applying machine learning techniques in R using the {tidymodels} packages. It will emphasize practical knowledge and conceptual intuitions (e.g., teaching you how to drive a car) rather than technical and theoretical mastery (e.g., teaching you how to build a car). In addition, rather than briefly surveying the full breadth of available machine learning techniques, this workshop will provide a deep dive into several supervised learning methods with broad applicability in the social and behavioral sciences: regularized regression models (GLMNET), random forest ensembles (RF), and support vector machines (SVM). Introductory theory for these methods will be included, and recommendations for further reading will be provided.

Given this practical focus, attendees will learn how to formulate a good research question that machine learning can answer, prepare data for analysis, set up a rigorous cross-validation procedure, evaluate predictive performance, and interpret and report results for a scientific audience.