class: center, middle, inverse, title-slide

.title[
# Workflows and Metrics
]
.subtitle[
## 🚣 💯 📏
]
.author[
### Machine Learning in R
SMaRT Workshops
]
.date[
### Day 2A
Jeffrey Girard
]

---
class: inverse, center, middle

# Workflows

---
# Setup

``` r
# Set up packages
library(tidyverse)
library(tidymodels)
tidymodels_prefer()
```

--

``` r
# Read in and tidy data
titanic <- read_csv("https://tinyurl.com/mlr-titanic") %>%
  mutate(
    survived = factor(survived, levels = c(1, 0)),
    pclass = factor(pclass),
    sex = factor(sex)
  )
```

---
## Data Splitting and Model Setup

``` r
# Create train/test split, stratified by fare
set.seed(2022)
fare_split <- initial_split(data = titanic, prop = 0.8, strata = fare)
```

--

``` r
# Extract training and testing sets
fare_train <- training(fare_split)
fare_test <- testing(fare_split)
```

--

``` r
# Set up model (linear regression using lm)
lm_model <- 
  linear_reg() %>% 
  set_mode("regression") %>% 
  set_engine("lm")
```

---
## Create a simple workflow

.onecol[
- A .imp[workflow] collects various specifications for your ML experiment
- The **model** specifies the algorithm, mode, engine, and tuning steps
- The **preprocessor** specifies the formula and feature engineering steps
]

``` r
fare_wflow <- 
  workflow() %>% 
  add_model(lm_model)
```

--

``` r
fare_wflow
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: None
## Model: linear_reg()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Linear Regression Model Specification (regression)
## 
## Computational engine: lm
```

---
## Add a formula as a simple preprocessor

.onecol[
- We can use a .imp[formula] as a simple preprocessor (without feature engineering)
]

``` r
fare_wflow <- 
  fare_wflow %>% 
  add_formula(fare ~ pclass + sex + age + sibsp + parch)
```

--

``` r
fare_wflow
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Formula
## Model: linear_reg()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## fare ~ pclass + sex + age + sibsp + parch
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Linear Regression Model Specification (regression)
## 
## Computational engine: lm
```

.footnote[*Note.* We will soon learn to use "recipes" as more powerful preprocessors]

---
## Fit a model using a workflow

.onecol[
- We can explicitly fit the model to the training data using `fit()`
]

``` r
fare_fit <- fit(fare_wflow, fare_train)
```

--

.scroll.h-3l[
``` r
fare_fit
## ══ Workflow [trained] ══════════════════════════════════════════════════════════
## Preprocessor: Formula
## Model: linear_reg()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## fare ~ pclass + sex + age + sibsp + parch
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## 
## Call:
## stats::lm(formula = ..y ~ ., data = data)
## 
## Coefficients:
## (Intercept)      pclass2      pclass3      sexmale          age        sibsp  
##    92.39601    -71.45150    -81.07618     -6.76926     -0.05083      4.13072  
##       parch  
##    11.74983  
```
]

---
## Make predictions using the fit model

.onecol[
- We can explicitly make predictions in the testing set using `predict()`
]

``` r
fare_pred <- predict(fare_fit, fare_test)
```

--

.scroll.h-3l[
``` r
fare_pred
## # A tibble: 263 × 1
##    .pred
##    <dbl>
##  1 113. 
##  2  82.0
##  3  91.2
##  4  91.1
##  5  83.8
##  6  99.6
##  7  90.9
##  8  90.9
##  9  90.2
## 10  90.3
## # ℹ 253 more rows
```
]
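
--

.onecol[
- As an optional convenience (a sketch, not part of the original materials): `augment()` should bind these predictions onto the testing set for side-by-side inspection
]

``` r
# Sketch: append a .pred column to the testing set,
# keeping predictions next to the observed fare values
fare_aug <- augment(fare_fit, fare_test)
```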
---
## A helpful shortcut

.onecol[
- Or we can do both automatically using `last_fit()` and the "split" object
- This will fit to the (entire) training set and predict the testing set
]

``` r
fare_fit <- last_fit(fare_wflow, split = fare_split)
```

--

``` r
fare_fit
## # Resampling results
## # Manual resampling 
## # A tibble: 1 × 6
##   splits             id               .metrics .notes   .predictions .workflow 
##   <list>             <chr>            <list>   <list>   <list>       <list>    
## 1 <split [1046/263]> train/test split <tibble> <tibble> <tibble>     <workflow>
```

---
## Collecting the predictions

.onecol[
- We can gather the testing set predictions with `collect_predictions()`
]

``` r
fare_pred <- collect_predictions(fare_fit)
```

--

.scroll.h-3l[
``` r
fare_pred
## # A tibble: 263 × 5
##    .pred id                .row  fare .config             
##    <dbl> <chr>            <int> <dbl> <chr>               
##  1 113.  train/test split     2 152.  Preprocessor1_Model1
##  2  82.0 train/test split    10  49.5 Preprocessor1_Model1
##  3  91.2 train/test split    13  69.3 Preprocessor1_Model1
##  4  91.1 train/test split    14  78.8 Preprocessor1_Model1
##  5  83.8 train/test split    20  75.2 Preprocessor1_Model1
##  6  99.6 train/test split    21  52.6 Preprocessor1_Model1
##  7  90.9 train/test split    25 222.  Preprocessor1_Model1
##  8  90.9 train/test split    33 165.  Preprocessor1_Model1
##  9  90.2 train/test split    42  27.7 Preprocessor1_Model1
## 10  90.3 train/test split    45 134.  Preprocessor1_Model1
## # ℹ 253 more rows
```
]

---
## Plotting the predictions (basic)

.pull-left[
``` r
ggplot(
  fare_pred, 
  aes(x = fare, y = .pred)
) + 
  geom_point()
```
]
.pull-right[
![](../figs/fare_pred1.png)
]

---
## Plotting the predictions (advanced)

.pull-left[
``` r
ggplot(
  fare_pred, 
  aes(x = fare, y = .pred)
) + 
  geom_point(alpha = .2) + 
  geom_abline(color = "darkred") + 
  coord_obs_pred() + 
  labs(
    x = "Observed Fare", 
    y = "Predicted Fare"
  )
```
]
.pull-right[
![](../figs/fare_pred2.png)
]

---
## Collecting the performance metrics

.onecol[
- We can gather the testing set performance metrics with `collect_metrics()`
]

``` r
fare_perf <- collect_metrics(fare_fit)
```

--

``` r
fare_perf
## # A tibble: 2 × 4
##   .metric .estimator .estimate .config             
##   <chr>   <chr>          <dbl> <chr>               
## 1 rmse    standard      30.2   Preprocessor1_Model1
## 2 rsq     standard       0.567 Preprocessor1_Model1
```

.footnote[*Note.* We will soon learn how to interpret these metrics.]
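
--

.onecol[
- A sketch of requesting metrics beyond these defaults (the name `fare_metrics` is illustrative): `metric_set()` bundles yardstick metrics into a single function
]

``` r
# Bundle three regression metrics and apply them to the
# collected predictions from the previous slide
fare_metrics <- metric_set(rmse, rsq, mae)
fare_metrics(fare_pred, truth = fare, estimate = .pred)
```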
---
## Live Coding: Putting it all together

.scroll.h-0l[
``` r
# Load packages
library(tidyverse)
library(tidymodels)
tidymodels_prefer()

# Load and tidy data
titanic <- read_csv("https://tinyurl.com/mlr-titanic") %>%
  mutate(
    survived = factor(survived, levels = c(1, 0)),
    pclass = factor(pclass),
    sex = factor(sex)
  )

# Create data splits, stratified by fare
set.seed(2022)
fare_split <- initial_split(data = titanic, prop = 0.8, strata = fare)
fare_train <- training(fare_split)
fare_test <- testing(fare_split)

# Set up model (linear regression using lm)
lm_model <- 
  linear_reg() %>% 
  set_mode("regression") %>% 
  set_engine("lm")

# Set up workflow with simple formula preprocessor
fare_wflow <- 
  workflow() %>% 
  add_model(lm_model) %>% 
  add_formula(fare ~ pclass + sex + age + sibsp + parch)

# Fit workflow and make predictions using data splits
fare_fit <- last_fit(fare_wflow, split = fare_split)

# Collect predictions and performance metrics
fare_pred <- collect_predictions(fare_fit)
fare_perf <- collect_metrics(fare_fit)
```
]

---
## Live Coding: Modifying it for classification

.scroll.h-0l[
``` r
# Load packages and data
library(tidyverse)
library(tidymodels)
tidymodels_prefer()

titanic <- read_csv("https://tinyurl.com/mlr-titanic") %>%
  mutate(
    survived = factor(survived, levels = c(1, 0)),
    pclass = factor(pclass),
    sex = factor(sex)
  )

# Create data splits, stratified by survived
set.seed(2022)
*surv_split <- initial_split(titanic, prop = 0.8, strata = survived)
surv_train <- training(surv_split)
surv_test <- testing(surv_split)

# Set up model (logistic regression using glm)
glm_model <- 
* logistic_reg() %>% 
* set_mode("classification") %>% 
* set_engine("glm")

# Set up workflow with simple formula preprocessor
surv_wflow <- 
  workflow() %>% 
* add_model(glm_model) %>% 
* add_formula(survived ~ pclass + sex + age + sibsp + parch + fare)

# Fit workflow and make predictions using data splits
surv_fit <- last_fit(surv_wflow, split = surv_split)

# Collect predictions and performance metrics
surv_pred <- collect_predictions(surv_fit)
surv_perf <- collect_metrics(surv_fit)
```
]

---
class: inverse, center, middle

# Performance Metrics

---
class: onecol
## Performance Metrics

.left-column.pt3[
<img src="../figs/target.jpg" width="100%" />
]
.right-column[
**Metrics for Regression**

- .imp[Distance] between predicted and trusted values
- .imp[Correlation] between predicted and trusted values

.pt1[
**Metrics for Classification**
]

- .imp[Confusion matrix] between predicted and trusted classes
- Compare predicted .imp[class probabilities] to trusted classes
]

---
class: onecol
## Default distance metric for regression

.pull-left[
**Root Mean Squared Error (RMSE)**

- Based on squared loss
- Penalizes severe errors more harshly
- Ranges from `\(0\)` to `\(+\infty\)`, lower is better

.pt1[
`$$RMSE=\sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2}$$`
]
]
.pull-right.pt4[
<img src="slides_2a_files/figure-html/rmse-1.png" width="100%" />
]
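
--

A minimal sketch of this formula in base R (assuming `fare_pred` from earlier, with observed `fare` and predicted `.pred`); it should match `yardstick::rmse()`:

``` r
# RMSE computed directly from its definition
sqrt(mean((fare_pred$fare - fare_pred$.pred)^2))
```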
---
class: onecol
## Default correlation metric for regression

**R-Squared (`\(R^2\)` or RSQ)**

- Calculated in ML as the **squared correlation** between the predictions and labels
- Ranges from `\(0\)` to `\(1\)`, higher is better

`$$R^2 = \left(\frac{\text{cov}(y, \hat{y})}{\sigma_y\sigma_{\hat{y}}}\right)^2 = \left(\frac{\sum(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum (y_i-\bar{y})^2}\sqrt{\sum(\hat{y}_i-\bar{\hat{y}})^2}}\right)^2$$`

--

.bg-light-yellow.b--light-red.ba.bw1.br3.pl4[
+ RSQ is a measure of *consistency* (i.e., linear association) and not distance
+ RSQ can become unstable or undefined when data variability is low
+ RSQ can become unstable when applied in small samples (e.g., test sets)
]

---
class: twocol
## Default confusion matrix metric for classification

|                                                  | Truth = `\(+\)`<br /> `\((y=1)\)` | Truth = `\(-\)`<br /> `\((y=0)\)` |
|:------------------------------------------------:|:-------------------------:|:-------------------------:|
| **Predicted = `\(+\)`** <br /> `\((\hat{y}=1)\)` | True Positive (TP)        | False Positive (FP)       |
| **Predicted = `\(-\)`** <br /> `\((\hat{y}=0)\)` | False Negative (FN)       | True Negative (TN)        |

.footnote[*Note.* There are many, many confusion matrix metrics to choose from.]

--

.mt3.pull-left[
`$$\text{Accuracy} = \frac{TN + TP}{TN + FN + FP + TP}$$`

.tc.mt4[
Ranges from `\(0\)` to `\(1\)`, higher is better
]
]

--

.mt3.pull-right[
`$$\text{Sensitivity}=\frac{TP}{TP+FN}=\frac{TP}{P}$$`

`$$\text{Specificity}=\frac{TN}{TN+FP}=\frac{TN}{N}$$`
]

---
## Live Coding: Classification metrics

.scroll.h-0l[
``` r
# Define a set of classification metrics to estimate
surv_metrics <- metric_set(accuracy, sens, spec, roc_auc)

# Add that metric set to the fitting function
surv_fit <- last_fit(surv_wflow, split = surv_split, metrics = surv_metrics)

# Collect and print testing set metrics
surv_perf <- collect_metrics(surv_fit)
surv_perf

# Calculate raw confusion matrix
surv_cm <- 
  collect_predictions(surv_fit) %>% 
  conf_mat(truth = survived, estimate = .pred_class)
surv_cm

# Plot confusion matrix
autoplot(surv_cm, type = "mosaic")
autoplot(surv_cm, type = "heatmap")

# Calculate all confusion matrix metrics
summary(surv_cm)
```
]

---
class: onecol
## Default probability metric for classification

- When a classifier outputs class probabilities, we can choose any **decision threshold**
- We might naturally consider any probability over 50% positive and all others negative
- But we could choose a more conservative threshold (e.g., 75%) or a more liberal one (e.g., 25%)

--

.pt1[
- **Performance curves** plot the characteristics of different decision thresholds
- This gives us an overview of how the classification system performs *overall*
]

--

.pt1[
- A popular option is the **Receiver Operating Characteristic** (ROC) Curve
- The **area under the curve (AUC)** condenses the ROC Curve into a single metric
]

---
class: onecol
## Interpreting a ROC curve and its AUC

.pull-left[
Each point on a ROC curve corresponds to a possible decision threshold

- Sensitivity is `\(TP / P\)`
- Specificity is `\(TN / N\)`

Better curves are closer to the top-left corner

The area under the ROC curve (AUC) ranges from `\(0.5\)` to `\(1.0\)`, higher is better

*ROC AUC is the probability that a random positive example has a higher estimate than a random negative example.*
]
.pull-right[
<img src="slides_2a_files/figure-html/rocex-1.png" width="95%" />
]

---
## Live Coding: ROC Curves

.scroll.h-0l[
``` r
# Examine predictions
surv_pred

# Examine outcome level ordering
levels(surv_pred$survived)

# Calculate per-threshold performance
surv_roc <- roc_curve(
  data = surv_pred,
  truth = survived,
  .pred_1
)
surv_roc

# Plot the ROC curve
autoplot(surv_roc)
```
]
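
--

.onecol[
- A sketch of condensing the curve into its AUC with `roc_auc()` (assuming `surv_pred` as collected above, with `survived` leveled so that `1` is the first/event level)
]

``` r
# Compute the area under the ROC curve from the same predictions
roc_auc(surv_pred, truth = survived, .pred_1)
```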
---
class: twocol
## Comprehension Check \#1

<span style="font-size:30px;">Bindi trains Model [A] to predict how many kilometers each bird will migrate this year and Model [B] to predict whether or not it will reproduce this year.</span>

.pull-left[
**1. Which combination of performance metrics would be appropriate to use?**

a) Accuracy for [A] and [B]

b) RMSE for [A] and [B]

c) Accuracy for [A] and RMSE for [B]

d) RMSE for [A] and Accuracy for [B]
]

.pull-right[
**2. Which combination of performance scores should Bindi hope to see?**

a) RSQ = 0.10 and ROC AUC = 0.04

b) RSQ = 0.45 and ROC AUC = 0.92

c) RSQ = 0.10 and ROC AUC = 0.92

d) RSQ = 0.45 and ROC AUC = 0.04
]

---
class: inverse, center, middle

# Time for a Break!