class: center, middle, inverse, title-slide

.title[
# GLMNET Example and Tuning
]
.subtitle[
## 🥅 🎛️ 🎚️
]
.author[
### Machine Learning in R
SMaRT Workshops
]
.date[
### Day 3B
Jeffrey Girard
]

---
class: inverse, center, middle

# Parameter Tuning

---
class: twocol
## Model and tuning parameters

.pull-left[
#### Model Parameters
- Often specify the learned .imp[relationships]
  - **LM:** intercept and slope(s)
  - **GLMNET:** intercept and slope(s)
- To "fit" a model is to estimate the model parameter values using training data
]

--

.pull-right[
#### Tuning Parameters
- Often control the model's .imp[complexity]
  - **LM:** *(none)*
  - **GLMNET:** penalty and mixture
- Unfortunately, tuning parameter values .underline[cannot be estimated] using training data
]

--

.pv3[
.bg-light-yellow.b--light-red.ba.bw1.br3.ph4[
**Caution:** The "default" values of many tuning parameters often perform quite poorly...
]
]

.footnote[
[1] Different algorithms may have different model parameters and tuning parameters.<br />
[2] Tuning parameters are sometimes also called "hyperparameters."
]

---
class: onecol
## Parameter Tuning

.pull-left[
- Recall that we need to find a balance
  - Too simple a model will **underfit**
  - Too complex a model will **overfit**
  - Both lead to poor generalizability
- To find this optimal balance, we...
  - Try various hyperparameter values
  - Select those that best generalize
  - (This is done during resampling)
]

--

.pull-right.pv3.tc[
![](../figs/overfitting.png)
.flex.items-center.justify-center.nt3[
.f3[Complexity Dial:]
<img src="../figs/knob.gif" width="100" height="100" />
]
]

---
class: onecol
## Determining which values to try

.lh-copy[
- .imp[Grid Search] is the approach we will use in this course
  1. Start with some reasonable boundaries<sup>1</sup> for each tuning parameter
  2. Generate a list of possible values within those boundaries
  3. Create a list of all (or some) combinations of these values
  4. Try all of these values and then select the best
]

--

.lh-copy[
- .imp[Iterative Search] is a more advanced approach
  1. Start with some reasonable value<sup>1</sup> for each tuning parameter
  2. Fit a model with those values and examine performance
  3. Try slightly different values and compare performance
  4. Continue until performance barely changes anymore
]

.footnote[[1] But what are "reasonable" values? Tidymodels has us covered!]

---
## Determining which values to try

.pull-left[
![](../figs/grid_search.png)
]

--

.pull-right[
![](../figs/iterative_search.png)
]

---
class: onecol
## Steps in parameter tuning

1. Split your data (full or training set) into resamples

--

2. Determine which parameters to tune (often all)

--

3. Create a list of tuning parameter values to try (see the sketch on the next slide)

--

4. Try each combination of values during resampling
  - Train a model on some of the data with these values
  - Test the model just described on the rest of the data
  - *Optional:* Repeat several times and average the results

--

5. Find the combination of values with the best performance

--

6. Train a final, "tuned" model on all your data with the best values...
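
---
class: onecol
## What a tuning grid looks like

A minimal sketch (not part of the live-coding script; the `glmnet_grid_demo` name is just for illustration): the dials package can build a regular grid of candidate `penalty` and `mixture` values. `tune_grid()` accepts such a data frame of candidates directly, or builds its own space-filling grid when given a number (as we do later with `grid = 10`).

``` r
# Minimal sketch: a regular grid of candidate tuning parameter values
library(dials)

glmnet_grid_demo <- grid_regular(
  penalty(),   # regularization amount, generated on a log10 scale
  mixture(),   # 0 = ridge, 1 = lasso, in between = elastic net
  levels = 5   # 5 values per parameter = 25 combinations to try
)

glmnet_grid_demo
```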
---
class: twocol
## Key functions for tuning

- `tune()`: Tell tidymodels which parameters to tune

--

.pt1[
- `extract_parameter_dials()`: Extract information about one parameter
- `extract_parameter_set_dials()`: Extract information about all parameters
]

--

.pt1[
- `finalize()`: Determine reasonable boundaries for each parameter (automatically)
- `tune_grid()`: Create a list of value combinations (within boundaries) and try them
]

--

.pt1[
- `select_best()`: Determine which combination of values was the best
- `finalize_workflow()`: Store the best values in the workflow object
]

---
class: inverse, center, middle

# GLMNET Example

---
## Live Coding: Prepare data and folds

``` r
# Load data
titanic <- read_csv("https://tinyurl.com/mlr-titanic")

# Create data splits, stratified by fare
set.seed(2023)
fare_split <- initial_split(data = titanic, prop = 0.8, strata = fare)
fare_train <- training(fare_split)
fare_test <- testing(fare_split)
fare_folds <- vfold_cv(data = fare_train, v = 10, repeats = 3, strata = fare)
```

---
## Live Coding: Set up model and parameters

``` r
# Set up model (linear regression using glmnet)
?linear_reg
install.packages("glmnet")
glmnet_model <-
  linear_reg(penalty = tune(), mixture = tune()) %>%
  set_mode("regression") %>%
  set_engine("glmnet")
glmnet_model
```

---
## Live Coding: Prepare workflow and metrics

.scroll.h-0l[
``` r
# Prepare recipe
fare_recipe <-
  recipe(fare_train) %>%
  update_role(fare, new_role = "outcome") %>%
  update_role(pclass:parch, new_role = "predictor") %>%
  update_role(survived, new_role = "ignore") %>%
  step_naomit(fare) %>%
  step_mutate(
    pclass = factor(pclass),
    sex = factor(sex)
  ) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_impute_linear(age) %>%
  step_nzv(all_predictors()) %>%
  step_corr(all_numeric_predictors()) %>%
  step_lincomb(all_numeric_predictors()) %>%
  step_normalize(all_predictors())

# Prepare workflow
fare_wflow <-
  workflow() %>%
  add_model(glmnet_model) %>%
  add_recipe(fare_recipe)
```
]

---
## Live Coding: Set up the parameter dials

``` r
# Extract the parameters to tune and finalize with the data folds
glmnet_param <-
  glmnet_model %>%
  extract_parameter_set_dials() %>%
  finalize(fare_folds)

# View the finalized parameter ranges (not necessary, just to look)
glmnet_param %>% extract_parameter_dials("penalty")
glmnet_param %>% extract_parameter_dials("mixture")
```

---
## Live Coding: Configure grid search and tune grid

.scroll.h-0l[
``` r
# Perform tuning by searching over a space-filling grid
fare_tune <-
  fare_wflow %>%
  tune_grid(
    resamples = fare_folds,
    grid = 10,
    param_info = glmnet_param
  )
fare_tune

# View the performance by parameter values (averaged across folds and repeats)
collect_metrics(fare_tune)

# Plot the marginal performance by parameter values
autoplot(fare_tune, metric = "rmse")
```
]

---
## Live Coding: Finalize the workflow

``` r
# Select the best parameter values
fare_param_final <- select_best(fare_tune, metric = "rmse")
fare_param_final

# Finalize the workflow with the best parameter values
fare_wflow_final <-
  fare_wflow %>%
  finalize_workflow(fare_param_final)
fare_wflow_final
```

---
## Live Coding: Final performance and interpretation

``` r
# Fit the final model to the entire training set and test it on the testing set
fare_final <-
  fare_wflow_final %>%
  last_fit(fare_split)

collect_metrics(fare_final)

collect_predictions(fare_final) %>%
  ggplot(aes(x = fare, y = .pred)) +
  geom_point(alpha = 0.2) +
  geom_abline(color = "darkred") +
  coord_obs_pred()

fare_final %>% extract_fit_parsnip() %>% vip::vip()
fare_final %>% extract_fit_parsnip() %>% vip::vi()
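
# Added sketch (not in the original script): tidy() from the broom package
# lists the estimated model parameters (intercept and slopes) at the tuned
# penalty value, linking back to the model vs. tuning parameter distinction
fare_final %>% extract_fit_parsnip() %>% tidy()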
```

---
## Live Coding: Deployment

``` r
# Fit the final model to the entire dataset for interpretation and deployment
fare_deploy <-
  fare_wflow_final %>%
  fit(titanic)

# Deployment: If new data comes in, use this model to make predictions
titanic_new <- tibble(
  name = c("Jack Dawson", "Rose DeWitt Bukater"),
  survived = c(0, 1),
  pclass = c(3, 1),
  sex = c("male", "female"),
  age = c(20, 17),
  sibsp = c(0, 1),
  parch = c(0, 1)
)

predict(fare_deploy, titanic_new)
```

---
## Some helpful boilerplate

.scroll.h-0l[
``` r
# Get some boilerplate for GLMNET to start with and then modify
library(usemodels)
use_glmnet(formula = fare ~ ., data = titanic)
## glmnet_recipe <-
##   recipe(formula = fare ~ ., data = titanic) %>%
##   step_zv(all_predictors()) %>%
##   step_normalize(all_numeric_predictors())
##
## glmnet_spec <-
##   multinom_reg(penalty = tune(), mixture = tune()) %>%
##   set_mode("classification") %>%
##   set_engine("glmnet")
##
## glmnet_workflow <-
##   workflow() %>%
##   add_recipe(glmnet_recipe) %>%
##   add_model(glmnet_spec)
##
## glmnet_grid <- tidyr::crossing(penalty = 10^seq(-6, -1, length.out = 20), mixture = c(0.05,
##     0.2, 0.4, 0.6, 0.8, 1))
##
## glmnet_tune <-
##   tune_grid(glmnet_workflow, resamples = stop("add your rsample object"), grid = glmnet_grid)
```
]

---
class: inverse, center, middle

# Time for a Break!