Load packages
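The code for this step is not shown here. A minimal sketch, assuming the activity relies on the tidymodels meta-package and on the copy of penguins that ships with modeldata (the original page may load the data from another package such as palmerpenguins). The seed value is a hypothetical choice, added only so that the random split and folds can be reproduced:

```r
# Assumption: tidymodels attaches rsample, recipes, parsnip, workflows, tune,
# and yardstick, which cover the functions used in this activity.
library(tidymodels)

# Assumption: use the penguins data bundled with modeldata.
penguins <- modeldata::penguins

# Hypothetical seed so the split and resamples can be reproduced.
set.seed(123)

# Quick look at the columns, including the outcome body_mass_g.
glimpse(penguins)
```

Assigning penguins explicitly avoids ambiguity if another attached package also provides a dataset with that name.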
Hands-on Activity
Our goal is to build a model predicting body_mass_g.
1. Create an initial split, stratified by the label.
2. Create a recipe:
   - Create dummy variables for categorical features
   - Normalize all numeric features
   - Remove features with near-zero variance
   - Remove features with large absolute correlations with other features
   - Remove features that are a linear combination of other features
3. Specify a linear regression model.
4. Build a workflow.
5. Fit a model on the training data with 10-fold cross-validation, repeated three times, stratified by the label.
6. Examine performance during cross-validation.
7. Fit a final model on the training data and plot variable importance.
8. Evaluate performance in the final, hold-out test set.
Answer key
# 1. Initial 80/20 split, stratified by the outcome
bmg_split <- initial_split(penguins, prop = 0.8, strata = body_mass_g)
bmg_train <- training(bmg_split)
bmg_test <- testing(bmg_split)
# 2. Recipe: dummy-code, drop uninformative or redundant predictors, then normalize
bmg_recipe <-
  recipe(body_mass_g ~ ., data = bmg_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_nzv(all_predictors()) %>%
  step_corr(all_predictors()) %>%
  step_lincomb(all_predictors()) %>%
  step_normalize(all_numeric_predictors())
# 3. Linear regression model specification
lin_reg <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")
# 4. Workflow bundling the model and the recipe
bmg_wflow <-
  workflow() %>%
  add_model(lin_reg) %>%
  add_recipe(bmg_recipe)
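# 5. 10-fold cross-validation, repeated three times, stratified by the outcome
bmg_folds <- vfold_cv(
  data = bmg_train,
  v = 10,
  repeats = 3,
  strata = body_mass_g
)
# Fit the workflow on every resample
bmg_fitr <-
  bmg_wflow %>%
  fit_resamples(
    resamples = bmg_folds
  )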