# load the necessary libraries
library(tidyverse)
library(ggthemes)
library(janitor)
library(broom)
library(mlbench)
library(tidymodels)
library(probably)
select <- dplyr::select
theme_set(theme_stata(base_size = 10))
data(PimaIndiansDiabetes2)
db <- PimaIndiansDiabetes2
db <- db %>% drop_na() %>%
  mutate(diabetes = fct_relevel(diabetes, c("neg", "pos"))) # Relevels 'diabetes' factor to ensure 'neg' comes before 'pos'
Class Activity 24
Group Activity 1
In this activity, we will calculate the probability of diabetes for a glucose level of 150 mg/dL using the logistic regression coefficients \(\beta_0 = -5.61\) and \(\beta_1 = 0.0392\).
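For reference, steps a to c follow from the logistic regression model. The symbols \(x\) (glucose level) and \(p\) (probability of diabetes) are notation introduced here for illustration:

\[
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x, \qquad \text{odds} = \frac{p}{1-p} = e^{\beta_0 + \beta_1 x}, \qquad p = \frac{\text{odds}}{1 + \text{odds}}
\]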
a. Calculate Log Odds
First, calculate the log odds for a glucose level of 150 mg/dL.
log_odds <- -5.61 + (0.0392 * 150)
log_odds
[1] 0.27
b. Convert Log Odds to Odds
Next, exponentiate the log odds to obtain the odds.
odds <- exp(log_odds)
odds
[1] 1.309964
c. Convert Odds to Probability
Finally, convert the odds to probability.
probability <- odds / (1 + odds)
probability
[1] 0.5670929
At a glucose level of 150 mg/dL, the estimated probability of having diabetes is 0.5670929, or about 56.7%.
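As a cross-check, the three steps can be collapsed into a single call. The sketch below assumes base R's `plogis()`, the logistic distribution function, which maps log odds directly to a probability. The coefficients are the rounded values quoted in this activity.

```r
# Rounded coefficients quoted in the activity
beta_0 <- -5.61
beta_1 <- 0.0392

# Log odds at a glucose level of 150 mg/dL
log_odds <- beta_0 + beta_1 * 150

# plogis(z) computes exp(z) / (1 + exp(z)), i.e. odds / (1 + odds)
probability <- plogis(log_odds)
probability
```

This should reproduce the probability obtained step by step above.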
Group Activity 2
- Let’s fit the logistic regression model.
set.seed(12345)
db_single <- db %>% select(diabetes, glucose)
db_split <- initial_split(db_single, prop = 0.80)

# Create training data
db_train <- db_split %>% training()

# Create testing data
db_test <- db_split %>% testing()

fitted_logistic_model <- logistic_reg() %>% # Call the model function
  # Set the engine/family of the model
  set_engine("glm") %>%
  # Set the mode
  set_mode("classification") %>%
  # Fit the model
  fit(diabetes~., data = db_train)
tidy(fitted_logistic_model)
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  -5.61     0.678       -8.28 1.20e-16
2 glucose       0.0392   0.00514      7.62 2.55e-14
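With the coefficients reported above, the glucose level for any target probability can be found by inverting the model: set the log odds equal to `qlogis(p)` and solve for glucose. The sketch below is a minimal illustration. The helper `glucose_at()` is a hypothetical name, not part of the activity, and it uses the rounded estimates from the table, so results are approximate.

```r
# Rounded estimates from the tidy() output above
b0 <- -5.61    # intercept
b1 <- 0.0392   # glucose slope

# Hypothetical helper: glucose level at which the predicted probability is p.
# qlogis(p) is log(p / (1 - p)), so we solve b0 + b1 * glucose = qlogis(p).
glucose_at <- function(p, b0, b1) (qlogis(p) - b0) / b1

# For p = 0.5 the log odds are 0, so this reduces to -b0 / b1
glucose_half <- glucose_at(0.5, b0, b1)
round(glucose_half, 2)

# Check in the other direction: probability at a glucose value of 143.11
plogis(b0 + b1 * 143.11)
```

The second call should return a value very close to 0.5. Any small difference comes from rounding the coefficients and the glucose value.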
- We are interested in predicting patients' diabetes status from their glucose level. Verify that a glucose value of 143.11 gives a predicted probability of diabetes of 1/2.
- What value of glucose is needed to have a probability of diabetes of 0.5?
- Build a classifier that assigns a diabetes status to new patients using a threshold of 0.75, i.e., a new patient is classified as negative if the estimated probability of the positive class is less than 0.75. Then create a confusion matrix of the resulting predictions and evaluate the model on accuracy, sensitivity, specificity, and positive predictive value (PPV).
- Generate a ROC curve and determine the optimal threshold. Evaluate the performance of your diabetes prediction model by plotting a ROC curve. Identify the point on the curve closest to the top-left corner (high sensitivity, low 1 - specificity) and back-calculate the corresponding threshold. This threshold gives the best balance between sensitivity (the true positive rate) and specificity (the true negative rate).
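The classifier and ROC exercises above could be approached along the following lines. This is a sketch under stated assumptions, not the official solution. The object names (`db_fit`, `db_results`, `roc_points`, `best`) are illustrative. The threshold is a parameter, set to 0.75 here to match the rule that a patient is classified as negative below 0.75. "Closest to the top-left corner" is taken to mean the smallest Euclidean distance to the point (0, 1).

```r
library(tidymodels)
library(mlbench)

# Rebuild the data and model so that this block runs on its own
data(PimaIndiansDiabetes2)
db <- PimaIndiansDiabetes2 %>%
  drop_na() %>%
  mutate(diabetes = factor(diabetes, levels = c("neg", "pos")))

set.seed(12345)
db_split <- initial_split(dplyr::select(db, diabetes, glucose), prop = 0.80)
db_train <- training(db_split)
db_test  <- testing(db_split)

db_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification") %>%
  fit(diabetes ~ glucose, data = db_train)

# Classify test patients: "pos" only if P(pos) reaches the threshold
threshold <- 0.75  # assumption: change to explore other cutoffs
db_results <- db_test %>%
  bind_cols(predict(db_fit, new_data = db_test, type = "prob")) %>%
  mutate(.pred_class = factor(if_else(.pred_pos >= threshold, "pos", "neg"),
                              levels = c("neg", "pos")))

# Confusion matrix and the four requested metrics.
# "pos" is the second factor level, hence event_level = "second".
conf_mat(db_results, truth = diabetes, estimate = .pred_class)
class_metrics <- metric_set(accuracy, sensitivity, specificity, ppv)
class_metrics(db_results, truth = diabetes, estimate = .pred_class,
              event_level = "second")

# ROC curve: one row per candidate threshold
roc_points <- roc_curve(db_results, truth = diabetes, .pred_pos,
                        event_level = "second")
autoplot(roc_points)

# Threshold whose ROC point lies closest to the top-left corner (0, 1)
best <- roc_points %>%
  mutate(dist = sqrt((1 - specificity)^2 + (1 - sensitivity)^2)) %>%
  slice_min(dist, n = 1, with_ties = FALSE)
best
```

The `.threshold` column of `best` is the back-calculated cutoff. Setting `threshold` to that value and rerunning the metrics shows how sensitivity and specificity trade off against the 0.75 rule.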