```
# load the necessary libraries
library(tidyverse)
library(ggthemes)
library(janitor)
library(broom)
library(mlbench)
library(tidymodels)
library(probably)

# use dplyr's select() even if another loaded package masks it
select <- dplyr::select
theme_set(theme_stata(base_size = 10))

# load the data and drop rows with missing values
data(PimaIndiansDiabetes2)
db <- PimaIndiansDiabetes2
db <- db %>%
  drop_na() %>%
  # relevel the 'diabetes' factor so that 'neg' comes before 'pos'
  mutate(diabetes = fct_relevel(diabetes, c("neg", "pos")))
```

# Class Activity 24

## Group Activity 1

In this activity, we will calculate the probability of diabetes for a glucose level of 150 mg/dL using the logistic regression coefficients \(\beta_0 = -5.61\) and \(\beta_1 = 0.0392\).
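Writing \(p\) for the probability of diabetes and \(x\) for the glucose level (mg/dL), the three conversions used in parts a to c follow from the logistic regression model:

\[
\begin{aligned}
\text{log odds} &= \log\frac{p}{1-p} = \beta_0 + \beta_1 x,\\
\text{odds} &= \frac{p}{1-p} = e^{\beta_0 + \beta_1 x},\\
p &= \frac{\text{odds}}{1 + \text{odds}} = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}.
\end{aligned}
\]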

### a. Calculate Log Odds

First, calculate the log odds for a glucose level of 150 mg/dL.

```
log_odds <- -5.61 + (0.0392 * 150)
log_odds
```

`[1] 0.27`

### b. Convert Log Odds to Odds

Next, convert the log odds to odds by exponentiating them.

```
odds <- exp(log_odds)
odds
```

`[1] 1.309964`

### c. Convert Odds to Probability

Finally, convert the odds to probability.

```
probability <- odds / (1 + odds)
probability
```

`[1] 0.5670929`

Therefore, the estimated probability of diabetes for a patient with a glucose level of 150 mg/dL is about 0.567.
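As a cross-check on parts a to c, base R's `plogis()` (the logistic CDF, i.e. the inverse logit \(e^z/(1+e^z)\)) performs the log-odds-to-probability conversion in a single call, and `qlogis()` (the logit) reverses it. This is a minimal sketch that uses only the coefficients given above; the variable names `z`, `p_plogis`, and `z_back` are illustrative.

```r
# Coefficients and glucose level from Group Activity 1
b0 <- -5.61
b1 <- 0.0392
glucose <- 150

# Linear predictor on the log-odds scale
z <- b0 + b1 * glucose

# plogis() computes exp(z) / (1 + exp(z)), i.e. the probability
p_plogis <- plogis(z)
p_plogis

# qlogis() computes log(p / (1 - p)), so it recovers the log odds
z_back <- qlogis(p_plogis)
z_back
```

The one-step version agrees with the value from part c, which is a quick way to catch arithmetic slips in the manual conversion.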

## Group Activity 2

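- Let’s fit a logistic regression model of diabetes status on glucose, using 80% of the data for training and holding out the remaining 20% for testing.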

```
set.seed(12345)
db_single <- db %>% select(diabetes, glucose)
db_split <- initial_split(db_single, prop = 0.80)

# Create training data
db_train <- db_split %>% training()

# Create testing data
db_test <- db_split %>% testing()

fitted_logistic_model <- logistic_reg() %>%  # Call the model function
  # Set the engine/family of the model
  set_engine("glm") %>%
  # Set the mode
  set_mode("classification") %>%
  # Fit the model
  fit(diabetes ~ ., data = db_train)

tidy(fitted_logistic_model)
```

```
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  -5.61     0.678       -8.28 1.20e-16
2 glucose       0.0392   0.00514      7.62 2.55e-14
```

- We are interested in predicting a patient's diabetes status from their glucose level. Verify that a glucose value of 143.11 mg/dL gives an estimated probability of diabetes of 1/2.

- What value of glucose is needed to have a probability of diabetes of 0.5?

- Make a classifier that predicts the diabetes status of new patients using a threshold of 0.5, i.e., a new patient is classified as negative if the estimated probability of diabetes is less than 0.5 and as positive otherwise. Also, create a confusion matrix of the resulting predictions. Evaluate the model based on accuracy, sensitivity, specificity, and positive predictive value (PPV).

- Generate a ROC curve and determine the optimal threshold: evaluate the performance of your diabetes prediction model by plotting a ROC curve. Use the curve to identify the point closest to the top-left corner (maximizing sensitivity while minimizing 1 - specificity), and back-calculate the corresponding optimal threshold. This threshold represents the best balance between sensitivity (the true positive rate) and specificity (the true negative rate, i.e., 1 minus the false positive rate).
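For the first two questions, note that the probability equals 1/2 exactly when the log odds are 0, so the required glucose level solves \(\beta_0 + \beta_1 x = 0\), giving \(x = -\beta_0/\beta_1\). The sketch below assumes the rounded estimates printed by `tidy()`; the helper `glucose_for_prob()` is hypothetical (not part of the activity) and uses base R's `qlogis()` (logit) and `plogis()` (inverse logit).

```r
# Rounded estimates from tidy(fitted_logistic_model)
b0 <- -5.61
b1 <- 0.0392

# Hypothetical helper: solve b0 + b1 * x = logit(p) for the glucose value x
glucose_for_prob <- function(p, b0, b1) {
  (qlogis(p) - b0) / b1
}

# Glucose level at which the estimated probability of diabetes is 0.5
x_half <- glucose_for_prob(0.5, b0, b1)
x_half

# Check: plugging 143.11 back in gives a probability of (almost exactly) 1/2
p_check <- plogis(b0 + b1 * 143.11)
p_check
```

Because the helper takes any target probability, the same call with `p = 0.75` would give the glucose level at which the estimated probability of diabetes reaches 75%.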
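The threshold classifier and its evaluation can be sketched in base R as follows. The vectors `truth` and `prob_pos` are hypothetical toy values standing in for the test-set outcomes and the predicted probabilities of diabetes; with the fitted model, those probabilities would come from `predict(fitted_logistic_model, new_data = db_test, type = "prob")` (the `.pred_pos` column). The helper `classify_and_evaluate()` is also hypothetical.

```r
# Hypothetical toy data standing in for test-set outcomes and predicted P(pos)
truth <- factor(c("neg", "neg", "pos", "pos", "neg", "pos", "neg", "pos"),
                levels = c("neg", "pos"))
prob_pos <- c(0.10, 0.62, 0.81, 0.45, 0.30, 0.90, 0.58, 0.70)

# Hypothetical helper: classify with a probability threshold, then evaluate
classify_and_evaluate <- function(truth, prob_pos, threshold = 0.5) {
  # Negative if P(pos) < threshold, positive otherwise
  pred <- factor(ifelse(prob_pos < threshold, "neg", "pos"),
                 levels = c("neg", "pos"))
  # Confusion matrix: rows are predictions, columns are true classes
  cm <- table(Prediction = pred, Truth = truth)
  tp <- cm["pos", "pos"]
  tn <- cm["neg", "neg"]
  fp <- cm["pos", "neg"]
  fn <- cm["neg", "pos"]
  list(
    confusion   = cm,
    accuracy    = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),  # true positive rate
    specificity = tn / (tn + fp),  # true negative rate
    ppv         = tp / (tp + fp)   # positive predictive value
  )
}

metrics <- classify_and_evaluate(truth, prob_pos, threshold = 0.5)
metrics
```

yardstick (loaded with tidymodels) offers `conf_mat()`, `accuracy()`, `sens()`, `spec()`, and `ppv()` for the same quantities. Since `neg` is the first factor level here, `sens()`, `spec()`, and `ppv()` need `event_level = "second"` so that `pos` is treated as the event.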
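Finding the threshold closest to the top-left corner can be sketched the same way: for each candidate threshold, compute sensitivity and specificity, then keep the threshold whose ROC point \((1 - \text{specificity}, \text{sensitivity})\) has the smallest Euclidean distance to \((0, 1)\). The toy vectors and the helper `roc_closest_topleft()` are hypothetical. In the tidymodels workflow, yardstick's `roc_curve()` returns `.threshold`, `sensitivity`, and `specificity` columns to which the same distance can be applied, and `autoplot()` draws the curve.

```r
# Hypothetical toy data: true classes and predicted P(diabetes = "pos")
truth <- factor(c("neg", "neg", "pos", "pos", "neg", "pos", "neg", "pos"),
                levels = c("neg", "pos"))
prob_pos <- c(0.10, 0.62, 0.81, 0.45, 0.30, 0.90, 0.58, 0.70)

# Hypothetical helper: scan thresholds and pick the ROC point nearest (0, 1)
roc_closest_topleft <- function(truth, prob_pos, positive = "pos") {
  is_pos <- truth == positive
  # Candidate thresholds: every distinct predicted probability, plus Inf
  # (Inf classifies everyone as negative, giving the ROC point (0, 0))
  thresholds <- c(sort(unique(prob_pos)), Inf)
  # Classify as positive when P(pos) >= threshold
  sens <- sapply(thresholds, function(t) mean(prob_pos[is_pos] >= t))
  spec <- sapply(thresholds, function(t) mean(prob_pos[!is_pos] < t))
  # Distance from the ROC point (1 - spec, sens) to the top-left corner (0, 1)
  dist <- sqrt((1 - spec)^2 + (1 - sens)^2)
  roc <- data.frame(threshold = thresholds, sensitivity = sens,
                    specificity = spec, distance = dist)
  list(roc = roc, best = roc[which.min(dist), ])
}

roc_res <- roc_closest_topleft(truth, prob_pos)
roc_res$best

# Plot the ROC curve, the chance diagonal, and the chosen point
plot(1 - roc_res$roc$specificity, roc_res$roc$sensitivity, type = "b",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "1 - specificity", ylab = "sensitivity")
abline(0, 1, lty = 2)
points(1 - roc_res$best$specificity, roc_res$best$sensitivity,
       pch = 19, col = "red")
```

Distance to the corner is one common criterion; another is to maximize sensitivity + specificity - 1 (Youden's J), which can select a different threshold on the same curve.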