STAT 220
Some common scale functions:
scale_fill_manual(): Manually define the fill colors for different categories.
scale_fill_brewer(): Use color palettes from the ColorBrewer library.
scale_color_viridis(): Use the viridis color scale for continuous data.
scale_shape_manual(): Manually define the shapes for different categories.
scale_x_log10(): Transform the x-axis to a logarithmic scale.
scale_y_reverse(): Reverse the direction of the y-axis.
scale_color_gradient(): Define a custom color gradient for continuous data.
scale_fill_discrete(): Use a predefined color palette for discrete data.

Let’s make Lake #1 steelblue and Lake #2 maroon
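A minimal sketch of that recoloring. The `lakes` data frame below is made up for illustration (the real depth data are not shown here), and the level names "Lake #1" and "Lake #2" are an assumption. Naming the entries of `values` ties each color to a level explicitly instead of relying on level order.

```r
library(ggplot2)

# Hypothetical stand-in data: 50 simulated depths per lake
set.seed(220)
lakes <- data.frame(
  Depth    = c(rnorm(50, mean = 20, sd = 3), rnorm(50, mean = 39, sd = 3)),
  Location = rep(c("Lake #1", "Lake #2"), each = 50)
)

# Named values match colors to levels by name, not by position
lake_hist <- ggplot(lakes, aes(x = Depth, fill = Location)) +
  geom_histogram(binwidth = 1) +
  scale_fill_manual(values = c("Lake #1" = "steelblue", "Lake #2" = "maroon"))
```

Printing `lake_hist` draws the histogram.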
Theme: The non-data ink on your plots
Examples:

See the basic ggplot2 themes and the ggthemes package for more options

library(ggplot2)
library(ggthemes)  # provides theme_solarized()
ggplot(data) +
  geom_histogram(
    aes(x = Depth, fill = Location),
    binwidth = 1,
    color = "lightblue") +
  scale_fill_manual(values = c("steelblue", "maroon")) +  # one fill color per lake
  theme_solarized() +
  theme(legend.position = "none") +  # text labels below replace the legend
  annotate("text", x = 20, y = 15, label = "Lake #1", color = "steelblue") +
  annotate("text", x = 39, y = 15, label = "Lake #2", color = "maroon")

By default, ggplot2 uses a Cartesian coordinate system, but there are others available!
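As a quick illustration, a coordinate system is added like any other layer. This sketch uses the `mpg` data set that ships with ggplot2 (a choice made here, not part of the lake example) and swaps in a few alternatives to the Cartesian default.

```r
library(ggplot2)

# A plain bar chart on the default Cartesian coordinates
base <- ggplot(mpg, aes(x = class)) +
  geom_bar()

flipped <- base + coord_flip()   # horizontal bars
polar   <- base + coord_polar()  # same bars wrapped around a circle

# coord_cartesian() zooms in without dropping any rows from the data
zoomed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  coord_cartesian(xlim = c(2, 4))
```

Print `flipped`, `polar`, or `zoomed` to see each version.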
coord_cartesian: Adjusts the x and y axis limits without modifying the data.
coord_equal: Ensures equal scaling for the x and y axes.
coord_fixed: Sets a fixed aspect ratio for the plot.
coord_flip: Flips the x and y axes.
coord_map: Projects the plot onto a map projection.
coord_polar: Transforms the plot to a polar coordinate system.
coord_quickmap: Provides an approximation for a map projection.
coord_sf: Designed for use with sf objects (spatial data).
coord_trans: Transforms the plot’s x and y axes using specified transformations.

The ggplot2 package provides latitude and longitude data that define geographic boundaries
Available maps include state, usa, world, county
See ?map_data or ?maps for more regions (may need to install maps)

Rows: 15,537
Columns: 6
$ long      <dbl> -87.46201, -87.48493, -87.52503, -87.53076, -87.57087, -87.5…
$ lat       <dbl> 30.38968, 30.37249, 30.37249, 30.33239, 30.32665, 30.32665, …
$ group     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ order     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ region    <chr> "alabama", "alabama", "alabama", "alabama", "alabama", "alab…
$ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
A set of latitude longitude points…
… that are connected with lines in a very specific order.
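One way to see both steps, sketched under the assumption that the boundary data come from `map_data("state")` and are stored as `states` (the maps package must be installed). The first plot shows only the points; the second joins them in row order within each `group`.

```r
library(ggplot2)

# map_data() relies on the maps package being installed
states <- map_data("state")

# The boundary points on their own
points_plot <- ggplot(states, aes(x = long, y = lat)) +
  geom_point(size = 0.2)

# The same points connected in row order, one path per group
outline_plot <- ggplot(states, aes(x = long, y = lat, group = group)) +
  geom_path()
```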
Add other geographic information by adding geometric layers to the plot
Add non-geographic information by altering the fill color for each state
Use geom = "polygon" to treat states as solid shapes to add color
Incorporate numeric information using color shade or intensity
Incorporate categorical information using color hue
geom_polygon connects the dots between lat (y) and long (x) points within a given group. It also connects the end point back to the start point, which closes the shape so that it can be filled.
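A toy sketch of the same idea without any map data. The two squares and their `value` column are made up for illustration. Each `group` becomes its own closed shape, and `fill` shades it by a numeric variable.

```r
library(ggplot2)

# Two hypothetical "regions", each a unit square listed corner by corner
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  value = rep(c(10, 1000), each = 4)  # made-up numeric variable
)

# group keeps the squares separate; fill shades each one by value
filled <- ggplot(squares, aes(x = long, y = lat, group = group, fill = value)) +
  geom_polygon(color = "white") +
  scale_fill_viridis_c() +
  coord_equal()
```

Without `group = group`, all eight corners would be joined into a single polygon.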
Rows: 51
Columns: 7
$ State                                  <chr> "Alabama", "Alaska", "Arizona",…
$ `7-day avg. cases`                     <int> 0, 0, 0, 0, 128, 0, 0, 13, 0, 0…
$ `7-day avg. deaths`                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ Cases                                  <chr> "1,659,936", "287,319", "2,486,…
$ Deaths                                 <chr> "21,138", "1,457", "29,852", "1…
$ `7-day avg. hospitalizations`          <int> 30, 4, 152, 30, 377, 92, 20, 83…
$ `7-day avg. hospitalizations per 100k` <dbl> 0.6, 0.6, 2.0, 1.0, 1.0, 1.0, 0…
Source: https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/
We need to add the covid info to the state polygon data set
# A tibble: 15,537 × 12
   long   lat group order region  subregion x7_day_avg_cases x7_day_avg_deaths
  <dbl> <dbl> <dbl> <int> <chr>   <chr>                <int>             <int>
1 -87.5  30.4     1     1 alabama <NA>                     0                 0
2 -87.5  30.4     1     2 alabama <NA>                     0                 0
3 -87.5  30.4     1     3 alabama <NA>                     0                 0
4 -87.5  30.3     1     4 alabama <NA>                     0                 0
5 -87.6  30.3     1     5 alabama <NA>                     0                 0
# ℹ 15,532 more rows
# ℹ 4 more variables: cases <dbl>, deaths <dbl>,
#   x7_day_avg_hospitalizations <int>,
#   x7_day_avg_hospitalizations_per_100k <dbl>
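A rough sketch of one way to do that join, using tiny stand-in data frames rather than the real ones. The names `states_toy`, `covid_toy`, and `covid_clean` are hypothetical, and the values are copied from the output shown above. The key steps are lower-casing the state name so it matches `region` and turning the comma-formatted counts into numbers.

```r
library(dplyr)

# Stand-in for the state polygon data: the first three Alabama rows
states_toy <- data.frame(
  long   = c(-87.46201, -87.48493, -87.52503),
  lat    = c(30.38968, 30.37249, 30.37249),
  group  = 1,
  order  = 1:3,
  region = "alabama"
)

# Stand-in for the covid table: counts stored as text with commas
covid_toy <- data.frame(
  State = c("Alabama", "Alaska"),
  Cases = c("1,659,936", "287,319")
)

# Lower-case the key and strip commas so cases becomes numeric
covid_clean <- covid_toy %>%
  mutate(region = tolower(State),
         cases  = as.numeric(gsub(",", "", Cases))) %>%
  select(region, cases)

# Every polygon row picks up its state's covid values
covid_data_toy <- left_join(states_toy, covid_clean, by = "region")
```

`left_join()` keeps every polygon row, so each boundary point carries its state's case count and is ready for `fill = cases`.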

library(viridis)
ggplot(covid_data) + 
  geom_polygon(aes(long, lat, group = group, fill = cases)) +
  scale_fill_viridis_c(option = "viridis", 
                       trans = "log10", 
                       labels = scales::comma,
                       guide = guide_colorbar(title.position = "top")) +
  labs(fill = "cases", title = "COVID-19 Cases by State") +
  coord_map() + theme_map() +
  theme(legend.position = "right")

A choropleth map uses color or shading of subregions to visualize data. It displays divided geographical areas or regions that are colored in relation to a numeric variable.
ACS <- read.csv("https://raw.githubusercontent.com/deepbas/statdatasets/main/ACS.csv")
ACS <- dplyr::filter(ACS, !(region  %in% c("Alaska", "Hawaii"))) # only 48+D.C.
ACS$region <- tolower(ACS$region)  # lower case (match states regions)
glimpse(ACS)

Rows: 49
Columns: 8
$ X              <int> 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18,…
$ region         <chr> "alabama", "arizona", "arkansas", "california", "colora…
$ PopSize        <int> 4841164, 6728577, 2968472, 38654206, 5359295, 3588570, …
$ MedianAge      <dbl> 38.6, 37.1, 37.7, 36.0, 36.4, 40.6, 39.6, 33.8, 41.6, 3…
$ PercentFemale  <dbl> 51.5, 50.3, 50.9, 50.3, 49.8, 51.2, 51.6, 52.6, 51.1, 5…
$ BornInState    <int> 3387845, 2623391, 1823628, 21194542, 2294446, 1981427, …
$ MedianIncome   <int> 23527, 26565, 22787, 27772, 31325, 34124, 30648, 41160,…
$ PercentInState <dbl> 69.98, 38.99, 61.43, 54.83, 42.81, 55.21, 45.49, 36.72,…
Don’t need to merge ACS and states data!
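One way to skip the merge is `geom_map()`, which matches a `map_id` aesthetic in the data to the region names in a separate map data frame. That this is the intended approach is an assumption. The sketch below uses toy stand-ins: `toy_map` plays the role of the states polygons and `toy_vals` the role of ACS, and both are made up.

```r
library(ggplot2)

# Stand-in polygon data: two square "states" with a region column
toy_map <- data.frame(
  long   = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1),
  group  = rep(c(1, 2), each = 4),
  region = rep(c("east", "west"), each = 4)
)

# Stand-in one-row-per-region data (the role ACS plays above)
toy_vals <- data.frame(
  region = c("east", "west"),
  value  = c(40, 70)
)

# map_id links each data row to the polygons with the same region name
no_merge <- ggplot(toy_vals) +
  geom_map(aes(map_id = region, fill = value), map = toy_map) +
  expand_limits(x = toy_map$long, y = toy_map$lat)
```

`expand_limits()` is needed because the plotted data has no x or y columns of its own, so the axes have to be sized from the map.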
