STAT 220
Some common scale functions:
scale_fill_manual(): Manually define the fill colors for different categories.
scale_fill_brewer(): Use color palettes from the ColorBrewer library.
scale_color_viridis(): Use the viridis color scale for continuous data.
scale_shape_manual(): Manually define the shapes for different categories.
scale_x_log10(): Transform the x-axis to a logarithmic scale.
scale_y_reverse(): Reverse the direction of the y-axis.
scale_color_gradient(): Define a custom color gradient for continuous data.
scale_fill_discrete(): Use a predefined color palette for discrete data.
Let’s make Lake #1 steelblue and Lake #2 maroon
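A minimal sketch of one way to pin colors to categories, assuming a small made-up data frame `lakes` (hypothetical, not the course data). Naming the entries of `values` ties each color to a specific level, so the mapping does not depend on the order of the factor levels.

```r
library(ggplot2)

# Hypothetical stand-in for the lake depth data (values are made up).
lakes <- data.frame(
  Depth    = c(18, 19, 20, 21, 22, 37, 38, 39, 40, 41),
  Location = rep(c("Lake #1", "Lake #2"), each = 5)
)

# Named vector: each color is matched to a level by name, not by position.
lake_colors <- c("Lake #1" = "steelblue", "Lake #2" = "maroon")

p <- ggplot(lakes) +
  geom_histogram(aes(x = Depth, fill = Location), binwidth = 1) +
  scale_fill_manual(values = lake_colors)
p
```

An unnamed vector, as in `c("steelblue", "maroon")`, also works, but colors are then assigned in the order of the levels.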
Theme: The non-data ink on your plots
Examples: the basic ggplot2 themes and the themes in ggthemes
library(ggthemes)
ggplot(data) +
geom_histogram(
aes(x = Depth, fill = Location),
binwidth = 1,
color = "lightblue") +
scale_fill_manual(values = c("steelblue", "maroon")) +
theme_solarized() +
theme(legend.position = "none") +
annotate("text", x = 20, y = 15, label = "Lake #1", color = "steelblue") +
annotate("text", x = 39, y = 15, label = "Lake #2", color = "maroon")
By default, ggplot2 uses a Cartesian coordinate system, but there are others available!
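As a rough illustration of what swapping the coordinate system does, here is a sketch on a small made-up data frame `d` (hypothetical). The data and the geom stay the same; only the `coord_*()` layer changes.

```r
library(ggplot2)

# Small made-up data set, used only for illustration.
d <- data.frame(x = 1:10, y = (1:10)^2)

base <- ggplot(d, aes(x, y)) + geom_point()

# Zoom in on part of the x-axis; no data are dropped.
zoomed <- base + coord_cartesian(xlim = c(2, 6))

# Swap the x and y axes.
flipped <- base + coord_flip()

# Draw the same points in polar coordinates.
polar <- base + coord_polar()
```

Printing `zoomed`, `flipped`, or `polar` draws the corresponding plot.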
coord_cartesian: Adjusts the x and y axis limits without modifying the data.
coord_equal: Ensures equal scaling for the x and y axes.
coord_fixed: Sets a fixed aspect ratio for the plot.
coord_flip: Flips the x and y axes.
coord_map: Projects the plot onto a map projection.
coord_polar: Transforms the plot to a polar coordinate system.
coord_quickmap: Provides an approximation for a map projection.
coord_sf: Designed for use with sf objects (spatial data).
coord_trans: Transforms the plot’s x and y axes using specified transformations.
The ggplot2 package provides latitude and longitude data (via map_data(), which draws on the maps package) to define geographic boundaries
Some regions: state, usa, world, county
See ?map_data or ?maps for more regions (may need to install maps)
Rows: 15,537
Columns: 6
$ long <dbl> -87.46201, -87.48493, -87.52503, -87.53076, -87.57087, -87.5…
$ lat <dbl> 30.38968, 30.37249, 30.37249, 30.33239, 30.32665, 30.32665, …
$ group <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ region <chr> "alabama", "alabama", "alabama", "alabama", "alabama", "alab…
$ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
A set of latitude-longitude points that are connected with lines in a very specific order.
Add other geographic information by adding geometric layers to the plot
Add non-geographic information by altering the fill color for each state
Use geom = "polygon" to treat states as solid shapes to add color
Incorporate numeric information using color shade or intensity
Incorporate categorical information using color hue
geom_polygon connects the dots between lat (y) and long (x) points in a given group. It also connects the start and end points, which allows you to fill a closed polygon shape.
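The role of `group` and row order is easiest to see on a toy example. The sketch below uses a made-up data frame `shapes` (hypothetical) laid out with the same column names as the map data: one square and one triangle.

```r
library(ggplot2)

# Two made-up "regions": a square and a triangle, in map_data()-style columns.
shapes <- data.frame(
  long   = c(0, 1, 1, 0,   2, 3, 2.5),
  lat    = c(0, 0, 1, 1,   0, 0, 1),
  group  = c(1, 1, 1, 1,   2, 2, 2),
  order  = c(1, 2, 3, 4,   1, 2, 3),
  region = c(rep("square", 4), rep("triangle", 3))
)

# group keeps each shape separate; points are joined in row order,
# and the last point is connected back to the first to close the shape.
p <- ggplot(shapes) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = region),
               color = "black") +
  coord_fixed()
p
```

Dropping `group = group` or shuffling the rows would connect the points in the wrong sequence and distort the shapes.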
Rows: 51
Columns: 7
$ State <chr> "Alabama", "Alaska", "Arizona",…
$ `7-day avg. cases` <int> 0, 0, 0, 0, 128, 0, 0, 13, 0, 0…
$ `7-day avg. deaths` <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ Cases <chr> "1,659,936", "287,319", "2,486,…
$ Deaths <chr> "21,138", "1,457", "29,852", "1…
$ `7-day avg. hospitalizations` <int> 30, 4, 152, 30, 377, 92, 20, 83…
$ `7-day avg. hospitalizations per 100k` <dbl> 0.6, 0.6, 2.0, 1.0, 1.0, 1.0, 0…
Source: https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/
We need to add the covid info to the state polygon data set
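One way this merge might be done is sketched below on tiny made-up stand-ins (`states_toy` and `covid_toy` are hypothetical, and their values are invented). The key steps are lower-casing the state names so they match `region`, converting the comma-formatted counts to numbers, and joining on `region`.

```r
library(dplyr)

# Tiny made-up stand-ins for the two tables (not the real data).
states_toy <- tibble(
  long   = c(-87.5, -87.6, -112.1),
  lat    = c(30.4, 30.3, 33.4),
  group  = c(1, 1, 2),
  region = c("alabama", "alabama", "arizona")
)
covid_toy <- tibble(
  State = c("Alabama", "Arizona"),
  Cases = c("1,000", "2,500")   # character with commas, as in the source table
)

covid_clean <- covid_toy %>%
  mutate(
    region = tolower(State),                   # match the lower-case map names
    cases  = as.numeric(gsub(",", "", Cases))  # "1,000" -> 1000
  ) %>%
  select(region, cases)

# Every polygon point keeps its row and gains the state-level value.
joined <- left_join(states_toy, covid_clean, by = "region")
joined
```

A left join keeps all rows of the polygon data, so each state's value is repeated for every point on its boundary.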
# A tibble: 15,537 × 12
long lat group order region subregion x7_day_avg_cases x7_day_avg_deaths
<dbl> <dbl> <dbl> <int> <chr> <chr> <int> <int>
1 -87.5 30.4 1 1 alabama <NA> 0 0
2 -87.5 30.4 1 2 alabama <NA> 0 0
3 -87.5 30.4 1 3 alabama <NA> 0 0
4 -87.5 30.3 1 4 alabama <NA> 0 0
5 -87.6 30.3 1 5 alabama <NA> 0 0
# ℹ 15,532 more rows
# ℹ 4 more variables: cases <dbl>, deaths <dbl>,
# x7_day_avg_hospitalizations <int>,
# x7_day_avg_hospitalizations_per_100k <dbl>
library(viridis)
ggplot(covid_data) +
geom_polygon(aes(long, lat, group = group, fill = cases)) +
scale_fill_viridis_c(option = "viridis",
trans = "log10",
labels = scales::comma,
guide = guide_colorbar(title.position = "top")) +
labs(fill = "cases", title = "COVID-19 Cases by State") +
coord_map() + theme_map() +
theme(legend.position="right")
Uses color or shading of subregions to visualize data. Displays divided geographical areas or regions that are colored in relation to a numeric variable.
ACS <- read.csv("https://raw.githubusercontent.com/deepbas/statdatasets/main/ACS.csv")
ACS <- dplyr::filter(ACS, !(region %in% c("Alaska", "Hawaii"))) # only 48+D.C.
ACS$region <- tolower(ACS$region) # lower case (match states regions)
glimpse(ACS)
Rows: 49
Columns: 8
$ X <int> 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18,…
$ region <chr> "alabama", "arizona", "arkansas", "california", "colora…
$ PopSize <int> 4841164, 6728577, 2968472, 38654206, 5359295, 3588570, …
$ MedianAge <dbl> 38.6, 37.1, 37.7, 36.0, 36.4, 40.6, 39.6, 33.8, 41.6, 3…
$ PercentFemale <dbl> 51.5, 50.3, 50.9, 50.3, 49.8, 51.2, 51.6, 52.6, 51.1, 5…
$ BornInState <int> 3387845, 2623391, 1823628, 21194542, 2294446, 1981427, …
$ MedianIncome <int> 23527, 26565, 22787, 27772, 31325, 34124, 30648, 41160,…
$ PercentInState <dbl> 69.98, 38.99, 61.43, 54.83, 42.81, 55.21, 45.49, 36.72,…
Don’t need to merge ACS and states data!
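A sketch of how the merge can be avoided, assuming `geom_map()` is the intended tool (the slide does not name it). `geom_map()` takes the outlines through its `map` argument and matches them to the data by `map_id`. The data frames `toy_map` and `toy_values` are made up for illustration.

```r
library(ggplot2)

# Made-up outlines in the long/lat/region layout that geom_map() accepts.
toy_map <- data.frame(
  long   = c(0, 1, 1, 0,   1, 2, 2, 1),
  lat    = c(0, 0, 1, 1,   0, 0, 1, 1),
  region = rep(c("west", "east"), each = 4)
)

# One row per region, like ACS (values are invented).
toy_values <- data.frame(
  region         = c("west", "east"),
  PercentInState = c(40, 70)
)

p <- ggplot(toy_values) +
  # map_id links each data row to the outline with the same region name.
  geom_map(aes(map_id = region, fill = PercentInState), map = toy_map) +
  # geom_map() sets no x/y limits itself, so take them from the outlines.
  expand_limits(x = toy_map$long, y = toy_map$lat)
p
```

With the real data, the same pattern would use `ACS` as the plot data and the state polygon data as `map`, matched on the lower-cased `region` column.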