STAT 220
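The cat() or writeLines() function displays the raw contents of a string, with escape sequences such as \n and \" interpreted; print() instead shows the escaped form, as the string would be typed in R code.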
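A small sketch of the difference, using a made-up string: print() shows the escaped form, while writeLines() and cat() show the characters themselves.

```r
# A made-up string containing a tab (\t), a newline (\n), and escaped quotes (\")
x <- "a\tb\n\"quoted\""

print(x)       # escaped form, as typed in R code: [1] "a\tb\n\"quoted\""
writeLines(x)  # raw contents: a<TAB>b on one line, "quoted" on the next
cat(x, "\n")   # also raw contents; cat() adds no trailing newline by itself
```

The string holds 12 characters: the tab, the newline and each quote count as one character each, even though they take two keystrokes to type.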
Definition: pulling apart a piece of text (a string) in order to do something with it
The most common tasks in string processing include:
Regular expressions (regexes) are a language for describing patterns in strings.
All functions in the stringr package start with str_ and take the string as the first argument
stringr cheatsheet
str_length()
tells you how many characters are in each entry of a character vector
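A minimal sketch, assuming the stringr package is installed; the fruit names are made up for illustration.

```r
library(stringr)

# Made-up entries; the third contains a space, the fourth is missing
fruits <- c("apple", "kiwi", "banana split", NA)
str_length(fruits)  # 5 4 12 NA (the space counts as a character)
```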
str_count()
counts the number of non-overlapping matches of a pattern in each entry of a character vector
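A quick check of what "non-overlapping" means, on made-up strings: in "aaaa" the pattern "aa" is counted twice (characters 1 to 2 and 3 to 4), not three times.

```r
library(stringr)

s <- c("banana", "aaaa", "cherry")
str_count(s, "a")   # 3 4 0
str_count(s, "aa")  # 0 2 0: "aaaa" holds two non-overlapping "aa", not three
```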
str_glue()
interpolates values that have been assigned to names in R into a string: each name written inside {} is evaluated and its value is pasted in
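A sketch using two values from the murders table later in these notes; the names state, total, states, totals and per_state are illustrative only. Vectors inside {} are recycled, so one string is produced per element.

```r
library(stringr)

# Values taken from the murders table; the variable names are illustrative
state <- "Alaska"
total <- 59
msg <- str_glue("{state} had {total} murders.")
msg  # Alaska had 59 murders.

# Vectors are recycled element by element, giving one string per element
states <- c("Alabama", "Alaska")
totals <- c(348, 59)
per_state <- str_glue("{states}: {totals}")
per_state  # Alabama: 348, then Alaska: 59
```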
str_sub()
Extracts and replaces substrings of a character vector by character position
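A sketch on three state names. Positions are 1-based and inclusive, negative positions count from the end of the string, and assigning into str_sub() replaces the selected characters.

```r
library(stringr)

x <- c("Alabama", "Alaska", "Arizona")
first3 <- str_sub(x, 1, 3)    # "Ala" "Ala" "Ari"
last2  <- str_sub(x, -2, -1)  # "ma" "ka" "na"

# Replacement form: assigning into str_sub() overwrites the selected characters
str_sub(x, 1, 1) <- "*"
x  # "*labama" "*laska" "*rizona"
```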
ca11-yourusername repository from GitHub
15:00
| = "or"
\\n to match a newline character
\\s to match white space characters (spaces, tabs, and newlines)
\\w to match word characters (letters, numbers, and the underscore _)
[:alnum:] to match alphanumeric characters (letters and numbers only)
\\d or [:digit:] to represent digits (numbers)
Click here for extensive lists
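A sketch of these classes with str_detect(), a stringr function (not described on these slides) that returns TRUE or FALSE for each entry; the test strings are made up. The backslash is doubled because the pattern is written as an R string. The last two lines show where \\w and [:alnum:] differ.

```r
library(stringr)

# Made-up test strings; the fourth contains a real newline character
x <- c("abc", "a b", "123", "line1\nline2", "___")

str_detect(x, "\\s")        # FALSE  TRUE FALSE  TRUE FALSE
str_detect(x, "\\d")        # FALSE FALSE  TRUE  TRUE FALSE
str_detect(x, "[:digit:]")  # same result as "\\d"
str_detect(x, "\\w")        # TRUE for all five: "_" counts as a word character
str_detect(x, "[:alnum:]")  # TRUE TRUE TRUE TRUE FALSE: no letters or digits in "___"
```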
^ = start of a string
$ = end of a string
. = any character (except a newline)
* = matches the preceding character zero or more times
+ = matches the preceding character one or more times
? = matches the preceding character at most once (i.e. optionally)
Try more regexes here
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
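A sketch of anchors and quantifiers using str_detect() and str_subset() (a stringr function that keeps only the matching entries); the strings are made up for illustration. Anchoring both ends with ^ and $ forces the whole string to match.

```r
library(stringr)

x <- c("apple", "banana", "pineapple", "ac", "abc", "abbc")

str_detect(x, "^a")  # starts with "a": TRUE FALSE FALSE TRUE TRUE TRUE
str_detect(x, "e$")  # ends with "e":   TRUE FALSE TRUE FALSE FALSE FALSE

# With ^ and $ at both ends, the whole string must match the pattern
str_subset(x, "^a.c$")   # "abc": exactly one character between "a" and "c"
str_subset(x, "^ab*c$")  # "ac" "abc" "abbc": zero or more b's
str_subset(x, "^ab+c$")  # "abc" "abbc": one or more b's
str_subset(x, "^ab?c$")  # "ac" "abc": zero or one b
```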
str_extract()
Extracts just the (first) part of each string that matches the specified regex, instead of the entire entry
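A sketch contrasting str_extract() with str_extract_all() on made-up strings: str_extract() keeps only the first match in each entry and returns NA when there is none, while str_extract_all() returns every match in a list.

```r
library(stringr)

x <- c("Pop: 4,853,875", "Total: 348 murders", "none")

str_extract(x, "\\d+")      # "4" "348" NA: first run of digits only
str_extract(x, "[\\d,]+")   # "4,853,875" "348" NA: digits and commas together
str_extract_all(x, "\\d+")  # list: c("4", "853", "875"), "348", character(0)
```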
str_split()
splits a string into a list or matrix of pieces based on a supplied pattern
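A sketch on two made-up comma-separated records built from values in the murders table: by default each input string becomes one element of a list, simplify = TRUE returns a character matrix instead, and n caps the number of pieces.

```r
library(stringr)

x <- c("Alabama,4853875,348", "Alaska,737709,59")

parts <- str_split(x, ",")               # list with one character vector per string
m <- str_split(x, ",", simplify = TRUE)  # 2 x 3 character matrix, one row per string
str_split("a-b-c", "-", n = 2)           # at most two pieces: "a" and "b-c"
```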
str_replace()
Replaces the first instance of the detected pattern with a specified string; str_replace_all() replaces every instance.
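A sketch of the difference on Alabama's population as it is stored in the murders table below; removing every comma is the step the murders example relies on.

```r
library(stringr)

x <- "4,853,875"
str_replace(x, ",", "")      # "4853,875": only the first comma is removed
str_replace_all(x, ",", "")  # "4853875": every comma is removed
```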
# A tibble: 51 × 4
state population total murder_rate
<chr> <chr> <chr> <dbl>
1 Alabama 4,853,875 348 7.2
2 Alaska 737,709 59 8
3 Arizona 6,817,565 309 4.5
4 Arkansas 2,977,853 181 6.1
5 California 38,993,940 1,861 4.8
6 Colorado 5,448,819 176 3.2
7 Connecticut 3,584,730 117 3.3
8 Delaware 944,076 63 6.7
9 District of Columbia 670,377 162 24.2
10 Florida 20,244,914 1,041 5.1
# ℹ 41 more rows
murders %>%
mutate(population = str_replace_all(population, ",", ""),
total = str_replace_all(total, ",", ""))
# A tibble: 51 × 4
state population total murder_rate
<chr> <chr> <chr> <dbl>
1 Alabama 4853875 348 7.2
2 Alaska 737709 59 8
3 Arizona 6817565 309 4.5
4 Arkansas 2977853 181 6.1
5 California 38993940 1861 4.8
6 Colorado 5448819 176 3.2
7 Connecticut 3584730 117 3.3
8 Delaware 944076 63 6.7
9 District of Columbia 670377 162 24.2
10 Florida 20244914 1041 5.1
# ℹ 41 more rows
murders %>%
mutate(population = str_replace_all(population, ",", ""),
total = str_replace_all(total, ",", "")) %>%
mutate_at(vars(2:3), as.double)
# A tibble: 51 × 4
state population total murder_rate
<chr> <dbl> <dbl> <dbl>
1 Alabama 4853875 348 7.2
2 Alaska 737709 59 8
3 Arizona 6817565 309 4.5
4 Arkansas 2977853 181 6.1
5 California 38993940 1861 4.8
6 Colorado 5448819 176 3.2
7 Connecticut 3584730 117 3.3
8 Delaware 944076 63 6.7
9 District of Columbia 670377 162 24.2
10 Florida 20244914 1041 5.1
# ℹ 41 more rows
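The two steps above can also be written as a single mutate() call. This is a sketch, assuming dplyr 1.0 or later: across() is dplyr's current replacement for the superseded mutate_at(), and str_remove_all(x, ",") is stringr shorthand for str_replace_all(x, ",", ""). Only three rows of the table are rebuilt here (murders_small is a hypothetical name) so the example runs on its own.

```r
library(dplyr)
library(stringr)

# Three rows copied from the table above; numbers stored as text with commas
murders_small <- tibble(
  state = c("Alabama", "Alaska", "California"),
  population = c("4,853,875", "737,709", "38,993,940"),
  total = c("348", "59", "1,861"),
  murder_rate = c(7.2, 8, 4.8)
)

# Remove the commas, then convert to numbers, for both columns at once
clean <- murders_small %>%
  mutate(across(c(population, total), ~ as.double(str_remove_all(.x, ","))))
clean
```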
15:00