STAT 220
Data science is the application of computational and statistical techniques to gain insight into a real-world problem.
\[ \begin{align*} \text{Data Science} &= \text{scientific inquiry } +\\ & \quad \text{ data collection } +\\ & \quad \text{ data processing } +\\ & \quad \text{ visualization } +\\ & \quad \text{ statistics } +\\ & \quad \text{ machine learning } +\\ & \quad \text{ communication } \end{align*} \]
Image adapted from work of Joe Blitzstein, Hanspeter Pfister, and Hadley Wickham
Focus on the “soup to nuts” approach to problem solving
.Rmd documents during class
https://stat220-spring24.netlify.app
“R is written by statisticians, for statisticians,” — Norm Matloff, Author of The Art of R Programming, Prof. of Computer Science, UC Davis
Advantages of R over Python:
print(), plot(), summary()
help() and example() functions are much more informative than Python’s counterparts.
R Markdown (.Rmd) integrates:
An .Rmd file produces various output formats.
.Rmd, .r, .csv, etc.
R Markdown enhances the workflow by integrating executable code with narrative text, making your data science projects reproducible and collaborative.
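To make the code-plus-narrative idea concrete, here is a minimal sketch that writes a tiny .Rmd file and renders it to HTML. The file name `demo.Rmd`, its title, and its contents are illustrative assumptions, not course materials; the render step is skipped if the rmarkdown package or pandoc is not available.

```r
# Build a small .Rmd document: YAML header, narrative text, one R chunk.
# All names and contents here are hypothetical, for illustration only.
fence <- strrep("`", 3)  # the three backticks that open/close a code chunk
rmd_lines <- c(
  "---",
  "title: \"Demo report\"",
  "output: html_document",
  "---",
  "",
  "Narrative text sits next to executable code.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Write the document to a temporary location
rmd_path <- file.path(tempdir(), "demo.Rmd")
writeLines(rmd_lines, rmd_path)

# Render only when rmarkdown and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_path <- rmarkdown::render(rmd_path,
                                output_format = "html_document",
                                quiet = TRUE)
  print(out_path)  # path to the rendered HTML file
}
```

Changing `output_format` (for example to `"word_document"`) is one way the same source file can yield a different output format.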
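library(dplyr)      # filter() and the %>% pipe
library(ggplot2)    # plotting functions
library(babynames)  # US baby name data by year and sex

your_name <- "Dee"
# Keep only the rows for the chosen name
your_name_data <- babynames %>% filter(name == your_name)

# Plot the proportion of babies given this name over time, one line per sex
ggplot(data = your_name_data, aes(x = year, y = prop)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_line(aes(colour = sex), linewidth = 1) +  # linewidth needs ggplot2 >= 3.4.0
  scale_color_brewer(palette = "Set1") +
  labs(x = "Year",
       y = stringr::str_c("Prop. of Babies Named ", your_name),
       title = stringr::str_c("Trends in Names: ", your_name))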