STAT 220
Data science is the application of computational and statistical techniques to gain insight into real-world problems.
\[ \begin{align*} \text{Data Science} &= \text{scientific inquiry } +\\ & \quad \text{ data collection } +\\ & \quad \text{ data processing } +\\ & \quad \text{ visualization } +\\ & \quad \text{ statistics } +\\ & \quad \text{ machine learning } +\\ & \quad \text{ communication } \end{align*} \]
Image adapted from the work of Joe Blitzstein, Hanspeter Pfister, and Hadley Wickham
Focus on the end-to-end (“soup to nuts”) approach to problem solving
Rate my professor reviews
.Rmd documents during class
https://stat220-spring24.netlify.app
“R is written by statisticians, for statisticians,” — Norm Matloff, Author of The Art of R Programming, Prof. of Computer Science, UC Davis
Advantages of R over Python:
print(), plot(), summary(), help(), and example() functions are much more informative than Python’s counterparts.
R Markdown (.Rmd) integrates executable code, its output, and narrative text.
A single .Rmd file can produce various output formats (e.g., HTML, PDF, Word).
.Rmd, .r, .csv, etc.
R Markdown improves the workflow by combining executable code with narrative text in one document, which makes data science projects reproducible and easier to collaborate on.
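For illustration, a minimal .Rmd file might look like the sketch below. The title, the output format, and the chunk contents are placeholder assumptions, not course materials. The YAML header between the `---` lines sets the output format, plain text is rendered as narrative, and the code inside the ```{r} chunk is run when the document is knitted.

````markdown
---
title: "My First Analysis"
output: html_document
---

Some narrative text explaining what the analysis does.

```{r}
# Hypothetical example chunk: summarize R's built-in cars dataset
summary(cars)
```
````

Knitting the file, for example with `rmarkdown::render("example.Rmd")` (the file name is assumed), runs the chunk and writes an HTML document that contains the text, the code, and its printed output.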
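library(tidyverse)   # provides %>%, filter(), and ggplot()
library(babynames)   # US baby name counts and proportions by year and sex
your_name <- "Dee"
# Keep only the rows for the chosen name
your_name_data <- babynames %>% filter(name == your_name)
# Plot the proportion of babies given the name each year, with lines colored by sex
ggplot(data = your_name_data, aes(x = year, y = prop)) +
  geom_point(size = 3, alpha = 0.6) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_line(aes(colour = sex), linewidth = 1) +
  scale_color_brewer(palette = "Set1") +
  labs(x = "Year",
       y = stringr::str_c("Prop. of Babies Named ", your_name),
       title = stringr::str_c("Trends in Names: ", your_name))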