About

Course overview

Welcome to Introduction to Data Science! This course covers the computational side of data analysis, including data acquisition, management, and visualization tools. Throughout, we will emphasize the principles of data-scientific, reproducible research and dynamic programming, using the R/RStudio ecosystem.

If you have taken Stat 120, 230, or 250 at Carleton, you should be well prepared for the material. However, please refresh your R and R Markdown skills before the class starts. Specifically, I expect all students to be able to load a data set into R, calculate basic summary statistics, and perform basic exploratory data analysis. In the first week of class, we will cover version control with Git and GitHub; prior exposure to these topics is not necessary.

Learning objectives

Develop research questions that can be answered by data.

Import/scrape data into R and reshape it to the form necessary for analysis.

Manipulate common types of data, including numeric, categorical (factors), text, date-time, and geo-location variables, to provide insight into your data and facilitate analysis.

Explore data using both graphical and numeric methods to provide insight and uncover relationships/patterns.

Utilize fundamental programming concepts such as iteration, conditional execution, and functions to streamline your code.

Build, tune, use, and evaluate basic statistical learning models to uncover clusters and classify observations.

Draw informed conclusions from your data and communicate your findings using both written and interactive platforms.