Data Assignment 1 - Importing Data
Instructions
Complete the following examples from ModernDive Chapter 4 - Data Importing and Tidy Data. Before beginning the assignment, create a GitHub repo called `hpam7660_data1`. Then create a new RStudio project and link it to the new GitHub repo. Once you’ve done that, open a new Markdown document and give it a YAML header that includes the title “HPAM 7660 Data Assignment 1”, your name, the date, and “pdf_document” as the output format.
As you answer each of the following questions, be sure to include your R code and associated output in your Markdown document. Additionally, add a line or two describing what you’re doing in each code chunk.
Steps for Completing the Assignment
1. Load the following packages: `dplyr`, `readr`, `tidyr`, `nycflights13`, and `fivethirtyeight`. You may need to install these packages before loading if you haven’t done so already. Note, however, that including `install.packages` commands in Markdown documents can cause problems. So when creating your Markdown file, you’re fine to add `library` commands, but you should install any needed packages from the Console command line.
2. Preview the `drinks` data frame from the `fivethirtyeight` package using one of the methods we covered in Tutorial 2. (Hint: you should avoid using the `View()` command here because Markdown won’t like it.)
3. Using the help file for the `drinks` data frame, define each of the variables in the data.
4. What does it mean for a dataset in R to be “tidy” and how does this relate to the concept of “wide” vs. “long” data?
5. Follow the example provided in ModernDive Section 4.2 and create a new data frame called `drinks_smaller` that subsets the `drinks` data frame and:
   - Includes only data from the U.S., China, Italy, and Saudi Arabia.
   - Excludes the `total_litres_of_pure_alcohol` column.
   - Renames the variables “beer_servings”, “spirit_servings”, and “wine_servings” to “beer”, “spirits”, and “wine”.
6. Is the `drinks_smaller` data frame in “tidy” format? Why or why not?
7. Convert the `drinks_smaller` data frame to “tidy” format. Be sure to describe the commands you’re using for this conversion.
8. Preview the new tidy version of the `drinks_smaller` data frame.
9. Now do the same conversion and preview for the `airline_safety` data frame from the `fivethirtyeight` package. (Hint: First you’ll need to get rid of some columns in the data frame; there’s an example in the chapter. Then you’ll want the tidy data frame to include the variable `fatalities_years`, which indicates the time period, and the variable `count`, which measures the fatality counts.)
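The package loading, previewing, subsetting, and wide-to-long conversion described in the steps above could be sketched roughly as follows. This is a minimal sketch modeled on the ModernDive Section 4.2 example, not the only acceptable answer. The object names `drinks_smaller_tidy` and `airline_safety_tidy` and the column names `type` and `servings` are illustrative choices, and the country label `"USA"` is assumed to match how the `drinks` data records the U.S.

```r
library(dplyr)
library(tidyr)
library(fivethirtyeight)

# Preview the data in a way that works when knitting (no View())
glimpse(drinks)

# Keep four countries, drop one column, and rename the serving columns.
# Assumption: the U.S. is labeled "USA" in drinks$country; run
# unique(drinks$country) to confirm if the result has fewer than 4 rows.
drinks_smaller <- drinks %>%
  filter(country %in% c("USA", "China", "Italy", "Saudi Arabia")) %>%
  select(-total_litres_of_pure_alcohol) %>%
  rename(beer = beer_servings, spirits = spirit_servings, wine = wine_servings)

# Wide to long: every column except country becomes a (type, servings) pair.
# "type" and "servings" are illustrative names for the new columns.
drinks_smaller_tidy <- drinks_smaller %>%
  pivot_longer(cols = -country, names_to = "type", values_to = "servings")

drinks_smaller_tidy

# Same pattern for airline_safety: keep the airline name and the
# fatalities columns, then stack the fatalities columns into two columns.
airline_safety_tidy <- airline_safety %>%
  select(airline, starts_with("fatalities")) %>%
  pivot_longer(cols = -airline, names_to = "fatalities_years", values_to = "count")

glimpse(airline_safety_tidy)
```

In both conversions, `cols` names the columns to stack, `names_to` names the new column that holds the old column names, and `values_to` names the new column that holds the old cell values.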
Thus far, we’ve only loaded data frames that have come bundled in R packages. Oftentimes, you’ll want to load data from other sources (e.g., websites, surveys, etc.) and in other formats (e.g., .csv, .xlsx, etc.). Let’s work through an example of loading data from a .csv file.
10. Load the `dem_score.csv` file from this link and save it in a data frame called `dem_score`. (Hint: refer to ModernDive Section 4.1.1 for code on importing a .csv file.)
11. Preview the data using one of the methods we covered in Tutorial 2. (Hint: you should avoid using the `View()` command here because Markdown won’t like it.)
12. Is this data frame in “tidy” format? If yes, then explain why. If not, then convert the data to “tidy” format.
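The import, preview, and conversion pattern for a .csv file might look like the sketch below. Because the actual file location is not reproduced here, the example writes a small stand-in file to a temporary path. Its country names and scores are invented purely for illustration, and its layout (one row per country, one column per year) is an assumption about how `dem_score.csv` is organized. The names `toy_path`, `dem_score_toy`, `year`, and `democracy_score` are likewise hypothetical.

```r
library(dplyr)
library(readr)
library(tidyr)

# Toy stand-in for dem_score.csv. All values below are made up;
# replace toy_path with the real file location for the assignment.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,1952,1957,1962",
  "Country A,-9,-7,-7",
  "Country B,10,10,10"
), toy_path)

# read_csv() accepts a local path or a URL and returns a tibble
dem_score_toy <- read_csv(toy_path, show_col_types = FALSE)

# Preview without View()
glimpse(dem_score_toy)

# The year columns hold values of one variable spread across columns,
# so stack them into a year column and a score column.
dem_score_toy_tidy <- dem_score_toy %>%
  pivot_longer(
    cols = -country,
    names_to = "year",
    values_to = "democracy_score",
    names_transform = list(year = as.integer)  # column names arrive as text
  )

dem_score_toy_tidy
```

If the real file turns out to have a different layout, the `cols` argument is the part to adjust: it should select exactly the columns whose names are really values of a variable.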
Once you’ve finished Step 12, knit your PDF document and push it to your GitHub repo. Make sure the document shows up in the repo, invite me to the repo, and you’re done!