Data Assignment 1 - Importing Data

Instructions

Complete the following examples from ModernDive Chapter 4 - Data Importing and Tidy Data. Before beginning the assignment, create a GitHub repo called hpam7660_data1. Then create a new RStudio project and link it to the new GitHub repo. Once you’ve done that, open a new Markdown document and give it a YAML header that includes the title “HPAM 7660 Data Assignment 1”, your name, the date, and “pdf_document” as the output format.
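
For reference, a header meeting these requirements might look like the sketch below. The author and date values are placeholders, not prescribed values; replace them with your own.

```yaml
---
title: "HPAM 7660 Data Assignment 1"
author: "Your Name"      # placeholder: use your own name
date: "YYYY-MM-DD"       # placeholder: use the actual date
output: pdf_document
---
```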

As you answer each of the following questions, be sure to include your R code and associated output in your Markdown document. Additionally, add a line or two describing what you’re doing in each code chunk.

Steps for Completing the Assignment

  1. Load the following packages: dplyr, readr, tidyr, nycflights13, and fivethirtyeight. You may need to install these packages before loading them if you haven’t done so already. Note, however, that including install.packages() commands in a Markdown document can cause problems. When creating your Markdown file, it’s fine to add library() commands, but install any needed packages from the Console.

  2. Preview the drinks data frame from the fivethirtyeight package using one of the methods we covered in Tutorial 2. (Hint: you should avoid using the View() command here because Markdown won’t like it.)

  3. Using the help file for the drinks data frame, define each of the variables in the data.

  4. What does it mean for a dataset in R to be “tidy” and how does this relate to the concept of “wide” vs. “long” data?

  5. Follow the example provided in ModernDive Section 4.2 and create a new data frame called drinks_smaller that subsets the drinks data frame and:

    1. Includes only data from the U.S., China, Italy, and Saudi Arabia.
    2. Excludes the total_litres_of_pure_alcohol column.
    3. Renames the variables “beer_servings”, “spirit_servings”, and “wine_servings” to “beer”, “spirits”, and “wine”.
  6. Is the drinks_smaller data frame in “tidy” format? Why or why not?

  7. Convert the drinks_smaller data frame to “tidy” format. Be sure to describe the commands you’re using for this conversion.

  8. Preview the new tidy version of the drinks_smaller data frame.

  9. Now do the same conversion and preview for the airline_safety data frame from the fivethirtyeight package. (Hint: First you’ll need to drop some columns from the data frame, as in the chapter’s example. The tidy data frame should then include the variable fatalities_years, which indicates the time period, and the variable count, which measures the fatality counts.)
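
As a rough illustration of the wide-to-long conversion used in the steps above, here is a minimal sketch on a made-up data frame. The data frame `wide`, its values, and the new column names `type` and `servings` are assumptions chosen for illustration; they are not the assignment data.

```r
library(dplyr)
library(tidyr)

# Toy data in "wide" format; the values are made up for illustration.
wide <- tibble(
  country = c("A", "B"),
  beer    = c(10, 20),
  wine    = c(5, 15)
)

# Gather every column except `country` into name/value pairs,
# producing one row per country-and-type combination.
long <- wide %>%
  pivot_longer(
    cols      = -country,
    names_to  = "type",
    values_to = "servings"
  )

# Preview the result without View(), which does not work when knitting.
glimpse(long)
```

Here `names_to` holds the former column names and `values_to` holds the cell values, so the two-row wide table becomes a four-row long table.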

Thus far, we’ve only loaded data frames that have come bundled in R packages. Oftentimes, you’ll want to load data from other sources (e.g., websites, surveys, etc.) and in other formats (e.g., .csv, .xlsx, etc.). Let’s work through an example of loading data from a .csv file.
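
The general pattern for reading a .csv file with readr can be sketched as follows. To keep the example self-contained, it writes a tiny made-up file to a temporary location first; the file contents and the name `example_scores` are assumptions for illustration only. In the assignment you would pass the path or URL of the real file instead.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file so the example runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("country,1952,1957", "A,-9,-7", "B,10,10"), csv_path)

# read_csv() accepts a local file path or a URL and returns a tibble.
example_scores <- read_csv(csv_path)

# Preview the first rows.
head(example_scores)
```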

  10. Load the dem_score.csv file from this link and save it in a data frame called dem_score. (Hint: refer to ModernDive Section 4.1.1 for code on importing a .csv file.)

  11. Preview the data using one of the methods we covered in Tutorial 2. (Hint: you should avoid using the View() command here because Markdown won’t like it.)

  12. Is this data frame in “tidy” format? If yes, then explain why. If not, then convert the data to “tidy” format.

  13. Once you’ve finished Step 12, knit your PDF document and push it to your GitHub repo. Make sure the document shows up in the repo, invite me to the repo, and you’re done!