DA Homework 2 - SOLUTION

Task 0

Load dplyr and the flights data into R (you should install the nycflights13 package first).

# install.packages("nycflights13")  # run once if the package is not installed yet
library(dplyr)
library(nycflights13)
data(flights)

Look at the description of the flights data here, in the flights section; in R, ?flights opens the dataset's help page.

You can find a great tutorial about dplyr here. I highly recommend it.

Task 1

What was the average departure delay of the flights?

flights %>%
    summarise(mean(dep_delay, na.rm = TRUE))
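
The na.rm = TRUE argument matters here: dep_delay contains missing values, and mean() returns NA as soon as one input is NA. A minimal sketch on a made-up vector (toy_delays is a hypothetical name, not part of the flights data):

```r
# Toy delays in minutes; the NA stands in for a flight with no recorded delay.
toy_delays <- c(10, -5, NA, 25)

# Without na.rm the single missing value makes the whole result NA.
mean_with_na <- mean(toy_delays)

# With na.rm = TRUE the mean is taken over 10, -5 and 25 only.
mean_without_na <- mean(toy_delays, na.rm = TRUE)
```

The same argument is used in every mean() and max() call in the tasks below for the same reason.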

Task 2

What was the average departure delay of the flights landing in Anchorage (largest city in Alaska)?

flights %>%
    filter(dest == 'ANC') %>%
    summarise(mean(dep_delay, na.rm = TRUE))

Task 3

Where did the plane with the largest departure delay fly?

flights %>%
    filter(dep_delay == max(dep_delay, na.rm = TRUE)) %>%
    select(dest)

Another solution using arrange():

flights %>%
    arrange(-dep_delay) %>%
    select(dest) %>%
    head(1)
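
The two solutions can differ when several flights tie for the largest delay: filter() keeps every tied row, while head(1) keeps only one. A small sketch on hypothetical toy data (the toy tibble and its values are invented for illustration), which also shows slice_max(), available in dplyr 1.0.0 and later, as a third option:

```r
library(dplyr)

# Hypothetical toy data: two flights tie for the largest departure delay.
toy <- tibble(
    dest      = c("HNL", "CMH", "ORD", "SFO"),
    dep_delay = c(1301, 1301, 20, 5)
)

# filter() keeps every row equal to the maximum, so both tied rows survive.
tied <- toy %>%
    filter(dep_delay == max(dep_delay, na.rm = TRUE)) %>%
    select(dest)

# arrange() + head(1) keeps only the first row after sorting.
first_only <- toy %>%
    arrange(desc(dep_delay)) %>%
    select(dest) %>%
    head(1)

# slice_max() keeps ties by default (with_ties = TRUE), like filter().
with_slice <- toy %>%
    slice_max(dep_delay, n = 1) %>%
    select(dest)
```

Which behaviour is preferable depends on whether the question expects a single answer or all flights sharing the maximum.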

Task 4

What is the average departure delay by carrier? Which carrier delays the most? Which does the least?

flights %>%
    group_by(carrier) %>%
    summarise(avg_delay = mean(dep_delay, na.rm = TRUE)) %>%
    arrange(-avg_delay)

Task 5

Which origin airport has the most flights to Boston?

flights %>%
    filter(dest == "BOS") %>%
    group_by(origin) %>%
    summarise(n=n()) %>%
    arrange(-n)

Another solution using count():

flights %>%
    filter(dest == "BOS") %>%
    count(origin) %>%
    arrange(-n)
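
count() can also do the sorting itself through its sort argument, which makes the separate arrange() step optional. A minimal sketch on hypothetical toy data (the rows are invented, not taken from flights):

```r
library(dplyr)

# Hypothetical toy data: three JFK flights to Boston, one each from EWR and LGA.
toy <- tibble(
    origin = c("JFK", "EWR", "JFK", "LGA", "JFK", "JFK"),
    dest   = c("BOS", "BOS", "BOS", "BOS", "ORD", "BOS")
)

# sort = TRUE orders the result by n, largest count first.
boston_counts <- toy %>%
    filter(dest == "BOS") %>%
    count(origin, sort = TRUE)
```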

Task 6

Give the destinations of the flights with the largest arrival delays by month.

flights %>%
    group_by(month) %>%
    filter(arr_delay == max(arr_delay, na.rm = TRUE)) %>%
    arrange(month) %>%
    select(dest)

Another solution using top_n():

flights %>%
    group_by(month) %>%
    top_n(1, arr_delay) %>%
    arrange(month) %>%
    select(month, dest, arr_delay)
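
In dplyr 1.0.0 and later, top_n() is marked as superseded in favour of slice_max(). A possible equivalent of the top_n() solution, sketched on hypothetical toy data (values invented for illustration):

```r
library(dplyr)

# Hypothetical toy data with two months.
toy <- tibble(
    month     = c(1, 1, 2, 2, 2),
    dest      = c("BOS", "ORD", "SFO", "LAX", "MIA"),
    arr_delay = c(30, 120, 5, 45, 15)
)

# Within each month, keep the row with the largest arrival delay.
worst_by_month <- toy %>%
    group_by(month) %>%
    slice_max(arr_delay, n = 1) %>%
    ungroup() %>%
    select(month, dest, arr_delay)
```

On the real data the same pipeline would start from flights instead of toy.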

Task 7

Get the destinations of the three flights that arrived with the largest arrival delay relative to distance.

flights %>%
    mutate(
        # arrival delay in minutes per mile flown
        relative_delay = arr_delay / distance
    ) %>%
    arrange(desc(relative_delay)) %>%
    # the task asks for destinations, so keep dest alongside the plane's tail number
    select(tailnum, dest, relative_delay) %>%
    head(3)

Task 8

Which of the airports is associated with the largest departure delays on average? Is this ordering the same for each month?

flights %>%
    group_by(origin) %>%
    summarise(delay = mean(dep_delay, na.rm=TRUE)) %>%
    arrange(desc(delay))

To check whether this holds in every month, find the airport with the largest average delay per month:

flights %>%
    group_by(origin, month) %>%
    summarise(delay = mean(dep_delay, na.rm=TRUE)) %>%
    group_by(month) %>%
    mutate(max_delay = max(delay)) %>%
    filter(delay == max_delay) %>%
    select(month, origin) %>%
    arrange(month)
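
The pipeline above returns only the top airport per month. To compare the full ordering across months, one option is to rank the airports within each month. A sketch on hypothetical monthly averages (the delay values are made up, and rank is a column name introduced here):

```r
library(dplyr)

# Hypothetical average delays for three airports over two months.
toy <- tibble(
    origin = rep(c("EWR", "JFK", "LGA"), times = 2),
    month  = rep(c(1, 2), each = 3),
    delay  = c(15, 8, 6, 13, 14, 7)
)

# Rank airports within each month, 1 = largest average delay.
ranked <- toy %>%
    group_by(month) %>%
    mutate(rank = min_rank(desc(delay))) %>%
    ungroup() %>%
    arrange(month, rank)
```

In this toy data EWR ranks first in month 1 but second in month 2, which is the kind of change the second question asks about.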

Task +1

Watch this video and collect 3 positive (or negative) points about the presentation.