DA Homework 2 - SOLUTION
Task 0
Load dplyr and the flights data into R (you should install the nycflights13 package first).
library(dplyr)
library(nycflights13)
data(flights)
Look at the description of the flights data here in the flights section.
You can find a great tutorial about dplyr here. I highly recommend it.
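The flights data ships with the nycflights13 package, so the load step fails when the package is missing. A minimal sketch of a guarded setup, assuming a CRAN mirror is reachable and that installing into the default library is acceptable:

```r
# Install nycflights13 only if it is not already available
# (assumption: a CRAN mirror is configured or can be chosen interactively).
if (!requireNamespace("nycflights13", quietly = TRUE)) {
  install.packages("nycflights13")
}

library(dplyr)
library(nycflights13)

# Once the package is attached, flights can be used directly;
# glimpse() lists its columns and their types.
glimpse(flights)
```

Running this once before the tasks should make every pipeline below find the flights object.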
Task 1
What was the average departure delay of the flights?
flights %>%
  summarise(mean(dep_delay, na.rm = TRUE))
Task 2
What was the average departure delay of the flights landing in Anchorage (largest city in Alaska)?
flights %>%
  filter(dest == 'ANC') %>%
  summarise(mean(dep_delay, na.rm = TRUE))
Task 3
Where did the plane with the largest departure delay fly?
flights %>%
  filter(dep_delay == max(dep_delay, na.rm = TRUE)) %>%
  select(dest)
Another solution using arrange():
flights %>%
  arrange(-dep_delay) %>%
  select(dest) %>%
  head(1)
Task 4
What is the average departure delay by carrier? Which carrier delays the most? Which does the least?
flights %>%
  group_by(carrier) %>%
  summarise(avg_delay = mean(dep_delay, na.rm = TRUE)) %>%
  arrange(-avg_delay)
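The sorted table answers both questions by reading off its first and last rows. As a sketch, the two extremes can also be pulled out directly and labelled with full airline names, assuming dplyr 1.0.0 or later (for slice_max() and slice_min()) and using the airlines lookup table that ships with nycflights13. The name carrier_delay is introduced here only for illustration:

```r
library(dplyr)
library(nycflights13)

# Average departure delay per carrier, joined to the full airline name.
carrier_delay <- flights %>%
  group_by(carrier) %>%
  summarise(avg_delay = mean(dep_delay, na.rm = TRUE)) %>%
  left_join(airlines, by = "carrier")

# Carrier with the largest average departure delay.
carrier_delay %>% slice_max(avg_delay, n = 1)

# Carrier with the smallest average departure delay.
carrier_delay %>% slice_min(avg_delay, n = 1)
```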
Task 5
From which airport do the most planes fly to Boston?
flights %>%
  filter(dest == "BOS") %>%
  group_by(origin) %>%
  summarise(n = n()) %>%
  arrange(-n)
Another solution using count():
flights %>%
  filter(dest == "BOS") %>%
  count(origin) %>%
  arrange(-n)
Task 6
Give the destinations of the flights with the largest arrival delays by month.
flights %>%
  group_by(month) %>%
  filter(arr_delay == max(arr_delay, na.rm = TRUE)) %>%
  arrange(month) %>%
  select(dest)
Another solution using top_n():
flights %>%
  group_by(month) %>%
  top_n(1, arr_delay) %>%
  arrange(month) %>%
  select(month, dest, arr_delay)
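Since dplyr 1.0.0, top_n() has been superseded by slice_max(). A sketch of the same query in that style, assuming dplyr 1.0.0 or later. The name worst_by_month is introduced here only for illustration:

```r
library(dplyr)
library(nycflights13)

# For each month, keep the flight(s) with the largest arrival delay.
# Missing arr_delay values are sorted to the end, so they are not picked
# while a month has at least one non-missing value.
worst_by_month <- flights %>%
  group_by(month) %>%
  slice_max(arr_delay, n = 1) %>%  # ties are kept by default (with_ties = TRUE)
  ungroup() %>%
  arrange(month) %>%
  select(month, dest, arr_delay)

worst_by_month
```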
Task 7
Get the destinations of the three planes which arrived with the largest arrival delay relative to the distance.
flights %>%
  mutate(relative_delay = arr_delay / distance) %>%
  arrange(desc(relative_delay)) %>%
  select(tailnum, dest, relative_delay) %>%
  head(3)
Task 8
Which of the airports is associated with the largest departure delays on average? Is this ordering the same for each month?
flights %>%
  group_by(origin) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE)) %>%
  arrange(desc(delay))
flights %>%
  group_by(origin, month) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE)) %>%
  group_by(month) %>%
  mutate(max_delay = max(delay)) %>%
  filter(delay == max_delay) %>%
  select(month, origin) %>%
  arrange(month)
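The monthly pipeline reports only the airport with the largest average delay in each month. To compare the full ordering of the airports across months, one possible sketch (assuming dplyr 1.0.0 or later for the .groups argument) writes each month's ranking as a single string and counts how many months share it. The names monthly_order and ordering are introduced here only for illustration:

```r
library(dplyr)
library(nycflights13)

# Average departure delay per month and origin, then one string per month
# listing the origins from largest to smallest average delay.
monthly_order <- flights %>%
  group_by(month, origin) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop") %>%
  arrange(month, desc(delay)) %>%
  group_by(month) %>%
  summarise(ordering = paste(origin, collapse = " > "))

monthly_order

# One row per distinct ordering, with the number of months that show it.
monthly_order %>% count(ordering, sort = TRUE)
```

If the count table has a single row, the ordering is the same in every month; more than one row means it changes.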
Task +1
Watch this video and collect 3 positive (or negative) points about the presentation.