Homework 2
SAME AS PRACTICE EXERCISES OF THE LAST CLASS
TASK 6 IS UPDATED - THANKS TO ISTVAN FOR THE NOTICE
Due 7 October 24:00. Send the hw2_<your-last-name> file to divenyi.janos@phd.ceu.edu.
Task 0
Download the purchases.csv from the data section. Load it into R.
Task 1
Give the mean and the median of the individual purchases.
Task 2
Tell R that your purchase_date variable is a date. You can do this by applying
the as.Date() function to the original variable (similar to how we can use
as.character()). Then you can get the median day of the purchases.
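As a minimal sketch of the conversion described above, the following uses a small made-up data frame rather than purchases.csv. The date format "%Y-%m-%d" is an assumption; the real file may store dates differently, in which case the format argument needs to change.

```r
# Toy data standing in for purchases; only purchase_date is a name from the task.
toy <- data.frame(
  purchase_date = c("2013-01-15", "2013-03-02", "2014-07-30"),
  stringsAsFactors = FALSE
)

# Convert the character column to Date (the format string is an assumption).
toy$purchase_date <- as.Date(toy$purchase_date, format = "%Y-%m-%d")

# median() works on Date vectors and returns a Date.
median_day <- median(toy$purchase_date)
print(class(toy$purchase_date))
print(median_day)
```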
Task 3
List the 5 biggest buyers along with their aggregate purchases.
Task 4
Plot the distributions of log sales amounts for the two years separately. For this you should have the year variable as a factor.
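One possible way to draw two distributions by a factor variable is sketched below on simulated data. The column names year and amount are hypothetical, and geom_density() is only one option; a histogram with facets would serve the same purpose.

```r
library(ggplot2)

# Simulated sales amounts for two years (hypothetical column names).
set.seed(42)
toy <- data.frame(
  year   = rep(c(2013, 2014), each = 50),
  amount = c(rlnorm(50, meanlog = 3, sdlog = 1),
             rlnorm(50, meanlog = 3.5, sdlog = 1))
)

# A numeric year would be treated as continuous, so convert it to a factor.
toy$year <- factor(toy$year)

# One semi-transparent density curve per year, on the log scale.
p <- ggplot(toy, aes(x = log(amount), fill = year)) +
  geom_density(alpha = 0.5)
```

Printing p draws the plot.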
Task 5
List the number of buyers in each month by year. (Hint: you might need tidyr
for accomplishing this).
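The tidyr hint can be illustrated with a rough sketch on toy data: count distinct buyers per year and month with dplyr, then spread the years into columns. The contact column is a hypothetical buyer identifier, and pivot_wider() assumes tidyr 1.0.0 or newer (older versions offer spread() for the same reshaping).

```r
library(dplyr)
library(tidyr)

# Toy purchases; contact is a hypothetical buyer identifier.
toy <- data.frame(
  contact = c("a", "b", "a", "c", "b", "c"),
  purchase_date = as.Date(c("2013-01-05", "2013-01-20", "2013-02-11",
                            "2014-01-03", "2014-01-09", "2014-02-17"))
)

buyers_by_month <- toy %>%
  mutate(
    year  = format(purchase_date, "%Y"),
    month = format(purchase_date, "%m")
  ) %>%
  group_by(year, month) %>%
  # number of distinct buyers in each year-month cell
  summarise(buyers = n_distinct(contact), .groups = "drop") %>%
  # one row per month, one column per year
  pivot_wider(names_from = year, values_from = buyers)

print(buyers_by_month)
```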
Task 6
What share of total sales in 2013 comes from the top 5 buyers in 2013? You may
want to aggregate sales by contact first and then to use the cumsum() function
to calculate cumulative sums.
Hint: this table is an intermediate state you may want to achieve (Task 10 refers to it as middle and uses its cumulative_sales and all_sales columns).
[table missing]
The correct answer you should get is (from the previous table)
[value missing]
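The aggregate-then-cumsum() idea can be sketched on toy data as follows. The names contact, amount, and middle_toy are hypothetical; cumulative_sales and all_sales mirror the column names used in Task 10. The sketch computes the share of the top 2 buyers rather than the top 5, and omits the filtering to a single year.

```r
library(dplyr)

# Toy purchases; contact and amount are hypothetical column names.
toy <- data.frame(
  contact = c("a", "b", "c", "a", "b", "d"),
  amount  = c(50, 30, 10, 20, 10, 5)
)

middle_toy <- toy %>%
  group_by(contact) %>%
  summarise(sales = sum(amount)) %>%   # aggregate sales per buyer
  arrange(desc(sales)) %>%             # biggest buyers first
  mutate(
    cumulative_sales = cumsum(sales),  # running total down the ranking
    all_sales        = sum(sales)      # grand total, repeated on every row
  )

# Share of total sales coming from the top 2 buyers.
top2_share <- middle_toy$cumulative_sales[2] / middle_toy$all_sales[2]
print(top2_share)
```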
Task 7
Plot the aggregate daily sales (you should combine dplyr and ggplot statements).
Note that you should have purchase_date as date instead of factor or character.
Add a smoothed line to the plot (you can experiment with the span option of
geom_smooth() to control the smoothness of your line).
Default: [plot missing]
With span = 0.2: [plot missing]
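A minimal sketch of combining a dplyr aggregation with a ggplot call and a smoother is shown below on simulated data. The amount column is hypothetical, and method = "loess" is set explicitly because the span argument only affects loess fits.

```r
library(dplyr)
library(ggplot2)

# Simulated purchases: two per day over 60 days (amount is a hypothetical name).
set.seed(1)
toy <- data.frame(
  purchase_date = rep(seq(as.Date("2013-01-01"), by = "day", length.out = 60),
                      each = 2),
  amount = runif(120, min = 10, max = 100)
)

# Aggregate to one row per day.
daily <- toy %>%
  group_by(purchase_date) %>%
  summarise(sales = sum(amount))

# A smaller span gives a wigglier line; drop the argument for the default.
p <- ggplot(daily, aes(x = purchase_date, y = sales)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.2)
```

The aggregation can also be piped straight into ggplot() instead of being stored in daily first.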
Task 8
Which month brings the most sales? Plot a bar graph with aggregate sales per
month. Look at the documentation of geom_bar() to solve this. Note the labels
of the x axis (the documentation helps to reproduce them).
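As a sketch of a bar graph built from pre-aggregated values, the example below uses toy data and stat = "identity" so that bar heights come from the sales column instead of row counts. The amount column is hypothetical, and using month.abb as factor levels is just one assumed way to get ordered month labels on the x axis.

```r
library(dplyr)
library(ggplot2)

# Toy purchases (amount is a hypothetical column name).
toy <- data.frame(
  purchase_date = as.Date(c("2013-01-05", "2013-01-20", "2013-02-11",
                            "2014-02-17", "2014-03-09")),
  amount = c(100, 50, 80, 40, 60)
)

monthly <- toy %>%
  mutate(
    # month name as an ordered factor so the bars appear in calendar order
    month = factor(month.abb[as.integer(format(purchase_date, "%m"))],
                   levels = month.abb)
  ) %>%
  group_by(month) %>%
  summarise(sales = sum(amount))

# stat = "identity" tells geom_bar() to use the y values as bar heights.
p <- ggplot(monthly, aes(x = month, y = sales)) +
  geom_bar(stat = "identity")
```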
Task 9
Recreate the previous graph by drawing the columns separately for the years (map the year variable to the fill colour and see the examples in the documentation to achieve side-by-side bars).
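Side-by-side bars can be sketched on toy data by mapping a factor to fill and setting position = "dodge". The amount column is hypothetical, and the month labels follow the same assumed month.abb approach as a plain monthly bar graph would.

```r
library(dplyr)
library(ggplot2)

# Toy purchases (amount is a hypothetical column name).
toy <- data.frame(
  purchase_date = as.Date(c("2013-01-05", "2013-02-11", "2014-01-09",
                            "2014-02-17", "2014-02-20")),
  amount = c(100, 80, 70, 40, 60)
)

by_year_month <- toy %>%
  mutate(
    year  = factor(format(purchase_date, "%Y")),
    month = factor(month.abb[as.integer(format(purchase_date, "%m"))],
                   levels = month.abb)
  ) %>%
  group_by(year, month) %>%
  summarise(sales = sum(amount), .groups = "drop")

# position = "dodge" places the bars of each year next to each other
# instead of stacking them.
p <- ggplot(by_year_month, aes(x = month, y = sales, fill = year)) +
  geom_bar(stat = "identity", position = "dodge")
```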
Task 10
Plot a graph showing what share of all sales the top x% of buyers are
responsible for. For example, a point at x = 0.5, y = 0.8 would mean that 80%
of all sales come from the top 50% of buyers. (Hint: use your intermediate
data frame from Task 6.)
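library(dplyr)
library(ggplot2)

# middle: the intermediate data frame from Task 6, assumed sorted by sales (largest first)
middle %>%
  mutate(
    # share of all sales covered by the buyers up to this row
    cumulative_sales_share = cumulative_sales / all_sales,
    # share of all buyers up to this row
    cumulative_buyer_share = row_number() / n()
  ) %>%
  ggplot(aes(x = cumulative_buyer_share, y = cumulative_sales_share)) +
  geom_line(size = 2)  # on ggplot2 >= 3.4.0, linewidth = 2 avoids a deprecation warning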