Homework 2

SAME AS PRACTICE EXERCISES OF THE LAST CLASS

TASK 6 IS UPDATED - THANKS TO ISTVAN FOR THE NOTICE

Due 7 October 24:00. Send the hw2_<your-last-name> file to divenyi.janos@phd.ceu.edu.

Task 0

Download the purchases.csv from the data section. Load it into R.
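
A minimal sketch of the loading step. It writes a small toy file first so that it runs anywhere; the column names `contact` and `amount` are assumptions, so check the real purchases.csv for the actual ones.

```r
# Toy stand-in for purchases.csv; the column names are assumptions.
toy <- data.frame(
    contact = c("a", "b", "a"),
    purchase_date = c("2013-01-05", "2013-02-10", "2014-03-15"),
    amount = c(10, 25, 40)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# stringsAsFactors = FALSE keeps text columns as character instead of factor
purchases <- read.csv(path, stringsAsFactors = FALSE)
str(purchases)
```

For the homework, replace `path` with the location where you saved the downloaded file.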

Task 1

Give the mean and the median of the individual purchases.
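
One possible way to get both numbers in a single dplyr call, shown on toy data. The column name `amount` is an assumption, not necessarily the name used in purchases.csv.

```r
library(dplyr)

# Toy data; `amount` is a hypothetical column name.
purchases <- data.frame(amount = c(10, 20, 30, 100))

stats <- purchases %>%
    summarise(
        mean_amount = mean(amount),
        median_amount = median(amount)
    )
stats
```

The one large purchase pulls the mean above the median, which is typical for sales data.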

Task 2

Tell R that your purchase_date variable is a date. You can do this by applying the as.Date() function to the original variable (similar to how we can use as.character()). Then you can get the median day of the purchases.
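
A small sketch of the conversion on toy data, assuming the dates are stored as text in the ISO format `YYYY-MM-DD` (other formats need the `format` argument of as.Date()).

```r
# Toy data; the dates are assumed to be ISO-formatted strings.
purchases <- data.frame(
    purchase_date = c("2013-01-01", "2013-01-10", "2013-01-03"),
    stringsAsFactors = FALSE
)

# Convert the character column to Date
purchases$purchase_date <- as.Date(purchases$purchase_date)

# median() works on Date vectors and returns a Date
median_day <- median(purchases$purchase_date)
median_day
```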

Task 3

List the 5 biggest buyers along with their aggregate purchases.
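
A sketch of the usual group, summarise, sort pattern on toy data. The column names `contact` and `amount` are assumptions.

```r
library(dplyr)

# Toy data; `contact` and `amount` are hypothetical column names.
purchases <- data.frame(
    contact = c("a", "b", "c", "d", "e", "f", "a", "b"),
    amount = c(5, 10, 1, 7, 3, 2, 20, 4),
    stringsAsFactors = FALSE
)

top5 <- purchases %>%
    group_by(contact) %>%
    summarise(total = sum(amount)) %>%  # aggregate purchases per buyer
    arrange(desc(total)) %>%            # biggest buyers first
    head(5)
top5
```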

Task 4

Plot the distributions of log sales amounts for the two years separately. For this, the year variable should be a factor.
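
One way this could look, sketched on simulated data. The column names `year` and `amount` are assumptions, and a density plot is only one of several reasonable choices (a histogram would also do).

```r
library(dplyr)
library(ggplot2)

# Simulated data; `year` and `amount` are hypothetical column names.
set.seed(1)
purchases <- data.frame(
    year = rep(c(2013, 2014), each = 50),
    amount = rlnorm(100, meanlog = 3)
)

p <- purchases %>%
    mutate(year = factor(year)) %>%  # a numeric year would not split the plot into groups
    ggplot(aes(x = log(amount), fill = year)) +
    geom_density(alpha = 0.5)
p
```

With `year` left as a number, ggplot treats it as continuous and draws a single density, which is why the conversion to factor matters.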

Task 5

List the number of buyers in each month by year. (Hint: you might need tidyr to accomplish this.)
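
A sketch on toy data of counting distinct buyers per month and then spreading the years into columns. The column name `contact` is an assumption. It uses pivot_wider() from recent tidyr versions; the older spread() function does the same job.

```r
library(dplyr)
library(tidyr)

# Toy data; `contact` is a hypothetical column name.
purchases <- data.frame(
    contact = c("a", "b", "a", "c", "a", "b"),
    purchase_date = as.Date(c(
        "2013-01-05", "2013-01-20", "2013-01-25",
        "2013-02-03", "2014-01-07", "2014-02-11"
    )),
    stringsAsFactors = FALSE
)

buyers <- purchases %>%
    mutate(
        year = format(purchase_date, "%Y"),
        month = format(purchase_date, "%m")
    ) %>%
    group_by(year, month) %>%
    summarise(buyers = n_distinct(contact), .groups = "drop") %>%  # each buyer counted once per month
    pivot_wider(names_from = year, values_from = buyers)            # one column per year
buyers
```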

Task 6

What share of total sales in 2013 comes from the top 5 buyers in 2013? You may want to aggregate sales by contact first and then use the cumsum() function to calculate cumulative sums.

Hint: this table is an intermediate state you may want to achieve.

[table not rendered]

The correct answer you should get is (from the previous table)

[value not rendered]
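
A sketch of the aggregate-then-cumsum idea on toy data. The names `middle`, `cumulative_sales` and `all_sales` follow the code shown in Task 10; the columns `contact`, `year` and `amount`, and the exact layout of the intermediate table, are assumptions.

```r
library(dplyr)

# Toy data; `contact`, `year` and `amount` are hypothetical column names.
purchases <- data.frame(
    contact = c(letters[1:8], "a"),
    year = c(rep(2013, 8), 2014),
    amount = c(40, 25, 15, 8, 5, 4, 2, 1, 1000),
    stringsAsFactors = FALSE
)

middle <- purchases %>%
    filter(year == 2013) %>%
    group_by(contact) %>%
    summarise(sales = sum(amount)) %>%
    arrange(desc(sales)) %>%               # biggest buyers first
    mutate(
        cumulative_sales = cumsum(sales),  # running total down the ranking
        all_sales = sum(sales)
    )

# Share of 2013 sales coming from the top 5 buyers
top5_share <- middle$cumulative_sales[5] / middle$all_sales[5]
top5_share
```

In this toy example the 2013 total is 100 and the top 5 buyers account for 93 of it, so the share is 0.93.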

Task 7

Plot the aggregate daily sales (you should combine dplyr and ggplot statements). Note that you should have purchase_date as date instead of factor or character. Add a smoothed line to the plot (you can experiment with the span option of geom_smooth() to control the smoothness of your line).

Default:

[plot not rendered]

With span = 0.2:

[plot not rendered]
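
A sketch of chaining the dplyr aggregation straight into ggplot, on simulated data. The column name `amount` is an assumption. The method is set to loess explicitly because `span` is an option of the loess smoother.

```r
library(dplyr)
library(ggplot2)

# Simulated data; `amount` is a hypothetical column name.
set.seed(42)
days <- seq(as.Date("2013-01-01"), as.Date("2013-03-31"), by = "day")
purchases <- data.frame(
    purchase_date = sample(days, 300, replace = TRUE),
    amount = rlnorm(300, meanlog = 3)
)

daily <- purchases %>%
    group_by(purchase_date) %>%
    summarise(daily_sales = sum(amount))

p <- ggplot(daily, aes(x = purchase_date, y = daily_sales)) +
    geom_point() +
    # smaller span = wigglier line, larger span = smoother line
    geom_smooth(method = "loess", formula = y ~ x, span = 0.2)
p
```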

Task 8

Which month brings the most sales? Plot a bar graph with aggregate sales per month. Look at the documentation of geom_bar() to solve this. Note the labels of the x axis (the documentation helps you reproduce them).
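
A sketch on toy data. The column name `amount` is an assumption, and labelling the months with the built-in `month.abb` vector is just one way to get readable x axis labels. Since the heights are already aggregated, geom_bar() needs `stat = "identity"`.

```r
library(dplyr)
library(ggplot2)

# Toy data; `amount` is a hypothetical column name.
purchases <- data.frame(
    purchase_date = as.Date(c(
        "2013-01-10", "2013-01-20", "2013-02-05", "2013-03-15", "2014-02-01"
    )),
    amount = c(10, 5, 30, 8, 12)
)

monthly <- purchases %>%
    mutate(
        month_number = as.integer(format(purchase_date, "%m")),
        month = factor(month.abb[month_number], levels = month.abb)  # keeps calendar order
    ) %>%
    group_by(month) %>%
    summarise(sales = sum(amount))

# Month with the highest aggregate sales
best_month <- as.character(monthly$month[which.max(monthly$sales)])

p <- ggplot(monthly, aes(x = month, y = sales)) +
    geom_bar(stat = "identity")  # use the y values as bar heights instead of counting rows
p
```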

Task 9

Recreate the previous graph by drawing the columns separately for the years (map the year variable to a color aesthetic such as fill, and see the examples in the documentation to achieve side-by-side bars).
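
A sketch of side-by-side bars on toy data, assuming a column named `amount` and a year derived from `purchase_date`. Mapping the year to `fill` and setting `position = "dodge"` is one way to do it.

```r
library(dplyr)
library(ggplot2)

# Toy data; `amount` is a hypothetical column name.
purchases <- data.frame(
    purchase_date = as.Date(c(
        "2013-01-10", "2013-01-20", "2013-02-05", "2013-03-15", "2014-02-01"
    )),
    amount = c(10, 5, 30, 8, 12)
)

by_year <- purchases %>%
    mutate(
        year = factor(format(purchase_date, "%Y")),
        month_number = as.integer(format(purchase_date, "%m")),
        month = factor(month.abb[month_number], levels = month.abb)
    ) %>%
    group_by(year, month) %>%
    summarise(sales = sum(amount), .groups = "drop")

p <- ggplot(by_year, aes(x = month, y = sales, fill = year)) +
    # "dodge" places the bars of the two years next to each other instead of stacking them
    geom_bar(stat = "identity", position = "dodge")
p
```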

Task 10

Plot a graph showing what share of all sales the top x% of buyers are responsible for. So a point at x = 0.5, y = 0.8 would mean that 80% of all sales come from the top 50% of buyers. (Hint: use your intermediate dataframe from task 6.)

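library(dplyr)
library(ggplot2)

# `middle` is the intermediate data frame from task 6,
# assumed to be sorted by sales in descending order
middle %>%
    mutate(
        id = 1,  # one row per buyer
        cumulative_sales_share = cumulative_sales/all_sales,
        cumulative_buyer_share = cumsum(id)/n()  # share of buyers covered so far
    ) %>%
    ggplot(aes(x=cumulative_buyer_share, y=cumulative_sales_share)) +
    geom_line(size=2)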