DA Homework 4 - SOLUTION
library(readr)
library(dplyr)
library(ggplot2)
Task 1
Download the civil-rights-act.csv
from the Data section (it might make sense
to look at the description file as well). Using the data answer the questions
below (run regressions).
act <- read_csv('~/teaching/BME_adat/201617/data/civil-rights-act.csv')
## Error: '~/teaching/BME_adat/201617/data/civil-rights-act.csv' does not exist.
- Which of the two parties (democrats and republicans) were more
supportive for the Civil Rights Act? First, look at a scatter plot. It might
help to experiment with
jitter
to see the points better. (Hint: look at this.)
act %>% ggplot(aes(x = party, y = vote)) + geom_point()
## Error in eval(lhs, parent, parent): object 'act' not found
act %>% ggplot(aes(x = party, y = vote)) + geom_jitter()
## Error in eval(lhs, parent, parent): object 'act' not found
- Run a regression of the dummy of voting on the dummy of
party (note that
R
is automatically going to use a character variable in a regression as a dummy.). What are the shares of the two parties who voted for the act?
lm(data = act, vote ~ party)
## Error in is.data.frame(data): object 'act' not found
61.3% of Democrats, and 80.2% (61.3+18.9) of Republicans voted for the act.
- Which states (northern or southern) were more supportive for the act? What are the shares of the representatives in the two groups of states who voted for the act?
lm(data = act, vote ~ state)
## Error in is.data.frame(data): object 'act' not found
Northern states were more supportive for the act: almost 90% of the representatives voted for the act, whereas only 7.8% did from the southern states.
- When controlling for the state which party were more supportive for the act? How does it compare to what you found in task b)? How could you explain the difference?
lm(data = act, vote ~ party + state)
## Error in is.data.frame(data): object 'act' not found
If we control for state, Republicans are less supportive for the act. This is quite the opposite than what we found in task b). This could be explained by compositional effect (similar to what we found with Berkeley and discrimination). Southern states are less supportive for the act, and they are also more Democrat, whereas northern states are more Republican. When not controlling for state, we get that Republicans vote for the act. However, if we compare Republicans and Democrates from the same state, we see that they are less likely to vote for it.
act %>%
group_by(state) %>%
count(party)
## Error in eval(lhs, parent, parent): object 'act' not found
Task 2
Use the easyshare_sample.csv
for this task. This is a sample of the easyshare
project (you can read more about this here).
The Survey of Health, Ageing and Retirement in Europe is a multidisciplinary
panel survey targeting individuals above 50 years. Hungary participated once,
in the 4th wave. The data is about the Hungarian sample with only a few
variables: lm_status
is originally called as ep005_
, mbirth
as dn002_mod
.
You can read about the variables here.
The recall variables contain scores from a simple memory test: 10 simple words
are listed to the participants of the survey which they should repeat once
immediately (recall_1
) and once with some delay (recall_2
). The sum of these
two numbers (ranging from 0 to 20) form a great measure for the memory of the elderly.
- Know your data. Look at summaries, strange values, and try to clean them.
(Hint: negative values usually mean missing values, turn them into
NA
-s.)
easyshare <- read_csv('~/teaching/BME_adat/201617/data/easyshare_sample.csv')
## Error: '~/teaching/BME_adat/201617/data/easyshare_sample.csv' does not exist.
summary(easyshare)
## Error in summary(easyshare): object 'easyshare' not found
easyshare <- easyshare %>% mutate_all(funs(ifelse(. < 0, NA, .)))
## Error in eval(lhs, parent, parent): object 'easyshare' not found
summary(easyshare)
## Error in summary(easyshare): object 'easyshare' not found
- Create a new dummy variable which takes one for those who do not work (for whom
lm_status
not equal to 2). Create a new variable which is the total word recall.
easyshare <- easyshare %>%
mutate(
notworking = as.numeric(lm_status != 2),
twr = recall_1 + recall_2
)
## Error in eval(lhs, parent, parent): object 'easyshare' not found
- Look at whether the memory of those who are not working are worse than those who do. Run a regression which answers this question and interpret the coefficients.
lm(twr ~ notworking, easyshare)
## Error in is.data.frame(data): object 'easyshare' not found
Comparing two elderly, those who do not work recall on average 2.6 words less than those who do.
- Is there any other variable you may want to include the regression to get closer to the answer of the question whether working preserves ones memory? Include it in the regression and interpret your results.
Elderly who do not work are usually older, and might be less educated (educated people typically work longer). Older and less educated people might have worse memory anyway, independently of whether they actually work. Let’s control for these variables as well.
lm(twr ~ notworking + age + eduyears, easyshare)
## Error in is.data.frame(data): object 'easyshare' not found
Comparing two elderly with the same age and education, one who works and someone else who do not, the latter recalls on average a half word less. This difference is much smaller than what we found in the previous exercise.
Task 3
Collect the data you would like to use for your term project. Make a plot, or run a regression using that data, and interpret your results.
Task +1
Watch this video and collect 3 positive (or negative) points about the presentation.