Reference no: EM13856479
Instruction: The problem sets are designed to be difficult and very time-intensive, so plan ahead. The problem sets consist of solving theoretical problems and analyzing real data. You may discuss the questions with your classmates, but you are required to hand in your own independently written solutions. For problems that require you to use Stata, submit independently written do-files and log-files. No late work will be accepted, and I do NOT accept electronic copies. All the data necessary for the problem set is available under UBlearns.
Important: It is extremely important to write a clean well-commented program for transparency and replication purposes. In any empirical work, you should always be able to reproduce your result from raw data to support your claim.
What to hand in: A typed write-up answering the assigned questions and interpreting your findings, plus the do-file and log-file for problems that require you to use Stata. For questions involving data analysis, you will NOT get any credit if you do not provide your program code. You may NOT use Excel.
1. Suppose the following equation describes the relationship between the average number of classes missed during a semester (missed) and the distance from school (distance, measured in miles) (Total 4 points):
missed = 3 + 0.2 distance
a. Sketch this line, being sure to label the axes. How do you interpret the intercept in this equation?
b. What is the average number of classes missed for someone who lives five miles away?
c. What is the difference in the average number of classes missed for someone who lives 10 miles away and someone who lives 20 miles away?
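A quick way to check the arithmetic in parts b and c is to code the line directly. This is only a sketch in Python; the function name expected_missed is hypothetical and not part of the assignment.

```python
def expected_missed(distance_miles: float) -> float:
    """Average classes missed per semester, from missed = 3 + 0.2 * distance."""
    return 3 + 0.2 * distance_miles


# Part b: someone living five miles away.
at_five_miles = expected_missed(5)

# Part c: difference between living 20 miles away and 10 miles away.
gap_10_vs_20 = expected_missed(20) - expected_missed(10)

print(at_five_miles, gap_10_vs_20)
```

The gap in part c depends only on the slope, so it equals 0.2 times the 10-mile difference in distance.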
2. Use COLLDIS.dta for this problem. A detailed description of the data is given in COLLDIS_Description.pdf. This contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. In this exercise, you will use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student's high school to the nearest four-year college. (Proximity to college lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.)
a. Run a regression of years of completed education (ed) on distance to the nearest college (dist), where dist is measured in tens of miles. (For example, dist = 2 means that the distance is 20 miles.) What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How does the average value of years of completed schooling change when colleges are built close to where students go to high school?
b. Bob's high school was 20 miles from the nearest college. Predict Bob's years of completed education using the estimated regression. How would the prediction change if Bob lived 10 miles from the nearest college?
c. If the distance is measured in kilometers instead, what are your new estimates, and how do you interpret the result?
d. Beware the omitted variable. List five possible omitted variables. Are they all measurable? [Hint: Omitted variables from the regression may or may not be measurable by econometricians.]
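For part c, it may help to see how rescaling a regressor affects OLS estimates. The sketch below is in Python rather than Stata and uses made-up toy numbers, NOT values from COLLDIS.dta; the helper ols and the constant KM_PER_TEN_MILES are assumptions introduced for illustration.

```python
def ols(x, y):
    """Return (intercept, slope) of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy data (not from COLLDIS.dta).
dist_tens_of_miles = [0.5, 1.0, 2.0, 3.0, 4.5]
ed = [15.0, 14.5, 14.0, 13.5, 13.0]

# 1 mile = 1.609344 km, so one unit of dist (10 miles) is 16.09344 km.
KM_PER_TEN_MILES = 16.09344
dist_km = [d * KM_PER_TEN_MILES for d in dist_tens_of_miles]

b0_miles, b1_miles = ols(dist_tens_of_miles, ed)
b0_km, b1_km = ols(dist_km, ed)

print(b0_miles, b1_miles)
print(b0_km, b1_km)
```

In this toy example the intercept is unchanged and the slope is divided by the conversion factor, so the fitted values are identical under either unit.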
3. Use CPS08.dta for this problem. A detailed description of the data is given in CPS08_Description.pdf. In this exercise, you will investigate the relationship between a worker's age and earnings. (Generally, older workers have more job experience, leading to higher productivity and earnings.)
a. Report the mean, median, and standard deviation of workers' age and earnings.
b. Run a regression of average hourly earnings (AHE) on age (Age). What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How much do earnings increase as workers age by 1 year?
c. Bob is a 26-year-old worker. Predict Bob's earnings using the estimated regression. Alexis is a 30-year-old worker. Predict Alexis's earnings using the estimated regression.
d. Does age account for a large fraction of the variance in earnings across individuals? Why?
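Part d is a question about the regression R-squared, which Stata reports in the regress output. As a reminder of what that number measures, here is a minimal Python sketch on made-up toy data (NOT values from CPS08.dta); the function name r_squared is hypothetical.

```python
def r_squared(x, y):
    """Fraction of the sample variance of y explained by a simple regression on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # In a one-regressor model, R^2 equals the squared sample correlation.
    return sxy ** 2 / (sxx * syy)


# Hypothetical toy data (not from CPS08.dta).
age = [25, 26, 28, 30, 31, 33, 34]
ahe = [14.0, 21.5, 16.0, 25.0, 15.5, 19.0, 24.0]

toy_r2 = r_squared(age, ahe)
print(round(toy_r2, 3))
```

A value near 0 means the regressor explains little of the variation in the outcome; a value near 1 means it explains most of it.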
4. Battery packs in electric go-carts need to last a fairly long time. The run-time (time until it needs to be recharged) of the battery packs made by a particular company is Normally distributed with a mean of 2 hours and a standard deviation of 20 minutes.
a. What percentage of these battery packs last longer than 3 hours? Show your work.
b. What is the third quartile for the run-time distribution? Show your work.
c. Battery packs that have a run-time in the highest 10% of the run-time distribution are highly sought after by go-cart drivers. How long does the battery pack have to last for it to fall in this highly sought-after class? Show your work.
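The three parts above are a tail probability and two quantiles of the same Normal distribution. The following Python sketch can be used to check hand calculations from a z-table; it assumes run-times are expressed in minutes (2 hours = 120 minutes) and uses the standard library's statistics.NormalDist.

```python
from statistics import NormalDist

# Run-time in minutes: mean 2 hours = 120 minutes, standard deviation 20 minutes.
run_time = NormalDist(mu=120, sigma=20)

# Part a: share of packs lasting longer than 3 hours (180 minutes).
p_over_3_hours = 1 - run_time.cdf(180)

# Part b: third quartile (75th percentile) of the run-time distribution.
third_quartile = run_time.inv_cdf(0.75)

# Part c: cutoff for the top 10% of run-times (90th percentile).
top_10_cutoff = run_time.inv_cdf(0.90)

print(p_over_3_hours, third_quartile, top_10_cutoff)
```

Small differences from table-based answers are expected because printed z-tables round to two decimals.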
5. In the language of government statistics, you are "in the labor force" if you are available for work and either working or actively seeking work. The unemployment rate is the proportion of the labor force (not of the entire population) who are unemployed. Here are data from the Current Population Survey (CPS) for the civilian population aged 25 years and over. The table entries are counts in thousands of people. You must show your work in answering the following questions.
Highest Education                       Total Population   In Labor Force   Employed
Did not finish high school                        28,021           12,623     11,552
High school but no college                        59,844           38,210     36,249
Some college, but no bachelor's degree            46,777           33,928     32,429
College graduate                                  51,568           40,414     39,250
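Using the definition given above (unemployed = in the labor force but not employed, divided by the labor force), the unemployment rate for each education group can be sketched in Python as follows. The dictionary layout and the function name unemployment_rate are assumptions for illustration; the counts are copied from the table.

```python
# Counts in thousands, copied from the table: (in labor force, employed).
cps = {
    "Did not finish high school": (12_623, 11_552),
    "High school but no college": (38_210, 36_249),
    "Some college, but no bachelor's degree": (33_928, 32_429),
    "College graduate": (40_414, 39_250),
}


def unemployment_rate(labor_force: int, employed: int) -> float:
    """Share of the labor force (not of the population) that is not employed."""
    return (labor_force - employed) / labor_force


rates = {group: unemployment_rate(lf, emp) for group, (lf, emp) in cps.items()}

for group, rate in rates.items():
    print(f"{group}: {rate:.1%}")
```

Note that the Total Population column does not enter this calculation, because the rate is defined relative to the labor force.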
b. Find the probabilities of the following events:
i. Enough sleep and not enough exercise
ii. Not enough sleep and enough exercise
iii. Not enough sleep and not enough exercise
iv. For each of parts i, ii, and iii, state the rule that you used to find your answer.
8. Facebook provides a variety of statistics on its Web site that detail the growth and popularity of the site. One such statistic is that the average user has 130 friends. This distribution only takes integer values, so it is certainly not Normal. We will also assume it is skewed to the right with a standard deviation σ = 85. Consider a SRS of 30 Facebook users. You must show your work in answering the following questions.
a. What are the mean and standard deviation of the total number of friends in this sample?
b. What are the mean and standard deviation of the mean number of friends per user?
c. Use the central limit theorem to find the probability that the average number of friends among 30 Facebook users is greater than 140.
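The rules for sums and means of independent draws, together with the central limit theorem approximation in part c, can be sketched in Python as below. Variable names are illustrative; the only inputs are the values stated in the problem (mean 130, σ = 85, n = 30).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 130, 85, 30

# Part a: total number of friends in the sample.
total_mean = n * mu
total_sd = sqrt(n) * sigma

# Part b: mean number of friends per user.
xbar_mean = mu
xbar_sd = sigma / sqrt(n)

# Part c: CLT approximation, treating the sample mean as Normal(xbar_mean, xbar_sd).
p_mean_above_140 = 1 - NormalDist(xbar_mean, xbar_sd).cdf(140)

print(total_mean, total_sd, xbar_mean, xbar_sd, p_mean_above_140)
```

Because the underlying distribution is right-skewed and n is only 30, the probability in part c is an approximation.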
9. North Carolina State University posts the grade distribution for its courses online. Students in one section of English 210 in the Fall 2008 semester received 33% A's, 24% B's, 18% C's, 16% D's, and 9% F's. You must show your work in answering the following questions.
a. Using the common scale A=4, B=3, C=2, D=1, F=0, take X to be the grade of a randomly chosen English 210 student. Use the definition of the mean and standard deviation for discrete random variables to find the mean μ and the standard deviation σ of the grades in the course.
b. English 210 is a large course. We can take the grades of a simple random sample of 50 students to be independent of each other. If x̄ is the average of these 50 grades, what are the mean and standard deviation of x̄?
c. What is the probability P(x̄ ≥ 3) that the grade point average for 50 randomly chosen English 210 students is a B or better? (1 point)
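The definitions of the mean and standard deviation of a discrete random variable translate directly into code. The Python sketch below applies them to the stated grade distribution and then uses a Normal approximation for the average of 50 grades; the dictionary name grade_dist is a hypothetical choice.

```python
from math import sqrt
from statistics import NormalDist

# Grade points mapped to their probabilities (A=4, B=3, C=2, D=1, F=0).
grade_dist = {4: 0.33, 3: 0.24, 2: 0.18, 1: 0.16, 0: 0.09}

# Part a: mean and standard deviation from the definitions.
mu = sum(x * p for x, p in grade_dist.items())
variance = sum(p * (x - mu) ** 2 for x, p in grade_dist.items())
sigma = sqrt(variance)

# Part b: mean and standard deviation of the average of n = 50 independent grades.
n = 50
xbar_mean = mu
xbar_sd = sigma / sqrt(n)

# Part c: Normal approximation to the probability that the average is at least 3.
p_b_or_better = 1 - NormalDist(xbar_mean, xbar_sd).cdf(3)

print(mu, sigma, xbar_sd, p_b_or_better)
```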
10. A $1 bet in a state lottery's Pick 3 game pays $500 if the three-digit number you choose exactly matches the winning number, which is drawn at random. Here is the distribution of the payoff X:
Payoff X $0 $500
Probability 0.999 0.001
a. What are the mean and standard deviation of X?
b. Joe buys a Pick 3 ticket twice a week. What does the law of large numbers say about the average payoff Joe receives from his bets?
c. What does the central limit theorem say about the distribution of Joe's average payoff after 104 bets in a year?
d. Joe comes out ahead for the year if his average payoff is greater than $1 (the amount he spent each day on a ticket). What is the probability that Joe ends the year ahead?
11. A selective college would like to have an entering class of 950 students. Because not all students who are offered admission accept, the college admits more than 950 students. Past experience shows that about 75% of the students admitted will accept. The college decides to admit 1,200 students. Assuming that students make their decisions independently, the number who accept has the B(1200, 0.75) distribution. If this number is less than 950, the college will admit students from its waiting list. You must show your work in answering the following questions.
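A sketch of the calculations for the Pick 3 payoff follows, in Python. It computes the mean and standard deviation of X from the payoff table and then applies the central limit theorem approximation for the average of 104 bets. The approximation is rough here because the payoff distribution is extremely skewed, so treat the last number as the textbook-style answer rather than an exact probability.

```python
from math import sqrt
from statistics import NormalDist

# Payoff distribution from the table: $0 with probability 0.999, $500 with 0.001.
payoff_dist = {0: 0.999, 500: 0.001}

# Part a: mean and standard deviation of a single payoff X.
mu = sum(x * p for x, p in payoff_dist.items())
sigma = sqrt(sum(p * (x - mu) ** 2 for x, p in payoff_dist.items()))

# Part c: approximate distribution of the average payoff over 104 bets.
n_bets = 104
xbar_sd = sigma / sqrt(n_bets)

# Part d: Normal approximation to P(average payoff > $1).
p_ahead = 1 - NormalDist(mu, xbar_sd).cdf(1)

print(mu, sigma, xbar_sd, p_ahead)
```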
a. What are the mean and the standard deviation of the number X of students who accept?
b. Use the Normal approximation to find the probability that at least 800 students accept.
c. The college does not want more than 950 students. What is the probability that more than 950 will accept?
d. If the college decides to increase the number of admission offers to 1,300, what is the probability that more than 950 will accept?
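The binomial mean and standard deviation, and the Normal approximation built on them, can be sketched in Python as below. The sketch takes the acceptance probability to be 0.75, following the stated 75% figure from past experience, and applies no continuity correction; both are assumptions you may want to vary. The helper name normal_approx_tail is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def binomial_mean_sd(n: int, p: float):
    """Mean and standard deviation of a B(n, p) count."""
    return n * p, sqrt(n * p * (1 - p))


def normal_approx_tail(n: int, p: float, threshold: float) -> float:
    """Normal approximation to P(X > threshold), without continuity correction."""
    mean, sd = binomial_mean_sd(n, p)
    return 1 - NormalDist(mean, sd).cdf(threshold)


p_accept = 0.75  # assumed from the stated 75% acceptance rate

mean_1200, sd_1200 = binomial_mean_sd(1200, p_accept)        # part a
p_at_least_800 = normal_approx_tail(1200, p_accept, 800)     # part b
p_over_950_1200 = normal_approx_tail(1200, p_accept, 950)    # part c
p_over_950_1300 = normal_approx_tail(1300, p_accept, 950)    # part d

print(mean_1200, sd_1200, p_at_least_800, p_over_950_1200, p_over_950_1300)
```

Under the Normal approximation, P(X ≥ 800) and P(X > 800) are treated as the same quantity, which is why one helper serves parts b through d.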
12. Here is a simple probability model for multiple-choice tests. Suppose that each student has probability p of correctly answering a question chosen at random from a universe of possible questions. (A strong student has a higher p than a weak student.) The correctness of an answer to a question is independent of the correctness of answers to other questions. Jodi is a good student for whom p = 0.88. You must show your work in answering the following questions.
a. Use the Normal approximation to find the probability that Jodi scores 85% or lower on a 100-question test.
b. If the test contains 250 questions, what is the probability that Jodi will score 85% or lower?
c. How many questions must the test contain in order to reduce the standard deviation of Jodi's proportion of correct answers to half its value for a 100-item test?
d. Lisa is a weaker student for whom p = 0.72. Does the answer you gave in part c for the standard deviation of Jodi's score apply to Lisa's standard deviation also? Why or why not?
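For a sample proportion, the standard deviation is sqrt(p(1 − p)/n), which is what drives parts a through d. The Python sketch below uses that formula with a Normal approximation (no continuity correction); the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_sd(p: float, n: int) -> float:
    """Standard deviation of the proportion of correct answers on an n-question test."""
    return sqrt(p * (1 - p) / n)


def prob_score_at_most(p: float, n: int, cutoff: float) -> float:
    """Normal approximation to P(proportion correct <= cutoff)."""
    return NormalDist(p, proportion_sd(p, n)).cdf(cutoff)


p_jodi = 0.88

p_100 = prob_score_at_most(p_jodi, 100, 0.85)   # part a
p_250 = prob_score_at_most(p_jodi, 250, 0.85)   # part b

# Part c: the standard deviation scales with 1/sqrt(n), so compare n = 100 and n = 400.
sd_100 = proportion_sd(p_jodi, 100)
sd_400 = proportion_sd(p_jodi, 400)

print(p_100, p_250, sd_100, sd_400)
```

The 1/sqrt(n) scaling does not depend on the value of p, which is relevant to part d.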
13. According to genetic theory, the blossom color in the second generation of a certain cross of sweet peas should be red or white in a 3:1 ratio. That is, each plant has probability ¾ of having red blossoms, and the blossom colors of separate plants are independent. Show your work.
a. What is the probability that exactly 9 out of 12 of these plants have red blossoms?
b. What is the mean number of red-blossomed plants when 120 plants of this type are grown from seeds?
c. What is the probability of obtaining at least 80 red-blossomed plants when 120 plants are grown from seeds?
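The binomial probabilities in this problem can be checked with exact arithmetic. The following Python sketch uses math.comb for the binomial coefficient; the helper name binom_pmf is hypothetical. For part c it sums the exact probabilities, so a hand answer based on the Normal approximation will differ slightly.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_red = 0.75  # 3:1 ratio of red to white blossoms

# Part a: exactly 9 red-blossomed plants out of 12.
p_9_of_12 = binom_pmf(9, 12, p_red)

# Part b: mean number of red-blossomed plants among 120.
mean_red_120 = 120 * p_red

# Part c: exact probability of at least 80 red-blossomed plants among 120.
p_at_least_80 = sum(binom_pmf(k, 120, p_red) for k in range(80, 121))

print(p_9_of_12, mean_red_120, p_at_least_80)
```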