Consider the following example. In Major League Baseball in 1989, Andy Van Slyke had a batting average of 0.237, and Dave Justice's batting average was 0.235. (A higher batting average means that a player had more base hits per time at bat.) In 1990, Van Slyke batted 0.284 and Justice batted 0.282. Van Slyke had the higher batting average two years in a row. But when the two years are combined, Van Slyke's batting average is only 0.261, while Justice's average is 0.278.
Year         1989    1990
Van Slyke    0.237   0.284
Justice      0.235   0.282
The formula for batting average is: batting average = number of hits / number of times at bat.
To combine the batting averages over two years, you'd need to divide the total number of hits by the total number of at-bats. So Justice's average over both years is not simply (0.235 + 0.282)/2. That unweighted mean counts each year equally, whereas the combined average weights each year by the number of times he was at bat that year.
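The problem does not give the underlying hit and at-bat counts, so the sketch below uses hypothetical counts, chosen only so that the rounded yearly and combined averages match the figures quoted above. It contrasts the correct pooled average (total hits / total at-bats) with the unweighted mean of the two yearly averages. The names `seasons`, `batting_average`, `combined_average`, and `naive_mean` are illustrative, not part of the original problem.

```python
# HYPOTHETICAL (hits, at_bats) per season. These counts are NOT given in the
# problem; they were picked only so the rounded averages match the stated
# ones (0.237 / 0.284 / 0.261 and 0.235 / 0.282 / 0.278).
seasons = {
    "Van Slyke": {1989: (118, 498), 1990: (142, 500)},
    "Justice": {1989: (12, 51), 1990: (141, 500)},
}


def batting_average(hits: int, at_bats: int) -> float:
    """Batting average = number of hits / number of times at bat."""
    return hits / at_bats


def combined_average(records: dict[int, tuple[int, int]]) -> float:
    """Pool the seasons: total hits divided by total at-bats."""
    total_hits = sum(h for h, _ in records.values())
    total_at_bats = sum(ab for _, ab in records.values())
    return total_hits / total_at_bats


def naive_mean(records: dict[int, tuple[int, int]]) -> float:
    """Unweighted mean of the yearly averages (the incorrect shortcut)."""
    yearly = [batting_average(h, ab) for h, ab in records.values()]
    return sum(yearly) / len(yearly)


for player, records in seasons.items():
    yearly = {year: round(batting_average(h, ab), 3) for year, (h, ab) in records.items()}
    print(player, yearly,
          "combined:", round(combined_average(records), 3),
          "naive mean:", round(naive_mean(records), 3))
```

With these assumed counts, the unweighted means would rank Van Slyke slightly ahead, while the pooled averages reproduce the reversal described in the problem, because each player's two seasons carry very different numbers of at-bats.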
1) What is the lurking variable in this situation, and why? Think about which batting statistics represent the aggregate data and which statistics represent the grouped data. How are the data grouped to produce the disaggregate statistics (statistics based on the data broken down into groups)?
2) Who is the better hitter, Dave Justice or Andy Van Slyke? On what statistics do you base your assessment, and why?