Reference no: EM13371062
Significance Testing: T-Tests
1. In the course of thesis work, a student develops a new approach for solving a problem (here referred to as method B). The current state-of-the-art approach, method A, is well published in the literature and has been applied to a large standard problem set, where its average performance was found to be (and published in the main paper by the developers as) 5, with a standard deviation of 3 across the different problems in the problem set. In addition to the publication, the developers of method A also provide their code for anyone to experiment with. The student decides to pick a random set of 15 problems from the standard problem set and apply both methods to these problems, resulting in the following performance numbers for method A: {5, 2, 8, 6, 10, 1, 4, 3, 9, 2, 3, 7, 4, 1, 8}, and the following performance numbers for the student's method B: {6, 4, 7, 8, 14, 1, 5, 2, 11, 4, 6, 8, 4, 2, 10}. Looking at this data, the student observes that method B seems to outperform method A and sets out to prove this using significance testing with a two-tailed 5% significance threshold. Given that both the published performance results and the student's experimental results are available, a number of tests can be performed.
a) Use the standard t-test with the published results to evaluate the results in terms of the hypothesis that method B has a higher performance than method A. List all the steps (and formulas) involved in the test and what the result implies for the significance of the hypothesis.
b) Use the two-sample t-test with the student's results to evaluate the validity of the same hypothesis as in part a). Again list all the steps (and formulas) involved in the test and what the result implies for the significance of the hypothesis.
c) Perform a paired-sample t-test with the student's results as another significance test for the same hypothesis as in part a). Again list all the steps (and formulas) involved in the test and what the result implies for the significance of the hypothesis.
d) Discuss the difference between the tests in terms of their results and their assumptions. What do these results tell us about the application of the different tests, and what do they tell us about the problem (and problem set) that the experiments were performed on (in terms of the relation between specific problems and the performance measure)?
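The three test statistics asked for in parts a) to c) can be sketched as follows. This is a minimal illustration using only the Python standard library, not a full solution: it assumes that part a) compares the sample for method B against the published mean of 5 using the sample standard deviation, that part b) uses the pooled-variance (equal-variance) form of the two-sample test, and that the critical values are taken from a standard t-table for a two-tailed 5% threshold. All variable names are illustrative.

```python
import math
import statistics as st

# Performance numbers from problem 1.
a = [5, 2, 8, 6, 10, 1, 4, 3, 9, 2, 3, 7, 4, 1, 8]
b = [6, 4, 7, 8, 14, 1, 5, 2, 11, 4, 6, 8, 4, 2, 10]
n = len(a)
MU_PUBLISHED = 5.0  # published mean performance of method A

# Two-tailed 5% critical values from a standard t-table (assumed inputs).
T_CRIT_DF14 = 2.145  # df = n - 1, used in parts a) and c)
T_CRIT_DF28 = 2.048  # df = 2n - 2, used in part b)

# a) One-sample t-test: sample of B against the published mean.
t_one = (st.mean(b) - MU_PUBLISHED) / math.sqrt(st.variance(b) / n)

# b) Two-sample t-test with pooled variance (assumes equal variances).
pooled_var = ((n - 1) * st.variance(a) + (n - 1) * st.variance(b)) / (2 * n - 2)
t_two = (st.mean(b) - st.mean(a)) / math.sqrt(pooled_var * (2 / n))

# c) Paired-sample t-test on the per-problem differences B - A.
d = [y - x for x, y in zip(a, b)]
t_paired = st.mean(d) / math.sqrt(st.variance(d) / n)

for label, t, crit in [("one-sample", t_one, T_CRIT_DF14),
                       ("two-sample", t_two, T_CRIT_DF28),
                       ("paired", t_paired, T_CRIT_DF14)]:
    print(f"{label}: t = {t:.3f}, reject H0: {abs(t) > crit}")
```

Under these assumptions only the paired test exceeds its critical value, which is the contrast that part d) asks about: pairing removes the problem-to-problem variation that dominates the other two statistics.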
2. Performing a study on the development of body height, a student randomly measures the height of 20 persons in country A. The results turn out to be: {1.7, 1.6, 1.8, 1.9, 1.75, 1.83, 1.82, 1.65, 1.95, 1.69, 1.82, 1.87, 1.65, 1.54, 1.98, 1.78, 1.69, 1.75, 1.62, 1.64}. In the literature, a result is found that 20 years before, the average height of persons in country A was determined to be 1.72. Given that the acceptable threshold for significance is 5%, can this data be used to show that the average height of individuals in country A has increased in the last 20 years? (Show your calculations.)
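The calculation for problem 2 can be sketched as a one-sample t-test. The sketch below assumes a one-tailed test (the hypothesis is an increase) and takes the critical value for 19 degrees of freedom from a standard t-table; the variable names are illustrative.

```python
import math
import statistics as st

# Heights measured in problem 2.
heights = [1.7, 1.6, 1.8, 1.9, 1.75, 1.83, 1.82, 1.65, 1.95, 1.69,
           1.82, 1.87, 1.65, 1.54, 1.98, 1.78, 1.69, 1.75, 1.62, 1.64]
MU_0 = 1.72     # average height reported 20 years earlier
T_CRIT = 1.729  # one-tailed 5% critical value, df = 19 (from a t-table)

n = len(heights)
sample_mean = st.mean(heights)
std_error = st.stdev(heights) / math.sqrt(n)  # sample std dev / sqrt(n)

# H0: mean height is still 1.72; H1: mean height has increased.
t_stat = (sample_mean - MU_0) / std_error
reject_h0 = t_stat > T_CRIT

print(f"mean = {sample_mean:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With these inputs the statistic stays below the critical value, so under the stated assumptions the sample alone does not support an increase at the 5% level.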
Comparing Distributions
3. Consider a sensor system for which we know that the sensor noise is normally distributed (and thus that an actual reading is taken from a normal distribution). Given an existing sensor with a known mean reading of µ = 2.5 and a standard deviation of 0.3, we want to compare a new sensor to it. For the new sensor we take 10 measurements: {2, 3.2, 2.7, 2.1, 2.8, 3.0, 1.8, 2.6, 2.2, 2.5}. Given this data we want to show that the sensors generate different data and that the new sensor is more reliable (i.e. has noise with a lower variance).
a) Evaluate whether the mean of the data sample from the new sensor is significantly different from that of data samples obtained from the original sensor. Include your calculations and the significance scores.
b) To evaluate the reliability of the new sensor, evaluate the hypothesis that the new sensor has a lower variance than the original sensor.
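A possible sketch of both parts of problem 3 follows. It assumes that part a) can be treated as a two-tailed z-test, since the original sensor's mean and standard deviation are given as known, and that part b) uses the one-tailed chi-square test for a variance, with statistic (n - 1) s² / σ². Critical values are taken from standard tables, and the names are illustrative.

```python
import math
import statistics as st

# Measurements from the new sensor in problem 3.
new_sensor = [2.0, 3.2, 2.7, 2.1, 2.8, 3.0, 1.8, 2.6, 2.2, 2.5]
MU_OLD, SIGMA_OLD = 2.5, 0.3  # known parameters of the original sensor

Z_CRIT = 1.96            # two-tailed 5% critical value of the normal
CHI2_LOWER_CRIT = 3.325  # lower 5% point of chi-square with df = 9

n = len(new_sensor)
sample_mean = st.mean(new_sensor)
sample_var = st.variance(new_sensor)  # unbiased sample variance

# a) z-test of the mean under H0: the new sensor matches the original.
z_stat = (sample_mean - MU_OLD) / (SIGMA_OLD / math.sqrt(n))
mean_differs = abs(z_stat) > Z_CRIT

# b) Chi-square test of H0: variance = 0.3^2 against H1: variance is lower.
chi2_stat = (n - 1) * sample_var / SIGMA_OLD ** 2
variance_lower = chi2_stat < CHI2_LOWER_CRIT

print(f"z = {z_stat:.3f}, mean differs: {mean_differs}")
print(f"chi2 = {chi2_stat:.3f}, variance lower: {variance_lower}")
```

If a t-test with the sample standard deviation is preferred for part a), only the denominator of the statistic and the critical value (2.262 for 9 degrees of freedom) change.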
4. Consider a similar scenario as in problem 3, where we have two distance sensors, A and B, that have sensor noise that is normally distributed. Assume that the average sensor reading for both sensors is the actual distance, that we have a setup where both sensors can be applied to the same distance, and that we know that the first sensor has a standard deviation of 0.3. Given this, we perform a set of 10 experiments (with unknown, varying distance in each experiment) for which we get the following readings from sensor A: {2.1, 3.5, 5.7, 4.2, 8.9, 4.2, 12.5, 7.4, 9.2, 4.8}, and from sensor B: {1.8, 2.5, 6.1, 4.0, 9.4, 4.7, 11.7, 6.8, 9.7, 5.1}. Extend the principle of the paired-sample test that was covered for the t-test to the χ² test for variance and evaluate the hypothesis that the two sensors have different amounts of noise (i.e. that the second sensor does not have the same noise level as the first sensor, whose standard deviation is 0.3). Explain your rationale for this test and how you arrived at the distribution for the null hypothesis. Note: since in a paired test you are looking at the difference of two data items, you have to be careful to correctly model the resulting distribution.
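One way to model the paired variance test is sketched below. It rests on assumptions that go beyond the problem text and should be checked: the noise of the two sensors is taken to be independent, so that under the null hypothesis each difference A - B is normal with mean 0 and variance 0.3² + 0.3² = 0.18; and because the mean of the differences is then known to be 0, the statistic is the sum of squared differences divided by 0.18, with n = 10 degrees of freedom. The two-tailed critical values come from a chi-square table.

```python
# Paired readings from problem 4.
sensor_a = [2.1, 3.5, 5.7, 4.2, 8.9, 4.2, 12.5, 7.4, 9.2, 4.8]
sensor_b = [1.8, 2.5, 6.1, 4.0, 9.4, 4.7, 11.7, 6.8, 9.7, 5.1]
SIGMA_A = 0.3  # known standard deviation of sensor A

# Two-tailed 5% critical values of chi-square with df = 10 (table values).
CHI2_LOW, CHI2_HIGH = 3.247, 20.483

# The unknown true distance cancels in each paired difference.
diffs = [x - y for x, y in zip(sensor_a, sensor_b)]

# Under H0 (both sensors have std dev 0.3, independent noise) the
# variance of a difference is the sum of the two variances.
var_diff_h0 = 2 * SIGMA_A ** 2

# Mean of the differences is 0 under the assumptions, so df = n.
chi2_stat = sum(v * v for v in diffs) / var_diff_h0
reject_h0 = chi2_stat < CHI2_LOW or chi2_stat > CHI2_HIGH

print(f"chi2 = {chi2_stat:.3f}, reject H0: {reject_h0}")
```

If the mean of the differences is instead estimated from the data, the sum of squares is taken around the sample mean and the degrees of freedom drop to 9, with critical values 2.700 and 19.023.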
5. We want to evaluate the runtime performance of a randomized pattern recognition algorithm by comparing it to a known algorithm that has an average runtime performance of 25 with a standard deviation of 5. To do this comparison, we pick 10 images that we can apply the algorithm to and run the randomized algorithm 15 times on each, resulting in the following 10 average runtime performances (one average per image): {23.8, 25.1, 24.2, 24.6, 25.2, 24.1, 23.9, 24.4, 24.9, 24.3}.
a) Evaluate the significance of the hypothesis that the randomized algorithm has a lower average runtime than the known comparison algorithm. As before, explain and list your calculations, test choices, and conclusion drawn.
b) Evaluate whether the hypothesis that the randomized algorithm has a lower variance in the runtime than the comparison algorithm has statistically significant support in the data. As before, explain and list your calculations, test choices, and conclusion drawn.
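Both parts of problem 5 can be sketched as follows, under assumptions that should be stated in any written answer. Part a) is treated as a one-tailed, one-sample t-test on the 10 averages. For part b), each data point is an average of 15 runs, so the sketch assumes that under the null hypothesis an average has variance 5² / 15 (this ignores any image-to-image variation) and applies the one-tailed chi-square test for a variance. Critical values come from standard tables, and the names are illustrative.

```python
import math
import statistics as st

# Per-image average runtimes from problem 5 (each an average of 15 runs).
averages = [23.8, 25.1, 24.2, 24.6, 25.2, 24.1, 23.9, 24.4, 24.9, 24.3]
MU_KNOWN, SIGMA_KNOWN = 25.0, 5.0  # known comparison algorithm
RUNS_PER_IMAGE = 15

T_CRIT = 1.833           # one-tailed 5% critical value, df = 9
CHI2_LOWER_CRIT = 3.325  # lower 5% point of chi-square with df = 9

n = len(averages)
sample_mean = st.mean(averages)
sample_var = st.variance(averages)

# a) H1: the mean runtime is lower than 25.
t_stat = (sample_mean - MU_KNOWN) / math.sqrt(sample_var / n)
mean_lower = t_stat < -T_CRIT

# b) H1: the runtime variance is lower than that of the known algorithm.
# Variance of an average of 15 runs under H0 (assumption, see lead-in).
var_of_average_h0 = SIGMA_KNOWN ** 2 / RUNS_PER_IMAGE
chi2_stat = (n - 1) * sample_var / var_of_average_h0
variance_lower = chi2_stat < CHI2_LOWER_CRIT

print(f"t = {t_stat:.3f}, mean lower: {mean_lower}")
print(f"chi2 = {chi2_stat:.3f}, variance lower: {variance_lower}")
```

Comparing the sample variance of the averages directly against 5² would skip the division by 15 and make the randomized algorithm look even less variable, so the scaling step is the main modeling choice to justify.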