Assignment Document

Statistical Tests: Performing a $\chi^2$-Test (I)

Performing a $\chi^2$-Test (I)

  • In principle: a statistical test comparing the relative frequencies for the intervals/bins in a histogram with the theoretical probabilities of the chosen distribution
  • Assumptions
      – The distribution involves $k$ parameters estimated from the sample
      – The sample contains $n$ observations (sample size = $n$)
      – $F_0(x)$ denotes the chosen/hypothesized CDF

  Data: $x_1, x_2, \ldots, x_n$ ($n$ observations from the real system)
  Model: $X_1, X_2, \ldots, X_n$ (random variables, independent and identically distributed with CDF $F(x)$)

  Null hypothesis $H_0$: $F(x) = F_0(x)$
  Alternative hypothesis $H_A$: $F(x) \neq F_0(x)$

Performing a $\chi^2$-Test (II)

  1. Take the entire data range and divide it into $r$ nonoverlapping intervals or bins

  [Figure: density $f_0(x)$ over the data values, with bin limits Min $= a_0, a_1, a_2, a_3, \ldots, a_{r-2}, a_{r-1}, a_r =$ Max delimiting bins $1, 2, 3, \ldots, r-1, r$; the area over bin 2 is $p_2 = F_0(a_2) - F_0(a_1)$]

  • $p_i$ = the probability that an observation $X$ belongs to bin $i$. Under the null hypothesis, $p_i = F_0(a_i) - F_0(a_{i-1})$
  • To improve the accuracy of the test
      – choose the bins (intervals) so that the probabilities $p_i$ ($i = 1, 2, \ldots, r$) are equal for all bins

Performing a $\chi^2$-Test (III)

  2. Define $r$ random variables $O_i$, $i = 1, 2, \ldots, r$
      – $O_i$ = number of observations in bin $i$ (= the interval $(a_{i-1}, a_i]$)
      – If $H_0$ is true $\Rightarrow$ the expected value of $O_i$ is $n p_i$
          • $O_i$ is binomially distributed with parameters $n$ and $p_i$
  3. Define the test variable $T$

      $$T = \sum_{i=1}^{r} \frac{(O_i - n p_i)^2}{n p_i}$$

      – If $H_0$ is true $\Rightarrow$ $T$ follows a $\chi^2(r-k-1)$ distribution
      – $T_\alpha$ = the critical value of $T$ corresponding to a significance level $\alpha$, obtained from a $\chi^2(r-k-1)$ distribution table
      – $T_{obs}$ = the value of $T$ computed from the data material
      $\Rightarrow$ If $T_{obs} > T_\alpha$ $\Rightarrow$ $H_0$ can be rejected on the significance level $\alpha$

Validity of the $\chi^2$-Test

  • Depends on the sample size $n$ and on the bin selection (the size of the intervals)
  • Rules of thumb
      – The $\chi^2$-test is acceptable for ordinary significance levels ($\alpha$ = 1%, 5%) if the expected number of observations in each interval is greater than 5 ($n p_i > 5$ for all $i$)
      – In the case of continuous data and a bin selection such that $p_i$ is equal for all bins:
          $n \le 20$: do not use the $\chi^2$-test
          $20 < n \le 50$: 5-10 bins recommendable
          $50 < n \le 100$: 10-20 bins recommendable
          $n > 100$: $n^{0.5}$ to $0.2n$ bins recommendable

Example – Modeling Interarrival Times (IV)

  • Hypothesis – the interarrival time $Y$ is Exp(0.084) distributed
      $H_0$: $Y \sim \mathrm{Exp}(0.084)$
      $H_A$: $Y \not\sim \mathrm{Exp}(0.084)$
  • Bin sizes are chosen so that the probability $p_i$ is equal for all $r$ bins and $n p_i > 5$ for all $i$
      – Equal $p_i$ $\Rightarrow$ $p_i = 1/r$
      – $n p_i > 5$ $\Rightarrow$ $n/r > 5$ $\Rightarrow$ $r < n/5$
      – $n = 50$ $\Rightarrow$ $r < 50/5 = 10$ $\Rightarrow$ choose for example $r = 8$ $\Rightarrow$ $p_i = 1/8$
  • Determining the interval limits $a_i$, $i = 0, 1, \ldots, 8$

      Under $H_0$: $F(a_i) = 1 - e^{-0.084 a_i} = i \cdot p_i \;\Rightarrow\; e^{-0.084 a_i} = 1 - i \cdot p_i \;\Rightarrow\; a_i = \dfrac{\ln(1 - i \cdot p_i)}{-0.084}$

      $i = 1 \Rightarrow a_1 = \ln(1 - 1/8)/(-0.084) = 1.590$
      $i = 2 \Rightarrow a_2 = \ln(1 - 2/8)/(-0.084) = 3.425$
      $\vdots$
      $i = 8 \Rightarrow a_8 = \ln(1 - 8/8)/(-0.084) = \infty$

Example – Modeling Interarrival Times (V)

  • Computing the test statistic $T_{obs}$

      $$T_{obs} = \sum_{i=1}^{8} \frac{(o_i - 50/8)^2}{50/8} = 39.6$$

      Note: $o_i$ = the actual number of observations in bin $i$
  • Determining the critical value $T_\alpha$
      – If $H_0$ is true $\Rightarrow$ $T \sim \chi^2(8-1-1) = \chi^2(6)$
      – If $\alpha = 0.05$ $\Rightarrow$ $P(T \le T_{0.05}) = 1 - \alpha = 0.95$ $\Rightarrow$ ($\chi^2$ table) $\Rightarrow$ $T_{0.05} = 12.60$
  • Rejecting the hypothesis
      – $T_{obs} = 39.6 > 12.6 = T_{0.05}$ $\Rightarrow$ $H_0$ is rejected on the 5% level

The Kolmogorov-Smirnov test (I)

  • Advantages over the chi-square test
      + Does not require decisions about bin ranges
      + Often applied for smaller sample sizes
  • Disadvantages
      – Ideally all distribution parameters should be known with certainty for the test to be valid
          $\Rightarrow$ A modified version based on estimated parameter values exists for the Normal, Exponential and Weibull distributions
          $\Rightarrow$ In practice often used for other distributions as well
      – For samples with $n \ge 30$ the $\chi^2$-test is more reliable!
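The equal-probability binning and the test statistic from the $\chi^2$ interarrival-time example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: the function names are hypothetical, the sample is synthetic (drawn with a fixed seed, because the 50 original observations are not reproduced in the text), and the critical value 12.60 is simply the table value quoted in the example.

```python
import math
import random
from bisect import bisect_left

RATE = 0.084        # hypothesized rate of Exp(0.084), from the example
R_BINS = 8          # number of equiprobable bins chosen in the example
T_CRITICAL = 12.60  # chi-square(6) table value at alpha = 0.05, as quoted


def equiprobable_limits(rate, r):
    """Return a_0..a_r such that F0(a_i) = i/r for F0(x) = 1 - exp(-rate*x)."""
    inner = [math.log(1 - i / r) / (-rate) for i in range(1, r)]
    return [0.0] + inner + [math.inf]  # a_r is unbounded because ln(0) diverges


def bin_counts(sample, limits):
    """Count observations per bin, where bin i is the interval (a_{i-1}, a_i]."""
    counts = [0] * (len(limits) - 1)
    for x in sample:
        i = max(bisect_left(limits, x), 1)  # clamp x == a_0 into bin 1
        counts[i - 1] += 1
    return counts


def chi_square_statistic(counts, probs):
    """T = sum over bins of (O_i - n*p_i)^2 / (n*p_i)."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


limits = equiprobable_limits(RATE, R_BINS)

# Synthetic stand-in for the 50 observed interarrival times (assumption).
rng = random.Random(1)
sample = [rng.expovariate(RATE) for _ in range(50)]

counts = bin_counts(sample, limits)
t_obs = chi_square_statistic(counts, [1 / R_BINS] * R_BINS)

print("limits:", [round(a, 3) for a in limits])
print("counts:", counts)
print("T_obs = %.2f, reject H0 at 5%%: %s" % (t_obs, t_obs > T_CRITICAL))
```

The first two computed limits reproduce the values 1.590 and 3.425 from the example. The statistic printed here comes from the synthetic sample, so it will differ from the 39.6 reported for the real data.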
The Kolmogorov-Smirnov test (II)

  • Compares an empirical "relative-frequency" CDF with the theoretical CDF ($F(x)$) of a chosen (hypothesized) distribution
      – The empirical CDF = $F_n(x)$ = (number of $x_i \le x$)/$n$
          $n$ = number of observations in the sample
          $x_i$ = the value of the $i$-th smallest observation in the sample
          $\Rightarrow$ $F_n(x_i) = i/n$
  • Procedure
      1. Order the sample data from the smallest to the largest value
      2. Compute $D^+$, $D^-$ and $D = \max\{D^+, D^-\}$

          $$D^+ = \max_{1 \le i \le n} \left\{ \frac{i}{n} - F(x_i) \right\}, \qquad D^- = \max_{1 \le i \le n} \left\{ F(x_i) - \frac{i-1}{n} \right\}$$

      3. Find the tabulated critical KS value corresponding to the sample size $n$ and the chosen significance level $\alpha$
      4. If the critical KS value $< D$ $\Rightarrow$ reject the hypothesis that $F(x)$ describes the data material's distribution

Distribution Choice in Absence of Sample Data

  • Common situation especially when designing new processes
      – Try to draw on expert knowledge from people involved in similar tasks
  • When estimates of interval lengths are available
      – Ex. The service time ranges between 5 and 20 minutes
          $\Rightarrow$ Plausible to use a Uniform distribution with min = 5 and max = 20
  • When estimates of the interval and most likely value exist
      – Ex. min = 5, max = 20, most likely = 12
          $\Rightarrow$ Plausible to use a Triangular distribution with those parameter values
  • When estimates of min = $a$, most likely = $c$, max = $b$ and the average value = $\bar{x}$ are available
          $\Rightarrow$ Use a $\beta$-distribution with parameters $\alpha$ and $\beta$

          $$\alpha = \frac{(\bar{x} - a)(2c - a - b)}{(c - \bar{x})(b - a)}, \qquad \beta = \alpha \, \frac{b - \bar{x}}{\bar{x} - a}$$
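The KS statistic from the procedure above and the $\beta$-distribution parameter formulas can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, the five interarrival times and the mean of 12.2 are made-up inputs, and the critical KS value is deliberately not computed, since it is meant to be read from a table for the given sample size and significance level.

```python
import math


def ks_statistic(sample, cdf):
    """D = max(D+, D-) with D+ = max(i/n - F(x_i)), D- = max(F(x_i) - (i-1)/n)."""
    xs = sorted(sample)  # step 1: order the data from smallest to largest
    n = len(xs)
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return max(d_plus, d_minus)


def beta_shape_parameters(a, c, b, mean):
    """Shape parameters from min a, most likely c, max b and average value."""
    alpha = (mean - a) * (2 * c - a - b) / ((c - mean) * (b - a))
    beta = alpha * (b - mean) / (mean - a)
    return alpha, beta


def exp_cdf(x, rate=0.084):
    """CDF of the Exp(0.084) distribution used in the chi-square example."""
    return 1 - math.exp(-rate * x)


# Hypothetical interarrival times, for illustration only.
times = [2.1, 0.7, 15.3, 8.8, 4.4]
print("D =", ks_statistic(times, exp_cdf))

# min = 5, most likely = 12, max = 20 as in the text; mean 12.2 is assumed.
print("alpha, beta =", beta_shape_parameters(5, 12, 20, 12.2))
```

With the assumed mean of 12.2, the formulas give approximately $\alpha = 2.4$ and $\beta = 2.6$. The formulas divide by $c - \bar{x}$, so they are undefined when the average equals the most likely value.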
