Chi-square test : A most commonly used test for qualitative data analysis
Renuka Mewade1, Sonika Babhachade1, Asit Jain2, Mohan Singh Thakur2
College of Veterinary Science and Animal Husbandry, Jabalpur, Madhya Pradesh.
1Department of Microbiology, 2 Department of Animal Genetics and Breeding.
*Corresponding author email: renu.mewade786@gmail.com
Abstract
In many cases, making assumptions about the population distribution from which samples are drawn is not feasible. To address problems in such situations, non-parametric tests like the chi-square test are used. The chi-square test is distribution-free and is one of the most commonly used non-parametric tests in biological experiments. It is calculated based on sample frequencies and is applied exclusively to qualitative traits. The chi-square test is closely associated with data enumeration. This article covers the applications of the chi-square test, its assumptions, data calculation, certain properties, precautions, and conclusions.
Keywords: non-parametric, qualitative variables, Cramér's V strength test.
Introduction
Karl Pearson developed the chi-square test in 1900, so it is also called the Pearson chi-square test. He first applied it to evaluate the goodness of fit of frequency curves and, in 1904, extended its use to contingency tables to examine the independence of rows and columns. It is a non-parametric test: it requires no assumption about the population parameters, and such tests are called distribution-free tests. The test quantifies the extent of the difference between the observed frequencies and the frequencies expected under a stated hypothesis. It is a relatively simple statistical test that requires little computation, and it can be applied even when the distribution of the data is skewed or kurtotic. Cramér's V is a measure used to determine the strength of a significant association found by the chi-square test.
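For reference, Cramér's V is commonly defined from the chi-square value as sketched below. The symbols N (total number of observations), r (number of rows) and c (number of columns) are notation assumed here, not taken from the article.

```latex
V = \sqrt{\frac{\chi^2}{N \cdot \min(r - 1,\; c - 1)}}
```

Under this definition V lies between 0 (no association) and 1 (complete association).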
Types of chi-square test :-
- Test for assessing goodness of fit between the observed and expected frequencies.
- Test for association or independence of variables, especially those that are nominal in nature.
- Test for population variance.
Assumptions :-
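The sample size should be reasonably large; a total of more than 50 observations is the usual rule of thumb.
No expected frequency should be less than 5. Smaller expected frequencies make the chi-square approximation unreliable, and pooling such classes to correct this reduces the degrees of freedom.
The data should be in original units, i.e. actual counts; they should never be expressed as proportions or percentages.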
Additive property of Chi-square test :-
This property is used when several independent studies are conducted in the same field: their chi-square values, together with the corresponding degrees of freedom, can be added, so that the results are combined to give a more accurate understanding of the actual situation.
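The additive property can be illustrated with a minimal sketch. It assumes the studies are independent and that each one reports a chi-square value with its degrees of freedom; the function name and the three study results are hypothetical.

```python
def pool_chi_square(results):
    """Add chi-square values and degrees of freedom from independent studies.

    `results` is a list of (chi_square, degrees_of_freedom) pairs.
    """
    total_chi2 = sum(chi2 for chi2, _ in results)
    total_df = sum(df for _, df in results)
    return total_chi2, total_df


# Hypothetical results from three independent studies (illustrative only)
studies = [(2.1, 1), (3.4, 1), (1.9, 2)]

pooled_chi2, pooled_df = pool_chi_square(studies)
print(f"Pooled chi-square = {pooled_chi2:.2f} with {pooled_df} degrees of freedom")
```

The pooled value is then compared with the tabulated chi-square value at the pooled degrees of freedom.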
Calculation:-
Chi-Square test for goodness of fit :-
If the observed frequencies are approximately equal to the frequencies expected under a hypothesis, the chi-square value is non-significant and the fit is said to be good.
If the observed frequencies differ from the expected frequencies by more than chance would allow, i.e. the chi-square value is significant, the fit is said to be poor.
The chi-square statistic is given as

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency and E = expected frequency, with n - 1 degrees of freedom, n being the number of classes.
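The goodness-of-fit calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual statistic, the sum of (O - E)^2 / E over all classes; the function name and the frequencies are hypothetical.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return the chi-square statistic and its degrees of freedom (classes - 1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df


# Hypothetical frequencies for three classes (illustrative only)
observed = [50, 30, 20]
expected = [40, 40, 20]

chi2, df = chi_square_goodness_of_fit(observed, expected)

# Tabulated chi-square value at the 5% level for 2 degrees of freedom
CRITICAL_5_PERCENT_DF2 = 5.991

print(f"chi-square = {chi2:.2f}, df = {df}")
print("significant at 5%" if chi2 > CRITICAL_5_PERCENT_DF2 else "not significant at 5%")
```

With these made-up numbers the statistic falls below the tabulated value, so the fit would be regarded as good.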
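Chi-Square test for independence of qualitative variables :-
Here the null hypothesis is that there is no association between the variables, i.e. that the variables are independent.
The expected frequency of each cell is given as

$$E = \frac{\text{Row total} \times \text{Column total}}{N}$$

where N is the overall total. The chi-square statistic is then computed over all cells as

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

with (r - 1)(c - 1) degrees of freedom for a table of r rows and c columns.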
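The expected frequencies and the resulting statistic for a contingency table can be sketched as follows. This is a minimal illustration, assuming expected frequency = row total × column total / N and degrees of freedom = (rows - 1)(columns - 1); the function names and the 2 × 3 table are hypothetical.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for an r x c table."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 3 table of observed counts (illustrative only)
counts = [
    [20, 30, 50],
    [30, 30, 40],
]

expected = expected_frequencies(counts)
chi2, df = chi_square_independence(counts)
print("expected frequencies:", expected)
print(f"chi-square = {chi2:.3f}, df = {df}")
```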
Example:-
On inoculating a mastitic milk sample on MacConkey lactose agar, the following colony characteristics of bacteria are assumed.
Table 1
Colony characteristics | Observed no. of colonies | Expected no. of colonies | Total
Pink round colonies | 52 (a) | 60 (b) | 112 (a+b)
Pink irregular colonies | 47 (c) | 55 (d) | 102 (c+d)
Total | 99 (a+c) | 115 (b+d) | 214 (a+b+c+d = n)
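Expected frequency of a cell = (Row total × Column total) / N, where N = n = 214 is the overall total.

For a 2 × 2 table, the chi-square value computed from these expected frequencies reduces to

$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$

$$= \frac{214 \times (2860 - 2820)^2}{112 \times 102 \times 99 \times 115}$$

$$= \frac{342400}{130062240} \approx 0.0026$$

With 1 degree of freedom, this value is far below the tabulated 5% value of 3.841, so it is non-significant.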
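The 2 × 2 calculation can be checked in a few lines of Python. This is a minimal sketch that assumes the standard shortcut formula, chi-square = n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)], and takes the counts a, b, c and d from Table 1; the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table using n(ad - bc)^2 / product of marginal totals."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be greater than zero")
    return n * (a * d - b * c) ** 2 / denominator


# Counts a, b, c, d as labelled in Table 1
a, b, c, d = 52, 60, 47, 55

chi2 = chi_square_2x2(a, b, c, d)

# Tabulated chi-square value at the 5% level for 1 degree of freedom
CRITICAL_5_PERCENT_DF1 = 3.841

print(f"chi-square = {chi2:.4f}")
print("significant at 5%" if chi2 > CRITICAL_5_PERCENT_DF1 else "not significant at 5%")
```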
Chi-square test for determining population variance :-
Here, the chi-square test is used to ascertain whether the variance of the population could be a specified value.

$$\chi^2 = \frac{n s^2}{\sigma^2}$$

where n = sample size, s^2 = sample variance (computed with divisor n) and \sigma^2 = specified population variance, with n - 1 degrees of freedom.
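The variance test can be sketched as below. This is a minimal illustration that assumes the statistic n·s²/σ², with the sample variance s² computed using divisor n and with n - 1 degrees of freedom; the function name, the sample and the hypothesised variance are hypothetical.

```python
def chi_square_variance(sample, sigma0_squared):
    """Return n * s^2 / sigma0^2 and its degrees of freedom (n - 1).

    s^2 is the sample variance computed with divisor n.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(sample) / n
    s_squared = sum((x - mean) ** 2 for x in sample) / n
    chi2 = n * s_squared / sigma0_squared
    return chi2, n - 1


# Hypothetical sample and hypothesised population variance (illustrative only)
sample = [10, 12, 9, 11, 13]
sigma0_squared = 4.0

chi2, df = chi_square_variance(sample, sigma0_squared)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The computed value is compared with the tabulated chi-square value at n - 1 degrees of freedom to decide whether the specified variance is tenable.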
Applications of Chi-Square distribution:-
To determine whether the population has a specified value of variance.
It can also be used for hypothesis testing at given degrees of freedom.
Precautions in the use of Chi-square test :-
If an expected frequency is less than 5, it should be pooled with a neighbouring frequency, or Yates' correction should be applied.
The chi-square test should not be applied to relative frequencies such as percentages or proportions.
It should be used only when data on both the occurrence and the non-occurrence of the event are available.
It should not be used when repeated measurements are made on the same unit or attribute.
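Yates' correction for a 2 × 2 table can be sketched as follows. This is a minimal illustration that assumes the usual continuity-corrected form, n(|ad - bc| - n/2)^2 / [(a+b)(c+d)(a+c)(b+d)]; clamping the corrected difference at zero is a choice made in this sketch, and the function name and counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table, optionally with Yates' continuity correction."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be greater than zero")
    diff = abs(a * d - b * c)
    if yates:
        # Subtract n/2 from |ad - bc|; clamp at zero so the correction
        # never overshoots (an assumption of this sketch)
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / denominator


# Hypothetical counts for a small 2 x 2 table (illustrative only)
a, b, c, d = 3, 9, 10, 4

uncorrected = chi_square_2x2(a, b, c, d)
corrected = chi_square_2x2(a, b, c, d, yates=True)
print(f"uncorrected chi-square = {uncorrected:.3f}")
print(f"Yates-corrected chi-square = {corrected:.3f}")
```

The corrected value is never larger than the uncorrected one, which makes the test more conservative when frequencies are small.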
Conclusion:-
The chi-square test and its strength measure, Cramér's V, are straightforward to compute, which makes them useful where statistical programs are not readily available. However, the chi-square test by itself only evaluates whether an association is present; it does not indicate the strength of the association or whether the relationship is causal.