Chi-square, like any analysis, has its limitations. One limitation is that all participants measured must be independent and counted only once, meaning that an individual cannot fit in more than one category. If a participant can fit into two categories, a chi-square analysis is not appropriate. In our tomato plant example, if a plant, when measured, can be put in more than one box, a chi-square statistic is not appropriate. So each plant must be either resistant or susceptible, and must show just one banding pattern (A, B or H).
Another limitation of chi-square is that the data must be frequency data, that is, counts. For example, if you are simply counting how many tomato plants show resistance to bacterial spot versus how many show susceptibility, then a chi-square is appropriate. In addition, the expected number of individuals in each class should be greater than 5 for the most appropriate use of chi-square. Another consideration is that the chi-square statistic is sensitive to sample size. Many authors recommend that chi-square not be used if the sample size is less than 50 or, in this example, fewer than 50 F2 tomato plants. If you have a 2x2 table with fewer than 50 cases, many recommend using Fisher’s exact test instead.
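The rules of thumb above can be checked before running the test. The following is a minimal sketch, not a definitive implementation: the function names, the 3:1 resistant-to-susceptible ratio, and all plant counts are assumptions made for illustration, and the two-sided Fisher's exact test shown is the common "sum of tables no more probable than the observed one" version.

```python
from math import comb

def expected_counts(total, ratio):
    """Split `total` plants across classes according to a hypothesized ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

def chi_square_concerns(observed, ratio, min_expected=5, min_total=50):
    """List the rules of thumb from the text that the data fail to meet."""
    total = sum(observed)
    concerns = []
    if total < min_total:
        concerns.append(f"sample size {total} is below {min_total}")
    for i, e in enumerate(expected_counts(total, ratio)):
        if e <= min_expected:
            concerns.append(f"class {i}: expected count {e:g} is not greater than {min_expected}")
    return concerns

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical counts (resistant, susceptible), assuming a 3:1 expected ratio
print(chi_square_concerns([78, 22], (3, 1)))  # no concerns raised
print(chi_square_concerns([9, 3], (3, 1)))    # too few plants, small expected class
print(fisher_exact_2x2([[3, 0], [0, 3]]))     # small 2x2 table
```

When the list of concerns is not empty and the data form a 2x2 table, the Fisher's exact test function offers the alternative mentioned in the text.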
Chi-square also assumes random sampling, so the tomato plants being measured must be selected randomly from the total population. Researchers also need to remember that the chi-square test does not give much information about the strength of a relationship. For example, one cannot say that a tomato plant's height is correlated with its leaf size simply by running a chi-square statistic.
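One way to see why the statistic says little about strength is to keep the proportions in a table fixed and only change the number of plants. The sketch below uses hypothetical counts (two groups of plants scored as resistant or susceptible) and a helper name chosen for illustration; it computes the Pearson chi-square statistic for a small table and for the same table with ten times as many plants.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications were unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are two groups of plants,
# columns are resistant / susceptible.
small = [[30, 20], [20, 30]]
# Same proportions in every cell, ten times as many plants
large = [[10 * count for count in row] for row in small]

small_stat = chi_square_statistic(small)
large_stat = chi_square_statistic(large)
print(small_stat, large_stat)
```

The pattern in the two tables is identical, yet the statistic grows in step with the sample size, so a large chi-square value by itself does not indicate a strong relationship.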
While chi-square does have limitations, it also has a number of strengths. One of its greatest strengths is that it is easier to compute than some other statistics. It can also be used with data that have been measured on a nominal (categorical) scale, and it can be used to see if there is a “difference” between two or more groups of participants. For example, one could see if there is an association between the size of a tomato fruit and the number of fruit produced on a single plant, provided both traits are recorded as categories (such as small or large fruit, and few or many fruit). Another strength is that chi-square makes no assumptions about the distribution of the population, whereas other statistics assume certain characteristics of that distribution, such as normality.
Large Datasets
This tomato breeding case study demonstrates the use of chi-square in two relatively simple scenarios. There may be times when the data set is much more complex and you’ll want to use a software program to do the calculation rather than working it out by hand. One common program is SAS.
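The case study names SAS; as a language-neutral illustration of what such a program computes, here is a minimal Python sketch of a goodness-of-fit test using only the standard library. The function names, the 3:1 ratio, and the plant counts are assumptions for illustration, and the degrees of freedom are taken as the number of classes minus one, which assumes no parameters were estimated from the data.

```python
import math

def chi_square_sf(x, df):
    """Upper-tail probability of the chi-square distribution (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 case, then add one term per extra 2 df
    total = math.erfc(math.sqrt(half))
    for j in range((df - 1) // 2):
        total += math.exp(-half) * half ** (j + 0.5) / math.gamma(j + 1.5)
    return total

def goodness_of_fit(observed, ratio):
    """Return (chi-square, df, p-value) for counts against a hypothesized ratio."""
    total = sum(observed)
    parts = sum(ratio)
    expected = [total * r / parts for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # assumes no parameters estimated from the data
    return stat, df, chi_square_sf(stat, df)

# Hypothetical F2 counts (resistant, susceptible) tested against 3:1
stat, df, p = goodness_of_fit([78, 22], (3, 1))
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")
```

For larger or more complex tables, a dedicated statistics package is usually a better choice than hand-written code like this, since it also handles data input, missing values, and reporting.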