Chi Square test
- In this section, we will learn how to interpret and use the Chi-square test in SPSS. Chi-square test is also known as the Pearson chi-square test because it was given by one of the four most genius of statistics Karl Pearson.
- The Chi-square test is a non-parametric test for testing the significant differences between group frequencies. Often when we work with data, we get the data not always in continuous data format. Data might come in the format of frequencies. For example, how many males, how many females are working in an office. Do they significantly differ from each other, which means there are significantly more males working in a job or office as compared to females.
- For example, suppose we are a government servant, and we want to convey a message that in sectors that are considered hazards like mining and all, there might be more male workers employed than female workers. So, in that case, we might compute a Pearson chi-square test and convey the result that there are significantly more male workers in mining as compared to female workers because the job is of a hazards nature.
- Suppose we are working in an IT sector or in some other sector where an equal number of males and females are supposed to be employed. In that case, we might be interested in proving that there is no significant difference between the number of males and females working in the office. So, in that case, we can use our data and calculate the Pearson chi-square test to covey our result. Since the number of males and females comes in the format of frequency so we cannot apply any parametric tests like ANOVA or t-test. So the chi-square test can be used only when we have our data in the format of frequencies.
- Frequencies can be as simple as the example I gave you as the number of males and the number of females. But if we take multiple groups, in that case, the frequencies might become complex. But chi-square makes our job easy by simply calculating the result. Let us look at this data:
We have taken this data from the SPSS folder. There is bank loan data, and we are having variables like age, education, years with current employer, address, income and other such variables along with the loan default. The researcher might be interested in finding out whether highly educated persons default less loan as compared to persons who are less educated. So we are having default in the format of frequency whether a person defaulted or not. In our case, not default is measured as 0 and default is measured as 1.
For education, we are having various categories ranging from 1 to 5 like this:
So it could be very interesting to find out if people across different educational categories significantly differ from each other in terms of loan default. In our case, the independent variable is educational qualification or educational category and loan default is our dependent variable. The independent variable and dependent variable are both coming in the format of frequency. So we cannot apply any parametric tests like t-test or ANOVA. In this case, a suitable test will be the Pearson chi-square test.