Statistical evaluation of panel repeatability in Check-All-That-Apply questions
Methodologies for evaluating panel repeatability in Check-All-That-Apply (CATA) questions are reviewed and developed. First, the limitations of using McNemar’s test, as suggested elsewhere for the evaluation of repeatability, are demonstrated through simple examples. Alternative approaches are then suggested and discussed: the binomial test, Gwet’s AC1 statistic, and Pearson’s χ² goodness-of-fit test. These methodologies are applied to previously published orange juice data. The advantage of the binomial test and Pearson’s χ² goodness-of-fit test is their availability in most statistical software packages. The advantage of Gwet’s AC1 statistic and Pearson’s χ² goodness-of-fit test is that both are easily generalized to more than 2 replications. Pearson’s χ² goodness-of-fit test thus combines wide availability with generalizability to more than 2 replications, but it is sensitive to very low expected frequencies. For this reason we suggest using Gwet’s AC1 statistic, which does not share this limitation.
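As an illustration of the recommended statistic, the following minimal sketch computes Gwet’s AC1 for the simplest case of two replications of one binary CATA attribute. It uses the standard two-category form of AC1, with the two replications treated as the "raters" and each assessor's evaluation as one item. That framing, the function name gwet_ac1, and the toy data are assumptions made for illustration; they are not taken from the paper and do not reproduce the orange juice analysis.

```python
def gwet_ac1(rep1, rep2):
    """Gwet's AC1 for two replications of a binary (0/1) CATA attribute.

    rep1, rep2: equal-length sequences of 0/1 values, one pair per
    assessor (hypothetical layout chosen for this sketch).
    """
    if len(rep1) != len(rep2) or len(rep1) == 0:
        raise ValueError("rep1 and rep2 must be non-empty and of equal length")
    n = len(rep1)

    # Observed agreement: share of pairs with the same response twice.
    p_a = sum(a == b for a, b in zip(rep1, rep2)) / n

    # Mean proportion of "checked" responses across both replications.
    pi = (sum(rep1) + sum(rep2)) / (2 * n)

    # Chance agreement for two categories: 2 * pi * (1 - pi).
    # This never exceeds 0.5, so the denominator below is never zero.
    p_e = 2 * pi * (1 - pi)

    return (p_a - p_e) / (1 - p_e)


# Toy data (invented): 8 assessors, attribute checked (1) or not (0).
replication_1 = [1, 1, 1, 0, 0, 0, 1, 0]
replication_2 = [1, 1, 0, 0, 0, 0, 1, 1]
print(gwet_ac1(replication_1, replication_2))  # 0.5
```

In the toy data, 6 of 8 responses agree (observed agreement 0.75) and half of all responses are checked (chance agreement 0.5), giving AC1 = 0.5. The chance-agreement term stays bounded even when an attribute is almost never checked.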
Meyners, M., Castura, J. C., & Worch, T. (2016). Statistical evaluation of panel repeatability in Check-All-That-Apply questions. Food Quality and Preference, 49, 197–204. https://doi.org/10.1016/j.foodqual.2015.12.010