
Blog

Dr. William Dardick and Dr. Brandi Weiss published an article titled, "An investigation of chi-square and entropy-based methods of item-fit using item-level contamination in item response theory" in the Journal of Modern Applied Statistical Methods, 18(2), EP3208. (DOI: 10.22237/jmasm/1604190480).

Abstract

New variants of entropy as measures of item-fit in item response theory are investigated. Monte Carlo simulations examine aberrant conditions of item-level misfit to evaluate relative performance (comparing EMRj with X2, G2, S-X2, and PV-Q1) and absolute performance (Type I error and empirical power). EMRj has utility in discovering misfit.

(Publication Date: October 2, 2020)

Keywords: item response theory, IRT item-fit, IRT model fit, Monte Carlo simulation.
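The abstract does not give the formula for EMRj, but the general idea of entropy-based item fit can be sketched. The Python snippet below is a minimal illustration, assuming a 2PL IRT model: the binary entropy of each model-implied response probability is averaged over examinees to give one value per item. The function names and the averaging rule are ours for illustration; they are not the published EMRj statistic.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL IRT model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_entropy(theta, a, b, eps=1e-12):
    """Average binary entropy of the model-implied response probabilities
    for each item. Higher values mean the model classifies responses to
    that item less decisively. Illustrative only; not the published EMRj."""
    p = p_2pl(theta[:, None], a[None, :], b[None, :])  # persons x items
    p = np.clip(p, eps, 1 - eps)
    ent = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return ent.mean(axis=0)  # one value per item

# Tiny illustration: 500 simulated examinees, 5 items.
rng = np.random.default_rng(0)
theta = rng.normal(size=500)
a = np.array([0.8, 1.0, 1.2, 1.5, 2.0])    # discriminations
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # difficulties
print(item_entropy(theta, a, b))
```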


Drs. Brandi A. Weiss and William Dardick published an article titled, "Making the cut: Comparing methods for selecting cut-point location in logistic regression" in the Journal of Experimental Education: Measurement, Statistics, and Research Design (DOI: 10.1080/00220973.2019.1689375).

Classification measures and entropy variants can be used as indicators of model fit for logistic regression. These measures rely on a cut-point, c, to determine predicted group membership. While recommendations exist for determining the location of the cut-point, these methods are primarily anecdotal. The current study used Monte Carlo simulation to compare misclassification rates and entropy variants across four cut-point selection methods: default 0.5, MAXCC, nonevent rate, and MAXSS. Minimal differences were found between methods when group sizes were equal or large between-groups differences were present. The MAXSS method was invariant to group size ratios but yielded the highest total misclassification rate and the highest amount of misfit. The 0.5 and MAXCC methods are recommended for use in applied research. Recommendations are provided for researchers concerned with small-group classification who may use the MAXSS method. EFR and EFR-rescaled were less influenced by cut-point location than classification methods.

Keywords: logistic regression; model-fit; cut-point methods; entropy; classification; misclassification.
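For readers unfamiliar with the four cut-point rules, the sketch below gives one common reading of each in Python; the helper names and the grid search are ours, not the paper's code. Default 0.5 uses a fixed cut-point; the nonevent-rate rule sets c to the sample proportion of nonevents; MAXCC picks the c that maximizes the overall correct classification rate; MAXSS picks the c that maximizes sensitivity plus specificity.

```python
import numpy as np

def classification_rates(y, p, c):
    """Sensitivity, specificity, and overall correct classification
    rate at cut-point c, given outcomes y and fitted probabilities p."""
    pred = (p >= c).astype(int)
    sens = np.mean(pred[y == 1] == 1)
    spec = np.mean(pred[y == 0] == 0)
    cc = np.mean(pred == y)
    return sens, spec, cc

def select_cutpoint(y, p, method="0.5"):
    """Candidate cut-point selection rules; common readings of the
    abstract's method names, not the paper's code."""
    grid = np.linspace(0.01, 0.99, 99)
    if method == "0.5":
        return 0.5
    if method == "nonevent":  # sample proportion of nonevents
        return np.mean(y == 0)
    if method == "MAXCC":     # maximize correct classification
        return max(grid, key=lambda c: classification_rates(y, p, c)[2])
    if method == "MAXSS":     # maximize sensitivity + specificity
        return max(grid, key=lambda c: sum(classification_rates(y, p, c)[:2]))
    raise ValueError(method)

# Illustration with simulated data; in practice p would come from a
# fitted logistic regression model rather than the true probabilities.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
y = rng.binomial(1, p)
for m in ("0.5", "nonevent", "MAXCC", "MAXSS"):
    c = select_cutpoint(y, p, m)
    print(m, round(float(c), 3), classification_rates(y, p, c))
```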

Drs. Brandi A. Weiss and William Dardick published an article titled, "Separating the odds: Thresholds for entropy in logistic regression" in the Journal of Experimental Education (DOI: 10.1080/00220973.2019.1587735).

Researchers are often reluctant to rely on classification rates because a model with favorable classification rates but poor separation may not replicate well. In comparison, entropy captures information about borderline cases unlikely to generalize to the population. In logistic regression the correctness of predicted group membership is known; however, this information has not yet been utilized in entropy calculations. The purpose of this study was to (1) introduce three new variants of entropy as approximate model-fit measures, (2) establish rule-of-thumb thresholds to determine whether a theoretical model fits the data, and (3) investigate empirical Type I error and statistical power associated with those thresholds. Results are presented from two Monte Carlo simulations. Simulation results indicated that EFR-rescaled was the most representative of overall model effect size, whereas EFR provided the most intuitive interpretation for all group size ratios. Empirically derived thresholds are provided.

Keywords: classification; cut-point methods; entropy; logistic regression; model-fit; misclassification.
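The exact EFR and EFR-rescaled formulas are given in the article; as a rough illustration of the underlying idea, the sketch below computes the binary entropy of each fitted probability and a hypothetical fit ratio bounded between 0 and 1, where well-separated fitted probabilities score near 1 and borderline ones near 0. The published measures may differ in detail.

```python
import numpy as np

def case_entropy(p, eps=1e-12):
    """Binary entropy of each fitted probability (log base 2, so each
    value lies in [0, 1]; 0 at p near 0 or 1, 1 at p = 0.5)."""
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def entropy_fit_ratio(p):
    """A hypothetical entropy fit ratio: 1 minus the average case
    entropy. Near 1 = clean separation, near 0 = mostly borderline
    cases. The published EFR/EFR-rescaled formulas may differ."""
    return 1.0 - case_entropy(p).mean()

# Well-separated vs. borderline fitted probabilities.
print(entropy_fit_ratio(np.array([0.05, 0.95, 0.02, 0.98])))  # near 1
print(entropy_fit_ratio(np.array([0.45, 0.55, 0.50, 0.52])))  # near 0
```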

Drs. Brandi A. Weiss and William R. Dardick will present their research, "Detecting Differential Item Functioning with Entropy in Logistic Regression," at the Frontiers in Educational Measurement conference at the University of Oslo, Norway, in September 2018.

In this talk we will discuss the adaptation of four entropy variants to detect differential item functioning (DIF) in logistic regression (LR): entropy (E), entropy misfit (EM), the entropy fit ratio (EFR), and a rescaled entropy fit ratio (Rescaled-EFR). Logistic regression is frequently used to detect DIF due to its flexibility for use with uniform and nonuniform DIF, binary and polytomous LR, and groups with 2+ categories. In this talk we will focus on binary LR models with two groups (reference and focal); however, we will also discuss the use of entropy with polytomous LR models and models with 2+ focal groups. We will present both a mathematical framework and results from a Monte Carlo simulation.
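As context for the talk, the sketch below shows the conventional nested-model LR DIF procedure that the entropy variants are meant to complement: three logistic models are fit per item (ability only; plus group; plus ability-by-group), and uniform and nonuniform DIF are tested with likelihood-ratio G2 statistics. This is the standard procedure, not the authors' code; the simulated data and function names are ours.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def lr_dif_test(item, ability, group):
    """Nested-model logistic-regression DIF tests for one item:
    the group main effect tests uniform DIF, and the ability-by-group
    interaction tests nonuniform DIF, each via a 1-df G2 test."""
    X1 = sm.add_constant(np.column_stack([ability]))
    X2 = sm.add_constant(np.column_stack([ability, group]))
    X3 = sm.add_constant(np.column_stack([ability, group, ability * group]))
    ll = [sm.Logit(item, X).fit(disp=0).llf for X in (X1, X2, X3)]
    g2_uniform = 2 * (ll[1] - ll[0])
    g2_nonuniform = 2 * (ll[2] - ll[1])
    return {
        "uniform_p": stats.chi2.sf(g2_uniform, df=1),
        "nonuniform_p": stats.chi2.sf(g2_nonuniform, df=1),
    }

# Simulated item with uniform DIF: group shifts the logit given ability.
rng = np.random.default_rng(2)
n = 2000
ability = rng.normal(size=n)
group = rng.binomial(1, 0.5, size=n)         # 0 = reference, 1 = focal
logit = 0.2 + 1.0 * ability + 0.6 * group
item = rng.binomial(1, 1 / (1 + np.exp(-logit)))
print(lr_dif_test(item, ability, group))
```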

A fair test is free of measurement bias and construct-irrelevant variance. When groups are found to differ on an underlying construct, test fairness may be impacted. DIF may help identify potentially biased items. While dichotomous measures of statistical significance have traditionally been used to detect DIF in LR (e.g., χ2 and G2), more recent work has emphasized the importance of simultaneously examining measures of effect size. Model fit statistics can be thought of as a type of effect size. Previously, entropy has been used to capture the separation between categories and is expressed as a single measure of approximate data-model fit in latent class analysis, data-model fit in binary logistic regression, person-misfit in item response theory (IRT), and item-fit in IRT. Entropy captures discrimination between categories and can be thought of as a measure of uncertainty that may be useful in conjunction with other measures of DIF. In this presentation we extend entropy for use as a measure to detect DIF that complements currently utilized DIF measures.

Monte Carlo simulation results will be presented to demonstrate the usefulness of entropy-based measures to detect DIF, with a specific focus on model comparison and changes in entropy variants. We evaluate the following variables across 1,000 replications per condition: sample size, group size ratio, between-groups impact (i.e., difference in ability distributions), percentage of DIF items in the test, type of DIF (uniform vs. nonuniform), and amount of DIF. Results will be presented comparing entropy variants to current measures used to detect DIF in LR (e.g., χ2, G2, ΔR2, the difference in probabilities, and the delta log odds ratio). Statistical power and Type I error rates will be discussed.

Entropy-based measures may be advantageous for detection of DIF by providing a more thorough examination of between-group differences. More specifically, entropy exists on a continuum, thus representing the degree to which DIF may be present; does not rely on dichotomous hypothesis testing; has an intuitive interpretation because values are bounded between 0 and 1; and can simultaneously be used as an absolute measure of fit and a relative measure for between-groups comparisons.

Drs. William Dardick and Brandi Weiss published an article titled, "Entropy-based measures for person fit in item response theory" in Applied Psychological Measurement (DOI: 10.1177/0146621617698945).

This article introduces three new variants of entropy to detect person misfit (Ei, EMi, and EMRi), and provides preliminary evidence that these measures are worthy of further investigation. Previously, entropy has been used as a measure of approximate data–model fit to quantify how well individuals are classified into latent classes, and to quantify the quality of classification and separation between groups in logistic regression models. In the current study, entropy is explored through conceptual examples and Monte Carlo simulation comparing entropy with established measures of person fit in item response theory (IRT) such as lz, lz*, U, and W. Simulation results indicated that EMi and EMRi were successfully able to detect aberrant response patterns when comparing contaminated and uncontaminated subgroups of persons. In addition, EMi and EMRi performed similarly in showing separation between the contaminated and uncontaminated subgroups. However, EMRi may be advantageous over other measures when subtests include a small number of items. EMi and EMRi are recommended for use as approximate person-fit measures for IRT models. These measures of approximate person fit may be useful in making relative judgments about potential persons whose response patterns do not fit the theoretical model.
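The article defines Ei, EMi, and EMRi precisely; as a rough illustration of entropy-style person fit, the sketch below scores each person by the mean surprisal of their observed responses under a fitted 2PL model, so that aberrant patterns (missing easy items while passing hard ones) receive larger values. This is an illustrative analogue under our own assumptions, not the published formulas.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def person_misfit(responses, theta, a, b, eps=1e-12):
    """Mean surprisal (-log2 probability) of each person's observed
    responses under the fitted 2PL model. Illustrative only; not the
    published Ei/EMi/EMRi statistics."""
    p = np.clip(p_2pl(theta[:, None], a[None, :], b[None, :]), eps, 1 - eps)
    p_obs = np.where(responses == 1, p, 1 - p)  # prob of observed response
    return -np.log2(p_obs).mean(axis=1)         # one value per person

# A consistent and an aberrant pattern on five items of rising difficulty.
a = np.ones(5)
b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
theta = np.zeros(2)
responses = np.array([[1, 1, 1, 0, 0],   # consistent with theta = 0
                      [0, 0, 1, 1, 1]])  # reversed: aberrant
print(person_misfit(responses, theta, a, b))  # second value is larger
```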