Theses Doctoral

Growth Mixture Modeling with Non-Normal Distributions - Implications for Class Imbalance

Han, Lu

Previous simulation studies of non-normal growth mixture models (GMMs) have rarely examined the effects of a high degree of class imbalance. To extend this work, the present study uses Monte Carlo simulation to examine the impact of a highly imbalanced class proportion (i.e., 0.90/0.10) on the performance of four distributions (i.e., normal, t, skew-normal, and skew-t) in estimating non-normal GMMs.

To fulfill this purpose, a Monte Carlo simulation was conducted based on a two-class skew-t growth mixture model under different conditions of sample size (1000, 3000), class proportions (0.90/0.10, 0.50/0.50), skewness for the intercept (1, 4), kurtosis (2, 6), and class separation (high, low), with estimation using the four distributions (i.e., normal, t, skew-normal, and skew-t). A further aim of the present study was to assess the ability of various model fit indices and likelihood ratio test (LRT)-based tests (i.e., AIC, BIC, sample size-adjusted BIC (sBIC), Lo-Mendell-Rubin LRT (LMR-LRT), LMR-adjusted LRT, and entropy) to detect non-normal GMMs under a high degree of class imbalance (0.90/0.10).
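The data-generating conditions described above can be enumerated as a grid. The sketch below is illustrative only: it assumes the five factors are fully crossed and that every condition is estimated with all four distributions (the abstract lists the factor levels but does not state the crossing explicitly). It also computes the expected size of the minority class, which shows why the 0.90/0.10 condition is demanding at n = 1000.

```python
from itertools import product

# Factor levels as listed in the abstract.
SAMPLE_SIZES = (1000, 3000)
CLASS_PROPORTIONS = ((0.90, 0.10), (0.50, 0.50))
INTERCEPT_SKEWNESS = (1, 4)
KURTOSIS = (2, 6)
SEPARATION = ("high", "low")
ESTIMATION_DISTRIBUTIONS = ("normal", "t", "skew-normal", "skew-t")


def design_grid():
    """Return every data-generating condition.

    Assumption: the five factors are fully crossed (not stated
    explicitly in the abstract).
    """
    return list(product(SAMPLE_SIZES, CLASS_PROPORTIONS, INTERCEPT_SKEWNESS,
                        KURTOSIS, SEPARATION))


def expected_minority_size(n, proportions):
    """Expected number of cases in the smaller latent class."""
    return round(n * min(proportions))


if __name__ == "__main__":
    grid = design_grid()
    print(len(grid), "data-generating conditions")
    # Assumption: each condition is fitted with all four distributions.
    print(len(grid) * len(ESTIMATION_DISTRIBUTIONS), "condition-by-distribution cells")
    for n in SAMPLE_SIZES:
        print("n =", n, "-> expected minority class size:",
              expected_minority_size(n, (0.90, 0.10)))
```

Under the full-crossing assumption this gives 2^5 = 32 data-generating conditions and 128 condition-by-distribution cells; with a 0.90/0.10 split, the minority class is expected to hold about 100 cases at n = 1000 and about 300 at n = 3000.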

The results indicate that:

(1) the skew-t distribution is highly recommended for estimating non-normal GMMs under high class separation with highly imbalanced class proportions of 0.90/0.10, irrespective of sample size, skewness for the intercept, and kurtosis;
(2) for low class separation with high class imbalance (0.90/0.10), the normal distribution is highly recommended based on the AIC, BIC, and sBIC, while the skew-t distribution is most recommended based on entropy;
(3) poor class separation significantly reduces the performance of every distribution for estimating non-normal GMMs with high class imbalance, especially the skew-t and t GMMs;
(4) insufficient sample size significantly reduces the performance of the skew-t and t distributions for estimating non-normal GMMs with high class imbalance;
(5) high class imbalance (0.90/0.10) and poor class separation significantly reduce the ability of the LRT-based tests for all distributions across conditions;
(6) excessive skewness for the intercept significantly decreases the ability of most fit indices when estimating non-normal GMMs with high class imbalance, affecting the skew-t (BIC and LRT-based tests), t (AIC, BIC, sBIC, and LRT-based tests), skew-normal (AIC and BIC), and normal (LRT-based tests) distributions;
(7) excessive kurtosis has a partial negative effect on the performance of the skew-t (AIC, BIC, and LRT-based tests) and t (AIC, BIC, sBIC, and LRT-based tests) distributions when skewness for the intercept is also excessive; and
(8) for the highly imbalanced class proportions of 0.90/0.10, the sBIC and entropy for the skew-t distribution outperform the other fit indices under high class separation, while the AIC, BIC, and sBIC for the normal distribution and the entropy for the skew-t distribution are the most reliable fit indices under low class separation.
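For reference, the information criteria and entropy compared in these results can be computed from a fitted model's log-likelihood, its number of free parameters, the sample size, and the matrix of posterior class probabilities. The sketch below uses the standard textbook definitions (the sample size-adjusted BIC replaces n with (n + 2)/24, and entropy is the relative entropy commonly reported for mixture models). The function names and any inputs passed to them are hypothetical illustrations, not values or code from the study. Lower AIC, BIC, and sBIC indicate better fit, while entropy closer to 1 indicates clearer classification.

```python
import math
from typing import Sequence


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: -2 logL + 2p."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik: float, n_params: int, n: int) -> float:
    """Bayesian information criterion: -2 logL + p ln(n)."""
    return -2.0 * loglik + n_params * math.log(n)


def sabic(loglik: float, n_params: int, n: int) -> float:
    """Sample size-adjusted BIC: -2 logL + p ln((n + 2) / 24)."""
    return -2.0 * loglik + n_params * math.log((n + 2) / 24)


def relative_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Relative entropy of posterior class probabilities.

    E = 1 - sum_i sum_k (-p_ik ln p_ik) / (n ln K), treating 0 ln 0 as 0.
    Rows are individuals; columns are the K latent classes.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))
```

Perfectly certain assignments (each row a 1 and a 0) give an entropy of 1, while uniform 0.5/0.5 posteriors give 0. Because ln((n + 2)/24) is smaller than ln(n) for the sample sizes used here, the sBIC penalizes extra parameters less heavily than the BIC.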

Files

  • Han_tc.columbia_0055E_11448.pdf (application/pdf, 560 KB)

More About This Work

Academic Units
Human Development
Thesis Advisors
DeCarlo, Lawrence T.
Keller, Bryant
Degree
Ed.D., Teachers College, Columbia University
Published Here
September 11, 2024