A comparison of three approaches to covariate effects on latent factors

In educational and psychological research, it is common to use latent factors to represent constructs and then to examine covariate effects on these latent factors. Using empirical data, this study applied three approaches to covariate effects on latent factors: the multiple-indicator multiple-cause (MIMIC) approach, the multiple group confirmatory factor analysis (MG-CFA) approach, and the structural equation model trees (SEM Trees) approach. The MIMIC approach directly models covariate effects on latent factors. The MG-CFA approach allows testing of measurement invariance before latent factor means are compared. The more recently developed SEM Trees approach partitions the sample into homogeneous subsets based on the covariate space; model parameters are estimated separately for each subgroup. We applied the three approaches using an empirical dataset extracted from the eighth-grade U.S. data from the Trends in International Mathematics and Science Study 2019 database. All approaches suggested differences among mathematics achievement categories for the latent factor of mathematics self-concept. In addition, language spoken at home did not seem to affect students’ mathematics self-concept. Despite these general findings, the three approaches provided different pieces of information regarding covariate effects. For all models, we appropriately considered the complex data structure and sampling weights following recent recommendations for analyzing large-scale assessment data.


Background
In educational and psychological research, it is common to use latent factors to represent constructs. Latent factors are often established using the common factor model that includes both exploratory and confirmatory factor models. After factor models are run and tested against empirical data, there is usually a need for further analysis that involves effects of other covariates. For example, researchers may be interested in knowing whether the same factor structure would work for a normative sample vs. a referral sample (e.g., Parkin & Wang, 2021) or whether student sex and grade would be significant predictors of classroom engagement (e.g., Wang et al., 2014b). Within the framework of structural equation modeling (SEM), there are typically two methods for covariate

Page 3 of 18 Wang Large-scale Assessments in Education (2022) 10:26

CFA is a type of common factor model (Brown, 2006; Thurstone, 1947). The common factor model postulates that each measured variable is a linear function of one or more common factors and a unique variable. Once the common factor(s) are removed, the observed variables are uncorrelated with each other. The unique variable is a combination of measurement error and specific error that is due to the selection of the measured variable. Suppose there are data of N participants on p observed variables and the score for the ith person on the jth variable is denoted as $Y_{ij}$. The linear factor model can be written as

$$Y_{ij} = \nu_j + \sum_{k=1}^{m} \lambda_{jk}\,\eta_{ik} + \varepsilon_{ij}$$

In matrix form, the response vector of participant i can be written as

$$y_i = \nu + \Lambda\eta_i + \varepsilon_i \qquad (1)$$

where $y_i$ is a p × 1 vector of p observed variables, $\nu$ is the p × 1 vector of item intercepts, $\Lambda$ is a p × m matrix of factor loadings, $\eta_i \sim N(\kappa, \Phi)$ is an m × 1 vector of common factors and $\Phi$ is an m × m factor covariance matrix, $\varepsilon_i \sim N(0, \Theta)$ is a p × 1 vector of unique factors and $\Theta$ is a p × p matrix of unique variances and covariances.
Further, it is assumed that $E(\eta_i) = 0$, $E(\varepsilon_i) = 0$, and $\mathrm{Cov}(\eta_i, \varepsilon_i) = 0$. Under these assumptions, the population mean vector $\mu$ and the population covariance matrix $\Sigma$ of the p observed variables can be written, respectively, as

$$\mu = \nu + \Lambda\kappa \qquad (2)$$

$$\Sigma = \Lambda\Phi\Lambda' + \Theta \qquad (3)$$

where $\Sigma$ is a p × p population covariance matrix of the observed variables, $\Phi$ is an m × m factor covariance matrix, and $\Theta$ is a p × p matrix of unique variances and covariances.
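As a small numerical illustration of the mean and covariance structure just described (intercepts plus loadings times factor means, and loadings times factor covariance times transposed loadings plus unique covariance), the following sketch computes the model-implied moments for a one-factor model with three indicators. The function and variable names (`implied_moments`, `nu`, `lam`, `kappa`, `phi`, `theta`) and all numeric values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def implied_moments(nu, lam, kappa, phi, theta):
    """Return the model-implied mean vector and covariance matrix.

    nu: p intercepts; lam: p x m loadings; kappa: m factor means;
    phi: m x m factor covariance; theta: p x p unique covariance.
    """
    p = len(nu)
    lam_kappa = matmul(lam, [[k] for k in kappa])        # p x 1
    mu = [nu[j] + lam_kappa[j][0] for j in range(p)]
    common = matmul(matmul(lam, phi), transpose(lam))    # p x p common part
    sigma = [[common[i][j] + theta[i][j] for j in range(p)] for i in range(p)]
    return mu, sigma


# Hypothetical one-factor, three-indicator example (values are made up).
nu = [1.0, 2.0, 3.0]
lam = [[1.0], [0.5], [2.0]]
kappa = [2.0]
phi = [[4.0]]
theta = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

mu, sigma = implied_moments(nu, lam, kappa, phi, theta)
print(mu)
print(sigma)
```

In this toy example the off-diagonal elements of the implied covariance matrix come entirely from the common factor, which mirrors the statement that the observed variables are uncorrelated once the common factor is removed.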
Because latent factors are unobserved, it is necessary to set a location and metric for each latent factor. Two methods are commonly used: (a) putting the latent factors on a scale with a mean of zero and a standard deviation of 1; and (b) choosing a marker indicator and setting its loading to 1 and its intercept to 0. A third method, the effects coding method, imposes linear constraints on the unstandardized pattern coefficients to identify the model (Little et al., 2006).

Covariate effects on latent factors
Whereas CFA, as a measurement technique, is often used for scale development and validation (typically together with exploratory factor analysis, another common factor model; e.g., Pratscher et al., 2019), it is also widely used in examining covariate effects. These covariates could represent demographic differences among individuals, or they could be attitudinal, psychological, situational, or trait variables. For example, the researcher may be interested in whether age is related to the latent factor means (e.g., Frisby & Wang, 2016). Covariates could be observed variables, or they themselves could

MIMIC approach to covariate effects
For (observed) covariate effects on latent factors, based on Eq. (1), we further have

$$\eta_i = \alpha + \Gamma x_i + \zeta_i \qquad (4)$$

where $x_i$ is a q × 1 vector of observed covariates, $\Gamma$ is an m × q matrix of regression coefficients representing the covariate effects on latent factors, $\zeta_i$ is an m × 1 vector of disturbances, $\zeta_i \sim N(0, \Psi)$, and $\alpha$ is an m × 1 vector of intercepts of the latent factors that are typically set to be zero. The model parameter vector then is $\theta = (\nu, \Lambda, \kappa, \Phi, \Theta, \Gamma, \Psi, \alpha)$. For model identification, it is often the case that $\mathrm{diag}(\Phi) = I$, $\kappa = 0$, $\nu = 0$, $\alpha = 0$ (see Wu & Estabrook, 2016).
The MIMIC model is a single-group analysis and a special type of the full SEM model. In a MIMIC model, covariates directly affect the latent factor(s) and the path coefficients from the covariates to the latent factor(s) represent their effects. With a categorical covariate with more than two categories, some coding scheme (e.g., dummy coding) is used to create dummy variables. The effects of dummy variables on the latent factor(s) represent group differences, controlling for the other covariate(s).
The MIMIC approach is a direct extension of the linear regression model. The regular assumptions for regression models (independence of observations, linearity, and no correlations between covariates and the disturbance) also apply to the MIMIC model. A practical difference between the MIMIC model and the regression model is that the coefficients from dummy variables to the latent factor(s) in the MIMIC model should be standardized with respect to the latent factors because the scale of the latent factors is arbitrary, whereas in regression the unstandardized coefficients reflect group comparisons on the dependent variable.
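The dummy coding step mentioned above can be sketched as follows. This is a minimal illustration, not the code used in the study: the helper name `dummy_code` is hypothetical, and the category labels simply mirror the three home resources categories used later in this article, with "Some resources" as the reference group.

```python
def dummy_code(values, categories, reference):
    """Return one 0/1 indicator column per non-reference category.

    The reference category is represented by zeros on every indicator,
    so each coefficient compares one category with the reference group.
    """
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Hypothetical responses from four students.
levels = ["Many resources", "Some resources", "Few resources"]
sample = ["Many resources", "Some resources", "Few resources", "Some resources"]

dummies = dummy_code(sample, levels, reference="Some resources")
for name, column in dummies.items():
    print(name, column)
```

With k categories, k - 1 indicator variables enter the model as covariates, and each path coefficient is interpreted relative to the omitted reference group.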

Multiple group confirmatory factor analysis
When the covariates are categorical variables with a relatively small number of categories, their effects can be and often are examined using multiple group CFA (MG-CFA). The advantage of using MG-CFA is that cross-group equality of different types of parameters (e.g., factor means, factor variances, and covariances) can be tested. In addition to structural level parameters that involve latent factors and relationships between them, measurement level parameters, which represent relationships between latent factors and the observed indicator variables, are often investigated as well. There is a large body of literature on measurement invariance under the CFA framework, including both methodological work (e.g., Liu et al., 2017; Meredith, 1993; Millsap, 2011) and empirical applications (e.g., Chan et al., 2019).
MG-CFA for covariate effects can be thought of as an extension of the analysis of variance (ANOVA) for group differences on observed means. Population parameters for different groups are specified and tested, typically through null hypothesis significance testing (NHST). For ANOVA, the population parameters for NHST are the means, and the testing assumes that the groups have the same variance on the outcome variable in the population. When MG-CFA is used for covariate effects, the population parameters to be tested under NHST usually include mean differences on the latent factors (for identification purposes, the latent factor means for a reference group are usually constrained to zero), factor variances and covariances; however, other parameters can also be tested. For MG-CFA, group sizes should be large enough to run CFA using data from individual groups. In addition, when there are many groups, even small differences between model parameters would be statistically significant, although Bayesian methods could be used for testing measurement invariance among many groups (Muthen & Asparouhov, 2014). When the covariate is continuous, some categorization is necessary before conducting MG-CFA.
When a covariate x represents group membership, instead of explicitly modeling the effect of x on latent factors as in Eq. (4), the covariate is used to subset data in MG-CFA. When there are multiple covariates, the researcher can either run multiple MG-CFA models, each time with a single covariate, or construct groups based on these covariates before conducting MG-CFA. The latter method may suffer from small sample sizes when the data are sliced in more ways. With G groups, Eqs. (5) and (6) show the population mean vector and the population covariance matrix of the p observed indicator variables, respectively, for a specific group g.

$$\mu_g = \nu_g + \Lambda_g\kappa_g \qquad (5)$$

$$\Sigma_g = \Lambda_g\Phi_g\Lambda_g' + \Theta_g \qquad (6)$$
The parameter vector θ is expanded to include parameters for multiple groups. For model identification, it is necessary to constrain parameters for each group (Millsap, 2011). When there are no equality constraints across groups, identification constraints for each group are similar to those for single group CFA (e.g., identifying the scale of latent factors). With equality constraints across groups (e.g., equal factor loadings, equal item intercepts), identification constraints are typically different for one group (e.g., the first group) compared to the other groups.
The biggest advantage of using MG-CFA is testing equality of different types of parameters across groups (i.e., invariance testing). In fact, invariance testing has been increasingly used in the development and validation of scales that involve CFA (e.g., Wang et al., 2014b). Like the MIMIC approach, MG-CFA is a standard method in SEM software packages such as Mplus (Muthén & Muthén, 1998-2017) and the "lavaan" R package (Rosseel, 2012).

Decision trees
Decision trees, also called trees, classification and regression trees (CART; Breiman et al., 2017; Loh, 2011), or recursive partitioning, are methods to split (i.e., partition) the space of covariates into subsets. Response values are similar within each subset but different between subsets. The partitioning is repeated recursively until a stopping criterion indicates that no further split should be made. When the outcome is a categorical variable, classification trees are built; when the outcome is numeric, regression trees are built. Decision trees have been extended to incorporate parametric models (model-based recursive partitioning; MOB; Zeileis et al., 2008). With MOB, a stochastic model (e.g., a regression model), called the template model, is assumed, and the sample is split into groups with different values of model parameters. For example, if the template model is a regression model, the intercept and slopes may vary between subgroups according to some covariates. In an example of regressing achievement on motivation, the intercept and slope may differ for students with different socioeconomic status (SES); a tree could therefore use SES to divide participants based on differences in the regression model parameters.
MOB has been used to combine different stochastic models (e.g., Item Response Theory models) with decision trees (Brandmaier et al., 2013; Jeon & De Boeck, 2019; Merkle et al., 2014; Spratto et al., 2021; Wang et al., 2014a). SEM Trees (Brandmaier et al., 2013) combine recursive partitioning and SEM. SEM Trees use the likelihood ratio test or score-based tests to split observations based on covariates.

SEM Trees
For each covariate, data are split along all possible points of that covariate to create homogeneous groups according to some criteria (typically the likelihood but score-based tests are also available). These splits are binary splits, meaning that when a split happens, data are split into two groups (i.e., two nodes). For each candidate split, the log-likelihood values before and after the potential split are obtained. Because the model before the split is nested within the model after the split, a likelihood ratio test can be used to compare models. The partition of the covariate that leads to the greatest improvement in the model is retained. The process continues until a stopping criterion is reached. Stopping criteria could be a maximum tree depth, a minimum number of observations in a node, the p-value for the likelihood ratio test, etc.
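The split search described above can be sketched in a few lines. The sketch below is a deliberately simplified stand-in, not the algorithm of any SEM Trees package: the template model is a univariate normal model with a free mean and variance rather than an SEM, the function names (`normal_loglik`, `best_split`) and the data are hypothetical, and the p-value is the naive chi-square value with two degrees of freedom, without any correction for scanning several cut points.

```python
import math


def normal_loglik(y):
    """Maximized log-likelihood of a normal model with free mean and variance.

    Assumes the values in y are not all identical (variance > 0).
    """
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # maximum likelihood estimate
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def best_split(y, x, min_node=2):
    """Scan every cut point of covariate x and return (cut, LR, p) for the
    binary split with the largest likelihood ratio statistic, or None."""
    full = normal_loglik(y)  # model before the split (one mean, one variance)
    best = None
    for cut in sorted(set(x))[:-1]:
        left = [yi for yi, xi in zip(y, x) if xi <= cut]
        right = [yi for yi, xi in zip(y, x) if xi > cut]
        if len(left) < min_node or len(right) < min_node:
            continue  # stopping rule: minimum number of observations per node
        # The unsplit model is nested in the split model (2 extra parameters).
        lr = 2 * (normal_loglik(left) + normal_loglik(right) - full)
        p = math.exp(-lr / 2)  # chi-square survival function for df = 2
        if best is None or lr > best[1]:
            best = (cut, lr, p)
    return best


# Hypothetical data: the outcome jumps between covariate values 2 and 3.
y = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
x = [1, 1, 2, 2, 3, 3, 4, 4]

cut, lr, p = best_split(y, x)
print(cut, round(lr, 2), p)
```

Applying `best_split` recursively to each resulting node, and stopping when no candidate split passes the chosen criteria, would grow a tree in the spirit of the procedure described in this section.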
There are a few different packages that can be used to implement SEM Trees. The "semtree" R package (Brandmaier et al., 2013) implements a tree algorithm designed specifically for SEM. The package is based on the "OpenMx" package (Boker et al., 2011), a flexible R package that allows estimation of a wide variety of advanced multivariate statistical models including SEM. The "semtree" package can also be used together with "lavaan" (Rosseel, 2012), a popular R package for SEM. Another R package, "partykit" (Zeileis et al., 2008), is a general framework for MOB. To implement SEM Trees with the "partykit" package, some preliminary work is necessary to set up the SEM, which can be specified with "lavaan".
Both "semtree" and "partykit" are solely based on the R language. Another package, "MplusTrees", is based on Mplus (Muthén & Muthén, 1998-2017) and the "MplusAutomation" R package (Hallquist & Wiley, 2018), which serves as an interface between Mplus and R (Serang et al., 2021). Mplus Trees, taking advantage of the comprehensive Mplus software, allows users to specify complex SEM models using the regular Mplus syntax. Tree growth is governed by the complexity parameter (cp) because the package relies on the "rpart" package (Therneau & Atkinson, 1997). The cp reflects the relative improvement in model fit required for a split to be retained: if a candidate split improves the -2logL of the root node by a factor of at least cp, the split is made. The smaller the cp, the more complex the final tree is likely to be. Other stopping criteria, such as the minimum number of observations within a node needed to attempt a split, the minimum number of observations within a terminal node, the maximum depth of the tree, and the p-value for likelihood ratio tests, can also be used.
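One way to read the cp rule described above is as a threshold on the drop in -2logL, expressed relative to the -2logL of the root node. The sketch below encodes that reading only; it is an assumption about the rule as stated in the text, not the internal code of "MplusTrees" or "rpart". The function name `keep_split` and all -2logL values are hypothetical.

```python
def keep_split(root_m2ll, parent_m2ll, children_m2ll, cp):
    """Return True if a candidate split would be retained.

    root_m2ll:     -2logL of the template model in the root node
    parent_m2ll:   -2logL of the node being split
    children_m2ll: -2logL values of the two child nodes
    cp:            complexity parameter (relative improvement required)
    """
    improvement = parent_m2ll - sum(children_m2ll)
    return improvement >= cp * root_m2ll


# Hypothetical values: splitting the root lowers -2logL from 20000 to 19850.
root = 20000.0
children = [9950.0, 9900.0]

small_cp = keep_split(root, root, children, cp=0.001)  # needs a drop of about 20
large_cp = keep_split(root, root, children, cp=0.01)   # needs a drop of about 200
print(small_cp, large_cp)
```

The same candidate split is retained under the smaller cp and rejected under the larger one, which is why smaller cp values tend to produce larger trees.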

Dataset description
To illustrate the three approaches, we used an empirical dataset from TIMSS 2019 (Martin et al., 2020). Specifically, we used the eighth-grade U.S. data. We considered only a subset of variables, but we used data from all eighth graders who participated and were included in the public-use database. The sample size is 8,698. The variables for this study were seven indicators for the latent factor of mathematics self-concept, student's sex, home resources, language spoken at home, and mathematics achievement category. The seven indicator variables were: (a) I usually do well in mathematics; (b) Mathematics is not one of my strengths; (c) I learn things quickly in mathematics; (d) Mathematics makes me nervous; (e) I am good at working out difficult mathematics problems; (f) My teacher tells me I am good at mathematics; and (g) Mathematics makes me confused. They were rated on a 4-point Likert scale (1 = Agree a lot, 2 = Agree a little, 3 = Disagree a little, 4 = Disagree a lot). The four positively worded items (a, c, e, and f) were reverse coded so that a higher numeric rating would represent more mathematics self-concept. We coded the student sex variable as "0" for boys and "1" for girls. For the language spoken at home variable, always or almost always speaking English at home was coded as "1" and sometimes or never speaking English at home was coded as "0". Home resources was a categorical variable with three categories: "Many resources", "Some resources", and "Few resources". It was dummy coded with "Some resources" as the reference group for MIMIC analysis. There were five mathematics achievement categories based on the first plausible value of mathematics achievement (Level 1 = below 400, Level 2 = at or above 400 but below 475, Level 3 = at or above 475 but below 550, Level 4 = at or above 550 but below 625, Level 5 = at or above 625). These cutoffs and levels are from TIMSS (see Martin et al., 2020).
The mathematics achievement category variable was dummy coded with Level 3 as the reference group for MIMIC analysis. The study was reviewed by the Institutional Review Board at the author's university, which determined that this project does not constitute human subjects research according to the Department of Health and Human Services regulatory definitions. Informed consent is not applicable as this study analyzes publicly available data which do not include identifiable information.
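The recoding steps described in this section can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's data preparation code: the item labels "a" to "g", the function names (`recode_item`, `achievement_level`), and the example scores are hypothetical, while the reverse-coded items and the cutoffs (400, 475, 550, 625) are taken from the description above.

```python
POSITIVE_ITEMS = {"a", "c", "e", "f"}  # positively worded items, reverse coded
CUTOFFS = [400, 475, 550, 625]         # lower bounds of Levels 2 to 5


def recode_item(item, rating):
    """Return the rating so that higher values mean more self-concept.

    Ratings run from 1 (Agree a lot) to 4 (Disagree a lot); positively
    worded items are reversed (1 <-> 4, 2 <-> 3), others are left as is.
    """
    if rating not in (1, 2, 3, 4):
        raise ValueError("rating must be 1, 2, 3, or 4")
    return 5 - rating if item in POSITIVE_ITEMS else rating


def achievement_level(plausible_value):
    """Map a mathematics score to achievement Levels 1 to 5."""
    return 1 + sum(plausible_value >= c for c in CUTOFFS)


# "Agree a lot" (1) on a positive item becomes 4; a negative item is unchanged.
print(recode_item("a", 1), recode_item("b", 1))
# Scores exactly at a cutoff fall in the higher level ("at or above").
print([achievement_level(s) for s in (399.9, 400, 549.9, 550, 625)])
```

Counting how many cutoffs a score meets or exceeds keeps the "at or above" boundary rule in one place, so changing the cutoffs would not require rewriting the comparison logic.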

Data analysis
For all models, we appropriately considered the complex data structure and sampling weights following recent recommendations (Stapleton, 2006a; Wang et al., 2019). Missing data for SEM models, including CFA, MIMIC, MG-CFA, and the template SEM model for SEM Trees, were dealt with using full information maximum likelihood estimation, which uses both complete and partial data points. Specifically, observations with a missing value on any of the covariates involved in a particular analysis were deleted and observations with missing values on all indicator variables were removed. Missing data on the covariates for partitioning for the SEM Trees were dealt with using surrogate splits. These are the default methods for these approaches. All SEM models were estimated in Mplus using the "MplusAutomation" R package. For SEM Trees, we used the "MplusTrees" R package. The estimator for all SEM models was the robust maximum likelihood estimator (MLR), which adjusts standard errors of parameter estimates and rescales model chi-square values.
CFA without any covariates was applied to obtain an initial model that was later used as a template for MIMIC, MG-CFA, and SEM Trees. For the MIMIC model, all covariates, dummy coded if necessary, were tested simultaneously. For MG-CFA models, three measurement invariance tests (configural, metric, and scalar) were conducted for each of the covariates separately. We evaluated model fit in two ways. First, we evaluated the fit of each model against commonly used criteria: comparative fit index (CFI) ≥ 0.95, Tucker-Lewis index (TLI) ≥ 0.95, root mean square error of approximation (RMSEA) < 0.06, and standardized root mean square residual (SRMR) < 0.08 (Hu & Bentler, 1999). Second, we examined decreases in CFI and/or increases in RMSEA when testing factor loading and item intercept invariance. Chen (2007) and Cheung and Rensvold (2002) recommended that a decrease of at least 0.01 in CFI and an increase of at least 0.015 in RMSEA would suggest that the more constrained model fit the data significantly worse than the less constrained model. It should be noted that Δχ² tests for nested models that are based on the Satorra-Bentler scaled χ² values could have been used as well. However, it is well known that the Δχ² test, just like the χ² test, is sensitive to sample size. When the sample size is large, as is the case in this project, a small discrepancy would likely lead to a statistically significant Δχ² test.
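The two fit-evaluation steps described above can be written as simple decision rules. The sketch below is only an illustration of those cutoffs: the function names (`acceptable_fit`, `invariance_flags`) are hypothetical, the configural, metric, and scalar fit values are made up, and the two change criteria are reported separately because how they are combined is a judgment left to the analyst. The single-model check uses the MIMIC fit indices reported later in this article.

```python
def acceptable_fit(cfi, tli, rmsea, srmr):
    """Apply the cutoffs CFI >= 0.95, TLI >= 0.95, RMSEA < 0.06, SRMR < 0.08."""
    return cfi >= 0.95 and tli >= 0.95 and rmsea < 0.06 and srmr < 0.08


def invariance_flags(less_constrained, more_constrained):
    """Compare two nested models given as dicts with 'cfi' and 'rmsea'.

    Returns (cfi_flag, rmsea_flag): True when CFI drops by at least 0.01
    or RMSEA rises by at least 0.015. Differences are rounded to three
    decimals, the precision at which fit indices are usually reported.
    """
    cfi_drop = round(less_constrained["cfi"] - more_constrained["cfi"], 3)
    rmsea_rise = round(more_constrained["rmsea"] - less_constrained["rmsea"], 3)
    return cfi_drop >= 0.01, rmsea_rise >= 0.015


# Fit indices reported for the MIMIC model in this article.
mimic_ok = acceptable_fit(cfi=0.963, tli=0.951, rmsea=0.032, srmr=0.028)

# Hypothetical invariance sequence (values are made up for illustration).
configural = {"cfi": 0.970, "rmsea": 0.040}
metric = {"cfi": 0.968, "rmsea": 0.038}
scalar = {"cfi": 0.950, "rmsea": 0.058}

metric_flags = invariance_flags(configural, metric)
scalar_flags = invariance_flags(metric, scalar)
print(mimic_ok, metric_flags, scalar_flags)
```

In this made-up sequence, constraining the loadings barely changes fit, whereas constraining the intercepts triggers both flags, which is the pattern that would argue against scalar invariance.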
For SEM Trees in this study (i.e., Mplus Trees), we tested two cp values of 0.001 and 0.01, following recommendations by Serang et al. (2021). In addition, it is common to use other stopping criteria such as the minimum number of observations within a node needed to attempt a split, the minimum observations within a terminal node, the maximum depth of the tree, the p value for likelihood ratio tests, etc. For this study, we set the maximum depth of the tree to four, and the minimum number of observations in any terminal node to 100.

Confirmatory factor analysis
The original single-factor CFA model with seven indicators for the construct of mathematics self-concept did not fit the data well: CFI

Table 1 Model fit
The estimator used was MLR. All models except Model 1 had correlated residuals among the three negatively worded indicators. Correlated residuals were not constrained to be equal across groups in multiple group analysis. In all models, the first factor loading was fixed to 1 to identify the latent factor in each group. For configural invariance and metric invariance models, the latent factor mean was fixed to 0 and the latent factor variance was free in all groups. For scalar invariance models, the latent factor mean for the reference group was fixed to 0 and was free for the other groups; the latent factor variance was free in all groups. CFI = comparative fit index; TLI = Tucker-Lewis index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual

MIMIC model
The MIMIC model was based on the one-factor CFA with correlated residuals among the three negatively worded items. The model fit the data well: CFI = 0.963, TLI = 0.951, SRMR = 0.028, RMSEA = 0.032 with 90% CI [0.030, 0.035]. Regarding covariate effects, girls had statistically significantly lower mathematics self-concept than boys (standardized coefficient = -0.107, p < 0.001). Language spoken at home and home resources did not have statistically significant effects on mathematics self-concept. Students in lower mathematics achievement categories (Levels 1 and 2) had statistically significantly lower mathematics self-concept than those in the middle category (Level 3). Students in higher mathematics achievement categories (Levels 4 and 5) had statistically significantly higher mathematics self-concept than those in the middle category (Level 3).

Table 1 has model fit information. Based on the model fit of individual MG-CFA models as well as model comparisons, scalar invariance existed between boys and girls, between those who mainly spoke English at home and those who mainly spoke another language at home, and between students with many, some, or few home resources. However, measurement invariance did not seem to exist among the achievement category groups. In other words, students' mathematics self-concept could be compared across sex, language, and home resources groups, but not among the achievement groups. Further examination suggested that girls had lower mathematics self-concept than boys. Students' mathematics self-concept did not differ statistically significantly between those who mainly spoke English at home and those who spoke another language at home. Students with many resources at home had significantly higher mathematics self-concept than those with some or few resources at home.

SEM Trees
The template CFA model was the single-factor CFA with correlated residuals among the three negatively worded items (i.e., Model 2 in Table 1). The covariates were the same as those in the MIMIC model and the grouping variables for the MG-CFA models. We grew two trees. The first had a cp of 0.001, and the second had a cp of 0.01. Figures 1 and 2 show the final trees for cp = 0.001 and cp = 0.01, respectively. Figure 2 contains only the first three nodes and is therefore a subtree of Fig. 1. In both trees, the initial split was based on mathematics achievement categories. Those in Levels 1, 2, and 3 were more homogeneous than those in Levels 4 and 5. In Fig. 1, following the initial split, those in the lower mathematics achievement categories (Levels 1, 2, and 3) were further split by whether they were in the lowest mathematics achievement category (Level 1), and then by whether they were boys or girls. Those in the higher mathematics achievement categories (Levels 4 and 5) were also split by sex. Variables for the language spoken at home and home resources did not show up as split variables in the tree.
There were six terminal nodes in the first tree and two terminal nodes in the second tree. The estimates of parameters (factor loadings, item intercepts, variance of the latent factor, residual variances, and residual covariances for the three negatively worded items) for the terminal nodes are in Tables 2 and 3 for Tree 1 and Tree 2, respectively. One advantage of SEM Trees is that we do not have to specify the interaction effects beforehand. Instead, the tree would identify the interaction effects. From Tree 1 (Fig. 1), the covariate effects can be thought of as interaction effects. There was an interaction effect between achievement category and student sex on the factor structure of the latent factor of mathematics self-concept. From Tree 2 (Fig. 2), there was only a main effect of mathematics achievement category. The terminal nodes of both trees include the splitting of observations. The parameter estimates for those nodes should be consistent with those obtained for the model if we subset the sample according to the splitting rules from the trees.

Discussion
The MIMIC model and MG-CFA have been widely used to examine covariate effects on latent factors. In this study, we included an additional method, SEM Trees, to examine covariate effects. SEM Trees allow examination of nonlinear interaction effects through recursive partitioning of the sample. SEM Trees separate the template SEM model from potential covariates. The sample is split in the covariate space based on data, whereas the SEM model is theory-driven. Therefore, SEM Trees combine theory-based and data-based approaches and allow theory-driven exploration (Brandmaier et al., 2016).
We used an empirical dataset to illustrate the three approaches to covariate effects on latent factors. The empirical data were about eighth-grade students' mathematics self-concept and two personal (sex and mathematics achievement) and two environmental (language spoken at home and home resources) covariates. There were some consistent findings across the three approaches. First, mathematics achievement was related to mathematics self-concept, both in terms of the factor structure (lack of measurement invariance from MG-CFA and data partitioning in SEM Trees) and in terms of the magnitude (statistically significant coefficient in the MIMIC model). This is not surprising because when students responded to items on mathematics self-concept, they would likely evaluate their mathematics ability. In this study, we used mathematics achievement as a covariate and mathematics self-concept as the outcome. This is consistent with the line of inquiry of the big-fish-little-pond effect (BFLPE; e.g., in Koivuhovi et al., 2020; Wang, 2020; Wang & Bergin, 2017) that looks into both personal and contextual effects in the formation of academic self-concept. In contrast, in the research area of achievement motivation, self-concept is typically modeled as a predictor of mathematics achievement and studies have found relationships between the two constructs as well (e.g., Wang et al., 2012; Wigfield & Eccles, 2000).
The other consistent finding across the three approaches was that the language spoken at home did not seem to have an effect on mathematics self-concept. This may be good news as the U.S. is a "melting pot" with many different languages and cultures. However, the nonsignificant finding may be due to the academic subject. It would be interesting to see if the finding could be generalized to other academic subjects such as English language and literacy.
Regarding differences between sex groups, the factor structure of mathematics self-concept was likely to be similar (scalar invariance from MG-CFA, Mplus Tree 2) but that factor structure may be unstable for different achievement groups. Mplus Tree 1 suggests an interaction between mathematics achievement and student sex. In addition, from both MIMIC and MG-CFA, girls tended to have lower mathematics self-concept than boys. Other studies have also found gender differences favoring boys (e.g., Koivuhovi et al., 2020) but there is also research that showed no gender differences (e.g., Ghasemi & Burley, 2019). Having many home resources seemed to be beneficial from MG-CFA results, although home resources was not found to be a significant predictor of mathematics self-concept using the MIMIC or SEM Trees approaches. This could be due to the positive relationship between home resources and academic achievement. In MG-CFA, when home resources was examined, it was considered separately from the other covariates, whereas in MIMIC and SEM Trees, it was considered simultaneously with the other covariates.
One particular limitation of Mplus Trees is parameter constraints across nodes. After the sample is split, the SEM model is essentially estimated with each subset of the data. Although it is possible to fix and/or constrain parameters within the SEM model for each node, our understanding is that it is not possible to constrain and test parameter relationships between nodes. This could be an important limitation for analysis such as measurement invariance testing, for which equality constraints on parameters are routinely tested. The "semtree" package allows invariance testing through either global invariance (fixing selected parameters to the sample estimates before the tree is grown) or local invariance (chosen parameters cannot differ while growing the tree). Such invariance options are supported when OpenMx, but not lavaan, is used to specify the SEM model.

Table 2 Parameter estimates for terminal nodes from Mplus tree with complexity parameter of 0.001
The stopping criteria for tree growth were a maximum tree depth of four, a minimum number of observations in any terminal node of 100, and complexity parameter (cp) = 0.001. The first factor loading was fixed to 1 for model identification. Node 8 was the group of boys in mathematics achievement category 1. Node 9 was the group of girls in mathematics achievement category 1. Node 10 was the group of boys in mathematics achievement categories 2 and 3. Node 11 was the group of girls in mathematics achievement categories 2 and 3. Node 6 was the group of boys in mathematics achievement categories 4 and 5. Node 7 was the group of girls in mathematics achievement categories 4 and 5

As one reviewer pointed out, the mathematics achievement category variable is categorized based on the first plausible value. Technically speaking, that plausible value, along with the other four plausible values for the same measure, is not an observed variable; and results might change if a different plausible value were to be used. Plausible values in large-scale assessments are generated based on item response modeling, latent regression, and multiple imputation (von Davier, 2020). Multiple plausible values are typically used in analysis to allow for the calculation of total variances of estimates, which consist of within- and between-imputation variances. However, for the purpose of this article, we focus on comparing the three approaches rather than on the use of plausible values. Readers interested in best practices of using plausible values from large-scale assessments can refer to many resources on this topic. Among them are Rutkowski et al. (2014), von Davier et al. (2009), Wang (2020), and Wu (2005).

Table 3 Parameter estimates for terminal nodes from Mplus tree with complexity parameter of 0.01
The stopping criteria for tree growth were a maximum tree depth of four, a minimum number of observations in any terminal node of 100, and complexity parameter (cp) = 0.01. The first factor loading was fixed to 1 for model identification. Node 2 was the group of students in mathematics achievement categories 1, 2, and 3. Node 3 was the group of students in mathematics achievement categories 4 and 5
In this article, we only used observed covariates. Although conceptually the methods can be extended to include latent covariates, we think there are still some technical challenges. It is relatively easy to include latent covariates in the MIMIC approach. The MG-CFA approach inherently relies on observed grouping variables as covariates, although it can be extended to latent classes as covariates under the mixture modeling framework. It is challenging, in our opinion, to use latent, instead of observed, covariates with the SEM Trees approach due to the recursive partitioning of the sample. However, we are optimistic and look forward to new advancements in SEM (as an example, Merkle & Zeileis, 2013).
The data used were nationally representative and had a complex structure. Such large-scale assessment data are available for researchers to conduct substantive and methodological research (Rutkowski et al., 2014; Wang, 2017). There have been methodological advancements on how to analyze such data (e.g., Asparouhov, 2006; Asparouhov & Muthen, 2006; Hahs-Vaughn et al., 2011; Muthen & Satorra, 1995; Stapleton, 2006b; Trendtel & Robitzsch, 2020; Wu & Kwok, 2012), as well as software development (Bailey et al., 2020; Caro & Biecek, 2017; Oberski, 2014). In this study, we considered both the complex data structure and sampling weights, taking advantage of Mplus. The "lavaan" and "OpenMx" packages can both handle some aspects of complex data structures, but additional research is needed for more user-friendly tools.

Conclusions
Using empirical data from TIMSS 2019, this study applied MIMIC, MG-CFA, and SEM Trees approaches to covariate effects on latent factors. The MIMIC and MG-CFA approaches have been widely used in educational and psychological research. The SEM Trees approach is a more recent development. Applied researchers can take advantage of the new method with the availability of several packages. This study is one application that demonstrates the use of SEM Trees and hopefully will generate more interest in using it with large-scale assessment data.