In structural equation modeling, fit indices establish whether the model as a whole is acceptable. If it is, researchers then establish whether specific paths are significant. Acceptable fit indices do not imply that the relationships are strong. Indeed, high fit indices are often easier to obtain when the relationships between variables are weak rather than strong, because strong relationships increase the power to detect discrepancies between the data and the model's predictions.

Many of the fit indices are derived from the chi-square value. Conceptually, the chi-square value, in this context, represents the difference between the observed covariance matrix and the predicted or model covariance matrix.

Fit indices can be grouped into several classes, including:

- Discrepancy functions, such as the chi-square test, the relative chi-square, and the RMS
- Tests that compare the target model with the null model, such as the CFI, NFI, TLI, and IFI
- Information-theoretic goodness-of-fit measures, such as the AIC, BCC, BIC, and CAIC
- Non-centrality fit measures, such as the NCP

Many researchers, such as Marsh, Balla, and Hau (1996), recommend using a range of fit indices. Jaccard and Wan (1996) further recommend using indices from different classes; this strategy offsets the limitations of each individual index.

A model is regarded as acceptable if:

- The Normed Fit Index (NFI) exceeds .90 (Byrne, 1994) or .95 (Schumacker & Lomax, 2004)
- The Goodness of Fit Index (GFI) exceeds .90 (Byrne, 1994)
- The Comparative Fit Index (CFI) exceeds .93 (Byrne, 1994)
- The root mean square error of approximation (RMSEA) is less than .08 (Browne & Cudeck, 1993), and ideally less than .05 (Steiger, 1990). Alternatively, the upper bound of the RMSEA confidence interval should not exceed .08 (Hu & Bentler, 1998).
- The relative chi-square is less than 2 or 3 (Kline, 1998; Ullman, 2001)

These criteria are merely guidelines. For instance, in a field in which previous models have generated CFI values of only .70, a CFI of .85 represents progress and could reasonably be regarded as acceptable (Bollen, 1989).

The chi-square for the model is also called the discrepancy function, likelihood ratio chi-square, or chi-square goodness of fit. In AMOS, the chi-square value is called CMIN.

If the chi-square is not significant, the model is regarded as acceptable: the observed covariance matrix does not differ significantly from the covariance matrix predicted by the model.

If the chi-square is significant, the model is often, though not always, regarded as unacceptable. Many researchers disregard this index if the sample size exceeds about 200 and other indices indicate that the model is acceptable. This practice arises because the chi-square test presents several problems:

- Complex models, with many parameters, will tend to generate an acceptable fit
- If the sample size is large, the model will usually be rejected, sometimes unfairly
- When the assumption of multivariate normality is violated, the chi-square test is inaccurate. The Satorra-Bentler scaled chi-square, which is available in EQS, is often preferred, because it adjusts the chi-square for multivariate kurtosis.
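To see why large samples make rejection likely, the sketch below assumes the common maximum likelihood scaling chi-square = (N - 1) times F_min, where F_min is the minimized discrepancy; some programs use N instead. The values F_min = 0.10 and df = 24, and the sample sizes, are hypothetical. The `chi2_sf` helper is a hand-written series approximation of the chi-square upper-tail probability, used only to keep the example free of third-party dependencies.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Sums the power series of the regularized lower incomplete gamma function;
    adequate for the moderate values in this illustration.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a) + math.log(total))
    return max(0.0, 1.0 - lower)


def model_chi_square(f_min: float, n: int) -> float:
    # Assumed scaling: chi-square = (N - 1) * F_min (some programs use N instead).
    return (n - 1) * f_min


# Hypothetical model: identical misfit (F_min = 0.10, df = 24) at several sample sizes.
for n in (100, 200, 400, 800):
    chi2 = model_chi_square(0.10, n)
    print(f"N = {n}: chi-square = {chi2:.1f}, p = {chi2_sf(chi2, 24):.4f}")
```

The discrepancy is identical in every row, yet the chi-square grows in proportion to N. With these numbers the test is nonsignificant at N = 200 (chi-square = 19.9) but significant at N = 400 (chi-square = 39.9, above the .05 critical value of about 36.4 for 24 df).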

The relative chi-square is also called the normed chi-square. This value equals the chi-square index divided by the degrees of freedom. This index might be less sensitive to sample size. The criterion for acceptance varies across researchers, ranging from less than 2 (Ullman, 2001) to less than 5 (Schumacker & Lomax, 2004).
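A minimal sketch of the relative chi-square, checked against the range of cutoffs quoted above (2, 3, and 5). The chi-square of 85.0 on 40 df is a hypothetical result, not data from any study.

```python
def relative_chi_square(chi2: float, df: int) -> float:
    """Relative (normed) chi-square: the model chi-square divided by its degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    return chi2 / df


ratio = relative_chi_square(85.0, 40)  # hypothetical model
for cutoff in (2.0, 3.0, 5.0):
    verdict = "acceptable" if ratio < cutoff else "not acceptable"
    print(f"ratio {ratio:.3f} vs. cutoff {cutoff}: {verdict}")
```

The same model (ratio 2.125) fails the strictest cutoff but passes the more lenient ones, which illustrates how the verdict depends on the criterion chosen.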

The RMS, also called the RMR or RMSE, is the square root of the mean of the squared covariance residuals, that is, the differences between corresponding elements of the observed and predicted covariance matrices. Zero represents a perfect fit, but there is no upper limit.

Because it has no upper limit and depends on the scale of the variables, the RMS is difficult to interpret, and no consensus has been reached on the values that represent acceptable models. Some researchers therefore use the standardized version of the RMS (the SRMR) instead to overcome this problem.
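The sketch below computes the RMR and one common form of the SRMR, in which each residual is divided by the square root of the product of the two observed variances before averaging; some programs define the SRMR via correlation residuals instead. The 3 x 3 matrices are hypothetical. Rescaling every variable shows why the unstandardized version is hard to interpret.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def unique_pairs(p: int) -> List[Tuple[int, int]]:
    # Lower triangle including the diagonal: the p(p + 1)/2 non-redundant elements.
    return [(i, j) for i in range(p) for j in range(i + 1)]


def rmr(s: Matrix, sigma: Matrix) -> float:
    """Square root of the mean squared residual s_ij - sigma_ij over the unique elements."""
    pairs = unique_pairs(len(s))
    return math.sqrt(sum((s[i][j] - sigma[i][j]) ** 2 for i, j in pairs) / len(pairs))


def srmr(s: Matrix, sigma: Matrix) -> float:
    """Standardized RMR: each residual is divided by sqrt(s_ii * s_jj) before averaging."""
    pairs = unique_pairs(len(s))
    total = sum(((s[i][j] - sigma[i][j]) / math.sqrt(s[i][i] * s[j][j])) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


# Hypothetical observed (s) and model-implied (sigma) covariance matrices for 3 variables.
s = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]
sigma = [[1.0, 0.45, 0.4], [0.45, 1.0, 0.35], [0.4, 0.35, 1.0]]

# Rescale every variable by 10, so every covariance is multiplied by 100.
s10 = [[100 * x for x in row] for row in s]
sigma10 = [[100 * x for x in row] for row in sigma]

print(rmr(s, sigma), rmr(s10, sigma10))    # the RMR grows 100-fold
print(srmr(s, sigma), srmr(s10, sigma10))  # the SRMR is unchanged
```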

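These cutoffs are usually stated for the root mean square error of approximation (RMSEA), a related index: according to some researchers, the RMSEA should be less than .08 (Browne & Cudeck, 1993), and ideally less than .05 (Steiger, 1990). Alternatively, the upper bound of the RMSEA confidence interval should not exceed .08 (Hu & Bentler, 1998).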
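A minimal sketch of the RMSEA point estimate, assuming the common formula sqrt(max(chi-square - df, 0) / (df times (N - 1))); some programs use N rather than N - 1. The confidence interval requires the noncentral chi-square distribution and is not computed here. The fit values are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("need df > 0 and n > 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def describe_rmsea(value: float) -> str:
    # Cutoffs quoted in the text: below .05 ideal, below .08 acceptable.
    if value < 0.05:
        return "ideal"
    if value < 0.08:
        return "acceptable"
    return "not acceptable"


value = rmsea(85.0, 40, 301)  # hypothetical model: chi-square = 85, df = 40, N = 301
print(f"RMSEA = {value:.3f} ({describe_rmsea(value)})")
```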

The comparative fit index, like the IFI, NFI, BBI, TLI, and RFI, compares the model of interest with an alternative, such as the null or independence model. The CFI is also known as the Bentler Comparative Fit Index.

Specifically, the CFI compares the fit of a target model with the fit of an independence model, a model in which the variables are assumed to be uncorrelated. In this context, fit refers to the discrepancy between the observed and predicted covariance matrices, as represented by the chi-square.

In short, the CFI is based on the ratio of the discrepancy of the target model to the discrepancy of the independence model, where each discrepancy is estimated as the model's chi-square minus its df; the CFI equals 1 minus this ratio. Roughly, the CFI thus represents the extent to which the model of interest fits better than the independence model. Values that approach 1 indicate acceptable fit.
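A minimal sketch of the CFI, assuming the standard form CFI = 1 - max(chi2_t - df_t, 0) / max(chi2_t - df_t, chi2_0 - df_0, 0) (Bentler, 1990), where t denotes the target model and 0 the independence model. The chi-square and df values are hypothetical.

```python
def cfi(chi2_t: float, df_t: int, chi2_0: float, df_0: int) -> float:
    """Bentler's CFI from target (t) and independence/null (0) model chi-squares."""
    d_t = max(chi2_t - df_t, 0.0)  # noncentrality estimate of the target model
    d_0 = max(chi2_0 - df_0, d_t)  # null-model estimate, never smaller than d_t
    if d_0 == 0.0:
        return 1.0  # neither model shows misfit beyond its df
    return 1.0 - d_t / d_0


# Hypothetical results: target chi-square 85 on 40 df, independence model 900 on 55 df.
print(round(cfi(85.0, 40, 900.0, 55), 3))  # 0.947
```

Taking the maximum with zero keeps the index between 0 and 1, unlike the IFI and TLI.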

The CFI is relatively insensitive to sample size (Fan, Thompson, & Wang, 1999). However, the CFI is not informative when most correlations between variables are close to 0, because there is then little covariance to explain. Furthermore, Raykov (2000, 2005) argues that the CFI is biased, because it relies on the conventional estimator of noncentrality.

The incremental fit index, also known as Bollen's IFI, is also relatively insensitive to sample size. Values that exceed .90 are regarded as acceptable, although this index can exceed 1.

To compute the IFI, first calculate the difference between the chi-square of the independence model (in which variables are uncorrelated) and the chi-square of the target model. Next, calculate the difference between the chi-square of the independence model and the df of the target model. The IFI is the ratio of the first difference to the second.
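A minimal sketch of Bollen's IFI in its standard form, (chi2_0 - chi2_t) / (chi2_0 - df_t), where 0 denotes the independence model and t the target model. The values are hypothetical; the second call shows how the IFI can exceed 1 when the target chi-square falls below its df.

```python
def ifi(chi2_t: float, df_t: int, chi2_0: float) -> float:
    """Bollen's IFI: (chi2_0 - chi2_t) / (chi2_0 - df_t)."""
    return (chi2_0 - chi2_t) / (chi2_0 - df_t)


print(round(ifi(85.0, 40, 900.0), 3))  # hypothetical model: 0.948
print(round(ifi(30.0, 40, 900.0), 3))  # chi-square below its df: 1.012
```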

The NFI is also known as the Bentler-Bonett normed fit index. It ranges from 0 to 1, where 1 is ideal. The NFI equals the difference between the chi-square of the null model and the chi-square of the target model, divided by the chi-square of the null model. For example, an NFI of .90 indicates that the model of interest improves fit by 90% relative to the null or independence model.
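A minimal sketch of the NFI as described above, using hypothetical chi-square values for the target and null models.

```python
def nfi(chi2_t: float, chi2_0: float) -> float:
    """Bentler-Bonett NFI: proportional reduction in chi-square relative to the null model."""
    return (chi2_0 - chi2_t) / chi2_0


value = nfi(85.0, 900.0)  # hypothetical target and null chi-squares
print(f"NFI = {value:.3f}: fit improves by {value:.1%} relative to the null model")
```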

When samples are small, the NFI often underestimates fit (Ullman, 2001). Furthermore, the NFI can overestimate fit as the number of parameters increases; the TLI, also called the NNFI, overcomes this problem.

The TLI, sometimes called the NNFI, is similar to the NFI. However, the index is lower, and hence the model is regarded as less acceptable, if the model is complex. To compute the TLI:

- First, divide the chi-square for the target model and for the null model by their corresponding df values, which generates a relative chi-square for each model.
- Next, subtract the relative chi-square of the target model from that of the null model.
- Finally, divide this difference by the relative chi-square of the null model minus 1.
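The three steps above can be sketched as follows. The values are hypothetical; the second call keeps the same chi-square but assumes 10 more free parameters (10 fewer df), which lowers the TLI and illustrates its penalty for complexity.

```python
def tli(chi2_t: float, df_t: int, chi2_0: float, df_0: int) -> float:
    """Tucker-Lewis index (NNFI), following the three steps listed in the text."""
    rel_t = chi2_t / df_t  # step 1: relative chi-square of the target model
    rel_0 = chi2_0 / df_0  # step 1: relative chi-square of the null model
    return (rel_0 - rel_t) / (rel_0 - 1.0)  # steps 2 and 3


print(round(tli(85.0, 40, 900.0, 55), 3))  # 0.927
print(round(tli(85.0, 30, 900.0, 55), 3))  # same chi-square, fewer df: 0.881
```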

According to Marsh, Balla, and McDonald (1988), the TLI is relatively independent of sample size. The TLI is usually lower than the GFI, but values over .90 or over .95 are considered acceptable (e.g., Hu & Bentler, 1999).

The AIC, like the BIC, BCC, and CAIC, is regarded as an information-theoretic goodness-of-fit measure, applicable when maximum likelihood estimation is used (Burnham & Anderson, 1998). These indices are used to compare different models; the model that generates the lowest value is preferred. The absolute AIC value is irrelevant, although values closer to 0 are ideal; only the AIC of one model relative to the AIC of another model is meaningful.

Like the chi-square, the AIC reflects the extent to which the observed and predicted covariance matrices differ from each other. However, unlike the chi-square, the AIC penalizes models that are too complex. In particular, the AIC equals the chi-square divided by n, plus 2k / (n - 1):

$$\mathrm{AIC} = \frac{\chi^2}{n} + \frac{2k}{n-1}, \qquad k = \tfrac{1}{2}v(v+1) - \mathit{df}$$

where v is the number of variables and n is the sample size. Because v(v + 1)/2 is the number of unique variances and covariances, k is the number of free parameters.
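The sketch below implements the AIC formula exactly as given in the text, including its n and n - 1 denominators; other software may scale the AIC differently, which changes the values but is meant to be used the same way, by comparing models. The two models, fitted to the same 11 variables with N = 301, are hypothetical.

```python
def free_parameters(v: int, df: int) -> int:
    """k = v(v + 1)/2 - df: unique variances and covariances minus model df."""
    return v * (v + 1) // 2 - df


def aic(chi2: float, df: int, v: int, n: int) -> float:
    """AIC as stated in the text: chi2 / n + 2k / (n - 1)."""
    k = free_parameters(v, df)
    return chi2 / n + 2 * k / (n - 1)


# Two hypothetical models fitted to the same 11 variables and N = 301 cases.
models = {"A": (85.0, 40), "B": (70.0, 35)}
scores = {name: aic(chi2, df, v=11, n=301) for name, (chi2, df) in models.items()}
print(scores, "preferred:", min(scores, key=scores.get))
```

Only the difference between the two scores matters; the model with the lower value (here B) is preferred.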

The BCC is similar to the AIC. That is, the BCC and AIC both represent the extent to which the observed covariance matrix differs from the predicted covariance matrix--like the chi square statistic--but include a penalty if the model is complex, with many parameters. The BCC bestows an even harsher penalty than does the AIC.

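The BCC equals the chi-square divided by n, plus 2k / (n - v - 2):

$$\mathrm{BCC} = \frac{\chi^2}{n} + \frac{2k}{n-v-2}, \qquad k = \tfrac{1}{2}v(v+1) - \mathit{df}$$

where v is the number of variables and n is the sample size. Because n - v - 2 is smaller than n - 1, the BCC penalty term is larger than that of the AIC.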
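A minimal sketch comparing the BCC and AIC penalty terms, again using the formulas exactly as stated in the text. The model (11 variables, chi-square 85 on 40 df, so k = 26) and the two sample sizes are hypothetical.

```python
def free_parameters(v: int, df: int) -> int:
    """k = v(v + 1)/2 - df: unique variances and covariances minus model df."""
    return v * (v + 1) // 2 - df


def aic(chi2: float, df: int, v: int, n: int) -> float:
    # Penalty divided by n - 1, as stated in the text.
    return chi2 / n + 2 * free_parameters(v, df) / (n - 1)


def bcc(chi2: float, df: int, v: int, n: int) -> float:
    # Penalty divided by n - v - 2, as stated in the text; requires n > v + 2.
    return chi2 / n + 2 * free_parameters(v, df) / (n - v - 2)


for n in (301, 60):
    a, b = aic(85.0, 40, 11, n), bcc(85.0, 40, 11, n)
    print(f"N = {n}: AIC = {a:.3f}, BCC = {b:.3f}")
```

The chi-square term is identical in both indices, so the gap between them is purely the extra penalty, and it widens as the sample shrinks relative to the number of variables.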

The CAIC is similar to the AIC as well. However, the CAIC also confers a penalty if the sample size is small.

The Bayesian Information Criterion (BIC) is also known as Akaike's Bayesian Information Criterion (ABIC) and the Schwarz Bayesian Criterion (SBC). This index is similar to the AIC, but its penalty against complex models is especially pronounced, even more so than that of the BCC and CAIC. Furthermore, like the CAIC, it includes a penalty against small samples.

Raftery (1995) describes the use of the BIC in social research. Roughly, the BIC approximates -2 times the log of the Bayes factor for the target model relative to the saturated model, so negative values favor the target model.
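A minimal sketch, assuming the form of the BIC that Raftery (1995) presents for comparing a model with the saturated model: BIC = chi-square - df times ln(N). This scaling differs from the AIC and BCC formulas given earlier, so values are comparable only across models computed the same way. The two models and N = 301 are hypothetical.

```python
import math


def bic(chi2: float, df: int, n: int) -> float:
    """Raftery-style BIC: chi2 - df * ln(n); negative values favor the model over the saturated model."""
    return chi2 - df * math.log(n)


models = {"A": (85.0, 40), "B": (70.0, 35)}  # hypothetical (chi-square, df) pairs
scores = {name: bic(chi2, df, 301) for name, (chi2, df) in models.items()}
print({name: round(score, 1) for name, score in scores.items()},
      "preferred:", min(scores, key=scores.get))
```

Model A has 5 more df (fewer free parameters), and the df times ln(N) term rewards that parsimony heavily enough that A is preferred despite its larger chi-square.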

Many other indices have also been developed. These indices include the GFI, AGFI, FMIN, noncentrality parameter, and centrality index. The GFI and, to a lesser extent, the FMIN used to be very popular, but their use has dwindled recently.

Some indices are especially sensitive to sample size. For example, some fit indices overestimate fit when the sample size is small, such as below 200. By contrast, the RMSEA and CFI seem to be less sensitive to sample size (Fan, Thompson, & Wang, 1999).

Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper solutions and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. **Psychometrika**, 49, 155-173.

Bentler, P. M. (1990). Comparative fit indexes in structural models. **Psychological Bulletin**, 107, 238-246.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. **Psychological Bulletin**, 88, 588-606.

Bentler, P. M., & Mooijaart, A. (1989). Choice of structural model via parsimony: A rationale based on precision. **Psychological Bulletin**, 106, 315-317.

Bollen, K. A. (1989). **Structural equations with latent variables**. New York: Wiley.

Bollen, K. A. (1990). Overall fit in covariance structure models: Two types of sample size effects. **Psychological Bulletin**, 107, 256-259.

Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. **Multivariate Behavioral Research**, 24, 445-455.

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.

Burnham, K. P., & Anderson, D. R. (1998). **Model selection and inference: A practical information-theoretic approach**. New York: Springer-Verlag.

Byrne, B. M. (1994). **Structural equation modeling with EQS and EQS/Windows**. Thousand Oaks, CA: Sage Publications.

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. **Structural Equation Modeling**, 9, 233-255.

Fan, X., Thompson, B., & Wang, L. (1999). Effects of sample size, estimation method, and model specification on structural equation modeling fit indexes. **Structural Equation Modeling**, 6, 56-83.

Hipp, J. R., & Bollen, K. A. (2003). Model fit in structural equation models with censored, ordinal, and dichotomous variables: Testing vanishing tetrads. **Sociological Methodology**, 33, 267-305.

Hu, L. T., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 76-99). Thousand Oaks, CA: Sage.

Jaccard, J., & Wan, C. K. (1996). **LISREL approaches to interaction effects in multiple regression**. Thousand Oaks, CA: Sage Publications.

Joreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 294-316). Newbury Park, CA: Sage.

Kline, R. B. (1998). **Principles and practice of structural equation modeling**. New York: Guilford Press.

Marsh, H. W., Balla, J. R., & Hau, K. T. (1996). An evaluation of incremental fit indexes: A clarification of mathematical and empirical properties. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling techniques (pp. 315-353). Mahwah, NJ: Lawrence Erlbaum.

Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. **Psychological Bulletin**, 103, 391-410.

Marsh, H. W., & Hau, K. T. (1996). Assessing goodness of fit: Is parsimony always desirable? …, 64, 364-390.

Raftery, A. E. (1995). Bayesian model selection in social research. In Adrian E. Raftery (Ed.) (pp. 111-164). Oxford: Blackwell.

Raykov, T. (2000). On the large-sample bias, variance, and mean squared error of the conventional noncentrality parameter estimator of covariance structure models. **Structural Equation Modeling**, 7, 431-441.

Raykov, T. (2005). Bias-corrected estimation of noncentrality parameters of covariance structure models. **Structural Equation Modeling**, 12, 120-129.

Schumacker, R. E., & Lomax, R. G. (2004). **A beginner's guide to structural equation modeling** (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. **Multivariate Behavioral Research**, 25, 173-180.

Steiger, J. H. (2000). Point estimation, hypothesis testing and interval estimation using the RMSEA: Some comments and a reply to Hayduk and Glaser. **Structural Equation Modeling**, 7, 149-162.

Tucker, L. R., & Lewis, C. (1973). The reliability coefficient for maximum likelihood factor analysis. **Psychometrika**, 38, 1-10.

Ullman, J. B. (2001). Structural equation modeling. In B. G. Tabachnick & L. S. Fidell, **Using multivariate statistics** (4th ed., pp. 653-771). Needham Heights, MA: Allyn & Bacon.

Last Update: 6/27/2016