A SYSTEMATIC REVIEW OF STRUCTURAL EQUATION MODEL (SEM)

LICENSE: This work by Open Journals Nigeria is licensed and published under the Creative Commons Attribution License 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided this article is duly cited.
COPYRIGHT: The Author(s) completely retain the copyright of this published article.
OPEN ACCESS: The Author(s) approves that this article remains permanently online in the open access (OA) mode.
QA: This Article is published in line with “COPE (Committee on Publication Ethics) and PIE (Publication Integrity & Ethics)”.

ABSTRACT


INTRODUCTION
Structural equation modeling (SEM) is a collection of statistical techniques that allow a set of relationships between one or more independent variables and one or more dependent variables to be examined. These techniques include path analysis, regression analysis, confirmatory factor analysis and analysis of variance. SEM is used to test hypothesized relationships among a set of observed (measured) and unobserved (latent) variables. A latent variable is a variable that cannot be directly observed or measured but is inferred from responses to a number of observable variables (indicators). Latent constructs such as intelligence, reading ability and motivation are often gauged by responses to a battery of items designed to tap those constructs. A variable that is directly observed and measured, on the other hand, is called a manifest variable. SEM is also called simultaneous equation modeling, analysis of covariance structures, path analysis or confirmatory factor analysis.
Structural equation modeling is a general term that has been used to describe a large number of statistical methods employed to evaluate the validity of substantive theories with empirical data (Pui-Wa & Quiong, 2007).
Statistically, SEM represents an extension of general linear modeling (GLM) procedures such as ANOVA and multiple regression analysis. These techniques, including path analysis, assume that variables are measured without error. SEM, however, incorporates a measurement model to account for the errors that may arise in measuring a variable.
Thus, SEM has several advantages over ordinary path analysis:
(i) SEM provides a measure of overall fit, through a fit index, between the theory (as described in the path model) and the correlations among the scores in the sample.
(ii) SEM models latent variables (constructs), which are measured through fallible indicators that can be easily observed.
(iii) In SEM, a variable (latent or manifest) can serve as both a dependent and an independent variable in a chain of causal hypotheses.
(iv) SEM includes a measurement model, which removes the biases due to errors of measurement.
(v) Current developments in SEM that path analysis has not addressed include the modeling of change over time (growth models), the modeling of latent classes or profiles, the modeling of data with nested structures (such as multilevel, multi-sample and multitrait-multimethod models) and nonhierarchical models.
SEM also draws on techniques such as regression analysis, factor analysis and simultaneous equation modelling. It can explicitly account for less than perfect reliability of the observed variables, providing analyses of attenuation and estimation of bias due to measurement error. The SEM approach is sometimes called causal modelling because competing models can be postulated about the data and tested against each other.
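The attenuation mentioned above can be illustrated with the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below is not taken from the article; the function name and the numerical values are hypothetical and serve only to show how unreliable measures shrink an observed correlation.

```python
import math


def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation.

    r_xy  : observed correlation between two measured variables
    rel_x : reliability of the first measure (0 < rel_x <= 1)
    rel_y : reliability of the second measure (0 < rel_y <= 1)
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: an observed correlation of 0.40 between two scales
# whose reliabilities are 0.80 and 0.70.
observed_r = 0.40
corrected_r = disattenuated_correlation(observed_r, rel_x=0.80, rel_y=0.70)
print(round(corrected_r, 3))
```

With perfectly reliable measures (both reliabilities equal to 1) the function returns the observed correlation unchanged, which is the situation that ordinary path analysis implicitly assumes.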
Traditional statistical approaches to data analysis, such as path/regression analysis, ANOVA and correlation, specify default models, assume that measurement occurs without error, and are somewhat inflexible. Structural equation modelling, by contrast, requires specification of a model based on theory and relevant empirical research. It is a multivariate technique that incorporates measured variables and latent constructs and explicitly specifies measurement error (Suhr, 2000). SEM is thus a hybrid of path analysis and a measurement model. In simple terms, SEM involves the evaluation of two models: a path (structural) model and a measurement model.
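The two component models can be written compactly in matrix form. The notation below follows the widely used LISREL convention; it does not appear in the article itself and is offered only as an assumed illustration of how the measurement and structural parts fit together.

```latex
% Measurement model: observed indicators as functions of latent variables plus error
x = \Lambda_x \xi + \delta , \qquad y = \Lambda_y \eta + \varepsilon

% Structural (path) model: relationships among the latent variables
\eta = B \eta + \Gamma \xi + \zeta
```

Here \xi and \eta denote the exogenous and endogenous latent variables, x and y their observed indicators, \Lambda_x and \Lambda_y the factor loadings, \delta and \varepsilon the measurement errors, B and \Gamma the path coefficients, and \zeta the structural disturbances.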
One of the primary goals of Social Science research is to understand social systems through the explication of causal relationships (Lleras, 2005). However, given the complexity of social life, disentangling the interrelationships among variables is often a difficult task. Path analysis is a methodological tool that helps researchers using quantitative (correlational) data to disentangle the various causal processes underlying a particular outcome.
Also, one of the unique contributions of path analysis to Social Science research is its ability to decompose the associations between several variables into causal components, which may be direct or indirect, and noncausal components, which may be spurious or unanalyzed (Alwin & Hauser, 1975).
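The causal part of this decomposition can be sketched numerically. Under the usual path-tracing convention, an indirect effect is the product of the coefficients along a chain of paths, and the total effect is the direct effect plus all indirect effects. The function name and the standardized coefficients below are hypothetical and are not drawn from any study cited in this review.

```python
import math


def decompose_effect(direct: float, indirect_chains: list) -> dict:
    """Split a total causal effect into direct and indirect components.

    direct          : path coefficient for X -> Y
    indirect_chains : one tuple of path coefficients per indirect route,
                      e.g. (a, b) for X -> M -> Y
    """
    indirect = sum(math.prod(chain) for chain in indirect_chains)
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}


# Hypothetical standardized coefficients for X -> M (0.50), M -> Y (0.40)
# and the direct path X -> Y (0.30).
effects = decompose_effect(direct=0.30, indirect_chains=[(0.50, 0.40)])
print(effects)
```

Any remaining association between X and Y beyond the total effect would belong to the noncausal (spurious or unanalyzed) components, which this sketch does not model.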
A measurement model is incorporated into SEM because path analytic models rest on the weak assumption that variables are measured without error. The model is included to take care of errors or biases that may result from the measurement of research constructs or variables (Ayandele, 2014). These measurement errors may arise from the administration of the instrument, scoring, atypical responses of the respondents to the test, the nature of the group being tested, mechanical factors such as misspelled words and poorly phrased items, or the test taker's temporary psychological or physical state at the time of testing (Gbaleyi & Akinyemi, 1995). The measurement model in SEM is evaluated through confirmatory factor analysis (CFA). Jeremy and Hun (2009) showed that confirmatory factor analysis corresponds to the measurement model of SEM.
Structural equation modeling (SEM) is a set of statistical techniques used for examining relationships between variables. It is a comprehensive statistical approach for testing hypotheses about relationships among observed and latent variables (Hoyle, 1995). SEM is a hybrid of path analysis and a measurement model, the latter being incorporated because path analytic models assume that variables are measured without error.
(2008) (2007) employed SEM to analyse the relationships among students' attitude towards statistics, prior reasoning abilities and course performance. A total of 1618 students took part in the study, 64% male and 36% female. The data fitted the model, with goodness-of-fit indices including χ² = 1599.26, df = 620, RMSEA = 0.035, GFI = 0.94 and CFI = 0.97. The results also showed, among other things, the absence of a strong relationship between reasoning abilities and students' performance.
Burnett (2002) examined relationships between teacher praise and feedback and students' perceptions of the classroom environment using a structural equation model. The instruments used were the Teacher Feedback Scale (TFS) and My Classroom Scale (MCS). The chi-square (χ² = 94, df = 39) and other fit indices were within the acceptable ranges. The results also indicated that negative teacher feedback and effort feedback were both related to students' relationships with their teachers, while ability feedback was associated with perceptions of the classroom environment. Praise was not related to classroom environment or teacher-student relationships.
collected fitted the hypothesized model. Also, the result of the study showed, among other things, that attitude towards work had a positive but not significant effect on job satisfaction and employee performance.

ASSUMPTIONS OF STRUCTURAL EQUATION MODEL
SEM, like other multivariate statistical techniques, has a number of assumptions. The most important ones are stated below:
1) Linearity: SEM assumes linear relationships between latent or measured variables.
2) Multivariate normal distribution of the variables: Each observed or latent variable should be normally distributed for each value of the other observed or latent variables. The data collected need to be examined for univariate and multivariate outliers. Multivariate normality is required by maximum likelihood estimation (MLE), which is the predominant method for estimating structural parameters in SEM. However, there are estimation methods that do not require normality.
3) Multicollinearity: Complete multicollinearity is assumed to be absent, but correlations among the independent variables may be modelled explicitly in SEM.
4) Sample size: The sample should not be small, since SEM relies on tests that are sensitive to sample size. The SEM literature indicates that sample sizes commonly run from 200 to 400 cases for models with 10 to 15 indicator variables. Another commonly applied rule of thumb is N > 50 + 8k, where k is the number of indicators.
5) Multiple indicators: A minimum of three indicators should be used to measure each latent variable in the model.
6) Level of measurement: Data are assumed to be at the interval level of measurement. However, unlike traditional path analysis, SEM explicitly models error, including error arising from the use of ordinal data. Exogenous variables may be dichotomous or dummy variables, but unless special approaches are taken, categorical dummy variables may not be used as endogenous variables.
30 | Olaoye, 2020 OJED 1(2)
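The sample-size rule of thumb N > 50 + 8k stated in the list above lends itself to a quick check. The following is a minimal sketch of that arithmetic only; the function names are illustrative, and the rule is a heuristic rather than a substitute for a formal power analysis.

```python
def minimum_sample_size(num_indicators: int) -> int:
    """Smallest whole number of cases N satisfying N > 50 + 8k."""
    if num_indicators < 1:
        raise ValueError("a model needs at least one indicator")
    return 50 + 8 * num_indicators + 1


def satisfies_rule(n_cases: int, num_indicators: int) -> bool:
    """True if the sample meets the N > 50 + 8k rule of thumb."""
    return n_cases > 50 + 8 * num_indicators


# Models with 10 and 15 indicators, the range mentioned in the text.
for k in (10, 15):
    print(k, minimum_sample_size(k))
```

For models with 10 to 15 indicators the rule gives a lower bound well below the 200 to 400 cases reported as common practice, so the two guidelines are best read together.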

STAGES IN SEM ANALYSIS
There are four stages in SEM analysis: model specification, model estimation, model evaluation and model modification (Byrne, 1998; Pui-Wa & Quiong, 2007). However, other authors classify the steps into five: model specification, model identification, model estimation, model evaluation and model modification.

(a) Model Specification
A model is a statistical statement about the relationships between variables. The first stage in the modelling process is to specify the model, i.e. to state the specific set of hypotheses to be tested. This is done mostly through a diagram.
The relationships specified are translated into equations and the model is then estimated. Thus, model specification is the translation of theory into equations through diagrams. Model specification should be based on relevant theory and research literature (Suhr, 2000).
However, specifications need not be based only on substantive theory. There are two other sources of specification: measurement theory and experimental design (Kenny, 2004). Traditionally, psychologists focus on experimental design, psychometricians on measurement theory and econometricians on substantive theory. Rather than choosing one's specification by one's discipline, specifications should be chosen to fit the problem (Kenny, 2004).
Whether the specified model can be estimated at all is the question of model identification. Model identification provides a preliminary check on whether or not the structural equation model can be estimated (Ayandele, 2014). A model is identified if it is possible to obtain a unique solution for every parameter. A necessary but not sufficient condition for identifying and estimating the causal parameters of a set of structural equations is that the number of correlations between the measured variables (i.e. data points) be greater than or equal to the number of causal parameters. This necessary condition is called the "minimum condition" of model identification. If the two numbers are equal, the set of causal parameters may be just identified; that is, there is one and only one estimate for each causal parameter. If there are more correlations (data points) than parameters, the structural model is said to be over identified; that is, there is more than one way of estimating a causal parameter in the system. A set of equations is said to be under identified if there are more parameters than correlations.
In that case the parameters cannot be estimated, and the number of parameters needs to be reduced by fixing, constraining or deleting some of them. A parameter may be fixed by setting it to a specific value, or constrained by setting it equal to another parameter (Kline, 2005).
To assess the status of identification, one first determines the number of correlations between the observed or measured variables. If n variables are measured, the number of correlations is n(n − 1)/2. One then counts the number of unknown parameters, making sure that the count includes:
1. All the path coefficients
2. All correlations between purely exogenous variables
3. All correlations between disturbances
4. Factor loadings/path coefficients of the disturbances (Jeremy & Hun, 2009).
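The counting procedure described above can be sketched as follows. This is an illustration of the minimum condition only: because that condition is necessary but not sufficient, a result of "just identified" or "over identified" means the count permits identification, not that identification is guaranteed. The function names are illustrative.

```python
def count_correlations(n_variables: int) -> int:
    """Number of distinct correlations among n measured variables: n(n - 1)/2."""
    if n_variables < 2:
        raise ValueError("at least two measured variables are required")
    return n_variables * (n_variables - 1) // 2


def identification_status(n_variables: int, n_parameters: int) -> str:
    """Compare data points (correlations) with unknown parameters."""
    data_points = count_correlations(n_variables)
    if data_points < n_parameters:
        return "under identified"
    if data_points == n_parameters:
        return "just identified"
    return "over identified"


# Four measured variables give 4 * 3 / 2 = 6 correlations.
for n_params in (5, 6, 8):
    print(n_params, identification_status(4, n_params))
```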

(b) Model Estimation
A properly specified SEM often has some fixed parameters and some free parameters to be estimated from the data. This is because the scale of a latent variable is arbitrary and has to be set.

(c) Model Evaluation
Once model parameters have been estimated, the next step is to make a dichotomous decision either to retain or to reject the hypothesized model. This is essentially a statistical hypothesis testing problem, with the null hypothesis being that the model under consideration fits the data. Specifically, two general aspects of the model are evaluated:
1. The overall fit of the model
2. The significance of the parameters of the model
SEM has a number of fit indices, and one of the most commonly reported is chi-square (χ²). Its major attraction is that, of all the fit indices used in SEM, it is the only one that has a test of significance associated with it. As a rough rule of thumb, χ² should be nonsignificant and χ²/df should be less than or equal to 3. However, the major problem with the χ² test is its sensitivity to the normality of the data and to the sample size: with too many subjects it is always significant, and with too few it is always nonsignificant (Suhr, 2000).
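As a worked illustration of the ratio rule of thumb, the sketch below divides chi-square by its degrees of freedom for the two sets of fit statistics quoted earlier in this review (χ² = 1599.26 with df = 620, and χ² = 94 with df = 39). The function names and the study labels are illustrative, and the cutoff of 3 is the rough guideline mentioned above rather than a strict criterion.

```python
def chi_square_ratio(chi_square: float, df: int) -> float:
    """Relative chi-square: the chi-square statistic divided by its degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return chi_square / df


def meets_ratio_rule(chi_square: float, df: int, cutoff: float = 3.0) -> bool:
    """True if chi-square/df is at or below the rule-of-thumb cutoff."""
    return chi_square_ratio(chi_square, df) <= cutoff


# Fit statistics as reported in the studies summarized in the introduction.
reported_fits = {
    "statistics-attitude study (N = 1618)": (1599.26, 620),
    "Burnett (2002)": (94.0, 39),
}
for study, (chi_square, df) in reported_fits.items():
    ratio = chi_square_ratio(chi_square, df)
    print(f"{study}: ratio = {ratio:.2f}, within cutoff = {meets_ratio_rule(chi_square, df)}")
```

Both ratios fall below 3, which is consistent with the authors' conclusions that the data fitted their models even though a chi-square of that size on a large sample would itself be statistically significant.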
There are five general classes of fit indices: comparative fit, absolute fit, residual-based fit, proportion of variance accounted for indices, and parsimony adjusted proportion of variance accounted for indices. According to Bollen (1989) and Bentler (1999)

(d) Model Modification
2) To test hypotheses
Model modification involves adjusting a specified and estimated model by either freeing parameters that were fixed or fixing parameters that were free. Modification indices indicate the improvement in fit that would result from including a particular relationship in the model. In SEM, model modification is analogous to post-hoc comparisons in ANOVA. It can sacrifice control over Type I error and lead to a situation where sample-specific characteristics are generalized to a population. The methods used in model modification include the chi-square difference test, the Lagrange multiplier test and the Wald test. All are asymptotically equivalent under the null hypothesis but approach model modification differently (Pui-Wa & Quiong, 2007).
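One way to picture a chi-square based modification test is to compare the chi-square of a model before and after a single fixed parameter is freed. The sketch below assumes two nested models that differ by exactly one degree of freedom, for which the p-value can be obtained from the standard normal distribution without any statistical library. The fit values are hypothetical and the function name is illustrative.

```python
import math


def chi_square_difference_1df(chi2_restricted: float, df_restricted: int,
                              chi2_freed: float, df_freed: int):
    """Chi-square difference test for nested models differing by one parameter.

    Returns (delta_chi2, p_value). For 1 degree of freedom,
    P(chi-square > x) = erfc(sqrt(x / 2)).
    """
    delta_df = df_restricted - df_freed
    if delta_df != 1:
        raise ValueError("this sketch handles a difference of exactly 1 df")
    delta_chi2 = chi2_restricted - chi2_freed
    if delta_chi2 < 0:
        raise ValueError("freeing a parameter should not worsen chi-square")
    p_value = math.erfc(math.sqrt(delta_chi2 / 2.0))
    return delta_chi2, p_value


# Hypothetical fit: freeing one path lowers chi-square from 112.4 (df = 40)
# to 104.1 (df = 39).
delta, p = chi_square_difference_1df(112.4, 40, 104.1, 39)
print(f"delta chi-square = {delta:.1f}, p = {p:.4f}")
```

A small p-value suggests that the freed parameter improves fit by more than chance alone would explain, but, as noted above, repeated data-driven modifications of this kind weaken control over Type I error.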

SIMILARITIES BETWEEN CONVENTIONAL STATISTICAL METHODS AND SEM
According to Suhr (2000), SEM is similar to traditional methods like correlation, regression and analysis of variance in many ways. Firstly, both traditional methods and SEM assume linear relationships. Secondly, the statistical tests associated with both are valid only if certain assumptions are met; both traditional methods and SEM assume a normal distribution. Thirdly, neither approach offers a test of causality.

DIFFERENCES BETWEEN CONVENTIONAL STATISTICAL METHODS AND SEM
Suhr (2000) observed the following differences between SEM and other statistical methods of examining relationships between variables.
First, SEM is a highly flexible and comprehensive methodology. This methodology is appropriate for investigating achievement, economic trends, health issues, family and peer dynamics, self-concept, self-efficacy,
Finally, a graphical language provides a convenient and powerful way to present complex relationships in SEM. Model specification involves formulating statements about a set of variables. A diagram, a pictorial representation of a model, is transformed into a set of equations. The set of equations is solved simultaneously to test model fit and estimate parameters.

ITEM PARCELLING
One of the assumptions of SEM stated above is the use of multiple indicators, which has the advantage of minimizing measurement error and enhancing reliability. However, multiple indicators may increase the complexity of the structural equation model because of the larger number of items. One approach commonly adopted to reduce model complexity is to use item parcels, rather than individual items, as indicators. Parcelling involves averaging or summing several raw items to form a single score, which can then be used as an indicator of a latent variable in a factor analysis model or structural equation model (Sterba, 2011). It is a measurement practice used mostly in multivariate approaches to psychometrics, particularly with latent variable analysis techniques (e.g., SEM).
Thus, a parcel can be defined as an aggregate of indicators comprising the sum (or average) of two or more items or responses.
Item parcelling is a process by which raw item scores are combined into sub-scales prior to analysis. This is commonly done by summing or averaging item responses into parcel scores, which are then used as the indicator variables in the CFA. An item parcel may be defined as "an aggregate-level indicator comprised of the sum (or average) of two or more items" (Little, Cunningham, Shahar & Widaman, 2002). Parcels are more reliable than individual items, have more scale points, and are more likely to have linear relations with each other and with factors (Comrey, 1988; Little et al., 2002; Kishton & Widaman, 1994).
The proponents of parcelling view it as an attempt to iron out the inevitable empirical "wrinkles" caused by the unreliability of items, the non-linear relations between items, the unequal intervals between scale points, the smaller ratio of common variance to unique variance, and the tendency for unique variances to be correlated in confirmatory factor analyses. Such "wrinkles" may lead to unsatisfactory factor analytic results and the rejection of useful measurement models (Little et al., 2002).
When items are aggregated, their shared variance is pooled, which means that the proportion of common variance increases relative to the proportion of unique variance. This leads to stronger factor loadings and communalities. Furthermore, the distributions of parcels are likely to be more normal than the distributions of individual items. Further advantages are that the number of scale points in parcels is increased and that the distances between scale points are likely to be reduced. Bandalos (2002) demonstrated that when items within a particular scale have a unidimensional structure, the factor analysis of parcels leads to improved model-data fit and less biased structural parameters. When the items have a multidimensional structure, however, the factor analysis of parcels may mask the multidimensionality and lead to the acceptance of misspecified models as well as biased structural parameters. Hence, it is recommended that parcelling be used only when the items within a scale have a unidimensional structure (Bandalos, 2002; Little et al., 2002). In a nutshell, the psychometric merits of item parcelling include greater reliability than individual items, higher communality, distributions that more closely approximate normality and an interval scale, a more optimal ratio of indicators to sample size, less item-idiosyncratic influence, a greater likelihood of achieving a proper model solution, and better model fit.

METHODS OF PARCELLING
The three methods of parcelling according to De Bruin (2004) are (a) random assignment of items to parcels, (b) a priori parcel construction, and (c) empirical assignment of items to parcels. Random assignment of items to parcels is justified when the items form an essentially unidimensional scale. Under this condition, each item may be seen as an alternative and equivalent indicator of the construct or factor. Here the researcher first decides on the number of parcels he or she prefers and then randomly assigns (without replacement) items to the parcels.
A second approach to parcelling is to intentionally construct homogeneous sets of items that are aggregated to form parcels. This approach requires the researcher first to specify the number of parcels and the content or meaning of each parcel. Homogeneous sets of items are then written for each parcel.
Lastly, parcels may be formed empirically, where the total pool of items is subjected to a factor analysis. Clusters of highly correlating items are then combined to form parcels, which then serve as the input variables for further analyses (Gorsuch, 1997; Schepers, 1992).
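Of the three methods described above, random assignment is the simplest to sketch. The example below shuffles the item names, deals them into a chosen number of parcels without replacement, and averages each respondent's item scores within each parcel. The item names, the responses and the function names are hypothetical; the sketch assumes a unidimensional scale, as the text requires for this method.

```python
import random
from statistics import mean


def assign_items_to_parcels(item_names, n_parcels, seed=None):
    """Randomly assign items to parcels without replacement.

    Items are shuffled and then dealt in turn, so parcel sizes
    differ by at most one item.
    """
    if not 1 <= n_parcels <= len(item_names):
        raise ValueError("number of parcels must be between 1 and the number of items")
    rng = random.Random(seed)
    shuffled = list(item_names)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parcels] for i in range(n_parcels)]


def parcel_scores(responses, parcels):
    """Average each respondent's item scores within every parcel."""
    return [[mean(person[item] for item in parcel) for parcel in parcels]
            for person in responses]


# Hypothetical six-item scale answered by two respondents on a 1-5 scale.
items = ["q1", "q2", "q3", "q4", "q5", "q6"]
responses = [
    {"q1": 4, "q2": 5, "q3": 3, "q4": 4, "q5": 2, "q6": 5},
    {"q1": 2, "q2": 1, "q3": 2, "q4": 3, "q5": 1, "q6": 2},
]
parcels = assign_items_to_parcels(items, n_parcels=3, seed=1)
scores = parcel_scores(responses, parcels)
print(parcels)
print(scores)
```

The resulting parcel scores would then replace the individual items as indicators of the latent variable. Summing instead of averaging is an equally common choice and would only require swapping `mean` for `sum`.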
However, the general practice of parcelling has been criticized by some authors (Bandalos, 2002). The critics, whom Little et al. (2002) described as philosophically empirical-conservative, argue that parcelling distorts reality and serves as a smoke screen that clouds issues of incorrect model specification and/or poor item selection.
These critics believe that all sources of variance in an item should be represented in a confirmatory factor analysis. In contrast, the proponents of parcelling, described as philosophically pragmatic-liberal, hold that it is impossible to account a priori for every possible source of variance in each item (Little et al., 2002).

SOME SOFTWARE CURRENTLY USED FOR SEM
The software packages currently used for SEM analysis include LISREL (linear structural relations), EQS (Equations), Mplus, Amos (analysis of moment structures) and Mx (matrix). Others include CALIS (covariance analysis of linear structural equations), RAMONA (reticular action model or near approximation) and SEPATH (structural equation modeling and path analysis).

TERMS USED IN SEM AND THEIR MEANINGS
A model is a statistical statement about the relationships between variables.
A structural model is the part of the structural equation model diagram that relates the latent variables (factors) in the model to one another.
A measurement model is the part of the structural equation model diagram that relates latent variables to their indicators.
A path diagram is a pictorial representation of a model.
A variable in SEM could be latent, measured, exogenous or endogenous.
A latent variable is a variable that is not amenable to direct measurement, i.e. it cannot be directly observed.
A measured variable is one that can be observed directly.
A recursive (unidirectional or hierarchical) structural equation model is a model in which causation flows in a single direction.
A non-recursive (bidirectional or non-hierarchical) structural equation model is a model in which causation flows in both directions in some parts of the model.
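The distinction between recursive and non-recursive models can be checked mechanically by treating the hypothesized paths as a directed graph and looking for a feedback loop. The sketch below follows the definitions given above, so it considers only the direction of causal paths; the variable names and the function name are hypothetical.

```python
def is_recursive(paths) -> bool:
    """Return True if the directed paths contain no feedback loop.

    paths : iterable of (cause, effect) pairs taken from the path diagram.
    """
    graph = {}
    for cause, effect in paths:
        graph.setdefault(cause, []).append(effect)
        graph.setdefault(effect, [])

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {variable: UNVISITED for variable in graph}

    def has_cycle(node) -> bool:
        # Depth-first search: meeting a node that is still in progress
        # means causation loops back on itself.
        state[node] = IN_PROGRESS
        for successor in graph[node]:
            if state[successor] == IN_PROGRESS:
                return True
            if state[successor] == UNVISITED and has_cycle(successor):
                return True
        state[node] = DONE
        return False

    return not any(state[v] == UNVISITED and has_cycle(v) for v in graph)


# Hypothetical one-directional model.
one_way = [("motivation", "effort"), ("effort", "achievement"),
           ("motivation", "achievement")]
# The same model with a path from achievement back to motivation.
with_feedback = one_way + [("achievement", "motivation")]

print(is_recursive(one_way))
print(is_recursive(with_feedback))
```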