Exploratory factor analysis - summary of chapter 18 of Statistics by A. Field (5th edition)

In factor analysis, we take a lot of information (variables) and a computer effortlessly reduces this into a simple message (fewer variables).

## When to use factor analysis

Latent variable: something that cannot be accessed directly.

We cannot access a latent variable directly, so instead we measure observable variables that we believe are driven by the same underlying variable.

Factor analysis and principal component analysis (PCA) are techniques for identifying clusters of variables.
Three main uses:

• To understand the structure of a set of variables
• To construct a questionnaire to measure an underlying variable
• To reduce a data set to a more manageable size while retaining as much of the original information as possible.

## Factors and components

If we measure several variables, or ask someone several questions about themselves, the correlation between each pair of variables can be arranged in a table.

• this table is sometimes called the R-matrix.

Factor analysis attempts to achieve parsimony by explaining the maximum amount of common variance in a correlation matrix using the smallest number of explanatory constructs.
Explanatory constructs are known as latent variables (or factors) and they represent clusters of variables that correlate highly with each other.

PCA differs in that it tries to explain the maximum amount of total variance in a correlation matrix by transforming the original variables into linear components.

Factor analysis and PCA both aim to reduce the R matrix into a smaller set of dimensions.

• in factor analysis these dimensions, or factors, are estimated from the data and are believed to reflect constructs that can’t be measured directly.
• PCA transforms the data into a set of linear components. It doesn’t estimate unmeasured variables, it just transforms measured ones.

Graphical representation

Factors and components can be visualized as the axes of a graph along which we plot variables.
The coordinates of variables along each axis represent the strength of relationship between that variable and each factor.
In an ideal world a variable will have a large coordinate for one of the axes and small coordinates for any others.

• this scenario indicates that this particular variable is related to only one factor.
• variables that have large coordinates on the same axis are assumed to measure different aspects of some common underlying dimension.

If we square the factor loading for a variable we get a measure of its substantive importance to a factor.
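As a quick numeric sketch of this point (the loading value is made up):

```python
# Toy illustration: squaring a loading gives the proportion of a variable's
# variance accounted for by the factor (numbers are illustrative only).
loading = 0.8
importance = loading ** 2
print(round(importance, 2))  # 0.64, i.e. 64% of the variable's variance
```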

Mathematical representation

A component in PCA can be described as:

Component_i = b1*Variable1_i + b2*Variable2_i + … + bn*Variablen_i

There is no intercept in the equation because the line intersects the vertical axis at zero.
There is no error because we are simply transforming the variables.

Ideally, variables would have very high b-values for one component and very low b-values for all other components.
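A component score is therefore just a weighted sum of the measured variables, with no intercept and no error term. A minimal sketch with hypothetical weights and scores:

```python
# A component score is a weighted sum of the measured variables: no intercept,
# no error term. Weights and scores below are made up for illustration.
b = [0.8, 0.7, 0.1]        # weights (loadings) of three variables on one component
scores = [2.0, 3.0, 5.0]   # one person's scores on those three variables
component = sum(w * x for w, x in zip(b, scores))
print(round(component, 2))
```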

The factors in factor analysis are not represented in the same way as components.
In factor analysis, each measured variable is instead expressed in terms of the factors, plus an error term representing its unique variance:

Variable_i = b1*Factor1_i + b2*Factor2_i + … + bn*Factorn_i + e_i

Common factors: factors that explain the correlations between variables
Unique factors: factors that cannot explain the correlations between variables

In PCA we predict components from the measured variables.
In factor analysis we predict the measured variables from the underlying factors.

Both factor analysis and PCA are linear models in which loadings are used as weights.
In both cases, these loadings can be expressed as a matrix in which the columns represent each factor and the rows represent the loadings of each variable on each factor.

Factor scores

Having discovered which factors exist, and estimated the equations that describe them, it should be possible to estimate a person’s score on a factor, based on their scores for the constituent variables.
These are known as factor scores (or component scores in PCA).

The scales of measurement will influence the resulting scores, and if different variables use different measurement scales, then factor scores for different factors cannot be compared.

There are several techniques for calculating factor scores that use factor score coefficients as weights rather than the factor loadings.
Factor score coefficients can be calculated in several ways

• the simplest way is the regression method, in which factor loadings are adjusted to take account of the initial correlations between variables.

To obtain the matrix of factor score coefficients (B), we multiply the matrix of factor loadings by the inverse (R⁻¹) of the original correlation matrix (R-matrix).
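A minimal sketch of this computation, assuming a made-up case with two variables loading on a single factor (the correlation and loadings are hypothetical):

```python
# Regression-method sketch with two variables and one factor (made-up numbers):
# B = R^(-1) A, where R is the correlation matrix and A the loading matrix.
r = 0.5
R = [[1.0, r], [r, 1.0]]
A = [0.8, 0.8]  # loadings of both variables on the single factor

# Inverse of a 2x2 matrix by the adjugate formula
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
R_inv = [[ R[1][1] / det, -R[0][1] / det],
         [-R[1][0] / det,  R[0][0] / det]]

# Factor score coefficients: smaller than the raw loadings, because the
# correlation the variables share has been taken into account.
B = [sum(R_inv[i][j] * A[j] for j in range(2)) for i in range(2)]
print([round(b, 4) for b in B])
```

Note how the coefficients (about 0.53) come out below the loadings (0.8): the adjustment for the shared correlation is exactly the point of the regression method.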

Using the regression technique, the resulting factor scores have a mean of 0 and a variance equal to the squared multiple correlation between the estimated factor scores and the true factor values.

• the downside is that the resulting scores can correlate not only with factors other than the one on which they are based, but also with the factor scores of other, orthogonal factors.
• To overcome this problem, two adjustments have been proposed:
-The Bartlett method
produces scores that are unbiased and that correlate only with their own factor
-The Anderson-Rubin method
a modification of the Bartlett method that produces factor scores that are uncorrelated and standardized

Factor scores: a composite score for each individual on a particular factor.

There are several uses of factor scores

• if the purpose of the factor analysis is to reduce a large set of data into a smaller subset of measured variables, then the factor scores tell us an individual’s score on this subset of measures.
Any further analysis can be carried out on the factor scores rather than the original data.
• overcoming collinearity problems in linear models
If we have identified sources of multicollinearity in a linear model then a solution is to reduce collinear predictors to a subset of uncorrelated factors using PCA and enter the component scores as predictors instead of the raw variable scores. By using uncorrelated component scores as predictors we can be confident that there will be no correlation between predictors. Hence, no multicollinearity.

## Discovering factors

Choosing a method

There are two things to consider

• whether you want to generalize the findings from your sample to a population
• whether you are exploring your data or testing a specific hypothesis
This chapter describes techniques for exploring data using factor analysis.

Assuming we want to explore, we need to consider whether we want to apply our findings to the sample collected (descriptive method) or to generalize our findings to a population (inferential method).

Certain techniques assume that the sample used is the population and results cannot be extrapolated beyond that sample.

A different approach assumes that participants are randomly selected but that the variables measured constitute the population of variables in which they are interested.
By assuming this, it is possible to generalize from the sample to a larger population, but with the caveat that any findings hold true only for the set of variables measured;

Communality

The total variance of a variable in the R-matrix has two components:

• Common variance: variance shared with other variables or measures
• Unique variance: variance specific to that measure; the part of it that is specific to one measure, but not reliably so, is random (error) variance

Communality: the proportion of common variance present in a variable.

Thus, a variable that has no unique variance would have a communality of 1.
A variable that shares none of its variance with any other variable would have a communality of 0.

Factor analysis tries to find common underlying dimensions within the data and so is primarily concerned with the common variance.
We want to find out how much of the variance in our data is common.

Two solutions

• assume that all variance is common, by assuming that the communality of every variable is 1
• estimate the communalities, e.g. by the squared multiple correlation (SMC) of each variable with all the others
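The SMC of each variable can be read off the inverse of the correlation matrix via the standard identity SMC_i = 1 − 1/(R⁻¹)_ii. A toy sketch with two variables, where the SMC reduces to r², so the result is easy to check by hand:

```python
# SMC from the inverse correlation matrix: SMC_i = 1 - 1/(R^-1)_ii.
# With only two variables this reduces to r**2 (made-up correlation below).
r = 0.6
R = [[1.0, r], [r, 1.0]]
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
inv_diag = [R[1][1] / det, R[0][0] / det]   # diagonal of R^-1 for a 2x2 matrix
smc = [1 - 1 / d for d in inv_diag]
print([round(s, 4) for s in smc])  # both equal r**2 = 0.36
```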

Factor analysis or PCA?

Factor analysis derives a mathematical model from which factors are estimated.
Principal component analysis decomposes the original data into a set of linear variates.

Only factor analysis can estimate the underlying factors and it relies on various assumptions for these estimates to be accurate.

PCA is concerned only with establishing which linear components exists within the data and how a particular variable might contribute to a given component.

Theory behind PCA

Principal component analysis works in a very similar way to MANOVA and discriminant function analysis.

• a correlation matrix represents the same information as the SSCP matrix in MANOVA.

We take the correlation matrix and calculate the variates.
There are no groups of observations, so the number of variates calculated will always equal the number of variables measured (p).
The variates are described by the eigenvectors of the correlation matrix.
The elements of the eigenvectors are the weights of each variable on the variate: these values are the loadings.
The eigenvalue associated with each eigenvector provides a single indicator of the substantive importance of that component. The basic idea is that we retain components with relatively large eigenvalues and ignore those with relatively small eigenvalues.
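A small worked sketch: for a 2x2 correlation matrix [[1, r], [r, 1]] the eigenvalues can be computed from the characteristic equation, and they come out as 1 + r and 1 − r (the correlation value below is made up):

```python
import math

# Eigenvalues of a 2x2 correlation matrix via the characteristic equation;
# for [[1, r], [r, 1]] they are 1 + r and 1 - r (toy numbers).
r = 0.6
R = [[1.0, r], [r, 1.0]]
tr = R[0][0] + R[1][1]
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
disc = math.sqrt(tr * tr - 4 * det)
eigs = [(tr + disc) / 2, (tr - disc) / 2]
print([round(e, 4) for e in eigs])  # [1.6, 0.4]
print(round(sum(eigs), 4))          # eigenvalues sum to the number of variables: 2
```

The second print illustrates why eigenvalues measure importance: they partition the total variance (which equals the number of variables), so a component with eigenvalue 1.6 explains 80% of the variance here.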

Factor analysis works differently, but there are similarities.

• rather than using the correlation matrix directly, factor analysis starts by estimating the communality of each variable using the SMC.
It then replaces the diagonal of the correlation matrix with these estimates.
Then the eigenvectors and associated eigenvalues of this matrix are computed.
These eigenvalues tell us about the substantive importance of the factors, and, based on them, a decision is made about how many factors to retain.

Factor extraction: eigenvalues and the scree plot

In both PCA and factor analysis, not all factors are retained.
Extraction:the process of deciding how many factors to keep.

Eigenvalues associated with a variate indicate the substantive importance of that factor.
Retain only factors with large eigenvalues.

Scree plot: plotting each eigenvalue (Y-axis) against the factor with which it is associated (X-axis).

It is possible to obtain as many factors as there are variables and each has an associated eigenvalue.
By graphing the eigenvalues, the relative importance of each factor becomes apparent.
Typically there will be a few factors with quite high eigenvalues, and many factors with relatively low eigenvalues.
This graph has a very characteristic shape: there is a sharp descent in the curve followed by a tailing off.

• the point of inflexion is where the slope of the line changes dramatically. This point might be used as a cut-off point for retaining factors.

An alternative to the scree plot is to use the eigenvalues, because these represent the amount of variation explained by a factor.
You set a criterion value that represents a substantial amount of variation and retain factors with eigenvalues above this criterion.
Two common criteria

• Kaiser’s criterion
Retain factors with eigenvalues greater than 1
• Jolliffe’s criterion
A more liberal cut-off: retain factors with eigenvalues greater than 0.7

The three criteria (Kaiser, Jolliffe and the scree plot) often provide different answers.
In these situations consider the communalities of the factors.
In both PCA and factor analysis we determine how many factors/components to extract and then re-estimate the communalities. The factors we retain will not explain all the variance in the data, and so the communalities after extraction will always be less than 1.
The factors retained do not map perfectly onto the original variables, they merely reflect the common variance in the data.
The closer the communalities are to 1, the better our factors are at explaining the original data.
The communalities are good indices of whether too few factors have been retained.
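The communality of a variable after extraction is the sum of its squared loadings across the retained factors. A sketch with a hypothetical loading matrix (4 variables, 2 retained factors):

```python
# Communality after extraction = sum of squared loadings across retained factors.
# The loading matrix below is made up: rows = variables, columns = factors.
loadings = [
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.85],
    [0.05, 0.70],
]
communalities = [sum(b * b for b in row) for row in loadings]
# The closer these are to 1, the better the retained factors explain the data.
print([round(h, 3) for h in communalities])
```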

Factor rotation

Once factors have been extracted, it is possible to calculate the degree to which variables load onto these factors.

• generally, you will find that most variables have high loadings on the most important factor and small loadings on all other factors.

Factor rotation: a technique used to discriminate between factors.
If we visualize our factors as axes along which variables can be plotted, then factor rotation effectively rotates these axes such that variables load maximally on only one factor.

Factor rotation amounts to rotating the axes to try to ensure that each cluster of variables is intersected by the factor to which it relates most.
After rotation, the loadings of the variables are maximized on one factor and minimized on the remaining factor(s).
If an axis passes through a cluster of variables, then these variables will have a loading close to zero on the opposite axis.

There are two flavours of rotation

• Orthogonal rotation
we rotate factors while keeping them independent, or uncorrelated.
Before rotation, all factors are independent, and orthogonal rotation ensures that the factors remain this way.
• Oblique rotation
allows factors to correlate.

SPSS implements three methods of orthogonal rotation

• Quartimax
• Varimax
• Equamax
a hybrid between the other two.

SPSS has two methods of oblique rotation

• Direct oblimin
Determines the degree to which factors are allowed to correlate by the value of a constant called delta.
The default value in SPSS is 0, and this ensures that high correlation between factors is not allowed (direct quartimin rotation)
If you set delta less than 0 (down to -0.8) you can expect less correlated factors.
• Promax
A faster procedure designed for very large data sets

The choice of orthogonal or oblique rotation depends on:

• whether there is a good theoretical reason to suppose that factors should correlate or be independent
• how the variables cluster on the factors before rotation

Factor transformation matrix: used to convert the unrotated factor loadings into the rotated ones. Values in this matrix represent the angle through which the axes have been rotated, or the degree to which factors have been rotated.
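For an orthogonal rotation, applying the transformation matrix is ordinary matrix multiplication. A sketch with made-up loadings and a 30° rotation, which also illustrates a useful fact: orthogonal rotation changes the loadings but leaves each variable's communality untouched.

```python
import math

# Orthogonal rotation as matrix multiplication: rotated loadings = A * T, where
# T is the factor transformation matrix for an angle theta (loadings made up).
theta = math.radians(30)
T = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
A = [[0.7, 0.5],
     [0.6, 0.6],
     [0.2, 0.8]]
rotated = [[sum(row[k] * T[k][j] for k in range(2)) for j in range(2)] for row in A]

# Rotation redistributes loadings across factors, but each variable's
# communality (sum of squared loadings) is unchanged.
for before, after in zip(A, rotated):
    print(round(sum(b * b for b in before), 6), round(sum(b * b for b in after), 6))
```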

Interpreting the factor structure

Once a factor structure has been found, it needs to be interpreted.
Loadings gauge the substantive importance of a given variable to a given factor.
We use these values to place variables with factors. Every variable will have a loading on every factor, so we’re looking for variables that load highly on a given factor. Once we’ve identified these variables, we look for a theme within them.

It is possible to assess the statistical significance of a loading, but the p-value depends on sample size.
So, instead, we can gauge importance by squaring the loadings to give an estimate of the amount of variance in a factor accounted for by a variable.

## Preliminary analysis

• Scan the correlation matrix for variables that have very small correlations with most other variables, or that correlate very highly (r ≈ 0.9) with one or more other variables.
• In factor analysis, check that the determinant of this matrix is bigger than 0.00001; if it is, then multicollinearity isn’t a problem. You don’t need to worry about this for principal component analysis.
• In the table labelled KMO and Bartlett’s Test, the KMO statistic should be greater than 0.5 as a bare minimum; if it isn’t, collect more data. You should check the KMO statistic for individual variables by looking at the diagonal of the anti-image matrix; these values should also be above 0.5 (this is useful for identifying problematic variables if the overall KMO is unsatisfactory).
• Bartlett’s test of sphericity will usually be significant (the value of Sig. will be less than 0.05); if it’s not, you’ve got a disaster on your hands.

## Factor extraction

• To decide how many factors to extract, look at the table labelled Communalities and the column labelled Extraction. If these values are all 0.7 or above and you have fewer than 30 variables, then the default (Kaiser’s criterion) for extracting factors is fine. Likewise, if your sample size exceeds 250 and the average of the communalities is 0.6 or greater. Alternatively, with 200 or more participants the scree plot can be used.
• Check the bottom of the table labelled Reproduced Correlations for the percentage of ‘nonredundant residuals with absolute values greater than 0.05’. This percentage should be less than 50% and the smaller it is, the better.
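The residuals in that table compare each observed correlation with the correlation reproduced from the retained loadings (the sum, over factors, of the two variables' loading products). A toy sketch with illustrative numbers:

```python
# Reproduced correlation between two variables = sum over retained factors of
# the product of their loadings; residual = observed - reproduced (toy numbers).
loadings_v1 = [0.8]   # loadings of variable 1 on the single retained factor
loadings_v2 = [0.7]
observed_r = 0.60
reproduced_r = sum(a * b for a, b in zip(loadings_v1, loadings_v2))  # 0.56
residual = observed_r - reproduced_r
print(round(residual, 2))  # 0.04 -> under the 0.05 rule of thumb
```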

## Interpretation

• If you’ve conducted orthogonal rotation then look at the table labelled ‘Rotated Factor Matrix’. For each variable, note the factor/component for which the variable has the highest loading (above about 0.3–0.4 when you ignore the plus or minus sign). Try to make sense of what the factors represent by looking for common themes in the items that load highly on the same factor.
• If you’ve conducted oblique rotation then do the same as above but for the ‘Pattern Matrix’. Double-check what you find by doing the same for the ‘Structure Matrix’.

## How to report factor analysis

When reporting factor analysis, provide readers with enough information to make an informed opinion about what you’ve done.
Be clear about your criteria for extracting factors and the method of rotation used.
Provide a table of the rotated factor loadings of all items and flag values above a criterion level.
Report the percentage of variance that each factor explains and possibly the eigenvalue too.

## Reliability analysis

Measures of reliability

If you’re using factor analysis to validate a questionnaire, it is useful to check the reliability of your scale.
Reliability: a measure should consistently reflect the construct that it is measuring.

• the simplest way is to use split-half reliability.
The set of items is split into two randomly selected halves, and a score for each participant is calculated on each half of the scale. If a scale is reliable, a person’s score on one half of the scale should be the same as their score on the other half.
Across several participants, the scores from the two halves of the questionnaire should correlate very highly.
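A minimal sketch of that check, with made-up half-scores for six respondents and a hand-rolled Pearson correlation:

```python
import math

# Split-half sketch: total scores on the two halves for six made-up
# respondents, then the Pearson correlation between the halves.
half_a = [10, 14, 8, 12, 6, 15]
half_b = [11, 13, 9, 12, 7, 14]

def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

print(round(pearson(half_a, half_b), 3))  # close to 1 -> the scale looks reliable
```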

Cronbach’s alpha, α: a measure that is loosely equivalent to splitting the items in two in every possible way, computing the correlation coefficient for each split, and taking the average of these values.

α = (N² × mean(cov)) / (Σs²_item + Σcov_item)

For each item in our scale we can calculate two things:

• the variance within that item
• the covariance between a particular item and any other item on the scale.

We can construct a variance-covariance matrix of all items.
The top half of the equation is the number of items (N) squared multiplied by the average covariance between items.
The bottom half is the sum of all the item variances and item covariances.
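The formula can be computed directly from the variance-covariance matrix. A sketch on made-up scores (three items, six respondents):

```python
# Cronbach's alpha computed exactly as in the formula above, from made-up
# scores on three items for six respondents.

def covariance(x, y):
    """Sample covariance of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

items = [
    [4, 5, 3, 4, 2, 5],   # item 1
    [3, 5, 3, 4, 2, 4],   # item 2
    [4, 4, 2, 5, 1, 5],   # item 3
]
N = len(items)

# Variance-covariance matrix of the items
cov = [[covariance(items[i], items[j]) for j in range(N)] for i in range(N)]

# Top half: N^2 times the average inter-item covariance;
# bottom half: the sum of every element (all variances + all covariances).
mean_cov = sum(cov[i][j] for i in range(N) for j in range(N) if i != j) / (N * (N - 1))
total = sum(cov[i][j] for i in range(N) for j in range(N))
alpha = N ** 2 * mean_cov / total
print(round(alpha, 3))
```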

Interpreting Cronbach’s α: some cautionary tales

The value of α depends on the number of items on the scale.
Alpha should not be used as a measure for unidimensionality (the extent to which the scale measures one underlying factor or construct).
Reverse-phrased items will affect α unless they are reverse-scored.

## Reliability

• Reliability analysis is used to measure the consistency of a measure
• Remember to reverse-score any items that were reverse-phrased on the original questionnaire before you run reliability analysis
• Run separate reliability analyses for all subscales of your questionnaire
• Cronbach’s α indicates the overall reliability of a questionnaire, and values around 0.8 are good (or 0.7 for ability tests and the like)
• The ‘Cronbach’s Alpha if Item Deleted’ column tells you whether removing an item will improve the overall reliability: values greater than the overall reliability indicate that removing that item will improve the overall reliability of the scale. Look for items that dramatically increase the value of α and remove them.
• If you remove items, rerun the factor analysis to check that the factor structure still holds.
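The 'Alpha if Item Deleted' column can be sketched by recomputing α with each item left out in turn. The scores below are made up; item 4 stands for a reverse-phrased item that was not reverse-scored, which drags α down until it is dropped (or scored properly):

```python
# Sketch of 'Cronbach's Alpha if Item Deleted' on made-up item scores
# (four items, six respondents).

def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score lists (one list per item)."""
    N, n_resp = len(items), len(items[0])
    means = [sum(it) / n_resp for it in items]
    def cov(i, j):
        return sum((a - means[i]) * (b - means[j])
                   for a, b in zip(items[i], items[j])) / (n_resp - 1)
    total = sum(cov(i, j) for i in range(N) for j in range(N))
    trace = sum(cov(i, i) for i in range(N))
    return (N / (N - 1)) * (1 - trace / total)

items = [
    [4, 5, 3, 4, 2, 5],
    [3, 5, 3, 4, 2, 4],
    [4, 4, 2, 5, 1, 5],
    [2, 1, 4, 2, 5, 1],   # reverse-phrased item, not reverse-scored
]
overall = cronbach_alpha(items)
for i in range(len(items)):
    without_i = items[:i] + items[i + 1:]
    print(i, round(cronbach_alpha(without_i), 3))  # alpha with item i deleted
```

Deleting the un-reversed item yields a much higher α than the overall value, which is exactly the pattern the 'Alpha if Item Deleted' column flags.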

How to report reliability analysis

Report the reliabilities in the text using the symbol α.
