Factor Analysis With SPSS
Introduction
Factor analysis, as the name implies, deals with the (estimated) factors that most influence a variable of concern. It can serve as a data-reduction step or, more broadly, as data preprocessing. Data reduction and structure detection are the two main applications of factor analysis, and either one gives you a clearer picture of your dataset.
Data reduction is the process of eliminating redundant (highly correlated) variables from the data file, possibly replacing them with a smaller set of uncorrelated variables.
The goal of structure detection is to examine the underlying (or latent) relationships among the variables.
Numerous extraction techniques are available in the Factor Analysis procedure for building a solution.
For data reduction, the principal components extraction technique starts by finding the linear combination of variables (a component) that explains as much of the variance in the original variables as possible. It then finds a second component, uncorrelated with the first, that explains as much of the remaining variance as possible, and so on.
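To make the idea concrete, here is a minimal Python sketch of principal components extraction from a correlation matrix (an illustration of the arithmetic only, not SPSS itself; the array X of numeric predictors is assumed):

```python
import numpy as np

def principal_components(X):
    """Principal components of the correlation matrix of X (cases x variables)."""
    R = np.corrcoef(X, rowvar=False)          # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(R)      # eigen-decomposition (ascending order)
    order = np.argsort(eigvals)[::-1]         # largest-variance components first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)     # loadings = correlations between variables and components
    return eigvals, loadings
```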
The primary aim of this analysis is to forecast vehicle sales from a variety of indicators, which are themselves functions of many other factors. With so many correlated indicators, an effective extraction technique therefore has to address the two goals described above: reducing the data and detecting the structure underlying it.
The data used for this analysis can be found at automobile_dataset
We will use factor analysis with principal components extraction to concentrate the investigation on a manageable subset of the predictors.
To run a principal components factor analysis, select Analyze => Dimension Reduction => Factor from the SPSS menus.
Click Extraction. In the Extraction dialogue, make sure Correlation matrix is selected in the Analyze group, then check "Unrotated factor solution" and "Scree plot" in the Display group and click Continue.
In the Factor Analysis dialogue, click Rotation. In the Method group, select Varimax, then click Continue and click Scores.
Select "Save as variables" and "Display factor score coefficient matrix", then click Continue and, back in the Factor Analysis dialogue, click OK.
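If you prefer to script the same steps outside SPSS, the sketch below is a rough Python equivalent using the factor_analyzer package. The file name automobile.csv, the column name sales, and the choice of three components are assumptions drawn from this walkthrough, not part of the SPSS dialogues:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical CSV export of the automobile data set.
df = pd.read_csv("automobile.csv").dropna()
predictors = df.drop(columns=["sales"])        # "sales" is the target, so it is left out

# Principal-style extraction with a varimax rotation, keeping three components
# (mirroring the eigenvalue > 1 result reported below).
fa = FactorAnalyzer(n_factors=3, method="principal", rotation="varimax")
fa.fit(predictors)

print(fa.loadings_)                # rotated component matrix
print(fa.get_communalities())      # extraction communalities
scores = fa.transform(predictors)  # factor scores, analogous to "Save as variables"
```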
A communality of 0.60 or higher for each variable, or an average communality of about 0.70, is generally considered a good indication that the extracted components describe the variables well.
In the results below, every variable has a communality above 0.60, and the average is above 0.70. In other words, every communality in this table is substantial, demonstrating that the extracted components represent the variables well.
If, on the other hand, the extraction communalities of important variables were very low, you might need to extract an additional component.
Initial communalities are estimates of the variance in each variable that can be accounted for by all components or factors. For principal components extraction of a correlation matrix, they are always equal to 1.0, as shown in Figure 1.
Extraction communalities are estimates of the variance in each variable that is accounted for by the retained components.
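A communality is simply the sum of a variable's squared loadings on the retained components. A minimal sketch, reusing the loadings array from the earlier snippet:

```python
import numpy as np

def communalities(loadings, n_retained):
    """Share of each variable's variance explained by the retained components."""
    kept = loadings[:, :n_retained]      # loadings on the retained components only
    return np.sum(kept ** 2, axis=1)     # sum of squared loadings per variable
```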
In a correlation-matrix analysis, the eigenvalues sum to the number of variables, so the initial solution has as many components as there are variables. Because we asked to keep only components with eigenvalues greater than 1, the extracted solution consists of the first three components.
The extracted components are displayed in the second section of the table. They account for roughly 88% of the variability in the original 10 variables, so we can considerably reduce the complexity of the data set with only about a 12% loss of information.
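The percentages in that table follow directly from the eigenvalues: since the eigenvalues of a correlation matrix sum to the number of variables, the share of variance explained by the first k components can be computed as below (a small illustration, not SPSS output):

```python
def variance_explained(eigvals, k):
    """Percentage of total variance captured by the first k components."""
    return 100.0 * eigvals[:k].sum() / eigvals.sum()

# For this data set, variance_explained(eigvals, 3) should come out near 88.
```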
As stated earlier, if the extracted components leave important variables with low communalities, another component should be extracted. The question then becomes: how many components are enough?
This is where the scree plot comes into play. The scree plot helps us find the ideal number of components: it displays the eigenvalue of each component in the initial solution.
Typically, the components before the point where the plot levels off are retained for the analysis. In the figure below, there are three sharp drop-offs (circled in red) before the curve flattens into an almost straight line, which again points to a three-component solution.
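To reproduce the scree plot outside SPSS, a minimal matplotlib sketch (again taking the eigenvalues from the earlier snippet) would look like this:

```python
import matplotlib.pyplot as plt

def scree_plot(eigvals):
    """Plot each component's eigenvalue from the initial solution."""
    components = range(1, len(eigvals) + 1)
    plt.plot(components, eigvals, marker="o")
    plt.axhline(1.0, linestyle="--", color="grey")   # the eigenvalue > 1 retention rule
    plt.xlabel("Component number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()
```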
Price in thousands and horsepower are the variables most strongly correlated with the first component. Price in thousands serves as the better proxy because it is less associated with the other two components, as its loadings on components 2 and 3 show.
Factor analysis is highly recommended when you are dealing with a large dataset in which the target is a combination of, or is influenced by, many variables. Such a target is usually difficult to measure directly, so you combine information from many other explanatory variables to derive a factor.
In this analysis, we identified the variables most strongly correlated with the first component and the single variable best able to represent it.
By combining factor analysis with principal components extraction, a data file can be condensed from many variables to just a few important ones. Keep in mind that the associations shown in the rotated component matrix will affect how future analyses are interpreted.
Although this "translation" step adds a small amount of complexity, using principal components analysis for data reduction offers the advantages of uncorrelated predictors and a smaller data file.
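As a closing illustration, the saved factor scores can be fed straight into a follow-up model as uncorrelated predictors. A hedged sketch, assuming the scores and the sales column from the earlier factor_analyzer snippet:

```python
from sklearn.linear_model import LinearRegression

# "scores" are the saved factor scores; "sales" is the assumed target column.
model = LinearRegression().fit(scores, df["sales"])
print(model.coef_)                        # effect of each (uncorrelated) component on sales
print(model.score(scores, df["sales"]))   # R-squared of the reduced model
```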