Categorical Regression Analysis With IBM SPSS
If you have been following the blogs on this platform, you will have noticed that regression analysis has been explained in detail, both in theory and in practice. This blog explains another type of regression analysis: categorical regression, which is regression analysis on categorical data. In this type of analysis, the categorical variables are recoded as numbers, and the recoded data is then analysed like an ordinary regression on continuous variables. That simple!
Let’s apply this…!!!
Categorical regression describes the relationship between a response variable and a set of explanatory variables. Once this relationship is quantified, it can be used to predict the response for any combination of predictor values. For instance, a business looking to sell a new carpet cleaner wishes to investigate the impact of five variables on customer preference: package design, brand name, price, a Good Housekeeping seal, and a money-back guarantee.
The carpet-cleaner study uses the explanatory variables shown in the table below. There are three package designs, three brand names (K2R, Glory, and Bissell), three price levels, and two levels (either no or yes) for each of the last two variables. The response variable, preference, is a general indicator of customer preference for each product profile. Using categorical regression, we will investigate the connection between the five predictors and preference.
Variable name | Variable label | Variable values
package | Package design | A*, B*, C*
brand | Brand name | K2R, Glory, Bissell
price | Price | $1.19, $1.39, $1.59
seal | Good Housekeeping seal | No, yes
money | Money-back guarantee | No, yes
Table 1: Explanatory variables in the carpet-cleaner study
Ten customers ranked 22 profiles, and the variable preference holds the rank of the average ranking for each profile. Low ranks correspond to high preference. The carpet_data file contains this data set.
Download the data from the link above and import it as a CSV file. In Step 2 of the Text Import Wizard, select the option labelled [1] in Figure 1 to tell SPSS that the data has a header row.
Figure 1: To indicate that the data has a header row
Next, as shown in Figure 2, uncheck the Space option so that spaces are not treated as a delimiter, then keep clicking Next until the data is imported.
Figure 2: Deselected the Space as a CSV delimiter
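For readers who want to see why these two wizard settings matter, here is a minimal Python sketch of the same idea outside SPSS: the first row is treated as a header, and the text is split on commas rather than spaces. The sample text is made up for illustration (it is not the carpet_data file), and the `parse` helper is a hypothetical name.

```python
import csv
import io

# Made-up sample for illustration only; it is NOT the real carpet_data file.
# The header uses variable labels from Table 1 and the single row is invented.
SAMPLE = (
    "Package design,Brand name,Good Housekeeping seal\n"
    "A*,Glory,yes\n"
)

def parse(text, delimiter):
    """Return (header, data_rows), treating the first row as the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return rows[0], rows[1:]

comma_header, comma_rows = parse(SAMPLE, ",")
space_header, _ = parse(SAMPLE, " ")

print(comma_header)  # one column name per variable
print(space_header)  # labels broken apart at every space
```

Splitting on spaces breaks any label that contains a space into several pieces, which is the problem that unchecking Space avoids.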
Before analysing the imported data, we must first make sure that, in Variable View, the variables are labelled and their values are encoded as shown in Figure 3, using the values from Table 1 above.
Figure 3: Encode values and label in a given column
To produce categorical linear regression output from the menus, click Analyse => Regression => Linear.
Figure 4: Linear regression Analyse option
Move Preference into the Dependent box as shown in Figure 5, then add Package design and the other four predictors as independent variables. Click Plots, then select *ZRESID for Y and *ZPRED for X.
Figure 5: Standardised Linear regression plot
Before running the analysis, open the Save dialogue and select Standardised in the Residuals group, then click Continue. Back in the Linear Regression dialogue, click OK to run the regression.
Figure 6: Select Standardised Statistic
Linear regression is the standard method for describing the relationship between a response and its predictors, and R² is the most common measure of how well a regression model fits the data. It is the proportion of the variation in the response that is accounted for by the weighted combination of predictors. The closer R² is to 1, the better the fit. Here R² is 0.707, so when preference is regressed on the five predictors, they account for almost 71% of the variation in the customer preference rankings.
Model Summary b
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .841a | .707 | .615 | 3.998
a. Predictors: (Constant), Money-back guarantee, Price, Good Housekeeping seal, Brand name, Package design
b. Dependent Variable: Preference
Table 2: Regression Model summary
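The figures in the Model Summary can be cross-checked by hand. The sketch below recomputes R and the adjusted R² from the reported R² of 0.707, using the standard adjustment formula. It assumes the number of cases is the 22 profiles mentioned earlier and that there are five predictors; the function name is hypothetical.

```python
import math

def adjusted_r_squared(r2, n_cases, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)

r2 = 0.707         # R Square from the Model Summary
n_cases = 22       # assumption: one case per profile
n_predictors = 5   # package, brand, price, seal, money

multiple_r = math.sqrt(r2)
adj_r2 = adjusted_r_squared(r2, n_cases, n_predictors)

print(round(multiple_r, 3))  # 0.841
print(round(adj_r2, 3))      # 0.615
```

Both values agree with the table, which supports reading the analysis as 22 cases and five predictors.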
The table below displays the regression coefficients, including the standardised ones. The sign of a coefficient tells us whether the predicted response rises or falls when that predictor increases, with all other predictors held constant.
Coefficients a
Model 1 | Unstandardised B | Unstandardised Std. Error | Standardised Beta | t | Sig.
(Constant) | 22.529 | 5.177 | | 4.352 | .000
Package design | -4.159 | 1.036 | -.560 | -4.015 | .001
Brand name | .429 | 1.054 | .056 | .407 | .689
Price | 2.703 | 1.009 | .366 | 2.681 | .016
Good Housekeeping seal | -4.314 | 1.780 | -.330 | -2.423 | .028
Money-back guarantee | -2.779 | 1.921 | -.197 | -1.447 | .167
a. Dependent Variable: Preference
Table 3: Standardised Regression Coefficient
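As a quick sanity check on Table 3, each value in the t column should equal the unstandardised B divided by its standard error, up to rounding. The sketch below recomputes the ratios from the printed values. The predictor names attached to each row follow the order of Table 1 and are an assumption, as is the helper name `t_ratio`.

```python
# (B, Std. Error, reported t) copied from Table 3.
# Assumption: rows after the constant follow the variable order of Table 1.
ROWS = {
    "(Constant)": (22.529, 5.177, 4.352),
    "Package design": (-4.159, 1.036, -4.015),
    "Brand name": (0.429, 1.054, 0.407),
    "Price": (2.703, 1.009, 2.681),
    "Good Housekeeping seal": (-4.314, 1.780, -2.423),
    "Money-back guarantee": (-2.779, 1.921, -1.447),
}

def t_ratio(b, std_error):
    """t statistic of a coefficient: estimate divided by its standard error."""
    return b / std_error

# Largest gap between a recomputed and a reported t value.
max_gap = max(abs(t_ratio(b, se) - t) for b, se, t in ROWS.values())

for name, (b, se, t_reported) in ROWS.items():
    print(f"{name:24s} computed {t_ratio(b, se):7.3f}   reported {t_reported:7.3f}")
```

Small differences in the third decimal place are expected, because B and Std. Error are themselves rounded in the table.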
For categorical data, the category coding determines what an increase in a predictor means.
For instance, an increase in the code for Money-back guarantee, Package design, or Good Housekeeping seal lowers the predicted preference rank, and because low ranks mean high preference, that corresponds to a higher predicted preference. Likewise, a one standard deviation increase in Brand name results in a 0.056 standard deviation increase in the predicted preference rank. Because the standard deviation of Preference is 6.44, the predicted rank rises by 0.056 × 6.44 = 0.361. The largest changes in predicted preference result from changes in package design.
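The 0.056 × 6.44 step is a conversion from standard deviation units back to the raw preference scale. A minimal Python sketch of the conversion follows, using only the numbers quoted in this blog; the function name `raw_change` is hypothetical.

```python
PREFERENCE_SD = 6.44  # standard deviation of Preference, as quoted in the text

def raw_change(beta, sd_response, sd_steps=1.0):
    """Change in the response, in raw units, when the predictor moves by
    sd_steps of its own standard deviations."""
    return beta * sd_response * sd_steps

brand_effect = raw_change(0.056, PREFERENCE_SD)     # Brand name
largest_effect = raw_change(-0.560, PREFERENCE_SD)  # largest Beta in Table 3

print(round(brand_effect, 3))    # 0.361
print(round(largest_effect, 3))  # -3.606
```

The second figure shows why the coefficient with the largest magnitude dominates: the same one standard deviation step moves the predicted rank roughly ten times as far as it does for Brand name.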
The standardised residuals are plotted against the standardised predicted values. The aim is to check whether the linear model is appropriate for this analysis. How do we know?
A linear model fits the problem well only if no pattern is visible in the residual plot. If a pattern such as a U-shape is visible, the linear model is probably not the best fit and a non-linear model would perform better. Figure 7 shows a faint U-shaped pattern.
Figure 7: Standardised regression plot
This can be examined further by plotting the standardised residuals against package design. Select the following menu options to create a scatterplot of the residuals by the predictor Package design: Graphs >>> Chart Builder.
Figure 8: Using a graph to confirm the U shape pattern
Choose Simple Scatter from the Scatter/Dot collection. Choose the y-axis variable to be Standardised Residual and the x-axis to be Package design. Select OK.
Figure 9: A plot of standardised residual and package design
In the figure above, the U-shaped pattern can be seen more clearly, which suggests that a non-linear model would better fit the relationship between preference and the predictors.
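The same visual check can be approximated numerically by averaging the standardised residuals within each package design category and looking at how the averages move across the codes. The sketch below uses synthetic residuals chosen only to illustrate the idea; they are not the residuals from this analysis, and both function names are hypothetical.

```python
from statistics import mean

def mean_residual_by_category(categories, residuals):
    """Average residual per category code, ordered by code."""
    groups = {}
    for code, residual in zip(categories, residuals):
        groups.setdefault(code, []).append(residual)
    return {code: mean(values) for code, values in sorted(groups.items())}

def looks_u_shaped(means_in_order):
    """True when every interior mean lies below both end means (U-shape)
    or above both end means (inverted U-shape)."""
    first, *middle, last = means_in_order
    if not middle:
        return False
    below = all(m < first and m < last for m in middle)
    above = all(m > first and m > last for m in middle)
    return below or above

# Synthetic values for illustration only, NOT the residuals from this blog.
codes = [1, 1, 1, 2, 2, 2, 3, 3, 3]
resid = [0.9, 0.4, 1.1, -1.0, -0.6, -0.8, 0.7, 0.2, 0.6]

means = mean_residual_by_category(codes, resid)
u_shape = looks_u_shaped(list(means.values()))
print(means)
print(u_shape)
```

With a well-fitting linear model the group means would all sit near zero; a middle category that sits clearly apart from both ends is the numerical counterpart of the U-shape in the plot.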
In this blog, categorical regression analysis has been explained and applied to a real-world problem. The first part explains categorical regression and how it differs from regression on continuous data. Applying linear regression to the recoded data showed that it does not adequately describe the relationship between the target variable and the predictors.
The U-shape in the residual plots suggests that package design should be treated as a nominal variable. In this case, the influence of a predictor, or the connections between the predictors, cannot be entirely captured by regression coefficients alone. It is therefore recommended to validate the model in this way, to make sure that the most suitable model is used for the problem.