Machine Learning With IBM SPSS - Regression

Introduction

Machine learning has become so ubiquitous that most people you ask will have at least heard the term. It offers many techniques for solving problems that would otherwise require advanced coding skills. The good news is that you can apply machine learning to your own problems using ready-made algorithms built into software such as IBM SPSS and MS Excel. In this blog, you will be introduced to the machine learning algorithms in IBM SPSS and use them to solve some real-world problems.

We shall be using the “Automobile Dataset” from Kaggle. Details of the data can be found on the website: Dataset

NB: This blog assumes you have basic knowledge of data analysis, such as common approaches to dealing with data, and basic knowledge of IBM SPSS. If that is not the case, there are some interesting blogs you may want to check out first.

Let’s start with the first technique, which is regression. For details on regression analysis, see: Linear Regression Simplified

Regression Analysis

[Figure 1: first ten rows and first ten columns of the dataset]

Figure 1 shows the first ten rows and first ten columns of our data. Our scenario is to evaluate the impact of some variables on the price of a car. The first task is to decide which variables to use.

Follow the steps below to identify the key variables that can be used to develop a predictive model, such as a regression, on the data:

Do a blind analysis of the variables, for example by visualizing randomly chosen variables to see the relationships between them.
Do a correlation analysis to fast-track the first step above.
Identify the most correlated variables based on their correlation coefficients.
Visualize your identified predictors against the target variable.
Then perform your regression analysis.

Blind Analysis

Intuitively, we selected ‘enginesize’, ‘numofcylinders’, ‘wheelbase’ and ‘horsepower’ as the predictors and ‘price’ as the target variable. As you may know, linear regression needs numeric variables, but ‘numofcylinders’ is a string variable. Hence, we have to transform it first (see this blog on how to do that: Variable Transformation). We transformed the variable and named it ‘numofcylinders_tr’.
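
If you want to see the idea behind the transformation outside SPSS, here is a minimal Python sketch. It assumes the cylinder counts are stored as English number words; the mapping, the sample values and the helper name `transform_cylinders` are made up for illustration, so check the actual labels in your copy of the data first.

```python
# Hypothetical mapping from text labels to numeric cylinder counts.
# The labels are an assumption; inspect your own data before relying on them.
WORD_TO_NUMBER = {
    "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "eight": 8, "twelve": 12,
}

def transform_cylinders(label):
    """Return the numeric cylinder count for a text label, or None if unknown."""
    return WORD_TO_NUMBER.get(label.strip().lower())

# Made-up sample values, including one label the mapping does not know.
numofcylinders = ["four", "six", "Four", "eight", "unknown"]
numofcylinders_tr = [transform_cylinders(value) for value in numofcylinders]
print(numofcylinders_tr)
```

Unknown labels come back as None rather than a guessed number, so they are easy to spot and handle before running the regression.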

Data visualization

[Figure: visualization of the selected variables]

Correlation analysis

See this blog on how to do correlation analysis on IBM SPSS: Correlation Analysis

[Figure: correlation analysis output in SPSS]
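
To make the "most correlated variables" step concrete, here is a small Python sketch that computes Pearson's correlation coefficient by hand and ranks predictors by its absolute value. The numbers are made up for illustration and are not taken from the Kaggle data; `pearson_r` is just a helper name chosen for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)

# Made-up values (price in thousands), only to show the mechanics.
price = [10, 20, 30, 40, 50]
predictors = {
    "wheelbase": [95, 91, 94, 92, 93],
    "enginesize": [100, 120, 140, 160, 180],
}

# Rank predictors by the strength (absolute value) of their correlation with price.
ranked = sorted(
    predictors,
    key=lambda name: abs(pearson_r(predictors[name], price)),
    reverse=True,
)
for name in ranked:
    print(name, round(pearson_r(predictors[name], price), 3))
```

Ranking by absolute value matters because a strong negative correlation is just as useful for prediction as a strong positive one.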

The Regression Model

Follow this menu path: Analyze >>> Regression >>> Linear, as shown below.

[Figure: the Analyze >>> Regression >>> Linear menu path]

After you click ‘Linear’, the window below should appear. Select the dependent variable and the independent variable(s).

[Figure: the Linear Regression window]

After selecting the variables for the model, proceed to Statistics to make some choices, as shown in the figure below. To keep it simple, we selected Estimates, Model fit and Descriptives. (Be sure to explore the other options to deepen your knowledge.) Then click Continue and OK.

[Figure: the Statistics options]
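
If you are curious what a linear regression estimates behind the menus, the one-predictor case can be sketched in a few lines of Python using the ordinary least squares formulas. This is only an illustration with made-up numbers, not a replacement for the SPSS procedure, and `fit_simple_regression` is a name chosen for this example.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up values: engine size against price in thousands.
enginesize = [100, 120, 140, 160, 180]
price = [10, 20, 30, 40, 50]

intercept, slope = fit_simple_regression(enginesize, price)
print(intercept, slope)
```

With several predictors the same least squares idea applies, but the coefficients are solved jointly, which is what the Linear procedure does for you.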

Regression Model results

Focus on three main parameters to interpret regression model results:

R Square value
Sig. of ANOVA
Signs of the coefficients

[Figure: SPSS output showing the R Square value]

An R Square value of 0.807 implies that about 81% of the variation in the price of a car is explained by the model (the predictors). The remaining 19% or so is due to factors that are not included in the model.
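
R Square can be computed by hand as one minus the ratio of unexplained variation to total variation. The Python sketch below shows that calculation on made-up actual and predicted prices (not the values behind the 0.807 above).

```python
def r_squared(actual, predicted):
    """R Square = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

# Made-up prices (in thousands) and made-up model predictions.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(round(r_squared(actual, predicted), 3))  # 0.948
```

Here the predictions miss by only 2 or 3 units while the prices themselves vary a lot, so almost all of the variation is explained.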

[Figure: SPSS ANOVA output showing the Sig. value]

The Sig. value means that the p-value of the regression model is less than 0.0005, and therefore less than 0.05. Thus, the regression model is statistically significant.
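
The Sig. value in the ANOVA table belongs to an F statistic that tests the whole model, and that F statistic can be written in terms of R Square. The Python sketch below uses R Square = 0.807 and 4 predictors from this blog; the sample size of 205 is an assumption, since the number of cars is not stated here, so treat the result as illustrative rather than as the figure in the SPSS output.

```python
def f_statistic(r_square, num_predictors, sample_size):
    """Overall F statistic of a regression, computed from R Square."""
    explained = r_square / num_predictors
    unexplained = (1 - r_square) / (sample_size - num_predictors - 1)
    return explained / unexplained

# 0.807 and 4 predictors come from the blog; 205 rows is a hypothetical sample size.
f_value = f_statistic(r_square=0.807, num_predictors=4, sample_size=205)
print(round(f_value, 2))
```

A large F value means the predictors together explain far more variation than would be expected by chance, which is why the Sig. value is so small.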

[Figure: SPSS output showing the regression coefficients]

Mathematically, the regression model equation for prediction is given by:

[Figure: the regression model equation]

The regression equation above implies that all the predictors have a positive relationship with the price; that is, as they increase, the price of a car also increases. This is inferred from the ‘+’ signs of the coefficients.
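
To see how such an equation is used for prediction, here is a Python sketch of the general form: price = intercept + the sum of each coefficient times its predictor. The intercept and coefficients below are hypothetical placeholders, not the values from the SPSS output, so substitute your own numbers from the coefficients table.

```python
# Hypothetical intercept and coefficients, NOT the values from the SPSS output.
INTERCEPT = -10000.0
COEFFICIENTS = {
    "enginesize": 100.0,
    "numofcylinders_tr": 500.0,
    "wheelbase": 50.0,
    "horsepower": 40.0,
}

def predict_price(car):
    """Apply the regression equation to one car (a dict of predictor values)."""
    return INTERCEPT + sum(
        coefficient * car[name] for name, coefficient in COEFFICIENTS.items()
    )

# A made-up car, then the same car with a larger engine.
car = {"enginesize": 130, "numofcylinders_tr": 4, "wheelbase": 90, "horsepower": 110}
bigger_engine = dict(car, enginesize=150)

print(predict_price(car))
print(predict_price(bigger_engine))
```

Because every coefficient is positive, raising any one predictor while holding the others fixed raises the predicted price, which is exactly what the ‘+’ signs tell you.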

Conclusion

In this blog, regression has been introduced as a machine learning technique, using multiple linear regression as the example. Here is a task for you: perform a simple linear regression with each of the predictors separately and compare each R Square value with the one above. Ask yourself this question: why is it different? Make sure you answer the question!

See you in the next technique: Logistic Regression!
