Machine learning has become so ubiquitous that four out of five people would say they have heard the term. It offers many techniques for solving problems that would otherwise require advanced coding skills. The good news is that you can apply machine learning to your own problems using ready-made algorithms built into software such as IBM SPSS and MS Excel. This blog introduces IBM SPSS machine learning algorithms for solving some real-world problems.
We shall be using the “Automobile Dataset” from Kaggle; details of the data can be found on the website: Dataset
NB: This blog assumes you have some basic knowledge of data analysis, such as common approaches to dealing with data, and of IBM SPSS. If that is not the case, there are some interesting blogs you may check out first.
Let’s start with the first technique, which is regression. For details on regression analysis, see: Linear Regression Simplified
Regression Analysis
Figure 1 shows the first ten rows and first ten columns of our data. Our scenario is to evaluate the impact of some variables on the price of a car. The task now is to decide which variables those are.
Follow the steps below to identify the key variables that may be used to develop a predictive model such as regression on the data:
Intuitively, we have selected ‘enginesize’, ‘numofcylinders’, ‘wheelbase’ and ‘horsepower’ as the predictors and ‘price’ as the target variable. As you may know, regression analysis needs numeric (ideally continuous) variables, but ‘numofcylinders’ is a string-type variable. Hence, we have to transform it first (see this blog on how to do that: Variable Transformation). We transformed the variable and named the result ‘numofcylinders_tr’.
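For readers who want to see what such a transformation amounts to outside SPSS, here is a minimal Python sketch. It assumes the column stores the cylinder count as English number words; the exact spellings in the mapping and the sample values are assumptions for illustration, so check them against your own data.

```python
# Hypothetical mapping from number words to integers.
# The spellings are assumed; verify them against the values in your data.
WORD_TO_NUMBER = {
    "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "eight": 8, "twelve": 12,
}

def transform_cylinders(values):
    """Convert number words to integers; unrecognised labels become None."""
    return [WORD_TO_NUMBER.get(str(v).strip().lower()) for v in values]

# Made-up sample values, not rows from the real dataset.
numofcylinders = ["four", "six", "Four ", "eight"]
numofcylinders_tr = transform_cylinders(numofcylinders)
print(numofcylinders_tr)  # [4, 6, 4, 8]
```

Returning None for unknown labels keeps unexpected values visible instead of silently turning them into a number.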
Data visualization
See this blog on how to do correlation analysis in IBM SPSS: Correlation Analysis
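As a rough illustration of what a correlation analysis computes, the sketch below calculates the Pearson correlation coefficient between one predictor and the price in plain Python. The numbers are made up for the example and are not taken from the dataset.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)

# Made-up values for illustration only.
enginesize = [90, 110, 130, 150, 180]
price = [6000, 8500, 10500, 14000, 17500]

r = pearson_r(enginesize, price)
print(round(r, 3))
```

A value close to +1 or -1 suggests a strong linear relationship, which is one reason to consider a variable as a predictor.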
Follow this path: Analyze >>> Regression >>> Linear, as shown below.
After you click ‘Linear’, the window below should appear. Select the dependent variable (‘price’) and the independent variable(s).
After selecting the variables for the model, proceed to Statistics to make some choices, as shown in the figure below. To keep it simple, we have selected Estimates, Model fit and Descriptives. (Be sure to check out the other options to deepen your knowledge.) Then click Continue and OK.
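To give a feel for what happens when you click OK, here is a minimal sketch of ordinary least squares in plain Python. It is not SPSS's implementation: it simply solves the normal equations, and the function names and data are hypothetical. The toy prices are generated from a known formula so you can see the fit recover it.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_linear_regression(rows, target):
    """Return [intercept, b1, ..., bk] that minimise the squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up (enginesize, horsepower) pairs, not rows from the real dataset.
rows = [(100, 70), (120, 95), (140, 100), (160, 130), (180, 150)]
# Toy prices generated from price = 1000 + 50*enginesize + 20*horsepower.
price = [1000 + 50 * e + 20 * h for e, h in rows]

intercept, b_enginesize, b_horsepower = fit_linear_regression(rows, price)
print(round(intercept, 2), round(b_enginesize, 2), round(b_horsepower, 2))
```

Because the toy prices follow the formula exactly, the fitted coefficients should come out very close to 1000, 50 and 20. Real data will not fit this neatly.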
Focus on three main parameters to interpret regression model results:
An R Square value of 0.807 implies that about 81% of the variation in the price of the car is explained by the model (the predictors). The remaining 19% or so is left unexplained, for example because of variables that are not included in the model.
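If you are curious how R Square is obtained, the sketch below computes it as one minus the ratio of the unexplained variation to the total variation. The actual and predicted values are made up for the example.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    total = sum((a - mean_actual) ** 2 for a in actual)            # total variation
    residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained part
    return 1 - residual / total

# Made-up numbers for illustration only.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

r2 = r_squared(actual, predicted)
print(round(r2, 3))  # 0.948
```

Here the total variation is 500 and the unexplained part is 26, so the predictions explain about 95% of the variation.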
The Sig. value is the p-value of the regression model. It is less than 0.0005 and therefore well below the usual 0.05 threshold, so the regression model is statistically significant.
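The Sig. value belongs to the F test in the ANOVA table, and the F statistic can be worked out from R Square, the number of cases and the number of predictors. The sketch below shows that calculation. The sample size of 200 is a placeholder assumption, not the real number of rows, so the result will not match your SPSS output exactly.

```python
def f_statistic(r_square, n_cases, n_predictors):
    """Overall F statistic of a regression model from its R Square."""
    explained = r_square / n_predictors
    unexplained = (1 - r_square) / (n_cases - n_predictors - 1)
    return explained / unexplained

# R Square and the four predictors come from the text above;
# n_cases=200 is an assumed placeholder, not the real sample size.
f_value = f_statistic(r_square=0.807, n_cases=200, n_predictors=4)
print(round(f_value, 1))
```

The larger the F statistic, the smaller the p-value that SPSS reports in the Sig. column.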
Mathematically,
The regression model equation for prediction is given by:
The regression equation above implies that all the predictors have a positive relationship with the price; that is, as any one of them increases while the others are held fixed, the price of a car also increases. This is inferred from the ‘+’ sign of the coefficients.
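As a sketch of the general form only, a model with these four predictors can be written as below. The symbols b_0 to b_4 are placeholders for the values SPSS reports in the B column of the Coefficients table; they are not figures from this analysis.

```latex
\[
\widehat{\text{price}} = b_0
  + b_1\,\text{enginesize}
  + b_2\,\text{numofcylinders\_tr}
  + b_3\,\text{wheelbase}
  + b_4\,\text{horsepower}
\]
```

Each coefficient gives the expected change in price for a one-unit increase in its predictor, with the other predictors held fixed.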
This blog has introduced the regression machine learning technique, using multiple linear regression as the example. Here is a task for you: perform simple linear regression with each of the predictors separately and compare each R Square value with the one above. Ask yourself this question: why is it different? Ensure you answer the question!
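If you would like to rehearse the comparison before doing it in SPSS, here is a small sketch with two predictors. For simple linear regression, R Square is the squared Pearson correlation; for two predictors, the multiple R Square can be computed from the three pairwise correlations. All the numbers are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def r_square_two_predictors(x1, x2, y):
    """Multiple R Square for y regressed on x1 and x2 together."""
    r_y1, r_y2, r_12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Made-up values, not rows from the real dataset.
enginesize = [90, 110, 130, 150, 180, 200]
horsepower = [70, 60, 100, 95, 140, 120]
price = [6000, 7000, 10500, 12000, 17500, 17000]

r2_enginesize = pearson_r(enginesize, price) ** 2   # simple regression
r2_horsepower = pearson_r(horsepower, price) ** 2   # simple regression
r2_both = r_square_two_predictors(enginesize, horsepower, price)

print(round(r2_enginesize, 3), round(r2_horsepower, 3), round(r2_both, 3))
```

Compare the three printed values and notice how the overlap between the two predictors affects the combined figure.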
See you in the next technique - Logistic Regression!!!