Importance of Data Visualization

During exploratory data analysis (EDA), one typically checks the descriptive statistics of the data and handles missing values before modelling. Descriptive statistics summarize a characteristic of a dataset, often with a single value such as the mean, standard deviation, or variance. It is important to evaluate them during EDA, before any modelling.

However, descriptive statistics alone are not enough to describe a dataset or to decide which type of machine learning model to fit to it.

Anscombe’s Quartet

Anscombe’s quartet is a set of four datasets, each consisting of two variables, x and y, and 11 observations. Constructed by Francis Anscombe in 1973, it is popular because it shows the importance of graphing data before analysing it and building machine learning models. The four datasets have nearly identical simple summary statistics, yet they look surprisingly different when plotted. Let’s examine them using Python to see the importance of data visualization.

First, we will import the Python libraries we need for this analysis, and then scrape the Anscombe’s quartet data from Wikipedia.

[Image: Python notebook code to scrape Anscombe's quartet data]

We can then take a look at the four datasets. They are as shown below:
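If scraping is not an option, the quartet is small enough to type in directly. The sketch below is not the original notebook: it builds the table from Anscombe's published values, and the variable names (`x123`, `quartet`) and the column names `x1` to `y4` are our own choices for illustration.

```python
import pandas as pd

# Anscombe's published values; datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

# Column names x1..y4 are an assumption made for this sketch.
quartet = pd.DataFrame(
    {
        "x1": x123,
        "y1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
        "x2": x123,
        "y2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
        "x3": x123,
        "y3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
        "x4": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        "y4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    },
    dtype=float,
)

print(quartet.shape)   # rows = observations, columns = variables
print(quartet.head())  # first few observations of each dataset
```

Because the values are entered as numbers, this table is already numeric, unlike a scraped table whose cells arrive as text.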

Next, we will check the datatype of the data before going further with our analysis.

[Image: Python notebook code to check the data types]

Since all the columns have the object datatype, and the first row of the data contains the strings ‘x’ and ‘y’, we will drop the first row and then convert the columns to float. The implementation and output are as shown below:

[Image: Python notebook code]

[Image: Python notebook output]
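As a rough sketch of this cleaning step, assume the scraped table looks like the small stand-in below: every cell is text and the first row repeats the ‘x’ and ‘y’ labels. The frame `raw`, its column names, and the three data rows are hypothetical, chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: all cells are strings and
# the first row holds the 'x'/'y' labels. Column names are assumptions.
raw = pd.DataFrame(
    [
        ["x", "y", "x", "y"],
        ["10.0", "8.04", "10.0", "9.14"],
        ["8.0", "6.95", "8.0", "8.14"],
        ["13.0", "7.58", "13.0", "8.74"],
    ],
    columns=["I", "I.1", "II", "II.1"],
)
print(raw.dtypes)  # non-numeric (object or string) columns

# Drop the label row, renumber the index, and convert every column to float.
clean = raw.drop(index=0).reset_index(drop=True).astype(float)
print(clean.dtypes)
```

`astype(float)` raises an error if any remaining cell is not a valid number, which is a useful check that the label row really was the only non-numeric row.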

For ease of analysis, we will split the data into two datasets, X and y, where X contains each x variable as a column and y contains each y variable as a column. X and y are as shown below:

[Image: Python notebook code to split the data]

[Image: Python notebook code to process the data]
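One way to do the split, assuming the x and y columns alternate as they do in the scraped table, is to slice by position. The frame `clean` below is a hypothetical two-row stand-in, and the new column names `x1`, `x2`, `y1`, `y2` are our own.

```python
import pandas as pd

# Hypothetical cleaned table with alternating x and y columns.
clean = pd.DataFrame(
    [[10.0, 8.04, 10.0, 9.14], [8.0, 6.95, 8.0, 8.14]],
    columns=["I", "I.1", "II", "II.1"],
)

# Even positions hold the x variables, odd positions the y variables.
X = clean.iloc[:, 0::2].copy()
y = clean.iloc[:, 1::2].copy()

# Give the columns matching names so X["x1"] pairs with y["y1"], and so on.
X.columns = [f"x{i}" for i in range(1, X.shape[1] + 1)]
y.columns = [f"y{i}" for i in range(1, y.shape[1] + 1)]

print(X)
print(y)
```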

The descriptive statistics considered when working with Anscombe’s quartet are:

Mean of each column of X and of y.
Sample standard deviation (or variance) of each column of X and of y.
Correlation between each column of X and the corresponding column of y.
The regression coefficient (slope) and intercept obtained by fitting a regression line to each column of X and the corresponding column of y.
The coefficient of determination (R-squared) of the regression line fitted to each column of X and the corresponding column of y.

We computed these statistics in Python, and they are shown below:

[Image: Python notebook code to compute the statistics]

[Image: Python notebook output showing the statistics]
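The statistics listed above can be reproduced with a few NumPy calls. This is a minimal sketch rather than the original notebook: the helper `describe` and the dictionaries `quartet` and `summary` are names we made up, and the data is typed in from Anscombe's published values.

```python
import numpy as np

# Anscombe's published values; datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def describe(x, y):
    """Return the summary statistics for one (x, y) pair (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
    r = np.corrcoef(x, y)[0, 1]                 # Pearson correlation
    return {
        "mean_x": x.mean(),
        "mean_y": y.mean(),
        "var_x": x.var(ddof=1),  # ddof=1 gives the sample variance
        "var_y": y.var(ddof=1),
        "corr": r,
        "slope": slope,
        "intercept": intercept,
        "r_squared": r ** 2,     # equals R-squared for a one-predictor line
    }


summary = {name: describe(x, y) for name, (x, y) in quartet.items()}
for name, stats in summary.items():
    print(name, {k: round(float(v), 2) for k, v in stats.items()})
```

Rounding to two decimals in the printout is deliberate: the four datasets differ only in the third decimal place or beyond for most of these quantities.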

From the descriptive statistics, we see that the four datasets are nearly identical.

Data Visualization

However, when we visualize the data, we see a different story.

[Image: Python notebook code to plot the data]

[Image: Plots of the four datasets in Python]
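A plot along these lines can be sketched with Matplotlib as follows. The function `plot_quartet`, the 2x2 layout, and the styling are our assumptions, not the original notebook's code; the data is again typed in from Anscombe's published values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Anscombe's published values; datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def plot_quartet(data):
    """Scatter each dataset with its least-squares line (hypothetical helper)."""
    fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
    line_x = np.array([3.0, 20.0])  # common x range covering all four datasets
    for ax, (name, (x, y)) in zip(axes.flat, data.items()):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        slope, intercept = np.polyfit(x, y, deg=1)
        ax.scatter(x, y)
        ax.plot(line_x, slope * line_x + intercept, color="red")
        ax.set_title(f"Dataset {name}")
    fig.tight_layout()
    return fig


fig = plot_quartet(quartet)
# Call plt.show() to display the figure, or fig.savefig(...) to write it to a file.
```

Sharing the axes across the four panels makes the fitted lines directly comparable: they are almost the same line, while the point clouds around them are clearly not alike.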

It turns out that although these datasets have the same descriptive statistics, they are distributed differently and show different graphical patterns.

Fitting models to data without visualizing it first can lead to heavy bias, as we can see in the plots above. For example, x2 and y2 would be better modelled with a polynomial curve than with a straight line.
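To make the point about x2 and y2 concrete, the sketch below compares a straight line with a degree-2 polynomial on the second dataset. The choice of degree 2 and the helper `r_squared` are our own assumptions for illustration; the data is typed in from Anscombe's published values.

```python
import numpy as np

# Dataset II of Anscombe's quartet (published values).
x2 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])


def r_squared(y, fitted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Fit both models by least squares and evaluate them on the observed x values.
linear_fit = np.polyval(np.polyfit(x2, y2, deg=1), x2)
quadratic_fit = np.polyval(np.polyfit(x2, y2, deg=2), x2)

r2_linear = r_squared(y2, linear_fit)
r2_quadratic = r_squared(y2, quadratic_fit)
print(f"linear R^2:    {r2_linear:.3f}")
print(f"quadratic R^2: {r2_quadratic:.3f}")
```

The straight line explains only about two thirds of the variance, while the quadratic fits these points almost exactly, which is what the curved shape in the plot suggests.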

Conclusion

Descriptive statistics are very important, but they cannot by themselves tell the whole story about the data. Data visualization is equally important and should be done before modelling. It may give us a clue about the kind of model to use, so that we don’t build a biased model, and it may also reveal outliers or influential points in the data.


