Exploratory Data Analysis (EDA) With IBM SPSS
Exploratory data analysis (EDA), as the name implies, is an approach to data analysis in which you explore your data for information that gives you insight into its content. It is widely used across many fields and on all types of data. EDA can help determine whether the statistical approaches we are considering are suitable for the data. It provides various graphical and numerical summaries, either for all cases or separately for subsets of cases.
The grouping variables might be ordinal or nominal, but the dependent variable must be a scale variable.
We shall investigate whether corn complies with the aflatoxin standard. The threshold value for aflatoxin in corn is 20 parts per billion (PPB) according to United States law. Corn with an aflatoxin level above this threshold is deemed unsuitable for human consumption. Eight crop harvests have been sent to a grain processor, but before they can be processed, the PPB distribution of aflatoxin must be evaluated.
The data used for this analysis can be found at aflatoxin_data. The data consists of 16 samples from each of the eight (8) crop yields.
To start the analysis, choose the following from the menus: Analyze => Descriptive Statistics => Explore.
As shown in Figure 2, choose Aflatoxin PPB as the dependent variable and Corn Yield as the factor variable, then select OK. The Explore option provides many analyses of the data from which you can infer key information. Run “Descriptives…” as well and compare its results with those from Explore to see the difference!
We have selected “Dependents together” because it enables us to visualize the aflatoxin (PPB) for each of the corn yields.
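For readers who want to see what a dependent-by-factor summary amounts to outside SPSS, here is a minimal Python sketch using only the standard library. It is not the Explore procedure itself and covers only a few of the statistics Explore reports. The records, the yield numbers, and the function name `describe_by_group` are all hypothetical, made up for illustration; they are not taken from aflatoxin_data.

```python
from statistics import mean, median, stdev

# Hypothetical (corn_yield, aflatoxin_ppb) records, made up for illustration.
records = [
    (1, 22.0), (1, 25.0), (1, 28.0), (1, 21.0),
    (4, 18.0), (4, 20.0), (4, 19.0), (4, 23.0),
    (8, 12.0), (8, 15.0), (8, 9.0), (8, 16.0),
]

def describe_by_group(rows):
    """Group the scale values by factor level and summarise each group."""
    groups = {}
    for level, value in rows:
        groups.setdefault(level, []).append(value)

    summary = {}
    for level, values in sorted(groups.items()):
        summary[level] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "std_dev": stdev(values),  # sample standard deviation (n - 1)
            "min": min(values),
            "max": max(values),
        }
    return summary

summary = describe_by_group(records)
for level, stats in summary.items():
    print(f"Corn yield {level}: {stats}")
```

The factor (corn yield) only decides which group a value falls into, while every statistic is computed on the scale variable (aflatoxin PPB), which is why the dependent variable has to be a scale variable.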
We can reposition the Descriptives table to show the desired data if we wish to analyze how the mean of Aflatoxin PPB changes with Corn Yield. Double-click the Descriptives table to make it active in the output window.
The statistics shown in Figure 3 are estimated for all the corn yields, but only corn yield 1 is shown in the cropped figure. This is to save space. Perform the analysis yourself to see the results for the rest of the corn yields.
According to the U.S. regulatory standard, the aflatoxin threshold for both human food and animal feed is 20 PPB (USDA). From Figure 3 (box plot), some corn yields (1, 2, 3, 5 and 6) are above the threshold, while others (4, 7 and 8) are within or below it. If you then want to carry out further analysis on the corn yields with aflatoxin above 20 PPB, you can easily identify them. EDA is therefore a very handy and time-saving first approach to data analysis.
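The comparison against the threshold can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings and the labels `yield_a`, `yield_b` and `yield_c` are made up, and the three-way rule in `classify` (all samples above, all samples below, or straddling the limit) is one possible way to formalise "above", "below" and "within"; it is not a rule taken from SPSS or from the regulation.

```python
THRESHOLD_PPB = 20.0  # limit quoted in the text

# Hypothetical readings per yield, made up for illustration.
readings = {
    "yield_a": [22.0, 25.0, 28.0, 21.0],
    "yield_b": [18.0, 20.0, 19.0, 23.0],
    "yield_c": [12.0, 15.0, 9.0, 16.0],
}

def classify(values, threshold=THRESHOLD_PPB):
    """Label a yield by where its samples sit relative to the threshold."""
    if min(values) > threshold:
        return "above"       # every sample exceeds the limit
    if max(values) < threshold:
        return "below"       # every sample is under the limit
    return "straddles"       # samples fall on both sides of the limit

def share_above(values, threshold=THRESHOLD_PPB):
    """Fraction of samples strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)

labels = {name: classify(vals) for name, vals in readings.items()}
shares = {name: share_above(vals) for name, vals in readings.items()}

for name in readings:
    print(name, labels[name], f"{shares[name]:.0%} of samples above limit")
```

A yield that straddles the limit is the kind of case a box plot makes visible at a glance and a single mean would hide.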
This blog introduces EDA as a key approach to discovering information about your data that can help you identify cases for further analysis. In this blog, we have identified the corn yields with aflatoxin above, within, and below the threshold for human food and animal feed.
Boxplots give some information about the shape of the distributions, and the Explore procedure has several further options that allow a more in-depth examination of how groups differ from one another or from expectations.
If you want to know whether the differences in aflatoxin between the corn yields are statistically significant, I recommend you read the blog on “Compare Means”. It shows you how to perform a statistically well-established comparative analysis.
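To make concrete what a boxplot summarises, the sketch below computes the quartiles, the whisker ends, and the points beyond 1.5 box lengths. The sample values and the function name `boxplot_stats` are hypothetical. It also assumes the "inclusive" quartile method of Python's `statistics.quantiles`; SPSS may use a different quartile definition, so its figures can differ slightly from these.

```python
from statistics import quantiles

def boxplot_stats(values):
    """Quartiles, whisker ends, and points beyond 1.5 box lengths."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # box length
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),    # lowest value inside the fences
        "whisker_high": max(inside),   # highest value inside the fences
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical aflatoxin readings (PPB) for one yield, made up for illustration.
sample = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 60.0]
stats = boxplot_stats(sample)
print(stats)
```

A long upper whisker or outliers on one side only suggest a skewed distribution, which is the kind of shape information the boxplot conveys.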
EDA is a good start for a blind analysis!
Related blogs: Factor Analysis, Ratio Analysis