By Mike Loughrin, CEO for Transformance Advisors
Data Visualization Powerhouse
Scatter diagrams (AKA scatter plots) are one of my favorite data visualization tools. They can quickly show a correlation between two sets of data. Research ranks the scatter diagram as the best chart when it comes to people perceiving information quickly and accurately.
Revealing how two sets of data are correlated can often be the central point in a report or proposal. Making a “call to action” based upon easy-to-understand charts is the mission of many data visualization wizards.
However, scatter diagrams will not work for everything. A great diagram will “pop” off the screen and immediately announce there is something important to see. Alternatively, working with the wrong data will be frustrating and lead to a meaningless diagram which confuses people.
Let’s explore what it takes to create a great scatter diagram and what to avoid.
“The greatest value of a picture is when it forces us to notice what we never expected to see.”
– John W. Tukey
Great Scatter Diagrams
A great scatter diagram starts with two sets of data which you suspect have a connection (correlation) to each other. There are three types of correlation:
- A positive correlation occurs when both sets of data move in the same direction. Height and weight are an excellent example and almost always have a positive correlation. Taller people tend to weigh more than those who stay closer to the ground. Of course, there are exceptions and the correlation is not 100%. That’s the reason you will see the data points “scatter” over the chart. When a positive correlation exists, you will see the dots on a scatter diagram slope upward from left to right.
- A negative correlation occurs when the sets of data move in opposite directions. For example, interest rates and new car sales will almost always move in opposite directions. Higher interest rates tend to go with lower new car sales. When a negative correlation exists, you will see the dots on a scatter diagram slope downward from left to right.
- No correlation occurs when the sets of data have no connection to each other. For example, the temperature in Amsterdam will probably not relate to the score of the Packers vs. da Bears game in Chicago. If you made a scatter diagram of the last 50 games, it would show no relationship and simply look like random dots on the chart.
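The three types of correlation can also be checked with a number before you draw anything. The sketch below computes the Pearson correlation coefficient, a common measure which runs from -1 (perfect negative) through 0 (none) to +1 (perfect positive). The function name `pearson_r` and all of the sample numbers are made up for illustration. They are not real measurements.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two sets of data move together, and how much each moves alone.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one set never changes")
    return co_movement / sqrt(spread_x * spread_y)


# Made-up illustrative data, NOT real measurements.
heights_cm = [155, 160, 165, 170, 175, 180, 185, 190]
weights_kg = [52, 58, 63, 61, 72, 78, 80, 91]        # tends to rise with height

rates_pct = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
car_sales = [98, 95, 90, 92, 84, 80, 79, 70]          # tends to fall as rates rise

temps_c = [2, 4, 6, 8, 10, 12, 14, 16]
game_points = [24, 10, 31, 17, 17, 31, 10, 24]        # no link to temperature

r_positive = pearson_r(heights_cm, weights_kg)
r_negative = pearson_r(rates_pct, car_sales)
r_none = pearson_r(temps_c, game_points)

print(f"height vs. weight:       r = {r_positive:+.2f}")
print(f"interest rate vs. sales: r = {r_negative:+.2f}")
print(f"temperature vs. points:  r = {r_none:+.2f}")
```

A value near +1 matches dots sloping upward, a value near -1 matches dots sloping downward, and a value near 0 matches random dots on the chart.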
Selecting two sets of data for a great scatter diagram is easier said than done. Possibilities besides those mentioned above include:
- Years of experience vs. salary
- Amount of snow vs. time to clear the streets
- Covid-19 vaccinations vs. Covid-19 related deaths
- Investments in mental health services vs. crime rate
- Time spent studying vs. correct answers on the final exam
A great scatter diagram will quickly show if a correlation exists and how strong the correlation is. However, do not leap from correlation to causation. A scatter diagram can show two things appear to be correlated. It does not prove one causes the other.
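If you want to put words to “how strong,” one option is to label the correlation coefficient (a value between -1 and +1). The sketch below is only a rule of thumb. The function name `describe_correlation` and the cutoffs of 0.3 and 0.7 are assumptions for illustration, and different fields draw these lines in different places.

```python
def describe_correlation(r, strong=0.7, weak=0.3):
    """Turn a correlation coefficient into a plain-language label.

    The cutoffs are illustrative assumptions, not a standard.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must be between -1 and +1")
    size = abs(r)
    if size < weak:
        return "little or no correlation"
    strength = "strong" if size >= strong else "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


for r in (0.95, -0.5, 0.1):
    print(r, "->", describe_correlation(r))
```

Note the label says nothing about cause. A strong correlation still only shows two things move together.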
One final point is how a scatter diagram can also be “great” if it demonstrates there is no correlation where people have assumed one exists. If you can disprove a false belief, then you are on the road to becoming a data visualization wizard.
“Great charts enable, inform, and improve decision making.”
– Dante Vitagliano
Frustrating Scatter Diagrams
I have found most articles about scatter diagrams give a few examples of data which show a correlation. However, they all seem to skip any discussion of what not to do. My experience and research reveal you should avoid the following:
- Time: If one set of your data is date or time, then you probably want to make a run chart or a control chart. Showing data over time is not leveraging the power of a scatter diagram. A run chart can show a trend over time. This trend is not the same thing as correlation.
- Descriptions: You can’t do much if one set of your data is names such as California, Colorado, New York, and Texas. These descriptions are not “continuous” data and don’t work well on a scatter diagram. Some things like small, medium, and large could be converted to 16 ounces, 21 ounces, and 32 ounces. (A nod to my days at Sweetheart Cup.) But for most descriptive data, look at visual charts like maps or pie charts.
- Buckets: Be careful if one set of data is in buckets like months, grades, rankings, or other categories meant to consolidate or group the data. These buckets will often be more powerfully displayed on a histogram. If your data is begging for a histogram, then don’t waste your time on a scatter diagram.
- No correlation: Showing a scatter diagram with no correlation will generally not advance the story you want to tell. Your viewers will wonder why you are wasting their time. They will question your claim of being a data visualization wizard. The one exception, as mentioned above, is when you know your audience thinks a correlation exists and you can clear the fog with a scatter diagram showing two things have no correlation.
Frustration is sure to follow if you force the wrong data onto a scatter diagram. Additional possibilities, besides those listed above, include:
- Precipitation by month (use a histogram)
- Customer service measurements over time (use a run chart)
- Percentage of sales for your top 3 customers (try a pie chart)
- The planets and their distance from the sun (use a visual map)
- Your salary vs. the age of your neighbor’s dog (no correlation)
In general, know your data and the story you want to tell. Different types of charts have their unique strengths and weaknesses. Don’t force your data onto a scatter diagram when another type of chart will do a better job.
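The guidelines above can be sketched as a small lookup. This is a simplified illustration, not a complete chart-selection method. The function name `suggest_chart` and the four labels (“continuous”, “time”, “category”, “bucket”) are hypothetical names chosen for this example.

```python
VALID_KINDS = {"continuous", "time", "category", "bucket"}


def suggest_chart(x_kind, y_kind="continuous"):
    """Suggest a chart type from the kind of data in each of two sets."""
    kinds = {x_kind, y_kind}
    unknown = kinds - VALID_KINDS
    if unknown:
        raise ValueError(f"unknown kind of data: {sorted(unknown)}")
    if "time" in kinds:
        # Dates or times call for a trend over time, not a correlation.
        return "run chart or control chart"
    if "category" in kinds:
        # Names and descriptions are not continuous data.
        return "map or pie chart"
    if "bucket" in kinds:
        # Grouped data such as months, grades, or rankings.
        return "histogram"
    # Two continuous measurements: the scatter diagram's home turf.
    return "scatter diagram"


print(suggest_chart("continuous", "continuous"))  # years of experience vs. salary
print(suggest_chart("time"))                      # customer service over time
print(suggest_chart("bucket"))                    # precipitation by month
print(suggest_chart("category"))                  # sales by state
```

The lookup only checks the kind of data. You still need to judge whether a correlation is worth showing.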
“I have not failed. I’ve just found 10,000 ways that won’t work.”
– Thomas A. Edison
Summary
Scatter diagrams are a powerful tool used by data visualization wizards. The best ones will “pop” off the screen and quickly reveal when positive, negative, or no correlation exists. Your viewers should exclaim “wow” at what they see.
As with life, there are right ways and wrong ways to craft a scatter diagram.
The right way will reveal that a positive or negative correlation exists, or reveal that no correlation exists when tribal knowledge believes differently.
The wrong way will lead to frustration for both the creator of the chart and the victim who finds no value in the visualization.
Those with the courage and perseverance to master the creation of great scatter diagrams will find their data visualization stories come to life.
“Life is like a scatter plot with millions of data values and no correlation.”
– Dulan Gamage
Mike Loughrin is the CEO and Founder of Transformance Advisors. He also teaches for Louisiana State University Shreveport and is on the board of directors for the Association for Supply Chain Management Northern Colorado.
Mike brings exceptional experience in industry, consulting services, and education. Mike has helped organizations such as Levi Strauss, Warner Home Video, Lexmark, and Sweetheart Cup.
Keeping a commitment to a balanced life, Mike loves downhill skiing, bicycle rides, and hiking in the mountains. See one of his trails of the month at: Little Switzerland.