This article continues the discussion of skewed data from the previous article, Skewed Data on a Bar Chart. Skewness is more common in numeric data, so you are more likely to run into it on a scatterplot than on a bar chart. One of the classic examples is the relationship between GDP (Gross Domestic Product, which measures a territory’s income) and population. If we plot each territory's GDP against its population in absolute terms, the observations are heavily skewed toward one corner, like this:

Territories’ GDP and Population in Absolute Terms

Unfortunately, most of the territories are clustered in the lower-left corner, so the scatterplot does not clearly show the relationship between GDP and population.

As the previous article suggested, you can take the logarithm of the data. If you are visualizing the data with Python and Plotly, you don’t need to take the logarithm of each data point manually. Instead, set the axis type to log in the layout settings, like below:

import plotly.graph_objects as go

# df is assumed to be a pandas DataFrame that already holds the columns used below
data = []
data.append(go.Scatter(x=df['Population'],   # No need to take log
                       y=df['Nominal_GDP'],  # No need to take log
                       marker_color=df['color'],
                       text=df['Territory'],
                       hoverinfo='text',
                       mode='markers'))

layout = {'title': {'text': "Nations' GDP vs Population", 'x': 0.5},
          'xaxis': {'gridcolor': 'lightgray',
                    'type': 'log'  # Add this parameter to use a log scale on the x-axis
                    },
          'yaxis': {'gridcolor': 'lightgray',
                    'type': 'log'  # Add this parameter to use a log scale on the y-axis
                    },
          'plot_bgcolor': 'rgba(0,0,0,0)'}

fig = go.Figure(data=data, layout=layout)

Once the log axis types are set in the layout, Plotly generates the scatterplot below:

Territories’ GDP and Population after Taking Logarithm

Now the chart is more readable because the observations are no longer clustered, and it shows the upward-sloping relationship between GDP and population more clearly.
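To get a rough feel for why the log scale declusters the points, the sketch below measures how many observations land in the lowest tenth of the axis range before and after taking a logarithm. It uses synthetic right-skewed numbers drawn with Python's `random.lognormvariate` as a stand-in for population figures. The values, the parameters and the helper `share_in_bottom_tenth` are assumptions made up for this illustration, not the article's data.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic, right-skewed stand-in for population figures (NOT real data).
values = [random.lognormvariate(16, 2) for _ in range(200)]

def share_in_bottom_tenth(points):
    """Fraction of points that fall in the lowest 10% of the axis range."""
    lo, hi = min(points), max(points)
    cutoff = lo + 0.1 * (hi - lo)
    return sum(p <= cutoff for p in points) / len(points)

linear_share = share_in_bottom_tenth(values)
log_share = share_in_bottom_tenth([math.log10(v) for v in values])

print(f"linear axis: {linear_share:.0%} of points in the bottom tenth")
print(f"log axis:    {log_share:.0%} of points in the bottom tenth")
```

On a linear axis a handful of very large values stretch the range, so most points are squeezed near the origin. On a log axis the same points spread across the range.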

The scripts for generating these scatterplots can be found on my GitHub:

DataViz_notes/skewed_scatterplot at main · jacquessham/DataViz_notes

My LinkedIn:

https://www.linkedin.com/in/jacquessham/

Skewed Data on a Scatterplot was originally published in Level Up Coding on Medium.

