6 Visualization with plotnine in Python
Visualization is a key part of statistical analyses, especially in systems engineering. In this tutorial, we’ll learn to visualize data with plotnine (the Python port of ggplot2).
Please follow along using the code below!
Getting Started
Gapminder data
## country continent year lifeExp pop gdpPercap
## 0 Afghanistan Asia 1952 28.801 8425333 779.445314
## 1 Afghanistan Asia 1957 30.332 9240934 820.853030
## 2 Afghanistan Asia 1962 31.997 10267083 853.100710
## 3 Afghanistan Asia 1967 34.020 11537966 836.197138
## 4 Afghanistan Asia 1972 36.088 13079460 739.981106
## ... ... ... ... ... ... ...
## 1699 Zimbabwe Africa 1987 62.351 9216418 706.157306
## 1700 Zimbabwe Africa 1992 60.377 10704340 693.420786
## 1701 Zimbabwe Africa 1997 46.809 11404948 792.449960
## 1702 Zimbabwe Africa 2002 39.989 11926563 672.038623
## 1703 Zimbabwe Africa 2007 43.487 12311143 469.709298
##
## [1704 rows x 6 columns]
## (country object
## continent object
## year int64
## lifeExp float64
## pop int64
## gdpPercap float64
## dtype: object, (1704, 6))
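The code that produced the output above is not shown. As a minimal sketch, the ten rows printed above can be rebuilt by hand with pandas, so the plots in this chapter can be tried on a small excerpt. The full 1,704-row table is assumed to be loaded separately (the `gapminder` package on PyPI is one possible source, but that is an assumption, not something this chapter confirms).

```python
import pandas as pd

# Hand-typed excerpt: the first five and last five rows printed above.
# The full dataset has 1704 rows; this excerpt has only 10.
gapminder = pd.DataFrame({
    'country': ['Afghanistan'] * 5 + ['Zimbabwe'] * 5,
    'continent': ['Asia'] * 5 + ['Africa'] * 5,
    'year': [1952, 1957, 1962, 1967, 1972, 1987, 1992, 1997, 2002, 2007],
    'lifeExp': [28.801, 30.332, 31.997, 34.020, 36.088,
                62.351, 60.377, 46.809, 39.989, 43.487],
    'pop': [8425333, 9240934, 10267083, 11537966, 13079460,
            9216418, 10704340, 11404948, 11926563, 12311143],
    'gdpPercap': [779.445314, 820.853030, 853.100710, 836.197138, 739.981106,
                  706.157306, 693.420786, 792.449960, 672.038623, 469.709298],
})

# Column types and (rows, columns) as a tuple, like the output above
summary = (gapminder.dtypes, gapminder.shape)
print(gapminder)
print(summary)
```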
6.1 Your first scatterplot
## <plotnine.ggplot.ggplot object at 0x000001EC7B182BA0>
Add points with + geom_point().
## <plotnine.ggplot.ggplot object at 0x000001EC7A03D130>
Learning Check 1
Question
What kind of relationship does this graph show? Why might it matter to policymakers?
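Answer
As wealth per person (GDP per capita) increases, life expectancy rises quickly, then tapers off. This shows a strong but nonlinear relationship between wealth and health. For policymakers, it suggests that the largest gains in life expectancy are associated with income growth in poorer countries.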
6.2 Transparency (alpha)
## <plotnine.ggplot.ggplot object at 0x000001EC7B1834A0>
## <plotnine.ggplot.ggplot object at 0x000001EC7B1B42C0>
## <plotnine.ggplot.ggplot object at 0x000001EC7B1B6DE0>
Learning Check 2
Question
What happens when you change alpha across the three visuals above?
Answer
alpha controls transparency on a scale from 0 to 1. Higher values are more opaque; lower values are more transparent.
6.3 Color: constant vs mapped
# Single color
(ggplot(gapminder, aes(x='gdpPercap', y='lifeExp')) +
 geom_point(alpha=0.5, color='steelblue'))
## <plotnine.ggplot.ggplot object at 0x000001EC7B19AE40>
# Color mapped by continent
(ggplot(gapminder, aes(x='gdpPercap', y='lifeExp', color='continent')) +
 geom_point(alpha=0.5))
## <plotnine.ggplot.ggplot object at 0x000001EC7B1DDB20>
Learning Check 3
Question
Where do you place color for a single color vs. multiple colors based on a variable?
Answer
Single color: set color inside geom_point(color='...'), outside aes(). Mapped colors: set color inside aes(color='variable').
6.4 Improving our visualizations
# Add axis labels, a legend title, a title, a subtitle, and a caption
(ggplot(gapminder, aes(x='gdpPercap', y='lifeExp', color='continent')) +
 geom_point(alpha=0.5) +
 labs(x='GDP per capita (USD)',
      y='Life Expectancy (years)',
      color='Continent',
      title='Does Wealth affect Health?',
      subtitle='Global Health Trends by Continent',
      caption='Points display individual country-year observations.'))
## <plotnine.ggplot.ggplot object at 0x000001EC7A8CBB30>
You can save visuals as objects to reuse them.
# Store the plot in a variable instead of displaying it right away
myviz = (ggplot(gapminder, aes(x='gdpPercap', y='lifeExp', color='continent')) +
         geom_point(alpha=0.5) +
         labs(x='GDP per capita (USD)', y='Life Expectancy (years)', color='Continent',
              title='Does Wealth affect Health?', subtitle='Global Health Trends by Continent',
              caption='Points display individual country-year observations.'))
# Evaluating the name displays the stored plot
myviz
## <plotnine.ggplot.ggplot object at 0x000001EC7AF92870>
## <plotnine.ggplot.ggplot object at 0x000001EC7A9AA0C0>
## <plotnine.ggplot.ggplot object at 0x000001EC7A94E8A0>
## <plotnine.ggplot.ggplot object at 0x000001EC7B196930>
6.5 Visualizing diamonds data
## carat cut color clarity depth table price x y z
## 0 0.23 Ideal E SI2 61.5 55.0 326 3.95 3.98 2.43
## 1 0.21 Premium E SI1 59.8 61.0 326 3.89 3.84 2.31
## 2 0.23 Good E VS1 56.9 65.0 327 4.05 4.07 2.31
## (carat float64
## cut category
## color category
## clarity category
## depth float64
## table float64
## price int64
## x float64
## y float64
## z float64
## dtype: object, (53940, 10))
Learning Check 4
Question
Why do the two boxplot versions look different? What changed in the code to create those effects?
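Answer
The first version sets a constant fill with geom_boxplot(fill='steelblue'), so every box gets the same color. The second maps fill to a variable with aes(fill='cut'), so each cut gets its own color and a legend appears.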
6.6 Visualizing Distributions
# Histogram of price, with bars filled by cut and outlined in white
(ggplot(diamonds, aes(x='price', fill='cut')) +
 geom_histogram(color='white') +
 labs(x='Price (USD)', y='Frequency', title='US Diamond Sales'))
## <plotnine.ggplot.ggplot object at 0x000001EC7B196A20>
Learning Check 5
Question
Make a histogram of price with a narrower binwidth and apply a different theme. Which choices improve readability?
Answer
# binwidth=250 makes each bin 250 USD wide; theme_bw() uses a white background
(ggplot(diamonds, aes(x='price', fill='cut')) +
 geom_histogram(color='white', binwidth=250) +
 theme_bw() +
 labs(x='Price (USD)', y='Frequency'))
## <plotnine.ggplot.ggplot object at 0x000001EC7B1D5D00>