Crafting Data Visualizations for Medium Stories with Matplotlib
Chapter 1: Introduction to Data Visualization
In this guide, I will walk you through creating the data visualization displayed above. The tutorial is fast-paced and light on detail, which should suit a busy schedule. The visualization is somewhat complex, with sub-plots and a number of reusable functions. That may seem daunting if you are less familiar with Matplotlib, but don't worry: by the end, you’ll be able to replicate these visualizations without altering any code. Ready to dive in? Let’s begin!
Step 1: Import Required Libraries
To get started, you’ll need to import the following libraries:
import requests
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from PIL import Image
from matplotlib.transforms import blended_transform_factory
Step 2: Fetch the Data
I have sourced a dataset from one of my articles for this tutorial. You can access it using this code:
data = requests.get(
).json()
If you prefer to work with your own data, check out this tutorial on extracting and preparing story data from Medium.
Step 3: Set a Seaborn Style
I always start by establishing a style with Seaborn to enhance the aesthetics of my charts. Feel free to customize the style as you see fit. Here are my specific settings for this visualization:
font_family = "Work Sans"
background_color = "#302C32"
grid_color = "#F4CAB9"
text_color = "#ffffff"
edgecolor = "#01110A"
sns.set_style({
    "axes.facecolor": background_color + "00",
    "figure.facecolor": background_color,
    "axes.edgecolor": text_color,
    "axes.grid": False,
    "axes.axisbelow": True,
    "grid.color": grid_color,
    "text.color": text_color,
    "font.family": font_family,
    "xtick.color": text_color,
    "ytick.color": text_color,
    "xtick.bottom": False,
    "xtick.top": False,
    "ytick.left": False,
    "ytick.right": False,
    "axes.spines.left": False,
    "axes.spines.bottom": False,
    "axes.spines.right": False,
    "axes.spines.top": False,
})
My aim was to produce a dark and minimalist chart that is visually appealing.
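The style settings add transparency by appending two hex digits to a color string, as in background_color + "00". Matplotlib accepts colors in "#RRGGBBAA" form, so this works. Here is a minimal sketch of a helper that builds the same strings from a 0 to 1 alpha value. The name with_alpha is hypothetical and not part of the original code.

```python
def with_alpha(hex_color, alpha):
    """Append an alpha channel (0.0 to 1.0) to a '#RRGGBB' color string."""
    if len(hex_color) != 7 or not hex_color.startswith("#"):
        raise ValueError("expected a color like '#RRGGBB'")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Scale alpha to 0..255 and format it as two uppercase hex digits.
    return "{}{:02X}".format(hex_color, round(alpha * 255))


background_color = "#302C32"
text_color = "#ffffff"

# Fully transparent axes background, equivalent to background_color + "00".
transparent_axes = with_alpha(background_color, 0.0)
# Faint grid line color, equivalent to text_color + "22".
faint_grid = with_alpha(text_color, 0x22 / 255)
```

Whether a helper like this is worth it is a matter of taste. Plain string concatenation is shorter, while the helper makes the intended opacity explicit.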
Step 4: Define Helper Functions
I’ve created some utility functions for later use. The first two functions convert the Matplotlib figure into a PIL image, making it simpler to manage padding and combine multiple charts into one visualization. The last function generates a list of dates for visualizing my story data:
def create_image_from_figure(fig):
    # Render the figure and copy its pixels into a PIL image.
    fig.tight_layout()
    fig.canvas.draw()
    # buffer_rgba() replaces tostring_rgb(), which was deprecated in Matplotlib 3.8 and later removed.
    data = np.asarray(fig.canvas.buffer_rgba())
    image = Image.fromarray(data).convert("RGB")
    plt.close(fig)
    return image

def add_padding_to_chart(chart, left, top, right, bottom, background):
    # Paste the chart onto a larger canvas filled with the background color.
    size = chart.size
    image = Image.new("RGB", (size[0] + left + right, size[1] + top + bottom), background)
    image.paste(chart, (left, top))
    return image

def get_dates(story_data):
    # Every day from the first of the earliest month to the latest date in the data.
    start = pd.to_datetime(min(story_data.keys())).replace(day=1)
    end = pd.to_datetime(max(story_data.keys()))
    delta = end - start
    date_list = [(start + pd.Timedelta(days=i)).strftime('%Y-%m-%d') for i in range(delta.days + 1)]
    return date_list
While Matplotlib offers functionality for padding, I often find it tricky to navigate! 😅
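To make the behavior of get_dates() concrete, here is a small sketch of the same date-range logic using only the standard library. It assumes the keys of the story data are 'YYYY-MM-DD' strings, which is what the original function appears to rely on. The name get_dates_stdlib and the sample data are hypothetical.

```python
from datetime import date, timedelta


def get_dates_stdlib(story_data):
    """Return every 'YYYY-MM-DD' day from the 1st of the earliest month to the latest key."""
    start = date.fromisoformat(min(story_data)).replace(day=1)
    end = date.fromisoformat(max(story_data))
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


# Hypothetical story data: only the keys matter here.
sample = {"2023-01-30": {}, "2023-02-02": {}}
dates = get_dates_stdlib(sample)
print(dates[0], dates[-1], len(dates))  # 2023-01-01 2023-02-02 33
```

Note that the range starts on the first day of the earliest month, not on the earliest date itself, so the first monthly subplot always covers a full month.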
Step 5: Create Data Functions
Next, I've constructed data functions to efficiently extract subsets of data for visualization. I plot one month at a time in sub-plots and have made these functions reusable.
def list_earnings(dates, stats):
    # One value per date; days without data count as 0.
    result = []
    for d in dates:
        if d not in stats.keys():
            result.append(0)
        else:
            result.append(stats[d].get("earning", 0) / 100)
    return result

def list_statistic(dates, stats, readers, statistic):
    # Sum one statistic over the given reader types for each date.
    result = []
    for d in dates:
        if d not in stats.keys():
            result.append(0)
        else:
            value = sum(stats[d][reader][statistic] for reader in readers)
            result.append(value)
    return result

def list_total(dates, stats, field):
    return list_statistic(dates, stats, ["member", "nonmember"], field)

def list_nonmember(dates, stats, field):
    return list_statistic(dates, stats, ["nonmember"], field)

def list_member(dates, stats, field):
    return list_statistic(dates, stats, ["member"], field)
I prioritize writing readable code. Named helpers like list_total make the intent clearer than repeating the same sum() call inline.
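The data functions imply a particular shape for the story data: a dictionary keyed by date, with an "earning" value and per-reader-type statistics. The sketch below runs the same logic on a tiny sample. The sample values are made up, and the structure is inferred from the code rather than from any documented format.

```python
def list_statistic(dates, stats, readers, statistic):
    # Same logic as the tutorial's function: 0 for days with no entry,
    # otherwise the sum of the statistic over the given reader types.
    result = []
    for d in dates:
        if d not in stats:
            result.append(0)
        else:
            result.append(sum(stats[d][reader][statistic] for reader in readers))
    return result


# Hypothetical sample in the shape the functions appear to expect.
sample_stats = {
    "2023-01-01": {
        "earning": 250,
        "member": {"readersThatViewed": 10, "readersThatRead": 6},
        "nonmember": {"readersThatViewed": 30, "readersThatRead": 9},
    },
}
sample_dates = ["2023-01-01", "2023-01-02"]

total_views = list_statistic(
    sample_dates, sample_stats, ["member", "nonmember"], "readersThatViewed"
)
member_reads = list_statistic(sample_dates, sample_stats, ["member"], "readersThatRead")
# Earnings are divided by 100, as in list_earnings().
earnings = [sample_stats.get(d, {}).get("earning", 0) / 100 for d in sample_dates]

print(total_views, member_reads, earnings)  # [40, 0] [6, 0] [2.5, 0.0]
```

The second date has no entry in the sample, so every list gets a 0 for it. This is what keeps the bars aligned with the full list of dates.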
Step 6: Develop Plotting Functions
This step is the most complex, but with some experimentation you’ll grasp the concepts quickly. Earnings get their own plotting function because they are stored as a single value per day rather than split by member and non-member readers. The plot_grid_lines() function draws grid lines and y-axis labels that extend across multiple subplots.
def plot_earnings(ax, dates, stats):
    sns.barplot(
        ax=ax, x=dates, y=list_earnings(dates, stats),
        facecolor="#2EC4B6", edgecolor=edgecolor, saturation=1, width=1,
    )

def plot_bars(ax, dates, stats, settings):
    sns.barplot(
        ax=ax, x=dates, y=settings["function"](dates, stats, settings["field"]),
        facecolor=settings["color"], edgecolor=edgecolor, saturation=1, width=1,
    )

def write_month(ax, date):
    ax.annotate(
        pd.to_datetime(date).month_name(), (0.5, -0.04), ha="center", va="top", fontsize=32,
        annotation_clip=False, xycoords="axes fraction", fontweight=500
    )

def plot_grid_lines(fig, line_start=0.081, is_earnings=False):
    # x is given as a figure fraction, y in data units of the first subplot.
    transform = blended_transform_factory(fig.transFigure, fig.axes[0].transData)
    for y in fig.axes[0].get_yticks()[1:-1]:
        fig.axes[0].annotate(
            text="${:,}".format(int(y)) if is_earnings else "{:,}".format(int(y)),
            xy=(0.075, y), ha="right", va="center", fontsize=32, fontweight=500,
            annotation_clip=False, xycoords=("figure fraction", "data")
        )
        line = plt.Line2D([line_start, 1], [y, y], transform=transform, color=text_color + "22", zorder=-1)
        fig.lines.extend([line])
    # Solid baseline at y = 0, drawn on top of the bars.
    line = plt.Line2D([line_start, 1], [0, 0], transform=transform, color=text_color, zorder=10)
    fig.lines.extend([line])
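The key idea in plot_grid_lines() is the blended transform: x coordinates are read as fractions of the whole figure, while y coordinates are read as data values on the first subplot. The following standalone sketch shows this on a small figure with three subplots. The bar values are made up, and it uses fig.add_artist() to attach the line, which is one common way to do it rather than the approach used above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.transforms import blended_transform_factory

fig, axes = plt.subplots(1, 3, sharey=True, figsize=(9, 3))
for ax in axes:
    ax.bar([0, 1, 2], [10, 25, 15])  # made-up values

# x is read as a figure fraction (0 to 1), y as a data value on the first axes.
transform = blended_transform_factory(fig.transFigure, axes[0].transData)
line = Line2D([0.0, 1.0], [20, 20], transform=transform, color="gray")
fig.add_artist(line)

fig.canvas.draw()

# Convert the line's end points to pixel coordinates to see what the blend does.
left_px, y_px = transform.transform((0.0, 20))
right_px, _ = transform.transform((1.0, 20))
data_y_px = axes[0].transData.transform((0, 20))[1]
```

After drawing, left_px and right_px correspond to the left and right edges of the figure, and y_px matches the pixel height of the data value 20 on the first subplot. That is why a single line can span all subplots while staying aligned with the shared y-axis.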
Step 7: Create Chart Functions
In this step, I define functions to generate individual charts for the various metrics I want to analyze. Each function follows a similar structure but varies in its input.
def create_earnings_chart(data, dates):
    title = "Total Earnings: ${:,}".format(int(round(sum(list_earnings(dates, data)))))
    earnings_chart = create_bar_chart(data, dates, [{"function": plot_earnings}], title, is_earnings=True)
    return earnings_chart

def create_views_chart(data, dates):
    total_views = sum(list_total(dates, data, "readersThatViewed"))
    total_reads = sum(list_total(dates, data, "readersThatRead"))
    title = "Total Views: {:,} ( Read: {:.1f}% )".format(total_views, 100 * total_reads / total_views)
    view_chart = create_bar_chart(data, dates, [
        {"function": list_total, "field": "readersThatViewed", "color": "#DB546144"},
        {"function": list_total, "field": "readersThatRead", "color": "#DB5461"}
    ], title)
    return view_chart
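The chart functions above call create_bar_chart(), which is not shown in this tutorial. Since each month is plotted in its own subplot, one step it would likely need is splitting the full list of dates into months. Here is a minimal sketch of that step only. The helper name split_by_month is hypothetical, and this is not the author's implementation.

```python
from itertools import groupby


def split_by_month(dates):
    """Group sorted 'YYYY-MM-DD' strings into one list per calendar month."""
    # The first 7 characters ('YYYY-MM') identify the month.
    return [list(group) for _, group in groupby(dates, key=lambda d: d[:7])]


dates = ["2023-01-30", "2023-01-31", "2023-02-01", "2023-02-02"]
months = split_by_month(dates)
print([len(m) for m in months])  # [2, 2]
```

Each sub-list could then be passed to the plotting functions along with one subplot axis. The sketch assumes the dates are already sorted, which is the case for the output of get_dates().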
Step 8: Assemble the Final Visualization
Now, let’s put everything together. Here’s the code to download the data, generate the list of dates, create metrics for each chart, and merge them into a single final visualization:
def create_chart(charts):
    # Stack the chart images vertically; the first image sets the width.
    chart = Image.new('RGB', (
        charts[0].size[0], sum(c.size[1] for c in charts)))
    y_offset = 0
    for c in charts:
        chart.paste(c, (0, y_offset))
        y_offset += c.size[1]
    return chart

data = requests.get(
).json()
dates = get_dates(data)

charts = []
title_text = "Spaces vs. Tabs: Impact on Salaries"
charts.append(create_title(title_text))
charts.append(create_earnings_chart(data, dates))
charts.append(create_views_chart(data, dates))
charts.append(create_members_chart(data, dates))
charts.append(create_claps_chart(data, dates))
chart = create_chart(charts)
The resulting chart mirrors the one showcased at the beginning of this tutorial.
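One thing to keep in mind about create_chart() is that it takes the width from the first image, so it works best when all charts share the same width. The sketch below is a slightly more forgiving variant that uses the widest image and fills any gaps with a background color. The function name stack_vertically, the placeholder images, and the output path are all hypothetical.

```python
from PIL import Image


def stack_vertically(images, background="#302C32"):
    """Paste images top to bottom; narrower images are left-aligned on the background."""
    width = max(im.size[0] for im in images)
    height = sum(im.size[1] for im in images)
    canvas = Image.new("RGB", (width, height), background)
    y_offset = 0
    for im in images:
        canvas.paste(im, (0, y_offset))
        y_offset += im.size[1]
    return canvas


# Solid-color placeholders standing in for a title image and a chart image.
title = Image.new("RGB", (400, 50), "#DB5461")
body = Image.new("RGB", (300, 200), "#2EC4B6")

combined = stack_vertically([title, body])
print(combined.size)  # (400, 250)
# combined.save("chart.png")  # hypothetical output path
```

The same save() call works on the image returned by create_chart() if you want to write the final visualization to disk.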
Conclusion
This tutorial provided a quick overview of how to create visualizations for your Medium stories. By using the code shared here, you can gain insights into how your articles perform over time. I encourage you to apply these concepts to your own data and enhance the code to suit your needs. I hope you found this guide useful, and I look forward to seeing you next time! 😄