AI Development Lesson 4: Data Visualization
In AI/ML work, visualization reveals patterns, outliers, and how well a model performs. In practice, you will often spend more time understanding data than building models.
Matplotlib Basics
import matplotlib.pyplot as plt
import numpy as np
# Line chart
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), label="sin")
plt.plot(x, np.cos(x), label="cos")
plt.title("Sine & Cosine")
plt.xlabel("x")
plt.legend()
plt.show()
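# Subplots (sample data is generated here so the example runs; x comes from the line chart above)
rng = np.random.default_rng(0)
data = rng.normal(size=1000)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
categories, values = ["A", "B", "C"], [5, 3, 8]
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].hist(data, bins=30)
axes[1].scatter(x, y, alpha=0.5)
axes[2].bar(categories, values)
plt.tight_layout()
plt.savefig("analysis.png", dpi=150)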
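The subplots above carry no titles or axis labels. A minimal sketch of per-subplot labeling follows, assuming synthetic data: the heights and weights arrays and the output filename are made up for illustration. It uses the Axes methods set_title, set_xlabel, and set_ylabel, plus fig.suptitle for a figure-wide title.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data, seeded so the figure is reproducible
rng = np.random.default_rng(42)
heights = rng.normal(170, 10, size=500)                     # heights in cm
weights = 0.9 * heights - 85 + rng.normal(0, 8, size=500)   # weights in kg

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Each Axes object is labeled through its own set_* methods
axes[0].hist(heights, bins=30)
axes[0].set_title("Height distribution")
axes[0].set_xlabel("Height (cm)")
axes[0].set_ylabel("Count")

axes[1].scatter(heights, weights, alpha=0.5)
axes[1].set_title("Height vs. weight")
axes[1].set_xlabel("Height (cm)")
axes[1].set_ylabel("Weight (kg)")

fig.suptitle("Synthetic body measurements")   # one title for the whole figure
fig.tight_layout()
fig.savefig("labeled_subplots.png", dpi=150)
```

plt.title and plt.xlabel only affect the most recently used subplot, so calling the methods on each Axes object is usually the safer habit once a figure has more than one panel.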
Seaborn for Statistical Plots
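import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# load_dataset downloads the sample data on first use and returns a pandas DataFrame
df = sns.load_dataset("tips")
# Box plot
plt.figure()  # start a new figure so plots do not pile up on the same axes
sns.boxplot(x="day", y="total_bill", data=df)
# Correlation heatmap
plt.figure()
corr = df[["total_bill","tip","size"]].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
# Pair plot (all pairwise relationships between numeric columns); it creates its own figure
sns.pairplot(df, hue="sex")
# Distribution
plt.figure()
sns.histplot(df["total_bill"], kde=True)
plt.show()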
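Seaborn plots such as boxplot and heatmap can also be placed into specific Matplotlib subplots through their ax parameter. The sketch below assumes a locally generated stand-in for the tips data (the demo DataFrame and its values are hypothetical) so that it runs without a download.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data, generated locally
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=n),
    "size": rng.integers(1, 7, size=n),
})
demo["tip"] = 0.15 * demo["total_bill"] + rng.normal(0, 1, size=n)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# ax= tells seaborn which subplot to draw into
sns.boxplot(x="day", y="total_bill", data=demo, ax=axes[0])
axes[0].set_title("Total bill by day")

# Fixing vmin/vmax to [-1, 1] keeps the color scale centered on zero correlation
corr = demo[["total_bill", "tip", "size"]].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=axes[1])
axes[1].set_title("Correlation matrix")

fig.tight_layout()
fig.savefig("seaborn_axes.png", dpi=150)
```

Figure-level functions such as pairplot build their own grid of subplots and do not take an ax argument, so they cannot be embedded this way.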
🏋️ Practice Task
Load the Titanic dataset. Create a figure with 4 subplots: (1) survival rate by passenger class (bar chart), (2) age distribution by survival (overlapping histograms), (3) fare distribution by class (box plot), (4) correlation heatmap of numeric columns. Label everything clearly.
💡 Hint: fig, axes = plt.subplots(2, 2, figsize=(12,10)). Use axes[0,0], axes[0,1], etc.
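Two parts of the task use techniques not shown above: turning a 0/1 column into a per-group rate, and drawing overlapping histograms. The following is a rough sketch on synthetic data, not a solution; the toy DataFrame and its column names (group, passed, score) are hypothetical, and the bottom row of the grid is left empty on purpose.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data with hypothetical column names; this is not the Titanic dataset
rng = np.random.default_rng(1)
n = 300
toy = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=n),
    "passed": rng.integers(0, 2, size=n),   # binary outcome, 0 or 1
    "score": rng.normal(50, 15, size=n),
})

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# The mean of a 0/1 column is the rate of 1s within each group
rate = toy.groupby("group")["passed"].mean()
axes[0, 0].bar(rate.index, rate.values)
axes[0, 0].set_title("Pass rate by group")
axes[0, 0].set_ylabel("Pass rate")

# Overlapping histograms: shared bin edges plus alpha < 1 keep both visible
bins = np.linspace(toy["score"].min(), toy["score"].max(), 25)
for outcome, label in [(0, "failed"), (1, "passed")]:
    subset = toy.loc[toy["passed"] == outcome, "score"]
    axes[0, 1].hist(subset, bins=bins, alpha=0.5, label=label)
axes[0, 1].set_title("Score distribution by outcome")
axes[0, 1].set_xlabel("Score")
axes[0, 1].legend()

# axes[1, 0] and axes[1, 1] are left empty for the remaining two plots
fig.tight_layout()
fig.savefig("grid_sketch.png", dpi=150)
```

Seaborn also ships a copy of the Titanic data, which can be loaded with sns.load_dataset("titanic") when an internet connection is available.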