AI Development Lesson 3: NumPy & Pandas
NumPy is the math engine for AI: it provides fast array operations. Pandas is the data manipulation library, and most AI projects start with loading and cleaning data in it.
NumPy
import numpy as np
# Arrays (for numeric work, typically much faster than Python lists)
a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
# Vectorized operations (no loops!)
a + b # [11, 22, 33, 44, 55]
a * 2 # [2, 4, 6, 8, 10]
a ** 2 # [1, 4, 9, 16, 25]
np.sqrt(a) # element-wise sqrt
np.dot(a, b) # 550 (dot product)
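To make the "no loops" point concrete, here is a minimal sketch that computes the same dot product twice, once with an explicit Python loop and once with np.dot. The variable names loop_dot and vec_dot are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

# Loop version: multiply matching elements and accumulate the sum by hand.
loop_dot = 0
for x, y in zip(a, b):
    loop_dot += x * y

# Vectorized version: a single call, with the looping done inside NumPy.
vec_dot = np.dot(a, b)

print(loop_dot, vec_dot)  # 550 550
```

Both give the same answer; the vectorized form is shorter and usually much faster on large arrays.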
# 2D arrays (matrices)
matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])
matrix.shape # (3, 3)
matrix[0] # first row: [1,2,3]
matrix[:,1] # second column: [2,5,8]
matrix.T # transpose
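# Inverse (the 3x3 matrix above is singular, so an invertible matrix is used here)
np.linalg.inv(np.array([[2,1],[1,3]])) # inverse of a 2x2 matrix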
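One caveat about inverses: not every square matrix has one. The 3x3 example matrix with values 1 to 9 has linearly dependent rows, so it is singular. The sketch below checks this with np.linalg.matrix_rank and then inverts a small invertible matrix instead. The names singular, invertible, and check are made up for this example.

```python
import numpy as np

singular = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# A rank below 3 means the rows are linearly dependent: no inverse exists.
rank = np.linalg.matrix_rank(singular)

invertible = np.array([[2.0, 1.0], [1.0, 3.0]])
inv = np.linalg.inv(invertible)
# A matrix times its inverse should give the identity (up to rounding).
check = np.allclose(invertible @ inv, np.eye(2))

print(rank, check)  # 2 True
```

Checking the rank (or the determinant) before calling np.linalg.inv is a cheap way to avoid errors or meaningless results.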
Pandas
import pandas as pd
# Load data
df = pd.read_csv("data.csv") # from file
df = pd.read_json("data.json") # or from a JSON file
# Explore
df.head(5) # first 5 rows
df.info() # column types, nulls
df.describe() # stats (mean, std, min, max)
df.shape # (rows, columns)
# Filter
df[df["age"] > 30]
df[(df["age"] > 30) & (df["role"] == "admin")]
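The filters above can be tried on a small hand-made table. This is a sketch with hypothetical data (the names and values are invented); only the column names age and role come from the lesson.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "age": [25, 41, 35, 52],
    "role": ["admin", "admin", "user", "admin"],
})

# Single condition: the comparison builds a True/False mask, one value per row.
older = df[df["age"] > 30]

# Two conditions: wrap each in parentheses, because & binds tighter than > and ==.
older_admins = df[(df["age"] > 30) & (df["role"] == "admin")]

print(older_admins["name"].tolist())  # ['Bob', 'Dee']
```

Use & for "and", | for "or", and ~ for "not"; Python's own and/or keywords do not work on these masks.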
# Transform
df["full_name"] = df["first"] + " " + df["last"]
df["age_bucket"] = pd.cut(df["age"], bins=[0,18,65,100], labels=["youth","adult","senior"])
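It helps to see where pd.cut puts values that sit exactly on a bin edge. By default each bin includes its right edge and excludes its left edge, so 18 counts as "youth" and an age of exactly 0 falls outside every bin. A small sketch with made-up ages:

```python
import pandas as pd

ages = pd.Series([5, 18, 19, 65, 70])
buckets = pd.cut(ages, bins=[0, 18, 65, 100], labels=["youth", "adult", "senior"])
print(buckets.tolist())  # ['youth', 'youth', 'adult', 'adult', 'senior']

# An age of exactly 0 is outside the first bin (0, 18] and becomes NaN.
zero_bucket = pd.cut(pd.Series([0]), bins=[0, 18, 65, 100], labels=["youth", "adult", "senior"])
print(zero_bucket.isna().all())  # True
```

If zero should be included, passing include_lowest=True to pd.cut is one option.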
# Handle missing data
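df.dropna() # returns a copy without the rows that contain NaN
df.fillna(0) # returns a copy with NaN replaced by 0
df["salary"] = df["salary"].fillna(df["salary"].mean()) # fill NaN with the column mean and assign back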
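A common stumbling block is that dropna and fillna return new objects rather than changing df, so the result has to be assigned somewhere. A minimal sketch with a made-up salary column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing salary.
df = pd.DataFrame({"salary": [100.0, np.nan, 300.0]})

# fillna returns a new Series; df itself is not modified by this call.
filled = df["salary"].fillna(df["salary"].mean())
missing_before = int(df["salary"].isna().sum())  # still 1

# Assign the result back to keep it.
df["salary"] = filled
missing_after = int(df["salary"].isna().sum())   # now 0

print(df["salary"].tolist())  # [100.0, 200.0, 300.0]
```

The mean is computed over the non-missing values only, which is why the gap is filled with 200.0 here.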
# Group and aggregate
df.groupby("department")["salary"].mean()
df.groupby("role").agg({"salary": "mean", "age": ["min","max"]})
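To see what the two groupby calls return, here is a sketch on a tiny hypothetical table (the values are invented; the column names follow the lesson). Note that passing a list of functions to agg produces two-level column names.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "department": ["eng", "eng", "ops", "ops"],
    "role": ["admin", "user", "admin", "user"],
    "salary": [100, 120, 80, 90],
    "age": [30, 40, 25, 35],
})

# One column, one function: the result is a Series indexed by department.
mean_salary = df.groupby("department")["salary"].mean()
print(mean_salary.to_dict())  # {'eng': 110.0, 'ops': 85.0}

# Several functions: the result is a DataFrame with (column, function) column names.
summary = df.groupby("role").agg({"salary": "mean", "age": ["min", "max"]})
print(summary[("age", "max")].to_dict())  # {'admin': 30, 'user': 40}
```

Individual results are then selected with a tuple, such as summary[("salary", "mean")].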
🏋️ Practice Task
Download the Titanic dataset (available on Kaggle, or use seaborn: import seaborn as sns; df = sns.load_dataset("titanic")). Answer with Pandas: (1) Overall survival rate. (2) Survival rate by gender. (3) Average age by passenger class. (4) How many nulls in each column?
💡 Hint: df.groupby("sex")["survived"].mean(), df.groupby("pclass")["age"].mean(), df.isnull().sum()
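If the dataset is not at hand yet, the same calls can be rehearsed on a stand-in table. The sketch below uses four invented rows that only borrow the column names survived, sex, pclass, and age; the numbers are not real Titanic results.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows; the real dataset has many more rows and columns.
df = pd.DataFrame({
    "survived": [1, 0, 1, 0],
    "sex": ["female", "male", "female", "male"],
    "pclass": [1, 3, 1, 3],
    "age": [30.0, np.nan, 40.0, 20.0],
})

overall = df["survived"].mean()                    # (1) mean of a 0/1 column is a rate
by_sex = df.groupby("sex")["survived"].mean()      # (2) rate per group
age_by_class = df.groupby("pclass")["age"].mean()  # (3) NaN ages are skipped
nulls = df.isnull().sum()                          # (4) missing values per column

print(overall)  # 0.5
```

Swapping the stand-in table for the real one should leave the four analysis lines unchanged.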