DevOps Lesson 10: Monitoring & Observability
You can’t fix what you can’t see. Monitoring tells you when things break; observability tells you WHY they broke. Both are essential in production.
The Three Pillars
// 1. METRICS: numerical measurements over time
// CPU usage, memory, request count, error rate, latency
// Tools: Prometheus, Datadog, CloudWatch
// 2. LOGS: timestamped events
// What happened? When? With what data?
// Tools: ELK Stack (Elasticsearch+Logstash+Kibana), Loki, Papertrail
// 3. TRACES: request journey through system
// User request → API → DB → Cache → ... how long each step?
// Tools: Jaeger, Zipkin, Datadog APM
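The pillars are most useful when they can be joined. A minimal sketch, assuming a hypothetical logEvent helper (not from any library), of structured logs that carry a shared trace ID so log lines can later be matched to a trace. The field names are illustrative assumptions, not a standard.

```javascript
const crypto = require("crypto");

// Hypothetical helper: emit one JSON log line per event.
// Core fields are written last so caller-supplied fields cannot overwrite them.
function logEvent(level, message, fields = {}) {
  const entry = {
    ...fields,
    timestamp: new Date().toISOString(),
    level,
    message
  };
  const line = JSON.stringify(entry);
  console.log(line);
  return line;
}

// One ID per incoming request, attached to every log line for that request
const traceId = crypto.randomUUID();
logEvent("info", "request received", { traceId, method: "GET", path: "/users/42" });
logEvent("error", "db query failed", { traceId, durationMs: 1840 });
```

Because every line is valid JSON, a log backend can index traceId and filter all events that belong to one request.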
Add Metrics to Node.js
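// npm install express prom-client
const express = require("express");
const prometheus = require("prom-client");

const app = express();

// Dedicated registry; default metrics cover process CPU, memory, event loop lag
const register = new prometheus.Registry();
prometheus.collectDefaultMetrics({ register });

// Custom metric: request latency histogram
const httpRequestDuration = new prometheus.Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status"],
  buckets: [0.1, 0.5, 1, 2, 5],
  registers: [register] // attach to our registry instead of the global default
});

// Middleware to track every request
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on("finish", () => {
    // Use the matched route pattern ("/users/:id"), not the raw path ("/users/42"),
    // so the label does not create a new time series for every distinct URL
    const route = req.route ? (req.baseUrl || "") + req.route.path : "unmatched";
    end({ method: req.method, route, status: res.statusCode });
  });
  next();
});

// Expose metrics for Prometheus to scrape
app.get("/metrics", async (req, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});

// Port 3000 matches the scrape target used in the hint below
app.listen(3000);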
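The middleware above attaches a route label to every observation. If that label is built from raw URLs, each distinct path (/users/1, /users/2, ...) becomes its own time series. A minimal sketch of one way to bound this, assuming a hypothetical normalizeRoute helper (not part of prom-client or Express) that collapses ID-like segments:

```javascript
// Hypothetical helper: collapse numeric and UUID path segments into ":id"
// so the "route" label stays low-cardinality.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_RE = /^\d+$/;

function normalizeRoute(path) {
  // Drop any query string, then rewrite each segment independently
  const pathname = path.split("?")[0];
  const segments = pathname.split("/").map((segment) => {
    if (NUMERIC_RE.test(segment) || UUID_RE.test(segment)) {
      return ":id";
    }
    return segment;
  });
  return segments.join("/") || "/";
}

console.log(normalizeRoute("/users/42"));
console.log(normalizeRoute("/orders/9f1c2b3a-0d4e-4f5a-8b6c-7d8e9f0a1b2c/items"));
console.log(normalizeRoute("/health?verbose=1"));
```

This is a heuristic: slugs or other free-text segments would still pass through unchanged, so a framework's matched route pattern is preferable whenever it is available.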
You completed the DevOps course! Tools worth exploring next:
- Terraform — Infrastructure as Code
- Grafana — Beautiful dashboards for Prometheus metrics
🏋️ Practice Task
Add Prometheus metrics to your API. Track: request count, latency (histogram), error rate, active DB connections. Run Prometheus locally (docker run -p 9090:9090 prom/prometheus) and configure it to scrape /metrics. Install Grafana, add Prometheus as a data source, and create a dashboard.
💡 Hint: prometheus.yml: scrape_configs: [{job_name: "myapp", static_configs: [{targets: ["host.docker.internal:3000"]}]}]
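For the error rate in the task, the usual Prometheus approach is to export ever-increasing counters (total requests, total errors) and derive the rate from two scrapes. A simplified sketch of that arithmetic, assuming hypothetical sample objects with timestampMs, requests, and errors fields. Prometheus's own rate() function also handles counter resets but adds extrapolation that is not reproduced here.

```javascript
// If a counter went down, assume the process restarted and the counter
// began again from zero, so the current value is the increase.
function counterIncrease(previous, current) {
  return current >= previous ? current - previous : current;
}

// Derive per-second rates and the error ratio between two scrapes.
function errorRate(prevSample, currSample) {
  const seconds = (currSample.timestampMs - prevSample.timestampMs) / 1000;
  if (seconds <= 0) {
    throw new RangeError("samples must be in chronological order");
  }
  const requests = counterIncrease(prevSample.requests, currSample.requests);
  const errors = counterIncrease(prevSample.errors, currSample.errors);
  return {
    requestsPerSecond: requests / seconds,
    errorsPerSecond: errors / seconds,
    errorRatio: requests === 0 ? 0 : errors / requests
  };
}

// Two scrapes taken 60 seconds apart: 600 new requests, 30 of them errors
const earlier = { timestampMs: 0, requests: 1000, errors: 10 };
const later = { timestampMs: 60000, requests: 1600, errors: 40 };
console.log(errorRate(earlier, later));
```

Exporting raw counters rather than a precomputed percentage lets the dashboard choose the time window later.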