If you haven't read Phase 1, go read that first. Short version: I manually deployed Prometheus, Grafana, and Node Exporter on Kubernetes, got the dopamine hit from seeing real metrics, and promised I'd do the mature thing next: Terraform, custom metrics, and a real autoscaling controller.
This is that post. And honestly this one was more fun to build.
Tearing Everything Down to Rebuild It Properly
The first thing I did was kubectl delete everything I'd built in Phase 1.
Not because it was broken. It worked fine. But that's the problem: it worked, and I didn't fully know why it worked, or how to reproduce it in 5 minutes if I had to.
So I converted the entire stack to Terraform. Every namespace, every RBAC rule, every PVC, every deployment: all of it now lives in .tf files. terraform apply and you have the full monitoring stack. terraform destroy and it's gone. No YAML files scattered everywhere, no "which order do I apply these in" nonsense.
The annoying part? Terraform state. I renamed a resource midway through and Terraform lost track of it. It wanted to delete and recreate something that was already running. Lesson learned: once you name something in Terraform, don't change it unless you're ready to deal with the consequences. terraform plan is free; run it every single time before apply.
Architecture
Building WatcherBot A Custom Metrics Exporter
CPU and memory autoscaling is a bad idea for most real applications. If your service is waiting on a database, CPU is fine and memory is fine, but you're still backed up. You need metrics that actually reflect what the application is doing.
So I built WatcherBot. It's a small Go service that does one thing: tracks active tasks and exposes that as a Prometheus metric called watcherBot_active_tasks.
Three endpoints:
- /metrics - Prometheus scrapes this
- /start_task - increments the gauge
- /finish_task - decrements it
That's it. The point wasn't to build something complex, it was to build something I had full control over so I could test the autoscaler against realistic load patterns.
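Stripped down, the whole thing fits in one file. Here's a minimal sketch of what WatcherBot boils down to, using prometheus/client_golang; the port matches the load test later in the post, the rest is approximated from the description above, not copied from the repo:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// activeTasks is the one gauge WatcherBot exists to expose.
var activeTasks = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "watcherBot_active_tasks",
	Help: "Number of tasks currently in flight.",
})

func main() {
	// /metrics is served by the standard Prometheus handler.
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/start_task", func(w http.ResponseWriter, r *http.Request) {
		activeTasks.Inc() // a task started
	})
	http.HandleFunc("/finish_task", func(w http.ResponseWriter, r *http.Request) {
		activeTasks.Dec() // a task finished
	})
	log.Fatal(http.ListenAndServe(":8088", nil))
}
```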
Multi-stage Docker build for this. The final image went from 1.2GB (naive build) to 20MB (Alpine base, only the compiled binary). If you're not doing multi-stage builds for Go, you're just wasting everyone's time.
Writing the Autoscaler Controller
This was the part I actually wanted to build from the beginning.
The controller runs a loop every 15 seconds. Here's what it does each iteration:
- Queries Prometheus: sum(watcherBot_active_tasks)
- Calculates desired replicas: ceil(active_tasks / 10) - the target is 10 tasks per replica
- Clamps between 1 and 5 replicas
- If current replicas ≠ desired replicas, hits the Kubernetes API and updates the deployment
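Condensed into Go, one iteration looks roughly like this. The deployment name and namespace come from the kubectl commands further down; the Prometheus service URL and the helper names are my assumptions, and error handling is thinner here than in the real controller:

```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"math"
	"net/http"
	"net/url"
	"strconv"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

const (
	promURL         = "http://prometheus.monitoring.svc:9090" // assumed service URL
	tasksPerReplica = 10
	minReplicas     = 1
	maxReplicas     = 5
)

// queryActiveTasks asks Prometheus for sum(watcherBot_active_tasks).
func queryActiveTasks() (float64, error) {
	q := url.QueryEscape("sum(watcherBot_active_tasks)")
	resp, err := http.Get(promURL + "/api/v1/query?query=" + q)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var body struct {
		Data struct {
			Result []struct {
				Value [2]interface{} `json:"value"` // [timestamp, "value"]
			} `json:"result"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return 0, err
	}
	if len(body.Data.Result) == 0 {
		return 0, nil // no series yet: no active tasks
	}
	val, _ := body.Data.Result[0].Value[1].(string)
	return strconv.ParseFloat(val, 64)
}

// desiredReplicas is the entire scaling policy: ceil(tasks/10), clamped to [1, 5].
func desiredReplicas(tasks float64) int32 {
	n := int32(math.Ceil(tasks / tasksPerReplica))
	if n < minReplicas {
		return minReplicas
	}
	if n > maxReplicas {
		return maxReplicas
	}
	return n
}

func main() {
	cfg, err := rest.InClusterConfig() // kubeconfig fallback shown further down
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	deploys := clientset.AppsV1().Deployments("monitoring")

	for range time.Tick(15 * time.Second) {
		tasks, err := queryActiveTasks()
		if err != nil {
			log.Printf("prometheus query failed: %v", err) // backoff goes here, see below
			continue
		}
		dep, err := deploys.Get(context.TODO(), "watcher-bot", metav1.GetOptions{})
		if err != nil {
			log.Printf("get deployment: %v", err)
			continue
		}
		current := int32(1) // the API defaults replicas to 1 when unset
		if dep.Spec.Replicas != nil {
			current = *dep.Spec.Replicas
		}
		want := desiredReplicas(tasks)
		if current == want {
			continue
		}
		log.Printf("active_tasks=%v current=%d desired=%d", tasks, current, want)
		dep.Spec.Replicas = &want
		if _, err := deploys.Update(context.TODO(), dep, metav1.UpdateOptions{}); err != nil {
			log.Printf("update deployment: %v", err)
		}
	}
}
```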
The math looks stupid-simple but that's the point. Good autoscaling logic is usually simple. The complexity is in the edge cases - what happens when Prometheus is temporarily unreachable? What if the scaling decision oscillates because you're right at the boundary?
For the Prometheus failure case, I added exponential backoff. The controller doesn't panic and try to scale to 0, it just waits and retries.
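The backoff itself is nothing fancy. Something along these lines, reusing the queryActiveTasks helper from the sketch above; the attempt count and delays are illustrative, not what's in the repo:

```go
// queryWithBackoff retries the Prometheus query with exponential backoff
// instead of treating a transient failure as "zero tasks, scale to minimum".
func queryWithBackoff(maxAttempts int) (float64, error) {
	delay := time.Second
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		tasks, err := queryActiveTasks()
		if err == nil {
			return tasks, nil
		}
		lastErr = err
		log.Printf("prometheus query failed (attempt %d): %v, retrying in %s", attempt, err, delay)
		time.Sleep(delay)
		delay *= 2 // 1s, 2s, 4s, ...
	}
	return 0, lastErr
}
```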
For thrashing: if you have 10 tasks and your target is 10 tasks per replica, you're at exactly 1 replica. One more task pushes you to 2. One less and you're back to 1. That's annoying in production. I kept it simple for now, but in a real system you'd add a cooldown window or deadband.
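If I did add a cooldown, it would be about this much code. To be clear, this is a sketch of the idea slotting into the loop above, not something that exists in the repo:

```go
// shouldScale applies a cooldown window: after any scaling action, further
// changes are suppressed until the window has elapsed, which stops the
// controller from flapping when the task count sits right on a boundary.
func shouldScale(current, desired int32, lastScale time.Time, cooldown time.Duration) bool {
	if current == desired {
		return false
	}
	if time.Since(lastScale) < cooldown {
		return false // still inside the cooldown window, ignore the wobble
	}
	return true
}
```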
The RBAC for the controller needs get and update on Deployments. That's it. Don't over-permission your controllers.
Running it inside the cluster uses in-cluster config. For local testing it falls back to ~/.kube/config. Took me 20 minutes to figure out why it was failing in-cluster before I realized the ServiceAccount wasn't mounted correctly.
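The config selection is the usual client-go pattern, roughly this, replacing the bare rest.InClusterConfig() call in the sketch above (it also needs the k8s.io/client-go/tools/clientcmd and k8s.io/client-go/util/homedir imports):

```go
// loadKubeConfig prefers in-cluster config (available when the ServiceAccount
// token is mounted) and falls back to ~/.kube/config for local runs.
func loadKubeConfig() (*rest.Config, error) {
	if cfg, err := rest.InClusterConfig(); err == nil {
		return cfg, nil
	}
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	return clientcmd.BuildConfigFromFlags("", kubeconfig)
}
```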
Does It Actually Work?
Yeah.
# Simulate 25 tasks
for i in {1..25}; do curl http://localhost:8088/start_task; done
Controller logs show the scaling decision:
active_tasks=25 current_replicas=1 desired_replicas=3 → scaling up
Check the deployment:
kubectl get deployment watcher-bot -n monitoring -w
Replicas go from 1 → 3. Then when I hit /finish_task 20 times, it scales back down to 1. The full loop — metric generation, Prometheus scrape, controller query, Kubernetes API update — works end to end.
The Grafana dashboard shows watcherBot_active_tasks over time correlated with replica count. That's the part that actually looked good in a demo.
What the Stack Looks Like Now
- Prometheus with persistent storage, RBAC, Kubernetes service discovery
- Node Exporter as DaemonSet (runs on every node, always)
- Kube-State-Metrics for Kubernetes object state
- Grafana connected to Prometheus, persistent dashboards
- WatcherBot - custom Go exporter, Dockerized, exposes watcherBot_active_tasks
- Autoscaler Controller - custom Go controller, queries Prometheus, scales deployments via Kubernetes API
- All of the above managed with Terraform
Honest Reflection
Phase 1 felt like following a tutorial with extra steps. Phase 2 felt like actually building something.
The moment the controller made its first real scaling decision, not because I told it to but because it read a metric, did the math, and hit the API: that was genuinely satisfying.
Next I want to add proper alerting rules in Prometheus and wire Grafana alerts to somewhere useful. Also thinking about Helm charts so this is actually distributable. We'll see.
Code is on GitHub. The README has the full architecture breakdown if you want to run it yourself.
