A senior DevOps or platform engineer in the US runs $180K to $250K all-in (base, equity, benefits, recruiting), with a 3 to 6 month lead time before they’re shipping anything useful. That’s roughly a year of seed-stage runway for one hire whose entire job is to keep your build pipelines green. For most early teams, it’s the wrong first hire, but the work still has to happen: someone needs to deploy the app, rotate the secrets, restore the database when it goes sideways.

This post is about how to do that work without making the hire. Not “how to build a platform team,” and not “how to avoid Kubernetes.” The opposite of both: a deliberately small AWS stack a normal application engineer can run in the background, and the six expensive mistakes we see teams make on the way there.

It’s a how-to, not a buyer-decision post. If you’re still deciding whether to leave Heroku or Render, start with “A Heroku alternative in your own AWS account” or “Render vs your own AWS account,” then come back here for the build-out.

The mistakes we see every week

Before the stack, the mistakes. These are the ones that eat a quarter of runway and leave you further from shipping than when you started.

Starting with EKS. Someone reads a blog post about production Kubernetes, thinks “we should do it right,” and spins up EKS with a VPC CNI, a managed node group, an ALB ingress controller, external-dns, cert-manager, and a Helm chart nobody on the team can debug. Three months later your app is still running on Heroku because nobody can get TLS working end to end. EKS is a good choice for a 30-engineer team with a platform lead. It is a terrible first step. (Full breakdown in “EKS vs k3s on AWS for startups.”)

Picking ECS because it looks simpler than Kubernetes. It is simpler, and that simplicity is the trap. ECS is AWS-proprietary end to end: no portable API, no Helm, no operators, no ecosystem that works anywhere else. The day you want to run your workload in another account, another cloud, or on a laptop, you rewrite every manifest. k3s gets you close to the same operational simplicity without the one-way door.

Terraforming everything from day one. You sit down to declare your entire infrastructure in code before you’ve deployed anything. Two weeks in, you have a beautiful module structure and zero running services. Infrastructure as code is worth it, but not before you know what infrastructure you actually need.

Handrolling CI/CD in GitHub Actions. A deploy.yml file that SSHs into an EC2 box, pulls the latest image, and runs docker-compose up -d. It works for three months. Then a deploy fails halfway through, leaves a container orphaned, and your API is down at 11 PM on a Friday with nobody who knows how to roll it back.

Putting production on a single EC2 instance with no plan for the day it dies. It will die. AWS will send you a “scheduled maintenance” email, or the disk will fill up, or you’ll push a bad deploy and lose the host. If you can’t recreate the machine in under ten minutes, you do not have a production environment. You have a time bomb.

Treating RDS as the easy part. RDS looks simple until you need to run a migration, restore a PITR backup to a new instance, or figure out why your connections are saturating. “Managed” does not mean “ignore it.” You still need to know how to take a snapshot, how to restore one, and how to rotate credentials.

The minimum stack

Here is the smallest surface area that will actually hold up in production for a startup doing real traffic.

Compute: one or two EC2 instances running k3s. k3s is a single-binary Kubernetes distribution from Rancher (source on GitHub). It is real Kubernetes (same API, same kubectl, same ecosystem) but it installs in under a minute, runs in about 500 MB of RAM, and does not require you to think about control plane sizing. For most startups, one m6i.large or m7g.large is enough to run your app, workers, and a few supporting services. Add a second node for HA when you have paying customers.

Database: Postgres, either on the same EC2 box or on RDS. For a single-node setup with low traffic, running Postgres in-cluster on EC2 is fine. A managed Postgres operator handles backups, PITR, and failover, and you pay only for the EC2 volume. When the box starts feeling loaded or when you need multi-AZ, move to RDS with PITR enabled, Performance Insights on, and 7+ days of backups. Don’t pay for multi-AZ RDS before you have revenue that justifies it.

Object storage: S3. One bucket per environment. Enable versioning on anything you’d be upset to lose.

Secrets: AWS Secrets Manager or SSM Parameter Store. Not environment variables in a .env file checked into the repo. Not long-lived values pasted into GitHub Actions secrets. Somewhere with rotation and an audit trail.
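
As a rough sketch of what “read it from Secrets Manager” can look like at app startup, assuming Python with boto3 installed. The secret name prod/api/database and its JSON layout with a url key are hypothetical, chosen only for illustration:

```python
import json


def load_database_url(secret_id="prod/api/database", client=None):
    """Fetch a JSON secret from AWS Secrets Manager and return its "url" field.

    The secret name and the "url" key are hypothetical; use whatever layout
    your own secret has.
    """
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        client = boto3.client("secretsmanager")

    response = client.get_secret_value(SecretId=secret_id)
    payload = json.loads(response["SecretString"])
    return payload["url"]
```

The instance or pod needs an IAM role that allows secretsmanager:GetSecretValue on that one secret. Nothing sensitive lives in the repo or the image.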

DNS, TLS, and edge protection: Cloudflare. Put your zone on Cloudflare’s free plan and you get managed DNS, automatic TLS at the edge, DDoS protection, bot mitigation, and scrape protection, all for zero dollars. You can terminate TLS at Cloudflare and keep a plain HTTP listener at the origin (simplest, but the hop from Cloudflare to your load balancer is then unencrypted), or terminate TLS in-cluster with cert-manager + Cloudflare DNS-01. Route 53 is a fine fallback if you’re already committed to AWS-native DNS, but for most startups Cloudflare removes the DNS and WAF line items from your first-month AWS bill.

Load balancer: an AWS Network Load Balancer in front of your k3s node(s). NLB is cheaper and simpler than ALB, and you let an ingress controller inside the cluster handle HTTP-layer routing.

Ingress: Traefik or nginx-ingress. Either is fine. Traefik is what k3s ships with by default.

Logs: CloudWatch Logs, or Loki if you’d rather not touch CloudWatch. Ship container stdout somewhere you can actually search it.

Observability: one Prometheus + Grafana, or a free Grafana Cloud account. You want to be able to answer “is the API slow right now” in 30 seconds.
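
One way to make the “is the API slow right now” question answerable from a terminal is a small script against the Prometheus HTTP API. This is a sketch under assumptions: the Prometheus address and the http_request_duration_seconds histogram are hypothetical names, and your app may expose latency under a different metric.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical names: adjust the Prometheus address and the histogram metric
# to whatever your cluster and app actually expose.
PROMETHEUS_URL = "http://prometheus.monitoring.svc:9090"
P95_QUERY = (
    "histogram_quantile(0.95, "
    "sum(rate(http_request_duration_seconds_bucket[5m])) by (le))"
)


def parse_scalar(response_body):
    """Pull the first sample value out of a Prometheus instant-query response."""
    data = json.loads(response_body)
    if data.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {data}")
    results = data["data"]["result"]
    if not results:
        return None  # no traffic in the window, or the metric does not exist
    # Each sample is [unix_timestamp, "value-as-string"].
    return float(results[0]["value"][1])


def p95_latency_seconds(base_url=PROMETHEUS_URL, query=P95_QUERY):
    """Ask Prometheus for the current p95 request latency in seconds."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_scalar(resp.read())
```

Calling p95_latency_seconds() from a machine that can reach Prometheus returns a number in seconds, or None when there is no data for the last five minutes. The same query works as a Grafana panel.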

That’s the stack. No service mesh. No GitOps tool. No multi-region. No Kubernetes operators you didn’t need. When your second product engineer joins the company, they should be able to read the whole runbook in an afternoon.

Why raw AWS gets painful

You can do everything above by hand. Many startups do. The pain shows up slowly, not all at once.

The first month it’s fine. Then a pod gets OOMKilled, or a deploy half-succeeds, or you need to restore a database to 40 minutes ago, and nobody has the context to fix it without reading AWS docs for two hours. Then you try to spin up staging and realize your one EC2 box is a special snowflake: IAM roles added ad-hoc, security group rules nobody remembers, a DB password pasted into Slack. Reproducing it takes a week.

Then a developer asks “how do I deploy my branch somewhere the designer can see it?” You don’t have a good answer. Preview environments (the thing that made Heroku feel magical) require real infrastructure automation you don’t have time to build. Then your first paying customer asks for an SLA and you realize you have one host, no failover, and no runbook for a bad deploy.

None of these are AWS’s fault. AWS gives you primitives. Turning primitives into a platform is the work, and the work is what you don’t have headcount for.

The lightweight path

Here is the sequence that has worked for the startups we’ve watched get this right without hiring.

1. Get one EC2 box running k3s, pointed at a domain, terminating TLS. Do this in an afternoon. Use k3sup or a 20-line cloud-init script. Put Traefik or nginx in front, point a Cloudflare DNS record at the NLB, and let Cloudflare terminate TLS at the edge. No cert-manager required for the first week. You now have a real Kubernetes cluster for about $60/month, with DDoS and bot protection already in front of it.

2. Deploy your app with a plain Deployment + Service + Ingress. Three YAML files, maybe 80 lines total. Push a Docker image to ECR. Apply the manifests. Your app is live on HTTPS.

3. Point RDS at it. Create a Postgres RDS instance in the same VPC. Store the connection string in Secrets Manager. Mount it into the pod via the Secrets Store CSI driver, or sync it into a Kubernetes Secret with external-secrets.

4. Set up GitHub Actions to build, push, and kubectl apply. Keep the workflow under 50 lines. Use an OIDC role. Don’t paste long-lived AWS keys into secrets.

5. Add a staging namespace. Same cluster, different namespace, a -staging subdomain. This is the point where the cheap setup starts paying for itself.

6. Add a second node for HA. When you have revenue. Not before.
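
Steps 2 and 5 can be sketched in a few lines of Python instead of hand-written YAML, since kubectl accepts JSON as well. Treat this as a minimal illustration, not a recommended layout: the app name, container port 8080, replica count, and hostnames are all assumptions.

```python
import json


def build_manifests(name, image, host, namespace="default", port=8080, replicas=2):
    """Return Deployment, Service, and Ingress objects as plain dicts."""
    labels = {"app": name}
    meta = {"name": name, "namespace": namespace, "labels": labels}

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": meta,
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": meta,
        "spec": {"selector": labels, "ports": [{"port": 80, "targetPort": port}]},
    }
    ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": meta,
        "spec": {
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {"name": name, "port": {"number": 80}}
                                },
                            }
                        ]
                    },
                }
            ]
        },
    }
    return [deployment, service, ingress]


def render(manifests):
    """Wrap the objects in a v1 List, which kubectl can apply as one JSON file."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)
```

Printing render(build_manifests("api", image, "api.example.com")) and piping it to kubectl apply -f - covers production. Passing namespace="staging" and a -staging hostname produces the staging copy from the same function, provided the namespace already exists.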

That’s the lightweight path. A single engineer who is comfortable with AWS and Kubernetes basics can build it in two weeks. It will hold you to about 50 engineers and tens of thousands of requests per minute. Past that you need real investment, but by then you can afford it.

What to automate first

You cannot automate everything. Here’s the priority order when you have limited engineering time.

Deploys. If a deploy takes more than five minutes or requires a remembered command sequence, engineers start batching changes to avoid it, and ship buggier code. Automate git push to live in under five minutes. Nothing else matters more.

TLS. Certificates that expire on a Sunday morning are a self-inflicted outage. If you terminate TLS in-cluster, run cert-manager with Let’s Encrypt, automatic renewal, and alerts on renewal failure. Set it up once, never touch it.
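
An alert on renewal failure can be as simple as a scheduled script that checks how many days the served certificate has left. A minimal sketch using only the Python standard library. The 14-day warning window is an arbitrary assumption:

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Whole days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  1 00:00:00 2030 GMT".
    """
    now = time.time() if now is None else now
    expires_at = ssl.cert_time_to_seconds(not_after)
    return int((expires_at - now) // 86400)


def fetch_not_after(hostname, port=443):
    """Open a TLS connection and return the served certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def check(hostname, warn_days=14):
    """Return (ok, days_left); ok is False once the cert is inside the warning window."""
    days_left = days_until_expiry(fetch_not_after(hostname))
    return days_left >= warn_days, days_left
```

If Cloudflare terminates TLS at the edge, checking the public hostname inspects Cloudflare’s certificate, not the one cert-manager issued. In that case, point the check at the origin.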

Logs. If you cannot search yesterday’s logs from one place in under 30 seconds, you are flying blind during incidents. Ship everything to CloudWatch Logs or Loki on day one. It is much harder to retrofit later.

Rollback. Every deploy needs a one-command rollback. kubectl rollout undo deployment/api is fine. What is not fine is “rebuild the previous image, repush, redeploy, hope.” Rehearse the rollback before you need it.
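
Deploy and rollback can live in the same script, so a failed rollout undoes itself. This is a sketch, assuming kubectl is already authenticated against the cluster. The deployment and container names are placeholders:

```python
import subprocess


def deploy_commands(deployment, container, image, namespace="default", timeout="300s"):
    """Return the kubectl invocations for set-image, wait-for-rollout, and undo."""
    target = f"deployment/{deployment}"
    ns = ["--namespace", namespace]
    set_image = ["kubectl", "set", "image", target, f"{container}={image}", *ns]
    wait = ["kubectl", "rollout", "status", target, f"--timeout={timeout}", *ns]
    undo = ["kubectl", "rollout", "undo", target, *ns]
    return set_image, wait, undo


def deploy(deployment, container, image, namespace="default", run=subprocess.run):
    """Roll out `image`; if the rollout does not become healthy, undo it.

    Returns True when the new version is live, False when it was rolled back.
    """
    set_image, wait, undo = deploy_commands(deployment, container, image, namespace)
    # A failure here propagates: nothing has changed yet, so there is nothing to undo.
    run(set_image, check=True)
    try:
        run(wait, check=True)
        return True
    except subprocess.CalledProcessError:
        run(undo, check=True)
        return False
```

Running deploy("api", "api", new_image) from CI gives every deploy the same rollback path you would otherwise type by hand, which also means it gets rehearsed on every failed rollout.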

Backups. RDS automated backups are on by default, but useless if you’ve never tested a restore. Once a quarter, restore the latest snapshot to a new instance and run your app against it. The first time, you’ll find something broken. Better now than during an incident.
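
The quarterly restore drill is easier to keep up if it is one function call. A rough sketch with boto3. The instance identifiers and the db.t4g.micro class are assumptions, and network settings such as the subnet group and security groups are left out for brevity:

```python
def latest_available_snapshot(snapshots):
    """Pick the newest snapshot that has finished creating."""
    ready = [s for s in snapshots if s.get("Status") == "available"]
    if not ready:
        raise LookupError("no available snapshots to restore from")
    return max(ready, key=lambda s: s["SnapshotCreateTime"])


def restore_latest(source_instance, scratch_instance, instance_class="db.t4g.micro", rds=None):
    """Restore the newest automated snapshot of `source_instance` into a new instance."""
    if rds is None:
        import boto3  # lazy import so the helper above can be tested without AWS

        rds = boto3.client("rds")

    response = rds.describe_db_snapshots(
        DBInstanceIdentifier=source_instance, SnapshotType="automated"
    )
    snapshot = latest_available_snapshot(response["DBSnapshots"])
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=scratch_instance,
        DBSnapshotIdentifier=snapshot["DBSnapshotIdentifier"],
        DBInstanceClass=instance_class,
    )
    return snapshot["DBSnapshotIdentifier"]
```

Point a copy of the app at the scratch instance once it is available, confirm it boots and reads data, then delete the scratch instance so it does not keep billing.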

Alerts, dashboards, IaC, preview environments, cost monitoring. All of these matter. But the five above are the ones that keep you alive. Do them first, in order.

Where this breaks down

This path works until one of four things is true:

You have more than about 50 engineers and the cognitive overhead of “everyone knows the stack” stops scaling.
You have compliance requirements (HIPAA, FedRAMP) that demand a level of audit logging and network isolation a single k3s node can’t easily provide.
You have traffic that genuinely needs multi-region.
You are running stateful workloads (databases, queues) inside Kubernetes, in which case you need someone whose full-time job is to care about them.

Until then, a single k3s cluster on EC2 with RDS for data, S3 for blobs, and Secrets Manager for credentials is enough. Anyone who tells you otherwise is either selling you something or works somewhere big enough that their advice doesn’t apply to you.

The shortest path

The setup above is what we install for you. Ownkube runs inside your own AWS account, sets up k3s, wires in TLS, logs, backups, deploy automation, and preview environments, and then gets out of the way. Your AWS credits still apply. The infrastructure is vanilla enough that if you ever want to fire us, everything keeps running.

If you are staring at a migration from Heroku and the idea of hiring a platform team is not realistic, connect your cloud and we’ll have you deployed on your own AWS account this week. No DevOps hire required.