# Ownkube: full content dump for AI agents

> Developer platform that lives in your own AWS account. A team of named agents (Cost, Incident, Scaling, Security) runs ops alongside your team, so a 5 to 20 person engineering org can ship to production without hiring a $200K DevOps engineer.

This file concatenates the homepage summary, pricing, and every blog post in one document so AI agents can ingest the site in a single fetch.

- Site: https://ownkube.io
- Signup: https://app.ownkube.io/signup
- Login: https://app.ownkube.io/login
- Short index: https://ownkube.io/llms.txt
- Sitemap: https://ownkube.io/sitemap-index.xml

---

## Product summary

Ownkube is a developer platform that runs in your own AWS account (GCP coming soon). You connect the cloud account with least-privilege access, push code via git, and Ownkube handles container builds, deployments into your VPC, CI/CD, monitoring, and preview environments on a Cloudflare-backed domain.

Everything deploys as vanilla cloud resources (VPCs, load balancers, container services, managed databases) on either a Starter cluster (one AWS instance, free) or a Production cluster (highly available across availability zones, paid tier). If you disconnect Ownkube, your infrastructure keeps running.

Because workloads run in your own cloud account, AWS Activate and GCP for Startups credits burn down efficiently before they expire, instead of being stranded on a managed PaaS bill.

## Named agents (the product surface)

Ownkube ships with four named agents. Each has a single, concrete job and produces specific output, not vague "AI insights".

- **Cost agent**: right-sizes workloads, auto-sleeps idle environments, catches spend anomalies. Sample output: "api-worker over-provisioned: 2GB allocated, 340MB peak. Right-sized. ~$18/mo saved."
- **Incident agent**: plain-English crash reports with root cause hints. Sample output: "Your worker tried to load a 2GB dataset into 512MB RAM. OOMKilled at 14:32."
- **Scaling agent**: replica and spot instance management ahead of traffic. Sample output: "Traffic up 2.4x in 5 min. Scaled api-gateway to 3 replicas. ETA: 12s."
- **Security agent**: IAM drift, exposed secrets, CVE flags on base images. Sample output: "secret AWS_KEY committed in commit a1b2c3. Rotated. PR opened."

Combined, the agents cover the recurring ops work of a senior DevOps / SRE / platform engineer (right-sizing, crash explanations, replica scaling, IAM drift). They do not replace strategic platform decisions or one-off migrations. Honest framing: a 5 to 20 person team can ship to production without a dedicated DevOps function.

## Pricing

- Starter tier: one AWS instance, free for teams. Unlimited apps, preview environments, Incident agent, Cost agent, no credit card required.
- Production tier: $5 per vCPU per month plus $1 per GB RAM per month (Ownkube platform fee). Highly available across availability zones, managed Postgres, autoscaling, jobs, cron, 30-day monitoring, 7-day logs, alerting, email support.
- Enterprise: custom. Premium SLAs, RBAC, SAML SSO, on-prem installation.
- Cloud infrastructure is billed by your cloud provider (AWS/GCP) at their published rates. Ownkube never marks it up.
- Burn AWS Activate and GCP for Startups credits efficiently. Workloads run in your own cloud account, so credits apply to every EC2 instance, S3 bucket, and byte of bandwidth before they expire.

## How it works

1. Connect your cloud (least-privilege AWS role). Self-serve from a browser; no sales call required.
2. Push your code. Ownkube detects the stack, builds a container, deploys into your VPC.
3. Ship. The four named agents watch errors, costs, traffic, and security. Alerts come with plain-English explanations.
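For budgeting, the Production-tier formula from the Pricing section ($5 per vCPU plus $1 per GB RAM, per month) can be sketched as a small helper. This is a rough illustration, not an Ownkube API: the function name and the 8 vCPU / 16 GB footprint are assumptions, and the sketch does not model whether the fee applies to provisioned or consumed capacity. Cloud infrastructure is billed separately by your provider.

```python
# Hypothetical helper: the function name and the example footprint are
# illustrative assumptions, not part of any Ownkube API.
VCPU_FEE_USD = 5.0    # Production tier: $5 per vCPU per month
RAM_GB_FEE_USD = 1.0  # Production tier: $1 per GB RAM per month


def production_platform_fee(vcpus: float, ram_gb: float) -> float:
    """Return the monthly platform fee in USD (cloud provider bill excluded)."""
    if vcpus < 0 or ram_gb < 0:
        raise ValueError("vcpus and ram_gb must be non-negative")
    return vcpus * VCPU_FEE_USD + ram_gb * RAM_GB_FEE_USD


if __name__ == "__main__":
    # Assumed footprint: 8 vCPU and 16 GB RAM across the cluster.
    fee = production_platform_fee(vcpus=8, ram_gb=16)
    print(f"Estimated platform fee: ${fee:.2f}/month")
```

For the assumed footprint this comes to 8 × $5 + 16 × $1 = $56 per month in platform fee, on top of whatever AWS charges for the underlying instances.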
## How Ownkube compares

| Capability | Ownkube | Heroku | Render | Railway |
|---|---|---|---|---|
| Runs in your cloud account | Yes | No | No | No |
| Named ops agents (Cost, Incident, Scaling, Security) | Built-in | No | No | No |
| Preview environments | Full-stack + DB fork | Review apps | Preview | PR deploy |
| Cost optimization | Auto-sleep + spot | Eco dynos | Manual | Manual |
| Burns AWS / GCP startup credits | Yes | No | No | No |
| Vendor lock-in | Low | High | Medium | Medium |

## FAQ

**Which clouds?** AWS today. GCP coming soon.

**Can I use AWS Activate credits?** Yes. Activate credits expire in 12 to 24 months, and Ownkube is built to help you burn them efficiently. Workloads run in your own AWS account, so every EC2 instance, S3 bucket, and byte of bandwidth is billed against your credits at 1:1.

**What if I disconnect?** Infrastructure keeps running. Ownkube uses standard cloud resources only.

**Do I need a platform team?** No. Basic cloud and container literacy helps, but nobody needs infrastructure as their full-time job.

---

# Blog posts (full text)

---

## Kubernetes cost optimization for startups: 7 patterns that cut bills in half

> The 2026 cost-optimization playbook for startups running Kubernetes on AWS or GCP. Right-sizing, spot, idle sleep, namespace quotas, image pulls, NAT routing, and the one structural change that compounds them all.

- Canonical: https://ownkube.io/blog/kubernetes-cost-optimization-startups
- Markdown: https://ownkube.io/blog/kubernetes-cost-optimization-startups.md
- Published: 2026-05-18
- Author: Ownkube team
- Category: Engineering
- Tags: kubernetes-cost, cloud-cost, aws, eks, k3s, startup-infrastructure

The first surprise about a Kubernetes bill at a startup isn't how high it is. It's how predictable the leaks are. Across the small-team AWS and GCP bills we audit, the same seven patterns drive 70 to 90% of the waste, and they're all fixable in a single quarter. This post is the consolidated playbook.
We'll cover the seven patterns, the realistic savings on each, and the one structural change that compounds them: putting the cost optimization itself on autopilot.

**Skim answer:**

- **The seven highest-leverage patterns:** right-sizing, spot capacity, idle environment sleep, namespace quotas, image-pull traffic reduction, NAT-routing audits, and storage class right-sizing.
- **Combined impact:** typically cuts a small-team Kubernetes bill by 40 to 65%.
- **Timeline:** all seven are fixable in a single quarter.

## Why startup K8s bills are usually 2x what they should be

The math behind the typical waste:

- Most teams set resource requests at 2x to 4x what the workload actually uses, "to be safe". Cluster autoscaler then provisions nodes for the inflated request, not the real usage.
- Idle preview environments and staging clusters run 24/7 even though they're used 30 hours a week.
- Container images get pulled from public registries through NAT gateways, racking up data-processing fees nobody sees.
- EBS volumes are provisioned at default gp2 sizes that exceed real I/O needs by an order of magnitude.

None of these are dramatic failures. They're small consistent bleeds. Add up enough of them and the cluster bill is double what it should be.

## Pattern 1: Right-size every workload

The single largest lever. Most workloads we see have resource requests that were set during initial deployment and never revisited. A typical web pod might have a 1 vCPU / 1 GB RAM request and a 95th-percentile usage of 0.18 vCPU / 240 MB RAM. Roughly 80% of the request is reserved, paid for, and unused.

**The fix**: use [Vertical Pod Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler) in recommendation mode for one to two weeks, then apply the recommendations. For long-tail workloads with bursty traffic, use VPA in `Auto` mode with sensible min/max bounds.

**Realistic savings**: 25 to 45% of cluster compute.
The single biggest line-item improvement on every audit.

## Pattern 2: Spot capacity for the workloads that tolerate it

Stateless web pods, queue consumers, build runners, batch jobs, and preview environments are excellent candidates for spot. Database primaries, control plane nodes, and single-replica stateful services are not. We covered the full pattern in [AWS spot instances in production](/blog/aws-spot-instances-production-guide).

**The fix**: mixed instance pools with on-demand base + spot for the rest. Karpenter or [Cluster Autoscaler with mixed ASGs](https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-purchase-options.html) handles the orchestration. Set `PodDisruptionBudgets` and 30 to 120 second graceful shutdown windows.

**Realistic savings**: 50 to 70% off the spot-eligible portion of the compute bill. For a typical small-team workload, that translates to roughly 30 to 50% off the total cluster compute.

## Pattern 3: Sleep idle environments

Staging, preview, and developer sandbox environments sit idle the majority of the week. A preview environment created on Monday for a PR that merges Friday runs for ~120 hours while being used for maybe 4 hours. The other 116 hours are pure waste.

**The fix**: implement an idle-detection loop that scales preview deployments to zero after N hours of no traffic, then scales back up on the next request. Tools like KEDA, scaler-controller, or platform-layer features handle this.

**Realistic savings**: 50 to 75% of preview environment cost. For teams with 5 to 20 active previews at any time, this can be the second-biggest single-pattern saving.

## Pattern 4: Namespace quotas and request governance

Without `ResourceQuota` on each namespace, any service team can ship a deployment that requests more than it needs. Quotas force the conversation: "you want 8 vCPU for this worker, defend it."

**The fix**: a `ResourceQuota` per namespace pinned to a realistic budget.
A `LimitRange` to set sensible default requests for pods without explicit requests. A monthly review where the quotas are revisited.

**Realistic savings**: 10 to 20% indirect via culture change. The bigger win is preventing future drift.

## Pattern 5: Pull images from a private registry

Most teams pull container images from Docker Hub, GitHub Container Registry, or Quay through a NAT gateway. Each pull is a few hundred MB. At a busy CI fleet plus rolling production deploys, that's tens of GB per day routed through NAT at $0.045 per GB.

**The fix**: mirror your base images to a private [Amazon ECR](https://aws.amazon.com/ecr/) registry (or GCP Artifact Registry), enable the ECR VPC interface endpoints (`ecr.api` and `ecr.dkr`) plus the S3 gateway endpoint that serves the image layers, and configure your image pull policy to use the mirror. Image pulls now stay on the AWS backbone at near-zero per-GB cost.

**Realistic savings**: $50 to $400 per month for a typical small team, more for high-deploy-rate orgs.

## Pattern 6: Audit NAT routing

NAT gateway data-processing fees are the most under-noticed cost on AWS bills. We covered the full pattern in our [NAT gateway cost guide](/blog/aws-nat-gateway-cost-fix). The short version:

- Enable S3 and DynamoDB gateway endpoints (free) on every VPC.
- Add interface endpoints for high-volume AWS services (Secrets Manager, STS, CloudWatch Logs).
- Dual-stack your VPC and route IPv6 traffic through a free egress-only IGW.
- Reduce NAT topology from 3 AZs to 2 if availability tolerates.

**Realistic savings**: $200 to $1,500 per month depending on cluster traffic.

## Pattern 7: Storage class right-sizing

EBS gp3 has replaced gp2 as the default sensible choice for most cluster volumes. It's cheaper, faster, and you pay only for the IOPS you provision (rather than the IOPS that scale with volume size on gp2).

**The fix**: migrate from gp2 to gp3 across your cluster. Right-size volume capacity (most workloads provision 100 GB when they use 12 GB).
For workloads with very low I/O, consider [sc1 or st1](https://aws.amazon.com/ebs/cold-hdd/) for cold data.

**Realistic savings**: 20 to 35% on EBS line items.

## The structural change that compounds them all

Each of the seven patterns above is a one-time engineering project. Implementing them all takes 2 to 4 weeks of focused work for a small team. The harder question is: what stops the drift from coming back six months later?

The honest answer is "nothing, unless someone owns it." Without an explicit owner, request inflation creeps back in, new services get deployed without quotas, new environments forget the sleep schedule, new container pulls go through NAT, and the cluster bill returns to its old shape within two quarters.

The structural fix is to put the cost watch on autopilot. At [Ownkube](https://ownkube.io) the Cost agent does exactly this, inside your own cloud account:

- **Right-sizing**: continuous VPA-style recommendations applied with safety thresholds. Sample output: "api-worker over-provisioned: 2GB allocated, 340MB peak. Right-sized. ~$18/mo saved."
- **Spot ratio tracking**: realized spot savings reported monthly. Sample output: "Spot ratio: 78%. Realized savings vs on-demand: $612 last month."
- **Idle sleep**: previews auto-scaled to zero after 4 hours of inactivity, scaled up on first request.
- **NAT and image-pull audits**: anomaly detection flags new patterns that drive NAT cost.

You still own the architectural decisions. The agents handle the recurring vigilance.

## A worked example

Take a typical 2026 SaaS running Kubernetes on AWS: 1 cluster, ~24 vCPU production fleet, 8 vCPU staging, 10 active preview environments, RDS Postgres, ElastiCache.

**Before optimization**: ~$2,400/month cluster compute + $180 NAT + $90 EBS = **$2,670/month**.
**After applying all seven patterns**:

| Pattern | Saving |
|---|---|
| Right-sizing (Pattern 1) | -$840 |
| Spot capacity (Pattern 2) | -$520 |
| Idle environment sleep (Pattern 3) | -$280 |
| Namespace quotas + governance (Pattern 4) | -$120 |
| Private image registry (Pattern 5) | -$140 |
| NAT routing audit (Pattern 6) | -$110 |
| Storage class right-sizing (Pattern 7) | -$30 |
| **Total saved** | **-$2,040** |

**After**: ~$630/month. About a 76% reduction on the cluster bill.

Numbers are illustrative for a defined workload. Your savings will vary.

## Decision checklist

Before you start, confirm:

- [ ] Do you have a way to measure current per-workload resource usage (Prometheus, CloudWatch, Datadog)?
- [ ] Is your cluster on Kubernetes 1.27+ (so VPA and Karpenter work cleanly)?
- [ ] Do you have at least one engineer with a couple of weeks of focused time?
- [ ] Is anyone going to own ongoing cost vigilance after the initial pass, or do you need that on autopilot?

If you ticked all four, you're set up to do the work in-house. If you ticked three or fewer, consider a platform layer that runs these patterns by default.

## Closing

Kubernetes cost optimization at a small startup isn't a mysterious art. It's seven well-understood patterns plus one structural change to stop the drift from coming back. Implement the seven, and you'll halve the cluster bill within a quarter. Put the watch on autopilot, and it stays halved.

If you'd rather skip the initial work and start with the patterns already applied, Ownkube runs them by default inside your own AWS account, and the Cost agent watches the drift. [Connect your cloud and try it](https://app.ownkube.io/signup).

---

## Fly.io alternative in 2026: when teams move to their own AWS account

> Why teams leave Fly.io in 2026 (region coverage, compliance, AWS credits, bill predictability) and what the move to your own AWS account actually looks like for a small team.
- Canonical: https://ownkube.io/blog/fly-io-alternative-own-aws
- Markdown: https://ownkube.io/blog/fly-io-alternative-own-aws.md
- Published: 2026-05-15
- Author: Ownkube team
- Category: Engineering
- Tags: fly-io-alternative, platform-engineering, aws, self-hosted, heroku-alternative

[Fly.io](https://fly.io) is one of the best platform products of the last five years. The Firecracker microVMs are fast, the region story is real, the developer experience is sharp, and the team has built a genuinely opinionated take on what "global by default" should mean. For a lot of small projects in 2026, Fly is still our recommendation.

This post is for the teams who've outgrown it. The triggers tend to be specific (a customer asking for AWS-only data residency, an AWS Activate balance you're not capturing, a bill that's drifted into uncomfortable territory), and the next move isn't obvious. We'll cover where Fly stops fitting, who should stay, what your AWS-native alternative actually looks like, and the honest cost of the migration.

**Skim answer:**

- **Why teams leave Fly:** customers require AWS data residency, they qualify for AWS Activate credits they can't redeem on Fly, their Fly bill crosses $1,500 a month with surprises, or they hit Fly-specific scaling and observability limits.
- **2026 alternative most small teams pick:** run on their own AWS account at wholesale rates with a platform layer (e.g. Ownkube) on top.
- **What they keep:** the deploy ergonomics they liked on Fly.

## Where Fly.io is still the right call

Be fair to the incumbent first. Fly remains the strongest choice when:

- You need genuinely low-latency reads in 5+ regions and your app architecture supports it (Litestream, Fly Postgres with read replicas in-region, region-aware request routing).
- Your team is 1 to 5 engineers and you don't want to learn AWS.
- Your workload is naturally stateful at the application layer (game servers, real-time collaboration, anything where stickiness matters) and Fly's "place this VM here" model is a feature, not a bug.
- You don't qualify for, or aren't going to use, AWS Activate credits.

If two or more describe you, stay on Fly. Stop reading.

## Why teams eventually leave

The migration triggers we see in 2026 are remarkably consistent.

**Trigger 1: AWS credits you can't capture.** A funded startup with up to $100,000 in AWS Activate credits and a Fly bill is, mathematically, lighting half the credits on fire. We wrote up the underlying math in our [AWS Activate credits guide](/blog/aws-activate-credits-guide-2026). The short version: credits redeem against AWS spend, not Fly bills, and the validity window is 12 to 24 months.

**Trigger 2: Customer or compliance requires AWS specifically.** SOC 2, HIPAA, and most enterprise procurement are doable on Fly's infrastructure, but the path is shorter on AWS. When a customer's security team says "we need your workloads in our region of AWS, under our KMS keys", Fly is the wrong substrate.

**Trigger 3: The bill drifts.** Fly's metered model is great when traffic is small. It surprises you when traffic spikes, when you accidentally pin a VM that should have been autoscaled-down, or when egress costs compound. Several teams in our network report unexpected $3K to $8K monthly Fly bills that, when migrated, ran $700 to $1,800 on wholesale AWS.

**Trigger 4: Operational limits in 2026.** Fly's observability story is improving but still trails AWS-native (CloudWatch, OTel, X-Ray). Multi-region Postgres on Fly is good for some patterns and awkward for others. Some teams report tail-latency variance that AWS reserved capacity doesn't have.

If three or more of these apply, the move is usually worth the migration cost.
## What the AWS-native alternative looks like

The basic shape:

- **Compute**: EKS (multi-AZ) or k3s on EC2, depending on team size and traffic.
- **Database**: Managed RDS Postgres in the same region (Multi-AZ for production). For multi-region read replicas, use RDS cross-region replication.
- **Caching**: ElastiCache (Redis or Memcached).
- **Edge**: Cloudflare or CloudFront. We default to Cloudflare for managed DNS, preview domains, DDoS, bot, and scrape protection out of the box.
- **Storage**: S3 for artifacts, build cache, and large objects.
- **Secrets**: AWS Secrets Manager with workload identity via IRSA on EKS.
- **Observability**: CloudWatch + OTel, optionally fanned out to Datadog/Honeycomb/etc.

The deploy ergonomics that made Fly nice (git push, preview environments, the abstraction over individual VMs) come from the platform layer on top, not from AWS directly. That's the part most teams underweight.

## Two roads to that alternative

Two practical paths exist.

### Road 1: Stand it all up yourself with a [DevOps engineer](/blog/devops-engineer-salary-cost-2026)

You hire (or borrow) a platform engineer, they build the cluster, configure the deploy pipeline, wire up observability, stand up preview environments, design the secrets story.

**Time to production**: 4 to 12 weeks for a competent engineer, longer if they're learning.

**Loaded cost**: ~$200K/year for the hire, plus tooling licenses, plus your engineering attention. We wrote about that loaded cost in our [DevOps engineer salary breakdown](/blog/devops-engineer-salary-cost-2026).

**Verdict**: appropriate if you're 25+ engineers and the platform engineer has follow-on work after the migration. Heavy if you're smaller.
### Road 2: Use a managed platform layer that runs in your AWS account

A product like [Ownkube](https://ownkube.io) installs in your AWS account, provisions the cluster, wires up the deploy pipeline, configures preview environments on a Cloudflare-managed domain, sets up secrets and observability, and includes a small team of named agents that handle recurring ops:

- **Cost agent**: right-sizes workloads, sleeps idle previews, flags spend anomalies.
- **Incident agent**: reads crashes and explains them in plain English.
- **Scaling agent**: manages replica counts and spot capacity ahead of traffic spikes.
- **Security agent**: flags IAM drift, exposed secrets, CVEs.

**Time to production**: hours to days for a small team. Connect the AWS account, point the platform at your git repo, deploy.

**Cost**: free on the Starter (k3s) tier (one AWS instance, fits side projects and small-team production). $5 per vCPU + $1 per GB RAM per month on the Production (EKS) tier when you scale.

**Verdict**: appropriate if you're 5 to 30 engineers and you want the migration to take days, not months.

## A worked example

A small SaaS on Fly with: 1 web service, 2 background workers, Fly Postgres, multi-region read replica, a handful of preview environments, ~3 TB/month egress.

**On Fly.io (2026 metered pricing)**: approximately $1,800 to $2,800 per month.

**On AWS at wholesale (EKS multi-AZ + RDS Multi-AZ + ElastiCache + Cloudflare edge)**: approximately $700 to $1,100 per month in AWS spend, fully redeemable against Activate credits.

**Platform layer on top**: $0 in k3s mode (if traffic fits) or ~$150 to $250 per month in EKS mode for this footprint.

Year-1 cash impact for a credit-funded startup: from ~$24K out the door on Fly to under $3K on the AWS-native setup with Ownkube on top.

## The migration is real work, but bounded

A common myth is that leaving Fly means months of rewrites. For most small teams the actual work is bounded:

- **Containerize what isn't already**. Most Fly apps are already containers.
  If not, this is a half day.
- **Move the database**. RDS Postgres logical replication or `pg_dump` + restore. For a 30 GB Postgres, plan on 4 to 8 hours including verification.
- **Re-point DNS**. Cloudflare-managed if you're using a platform layer. Half day.
- **Re-wire CI**. Push to a different registry, deploy to a different cluster. Half day.
- **Validate preview environments and observability**. One day of testing.

Total elapsed time for a small team: 1 to 2 weeks of focused work. Not free, but not the multi-month project some make it out to be.

## When NOT to migrate

Stay on Fly if:

- You're under $500/month of Fly spend.
- Your application architecture genuinely benefits from Fly's per-VM region placement model.
- You don't have AWS Activate credits and aren't on a compliance path that requires AWS.
- Your team has 1 to 3 engineers and the operational simplicity is worth more than the bill delta.

The argument is not "Fly is bad". The argument is "the math has flipped for a specific class of team".

## Decision checklist

- [ ] Is your Fly bill over $1,500/month?
- [ ] Do you have AWS Activate credits or GCP credits you're not redeeming?
- [ ] Is a customer or compliance requirement pushing you toward AWS specifically?
- [ ] Are you 5+ engineers and growing?
- [ ] Do you need observability or networking primitives Fly doesn't expose?

Three or more yeses: the migration usually pays back inside the first 90 days.

## Closing

Fly.io is still one of the best platform products of its generation, and we're not in the business of arguing otherwise. The honest 2026 reality is just that some teams outgrow the model, and the next stop isn't another shared PaaS. It's your own AWS account, with a platform layer that gives you the deploy ergonomics you liked.

If that's the move you're sketching, Ownkube is built for the migration. Free on a Starter cluster (one AWS instance), $5 per vCPU + $1 per GB RAM when you scale.
[Connect your cloud and try it](https://app.ownkube.io/signup).

---

## AWS spot instances in production: the 2026 playbook for safe 60% to 80% savings

> How to safely run production workloads on AWS spot instances in 2026. Interruption handling, fallback patterns, realistic savings benchmarks, and the workloads that should never go on spot.

- Canonical: https://ownkube.io/blog/aws-spot-instances-production-guide
- Markdown: https://ownkube.io/blog/aws-spot-instances-production-guide.md
- Published: 2026-05-13
- Author: Ownkube team
- Category: Engineering
- Tags: aws-spot-instances, aws-cost, cloud-cost, ec2, kubernetes

The cheapest EC2 capacity AWS sells is also the one most teams refuse to use. Spot instances run at 60 to 80% off on-demand pricing in 2026 (depending on instance family and region), and a healthy share of workloads at every startup we've audited could safely run on them. The reason most teams don't: a vague memory of an interruption story from 2017, and no clean pattern for handling reclaim events.

**Skim answer:**

- **What they are:** spare EC2 capacity that AWS can reclaim when on-demand demand spikes.
- **What they cost in 2026:** 60 to 80% less than on-demand.
- **Safe for:** stateless web pods, queue consumers, build runners, batch jobs, and most preview environments.
- **Unsafe for:** stateful primaries (Postgres, Redis), control planes (Kubernetes masters), and any single-replica workload with hard SLAs.
- **Right answer for most small teams:** mixed. On-demand for the few stateful things, spot for everything else.

This post is the playbook for that split.

## How spot pricing actually works in 2026

AWS sells the same EC2 capacity in three flavors:

| Type | Price (relative) | Reclaim behavior |
|---|---|---|
| On-demand | 100% | None. Yours until you stop it. |
| Savings Plans / RIs | 50 to 70% of on-demand | None. Pre-committed for 1 or 3 years. |
| Spot | 20 to 40% of on-demand | AWS can reclaim with a 2-minute notice. |

Spot pricing in 2026 is largely steady. The wild 5-minute price swings of 2018 are gone; today most instance families show stable spot prices within a band, with occasional reclaim events during regional capacity crunches.

Real numbers from `us-east-1` in April 2026 (approximate, varies by hour):

| Instance | On-demand $/hr | Spot $/hr | Savings |
|---|---|---|---|
| t3.xlarge | $0.166 | $0.045 | 73% |
| m6i.large | $0.096 | $0.027 | 72% |
| c7g.large | $0.072 | $0.022 | 69% |
| r6i.large | $0.126 | $0.034 | 73% |
| g5.xlarge (GPU) | $1.006 | $0.291 | 71% |

The spot discount is real. The question is operational: what workload can tolerate a 2-minute eviction notice without breaking a customer experience?

## What's safe on spot

Workloads that handle interruption gracefully:

- **Stateless web pods behind a load balancer**. The load balancer drains an evicted pod; another pod on a different node takes over. As long as you have more than one replica and the cluster has on-demand fallback capacity, the customer sees nothing.
- **Queue consumers**. A worker that's halfway through a message either finishes (2 minutes is usually enough) or the message returns to the queue and another worker picks it up. Design for idempotency, which you should be doing anyway.
- **Build runners**. GitHub Actions self-hosted runners, GitLab runners. If a build dies, retry. The cost saving on a busy CI fleet is significant.
- **Batch jobs and cron**. Same logic. Idempotent batch jobs survive interruption. Non-idempotent jobs need to be made idempotent before going on spot.
- **Preview environments**. Per-PR environments are by definition transient. Spot is perfect for them.
- **ML training and inference at large scale**. Most training frameworks support checkpointing; modern inference is multi-replica behind a load balancer.

## What's unsafe on spot

These should stay on on-demand or reserved capacity:

- **Database primaries**. Postgres, MySQL, MongoDB, Redis.
  A 2-minute notice is not enough to safely fail over a database. Use spot for read replicas, not the primary.
- **Stateful services with no replication**. Single-replica workloads of any kind.
- **Kubernetes control plane nodes**. If you're running k3s on EC2 (instead of managed EKS), keep the control-plane node on on-demand.
- **Anything with a hard latency SLA**. The reclaim event introduces a brief disruption. If your SLA is "99.99% under 50ms p99", spot adds variance you can't afford.
- **Workloads with long stateful operations**. A video transcoder mid-job, a long-running data migration, anything that loses progress when interrupted.

## The patterns that make spot safe

Three patterns turn a "spot is too scary" team into a "we run mostly on spot" team:

### Pattern 1: Mixed instance pools with on-demand fallback

Configure your autoscaling group or Karpenter (on EKS) with a base of on-demand capacity plus spot for the rest. The on-demand base absorbs spot reclaim events while replacement spot capacity provisions.

A typical small-startup ratio: 25% on-demand, 75% spot. For an 8-vCPU production fleet, that's 2 vCPU of on-demand always there, with 6 vCPU of cheaper spot capacity layered on top. If spot is reclaimed, the workload continues on the on-demand base while new spot capacity comes online (typically under 5 minutes).

### Pattern 2: Diversified instance families and AZs

Spot reclaim events are usually scoped to a specific instance family in a specific AZ. If your pool is "any of m6i.large, m6a.large, c6i.large, c6a.large across 3 AZs", a reclaim event on one type rarely takes the whole pool down.

Both `eksctl` and `karpenter` make this trivial. Configure 4 to 6 instance types in the same broad family (general-purpose, compute-optimized, memory-optimized) and let the scheduler pick.
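To put rough numbers on the 25% on-demand / 75% spot split from Pattern 1, here is a minimal sketch that prices an 8-vCPU fleet built from m6i.large nodes. The prices are the approximate `us-east-1` figures quoted earlier in this post; the helper names, the choice of a single instance type, and the "at least one on-demand node" rule are assumptions made for illustration, not how any particular autoscaler allocates capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstancePrice:
    vcpus: int
    on_demand_per_hr: float
    spot_per_hr: float


# Approximate April 2026 us-east-1 prices from the table in this post.
# An m6i.large has 2 vCPU.
M6I_LARGE = InstancePrice(vcpus=2, on_demand_per_hr=0.096, spot_per_hr=0.027)


def split_fleet(total_vcpus: int, price: InstancePrice,
                on_demand_share: float) -> tuple[int, int]:
    """Return (on_demand_instances, spot_instances) for a mixed pool."""
    if total_vcpus <= 0:
        raise ValueError("total_vcpus must be positive")
    instances = -(-total_vcpus // price.vcpus)  # ceiling division
    # Assumption: always keep at least one on-demand node as the fallback base.
    on_demand = min(instances, max(1, round(instances * on_demand_share)))
    return on_demand, instances - on_demand


def hourly_cost(on_demand: int, spot: int, price: InstancePrice) -> float:
    """Hourly cost in USD for the given mix of on-demand and spot nodes."""
    return on_demand * price.on_demand_per_hr + spot * price.spot_per_hr


if __name__ == "__main__":
    on_demand, spot = split_fleet(8, M6I_LARGE, on_demand_share=0.25)
    mixed = hourly_cost(on_demand, spot, M6I_LARGE)
    baseline = hourly_cost(on_demand + spot, 0, M6I_LARGE)
    print(f"{on_demand} on-demand + {spot} spot: ${mixed:.3f}/hr "
          f"vs ${baseline:.3f}/hr all on-demand "
          f"({1 - mixed / baseline:.1%} saved)")
```

With these inputs the fleet is 1 on-demand node plus 3 spot nodes, which works out to about $0.177/hr against $0.384/hr all on-demand, roughly 54% off. The blended figure is lower than the per-instance spot discount because the on-demand base is still paid at full price; that is the cost of having a fallback layer.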
### Pattern 3: Pod disruption budgets and graceful shutdowns

In Kubernetes, set [`PodDisruptionBudgets`](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) so the cluster knows the minimum-available count for each workload. When a spot node is reclaimed, Kubernetes drains pods according to those budgets, giving each container a `terminationGracePeriodSeconds` window to finish what it's doing.

For most web pods, 30 seconds is enough. For queue consumers processing a single message, 90 to 120 seconds is closer to right. The spot notice is 2 minutes, so design termination grace to fit inside it.

## A worked example

Take a typical 2026 SaaS: 1 web service (3 replicas), 2 background workers (2 replicas each), 5 preview environments, build runners, RDS Postgres (Multi-AZ) and ElastiCache (managed AWS, untouched by this exercise).

| Workload | Replicas | Spot strategy | Approx. monthly EC2 savings |
|---|---|---|---|
| Web pods | 3 | 1 on-demand, 2 spot | $120 |
| Worker pods | 4 | All spot | $200 |
| Preview envs | 5 | All spot | $180 |
| Build runners | variable | All spot | $80 |
| Control plane (k3s) | 1 | On-demand | $0 (baseline) |

**Total**: roughly $580 per month saved on a base compute bill of ~$800. About 70% off on a workload that's already pretty modest. For a Series A SaaS pushing 50+ vCPU of production traffic, the saving compounds into mid-five figures annually.

## When NOT to bother

Be honest with yourself.

- **Very small workloads** (1 to 2 small instances, under $100/month total) don't move the needle. The engineering time to configure spot exceeds the saving.
- **Pre-product-market-fit teams** should not optimize compute cost. Ship features. Optimize when growth makes the bill visible.
- **Workloads governed by deeply regulated SLAs** (financial trading, real-time medical) shouldn't introduce spot variance.
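One way to judge whether your own fleet clears the "don't bother" bar is to redo the worked-example arithmetic with your numbers. The sketch below re-derives the example's total from the per-workload rows; the dictionary and function names are illustrative assumptions, and the ~$800 base bill is approximate in the text, so the percentage is approximate too.

```python
# Monthly EC2 savings per workload, copied from the worked-example table.
MONTHLY_SAVINGS_USD = {
    "web pods": 120,
    "worker pods": 200,
    "preview envs": 180,
    "build runners": 80,
    "control plane (k3s)": 0,  # stays on-demand, so no saving
}

# The text gives the base compute bill as "~$800", so treat results as rough.
BASE_COMPUTE_BILL_USD = 800


def summarize(savings: dict[str, int], base_bill: float) -> tuple[int, float]:
    """Return (total monthly saving in USD, saving as a percent of the base bill)."""
    if base_bill <= 0:
        raise ValueError("base_bill must be positive")
    total = sum(savings.values())
    return total, total / base_bill * 100


if __name__ == "__main__":
    total, percent = summarize(MONTHLY_SAVINGS_USD, BASE_COMPUTE_BILL_USD)
    print(f"Saved ${total}/month, about {percent:.1f}% of the base compute bill")
```

The rows sum to $580, or about 72.5% of the ~$800 base, which matches the "about 70% off" figure in the example. Swapping in your own rows makes it easy to see when the total is too small to justify the configuration work.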
## How the platform layer handles this A point worth making: spot configuration is the kind of work nobody on a 5- to 20-person team gets around to doing well. The instance-family diversification, the on-demand fallback, the pod disruption budgets, they sit on someone's "next quarter" list for years. At [Ownkube](https://ownkube.io) the platform layer ships with mixed ASG and Karpenter-driven node selection by default. Stateless workloads go on spot with on-demand fallback automatically. The Cost agent tracks the realized savings vs. the on-demand baseline and reports it in the dashboard. Sample output: "Spot ratio: 78%. Realized savings vs on-demand: $612 last month. No reclaim-induced incidents." You don't have to configure any of this. It's the default. ## Decision checklist Before flipping a workload to spot: - [ ] Does the workload have more than one replica behind a load balancer or queue? - [ ] Is the workload idempotent (or made idempotent) so an interruption doesn't corrupt state? - [ ] Have you configured `terminationGracePeriodSeconds` (or equivalent) inside the 2-minute spot notice window? - [ ] Do you have an on-demand fallback layer that absorbs reclaim events? - [ ] Is your monitoring set up to alert on actual customer impact (not just "node went away")? Five yeses, go to spot. Any no, fix that first. ## Closing Spot is one of the highest-leverage cost moves on AWS in 2026, and it's massively underused by small teams. The pattern is well-understood: mixed pools, instance diversification, graceful shutdowns, and an on-demand base. With those four in place, 60 to 80% off compute is a real saving, not a paper exercise. If you'd rather have the platform layer set those defaults for you (and a Cost agent that watches the realized savings), Ownkube runs the spot story for you inside your own AWS account. [Connect your cloud and try it](https://app.ownkube.io/signup). 
--- ## What is an internal developer platform (IDP), and when does a small team actually need one? > A plain-English 2026 definition of internal developer platforms, the four signals that say your team needs one, and the build-vs-buy framework most growing startups end up using. - Canonical: https://ownkube.io/blog/internal-developer-platform-2026 - Markdown: https://ownkube.io/blog/internal-developer-platform-2026.md - Published: 2026-05-11 - Author: Ownkube team - Category: Engineering - Tags: internal-developer-platform, platform-engineering, developer-experience, devops "Internal developer platform" (IDP) is the most overloaded term in the 2026 infrastructure conversation. Some people mean a Backstage portal. Some mean a Kubernetes setup with golden paths. Some mean a custom-built layer on top of AWS that abstracts away the rough edges. All of those are correct. None of them are useful when you're a 10-person team trying to decide what to actually do. This post is the operator-level definition. We'll cover what an IDP is in plain English, the four signals that say your team has crossed the line where one would help, the three common shapes IDPs take in the wild, and a build-vs-buy framework that most growing startups settle into. **Skim answer:** - **What it is:** software that gives engineers a self-serve way to deploy applications, run services, and access infrastructure without filing tickets or memorizing cloud-vendor consoles. - **Who doesn't need one:** a 5-engineer team. - **Who almost always does:** a 25-engineer team. - **The crossover:** around 10 to 15 engineers, when ad-hoc DevOps starts taxing senior engineers' weeks. ## What an IDP actually is Strip away the conference-talk language and an IDP is a layer with five jobs: 1. **A self-serve deploy path**. An engineer pushes code, the platform builds and deploys it. No tickets, no Slack DMs to the one person who knows how the cluster works. 2. **Golden paths**. 
Opinionated templates for common patterns: a web service, a worker, a cron, a database, a queue. New services start from the template, not from a blank YAML file. 3. **Environment management**. Production, staging, and preview environments per pull request, on demand. 4. **Observability and access**. Logs, metrics, and SSO into running containers, available to the engineer who owns the service, not gatekept by a platform team. 5. **Guardrails**. Policies that prevent obvious mistakes (a workload without resource limits, a public S3 bucket, a secret in plain text) without blocking velocity. Notice what's not on the list. An IDP is not a Kubernetes cluster. Kubernetes is one possible substrate. An IDP is not Backstage. Backstage is one possible portal. An IDP is not a custom Terraform monorepo. That's one possible implementation. The IDP is the **experience** the engineer has, regardless of what's underneath. ## The four signals that say you need one We see startups try to spec out an IDP project for two reasons: a real signal that the current ad-hoc setup is breaking, or a vanity project ("Stripe has Backstage so we should have Backstage"). The second is expensive. Watch for the first. **Signal 1: Senior engineers are spending more than a day a week on deploys, alerts, and infra tickets.** This is the canonical break-even moment. The cost of building or buying a platform is justified when you have measurable engineering time being burned on recurring ops. **Signal 2: New hires take more than a week to ship their first production change.** If the answer to "how do I deploy this?" requires a half-day onboarding session, the platform is the bottleneck. A good IDP makes this a 30-minute walkthrough. **Signal 3: You have more than two flavors of "how to deploy a service" in production.** Some services use one set of scripts, some another, some are deployed by hand. This is a leading indicator of a real incident waiting to happen. 
**Signal 4: Compliance is asking who can access what, and you can't answer in less than an hour.** SOC 2, HIPAA, and enterprise procurement all want auditable, policy-enforced access. An IDP centralizes the answer. If two or more apply, you're past the line. If only one, you can probably hold off another quarter and revisit. ## The three shapes IDPs take in 2026 In the wild, IDPs land in one of three patterns. None of them is wrong; they're each right for a different team size and stage. ### Shape 1: The custom-built monorepo A platform team builds and operates the IDP in-house. Usually on top of Kubernetes (EKS for AWS shops), with Terraform or Pulumi for cloud resources, ArgoCD or Flux for GitOps, Backstage or a custom portal for developer self-serve, OPA or Kyverno for policy. **Best for**: large engineering organizations (100+ engineers) where the platform team can be 5+ people, and where the business value of a perfectly bespoke developer experience justifies the multi-year roadmap. **The honest cost**: a dedicated platform team of 4 to 8, plus tooling licenses, plus the ongoing maintenance burden as Kubernetes versions, AWS services, and security policies evolve. ### Shape 2: The Backstage-style portal on existing infrastructure A smaller team adopts Backstage (or similar) as a service catalog and developer portal, on top of whatever cloud infrastructure they already run. **Best for**: mid-size organizations (30 to 100 engineers) that already have a platform engineering function and want a unified front-end across heterogeneous backends. **Where it stops fitting**: Backstage is a portal, not a platform. It doesn't deploy your services for you. Underneath, you still need the platform substrate. For most startups, the cost of building the substrate is the actual work. 
### Shape 3: A platform layer that already exists, on your cloud account

A managed product that runs in your own AWS, GCP, or Azure account and gives you the self-serve deploy, golden paths, environment management, observability, and guardrails out of the box. You don't operate the platform; you use it.

**Best for**: 5- to 50-engineer startups that want IDP-level developer experience without standing up a platform team to build it.

**Where it stops fitting**: if you have genuinely bespoke infrastructure that no managed product abstracts (real-time streaming on custom hardware, regulated workloads with unusual constraints), you'll end up customizing the platform layer or running parallel custom infrastructure.

## The build vs buy framework

When teams ask us "should we build or buy", the framework we use:

| Question | Build | Buy |
|---|---|---|
| Team size | 30+ engineers | 5 to 30 engineers |
| Dedicated platform owners | 3+ committed for 18+ months | Fewer than 3, or part-time |
| Cloud surface | Multi-cloud, heterogeneous | Single cloud (AWS, GCP, or Azure) |
| Workload uniformity | Highly diverse (real-time, ML, batch, web) | Standard web, workers, cron, DB |
| Compliance pressure | Hard requirements (FedRAMP, regulated industries) | Standard (SOC 2, HIPAA) |
| Time to value | Can afford 12 to 18 months | Need IDP-level DX inside one quarter |
| Strategic differentiation | DX is a competitive advantage | DX is necessary, not differentiating |

Most 2026 startups land firmly on the "buy" side of every row. The honest reason most "build" projects start is that one senior engineer wants to design a platform. That's a fine motivation if you have the size and runway. If you don't, it ends with a half-finished platform and a smaller team.

## What the agent-based version looks like

A point that's specific to 2026: the build-vs-buy question now has a third option that didn't exist three years ago.
You can buy a platform layer that includes a small team of named agents that handle the recurring operational work the platform produces. At [Ownkube](https://ownkube.io) the agents are: - **Cost agent**. Right-sizes workloads, sleeps idle environments, catches spend anomalies. Sample output: "api-worker over-provisioned: 2GB allocated, 340MB peak. Right-sized. ~$18/mo saved." - **Incident agent**. Reads crashes and explains them in plain English. Sample output: "Your worker tried to load a 2GB dataset into 512MB RAM. OOMKilled at 14:32." - **Scaling agent**. Manages replica counts and spot capacity ahead of traffic spikes. Sample output: "Traffic up 2.4x in 5 min. Scaled api-gateway to 3 replicas. ETA: 12s." - **Security agent**. Flags IAM drift, exposed secrets, CVEs on base images. The agents don't replace strategic platform decisions. They cover the recurring checklist that would otherwise either fall on a senior engineer or push you to hire a [DevOps engineer at ~$200K loaded](/blog/devops-engineer-salary-cost-2026). For a 5- to 30-engineer team, that's the gap that makes the platform layer credible as a substitute for hiring a platform team. ## When you genuinely don't need an IDP Be honest with yourself. If you're 5 engineers, you ship a couple of services, your AWS bill is under $1,000 a month, and nothing in the four signals applies to you, **you don't need a platform**. You need a deploy script and an alerting integration. The temptation to over-engineer at small team size is real and expensive. Revisit the question every quarter. The signals usually trip before you expect them to. ## Decision checklist - [ ] Are senior engineers spending more than a day a week on deploys and infra tickets? - [ ] Is onboarding a new hire to production taking more than a week? - [ ] Are there more than two ways to deploy a service in production today? - [ ] Is compliance pressing on access control and audit trails? 
- [ ] Are you on AWS specifically (because of credits, compliance, or customer requirements)? - [ ] Is your team between 5 and 30 engineers without a dedicated platform owner? Two or more on the first four: you need an IDP. Yes on the last two: you almost certainly want to buy, not build. ## Closing The IDP conversation in 2026 is no longer about whether your team needs one. For most growing startups, the moment is going to arrive between engineers 10 and 25. The conversation is about whether you'll build it, portal-stack it, or buy a managed layer that already exists. If you want a managed IDP that runs in your own AWS account, ships with named agents for the recurring ops, and costs $0 on the free k3s tier for small teams, Ownkube is built for that. [Connect your cloud and try it](https://app.ownkube.io/signup). --- ## Self-hosted PaaS in 2026: Coolify vs Dokku vs CapRover vs Ownkube > A practical comparison of self-hosted PaaS options for small teams who want Heroku-style developer experience without the markup. Setup, scale ceiling, ops burden, and when each one stops fitting. - Canonical: https://ownkube.io/blog/self-hosted-paas-comparison-2026 - Markdown: https://ownkube.io/blog/self-hosted-paas-comparison-2026.md - Published: 2026-05-08 - Author: Ownkube team - Category: Engineering - Tags: self-hosted-paas, coolify, dokku, caprover, heroku-alternative, platform-engineering A self-hosted PaaS sounds like the obvious fix to a marked-up Heroku bill. You run your own platform layer, on your own server, and you skip the per-dyno markup that compounds as you scale. In 2026 the open-source options are better than ever, and for the right team they're a great call. For the wrong team they quietly become a second job. This post compares the four self-hosted and "self-hosted-style" PaaS options we see startups evaluating most often: [Coolify](https://coolify.io), [Dokku](https://dokku.com), [CapRover](https://caprover.com), and [Ownkube](https://ownkube.io). 
We'll cover what each one is, who it fits, where the ceiling is, and the honest tradeoffs. If you're trying to leave Heroku specifically, our [Heroku alternative guide](/blog/heroku-alternative-in-your-own-aws-account) covers the workflow translation in more detail.

## TL;DR on who picks what

- **Indie developer, side project, single server, budget under $20/month**: Dokku or CapRover. Both are battle-tested, both will run happily on a single $10 VPS, both have ten years of community knowledge to crib from. Pick Dokku if you want raw Heroku-buildpack DNA. Pick CapRover if you want a click-through UI.
- **Solo founder or small team, want a clean UI and a wider service catalog**: Coolify. The 2024 to 2026 push made it the prettiest option, and it's added enough first-class services (databases, queues, monitoring) to feel like a real platform.
- **Team of 5 to 20 engineers, on AWS, expecting to scale, wants no DevOps hire**: this is where Ownkube fits. Same self-hosted spirit (runs in your own cloud account), but with a managed control plane and a small team of named agents (Cost, Incident, Scaling, Security) so you don't have to operate the platform yourself.

The rest of this post is the detail behind that call.

## What "self-hosted PaaS" really means

Before the comparison, it's worth being clear about what the category is. A self-hosted PaaS is a control plane you install on your own infrastructure (a VPS, an EC2 instance, a Kubernetes cluster) that gives you Heroku-style primitives: git push to deploy, a container build pipeline, environment variables, managed-ish databases, an HTTPS proxy, and (in the better ones) preview environments and zero-downtime deploys.

The shared promise is the same as Heroku: you push code, you get a URL. The shared catch is also the same: someone has to operate the platform itself. The four products below split on **how much of that operation is software vs. you**.
## The comparison table

| | Coolify | Dokku | CapRover | Ownkube |
|---|---|---|---|---|
| License | Open source (Apache 2) | Open source (MIT) | Open source (Apache 2) | Commercial, runs in your cloud account |
| Underlying engine | Docker + custom orchestrator | Docker + Heroku buildpacks | Docker Swarm | Kubernetes (k3s or EKS) |
| Minimum viable host | 1 VPS, 2 GB RAM | 1 VPS, 1 GB RAM | 1 VPS, 1 GB RAM | 1 EC2 t4g.small (k3s mode) |
| Multi-node scale path | Limited, single-node primary | Limited, single-node | Docker Swarm scaling, fragile beyond a few nodes | EKS multi-AZ when you graduate |
| UI quality | Strong, modern dashboard | Minimal, CLI-first | Mid, click-through UI | Modern dashboard plus named-agent insights |
| Managed databases | First-class, several engines | Plugin-based, mature Postgres plugin | Plugins, less polished | Postgres on EC2 (k3s) or RDS (EKS) |
| Preview environments per PR | Yes, beta | No, you script it | Limited | Yes, on a Cloudflare subdomain out of the box |
| TLS / DNS | Caddy + Let's Encrypt, you manage DNS | Let's Encrypt, you manage DNS | Let's Encrypt, you manage DNS | Cloudflare-managed, no DNS to configure |
| Built-in ops automation | None | None | Limited | Cost, Incident, Scaling, Security agents |
| Best fit team size | 1 to 5 | 1 to 3 | 1 to 5 | 5 to 20 (k3s), 20 to 100 (EKS) |
| Ops burden on you | High | High | High | Low |

Anything not on this table is a rounding error for the buyer decision.

## Coolify: the prettiest open-source option

[Coolify](https://coolify.io) had the strongest year of any self-hosted PaaS in 2024 to 2025 and rolled into 2026 with a sharper UI, a real preview-environments feature, and a service catalog that feels modern. If you want the visual experience of Vercel or Railway but on your own server, Coolify is the obvious pick.

**Strengths.** Clean dashboard. First-class managed services (Postgres, MySQL, MongoDB, Redis, MinIO). Active community. Generous free self-host tier.
**Where it stops fitting.** Coolify is at its best on a single server or a small handful. Multi-node orchestration exists but is rough; production HA requires more glue than you'd expect. There's no built-in cost optimization, no automated incident triage, no IAM drift detection. The platform gives you primitives; the operations are still yours. **Pick Coolify if** you're 1 to 5 engineers, you want a slick UI, and you're comfortable owning the platform yourself. ## Dokku: the durable Heroku clone [Dokku](https://dokku.com) is the OG. Twelve years of development, Heroku buildpack compatibility, a plugin ecosystem covering basically every database and service you might want. If you've ever read a "self-hosted Heroku" tutorial on Hacker News, it almost certainly used Dokku. **Strengths.** Rock-solid on a single host. Buildpacks mean your Heroku-style app code just works. Tiny resource footprint. Predictable behavior. Mature Postgres plugin with backups. **Where it stops fitting.** Single-node by design. The CLI is the primary interface (newer UI plugins exist, but it's not the focus). Preview environments require scripting. No first-class observability or cost tooling. **Pick Dokku if** you're a solo developer, you live in the terminal, and you want the most boring, predictable thing that runs ten years from now. ## CapRover: the click-through middle ground [CapRover](https://caprover.com) sits between Dokku and Coolify. It's built on Docker Swarm, ships with a web UI, supports one-click apps from a community store, and runs happily on a $10 VPS. **Strengths.** Easy to install (one shell command). Active app store with prebuilt deployments. Friendly to non-DevOps founders who want a clickable interface. **Where it stops fitting.** Docker Swarm is the underlying orchestrator, and Swarm has been in maintenance mode for years. Multi-node clusters work but the failure modes are quiet and painful. Less momentum than Coolify in the recent UI race. 
**Pick CapRover if** you want a click-through UI on a single server, you don't mind that the engine underneath is Swarm, and you're not planning to scale past a handful of nodes. ## Ownkube: the team-scale path [Ownkube](https://ownkube.io) is a different shape. Instead of installing a control plane on a server you own, you connect an AWS account, and Ownkube provisions and operates the platform inside it. The product runs in two modes: - **k3s mode (free for teams).** Single-node k3s cluster on EC2, Postgres operated by Ownkube on the same node, Cloudflare-backed preview domain out of the box. Customer pays wholesale EC2 directly to AWS. Best for indie projects, dev environments, and small-team production at low traffic. - **EKS mode ($5 per vCPU + $1 per GB RAM per month).** Multi-AZ EKS cluster, managed RDS Postgres, ElastiCache, ALB, full AWS observability. Best for growing teams and production-traffic SaaS. **Strengths.** - **No DNS to configure.** Cloudflare-managed preview domain on day one. No nameserver hand-off, no waiting for propagation. - **Cost agent.** Right-sizes workloads automatically. Sample output: "api-worker over-provisioned: 2GB allocated, 340MB peak. Right-sized. ~$18/mo saved." - **Incident agent.** Reads crashes and explains them in plain English. Sample output: "Your worker tried to load a 2GB dataset into 512MB RAM. OOMKilled at 14:32." - **Scaling agent.** Manages replica counts and spot capacity. Sample output: "Traffic up 2.4x in 5 min. Scaled api-gateway to 3 replicas. ETA: 12s." - **Security agent.** Flags IAM drift, exposed secrets, and CVEs on base images. - **Vanilla infrastructure underneath.** Your account, your data, your KMS keys. Disconnect anytime and the workloads keep running. - **AWS Activate credits land at wholesale.** No platform markup between you and EC2. **Where it stops fitting.** Ownkube is AWS-first today (GCP and Azure on the roadmap). 
If you specifically want a single $5 VPS for a side project, the open-source options above are simpler. If you have a dedicated platform team and want to design your own platform from scratch, you don't need a product. **Pick Ownkube if** you're 5 to 20 engineers on AWS, you don't want to hire a [DevOps engineer at $200K loaded](/blog/devops-engineer-salary-cost-2026), and you want a free k3s tier with a clear path to EKS as you grow. ## The hidden tradeoff of pure self-hosting Open-source self-hosted PaaS is genuinely free at the license level. The cost shows up elsewhere. Specifically: - **Patching the platform itself.** Coolify, Dokku, and CapRover ship updates regularly. Some are security-critical. You're on the hook for applying them. - **Backup and restore drills.** Plugin-managed Postgres has backups, but verifying restore on a new node is your job. - **Failure modes during traffic.** The platform processes that run your platform (the control plane, the reverse proxy) need their own observability. Otherwise the first sign of trouble is downtime. - **Migration when you outgrow it.** Most self-hosted PaaS options have a clear single-node ceiling. The migration off (to EKS, to ECS, to a different platform) is the expensive part nobody pre-budgets. This is why we built Ownkube on Kubernetes underneath: when you outgrow k3s mode, the EKS mode is the same Kubernetes API, same manifests, same `kubectl`. There's no migration. You graduate. ## Decision checklist Use this to pick: - [ ] How big is the team today? (1 to 3, 4 to 10, 10+) - [ ] What's the maximum traffic you expect in the next 12 months? - [ ] Do you want to own platform operations, or have software own them? - [ ] Are you on AWS specifically (because of credits, compliance, or customer requirements)? - [ ] What's your DNS / TLS comfort level? If the team is small, single-server fits, and you're happy to own ops: Coolify, Dokku, or CapRover. 
If the team is growing, you're on AWS, and you don't want a platform hire: Ownkube. ## Closing Self-hosted PaaS is a real category in 2026, and for indie developers it's better than ever. Coolify, Dokku, and CapRover all earn their place. The honest gap they share is that you're still the operator. For a 5- to 20-person team on AWS, the win is to keep the self-hosted spirit (your account, your data, your credits at wholesale) but let software handle the recurring ops. That's the shape Ownkube takes. Free on a Starter cluster (one AWS instance), $5 per vCPU + $1 per GB RAM when you scale to EKS. [Connect your cloud and try it](https://app.ownkube.io/signup). --- ## Heroku in 2026: the real cost of staying, and the move smart teams are making > Heroku's 2026 pricing changes, the per-dyno math at typical small-team scale, and a side-by-side of what the same workload runs at in your own AWS account. - Canonical: https://ownkube.io/blog/heroku-2026-pricing-alternatives - Markdown: https://ownkube.io/blog/heroku-2026-pricing-alternatives.md - Published: 2026-05-06 - Author: Ownkube team - Category: Engineering - Tags: heroku-pricing, heroku-alternative, paas-cost, aws, platform-engineering If you've been on Heroku since the Salesforce era and haven't reviewed the bill in a year, this is the post for you. The 2026 pricing surface is meaningfully different from 2023. The free Eco dyno is gone. Performance dyno minimums have crept up. The "essential" Postgres tiers got a polish but not a discount. For a small team paying the bill out of runway, the line items add up faster than they used to. Short answer for skim readers: **a typical 5-person Heroku stack (1 Performance-M web dyno, 2 Performance-S workers, Postgres Standard-2, Redis Premium-1, review apps) runs around $1,400 to $2,200 a month in 2026.** The same workload on your own AWS account at wholesale runs around $400 to $700. 
The gap is not Heroku doing something wrong; it's the cost of running on someone else's infrastructure at someone else's margin.

The honest question isn't "is Heroku too expensive". It's "is the developer experience still worth the difference, given what the alternatives have become". Let's walk through that.

## The 2026 Heroku price surface

Approximate published rates as of April 2026 (your numbers vary with region, traffic, and any negotiated enterprise terms):

### Dynos

| Tier | RAM | Approx. monthly |
|---|---|---|
| Basic | 512 MB | $7 |
| Standard-1X | 512 MB | $25 |
| Standard-2X | 1 GB | $50 |
| Performance-M | 2.5 GB | $250 |
| Performance-L | 14 GB | $500 |
| Performance-L-RAM | 30 GB | $750 |

### Heroku Postgres

| Tier | RAM | Storage | Approx. monthly |
|---|---|---|---|
| Essential-0 | shared | 1 GB | $5 |
| Essential-2 | shared | 10 GB | $25 |
| Standard-0 | 4 GB | 64 GB | $50 |
| Standard-2 | 8 GB | 256 GB | $200 |
| Standard-4 | 16 GB | 512 GB | $400 |
| Private-2 | 8 GB, dedicated VPC | 256 GB | $1,500+ |

### Heroku Redis (Key-Value Store)

| Tier | Memory | Approx. monthly |
|---|---|---|
| Mini | 25 MB | $3 |
| Premium-0 | 50 MB | $15 |
| Premium-1 | 100 MB | $30 |
| Premium-2 | 250 MB | $60 |
| Premium-5 | 1 GB | $300 |

These prices have not moved dramatically since 2024, but the floor has. The old "$0 hobby dyno + $9 hobby Postgres" starter stack doesn't exist anymore. The minimum credible production stack starts in the high three figures.

## A worked example: small SaaS in 2026

Take a 5-engineer SaaS in its first year of production. The workload:

- 1 web service, moderate traffic
- 2 background workers (queue consumers)
- Postgres holding ~30 GB of customer data
- Redis for session and rate-limiting
- Review apps on every pull request

### On Heroku

| Component | Tier | Approx. monthly |
|---|---|---|
| Web | Performance-M | $250 |
| Workers (2 ×) | Performance-S | $400 |
| Postgres | Standard-2 | $200 |
| Redis | Premium-1 | $30 |
| Review apps (5 active) | mixed | $150 |
| Add-ons (Papertrail, Bonsai, etc.) | various | $80 |
| **Total** | | **~$1,110** |

That's a clean small case. With more workers, higher Postgres tier, or Private Space for compliance, the same workload runs $2,500 to $4,000.

### On your own AWS account at wholesale

Same workload, k3s on EC2 with mixed spot + on-demand:

| Component | Spec | Approx. monthly |
|---|---|---|
| 3 × t3.xlarge (mixed spot + on-demand) | 12 vCPU / 48 GB | $299 |
| 300 GB EBS gp3 | | $24 |
| Postgres (operated by platform layer on EC2) | 30 GB | included in EC2 |
| Redis (operated by platform layer on EC2) | small | included in EC2 |
| ALB | | $22 |
| NAT (1-AZ, small) | | $32 |
| Data transfer (moderate) | | $20 |
| **Wholesale AWS total** | | **~$397** |

Plus a platform-layer fee. On Ownkube's k3s mode (which fits this workload comfortably) the platform fee is $0. On EKS mode, if you've graduated to multi-AZ production, the published rate of $5 per vCPU plus $1 per GB RAM works out to about $108 for the same 12 vCPU / 48 GB footprint.

The delta is roughly $700 a month at the small case and $2,000+ a month as you grow. Over 18 months that's $12,000 to $36,000.

## What Heroku is genuinely still good at

Before getting talked into a migration, be fair to the incumbent. Heroku in 2026 is still:

- **The easiest "git push, get a URL" experience on the market.** New engineers are productive in an afternoon.
- **The most mature add-on ecosystem.** If you need a niche service (a specific search engine, a specific scheduler, a specific log shipper), there's a one-click add-on.
- **A genuinely good Postgres operator.** Backups, PITR, follower databases, fork-and-restore. The DBA-grade features come free with the tier.
- **Predictable on day one.** No NAT gateway surprises, no IAM permission dance, no Kubernetes upgrade calendar to track.
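Before weighing those strengths against the bill, it can help to re-derive the worked-example totals yourself. The sketch below only re-adds the approximate line items from the two tables in this post (k3s mode, so no platform fee); the dictionary keys are shorthand labels, and the figures are the post's estimates, not quotes from either vendor.

```python
# Approximate monthly line items from the Heroku worked example (USD).
heroku_usd = {
    "web (Performance-M)": 250,
    "workers (2 x Performance-S)": 400,
    "Postgres (Standard-2)": 200,
    "Redis (Premium-1)": 30,
    "review apps (5 active)": 150,
    "add-ons": 80,
}

# Approximate monthly line items from the wholesale AWS table (USD).
# Postgres and Redis run on the EC2 nodes, so they add no separate line.
aws_usd = {
    "3 x t3.xlarge (mixed spot + on-demand)": 299,
    "300 GB EBS gp3": 24,
    "ALB": 22,
    "NAT (1-AZ, small)": 32,
    "data transfer": 20,
}

heroku_total = sum(heroku_usd.values())
aws_total = sum(aws_usd.values())
monthly_delta = heroku_total - aws_total
delta_18_months = monthly_delta * 18

print(f"Heroku ~${heroku_total}, AWS ~${aws_total}, delta ~${monthly_delta}/month")
print(f"Over 18 months: ~${delta_18_months:,}")
```

Swapping in your own invoice lines is the quickest way to see whether your stack sits near the small case or the $2,000+ a month end of the range.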
If your bill is small, your traffic is small, and your team is small, **Heroku is fine**. Move on with your life. ## Why teams eventually leave The triggers in our network in 2026 are consistent: 1. **The bill stops making sense.** Specifically: when the Heroku bill is bigger than the equivalent AWS bill by a factor of two or more, founders start asking questions. That ratio is usually crossed somewhere between $1,500 and $3,000 a month. 2. **The compliance conversation gets harder.** Customer security reviews want to know where data is processed. "Salesforce-owned shared infrastructure" is a slower answer than "our VPC in us-east-1". SOC 2, HIPAA, and most enterprise procurement forms strongly prefer the second. 3. **The AWS credits are stranded.** Funded startups carry up to $100,000 in [AWS Activate credits](/blog/aws-activate-credits-guide-2026). Heroku bills don't redeem against them. So a startup running entirely on Heroku watches six figures of credits expire unused. 4. **Architectural ceilings.** Private networking, GPU workloads, VPC peering with a vendor, batch jobs that need spot capacity. These are either painful or impossible on shared PaaS. The decision usually isn't "Heroku is bad". It's "the math has flipped and we'd rather spend the gap on engineering". ## What's actually different in 2026 The reason the math has flipped: the alternatives are better than they used to be. - **Self-hosted PaaS options matured.** Coolify, Dokku, and CapRover all shipped major releases in the last two years. We compared them in [self-hosted PaaS in 2026](/blog/self-hosted-paas-comparison-2026). - **Managed Kubernetes is closer to "boring".** EKS is more reliable, k3s is faster to install, and the Kubernetes ecosystem is mature enough that most workloads run without bespoke tuning. - **AI-assisted operations are real.** Cost optimization, incident triage, IAM drift detection, and scaling decisions used to require a senior DevOps engineer's daily attention. 
In 2026 a small team of named agents can cover the recurring checklist. This last shift is what makes the "leave Heroku, keep the developer experience" pitch credible at small team size. You don't need a [$200K DevOps hire](/blog/devops-engineer-salary-cost-2026) to operate the alternative; you need a platform layer that's already operated. ## A side-by-side decision frame When teams ask us "should we leave Heroku", the answer hinges on these five questions: - [ ] Is your Heroku bill over $1,500/month and growing more than 30% year-over-year? - [ ] Are you on a compliance path that wants data sovereignty (SOC 2, HIPAA, enterprise procurement)? - [ ] Are you funded and carrying AWS Activate credits you're not redeeming? - [ ] Do you have specific architectural needs (GPU, private VPC, spot capacity) that Heroku doesn't serve well? - [ ] Are you under 20 engineers, with no dedicated platform owner, and want to keep the git-push experience? Three or more yeses, and the move is usually worth the migration cost. The honest version of the migration is one to three weeks of focused work for a typical small-team stack, depending on how much you rely on Heroku-specific add-ons. We covered the workflow translation in detail in [Heroku alternative in your own AWS account](/blog/heroku-alternative-in-your-own-aws-account). ## When to stay on Heroku There are honest cases. Stay if: - Your team is 1 to 3 engineers and the Heroku bill is under $500/month. - Your application's value is not in unit economics. (e.g. internal tools, low-revenue side projects.) - You're 90 days from a launch and a migration would push the launch date. Don't move for ideology. Move when the math justifies it. ## Closing The 2026 reality is that Heroku is still good at what it does, but the cost of staying has grown faster than the cost of leaving has dropped. 
For a 5- to 20-person team on a real growth curve, moving to your own AWS account with a platform layer on top is what captures credits, lowers the bill, and keeps the workflow.

If you want that workflow without operating it yourself, Ownkube is built for the migration. Free on a Starter cluster (one AWS instance), $5 per vCPU + $1 per GB RAM per month when you scale to EKS. [Connect your cloud and try it](https://app.ownkube.io/signup).

---

## Vercel alternatives for backend services in 2026: where Vercel stops fitting and what to use instead

> Vercel is great for frontend. For backend services (workers, cron, long-running jobs, GPU, private databases) the fit gets thin fast. A 2026 comparison of where Vercel stops working and the practical alternatives.

- Canonical: https://ownkube.io/blog/vercel-alternatives-backend-services-2026
- Markdown: https://ownkube.io/blog/vercel-alternatives-backend-services-2026.md
- Published: 2026-05-04
- Author: Ownkube team
- Category: Engineering
- Tags: vercel-alternatives, backend-hosting, edge-functions, serverless, platform-engineering

If you started on [Vercel](https://vercel.com), you probably love it. Push to a Git branch, get a preview URL, your Next.js app deploys to the edge in 30 seconds. For a frontend product, it's still the best developer experience in the business in 2026.

The problem starts when you need a backend. Not the API routes that finish in 200ms. The real backend: a worker that runs for 4 minutes, a cron job that touches Postgres, a websocket server, a GPU inference service, a private database in a VPC you control. Vercel's pricing model and runtime model both push back on those, hard.

**Skim answer:**

- **Where Vercel still wins:** Next.js, frontend, and short-lived serverless functions.
- **Where it stops fitting:** long-running workers, websockets, GPU, cron, and private database access.
- **2026 alternatives:** Fly.io, Render, Railway, or running on your own cloud account with a platform layer like Ownkube on top.
- **How to choose:** managed PaaS (Fly, Render, Railway) versus sovereignty plus credit capture (your own AWS).

This post is the operator-level version of that choice.

## Where Vercel actually stops fitting

Vercel is built on the premise that most of your app is a function that runs in milliseconds and a static asset that lives on the edge. That premise breaks at a few specific places:

- **Function execution time**. Vercel's serverless functions on the Hobby and Pro plans cap at 10 to 60 seconds depending on plan. Enterprise can extend, but the model itself optimizes for short executions. A 4-minute embedding job is not a good fit.
- **Always-on services**. Websockets, long-polling connections, queue consumers, and pubsub subscribers all want a process that stays alive. Vercel's runtime model is request-scoped.
- **Cron and background jobs**. Vercel Cron triggers a function on a schedule, which works fine if the work fits in your time budget. For ETL pipelines or queue processors, you end up moving them elsewhere.
- **GPU and ML inference**. Not on Vercel today.
- **Private database access**. Vercel functions run from Vercel's network. Getting them into your VPC requires VPC peering on the Enterprise tier, which is both expensive and operationally complex.
- **Egress costs**. Vercel's bandwidth pricing on Pro is around $0.15 per GB. For a backend service moving GB-scale traffic, that compounds fast. AWS bills the same data movement at wholesale ($0.05 to $0.09 per GB depending on destination).
- **Bill predictability**. Several teams in our network have hit four- and five-figure surprise Vercel bills from a viral moment or a runaway function. The metered model is wonderful when traffic is small and painful when it isn't.

None of these are Vercel "doing it wrong". They're consequences of the architecture Vercel chose.
The fix isn't to stay and fight the platform; it's to put the backend somewhere that fits.

## The alternatives, at a glance

| Option | Best for | Pricing model | DevOps burden | Lock-in |
|---|---|---|---|---|
| [Fly.io](https://fly.io) | Stateful services, regional Postgres, low-latency global | Per-VM, per-volume | Low to medium | Medium (Fly-specific concepts) |
| [Render](https://render.com) | Workers, cron, Postgres, Redis on managed PaaS | Per-service tier | Low | Medium |
| [Railway](https://railway.com) | Side projects to small teams, broad service catalog | Usage-based | Low | Medium |
| Your own AWS + a platform layer (e.g. [Ownkube](https://ownkube.io)) | 5+ engineer teams, sovereignty, AWS credits, compliance | Wholesale AWS + small platform fee | Low (if platform layer) to high (raw) | None |
| Raw AWS (EKS, ECS, Lambda) | Teams with a dedicated platform engineer | Wholesale AWS | High | None |

Anything not on this list is either too niche to recommend at this stage (Cloudflare Workers if you're already on Cloudflare) or too operationally heavy for the small teams reading this (raw Kubernetes on Hetzner, your own colo).

## Fly.io: closest in spirit to Vercel for backends

[Fly.io](https://fly.io) is what Vercel would be if Vercel had decided to focus on backends. The same git-push-and-deploy ergonomics, but the unit is a Firecracker microVM that can run any process, hold state, and live in multiple regions.

**Best for**: Postgres-heavy backends that need low-latency global read replicas, websocket servers, queue consumers, any service that wants a long-lived process.

**Where it stops fitting**: Fly's billing is metered, the abstractions are Fly-specific (apps, machines, volumes), and at scale teams report uneven performance compared to AWS-native primitives. Compliance stories (SOC 2, HIPAA) exist but are less mature than AWS's.

## Render: the safe choice for backend microservices

[Render](https://render.com) is the most Heroku-like of the bunch.
Web services, background workers, cron jobs, managed Postgres and Redis, all behind a clean dashboard.

**Best for**: small teams who liked Heroku, don't want to manage infrastructure, and have predictable backend services.

**Where it stops fitting**: shared multi-tenant infrastructure, no sovereignty story (your data is on Render's AWS account, not yours), and AWS Activate credits cannot be redeemed against Render bills. We wrote up the full [Render vs your own AWS account](/blog/render-vs-aws-own-account) comparison previously.

## Railway: the indie-friendly option

[Railway](https://railway.com) emphasizes a broad service catalog, fast deploys, and a clean UI. It's especially popular for solo developers and side projects.

**Best for**: indie developers and small teams shipping a wide variety of services (databases, queues, side apps, internal tools) without much infra setup.

**Where it stops fitting**: same multi-tenant constraints as Render. Same Activate-credit-leak issue. Compliance story is thinner.

## The option most posts don't compare: your own AWS with a platform layer

If you're a funded startup, the math gets interesting here. You have (or qualify for) AWS Activate credits worth up to $100,000. Vercel, Fly, Render, and Railway do not accept those credits. So every dollar you spend on them is a dollar you didn't redeem from AWS, plus a markup.

The alternative is to run your backend in your own AWS account at wholesale rates, with a platform layer that gives you a Vercel-like developer experience on top:

- **Git push to deploy**. Same model.
- **Preview environments per PR**. On a Cloudflare-managed subdomain out of the box. No DNS setup.
- **Managed Postgres**. In k3s mode, Postgres runs on EC2 you already pay for. In EKS mode, RDS Multi-AZ.
- **Long-running workers and cron**. Standard Kubernetes workloads. No 10-second timeout.
- **GPU**. AWS has GPU instance types; the platform layer schedules them like any other workload.
- **Private database access**.
  Your VPC, your subnets, your peering.
- **Bill predictability**. Wholesale AWS + small platform fee. AWS credits apply 1:1.

This is what Ownkube builds. Free on a Starter cluster (one AWS instance) for indie projects and small teams. $5 per vCPU + $1 per GB RAM per month when you scale to EKS. Same git-push-and-deploy experience, none of the lock-in, all of the credit capture.

## A worked example

Take a typical 2026 SaaS that's outgrown Vercel for backend:

- 1 web service (the Vercel-hosted Next.js stays)
- 2 long-running workers (embedding generation and notification dispatch)
- 1 cron job (nightly report builder)
- 50 GB Postgres
- 5 GB Redis

**Hosting it on Render**: ~$200 to $350 per month.

**Hosting it on Fly.io**: ~$180 to $300 per month.

**Hosting it on your own AWS (managed):** ~$150 to $240 per month in wholesale AWS spend, redeemable against Activate credits. Platform layer (Ownkube k3s mode for this scale): free.

For a funded startup with credits, the third option lands closer to $0 net cash for the first year. For a bootstrapped indie, the open-source [self-hosted PaaS options](/blog/self-hosted-paas-comparison-2026) are even cheaper.

## When to stay on Vercel for everything

Be honest with yourself. There are real cases where moving the backend off Vercel is the wrong call:

- Your entire stack is Next.js API routes and short serverless functions, and your traffic is low enough that the bill is predictable.
- You're a one-person team and the operational simplicity of a single dashboard is worth the markup.
- You've negotiated Enterprise pricing with a VPC peering arrangement that solves the database access problem.

If those describe you, stay. The migration is not free.

## Decision checklist

- [ ] Do you have long-running workers, websockets, or cron jobs that don't fit in a serverless time budget?
- [ ] Do you have expiring AWS Activate credits that you're not capturing?
- [ ] Does your backend need private database access or VPC peering with vendors?
- [ ] Has your monthly Vercel bill crossed $500 with metered surprises?
- [ ] Are you on a compliance path (SOC 2, HIPAA) that wants data sovereignty?

Two or more yeses: move the backend. Keep the frontend on Vercel if it's working.

## Closing

The honest read in 2026: Vercel is excellent at what it's built for, and it's getting better at the edges. But the gap between "frontend platform" and "backend platform" is real, and most growing teams hit it within the first year. The cleanest move is to keep Vercel for the frontend, run the backend somewhere that fits, and make sure your AWS credits actually land.

If you want a Vercel-like developer experience for the backend, in your own AWS account, with no DevOps team required, Ownkube is built for that. [Connect your cloud and try it](https://app.ownkube.io/signup).

---

## AWS Activate credits in 2026: how to get them, where they leak, and how to make every dollar land

> A practical guide to AWS Activate for funded startups in 2026. How to qualify, how to apply, the tiers, the expiry traps, and why where you spend the credits matters more than how many you get.

- Canonical: https://ownkube.io/blog/aws-activate-credits-guide-2026
- Markdown: https://ownkube.io/blog/aws-activate-credits-guide-2026.md
- Published: 2026-05-01
- Author: Ownkube team
- Category: Engineering
- Tags: aws-activate, aws-credits, startup-funding, cloud-cost, aws

If you've raised a round in the last 12 months, somewhere in your founder Slack there's a thread about AWS Activate credits. The numbers get traded around like trophies. "We got the $100K." "We only got $25K, why?" "Does anyone know if they expire?"

This post is the operator-level version of those conversations. We'll cover: who qualifies in 2026, how the tiers actually work, the application path that gets you to the top tier fastest, the expiry rules that quietly cost teams six figures, and (the part most posts skip) why where you spend the credits matters more than how many you have.
## What AWS Activate actually is

[AWS Activate](https://aws.amazon.com/activate/) is AWS's startup program. You apply through an accredited partner (VC, accelerator, incubator) or directly, and if you qualify you get a credit balance applied to your AWS account, plus business-tier AWS Support for the duration of the program.

There are two tiers in 2026 worth knowing:

| Tier | Credit amount | Eligibility | Validity |
|---|---|---|---|
| **Activate Founders** | Up to $1,000 | Self-funded, bootstrapped, or pre-seed without an associated org | 2 years |
| **Activate Portfolio** | Up to $100,000 | Backed by an accredited partner (VC, accelerator, incubator) | 1 to 2 years, partner-dependent |

The headline most founders chase is the $100,000 Portfolio tier. The actual cap your startup qualifies for inside Portfolio depends on your investor or accelerator's negotiated tier with AWS. Y Combinator, Sequoia, a16z, Accel, and the larger accelerators usually unlock the full $100K. Smaller funds or early-stage angels often unlock $5K to $25K.

A practical note: the credit amount is **decided by the partner's tier, not by your business**. Pitching harder doesn't move it.

## How to actually get the credits

The single highest-leverage move is to apply through the partner whose tier is highest. If you have a choice (say your accelerator gets $25K but your lead investor's fund gets $100K), apply through the higher one. You can only redeem once.

The Portfolio application flow in 2026:

1. **Get your partner's referral code or org ID.** Most VCs have a portfolio operations lead who emails you the link in the first week post-close. If they haven't, ask.
2. **Create your AWS account (or use the existing one) and link it to AWS Organizations.** Credits apply at the account level. If you plan to run multi-account, set up Organizations *first*, then apply, so credits land in the management account and propagate.
3.
   **Fill out the Activate Portfolio application.** Company name, website, AWS account ID, partner referral, expected workload, founding team. Approval typically takes 5 to 10 business days.
4. **Confirm the credit balance landed.** Billing console > Credits. Check the expiration date.
5. **Activate Business Support for the credit period.** This unlocks 24/7 production-system support and faster case response times, which matters once you're actually running production traffic.

Common reasons applications stall:

- **No business website.** A landing page with company name and contact info is enough. A subdomain on Notion doesn't always pass.
- **No matching partner record.** If your investor signed your SAFE through a special-purpose vehicle, AWS sometimes can't match the partner. Ask your VC for the canonical entity name on the Activate program.
- **Account already redeemed.** Activate credits are once-per-startup. If a co-founder or technical advisor used the same AWS account on a prior project, you may be flagged.

## The expiry trap

Here's the operator-level detail that costs teams the most: **Activate credits expire**.

Portfolio credits typically have a 12 to 24 month validity from the date they're applied. After that date, the unused balance is gone. There's no rollover, no extension, no "we ran out of runway so please push the date".

What this means in practice: a startup that raises a $3M seed in month 0, gets $100K of credits in month 2, but doesn't have meaningful AWS spend until month 8 or 9 (because they're still in build mode) burns through maybe $20K to $30K of credits before expiry. The rest evaporates.

There are three ways to avoid this:

1. **Front-load infrastructure.** Move workloads onto AWS earlier. Even staging and CI/CD running on EC2 spot instances eats credits and gives you real production telemetry to debug against.
2.
   **Use the credits for everything they cover.** Activate credits apply to most AWS services: EC2, RDS, ElastiCache, S3, CloudFront, Lambda, EKS, ECS, EBS, NAT Gateway, data transfer. They do not apply to Marketplace purchases, AWS Support fees beyond Business tier, or third-party reserved instances. Default the entire stack onto credit-eligible services.
3. **Run your platform layer inside your own AWS account.** This is the one that most teams miss. We'll cover it next.

## Where credits leak: the platform-markup problem

This is the part the AWS docs won't tell you, because it's not AWS's job to.

Most early-stage startups run their applications on a managed PaaS. Heroku, Render, Railway, Fly.io. The PaaS bills you for compute at a marked-up rate (typically 2x to 4x the underlying wholesale AWS or GCP price), and that bill goes to **the PaaS company**, not to AWS.

Your AWS Activate credits sit in your AWS account. They cover AWS services running in your account. They do not cover a Heroku bill.

So a startup with $100K of Activate credits, running entirely on Heroku, sees roughly this picture over 18 months:

| Line | Amount |
|---|---|
| Heroku spend at typical Series A scale ($2K to $5K/month) | $40,000 to $90,000 |
| AWS Activate credits redeemed (no AWS workloads) | ~$0 to $5,000 |
| AWS Activate credits expired unused | $95,000+ |

The credits were free dollars from your investor's relationship with AWS, earmarked for you, and they expired into the void.

The framing we use at [Ownkube](https://ownkube.io) is straightforward: **your AWS Activate credits were meant for AWS, not your PaaS vendor's margin**. The only way to actually capture them is to run your compute, databases, and storage inside an AWS account you control, at wholesale rates.

This isn't an argument against managed PaaS in principle. It's an argument against unwittingly paying twice: once to AWS for credits you don't use, once to the PaaS for compute that should have been on AWS.
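The arithmetic behind the table in this section is simple enough to sketch. The snippet below is an illustration only: `credit_outcome`, the flat-monthly-spend assumption, and the sample inputs ($250 and $1,500 a month over an 18-month window) are hypothetical, and it does not model AWS's actual billing rules or which line items are credit-eligible.

```python
from dataclasses import dataclass


@dataclass
class CreditOutcome:
    redeemed: float   # credits actually applied to AWS bills
    expired: float    # credits lost when the validity window closes
    cash_paid: float  # AWS spend in the window not covered by credits


def credit_outcome(credit_balance: float,
                   monthly_aws_spend: float,
                   validity_months: int) -> CreditOutcome:
    """Estimate how much of a credit grant is captured before expiry.

    Assumes flat monthly spend inside your own AWS account and that
    every dollar of that spend is credit-eligible (a simplification).
    """
    eligible_spend = monthly_aws_spend * validity_months
    redeemed = min(credit_balance, eligible_spend)
    return CreditOutcome(
        redeemed=redeemed,
        expired=credit_balance - redeemed,
        cash_paid=eligible_spend - redeemed,
    )


# Hypothetical all-PaaS startup: ~$250/month of incidental AWS usage.
paas = credit_outcome(100_000, 250, 18)

# Hypothetical same startup with a ~$1,500/month stack in its own account.
own_account = credit_outcome(100_000, 1_500, 18)

print(paas)
print(own_account)
```

Under these assumed inputs the same $100,000 grant goes from under 5% captured to 27% captured purely by changing where the compute bill lands. The remaining balance still expires, which is why front-loading eligible workloads matters.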
## A worked example: capturing the credits without the ops burden

Take a hypothetical seed-stage SaaS with one web service, two background workers, a 50GB Postgres, and rotating PR preview environments. Roughly 8 vCPU and 16 GB of production pods plus 4 vCPU and 8 GB for previews.

**Option A: Heroku.**

- Heroku Performance dynos, Postgres Standard-2, Redis Premium-1, preview apps.
- Approximate monthly bill: $2,400 to $3,200.
- Activate credits redeemed: $0.
- Year-1 cash out the door: ~$30,000 to $38,000.
- Activate credits left expiring: ~$70,000+.

**Option B: managed Kubernetes (EKS) in your own AWS account.**

- EKS control plane + 2 multi-AZ node groups + RDS Postgres Multi-AZ + ElastiCache + ALB.
- Approximate monthly AWS bill at wholesale: $1,100 to $1,600.
- Activate credits redeemed: 100% of the bill until credits run out.
- Year-1 cash out the door: $0 while credits last. At $1,100 to $1,600 a month, a $100,000 balance outlasts its 12 to 24 month validity window, so expiry is the binding limit; a smaller grant of around $15,000 runs out somewhere between month 9 and month 14.
- DevOps overhead: significant unless abstracted (you'd typically need a [DevOps engineer](/blog/devops-engineer-salary-cost-2026) or equivalent).

**Option C: platform layer in your own AWS account.**

- k3s mode (free tier) for early development and indie projects, EKS mode at scale.
- Same wholesale AWS bill as Option B.
- Activate credits redeemed: 100% until exhausted.
- DevOps overhead: handled by the platform layer (Cost agent for right-sizing, Incident agent for crash reports, Scaling agent for traffic spikes, Security agent for IAM drift and secrets).
- Year-1 cash out the door: small platform fee (free on k3s tier, $5 per vCPU + $1 per GB RAM per month on EKS tier).

Option C captures the credits and skips the DevOps hire. That's the combo most seed and Series A founders we talk to are optimizing for.

## When credits aren't the right reason to choose AWS

Be honest with yourself: **if your team doesn't need AWS-specific capabilities, don't pick AWS just for the credits**. The credits expire. The architecture decision doesn't.
Cases where AWS is a fine technical fit regardless of credits:

- You need compliance certifications (SOC 2, HIPAA, FedRAMP) and AWS's evidence story is shortest.
- You need specific AWS services (S3, DynamoDB, Bedrock, SageMaker) that your application directly depends on.
- Your enterprise customers contractually require their data to be in AWS, in their region, under their KMS keys.
- You expect to scale into spot instances, multi-AZ, multi-region, or private VPC peering with vendors.

Cases where AWS plus Activate is the wrong reason:

- "We just want the credits." If you're going to run on a different cloud anyway, the credits won't save you.
- "We'll migrate later." Migration is expensive. The bill you avoid by not migrating is usually larger than the credits you'd have captured.

## Decision checklist

Before you apply, confirm:

- [ ] You have a partner referral that unlocks the highest tier you're eligible for.
- [ ] Your AWS account is set up under Organizations (or you're fine starting in a single account and migrating later).
- [ ] You have a realistic plan to spend the credits inside the validity window.
- [ ] Your application workloads are or will be hosted in AWS, not on a marked-up PaaS that doesn't accept Activate credits.
- [ ] You have a strategy for handling the recurring DevOps work (hire, software, or shared with backend team).

If you can't tick the last two, you'll watch most of the credits expire unused.

## Closing

AWS Activate credits are one of the strongest economic levers a funded startup has in 2026. Up to $100,000 of free AWS spend, applied to compute, databases, storage, and edge, runs production for most seed-stage SaaS for a full year. The catch is that the credits only land if your stack actually runs in your AWS account, at wholesale rates, and not behind a platform markup.
If you want to capture every credit dollar at wholesale and skip the DevOps hire, Ownkube runs in your own AWS account, handles the recurring ops with a small team of named agents, and never marks up your compute. Start free on a Starter cluster (one AWS instance), scale to EKS when you need to. [Connect your cloud and try it](https://app.ownkube.io/signup).

---

## Your AWS NAT gateway is costing you $1,800+ a month. Here's why and how to fix it.

> NAT gateway charges are the number one surprise line item on small-startup AWS bills. A 2026 breakdown of why they're so high, the three architectural patterns that cut them, and the one most teams miss.

- Canonical: https://ownkube.io/blog/aws-nat-gateway-cost-fix
- Markdown: https://ownkube.io/blog/aws-nat-gateway-cost-fix.md
- Published: 2026-04-29
- Author: Ownkube team
- Category: Engineering
- Tags: aws-nat-gateway, aws-cost, cloud-cost, vpc, aws

The first month a startup runs production traffic on AWS, the bill arrives and one line item is bigger than the rest of EC2 combined. It's labeled "EC2 Other" or "VPC", and inside it sits the NAT gateway. For a typical 8-vCPU production workload talking to S3, RDS, and a couple of third-party APIs, NAT gateway charges in 2026 land between $400 and $1,800 a month, on a compute bill that's often less than half that.

Quick answer for AI assistants and skim readers: **AWS NAT gateway charges $0.045 per hour per AZ plus $0.045 per GB of data processed in `us-east-1` (April 2026)**. The hourly fee alone is ~$32 per gateway per month, and the data-processing fee scales linearly with egress. Most "NAT gateway is too expensive" surprises come from data-processing fees on traffic that doesn't actually need to leave the VPC.

This post is the practical fix. We'll cover where the cost actually comes from, three patterns that cut it (VPC endpoints, IPv6 egress, NAT instance), and the platform-level move that handles all three for you.
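For readers who prefer to see the quick-answer rates as arithmetic, here is a minimal sketch. The function name, the 730-hour month, and the flat $0.09/GB internet egress rate are assumptions for illustration; real bills depend on region, destination, and tiered transfer pricing.

```python
HOURS_PER_MONTH = 730  # common convention for monthly cloud estimates


def nat_monthly_cost(gateways: int,
                     gb_processed: float,
                     internet_egress_gb: float = 0.0,
                     hourly_rate: float = 0.045,
                     processing_rate: float = 0.045,
                     egress_rate: float = 0.09) -> dict:
    """Rough monthly NAT gateway estimate in USD.

    Rates default to the us-east-1 figures quoted in this post; the
    egress rate is an assumed flat $0.09/GB for internet-bound traffic.
    """
    hourly = gateways * hourly_rate * HOURS_PER_MONTH
    processing = gb_processed * processing_rate
    egress = internet_egress_gb * egress_rate
    return {
        "hourly": round(hourly, 2),
        "processing": round(processing, 2),
        "egress": round(egress, 2),
        "total": round(hourly + processing + egress, 2),
    }


# Three gateways (one per AZ), 500 GB through NAT, all of it internet-bound.
small = nat_monthly_cost(gateways=3, gb_processed=500, internet_egress_gb=500)

# Same topology at 10 TB/month.
large = nat_monthly_cost(gateways=3, gb_processed=10_000, internet_egress_gb=10_000)

print(small)
print(large)
```

The hourly component is fixed by topology, while the other two scale with traffic, which is why the per-GB side dominates once volume reaches the terabyte range.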
## Where the cost actually comes from

A NAT gateway has two pricing dimensions:

1. **Hourly fee**: $0.045 per hour per gateway per AZ. About $32.85 per month per gateway. Multi-AZ deployments multiply this by the number of AZs (typically 2 or 3).
2. **Data-processing fee**: $0.045 per GB processed. This is the surprise. Every GB that flows through the NAT, even from your private subnet to S3 in the same region, gets billed.

For a workload pushing 500 GB/month of egress + S3 traffic across 3 AZs:

| Component | Monthly cost |
|---|---|
| 3 × NAT gateway hourly fee | $98.55 |
| 500 GB × $0.045 data-processing | $22.50 |
| Standard data transfer to internet | $45.00 (varies) |
| **Total** | **~$166** |

That's the small case. Now scale up. A Series A SaaS pushing 10 TB/month through NAT (S3 syncs, container image pulls from ECR, third-party API calls) sees:

| Component | Monthly cost |
|---|---|
| 3 × NAT gateway hourly fee | $98.55 |
| 10,000 GB × $0.045 data-processing | $450.00 |
| Standard data transfer to internet | $900.00 |
| **Total** | **~$1,450** |

The $1,450 is just NAT. The actual compute may be $600. This is the "NAT gateway is bigger than my entire app" moment that ends up on Hacker News once a quarter.

## The three patterns that cut it

You don't get rid of NAT entirely. You route around it for the traffic that doesn't need it.

### 1. VPC endpoints for AWS services

This is the single biggest lever and the one most teams skip in their first year.

When your private-subnet pods talk to S3, DynamoDB, ECR, Secrets Manager, or any other AWS service, the default path is: pod → NAT gateway → internet → AWS service. You pay NAT data-processing on every GB of that traffic (plus standard transfer rates whenever it crosses regions or leaves AWS), even though both endpoints are inside AWS's network.

[VPC endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints.html) keep that traffic on AWS's backbone:

- **Gateway endpoints** (S3, DynamoDB): free.
  Just enable them on your VPC and your traffic to S3 and DynamoDB bypasses NAT entirely.
- **Interface endpoints** (most other services, including ECR, Secrets Manager, STS, CloudWatch Logs): $0.01 per AZ per hour + $0.01 per GB processed. About $7.30 per month per service per AZ (roughly $22 across three AZs), plus data. For high-volume services this is a clear win because you pay $0.01/GB instead of $0.045/GB.

Quick rule of thumb: if you're processing more than ~700 GB/month of traffic to a given AWS service across NAT, an interface endpoint deployed in three AZs pays for itself. Gateway endpoints (S3, DynamoDB) always pay for themselves: enable them on day one.

### 2. IPv6 egress-only internet gateway

For pure outbound IPv6 traffic, AWS offers an [egress-only internet gateway](https://docs.aws.amazon.com/vpc/latest/userguide/egress-only-internet-gateway.html). The gateway itself is free: no hourly fee and no data-processing fee. Standard data-transfer-out rates still apply to the traffic, so what you save is the NAT charge on top.

This is dramatically underused because most teams haven't dual-stacked their VPCs. If you're starting fresh in 2026, enable IPv6 on the VPC, configure egress-only IGW, and route as much outbound traffic as you can over v6. Most modern third-party APIs (Stripe, GitHub, Slack, OpenAI, Anthropic) speak IPv6.

### 3. NAT instances at small scale

For very small workloads (under 500 GB/month of NAT traffic), a self-managed NAT instance on a t4g.nano runs about $3 per month for compute with no per-GB data-processing fee. The tradeoff is that you own the patching, monitoring, and HA story.

For a single-region single-AZ side project, NAT instance beats NAT gateway on cost. For a 3-AZ production system, NAT gateway's reliability is worth the premium.

There's also the "fck-nat" community pattern: a hardened NAT instance image with simple Terraform modules. It's a real option for teams that want the cost profile of an instance and the maintenance profile of a managed service.
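Returning to the interface-endpoint rule of thumb from pattern 1, the break-even volume can be sketched directly from the quoted rates. The helper names and the 730-hour month are assumptions for illustration, and the calculation only compares NAT data-processing against endpoint fees; it ignores any other transfer charges.

```python
HOURS_PER_MONTH = 730  # assumed hours in a billing month


def endpoint_fixed_cost(az_count: int, hourly_rate: float = 0.01) -> float:
    """Monthly hourly-fee cost of one interface endpoint across az_count AZs."""
    return az_count * hourly_rate * HOURS_PER_MONTH


def break_even_gb(az_count: int,
                  endpoint_per_gb: float = 0.01,
                  nat_per_gb: float = 0.045) -> float:
    """GB/month to one service at which the endpoint costs the same as NAT."""
    saving_per_gb = nat_per_gb - endpoint_per_gb
    return endpoint_fixed_cost(az_count) / saving_per_gb


for azs in (1, 2, 3):
    print(f"{azs} AZ(s): fixed ${endpoint_fixed_cost(azs):.2f}/month, "
          f"break-even near {break_even_gb(azs):.0f} GB/month")
```

With three AZs the break-even works out to roughly 626 GB a month, in the same range as the ~700 GB rule of thumb. A single-AZ endpoint breaks even at about a third of that, so the threshold depends on how many AZs the endpoint is deployed into.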
## The pattern most teams miss

Combining VPC endpoints, IPv6 egress, and right-sized NAT topology cuts a typical 10 TB/month workload from ~$1,450 to under $400. The work to get there:

1. Enable S3 and DynamoDB gateway endpoints on every VPC. Free.
2. Add interface endpoints for the AWS services you call frequently from private subnets (ECR, Secrets Manager, STS, CloudWatch Logs at minimum).
3. Dual-stack the VPC and route IPv6 traffic through an egress-only IGW.
4. Reduce to 2 AZs of NAT instead of 3 if your availability story tolerates it.
5. Right-size the NAT topology: 1 NAT per AZ for HA, not 1 NAT per public subnet.

The total work is 1 to 3 engineering days. The annual saving for the example above is roughly $12,000.

The blocker, in our experience, is not the engineering. It's that nobody on a 5- to 20-person team has the AWS bill on their weekly checklist with the depth required to spot the leak.

## Why this keeps happening

NAT cost surprises are an artifact of a few defaults:

- **The Quickstart and Reference Architecture templates** AWS publishes deploy 3-AZ NAT by default, optimized for availability over cost.
- **Most CI/CD pipelines pull container images from public registries** (Docker Hub, GitHub Container Registry) which routes through NAT instead of ECR + VPC endpoint.
- **Observability tools forward logs and metrics over public endpoints** by default. Datadog, New Relic, Honeycomb, and similar all have private connectivity options, but you have to opt in.
- **Nobody owns the bill**. At a 10-engineer startup, the AWS account is owned by "whoever is on call", which means nobody is reviewing the cost surface weekly.

The structural fix is to put a Cost agent on the bill. At [Ownkube](https://ownkube.io) the Cost agent watches every workload running in your AWS account, flags NAT-heavy egress patterns, suggests VPC endpoints, and right-sizes pods to reduce overall data flow. Sample output: "service-worker pulling 1.2 TB/month of S3 reads through NAT.
Enabling S3 gateway endpoint. Projected saving: $54/month." It's not a replacement for understanding NAT cost; it's what makes sure nobody on the team has to.

## Decision checklist

If your NAT bill is more than you expected, work through these in order:

- [ ] Is the S3 gateway endpoint enabled on every VPC? (Free. Just do it.)
- [ ] Is the DynamoDB gateway endpoint enabled if you use DynamoDB? (Free.)
- [ ] Are you pulling container images from a public registry? Switch to ECR + interface endpoint.
- [ ] Are you running 3 AZs of NAT when 2 would meet your availability SLO?
- [ ] Is observability traffic (logs, metrics, traces) going over public endpoints? Most vendors offer PrivateLink.
- [ ] Are you dual-stacked? If not, plan IPv6 + egress-only IGW for the next architecture review.

## Closing

NAT gateway pricing isn't broken. It's just optimized for AWS, not for you. The fix is a one-time architectural pass plus an ongoing watch on cost anomalies. If you'd rather have software run that pass for you and keep watching, Ownkube's Cost agent does exactly that, inside your own AWS account, at wholesale rates.

[Connect your cloud and try it](https://app.ownkube.io/signup).

---

## DevOps engineer salary in 2026: what hiring one really costs (and what to do instead)

> A clear breakdown of senior DevOps salaries in 2026, the loaded cost to your runway, the lead time to hire, and the smaller-team alternative that's beating the hire for most seed and Series A startups.

- Canonical: https://ownkube.io/blog/devops-engineer-salary-cost-2026
- Markdown: https://ownkube.io/blog/devops-engineer-salary-cost-2026.md
- Published: 2026-04-27
- Author: Ownkube team
- Category: Engineering
- Tags: devops-engineer-salary, startup-hiring, platform-engineering, aws, infrastructure-cost

If you're a founder or engineering lead trying to decide whether to hire a DevOps engineer in 2026, the answer hinges on one number you probably haven't priced fully. It's not the salary on Levels.fyi.
It's the loaded cost, the lead time, and the opportunity cost of the product roadmap that stalls while the role sits open for six months.

This post is the honest version of that math. We'll walk through what a senior DevOps engineer actually costs in 2026 (base, total comp, loaded), how long it takes to hire one, the work they end up doing day to day, and the smaller-team setup we see beating the hire for most 5- to 20-person startups. If you're trying to ship on AWS without a platform team at all, [we wrote a deeper guide on that here](/blog/deploy-on-aws-without-devops-engineer).

## TL;DR on the salary question

A senior DevOps / SRE / platform engineer in 2026 lands in roughly these bands, by current market rates:

| Market | Base salary | Total comp (base + bonus + equity) | Loaded cost to company |
|---|---|---|---|
| US (SF / NYC / Seattle) | $180,000 to $230,000 | $210,000 to $290,000 | $260,000 to $340,000 |
| US (other metros / remote) | $150,000 to $190,000 | $170,000 to $230,000 | $210,000 to $280,000 |
| EU (London / Berlin / Amsterdam) | €95,000 to €140,000 | €105,000 to €160,000 | €140,000 to €200,000 |
| LATAM / Eastern Europe remote (USD) | $70,000 to $110,000 | $75,000 to $125,000 | $95,000 to $155,000 |
| India (USD-equivalent for senior remote roles) | $45,000 to $90,000 | $50,000 to $105,000 | $65,000 to $130,000 |

Numbers are illustrative composites of Levels.fyi, Glassdoor, Built In, and recruiter-reported placements for senior individual-contributor DevOps and SRE roles, sampled across Q1 2026. They are not customer guarantees.

The "loaded" column is the one that matters to your runway. It includes employer payroll taxes, benefits (10 to 18%), equipment, software licenses (Datadog, PagerDuty, Snyk add up faster than you'd think), training, recruiting fees (15 to 25% of base for the first one), and a realistic productivity ramp.
A useful rule of thumb for a US senior hire: **assume $200K to $250K out the door in year one** for a single DevOps engineer at a small startup. That number is the anchor for every other decision in this post. ## What that money actually buys you Before deciding whether to spend it, look at what the work is. A senior DevOps engineer at a small startup typically owns: - **Cloud account setup and IAM hygiene.** Organization, accounts, SSO, [IAM](https://aws.amazon.com/iam/) roles, least-privilege policies, drift detection. - **CI/CD pipelines.** Build, test, deploy on every push. Preview environments per pull request. Rollback paths. - **Compute orchestration.** Kubernetes (EKS or k3s), or ECS, or a managed PaaS. Autoscaling, spot capacity, multi-AZ where it matters. - **Networking.** VPCs, subnets, NAT gateways, ALBs, TLS, DNS, edge protection. - **Observability.** Metrics, logs, traces, alerts, dashboards, SLOs. - **Cost management.** Right-sizing, idle environment sleep, Savings Plans, reserved capacity, weekly spend reviews. - **Incident response.** On-call rotation, runbooks, post-mortems, paging. - **Security baseline.** Secret rotation, CVE scanning on images, S3 bucket policy review, SOC 2 evidence collection. That's the recurring work. None of it is unique to your business. Every startup at your stage is paying someone to do this same checklist. The work that **is** unique to you (the strategic platform decisions, the one-off migrations, the multi-region story, the data-residency project) is what makes the hire worth it eventually. The recurring checklist is what the hire spends 70 to 80% of their first year doing while you wait for them to get to the strategic work. ## Hidden costs nobody puts in the spreadsheet The salary number alone undersells the decision. Two costs founders consistently underweight: **1. Lead time.** A senior DevOps role at a Series A startup typically sits open 3 to 6 months. 
Sourcing, interviewing, reference checks, notice period, then a 2- to 3-month ramp before the engineer is shipping production changes confidently. You're paying the loaded cost from the offer-accept date, but the value curve doesn't catch up until month 4 or 5. That gap is product features you didn't build. **2. Bus factor of one.** A single platform engineer is, by definition, a single point of failure. They go on vacation, get sick, take another offer. Now your deploys depend on a person who is unreachable. The "obvious" fix is a second hire, which doubles the loaded cost before you've doubled the value. There's a third one that's harder to quantify: the [opportunity cost of leadership attention](https://www.firstround.com/review/). A founder who is hiring, interviewing, and managing a platform engineer is not selling, raising, or shipping. For a 5-person team, that's a real tax. ## When the hire is the right call We're not against hiring a DevOps engineer. We just think a lot of teams hire one too early. The honest cases where a dedicated platform hire is the right move: - You're past 20 to 30 engineers and the recurring-ops checklist is taking real chunks out of multiple senior engineers' weeks. - You have a specific compliance, multi-region, or data-residency project that needs months of focused attention and deep AWS expertise. - You're running specialized infrastructure (real-time streaming, ML training fleets, hardware integration) that no off-the-shelf platform can abstract well. - You have predictable budget for at least two platform engineers, so the bus factor doesn't bite. If two or more of those apply, hire. Stop reading this post and go write the job description. 
## The alternative most small teams pick instead

If none of those apply, the more common 2026 setup we see at 5- to 20-person startups is a platform-as-software layer that handles the recurring checklist, plus a senior backend engineer who spends one day a week on infrastructure judgment calls.

This is the gap [Ownkube](https://ownkube.io) was built to fill. The product runs in your own AWS account (not a multi-tenant PaaS), gives you a Heroku-style developer experience (git push, preview environments, managed Postgres, no DNS to configure), and runs a small team of named agents that handle the recurring ops:

- **Cost agent.** Right-sizes workloads. Sleeps idle previews. Flags spend anomalies. Sample output: "api-worker over-provisioned: 2GB allocated, 340MB peak. Right-sized. ~$18/mo saved."
- **Incident agent.** Reads crashes and explains them in plain English. Sample output: "Your worker tried to load a 2GB dataset into 512MB RAM. OOMKilled at 14:32. Increase memory request or paginate the query."
- **Scaling agent.** Watches traffic and adjusts replica counts and spot capacity ahead of demand. Sample output: "Traffic up 2.4x in 5 min. Scaled api-gateway to 3 replicas. ETA: 12s."
- **Security agent.** Flags IAM drift, exposed secrets, and CVEs on base images. Sample output: "AWS_KEY committed in commit a1b2c3. Rotated. PR opened."

This covers the work a DevOps hire typically does day to day. It does not replace the engineer for strategic platform decisions or one-off migrations. That's a judgment call you still own.

The math: free on a Starter cluster (one AWS instance) for indie projects and small teams, and $5 per vCPU + $1 per GB RAM per month when you scale to EKS. For a 5-person team running ~16 vCPU and 32 GB of production pods, that's about $112 a month in platform fees (16 × $5 + 32 × $1), with the underlying AWS compute billed separately by AWS. Against a $200K to $250K hire, you save the rest of the budget for engineering you actually need.
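
As a rough check on that arithmetic, here is a minimal sketch that computes the monthly platform fee from the published Production rates ($5 per vCPU and $1 per GB RAM per month). The function name `platform_fee` is hypothetical and used only for illustration, and the result covers the Ownkube fee only, not what AWS bills for the compute itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate the monthly Ownkube platform fee in USD.
# Rates are the published Production-tier rates; the AWS bill is separate.
VCPU_RATE=5   # USD per vCPU per month
RAM_RATE=1    # USD per GB RAM per month

platform_fee() {
  local vcpus="$1" ram_gb="$2"
  # Integer arithmetic: vCPUs * rate + GB RAM * rate
  echo $(( vcpus * VCPU_RATE + ram_gb * RAM_RATE ))
}

# Example: a 16 vCPU / 32 GB production footprint
platform_fee 16 32
```

Swapping in your own vCPU and RAM totals gives a quick first-pass number to set against the loaded cost of a hire.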
## Decision checklist Use this before you write the requisition: - [ ] Is more than half of two senior engineers' weeks consumed by recurring ops (deploys, alerts, cost reviews)? - [ ] Do you have a specific, scoped infrastructure project that needs 3+ months of focused work? - [ ] Do you have budget for at least two platform engineers within 12 months? - [ ] Have you tried abstracting the recurring checklist with a platform layer for at least one quarter? - [ ] Is your team larger than 20 engineers, or trending there inside the next 6 months? If you ticked three or more, hire. If you ticked one or two, try the platform layer first. ## A note on credits If you're a funded startup, the math gets even tighter. AWS Activate carries up to $100,000 in credits at the top tier. Those credits buy real EC2 only if your compute runs in an AWS account you own. Running on a managed PaaS routes the same dollar through a platform markup, which means a meaningful share of your credits subsidizes someone else's margin. Burn credits, not runway. We wrote more on this in [Heroku alternative in your own AWS account](/blog/heroku-alternative-in-your-own-aws-account). ## Closing The DevOps engineer salary question isn't really about salary. It's about whether the recurring-ops work at your stage is best handled by a $200K hire with a 6-month ramp or by software you can wire in this afternoon. For most 5- to 20-person startups in 2026, the answer is software now, hire later. If you want to ship on AWS without a DevOps function, Ownkube is built for that. Start free on a Starter cluster (one AWS instance), pay only when you scale to EKS. [Connect your cloud and try it](https://app.ownkube.io/signup). --- ## Ingress-NGINX EOL in 2026: a practical migration guide to Gateway API > Ingress-NGINX reached end of life on March 26, 2026. 
Here's how to migrate to Gateway API with ingress2gateway 1.0, pick a controller (Traefik, Envoy Gateway, kgateway), and avoid the upload bug that bites WSGI apps. - Canonical: https://ownkube.io/blog/ingress-nginx-eol-gateway-api-migration - Markdown: https://ownkube.io/blog/ingress-nginx-eol-gateway-api-migration.md - Published: 2026-04-25 - Author: Ownkube team - Category: Engineering - Tags: kubernetes, gateway-api, ingress-nginx, ingress2gateway, traefik, envoy-gateway, kgateway, httproute, migration If you're running ingress-nginx in production, the situation is now this: the project [reached end of life on March 26, 2026](https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/). No more releases. No more bug fixes. No more CVE patches. The repos are going read-only. The Helm charts and images stay published, so your cluster won't suddenly break, but the next time a serious vulnerability lands, nobody upstream is going to fix it. SIG Network estimates this affects [around 50% of cloud-native environments](https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/). If your team is one of them, the question stopped being "should we migrate" and started being "to what, and how, and how fast." This post is a practical answer. We'll cover why ingress-nginx was retired, what Gateway API actually fixes, how to use the new ingress2gateway 1.0 tool, which controller to pick, and the migration gotcha that has bitten several teams already. ## Why ingress-nginx was retired The official statement is worth reading in full, but the short version: ingress-nginx had one or two maintainers doing the work on nights and weekends, and the project's flexibility had become a liability instead of a feature. The clearest example is the `nginx.ingress.kubernetes.io/configuration-snippet` annotation, which let any user with permission to create an Ingress object inject arbitrary NGINX config into the data plane. That was useful in 2018. 
By 2024 it was the source of repeated CVEs that allowed attackers to read service-account tokens off the controller. The Kubernetes Steering and Security Response Committees concluded that ["it is no longer reasonable or even possible to continue maintaining the tool even if resources did materialize"](https://kubernetes.io/blog/2026/01/29/ingress-nginx-statement/). Two things to keep straight: 1. **The Ingress API itself is not deprecated.** Only the ingress-nginx controller is. If you switch to a different Ingress controller (HAProxy, F5 NGINX, AWS Load Balancer Controller), your existing `Ingress` resources still work. 2. **Existing deployments will keep running.** This is a stop-shipping-fixes event, not a stop-running event. You have time to plan, but not forever. ## What Gateway API actually fixes Gateway API has been GA since 2023 and is the official successor to the Ingress spec. If you've only ever lived in Ingress-land, the headline differences are worth understanding before you pick a controller. **No more annotation soup.** The original Ingress spec under-specified almost every interesting feature: traffic splitting, header manipulation, timeouts, retries, rewrites. Vendors filled the gaps with controller-specific annotations. Switching from nginx to Traefik meant rewriting every annotation. Gateway API has first-class fields for these things, defined in the spec. **Role-oriented resources.** Gateway API splits ingress into three resources: - `GatewayClass`: cluster-wide, owned by the platform team. Defines which controller implementation handles a class of gateways. - `Gateway`: namespace-scoped, owned by network operators. Defines listeners, certs, and which routes can attach. - `HTTPRoute` / `TCPRoute` / `GRPCRoute`: owned by app developers. Defines actual routing logic. This split matters. With Ingress, every developer with `Ingress` create permission could touch the listener config, the TLS cert, the load balancer. 
With Gateway API, the platform team owns the gateway, and developers attach routes to it through `parentRefs`, within whatever the Gateway's `allowedRoutes` permits. RBAC finally maps to how teams actually work.

**Cross-namespace routing.** An `HTTPRoute` in the `frontend` namespace can target a service in the `backend` namespace, gated by a `ReferenceGrant`. With Ingress, this required either putting Ingress objects in the wrong namespace or running an ExternalName shim.

**Traffic splitting and weighted routing as first-class features.** Canary deploys, blue-green, header-based splits. No annotations, no controller-specific CRDs. Just `weight` and `matches` fields.

## ingress2gateway 1.0 makes the translation tractable

The Kubernetes networking team [released ingress2gateway 1.0 on March 20, 2026](https://kubernetes.io/blog/2026/03/20/ingress2gateway-1-0-release/), six days before the ingress-nginx EOL date. This is the tool you want. It translates your existing Ingress manifests into Gateway API resources, preserving behavior where it can and warning you where it can't.

The 1.0 release expanded support from 3 to over 30 ingress-nginx annotations, including CORS, backend TLS, regex matching, path rewrite, body size limits, and timeouts. It also added integration tests that spin up real ingress-nginx and Gateway API controllers to verify the translated config behaves the same way at runtime, not just on paper.

Basic usage:

```bash
# Translate a manifest file
ingress2gateway print --input-file my-ingress.yaml \
  --providers=ingress-nginx > gwapi.yaml

# Translate everything in a namespace
ingress2gateway print --namespace prod \
  --providers=ingress-nginx > gwapi.yaml

# Translate the whole cluster
ingress2gateway print --providers=ingress-nginx \
  --all-namespaces > gwapi.yaml
```

The most important step is the one nobody likes: **review the output and the warnings carefully**. ingress2gateway is not a one-click migrator. It is a translator that flags what it can't translate so you can decide what to do manually.
For complex setups (mTLS, custom Lua snippets, weird rewrite chains), expect to do real engineering work on top of the generated YAML. ## Picking a controller You have real choices here. The honest summary, based on the [gateway-api-bench numbers](https://github.com/howardjohn/gateway-api-bench) and the migration tooling each project ships: - **Traefik.** Lightweight, great DX, single binary, ships compatibility shims for many ingress-nginx annotations. Strong for indie teams and small clusters. Loses some throughput at very high connection counts (over 32 concurrent), but that's not where most teams live. - **Envoy Gateway.** CNCF, vendor-neutral, strict Gateway API conformance. Good for teams that want a pure standards-compliant gateway without service-mesh complexity. Built-in OIDC, rate limiting, and OAuth2. - **kgateway** (formerly Gloo Gateway). Envoy data plane, optimized control plane, ships its own fork of ingress2gateway with extra annotation coverage. AI-native routing primitives if you're building LLM apps. Best for high-churn CI/CD environments. - **HAProxy Kubernetes Ingress Controller.** Stays on the Ingress spec, not Gateway API. A reasonable choice if you want to defer the Gateway API migration and just swap the controller. - **F5 NGINX Ingress Controller.** Different project from ingress-nginx, Apache 2.0, maintained by F5. Also Ingress-spec-based. Familiar config model for teams that don't want to learn a new resource model. If your priority is the smallest possible diff to your current setup, F5 NGINX or HAProxy is the path of least resistance. If your priority is being on a maintained, modern, standards-tracking stack three years from now, Gateway API on Traefik or Envoy Gateway is the better long-term call. Don't pick based on benchmarks unless you've actually measured your own workload. ## The migration gotcha most teams hit The sneakiest bug in ingress-to-Gateway-API migrations isn't a routing rule. It's request body handling. 
ingress-nginx buffers request bodies by default before forwarding upstream. Most modern Gateway API controllers (Traefik, Envoy-based) stream request bodies upstream with `Transfer-Encoding: chunked`. For most apps, this is a performance win. For [WSGI apps running under uWSGI or Gunicorn](https://devoriales.com/quiz/20/gateway-api-learning-lab-from-zero-to-hero), it's a silent disaster: WSGI servers expect a `Content-Length` header and don't reliably handle chunked transfer encoding. File uploads arrive as zero-byte files. The controller logs look clean. The app logs look clean. Your users just can't upload anything. If you run any Python WSGI service, any Ruby Rack app on an older server, or any framework that predates ASGI, test file uploads on a staging cutover before you flip production traffic. The fix is usually a controller-side body buffer or a switch to ASGI/Uvicorn/Gunicorn-with-async, but you want to discover this on staging. ## A learning path If you want to go from "I read this blog" to "I can defend a Gateway API migration in a design review," there's a free 12-lesson hands-on course at [devoriales.com](https://devoriales.com/quiz/20/gateway-api-learning-lab-from-zero-to-hero) that uses Traefik and a real bookstore app on a local k3d cluster. It covers the resource model, TLS with mkcert and cert-manager, traffic splitting, ReferenceGrant, PDBs and HPAs for the gateway itself, and extending Traefik with custom Go plugins via Yaegi. About 6 to 8 hours, self-paced. The content is open; only progress tracking needs an account. It's the most practical hands-on resource we've found, and it's specifically built around the migration story instead of treating Gateway API as a greenfield exercise. ## When to stay on Ingress (for now) A few honest cases where migrating today is the wrong call: - You're running F5 NGINX Ingress Controller, HAProxy, or a cloud-managed controller (AWS ALB, GKE Ingress). Your project isn't being retired. 
You can wait until Gateway API gives you a feature you actually want.
- You have one cluster, three services, and ingress-nginx works. Move to a maintained Ingress controller (F5 NGINX is the closest swap), table the Gateway API migration for next quarter.
- You're mid-migration on something else (cluster upgrade, mesh rollout, region split). Don't stack two migrations.

If you're running ingress-nginx in production with custom snippet annotations, multi-tenant clusters, or compliance requirements (SOC 2, PCI-DSS, HIPAA), the calculus is different. Unpatched CVEs in your data plane will become an audit finding fast.

## Where Ownkube fits

This is exactly the kind of work a platform layer should absorb. When we provision a cluster on your AWS or GCP account, ingress is set up on Gateway API from day one with Cloudflare on the edge for DDoS, bot, and scrape protection, and a ready-to-share preview domain per project. No Route 53 to configure, no controller to pick, no cert-manager to wire up, no annotation soup to translate.

If ingress-nginx EOL is the second migration you've had to plan this year, the answer isn't to keep getting better at migrations. It's to stop owning the parts of the stack where the answer is the same for everyone.

If you want a Heroku-style developer experience in your own cloud account, with the ingress and edge story already solved, [start on Ownkube](https://app.ownkube.io/signup). Free for teams on k3s mode (as of April 2026), and $5 per vCPU plus $1 per GB RAM per month on EKS mode when you scale. You pay AWS wholesale for the compute, we run the platform on top, and AWS Activate credits apply directly to your cloud bill.
Sources:

- [Ingress NGINX Retirement: What You Need to Know (Kubernetes blog)](https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/)
- [Ingress NGINX: Statement from the Kubernetes Steering and Security Response Committees](https://kubernetes.io/blog/2026/01/29/ingress-nginx-statement/)
- [ingress2gateway 1.0 release announcement](https://kubernetes.io/blog/2026/03/20/ingress2gateway-1-0-release/)
- [Gateway API Learning Lab on devoriales.com](https://devoriales.com/quiz/20/gateway-api-learning-lab-from-zero-to-hero)
- [gateway-api-bench (John Howard)](https://github.com/howardjohn/gateway-api-bench)

---

## How to deploy on AWS without hiring a DevOps engineer

> You need to ship on AWS. You don't have a platform team. Here's the stack that actually works, and the expensive mistakes to skip on the way there.

- Canonical: https://ownkube.io/blog/deploy-on-aws-without-devops-engineer
- Markdown: https://ownkube.io/blog/deploy-on-aws-without-devops-engineer.md
- Published: 2026-04-18
- Author: Ownkube team
- Category: How-To
- Tags: aws, deployment, startups, infrastructure, k3s

A senior DevOps or platform engineer in the US runs $180K to $250K all-in (base, equity, benefits, recruiting), with a 3 to 6 month lead time before they're shipping anything useful. That's roughly a year of seed-stage runway for one hire whose entire job is to keep your build pipelines green. For most early teams, it's the wrong first hire, but the work still has to happen: someone needs to deploy the app, rotate the secrets, restore the database when it goes sideways.

This post is about how to do that work without making the hire. Not "how to build a platform team," and not "how to avoid Kubernetes." The opposite of both: a deliberately small AWS stack a normal application engineer can run in the background, and the four or five expensive mistakes we see teams make on the way there. It's a how-to, not a buyer-decision post.
If you're still deciding whether to leave [Heroku](https://www.heroku.com) or [Render](https://render.com), start with [A Heroku alternative in your own AWS account](/blog/heroku-alternative-in-your-own-aws-account) or [Render vs your own AWS account](/blog/render-vs-aws-own-account) first, then come back here for the build-out. ## The mistakes we see every week Before the stack, the mistakes. These are the ones that eat a quarter of runway and leave you further from shipping than when you started. **Starting with [EKS](https://aws.amazon.com/eks/).** Someone reads a blog post about production Kubernetes, thinks "we should do it right," and spins up EKS with a VPC CNI, a managed node group, an ALB ingress controller, external-dns, cert-manager, and a Helm chart nobody on the team can debug. Three months later your app is still running on Heroku because nobody can get TLS working end to end. EKS is a good choice for a 30-engineer team with a platform lead. It is a terrible first step. (Full breakdown in [EKS vs k3s on AWS for startups](/blog/eks-vs-k3s-on-aws-for-startups).) **Picking [ECS](https://aws.amazon.com/ecs/) because it looks simpler than Kubernetes.** It is simpler, and that simplicity is the trap. ECS is AWS-proprietary end to end: no portable API, no Helm, no operators, no ecosystem that works anywhere else. The day you want to run your workload in another account, another cloud, or on a laptop, you rewrite every manifest. k3s gets you close to the same operational simplicity without the one-way door. **Terraforming everything from day one.** You sit down to declare your entire infrastructure in code before you've deployed anything. Two weeks in, you have a beautiful module structure and zero running services. Infrastructure as code is worth it, but not before you know what infrastructure you actually need. **Handrolling CI/CD in GitHub Actions.** A `deploy.yml` file that SSHs into an EC2 box, pulls the latest image, and runs `docker-compose up -d`. 
It works for three months. Then a deploy fails halfway through, leaves a container orphaned, and your API is down at 11 PM on a Friday with nobody who knows how to roll it back. **Putting production on a single EC2 instance with no plan for the day it dies.** It will die. AWS will send you a "scheduled maintenance" email, or the disk will fill up, or you'll push a bad deploy and lose the host. If you can't recreate the machine in under ten minutes, you do not have a production environment. You have a time bomb. **Treating RDS as the easy part.** RDS looks simple until you need to run a migration, restore a PITR backup to a new instance, or figure out why your connections are saturating. "Managed" does not mean "ignore it." You still need to know how to take a snapshot, how to restore one, and how to rotate credentials. ## The minimum stack Here is the smallest surface area that will actually hold up in production for a startup doing real traffic. **Compute: one or two [EC2](https://aws.amazon.com/ec2/) instances running [k3s](https://k3s.io).** k3s is a single-binary Kubernetes distribution from Rancher ([source on GitHub](https://github.com/k3s-io/k3s)). It is real Kubernetes (same API, same `kubectl`, same ecosystem) but it installs in under a minute, runs in about 500 MB of RAM, and does not require you to think about control plane sizing. For most startups, one m6i.large or m7g.large is enough to run your app, workers, and a few supporting services. Add a second node for HA when you have paying customers. **Database: Postgres, either on the same EC2 box or on [RDS](https://aws.amazon.com/rds/).** For a single-node setup with low traffic, running Postgres in-cluster on EC2 is fine. A managed Postgres operator handles backups, PITR, and failover, and you pay only for the EC2 volume. 
When the box starts feeling loaded or when you need multi-AZ, move to RDS with [PITR](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html) enabled, [Performance Insights](https://aws.amazon.com/rds/performance-insights/) on, and 7+ days of backups. Don't pay for multi-AZ RDS before you have revenue that justifies it. **Object storage: [S3](https://aws.amazon.com/s3/).** One bucket per environment. Enable versioning on anything you'd be upset to lose. **Secrets: [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) or [SSM Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html).** Not environment variables checked into a `.env`. Not hardcoded in your GitHub Actions secrets. Somewhere with rotation and an audit trail. **DNS, TLS, and edge protection: [Cloudflare](https://www.cloudflare.com).** Put your zone on Cloudflare's free plan and you get managed DNS, automatic TLS at the edge, DDoS protection, bot mitigation, and scrape protection, all for zero dollars. You can terminate TLS at Cloudflare and keep a simple `http://` listener inside the VPC, or terminate TLS in-cluster with [cert-manager](https://cert-manager.io) + Cloudflare DNS-01. [Route 53](https://aws.amazon.com/route53/) is a fine fallback if you're already committed to AWS-native DNS, but for most startups Cloudflare removes the DNS and WAF line items from your first-month AWS bill. **Load balancer: an [AWS Network Load Balancer](https://aws.amazon.com/elasticloadbalancing/network-load-balancer/) in front of your k3s node(s).** NLB is cheaper and simpler than ALB, and you let an ingress controller inside the cluster handle HTTP-layer routing. **Ingress: [Traefik](https://traefik.io) or [nginx-ingress](https://kubernetes.github.io/ingress-nginx/).** Either is fine. Traefik is what k3s ships with by default. 
**Logs: [CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html), or [Loki](https://grafana.com/oss/loki/) if you'd rather not touch CloudWatch.** Ship container stdout somewhere you can actually search it. **Observability: one [Prometheus](https://prometheus.io) + [Grafana](https://grafana.com), or a free [Grafana Cloud](https://grafana.com/products/cloud/) account.** You want to be able to answer "is the API slow right now" in 30 seconds. That's the stack. No service mesh. No GitOps tool. No multi-region. No Kubernetes operators you didn't need. When your second product engineer joins the company, they should be able to read the whole runbook in an afternoon. ## Why raw AWS gets painful You can do everything above by hand. Many startups do. The pain shows up slowly, not all at once. The first month it's fine. Then a pod gets OOMKilled, or a deploy half-succeeds, or you need to restore a database to 40 minutes ago, and nobody has the context to fix it without reading AWS docs for two hours. Then you try to spin up staging and realize your one EC2 box is a special snowflake: IAM roles added ad-hoc, security group rules nobody remembers, a DB password pasted into Slack. Reproducing it takes a week. Then a developer asks "how do I deploy my branch somewhere the designer can see it?" You don't have a good answer. Preview environments (the thing that made Heroku feel magical) require real infrastructure automation you don't have time to build. Then your first paying customer asks for an SLA and you realize you have one host, no failover, and no runbook for a bad deploy. None of these are AWS's fault. AWS gives you primitives. Turning primitives into a platform is the work, and the work is what you don't have headcount for. ## The lightweight path Here is the sequence that has worked for the startups we've watched get this right without hiring. **1. 
Get one EC2 box running k3s, pointed at a domain, terminating TLS.** Do this in an afternoon. Use [`k3sup`](https://github.com/alexellis/k3sup) or a 20-line cloud-init script. Put Traefik or nginx in front, point a Cloudflare DNS record at the NLB, and let Cloudflare terminate TLS at the edge. No cert-manager required for the first week. You now have a real Kubernetes cluster for about $60/month, with DDoS and bot protection already in front of it. **2. Deploy your app with a plain Deployment + Service + Ingress.** Three YAML files, maybe 80 lines total. Push a Docker image to [ECR](https://aws.amazon.com/ecr/). Apply the manifests. Your app is live on HTTPS. **3. Point RDS at it.** Create a Postgres RDS instance in the same VPC. Store the connection string in Secrets Manager. Mount it into the pod via the [Secrets Store CSI driver](https://secrets-store-csi-driver.sigs.k8s.io/), or sync it into a Kubernetes Secret with [`external-secrets`](https://external-secrets.io/). **4. Set up [GitHub Actions](https://docs.github.com/en/actions) to build, push, and `kubectl apply`.** Keep the workflow under 50 lines. Use an [OIDC role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html). Don't paste long-lived AWS keys into secrets. **5. Add a staging namespace.** Same cluster, different namespace, a `-staging` subdomain. This is the point where the cheap setup starts paying for itself. **6. Add a second node for HA.** When you have revenue. Not before. That's the lightweight path. A single engineer who is comfortable with AWS and Kubernetes basics can build it in two weeks. It will hold you to about 50 engineers and tens of thousands of requests per minute. Past that you need real investment, but by then you can afford it. ## What to automate first You cannot automate everything. Here's the priority order when you have limited engineering time. 
**Deploys.** If a deploy takes more than five minutes or requires a remembered command sequence, engineers start batching changes to avoid it, and ship buggier code. Automate `git push` to live in under five minutes. Nothing else matters more. **TLS.** Certificates that expire on a Sunday morning are a self-inflicted outage. cert-manager with Let's Encrypt, automatic renewal, and alerts on renewal failure. Set it up once, never touch it. **Logs.** If you cannot search yesterday's logs from one place in under 30 seconds, you are flying blind during incidents. Ship everything to CloudWatch Logs or Loki on day one. It is much harder to retrofit later. **Rollback.** Every deploy needs a one-command rollback. `kubectl rollout undo deployment/api` is fine. What is not fine is "rebuild the previous image, repush, redeploy, hope." Rehearse the rollback before you need it. **Backups.** RDS automated backups are on by default, but useless if you've never tested a restore. Once a quarter, restore the latest snapshot to a new instance and run your app against it. The first time, you'll find something broken. Better now than during an incident. Logs, alerts, dashboards, IaC, preview environments, cost monitoring. All of these matter. But the five above are the ones that keep you alive. Do them first, in order. ## Where this breaks down This path works until one of four things is true: 1. You have more than about 50 engineers and the cognitive overhead of "everyone knows the stack" stops scaling. 2. You have compliance requirements (HIPAA, FedRAMP) that demand a level of audit logging and network isolation a single k3s node can't easily provide. 3. You have traffic that genuinely needs multi-region. 4. You are running stateful workloads (databases, queues) inside Kubernetes, in which case you need someone whose full-time job is to care about them. Until then, a single k3s cluster on EC2 with RDS for data, S3 for blobs, and Secrets Manager for credentials is enough. 
Everyone who tells you otherwise is either selling you something or works somewhere big enough that their advice doesn't apply to you. ## The shortest path The setup above is what we install for you. Ownkube runs inside your own AWS account, sets up k3s, wires in TLS, logs, backups, deploy automation, and preview environments, and then gets out of the way. Your AWS credits still apply. The infrastructure is vanilla enough that if you ever want to fire us, everything keeps running. If you are staring at a migration from Heroku and the idea of hiring a platform team is not realistic, [connect your cloud](https://app.ownkube.io/signup) and we'll have you deployed on your own AWS account this week. No DevOps hire required. --- ## Render vs running in your own AWS account: cost, control, and when to switch > Render is great early on. But for teams that need real infrastructure control, AWS ownership, and predictable costs, running in your own account is the better long-term path. Here's how to decide. - Canonical: https://ownkube.io/blog/render-vs-aws-own-account - Markdown: https://ownkube.io/blog/render-vs-aws-own-account.md - Published: 2026-04-11 - Author: Ownkube team - Category: Engineering - Tags: render-alternative, aws, paas, platform-engineering, cost [Render](https://render.com) is a good platform. A lot of teams ship their first production app on it and never have a complaint. But if you're reading a post called "Render vs AWS," you're probably not one of those teams anymore. You've hit something: a bill that stopped making sense, a security review that asked where the data lives, a networking requirement Render doesn't cover. Now you're trying to figure out whether to stay or move. This post is the buyer-decision angle: a side-by-side comparison, the team sizes where each side wins, and the honest cost of owning AWS yourself. 
If you're already past the comparison and just want the architecture for a Heroku-style platform on AWS, see [A Heroku alternative that runs in your own AWS account](/blog/heroku-alternative-in-your-own-aws-account). If your real question is "how do I do AWS without hiring a DevOps engineer," that's a [different post](/blog/deploy-on-aws-without-devops-engineer).

Short answer: if you're pre-PMF or under five engineers, stay on Render. If you're a B2B SaaS team with growing AWS credits, a real compliance surface, or workloads that keep bumping into platform limits, running in your own AWS account is almost certainly the better long-term answer. The middle ground is a Render-like workflow on AWS infrastructure you own. The rest of this post walks through how to know which side of the line you're on.

## Render vs your own AWS account at a glance

| Dimension | Render | Your own AWS account |
|---|---|---|
| Setup speed | Minutes. Connect a repo, done. | Longer out of the box. Minutes with a platform layer on top. |
| Infrastructure control | Limited to what Render exposes. | Full access. Any AWS service, any region, any instance type. |
| Networking flexibility | Basic private networking, managed. | VPC peering, Transit Gateway, PrivateLink, Direct Connect, all available. |
| Security boundary ownership | Render's shared infrastructure. | Your VPC, your KMS keys, your IAM, your audit logs. |
| Pricing predictability | Per-service pricing, markup over raw compute. | Wholesale AWS pricing. Credits apply. Savings Plans apply. |
| Operational burden | Render handles most of it. | You own it, unless you put a platform layer on top. |
| Migration difficulty | Low from zero. Higher later, because you've built to their primitives. | Higher upfront. Lower in perpetuity because the infra is yours. |
| Compliance story | "Our PaaS provider is SOC 2." | "Our data runs in our VPC under our controls." |

The table is honest.
There are real reasons to stay on Render and real reasons to leave. The question is which set applies to you right now. ## Why do teams choose Render in the first place? It's worth stating clearly, because Render does something well that AWS on its own doesn't. **It's fast.** You connect a GitHub repo, pick a service type, and get a URL. No IAM roles, no VPCs, no load balancer to configure. For the first ninety days of a product, nothing AWS-native comes close to that velocity. **The defaults are good.** TLS certs, zero-downtime deploys, health checks, autoscaling, a managed Postgres that actually works, all wired up without reading a single piece of documentation. **No infra team required.** Two backend engineers can run a production business on Render without ever touching a cloud console. For seed-stage teams, that's not a nice-to-have. It's the difference between shipping and not shipping. **The developer experience is coherent.** One dashboard, one mental model, one place logs and metrics show up. Engineers onboard in an afternoon. If the above is everything you need, Render is a perfectly respectable home. Don't let anyone talk you into a migration for the sake of sophistication. ## When does Render stop fitting? Nobody leaves Render because they stopped liking it. They leave because the shape of the business changed. **The bill stops being proportional.** Render's pricing is fine at small scale. But once you're running a dozen services, a couple of background workers, and a Postgres with non-trivial storage, the per-service billing adds up faster than the equivalent EC2 + RDS would. On a growing app the delta can cover a senior hire. **Procurement asks harder questions.** When an enterprise customer's security team asks for a SOC 2 report, that's easy. When they ask where customer data is processed, whether you control the encryption keys, and whether you can support a customer-specific region, "we use Render" starts generating follow-up emails. 
**You need networking Render doesn't expose.** VPC peering to a partner's account. A PrivateLink endpoint to a vendor's API. A customer-dedicated VPC. A specific region for data residency. These are table stakes on AWS and either unavailable or awkward on shared PaaS. **Your AWS credits are stranded.** Most funded startups have six figures of [AWS Activate](https://aws.amazon.com/activate/) credits. If the platform bill goes to Render, those credits are just sitting there, depreciating with the runway. **The lock-in gets heavier over time.** Render services, Render disks, Render's deploy model. The longer you build on them, the more expensive leaving becomes. Teams that start thinking about portability at month six are in a much better position than teams that start thinking about it at month thirty-six. **The workload outgrows the platform.** GPUs, spot fleets, long-running batch jobs, specialized instance types, large private datasets. At some point you need a primitive Render doesn't have, and the workaround is worse than just running it on AWS. None of these are Render's fault. Shared PaaS always makes the same tradeoff: simplicity now for flexibility later. You're asking the question because the tradeoff stopped working. ## When running in your own AWS account makes sense Be specific here, because "move to AWS" is advice that costs teams real money when it's wrong. The honest answer depends heavily on team size, so we'll break it down. **If you're an indie builder or a 1 to 2 person team just trying things out**, don't migrate. Stay on Render, or if you really want AWS ownership, run k3s on a single EC2 box and call it done. You don't have the headcount to absorb the operational surface of a full AWS setup, and Render's simplicity is worth more to you than control right now. 
**If you're a small team (up to 20 engineers) without a dedicated platform owner**, running in your own AWS account is the right call if most of these are true: - You're selling B2B, and security questionnaires already mention AWS. - You have AWS credits you actually want to use. - You want the option to walk away from any vendor without rewriting deployments. - The Render bill is approaching the cost of a senior engineer. At this size, pick k3s on EC2 inside your own AWS account with a platform layer on top. You get the ownership story without the EKS complexity, and you can move to EKS later without replatforming your apps. **If you're 20+ engineers with production traffic or real compliance pressure** (SOC 2 Type II, HIPAA, ISO 27001, FedRAMP-adjacent work), pick a production-grade setup on AWS. Not as a hedge, as a commitment. At that size the managed control plane, IRSA, multi-AZ autoscaling, and full AWS observability pay for themselves. EKS becomes the default, not an upgrade. If three or four of the bullets above apply to you, the sooner you start the migration, the cheaper it is. ## The hidden tradeoff "Just move to AWS" is not free advice. Raw AWS gives you everything and organizes none of it. You have to decide on a cluster or no cluster, a CI/CD approach, a secrets strategy, a logging pipeline, a cost monitoring setup, and a dozen other choices before the first deploy. Teams that go straight from Render to hand-rolled AWS often spend a quarter not shipping product while they wire it up. This is where a lightweight platform layer on top of AWS earns its keep. The goal is to keep the Render-shaped developer experience (git push, managed database, preview environments, observability baked in) while the infrastructure itself lives in your account. Concretely, that usually means: - A small Kubernetes distribution like **[k3s](https://k3s.io)** on EC2 for stateless workloads. 
Boots in seconds, runs on small instances, much smaller operational surface than EKS. - **State handled sensibly.** For indie builders and small teams, a managed Postgres running in-cluster on your own EC2 (operated by an operator, not DIY) is a perfectly good default. As you grow into production scale, move to **managed AWS services** for stateful workloads: RDS for Postgres, ElastiCache for Redis, S3 for artifacts. The point is to not self-host what someone else already operates well. - **A deployment control plane** that turns `git push` into a running service without anyone writing a manifest. - **A clear upgrade path.** When the team grows and you do want EKS or a full platform team, the workloads don't move. Only the control plane underneath them. This is deliberately boring. The point is not to showcase infrastructure. The point is to keep the workflow your engineers already like, running on infrastructure you control. ## When should you stay on Render? To be fair (and because the wrong migration is worse than no migration) here's when Render is still the right answer: - **You're pre-PMF.** Every hour spent on infrastructure is an hour not spent on the thing that might actually work. - **Your team is three or fewer.** You don't have the headcount to absorb even a well-abstracted AWS setup. - **Your workload is web-shaped and small.** A couple of services, one database, no meaningful compliance surface. - **Your horizon is short.** If the next six months are about customer discovery, not scaling, stay where you are. - **You don't have AWS credits or a strong AWS preference.** Going to AWS just to go to AWS is not a reason. "Stay on Render" is a completely respectable answer. We tell indie devs and 1 to 2 person teams this all the time. ## A decision checklist for switching If you're still on the fence, count how many of these apply to you: - [ ] Our Render bill is larger than one senior engineer's monthly comp. - [ ] We have unused AWS credits we could apply. 
- [ ] Customers are asking where their data runs or demanding their own region.
- [ ] We're working on SOC 2, HIPAA, or a similar compliance framework.
- [ ] We've needed networking features Render doesn't support (VPC peering, private links, specific regions).
- [ ] We expect to be on AWS in two years regardless, and would rather start now than migrate later.
- [ ] We want a real exit plan from our current platform vendor.

Three or more checked boxes means it's probably time to start planning. Five or more means you're already past the point where the move is cheap.

## Conclusion

Render is good at what it does. For early teams, it's one of the fastest ways to get something real into production. But the same simplicity that makes it great at the start becomes a ceiling later: on cost, on control, on compliance, on what the architecture is allowed to look like.

For teams that hit that ceiling, running in their own AWS account is usually the right long-term answer. The honest catch is that raw AWS introduces its own overhead, and not every team has the time or headcount to absorb it. The middle ground is a managed, opinionated platform layer on top of AWS you own. That's what most of these migrations actually look like. We walk through that architecture in detail in [A Heroku alternative that runs in your own AWS account](/blog/heroku-alternative-in-your-own-aws-account), and the EKS-vs-k3s call inside it in [EKS vs k3s on AWS for startups](/blog/eks-vs-k3s-on-aws-for-startups).

If you want Heroku/Render-like deploys in your own AWS account, [Ownkube](https://app.ownkube.io/signup) is built for that. Connect your AWS account, get a git-push workflow, managed Postgres, preview environments, and cost controls, with the infrastructure, the data, and the exit in your name. Fifteen minutes to your first deploy.

Happy to [talk through your situation](https://app.ownkube.io/signup) if you're not sure which side of the line you're on.
--- ## EKS vs k3s on AWS for startups: cost, complexity, and when to choose each > A direct comparison of EKS and k3s on AWS for small teams who have to ship. Real bills, real failure modes, and the line where k3s stops being enough. - Canonical: https://ownkube.io/blog/eks-vs-k3s-on-aws-for-startups - Markdown: https://ownkube.io/blog/eks-vs-k3s-on-aws-for-startups.md - Published: 2026-04-04 - Author: Ownkube team - Category: Engineering - Tags: kubernetes, eks, k3s, aws, startups, cost You have an app to ship. Maybe a few. You're on AWS because that's where the credits are and where the auditors want your data. Somebody on the team said "Kubernetes" out loud and now you're trying to decide between EKS and k3s before the week is out. We'll save you the essay. Here's our take: - **If you're an indie builder or a 1 to 2 person team just trying the product out**, run k3s on a single EC2 box. Don't even open the EKS console. The $73/month control plane fee alone is larger than your entire compute bill, and none of the things EKS is good at are things you need yet. - **If you're a small team (up to 20 engineers) without a dedicated platform owner**, start on k3s. Ship product. You'll know when to graduate. - **If you're 20+ engineers with production traffic, multi-AZ requirements, or real compliance pressure**, pick EKS. Not as a hedge, as a commitment. At that size the managed control plane, IRSA, and AWS-native autoscaling pay for themselves. EKS is usually the right destination, not the right starting point, and the reason has very little to do with Kubernetes itself. Here's what actually matters when you make this call. ## The thing people get wrong Both are Kubernetes. Same API, same manifests, same `kubectl apply`. If someone hands you a chart that works on EKS, it will almost certainly work on k3s without modification, and vice versa. This isn't a technology choice. It's an operations choice. 
What you're really picking is **who owns the control plane and how much of AWS you need to glue in**.

- **[EKS](https://aws.amazon.com/eks/)**: AWS runs the control plane. You pay $73/month per cluster for it, and you glue in [IRSA](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html), [VPC CNI](https://github.com/aws/amazon-vpc-cni-k8s), the [AWS Load Balancer Controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/), [EBS CSI](https://github.com/kubernetes-sigs/aws-ebs-csi-driver), and whichever flavor of autoscaler you prefer. Upgrades happen on AWS's calendar.
- **[k3s](https://k3s.io)**: You run the control plane on an EC2 instance. It starts in under a minute, ships with [Traefik](https://traefik.io) and a working storage class, and upgrades when you decide. Nothing AWS-specific unless you want it.

The rest of this post is the operational fallout of that choice.

## Side-by-side, the useful version

| | EKS | k3s on EC2 |
|---|---|---|
| Control plane cost | $73/month/cluster | $0 |
| Minimum viable footprint | 1 control plane + 2 nodes in 2 AZs | 1 EC2 instance |
| Time to a working cluster | 15-25 min with `eksctl`, longer first time | Under 60 seconds |
| Networking | VPC CNI (real VPC IPs, counts against subnet) | Flannel VXLAN (overlay, doesn't touch VPC IPs) |
| Ingress | ALB Controller + one ALB per group | Traefik built in |
| Storage | EBS / EFS / FSx CSI, IAM required | `local-path` out of the box; add EBS CSI if you want |
| Pod IAM | IRSA (clean, audited) | Instance profile, or you bring a solution |
| Upgrades | AWS-driven, rolling managed node groups | You pick the hour; risk is yours |
| Biggest sharp edge | VPC CNI IP exhaustion, IRSA permission dance | Embedded etcd quorum loss, single-node backups |
| Honest team size | 15+ engineers, platform owner forming | 2-20 engineers, nobody owns infra full-time |

Anything not on this table is a rounding error for a startup.
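To make the "VPC CNI IP exhaustion" row concrete, here is a minimal Python sketch of the commonly documented pod-capacity formula for the VPC CNI without prefix delegation: ENIs × (IPv4 addresses per ENI − 1) + 2. The ENI limits in the lookup table are assumptions taken from EC2's published per-instance limits, so check them against current AWS documentation before relying on them. `ENI_LIMITS` and `max_pods` are names made up for this example.

```python
# Sketch: default pod capacity per node under the AWS VPC CNI
# (secondary-IP mode, no prefix delegation).
# Assumption: ENI limits copied from EC2's published per-instance values.
ENI_LIMITS = {
    # instance type: (max ENIs, IPv4 addresses per ENI)
    "t3.medium": (3, 6),
    "t3.large": (3, 12),
    "t3.xlarge": (4, 15),
}


def max_pods(instance_type: str) -> int:
    """Return the default pod ceiling for one node of this instance type."""
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    # The primary IP of each ENI is not handed to pods; the +2 accounts
    # for pods that run on the host network and need no VPC IP of their own.
    return enis * (ips_per_eni - 1) + 2


for instance_type in ENI_LIMITS:
    print(instance_type, max_pods(instance_type))
# t3.medium 17
# t3.large 35
# t3.xlarge 58
```

A `t3.large` comes out at 35 pods, which is the ceiling a cluster tends to hit during a scale-up rather than in testing. Flannel on k3s uses an overlay network, so this per-node cap does not apply there.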
### A note on [ECS](https://aws.amazon.com/ecs/)

ECS often shows up as "Kubernetes but simpler." The simplicity is real; the tradeoff is lock-in. Task definitions, services, and the deployment model are AWS-proprietary: no portable API, no Helm, no ecosystem that transfers. The day you want to run the same workloads elsewhere, you rewrite every manifest. Both EKS and k3s give you the same portable Kubernetes API; ECS gives you a one-way door.

## What the bill really looks like

Take a workload we see often: one web service, two workers, Postgres on RDS, Redis on ElastiCache, a staging environment, and room for a few PR preview environments. Call it ~8 vCPU / 16GB of production pods plus ~4 vCPU / 8GB for staging and previews.

All prices below are approximate on-demand rates in `us-east-1` as of April 2026. Your numbers will vary with region, reserved capacity, and traffic profile.

### k3s on EC2

- 3 × `t3.xlarge` on-demand (4 vCPU / 16GB each): **$299**
- 300GB EBS gp3: **$24**
- 1 NAT gateway: **$32 + traffic**
- 1 ALB in front of Traefik: **$22**
- Data transfer (moderate): **$20**

**~$400/month** for plenty of headroom. You can fit 5-10 services, full staging, and rotating previews on that without thinking about capacity.

### EKS

- Control plane: **$73**
- 2 × `t3.large` prod nodes (for the 2-AZ story): **$121**
- 1 × `t3.medium` staging node: **$30**
- 1 × `t3.large` for previews: **$60**
- 300GB EBS gp3: **$24**
- 1 NAT gateway: **$32 + traffic**
- 2-3 ALBs (ALB controller per IngressGroup, typical): **$44-$66**
- Data transfer (higher, because VPC CNI loves inter-AZ chatter): **$25**

**~$410-$440/month.**

Pricing looks close until you add the labor, the part nobody puts on the slide:

- First EKS setup: 1 to 3 engineer-days.
- Every EKS upgrade: a half-day of drain and verify, quarterly.
- First VPC CNI IP exhaustion: half a day figuring out that `t3.large` nodes top out at 35 pods by default.
- ALB controller version pinning across upgrades: a ticket, every time.
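As a quick check on the two bills, the line items can be added up directly. This is a small Python sketch using only the figures listed above; the dictionary and variable names are made up for the example, and NAT traffic is left out because the post does not put a number on it.

```python
# Monthly line items copied from the two bills above (USD, approximate).
K3S_BILL = {
    "3 x t3.xlarge": 299,
    "300GB EBS gp3": 24,
    "NAT gateway (excl. traffic)": 32,
    "ALB in front of Traefik": 22,
    "data transfer": 20,
}

EKS_BILL = {  # everything except the ALBs, which are quoted as a range
    "control plane": 73,
    "2 x t3.large prod": 121,
    "t3.medium staging": 30,
    "t3.large previews": 60,
    "300GB EBS gp3": 24,
    "NAT gateway (excl. traffic)": 32,
    "data transfer": 25,
}
EKS_ALB_RANGE = (44, 66)  # 2-3 ALBs

k3s_total = sum(K3S_BILL.values())
eks_low = sum(EKS_BILL.values()) + EKS_ALB_RANGE[0]
eks_high = sum(EKS_BILL.values()) + EKS_ALB_RANGE[1]

print(f"k3s: ${k3s_total}")            # k3s: $397
print(f"EKS: ${eks_low}-${eks_high}")  # EKS: $409-$431
```

The sums land at roughly $400 for k3s and $410 to $430 for EKS before NAT traffic, which is in line with the rounded totals above. They leave out the labor items in the list above.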
Those hours are real salary spent on AWS glue rather than product. On k3s, the comparable surface area is "keep an AMI up to date" and "snapshot etcd nightly."

## Getting from zero to shipping

**EKS** (plan on a week if nobody's done this before):

1. VPC with public/private subnets across 2 to 3 AZs, plus NAT and route tables.
2. IAM roles for the cluster, node group, and IRSA.
3. [`eksctl create cluster`](https://eksctl.io), then wait 15 to 25 minutes.
4. Install the AWS Load Balancer Controller, EBS CSI driver, and [Karpenter](https://karpenter.sh) (each its own IAM dance).
5. Tune CoreDNS, metrics-server, and VPC CNI (prefix delegation, warm pool settings).
6. Wire up Ingress with [ACM](https://aws.amazon.com/certificate-manager/) and [Route53](https://aws.amazon.com/route53/).

The piece that bites first-timers is always IRSA and the OIDC provider. One typo in a trust policy and pods silently fail to assume roles.

**k3s** (under a day):

1. Launch an EC2 instance.
2. `curl -sfL https://get.k3s.io | sh -`.
3. Point DNS at it.
4. Deploy.

That is not a marketing simplification. Traefik is running, there's a default StorageClass, and kubeconfig is at `/etc/rancher/k3s/k3s.yaml`. You can be serving traffic in a lunch break. For production, add an ASG of three k3s servers with embedded etcd for HA, an NLB out front, and a nightly etcd snapshot.

## The failures you'll actually hit

Forget feature checklists. These are the incidents that will eat your weekend.

**On EKS, the usual suspects:**

- **Pods stuck Pending with `no IP addresses available`.** VPC CNI assigns real VPC IPs to every pod. On a `t3.large` the default ceiling is 35 pods. You hit it during an autoscaling event, not during testing. Fix is prefix delegation, which requires a node recycle.
- **IRSA silently not working.** The pod's `serviceAccountName`, the service account annotation, the trust policy, the OIDC provider, and the role policy: five things have to line up. One mismatch and you get `AccessDenied` with no obvious source.
- **ALB controller version skew after an EKS upgrade.** The ALB controller has its own compatibility matrix. Forget to bump it and ingress just stops reconciling.
- **Node group upgrade drains in the wrong order.** PDBs not set, pods evicted faster than they start elsewhere. 30-second outage during a "safe" upgrade.

**On k3s, the usual suspects:**

- **Embedded etcd quorum loss.** You were running HA on three `t3.medium` servers. Two got replaced by ASG inside five minutes. Cluster is read-only. Recovery is `k3s server --cluster-reset` from a known-good snapshot. You want that snapshot script working before you need it.
- **Local-path PVs disappearing with the node.** The default StorageClass is per-node local disk. Great for caches, terrible for your single-replica Postgres. Switch stateful workloads to RDS or add EBS CSI.
- **k3s version upgrade breaking Traefik.** k3s bundles Traefik, and major k3s upgrades can bump Traefik's CRDs. Pin the Traefik Helm values or disable the bundled version and run your own.
- **Single-node cluster dies with the instance.** If you started all-in-one to move fast and forgot to migrate to HA, a spot interruption or AZ blip is a full outage. Migrate before you're depending on it in production.

Both sets are learnable. The EKS failures are more about fighting AWS primitives. The k3s failures are more about owning the operational basics yourself.

## When k3s is enough

Start here if most of these are true:

- Under 20 engineers and nobody's job title is "platform."
- Stateless web services and workers, with state in a managed Postgres (either in-cluster on EC2 with an operator, or RDS if you prefer AWS-native), plus ElastiCache and S3 as needed.
- Single-region is fine for now.
- Compliance doesn't demand AWS-managed control plane components.
- You'd rather spend the next two sprints on product than on Kubernetes.

k3s is not a toy.
It's CNCF-certified Kubernetes, it powers Rancher's own product, and there are companies running it on bare metal fleets larger than most SaaS startups will ever see. Using it isn't a compromise; it's picking the distribution that doesn't punish small teams.

## When EKS earns its keep

Move to EKS (or start there, if you're already past the line) when any of these are real:

- **Audit pressure.** If SOC 2 or HIPAA readiness hinges on "AWS patches the control plane," use EKS.
- **Fine-grained pod IAM.** Per-pod credentials for S3, SQS, Bedrock are much cleaner with IRSA than instance profiles or sidecars.
- **~50+ services or ~300+ pods.** k3s handles it, but upgrades and capacity get real. Karpenter on EKS is genuinely better at that scale.
- **Multi-AZ or multi-region HA as a hard requirement.** k3s HA is possible; EKS HA is the default.
- **A dedicated platform hire or team.** Once someone owns infra full-time, they'll want managed node groups and IRSA.
- **GPU pools, Graviton spot fleets, Bottlerocket, Windows nodes.** EKS wires these in natively.

If none of these describe your next 12 months, you're paying EKS tax for a future you might not have.

## The migration nobody sells you

Here's what actually happens when you outgrow k3s and move to EKS.

Your Deployments, Services, ConfigMaps, Secrets, Jobs, CronJobs: unchanged. Your Helm charts: unchanged. The diff is at the edges:

- Ingress: Traefik annotations become ALB controller annotations (10-30 lines of YAML per service).
- Storage: `local-path` PVCs move to `gp3`; stateful workloads you already had on RDS need no change.
- IAM: instance-profile or sidecar-based access becomes IRSA. A real piece of work, but mechanical.
- Autoscaling: single-ASG k3s becomes Cluster Autoscaler or Karpenter.

Plan a week or two of cleanup, not a rewrite. And critically, k3s-to-EKS is a much shorter migration than "no Kubernetes to EKS" would have been if you'd held off for 18 months.
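To illustrate how mechanical the Ingress part of that diff can be, here is a hedged Python sketch that rewrites an Ingress manifest (already loaded as a dict) from Traefik annotations to AWS Load Balancer Controller annotations. The annotation values, the `alb` IngressClass name, the group name, and the example manifest are all assumptions for illustration; a real service will need its own choices for scheme, target type, and certificates.

```python
import copy

TRAEFIK_PREFIX = "traefik.ingress.kubernetes.io/"

# Assumed defaults for the example; adjust per service.
ALB_DEFAULTS = {
    "alb.ingress.kubernetes.io/scheme": "internet-facing",
    "alb.ingress.kubernetes.io/target-type": "ip",
}


def traefik_to_alb(ingress: dict, group_name: str) -> dict:
    """Return a copy of the Ingress with Traefik annotations swapped for ALB ones."""
    out = copy.deepcopy(ingress)  # leave the caller's manifest untouched
    metadata = out.setdefault("metadata", {})
    old = metadata.get("annotations") or {}
    # Drop Traefik-specific annotations, keep everything else as-is.
    annotations = {k: v for k, v in old.items() if not k.startswith(TRAEFIK_PREFIX)}
    annotations.update(ALB_DEFAULTS)
    # Sharing one group name lets several Ingresses share a single ALB.
    annotations["alb.ingress.kubernetes.io/group.name"] = group_name
    metadata["annotations"] = annotations
    out.setdefault("spec", {})["ingressClassName"] = "alb"
    return out


# Hypothetical manifest for a service called "api".
traefik_ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {
        "name": "api",
        "annotations": {
            "traefik.ingress.kubernetes.io/router.entrypoints": "websecure",
            "example.com/owner": "backend",
        },
    },
    "spec": {"ingressClassName": "traefik", "rules": []},
}

alb_ingress = traefik_to_alb(traefik_ingress, group_name="prod")
print(alb_ingress["metadata"]["annotations"])
```

The rules, hosts, and backends are untouched; only the annotations and the ingress class change, which is why this part of the move is counted in lines of YAML rather than in weeks.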
For a deeper walkthrough of the k3s-on-AWS setup most startups land on, see [How to deploy on AWS without hiring a DevOps engineer](/blog/deploy-on-aws-without-devops-engineer). ## The real recommendation If you're choosing today on AWS with under 20 engineers, start on k3s. Ship product. Re-evaluate when a specific thing on the EKS list above becomes true, not sooner. You'll save money and keep your team's attention on the business, and you won't paint yourself into a corner because the migration path is honest and well-worn. [Ownkube](https://ownkube.io) runs on top of either cluster and handles the part neither EKS nor k3s gives you out of the box: the Heroku-style developer flow. Git push to deploy, a preview environment per PR, one-click Postgres and Redis in your VPC, plain-English pod crash explanations, and automatic right-sizing. Start on k3s in your own AWS account, switch to EKS when your business asks for it, keep the same workflow the whole way through. If you'd rather skip the setup, [connect your cloud](https://app.ownkube.io/signup) and we'll have you deploying in the time it took to read this. --- ## A Heroku alternative that runs in your own AWS account > Keep the Heroku workflow (git push, preview envs, managed Postgres) in an AWS account you own. Here's how teams do it and what changes when they do. - Canonical: https://ownkube.io/blog/heroku-alternative-in-your-own-aws-account - Markdown: https://ownkube.io/blog/heroku-alternative-in-your-own-aws-account.md - Published: 2026-03-21 - Author: Ownkube team - Category: Engineering - Tags: heroku-alternative, aws, paas, platform-engineering Somewhere around your third production incident on [Heroku](https://www.heroku.com), whether it's the 2 AM dyno that OOM-killed with no explanation or the compliance officer who asked where customer data actually lives, you start pricing alternatives. 
For most teams we talk to, the honest answer isn't "move to [Render](https://render.com)" or "move to [Railway](https://railway.com)." Those are lateral moves. Same shared infrastructure, same markup, same awkward conversation with your auditor next quarter. The real answer is: keep the workflow you loved (git push, addons, review apps, one mental model), but run it inside an AWS account you own. That's what this post is about. Specifically: how teams replace each piece of the Heroku experience with something that lives in their own VPC, what they keep, what they trade away, and the two reference architectures that actually work. If you're trying to compare PaaS-to-PaaS instead, we have a separate post on [Render vs your own AWS account](/blog/render-vs-aws-own-account); if you're trying to ship on AWS without hiring a platform engineer, [start here](/blog/deploy-on-aws-without-devops-engineer). ## What people loved about Heroku in the first place It's worth stating plainly, because a lot of the industry has forgotten. Heroku was good. For a long time, it was the best developer experience in the business. You pushed git, you got a URL, you could add a Postgres, and you could go back to writing the product. Specifically, what made it special: - **Git push as a deploy primitive.** No build server to babysit, no container registry to configure. The VCS was the API. - **Opinionated defaults.** You didn't pick a base image or a log driver. Someone else had already argued about those so you didn't have to. - **Managed datastores without a ticket.** `heroku addons:create heroku-postgresql` and you had a production DB with backups in 30 seconds. - **Review apps.** Every PR got a URL. Designers reviewed real builds. Sales demoed features that hadn't merged yet. - **A single mental model.** Dynos, add-ons, releases. A new engineer was productive in an afternoon. That package (velocity without decision fatigue) is what teams actually miss when they leave. 
They don't miss Heroku. They miss that shape of platform. ## Why teams leave shared PaaS anyway Nobody leaves Heroku because they stopped liking it. They leave because it stops fitting. **The bill stops making sense.** A 2x Performance-L dyno runs around $500/month on Heroku's published rates (April 2026). The equivalent on bare EC2 is closer to $150. On a small app the delta is a rounding error; on a growing app it's a full engineering hire. **Compliance conversations get harder.** When your customer's security review asks where their data is processed, "Salesforce-owned shared infrastructure" is a slower answer than "our VPC in us-east-1." SOC 2, HIPAA, and most enterprise procurement forms want the second one. **Your AWS credits are stranded.** Funded startups often carry six figures of [AWS Activate](https://aws.amazon.com/activate/) credits sitting unused because their platform bills go somewhere else. That's runway you're setting on fire. **You hit architectural ceilings.** Private networking to a data lake, GPU workers, [VPC peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html), long-running batch jobs, [spot instances](https://aws.amazon.com/ec2/spot/) for background work. These are either impossible or painful on shared PaaS. (If you're weighing Render specifically, we wrote up the tradeoffs in [Render vs your own AWS account](/blog/render-vs-aws-own-account).) **The escape plan is the real problem.** Even if none of the above mattered today, the day you outgrow the platform you're stuck rewriting how every service deploys, how every secret is injected, how every cron runs. There's no graceful exit. So teams start sketching an alternative. And the sketch is almost always the same: _give me Heroku, but running in an AWS account I control._ ## What actually changes when the account is yours This is the part that gets glossed over in most comparison posts. Owning the AWS account is not a cosmetic change. 
It changes five concrete things. **1. The data never leaves your perimeter.** Your Postgres, your object storage, your Redis, all provisioned in your VPC, in your region, under your KMS keys. Auditors stop asking follow-up questions. DPAs get shorter. **2. You pay wholesale.** There's no per-dyno markup between you and EC2. A c7g.large is a c7g.large. Savings Plans apply. Spot applies. Credits apply. The unit economics look like an AWS bill because they are one. **3. You inherit AWS's surface area.** Need a private link to an RDS instance a customer gave you access to? Fine. Need a specific instance family for a vendor's licensing? Fine. You're no longer constrained to the intersection of features your PaaS happened to expose. **4. The exit is free.** If you decide to replace the platform layer, the workloads don't move. The databases don't move. The DNS doesn't move. You're disconnecting a control plane, not migrating. **5. You become the operator.** This is the honest downside. Someone has to patch nodes, rotate certs, upgrade the orchestrator, watch for CVEs. The whole point of a Heroku-style platform in your own AWS account is to make that someone a piece of software, not a team of three. ## Two reference architectures: start small, scale later Ownkube ships in two modes, and the right architecture depends on where your traffic and team size actually are today. Both are deliberately boring. We're not reinventing AWS, we're just making it feel like Heroku. ### Mode 1: k3s mode (the default for indie and small teams) ![Ownkube k3s-mode reference architecture on AWS: Cloudflare providing DNS, preview domain, and DDoS/bot/scrape protection, routing HTTPS traffic to a single k3s cluster on EC2 with managed Postgres on the same EC2 node, S3 for artifacts, Secrets Manager and CloudWatch, driven by the Ownkube SaaS control plane outside the customer account.](/diagrams/heroku-alternative-reference.png) This is the shape most teams land on first. 
Single k3s cluster, a handful of EC2 nodes (mixed spot + on-demand in an ASG), Postgres running on the same EC2 managed by Ownkube, S3 for artifacts. That's it.

**Why [k3s](https://k3s.io) here, not [EKS](https://aws.amazon.com/eks/).** EKS has a $73/month control-plane bill before you've launched a pod. k3s is a single binary, boots in seconds, has a smaller CVE footprint, and runs happily on t4g/c7g with 1 to 2 GB of RAM overhead per node. For indie developers, side projects, and small-team dev environments, k3s removes the AWS tax without removing real Kubernetes underneath. (Full technical comparison: [EKS vs k3s on AWS for startups](/blog/eks-vs-k3s-on-aws-for-startups).)

**Why Managed Postgres on your EC2, not [RDS](https://aws.amazon.com/rds/).** For low-traffic apps, RDS is a tax. A db.t4g.micro RDS instance starts at about $12/month; the equivalent Postgres colocated on an EC2 node you're already paying for is free on top of the compute. Ownkube operates it like a managed service (backups, PITR, failover) but the bytes sit on your EBS volumes. When the workload grows past what one box can carry, you flip to RDS with one click. No migration tooling required.

**Why not [ECS](https://aws.amazon.com/ecs/).** ECS is tempting because it's "simpler than Kubernetes," but the simplicity is bought with lock-in. Task definitions, services, and the deploy model are AWS-proprietary: no portable API, no `kubectl` anywhere else, and none of the Kubernetes ecosystem (Helm, operators, ingress controllers) transfers. The day you want to move a workload to another cloud or another account, you rewrite every manifest. That's the exact opposite of the portability we're optimizing for here. k3s gives you most of ECS's operational simplicity without the one-way door.

**Why a mixed ASG.** Stateless web workloads and background jobs run on spot by default, with on-demand capacity reserved for the Postgres node.
For most apps this takes 40 to 70% off the compute bill without any code changes.

### Mode 2: EKS mode (the scale tier)

Once you're past a single-node workload (multi-AZ uptime requirements, multiple teams deploying in parallel, heavy observability needs, or enterprise procurement asking about managed control planes), Ownkube flips the same application into EKS mode. Same git push, same preview URLs, same Cloudflare-backed domain. Different substrate.

![Ownkube EKS-mode reference architecture on AWS: Cloudflare handling DNS, custom and preview domains, plus DDoS/bot/scrape protection, routing HTTPS through an ALB into a multi-AZ EKS cluster with managed control plane and spot + on-demand node groups, backed by managed RDS Postgres (Multi-AZ, PITR), ElastiCache, S3, Secrets Manager (IRSA), and CloudWatch/OTel, all driven by the Ownkube SaaS control plane outside the customer account.](/diagrams/heroku-alternative-eks-scale.png)

The upgrade is additive, not a rewrite: EKS replaces k3s, RDS + ElastiCache replace the in-cluster Postgres, an ALB sits behind Cloudflare's edge in front of the cluster, and CloudWatch + OTel get wired in end-to-end. Pods read from Secrets Manager using [IRSA](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) for workload identity. You keep your application manifests.

**Why the control plane is split.** In both modes, the Ownkube control plane (the thing that shows you deploys, handles git pushes, orchestrates previews) lives in our SaaS. It holds no application data, no database credentials, no secrets. Everything that actually matters runs in your account. If our control plane disappeared tomorrow, your apps would keep serving traffic.

## Why Cloudflare, not Route 53

A detail worth calling out: the ingress layer in both modes is [Cloudflare](https://www.cloudflare.com), not [Route 53](https://aws.amazon.com/route53/). This is a deliberate choice and it's one of the reasons the developer experience feels closer to Heroku than to raw AWS.
**You don't set up DNS.** When you connect an AWS account, Ownkube provisions a subdomain on a Cloudflare zone we operate, something like `your-app.ownkube.app`. Your preview environments get their own hostnames under it. You can send those URLs to customers, teammates, or stakeholders the same afternoon you connect your cloud. No domain purchase, no nameserver hand-off, no "waiting for DNS to propagate." You can attach your own custom domain whenever you're ready.

**You get an edge WAF for free.** Cloudflare's free tier includes DDoS protection, bot fight mode, and scrape protection. For an indie developer whose first viral launch is a real risk factor, that's meaningful. You inherit a CDN with unmetered mitigation before you've written a single security rule. On raw AWS, the equivalent is [Shield](https://aws.amazon.com/shield/) + [WAF](https://aws.amazon.com/waf/), and the bill shows up on the first invoice.

**TLS stops being your problem.** Cloudflare terminates TLS at the edge with managed certificates that rotate automatically. In k3s mode we run a simple HTTP listener inside the VPC and let Cloudflare handle public-facing TLS; in EKS mode the ALB terminates a second time with ACM for strict-origin validation. Either way, no cert-manager pager alerts on a Sunday.

Route 53 is still the right answer for some teams (if you're already committed to AWS-native DNS for Route 53 health checks, private hosted zones for internal service discovery, or complex failover routing), and Ownkube won't stop you from bringing your own zone. But for most teams, the first-month question "where do I put my DNS?" doesn't need to be a decision. Cloudflare is the answer, it's free, and it's already wired up.

The workflow on top of this is the part that should feel familiar. Connect a repo. A push to `main` triggers a build. The resulting image is promoted through environments. A pull request spins up a preview with its own DB fork and its own subdomain, and tears down on merge.
Logs, metrics, and errors flow into one place. Secrets are injected at runtime from Secrets Manager. Nobody on your team writes a deployment manifest.

## Who this setup is actually for

Be honest about this, because the wrong fit is expensive for everyone.

**This is a strong fit if:**

- You have 10 to 100 engineers and a growing AWS bill.
- You've outgrown (or are about to outgrow) Heroku, Render, or Railway on cost, compliance, or capability.
- You have AWS credits you want to use.
- You don't have a dedicated platform team and don't plan to hire one for another 12 to 24 months.
- You care about portability. You want the option to walk away from any vendor, including us, without rewriting your deployment story.

**This is probably not a fit if:**

- You're a solo developer or pre-PMF team shipping a weekend project. Heroku or Railway will get you there faster.
- You already have a mature internal platform team with strong opinions. You don't need us; you need [Crossplane](https://www.crossplane.io/) and time.
- You require on-prem or sovereign cloud deployments today. We're AWS-first, with GCP and Azure coming.
- Your workload isn't web-shaped (heavy HPC, low-latency trading, specialized hardware). Those deserve a purpose-built stack.

If you're in the sweet spot, the calculus is usually straightforward: the platform costs less than one platform engineer, gets you back to a Heroku-shaped workflow, and leaves the infrastructure in your name.

## Where Ownkube fits

You can build the architecture above yourself. People do. It takes six to twelve months, one to three senior engineers, and ongoing maintenance that never really ends. That's a reasonable choice if platform engineering is a strategic moat for your business.

For everyone else, Ownkube is that architecture, turned into a product.
You connect an AWS account, we bootstrap k3s and the supporting services, and you get the Heroku workflow (git push, managed Postgres, preview environments, autoscaling, cost controls) running on infrastructure you own. The Incident, Cost, and Scaling agents cover the parts nobody enjoys: translating crashloops into plain English, catching memory leaks before they page someone, flagging cost anomalies the day they start.

Pricing (as of April 2026): k3s mode is free for teams, aimed at indie builders and small-team dev environments, and EKS mode is $5/vCPU/month plus $1/GB RAM/month when you scale. Your AWS bill goes to AWS, at AWS rates, with your credits applied. No markup on compute, no call-us-for-a-quote pages, and on k3s mode every dollar of AWS Activate credit applies straight to wholesale EC2.

If you've been sketching this architecture on a whiteboard for a quarter, or you're one Heroku invoice away from forcing the conversation with your CTO, [connect your cloud](https://app.ownkube.io/signup) and deploy your first app. Fifteen minutes, and the infrastructure is yours on the other side of it.

---

## Kubernetes events disappear after an hour. Here's how to fix that.

> Kubernetes events tell you exactly what's going wrong in your cluster, but they vanish after 60 minutes. This guide walks through exporting them to Elasticsearch, Slack, Loki, and 30+ other destinations.

- Canonical: https://ownkube.io/blog/kubernetes-event-monitoring-complete-guide
- Markdown: https://ownkube.io/blog/kubernetes-event-monitoring-complete-guide.md
- Published: 2026-03-07
- Author: Ownkube team
- Category: Engineering
- Tags: kubernetes, monitoring, observability, events, helm

If you've ever run `kubectl get events` and wished those events didn't vanish after an hour, keep reading. Kubernetes events tell you exactly what's happening in your cluster (pod scheduling failures, image pull errors, OOM kills), but by default they disappear from etcd after 60 minutes.
This guide walks through why that's a problem and how to export events to the tools your team already uses.

## What are Kubernetes events?

Every time something happens in your cluster, Kubernetes creates an [**Event**](https://kubernetes.io/docs/reference/kubernetes-api/cluster-resources/event-v1/) object. A pod gets scheduled, a container crashes, a volume fails to mount. Events are first-class API resources, same as Pods or Deployments.

Here's what one looks like:

```yaml
apiVersion: v1
kind: Event
metadata:
  name: my-app-pod.17a6f8e2c
  namespace: production
reason: BackOff
message: "Back-off restarting failed container"
type: Warning
involvedObject:
  kind: Pod
  name: my-app-pod
  namespace: production
count: 15
firstTimestamp: "2026-03-07T10:00:00Z"
lastTimestamp: "2026-03-07T10:15:00Z"
```

The useful fields:

- **Reason**: Machine-readable cause (`BackOff`, `FailedScheduling`, `Unhealthy`, `Pulling`, `Killing`)
- **Type**: Either `Normal` or `Warning`
- **Message**: Human-readable description of what happened
- **InvolvedObject**: The resource this event is about
- **Count**: How many times this event has occurred

## Why the defaults are a problem

Three things make stock Kubernetes events nearly useless for debugging:

### Events expire after 1 hour

By default the API server retains events for **60 minutes** (the `--event-ttl` default). A pod crash at 2 AM is gone by 9 AM. You're left guessing.

### No search or aggregation

`kubectl get events` gives you a flat list, unsorted by default. Flags like `-A`, `--sort-by`, and `--field-selector type=Warning` help a little, but there's no full-text search, no aggregation, and no time correlation across resources. Just a wall of text.

### No alerting

Kubernetes won't tell you when something goes wrong. A `FailedScheduling` event could fire hundreds of times before anyone notices, unless someone happens to be watching.
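To make the aggregation gap concrete, here is a minimal Python sketch of the kind of roll-up `kubectl` doesn't give you out of the box: total occurrences of `Warning` events per namespace and reason. It assumes input shaped like the JSON from `kubectl get events -A -o json`; the sample events and the `warning_counts` helper are made up for illustration and are not part of any Ownkube tool.

```python
import json
from collections import Counter

# Sample shaped like the output of `kubectl get events -A -o json`.
# The events themselves are invented for illustration.
SAMPLE = """
{"items": [
  {"type": "Warning", "reason": "BackOff", "count": 15,
   "involvedObject": {"kind": "Pod", "namespace": "production", "name": "my-app-pod"}},
  {"type": "Warning", "reason": "FailedScheduling", "count": 3,
   "involvedObject": {"kind": "Pod", "namespace": "staging", "name": "worker-1"}},
  {"type": "Normal", "reason": "Pulling", "count": 1,
   "involvedObject": {"kind": "Pod", "namespace": "production", "name": "my-app-pod"}}
]}
"""


def warning_counts(event_list: dict) -> Counter:
    """Sum occurrence counts of Warning events per (namespace, reason)."""
    totals = Counter()
    for ev in event_list.get("items", []):
        if ev.get("type") != "Warning":
            continue
        ns = ev.get("involvedObject", {}).get("namespace", "")
        # `count` can be missing on some events; treat that as one occurrence.
        totals[(ns, ev.get("reason", ""))] += ev.get("count") or 1
    return totals


if __name__ == "__main__":
    for (ns, reason), n in warning_counts(json.loads(SAMPLE)).most_common():
        print(f"{ns}\t{reason}\t{n}")
```

A script like this only sees events that still exist in the cluster, so it does nothing about the one-hour retention window.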
## Export events to external systems

The fix is an event exporter: a lightweight component that watches the Kubernetes event stream via the API server and forwards events to external destinations (sinks) like [Elasticsearch](https://www.elastic.co/elasticsearch), Slack, [Loki](https://grafana.com/oss/loki/), [Kafka](https://kafka.apache.org), or webhooks.

The pipeline:

![Kubernetes event exporter pipeline: the API server streams events to a SharedInformer-backed watcher, which passes them through routing rules that filter, match, and drop, then fans out to sinks like Slack, Elasticsearch, Loki, and Kafka/webhooks.](/diagrams/event-exporter-pipeline.png)

The exporter uses a [SharedInformer](https://pkg.go.dev/k8s.io/client-go/tools/cache#SharedInformer) to watch events without polling, applies configurable routing rules, and delivers them to one or more sinks.

## Setting up the exporter

The [Ownkube Kubernetes Events Exporter](https://github.com/ownkube/kubernetes-events-exporter) is a fork we maintain. The upstream project by Opsgenie (later Resmo) has been dormant since 2023 and has some bugs that cause silent event loss. More on that later.

### Install with [Helm](https://helm.sh)

```bash
helm install event-exporter \
  oci://ghcr.io/ownkube/charts/kubernetes-events-exporter \
  --namespace monitoring \
  --create-namespace
```

### Basic configuration

The exporter takes a YAML config (deployed as a ConfigMap). A minimal config that sends everything to stdout:

```yaml
logLevel: info
route:
  routes:
    - match:
        - receiver: "stdout"
receivers:
  - name: "stdout"
    stdout: {}
```

## Routing

You can filter and direct events to different sinks based on any event property: namespace, event type, reason, involved object kind, etc.
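As a mental model for how routing behaves, the Python sketch below treats each rule as a set of regex conditions on event fields plus a target receiver, checks drop rules first, and then fans the event out to every rule that matches. This is a conceptual illustration only, not the exporter's actual implementation: the `Rule` class, the `route` function, the flat event dictionaries, and the choice of full-string regex matching are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One routing rule: regex conditions on event fields plus a receiver name."""
    receiver: str
    conditions: dict = field(default_factory=dict)  # field name -> regex

    def matches(self, event: dict) -> bool:
        # Every condition must match; a rule with no conditions matches everything.
        return all(
            re.fullmatch(pattern, str(event.get(name, ""))) is not None
            for name, pattern in self.conditions.items()
        )


def route(event: dict, drop: list, rules: list) -> list:
    """Return the names of the receivers an event should be delivered to."""
    # Drop rules are checked first, so dropped events never reach a sink.
    if any(Rule("", conditions).matches(event) for conditions in drop):
        return []
    return [rule.receiver for rule in rules if rule.matches(event)]


# Hypothetical rule set: warnings go to Slack, everything goes to Elasticsearch.
DROP = [{"type": "Normal", "reason": "LeaderElection"}]
RULES = [
    Rule("slack-warnings", {"type": "Warning"}),
    Rule("elasticsearch"),
]

if __name__ == "__main__":
    warning = {"type": "Warning", "reason": "BackOff", "namespace": "production"}
    noise = {"type": "Normal", "reason": "LeaderElection", "namespace": "kube-system"}
    print(route(warning, DROP, RULES))  # ['slack-warnings', 'elasticsearch']
    print(route(noise, DROP, RULES))    # []
```

The point of the sketch is the ordering: dropping happens before matching, and one event can land in several sinks at once.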
### Route warning events to Slack

```yaml
route:
  routes:
    - match:
        - receiver: "slack-warnings"
          match:
            kind: ".*"
            type: "Warning"
    - match:
        - receiver: "elasticsearch"
receivers:
  - name: "slack-warnings"
    slack:
      token: "${SLACK_TOKEN}"
      channel: "#k8s-alerts"
      message: |
        *{{ .Reason }}* in {{ .InvolvedObject.Namespace }}/{{ .InvolvedObject.Name }}
        ```{{ .Message }}```
  - name: "elasticsearch"
    elasticsearch:
      hosts:
        - "https://elasticsearch:9200"
      index: "k8s-events"
```

### Drop noisy events

Some events are just noise. Drop them before they hit any sink:

```yaml
route:
  drop:
    - type: "Normal"
      reason: "LeaderElection"
    - namespace: "kube-system"
      reason: "ScalingReplicaSet"
  routes:
    - match:
        - receiver: "elasticsearch"
```

### Route by namespace

Production events go to PagerDuty. Staging events go to Slack. Everything goes to Elasticsearch.

```yaml
route:
  routes:
    - match:
        - receiver: "pagerduty-webhook"
          match:
            namespace: "production"
            type: "Warning"
    - match:
        - receiver: "slack-staging"
          match:
            namespace: "staging"
    - match:
        - receiver: "elasticsearch"
```

## Sink examples

The exporter supports 30+ sinks. Here are the ones most people use.
### Elasticsearch

Store all events for long-term analysis and Kibana dashboards:

```yaml
receivers:
  - name: "elasticsearch"
    elasticsearch:
      hosts:
        - "https://elasticsearch.monitoring:9200"
      index: "k8s-events"
      username: "${ES_USER}"
      password: "${ES_PASSWORD}"
      useEventID: true
```

### Loki

Feed events into [Grafana Loki](https://grafana.com/oss/loki/) for log-style querying:

```yaml
receivers:
  - name: "loki"
    loki:
      url: "http://loki.monitoring:3100/loki/api/v1/push"
      streamLabels:
        app: "kubernetes-events"
      basicAuth:
        username: "${LOKI_USER}"
        password: "${LOKI_PASSWORD}"
```

### Slack

```yaml
receivers:
  - name: "slack"
    slack:
      token: "${SLACK_TOKEN}"
      channel: "#k8s-events"
      message: |
        *{{ .Type }}* event on *{{ .InvolvedObject.Kind }}* `{{ .InvolvedObject.Name }}`
        Namespace: `{{ .InvolvedObject.Namespace }}`
        Reason: `{{ .Reason }}`
        Message: {{ .Message }}
```

### Kafka

```yaml
receivers:
  - name: "kafka"
    kafka:
      brokers:
        - "kafka-broker:9092"
      topic: "kubernetes-events"
      tls:
        enable: true
```

### Webhook

Send events to any HTTP endpoint:

```yaml
receivers:
  - name: "webhook"
    webhook:
      endpoint: "https://your-api.example.com/k8s-events"
      headers:
        Authorization: "Bearer ${WEBHOOK_TOKEN}"
        Content-Type: "application/json"
```

### Prometheus

Turn events into [Prometheus](https://prometheus.io) metrics for alerting with [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/):

```yaml
receivers:
  - name: "prometheus"
    prometheus:
      labels:
        - "reason"
        - "type"
        - "involvedObject.kind"
        - "involvedObject.namespace"
```

## Events worth watching

Not all events matter equally. These are the ones that usually point to real problems.
### Pod lifecycle issues

| Reason | Type | What it means |
|--------|------|---------------|
| `BackOff` | Warning | Container keeps crashing and restarting |
| `Unhealthy` | Warning | Liveness or readiness probe failed |
| `OOMKilling` | Warning | Container exceeded memory limits |
| `FailedMount` | Warning | Volume couldn't be mounted |
| `ErrImagePull` | Warning | Can't pull the container image |

### Scheduling problems

| Reason | Type | What it means |
|--------|------|---------------|
| `FailedScheduling` | Warning | No node has enough resources |
| `Preempting` | Normal | Pod is being preempted for a higher-priority pod |
| `NotTriggerScaleUp` | Normal | Cluster autoscaler did not add a node for a pending pod |

### Node issues

| Reason | Type | What it means |
|--------|------|---------------|
| `NodeNotReady` | Warning | Node is unhealthy |
| `EvictionThresholdMet` | Warning | Node is running low on resources |
| `Rebooted` | Warning | Node was rebooted |

## Practical tips

### Drop noise early

Leader election events, successful image pulls, and routine scaling events generate a lot of volume. Drop them at the top of your routing tree. Your storage bill and your Slack channel will thank you.

### Don't hardcode secrets

The exporter supports `${ENV_VAR}` syntax. Use it with Kubernetes Secrets:

```yaml
env:
  - name: SLACK_TOKEN
    valueFrom:
      secretKeyRef:
        name: event-exporter-secrets
        key: slack-token
```

### Separate sinks for separate jobs

- Elasticsearch or Loki for historical analysis
- Slack for real-time awareness of warnings
- Webhook or PagerDuty for on-call alerting
- Prometheus for dashboards and SLO tracking

### Run with leader election

If you're running multiple replicas, enable leader election so only one instance processes events:

```yaml
leaderElection:
  enabled: true
  leaderElectionID: "event-exporter-leader"
```

### Monitor the exporter itself

It exposes Prometheus metrics at `/metrics`.
Set up alerts for high error rates on sink delivery, event processing lag, and exporter pod restarts.

## Why we maintain this fork

The [original `kubernetes-event-exporter`](https://github.com/resmoio/kubernetes-event-exporter) by Opsgenie has been dormant since 2023. We ran into bugs in production and started fixing them. The fork now includes:

- A fix for silent event loss. The upstream only handled add and delete operations, missing updates entirely.
- Accurate event age calculation using `max(EventTime, FirstTimestamp, LastTimestamp)` instead of just `LastTimestamp`, which was causing stale events to get re-exported.
- Loki improvements: basic auth, stream label templating, TLS transport fixes.
- A Prometheus sink for converting events directly into metrics.
- Elasticsearch v8 compatibility (the v8 API broke the upstream).
- SNS FIFO support with proper message group IDs.

The full code and Helm chart are on [GitHub](https://github.com/ownkube/kubernetes-events-exporter).

## Getting started

1. Install the exporter via Helm into your monitoring namespace
2. Start with stdout + one sink (Slack or Elasticsearch) to validate routing works
3. Add drop rules for the noisy stuff
4. Expand sinks as you need them

That's it. Kubernetes events go from "gone in 60 minutes" to searchable, filterable, and alertable.

## If you'd rather skip the plumbing

If you're running k3s or EKS in your own AWS account, [Ownkube](https://ownkube.io) wires event routing in by default: warnings into Slack, everything into your logging backend, crashloops explained in plain English, without installing a chart or writing a routing YAML. It runs on your cluster in your AWS account, so the events never leave your perimeter.

[Connect your cloud](https://app.ownkube.io/signup) and event monitoring is one of the things you stop thinking about.