Replatforming Airbyte: From Developer Laptop to EKS
Originally posted on LinkedIn on July 25, 2025
In early-stage engineering teams, it's natural for tools to start out simple — often running on a single developer machine, just to get things moving. That’s how our Airbyte setup began: quick to spin up, good enough for testing connectors, and easy to iterate on.
But as our team grew and data pipelines became more embedded in how we operated, we knew it was time to treat Airbyte like real infrastructure. That meant moving beyond local environments and into a scalable, secure, and repeatable deployment.
We migrated Airbyte OSS to Amazon EKS, using Helm and AWS-native services like S3 and IAM Roles for Service Accounts (IRSA). Our goal wasn’t to fix something broken, but to build on what was working and make it production-ready—without sacrificing developer velocity.
This post shares how we did it, what we learned, and what you might want to consider if you’re operationalizing Airbyte (or any similar open-source tool) in a small but growing cloud-native team.
📦 From Local Dev to Platform Service
The starting point was a typical OSS install: Airbyte running via Docker Compose, with internal Postgres and MinIO providing storage.
For a single user, this setup works. But as sync jobs became more important — and more cloud-integrated — the limitations grew obvious:
- Only one person could run or inspect jobs
- Jobs stopped if the machine rebooted or slept
- The setup didn’t integrate cleanly with AWS services like KMS or IAM
We needed to make Airbyte a first-class internal service — which meant replatforming it onto our Kubernetes infrastructure.
🚀 Helm-First on EKS
We deployed Airbyte using the official Helm chart on Amazon EKS, with all configuration defined in a single values.yaml file.
This made it easy to:
- Version control the deployment
- Propagate changes to dev, stage, or prod clusters
- Configure each Airbyte component with its own replica count, resources, and ingress
We enabled the core stack:
- webapp, server, and worker components
- Internal Postgres with persistent volumes
- An Ingress controller (in our case, Tailscale) for secure access
All of this is managed by Helm, making deployment and upgrades consistent and CI/CD-friendly.
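To make that concrete, here is a trimmed sketch of the kind of values.yaml we keep in version control. The per-component layout follows the chart, but treat the details as illustrative: replica counts and resource figures are placeholders, and the exact ingress keys can shift between chart versions.

# values.yaml (sketch): per-component overrides for the Airbyte Helm chart
webapp:
  replicaCount: 1
  ingress:
    enabled: true
    className: tailscale        # exposed through our Tailscale ingress controller

server:
  replicaCount: 1
  resources:
    requests: { cpu: 250m, memory: 512Mi }
    limits: { cpu: "1", memory: 1Gi }

worker:
  replicaCount: 2
  resources:
    requests: { cpu: 500m, memory: 1Gi }

postgresql:
  enabled: true                 # bundled Postgres backed by a persistent volume

Rolling a change out to another cluster is then just a helm upgrade against the same chart with that environment's values file.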
☁️ Why We Abandoned MinIO
MinIO ships by default in Airbyte’s OSS Helm chart — and it’s fine if you're storing state internally.
But as soon as we configured an S3 destination connector, we ran into a hard blocker.
Airbyte’s connector expects actual AWS S3 — not a MinIO alias. The authentication flows, endpoint discovery, and SDK behavior are incompatible. The result? Jobs fail, and you lose the ability to use one of Airbyte’s most valuable features.
So we turned off MinIO and went all-in on real S3:
global:
  storage:
    type: s3
    bucket:
      log: <BUCKET_NAME>
      state: <BUCKET_NAME>
      workloadOutput: <BUCKET_NAME>
    s3:
      region: <AWS_REGION>
      authenticationType: instanceProfile

minio:
  enabled: false
This cleared the path for both internal object storage and S3 destination connectors — no hacks required.
🔐 Evolving IAM: From Node Role to IRSA
To integrate Airbyte with S3 securely, we started with the simplest approach: attaching S3 access to the EKS node role. This allowed Airbyte to access S3 without static credentials—a good start, but too broad.
Later, we switched to IAM Roles for Service Accounts (IRSA) to scope permissions only to Airbyte pods.
We configured:
- An IRSA role (airbyte-irsa-role-prod) in Terraform.
- A trust policy bound to the airbyte-admin service account.
- A Helm annotation to bind that IAM role:
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT>:role/airbyte-irsa-role-prod
This let us remove reliance on the NodeInstanceRole and grant least-privilege S3 + KMS access directly to Airbyte.
⚙️ Tuning Resources for Job Efficiency
One thing that stood out during testing: Airbyte jobs default to higher resource allocations than expected — likely to support heavy syncs across large datasets.
But for our workloads, that was overkill.
We explicitly set jobs.resources to more modest requests and limits.
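As a rough sketch, the override looks something like this. The numbers are illustrative rather than chart defaults, and the exact nesting (under global in recent chart versions) can vary depending on the chart you run:

global:
  jobs:
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 1Gi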
This helped reduce noise in our Kubernetes scheduler, especially when running multiple concurrent jobs. If you're deploying Airbyte in an environment with resource quotas or shared node pools, this is a worthwhile tweak.
🧱 Stable, Auditable, Cloud-Integrated
With the final setup:
- Airbyte OSS runs fully inside EKS.
- Storage is backed by real S3, with KMS support.
- IAM is scoped via IRSA — no static secrets anywhere.
- Ingress is handled through our standard controller (Tailscale).
- PostgreSQL is deployed in-cluster with persistent volume claims.
It’s reproducible, secure, and fits neatly into how we manage infrastructure.
✅ Key Lessons for Platform Teams
If you're supporting internal data or integration teams:
- MinIO doesn't work with S3 destination connectors — we learned this the hard way and switched to real S3
- Start simple with IAM, then evolve to IRSA as your use case grows
- Use Helm to its full extent — the Airbyte chart is surprisingly complete
- Tune job resources — the defaults may be more than you need
By replatforming Airbyte this way, we made it stable, scalable, and production-ready — not just for one user, but for the whole company.