Building and Deploying a Full-Stack Ecommerce Platform: From Zero to Production on AWS EKS
By Rajasekhar Reddy | Senior DevOps Engineer | April 2026
For years, I wanted to build an ecommerce application. I started multiple times and never finished. The scope was always too big, the technology choices paralyzing, the motivation fading after the first few bugs.
Then I decided to stop planning and start shipping. I built a complete ecommerce platform with real payments, deployed it to AWS EKS with full observability, and made it production-ready — all in a few focused sessions.
This is the story of how I built Raj Store, what architectural decisions I made, and why I chose a modular monolith over microservices. Whether you're a developer looking for inspiration, a DevOps engineer curious about full-stack deployment, or a business stakeholder evaluating technology approaches — there's something here for you.
Raj Store — 100 products, real images, search with filters, hover animations
The Problem I Was Solving
Every ecommerce tutorial teaches you the basics — a product list, a cart, maybe a checkout page. But none of them show you what it takes to go from "it works on localhost" to "it's running in production with real payments, real observability, and real deployment pipelines."
I wanted to bridge that gap. Not just build an app, but deploy it the way a real engineering team would.
What I Built
The Application
Raj Store is a full-featured ecommerce platform built with Python (FastAPI) on the backend and server-rendered HTML with Jinja2 + TailwindCSS on the frontend. No React, no separate frontend repo — just clean server-side rendering with HTMX for interactivity.
Core features:
User authentication with JWT tokens stored in secure HTTP-only cookies
Product catalog with 100+ real products seeded from DummyJSON
Full-text search with category filters, price range, and sorting
Shopping cart with quantity management
Real payment processing via Stripe Checkout (test mode)
Complete order lifecycle: Pending → Confirmed → Shipped → Delivered
Amazon-style order numbers (ORD-20260408-A7B2C9)
Admin panel with dashboard, order management, and Stripe refunds
Product reviews and 5-star ratings (purchase-gated — only buyers can review)
Wishlist for saving products
International shipping addresses with 26-country dropdown
Email notifications for registration, order confirmation, and status updates
Product detail page with image, pricing, stock status, Add to Cart, wishlist, and customer reviews
Shopping cart with subtotals, total calculation, and Stripe checkout button
Architecture: Why I Chose a Modular Monolith
This is probably the most important decision I made, and it goes against what most tutorials tell you.
When I started planning, my instinct was to split everything into microservices: auth-service, product-service, cart-service, order-service, payment-service. Seven separate applications, seven databases, seven deployment pipelines.
Then I asked myself: who is this for?
I'm a solo developer building a portfolio project. I don't have a team of 50 engineers who need to deploy independently. I don't have millions of requests per second that require different scaling strategies for different components. I don't have organizational boundaries that mandate service separation.
What I do have is one application that needs to work reliably, be easy to debug, and be impressive to demo.
A modular monolith was the right call. Here's the actual code structure:
```
app/
├── main.py             # Entry point, router registration
├── config.py           # Environment-based settings (Pydantic)
├── database.py         # SQLAlchemy engine + session
├── dependencies.py     # Auth dependencies
├── otel.py             # OpenTelemetry setup (graceful degradation)
├── models/             # SQLAlchemy ORM models
│   ├── user.py
│   ├── product.py
│   ├── cart.py
│   ├── order.py
│   ├── review.py
│   └── ...
├── schemas/            # Pydantic request/response schemas
├── services/           # Business logic layer
│   ├── product_service.py
│   ├── cart_service.py
│   ├── order_service.py
│   ├── search_service.py   # OpenSearch integration
│   └── email_service.py
└── routers/            # HTTP endpoints
    ├── auth.py
    ├── pages.py        # Server-rendered HTML pages
    ├── product.py
    ├── cart.py
    ├── order.py        # Stripe checkout + payments
    ├── admin.py
    └── ...
```
Each "module" (auth, products, cart, orders) has its own model, schema, service, and router — but they share one database, one process, and one deployment. This is exactly how Shopify (Rails monolith), GitHub (Rails monolith), and Stack Overflow (.NET monolith) are built.
The key insight: you can always extract a microservice later when you have real traffic patterns showing which module needs independent scaling. Starting with microservices before you have that data is premature optimization at the architectural level.
The Tech Stack
| Layer | Technology | Why |
|---|---|---|
| Backend | FastAPI (Python 3.11) | Async, type-safe, auto-generated OpenAPI docs |
| ORM | SQLAlchemy 2.x | Industry standard, works with SQLite and PostgreSQL |
| Templates | Jinja2 + TailwindCSS | Server-rendered, no JavaScript framework needed |
| Interactivity | HTMX | Add-to-cart without page reload, minimal JS |
| Payments | Stripe Checkout | PCI compliant, hosted payment page, real refunds |
| Database | PostgreSQL 16 | Production-grade, running as StatefulSet on EKS |
| Search | OpenSearch | Fuzzy full-text search with relevance ranking |
| Cache | Redis 7 (ready, not yet enabled) | Product listing cache, session store |
| Tracing | OpenTelemetry + Jaeger | Distributed tracing across every HTTP request |
| Metrics | Prometheus + Grafana | Request rate, latency percentiles, pod health |
| Analytics | Apache Superset | Revenue dashboards, sales analytics, customer insights |
Deployment: Production-Grade EKS
This is where my DevOps background made the difference. The application runs on AWS EKS with a full production deployment pipeline.
Infrastructure (Terraform)
Everything is Infrastructure as Code. One terraform apply creates:
VPC with private subnets across 3 availability zones
EKS cluster with managed node groups (5 × t3.small)
IAM roles with IRSA (IAM Roles for Service Accounts) — no long-lived credentials anywhere
EBS CSI driver for persistent volume provisioning
ECR for container image storage
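The Terraform layout can be sketched roughly as follows, using the community terraform-aws-modules. Every value below (names, CIDRs, versions, node counts) is illustrative, not the project's actual configuration:

```hcl
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name            = "raj-store"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name    = "raj-store"
  cluster_version = "1.29"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnets

  eks_managed_node_groups = {
    default = {
      instance_types = ["t3.small"]
      min_size       = 3
      desired_size   = 5
      max_size       = 6
    }
  }

  # IRSA: pods assume IAM roles through the cluster's OIDC provider,
  # so no long-lived AWS credentials exist in the cluster
  enable_irsa = true
}
```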
CI/CD Pipeline
The deployment pipeline uses GitHub Actions with OIDC authentication to AWS — no access keys stored anywhere.
```
Developer pushes code to main
        ↓
GitHub Actions triggers (OIDC → AWS)
        ↓
Docker image built and pushed to ECR
        ↓
ArgoCD detects new image tag in git
        ↓
ArgoCD syncs Helm chart to EKS
        ↓
Rolling update with zero downtime
```
GitHub Actions build pipeline — OIDC auth, Docker build, ECR push
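The OIDC-authenticated build job can be sketched as a workflow like the one below. The role ARN, account ID, repository name, and region are placeholders — the key detail is `id-token: write`, which lets the job exchange GitHub's OIDC token for short-lived AWS credentials:

```yaml
name: build-and-push
on:
  push:
    branches: [main]

permissions:
  id-token: write   # required for OIDC federation to AWS
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Exchange the GitHub OIDC token for temporary AWS credentials —
      # no access keys stored anywhere in the repo
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
          aws-region: us-east-1

      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        run: |
          IMAGE="${{ steps.ecr.outputs.registry }}/raj-store:${{ github.sha }}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
```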
GitOps with ArgoCD
ArgoCD watches the Helm chart in the git repository. Any change to the chart (new image tag, config change, resource limit adjustment) is automatically applied to the cluster. No manual kubectl apply needed.
ArgoCD application view showing all Kubernetes resources in sync
Secrets Management
Application secrets (database passwords, Stripe API keys, JWT signing keys) are stored in AWS Secrets Manager and synced to Kubernetes via External Secrets Operator. The git repository contains zero secrets — only references to which keys to fetch.
```
AWS Secrets Manager (raj-store/prod)
        ↓  (ESO syncs every 1 hour)
Kubernetes Secret (raj-store-secrets)
        ↓  (mounted as env vars)
FastAPI Pod
```
Non-sensitive configuration (service hostnames, ports, feature flags) lives in a Kubernetes ConfigMap managed by the Helm chart.
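The Secrets Manager → Kubernetes sync is declared with an ExternalSecret resource. A sketch, assuming a ClusterSecretStore named aws-secrets-manager has already been configured for the cluster:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: raj-store-secrets
spec:
  refreshInterval: 1h            # ESO re-reads from AWS every hour
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager
  target:
    name: raj-store-secrets      # the Kubernetes Secret ESO creates
  dataFrom:
    - extract:
        key: raj-store/prod      # pulls every key from this AWS secret
```

This manifest is safe to commit: it names which secret to fetch but contains no secret values.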
Observability: Seeing Everything
This is where the project goes from "deployed app" to "production-ready platform." I instrumented the application with three pillars of observability: traces, metrics, and analytics.
Distributed Tracing (OpenTelemetry + Jaeger)
Every HTTP request is automatically traced — from the NGINX ingress through the FastAPI route handler, into every SQL query, and out to external API calls (like Stripe).
Jaeger showing distributed traces — every request broken down into spans with timing
The OpenTelemetry integration is auto-instrumented. I added ~30 lines of setup code, and every FastAPI route, every SQLAlchemy query, and every outbound HTTP call is traced automatically. Health check and readiness probe endpoints are excluded from traces to reduce noise.
When debugging a slow checkout, I can see exactly where time is spent:
```
POST /orders/checkout-form              2.3s total
├── get_current_user                      2ms
├── get_all_cart_items (SQL)             18ms
├── get_user_addresses (SQL)              8ms
└── stripe.checkout.Session.create     2200ms  ← Stripe API call
```
Metrics (Prometheus + Grafana)
Prometheus scrapes metrics from the FastAPI application via the /metrics endpoint (provided by prometheus-fastapi-instrumentator). Grafana dashboards show:
Request rate per endpoint
Response time percentiles (p50, p95, p99)
Error rate percentage
Pod CPU and memory utilization
HTTP status code distribution
Grafana dashboard — request rate, latency percentiles, error rate, pod resources
Business Analytics (Apache Superset)
Superset connects directly to the PostgreSQL database (via a read-only user for security) and provides SQL-powered dashboards for business metrics:
Revenue trends over time
Top-selling products by revenue
Order status distribution (pending vs confirmed vs shipped vs delivered)
Customer geography
Average order value
Product rating distribution
Superset analytics dashboard — revenue, top products, order status, customer insights
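Behind a chart like the revenue trend is ordinary SQL against the orders table. A sketch — the table and column names (`orders`, `created_at`, `total_amount`, `status`) are assumptions about the schema, not taken from the repo:

```sql
-- Daily revenue from paid orders (table/column names are assumed)
SELECT
    date_trunc('day', created_at) AS order_day,
    SUM(total_amount)             AS revenue,
    COUNT(*)                      AS order_count
FROM orders
WHERE status IN ('confirmed', 'shipped', 'delivered')
GROUP BY date_trunc('day', created_at)
ORDER BY order_day;
```

Running queries like this through a read-only PostgreSQL user, as described above, means a mistake in Superset can never mutate store data.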
Stripe Integration: Real Payments
The payment flow uses Stripe Checkout — a hosted payment page that handles card validation, 3D Secure, and PCI compliance without any card data touching my server.
Checkout flow:
User clicks "Proceed to Checkout" in the cart
App creates a Stripe Checkout Session with line items
User is redirected to Stripe's hosted payment page
After payment, Stripe redirects back with a session ID
App verifies payment status with Stripe API
Order is created with status "confirmed" and the Stripe payment intent is saved
Confirmation email is sent
Refunds are handled through the admin panel. The admin clicks a "Refund" button, which calls the Stripe Refund API with the saved payment intent. The order status updates to "refunded" and the customer receives an email notification. Real money flows back through Stripe — verified in the Stripe Dashboard.
Stripe Checkout page — real payment processing in test mode
Order confirmation page with Amazon-style order number
What I'd Do Differently
1. Start with PostgreSQL from day one. I started with SQLite for local development and migrated to PostgreSQL for production. While SQLAlchemy abstracts most differences, there were small quirks (like ALTER TABLE behavior) that caused unnecessary debugging. Starting with PostgreSQL via Docker Compose would have been smoother.
2. Add Alembic migrations early. I used Base.metadata.create_all() for table creation and manual ALTER TABLE for schema changes. This works for a solo project but doesn't scale. Alembic would have given me versioned, repeatable migrations from the start.
3. Use Stripe webhooks instead of redirect-based verification. The current flow relies on the success redirect to verify payment. If the user's browser crashes after payment but before the redirect, the order isn't created even though the card was charged. Webhooks solve this by having Stripe notify the server directly — independent of the browser.
For DevOps Engineers
If you're a DevOps engineer looking to level up your application development skills, this project covers:
How authentication actually works in web apps (JWT, cookies, middleware)
Why database schema design matters (foreign keys, indexes, relationships)
How payment gateways integrate (Stripe's redirect-based flow)
What "full-stack observability" means in practice (traces + metrics + analytics)
How to deploy a real application, not just infrastructure
When microservices make sense vs when a monolith is the right choice
The biggest insight: once you've built and deployed your own application, you understand developers' problems from the inside. That makes you a dramatically better DevOps engineer.
For Business Stakeholders
If you're evaluating this as a technology approach:
The modular monolith architecture keeps development velocity high while maintaining code quality
GitOps deployment means every change is auditable, reversible, and automated
Full observability means issues are detected and debugged in minutes, not hours
Scaling is straightforward — increase replica count for the app, upgrade node sizes for the database
The same codebase serves both the HTML storefront and a REST API (future mobile app ready)
Try It Yourself
Browse products, search, filter by category
Create an account and place a test order (use card 4242 4242 4242 4242)
Check your order history
Leave a product review
Source code: github.com/rajasekhar-cloud25/ecommerce-api
Rajasekhar Reddy is a Senior DevOps Engineer with 7+ years of experience across AWS, Azure, GCP, and hybrid cloud environments. He holds CKA and Azure Administrator certifications. Connect on rajasekharcloud.com.

