Building and Deploying a Full-Stack Ecommerce Platform: From Zero to Production on AWS EKS
By Rajasekhar Reddy | Senior DevOps Engineer | April 2026
For years, I wanted to build an ecommerce application. I started multiple times and never finished. The scope was always too big, the technology choices paralyzing, the motivation fading after the first few bugs.
Then I decided to stop planning and start shipping. I built a complete ecommerce platform with real payments, deployed it to AWS EKS with full observability, and made it production-ready — all in a few focused sessions.
This is the story of how I built Raj Store, what architectural decisions I made, and why I chose a modular monolith over microservices. Whether you're a developer looking for inspiration, a DevOps engineer curious about full-stack deployment, or a business stakeholder evaluating technology approaches — there's something here for you.
Raj Store — 100 products, real images, search with filters, hover animations
The Problem I Was Solving
Every ecommerce tutorial teaches you the basics — a product list, a cart, maybe a checkout page. But none of them show you what it takes to go from "it works on localhost" to "it's running in production with real payments, real observability, and real deployment pipelines."
I wanted to bridge that gap. Not just build an app, but deploy it the way a real engineering team would.
What I Built
The Application
Raj Store is a full-featured ecommerce platform built with Python (FastAPI) on the backend and server-rendered HTML with Jinja2 + TailwindCSS on the frontend. No React, no separate frontend repo — just clean server-side rendering with HTMX for interactivity.
Core features:
User authentication with JWT tokens stored in secure HTTP-only cookies
Product catalog with 100+ real products seeded from DummyJSON
Full-text search with category filters, price range, and sorting
Shopping cart with quantity management
Real payment processing via Stripe Checkout (test mode)
Complete order lifecycle: Pending → Confirmed → Shipped → Delivered
Amazon-style order numbers (ORD-20260408-A7B2C9)
Admin panel with dashboard, order management, and Stripe refunds
Product reviews and 5-star ratings (purchase-gated — only buyers can review)
Wishlist for saving products
International shipping addresses with 26-country dropdown
Email notifications for registration, order confirmation, and status updates
Product detail page with image, pricing, stock status, Add to Cart, wishlist, and customer reviews
Shopping cart with subtotals, total calculation, and Stripe checkout button
Architecture: Why I Chose a Modular Monolith
This is probably the most important decision I made, and it goes against what most tutorials tell you.
When I started planning, my instinct was to split everything into microservices: auth-service, product-service, cart-service, order-service, payment-service. Seven separate applications, seven databases, seven deployment pipelines.
Then I asked myself: who is this for?
I'm a solo developer building a portfolio project. I don't have a team of 50 engineers who need to deploy independently. I don't have millions of requests per second that require different scaling strategies for different components. I don't have organizational boundaries that mandate service separation.
What I do have is one application that needs to work reliably, be easy to debug, and be impressive to demo.
A modular monolith was the right call. Here's the actual code structure:
```
app/
├── main.py             # Entry point, router registration
├── config.py           # Environment-based settings (Pydantic)
├── database.py         # SQLAlchemy engine + session
├── dependencies.py     # Auth dependencies
├── otel.py             # OpenTelemetry setup (graceful degradation)
├── models/             # SQLAlchemy ORM models
│   ├── user.py
│   ├── product.py
│   ├── cart.py
│   ├── order.py
│   ├── review.py
│   └── ...
├── schemas/            # Pydantic request/response schemas
├── services/           # Business logic layer
│   ├── product_service.py
│   ├── cart_service.py
│   ├── order_service.py
│   ├── search_service.py   # OpenSearch integration
│   └── email_service.py
└── routers/            # HTTP endpoints
    ├── auth.py
    ├── pages.py        # Server-rendered HTML pages
    ├── product.py
    ├── cart.py
    ├── order.py        # Stripe checkout + payments
    ├── admin.py
    └── ...
```
Each "module" (auth, products, cart, orders) has its own model, schema, service, and router — but they share one database, one process, and one deployment. This is exactly how Shopify (Rails monolith), GitHub (Rails monolith), and Stack Overflow (.NET monolith) are built.
The key insight: you can always extract a microservice later when you have real traffic patterns showing which module needs independent scaling. Starting with microservices before you have that data is premature optimization at the architectural level.
The Tech Stack
| Layer | Technology | Why |
|---|---|---|
| Backend | FastAPI (Python 3.11) | Async, type-safe, auto-generated OpenAPI docs |
| ORM | SQLAlchemy 2.x | Industry standard, works with SQLite and PostgreSQL |
| Templates | Jinja2 + TailwindCSS | Server-rendered, no JavaScript framework needed |
| Interactivity | HTMX | Add-to-cart without page reload, minimal JS |
| Payments | Stripe Checkout | PCI compliant, hosted payment page, real refunds |
| Database | PostgreSQL 16 | Production-grade, running as StatefulSet on EKS |
| Search | OpenSearch | Fuzzy full-text search with relevance ranking |
| Cache | Redis 7 (ready, not yet enabled) | Product listing cache, session store |
| Tracing | OpenTelemetry + Jaeger | Distributed tracing across every HTTP request |
| Metrics | Prometheus + Grafana | Request rate, latency percentiles, pod health |
| Analytics | Apache Superset | Revenue dashboards, sales analytics, customer insights |
Deployment: Production-Grade EKS
This is where my DevOps background made the difference. The application runs on AWS EKS with a full production deployment pipeline.
Infrastructure (Terraform)
Everything is Infrastructure as Code. One terraform apply creates:
VPC with private subnets across 3 availability zones
EKS cluster with managed node groups (5 × t3.small)
IAM roles with IRSA (IAM Roles for Service Accounts) — no long-lived credentials anywhere
EBS CSI driver for persistent volume provisioning
ECR for container image storage
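The Terraform layout can be sketched roughly as follows, using the community terraform-aws-modules. Every value below (names, CIDRs, versions, node counts) is illustrative, not the project's actual configuration:

```hcl
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name            = "raj-store"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name    = "raj-store"
  cluster_version = "1.29"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnets

  eks_managed_node_groups = {
    default = {
      instance_types = ["t3.small"]
      min_size       = 3
      desired_size   = 5
      max_size       = 6
    }
  }

  # IRSA: pods assume IAM roles through the cluster's OIDC provider,
  # so no long-lived AWS credentials exist in the cluster
  enable_irsa = true
}
```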
CI/CD Pipeline
The deployment pipeline uses GitHub Actions with OIDC authentication to AWS — no access keys stored anywhere.
```
Developer pushes code to main
        ↓
GitHub Actions triggers (OIDC → AWS)
        ↓
Docker image built and pushed to ECR
        ↓
ArgoCD detects new image tag in git
        ↓
ArgoCD syncs Helm chart to EKS
        ↓
Rolling update with zero downtime
```
GitHub Actions build pipeline — OIDC auth, Docker build, ECR push
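The OIDC-authenticated build job can be sketched as a workflow like the one below. The role ARN, account ID, repository name, and region are placeholders — the key detail is `id-token: write`, which lets the job exchange GitHub's OIDC token for short-lived AWS credentials:

```yaml
name: build-and-push
on:
  push:
    branches: [main]

permissions:
  id-token: write   # required for OIDC federation to AWS
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Exchange the GitHub OIDC token for temporary AWS credentials —
      # no access keys stored anywhere in the repo
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
          aws-region: us-east-1

      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        run: |
          IMAGE="${{ steps.ecr.outputs.registry }}/raj-store:${{ github.sha }}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
```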
GitOps with ArgoCD
ArgoCD watches the Helm chart in the git repository. Any change to the chart (new image tag, config change, resource limit adjustment) is automatically applied to the cluster. No manual kubectl apply needed.
ArgoCD application view showing all Kubernetes resources in sync
Secrets Management
Application secrets (database passwords, Stripe API keys, JWT signing keys) are stored in AWS Secrets Manager and synced to Kubernetes via External Secrets Operator. The git repository contains zero secrets — only references to which keys to fetch.
```
AWS Secrets Manager (raj-store/prod)
        ↓  (ESO syncs every 1 hour)
Kubernetes Secret (raj-store-secrets)
        ↓  (mounted as env vars)
FastAPI Pod
```
Non-sensitive configuration (service hostnames, ports, feature flags) lives in a Kubernetes ConfigMap managed by the Helm chart.
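The Secrets Manager → Kubernetes sync is declared with an ExternalSecret resource. A sketch, assuming a ClusterSecretStore named aws-secrets-manager has already been configured for the cluster:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: raj-store-secrets
spec:
  refreshInterval: 1h            # ESO re-reads from AWS every hour
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager
  target:
    name: raj-store-secrets      # the Kubernetes Secret ESO creates
  dataFrom:
    - extract:
        key: raj-store/prod      # pulls every key from this AWS secret
```

This manifest is safe to commit: it names which secret to fetch but contains no secret values.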
Observability: Seeing Everything
This is where the project goes from "deployed app" to "production-ready platform." I instrumented the application with three pillars of observability: traces, metrics, and analytics.
Distributed Tracing (OpenTelemetry + Jaeger)
Every HTTP request is automatically traced — from the NGINX ingress through the FastAPI route handler, into every SQL query, and out to external API calls (like Stripe).
Jaeger showing distributed traces — every request broken down into spans with timing
The OpenTelemetry integration is auto-instrumented. I added ~30 lines of setup code, and every FastAPI route, every SQLAlchemy query, and every outbound HTTP call is traced automatically. Health check and readiness probe endpoints are excluded from traces to reduce noise.
When debugging a slow checkout, I can see exactly where time is spent:
```
POST /orders/checkout-form              2.3s total
├── get_current_user                      2ms
├── get_all_cart_items (SQL)             18ms
├── get_user_addresses (SQL)              8ms
└── stripe.checkout.Session.create     2200ms  ← Stripe API call
```
Metrics (Prometheus + Grafana)
Prometheus scrapes metrics from the FastAPI application via the /metrics endpoint (provided by prometheus-fastapi-instrumentator). Grafana dashboards show:
Request rate per endpoint
Response time percentiles (p50, p95, p99)
Error rate percentage
Pod CPU and memory utilization
HTTP status code distribution
Grafana dashboard — request rate, latency percentiles, error rate, pod resources
Business Analytics (Apache Superset)
Superset connects directly to the PostgreSQL database (via a read-only user for security) and provides SQL-powered dashboards for business metrics:
Revenue trends over time
Top-selling products by revenue
Order status distribution (pending vs confirmed vs shipped vs delivered)
Customer geography
Average order value
Product rating distribution
Superset analytics dashboard — revenue, top products, order status, customer insights
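Behind a chart like the revenue trend is ordinary SQL against the orders table. A sketch — the table and column names (`orders`, `created_at`, `total_amount`, `status`) are assumptions about the schema, not taken from the repo:

```sql
-- Daily revenue from paid orders (table/column names are assumed)
SELECT
    date_trunc('day', created_at) AS order_day,
    SUM(total_amount)             AS revenue,
    COUNT(*)                      AS order_count
FROM orders
WHERE status IN ('confirmed', 'shipped', 'delivered')
GROUP BY date_trunc('day', created_at)
ORDER BY order_day;
```

Running queries like this through a read-only PostgreSQL user, as described above, means a mistake in Superset can never mutate store data.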
Stripe Integration: Real Payments
The payment flow uses Stripe Checkout — a hosted payment page that handles card validation, 3D Secure, and PCI compliance without any card data touching my server.
Checkout flow:
User clicks "Proceed to Checkout" in the cart
App creates a Stripe Checkout Session with line items
User is redirected to Stripe's hosted payment page
After payment, Stripe redirects back with a session ID
App verifies payment status with Stripe API
Order is created with status "confirmed" and the Stripe payment intent is saved
Confirmation email is sent
Refunds are handled through the admin panel. The admin clicks a "Refund" button, which calls the Stripe Refund API with the saved payment intent. The order status updates to "refunded" and the customer receives an email notification. Real money flows back through Stripe — verified in the Stripe Dashboard.
Stripe Checkout page — real payment processing in test mode
Order confirmation page with Amazon-style order number
What I'd Do Differently
1. Start with PostgreSQL from day one. I started with SQLite for local development and migrated to PostgreSQL for production. While SQLAlchemy abstracts most differences, there were small quirks (like ALTER TABLE behavior) that caused unnecessary debugging. Starting with PostgreSQL via Docker Compose would have been smoother.
2. Add Alembic migrations early. I used Base.metadata.create_all() for table creation and manual ALTER TABLE for schema changes. This works for a solo project but doesn't scale. Alembic would have given me versioned, repeatable migrations from the start.
3. Use Stripe webhooks instead of redirect-based verification. The current flow relies on the success redirect to verify payment. If the user's browser crashes after payment but before the redirect, the order isn't created even though the card was charged. Webhooks solve this by having Stripe notify the server directly — independent of the browser.
For DevOps Engineers
If you're a DevOps engineer looking to level up your application development skills, this project covers:
How authentication actually works in web apps (JWT, cookies, middleware)
Why database schema design matters (foreign keys, indexes, relationships)
How payment gateways integrate (Stripe's redirect-based flow)
What "full-stack observability" means in practice (traces + metrics + analytics)
How to deploy a real application, not just infrastructure
When microservices make sense vs when a monolith is the right choice
The biggest insight: once you've built and deployed your own application, you understand developers' problems from the inside. That makes you a dramatically better DevOps engineer.
For Business Stakeholders
If you're evaluating this as a technology approach:
The modular monolith architecture keeps development velocity high while maintaining code quality
GitOps deployment means every change is auditable, reversible, and automated
Full observability means issues are detected and debugged in minutes, not hours
Scaling is straightforward — increase replica count for the app, upgrade node sizes for the database
The same codebase serves both the HTML storefront and a REST API (future mobile app ready)
Try It Yourself
Browse products, search, filter by category
Create an account and place a test order (use card 4242 4242 4242 4242)
Check your order history
Leave a product review
Source code: github.com/rajasekhar-cloud25/ecommerce-api
Rajasekhar Reddy is a Senior DevOps Engineer with 7+ years of experience across AWS, Azure, GCP, and hybrid cloud environments. He holds CKA and Azure Administrator certifications. Connect on rajasekharcloud.com.

