online-boutique/docs/architecture.md
Scaffolder 7e119cad41 initial commit
Change-Id: I9c68c43e939d2c1a3b95a68b71ecc5ba861a4df5
2026-03-05 13:37:56 +00:00


# Architecture
This document describes the architecture of online-boutique.
## System Overview
```
┌──────────────────────────────────────────────────────────────┐
│  Developer                                                   │
│                                                              │
│  Backstage UI → Template → Gitea Repo → CI/CD Workflows      │
└────────────────────┬─────────────────────────────────────────┘
                     │ git push
┌──────────────────────────────────────────────────────────────┐
│  Gitea Actions                                               │
│                                                              │
│  ┌───────────────┐       ┌──────────────────┐                │
│  │ Build & Push  │──────▶│ Deploy Humanitec │                │
│  │ - Maven       │       │ - humctl score   │                │
│  │ - Docker      │       │ - Environment    │                │
│  │ - ACR Push    │       │ - Orchestration  │                │
│  └───────────────┘       └──────────────────┘                │
└─────────────┬─────────────────┬──────────────────────────────┘
              │                 │
              │ image           │ deployment
              ▼                 ▼
┌────────────────────┐    ┌────────────────────────────────────┐
│ Azure Container    │    │ Humanitec Platform                 │
│ Registry           │    │                                    │
│                    │    │  ┌──────────────────────────────┐  │
│ bstagecjotdevacr   │    │  │ Score Interpretation         │  │
│                    │    │  │ Resource Provisioning        │  │
│ Images:            │    │  │ Environment Management       │  │
│ - app:latest       │    │  └──────────────────────────────┘  │
│ - app:v1.0.0       │    │             │                      │
│ - app:git-sha      │    │             │ kubectl apply        │
└────────────────────┘    └─────────────┼──────────────────────┘
         ┌─────────────────────────────────────────────┐
         │ Azure Kubernetes Service (AKS)              │
         │                                             │
         │  ┌────────────────────────────────────┐     │
         │  │ Namespace:                         │     │
         │  │                                    │     │
         │  │  ┌──────────────────────────────┐  │     │
         │  │  │ Deployment                   │  │     │
         │  │  │ - Replicas: 2                │  │     │
         │  │  │ - Health Probes              │  │     │
         │  │  │ - Resource Limits            │  │     │
         │  │  │                              │  │     │
         │  │  │ ┌───────────┐  ┌───────────┐ │  │     │
         │  │  │ │ Pod       │  │ Pod       │ │  │     │
         │  │  │ │ Spring    │  │ Spring    │ │  │     │
         │  │  │ │ Boot      │  │ Boot      │ │  │     │
         │  │  │ │ :8080     │  │ :8080     │ │  │     │
         │  │  │ └─────┬─────┘  └─────┬─────┘ │  │     │
         │  │  └───────┼──────────────┼───────┘  │     │
         │  │          │              │          │     │
         │  │  ┌───────▼──────────────▼───────┐  │     │
         │  │  │ Service (ClusterIP)          │  │     │
         │  │  │ - Port: 80 → 8080            │  │     │
         │  │  └───────┬──────────────────────┘  │     │
         │  │          │                         │     │
         │  │  ┌───────▼──────────────────────┐  │     │
         │  │  │ Ingress                      │  │     │
         │  │  │ - TLS (cert-manager)         │  │     │
         │  │  │ - Host: app.kyndemo.live     │  │     │
         │  │  └──────────────────────────────┘  │     │
         │  └────────────────────────────────────┘     │
         │                                             │
         │  ┌────────────────────────────────────┐     │
         │  │ Monitoring Namespace               │     │
         │  │                                    │     │
         │  │  ┌──────────────────────────────┐  │     │
         │  │  │ Prometheus                   │  │     │
         │  │  │ - ServiceMonitor             │  │     │
         │  │  │ - Scrapes /actuator/         │  │     │
         │  │  │   prometheus every 30s       │  │     │
         │  │  └──────────────────────────────┘  │     │
         │  │                                    │     │
         │  │  ┌──────────────────────────────┐  │     │
         │  │  │ Grafana                      │  │     │
         │  │  │ - Spring Boot Dashboard      │  │     │
         │  │  │ - Alerts                     │  │     │
         │  │  └──────────────────────────────┘  │     │
         │  └────────────────────────────────────┘     │
         └─────────────────────────────────────────────┘
```
## Component Architecture
### 1. Application Layer
#### Spring Boot Application
**Technology Stack:**
- **Framework**: Spring Boot 3.2
- **Java**: OpenJDK 17 (LTS)
- **Build**: Maven 3.9
- **Runtime**: Embedded Tomcat
**Key Components:**
```java
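import java.util.Map;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// GoldenPathApplication.java
@SpringBootApplication
public class GoldenPathApplication {
    // Enables auto-configuration, component scanning, and property binding.
    // The main method (SpringApplication.run) is omitted here.
}

// ApiController.java (method bodies are illustrative placeholders)
@RestController
public class ApiController {
    @GetMapping("/")
    public String root() {
        return "online-boutique";
    }

    @GetMapping("/api/status")
    public ResponseEntity<Map<String, String>> status() {
        return ResponseEntity.ok(Map.of("status", "UP"));
    }
}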
```
**Configuration Management:**
- `application.yml`: Base configuration
- `application-development.yml`: Dev overrides
- `application-production.yml`: Production overrides
- Environment variables: Runtime overrides
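The override order in this list (base file, then profile file, then environment variables) can be sketched in plain Java. This is a simplified illustration, not Spring Boot's actual property resolution, which considers more sources than the three listed here. The class name `LayeredConfig` and the sample keys and values are made up for the example; the mapping of `server.port` to `SERVER_PORT` follows Spring Boot's documented relaxed-binding rule for environment variables.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: resolves a property from layered sources, highest precedence first. */
final class LayeredConfig {
    private final List<Map<String, String>> layers; // ordered from highest to lowest precedence

    LayeredConfig(List<Map<String, String>> layers) {
        this.layers = List.copyOf(layers);
    }

    /** Maps a property key to its environment-variable form, e.g. server.port -> SERVER_PORT. */
    static String toEnvName(String key) {
        return key.replace('.', '_').replace("-", "").toUpperCase(Locale.ROOT);
    }

    /** Returns the value from the first layer that defines the key in either form. */
    Optional<String> get(String key) {
        for (Map<String, String> layer : layers) {
            String value = layer.containsKey(key) ? layer.get(key) : layer.get(toEnvName(key));
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical contents of application.yml, application-production.yml, and the environment.
        Map<String, String> base = Map.of("server.port", "8080", "logging.level.root", "INFO");
        Map<String, String> production = Map.of("logging.level.root", "WARN");
        Map<String, String> environment = Map.of("SERVER_PORT", "9090");

        LayeredConfig config = new LayeredConfig(List.of(environment, production, base));
        System.out.println(config.get("server.port").orElseThrow());        // 9090 (environment wins)
        System.out.println(config.get("logging.level.root").orElseThrow()); // WARN (profile wins)
    }
}
```

Listing the layers from highest to lowest precedence keeps the lookup a simple first-match scan.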
### 2. Container Layer
#### Docker Image
**Multi-stage Build:**
```dockerfile
# Stage 1: Build
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
# Stage 2: Runtime
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=builder /app/target/*.jar app.jar
USER 1000
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```
**Optimizations:**
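- Dependency layer caching: `pom.xml` is copied and resolved before `src`, so the dependency layer is rebuilt only when `pom.xml` changes
- Minimal runtime image (JRE on Alpine, without Maven or the JDK)
- Non-root user (UID 1000)
- Health checks are left to Kubernetes probes; the image defines no `HEALTHCHECK` instruction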
### 3. Orchestration Layer
#### Humanitec Score
**Resource Specification:**
```yaml
apiVersion: score.dev/v1b1
metadata:
  name: online-boutique
containers:
  app:
    image: bstagecjotdevacr.azurecr.io/online-boutique:latest
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
        cpu: 1000m
service:
  ports:
    http:
      port: 80
      targetPort: 8080
resources:
  route:
    type: route
    params:
      host: online-boutique.kyndemo.live
```
**Capabilities:**
- Environment-agnostic deployment
- Resource dependencies
- Configuration management
- Automatic rollback
#### Kubernetes Resources
**Fallback Manifests:**
- `deployment.yaml`: Pod specification, replicas, health probes
- `service.yaml`: ClusterIP service for internal routing
- `ingress.yaml`: External access with TLS
- `servicemonitor.yaml`: Prometheus scraping config
### 4. CI/CD Pipeline
#### Build & Push Workflow
**Stages:**
1. **Checkout**: Clone repository
2. **Setup**: Install Maven, Docker
3. **Test**: Run unit & integration tests
4. **Build**: Maven package
5. **Docker**: Build multi-stage image
6. **Auth**: Azure OIDC login
7. **Push**: Push to ACR with tags
**Triggers:**
- Push to `main` branch
- Pull requests
- Manual dispatch
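Stage 7 pushes the image under several tags (`latest`, a version tag, and the commit hash, as shown in the system overview). A minimal sketch of how those references could be assembled is below. The class name `ImageTags`, the `v` prefix handling, and the sample version and commit values are illustrative assumptions, not taken from the actual workflow file.

```java
import java.util.List;

/** Illustrative only: builds the image references a push step might publish. */
final class ImageTags {
    /** Returns registry/name:tag for the latest, version, and commit tags. */
    static List<String> references(String registry, String name, String version, String gitSha) {
        String repository = registry + "/" + name;
        return List.of(
                repository + ":latest",
                repository + ":v" + version,
                repository + ":" + gitSha);
    }

    public static void main(String[] args) {
        // Registry and image name as used in the Score file; version and commit are sample values.
        references("bstagecjotdevacr.azurecr.io", "online-boutique", "1.0.0", "7e119cad41")
                .forEach(System.out::println);
    }
}
```

Of the three, only the commit-hash tag identifies a specific build, which makes it the natural reference to hand to the deploy workflow.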
#### Deploy Workflow
**Stages:**
1. **Parse Image**: Extract image reference from build
2. **Setup**: Install humctl CLI
3. **Score Update**: Replace the image in `score.yaml`
4. **Deploy**: Run `humctl score deploy`
5. **Verify**: Check deployment status
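The Score Update stage amounts to rewriting one line of `score.yaml`. The sketch below shows one way to do that with a regular expression; it is an assumption about how such a step could work, not the workflow's actual implementation, and the class name `ScoreImageUpdater` is hypothetical. It replaces only the first `image:` line, which matches the single-container Score file shown earlier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: swaps the container image reference in Score file text. */
final class ScoreImageUpdater {
    // Matches an "image:" key at any indentation and captures everything before the value.
    private static final Pattern IMAGE_LINE =
            Pattern.compile("(?m)^([ \\t]*image:[ \\t]*)\\S+[ \\t]*$");

    /** Returns the YAML text with the first image reference replaced. */
    static String replaceImage(String scoreYaml, String newImage) {
        Matcher matcher = IMAGE_LINE.matcher(scoreYaml);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no image: line found");
        }
        // quoteReplacement guards against '$' or '\' in the image reference.
        return matcher.replaceFirst("$1" + Matcher.quoteReplacement(newImage));
    }

    public static void main(String[] args) {
        String score = String.join("\n",
                "containers:",
                "  app:",
                "    image: bstagecjotdevacr.azurecr.io/online-boutique:latest",
                "");
        // The commit-hash tag is a sample value.
        System.out.print(replaceImage(score, "bstagecjotdevacr.azurecr.io/online-boutique:7e119cad41"));
    }
}
```

A YAML-aware tool would be more robust for files with several containers; the regular expression is used here only to keep the example free of dependencies.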
**Secrets:**
- `HUMANITEC_TOKEN`: Platform authentication
- `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`: OIDC federation
### 5. Observability Layer
#### Metrics Collection
**Flow:**
```
Spring Boot App
  └── /actuator/prometheus (HTTP endpoint)
        └── Prometheus (scrape every 30s)
              └── TSDB (15-day retention)
                    └── Grafana (visualization)
```
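As a rough sizing aid for this flow, the number of samples Prometheus keeps per time series follows directly from the scrape interval and the retention period. The sketch below only does that arithmetic; the class name `RetentionMath` is made up for illustration, and actual disk usage also depends on the number of series and on compression, which are not modeled here.

```java
import java.time.Duration;

/** Illustrative only: samples retained per time series for a scrape interval and retention. */
final class RetentionMath {
    static long samplesPerSeries(Duration retention, Duration scrapeInterval) {
        // Duration.dividedBy(Duration) returns how many whole intervals fit into the retention.
        return retention.dividedBy(scrapeInterval);
    }

    public static void main(String[] args) {
        long samples = samplesPerSeries(Duration.ofDays(15), Duration.ofSeconds(30));
        System.out.println(samples); // 15 days * 86,400 s / 30 s = 43200
    }
}
```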
**ServiceMonitor Configuration:**
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  selector:
    matchLabels:
      app: online-boutique
  endpoints:
    - port: http
      path: /actuator/prometheus
      interval: 30s
```
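What Prometheus receives on each scrape of `/actuator/prometheus` is plain text in the Prometheus exposition format. To make that concrete, here is a small sketch that reads sample lines from such a response. It is a simplification: it assumes each sample line is `series value` with no trailing timestamp, and it does not handle special values such as `+Inf`. The class name `PrometheusTextParser` and the sample values are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: reads samples from Prometheus text exposition output. */
final class PrometheusTextParser {
    /** Maps each "name{labels}" series to its value, skipping comments and blank lines. */
    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String line : body.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // "# HELP" and "# TYPE" metadata, or a blank line
            }
            // Assumes no trailing timestamp, so the value follows the last space.
            int split = trimmed.lastIndexOf(' ');
            if (split < 0) {
                continue; // not a sample line
            }
            String series = trimmed.substring(0, split).trim();
            samples.put(series, Double.parseDouble(trimmed.substring(split + 1)));
        }
        return samples;
    }

    public static void main(String[] args) {
        String body = String.join("\n",
                "# HELP jvm_threads_live_threads The current number of live threads",
                "# TYPE jvm_threads_live_threads gauge",
                "jvm_threads_live_threads 23.0",
                "http_server_requests_seconds_count{method=\"GET\",status=\"200\",uri=\"/api/status\"} 42.0");
        parse(body).forEach((series, value) -> System.out.println(series + " = " + value));
    }
}
```

Splitting on the last space rather than the first keeps label values that contain spaces intact.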
#### Metrics Categories
1. **HTTP Metrics**:
   - Request count/rate
   - Response time (avg, p95, p99)
   - Status code distribution
2. **JVM Metrics**:
   - Heap/non-heap memory
   - GC pause time
   - Thread count
3. **System Metrics**:
   - CPU usage
   - File descriptors
   - Process uptime
## Data Flow
### Request Flow
```
User Request
    │
    ▼
Ingress Controller (nginx)
    │ TLS termination
    │ Host routing
    ▼
Service (ClusterIP)
    │ Load balancing
    │ Port mapping
    ▼
Pod (Spring Boot)
    │ Request handling
    │ Business logic
    ▼
Response
```
### Metrics Flow
```
Spring Boot (Micrometer)
    │ Collect metrics
    │ Format Prometheus
    ▼
Actuator Endpoint
    │ Expose /actuator/prometheus
    ▼
Prometheus (Scraper)
    │ Pull every 30s
    │ Store in TSDB
    ▼
Grafana
    │ Query PromQL
    │ Render dashboards
    ▼
User Visualization
```
### Deployment Flow
```
Git Push
    │
    ▼
Gitea Actions (Webhook)
    ├── Build Workflow
    │     │ Maven test + package
    │     │ Docker build
    │     │ ACR push
    │     └── Output: image reference
    └── Deploy Workflow
          │ Parse image
          │ Update score.yaml
          │ humctl score deploy
          ▼
Humanitec Platform
    │ Interpret Score
    │ Provision resources
    │ Generate manifests
    ▼
Kubernetes API
    │ Apply deployment
    │ Create/update resources
    │ Schedule pods
    ▼
Running Application
```
## Security Architecture
### Authentication & Authorization
1. **Azure Workload Identity**:
   - OIDC federation for CI/CD
   - No static credentials
   - Scoped permissions
2. **Service Account**:
   - Kubernetes ServiceAccount
   - Bound to Azure Managed Identity
   - Limited RBAC
3. **Image Pull Secrets**:
   - AKS ACR integration
   - Managed identity for registry access
### Network Security
1. **Ingress**:
   - TLS 1.2+ only
   - Cert-manager for automatic cert renewal
   - Rate limiting (optional)
2. **Network Policies**:
   - Restrict pod-to-pod communication
   - Allow only required egress
3. **Service Mesh (Future)**:
   - mTLS between services
   - Fine-grained authorization
### Application Security
1. **Container**:
   - Non-root user (UID 1000)
   - Read-only root filesystem
   - No privilege escalation
2. **Dependencies**:
   - Regular Maven dependency updates
   - Vulnerability scanning (Snyk/Trivy)
3. **Secrets Management**:
   - Azure Key Vault integration
   - CSI driver for secret mounting
   - No secrets in environment variables
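With secrets mounted by a CSI driver, the application reads them as files instead of environment variables. The sketch below illustrates that pattern under stated assumptions: the class name `MountedSecrets` and the secret name `db-password` are hypothetical, and `main` uses a temporary directory as a stand-in for the real mount path, which this document does not specify.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: reads secrets that a CSI driver has mounted as files. */
final class MountedSecrets {
    private final Path mountDir;

    MountedSecrets(Path mountDir) {
        this.mountDir = mountDir.toAbsolutePath().normalize();
    }

    /** Returns the content of the file named after the secret, without surrounding whitespace. */
    String read(String name) {
        Path file = mountDir.resolve(name).normalize();
        if (!file.startsWith(mountDir)) {
            // Rejects names such as "../other" that would leave the mount directory.
            throw new IllegalArgumentException("secret name escapes the mount directory: " + name);
        }
        try {
            return Files.readString(file).strip();
        } catch (IOException e) {
            throw new UncheckedIOException("cannot read secret " + name, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the CSI mount in this demo.
        Path dir = Files.createTempDirectory("secrets-store");
        Files.writeString(dir.resolve("db-password"), "s3cr3t\n");

        MountedSecrets secrets = new MountedSecrets(dir);
        // Print only the length so the secret value never reaches the logs.
        System.out.println(secrets.read("db-password").length()); // 6
    }
}
```

Reading the file on each call, rather than caching the value at startup, would let the application pick up rotated secrets if the driver refreshes the mounted files.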
## Scalability
### Horizontal Scaling
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
```
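To see what these targets mean in practice, the sketch below applies the replica formula from the Kubernetes HPA documentation, `desired = ceil(current * currentUtilization / targetUtilization)`, to the values in this manifest. With several metrics, the largest result is used and then bounded by `minReplicas` and `maxReplicas`. The sketch ignores the controller's tolerance band and stabilization windows, and the class name `HpaMath` and the sample utilization figures are illustrative.

```java
/** Illustrative only: the replica calculation documented for the HorizontalPodAutoscaler. */
final class HpaMath {
    static final int MIN_REPLICAS = 2;
    static final int MAX_REPLICAS = 10;
    static final double CPU_TARGET = 70.0;    // averageUtilization for cpu
    static final double MEMORY_TARGET = 80.0; // averageUtilization for memory

    /** desired = ceil(current * currentUtilization / targetUtilization) */
    static int desiredFor(int currentReplicas, double currentUtilization, double targetUtilization) {
        return (int) Math.ceil(currentReplicas * (currentUtilization / targetUtilization));
    }

    /** Takes the larger of the per-metric results, then clamps to the min/max bounds. */
    static int desiredReplicas(int currentReplicas, double cpuUtilization, double memoryUtilization) {
        int byCpu = desiredFor(currentReplicas, cpuUtilization, CPU_TARGET);
        int byMemory = desiredFor(currentReplicas, memoryUtilization, MEMORY_TARGET);
        int desired = Math.max(byCpu, byMemory);
        return Math.max(MIN_REPLICAS, Math.min(MAX_REPLICAS, desired));
    }

    public static void main(String[] args) {
        System.out.println(desiredReplicas(2, 105.0, 60.0)); // CPU at 1.5x its target: 3 replicas
        System.out.println(desiredReplicas(8, 140.0, 60.0)); // would need 16, capped at 10
    }
}
```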
### Vertical Scaling
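Use the **VPA (Vertical Pod Autoscaler)** to generate CPU and memory request recommendations. Because the HPA above already scales on CPU and memory utilization, run the VPA in recommendation-only mode (`updateMode: "Off"`) so that the two autoscalers do not act on the same metrics.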
### Database Scaling (Future)
- Connection pooling (HikariCP)
- Read replicas for read-heavy workloads
- Caching layer (Redis)
## High Availability
### Application Level
- **Replicas**: Minimum 2 pods per environment
- **Anti-affinity**: Spread across nodes
- **Readiness probes**: Only route to healthy pods
### Infrastructure Level
- **AKS**: Multi-zone node pools
- **Ingress**: Multiple replicas with PodDisruptionBudget
- **Monitoring**: High availability via Thanos
## Disaster Recovery
### Backup Strategy
1. **Application State**: Stateless, no backup needed
2. **Configuration**: Stored in Git
3. **Metrics**: 15-day retention, export to long-term storage
4. **Container Images**: Retained in ACR with retention policy
### Recovery Procedures
1. **Pod failure**: Crashed containers are restarted by the kubelet; deleted or evicted pods are recreated by the Deployment's ReplicaSet
2. **Node failure**: Automatic rescheduling to healthy nodes
3. **Cluster failure**: Redeploy via Terraform + Humanitec
4. **Regional failure**: Failover to secondary region (if configured)
## Technology Decisions
### Why Spring Boot?
- Industry-standard Java framework
- Rich ecosystem (Actuator, Security, Data)
- Production-ready features out of the box
- Easy testing and debugging
### Why Humanitec?
- Environment-agnostic deployment
- Score specification simplicity
- Resource dependency management
- Reduces K8s complexity
### Why Prometheus + Grafana?
- Cloud-native standard
- Rich query language (PromQL)
- Wide integration support
- Open-source, vendor-neutral
### Why Maven?
- Mature dependency management
- Extensive plugin ecosystem
- Declarative configuration
- Wide adoption in Java community
## Future Enhancements
1. **Database Integration**: PostgreSQL with Flyway migrations
2. **Caching**: Redis for session storage
3. **Messaging**: Kafka for event-driven architecture
4. **Tracing**: Jaeger/Zipkin for distributed tracing
5. **Service Mesh**: Istio for advanced traffic management
6. **Multi-region**: Active-active deployment
## Next Steps
- [Review deployment guide](deployment.md)
- [Configure monitoring](monitoring.md)
- [Return to overview](index.md)