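# Architecture

This document describes the architecture of online-boutique. It covers how changes flow from Backstage and Gitea through Gitea Actions and Humanitec onto Azure Kubernetes Service (AKS), how the Spring Boot application runs there, and how it is secured, scaled, and monitored.
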
## System Overview

```
┌────────────────────────────────────────────────────────────┐
│ Developer                                                  │
│                                                            │
│ Backstage UI → Template → Gitea Repo → CI/CD Workflows     │
└────────────────────┬───────────────────────────────────────┘
                     │
                     │ git push
                     ▼
┌────────────────────────────────────────────────────────────┐
│ Gitea Actions                                              │
│                                                            │
│  ┌───────────────┐       ┌──────────────────┐              │
│  │ Build & Push  │──────▶│ Deploy Humanitec │              │
│  │ - Maven       │       │ - humctl score   │              │
│  │ - Docker      │       │ - Environment    │              │
│  │ - ACR Push    │       │ - Orchestration  │              │
│  └───────────────┘       └──────────────────┘              │
└─────────────┬─────────────────┬────────────────────────────┘
              │                 │
              │ image           │ deployment
              ▼                 ▼
┌────────────────────┐  ┌────────────────────────────────────┐
│ Azure Container    │  │ Humanitec Platform                 │
│ Registry           │  │                                    │
│                    │  │  ┌──────────────────────────────┐  │
│ bstagecjotdevacr   │  │  │ Score Interpretation         │  │
│                    │  │  │ Resource Provisioning        │  │
│ Images:            │  │  │ Environment Management       │  │
│  - app:latest      │  │  └──────────┬───────────────────┘  │
│  - app:v1.0.0      │  │             │                      │
│  - app:git-sha     │  │             │ kubectl apply        │
└────────────────────┘  └─────────────┼──────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────┐
│ Azure Kubernetes Service (AKS)                     │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │ Namespace:                                   │  │
│  │                                              │  │
│  │  ┌────────────────────────────────────────┐  │  │
│  │  │ Deployment                             │  │  │
│  │  │ - Replicas: 2                          │  │  │
│  │  │ - Health Probes                        │  │  │
│  │  │ - Resource Limits                      │  │  │
│  │  │                                        │  │  │
│  │  │  ┌───────────┐  ┌───────────┐          │  │  │
│  │  │  │ Pod       │  │ Pod       │          │  │  │
│  │  │  │ Spring    │  │ Spring    │          │  │  │
│  │  │  │ Boot      │  │ Boot      │          │  │  │
│  │  │  │ :8080     │  │ :8080     │          │  │  │
│  │  │  └─────┬─────┘  └─────┬─────┘          │  │  │
│  │  └────────┼──────────────┼────────────────┘  │  │
│  │           │              │                   │  │
│  │  ┌────────▼──────────────▼────────────────┐  │  │
│  │  │ Service (ClusterIP)                    │  │  │
│  │  │ - Port: 80 → 8080                      │  │  │
│  │  └────────┬───────────────────────────────┘  │  │
│  │           │                                  │  │
│  │  ┌────────▼───────────────────────────────┐  │  │
│  │  │ Ingress                                │  │  │
│  │  │ - TLS (cert-manager)                   │  │  │
│  │  │ - Host: online-boutique.kyndemo.live   │  │  │
│  │  └────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │ Monitoring Namespace                         │  │
│  │                                              │  │
│  │  ┌────────────────────────────────────────┐  │  │
│  │  │ Prometheus                             │  │  │
│  │  │ - ServiceMonitor                       │  │  │
│  │  │ - Scrapes /actuator/prometheus         │  │  │
│  │  │   every 30s                            │  │  │
│  │  └────────────────────────────────────────┘  │  │
│  │                                              │  │
│  │  ┌────────────────────────────────────────┐  │  │
│  │  │ Grafana                                │  │  │
│  │  │ - Spring Boot Dashboard                │  │  │
│  │  │ - Alerts                               │  │  │
│  │  └────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────┘
```

## Component Architecture

### 1. Application Layer

#### Spring Boot Application

**Technology Stack:**
- **Framework**: Spring Boot 3.2
- **Java**: OpenJDK 17 (LTS)
- **Build**: Maven 3.9
- **Runtime**: Embedded Tomcat

**Key Components:**

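```java
import java.util.Map;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Combines @SpringBootConfiguration, @EnableAutoConfiguration, and @ComponentScan.
@SpringBootApplication
public class GoldenPathApplication {
    public static void main(String[] args) {
        SpringApplication.run(GoldenPathApplication.class, args);
    }
}

// Package-private so both classes fit in one file; response bodies are illustrative.
@RestController
class ApiController {
    @GetMapping("/")
    public String root() {
        return "online-boutique";
    }

    @GetMapping("/api/status")
    public ResponseEntity<Map<String, String>> status() {
        return ResponseEntity.ok(Map.of("status", "UP"));
    }
}
```
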
**Configuration Management:**
- `application.yml`: Base configuration
- `application-development.yml`: Overrides applied when the `development` profile is active
- `application-production.yml`: Overrides applied when the `production` profile is active
- Environment variables: Runtime overrides, which take precedence over both files
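
The override order above can be modeled as a layered lookup in which the most specific source wins. The sketch below is a simplified plain-Java illustration, not Spring's actual `Environment`/`PropertySource` machinery. `LayeredConfig` is a hypothetical name, the YAML files are assumed to be already flattened into key/value maps, and only one active profile is assumed. It does follow Spring Boot's relaxed binding rule for environment variables, under which `server.port` is read from `SERVER_PORT`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: resolves a property from layered sources, most specific first. */
final class LayeredConfig {
    private final Map<String, String> base;    // flattened application.yml
    private final Map<String, String> profile; // flattened application-<profile>.yml
    private final Map<String, String> env;     // process environment variables

    LayeredConfig(Map<String, String> base, Map<String, String> profile, Map<String, String> env) {
        this.base = base;
        this.profile = profile;
        this.env = env;
    }

    /** Relaxed-binding name: dots become underscores, dashes are dropped, then uppercase. */
    static String envKey(String property) {
        return property.replace('.', '_').replace("-", "").toUpperCase(Locale.ROOT);
    }

    Optional<String> get(String property) {
        // Highest precedence first: environment, then profile file, then base file.
        String fromEnv = env.get(envKey(property));
        if (fromEnv != null) {
            return Optional.of(fromEnv);
        }
        String fromProfile = profile.get(property);
        if (fromProfile != null) {
            return Optional.of(fromProfile);
        }
        return Optional.ofNullable(base.get(property));
    }
}
```

For example, with `server.port: 8080` in `application.yml` and `SERVER_PORT=9090` set in the container, `get("server.port")` returns `9090`.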

### 2. Container Layer

#### Docker Image

**Multi-stage Build:**

```dockerfile
# Stage 1: Build
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
# Copy the POM alone first so the dependency layer stays cached until pom.xml changes
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
# Tests run in the CI "Test" stage, so they are skipped here
RUN mvn package -DskipTests

# Stage 2: Runtime
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=builder /app/target/*.jar app.jar
# Run as a non-root user
USER 1000
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```

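**Optimizations:**
- Layer caching for dependencies: `pom.xml` is copied and dependencies are resolved before the source, so that layer is reused when only code changes
- Minimal runtime image: a JRE on Alpine, with no JDK or Maven in the final stage
- Non-root user (UID 1000)
- Health checking through Kubernetes liveness and readiness probes (the image itself defines no `HEALTHCHECK`)
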
### 3. Orchestration Layer

#### Humanitec Score

**Resource Specification:**

```yaml
apiVersion: score.dev/v1b1
metadata:
  name: online-boutique

containers:
  app:
    image: bstagecjotdevacr.azurecr.io/online-boutique:latest
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
        cpu: 1000m

service:
  ports:
    http:
      port: 80
      targetPort: 8080

resources:
  route:
    type: route
    params:
      host: online-boutique.kyndemo.live
```

**Capabilities:**
- Environment-agnostic deployment
- Resource dependencies
- Configuration management
- Automatic rollback

#### Kubernetes Resources

**Fallback Manifests:**
- `deployment.yaml`: Pod specification, replicas, health probes
- `service.yaml`: ClusterIP service for internal routing
- `ingress.yaml`: External access with TLS
- `servicemonitor.yaml`: Prometheus scraping config

### 4. CI/CD Pipeline

#### Build & Push Workflow

**Stages:**

1. **Checkout**: Clone repository
2. **Setup**: Install Maven, Docker
3. **Test**: Run unit & integration tests
4. **Build**: Maven package
5. **Docker**: Build multi-stage image
6. **Auth**: Azure OIDC login
7. **Push**: Push to ACR with `latest`, version, and Git SHA tags

**Triggers:**
- Push to `main` branch
- Pull requests
- Manual dispatch
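
The registry box in the System Overview lists three tags per image: `latest`, a version such as `v1.0.0`, and a Git SHA. Below is a minimal sketch of how the push step might derive the full image references. It assumes the version comes from a release tag and the SHA tag is the 7-character short commit hash. `ImageTags` is a hypothetical helper, and the real workflow may name or compute tags differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper mirroring the tag scheme shown in the System Overview diagram. */
final class ImageTags {
    static List<String> tagsFor(String repository, String version, String commitSha) {
        if (commitSha.length() < 7) {
            throw new IllegalArgumentException("commit SHA too short: " + commitSha);
        }
        List<String> refs = new ArrayList<>();
        refs.add(repository + ":latest");
        if (version != null && !version.isBlank()) {
            // Only release builds (e.g. a pushed v1.0.0 tag) get a version tag.
            refs.add(repository + ":" + version);
        }
        // The short SHA ties the image back to the exact commit that produced it.
        refs.add(repository + ":" + commitSha.substring(0, 7));
        return refs;
    }
}
```

For example, `tagsFor("bstagecjotdevacr.azurecr.io/online-boutique", "v1.0.0", "3f2a9c1d4e5b")` yields references ending in `:latest`, `:v1.0.0`, and `:3f2a9c1`.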

#### Deploy Workflow

**Stages:**

1. **Parse Image**: Extract the image reference produced by the Build & Push workflow
2. **Setup**: Install the `humctl` CLI
3. **Score Update**: Replace the image in `score.yaml`
4. **Deploy**: Run `humctl score deploy`
5. **Verify**: Check deployment status
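
Steps 1 and 3 come down to reading the image reference produced by the build and writing it into the `image:` field of `score.yaml`. The sketch below is a line-based illustration. It assumes a single container whose `image:` key appears once. `ScoreImage` and its methods are hypothetical, and a YAML-aware tool would be more robust in a real workflow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for the "Parse Image" and "Score Update" deploy steps. */
final class ScoreImage {
    // An "image:" line with any indentation; spaces/tabs only, so newlines are never swallowed.
    private static final Pattern IMAGE_LINE =
            Pattern.compile("(?m)^([ \\t]*)image:[ \\t]*\\S+[ \\t]*$");

    /** Returns the tag of an image reference, or "latest" when none is given. */
    static String tagOf(String imageRef) {
        int lastSlash = imageRef.lastIndexOf('/');
        int lastColon = imageRef.lastIndexOf(':');
        // A colon before the last slash belongs to a registry port, not a tag.
        return lastColon > lastSlash ? imageRef.substring(lastColon + 1) : "latest";
    }

    /** Replaces the value of the first image: line, preserving its indentation. */
    static String withImage(String scoreYaml, String imageRef) {
        Matcher m = IMAGE_LINE.matcher(scoreYaml);
        if (!m.find()) {
            throw new IllegalArgumentException("no image: line found in score.yaml");
        }
        String replacement = m.group(1) + "image: " + imageRef;
        return scoreYaml.substring(0, m.start()) + replacement + scoreYaml.substring(m.end());
    }
}
```

Because `tagOf` treats a colon before the last `/` as a registry port, a reference like `localhost:5000/app` falls back to `latest`.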

**Secrets:**
- `HUMANITEC_TOKEN`: Platform authentication
- `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`: OIDC federation

### 5. Observability Layer

#### Metrics Collection

**Flow:**

```
Spring Boot App
  │
  └── /actuator/prometheus (HTTP endpoint)
        │
        └── Prometheus (scrape every 30s)
              │
              └── TSDB (15-day retention)
                    │
                    └── Grafana (visualization)
```

**ServiceMonitor Configuration:**

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: online-boutique
spec:
  selector:
    matchLabels:
      app: online-boutique
  endpoints:
    - port: http            # named Service port
      path: /actuator/prometheus
      interval: 30s
```
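
Request rates are derived from monotonically increasing counters such as `http_server_requests_seconds_count`, which Spring Boot exposes for its `http.server.requests` timer. The sketch below shows the core arithmetic between two consecutive 30-second scrapes, including the counter-reset case that occurs when a pod restarts. It simplifies PromQL's `rate()`, which also extrapolates over the query window. `CounterRate` is a hypothetical name.

```java
/** Hypothetical helper: per-second rate between two samples of a monotonic counter. */
final class CounterRate {
    static double perSecond(double previous, double current, double intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        // A drop means the counter restarted from zero (e.g. a pod restart),
        // so the whole current value counts as the increase.
        double increase = current >= previous ? current - previous : current;
        return increase / intervalSeconds;
    }
}
```

For example, a counter that moves from 1,200 to 1,500 between scrapes 30 s apart gives 10 requests per second.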

#### Metrics Categories

1. **HTTP Metrics**:
   - Request count/rate
   - Response time (avg, p95, p99)
   - Status code distribution

2. **JVM Metrics**:
   - Heap/non-heap memory
   - GC pause time
   - Thread count

3. **System Metrics**:
   - CPU usage
   - File descriptors
   - Process uptime
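
To make the avg, p95, and p99 response-time figures concrete, the sketch below computes them from raw latency samples using the nearest-rank method. This is an illustrative assumption only. Micrometer and Prometheus typically estimate percentiles from histogram buckets (for example with PromQL's `histogram_quantile`), so dashboard values are approximations rather than exact ranks. `LatencyStats` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical helper: latency summary using the nearest-rank percentile method. */
final class LatencyStats {
    static double average(double[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElseThrow();
    }

    /** Smallest sample such that at least p% of samples are at or below it. */
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

With 100 samples of 1 to 100 ms, this gives an average of 50.5 ms, a p95 of 95 ms, and a p99 of 99 ms.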

## Data Flow

### Request Flow

```
User Request
     │
     ▼
Ingress Controller (nginx)
     │  TLS termination
     │  Host routing
     ▼
Service (ClusterIP)
     │  Load balancing
     │  Port mapping
     ▼
Pod (Spring Boot)
     │  Request handling
     │  Business logic
     ▼
Response
```

### Metrics Flow

```
Spring Boot (Micrometer)
     │  Collect metrics
     │  Format as Prometheus text
     ▼
Actuator Endpoint
     │  Expose /actuator/prometheus
     ▼
Prometheus (Scraper)
     │  Pull every 30s
     │  Store in TSDB
     ▼
Grafana
     │  Run PromQL queries
     │  Render dashboards
     ▼
User Visualization
```

### Deployment Flow

```
Git Push
   │
   ▼
Gitea Actions (Webhook)
   │
   ├── Build Workflow
   │     │ Maven test + package
   │     │ Docker build
   │     │ ACR push
   │     └── Output: image reference
   │
   └── Deploy Workflow
         │ Parse image
         │ Update score.yaml
         │ humctl score deploy
         │
         ▼
Humanitec Platform
   │ Interpret Score
   │ Provision resources
   │ Generate manifests
   │
   ▼
Kubernetes API
   │ Apply deployment
   │ Create/update resources
   │ Schedule pods
   │
   ▼
Running Application
```

## Security Architecture

### Authentication & Authorization

1. **Azure Workload Identity**:
   - OIDC federation for CI/CD
   - No static credentials
   - Scoped permissions

2. **Service Account**:
   - Kubernetes ServiceAccount
   - Bound to an Azure Managed Identity
   - Limited RBAC

3. **Registry Access**:
   - AKS ACR integration
   - Managed identity for registry access, so no image pull secrets are required

### Network Security

1. **Ingress**:
   - TLS 1.2+ only
   - cert-manager for automatic certificate renewal
   - Rate limiting (optional)

2. **Network Policies**:
   - Restrict pod-to-pod communication
   - Allow only required egress

3. **Service Mesh (Future)**:
   - mTLS between services
   - Fine-grained authorization

### Application Security

1. **Container**:
   - Non-root user (UID 1000)
   - Read-only root filesystem
   - No privilege escalation

2. **Dependencies**:
   - Regular Maven dependency updates
   - Vulnerability scanning (Snyk/Trivy)

3. **Secrets Management**:
   - Azure Key Vault integration
   - Secrets Store CSI driver mounts secrets into the pod as files
   - No secrets in environment variables
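
With the Secrets Store CSI driver, Key Vault secrets reach the application as files in a mounted volume rather than as environment variables. Below is a minimal sketch of reading one such file. It assumes a mount directory like `/mnt/secrets-store`; the actual path is set by the pod's volume mount, so treat it as a placeholder. The secret name is also hypothetical, and `MountedSecrets` is not part of any library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads a secret that the CSI driver mounted as a file. */
final class MountedSecrets {
    private final Path mountDir;

    MountedSecrets(Path mountDir) {
        this.mountDir = mountDir;
    }

    String read(String name) throws IOException {
        // Reject names that would escape the mount directory, e.g. "../etc/passwd".
        Path file = mountDir.resolve(name).normalize();
        if (!file.startsWith(mountDir.normalize())) {
            throw new IllegalArgumentException("invalid secret name: " + name);
        }
        // Strip a single trailing newline that some tools append; keep other whitespace.
        String value = Files.readString(file, StandardCharsets.UTF_8);
        return value.endsWith("\n") ? value.substring(0, value.length() - 1) : value;
    }
}
```

Usage would look like `new MountedSecrets(Path.of("/mnt/secrets-store")).read("api-key")`, where `api-key` is a hypothetical secret name.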

## Scalability

### Horizontal Scaling

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: online-boutique
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: online-boutique   # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
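
The HPA evaluates each metric with the formula documented for Kubernetes: `desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)`. Changes within a tolerance of 0.1 (the default) are skipped. The largest proposal across metrics wins, and the result is clamped to `minReplicas`/`maxReplicas`. The sketch below is a simplified model that ignores stabilization windows, unready pods, and missing metrics. `HpaMath` is a hypothetical name.

```java
/** Hypothetical helper: simplified HorizontalPodAutoscaler replica calculation. */
final class HpaMath {
    static final double TOLERANCE = 0.1; // Kubernetes default scaling tolerance

    /** Desired replicas for a single metric. */
    static int forMetric(int currentReplicas, double currentUtilization, double targetUtilization) {
        double ratio = currentUtilization / targetUtilization;
        if (Math.abs(ratio - 1.0) <= TOLERANCE) {
            return currentReplicas; // close enough to target: no change
        }
        return (int) Math.ceil(currentReplicas * ratio);
    }

    /** Largest proposal across metrics ({current, target} pairs), clamped to [min, max]. */
    static int desired(int currentReplicas, int minReplicas, int maxReplicas, double[][] utilVsTarget) {
        int proposal = 0;
        for (double[] metric : utilVsTarget) {
            proposal = Math.max(proposal, forMetric(currentReplicas, metric[0], metric[1]));
        }
        return Math.min(maxReplicas, Math.max(minReplicas, proposal));
    }
}
```

With the manifest above, suppose 2 replicas run at 140% CPU utilization against the 70% target. CPU proposes `ceil(2 × 140 / 70) = 4` replicas. Memory at 60% against its 80% target proposes 2. The HPA therefore scales to 4.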
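
### Vertical Scaling

Use the **Vertical Pod Autoscaler (VPA)** for automatic resource recommendations. Because the HPA above already scales on CPU and memory utilization, run VPA in recommendation-only mode (`updateMode: "Off"`) and apply its suggestions to the resource requests manually. Letting both autoscalers act on the same metrics makes them work against each other.
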
### Database Scaling (Future)

- Connection pooling (HikariCP)
- Read replicas for read-heavy workloads
- Caching layer (Redis)

## High Availability

### Application Level
- **Replicas**: Minimum 2 pods per environment
- **Anti-affinity**: Spread across nodes
- **Readiness probes**: Only route to healthy pods

### Infrastructure Level
- **AKS**: Multi-zone node pools
- **Ingress**: Multiple replicas with PodDisruptionBudget
- **Monitoring**: High availability via Thanos

## Disaster Recovery

### Backup Strategy
1. **Application State**: Stateless, no backup needed
2. **Configuration**: Stored in Git
3. **Metrics**: 15-day retention, export to long-term storage
4. **Container Images**: Retained in ACR with retention policy

### Recovery Procedures
1. **Pod failure**: Automatic restart by kubelet
2. **Node failure**: Automatic rescheduling to healthy nodes
3. **Cluster failure**: Redeploy via Terraform + Humanitec
4. **Regional failure**: Failover to secondary region (if configured)

## Technology Decisions

### Why Spring Boot?
- Industry-standard Java framework
- Rich ecosystem (Actuator, Security, Data)
- Production-ready features out of the box
- Easy testing and debugging

### Why Humanitec?
- Environment-agnostic deployment
- Score specification simplicity
- Resource dependency management
- Reduces K8s complexity

### Why Prometheus + Grafana?
- Cloud-native standard
- Rich query language (PromQL)
- Wide integration support
- Open-source, vendor-neutral

### Why Maven?
- Mature dependency management
- Extensive plugin ecosystem
- Declarative configuration
- Wide adoption in the Java community

## Future Enhancements

1. **Database Integration**: PostgreSQL with Flyway migrations
2. **Caching**: Redis for session storage
3. **Messaging**: Kafka for event-driven architecture
4. **Tracing**: Jaeger/Zipkin for distributed tracing
5. **Service Mesh**: Istio for advanced traffic management
6. **Multi-region**: Active-active deployment

## Next Steps

- [Review deployment guide](deployment.md)
- [Configure monitoring](monitoring.md)
- [Return to overview](index.md)