# Architecture

This document describes the architecture of online-boutique.

## System Overview

```
┌──────────────────────────────────────────────────────────────┐
│                          Developer                           │
│                                                              │
│  Backstage UI → Template → Gitea Repo → CI/CD Workflows      │
└────────────────────┬─────────────────────────────────────────┘
                     │
                     │ git push
                     ▼
┌──────────────────────────────────────────────────────────────┐
│                        Gitea Actions                         │
│                                                              │
│  ┌───────────────┐       ┌──────────────────┐                │
│  │ Build & Push  │──────▶│ Deploy Humanitec │                │
│  │ - Maven       │       │ - humctl score   │                │
│  │ - Docker      │       │ - Environment    │                │
│  │ - ACR Push    │       │ - Orchestration  │                │
│  └───────────────┘       └──────────────────┘                │
└─────────────┬─────────────────┬──────────────────────────────┘
              │                 │
              │ image           │ deployment
              ▼                 ▼
┌────────────────────┐  ┌────────────────────────────────────┐
│ Azure Container    │  │ Humanitec Platform                 │
│ Registry           │  │                                    │
│                    │  │  ┌──────────────────────────────┐  │
│ bstagecjotdevacr   │  │  │ Score Interpretation         │  │
│                    │  │  │ Resource Provisioning        │  │
│ Images:            │  │  │ Environment Management       │  │
│ - app:latest       │  │  └──────────────────────────────┘  │
│ - app:v1.0.0       │  │             │                      │
│ - app:git-sha      │  │             │ kubectl apply        │
└────────────────────┘  └─────────────┼──────────────────────┘
                                      │
                                      ▼
          ┌─────────────────────────────────────────────┐
          │       Azure Kubernetes Service (AKS)        │
          │                                             │
          │   ┌────────────────────────────────────┐    │
          │   │ Namespace:                         │    │
          │   │                                    │    │
          │   │  ┌──────────────────────────────┐  │    │
          │   │  │ Deployment                   │  │    │
          │   │  │ - Replicas: 2                │  │    │
          │   │  │ - Health Probes              │  │    │
          │   │  │ - Resource Limits            │  │    │
          │   │  │                              │  │    │
          │   │  │ ┌───────────┐  ┌──────────┐  │  │    │
          │   │  │ │ Pod       │  │ Pod      │  │  │    │
          │   │  │ │ Spring    │  │ Spring   │  │  │    │
          │   │  │ │ Boot      │  │ Boot     │  │  │    │
          │   │  │ │ :8080     │  │ :8080    │  │  │    │
          │   │  │ └─────┬─────┘  └────┬─────┘  │  │    │
          │   │  └───────┼─────────────┼────────┘  │    │
          │   │          │             │           │    │
          │   │  ┌───────▼─────────────▼────────┐  │    │
          │   │  │ Service (ClusterIP)          │  │    │
          │   │  │ - Port: 80 → 8080            │  │    │
          │   │  └───────┬──────────────────────┘  │    │
          │   │          │                         │    │
          │   │  ┌───────▼──────────────────────┐  │    │
          │   │  │ Ingress                      │  │    │
          │   │  │ - TLS (cert-manager)         │  │    │
          │   │  │ - Host: app.kyndemo.live     │  │    │
          │   │  └──────────────────────────────┘  │    │
          │   └────────────────────────────────────┘    │
          │                                             │
          │   ┌────────────────────────────────────┐    │
          │   │ Monitoring Namespace               │    │
          │   │                                    │    │
          │   │  ┌──────────────────────────────┐  │    │
          │   │  │ Prometheus                   │  │    │
          │   │  │ - ServiceMonitor             │  │    │
          │   │  │ - Scrapes /actuator/         │  │    │
          │   │  │   prometheus every 30s       │  │    │
          │   │  └──────────────────────────────┘  │    │
          │   │                                    │    │
          │   │  ┌──────────────────────────────┐  │    │
          │   │  │ Grafana                      │  │    │
          │   │  │ - Spring Boot Dashboard      │  │    │
          │   │  │ - Alerts                     │  │    │
          │   │  └──────────────────────────────┘  │    │
          │   └────────────────────────────────────┘    │
          └─────────────────────────────────────────────┘
```

## Component Architecture

### 1. Application Layer

#### Spring Boot Application

**Technology Stack:**
- **Framework**: Spring Boot 3.2
- **Java**: OpenJDK 17 (LTS)
- **Build**: Maven 3.9
- **Runtime**: Embedded Tomcat

**Key Components:**

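```java
// GoldenPathApplication.java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Enables auto-configuration, component scanning, and property binding.
@SpringBootApplication
public class GoldenPathApplication {
    public static void main(String[] args) {
        SpringApplication.run(GoldenPathApplication.class, args);
    }
}

// ApiController.java
import java.util.Map;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ApiController {
    // Response bodies below are illustrative placeholders.
    @GetMapping("/")
    public String root() {
        return "online-boutique";
    }

    @GetMapping("/api/status")
    public ResponseEntity<Map<String, String>> status() {
        return ResponseEntity.ok(Map.of("status", "UP"));
    }
}
```
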
**Configuration Management:**
- `application.yml`: Base configuration
- `application-development.yml`: Dev overrides
- `application-production.yml`: Production overrides
- Environment variables: Runtime overrides
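
The precedence implied by this list can be modeled as successive map merges, where each later source overrides keys from earlier ones. The following is a minimal sketch in plain Java, not Spring Boot's actual property resolution (which has more sources and relaxed name binding). The `ConfigLayers` class and the sample keys and values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a simplified model of layered configuration.
// Spring Boot's real resolution has more sources and relaxed binding rules.
final class ConfigLayers {

    // Merges layers in order; a key in a later layer overrides earlier ones.
    static Map<String, String> resolve(List<Map<String, String>> layersLowToHigh) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layersLowToHigh) {
            effective.putAll(layer);
        }
        return effective;
    }

    public static void main(String[] args) {
        // Sample values only; they are not taken from the real configuration files.
        Map<String, String> base = Map.of("server.port", "8080", "logging.level.root", "INFO");
        Map<String, String> production = Map.of("logging.level.root", "WARN");
        Map<String, String> environment = Map.of("server.port", "9090");

        Map<String, String> effective = resolve(List.of(base, production, environment));
        System.out.println(effective);
    }
}
```

In this model the production profile changes only the log level, and the environment layer changes only the port; every other key falls through from the base file.
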
### 2. Container Layer

#### Docker Image

**Multi-stage Build:**

```dockerfile
# Stage 1: Build
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests

# Stage 2: Runtime
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=builder /app/target/*.jar app.jar
USER 1000
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```

**Optimizations:**
- Layer caching for dependencies
- Minimal runtime image (Alpine)
- Non-root user (UID 1000)
- Health check support

### 3. Orchestration Layer

#### Humanitec Score

**Resource Specification:**

```yaml
apiVersion: score.dev/v1b1
metadata:
  name: online-boutique

containers:
  app:
    image: bstagecjotdevacr.azurecr.io/online-boutique:latest
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
        cpu: 1000m

service:
  ports:
    http:
      port: 80
      targetPort: 8080

resources:
  route:
    type: route
    params:
      host: online-boutique.kyndemo.live
```

**Capabilities:**
- Environment-agnostic deployment
- Resource dependencies
- Configuration management
- Automatic rollback

#### Kubernetes Resources

**Fallback Manifests:**
- `deployment.yaml`: Pod specification, replicas, health probes
- `service.yaml`: ClusterIP service for internal routing
- `ingress.yaml`: External access with TLS
- `servicemonitor.yaml`: Prometheus scraping config

### 4. CI/CD Pipeline

#### Build & Push Workflow

**Stages:**

1. **Checkout**: Clone repository
2. **Setup**: Install Maven, Docker
3. **Test**: Run unit & integration tests
4. **Build**: Maven package
5. **Docker**: Build multi-stage image
6. **Auth**: Azure OIDC login
7. **Push**: Push to ACR with tags
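
The final stage pushes one image under several tags (the system overview lists `latest`, a version tag, and a commit-based tag). As a rough sketch of how those references could be derived, the plain-Java helper below builds them from a registry, image name, version, and commit SHA. The `ImageTags` class is hypothetical, and the 7-character short SHA is an assumption rather than the workflow's documented format.

```java
import java.util.List;

// Hypothetical helper: derives the image references a build might push.
final class ImageTags {

    // Returns latest, version, and commit-based references for one image.
    static List<String> forBuild(String registry, String name, String version, String commitSha) {
        // Assumption: the commit tag uses the first 7 characters of the SHA.
        String shortSha = commitSha.substring(0, Math.min(7, commitSha.length()));
        String repository = registry + "/" + name;
        return List.of(
                repository + ":latest",
                repository + ":v" + version,
                repository + ":" + shortSha);
    }

    public static void main(String[] args) {
        // The version and SHA are made-up sample values.
        forBuild("bstagecjotdevacr.azurecr.io", "online-boutique", "1.0.0",
                "3f2a9c1d8e7b6a5f4c3d2e1f0a9b8c7d6e5f4a3b")
                .forEach(System.out::println);
    }
}
```

Deploying the commit-based reference rather than `latest` keeps each deployment traceable to a specific build.
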
**Triggers:**
- Push to `main` branch
- Pull requests
- Manual dispatch

#### Deploy Workflow

**Stages:**

1. **Parse Image**: Extract image reference from build
2. **Setup**: Install `humctl` CLI
3. **Score Update**: Replace image in `score.yaml`
4. **Deploy**: Execute `humctl score deploy`
5. **Verify**: Check deployment status
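
The "Score Update" stage amounts to rewriting the `image:` value in the Score file before deployment. The sketch below illustrates that substitution on the file's text in plain Java. It is only an illustration of the idea: the actual workflow may use a different tool, and the `ScoreImageUpdater` class and the sample tag are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the image reference in a Score file's text.
final class ScoreImageUpdater {

    // Matches an "image:" key at any indentation and captures the key prefix.
    private static final Pattern IMAGE_LINE =
            Pattern.compile("(?m)^([ \\t]*image:[ \\t]*).*$");

    // Replaces the value of every image line with the given reference.
    static String withImage(String scoreYaml, String imageRef) {
        Matcher matcher = IMAGE_LINE.matcher(scoreYaml);
        return matcher.replaceAll("$1" + Matcher.quoteReplacement(imageRef));
    }

    public static void main(String[] args) {
        String score = String.join("\n",
                "containers:",
                "  app:",
                "    image: bstagecjotdevacr.azurecr.io/online-boutique:latest");
        // The tag below is a made-up sample value.
        System.out.println(withImage(score, "bstagecjotdevacr.azurecr.io/online-boutique:3f2a9c1"));
    }
}
```

A text substitution like this rewrites every `image:` line it finds, so a Score file with more than one container would need a more targeted approach.
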
**Secrets:**
- `HUMANITEC_TOKEN`: Platform authentication
- `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`: OIDC federation

### 5. Observability Layer

#### Metrics Collection

**Flow:**

```
Spring Boot App
    │
    └── /actuator/prometheus (HTTP endpoint)
            │
            └── Prometheus (scrape every 30s)
                    │
                    └── TSDB (15-day retention)
                            │
                            └── Grafana (visualization)
```

**ServiceMonitor Configuration:**

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  selector:
    matchLabels:
      app: online-boutique
  endpoints:
    - port: http
      path: /actuator/prometheus
      interval: 30s
```

#### Metrics Categories

1. **HTTP Metrics**:
   - Request count/rate
   - Response time (avg, p95, p99)
   - Status code distribution

2. **JVM Metrics**:
   - Heap/non-heap memory
   - GC pause time
   - Thread count

3. **System Metrics**:
   - CPU usage
   - File descriptors
   - Process uptime

## Data Flow

### Request Flow

```
User Request
    │
    ▼
Ingress Controller (nginx)
    │ TLS termination
    │ Host routing
    ▼
Service (ClusterIP)
    │ Load balancing
    │ Port mapping
    ▼
Pod (Spring Boot)
    │ Request handling
    │ Business logic
    ▼
Response
```

### Metrics Flow

```
Spring Boot (Micrometer)
    │ Collect metrics
    │ Format Prometheus
    ▼
Actuator Endpoint
    │ Expose /actuator/prometheus
    ▼
Prometheus (Scraper)
    │ Pull every 30s
    │ Store in TSDB
    ▼
Grafana
    │ Query PromQL
    │ Render dashboards
    ▼
User Visualization
```
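
To make the "Format Prometheus" step more concrete, the sketch below renders a single sample in the Prometheus text exposition format, which is the shape of each line a scrape of `/actuator/prometheus` returns. In the real application Micrometer produces this output. The `PrometheusLine` class is hypothetical, and the metric name, labels, and value are sample data rather than output captured from the service.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: renders one sample in the Prometheus text format.
final class PrometheusLine {

    // Escapes backslash, double quote, and newline inside a label value.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Produces: name{label="value",...} sampleValue
    static String render(String name, Map<String, String> labels, double value) {
        String labelText = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(","));
        return name + "{" + labelText + "} " + value;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("method", "GET");
        labels.put("uri", "/api/status");
        labels.put("status", "200");
        // Metric name and labels are sample values, not taken from the running service.
        System.out.println(render("http_server_requests_seconds_count", labels, 42.0));
    }
}
```
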

### Deployment Flow

```
Git Push
    │
    ▼
Gitea Actions (Webhook)
    │
    ├── Build Workflow
    │   │ Maven test + package
    │   │ Docker build
    │   │ ACR push
    │   └── Output: image reference
    │
    └── Deploy Workflow
        │ Parse image
        │ Update score.yaml
        │ humctl score deploy
        │
        ▼
Humanitec Platform
    │ Interpret Score
    │ Provision resources
    │ Generate manifests
    │
    ▼
Kubernetes API
    │ Apply deployment
    │ Create/update resources
    │ Schedule pods
    │
    ▼
Running Application
```

## Security Architecture

### Authentication & Authorization

1. **Azure Workload Identity**:
   - OIDC federation for CI/CD
   - No static credentials
   - Scoped permissions

2. **Service Account**:
   - Kubernetes ServiceAccount
   - Bound to Azure Managed Identity
   - Limited RBAC

3. **Image Pull Secrets**:
   - AKS ACR integration
   - Managed identity for registry access

### Network Security

1. **Ingress**:
   - TLS 1.2+ only
   - cert-manager for automatic cert renewal
   - Rate limiting (optional)

2. **Network Policies**:
   - Restrict pod-to-pod communication
   - Allow only required egress

3. **Service Mesh (Future)**:
   - mTLS between services
   - Fine-grained authorization

### Application Security

1. **Container**:
   - Non-root user (UID 1000)
   - Read-only root filesystem
   - No privilege escalation

2. **Dependencies**:
   - Regular Maven dependency updates
   - Vulnerability scanning (Snyk/Trivy)

3. **Secrets Management**:
   - Azure Key Vault integration
   - CSI driver for secret mounting
   - No secrets in environment variables

## Scalability

### Horizontal Scaling

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
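
To see how these targets translate into a replica count, the sketch below applies the scaling rule described in the Kubernetes documentation: for each metric, `desired = ceil(current × currentUtilization / targetUtilization)`, the largest proposal across metrics is used, and the result is clamped to the configured bounds. This is a simplified approximation in plain Java that ignores the controller's tolerance and stabilization behavior. The `HpaEstimate` class and the sample utilization figures are hypothetical.

```java
// Hypothetical helper: approximates the HPA replica calculation.
final class HpaEstimate {

    // desired = ceil(current * currentUtilization / targetUtilization)
    static int desiredFor(int currentReplicas, double currentUtilization, double targetUtilization) {
        return (int) Math.ceil(currentReplicas * currentUtilization / targetUtilization);
    }

    // With several metrics, the largest proposal wins, then bounds are applied.
    static int desiredReplicas(int currentReplicas, double cpuUtilization, double memoryUtilization,
                               int minReplicas, int maxReplicas) {
        int byCpu = desiredFor(currentReplicas, cpuUtilization, 70.0);       // CPU target: 70%
        int byMemory = desiredFor(currentReplicas, memoryUtilization, 80.0); // memory target: 80%
        int proposed = Math.max(byCpu, byMemory);
        return Math.max(minReplicas, Math.min(maxReplicas, proposed));
    }

    public static void main(String[] args) {
        // Sample load: 2 replicas at 140% CPU and 60% memory utilization of requests.
        System.out.println(desiredReplicas(2, 140.0, 60.0, 2, 10)); // prints 4
    }
}
```

In the sample call, CPU alone proposes 4 replicas while memory proposes 2, so the autoscaler would scale to 4, which is within the 2 to 10 range.
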
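
### Vertical Scaling

Use the **Vertical Pod Autoscaler (VPA)** for automatic resource recommendations. Avoid letting VPA act on the same CPU or memory metrics that the HPA scales on, because the two controllers can conflict.
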
### Database Scaling (Future)

- Connection pooling (HikariCP)
- Read replicas for read-heavy workloads
- Caching layer (Redis)

## High Availability

### Application Level
- **Replicas**: Minimum 2 pods per environment
- **Anti-affinity**: Spread across nodes
- **Readiness probes**: Only route to healthy pods

### Infrastructure Level
- **AKS**: Multi-zone node pools
- **Ingress**: Multiple replicas with PodDisruptionBudget
- **Monitoring**: High availability via Thanos

## Disaster Recovery

### Backup Strategy
1. **Application State**: Stateless, no backup needed
2. **Configuration**: Stored in Git
3. **Metrics**: 15-day retention, export to long-term storage
4. **Container Images**: Retained in ACR with retention policy

### Recovery Procedures
1. **Pod failure**: Automatic restart by kubelet
2. **Node failure**: Automatic rescheduling to healthy nodes
3. **Cluster failure**: Redeploy via Terraform + Humanitec
4. **Regional failure**: Failover to secondary region (if configured)

## Technology Decisions

### Why Spring Boot?
- Industry-standard Java framework
- Rich ecosystem (Actuator, Security, Data)
- Production-ready features out of the box
- Easy testing and debugging

### Why Humanitec?
- Environment-agnostic deployment
- Score specification simplicity
- Resource dependency management
- Reduces K8s complexity

### Why Prometheus + Grafana?
- Cloud-native standard
- Rich query language (PromQL)
- Wide integration support
- Open-source, vendor-neutral

### Why Maven?
- Mature dependency management
- Extensive plugin ecosystem
- Declarative configuration
- Wide adoption in Java community

## Future Enhancements

1. **Database Integration**: PostgreSQL with Flyway migrations
2. **Caching**: Redis for session storage
3. **Messaging**: Kafka for event-driven architecture
4. **Tracing**: Jaeger/Zipkin for distributed tracing
5. **Service Mesh**: Istio for advanced traffic management
6. **Multi-region**: Active-active deployment

## Next Steps

- [Review deployment guide](deployment.md)
- [Configure monitoring](monitoring.md)
- [Return to overview](index.md)