initial commit

Change-Id: I9c68c43e939d2c1a3b95a68b71ecc5ba861a4df5

`docs/architecture.md`

# Architecture

This document describes the architecture of online-boutique.

## System Overview

```
┌──────────────────────────────────────────────────────────────┐
│                          Developer                           │
│                                                              │
│   Backstage UI → Template → Gitea Repo → CI/CD Workflows     │
└────────────────────┬─────────────────────────────────────────┘
                     │
                     │ git push
                     ▼
┌──────────────────────────────────────────────────────────────┐
│                        Gitea Actions                         │
│                                                              │
│   ┌───────────────┐       ┌──────────────────┐               │
│   │ Build & Push  │──────▶│ Deploy Humanitec │               │
│   │ - Maven       │       │ - humctl score   │               │
│   │ - Docker      │       │ - Environment    │               │
│   │ - ACR Push    │       │ - Orchestration  │               │
│   └───────────────┘       └──────────────────┘               │
└─────────────┬─────────────────┬──────────────────────────────┘
              │                 │
              │ image           │ deployment
              ▼                 ▼
┌────────────────────┐   ┌────────────────────────────────────┐
│  Azure Container   │   │         Humanitec Platform         │
│  Registry          │   │                                    │
│                    │   │  ┌──────────────────────────────┐  │
│  bstagecjotdevacr  │   │  │ Score Interpretation         │  │
│                    │   │  │ Resource Provisioning        │  │
│  Images:           │   │  │ Environment Management       │  │
│  - app:latest      │   │  └──────────────────────────────┘  │
│  - app:v1.0.0      │   │                 │                  │
│  - app:git-sha     │   │                 │ kubectl apply    │
└────────────────────┘   └─────────────────┼──────────────────┘
                                           │
                                           ▼
                    ┌─────────────────────────────────────────────┐
                    │       Azure Kubernetes Service (AKS)        │
                    │                                             │
                    │  ┌───────────────────────────────────────┐  │
                    │  │ Namespace:                            │  │
                    │  │                                       │  │
                    │  │  ┌───────────────────────────────┐    │  │
                    │  │  │ Deployment                    │    │  │
                    │  │  │ - Replicas: 2                 │    │  │
                    │  │  │ - Health Probes               │    │  │
                    │  │  │ - Resource Limits             │    │  │
                    │  │  │                               │    │  │
                    │  │  │ ┌───────────┐ ┌──────────┐    │    │  │
                    │  │  │ │ Pod       │ │ Pod      │    │    │  │
                    │  │  │ │ Spring    │ │ Spring   │    │    │  │
                    │  │  │ │ Boot      │ │ Boot     │    │    │  │
                    │  │  │ │ :8080     │ │ :8080    │    │    │  │
                    │  │  │ └─────┬─────┘ └────┬─────┘    │    │  │
                    │  │  └───────┼────────────┼──────────┘    │  │
                    │  │          │            │               │  │
                    │  │  ┌───────▼────────────▼────────┐      │  │
                    │  │  │ Service (ClusterIP)         │      │  │
                    │  │  │ - Port: 80 → 8080           │      │  │
                    │  │  └───────┬─────────────────────┘      │  │
                    │  │          │                            │  │
                    │  │  ┌───────▼─────────────────────┐      │  │
                    │  │  │ Ingress                     │      │  │
                    │  │  │ - TLS (cert-manager)        │      │  │
                    │  │  │ - Host: app.kyndemo.live    │      │  │
                    │  │  └─────────────────────────────┘      │  │
                    │  └───────────────────────────────────────┘  │
                    │                                             │
                    │  ┌───────────────────────────────────────┐  │
                    │  │ Monitoring Namespace                  │  │
                    │  │                                       │  │
                    │  │  ┌───────────────────────────────┐    │  │
                    │  │  │ Prometheus                    │    │  │
                    │  │  │ - ServiceMonitor              │    │  │
                    │  │  │ - Scrapes /actuator/          │    │  │
                    │  │  │   prometheus every 30s        │    │  │
                    │  │  └───────────────────────────────┘    │  │
                    │  │                                       │  │
                    │  │  ┌───────────────────────────────┐    │  │
                    │  │  │ Grafana                       │    │  │
                    │  │  │ - Spring Boot Dashboard       │    │  │
                    │  │  │ - Alerts                      │    │  │
                    │  │  └───────────────────────────────┘    │  │
                    │  └───────────────────────────────────────┘  │
                    └─────────────────────────────────────────────┘
```

## Component Architecture

### 1. Application Layer

#### Spring Boot Application

**Technology Stack:**
- **Framework**: Spring Boot 3.2
- **Java**: OpenJDK 17 (LTS)
- **Build**: Maven 3.9
- **Runtime**: Embedded Tomcat

**Key Components:**

```java
@SpringBootApplication
public class GoldenPathApplication {
    // Auto-configuration
    // Component scanning
    // Property binding
}

// Controller signatures (bodies omitted for brevity):
@RestController
public class ApiController {
    @GetMapping("/")
    public String root();                                  // welcome message

    @GetMapping("/api/status")
    public ResponseEntity<Map<String, String>> status();   // service health status
}
```

**Configuration Management:**
- `application.yml`: Base configuration
- `application-development.yml`: Dev overrides
- `application-production.yml`: Production overrides
- Environment variables: Runtime overrides
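
As a concrete example of the precedence above, an environment variable overrides the same key from any `application*.yml` file via Spring's relaxed binding. The values here are illustrative, not the repo's real configuration:

```yaml
# application.yml (base, illustrative)
server:
  port: 8080

# At runtime, setting SERVER_PORT=9090 in the environment overrides
# server.port from every profile file, because environment variables
# rank higher in Spring Boot's property-source precedence.
```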
### 2. Container Layer

#### Docker Image

**Multi-stage Build:**

```dockerfile
# Stage 1: Build
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests

# Stage 2: Runtime
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=builder /app/target/*.jar app.jar
USER 1000
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```

**Optimizations:**
- Layer caching for dependencies
- Minimal runtime image (Alpine)
- Non-root user (UID 1000)
- Health check support
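
The health-check support above can also be made explicit in the image itself. A possible `HEALTHCHECK` instruction for the runtime stage — a sketch, not part of the actual Dockerfile, and it assumes `wget` from the Alpine base is available:

```dockerfile
# Hypothetical addition to the runtime stage: mark the container unhealthy
# when the Spring Boot health endpoint stops answering.
HEALTHCHECK --interval=30s --timeout=3s --start-period=20s --retries=3 \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1
```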
### 3. Orchestration Layer

#### Humanitec Score

**Resource Specification:**

```yaml
apiVersion: score.dev/v1b1
metadata:
  name: online-boutique

containers:
  app:
    image: bstagecjotdevacr.azurecr.io/online-boutique:latest
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
        cpu: 1000m

service:
  ports:
    http:
      port: 80
      targetPort: 8080

resources:
  route:
    type: route
    params:
      host: online-boutique.kyndemo.live
```

**Capabilities:**
- Environment-agnostic deployment
- Resource dependencies
- Configuration management
- Automatic rollback

#### Kubernetes Resources

**Fallback Manifests:**
- `deployment.yaml`: Pod specification, replicas, health probes
- `service.yaml`: ClusterIP service for internal routing
- `ingress.yaml`: External access with TLS
- `servicemonitor.yaml`: Prometheus scraping config
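
The fallback `deployment.yaml` carries the replica and probe settings that Humanitec otherwise derives from Score. An abbreviated sketch — field values are assumed to mirror the Score spec above, not copied from the real manifest:

```yaml
# deployment.yaml (abbreviated sketch, not the full manifest)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: online-boutique
spec:
  replicas: 2
  selector:
    matchLabels:
      app: online-boutique
  template:
    metadata:
      labels:
        app: online-boutique
    spec:
      containers:
        - name: app
          image: bstagecjotdevacr.azurecr.io/online-boutique:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
```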
### 4. CI/CD Pipeline

#### Build & Push Workflow

**Stages:**

1. **Checkout**: Clone repository
2. **Setup**: Install Maven, Docker
3. **Test**: Run unit & integration tests
4. **Build**: Maven package
5. **Docker**: Build multi-stage image
6. **Auth**: Azure OIDC login
7. **Push**: Push to ACR with tags
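
The push stage tags each image three ways (`latest`, the short git SHA, and the semantic version). A shell sketch of how those tags could be derived — the variable names and values are assumptions for illustration, not a copy of the workflow file:

```shell
# Illustrative tag derivation for the push stage (not the real workflow).
IMAGE="bstagecjotdevacr.azurecr.io/online-boutique"
GIT_SHA="3f9d2c1"        # short commit SHA, e.g. from `git rev-parse --short HEAD`
VERSION="v1.0.0"         # semantic version, e.g. from a git tag

for TAG in latest "$GIT_SHA" "$VERSION"; do
  echo "$IMAGE:$TAG"     # in CI this would be `docker tag` + `docker push`
done
```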
**Triggers:**
- Push to `main` branch
- Pull requests
- Manual dispatch

#### Deploy Workflow

**Stages:**

1. **Parse Image**: Extract image reference from build
2. **Setup**: Install humctl CLI
3. **Score Update**: Replace image in score.yaml
4. **Deploy**: Execute humctl score deploy
5. **Verify**: Check deployment status

**Secrets:**
- `HUMANITEC_TOKEN`: Platform authentication
- `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`: OIDC federation

### 5. Observability Layer

#### Metrics Collection

**Flow:**

```
Spring Boot App
  │
  └── /actuator/prometheus (HTTP endpoint)
        │
        └── Prometheus (scrape every 30s)
              │
              └── TSDB (15-day retention)
                    │
                    └── Grafana (visualization)
```

**ServiceMonitor Configuration:**

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  selector:
    matchLabels:
      app: online-boutique
  endpoints:
    - port: http
      path: /actuator/prometheus
      interval: 30s
```

#### Metrics Categories

1. **HTTP Metrics**:
   - Request count/rate
   - Response time (avg, p95, p99)
   - Status code distribution

2. **JVM Metrics**:
   - Heap/non-heap memory
   - GC pause time
   - Thread count

3. **System Metrics**:
   - CPU usage
   - File descriptors
   - Process uptime
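
Dashboards typically derive the percentile figures above from Micrometer's request-timing histogram. For example, a p95 latency query in PromQL — the metric name is Micrometer's default (`http_server_requests_seconds`), so treat its presence here as an assumption about this app's setup:

```
histogram_quantile(0.95,
  sum(rate(http_server_requests_seconds_bucket[5m])) by (le))
```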
## Data Flow

### Request Flow

```
User Request
     │
     ▼
Ingress Controller (nginx)
     │ TLS termination
     │ Host routing
     ▼
Service (ClusterIP)
     │ Load balancing
     │ Port mapping
     ▼
Pod (Spring Boot)
     │ Request handling
     │ Business logic
     ▼
Response
```

### Metrics Flow

```
Spring Boot (Micrometer)
     │ Collect metrics
     │ Format Prometheus
     ▼
Actuator Endpoint
     │ Expose /actuator/prometheus
     ▼
Prometheus (Scraper)
     │ Pull every 30s
     │ Store in TSDB
     ▼
Grafana
     │ Query PromQL
     │ Render dashboards
     ▼
User Visualization
```

### Deployment Flow

```
Git Push
   │
   ▼
Gitea Actions (Webhook)
   │
   ├── Build Workflow
   │     │ Maven test + package
   │     │ Docker build
   │     │ ACR push
   │     └── Output: image reference
   │
   └── Deploy Workflow
         │ Parse image
         │ Update score.yaml
         │ humctl score deploy
         │
         ▼
   Humanitec Platform
         │ Interpret Score
         │ Provision resources
         │ Generate manifests
         │
         ▼
   Kubernetes API
         │ Apply deployment
         │ Create/update resources
         │ Schedule pods
         │
         ▼
   Running Application
```

## Security Architecture

### Authentication & Authorization

1. **Azure Workload Identity**:
   - OIDC federation for CI/CD
   - No static credentials
   - Scoped permissions

2. **Service Account**:
   - Kubernetes ServiceAccount
   - Bound to Azure Managed Identity
   - Limited RBAC

3. **Image Pull Secrets**:
   - AKS ACR integration
   - Managed identity for registry access

### Network Security

1. **Ingress**:
   - TLS 1.2+ only
   - Cert-manager for automatic cert renewal
   - Rate limiting (optional)

2. **Network Policies**:
   - Restrict pod-to-pod communication
   - Allow only required egress

3. **Service Mesh (Future)**:
   - mTLS between services
   - Fine-grained authorization

### Application Security

1. **Container**:
   - Non-root user (UID 1000)
   - Read-only root filesystem
   - No privilege escalation

2. **Dependencies**:
   - Regular Maven dependency updates
   - Vulnerability scanning (Snyk/Trivy)

3. **Secrets Management**:
   - Azure Key Vault integration
   - CSI driver for secret mounting
   - No secrets in environment variables
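
The container-hardening items above map directly onto a pod `securityContext`. A sketch of what that stanza could look like in the deployment manifest — the field names are standard Kubernetes; their exact presence in this repo's manifests is an assumption:

```yaml
# Pod/container security context matching the hardening list (illustrative)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000                    # non-root UID baked into the image
containers:
  - name: app
    securityContext:
      readOnlyRootFilesystem: true   # read-only root filesystem
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```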
## Scalability

### Horizontal Scaling

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Vertical Scaling

Use **VPA (Vertical Pod Autoscaler)** for automatic resource recommendation.
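
A minimal VPA object in recommendation-only mode could look like the following. This is a sketch: the VPA CRD must be installed in the cluster, and the target name is assumed:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: online-boutique-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: online-boutique
  updatePolicy:
    updateMode: "Off"   # recommend only; do not evict pods to resize them
```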
### Database Scaling (Future)

- Connection pooling (HikariCP)
- Read replicas for read-heavy workloads
- Caching layer (Redis)

## High Availability

### Application Level
- **Replicas**: Minimum 2 pods per environment
- **Anti-affinity**: Spread across nodes
- **Readiness probes**: Only route to healthy pods

### Infrastructure Level
- **AKS**: Multi-zone node pools
- **Ingress**: Multiple replicas with PodDisruptionBudget
- **Monitoring**: High availability via Thanos
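
A PodDisruptionBudget like the one mentioned for the ingress also makes sense for the application itself, so voluntary disruptions (node drains, upgrades) never take down both replicas at once. A sketch, with names assumed:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: online-boutique-pdb
spec:
  minAvailable: 1          # keep at least one pod up during voluntary disruptions
  selector:
    matchLabels:
      app: online-boutique
```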
## Disaster Recovery

### Backup Strategy
1. **Application State**: Stateless, no backup needed
2. **Configuration**: Stored in Git
3. **Metrics**: 15-day retention, export to long-term storage
4. **Container Images**: Retained in ACR with retention policy

### Recovery Procedures
1. **Pod failure**: Automatic restart by kubelet
2. **Node failure**: Automatic rescheduling to healthy nodes
3. **Cluster failure**: Redeploy via Terraform + Humanitec
4. **Regional failure**: Failover to secondary region (if configured)

## Technology Decisions

### Why Spring Boot?
- Industry-standard Java framework
- Rich ecosystem (Actuator, Security, Data)
- Production-ready features out of the box
- Easy testing and debugging

### Why Humanitec?
- Environment-agnostic deployment
- Score specification simplicity
- Resource dependency management
- Reduces K8s complexity

### Why Prometheus + Grafana?
- Cloud-native standard
- Rich query language (PromQL)
- Wide integration support
- Open-source, vendor-neutral

### Why Maven?
- Mature dependency management
- Extensive plugin ecosystem
- Declarative configuration
- Wide adoption in Java community

## Future Enhancements

1. **Database Integration**: PostgreSQL with Flyway migrations
2. **Caching**: Redis for session storage
3. **Messaging**: Kafka for event-driven architecture
4. **Tracing**: Jaeger/Zipkin for distributed tracing
5. **Service Mesh**: Istio for advanced traffic management
6. **Multi-region**: Active-active deployment

## Next Steps

- [Review deployment guide](deployment.md)
- [Configure monitoring](monitoring.md)
- [Return to overview](index.md)

---

`docs/deployment.md`

# Deployment Guide

This guide covers deploying online-boutique to Azure Kubernetes Service via Humanitec or ArgoCD.

## Deployment Methods

### 1. Humanitec Platform Orchestrator (Primary)

Humanitec manages deployments using the `score.yaml` specification, automatically provisioning resources and handling promotions across environments.

#### Prerequisites

- Humanitec Organization: `kyn-cjot`
- Application registered in Humanitec
- Environments created (development, staging, production)
- Gitea Actions configured with the `HUMANITEC_TOKEN` secret

#### Automatic Deployment (via Gitea Actions)

Push to trigger workflows:

```bash
git add .
git commit -m "feat: new feature"
git push origin main
```

**Build & Push Workflow** (`.gitea/workflows/build-push.yml`):
1. Maven build & test
2. Docker image build
3. Push to Azure Container Registry (ACR)
4. Tags: `latest`, `git-SHA`, `semantic-version`

**Deploy Workflow** (`.gitea/workflows/deploy-humanitec.yml`):
1. Parses image from build
2. Updates score.yaml with image reference
3. Deploys to Humanitec environment
4. Triggers orchestration
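
Step 2 above — swapping the new image reference into `score.yaml` — can be sketched with `sed`. The pattern, tag, and file contents are assumptions for illustration, not a copy of the workflow (and note `sed -i` takes a different form on macOS):

```shell
# Illustrative image substitution, as the deploy workflow's step 2 might do it.
cat > score.yaml <<'EOF'
containers:
  app:
    image: bstagecjotdevacr.azurecr.io/online-boutique:latest
EOF

NEW_IMAGE="bstagecjotdevacr.azurecr.io/online-boutique:3f9d2c1"
sed -i "s|image: .*|image: ${NEW_IMAGE}|" score.yaml
grep 'image:' score.yaml
```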
#### Manual Deployment with humctl CLI

Install Humanitec CLI:

```bash
# macOS
brew install humanitec/tap/humctl

# Linux/Windows
curl -s https://get.humanitec.io/install.sh | bash
```

Login:

```bash
humctl login --org kyn-cjot
```

Deploy from Score:

```bash
humctl score deploy \
  --org kyn-cjot \
  --app online-boutique \
  --env development \
  --file score.yaml \
  --image bstagecjotdevacr.azurecr.io/online-boutique:latest \
  --message "Manual deployment from local"
```

Deploy specific version:

```bash
humctl score deploy \
  --org kyn-cjot \
  --app online-boutique \
  --env production \
  --file score.yaml \
  --image bstagecjotdevacr.azurecr.io/online-boutique:v1.2.3 \
  --message "Production release v1.2.3"
```

#### Environment Promotion

Promote from development → staging:

```bash
humctl deploy \
  --org kyn-cjot \
  --app online-boutique \
  --env staging \
  --from development \
  --message "Promote to staging after testing"
```

Promote to production:

```bash
humctl deploy \
  --org kyn-cjot \
  --app online-boutique \
  --env production \
  --from staging \
  --message "Production release"
```

#### Check Deployment Status

```bash
# List deployments
humctl get deployments \
  --org kyn-cjot \
  --app online-boutique \
  --env development

# Get specific deployment
humctl get deployment <DEPLOYMENT_ID> \
  --org kyn-cjot \
  --app online-boutique \
  --env development

# View deployment logs
humctl logs \
  --org kyn-cjot \
  --app online-boutique \
  --env development
```

### 2. ArgoCD GitOps (Fallback)

If Humanitec is unavailable, use ArgoCD with the Kubernetes manifests in `deploy/`.

#### Create ArgoCD Application

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: online-boutique
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.kyndemo.live/validate/online-boutique.git
    targetRevision: main
    path: deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: <NAMESPACE>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

Apply:

```bash
kubectl apply -f argocd-app.yaml
```

#### Manual Deploy with kubectl

Update the image in `deploy/kustomization.yaml`:

```yaml
images:
  - name: app-image
    newName: bstagecjotdevacr.azurecr.io/online-boutique
    newTag: v1.2.3
```

Deploy:

```bash
kubectl apply -k deploy/
```

Verify:

```bash
kubectl -n <NAMESPACE> get pods
kubectl -n <NAMESPACE> get svc
kubectl -n <NAMESPACE> get ing
```

## Kubernetes Access

### Get AKS Credentials

```bash
az aks get-credentials \
  --resource-group bstage-cjot-dev \
  --name bstage-cjot-dev-aks \
  --overwrite-existing
```

### View Application

```bash
# List pods
kubectl -n <NAMESPACE> get pods

# Check pod logs
kubectl -n <NAMESPACE> logs -f deployment/online-boutique

# Describe deployment
kubectl -n <NAMESPACE> describe deployment online-boutique

# Port-forward for local access
kubectl -n <NAMESPACE> port-forward svc/online-boutique-service 8080:80
```

### Check Health

```bash
# Health endpoint
kubectl -n <NAMESPACE> exec -it deployment/online-boutique -- \
  curl http://localhost:8080/actuator/health

# Metrics endpoint
kubectl -n <NAMESPACE> exec -it deployment/online-boutique -- \
  curl http://localhost:8080/actuator/prometheus
```

## Environment Configuration

### Development

- **Purpose**: Active development, frequent deployments
- **Image Tag**: `latest` or `git-SHA`
- **Replicas**: 1
- **Resources**: Minimal (requests: 256Mi RAM, 250m CPU)
- **Monitoring**: Prometheus scraping enabled

### Staging

- **Purpose**: Pre-production testing, integration tests
- **Image Tag**: Semantic version (e.g., `v1.2.3-rc.1`)
- **Replicas**: 2
- **Resources**: Production-like (requests: 512Mi RAM, 500m CPU)
- **Monitoring**: Full observability stack

### Production

- **Purpose**: Live traffic, stable releases
- **Image Tag**: Semantic version (e.g., `v1.2.3`)
- **Replicas**: 3+ (autoscaling)
- **Resources**: Right-sized (requests: 1Gi RAM, 1 CPU)
- **Monitoring**: Alerts enabled, SLO tracking

## Rollback Procedures

### Humanitec Rollback

```bash
# List previous deployments
humctl get deployments \
  --org kyn-cjot \
  --app online-boutique \
  --env production

# Rollback to specific deployment
humctl deploy \
  --org kyn-cjot \
  --app online-boutique \
  --env production \
  --deployment-id <PREVIOUS_DEPLOYMENT_ID> \
  --message "Rollback due to issue"
```

### Kubernetes Rollback

```bash
# Rollback to previous revision
kubectl -n <NAMESPACE> rollout undo deployment/online-boutique

# Rollback to specific revision
kubectl -n <NAMESPACE> rollout undo deployment/online-boutique --to-revision=2

# Check rollout status
kubectl -n <NAMESPACE> rollout status deployment/online-boutique

# View rollout history
kubectl -n <NAMESPACE> rollout history deployment/online-boutique
```

## Troubleshooting

### Pod Not Starting

```bash
# Check pod events
kubectl -n <NAMESPACE> describe pod <POD_NAME>

# Check logs
kubectl -n <NAMESPACE> logs <POD_NAME>

# Check previous container logs (if restarting)
kubectl -n <NAMESPACE> logs <POD_NAME> --previous
```

### Image Pull Errors

```bash
# Verify ACR access
az acr login --name bstagecjotdevacr

# Check image exists
az acr repository show-tags --name bstagecjotdevacr --repository online-boutique

# Verify AKS ACR integration
az aks check-acr \
  --resource-group bstage-cjot-dev \
  --name bstage-cjot-dev-aks \
  --acr bstagecjotdevacr.azurecr.io
```

### Service Not Accessible

```bash
# Check service endpoints
kubectl -n <NAMESPACE> get endpoints online-boutique-service

# Check ingress
kubectl -n <NAMESPACE> describe ingress online-boutique-ingress

# Test internal connectivity
kubectl -n <NAMESPACE> run curl-test --image=curlimages/curl:latest --rm -it --restart=Never -- \
  curl http://online-boutique-service/actuator/health
```

### Humanitec Deployment Stuck

```bash
# Check deployment status
humctl get deployment <DEPLOYMENT_ID> \
  --org kyn-cjot \
  --app online-boutique \
  --env development

# View error logs
humctl logs \
  --org kyn-cjot \
  --app online-boutique \
  --env development \
  --deployment-id <DEPLOYMENT_ID>

# Cancel stuck deployment
humctl delete deployment <DEPLOYMENT_ID> \
  --org kyn-cjot \
  --app online-boutique \
  --env development
```

### Resource Issues

```bash
# Check resource usage
kubectl -n <NAMESPACE> top pods

# Describe pod for resource constraints
kubectl -n <NAMESPACE> describe pod <POD_NAME> | grep -A 10 "Conditions:"

# Check node capacity
kubectl describe nodes | grep -A 10 "Allocated resources:"
```

## Blue-Green Deployments

For zero-downtime deployments with Humanitec:

1. Deploy new version to staging
2. Run smoke tests
3. Promote to production with traffic splitting
4. Monitor metrics
5. Complete cutover or rollback
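
If the traffic splitting in step 3 is done at the ingress layer, nginx-ingress canary annotations are one option. A sketch — whether this setup uses nginx canaries at all is an assumption:

```yaml
# Canary ingress sending 10% of traffic to the new version (illustrative)
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
```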
## Next Steps

- [Configure monitoring](monitoring.md)
- [Review architecture](architecture.md)
- [Return to overview](index.md)

---

`docs/index.md`

# online-boutique

## Overview

**online-boutique** is a production-ready Java microservice built using the Kyndryl Platform Engineering Golden Path.

!!! info "Service Information"
    - **Description**: Java microservice via Golden Path
    - **Environment**: development
    - **Technology**: Spring Boot 3.2, Java 17
    - **Orchestration**: Humanitec
    - **Observability**: Prometheus + Grafana

## Quick Links

- [Repository](https://gitea.kyndemo.live/validate/online-boutique)
- [Humanitec Console](https://app.humanitec.io/orgs/kyn-cjot/apps/online-boutique)
- [Grafana Dashboard](https://grafana.kyndemo.live/d/spring-boot-dashboard?var-app=online-boutique)

## Features

✅ **Production-Ready Configuration**
- Health checks (liveness, readiness, startup)
- Graceful shutdown
- Resource limits and requests
- Security contexts

✅ **Observability**
- Prometheus metrics integration
- Pre-configured Grafana dashboards
- Structured logging
- Request tracing

✅ **CI/CD**
- Automated builds via Gitea Actions
- Azure Container Registry integration
- Humanitec deployment automation
- GitOps fallback with ArgoCD

✅ **Developer Experience**
- Local development support
- Hot reload with Spring DevTools
- Comprehensive tests
- API documentation

## Architecture

This service follows the golden path architecture:

```
┌─────────────────────────────────────────┐
│          Developer Experience           │
│    (Backstage Template → Gitea Repo)    │
└─────────────────────────────────────────┘
                    │
                    │ git push
                    ▼
┌─────────────────────────────────────────┐
│           Gitea Actions CI/CD           │
│   1. Build with Maven                   │
│   2. Run tests                          │
│   3. Build Docker image                 │
│   4. Push to ACR                        │
│   5. Deploy via Humanitec               │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│         Humanitec Orchestrator          │
│   - Interprets score.yaml               │
│   - Provisions resources                │
│   - Deploys to AKS                      │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│            Azure AKS Cluster            │
│   - Pods with app containers            │
│   - Prometheus scraping metrics         │
│   - Service mesh (optional)             │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│          Grafana + Prometheus           │
│   - Real-time metrics                   │
│   - Dashboards                          │
│   - Alerting                            │
└─────────────────────────────────────────┘
```

## API Endpoints

### Application Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Welcome message |
| `/api/status` | GET | Service health status |

### Actuator Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/actuator/health` | GET | Overall health |
| `/actuator/health/liveness` | GET | Liveness probe |
| `/actuator/health/readiness` | GET | Readiness probe |
| `/actuator/metrics` | GET | Available metrics |
| `/actuator/prometheus` | GET | Prometheus metrics |
| `/actuator/info` | GET | Application info |

## Technology Stack

- **Language**: Java 17
- **Framework**: Spring Boot 3.2.0
- **Build Tool**: Maven 3.9
- **Metrics**: Micrometer + Prometheus
- **Container**: Docker (Alpine-based)
- **Orchestration**: Humanitec (Score)
- **CI/CD**: Gitea Actions
- **Registry**: Azure Container Registry
- **Kubernetes**: Azure AKS
- **Monitoring**: Prometheus + Grafana

## Next Steps

- [Set up local development environment](local-development.md)
- [Learn about deployment process](deployment.md)
- [Configure monitoring and alerts](monitoring.md)
- [Understand the architecture](architecture.md)

---

`docs/local-development.md`

# Local Development

This guide covers setting up and running online-boutique on your local machine.

## Prerequisites

- **Java 17** or higher ([Download](https://adoptium.net/))
- **Maven 3.9+** (included via Maven Wrapper)
- **Docker** (optional, for container testing)
- **Git**

## Quick Start

### 1. Clone the Repository

```bash
git clone https://gitea.kyndemo.live/validate/online-boutique.git
cd online-boutique
```

### 2. Build the Application

```bash
# Using Maven Wrapper (recommended)
./mvnw clean package

# Or with system Maven
mvn clean package
```

### 3. Run the Application

```bash
# Run with Spring Boot Maven plugin
./mvnw spring-boot:run

# Or run the JAR directly
java -jar target/online-boutique-1.0.0-SNAPSHOT.jar
```

The application will start on **http://localhost:8080**.

### 4. Verify It's Running

```bash
# Check health
curl http://localhost:8080/actuator/health

# Check status
curl http://localhost:8080/api/status

# View metrics
curl http://localhost:8080/actuator/prometheus
```

## Development Workflow

### Hot Reload with Spring DevTools

For automatic restarts during development, add Spring DevTools to `pom.xml`:

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-devtools</artifactId>
    <scope>runtime</scope>
    <optional>true</optional>
</dependency>
```

Changes to Java files will trigger automatic restarts.

### Running Tests

```bash
# Run all tests
./mvnw test

# Run specific test class
./mvnw test -Dtest=GoldenPathApplicationTests

# Run tests with coverage
./mvnw test jacoco:report
```

### Active Profile

Set the active profile via environment variable:

```bash
# Development profile
export SPRING_PROFILES_ACTIVE=development
./mvnw spring-boot:run

# Or inline
SPRING_PROFILES_ACTIVE=development ./mvnw spring-boot:run
```
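
What the `development` profile actually changes lives in `application-development.yml`. An illustrative set of overrides — the values are assumptions for the sake of example, not the repo's real file:

```yaml
# application-development.yml — possible dev-only overrides (illustrative)
logging:
  level:
    com.kyndryl.goldenpath: DEBUG   # chattier logs while developing
spring:
  devtools:
    restart:
      enabled: true                 # pairs with the DevTools dependency above
```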
## Docker Development
|
||||
|
||||
### Build Image Locally
|
||||
|
||||
```bash
|
||||
docker build -t online-boutique:dev .
|
||||
```
|
||||
|
||||
### Run in Docker
|
||||
|
||||
```bash
|
||||
docker run -p 8080:8080 \
|
||||
-e SPRING_PROFILES_ACTIVE=development \
|
||||
online-boutique:dev
|
||||
```
|
||||
|
||||
### Docker Compose (if needed)
|
||||
|
||||
Create `docker-compose.yml`:
|
||||
|
||||
```yaml
|
||||
version: '3.8'
|
||||
services:
|
||||
app:
|
||||
build: .
|
||||
ports:
|
||||
- "8080:8080"
|
||||
environment:
|
||||
- SPRING_PROFILES_ACTIVE=development
|
||||
```
|
||||
|
||||
Run with:
|
||||
```bash
|
||||
docker-compose up
|
||||
```
|
||||
|
||||
## IDE Setup

### IntelliJ IDEA

1. **Import Project**: File → New → Project from Existing Sources
2. **Select Maven**: Choose Maven as build tool
3. **SDK**: Configure Java 17 SDK
4. **Run Configuration**:
   - Main class: `com.kyndryl.goldenpath.GoldenPathApplication`
   - VM options: `-Dspring.profiles.active=development`

### VS Code

1. **Install Extensions**:
   - Extension Pack for Java
   - Spring Boot Extension Pack

2. **Open Folder**: Open the project root

3. **Run/Debug**: Use Spring Boot Dashboard or F5

### Eclipse

1. **Import**: File → Import → Maven → Existing Maven Projects
2. **Update Project**: Right-click → Maven → Update Project
3. **Run**: Right-click on Application class → Run As → Java Application

## Debugging

### Enable Debug Logging

In `application-development.yml`:

```yaml
logging:
  level:
    root: DEBUG
    com.kyndryl.goldenpath: TRACE
```

### Remote Debugging

Start with the JDWP agent enabled:

```bash
./mvnw spring-boot:run -Dspring-boot.run.jvmArguments="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
```

Connect your debugger to `localhost:5005`.

## Common Development Tasks

### Adding a New Endpoint

Add a handler method to a `@RestController`:

```java
@RestController
public class HelloController {

    @GetMapping("/api/hello")
    public ResponseEntity<String> hello() {
        return ResponseEntity.ok("Hello, World!");
    }
}
```

### Adding Custom Metrics

```java
@Autowired
private MeterRegistry meterRegistry;

@GetMapping("/api/data")
public String getData() {
    // register() returns the existing counter on repeat calls,
    // so building it per request is safe
    Counter counter = Counter.builder("custom_api_calls")
        .tag("endpoint", "data")
        .register(meterRegistry);
    counter.increment();
    return "data";
}
```

### Database Integration (Future)

To add PostgreSQL:

1. Add the dependencies in `pom.xml`:
   ```xml
   <dependency>
       <groupId>org.springframework.boot</groupId>
       <artifactId>spring-boot-starter-data-jpa</artifactId>
   </dependency>
   <dependency>
       <groupId>org.postgresql</groupId>
       <artifactId>postgresql</artifactId>
   </dependency>
   ```

2. Configure in `application.yml`:
   ```yaml
   spring:
     datasource:
       url: jdbc:postgresql://localhost:5432/mydb
       username: user
       password: pass
     jpa:
       hibernate:
         ddl-auto: update
   ```
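
For local development, a Postgres container can back this configuration. A sketch extending the `docker-compose.yml` from above — the service name, credentials, and database name are illustrative and must match your datasource settings:

```yaml
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SPRING_PROFILES_ACTIVE=development
      # Override the datasource URL to point at the db service
      - SPRING_DATASOURCE_URL=jdbc:postgresql://db:5432/mydb
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      - POSTGRES_DB=mydb
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
    ports:
      - "5432:5432"
```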

## Troubleshooting

### Port 8080 Already in Use

```bash
# Find process using port 8080
lsof -i :8080

# Kill process
kill -9 <PID>

# Or use a different port
./mvnw spring-boot:run -Dspring-boot.run.arguments=--server.port=8081
```

### Maven Build Fails

```bash
# Clean and rebuild
./mvnw clean install -U

# Skip tests temporarily
./mvnw clean package -DskipTests
```

### Tests Fail

```bash
# Run with verbose output
./mvnw test -X

# Run single test
./mvnw test -Dtest=GoldenPathApplicationTests#contextLoads
```

## Next Steps

- [Learn about deployment](deployment.md)
- [Configure monitoring](monitoring.md)
- [Review architecture](architecture.md)

395
docs/monitoring.md
Normal file
# Monitoring & Observability

This guide covers monitoring online-boutique with Prometheus and Grafana.

## Overview

The Java Golden Path includes comprehensive observability:

- **Metrics**: Prometheus metrics via Spring Boot Actuator
- **Dashboards**: Pre-configured Grafana dashboard
- **Scraping**: Automatic discovery via ServiceMonitor
- **Retention**: 15 days of metrics storage

## Metrics Endpoint

Spring Boot Actuator exposes Prometheus metrics at:

```
http://<pod-ip>:8080/actuator/prometheus
```

### Verify Metrics Locally

```bash
curl http://localhost:8080/actuator/prometheus
```

### Sample Metrics Output

```
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 5.2428800E7

# HELP http_server_requests_seconds Duration of HTTP server request handling
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/api/status",} 42.0
http_server_requests_seconds_sum{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/api/status",} 0.351234567
```

## Available Metrics

### HTTP Metrics

- `http_server_requests_seconds_count`: Total request count
- `http_server_requests_seconds_sum`: Total request duration
- **Labels**: method, status, uri, outcome, exception

### JVM Metrics

#### Memory
- `jvm_memory_used_bytes`: Current memory usage
- `jvm_memory_max_bytes`: Maximum memory available
- `jvm_memory_committed_bytes`: Committed memory
- **Areas**: heap, nonheap
- **Pools**: G1 Eden Space, G1 Old Gen, G1 Survivor Space

#### Garbage Collection
- `jvm_gc_pause_seconds_count`: GC pause count
- `jvm_gc_pause_seconds_sum`: Total GC pause time
- `jvm_gc_memory_allocated_bytes_total`: Total memory allocated
- `jvm_gc_memory_promoted_bytes_total`: Memory promoted to old gen

#### Threads
- `jvm_threads_live_threads`: Current live threads
- `jvm_threads_daemon_threads`: Current daemon threads
- `jvm_threads_peak_threads`: Peak thread count
- `jvm_threads_states_threads`: Threads by state (runnable, blocked, waiting)

#### CPU
- `process_cpu_usage`: Process CPU usage (0-1)
- `system_cpu_usage`: System CPU usage (0-1)
- `system_cpu_count`: Number of CPU cores

### Application Metrics

- `application_started_time_seconds`: Application start timestamp
- `application_ready_time_seconds`: Application ready timestamp
- `process_uptime_seconds`: Process uptime
- `process_files_open_files`: Open file descriptors

### Custom Metrics

Add custom metrics with Micrometer:

```java
@Autowired
private MeterRegistry meterRegistry;

// Counter
Counter.builder("business_operations")
    .tag("operation", "checkout")
    .register(meterRegistry)
    .increment();

// Gauge
Gauge.builder("active_users", this, obj -> obj.getActiveUsers())
    .register(meterRegistry);

// Timer
Timer.builder("api_processing_time")
    .tag("endpoint", "/api/process")
    .register(meterRegistry)
    .record(() -> {
        // Timed operation
    });
```

## Prometheus Configuration

### ServiceMonitor

Deployed automatically in `deploy/servicemonitor.yaml`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: online-boutique
  namespace:
  labels:
    app: online-boutique
    prometheus: kube-prometheus
spec:
  selector:
    matchLabels:
      app: online-boutique
  endpoints:
    - port: http
      path: /actuator/prometheus
      interval: 30s
```

### Verify Scraping

Check Prometheus targets:

1. Access Prometheus: `https://prometheus.kyndemo.live`
2. Navigate to **Status → Targets**
3. Find `online-boutique` in `monitoring/` namespace
4. Status should be **UP**

Or via kubectl:

```bash
# Port-forward Prometheus
kubectl -n monitoring port-forward svc/prometheus-operated 9090:9090

# Check targets API
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job == "online-boutique")'
```

### Query Metrics

Access the Prometheus UI and run queries:

```promql
# Request rate
rate(http_server_requests_seconds_count{job="online-boutique"}[5m])

# Average request duration
rate(http_server_requests_seconds_sum{job="online-boutique"}[5m])
/ rate(http_server_requests_seconds_count{job="online-boutique"}[5m])

# Error rate
sum(rate(http_server_requests_seconds_count{job="online-boutique",status=~"5.."}[5m]))
/ sum(rate(http_server_requests_seconds_count{job="online-boutique"}[5m]))

# Memory usage
jvm_memory_used_bytes{job="online-boutique",area="heap"}
/ jvm_memory_max_bytes{job="online-boutique",area="heap"}
```

## Grafana Dashboard

### Access Dashboard

1. Open Grafana: `https://grafana.kyndemo.live`
2. Navigate to **Dashboards → Spring Boot Application**
3. Select `online-boutique` from the dropdown

### Dashboard Panels

#### HTTP Metrics
- **Request Rate**: Requests per second by endpoint
- **Request Duration**: Average, 95th, 99th percentile latency
- **Status Codes**: Breakdown of 2xx, 4xx, 5xx responses
- **Error Rate**: Percentage of failed requests

#### JVM Metrics
- **Heap Memory**: Used vs. max heap memory over time
- **Non-Heap Memory**: Metaspace, code cache, compressed class space
- **Garbage Collection**: GC pause frequency and duration
- **Thread Count**: Live threads, daemon threads, peak threads

#### System Metrics
- **CPU Usage**: Process and system CPU utilization
- **File Descriptors**: Open file count
- **Uptime**: Application uptime

### Custom Dashboards

Import dashboard JSON from `/k8s/monitoring/spring-boot-dashboard.json`:

1. Grafana → Dashboards → New → Import
2. Upload `spring-boot-dashboard.json`
3. Select Prometheus data source
4. Click **Import**

## Alerting

### Prometheus Alerting Rules

Create alerting rules in Prometheus:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: online-boutique-alerts
  namespace:
  labels:
    prometheus: kube-prometheus
spec:
  groups:
    - name: online-boutique
      interval: 30s
      rules:
        # High error rate
        - alert: HighErrorRate
          expr: |
            sum(rate(http_server_requests_seconds_count{job="online-boutique",status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count{job="online-boutique"}[5m]))
            > 0.05
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High error rate on online-boutique"
            description: "Error rate is {{ $value | humanizePercentage }}"

        # High latency
        - alert: HighLatency
          expr: |
            histogram_quantile(0.95,
              sum(rate(http_server_requests_seconds_bucket{job="online-boutique"}[5m])) by (le)
            ) > 1.0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High latency on online-boutique"
            description: "95th percentile latency is {{ $value }}s"

        # High memory usage
        - alert: HighMemoryUsage
          expr: |
            jvm_memory_used_bytes{job="online-boutique",area="heap"}
            / jvm_memory_max_bytes{job="online-boutique",area="heap"}
            > 0.90
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High memory usage on online-boutique"
            description: "Heap usage is {{ $value | humanizePercentage }}"

        # Pod not ready
        - alert: PodNotReady
          expr: |
            kube_pod_status_ready{namespace="",pod=~"online-boutique-.*",condition="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "online-boutique pod not ready"
            description: "Pod {{ $labels.pod }} not ready for 5 minutes"
```

Note: the `HighLatency` rule relies on `http_server_requests_seconds_bucket`, which Spring Boot only exports when percentile histograms are enabled (`management.metrics.distribution.percentiles-histogram.http.server.requests: true`).

Apply:

```bash
kubectl apply -f prometheus-rules.yaml
```

### Grafana Alerts

Configure alerts in Grafana dashboard panels:

1. Edit panel
2. Click **Alert** tab
3. Set conditions (e.g., "when avg() of query(A) is above 0.8")
4. Configure notification channels (Slack, email, PagerDuty)

### Alert Testing

Trigger test alerts:

```bash
# Generate 404s (note: the HighErrorRate alert counts 5xx responses)
for i in {1..100}; do
  curl http://localhost:8080/api/nonexistent
done

# Trigger high latency
ab -n 10000 -c 100 http://localhost:8080/api/status

# Exercise memory via the heap dump endpoint (it is a GET endpoint)
curl http://localhost:8080/actuator/heapdump --output /dev/null
```

## Distributed Tracing (Future)

To add tracing with Jaeger/Zipkin:

1. Add the dependencies:
   ```xml
   <dependency>
       <groupId>io.micrometer</groupId>
       <artifactId>micrometer-tracing-bridge-otel</artifactId>
   </dependency>
   <dependency>
       <groupId>io.opentelemetry</groupId>
       <artifactId>opentelemetry-exporter-zipkin</artifactId>
   </dependency>
   ```

2. Configure in `application.yml`:
   ```yaml
   management:
     tracing:
       sampling:
         probability: 1.0
     zipkin:
       tracing:
         endpoint: http://zipkin:9411/api/v2/spans
   ```

## Log Aggregation

For centralized logging:

1. **Loki**: Add Promtail to collect pod logs
2. **Grafana Logs**: Query logs alongside metrics
3. **Log Correlation**: Link traces to logs via trace ID
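
With Micrometer Tracing on the classpath, trace and span IDs are placed in the logging MDC. A sketch of a log pattern that surfaces them for correlation — the property is standard Spring Boot, the exact format is illustrative:

```yaml
logging:
  pattern:
    # Prefix each log line's level with app name, traceId, spanId
    level: "%5p [${spring.application.name:-},%X{traceId:-},%X{spanId:-}]"
```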

## Best Practices

1. **Metric Cardinality**: Avoid high-cardinality labels (user IDs, timestamps)
2. **Naming**: Follow Prometheus naming conventions (`_total`, `_seconds`, `_bytes`)
3. **Aggregation**: Use recording rules for expensive queries
4. **Retention**: Adjust retention period based on storage capacity
5. **Dashboarding**: Create business-specific dashboards for stakeholders
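
Point 3 can be sketched as a `PrometheusRule` with a `record:` entry that precomputes the error-rate expression used earlier — the rule and recorded metric names here are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: online-boutique-recording-rules
  labels:
    prometheus: kube-prometheus
spec:
  groups:
    - name: online-boutique-recording
      interval: 30s
      rules:
        # Dashboards query the cheap recorded series instead of
        # re-evaluating the rate expressions on every refresh
        - record: job:http_server_requests_error_ratio:rate5m
          expr: |
            sum(rate(http_server_requests_seconds_count{job="online-boutique",status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count{job="online-boutique"}[5m]))
```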

## Troubleshooting

### Metrics Not Appearing

```bash
# Check if actuator is enabled
kubectl -n exec -it deployment/online-boutique -- \
  curl http://localhost:8080/actuator

# Check ServiceMonitor
kubectl -n get servicemonitor online-boutique -o yaml

# Check Prometheus logs
kubectl -n monitoring logs -l app.kubernetes.io/name=prometheus --tail=100 | grep online-boutique
```

### High Memory Usage

```bash
# Take heap dump (the heapdump endpoint is a GET endpoint)
kubectl -n exec -it deployment/online-boutique -- \
  curl http://localhost:8080/actuator/heapdump --output heapdump.hprof

# Analyze with Eclipse Memory Analyzer (MAT) or VisualVM
```

### Slow Queries

Enable query logging in Prometheus:

```bash
kubectl -n monitoring port-forward svc/prometheus-operated 9090:9090
# Access http://localhost:9090/graph
# Enable query stats in settings
```

## Next Steps

- [Review architecture](architecture.md)
- [Learn about deployment](deployment.md)
- [Return to overview](index.md)