Brian Tan Seng 294cbdca9d ⏺ Perfect! I've created a comprehensive Google Cloud deployment guide for ERPNext with GKE.

Here's what I've delivered:

  📚 Documentation Created

  1. 01-gke-deployment.md - Complete step-by-step GKE deployment guide
  2. 02-cloud-run-analysis.md - Detailed feasibility analysis for Cloud Run vs GKE
  3. 03-production-setup.md - Production hardening, security, monitoring, and best practices

  🗂️ Supporting Files

  Kubernetes Manifests (Ready-to-Deploy)

  - namespace.yaml - Namespace with security policies and resource quotas
  - storage.yaml - Storage classes and persistent volume claims
  - configmap.yaml - Application configuration and optimized Nginx config
  - redis.yaml - Production-ready Redis deployment
  - mariadb.yaml - Secure MariaDB with health checks
  - erpnext-backend.yaml - ERPNext backend with autoscaling and security
  - erpnext-frontend.yaml - Nginx frontend with security headers
  - erpnext-workers.yaml - Queue workers and scheduler with autoscaling
  - ingress.yaml - Ingress with SSL/TLS and security configurations
  - jobs.yaml - Site creation, backup, and migration jobs

  Automation Scripts

  - deploy.sh - Complete automated deployment script with error handling
  - backup-restore.sh - Comprehensive backup and restore functionality

  🎯 Key Features Included

  🔒 Security First

  - Private GKE clusters
  - Pod security standards
  - Network policies
  - RBAC configuration
  - Secrets management
  - Security headers and CSP

  🚀 Production Ready

  - High availability setup
  - Horizontal Pod Autoscaling
  - Pod Disruption Budgets
  - Health checks and probes
  - Rolling updates
  - Resource optimization

  📊 Monitoring & Observability

  - Prometheus and Grafana integration
  - Custom ERPNext dashboards
  - Alerting rules
  - Log aggregation
  - Performance metrics

  💾 Backup & Recovery

  - Automated daily backups
  - Point-in-time recovery
  - Cross-region storage
  - Manual backup/restore scripts

  🎛️ Easy Deployment

  - One-command deployment script
  - Environment variable configuration
  - Comprehensive error handling
  - Status monitoring

  🚀 Quick Start

  To deploy ERPNext on GKE, users can now simply:

  cd documentation/deployment-guides/gcp/scripts/
  export PROJECT_ID=your-gcp-project
  export DOMAIN=erpnext.yourdomain.com
  ./deploy.sh deploy

  The guides provide both automated and manual deployment options, allowing users to choose
  based on their expertise and requirements. The Cloud Run analysis helps decision-making
  between different deployment strategies.

  All files are production-ready with security best practices, monitoring, and operational
  procedures included. The documentation is structured to support both first-time deployments
  and ongoing operations.

2025-08-22 18:13:04 +08:00

27 KiB

Raw Permalink Blame History

ERPNext GKE Production Hardening Guide

Overview

This guide covers production-ready configurations, security hardening, monitoring, backup strategies, and operational best practices for ERPNext on GKE.

🔐 Security Hardening

1. Private GKE Cluster Setup

# Create private GKE cluster with enhanced security
gcloud container clusters create erpnext-prod \
    --zone=us-central1-a \
    --node-locations=us-central1-a,us-central1-b,us-central1-c \
    --enable-private-nodes \
    --master-ipv4-cidr-block=172.16.0.0/28 \
    --enable-ip-alias \
    --cluster-ipv4-cidr=10.1.0.0/16 \
    --services-ipv4-cidr=10.2.0.0/16 \
    --enable-network-policy \
    --enable-autoscaling \
    --min-nodes=3 \
    --max-nodes=20 \
    --machine-type=e2-standard-4 \
    --disk-type=pd-ssd \
    --disk-size=100GB \
    --enable-autorepair \
    --enable-autoupgrade \
    --maintenance-window-start=2024-01-01T03:00:00Z \
    --maintenance-window-end=2024-01-01T07:00:00Z \
    --maintenance-window-recurrence="FREQ=WEEKLY;BYDAY=SU" \
    --workload-pool=erpnext-production.svc.id.goog \
    --enable-shielded-nodes \
    --enable-image-streaming \
    --logging=SYSTEM,WORKLOAD,API_SERVER \
    --monitoring=SYSTEM,WORKLOAD,STORAGE,POD,DEPLOYMENT,STATEFULSET,DAEMONSET,HPA,CADVISOR,KUBELET

2. Network Security Policies

# Deny all traffic by default
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: erpnext
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
EOF

# Allow ERPNext frontend to backend communication
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: erpnext-frontend-to-backend
  namespace: erpnext
spec:
  podSelector:
    matchLabels:
      app: erpnext-frontend
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: erpnext-backend
    ports:
    - protocol: TCP
      port: 8000
  - to:
    - podSelector:
        matchLabels:
          app: erpnext-backend
    ports:
    - protocol: TCP
      port: 9000
EOF

# Allow backend to database and redis
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: erpnext-backend-to-services
  namespace: erpnext
spec:
  podSelector:
    matchLabels:
      app: erpnext-backend
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: mariadb
    ports:
    - protocol: TCP
      port: 3306
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  - to: []
    ports:
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53
EOF

# Allow ingress from nginx controller
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nginx-ingress
  namespace: erpnext
spec:
  podSelector:
    matchLabels:
      app: erpnext-frontend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
EOF

3. Pod Security Standards

# Apply restricted pod security standards
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: erpnext
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
EOF

# Security context for ERPNext backend
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: erpnext-backend-secure
  namespace: erpnext
spec:
  replicas: 3
  selector:
    matchLabels:
      app: erpnext-backend
  template:
    metadata:
      labels:
        app: erpnext-backend
    spec:
      serviceAccountName: erpnext-ksa
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: erpnext-backend
        image: frappe/erpnext-worker:v14
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          runAsGroup: 1000
          capabilities:
            drop:
            - ALL
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        volumeMounts:
        - name: sites-data
          mountPath: /home/frappe/frappe-bench/sites
        - name: tmp
          mountPath: /tmp
        - name: logs
          mountPath: /home/frappe/frappe-bench/logs
      volumes:
      - name: sites-data
        persistentVolumeClaim:
          claimName: erpnext-sites-pvc
      - name: tmp
        emptyDir: {}
      - name: logs
        emptyDir: {}
EOF

4. RBAC Configuration

# Create service account with minimal permissions
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: erpnext-ksa
  namespace: erpnext
  annotations:
    iam.gke.io/gcp-service-account: erpnext-gke@erpnext-production.iam.gserviceaccount.com
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: erpnext
  name: erpnext-role
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: erpnext-binding
  namespace: erpnext
subjects:
- kind: ServiceAccount
  name: erpnext-ksa
  namespace: erpnext
roleRef:
  kind: Role
  name: erpnext-role
  apiGroup: rbac.authorization.k8s.io
EOF

🛡️ Secrets Management

1. External Secrets Operator Setup

# Install External Secrets Operator
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets -n external-secrets-system --create-namespace

# Create SecretStore for GCP Secret Manager
kubectl apply -f - <<EOF
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: gcpsm-secret-store
  namespace: erpnext
spec:
  provider:
    gcpsm:
      projectId: "erpnext-production"
      auth:
        workloadIdentity:
          clusterLocation: us-central1-a
          clusterName: erpnext-prod
          serviceAccountRef:
            name: erpnext-ksa
EOF

# Create ExternalSecret for ERPNext credentials
kubectl apply -f - <<EOF
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: erpnext-external-secret
  namespace: erpnext
spec:
  refreshInterval: 15m
  secretStoreRef:
    name: gcpsm-secret-store
    kind: SecretStore
  target:
    name: erpnext-secrets
    creationPolicy: Owner
  data:
  - secretKey: admin-password
    remoteRef:
      key: erpnext-admin-password
  - secretKey: db-password
    remoteRef:
      key: erpnext-db-password
  - secretKey: api-key
    remoteRef:
      key: erpnext-api-key
  - secretKey: api-secret
    remoteRef:
      key: erpnext-api-secret
EOF

2. Encrypt Secrets at Rest

# Create KMS key for additional encryption
gcloud kms keyrings create erpnext-keyring --location=us-central1

gcloud kms keys create erpnext-key \
    --location=us-central1 \
    --keyring=erpnext-keyring \
    --purpose=encryption

# Update cluster to use application-layer secrets encryption
gcloud container clusters update erpnext-prod \
    --zone=us-central1-a \
    --database-encryption-key projects/erpnext-production/locations/us-central1/keyRings/erpnext-keyring/cryptoKeys/erpnext-key \
    --database-encryption-key-state=ENCRYPTED

📊 Monitoring and Observability

1. Install Prometheus Stack

# Add prometheus-community helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --create-namespace \
    --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
    --set prometheus.prometheusSpec.retention=30d \
    --set grafana.adminPassword=SecurePassword123! \
    --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi

2. ERPNext Monitoring ConfigMap

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: erpnext-monitoring
  namespace: erpnext
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'erpnext-backend'
      static_configs:
      - targets: ['erpnext-backend:8000']
      metrics_path: '/api/method/frappe.utils.response.get_response_length'
      scrape_interval: 30s
    - job_name: 'erpnext-queue-metrics'
      static_configs:
      - targets: ['erpnext-backend:8000']
      metrics_path: '/api/method/frappe.utils.scheduler.get_events'
      scrape_interval: 60s
EOF

3. Custom Grafana Dashboard

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: erpnext-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  erpnext-dashboard.json: |
    {
      "dashboard": {
        "id": null,
        "title": "ERPNext Production Dashboard",
        "tags": ["erpnext"],
        "style": "dark",
        "timezone": "browser",
        "panels": [
          {
            "id": 1,
            "title": "Response Time",
            "type": "graph",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job=\"erpnext-backend\"}[5m])) by (le))",
                "legendFormat": "95th percentile"
              }
            ]
          },
          {
            "id": 2,
            "title": "Request Rate",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(rate(http_requests_total{job=\"erpnext-backend\"}[5m]))",
                "legendFormat": "Requests/sec"
              }
            ]
          },
          {
            "id": 3,
            "title": "Pod CPU Usage",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"erpnext\"}[5m])) by (pod)",
                "legendFormat": "{{pod}}"
              }
            ]
          },
          {
            "id": 4,
            "title": "Pod Memory Usage",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(container_memory_working_set_bytes{namespace=\"erpnext\"}) by (pod)",
                "legendFormat": "{{pod}}"
              }
            ]
          }
        ],
        "time": {
          "from": "now-1h",
          "to": "now"
        },
        "refresh": "30s"
      }
    }
EOF

4. Alerting Rules

kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: erpnext-alerts
  namespace: monitoring
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
  - name: erpnext.rules
    rules:
    - alert: ERPNextHighResponseTime
      expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job="erpnext-backend"}[5m])) by (le)) > 2
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "ERPNext response time is high"
        description: "95th percentile response time is {{ $value }}s"
    
    - alert: ERPNextPodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total{namespace="erpnext"}[5m]) > 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "ERPNext pod is crash looping"
        description: "Pod {{ $labels.pod }} is restarting frequently"
    
    - alert: ERPNextHighCPUUsage
      expr: sum(rate(container_cpu_usage_seconds_total{namespace="erpnext"}[5m])) by (pod) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "ERPNext pod high CPU usage"
        description: "Pod {{ $labels.pod }} CPU usage is {{ $value }}"
    
    - alert: ERPNextHighMemoryUsage
      expr: sum(container_memory_working_set_bytes{namespace="erpnext"}) by (pod) / sum(container_spec_memory_limit_bytes{namespace="erpnext"}) by (pod) > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "ERPNext pod high memory usage"
        description: "Pod {{ $labels.pod }} memory usage is {{ $value }}"
    
    - alert: ERPNextDatabaseDown
      expr: up{job="mariadb"} == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "ERPNext database is down"
        description: "MariaDB database is not responding"
EOF

🔄 Backup and Disaster Recovery

1. Database Backup Strategy

# Create backup job using CronJob
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: erpnext-db-backup
  namespace: erpnext
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: erpnext-ksa
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: mysql:8.0
            command:
            - /bin/bash
            - -c
            - |
              BACKUP_FILE="erpnext_backup_\$(date +%Y%m%d_%H%M%S).sql"
              mysqldump -h mariadb -u erpnext -p\$DB_PASSWORD --single-transaction --routines --triggers erpnext > /backup/\$BACKUP_FILE
              gzip /backup/\$BACKUP_FILE
              gsutil cp /backup/\$BACKUP_FILE.gz gs://erpnext-backups/database/
              # Keep only last 30 days of backups
              find /backup -name "*.gz" -mtime +30 -delete
            env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: erpnext-secrets
                  key: db-password
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
EOF

2. Site Files Backup

# Create site files backup job
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: erpnext-files-backup
  namespace: erpnext
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: erpnext-ksa
          restartPolicy: OnFailure
          containers:
          - name: files-backup
            image: google/cloud-sdk:alpine
            command:
            - /bin/bash
            - -c
            - |
              BACKUP_DATE=\$(date +%Y%m%d_%H%M%S)
              tar -czf /tmp/sites_backup_\$BACKUP_DATE.tar.gz -C /sites .
              gsutil cp /tmp/sites_backup_\$BACKUP_DATE.tar.gz gs://erpnext-backups/sites/
              rm /tmp/sites_backup_\$BACKUP_DATE.tar.gz
            volumeMounts:
            - name: sites-data
              mountPath: /sites
              readOnly: true
          volumes:
          - name: sites-data
            persistentVolumeClaim:
              claimName: erpnext-sites-pvc
EOF

3. Backup Storage Setup

# Create backup bucket with lifecycle policy
gsutil mb gs://erpnext-backups

# Set lifecycle policy
gsutil lifecycle set - gs://erpnext-backups <<EOF
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {"age": 90}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
        "condition": {"age": 60}
      }
    ]
  }
}
EOF

# Create backup PVC
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
  namespace: erpnext
spec:
  storageClassName: standard-retain
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF

4. Disaster Recovery Plan

# Create DR restoration job template
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: erpnext-restore
  namespace: erpnext
spec:
  template:
    spec:
      serviceAccountName: erpnext-ksa
      restartPolicy: Never
      containers:
      - name: restore
        image: mysql:8.0
        command:
        - /bin/bash
        - -c
        - |
          # Download latest backup
          gsutil cp gs://erpnext-backups/database/\$BACKUP_FILE /tmp/
          gunzip /tmp/\$BACKUP_FILE
          
          # Restore database
          mysql -h mariadb -u erpnext -p\$DB_PASSWORD erpnext < /tmp/\${BACKUP_FILE%.gz}
          
          # Verify restoration
          mysql -h mariadb -u erpnext -p\$DB_PASSWORD -e "SHOW TABLES;" erpnext
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: erpnext-secrets
              key: db-password
        - name: BACKUP_FILE
          value: "erpnext_backup_20241201_020000.sql.gz"
EOF

🚀 Performance Optimization

1. Resource Optimization

# Vertical Pod Autoscaler for right-sizing
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: erpnext-backend-vpa
  namespace: erpnext
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: erpnext-backend
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: erpnext-backend
      maxAllowed:
        cpu: 2
        memory: 4Gi
      minAllowed:
        cpu: 500m
        memory: 1Gi
EOF

2. Pod Disruption Budgets

kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: erpnext-backend-pdb
  namespace: erpnext
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: erpnext-backend
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: erpnext-frontend-pdb
  namespace: erpnext
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: erpnext-frontend
EOF

3. Node Affinity and Anti-Affinity

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: erpnext-backend-ha
  namespace: erpnext
spec:
  replicas: 3
  selector:
    matchLabels:
      app: erpnext-backend
  template:
    metadata:
      labels:
        app: erpnext-backend
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - erpnext-backend
              topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-preemptible
                operator: DoesNotExist
      containers:
      - name: erpnext-backend
        image: frappe/erpnext-worker:v14
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
EOF

🔧 Operational Procedures

1. Health Checks and Probes

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: erpnext-backend-health
  namespace: erpnext
spec:
  replicas: 3
  selector:
    matchLabels:
      app: erpnext-backend
  template:
    metadata:
      labels:
        app: erpnext-backend
    spec:
      containers:
      - name: erpnext-backend
        image: frappe/erpnext-worker:v14
        ports:
        - containerPort: 8000
        livenessProbe:
          httpGet:
            path: /api/method/ping
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /api/method/ping
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        startupProbe:
          httpGet:
            path: /api/method/ping
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 30
EOF

2. Log Aggregation

# Configure Fluentd for log collection
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-gcp
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-gcp
  template:
    metadata:
      labels:
        k8s-app: fluentd-gcp
    spec:
      serviceAccountName: fluentd-gcp
      containers:
      - name: fluentd-gcp
        image: gcr.io/gke-release/fluentd-gcp:2.0.17-gke.0
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: libsystemddir
          mountPath: /host/lib
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
      nodeSelector:
        beta.kubernetes.io/os: linux
      tolerations:
      - operator: Exists
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: libsystemddir
        hostPath:
          path: /usr/lib64
      - name: config-volume
        configMap:
          name: fluentd-gcp-config
EOF

3. Update Strategy

# Rolling update configuration
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: erpnext-backend-rolling
  namespace: erpnext
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: erpnext-backend
  template:
    metadata:
      labels:
        app: erpnext-backend
        version: v14.1.0
    spec:
      containers:
      - name: erpnext-backend
        image: frappe/erpnext-worker:v14
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
EOF

🔍 Compliance and Governance

1. OPA Gatekeeper Policies

# Install Gatekeeper
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.14/deploy/gatekeeper.yaml

# Create constraint template for required labels
kubectl apply -f - <<EOF
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        type: object
        properties:
          labels:
            type: array
            items:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        
        violation[{"msg": msg}] {
          required := input.parameters.labels
          provided := input.review.object.metadata.labels
          missing := required[_]
          not provided[missing]
          msg := sprintf("Missing required label: %v", [missing])
        }
EOF

# Apply constraint
kubectl apply -f - <<EOF
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: must-have-environment
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
    namespaces: ["erpnext"]
  parameters:
    labels: ["environment", "app", "version"]
EOF

2. Resource Quotas

kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: erpnext-quota
  namespace: erpnext
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
    pods: "20"
    services: "10"
EOF

📈 Performance Testing

1. Load Testing Setup

# Create load testing job
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: erpnext-load-test
  namespace: erpnext
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: load-test
        image: grafana/k6:latest
        command:
        - k6
        - run
        - --vus=50
        - --duration=10m
        - -
        stdin: |
          import http from 'k6/http';
          import { check, sleep } from 'k6';
          
          export default function () {
            let response = http.get('https://erpnext.yourdomain.com/api/method/ping');
            check(response, {
              'status is 200': (r) => r.status === 200,
              'response time < 2s': (r) => r.timings.duration < 2000,
            });
            sleep(1);
          }
EOF

🚨 Incident Response

1. Runbook for Common Issues

# Create incident response ConfigMap
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-runbooks
  namespace: erpnext
data:
  pod-crashloop.md: |
    # Pod CrashLoop Incident Response
    
    ## Investigation Steps
    1. Check pod logs: kubectl logs <pod-name> -n erpnext
    2. Check events: kubectl describe pod <pod-name> -n erpnext
    3. Check resource usage: kubectl top pod <pod-name> -n erpnext
    
    ## Common Causes
    - Database connection issues
    - Insufficient resources
    - Configuration errors
    - Image pull failures
    
    ## Resolution Steps
    1. Scale down problematic deployment
    2. Fix underlying issue
    3. Scale back up
    4. Monitor for stability
  
  high-response-time.md: |
    # High Response Time Incident Response
    
    ## Investigation Steps
    1. Check current load: kubectl top pods -n erpnext
    2. Check HPA status: kubectl get hpa -n erpnext
    3. Check database performance
    4. Review nginx access logs
    
    ## Resolution Steps
    1. Scale up if needed: kubectl scale deployment erpnext-backend --replicas=5
    2. Check database queries
    3. Clear Redis cache if needed
    4. Review and optimize slow queries
EOF

📋 Production Checklist

Pre-Deployment Checklist

Security hardening applied
Network policies configured
RBAC properly set up
Secrets management implemented
Monitoring stack deployed
Backup procedures tested
Load testing completed
Disaster recovery plan documented
Incident response procedures ready
Documentation updated

Post-Deployment Checklist

All pods running and healthy
Ingress working correctly
SSL certificates issued
Monitoring alerts configured
Backup jobs scheduled
Log aggregation working
Performance metrics baseline established
Team trained on operational procedures

📚 Additional Resources

⚠️ Important: Regular security audits and updates are essential for maintaining a secure production environment. Schedule quarterly reviews of all security configurations and keep up with the latest security patches.

27 KiB Raw Permalink Blame History