From e4dd205be6271c22d2b80239009636ede8e775bc Mon Sep 17 00:00:00 2001
From: bluepal-prasanthi-moparthi
Date: Wed, 5 Nov 2025 16:03:45 +0530
Subject: [PATCH 1/2] add documentation for certificate-rotation

---
 .../arangodb-starter/certificate-rotation.md  | 477 ++++++++++++++++++
 1 file changed, 477 insertions(+)
 create mode 100644 site/content/3.12/components/tools/arangodb-starter/certificate-rotation.md

diff --git a/site/content/3.12/components/tools/arangodb-starter/certificate-rotation.md b/site/content/3.12/components/tools/arangodb-starter/certificate-rotation.md
new file mode 100644
index 0000000000..af48f5ff47
--- /dev/null
+++ b/site/content/3.12/components/tools/arangodb-starter/certificate-rotation.md
@@ -0,0 +1,477 @@
+---
+title: SSL/TLS Certificate Rotation
+menuTitle: SSL/TLS Certificate Rotation
+weight: 10
+description: >-
+  This document explains how to rotate SSL/TLS certificates in ArangoDB clusters managed by the ArangoDB Starter.
+---
+## Contents
+
+- [1. Quick Recommendation](#1-quick-recommendation)
+- [2. Prerequisites & Variables](#2-prerequisites--variables)
+- [3. Option Comparison](#3-option-comparison)
+- [4. Option 1: Graceful Restart](#4-option-1-graceful-restart-recommended)
+- [5. Option 2: Configuration File Update](#5-option-2-configuration-file-update)
+- [6. Option 3: Hot Reload via API](#6-option-3-hot-reload-via-api)
+- [7. Summary](#7-summary)
+
+---
+
+## 1. Quick Recommendation
+
+For most production deployments, use _Option 1_ (Graceful Restart).
+
+It provides the best balance of _simplicity, reliability, and safety_ with only _30-60 seconds_ of planned downtime.
+
+---
+
+## 2. Prerequisites & Variables
+
+Before starting any certificate rotation procedure, set these variables for your environment:
+
+```bash
+# Replace with your actual values
+export NODE="your-hostname.example.com"   # Your server hostname or IP
+export PORT="8529"                        # Coordinator port (default: 8529)
+export STARTER_PORT="8528"                # Starter API port (default: 8528)
+```
+
+**Common Default Ports:**
+- **Starter API**: 8528
+- **Coordinator**: 8529
+- **DBServer**: 8530
+- **Agent**: 8531
+
+Throughout this document, commands use the `${NODE}`, `${PORT}`, and `${STARTER_PORT}` placeholders. Replace them with your actual values or set the variables as shown above.
+
+---
+
+## 3. Option Comparison
+
+| Option                | Downtime | Complexity | Reliability           | Best For                |
+|-----------------------|----------|------------|-----------------------|-------------------------|
+| 1. Graceful Restart   | 30-60s   | Low        | High                  | **Most production use** |
+| 2. Config File Update | 30-60s   | High       | Error-prone           | Path changes only       |
+| 3. Hot Reload API     | None     | Medium     | Requires verification | Zero-downtime SLA       |
+
+## 4. Option 1: Graceful Restart (Recommended)
+
+### Overview
+
+This option is the recommended approach for most certificate rotation scenarios.
+
+**Benefits:**
+- **Simple**: Only 6 straightforward steps
+- **Reliable**: Consistent results when the steps are followed in order
+- **Safe**: No manual configuration file editing
+- **Quick**: 30-60 seconds of planned downtime
+- **Clean**: Fresh configuration state
+- **Easy Rollback**: Restore the backup file and restart
+
+### Procedure
+
+#### Step 1: Prepare New Certificate
+
+```bash
+# Generate or obtain your new certificate
+openssl req -x509 -newkey rsa:4096 \
+  -keyout /tmp/new-key.pem \
+  -out /tmp/new-cert.pem \
+  -days 365 -nodes \
+  -subj "/CN=your-hostname/O=YourOrganization"
+
+# Combine into ArangoDB keyfile format (certificate + private key)
+cat /tmp/new-cert.pem /tmp/new-key.pem > /tmp/new-server.keyfile
+chmod 600 /tmp/new-server.keyfile
+
+# Verify the certificate
+openssl x509 -in /tmp/new-server.keyfile -noout -subject -dates
+
+# Back up the current certificate
+cp /path/to/current/server.keyfile \
+  /path/to/current/server.keyfile.backup-$(date +%Y%m%d)
+```
+
+#### Step 2: Replace Certificate at Same Path
+
+```bash
+# Replace the certificate file
+# Note: the cluster is still running with the old certificate in memory
+cp /tmp/new-server.keyfile /path/to/current/server.keyfile
+
+# Verify replacement
+openssl x509 -in /path/to/current/server.keyfile -noout -subject
+```
+
+#### Step 3: Graceful Cluster Shutdown
+
+```bash
+# Shut down each node gracefully
+# Adjust NODE and STARTER_PORT for your environment
+curl -k -X POST "https://${NODE}:${STARTER_PORT}/shutdown"
+
+# Wait for clean shutdown
+sleep 15
+
+# Verify that all processes have stopped
+ps aux | grep arangod | grep -v grep  # Should be empty
+```
+
+#### Step 4: Delete setup.json Files
+
+**This is the key step**: it forces the Starter to use the command-line options:
+
+```bash
+# Delete setup.json from all data directories
+rm -f /path/to/data-dir/setup.json
+
+# Verify deletion
+ls -la /path/to/data-dir/setup.json  # Should show "No such file"
+```
+
+**Why delete setup.json?**
+- Forces fresh configuration with new certificate
+- Eliminates cached state conflicts
+- Ensures command-line options take precedence
+
+#### Step 5: Restart Cluster
+
+Restart using the **exact same commands** as the original startup:
+
+```bash
+export STARTER=/usr/local/bin/arangodb
+
+$STARTER \
+  --ssl.keyfile=/path/to/current/server.keyfile \
+  --starter.data-dir=/path/to/data-dir \
+  --starter.port=${STARTER_PORT} \
+  --log.console=true
+```
+
+Wait for startup completion (~30 seconds):
+```
+Your cluster can now be accessed with a browser at `https://hostname:8529`
+```
+
+#### Step 6: Verify New Certificate
+
+```bash
+# Check each server type (adjust NODE and PORT for your environment)
+echo | openssl s_client -connect "${NODE}:${PORT}" 2>/dev/null | \
+  openssl x509 -noout -subject -dates
+
+# Verify cluster health
+curl -k -u root: "https://${NODE}:8529/_admin/cluster/health"
+# Should return JSON with "Status": "GOOD"
+```
+
+**Default Ports**: Coordinator: 8529, DBServer: 8530, Agent: 8531
+
+---
+
+### Production Automation Script
+
+```bash
+#!/bin/bash
+# rotate-certificate.sh - Automated certificate rotation
+# Usage: ./rotate-certificate.sh /path/to/new/certificate.keyfile
+
+set -e
+
+# Configuration - Adjust for your environment
+NODES=("node1.example.com" "node2.example.com" "node3.example.com")
+STARTER_PORT="8528"
+CERT_PATH="/path/to/production/server.keyfile"
+DATA_DIRS=("/path/to/data-node1" "/path/to/data-node2" "/path/to/data-node3")
+
+NEW_CERT="${1:?Usage: $0 /path/to/new/certificate.keyfile}"
+
+echo "=== ArangoDB Certificate Rotation ==="
+
+# Back up the current certificate
+BACKUP="${CERT_PATH}.backup-$(date +%Y%m%d-%H%M%S)"
+cp "$CERT_PATH" "$BACKUP"
+echo "Backed up to: $BACKUP"
+
+# Replace certificate
+cp "$NEW_CERT" "$CERT_PATH"
+chmod 600 "$CERT_PATH"
+echo "Certificate replaced"
+
+# Shut down all nodes
+echo "Shutting down cluster..."
+for NODE in "${NODES[@]}"; do
+  curl -k -X POST "https://${NODE}:${STARTER_PORT}/shutdown" || true
+done
+sleep 15
+
+# Delete setup.json for all nodes (the data directories must be reachable from this host)
+for DIR in "${DATA_DIRS[@]}"; do
+  rm -f "${DIR}/setup.json"
+done
+echo "setup.json files deleted"
+
+echo "Ready to restart cluster. Press Enter after restart..."
+read -r
+
+# Verify
+echo "Verifying new certificate..."
+for NODE in "${NODES[@]}"; do
+  echo "$NODE:"
+  echo | openssl s_client -connect "${NODE}:8529" 2>/dev/null | \
+    openssl x509 -noout -subject
+done
+
+echo "Certificate rotation complete"
+```
+
+---
+
+## 5. Option 2: Configuration File Update
+
+### Overview
+
+{{< warning >}}
+**Not recommended.** Use only for path changes.
+{{< /warning >}}
+
+This option requires manually editing multiple configuration files and is error-prone. Use it only when the certificate path must change.
+
+### Procedure
+
+#### Step 1: Create New Certificate at Different Path
+
+```bash
+mkdir -p /new/certificate/path
+
+openssl req -x509 -newkey rsa:4096 \
+  -keyout /new/certificate/path/key.pem \
+  -out /new/certificate/path/cert.pem \
+  -days 730 -nodes \
+  -subj "/CN=your-hostname/O=YourOrganization"
+
+cat /new/certificate/path/cert.pem \
+  /new/certificate/path/key.pem \
+  > /new/certificate/path/server.keyfile
+
+chmod 600 /new/certificate/path/server.keyfile
+```
+
+#### Step 2: Update setup.json Files
+
+{{< warning >}}
+The cluster is still running during this step, so be careful with edits.
+{{< /warning >}}
+
+```python
+#!/usr/bin/env python3
+# update-setup-json.py - usage: update-setup-json.py <setup.json> <new-keyfile-path>
+import json, sys, shutil
+
+setup_file, new_cert_path = sys.argv[1], sys.argv[2]
+
+with open(setup_file, 'r') as f:
+    config = json.load(f)
+
+shutil.copy(setup_file, setup_file + '.backup')  # keep a backup before overwriting
+config['ssl-keyfile'] = new_cert_path
+
+with open(setup_file, 'w') as f:
+    json.dump(config, f, indent=2)
+
+print(f"Updated {setup_file}")
+```
+
+```bash
+# Run for each node's setup.json
+python3 update-setup-json.py /path/to/data-dir/setup.json /new/certificate/path/server.keyfile
+
+# Verify
+grep "ssl-keyfile" /path/to/data-dir/setup.json
+```
+
+#### Step 3: Update arangod.conf Files
+
+{{< info >}}
+Config files use `keyfile = ...` under the `[ssl]` section.
+{{< /info >}}
+
+```bash
+#!/bin/bash
+# update-arangod-conf.sh - usage: update-arangod-conf.sh <arangod.conf> <new-keyfile-path>
+CONF_FILE="$1"
+NEW_PATH="$2"
+
+[ ! -f "$CONF_FILE" ] && echo "Skip: $CONF_FILE" && exit 0
+
+cp "$CONF_FILE" "${CONF_FILE}.backup"
+sed -i "s|^keyfile.*|keyfile = ${NEW_PATH}|g" "$CONF_FILE"
+echo "Updated: $CONF_FILE"
+```
+
+```bash
+# Update all server instances (adjust paths; each glob must match exactly one file)
+./update-arangod-conf.sh /path/to/data-dir/coordinator*/arangod.conf /new/certificate/path/server.keyfile
+./update-arangod-conf.sh /path/to/data-dir/dbserver*/arangod.conf /new/certificate/path/server.keyfile
+./update-arangod-conf.sh /path/to/data-dir/agent*/arangod.conf /new/certificate/path/server.keyfile
+```
+
+{{< info >}}
+Directory names include port numbers (e.g., `coordinator8529`).
+{{< /info >}}
+
+#### Step 4: Restart Cluster
+
+```bash
+# Shutdown
+curl -k -X POST "https://${NODE}:${STARTER_PORT}/shutdown"
+sleep 15
+
+# Restart with original commands (starter uses paths from config files)
+$STARTER --ssl.keyfile=/old/path/cert.keyfile ...
+```
+
+#### Step 5: Verify New Certificate
+
+```bash
+echo | openssl s_client -connect "${NODE}:8529" 2>/dev/null | \
+  openssl x509 -noout -subject -dates
+```
+
+---
+
+## 6. Option 3: Hot Reload via API
+
+### Overview
+
+This option enables certificate rotation with zero downtime by reloading certificates without restarting the cluster.
+
+**When to Use:**
+- Zero downtime is absolutely critical (strict SLA)
+- You cannot schedule a maintenance window
+- You have monitoring in place to verify that the reload succeeded
+
+**Important:** For most deployments, the 30-60 second downtime of Option 1 is preferable due to simplicity and reliability.
+
+### How It Works
+
+The `/_admin/server/tls` API endpoint reloads the SSL certificate from disk without restarting. However:
+
+- New TLS connections immediately use the new certificate
+- Existing TLS connections keep using the old certificate until they close
+
+### Procedure
+
+#### Step 1: Prepare and Replace Certificate
+
+```bash
+# Generate new certificate
+openssl req -x509 -newkey rsa:4096 \
+  -keyout /tmp/new-key.pem \
+  -out /tmp/new-cert.pem \
+  -days 365 -nodes \
+  -subj "/CN=your-hostname/O=YourOrganization"
+
+cat /tmp/new-cert.pem /tmp/new-key.pem > /tmp/new-server.keyfile
+chmod 600 /tmp/new-server.keyfile
+
+# Back up and replace (cluster stays running)
+cp /path/to/current/server.keyfile \
+  /path/to/current/server.keyfile.backup-$(date +%Y%m%d)
+
+cp /tmp/new-server.keyfile /path/to/current/server.keyfile
+```
+
+{{< warning >}}
+The certificate path must stay the same. Path changes are not supported for hot reload.
+{{< /warning >}}
+
+#### Step 2: Trigger Hot Reload
+
+```bash
+# Reload all server types on each node
+# Adjust NODE and the ports for your environment
+curl -k -u root: -X POST "https://${NODE}:8529/_admin/server/tls"  # Coordinator
+curl -k -u root: -X POST "https://${NODE}:8530/_admin/server/tls"  # DBServer
+curl -k -u root: -X POST "https://${NODE}:8531/_admin/server/tls"  # Agent
+
+# Expected response: {"error":false,"code":200}
+sleep 5
+```
+
+#### Step 3: Verify with Fresh Connections
+
+{{< info >}}
+You must force new TLS connections to verify that the new certificate is active.
+{{< /info >}}
+
+```bash
+# Force a fresh connection to see the new certificate
+timeout 2 openssl s_client -connect localhost:8530 </dev/null 2>/dev/null | \
+  openssl x509 -noout -subject -dates
+```
+
+#### Step 4: Verify Cluster Health
+
+```bash
+curl -k -u root: "https://${NODE}:8529/_admin/cluster/health"
+```
+
+#### Step 5: Fallback if Needed
+
+If verification fails:
+
+```bash
+# Fall back to Option 1
+curl -k -X POST "https://${NODE}:${STARTER_PORT}/shutdown"
+sleep 15
+rm -f /path/to/data-dir/setup.json
+# Restart cluster
+```
+
+### Understanding Connection Caching
+
+After calling `/_admin/server/tls`:
+
+1. The server immediately reloads the certificate from disk
+2. New TLS connections use the new certificate
+3. Existing TLS connections continue with the old certificate
+
+This is standard TLS behavior. Applications will gradually reconnect and pick up the new certificate.
+
+---
+
+## 7. Summary
+
+### For Most Deployments: Use Option 1
+
+**Graceful Restart** is recommended for:
+- Regular certificate renewals
+- Emergency rotations (expiring cert)
+- Production updates
+
+**Why:**
+- Simple and reliable (6 steps)
+- Minimal downtime (30-60 seconds)
+- No manual config editing
+- Easy rollback
+
+### For Special Cases
+
+**Use Option 3 (Hot Reload)** only when:
+- Zero downtime SLA absolutely requires it
+- You have verification and monitoring
+- You understand connection caching
+
+**Use Option 2 (Config Update)** only when:
+- Certificate path must change
+- Replacing the file at the same path (Option 1) is not possible
+
+### Key Points
+
+1. **Option 1 is recommended** for the vast majority of certificate rotations
+2. Brief downtime (30-60s) is acceptable and worth the simplicity
+3. Hot reload works but adds complexity
+4. Manual config editing should be avoided when possible
+5. Always back up certificates before rotation
+6. Always verify after rotation
From d61ccbf453f16e1074f1fa58ac1de725b911b863 Mon Sep 17 00:00:00 2001
From: bluepal-prasanthi-moparthi
Date: Thu, 6 Nov 2025 19:18:54 +0530
Subject: [PATCH 2/2] removed space at the end

---
 .../components/tools/arangodb-starter/certificate-rotation.md   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/content/3.12/components/tools/arangodb-starter/certificate-rotation.md b/site/content/3.12/components/tools/arangodb-starter/certificate-rotation.md
index af48f5ff47..93ab843302 100644
--- a/site/content/3.12/components/tools/arangodb-starter/certificate-rotation.md
+++ b/site/content/3.12/components/tools/arangodb-starter/certificate-rotation.md
@@ -474,4 +474,4 @@ This is standard TLS behavior. Applications will gradually reconnect and pick up
 3. Hot reload works but adds complexity
 4. Manual config editing should be avoided when possible
 5. Always back up certificates before rotation
-6. Always verify after rotation
+6. Always verify after rotation
\ No newline at end of file