Quickstart Guide¶
[!NOTE] Who is this guide for? This is a complete, self-sufficient guide to setting up and using Zelyo Operator — from a blank laptop to running AI-powered security scans. No prior Kubernetes operator experience required.
What Is Zelyo Operator?¶
Zelyo Operator is an Autonomous AI Security Agent for Kubernetes. It watches your cluster in real time, detects security misconfigurations, and can automatically create pull requests to fix them — powered by your choice of LLM (Claude, GPT-4, Ollama, etc.).
The three-phase loop:
- Observe: Scans pods for 8 categories of security issues
- Reason: Uses an LLM to explain findings and recommend fixes
- Act: Creates GitHub PRs to remediate violations (optional)
Prerequisites¶
Before starting, install these tools:
| Tool | Version | Install Guide |
|---|---|---|
| Docker | Latest | Required by k3d |
| k3d | Latest | brew install k3d or k3d.io |
| kubectl | Latest | Comes with Docker Desktop or install standalone |
| Helm | 3.x | brew install helm or helm.sh |
[!TIP] Verify your tools are ready
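A quick way to check: this loop prints found/MISSING for each required tool (it only confirms the binary is on your PATH, not its version):

```shell
# Check that each required CLI is installed and on PATH
for tool in docker k3d kubectl helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING - install it before continuing"
  fi
done
```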
Part 1 — Environment Setup¶
Step 0: Clean the Slate¶
Start fresh to avoid port conflicts with any previous cluster:
# Delete any existing zelyo cluster (safe to run even if it doesn't exist)
k3d cluster delete zelyo
# Optional: remove unused Docker networks
docker network prune -f
Step 1: Create a Fresh Local Cluster¶
This creates a single-node Kubernetes cluster running inside Docker. It takes about 30 seconds.
k3d cluster create zelyo
[!NOTE] What's happening here? k3d runs Kubernetes inside Docker containers — much faster than spinning up real VMs. Your kubectl context is automatically switched to k3d-zelyo.
Verify the cluster is running:
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# k3d-zelyo-server-0 Ready control-plane,master 30s v1.31.x
Step 1.5: Prepare Local Developer Build¶
[!TIP] Recommended for Demos Features like Slack notifications and the enhanced GitOps logic are still being finalized in this dev build, so you should build and deploy the operator image locally.
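A typical build-and-import flow looks like this — the Dockerfile location is an assumption (adjust to the repo's layout), and the image name and tag must match the `image.repository` and `image.tag` Helm values used in Step 3:

```shell
# Build the operator image locally and tag it 'local'
docker build -t zelyo-operator:local .

# Import the image into the k3d cluster so pods can use it
# without pulling from a registry
k3d image import zelyo-operator:local -c zelyo
```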
Step 2: Install cert-manager¶
Zelyo Operator uses admission webhooks to validate SecurityPolicy resources before they're stored. Webhooks must communicate over HTTPS, which requires TLS certificates. We use cert-manager to automate certificate provisioning.
# Install cert-manager from the official OCI chart
helm install cert-manager oci://quay.io/jetstack/charts/cert-manager \
--version v1.20.0 \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true
# Wait until all 3 pods are Running (~60s)
kubectl wait --for=condition=Ready pods --all -n cert-manager --timeout=120s
Verify:
kubectl get pods -n cert-manager
# NAME READY STATUS RESTARTS AGE
# cert-manager-xxx 1/1 Running 0 60s
# cert-manager-cainjector-xxx 1/1 Running 0 60s
# cert-manager-webhook-xxx 1/1 Running 0 60s
Step 3: Install Zelyo Operator¶
# Install the Zelyo Operator
# Note: we are telling it to use our 'local' image we just imported
helm install zelyo-operator oci://ghcr.io/zelyo-ai/charts/zelyo-operator \
--namespace zelyo-system \
--create-namespace \
--set image.repository=zelyo-operator \
--set image.tag=local \
--set image.pullPolicy=IfNotPresent \
--set config.llm.provider=openrouter \
--set config.llm.model=google/gemini-2.0-flash-001 \
--set webhook.certManager.enabled=true
[!TIP] Verify the operator is running
kubectl get pods -n zelyo-system
# NAME                              READY   STATUS    RESTARTS   AGE
# zelyo-operator-669577fb4b-7kpg2   1/1     Running   0          30s
[!WARNING] Webhook Error? If you see failed calling webhook "msecuritypolicy.zelyo.ai" when applying resources, run the Webhook Patch commands. This is a known path mismatch in OCI chart v0.0.1.
Step 4: Add Your LLM API Key¶
Now that the zelyo-system namespace exists, create the secret the operator will use to call the LLM.
Zelyo Operator needs an LLM to "Reason" about security findings. You provide your own API key — Zelyo never stores it centrally.
Getting an OpenRouter Key (Recommended)¶
OpenRouter is a gateway that gives you access to Claude, GPT-4, Nvidia, and 100+ other models from one API key with pay-per-use pricing.
- Go to openrouter.ai and create a free account
- Navigate to Keys → Create Key
- Copy your key (starts with sk-or-v1-...)
- Add credit at Credits (minimum $1) — models like Claude Haiku cost fractions of a cent per scan
[!TIP] Model Recommendations
| Tier | Model | Cost |
|---|---|---|
| Free tier | nvidia/nemotron-3-super-120b-a12b:free | No cost, great for initial testing |
| Best for testing | anthropic/claude-haiku | Fast and cheap (~$0.001 per scan) |
| Best for production | anthropic/claude-sonnet-4-20250514 | Highest reasoning quality |
Other Supported Providers¶
| Provider | Where to Get a Key | Config Value |
|---|---|---|
| OpenRouter | openrouter.ai/keys | openrouter |
| OpenAI | platform.openai.com/api-keys | openai |
| Anthropic | console.anthropic.com | anthropic |
| Ollama (local, free) | ollama.ai — run locally | ollama |
Create the Kubernetes Secret¶
kubectl create secret generic zelyo-llm \
--namespace zelyo-system \
--from-literal=api-key=<YOUR_API_KEY>
[!CAUTION] Never commit API keys to git. Use kubectl create secret or a secrets manager — never paste them into YAML files.
Step 5: Activate the AI Agent¶
The last piece of the setup is the ZelyoConfig. This is a cluster-wide resource that tells the operator which LLM model to use and connects it to the secret you just created.
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: ZelyoConfig
metadata:
name: default
spec:
mode: protect # Use 'protect' to allow automatic PR creation
llm:
provider: openrouter
model: "google/gemini-2.0-flash-001" # Highly recommended for reliability
apiKeySecret: zelyo-llm
EOF
[!NOTE] What's happening here? Once applied, the operator reconciles this config, initializes a secure LLM client using your secret, and injects it into the remediation engine. You'll see ZelyoConfig reconciled successfully in the logs.
Part 2 — The 8 Security Scanners¶
Zelyo Operator ships with 8 built-in scanners. Every scanner runs automatically when triggered by a SecurityPolicy or ClusterScan. Understanding what each one checks helps you write better policies.
How Scanning Works¶
- You create a SecurityPolicy with rules (e.g., type: container-security-context)
- The operator finds all pods matching your spec.match criteria
- Each scanner runs and produces findings
- Findings below your spec.severity threshold are filtered out
- Results are stored in .status and emitted as Kubernetes Events
Scanner Reference¶
Rule type: container-security-context
Checks that containers follow security best practices for their securityContext.
| Check | Severity | What It Means |
|---|---|---|
| No security context set | High | No restrictions at all |
| privileged: true | Critical | Full access to the host kernel |
| runAsNonRoot not set | High | Container might run as root |
| readOnlyRootFilesystem not set | Medium | Filesystem is writable (aids attackers) |
| allowPrivilegeEscalation not false | Medium | Child processes can gain more privileges |
Rule type: resource-limits
Checks that every container has CPU and memory requests/limits. Without them, one pod can starve the whole node.
| Check | Severity |
|---|---|
| No CPU request | Medium |
| No CPU limit | Medium |
| No memory request | Medium |
| No memory limit | Medium |
Rule type: image-vulnerability
Checks that images are pinned — not floating on :latest or mutable tags.
| Check | Severity | Why It Matters |
|---|---|---|
| Uses :latest tag | High | Image can change without notice |
| No tag (defaults to latest) | High | Same risk |
| Not pinned by digest | Medium | Even versioned tags can be overwritten |
Rule type: pod-security
Checks for Pod Security Standards violations.
| Check | Severity |
|---|---|
| hostNetwork: true | Critical |
| hostPID: true | Critical |
| hostIPC: true | High |
| HostPath volume mounts | High–Critical |
| Dangerous capabilities (SYS_ADMIN, NET_RAW) | High |
Rule type: privilege-escalation
Checks for settings that let attackers escalate privileges after compromise.
| Check | Severity |
|---|---|
| Runs as root (UID 0) | Critical |
| Service account token auto-mounted | Medium |
| Root group (GID 0) | Medium |
[!TIP] Quick win Add automountServiceAccountToken: false to every pod that doesn't need Kubernetes API access. Eliminates the most common privilege escalation vector with one line.
Rule type: secrets-exposure
Checks for patterns that could leak sensitive data through environment variables.
| Check | Severity |
|---|---|
| Hardcoded secret in env var | Critical |
| Entire Secret injected via envFrom | Medium |
| Secret passed via secretKeyRef | Low |
Detected patterns: env var names containing password, secret, token, api_key, access_key, private_key, credentials, auth.
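For example, a container spec like this (names and values are illustrative) would trigger both severities from the table above:

```yaml
env:
  - name: DB_PASSWORD        # name matches the "password" pattern,
    value: "hunter2"         # and a literal value => Critical finding
  - name: DB_PASSWORD_FILE   # same name pattern, but sourced from a Secret
    valueFrom:
      secretKeyRef:          # => Low finding instead
        name: db-creds
        key: password
```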
Rule type: network-policy
Checks for network segmentation gaps.
| Check | Severity |
|---|---|
| Pod has no labels | Medium |
| Container uses hostPort | High |
Rule type: rbac-audit
Checks for RBAC-related risks at the pod level.
| Check | Severity |
|---|---|
| Uses the default service account | Medium |
| Service account name contains "admin" or "root" | High |
Part 3 — Recipes¶
Recipe 1: Security Baseline Scan¶
Goal: Find every security misconfiguration across your workloads using all 8 scanners.
Deploy a Vulnerable Test Pod¶
This pod intentionally has multiple issues: :latest tag, no resource limits, no security context, auto-mounted service account token.
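A minimal manifest exhibiting exactly those issues might look like this (the pod name and image are illustrative — any unpinned image without limits or a security context will do):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vulnerable-demo
  namespace: default
spec:
  # Intentionally bad: no securityContext, no resource requests/limits,
  # a mutable :latest tag, and the service account token is
  # auto-mounted by default.
  containers:
    - name: app
      image: nginx:latest
```

Save it to a file and `kubectl apply -f` it, or apply it via a heredoc as in the other recipes.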
Apply the Policy¶
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: SecurityPolicy
metadata:
name: production-security-baseline
namespace: zelyo-system
spec:
severity: medium
match:
namespaces: ["default"]
rules:
- name: security-context
type: container-security-context
enforce: true
- name: resource-limits
type: resource-limits
enforce: true
- name: image-pinning
type: image-vulnerability
enforce: false
- name: pod-security
type: pod-security
enforce: true
- name: privilege-escalation
type: privilege-escalation
enforce: true
- name: secrets-exposure
type: secrets-exposure
enforce: false
- name: network-policy
type: network-policy
enforce: false
- name: rbac-audit
type: rbac-audit
enforce: false
EOF
Check Results¶
# List all security policies and their violation counts
kubectl get securitypolicies -n zelyo-system
# See detailed findings with AI reasoning
kubectl describe securitypolicy production-security-baseline -n zelyo-system
Expected output: look in the Status > Conditions section of the describe output — you'll see ScanCompleted=True and the full reasoning from the LLM.
Cleanup
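The recipe's resources can be removed like this (the pod deletion assumes your test pod lives in the throwaway default namespace — adjust if you used a different one):

```shell
# Remove the baseline policy
kubectl delete securitypolicy production-security-baseline -n zelyo-system

# Remove any test pods you created in the default namespace
kubectl delete pods --all -n default
```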
Recipe 2: Critical-Only Alerting¶
Goal: Filter out noise — only surface high and critical severity findings.
The spec.severity field sets the minimum threshold. Setting it to high means medium, low, and info findings are silently ignored.
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: SecurityPolicy
metadata:
name: critical-only
namespace: zelyo-system
spec:
severity: high
match:
namespaces: ["default"]
rules:
- name: privileged-containers
type: container-security-context
enforce: true
- name: host-access
type: pod-security
enforce: true
- name: root-containers
type: privilege-escalation
enforce: true
- name: hardcoded-secrets
type: secrets-exposure
enforce: true
EOF
kubectl run insecure-nginx --image=nginx:latest --restart=Never -n default
# Wait 10s then check
kubectl get securitypolicies -n zelyo-system
kubectl describe securitypolicy critical-only -n zelyo-system
Expected: Fewer violations compared to Recipe 1 — only high-severity issues appear.
Cleanup
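For this recipe, that means removing the policy and the test pod:

```shell
kubectl delete securitypolicy critical-only -n zelyo-system
kubectl delete pod insecure-nginx -n default
```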
Recipe 3: Nightly Full-Cluster Scan¶
Goal: Run all 8 scanners on a schedule and maintain a 30-day history of reports.
A ClusterScan creates ScanReport resources after each run, giving you a historical audit trail.
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: ClusterScan
metadata:
name: nightly-full-scan
namespace: zelyo-system
spec:
schedule: "0 2 * * *"
scanners:
- container-security-context
- resource-limits
- image-vulnerability
- pod-security
- privilege-escalation
- secrets-exposure
- network-policy
- rbac-audit
scope:
namespaces: []
excludeNamespaces: ["kube-system", "kube-public"]
historyLimit: 30
suspend: false
EOF
[!TIP] Test immediately without waiting until 2 AM Change schedule: "0 2 * * *" to schedule: "* * * * *" to trigger a scan every minute.
Watch for Reports¶
# Watch ScanReports being created in real-time
kubectl get scanreports -n zelyo-system --watch
# View the latest report's findings
kubectl describe scanreport $(kubectl get scanreports -n zelyo-system \
--sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1].metadata.name}') \
-n zelyo-system
Expected output from --watch:
Cleanup
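Delete the scheduled scan and its accumulated reports:

```shell
kubectl delete clusterscan nightly-full-scan -n zelyo-system
kubectl delete scanreports --all -n zelyo-system
```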
Recipe 4: Cost Optimization¶
Goal: Find idle or oversized pods and get AI-powered rightsizing recommendations.
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: CostPolicy
metadata:
name: optimize-default
namespace: zelyo-system
spec:
targetNamespaces: ["default"]
resizeStrategy: conservative
budgetLimits:
monthlyBudgetUSD: "10000"
costIncreaseThresholdPercent: 15
idleDetection:
enabled: true
cpuThresholdPercent: 5
memoryThresholdPercent: 5
idleDurationMinutes: 60
EOF
kubectl get costpolicy optimize-default -n zelyo-system -o wide
kubectl describe costpolicy optimize-default -n zelyo-system
Recipe 5: Slack Alerts¶
Goal: Send security alerts directly to a Slack channel when violations are found.
Step 1: Get a Slack Webhook URL¶
[!TIP] How to get a Slack Webhook URL
1. Go to api.slack.com/apps
2. Click Create New App → From scratch
3. Choose a name (e.g., "Zelyo Operator") and your workspace
4. Go to Incoming Webhooks → toggle Activate Incoming Webhooks to On
5. Click Add New Webhook to Workspace → choose your channel → Allow
6. Copy the Webhook URL (starts with https://hooks.slack.com/services/...)
Step 2: Create the Secret¶
kubectl create secret generic slack-token \
--namespace zelyo-system \
--from-literal=webhook-url=https://hooks.slack.com/services/YOUR_WORKSPACE/YOUR_CHANNEL/YOUR_TOKEN
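Optionally, verify the webhook itself before the operator uses it — posting a plain message with curl (your real URL substituted) should appear in the chosen Slack channel:

```shell
curl -X POST -H 'Content-Type: application/json' \
  --data '{"text":"Test message from Zelyo Operator setup"}' \
  https://hooks.slack.com/services/YOUR_WORKSPACE/YOUR_CHANNEL/YOUR_TOKEN
```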
Step 3: Create the NotificationChannel¶
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: NotificationChannel
metadata:
name: slack-security
namespace: zelyo-system
spec:
type: slack
credentialSecret: slack-token
severityFilter: high
rateLimit:
maxPerHour: 30
aggregateSeconds: 60
slack:
channel: "#security-alerts"
EOF
Step 4: Attach to a MonitoringPolicy¶
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: MonitoringPolicy
metadata:
name: production-monitoring
namespace: zelyo-system
spec:
targetNamespaces: ["default"]
notificationChannels: ["slack-security"]
eventFilters:
types: ["Warning"]
reasons: ["OOMKilled", "CrashLoopBackOff", "FailedScheduling"]
EOF
Verify Setup¶
kubectl get notificationchannel slack-security -n zelyo-system
kubectl get monitoringpolicy production-monitoring -n zelyo-system
Cleanup
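Remove the monitoring policy, the channel, and the webhook secret:

```shell
kubectl delete monitoringpolicy production-monitoring -n zelyo-system
kubectl delete notificationchannel slack-security -n zelyo-system
kubectl delete secret slack-token -n zelyo-system
```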
Recipe 6: GitOps Automated Remediation¶
Goal: Have Zelyo Operator automatically open GitHub PRs to fix detected security issues — no manual intervention required.
Step 1: Create a GitHub Personal Access Token¶
- Go to github.com/settings/tokens
- Click Generate new token (classic)
- Select scopes: repo (full control of private repos)
- Click Generate token and copy it (starts with ghp_...)
[!CAUTION] Store your token securely. GitHub only shows it once. If lost, you must regenerate it.
Step 2: Create the Kubernetes Secret¶
kubectl create secret generic github-creds \
--namespace zelyo-system \
--from-literal=token=ghp_xxxxxxxxxxxxxxxxxxxx
Step 3: Onboard Your Repository¶
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: GitOpsRepository
metadata:
name: infra-repo
namespace: zelyo-system
spec:
url: https://github.com/YOUR_USERNAME/YOUR_REPO
branch: main
paths: ["./"]
provider: github
authSecret: github-creds
enableDriftDetection: true
EOF
Verify onboarding:
kubectl get gitopsrepository infra-repo -n zelyo-system
kubectl describe gitopsrepository infra-repo -n zelyo-system
Look for these conditions in the output:
- SecretResolved → authentication secret found ✅
- GitOpsConnected → repository is reachable ✅
- Ready → everything is operational ✅
Step 4: Create a RemediationPolicy¶
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: RemediationPolicy
metadata:
name: auto-fix-security
namespace: zelyo-system
spec:
targetPolicies: ["production-security-baseline"]
gitOpsRepository: infra-repo
prTemplate:
titlePrefix: "[Zelyo Operator Auto-Fix]"
labels: ["security", "automated"]
branchPrefix: "zelyo/fix-"
severityFilter: high
maxConcurrentPRs: 3
dryRun: false
autoMerge: false
EOF
[!NOTE] dryRun: true for testing — set this while testing to see what PRs would be created without actually opening them. Switch to false when ready to go live.
Cleanup
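Remove the remediation policy, the repository onboarding, and the GitHub token secret:

```shell
kubectl delete remediationpolicy auto-fix-security -n zelyo-system
kubectl delete gitopsrepository infra-repo -n zelyo-system
kubectl delete secret github-creds -n zelyo-system
```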
Part 4 — Advanced Configuration¶
LLM Configuration via ZelyoConfig¶
You can fine-tune the LLM behaviour and set cost limits using the ZelyoConfig CRD:
kubectl apply -f - <<'EOF'
apiVersion: zelyo.ai/v1alpha1
kind: ZelyoConfig
metadata:
name: default
spec:
llm:
provider: openrouter
model: "anthropic/claude-sonnet-4-20250514"
apiKeySecret: zelyo-llm
temperature: "0.1"
maxTokensPerRequest: 4096
tokenBudget:
hourlyTokenLimit: 50000
dailyTokenLimit: 500000
monthlyTokenLimit: 10000000
alertThresholdPercent: 80
enableCaching: true
batchingEnabled: true
EOF
Monitor LLM token usage:
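Two generic ways to check consumption — the exact status fields are operator-specific, so inspect the full object rather than assuming a path:

```shell
# Look for token accounting lines in the operator logs
kubectl logs deploy/zelyo-operator -n zelyo-system | grep -i token

# Inspect the ZelyoConfig for usage/budget fields in its status
kubectl get zelyoconfig default -o yaml
```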
Switching LLM Providers¶
| Provider | Model String | Note |
|---|---|---|
| OpenRouter | anthropic/claude-sonnet-4-20250514 | Best quality |
| OpenRouter | anthropic/claude-haiku | Fast and cheap |
| OpenAI | gpt-4o | OpenAI native |
| Ollama (local) | llama3 | Free, no internet needed |
To switch:
helm upgrade zelyo-operator oci://ghcr.io/zelyo-ai/charts/zelyo-operator \
--namespace zelyo-system \
--set config.llm.provider=openai \
--set config.llm.model=gpt-4o \
--set config.llm.apiKeySecret=zelyo-llm
Part 5 — Observability¶
Check Operator Health¶
# Pod status
kubectl get pods -n zelyo-system
# Live log stream
kubectl logs -f deploy/zelyo-operator -n zelyo-system
# Events for a specific resource
kubectl events --for securitypolicy/production-security-baseline -n zelyo-system
Inspect Resource Status¶
# List everything in zelyo-system
kubectl get securitypolicies,clusterscans,scanreports,costpolicies,monitoringpolicies,notificationchannels,remediationpolicies,gitopsrepositories,zelyoconfigs -A
# Get conditions as JSON for a security policy
kubectl get securitypolicy production-security-baseline -n zelyo-system \
-o jsonpath='{.status.conditions}' | jq .
Dashboard¶
Zelyo Operator includes a built-in web dashboard:
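A common way to reach an in-cluster dashboard locally is kubectl port-forward. The service name and port below are assumptions — list the services first and adjust to what you find:

```shell
# Find the dashboard service and its port
kubectl get svc -n zelyo-system

# Forward it to localhost (substitute the real name/port)
kubectl port-forward svc/zelyo-operator -n zelyo-system 8080:8080
# Then open http://localhost:8080
```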
Part 6 — Verification & Troubleshooting¶
After applying your policies, monitor the operator logs to ensure that notifications are being sent and remediation plans are being generated.
For specific commands to verify Slack alerts, AI reasoning (LLM) status, and GitOps PR creation, see the Troubleshooting Guide.
[!WARNING] Webhook Error? If you see failed calling webhook "msecuritypolicy.zelyo.ai" while applying any resource, run the Webhook Patch commands. This is a known issue with OCI chart v0.0.1.
Part 7 — Full Environment Teardown¶
When you're done testing, remove everything cleanly:
# Delete all Zelyo resources
kubectl delete securitypolicies,clusterscans,scanreports,costpolicies,monitoringpolicies,notificationchannels,remediationpolicies,gitopsrepositories --all -n zelyo-system
kubectl delete zelyoconfigs --all
# Uninstall the operator and cert-manager
helm uninstall zelyo-operator -n zelyo-system
helm uninstall cert-manager -n cert-manager
# Delete the cluster
k3d cluster delete zelyo