Kayenta is the subcomponent of Spinnaker that handles automated canary analysis during a deployment. It reads from your metric sources and compares the stats of the currently deployed service against those of the new version to look for anomalies or problems, so the rollout can be aborted if the new version fails to meet specified tolerances.
I'm a huge fan of Spinnaker, but sometimes you already have a full CI/CD system in place and you don't want to replace all of that with Spinnaker - you really just want the canary part. Luckily, you can totally use Kayenta as a standalone service. They even have some light documentation on it!
In my specific case, I also want to use Azure Storage as the place where Kayenta stores its data - canary configuration, that sort of thing. It's totally possible to do that, but, at least at the time of this writing, the hal config canary Halyard command doesn't list Azure and the docs don't cover it.
So there are a couple of things that come together here, and maybe all of it’s interesting to you or maybe only one piece. In any case, here’s what we’re going to build:
- A Kubernetes ingress to allow access to Kayenta from your CI/CD pipeline.
- A deployment of the Kayenta microservice.
- Kayenta configured to use an Azure Storage Account to hold its configuration and such.
Things I’m not going to cover:
- How exactly your CI/CD canary stage needs to work.
- How long a canary stage should last.
- How exactly you should configure Kayenta (other than the Azure part).
- Which statistics you should monitor for your services to determine if they “pass” or “fail.”
- Securing the Kayenta ingress so only authenticated/authorized access is allowed.
This stuff is hard and it gets pretty deep pretty quickly. I can’t cover it all in one go. I don’t honestly have answers to all of it anyway, since a lot of it depends on how your build pipeline is set up, how your app is set up, and what your app does. There’s no “one-size-fits-all.”
Let’s do it.
Deployment
First, provision an Azure Storage account. Make sure you enable HTTP access because right now Kayenta requires HTTP and not HTTPS.
You also need to provision a container in the Azure Storage account to hold the Kayenta contents.
# I love me some PowerShell, so examples/scripts will be PowerShell.
# Swap in your preferred names as needed.
$ResourceGroup = "myresourcegroup"
$StorageAccountName = "kayentastorage"
$StorageContainerName = "kayenta"
$Location = "westus2"
# Create the storage account with HTTP enabled.
az storage account create `
--name $StorageAccountName `
--resource-group $ResourceGroup `
--location $Location `
--https-only false `
--sku Standard_GRS
# Get the storage key so you can create a container.
$StorageKey = az storage account keys list `
--account-name $StorageAccountName `
--query '[0].value' `
-o tsv
# Create the container that will hold Kayenta stuff.
az storage container create `
--name $StorageContainerName `
--account-name $StorageAccountName `
--account-key $StorageKey
Let’s make a namespace in Kubernetes for Kayenta so we can put everything we’re deploying in there.
# We'll use the namespace a lot, so a variable
# for that in our scripting will help.
$Namespace = "kayenta"
kubectl create namespace $Namespace
Kayenta needs Redis. We can use the Bitnami Helm chart to deploy a simple Redis instance. Redis must not be in clustered mode, and Kayenta has no option for providing Redis credentials, so the password gets disabled too.
helm repo add bitnami https://charts.bitnami.com/bitnami
# The name of the deployment will dictate the name of the
# Redis master service that gets deployed. In this example,
# 'kayenta-redis' as the deployment name will create a
# 'kayenta-redis-master' service. We'll need that later for
# Kayenta configuration.
helm install kayenta-redis bitnami/redis `
-n $Namespace `
--set cluster.enabled=false `
--set usePassword=false `
--set master.persistence.enabled=false
Now let's get Kayenta configured. There's a full, commented version of a Kayenta configuration file you can use as a reference, and there's also a little doc on Kayenta configuration that might help. What we're going to do here is put the kayenta.yml configuration into a Kubernetes ConfigMap so it can be mounted into our deployment.
Here's a ConfigMap YAML file based on the fully commented version, but with the extra stuff taken out. This is also where you'll configure the location of Prometheus (or whatever metrics source Kayenta will read stats from). For this example, I'm using Prometheus with some basic placeholder config.
apiVersion: v1
kind: ConfigMap
metadata:
name: kayenta
namespace: kayenta
data:
kayenta.yml: |-
server:
port: 8090
# This should match the name of the master service from when
# you deployed the Redis Helm chart earlier.
redis:
connection: redis://kayenta-redis-master:6379
kayenta:
atlas:
enabled: false
google:
enabled: false
# This is the big one! Here's where you configure your Azure Storage
# account and container details.
azure:
enabled: true
accounts:
- name: canary-storage
storageAccountName: kayentastorage
# azure.storageKey is provided via environment AZURE_STORAGEKEY
# so it can be stored in a secret. You'll see that in a bit.
# Don't check in credentials!
accountAccessKey: ${azure.storageKey}
container: kayenta
rootFolder: kayenta
endpointSuffix: core.windows.net
supportedTypes:
- OBJECT_STORE
- CONFIGURATION_STORE
aws:
enabled: false
datadog:
enabled: false
graphite:
enabled: false
newrelic:
enabled: false
# Configure your Prometheus here. Or if you're using something else, disable
# Prometheus and configure your own metrics store. The important part is you
# MUST have a metrics store configured!
prometheus:
enabled: true
accounts:
- name: canary-prometheus
endpoint:
baseUrl: http://prometheus:9090
supportedTypes:
- METRICS_STORE
signalfx:
enabled: true
wavefront:
enabled: false
gcs:
enabled: false
blobs:
enabled: true
s3:
enabled: false
stackdriver:
enabled: false
memory:
enabled: false
configbin:
enabled: false
remoteJudge:
enabled: false
# Enable the SCAPE endpoint that has the same user experience that the Canary StageExecution in Deck/Orca has.
# By default this is disabled - in standalone we enable it!
standaloneCanaryAnalysis:
enabled: true
metrics:
retry:
series: SERVER_ERROR
statuses: REQUEST_TIMEOUT, TOO_MANY_REQUESTS
attempts: 10
backoffPeriodMultiplierMs: 1000
serialization:
writeDatesAsTimestamps: false
writeDurationsAsTimestamps: false
management.endpoints.web.exposure.include: '*'
management.endpoint.health.show-details: always
keiko:
queue:
redis:
queueName: kayenta.keiko.queue
deadLetterQueueName: kayenta.keiko.queue.deadLetters
spectator:
applicationName: ${spring.application.name}
webEndpoint:
enabled: true
swagger:
enabled: true
title: Kayenta API
description:
contact:
patterns:
- /admin.*
- /canary.*
- /canaryConfig.*
- /canaryJudgeResult.*
- /credentials.*
- /fetch.*
- /health
- /judges.*
- /metadata.*
- /metricSetList.*
- /metricSetPairList.*
- /metricServices.*
- /pipeline.*
- /standalone.*
Save that and deploy it to the cluster.
kubectl apply -f kayenta-configmap.yml
You'll notice in the config we just put down that we did not include the Azure Storage account key. Assuming we want to commit that YAML to a source control system at some point, we definitely don't want credentials in there. Instead, let's use a Kubernetes secret for the Azure Storage account key.
# Remember earlier we got the storage account key for creating
# the container? We're going to use that again.
kubectl create secret generic azure-storage `
-n $Namespace `
--from-literal=storage-key="$StorageKey"
It's deployment time! Let's get a Kayenta container into the cluster! Obviously you can tweak all the tolerations and affinities and node selectors and all that to your heart's content. I'm keeping the example simple.
apiVersion: apps/v1
kind: Deployment
metadata:
name: kayenta
namespace: kayenta
labels:
app.kubernetes.io/name: kayenta
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: kayenta
template:
metadata:
labels:
app.kubernetes.io/name: kayenta
spec:
containers:
- name: kayenta
# Find the list of tags here: https://console.cloud.google.com/gcr/images/spinnaker-marketplace/GLOBAL/kayenta?gcrImageListsize=30
# This is just the tag I've been using for a while. I use one of the images NOT tagged
# with Spinnaker because the Spinnaker releases are far slower.
image: "gcr.io/spinnaker-marketplace/kayenta:0.17.0-20200803200017"
env:
# If you need to troubleshoot, you can set the logging level by adding
# -Dlogging.level.root=TRACE
# Without the log at DEBUG level, very little logging comes out at all and
# it's really hard to see if something goes wrong. If you don't want that
# much logging, go ahead and remove the log level option here.
- name: JAVA_OPTS
value: "-XX:+UnlockExperimentalVMOptions -Dlogging.level.root=DEBUG"
# We can store secrets outside config and provide them via the environment.
# Insert them into the config file using ${dot.delimited} versions of the
# variables, like ${azure.storageKey} which we saw in the ConfigMap.
- name: AZURE_STORAGEKEY
valueFrom:
secretKeyRef:
name: azure-storage
key: storage-key
ports:
- name: http
containerPort: 8090
protocol: TCP
livenessProbe:
httpGet:
path: /health
port: http
readinessProbe:
httpGet:
path: /health
port: http
volumeMounts:
- name: config-volume
mountPath: /opt/kayenta/config
volumes:
- name: config-volume
configMap:
name: kayenta
And let’s save and apply.
kubectl apply -f kayenta-deployment.yml
If you have everything wired up right, the Kayenta instance should start. But we want to see something happen, right? And without resorting to kubectl port-forward?
Let’s put a LoadBalancer service in here so we can access it. I’m going to show the simplest Kubernetes LoadBalancer here, but in your situation you might have, say, an nginx ingress in play or something else. You’ll have to adjust as needed.
apiVersion: v1
kind: Service
metadata:
name: kayenta
namespace: kayenta
labels:
app.kubernetes.io/name: kayenta
spec:
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app.kubernetes.io/name: kayenta
type: LoadBalancer
Let’s see it do something. You should be able to get the public IP address for that LoadBalancer service by doing:
kubectl get service/kayenta -n $Namespace
You’ll see something like this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kayenta LoadBalancer 10.3.245.137 104.198.205.71 80/TCP 54s
Take note of that external IP and you can visit the Swagger docs in a browser: http://104.198.205.71/swagger-ui.html
If it’s all wired up, you should get some Swagger docs!
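If you'd rather script that check than click around, here's a quick sketch using the namespace and service we just deployed:
# Grab the external IP from the LoadBalancer service and poke the health endpoint.
$KayentaIp = kubectl get service kayenta -n $Namespace `
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

# /health is one of the endpoints exposed in the kayenta.yml patterns list;
# it should come back with a status of UP once Kayenta is happy.
Invoke-RestMethod "http://$KayentaIp/health"

# And the Swagger UI:
Write-Host "http://$KayentaIp/swagger-ui.html"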
The first operation you should try is under credentials-controller: GET /credentials. This will tell you which metrics and object stores Kayenta thinks it's talking to. The result should look something like this:
[
{
"name": "canary-prometheus",
"supportedTypes": [
"METRICS_STORE"
],
"endpoint": {
"baseUrl": "http://prometheus"
},
"type": "prometheus",
"locations": [],
"recommendedLocations": []
},
{
"name": "canary-storage",
"supportedTypes": [
"OBJECT_STORE",
"CONFIGURATION_STORE"
],
"rootFolder": "kayenta",
"type": "azure",
"locations": [],
"recommendedLocations": []
}
]
If you are missing the canary-storage account pointing to azure, that means Kayenta can't access the storage account or it's otherwise misconfigured. The biggest gotcha I found was the HTTP requirement: HTTP access is not the default for a storage account if you create it through the Azure portal, so you have to turn it on.
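If you created the storage account through the portal (or just forgot the flag), you can flip it afterwards:
# Turn off the HTTPS-only enforcement on an existing storage account so
# Kayenta can talk to it over HTTP.
az storage account update `
  --name $StorageAccountName `
  --resource-group $ResourceGroup `
  --https-only false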
Troubleshooting
What do you do if you can’t figure out why Kayenta isn’t connecting to stuff?
Up in the Kubernetes deployment, you'll see the logging is set up at the DEBUG level. The logging is pretty good at this level. You can use kubectl logs to get the logs from the Kayenta pods or, better, use stern for that. Those logs are going to be your secret weapon: you'll see errors that pretty clearly indicate whether there's a DNS problem or a bad password or something similar.
If you still aren't getting enough info, turn the log level up to TRACE. It can get noisy, but you'll only need it for troubleshooting.
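For reference, pulling those logs looks something like this:
# Tail the logs from the Kayenta deployment.
kubectl logs -n $Namespace deployment/kayenta --follow

# Or use stern to tail across all Kayenta pods, with a little history.
stern -n $Namespace kayenta --since 10m

# To go to TRACE, change -Dlogging.level.root=DEBUG to TRACE in the
# deployment's JAVA_OPTS and re-apply the deployment.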
Next Steps
There’s a lot you can do from here.
Canary configuration: Actually configuring a canary is hard. For me, it took deploying a full Spinnaker instance and doing some canary stuff to figure it out. There's a bit more doc on it now, but it's definitely tricky. Here's a pretty basic configuration where we just look for errors by ASP.NET microservice controller. No, I cannot help or support you in configuring a canary. I'll give you this example with no warranties, express or implied.
{
"canaryConfig": {
"applications": [
"app"
],
"classifier": {
"groupWeights": {
"StatusCodes": 100
},
"scoreThresholds": {
"marginal": 75,
"pass": 75
}
},
"configVersion": "1",
"description": "App Canary Configuration",
"judge": {
"judgeConfigurations": {
},
"name": "NetflixACAJudge-v1.0"
},
"metrics": [
{
"analysisConfigurations": {
"canary": {
"direction": "increase",
"nanStrategy": "replace"
}
},
"groups": [
"StatusCodes"
],
"name": "Errors By Controller",
"query": {
"customInlineTemplate": "PromQL:sum(increase(http_requests_received_total{app='my-app',azure_pipelines_version='${location}',code=~'5\\\\d\\\\d|4\\\\d\\\\d'}[120m])) by (action)",
"scopeName": "default",
"serviceType": "prometheus",
"type": "prometheus"
},
"scopeName": "default"
}
],
"name": "app-config",
"templates": {
}
},
"executionRequest": {
"scopes": {
"default": {
"controlScope": {
"end": "2020-11-20T23:01:09.3NZ",
"location": "baseline",
"scope": "control",
"start": "2020-11-20T21:01:09.3NZ",
"step": 2
},
"experimentScope": {
"end": "2020-11-20T23:01:09.3NZ",
"location": "canary",
"scope": "experiment",
"start": "2020-11-20T21:01:09.3NZ",
"step": 2
}
}
},
"siteLocal": {
},
"thresholds": {
"marginal": 75,
"pass": 95
}
}
}
Integrate with your CI/CD pipeline: Your deployment is going to need to know how to track the currently deployed vs. new/canary deployment, and statistics are going to need to be tracked that way, too. (That's the same as if you were using Spinnaker.) I've been using the KubernetesManifest@0 task in Azure DevOps, setting trafficSplitMethod: smi and making use of the canary control there. A shell script polls Kayenta to see how the analysis is going. There's a sketch of both pieces below.
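For illustration, here's roughly what that canary deploy step can look like. The service connection name, namespace, and manifest paths are placeholders - swap in your own and check the task docs for the full set of inputs.
# Sketch of an Azure DevOps canary deployment step using SMI traffic splitting.
# 'my-k8s-connection', the namespace, and the manifest paths are placeholders.
- task: KubernetesManifest@0
  displayName: Deploy canary
  inputs:
    action: deploy
    strategy: canary
    trafficSplitMethod: smi
    percentage: 25
    kubernetesServiceConnection: my-k8s-connection
    namespace: my-app
    manifests: |
      manifests/deployment.yml
      manifests/service.yml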
How you do this for your template is very subjective. Pipelines at this level are really complex. I’d recommend working with Postman or some other HTTP debugging tool to get things working before trying to automate it.
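And here's a minimal PowerShell sketch of the polling side, assuming the standalone canary analysis endpoint is the /standalone_canary_analysis route exposed in the config above - verify the exact paths and response fields against your Swagger UI before relying on it.
# Rough sketch: kick off a standalone canary analysis and poll for the verdict.
# The request body is the canaryConfig + executionRequest JSON shown above,
# saved to canary-request.json. Endpoint paths and response field names here
# are assumptions - confirm them in the Swagger UI for your Kayenta version.
$KayentaBaseUrl = "http://104.198.205.71"

$Body = Get-Content ./canary-request.json -Raw
$Started = Invoke-RestMethod -Method Post `
  -Uri "$KayentaBaseUrl/standalone_canary_analysis" `
  -ContentType "application/json" `
  -Body $Body

# Poll every 30 seconds until Kayenta reports the execution is complete.
do {
  Start-Sleep -Seconds 30
  $Result = Invoke-RestMethod -Uri "$KayentaBaseUrl/standalone_canary_analysis/$($Started.canaryAnalysisExecutionId)"
} while (-not $Result.complete)

# Fail the pipeline stage if the analysis didn't pass.
if ($Result.executionStatus -ne "SUCCEEDED") {
  throw "Canary analysis did not pass - abort the rollout."
}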
Secure it!: You probably don't want public anonymous access to the Kayenta API. I locked mine down with oauth2-proxy and Istio, but you could do it with nginx ingress and oauth2-proxy or some other mechanism.
Put a UI on it!: As you can see, configuring Kayenta canaries without a UI is actually pretty hard. Nike has a UI for standalone Kayenta called "Referee". At the time of this writing there's no Docker container for it, so it's not as easy to deploy as you might like. However, there is a Dockerfile gist that might be helpful. I haven't personally gotten this working, but it's on my list of things to do.
Huge props to my buddy Chris who figured a lot of this out, especially the canary configuration and Azure DevOps integration pieces.