Production
This documentation highlights how to run Opsmate in production environment.
Why bother?
Opsmate can be used as a command line tool standalone however it comes with a few limitations:
- Every local workstation is a snowflake in its own way, thus it's hard to have a consistent experience across different machines.
- Some of the production environments access are simply not available from the local workstation.
- People cannot collaborate on a local workstation.
To address these issues, we also provide a opsmate-operator that can run Opsmate on demand in a Kubernetes cluster.
Key features
Here are some of the key features of opsmate-operator:
- Manage the Opsmate environment via a EnvironmentBuildCRD.
- Opsmatecan be scheduled on demand via a- TaskCRD.
- Each of the Opsmatetask comes with a dedicated secured HTTPS endpoint and web UI, run inside a dedicated pod.
- Opsmateenvironment builds and tasks are scoped by the namespace thus support multi-tenancy.
- The task comes with a TTL(time to live) thus it will be automatically garbage collected after the TTL expires. By doing so we avoid resource waste.
- An API-server to allow you to manage the Opsmateenvironment and tasks.
How to install the operator
Here is an example of how to install the operator using Terraform and Helm.
# Where you install the operator
resource "kubernetes_namespace" "opsmate_operator" {
  metadata {
    name = "opsmate-operator"
  }
}
resource "helm_release" "opsmate_operator" {
  name             = "opsmate-operator"
  repository       = "oci://ghcr.io/opsmate-ai/opsmate-operator"
  chart            = "opsmate-operator"
  version          = "0.2.0"
  namespace        = kubernetes_namespace.opsmate_operator.metadata[0].name
  create_namespace = false
  max_history      = 3
  set {
    name  = "installCRDs"
    value = "true"
  }
  values = [
    yamlencode({
      controllerManager = {
        fullnameOverride = "opsmate-operator"
      }
    }),
  ]
}
Environment Build
Opsmate Environment Build is a CRD (Custom Resource Definition) that defines the environment that will be used to run the Opsmate task.
The following example we:
- Create a new namespace opsmate-workspace
- Create a new cluster role opsmate-cluster-readerwhich is bound to theopsmate-cluster-readerservice account.
- Create a new environmentBuildcalledcluster-readerwhich will be used as a template for running theOpsmatetask.
The environmentBuild is composed of:
- A opsmatecontainer that runs as a Web UI and API server.
- A workercontainer that is responsible for running background heavy-lifting tasks such as ingesting the knowledge base and embedding into the vector database.
- The opsmateand theworkercontainers shared the same volume for storing the sqlite database and the vector database.
Click to show opsmate-cluster-reader ClusterRole
---
# cluster reader role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: opsmate-cluster-reader
rules:
- apiGroups: [""]  # Core API group
  resources:
  - nodes
  - namespaces
  - pods
  - services
  - configmaps
  - secrets
  - persistentvolumes
  - persistentvolumeclaims
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups: ["apps"]  # Apps API group
  resources:
  - deployments
  - daemonsets
  - statefulsets
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups: ["batch"]  # Batch API group
  resources:
  - jobs
  - cronjobs
  verbs:
  - get
  - list
  - watch
- apiGroups: ["networking.k8s.io"]  # Networking API group
  resources:
  - ingresses
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups: ["storage.k8s.io"]  # Storage API group
  resources:
  - storageclasses
  verbs:
  - get
  - list
  - watch
---
apiVersion: v1
kind: Namespace
metadata:
  name: opsmate-workspace
---
# service account for cluster reader
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opsmate-cluster-reader
  namespace: opsmate-workspace
---
# role binding for cluster reader
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: opsmate-cluster-reader
subjects:
- kind: ServiceAccount
  name: opsmate-cluster-reader
  namespace: opsmate-workspace
roleRef:
  kind: ClusterRole
  name: opsmate-cluster-reader
---
# configmap for opsmate task
apiVersion: v1
kind: ConfigMap
metadata:
  name: opsmate-config
  namespace: opsmate-workspace
data:
  OPSMATE_DB_URL: sqlite:////var/opsmate/opsmate.sqlite
  EMBEDDINGS_DB_PATH: /var/opsmate/embedding
  GITHUB_EMBEDDINGS_CONFIG: |
    {
      "opsmate-ai/opsmate": "**/*.md"
    }
---
# cluster reader environment build
apiVersion: sre.opsmate.io/v1alpha1
kind: EnvironmentBuild
metadata:
  name: cluster-reader
  namespace: opsmate-workspace
spec:
  podTemplate:
    spec:
      serviceAccountName: opsmate-cluster-reader
      initContainers:
        - name: opsmate-db-migrate
          image: ghcr.io/opsmate-ai/opsmate:0.2.0a0
          args:
            - db-migrate
          envFrom:
            - configMapRef:
                name: opsmate-config
          volumeMounts:
            - name: opsmate-vol
              mountPath: /var/opsmate
      containers:
        - name: opsmate
          image: ghcr.io/opsmate-ai/opsmate:0.2.0a0
          ports:
            - containerPort: 8000
          envFrom:
            - secretRef:
                name: opsmate-secret
            - configMapRef:
                name: opsmate-config
          volumeMounts:
            - name: opsmate-vol
              mountPath: /var/opsmate
          args:
            - serve
            - --auto-migrate=false
        - name: worker
          image: ghcr.io/opsmate-ai/opsmate:0.1.45a0
          envFrom:
            - secretRef:
                name: opsmate-secret
            - configMapRef:
                name: opsmate-config
          args:
            - worker
            - --auto-migrate=false
          volumeMounts:
            - name: opsmate-vol
              mountPath: /var/opsmate
      imagePullSecrets:
        - name: opsmate-workspace-image-pull-secret
      volumes:
      - name: opsmate-vol
        emptyDir:
          sizeLimit: 500Mi
  service:
    type: ClusterIP
    ports:
      - port: 80
        targetPort: 8000
  ingressTLS: true
  ingressTargetPort: 80
There are a few secrets that you will need to create:
- OPENAI_API_KEY- If you are using OpenAI as your LLM provider. Currently it is mandatory as we are using OpenAI's embedding API for embedding the knowledge base.
- ANTHROPIC_API_KEY- If you are using Anthropic as your LLM provider.
- XAI_API_KEY- If you are using xAI as your LLM provider.
- GITHUB_TOKEN- This is used for
- Accessing the GitHub repository for loading knowledge base.
- Used by Opsmate to to clone the repo, commit changes and raise PRs.
Here are the examples of how to create the secrets:
apiVersion: v1
kind: Secret
metadata:
  name: opsmate-secret
  namespace: opsmate-workspace
type: Opaque
data:
  OPENAI_API_KEY: <your-openai-api-key-base64-encoded>
  ANTHROPIC_API_KEY: <your-anthropic-api-key-base64-encoded>
  GITHUB_TOKEN: <your-github-token-base64-encoded>
---
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: gcp-secret-store
  namespace: opsmate-workspace
spec:
  provider:
    gcpsm:
      projectID: $YOUR_GCP_PROJECT_ID
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: opsmate-secret
  namespace: opsmate-workspace
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: gcp-secret-store
  target:
    name: opsmate-secret
    creationPolicy: Owner
  data:
    - secretKey: OPENAI_API_KEY
      remoteRef:
        key: opsmate-workspace-openai-key
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: opsmate-workspace-anthropic-key
    - secretKey: GITHUB_TOKEN
      remoteRef:
        key: opsmate-workspace-github-token-ro
Task
The task is a CRD that defines a workspace that will be used for tackling production problem.
Here is an example of a task:
---
apiVersion: sre.opsmate.io/v1alpha1
kind: Task
metadata:
  name: investigator
  namespace: opsmate-workspace
spec:
  userID: anonymous
  environmentBuildName: cluster-reader
  description: "a opsmate task for investigating the cluster"
  context: "you are on a kubernetes cluster"
  domainName: "investigator.opsmate.your-corp.com"
  ingressAnnotations:
    external-dns.alpha.kubernetes.io/hostname: investigator.opsmate.your-corp.com
  ingressSecretName: opsmate-cert
In the example above we assume that you:
- Own the domain name opsmate.your-corp.com
- Can use external-dns to manage the ingress for the domain name.
- Have a wildcard *.opsmate.your-corp.comcertificate in theopsmate-workspacenamespace managed by cert-manager. Notes the wildcard certificate can now be provisioned by LetsEncrypt.
After you create the task you can access the task via the following URL:
https://investigator.opsmate.your-corp.com?token=$(kubectl -n opsmate-workspace get task investigator -o jsonpath='{.status.token}')