# 1. K8sGPT Introduction and Installation

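K8sGPT is an AI-powered troubleshooting tool for Kubernetes. It scans the cluster for anomalies, analyzes logs and events, explains problems in natural language, and proposes actionable fixes generated by the configured backend large language model, which lowers the barrier to operating Kubernetes. K8sGPT supports several kinds of AI backends (OpenAI, local models, and others) and offers two main usage modes: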

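  • CLI: install the command-line tool, connect to the cluster through a kubeconfig, and diagnose problems on demand.
  • Operator: install the Operator into the cluster. This mode is well suited to continuous cluster monitoring and can be integrated with existing monitoring such as Prometheus and Alertmanager.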

## 1.1 Installing the K8sGPT CLI

https://github.com/k8sgpt-ai/k8sgpt

### 1.1.1 Download the Generic Linux Build

```shell
root@master01:~# wget https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.4.33/k8sgpt_Linux_x86_64.tar.gz
root@master01:~# tar xvf k8sgpt_Linux_x86_64.tar.gz
CHANGELOG.md
LICENSE
README.md
k8sgpt
root@master01:~# mv k8sgpt /usr/bin
root@master01:~# k8sgpt version
k8sgpt: 0.4.33 (fb24679), built at: unknown
root@master01:~# k8sgpt --help
Kubernetes debugging powered by AI

Usage:
  k8sgpt [command]

Available Commands:
  analyze         This command will find problems within your Kubernetes cluster
  auth            Authenticate with your chosen backend
  cache           For working with the cache the results of an analysis
  completion      Generate the autocompletion script for the specified shell
  custom-analyzer Manage a custom analyzer
  dump            Creates a dumpfile for debugging issues with K8sGPT
  filters         Manage filters for analyzing Kubernetes resources
  generate        Generate Key for your chosen backend (opens browser)
  help            Help about any command
  integration     Integrate another tool into K8sGPT
  serve           Runs k8sgpt as a server
  version         Print the version number of k8sgpt

Flags:
      --config string        Default config file (/root/.config/k8sgpt/k8sgpt.yaml)
  -h, --help                 help for k8sgpt
      --kubeconfig string    Path to a kubeconfig. Only required if out-of-cluster.
      --kubecontext string   Kubernetes context to use. Only required if out-of-cluster.
  -v, --verbose              Show detailed tool actions (e.g., API calls, checks).

Use "k8sgpt [command] --help" for more information about a command.
```

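The download and install steps above can be wrapped in two small functions. This is a minimal sketch, assuming the release asset keeps the same name as in the v0.4.33 URL used above; the function names `k8sgpt_release_url` and `install_k8sgpt` are made up for this example and are not part of k8sgpt.

```shell
# Build the download URL for a given release tag.
# Assumption: other releases use the same asset name as v0.4.33 above.
k8sgpt_release_url() {
  local version="$1"
  printf 'https://github.com/k8sgpt-ai/k8sgpt/releases/download/%s/k8sgpt_Linux_x86_64.tar.gz\n' "$version"
}

# Download the archive, extract only the k8sgpt binary, and install it.
# The destination defaults to /usr/bin, as in the session above.
install_k8sgpt() {
  local version="$1" dest="${2:-/usr/bin}" tmp rc
  tmp="$(mktemp -d)" || return 1
  wget -q -O "$tmp/k8sgpt.tar.gz" "$(k8sgpt_release_url "$version")" &&
    tar -xf "$tmp/k8sgpt.tar.gz" -C "$tmp" k8sgpt &&
    install -m 0755 "$tmp/k8sgpt" "$dest/k8sgpt"
  rc=$?
  rm -rf "$tmp"   # clean up the scratch directory in every case
  return "$rc"
}
```

With these defined, `install_k8sgpt v0.4.33` would reproduce the manual steps, and `k8sgpt version` can then confirm the installed release.
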
### 1.1.2 Configure the AI Provider

Reference for integrating the various backends: https://docs.k8sgpt.ai/reference/providers/backend/

```shell
root@master01:~# k8sgpt auth list
Default:
> openai
Active:
Unused:
> openai
> localai
> ollama
> azureopenai
> cohere
> amazonbedrock
> amazonbedrockconverse
> amazonsagemaker
> google
> noopai
> huggingface
> googlevertexai
> oci
> customrest
> ibmwatsonxai
> groq
root@master01:~# baseurl=https://modelservice.jdcloud.com/v1/
root@master01:~# model=DeepSeek-V3-0324
root@master01:~# key=

root@master01:~# k8sgpt auth add -b localai -u $baseurl -m $model -p $key
localai added to the AI backend provider list

# Set localai as the default provider
root@master01:~# k8sgpt auth default -p localai
Default provider set to localai

# Verify
root@master01:~# k8sgpt auth list
Default:
> localai
Active:
> localai
Unused:
> openai
> ollama
> azureopenai
> cohere
> amazonbedrock
> amazonbedrockconverse
> amazonsagemaker
> google
> noopai
> huggingface
> googlevertexai
> oci
> customrest
> ibmwatsonxai
> groq
```

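Typing the API key into an interactive shell leaves it in the shell history. One way around that is to read the key from an environment variable. The sketch below reuses only the `auth add` and `auth default` invocations shown above; the function name `configure_backend` and the variable `K8SGPT_API_KEY` are assumptions made for this example, not names that k8sgpt itself reads.

```shell
# Register an endpoint as the localai backend and make it the default provider.
# K8SGPT_API_KEY is a variable name chosen for this sketch only.
configure_backend() {
  local baseurl="$1" model="$2"
  if [ -z "${K8SGPT_API_KEY:-}" ]; then
    echo "K8SGPT_API_KEY is not set" >&2
    return 1
  fi
  # Same flags as in the session above: backend, base URL, model, key.
  k8sgpt auth add -b localai -u "$baseurl" -m "$model" -p "$K8SGPT_API_KEY" &&
    k8sgpt auth default -p localai
}
```

A possible call sequence is `read -rs K8SGPT_API_KEY`, then `export K8SGPT_API_KEY`, then `configure_backend https://modelservice.jdcloud.com/v1/ DeepSeek-V3-0324`. The key is still passed to k8sgpt as a command-line argument, so this only keeps it out of the history file.
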
## 1.2 Running a Scan

By default, the scan covers all resource objects in all namespaces.

```shell
root@master01:~# k8sgpt analyze --explain
```

### 1.2.1 Specifying a Namespace and a Resource Type

```shell
root@master01:~# k8sgpt analyze --explain --filter=Pod --namespace=myserver --output=json
{
  "provider": "localai",
  "errors": null,
  "status": "OK",
  "problems": 0,
  "results": null
}
```

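When several namespaces need the same check, the command above can be run in a loop and each report saved to its own file. This is a minimal sketch that uses only the `--filter`, `--namespace`, and `--output` flags shown above; the function name `scan_namespaces` and the one-file-per-namespace layout are choices made for this example. `--explain` is left out so that the loop does not call the AI backend for every namespace.

```shell
# Run a Pod analysis in each given namespace and save one JSON report per
# namespace. Usage: scan_namespaces <output-dir> <namespace>...
scan_namespaces() {
  local outdir="$1"; shift
  local ns
  mkdir -p "$outdir" || return 1
  for ns in "$@"; do
    # Keep going if one namespace fails, but report it on stderr.
    k8sgpt analyze --filter=Pod --namespace="$ns" --output=json \
      > "$outdir/$ns.json" || echo "analysis failed for namespace $ns" >&2
  done
}
```

For example, `scan_namespaces ./reports myserver kube-system` would write `./reports/myserver.json` and `./reports/kube-system.json`.
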
### 1.2.2 Specifying a Namespace and Multiple Resource Types

```shell
root@master01:~# k8sgpt analyze --explain --filter=Pod,Service,Deployment,ConfigMap --namespace=kube-system --output=json
W0524 20:24:09.956499 2736327 warnings.go:70] v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
{
  "provider": "localai",
  "errors": null,
  "status": "ProblemDetected",
  "problems": 3,
  "results": [
    {
      "kind": "ConfigMap",
      "name": "kube-system/extension-apiserver-authentication",
      "error": [
        {
          "Text": "ConfigMap extension-apiserver-authentication is not used by any pods in the namespace",
          "KubernetesDoc": "",
          "Sensitive": []
        }
      ],
      "details": "Error: The ConfigMap \"extension-apiserver-authentication\" is not referenced by any pods in the namespace.  \nSolution:  \n1. Check if the ConfigMap is required for your workload.  \n2. If needed, update your pod/deployment manifest to reference it under `volumes` or `envFrom`.  \n3. Apply changes with `kubectl apply -f \u003cmanifest\u003e`.  \n4. If unused, delete it with `kubectl delete configmap extension-apiserver-authentication`.",
      "parentObject": ""
    },
    {
      "kind": "ConfigMap",
      "name": "kube-system/kube-apiserver-legacy-service-account-token-tracking",
      "error": [
        {
          "Text": "ConfigMap kube-apiserver-legacy-service-account-token-tracking is not used by any pods in the namespace",
          "KubernetesDoc": "",
          "Sensitive": []
        }
      ],
      "details": "Error: The ConfigMap \"kube-apiserver-legacy-service-account-token-tracking\" exists but isn't referenced by any pods in the namespace.  \nSolution:  \n1. Check if the ConfigMap is needed.  \n2. If unused, delete it: `kubectl delete configmap kube-apiserver-legacy-service-account-token-tracking`.  \n3. If needed, ensure pods reference it in their specs.",
      "parentObject": ""
    },
    {
      "kind": "ConfigMap",
      "name": "kube-system/kube-root-ca.crt",
      "error": [
        {
          "Text": "ConfigMap kube-root-ca.crt is not used by any pods in the namespace",
          "KubernetesDoc": "",
          "Sensitive": []
        }
      ],
      "details": "Error: The ConfigMap \"kube-root-ca.crt\" exists but isn't referenced by any pods in the namespace.  \nSolution:  \n1. Check if pods need this ConfigMap.  \n2. If needed, add a volumeMount and volume referencing it in the pod spec.  \n3. If unused, delete it: `kubectl delete configmap kube-root-ca.crt -n \u003cnamespace\u003e`.",
      "parentObject": ""
    }
  ]
}
```

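A report saved with `--output=json` can be summarized with standard text tools. The helpers below are a rough sketch that relies on the pretty-printed layout shown above, with one field per line. A real JSON parser would be more robust. The function names are invented for this example.

```shell
# Print the value of the top-level "problems" field of a saved report.
report_problem_count() {
  sed -n 's/^ *"problems": *\([0-9][0-9]*\).*/\1/p' "$1"
}

# Print "kind<TAB>name" for every entry in "results".
# Splitting on double quotes puts the field name in $2 and its value in $4.
report_resources() {
  awk -F'"' '
    $2 == "kind" { kind = $4 }
    $2 == "name" { print kind "\t" $4 }
  ' "$1"
}

# Succeed only when the report shows zero problems.
report_is_clean() {
  [ "$(report_problem_count "$1")" = "0" ]
}
```

A typical use would be to redirect the analysis to a file such as `report.json` and then run `report_is_clean report.json || report_resources report.json` to list the affected objects only when something was found.
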
# 2. K8sGPT Operator Installation