2024-03-23, 玄貓 (BlackCat)

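Kubernetes Resource Monitoring and Management

This article covers resource monitoring and management for Kubernetes clusters: the built-in tools Kubernetes Dashboard and Metrics Server, and the integration of Prometheus and Grafana. It gives cluster administrators practical monitoring strategies for keeping resource usage under control and improving cluster stability.

Container Technology, DevOps

Kubernetes, Resource Monitoring, Metrics Server, Prometheus, Grafana, Resource Management

Resource monitoring is essential to keeping a Kubernetes cluster stable and performant. This article shows how to monitor resources with both built-in and open source tools. Among the built-in tools, Kubernetes Dashboard offers a graphical interface for inspecting cluster state and resource usage, while Metrics Server supplies resource usage metrics. Prometheus and Grafana together add more advanced monitoring and visualization, giving administrators deeper insight into how the cluster is behaving. With these tools you can track resource usage, set up alerts, and respond to problems early so that applications keep running reliably.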

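Kubernetes Resource Monitoring

Earlier chapters discussed why resource monitoring matters for service availability in a cluster. Monitoring surfaces the early signs and symptoms of a service becoming unavailable. It is usually paired with alert management so that the right people are notified as soon as a problem or a related symptom is observed in the cluster.

This article first introduces the monitoring tools that ship with Kubernetes, Kubernetes Dashboard and Metrics Server: how to set them up and how to use them effectively. It then turns to open source tools that integrate with a Kubernetes cluster and provide deeper insight than the built-in ones.

Built-in Monitoring Tools

Let's look at the tools Kubernetes provides for monitoring its resources and objects: Metrics Server and Kubernetes Dashboard.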

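Kubernetes Dashboard

Kubernetes Dashboard provides a web UI through which cluster administrators can create, manage, and monitor cluster objects and resources, including Pods, Services, and DaemonSets. The dashboard shows the state of the cluster and any errors that occur in it.

Because the dashboard exposes everything an administrator needs to manage resources and objects in the cluster, access to it should be restricted to cluster administrators. The dashboard has had a login feature since v1.7.0. In 2018 a privilege escalation vulnerability (CVE-2018-18264) was found that allowed unauthenticated users to log in to the dashboard. No exploitation in the wild is known, but a flaw this simple could have done serious damage across many Kubernetes distributions.

The current login feature accepts either a service account token or a kubeconfig file. Logging in with a service account token is recommended.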

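# Create a service account in the default namespace
$ kubectl create serviceaccount dashboard-admin-sa

# Bind the cluster-admin role to the service account
$ kubectl create clusterrolebinding dashboard-admin-sa --clusterrole=cluster-admin --serviceaccount=default:dashboard-admin-sa

# Retrieve the service account token
# (the secret name suffix, 5zwpw here, differs per cluster; read it from the describe output)
$ kubectl describe serviceaccount dashboard-admin-sa
$ kubectl describe secrets dashboard-admin-sa-token-5zwpw

# On Kubernetes 1.24 and later no token secret is created automatically;
# request a short-lived token instead
$ kubectl create token dashboard-admin-sa

# Log in to the dashboard with the service account token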

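Using Kubernetes Dashboard

Through Kubernetes Dashboard, administrators can see resource availability, resource allocation, Kubernetes objects, and event logs.

Kubernetes Dashboard runs in a container on the master node. On a node that uses Docker as its container runtime, you can confirm this by listing the containers: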

$ docker ps | grep dashboard

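Make sure the dashboard container runs with the following settings:

Disable the insecure port: --insecure-port lets Kubernetes Dashboard accept plain HTTP requests. Make sure it is disabled in production.
Disable the insecure bind address: --insecure-bind-address should be disabled so that Kubernetes Dashboard cannot be reached over HTTP.
Bind to localhost: --bind-address should be set to 127.0.0.1 so that hosts cannot connect over the internet.
Enable TLS: use --tls-cert-file and --tls-key-file so that the dashboard is served over a secure channel.
Use token authentication: the --authentication-mode flag selects the authentication mode. It defaults to token. Make sure basic authentication is not used.
Disable insecure login: insecure login applies when the dashboard is reachable over HTTP. It should stay disabled, which is the default.
Disable skip login: skip login lets unauthenticated users reach the dashboard. --enable-skip-login turns it on; this flag should not be present in production.
Keep the settings authorizer enabled: --disable-settings-authorizer lets unauthenticated users open the settings page. This flag should not be set in production.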
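The checklist above can be turned into a quick review aid. The following is a minimal sketch, not an official tool: it assumes you have already extracted the dashboard container's argument list (for example from its Deployment manifest), and it only checks whether a risky flag is present. The function name `audit_dashboard_args` is hypothetical, and `--enable-insecure-login` is assumed to be the flag behind the "insecure login" item, which the list does not name.

```python
# Flags that the checklist says should not be active in production.
# "--enable-insecure-login" is an assumption; the checklist does not name the flag.
RISKY_FLAGS = {
    "--insecure-port": "serves the dashboard over plain HTTP",
    "--insecure-bind-address": "binds the plain HTTP listener",
    "--enable-insecure-login": "allows login over plain HTTP",
    "--enable-skip-login": "lets users skip authentication",
    "--disable-settings-authorizer": "opens the settings page to anyone",
}


def audit_dashboard_args(args):
    """Return human-readable findings for a list of dashboard container args."""
    findings = []
    for arg in args:
        # Split "--flag=value" into name and value; a bare "--flag" has an empty value.
        name, _, value = arg.partition("=")
        if name in RISKY_FLAGS and value.lower() != "false":
            findings.append(f"{name}: {RISKY_FLAGS[name]}")
        if name == "--authentication-mode" and "basic" in value.split(","):
            findings.append("--authentication-mode: basic authentication is enabled")
    return findings


if __name__ == "__main__":
    example = ["--auto-generate-certificates", "--enable-skip-login"]
    for finding in audit_dashboard_args(example):
        print(finding)
```

A presence check like this is deliberately coarse: it will flag `--insecure-port` whatever value it carries, so treat each finding as a prompt to inspect the configuration rather than as a verdict.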

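Walkthrough:

Create the service account: first create a service account named dashboard-admin-sa in the default namespace. It will carry the administrator permissions used with Kubernetes Dashboard.
Bind the role: next bind the cluster-admin role to that service account, which grants it permission to manage the whole cluster.
Retrieve the token: then read the service account's token, which is what you use to log in to Kubernetes Dashboard.
Secure the configuration: finally, run Kubernetes Dashboard with a safe production configuration, with the insecure port disabled and TLS enabled among other settings, to prevent unauthorized access.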

Figure: Kubernetes Dashboard login flow

The figure traces the steps for logging in to Kubernetes Dashboard with a service account token, from creating the service account through to the final login.

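In a Kubernetes environment, resource monitoring is key to keeping the cluster running stably. The following sections cover Metrics Server, Prometheus, and Grafana and show how to set up and use them to monitor cluster resources.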

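Metrics Server

Metrics Server is a Kubernetes component that collects resource usage data for the cluster. It gathers the data from the kubelet on each node through the Summary API and serves it through the Metrics API to other components such as the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler.

Enabling Metrics Server

On minikube, enable Metrics Server with: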

$ minikube addons enable metrics-server

To check that Metrics Server is enabled, run:

$ kubectl get apiservices | grep metrics
v1beta1.metrics.k8s.io kube-system/metrics-server True 7m17s

Using Metrics Server

Once Metrics Server is enabled, the kubectl top command shows resource usage for nodes and Pods:

$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
minikube 156m 7% 1140Mi 30%

$ kubectl top pod
NAME CPU(cores) MEMORY(bytes)
nginx-good 0m 2Mi
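If you want to act on this output in a script, the table is easy to parse. The sketch below is an illustration only: it assumes the column layout shown above, with CPU in millicores (`m`) and memory in `Mi`, and the function names `parse_top_nodes` and `nodes_at_or_above` are hypothetical.

```python
SAMPLE = """\
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
minikube 156m 7% 1140Mi 30%
"""


def _number(value, suffix):
    """Strip a required unit suffix and return the integer part."""
    if not value.endswith(suffix):
        raise ValueError(f"expected {value!r} to end with {suffix!r}")
    return int(value[: -len(suffix)])


def parse_top_nodes(text):
    """Parse the table printed by `kubectl top node` into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0].split(), lines[1:]
    nodes = []
    for row in rows:
        fields = dict(zip(header, row.split()))
        nodes.append({
            "name": fields["NAME"],
            "cpu_millicores": _number(fields["CPU(cores)"], "m"),
            "cpu_percent": _number(fields["CPU%"], "%"),
            "memory_mib": _number(fields["MEMORY(bytes)"], "Mi"),
            "memory_percent": _number(fields["MEMORY%"], "%"),
        })
    return nodes


def nodes_at_or_above(nodes, memory_percent):
    """Return the names of nodes whose memory usage meets the threshold."""
    return [n["name"] for n in nodes if n["memory_percent"] >= memory_percent]


if __name__ == "__main__":
    print(nodes_at_or_above(parse_top_nodes(SAMPLE), memory_percent=25))
```

In practice you would feed it the text captured from `kubectl top node` instead of the embedded sample; for anything beyond a quick check, querying the Metrics API directly is likely more robust than parsing a table.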

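Prometheus and Grafana

Prometheus is an open source monitoring system that collects and stores time series data. Grafana is a data visualization tool that integrates with Prometheus to present that data in dashboards.

Setting Up Prometheus

First create a namespace for monitoring:

$ kubectl create namespace monitoring

Then define a ClusterRole and a ClusterRoleBinding so that Prometheus can read Kubernetes resources. The ClusterRole looks like this: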

$ cat prometheus-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

$ kubectl create -f prometheus-role.yaml
clusterrole.rbac.authorization.k8s.io/prometheus created

Next, create a ConfigMap that holds the Prometheus scrape configuration:

$ cat config_prometheus.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      # scheme, tls_config, bearer_token_file and relabel_configs are
      # job-level settings, so they sit beside kubernetes_sd_configs
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
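The relabel rule at the end of this scrape configuration decides which discovered endpoints are kept. As a rough model, Prometheus joins the values of the source labels with `;` and keeps the target only if the whole joined string matches the regex. The sketch below imitates that with Python's `re` module; it is an approximation for illustration (Prometheus uses its own regex engine), and `keep_target` is a hypothetical name.

```python
import re

SOURCE_LABELS = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]
REGEX = "default;kubernetes;https"


def keep_target(labels, source_labels=SOURCE_LABELS, regex=REGEX, separator=";"):
    """Approximate `action: keep`: join the label values and require a full match."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, joined) is not None


if __name__ == "__main__":
    api_server = {
        "__meta_kubernetes_namespace": "default",
        "__meta_kubernetes_service_name": "kubernetes",
        "__meta_kubernetes_endpoint_port_name": "https",
    }
    dns = {
        "__meta_kubernetes_namespace": "kube-system",
        "__meta_kubernetes_service_name": "kube-dns",
        "__meta_kubernetes_endpoint_port_name": "dns",
    }
    print(keep_target(api_server), keep_target(dns))
```

With this rule, only the `https` port of the `kubernetes` Service in the `default` namespace survives, which is how the job ends up scraping just the API server.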

Using Prometheus

Prometheus runs as a Deployment. The Pod spec portion of its manifest is shown below:

spec:
  containers:
  - name: prometheus
    image: prom/prometheus:v2.12.0
    args:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus/"
    ports:
    - containerPort: 9090
    volumeMounts:
    - name: prometheus-config-volume
      mountPath: /etc/prometheus/
    - name: prometheus-storage-volume
      mountPath: /prometheus/
  volumes:
  - name: prometheus-config-volume
    configMap:
      defaultMode: 420
      name: prometheus-server-conf
  - name: prometheus-storage-volume
    emptyDir: {}

Once the Deployment is created, you can reach the Prometheus dashboard through port forwarding or a Kubernetes Service:

$ kubectl port-forward <prometheus-pod> 8080:9090 -n monitoring

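Prometheus Query Examples

A few example Prometheus queries follow. Label and metric names depend on component versions: since Kubernetes 1.16 the cAdvisor labels are container and pod rather than container_name and pod_name, and kube-state-metrics v2 replaces kube_pod_container_resource_requests_cpu_cores with kube_pod_container_resource_requests{resource="cpu"}.

Cluster-wide CPU usage: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
CPU usage by namespace: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",namespace!=""}[5m])) by (namespace)
CPU requests per Pod: sum(kube_pod_container_resource_requests_cpu_cores) by (pod)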
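Queries like these can also be run outside the web UI through the Prometheus HTTP API, whose instant query endpoint is `/api/v1/query`. The sketch below only builds the request URL and decodes a response that has already been fetched, so it makes no network calls. It assumes the port forward shown earlier is listening on localhost:8080; the function names and the sample payload are hypothetical.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def vector_by_label(payload, label):
    """Map one label's value to the sample value of an instant vector result."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {labels}, "value": [timestamp, "value as string"]}.
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in data["result"]}


if __name__ == "__main__":
    # Assumption: `kubectl port-forward ... 8080:9090` is running locally.
    query = "sum(kube_pod_container_resource_requests_cpu_cores) by (pod)"
    print(instant_query_url("http://localhost:8080", query))
```

Fetching the URL with any HTTP client and passing the decoded JSON to `vector_by_label(payload, "pod")` would give a mapping from Pod name to requested CPU cores.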

Setting Up Grafana

Grafana needs a data source, which can be supplied through a ConfigMap:

$ cat grafana-data.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
data:
  prometheus.yaml: |-
    {
      "apiVersion": 1,
      "datasources": [
        {
          "name": "Prometheus",
          "type": "prometheus",
          "url": "http://prometheus:9090",
          "access": "proxy",
          "isDefault": true
        }
      ]
    }

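Real-Time Monitoring and Resource Management for Kubernetes Clusters

In modern DevOps and Kubernetes operations, resource monitoring and management are key to keeping a cluster stable. This part looks at the resource management and monitoring tools in Kubernetes that help administrators control cluster resources and improve security.

Resource Requests and Limits: Core Concepts

In Kubernetes, resource requests and resource limits are the basis for keeping Pods running properly.

A resource request is the amount of a resource that the scheduler reserves for a container; a Pod is only placed on a node with enough unreserved capacity to cover its requests.
A resource limit is the maximum amount a container may use, which prevents a single Pod from exhausting the resources it shares with other workloads.

Configuration Example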

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example-container
    image: nginx:latest
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "1000m"

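Walkthrough:

requests.memory: "256Mi" reserves 256 MiB of memory for the container when the Pod is scheduled.
limits.memory: "512Mi" caps the container's memory at 512 MiB.
CPU is specified in millicores: "500m" is half a core and "1000m" is one full core.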
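To make the units concrete, here is a small sketch that converts the quantity strings from the example into plain numbers. It handles only the common suffixes (`m` for CPU, and `Ki`/`Mi`/`Gi`/`Ti` or `k`/`M`/`G`/`T` for memory), not the full Kubernetes quantity grammar, and the function names are hypothetical.

```python
from decimal import Decimal

# Binary suffixes are powers of two; decimal suffixes are powers of ten.
MEMORY_FACTORS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as "500m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity):
    """Convert a memory quantity such as "256Mi" or "1G" to bytes."""
    for suffix, factor in MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))


if __name__ == "__main__":
    print(cpu_to_millicores("500m"), memory_to_bytes("256Mi"))
    print(cpu_to_millicores("1000m"), memory_to_bytes("512Mi"))
```

Note the difference between `Mi` and `M`: "256Mi" is 268,435,456 bytes, while "256M" would be 256,000,000 bytes.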

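Namespace Resource Quotas and Limit Ranges

Namespace-level resource management keeps the resource usage of different teams or projects under control.

ResourceQuota: caps the total resource usage of all Pods in a namespace.
LimitRange: sets default resource requests and limits for Pods in a namespace that do not declare their own.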
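The defaulting behaviour of a LimitRange can be modelled in a few lines. This is a simplified sketch, not the admission controller's actual code: it assumes a LimitRange that provides only `defaultRequest` and `default` (limit) values, ignores min/max validation, and follows the documented rule that a container which sets a limit but no request gets a request equal to its limit. The name `apply_limit_range_defaults` is hypothetical.

```python
def apply_limit_range_defaults(resources, default_request, default_limit):
    """Fill in missing requests and limits for one container's resources."""
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))
    for name in sorted(set(default_request) | set(default_limit)):
        had_limit = name in limits
        if not had_limit and name in default_limit:
            limits[name] = default_limit[name]
        if name not in requests:
            if had_limit:
                # An explicit limit without a request: the request follows the limit.
                requests[name] = limits[name]
            elif name in default_request:
                requests[name] = default_request[name]
    return {"requests": requests, "limits": limits}


if __name__ == "__main__":
    defaults_req = {"cpu": "250m", "memory": "128Mi"}
    defaults_lim = {"cpu": "500m", "memory": "256Mi"}
    print(apply_limit_range_defaults({}, defaults_req, defaults_lim))
    print(apply_limit_range_defaults({"limits": {"memory": "1Gi"}}, defaults_req, defaults_lim))
```

The second call shows why defaults matter for scheduling: the container that declared only a 1Gi memory limit ends up requesting 1Gi as well, not the namespace default.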

ResourceQuota Configuration Example

apiVersion: v1
kind: ResourceQuota
metadata:
  name: memory-limit
  namespace: development
spec:
  hard:
    limits.memory: 2Gi
    requests.memory: 1Gi

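Walkthrough:

limits.memory: 2Gi keeps the sum of memory limits across all Pods in the development namespace at or below 2 GiB.
requests.memory: 1Gi keeps the sum of memory requests across those Pods at or below 1 GiB.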
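The arithmetic behind the quota is a running sum compared against each hard cap. The sketch below illustrates it for the memory-limit quota above; it is a simplified model that only understands `Ki`, `Mi`, and `Gi` suffixes and treats each Pod as a flat dict of totals. The names `exceeded_quota` and `to_bytes` are hypothetical.

```python
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

# The hard caps from the ResourceQuota example.
HARD = {"limits.memory": "2Gi", "requests.memory": "1Gi"}


def to_bytes(quantity):
    """Convert a memory quantity with a binary suffix to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def exceeded_quota(existing_pods, new_pod, hard=HARD):
    """Return the quota keys that admitting new_pod would push over the cap."""
    exceeded = []
    for key, cap in hard.items():
        used = sum(to_bytes(pod[key]) for pod in existing_pods)
        if used + to_bytes(new_pod[key]) > to_bytes(cap):
            exceeded.append(key)
    return exceeded


if __name__ == "__main__":
    pod = {"requests.memory": "256Mi", "limits.memory": "512Mi"}
    print(exceeded_quota([pod] * 3, pod))  # a fourth Pod exactly fills the quota
    print(exceeded_quota([pod] * 4, pod))  # a fifth Pod would exceed both caps
```

With Pods sized like the earlier example (256Mi requested, 512Mi limit), this quota admits exactly four of them in the namespace.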

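Built-in Kubernetes Monitoring Tools

Kubernetes offers several built-in monitoring tools, including:

Kubernetes Dashboard: a visual cluster management interface for monitoring cluster state.
Metrics Server: collects usage metrics for resources across the cluster.

Metrics Server Deployment Example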

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

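Walkthrough:

Metrics Server collects node and Pod metrics from the kubelet on each node and serves them through the Kubernetes API.
Combined with the kubectl top command, it shows current resource usage.

Prometheus and Grafana: An Advanced Monitoring Solution

Prometheus and Grafana are among the most popular open source monitoring combinations, providing powerful data collection and visualization.

Architecture Diagram of the Monitored Cluster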

@startuml
skinparam backgroundColor #FEFEFE
skinparam componentStyle rectangle

title Kubernetes Resource Monitoring and Management

package "Kubernetes Cluster" {
    package "Control Plane" {
        component [API Server] as api
        component [Controller Manager] as cm
        component [Scheduler] as sched
        database [etcd] as etcd
    }

    package "Worker Nodes" {
        component [Kubelet] as kubelet
        component [Kube-proxy] as proxy
        package "Pods" {
            component [Container 1] as c1
            component [Container 2] as c2
        }
    }
}

api --> etcd : store state
api --> cm : control loops
api --> sched : scheduling decisions
api --> kubelet : issue instructions
kubelet --> c1
kubelet --> c2
proxy --> c1 : network proxy
proxy --> c2

note right of api
  Core API entry point
  All operations pass through here
end note

@enduml

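The diagram shows the control plane and worker node components of the cluster that Prometheus and Grafana monitor; Prometheus and Grafana themselves are not drawn.

Grafana Configuration Example

Create a ConfigMap to hold the Grafana data source configuration: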

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-service:9090

Deploy Grafana:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        ports:
        - containerPort: 3000

Walkthrough:

The ConfigMap stores Grafana's data source configuration.
Once deployed, Grafana can be exposed outside the cluster through a NodePort Service.