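The previous chapter, 8-5 Implementing Kubernetes-apiserver and CoreDNS Service Discovery in Prometheus, used a Prometheus installed inside the K8s cluster, where adding service discovery is more convenient. Prometheus can be installed in several ways; for details see 8-1 Installing Prometheus with the Operator and from Binaries.
A binary-deployed Prometheus is a monitoring system outside the cluster. Configuring service discovery for it involves creating a user, granting permissions, adding jobs, and rewriting labels.
Create the user prometheus (a ServiceAccount) and its password (a token Secret):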
---
# Create the user
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
# Create the password (token)
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: monitoring-token
  namespace: monitoring
  annotations:
    kubernetes.io/service-account.name: "prometheus"
Define the permissions and bind them to the user:
---
# Define the permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  # Core resources: read and watch
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  # Configuration resources: read-only
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
# Bind the permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
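Before moving on, it can help to confirm that the binding really grants what the ClusterRole lists. The following is a minimal sketch, assuming kubectl is run with an account that is allowed to impersonate ServiceAccounts; the function name check_prometheus_rbac is hypothetical and not part of the original setup.

```shell
# Hypothetical helper: ask the API server whether the prometheus
# ServiceAccount may perform each verb granted on the core resources.
check_prometheus_rbac() {
  sa="system:serviceaccount:monitoring:prometheus"
  for resource in nodes services endpoints pods; do
    for verb in get list watch; do
      # can-i prints "yes" or "no"; it exits non-zero on "no", hence || true
      answer="$(kubectl auth can-i "$verb" "$resource" --as="$sa")" || true
      printf '%-6s %-10s %s\n' "$verb" "$resource" "$answer"
    done
  done
  # /metrics is a non-resource URL, so it is checked by path
  answer="$(kubectl auth can-i get /metrics --as="$sa")" || true
  printf '%-6s %-10s %s\n' "get" "/metrics" "$answer"
}
```

Calling check_prometheus_rbac on a machine with cluster access prints one line per verb and resource. Any "no" in the output suggests the ClusterRole or the ClusterRoleBinding was not applied as intended.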
On the K8s cluster, inspect the Secret and copy the value of token:
sudo kubectl describe secret monitoring-token -n monitoring
Name: monitoring-token
Namespace: monitoring
Labels:
Annotations: kubernetes.io/service-account.name: prometheus
             kubernetes.io/service-account.uid: da94b15f-55bb-4eba-9d20-52f0b33a9852
Type: kubernetes.io/service-account-token
Data
====
ca.crt: 1302 bytes
namespace: 10 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IkF3Y3h6QklHbXh3S1g3Nl9LNlBIcTNTVEQ3MWpJRU9NcEdJM2hTZDU4SzgifQ.eyJpc3
MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5poTd1njdbdMYlmaUuAPIT_5hY5D3pRgabQ6tysWc0QuFN_mn6U-E
nbBlka6ZUB3gjlvk4XBKZJutqHyFHtkc6RYN98kKSPeRBCXFd8vZROx9PsOjL1uIseox4IeaZ8BvGje3RkGHiyTp_djmc8eyBBA6DwtKKldsd
3hhuD0eX2hbbg2YZVbiYOkLK976gL5pX_8BPQeZ66McDTCPlaoiYOIcegVGwZs49kA4YlYV_A5bO8WUSvnKQfPK_74qLy0BGp-rx0gjTc7w
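On the Prometheus server outside the K8s cluster, paste the token value. The file must hold the token as a single line; it is wrapped here only for display: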
vim /apps/prometheus/k8s.token
eyJhbGciOiJSUzI1NiIsImtpZCI6IkF3Y3h6QklHbXh3S1g3Nl9LNlBIcTNTVEQ3MWpJRU9NcEdJM2hTZDU4SzgifQ.eyJpc3MiOiJrdWJlcm5l
GVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5poTd1njdbdMYlmaUuAPIT_5hY5D3pRgabQ6tysWc0QuFN_mn6U-EnbBlka6ZUB3gjlvk4
XBKZJutqHyFHtkc6RYN98kKSPeRBCXFd8vZROx9PsOjL1uIseox4IeaZ8BvGje3RkGHiyTp_djmc8eyBBA6DwtKKldsd3hhuD0eX2hbbg2YZVbi
OkLK976gL5pX_8BPQeZ66McDTCPlaoiYOIcegVGwZs49kA4YlYV_A5bO8WUSvnKQfPK_74qLy0BGp-rx0gjTc7w
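Copying a long token by hand is error-prone, since a single dropped character invalidates it. As an alternative, the token can be read from the Secret and written to a file directly. This is a sketch under assumptions: the function name export_sa_token is hypothetical, kubectl must be able to reach the cluster from where it runs, and base64 -d is the GNU coreutils spelling of the decode flag.

```shell
# Hypothetical helper: read the token from the Secret and save it to a file.
# Secret data is base64-encoded in the API, so it is decoded before saving.
export_sa_token() {
  secret="$1"
  namespace="$2"
  outfile="$3"
  kubectl get secret "$secret" -n "$namespace" -o jsonpath='{.data.token}' \
    | base64 -d > "$outfile"
  # The token grants cluster read access, so restrict it to the owner
  chmod 600 "$outfile"
}
```

For example, export_sa_token monitoring-token monitoring /apps/prometheus/k8s.token writes the file used in this article. If kubectl only works on a cluster node, the file can be generated there and then copied to the Prometheus server.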
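Next, edit the global Prometheus configuration, then add the jobs that collect nodes, pods, services, and endpoints one by one.
On the binary-deployed Prometheus server, locate the configuration file and edit it: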
vim /apps/prometheus/prometheus.yml
# my global config
global:
  # Scrape targets every 15 seconds
  scrape_interval: 15s
  # Evaluate rules every 15 seconds
  evaluation_interval: 15s

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'
    # Example job: scrape this server's own metrics
    static_configs:
      - targets: ["localhost:9090"]
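Whenever prometheus.yml changes, it is worth validating the file before the running server picks it up, because an indentation slip in YAML is easy to make. The sketch below assumes promtool (shipped in the Prometheus release tarball) is on PATH and that the server process is named prometheus; the function name validate_and_reload and the PROM_CONFIG variable are hypothetical.

```shell
# Path taken from this article; override via the environment if it differs
PROM_CONFIG="${PROM_CONFIG:-/apps/prometheus/prometheus.yml}"

# Hypothetical helper: reload Prometheus only if the config passes validation.
validate_and_reload() {
  # promtool exits non-zero on an invalid file, so a broken config is never loaded
  promtool check config "$PROM_CONFIG" || return 1
  # SIGHUP asks a running Prometheus to re-read its configuration
  pkill -HUP -x prometheus
}
```

Run validate_and_reload after each edit. If the server was started with --web.enable-lifecycle, sending an HTTP POST to the /-/reload endpoint is an alternative to the signal.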
At the end of the prometheus.yml shown above, append the API Server job:
  # API Server discovery
  - job_name: 'kubernetes-apiservers-monitor'
    kubernetes_sd_configs:
    - role: endpoints
      # One master address is enough; all three are discovered automatically.
      api_server: https://192.168.100.191:6443
      # Token required to connect to this master
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /apps/prometheus/k8s.token
    scheme: https
    # Token required to connect to the other masters
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /apps/prometheus/k8s.token
    relabel_configs:
    # Keep only targets whose labels match these values
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
    # Rewrite the discovered port and scheme
    - source_labels: [__address__]
      regex: '(.*):6443'
      replacement: '${1}:9100'
      target_label: __address__
      action: replace
    - source_labels: [__scheme__]
      regex: https
      replacement: http
      target_label: __scheme__
      action: replace
In the resulting targets, the service port and scheme have changed to 9100 and http.
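The two relabel steps of the API Server job can be tried out locally. The sketch below emulates them with grep and sed; it is an approximation rather than Prometheus itself, and the function names are hypothetical. Two facts it relies on: Prometheus joins multiple source_labels with ";" (no spaces) by default, and it anchors every relabel regex at both ends.

```shell
# Emulates the keep action: the three source labels are joined with ";"
# and the whole string must match (grep -x forces a full-line match).
keep_apiserver_target() {
  printf '%s;%s;%s\n' "$1" "$2" "$3" | grep -Eqx 'default;kubernetes;https'
}

# Emulates the replace action on __address__:
# regex '(.*):6443', replacement '${1}:9100', anchored with ^ and $.
rewrite_apiserver_address() {
  printf '%s\n' "$1" | sed -E 's/^(.*):6443$/\1:9100/'
}

rewrite_apiserver_address "192.168.100.191:6443"   # prints 192.168.100.191:9100
rewrite_apiserver_address "192.168.100.191:10250"  # no match, printed unchanged
```

Because the joined string contains no spaces, a keep regex written with spaces after the semicolons would match nothing and every target would be dropped.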
At the end of prometheus.yml, append the Node discovery job:
  # Node discovery
  - job_name: 'kubernetes-nodes-monitor'
    # Connect to a master to get the cluster's node information
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.100.192:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /apps/prometheus/k8s.token  # token required to connect to the api-server
    scheme: http
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /apps/prometheus/k8s.token
    relabel_configs:
    - source_labels: [__address__]
      # 10250 is the kubelet port, i.e. the node itself.
      regex: '(.*):10250'
      # Rewrite to the exporter port 9100 to scrape node metrics.
      replacement: '${1}:9100'
      target_label: __address__
      action: replace
    - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
      regex: '(.*)'
      replacement: '${1}'
      action: replace
      target_label: LOC
    - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
      regex: '(.*)'
      replacement: 'NODE'
      action: replace
      target_label: Type
    - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
      regex: '(.*)'
      replacement: 'K8S-test'
      action: replace
      target_label: Env
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
The masters also run kubelet, so the cluster shows 6 nodes.
At the end of prometheus.yml, append the namespace Pod discovery job:
  # Pods in a specific namespace
  - job_name: 'kubernetes-namespace-pod'
    kubernetes_sd_configs:
    - role: pod
      api_server: https://192.168.100.193:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /apps/prometheus/k8s.token
      # Restrict discovery to the monitoring namespace
      namespaces:
        names:
        - monitoring
    relabel_configs:
    # Keep these labels and their values
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    # Rename labels
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
Of the 7 discovered instances, 6 can be scraped; 1 pod is down because cadvisor is not installed.
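The up and down counts can also be read from the command line instead of the web UI. This is a rough sketch that queries the Prometheus HTTP API endpoint /api/v1/targets and counts the health values with plain text tools. It assumes the server listens on localhost:9090 as in the example job above and that the JSON response is compact (no spaces around colons); the function name summarize_target_health and the PROM_URL variable are hypothetical.

```shell
# Address of the Prometheus server; override via the environment if needed
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: count active targets per health state (up, down, unknown).
summarize_target_health() {
  curl -s "$PROM_URL/api/v1/targets?state=active" \
    | grep -o '"health":"[a-z]*"' \
    | sort | uniq -c
}
```

Calling summarize_target_health prints one line per health state with its count across all jobs. A JSON-aware tool would be more robust than grep if one is available on the server.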
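To discover custom Pods, first add the annotation prometheus.io/scrape with the value true when creating the Pod. Service discovery exposes it as the label __meta_kubernetes_pod_annotation_prometheus_io_scrape: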
apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
...
Then, at the end of prometheus.yml, append the custom Pod discovery job:
  # Custom Pod discovery
  - job_name: 'kubernetes-condition-pod'
    kubernetes_sd_configs:
    - role: pod
      api_server: https://192.168.100.191:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /apps/prometheus/k8s.token
    relabel_configs:
    # Keep only Pods that have scrape enabled
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    - action: labelmap
      # Keep labels with this prefix and their values
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      # Rename the label
      action: replace
      target_label: kubernetes_pod_name
    - source_labels: [__meta_kubernetes_pod_label_pod_template_hash]
      regex: '(.*)'
      replacement: 'K8S-test'
      action: replace
      target_label: Env
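The address rule in this job is the least obvious one: it joins __address__ and the prometheus.io/port annotation with ";" and then swaps the port. The sketch below emulates it locally. It is an approximation: sed -E has no non-capturing groups or \d, so the regex is translated to an equivalent form with an extra capture group, and the function name and the sample Pod IPs are hypothetical.

```shell
# Emulates: source_labels [__address__, ...prometheus_io_port],
# regex ([^:]+)(?::\d+)?;(\d+), replacement $1:$2.
# If the regex does not match (for example, no port annotation),
# Prometheus leaves __address__ unchanged, so the original is printed.
rewrite_pod_address() {
  addr="$1"
  port="$2"
  joined="${addr};${port}"
  pattern='^([^:]+)(:[0-9]+)?;([0-9]+)$'
  if printf '%s\n' "$joined" | grep -Eq "$pattern"; then
    # Group 1 is the host, group 3 is the annotated port
    printf '%s\n' "$joined" | sed -E "s/$pattern/\1:\3/"
  else
    printf '%s\n' "$addr"
  fi
}

rewrite_pod_address "10.244.1.5:8080" "9102"  # prints 10.244.1.5:9102
rewrite_pod_address "10.244.1.5:8080" ""      # prints 10.244.1.5:8080
```

In other words, a Pod that sets prometheus.io/port is scraped on that port, and a Pod without the annotation keeps the address that service discovery reported.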
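Prometheus has now discovered the 6 Pods that meet the condition, but all of them are down. A Prometheus exporter still has to be installed in the Pods before monitoring displays normally.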