
K8s Remote Debugging: Are You Doing It Right?


Preface

This article walks through remote-debugging each of the Kubernetes system components. It is written for anyone on macOS or Windows, or anyone who otherwise cannot debug conveniently on a local machine. In a codebase as large as Kubernetes, stepping through the code gives you a much deeper grasp of the logic: jumps between functions become easy to follow, and printing variables lets you verify your assumptions.

Preparation

1. GoLand IDE. To do good work, first sharpen your tools. GoLand ships with the dlv (Delve) debugger built in. Both dlv and gdb can debug Go programs, but dlv has far better support for goroutines than gdb.

GoLand download:

https://www.jetbrains.com/go/download/#section=windows

2. Development environment: Go toolchain + dlv + Kubernetes source. Set up a Kubernetes cluster on a remote Linux server (kubeadm is a convenient way to install one). Install the Go toolchain, dlv, and the Kubernetes source on that server as well, so the k8s binaries can be rebuilt: by default they are compiled with optimizations on and debug info stripped. dlv is used to run a headless debug server that the local IDE connects to for remote debugging.

Environment versions: Kubernetes: v1.14.6

Etcd:3.3.12

Docker:18.09.9

dlv:1.4.0

Golang:1.12.7
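If dlv is not on the remote server yet, it can be installed from source. A sketch (with Go 1.12 in GOPATH mode, go get builds the binary into $GOPATH/bin; adjust for your environment):

```shell
# go get -u github.com/go-delve/delve/cmd/dlv
# dlv version
```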

3. Clone the Kubernetes source locally

# mkdir -p $GOPATH/src/k8s.io/
# cd $GOPATH/src/k8s.io/

Clone the Kubernetes source:

# git clone https://github.com/kubernetes/kubernetes.git

Check out the tag matching the cluster version above:

# git checkout v1.14.6

Getting Ready

1. Adjust the build flags. When building the v1.14.6 binaries, the default build hard-codes the -s -w linker flags: -s disables the symbol table, and -w disables DWARF generation, so neither symbols nor debug info survive. Run go tool link for the full list of linker flags.

vim k8s.io/kubernetes/hack/lib/golang.sh

# goldflags="${GOLDFLAGS:-} -s -w $(kube::version::ldflags)"
goldflags="${GOLDFLAGS:-} $(kube::version::ldflags)"
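The same substitution can be scripted. A self-contained sketch that shows the effect on the literal goldflags line (in a real checkout you would run the sed against hack/lib/golang.sh itself):

```shell
# Demonstrate the edit: drop the hard-coded "-s -w" pair from the goldflags line.
line='goldflags="${GOLDFLAGS:-} -s -w $(kube::version::ldflags)"'
echo "$line" | sed 's/ -s -w / /'
# prints: goldflags="${GOLDFLAGS:-} $(kube::version::ldflags)"
```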

2. Rebuild the k8s binaries, since by default they are built with optimizations enabled and debug info stripped. GOGCFLAGS="all=-N -l" additionally disables compiler optimizations (-N) and inlining (-l), so single-stepping matches the source.

[root@ice /data/Documents/src/k8s.io/kubernetes]# GO111MODULE=off KUBE_GIT_TREE_STATE=clean KUBE_GIT_VERSION=v1.14.6 make all GOGCFLAGS="all=-N -l"
+++ [0425 17:34:35] Building go targets for linux/amd64:
    cmd/kube-proxy
    cmd/kube-apiserver
    cmd/kube-controller-manager
    cmd/cloud-controller-manager
    cmd/kubelet
    cmd/kubeadm
    cmd/hyperkube
    cmd/kube-scheduler
    vendor/k8s.io/apiextensions-apiserver
    cluster/gce/gci/mounter
    cmd/kubectl
    cmd/gendocs
    cmd/genkubedocs
    cmd/genman
    cmd/genyaml
    cmd/genswaggertypedocs
    cmd/linkcheck
    vendor/github.com/onsi/ginkgo/ginkgo
    test/e2e/e2e.test
    cmd/kubemark
    vendor/github.com/onsi/ginkgo/ginkgo
    test/e2e_node/e2e_node.test
[root@ice /data/Documents/src/k8s.io/kubernetes]# ll _output/bin/
total 2893888
-rwxr-xr-x 1 root root  62531871 Apr 25 17:42 apiextensions-apiserver
-rwxr-xr-x 1 root root 137590835 Apr 25 17:43 cloud-controller-manager
-rwxr-xr-x 1 root root   6771616 Apr 22 09:41 conversion-gen
-rwxr-xr-x 1 root root   6771520 Apr 22 09:41 deepcopy-gen
-rwxr-xr-x 1 root root   6738784 Apr 22 09:41 defaulter-gen
-rwxr-xr-x 1 root root 250845049 Apr 25 17:43 e2e.test
-rwxr-xr-x 1 root root 292344015 Apr 25 17:43 e2e_node.test
-rwxr-xr-x 1 root root  62318688 Apr 25 17:43 gendocs
-rwxr-xr-x 1 root root 321355263 Apr 25 17:42 genkubedocs
-rwxr-xr-x 1 root root 333735462 Apr 25 17:43 genman
-rwxr-xr-x 1 root root   5956441 Apr 25 17:43 genswaggertypedocs
-rwxr-xr-x 1 root root  62312802 Apr 25 17:43 genyaml
-rwxr-xr-x 1 root root  12754284 Apr 25 17:43 ginkgo
-rwxr-xr-x 1 root root   2063232 Apr 22 09:42 go-bindata
-rwxr-xr-x 1 root root   4709220 Feb 21 09:32 go2make
-rwxr-xr-x 1 root root 326576409 Apr 25 17:43 hyperkube
-rwxr-xr-x 1 root root 194326902 Apr 25 17:43 kube-apiserver
-rwxr-xr-x 1 root root 160177458 Apr 25 17:42 kube-controller-manager
-rwxr-xr-x 1 root root  54119344 Apr 25 17:43 kube-proxy
-rwxr-xr-x 1 root root  58112547 Apr 25 17:43 kube-scheduler
-rwxr-xr-x 1 root root  58810053 Apr 25 17:43 kubeadm
-rwxr-xr-x 1 root root  63338080 Apr 25 17:43 kubectl
-rwxr-xr-x 1 root root 231795512 Apr 25 17:42 kubelet
-rwxr-xr-x 1 root root 224980977 Apr 25 17:43 kubemark
-rwxr-xr-x 1 root root   8100012 Apr 25 17:43 linkcheck
-rwxr-xr-x 1 root root   2462303 Apr 25 17:42 mounter
-rwxr-xr-x 1 root root  11642368 Apr 22 09:42 openapi-gen

The rebuilt binaries land in the _output/bin/ directory of the source tree.
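A quick sanity check that the rebuild kept debug information: an unstripped ELF binary is reported as "not stripped" by file(1), which is what dlv needs.

```shell
# file _output/bin/kube-apiserver    # expect "... not stripped" in the output
```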

Hands-On Debugging

1. Debugging kube-apiserver

On a kubeadm-installed cluster, kube-apiserver runs as a static pod. To debug it, you have to stop the containerized kube-apiserver and launch it from the command line instead. The flags it is started with can be read from its manifest file.

The default static pod directory is

/etc/kubernetes/manifests/

[root@ice ~]# cat /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
    tke.prometheus.io/scrape: "true"
    prometheus.io/scheme: "https"
    prometheus.io/port: "6443"
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=1.2.3.4
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --insecure-port=0
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-cluster-ip-range=172.20.252.0/22
    - --service-node-port-range=80-60000
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    - --token-auth-file=/etc/kubernetes/known_tokens.csv
    image: registry.com/kube-apiserver:v1.14.6
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 1.2.3.4
        path: /healthz
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-apiserver
    resources:
      requests:
        cpu: 250m
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /etc/kubernetes
      name: vol-dir-0
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /etc/kubernetes
      type: ""
    name: vol-dir-0
status: {}

The command section is the startup command plus its flags.

Stop the kube-apiserver static pod:

# mv /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/
# docker ps -a |grep kube-apiserver

If no kube-apiserver container is listed, it has stopped. Now launch kube-apiserver via dlv:

# dlv exec _output/bin/kube-apiserver --headless -l 1.2.3.4:2345 --api-version=2 -- \
    --advertise-address=1.2.3.4 \
    --allow-privileged=true \
    --authorization-mode=Node,RBAC \
    --client-ca-file=/etc/kubernetes/pki/ca.crt \
    --enable-admission-plugins=NodeRestriction \
    --enable-bootstrap-token-auth=true \
    --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \
    --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
    --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key \
    --etcd-servers=https://127.0.0.1:2379 \
    --insecure-port=0 \
    --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt \
    --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key \
    --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \
    --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt \
    --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key \
    --requestheader-allowed-names=front-proxy-client \
    --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \
    --requestheader-extra-headers-prefix=X-Remote-Extra- \
    --requestheader-group-headers=X-Remote-Group \
    --requestheader-username-headers=X-Remote-User \
    --secure-port=6443 \
    --service-account-key-file=/etc/kubernetes/pki/sa.pub \
    --service-cluster-ip-range=172.20.252.0/22 \
    --service-node-port-range=80-60000 \
    --tls-cert-file=/etc/kubernetes/pki/apiserver.crt \
    --tls-private-key-file=/etc/kubernetes/pki/apiserver.key \
    --token-auth-file=/etc/kubernetes/known_tokens.csv
API server listening at: 1.2.3.4:2345

Open GoLand, go to Run/Debug Configurations, and add a new configuration of type Go Remote.

Set a breakpoint in kube-apiserver's startup function, then click Debug (the green bug button).

The breakpoint is hit!
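If you prefer a terminal to GoLand, the same headless server also accepts the dlv command-line client. A sketch (the breakpoint location is just an example; every Go binary has main.main):

```shell
# dlv connect 1.2.3.4:2345
(dlv) break main.main
(dlv) continue
```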

2. Debugging kube-scheduler

Once you are done with kube-apiserver, restore its manifest file first; then remote-debug kube-scheduler the same way:

# mv /etc/kubernetes/kube-apiserver.yaml /etc/kubernetes/manifests/

Stop the kube-scheduler static pod and launch kube-scheduler via dlv:

# mv /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/
# dlv exec _output/bin/kube-scheduler --headless -l 1.2.3.4:2345 --api-version=2 -- \
    --bind-address=127.0.0.1 \
    --kubeconfig=/etc/kubernetes/scheduler.conf \
    --leader-elect=true \
    --policy-config-file=/etc/kubernetes/scheduler-policy-config.json \
    --use-legacy-policy-config=true

kube-scheduler breakpoint captured.

3. Debugging kube-controller-manager. Same recipe, different flavor! After finishing the previous component, restore it first. Then stop the kube-controller-manager static pod and launch it via dlv:

# mv /etc/kubernetes/manifests/kube-controller-manager.yaml /etc/kubernetes/
# dlv exec _output/bin/kube-controller-manager --headless -l 1.2.3.4:2345 --api-version=2 -- \
    --allocate-node-cidrs=true \
    --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf \
    --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf \
    --bind-address=127.0.0.1 \
    --client-ca-file=/etc/kubernetes/pki/ca.crt \
    --cluster-cidr=172.20.0.0/16 \
    --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt \
    --cluster-signing-key-file=/etc/kubernetes/pki/ca.key \
    --controllers=*,bootstrapsigner,tokencleaner \
    --kubeconfig=/etc/kubernetes/controller-manager.conf \
    --leader-elect=true \
    --node-cidr-mask-size=24 \
    --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \
    --root-ca-file=/etc/kubernetes/pki/ca.crt \
    --service-account-private-key-file=/etc/kubernetes/pki/sa.key \
    --service-cluster-ip-range=172.20.252.0/22 \
    --use-service-account-credentials=true

kube-controller-manager breakpoint captured.

4. Debugging kube-proxy. Restore the previous component first. kube-proxy is not started as a static pod;

it runs as a DaemonSet, with its configuration supplied by a ConfigMap.

First, take a look at kube-proxy's YAML definitions:

[root@ice ~]# kubectl -n kube-system get ds kube-proxy -o yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: "2020-04-15T02:54:27Z"
  generation: 1
  labels:
    k8s-app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "2298922"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/daemonsets/kube-proxy
  uid: 6b7f7476-7ec4-11ea-a28b-52540002fcf3
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-proxy
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-proxy
    spec:
      containers:
      - command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf
        - --hostname-override=$(NODE_NAME)
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: registry.com/kube-proxy:v1.14.6
        imagePullPolicy: IfNotPresent
        name: kube-proxy
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kube-proxy
          name: kube-proxy
        - mountPath: /run/xtables.lock
          name: xtables-lock
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-proxy
      serviceAccountName: kube-proxy
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          name: kube-proxy
        name: kube-proxy
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
      - hostPath:
          path: /lib/modules
          type: ""
        name: lib-modules
  templateGeneration: 1
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberAvailable: 3
  numberMisscheduled: 0
  numberReady: 3
  observedGeneration: 1
  updatedNumberScheduled: 3

[root@ice ~]# kubectl -n kube-system get cm kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: ""
    configSyncPeriod: 15m0s
    conntrack:
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://1.2.3.4:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-04-15T02:54:27Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "252"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: 6b7cec7f-7ec4-11ea-a28b-52540002fcf3

The startup flags are in the DaemonSet's command section; the configuration files are the ConfigMap's config.conf and kubeconfig.conf

(when debugging, $HOME/.kube/config can stand in for the kubeconfig).

Stop the running kube-proxy pods:

# kubectl -n kube-system edit daemonsets. kube-proxy
      nodeSelector:
        beta.kubernetes.io/os: linux2

A sneaky way to stop it: change the nodeSelector so that no node's labels match, and the DaemonSet pods are torn down.
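The same trick can be applied non-interactively with kubectl patch. A sketch of the equivalent one-liner (same idea: point nodeSelector at a label value no node carries):

```shell
# kubectl -n kube-system patch ds kube-proxy --type merge \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"beta.kubernetes.io/os":"linux2"}}}}}'
```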

Launch kube-proxy via dlv:

# vim /root/config.conf
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
clientConnection:
  acceptContentTypes: ""
  burst: 10
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /root/.kube/config
  qps: 5
clusterCIDR: ""
configSyncPeriod: 15m0s
conntrack:
  maxPerCore: 32768
  min: 131072
  tcpCloseWaitTimeout: 1h0m0s
  tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: ""
iptables:
  masqueradeAll: false
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: ""
  strictARP: false
  syncPeriod: 30s
kind: KubeProxyConfiguration
metricsBindAddress: 127.0.0.1:10249
mode: ""
nodePortAddresses: null
oomScoreAdj: -999
portRange: ""
resourceContainer: /kube-proxy
udpIdleTimeout: 250ms
winkernel:
  enableDSR: false
  networkName: ""
  sourceVip: ""
# dlv exec _output/bin/kube-proxy --headless -l 1.2.3.4:2345 --api-version=2 -- \
    --config=/root/config.conf \
    --hostname-override=1.2.3.4

kube-proxy breakpoint captured.

5. Debugging kubelet. Restore the previous component first. kubelet starts differently from all of the above: it runs directly on the host as a systemd service. Take a look at kubelet's systemd drop-in:

[root@ice ~]# cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

As usual, stop the kubelet service first:

# systemctl stop kubelet.service
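The exact flags to pass to dlv come from expanding the unit's environment variables, which can be inspected on the host:

```shell
# systemctl cat kubelet                     # the unit file plus all drop-ins
# cat /var/lib/kubelet/kubeadm-flags.env    # supplies $KUBELET_KUBEADM_ARGS
```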

Expanding those environment variables yields the kubelet dlv command:

# dlv exec _output/bin/kubelet --headless -l 1.2.3.4:2345 --api-version=2 -- \
    --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
    --kubeconfig=/etc/kubernetes/kubelet.conf \
    --config=/var/lib/kubelet/config.yaml \
    --cgroup-driver=cgroupfs \
    --hostname-override=1.2.3.4 \
    --network-plugin=cni \
    --pod-infra-container-image=registry.com/pause:3.1

kubelet breakpoint captured.

That's a wrap!

Reference ("dlv vs gdb"): http://lday.me/2017/02/27/0005_gdb-vs-dlv/

More from Ice yao:

Still troubled by container time zones? Here's a remedy!

Mastering the K8s AdmissionWebhook

