How to Protect Kubernetes Critical Pods

Published: 2021-12-20 10:18:05 Author: iii
Source: Yisu Cloud (亿速云)

This article explains how kubelet protects Kubernetes critical pods by walking through the relevant kubelet source code: the eviction manager's admission check and its eviction loop.

Kubelet Eviction Manager Admit

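In syncLoop, kubelet calls syncLoopIteration in a loop (the sync channel ticks every 1s). Each iteration takes an event from the config change channel, the PLEG channel, the sync channel, the housekeeping channel, or the liveness manager's update channel, and calls the matching event handler: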

configCh: dispatch the pods for the config change to the appropriate handler callback for the event type
plegCh: update the runtime cache; sync pod
syncCh: sync all pods waiting for sync
houseKeepingCh: trigger cleanup of pods
liveness manager's update channel: sync pods that have failed or in which one or more containers have failed liveness checks

Note that the housekeeping channel delivers an event once per housekeeping period (10s). Each event runs HandlePodCleanups, which performs the following cleanup:

Stop the workers for pods that no longer exist. (Each pod has its own worker, that is, a goroutine.)
Kill unwanted pods.
Remove the volumes of pods that should not be running and that have no containers running.
Remove any orphaned mirror pods.
Remove any cgroups in the hierarchy for pods that are no longer running.
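
The dispatch pattern described above can be modeled in a few lines of Go. This is only a minimal sketch: the handler type, the string events, and the function name iterate are hypothetical stand-ins, and only three of the five channels are modeled. It shows the shape of the loop (one select, one event per iteration, stop when the config channel is closed), not kubelet's actual behavior.

```go
package main

import "fmt"

// handler is a hypothetical stand-in for kubelet's SyncHandler.
// It only records which callback would have been invoked.
type handler struct{ log []string }

func (h *handler) record(s string) { h.log = append(h.log, s) }

// iterate handles exactly one event and reports whether the loop should
// continue. It mirrors the shape of syncLoopIteration, not its real logic.
func iterate(configCh <-chan string, syncCh, housekeepingCh <-chan struct{}, h *handler) bool {
	select {
	case pod, open := <-configCh:
		if !open {
			return false // config channel closed: stop the loop
		}
		h.record("add " + pod)
	case <-syncCh:
		h.record("sync")
	case <-housekeepingCh:
		h.record("cleanup")
	}
	return true
}

func main() {
	configCh := make(chan string, 1)
	syncCh := make(chan struct{}, 1)
	housekeepingCh := make(chan struct{}, 1)
	h := &handler{}

	// Feed one event at a time so the select has a single ready case.
	configCh <- "nginx"
	iterate(configCh, syncCh, housekeepingCh, h)
	housekeepingCh <- struct{}{}
	iterate(configCh, syncCh, housekeepingCh, h)

	close(configCh)
	keepGoing := iterate(configCh, syncCh, housekeepingCh, h)
	fmt.Println(h.log, keepGoing)
}
```

When several channels are ready at once, Go's select picks one of them at random, which is why the example feeds events one at a time.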

pkg/kubelet/kubelet.go:1753

func (kl *Kubelet) syncLoopIteration(configCh <-chan kubetypes.PodUpdate, handler SyncHandler,
	syncCh <-chan time.Time, housekeepingCh <-chan time.Time, plegCh <-chan *pleg.PodLifecycleEvent) bool {
	select {
	case u, open := <-configCh:
		
		if !open {
			glog.Errorf("Update channel is closed. Exiting the sync loop.")
			return false
		}

		switch u.Op {
		case kubetypes.ADD:
			
			handler.HandlePodAdditions(u.Pods)
		...
		case kubetypes.RESTORE:
			glog.V(2).Infof("SyncLoop (RESTORE, %q): %q", u.Source, format.Pods(u.Pods))
			// These are pods restored from the checkpoint. Treat them as new
			// pods.
			handler.HandlePodAdditions(u.Pods)
		...
		}

		if u.Op != kubetypes.RESTORE {
			...
		}
	case e := <-plegCh:
		...
	case <-syncCh:
		...
	case update := <-kl.livenessManager.Updates():
		...
	case <-housekeepingCh:
		...
	}
	return true
}

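syncLoopIteration also defines what happens after kubelet restarts following a configuration change: kubelet runs admission on the pods that are already running, and the admission result may cause this node to reject a pod.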

HandlePodAdditions is the handler that processes these events from kubelet's configCh.

// HandlePodAdditions is the callback in SyncHandler for pods being added from a config source.
func (kl *Kubelet) HandlePodAdditions(pods []*v1.Pod) {
	start := kl.clock.Now()
	sort.Sort(sliceutils.PodsByCreationTime(pods))
	for _, pod := range pods {
		...

		if !kl.podIsTerminated(pod) {
			...
			// Check if we can admit the pod; if not, reject it.
			if ok, reason, message := kl.canAdmitPod(activePods, pod); !ok {
				kl.rejectPod(pod, reason, message)
				continue
			}
		}
		...
	}
}

If the pod's status is not Terminated, kubelet calls canAdmitPod to run an admission check on it. If the check rejects the pod, the pod's phase is set to Failed.
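
To make the flow concrete, here is a small self-contained sketch of the same idea. Everything in it is a simplification for illustration: the pod struct, the phase-only terminated check, the admit callback, and the way admitted pods are tracked are assumptions, not kubelet's real types or bookkeeping.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, stripped-down stand-in for *v1.Pod.
type pod struct {
	Name    string
	Created int64  // creation time, Unix seconds
	Phase   string // "Pending", "Running", "Succeeded" or "Failed"
}

// terminated is a simplified check based on the phase alone.
func terminated(p *pod) bool {
	return p.Phase == "Succeeded" || p.Phase == "Failed"
}

// handleAdditions walks pods oldest first. A non-terminated pod that fails
// admission is marked Failed and skipped; admitted pods are returned.
func handleAdditions(pods []*pod, admit func(active []*pod, p *pod) bool) []*pod {
	sort.SliceStable(pods, func(i, j int) bool { return pods[i].Created < pods[j].Created })
	var active []*pod
	for _, p := range pods {
		if terminated(p) {
			continue // no admission check for terminated pods
		}
		if !admit(active, p) {
			p.Phase = "Failed"
			continue
		}
		active = append(active, p)
	}
	return active
}

func main() {
	pods := []*pod{
		{Name: "newer", Created: 200, Phase: "Pending"},
		{Name: "older", Created: 100, Phase: "Pending"},
	}
	// Hypothetical admission rule: the node has room for one pod only.
	oneSlot := func(active []*pod, p *pod) bool { return len(active) < 1 }
	for _, p := range handleAdditions(pods, oneSlot) {
		fmt.Println("admitted:", p.Name)
	}
	for _, p := range pods {
		fmt.Println(p.Name, p.Phase)
	}
}
```

Sorting by creation time means that when admission is competitive, older pods get the first chance to be admitted.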

pkg/kubelet/kubelet.go:1643

func (kl *Kubelet) canAdmitPod(pods []*v1.Pod, pod *v1.Pod) (bool, string, string) {
	// the kubelet will invoke each pod admit handler in sequence
	// if any handler rejects, the pod is rejected.
	// TODO: move out of disk check into a pod admitter
	// TODO: out of resource eviction should have a pod admitter call-out
	attrs := &lifecycle.PodAdmitAttributes{Pod: pod, OtherPods: pods}
	for _, podAdmitHandler := range kl.admitHandlers {
		if result := podAdmitHandler.Admit(attrs); !result.Admit {
			return false, result.Reason, result.Message
		}
	}

	return true, "", ""
}

canAdmitPod calls, in order, the admitHandlers that were registered when kubelet started. The kubelet eviction manager's admit handler is one of them.
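
The "first rejection wins" chain can be sketched as follows. The interface, the admitFunc adapter, and the reason string are hypothetical and reduced to the minimum (a pod is just a name here); the point is only that handlers run in registration order and that later handlers are not consulted after a rejection.

```go
package main

import "fmt"

// admitResult is a reduced, hypothetical version of an admission result.
type admitResult struct {
	Admit   bool
	Reason  string
	Message string
}

// admitHandler is a hypothetical interface; a pod is just its name here.
type admitHandler interface {
	Admit(podName string) admitResult
}

// admitFunc adapts a plain function to the admitHandler interface.
type admitFunc func(podName string) admitResult

func (f admitFunc) Admit(podName string) admitResult { return f(podName) }

// canAdmit runs the handlers in order and stops at the first rejection.
func canAdmit(handlers []admitHandler, podName string) (bool, string, string) {
	for _, h := range handlers {
		if r := h.Admit(podName); !r.Admit {
			return false, r.Reason, r.Message
		}
	}
	return true, "", ""
}

func main() {
	handlers := []admitHandler{
		admitFunc(func(string) admitResult { return admitResult{Admit: true} }),
		admitFunc(func(name string) admitResult {
			// "ExampleReason" is an illustrative value, not a kubelet constant.
			return admitResult{Reason: "ExampleReason", Message: "node cannot take " + name}
		}),
	}
	ok, reason, message := canAdmit(handlers, "web-1")
	fmt.Println(ok, reason, message)
}
```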

pkg/kubelet/eviction/eviction_manager.go:123

// Admit rejects a pod if its not safe to admit for node stability.
func (m *managerImpl) Admit(attrs *lifecycle.PodAdmitAttributes) lifecycle.PodAdmitResult {
	m.RLock()
	defer m.RUnlock()
	if len(m.nodeConditions) == 0 {
		return lifecycle.PodAdmitResult{Admit: true}
	}
	
	if utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) && kubelettypes.IsCriticalPod(attrs.Pod) {
		return lifecycle.PodAdmitResult{Admit: true}
	}

	if hasNodeCondition(m.nodeConditions, v1.NodeMemoryPressure) {
		notBestEffort := v1.PodQOSBestEffort != v1qos.GetPodQOS(attrs.Pod)
		if notBestEffort {
			return lifecycle.PodAdmitResult{Admit: true}
		}
	}

	return lifecycle.PodAdmitResult{
		Admit:   false,
		Reason:  reason,
		Message: fmt.Sprintf(message, m.nodeConditions),
	}
}

The eviction manager's Admit logic is as follows:

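If the eviction manager has recorded no conditions for the node, admission succeeds.
If the ExperimentalCriticalPodAnnotation feature gate is enabled and the pod is a critical pod (it carries the critical annotation, or its priority is not lower than SystemCriticalPriority), admission succeeds.

SystemCriticalPriority is 2 billion.

If the node has the MemoryPressure condition and the pod's QoS class is not BestEffort, admission succeeds.
In every other case admission fails, and the pod is not allowed to run on this node.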
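
The four rules above can be written as a small pure function, which makes the edge cases easy to check. This is a sketch under stated assumptions: podInfo, the criticalGate parameter, and the plain-string conditions are hypothetical simplifications; only the order of the checks follows the Admit listing.

```go
package main

import "fmt"

// podInfo is a hypothetical summary of the two pod properties Admit looks at.
type podInfo struct {
	Critical   bool // critical annotation, or priority >= SystemCriticalPriority
	BestEffort bool // QoS class is BestEffort
}

func hasCondition(conditions []string, want string) bool {
	for _, c := range conditions {
		if c == want {
			return true
		}
	}
	return false
}

// admit applies the checks in the same order as the eviction manager's Admit.
// criticalGate models the ExperimentalCriticalPodAnnotation feature gate.
func admit(conditions []string, criticalGate bool, p podInfo) bool {
	if len(conditions) == 0 {
		return true // no pressure recorded on the node
	}
	if criticalGate && p.Critical {
		return true // critical pods are admitted even under pressure
	}
	if hasCondition(conditions, "MemoryPressure") && !p.BestEffort {
		return true // memory pressure only blocks BestEffort pods
	}
	return false
}

func main() {
	mem := []string{"MemoryPressure"}
	disk := []string{"DiskPressure"}
	fmt.Println(admit(nil, false, podInfo{BestEffort: true}))                 // no conditions
	fmt.Println(admit(mem, false, podInfo{BestEffort: true}))                 // BestEffort under memory pressure
	fmt.Println(admit(mem, false, podInfo{}))                                 // non-BestEffort under memory pressure
	fmt.Println(admit(disk, false, podInfo{}))                                // any pod under disk pressure
	fmt.Println(admit(disk, true, podInfo{Critical: true, BestEffort: true})) // critical pod, gate on
}
```

One consequence worth noticing: under disk pressure alone, only a critical pod with the feature gate enabled gets through, whatever its QoS class.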

Kubelet Eviction Manager SyncLoop

The kubelet eviction manager's sync loop also gives critical pods special treatment, as the following code shows.

pkg/kubelet/eviction/eviction_manager.go:226

// synchronize is the main control loop that enforces eviction thresholds.
// Returns the pod that was killed, or nil if no pod was killed.
func (m *managerImpl) synchronize(diskInfoProvider DiskInfoProvider, podFunc ActivePodsFunc) []*v1.Pod {
	...

	// we kill at most a single pod during each eviction interval
	for i := range activePods {
		pod := activePods[i]
		
		if utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) &&
			kubelettypes.IsCriticalPod(pod) && kubepod.IsStaticPod(pod) {
			continue
		}
		...
		return []*v1.Pod{pod}
	}
	glog.Infof("eviction manager: unable to evict any pods from the node")
	return nil
}

When kubelet eviction is triggered, a pod that meets all of the following conditions is not killed by the kubelet eviction manager:

The pod's status is not Terminated.
The ExperimentalCriticalPodAnnotation feature gate is enabled.
The pod is a critical pod.
The pod is a static pod.
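
A minimal sketch of that selection rule is shown below. The pod struct, the function name pickVictim, and the assumption that the first unprotected pod is always evicted successfully are all simplifications for illustration; the real loop operates on pods already ranked for eviction and contains further steps elided in the listing above.

```go
package main

import "fmt"

// pod is a hypothetical stand-in carrying only what the skip rule needs.
type pod struct {
	Name     string
	Critical bool // critical annotation or system-critical priority
	Static   bool // static pod
}

// pickVictim returns the first pod, in ranked order, that is not protected.
// criticalGate models the ExperimentalCriticalPodAnnotation feature gate.
func pickVictim(ranked []*pod, criticalGate bool) *pod {
	for _, p := range ranked {
		if criticalGate && p.Critical && p.Static {
			continue // protected: critical AND static
		}
		return p // at most one pod is chosen per pass
	}
	return nil // nothing can be evicted
}

func main() {
	ranked := []*pod{
		{Name: "kube-apiserver", Critical: true, Static: true},
		{Name: "critical-addon", Critical: true},
		{Name: "web"},
	}
	if v := pickVictim(ranked, true); v != nil {
		fmt.Println("evict:", v.Name)
	}
}
```

Note that a critical pod that is not a static pod gets no protection from this rule, so in the example it is the one selected.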

That covers how kubelet protects Kubernetes critical pods. The exact behavior should still be verified in practice against your own cluster.
