CRI-O panics with "panic: close of closed channel" after attempting to stop a container in OpenShift Container Platform 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP) 4

Issue

  • CRI-O panics with "panic: close of closed channel" and logs the following stack trace:

      Mar 14 15:23:27 worker-1 hyperkube[1692]: I0314 15:23:27.387723    1692 kubelet.go:1954] "SyncLoop REMOVE" source="api" pods=[debug/worker-1-debug]
      Mar 14 15:23:27 worker-1 hyperkube[1692]: I0314 15:23:27.387792    1692 kubelet_pods.go:1285] "Killing unwanted pod" podName="worker-1-debug"
      Mar 14 15:23:27 worker-1 hyperkube[1692]: I0314 15:23:27.387838    1692 kuberuntime_container.go:720] "Killing container with a grace period override" pod="debug/worker-1-debug" podUID=ed3fba93-12fd-40ba-af22-207bc2dd7ebd containerName="container-00" containerID="cri-o://5e4a5efe282d3f77fe472d8810fa9a8a61df545a6087a7e8ecaa9379b7f1fa5c" gracePeriod=2
      Mar 14 15:23:27 worker-1 systemd[1]: crio-conmon-5e4a5efe282d3f77fe472d8810fa9a8a61df545a6087a7e8ecaa9379b7f1fa5c.scope: Consumed 55ms CPU time
      Mar 14 15:23:27 worker-1 crio[1636]: panic: close of closed channel
      Mar 14 15:23:27 worker-1 crio[1636]: goroutine 5778599 [running]:
      Mar 14 15:23:27 worker-1 crio[1636]: panic(0x55c2b280a280, 0x55c2b2aa4f90)
      Mar 14 15:23:27 worker-1 crio[1636]:         /usr/lib/golang/src/runtime/panic.go:1065 +0x565 fp=0xc001827530 sp=0xc001827468 pc=0x55c2b098e8a5
      Mar 14 15:23:27 worker-1 crio[1636]: runtime.closechan(0xc00187e300)
      Mar 14 15:23:27 worker-1 crio[1636]:         /usr/lib/golang/src/runtime/chan.go:363 +0x3f5 fp=0xc001827570 sp=0xc001827530 pc=0x55c2b095cbb5
      Mar 14 15:23:27 worker-1 crio[1636]: github.com/cri-o/cri-o/internal/oci.(*runtimeOCI).StopContainer.func1(0xc001827678, 0xc0015e5080)
      Mar 14 15:23:27 worker-1 crio[1636]:         /builddir/build/BUILD/cri-o-c05847896bc721f6529b1ceb4bafaf6cfe523b5d/_output/src/github.com/cri-o/cri-o/internal/oci/runtime_oci.go:701 +0x49 fp=0xc001827588 sp=0xc001827570 pc=0x55c2b1fd0a29
      Mar 14 15:23:27 worker-1 crio[1636]: github.com/cri-o/cri-o/internal/oci.(*runtimeOCI).StopContainer(0xc001ad3530, 0x55c2b2b19a20, 0xc001a90c00, 0xc0015e5080, 0x2, 0x55c2b2ac7d68, 0xc0000da780)
      Mar 14 15:23:27 worker-1 crio[1636]:         /builddir/build/BUILD/cri-o-c05847896bc721f6529b1ceb4bafaf6cfe523b5d/_output/src/github.com/cri-o/cri-o/internal/oci/runtime_oci.go:710 +0x788 fp=0xc001827650 sp=0xc001827588 pc=0x55c2b1fc3948
      Mar 14 15:23:27 worker-1 crio[1636]: github.com/cri-o/cri-o/internal/oci.(*Runtime).StopContainer(0xc0005fe5d0, 0x55c2b2b19a20, 0xc001a90c00, 0xc0015e5080, 0x2, 0x0, 0x8000101)
      Mar 14 15:23:27 worker-1 crio[1636]:         /builddir/build/BUILD/cri-o-c05847896bc721f6529b1ceb4bafaf6cfe523b5d/_output/src/github.com/cri-o/cri-o/internal/oci/oci.go:323 +0x9d fp=0xc001827698 sp=0xc001827650 pc=0x55c2b1fba85d
      Mar 14 15:23:27 worker-1 crio[1636]: github.com/cri-o/cri-o/internal/lib.(*ContainerServer).StopContainer(0xc0006be180, 0x55c2b2b19a20, 0xc001a90c00, 0xc0015e5080, 0x2, 0xc000daebd0, 0x0)
      Mar 14 15:23:27 worker-1 crio[1636]:         /builddir/build/BUILD/cri-o-c05847896bc721f6529b1ceb4bafaf6cfe523b5d/_output/src/github.com/cri-o/cri-o/internal/lib/stop.go:14 +0x79 fp=0xc001827748 sp=0xc001827698 pc=0x55c2b2009159
      Mar 14 15:23:27 worker-1 crio[1636]: github.com/cri-o/cri-o/server.(*Server).StopContainer(0xc0001ba580, 0x55c2b2b19a20, 0xc001a90c00, 0xc001827840, 0x55c2b0962325, 0x55c2b2977e00)
      Mar 14 15:23:27 worker-1 crio[1636]:         /builddir/build/BUILD/cri-o-c05847896bc721f6529b1ceb4bafaf6cfe523b5d/_output/src/github.com/cri-o/cri-o/server/container_stop.go:34 +0x349 fp=0xc001827810 sp=0xc001827748 pc=0x55c2b2076b69
      Mar 14 15:23:27 worker-1 crio[1636]: github.com/cri-o/cri-o/server/cri/v1alpha2.(*service).StopContainer(0xc00059c080, 0x55c2b2b19a20, 0xc001a90c00, 0xc00195c220, 0xc00059c080, 0x1, 0x1)
      Mar 14 15:23:27 worker-1 crio[1636]:         /builddir/build/BUILD/cri-o-c05847896bc721f6529b1ceb4bafaf6cfe523b5d/_output/src/github.com/cri-o/cri-o/server/cri/v1alpha2/rpc_stop_container.go:17 +0x85 fp=0xc001827868 sp=0xc001827810 pc=0x55c2b2184625
    

Resolution

  • This problem is resolved in Red Hat OpenShift Container Platform 4.9.31 via RHBA-2022:1605 and in Red Hat OpenShift Container Platform 4.10.11 via RHBA-2022:1431. Update to one of these versions or later to prevent the issue.
  • As a consequence of this bug, the coredns and/or keepalived static pods may remain in the Pending state. For more details, refer to the solution "coredns and keepalived Pods in a non-ready state in RHOCP 4".
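To decide whether a cluster still needs the update, the version reported by `oc get clusterversion` can be compared with the first fixed z-stream release named above. The Go sketch below is a hypothetical helper, not a Red Hat tool: it encodes only the two releases listed in this solution (4.9.31 and 4.10.11), reports any other minor stream as "unknown" rather than guessing, and assumes plain X.Y.Z version strings (pre-release suffixes such as "-rc.1" are rejected as errors).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fixedIn maps a minor stream to the first z-stream release that, per the
// Resolution section, contains the fix. Other streams are deliberately absent.
var fixedIn = map[string]int{
	"4.9":  31, // RHBA-2022:1605
	"4.10": 11, // RHBA-2022:1431
}

// FixStatus returns "fixed", "affected", or "unknown" for a version such as
// "4.10.3". Only plain X.Y.Z strings are accepted.
func FixStatus(version string) (string, error) {
	parts := strings.Split(version, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("unexpected version format %q", version)
	}
	minor := parts[0] + "." + parts[1]
	patch, err := strconv.Atoi(parts[2])
	if err != nil {
		return "", fmt.Errorf("invalid patch level in %q: %w", version, err)
	}
	first, known := fixedIn[minor]
	if !known {
		return "unknown", nil // this solution does not state a fixed release for this stream
	}
	if patch >= first {
		return "fixed", nil
	}
	return "affected", nil
}

func main() {
	for _, v := range []string{"4.9.30", "4.9.31", "4.10.3", "4.10.11", "4.8.20"} {
		status, err := FixStatus(v)
		if err != nil {
			fmt.Println(v, "error:", err)
			continue
		}
		fmt.Println(v, status)
	}
}
```

An "unknown" result means only that this solution does not name a fixed release for that stream; check the errata for that stream directly.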

Root Cause

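CRI-O panicked when it received multiple stop requests for the same container. As the stack trace shows, the panic is raised by runtime.closechan, called from an anonymous function inside runtimeOCI.StopContainer (runtime_oci.go:701): the channel being closed had already been closed while handling another stop request for the same container, and closing an already-closed channel is a run-time panic in Go. This is a Go panic that terminates the crio process, not a segmentation fault.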
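The failure mode can be reproduced in isolation. The Go sketch below is illustrative only: the `container` type, `naiveStop`, and `safeStop` are hypothetical names and are NOT CRI-O's actual code, and the upstream fix may use a different mechanism. It shows that closing the same channel twice panics with "close of closed channel", and one common way to make a stop path idempotent, guarding the close with `sync.Once` so repeated or concurrent stop calls close the channel exactly once.

```go
package main

import (
	"fmt"
	"sync"
)

// container is a hypothetical stand-in for a container record (not CRI-O's type).
// stopped is closed to signal that the container has stopped.
type container struct {
	stopped   chan struct{}
	closeOnce sync.Once
}

func newContainer() *container {
	return &container{stopped: make(chan struct{})}
}

// naiveStop closes the channel unconditionally; a second call panics with
// "close of closed channel", the same run-time error as in the stack trace.
func (c *container) naiveStop() {
	close(c.stopped)
}

// safeStop closes the channel only on the first call; later calls, including
// concurrent ones, are no-ops.
func (c *container) safeStop() {
	c.closeOnce.Do(func() { close(c.stopped) })
}

// tryStop runs stop and returns the panic message, or "" if it did not panic.
func tryStop(stop func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	stop()
	return ""
}

func main() {
	a := newContainer()
	tryStop(a.naiveStop)
	fmt.Printf("naive second stop: %q\n", tryStop(a.naiveStop))

	// Several concurrent stop requests for the same container, as the kubelet
	// may send them; sync.Once ensures only one goroutine closes the channel.
	b := newContainer()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.safeStop()
		}()
	}
	wg.Wait()
	fmt.Printf("safe extra stop: %q\n", tryStop(b.safeStop))
}
```

The naive version reports the run-time error on its second call, while the guarded version returns an empty message no matter how many stop requests arrive.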


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.