Who rebooted my VM? A cloud mystery
I was recently involved in an interesting discussion about how to figure out, from a Linux perspective, who has rebooted your virtual machine. In this particular case, we were talking about an OpenShift node. I will elaborate on a few different cases and how they can be investigated.
To reproduce the problem, I am using a five-node OpenShift cluster which is running on a Kernel-based Virtual Machine (KVM) host.
$ oc get nodes
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 3d23h v1.31.6
compute-1 Ready worker 3d23h v1.31.6
master-0 Ready control-plane,master,worker 3d23h v1.31.6
master-1 Ready control-plane,master,worker 3d23h v1.31.6
master-2 Ready control-plane,master,worker 3d23h v1.31.6
$ virsh list
Id Name State
--------------------------------
2 ocp4-master-0 running
3 ocp4-master-1 running
4 ocp4-master-2 running
5 ocp4-compute-0 running
6 ocp4-compute-1 running
Consequently, I am my own hypervisor admin, who will be sabotaging my OpenShift cluster for this investigation.
What all cases have in common is that a node will be NotReady for some time, e.g.:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 4d15h v1.31.6
compute-1 NotReady worker 4d15h v1.31.6
master-0 Ready control-plane,master,worker 4d15h v1.31.6
master-1 Ready control-plane,master,worker 4d15h v1.31.6
master-2 Ready control-plane,master,worker 4d15h v1.31.6
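The node's events can also hint at an unexpected restart (for example NodeNotReady, Rebooted or Starting kubelet). Keep in mind that events are only retained for a limited time, so this check only helps shortly after the incident. A minimal sketch, using the standard field selectors and the node name from my lab:
$ oc get events -A --field-selector involvedObject.kind=Node,involvedObject.name=compute-1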
Additionally, you will be able to find logs from the machine config controller reporting a NotReady machine:
$ oc -n openshift-machine-config-operator logs machine-config-controller-5f88fbbc4f-rqjm4 -c machine-config-controller | grep -i compute-1
...
I0404 06:41:13.153952 1 node_controller.go:584] Pool worker: node compute-1: Reporting unready: node compute-1 is reporting OutOfDisk=Unknown
I0404 06:41:13.196590 1 node_controller.go:584] Pool worker: node compute-1: changed taints
I0404 06:41:19.481728 1 node_controller.go:584] Pool worker: node compute-1: changed taints
I0404 06:42:42.730148 1 node_controller.go:584] Pool worker: node compute-1: Reporting unready: node compute-1 is reporting NotReady=False
...
I0404 06:42:56.309260 1 node_controller.go:584] Pool worker: node compute-1: Reporting ready
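To narrow down when the node became NotReady, the lastTransitionTime of the node conditions is useful as well. A quick sketch using jsonpath (node name from my lab):
$ oc get node compute-1 -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.lastTransitionTime}{"\n"}{end}'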
Case 1: User reboot
In this case, a user has rebooted one of your nodes (either via a debug container or via ssh). This can easily be found in the audit logs:
$ cat /var/log/audit/audit.log | grep reboot
...
type=USER_CMD msg=audit(1743748835.979:805): pid=1366614 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:spc_t:s0 msg='cwd="/" cmd="reboot" exe="/usr/bin/sudo" terminal=? res=success'UID="root" AUID="unset"
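To turn the raw audit timestamp into something human-readable, and to cross-check which sessions were active around that time (helpful in the ssh case), the following can be run on the node; the exact output will of course differ on your system:
$ ausearch -m USER_CMD -i | grep -w reboot   # -i interprets timestamps and IDs
$ last -x shutdown reboot | head -n 5        # boot/shutdown records from wtmp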
Case 2: Machine Config rollout
A machine config rollout can also be found in the audit.log files:
$ cat /var/log/audit/audit.log | grep reboot
...
type=SERVICE_START msg=audit(1743789169.017:1277): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=machine-config-daemon-reboot comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset"
type=SERVICE_STOP msg=audit(1743789169.119:1278): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=machine-config-daemon-reboot comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset"
Furthermore, this can be confirmed with the MachineConfigPool, which provides us with the latest MachineConfig rendered-worker-04636f9acf052beffabf57725a564095:
$ oc get mcp worker
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-04636f9acf052beffabf57725a564095 True False False 2 2 2 0 5d2h
Looking at the boot IDs, we should see that boot ID 57cdb3ac10fd46c3933a3f0ba3e3c33f was shut down by the machine config operator with a message that it will boot into rendered-worker-04636f9acf052beffabf57725a564095. This is indeed the case:
$ journalctl --list-boots | tail -n 2
-1 57cdb3ac10fd46c3933a3f0ba3e3c33f Fri 2025-04-04 07:47:43 UTC Fri 2025-04-04 17:53:13 UTC
0 6004e8be8fbc4d769485a991ff1641d4 Fri 2025-04-04 17:53:19 UTC Fri 2025-04-04 18:04:01 UTC
$ journalctl -u machine-config-daemon-reboot
...
-- Boot 57cdb3ac10fd46c3933a3f0ba3e3c33f --
Apr 04 17:52:49 compute-1 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-worker-04636f9acf052beffabf57725a564095.
Apr 04 17:52:49 compute-1 systemd[1]: machine-config-daemon-reboot.service: Deactivated successfully.
Apr 04 17:52:49 compute-1 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-worker-04636f9acf052beffabf57725a564095.
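To further confirm that the reboot belonged to this rollout, the machine config operator annotations on the node can be compared against the rendered config above; once the update has finished, currentConfig and desiredConfig should both point to it. A simple sketch (grep keeps it independent of jsonpath escaping; node name from my lab):
$ oc get node compute-1 -o yaml | grep -E 'machineconfiguration.openshift.io/(current|desired)Config'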
Case 3: A container with elevated privileges was compromised
Some containers require elevated privileges to perform certain actions on OpenShift. If such a container is compromised, a reboot could be initiated from within it. In this example, I will be using a container of the Node Tuning Operator:
$ oc -n openshift-cluster-node-tuning-operator get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-node-tuning-operator-5bb647f57c-lk7wx 1/1 Running 0 4d15h X.X.X.X master-1 <none> <none>
tuned-5q7vm 1/1 Running 1 4d15h X.X.X.X compute-0 <none> <none>
tuned-8n89v 1/1 Running 1 4d15h X.X.X.X compute-1 <none> <none>
tuned-g7nwk 1/1 Running 0 4d15h X.X.X.X master-0 <none> <none>
tuned-rxt2q 1/1 Running 0 4d15h X.X.X.X master-2 <none> <none>
tuned-xsm98 1/1 Running 0 4d15h X.X.X.X master-1 <none> <none>
$ oc -n openshift-cluster-node-tuning-operator rsh tuned-8n89v
$ id
uid=0(root) gid=0(root) groups=0(root)
$ date
Fri Apr 4 07:25:01 UTC 2025
$ reboot
One interesting fact here is that we do not see this reboot in the audit.log, as the last entry for a reboot is the same one as in Case 1:
$ cat /var/log/audit/audit.log | grep reboot | tail -n 1
type=USER_CMD msg=audit(1743748835.979:805): pid=1366614 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:spc_t:s0 msg='cwd="/" cmd="reboot" exe="/usr/bin/sudo" terminal=? res=success'UID="root" AUID="unset"
Nevertheless, we can see via journalctl that the reboot was initiated. Please also note that the reboot happened shortly after Fri Apr 4 07:25:01 UTC 2025:
$ journalctl -b -1 | grep -i reboot | tail -n 3
Apr 04 07:25:06 compute-1 systemd-logind[906]: The system will reboot now!
Apr 04 07:25:06 compute-1 systemd-logind[906]: System is rebooting.
Apr 04 07:26:41 compute-1 systemd[1]: Requested transaction contradicts existing jobs: Transaction for NetworkManager-dispatcher.service/start is destructive (reboot.target has 'start' job queued, but 'stop' is included in transaction).
While an audit rule might help to detect that the reboot was started via /usr/sbin/reboot from a container, this is probably of limited practical relevance. An illustrative rule is sketched below.
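As a sketch, a watch rule on the reboot binary would record an event whenever it is executed (the key name reboot-exec is my own choice; note that on systemd-based systems /usr/sbin/reboot is usually a symlink to systemctl, so the rule may need to target the resolved binary instead, and it should be persisted under /etc/audit/rules.d/ to survive reboots):
# log every execution of /usr/sbin/reboot under the key "reboot-exec"
$ auditctl -w /usr/sbin/reboot -p x -k reboot-exec
# later, search for matching events with interpreted fields
$ ausearch -k reboot-exec -i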
Case 4: The hypervisor triggered a soft reboot
To better differentiate between the cases, we will be using compute-0 for the reboots initiated by the hypervisor / cloud provider instead of compute-1.
For a (kind of) graceful shutdown we will be using the virsh reboot command. To get a rough idea of what this does, we can take a look at the man page for virsh reboot:
Reboot a domain. This acts just as if the domain had the reboot command run from the console. The command returns as soon as it has executed the reboot action, which may be significantly before the domain actually reboots.
$ date
Fri Apr 4 06:59:52 PM CEST 2025
$ virsh reboot ocp4-compute-0
Domain 'ocp4-compute-0' is being rebooted
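As a side note, virsh reboot also accepts a --mode option (e.g. acpi or agent) that controls how the reboot request is delivered to the guest; the agent mode requires the qemu-guest-agent to be running inside the VM. I used the default here.
# ask the guest agent inside the VM to perform the reboot (requires qemu-guest-agent)
$ virsh reboot ocp4-compute-0 --mode agent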
We can find a related audit.log entry:
$ ausearch -m SERVICE_STOP | grep -Ei 'shutdownd|shutdown|reboot|final|auditd'
type=SERVICE_STOP msg=audit(1743785996.839:279): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dracut-shutdown comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Additionally, we see the shutdown process in the journalctl output of the previous boot:
$ journalctl -b -1 | grep -i shutdown | tail -n 6
Apr 04 16:59:56 compute-0 systemd[1]: Stopping Restore /run/initramfs on shutdown...
Apr 04 16:59:56 compute-0 systemd[1]: dracut-shutdown.service: Deactivated successfully.
Apr 04 16:59:56 compute-0 systemd[1]: Stopped Restore /run/initramfs on shutdown.
Apr 04 17:00:42 compute-0 systemd[1]: Requested transaction contradicts existing jobs: Transaction for NetworkManager-dispatcher.service/start is destructive (shutdown.target has 'start' job queued, but 'stop' is included in transaction).
Apr 04 17:00:43 compute-0 systemd[1]: Stopping Record System Boot/Shutdown in UTMP...
Apr 04 17:00:43 compute-0 systemd[1]: Stopped Record System Boot/Shutdown in UTMP.
Note: While the timezone configuration differs between the hypervisor and the node, these logs relate to the same virsh reboot.
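As an additional cross-check that this was an orderly shutdown: on a clean shutdown, systemd-journald typically writes a final "Journal stopped" message before the journal of that boot is closed, so the previous boot's journal should end with it (an expectation on my part, not something the logs above prove):
$ journalctl -b -1 -u systemd-journald | tail -n 1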
Case 5: The hypervisor triggered a hard reset
For this experiment we will be using the virsh reset command. To get a rough idea of what this does, we can take a look at the man page for virsh reset:
Reset a domain immediately without any guest shutdown. reset emulates the power reset button on a machine, where all guest hardware sees the RST line set and reinitializes internal state.
Note: Reset without any guest OS shutdown risks data loss.
$ date
Fri Apr 4 07:17:45 PM CEST 2025
$ virsh reset ocp4-compute-0
Domain 'ocp4-compute-0' was reset
The only audit.log entry we can find is the one from before:
$ ausearch -m SERVICE_STOP | grep -Ei 'shutdownd|shutdown|reboot|final|auditd'
type=SERVICE_STOP msg=audit(1743785996.839:279): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dracut-shutdown comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
The journalctl output of the previous boot also ends abruptly, without a shutdown sequence:
$ journalctl -b -1 | tail -n 2
Apr 04 17:16:11 compute-0 systemd[1]: Finished Cleanup of Temporary Directories.
Apr 04 17:16:11 compute-0 systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Note: The reset was initiated at 17:17:45 UTC when accounting for the different timezone configurations.
Nevertheless, we have an indicator that the previous boot ended with an unclean shutdown if we see that
- a recovery was started during the latest boot, or
- a filesystem check was initiated during the latest boot:
$ journalctl -b 0 | grep -iE 'recover|fsck'
Apr 04 17:17:58 localhost systemd-fsck[651]: /usr/sbin/fsck.xfs: XFS file system.
Apr 04 17:17:58 localhost kernel: XFS (vda4): Starting recovery (logdev: internal) <===== HERE
Apr 04 17:17:58 localhost kernel: XFS (vda4): Ending recovery (logdev: internal) <===== HERE
Apr 04 17:17:59 compute-0 systemd[1]: Created slice Slice /system/systemd-fsck.
Apr 04 17:17:59 compute-0 systemd[1]: systemd-fsck-root.service: Deactivated successfully.
Apr 04 17:18:00 compute-0 systemd-fsck[819]: boot: recovering journal <===== HERE
Apr 04 17:18:00 compute-0 systemd-fsck[819]: boot: clean, 365/98304 files, 150426/393216 blocks
The presence of XFS log recovery and fsck journal recovery confirms that the filesystems were not cleanly unmounted, i.e. the system did not go through a normal shutdown sequence. See the unclean unmounts section in the RHEL 9 docs for XFS.
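Another hedged indicator: assuming the systemd-update-utmp service (the "Record System Boot/Shutdown in UTMP" unit seen stopping in Case 4) is in use, a clean shutdown leaves a record in wtmp, while a hard reset does not, so the most recent reboot entry will have no matching shutdown entry:
$ last -x shutdown reboot | head -n 4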
Case 6: A kernel panic occurred
A kernel panic is a safety measure taken by the kernel of an operating system when an internal fatal error is detected, such as a hardware failure or memory corruption. First we need to clarify whether a system would reboot at all in such a case. The responsible setting is kernel.panic, where 0 means that the system is not rebooted. Any other value indicates the number of seconds to wait before the reboot is triggered; see the kernel docs for further details.
$ sysctl kernel.panic
kernel.panic = 10
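Depending on the failure you are hunting, related panic knobs can matter as well. A quick way to inspect a few of them (whether an individual setting exists depends on the kernel configuration):
$ sysctl kernel.panic_on_oops kernel.softlockup_panic kernel.hung_task_panic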
If a kernel panic had happened, we would usually find a corresponding file in /sys/fs/pstore/:
$ ls /sys/fs/pstore/
sh-5.1#
As the setup here is a bit more complicated for OpenShift, I will leave it without an example. For further reading, please follow the kernel docs.
Conclusion
Linux offers many powerful utilities to understand what has happened on your system. This includes user-initiated actions, hypervisor / cloud provider activity, and even hardware failures. If there are other indicators you usually check, please let me know.