Server rebooted but not available after maintenance/OS upgrade?

A server was rebooted in a "tick" (a scheduled maintenance window or specific time frame), but after the reboot, it is no longer accessible. What could be causing this issue? Additionally, in the case of a physical server that underwent maintenance and is not booting after the reboot, or a virtual machine (VM) that is not booting properly after an OS upgrade, what steps should be taken to diagnose and resolve the problem? Please provide a detailed explanation of the potential causes and troubleshooting steps for each scenario.

Server rebooted in 'Tick' or Timer issue

This could refer to a kernel tick rate issue, system clock failure, or hardware timer synchronization problems. Such issues might cause the server to become unresponsive after reboot.
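If a timer or clock source problem is suspected, the kernel's active and available clock sources can be read from sysfs once the machine is reachable on its console. The following is a minimal sketch, assuming the standard Linux sysfs layout; the function name `show_clocksource` and its optional directory argument are illustrative and not part of any existing tool.

```shell
# Print the active and available kernel clock sources.
# The optional argument (hypothetical, mainly for testing) overrides the sysfs directory.
show_clocksource() {
    local dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    if [ ! -r "$dir/current_clocksource" ]; then
        echo "clocksource information not found under $dir" >&2
        return 1
    fi
    echo "current: $(cat "$dir/current_clocksource")"
    echo "available: $(cat "$dir/available_clocksource")"
}

# Usage on the affected server:
#   show_clocksource
```

A clock source that differs from what the server used before the reboot (for example after a kernel or hypervisor change) is worth noting alongside any timer-related messages in the kernel log.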

Quick Troubleshooting Steps

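journalctl -xb -1  # Logs of the previous boot (needs a persistent journal; use -xb for the current boot)
dmesg | tail -50  # Last 50 kernel messages from the current boot
cat /etc/fstab  # Look for wrong devices, UUIDs, or mount options
fsck /dev/sdaX  # Check and repair a filesystem (only while it is unmounted)
grub2-editenv list  # Show the GRUB environment block, e.g. the saved default entry (RHEL-based systems)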
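
A stale UUID in /etc/fstab is a common reason for a boot that hangs or drops to an emergency shell after maintenance. The sketch below compares the UUIDs that fstab references with a list of UUIDs that actually exist. The function name `fstab_missing_uuids` and the temporary file path are assumptions made for illustration; it only handles unquoted `UUID=` entries.

```shell
# List UUIDs referenced in an fstab file that are absent from a list of known UUIDs.
# Both arguments are file paths; the second holds one UUID per line.
fstab_missing_uuids() {
    local fstab="$1" known="$2" uuid
    # Take the first field of every non-comment line that starts with UUID=
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
    while read -r uuid; do
        grep -qxF -- "$uuid" "$known" || echo "$uuid"
    done
}

# Usage from rescue mode (adjust the fstab path if the root is mounted under /mnt):
#   blkid -s UUID -o value > /tmp/known-uuids
#   fstab_missing_uuids /etc/fstab /tmp/known-uuids
```

Any UUID printed by the function is referenced in fstab but not present on an attached block device. Where available, `findmnt --verify` performs a broader check of the same file.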

Bash script

# 1. Access the server console
#    - For physical servers: Connect a monitor and keyboard.
#    - For virtual machines: Use the hypervisor's console access feature.

# 2. Check for hardware issues
#    - Verify physical connections (cables, power supply).
#    - For virtual machines, ensure the VM is powered on and resources are allocated properly.

# 3. Review boot messages
#    - Reboot the server and observe messages for errors.
#    - If the system halts, note the error message displayed.

# 4. Boot into rescue mode
#    - For systems with GRUB bootloader:
#      1. Reboot the server.
#      2. At the GRUB menu, select the desired entry and press 'e' to edit.
#      3. Append 'systemd.unit=rescue.target' to the kernel line.
#      4. Press Ctrl+x to boot.
#    - Alternatively, boot from a live CD/USB and choose the rescue mode.

# 5. Check filesystem integrity
#    - Identify the root partition (e.g., /dev/sda1):
lsblk
#    - Run filesystem check on the unmounted partition (replace /dev/sda1 with your root partition):
fsck /dev/sda1

# 6. Reinstall or repair GRUB bootloader
#    - Mount the root filesystem:
mount /dev/sda1 /mnt
#    - If /boot is a separate partition, mount it:
mount /dev/sda2 /mnt/boot
#    - Mount necessary filesystems:
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
#    - Change root into the mounted system:
chroot /mnt
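#    - Reinstall GRUB (BIOS systems; replace /dev/sda with your boot disk):
grub-install /dev/sda   # For Debian-based systems
grub2-install /dev/sda  # For RHEL-based systems
#    - Update GRUB configuration:
update-grub  # For Debian-based systems
grub2-mkconfig -o /boot/grub2/grub.cfg  # For RHEL-based systems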

# 7. Review recent system changes
#    - Check for recent package updates or installations:
less /var/log/dpkg.log  # Debian-based systems
less /var/log/yum.log   # RHEL-based systems (/var/log/dnf.log on RHEL 8 and later)
#    - Undo or reinstall any suspicious packages.

# 8. Examine hardware logs
#    - Check for hardware errors:
dmesg | less
#    - Review system logs:
less /var/log/syslog  # Debian-based systems
less /var/log/messages  # RHEL-based systems

# 9. Verify boot sequence in BIOS/UEFI
#    - Reboot the server and access BIOS/UEFI settings.
#    - Ensure the correct boot device is selected.

# 10. Consult vendor documentation or support
#     - If hardware-specific issues are suspected, refer to the manufacturer's guidelines or contact support.

1. Access the Server Console

Physical Servers: Connect directly using a monitor and keyboard to observe boot processes and interact with the system.
Virtual Machines: Utilize the hypervisor's console access (e.g., VMware vSphere, Proxmox) to manage the VM directly.

2. Check for Hardware Issues

Physical Connections: Ensure all cables (power, data) are securely connected and the power supply is functioning.
Virtual Machines: Confirm the VM is powered on and has adequate resources (CPU, memory, disk space).

3. Review Boot Messages

Procedure:
1. Reboot the server.
2. Observe the boot sequence for any error messages or failures.
Purpose: Identifying specific errors during boot can pinpoint issues such as missing files, misconfigurations, or hardware failures.

4. Boot into Rescue Mode

Single-User Mode:

At the GRUB menu, edit the kernel line and append single or init=/bin/bash.

Example:

  linux /vmlinuz-5.4.0-42-generic root=/dev/sda1 single

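This boots the server into a minimal environment for troubleshooting. With init=/bin/bash the root filesystem is usually mounted read-only, so run mount -o remount,rw / before making changes.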

Using GRUB Bootloader:
1. Reboot the server.
2. At the GRUB menu, select the desired entry and press 'e' to edit.
3. Append systemd.unit=rescue.target to the kernel line.
4. Press Ctrl+x to boot.
Alternative: Boot from a live CD/USB and select the rescue mode option.
Purpose: Rescue mode provides a minimal environment to perform troubleshooting and repairs.

5. Check Filesystem Integrity

Identify Root Partition:
```
  lsblk
```
This command lists all block devices, helping to identify the root partition (e.g., /dev/sda1).
Run Filesystem Check:
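```
  fsck /dev/sda1
  # or, to answer "yes" to every repair prompt automatically:
  fsck -y /dev/sda1
```
Replace /dev/sda1 with the appropriate partition. The fsck tool checks and repairs filesystem inconsistencies; run it only on a filesystem that is not mounted (for example, from a live CD/USB).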

Example Output:

  /dev/sda1: clean, 12345/123456 files, 1234567/12345678 blocks

If Errors Found: fsck asks for confirmation before each repair, unless -y was given.
Force Filesystem Check on Reboot:
- Command:
```
  touch /forcefsck
```
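- Explanation: Forces a filesystem check during the next reboot. Newer systemd-based distributions may ignore this file; there, append fsck.mode=force to the kernel line in GRUB instead.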
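
Because running fsck on a mounted filesystem can cause damage, it helps to check the mount table first. The sketch below wraps that check; `is_mounted` and `safe_fsck` are illustrative names, and the comparison is a plain string match on the device column, so it will not recognise a device that is mounted under a different name (such as a /dev/mapper alias).

```shell
# Return success if the given device appears as a mount source.
# The second argument (default /proc/mounts) exists mainly to make the sketch testable.
is_mounted() {
    local dev="$1" mounts="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Run fsck only when the device is not mounted.
safe_fsck() {
    local dev="$1"
    if is_mounted "$dev"; then
        echo "refusing to check $dev: it is mounted" >&2
        return 1
    fi
    fsck "$dev"
}

# Usage:
#   safe_fsck /dev/sda1
```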

6. Reinstall or Repair GRUB Bootloader

Rescue Mode:

Boot from a live CD/DVD or ISO image (e.g., CentOS, Ubuntu).
Mount the root filesystem and chroot into it:

Mount Filesystems:

  mount /dev/sda1 /mnt
  mount --bind /dev /mnt/dev
  mount --bind /proc /mnt/proc
  mount --bind /sys /mnt/sys

If /boot is a separate partition:

  mount /dev/sda2 /mnt/boot

Change Root and Reinstall GRUB:
```
  chroot /mnt
  grub-install /dev/sda
```
Replace /dev/sda with the appropriate disk (the whole disk, not a partition). On RHEL-based systems the command is grub2-install.
Update GRUB Configuration:
- For Debian-based systems:
```
  update-grub
```
- For RHEL-based systems:
```
  grub2-mkconfig -o /boot/grub2/grub.cfg
```
Purpose: Reinstalling GRUB can resolve issues where the bootloader is corrupted or misconfigured.
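
The order of the mounts matters, and the bind mounts should be released again after leaving the chroot. As a hedged sketch, the helper below only prints the command sequence for review rather than running it; the name `print_chroot_plan` is hypothetical, and /mnt is assumed as the mount point, as in the steps above.

```shell
# Print (not run) the mount, chroot, and cleanup commands for a GRUB repair.
# Arguments: root partition, optional separate /boot partition.
print_chroot_plan() {
    local root_dev="$1" boot_dev="${2:-}" fs
    echo "mount $root_dev /mnt"
    if [ -n "$boot_dev" ]; then
        echo "mount $boot_dev /mnt/boot"
    fi
    for fs in dev proc sys; do
        echo "mount --bind /$fs /mnt/$fs"
    done
    echo "chroot /mnt"
    echo "# ... run the repair commands, then 'exit' ..."
    echo "umount -R /mnt"
}

# Usage:
#   print_chroot_plan /dev/sda1 /dev/sda2
```

Printing the plan first makes it easy to confirm the device names against the lsblk output before anything is mounted.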

Verify Bootloader Configuration

Check GRUB Configuration:
- Command:
```
  cat /boot/grub2/grub.cfg
```
- Explanation: Displays the generated GRUB configuration so it can be checked by hand (the path is /boot/grub/grub.cfg on Debian-based systems).
- Common Issues:
  - Missing kernel or initramfs entries.
  - Incorrect root device.
Reinstall GRUB:
- Command:
```
  grub2-install /dev/sda
  grub2-mkconfig -o /boot/grub2/grub.cfg
```
- Explanation: Reinstalls GRUB and regenerates the configuration file.
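
The "missing kernel" case can be checked mechanically by comparing the kernel paths named in grub.cfg with the files in the boot directory. This is a rough sketch with illustrative function names; it only looks at `linux`, `linux16`, and `linuxefi` lines, so on distributions that keep boot entries outside grub.cfg (for example snippets under /boot/loader/entries on recent RHEL-based releases) it will find nothing.

```shell
# Print the unique kernel image paths referenced in a GRUB configuration file.
grub_cfg_kernels() {
    awk '$1 ~ /^linux(16|efi)?$/ { print $2 }' "$1" | sort -u
}

# Print referenced kernels whose file name does not exist in the boot directory.
missing_grub_kernels() {
    local cfg="$1" boot_dir="${2:-/boot}" path
    grub_cfg_kernels "$cfg" | while read -r path; do
        [ -e "$boot_dir/$(basename "$path")" ] || echo "$path"
    done
}

# Usage (inside the chroot or on the running system):
#   missing_grub_kernels /boot/grub2/grub.cfg /boot
```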

7. Review Recent System Changes

Check Recent Package Activities:
- Debian-based systems:
```
  less /var/log/dpkg.log
```
- RHEL-based systems (on RHEL 8 and later, see /var/log/dnf.log instead):
```
  less /var/log/yum.log
```
Action: Identify and possibly revert recent updates or installations that might have caused the issue.
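
On Debian-based systems the package log can be narrowed to installs and upgrades since the maintenance date. The sketch below assumes the usual dpkg.log line layout (date, time, action, package, versions); `dpkg_changes_since` is an illustrative name, and the date in the usage line is a placeholder.

```shell
# Show packages installed or upgraded on or after a given date (YYYY-MM-DD).
# ISO dates sort lexicographically, so a string comparison is sufficient.
dpkg_changes_since() {
    local since="$1" log="${2:-/var/log/dpkg.log}"
    awk -v since="$since" \
        '$1 >= since && ($3 == "install" || $3 == "upgrade") { print $1, $3, $4 }' "$log"
}

# Usage (from a chroot, or point the second argument at /mnt/var/log/dpkg.log):
#   dpkg_changes_since 2024-05-01
```

Kernel, bootloader, and initramfs-related packages in the output are the first candidates to review.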

8. Examine Hardware Logs

Check for Hardware Errors:
```
  dmesg | less
```
This command displays kernel ring buffer messages, which can include hardware-related errors.
Review System Logs:
- Debian-based systems:
```
  less /var/log/syslog
```
- RHEL-based systems:
```
  less /var/log/messages
```
Purpose: Logs provide insights into hardware failures, driver issues, or other critical errors affecting boot.
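
Paging through a full log is slow, so a keyword filter can give a first pass. The following is a minimal sketch; the function name and the keyword list are assumptions and should be extended for the hardware in question.

```shell
# Print lines from a log file that match common hardware and boot failure keywords.
scan_log_for_errors() {
    local log="$1"
    grep -iE 'i/o error|hardware error|machine check|out of memory|kernel panic|failed to (start|mount)' "$log"
}

# Usage:
#   scan_log_for_errors /var/log/syslog      # Debian-based systems
#   scan_log_for_errors /var/log/messages    # RHEL-based systems
# Related built-in filters: dmesg --level=err,warn and journalctl -b -1 -p err
```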

Check Kernel and Initramfs
1. Verify Kernel and Initramfs:
  - Command:
```
  ls /boot
```
  - Explanation: Lists the kernel (vmlinuz) and initramfs (initrd) files.
  - Example Output:
```
  vmlinuz-5.4.0-42-generic
  initrd.img-5.4.0-42-generic
```
  - If Missing: Rebuild the initramfs (dracut on RHEL-based systems; Debian-based systems use update-initramfs -u -k <kernel-version>):
```
  dracut --force
```
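2. Rebuild Initramfs (older RHEL-based systems):
  - Command (in a rescue chroot, replace $(uname -r) with the installed kernel version, because uname -r reports the rescue kernel):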
```
  mkinitrd -v -f /boot/initramfs-$(uname -r).img $(uname -r)
```
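
Rather than comparing the ls output by eye, each kernel image can be paired with its initramfs automatically. This sketch assumes the common naming schemes (initrd.img-VERSION on Debian-based systems, initramfs-VERSION.img on RHEL-based systems); `check_initramfs` is an illustrative name.

```shell
# For each kernel image in a boot directory, report whether a matching initramfs exists.
check_initramfs() {
    local boot_dir="${1:-/boot}" kernel version
    for kernel in "$boot_dir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue    # the glob matched nothing
        version="${kernel##*/vmlinuz-}"
        if [ -e "$boot_dir/initrd.img-$version" ] || [ -e "$boot_dir/initramfs-$version.img" ]; then
            echo "ok $version"
        else
            echo "missing-initramfs $version"
        fi
    done
}

# Usage:
#   check_initramfs /boot        # or /mnt/boot from a live environment
```

A version reported as missing-initramfs is the one to pass to the rebuild command.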

9. Verify Boot Sequence in BIOS/UEFI

Procedure:
1. Reboot the server and access BIOS/UEFI settings (usually by pressing keys like F2, F10, Del during startup).
2. Ensure the correct boot device (e.g., the primary hard drive) is selected.
Purpose: Incorrect boot order can lead the system to attempt booting from non-bootable devices.
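
From a running or rescue system it is also possible to tell which firmware mode was used to boot, which determines whether the boot order lives in UEFI boot entries or in the legacy BIOS settings. This is a small sketch based on the presence of /sys/firmware/efi; the function name and its optional argument are illustrative. Note that it reports how the currently running environment was booted, which for a live CD/USB may differ from the installed system.

```shell
# Report whether the running system was booted via UEFI or legacy BIOS.
# The optional argument (for testing) overrides the sysfs firmware directory.
firmware_mode() {
    local fw_dir="${1:-/sys/firmware}"
    if [ -d "$fw_dir/efi" ]; then
        echo uefi
    else
        echo bios
    fi
}

# Usage:
#   firmware_mode
#   # On UEFI systems, efibootmgr -v lists the boot entries and their order.
```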
