Server rebooted but not available after maintenance/OS upgrade?

  1. A server was rebooted in a "tick" (a scheduled maintenance window or specific time frame), but after the reboot, it is no longer accessible. What could be causing this issue? Additionally, in the case of a physical server that underwent maintenance and is not booting after the reboot, or a virtual machine (VM) that is not booting properly after an OS upgrade, what steps should be taken to diagnose and resolve the problem? Please provide a detailed explanation of the potential causes and troubleshooting steps for each scenario.

Server Rebooted in a "Tick" or Timer Issue

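  • If "tick" refers to timing rather than a maintenance window, the cause could be a kernel timer (clocksource) problem, a failing system clock (for example, a dead CMOS battery), or hardware timer synchronization issues. These can hang the boot or leave the server unresponsive afterwards. Also rule out the more common boot, filesystem, and configuration problems covered below.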

Quick Troubleshooting Steps

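journalctl -xb -1   # Logs from the previous boot (requires a persistent journal)
journalctl -xb      # Logs from the current boot
dmesg | tail -50    # Most recent kernel messages from the current boot
cat /etc/fstab      # Look for wrong UUIDs/devices or typos in mount entries
findmnt --verify    # Sanity-check /etc/fstab entries (util-linux)
fsck /dev/sdaX      # Check and repair a filesystem; run ONLY on an unmounted partition
grub2-editenv list  # Show the saved GRUB environment (grub-editenv on Debian/Ubuntu)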
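The quick checks above can be collected into one read-only pass. The sketch below is an illustration, not a standard tool: run_check and triage are hypothetical helper names, journalctl -b -1 only returns data if the journal is persistent, and fsck is deliberately left out because it must never run against a mounted filesystem.

```bash
#!/usr/bin/env bash
# Read-only triage sketch (hypothetical helper names). Each check runs only
# if its tool is installed; nothing here changes the system.

# run_check LABEL CMD [ARGS...]: print a header, then run CMD if it exists.
run_check() {
    local label="$1"; shift
    printf '=== %s ===\n' "$label"
    if command -v "$1" >/dev/null 2>&1; then
        # Report a non-zero exit status instead of aborting the whole triage.
        "$@" 2>&1 || printf '(exit status %d)\n' "$?"
    else
        printf 'skipped: %s not found\n' "$1"
    fi
}

triage() {
    run_check "Errors logged during the previous boot" journalctl -b -1 -p err --no-pager
    run_check "Kernel errors and warnings (current boot)" dmesg --level=err,warn
    run_check "fstab consistency" findmnt --verify
    run_check "GRUB environment (RHEL)" grub2-editenv list
    run_check "GRUB environment (Debian/Ubuntu)" grub-editenv list
}

# Run the checks when executed directly; do nothing extra when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    triage
fi
```

Run it as root from a console or rescue shell (bash triage.sh), or source it and call run_check for a single command.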

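Bash script (an annotated outline; run each command individually from a console or rescue shell rather than executing the file as a whole)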

# 1. Access the server console
#    - For physical servers: Connect a monitor and keyboard.
#    - For virtual machines: Use the hypervisor's console access feature.

# 2. Check for hardware issues
#    - Verify physical connections (cables, power supply).
#    - For virtual machines, ensure the VM is powered on and resources are allocated properly.

# 3. Review boot messages
#    - Reboot the server and observe messages for errors.
#    - If the system halts, note the error message displayed.

# 4. Boot into rescue mode
#    - For systems with GRUB bootloader:
#      1. Reboot the server.
#      2. At the GRUB menu, select the desired entry and press 'e' to edit.
#      3. Append 'systemd.unit=rescue.target' to the end of the line starting with 'linux'.
#      4. Press Ctrl+x to boot.
#    - Alternatively, boot from a live CD/USB and choose the rescue mode.

# 5. Check filesystem integrity
#    - Identify the root partition (e.g., /dev/sda1):
lsblk
#    - Run a filesystem check on the UNMOUNTED root partition (replace /dev/sda1;
#      on XFS use xfs_repair instead, because fsck.xfs does nothing):
fsck /dev/sda1

# 6. Reinstall or repair GRUB bootloader
#    - Mount the root filesystem:
mount /dev/sda1 /mnt
#    - If /boot is a separate partition, mount it:
mount /dev/sda2 /mnt/boot
#    - Mount necessary filesystems:
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
#    - Change root into the mounted system:
chroot /mnt
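#    - Reinstall GRUB to the disk, not a partition (BIOS installs; on UEFI, mount
#      the EFI system partition at /mnt/boot/efi before the chroot):
grub-install /dev/sda   # Debian-based systems
grub2-install /dev/sda  # RHEL-based systems
#    - Update GRUB configuration:
update-grub  # For Debian-based systems
grub2-mkconfig -o /boot/grub2/grub.cfg  # For RHEL-based systems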

# 7. Review recent system changes
#    - Check for recent package updates or installations:
less /var/log/dpkg.log  # Debian-based systems
less /var/log/yum.log   # RHEL 7 and earlier
dnf history             # RHEL 8 and later (details also in /var/log/dnf.log)
#    - Undo or reinstall any suspicious packages.

# 8. Examine hardware logs
#    - Check for hardware errors:
dmesg | less
#    - Review system logs:
less /var/log/syslog  # Debian-based systems
less /var/log/messages  # RHEL-based systems

# 9. Verify boot sequence in BIOS/UEFI
#    - Reboot the server and access BIOS/UEFI settings.
#    - Ensure the correct boot device is selected.

# 10. Consult vendor documentation or support
#     - If hardware-specific issues are suspected, refer to the manufacturer's guidelines or contact support.
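Step 8 of the outline pages through dmesg and the system logs by hand. On a busy server it can help to filter first for messages that usually point at disk, filesystem, or memory trouble. In this sketch, the function name scan_hw_errors and the keyword list are assumptions to adapt, not an exhaustive catalogue.

```bash
#!/usr/bin/env bash
# Sketch: scan a log file (or stdin) for messages that commonly indicate
# disk, filesystem or memory problems. The pattern list is an illustrative
# assumption; extend it for your hardware and filesystems.

HW_ERROR_PATTERN='I/O error|Hardware Error|Machine Check|EXT4-fs error|XFS .*(corruption|error)|ata[0-9.]+: .*(failed|error)|Out of memory'

# scan_hw_errors [FILE]: print matching lines with line numbers.
# Reads stdin when FILE is omitted, so `dmesg | scan_hw_errors` works too.
scan_hw_errors() {
    grep -nEi -- "$HW_ERROR_PATTERN" "${1:--}"
}
```

For example: dmesg | scan_hw_errors, scan_hw_errors /var/log/messages, or journalctl -k -b -1 --no-pager | scan_hw_errors for the previous boot. Like grep, it exits with status 1 when nothing matches.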

1. Access the Server Console

  • Physical Servers: Connect directly using a monitor and keyboard, or use the out-of-band management console (e.g., iDRAC, iLO, IPMI), to observe boot processes and interact with the system.

  • Virtual Machines: Use the hypervisor's console (e.g., VMware vSphere, Proxmox) to interact with the VM directly; it works even when the VM's network is down.

2. Check for Hardware Issues

  • Physical Connections: Ensure all cables (power, data) are securely connected and the power supply is functioning.

  • Virtual Machines: Confirm the VM is powered on and has adequate resources (CPU, memory, disk space).
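A full root or /boot filesystem is one resource problem that is easy to miss: it can stop logging and services, or leave a kernel update without a complete initramfs. A minimal check from a rescue shell might look like this; check_disk_usage and the 90% default threshold are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sketch: warn when a filesystem is nearly full.
# check_disk_usage MOUNTPOINT [THRESHOLD_PERCENT]
# Returns 0 below the threshold, 1 at or above it, 2 if usage can't be read.
check_disk_usage() {
    local mnt="$1" limit="${2:-90}" used
    # df -P (POSIX format) prints one line per filesystem;
    # column 5 is the capacity, e.g. "42%".
    used="$(df -P -- "$mnt" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')"
    [[ "$used" =~ ^[0-9]+$ ]] || return 2
    if (( used >= limit )); then
        printf 'WARNING: %s is %s%% full (limit %s%%)\n' "$mnt" "$used" "$limit"
        return 1
    fi
    printf 'OK: %s is %s%% full\n' "$mnt" "$used"
}
```

For example: check_disk_usage / and check_disk_usage /boot 80. Inode exhaustion can cause the same symptoms; df -Pi reports it in the same column layout.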

3. Review Boot Messages

  • Procedure:

    1. Reboot the server.

    2. Observe the boot sequence for any error messages or failures.

  • Purpose: Identifying specific errors during boot can pinpoint issues such as missing files, misconfigurations, or hardware failures.
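Boot messages are often hidden: RHEL adds rhgb quiet and Ubuntu adds quiet splash to the default kernel command line. Deleting those words after pressing 'e' at the GRUB menu makes the messages visible for one boot. The function below only illustrates that edit on a string; show_boot_messages is a hypothetical name, and the token list covers just these common defaults.

```bash
#!/usr/bin/env bash
# Sketch: drop the parameters that hide boot messages from a kernel
# command line. At a real console you delete these words by hand.
show_boot_messages() {
    local -a words out=()
    local word
    read -ra words <<< "$1"          # split on whitespace without globbing
    for word in "${words[@]}"; do
        case "$word" in
            quiet|rhgb|splash) ;;    # skip message-hiding tokens
            *) out+=("$word") ;;
        esac
    done
    printf '%s\n' "${out[*]}"
}
```

To make the change permanent, remove the words from GRUB_CMDLINE_LINUX (RHEL) or GRUB_CMDLINE_LINUX_DEFAULT (Ubuntu) in /etc/default/grub and regenerate the configuration. On RHEL, grubby --update-kernel=ALL --remove-args="rhgb quiet" does the same for installed kernels.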

4. Boot into Rescue Mode

Single-User Mode:

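  • At the GRUB menu, press 'e', find the line that begins with linux, and append single or init=/bin/bash. With init=/bin/bash the root filesystem is normally mounted read-only (because of the ro option); run mount -o remount,rw / before changing any files.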

  • Example:

      linux /vmlinuz-5.4.0-42-generic root=/dev/sda1 single
    
  • This boots the server into a minimal environment for troubleshooting.

  • Rescue Target (systemd-based systems):

    1. Reboot the server.

    2. At the GRUB menu, select the desired entry and press 'e' to edit.

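    3. Append systemd.unit=rescue.target to the end of the linux line (if rescue mode also fails, try systemd.unit=emergency.target, which starts even fewer services).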

    4. Press Ctrl+x to boot.

  • Alternative: Boot from a live CD/USB and select the rescue mode option.

  • Purpose: Rescue mode provides a minimal environment to perform troubleshooting and repairs.
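Both single-user and rescue mode come down to adding one token to the linux line. This sketch applies the same edit to a string and skips the token if it is already present. append_kernel_arg is a hypothetical helper, useful mainly for preparing a custom menu entry or writing down the exact line to type at the console.

```bash
#!/usr/bin/env bash
# Sketch: append one parameter to a GRUB "linux" line unless it is present.
append_kernel_arg() {
    local line="$1" arg="$2"
    # Pad with spaces so only whole tokens match (not substrings).
    if [[ " $line " == *" $arg "* ]]; then
        printf '%s\n' "$line"
    else
        printf '%s %s\n' "$line" "$arg"
    fi
}
```

For example, append_kernel_arg 'linux /vmlinuz-5.4.0-42-generic root=/dev/sda1 ro' systemd.unit=rescue.target prints the same line with the rescue target appended.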

5. Check Filesystem Integrity

  • Identify Root Partition:

      lsblk
    

    This command lists all block devices, helping to identify the root partition (e.g., /dev/sda1).

  • Run Filesystem Check:

      fsck /dev/sda1
      OR
      fsck -y /dev/sda1
    

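    Replace /dev/sda1 with the appropriate partition. Run fsck only on an unmounted filesystem (from rescue mode or live media); checking a mounted filesystem can cause further damage. fsck checks and repairs filesystem inconsistencies, and -y answers "yes" to every repair prompt. On XFS, the RHEL default, fsck does nothing; use xfs_repair instead.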

  • Example Output:

      /dev/sda1: clean, 12345/123456 files, 1234567/12345678 blocks
    
  • If Errors Found: fsck will attempt to repair them.

  • Force Filesystem Check on Reboot:

    • Command:

        touch /forcefsck
      
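    • Explanation: Requests a filesystem check during the next reboot. This is a legacy mechanism that systemd deprecates; on systemd-based distributions, add fsck.mode=force to the kernel command line instead.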
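Because fsck only dispatches to filesystem-specific tools, it helps to choose the tool for the filesystem type explicitly and to refuse when the device is mounted. This sketch prints the command instead of running it, so you can review it first. The function names are hypothetical, and the sketch assumes util-linux (blkid, findmnt) is available, as it is on most rescue media.

```bash
#!/usr/bin/env bash
# Sketch: choose a repair command that matches the filesystem type and refuse
# to touch a mounted filesystem. Commands are printed, not executed.

# repair_cmd_for FSTYPE DEVICE: print the repair command for that type.
repair_cmd_for() {
    case "$1" in
        ext2|ext3|ext4) printf 'e2fsck -f %s\n' "$2" ;;    # -f: check even if marked clean
        xfs)            printf 'xfs_repair %s\n' "$2" ;;   # fsck.xfs does nothing
        btrfs)          printf 'btrfs check %s\n' "$2" ;;  # read-only unless --repair
        vfat)           printf 'fsck.vfat -a %s\n' "$2" ;; # e.g. the EFI system partition
        *)              printf 'unsupported filesystem: %s\n' "$1" >&2; return 1 ;;
    esac
}

# suggest_repair DEVICE: detect the type, confirm it is unmounted, print a command.
suggest_repair() {
    local dev="$1" fstype
    if findmnt --source "$dev" >/dev/null 2>&1; then
        printf '%s is mounted; unmount it or boot rescue media first\n' "$dev" >&2
        return 1
    fi
    fstype="$(blkid -o value -s TYPE "$dev")" || return 1
    repair_cmd_for "$fstype" "$dev"
}
```

For example, suggest_repair /dev/sda1 might print e2fsck -f /dev/sda1. If xfs_repair reports a dirty log, mounting and unmounting the filesystem once replays it. xfs_repair -L discards the log and can lose recent changes, so treat it as a last resort.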

6. Reinstall or Repair GRUB Bootloader

Rescue Mode:

  • Boot from a live CD/DVD or ISO image (e.g., CentOS, Ubuntu).

  • Mount the root filesystem and chroot into it:

  • Mount Filesystems:

      mount /dev/sda1 /mnt
      mount --bind /dev /mnt/dev
      mount --bind /proc /mnt/proc
      mount --bind /sys /mnt/sys
    

    If /boot is a separate partition:

      mount /dev/sda2 /mnt/boot
    
  • Change Root and Reinstall GRUB:

      chroot /mnt
      grub-install /dev/sda
    

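    Replace /dev/sda with the disk (not the partition) the firmware boots from. On RHEL-based systems the command is grub2-install. This applies to BIOS installs; on UEFI systems, also mount the EFI system partition at /mnt/boot/efi before running chroot.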

  • Update GRUB Configuration:

    • For Debian-based systems:

        update-grub
      
    • For RHEL-based systems:

        grub2-mkconfig -o /boot/grub2/grub.cfg
      
  • Purpose: Reinstalling GRUB can resolve issues where the bootloader is corrupted or misconfigured.
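The mount sequence above is easy to get out of order, especially with a separate /boot and an EFI system partition. The sketch below performs it in order and offers a dry-run mode for checking the plan first. run and prepare_chroot are hypothetical helpers, and the device names are whatever you pass.

```bash
#!/usr/bin/env bash
# Sketch: mount an installed system under /mnt and bind the pseudo-filesystems
# that grub-install/grub2-mkconfig expect inside a chroot.
# Set DRY_RUN=1 to print the commands instead of running them.

# run CMD [ARGS...]: execute, or only print when DRY_RUN=1.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# prepare_chroot ROOT_DEV [BOOT_DEV] [EFI_DEV]
prepare_chroot() {
    local root_dev="$1" boot_dev="${2:-}" efi_dev="${3:-}" target=/mnt fs
    run mount "$root_dev" "$target" || return 1
    # A separate /boot must be mounted before /boot/efi, which sits inside it.
    if [[ -n "$boot_dev" ]]; then
        run mount "$boot_dev" "$target/boot" || return 1
    fi
    if [[ -n "$efi_dev" ]]; then
        run mount "$efi_dev" "$target/boot/efi" || return 1
    fi
    for fs in dev proc sys run; do
        run mount --bind "/$fs" "$target/$fs" || return 1
    done
    printf 'Ready: run "chroot %s", then reinstall GRUB.\n' "$target"
}
```

For example, DRY_RUN=1 prepare_chroot /dev/sda1 /dev/sda2 lists the commands; run it again as root without DRY_RUN to execute them. On UEFI systems, grub-install may also need /sys/firmware/efi/efivars inside the chroot. Mounting /sys with --rbind instead of --bind is one way to include it.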

Verify Bootloader Configuration

  1. Check GRUB Configuration:

    • Command:

        cat /boot/grub2/grub.cfg
      
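    • Explanation: Lets you inspect the generated GRUB configuration (on Debian/Ubuntu the file is /boot/grub/grub.cfg). On RHEL 8 and later, individual boot entries live in /boot/loader/entries/ instead.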

    • Common Issues:

      • Missing kernel or initramfs entries.

      • Incorrect root device.

  2. Reinstall GRUB:

    • Command:

        grub2-install /dev/sda
        grub2-mkconfig -o /boot/grub2/grub.cfg
      
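    • Explanation: Reinstalls GRUB and regenerates the configuration file. grub2-install applies to BIOS systems; on RHEL UEFI systems, reinstall the grub2-efi and shim packages instead.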
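A common failure after an upgrade is a grub.cfg that points at a kernel or initramfs that no longer exists. This sketch lists the files referenced on linux and initrd lines and reports any that are missing. Assumptions: check_grub_files is a hypothetical name, paths are either relative to a separate /boot partition or start with /boot/, and BLS entries under /boot/loader/entries/ (RHEL 8 and later) are not parsed.

```bash
#!/usr/bin/env bash
# Sketch: report kernel/initramfs files referenced by grub.cfg that are missing.
# check_grub_files GRUB_CFG [BOOT_DIR]   (returns 1 if anything is missing)
check_grub_files() {
    local cfg="$1" bootdir="${2:-/boot}" missing=0 path
    while read -r path; do
        path="${path#/boot}"   # "/boot/vmlinuz-..." -> "/vmlinuz-..."
        if [[ -e "$bootdir$path" ]]; then
            printf 'ok       %s\n' "$bootdir$path"
        else
            printf 'MISSING  %s\n' "$bootdir$path"
            missing=1
        fi
    done < <(awk '
        # linux lines: field 2 is the kernel; later fields are kernel arguments.
        $1 ~ /^linux(16|efi)?$/  { print $2 }
        # initrd lines may list several images (e.g. microcode + initramfs).
        $1 ~ /^initrd(16|efi)?$/ { for (i = 2; i <= NF; i++) print $i }
    ' "$cfg")
    return "$missing"
}
```

For example: check_grub_files /boot/grub2/grub.cfg on RHEL, or check_grub_files /boot/grub/grub.cfg on Debian/Ubuntu.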

7. Review Recent System Changes

  • Check Recent Package Activities:

    • Debian-based systems:

        less /var/log/dpkg.log
      
    • RHEL-based systems (RHEL 7 and earlier; RHEL 8 and later use /var/log/dnf.log and dnf history):

        less /var/log/yum.log
      
  • Action: Identify and possibly revert recent updates or installations that might have caused the issue.
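To line package changes up with the time of the failed reboot, you can filter dpkg.log down to installs, upgrades, and removals. The sketch below assumes the standard dpkg.log layout (date, time, action, package, versions); recent_dpkg_changes is a hypothetical name. The RHEL equivalents are listed as comments.

```bash
#!/usr/bin/env bash
# Sketch: show recent package installs, upgrades and removals from dpkg.log.
# Expected line layout: DATE TIME ACTION PACKAGE OLD-VERSION NEW-VERSION

# recent_dpkg_changes [LOGFILE] [COUNT]
recent_dpkg_changes() {
    local log="${1:-/var/log/dpkg.log}" count="${2:-20}"
    # Field 3 is the action; "status", "configure" and "trigproc" lines are skipped.
    awk '$3 == "install" || $3 == "upgrade" || $3 == "remove" || $3 == "purge"' "$log" \
        | tail -n "$count"
}

# RHEL-based equivalents:
#   dnf history list        # recent transactions (yum history on RHEL 7)
#   dnf history info <ID>   # packages changed in one transaction
#   dnf history undo <ID>   # roll that transaction back
#   rpm -qa --last | head   # most recently installed packages
```

Older entries are in rotated files such as /var/log/dpkg.log.1, which can be passed as the first argument.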

8. Examine Hardware Logs

  • Check for Hardware Errors:

      dmesg | less
    

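    This command displays kernel ring buffer messages, which can include hardware-related errors. It covers only the current boot; for the boot that failed, use journalctl -k -b -1 (requires a persistent journal).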

  • Review System Logs:

    • Debian-based systems:

        less /var/log/syslog
      
    • RHEL-based systems:

        less /var/log/messages
      
  • Purpose: Logs provide insights into hardware failures, driver issues, or other critical errors affecting boot.

  • Check Kernel and Initramfs

    1. Verify Kernel and Initramfs:

      • Command:

          ls /boot
        
      • Explanation: Lists the kernel (vmlinuz-*) and initramfs files (initrd.img-* on Debian/Ubuntu, initramfs-*.img on RHEL).

      • Example Output:

          vmlinuz-5.4.0-42-generic
          initrd.img-5.4.0-42-generic
        
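      • If Missing: Rebuild the initramfs (RHEL-based; on Debian/Ubuntu use update-initramfs -u):

          dracut --force
        
    2. Rebuild Initramfs:

      • Command:

          mkinitrd -v -f /boot/initramfs-$(uname -r).img $(uname -r)

      • Explanation: Both commands target the running kernel ($(uname -r)). In a rescue or chroot environment that is the rescue kernel, so pass the version listed in /boot explicitly, for example dracut --force /boot/initramfs-<version>.img <version>.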
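Listing /boot by eye works for one or two kernels. With several installed, a loop that pairs each vmlinuz with its initramfs is less error-prone. This sketch assumes the two naming schemes shown above (initramfs-VERSION.img on RHEL, initrd.img-VERSION on Debian/Ubuntu); check_initramfs is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: check that every kernel in a boot directory has a matching initramfs.
#   RHEL/Fedora:   vmlinuz-VER  ->  initramfs-VER.img
#   Debian/Ubuntu: vmlinuz-VER  ->  initrd.img-VER
# check_initramfs [BOOT_DIR]   (returns 1 if any initramfs is missing)
check_initramfs() {
    local bootdir="${1:-/boot}" kernel ver missing=0
    for kernel in "$bootdir"/vmlinuz-*; do
        [[ -e "$kernel" ]] || continue      # glob matched nothing
        ver="${kernel##*/vmlinuz-}"         # strip the path and "vmlinuz-" prefix
        if [[ -e "$bootdir/initramfs-$ver.img" || -e "$bootdir/initrd.img-$ver" ]]; then
            printf 'ok       %s\n' "$ver"
        else
            printf 'MISSING  initramfs for %s\n' "$ver"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: check_initramfs /boot, or check_initramfs /mnt/boot from live media. Any version it reports as missing is a candidate for the rebuild commands above.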
        

9. Verify Boot Sequence in BIOS/UEFI

  • Procedure:

    1. Reboot the server and access BIOS/UEFI settings (usually by pressing keys like F2, F10, Del during startup).

    2. Ensure the correct boot device (e.g., the primary hard drive) is selected.

  • Purpose: Incorrect boot order can lead the system to attempt booting from non-bootable devices.
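On UEFI systems, you can also check the boot order from a running or rescue Linux system with efibootmgr. This helps when the firmware menu is hard to reach remotely. The sketch below assumes efibootmgr is installed; firmware_mode and boot_order_labels are hypothetical names. BIOS systems expose no boot order to the OS, so only the setup screen applies there.

```bash
#!/usr/bin/env bash
# Sketch: inspect the UEFI boot order from Linux.

# firmware_mode: print UEFI if the kernel was booted via UEFI, else BIOS.
firmware_mode() {
    if [[ -d /sys/firmware/efi ]]; then echo UEFI; else echo BIOS; fi
}

# boot_order_labels: read `efibootmgr` output on stdin and print the entries
# in BootOrder sequence, one "ID LABEL" pair per line.
boot_order_labels() {
    awk '
        /^BootOrder:/ { n = split($2, order, ",") }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            id = substr($1, 5, 4)
            label = $0
            sub(/^Boot[0-9A-Fa-f]+\*? */, "", label)   # drop "Boot0001* "
            sub(/\t.*/, "", label)                      # drop device path (-v output)
            labels[id] = label
        }
        END { for (i = 1; i <= n; i++) print order[i], labels[order[i]] }
    '
}
```

For example: efibootmgr | boot_order_labels. If the wrong entry is first, efibootmgr -o 0001,0000 (with your own entry numbers) sets a new order.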

10. Consult Vendor Documentation or Support