Server rebooted but not available after maintenance/OS upgrade?
A server was rebooted in a "tick" (a scheduled maintenance window or specific time frame), but after the reboot, it is no longer accessible. What could be causing this issue? Additionally, in the case of a physical server that underwent maintenance and is not booting after the reboot, or a virtual machine (VM) that is not booting properly after an OS upgrade, what steps should be taken to diagnose and resolve the problem? Please provide a detailed explanation of the potential causes and troubleshooting steps for each scenario.
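Server Rebooted in a 'Tick' or Timer Issue
- If "tick" refers to a timing fault rather than the maintenance window itself, possible causes include a kernel tick-rate issue, a system clock failure (for example, a hardware clock that lost its setting), or hardware timer synchronization problems, any of which might leave the server unresponsive after reboot. In practice, a server that stays unreachable after a scheduled reboot more often has a filesystem, bootloader, initramfs, or hardware problem, which the steps below help isolate.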
Quick Troubleshooting Steps
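journalctl -x -b -1 # Check logs of the previous (failed) boot; requires persistent journaling
dmesg | tail -50 # Review the most recent kernel messages (current boot only)
cat /etc/fstab # Check for errors in partition mounting
fsck /dev/sdaX # Verify and repair filesystem issues (run only on an unmounted filesystem)
grub2-editenv list # Check GRUB boot environment for issues (grub-editenv on Debian-based systems)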
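A stale /etc/fstab entry (for example, a UUID that no longer exists after a disk was replaced during maintenance) is a frequent reason a system drops into emergency mode. The sketch below is a hypothetical helper, `check_fstab`, that flags lines with suspiciously few fields and `UUID=` entries that are missing from a supplied UUID list and not marked `nofail`. It assumes the UUID list comes from `blkid -s UUID -o value` and ignores `LABEL=`/`PARTUUID=` entries.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report fstab entries likely to block boot.
# Usage: check_fstab [FSTAB_FILE] [UUID_LIST_FILE]
#   FSTAB_FILE      defaults to /etc/fstab
#   UUID_LIST_FILE  one UUID per line, e.g. from: blkid -s UUID -o value
check_fstab() {
  local fstab="${1:-/etc/fstab}" uuids="${2:-}"
  awk -v uuidfile="$uuids" '
    BEGIN {
      # Load the known UUIDs, if a list was supplied.
      if (uuidfile != "")
        while ((getline u < uuidfile) > 0) known[u] = 1
    }
    /^[ \t]*#/ || NF == 0 { next }           # skip comments and blank lines
    NF < 4 { printf "line %d: suspicious, only %d fields: %s\n", NR, NF, $0; next }
    uuidfile != "" && $1 ~ /^UUID=/ && ("," $4 ",") !~ /,nofail,/ {
      u = substr($1, 6)
      if (!(u in known)) printf "line %d: UUID %s not found and not marked nofail\n", NR, u
    }
  ' "$fstab"
}
```

Usage: `blkid -s UUID -o value > /tmp/uuids; check_fstab /etc/fstab /tmp/uuids`. Empty output means no obvious problem was found, not that every entry will mount.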
Bash script
# 1. Access the server console
# - For physical servers: Connect a monitor and keyboard.
# - For virtual machines: Use the hypervisor's console access feature.
# 2. Check for hardware issues
# - Verify physical connections (cables, power supply).
# - For virtual machines, ensure the VM is powered on and resources are allocated properly.
# 3. Review boot messages
# - Reboot the server and observe messages for errors.
# - If the system halts, note the error message displayed.
# 4. Boot into rescue mode
# - For systems with GRUB bootloader:
# 1. Reboot the server.
# 2. At the GRUB menu, select the desired entry and press 'e' to edit.
# 3. Append 'systemd.unit=rescue.target' to the kernel line.
# 4. Press Ctrl+x to boot.
# - Alternatively, boot from a live CD/USB and choose the rescue mode.
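# 5. Check filesystem integrity
# - Identify the root partition (e.g., /dev/sda1) and its filesystem type:
lsblk -f
# - Run a filesystem check on the UNMOUNTED partition (e.g., from live media);
#   replace /dev/sda1 with your root partition, and use xfs_repair for XFS:
fsck /dev/sda1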
# 6. Reinstall or repair GRUB bootloader (from live media)
# - Mount the root filesystem:
mount /dev/sda1 /mnt
# - If /boot is a separate partition, mount it:
mount /dev/sda2 /mnt/boot
# - Mount necessary filesystems:
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
# - Change root into the mounted system (the remaining commands run inside the chroot):
chroot /mnt
# - Reinstall GRUB to the disk, not a partition (BIOS systems):
grub-install /dev/sda # Debian-based systems
grub2-install /dev/sda # RHEL-based systems
# - Update GRUB configuration:
update-grub # For Debian-based systems
grub2-mkconfig -o /boot/grub2/grub.cfg # For RHEL-based systems
# 7. Review recent system changes
# - Check for recent package updates or installations:
less /var/log/dpkg.log # Debian-based systems
less /var/log/yum.log # RHEL 7 and earlier
dnf history # RHEL 8 and later
# - Undo or reinstall any suspicious packages.
# 8. Examine hardware logs
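# - Check for hardware errors (dmesg covers the current boot only;
#   journalctl -k -b -1 shows the previous boot if the journal is persistent):
dmesg | less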
# - Review system logs:
less /var/log/syslog # Debian-based systems
less /var/log/messages # RHEL-based systems
# 9. Verify boot sequence in BIOS/UEFI
# - Reboot the server and access BIOS/UEFI settings.
# - Ensure the correct boot device is selected.
# 10. Consult vendor documentation or support
# - If hardware-specific issues are suspected, refer to the manufacturer's guidelines or contact support.
1. Access the Server Console
Physical Servers: Connect directly using a monitor and keyboard to observe boot processes and interact with the system.
Virtual Machines: Utilize the hypervisor's console access (e.g., VMware vSphere, Proxmox) to manage the VM directly.
2. Check for Hardware Issues
Physical Connections: Ensure all cables (power, data) are securely connected and the power supply is functioning.
Virtual Machines: Confirm the VM is powered on and has adequate resources (CPU, memory, disk space).
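For the disk-space part of this check, a full root or /var filesystem can stop services from starting after an upgrade. A minimal sketch, assuming POSIX `df -P` output on standard input; the helper name `full_filesystems` and the 90% default threshold are illustrative choices, not part of any tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list filesystems at or above a usage threshold.
# Reads `df -P` output on stdin; THRESHOLD is a percentage (default 90).
full_filesystems() {
  local threshold="${1:-90}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5; sub(/%/, "", use)            # capacity column, e.g. "97%"
    if (use + 0 >= t + 0) print $6, $5     # mount point and usage
  }'
}
```

Usage: `df -P | full_filesystems 90`. Because `df -Pi` uses the same column positions, `df -Pi | full_filesystems 90` applies the same check to inode usage.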
3. Review Boot Messages
Procedure:
Reboot the server.
Observe the boot sequence for any error messages or failures.
Purpose: Identifying specific errors during boot can pinpoint issues such as missing files, misconfigurations, or hardware failures.
4. Boot into Rescue Mode
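Single-User Mode:
At the GRUB menu, edit the kernel line (the line beginning with linux) and append single or init=/bin/bash.
Example:
linux /vmlinuz-5.4.0-42-generic root=/dev/sda1 single
This boots the server into a minimal environment for troubleshooting. With init=/bin/bash, no services start and the root filesystem is usually mounted read-only; run mount -o remount,rw / before making changes.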
Using GRUB Bootloader:
Reboot the server.
At the GRUB menu, select the desired entry and press 'e' to edit.
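Append systemd.unit=rescue.target to the kernel line.
Press Ctrl+x to boot.
If rescue mode also fails (for example, because a filesystem listed in /etc/fstab cannot be mounted), use systemd.unit=emergency.target instead; it mounts only the root filesystem, read-only.
Alternative: Boot from a live CD/USB and select the rescue mode option.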
Purpose: Rescue mode provides a minimal environment to perform troubleshooting and repairs.
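Once the system is up, it can be worth confirming which minimal mode the edited kernel line actually requested. The hypothetical helper below, `boot_mode`, reads the kernel command line (by default /proc/cmdline) and maps the aliases systemd accepts (`single`, `s`, `S`, `1` for rescue; `emergency`, `-b` for emergency) to a label. It is a sketch for illustration, not a systemd tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which minimal boot mode the kernel command line requested.
# Usage: boot_mode [CMDLINE_FILE]   (defaults to /proc/cmdline)
boot_mode() {
  local arg mode=normal
  local -a args
  read -r -a args < "${1:-/proc/cmdline}"
  for arg in "${args[@]}"; do
    case "$arg" in
      systemd.unit=rescue.target|rescue|single|s|S|1) mode=rescue ;;
      systemd.unit=emergency.target|emergency|-b)    mode=emergency ;;
      init=/bin/bash|init=/bin/sh)                   mode=init-shell ;;
    esac
  done
  echo "$mode"
}
```

Usage: `boot_mode` prints `rescue`, `emergency`, `init-shell`, or `normal`. If it prints `normal` after you edited the GRUB entry, the edit was probably applied to a different menu entry.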
5. Check Filesystem Integrity
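Identify Root Partition:
lsblk -f
This command lists all block devices and their filesystem types, helping to identify the root partition (e.g., /dev/sda1).
Run Filesystem Check:
fsck /dev/sda1 OR fsck -y /dev/sda1
Replace /dev/sda1 with the appropriate partition. The fsck tool checks and repairs filesystem inconsistencies; -y answers "yes" to every repair prompt. Run it only on an unmounted filesystem, for example from live media. On XFS, fsck generally does nothing; use xfs_repair /dev/sda1 instead.
Example Output:
/dev/sda1: clean, 12345/123456 files, 1234567/12345678 blocks
If Errors Found: fsck will attempt to repair them.
Force Filesystem Check on Reboot:
Command:
touch /forcefsck
Explanation: Requests a filesystem check during the next reboot. Newer systemd-based distributions may ignore this flag file; adding fsck.mode=force to the kernel command line is the supported alternative.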
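Since the right repair tool depends on the filesystem type, a small lookup can prevent running the wrong one. The sketch below is a hypothetical helper, `repair_command`, that only prints (never runs) a suggested command, warns if the device appears mounted (via `findmnt -S`), and falls back to `lsblk -no FSTYPE` when no type is given. The mapping covers only ext2/3/4, XFS, Btrfs, and vfat and is an assumption to adapt.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a repair command suited to a filesystem.
# Usage: repair_command DEVICE [FSTYPE]   (FSTYPE defaults to what lsblk reports)
repair_command() {
  local dev="$1" fstype="${2:-}"
  [[ -n "$fstype" ]] || fstype="$(lsblk -no FSTYPE "$dev" 2>/dev/null)"
  if findmnt -rn -S "$dev" >/dev/null 2>&1; then
    echo "warning: $dev is mounted; unmount it (or boot live media) first" >&2
  fi
  case "$fstype" in
    ext2|ext3|ext4) echo "e2fsck -f $dev" ;;
    xfs)            echo "xfs_repair $dev" ;;
    btrfs)          echo "btrfs check $dev" ;;   # read-only by default
    vfat)           echo "fsck.vfat -a $dev" ;;  # e.g. the EFI system partition
    *)              echo "no rule for filesystem '$fstype' on $dev" >&2; return 1 ;;
  esac
}
```

Usage: `repair_command /dev/sda1`. Note that xfs_repair may ask you to mount and unmount the filesystem first to replay its log; its `-L` option zeroes the log and can discard recent metadata changes, so treat it as a last resort.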
6. Reinstall or Repair GRUB Bootloader
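Rescue Mode:
Boot from a live CD/DVD or ISO image (e.g., CentOS, Ubuntu).
Mount the root filesystem and chroot into it:
Mount Filesystems:
mount /dev/sda1 /mnt
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
If /boot is a separate partition:
mount /dev/sda2 /mnt/boot
Change Root and Reinstall GRUB:
chroot /mnt
grub-install /dev/sda
Replace /dev/sda with the appropriate disk (the whole disk, not a partition). On RHEL-based systems the command is grub2-install. On UEFI systems, reinstalling the distribution's GRUB EFI packages (for example, grub2-efi and shim on RHEL) is generally preferred over running grub2-install.
Update GRUB Configuration: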
For Debian-based systems:
update-grub
For RHEL-based systems:
grub2-mkconfig -o /boot/grub2/grub.cfg
Purpose: Reinstalling GRUB can resolve issues where the bootloader is corrupted or misconfigured.
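The mount sequence above can be wrapped so it is easy to review before running. A minimal sketch, assuming the target is mounted under `/mnt` (overridable via `MNT`) and that an optional EFI system partition belongs at `/mnt/boot/efi`; `prepare_chroot`, `run`, and the `DRY_RUN` switch are hypothetical names introduced here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mount an installed system under $MNT so it can be chrooted into.
# Usage: prepare_chroot ROOT_DEV [BOOT_DEV] [EFI_DEV]
# Set DRY_RUN=1 to print the commands instead of executing them.
MNT="${MNT:-/mnt}"

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

prepare_chroot() {
  local root_dev="$1" boot_dev="${2:-}" efi_dev="${3:-}" fs
  run mount "$root_dev" "$MNT" || return 1
  if [[ -n "$boot_dev" ]]; then run mount "$boot_dev" "$MNT/boot" || return 1; fi
  if [[ -n "$efi_dev" ]]; then run mount "$efi_dev" "$MNT/boot/efi" || return 1; fi
  for fs in dev proc sys; do
    run mount --bind "/$fs" "$MNT/$fs" || return 1
  done
  echo "next: chroot $MNT" >&2
}
```

Usage: `DRY_RUN=1 prepare_chroot /dev/sda1 /dev/sda2` to preview, then run it without `DRY_RUN`, `chroot /mnt`, perform the GRUB steps, `exit`, and clean up with `umount -R /mnt`.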
Verify Bootloader Configuration
Check GRUB Configuration:
Command:
cat /boot/grub2/grub.cfg
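Explanation: Lets you confirm the GRUB configuration is correct. On Debian-based systems the file is /boot/grub/grub.cfg. On RHEL 8 and later, kernel entries are usually stored as Boot Loader Specification files in /boot/loader/entries rather than in grub.cfg; grubby --info=ALL lists them.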
Common Issues:
Missing kernel or initramfs entries.
Incorrect root device.
Reinstall GRUB:
Command:
grub2-install /dev/sda
grub2-mkconfig -o /boot/grub2/grub.cfg
Explanation: Reinstalls GRUB and regenerates the configuration file.
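To spot missing kernel or initramfs entries quickly, the menu titles and the `linux`/`initrd` lines can be pulled out of grub.cfg. A rough sketch, assuming a classic grub.cfg with `menuentry` blocks; `grub_summary` is a hypothetical name, submenu titles are not listed, and on BLS-based systems it only prints a note pointing at /boot/loader/entries.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list menu entries and kernel/initrd paths found in a grub.cfg.
# Usage: grub_summary [GRUB_CFG]   (defaults to /boot/grub2/grub.cfg)
grub_summary() {
  local cfg="${1:-/boot/grub2/grub.cfg}"
  awk -v q="'" '
    $1 == "menuentry" {
      title = $0
      sub("^[ \t]*menuentry[ \t]+[\"" q "]", "", title)   # drop keyword and opening quote
      sub("[\"" q "].*", "", title)                       # drop everything after the title
      print "entry:  " title
    }
    $1 ~ /^linux(16|efi)?$/  { print "kernel: " $2 }
    $1 ~ /^initrd(16|efi)?$/ { print "initrd: " $2 }
    $1 == "blscfg" { print "note: entries come from /boot/loader/entries (see grubby --info=ALL)" }
  ' "$cfg"
}
```

Usage: `grub_summary /boot/grub/grub.cfg` on Debian-based systems. Compare the printed kernel and initrd paths against `ls /boot`; note that paths are relative to the partition holding /boot.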
7. Review Recent System Changes
Check Recent Package Activities:
Debian-based systems:
less /var/log/dpkg.log
RHEL-based systems (RHEL 7 and earlier):
less /var/log/yum.log
RHEL-based systems (RHEL 8 and later):
dnf history
Action: Identify and possibly revert recent updates or installations that might have caused the issue.
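To narrow the Debian log down to the maintenance window, the package actions can be filtered by date. A minimal sketch, assuming the usual dpkg.log line layout (`DATE TIME ACTION PACKAGE OLD-VERSION NEW-VERSION`); `dpkg_changes_since` is a hypothetical helper, and rotated logs such as dpkg.log.1 must be passed separately.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list package changes in a dpkg log since a given date.
# Usage: dpkg_changes_since YYYY-MM-DD [LOG]   (LOG defaults to /var/log/dpkg.log)
dpkg_changes_since() {
  local since="$1" log="${2:-/var/log/dpkg.log}"
  # ISO dates compare correctly as strings, so a plain >= works.
  awk -v since="$since" '
    $1 >= since && ($3 == "install" || $3 == "upgrade" || $3 == "remove" || $3 == "purge") {
      print $1, $2, $3, $4, $5, $6
    }' "$log"
}
```

Usage: `dpkg_changes_since 2020-08-01`. Kernel, bootloader (grub*), and initramfs-related packages in the output are the first candidates to inspect or roll back.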
8. Examine Hardware Logs
Check for Hardware Errors:
dmesg | less
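This command displays kernel ring buffer messages for the current boot, which can include hardware-related errors. For the previous (failed) boot, use journalctl -k -b -1, which requires persistent journaling.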
Review System Logs:
Debian-based systems:
less /var/log/syslog
RHEL-based systems:
less /var/log/messages
Purpose: Logs provide insights into hardware failures, driver issues, or other critical errors affecting boot.
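Scrolling through full logs is slow, so a first pass can pull out lines matching common disk, filesystem, hardware, and mount failures. A rough sketch; `scan_boot_errors` is a hypothetical helper and its patterns are a starting point chosen for illustration, not an exhaustive or authoritative list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show lines that match common disk, filesystem, hardware and mount errors.
# Usage: scan_boot_errors [LOG...]   (reads stdin when no file is given)
scan_boot_errors() {
  grep -Ein \
    -e 'I/O error' \
    -e 'EXT4-fs error' \
    -e 'XFS .*(corrupt|error)' \
    -e 'Hardware Error' \
    -e 'machine check' \
    -e 'Failed to mount' \
    -e 'Dependency failed' \
    -e 'Kernel panic' \
    "$@"
}
```

Usage: `scan_boot_errors /var/log/messages` or `journalctl -b -1 --no-pager | scan_boot_errors`. The line numbers in the output make it easy to open the surrounding context in `less`.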
Check Kernel and Initramfs
Verify Kernel and Initramfs:
Command:
ls /boot
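Explanation: Lists the kernel (vmlinuz) and initramfs (initrd.img-<version> on Debian-based systems, initramfs-<version>.img on RHEL-based systems) files. Each kernel should have a matching initramfs.
Example Output:
vmlinuz-5.4.0-42-generic initrd.img-5.4.0-42-generic
If Missing or Corrupted: Rebuild the initramfs:
dracut --force
Rebuild Initramfs:
Command:
mkinitrd -v -f /boot/initramfs-$(uname -r).img $(uname -r)
Both commands above (RHEL-based systems) target the running kernel. From a live-media chroot, uname -r reports the rescue kernel instead, so pass the installed version explicitly, e.g. dracut --force /boot/initramfs-<version>.img <version>. On Debian-based systems, use update-initramfs -c -k <version> (or -u to update an existing image).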
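Checking that every kernel has a matching initramfs is easy to script. A minimal sketch, assuming the naming conventions described above (RHEL `initramfs-<version>.img`, Debian `initrd.img-<version>`); `missing_initramfs` is a hypothetical helper that only checks presence, not whether an image is intact.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report kernel versions in a boot directory that lack a matching initramfs.
# Handles RHEL naming (initramfs-VER.img) and Debian naming (initrd.img-VER).
# Usage: missing_initramfs [BOOT_DIR]   (defaults to /boot)
missing_initramfs() {
  local boot="${1:-/boot}" k ver
  for k in "$boot"/vmlinuz-*; do
    [[ -e "$k" ]] || continue              # glob matched nothing
    ver="${k##*/vmlinuz-}"
    if [[ ! -e "$boot/initramfs-$ver.img" && ! -e "$boot/initrd.img-$ver" ]]; then
      echo "$ver"
    fi
  done
}
```

Usage: from a chroot, `missing_initramfs /boot`; each printed version can then be rebuilt explicitly, e.g. `dracut --force /boot/initramfs-<version>.img <version>`, rather than relying on `uname -r`.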
9. Verify Boot Sequence in BIOS/UEFI
Procedure:
Reboot the server and access BIOS/UEFI settings (usually by pressing keys like F2, F10, Del during startup).
Ensure the correct boot device (e.g., the primary hard drive) is selected.
Purpose: Incorrect boot order can lead the system to attempt booting from non-bootable devices.
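On UEFI systems that can still reach a rescue shell, the firmware boot order can also be read with `efibootmgr` instead of entering the setup screen. A rough sketch that turns `efibootmgr` output into an ordered list; `efi_boot_order` is a hypothetical helper, and it assumes the label is followed by a tab when the device path is printed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print UEFI boot entries in firmware boot order.
# Reads `efibootmgr` output from stdin or from the files given as arguments.
efi_boot_order() {
  awk '
    /^BootOrder:/ { n = split($2, order, ","); next }
    /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
      id = substr($0, 5, 4)
      label = substr($0, 9)
      sub(/^\*?[ \t]*/, "", label)   # drop the active marker
      sub(/\t.*/, "", label)         # drop the device path, if printed
      names[id] = label
    }
    END {
      for (i = 1; i <= n; i++) {
        name = (order[i] in names) ? names[order[i]] : "(no such entry)"
        print order[i], name
      }
    }
  ' "$@"
}
```

Usage: `efibootmgr | efi_boot_order`. If the installed OS is not first, or an entry shows as `(no such entry)`, `efibootmgr -o` can set a new order from within the OS.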