1597704 : batch1500 & batch1600 do not boot into PXE for reinstallation

Created: 2026-05-03T14:59:07Z - current status: new


Summary of the Issue

A user reports that multiple machines ([MACHINE_ID_1]: batch1500, [MACHINE_ID_2]: batch1600) fail to boot into the PXE network installer even though PXE boot was explicitly configured via IPMI. The problems are as follows:

  1. [MACHINE_ID_1] (batch1500):
     • The PXE boot process hangs during connection establishment (as shown in attached screenshots, not provided here).
     • No further progress is observed.

  2. [MACHINE_ID_2] (batch1600):
     • Machines do not receive the PXE image and proceed to disk boot instead.
     • Manual PXE boot via console works, but this was previously unnecessary (e.g., in early April, the process worked automatically).

The user is manually installing [MACHINE_ID_2] as a temporary workaround but requests investigation into the root cause.


Possible Solutions/Next Steps

  1. Verify PXE Server Configuration:
     • Check if the PXE server (e.g., DHCP/TFTP) is correctly configured to serve the required boot images for these machine types.
     • Ensure the MAC addresses of the affected machines are whitelisted or properly registered in the PXE server’s configuration.
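The MAC registration check can be sketched as a small script. This is only an illustration and assumes the DHCP side is ISC dhcpd with per-host `hardware ethernet` statements; the ticket does not say which server software is in use. The function names and the sample addresses are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Matches ISC dhcpd statements such as "hardware ethernet aa:bb:cc:dd:ee:ff;"
_HW_RE = re.compile(
    r"hardware\s+ethernet\s+([0-9A-Fa-f]{1,2}(?::[0-9A-Fa-f]{1,2}){5})\s*;"
)


def normalize_mac(mac: str) -> str:
    """Return the MAC as lowercase, colon-separated, zero-padded octets."""
    parts = re.split(r"[:-]", mac.strip())
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(f"{int(p, 16):02x}" for p in parts)


def registered_macs(conf_text: str) -> Set[str]:
    """Collect every MAC named in a 'hardware ethernet' statement."""
    macs = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # ignore commented-out entries
        for match in _HW_RE.finditer(line):
            macs.add(normalize_mac(match.group(1)))
    return macs


def missing_macs(conf_text: str, wanted: Iterable[str]) -> List[str]:
    """Return the wanted MACs that have no host entry in the config."""
    known = registered_macs(conf_text)
    return [m for m in wanted if normalize_mac(m) not in known]
```

Passing the contents of the DHCP configuration and the MAC addresses of the affected machines to `missing_macs` returns the ones without a host entry. An empty result does not prove the entry is correct (boot filename and next-server still need a manual look).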

  2. Network Connectivity:
     • Confirm that the machines can reach the PXE server (e.g., via ping or traceroute from another host on the same network).
     • Check for firewall rules or VLAN misconfigurations that might block PXE traffic (UDP ports 67 and 68 for DHCP, 69 for TFTP, and 4011 for proxyDHCP).
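Because ping only shows ICMP reachability, a TFTP read request sent from another host on the same network gives a more direct signal for port 69. The following is a minimal sketch built on the TFTP packet layout from RFC 1350; the server address and the boot file name (`pxelinux.0` in the usage note) are placeholders, not values from the ticket.

```python
import socket
import struct

TFTP_PORT = 69
OP_RRQ, OP_DATA, OP_ERROR = 1, 3, 5  # opcodes defined in RFC 1350


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", OP_RRQ)
        + filename.encode("ascii") + b"\0"
        + mode.encode("ascii") + b"\0"
    )


def classify_reply(packet: bytes) -> str:
    """Classify the first reply packet from the server."""
    if len(packet) < 4:
        return "malformed"
    opcode = struct.unpack("!H", packet[:2])[0]
    if opcode == OP_DATA:
        return "data"    # the server started sending the file
    if opcode == OP_ERROR:
        return "error"   # the server answered, e.g. file not found
    return "other"


def probe_tftp(server: str, filename: str, timeout: float = 3.0) -> str:
    """Send one read request and report what, if anything, came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, TFTP_PORT))
        try:
            # The reply arrives from an ephemeral server port, not port 69.
            packet, _ = sock.recvfrom(1024)
        except socket.timeout:
            return "timeout"
    return classify_reply(packet)
```

A call such as `probe_tftp("pxe-server.example", "pxelinux.0")` returning `"data"` or `"error"` shows that the TFTP service answers from that network segment, while `"timeout"` points towards filtering or a stopped service. The probe never acknowledges the data block, so the server simply times out the transfer on its side. Note that the reply comes from an ephemeral port, so a firewall that only opens port 69 can still break the transfer.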

  3. IPMI-Specific Checks:
     • Verify that the IPMI settings for PXE boot are correctly applied (e.g., "Boot Device" set to "PXE" and "Persistent" enabled).
     • Test with a cold reset of the BMC (e.g., ipmitool mc reset cold) to rule out BMC firmware issues.
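The IPMI checks above can be scripted so that the same commands are applied to every affected machine. The sketch below only assembles `ipmitool` command lines; the host and user names are placeholders, the password is expected in the `IPMI_PASSWORD` environment variable (the `-E` option), and the `efiboot` option is an assumption that only applies if the machines boot in UEFI mode.

```python
import subprocess
from typing import List


def ipmi_base(host: str, user: str) -> List[str]:
    """Common ipmitool arguments; -E reads the password from IPMI_PASSWORD."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E"]


def set_pxe_cmd(host: str, user: str,
                persistent: bool = True, efi: bool = False) -> List[str]:
    """Command that sets the boot device to PXE, optionally persistently."""
    cmd = ipmi_base(host, user) + ["chassis", "bootdev", "pxe"]
    options = []
    if persistent:
        options.append("persistent")
    if efi:
        options.append("efiboot")  # assumption: UEFI boot mode is in use
    if options:
        cmd.append("options=" + ",".join(options))
    return cmd


def get_boot_flags_cmd(host: str, user: str) -> List[str]:
    """Command that reads back boot parameter 5 (the boot flags)."""
    return ipmi_base(host, user) + ["chassis", "bootparam", "get", "5"]


def run(cmd: List[str]) -> str:
    """Run a command and return its standard output; raises on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Running `run(set_pxe_cmd(host, user))` followed by `run(get_boot_flags_cmd(host, user))` shows whether the BMC actually stored the PXE setting. If the flags read back correctly but the machine still boots from disk, the cause is more likely on the BIOS/UEFI or network side than in the IPMI request itself.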

  4. Firmware/BIOS Updates:
     • Check if the machines’ BIOS/UEFI firmware is up to date, as older versions may have PXE compatibility issues.

  5. Logs and Debugging:
     • Review PXE server logs (e.g., /var/log/syslog or DHCP server logs) for errors related to the affected machines.
     • Capture packet traces (e.g., tcpdump) on the PXE server during boot attempts to diagnose connection drops.
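Both debugging steps can be narrowed to a single machine by its MAC address. The following sketch filters log lines for one MAC and assembles a `tcpdump` command for the same machine; the interface name, the output file, and the sample log lines in the usage note are made up for illustration. Filtering on `ether host` assumes the client is on the same layer-2 segment as the server; behind a DHCP relay the source MAC on the wire is the relay's.

```python
from typing import Iterable, List


def lines_for_mac(lines: Iterable[str], mac: str) -> List[str]:
    """Return the log lines that mention the MAC (case-insensitive)."""
    needle = mac.lower()
    return [line for line in lines if needle in line.lower()]


def capture_filter(mac: str) -> str:
    """pcap filter for all UDP traffic to or from one MAC address.

    Matching all UDP instead of fixed ports also catches the TFTP data
    transfer, which moves to ephemeral ports after the initial request.
    """
    return f"ether host {mac.lower()} and udp"


def tcpdump_cmd(interface: str, mac: str, outfile: str) -> List[str]:
    """tcpdump command that writes the filtered capture to a file."""
    return ["tcpdump", "-n", "-i", interface, "-w", outfile,
            capture_filter(mac)]
```

For example, `lines_for_mac(open("/var/log/syslog"), mac)` lists the DHCP exchanges the server logged for that machine, and the list returned by `tcpdump_cmd("eth0", mac, "pxe-boot.pcap")` can be passed to `subprocess.run` on the PXE server while the machine is power-cycled. A capture with no packets from the machine at all suggests the request never reaches the server, which matches the symptom of falling through to disk boot.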

  6. Temporary Workaround:
     • If manual PXE boot via console works, document the steps for the user to use until the issue is resolved.

References

  • No direct context from the provided NAF documentation was applicable to this specific PXE boot issue. The problem appears to be related to network boot infrastructure rather than NAF-specific services (e.g., CVMFS or job submission).
  • For further debugging, consult the documentation of the PXE server software (e.g., ISC DHCP + TFTP) or the machine vendor’s IPMI/BIOS guides.