1596802 : naf-jhub - Server spawn failed

Created: 2026-04-29T05:29:33Z - current status: new

Anonymized Summary: A user reports persistent issues when attempting to start a Jupyter notebook server on the NAF platform. The error message indicates a timeout after 60 seconds: "Spawn failed: Server at http://[WORKER_NODE]:40000/user/[USERNAME]/api didn't respond in 60 seconds".

Attempts to resolve the issue by retrying with different configurations (e.g., GPU/non-GPU) have been unsuccessful.


Possible Causes & Solutions:

Based on the provided context, the issue may stem from one of the following:

  1. Deadlocked Notebook Entry in the Hub

     The user’s notebook state may be stuck in the JupyterHub database because of a race condition or a failed route deletion.

     Solution:

    • Use the "Stop My Server" button in the JupyterHub GUI to force removal of the obsolete entry.
    • If the button does not clear the entry, admin intervention may be required (e.g., account reset or hub restart). The user can contact NAF support for assistance.

  2. Disk Quota Exceeded

     The error log .jupyterhub.condor.err (located in the user’s $HOME directory) may show: Failed to write server-info to [PATH]: OSError(122, 'Disk quota exceeded')

     Solution:

    • Check disk usage (fs lq or quota) and clean up unnecessary files in $HOME.
    • Request a quota increase if needed.

  3. Unresponsive CVMFS Mounts

     A known issue (December 2024) in which unresponsive CVMFS mounts prevented the notebook from reporting its port to the hub, leading to timeouts.

     Solution:

    • Retry spawning after some time. The healthcheck bug has been fixed, and faulty nodes should now be removed automatically.

  4. Local Jupyter Configuration Issue

     A misconfigured setting in .jupyter/jupyter_server_config.json (e.g., open_browser = False) may block the hub connection.

     Solution:

    • Check the file and remove the problematic entries. JSON has no comment syntax, so keep a backup copy of the file rather than commenting entries out.

  5. Worker Node Issues

     The specific worker node (batchj004.desy.de) may be unresponsive or overloaded.

     Solution:

    • Wait and retry later, or contact NAF admins to investigate the node.
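
For the disk-quota case, it can help to see which top-level entries in $HOME take the most space before cleaning up. The following is a minimal Python sketch, not an NAF tool: the helper names dir_size and largest_entries are hypothetical, and the script sums apparent file sizes, which may differ from what the quota system (fs lq or quota) actually accounts.

```python
import os


def dir_size(path):
    """Sum the apparent sizes of all files below path (symlinks are not followed)."""
    total = 0
    # onerror swallows unreadable directories instead of aborting the walk
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished or is unreadable; skip it
                pass
    return total


def largest_entries(base, top=10):
    """Return up to `top` (size_in_bytes, name) pairs for the entries directly in base."""
    sizes = []
    for entry in os.scandir(base):
        try:
            if entry.is_dir(follow_symlinks=False):
                size = dir_size(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
        except OSError:
            continue
        sizes.append((size, entry.name))
    # Largest first
    sizes.sort(reverse=True)
    return sizes[:top]
```

Calling largest_entries(os.path.expanduser("~")) and printing the result gives a rough ranking of what to clean up first. The authoritative numbers remain those reported by the quota tools.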

Recommended Steps:

  1. Check Logs:
     Review .jupyterhub.condor.err and .jupyterhub.condor.out in $HOME for detailed error messages. Look for "Disk quota exceeded" or CVMFS-related errors.

  2. Force Server Removal:
     Use the "Stop My Server" button in the JupyterHub GUI to clear stale entries.

  3. Clean Up Disk Space:
     Free up space in $HOME if the quota is exceeded.

  4. Retry Later:
     If the issue persists, wait and retry, as it may be related to temporary node or CVMFS problems.

  5. Contact Support:
     If none of the above works, reach out to NAF admins for further diagnosis.
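
The log check can be done by hand, but a short script makes it repeatable. This is a minimal Python sketch under stated assumptions: the function name scan_log is hypothetical, "Disk quota exceeded" is the message quoted in this ticket, and matching the substring "cvmfs" case-insensitively is only an assumed heuristic for CVMFS-related lines.

```python
from pathlib import Path

# Lower-case substrings to search for. The first comes from the ticket;
# "cvmfs" is an assumed marker for CVMFS-related messages.
PATTERNS = ("disk quota exceeded", "cvmfs")


def scan_log(path, patterns=PATTERNS):
    """Return (line_number, line) pairs for lines containing any pattern."""
    hits = []
    log = Path(path)
    if not log.is_file():
        # A missing log is not an error; there is simply nothing to report
        return hits
    with log.open(errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            lowered = line.lower()
            if any(pat in lowered for pat in patterns):
                hits.append((lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Log file names as given in the ticket, located in $HOME
    for name in (".jupyterhub.condor.err", ".jupyterhub.condor.out"):
        for lineno, line in scan_log(Path.home() / name):
            print(f"{name}:{lineno}: {line}")
```

An empty result does not rule out the other causes listed above. It only means that neither pattern appears in the two log files.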

Sources:
  - NAF Documentation: Notebook Spawning Issues
  - JupyterHub BatchSpawner Issue #314