At Bright Computing, Inc. you can ask and answer questions and share your experience with others!
When I use horovod-pytorch-py36-cuda10.1-gcc/0.17.1, I get an error message: [zack@node01 examples]$ horovodrun -n 4 ... (MPI) or CMake is installed (Gloo).
How can I prevent users from running big jobs on the cluster headnode and force them to use only the slurm batch system?
How can I remove nodes from a queue and add them to a new one? I've tried to do it in BrightView under Job Queues, but ... commands? I am currently using Slurm.
I am seeing failing health checks for our "schedulers" on 3 of 4 compute nodes and only 1 job out of over 60 running on ... compute nodes at full capacity? Thanks