NVIDIA
Asked a question 8 months ago

How do I run Docker and Kubernetes on DGX nodes that use RHEL/CentOS?

ianm
Support & Deployment Engineer

This assumes that the official DGX packages for RHEL have been installed (see this community answer for further details).

Docker

The RHEL / CentOS Docker packages set the cgroup driver to systemd in the Docker service unit file. Change it to cgroupfs in the DGX software image:

# sed -i 's/native.cgroupdriver=systemd/native.cgroupdriver=cgroupfs/' /cm/images/dgx-image/usr/lib/systemd/system/docker.service
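To confirm that the substitution took effect, the edit can be wrapped in a small check. This is only a sketch: the function name set_cgroupfs_driver is hypothetical, and it assumes GNU sed (for -i), which is what RHEL / CentOS ship.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bright or Docker): switch the cgroup driver
# from systemd to cgroupfs in the given unit file, then confirm the result.
set_cgroupfs_driver() {
    unit_file=$1
    if [ ! -f "$unit_file" ]; then
        echo "unit file not found: $unit_file" >&2
        return 1
    fi
    # Same substitution as the sed command in the answer (GNU sed assumed).
    sed -i 's/native.cgroupdriver=systemd/native.cgroupdriver=cgroupfs/' "$unit_file"
    # Succeed only if the cgroupfs setting is now present in the file.
    if grep -q 'native.cgroupdriver=cgroupfs' "$unit_file"; then
        echo "cgroup driver set to cgroupfs in $unit_file"
    else
        echo "no native.cgroupdriver option found in $unit_file" >&2
        return 1
    fi
}

# Example call, using the image path from the answer:
# set_cgroupfs_driver /cm/images/dgx-image/usr/lib/systemd/system/docker.service
```

A non-zero return means the unit file is missing or carries no native.cgroupdriver option at all, in which case the file should be inspected by hand.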

Create a temporary directory for Docker images:

# mkdir -p /cm/images/dgx-image/var/lib/docker/tmp
# chmod 700 /cm/images/dgx-image/var/lib/docker/tmp
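The two commands above can also be run against any image root and verified in one go. A minimal sketch, assuming GNU stat (coreutils) is available; the function name make_docker_tmp is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create var/lib/docker/tmp under a software image root
# and confirm it ended up with mode 700.
make_docker_tmp() {
    image_root=$1
    tmp_dir="$image_root/var/lib/docker/tmp"
    mkdir -p "$tmp_dir" || return 1
    chmod 700 "$tmp_dir" || return 1
    # stat -c '%a' prints the octal permission bits (GNU coreutils assumed).
    mode=$(stat -c '%a' "$tmp_dir")
    if [ "$mode" != "700" ]; then
        echo "unexpected mode $mode on $tmp_dir" >&2
        return 1
    fi
    echo "$tmp_dir ready (mode $mode)"
}

# Example call, using the image path from the answer:
# make_docker_tmp /cm/images/dgx-image
```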

Test Docker:

# docker run --security-opt label=type:nvidia_container_t --rm nvcr.io/nvidia/cuda:10.0-runtime nvidia-smi

Kubernetes

  • Install Docker on the head node:
# yum install -y cm-docker
  • Change the partitioning in the DGX category to use the Docker thin pool.
  • Install Kubernetes with defaults:
# cm-kubernetes-setup --skip-docker

Test Kubernetes:

# module load kubernetes/default/1.12.10
# kubectl get pod --all-namespaces
# cat testgpu.yaml
apiVersion: v1
kind: Pod
metadata:
 name: gpu-pod
spec:
 restartPolicy: Never
 containers:
 - name: cuda-container
   image: nvidia/cuda:9.2-runtime
   command: ["nvidia-smi"]
   resources:
     limits:
       nvidia.com/gpu: 8

# kubectl apply -f testgpu.yaml
# kubectl logs gpu-pod
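
Running kubectl logs immediately after kubectl apply can return nothing if the pod is still being scheduled or the image is still being pulled. One way to handle this is to poll the pod phase first. The sketch below is an assumption-laden example, not part of the Bright tooling: wait_for_pod is a hypothetical name, and the 5-second poll interval and 120-second default timeout are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: poll a pod's phase until it finishes.
# Returns 0 on Succeeded, 1 on Failed, 2 on timeout.
wait_for_pod() {
    pod=$1
    timeout=${2:-120}   # seconds; arbitrary default
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # .status.phase is one of Pending, Running, Succeeded, Failed, Unknown.
        phase=$(kubectl get pod "$pod" -o jsonpath='{.status.phase}' 2>/dev/null)
        case "$phase" in
            Succeeded) return 0 ;;
            Failed) echo "pod $pod failed" >&2; return 1 ;;
        esac
        sleep 5
        elapsed=$((elapsed + 5))
    done
    echo "timed out waiting for pod $pod" >&2
    return 2
}

# Example call for the test pod defined above:
# wait_for_pod gpu-pod 300 && kubectl logs gpu-pod
```

Because the pod uses restartPolicy: Never and runs a single nvidia-smi command, it should reach the Succeeded phase once the command exits cleanly.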