This assumes that the official DGX packages for RHEL have been installed (see this community answer for further details).
Docker
The RHEL / CentOS Docker packages set the cgroup driver to systemd in the Docker service unit file. Change it to cgroupfs, Docker's native cgroup driver, in the DGX software image:
# sed -i 's/native.cgroupdriver=systemd/native.cgroupdriver=cgroupfs/' /cm/images/dgx-image/usr/lib/systemd/system/docker.service
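To confirm that the substitution took effect, the unit file can be inspected afterwards. The following is a minimal sketch, not part of the original procedure; the helper name cgroup_driver_of is hypothetical, and it assumes the driver is set through a native.cgroupdriver= option somewhere in the unit file:

```shell
# Hypothetical helper: print the cgroup driver configured in a Docker unit file.
# Prints nothing if the native.cgroupdriver option is not present.
cgroup_driver_of() {
    sed -n 's/.*native\.cgroupdriver=\([a-z]*\).*/\1/p' "$1"
}

# Path taken from the sed command above.
unit=/cm/images/dgx-image/usr/lib/systemd/system/docker.service
if [ -f "$unit" ]; then
    echo "cgroup driver in image: $(cgroup_driver_of "$unit")"
fi
```

If the change was applied, the reported driver should be cgroupfs.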
Create a temporary directory for Docker images:
# mkdir -p /cm/images/dgx-image/var/lib/docker/tmp
# chmod 700 /cm/images/dgx-image/var/lib/docker/tmp
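A quick check that the directory exists in the image with the intended permissions might look like the sketch below. The helper name dir_has_mode is hypothetical, and the check relies on GNU stat as shipped with RHEL / CentOS:

```shell
# Hypothetical helper: succeed only if $1 is a directory whose octal mode equals $2.
dir_has_mode() {
    [ -d "$1" ] && [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Path and mode taken from the mkdir and chmod commands above.
tmpdir=/cm/images/dgx-image/var/lib/docker/tmp
if dir_has_mode "$tmpdir" 700; then
    echo "$tmpdir is present with mode 700"
else
    echo "$tmpdir is missing or has a different mode"
fi
```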
Test Docker:
# docker run --security-opt label=type:nvidia_container_t --rm nvcr.io/nvidia/cuda:10.0-runtime nvidia-smi
Kubernetes
- Install Docker on the head node:
# yum install -y cm-docker
- Change the partitioning in the DGX category to use the Docker thin pool.
- Install Kubernetes with the default settings:
# cm-kubernetes-setup --skip-docker
Test Kubernetes:
# module load kubernetes/default/1.12.10
# kubectl get pod --all-namespaces
# cat testgpu.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.2-runtime
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 8
# kubectl apply -f testgpu.yaml
# kubectl logs gpu-pod
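Running kubectl logs immediately after kubectl apply can fail while the image is still being pulled. One way to handle this is to poll the pod phase until it reaches a terminal state. The sketch below is an illustration only: the helper name wait_for_pod, the KUBECTL override variable, and the default retry values are assumptions, not part of the original procedure.

```shell
# Hypothetical helper: poll a pod until it reaches a terminal phase.
# Returns 0 on Succeeded, 1 on Failed, 2 if the pod never finished in time.
# KUBECTL may be set to use a different kubectl binary; it defaults to kubectl.
wait_for_pod() {
    pod=$1
    tries=${2:-60}   # number of polls before giving up (assumed default)
    delay=${3:-5}    # seconds between polls (assumed default)
    while [ "$tries" -gt 0 ]; do
        phase=$(${KUBECTL:-kubectl} get pod "$pod" -o jsonpath='{.status.phase}')
        case $phase in
            Succeeded) return 0 ;;
            Failed)    return 1 ;;
        esac
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 2
}
```

With the helper defined, the last step could be run as: wait_for_pod gpu-pod && kubectl logs gpu-pod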