I was running the k8s-cluster.yml playbook on a DGX A100 that had previously been running microk8s and kubespray k8s, and got the error below:
TASK [nvidia.nvidia_docker : set docker daemon configuration] **********************************************************
changed: [localhost]
TASK [nvidia.nvidia_docker : grab nvidia-docker wrapper] ***************************************************************
changed: [localhost]
RUNNING HANDLER [nvidia.nvidia_docker : restart docker] ****************************************************************
fatal: [localhost]: FAILED! => changed=false
msg: |-
Unable to restart service docker: Warning: The unit file, source configuration file or drop-ins of docker.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
NO MORE HOSTS LEFT *****************************************************************************************************
PLAY RECAP *************************************************************************************************************
localhost : ok=59 changed=21 unreachable=0 failed=1 skipped=16 rescued=0 ignored=0
dgxa100 : ok=1 changed=0 unreachable=0 failed=1 skipped=1 rescued=0 ignored=0
I was able to proceed by reloading the systemd daemon and then stopping and starting Docker with the steps below (no configuration changes were involved):
$:~/deepops$ sudo systemctl daemon-reload
$:~/deepops$ sudo systemctl stop docker
$:~/deepops$ sudo systemctl start docker
We may need to add a daemon-reload to the nvidia-docker Galaxy role.
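As a rough sketch of what that change could look like: the handler name "restart docker" is taken from the log above, but I have not checked how the role actually defines it, so the module choice and file location here are assumptions. Ansible's systemd module accepts a daemon_reload option that runs a daemon-reload before the requested state change.

```yaml
# Hypothetical version of the role's handler (e.g. in handlers/main.yml).
# Assumption: the existing handler only restarts the docker service.
- name: restart docker
  ansible.builtin.systemd:
    name: docker
    state: restarted
    # Reload unit files first, so a changed docker.service or drop-in
    # left over from an earlier install is picked up before the restart.
    daemon_reload: true
```

This would only cover the stale unit file warning. If docker.service still fails to start after the reload, the cause is elsewhere and "journalctl -xe" would be the next place to look.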