Fix ordering in k8s-cluster.yml to install Helm properly and run all commands from kube-master[0], fixing CentOS install #1128
Merged
ajdecon merged 15 commits into NVIDIA:master on Mar 24, 2022
Conversation
supertetelman
commented
Mar 23, 2022
- include: ../bootstrap/bootstrap-openshift.yml

# GPU operator
- hosts: kube-master[0]
Contributor
Author
The expectation is that Helm commands are run from the provisioning node. No need to install Helm and run it on the management systems.
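As a rough sketch of the pattern described here (the play and task contents are illustrative, not taken from this PR), running Helm from the provisioning node rather than installing it on the management systems looks like:

```yaml
# Illustrative sketch only: run Helm from the provisioning node
# (localhost) so the binary never needs to exist on management hosts.
- hosts: localhost
  connection: local
  tasks:
    - name: Install a chart from the provisioning node
      command: helm upgrade --install my-release my-repo/my-chart --wait
```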
ajdecon
suggested changes
Mar 23, 2022
Two issues to address:
- ansible-lint failed with a minor spacing issue:
Linting ./nfs-client-provisioner
WARNING Listing 1 violation(s) that are fatal
tasks/main.yml:4: [var-spacing] [LOW] Variables should have spaces before and after: "{{k8s_nfs_client_repo_name}}"
Warning: var-spacing Variables should have spaces before and after: "{{k8s_nfs_client_repo_name}}"
You can skip specific rules or tags by adding them to your configuration file:
# .ansible-lint
warn_list: # or 'skip_list' to silence them completely
- var-spacing # Variables should have spaces before and after: {{ var_name }}
Finished with 1 failure(s), 0 warning(s) on 2 files.
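The var-spacing failure above is just missing whitespace inside the Jinja2 braces. The fix (shown against a hypothetical key name, since the full tasks/main.yml isn't quoted in this thread) is:

```yaml
# tasks/main.yml, line 4 — before (fails ansible-lint var-spacing):
#   name: "{{k8s_nfs_client_repo_name}}"
# after — spaces inside the braces satisfy the rule:
name: "{{ k8s_nfs_client_repo_name }}"
```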
- The Jenkins end-to-end test failed. This might be a transient failure, so it's worth re-running, then debugging if it repeats.
TASK [install nfs-client-provisioner] ******************************************
fatal: [localhost]: FAILED! => changed=false
cmd:
- /usr/local/bin/helm
- upgrade
- --install
- nfs-subdir-external-provisioner
- nfs-subdir-external-provisioner/nfs-subdir-external-provisioner
- --create-namespace
- --namespace
- deepops-nfs-client-provisioner
- --version
- 4.0.13
- --set
- nfs.server=127.0.0.1
- --set
- nfs.path=/export/deepops_nfs
- --set
- storageClass.defaultClass=true
- --wait
delta: '0:00:00.060010'
end: '2022-03-23 03:29:00.175032'
msg: non-zero return code
rc: 1
start: '2022-03-23 03:29:00.115022'
stderr: |-
Error: Kubernetes cluster unreachable: <html><head><meta http-equiv='refresh' content='1;url=/login?from=%2Fversion%3Ftimeout%3D32s'/><script>window.location.replace('/login?from=%2Fversion%3Ftimeout%3D32s');</script></head><body style='background-color:white; color:white;'>
Authentication required
<!--
You are authenticated as: anonymous
Groups that you are in:
Permission you need to have (but didn't): hudson.model.Hudson.Read
... which is implied by: hudson.security.Permission.GenericRead
... which is implied by: hudson.model.Hudson.Administer
-->
</body></html>
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
ajdecon
approved these changes
Mar 24, 2022
Our Helm installs were doing a mix of running from localhost and/or kube-master[0]. This was causing issues in the nfs-client-provisioner because the CentOS Kubespray installer was not properly installing kubectl on the kube-master nodes.
For now I am aligning everything with what we did for the GPU Operator. In the future, it would make sense to use the now-functional helm Ansible module and run everything from localhost (the provisioning node) instead of kube-master[0]. That would let us install fewer binaries on the management nodes, but beyond that it is not a necessary change.
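If the project did move to the helm Ansible module mentioned here, the failing nfs-client-provisioner install could be expressed roughly as below. Chart, release, namespace, version, and values are taken from the log in this thread; the use of `kubernetes.core.helm` is the suggested future direction, not what this PR implements.

```yaml
# Sketch using the kubernetes.core.helm module instead of shelling out
# to the helm binary. Assumes the chart repo was already added.
- name: install nfs-client-provisioner
  kubernetes.core.helm:
    name: nfs-subdir-external-provisioner
    chart_ref: nfs-subdir-external-provisioner/nfs-subdir-external-provisioner
    chart_version: "4.0.13"
    release_namespace: deepops-nfs-client-provisioner
    create_namespace: true
    wait: true
    values:
      nfs:
        server: 127.0.0.1
        path: /export/deepops_nfs
      storageClass:
        defaultClass: true
```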
I also added the standard proxy settings to a few Helm installs where they were missing.
Additionally, I moved the block that runs helm/kubectl commands so it comes after the block that actually installs the kubectl/helm binaries. The old ordering caused edge-case failures on CentOS because the software is installed differently on Ubuntu and CentOS.
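The reordering described here amounts to something like the following in k8s-cluster.yml (simplified; the task bodies are placeholders, not the real plays):

```yaml
# Simplified sketch of the corrected play ordering: install the
# kubectl/helm binaries first, then run the plays that invoke them.
- hosts: kube-master[0]
  tasks:
    - name: install kubectl and helm binaries first
      debug:
        msg: "binary install tasks run here"

- hosts: kube-master[0]
  tasks:
    - name: run helm/kubectl commands afterwards
      debug:
        msg: "chart installs and kubectl commands run here"
```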
The automated testing already covers all the paths this change touches.