Deploy CoD cluster
Deploy CoD cluster with samltest.idp following steps here: https://gitlab.rc.uab.edu/rc/cod-heat-stack
This is a recurring test that runs every sprint on the Thursday before Integration Day.
- Ravi Tripathi added Sprint 23-03 label
- Ravi Tripathi assigned to @louistw
- Ravi Tripathi marked this issue as related to #261 (closed)
- Author
Adding previous instances of this issue from previous sprints.
We'll keep this issue running and move it across sprints to maintain a history.
- Owner
Noticed a Shibboleth configuration issue when deploying with the samltest IdP.
Further investigation shows that the tarball for the samltest IdP does not include the key and cert, so each deployment uses a different key and cert. Since we don't update the metadata every time, the deployment could not find the key to decrypt the message.
Solution:
Update the tarball to include the key and cert, and upload the latest metadata, which contains the included key and cert, to samltest.id.
- Owner
The tarball has been updated to include the key and cert. Tested with samltest.id and it worked fine.
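The key/cert mismatch described in this thread could be caught before a deployment by comparing the certificate shipped in the tarball with the certificates embedded in the uploaded metadata. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the certificate is PEM-encoded and the metadata carries certificates in standard `ds:X509Certificate` elements.

```python
import base64
import hashlib
import re
import xml.etree.ElementTree as ET

# XML Signature namespace used for X509Certificate elements in SAML metadata.
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def pem_fingerprint(pem_text):
    """Return the SHA-256 hex digest of the DER bytes in a PEM certificate."""
    match = re.search(
        r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----",
        pem_text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no PEM certificate block found")
    der = base64.b64decode("".join(match.group(1).split()))
    return hashlib.sha256(der).hexdigest()


def metadata_fingerprints(metadata_xml):
    """Return the SHA-256 digests of every certificate found in the metadata."""
    root = ET.fromstring(metadata_xml)
    digests = set()
    for node in root.iter("{%s}X509Certificate" % DS_NS):
        der = base64.b64decode("".join((node.text or "").split()))
        digests.add(hashlib.sha256(der).hexdigest())
    return digests


def cert_matches_metadata(pem_text, metadata_xml):
    """True if the PEM certificate also appears in the metadata."""
    return pem_fingerprint(pem_text) in metadata_fingerprints(metadata_xml)
```

If `cert_matches_metadata` returns `False` for the deployed cert and the metadata registered at samltest.id, the metadata is stale and decryption failures like the one reported here are to be expected.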
- Bo-Chun Chen added Sprint 23-04 label and removed Sprint 23-03 label
- Owner
Deployed today and found an issue in the enable_lmod role, #258 (comment 74063). Other than that, SSO with samltest.id works fine.
Edited by Bo-Chun Chen
- Ravi Tripathi added Sprint 23-05 label and removed Sprint 23-04 label
- Ravi Tripathi assigned to @krish94 and unassigned @louistw
- Ravi Tripathi changed the description
- Ravi Tripathi added Sprint 23-06 label and removed Sprint 23-05 label
- Ravi Tripathi assigned to @ravi89 and unassigned @krish94
- Author
Sprint 23-06
Ran into an error while trying to build CoD:
TASK [enable_lmod : Make sure lmod config is installed with cm version] *********************
fatal: [ohpc]: FAILED! => {"changed": false, "changes": {"installed": ["cm-modules-init-client"]}, "msg": "\n\nTransaction check error:\n file /etc/cm-release from install of cm-slave-9.0-44_cm9.0.noarch conflicts with file from package cm-master-9.0-49_cm9.0.noarch\n file /etc/profile.d/modules.sh from install of cm-modules-init-client-9.0-69_cm9.0.noarch conflicts with file from package cm-modules-init-9.0-64_cm9.0.noarch\n\nError Summary\n-------------\n\n", "rc": 1, "results": ["Loaded plugins: fastestmirror, priorities\nLoading mirror speeds from cached hostfile\n * base: mirror.team-cymru.com\n * cm-ml-rhel7-9.0-updates: updates-us-east.brightcomputing.com\n * cm-rhel7-9.0-updates: updates-us-east.brightcomputing.com\n * epel: ftp-nyc.osuosl.org\n * extras: mirrors.greenmountainaccess.net\n * updates: distro.ibiblio.org\n115 packages excluded due to repository priority protections\nResolving Dependencies\n--> Running transaction check\n---> Package cm-modules-init-client.noarch 0:9.0-69_cm9.0 will be installed\n--> Processing Dependency: cm-slave for package: cm-modules-init-client-9.0-69_cm9.0.noarch\n--> Running transaction check\n---> Package cm-slave.noarch 0:9.0-44_cm9.0 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package Arch Version Repository Size\n================================================================================\nInstalling:\n cm-modules-init-client noarch 9.0-69_cm9.0 cm-rhel7-9.0-updates 6.4 k\nInstalling for dependencies:\n cm-slave noarch 9.0-44_cm9.0 cm-rhel7-9.0-updates 2.4 k\n\nTransaction Summary\n================================================================================\nInstall 1 Package (+1 Dependent package)\n\nTotal size: 8.8 k\nInstalled size: 8.8 k\nDownloading packages:\nRunning transaction check\nRunning transaction test\n"]}

PLAY RECAP **********************************************************************************
ohpc : ok=2 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
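Looks like a conflict between versions of the cm packages: cm-slave conflicts with the installed cm-master, and cm-modules-init-client conflicts with the installed cm-modules-init.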
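To make the conflict easier to read, the file conflicts can be pulled out of the `msg` field of the failed task. This is a small sketch, not part of the playbook; the regular expression and the `find_conflicts` helper are assumptions based on the wording of the yum error shown in this comment.

```python
import re

# Matches lines of the form:
#   file <path> from install of <new pkg> conflicts with file from package <installed pkg>
CONFLICT_RE = re.compile(
    r"file (?P<path>\S+) from install of (?P<incoming>\S+) "
    r"conflicts with file from package (?P<installed>\S+)"
)


def find_conflicts(msg):
    """Return one dict per file conflict reported in a yum transaction error."""
    return [m.groupdict() for m in CONFLICT_RE.finditer(msg)]


# The "msg" value from the failed task, with the JSON "\n" escapes decoded.
msg = (
    "\n\nTransaction check error:\n"
    " file /etc/cm-release from install of cm-slave-9.0-44_cm9.0.noarch "
    "conflicts with file from package cm-master-9.0-49_cm9.0.noarch\n"
    " file /etc/profile.d/modules.sh from install of "
    "cm-modules-init-client-9.0-69_cm9.0.noarch "
    "conflicts with file from package cm-modules-init-9.0-64_cm9.0.noarch\n"
    "\nError Summary\n-------------\n\n"
)

conflicts = find_conflicts(msg)
for c in conflicts:
    print("%s: %s vs. installed %s" % (c["path"], c["incoming"], c["installed"]))
```

Both conflicts pair a client-side package pulled in by `cm-modules-init-client` with a package already installed on the node.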
Here's the ansible role where the error occurs:
- name: Tasks only needed in cm version
  block:
    - name: Make sure lmod config is installed with cm version
      ansible.builtin.yum:
        name: cm-modules-init-client
        state: present

    - name: Enable Lmod
      replace:
        path: "{{ enable_lmod_prefix }}/etc/sysconfig/modules/lmod/{{ item.path }}"
        regexp: "{{ item.regexp }}"
        replace: "{{ item.replace }}"
      loop:
        - { path: 'cm-lmod-init.sh', regexp: 'LMOD=.*$', replace: 'LMOD=1'}
        - { path: 'cm-lmod-init.csh', regexp: 'LMOD.*$', replace: 'LMOD "1"'}

    - name: Update Lmod Spider Cache setting
      ansible.builtin.replace:
        path: "{{ enable_lmod_prefix }}{{ lmod_loc }}/init/lmodrc.lua"
        regexp: '(\["{{ item.key }}"\] = ).*$'
        replace: \1 "{{ item.value }}",
      loop:
        - {"key": "dir", "value": "{{ lmod_cache_loc }}/sysCacheDir"}
        - {"key": "timestamp", "value": "{{ lmod_cache_loc }}/sysCacheTS.txt"}
  when: '"cm" in ansible_facts.packages.Lmod[0].release'
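For reference, here is a rough Python sketch of what the regular expressions in the two replace tasks of this role do to a file. The sample file contents and cache paths are hypothetical, and `apply_replace` is an illustrative helper that assumes the replace module applies its pattern in multiline mode, as its documentation describes.

```python
import re


def apply_replace(text, regexp, replacement):
    """Approximate a replace task: substitute every match, '$' matching at line ends."""
    return re.sub(regexp, replacement, text, flags=re.MULTILINE)


# Hypothetical contents of cm-lmod-init.sh and cm-lmod-init.csh.
sh_after = apply_replace("export LMOD=0\n", r"LMOD=.*$", "LMOD=1")
csh_after = apply_replace('setenv LMOD "0"\n', r"LMOD.*$", 'LMOD "1"')

# Hypothetical fragment of lmodrc.lua and hypothetical cache locations.
lua = '["dir"] = "/old/cacheDir",\n["timestamp"] = "/old/ts.txt",\n'
settings = [
    ("dir", "/cache/sysCacheDir"),
    ("timestamp", "/cache/sysCacheTS.txt"),
]
for key, value in settings:
    # Same shape as the task: keep the '["key"] = ' prefix, rewrite the rest of the line.
    lua = apply_replace(lua, r'(\["%s"\] = ).*$' % key, r'\1 "%s",' % value)

print(sh_after, csh_after, lua, sep="")
```

Because the captured group already ends with a space and the replacement adds another, the rewritten Lua lines have two spaces after the `=`, which is harmless in Lua. Note also that the csh pattern `LMOD.*$` would match any line containing `LMOD`, not only the one setting the flag.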
- Author
Issue was fixed in https://github.com/jprorama/CRI_XCBC/pull/423
Tested and merged
- Ravi Tripathi added Sprint 23-07 label and removed Sprint 23-06 label
- Ravi Tripathi assigned to @atlurie and unassigned @ravi89
- Eesaan Atluri added Sprint 23-08 label and removed Sprint 23-07 label
- Ravi Tripathi added Sprint 23-09 label and removed Sprint 23-08 label
- Ravi Tripathi added Sprint 23-10 label and removed Sprint 23-09 label