I've seen this happen a couple of times. I noticed it this morning, so I'm reporting it.
It looks like something might be failing with the account check in knightly's OOD login sequence. I go to the main URL and get redirected through the dashboard to the /account URL, which then reports a failure.
The core issue is that I shouldn't be redirected because I have an account. But the end problem is that the /account app isn't functioning right and returns an error instead of an account status page.
I checked the knightly instance. It turns out that the nfs mount failed.
Nothing much shows up in the account.service log:
$ journalctl -u account
May 28 11:55:53 ood-knightly.openstack.internal systemd[1]: account.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
May 28 11:55:53 ood-knightly.openstack.internal systemd[1]: Unit account.service entered failed state.
May 28 11:55:53 ood-knightly.openstack.internal systemd[1]: account.service failed.
May 28 11:55:58 ood-knightly.openstack.internal systemd[1]: account.service holdoff time over, scheduling restart.
May 28 11:55:58 ood-knightly.openstack.internal systemd[1]: Stopped uWSGI server for flask user registration.
May 28 11:55:58 ood-knightly.openstack.internal systemd[1]: Started uWSGI server for flask user registration.
May 28 11:55:58 ood-knightly.openstack.internal gunicorn[3330]: !!!
May 28 11:55:58 ood-knightly.openstack.internal gunicorn[3330]: !!! WARNING: configuration file should have a valid Python extension.
May 28 11:55:58 ood-knightly.openstack.internal gunicorn[3330]: !!!
I then checked user_auth.py, which is pretty much the first script we run after SSO login, and got this:
$ /opt/ood/ood_auth_map/bin/user_auth.py louistw
-bash: /opt/ood/ood_auth_map/bin/user_auth.py: /cm/shared/rabbitmq_agents/venv/bin/python: bad interpreter: No such file or directory
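The "bad interpreter" message means the path on the script's `#!` line does not exist, not that the script itself is missing. A minimal sketch of checking for that directly is below. The function names `shebang_interpreter` and `interpreter_missing` are made up for illustration and are not part of ood_auth_map.

```python
import os


def shebang_interpreter(script_path):
    """Return the interpreter path from a script's #! line, or None if it has none."""
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    # The first token is the interpreter; anything after it is an argument.
    return parts[0] if parts else None


def interpreter_missing(script_path):
    """True if the script names an interpreter that is not present on disk."""
    interp = shebang_interpreter(script_path)
    return interp is not None and not os.path.exists(interp)
```

Running `interpreter_missing("/opt/ood/ood_auth_map/bin/user_auth.py")` on the instance would be expected to return True while /cm/shared is unmounted, assuming the script's first line points into the venv shown in the error.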
Since /cm/shared is mounted via NFS, I then checked whether the NFS folders were mounted, and they were not. So I ran mount -a, and now /account is working again.
This is a good indicator that we need a second acceptance test for knightly: "did NFS mount?" Or, more generally, did the remote file systems mount? This is a common acceptance test for compute nodes on Cheaha, since GPFS mounts sometimes fail, and if they do, a node shouldn't leave the drain state and accept jobs.
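As a rough sketch of what such an acceptance test could look like, the snippet below reads a Linux mount table (the /proc/mounts format) and reports which required mount points are absent. The `REQUIRED_MOUNTS` list and the function names are assumptions for illustration. Only /cm/shared comes from this thread, and the real list would need to match what knightly actually depends on.

```python
# Hypothetical list of mount points that must be present before the
# instance is accepted; /cm/shared is the one that failed here.
REQUIRED_MOUNTS = ["/cm/shared"]


def mounted_points(mounts_text):
    """Parse /proc/mounts-style text into {mount_point: fs_type}."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fs type, options, dump, pass.
        # Note: /proc/mounts escapes spaces in paths as \040; not handled here.
        if len(fields) >= 3:
            result[fields[1]] = fields[2]
    return result


def missing_mounts(required, mounts_text):
    """Return the required mount points that do not appear in the mount table."""
    mounted = mounted_points(mounts_text)
    return [point for point in required if point not in mounted]


def check_mounts(required=REQUIRED_MOUNTS, mounts_path="/proc/mounts"):
    """Read the live mount table and return the list of missing mount points."""
    with open(mounts_path) as f:
        return missing_mounts(required, f.read())
```

An acceptance step could call `check_mounts()` and fail the instance if the returned list is non-empty. Because the check is by mount point rather than file system type, the same approach would cover GPFS mounts on compute nodes.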