Compare revisions

Changes are shown as if the source revision was being merged into the target revision.
Commits on Source (48)
[submodule "CRI_XCBC"]
path = CRI_XCBC
url = git@github.com:jprorama/CRI_XCBC.git
url = https://github.com/jprorama/CRI_XCBC.git
Subproject commit 2fa0b835745ec05e2b61e737a1723a0bffeaedd4
Subproject commit ce62e52ee2c03a9db4e60b256c73dee3f99a7715
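If an existing checkout still points at the old SSH submodule URL, the standard git way to pick up the new HTTPS URL is to re-sync the submodule (a generic git step, not something this project documents):
```
git submodule sync --recursive      # re-read the URL from .gitmodules
git submodule update --init --recursive
```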
Project to provision an OpenHPC cluster via Vagrant using the
Project to provision an [OpenHPC](https://openhpc.community/) + [Open OnDemand](https://openondemand.org/) cluster via Vagrant using the
CRI_XCBC (XSEDE basic cluster) Ansible provisioning framework.
The Vagrantfile takes inspiration from the [vagrantcluster](https://github.com/cluening/vagrantcluster)
project but is oriented toward deploying only a master node
and using standard OHPC tools to provision the cluster, and
therefore favors the CRI_XCBC approach to ansible scripts just
for the master.
The Vagrantfile is stripped to the core (rather than carrying all
the cruft of a vagrant init). It leverages work from a
[pilot project](https://gitlab.rc.uab.edu/ravi89/ohpc_vagrant)
(primarily the development of an updated CentOS 7.5 image)
but prefers a clean repo slate.
## Project Setup
After cloning this project you need to initialize the submodule
from within the git repo
Clone this project recursively to get the correct version of the
CRI_XCBC submodule to build the OpenHPC (ohpc) and Open OnDemand (ood) nodes:
```
git submodule init
git submodule update
git clone --recursive https://gitlab.rc.uab.edu/jpr/ohpc_vagrant.git
```
Alternatively, you can pass the `--recurse-submodules` option
during the initial clone.
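For example, a one-step recursive clone looks like this (same repository URL as above):
```
git clone --recurse-submodules https://gitlab.rc.uab.edu/jpr/ohpc_vagrant.git
```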
## Cluster Setup
After setting up the project above, create your single-node OpenHPC
cluster with vagrant:
```
vagrant up
vagrant up ohpc
```
NOTE: If you hit a `kernel mismatch error` after running the above command, run:
`vagrant ssh ohpc -c "uname -r"`.
Copy the reported kernel version into the `build_kernel_ver` variable in group_vars/all.
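A minimal sketch of that workaround, assuming `group_vars/all` lives in the CRI_XCBC submodule and using a purely illustrative kernel version:
```
vagrant ssh ohpc -c "uname -r"
# suppose it prints 3.10.0-957.el7.x86_64 (example value only)
# then edit CRI_XCBC/group_vars/all so that:
#   build_kernel_ver: '3.10.0-957.el7.x86_64'
```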
The ansible config will bring the master node to the point where it is
ready to ingest compute nodes via wwnodescan and will prompt you to
start a compute node. You can create a compute node and start it with
......@@ -43,12 +44,12 @@ Create node c0 (choose whatever name makes sense, c0 matches the config):
compute_create c0
```
When prompted, start node c0:
When prompted, start compute node c0:
```
compute_start c0
```
If you want to stop the node:
If you want to stop the compute node:
```
compute_stop c0
```
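To double-check what the helper scripts did, the VirtualBox CLI (which `compute_create` itself uses) can list the node VMs; this is just a convenience check, not part of the documented workflow:
```
VBoxManage list vms           # c0 should appear once created
VBoxManage list runningvms    # c0 should appear only while started
```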
......@@ -65,7 +66,7 @@ ipxe.iso in compute_create to match your local environment.
## Cluster Check
After the `vagrant up` completes you can log into the cluster with `vagrant ssh`.
After the `vagrant up ohpc` completes you can log into the cluster with `vagrant ssh ohpc`.
To confirm the system is operational run `sinfo` and you should see the following text:
```
......@@ -82,3 +83,42 @@ srun hostname
This should return the name `c0`.
With these tests confirmed, you have a working OpenHPC cluster running slurm.
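A condensed version of that check, run from a shell on the ohpc node (the expected output is inferred from the description above):
```
sinfo             # c0 should be listed once it has joined the cluster
srun hostname     # should print: c0
```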
## Boot the Open OnDemand node
A primary function of this project is to provide a dev/test cluster for working
with Open OnDemand. After the cluster is up, boot the ood node with:
```
vagrant up ood
```
This will provision the node.
NOTE: Near the end of the ood provisioning, the ansible scripts will display several
sudo commands that need to be run on the ohpc node to register the ood node
with the cluster. The commands ensure that system file synchronization and slurm work correctly.
You will need to copy and paste these sudo commands into a shell on the ohpc node. The
ansible script will pause for 90 seconds to give you time to do this.
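The registration commands themselves are printed by the provisioner, so they are not reproduced here; the workflow is simply to open a second terminal within the 90-second pause and paste them on the ohpc node, roughly:
```
# in a second terminal while the ood provisioning is paused
vagrant ssh ohpc
# ...paste the sudo commands shown in the ansible output...
```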
After the node is provisioned (or booted) you need to work around an issue
with NFS mounts in the centos/7 vagrant box by issuing the `mount -a` command
on the ood node:
```
vagrant ssh ood -c "sudo mount -a"
```
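A quick way to confirm the workaround took effect (a generic check, not from the project docs) is to list the NFS mounts on the ood node:
```
vagrant ssh ood -c "mount | grep nfs"
```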
After this point you can connect to the web UI of the ood node, typically at
http://localhost:8080 (the port mapping may differ in your local vagrant environment).
The default user name and password for the web UI are both 'vagrant'.
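If port 8080 was auto-corrected to a different host port, the standard `vagrant port` subcommand will show the actual mapping:
```
vagrant port ood
```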
## Issues and Workarounds
If you encounter GPG key errors during OHPC node provisioning, as described in https://github.com/jprorama/CRI_XCBC/issues/77, run the following command:
```
vagrant box update
```
If the nodes_vivify role fails to update the slurm status on nodes with the error `slurm_update error: Invalid node state specified`, increase the compute node memory. For example, if you're already using 4GB, increase the memory to 6GB in VirtualBox.
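Because the compute node is created directly with VBoxManage (see `compute_create` below, which uses `--memory 4096`), one way to apply this workaround is to raise that value on the stopped VM; the node name `c0` follows the example above:
```
compute_stop c0
VBoxManage modifyvm c0 --memory 6144
compute_start c0
```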
......@@ -3,17 +3,26 @@
Vagrant.configure("2") do |config|
# don't configure host-specific keys, config will use the user's key
config.ssh.insert_key = false
config.vm.define "ohpc" do |ohpc|
ohpc.vm.box = "ravi89/centos7.5"
ohpc.vm.box_version = "1"
ohpc.vm.box = "centos/7"
# version placeholder for selecting specific vagrant boxes
# used mainly for debugging and sanity checking
# leave commented to use the latest version in the local cache
#ohpc.vm.box_version = "1804.02"
ohpc.vm.hostname = "ohpc"
ohpc.vm.network "private_network", ip: "10.1.1.1", virtualbox__intnet: "compute"
#ohpc.vm.customize ["modifyvm", :id, "--name", "ohpc"]
end
config.vm.define "ood" do |ood|
ood.vm.box = "ravi89/centos7.5"
ood.vm.box_version = "1"
ood.vm.box = "centos/7"
# version placeholder for selecting specific vagrant boxes
# used mainly for debugging and sanity checking
# leave commented to use the latest version in the local cache
#ood.vm.box_version = "1804.02"
ood.vm.hostname = "ood"
ood.vm.network "private_network", ip: "10.1.1.254", virtualbox__intnet: "compute"
ood.vm.network "forwarded_port", guest: 80, host: 8080,
......@@ -22,19 +31,31 @@ Vagrant.configure("2") do |config|
auto_correct: true
end
config.vm.provider :virtualbox do |vb|
vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
vb.memory = "2048"
end
# define user's key and insecure default
# insecure default is required for initial provisioning
config.ssh.private_key_path = ["~/.ssh/id_rsa", "~/.vagrant.d/insecure_private_key"]
# append user's key to vagrant config to avoid overwrite of existing authorized_keys
# https://stackoverflow.com/a/31153912/8928529
config.vm.provision "ssh_pub_key", type: "shell" do |s|
ssh_pub_key = File.readlines("#{Dir.home}/.ssh/id_rsa.pub").first.strip
s.inline = <<-SHELL
echo #{ssh_pub_key} >> /home/vagrant/.ssh/authorized_keys
SHELL
end
config.vm.provision "shell", inline: <<-SHELL
if [ -f /vagrant/localenv.sh ]; then
. /vagrant/localenv.sh
fi
yum install -y epel-release
yum install -y ansible git vim bash-completion
ansible-playbook -c local -i /vagrant/CRI_XCBC/hosts -l `hostname` /vagrant/CRI_XCBC/site.yaml
ansible-playbook -c local -i /vagrant/CRI_XCBC/hosts -l `hostname` /vagrant/CRI_XCBC/site.yaml -b
SHELL
......
......@@ -9,8 +9,18 @@
nodename="$1"
VBoxManage createvm --name "$nodename" --register
# Ensure that the $HOME/iso directory exists
if [[ ! -d "$HOME/iso" ]]; then
mkdir ~/iso
fi
# Download the ipxe.iso boot image if it doesn't already exist
if [[ ! -f "$HOME/iso/ipxe.iso" ]]; then
echo "Downloading the network boot ipxe.iso file"
curl -o ~/iso/ipxe.iso http://boot.ipxe.org/ipxe.iso
fi
VBoxManage createvm --name "$nodename" --register
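# 4GB RAM, NIC1 on the internal 'compute' network as an Intel 82540EM, hardware clock in UTC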
VBoxManage modifyvm "$nodename" --memory 4096 --nic1 intnet --intnet1 compute --nictype1=82540EM --rtcuseutc on
......