[submodule "CRI_XCBC"]
path = CRI_XCBC
url = https://github.com/jprorama/CRI_XCBC.git
branch = feat-openstack
Subproject commit 1d7a787a562130d8b96d1b85d087c4a3ecc34f41
MIT License
Copyright (c) 2018 XSEDE
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Simple playbooks to install OpenHPC version 1.0 using Ansible.
See the doc/README.md for a tutorial on using these scripts in a VirtualBox environment.
The Ansible layout is fairly simple, using a series of roles for different parts of the installation process.
This repo will get you to the point of a working slurm installation across your cluster. It does not
currently provide any scientific software or user management options!
The basic usage is to set up the master node with the initial 3 roles (pre\_ohpc, ohpc\_install, ohpc\_config)
and use the rest to build node images and deploy the actual nodes (these use Warewulf as a provisioner by default).
Trigger the roles individually via tags, like:
```
ansible-playbook -t pre_ohpc -i inventory/headnode headnode.yml
```
None of these Ansible roles actually touch the compute nodes directly - at most, they build a new vnfs image and
trigger a reboot.
A more detailed description is available in the /doc folder.
[defaults]
retry_files_enabled = False
inventory = ./inventory/headnode
[ssh_connection]
control_path = ~/.ssh/ansible-%%r@%%h:%%p
---
- hosts: compute
  roles:
    - pre_ohpc
    - ohpc_install
    - ohpc_config
    - compute_build_vnfs
    - compute_build_nodes
    - nodes_vivify
Introduction
============
This is a basic quickstart guide for the CentOS 7 version of the XSEDE
Compatible Basic Cluster, based on the OpenHPC Project.
(https://openhpc.community). It covers initial setup of your hardware
(or virtual machines), configuration options for the ansible scripts,
and a brief walkthrough of how to use the scripts.
The provided scripts are designed to provision three types of
nodes: basic compute nodes, login nodes, and GPU nodes.
By the end of the guide, you should have a working cluster with a running
slurmctld, which can accept and run jobs on all nodes.
If you encounter errors, have questions, suggestions, or comments,
please contact the XCRI Team by emailing help@xsede.org. Be sure to ask
for the XCRI team!
XCBC Overview
=============
The XCBC project is designed to provide the basic software necessary to create
an HPC environment similar to that found on XSEDE resources, with open-source
software and a minimum of fuss.
We use the OpenHPC repositories (link) for setup of the cluster management
software and scheduler.
Ansible is used in this toolkit to provide an idempotent, non-invasive method
of managing the cluster headnode. Ideally, the admin installing the cluster
will only have to customize a single file before running the included
playbooks. This guide walks the user through the cluster build by actually
running the ansible playbooks locally on the headnode, but this could
be done from a different machine just as easily, with a few tweaks to the
inventory file. We did not wish to force the reader to make changes
to some local machine, and so elected to keep everything on the VMs built
specifically for this guide.
All of the (intended) customizable variables exist in the ```group_vars/all```
file, which is described in more detail below (Section 3: Defining Cluster Parameters).
The installation process, at a high level, takes place in six phases
(ignoring hardware/VM setup):
1\. Installation of the bare OS on the headnode
2\. Installation of the XCBC toolkit scripts and dependencies
3\. Defining cluster parameters
4\. Configuration of the headnode via Ansible
5\. Installation of the compute nodes
6\. Testing the scheduler
This guide in particular will walk through the steps of building an XCBC using
VMs defined in VirtualBox, though the process generalizes well to a bare-metal
deployment.
Common Acronyms and Abbreviations
=================================
XCBC = XSEDE Compatible Basic Cluster
XNIT = XSEDE National Integration Toolkit
WW = Warewulf - the cluster management software preferred in the OpenHPC
project.
VM = Virtual Machine
NIC = Network Interface Card
NAT = Network Address Translation - used by Virtualbox to provide
a connection from the Headnode VM to the outside world.
OS = Operating System
HPC = High Performance Computing/Cluster/Computer
Initial Setup On VirtualBox
===========================
Create (at least) two VMs - one to be the headnode, and one to be a
compute node.
For the headnode, activate three network interfaces, attached to NAT,
Internal Network, and Host-only. (For hardware you would require only
two, but having both NAT and host-only simplifies configuration on the
headnode.)
In this configuration, the assumption is that there is a host-only network
on Virtualbox configured with the internal DHCP server on.
(Under File-\>Preferences-\>Networking-\>Host-only Networks).
The default network is 192.168.56.0, but feel free to change this as you
prefer.
1\. Configure the network interfaces on the
headnode. There are three: one for NAT, which provides connection to the
outside world, one for a host-only network,
and one for the internal network, which connects to
compute nodes.
The host-only network is for an ssh connection into the headnode from your host
machine. You could also use this as the interface for the headnode in an ansible
inventory file, and run these roles against the headnode remotely.
Use the DHCP server provided by Virtualbox; you will find the ip address given to
the VM after installation of the OS. It is possible to use a static IP, but
this is somewhat unreliable on successive reboots or image copies.
For the compute nodes, define two virtual machines, 'compute-0' and 'compute-1' with
the boot order (Under 'Settings->General') set to Network ONLY, and a single ethernet
interface, on the internal network. DO NOT INSTALL ANYTHING ON THESE VMs - The images
will be generated and pushed out from the headnode during this tutorial. Make sure they have
at least 2GB of RAM - otherwise the disk images built in this tutorial will be too large,
and you will encounter mysterious errors.
Building the Cluster
====================
Installation of the base OS on the headnode
-------------------------------------------
1\. Install CentOS 7.x minimal on the headnode VM.
During installation, the default partition setup is fine.
It helps to set up the three network interfaces at this point.
Don't touch the 'NAT' interface, other than to check the 'Always Connect' box under
'Configure->General'. The same goes for the 'host-only' network.
Configure the internal network interface to have address 10.0.0.1, netmask /24 and gateway
0.0.0.0 - the headnode will act as the router for the compute nodes.
The /24 is important, so that Warewulf will see the compute nodes as existing on
the same network as the headnode interface! Don't forget to also check the
'Always Connect' box.
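If you miss this during the install, the same settings can be applied afterwards with
nmcli. This is only a minimal sketch: the connection name `enp0s9` is an assumption -
substitute whatever ```nmcli connection show``` reports for your internal interface.
```
# enp0s9 is an assumed connection name - check `nmcli connection show` for yours
nmcli connection modify enp0s9 ipv4.addresses 10.0.0.1/24 ipv4.method manual
nmcli connection modify enp0s9 connection.autoconnect yes
nmcli connection up enp0s9
```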
2\. After installation, check the ip of your headnode on the host-only adapter via
```ip addr```.
Compare the MAC addresses of the interfaces on your headnode with those listed
in Virtualbox, to be sure you substitute the correct device names below!
(Typically, they show up as something like enp0s3,enp0s8, etc. - it pays to
double-check!)
The NAT ip address will be used sparingly in the following documentation, but will
be called ```$public-nic```. Virtualbox assigns these as 10.0.x.15, where x begins
at 2 for the 1st VM, 3 for the 2nd, etc.
Save the ip address of the interface on the host-only network -
you'll use this as the address for the headnode in the ansible scripts,
and it will be referred to as ```$host-nic```
The ip address for the internal nic was set earlier, and will be referred to
either as 10.0.0.1 or ```$internal-nic```
Make sure that the host-only and internal adapters are not set as default
routes - ```ip route show``` should not list them as default!
If you do see this, something like ```ip route del default via 10.0.0.1```
should do the trick, along with editing 'DEFROUTE=no' in
the relevant ```/etc/sysconfig/network-scripts/ifcfg-``` file.
After checking the interfaces, ensure that the private nic is set in the
internal firewall zone, and that the public nic is set in the public
firewall zone:
nmcli connection modify $internal-nic connection.zone internal
nmcli connection modify $public-nic connection.zone public
You may also need to ensure that the connections will autoconnect on reboot:
nmcli con modify $internal-nic connection.autoconnect=yes
nmcli con modify $public-nic connection.autoconnect=yes
(replace ```$internal-nic``` and ```$public-nic``` with your actual connection names! The host-only and internal network
interfaces are the most likely to have autoconnect turned off by default.)
Connecting to your headnode
---------------------------
Instead of using the VirtualBox terminal, it's often much simpler to ssh in to the headnode
from your native local terminal - which allows for copy-pasting, window history, etc.
Check the address of the host-only network using the ```ip addr``` command on the
headnode - usually in the ```192.168.56.0/24``` range by default.
From your host machine, open a terminal emulator, and you should be able to ssh in as
root (using the password you set during install - running ```ssh-copy-id root@$headnode_ip```
is also quite useful, if you're on a Linux host machine).
Follow the guide below from your local terminal, rather than the VirtualBox terminal
(primarily for ease of use).
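For example (the address here is only an assumption - use whatever ```ip addr```
reported for your headnode's host-only interface):
```
ssh-copy-id root@192.168.56.101   # optional: install your key so later logins skip the password
ssh root@192.168.56.101
```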
Installation of the XCBC Tools and Dependencies
-----------------------------------------------
#### Please note - this is meant to be run as the root user!
0\. ```yum install git vim bash-completion```
Git is necessary for getting the
playbooks; vim and bash-completion are just nice add-ons. Install your
editor of choice!
1\. `git clone https://github.com/XSEDE/CRI_XCBC/ `
Get the actual playbooks.
This creates a directory named `CRI_XCBC` in your current directory, which
contains the XCBC Ansible playbooks.
2\. On the headnode, from your ${HOME} directory,
run `ssh-keygen`, to create a local set of ssh keys, followed by
`cat .ssh/id_rsa.pub >> .ssh/authorized_keys`
3\. ```cd ./CRI_XCBC``` and then run the ```install_ansible.sh``` script.
The script creates a python virtualenv named “ansible” in
```${HOME}/ansible_env/ansible```, in order to avoid polluting
the system python installation. The ansible source code is cloned into
```${HOME}/ansible_env/ansible_source```.
### Prepare your shell session
The next two steps prepare your shell for using the ansible playbooks,
by sourcing two files containing environment variables - one for a
python virtualenv, and one for the local installation of ansible.
4\. `source ${HOME}/ansible_env/ansible/bin/activate`
Loads the ansible virtualenv into the current session.
5\. `source ${HOME}/ansible_env/ansible_source/hacking/env-setup `
Loads the ansible environment variables into the current session.
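Before moving on, a quick sanity check (not part of the playbooks) that both files took
effect in this shell:
```
which ansible-playbook   # should resolve inside ${HOME}/ansible_env/ansible_source
ansible --version
```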
Defining Cluster Parameters
---------------------------
Inside the ```CRI_XCBC``` directory, examine the file ```group_vars/all```.
This file contains
several important parameters for the cluster installation. The current
defaults should work with the Virtualbox configuration suggested
above. This is the only file that should be edited during this tutorial!
Other files that would be useful or necessary to edit during a production
build will be pointed out as we go along.
(The format here is
- ```parameter_name: default_value```
followed by a description.)
Separated by category, the full list of parameters is:
#### OpenHPC Release Version
- ```openhpc_release_rpm: "https://github.com/openhpc/ohpc/releases/download/v1.3.GA/ohpc-release-1.3-1.el7.x86_64.rpm"```
This contains the version number and path
of the current OpenHPC release rpm. Older versions are listed and
commented out. Generate the list of these via:
curl -s https://github.com/openhpc/ohpc/releases/ | grep rpm | grep -v sles | grep -v strong | sed 's/.*="\(.*\)".*".*".*/\1/'
#### Headnode Information
- ```public_interface: enp0s3 ```
The device name of the public NIC on the
headnode (which provides access to the outside internet)
- ```private_interface: enp0s9```
The device name of the private NIC on the
headnode, which gives access to the compute nodes
- ```headnode_private_ip: "10.0.0.1"```
The ip of the headnode on the private network
- ```build_kernel_ver: '3.10.0-327.el7.x86_64'```
`uname -r` at build time - required for Warewulf to build bootstrap
images for the compute nodes. THIS SHOULD BE UPDATED AT RUN-TIME!
#### Private network parameters
No changes are necessary in this section.
These are the default parameters used for the private network that is
used to communicate with/between the compute nodes. The compute_ip parameters
define the range over which the dhcp server on the headnode will offer
IP addresses.
If you change the subnet here, make sure to do so consistently! The
network_mask (CIDR mask) and network_long_netmask must cover the same subnet,
and the compute_ip limits must fall within the same subnet.
- ```private_network: "10.0.0.0"```
- ```private_network_mask: "24"```
- ```private_network_long_netmask: "255.255.255.0"```
- ```compute_ip_minimum: "10.0.0.2"```
- ```compute_ip_maximum: "10.0.0.255"```
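If you do edit the subnet, a quick way to sanity-check that all of these values stay
consistent (assuming the repo was cloned to ```${HOME}/CRI_XCBC```) is:
```
# All private-network values printed here should sit in the same subnet
grep -E 'private_network|compute_ip_(minimum|maximum)' ${HOME}/CRI_XCBC/group_vars/all
```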
#### slurm.conf variables
These are added to the SLURM configuration file as needed
- ```cluster_name: "xcbc-example"```
The name you’d like to give your cluster. This will
be inserted into the slurm.conf file.
- ```gres_types: "gpu" ```
(if any GPU nodes exist) - any types of consumable
resources that exist on your cluster.
COMMENTED OUT BY DEFAULT - IF YOU ARE BUILDING ON A PHYSICAL SYSTEM
WITH GPU NODES, UNCOMMENT THIS LINE in ```${HOME}/CRI_XCBC/group_vars/all```!
#### Stateful Node controls
- ```stateful_nodes: false```
Choose whether you’d like
to build stateful compute nodes or go with the Warewulf default of
having nodes pull down a new image each time they boot.
CURRENTLY NOT IMPLEMENTED; THE DEFAULT IS FALSE.
#Node Config Vars - for stateful nodes
- ```sda1: "mountpoint=/boot:dev=sda1:type=ext3:size=500"```
- ```sda2: "dev=sda2:type=swap:size=500"```
- ```sda3: "mountpoint=/:dev=sda3:type=ext3:size=fill"```
These options must be defined in order for compute nodes
to boot from local disk. Currently outside the scope of this
tutorial.
#### GPU Necessities
- ```nvidia_driver_installer: "NVIDIA-Linux-x86_64-375.39.run"```
Contains the full name of
the NVIDIA driver installer. This should be downloaded and placed
in `CRI_XCBC/roles/gpu_build_vnfs/files/`.
COMMENTED OUT BY DEFAULT - ONLY NECESSARY FOR CLUSTERS WITH GPU
NODES.
#### Warewulf Parameters
The following should not be changed, unless you are familiar with the guts
of these playbooks. They are used in defining the images for different
types of compute nodes, and must have corresponding names in the
directory defined by the ```template_path``` variable.
- ```template_path: "/usr/libexec/warewulf/wwmkchroot/"```
- ```compute_template: "compute-nodes"```
- ```gpu_template: "gpu-nodes"```
- ```login_template: "login-nodes"```
#### chroot Parameters
The following should not be changed, unless you are familiar with the guts
of these playbooks and with Warewulf. These define the location
and names of the chroot images for different types of compute nodes.
Do not worry! If you don't have GPU or login nodes, space and time will not
be wasted making unnecessary images.
- ```compute_chroot_loc: "/opt/ohpc/admin/images/{{ compute_chroot }}"```
- ```compute_chroot: centos7-compute```
- ```gpu_chroot_loc: "/opt/ohpc/admin/images/{{ gpu_chroot }}"```
- ```gpu_chroot: centos7-gpu```
- ```login_chroot_loc: "/opt/ohpc/admin/images/{{ login_chroot }}"```
- ```login_chroot: centos7-login```
#### Node Inventory Method
- ```node_inventory_auto: true```
Allows one to switch between ’manually’ adding compute node information
here (in `${HOME}/CRI_XCBC/group_vars/all`) and running wwnodescan.
The default is to use wwnodescan to automatically search for nodes in
the 10.0.0.0/24 network. In some situations, such as migrating an
existing cluster to a new framework, one may already have a list of
hardware. If some of that is owned/provided by researchers, it is
necessary to keep track of which nodes are which, and it can be beneficial
to add and name nodes based on an existing set of information.
The following items are ONLY to be used in this case.
```
- compute_nodes:
- { name: "compute-1", vnfs: '{{compute_chroot}}', cpus: 1, sockets: 1, corespersocket: 1, mac: "08:00:27:EC:E2:FF", ip: "10.0.0.254"}
```
The compute_nodes variable is a list of dictionaries, each of which
contains the necessary information to build and name each compute node. You will
have to edit the MAC address for each node, if using the
manual node inventory method.
```
- login_nodes:
- { name: "login-1", vnfs: '{{login_chroot}}', cpus: 8, sockets: 2, corespersocket: 4, mac: "00:26:B9:2E:23:FD", ip: "10.0.0.2"}
```
List of login nodes, same format as compute\_nodes
```
- gpu_nodes:
- { name: "gpu-compute-1", vnfs: '{{gpu_chroot}}', gpus: 4, gpu_type: "gtx_TitanX", cpus: 16, sockets: 2, corespersocket: 8, mac: "0C:B4:7C:6E:9D:4A", ip: "10.0.0.253"}
```
List of gpu nodes, with the addition of keys
describing the number and type of GPUs available on that node. These
parameters will be inserted into the slurm.conf. The gpu_type is completely custom, and is the
string that users must request to run on these nodes in the default SLURM configuration.
Ansible Inventory
-----------------
Note the inventory file in
```CRI_XCBC/inventory```:
```
[headnode]
headnode ansible_host="{{ headnode_private_ip }}" ansible_connection=ssh ansible_ssh_user=root
```
Make sure that the hostname of your headnode matches the entry on that line! Either
edit the inventory file, or change the hostname via:
```hostnamectl set-hostname headnode```.
Configuration of the Headnode via Ansible
-----------------------------------------
Examine the headnode.yml file - this contains the basic recipe for the
sequence of steps to take. While it could be run all at once with
`ansible-playbook headnode.yml`, we will go through it here step by
step. Each step can be run with the '-t' flag, which asks
ansible-playbook to execute only tasks with the given tag.
When running these scripts, be sure to either cd to the playbook
directory (`cd ${HOME}/CRI_XCBC/`) or provide the complete path
before each file - like
`ansible-playbook -i ${HOME}/CRI_XCBC/inventory/headnode
-t pre_ohpc ${HOME}/CRI_XCBC/headnode.yml`.
1\.
This first role installs necessary dependencies for the OpenHPC rpms and
configures the firewall zones `internal` and `public`. It also installs
fail2ban, configures the `/etc/hosts` file, and sets up the headnode
as an NTP server for the cluster (this is IMPORTANT for SLURM functionality).
This also configures ssh to disallow password authentication - if you
don't want this, edit the template in
`roles/pre_ohpc/templates/sshd_config.j2`
To apply this role, run:
`ansible-playbook -i inventory/headnode -t pre_ohpc headnode.yml`
2\. `ansible-playbook -i inventory/headnode -t ohpc_install headnode.yml`
This second role installs several OpenHPC package groups (`base`, `warewulf`,
and `slurm server`), configures SLURM (and enables job accounting), creates
a basic template for the compute nodes, and applies two fixes to the
wwsh and wwnodescan scripts.
3\. `ansible-playbook -i inventory/headnode -t ohpc_config headnode.yml`
This third role configures the headnode for several things. It sets the
interface that Warewulf uses, and sets up httpd to serve files to compute
nodes. It configures xinetd for tftp (for PXE-booting the compute nodes),
and mariadb for the internal Warewulf database. This also initializes the
NFS exports from the headnode to compute nodes, in `/etc/exports`. There are three
main exports:
- `/home` for user home directories
- `/opt/ohpc/public` for OpenHPC documentation and packages
- `/export` for custom software packages shared to compute nodes
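After the role completes, an optional check (not part of the playbooks) is to list the
active exports on the headnode and confirm the three shares above appear:
```
exportfs -v
```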
Installation of the compute nodes
---------------------------------
4\. `ansible-playbook -i inventory/headnode -t compute_build_vnfs headnode.yml `
This role builds an image for the compute nodes, by configuring a chroot environment
in `/opt/ohpc/admin/images/centos-7.3-compute`, and adding a "VNFS" image to the Warewulf
database. This will be used by the compute nodes to PXE boot.
It takes a while to build the image - good time to take a break from your screen!
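Once the role finishes, one way to confirm the image and bootstrap were registered with
Warewulf (again, just an optional sanity check) is:
```
wwsh vnfs list
wwsh bootstrap list
```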
5\. `ansible-playbook -i inventory/headnode -t compute_build_nodes headnode.yml `
This role does one of two things: if you are using the automatic inventory method,
it runs wwnodescan, with names based on the number of nodes you've defined in
`group_vars/all`, and waits for the nodes to boot. At this point, simply `Start` your
compute nodes in VirtualBox, without providing a boot medium, and watch to be sure
they receive a DHCP response. Occasionally, they will fail to receive a response, but
will work fine if booted a second time.
If you are using the 'manual' method of node entry, this role will enter the provided
information (again, from `group_vars/all`) in the Warewulf database. At that point, you
may boot your compute nodes any time after the role finishes running, and they should
receive a PXE boot image from the headnode as in the automatic method.
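Whichever method you used, you can confirm the nodes landed in the Warewulf database,
with the expected MAC and IP addresses, before relying on them (an optional check):
```
wwsh node list
wwsh provision print
```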
6\. `ansible-playbook -i inventory/headnode -t nodes_vivify headnode.yml`
This final role will "bring your nodes to life" by starting the necessary services
on the headnode and compute nodes, such as slurmctld (on the headnode),
slurmd (on the compute nodes), and munge (used by slurm on all nodes for
authentication).
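Before moving on, a rough health check of those services (using the tutorial's node
names; adjust if yours differ) might look like:
```
systemctl status slurmctld munge
pdsh -w compute-[0-1] 'systemctl is-active slurmd munge'
```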
Testing the scheduler
---------------------
After confirming that both nodes have booted successfully (In the VirtualBox windows,
you should see a basic login prompt for each), double-check that you are able to
ssh into the machines as root.
Now, in order to test the scheduler, it is necessary to add a user, by running
`useradd testuser` on the headnode.
To make sure the new user will be enabled on the compute node, run
`wwsh file sync` to update the passwd, group, and shadow files in the Warewulf
database, followed by
`pdsh -w compute-[0-1] '/warewulf/transports/http/wwgetfiles'`
to request that the compute nodes pull the files from the master. While they are automatically
synced every 5 minutes, this will force an update immediately.
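At this point the scheduler should already see both nodes; an optional check before
submitting anything:
```
sinfo -N -l    # both compute nodes should eventually report an "idle" state
```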
Next, become the new user, via `su - testuser`.
Open your text editor of choice, and create a (very) simple slurm batch file
(named `slurm_ex.job` in this example), like:
```
#!/bin/sh
#SBATCH -o nodes.out
#SBATCH -N 2
/bin/hostname
srun -l /bin/hostname
srun -l /bin/pwd
```
Submit this to the scheduler via
`sbatch ./slurm_ex.job`
You should receive a message like `Submitted batch job 2` and find an output
file called `nodes.out` with the contents:
```
[testuser@headnode ~]$ cat nodes.out
compute-0
0: compute-0
1: compute-1
0: /home/testuser
1: /home/testuser
```
Otherwise, there should be useful debugging information in /var/log/slurmctld
on the headnode, or in /var/log/slurmd on the compute nodes.
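For example, to pull the most recent entries from those log locations:
```
tail -n 50 /var/log/slurmctld
pdsh -w compute-[0-1] 'tail -n 20 /var/log/slurmd'
```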
Conclusion
==========
At this point, you have a basic working cluster with a scheduler. The addition of scientific
software and utilities available through XSEDE will be covered in this guide soon.
Thanks for trying this out! Please get in touch with any problems, questions, or comments
at help@xsede.org, with 'XCRI XCBC Tutorial' in the subject line.
---
#OpenHPC release version
openhpc_release_rpm: "https://github.com/openhpc/ohpc/releases/download/v1.3.GA/ohpc-release-1.3-1.el7.x86_64.rpm"
#The full list of available versions for CentOS can be generated via
# curl -s https://github.com/openhpc/ohpc/releases/ | grep rpm | grep -v sle | grep -v strong | sed 's/.*="\(.*\)".*".*".*/\1/'
#
# Headnode Info
public_interface: "eth0" # NIC that allows access to the public internet
private_interface: "eth1" #NIC that allows access to compute nodes
headnode_private_ip: "10.1.1.1"
build_kernel_ver: '3.10.0-957.12.2.el7.x86_64' # `uname -r` at build time... for wwbootstrap
#Private network Info
private_network: "10.1.1.0"
private_network_mask: "24"
private_network_long_netmask: "255.255.255.0"
compute_ip_minimum: "10.1.1.2"
compute_ip_maximum: "10.1.1.255"
gpu_ip_minimum: "10.1.1.128" #This could be more clever, like compute_ip_minimum + num_nodes...
#slurm.conf variables
cluster_name: "ohpc"
# gres_types: "gpu"
# sacct user list
cluster_users:
  - centos # include each username on separate line as a list
#Stateful compute or not?
stateful_nodes: false
#Node Config Vars - for stateful nodes
sda1: "mountpoint=/boot:dev=sda1:type=ext3:size=500"
sda2: "dev=sda2:type=swap:size=500"
sda3: "mountpoint=/:dev=sda3:type=ext3:size=fill"
# GPU Node Vars
# download the nvidia cuda installer, and run with only --extract=$path_to_CRI_XCBC/roles/gpu_build_vnfs/files to get these three installers
nvidia_driver_installer: "NVIDIA-Linux-x86_64-387.26.run"
cuda_toolkit_installer: "cuda-linux.9.1.85-23083092.run"
cuda_samples_installer: "cuda-samples.9.1.85-23083092-linux.run"
# WW Template Names for wwmkchroot
template_path: "/usr/libexec/warewulf/wwmkchroot/"
compute_template: "compute-nodes"
gpu_template: "gpu-nodes"
login_template: "login-nodes"
# Chroot variables
compute_chroot_loc: "/opt/ohpc/admin/images/{{ compute_chroot }}"
compute_chroot: centos7-compute
gpu_chroot_loc: "/opt/ohpc/admin/images/{{ gpu_chroot }}"
gpu_chroot: centos7-gpu
login_chroot_loc: "/opt/ohpc/admin/images/{{ login_chroot }}"
login_chroot: centos7-login
# Node Inventory method - automatic, or manual
node_inventory_auto: true
#Node naming variables - no need to change
compute_node_prefix: "c"
num_compute_nodes: 1
gpu_node_prefix: "gpu-compute-"
num_gpu_nodes: 1
login_node_prefix: "login-"
num_login_nodes: 0
#OpenOnDemand
ood_nodename: "ood"
ood_version: 1.5
ood_ip_addr: 10.1.1.254
ood_rpm_repo: "https://yum.osc.edu/ondemand/{{ ood_version }}/ondemand-release-web-{{ ood_version }}-1.el7.noarch.rpm"
#Node Inventory - not in the Ansible inventory sense! Just for WW and Slurm config.
# Someday I will need a role that can run wwnodescan, and add nodes to this file! Probably horrifying practice.
# There is a real difference between building from scratch, and using these for maintenance / node addition!
#
compute_private_nic: "eth0"
compute_nodes:
  - { name: "compute-1", vnfs: '{{compute_chroot}}', cpus: 1, sockets: 1, corespersocket: 1, mac: "08:00:27:EC:E2:FF", ip: "10.0.0.254"}
login_nodes:
  - { name: "login-1", vnfs: '{{login_chroot}}', cpus: 8, sockets: 2, corespersocket: 4, mac: "00:26:b9:2e:21:ed", ip: "10.2.255.137"}
gpu_nodes:
  - { name: "gpu-compute-1", vnfs: '{{gpu_chroot}}', gpus: 4, gpu_type: "gtx_TitanX", cpus: 16, sockets: 2, corespersocket: 8, mac: "0c:c4:7a:6e:9d:6e", ip: "10.2.255.47"}
viz_nodes:
  - { name: "viz-node-0-0", vnfs: gpu_chroot, gpus: 2, gpu_type: nvidia_gtx_780, cpus: 8, sockets: 2, corespersocket: 4, mac: "foo", ip: "bar"}
#Slurm Accounting Variables - little need to change these
slurm_acct_db: "slurmdb"
slurmdb_storage_port: "7031"
slurmdb_port: "1234"
slurmdb_sql_pass: "password" #could force this to be a hash...
slurmdb_sql_user: slurm
#automatic variables for internal use
# Don't edit these!
compute_node_glob: "{{ compute_node_prefix }}[0-{{ num_compute_nodes|int - 1}}]"
gpu_node_glob: "{{ gpu_node_prefix }}[0-{{ num_gpu_nodes|int - 1}}]"
node_glob_bash: "{{ compute_node_prefix }}{0..{{ num_compute_nodes|int - 1}}}"
gpu_node_glob_bash: "{{ gpu_node_prefix }}{0..{{ num_gpu_nodes|int - 1}}}"
#Jupyter related
jupyter_provision: false
#EasyBuild variables
cluster_shared_folder: "/export"
easybuild_prefix: "{{ cluster_shared_folder }}/eb"
easybuild_tmpdir: "/tmp"
easybuild_buildpath: "/tmp/build"
easybuild_sourcepath: "/tmp/source"
#matlab install related
matlab_provision: false
matlab_download_url: "https://uab.box.com/shared/static/y01qu7oo1gpne6j2s6nqwcuee63epivo.gz"
matlab_clustershare: "/opt/ohpc/pub/apps/matlab/"
matlab_destination: "/tmp/matlab.tar.gz"
# module file vars
matlab_install_root: "/opt/ohpc/pub-master/apps/matlab/M2/"
matlab_docs_url: "http://{{ ood_nodename }}"
matlab_license_file: "{{ matlab_install_root }}/licenses/licenses.lic"
matlab_module_path: "{{ easybuild_prefix }}/modules/all"
matlab_module_appdir: "matlab"
matlab_module_file: "r2018a"
matlab_ver: "{{ matlab_module_file }}"
#SAS install related
sas_provision: false
sas_clustershare: "/export/apps/sas/"
sas_module_path: "{{ easybuild_prefix }}/modules/all"
sas_module_appdir: "sas"
sas_module_file: "9.4"
sas_ver: "{{ sas_module_file }}"
#Rstudio related
rstudio_provision: false
singularity_ver: '2.4.2'
r_versions:
  - { full: '3.5.1', short: '3.5' }
  - { full: '3.4.4', short: '3.4' }
#Copr Repos
enable_copr: true
copr_repos:
  - { repo_name: "louistw/mod_wsgi-3.4-18-httpd24", host: ["{{ ood_nodename }}"] }
  - { repo_name: "louistw/slurm-17.11.11-ohpc-1.3.6", host: ["{{ cluster_name }}", "{{ ood_nodename }}"] }
  - { repo_name: "atlurie/shibboleth-3.0-ood", host: ["{{ ood_nodename }}"] }
# Shibboleth SSO
enable_shib: false
# User Registration
enable_user_reg: false
user_register_app: "flask_user_reg"
user_register_app_path: "/var/www/ood/register/{{ user_register_app }}"
user_register_app_repo: "https://gitlab.rc.uab.edu/mmoo97/flask_user_reg.git"
mod_wsgi_pkg_name: "uab-httpd24-mod_wsgi"
RegUser_app_user: "reggie"
RegUser_app_user_full_name: "RegUser of user register app"
RegUser_app_user_passwd: "qweasd"
# User Create Scripts
enable_user_create_scripts: false
user_create_scripts: "ohpc_user_create"
user_create_scripts_path: "/opt/{{ user_create_scripts }}"
user_create_script_repo: "https://gitlab.rc.uab.edu/tr27p/ohpc_user_create.git"
---
- hosts: headnode
  roles:
    - {role: pre_ohpc, tags: pre_ohpc}
    - {role: ohpc_install, tags: ohpc_install}
    - {role: ohpc_config, tags: ohpc_config}
    - {role: compute_build_vnfs, tags: compute_build_vnfs}
    - {role: gpu_build_vnfs, tags: gpu_build_vnfs}
    - {role: login_build_vnfs, tags: login_build_vnfs}
    - {role: compute_build_nodes, tags: compute_build_nodes}
    - {role: gpu_build_nodes, tags: gpu_build_nodes}
    - {role: login_build_nodes, tags: login_build_nodes}
    - {role: viz_build_nodes, tags: viz_build_nodes}
    - {role: nodes_vivify, tags: nodes_vivify}
[headnode]
ohpc ansible_connection=local
[headnode:vars]
sshgroup=headnode
[ood]
ood
[ood:vars]
sshgroup=ood
[compute]
c1
[compute:vars]
sshgroup=compute
#!/bin/sh
yum -y install epel-release
yum -y install python-devel python-setuptools python-setuptools-devel gcc libffi-devel openssl-devel
easy_install pip
pip install virtualenv
mkdir -p $HOME/ansible_env
cd $HOME/ansible_env
virtualenv ansible
source $HOME/ansible_env/ansible/bin/activate
git clone git://github.com/ansible/ansible.git --recursive ./ansible_source
#pexpect has to be 3.3 because new 4.01 version only
# works with python >= 2.7 :(
pip install paramiko PyYAML Jinja2 httplib2 six pexpect==3.3
#moved this after lib installations
source $HOME/ansible_env/ansible_source/hacking/env-setup -q
## later figure out how to source it together with virtualenv
#echo -e "\nsource $HOME/ansible/hacking/env-setup -q" >> $HOME/.activate_ansible
# run a quick test
echo "# Ansible Inventory" > inventory
echo "[headnode]" >> inventory
echo "localhost ansible_connection=local" >> inventory
ansible -i inventory headnode -a 'hostname'
[headnode]
headnode ansible_host="{{ headnode_private_ip }}" ansible_connection=ssh ansible_ssh_user=root
---
- hosts: ohpc
  roles:
    - { name: 'pre_ohpc', tags: 'pre_ohpc' }
    - { name: 'ohpc_install', tags: 'ohpc_install' }
    - { name: 'ohpc_config', tags: 'ohpc_config' }
    - { name: 'compute_build_vnfs', tags: 'compute_build_vnfs' }
    - { name: 'compute_build_nodes', tags: 'compute_build_nodes' }
    - { name: 'nodes_vivify', tags: 'nodes_vivify' }
    - { name: 'ohpc_add_easybuild', tags: 'ohpc_add_easybuild' }
    - { name: 'ohpc_jupyter', tags: 'ohpc_jupyter', when: jupyter_provision }
    - { name: 'ohpc_matlab', tags: 'ohpc_matlab', when: matlab_provision }
    - { name: 'ohpc_sas', tags: 'ohpc_sas', when: sas_provision }
    - { name: 'ohpc_add_rstudio', tags: 'ohpc_add_rstudio', when: rstudio_provision }
    - { name: 'ohpc_user_reg', tags: 'ohpc_user_reg', when: enable_user_reg }
    - { name: 'reg_user_create_scripts', tags: 'reg_user_create_scripts', when: enable_user_create_scripts }
---
- hosts: ood
  roles:
    - { name: 'prep_ood', tags: 'prep_ood' }
    - { name: 'ood', tags: 'ood' }
    - { name: 'warewulf_sync', tags: 'warewulf_sync' }
    - { name: 'ood_jupyter', tags: 'ood_jupyter', when: jupyter_provision }
    - { name: 'ood_vnc_form', tags: 'ood_vnc_form' }
    - { name: 'ood_add_rstudio', tags: 'ood_add_rstudio', when: rstudio_provision }
    - { name: 'ood_matlab', tags: 'ood_matlab', when: matlab_provision }
    - { name: 'ood_sas', tags: 'ood_sas', when: sas_provision }
    - { name: 'ood_firewall_and_services', tags: 'ood_firewall_and_services' }
    - { name: 'ohpc_firewall_and_services', tags: 'ohpc_firewall_and_services' }
    - { name: 'ood_shib_sso', tags: 'ood_shib_sso', when: enable_shib }
    - { name: 'ood_user_reg', tags: 'ood_user_reg', when: enable_user_reg }
    - { name: 'reg_user_create_scripts', tags: 'reg_user_create_scripts', when: enable_user_create_scripts }
---
# - name: print single node info
# debug:
# var: item.mac
# with_items: "{{ compute_nodes }}"
# - name: print single node info
# debug:
# var: item.vnfs
# with_items: "{{ compute_nodes }}"
#
# - fail:
# msg: "Quick fail for test!"
- block:
- name: add node to ww db
command: wwsh -y node new {{ item.name }} --ipaddr={{ item.ip }} --hwaddr={{ item.mac }} -D {{ compute_private_nic }}
with_items: "{{ compute_nodes }}"
- name: set nodes bootloader
command: wwsh -y object modify -s bootloader=sda -t node {{ item.name }}
with_items: "{{ compute_nodes }}"
when: stateful_nodes == true
- name: set nodes partitions
command: wwsh -y object modify -s diskpartition=sda -t node {{ item.name }}
with_items: "{{ compute_nodes }}"
when: stateful_nodes == true
- name: format partitions
command: wwsh -y object modify -s diskformat=sda1,sda2,sda3 -t node {{ item.name }}
with_items: "{{ compute_nodes }}"
when: stateful_nodes == true
- name: define filesystems
command: wwsh -y object modify -s filesystems="{{ sda1 }},{{ sda2 }},{{ sda3 }}" -t node {{ item.name }}
with_items: "{{ compute_nodes }}"
when: stateful_nodes == true
#" for vim
- name: set files to provision
command: wwsh -y provision set {{ item.name }} --vnfs={{ item.vnfs }} --bootstrap={{ build_kernel_ver }} --files=passwd,group,shadow,munge.key,slurm.conf,dynamic_hosts,network
with_items: "{{ compute_nodes }}"
- name: remove node from slurm.conf if it exists already # to avoid duplication!
lineinfile:
dest: /etc/slurm/slurm.conf
regexp: "^NodeName={{ item.name }}"
state: absent
with_items: "{{ compute_nodes }}"
- name: add node to slurm.conf
lineinfile:
dest: /etc/slurm/slurm.conf
line: "NodeName={{ item.name }} Sockets={{ item.sockets }} CoresPerSocket={{ item.corespersocket }} State=UNKNOWN"
insertbefore: "^# PARTITIONS"
state: present
with_items: "{{ compute_nodes }}"
when: node_inventory_auto == false
- name: add nodes via wwnodescan - BOOT NODES NOW, IN ORDER
shell: wwnodescan --ip={{ compute_ip_minimum }} --netdev={{ private_interface }} --netmask=255.255.255.0 --bootstrap={{ build_kernel_ver }} --vnfs={{ compute_chroot }} {{ node_glob_bash }}
when: node_inventory_auto == true
# - name: Waiting for the compute node to bootup
# pause:
# seconds: 180
- name: set files to provision
command: wwsh -y provision set {{ compute_node_glob }} --vnfs={{ compute_chroot }} --bootstrap={{ build_kernel_ver }} --files=passwd,group,shadow,munge.key,slurm.conf,dynamic_hosts,network,lmod.sh,lmod.csh --kargs="net.ifnames=1 biosdevname=1 quiet" --postnetdown=1
when: node_inventory_auto == true
- name: sync files #also generates dynamic hosts on headnode!
command: wwsh file sync
# - name: add compute nodes to ansible inventory for wait
# add_host: name={{ node_glob }} group="compute nodes"
# - name: wait for compute nodes to boot
# local_action: wait_for host={{ last_node }} state=started delay=30 timeout=600
- name: restart dhcp
service: name=dhcpd state=restarted enabled=yes
- name: update pxeconfig to force node to boot from pxe
command: wwsh -y object modify -D bootlocal -t node {{ compute_node_glob}}
when: stateful_nodes == false and node_inventory_auto == true
- name: update pxeconfig to let node boot from local disk
command: wwsh -y object modify -s bootlocal=EXIT -t node {{ compute_node_glob}}
when: stateful_nodes == true and node_inventory_auto == true
- name: wwsh pxe update
command: wwsh -v pxe update
register: command_result
failed_when: "'Building iPXE' not in command_result.stdout and 'Building Pxelinux' not in command_result.stdout"
# vars:
# - compute_node_glob: "{{ compute_node_prefix }}[0-{{ num_compute_nodes|int - 1}}]"
# - node_glob_bash: "{{ compute_node_prefix }}{0..{{ num_compute_nodes|int - 1}}}"
# - last_node: "{{ node_prefix }}{{ num_nodes|int - 1 }}"
---
# - name: fix broken wwmkchroot file
# lineinfile:
# dest: /usr/libexec/warewulf/wwmkchroot/centos-7.tmpl
# regexp: "^YUM_MIRROR(.*)7.2.1511(.*)"
# line: 'YUM_MIRROR\g<1>7\g<2>' # use \g<1> for backref followed by digit!
# backrefs: yes
#
- name: check current kernel version
shell: uname -r | sed "s/.$(uname -m)//"
register: running_kernel_version
- name: check most recent installed kernel version
shell: yum list installed | grep 'kernel\.' | tail -n 1 | awk '{print $2}'
register: installed_kernel_version
- fail:
msg: "Most recently installed kernel is not currently loaded version! Consider rebooting before building the vnfs"
when: running_kernel_version.stdout != installed_kernel_version.stdout
- fail:
msg: "Loaded kernel does not match the build_kernel_ver in group_vars/all"
when: running_kernel_version.stdout not in build_kernel_ver
- name: remove old vnfs if it exists
file:
path: "{{ compute_chroot_loc }}"
state: absent
- template: src=compute_template.j2 dest="{{ template_path }}{{ compute_template }}.tmpl"
- template: src=extend_compute_packages.j2 dest="{{ template_path }}extend_compute_packages"
- template: src=base_packages.j2 dest="{{ template_path }}base_packages"
- name: make chroot
command: wwmkchroot "{{ compute_template }}" "{{ compute_chroot_loc }}"
- name: copy resolv.conf into image
copy: src=/etc/resolv.conf dest="{{ compute_chroot_loc }}/etc/resolv.conf" #"
- name: yum install into the image chroot
yum:
state: present
installroot: "{{ compute_chroot_loc }}"
name:
- chrony
- 'kernel-{{ running_kernel_version.stdout }}'
- lmod-ohpc
- grub2
- freeipmi
- ipmitool
- ohpc-slurm-client
- ohpc-base-compute
- tmux
- ruby
- turbojpeg
- nc
- '@X Window System'
- '@Xfce'
# one method to install TurboVNC
- name: download TurboVNC rpm
get_url:
url: https://sourceforge.net/projects/turbovnc/files/2.2/turbovnc-2.2.x86_64.rpm
dest: /var/tmp/turbovnc-2.2.x86_64.rpm
checksum: md5:25711ad32bfae63031aff20528d4af79
- name: install TurboVNC via rpm into chroot image
yum:
name: /var/tmp/turbovnc-2.2.x86_64.rpm
state: present
installroot: "{{ compute_chroot_loc }}"
# Another method to install TurboVNC, tested
# All information comes from TurboVNC official website:
# https://turbovnc.org/pmwiki/uploads/Downloads/TurboVNC.repo
# - name: add TurboVNC repo into yum inside compute node image
# yum_repository:
# name: TurboVNC
# description: TurboVNC official RPMs
# baseurl: https://sourceforge.net/projects/turbovnc/files
# gpgcheck: yes
# gpgkey: http://pool.sks-keyservers.net/pks/lookup?op=get&search=0x6BBEFA1972FEB9CE
# exclude: 'turbovnc-*.*.9[0-9]-*' # exclude beta releases
# reposdir: "{{ compute_chroot_loc }}/etc/yum.repos.d"
#
# - name: install TurboVNC via yum into chroot image
# yum:
# name: turbovnc
# state: present
# installroot: "{{ compute_chroot_loc }}"
- name: download Websockify source code
get_url:
url: https://github.com/novnc/websockify/archive/v0.8.0.tar.gz
dest: /var/tmp/websockify-0.8.0.tar.gz
- name: extract Websockify source code into chroot env
unarchive:
src: /var/tmp/websockify-0.8.0.tar.gz
dest: '{{ compute_chroot_loc }}/tmp'
- name: install Websockify inside chroot env
command: "chroot {{ compute_chroot_loc }} /bin/bash -c 'cd /tmp/websockify-0.8.0; python setup.py install'"
# After Xfce is installed, the compute node image is set to boot into graphical mode.
# This task sets it back to multi-user mode.
- name: set compute node to boot with multi-user mode
command: chroot '{{ compute_chroot_loc }}' systemctl set-default multi-user.target
- name: put NFS home mount info in image
lineinfile: line="{{ headnode_private_ip }}:/home /home nfs nfsvers=3,rsize=1024,wsize=1024,cto 0 0" dest={{ compute_chroot_loc }}/etc/fstab state=present
- name: put NFS opt mount info in image
lineinfile: line="{{ headnode_private_ip }}:/opt/ohpc/pub /opt/ohpc/pub-master nfs nfsvers=3 0 0" dest={{ compute_chroot_loc }}/etc/fstab state=present
- name: put NFS opt mount info in image
lineinfile: line="{{ headnode_private_ip }}:/export /export nfs nfsvers=3 0 0" dest={{ compute_chroot_loc }}/etc/fstab state=present
- name: firewalld on compute image disabled
command: chroot '{{ compute_chroot_loc }}' systemctl disable firewalld
- name: chronyd on compute image enabled
command: chroot '{{ compute_chroot_loc }}' systemctl enable chronyd
- name: add headnode to compute chrony.conf
lineinfile: line="server {{ headnode_private_ip }}" dest={{ compute_chroot_loc }}/etc/chrony.conf state=present
- name: slurmd on compute image enabled
command: chroot '{{ compute_chroot_loc }}' systemctl enable slurmd
- name: wwimport file into image (passwd)
command: wwsh file import /etc/passwd
- name: wwimport file into image (group)
command: wwsh file import /etc/group
- name: wwimport file into image (shadow)
command: wwsh file import /etc/shadow
- name: wwimport file into image (slurm)
command: wwsh file import /etc/slurm/slurm.conf --name slurm.conf
- name: wwimport file into image (munge)
command: wwsh file import /etc/munge/munge.key
- name: wwimport file into image (lmod.sh)
command: wwsh file import /etc/profile.d/lmod.sh
- name: wwimport file into image (lmod.csh)
command: wwsh file import /etc/profile.d/lmod.csh
- name: build bootstrap image
shell: wwbootstrap {{ build_kernel_ver }}
- name: build the vnfs
command: wwvnfs -y --chroot "{{ compute_chroot_loc }}/"
- name: set up provisioning interface
lineinfile: line="GATEWAYDEV={{ private_interface }}" dest=/tmp/network.ww create=yes
#" for vim
#
- name: add network file to import
command: wwsh -y file import /tmp/network.ww --name network
- name: set network file path
command: wwsh -y file set network --path /etc/sysconfig/network --mode=0644 --uid=0
PKGLIST="basesystem bash redhat-release chkconfig coreutils e2fsprogs \
ethtool filesystem findutils gawk grep initscripts iproute iputils \
mingetty mktemp net-tools nfs-utils pam portmap procps psmisc rdate rsync \
sed setup shadow-utils rsyslog tcp_wrappers tzdata util-linux words zlib \
tar less gzip which util-linux module-init-tools udev openssh-clients \
openssh-server dhclient pciutils vim-minimal shadow-utils strace cronie \
crontabs cpuspeed cpufrequtils cpio wget yum numactl libicu"
#DESC: A clone of Red Hat Enterprise Linux 7
# The general RHEL include has all of the necessary functions, but requires
# some basic variables specific to each chroot type to be defined.
# with additional procedure to add packages from variable EXTEND_COMPUTE
. include-rhel-xcbc
# Define the location of the YUM repository
# YUM_MIRROR="http://mirror.centos.org/centos-7/7/os/\$basearch/"
YUM_MIRROR="http://mirror.centos.org/centos-7/7/os/\$basearch/"
# Include the basic packages
. base_packages
# Additional packages to get closer to the definition of compute node I had in rocks.
. extend_compute_packages
ADDITIONALPACKAGES=( "$EXTEND_COMPUTE" )
# vim:filetype=sh:syntax=sh:expandtab:ts=4:sw=4:
EXTEND_COMPUTE="PyPAM abrt-addon-ccpp abrt-addon-kerneloops abrt-addon-python abrt-cli \
abrt-python aide alsa-utils atlas atlas-sse3 audispd-plugins augeas-libs\
authd biosdevname blktrace bridge-utils brltty cim-schema cpupowerutils\
crash-gcore-command crash-trace-command device-mapper-multipath device-mapper-persistent-data \
dstat dumpet edac-utils fftw fftw-devel fftw-static flex flex-devel fprintd-pam \
freeglut GConf2 gdb-gdbserver gdk-pixbuf2 glibc-utils glibc-devel.i686 gnuplot gsl gsl-devel \
hardlink hunspell i2c-tools iotop json-c lapack latencytop latencytop-tui latrace \
ledmon linuxptp lm_sensors lksctp-tools ltrace lvm2 memtest86+ ncurses-term numpy \
oprofile oprofile-jit papi perf powertop python-volume_key rfkill rsyslog-gnutls rsyslog-gssapi rsyslog-relp \
scipy scl-utils sdparm sg3_utils sox squashfs-tools star strace tboot \
trace-cmd udftools units uuidd valgrind vim-X11 vim-enhanced \
virt-what volume_key wodim x86info zsh SDL abrt abrt-libs abrt-tui audit autoconf automake \
blas dejavu-fonts-common dejavu-sans-fonts device-mapper-multipath-libs flac \
fontpackages-filesystem fprintd giflib gnuplot-common gsm jline jpackage-utils latencytop-common libcmpiCppImpl0 \
libao libasyncns libfprint libesmtp libjpeg-turbo-devel libIDL libproxy libproxy-bin libproxy-python \
librelp libreport libreport-cli libreport-compat libreport-filesystem libreport-plugin-kerneloops \
libreport-plugin-logger libreport-plugin-mailx libreport-plugin-reportuploader libreport-plugin-rhtsupport \
libreport-plugin-ureport libreport-python libsamplerate libsndfile \
libtar libXdmcp libxkbfile libxshmfence lvm2-libs numpy-f2py ORBit2 pulseaudio-libs pycairo \
python-argparse python-crypto python-dateutil python-matplotlib python-nose python-paramiko \
python-setuptools pytz qt-sqlite rhino satyr sg3_utils-libs sgml-common suitesparse theora-tools \
trousers tzdata-java vim-common vim-filesystem volume_key-libs wavpack xkeyboard-config xinetd xmlrpc-c \
xmlrpc-c-client xorg-x11-server-common xorg-x11-server-Xvfb xorg-x11-xkb-utils xterm libwsman1 \
net-snmp-utils openwsman-client openwsman-server perl-Compress-Raw-Zlib perl-Compress-Zlib perl-HTML-Parser \
perl-HTML-Tagset perl-IO-Compress-Base perl-IO-Compress-Zlib perl-libwww-perl perl-URI sblim-sfcb sblim-sfcc"
---
# - name: print single node info
# debug:
# var: item.mac
# with_items: "{{ gpu_nodes }}"
- block:
- name: add node to ww db
command: wwsh -y node new {{ item.name }} --ipaddr={{ item.ip }} --hwaddr={{ item.mac }} -D {{ private_interface }}
with_items: "{{ gpu_nodes }}"
- name: blacklist nouveau on first boot
command: wwsh -y object modify -s kargs='modprobe.blacklist=nouveau,quiet' -t node {{ item.name }}
with_items: "{{ gpu_nodes }}"
- name: set nodes bootloader
command: wwsh -y object modify -s bootloader=sda -t node {{ item.name }}
with_items: "{{ gpu_nodes }}"
- name: set nodes partitions
command: wwsh -y object modify -s diskpartition=sda -t node {{ item.name }}
with_items: "{{ gpu_nodes }}"
- name: format partitions
command: wwsh -y object modify -s diskformat=sda1,sda2,sda3 -t node {{ item.name }}
with_items: "{{ gpu_nodes }}"
- name: define filesystems
command: wwsh -y object modify -s filesystems="{{ sda1 }},{{ sda2 }},{{ sda3 }}" -t node {{ item.name }}
with_items: "{{ gpu_nodes }}"
#" for vim
- name: remove node from slurm.conf if it exists already # to avoid duplication!
lineinfile:
dest: /etc/slurm/slurm.conf
regexp: "^NodeName={{ item.name }}"
state: absent
with_items: "{{ gpu_nodes }}"
- name: add node to slurm.conf
lineinfile:
dest: /etc/slurm/slurm.conf
line: "NodeName={{ item.name }} Gres=gpu:{{ item.gpu_type }}:{{ item.gpus }} Sockets={{ item.sockets }} CoresPerSocket={{ item.corespersocket }} State=UNKNOWN"
insertbefore: "^# PARTITIONS"
state: present
with_items: "{{ gpu_nodes }}"
- name: remove node from gres.conf if it exists already # to avoid duplication!
lineinfile:
dest: /etc/slurm/gres.conf
regexp: "^NodeName={{ item.name }}"
state: absent
with_items: "{{ gpu_nodes }}"
- name: add node info to slurm/gres.conf
lineinfile:
dest: /etc/slurm/gres.conf
line: "NodeName={{ item.name }} Name=gpu Type={{ item.gpu_type }} File=/dev/nvidia[0-{{ item.gpus - 1 }}]"
insertafter: "^#######"
state: present
with_items: "{{ gpu_nodes }}"
when: node_inventory_auto == false # END NON-AUTO-INVENTORY BLOCK
- name: add nodes via wwnodescan - BOOT NODES NOW, IN ORDER
shell: wwnodescan --ip={{ gpu_ip_minimum }} --netdev={{ private_interface }} --netmask=255.255.255.0 --bootstrap={{ build_kernel_ver }} --vnfs={{ compute_chroot }} {{ gpu_node_glob_bash }}
when: node_inventory_auto == true
- name: blacklist nouveau on first boot
command: wwsh -y object modify -s kargs='modprobe.blacklist=nouveau,quiet' -t node "{{ gpu_prefix}}*"
when: node_inventory_auto == true
- name: set files to provision
command: wwsh -y provision set {{ gpu_node_glob }} --vnfs={{ gpu_chroot }} --bootstrap={{ build_kernel_ver }} --files=passwd,group,shadow,munge.key,slurm.conf,dynamic_hosts,network,gres.conf
when: node_inventory_auto == true
- name: wwsh file sync
command: wwsh file sync
- name: restart dhcp
service: name=dhcpd state=restarted
- name: update pxeconfig to let node boot from pxe
command: wwsh -y object modify -D bootlocal -t node {{ gpu_node_glob }}
when: stateful_nodes == false and node_inventory_auto == true
- name: update pxeconfig to let node boot from local disk
command: wwsh -y object modify -s bootlocal=EXIT -t node {{ gpu_node_glob}}
when: stateful_nodes == true and node_inventory_auto == true
- name: wwsh pxe update
command: wwsh -v pxe update
register: command_result
failed_when: "'Building iPXE' not in command_result.stdout and 'Building Pxelinux' not in command_result.stdout"