How do I configure PBSPro HA with DAS shared storage?

Bright Cluster Manager does not natively support HA (High Availability) for PBSPro when the shared storage is DAS (direct-attached storage). Note that DRBD is no longer supported.

To achieve HA in Bright Cluster Manager 8.x, PBSPro must be configured manually as described in this article.

Note: As part of this procedure, a copy of the pbs_comm executable is made. This copy is not updated when Bright Cluster Manager is updated, so after an update it may be incompatible with the rest of the PBSPro installation and may need to be copied again.

For this example, the hostnames of the head nodes and their IP addresses on the internalnet are:

head1 10.141.255.254

head2 10.141.255.253

This procedure assumes that HA and shared storage have already been configured.

1) Configure PBSPro

On the primary head node, run:

wlm-setup -s -w pbspro

2) Set the pbspro service to run only on the active head node

In cmsh, run:

% device
% foreach -l master (services; use pbspro; set runif active; commit)

3) Freeze the PBSPro configuration file on both head nodes

On both head nodes, edit the file /cm/local/apps/cmd/etc/cmd.conf and add this line:

FrozenFile = { "/etc/pbs.conf" }

Then restart CMDaemon on both head nodes:

service cmd restart
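
The edit in this step can be scripted. The following is a minimal sketch, not part of the original procedure: the helper name add_frozen_file_line is hypothetical, and it assumes that cmd.conf holds at most one active FrozenFile directive, so it refuses to append if one is already present.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bright Cluster Manager): append a
# FrozenFile directive to a cmd.conf-style file unless an active
# (uncommented) FrozenFile directive is already present.
# Returns 0 if the line was appended, 1 if a directive already exists.
# An existing directive must be merged by hand; this sketch does not
# try to combine file lists.
add_frozen_file_line() {
    conf=$1
    line=$2
    if grep -Eq '^[[:space:]]*FrozenFile[[:space:]]*=' "$conf"; then
        echo "FrozenFile already set in $conf; merge manually" >&2
        return 1
    fi
    printf '%s\n' "$line" >> "$conf"
}

# Example, to be run as root on each head node (not executed here):
#   add_frozen_file_line /cm/local/apps/cmd/etc/cmd.conf \
#       'FrozenFile = { "/etc/pbs.conf" }'
#   service cmd restart
```

Refusing to touch a file that already has a FrozenFile directive keeps the helper safe to rerun and avoids silently creating two competing directives.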

4) Modify the head node configurations

On the primary head node, edit /etc/pbs.conf and set these properties, leaving PBS_PRIMARY and PBS_SECONDARY commented out as shown:

PBS_START_COMM=0
PBS_START_SCHED=1
PBS_SERVER=master
PBS_SERVER_HOST_NAME=head1.cm.cluster
#PBS_PRIMARY=head1.cm.cluster
#PBS_SECONDARY=head2.cm.cluster

On the secondary head node, edit /etc/pbs.conf and set these properties:

PBS_START_COMM=0
PBS_START_SCHED=1
PBS_SERVER=master
PBS_SERVER_HOST_NAME=head2.cm.cluster
#PBS_PRIMARY=head1.cm.cluster
#PBS_SECONDARY=head2.cm.cluster

From within cmsh, restart the pbspro service on the primary head node:

% device; use head1; services; use pbspro; restart
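
The /etc/pbs.conf edits in this step can also be applied with a script instead of by hand. This is a rough sketch under stated assumptions: set_conf_key and comment_out_key are hypothetical helpers, the file is assumed to hold plain KEY=VALUE lines, and values are assumed not to contain the characters | or &.

```shell
#!/bin/sh
# Hypothetical helpers for editing KEY=VALUE files such as /etc/pbs.conf.
# Assumption: values contain no '|' or '&' characters (true for the
# hostnames and flags used in this article).

# Set KEY to VALUE: rewrite an existing uncommented KEY= line,
# otherwise append a new KEY=VALUE line.
set_conf_key() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Comment out KEY if it is currently active
# (used here for PBS_PRIMARY and PBS_SECONDARY).
comment_out_key() {
    file=$1 key=$2
    sed "s|^${key}=|#${key}=|" "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}

# Example for the primary head node (not executed here):
#   set_conf_key /etc/pbs.conf PBS_START_COMM 0
#   set_conf_key /etc/pbs.conf PBS_START_SCHED 1
#   set_conf_key /etc/pbs.conf PBS_SERVER master
#   set_conf_key /etc/pbs.conf PBS_SERVER_HOST_NAME head1.cm.cluster
#   comment_out_key /etc/pbs.conf PBS_PRIMARY
#   comment_out_key /etc/pbs.conf PBS_SECONDARY
```

On the secondary head node the same calls apply, with head2.cm.cluster as the value of PBS_SERVER_HOST_NAME. The helpers write back through cat rather than mv so that the ownership and permissions of the original file are kept.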

5) Create pbs_comm service configuration

On the primary head node, create the file /etc/pbs_comm.conf with this content:

PBS_EXEC=/cm/local/apps/pbspro/current
PBS_HOME=/cm/local/apps/pbspro/var/spool
PBS_START_SERVER=0
PBS_START_COMM=1
PBS_START_SCHED=0
PBS_START_MOM=0
PBS_SERVER=master
PBS_SERVER_HOST_NAME=head1.cm.cluster

On the secondary head node, create the file /etc/pbs_comm.conf with this content:

PBS_EXEC=/cm/local/apps/pbspro/current
PBS_HOME=/cm/local/apps/pbspro/var/spool
PBS_START_SERVER=0
PBS_START_COMM=1
PBS_START_SCHED=0
PBS_START_MOM=0
PBS_SERVER=master
PBS_SERVER_HOST_NAME=head2.cm.cluster
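
The two /etc/pbs_comm.conf files differ only in PBS_SERVER_HOST_NAME, so they can be generated from one template. A minimal sketch, assuming the paths shown above; the function name render_pbs_comm_conf is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the pbs_comm.conf content from this step,
# with PBS_SERVER_HOST_NAME set to the host name given as argument.
render_pbs_comm_conf() {
    host=$1
    cat <<EOF
PBS_EXEC=/cm/local/apps/pbspro/current
PBS_HOME=/cm/local/apps/pbspro/var/spool
PBS_START_SERVER=0
PBS_START_COMM=1
PBS_START_SCHED=0
PBS_START_MOM=0
PBS_SERVER=master
PBS_SERVER_HOST_NAME=$host
EOF
}

# Example (not executed here):
#   on head1: render_pbs_comm_conf head1.cm.cluster > /etc/pbs_comm.conf
#   on head2: render_pbs_comm_conf head2.cm.cluster > /etc/pbs_comm.conf
```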

6) Create the pbs_comm service

On both head nodes, create the directories /cm/local/apps/pbspro/current/sbin and /cm/local/apps/pbspro/current/etc:

mkdir -p /cm/local/apps/pbspro/current/{sbin,etc}

On both head nodes, copy the file /etc/init.d/pbs to the file /etc/init.d/pbs_comm, with the following modifications:

: main code
export PBS_CONF_FILE=/etc/pbs_comm.conf

conf=${PBS_CONF_FILE:-/etc/pbs_comm.conf}
[...]
case "$1" in
start_msg) echo "Starting PBS_COMM" ;;
stop_msg) echo "Stopping PBS_COMM" ;;
status) status_pbs ;;
start) pre_start_pbs ;;
stop) stop_pbs ;;
restart) echo "Restarting PBS_COMM" ; stop_pbs ; pre_start_pbs ;;
*) echo "Usage: `basename $0` --version" ;
echo "Usage: `basename $0` {start|stop|restart|status}" ; exit 1 ;;
esac

On both head nodes copy the file /cm/shared/apps/pbspro/current/etc/pbs_habitat to /cm/local/apps/pbspro/current/etc/pbs_habitat, with the following modifications:

# Start of the pbs_habitat script
#
export PBS_CONF_FILE=/etc/pbs_comm.conf
conf=${PBS_CONF_FILE:-/etc/pbs_comm.conf}

On both head nodes, copy the pbs_comm executable from /cm/shared to /cm/local:

cp /cm/shared/apps/pbspro/current/sbin/pbs_comm /cm/local/apps/pbspro/current/sbin/pbs_comm
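
Because the copied executable is not updated along with Bright Cluster Manager, it can be useful to check from time to time that the local copy still matches the shared one. The following is a small sketch of such a check; the function name pbs_comm_in_sync is hypothetical, and the default paths are those used in the cp command above.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit status 0) only if the local copy of
# pbs_comm is byte-for-byte identical to the one under /cm/shared.
# Both paths can be overridden, which also makes the check easy to test.
pbs_comm_in_sync() {
    shared=${1:-/cm/shared/apps/pbspro/current/sbin/pbs_comm}
    local_copy=${2:-/cm/local/apps/pbspro/current/sbin/pbs_comm}
    cmp -s "$shared" "$local_copy"
}

# Example, to be run on each head node after an update (not executed here):
#   pbs_comm_in_sync || echo "local pbs_comm differs from the shared copy" >&2
```

If the check fails after an update, repeating the cp command and restarting the pbs_comm service is the likely remedy, although that is not stated in the original procedure.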

7) Terminate the running pbs_comm processes

On both head nodes, terminate any running pbs_comm processes:

killall -KILL pbs_comm

8) Configure monitoring of the pbs_comm service in Bright Cluster Manager

In cmsh, configure the service on both head nodes:

% device
% foreach -l master (services; add pbs_comm; set monitored on; set autostart on; commit) 

9) Freeze PBSPro configuration in the software images

For each software image that will be used for the compute nodes, edit the image's copy of /cm/local/apps/cmd/etc/cmd.conf (the path is relative to the image root) and add this line:

FrozenFile = {"/etc/pbs.conf","/cm/local/apps/pbspro/var/spool/mom_priv/config"} 

10) Configure the clients to point to a single server and to have two leaf routers

For each software image that will be used for the compute nodes, edit the image's copy of /etc/pbs.conf and set these properties:

PBS_SERVER=master
PBS_LEAF_ROUTERS=head1.cm.cluster,head2.cm.cluster
#PBS_PRIMARY=head1.cm.cluster
#PBS_SECONDARY=head2.cm.cluster
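
A mistyped host name in PBS_LEAF_ROUTERS is easy to miss, so a quick check of each image's pbs.conf may help before rebooting. This is an illustrative sketch only: check_client_pbs_conf is a hypothetical helper, and the image path in the usage comment assumes that images live under /cm/images, which should be adjusted to the actual image location.

```shell
#!/bin/sh
# Hypothetical check for a compute-node pbs.conf. Succeeds only if:
#   - PBS_SERVER is exactly "master",
#   - PBS_LEAF_ROUTERS is exactly the expected comma-separated list,
#   - PBS_PRIMARY and PBS_SECONDARY are not active (commented out or absent).
check_client_pbs_conf() {
    file=$1
    routers=$2
    grep -qxF 'PBS_SERVER=master' "$file" || return 1
    grep -qxF "PBS_LEAF_ROUTERS=${routers}" "$file" || return 1
    if grep -Eq '^PBS_(PRIMARY|SECONDARY)=' "$file"; then
        return 1
    fi
    return 0
}

# Example (not executed here; the image path is an assumption):
#   check_client_pbs_conf /cm/images/default-image/etc/pbs.conf \
#       head1.cm.cluster,head2.cm.cluster || echo "pbs.conf needs fixing" >&2
```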

11) Modify node configurations

For each software image that will be used for the compute nodes, edit the file /cm/local/apps/pbspro/var/spool/mom_priv/config so that it has the following content:

$clienthost head1
$clienthost head2
$restrict_user_maxsysid 499
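
With several software images, writing this file by hand in each one is repetitive. The following sketch writes the content above into a given image root. The helper name write_mom_config is hypothetical, and the image root in the usage comment (/cm/images/default-image) is an assumption that must be replaced with the real image path.

```shell
#!/bin/sh
# Hypothetical helper: write the mom_priv/config content from this step
# into the software image rooted at the given directory.
# Fails if the mom_priv directory does not already exist in the image,
# so that a mistyped image root is noticed instead of silently created.
write_mom_config() {
    image_root=$1
    dir="$image_root/cm/local/apps/pbspro/var/spool/mom_priv"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # The quoted 'EOF' keeps $clienthost and $restrict_user_maxsysid literal.
    cat > "$dir/config" <<'EOF'
$clienthost head1
$clienthost head2
$restrict_user_maxsysid 499
EOF
}

# Example (not executed here; the image root is an assumption):
#   write_mom_config /cm/images/default-image
```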

12) Reboot the compute nodes

Reboot the compute nodes so that they pick up the updated software images.