How do I upgrade from 6.0 to 6.1 for RHEL?


Upgrading from Bright 6.0 to 6.1 for RHEL-based distributions

The procedure below can be used to upgrade a Bright 6.0 installation with failover to Bright 6.1 on the following supported RHEL-based distributions:

  • RHEL6, CENTOS6, SL6
  • RHEL5, CENTOS5, SL5

Prerequisites

  • Make sure a full backup of both head nodes is available and working (see the backup sketch after this list).
  • Turn off all nodes.
  • If there is a cloud setup configured:
    • Cluster Extension Scenario:
      • Terminate cloud nodes
      • Terminate cloud director(s)
  • Extra distribution packages will be installed. For Red Hat Enterprise Linux, a valid RPM repository or the appropriate Red Hat software channels must be configured and accessible via your Red Hat subscription.
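
A minimal sketch of such a head node backup, assuming an external host reachable as backuphost and a target path /backups (both placeholders, not part of the original procedure):

# Run on each head node; adjust host, path, and excludes to your environment
rsync -aAXH --numeric-ids --exclude={"/proc/*","/sys/*","/dev/*","/tmp/*"} / backuphost:/backups/$(hostname)/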

On the primary (active) and secondary (passive) head nodes

Apply existing updates to Bright 6.0

yum clean all

yum update 

Update Bright YUM repo configuration to 6.1

If the head node has access to the internet, then update /etc/yum.repos.d/cm.repo:

a. Change all occurrences of 6.0 to 6.1

b. Add the following packages to the 'exclude=' line (a space-separated list):

   exclude = slurm* torque* pbspro* cm-hwloc sge*
c. NOTE: cm60user is allowed to access the Bright 6.1 repository, so the username and password do not need to be changed
d. IMPORTANT: Set the highest priority for the Bright repos by adding the line:
priority=1

Perl one-liner (note that it does not add the priority=1 line, which must still be added by hand):

perl -i.bak -pe 's#6\.0#6.1#g; s#^(exclude\s*=.*)$#$1 slurm* torque* pbspro* cm-hwloc sge*#;' /etc/yum.repos.d/cm.repo
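
To sanity-check the result (an optional step, not in the original procedure), inspect the edited file:

grep -E '^(baseurl|exclude|priority)' /etc/yum.repos.d/cm.repo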

OR

Use a Bright 6.1 DVD/ISO as a repo:

# Mount DVD

mount -o loop /path/to/bright/dvd /mnt1


# Create repo file

cat <<EOF >/etc/yum.repos.d/cm6.1-dvd.repo
[cm-dvdrepo-6.1]
name=Bright Cluster Manager 6.1 DVD Repo
baseurl=file:///mnt1/data/cm-rpms/6.1
enabled=1
gpgcheck=1
priority=1
exclude=slurm* pbspro* sge* torque* cm-hwloc
EOF


# Disable the old repo /etc/yum.repos.d/cm.repo

perl -pi -e 's/enabled=1/enabled=0/g' /etc/yum.repos.d/cm.repo

Create FRESH file, clear yum cache

touch /cm/FRESH

yum clean all 
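
As an optional sanity check (not in the original procedure), confirm that the expected Bright 6.1 repository is now among the enabled repos:

yum repolist enabled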

Create a backup of the current cluster configuration, stop workload managers, and unmount shared storage

 service cmd stop

 cmd -x /root/cm/cmd-backup-6.0.xml


 # Stop all workload managers

 /etc/init.d/slurm stop

 /etc/init.d/sgeexecd stop

 /etc/init.d/sgemaster.sge1 stop

 /etc/init.d/pbs stop

 /etc/init.d/torque_mom stop

 /etc/init.d/torque_server stop

 /etc/init.d/maui stop

 /etc/init.d/moab stop


 # Unmount shared storage (only in failover setup)

 umount /cm/shared 

 umount /home
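
 # If either unmount fails with "device is busy", the processes still holding the
 # mount can be identified first; a possible check (not part of the original procedure):

 fuser -vm /cm/shared
 fuser -vm /home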

 

Create a backup of the MySQL configuration file and .bashrc

cp /etc/my.cnf{,.ok}

cp /root/.bashrc{,.ok} 

Remove cm-config-cm from Bright 6.0 and install the one for Bright 6.1

 rpm -e --nodeps cm-config-cm

 yum install cm-config-cm 

 # Ignore messages about "Warning: RPMDB altered outside of yum" and "missing requires of cm-config-cm"

 

 # Remove the 6.0 globalarrays RPMs, as they cause conflicts during dependency resolution

 yum remove globalarrays-{gcc,open64}-openmpi-64

 

Upgrade CM packages to Bright 6.1

yum upgrade 

Remove old/obsolete packages

 # Remove packages that have been obsoleted in Bright 6.1

 rpm -e $(rpm -qa | grep -E "^(gotoblas|lm_sensors|rsync-cm|cmgui-json-dist)" | grep _cm6.0) --nodeps


 # Remove packages for which names have changed in Bright 6.1

 rpm -e $(rpm -qa | grep -E "^(fftw|freeipmi|ipmitool|globalarrays|conman|iozone|stresscpu)" | grep _cm6.0)


 # Bright 6.1 no longer provides mpich2 RPMs; the mpich RPMs are upgraded to version 3.x.

 # If the mpich2 provided by Bright is not being used, it can be removed:


 rpm -e $(rpm -qa | grep -E "^mpich2-" | grep _cm6.0)

 

Install new packages introduced by Bright 6.1

yum install openblas cm-conman cm-freeipmi cm-iozone cm-ipmitool stresscpu fftw{2,3}-openmpi-{gcc,open64}-64 globalarrays-openmpi-{gcc,open64}-64 

Install distribution packages required for dependency resolving

yum install lm_sensors lm_sensors-libs lm_sensors-devel 

Get list of all remaining Bright 6.0 packages

rpm -qa | grep _cm6.0

# The output should show only the workload manager RPMs if they were excluded from update. 

Upgrade cuda RPMS (optional)

 # If cuda RPMS were installed on the Bright 6.0 cluster, the following RPMS will remain:

 rpm -qa | grep -E "^cuda" | grep _cm6.0


 # To remove all Bright 6.0 cuda RPMS (if multiple cuda versions were installed):

 yum remove $(rpm -qa | grep -E "^cuda" | grep _cm6.0)


 # To remove only specific versions of cuda RPMS (for example cuda42):

 yum remove $(rpm -qa | grep -E "^cuda42" | grep _cm6.0)


 # Install latest cuda RPMS from Bright 6.1

 yum install cuda52*

 

Remove /cm/FRESH file

 # This is important: if /cm/FRESH is left in place, future updates to any Bright config

 # RPMs will overwrite config files in their default locations, treating the system as a FRESH install.


 rm /cm/FRESH

 

Check for rpmsave or rpmnew files, and fix/update them

 # Some updates may have resulted in rpmsave and/or rpmnew files being created. All of these need to be processed.


 # Use the following command to find the rpmsave and/or rpmnew files:

 

 find / -name "*.rpmnew" -o -name "*.rpmsave" 
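
One possible way to review the files is to diff each .rpmnew against the config file it would replace; a sketch, assuming paths without spaces:

for f in $(find / -name "*.rpmnew"); do
    echo "=== ${f%.rpmnew} ==="
    diff -u "${f%.rpmnew}" "$f"
done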

*** VERY IMPORTANT ***
# It is important to use the new cmd.conf

# Create a backup of the existing configuration
cp /cm/local/apps/cmd/etc/cmd.conf{,.old}

# Update required information (*** VERY IMPORTANT ***).

1. Copy all username and password information from /cm/local/apps/cmd/etc/cmd.conf to /cm/local/apps/cmd/etc/cmd.conf.rpmnew.
The most important ones are:
DBPass
LDAPPass
LDAPReadOnlyPass

2. AdvancedConfig directives:
The ProvisioningNodeAutoUpdate and ProvisioningNodeAutoUpdateTimer directives have become obsolete.
If you were using these, set the provisioningnodeautoupdatetimeout property in the base partition after the upgrade has completed.

IMPORTANT: Copy all other AdvancedConfig directives that are in use to /cm/local/apps/cmd/etc/cmd.conf.rpmnew

# Use the new file
cp /cm/local/apps/cmd/etc/cmd.conf{.rpmnew,}

# Use the new cmd init script and then restart cmdaemon
cp /etc/init.d/cmd{,.old}
cp /etc/init.d/cmd{.rpmnew,}
service cmd restart
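
When copying the password directives in step 1 above, the current values can be listed straight from the backed-up file; a small helper sketch, assuming the directive names listed above:

grep -E '^(DBPass|LDAPPass|LDAPReadOnlyPass)' /cm/local/apps/cmd/etc/cmd.conf.old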

Restore the backed-up MySQL configuration and .bashrc

mv /etc/my.cnf{.ok,}

cp /root/.bashrc{.ok,} 

Fix /etc/motd

perl -pi -e "s/Cluster Manager ID: #00000/Cluster Manager ID: #$(cat /cm/CLUSTERMANAGERID)/g" \

  /etc/motd 

Restore iptables RPM if required (only RHEL6, SL6, CENTOS6)

On RHEL6-based distributions, the 'iptables' RPM does not get updated properly by YUM, leaving it with some components missing. As a result, the shorewall service will fail to start. To correct this problem, do the following:


rpm -e iptables --nodeps

yum install iptables

service shorewall restart 

On the primary (active) head node update all software images

For each software image (e.g. /cm/images/default-image), do the following:

export IMAGE=/cm/images/default-image 

Apply existing updates to software image:

yum --installroot=$IMAGE clean all

yum --installroot=$IMAGE upgrade 

Update repo configuration files

If the head node has access to the internet, then update $IMAGE/etc/yum.repos.d/cm.repo:


a. Change all occurrences of 6.0 to 6.1

b. Add the following packages to the 'exclude=' line (a space-separated list):

    exclude = slurm* torque* pbspro* cm-hwloc sge*

c. NOTE: cm60user is allowed to access the Bright 6.1 repository, so the username and password do not need to be changed
d. IMPORTANT: Set the highest priority for the Bright repos by adding the line:
priority=1

OR

# Update using a Bright 6.1 DVD/ISO:

# Mount the DVD (if not already mounted)
mount -o loop /path/to/bright/dvd /mnt1

mkdir $IMAGE/mnt1; mount -o bind /mnt1 $IMAGE/mnt1


# Create YUM repo configuration file for Bright 6.1


cat <<EOF >$IMAGE/etc/yum.repos.d/cm6.1-dvd.repo

[cm-dvdrepo-6.1]

name=Bright Cluster Manager 6.1 DVD Repo

baseurl=file:///mnt1/data/cm-rpms/6.1

enabled=1

gpgcheck=1

priority=1

exclude=slurm* pbspro* sge* torque* cm-hwloc

EOF


# Disable the old repo /etc/yum.repos.d/cm.repo

perl -pi -e 's/enabled=1/enabled=0/g' $IMAGE/etc/yum.repos.d/cm.repo 

Create FRESH file, clear yum cache

touch $IMAGE/cm/FRESH

yum --installroot=$IMAGE clean all 

Remove cm-config-cm from Bright 6.0 and install the one for Bright 6.1

rpm --root $IMAGE -e --nodeps cm-config-cm

chroot $IMAGE yum install cm-config-cm  

Upgrade CM packages to Bright 6.1

yum --installroot=$IMAGE upgrade 

Remove old/obsolete packages

rpm --root $IMAGE -e $(chroot $IMAGE rpm -qa | grep -E "^(freeipmi|ipmitool|rsync-cm|lm_sensors)" | grep _cm6.0) --nodeps 

Install new packages introduced by Bright 6.1

yum --installroot=$IMAGE install cm-freeipmi cm-ipmitool 

Get list of all remaining Bright 6.0 packages

chroot $IMAGE rpm -qa | grep _cm6.0

# Only the workload manager RPMS from Bright 6.0 will remain. To upgrade,

# remove them from the excludes in the YUM repo config ($IMAGE/etc/yum.repos.d/cm.repo

# or $IMAGE/etc/yum.repos.d/cm6.1-dvd.repo), and run:

yum --installroot=$IMAGE update 

Remove /cm/FRESH file

 # This is important: if $IMAGE/cm/FRESH is left in place, future updates to any Bright config

 # RPMs will overwrite config files in their default locations, treating the image as a FRESH install.


 rm $IMAGE/cm/FRESH

 

Check for rpmsave or rpmnew files, and fix/update them

 # Some updates may have resulted in rpmsave and/or rpmnew files being created. All of these need to be processed. 

 

 # Use the following command to find the rpmsave and/or rpmnew files:

 

 find $IMAGE/ -name "*.rpmnew" -o -name "*.rpmsave"
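
The review loop shown earlier for the head node can be pointed at the image tree as well (same assumptions):

for f in $(find $IMAGE/ -name "*.rpmnew"); do
    diff -u "${f%.rpmnew}" "$f"
done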


*** VERY IMPORTANT ***
# It is important to use the new cmd.conf

# Create a backup of the existing configuration
cp $IMAGE/cm/local/apps/cmd/etc/cmd.conf{,.old}

# Update required information (*** VERY IMPORTANT ***).

1. Copy all username and password information from $IMAGE/cm/local/apps/cmd/etc/cmd.conf to $IMAGE/cm/local/apps/cmd/etc/cmd.conf.rpmnew.
The most important one is:
LDAPReadOnlyPass

2. AdvancedConfig directives:
The ProvisioningNodeAutoUpdate and ProvisioningNodeAutoUpdateTimer directives have become obsolete.
If you were using these, set the provisioningnodeautoupdatetimeout property in the base partition after the upgrade has completed.

IMPORTANT: Copy all other AdvancedConfig directives that are in use to $IMAGE/cm/local/apps/cmd/etc/cmd.conf.rpmnew

# Use the new file
cp $IMAGE/cm/local/apps/cmd/etc/cmd.conf{.rpmnew,}

# Use the new cmd init script
cp $IMAGE/etc/init.d/cmd{,.old}
cp $IMAGE/etc/init.d/cmd{.rpmnew,}

Propagate changes in the software image(s) to the secondary (passive) head node

 cmsh -> softwareimage -> updateprovisioners


 # Wait for 'Provisioning completed' event

 

Add Xeon Phi settings (optional)

# Add mic metric:
cmsh
$ monitoring metrics
$ add mic
$ set command /cm/local/apps/cmd/scripts/metrics/sample_mic
$ set classofmetric prototype
$ set timeout 30
$ commit

# Add mic gres type into slurmserver role:
cmsh
$ device roles master
$ use slurmserver
$ append grestypes mic
$ commit
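
The same metric can also be added non-interactively; a sketch using cmsh -c (assuming the semicolon-separated command syntax of cmsh):

cmsh -c "monitoring metrics; add mic; set command /cm/local/apps/cmd/scripts/metrics/sample_mic; set classofmetric prototype; set timeout 30; commit"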

Upgrading Workload Managers (optional)

Workload manager RPMS from Bright 6.0 will remain:

rpm -qa | grep -E "^(slurm|pbspro|torque|cm-hwloc|sge)" | grep _cm6.0


# Back up all workload manager configuration files

cp /cm/shared/apps/slurm/var/etc/slurm.conf{,.bak}

cp /etc/pbs.conf{,.bak}

cp $IMAGE/etc/pbs.conf{,.bak}

 

Upgrade packages

Slurm

Remove slurm, cm-hwloc from the exclude list in the 6.1 yum configuration file (from step 1.1), and run:

 yum update

 chroot $IMAGE yum update 

PBS Pro

On RHEL5 based systems, there is a bug in the latest PBS Pro that prevents the pbs_server from starting if there are empty lines in /etc/pbs.conf, so remove them:

# On the active and passive head nodes

sed -i '/^$/d' /etc/pbs.conf
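
To confirm that no empty lines remain (a quick optional check):

grep -c '^$' /etc/pbs.conf   # should print 0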

Remove pbspro from the exclude list in the 6.1 yum configuration file (from step 1.1), and run:

 # On the active head node

 yum remove pbspro-slave

 yum install pbspro-client

 yum update


 chroot $IMAGE yum remove pbspro-slave

 chroot $IMAGE yum install pbspro-client 

Torque

Remove torque from the exclude list in the 6.1 yum configuration file (from step 1.1), and run:

 # On the active head node

 yum update

 chroot $IMAGE yum update 

Open Grid Scheduler/SGE

Remove sge from the exclude list in the 6.1 yum configuration file (from step 1.1), and run:

 # On the active head node

 yum update

 chroot $IMAGE yum update 

Update provisioners

Propagate changes in the software image(s) to the secondary (passive) head node

 # On the active head node

 cmsh -> softwareimage -> updateprovisioners


 # Wait for 'Provisioning completed' event 

Clean up and reboot head nodes

Re-do shared storage setup (if failover setup) from active head node

cmha-setup -> Shared Storage 

Repair slurm config on the active head node (if slurm power save was enabled)

 # Create a backup of the existing slurm.conf

 cp /cm/shared/apps/slurm/var/etc/slurm.conf{,.bak}



 # Remove power save definitions between the old markers (including the markers).

 sed -i '/# ##### CM-POWER-SAVE-ENABLE #####/,/# ##### CM-POWER-SAVE-ENABLE #####/d' /cm/shared/apps/slurm/var/etc/slurm.conf



 # Check the diff against the backup file, to make sure only the duplicate power save definitions were removed

 diff /cm/shared/apps/slurm/var/etc/slurm.conf{,.bak}


 # Re-read slurm config

 scontrol reconfigure

 

Update Slurm prologs

 # In slurm.conf replace 
PrologSlurmctld=/cm/local/apps/cmd/scripts/prolog
to
PrologSlurmctld=/cm/local/apps/cmd/scripts/prolog-healthchecker
Prolog=/cm/local/apps/cmd/scripts/prolog
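
A possible way to make this replacement with sed (a sketch; back up slurm.conf first, and note that the shared path used earlier in this article is assumed):

sed -i 's#^PrologSlurmctld=/cm/local/apps/cmd/scripts/prolog$#PrologSlurmctld=/cm/local/apps/cmd/scripts/prolog-healthchecker\nProlog=/cm/local/apps/cmd/scripts/prolog#' /cm/shared/apps/slurm/var/etc/slurm.conf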

Unmount ISO (if it was used as the repo) and reboot

 # On the active head node:

 umount /mnt1

 umount $IMAGE/mnt1

 reboot


 # On the passive head node:

 umount /mnt1

 reboot 

Boot cloud director, cloud nodes and regular nodes

 # Boot cloud director(s)

 cmsh -> device power on -n <cloud-director-hostname>


 # Boot cloud nodes

 cmsh -> device power on -n cnode001..cnode1000


 # Boot regular nodes

 cmsh -> device power on -n node001..node1000
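
 # Once powered on, the nodes' state can be watched from cmsh until they reach UP
 # (an optional check, not part of the original procedure):

 cmsh -c "device; status"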