How do I upgrade from Bright 6.0/6.1/7.0/7.1/7.2/7.3/8.0 to Bright 8.1?
The procedure below can be used to upgrade a Bright 6.0, 6.1, 7.0, 7.1, 7.2, 7.3, or 8.0 installation, including one with failover configured, to Bright 8.1.
Supported Linux distributions
An upgrade to Bright 8.1 is supported for Bright 6.0, 6.1, 7.0, 7.1, 7.2, 7.3 and 8.0 clusters that are running the following Linux distributions:
- Red Hat Enterprise Linux 6.8 (RHEL6u8)
- Red Hat Enterprise Linux 6.9 (RHEL6u9)
- Red Hat Enterprise Linux 7.3 (RHEL7u3)
- Red Hat Enterprise Linux 7.4 (RHEL7u4)
- Red Hat Enterprise Linux 7.5 (RHEL7u5)
- CentOS Linux 6.8 (CENTOS6u8)
- CentOS Linux 6.9 (CENTOS6u9)
- CentOS Linux 7.3 (CENTOS7u3)
- CentOS Linux 7.4 (CENTOS7u4)
- CentOS Linux 7.5 (CENTOS7u5)
- Scientific Linux 6.8 (SL6u8)
- Scientific Linux 6.9 (SL6u9)
- Scientific Linux 7.3 (SL7u3)
- Scientific Linux 7.4 (SL7u4)
- Scientific Linux 7.5 (SL7u5)
- SUSE Linux Enterprise Server 12 Service Pack 3 (SLES12sp3)
- SUSE Linux Enterprise Server 12 Service Pack 4 (SLES12sp4)
- Extra base distribution RPMs will be installed by yum/zypper to resolve dependencies that arise as a result of the upgrade, so the base distribution repositories must be reachable. This means that clusters running the Enterprise Linux distributions (RHEL and SLES) must be subscribed to the appropriate software channels.
- Packages in /cm/shared are upgraded, but the administrator should be aware of the following:
- If /cm/shared is installed in the local partition, then the packages in it are upgraded. This may not be desirable for users who wish to retain the old behavior.
- If /cm/shared is mounted from a separate partition, then unmounting it will prevent upgrades to the mounted partition, but will allow new packages to be installed in /cm/shared within the local partition. This may be desirable for the administrator, who can later copy over updates from the local /cm/shared to the remote /cm/shared manually according to site-specific requirements.
Since a mounted /cm/shared is unmounted by default during the upgrade, a local /cm/shared has the files of any packages installed there upgraded. According to the yum database the system is then upgraded, even though the upgraded files sit in the local partition rather than in the mounted one. The newer packages can therefore only be expected to work properly once their files have been copied over from the local partition to the remote partition.
- If /cm/shared will be unmounted during the upgrade (i.e. if an in-place upgrade is not being performed), then make sure that the contents of the local /cm/shared are in sync with the remote copy.
- Hadoop deployments must be removed (using cm-hadoop-setup) before proceeding with the upgrade. Please contact Bright Support (http://www.brightcomputing.com/support2) for further assistance.
- Bright OpenStack deployments must be removed (using cm-openstack-setup). All older Bright OpenStack packages and their dependencies must be removed before starting the upgrade. Please contact Bright Support for further assistance.
- Kubernetes deployments from older Bright versions must be removed before the upgrade and redeployed afterwards.
- Cluster extension to the cloud configurations must be removed before the upgrade and redeployed afterwards.
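If /cm/shared will be unmounted during the upgrade, the sync between the local and remote copies mentioned above can be checked and completed with a sketch like the following. Both paths, and the helper name, are placeholders for illustration; adjust them to the site layout.

```shell
# Hedged sketch: verify that a local copy of /cm/shared matches the remote copy,
# and copy the differences over if it does not. Both paths are placeholders.
sync_shared() {
    local src=$1 dst=$2
    if ! diff -rq "$src" "$dst" >/dev/null 2>&1; then
        # copy the contents over; rsync -a would also work and is common practice
        cp -a "$src"/. "$dst"/
    fi
}
# Example (placeholder paths):
#   sync_shared /local/cm/shared /remote/cm/shared
```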
Important note about package upgrades
The upgrade process upgrades not only CMDaemon and its dependencies, but also other packages. Old packages from Bright 8.0 or earlier will not be available from the repositories of the latest Bright version (in this case, the 8.1 repositories). In some cases this requires recompiling user applications against the upgraded versions of the compilers and libraries. The configuration of an old package is also not copied over to the new package automatically, so the administrator must manually adapt the old configuration to suit the new package.
Important note about monitoring data and configuration
The monitoring backend has changed considerably starting from Bright 8.0, so it is not possible to migrate the older monitoring configuration and data to the new monitoring system. As a result, the following are to be expected after upgrading a cluster running Bright 7.3 or earlier; upgrades from 8.0 to 8.1 are not affected.
Monitoring data: All monitoring data from before the upgrade are lost after the upgrade to Bright 8.1.
Monitoring configuration: The monitoring configuration is reset to a default Bright 8.1 configuration. That is, similar to what is configured on a freshly-installed Bright 8.1 cluster. This means that all old custom monitoring configurations are lost.
Important note about GPU integration
Starting from Bright 8.0, Nvidia DCGM is used for managing and monitoring Nvidia GPUs. Only Tesla GPUs (K80 and later) are supported by DCGM. After upgrading to Bright 8.1, it is still possible to use Bright to obtain metrics from older GPUs by following the article "How to collect metrics from older GPUs using NVML"; however, configuring these GPUs from Bright is no longer possible.
Enable the upgrade repo and install the upgrade RPM
Install the Bright Cluster Manager upgrade RPM on the Bright head node(s) as shown below:
1. Add and enable the upgrade repo
Create a repo file with the following contents:
name=Bright 8.1 Upgrade Repository
Note: Please replace <DIST> with one of: rhel/7, rhel/6, sles/12
On RHEL-based distributions, save the repo file to /etc/yum.repos.d/
On SLES-based distributions, save the repo file to /etc/zypp/repos.d/
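As a sketch, the repo file can be created as below. The [cm-upgrade-8.1] section name and the enabled field are illustrative assumptions; the baseurl placeholder must be replaced with the actual Bright upgrade repository URL, with <DIST> substituted as noted above. The file is written to the current directory here and should then be copied to the repo directory for the distribution.

```shell
# Hypothetical sketch: create the upgrade repo file. Replace the baseurl
# placeholder with the real repository URL before copying the file into place.
cat > cm-upgrade-8.1.repo <<'EOF'
[cm-upgrade-8.1]
name=Bright 8.1 Upgrade Repository
baseurl=<upgrade-repository-url>/<DIST>
enabled=1
EOF
# Then, e.g. on RHEL-based distributions:
#   cp cm-upgrade-8.1.repo /etc/yum.repos.d/
```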
2. Install RPM
On RHEL-based distributions: yum install cm-upgrade-8.1
On SLES-based distributions: zypper install cm-upgrade-8.1
3. Make the cm-upgrade command available in the default PATH
module load cm-upgrade/8.1
The recommended order for upgrade is:
- Power off the regular nodes.
- Terminate the cloud nodes and the cloud directors
- Apply existing updates to Bright 6.0/6.1/7.0/7.1/7.2/7.3/8.0 on the head node and in the software images.
- Update the head node as follows:
For RHEL derivatives:
yum update
For SLES derivatives:
zypper up
- Update software images. For each software image run the following:
For RHEL derivatives:
yum --installroot /cm/images/<software image> update
For SLES derivatives:
zypper --root /cm/images/<software image> up
Note: If the software image repositories differ from the repositories used by the head node, then chroot into the software image before running "yum update" or "zypper up". This is because the --installroot and --root switches do not make yum/zypper use the repositories defined inside the software image.
- Upgrade the head node(s) to Bright 8.1:
Important: this must be run on both head nodes in a high-availability setup
Recommended: upgrade the active head node first, and then the passive head node
- Run post upgrade actions (must be run only on the active head node):
cm-upgrade -f -p
- In an HA setup, after upgrading both head nodes, resync the databases. Run the following from the active head node (it is very important to complete this step before moving on to the next one):
cmha dbreclone <secondary>
- Upgrade the software image(s) to Bright 8.1
cm-upgrade -i all
Important: this must be run only on the active head node. If the software images are not under the standard location (/cm/images/ on the head node), then the "-a" option should be used: cm-upgrade -a /apps/images -i <name of software image>
- Power on the regular nodes, cloud nodes and cloud directors
Usage and help
For more detailed information on usage, examples, and a full description, see the help output of the cm-upgrade command.
Upgrading using a Bright DVD/ISO
When using a Bright DVD/ISO to perform the upgrade, it is important to use a DVD/ISO that is not older than 8.1-5. The DVD/ISO version can be found (assuming that the DVD/ISO is mounted under /mnt/cdrom) with a find command such as:
# find /mnt/cdrom -type d -name '8.1-*'
FAQs and Troubleshooting
Q: Why are my SGE or Torque jobs not running after upgrading to Bright 8.1?
A: This is usually caused by an obsolete, broken prolog symlink.
Solution: Remove the broken symlink on the nodes and re-submit the jobs.
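A way to locate such broken symlinks is sketched below. The path to scan is an assumption; point it at the prolog/epilog location used on the affected nodes.

```shell
# Hedged sketch: list broken (dangling) symlinks under a directory tree.
# GNU find's -xtype l matches symlinks whose target does not exist.
find_broken_links() {
    find "$1" -xtype l 2>/dev/null
}
# Example on a node (the path is an assumption; adjust to the site):
#   find_broken_links /cm/local/apps
```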
Q: Why did cm-upgrade fail at the stage 'Installing distribution packages' or 'Upgrading packages to Bright 8.1'?
A: This happens when some distribution package dependencies could not be met. Please look in /var/log/cm-upgrade.log for detailed information about which packages are missing.
Solution: Enable the required additional base distribution repositories and re-run cm-upgrade with the -f option.
Example: cm-upgrade -f
Q: After upgrading from Bright 6.0 to Bright 8.1, why is the MySQL healthcheck failing because the cmdaemon monitoring database engine is not MyISAM?
A: Bright versions before 6.1 use InnoDB as the MySQL engine; starting with Bright 6.1, MyISAM is the default monitoring database engine.
Solution: Change the engine type for the cmdaemon_mon database to MyISAM.
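The engine change can be sketched as below. The helper only generates the ALTER TABLE statements; the commented pipeline shows how they would be fed to MySQL on the head node. Connection options such as user and password are omitted.

```shell
# Hedged sketch: turn a list of table names (one per line) into ALTER TABLE
# statements that switch each table's engine to MyISAM.
to_myisam_sql() {
    awk '{printf "ALTER TABLE `%s` ENGINE=MyISAM;\n", $1}'
}
# On the head node (connection options omitted):
#   mysql -N -B -e "SHOW TABLES" cmdaemon_mon | to_myisam_sql | mysql cmdaemon_mon
```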
Q: Why are LDAP users sometimes not accessible on SLES compute nodes after upgrading to Bright 8.1?
A: This is most likely because the 'sssd' service failed to start. This can happen when /var/lib/sssd is in the exclude lists of the node or category.
Solution: Remove /var/lib/sssd from the exclude lists and then reboot the nodes.
Q: Why is the mvapich package not upgraded to the Bright 8.1 version?
A: Support for the mvapich package has been dropped starting with Bright 8.0. The package is not obsoleted or removed automatically, because there might be user applications that still use it.
Solution: If there are no user applications that use mvapich, then the package must be manually removed by the administrator.
Q: Why are Slurm upgrades disabled in certain cases (e.g. when the Slurm version is < 17.x)?
A: Upgrading Slurm versions < 17.x causes loss of the statesave information, which is needed when there are jobs pending during the upgrade. Slurm upgrades must therefore be done separately, after the main upgrade has completed. After upgrading the Slurm packages, the statesave directory /cm/shared/apps/slurm/var/cm/statesave must be removed and the Slurm services restarted.
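The post-upgrade Slurm steps described above can be sketched as follows; the service names assume a systemd-based Slurm installation.

```shell
# Hedged sketch: remove the old statesave directory after the Slurm packages
# have been upgraded separately, then restart the Slurm services.
reset_slurm_statesave() {
    rm -rf "$1"
}
# On the head node, after the separate Slurm upgrade:
#   reset_slurm_statesave /cm/shared/apps/slurm/var/cm/statesave
#   systemctl restart slurmctld      # head node(s)
#   systemctl restart slurmd         # compute nodes
```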
Q: What versions of LSF are supported in Bright 8.1?
A: Only LSF versions >= 10.x are supported. The LSF integration in Bright stops working after the upgrade if an unsupported version is installed. The LSF packages are not upgraded automatically because they are not distributed by Bright. LSF upgrades must be done separately after the main upgrade has completed.