Anonymous
Asked a question 4 months ago

My Mellanox InfiniBand kernel modules are failing to load with version 5.0 of the Mellanox OFED.

Linux kernel ABI compatibility appears to have been broken in newer kernel releases.
This appears to affect the kernels in SLES 12 SP5, RHEL/CentOS 7.7 onwards, and RHEL/CentOS 8.1 onwards. Other versions, such as SLES 15, may be affected as well.

The issue shows up as follows when attempting to start the openibd service.

# service openibd start
Module mlx4_core belong to kernel which is not a part of ML[FAILED] skipping...
Module mlx4_ib belong to kernel which is not a part of MLNX[FAILED] skipping...
Module mlx4_core belong to kernel which is not a part of ML[FAILED] skipping...
Module mlx4_en belong to kernel which is not a part of MLNX[FAILED] skipping...
Module mlx5_core belong to kernel which is not a part of ML[FAILED] skipping...
Module mlx5_ib belong to kernel which is not a part of MLNX[FAILED] skipping...
Module mlx5_fpga_tools does not exist, skipping... [FAILED]
Module ib_umad belong to kernel which is not a part of MLNX[FAILED] skipping...
Module ib_uverbs belong to kernel which is not a part of ML[FAILED] skipping...
Module ib_ipoib belong to kernel which is not a part of MLN[FAILED]skipping...
Loading HCA driver and Access Layer:                       [  OK  ]
Module rdma_cm belong to kernel which is not a part of MLNX[FAILED]skipping...
Module ib_ucm does not exist, skipping...                  [FAILED]
Module rdma_ucm belong to kernel which is not a part of MLN[FAILED]skipping...

There are two potential solutions.

Option 1:

To work around this issue, the upstream Mellanox installer provides a "--add-kernel-support" flag. Unfortunately, the Bright packaged version of the Mellanox OFED doesn't provide this functionality as it has the potential to break MPI and workload manager compatibility. 

As a workaround for the Bright packages, perform the following:

1. In /etc/openibd/infiniband.conf on line 132, change FORCE=0 to FORCE=1. This causes openibd to ignore the kernel difference, but it relies on weak-updates.

2. Edit /etc/infiniband/openib.conf and set UCM_LOAD=no and MLX5_FPGA_LOAD=no. Because most customers are not using legacy cards or FPGAs, this should not be an issue.

3. Restart the openibd service.
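
The first two steps can be sketched as a small shell function. This is only an illustration under some assumptions: the function name and its optional root-prefix argument are hypothetical, GNU sed (for `sed -i`) is assumed, and the FORCE line is matched by content rather than by line number, since line 132 may differ between package versions. The file paths are the ones named in the steps above.

```shell
#!/bin/sh
# Sketch of steps 1 and 2. apply_openibd_workaround and its root-prefix
# argument are hypothetical names used for illustration only.

apply_openibd_workaround() {
    root="${1:-}"    # empty for the running system, or a directory prefix
    force_file="$root/etc/openibd/infiniband.conf"
    conf_file="$root/etc/infiniband/openib.conf"

    # Step 1: change FORCE=0 to FORCE=1, preserving any leading whitespace.
    sed -i 's/^\([[:space:]]*\)FORCE=0/\1FORCE=1/' "$force_file" || return 1

    # Step 2: set each option to "no", appending the line if it is absent.
    for key in UCM_LOAD MLX5_FPGA_LOAD; do
        if grep -q "^${key}=" "$conf_file"; then
            sed -i "s/^${key}=.*/${key}=no/" "$conf_file" || return 1
        else
            printf '%s=no\n' "$key" >> "$conf_file" || return 1
        fi
    done
}
```

On a running node this would be called with no argument, as `apply_openibd_workaround && service openibd restart`, which also covers step 3. Back up both files before trying it.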

Once complete, the Mellanox OFED modules should load as expected.

# service openibd start
Loading HCA driver and Access Layer:                       [  OK  ]
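
To confirm that the modules really are loaded after the restart, one option is to compare `lsmod` output against a list of expected modules. The helper below is a hypothetical sketch, not part of the OFED packages. Which modules to expect depends on the hardware (for example mlx4 versus mlx5 cards), so the list in the usage line is an assumption drawn from the failure output shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: reads lsmod-style output on stdin and prints every
# expected module (given as arguments) that does not appear in it.

missing_modules() {
    # The first line of lsmod output is a header, so skip it.
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}
```

Usage: `lsmod | missing_modules mlx5_core mlx5_ib ib_umad ib_uverbs ib_ipoib rdma_cm rdma_ucm`. No output means every listed module is loaded.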


The same changes also need to be applied to your software images.
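
To see which software images still need the changes, a check along these lines may help. It is a sketch under assumptions: the function names are hypothetical, and /cm/images is assumed to be the directory holding the software images, so pass a different directory if yours live elsewhere. The file paths inside each image are the ones given in the steps above.

```shell
#!/bin/sh
# Hypothetical helpers for auditing software images.

# Succeeds only if all three settings are present under the given root.
workaround_applied() {
    root="$1"
    grep -q '^[[:space:]]*FORCE=1' "$root/etc/openibd/infiniband.conf" 2>/dev/null &&
    grep -q '^UCM_LOAD=no' "$root/etc/infiniband/openib.conf" 2>/dev/null &&
    grep -q '^MLX5_FPGA_LOAD=no' "$root/etc/infiniband/openib.conf" 2>/dev/null
}

# Prints one status line per image directory.
report_images() {
    images_dir="${1:-/cm/images}"    # assumed default location
    for img in "$images_dir"/*/; do
        [ -d "$img" ] || continue
        if workaround_applied "${img%/}"; then
            echo "${img%/}: applied"
        else
            echo "${img%/}: missing"
        fi
    done
}
```

Running `report_images` lists each image as "applied" or "missing". Nodes provisioned from an updated image pick up the changes the next time the image is synchronized to them.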

Option 2:
The alternative to the above steps is to use the upstream Mellanox installer with the --add-kernel-support flag.