I'm trying to build an NVMe-oF target offload environment based on the BlueField-2.
I installed the MLNX_OFED driver MLNX_OFED_LINUX-23.10-2.1.3.1-rhel7.9-x86_64. The OS is CentOS 7.2009 and the kernel version is 6.0.10.
I followed the steps in this article:
https://enterprise-support.nvidia.com/s/article/howto-configure-nvme-over-fabrics--nvme-of--target-offload
When I connect to the target, I get this error in dmesg on the target server:
received IB Backend ctrl event: XRQ NVMF backend ctrl timeout error (22)
I also can't read or write the block device /dev/nvme2n1 with fio.
fio prints the following messages:
[root@k8s-node3 randread]# fio 4k-rand-r-qd128-6numjobs.config
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.7
Starting 6 processes
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=115801505792, buflen=4096
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=1421357064192, buflen=4096
fio: pid=5779, err=95/file:io_u.c:1747, func=io_u error, error=Operation not supported
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=167814012928, buflen=4096
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=1427328159744, buflen=4096
fio: pid=5783, err=95/file:io_u.c:1747, func=io_u error, error=Operation not supported
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=1465976410112, buflen=4096
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=1338877808640, buflen=4096
fio: pid=5784, err=95/file:io_u.c:1747, func=io_u error, error=Operation not supported
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=1252726947840, buflen=4096
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=1409161920512, buflen=4096
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=489187487744, buflen=4096
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=348755009536, buflen=4096
fio: pid=5782, err=95/file:io_u.c:1747, func=io_u error, error=Operation not supported
fio: pid=5780, err=95/file:io_u.c:1747, func=io_u error, error=Operation not supported
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=188397432832, buflen=4096
fio: io_u error on file /dev/nvme2n1: Operation not supported: read offset=1572508553216, buflen=4096
fio: pid=5781, err=95/file:io_u.c:1747, func=io_u error, error=Operation not supported
test: (groupid=0, jobs=6): err=95 (file:io_u.c:1747, func=io_u error, error=Operation not supported): pid=5779: Mon Apr 8 10:32:29 2024
cpu : usr=91.18%, sys=5.88%, ctx=27, majf=0, minf=855
IO depths : 1=0.8%, 2=1.6%, 4=3.1%, 8=6.2%, 16=12.5%, 32=25.0%, >=64=50.8%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=768,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128
Run status group 0 (all jobs):
Disk stats (read/write):
nvme2n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
On the target side, I ran the following steps:
modprobe -r nvme
modprobe nvme num_p2p_queues=24
# start
mkdir /sys/kernel/config/nvmet/subsystems/testsubsystem
echo 1 > /sys/kernel/config/nvmet/subsystems/testsubsystem/attr_allow_any_host
echo 1 > /sys/kernel/config/nvmet/subsystems/testsubsystem/attr_offload
mkdir /sys/kernel/config/nvmet/subsystems/testsubsystem/namespaces/1
echo -n /dev/nvme0n1 > /sys/kernel/config/nvmet/subsystems/testsubsystem/namespaces/1/device_path
echo 1 > /sys/kernel/config/nvmet/subsystems/testsubsystem/namespaces/1/enable
# create nvmet port
mkdir /sys/kernel/config/nvmet/ports/1
echo 4420 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
echo 192.168.1.101 > /sys/kernel/config/nvmet/ports/1/addr_traddr
echo "rdma" > /sys/kernel/config/nvmet/ports/1/addr_trtype
echo "ipv4" > /sys/kernel/config/nvmet/ports/1/addr_adrfam
# enable port
ln -s /sys/kernel/config/nvmet/subsystems/testsubsystem/ /sys/kernel/config/nvmet/ports/1/subsystems/testsubsystem
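Before connecting from the host, it can help to read the values back from configfs. The port address written above is 192.168.1.101, while the connect command in this post uses 192.168.1.100, so a read-back is a cheap way to rule out a typo. The sketch below is only an illustration: `show_nvmet_config` is a hypothetical helper name, and it takes the configfs root as a parameter so it can also be pointed at a copy of the tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of MLNX_OFED or nvmetcli): prints the nvmet
# attributes that were written in the steps above, one "path = value" per line.
#   $1 = nvmet configfs root, $2 = subsystem name, $3 = port id
show_nvmet_config() {
    root=$1; subsys=$2; port=$3
    for f in \
        "subsystems/$subsys/attr_allow_any_host" \
        "subsystems/$subsys/attr_offload" \
        "subsystems/$subsys/namespaces/1/device_path" \
        "subsystems/$subsys/namespaces/1/enable" \
        "ports/$port/addr_traddr" \
        "ports/$port/addr_trsvcid" \
        "ports/$port/addr_trtype" \
        "ports/$port/addr_adrfam"
    do
        # Missing or unreadable attributes are reported instead of aborting.
        printf '%s = %s\n' "$f" "$(cat "$root/$f" 2>/dev/null || echo '<unreadable>')"
    done
}

# Usage on the target, as root:
#   show_nvmet_config /sys/kernel/config/nvmet testsubsystem 1
```

The attr_offload attribute exists only with the MLNX_OFED nvmet modules, so on a stock kernel that line would show as unreadable.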
On the host side, I ran these steps:
modprobe nvme
modprobe nvme-rdma
nvme connect -t rdma -n testsubsystem -a 192.168.1.100 -s 4420
After that I can see the block device /dev/nvme2n1 on the host side. But after several seconds, dmesg on the target side prints the following messages:
[ 743.528042] nvmet: creating nvm controller 1 for subsystem testsubsystem for NQN nqn.2014-08.org.nvmexpress:uuid:a067be64-6f53-4b5f-a780-4a088f99ec84.
[ 751.154941] nvmet_rdma: using dynamic staging buffer 0000000072e3e2f8
[ 751.219387] nvmet: Adding offload ctx 0 to configfs
[ 769.852202] nvme 0000:b2:00.0: received IB Backend ctrl event: XRQ NVMF backend ctrl timeout error (22) be_ctrl 000000002f8cb549 id 0
[ 769.852240] nvmet: Removing offload ctx 0 from configfs
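To watch for this sequence while reproducing the problem, the kernel log can be narrowed to the offload-related lines. This is a minimal sketch: `filter_offload_events` is a hypothetical name, and the patterns are simply taken from the messages quoted above, so other driver versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: keeps only kernel log lines about NVMe-oF offload
# contexts, nvmet_rdma, and IB backend controller events. Reads stdin, so it
# works on live dmesg output or on a saved log file.
filter_offload_events() {
    grep -E 'offload ctx|IB Backend ctrl event|nvmet_rdma'
}

# Usage on the target, as root:
#   dmesg | filter_offload_events
```

In the log above, the gap between "Adding offload ctx" (751.2) and the timeout event (769.9) is about 18 seconds, which matches the "after several seconds" observation.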
I tried updating the BF2 firmware and resetting the BF2 configuration, but nothing changed.
The configuration of BF2 is as follows:
[root@k8s-node2 jason]# mlxconfig -d 86:00.0 q
Device #1:
----------
Device type: BlueField2
Name: MBF2H516A-CEEO_Ax
Description: BlueField-2 SmartNIC 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto; 16GB on-board DDR; 1GbE OOB management; FHHL
Device: 86:00.0
Configurations: Next Boot
MEMIC_BAR_SIZE 0
MEMIC_SIZE_LIMIT _256KB(1)
HOST_CHAINING_MODE DISABLED(0)
HOST_CHAINING_CACHE_DISABLE False(0)
HOST_CHAINING_DESCRIPTORS Array[0..7]
HOST_CHAINING_TOTAL_BUFFER_SIZE Array[0..7]
INTERNAL_CPU_MODEL SEPARATED_HOST(0)
FLEX_PARSER_PROFILE_ENABLE 0
PROG_PARSE_GRAPH False(0)
FLEX_IPV4_OVER_VXLAN_PORT 0
ROCE_NEXT_PROTOCOL 254
ESWITCH_HAIRPIN_DESCRIPTORS Array[0..7]
ESWITCH_HAIRPIN_TOT_BUFFER_SIZE Array[0..7]
PF_BAR2_SIZE 3
DPU_RESET_NOTIFICATION_ENABLED ENABLED(1)
INTERNAL_CPU_RSHIM ENABLED(0)
PF_NUM_OF_VF_VALID False(0)
NON_PREFETCHABLE_PF_BAR False(0)
VF_VPD_ENABLE False(0)
PF_NUM_PF_MSIX_VALID False(0)
PER_PF_NUM_SF False(0)
STRICT_VF_MSIX_NUM False(0)
VF_NODNIC_ENABLE False(0)
NUM_PF_MSIX_VALID True(1)
NUM_OF_VFS 16
NUM_OF_PF 2
PF_BAR2_ENABLE True(1)
HIDE_PORT2_PF False(0)
SRIOV_EN True(1)
PF_LOG_BAR_SIZE 5
VF_LOG_BAR_SIZE 1
NUM_PF_MSIX 63
NUM_VF_MSIX 11
INT_LOG_MAX_PAYLOAD_SIZE AUTOMATIC(0)
PCIE_CREDIT_TOKEN_TIMEOUT 0
RT_PPS_ENABLED_ON_POWERUP False(0)
LAG_RESOURCE_ALLOCATION DEVICE_DEFAULT(0)
PHY_COUNT_LINK_UP_DELAY DELAY_NONE(0)
ACCURATE_TX_SCHEDULER False(0)
PARTIAL_RESET_EN False(0)
RESET_WITH_HOST_ON_ERRORS False(0)
NVME_EMULATION_ENABLE False(0)
NVME_EMULATION_NUM_VF 0
NVME_EMULATION_NUM_PF 1
NVME_EMULATION_VENDOR_ID 5555
NVME_EMULATION_DEVICE_ID 24577
NVME_EMULATION_CLASS_CODE 67586
NVME_EMULATION_REVISION_ID 0
NVME_EMULATION_SUBSYSTEM_VENDOR_ID 0
NVME_EMULATION_SUBSYSTEM_ID 0
NVME_EMULATION_NUM_MSIX 0
NVME_EMULATION_MAX_QUEUE_DEPTH 0
PCI_SWITCH_EMULATION_NUM_PORT 0
VIRTIO_EMULATION_HOTPLUG_TRANS False(0)
PCI_SWITCH_EMULATION_ENABLE False(0)
VIRTIO_NET_EMULATION_VF_PCI_LAYOUT VIRTIO_1_X(0)
VIRTIO_NET_EMULATION_PF_PCI_LAYOUT VIRTIO_1_X(0)
VIRTIO_NET_EMULATION_ENABLE False(0)
VIRTIO_NET_EMULATION_NUM_VF 0
VIRTIO_NET_EMULATION_NUM_PF 0
VIRTIO_NET_EMU_SUBSYSTEM_VENDOR_ID 6900
VIRTIO_NET_EMULATION_SUBSYSTEM_ID 4161
VIRTIO_NET_EMULATION_NUM_MSIX 2
VIRTIO_BLK_EMULATION_VF_PCI_LAYOUT VIRTIO_1_X(0)
VIRTIO_BLK_EMULATION_PF_PCI_LAYOUT VIRTIO_1_X(0)
VIRTIO_BLK_EMULATION_ENABLE False(0)
VIRTIO_BLK_EMULATION_NUM_VF 0
VIRTIO_BLK_EMULATION_NUM_PF 0
VIRTIO_BLK_EMU_SUBSYSTEM_VENDOR_ID 6900
VIRTIO_BLK_EMULATION_SUBSYSTEM_ID 4162
VIRTIO_BLK_EMULATION_NUM_MSIX 2
PCI_DOWNSTREAM_PORT_OWNER Array[0..15]
RO HOST_PRIV_RSHIM DEVICE_DEFAULT(0)
CQE_COMPRESSION BALANCED(0)
IP_OVER_VXLAN_EN False(0)
MKEY_BY_NAME False(0)
PRIO_TAG_REQUIRED_EN False(0)
UCTX_EN True(1)
REAL_TIME_CLOCK_ENABLE False(0)
RDMA_SELECTIVE_REPEAT_EN False(0)
PCI_ATOMIC_MODE PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
TUNNEL_ECN_COPY_DISABLE False(0)
LRO_LOG_TIMEOUT0 6
LRO_LOG_TIMEOUT1 7
LRO_LOG_TIMEOUT2 8
LRO_LOG_TIMEOUT3 13
LOG_TX_PSN_WINDOW 7
VF_MIGRATION_MODE DEVICE_DEFAULT(0)
LOG_MAX_OUTSTANDING_WQE 7
ROCE_ADAPTIVE_ROUTING_EN False(0)
TUNNEL_IP_PROTO_ENTROPY_DISABLE False(0)
USER_PROGRAMMABLE_CC False(0)
PCC_INT_NP_RTT_DSCP 26
PCC_INT_NP_RTT_DSCP_EN False(0)
PCC_INT_NP_RTT_DATA_MODE RTT_V0(64)
PCC_INT_EN False(0)
PCC_INT_SYSTEM_RTT 0
STEERING_CACHE_REFRESH 0
MULTI_PCI_RESOURCE_SHARING DEVICE_DEFAULT(0)
ICM_CACHE_MODE DEVICE_DEFAULT(0)
HAIRPIN_DATA_BUFFER_LOCK False(0)
TLS_OPTIMIZE False(0)
TX_SCHEDULER_BURST 0
ZERO_TOUCH_TUNING_ENABLE False(0)
ROCE_CC_LEGACY_DCQCN False(0)
LOG_MAX_QUEUE 17
UPT_EMULATION_ENABLE False(0)
LARGE_MTU_TWEAK_64 False(0)
AES_XTS_TWEAK_INC_64 False(0)
CRYPTO_POLICY UNRESTRICTED(1)
RDE_DISABLE False(0)
PLDM_FW_UPDATE_DISABLE False(0)
RBT_DISABLE False(0)
PCIE_SMBUS_DISABLE False(0)
PCIE_IN_BAND_VDM_DISABLE False(0)
RO HOST_PRIV_NV_HOST DEVICE_DEFAULT(0)
RO HOST_PRIV_NV_PORT DEVICE_DEFAULT(0)
RO HOST_PRIV_NV_GLOBAL DEVICE_DEFAULT(0)
RO HOST_PRIV_NV_INTERNAL_CPU DEVICE_DEFAULT(0)
RO HOST_PRIV_FW_UPDATE DEVICE_DEFAULT(0)
RO HOST_PRIV_NIC_RESET DEVICE_DEFAULT(0)
LOG_DCR_HASH_TABLE_SIZE 11
MAX_PACKET_LIFETIME 0
DCR_LIFO_SIZE 16384
ROCE_CC_PRIO_MASK_P1 255
ROCE_CC_PRIO_MASK_P2 255
CLAMP_TGT_RATE_AFTER_TIME_INC_P1 True(1)
CLAMP_TGT_RATE_P1 False(0)
RPG_TIME_RESET_P1 300
RPG_BYTE_RESET_P1 32767
RPG_THRESHOLD_P1 1
RPG_MAX_RATE_P1 0
RPG_AI_RATE_P1 5
RPG_HAI_RATE_P1 50
RPG_GD_P1 11
RPG_MIN_DEC_FAC_P1 50
RPG_MIN_RATE_P1 1
RATE_TO_SET_ON_FIRST_CNP_P1 0
DCE_TCP_G_P1 1019
DCE_TCP_RTT_P1 1
RATE_REDUCE_MONITOR_PERIOD_P1 4
INITIAL_ALPHA_VALUE_P1 1023
MIN_TIME_BETWEEN_CNPS_P1 4
CNP_802P_PRIO_P1 6
CNP_DSCP_P1 48
CLAMP_TGT_RATE_AFTER_TIME_INC_P2 True(1)
CLAMP_TGT_RATE_P2 False(0)
RPG_TIME_RESET_P2 300
RPG_BYTE_RESET_P2 32767
RPG_THRESHOLD_P2 1
RPG_MAX_RATE_P2 0
RPG_AI_RATE_P2 5
RPG_HAI_RATE_P2 50
RPG_GD_P2 11
RPG_MIN_DEC_FAC_P2 50
RPG_MIN_RATE_P2 1
RATE_TO_SET_ON_FIRST_CNP_P2 0
DCE_TCP_G_P2 1019
DCE_TCP_RTT_P2 1
RATE_REDUCE_MONITOR_PERIOD_P2 4
INITIAL_ALPHA_VALUE_P2 1023
MIN_TIME_BETWEEN_CNPS_P2 4
CNP_802P_PRIO_P2 6
CNP_DSCP_P2 48
LLDP_NB_DCBX_P1 False(0)
LLDP_NB_RX_MODE_P1 OFF(0)
LLDP_NB_TX_MODE_P1 OFF(0)
LLDP_NB_DCBX_P2 False(0)
LLDP_NB_RX_MODE_P2 OFF(0)
LLDP_NB_TX_MODE_P2 OFF(0)
ROCE_RTT_RESP_DSCP_P1 0
ROCE_RTT_RESP_DSCP_MODE_P1 DEVICE_DEFAULT(0)
ROCE_RTT_RESP_DSCP_P2 0
ROCE_RTT_RESP_DSCP_MODE_P2 DEVICE_DEFAULT(0)
DCBX_IEEE_P1 True(1)
DCBX_CEE_P1 True(1)
DCBX_WILLING_P1 True(1)
DCBX_IEEE_P2 True(1)
DCBX_CEE_P2 True(1)
DCBX_WILLING_P2 True(1)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 False(0)
KEEP_LINK_UP_ON_BOOT_P1 False(0)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
DO_NOT_CLEAR_PORT_STATS_P1 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P1 False(0)
KEEP_ETH_LINK_UP_P2 True(1)
KEEP_IB_LINK_UP_P2 False(0)
KEEP_LINK_UP_ON_BOOT_P2 False(0)
KEEP_LINK_UP_ON_STANDBY_P2 False(0)
DO_NOT_CLEAR_PORT_STATS_P2 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P2 False(0)
NUM_OF_VL_P1 _4_VLs(3)
NUM_OF_TC_P1 _8_TCs(0)
NUM_OF_PFC_P1 8
VL15_BUFFER_SIZE_P1 0
QOS_TRUST_STATE_P1 TRUST_PCP(1)
ETS_SCHED_MODE_P1 device_default(0)
NUM_OF_VL_P2 _4_VLs(3)
NUM_OF_TC_P2 _8_TCs(0)
NUM_OF_PFC_P2 8
VL15_BUFFER_SIZE_P2 0
QOS_TRUST_STATE_P2 TRUST_PCP(1)
ETS_SCHED_MODE_P2 device_default(0)
DUP_MAC_ACTION_P1 LAST_CFG(0)
MPFS_MC_LOOPBACK_DISABLE_P1 False(0)
MPFS_UC_LOOPBACK_DISABLE_P1 False(0)
UNKNOWN_UPLINK_MAC_FLOOD_P1 False(0)
SRIOV_IB_ROUTING_MODE_P1 LID(1)
IB_ROUTING_MODE_P1 LID(1)
DUP_MAC_ACTION_P2 LAST_CFG(0)
MPFS_MC_LOOPBACK_DISABLE_P2 False(0)
MPFS_UC_LOOPBACK_DISABLE_P2 False(0)
UNKNOWN_UPLINK_MAC_FLOOD_P2 False(0)
SRIOV_IB_ROUTING_MODE_P2 LID(1)
IB_ROUTING_MODE_P2 LID(1)
PHY_AUTO_NEG_P1 DEVICE_DEFAULT(0)
PHY_RATE_MASK_OVERRIDE_P1 False(0)
PHY_FEC_OVERRIDE_P1 DEVICE_DEFAULT(0)
PHY_AUTO_NEG_P2 DEVICE_DEFAULT(0)
PHY_RATE_MASK_OVERRIDE_P2 False(0)
PHY_FEC_OVERRIDE_P2 DEVICE_DEFAULT(0)
PF_TOTAL_SF 0
PF_DEVICE_ID_ENABLE False(0)
PF_SF_BAR_SIZE 0
PF_NUM_PF_MSIX 63
PF_DEVICE_ID 41686
SILENT_MODE False(0)
MKEY_BY_NAME_RANGE DEVICE_DEFAULT(0)
ROCE_CONTROL ROCE_ENABLE(2)
PCI_WR_ORDERING per_mkey(0)
MULTI_PORT_VHCA_EN False(0)
PORT_OWNER True(1)
ALLOW_RD_COUNTERS True(1)
RENEG_ON_CHANGE True(1)
TRACER_ENABLE True(1)
IP_VER IPv4(0)
BOOT_UNDI_NETWORK_WAIT 0
UEFI_HII_EN True(1)
BOOT_DBG_LOG False(0)
UEFI_LOGS DISABLED(0)
BOOT_VLAN 1
LEGACY_BOOT_PROTOCOL PXE(1)
BOOT_INTERRUPT_DIS False(0)
BOOT_LACP_DIS True(1)
BOOT_VLAN_EN False(0)
BOOT_PKEY 0
P2P_ORDERING_MODE DEVICE_DEFAULT(0)
EXP_ROM_VIRTIO_NET_PXE_ENABLE True(1)
EXP_ROM_VIRTIO_NET_UEFI_ARM_ENABLE True(1)
EXP_ROM_VIRTIO_NET_UEFI_x86_ENABLE True(1)
EXP_ROM_VIRTIO_BLK_UEFI_ARM_ENABLE True(1)
EXP_ROM_VIRTIO_BLK_UEFI_x86_ENABLE True(1)
EXP_ROM_NVME_UEFI_x86_ENABLE True(1)
ATS_ENABLED False(0)
DYNAMIC_VF_MSIX_TABLE False(0)
EXP_ROM_UEFI_ARM_ENABLE True(1)
EXP_ROM_UEFI_x86_ENABLE True(1)
EXP_ROM_PXE_ENABLE True(1)
ADVANCED_PCI_SETTINGS False(0)
SAFE_MODE_THRESHOLD 10
SAFE_MODE_ENABLE True(1)
The 'RO' shows parameters which are for read only and cannot be changed
Does anyone know how to solve this problem? Thanks!
I also tried another version of the MLNX_OFED driver (5.5) and repeated the whole process on another machine, but the problem still exists.
2 Answers
I solved this problem by turning off the IOMMU on the server.
vim /etc/default/grub
GRUB_CMDLINE_LINUX="crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet intel_iommu=off"
Then run grub2-mkconfig -o /boot/grub2/grub.cfg and reboot.
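After the reboot, it is worth confirming that the running kernel actually picked up the parameter, since a wrong grub.cfg path (for example on a UEFI system) would leave the old command line in place. A small sketch, with `cmdline_has_iommu_off` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the kernel command line
# passed as $1 contains intel_iommu=off as a whole, space-separated word.
cmdline_has_iommu_off() {
    case " $1 " in
        *" intel_iommu=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after reboot:
#   cmdline_has_iommu_off "$(cat /proc/cmdline)" && echo "intel_iommu=off is active"
```

This only checks the boot parameter itself; it does not say anything about IOMMU settings in the BIOS.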
What does intel_iommu=off mean, and why add this parameter?