P4d support on Amazon Linux 1
The NVIDIA Fabric Manager package is required on P4d instances to fully exploit the potential of A100 GPUs; however, this package is not available on Amazon Linux 1. Therefore, although the combination of base_os = alinux with the p4d.24xlarge instance type is allowed, CUDA initialization is likely to fail on compute nodes with this configuration, because NVLink connections are only enabled after the NVIDIA kernel driver is loaded and Fabric Manager configures them (see: https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf). Additionally, Amazon Linux 1 ended standard support and entered maintenance support on December 31, 2020, so it is no longer recommended for use with ParallelCluster.
We recommend using one of the other supported operating systems as the value of the base_os configuration parameter, such as Amazon Linux 2, which requires minimal changes in user space relative to Amazon Linux 1. The full set of supported operating systems compatible with the NVIDIA Fabric Manager is: alinux2, centos7, centos8, ubuntu1604, or ubuntu1804.
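The recommendation above can be checked mechanically before a cluster is created. The following is a minimal sketch, not part of ParallelCluster itself: it assumes an INI-style configuration file in which base_os and keys ending in instance_type appear as plain options, and the function name find_p4d_os_problems is hypothetical. The set of compatible operating systems is copied from the list above.

```python
import configparser

# OS identifiers listed above as compatible with NVIDIA Fabric Manager.
FABRIC_MANAGER_OSES = {"alinux2", "centos7", "centos8", "ubuntu1604", "ubuntu1804"}


def find_p4d_os_problems(config_text):
    """Return one warning per section whose base_os is not in the compatible
    list, but only if the file references a p4d.* instance type somewhere.

    Hypothetical helper: the assumed layout (base_os and *instance_type
    options inside INI sections) is an illustration, not a full schema.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # Does any option ending in "instance_type" name a P4d instance?
    uses_p4d = any(
        key.endswith("instance_type") and value.strip().startswith("p4d.")
        for section in parser.sections()
        for key, value in parser.items(section)
    )
    if not uses_p4d:
        return []

    warnings = []
    for section in parser.sections():
        base_os = parser.get(section, "base_os", fallback=None)
        if base_os is not None and base_os.strip() not in FABRIC_MANAGER_OSES:
            warnings.append(
                f"[{section}] base_os = {base_os.strip()} is not compatible "
                "with NVIDIA Fabric Manager on P4d instances"
            )
    return warnings


# Example configuration text (illustrative values only).
example = """
[cluster default]
base_os = alinux
compute_instance_type = p4d.24xlarge
"""

for warning in find_p4d_os_problems(example):
    print(warning)
```

Changing base_os to alinux2 in the example makes the function return an empty list. The check is deliberately conservative: it flags any incompatible base_os as soon as a P4d instance type appears anywhere in the file, rather than trying to work out which sections reference each other.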