Can I build pytorch/xla from source on Windows 11 WSL using Ubuntu?

intel_chris
May 9, 2023

I am attempting to build PyTorch/XLA on a new Windows 11 laptop (a 16" Lenovo IdeaPad 5 Pro with an AMD Ryzen CPU, to be specific) under WSL (Ubuntu 22.04), following the Linux instructions at https://github.com/pytorch/pytorch#from-source.

However, no matter what I try, I get compilation errors (mostly warnings that are promoted to errors).

If I try the steps out of the box:

conda install cmake ninja
pip install -r requirements.txt
conda install mkl mkl-include
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py develop

I get errors like the following:

In function ‘__m512i _mm512_slli_epi32(__m512i, unsigned int)’,
    inlined from ‘void fbgemm::{anonymous}::Bfloat16ToFloatKernelAvx512(const fbgemm::bfloat16*, float*)’ at /home/pytorch/third_party/fbgemm/src/FbgemmBfloat16ConvertAvx512.cc:37:38,
    inlined from ‘void fbgemm::Bfloat16ToFloat_avx512(const bfloat16*, float*, size_t)’ at /home/pytorch/third_party/fbgemm/src/FbgemmBfloat16ConvertAvx512.cc:54:32:
/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fintrin.h:1242:50: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
 1242 |   return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B,
      |                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
 1243 |                                                   (__v16si)
      |                                                   ~~~~~~~~~
 1244 |                                                   _mm512_undefined_epi32 (),
      |                                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~
 1245 |                                                   (__mmask16) -1);
      |                                                   ~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fintrin.h: In function ‘void fbgemm::Bfloat16ToFloat_avx512(const bfloat16*, float*, size_t)’:
/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fintrin.h:206:11: note: ‘__Y’ was declared here
  206 |   __m512i __Y = __Y;
      |           ^~~
cc1plus: all warnings being treated as errors

From what I have read about self-initialized variables, `__m512i __Y = __Y;` looks like an idiom meant to suppress that warning in g++, but here it is triggering it instead. I've tried numerous things (short of editing the source) to turn that warning off, none of which have had any effect.
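One way to see exactly which warnings are being promoted is to pull the `-Werror=` names out of the build log and turn them into `-Wno-` flags. This is a minimal sketch: the helper name `werror_to_wno` is hypothetical, and whether the resulting flags actually reach the fbgemm compile lines depends on how that sub-project sets its own flags, which is not verified here.

```shell
# Hypothetical helper: read a compiler log on stdin and print one
# -Wno-<name> flag for every distinct [-Werror=<name>] diagnostic found.
werror_to_wno() {
    grep -o -e '-Werror=[A-Za-z0-9_+-]*' \
        | sed 's/^-Werror=/-Wno-/' \
        | sort -u \
        | tr '\n' ' ' \
        | sed 's/ $//'
}

# Example using the diagnostic quoted above; prints: -Wno-maybe-uninitialized
printf '%s\n' "error: '__Y' may be used uninitialized [-Werror=maybe-uninitialized]" \
    | werror_to_wno
```

The output could then be tried in `CFLAGS` and `CXXFLAGS` for the next build attempt, for example `CXXFLAGS="$(werror_to_wno < build.log)"`, where `build.log` is an assumed file holding the saved compiler output.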

If I try to switch to Clang via environment variables (which works for building LLVM and MLIR):

export CC=clang
export CXX=clang++

I can’t even get the cmake configuration process to run:

    Change Dir: /home/pytorch/build/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/home/cfclark/anaconda3/bin/ninja cmTC_6fa39 && [1/2] Building CXX object CMakeFiles/cmTC_6fa39.dir/testCXXCompiler.cxx.o
    [2/2] Linking CXX executable cmTC_6fa39
    FAILED: cmTC_6fa39 
    : && /usr/bin/clang++   CMakeFiles/cmTC_6fa39.dir/testCXXCompiler.cxx.o -o cmTC_6fa39   && :
    /usr/bin/ld: cannot find -lstdc++: No such file or directory
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    ninja: build stopped: subcommand failed.
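The `cannot find -lstdc++` failure means the linker was not given a directory containing `libstdc++.so`. One possible cause, stated here as an assumption rather than a diagnosis of this machine: clang borrows the C++ runtime from a GCC installation it selects, and if that version's directory lacks the `libstdc++.so` development symlink, linking fails. A minimal sketch for listing which GCC directories have it follows; `list_gcc_libstdcxx` is a hypothetical helper and the default path is the usual Ubuntu x86_64 location.

```shell
# Hypothetical helper: for each GCC version directory under $1, report
# whether it contains libstdc++.so (the file the linker needs for -lstdc++).
list_gcc_libstdcxx() {
    root="${1:-/usr/lib/gcc/x86_64-linux-gnu}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        ver="$(basename "$dir")"
        if [ -e "$dir/libstdc++.so" ]; then
            echo "$ver: libstdc++.so present"
        else
            echo "$ver: libstdc++.so MISSING"
        fi
    done
}

list_gcc_libstdcxx
```

Running `clang++ -v` prints a `Selected GCC installation:` line showing which directory clang chose. If that version is reported as missing above, installing the matching `libstdc++-<version>-dev` package is the commonly suggested remedy, though it has not been confirmed for this setup.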

Tags: c#cmake linux pytorch windows-subsystem-for-linux

Answers

Chosen as BEST ANSWER
- intel_chris
- May 9, 2023 at 12:49 pm
- 0 votes
Following the breadcrumb tossed me by @Richard Critten:

https://github.com/pytorch/pytorch/issues/77939

This is a regression in g++ 12 (and 13).

To resolve it, I needed to install both gcc-11 and g++-11, export them as the CC and CXX environment variables, delete CMakeCache.txt, and rerun the PyTorch build script.
```
sudo apt-get install gcc-11 g++-11
export CC=gcc-11
export CXX=g++-11
rm ./build/CMakeCache.txt
python setup.py develop
```
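Before switching compilers it can help to confirm which GCC major version the build would pick up. This is a small sketch, assuming the `gcc-11`/`g++-11` binaries installed above; `gcc_major` is a hypothetical helper, and the threshold of 12 follows the regression described in this answer.

```shell
# Hypothetical helper: print the major component of a version string
# such as "12.3.0" or "11".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Only override CC/CXX when the default gcc is version 12 or newer
# and the gcc-11 toolchain is actually installed.
if command -v gcc >/dev/null 2>&1 && command -v gcc-11 >/dev/null 2>&1; then
    if [ "$(gcc_major "$(gcc -dumpversion)")" -ge 12 ]; then
        export CC=gcc-11 CXX=g++-11
    fi
fi
```

CMake caches the compiler path, so removing `./build/CMakeCache.txt` is still needed after changing `CC` and `CXX`.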
However, it is worth noting that some of the torch files relied on the libstdc++.so from the gcc 12 version. So, while this built, importing torch failed. I resolved that by copying my libstdc++.so.6.0.31 into the anaconda lib directory and relinking libstdc++.so and libstdc++.so.6 to it.
```
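# Copy the newer system libstdc++ into the conda environment.
# Adjust the source path if your distribution keeps it elsewhere
# (on Ubuntu x86_64 it is typically under /usr/lib/x86_64-linux-gnu).
cp /usr/lib/libstdc++.so.6.0.31 ~/anaconda3/lib
pushd ~/anaconda3/lib
# Repoint the conda symlinks at the copied library.
rm libstdc++.so libstdc++.so.6
ln -s libstdc++.so.6.0.31 libstdc++.so
ln -s libstdc++.so.6.0.31 libstdc++.so.6
popd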
```
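To confirm that the import failure really comes from the conda copy of libstdc++ being older than the system one, the highest `GLIBCXX_` symbol version in each library can be compared before replacing anything. This is a sketch only: `max_glibcxx` is a hypothetical helper, both library paths are assumptions (typical Ubuntu x86_64 and default Anaconda locations), and `sort -V` assumes GNU coreutils.

```shell
# Hypothetical helper: print the highest GLIBCXX_x.y.z version string
# embedded in the given shared library.
max_glibcxx() {
    grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u -V | tail -n 1
}

# Assumed locations; adjust to match your system and conda install.
for lib in /usr/lib/x86_64-linux-gnu/libstdc++.so.6 "$HOME/anaconda3/lib/libstdc++.so.6"; do
    if [ -e "$lib" ]; then
        echo "$lib: $(max_glibcxx "$lib")"
    fi
done
```

If the conda copy reports a lower version than the system copy, that supports the diagnosis that torch was linked against the newer runtime.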
I do not yet have a solution for building PyTorch with clang on WSL, which is actually my preference: LLVM and MLIR both seem to compile better with clang, and I'd prefer to use one toolchain.


- User
- May 8, 2023 at 8:33 pm
- 0 votes
This issue is covered in "Fails to compile with GCC 12.1.0" (#77939, https://github.com/pytorch/pytorch/issues/77939).

Cause: This seems to be a GCC bug that produces false-positive warnings in GCC 12.1.0 and apparently also in some versions of GCC 13. A fix has been made here.

Workarounds:
- GitHub user @Birch-San had success building with the following commands to ignore the false-positive warnings (source):
  CUDA_DIR=/usr/local/cuda-12.1 CXXFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull' CFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull' USE_ROCM=0 TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH=$CUDA_DIR/lib64 python setup.py develop
- Other users had success downgrading to GCC 11 (source1, source2).