tf op compile troubleshooting

tf operation compile problem

problems encountered in reproducing PlaneNet

problem descriptions

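PlaneNet defines its own custom ops, and the related source files must be compiled before the ops can be imported; for details, see the official TensorFlow guide on adding an op. When I compiled the files, several errors popped up.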

Basically, all errors are of the same form as:

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9220): error: argument of type "const void *" is incompatible with parameter of type "const float *"

My system settings are:

  • ubuntu: 16.04
  • gcc: 5.5
  • python: 2.7
  • tensorflow: 1.4
  • cuda: 9.0.176

First, I downgraded gcc to 4.8 with the following commands:

sudo apt remove gcc g++
sudo apt install gcc-4.8 g++-4.8
sudo ln -s /usr/bin/gcc-4.8 /usr/bin/gcc
sudo ln -s /usr/bin/g++-4.8 /usr/bin/g++
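
Removing the system compiler and creating symlinks by hand is fairly invasive. A gentler route, given here only as a sketch and not what was done above, is to register the versioned compilers with update-alternatives so that switching back is a single command. The helper name register_gcc_cmd and the priority value are hypothetical. The function only prints the command, so nothing changes until you run it with sudo.

```shell
# Print (do not run) the update-alternatives command that registers one
# gcc/g++ pair. Arguments: <version suffix> <priority>.
# Assumes the compilers live at /usr/bin/gcc-<ver> and /usr/bin/g++-<ver>.
register_gcc_cmd() {
  echo "update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-$1 $2" \
       "--slave /usr/bin/g++ g++ /usr/bin/g++-$1"
}

register_gcc_cmd 4.8 50
```

To apply it, run the printed command with sudo for each installed version, then pick the active one with `sudo update-alternatives --config gcc`.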

On the plus side, the compile now finished with only a few warnings. However, when I tried to import the compiled tf op, a new error came up:

tensorflow.python.framework.errors_impl.NotFoundError: /home/vradmin/Desktop/PlaneNet/nndistance/tf_nndistance_so.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv

My system settings at this point were:

  • ubuntu: 16.04
  • gcc: 4.8
  • python: 2.7
  • tensorflow: 1.4
  • cuda: 9.0.176

An undefined-symbol error like this typically results from a mismatch between the gcc version that was used to build tf and the one you use to compile the new op's .cpp file. gcc 5 switched libstdc++ to a new std::string ABI by default, so code built on different sides of that change can end up looking for differently named symbols. If your tf was built with gcc 4.* and your op is compiled with gcc 5.*, you should add the -D_GLIBCXX_USE_CXX11_ABI=0 flag when compiling. Older tf releases were often built with gcc 4.*.
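
One rough way to see which side of the ABI split a symbol belongs to is to look for the "cxx11" tag that libstdc++ adds to mangled names under the new ABI. The helper below is a minimal sketch with a hypothetical name (abi_tag_of). It only inspects the text of the symbol, so an untagged name is a hint rather than proof of the old ABI.

```shell
# Classify a mangled C++ symbol by whether it carries the libstdc++
# "cxx11" ABI tag (B5cxx11) or mentions the std::__cxx11 namespace.
abi_tag_of() {
  case "$1" in
    *B5cxx11*|*__cxx11*) echo "cxx11" ;;
    *)                   echo "untagged" ;;
  esac
}

# The symbol from the import error above:
abi_tag_of "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv"   # prints: untagged

# To compare against what the TensorFlow library actually exports, something
# like the following can be used (the library path is a placeholder):
#   nm -D --defined-only /path/to/libtensorflow_framework.so | grep CheckOpMessageBuilder
```

If the op references the untagged name while the library exports only a tagged variant (or the other way round), the two were most likely built with different ABI settings.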

solution

In the end, I solved the problem with the following system settings:

  • ubuntu: 16.04
  • gcc: 5.4
  • python: 2.7
  • tensorflow: 1.4
  • cuda: 9.0.176

In addition, I removed the -D_GLIBCXX_USE_CXX11_ABI=0 flag, which should only be used when tf itself was built with gcc 4.*.
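
The rule of thumb for the flag can be written down as a small helper. This is a sketch under the assumption that the gcc major version used to build tf is a good proxy for its ABI, which holds for typical builds but is not guaranteed. The function name abi_flag_for is hypothetical.

```shell
# Print the ABI flag to pass to g++ when building the op, given the gcc
# version that tf was built with. Prints nothing when no flag is needed.
abi_flag_for() {
  tf_gcc_major="${1%%.*}"            # "4.8.4" -> "4"
  if [ "$tf_gcc_major" -lt 5 ]; then
    echo "-D_GLIBCXX_USE_CXX11_ABI=0"
  fi
}

abi_flag_for 4.8.4   # prints: -D_GLIBCXX_USE_CXX11_ABI=0
abi_flag_for 5.4.0   # prints nothing
```

The version argument has to come from your tf installation. Assuming your build exposes the attribute, `python -c 'import tensorflow as tf; print(tf.__compiler_version__)'` may report it.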

associated issues