tf op compile troubleshooting
tf operation compile problem
problems encountered in reproducing PlaneNet
problem descriptions
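PlaneNet defines its own custom ops, and the related source files must be compiled before the ops can be imported; see the official TensorFlow guide on adding an op for details. When I compiled the files, several errors popped up.
All of the errors had the same form: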
    /usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9220): error: argument of type "const void *" is incompatible with parameter of type "const float *"
My system settings are:
- ubuntu: 16.04
- gcc: 5.5
- python: 2.7
- tensorflow: 1.4
- cuda: 9.0.176
First, I downgraded gcc to 4.8, using the following:
    sudo apt remove gcc g++
On the bright side, compilation then completed with only some warnings, but when I tried to import the compiled tf op, an error came up:
    tensorflow.python.framework.errors_impl.NotFoundError: /home/vradmin/Desktop/PlaneNet/nndistance/tf_nndistance_so.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv
After the downgrade, my system settings were:
- ubuntu: 16.04
- gcc: 4.8
- python: 2.7
- tensorflow: 1.4
- cuda: 9.0.176
The undefined symbol problem typically results from a mismatch between the gcc version that tf was built with and the one you use to compile the new op's .cpp file. gcc 5 changed the default libstdc++ ABI for std::string, so code built with gcc 4.* and code built with gcc 5.* can refer to the same function under different symbol names. If your tf was built with gcc 4.* and your op is compiled with gcc 5.*, you need to add the -D_GLIBCXX_USE_CXX11_ABI=0 flag when compiling. Older tf releases are often built with gcc 4.*.
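One way to check which side of the mismatch you are on is to look at the missing symbol itself. As far as I know, gcc 5 and later mark symbols that depend on the new std::string with `B5cxx11` (or a `__cxx11` namespace) in the mangled name, so the absence of that marker in the error above suggests the op was compiled against the old ABI. The sketch below is only a heuristic, and the helper name `has_cxx11_abi_tag` is made up for illustration; it is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when a mangled C++ symbol
# name carries a marker of the new libstdc++ ABI introduced with gcc 5.
has_cxx11_abi_tag() {
    case "$1" in
        *B5cxx11*|*__cxx11*) return 0 ;;
        *) return 1 ;;
    esac
}

# The symbol reported by the import error above.
missing='_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv'

if has_cxx11_abi_tag "$missing"; then
    echo "tagged: the op was compiled against the new ABI"
else
    echo "untagged: the op was probably compiled against the old ABI"
fi
```

To see what the installed tf actually exports, running `nm -D` on the tf shared library and filtering with `grep CheckOpMessageBuilder` should show whether the tagged or the untagged name is present. The library file name depends on the tf build, so I leave it out here.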
solution
Finally, I solved the problem with the following system settings:
- ubuntu: 16.04
- gcc: 5.4
- python: 2.7
- tensorflow: 1.4
- cuda: 9.0.176
In addition, I removed the -D_GLIBCXX_USE_CXX11_ABI=0 flag, which should only be used when tf was built with gcc 4.*.
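The rule of thumb above can be written down as a small shell helper that decides whether to pass the flag, based on the major version of the gcc that built tf. This is a sketch under assumptions: `abi_flag_for` is a hypothetical name, the version string is passed in by hand, and a gcc 5+ toolchain can still be configured to default to the old ABI, so treat the result as a first guess only.

```shell
#!/bin/sh
# Hypothetical helper: print the extra compile flag (if any) for a custom op,
# given the version of the gcc that tf itself was built with.
abi_flag_for() {
    major=${1%%.*}          # "4.8.4" -> "4", "5.4.0" -> "5"
    case "$major" in
        ''|*[!0-9]*)
            # Not a plain number: refuse to guess.
            echo "unrecognized gcc version: $1" >&2
            return 1 ;;
    esac
    if [ "$major" -lt 5 ]; then
        echo "-D_GLIBCXX_USE_CXX11_ABI=0"
    fi
    # gcc 5 and later: print nothing and keep the compiler default.
}

abi_flag_for 4.8.4   # tf built with gcc 4.*: the flag is printed
abi_flag_for 5.4.0   # tf built with gcc 5.*: nothing is printed
```

The version can probably be read from tf itself with `python -c 'import tensorflow as tf; print(tf.__compiler_version__)'`, assuming that attribute is available in your tf release.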