Does the sparseConvNet have windows version? #128

Open
dengyuhk opened this issue on Oct 5, 2019 · 5 comments

victoryc commented on Jul 22
edited

I finally managed to get this library to build on my Windows 10, 64-bit machine. I am leaving notes about the main changes I needed to get that far, for the benefit of anyone else who wants to build this library on Windows. Maybe these notes will also help the authors in case they decide to make the library cross-platform.

- Added self.use_ninja = False in the __init__ method of the BuildExtension class in ...\site-packages\torch\utils\cpp_extension.py, to get more informative output about where the build errors occur. Instead of hacking the code like that, there may be a way to pass an argument that disables the ninja build; I didn't explore that.
   - For "Warning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified":
	- In each Anaconda prompt, before "cl" is invoked, we need to run the appropriate Visual Studio Developer command file to set up the environment variables. Example: call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Auxiliary\Build\vcvars64.bat". The exact batch file to execute depends on the architecture we want to compile for, etc.
- I decided not to use develop.sh, because it leads to a "Python: unknown command" error, apparently because bash is not aware of the Anaconda environment. The first line in develop.sh only cleans up, which can be done manually; I chose to directly run the second line, "python setup.py develop".
- Handling the <tr1/functional> file not found error
	- "tr1" was a temporary namespace used back when "functional" was not yet part of the C++ standard.
	- In "sparseconfig.h" under "sparseconvnet\SCN\Metadata\sparsehash\internal", made the following changes:
		- Changed "#define HASH_FUN_H <tr1/functional>" to "#define HASH_FUN_H <functional>"
		- Changed "#define HASH_NAMESPACE std::tr1" to "#define HASH_NAMESPACE std"
	- In cpp_extension.py, set num_workers = 1 on line 1398.
	- Also in cpp_extension.py, set self.use_ninja = False on line 297.
- There are several occurrences of "or" (instead of "||") and "and" (instead of "&&") in Metadata.cpp, NetworkInNetwork.cpp, etc. To get them to build with the Microsoft compiler, we need to pass the "/permissive-" option (note: though the '/Za' flag is supposed to fix them, it causes other errors). Updated setup.py to add that option.
- c:\blahblah\SparseConvNet\sparseconvnet\SCN\Metadata\sparsehash\internal\hashtable-common.h(166): error C2760: syntax error: unexpected token 'typedef', expected ';'
	- In hashtable-common.h, changed the definition of SPARSEHASH_COMPILE_ASSERT(expr, msg) to static_assert(expr, "message")
- c:\blahblah\SparseConvNet\sparseconvnet\SCN\CUDA/SparseToDense.cpp(29): error C2664: 'at::Tensor &at::Tensor::resize_(c10::IntArrayRef,c10::optional<c10::MemoryFormat>) const': cannot convert argument 1 from 'std::array<long,3>' to 'c10::IntArrayRef'
	- Changed "std::array<long, Dimension + 2> sz" to "std::array<int64_t, Dimension + 2> sz"
- c:\blahblah\SparseConvNet\sparseconvnet\SCN\CUDA/BatchNormalization.cu(83): error: calling a __host__ function("pow<float, double, (int)0> ") from a __global__ function("BatchNormalization_f_test<float, (int)16, (int)64> ") is not allowed
	- Changed "pow(_saveInvStd / nActive + eps, -0.5)" to "pow(double(_saveInvStd / nActive + eps), -0.5)". Otherwise, the call signature ends up being pow(float, double), which does not match any variant of the "pow" function available on the CUDA device.
- Big one: After doing all of the above, I got the code to compile, but a mysterious link error appeared. It said something like: sparseconvnet_cuda.obj : error LNK2001: unresolved external symbol "public: long * __cdecl at::Tensor::data_ptr<long>(void)const " (??$data_ptr@J@Tensor@at@@QEBAPEAJXZ)
	- This was baffling, but a knowledgeable poster on the CUDA forum offered a clue: code meant to be cross-platform should not use "long", because it ends up 32 bits wide on 64-bit Windows machines while being 64 bits wide on 64-bit Linux machines.
	- Replaced all occurrences of "long" with "int64_t", and the mysterious link error went away.