diff --git a/test_gpu_treebuild_on_leonardo.md b/test_gpu_treebuild_on_leonardo.md
new file mode 100644
index 0000000000000000000000000000000000000000..950110f3c1ed838789610c1ea3a30dc45f72a478
--- /dev/null
+++ b/test_gpu_treebuild_on_leonardo.md
@@ -0,0 +1,11 @@
+
+
+The `hotwheels` tree build can run either in serial or in parallel. The parallel version can run on multiple CPU cores and/or on a GPU, using OpenMP `target` directives for the GPU offload.
+
+Above is the scaling test for inserting `1e7` particles into the tree. The algorithm scales very well with the number of OpenMP threads. The GPU code performs comparably to the CPU code running with 4-8 threads, so the GPU setup is recommended when the number of cores per MPI rank is limited. For OpenMP-dominated runs, the CPU tree build scales much better than the GPU one.
+
+Improvements are in progress, so these results may change in the future. Below is a job script for running the scaling test on the Leonardo Booster machine.
+
+```bash
+::include{file=leonardo_booster.sh}
+```
\ No newline at end of file