From e1ae65d05988a8397c6238ca84b8f3e9b549aab0 Mon Sep 17 00:00:00 2001
From: Antonio Ragagnin <antonio.ragagnin@inaf.it>
Date: Tue, 4 Feb 2025 14:01:13 +0000
Subject: [PATCH] Add new file

---
 test_gpu_treebuild_on_leonardo.md | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 test_gpu_treebuild_on_leonardo.md

diff --git a/test_gpu_treebuild_on_leonardo.md b/test_gpu_treebuild_on_leonardo.md
new file mode 100644
index 0000000..950110f
--- /dev/null
+++ b/test_gpu_treebuild_on_leonardo.md
@@ -0,0 +1,11 @@
+![octree scaling gpu vs cpu](leonardo_booster.sh)
+
+The `hotwheels` tree build can run either serially or in parallel. The parallel version can run on multiple CPU cores or on a GPU, using OpenMP `target` offloading directives.
+
+The plot above shows the scaling test for inserting `1e7` particles into the tree. The CPU tree build scales very well with the number of OpenMP threads, while the GPU build performs roughly like the CPU build running on 4-8 threads. The GPU build is therefore recommended when the number of cores per MPI rank is limited; for OpenMP-dominated runs with many threads per rank, the CPU tree build performs much better than the GPU one.
+
+Improvements are in progress, so these results may change in the future. Below is a job script for running the scaling test on the Leonardo BOOSTER machine.
+
+```bash
+::include{file=leonardo_booster.sh}
+```
\ No newline at end of file
-- 
GitLab