From aea36fd999a81630ed40a350122cbbdcffa44f14 Mon Sep 17 00:00:00 2001
From: Antonio Ragagnin <antonio.ragagnin@inaf.it>
Date: Tue, 4 Feb 2025 14:08:37 +0000
Subject: [PATCH] Edit test_gpu_treebuild_on_leonardo.md

---
 test_gpu_treebuild_on_leonardo.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/test_gpu_treebuild_on_leonardo.md b/test_gpu_treebuild_on_leonardo.md
index 11fd352..0b10f1f 100644
--- a/test_gpu_treebuild_on_leonardo.md
+++ b/test_gpu_treebuild_on_leonardo.md
@@ -4,6 +4,8 @@ The `hotwheels` tree build can run either in serial or in parallel. The parallel
 
 Here above there is the scaling test for insterting `1e7` particles into the tree. As you can see the algorithm scales very well with openmp threads. The GPU code scales as a CPU code with 4-8 threads. Therefore it is suggested to use this setup in situations where the number of cores per MPI rank is limited. Otherwise, for OpenMP-dominated runs, the CPU tree build scales much better than the GPU.
 
+Notice how having multiple particles per leaf reduces the tree build time (fewer nodes to create and less recursion). This choice is strongly encouraged, as it also has the great advantage of reducing the tree walk time (work in progress).
+
 Improvement is in progress and things may vary in the future. Here below a job script for running the scaling test on Leonard BOOSTER machine.
 
 ```bash
-- 
GitLab