diff --git a/test_gpu_treebuild_on_leonardo.md b/test_gpu_treebuild_on_leonardo.md
index 11fd352f4b60a2444a7e3cf963a01630b94b765d..0b10f1f67c8f06089de505e4efde53644b3cd2e0 100644
--- a/test_gpu_treebuild_on_leonardo.md
+++ b/test_gpu_treebuild_on_leonardo.md
@@ -4,6 +4,8 @@ The `hotwheels` tree build can run either in serial or in parallel. The parallel
 
 Here above there is the scaling test for insterting `1e7` particles into the tree. As you can see the algorithm scales very well with openmp threads. The GPU code scales as a CPU code with 4-8 threads. Therefore it is suggested to use this setup in situations where the number of cores per MPI rank is limited. Otherwise, for OpenMP-dominated runs, the CPU tree build scales much better than the GPU.
 
+Notice how having multiple particles per leaf decreases the tree build time (fewer nodes to create and less recursion). This choice is strongly encouraged, as it also has the great advantage of reducing the tree walk time (work in progress).
+
 Improvement is in progress and things may vary in the future. Here below a job script for running the scaling test on Leonard BOOSTER machine.
 
 ```bash