Push the cooling task to a lower level to gain more parallelism
Push the cooling task to a lower level to gain more parallelism, for instance in the case of GRACKLE cooling.
This now contains just the changes to the cooling task.
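To make the idea concrete, here is a minimal sketch in C of what "pushing a task to a lower level" means, assuming an oct-tree of cells; the names (`cooling_make_tasks`, `attach_cooling_task`, `max_parts_per_cooling`) are hypothetical stand-ins, not SWIFT's actual scheduler API:

```c
#include <stddef.h>
#include <stdio.h>

struct cell {
  int count;               /* number of particles in this cell */
  int split;               /* does this cell have progeny? */
  struct cell *progeny[8]; /* children in the oct-tree */
};

/* Hypothetical stand-in for the scheduler call that creates one task. */
static void attach_cooling_task(struct cell *c) {
  printf("cooling task on a cell with %d particles\n", c->count);
}

/* Walk down the tree until cells are small enough, then attach one
 * cooling task per sub-cell instead of one per top-level cell. */
static void cooling_make_tasks(struct cell *c, int max_parts_per_cooling) {
  if (c->split && c->count > max_parts_per_cooling) {
    for (int k = 0; k < 8; k++)
      if (c->progeny[k] != NULL)
        cooling_make_tasks(c->progeny[k], max_parts_per_cooling);
  } else {
    attach_cooling_task(c);
  }
}
```

Splitting one task per top-level cell into many per-sub-cell tasks is what lets more threads work on the cooling at the same time.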
added 1 commit
- a73d9fce - Add the missing new tasks to the interactive task plotting script
The discussion started here: !1108 (comment 30096)
I have modified Grackle to run an expensive but useless loop just after entering the Fortran function, and then exit the function immediately.
```
# type/subtype : count   minimum   maximum       sum           mean       percent
# All threads:
  cooling/none :  1030    7.4340     1146.7485   164225.8926    159.4426    95.96
  cooling/none :    58    7.4375   114690.2099   164176.1506   2830.6233    14.22
```
As you can see, we get the same total time, so calling into the Fortran code is not the problem.
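For reference, the diagnostic stub amounts to something like the following (written here in C; the actual change was made on the Fortran side of Grackle, and `solve_chemistry_stub` / `n_iter` are illustrative names):

```c
#include <math.h>

/* Stand-in for the modified solver entry point: do expensive but useless
 * work right after entry, then return without touching any particle data.
 * If the timings stay the same, the cost of the cross-language call
 * itself is negligible. */
double solve_chemistry_stub(int n_iter) {
  volatile double sink = 0.0; /* volatile keeps the loop from being optimised away */
  for (int i = 0; i < n_iter; i++)
    sink += sin((double)i);   /* expensive, useless work */
  return sink;
}
```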
Sorry about the git mess. For my future testing reference: this gives a great speed-up with `engine_max_parts_per_kick = 1e6`, but after multiple steps it crashed with

```
[0001] [00634.3] runner_doiact_grav.c:runner_do_grav_down():71: cp->multipole not drifted.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
```
Note that the changes to the kick are not included here.
@lhausammann can you show me in the grackle code where your local functions lock objects in memory to ensure thread-safety?
Argh, thanks for the question. I was not looking at the correct function ><. I forgot to copy one of the variables. By the way, the structure contains a lot of pointers; maybe the problem comes from the data they point to.
I should be calling this one: https://gitlab.cosma.dur.ac.uk/swift/swiftsim/blob/master/src/cooling/grackle/cooling.c#L713
but I am actually calling this one: https://gitlab.cosma.dur.ac.uk/swift/swiftsim/blob/master/src/cooling/grackle/cooling.c#L677
That one then calls this function: https://github.com/grackle-project/grackle/blob/master/src/clib/solve_chemistry.c#L87
which in turn calls a Fortran function: https://github.com/grackle-project/grackle/blob/master/src/clib/solve_chemistry.c#L171
I am not an expert on Grackle, but I do not think they lock anything. They expect the user to run a single thread per MPI rank and then use OpenMP inside Grackle. Therefore, I think they should not have any locks.
When I copy the structures, some of the members are pointers, so the different threads end up accessing the same arrays (though they should not write into them). Do you think this shared access to the arrays could be the problem?
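To illustrate the situation (with made-up names, not Grackle's actual structs): a shallow copy of a struct duplicates pointer members by value, so each thread's copy still points at the same underlying arrays. Concurrent reads are safe; a write through any copy would be a data race:

```c
#include <string.h>

/* Made-up struct: the point is that a shallow copy duplicates the
 * pointer values, not the arrays they point to. */
struct chem_data {
  int n;
  const double *rates; /* shared lookup table */
};

/* Each thread gets its own copy of the struct, but not of the array. */
void thread_worker(const struct chem_data *global, double *out) {
  struct chem_data local;
  memcpy(&local, global, sizeof(local)); /* shallow copy: local.rates == global->rates */

  double sum = 0.0;
  for (int i = 0; i < local.n; i++)
    sum += local.rates[i]; /* concurrent read-only access: safe */
  /* Writing through local.rates from several threads would race. */
  *out = sum;
}
```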
I was using GCC; now I have tried with the Intel compiler:
```
# type/subtype      : count   minimum   maximum    sum        mean     percent
# All threads (new):
  cooling/none      :  1030    0.0090     1.3203   193.7010   0.1881   2.86
# All threads (old):
  cooling/none      :    58    0.0105   133.7666   192.3123   3.3157   2.72
```
As you can see, the problem seems to be linked to GCC, so VTune will not be very useful for tracking it down. The result from VTune is that Grackle spends most of its time in `__libc_malloc` (~43%).

Oh, I was sure that it only worked with ICC. Thanks for the information :)
I compiled both SWIFT and Grackle with ICC; previously both were compiled with GCC.
It seems that the default makefile in Grackle is a bit shitty. I have written my own, and now I get this with GCC:
```
# type/subtype      : count   minimum   maximum    sum        mean     percent
# All threads (new):
  cooling/none      :  1030    0.0094     1.4294   206.9717   0.2009   2.25
# All threads (old):
  cooling/none      :    58    0.0117   142.4641   205.2999   3.5397   2.24
```
added 1 commit
- d0ebee65 - Better default value for the cooling splitting for the default case where the…
assigned to @matthieu
mentioned in commit ac275e00