Fix super pointer

Updated the logic setting the super pointer and creating the "hierarchical" tasks (i.e. init, kick, ghost, etc.). Changes involve:
- Getting rid of the g_super pointer. We have only one super cell per hierarchy.
- Setting the super pointer in a new routine.
- Having only one routine to construct the hierarchical tasks, to avoid overwriting things.
- Moving the external gravity task to be a self with a new kind of sub-type.
The last item is necessary for the following reason. If we want to run with only external gravity (no hydro, no normal gravity), which is useful for testing this aspect in isolation, we need to create the tasks. Currently, we only create the init/kick tasks for super cells, i.e. cells that have at least one self or pair. So I promoted the external_gravity task to be a self with a new sub_type.
IMO that's cleaner now. What do you think?
Fix #215 (closed).
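To make the super-pointer change concrete, here is a minimal toy sketch of the idea described above: a single recursive routine that marks the first task-carrying cell (from the top) as the super-cell of its whole sub-tree. All names and fields here are illustrative assumptions, not the actual SWIFT API.

```c
#include <stddef.h>

/* Toy cell with up to 8 children, loosely mirroring the hierarchy
 * discussed above. Illustrative only, not the real SWIFT struct. */
struct cell {
  struct cell *progeny[8];
  struct cell *super;  /* the single super-cell of this hierarchy */
  int has_tasks;       /* non-zero if this cell carries a self or pair task */
};

/* Recursively set the super pointer: the first cell encountered on the
 * way down that carries tasks becomes the super-cell for itself and for
 * everything below it. A sketch of the idea, not the real routine. */
void cell_set_super(struct cell *c, struct cell *super) {
  if (super == NULL && c->has_tasks) super = c;
  c->super = super;
  for (int k = 0; k < 8; k++)
    if (c->progeny[k] != NULL) cell_set_super(c->progeny[k], super);
}
```

With this shape there is no separate g_super to keep in sync: every cell in a hierarchy ends up pointing at the same, unique super-cell.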
Added 23 commits:
- e549ba94...d539422a - 22 commits from branch master
- 04451df3 - Merge branch 'master' into fix_super_pointer
Added 1 commit:
- 0e071ac4 - Post-merge fixes
LGTM, handing over to @pdraper.
Reassigned to @pdraper
I think we have a problem with the Sedov Blast in MPI mode. When I run:
mpirun -np 4 ../swift_mpi -s -t 4 sedov.yml
after 274 steps it reports:
274 1.152344e-02 9.765625e-05 262144 0 408.301
[0000] [00064.9] space_regrid: basic cell dimensions have increased - recalculating the global partition.
[0003] [00064.9] engine.c:engine_maketasks():1870: No hydro or gravity tasks created.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
Don't see the same issue on master, or non-MPI.
Any ideas? Nothing has occurred to me yet.
So running the same thing with more verbosity, I get the following (interesting lines start with an arrow):
<snip>
[0000] [00033.1] space_regrid: set cell dimensions to [ 7 7 7 ].
[0000] [00033.1] space_regrid: basic cell dimensions have increased - recalculating the global partition.
[0000] [00033.1] space_parts_sort: took 0.580 ms.
[0000] [00033.1] space_gparts_sort: took 0.355 ms.
--> [0003] [00033.1] engine_redistribute: node 3 now has 0 parts and 0 gparts in 0 cells.
[0003] [00033.1] engine_redistribute: took 30.774 ms.
[0003] [00033.1] engine_makeproxies: took 0.131 ms.
[0003] [00033.1] space_regrid: took 31.198 ms.
[0003] [00033.1] engine_exchange_strays: sent out 0/0 parts/gparts, got 0/0 back.
[0003] [00033.1] engine_exchange_strays: took 0.026 ms.
[0003] [00033.1] space_parts_sort: took 0.136 ms.
[0003] [00033.1] space_gparts_sort: took 0.125 ms.
--> [0001] [00033.1] engine_redistribute: node 1 now has 0 parts and 0 gparts in 0 cells.
[0001] [00033.1] engine_redistribute: took 31.139 ms.
[0001] [00033.1] engine_makeproxies: took 0.052 ms.
[0001] [00033.1] space_regrid: took 31.681 ms.
[0003] [00033.1] space_split: took 0.150 ms.
[0003] [00033.1] space_rebuild: took 31.687 ms.
[0003] [00033.1] engine_exchange_cells: took 0.010 ms.
[0001] [00033.1] engine_exchange_strays: sent out 0/0 parts/gparts, got 0/0 back.
[0001] [00033.1] engine_exchange_strays: took 0.008 ms.
[0001] [00033.1] space_parts_sort: took 0.132 ms.
[0001] [00033.1] space_gparts_sort: took 0.097 ms.
[0001] [00033.1] space_split: took 0.161 ms.
[0001] [00033.1] space_rebuild: took 32.107 ms.
[0001] [00033.1] engine_exchange_cells: took 0.005 ms.
--> [0002] [00033.1] engine_redistribute: node 2 now has 0 parts and 0 gparts in 0 cells.
[0002] [00033.1] engine_redistribute: took 31.641 ms.
[0002] [00033.1] engine_makeproxies: took 0.064 ms.
[0002] [00033.1] space_regrid: took 32.417 ms.
[0002] [00033.1] engine_exchange_strays: sent out 0/0 parts/gparts, got 0/0 back.
[0002] [00033.1] engine_exchange_strays: took 0.006 ms.
[0002] [00033.1] space_parts_sort: took 0.165 ms.
[0002] [00033.1] space_gparts_sort: took 0.074 ms.
[0002] [00033.1] space_split: took 0.144 ms.
[0002] [00033.1] space_rebuild: took 32.832 ms.
[0002] [00033.1] engine_exchange_cells: took 0.007 ms.
--> [0000] [00033.1] engine_redistribute: node 0 now has 262144 parts and 0 gparts in 343 cells.
[0000] [00033.1] engine_redistribute: took 28.766 ms.
[0000] [00033.1] engine_makeproxies: took 0.080 ms.
[0000] [00033.1] space_regrid: took 33.045 ms.
[0000] [00033.1] engine_exchange_strays: sent out 0/0 parts/gparts, got 0/0 back.
[0000] [00033.1] engine_exchange_strays: took 0.028 ms.
It looks like, as a result of the partitioning, we end up with all particles on a single node. We then go on to create tasks, but these are only created for non-empty cells. So on a node with 0 particles we get no tasks, which triggers the newly created error message.
If I change the condition to trigger an error to:
if (e->sched.nr_tasks == 0 && (s->nr_gparts > 0 || s->nr_parts > 0))
  error("No hydro or gravity tasks created.");
then there is no problem any more and the test case runs to completion.
Are we happy with this solution? Maybe the partitioning algorithm should not return empty nodes, but that's unrelated to the issue we are trying to solve here.
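The amended check above boils down to a small predicate: "no tasks" is only fatal when the node actually holds particles, so legitimately empty nodes produced by the partitioner pass through. A toy version, with illustrative names (not the real engine code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy version of the amended guard: abort only when a node has zero
 * tasks AND a non-zero number of (g)particles. An empty node with no
 * tasks is considered fine. Names here are illustrative assumptions. */
bool should_abort(int nr_tasks, size_t nr_parts, size_t nr_gparts) {
  return nr_tasks == 0 && (nr_parts > 0 || nr_gparts > 0);
}
```

Under this guard, node 3 in the log above (0 parts, 0 gparts, 0 tasks) no longer triggers the abort, while a node that holds particles but failed to build any tasks still does.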
Edited by Matthieu Schaller

Added 1 commit:
- 8ebc446c - Only trigger an error if there are no tasks and a non-zero number of particles
mentioned in issue #224 (closed)
mentioned in commit ce5d0537