
Fix super pointer

Merged Matthieu Schaller requested to merge fix_super_pointer into master

Updated the logic that sets the super pointer and creates the "hierarchical" tasks (i.e. init, kick, ghost, etc.). The changes involve:

  • Getting rid of the g_super pointer. We have only one super cell per hierarchy.
  • Setting the super pointer in a new routine.
  • Having only one routine to construct the hierarchical tasks, to avoid overwriting things.
  • Moving the external gravity task to a self task with a new sub-type.

The last item is necessary for the following reason: if we want to run with only external gravity (no hydro, no normal gravity), which is useful to test this aspect on its own, we still need the tasks to be created. Currently, we only create the init/kick tasks for super cells, i.e. cells that have at least one self or pair task. So I promoted the external_gravity task to a self task with a new sub_type.
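
For illustration, here is a minimal, self-contained C sketch of the idea. The struct and function names are made up for this example (they are not the actual SWIFT structures or API); it only shows how a single super pointer can be set recursively from the top of the hierarchy, and why promoting external gravity to a self task lets a cell qualify as a super cell even with hydro and normal gravity switched off.

    #include <stddef.h>
    #include <stdio.h>

    /* Toy stand-ins for the cell tree; names are illustrative only. */
    struct cell {
      struct cell *super;      /* The one super pointer per hierarchy. */
      struct cell *progeny[8];
      int count;               /* Number of particles in this cell. */
      int has_self_or_pair;    /* Set if a self or pair task lives here. */
    };

    /* Recursively set the super pointer: the first cell (from the top) that
     * carries a self or pair task becomes the super cell of its sub-tree,
     * and the init/kick tasks would be attached there. */
    static void cell_set_super(struct cell *c, struct cell *super) {
      if (super == NULL && c->has_self_or_pair) super = c;
      c->super = super;
      for (int k = 0; k < 8; k++)
        if (c->progeny[k] != NULL) cell_set_super(c->progeny[k], super);
    }

    /* External gravity as a self task with its own sub-type: any cell with
     * particles gets a self task, so it qualifies as a super cell even when
     * no hydro or normal-gravity tasks exist. */
    static void make_external_gravity_task(struct cell *c) {
      if (c->count > 0) c->has_self_or_pair = 1;
    }

    int main(void) {
      struct cell root = {0}, child = {0};
      root.progeny[0] = &child;
      child.count = 128;

      make_external_gravity_task(&root);  /* Empty cell: no task.         */
      make_external_gravity_task(&child); /* Gets the external-grav self. */

      cell_set_super(&root, NULL);
      printf("child's super cell is %s\n",
             child.super == &child ? "itself" : "not set");
      return 0;
    }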

IMO that's cleaner now. What do you think?

Fix #215 (closed).

Activity

  • Matthieu Schaller Added 23 commits:

  • Added 1 commit:

  • LGTM, handing over to @pdraper.

  • Reassigned to @pdraper

  • I think we have a problem with the Sedov Blast in MPI mode. When I run:

    mpirun -np 4 ../swift_mpi -s -t 4 sedov.yml

    after 274 steps it reports:

         274   1.152344e-02   9.765625e-05     262144          0               408.301
    [0000] [00064.9] space_regrid: basic cell dimensions have increased - recalculating the global partition.
    [0003] [00064.9] engine.c:engine_maketasks():1870: No hydro or gravity tasks created.
    application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3

    I don't see the same issue on master, or when running without MPI.

    Any ideas? Nothing has occurred to me yet.

  • BTW, at this stage it has already gone through recalculating the global partition several times, so it is not clear that this is what is causing the issue.

  • So running the same thing with more verbosity, I get the following (interesting lines start with an arrow):

    <snip>
    [0000] [00033.1] space_regrid: set cell dimensions to [ 7 7 7 ].
    [0000] [00033.1] space_regrid: basic cell dimensions have increased - recalculating the global partition.
    [0000] [00033.1] space_parts_sort: took 0.580 ms.
    [0000] [00033.1] space_gparts_sort: took 0.355 ms.
    --> [0003] [00033.1] engine_redistribute: node 3 now has 0 parts and 0 gparts in 0 cells.
    [0003] [00033.1] engine_redistribute: took 30.774 ms.
    [0003] [00033.1] engine_makeproxies: took 0.131 ms.
    [0003] [00033.1] space_regrid: took 31.198 ms.
    [0003] [00033.1] engine_exchange_strays: sent out 0/0 parts/gparts, got 0/0 back.
    [0003] [00033.1] engine_exchange_strays: took 0.026 ms.
    [0003] [00033.1] space_parts_sort: took 0.136 ms.
    [0003] [00033.1] space_gparts_sort: took 0.125 ms.
    --> [0001] [00033.1] engine_redistribute: node 1 now has 0 parts and 0 gparts in 0 cells.
    [0001] [00033.1] engine_redistribute: took 31.139 ms.
    [0001] [00033.1] engine_makeproxies: took 0.052 ms.
    [0001] [00033.1] space_regrid: took 31.681 ms.
    [0003] [00033.1] space_split: took 0.150 ms.
    [0003] [00033.1] space_rebuild: took 31.687 ms.
    [0003] [00033.1] engine_exchange_cells: took 0.010 ms.
    [0001] [00033.1] engine_exchange_strays: sent out 0/0 parts/gparts, got 0/0 back.
    [0001] [00033.1] engine_exchange_strays: took 0.008 ms.
    [0001] [00033.1] space_parts_sort: took 0.132 ms.
    [0001] [00033.1] space_gparts_sort: took 0.097 ms.
    [0001] [00033.1] space_split: took 0.161 ms.
    [0001] [00033.1] space_rebuild: took 32.107 ms.
    [0001] [00033.1] engine_exchange_cells: took 0.005 ms.
    --> [0002] [00033.1] engine_redistribute: node 2 now has 0 parts and 0 gparts in 0 cells.
    [0002] [00033.1] engine_redistribute: took 31.641 ms.
    [0002] [00033.1] engine_makeproxies: took 0.064 ms.
    [0002] [00033.1] space_regrid: took 32.417 ms.
    [0002] [00033.1] engine_exchange_strays: sent out 0/0 parts/gparts, got 0/0 back.
    [0002] [00033.1] engine_exchange_strays: took 0.006 ms.
    [0002] [00033.1] space_parts_sort: took 0.165 ms.
    [0002] [00033.1] space_gparts_sort: took 0.074 ms.
    [0002] [00033.1] space_split: took 0.144 ms.
    [0002] [00033.1] space_rebuild: took 32.832 ms.
    [0002] [00033.1] engine_exchange_cells: took 0.007 ms.
    --> [0000] [00033.1] engine_redistribute: node 0 now has 262144 parts and 0 gparts in 343 cells.
    [0000] [00033.1] engine_redistribute: took 28.766 ms.
    [0000] [00033.1] engine_makeproxies: took 0.080 ms.
    [0000] [00033.1] space_regrid: took 33.045 ms.
    [0000] [00033.1] engine_exchange_strays: sent out 0/0 parts/gparts, got 0/0 back.
    [0000] [00033.1] engine_exchange_strays: took 0.028 ms.

    It looks like, as a result of the partitioning, we end up with all particles on a single node. We then go on and create tasks, but these are only created for non-empty cells. So for a node with 0 particles we get no tasks, which triggers this newly created error message.

    If I change the condition to trigger an error to:

      if (e->sched.nr_tasks == 0 && (s->nr_gparts > 0 || s->nr_parts > 0))
        error("No hydro or gravity tasks created.");

    then there is no problem any more and the test case runs to completion.

    Are we happy with this solution? Maybe the partitioning algorithm should not return empty nodes, but that's unrelated to the issue we are trying to solve here.

    Edited by Matthieu Schaller
  • Added 1 commit:

    • 8ebc446c - Only trigger an error if there are no tasks and a non-zero number of particles
  • Good, at least that fixes this up. I'll open an issue to look at this partitioning; there may be a problem.

  • mentioned in issue #224 (closed)

  • Might this be related to the multi-dt policy?

  • It shouldn't matter in this case, as the partitioning does not use the task weights. The default method is to use the geometric centres of the current nodes to seed a new partition; that looks to have failed.
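
    For context, here is a rough sketch of what seeding a partition from geometric centres could look like, under the assumption that each node's current cells are averaged into a centre and every cell is then reassigned to the nearest centre; all names are hypothetical and this is not SWIFT's partition code. It also shows how a node whose centre never wins a cell can come out of this step with an empty partition, which is the failure mode seen above.

      #include <float.h>

      #define NR_NODES 4
      #define NR_CELLS 8

      /* Hypothetical seeding step: average the positions of the cells each
       * node currently owns into a geometric centre, then hand every cell
       * to the nearest centre. A node whose centre is never the closest one
       * ends up with an empty partition. */
      void seed_partition(const double cell_pos[NR_CELLS][3],
                          const int old_nodeID[NR_CELLS],
                          int new_nodeID[NR_CELLS]) {
        double centre[NR_NODES][3] = {{0.0}};
        int counts[NR_NODES] = {0};

        /* Geometric centre of each node's current set of cells. */
        for (int c = 0; c < NR_CELLS; c++) {
          const int n = old_nodeID[c];
          for (int k = 0; k < 3; k++) centre[n][k] += cell_pos[c][k];
          counts[n]++;
        }
        for (int n = 0; n < NR_NODES; n++)
          if (counts[n] > 0)
            for (int k = 0; k < 3; k++) centre[n][k] /= counts[n];

        /* Re-assign every cell to the nearest centre. */
        for (int c = 0; c < NR_CELLS; c++) {
          double best = DBL_MAX;
          for (int n = 0; n < NR_NODES; n++) {
            double d2 = 0.0;
            for (int k = 0; k < 3; k++) {
              const double dx = cell_pos[c][k] - centre[n][k];
              d2 += dx * dx;
            }
            if (d2 < best) {
              best = d2;
              new_nodeID[c] = n;
            }
          }
        }
      }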

    In the meantime this now looks good to go.

  • Peter W. Draper mentioned in commit ce5d0537

  • Peter W. Draper Status changed to merged
