Pack the timebin for the limiter communications

Merged Matthieu Schaller requested to merge pack_limiter into master

For the part communication related to the time-step limiter, only send the time-bin (an int8_t) rather than the whole particle. The packing is done by hand rather than via an MPI-provided mechanism such as MPI_Pack or a derived datatype.
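
In sketch form (illustrative names only, not the actual SWIFT code; the trimmed-down struct part and both helpers are assumptions for the example):

    #include <mpi.h>
    #include <stdint.h>

    typedef int8_t timebin_t;

    /* Heavily trimmed stand-in for the real particle struct, which is much
       larger; that is exactly why sending all of it is wasteful. */
    struct part {
      double x[3];
      timebin_t time_bin;
    };

    /* Copy only the time-bins into a contiguous buffer... */
    void pack_timebins(const struct part *parts, timebin_t *buff,
                       const int count) {
      for (int i = 0; i < count; ++i) buff[i] = parts[i].time_bin;
    }

    /* ...and send count bytes rather than count * sizeof(struct part). */
    void send_timebins(const timebin_t *buff, const int count, const int dest,
                       const int tag, MPI_Request *req) {
      MPI_Isend(buff, count, MPI_INT8_T, dest, tag, MPI_COMM_WORLD, req);
    }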

Edited by Matthieu Schaller

Activity

  • I am running my standard full-physics eagle-25 box on 2 nodes with 2 ranks/node on cosma7.

    @pdraper Is there another test you'd suggest at this stage? Anything where the limiter was particularly problematic?

  • No, that is the test I would run, perhaps over a few more nodes. Stretching the problem usually reveals the MPI issues more easily.

  • As always, I am not sure what to make of the results. At least it's going in the direction we expect.

    [Plot: runtime_a_limiter_comms]

  • One thing I was wondering is whether doing the copy where we do it is indeed the best choice. It currently happens during the enqueue procedure. Would it be better to have a separate task do the copy? When that copy is done, it unlocks the send task, which then does just that: send.
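
    In sketch form, the proposed split would look something like the following (the scheduler calls mirror SWIFT's conventions, but the exact signatures and task-type names here are assumptions):

        /* Illustrative only: create an explicit pack task that fills the send
           buffer, and only unlock the send task once the copy is done, so no
           work happens at enqueue time. */
        void add_limiter_send_with_pack(struct scheduler *sched,
                                        struct cell *ci, struct cell *cj,
                                        const int tag) {
          struct task *t_pack = scheduler_addtask(
              sched, task_type_pack, task_subtype_limiter, 0, 0, ci, NULL);
          struct task *t_send = scheduler_addtask(
              sched, task_type_send, task_subtype_limiter, tag, 0, ci, cj);

          /* The copy must have finished before the MPI send may run. */
          scheduler_addunlock(sched, t_pack, t_send);
        }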

  • Yes, that would be better. We don't really want to do work when queueing. It is much like how I do my split sends using RDMA, which superficially looks better (task-plot wise), but the speed-up wasn't great. I suspect that is down to NUMA effects and the overheads of more tasks!

  • I like the explicit split, so that we can clearly identify in the task plot what the costs are.

    Did you modify the task structure to be able to pass the buffer pointer along?

  • To keep you up to speed: I will also do the same thing for the gparts, but without the need to unpack on the receiving side. That will follow the changes I tried in !1318 (closed), but with the packing also done by hand in a separate task.

    These two changes + the NUMA updates would be nice to have.

  • Yes, I had an additional pointer for the extra memory location (and a lot of other things as well, for RDMA support).

  • Matthieu Schaller added 10 commits

    • d3fc8b9e - Add a new task type to do the limiter packing and unpacking
    • b7132b1b - Create the new tasks to pack and unpack the limiter and link them
    • 615fe200 - Activate the new pack/unpack tasks when activating the corresponding send/recv
    • 872ca5ae - Document the new tasks
    • 4472f4a4 - Call the correct runner_ function when the unpack task is found in runner_main()
    • 3e4f9704 - Use the correct types in the send/recv dependency creation
    • 0119d194 - Fix logic in task_pass_buffer()
    • f5c1c17f - Also call the new task function when enqueing a task that is done
    • aa8c633b - Too many brackets...
    • e6e15ca9 - Fix documentation string


  • What do you think of the solution adopted here? I recycled the task->buff pointer but had to invent a function to pass the information along (a sketch of the shape of that hand-off is below).

    I am not convinced I like it...

    It does run, however.
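
    The body of task_pass_buffer() is not shown in this thread; purely as a guess at the shape of the hand-off:

        /* Hypothetical sketch: the pack task fills a buffer and recycles the
           generic task->buff field to pass the pointer on to the send task.
           This is a guess at the shape, not the actual function. */
        void task_pass_buffer(struct task *from, struct task *to) {
          to->buff = from->buff; /* the send task now owns the packed data */
          from->buff = NULL;     /* avoid two owners of one allocation */
        }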

  • added 1 commit

    • ff94e221 - Add the new task types to the task plot tools


  • I see, yes that does feel unpleasant and a bit extreme to avoid an extra pointer. Could we store a pointer to the send/recv task somewhere and just use the buff from it?

  • I guess I could use the pack task's buff pointer to point to the buff pointer of the send task. Is that cleaner?

    I toyed with the idea of fishing the pointer out of the dependency list (of size 1) otherwise. We can't really store the buffer pointer in the cell, as the number of buffers per cell is variable. A linked list might work there, but that's also quite the overhead.
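
    That dependency-list idea would be a one-liner, assuming the pack task unlocks exactly one task (the field name follows SWIFT's task struct, but treat it as an assumption):

        /* If the pack task's only dependent is its send task, we can fish the
           send task (and hence its buffer) out of the unlock list directly. */
        struct task *t_send = t_pack->unlock_tasks[0];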

  • Here is what this all looks like on 2 and 8 cosma7 nodes prior to this last batch of changes:

    [Plot: runtime_a_limiter_comms]

  • Looking good, if only that orange track hadn't looked the same for the first 18 hours.

  • An alternative would be to maintain a dynamic list of buffers, where each buffer is an object:

        struct buffer {
          struct buffer *next;        /* next element in the list */
          void *buffer;               /* the packed data itself */
          enum task_subtype sub_type; /* which communication this belongs to */
          struct cell *c;             /* the cell the data was packed for */
        };

    The engine maintains these. When we pack, we put the buffer pointer in one of these. The send task can then figure out which element it needs to send based on the cell pointer and sub_type. This has to come with a locking mechanism, however (see the sketch after this comment).

    Or we put this linked list in the cell object. Then it's lock-free but more memory-heavy.

    I don't like that more than the current solution.
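
    For concreteness, a hedged sketch of the lookup the send task would have to do against such an engine-level list (the mutex and all names are illustrative; the engine would presumably use its own lock type):

        #include <pthread.h>
        #include <stddef.h>

        struct cell; /* opaque here; the real definition lives elsewhere */
        enum task_subtype { task_subtype_limiter /* , ... */ };

        struct buffer {
          struct buffer *next;
          void *buffer;
          enum task_subtype sub_type;
          struct cell *c;
        };

        struct buffer_list {
          struct buffer *head;
          pthread_mutex_t lock; /* serialises insertions and lookups */
        };

        /* Send-task side: find the packed data for this cell and sub-type. */
        void *buffer_list_find(struct buffer_list *l, const struct cell *c,
                               const enum task_subtype sub_type) {
          void *ret = NULL;
          pthread_mutex_lock(&l->lock);
          for (const struct buffer *b = l->head; b != NULL; b = b->next) {
            if (b->c == c && b->sub_type == sub_type) {
              ret = b->buffer;
              break;
            }
          }
          pthread_mutex_unlock(&l->lock);
          return ret;
        }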

  • The other option is to recycle buff in the task to point to the next task. Might be a tiny bit cleaner.

  • Isn't there a one-to-one relationship between pack/unpack and send/recv? If so, the pack/unpack buff pointer could point at the send/recv task and, through it, access the send/recv task's buff pointer, which both tasks would share.
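
    A hedged sketch of that variant, reusing the illustrative timebin_t from the first sketch above (everything except the buff field is an assumption, including the cell layout):

        /* With a one-to-one pack -> send pairing, the pack task's generic
           buff field can point at the send task itself; both tasks then share
           the send task's buffer. */
        void runner_do_pack_limiter(const struct cell *c, struct task *t_pack) {
          struct task *t_send = (struct task *)t_pack->buff; /* paired task */
          timebin_t *buff = (timebin_t *)t_send->buff;       /* shared buffer */
          for (int i = 0; i < c->hydro.count; ++i)
            buff[i] = c->hydro.parts[i].time_bin;
        }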

  • added 1 commit

    • f51c9ec9 - Use directly the pointer of the next task

