[WIP] Re-entrant Threadpool Mapper

Closed Pedro Gonnet requested to merge threadpool_rmapper into master

Quick hack of a re-entrant mapper for the threadpool, i.e. a mapper to which additional data can be added on the fly.
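
The idea of a re-entrant mapper can be sketched as follows. This is a minimal illustration, not the code in this branch: the names `rmap_add`, `rmap_worker`, `visit_tree` and `rmap_demo`, as well as the fixed-size LIFO queue, are all assumptions made for the example. Each worker pops an item and runs the map function on it, the map function may itself queue more items, and workers exit only once nothing is queued or in flight.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define RMAP_CAPACITY 1024  /* hypothetical fixed queue size */
#define RMAP_MAX_THREADS 64 /* hypothetical upper bound on workers */

struct rmap;
typedef void (*rmap_function)(struct rmap *r, int item);

struct rmap {
  pthread_mutex_t lock;
  int items[RMAP_CAPACITY];
  int count;   /* items waiting in the queue */
  int pending; /* items waiting or currently being processed */
  int visited; /* demo result: number of items processed */
  rmap_function fn;
};

/* Queue one item. Safe to call from inside the map function.
   Returns 0 on success, -1 if the queue is full. */
int rmap_add(struct rmap *r, int item) {
  int rc = -1;
  pthread_mutex_lock(&r->lock);
  if (r->count < RMAP_CAPACITY) {
    r->items[r->count++] = item;
    r->pending++;
    rc = 0;
  }
  pthread_mutex_unlock(&r->lock);
  return rc;
}

/* Worker loop: take work while there is some, wait while other threads
   may still add more, and leave once nothing is queued or in flight. */
static void *rmap_worker(void *arg) {
  struct rmap *r = arg;
  for (;;) {
    pthread_mutex_lock(&r->lock);
    if (r->count > 0) {
      int item = r->items[--r->count];
      pthread_mutex_unlock(&r->lock);
      r->fn(r, item); /* may call rmap_add() */
      pthread_mutex_lock(&r->lock);
      assert(r->pending > 0);
      r->pending--;
      r->visited++;
      pthread_mutex_unlock(&r->lock);
    } else if (r->pending == 0) {
      pthread_mutex_unlock(&r->lock);
      return NULL;
    } else {
      pthread_mutex_unlock(&r->lock);
      sched_yield(); /* idle: another thread is still working */
    }
  }
}

/* Demo map function: a node of depth d > 0 queues two children. */
static void visit_tree(struct rmap *r, int depth) {
  if (depth > 0) {
    rmap_add(r, depth - 1);
    rmap_add(r, depth - 1);
  }
}

/* Walk a binary tree of the given depth; returns the nodes visited. */
int rmap_demo(int nr_threads, int depth) {
  struct rmap r = {.count = 0, .pending = 0, .visited = 0, .fn = visit_tree};
  pthread_t threads[RMAP_MAX_THREADS];
  if (nr_threads < 1 || nr_threads > RMAP_MAX_THREADS) return -1;
  pthread_mutex_init(&r.lock, NULL);
  rmap_add(&r, depth); /* only the root is queued up front */
  for (int i = 0; i < nr_threads; i++)
    pthread_create(&threads[i], NULL, rmap_worker, &r);
  for (int i = 0; i < nr_threads; i++) pthread_join(threads[i], NULL);
  pthread_mutex_destroy(&r.lock);
  return r.visited;
}
```

Under these assumptions, `rmap_demo(4, 4)` walks a binary tree of depth 4 with four threads and visits 31 nodes, even though only the root was queued up front. Idle workers in this sketch yield rather than spin hard, which is a choice made for the example only.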

Activity

  • Author Developer

    @matthieu, can you have a look at this?

    Currently I've only converted the runner_do_unskip mapper to this mode of operation. Could you possibly run the EAGLE_25 benchmark and plot the threadpool tasks for the first few steps? What I'm most interested in is the behaviour of the smallest steps, where the threadpool tasks should now parallelize much better.

    Cheers!

  • @jwillis, could you run this on your special node against the latest master when you have some time? Thanks!

    I'll make some threadpool task plots in parallel.

  • Have you got the threadpool task plots by any chance? Because my scaling test has been stuck on step 1 with 1 thread for the past 4 hours...

  • Not tried yet. That sounds like a bug/feature. Could you try with more than 1 thread?

  • Author Developer

    Hmm... Never actually tried it with a single thread. Guess I will tonight :smiley:

  • What problems did you run? Because it's struggling with EAGLE_25 on 16 threads even.

  • Author Developer

    OK, that's bad. I tried it on EAGLE_12 with up to four threads (that's all my laptop can take)...

  • Same problem with EAGLE_12 on 4 threads. I'm guessing you're using GCC?

  • Author Developer

    @jwillis, yes, it's the only thing I have... Any idea what it's hanging on?

  • Not sure yet, I'm running it with GCC.

  • Author Developer

    Otherwise, just let it be and I'll do some further testing as soon as I can. This was really just a first hack, so god knows what kind of bugs I've hidden in there!

  • I have run EAGLE_12 with GCC on 4 threads and with -v 1 I get this far:

    Welcome to the cosmological hydrodynamical code
        ______       _________________
       / ___/ |     / /  _/ ___/_  __/
       \__ \| | /| / // // /_   / /   
      ___/ /| |/ |/ // // __/  / /    
     /____/ |__/|__/___/_/    /_/     
     SPH With Inter-dependent Fine-grained Tasking
    
     Version : 0.6.0
     Revision: v0.6.0-209-g286fc4ff, Branch: threadpool_rmapper, Date: 2017-08-29 22:36:25 +0200
     Webpage : www.swiftsim.com
    
     Config. options: '--disable-doxygen-doc --disable-mpi --enable-debugging-checks'
    
     Compiler: GCC, Version: 4.8.1
     CFLAGS  : '-O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffast-math -funroll-loops -march=corei7-avx -mavx -pthread -Wall -Wextra -Wno-unused-parameter -Werror'
    
     HDF5 library version: 1.8.9
     FFTW library version: 3.x (details not available)
    
    [00000.0] main: CPU frequency used for tick conversion: 2600000000 Hz
    [00000.0] main: Running on: m5019
    [00000.0] main: WARNING: Debugging checks activated. Code will be slower !
    [00000.0] main: sizeof(part)        is  160 bytes.
    [00000.0] main: sizeof(xpart)       is   64 bytes.
    [00000.0] main: sizeof(spart)       is   96 bytes.
    [00000.0] main: sizeof(gpart)       is  128 bytes.
    [00000.0] main: sizeof(multipole)   is  160 bytes.
    [00000.0] main: sizeof(grav_tensor) is  288 bytes.
    [00000.0] main: sizeof(task)        is   64 bytes.
    [00000.0] main: sizeof(cell)        is  768 bytes.
    [00000.0] main: Reading runtime parameters from file 'eagle_12.yml'
    [00000.0] main: Internal unit system: U_M = 1.989000e+43 g.
    [00000.0] main: Internal unit system: U_L = 3.085678e+24 cm.
    [00000.0] main: Internal unit system: U_t = 3.085678e+19 s.
    [00000.0] main: Internal unit system: U_I = 1.000000e+00 A.
    [00000.0] main: Internal unit system: U_T = 1.000000e+00 K.
    [00000.0] phys_const_print:    Gravitational constant = 4.302051e+01
    [00000.0] phys_const_print:            Speed of light = 2.997925e+05
    [00000.0] phys_const_print:           Planck constant = 9.787529e-02
    [00000.0] phys_const_print:        Boltzmann constant = 6.941420e-70
    [00000.0] phys_const_print:     Thomson cross-section = 6.986843e-74
    [00000.0] phys_const_print:             Electron-Volt = 8.055187e-66
    [00000.0] phys_const_print:                      Year = 1.022690e-12
    [00000.0] phys_const_print:         Astronomical Unit = 4.848136e-12
    [00000.0] phys_const_print:                    Parsec = 9.999999e-07
    [00000.0] phys_const_print:                Solar mass = 9.997486e-11
    [00000.0] main: Reading ICs from file './EAGLE_ICs_12.hdf5'
    [00000.0] read_ic_single: IC and internal units match. No conversion needed.
    [00004.6] read_ic_single: Particle Type 5 not yet supported. Particles ignored
    [00004.6] main: Reading initial conditions took 4599.469 ms.
    [00004.6] main: Read 6387423 gas particles, 0 star particles and 0 gparts from the ICs.
    [00004.6] space_init: max_size set to 8000000, sub_size_pair set to 256000000, sub_size_self set to 32000, split_size set to 400
    [00005.1] space_regrid: h_max is 3.480e-01 (cell_min=6.989e-01).
    [00005.1] space_regrid: (re)griding space cdim=(12 12 12)
    [00005.1] space_regrid: set cell dimensions to [ 12 12 12 ].
    [00005.1] space_regrid: took 60.905 ms.
    [00005.1] main: space_init took 461.842 ms.
    [00005.1] main: space dimensions are [ 8.471 8.471 8.471 ].
    [00005.1] main: space is periodic.
    [00005.1] main: highest-level cell dimensions are [ 12 12 12 ].
    [00005.1] main: 6387423 parts in 1728 cells.
    [00005.1] main: 0 gparts in 1728 cells.
    [00005.1] main: 0 sparts in 1728 cells.
    [00005.1] main: maximum depth is 0.
    [00005.1] main: map_cellcheck picked up 0 parts.
    [00005.1] main: nr of cells at depth 0 is 1728.
    [00005.1] engine_init: Affinity at entry: 11111111111111111111111111111111
    [00005.1] engine_init: prefer NUMA-distant CPUs
    [00005.1] engine_init: cpu map is [ 0 8 1 9 2 10 3 11 4 12 5 13 6 14 7 15 16 24 17 25 18 26 19 27 20 28 21 29 22 30 23 31 ].
    [00005.1] engine_policy: engine policies are [  steal  keep  numa_affinity  hydro  ]
    [00005.1] hydro_props_print: Equation of state: Ideal gas.
    [00005.1] hydro_props_print: Adiabatic index gamma: 1.666667.
    [00005.1] hydro_props_print: Hydrodynamic scheme: Gadget-2 version of SPH (Springel 2005) in 3D.
    [00005.1] hydro_props_print: Hydrodynamic kernel: Cubic spline (M4) with eta=1.234800 (48.00 neighbours).
    [00005.1] hydro_props_print: Hydrodynamic relative tolerance in h: 0.00010 (+/- 0.0144 neighbours).
    [00005.1] hydro_props_print: Hydrodynamic integration: CFL parameter: 0.1000.
    [00005.1] hydro_props_print: Hydrodynamic integration: Max change of volume: 1.40 (max|dlog(h)/dt|=0.112157).
    [00005.1] engine_init: Absolute minimal timestep size: 6.938894e-20
    [00005.1] engine_init: Minimal timestep size (on time-line): 7.450580e-11
    [00005.1] engine_init: Maximal timestep size (on time-line): 7.812500e-05
    [00005.1] engine_compute_next_snapshot_time: Next output time set to t=1.000000e-03.
    [00005.1] engine_estimate_nr_tasks: tasks per cell estimated as: 36, maximum tasks: 62208
    [00005.1] engine_init: runner 0 on cpuid=0 with qid=0.
    [00005.1] engine_init: runner 1 on cpuid=8 with qid=1.
    [00005.1] engine_init: runner 2 on cpuid=1 with qid=2.
    [00005.1] engine_init: runner 3 on cpuid=9 with qid=3.
    [00005.1] main: engine_init took 5.576 ms.
    [00005.1] main: Running on 6387423 gas particles, 0 star particles and 0 DM particles (0 gravity particles)
    [00005.1] main: from t=0.000e+00 until t=1.000e-02 with 4 threads and 4 queues (dt_min=1.000e-10, dt_max=1.000e-04)...
    [00005.1] engine_init_particles: Computing initial gas densities.
    [00005.1] space_rebuild: (re)building space
    [00005.1] space_regrid: h_max is 3.480e-01 (cell_min=6.989e-01).
    [00005.1] space_regrid: took 0.127 ms.
    [00005.1] space_parts_get_cell_index: took 53.611 ms.
    [00005.6] space_parts_sort: Sorting succeeded.
    [00005.6] space_parts_sort: took 499.139 ms.
    [00006.1] space_split: took 330.522 ms.
    [00006.1] space_rebuild: took 1024.499 ms.
    [00006.1] engine_estimate_nr_tasks: tasks per cell estimated as: 3, maximum tasks: 196938
    [00006.2] scheduler_reweight: took 3.295 ms.
    [00006.2] engine_maketasks: took 114.094 ms (including reweight).
    [00006.2] engine_marktasks: took 23.221 ms.
    [00006.3] engine_rebuild: took 1252.517 ms.
    [00006.3] engine_print_task_counts: Total = 91061  (per cell = 2)
    [00006.3] engine_print_task_counts: task counts are [ none=0 sort=1735 self=315 pair=12345 sub_self=1637 sub_pair=13433 init_grav=0 ghost=26926 extra_ghost=0 drift_part=0 drift_gpart=0 kick1=0 kick2=0 timestep=0 send=0 recv=0 grav_top_level=0 grav_long_range=0 grav_ghost=0 grav_mm=0 grav_down=0 cooling=0 sourceterms=0 skipped=34670 ]
    [00006.3] engine_print_task_counts: nr_parts = 6387423.
    [00006.3] engine_print_task_counts: nr_gparts = 0.
    [00006.3] engine_print_task_counts: nr_sparts = 0.
    [00006.3] engine_print_task_counts: took 2.237 ms.
    [00018.1] engine_launch: took 11590.850 ms.
    [00018.1] engine_init_particles: Converting internal energy variable.
    [00018.4] engine_init_particles: Running initial fake time-step.
    [00018.5] engine_marktasks: took 18.382 ms.
    [00018.6] engine_print_task_counts: Total = 91061  (per cell = 2)
    [00018.6] engine_print_task_counts: task counts are [ none=0 sort=0 self=630 pair=24690 sub_self=3274 sub_pair=26866 init_grav=0 ghost=26926 extra_ghost=0 drift_part=0 drift_gpart=0 kick1=1735 kick2=1735 timestep=1735 send=0 recv=0 grav_top_level=0 grav_long_range=0 grav_ghost=0 grav_mm=0 grav_down=0 cooling=0 sourceterms=0 skipped=3470 ]
    [00018.6] engine_print_task_counts: nr_parts = 6387423.
    [00018.6] engine_print_task_counts: nr_gparts = 0.
    [00018.6] engine_print_task_counts: nr_sparts = 0.
    [00018.6] engine_print_task_counts: took 2.896 ms.
    [00028.8] engine_launch: took 10131.473 ms.
    [00028.8] engine_collect_timestep_and_rebuild: took 0.143 ms.
    [00029.0] part_verify_links: All links OK
    [00029.0] engine_init_particles: took 23812.652 ms.
    [00029.1] engine_dump_snapshot: writing snapshot at t=0.000000e+00.
    [00029.1] write_output_single: Snapshot and internal units match. No conversion needed.
    [00031.2] engine_dump_snapshot: writing particle properties took 2161.787 ms.
    #   Step           Time      Time-step    Updates  g-Updates  s-Updates  Wall-clock time [ms]
           0   0.000000e+00   0.000000e+00    6387423          0          0             23812.652
    [00031.3] space_rebuild: (re)building space

    So it hangs somewhere in space_rebuild, if that helps.

  • Author Developer

    OK, that's quite odd. I was able to run 1000 steps without issue. The only place that uses the new re-entrant mapper is in runner_do_unskip, so if it does block, it should do so only at the beginning of a regular timestep.

  • Does this branch depart from the latest master? Or is it departing from an outdated commit?

  • Author Developer

    Latest master.

  • Author Developer

    Had another long, hard look and tried a few things, but even on my laptop with four cores, EAGLE_12 is a bit faster with this branch... Since the threads spin while waiting for work, using more threads than physical cores will make it slower (working threads compete with spinning threads), so I didn't pursue that further.

    @jwillis, can you comment out the if (e->verbose) at the bottom of engine_unskip and try both the commented-out threadpool_map code and the new threadpool_rmap code below it? If there's a large difference between the two, could you also run both master and this branch in VTune to see where all the time is going?

    Cheers!

  • [00004.5] engine_policy: engine policies are [  steal  keep  numa_affinity  hydro  ]
    [00004.5] hydro_props_print: Equation of state: Ideal gas.
    [00004.5] hydro_props_print: Adiabatic index gamma: 1.666667.
    [00004.5] hydro_props_print: Hydrodynamic scheme: Gadget-2 version of SPH (Springel 2005) in 3D.
    [00004.5] hydro_props_print: Hydrodynamic kernel: Cubic spline (M4) with eta=1.234800 (48.00 neighbours).
    [00004.5] hydro_props_print: Hydrodynamic relative tolerance in h: 0.00010 (+/- 0.0144 neighbours).
    [00004.5] hydro_props_print: Hydrodynamic integration: CFL parameter: 0.1000.
    [00004.5] hydro_props_print: Hydrodynamic integration: Max change of volume: 1.40 (max|dlog(h)/dt|=0.112157).
    [00004.5] engine_init: Absolute minimal timestep size: 6.938894e-20
    [00004.5] engine_init: Minimal timestep size (on time-line): 7.450580e-11
    [00004.5] engine_init: Maximal timestep size (on time-line): 7.812500e-05
    [New Thread 0x7fffad430700 (LWP 53229)]
    [New Thread 0x7fffaca2f700 (LWP 53230)]
    [New Thread 0x7fffac02e700 (LWP 53231)]
    [New Thread 0x7fffab62d700 (LWP 53232)]
    [New Thread 0x7fffaac2c700 (LWP 53233)]
    [New Thread 0x7fffaa22b700 (LWP 53234)]
    [New Thread 0x7fffa982a700 (LWP 53235)]
    [00004.6] main: engine_init took 16.951 ms.
    [00004.6] main: Running on 6387423 gas particles, 0 star particles and 0 DM particles (0 gravity particles)
    [00004.6] main: from t=0.000e+00 until t=1.000e-02 with 4 threads and 4 queues (dt_min=1.000e-10, dt_max=1.000e-04)...
    [00004.6] engine_init_particles: Computing initial gas densities.
    [00017.1] engine_init_particles: Converting internal energy variable.
    [00017.5] engine_init_particles: Running initial fake time-step.
    #   Step           Time      Time-step    Updates  g-Updates  s-Updates  Wall-clock time [ms]
           0   0.000000e+00   0.000000e+00    6387423          0          0             21993.252
    ^C
    Program received signal SIGINT, Interrupt.
    threadpool_rchomp (tp=0x7fffffff8f30, rmap_function=Unhandled dwarf expression opcode 0xf3
    ) at threadpool.c:183
    183	      if (tp->rmap_waiting == 0) return;
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.166.el6_7.7.x86_64 numactl-2.0.9-2.el6.x86_64 zlib-1.2.3-29.el6.x86_64
    (gdb) info threads
      8 Thread 0x7fffa982a700 (LWP 53235)  0x000000346360c5ac in pthread_barrier_wait () from /lib64/libpthread.so.0
      7 Thread 0x7fffaa22b700 (LWP 53234)  0x000000346360c5ac in pthread_barrier_wait () from /lib64/libpthread.so.0
      6 Thread 0x7fffaac2c700 (LWP 53233)  0x000000346360c5ac in pthread_barrier_wait () from /lib64/libpthread.so.0
      5 Thread 0x7fffab62d700 (LWP 53232)  0x000000346360c5ac in pthread_barrier_wait () from /lib64/libpthread.so.0
      4 Thread 0x7fffac02e700 (LWP 53231)  threadpool_rchomp (tp=0x7fffffff8f30, tid=Unhandled dwarf expression opcode 0xf3
    ) at threadpool.c:183
      3 Thread 0x7fffaca2f700 (LWP 53230)  threadpool_rchomp (tp=0x7fffffff8f30, tid=Unhandled dwarf expression opcode 0xf3
    ) at threadpool.c:183
      2 Thread 0x7fffad430700 (LWP 53229)  threadpool_rchomp (tp=0x7fffffff8f30, tid=Unhandled dwarf expression opcode 0xf3
    ) at threadpool.c:183
    * 1 Thread 0x7ffff686bb20 (LWP 53145)  threadpool_rchomp (tp=0x7fffffff8f30, rmap_function=Unhandled dwarf expression opcode 0xf3
    ) at threadpool.c:183
  • The threads seem to be stuck in pthread_barrier_wait, Pedro.

  • Author Developer

    Cool, thanks for tracking that down! It's weird that they're all stuck there, I'll have a closer look as soon as I can.

  • Pedro Gonnet added 1 commit:


    • 536f0d48 - mark the rmap_data as volatile, make sure the loop in threadpool_rchomp actually…
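
The commit above marks the shared data as volatile so that the waiting loop re-reads it. The sketch below illustrates the same concern, but uses C11 atomics instead of `volatile`, which is a swapped-in technique and not what the commit does. If the flag were a plain `int`, an optimizing compiler would be allowed to read it once and then spin forever. The names `waiting`, `finisher` and `spin_until_done` are hypothetical; `waiting` only stands in for a counter such as `tp->rmap_waiting`, and none of this is taken from threadpool.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared counter like tp->rmap_waiting. */
static atomic_int waiting;

/* Second thread: signals that no work is outstanding any more. */
static void *finisher(void *arg) {
  (void)arg;
  atomic_store(&waiting, 0);
  return NULL;
}

/* Spin until the other thread clears the flag. Every iteration performs
   a real load, so the store made by the other thread is eventually seen.
   Returns the final flag value, or -1 if the thread could not start. */
int spin_until_done(void) {
  pthread_t t;
  atomic_store(&waiting, 1);
  if (pthread_create(&t, NULL, finisher, NULL) != 0) return -1;
  while (atomic_load(&waiting) != 0) {
    /* busy-wait, as the threadpool threads do while waiting for work */
  }
  int rc = pthread_join(t, NULL);
  assert(rc == 0);
  (void)rc; /* keeps the build warning-free when NDEBUG is defined */
  return atomic_load(&waiting);
}
```

Atomics also give ordering guarantees that `volatile` alone does not, which is why they are used in this sketch. Whether that matters for the branch depends on code that is not shown in this thread.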