Gpart mpi io
We can now do i/o with multiple particle types using the serial version. I may implement a parallel-HDF5 version later, but since we (I) still need to work on the i/o, I am not sure that is the highest priority.
The code crashes later in the exchange of strays but that's what @nnrw56 is working on.
The old SodShock will no longer run:
> mpirun -np 2 ../swift_mpi -t 4 -f sodShock.hdf5 -m 0.01 -w 5000 -c 1. -d 1e-7 -e 0.01
. . .
[0000] [00000.2] engine_init: Minimal timestep size (on time-line): 5.960464e-08
[0000] [00000.2] engine_init: Maximal timestep size (on time-line): 7.812500e-03
[0000] [00000.4] engine_split: Re-allocating parts array from 512064 to 614476.
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 47863001370560:
  #000: ../../../src/H5G.c line 463 in H5Gopen2(): unable to open group
    major: Symbol table
    minor: Can't open object
  #001: ../../../src/H5Gint.c line 320 in H5G__open_name(): group not found
    major: Symbol table
    minor: Object not found
  #002: ../../../src/H5Gloc.c line 430 in H5G_loc_find(): can't find object
    major: Symbol table
    minor: Object not found
  #003: ../../../src/H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #004: ../../../src/H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #005: ../../../src/H5Gloc.c line 385 in H5G_loc_find_cb(): object 'PartType1' doesn't exist
    major: Symbol table
    minor: Object not found
[0000] [00001.1] serial_io.c:write_output_serial():753: Error while opening particle group /PartType1.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
Added 1 commit:
- 6bf681f3 - Better check for whether or not to create the HDF5 groups for a given particle type
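For illustration only, the kind of check that commit message describes could look like the sketch below. It is not the code from 6bf681f3: the helper names `types_to_write()` and `group_name()` and the constant `NUM_PARTICLE_TYPES` are hypothetical, and the sketch assumes the decision is taken from the total particle counts over all ranks, so that every rank agrees on which `/PartTypeN` groups exist before anyone tries to open them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: Gadget-style layout with groups PartType0 .. PartType5. */
#define NUM_PARTICLE_TYPES 6

/* Hypothetical helper: collect the particle types that have at least one
 * particle in total (summed over all ranks). Returns how many types were
 * selected; their indices are stored in types_out[0 .. count-1]. */
int types_to_write(const long long N_total[NUM_PARTICLE_TYPES],
                   int types_out[NUM_PARTICLE_TYPES]) {
  assert(N_total != NULL && types_out != NULL);
  int count = 0;
  for (int ptype = 0; ptype < NUM_PARTICLE_TYPES; ++ptype) {
    /* Skip empty types: no group is created, so none must be opened. */
    if (N_total[ptype] > 0) types_out[count++] = ptype;
  }
  return count;
}

/* Hypothetical helper: build the group name for a given particle type,
 * e.g. "/PartType1". Returns the value returned by snprintf(). */
int group_name(char *buf, size_t size, int ptype) {
  return snprintf(buf, size, "/PartType%d", ptype);
}
```

With the counts from the run above (1024128 gas particles and 0 DM particles), only type 0 would be selected, so neither the writer that creates the groups nor the ranks that open them later would touch /PartType1.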
Yes, that is expected. Since the gparts are not (re-)distributed correctly over MPI, we end up with an incorrect number of particles on each rank and inconsistencies between parts and gparts. If you don't redistribute, everything works.
But I should have been clear about the fact that this only works in conjunction with @nnrw56's work on the exchanges.
Ran the test above and got:
. . .
[0000] [00000.4] engine_split: Re-allocating parts array from 512064 to 614476.
[0000] [00001.9] main: Running on 1024128 gas particles and 0 DM particles until t=1.000e+00 with 4 threads and 4 queues (dt_min=1.000e-07, dt_max=1.000e-02)...
[0000] [00001.9] engine_init_particles: Initialising particles
[0000] [00001.9] engine.c:engine_exchange_strays():659: Do not have a proxy for the requested nodeID 0 for part with id=47748781199296, x=[2.980766e-01,4.629061e-02,2.251668e-02].
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
It fails because engine_redistribute() does not exchange gparts. We fail before actually reaching the tasks part of the code, so the updates to exchange_strays() won't solve the problem.
I hacked something to make engine_redistribute() work with gparts; the i/o then works and we die later on in the code. The problem with my implementation of engine_redistribute() is that I don't deal with the linking of part-gpart. We need to agree on how to do this first.
Edited by Matthieu Schaller
Added 1 commit:
- 669d0999 - Temporary fix to preserve master
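As one possible starting point for the part-gpart linking question raised above, here is a minimal sketch of an index-based scheme. It is not what engine_redistribute() does: the struct layouts, the `part_index` field and both function names are hypothetical. The idea is to turn each gpart's pointer into an array index before the data moves (pointers are meaningless on another rank), then rebuild the pointers in both directions once the arrays are in place. The sketch assumes that the stored index is still valid in the destination parts array, which in a real redistribution would require re-basing the indices for each block that is sent.

```c
#include <assert.h>
#include <stddef.h>

struct gpart;

/* Hypothetical, stripped-down particle types. */
struct part {
  long long id;
  struct gpart *gpart; /* gravity counterpart of this gas particle */
};

struct gpart {
  struct part *part;    /* NULL for a DM particle without a gas partner */
  ptrdiff_t part_index; /* transfer-safe link: index into parts, or -1 */
};

/* Before moving the arrays: store each link as an index into parts. */
void gparts_pointers_to_indices(struct gpart *gparts, size_t Ngpart,
                                const struct part *parts) {
  for (size_t i = 0; i < Ngpart; ++i)
    gparts[i].part_index =
        (gparts[i].part != NULL) ? gparts[i].part - parts : -1;
}

/* After moving the arrays: rebuild the pointers in both directions. */
void relink_parts_and_gparts(struct part *parts, size_t Npart,
                             struct gpart *gparts, size_t Ngpart) {
  for (size_t k = 0; k < Npart; ++k) parts[k].gpart = NULL;
  for (size_t i = 0; i < Ngpart; ++i) {
    const ptrdiff_t idx = gparts[i].part_index;
    if (idx < 0) { /* DM particle: nothing to link */
      gparts[i].part = NULL;
      continue;
    }
    assert((size_t)idx < Npart); /* a link must stay inside the array */
    gparts[i].part = &parts[idx];
    parts[idx].gpart = &gparts[i];
  }
}
```

The bounds assertion doubles as a consistency check: if a rank ends up with gparts whose parts went elsewhere, which is the inconsistency described earlier in this thread, it trips immediately instead of leaving dangling pointers.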
Added 1 commit:
- ae787120 - Added one command-line option '-g' to switch on gravity (i.e. not de-allocate th…
Actually, I have modified main() by adding a command-line option.
- If you run normally, the gparts get de-allocated and we have the old behaviour with everything working smoothly.
- If you add -g, you switch on gravity, which will preserve the gparts and add the gravity policy to the mask.
This latter option should allow @tt and @jregan to keep working on their branch after they have merged master into theirs.
N_total[0] = Ngas;
N_total[1] = Ngpart - Ngas;
message("Read %lld gas particles and %lld DM particles from the ICs",
        N_total[0], N_total[1]);
#endif

/* MATTHIEU: Temporary fix to preserve master */
if (!with_gravity) {
  free(gparts);
Added 1 commit:
- 1a819dee - Unlink the gparts
N_total[0] = Ngas;
N_total[1] = Ngpart - Ngas;
message("Read %lld gas particles and %lld DM particles from the ICs",
        N_total[0], N_total[1]);
#endif

/* MATTHIEU: Temporary fix to preserve master */
if (!with_gravity) {
@nnrw56 Are you looking into adapting engine_redistribute()? Or should I give it a go?
mentioned in commit 34e76452
mentioned in issue #127 (closed)
mentioned in issue #130 (closed)