Strange memory errors on nodes where nr_cores is not a power of 2
I tried running SWIFT on 2 of our local computing nodes that have 24 cores. The program compiles nicely, but when I try to run it, it crashes:
...
main: nr_nodes is 1.
[000] engine_init: cpu map is [ 0 12 6 18 3 9 15 21 1 4 7 10 13 16 19 22 0 1 2 3 4 5 6 7 ].
engine.c:engine_init():2084: SWIFT was not compiled with MPI support.
Aborted (core dumped)
The message itself is accurate, since I always configure with the --disable-mpi flag. However, with nr_nodes=1 this message should never be printed, so this looks like a memory issue.
When I run the code with AddressSanitizer, it crashes with a segfault at
engine.c:2062: e->s = s;
This does not make any sense either: the engine is allocated at the start of test.c:main(), so this memory should be valid...
The totally funny part is that we also have 2 nodes with exactly the same system configuration, but with 16 and 32 cores instead of 24, and there everything works without any problem.
You can also see that the cpu map for the 24 core nodes does look strange:
[ 0 12 6 18 3 9 15 21 1 4 7 10 13 16 19 22 0 1 2 3 4 5 6 7 ]
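To make "strange" a bit more concrete, here is a minimal sketch that counts how often each core id appears in a cpu map. It is only a diagnostic I would use to check the log line above; check_cpu_map and cpu_map_24 are hypothetical names and not part of SWIFT, and the sketch says nothing about how engine_init actually builds the map.

```c
#include <stdio.h>
#include <stdlib.h>

/* cpu map copied verbatim from the engine_init log line of the 24-core node. */
const int cpu_map_24[24] = {0,  12, 6,  18, 3, 9, 15, 21, 1, 4, 7, 10,
                            13, 16, 19, 22, 0, 1, 2,  3,  4, 5, 6, 7};

/* Hypothetical helper (not SWIFT code): a valid map of n cores should be a
 * permutation of 0..n-1. Prints every core id that does not appear exactly
 * once and returns how many such ids there are. Returns -1 if n is not
 * positive, an entry is out of range, or the allocation fails. */
int check_cpu_map(const int *map, int n) {
  if (n <= 0) return -1;
  int *count = calloc((size_t)n, sizeof(int));
  if (count == NULL) return -1;

  /* Tally how many times each core id occurs in the map. */
  for (int k = 0; k < n; k++) {
    if (map[k] < 0 || map[k] >= n) {
      printf("entry %d: core id %d is out of range\n", k, map[k]);
      free(count);
      return -1;
    }
    count[map[k]]++;
  }

  /* Report ids that are missing (0 times) or duplicated (2+ times). */
  int bad = 0;
  for (int id = 0; id < n; id++) {
    if (count[id] != 1) {
      printf("core %2d appears %d times\n", id, count[id]);
      bad++;
    }
  }

  free(count);
  return bad;
}
```

Calling check_cpu_map(cpu_map_24, 24) on the map above flags 12 core ids: 0, 1, 3, 4, 6 and 7 appear twice, while 8, 11, 14, 17, 20 and 23 never appear. The first 16 entries look like a sensible interleaving, but the last 8 are just 0 to 7 again.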
So my best guess is that there is an issue when running SWIFT on nodes with a number of cores that is not a power of 2...