Add option for particle type level default output behaviour to new output list scheme

We had planned to implement something similar, All_X, to prevent the writing of the particle group completely: #680 (closed). Let me think about whether this is preferrable - it does lead to, sometimes, incorrect values of NumPart_ThisFile and the dumping of unecessary cell metadata (which can be quite large relative to e.g. the black holes).

This or #680 (closed) are quite similar in spirit.

I would worry here that the Default in the name is similar to the Default used for the case where no output selection is used.

Also, indeed, it would be good to adjust the meta-data to save up a bit of space but also prevent problems with tools reading the data. For instance, the number of particles in the header should reflect the fact that a whole type of particles was cancelled.

Ah, I hadn't seen #680 (closed)... A few thoughts:

(i) I have no strong feelings whatsoever about the naming. Agreed with Matthieu that the double-use of Default may not be the most optimal choice.

(ii) I think one difference to #680 (closed) is that this explicitly does not override individual field selectors. It is meant only as a default setting; this would also make it easy to, say, only write gas particle coordinates (set Default_Gas: off, Coordinates_Gas: on).

(iii) Agreed, NumPart_ThisFile should be adjusted and the cell metadata reduced appropriately whenever all output fields are cancelled. I guess the question is how do we know that (in this approach) there is no additional xxx_Gas: on or that (in the current master) no field name has been omitted?

BTW, there is definitely significant overhead for black hole only dumps at the moment. Typically, transcribing all the black hole data into a single file reduces the size by a factor of ~3 or so...

I see, this is not a full overwrite but changes the default assumed behaviour from "on" to "off". Users can nevertheless re-activate a specific field. That has the drawback of working out whether anything of a given particle type is written a lot harder.

So, the question is whether this is a feature we want or is the simpler full-field on/off of #680 (closed) enough for normal use cases?

I'd argue it is a desirable feature, because it also makes writing limited-information snipshots much cleaner. I am having a look right now at determining whether a particle type is completely disabled.

There is a function in common_io that checks the fields for validity on startup that you may find instructive.

Thanks Josh! A couple of consistency questions:

(i) Are the entries from the new output selection YAML file tested for consistency somewhere?

(ii) In io_check_output_fields(), the code currently raises an error if there is a SelectOutput: section in the (main) YAML file; however the comment (and subsequent continue;) suggest that this should maybe just have been a warning...?

Sorry, ignore point (i) above. Obviously the whole point of this function is now to check this YAML file, rather than the standard YAML file... I guess this also means in (ii) above, the error should be an error (so no need for continue; afterwards).

I would think it is then the most straightforward option to just integrate the test for 'particle type is completely disabled' directly into io_check_output_fields(). My suggestion would be to make this function add one new parameter per section to the output_options->select_output structure, e.g. Snapshot:WriteAnyFields_Gas, set to 1 if there is at least one field to be written, and 0 if it is completely disabled. We could then quickly check at the beginning of the ptype-specific section in the output functions whether the particle type is fully disabled, and skip it (analogous to what happens already now if there are zero particles of this type).

One problem we have here is that this requires that at least one particle of each type exists, as far as I can tell. I think we can get away with an array in output_options->select_output of the length of the ptype enum.

But there would have to be a separate array per output 'class' (snapshot, snipshot, ...), or? Otherwise I agree, this is better.

Also, is there a reason for using &field_value[0] (in the call to parser_get_param_string() in io_check_output_fields()), rather than just field_value? I had thought the two were equivalent, but I may be wrong.

Just me being stupid :)

The problem with this is that it requires different choices based on whether we are using an output list or not.

So in the case of no output list, we know there is only one type of output selection.

This means we could have output_options->should_write_part_type or something as a list of bools that tell us whether or not that class of particles should be written ever.

In the case where we have an output list, then we need an array of these options, so something like output_options->should_write[output_type][part_type] Which is more messy, and duplicates the behaviour.

This is added to by the fact that output_options and output_list do not know about each other. I suppose what we could do is have a pointer to the output_options in the output_list, and store the information about the particle selections in output_options instead of output_list.

So what happens when there is no output list? I'd have thought that all outputs are set to the Default type in this case.

That's correct. There is a buffer, initially filled with Default, that gets overwritten if the output list is used. See e.g. https://gitlab.cosma.dur.ac.uk/swift/swiftsim/blob/master/src/single_io.c#L820

Can we not add a function that gets called at the start of the i/o to check whether an entire field is cancelled based on the current output options? Then set the NumPart of that ptype to 0 (if need be) which will then entirely skip the particle in the snapshot.

added 4 commits

02f4c896 - Update to determine a priori whether any fields of a ptype should be written.
f1928034 - Reset debug attempt.
e4266706 - Fix incorrect string length.
f2517f5c - Removed debugging messages

Compare with previous version

The last updates implements something that should work in principle, but I still need to test it. Note that it does not currently work in practice because of an issue in interfacing with the parser (under investigation).

The parser bug should now be fixed in master.

Add option for particle type level default output behaviour to new output list scheme

Merge request reports

Activity