Skip to content

Draft: Runtime dependency check

Bert Vandenbroucke requested to merge runtime-dependency-check into master

I will just leave this here with some extra documentation so that I can pick this up when I get back.

This MR implements the long needed runtime dependency check that checks, at runtime, whether any of the task dependencies get violated. In principle, the task dependency graph looks something like this (for a single cell):

mini_graph_full

But, because of inactive particles, it sometimes looks like this in practice:

mini_graph_partial

Not unskipping tasks is equivalent to graying out parts of the graph, which also grays out (and effectively removes) the corresponding dependencies. In the example, task 2a (this could be a pair interaction between an active and inactive cell) is effectively loose (i.e. has no dependencies), so it can be executed before/after or (if the cell is not locked) while task4-6 run. This is wrong and is what we somehow want to detect at runtime.

The runtime dependency check starts from the basic assumption that the task graph for steps where all particles are active (as constructed in engine_maketasks) is complete and needs to be satisfied at all times. Based on this, we can construct a dependency mask for every task that marks all the tasks that need to run before that task, like this mask for task 5a:

mini_graph_check

If every cell stores a counter with the number of tasks of each type (and subtype) it already processed (this counter is actually already there if you use debugging checks - it is currently not used), then we could check that the counters marked by a tasks mask are all zero when the task starts. If this is not the case, then we know the graph was violated, and we detected a runtime dependency error.

To implement this kind of check, we have to make sure that each task has exactly one dependency mask. This is only true if the global graph (including all tasks at all levels and tasks involving proxies) is a Directed Acyclic Graph (DAG). This MR therefore includes some changes that convert the graph into a DAG by removing a circular dependency between the stars sort, stars density tasks, stars receive and stars sort. Cyclic dependencies between a task and itself (because of level recursion) are allowed, since those are easy to detect.

The second step (currently only partially implemented) is to convert the task dependencies as created during engine_maketasks into a dependency mask for every task. I have a simple Python code that can do this from the regular dependency_graph.csv output. However, when I try to apply the same logic to the actual task dependencies, I end up missing some dependencies because of inter-level dependencies. This can be fixed by first creating the equivalent of the global graph and then running the algorithm on that. So that is the next step.

In the current implementation, the task masks are constructed and written to a file for inspection, but they are not actually used. Once they are complete, they need to be stored in the engine, so that they are available to the runners at the start of a task in runner_main. Once that is done, the cell task counters need to be changed a bit so that they are updated recursively. With that, the runtime dependency check should work.

Merge request reports