Most SPH codes rely on spatial trees to decompose the simulation space. This decomposition makes neighbour-finding simple, at the cost of computational efficiency. Neighbour-finding with a tree has an average computational cost of ~O(log N) and a worst-case behaviour of ~O(N\ :sup:`2/3`\), both of which grow with the total number of particles N. SWIFT's neighbour-finding algorithm, however, has a constant cost of ~O(1) per particle. This results from the way SWIFT decomposes its domain.
The space is divided up into a grid of rectangular cells with an edge length that is greater than or equal to the maximum smoothing length of any particle in the simulation, h\ :sub:`max`\ (See :ref:`cell_decomp`).
.. _cell_decomp:
.. figure:: InitialDecomp.png
:scale: 40 %
:align: center
:figclass: align-center
Figure 1: 2D Cell Decomposition
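A minimal sketch of how such a grid could be dimensioned is given below; the function and variable names are illustrative, not SWIFT's actual code. The number of cells along each dimension is the largest integer for which the edge length stays at or above h\ :sub:`max`\ .

.. code-block:: c

   #include <math.h>

   /* Illustrative sketch: choose the number of grid cells per dimension so
    * that each cell's edge length is >= h_max, the largest smoothing length
    * in the simulation. */
   static void grid_dimensions(const double box_size[3], double h_max,
                               int cdim[3], double cell_width[3]) {
     for (int k = 0; k < 3; k++) {
       cdim[k] = (int)floor(box_size[k] / h_max); /* as many cells as fit */
       if (cdim[k] < 1) cdim[k] = 1;              /* always at least one cell */
       cell_width[k] = box_size[k] / cdim[k];     /* edge >= h_max whenever
                                                     more than one cell fits */
     }
   }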
In this initial decomposition, if a particle p\ :sub:`j`\ is within range of a particle p\ :sub:`i`\, both will either be in the same cell (a self-interaction) or in neighbouring cells (a pair-interaction). Each cell then only has to compute its own self-interactions and its pair-interactions with neighbouring cells.
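In practice this means that all candidate neighbours of a particle can be gathered by visiting only the 3×3×3 block of cells centred on the particle's own cell. The sketch below is illustrative (the names are invented, and a non-periodic box is assumed for simplicity):

.. code-block:: c

   /* Visit the cell containing a particle (cell indices i, j, k) and its
    * up-to-26 neighbours: these are the only cells that can hold particles
    * within range of it. */
   static void for_each_candidate_cell(int i, int j, int k, const int cdim[3],
                                       void (*visit)(int ci, int cj, int ck)) {
     for (int di = -1; di <= 1; di++)
       for (int dj = -1; dj <= 1; dj++)
         for (int dk = -1; dk <= 1; dk++) {
           const int ci = i + di, cj = j + dj, ck = k + dk;
           if (ci < 0 || ci >= cdim[0] || cj < 0 || cj >= cdim[1] ||
               ck < 0 || ck >= cdim[2])
             continue; /* outside the box: no periodic wrapping in this sketch */
           visit(ci, cj, ck);
         }
   }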
The best-case scenario is when each cell only contains particles with a smoothing length equal to the cell edge length, and even then any given particle p\ :sub:`i`\ will only interact with 16% of the total number of particles in its own cell and the surrounding neighbours. This percentage decreases if the cell contains particles whose smoothing length is less than the cell edge length. The cell decomposition therefore needs to be refined recursively, bisecting a cell along each dimension if the following conditions are met (a sketch of this refinement follows the list):
1) The cell contains more than a minimum number of particles
2) The smoothing length of a reasonable number of particles within a cell is less than half the cell's edge length
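The following sketch shows how such a refinement criterion might look. All names are illustrative, and the second condition is simplified to requiring that *all* smoothing lengths in the cell are below half the edge length:

.. code-block:: c

   #define CELL_MIN_COUNT 400  /* assumed minimum particle count for a split */

   struct cell {
     int count;                /* number of particles in the cell */
     double h_max;             /* largest smoothing length in the cell */
     double width;             /* edge length of the cell */
     struct cell *progeny[8];  /* sub-cells, NULL for an unsplit cell */
   };

   /* Assumed helper that builds the k-th of the 8 sub-cells obtained by
    * bisecting the cell along each dimension. */
   extern struct cell *cell_make_subcell(struct cell *c, int k);

   /* Recursively refine a cell while splitting is worthwhile. */
   void cell_split(struct cell *c) {
     if (c->count > CELL_MIN_COUNT && c->h_max < 0.5 * c->width) {
       for (int k = 0; k < 8; k++) {
         c->progeny[k] = cell_make_subcell(c, k);
         cell_split(c->progeny[k]);
       }
     }
   }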
.. _split_cell:
.. figure:: SplitCell.png
:scale: 40 %
:align: center
:figclass: align-center
Figure 2: Refined Cell Decomposition
Once a cell has been split, its self-interaction can be decomposed into the self-interactions of its sub-cells and the corresponding pair-interactions between them (See :ref:`split_cell`). If a pair of split cells share a boundary and all particles in both cells have a smoothing length less than the cell edge length, then their pair-interaction can also be split up into the pair-interactions of the sub-cells spanning the boundary (See :ref:`split_pair`).
.. _split_pair:
.. figure:: SplitPair.png
:scale: 40 %
:align: center
:figclass: align-center
Figure 3: Split Cell Pair Interactions
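A sketch of the first of these decompositions, reusing the illustrative cell type from the earlier refinement sketch (``add_self_task()`` and ``add_pair_task()`` stand in for whatever mechanism records an interaction to be computed):

.. code-block:: c

   extern void add_self_task(struct cell *c);
   extern void add_pair_task(struct cell *ci, struct cell *cj);

   /* Decompose the self-interaction of a cell: an unsplit cell keeps a
    * single self-interaction, while a split cell is replaced by the
    * self-interactions of its 8 sub-cells plus the pair-interactions
    * between every pair of sub-cells. */
   void make_self_tasks(struct cell *c) {
     if (c->progeny[0] == NULL) {      /* unsplit cell */
       add_self_task(c);
       return;
     }
     for (int i = 0; i < 8; i++) {
       make_self_tasks(c->progeny[i]);
       for (int j = i + 1; j < 8; j++)
         add_pair_task(c->progeny[i], c->progeny[j]);
     }
   }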
When the cells' particle interactions are split up into self-interactions and pair-interactions, any two particles that are within range of each other will either share a cell for which a self-interaction is defined, or be located in neighbouring cells which share a pair-interaction. To determine which particles are within range of each other, it is therefore sufficient to traverse the list of self-interactions and pair-interactions and compute the interactions therein.
One of the biggest problems faced by many applications running on a shared-memory system is *load imbalance*, which occurs when the workload is not evenly distributed across the cores. The best-known paradigm for handling this type of parallel architecture is OpenMP, in which the programmer applies annotations to the code to indicate to the compiler which sections should be executed in parallel. If a ``for`` loop has been identified as a parallel section, the iterations of the loop are split between the available threads, each executing on a single core. Once all threads have terminated, the program becomes serial again and executes on a single thread; this technique is known as branch-and-bound parallelism, shown in :ref:`branch_and_bound`. Unfortunately, this implementation generally leads to low performance and poor scaling as the number of cores increases.
.. _branch_and_bound:
.. figure:: OMPScaling.png
:scale: 40 %
:align: center
:figclass: align-center
Figure 4: Branch-and-bound parallelism
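A minimal example of this pattern (the function and variable names are invented for the illustration): the iterations of the annotated loop are divided between the threads, and execution returns to a single thread at the loop's implicit barrier.

.. code-block:: c

   /* Loop-level OpenMP parallelism: the iterations are split between the
    * available threads. */
   void scale_densities(float *rho, int n, float factor) {
   #pragma omp parallel for
     for (int i = 0; i < n; i++)
       rho[i] *= factor;
     /* implicit barrier: from here on the program runs on a single thread */
   }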
Another disadvantage of this form of shared-memory parallelism is that there is no implicit handling of concurrency issues between threads. *Race conditions* can occur when two threads attempt to modify the same data simultaneously, unless explicit *critical* regions are defined which prevent more than one thread from executing the same code at the same time. These regions degrade parallel performance even further.
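A minimal example of the problem, again with invented names: without the *critical* region the update of ``total`` below is a race condition, and with it only one thread can perform the update at a time, which is precisely the serialisation cost described above.

.. code-block:: c

   /* Summing into a shared accumulator from several threads. The critical
    * region makes the update correct but effectively serialises the loop. */
   double total_mass(const double *mass, int n) {
     double total = 0.0;
   #pragma omp parallel for
     for (int i = 0; i < n; i++) {
   #pragma omp critical
       total += mass[i];
     }
     return total;
   }

For this particular pattern OpenMP's ``reduction`` clause avoids the critical region, but that option does not exist for more general concurrent updates.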
A better way to exploit shared-memory systems is to use an approach called *task-based parallelism*. This method describes the entire computation in a way that is more inherently parallelisable. The simulation is divided up into a set of computational tasks which are **dynamically** allocated to a number of processors. In order to ensure that the tasks are executed in the correct order and to avoid *race conditions*, *dependencies* between tasks are identified and strictly enforced by a task scheduler. A Directed Acyclic Graph (DAG) illustrates how a set of computational tasks are linked by dependencies. Processors traverse the graph in topological order, selecting and executing tasks that have no unresolved dependencies, or waiting until such tasks become available. This continues until all tasks have been completed. An example of a DAG can be seen in :ref:`DAG`; the figure represents tasks as circles, labelled A-E, and dependencies as arrows. Tasks B and C both depend on A, and D depends on B, whereas A and E are independent. On a shared-memory system, tasks A and E could therefore be executed first. Once task A is finished, tasks B and C become available for execution as their dependency on A has been resolved. Finally, task D can be executed after task B has completed.
.. _DAG:
.. figure:: TasksExample.png
:scale: 40 %
:align: center
:figclass: align-center
Figure 5: Tasks and Dependencies
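A minimal sketch of how a scheduler can track these dependencies is shown below; the names and fields are illustrative, not SWIFT's actual data structures. Each task counts its unresolved dependencies, and finishing a task decrements the counters of the tasks that depend on it.

.. code-block:: c

   struct task {
     int wait;               /* number of unresolved dependencies */
     int nr_unlocks;         /* number of tasks that depend on this one */
     struct task **unlocks;  /* the dependent tasks */
   };

   extern void enqueue(struct task *t);  /* assumed: hand a ready task to a worker */

   /* Called when a task finishes: resolve its outgoing dependencies. In a
    * real scheduler the decrement would have to be an atomic operation. */
   void task_done(struct task *t) {
     for (int k = 0; k < t->nr_unlocks; k++)
       if (--t->unlocks[k]->wait == 0)
         enqueue(t->unlocks[k]);
   }

In the example of :ref:`DAG`, tasks B, C and D would each start with ``wait == 1``, while A and E start at zero and can be enqueued immediately.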
The main advantages of using this approach are as follows:
* The order in which the tasks are processed is completely dynamic and adapts automatically to load imbalances.
* If the dependencies and conflicts are specified correctly, there is no need for the expensive explicit locking, synchronisation, or atomic operations used in OpenMP to deal with most concurrency problems.
* Each task has exclusive access to the data it is working on, thus improving cache locality and efficiency.
SWIFT modifies the task-based approach by introducing the concept of *conflicts* between tasks. Conflicts occur when two tasks operate on the same data, but the order in which they do so does not matter. :ref:`task_conflicts` illustrates tasks with conflicts: there is a conflict between tasks B and C, and between tasks D and E. In a parallel setup, once task A has finished executing, if one processor selects task B then no other processor is allowed to execute task C until task B has completed, or vice versa. Other task-based models use dependencies to model conflicts between tasks, which introduces an artificial ordering between the tasks and imposes unnecessary constraints on the task scheduler.
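One way such conflicts can be enforced is with non-blocking locks on the data (e.g. the cells) that each task touches; the sketch below is illustrative and all names are invented. A worker only claims a task if it can lock every cell the task uses; otherwise it releases whatever it has taken and tries another task, so conflicting tasks never run simultaneously yet no order is imposed on them.

.. code-block:: c

   #include <stdbool.h>

   struct cell;                               /* the data a task operates on */
   extern bool cell_trylock(struct cell *c);  /* assumed non-blocking lock */
   extern void cell_unlock(struct cell *c);

   struct task_cells {
     int nr_cells;                 /* 1 for a self-task, 2 for a pair-task */
     struct cell *cells[2];
   };

   /* Try to claim a task by locking all of its cells; on failure, roll back
    * and let the worker pick a different task. */
   bool task_try_claim(const struct task_cells *t) {
     for (int k = 0; k < t->nr_cells; k++) {
       if (!cell_trylock(t->cells[k])) {
         for (int j = 0; j < k; j++) cell_unlock(t->cells[j]);
         return false;
       }
     }
     return true;
   }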