diff --git a/doc/RTD/source/ParameterFiles/parameter_description.rst b/doc/RTD/source/ParameterFiles/parameter_description.rst
index 711dfb7b99938460038eb154456135bb9ad78f99..207c173001437f283688699fb507999c0225ec21 100644
--- a/doc/RTD/source/ParameterFiles/parameter_description.rst
+++ b/doc/RTD/source/ParameterFiles/parameter_description.rst
@@ -13,7 +13,7 @@ name followed by a column and the value of the parameter:
 
 .. code:: YAML
 
-   ICs:        santa_barbara.hdf5	  
+   ICs:        santa_barbara.hdf5
    dt_max:     1.5
    shift:      [2., 4., 5.]
 
@@ -46,7 +46,7 @@ will be raised. The code can also read an array of values:
 .. code:: YAML
 
    shift:  [2., 4., 5.]
-	  
+
 Some options in the parameter file are optional and
 when not provided, SWIFT will run with the default value. However, if
 a compulsory parameter is missing an error will be raised at
@@ -104,8 +104,8 @@ speed, we would use:
      UnitLength_in_cgs:   3.08567758e24 # 1 Mpc in centimeters
      UnitVelocity_in_cgs: 1e5           # 1 km/s in centimeters per second
      UnitCurrent_in_cgs:  1             # 1 Ampere
-     UnitTemp_in_cgs:     1             # 1 Kelvin   
-	  
+     UnitTemp_in_cgs:     1             # 1 Kelvin
+
 Note that there are currently no variables in any of the SWIFT physics
 schemes that make use of the unit of electric current. There is also
 no incentive to use anything else than Kelvin but that makes the whole
@@ -122,7 +122,7 @@ system <https://en.wikipedia.org/wiki/FFF_system>`_ one would use
      UnitLength_in_cgs:   20116.8     # 1 Furlong (fur) in cm
      UnitVelocity_in_cgs: 0.01663095  # 1 Furlong (fur) per Fortnight (ftn) in cm/s
      UnitCurrent_in_cgs:  1           # 1 Ampere
-     UnitTemp_in_cgs:     1           # 1 Kelvin   
+     UnitTemp_in_cgs:     1           # 1 Kelvin
 
 The value of the physical constants in this system is left as an
 exercise for the reader [#f1]_.
@@ -171,10 +171,10 @@ use the following parameters:
    Cosmology:
      a_begin:        0.0078125     # z = 127
      a_end:          1.0           # z = 0
-     h:              0.6777        
-     Omega_m:        0.307         
-     Omega_lambda:   0.693         
-     Omega_b:        0.0455        
+     h:              0.6777
+     Omega_m:        0.307
+     Omega_lambda:   0.693
+     Omega_b:        0.0455
      Omega_r:        0.            # (Optional)
      w_0:            -1.0          # (Optional)
      w_a:            0.            # (Optional)
@@ -192,7 +192,7 @@ provided in the ``Gravity`` section. The theory document puts these parameters i
 context of the equations being solved. We give a brief overview here.
 
 * The Plummer-equivalent co-moving softening length used for all particles :math:`\epsilon_{com}`: ``comoving_softening``,
-* The Plummer-equivalent maximal physical softening length used for all particles :math:`\epsilon_{max}`: ``comoving_softening``, 
+* The Plummer-equivalent maximal physical softening length used for all particles :math:`\epsilon_{max}`: ``comoving_softening``,
 
 At any redshift :math:`z`, the Plummer-equivalent softening length used by the
 code will be :math:`\epsilon=\min(\epsilon_{max},
@@ -200,7 +200,7 @@ code will be :math:`\epsilon=\min(\epsilon_{max},
 
 * The opening angle (multipole acceptance criterion) used in the FMM :math:`\theta`: ``theta``,
 * The time-step size pre-factor :math:`\eta`: ``eta``,
-  
+
 The time-step of a given particle is given by :math:`\Delta t =
 \eta\sqrt{\frac{\epsilon}{|\overrightarrow{a}|}}`, where
 :math:`\overrightarrow{a}` is the particle's acceleration. `Power et al. (2003) <http://adsabs.harvard.edu/abs/2003MNRAS.338...14P>`_ recommend using :math:`\eta=0.025`.
@@ -224,31 +224,31 @@ Particle-Mesh part of the calculation. The last three are optional:
 * The scale below which the short-range forces are assumed to be exactly Newtonian (in units of
   the mesh cell-size multiplied by :math:`a_{\rm smooth}`) :math:`r_{\rm
   cut,min}`: ``r_cut_min`` (default: ``0.1``),
-  
+
 For most runs, the default values can be used. Only the number of cells along
 each axis needs to be specified. The remaining three values are best described
 in the context of the full set of equations in the theory documents.
-  
+
 As a summary, here are the values used for the EAGLE :math:`100^3~{\rm Mpc}^3`
 simulation:
 
 .. code:: YAML
-	  
+
    # Parameters for the self-gravity scheme for the EAGLE-100 box
    Gravity:
-     eta:          0.025              
-     theta:        0.7                
+     eta:          0.025
+     theta:        0.7
      comoving_softening:     0.0026994  # 0.7 proper kpc at z=2.8.
      max_physical_softening: 0.0007     # 0.7 proper kpc
      rebuild_frequency:      0.01       # Default optional value
-     mesh_side_length:       512       
+     mesh_side_length:       512
      a_smooth:     1.25                 # Default optional value
      r_cut_max:    4.5                  # Default optional value
      r_cut_min:    0.1                  # Default optional value
 
 
 .. _Parameters_SPH:
-     
+
 SPH
 ---
 
@@ -312,7 +312,7 @@ Whilst for a cosmological run, one would need:
     max_dt_RMS_factor: 0.25     # Default optional value
 
 .. _Parameters_ICs:
-    
+
 Initial Conditions
 ------------------
 
@@ -368,21 +368,21 @@ be:
    InitialConditions:
      file_name:  my_ics.hdf5
      periodic:                    1
-     cleanup_h_factors:           1     
-     cleanup_velocity_factors:    1     
-     generate_gas_in_ics:         1     
-     cleanup_smoothing_lengths:   1  
+     cleanup_h_factors:           1
+     cleanup_velocity_factors:    1
+     generate_gas_in_ics:         1
+     cleanup_smoothing_lengths:   1
 
 
 .. _Parameters_constants:
-     
+
 Physical Constants
 ------------------
 
 For some idealised test it can be useful to overwrite the value of
 some physical constants; in particular the value of the gravitational
 constant. SWIFT offers an optional parameter to overwrite the value of
-:math:`G_N`. 
+:math:`G_N`.
 
 .. code:: YAML
 
@@ -475,7 +475,7 @@ one described above (See the :ref:`Parameters_units` section) and read:
 
 When un-specified, these all take the same value as assumed by the internal
 system of units. These are rarely used but can offer a practical alternative to
-converting data in the post-processing of the simulations. 
+converting data in the post-processing of the simulations.
 
 For a standard cosmological run with structure finding activated, the
 full section would be:
@@ -487,7 +487,7 @@ full section would be:
      scale_factor_first:  0.02    # z = 49
      delta_time:          1.02
      invoke_stf:          1
-	    
+
 Showing all the parameters for a basic hydro test-case, one would have:
 
 .. code:: YAML
@@ -513,7 +513,7 @@ following pages:
 
 
 .. _Parameters_statistics:
-  
+
 Statistics
 ----------
 
@@ -523,28 +523,28 @@ following page:
 * :ref:`Output_list_label` (to have statistics outputs not evenly spaced in time).
 
 .. _Parameters_restarts:
-  
+
 Restarts
 --------
 
 SWIFT can write check-pointing files and restart from them. The behaviour of
 this mechanism is driven by the options in the ``Restarts`` section of the YAML
 parameter file. All the parameters are optional but default to values that
-ensure a reasonable behaviour. 
+ensure a reasonable behaviour.
 
 * Whether or not to enable the dump of restart files: ``enable`` (default:
   ``1``).
 
 This parameter acts a master-switch for the check-pointing capabilities. All the
 other options require the ``enable`` parameter to be set to ``1``.
-  
+
 * Whether or not to save a copy of the previous set of check-pointing files:
   ``save`` (default: ``1``),
 * Whether or not to dump a set of restart file on regular exit: ``onexit``
   (default: ``0``),
 * The wall-clock time in hours between two sets of restart files:
   ``delta_hours`` (default: ``6.0``).
-  
+
 Note that there is no buffer time added to the ``delta_hours`` value. If the
 system's batch queue run time limit is set to 6 hours, the user must specify a
 smaller value to allow for enough time to safely dump the check-point files.
@@ -591,24 +591,120 @@ To run SWIFT, dumping check-pointing files every 6 hours and running for 24
 hours after which a shell command will be run, one would use:
 
 .. code:: YAML
-	  
+
   Restarts:
-    enable:             1          
+    enable:             1
     save:               1          # Keep copies
-    onexit:             0          
+    onexit:             0
     subdir:             restart    # Sub-directory of the directory where SWIFT is run
-    basename:           swift      
-    delta_hours:        6.0        
-    stop_steps:         100        
-    max_run_time:       24.0       # In hours 
-    resubmit_on_exit:   1          
-    resubmit_command:   ./resub.sh 
+    basename:           swift
+    delta_hours:        6.0
+    stop_steps:         100
+    max_run_time:       24.0       # In hours
+    resubmit_on_exit:   1
+    resubmit_command:   ./resub.sh
 
 .. _Parameters_scheduler:
 
 Scheduler
 ---------
 
+The Scheduler section contains various parameters that control how the cell
+tree is configured and defines some values for the related tasks.  In general
+these should be considered as tuning parameters, both for speed and memory
+use.
+
+.. code:: YAML
+
+   nr_queues: 0
+
+Defines the number of task queues used. These are normally set to one per
+thread and should be at least that number.
+
+A number of parameters decide how the cell tree will be split into sub-cells,
+according to the number of particles and their expected interaction count,
+and the type of interaction. These are:
+
+.. code:: YAML
+
+  cell_max_size:             8000000
+  cell_sub_size_pair_hydro:  256000000
+  cell_sub_size_self_hydro:  32000
+  cell_sub_size_pair_grav:   256000000
+  cell_sub_size_self_grav:   32000
+  cell_sub_size_pair_stars:  256000000
+  cell_sub_size_self_stars:  32000
+  cell_split_size:           400
+
+when possible cells that exceed these constraints will be split into a further
+level of sub-cells. So for instance a sub-cell should not contain more than
+400 particles (this number defines the scale of most `N*N` interactions).
+
+To control the number of self-gravity tasks we have the parameter:
+
+.. code:: YAML
+
+  cell_subdepth_diff_grav:   4
+
+which stops these from being done at the scale of the leaf cells, of which
+there can be a large number. In this case cells with gravity tasks must be at
+least 4 levels above the leaf cells (when possible).
+
+Extra space is required when particles are created in the system (to the time
+of the next rebuild). These are controlled by:
+
+.. code:: YAML
+
+  cell_extra_parts:          0
+  cell_extra_gparts:         0
+  cell_extra_sparts:         400
+
+
+The number of top-level cells is controlled by the parameter:
+
+.. code:: YAML
+
+  max_top_level_cells:       12
+
+this is the number per dimension, we will have 12x12x12 cells. There must be
+at least 3 top-level cells per dimension.
+
+The number of top-level cells should be set so that the number of particles
+per cell is not too large, this is particularly important when using MPI as
+this defines the maximum size of cell exchange and also the size of non-local
+cells (these are used for cell interactions with local cells), which can have
+a large influence on memory use. Best advice for this is to at least scale for
+additional nodes.
+
+The memory used for holding the task and task-link lists needs to be
+pre-allocated, but cannot be pre-calculated, so we have the two parameters:
+
+.. code:: YAML
+
+  tasks_per_cell:            0.0
+  links_per_tasks:           10
+
+which are guesses at the mean numbers of tasks per cell and number of links
+per task. The tasks_per_cell value will be conservatively guessed when set to
+0.0, but you will be able to save memory by setting a value. The way to get a
+better estimate is to run SWIFT with verbose reporting on (```--verbose=1```)
+and check for the lines that report the ```per cell``` or with MPI ``maximum
+per cell``` values. This number can vary as the balance between MPI ranks
+does, so it is probably best to leave some head room.
+
+If these are exceeded you should get an obvious error message.
+
+Finally the parameter:
+
+.. code:: YAML
+
+  mpi_message_limit:         4096
+
+Defines the size (in bytes) below which MPI communication will be sent using
+non-buffered calls. These should have lower latency, but how that works or
+is honoured is an implementation question.
+
+
 .. _Parameters_domain_decomposition:
 
 Domain Decomposition:
@@ -766,7 +862,7 @@ large numbers of particles can be exchanged between MPI ranks, so is best
 avoided.
 
 If you are using ParMETIS there additional ways that you can tune the
-repartition process. 
+repartition process.
 
 METIS only offers the ability to create a partition from a graph, which means
 that each solution is independent of those that have already been made, that
@@ -774,12 +870,12 @@ can make the exchange of particles very large (although SWIFT attempts to
 minimize this), however, using ParMETIS we can use the existing partition to
 inform the new partition, this has two algorithms that are controlled using::
 
-    adaptive:         1       
+    adaptive:         1
 
 which means use adaptive repartition, otherwise simple refinement. The
 adaptive algorithm is further controlled by the::
 
-    itr:              100     
+    itr:              100
 
 parameter, which defines the ratio of inter node communication time to data
 redistribution time, in the range 0.00001 to 10000000.0. Lower values give
@@ -807,7 +903,7 @@ enabled). This file can then be used to replace the same file found in the
 `src/` directory and SWIFT should then be recompiled. Once you have that, you
 can use the parameter::
 
-    use_fixed_costs:  1       
+    use_fixed_costs:  1
 
 to control whether they are used or not. If enabled these will be used to
 repartition after the second step, which will generally give as good a
@@ -835,4 +931,4 @@ Structure finding (VELOCIraptor)
 	 compare runs over the same physical time but with different numbers of
 	 snapshots. Snapshots at a given time would always have the same set of
 	 digits irrespective of the number of snapshots produced before.
-	       
+
diff --git a/doc/RTD/source/SubgridModels/EAGLE/index.rst b/doc/RTD/source/SubgridModels/EAGLE/index.rst
index a4069bda593204af1fb977e14ca57f5db157ee49..620572f962948308884414622ca2ae7d555d158f 100644
--- a/doc/RTD/source/SubgridModels/EAGLE/index.rst
+++ b/doc/RTD/source/SubgridModels/EAGLE/index.rst
@@ -28,7 +28,11 @@ only a small role in practice. The second limit, labelled as ``Jeans``, is
 used to prevent the fragmentation of high-density gas into clumps that
 cannot be resolved by the coupled hydro+gravity solver. The two limits are
 sketched on the following figure. An additional over-density criterion is
-applied to prevent gas not collapsed into structures from being affected.
+applied to prevent gas not collapsed into structures from being
+affected. This criterion demands that :math:`\rho > \Delta_{\rm floor}
+\Omega_b \rho_{\rm crit}`, with :math:`\Delta_{\rm floor}` specified by the
+user and :math:`\rho_{\rm crit}` the critical density at that redshift
+[#f1]_.
 
 .. figure:: EAGLE_entropy_floor.svg
     :width: 400px
@@ -53,7 +57,7 @@ limits. These are given in the ``EAGLEEntropyFloor`` section of the
 YAML file. The parameters are the Hydrogen number density (in
 :math:`cm^{-3}`) and temperature (in :math:`K`) of the anchor point of
 each floor as well as the power-law slope of each floor and the
-minimal over-density required to apply the limit [#f1]_. For a normal
+minimal over-density required to apply the limit. For a normal
 EAGLE run, that section of the parameter file reads:
 
 .. code:: YAML
diff --git a/doc/RTD/source/Task/index.rst b/doc/RTD/source/Task/index.rst
index e701e924a79f9256d4c0b034b6c15651f41ff405..82210895618618f87faaada472c72afe321b1d04 100644
--- a/doc/RTD/source/Task/index.rst
+++ b/doc/RTD/source/Task/index.rst
@@ -9,10 +9,12 @@ Task System
 This section of the documentation includes information on the task system
 available in SWIFT, as well as how to implement your own task.
 
-SWIFT produces at the beginning of each simulation a ``dot`` file (see the graphviz library for more information).
-It contains the full hierarchy of tasks used in this simulation.
+SWIFT can produce a graph containing all the dependencies using graphviz.
+At the beginning of each simulation a ``csv`` file is generated and can be transformed into a ``png`` with the script ``tools/plot_task_dependencies.py``.
+This script has also the possibility to generate a list of function calls for each task with the option ``--with-calls``.
 You can convert the ``dot`` file into a ``png`` with the following command
-``dot -Tpng dependency_graph.dot -o dependency_graph.png``.
+``dot -Tpng dependency_graph.dot -o dependency_graph.png`` or directly read it with the python module ``xdot`` with ``python -m xdot dependency_graph.dot``.
+
 
 .. toctree::
    :maxdepth: 2
diff --git a/examples/Cooling/CoolingBox/coolingBox.yml b/examples/Cooling/CoolingBox/coolingBox.yml
index 97b452e5ea376ab10bba155e0ff3e9acb8ed02bd..853e480cd4ffba4baa659232e6d5f068b4ea2815 100644
--- a/examples/Cooling/CoolingBox/coolingBox.yml
+++ b/examples/Cooling/CoolingBox/coolingBox.yml
@@ -48,10 +48,12 @@ GrackleCooling:
   ProvideVolumetricHeatingRates: 0 # User provide volumetric heating rates
   ProvideSpecificHeatingRates: 0 # User provide specific heating rates
   SelfShieldingMethod: 0 # Grackle (<= 3) or Gear self shielding method
-  OutputMode: 1 # Write in output corresponding primordial chemistry mode
   MaxSteps: 1000
   ConvergenceLimit: 1e-2
   
+GearChemistry:
+  InitialMetallicity: 0.01295
+
 EAGLECooling:
   dir_name:                ./coolingtables/
   H_reion_z:               11.5
@@ -71,5 +73,3 @@ EAGLEChemistry:             # Solar abundances
   init_abundance_Silicon:   6.825874e-4
   init_abundance_Iron:      1.1032152e-3
 
-GearChemistry:
-  InitialMetallicity: 0.01295
diff --git a/examples/Cooling/CoolingBox/energy_plot.py b/examples/Cooling/CoolingBox/energy_plot.py
deleted file mode 100644
index 45f0b4f6b11c3855a919f6a98fd0ca006a887f82..0000000000000000000000000000000000000000
--- a/examples/Cooling/CoolingBox/energy_plot.py
+++ /dev/null
@@ -1,128 +0,0 @@
-import matplotlib
-matplotlib.use("Agg")
-from pylab import *
-import h5py
-
-# Plot parameters
-params = {'axes.labelsize': 10,
-'axes.titlesize': 10,
-'font.size': 12,
-'legend.fontsize': 12,
-'xtick.labelsize': 10,
-'ytick.labelsize': 10,
-'text.usetex': True,
- 'figure.figsize' : (3.15,3.15),
-'figure.subplot.left'    : 0.145,
-'figure.subplot.right'   : 0.99,
-'figure.subplot.bottom'  : 0.11,
-'figure.subplot.top'     : 0.99,
-'figure.subplot.wspace'  : 0.15,
-'figure.subplot.hspace'  : 0.12,
-'lines.markersize' : 6,
-'lines.linewidth' : 3.,
-'text.latex.unicode': True
-}
-rcParams.update(params)
-rc('font',**{'family':'sans-serif','sans-serif':['Times']})
-
-
-import numpy as np
-import h5py as h5
-import sys
-
-# File containing the total energy
-stats_filename = "./energy.txt"
-
-# First snapshot
-snap_filename = "coolingBox_0000.hdf5"
-
-# Some constants in cgs units
-k_b = 1.38E-16 #boltzmann
-m_p = 1.67e-24 #proton mass
-
-# Initial conditions set in makeIC.py
-T_init = 1.0e5
-
-# Read the initial state of the gas
-f = h5.File(snap_filename,'r')
-rho = np.mean(f["/PartType0/Density"])
-pressure = np.mean(f["/PartType0/Pressure"])
-
-# Read the units parameters from the snapshot
-units = f["InternalCodeUnits"]
-unit_mass = units.attrs["Unit mass in cgs (U_M)"]
-unit_length = units.attrs["Unit length in cgs (U_L)"]
-unit_time = units.attrs["Unit time in cgs (U_t)"]
-
-# Read the properties of the cooling function
-parameters = f["Parameters"]
-cooling_lambda = float(parameters.attrs["LambdaCooling:lambda_cgs"])
-min_T = float(parameters.attrs["LambdaCooling:minimum_temperature"])
-mu = float(parameters.attrs["LambdaCooling:mean_molecular_weight"])
-X_H = float(parameters.attrs["LambdaCooling:hydrogen_mass_abundance"])
-
-# Read the adiabatic index
-gamma = float(f["HydroScheme"].attrs["Adiabatic index"])
-
-print "Initial density :", rho
-print "Initial pressure:", pressure
-print "Adiabatic index :", gamma
-
-# Read energy and time arrays
-array = np.genfromtxt(stats_filename,skip_header = 1)
-time = array[:,0]
-total_mass = array[:,1]
-total_energy = array[:,2]
-kinetic_energy = array[:,3]
-internal_energy = array[:,4]
-radiated_energy = array[:,8]
-initial_energy = total_energy[0]
-
-# Conversions to cgs
-rho_cgs = rho * unit_mass / (unit_length)**3
-time_cgs = time * unit_time
-total_energy_cgs = total_energy / total_mass[0] * unit_length**2 / (unit_time)**2
-kinetic_energy_cgs = kinetic_energy / total_mass[0] * unit_length**2 / (unit_time)**2
-internal_energy_cgs = internal_energy / total_mass[0] * unit_length**2 / (unit_time)**2
-radiated_energy_cgs = radiated_energy / total_mass[0] * unit_length**2 / (unit_time)**2  
-
-# Find the energy floor
-u_floor_cgs = k_b * min_T / (mu * m_p * (gamma - 1.))
-
-# Find analytic solution
-initial_energy_cgs = initial_energy/total_mass[0] * unit_length**2 / (unit_time)**2 
-n_H_cgs = X_H * rho_cgs / m_p
-du_dt_cgs = -cooling_lambda * n_H_cgs**2 / rho_cgs
-cooling_time_cgs = (initial_energy_cgs/(-du_dt_cgs))[0]
-analytic_time_cgs = np.linspace(0, cooling_time_cgs * 1.8, 1000)
-u_analytic_cgs = du_dt_cgs*analytic_time_cgs + initial_energy_cgs
-u_analytic_cgs[u_analytic_cgs < u_floor_cgs] = u_floor_cgs
-
-print "Cooling time:", cooling_time_cgs, "[s]"
-
-# Read snapshots
-u_snapshots_cgs = zeros(25)
-t_snapshots_cgs = zeros(25)
-for i in range(25):
-    snap = h5.File("coolingBox_%0.4d.hdf5"%i,'r')
-    u_snapshots_cgs[i] = sum(snap["/PartType0/InternalEnergy"][:] * snap["/PartType0/Masses"][:])  / total_mass[0] * unit_length**2 / (unit_time)**2
-    t_snapshots_cgs[i] = snap["/Header"].attrs["Time"] * unit_time
-
-
-figure()
-plot(time_cgs, total_energy_cgs, 'r-', lw=1.6, label="Gas total energy")
-plot(t_snapshots_cgs, u_snapshots_cgs, 'rD', ms=3)
-plot(time_cgs, radiated_energy_cgs, 'g-', lw=1.6, label="Radiated energy")
-plot(time_cgs, total_energy_cgs + radiated_energy_cgs, 'b-', lw=0.6, label="Gas total + radiated")
-
-plot(analytic_time_cgs, u_analytic_cgs, '--', color='k', alpha=0.8, lw=1.0, label="Analytic solution")
-
-legend(loc="upper right", fontsize=8, frameon=False, handlelength=3, ncol=1)
-xlabel("${\\rm{Time~[s]}}$", labelpad=0)
-ylabel("${\\rm{Energy~[erg]}}$")
-xlim(0, 1.5*cooling_time_cgs)
-ylim(0, 1.5*u_analytic_cgs[0])
-
-savefig("energy.png", dpi=200)
-
-
diff --git a/examples/Cooling/CoolingBox/makeIC.py b/examples/Cooling/CoolingBox/makeIC.py
index f863e174b1fcd404ae178fe324c7a165598b4af0..807ad5378d87891ed878a85c1020d5983474e13d 100644
--- a/examples/Cooling/CoolingBox/makeIC.py
+++ b/examples/Cooling/CoolingBox/makeIC.py
@@ -1,58 +1,69 @@
 ###############################################################################
- # This file is part of SWIFT.
- # Copyright (c) 2016 Stefan Arridge (stefan.arridge@durhama.ac.uk)
- #                    Matthieu Schaller (matthieu.schaller@durham.ac.uk)
- # 
- # This program is free software: you can redistribute it and/or modify
- # it under the terms of the GNU Lesser General Public License as published
- # by the Free Software Foundation, either version 3 of the License, or
- # (at your option) any later version.
- # 
- # This program is distributed in the hope that it will be useful,
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- # GNU General Public License for more details.
- # 
- # You should have received a copy of the GNU Lesser General Public License
- # along with this program.  If not, see <http://www.gnu.org/licenses/>.
- # 
- ##############################################################################
+# This file is part of SWIFT.
+# Copyright (c) 2016 Stefan Arridge (stefan.arridge@durhama.ac.uk)
+#                    Matthieu Schaller (matthieu.schaller@durham.ac.uk)
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published
+# by the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+##############################################################################
 
 import h5py
-import sys
-from numpy import *
+import numpy as np
 
 # Generates a SWIFT IC file with a constant density and pressure
 
 # Parameters
-periodic= 1           # 1 For periodic box
-boxSize = 1           # 1 kiloparsec    
-rho = 3.2e3           # Density in code units (3.2e6 is 0.1 hydrogen atoms per cm^3)
-P = 4.5e6             # Pressure in code units (at 10^5K)
-gamma = 5./3.         # Gas adiabatic index
-eta = 1.2349          # 48 ngbs with cubic spline kernel
-fileName = "coolingBox.hdf5" 
-
-#---------------------------------------------------
+periodic = 1  # 1 For periodic box
+boxSize = 1  # 1 kiloparsec
+rho = 3.2e3  # Density in code units (3.2e6 is 0.1 hydrogen atoms per cm^3)
+T = 4e3  # Initial Temperature
+gamma = 5./3.  # Gas adiabatic index
+fileName = "coolingBox.hdf5"
+# ---------------------------------------------------
+
+# defines some constants
+# need to be changed in plotTemperature.py too
+h_frac = 0.76
+mu = 4. / (1. + 3. * h_frac)
+
+m_h_cgs = 1.67e-24
+k_b_cgs = 1.38e-16
+
+# defines units
+unit_length = 3.0857e21  # kpc
+unit_mass = 2.0e33  # solar mass
+unit_time = 3.0857e16  # ~ Gyr
 
 # Read id, position and h from glass
 glass = h5py.File("glassCube_32.hdf5", "r")
 ids = glass["/PartType0/ParticleIDs"][:]
-pos = glass["/PartType0/Coordinates"][:,:] * boxSize
+pos = glass["/PartType0/Coordinates"][:, :] * boxSize
 h = glass["/PartType0/SmoothingLength"][:] * boxSize
 
 # Compute basic properties
-numPart = size(pos) / 3
+numPart = np.size(pos) // 3
 mass = boxSize**3 * rho / numPart
-internalEnergy = P / ((gamma - 1.) * rho)
+internalEnergy = k_b_cgs * T * mu / ((gamma - 1.) * m_h_cgs)
+internalEnergy *= (unit_time / unit_length)**2
 
-#File
-file = h5py.File(fileName, 'w')
+# File
+f = h5py.File(fileName, 'w')
 
 # Header
-grp = file.create_group("/Header")
+grp = f.create_group("/Header")
 grp.attrs["BoxSize"] = boxSize
-grp.attrs["NumPart_Total"] =  [numPart, 0, 0, 0, 0, 0]
+grp.attrs["NumPart_Total"] = [numPart, 0, 0, 0, 0, 0]
 grp.attrs["NumPart_Total_HighWord"] = [0, 0, 0, 0, 0, 0]
 grp.attrs["NumPart_ThisFile"] = [numPart, 0, 0, 0, 0, 0]
 grp.attrs["Time"] = 0.0
@@ -61,43 +72,43 @@ grp.attrs["MassTable"] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 grp.attrs["Flag_Entropy_ICs"] = 0
 
 # Runtime parameters
-grp = file.create_group("/RuntimePars")
+grp = f.create_group("/RuntimePars")
 grp.attrs["PeriodicBoundariesOn"] = periodic
 
 # Units
-grp = file.create_group("/Units")
-grp.attrs["Unit length in cgs (U_L)"] = 3.0857e21 
-grp.attrs["Unit mass in cgs (U_M)"] = 2.0e33 
-grp.attrs["Unit time in cgs (U_t)"] = 3.0857e16 
+grp = f.create_group("/Units")
+grp.attrs["Unit length in cgs (U_L)"] = unit_length
+grp.attrs["Unit mass in cgs (U_M)"] = unit_mass
+grp.attrs["Unit time in cgs (U_t)"] = unit_time
 grp.attrs["Unit current in cgs (U_I)"] = 1.
 grp.attrs["Unit temperature in cgs (U_T)"] = 1.
 
-#Particle group
-grp = file.create_group("/PartType0")
+# Particle group
+grp = f.create_group("/PartType0")
 
-v  = zeros((numPart, 3))
+v = np.zeros((numPart, 3))
 ds = grp.create_dataset('Velocities', (numPart, 3), 'f')
 ds[()] = v
 
-m = full((numPart, 1), mass)
-ds = grp.create_dataset('Masses', (numPart,1), 'f')
+m = np.full((numPart, 1), mass)
+ds = grp.create_dataset('Masses', (numPart, 1), 'f')
 ds[()] = m
 
-h = reshape(h, (numPart, 1))
+h = np.reshape(h, (numPart, 1))
 ds = grp.create_dataset('SmoothingLength', (numPart, 1), 'f')
 ds[()] = h
 
-u = full((numPart, 1), internalEnergy)
-ds = grp.create_dataset('InternalEnergy', (numPart,1), 'f')
+u = np.full((numPart, 1), internalEnergy)
+ds = grp.create_dataset('InternalEnergy', (numPart, 1), 'f')
 ds[()] = u
 
-ids = reshape(ids, (numPart, 1))
+ids = np.reshape(ids, (numPart, 1))
 ds = grp.create_dataset('ParticleIDs', (numPart, 1), 'L')
 ds[()] = ids
 
 ds = grp.create_dataset('Coordinates', (numPart, 3), 'd')
 ds[()] = pos
 
-file.close()
+f.close()
 
-print numPart
+print("Initial condition generated")
diff --git a/examples/Cooling/CoolingBox/plotEnergy.py b/examples/Cooling/CoolingBox/plotEnergy.py
new file mode 100644
index 0000000000000000000000000000000000000000..9c7af57d3d9dfdcfa222e9f77701f230d25f9ddc
--- /dev/null
+++ b/examples/Cooling/CoolingBox/plotEnergy.py
@@ -0,0 +1,110 @@
+from h5py import File
+import numpy as np
+import matplotlib
+from glob import glob
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+
+# Plot parameters
+params = {
+    'axes.labelsize': 10,
+    'axes.titlesize': 10,
+    'font.size': 12,
+    'legend.fontsize': 12,
+    'xtick.labelsize': 10,
+    'ytick.labelsize': 10,
+    'text.usetex': True,
+    'figure.figsize': (5, 5),
+    'figure.subplot.left': 0.145,
+    'figure.subplot.right': 0.99,
+    'figure.subplot.bottom': 0.11,
+    'figure.subplot.top': 0.99,
+    'figure.subplot.wspace': 0.15,
+    'figure.subplot.hspace': 0.12,
+    'lines.markersize': 6,
+    'lines.linewidth': 3.,
+}
+plt.rcParams.update(params)
+
+
+# Some constants in cgs units
+k_b_cgs = 1.38e-16  # boltzmann
+m_h_cgs = 1.67e-24  # proton mass
+
+
+# File containing the total energy
+stats_filename = "./energy.txt"
+
+# First snapshot
+snap_filename = "coolingBox_0000.hdf5"
+
+# Read the initial state of the gas
+f = File(snap_filename, 'r')
+
+# Read the units parameters from the snapshot
+units = f["InternalCodeUnits"]
+unit_mass = units.attrs["Unit mass in cgs (U_M)"]
+unit_length = units.attrs["Unit length in cgs (U_L)"]
+unit_time = units.attrs["Unit time in cgs (U_t)"]
+
+# Read the adiabatic index
+gamma = float(f["HydroScheme"].attrs["Adiabatic index"])
+
+
+def energyUnits(u):
+    """ Compute the temperature from the internal energy. """
+    u *= (unit_length / unit_time)**2
+    return u * m_h_cgs / k_b_cgs
+
+
+# Read energy and time arrays
+array = np.genfromtxt(stats_filename, skip_header=1)
+time = array[:, 0] * unit_time
+total_mass = array[:, 1]
+total_energy = array[:, 2]
+kinetic_energy = array[:, 3]
+internal_energy = array[:, 4]
+radiated_energy = array[:, 8]
+initial_energy = total_energy[0]
+
+# Conversions to cgs
+total_energy_cgs = total_energy / total_mass[0]
+total_energy_cgs = energyUnits(total_energy_cgs)
+
+kinetic_energy_cgs = kinetic_energy / total_mass[0]
+kinetic_energy_cgs = energyUnits(kinetic_energy_cgs)
+
+internal_energy_cgs = internal_energy / total_mass[0]
+internal_energy_cgs = energyUnits(internal_energy_cgs)
+
+radiated_energy_cgs = radiated_energy / total_mass[0]
+radiated_energy_cgs = energyUnits(radiated_energy_cgs)
+
+# Read snapshots
+files = glob("coolingBox_*.hdf5")
+N = len(files)
+temp_snap = np.zeros(N)
+time_snap_cgs = np.zeros(N)
+for i in range(N):
+    snap = File(files[i], 'r')
+    u = snap["/PartType0/InternalEnergy"][:] * snap["/PartType0/Masses"][:]
+    u = sum(u) / total_mass[0]
+    temp_snap[i] = energyUnits(u)
+    time_snap_cgs[i] = snap["/Header"].attrs["Time"] * unit_time
+
+
+plt.figure()
+
+Myr_in_yr = 3.15e13
+plt.plot(time, total_energy_cgs, 'r-', lw=1.6, label="Gas total energy")
+plt.plot(time_snap_cgs, temp_snap, 'rD', ms=3)
+plt.plot(time, radiated_energy_cgs, 'g-', lw=1.6, label="Radiated energy")
+plt.plot(time, total_energy_cgs + radiated_energy_cgs, 'b-',
+         lw=0.6, label="Gas total + radiated")
+
+plt.legend(loc="right", fontsize=8, frameon=False,
+           handlelength=3, ncol=1)
+plt.xlabel("${\\rm{Time~[Myr]}}$", labelpad=0)
+plt.ylabel("${\\rm{Internal ~Energy ~(u ~m_H / k_B) ~[K]}}$")
+
+plt.savefig("energy.png", dpi=200)
diff --git a/examples/Cooling/CoolingBox/run.sh b/examples/Cooling/CoolingBox/run.sh
index 8a0d717e49324cdc49301fa0c551d5b7da198d5e..ae5b6d361e3364028864882d3400e702c8e670fb 100755
--- a/examples/Cooling/CoolingBox/run.sh
+++ b/examples/Cooling/CoolingBox/run.sh
@@ -21,7 +21,7 @@ then
 fi
 
 # Run SWIFT
-../../swift --cosmology --hydro --cooling --threads=4 -n 1000 coolingBox.yml
+../../swift --hydro --cooling --threads=4 -n 1000 coolingBox.yml
 
 # Check energy conservation and cooling rate
-python energy_plot.py
+python plotEnergy.py
diff --git a/examples/GiantImpacts/GiantImpact/run.sh b/examples/GiantImpacts/GiantImpact/run.sh
index 0324e72deed978b1095b15532e7c48eecc1a4276..5a4f0a74dd098b1fff4659a7d72be3845ad47fc6 100755
--- a/examples/GiantImpacts/GiantImpact/run.sh
+++ b/examples/GiantImpacts/GiantImpact/run.sh
@@ -14,7 +14,7 @@ then
     echo "Fetching equations of state tables for the Uranus impact example..."
     ./get_eos_tables.sh
 fi
-cd ../GiantImpactUranus
+cd ../GiantImpact
 
 # Run SWIFT
 ../../swift -s -G -t 8 uranus_1e6.yml 2>&1 | tee output.log
diff --git a/examples/HydroTests/VacuumSpherical_3D/getGlass.sh b/examples/HydroTests/VacuumSpherical_3D/getGlass.sh
old mode 100755
new mode 100644
diff --git a/examples/Makefile.am b/examples/Makefile.am
index bfab73c0e4709b20de3733b08e4fb99901cb996f..1c8415e1302df828133d0459d7dc9e9b4d73ffcb 100644
--- a/examples/Makefile.am
+++ b/examples/Makefile.am
@@ -59,7 +59,7 @@ swift_mpi_CFLAGS = $(MYFLAGS) $(AM_CFLAGS) $(MPI_FLAGS) -DENGINE_POLICY="engine_
 swift_mpi_LDADD =  ../src/.libs/libswiftsim_mpi.a ../argparse/.libs/libargparse.a $(MPI_LIBS) $(EXTRA_LIBS)
 
 # Scripts to generate ICs
-EXTRA_DIST = Cooling/CoolingBox/coolingBox.yml Cooling/CoolingBox/energy_plot.py Cooling/CoolingBox/makeIC.py Cooling/CoolingBox/run.sh \
+EXTRA_DIST = Cooling/CoolingBox/coolingBox.yml Cooling/CoolingBox/plotEnergy.py Cooling/CoolingBox/makeIC.py Cooling/CoolingBox/run.sh Cooling/CoolingBox/getGlass.sh \
              Cosmology/ConstantCosmoVolume/run.sh Cosmology/ConstantCosmoVolume/makeIC.py Cosmology/ConstantCosmoVolume/plotSolution.py Cosmology/ConstantCosmoVolume/constant_volume.yml \
              Cosmology/ZeldovichPancake_3D/makeIC.py Cosmology/ZeldovichPancake_3D/zeldovichPancake.yml Cosmology/ZeldovichPancake_3D/run.sh Cosmology/ZeldovichPancake_3D/plotSolution.py \
              EAGLE_low_z/EAGLE_6/eagle_6.yml EAGLE_low_z/EAGLE_6/getIC.sh EAGLE_low_z/EAGLE_6/README EAGLE_low_z/EAGLE_6/run.sh \
diff --git a/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/plotTempEvolution.py b/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/plotTempEvolution.py
index 988ea36163203a50928cc7fd8f9c81f4d3a377ff..a3458ac1598e5657f3f597dfb10b36a7a641e68f 100644
--- a/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/plotTempEvolution.py
+++ b/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/plotTempEvolution.py
@@ -164,7 +164,9 @@ legend(loc="upper left", frameon=False, handlelength=1.5)
 if cooling_model == "Constant Lambda":
     text(1e-2, 6e4, "$\Lambda_{\\rm const}/n_{\\rm H}^2 = %.1f\\times10^{%d}~[\\rm{cgs}]$"%(Lambda/10.**(int(log10(Lambda))), log10(Lambda)), fontsize=7)
 elif cooling_model == "EAGLE":
-    text(1e-2, 6e4, "EAGLE (Wiersma et al. (2009)")
+    text(1e-2, 6e4, "EAGLE (Wiersma et al. 2009)")
+elif cooling_model == b"Grackle":
+    text(1e-2, 6e4, "Grackle (Smith et al. 2016)")
 else:
     text(1e-2, 6e4, "No cooling")
     
@@ -187,7 +189,7 @@ minorticks_off()
 xlabel("${\\rm Redshift}~z$", labelpad=0)
 ylabel("${\\rm Temperature}~T~[{\\rm K}]$", labelpad=0)
 xlim(9e-3, 1.1)
-ylim(20, 2.5e7)
+ylim(5, 2.5e7)
 
 savefig("Temperature_evolution.png", dpi=200)
 
diff --git a/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/run.sh b/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/run.sh
index 5ce9e2bb3b544eb915f189d4aa417fcc9c316b64..a7ae9dab54975efaf523de323a127b8134663544 100755
--- a/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/run.sh
+++ b/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/run.sh
@@ -7,6 +7,11 @@ then
     ./getIC.sh
 fi
 
+if [ ! -e CloudyData_UVB=HM2012.h5 ]
+then
+    ../../Cooling/getCoolingTable.sh 
+fi
+
 if [ ! -e coolingtables ]
 then
     echo "Fetching cooling tables for the small cosmological volume example..."
diff --git a/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/small_cosmo_volume.yml b/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/small_cosmo_volume.yml
index 60046cadc3bd3af324b5f967c76315cf5a27fe52..96f10465410ac120b30904a8655da4d8133d09bd 100644
--- a/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/small_cosmo_volume.yml
+++ b/examples/SmallCosmoVolume/SmallCosmoVolume_cooling/small_cosmo_volume.yml
@@ -16,7 +16,7 @@ Cosmology:                      # WMAP9 cosmology
 
 # Parameters governing the time integration
 TimeIntegration:
-  dt_min:     1e-6 
+  dt_min:     1e-8
   dt_max:     1e-2 
 
 # Parameters for the self-gravity scheme
@@ -83,3 +83,18 @@ EAGLEChemistry:
   init_abundance_Magnesium: 0.0
   init_abundance_Silicon:   0.0
   init_abundance_Iron:      0.0
+
+# Cooling with Grackle 3.0
+GrackleCooling:
+  CloudyTable: CloudyData_UVB=HM2012.h5 # Name of the Cloudy Table (available on the grackle bitbucket repository)
+  WithUVbackground: 1                   # Enable or not the UV background
+  Redshift: -1                           # Redshift to use (-1 means time based redshift)
+  WithMetalCooling: 1                   # Enable or not the metal cooling
+  ProvideVolumetricHeatingRates: 0      # (optional) User provide volumetric heating rates
+  ProvideSpecificHeatingRates: 0        # (optional) User provide specific heating rates
+  SelfShieldingMethod: 0                # (optional) Grackle (<= 3) or Gear self shielding method
+  MaxSteps: 10000                       # (optional) Max number of step when computing the initial composition
+  ConvergenceLimit: 1e-2                # (optional) Convergence threshold (relative) for initial composition
+
+GearChemistry:
+  InitialMetallicity: 0.01295
diff --git a/examples/SubgridTests/SupernovaeFeedback/getGlass.sh b/examples/SubgridTests/SupernovaeFeedback/getGlass.sh
old mode 100644
new mode 100755
diff --git a/examples/parameter_example.yml b/examples/parameter_example.yml
index 423e40d3ba5aa18822b3e185783730161ecdf11d..22bbf3db4f4f49f1cce6c1aa817b8228f829437f 100644
--- a/examples/parameter_example.yml
+++ b/examples/parameter_example.yml
@@ -71,7 +71,7 @@ Scheduler:
   cell_extra_gparts:         0         # (Optional) Number of spare gparts per top-level allocated at rebuild time for on-the-fly creation.
   cell_extra_sparts:         400       # (Optional) Number of spare sparts per top-level allocated at rebuild time for on-the-fly creation.
   max_top_level_cells:       12        # (Optional) Maximal number of top-level cells in any dimension. The number of top-level cells will be the cube of this (this is the default value).
-  tasks_per_cell:            0         # (Optional) The average number of tasks per cell. If not large enough the simulation will fail (means guess...).
+  tasks_per_cell:            0.0       # (Optional) The average number of tasks per cell. If not large enough the simulation will fail (means guess...).
   links_per_tasks:           10        # (Optional) The average number of links per tasks (before adding the communication tasks). If not large enough the simulation will fail (means guess...). Defaults to 10.
   mpi_message_limit:         4096      # (Optional) Maximum MPI task message size to send non-buffered, KB.
 
@@ -297,7 +297,6 @@ GrackleCooling:
   ProvideVolumetricHeatingRates: 0      # (optional) User provide volumetric heating rates
   ProvideSpecificHeatingRates: 0        # (optional) User provide specific heating rates
   SelfShieldingMethod: 0                # (optional) Grackle (<= 3) or Gear self shielding method
-  OutputMode: 0                         # (optional) Write in output corresponding primordial chemistry mode
   MaxSteps: 10000                       # (optional) Max number of step when computing the initial composition
   ConvergenceLimit: 1e-2                # (optional) Convergence threshold (relative) for initial composition
 
diff --git a/src/collectgroup.c b/src/collectgroup.c
index 0b7b419b565612149fd2b295116b37aa65aa01e9..ddf3e35d945fd8b07cc927d8ba383963c7558cd2 100644
--- a/src/collectgroup.c
+++ b/src/collectgroup.c
@@ -41,6 +41,9 @@ struct mpicollectgroup1 {
   integertime_t ti_hydro_end_min;
   integertime_t ti_gravity_end_min;
   int forcerebuild;
+  long long total_nr_cells;
+  long long total_nr_tasks;
+  float tasks_per_cell_max;
 };
 
 /* Forward declarations. */
@@ -93,6 +96,9 @@ void collectgroup1_apply(struct collectgroup1 *grp1, struct engine *e) {
   e->nr_inhibited_gparts = grp1->g_inhibited;
   e->nr_inhibited_sparts = grp1->s_inhibited;
   e->forcerebuild = grp1->forcerebuild;
+  e->total_nr_cells = grp1->total_nr_cells;
+  e->total_nr_tasks = grp1->total_nr_tasks;
+  e->tasks_per_cell_max = grp1->tasks_per_cell_max;
 }
 
 /**
@@ -101,37 +107,39 @@ void collectgroup1_apply(struct collectgroup1 *grp1, struct engine *e) {
  * @param grp1 The #collectgroup1 to initialise
  * @param updated the number of updated hydro particles on this node this step.
  * @param g_updated the number of updated gravity particles on this node this
- * step.
+ *                  step.
  * @param s_updated the number of updated star particles on this node this step.
  * @param inhibited the number of inhibited hydro particles on this node this
- * step.
+ *                  step.
  * @param g_inhibited the number of inhibited gravity particles on this node
- * this step.
+ *                    this step.
  * @param s_inhibited the number of inhibited star particles on this node this
- * step.
+ *                    step.
  * @param ti_hydro_end_min the minimum end time for next hydro time step after
- * this step.
+ *                         this step.
  * @param ti_hydro_end_max the maximum end time for next hydro time step after
- * this step.
+ *                         this step.
  * @param ti_hydro_beg_max the maximum begin time for next hydro time step after
- * this step.
+ *                         this step.
  * @param ti_gravity_end_min the minimum end time for next gravity time step
- * after this step.
+ *                           after this step.
  * @param ti_gravity_end_max the maximum end time for next gravity time step
- * after this step.
+ *                           after this step.
  * @param ti_gravity_beg_max the maximum begin time for next gravity time step
- * after this step.
+ *                           after this step.
  * @param forcerebuild whether a rebuild is required after this step.
+ * @param total_nr_cells total number of all cells on rank.
+ * @param total_nr_tasks total number of tasks on rank.
+ * @param tasks_per_cell the used number of tasks per cell.
  */
-void collectgroup1_init(struct collectgroup1 *grp1, size_t updated,
-                        size_t g_updated, size_t s_updated, size_t inhibited,
-                        size_t g_inhibited, size_t s_inhibited,
-                        integertime_t ti_hydro_end_min,
-                        integertime_t ti_hydro_end_max,
-                        integertime_t ti_hydro_beg_max,
-                        integertime_t ti_gravity_end_min,
-                        integertime_t ti_gravity_end_max,
-                        integertime_t ti_gravity_beg_max, int forcerebuild) {
+void collectgroup1_init(
+    struct collectgroup1 *grp1, size_t updated, size_t g_updated,
+    size_t s_updated, size_t inhibited, size_t g_inhibited, size_t s_inhibited,
+    integertime_t ti_hydro_end_min, integertime_t ti_hydro_end_max,
+    integertime_t ti_hydro_beg_max, integertime_t ti_gravity_end_min,
+    integertime_t ti_gravity_end_max, integertime_t ti_gravity_beg_max,
+    int forcerebuild, long long total_nr_cells, long long total_nr_tasks,
+    float tasks_per_cell) {
 
   grp1->updated = updated;
   grp1->g_updated = g_updated;
@@ -146,6 +154,9 @@ void collectgroup1_init(struct collectgroup1 *grp1, size_t updated,
   grp1->ti_gravity_end_max = ti_gravity_end_max;
   grp1->ti_gravity_beg_max = ti_gravity_beg_max;
   grp1->forcerebuild = forcerebuild;
+  grp1->total_nr_cells = total_nr_cells;
+  grp1->total_nr_tasks = total_nr_tasks;
+  grp1->tasks_per_cell_max = tasks_per_cell;
 }
 
 /**
@@ -171,6 +182,9 @@ void collectgroup1_reduce(struct collectgroup1 *grp1) {
   mpigrp11.ti_hydro_end_min = grp1->ti_hydro_end_min;
   mpigrp11.ti_gravity_end_min = grp1->ti_gravity_end_min;
   mpigrp11.forcerebuild = grp1->forcerebuild;
+  mpigrp11.total_nr_cells = grp1->total_nr_cells;
+  mpigrp11.total_nr_tasks = grp1->total_nr_tasks;
+  mpigrp11.tasks_per_cell_max = grp1->tasks_per_cell_max;
 
   struct mpicollectgroup1 mpigrp12;
   if (MPI_Allreduce(&mpigrp11, &mpigrp12, 1, mpicollectgroup1_type,
@@ -187,6 +201,9 @@ void collectgroup1_reduce(struct collectgroup1 *grp1) {
   grp1->ti_hydro_end_min = mpigrp12.ti_hydro_end_min;
   grp1->ti_gravity_end_min = mpigrp12.ti_gravity_end_min;
   grp1->forcerebuild = mpigrp12.forcerebuild;
+  grp1->total_nr_cells = mpigrp12.total_nr_cells;
+  grp1->total_nr_tasks = mpigrp12.total_nr_tasks;
+  grp1->tasks_per_cell_max = mpigrp12.tasks_per_cell_max;
 
 #endif
 }
@@ -221,6 +238,14 @@ static void doreduce1(struct mpicollectgroup1 *mpigrp11,
   /* Everyone must agree to not rebuild. */
   if (mpigrp11->forcerebuild || mpigrp12->forcerebuild)
     mpigrp11->forcerebuild = 1;
+
+  /* Totals of all tasks and cells. */
+  mpigrp11->total_nr_cells += mpigrp12->total_nr_cells;
+  mpigrp11->total_nr_tasks += mpigrp12->total_nr_tasks;
+
+  /* Maximum value of tasks_per_cell. */
+  mpigrp11->tasks_per_cell_max =
+      max(mpigrp11->tasks_per_cell_max, mpigrp12->tasks_per_cell_max);
 }
 
 /**
diff --git a/src/collectgroup.h b/src/collectgroup.h
index b6e8769ac993cc023ae402cdfc4b0169406f6181..3e430b58db05b563f96149d1ae21039444a03640 100644
--- a/src/collectgroup.h
+++ b/src/collectgroup.h
@@ -46,19 +46,25 @@ struct collectgroup1 {
 
   /* Force the engine to rebuild? */
   int forcerebuild;
+
+  /* Totals of cells and tasks. */
+  long long total_nr_cells;
+  long long total_nr_tasks;
+
+  /* Maximum value of actual tasks per cell across all ranks. */
+  float tasks_per_cell_max;
 };
 
 void collectgroup_init(void);
 void collectgroup1_apply(struct collectgroup1 *grp1, struct engine *e);
-void collectgroup1_init(struct collectgroup1 *grp1, size_t updated,
-                        size_t g_updated, size_t s_updated, size_t inhibited,
-                        size_t g_inhibited, size_t s_inhibited,
-                        integertime_t ti_hydro_end_min,
-                        integertime_t ti_hydro_end_max,
-                        integertime_t ti_hydro_beg_max,
-                        integertime_t ti_gravity_end_min,
-                        integertime_t ti_gravity_end_max,
-                        integertime_t ti_gravity_beg_max, int forcerebuild);
+void collectgroup1_init(
+    struct collectgroup1 *grp1, size_t updated, size_t g_updated,
+    size_t s_updated, size_t inhibited, size_t g_inhibited, size_t s_inhibited,
+    integertime_t ti_hydro_end_min, integertime_t ti_hydro_end_max,
+    integertime_t ti_hydro_beg_max, integertime_t ti_gravity_end_min,
+    integertime_t ti_gravity_end_max, integertime_t ti_gravity_beg_max,
+    int forcerebuild, long long total_nr_cells, long long total_nr_tasks,
+    float tasks_per_cell);
 void collectgroup1_reduce(struct collectgroup1 *grp1);
 
 #endif /* SWIFT_COLLECTGROUP_H */
diff --git a/src/cooling.h b/src/cooling.h
index c1a78e256fdd77fcb1f5cde074f843bd16d412ec..875ef5054491f783d526e7c8e2caf3e005c8a5a0 100644
--- a/src/cooling.h
+++ b/src/cooling.h
@@ -27,6 +27,12 @@
 /* Config parameters. */
 #include "../config.h"
 
+/* Local includes */
+#include "parser.h"
+#include "physical_constants.h"
+#include "restart.h"
+#include "units.h"
+
 /* Import the right cooling definition */
 #if defined(COOLING_NONE)
 #include "./cooling/none/cooling.h"
diff --git a/src/cooling/Compton/cooling.h b/src/cooling/Compton/cooling.h
index 3251166b821f6bc77d86405f34d8f9540a5d9915..c796375c33c586e2c9a95515f4124062a41640eb 100644
--- a/src/cooling/Compton/cooling.h
+++ b/src/cooling/Compton/cooling.h
@@ -19,6 +19,14 @@
 #ifndef SWIFT_COOLING_COMPTON_H
 #define SWIFT_COOLING_COMPTON_H
 
+/**
+ * @file src/cooling/Compton/cooling.h
+ * @brief Routines related to the "Compton" cooling function.
+ *
+ * This model compute the cooling rate from the Compton interaction with
+ * the CMB photons.
+ */
+
 /* Config parameters. */
 #include "../config.h"
 
@@ -27,7 +35,7 @@
 #include <math.h>
 
 /* Local includes. */
-#include "const.h"
+#include "entropy_floor.h"
 #include "error.h"
 #include "hydro.h"
 #include "parser.h"
@@ -122,6 +130,7 @@ __attribute__((always_inline)) INLINE static double Compton_cooling_rate_cgs(
  * @param us The internal system of units.
  * @param cosmo The current cosmological model.
  * @param hydro_props The properties of the hydro scheme.
+ * @param floor_props Properties of the entropy floor.
  * @param cooling The #cooling_function_data used in the run.
  * @param p Pointer to the particle data.
  * @param xp Pointer to the particle' extended data.
@@ -133,6 +142,7 @@ __attribute__((always_inline)) INLINE static void cooling_cool_part(
     const struct unit_system* restrict us,
     const struct cosmology* restrict cosmo,
     const struct hydro_props* hydro_props,
+    const struct entropy_floor_properties* floor_props,
     const struct cooling_function_data* restrict cooling,
     struct part* restrict p, struct xpart* restrict xp, const float dt,
     const float dt_therm) {
@@ -140,9 +150,6 @@ __attribute__((always_inline)) INLINE static void cooling_cool_part(
   /* Nothing to do here? */
   if (dt == 0.) return;
 
-  /* Internal energy floor */
-  const float u_floor = hydro_props->minimal_internal_energy;
-
   /* Current energy */
   const float u_old = hydro_get_physical_internal_energy(p, xp, cosmo);
 
@@ -164,11 +171,22 @@ __attribute__((always_inline)) INLINE static void cooling_cool_part(
 
   /* We now need to check that we are not going to go below any of the limits */
 
+  /* Limit imposed by the entropy floor */
+  const float A_floor = entropy_floor(p, cosmo, floor_props);
+  const float rho = hydro_get_physical_density(p, cosmo);
+  const float u_floor = gas_internal_energy_from_entropy(rho, A_floor);
+
+  /* Absolute minimum */
+  const float u_minimal = hydro_props->minimal_internal_energy;
+
+  /* Largest of both limits */
+  const float u_limit = max(u_minimal, u_floor);
+
   /* First, check whether we may end up below the minimal energy after
    * this step 1/2 kick + another 1/2 kick that could potentially be for
    * a time-step twice as big. We hence check for 1.5 delta_t. */
-  if (u_old + total_du_dt * 1.5 * dt_therm < u_floor) {
-    total_du_dt = (u_floor - u_old) / (1.5f * dt_therm);
+  if (u_old + total_du_dt * 1.5 * dt_therm < u_limit) {
+    total_du_dt = (u_limit - u_old) / (1.5f * dt_therm);
   }
 
   /* Second, check whether the energy used in the prediction could get negative.
diff --git a/src/cooling/EAGLE/cooling.c b/src/cooling/EAGLE/cooling.c
index d657beeaa4f36d1efd3c5cc407f8c1d689037693..87c12e73d970b4d1caa6ce33c20666556463e08c 100644
--- a/src/cooling/EAGLE/cooling.c
+++ b/src/cooling/EAGLE/cooling.c
@@ -36,6 +36,7 @@
 #include "cooling_rates.h"
 #include "cooling_struct.h"
 #include "cooling_tables.h"
+#include "entropy_floor.h"
 #include "error.h"
 #include "exp10.h"
 #include "hydro.h"
@@ -444,17 +445,19 @@ INLINE static double bisection_iter(
  * @param us The internal system of units.
  * @param cosmo The current cosmological model.
  * @param hydro_properties the hydro_props struct
+ * @param floor_props Properties of the entropy floor.
  * @param cooling The #cooling_function_data used in the run.
  * @param p Pointer to the particle data.
  * @param xp Pointer to the extended particle data.
  * @param dt The cooling time-step of this particle.
  * @param dt_therm The hydro time-step of this particle.
  */
-void cooling_cool_part(const struct phys_const *restrict phys_const,
-                       const struct unit_system *restrict us,
-                       const struct cosmology *restrict cosmo,
-                       const struct hydro_props *restrict hydro_properties,
-                       const struct cooling_function_data *restrict cooling,
+void cooling_cool_part(const struct phys_const *phys_const,
+                       const struct unit_system *us,
+                       const struct cosmology *cosmo,
+                       const struct hydro_props *hydro_properties,
+                       const struct entropy_floor_properties *floor_props,
+                       const struct cooling_function_data *cooling,
                        struct part *restrict p, struct xpart *restrict xp,
                        const float dt, const float dt_therm) {
 
@@ -596,11 +599,22 @@ void cooling_cool_part(const struct phys_const *restrict phys_const,
 
   /* We now need to check that we are not going to go below any of the limits */
 
+  /* Limit imposed by the entropy floor */
+  const double A_floor = entropy_floor(p, cosmo, floor_props);
+  const double rho = hydro_get_physical_density(p, cosmo);
+  const double u_floor = gas_internal_energy_from_entropy(rho, A_floor);
+
+  /* Absolute minimum */
+  const double u_minimal = hydro_properties->minimal_internal_energy;
+
+  /* Largest of both limits */
+  const double u_limit = max(u_minimal, u_floor);
+
   /* First, check whether we may end up below the minimal energy after
    * this step 1/2 kick + another 1/2 kick that could potentially be for
    * a time-step twice as big. We hence check for 1.5 delta_u. */
-  if (u_start + 1.5 * delta_u < hydro_properties->minimal_internal_energy) {
-    delta_u = (hydro_properties->minimal_internal_energy - u_start) / 1.5;
+  if (u_start + 1.5 * delta_u < u_limit) {
+    delta_u = (u_limit - u_start) / 1.5;
   }
 
   /* Second, check whether the energy used in the prediction could get negative.
diff --git a/src/cooling/EAGLE/cooling.h b/src/cooling/EAGLE/cooling.h
index a811007136168646a98df0c0da7b43227de082f8..d95c75e58aecfd8fb4816f1e50c7a3f379a08e51 100644
--- a/src/cooling/EAGLE/cooling.h
+++ b/src/cooling/EAGLE/cooling.h
@@ -26,20 +26,22 @@
 
 /* Local includes. */
 #include "cooling_struct.h"
-#include "cosmology.h"
-#include "hydro_properties.h"
-#include "part.h"
-#include "physical_constants.h"
-#include "units.h"
+
+struct part;
+struct xpart;
+struct cosmology;
+struct hydro_props;
+struct entropy_floor_properties;
 
 void cooling_update(const struct cosmology *cosmo,
                     struct cooling_function_data *cooling);
 
-void cooling_cool_part(const struct phys_const *restrict phys_const,
-                       const struct unit_system *restrict us,
-                       const struct cosmology *restrict cosmo,
-                       const struct hydro_props *restrict hydro_properties,
-                       const struct cooling_function_data *restrict cooling,
+void cooling_cool_part(const struct phys_const *phys_const,
+                       const struct unit_system *us,
+                       const struct cosmology *cosmo,
+                       const struct hydro_props *hydro_properties,
+                       const struct entropy_floor_properties *floor_props,
+                       const struct cooling_function_data *cooling,
                        struct part *restrict p, struct xpart *restrict xp,
                        const float dt, const float dt_therm);
 
diff --git a/src/cooling/const_du/cooling.h b/src/cooling/const_du/cooling.h
index 15fa614198687a7aa1b71a789de9190cc34a276a..15eecc43093e9cc571d166aa49392e7dfea60c11 100644
--- a/src/cooling/const_du/cooling.h
+++ b/src/cooling/const_du/cooling.h
@@ -28,17 +28,20 @@
  * This is the simplest possible cooling function. A constant cooling rate
  * (du/dt) with a minimal energy floor is applied. Should be used as a template
  * for more realistic functions.
+ *
+ * This cooling model does NOT include cosmological terms and will hence yield
+ * an incorrect answer when running in co-moving coordinates.
  */
 
 /* Config parameters. */
 #include "../config.h"
 
 /* Some standard headers. */
+#include <float.h>
 #include <math.h>
 
 /* Local includes. */
-#include "const.h"
-#include "error.h"
+#include "entropy_floor.h"
 #include "hydro.h"
 #include "parser.h"
 #include "part.h"
@@ -67,6 +70,7 @@ INLINE static void cooling_update(const struct cosmology* cosmo,
  * @param us The internal system of units.
  * @param cosmo The current cosmological model.
  * @param hydro_props The properties of the hydro scheme.
+ * @param floor_props Properties of the entropy floor.
  * @param cooling The #cooling_function_data used in the run.
  * @param p Pointer to the particle data.
  * @param xp Pointer to the extended particle data.
@@ -78,6 +82,7 @@ __attribute__((always_inline)) INLINE static void cooling_cool_part(
     const struct unit_system* restrict us,
     const struct cosmology* restrict cosmo,
     const struct hydro_props* hydro_props,
+    const struct entropy_floor_properties* floor_props,
     const struct cooling_function_data* restrict cooling,
     struct part* restrict p, struct xpart* restrict xp, const float dt,
     const float dt_therm) {
diff --git a/src/cooling/const_lambda/cooling.h b/src/cooling/const_lambda/cooling.h
index eef5ad3c5d3fead96dcd962cb57d04482795d62a..974b055b1f8942f53b72e6ccf17389d8f85b666e 100644
--- a/src/cooling/const_lambda/cooling.h
+++ b/src/cooling/const_lambda/cooling.h
@@ -37,6 +37,7 @@
 
 /* Local includes. */
 #include "const.h"
+#include "entropy_floor.h"
 #include "error.h"
 #include "hydro.h"
 #include "parser.h"
@@ -99,6 +100,7 @@ __attribute__((always_inline)) INLINE static double cooling_rate_cgs(
  * @param us The internal system of units.
  * @param cosmo The current cosmological model.
  * @param hydro_props The properties of the hydro scheme.
+ * @param floor_props Properties of the entropy floor.
  * @param cooling The #cooling_function_data used in the run.
  * @param p Pointer to the particle data.
  * @param xp Pointer to the particle' extended data.
@@ -110,6 +112,7 @@ __attribute__((always_inline)) INLINE static void cooling_cool_part(
     const struct unit_system* restrict us,
     const struct cosmology* restrict cosmo,
     const struct hydro_props* hydro_props,
+    const struct entropy_floor_properties* floor_props,
     const struct cooling_function_data* restrict cooling,
     struct part* restrict p, struct xpart* restrict xp, const float dt,
     const float dt_therm) {
@@ -117,9 +120,6 @@ __attribute__((always_inline)) INLINE static void cooling_cool_part(
   /* Nothing to do here? */
   if (dt == 0.) return;
 
-  /* Internal energy floor */
-  const float u_floor = hydro_props->minimal_internal_energy;
-
   /* Current energy */
   const float u_old = hydro_get_physical_internal_energy(p, xp, cosmo);
 
@@ -141,11 +141,22 @@ __attribute__((always_inline)) INLINE static void cooling_cool_part(
 
   /* We now need to check that we are not going to go below any of the limits */
 
+  /* Limit imposed by the entropy floor */
+  const float A_floor = entropy_floor(p, cosmo, floor_props);
+  const float rho = hydro_get_physical_density(p, cosmo);
+  const float u_floor = gas_internal_energy_from_entropy(rho, A_floor);
+
+  /* Absolute minimum */
+  const float u_minimal = hydro_props->minimal_internal_energy;
+
+  /* Largest of both limits */
+  const float u_limit = max(u_minimal, u_floor);
+
   /* First, check whether we may end up below the minimal energy after
    * this step 1/2 kick + another 1/2 kick that could potentially be for
    * a time-step twice as big. We hence check for 1.5 delta_t. */
-  if (u_old + total_du_dt * 1.5 * dt_therm < u_floor) {
-    total_du_dt = (u_floor - u_old) / (1.5f * dt_therm);
+  if (u_old + total_du_dt * 1.5 * dt_therm < u_limit) {
+    total_du_dt = (u_limit - u_old) / (1.5f * dt_therm);
   }
 
   /* Second, check whether the energy used in the prediction could get negative.
diff --git a/src/cooling/grackle/cooling.h b/src/cooling/grackle/cooling.h
index 058ed387bf2097ce789a3807641c3b92b9bc7af0..2632de7e223306a6c6400e350f8cb62a62e58206 100644
--- a/src/cooling/grackle/cooling.h
+++ b/src/cooling/grackle/cooling.h
@@ -24,9 +24,6 @@
  * @brief Cooling using the GRACKLE 3.0 library.
  */
 
-/* Config parameters. */
-#include "../config.h"
-
 /* Some standard headers. */
 #include <float.h>
 #include <math.h>
@@ -37,6 +34,7 @@
 /* Local includes. */
 #include "chemistry.h"
 #include "cooling_io.h"
+#include "entropy_floor.h"
 #include "error.h"
 #include "hydro.h"
 #include "parser.h"
@@ -48,6 +46,20 @@
 #define GRACKLE_NPART 1
 #define GRACKLE_RANK 3
 
+/* prototype */
+static gr_float cooling_time(
+    const struct phys_const* restrict phys_const,
+    const struct unit_system* restrict us,
+    const struct cosmology* restrict cosmo,
+    const struct cooling_function_data* restrict cooling,
+    const struct part* restrict p, const struct xpart* restrict xp);
+static gr_float cooling_new_energy(
+    const struct phys_const* restrict phys_const,
+    const struct unit_system* restrict us,
+    const struct cosmology* restrict cosmo,
+    const struct cooling_function_data* restrict cooling,
+    const struct part* restrict p, struct xpart* restrict xp, double dt);
+
 /**
  * @brief Common operations performed on the cooling function at a
  * given time-step or redshift.
@@ -57,24 +69,13 @@
  */
 INLINE static void cooling_update(const struct cosmology* cosmo,
                                   struct cooling_function_data* cooling) {
-  // Add content if required.
+  /* set current time */
+  if (cooling->redshift == -1)
+    cooling->units.a_value = cosmo->a;
+  else
+    cooling->units.a_value = 1. / (1. + cooling->redshift);
 }
 
-/* prototypes */
-static gr_float cooling_time(
-    const struct phys_const* restrict phys_const,
-    const struct unit_system* restrict us,
-    const struct cosmology* restrict cosmo,
-    const struct cooling_function_data* restrict cooling,
-    const struct part* restrict p, struct xpart* restrict xp);
-
-static double cooling_rate(const struct phys_const* restrict phys_const,
-                           const struct unit_system* restrict us,
-                           const struct cosmology* restrict cosmo,
-                           const struct cooling_function_data* restrict cooling,
-                           const struct part* restrict p,
-                           struct xpart* restrict xp, double dt);
-
 /**
  * @brief Print the chemical network
  *
@@ -176,7 +177,7 @@ __attribute__((always_inline)) INLINE static void cooling_compute_equilibrium(
   const double alpha = 0.01;
   double dt =
       fabs(cooling_time(phys_const, us, cosmo, &cooling_tmp, &p_tmp, xp));
-  cooling_rate(phys_const, us, cosmo, &cooling_tmp, &p_tmp, xp, dt);
+  cooling_new_energy(phys_const, us, cosmo, &cooling_tmp, &p_tmp, xp, dt);
   dt = alpha *
        fabs(cooling_time(phys_const, us, cosmo, &cooling_tmp, &p_tmp, xp));
 
@@ -192,7 +193,7 @@ __attribute__((always_inline)) INLINE static void cooling_compute_equilibrium(
     old = *xp;
 
     /* update chemistry */
-    cooling_rate(phys_const, us, cosmo, &cooling_tmp, &p_tmp, xp, dt);
+    cooling_new_energy(phys_const, us, cosmo, &cooling_tmp, &p_tmp, xp, dt);
   } while (step < max_step && !cooling_converged(xp, &old, conv_limit));
 
   if (step == max_step)
@@ -289,7 +290,8 @@ __attribute__((always_inline)) INLINE static void cooling_print_backend(
   message("\tLength       = %g", cooling->units.length_units);
   message("\tDensity      = %g", cooling->units.density_units);
   message("\tTime         = %g", cooling->units.time_units);
-  message("\tScale Factor = %g", cooling->units.a_units);
+  message("\tScale Factor = %g (units: %g)", cooling->units.a_value,
+          cooling->units.a_units);
 }
 
 /**
@@ -486,7 +488,8 @@ __attribute__((always_inline)) INLINE static void cooling_print_backend(
   cooling_copy_from_grackle3(data, p, xp, rho);
 
 /**
- * @brief Compute the cooling rate and update the particle chemistry data
+ * @brief Compute the energy of a particle after dt and update the particle
+ * chemistry data
  *
  * @param phys_const The physical constants in internal units.
  * @param us The internal system of units.
@@ -497,25 +500,15 @@ __attribute__((always_inline)) INLINE static void cooling_print_backend(
  *
  * @return du / dt
  */
-__attribute__((always_inline)) INLINE static gr_float cooling_rate(
+__attribute__((always_inline)) INLINE static gr_float cooling_new_energy(
     const struct phys_const* restrict phys_const,
     const struct unit_system* restrict us,
     const struct cosmology* restrict cosmo,
     const struct cooling_function_data* restrict cooling,
     const struct part* restrict p, struct xpart* restrict xp, double dt) {
 
-  if (cosmo->Omega_m != 0. || cosmo->Omega_r != 0. || cosmo->Omega_k != 0. ||
-      cosmo->Omega_lambda != 0. || cosmo->Omega_b != 0.)
-    error(
-        "Check cosmology factors (physical vs. co-moving and drifted vs. "
-        "un-drifted)!");
-
   /* set current time */
   code_units units = cooling->units;
-  if (cooling->redshift == -1)
-    units.a_value = cosmo->a;
-  else
-    units.a_value = 1. / (1. + cooling->redshift);
 
   /* initialize data */
   grackle_field_data data;
@@ -534,8 +527,7 @@ __attribute__((always_inline)) INLINE static gr_float cooling_rate(
 
   /* general particle data */
   gr_float density = hydro_get_physical_density(p, cosmo);
-  const double energy_before =
-      hydro_get_drifted_physical_internal_energy(p, cosmo);
+  const float energy_before = hydro_get_physical_internal_energy(p, xp, cosmo);
   gr_float energy = energy_before;
 
   /* initialize density */
@@ -554,8 +546,7 @@ __attribute__((always_inline)) INLINE static gr_float cooling_rate(
 
   /* solve chemistry */
   chemistry_data chemistry_grackle = cooling->chemistry;
-  chemistry_data_storage chemistry_rates = grackle_rates;
-  if (local_solve_chemistry(&chemistry_grackle, &chemistry_rates, &units, &data,
+  if (local_solve_chemistry(&chemistry_grackle, &grackle_rates, &units, &data,
                             dt) == 0) {
     error("Error in solve_chemistry.");
   }
@@ -563,8 +554,7 @@ __attribute__((always_inline)) INLINE static gr_float cooling_rate(
   /* copy from grackle data to particle */
   cooling_copy_from_grackle(data, p, xp, density);
 
-  /* compute rate */
-  return (energy - energy_before) / dt;
+  return energy;
 }
 
 /**
@@ -581,14 +571,10 @@ __attribute__((always_inline)) INLINE static gr_float cooling_time(
     const struct unit_system* restrict us,
     const struct cosmology* restrict cosmo,
     const struct cooling_function_data* restrict cooling,
-    const struct part* restrict p, struct xpart* restrict xp) {
+    const struct part* restrict p, const struct xpart* restrict xp) {
 
   /* set current time */
   code_units units = cooling->units;
-  if (cooling->redshift == -1)
-    error("TODO time dependant redshift");
-  else
-    units.a_value = 1. / (1. + cooling->redshift);
 
   /* initialize data */
   grackle_field_data data;
@@ -606,7 +592,7 @@ __attribute__((always_inline)) INLINE static gr_float cooling_time(
 
   /* general particle data */
   const gr_float energy_before =
-      hydro_get_drifted_physical_internal_energy(p, cosmo);
+      hydro_get_physical_internal_energy(p, xp, cosmo);
   gr_float density = hydro_get_physical_density(p, cosmo);
   gr_float energy = energy_before;
 
@@ -658,29 +644,55 @@ __attribute__((always_inline)) INLINE static void cooling_cool_part(
     const struct unit_system* restrict us,
     const struct cosmology* restrict cosmo,
     const struct hydro_props* hydro_props,
+    const struct entropy_floor_properties* floor_props,
     const struct cooling_function_data* restrict cooling,
     struct part* restrict p, struct xpart* restrict xp, double dt,
     double dt_therm) {
 
-  if (cosmo->Omega_m != 0. || cosmo->Omega_r != 0. || cosmo->Omega_k != 0. ||
-      cosmo->Omega_lambda != 0. || cosmo->Omega_b != 0.)
-    error(
-        "Check cosmology factors (physical vs. co-moving and drifted vs. "
-        "un-drifted)!");
-
+  /* Nothing to do here? */
   if (dt == 0.) return;
 
-  /* Current du_dt */
+  /* Current energy */
+  const float u_old = hydro_get_physical_internal_energy(p, xp, cosmo);
+
+  /* Current du_dt in physical coordinates (internal units) */
   const float hydro_du_dt = hydro_get_physical_internal_energy_dt(p, cosmo);
 
-  /* compute cooling rate */
-  const float du_dt = cooling_rate(phys_const, us, cosmo, cooling, p, xp, dt);
+  /* Calculate energy after dt */
+  gr_float u_new =
+      cooling_new_energy(phys_const, us, cosmo, cooling, p, xp, dt);
+
+  float delta_u = u_new - u_old + hydro_du_dt * dt_therm;
+
+  /* We now need to check that we are not going to go below any of the limits */
+
+  /* First, check whether we may end up below the minimal energy after
+   * this step 1/2 kick + another 1/2 kick that could potentially be for
+   * a time-step twice as big. We hence check for 1.5 delta_u. */
+  if (u_old + 1.5 * delta_u < hydro_props->minimal_internal_energy) {
+    delta_u = (hydro_props->minimal_internal_energy - u_old) / 1.5;
+  }
+
+  /* Second, check whether the energy used in the prediction could get negative.
+   * We need to check for the 1/2 dt kick followed by a full time-step drift
+   * that could potentially be for a time-step twice as big. We hence check
+   * for 2.5 delta_u but this time against 0 energy not the minimum.
+   * To avoid numerical rounding bringing us below 0., we add a tiny tolerance.
+   */
+  const float rounding_tolerance = 1.0e-4;
+
+  if (u_old + 2.5 * delta_u < 0.) {
+    delta_u = -u_old / (2.5 + rounding_tolerance);
+  }
+
+  /* Turn this into a rate of change (including cosmology term) */
+  const float cooling_du_dt = delta_u / dt_therm;
 
-  /* record energy lost */
-  xp->cooling_data.radiated_energy += -du_dt * dt * hydro_get_mass(p);
+  /* Update the internal energy time derivative */
+  hydro_set_physical_internal_energy_dt(p, cosmo, cooling_du_dt);
 
-  /* Update the internal energy */
-  hydro_set_physical_internal_energy_dt(p, cosmo, hydro_du_dt + du_dt);
+  /* Store the radiated energy */
+  xp->cooling_data.radiated_energy -= hydro_get_mass(p) * cooling_du_dt * dt;
 }
 
 static INLINE float cooling_get_temperature(
@@ -729,7 +741,7 @@ __attribute__((always_inline)) INLINE static void cooling_init_units(
   /* These are conversions from code units to cgs. */
 
   /* first cosmo */
-  cooling->units.a_units = 1.0;  // units for the expansion factor (1/1+zi)
+  cooling->units.a_units = 1.0;  // units for the expansion factor
   cooling->units.a_value = 1.0;
 
   /* We assume here all physical quantities to
@@ -738,12 +750,12 @@ __attribute__((always_inline)) INLINE static void cooling_init_units(
 
   /* then units */
   cooling->units.density_units =
-      us->UnitMass_in_cgs / pow(us->UnitLength_in_cgs, 3);
-  cooling->units.length_units = us->UnitLength_in_cgs;
-  cooling->units.time_units = us->UnitTime_in_cgs;
-  cooling->units.velocity_units = cooling->units.a_units *
-                                  cooling->units.length_units /
-                                  cooling->units.time_units;
+      units_cgs_conversion_factor(us, UNIT_CONV_DENSITY);
+  cooling->units.length_units =
+      units_cgs_conversion_factor(us, UNIT_CONV_LENGTH);
+  cooling->units.time_units = units_cgs_conversion_factor(us, UNIT_CONV_TIME);
+  cooling->units.velocity_units =
+      units_cgs_conversion_factor(us, UNIT_CONV_VELOCITY);
 }
 
 /**
@@ -822,6 +834,7 @@ __attribute__((always_inline)) INLINE static void cooling_init_backend(
   /* Set up the units system. */
   cooling_init_units(us, cooling);
 
+  /* Set up grackle */
   cooling_init_grackle(cooling);
 }
 
diff --git a/src/cooling/grackle/cooling_io.h b/src/cooling/grackle/cooling_io.h
index 88235a20a2f9d150b56f59ee32fa9ab91941e659..3905cafd05fb8e15ddf33f4ea688d6144698df73 100644
--- a/src/cooling/grackle/cooling_io.h
+++ b/src/cooling/grackle/cooling_io.h
@@ -19,9 +19,6 @@
 #ifndef SWIFT_COOLING_GRACKLE_IO_H
 #define SWIFT_COOLING_GRACKLE_IO_H
 
-/* Config parameters. */
-#include "../config.h"
-
 /* Local includes */
 #include "cooling_struct.h"
 #include "io_properties.h"
@@ -64,8 +61,6 @@ __attribute__((always_inline)) INLINE static int cooling_write_particles(
 
   int num = 0;
 
-  if (cooling->output_mode == 0) return num;
-
 #if COOLING_GRACKLE_MODE >= 1
   /* List what we want to write */
   list[0] = io_make_output_field("HI", FLOAT, 1, UNIT_CONV_NO_UNITS, xparts,
@@ -89,8 +84,6 @@ __attribute__((always_inline)) INLINE static int cooling_write_particles(
   num += 6;
 #endif
 
-  if (cooling->output_mode == 1) return num;
-
 #if COOLING_GRACKLE_MODE >= 2
   list += num;
 
@@ -106,8 +99,6 @@ __attribute__((always_inline)) INLINE static int cooling_write_particles(
   num += 3;
 #endif
 
-  if (cooling->output_mode == 2) return num;
-
 #if COOLING_GRACKLE_MODE >= 3
   list += num;
 
@@ -156,9 +147,6 @@ __attribute__((always_inline)) INLINE static void cooling_read_parameters(
   cooling->self_shielding_method = parser_get_opt_param_int(
       parameter_file, "GrackleCooling:SelfShieldingMethod", 0);
 
-  cooling->output_mode =
-      parser_get_opt_param_int(parameter_file, "GrackleCooling:OutputMode", 0);
-
   cooling->max_step = parser_get_opt_param_int(
       parameter_file, "GrackleCooling:MaxSteps", 10000);
 
diff --git a/src/cooling/grackle/cooling_struct.h b/src/cooling/grackle/cooling_struct.h
index 6d4b37a6240446d580818e16ff887c8079e319a6..66c385234cccf6532100e86f2c16a508b1112baa 100644
--- a/src/cooling/grackle/cooling_struct.h
+++ b/src/cooling/grackle/cooling_struct.h
@@ -19,8 +19,6 @@
 #ifndef SWIFT_COOLING_STRUCT_GRACKLE_H
 #define SWIFT_COOLING_STRUCT_GRACKLE_H
 
-#include "../config.h"
-
 /* include grackle */
 #include <grackle.h>
 
@@ -61,9 +59,6 @@ struct cooling_function_data {
   /* Self shielding method (<= 3) means grackle modes */
   int self_shielding_method;
 
-  /* Output mode (correspond to primordial chemistry mode */
-  int output_mode;
-
   /* convergence limit for first init */
   float convergence_limit;
 
diff --git a/src/cooling/none/cooling.h b/src/cooling/none/cooling.h
index a9921720d3ddd945c439515c91692dbb0b132445..3f90d357ad863da0525859f06e85a4cc492d3ae2 100644
--- a/src/cooling/none/cooling.h
+++ b/src/cooling/none/cooling.h
@@ -23,18 +23,19 @@
  * @file src/cooling/none/cooling.h
  * @brief Empty infrastructure for the cases without cooling function
  */
+#include "../config.h"
 
 /* Some standard headers. */
 #include <float.h>
 #include <math.h>
 
 /* Local includes. */
-#include "error.h"
+#include "cooling_struct.h"
+#include "cosmology.h"
+#include "entropy_floor.h"
 #include "hydro.h"
-#include "parser.h"
+#include "hydro_properties.h"
 #include "part.h"
-#include "physical_constants.h"
-#include "units.h"
 
 /**
  * @brief Common operations performed on the cooling function at a
@@ -70,6 +71,7 @@ __attribute__((always_inline)) INLINE static void cooling_cool_part(
     const struct unit_system* restrict us,
     const struct cosmology* restrict cosmo,
     const struct hydro_props* hydro_props,
+    const struct entropy_floor_properties* floor_props,
     const struct cooling_function_data* restrict cooling,
     struct part* restrict p, struct xpart* restrict xp, const float dt,
     const float dt_therm) {}
diff --git a/src/engine.c b/src/engine.c
index 51751a3b0c41fca1bcefacabaf430dffeee269b8..ae3d692e730e9abf299f6d0ba0adf57536ccfbfb 100644
--- a/src/engine.c
+++ b/src/engine.c
@@ -1904,6 +1904,35 @@ void engine_print_task_counts(const struct engine *e) {
   const int nr_tasks = sched->nr_tasks;
   const struct task *const tasks = sched->tasks;
 
+  /* Global tasks and cells when using MPI. */
+#ifdef WITH_MPI
+  if (e->nodeID == 0 && e->total_nr_tasks > 0)
+    printf(
+        "[%04i] %s engine_print_task_counts: System total: %lld,"
+        " no. cells: %lld\n",
+        e->nodeID, clocks_get_timesincestart(), e->total_nr_tasks,
+        e->total_nr_cells);
+  fflush(stdout);
+#endif
+
+  /* Report value that can be used to estimate the task_per_cells parameter. */
+  float tasks_per_cell = (float)nr_tasks / (float)e->s->tot_cells;
+
+#ifdef WITH_MPI
+  message("Total = %d (per cell = %.2f)", nr_tasks, tasks_per_cell);
+
+  /* And the system maximum on rank 0, only after first step, increase by our
+   * margin to allow for some variation in repartitioning. */
+  if (e->nodeID == 0 && e->total_nr_tasks > 0) {
+    message("Total = %d (maximum per cell = %.2f)", nr_tasks,
+            e->tasks_per_cell_max * engine_tasks_per_cell_margin);
+  }
+
+#else
+  message("Total = %d (per cell = %.2f)", nr_tasks, tasks_per_cell);
+#endif
+  fflush(stdout);
+
   /* Count and print the number of each task type. */
   int counts[task_type_count + 1];
   for (int k = 0; k <= task_type_count; k++) counts[k] = 0;
@@ -1915,8 +1944,7 @@ void engine_print_task_counts(const struct engine *e) {
     } else
       counts[(int)tasks[k].type] += 1;
   }
-  message("Total = %d  (per cell = %d)", nr_tasks,
-          (int)ceil((double)nr_tasks / e->s->tot_cells));
+
 #ifdef WITH_MPI
   printf("[%04i] %s engine_print_task_counts: task counts are [ %s=%i",
          e->nodeID, clocks_get_timesincestart(), taskID_names[0], counts[0]);
@@ -1941,7 +1969,7 @@ void engine_print_task_counts(const struct engine *e) {
  * @brief if necessary, estimate the number of tasks required given
  *        the current tasks in use and the numbers of cells.
  *
- * If e->tasks_per_cell is set greater than 0 then that value is used
+ * If e->tasks_per_cell is set greater than 0.0 then that value is used
  * as the estimate of the average number of tasks per cell,
  * otherwise we attempt an estimate.
  *
@@ -1951,8 +1979,13 @@ void engine_print_task_counts(const struct engine *e) {
  */
 int engine_estimate_nr_tasks(const struct engine *e) {
 
-  int tasks_per_cell = e->tasks_per_cell;
-  if (tasks_per_cell > 0) return e->s->tot_cells * tasks_per_cell;
+  float tasks_per_cell = e->tasks_per_cell;
+  if (tasks_per_cell > 0.0f) {
+    if (e->verbose)
+      message("tasks per cell given as: %.2f, so maximum tasks: %d",
+              e->tasks_per_cell, (int)(e->s->tot_cells * tasks_per_cell));
+    return (int)(e->s->tot_cells * tasks_per_cell);
+  }
 
   /* Our guess differs depending on the types of tasks we are using, but we
    * basically use a formula <n1>*ntopcells + <n2>*(totcells - ntopcells).
@@ -2057,15 +2090,15 @@ int engine_estimate_nr_tasks(const struct engine *e) {
   int ncells = e->s->tot_cells;
 #endif
 
-  double ntasks = n1 * ntop + n2 * (ncells - ntop);
+  float ntasks = n1 * ntop + n2 * (ncells - ntop);
   if (ncells > 0) tasks_per_cell = ceil(ntasks / ncells);
 
-  if (tasks_per_cell < 1.0) tasks_per_cell = 1.0;
+  if (tasks_per_cell < 1.0f) tasks_per_cell = 1.0f;
   if (e->verbose)
-    message("tasks per cell estimated as: %d, maximum tasks: %d",
-            tasks_per_cell, ncells * tasks_per_cell);
+    message("tasks per cell estimated as: %.2f, maximum tasks: %d",
+            tasks_per_cell, (int)(ncells * tasks_per_cell));
 
-  return ncells * tasks_per_cell;
+  return (int)(ncells * tasks_per_cell);
 }
 
 /**
@@ -2515,7 +2548,9 @@ void engine_collect_end_of_step(struct engine *e, int apply) {
       &e->collect_group1, data.updated, data.g_updated, data.s_updated,
       data.inhibited, data.g_inhibited, data.s_inhibited, data.ti_hydro_end_min,
       data.ti_hydro_end_max, data.ti_hydro_beg_max, data.ti_gravity_end_min,
-      data.ti_gravity_end_max, data.ti_gravity_beg_max, e->forcerebuild);
+      data.ti_gravity_end_max, data.ti_gravity_beg_max, e->forcerebuild,
+      e->s->tot_cells, e->sched.nr_tasks,
+      (float)e->sched.nr_tasks / (float)e->s->tot_cells);
 
 /* Aggregate collective data from the different nodes for this step. */
 #ifdef WITH_MPI
@@ -4262,6 +4297,8 @@ void engine_init(struct engine *e, struct space *s, struct swift_params *params,
   e->cputime_last_step = 0;
   e->last_repartition = 0;
 #endif
+  e->total_nr_cells = 0;
+  e->total_nr_tasks = 0;
 
 #if defined(WITH_LOGGER)
   e->logger = (struct logger *)malloc(sizeof(struct logger));
@@ -4794,8 +4831,10 @@ void engine_config(int restart, struct engine *e, struct swift_params *params,
    * On restart this number cannot be estimated (no cells yet), so we recover
    * from the end of the dumped run. Can be changed on restart. */
   e->tasks_per_cell =
-      parser_get_opt_param_int(params, "Scheduler:tasks_per_cell", 0);
-  int maxtasks = 0;
+      parser_get_opt_param_float(params, "Scheduler:tasks_per_cell", 0.0);
+  e->tasks_per_cell_max = 0.0f;
+
+  float maxtasks = 0;
   if (restart)
     maxtasks = e->restart_max_tasks;
   else
diff --git a/src/engine.h b/src/engine.h
index 938213d8c5b8889266e74e38f6fb2c00baa1d256..0e0e9895a8b0d1928e48c52ad760d2303447c24d 100644
--- a/src/engine.h
+++ b/src/engine.h
@@ -107,6 +107,7 @@ enum engine_step_properties {
 #define engine_default_timesteps_file_name "timesteps"
 #define engine_max_parts_per_ghost 1000
 #define engine_max_sparts_per_ghost 1000
+#define engine_tasks_per_cell_margin 1.2
 
 /**
  * @brief The rank of the engine as a global variable (for messages).
@@ -211,7 +212,13 @@ struct engine {
   /* Total numbers of particles in the system. */
   long long total_nr_parts, total_nr_gparts, total_nr_sparts;
 
-  /* The total number of inhibted particles in the system. */
+  /* Total numbers of cells (top-level and sub-cells) in the system. */
+  long long total_nr_cells;
+
+  /* Total numbers of tasks in the system. */
+  long long total_nr_tasks;
+
+  /* The total number of inhibited particles in the system. */
   long long nr_inhibited_parts, nr_inhibited_gparts, nr_inhibited_sparts;
 
 #ifdef SWIFT_DEBUG_CHECKS
@@ -326,8 +333,9 @@ struct engine {
   size_t nr_links, size_links;
 
   /* Average number of tasks per cell. Used to estimate the sizes
-   * of the various task arrays. */
-  size_t tasks_per_cell;
+   * of the various task arrays. Also the maximum from all ranks. */
+  float tasks_per_cell;
+  float tasks_per_cell_max;
 
   /* Average number of links per tasks. This number is used before
      the creation of communication tasks so needs to be large enough. */
diff --git a/src/entropy_floor.h b/src/entropy_floor.h
index e65d55b2ab51047748403ee690edd7cb361eda1d..9f2e97ccc815bee1087884d060492ff3715f1c6f 100644
--- a/src/entropy_floor.h
+++ b/src/entropy_floor.h
@@ -27,6 +27,10 @@
 /* Config parameters. */
 #include "../config.h"
 
+#include "common_io.h"
+#include "error.h"
+#include "inline.h"
+
 /* Import the right entropy floor definition */
 #if defined(ENTROPY_FLOOR_NONE)
 #include "./entropy_floor/none/entropy_floor.h"
diff --git a/src/entropy_floor/EAGLE/entropy_floor.h b/src/entropy_floor/EAGLE/entropy_floor.h
index 4f69d5f8e4d23f7f1cc7fb6a5fc93ff795d7d991..41d35fa0484cc1ee491a3c6293893ad5d2b5583f 100644
--- a/src/entropy_floor/EAGLE/entropy_floor.h
+++ b/src/entropy_floor/EAGLE/entropy_floor.h
@@ -21,6 +21,7 @@
 
 #include "adiabatic_index.h"
 #include "cosmology.h"
+#include "hydro.h"
 #include "hydro_properties.h"
 #include "parser.h"
 #include "units.h"
diff --git a/src/entropy_floor/none/entropy_floor.h b/src/entropy_floor/none/entropy_floor.h
index 1ffae096f84f77b4a49083cb60b28bc2530a5a98..871ef8977e091841128e280184646e3be02957fd 100644
--- a/src/entropy_floor/none/entropy_floor.h
+++ b/src/entropy_floor/none/entropy_floor.h
@@ -25,6 +25,10 @@
  * floors.
  */
 
+struct cosmology;
+struct hydro_props;
+struct part;
+
 /**
  * @brief Properties of the entropy floor.
  *
diff --git a/src/runner.c b/src/runner.c
index 035e1ab3f3720be807a3ea8df71de726615c9c1c..354d40f52e5e256c7e3447ae58ce5ac3e1abae0f 100644
--- a/src/runner.c
+++ b/src/runner.c
@@ -433,6 +433,7 @@ void runner_do_cooling(struct runner *r, struct cell *c, int timer) {
   const struct phys_const *constants = e->physical_constants;
   const struct unit_system *us = e->internal_units;
   const struct hydro_props *hydro_props = e->hydro_properties;
+  const struct entropy_floor_properties *entropy_floor_props = e->entropy_floor;
   const double time_base = e->time_base;
   const integertime_t ti_current = e->ti_current;
   struct part *restrict parts = c->hydro.parts;
@@ -476,8 +477,9 @@ void runner_do_cooling(struct runner *r, struct cell *c, int timer) {
         }
 
         /* Let's cool ! */
-        cooling_cool_part(constants, us, cosmo, hydro_props, cooling_func, p,
-                          xp, dt_cool, dt_therm);
+        cooling_cool_part(constants, us, cosmo, hydro_props,
+                          entropy_floor_props, cooling_func, p, xp, dt_cool,
+                          dt_therm);
       }
     }
   }
diff --git a/tools/plot_task_dependencies.py b/tools/plot_task_dependencies.py
index 11bb093423232fa6546c7ffdaecc62c23fbfbb37..14ba7c99f621c0c62c3c9cb8e8d3b0c78e491401 100644
--- a/tools/plot_task_dependencies.py
+++ b/tools/plot_task_dependencies.py
@@ -5,10 +5,25 @@ This file generates a graphviz file that represents the SWIFT tasks
 
 Example: ./plot_task_dependencies.py dependency_graph_*.csv
 """
-import sys
 from pandas import read_csv
 import numpy as np
 from subprocess import call
+from optparse import OptionParser
+
+
+def parseOption():
+    parser = OptionParser()
+
+    parser.add_option(
+        "-c", "--with-calls", dest="with_calls",
+        help="Add the function calls in the graph",
+        action="store_true")
+
+    opt, files = parser.parse_args()
+    if len(files) != 1:
+        raise Exception("You need to provide one file")
+
+    return opt, files
 
 
 def getGitVersion(f, git):
@@ -190,7 +205,63 @@ def taskIsGravity(name):
     return False
 
 
-def writeTask(f, name, implicit, mpi):
+def getFunctionCalls(name):
+    txt = None
+    if name == "ghost":
+        txt = """hydro_end_density, chemistry_end_density,<br/>
+        hydro_prepare_gradient, hydro_reset_gradient,<br/>
+        hydro_prepare_force, hydro_reset_acceleration,<br/>
+        hydro_init_part, chemistry_init_part,<br/>
+        hydro_has_no_neighbours, chemistry_part_has_no_neighbours
+        """
+
+    elif name == "cooling":
+        txt = "cooling_cool_part"
+
+    elif name == "timestep":
+        txt = "tracers_after_timestep"
+
+    elif name == "drift_part":
+        txt = """drift_part, tracers_after_drift,<br/>
+        hydro_init_part, chemistry_init_part,<br/>
+        tracers_after_init
+        """
+
+    elif name == "kick1":
+        txt = "kick_part, kick_gpart, kick_spart"
+
+    elif name == "kick2":
+        txt = """kick_part, kick_gpart, kick_spart,<br/>
+        hydro_reset_predicted_values,
+        gravity_reset_predicted_Values,<br/>
+        stars_reset_predicted_values,
+        """
+
+    elif name == "end_force":
+        txt = """hydro_end_force, gravity_end_force,<br/>
+        stars_end_force"""
+
+    elif name == "drift_gpart":
+        txt = """drift_gpart, gravity_init_gpart,<br/>
+        drift_spart
+        """
+
+    if "density" in name and "stars" not in name:
+        txt = """runner_iact_nonsym_chemistry, runner_iact_chemistry,<br/>
+        runner_iact_nonsym_density, runner_iact_density"""
+
+    if "force" in name and "end" not in name:
+        txt = "runner_iact_nonsym_density, runner_iact_density"
+
+    if txt is None:
+        return None
+    else:
+        pre = "<" + name + "<BR/> <Font POINT-SIZE='10'>Calls: "
+        app = "</Font>>"
+        return pre + txt + app
+
+
+def writeTask(f, name, implicit, mpi, with_calls):
     """
     Write the special task (e.g. implicit and mpi)
 
@@ -208,6 +279,9 @@ def writeTask(f, name, implicit, mpi):
 
     mpi: int
         Is the task MPI related
+
+    with_calls: bool
+        if true, write down the function calls
     """
     # generate text
     txt = "\t " + name + "["
@@ -226,6 +300,11 @@ def writeTask(f, name, implicit, mpi):
     if taskIsGravity(name):
         txt += "color=red3,"
 
+    if with_calls:
+        func = getFunctionCalls(name)
+        if func is not None:
+            txt += "label=" + func + ","
+
     # remove extra ','
     if txt[-1] == ",":
         txt = txt[:-1]
@@ -235,7 +314,7 @@ def writeTask(f, name, implicit, mpi):
     f.write(txt)
 
 
-def writeHeader(f, data, git):
+def writeHeader(f, data, git, opt):
     """
     Write the header and the special tasks
 
@@ -250,6 +329,9 @@ def writeHeader(f, data, git):
 
     git: str
         The git version
+
+    opt: object
+        The options provided to this script
     """
     # write header
     f.write("digraph task_dep {\n")
@@ -272,7 +354,8 @@ def writeHeader(f, data, git):
             continue
 
         written.append(ta)
-        writeTask(f, ta, data["implicit_in"][i], data["mpi_in"][i])
+        writeTask(f, ta, data["implicit_in"][i], data["mpi_in"][i],
+                  opt.with_calls)
 
     # do task out
     for i in range(N):
@@ -281,7 +364,8 @@ def writeHeader(f, data, git):
             continue
 
         written.append(tb)
-        writeTask(f, tb, data["implicit_out"][i], data["mpi_out"][i])
+        writeTask(f, tb, data["implicit_out"][i], data["mpi_out"][i],
+                  opt.with_calls)
 
     f.write("\n")
 
@@ -402,10 +486,8 @@ def writeFooter(f):
 
 
 if __name__ == "__main__":
-    # get input
-    filenames = sys.argv[1:]
-    if len(filenames) < 1:
-        raise Exception("You should provide at least a file name")
+
+    opt, files = parseOption()
 
     # output
     dot_output = "dependency_graph.dot"
@@ -414,7 +496,7 @@ if __name__ == "__main__":
     # read files
     data = []
     git = None
-    for f in filenames:
+    for f in files:
         tmp = read_csv(f, delimiter=",", comment="#")
         git = getGitVersion(f, git)
         data.append(tmp)
@@ -423,7 +505,7 @@ if __name__ == "__main__":
 
     # write output
     with open(dot_output, "w") as f:
-        writeHeader(f, data, git)
+        writeHeader(f, data, git, opt)
 
         writeClusters(f, data)
 
@@ -434,3 +516,7 @@ if __name__ == "__main__":
     call(["dot", "-Tpng", dot_output, "-o", png_output])
 
     print("You will find the graph in %s" % png_output)
+
+    if opt.with_calls:
+        print("We recommand to use the python package xdot available on pypi:")
+        print("  python -m xdot %s" % dot_output)