
[WIP] Fortran bindings

Open Aidan Chalk requested to merge fortran_bindings into master

Created Fortran Bindings for quicksched master, including a few extra functions required to avoid declaring the qsched structure inside fortran (because of pthread difficulty).

Tested with the OpenMP version (I don't think I have a pthread setup to test with).

Makefile.am in the fortran_examples folder is ignored.

This is created hopefully to test DL_POLY with at some point soon, but should be generally easy(ish) to use in FORTRAN codes.

Do we want any other examples? I'm loath to implement anything particularly complex, but currently I have not tested:

qsched_adduse, qsched_addlock, qsched_free, qsched_reset, qsched_ensure, qsched_res_own.

New functions: f_qsched_create, f_qsched_destroy control the creation of the quicksched object.

Will quickly fix the memory leak (no call to qsched_free) in the super simple example.

    lock_init( &s->lock );

}


struct qsched * f_qsched_create()
{
    struct qsched *s;
    s = (struct qsched *) malloc(sizeof(struct qsched));
    return s;
}
void f_qsched_destroy( struct qsched *s)
{
    free(s);
}
  • Aidan Chalk
    • I originally kept these separate from qsched_init and qsched_free to mirror C more closely, where you can do

      struct qsched s;
      qsched_init(&s, ...);
      ...
      qsched_free(&s);
      qsched_init(&s, ...);
      ...
      qsched_free(&s);

      If f_qsched_destroy also called qsched_free, I felt it was more likely to lead to crashes where people call qsched_free and then f_qsched_destroy, but I'm not sure which is better.
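The split between allocating the handle and initialising it can be sketched in plain C as below. This is only an illustration: struct qsched_sketch and the sketch_* functions are hypothetical stand-ins so the block compiles without quicksched.h, and they are not the real qsched API. The stand-in free also resets its pointer so that calling it twice is harmless, which is an assumption of this sketch and not a documented property of qsched_free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real struct qsched (defined in
 * quicksched.h); it exists only so this sketch compiles on its own. */
struct qsched_sketch {
    int nr_queues; /* placeholder field */
    int *data;     /* placeholder for internally owned memory */
};

/* Allocate storage only, in the spirit of f_qsched_create.
 * Returns NULL if the allocation fails. */
struct qsched_sketch *sketch_create(void) {
    return calloc(1, sizeof(struct qsched_sketch));
}

/* Stand-in for qsched_init: builds internal state in existing storage.
 * Returns 0 on success, -1 if the internal allocation fails. */
int sketch_init(struct qsched_sketch *s, int nr_queues) {
    assert(s != NULL && nr_queues > 0);
    s->data = malloc((size_t)nr_queues * sizeof(int));
    if (s->data == NULL) return -1;
    s->nr_queues = nr_queues;
    return 0;
}

/* Stand-in for qsched_free: releases internal state, keeps the storage,
 * and clears the pointer so a repeated call does nothing harmful. */
void sketch_free(struct qsched_sketch *s) {
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->nr_queues = 0;
}

/* Release the storage only, in the spirit of f_qsched_destroy. */
void sketch_destroy(struct qsched_sketch *s) {
    free(s);
}
```

With this layout the handle can go through create, init, free, init, free, destroy, which is the reuse pattern described above. Folding the free step into destroy would shorten the sequence but makes a preceding explicit free a potential double free unless free is written to tolerate it.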

  • Aidan Chalk
    • I don't want to port the qsched struct to FORTRAN because of the pthread-specific members, so these are basically just wrappers for the struct qsched s; declaration you'd write in C.

  • Aidan Chalk Added 1 commit:

    • 39024352 - Missed the module file for including when using make install
  • @nnrw56 I remembered what I need you to check, if possible: why the makefiles for the fortran_example folder aren't working.

  • Aidan Chalk Added 1 commit:

    • 644009c6 - Attempt to fix up the Makefile.am to make building more reliable
  • Have not managed to get any performance from this yet - my attempt to use this as part of DL_POLY results in the task code (i.e. the functions called inside the runner function) performing X times slower than in serial where X is (roughly) the number of threads active, and I have no reasonable explanation for the behaviour I see. Profilers just tell me "this function is slower", as does wrapping the function in omp_get_wtime().

    The scheduler itself seems to behave sensibly (the quicksched timers show that running with 24 threads results in no more than 3-4x longer spent in the scheduler), so I am completely stuck with tuning DL_POLY for this at the moment.

    I'll try to come up with a testcase for the example folder which we can actually do a runtime test in and try to work out if there is something weird going on - maybe the bh test is fairly easy to do in FORTRAN but I'm not sure yet.

  • The only thing that I can imagine is that the FORTRAN code is pinning its main thread, and all its child threads, on a single core. Do you have a way of checking what physical core your threads are running on to check this?

    If this is the case, there are two possible routes to solve it:

    • Some flag to the FORTRAN compiler that lets it use all the threads?
    • Setting the preferred CPU when creating the pthreads in QuickSched, much as we do in SWIFT?
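One way to run that check from C is sketched below, assuming Linux with glibc: sched_getaffinity and sched_getcpu are GNU extensions, so _GNU_SOURCE must be defined before any header is included. The helper names are hypothetical and are not part of QuickSched or SWIFT. If allowed_cpu_count returns 1 inside a worker thread, that thread is confined to a single core.

```c
#define _GNU_SOURCE /* must come before any #include (glibc extension) */
#include <assert.h>
#include <sched.h>

/* Number of CPUs the calling thread is allowed to run on,
 * or -1 if the affinity mask cannot be read. */
int allowed_cpu_count(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}

/* Index of the CPU the calling thread is executing on right now,
 * or -1 on error. */
int current_cpu(void) {
    return sched_getcpu();
}

/* Restrict the calling thread to a single CPU (pid 0 means "the caller").
 * Returns 0 on success, -1 on failure. */
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Calling current_cpu and allowed_cpu_count from inside each worker and printing the results should show whether all threads report the same core. pin_to_cpu illustrates the second route, setting a preferred CPU per thread; pthread_setaffinity_np is the equivalent call when pinning a thread other than the caller.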
  • I'm using the OpenMP version (the pthread version would be incorrect, since the code relies on !$omp threadprivate inside the runner function in FORTRAN), and in theory I'm using all the cores according to KMP_AFFINITY=verbose,scatter. I have also tried OMP_PROC_BIND.

    But it definitely looks like what you would expect an affinity issue to look like. The threads should be created once (which is inside the FORTRAN code), pinned to whichever core, but then should be fine to be used wherever.

    A non-hierarchical example is probably easiest. QR could be possible, but switching everything from pointers to indices is annoying, and I want to be pointer-free to check performance (there are also some issues with FORTRAN/C interoperability and pointers if you aren't careful at the moment). I could do something stupid like a "block-based" particle method, where I create "cells" of particles (at random) and compute the N^2 interactions between particles.
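A minimal C sketch of what one such "cell" task could compute is given below, using a flat index-addressed layout instead of pointers between particles. Everything here is an assumption made for illustration: the struct, the function name, the 1D positions and the softened pair term are arbitrary choices, and this is not the code from the FORTRAN test.

```c
/* Hypothetical particle storage: flat arrays addressed by index, with no
 * pointers stored in the data, mirroring an index-based FORTRAN layout. */
#define SKETCH_MAX_PARTS 1024

struct parts_sketch {
    double x[SKETCH_MAX_PARTS]; /* 1D positions, for brevity */
    double f[SKETCH_MAX_PARTS]; /* accumulated pair contributions */
};

/* One task: all-pairs (N^2) interaction among the particles of one cell,
 * stored contiguously at indices [first, first + count).
 * The pair term dx / (dx^2 + eps2) is an arbitrary placeholder. */
void cell_self_interact(struct parts_sketch *p, int first, int count) {
    const double eps2 = 1e-6; /* softening, avoids division by zero */
    for (int i = first; i < first + count; i++) {
        for (int j = i + 1; j < first + count; j++) {
            double dx = p->x[j] - p->x[i];
            double w = dx / (dx * dx + eps2);
            p->f[i] += w; /* equal and opposite contributions */
            p->f[j] -= w;
        }
    }
}
```

Each cell touches only its own index range, so tasks on different cells are independent and need no locks, which makes this a convenient shape for a scaling test.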

  • OK, created a super naive n^2 interaction test and it scales at 90% to 24 cores. It's available in b7e3c77f and it should work fine (accidentally in the wrong branch, but we can sort that out later).

  • The DL_POLY version still doesn't work at all, however. I assume I coded something incorrectly, but I can't work out what right now. I will have a last look tomorrow, but otherwise I need to move on to other work.
