Initial version

2022-04-20 14:08:38 +02:00
commit 68b8e1a305
48 changed files with 5133 additions and 0 deletions
--- a/doc/special/autosave.rst
+++ b/doc/special/autosave.rst
@ -0,0 +1,93 @@
+Automatic Saving of Analysis Data
+=================================
+
+Mdevaluate provides a functionality to save the result of analysis functions automatically.
+The data is saved to a file after it was computed.
+If an analysis was done in the exact same way before, the result is loaded from this file.
+
+This function may be activated through calling :func:`mdevaluate.autosave.enable`, which takes a directory as input.
+If this directory is a relative path (e.g. no trailing slash) the results will be saved in a location
+relative to the directory of the trajectory file.
+If the output files of your simulations are located in a subdirectory, like ``/path/to/sim/Output`` it is possible
+to specify the auto save location as ``../data`` such that the result files will be placed under ``/path/to/sim/data``.
+
+At the moment the two functions which use this behavior are:
+
+- :func:`~mdevaluate.correlation.shifted_correlation`
+- :func:`~mdevaluate.distribution.time_average`
+
+Any other function can make use of the autosave mechanism by decorating it with :func:`mdevaluate.autosave.autosave_data`.
+
+A full example
+--------------
+
+This is how it works, for a detailed explanation see below::
+
+  import mdevaluate as md
+  md.autosave.enable('data')
+  water = md.open('/path/to/sim').subset(atom_name='OW')
+  md.correlation.shifted_correlation(
+      md.correlation.msd,
+      water,
+      description='test'
+  )
+  # The result will be saved to the file:
+  # /path/to/sim/data/shifted_correlation_msd_OW_test.npz
+
+Checksum of the Analysis Call
+-----------------------------
+
+The autosave module calculates a checksum for each call of an analysis function,
+which is used to validate a present the data file.
+This way the result should only be loaded from file if the analysis is exactly the same.
+This includes the function code that is evaluated, so the result will be recomputed if any bit of the code changes.
+But there is always the possibility that checksums coincide accidentally,
+by chance or due to a bug in the code, which should be kept in mind when using this functionality.
+
+Special Keyword Arguments
+-------------------------
+
+The autosave module introduces two special keyword arguments to the decorated functions:
+
+- ``autoload``: This prevents the loading of previously calculated data even if a valid file was found.
+- ``description``: A descriptive string of the specific analysis, see below.
+
+Those keywords may be passed to those function (shifted_correlation, time_average) like any other keyword argument.
+If autosave was not enabled, they will be ignored.
+
+File names and Analysis Descriptions
+------------------------------------
+
+The evaluated data is saved to human readable files, whose name is derived from the function call
+and the automatic description of the subset.
+The latter one is assigned based on the ``atom_name`` and ``residue_name`` of the :func:`~mdevaluate.atoms.AtomSubset.subset` method.
+
+In some cases this is not enough, for example if the same subset is analyzed spatially resolved,
+which would lead to identical filenames that would be overwritten.
+Therefore a more detailed description of each specific analysis call needs to be provided.
+For this reason the autosave module introduces the before mentioned keyword argument ``description``.
+The value of this keyword is appended to the filename and in addition if any of
+the other arguments of the function call has a attribute description, this will appended as well.
+For example this (pseudo) code will lead to the filename ``shifted_correlation_isf_OW_1-2nm_nice.npz``::
+
+  OW = traj.subset(atom_name='OW')
+
+  corr = subensemble_correlation(spatial_selector)
+  corr.description = '1-2nm'
+
+  shifted_correlation(
+    isf,
+    OW,
+    correlation=corr,
+    description='nice'
+  )
+
+
+Reusing the autosaved data
+--------------------------
+
+The results of the functions are saved in NumPy's  npz format, see :func:`numpy.savez`.
+If the result should be used in a different place, it can either be loaded with
+:func:`numpy.load` or :func:`mdevaluate.autosave.load_data`.
+The latter function will return the result of the function call directly, the former
+returns a dict with the keys ``checksum`` and ``data``, the latter yielding the results data.
--- a/doc/special/energies.rst
+++ b/doc/special/energies.rst
@ -0,0 +1,18 @@
+Gromacs Energy Files
+====================
+
+It is possible to read the energy files (.edr) written out by Gromacs with mdevaluate.
+Those files contain thermodynamic properties of the system, like temperature or pressure.
+The exact contents of an energy file depend on the type of ensemble that was simulated,
+an NVT simulation's energy file for example will not contain information about the box size.
+
+To open these files use the function :func:`mdevaluate.open_energy`, which takes the filename of an energy file.
+The types of energies stored in the file can be shown with the :attr:`types` attribute of the class :class:`~mdevaluate.reader.EnergyReader`,
+the :attr:`units` attribute gives the units of these energy types.
+The timesteps at which those energies were written out are accessible through the :attr:`~mdevaluate.reader.EnergyReader.time` property.
+The time series of one of these energies can be accessed through the named index, comparable to python dictionaries.
+::
+  import mdevaluate as md
+  edr = md.open_energy('/path/to/energy.edr')
+  # plot the evolution of temperature
+  plot(edr.time, edr['Temperature'])
--- a/doc/special/overlap.rst
+++ b/doc/special/overlap.rst
@ -0,0 +1,76 @@
+Computing the Overlap Function
+==============================
+
+The overlap function is defined as the portion of particles of a given set,
+whose positions *overlap* after a given time :math:`t` with the reference configuration at :math:`t=0`.
+This is calculated as follows:
+The Initial positions define spheres of a given radius :math:`r` which then are used
+to test how many of the particles at a later time are found within those spheres.
+Normalized by the number of spheres this gives the correlation of the configurational overlap.
+
+.. math::
+
+   Q(t) = \frac{1}{N} \left\langle \sum\limits_{i=1}^N n_i(t) \right\rangle
+
+Where :math:`n_i(t)` defines the :math:`N` spheres, with :math:`n_i(t)=1` if a particle
+is found within this sphere at time :math:`t` and :math:`n_i(0) = 1` for :math:`1\leq i \leq N`.
+
+Evaluation with mdevaluate
+--------------------------
+
+Computation of the overlap requires the relatively expensive computation of next neighbor distances,
+which scales with the order of :math:`\mathcal{O}(N^2)`.
+There are more efficient ways for the solution of this problem, the one used here is
+the so called :class:`~scipy.spatial.cKDTree`.
+This is much more efficient and allows to compute the overlap relatively fast::
+
+  OW = md.open('/path/to/sim').subset(atom_name='OW')
+  tree = md.coordinates.CoordinatesKDTree(OW)
+  Qol = md.correlation.shifted_correlation(
+      partial(md.correlation.overlap, crds_tree=tree, radius=0.11),
+      OW
+  )
+
+As seen above, mdevaluate provides the function :func:`~mdevaluate.correlation.overlap`
+for this evaluation, which uses a special object of type :class:`~mdevaluate.coordinates.CoordinatesKDTree`
+for the neighbor search.
+The latter provides two features, necessary for the computation:
+First it computes a :class:`~scipy.spatial.cKDTree` for each necessary frame of the trajectory;
+second it caches those trees, since assembly of KDTrees is expensive.
+The size of the cache can be controlled with the keyword argument ``maxsize`` of the CoordinatesKDTree initialization.
+
+Note that this class uses the C version (hence the lowercase C) rather than
+the pure Python version :class:`~scipy.spatial.KDTree` since the latter is significantly slower.
+The only downside is, that the C version had a memory leak before SciPy 0.17,
+but as long as a recent version of SciPy is used, this shouldn't be a problem.
+
+Overlap of a Subsystem
+----------------------
+
+In many cases the overlap of a subsystem, e.g. a spatial region, should be computed.
+This is done by selecting a subset of the initial configuration before defining the spheres.
+The overlap is then probed with the whole system.
+This has two benefits:
+
+1. It yields the correct results
+2. The KDTree structures are smaller and thereby less computation and memory expensive
+
+An example of a spatial resolved analysis, where ``OW`` is loaded as above::
+
+  selector = partial(
+      md.coordinates.spatial_selector,
+      transform=md.coordinates.spherical_radius,
+      rmin=1.0,
+      rmax=1.5
+  )
+  tree = md.coordinates.CoordinatesKDTree(OW, selector=selector)
+  Qol = md.correlation.shifted_correlation(
+      partial(md.correlation.overlap, crds_tree=tree, radius=0.11),
+      OW
+  )
+
+This computes the overlap of OW atoms in the region :math:`1.0 \leq r \leq 1.5`.
+This method can of course be used to probe the overlap of any subsystem, which is selected by the given selector function.
+It should return a viable index for a (m, 3) sized NumPy array when called with original frame of size (N, 3)::
+
+  subset = frame[selector(frame)]
--- a/doc/special/spatial.rst
+++ b/doc/special/spatial.rst
@ -0,0 +1,38 @@
+Spatial Resolved Analysis
+=========================
+
+This section describes how spatially resolved correlation can be analyzed with mdevaluate.
+This guide assumes that the variable ``traj`` holds a trajectory where the subset of atoms that should be analyzed are selected.
+For example::
+
+  traj = md.open('/path/to/sim', cached=1000).subset(atom_name='OW')
+
+Which would load a simulation from the directory ``/path/to/sim`` and select all ``OW`` atoms.
+Note that for this use case, the caching is quite useful since it enables us to iterate over spatial regions
+without significant time penalty.
+Moving on let's calculate the ISF of water oxygens with spherical radius between 0.5 and 0.7 nm::
+
+  from functools import partial
+  func = partial(md.correlation.isf, q=22.7)
+  selector = partial(
+    md.coordinates.spatial_selector,
+    transform=md.coordinates.spherical_radius,
+    rmin=0.5, rmax=0.7
+  )
+  t, S = md.correlation.shifted_correlation(
+    func, traj,
+    correlation=md.correlation.subensemble_correlation(selector)
+  )
+
+To explain how this works, let's go through the code from bottom to top.
+The spatial filtering is done inside the shifted_correlation by the function
+:func:`mdevaluate.correlation.subensemble_correlation`.
+This function takes a selector function as argument that should take a frame as input
+and return the selection of the coordinates that should be selected.
+A new selection is taken for the starting frame of each shifted time segment.
+
+In this case the selection is done with the function :func:`mdevaluate.coordinates.spatial_selector`.
+This function takes four arguments, the first being the frame of coordinates which is handed by :func:`subensemble_correlation`.
+The second argument is a transformation function, which transforms the input coordinates to the coordinate which will be filtered,
+in this case the spherical radius.
+The two last arguments define the minimum and maximum value of this quantity.