updates to MT tutorial, fixes in config
hvasbath committed Aug 23, 2018
1 parent c6773be commit b86e9e3
Showing 14 changed files with 90 additions and 77 deletions.
2 changes: 1 addition & 1 deletion data/examples/FullMT/config_geometry.yaml
@@ -101,7 +101,7 @@ problem_config: !beat.ProblemConfig
lower: [-1.4142135623730951]
upper: [1.4142135623730951]
testvalue: [0.84551376]
- moment: !beat.heart.Parameter
+ magnitude: !beat.heart.Parameter
name: magnitude
form: Uniform
lower: [4.5]
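
For reference, the renamed prior block then reads roughly as below; the upper bound and test value shown here are hypothetical, since the hunk is truncated after the lower bound::

    magnitude: !beat.heart.Parameter
      name: magnitude
      form: Uniform
      lower: [4.5]
      upper: [6.5]      # hypothetical value, truncated in this hunk
      testvalue: [5.5]  # hypothetical value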
Binary file modified docs/_static/FullMT_data.png
Binary file added docs/_static/FullMT_windowed.png
4 changes: 2 additions & 2 deletions docs/api.rst
@@ -44,14 +44,14 @@ The :mod:`sampler` Module


The :mod:`ffi` Module
- ------------------------
+ ---------------------

.. automodule:: ffi
:members:


The :mod:`parallel` Module
- ------------------------
+ --------------------------

.. automodule:: parallel
:members:
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -39,7 +39,8 @@
# 'sphinx.ext.mathjax',
'sphinx.ext.viewcode',
'sphinx.ext.doctest',
- 'numpydoc'
+ 'sphinx.ext.napoleon'
+ # 'numpydoc'
]

# Add any paths that contain templates here, relative to this directory.
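
For context, a minimal sketch of the resulting Sphinx setup (assuming an otherwise standard conf.py): napoleon ships with Sphinx itself, so NumPy-style docstrings render without the external numpydoc dependency::

    extensions = [
        'sphinx.ext.viewcode',
        'sphinx.ext.doctest',
        'sphinx.ext.napoleon',
        # 'numpydoc'
    ]
    napoleon_numpy_docstring = True  # parse NumPy-style sections (the default)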
37 changes: 24 additions & 13 deletions docs/examples.rst
@@ -186,9 +186,8 @@ This looks reasonably well!

.. image:: _static/fomosto_traces_snuffler.png


- Sample the solution space
- ^^^^^^^^^^^^^^^^^^^^^^^^^
+ Data windowing and optimization setup
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once we are confident that the GFs are reasonable we may continue to define the optimization-specific setup variables.
First of all, we check again the WaveformFitConfig for the waves we want to optimize.
In this case we want to optimize the whole waveform from P until the end of the surface waves.
@@ -216,7 +215,9 @@ Finally, we fix the depth prior to 8km (upper and lower) as we only calculated G
testvalue: [8.0]

Of course, in a real case this would not be fixed.
- Also we may inspect the data::

+ The specifications of the tapers, filters and channels that the user defined above determine which part of the data traces are used in the course of the optimization.
+ We may inspect the raw data together with the processed data that is going to be used in the optimization with ::

beat check FullMT --what='traces'

@@ -225,11 +226,21 @@ A detailed tutorial about handling the browser is given `here <https://pyrocko.

.. image:: _static/FullMT_data.png

- For example you can also right click in the window and see a menu that pops up. There you can select sort the traces by distance or azimuth to get a better picture of the setup.
+ To better sort the displayed traces and to inspect the processed data we may use snuffler's display options.
+ Please right-click in the window to open a menu. There, select "Sort by Distance" to order the traces by distance and get a better picture of the setup. To see which traces actually belong to the same station and component, open the menu again and select "Subsort by Network, Station, Channel (Grouped by Location)" and "Common Scale per Component". To distinguish better between overlapping traces, also select "Color Traces" and deselect "Show Boxes".
+ Your display should look something like this.

+ .. image:: _static/FullMT_windowed.png

+ In red we see the raw data traces as stored under $project_directory/seismic_data.pkl, and in blue the processed data to which the WaveformFitConfig parameters (see above) have been applied. The blue traces are going to be used in this form throughout the optimization. For this setup we are good, but for future problems the user may adjust the configuration and repeatedly check whether the data windowing is satisfactory. For example, the user may widen the arrival_taper times to make sure that a respective wave train is completely included in case it is cut at the taper boundary. Or, in case of noisy or bad-quality data, a station may be excluded completely by putting its name in the "blacklist" parameter.
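
To illustrate the knobs mentioned above, a hypothetical WaveformFitConfig excerpt (all values made up; the arrival_taper times a-d are in seconds relative to the phase arrival, and the exact field layout is an assumption)::

    waveforms:
    - !beat.WaveformFitConfig
      name: any_P
      blacklist: [II.BFO]       # hypothetical: exclude a noisy station
      channels: [Z]
      filterer: !beat.heart.Filter
        lower_corner: 0.01
        upper_corner: 0.1
        order: 4
      arrival_taper: !beat.heart.ArrivalTaper
        a: -15.0                # taper start before the arrival
        b: -10.0
        c: 50.0                 # widen c and d to include the full wave train
        d: 55.0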

+ Now that we checked the optimization setup we are good to go.

- Firstly, we fix the source parameters to some random value and only optimize for the hyperparameters (HPs).
- How many different random source parameters are choosen and the sampling repeated is determined by the hyper_sampler_config parameters 'n_stages' (default:5) ::
+ Sample the solution space
+ ^^^^^^^^^^^^^^^^^^^^^^^^^

+ Firstly, we fix the source parameters to some random value and only optimize for the noise scaling or hyperparameters (HPs).
+ How many different random source parameters are chosen, and how often the sampling is repeated, is determined by the hyper_sampler_config parameter 'n_chains' (default: 20). In case several CPUs are available, the 'n_jobs' parameter determines how many processes (Markov chains) are sampled in parallel. You may want to increase that now! To start the sampling please run ::

beat sample FullMT --hypers
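
The parameters named above live in the hyper_sampler_config section of the configuration file. A sketch with assumed nesting and values (only 'n_chains' and 'n_jobs' are taken from the text)::

    hyper_sampler_config: !beat.SamplerConfig
      name: Metropolis
      parameters: !beat.MetropolisConfig
        n_jobs: 4               # Markov chains sampled in parallel
        n_chains: 20            # random source points to evaluate (default)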

@@ -240,15 +251,15 @@ the HPs parameter bounds show something like::
h_any_P_T: !beat.heart.Parameter
name: h_any_P_T
form: Uniform
- lower: [-4.0]
- upper: [5.0]
- testvalue: [0.5]
+ lower: [-3.0]
+ upper: [3.0]
+ testvalue: [0.0]
h_any_P_Z: !beat.heart.Parameter
name: h_any_P_Z
form: Uniform
- lower: [-4.0]
- upper: [5.0]
- testvalue: [0.5]
+ lower: [-3.0]
+ upper: [2.0]
+ testvalue: [-0.5]

At this point the bounds could be relaxed again by manually editing the configuration file, or this step could be skipped entirely.
Now that we have an initial guess on the hyperparameters we can run the optimization using the default sampling algorithm, a Sequential Monte Carlo sampler.
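
By analogy with the hypers run above, the full sampling would then presumably be started with ::

    beat sample FullMT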
4 changes: 2 additions & 2 deletions docs/getting_started.rst
@@ -66,8 +66,8 @@ For example to optimize for a Full Moment Tensor for the Landers EQ by using sei

This will create a project directory called LandersEQ in the current directory.
Within the directory you will see that two files have been created:
- - BEAT_log.txt
- - geometry_config.yaml
+ - BEAT_log.txt
+ - geometry_config.yaml

The first file is a log file to which all the executed commands and outputs are written. In case something goes wrong this log file helps to find the error.
For now it contains::
6 changes: 3 additions & 3 deletions src/backend.py
@@ -686,7 +686,7 @@ def point(self, idx):
Returns
-------
- dictionary of point values
+ dict : of point values
"""
ipoint = self._index.point(idx)
return self._straces[ipoint['k']].point(ipoint['k_idx'])
@@ -777,8 +777,8 @@ def get_highest_sampled_stage(homedir, return_final=False):
"""
Return stage number of stage that has been sampled before the final stage.
- Paramaeters
- -----------
+ Parameters
+ ----------
homedir : str
Directory to the sampled stage results
8 changes: 4 additions & 4 deletions src/config.py
@@ -460,16 +460,16 @@ class GeodeticConfig(Object):
blacklist = List.T(
String.T(),
optional=True,
- default=['placeholder'],
- help='Station name for station to be thrown out.')
+ default=[],
+ help='GPS station name or scene name to be thrown out.')
types = List.T(
String.T(),
default=['SAR'],
help='Types of geodetic data, i.e. SAR, GPS, ...')
calc_data_cov = Bool.T(
default=True,
- help='Flag for calculating the data covariance matrix based on the'
- ' pre P arrival data trace noise.')
+ help='Flag for calculating the data covariance matrix, '
+ 'outsourced to "kite"')
interpolation = StringChoice.T(
choices=['nearest_neighbor', 'multilinear'],
default='multilinear',
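
Assembled from the defaults above, the corresponding section of a project configuration would look roughly like this (the !beat tag and nesting are assumptions)::

    geodetic_config: !beat.GeodeticConfig
      blacklist: []              # GPS station or scene names to exclude
      types: [SAR]
      calc_data_cov: true        # covariance estimation outsourced to "kite"
      interpolation: multilinear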
4 changes: 2 additions & 2 deletions src/covariance.py
@@ -398,7 +398,7 @@ def autocovariance(data):

def toeplitz_covariance(data, window_size):
"""
- Get Töplitz banded matrix for given data.
+ Get Toeplitz banded matrix for given data.
Returns
-------
@@ -413,7 +413,7 @@ def toeplitz_covariance(data, window_size):

def non_toeplitz_covariance(data, window_size):
"""
- Get scaled non- Töplitz covariance matrix, which may be able to account
+ Get scaled non- Toeplitz covariance matrix, which may be able to account
for non-stationary data-errors.
Parameters
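
To illustrate the idea behind the banded Toeplitz covariance, here is a minimal standalone sketch (not BEAT's implementation): the empirical autocovariance of the noise fills the diagonals up to window_size lags, and scipy assembles the symmetric Toeplitz matrix::

    import numpy as num
    from scipy.linalg import toeplitz

    def toeplitz_covariance_sketch(data, window_size):
        """Banded Toeplitz covariance from the empirical autocovariance."""
        n = data.size
        demeaned = data - data.mean()
        # biased autocovariance estimate for lags 0 .. n-1
        acov = num.correlate(demeaned, demeaned, mode='full')[n - 1:] / n
        # keep only the first window_size lags -> banded structure
        column = num.zeros(n)
        column[:window_size] = acov[:window_size]
        return toeplitz(column)  # symmetric, since r defaults to c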
2 changes: 1 addition & 1 deletion src/ffi.py
@@ -517,7 +517,7 @@ def starttimes2idxs(self, starttimes, interpolation='nearest_neighbor'):
starttimeidxs, starttimes : :class:`numpy.ndarray` or
:class:`theano.tensor.Tensor`, int16
(output depends on interpolation scheme,
- if multilinear interpolation factors are returned as well)
+     if multilinear interpolation factors are returned as well)
"""
if interpolation == 'nearest_neighbor':
return backends[self._mode].round(
12 changes: 6 additions & 6 deletions src/heart.py
@@ -2077,7 +2077,7 @@ def prepare_data(self, source, engine, outmode='array'):
arrival_times[i] = get_phase_arrival_time(
engine=engine, source=source,
target=target, wavename=self.name)

self._prepared_data = taper_filter_traces(
self.datasets,
arrival_taper=self.config.arrival_taper,
@@ -2195,7 +2195,7 @@ def get_station_correction_idxs(self, targets):
"""
Returns array of indexes to problem stations,
- Paremeters
+ Parameters
----------
targets : list
containing :class:`DynamicTarget` Objects
@@ -2341,8 +2341,8 @@ def init_datahandler(seismic_config, seismic_data_path='./'):
"""
Initialise datahandler.
- Parameter
- ---------
+ Parameters
+ ----------
seismic_config : :class:`config.SeismicConfig`
seismic_data_path : str
absolute path to the directory of the seismic data
@@ -2382,8 +2382,8 @@ def init_wavemap(waveformfit_config, datahandler=None, event=None):
relation to the seismic Phase of interest and allows individual
specifications.
- Parameter
- ---------
+ Parameters
+ ----------
waveformfit_config : :class:`config.WaveformFitConfig`
datahandler : :class:`DataWaveformCollection`
event : :class:`pyrocko.model.Event`
77 changes: 40 additions & 37 deletions src/parallel.py
@@ -256,7 +256,7 @@ def memshare(parameternames):
Parameters
----------
- paremeternames : list of str
+ parameternames : list of str
of names to :class:`theano.tensor.sharedvar.TensorSharedVariable`
"""
for paramname in parameternames:
@@ -284,16 +284,16 @@ def memshare_sparams(shared_params):
shared_params : list
of :class:`theano.tensor.sharedvar.TensorSharedVariable`
- Returns:
- --------
+ Returns
+ -------
memshared_instances : list
of :class:`multiprocessing.sharedctypes.RawArray`
list of sharedctypes (shared memory arrays) that point
to the memory used by the current process's Theano variable.
- Notes:
- ------
+ Notes
+ -----
Modified from:
https://github.com/JonathanRaiman/theano_lstm/blob/master/theano_lstm/shared_memory.py
@@ -327,16 +327,16 @@ def borrow_memory(shared_param, memshared_instance):
Spawn different processes with the shared memory
of your theano model's variables.
- Parameters:
- -----------
+ Parameters
+ ----------
shared_param : :class:`theano.tensor.sharedvar.TensorSharedVariable`
the Theano shared variable where
shared memory should be used instead.
memshared_instance : :class:`multiprocessing.RawArray`
the memory shared across processes (e.g.from `memshare_sparams`)
- Notes:
- ------
+ Notes
+ -----
Modified from:
https://github.com/JonathanRaiman/theano_lstm/blob/master/theano_lstm/shared_memory.py
@@ -346,32 +346,35 @@ def borrow_memory(shared_param, memshared_instance):
a list called "params". We loop through each theano shared variable and
call `borrow_memory` on it to share memory across processes.
- def spawn_model(path, wrapped_params):
-     # prevent recompilation and arbitrary locks
-     theano.config.reoptimize_unpickled_function = False
-     theano.gof.compilelock.set_lock_status(False)
-     # load your function from its pickled instance (from path)
-     myfunction = MyFunction.load(path)
-     # for each parameter in your function
-     # apply the borrow memory strategy to replace
-     # the internal parameter's memory with the
-     # across-process memory:
-     for param, memshared_instance in zip(
-             myfunction.get_shared(), memshared_instances):
-         borrow_memory(param, memory)
-     # acquire your dataset (either through some smart shared memory
-     # or by reloading it for each process)
-     dataset, dataset_labels = acquire_dataset()
-     # then run your model forward in this process
-     epochs = 20
-     for epoch in range(epochs):
-         model.update_fun(dataset, dataset_labels)
+ Examples
+ --------
+ def spawn_model(path, wrapped_params):
+     # prevent recompilation and arbitrary locks
+     theano.config.reoptimize_unpickled_function = False
+     theano.gof.compilelock.set_lock_status(False)
+     # load your function from its pickled instance (from path)
+     myfunction = MyFunction.load(path)
+     # for each parameter in your function
+     # apply the borrow memory strategy to replace
+     # the internal parameter's memory with the
+     # across-process memory:
+     for param, memshared_instance in zip(
+             myfunction.get_shared(), wrapped_params):
+         borrow_memory(param, memshared_instance)
+     # acquire your dataset (either through some smart shared memory
+     # or by reloading it for each process)
+     dataset, dataset_labels = acquire_dataset()
+     # then run your model forward in this process
+     epochs = 20
+     for epoch in range(epochs):
+         myfunction.update_fun(dataset, dataset_labels)
See `borrow_all_memories` for list usage.
"""

logger.debug('%s' % shared_param.name)
param_value = num.frombuffer(memshared_instance)
param_value.shape = shared_param.get_value(True, True).shape
@@ -383,8 +386,8 @@ def borrow_all_memories(shared_params, memshared_instances):
Run theano_borrow_memory on a list of params and shared memory
sharedctypes.
- Parameters:
- -----------
+ Parameters
+ ----------
shared_params : list
of :class:`theano.tensor.sharedvar.TensorSharedVariable`
the Theano shared variable where
Expand All @@ -393,8 +396,8 @@ def borrow_all_memories(shared_params, memshared_instances):
of :class:`multiprocessing.RawArray`
the memory shared across processes (e.g.from `memshare_sparams`)
- Notes:
- ------
+ Notes
+ -----
Same as `borrow_memory` but for lists of shared memories and
theano variables. See `borrow_memory`
"""
8 changes: 3 additions & 5 deletions src/utility.py
@@ -147,7 +147,7 @@ def d2l(self, dpt):
Returns
-------
- point
+ lpoint
"""

a_list = copy.copy(self.list_arrays)
@@ -173,7 +173,7 @@ def l2d(self, a_list):
Returns
-------
- point
+ :class:`pymc3.model.Point`
"""
point = {}

@@ -280,9 +280,7 @@ def weed_input_rvs(input_rvs, mode, datatype):
Parameters
----------
input_rvs : dict
of :class:`pymc3.Distribution`
or set
of variable names
of :class:`pymc3.Distribution` or set of variable names
mode : str
'geometry', 'static', 'kinematic', 'interseismic' determining the
discarded RVs
