updates to MT tutorial, fixes in config
hvasbath committed Aug 23, 2018
1 parent c6773be commit b86e9e3
Showing 14 changed files with 90 additions and 77 deletions.
2 changes: 1 addition & 1 deletion data/examples/FullMT/config_geometry.yaml
@@ -101,7 +101,7 @@ problem_config: !beat.ProblemConfig
lower: [-1.4142135623730951]
upper: [1.4142135623730951]
testvalue: [0.84551376]
- moment: !beat.heart.Parameter
+ magnitude: !beat.heart.Parameter
name: magnitude
form: Uniform
lower: [4.5]
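
For reference, the renamed prior block then reads roughly as below; the upper bound and test value shown here are hypothetical, since the hunk is truncated after the lower bound::

    magnitude: !beat.heart.Parameter
      name: magnitude
      form: Uniform
      lower: [4.5]
      upper: [6.5]      # hypothetical value, truncated in this hunk
      testvalue: [5.5]  # hypothetical value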
Binary file modified docs/_static/FullMT_data.png
Binary file added docs/_static/FullMT_windowed.png
4 changes: 2 additions & 2 deletions docs/api.rst
@@ -44,14 +44,14 @@ The :mod:`sampler` Module


The :mod:`ffi` Module
- ------------------------
+ ---------------------

.. automodule:: ffi
:members:


The :mod:`parallel` Module
- ------------------------
+ --------------------------

.. automodule:: parallel
:members:
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -39,7 +39,8 @@
# 'sphinx.ext.mathjax',
'sphinx.ext.viewcode',
'sphinx.ext.doctest',
- 'numpydoc'
+ 'sphinx.ext.napoleon'
+ # 'numpydoc'
]

# Add any paths that contain templates here, relative to this directory.
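
For context, a minimal sketch of the resulting Sphinx setup (assuming an otherwise standard conf.py): napoleon ships with Sphinx itself, so NumPy-style docstrings render without the external numpydoc dependency::

    extensions = [
        'sphinx.ext.viewcode',
        'sphinx.ext.doctest',
        'sphinx.ext.napoleon',
        # 'numpydoc'
    ]
    napoleon_numpy_docstring = True  # parse NumPy-style sections (the default)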
37 changes: 24 additions & 13 deletions docs/examples.rst
@@ -186,9 +186,8 @@ This looks reasonably well!

.. image:: _static/fomosto_traces_snuffler.png


- Sample the solution space
- ^^^^^^^^^^^^^^^^^^^^^^^^^
+ Data windowing and optimization setup
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once we are confident that the GFs are reasonable we may continue to define the optimization-specific setup variables.
First of all, we check again the WaveformFitConfig for the waves we want to optimize.
In this case we want to optimize the whole waveform from P until the end of the surface waves.
@@ -216,7 +215,9 @@ Finally, we fix the depth prior to 8km (upper and lower) as we only calculated G
testvalue: [8.0]

Of course, in a real case this would not be fixed.
- Also we may inspect the data::

+ The specifications of the tapers, filters and channels that the user defined above determine which part of the data traces are used in the course of the optimization.
+ We may inspect the raw data together with the processed data that is going to be used in the optimization with ::

beat check FullMT --what='traces'

@@ -225,11 +226,21 @@ A detailed tutorial about handling the browser is given `here <https://pyrocko.

.. image:: _static/FullMT_data.png

- For example you can also right click in the window and see a menu that pops up. There you can select sort the traces by distance or azimuth to get a better picture of the setup.
+ To better sort the displayed traces and to inspect the processed data we may use snuffler's display options.
+ Please right-click in the window to open a menu. There, select "Sort by Distance" to order the traces by distance and get a better picture of the setup. To see which traces actually belong to the same station and component, open the menu again and select "Subsort by Network, Station, Channel (Grouped by Location)" and "Common Scale per Component". To distinguish better between overlapping traces, also select "Color Traces" and deselect "Show Boxes".
+ Your display should look something like this.

+ .. image:: _static/FullMT_windowed.png

+ In red we see the raw data traces as stored under $project_directory/seismic_data.pkl, and in blue the processed data to which the WaveformFitConfig parameters (see above) have been applied. The blue traces are going to be used in this form throughout the optimization. For this setup we are good, but for future problems the user may adjust the configuration and repeatedly check whether the data windowing is satisfactory. For example, the user may widen the arrival_taper times to make sure that a respective wave train is completely included in case it is cut at the taper boundary. Or, in case of noisy or bad-quality data, a station may be excluded completely by putting its name in the "blacklist" parameter.
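
To illustrate the knobs mentioned above, a hypothetical WaveformFitConfig excerpt (all values made up; the arrival_taper times a-d are in seconds relative to the phase arrival, and the exact field layout is an assumption)::

    waveforms:
    - !beat.WaveformFitConfig
      name: any_P
      blacklist: [II.BFO]       # hypothetical: exclude a noisy station
      channels: [Z]
      filterer: !beat.heart.Filter
        lower_corner: 0.01
        upper_corner: 0.1
        order: 4
      arrival_taper: !beat.heart.ArrivalTaper
        a: -15.0                # taper start before the arrival
        b: -10.0
        c: 50.0                 # widen c and d to include the full wave train
        d: 55.0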

+ Now that we checked the optimization setup we are good to go.

- Firstly, we fix the source parameters to some random value and only optimize for the hyperparameters (HPs).
- How many different random source parameters are choosen and the sampling repeated is determined by the hyper_sampler_config parameters 'n_stages' (default:5) ::
+ Sample the solution space
+ ^^^^^^^^^^^^^^^^^^^^^^^^^

+ Firstly, we fix the source parameters to some random value and only optimize for the noise scaling or hyperparameters (HPs).
+ How many different random source parameters are chosen, and how often the sampling is repeated, is determined by the hyper_sampler_config parameter 'n_chains' (default: 20). In case several CPUs are available, the 'n_jobs' parameter determines how many processes (Markov chains) are sampled in parallel. You may want to increase that now! To start the sampling please run ::

beat sample FullMT --hypers
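
The parameters named above live in the hyper_sampler_config section of the configuration file. A sketch with assumed nesting and values (only 'n_chains' and 'n_jobs' are taken from the text)::

    hyper_sampler_config: !beat.SamplerConfig
      name: Metropolis
      parameters: !beat.MetropolisConfig
        n_jobs: 4               # Markov chains sampled in parallel
        n_chains: 20            # random source points to evaluate (default)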

@@ -240,15 +251,15 @@ the HPs parameter bounds show something like::
h_any_P_T: !beat.heart.Parameter
name: h_any_P_T
form: Uniform
- lower: [-4.0]
- upper: [5.0]
- testvalue: [0.5]
+ lower: [-3.0]
+ upper: [3.0]
+ testvalue: [0.0]
h_any_P_Z: !beat.heart.Parameter
name: h_any_P_Z
form: Uniform
- lower: [-4.0]
- upper: [5.0]
- testvalue: [0.5]
+ lower: [-3.0]
+ upper: [2.0]
+ testvalue: [-0.5]

At this point the bounds could be relaxed again by manually editing the configuration file, or this step could be skipped entirely.
Now that we have an initial guess on the hyperparameters we can run the optimization using the default sampling algorithm, a Sequential Monte Carlo sampler.
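
By analogy with the hypers run above, the full sampling would then presumably be started with ::

    beat sample FullMT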
4 changes: 2 additions & 2 deletions docs/getting_started.rst
@@ -66,8 +66,8 @@ For example to optimize for a Full Moment Tensor for the Landers EQ by using sei

This will create a project directory called LandersEQ in the current directory.
Within the directory you will see that two files have been created:
- - BEAT_log.txt
- - geometry_config.yaml
+ - BEAT_log.txt
+ - geometry_config.yaml

The first file is a log file to which all the executed commands and outputs are written. In case something goes wrong this log file helps to find the error.
For now it contains::
6 changes: 3 additions & 3 deletions src/backend.py
@@ -686,7 +686,7 @@ def point(self, idx):
Returns
-------
- dictionary of point values
+ dict : of point values
"""
ipoint = self._index.point(idx)
return self._straces[ipoint['k']].point(ipoint['k_idx'])
@@ -777,8 +777,8 @@ def get_highest_sampled_stage(homedir, return_final=False):
"""
Return stage number of stage that has been sampled before the final stage.
- Paramaeters
- -----------
+ Parameters
+ ----------
homedir : str
Directory to the sampled stage results
8 changes: 4 additions & 4 deletions src/config.py
@@ -460,16 +460,16 @@ class GeodeticConfig(Object):
blacklist = List.T(
String.T(),
optional=True,
- default=['placeholder'],
- help='Station name for station to be thrown out.')
+ default=[],
+ help='GPS station name or scene name to be thrown out.')
types = List.T(
String.T(),
default=['SAR'],
help='Types of geodetic data, i.e. SAR, GPS, ...')
calc_data_cov = Bool.T(
default=True,
- help='Flag for calculating the data covariance matrix based on the'
- ' pre P arrival data trace noise.')
+ help='Flag for calculating the data covariance matrix, '
+ 'outsourced to "kite"')
interpolation = StringChoice.T(
choices=['nearest_neighbor', 'multilinear'],
default='multilinear',
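
Assembled from the defaults above, the corresponding section of a project configuration would look roughly like this (the !beat tag and nesting are assumptions)::

    geodetic_config: !beat.GeodeticConfig
      blacklist: []              # GPS station or scene names to exclude
      types: [SAR]
      calc_data_cov: true        # covariance estimation outsourced to "kite"
      interpolation: multilinear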
4 changes: 2 additions & 2 deletions src/covariance.py
@@ -398,7 +398,7 @@ def autocovariance(data):

def toeplitz_covariance(data, window_size):
"""
- Get Töplitz banded matrix for given data.
+ Get Toeplitz banded matrix for given data.
Returns
-------
@@ -413,7 +413,7 @@ def toeplitz_covariance(data, window_size):

def non_toeplitz_covariance(data, window_size):
"""
- Get scaled non- Töplitz covariance matrix, which may be able to account
+ Get scaled non- Toeplitz covariance matrix, which may be able to account
for non-stationary data-errors.
Parameters
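
To illustrate the idea behind the banded Toeplitz covariance, here is a minimal standalone sketch (not BEAT's implementation): the empirical autocovariance of the noise fills the diagonals up to window_size lags, and scipy assembles the symmetric Toeplitz matrix::

    import numpy as num
    from scipy.linalg import toeplitz

    def toeplitz_covariance_sketch(data, window_size):
        """Banded Toeplitz covariance from the empirical autocovariance."""
        n = data.size
        demeaned = data - data.mean()
        # biased autocovariance estimate for lags 0 .. n-1
        acov = num.correlate(demeaned, demeaned, mode='full')[n - 1:] / n
        # keep only the first window_size lags -> banded structure
        column = num.zeros(n)
        column[:window_size] = acov[:window_size]
        return toeplitz(column)  # symmetric, since r defaults to c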
2 changes: 1 addition & 1 deletion src/ffi.py
@@ -517,7 +517,7 @@ def starttimes2idxs(self, starttimes, interpolation='nearest_neighbor'):
starttimeidxs, starttimes : :class:`numpy.ndarray` or
:class:`theano.tensor.Tensor`, int16
(output depends on interpolation scheme,
- if multilinear interpolation factors are returned as well)
+     if multilinear interpolation factors are returned as well)
"""
if interpolation == 'nearest_neighbor':
return backends[self._mode].round(
12 changes: 6 additions & 6 deletions src/heart.py
@@ -2077,7 +2077,7 @@ def prepare_data(self, source, engine, outmode='array'):
arrival_times[i] = get_phase_arrival_time(
engine=engine, source=source,
target=target, wavename=self.name)

self._prepared_data = taper_filter_traces(
self.datasets,
arrival_taper=self.config.arrival_taper,
@@ -2195,7 +2195,7 @@ def get_station_correction_idxs(self, targets):
"""
Returns array of indexes to problem stations,
- Paremeters
+ Parameters
----------
targets : list
containing :class:`DynamicTarget` Objects
@@ -2341,8 +2341,8 @@ def init_datahandler(seismic_config, seismic_data_path='./'):
"""
Initialise datahandler.
- Parameter
- ---------
+ Parameters
+ ----------
seismic_config : :class:`config.SeismicConfig`
seismic_data_path : str
absolute path to the directory of the seismic data
@@ -2382,8 +2382,8 @@ def init_wavemap(waveformfit_config, datahandler=None, event=None):
relation to the seismic Phase of interest and allows individual
specifications.
- Parameter
- ---------
+ Parameters
+ ----------
waveformfit_config : :class:`config.WaveformFitConfig`
datahandler : :class:`DataWaveformCollection`
event : :class:`pyrocko.model.Event`
77 changes: 40 additions & 37 deletions src/parallel.py
@@ -256,7 +256,7 @@ def memshare(parameternames):
Parameters
----------
- paremeternames : list of str
+ parameternames : list of str
of names to :class:`theano.tensor.sharedvar.TensorSharedVariable`
"""
for paramname in parameternames:
@@ -284,16 +284,16 @@ def memshare_sparams(shared_params):
shared_params : list
of :class:`theano.tensor.sharedvar.TensorSharedVariable`
- Returns:
- --------
+ Returns
+ -------
memshared_instances : list
of :class:`multiprocessing.sharedctypes.RawArray`
list of sharedctypes (shared memory arrays) that point
to the memory used by the current process's Theano variable.
- Notes:
- ------
+ Notes
+ -----
Modified from:
https://github.com/JonathanRaiman/theano_lstm/blob/master/theano_lstm/shared_memory.py
@@ -327,16 +327,16 @@ def borrow_memory(shared_param, memshared_instance):
Spawn different processes with the shared memory
of your theano model's variables.
- Parameters:
- -----------
+ Parameters
+ ----------
shared_param : :class:`theano.tensor.sharedvar.TensorSharedVariable`
the Theano shared variable where
shared memory should be used instead.
memshared_instance : :class:`multiprocessing.RawArray`
the memory shared across processes (e.g.from `memshare_sparams`)
- Notes:
- ------
+ Notes
+ -----
Modified from:
https://github.com/JonathanRaiman/theano_lstm/blob/master/theano_lstm/shared_memory.py
@@ -346,32 +346,35 @@ def borrow_memory(shared_param, memshared_instance):
a list called "params". We loop through each theano shared variable and
call `borrow_memory` on it to share memory across processes.
- def spawn_model(path, wrapped_params):
-     # prevent recompilation and arbitrary locks
-     theano.config.reoptimize_unpickled_function = False
-     theano.gof.compilelock.set_lock_status(False)
-     # load your function from its pickled instance (from path)
-     myfunction = MyFunction.load(path)
-     # for each parameter in your function
-     # apply the borrow memory strategy to replace
-     # the internal parameter's memory with the
-     # across-process memory:
-     for param, memshared_instance in zip(
-             myfunction.get_shared(), memshared_instances):
-         borrow_memory(param, memory)
-     # acquire your dataset (either through some smart shared memory
-     # or by reloading it for each process)
-     dataset, dataset_labels = acquire_dataset()
-     # then run your model forward in this process
-     epochs = 20
-     for epoch in range(epochs):
-         model.update_fun(dataset, dataset_labels)
+ Examples
+ --------
+ def spawn_model(path, wrapped_params):
+     # prevent recompilation and arbitrary locks
+     theano.config.reoptimize_unpickled_function = False
+     theano.gof.compilelock.set_lock_status(False)
+     # load your function from its pickled instance (from path)
+     myfunction = MyFunction.load(path)
+     # for each parameter in your function
+     # apply the borrow memory strategy to replace
+     # the internal parameter's memory with the
+     # across-process memory:
+     for param, memshared_instance in zip(
+             myfunction.get_shared(), wrapped_params):
+         borrow_memory(param, memshared_instance)
+     # acquire your dataset (either through some smart shared memory
+     # or by reloading it for each process)
+     dataset, dataset_labels = acquire_dataset()
+     # then run your model forward in this process
+     epochs = 20
+     for epoch in range(epochs):
+         myfunction.update_fun(dataset, dataset_labels)
See `borrow_all_memories` for list usage.
"""

logger.debug('%s' % shared_param.name)
param_value = num.frombuffer(memshared_instance)
param_value.shape = shared_param.get_value(True, True).shape
@@ -383,8 +386,8 @@ def borrow_all_memories(shared_params, memshared_instances):
Run theano_borrow_memory on a list of params and shared memory
sharedctypes.
- Parameters:
- -----------
+ Parameters
+ ----------
shared_params : list
of :class:`theano.tensor.sharedvar.TensorSharedVariable`
the Theano shared variable where
Expand All @@ -393,8 +396,8 @@ def borrow_all_memories(shared_params, memshared_instances):
of :class:`multiprocessing.RawArray`
the memory shared across processes (e.g.from `memshare_sparams`)
- Notes:
- ------
+ Notes
+ -----
Same as `borrow_memory` but for lists of shared memories and
theano variables. See `borrow_memory`
"""
8 changes: 3 additions & 5 deletions src/utility.py
@@ -147,7 +147,7 @@ def d2l(self, dpt):
Returns
-------
- point
+ lpoint
"""

a_list = copy.copy(self.list_arrays)
@@ -173,7 +173,7 @@ def l2d(self, a_list):
Returns
-------
- point
+ :class:`pymc3.model.Point`
"""
point = {}

@@ -280,9 +280,7 @@ def weed_input_rvs(input_rvs, mode, datatype):
Parameters
----------
input_rvs : dict
of :class:`pymc3.Distribution`
or set
of variable names
of :class:`pymc3.Distribution` or set of variable names
mode : str
'geometry', 'static', 'kinematic', 'interseismic' determining the
discarded RVs
