NetEmd with large locations  #121

Open
@andeElliott

Description

If the histogram locations have very large values, calling the functions in different orders can give different answers. For example, in theory all of the following should be equivalent (assuming the same distribution):

min_emd of two variance 1 histograms (i.e. calling normalise_hist_variance) using optimise
net_emd of two variance 1 histograms (i.e. calling normalise_hist_variance) using optimise

min_emd of two variance 1 histograms (i.e. calling normalise_hist_variance) using optimiseRonly
net_emd of two variance 1 histograms (i.e. calling normalise_hist_variance) using optimiseRonly

min_emd of two mean 0 variance 1 histograms (i.e. calling normalise_hist_variance) using optimise
net_emd of two mean 0 variance 1 histograms (i.e. calling normalise_hist_variance) using optimise

min_emd of two mean 0 variance 1 histograms (i.e. calling normalise_hist_variance) using optimiseRonly
net_emd of two mean 0 variance 1 histograms (i.e. calling normalise_hist_variance) using optimiseRonly

However, floating-point error in the variance normalisation means these can disagree in practice when the locations are large. Importantly, it is not always obvious which of the methods will fail.
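To make the failure mode concrete, here is a minimal Python sketch of the effect, not the package's own code. The helper names (`weighted_mean`, `weighted_variance`, `normalise_variance_keep_mean`, `normalise_variance_mean_centred`) are hypothetical, and the sketch assumes that variance normalisation rescales locations around the mean while keeping the mean where it is. With locations near 1e15 the spacing between adjacent doubles is 0.125, so the rescaled locations are rounded coarsely and the result no longer has variance 1.

```python
import math

def weighted_mean(locations, masses):
    # fsum avoids accumulating rounding error in the sums themselves
    return math.fsum(m * x for x, m in zip(locations, masses)) / math.fsum(masses)

def weighted_variance(locations, masses):
    mu = weighted_mean(locations, masses)
    return math.fsum(m * (x - mu) ** 2 for x, m in zip(locations, masses)) / math.fsum(masses)

def normalise_variance_keep_mean(locations, masses):
    # Hypothetical: rescale to variance 1 but leave the mean in place
    mu = weighted_mean(locations, masses)
    sd = math.sqrt(weighted_variance(locations, masses))
    return [mu + (x - mu) / sd for x in locations]

def normalise_variance_mean_centred(locations, masses):
    # Hypothetical: rescale to variance 1 and move the mean to 0
    mu = weighted_mean(locations, masses)
    sd = math.sqrt(weighted_variance(locations, masses))
    return [(x - mu) / sd for x in locations]

masses = [1.0] * 5
small = [0.0, 1.0, 2.0, 3.0, 4.0]
large = [1e15 + x for x in small]  # same shape, shifted by a huge offset

var_small_keep = weighted_variance(normalise_variance_keep_mean(small, masses), masses)
var_large_keep = weighted_variance(normalise_variance_keep_mean(large, masses), masses)
var_large_centred = weighted_variance(normalise_variance_mean_centred(large, masses), masses)

print(var_small_keep, var_large_keep, var_large_centred)
```

In this sketch the two histograms have exactly the same shape, yet only the large-offset one ends up with a variance visibly different from 1 when the mean is kept; mean centring before rescaling removes the problem.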

There are three solutions:

1.) Leave it as is. In theory the user should not be normalising histograms, and this is an edge case caused by big numbers.

2.) Mean centre when normalising by variance. It is quite arbitrary that we fix the mean to the same value anyway.

3.) Change the dhist object to have a mean offset so that we get the best of both worlds, but this will require a non-trivial adjustment.
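As a rough illustration of option 3, the following Python sketch stores the mean as a separate offset and keeps the locations centred, so that rescaling only ever touches small numbers. The `OffsetHist` class and its fields are hypothetical and are not the existing dhist structure; this is only one way the idea could look.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetHist:
    # Hypothetical layout: true location = offset + centred location
    offset: float
    centred_locations: tuple
    masses: tuple

    @classmethod
    def from_locations(cls, locations, masses):
        mean = math.fsum(m * x for x, m in zip(locations, masses)) / math.fsum(masses)
        return cls(mean, tuple(x - mean for x in locations), tuple(masses))

    def variance(self):
        # Centred locations have (approximately) zero mean by construction
        total = math.fsum(self.masses)
        return math.fsum(m * c ** 2 for c, m in zip(self.centred_locations, self.masses)) / total

    def normalise_variance(self):
        # Rescale only the centred part; the large offset is never mixed in
        sd = math.sqrt(self.variance())
        return OffsetHist(self.offset, tuple(c / sd for c in self.centred_locations), self.masses)

hist = OffsetHist.from_locations([1e15 + x for x in range(5)], [1.0] * 5)
normalised = hist.normalise_variance()
print(normalised.offset, normalised.variance())
```

The trade-off matches the note above: every function that reads locations would need to know about the offset, which is why the change is not small.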
