Skip to content

should result of nest_lc() and fit_lc() be more consistent? #30

Closed
@kbarbary

Description

@kbarbary

nest_lc and fit_lc both return result dictionaries with a lot of similar fields (e.g. param_names, errors, covariance). However, there are some inconsistencies:

  • In fit_lc(), param_names contains all parameters of the model, while in nest_lc() it only contains the varied parameters. (In fit_lc(), cov_names is used to denote the varied parameters.)
  • fit_lc() has a parameters array with all the parameters, while nest_lc() has a param_dict dictionary with all the parameters.

This inconsistency arises because the primary quantity of interest returned by nest_lc() is the samples array, and param_names indicates what the columns of that array are. It seems natural that samples only contains varied parameters.

Activity

rbiswas4

rbiswas4 commented on Jun 23, 2014

@rbiswas4
Member

Yes, it would be good to have consistency such that fit_lc() and nest_lc() results in terms of the mean/max likelihood arguments and covariances are accessible in the same way for possible studies.

Thinking of possible uses of these variables:

  1. Study population statistics of SN in a sample. This could also be used to decide on appropriate distributions for simulations. In this case, we would like to retrieve all the parameters, whether varied or fixed (presumably because we have other information that fixes them).
  2. Cosmology analysis: Here we would like to have the covariance between z, mB, x1, c which is what would give the contribution to the error on μ coming from the light curve fit. As of now, we are computing this independently, directly from the samples. (since mB is not included in the covariance).

Maybe there are more, and we should think about the operations required for those.

If these are the only operations is it good to return all parameters and their covariances (including 0 for fixed parameters)? The next stage could select the parameters necessary for cosmology and build up for the most general case (all varying?)

rrgupta

rrgupta commented on Jun 23, 2014

@rrgupta

Some things I just noticed:

  1. nest_lc():
    a) res.param_dict = dictionary
    b) res.errors = 1-D array (though documentation says dictionary)
  2. fit_lc():
    a) res.parameters = 1-D array
    b) res.errors = ordered dictionary

I think for consistency it would be great if for both nest_lc() and fit_lc() the fit parameters and their associated errors were both dictionaries. This would make things much simpler for the user and would allow one to easily swap a call to "nest_lc" for "fit_lc" or vice-versa without having to also change how the results are handled throughout the script.

For the near term, I think something has to be done with 1b) since as is, it disagrees with the documentation.

kbarbary

kbarbary commented on Jun 24, 2014

@kbarbary
MemberAuthor

nest_lc() formerly returned a dictionary; it was erroneously changed in a recent commit. So in this case, it is the code that is wrong. I'll fix this.

If we further change nest_lc() to match fit_lc() we'd have param_names include all the parameters (including fixed ones) and introduce cov_names for the varied parameters. Then cov_names would match up to the columns of samples.

To me this seems a bit distasteful: In my mind, the main point of nest_lc is to return the sample array, not the best-fit values, errors and covariances (which are only value-added post-processing on the samples), and having the column names in a variable called cov_names seems strange. Perhaps the problem is that the name cov_names is a poor choice to begin with. Really need something like varied_param_names.... but shorter. varparam_names? vparam_names?

rbiswas4

rbiswas4 commented on Jun 24, 2014

@rbiswas4
Member

If we further change nest_lc() to match fit_lc() we'd have param_names include all the parameters (including fixed ones) and introduce cov_names for the varied parameters. Then cov_names would match up to the columns of samples.

To me this seems a bit distasteful: In my mind, the main point of nest_lc is to return the sample array, not the best-fit values, errors and covariances (which are only value-added post-processing on the samples), and having the column names in a variable called cov_names seems strange. Perhaps the problem is that the name cov_names is a poor choice to begin with. Really need something like varied_param_names.... but shorter. varparam_names? vparam_names?"

I agree that the job of a sampling algorithm is to return samples and not other things. Unfortunately, as of now, we are not using the power of the samples, instead sticking to a summary of samples in terms of the mean and covariances for cosmology. While, hopefully, this situation will change in future, it is entirely possible that the current method would be good enough for many purposes. So, I believe that consistency is going to be very useful.

Perhaps, a better way of dealing with these is to leave nest_lc() alone and have its primary function be returning samples. In fact, maybe think of turning off the covariance/"best fit" parameters (issues with backwards compatibilty :( ?) . And have two functions get_samplesummary(samples)
return these quantities. I would like us to try out samplers different from nest_lc(),
and so all samplers can use the get_samplesummary(samples) function.

Finally, we should work towards a utility get_SNsummary which takes the data, model and a method (fit_lc/algorithm, nest_lc, MH, etc.) and return the same quantities ("best fit", "covariance") for easy comparison of methods.

kbarbary

kbarbary commented on Jun 28, 2014

@kbarbary
MemberAuthor

After talking with @rbiswas4 we're thinking it would be best if nest_lc just returned parameter samples in its result, and we had a separate public function with a name like sample_statistics() or process_samples(). This function would calculate "best-fit" values, "errors" and covariance of the samples.

We still need to have consistent names for things across the functions. Right now I'm thinking of using the following:

  • vparam_names means "list of parameters to be varied / that were varied"
  • param_names means "all the model parameters"

But not totally sure yet.

added this to the v0.5.0 milestone on Jun 28, 2014
kbarbary

kbarbary commented on Jan 20, 2015

@kbarbary
MemberAuthor

I've backtracked a bit on this. In order to keep backwards compatibility, we can't have nest_lc only return samples. However, I've done as much as I think possible to make the results consistent.

Here's a list of the consistent fields now:

  • parameters: 1-d array of all parameter values, including fixed ones.
  • vparam_names: names of only varied parameters
  • covariance: 2-d array of covariance for only varied parameters
  • errors: OrderedDict of errors (just varied parameters)
  • ncall
  • ndof

The results of fit_lc also include param_names (list of all parameters, corresponding to model.param_names) whereas nest_lc does not. I would add it to nest_lc, but it would break backwards compatibility, since this field formerly was used for the varied parameter names.

rbiswas4

rbiswas4 commented on Jan 20, 2015

@rbiswas4
Member

I may be forgetting part of our discussion, but the motivation for consistency was to have a uniform set of operations to obtain the most commonly used results ie. summary statistics.

So, can we tackle this by leaving nest_lc and fit_lc alone as you suggest, but add a wrapper function::

results0 = nest_lc(...)
results1 = fit_lc(...)

where results0 and results1 are not completely consistent, but a wrapper function gives consistent results
get_summarystats(results0)
get_summarystats(results1)

In fact, I think we could do this by not actually including the wrapper function in the code (one person
will want to use mean(samples), another will want median(samples), etc.) but provide an example in the documentation.

kbarbary

kbarbary commented on Jan 20, 2015

@kbarbary
MemberAuthor

The results are now almost entirely consistent, so I don't really see the benefit of a function like get_summarystats.

We had discussed not having nest_lc calculate any sort of mean parameters. However, the return value is currently res, est_model where the latter is a copy of the model with the parameters set to mean values. Unless we get rid of this second return value (backwards incompatible!) there's really no point to getting rid of the summary stats in res.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      should result of `nest_lc()` and `fit_lc()` be more consistent? · Issue #30 · sncosmo/sncosmo