Closed
Description
nest_lc
and fit_lc
both return result dictionaries with a lot of similar fields (e.g. param_names
, errors
, covariance
). However, there are some inconsistencies:
- In
fit_lc()
,param_names
contains all parameters of the model, while innest_lc()
it only contains the varied parameters. (Infit_lc()
,cov_names
is used to denote the varied parameters.) fit_lc()
has aparameters
array with all the parameters, whilenest_lc()
has aparam_dict
dictionary with all the parameters.
This inconsistency arises because the primary quantity of interest returned by nest_lc()
is the samples
array, and param_names
indicates what the columns of that array are. It seems natural that samples
only contains varied parameters.
Metadata
Metadata
Assignees
Labels
No labels
Activity
rbiswas4 commentedon Jun 23, 2014
Yes, it would be good to have consistency such that fit_lc() and nest_lc() results in terms of the mean/max likelihood arguments and covariances are accessible in the same way for possible studies.
Thinking of possible uses of these variables:
Maybe there are more, and we should think about the operations required for those.
If these are the only operations is it good to return all parameters and their covariances (including 0 for fixed parameters)? The next stage could select the parameters necessary for cosmology and build up for the most general case (all varying?)
rrgupta commentedon Jun 23, 2014
Some things I just noticed:
a) res.param_dict = dictionary
b) res.errors = 1-D array (though documentation says dictionary)
a) res.parameters = 1-D array
b) res.errors = ordered dictionary
I think for consistency it would be great if for both nest_lc() and fit_lc() the fit parameters and their associated errors were both dictionaries. This would make things much simpler for the user and would allow one to easily swap a call to "nest_lc" for "fit_lc" or vice-versa without having to also change how the results are handled throughout the script.
For the near term, I think something has to be done with 1b) since as is, it disagrees with the documentation.
kbarbary commentedon Jun 24, 2014
nest_lc()
formerly returned a dictionary; it was erroneously changed in a recent commit. So in this case, it is the code that is wrong. I'll fix this.If we further change
nest_lc()
to matchfit_lc()
we'd haveparam_names
include all the parameters (including fixed ones) and introducecov_names
for the varied parameters. Thencov_names
would match up to the columns ofsamples
.To me this seems a bit distasteful: In my mind, the main point of
nest_lc
is to return the sample array, not the best-fit values, errors and covariances (which are only value-added post-processing on the samples), and having the column names in a variable calledcov_names
seems strange. Perhaps the problem is that the namecov_names
is a poor choice to begin with. Really need something likevaried_param_names
.... but shorter.varparam_names
?vparam_names
?rbiswas4 commentedon Jun 24, 2014
If we further change nest_lc() to match fit_lc() we'd have param_names include all the parameters (including fixed ones) and introduce cov_names for the varied parameters. Then cov_names would match up to the columns of samples.
To me this seems a bit distasteful: In my mind, the main point of nest_lc is to return the sample array, not the best-fit values, errors and covariances (which are only value-added post-processing on the samples), and having the column names in a variable called cov_names seems strange. Perhaps the problem is that the name cov_names is a poor choice to begin with. Really need something like varied_param_names.... but shorter. varparam_names? vparam_names?"
I agree that the job of a sampling algorithm is to return samples and not other things. Unfortunately, as of now, we are not using the power of the samples, instead sticking to a summary of samples in terms of the mean and covariances for cosmology. While, hopefully, this situation will change in future, it is entirely possible that the current method would be good enough for many purposes. So, I believe that consistency is going to be very useful.
Perhaps, a better way of dealing with these is to leave nest_lc() alone and have its primary function be returning samples. In fact, maybe think of turning off the covariance/"best fit" parameters (issues with backwards compatibilty :( ?) . And have two functions get_samplesummary(samples)
return these quantities. I would like us to try out samplers different from nest_lc(),
and so all samplers can use the get_samplesummary(samples) function.
Finally, we should work towards a utility get_SNsummary which takes the data, model and a method (fit_lc/algorithm, nest_lc, MH, etc.) and return the same quantities ("best fit", "covariance") for easy comparison of methods.
kbarbary commentedon Jun 28, 2014
After talking with @rbiswas4 we're thinking it would be best if
nest_lc
just returned parameter samples in its result, and we had a separate public function with a name likesample_statistics()
orprocess_samples()
. This function would calculate "best-fit" values, "errors" and covariance of the samples.We still need to have consistent names for things across the functions. Right now I'm thinking of using the following:
vparam_names
means "list of parameters to be varied / that were varied"param_names
means "all the model parameters"But not totally sure yet.
kbarbary commentedon Jan 20, 2015
I've backtracked a bit on this. In order to keep backwards compatibility, we can't have
nest_lc
only return samples. However, I've done as much as I think possible to make the results consistent.Here's a list of the consistent fields now:
parameters
: 1-d array of all parameter values, including fixed ones.vparam_names
: names of only varied parameterscovariance
: 2-d array of covariance for only varied parameterserrors
: OrderedDict of errors (just varied parameters)ncall
ndof
The results of
fit_lc
also includeparam_names
(list of all parameters, corresponding tomodel.param_names
) whereasnest_lc
does not. I would add it tonest_lc
, but it would break backwards compatibility, since this field formerly was used for the varied parameter names.rbiswas4 commentedon Jan 20, 2015
I may be forgetting part of our discussion, but the motivation for consistency was to have a uniform set of operations to obtain the most commonly used results ie. summary statistics.
So, can we tackle this by leaving nest_lc and fit_lc alone as you suggest, but add a wrapper function::
where results0 and results1 are not completely consistent, but a wrapper function gives consistent results
get_summarystats(results0)
get_summarystats(results1)
In fact, I think we could do this by not actually including the wrapper function in the code (one person
will want to use mean(samples), another will want median(samples), etc.) but provide an example in the documentation.
kbarbary commentedon Jan 20, 2015
The results are now almost entirely consistent, so I don't really see the benefit of a function like
get_summarystats
.We had discussed not having
nest_lc
calculate any sort of mean parameters. However, the return value is currentlyres, est_model
where the latter is a copy of the model with the parameters set to mean values. Unless we get rid of this second return value (backwards incompatible!) there's really no point to getting rid of the summary stats inres
.