Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Qvalue calculation is too conservative #171

Closed
2 tasks done
bvenn opened this issue Jan 18, 2022 · 3 comments · Fixed by #172
Closed
2 tasks done

[BUG] Qvalue calculation is too conservative #171

bvenn opened this issue Jan 18, 2022 · 3 comments · Fixed by #172
Assignees
Labels

Comments

@bvenn
Copy link
Member

bvenn commented Jan 18, 2022

Describe the bug
In FSharp.Stats.Testing.Multiple.Qvalues local FDRs are calculated and afterwards smoothed so that the q value of pi is the minimal FDR of all p values greater than pi.

While the local FDR calculation is correct, the smoothing does not take the minimal FDR of pvals greater than pi, but the maximal FDR of pvals lower than pi, which makes the computation more conservative as it must be.

image

Solution
Modify the bindby function accordingly.

  • fix bindby
  • add unit tests
@bvenn bvenn added the bug label Jan 18, 2022
@bvenn bvenn self-assigned this Jan 18, 2022
@bvenn
Copy link
Member Author

bvenn commented Jan 18, 2022

The issue is more complex than I thought. While for monotonic pvalues the strategy works, but if many identical pvalues exist, the sorting corrupts the q value smoothing. If many identical keys exist (pvalues), it is not clear which index to choose.

Reproduce


#r "nuget: Plotly.NET, 2.0.0-preview.16"
open Plotly.NET

let index = Array.init 10000 id
let testValues =
	[|
		[|1. .. 5000.|]
		Array.init 2000 (fun x-> 5000.)
		[|5001..8000|]
	|]
	|> Array.concat

testValues |> Array.indexed |> Chart.Point |> Chart.show
System.Array.Sort(testValues,index)
index |> Array.indexed |> Chart.Point |> Chart.show

image

Edit: When Seq.sort or List.sort is used instead of Array.Sort the problem seems to be solved.

bvenn added a commit that referenced this issue Jan 18, 2022
bvenn added a commit that referenced this issue Jan 18, 2022
@bvenn
Copy link
Member Author

bvenn commented Jan 18, 2022

The standard q value implementation is fixed. I decided to omit the bindBy function, since it reduces the readability and causes harm when the p value collection is too large. The monotonization of the q values is now packed within the respective function. Unit tests must be corrected and the Qvalues.ofPvaluesRobust requires further inspection of validity and proper documentation.

bvenn added a commit that referenced this issue Jan 19, 2022
@bvenn
Copy link
Member Author

bvenn commented Jan 19, 2022

The robust q value version has an additional term, that corrects small p values, especially when the number of tests is low. Its described in Storey, J.D. (2002), A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64: 479-498. https://doi.org/10.1111/1467-9868.00346 in function 9.

image

@bvenn bvenn mentioned this issue Jan 19, 2022
2 tasks
bvenn added a commit that referenced this issue Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant