-
Notifications
You must be signed in to change notification settings - Fork 15
/
Copy pathrsHRF_deleteoutliers.m
103 lines (98 loc) · 3.32 KB
/
rsHRF_deleteoutliers.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
function [b,idx,outliers] = rsHRF_deleteoutliers(a,alpha,rep);
% [B, IDX, OUTLIERS] = DELETEOUTLIERS(A, ALPHA, REP)
%
% For input vector A, returns a vector B with outliers (at the significance
% level alpha) removed. Also, optional output argument idx returns the
% indices in A of outlier values. Optional output argument outliers returns
% the outlying values in A.
%
% ALPHA is the significance level for determination of outliers. If not
% provided, alpha defaults to 0.05.
%
% REP is an optional argument that forces the replacement of removed
% elements with NaNs to presereve the length of a. (Thanks for the
% suggestion, Urs.)
%
% This is an iterative implementation of the Grubbs Test that tests one
% value at a time. In any given iteration, the tested value is either the
% highest value, or the lowest, and is the value that is furthest
% from the sample mean. Infinite elements are discarded if rep is 0, or
% replaced with NaNs if rep is 1 (thanks again, Urs).
%
% Appropriate application of the test requires that data can be reasonably
% approximated by a normal distribution. For reference, see:
% 1) "Procedures for Detecting Outlying Observations in Samples," by F.E.
% Grubbs; Technometrics, 11-1:1--21; Feb., 1969, and
% 2) _Outliers in Statistical Data_, by V. Barnett and
% T. Lewis; Wiley Series in Probability and Mathematical Statistics;
% John Wiley & Sons; Chichester, 1994.
% A good online discussion of the test is also given in NIST's Engineering
% Statistics Handbook:
% http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm
%
% ex:
% [B,idx,outliers] = deleteoutliers([1.1 1.3 0.9 1.2 -6.4 1.2 0.94 4.2 1.3 1.0 6.8 1.3 1.2], 0.05)
% returns:
% B = 1.1000 1.3000 0.9000 1.2000 1.2000 0.9400 1.3000 1.0000 1.3000 1.2000
% idx = 5 8 11
% outliers = -6.4000 4.2000 6.8000
%
% ex:
% B = deleteoutliers([1.1 1.3 0.9 1.2 -6.4 1.2 0.94 4.2 1.3 1.0 6.8 1.3 1.2
% Inf 1.2 -Inf 1.1], 0.05, 1)
% returns:
% B = 1.1000 1.3000 0.9000 1.2000 NaN 1.2000 0.9400 NaN 1.3000 1.0000 NaN 1.3000 1.2000 NaN 1.2000 NaN 1.1000
% Written by Brett Shoelson, Ph.D.
% shoelson@helix.nih.gov
% 9/10/03
% Modified 9/23/03 to address suggestions by Urs Schwartz.
% Modified 10/08/03 to avoid errors caused by duplicate "maxvals."
% (Thanks to Valeri Makarov for modification suggestion.)
if nargin == 1
alpha = 0.05;
rep = 0;
elseif nargin == 2
rep = 0;
elseif nargin == 3
if ~ismember(rep,[0 1])
error('Please enter a 1 or a 0 for optional argument rep.')
end
elseif nargin > 3
error('Requires 1,2, or 3 input arguments.');
end
if isempty(alpha)
alpha = 0.05;
end
b = a;
b(isinf(a)) = NaN;
%Delete outliers:
outlier = 1;
while outlier
tmp = b(~isnan(b));
meanval = mean(tmp);
maxval = tmp(find(abs(tmp-mean(tmp))==max(abs(tmp-mean(tmp)))));
maxval = maxval(1);
sdval = std(tmp);
tn = abs((maxval-meanval)/sdval);
critval = zcritical(alpha,length(tmp));
outlier = tn > critval;
if outlier
tmp = find(a == maxval);
b(tmp) = NaN;
end
end
if nargout >= 2
idx = find(isnan(b));
end
if nargout > 2
outliers = a(idx);
end
if ~rep
b=b(~isnan(b));
end
return
function zcrit = zcritical(alpha,n)
%ZCRIT = ZCRITICAL(ALPHA,N)
% Computes the critical z value for rejecting outliers (GRUBBS TEST)
tcrit = tinv(alpha/(2*n),n-2);
zcrit = (n-1)/sqrt(n)*(sqrt(tcrit^2/(n-2+tcrit^2)));