-
-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
na.fill long call times #259
Comments
Note that there is no |
That would be consistent, just curious why the zoo method is so much slower
than simply specifying the condition in brackets.
…On Wed, Jul 25, 2018 at 1:43 PM, Joshua Ulrich ***@***.***> wrote:
Note that there is no na.fill.xts() method, so zoo:::na.fill.zoo() is
dispatched. We can probably add an xts method, and only dispatch to the zoo
method if fill is not a scalar.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#259 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHKwqnDtyu2BaFf5gx1klzac3DjMVSF3ks5uKLxXgaJpZM4Vgcoi>
.
|
@ckatsulis it looks like a lot of the time is spent in creating symbol names from the |
The column names of a merge() result can be monstrous for raw, unnamed numeric objects. Something similar to dput() for the entire column is used, but zoo deparses the dput() output and limits the deparse length to ~60 characters. Use a modified version of makeNames() (defined in merge.zoo()) to make the column names. Major differences are: - Use substitute(alist(...)) to avoid evaluating '...' - Use deparse(x, nlines = 1L) to deparse only the first expression NOTE: this may break existing code that relies on behavior for integer objects. For example, current behavior is: x <- .xts(1:5, 1:5) lx <- list(x, x) do.call(merge, lx) X1.5 X1.5.1 1969-12-31 18:00:01 1 1 1969-12-31 18:00:02 2 2 1969-12-31 18:00:03 3 3 1969-12-31 18:00:04 4 4 1969-12-31 18:00:05 5 5 New behavior has column names: structure.1.5...Dim...c.5L..1L...index...structure.1.5..tclass...c..POSIXct... structure.1.5...Dim...c.5L..1L...index...structure.1.5..tclass...c.. POSIXct....1 Fixes #248. See #259.
Hi @ckatsulis, the fix for #248 improves performance for this case fairly dramatically. Though it's still not nearly as fast as x <- .xts(1:1e7, 1:1e7)
is.na(x) <- sample(1e7, 1e6)
microbenchmark::microbenchmark(na.fill(x, 0), times = 1)
# Unit: seconds
# expr min lq mean median uq max neval
# na.fill(x, 0) 1.944266 1.944266 1.944266 1.944266 1.944266 1.944266 1
y <- coredata(x)
microbenchmark::microbenchmark(na.fill(y, 0), times = 1)
# Unit: seconds
# expr min lq mean median uq max neval
# na.fill(y, 0) 6.566733 6.566733 6.566733 6.566733 6.566733 6.566733 1
microbenchmark::microbenchmark(X[is.na(X)] <- 0, times = 1, setup = X <- x)
# Unit: milliseconds
# expr min lq mean median uq max neval
# X[is.na(X)] <- 0 176.9659 176.9659 176.9659 176.9659 176.9659 176.9659 1 |
Performance using Rcpp ifelse()
|
Two orders of magnitude better is more than enough for me. Is na_fill
exported in the new xts?
…On Wed, Dec 12, 2018 at 11:29 AM harvey131 ***@***.***> wrote:
Performance using Rcpp ifelse()
library(xts)
Rcpp::cppFunction("
NumericMatrix na_fill(const NumericMatrix x, const double fill) {
NumericMatrix r = clone(x);
for (int j=0; j < r.ncol(); j++) {
r(_,j) = ifelse( is_na(r(_,j)), fill, r(_,j) );
}
return r;
}")
f1 = function(x, fill) {
x[is.na(x)] <- fill
x
}
x <- .xts(1:1e7, 1:1e7)is.na(x) <- sample(1e7, 1e6)
microbenchmark::microbenchmark(na.fill(x, 0),
na_fill(x, 0),
f1(x, 0),
times=100)
Unit: milliseconds
expr min lq mean median uq max neval
na.fill(x, 0) 1082.35488 1146.39916 1178.56886 1175.79914 1203.5146 1309.6379 100
na_fill(x, 0) 71.08533 76.49017 97.88448 89.81974 110.8743 166.9555 100
f1(x, 0) 69.06708 78.01604 98.97025 93.17469 105.8702 177.6743 100
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#259 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHKwqp_LxDNh7N6dHScaIaDp8N7vtM8-ks5u4T0MgaJpZM4Vgcoi>
.
|
@ckatsulis the code @harvey131 posted isn't in xts, let alone exported. I'm not likely to add a dependency on compiled code unless there's significant performance improvement over a native R solution. In this case, the Rcpp code is roughly the same as I was also hoping to make the change upstream in zoo, but time is a constraint, as always. |
Agreed. No need if the existing call is the same speed for all intents and purposes.
Chris
Sent from my iPhone
… On Dec 13, 2018, at 07:23, Joshua Ulrich ***@***.***> wrote:
@ckatsulis the code @harvey131 posted isn't in xts, let alone exported. I'm not likely to add a dependency on compiled code unless there's significant performance improvement over a native R solution. In this case, the Rcpp code is roughly the same as x[is.na(x)] <- 0.
I was also hoping to make the change upstream in zoo, but time is a constraint, as always.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Idea: |
na.fill()
takes much longer than other methods to fill NA values. I expected near constant time to find and fill NA values.Reproducible timings are below. I will send
testData
via email independently as it is too large for this forum.Session Info
The text was updated successfully, but these errors were encountered: