Wrong results when date-indexed xts with duplicate index values is subset by its own index #275
I did some more checking.
xts with as.Date test
So the issue is purely when indexing / selecting records within the xts object. Now if you look at the outcome of the xts with index compared to the input data, you can see that the first 2 records disappeared.
But the issue is not with the index as such. If I replace the index with the dates from Data$timestamp, the same issue occurs.
I also checked what happens if the timestamp is passed as POSIXct rather than as a Date. Then it works correctly, but only if there are no duplicate timestamps. As soon as you introduce a duplicate record, the issue appears. I will do some more tests later.
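To illustrate why the Date index contains duplicates while the raw timestamps do not, here is a minimal base-R sketch using two timestamps from the example data. The variable names are only for illustration; nothing here depends on xts.

```r
# Two distinct timestamps taken from the example data in this issue
stamps <- c("2014-08-14 20:00:00", "2014-08-14 22:00:00")

anyDuplicated(stamps)   # 0: the character timestamps are distinct

# Converting to Date drops the time of day, so both map to 2014-08-14
d <- as.Date(stamps)
anyDuplicated(d)        # 2: the second element duplicates the first

# Converting to POSIXct keeps the time of day, so no duplicates appear
ct <- as.POSIXct(stamps, tz = "UTC")
anyDuplicated(ct)       # 0
```

This is consistent with the observation above that POSIXct only shows the problem once a duplicate record is introduced.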
I have narrowed the issue down to the function debugging:
result of the objects:
When the NAs are removed from the firstlast object, the result is the issue described above.
Thanks for the detailed investigation @pverspeelt! The output below shows the results prior to introducing the
require(xts)
Data <- structure(
  list(
    timestamp = c("2013-03-06 01:00:00", "2014-07-06 21:00:00",
                  "2014-07-31 23:00:00", "2014-08-09 17:00:00",
                  "2014-08-14 20:00:00", "2014-08-14 22:00:00",
                  "2014-08-16 15:00:00", "2014-08-19 02:00:00",
                  "2014-12-28 18:00:00", "2015-01-17 17:00:00"),
    user = c(1, 2, 2, 3, 3, 3, 3, 3, 4, 4)),
  .Names = c("timestamp", "user"),
  row.names = c("220667", "331481", "422653", "629430", "378111", "646137",
                "558638", "151641", "599370", "482750"),
  class = "data.frame")
(x <- xts(Data$user, as.Date(Data$timestamp)))
# [,1]
# 2013-03-06 1
# 2014-07-06 2
# 2014-07-31 2
# 2014-08-09 3
# 2014-08-14 3
# 2014-08-14 3
# 2014-08-16 3
# 2014-08-19 3
# 2014-12-28 4
# 2015-01-17 4
x[index(x)] # Different (wrong) data. Why?
# [,1]
# 2013-03-06 1
# 2014-07-06 2
# 2014-07-31 2
# 2014-08-09 3
# 2014-08-14 3
# 2014-08-14 3
# 2014-08-14 3
# 2014-08-14 3
# 2014-08-16 3
# 2014-08-19 3
# 2014-12-28 4
# 2015-01-17 4
packageVersion("xts")
# [1] '0.11.0'
These results look correct. There are duplicate values for
In situations where behavior is unclear, I prefer to defer to consistent behavior with zoo. That's not possible in this case, because zoo does not attempt to handle objects with duplicate index values. For example:
z <- as.zoo(x)
z[index(z)]
# 2013-03-06 2014-07-06 2014-07-31 2014-08-09 2014-08-14 2014-08-14 2014-08-16
# 1 2 2 3 3 3 3
# 2014-08-19 2014-12-28 2015-01-17
# 3 4 4
# Warning message:
# In zoo(rval, index(x)[i]) :
# some methods for "zoo" objects do not work if the index entries in 'order.by' are not unique
We may consider what zoo does when you subset a zoo object with and
z[-5,][index(z)] # Remove one of the 2014-08-14 rows
# 2013-03-06 2014-07-06 2014-07-31 2014-08-09 2014-08-14 2014-08-16 2014-08-19
# 1 2 2 3 3 3 3
# 2014-12-28 2015-01-17
# 4 4
The result of subsetting an object with an 'i' that contains duplicates may be longer than 'i' and/or the object. But the code assumed the subset result would never be longer than the object index. This worked correctly prior to 1d707c5, when the fill_window_dups_rev() function was added.
Check for when the subset output length reaches the length of the initially allocated result object. Increase the length of the result object by twice the remaining elements in 'i' if we run out of space.
Fixes #275.
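The growth strategy described in the commit message can be sketched in base R. This is only an illustration under stated assumptions, not the actual fix, which lives in xts's compiled code: the helper name match_all_dups() is hypothetical, the matching is a plain linear scan, and the max() fallback for a single large batch of matches is an added safeguard.

```r
# Hypothetical helper, not part of xts: returns the positions in `index`
# that match each element of `i`, including every duplicate.
match_all_dups <- function(index, i) {
  out <- integer(length(index))   # initial allocation: one slot per index value
  n <- 0L                         # number of slots filled so far
  for (k in seq_along(i)) {
    hits <- which(index == i[k])  # all rows sharing this index value
    needed <- n + length(hits)
    if (needed > length(out)) {
      # Out of space: grow by twice the remaining elements in `i`
      # (assumption: grow further if one batch of hits needs more than that)
      remaining <- length(i) - k + 1L
      grow <- max(2L * remaining, needed - length(out))
      out <- c(out, integer(grow))
    }
    out[n + seq_along(hits)] <- hits
    n <- needed
  }
  out[seq_len(n)]                 # trim the unused tail
}

idx <- as.Date(c("2014-08-09", "2014-08-14", "2014-08-14", "2014-08-16"))
res <- match_all_dups(idx, idx)
res          # 1 2 3 2 3 4
length(res)  # 6, longer than both `i` and the index (4 each)
```

Each of the two 2014-08-14 values in 'i' matches both 2014-08-14 rows, so the result has six elements. That is why a result buffer sized to the object index is not enough.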
Given an xts object with a date index that contains duplicate dates, incorrect results are returned if the object is subset with its own date index.
Thanks to scs for their Stack Overflow question.