Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebase dev #96

Merged
merged 58 commits into from
Jan 4, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
58 commits
Select commit Hold shift + click to select a range
56813a7
Merge pull request #79 from roaldarbol/dev
roaldarbol Dec 7, 2024
57d6b36
Lots of new functions! Still rough
Dec 12, 2024
72cd715
Mostly docstrings and a few modifications.
Dec 12, 2024
db721a8
A few name changes
Dec 12, 2024
33f4790
Log y-axis on check_poses
Dec 12, 2024
b95b92a
Fix. Also silence ggplot
Dec 12, 2024
e39fc92
Add reference_keypoint to check_pose
Dec 12, 2024
622b533
Add doc and export to replace_na
Dec 12, 2024
143d137
Allow na_interpolation to return an unfiltered data frame with a warning
Dec 12, 2024
d817a39
Pathc to last commit
Dec 12, 2024
d692715
Another small patch
Dec 12, 2024
efcbe5b
Allow plotting of all NAs in check_na_timing
Dec 12, 2024
0118c58
Add min_obs parameter to smooth_movement
Dec 12, 2024
4d752ea
Hopefully improve speed of translate_coords_keypoint
Dec 13, 2024
9e47f16
lots of new functions and test data moved
Dec 14, 2024
d40e5e0
Just filename changes
Dec 14, 2024
906c0f1
Just docs and patches to ensure successful building
Dec 14, 2024
530e868
Add better read_trex docstring
Dec 14, 2024
eb9cb2b
Merge pull request #87 from roaldarbol/main
roaldarbol Dec 14, 2024
4b835ab
Expose and add documentation to set_individual and set_framerate
Dec 15, 2024
6a304fd
Add imports
Dec 15, 2024
91afc51
Expport set_ functions and update get_example_data
Dec 15, 2024
0177652
Add to NAMESPACE
Dec 15, 2024
c58f8ca
Fix time series plots when all values are NA
Dec 15, 2024
4904fdd
Tiny patch
Dec 15, 2024
be0c861
Add peak/trough detection
Dec 16, 2024
7111681
Great improvements the extrema detection functions. Also lots of test…
Dec 16, 2024
004aa31
Add movement classification
Dec 16, 2024
5a12cce
Export classification
Dec 16, 2024
2d5e44d
And add it to NAMESPACE
Dec 16, 2024
add149a
Merge branch 'everything_everywhere_all_at_once' of https://github.co…
Dec 16, 2024
6b71fbc
Fix bug in filter_by_speed
Dec 17, 2024
4f1e22e
Add NA tests
Dec 17, 2024
b547bfd
Bug fix for calculate_kinematics - added group_by keypoint and indivi…
Dec 17, 2024
679d30c
Fix set_framerate so it detects whether a frame rate has previously b…
Dec 17, 2024
a497269
Add bandwidth filters
Dec 19, 2024
d5b0ec6
Updates to the classification functions
Dec 19, 2024
414e90b
Changed method names in smooth_movement function
Dec 19, 2024
eb39902
Added NA testing for filter_by_speed
Dec 19, 2024
fe972ff
Add return_type parameter
Dec 19, 2024
1d35cfc
Add return_type parameter
Dec 19, 2024
4685c92
Merge branch 'everything_everywhere_all_at_once' of https://github.co…
Dec 19, 2024
d746502
Clean-up
Dec 19, 2024
ca26260
Patch
Dec 19, 2024
b05dca9
Another patch
Dec 19, 2024
6a1a08c
Patch again
Dec 19, 2024
65c696d
Improved bandwidth filters
Dec 20, 2024
18e7661
Adds Kalman filters
Dec 20, 2024
d9d1ca5
Add rotation of coordinates and egocentric transformation
Dec 21, 2024
ff8ebdc
Alignment of timeseries and classification w peak+trough
Dec 25, 2024
383aa8d
Improve detection of active periods
Dec 25, 2024
e0589f2
Adds replace_na functions and classify_low_periods
Dec 26, 2024
f8e440e
Filtering functions
Jan 3, 2025
d4b790d
Calculations
Jan 3, 2025
0492954
Filter NA functions
Jan 3, 2025
e64e9fa
The rest
Jan 3, 2025
0c2bba8
Update version
Jan 3, 2025
635e699
Merge pull request #94 from roaldarbol/everything_everywhere_all_at_once
roaldarbol Jan 3, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Great improvements the extrema detection functions. Also lots of test…
…ing.
  • Loading branch information
Mikkel Roald-Arbøl committed Dec 16, 2024
commit 711168114b13b534a989d97a25e19098d20021bc
330 changes: 280 additions & 50 deletions R/find_extrema.R
Original file line number Diff line number Diff line change
@@ -1,51 +1,296 @@
#' Find Peaks in Time Series Data
#'
#' @description
#' Identifies peaks (local maxima) in a numeric time series, with options to filter peaks based on
#' height and prominence. The function handles missing values (NA) appropriately and is compatible
#' with dplyr's mutate. Includes flexible handling of plateaus and adjustable window size for peak detection.
#'
#' @param x Numeric vector containing the time series data
#' @param min_height Minimum height threshold for peaks (default: -Inf)
#' @param min_prominence Minimum prominence threshold for peaks (default: 0)
#' @param plateau_handling String specifying how to handle plateaus. One of:
#' * "strict" (default): No points in plateau are peaks
#' * "middle": Middle point(s) of plateau are peaks
#' * "first": First point of plateau is peak
#' * "last": Last point of plateau is peak
#' * "all": All points in plateau are peaks
#' @param window_size Integer specifying the size of the window to use for peak detection (default: 3).
#' Must be odd and >= 3. Larger values detect peaks over wider ranges.
#'
#' @return A logical vector of the same length as the input where:
#' * `TRUE` indicates a confirmed peak
#' * `FALSE` indicates a non-peak
#' * `NA` indicates peak status could not be determined due to missing data
#'
#' @details
#' The function uses a sliding window algorithm for peak detection (window size specified by window_size parameter),
#' combined with a region-based prominence calculation method similar to that described in Palshikar (2009).
#'
#' # Peak Detection
#' A point is considered a peak if it is the highest point within its window (default window_size of 3
#' compares each point with its immediate neighbors). The first and last (window_size-1)/2 points in the
#' series cannot be peaks and are marked as NA. Larger window sizes will identify peaks that dominate over
#' a wider range, typically resulting in fewer peaks being detected.
#'
#' # Prominence
#' Prominence measures how much a peak stands out relative to its surrounding values.
#' It is calculated as the height of the peak minus the height of the highest minimum
#' between this peak and any higher peaks (or the end of the series if no higher peaks exist).
#'
#' # Plateau Handling
#' Plateaus (sequences of identical values) are handled according to the plateau_handling parameter:
#'
#' * strict: No points in a plateau are considered peaks (traditional behavior)
#' * middle: For plateaus of odd length, the middle point is marked as a peak.
#' For plateaus of even length, the two middle points are marked as peaks.
#' * first: The first point of each plateau is marked as a peak
#' * last: The last point of each plateau is marked as a peak
#' * all: Every point in the plateau is marked as a peak
#'
#' Note that in all cases, the plateau must still qualify as a peak relative to its
#' surrounding window (i.e., higher than all other points in the window).
#'
#' # Missing Values (NA) Handling
#' The function uses the following rules for handling NAs:
#' * If a point is NA, it cannot be a peak (returns NA)
#' * If any point in the window is NA, peak status cannot be determined (returns NA)
#' * For prominence calculations, stretches of NAs are handled appropriately
#' * A minimum of window_size points is required; shorter series return all NAs
#'
#' @references
#' Palshikar, G. (2009). Simple Algorithms for Peak Detection in Time-Series.
#' Proc. 1st Int. Conf. Advanced Data Analysis, Business Analytics and Intelligence.
#'
#' @examples
#' # Basic usage with default window size (3)
#' x <- c(1, 3, 2, 6, 4, 5, 2)
#' find_peaks(x)
#'
#' # With larger window size
#' find_peaks(x, window_size = 5) # More stringent peak detection
#'
#' # With minimum height
#' find_peaks(x, min_height = 4, window_size = 3)
#'
#' # With plateau handling
#' x <- c(1, 3, 3, 3, 2, 4, 4, 1)
#' find_peaks(x, plateau_handling = "middle", window_size = 3) # Middle of plateaus
#' find_peaks(x, plateau_handling = "all", window_size = 5) # All plateau points
#'
#' # With missing values
#' x <- c(1, 3, NA, 6, 4, NA, 2)
#' find_peaks(x)
#'
#' # Usage with dplyr
#' library(dplyr)
#' data_frame(
#' time = 1:10,
#' value = c(1, 3, 7, 4, 2, 6, 5, 8, 4, 2)
#' ) %>%
#' mutate(peaks = find_peaks(value, window_size = 3))
#'
#' @seealso
#' * \code{\link{find_troughs}} for finding local minima
#' * \code{\link[pracma]{findpeaks}} in the pracma package for alternative peak detection methods
#'
#' @note
#' * The function is optimized for use with dplyr's mutate
#' * For noisy data, consider using a larger window_size or smoothing the series before peak detection
#' * Adjust min_height and min_prominence to filter out unwanted peaks
#' * Choose plateau_handling based on your specific needs
#' * Larger window_size values result in more stringent peak detection
#'
#' @export
find_peaks <- function(x, min_height = -Inf, min_prominence = 0,
plateau_handling = c("strict", "middle", "first", "last", "all"),
window_size = 3) {
plateau_handling <- match.arg(plateau_handling)
find_extrema(x, min_height, min_prominence, plateau_handling, "peak", window_size)
}

#' Find Troughs in Time Series Data
#'
#' @description
#' Identifies troughs (local minima) in a numeric time series, with options to filter troughs based on
#' height and prominence. The function handles missing values (NA) appropriately and is compatible
#' with dplyr's mutate. Includes flexible handling of plateaus and adjustable window size for trough detection.
#'
#' @param x Numeric vector containing the time series data
#' @param max_height Maximum height threshold for troughs (default: Inf)
#' @param min_prominence Minimum prominence threshold for troughs (default: 0)
#' @param plateau_handling String specifying how to handle plateaus. One of:
#' * "strict" (default): No points in plateau are troughs
#' * "middle": Middle point(s) of plateau are troughs
#' * "first": First point of plateau is trough
#' * "last": Last point of plateau is trough
#' * "all": All points in plateau are troughs
#' @param window_size Integer specifying the size of the window to use for trough detection (default: 3).
#' Must be odd and >= 3. Larger values detect troughs over wider ranges.
#'
#' @return A logical vector of the same length as the input where:
#' * `TRUE` indicates a confirmed trough
#' * `FALSE` indicates a non-trough
#' * `NA` indicates trough status could not be determined due to missing data
#'
#' @details
#' The function uses a sliding window algorithm for trough detection (window size specified by window_size parameter),
#' combined with a region-based prominence calculation method similar to that described in Palshikar (2009).
#'
#' # Trough Detection
#' A point is considered a trough if it is the lowest point within its window (default window_size of 3
#' compares each point with its immediate neighbors). The first and last (window_size-1)/2 points in the
#' series cannot be troughs and are marked as NA. Larger window sizes will identify troughs that dominate over
#' a wider range, typically resulting in fewer troughs being detected.
#'
#' # Prominence
#' Prominence measures how much a trough stands out relative to its surrounding values.
#' It is calculated as the height of the lowest maximum between this trough and any lower
#' troughs (or the end of the series if no lower troughs exist) minus the height of the trough.
#'
#' # Plateau Handling
#' Plateaus (sequences of identical values) are handled according to the plateau_handling parameter:
#'
#' * strict: No points in a plateau are considered troughs (traditional behavior)
#' * middle: For plateaus of odd length, the middle point is marked as a trough.
#' For plateaus of even length, the two middle points are marked as troughs.
#' * first: The first point of each plateau is marked as a trough
#' * last: The last point of each plateau is marked as a trough
#' * all: Every point in the plateau is marked as a trough
#'
#' Note that in all cases, the plateau must still qualify as a trough relative to its
#' surrounding window (i.e., lower than all other points in the window).
#'
#' # Missing Values (NA) Handling
#' The function uses the following rules for handling NAs:
#' * If a point is NA, it cannot be a trough (returns NA)
#' * If any point in the window is NA, trough status cannot be determined (returns NA)
#' * For prominence calculations, stretches of NAs are handled appropriately
#' * A minimum of window_size points is required; shorter series return all NAs
#'
#' @references
#' Palshikar, G. (2009). Simple Algorithms for Peak Detection in Time-Series.
#' Proc. 1st Int. Conf. Advanced Data Analysis, Business Analytics and Intelligence.
#'
#' @examples
#' # Basic usage with default window size (3)
#' x <- c(5, 3, 4, 1, 4, 2, 5)
#' find_troughs(x)
#'
#' # With larger window size
#' find_troughs(x, window_size = 5) # More stringent trough detection
#'
#' # With maximum height
#' find_troughs(x, max_height = 3, window_size = 3)
#'
#' # With plateau handling
#' x <- c(5, 2, 2, 2, 3, 1, 1, 4)
#' find_troughs(x, plateau_handling = "middle", window_size = 3) # Middle of plateaus
#' find_troughs(x, plateau_handling = "all", window_size = 5) # All plateau points
#'
#' # With missing values
#' x <- c(5, 3, NA, 1, 4, NA, 5)
#' find_troughs(x)
#'
#' # Usage with dplyr
#' library(dplyr)
#' data_frame(
#' time = 1:10,
#' value = c(5, 3, 1, 4, 2, 1, 3, 0, 4, 5)
#' ) %>%
#' mutate(troughs = find_troughs(value, window_size = 3))
#'
#' @seealso
#' * \code{\link{find_peaks}} for finding local maxima
#' * \code{\link[pracma]{findpeaks}} in the pracma package for alternative extrema detection methods
#'
#' @note
#' * The function is optimized for use with dplyr's mutate
#' * For noisy data, consider using a larger window_size or smoothing the series before trough detection
#' * Adjust max_height and min_prominence to filter out unwanted troughs
#' * Choose plateau_handling based on your specific needs
#' * Larger window_size values result in more stringent trough detection
#'
#' @export
find_troughs <- function(x, max_height = Inf, min_prominence = 0,
plateau_handling = c("strict", "middle", "first", "last", "all"),
window_size = 3) {
plateau_handling <- match.arg(plateau_handling)
find_extrema(x, max_height, min_prominence, plateau_handling, "trough", window_size)
}

#' Internal function for finding extrema
#' @noRd
find_extrema <- function(x, height_threshold, min_prominence, plateau_handling, type) {
find_extrema <- function(x, height_threshold, min_prominence, plateau_handling, type, window_size) {
# Input validation
if (!is.numeric(x) && !all(is.na(x))) stop("Input must be numeric or NA")
if (!is.numeric(window_size) || window_size < 3 || window_size %% 2 == 0) {
stop("window_size must be an odd integer >= 3")
}

# Initialize result vector
n <- length(x)
is_extremum <- rep(FALSE, n)

# Handle special cases
if (n < 3) return(rep(NA, n))
if (n < window_size) return(rep(NA, n))

# First and last points can't be extrema
is_extremum[1] <- NA
is_extremum[n] <- NA
# Edge points can't be extrema
half_window <- floor(window_size/2)
is_extremum[1:half_window] <- NA
is_extremum[(n-half_window+1):n] <- NA

# Set comparison function based on type
compare <- if (type == "peak") {
function(a, b) if (is.na(a) || is.na(b)) NA else a > b
# Function to check if value meets height threshold
meets_height_threshold <- if (type == "peak") {
function(value) value > height_threshold
} else {
function(a, b) if (is.na(a) || is.na(b)) NA else a < b
function(value) value < height_threshold
}

# Function to safely find extremum value
safe_extremum <- if (type == "peak") {
function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE)
} else {
function(x) if (all(is.na(x))) NA else max(x, na.rm = TRUE)
# Function to check if point is extremum within window
is_window_extremum <- function(idx) {
if (is.na(x[idx])) return(NA)

# First check height threshold
if (!meets_height_threshold(x[idx])) return(FALSE)

window_start <- max(1, idx - half_window)
window_end <- min(n, idx + half_window)
window_values <- x[window_start:window_end]

if (any(is.na(window_values))) return(NA)

if (type == "peak") {
all(window_values <= x[idx]) && any(window_values < x[idx])
} else {
all(window_values >= x[idx]) && any(window_values > x[idx])
}
}

# Function to handle plateaus
handle_plateau <- function(start_idx, end_idx, value) {
# First check height threshold
if (!meets_height_threshold(value)) {
return(list(indices = NULL, is_na = FALSE))
}

# Check if we can determine if this is an extremum
if (start_idx == 1 || end_idx == n ||
is.na(x[start_idx - 1]) || is.na(x[end_idx + 1])) {
window_start <- max(1, start_idx - half_window)
window_end <- min(n, end_idx + half_window)

if (any(is.na(x[window_start:window_end]))) {
return(list(indices = NULL, is_na = TRUE))
}

if (!compare(value, x[start_idx - 1]) || !compare(value, x[end_idx + 1])) {
return(list(indices = NULL, is_na = FALSE)) # Not an extremum
is_extremum <- if (type == "peak") {
all(x[window_start:window_end] <= value) &&
any(x[window_start:window_end] < value)
} else {
all(x[window_start:window_end] >= value) &&
any(x[window_start:window_end] > value)
}

# Check height threshold
if (type == "peak") {
if (value <= height_threshold) return(list(indices = NULL, is_na = FALSE))
} else {
if (value >= height_threshold) return(list(indices = NULL, is_na = FALSE))
if (!is_extremum) {
return(list(indices = NULL, is_na = FALSE))
}

# Return indices based on plateau handling method
Expand All @@ -66,8 +311,8 @@ find_extrema <- function(x, height_threshold, min_prominence, plateau_handling,
}

# Process the sequence
i <- 2
while (i < n) {
i <- (half_window + 1)
while (i <= (n - half_window)) {
# Handle NA
if (is.na(x[i])) {
is_extremum[i] <- NA
Expand All @@ -92,23 +337,7 @@ find_extrema <- function(x, height_threshold, min_prominence, plateau_handling,
i <- plateau_end + 1
} else {
# Handle single point
if (i > 1 && i < n) {
if (is.na(x[i-1]) || is.na(x[i+1])) {
is_extremum[i] <- NA
} else {
left_compare <- compare(x[i], x[i-1])
right_compare <- compare(x[i], x[i+1])

if (is.na(left_compare) || is.na(right_compare)) {
is_extremum[i] <- NA
} else if (left_compare && right_compare) {
if ((type == "peak" && x[i] > height_threshold) ||
(type == "trough" && x[i] < height_threshold)) {
is_extremum[i] <- TRUE
}
}
}
}
is_extremum[i] <- is_window_extremum(i)
i <- i + 1
}
}
Expand All @@ -118,32 +347,33 @@ find_extrema <- function(x, height_threshold, min_prominence, plateau_handling,
for (i in which(is_extremum)) {
if (is.na(is_extremum[i])) next

# Calculate prominence
left_idx <- i - 1
right_idx <- i + 1
# Calculate prominence using window_size for initial search
left_idx <- max(1, i - half_window)
right_idx <- min(n, i + half_window)

# Look left
while(left_idx > 1 && !is.na(x[left_idx]) &&
(type == "peak" && x[left_idx] <= x[i] ||
type == "trough" && x[left_idx] >= x[i])) {
left_idx <- left_idx - 1
}

# Look right
while(right_idx < n && !is.na(x[right_idx]) &&
(type == "peak" && x[right_idx] <= x[i] ||
type == "trough" && x[right_idx] >= x[i])) {
right_idx <- right_idx + 1
}

# If we hit NA on either side before finding a higher/lower point,
# we can't determine prominence
if (is.na(x[left_idx]) || is.na(x[right_idx])) {
is_extremum[i] <- NA
next
}

extremum_value <- safe_extremum(c(x[left_idx:i], x[i:right_idx]))
extremum_value <- if (type == "peak") {
min(x[c(left_idx:i, i:right_idx)], na.rm = TRUE)
} else {
max(x[c(left_idx:i, i:right_idx)], na.rm = TRUE)
}

if (is.na(extremum_value)) {
is_extremum[i] <- NA
next
Expand Down
Loading