
Multichannel peak, onset, and beat detection #1766

Merged
merged 47 commits into main from multichannel-detection, Mar 14, 2024

Conversation

bmcfee
Member

@bmcfee bmcfee commented Oct 10, 2023

Reference Issue

Fixes #1695
Follows up on #1130
Fixes #1773
Fixes #44

What does this implement/fix? Explain your changes.

This PR implements the extensions to various detection methods proposed in #1695 to support (dense, boolean) output and multi-channel input.

  • [x] peak picking
  • [x] onset detection
  • [ ] onset backtracking (maybe out of scope, but could be improved)
  • [x] beat tracking

Any other comments?

Everything here should be backward-compatible, and at least as efficient as the existing implementations.

@bmcfee bmcfee added enhancement Does this improve existing functionality? functionality Does this add new functionality? labels Oct 10, 2023
@bmcfee bmcfee added this to the 0.10.2 milestone Oct 10, 2023
@codecov

codecov bot commented Oct 10, 2023

Codecov Report

Attention: Patch coverage is 98.52941%, with 1 line in your changes missing coverage. Please review.

Project coverage is 98.74%. Comparing base (1a21330) to head (b8d0d5a).
Report is 1 commit behind head on main.

Files Patch % Lines
librosa/beat.py 97.77% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1766      +/-   ##
==========================================
- Coverage   98.77%   98.74%   -0.03%     
==========================================
  Files          34       34              
  Lines        4670     4638      -32     
==========================================
- Hits         4613     4580      -33     
- Misses         57       58       +1     
Flag Coverage Δ
unittests 98.74% <98.52%> (-0.03%) ⬇️


@bmcfee bmcfee force-pushed the multichannel-detection branch 2 times, most recently from f0d467c to 5acc4eb Compare October 12, 2023 20:18
@bmcfee
Member Author

bmcfee commented Oct 12, 2023

Thinking through the beat tracking implementation a bit, I'll likely make the following changes:

  1. Convert the dynamic program to populate a boolean array instead of position information. After the beat positions have been filled in, we can use np.flatnonzero to identify their locations.
  2. The above should allow us to jit-compile the DP, which should provide a healthy speed boost in the beat tracker.
  3. Once the first two steps are done, then we can use numba guvectorize to run the tracker independently across channels.
  4. Finally, update the top-level beat tracker function to appropriately handle units / sparse-dense like we do now in onsets.
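Step 1's dense-to-sparse conversion can be sketched in a couple of lines (a toy example; the array contents are made up):

```python
import numpy as np

# Hypothetical dense beat tracker output: one boolean per frame.
beats_dense = np.zeros(10, dtype=bool)
beats_dense[[2, 5, 8]] = True

# Recover sparse beat frame indices from the dense boolean array.
beat_frames = np.flatnonzero(beats_dense)
print(beat_frames)  # [2 5 8]
```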

@bmcfee
Member Author

bmcfee commented Oct 16, 2023

While working on the beat tracker, I'm taking some time to clean up some internal API and comments to be more useful and transparent.

@bmcfee
Member Author

bmcfee commented Oct 16, 2023

Circling back on a very old issue #44 (closed back in 2019 when we merged PLP) while my head is back in this code. And yes, I recognize that it's somewhat of a moot point, as the beat tracker we have is far behind the state of the art these days. However, I still think there's value in at least documenting where the difficulties are, if not improving things to the point where it remains useful.

Local scores

The first step of the beat tracker, after computing the onset strength envelope (OSE) and estimating tempo is to compute a "local score" for each frame. This is essentially the OSE convolved with a very sharp window over a ±1 beat window:

librosa/librosa/beat.py

Lines 447 to 450 in fb7013b

def __beat_local_score(onset_envelope, period):
    """Construct the local score for an onset envelope and given period"""
    window = np.exp(-0.5 * (np.arange(-period, period + 1) * 32.0 / period) ** 2)
    return scipy.signal.convolve(__normalize_onsets(onset_envelope), window, "same")

This is all easily implemented by a 1d conv when the tempo is fixed. When the tempo is not fixed, we'd need either a separate filter for each frame, or some kind of aggregated filter. The former is inefficient, but could possibly be computed quickly by sparsification. The latter seems problematic to me.

Back-searching

The next place where variable tempo could be problematic is in DP score calculation:

librosa/librosa/beat.py

Lines 458 to 475 in fb7013b

# Search range for previous beat
window = np.arange(-2 * period, -np.round(period / 2) + 1, dtype=int)

# Make a score window, which begins biased toward start_bpm and skewed
if tightness <= 0:
    raise ParameterError("tightness must be strictly positive")
txwt = -tightness * (np.log(-window / period) ** 2)

# Are we on the first beat?
first_beat = True
for i, score_i in enumerate(localscore):
    # Are we reaching back before time 0?
    z_pad = np.maximum(0, min(-window[0], len(window)))

    # Search over all possible predecessors
    candidates = txwt.copy()
    candidates[z_pad:] = candidates[z_pad:] + cumscore[window[z_pad:]]

where each frame looks back to a window around where we'd expect the previous beat to land.
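A toy numpy sketch of this back-search for a single frame (the values are illustrative, not the library internals; `period` and `tightness` are assumed parameters):

```python
import numpy as np

period = 4                              # assumed frames per beat
tightness = 100.0                       # assumed transition penalty weight
cumscore = np.arange(20, dtype=float)   # made-up cumulative scores so far

# Look back between two beats and half a beat before frame i.
window = np.arange(-2 * period, -round(period / 2) + 1, dtype=int)
txwt = -tightness * (np.log(-window / period) ** 2)

i = 15
candidates = txwt + cumscore[i + window]   # penalized score of each predecessor
best = window[np.argmax(candidates)]       # offset of the best previous beat
print(best)  # -4: exactly one period back scores highest here
```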

@bmcfee
Member Author

bmcfee commented Oct 17, 2023

Update on beat tracking before I break on this for a bit.

The multichannel extension / refactor is almost done. I'm running into some numba dispatching problems though, as it's unhappy about the following:

@numba.guvectorize(
        [
            "void(float32[:], float32, float32, int32[:], float32[:])",
            "void(float64[:], float64, float32, int32[:], float64[:])",
        ],
        "(t),(),()->(t),(t)",
        nopython=True, cache=True)
def __beat_track_dp(localscore, frames_per_beat, tightness, backlink, cumscore):

where in general, localscore has shape [n,m,p,..., n_frames] and frames_per_beat has shape [n,m,p...], matching localscore up to but not including the trailing dimension. I suppose there's not enough in the layout string to enforce this agreement, and I'm not sure yet how to work around it (maybe extend the dimension of frames_per_beat?).

This might need to be reworked a little as a jitted 1d function that gets wrapped up in a numpy vectorize instead of directly using numba vectorize 😓.
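A minimal sketch of that fallback idea: a plain Python loop over the leading dimensions around a 1-D core. The real core would be numba-jitted; `_core_1d` here is a hypothetical stand-in just to show the dispatch shape:

```python
import numpy as np

def _core_1d(localscore, frames_per_beat, tightness):
    # Hypothetical stand-in for the jitted 1-D DP core.
    return localscore > localscore.mean()

def track_all_channels(localscore, frames_per_beat, tightness=100.0):
    """Run the 1-D core independently over every leading (channel) index.

    Assumes frames_per_beat.shape == localscore.shape[:-1].
    """
    out = np.empty(localscore.shape, dtype=bool)
    for idx in np.ndindex(*localscore.shape[:-1]):
        out[idx] = _core_1d(localscore[idx], frames_per_beat[idx], tightness)
    return out

scores = np.random.default_rng(0).random((2, 3, 100))
fpb = np.full((2, 3), 12.0)
beats = track_all_channels(scores, fpb)
print(beats.shape)  # (2, 3, 100)
```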

@bmcfee
Member Author

bmcfee commented Oct 18, 2023

This still needs testing, but multichannel beat tracking is now functional. With jit optimizations, we ended up being just marginally faster than the old implementation, so it's possible that there are more gains to be had here.

Next step is to expand the tests to cover the multichannel cases.

@bmcfee
Member Author

bmcfee commented Oct 19, 2023

😆 Rewriting the beat tracker has revealed an off-by-one error that has been trimming one too many beats since ... forever!

Here's the problem. The current (main branch) implementation does the following when trimming beats:

librosa/librosa/beat.py

Lines 505 to 516 in 5ca70f5

def __trim_beats(localscore: np.ndarray, beats: np.ndarray, trim: bool) -> np.ndarray:
    """Remove spurious leading and trailing beats"""
    smooth_boe = scipy.signal.convolve(localscore[beats], scipy.signal.hann(5), "same")

    if trim:
        threshold = 0.5 * ((smooth_boe**2).mean() ** 0.5)
    else:
        threshold = 0.0

    valid = np.argwhere(smooth_boe > threshold)

    return beats[valid.min() : valid.max()]

that is, it finds all beat positions where the score exceeds a threshold, and then slices the index array from the first to the last.
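The off-by-one is easy to reproduce with plain numpy slicing (toy values):

```python
import numpy as np

smooth_boe = np.array([0.1, 3.0, 5.0, 4.0, 0.2])   # made-up smoothed scores
beats = np.array([10, 20, 30, 40, 50])
valid = np.argwhere(smooth_boe > 1.0)

# Half-open slicing drops the beat at valid.max(), even though it passed:
print(beats[valid.min(): valid.max()])       # [20 30]
# The inclusive version keeps it:
print(beats[valid.min(): valid.max() + 1])   # [20 30 40]
```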

The multi-channel branch rewrites this as an explicit loop:

librosa/librosa/beat.py

Lines 540 to 553 in 7f07f0b

smooth_boe = np.convolve(localscore[beats], w)[len(w) // 2 : -len(w) // 2 + 1]
threshold = 0.5 * ((smooth_boe**2).mean() ** 0.5)

# Suppress bad beats
n = 0
while localscore[n] < threshold:
    beats_trimmed[n] = False
    n += 1

n = len(localscore) - 1
while localscore[n] < threshold:
    beats_trimmed[n] = False
    n -= 1

The idea is the same, but note that when we use the array slice notation (main branch), it excludes the upper end of the range, even though the score at that index exceeds the threshold. This is verified by printing out smooth_boe, threshold on an example track from our test suite (multichannel branch):

In [1]: import librosa

In [2]: y, sr = librosa.load('data/test1_22050.wav')

In [3]: bpm, beats = librosa.beat.beat_track(y=y, sr=sr)
[7.36539798 8.16666805 8.83107379 8.38428126 4.82421986 1.50669484
 0.78669973 1.10853638 1.30460965 3.12087935 3.75983776 1.46726352] 2.599477926489801

The values printed out are numerically identical to those calculated on the main branch. However, the main branch excludes the last valid beat (score 3.75983776) while the multichannel branch includes it. 🤦

This turns out to be the cause of our current test failures, where a display example using beat coords is triggering an error.

@bmcfee
Member Author

bmcfee commented Oct 19, 2023

I think aside from linting and final cleanups, this is ready for CR.

One thing that we should discuss though is how we want to handle user-provided tempo in multichannel beat tracking. Previously, tempo could only ever be a scalar, so there was no ambiguity. Scalar bpm= still works in the multichannel case, with broadcasting working as you might expect.

You can also provide a multichannel array to bpm=, with a separate tempo for each channel. Since each channel's tempo is interpreted as a scalar, the array should have bpm.ndim == onset_envelope.ndim - 1 and bpm.shape == onset_envelope.shape[:-1]; i.e., excluding the time dimension, but matching everywhere else.

The hitch here is that feature.tempo does not provide output in this shape. Rather, it preserves the trailing dimension with bpm.shape[-1] == 1, meaning that each bpm is technically a 1d vector and not a scalar. This is why we slice out the last dimension when estimating tempo internally:

librosa/librosa/beat.py

Lines 196 to 203 in 42ca94c

if bpm is None:
    bpm = _tempo(
        onset_envelope=onset_envelope,
        sr=sr,
        hop_length=hop_length,
        start_bpm=start_bpm,
        prior=prior,
    )[..., 0]
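The shape convention can be illustrated in isolation (tempo values here are illustrative):

```python
import numpy as np

# feature.tempo-style output: one tempo per channel, plus a trailing
# singleton "time" axis.
bpm = np.array([[89.1], [117.5]])   # shape (2, 1)

# Slicing out the last axis leaves one scalar tempo per channel,
# matching onset_envelope.shape[:-1].
bpm_scalar = bpm[..., 0]            # shape (2,)
print(bpm.shape, bpm_scalar.shape)  # (2, 1) (2,)
```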

The problem I'm anticipating is that we'd like to be able to do something like:

>>> bpm = librosa.feature.tempo(y=y, sr=sr, ...)
>>> _, beats = librosa.beat.beat_track(y=y, sr=sr, bpm=bpm, ...)

but this will fail to vector dispatch properly:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 librosa.beat.beat_track(y=y, sr=sr, sparse=False, bpm=bpm)

File ~/git/librosa/librosa/beat.py:206, in beat_track(y, sr, onset_envelope, hop_length, start_bpm, tightness, trim, bpm, prior, units, sparse)
    197     bpm = _tempo(
    198         onset_envelope=onset_envelope,
    199         sr=sr,
   (...)
    202         prior=prior,
    203     )[..., 0]
    205 # Then, run the tracker
--> 206 beats = __beat_tracker(onset_envelope, bpm, float(sr) / hop_length, tightness, trim)
    208 if sparse:
    209     beats = np.flatnonzero(beats)

File ~/git/librosa/librosa/beat.py:454, in __beat_tracker(onset_envelope, bpm, frame_rate, tightness, trim)
    452 tail = __last_beat(cumscore)
    453 beats = np.zeros_like(onset_envelope, dtype=bool)
--> 454 __dp_backtrack(backlink, tail, beats)
    457 # Discard spurious trailing beats
    458 if trim:

File ~/miniconda3/envs/py311/lib/python3.11/site-packages/numba/np/ufunc/gufunc.py:171, in GUFunc.__call__(self, *args, **kwargs)
    167 def __call__(self, *args, **kwargs):
    168     # If compilation is disabled OR it is NOT a dynamic gufunc
    169     # call the underlying gufunc
    170     if self._frozen or not self.is_dynamic:
--> 171         return self.ufunc(*args, **kwargs)
    172     elif "out" in kwargs:
    173         # If "out" argument is supplied
    174         args += (kwargs.pop("out"),)

ValueError: output operand requires a reduction along dimension -1, but the reduction is not enabled. The dimension size of 1 does not match the expected output shape.

We could detect this case easily enough and fix it internally, eg:

if bpm.ndim == onset_envelope.ndim:
    if bpm.shape[-1] != 1:
        raise ParameterError("blah blah blah")
    # Slice out the trailing time axis
    bpm = bpm[..., 0]

but it seems a little hacky. Curious if other people have suggestions here.

@bmcfee bmcfee changed the title [WIP] Multichannel peak, onset, and beat detection [CR] Multichannel peak, onset, and beat detection Oct 20, 2023
@bmcfee
Member Author

bmcfee commented Oct 20, 2023

In pinning down the vectorized tempo API, it's probably worth thinking about the (hypothetical) variable-tempo case as well. Mainly I'm trying to avoid painting ourselves into a corner here in case we do at some point want to revisit this idea.

The hunch lurking in the back of my mind is that the DP-based beat tracker could still be very useful with a sufficiently strong onset estimator, and most modern beat trackers do include some kind of DP, Viterbi, or otherwise sequential decoding for exactly this reason. As such, I think it might be worth maintaining and even extending the beat tracker logic here even if the core method itself is rather antiquated, because it could find use as a utility function in more sophisticated models.

Circling back on a very old issue #44
[...]

Back-searching

The next place where variable tempo could be problematic is in DP score calculation:

librosa/librosa/beat.py

Lines 458 to 475 in fb7013b

# Search range for previous beat
window = np.arange(-2 * period, -np.round(period / 2) + 1, dtype=int)

# Make a score window, which begins biased toward start_bpm and skewed
if tightness <= 0:
    raise ParameterError("tightness must be strictly positive")
txwt = -tightness * (np.log(-window / period) ** 2)

# Are we on the first beat?
first_beat = True
for i, score_i in enumerate(localscore):
    # Are we reaching back before time 0?
    z_pad = np.maximum(0, min(-window[0], len(window)))

    # Search over all possible predecessors
    candidates = txwt.copy()
    candidates[z_pad:] = candidates[z_pad:] + cumscore[window[z_pad:]]

where each frame looks back to a window around where we'd expect the previous beat to land.

Now that we've rewritten the backlink search as a jitted function, we actually have a bit more flexibility to support a variable search window for each time step. The existing implementation uses a preallocated window that gets copied and truncated at each step, and really is a bit of a kludge to maintain as much vectorization as possible for efficiency purposes. If instead, we unroll this into an explicit loop (which would be fast under numba), we can let each frame have a different tempo and hence a different search window.
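A rough sketch of what the unrolled per-frame search could look like with a frame-dependent period (`best_predecessor` is a hypothetical helper, not the actual implementation):

```python
import numpy as np

def best_predecessor(cumscore, i, period_i, tightness=100.0):
    """Toy per-frame predecessor search where the period may change at
    every frame; the search window is rebuilt per call instead of being
    preallocated once. Returns -1 when no predecessor is reachable."""
    lo = max(0, i - 2 * period_i)
    hi = max(0, i - period_i // 2 + 1)
    if hi <= lo:
        return -1
    offsets = np.arange(lo - i, hi - i)
    penalty = -tightness * (np.log(-offsets / period_i) ** 2)
    return int(np.argmax(cumscore[lo:hi] + penalty)) + lo

cumscore = np.arange(50.0)                 # made-up cumulative scores
print(best_predecessor(cumscore, 20, 4))   # 16: one period back
print(best_predecessor(cumscore, 1, 4))    # -1: nothing to reach back to
```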

Local scores

This is all easily implemented by a 1d conv when the tempo is fixed. When the tempo is not fixed, we'd need either a separate filter for each frame, or some kind of aggregated filter. The former is inefficient, but could possibly be computed quickly by sparsification. The latter seems problematic to me.

This part is still not great, but I do think implementing this quasi-convolution as a sparse near-toeplitz matrix multiply would be a viable solution here. We can retain the conv1d version for static tempo of course.
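A sketch of the sparse near-Toeplitz idea, assuming the window shape from `__beat_local_score`; when the per-frame tempo is constant, it reduces to the existing same-mode convolution:

```python
import numpy as np
import scipy.sparse
import scipy.signal

def local_score_matrix(frames_per_beat, n):
    """Sparse quasi-Toeplitz operator: row i carries a Gaussian window
    whose width follows frames_per_beat[i] (per-frame tempo)."""
    M = scipy.sparse.lil_matrix((n, n))
    for i in range(n):
        p = int(frames_per_beat[i])
        j = np.arange(max(0, i - p), min(n, i + p + 1))
        M[i, j] = np.exp(-0.5 * ((j - i) * 32.0 / p) ** 2)
    return M.tocsr()

onset = np.random.default_rng(1).random(200)
p = 10
M = local_score_matrix(np.full(200, p), len(onset))
tv_score = M @ onset

# With a constant tempo this reduces to the fixed-window convolution.
window = np.exp(-0.5 * (np.arange(-p, p + 1) * 32.0 / p) ** 2)
ref = scipy.signal.convolve(onset, window, mode="same")
print(np.allclose(tv_score, ref))  # True
```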

Multi-channel, multi-tempo 😱

The API difficulty here creeps in when we now have to consider the different ways in which a bpm= parameter can be scalar or multidimensional. There are four cases to consider supporting. In general, let's assume a multichannel onset envelope of shape (m, n).

  • scalar - shared tempo across all time (and channels)
  • ndarray, shape=(m,) - fixed tempo across time, different tempo for each channel
  • ndarray, shape=(n,) - variable tempo across time, fixed across channels
  • ndarray, shape=(m,n) - variable tempo across time and channels

There's an inherent ambiguity with cases (2) and (3) that I'm not wild about. The obvious way to resolve this is to require expanded singleton dimensions in whichever direction makes the use case explicit. However, this goes against the way we currently support tempo arguments (see above comment), so we should think carefully about how exactly we want to handle that.
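A small shape-only illustration of how expanded singleton dimensions would disambiguate cases (2) and (3), assuming an envelope of shape (m, n) = (4, 100):

```python
import numpy as np

oenv = np.zeros((4, 100))                   # (channels, frames)

bpm_per_channel = np.full((4, 1), 120.0)    # fixed over time, per-channel
bpm_per_frame = np.full((1, 100), 120.0)    # time-varying, shared channels

# Both broadcast unambiguously against the envelope's shape:
shape_c = np.broadcast_shapes(bpm_per_channel.shape, oenv.shape)
shape_t = np.broadcast_shapes(bpm_per_frame.shape, oenv.shape)
print(shape_c, shape_t)  # (4, 100) (4, 100)
```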

@bmcfee
Member Author

bmcfee commented Oct 23, 2023

Remaining TODOs:

  • Add docstring examples for sparse / multichannel mode
  • Consider refactoring the beat tracker dp inner loop to explicitly unroll the vectorized cumulative score calculation
  • Figure out a consistent way to handle tempo arrays

@bmcfee
Member Author

bmcfee commented Oct 24, 2023

I've rewritten the DP backsearch as an explicit loop rather than a vectorized argmax. Speed is basically unaffected, and it would now be trivial to support time-varying tempo in this part of the algorithm.

I'm still struggling with getting vector dispatch to work consistently when bpm.ndim == onset_envelope.ndim, as described above. Numba's guvectorize should be a little better about this, but so far I haven't had much luck: it always results in an all-pairs broadcast, so an input signal of shape (2, n) with a tempo estimate of shape (2, 1) turns into a localscore array of shape (2, 2, n), and then the whole thing blows up on us. In principle, guvectorize is supposed to handle the special case where the trailing dim has length 1, but maybe I'm doing something wrong here.

@bmcfee bmcfee force-pushed the multichannel-detection branch from 7e15993 to 30c368f Compare October 24, 2023 19:08
@bmcfee
Member Author

bmcfee commented Nov 7, 2023

In principle, guvectorize is supposed to handle the special case where the trailing dim has length 1, but maybe i'm doing something wrong here.

I've now rewritten this using guvectorize and it seems to essentially work, at least in the sense that calling with fully vectorized tempo inputs does properly generalize independent single calls, eg with a stereo input:

In [24]: onsets = librosa.onset.onset_strength(y=y, sr=sr, aggregate=np.median)

In [25]: bpm = librosa.feature.tempo(onset_envelope=onsets, sr=sr)

In [26]: t1, b1 = librosa.beat.beat_track(onset_envelope=onsets, bpm=bpm, sparse=False)

In [27]: t1a, b1a = librosa.beat.beat_track(onset_envelope=onsets[0], bpm=bpm[0], sparse=False)

In [28]: t1b, b1b = librosa.beat.beat_track(onset_envelope=onsets[1], bpm=bpm[1], sparse=False)

In [29]: t1, t1a, t1b
Out[29]: 
(array([[ 89.10290948],
        [117.45383523]]),
 array([89.10290948]),
 array([117.45383523]))

In [30]: np.allclose(b1[0], b1a)
Out[30]: True

In [31]: np.allclose(b1[1], b1b)
Out[31]: True

A problem does arise if you pass in a tempo vector that is not fully formed, eg if we were to flatten the bpm array to shape (2,) instead of shape (2, 1) as above:

In [32]: t2, b2 = librosa.beat.beat_track(onset_envelope=onsets, bpm=bpm.flatten(), sparse=False)

In [33]: t2
Out[33]: array([ 89.10290948, 117.45383523])

In [34]: np.allclose(b1, b2)
Out[34]: False

In this case, it appears to be taking the first tempo (bpm[0]) and applying it to both channels.

The proper fix here, I think, is to use expand_dims to ensure that bpm has a trailing singleton dimension before calling into vectorized subroutines.
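A sketch of that fix (`normalize_bpm` is a hypothetical helper name):

```python
import numpy as np

def normalize_bpm(bpm, oenv):
    """Hypothetical helper: give bpm a trailing singleton axis so it
    broadcasts against the envelope without all-pairs dispatch."""
    bpm = np.atleast_1d(np.asarray(bpm, dtype=float))
    if bpm.ndim == oenv.ndim - 1:
        bpm = np.expand_dims(bpm, axis=-1)
    return bpm

oenv = np.zeros((2, 500))
print(normalize_bpm(np.array([89.1, 117.5]), oenv).shape)   # (2, 1)
print(normalize_bpm(np.full((2, 1), 120.0), oenv).shape)    # (2, 1)
```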

@bmcfee
Member Author

bmcfee commented Nov 7, 2023

I put in the expand_dims hack, and it appears to be functional now:

In [1]: import librosa

In [2]: import numpy as np

In [3]: y, sr = librosa.load(librosa.ex('fishin', hq=True), mono=False)

In [4]: onsets = librosa.onset.onset_strength(y=y, sr=sr, aggregate=np.median)

In [5]: bpm = librosa.feature.tempo(onset_envelope=onsets, sr=sr)

In [6]: t1, b1 = librosa.beat.beat_track(onset_envelope=onsets, bpm=bpm, sparse=False)

In [7]: t2, b2 = librosa.beat.beat_track(onset_envelope=onsets, bpm=bpm.flatten(), sparse=False)

In [8]: np.allclose(b1, b2)
Out[8]: True

@bmcfee
Member Author

bmcfee commented Nov 8, 2023

Seems like this fails with 3-dimensional inputs... will need to investigate further.

@bmcfee
Member Author

bmcfee commented Nov 8, 2023

Seems like this fails with 3-dimensional inputs... will need to investigate further.

Following up: I believe the implementation is correct. The disagreement I was seeing came down to onset envelope calculation differing when given multichannel input. This eventually traces back to the amplitude_to_db calculation, which is not channel-dependent.

@bmcfee
Member Author

bmcfee commented Nov 8, 2023

I'm going to take a crack at extending this to support time-varying tempo, because why not?

@bmcfee
Member Author

bmcfee commented Nov 9, 2023

Quick stab at a time-varying local score function:

import numba
import numpy as np

from librosa.util.exceptions import ParameterError


@numba.guvectorize(
        [
            "void(float32[:], float32[:], float32[:])",
            "void(float64[:], float64[:], float64[:])",
        ],
        "(t),(n)->(t)",
        nopython=True, cache=False)
def bls_numbav2(onset_envelope, frames_per_beat, localscore):

    N = len(onset_envelope)

    if len(frames_per_beat) == 1:
        # Static tempo mode
        window = np.exp(-0.5 * (np.arange(-frames_per_beat[0], frames_per_beat[0] + 1) * 32.0 / frames_per_beat[0]) ** 2)
        K = len(window)
        # This is a vanilla same-mode convolution
        for i in range(len(onset_envelope)):
            localscore[i] = 0.
            # we need i + K // 2 - k < N  ==>  k > i + K // 2 - N
            # and i + K // 2 - k >= 0     ==>  k <= i + K // 2 (inclusive bound)
            for k in range(max(0, i + K // 2 - N + 1), min(i + K // 2 + 1, K)):
                localscore[i] += window[k] * onset_envelope[i + K // 2 - k]

    elif len(frames_per_beat) == len(onset_envelope):
        # Time-varying tempo estimates
        # This isn't exactly a convolution anymore, since the filter is time-varying, but it's pretty close
        for i in range(len(onset_envelope)):
            window = np.exp(-0.5 * (np.arange(-frames_per_beat[i], frames_per_beat[i] + 1) * 32.0 / frames_per_beat[i]) ** 2)
            K = 2 * int(frames_per_beat[i]) + 1

            localscore[i] = 0.
            for k in range(max(0, i + K // 2 - N + 1), min(i + K // 2 + 1, K)):
                localscore[i] += window[k] * onset_envelope[i + K // 2 - k]
    else:
        raise ParameterError(f"Invalid bpm shape={len(frames_per_beat)} does not match onset envelope shape={len(onset_envelope)}")

For static tempo, this is about 200μs slower than the straight-ahead scipy implementation we previously had (70μs on the fishin example with standard params), but about 200μs faster than a numba-wrapped np.convolve (420μs). The latter is because, until recently, numba's convolve implementation did not support modes, so we have to explicitly copy out the same-mode slice if using np.convolve. The implementation above avoids this copy and works in place on the pre-allocated output array.

For time-varying tempo, it takes about 1.3ms. We could probably shave that down a bit, but I don't actually think it would be worth the added code complexity to do so.
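For reference, the same-mode slice copy that a full-mode `np.convolve` forces on us looks like this (and matches scipy's same-mode output for an odd-length kernel):

```python
import numpy as np
import scipy.signal

x = np.random.default_rng(2).random(50)
w = np.hanning(9)

# Full-mode convolution plus an explicit copy of the centered slice...
same_manual = np.convolve(x, w)[len(w) // 2 : len(w) // 2 + len(x)]

# ...matches scipy's same-mode output directly.
same_scipy = scipy.signal.convolve(x, w, mode="same")
print(np.allclose(same_manual, same_scipy))  # True
```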

@bmcfee
Member Author

bmcfee commented Nov 9, 2023

Dynamic tempo beat tracker is now implemented in this branch. I haven't implemented tests yet, but so far it seems to be behaving exactly as I'd expect. On the brahms example:

[figure: static vs. dynamic tempo beat tracking on the brahms example; dashed lines show tempo estimates, solid lines show inter-beat intervals]

Red and blue show the current tracker (static tempo); orange and green show how it works with dynamic tempo. Dashed lines are tempo estimates (autocorrelation of the onset envelope); solid lines are inter-beat intervals converted to BPM after running the detector. The static-tempo tracker is allowed to fluctuate from the static tempo, and it does so pretty severely when the tempo shifts. The dynamic tracker follows the estimated tempo, as expected.

Neither of these are necessarily great, though I suspect most of the problems in the dynamic case are due to bad local tempo estimation, and not a failure of the tracking per se. A better dynamic tempo estimator would probably improve the tracking results.

@bmcfee
Member Author

bmcfee commented Nov 10, 2023

Doing some more thorough testing here (static tempo case), I noticed that the DP behavior is now slightly different from what we had in prior stable versions. For identical inputs (onset envelope saved to disk) in this branch vs 0.9.2, I've verified that the computation agrees numerically up to the local score step. The next step is the DP, which in this branch produces the following backlink table:

[-1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 6, 7, 7, 7, 7, 7, 7, 14, 14, 14, 14, 14, 14, 14, 14, 14, 23, 23, 24, 24, 24, 25, 25, 25, 30, 31, 31, 31, 32, 33, 33, 34, 34, 34, 41, 42, 42, 42, 42, 42, 42, 43, 48, 48, 49, 49, 50, 51, 52, 53, 53, 53, 58, 58, 60, 60, 61, 61, 62, 62, 62, 62, 67, 68, 69, 70, 70, 71, 72, 72, 76, 76, 77, 77, 80, 80, 81, 81, 81, 81, 81, 81, 87, 88, 88, 89, 90, 90, 93, 94, 96, 97, 97, 98, 99, 99, 100, 100, 100, 100, 101, 106, 106, 108, 108, 109, 109, 113, 114, 116, 116, 116, 116, 117, 117, 117, 118, 118, 120, 123, 126, 126, 127, 127, 127, 132, 132, 133, 133, 134, 134, 135, 135, 136, 136, 136, 137, 143, 144, 144, 144, 144, 145, 149, 150, 151, 151, 151, 153, 153, 154, 154, 154, 154, 160, 160, 161, 161, 161, 162, 163, 167, 168, 168, 170, 170, 170, 171, 171, 171, 173, 173, 178, 178, 179, 179, 179, 180, 181, 185, 186, ...

where the special token value of -1 indicates no prior link is possible.
In the 0.9.2 branch, it produces the following:

[-1, -1, -1, -1, -1, -13, -12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 6, 7, 7, 7, 7, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 23, 24, 24, 24, 28, 29, 29, 30, 30, 31, 31, 31, 33, 33, 34, 34, 34, 36, 36, 42, 43, 43, 43, 45, 47, 47, 48, 49, 49, 49, 51, 52, 53, 53, 53, 54, 54, 60, 61, 62, 62, 63, 63, 64, 66, 67, 67, 68, 70, 70, 71, 72, 72, 72, 73, 73, 80, 80, 80, 81, 81, 81, 81, 81, 86, 87, 88, 88, 89, 90, 90, 91, 91, 96, 97, 98, 98, 99, 99, 100, 100, 100, 100, 105, 106, 106, 108, 108, 109, 109, 109, 114, 116, 116, 116, 116, 117, 117, 117, 118, 118, 123, 123, 126, 126, 127, 127, 127, 128, 132, 133, 133, 134, 134, 135, 135, 136, 136, 136, 140, 143, 144, 144, 144, 144, 145, 145, 150, 151, 151, 151, 153, 153, 154, 154, 154, 160, 160, 160, 161, 161, 161, 162, 163, 163, 168, 168, 170, 170, 170, 171, 171, 171, 173, 176, 178, 178, 179, 179, 179, 180, 181, 181, 186, ...

which is not only different, but clearly incorrect because we should never have negative values here apart from -1. I'm not sure exactly where these came from, but it does seem like a bug.

This never appeared in the final output because the backtracking loop discards any negative values. However, it does seem to corrupt the cumulative score computation, and the resulting detections are slightly different.
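A minimal version of the backtracking loop shows why the stray negative backlinks never surfaced in the output: any negative entry terminates the walk, just like the -1 sentinel (toy backlink table):

```python
import numpy as np

def backtrack(backlink, tail):
    """Walk backlinks from the final beat; stop on any negative entry.

    Spurious negative backlinks (-13, -12, ...) terminate the walk
    exactly as the -1 sentinel does, so they never appear in the output.
    """
    beats = []
    n = tail
    while n >= 0:
        beats.append(n)
        n = backlink[n]
    return np.array(beats[::-1])

backlink = np.array([-1, -1, 0, 1, 2, 3])  # toy backlink table
print(backtrack(backlink, 5))  # [1 3 5]
```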

@bmcfee bmcfee force-pushed the multichannel-detection branch from f787468 to 03c0fed Compare March 2, 2024 16:08
@bmcfee bmcfee mentioned this pull request Mar 2, 2024
docs/examples/plot_dynamic_beat.py
librosa/beat.py

# TODO: this might be better accomplished with a np.broadcast_shapes check
if bpm.shape[-1] not in (1, onset_envelope.shape[-1]):
    raise ParameterError(f"Invalid bpm shape={bpm.shape} does not match onset envelope shape={onset_envelope.shape}")
Contributor
untested?

librosa/util/utils.py
if x.ndim != 1:
    raise ParameterError("input array must be one-dimensional")
if sparse and x.ndim != 1:
    raise ParameterError("input array must be one-dimensional if sparse=True")
Contributor
same proposal as before

@bmcfee
Member Author

bmcfee commented Mar 13, 2024

Thanks @lostanlen ! I've updated the text mostly taking your suggestions (with some minor tweaks here and there). Exception messages have also been expanded per your suggestion, though the phrasing is different in the two cases you identified because one has to do with onsets/signals and the other with arbitrary data arrays.

@bmcfee
Member Author

bmcfee commented Mar 13, 2024

The docs CI failure is due to a temporary outage of the SciPy proceedings archive; nothing to be done from our side.

@lostanlen
Contributor

Thank you @bmcfee for incorporating my feedback.
It looks like this conditional branch in beat.py is still untested?

    if bpm.shape[-1] not in (1, onset_envelope.shape[-1]):
        raise ParameterError(f"Invalid bpm shape={bpm.shape} does not match onset envelope shape={onset_envelope.shape}")

@bmcfee
Member Author

bmcfee commented Mar 13, 2024

It looks like this conditional branch in beat.py is still untested?

Yeah - I'm not too worried about this. It's a redundant check within a private function, and the user-facing function (beat_track) checks are tested. I left this check in place primarily for debugging purposes, and since it's not hurting anything, I'd prefer to leave it here in case we need it later on.

@lostanlen lostanlen self-requested a review March 14, 2024 08:38
@bmcfee bmcfee changed the title [CR] Multichannel peak, onset, and beat detection Multichannel peak, onset, and beat detection Mar 14, 2024
@bmcfee bmcfee merged commit 0033e43 into main Mar 14, 2024
11 of 13 checks passed
@bmcfee bmcfee deleted the multichannel-detection branch March 19, 2024 14:58