Peak Detection

Introduction

Musical instrument signals generally consist of a transient portion and a
steady-state or quasi-periodic portion. The transient part is usually the
attack of the signal, and the steady state is the portion that follows the
attack. When investigating time-variant signals it is critical to make use
of both time and frequency domain analysis techniques. Important features
of musical signals include duration, amplitude modulation, pitch, spectral
harmonicity, spectral envelope and spectral centroid. Attack time in
particular is considered a salient feature of musical timbre (Eagleson and
Eagleson 1947; Saldanha and Corso 1964; Elliot 1975) and has been thought
to be a dominant feature of musical instruments. However, it has also been
found that the attack and the note-to-note transients of a signal are
neither sufficient nor necessary for recognizing musical instruments
(Kendall 1986). This controversial finding supports the importance of the
steady-state portion of a signal.

This chapter mainly describes the implementation of the signal processing
algorithms used in the software system for extracting features that depict
these transient and stationary characteristics in the frequency and time
domains. The frequency domain analysis section of this chapter is primarily
based on the discrete Fourier transform (DFT). The DFT-based spectral
analysis algorithms discussed include the short time Fourier transform,
spectral centroid, spectral smoothness and tracking of partials over time.
In the time domain analysis section I will mainly describe the
implementation of algorithms including pitch detection with interpolation
and period averaging based on the autocorrelation function. Other modules
discussed are amplitude envelope, amplitude modulation, attack time
computation and noise content analysis.

Frequency Domain Analysis

DFT and STFT

The spectral analysis part of feature extraction is primarily based on the
discrete Fourier transform (DFT). The continuous time and discrete time
versions of the Fourier transform are shown below.

(2.1)

(2.2)
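In standard textbook notation, the continuous time Fourier transform and the N-point DFT can be written as follows. The symbols are an assumption made here and may differ from those of the original typeset equations: x(t) is the continuous signal, x[n] its sampled version, N the transform length and k the frequency bin index.

```latex
X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt

X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1
```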

To extract
transitory spectral characteristics the short time Fourier transform (STFT)
was used (Allen 1977; Allen and Rabiner 1977). The basic algorithm is as
follows.

(2.3)

As seen in figure 2.1, the STFT can be described simply as windowing the
signal and taking the FFT of each windowed segment. Various window types
with different side-lobe and main-lobe characteristics are available in
the program. The Hamming window has been shown to work particularly well
with musical signals (De Poli, Piccialli and Roads 1991). See the appendix
for details regarding windowing and its side-lobe and main-lobe
characteristics.

Figure 2.1 Short time Fourier transform and Spectral Peak Detection
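The windowing-then-transform idea can be sketched in a few lines of Python. This is a minimal illustration and not the program's actual code: the function names, the frame length and the hop size are assumptions, and a direct DFT stands in for the FFT so that the sketch needs no external library.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window: 0.54 - 0.46*cos(2*pi*n/(length-1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def dft_magnitudes(frame):
    """Magnitudes of the DFT bins from DC up to Nyquist (inclusive).

    A direct O(N^2) DFT keeps the sketch dependency-free; an FFT routine
    returns the same values much faster.
    """
    n_points = len(frame)
    mags = []
    for k in range(n_points // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
                  for n, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def stft_magnitudes(signal, frame_len, hop):
    """Window successive frames of the signal and transform each one."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        windowed = [s * w for s, w in zip(segment, window)]
        frames.append(dft_magnitudes(windowed))
    return frames
```

Each entry of the returned list is one magnitude spectrum, which is the input assumed by the peak detection steps that follow.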

Spectral Peak Detection and Tracking

Pitched musical instruments display a high degree of harmonic spectral
quality when analyzed for frequency content. Most tend to have
quasi-integer harmonic relationships between spectral peaks and the
fundamental frequency. In voice, the spectral envelope displays
mountain-like contours, known as formants, separated by valleys. The
locations of the formants distinctively describe vowels. This is also
evident in violins, but the number of valleys is greater and the formant
locations change very little with time, unlike the voice, whose formants
vary substantially from vowel to vowel. Woodwinds such as the bassoon and
oboe, on the other hand, have fewer formants than the voice but tend to
have stronger and clearer spectral contours that perceptually characterize
the woodwind family (Cook 1999). Generally, musical instruments like the
plucked string (figure 2.2) exhibit lower energy in the high frequency
bins. The higher partials normally have less energy and also die out faster
than the lower ones over time.

Figure 2.2 Plucked string spectrum

Using the short time Fourier transform,
I have implemented a spectral peak detection and tracking method, extracting
quasi-integer related harmonics from the spectrum. The peak picking algorithm
takes into consideration magnitude and frequency information to select
the most prominent and harmonically behaving peaks. To help in the search
for spectral peaks, various threshold values are used as described below.

The spectral peak detection algorithm is divided into four main steps
(figure 2.3). The first step roughly locates possible peaks; how coarse the
search is depends on a threshold value, which dictates the degree of
“peakiness” a local maximum must show to be considered a possible peak.
The second step filters out peaks that may have been erroneously selected
in step 1. The third step looks for breaks in the harmonic sequence by
analyzing the harmonic relationships of the currently selected peaks; peaks
that may have been deleted or missed in the previous two steps are inserted
here. The final step performs a further harmonic analysis of the selected
peaks, ultimately leaving a set of peaks that are most probably harmonics.
A mean and scalable standard deviation error method is applied to control
the tolerated inharmonicity.

Figure 2.3 Peak detection algorithm

Step 1: Rough Peak Detection

In the rough peak detection algorithm, possible peaks are picked using
negative and positive slope threshold values to guide the selection
process. As shown in figure 2.4, the polarity of the slope of the spectrum
is computed from bin to bin (DC to Nyquist) using the basic assumption that
a transition from positive to negative slope calls for the possibility of a
peak.

Figure 2.4 Rough search for peaks

The following conditions help in the selection of a peak:

The slope must change polarity, from positive to negative.

The magnitude difference between the peak candidate and the current bin's
magnitude component (X[k]-X[k+4] in the example of figure 2.5) must be
greater than a threshold value.

A new peak candidate search occurs only after there is a slope change from
negative to positive and when a threshold value, as shown in figure 2.6, is
exceeded.

Refer to the flowcharts in the appendix for details.
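The three conditions above can be sketched as a small state machine over the magnitude spectrum. This is one possible reading of the text, not the flowchart from the appendix: the names rough_peaks, fall_threshold and rise_threshold are hypothetical, and magnitudes are assumed to be in whatever unit the thresholds use.

```python
def rough_peaks(mags, fall_threshold, rise_threshold):
    """Return bin indices of rough peak candidates, scanning DC to Nyquist."""
    peaks = []
    candidate = None   # bin currently assessed as a possible peak
    valley = 0         # lowest bin seen since the last accepted peak
    armed = False      # True once the spectrum has risen enough from the valley

    for k in range(1, len(mags)):
        if candidate is None:
            if mags[k] < mags[valley]:
                valley = k
            # Condition 3: a new search starts only after a sufficient rise.
            if mags[k] - mags[valley] > rise_threshold:
                armed = True
            # Condition 1: the slope flips from positive to negative at bin k.
            rising_into = mags[k] > mags[k - 1]
            falling_out = k + 1 < len(mags) and mags[k + 1] < mags[k]
            if armed and rising_into and falling_out:
                candidate = k
        else:
            if mags[k] > mags[candidate]:
                candidate = k  # a higher bin replaces the candidate
            elif mags[candidate] - mags[k] > fall_threshold:
                # Condition 2: the spectrum has dropped far enough below
                # the candidate, so the candidate is accepted.
                peaks.append(candidate)
                candidate, valley, armed = None, k, False
    return peaks
```

With larger thresholds, small ripples between two strong peaks (the transitional peaks of figure 2.6) are ignored; with thresholds close to zero, every local maximum is reported.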

Figure 2.5 Actual peak assessment

Figure 2.6 Transitional peaks (noise)

Step 2: Prominent Peak Search

In step 2, prominent peaks are located from the set of potential peaks
found in step 1. The purpose is to filter out local peaks that may be
present between stronger partial candidates, as shown in figure 2.7. The
search for prominent peaks is done in the following way:

Figure 2.7 Prominent peak search

The bin with the maximum magnitude is found.

Relative to the position of the peak with maximum amplitude, peaks are
analyzed moving towards DC.

Relative to the position of the peak with maximum amplitude, peaks are
analyzed moving towards the Nyquist frequency.

Local maxima or peaks are picked out using an adaptive threshold value that
reflects the relationship between a prominent peak (possible partial) and
its neighboring peaks, as shown in figure 2.7. For example, a 50% threshold
value requires a neighboring peak to have at least half the magnitude of
the prominent peak (possible partial) in order to be retained. Refer to the
appendix for details on the algorithm.
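The search can be sketched as follows. The text does not spell out how the threshold adapts, so this sketch assumes that each retained peak becomes the reference for the next comparison; the function name prominent_peaks and the parameter ratio are hypothetical.

```python
def prominent_peaks(mags, candidates, ratio=0.5):
    """Filter step 1 candidates, keeping only the prominent ones.

    mags       -- magnitude spectrum (one value per bin)
    candidates -- bin indices of the rough peaks from step 1
    ratio      -- threshold as a fraction of the reference peak (0.5 = 50%)
    """
    if not candidates:
        return []
    ordered = sorted(candidates)
    # Position (within `ordered`) of the candidate with maximum magnitude.
    top = max(range(len(ordered)), key=lambda i: mags[ordered[i]])

    def walk(indices):
        kept = []
        reference = mags[ordered[top]]
        for i in indices:
            level = mags[ordered[i]]
            if level >= ratio * reference:
                kept.append(ordered[i])
                reference = level  # assumed: threshold follows the last kept peak
        return kept

    toward_dc = walk(range(top - 1, -1, -1))
    toward_nyquist = walk(range(top + 1, len(ordered)))
    return sorted(toward_dc + [ordered[top]] + toward_nyquist)
```

Letting the reference follow the last retained peak allows a spectrum whose partials decay gradually, such as the plucked string of figure 2.2, to keep its higher partials while the weak peaks between them are dropped.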

Step 3: Harmonic Break Search

The third step is called the harmonic break search. Here, I analyze whether
some “potential partials” were deleted or missed in the previous steps.
This may occur when harmonically related peaks temporarily have little
energy or are simply much weaker than the stronger ones, but are
nevertheless harmonic. The harmonic break search is divided into the
following sub-routines:

Analyze the harmonic relationship between the current partial candidates by
computing the mean bin spacing between all prominent peaks.

(2.4)

Detect any harmonic breaks, or discontinuities, between prominent peaks.

If discontinuities are found, go back to steps 1 and 2 and do a refined
search for possible peaks between pairs of prominent peaks.

Figure 2.8 Harmonic break search

In the second sub-step of the harmonic break search, harmonic
discontinuities are detected using a pair of threshold values limiting the
range of harmonic deviation. Hence, the algorithm expects the possibility
of a peak within the threshold bounds computed in sub-step 2 (figure 2.8).
Refer to the appendix for more details on the algorithm.
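A rough sketch of the three sub-routines is given below. It simplifies the refined search: instead of re-running steps 1 and 2 with relaxed thresholds, it picks the strongest local maximum inside the expected window. The names mean_spacing, fill_harmonic_breaks, lower and upper are hypothetical, and the bounds are assumed to be fractions of the mean bin spacing.

```python
import math


def mean_spacing(peaks):
    """Mean bin distance between neighboring prominent peaks."""
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps)


def fill_harmonic_breaks(mags, peaks, lower=0.7, upper=1.3):
    """Insert weak peaks where the harmonic sequence appears broken.

    peaks must be sorted bin indices; lower and upper bound the window,
    relative to the mean spacing, in which the next peak is expected.
    """
    if len(peaks) < 2:
        return list(peaks)
    spacing = mean_spacing(peaks)
    filled = [peaks[0]]
    for nxt in peaks[1:]:
        # A break: the next prominent peak lies beyond the upper bound.
        while nxt - filled[-1] > upper * spacing:
            lo = int(math.ceil(filled[-1] + lower * spacing))
            hi = min(int(math.floor(filled[-1] + upper * spacing)), nxt - 1)
            local_maxima = [k for k in range(lo, hi + 1)
                            if 0 < k < len(mags) - 1
                            and mags[k - 1] < mags[k] > mags[k + 1]]
            if not local_maxima:
                break  # nothing to recover inside this window
            filled.append(max(local_maxima, key=lambda k: mags[k]))
        filled.append(nxt)
    return filled
```

Note that a large gap inflates the mean spacing itself, so the window bounds need to be generous enough to absorb that bias.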

Step 4: Harmonicity Analysis

Finally, in step 4 an overall harmonicity verification is performed. In
this last step, the first few peaks (the number is selectable in software)
are used as a guide to determine the final set of partials. The first few
peaks of the spectrum are chosen because in highly pitch-salient signals
the lower harmonics are usually stronger and more stable than the higher
ones.

The idea is to use the Gaussian (normal) distribution, employing mean,
variance and standard deviation, to eliminate inharmonic or misbehaving
partials. A peak that lies outside a left and right threshold bound is
considered inharmonic and misbehaving. A mean bin spacing value denoting
the bin distance between neighboring peak candidates is computed to render
the variance and standard deviation. As the lower partials generally tend
to be more stable and have more energy, the first K (K: integer > 0) peaks
are used for the computation of the standard deviation. A scaled version of
the standard deviation is then used as a criterion for evaluating the
inharmonicity of each partial candidate. The scaled standard deviation is
increased or decreased to control the permitted spread of each peak. In
other words, the scaled standard deviation directly determines the amount
of inharmonicity tolerated when selecting the final set of peaks. The
scalar that controls the scaled standard deviation is a value between 0 and
1, where 1 is equivalent to limiting the peaks to the original unscaled
standard deviation. This method is implemented by computing an ideal
sequence of harmonics using the data acquired above. Hence the ideal
harmonic series is a sequence of partials as shown below.

(2.5)

The ideal set of harmonics and the actual set of harmonics are compared,
and the error (equation 2.6) for each peak is computed and checked against
the scaled standard deviation for final assessment. Peaks that have
excessive error values are deleted from the final set of peaks, and the
remaining ones are finally considered harmonics. See the appendix for more
details on the algorithm.

(2.6)

Equation 2.6
shows the error between the ideal and actual bins where M is the number
of ideal peaks and N is the number of actual peaks in the spectrum. M and
N have different values as missing partials may exist in the actual set
of peaks.
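The verification can be sketched as below. Since equations 2.5 and 2.6 are not reproduced here, the sketch makes two assumptions that should be read as such: the ideal series is anchored at the first peak and advances by the mean bin spacing, and the error of a peak is its distance to the nearest ideal bin. The names harmonicity_filter, k_first and scale are hypothetical.

```python
import math


def harmonicity_filter(peaks, k_first=5, scale=1.0):
    """Keep the peaks whose error against an ideal series is small enough.

    peaks   -- sorted bin positions of the partial candidates
    k_first -- number of leading peaks used for the statistics (K)
    scale   -- scalar between 0 and 1 applied to the standard deviation
    """
    head = peaks[:k_first]
    if len(head) < 2:
        return list(peaks)
    gaps = [b - a for a, b in zip(head, head[1:])]
    mean_gap = sum(gaps) / len(gaps)
    variance = sum((g - mean_gap) ** 2 for g in gaps) / len(gaps)
    tolerance = scale * math.sqrt(variance)

    kept = []
    for p in peaks:
        # Nearest member of the assumed ideal series peaks[0] + m * mean_gap.
        m = round((p - peaks[0]) / mean_gap)
        ideal = peaks[0] + m * mean_gap
        if abs(p - ideal) <= tolerance:
            kept.append(p)
    return kept
```

For example, with peaks at bins 20, 41, 60, 81, 100, 113 and 140, the first five peaks give a mean spacing of 20 bins and a standard deviation of 1 bin, so the peak at bin 113 (7 bins from the nearest ideal position) is removed at any scale between 0 and 1.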

Partial Tracking between Frames

Once harmonics
have been evaluated in each frame (a frame is equal to the length of the
FFT), they are combined to render a spectrogram. Frame to frame partial
movement is determined using a harmonic continuity criterion as
shown in figure 2.9.

Figure 2.9 Partial tracking between frames

The harmonic continuity criterion is explained as follows: each harmonic in
a frame is allowed to sway in frequency within a set of error margin
values. Hence, as shown in figure 2.9, the harmonics in four consecutive
frames (k, k+1, k+2, k+3) form a continuous harmonic path. However, the
harmonic in frame k+4 exceeds the allowed error margin and breaks the
previous harmonic path. At frame k+4 a new path is created and the path
which started at frame k is discontinued. The harmonic continuity criterion
is helpful in observing movements of the harmonics over time and frequency.
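A minimal sketch of the continuity criterion follows. It assumes a single symmetric error margin and a nearest-neighbor match between consecutive frames, which the text does not specify; the names track_partials and margin are hypothetical.

```python
def track_partials(frames, margin):
    """Link harmonics across frames into paths.

    frames -- list of frames, each a list of harmonic frequencies (or bins)
    margin -- largest frame-to-frame deviation allowed within one path
    Returns a list of paths, each a list of (frame_index, frequency) pairs.
    """
    finished, active = [], []
    for t, peaks in enumerate(frames):
        unclaimed = list(peaks)
        still_active = []
        for path in active:
            last = path[-1][1]
            nearest = min(unclaimed, key=lambda f: abs(f - last), default=None)
            if nearest is not None and abs(nearest - last) <= margin:
                path.append((t, nearest))
                unclaimed.remove(nearest)
                still_active.append(path)
            else:
                finished.append(path)  # the path is discontinued
        # Harmonics that continue no existing path start new paths here.
        still_active.extend([[(t, f)] for f in unclaimed])
        active = still_active
    return finished + active
```

Applied to a situation like that of figure 2.9, a harmonic that stays within the margin for four frames and then jumps produces one path of length four and a new path starting at the fifth frame.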

Source: http://silvertone.princeton.edu/~park/thesis/dartmouth/html/ch2-1.html
