|
Recent work with Giorgos Siamantas is addressing a number of issues
highlighted by previous separation work at York. In particular, for
practical applications the earlier work requires a great deal of user
interaction, particularly in the form of a user-produced MIDI score
providing approximate information regarding instrument types,
pitches and timings which assists the algorithm when separating
complicated polyphonic melodies. Such user input is not only costly and
slow, but also restricts the use of the algorithm and introduces a wide
range of uncertainties, in the sense that the separation algorithm can
be misled by inaccurate information supplied by a user, or can produce
different results depending on the information supplied by different
users. Current work hence includes developing a new multipitch
'front-end' to the algorithm which estimates the fundamental
frequencies of the various instruments and removes the need for this
user input.
Further research is also under way into exploiting the
information
contained within the residual channel. This channel contains all the
data which is not consistent with any a priori
restrictions/information in terms of instrument models and
characteristics. For example, if we choose to identify, extract and
separate energy that corresponds to broadly harmonic structures of
partials, then the residual will contain not only true 'noise' sources,
but all energies associated with strongly inharmonic partials and
broadband energy associated with transient events such as the attacks
of individual notes. This means that the individual output channels
contain separated information about not only the frequency and
amplitude variations of the notes produced by multiple instruments, but
also, via the content of the residual, the onset times and the
strength/duration of the note attack.
This opens up a number of new possibilities:
- Using the information in all of the separate 'demixed' output
tracks and the residual to provide revised estimates of the
information provided initially by the front-end multipitch estimator,
establishing an iterative procedure for improving the overall
separation quality;
- Concentrating on the content of the residual track, using the
separation process as a means of removing harmonic energy from the
signal, enhancing the relative strength of the individual note attacks
and hence providing an enhanced onset detection process;
- Linking portions of the nonharmonic attack energy within the
residual with the associated harmonic decay/sustain/release portion of
individual note events within each separated track. This allows new
creative control over note characteristics - for example, time
stretching or pitch shifting of the decay/sustain/release portion of a
note without distortion of the attack portion;
- Separating the individual instruments within the two channels of a
stereo input, and then using the relative timing and amplitude
information from all of the output (harmonic and residual) tracks to
allow calculation of the relative delays and attenuation of the
individual instruments, providing enhanced estimation of their
positions within the stereo image, hence allowing not only manipulation
of the positions of the individual sources within the stereo image, but
also enabling an enhanced conversion to a surround sound format.
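The stereo idea in the last point can be illustrated with a minimal
sketch. It assumes that one instrument has already been separated from
each channel, and it estimates the relative delay by a brute-force
cross-correlation and the relative attenuation by a least-squares gain.
The function name and the test signal are hypothetical; this is not the
method used in the work described here.

```python
import math

def estimate_delay_and_gain(left, right, max_lag):
    """Return (lag, gain) such that right[n] is close to gain * left[n - lag]."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Only sum over samples where both indices are valid.
        lo, hi = max(0, lag), min(n, n + lag)
        score = sum(left[i - lag] * right[i] for i in range(lo, hi))
        if score > best_score:
            best_lag, best_score = lag, score
    # Least-squares gain at the best lag.
    lo, hi = max(0, best_lag), min(n, n + best_lag)
    energy = sum(left[i - best_lag] ** 2 for i in range(lo, hi))
    gain = best_score / energy if energy > 0.0 else 0.0
    return best_lag, gain

# Hypothetical separated note: a decaying burst, padded with silence.
burst = [math.exp(-0.05 * k) * math.sin(0.9 * k) for k in range(40)]
left = [0.0] * 10 + burst + [0.0] * 14
# The right channel carries the same note, 3 samples later and at half amplitude.
right = [0.5 * left[i - 3] if i >= 3 else 0.0 for i in range(len(left))]

lag, gain = estimate_delay_and_gain(left, right, max_lag=8)
print(lag, gain)
```

A positive lag here means the note arrives later in the right channel.
Repeating the estimate for each separated instrument would give one
delay/attenuation pair per source, which is the information needed to
place it within the stereo image.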
|
|
Iterative improvement of separation quality
For example, the graphs below show plots of one
particular
performance measure (the Signal-To-Distortion Ratio - SDR) for the two
individual sources extracted from mixtures of a flute (D6) and a
bassoon (A4) for test cases ranging over a wide range of relative
volumes.
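The exact SDR definition used for these plots is not stated here. As a
rough guide, the simplest form of the measure is the ratio of the
energy of the true source to the energy of the error in the extracted
source, expressed in dB. The sketch below assumes that simplified form;
more elaborate definitions split the error into several components.

```python
import math

def sdr_db(reference, estimate):
    """Simplified SDR: 10*log10(reference energy / error energy), in dB."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    if signal_energy == 0.0:
        raise ValueError("reference signal is silent")
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)

# Toy check: a 440 Hz tone and an estimate with a 10% amplitude error.
sr = 8000
ref = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
est = [0.9 * x for x in ref]
print(round(sdr_db(ref, est), 2))  # error energy is 1% of the signal energy
```

Higher values indicate a cleaner extraction, so a source that the
detector has missed entirely shows up as a sharp drop in SDR.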
The left-hand plot shows the results for a single
application of the
new separation system, where the initial multipitch detection stage has
failed to detect the flute sound within the original mix below a
certain critical relative volume level - below this threshold, the
bassoon sound
simply
swamps the flute.
The right-hand plot shows the effect of iteratively
repeating the process using the information now available in the output
channels. Where the initial step has failed to detect the flute due to
its small relative energy, that energy ends up within the residual
channel. Although small relative to the initial mix, this energy
may now be significant relative to the other content within just the
residual, and hence this channel can usefully be fed back to the
multipitch detector for further processing. In this particular example,
the overall effect is to extend the effective range of relative volumes
over which the two individual instruments can be recognised by about
12dB.
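The control flow of this feedback loop can be sketched with a toy
model. The sketch assumes a stand-in 'detector' that only notices a
pitched source when it holds at least a given share of the energy in
its input, and it tracks energies rather than audio. All names and
numbers are hypothetical, and it is not the multipitch detector used in
this work.

```python
def detect(pitched, noise, min_share=0.05):
    """Toy detector: report pitched sources holding >= min_share of the input energy."""
    total = sum(pitched.values()) + noise
    if total <= 0.0:
        return set()
    return {name for name, energy in pitched.items() if energy / total >= min_share}

def separate_iteratively(pitched, noise, max_passes=3, min_share=0.05):
    """Extract detected sources, then feed the residual back to the detector."""
    remaining = dict(pitched)  # pitched energy still left in the residual
    extracted = {}
    passes = 0
    for _ in range(max_passes):
        found = detect(remaining, noise, min_share)
        if not found:
            break  # nothing new is visible in the residual
        passes += 1
        for name in found:
            extracted[name] = remaining.pop(name)
    return extracted, remaining, passes

# Flute 20 dB below the bassoon, plus a little broadband noise.
mixture = {"bassoon": 1.0, "flute": 0.01}
noise = 0.0005

single, left_over, _ = separate_iteratively(mixture, noise, max_passes=1)
both, _, passes = separate_iteratively(mixture, noise)
print(sorted(single), sorted(left_over))  # single pass misses the flute
print(sorted(both), passes)               # second pass finds it in the residual
```

In the first pass the flute holds about 1% of the energy and is missed;
once the bassoon has been removed it dominates the residual and is
detected on the second pass.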
|
|
|

SDR measures for the two
extracted sources at different relative volumes using a single pass of
the combined multipitch/separation algorithm.
|

SDR measures for the two extracted sources at different
relative
volumes, showing a significant improvement due to using the information
within the residual channel to inform a second pass of the
multipitch/separation algorithm.
|
|
|
Similarly, the graphs below illustrate a rather worse
case where, unlike the above example, the initial multipitch detector
fails for not just one, but both instruments in a
cello (B3) and saxophone (A3) mixture below a certain threshold energy
ratio.
Here, the use of the residual is even more effective,
confirming that the pitch detection and
signal separation stages are strongly connected processes - it is,
of course, easier to separate signals where some pitch information is
available, and estimating pitch information is much easier for isolated
sources. An iterative implementation where the two processes assist
each
other is an effective overall approach.
|
|
|

SDR measures for the two
extracted sources at different relative volumes using a single pass of
the combined multipitch/separation algorithm.
|

SDR measures for the two extracted sources at different
relative
volumes, showing a significant improvement due to using the information
within the residual channel to inform a second pass of the
multipitch/separation algorithm.
|
|
|
In practice, the use of an initial pitch detection
stage, together with an iterative improvement process, means that good
quality results can be obtained that would previously have required
significant user input. Below, for example, the extract from
'African Breeze',
performed by Hugh Masekela with Jonathan Butler (previous results
page) is processed using the new automated method to isolate just
the flugelhorn, which is then remixed at a different volume (all output
files are normalised to an RMS average level of -20dB, to ease
comparison). This confirms that excellent remixed versions of mono
originals can be produced without the need for user interaction in the
demixing process.
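The normalisation step can be sketched as follows. The reference for
the -20dB figure is not stated here, so the sketch assumes dB relative
to a full-scale amplitude of 1.0; the function names are hypothetical.

```python
import math

def rms_db(samples):
    """RMS level in dB relative to an amplitude of 1.0 (assumed reference)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

def normalise_rms(samples, target_db=-20.0):
    """Scale the samples so that their RMS level equals target_db."""
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("cannot normalise a silent signal")
    gain = 10.0 ** (target_db / 20.0) / math.sqrt(mean_square)
    return [gain * x for x in samples]

# Toy check on a quiet 440 Hz tone.
sr = 8000
tone = [0.03 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
levelled = normalise_rms(tone)
print(round(rms_db(tone), 2), round(rms_db(levelled), 2))
```

Bringing every output file to the same RMS level means that any
difference heard between them comes from the remix itself rather than
from overall loudness.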
|
Enhanced onset detection
The graphs below show the results of the separation
process for three isolated notes - a cello, a violin and a saxophone.
The left-hand plots show the residual after identification and removal
of all (near) harmonic partials, and the right-hand ones show the
attack portion as part of the original note waveform. Several points
are clear:
- The energy in the residual is relatively small except
for well-defined periods of time associated with the non-harmonic
processes during the early stages of the notes;
- The amplitudes, durations and envelopes of the
attacks are quite different for the three instruments;
- In all cases the duration of the separated attack
energy does
not correlate well with the parameters estimated via the commonly used
attack-decay-sustain-release (ADSR) envelope;
- In each overall waveform there is no clear boundary
between the non-harmonic attack energy and the more harmonic
decay/sustain/release
portion - instead there are smooth transitions between the two
behaviours as the standing waves are established and the partials
evolve.
|
|
|

The residual of an isolated cello note.
|
The residual in the
context of the original cello note (note the amplitude axis scale
change).
|
|
|

The residual of an isolated violin note.
|

The residual in the context of the original violin note (note the
amplitude axis
scale change).
|
|
|

The residual of an isolated saxophone note.
|

The residual in the context of the original saxophone note (note the
amplitude axis scale change).
|
|
|
For the general purposes of onset detection, and for
specific applications such as tempo/beat analysis and for
parametrising sounds for the purposes of music information retrieval
(MIR), a major challenge is detecting the attack of notes in the
presence of other sounds - again, the separation approach provides a
powerful way to address this problem.
The graphs below show two different test mixtures of the
three instruments above, with different start times and in different
orders. For the first (saxophone, violin and cello) mixture the
separation process leads to a residual which provides considerably
enhanced access to the individual note
attacks and allows improved estimation of the onset time and the total
attack energy.
|
|
|

The residual signal for a mixture of three different instruments
(saxophone, violin and cello) starting at different times (0.5s
intervals). The individual note attacks and the corresponding onset
times are clearly defined.
|

The residual in the context of the original mixture waveform, showing
the extent to which the separation process has isolated the individual
note attacks (note the
amplitude axis scale change).
|
|
|
The second example below shows the residual from the
separation process for the same instruments in reverse order (cello,
violin and saxophone). This is a harder case, since the violin and
(especially) the saxophone attacks are relatively weak. Nevertheless,
the residual channel still provides considerably improved information.
|
|
|

|

|
|
|
In practice, the availability of the residual and the
increased clarity of the note attacks mean that, unlike normal onset
detectors, where the detection algorithm has to work on the whole
signal, the onset detection process can be applied just to the residual
signal, as in the example below.
|
The original signal and the
residual signal for a mixture of three different instruments
(cello, saxophone and violin) starting at different times (0.5s
intervals). Here, a conventional onset detector applied to the original
signal (triangles) fails to find the violin onset at 1.5s and produces
some spurious results. By contrast, a custom onset detector applied
just to the residual signal correctly locates exactly three onsets, at
about 0.5s, 1.0s and 1.5s.
|
|
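Because the residual is close to silent between attacks, even a very
simple detector can work on it. The sketch below is not the custom
detector used for the figure above: it assumes a plain frame-energy
threshold, and it is run on a synthetic residual with three short
bursts at 0.5s intervals. All names and parameter values are
hypothetical.

```python
import math

def detect_onsets(residual, sample_rate, frame=0.01, rel_threshold=0.1, min_gap=0.1):
    """Return onset times (s) where frame energy first rises above a threshold."""
    hop = max(1, round(frame * sample_rate))
    energies = [sum(x * x for x in residual[i:i + hop])
                for i in range(0, len(residual), hop)]
    if not energies:
        return []
    threshold = rel_threshold * max(energies)
    onsets = []
    above = False
    for k, energy in enumerate(energies):
        is_above = energy > threshold
        if is_above and not above:  # rising edge
            t = k * hop / sample_rate
            if not onsets or t - onsets[-1] >= min_gap:
                onsets.append(t)
        above = is_above
    return onsets

# Synthetic residual: faint background plus three decaying bursts.
sr = 4000
residual = [0.001 * math.sin(2 * math.pi * 50 * n / sr) for n in range(2 * sr)]
for start, amp in ((0.5, 0.5), (1.0, 0.3), (1.5, 0.4)):
    first = round(start * sr)
    for k in range(round(0.05 * sr)):
        decay = math.exp(-k / (0.01 * sr))
        residual[first + k] += amp * decay * math.sin(2 * math.pi * 1500 * k / sr)

print(detect_onsets(residual, sr))
```

The same threshold applied to a full mixture would have to cope with
the sustained harmonic energy of earlier notes, which is exactly what
the separation step removes.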
Creative and processing effects using separation
Even
within a single note event, being able to separate
the attack from the remainder of the note opens up the opportunity for
enhanced effects such as time stretching and pitch shifting. For
example, the cello note below has been subjected to both such processes
- not only in the normal fashion, but also by first identifying,
separating and protecting the attack from the processes. The
conventional approaches 'soften' or 'draw out' the attack, but the
separation results show that it is quite possible to modify only part
of
the body of the
note, retaining the sharpness and clarity of the attack.
|
Beyond enhanced effects for isolated notes, the
ability to extract individual sources from within a melody opens up the
opportunity for processing mono and stereo sources in ways that
were previously impossible. For example, taking a simple flute/cello
mix, it is quite possible to apply a pitch shift to just one instrument
- or to apply totally *different* pitch shifts to both instruments.
|
Similarly, in the cello/guitar mix below, it is quite
possible to modify just the guitar and leave the cello unchanged. In
fact, as above, the key idea is that not only is the cello content not
modified by the time-stretching and pitch-shifting processes, but also
the guitar attack remains unchanged - only the energy identified as
being associated with the broadly harmonic content of the chosen
instrument is changed.
|
More realistically, in the example below the 'African
Breeze' sample used earlier has undergone a separation process,
identifying and isolating the horn solo from the rest of the music.
Then the horn has been subjected to two different pitch shifts,
producing four tracks in total, which have then been remixed together.
Such a procedure would not normally have been possible without access
to multitrack masters of the recording.
|
|
|

|
|
|

|
|
|
Similarly, in the example below, the horn has been
demixed,
pitch-shifted down by an octave, and remixed with the original horn and
the remaining content to produce a more harmonious mixture.
|
|
|

|
|
|
The final
(surreal!) example is 'The Deflating Trumpeter', produced by applying a
full octave slide down in frequency to the horn before combining it
with the remaining (unchanged) content.
|
|
|

|
|
A further publication (PDF format) is
available, as below...
|
|
|
G. Siamantas, M.R. Every and J.E. Szymanski,
'Separating Sources From Single-Channel Musical Material: A Review And
Future Directions',
Proceedings of the Digital Music Research Network Summer Conference
2006, Goldsmiths College, University of London, U.K. (22-23 July
2006).
|
|