Audio Signal Processing Research - Results and Demos #1

NOTE: There are still compatibility issues with playing media files in some browsers and operating systems. The sample sounds here are embedded using a dedicated player, but if the player bar doesn't appear or fails to run for any reason, direct links to the mp3 files are also provided.

Results page #1 - Initial Work On Source Separation

Initial (unpublished) work with John Jones in 1998 applied a perfectly harmonic model together with a number of time- and frequency-based statistical measures, combined using a Bayesian Belief Network, to estimate key model parameters as they varied over time. These parameters could be associated with individual acoustic sources and then used to drive a separate sinusoidal resynthesis process for each instrument or harmonic source.
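As an illustration of the resynthesis stage only, the sketch below sums one sinusoid per harmonic of a time-varying fundamental, which is what a perfectly harmonic model implies: partial h sits at exactly h times f0. It assumes per-sample f0 and amplitude tracks are already available (the Bayesian Belief Network parameter estimation is not shown), and the function name `resynthesize_harmonic` is hypothetical, not taken from the original software.

```python
import math


def resynthesize_harmonic(f0_track, amp_tracks, fs):
    """Sum-of-sinusoids resynthesis under a perfectly harmonic model (sketch).

    f0_track:   fundamental frequency in Hz for every output sample.
    amp_tracks: amp_tracks[k][n] is the amplitude of harmonic k + 1 at sample n.
    fs:         sample rate in Hz; partials at or above fs / 2 are skipped.
    """
    n_samples = len(f0_track)
    out = [0.0] * n_samples
    for k, amps in enumerate(amp_tracks):
        h = k + 1                      # harmonic number
        phase = 0.0
        for n in range(n_samples):
            freq = h * f0_track[n]     # perfect harmonicity: partial h at h * f0
            if freq < fs / 2:          # silence partials beyond Nyquist (no aliasing)
                out[n] += amps[n] * math.sin(phase)
            # integrate instantaneous frequency so the phase stays continuous
            phase += 2.0 * math.pi * freq / fs
    return out


if __name__ == "__main__":
    fs = 16000                                       # matches the 16kHz material on this page
    n = fs // 10                                     # 100 ms of signal
    f0 = [220.0 + 20.0 * i / n for i in range(n)]    # hypothetical gentle pitch glide
    amps = [[1.0 / h] * n for h in range(1, 11)]     # hypothetical 1/h amplitudes
    y = resynthesize_harmonic(f0, amps, fs)
    print(len(y), max(abs(v) for v in y))
```

Accumulating phase sample by sample, rather than computing sin(2 * pi * h * f0 * t) directly, is what keeps each partial free of clicks when f0 varies over time.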


This spectrogram shows a (deliberately bad!) vocal harmony. This early work used signals sampled at only 16kHz, so the entire frequency range up to the Nyquist limit (8kHz) is shown. Both voices show significant pitch variation over time, and overlapping partials are evident.

Spectrogram of original harmony

Extraction of a single voice from a sung harmony

Time waveform of original vocal harmony (sound file).

This spectrogram shows a resynthesized (dominant) voice #1 after its time-dependent behaviour has been identified.

Since a perfectly harmonic model is imposed as a basic assumption, the resynthesis represents only those parts of the original data that are consistent with that model - all broadband breath noise, processing artefacts and other noise sources are suppressed.
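The noise suppression follows directly from the model: anything that cannot be written as a sum of harmonics of f0 has no place in the resynthesis. A minimal sketch, assuming f0 is known and the analysis frame spans a whole number of fundamental periods (so the least-squares fit of each harmonic's sine and cosine weights reduces to simple inner products); `harmonic_projection` is a hypothetical helper, not the estimation method used in this work.

```python
import math
import random


def harmonic_projection(x, f0, fs, n_harmonics):
    """Return the part of frame x that fits a perfectly harmonic model.

    Assumes f0 * len(x) / fs is an integer (whole periods in the frame), so the
    sine/cosine basis below is orthogonal and the least-squares weights are
    just scaled inner products.
    """
    N = len(x)
    y = [0.0] * N
    for h in range(1, n_harmonics + 1):
        if h * f0 >= fs / 2:
            break                                  # nothing is modelled above Nyquist
        w = 2.0 * math.pi * h * f0 / fs            # radian frequency of harmonic h
        c = [math.cos(w * n) for n in range(N)]
        s = [math.sin(w * n) for n in range(N)]
        a = 2.0 / N * sum(xi * ci for xi, ci in zip(x, c))   # cosine weight
        b = 2.0 / N * sum(xi * si for xi, si in zip(x, s))   # sine weight
        for n in range(N):
            y[n] += a * c[n] + b * s[n]
    return y


def mse(u, v):
    """Mean squared difference between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


if __name__ == "__main__":
    fs, N, f0 = 16000, 1600, 200.0      # 1600 samples = exactly 20 periods of 200 Hz
    w = 2.0 * math.pi * f0 / fs
    clean = [math.sin(w * n) + 0.5 * math.cos(2 * w * n) for n in range(N)]
    rng = random.Random(0)
    noisy = [v + rng.gauss(0.0, 0.1) for v in clean]   # stand-in for broadband noise
    kept = harmonic_projection(noisy, f0, fs, n_harmonics=10)
    print("MSE before:", mse(noisy, clean), "after:", mse(kept, clean))
```

Only the 20 basis directions belonging to the 10 harmonics survive, so most of the white-noise energy spread across the other directions is discarded, while the harmonic content passes through unchanged.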

Spectrogram of voice 1

Time waveform of reconstructed voice #1 (sound file).

Here, the spectrogram shows a weaker resynthesized voice #2 after its time-dependent behaviour has been identified.

In this particular case the voice is harder to identify for two distinct reasons - it is not only quieter than the first voice, but it also happens to have a higher pitch. This means that its partials are more widely spaced, so fewer of them fall below the Nyquist limit to provide a match with the frequency model.
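The "fewer partials" point is simple arithmetic: harmonics are spaced f0 apart, so only those below the Nyquist limit (8kHz at the 16kHz rate used here) are available to the matching process. The two pitches below are hypothetical, chosen only to illustrate the effect, and `partials_below_nyquist` is an illustrative helper.

```python
def partials_below_nyquist(f0, fs):
    """Count the harmonics h * f0 (h = 1, 2, ...) lying strictly below fs / 2."""
    count = 0
    while (count + 1) * f0 < fs / 2:
        count += 1
    return count


if __name__ == "__main__":
    fs = 16000
    for f0 in (220.0, 330.0):          # hypothetical lower and upper voice pitches
        print(f"f0 = {f0:.0f} Hz: spacing {f0:.0f} Hz, "
              f"{partials_below_nyquist(f0, fs)} partials below {fs // 2} Hz")
```

With these values a voice a fifth higher (330 Hz against 220 Hz) offers 24 partials instead of 36, so the estimator has a third less evidence to work with even before its lower level is taken into account.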

For both voices the resulting sounds are idealized versions of the originals - the strict harmonic model means that the vocal results sound somewhat 'thin' by comparison.

Spectrogram of voice 2

Time waveform of reconstructed voice #2 (sound file).

Here the two voices are remixed and shown for direct comparison with the original spectrogram.

The identification and tracking of the individual partials is excellent, especially given that no additional information about formants, or from any other vocal model, has been employed.

Crucially, this work established not only that individual instruments could potentially be identified and extracted from mono recordings, but also that realistic results could be achieved with fairly simple models and a basic sinusoidal resynthesis approach.

Spectrogram of both extracted voices

Time waveform of reconstructed voices remixed (sound file).

An interesting spin-off from this parametric approach is the greater access the user has to the instrument characteristics: individual partial frequencies and amplitudes can be manipulated directly, or treated as parameter streams and used to drive alternative instruments via MIDI, or even to control lighting or other effects. Please note that, as above, these old (1998-1999) examples are sampled at only 16kHz and so are of limited quality.
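As one hedged illustration of driving an alternative instrument, a tracked fundamental can be mapped to an equal-tempered MIDI note number, 69 + 12 * log2(f / 440), with the remaining fraction of a semitone sent as 14-bit pitch bend around the centre value 8192. The +/- 2 semitone bend range is an assumption (a common synth default, but it is set on the receiving instrument), and `freq_to_midi` is a hypothetical helper, not part of the original system.

```python
import math


def freq_to_midi(freq_hz, bend_range_semitones=2.0):
    """Map a frequency in Hz to (MIDI note number, 14-bit pitch-bend value).

    note = 69 + 12 * log2(f / 440) in equal temperament (A4 = 440 Hz = note 69).
    The fractional remainder becomes pitch bend around the centre value 8192,
    assuming the receiver's bend range is +/- bend_range_semitones.
    """
    exact = 69.0 + 12.0 * math.log2(freq_hz / 440.0)
    note = int(round(exact))
    offset = exact - note                          # semitones, within +/- 0.5
    bend = 8192 + int(round(offset / bend_range_semitones * 8192))
    return note, max(0, min(16383, bend))          # clamp to the 14-bit range


if __name__ == "__main__":
    # hypothetical per-frame f0 values from a partial tracker
    for f0 in (440.0, 261.63, 452.9):
        print(f0, freq_to_midi(f0))
```

Sending the residual as pitch bend rather than snapping to the nearest note preserves the glides and vibrato of the tracked source, at the cost of one pitch-bend message per frame.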


For example, an original, unprocessed vocal (sound file).


The vocal after parameter estimation and modification - here the magnitudes of the identified partials are manipulated, producing an organ-like effect (sound file).
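The page does not say exactly how the partial magnitudes were altered. One plausible kind of manipulation, sketched here purely as an assumption, keeps each frame's overall energy (so the vocal's loudness contour survives) but replaces the relative harmonic magnitudes with a fixed, hypothetical template, giving the static spectrum typical of an organ stop. The modified tracks would then go to a sinusoidal resynthesis stage; `apply_timbre_template` and the template values are illustrative only.

```python
import math


def apply_timbre_template(amp_tracks, template):
    """Impose fixed relative partial magnitudes while keeping per-frame energy.

    amp_tracks[k][t]: amplitude of harmonic k + 1 in frame t.
    template[k]:      desired relative weight of harmonic k + 1 (not all zero).
    Only the first min(len(amp_tracks), len(template)) harmonics are used.
    """
    k_max = min(len(amp_tracks), len(template))
    norm = math.sqrt(sum(w * w for w in template[:k_max]))
    n_frames = len(amp_tracks[0])
    out = [[0.0] * n_frames for _ in range(k_max)]
    for t in range(n_frames):
        # energy of the identified partials in this frame
        energy = math.sqrt(sum(amp_tracks[k][t] ** 2 for k in range(k_max)))
        for k in range(k_max):
            out[k][t] = energy * template[k] / norm
    return out


if __name__ == "__main__":
    # hypothetical organ-like weights for harmonics 1..8
    organ = [1.0, 0.8, 0.2, 0.6, 0.1, 0.3, 0.0, 0.4]
    vocal = [[0.9, 1.0, 0.4], [0.5, 0.6, 0.2], [0.3, 0.2, 0.1]]  # 3 partials, 3 frames
    print(apply_timbre_template(vocal, organ))
```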


Creative use of the vocal after assorted time- and frequency-shifting processes have been applied to the identified partials (sound file).
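Because the partials are held as parameter tracks, time and frequency can be changed independently. A minimal sketch, assuming frame-based f0 and amplitude tracks: the tracks are resampled by linear interpolation to change duration without changing pitch, then f0 is scaled by 2^(semitones/12) to change pitch without changing duration. `stretch_and_shift` is hypothetical and is not the specific set of processes used for this demo.

```python
def stretch_and_shift(f0_track, amp_tracks, stretch, semitones):
    """Time-stretch and pitch-shift a harmonic parameter set.

    Tracks are resampled by linear interpolation to round(len * stretch) frames
    (duration changes, pitch does not); f0 is then scaled by 2 ** (semitones / 12)
    (pitch changes, duration does not). Amplitudes are resampled, not rescaled.
    """
    n_in = len(f0_track)
    n_out = max(1, round(n_in * stretch))
    ratio = 2.0 ** (semitones / 12.0)

    def resample(track):
        if n_in == 1 or n_out == 1:
            return [track[0]] * n_out
        out = []
        for j in range(n_out):
            pos = j * (n_in - 1) / (n_out - 1)   # output frame -> input position
            i = min(int(pos), n_in - 2)          # left neighbour (last pair at the end)
            frac = pos - i
            out.append(track[i] * (1.0 - frac) + track[i + 1] * frac)
        return out

    new_f0 = [f * ratio for f in resample(f0_track)]
    new_amps = [resample(a) for a in amp_tracks]
    return new_f0, new_amps


if __name__ == "__main__":
    f0, amps = stretch_and_shift([200.0, 210.0, 205.0], [[1.0, 0.8, 0.6]], 2.0, -5)
    print(f0, amps)
```

Because the amplitudes stay attached to harmonic numbers, any formant structure moves along with the pitch in this simple scheme; keeping formants fixed would need the amplitudes to be re-read from a spectral envelope at the new partial frequencies.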


Another example: a pair of guitar riffs (sound file).


Here the partial structures of the two guitars have been identified and used to drive a MIDI instrument, which has then been remixed with the original (sound file).


A final example: an extract from the Queen song 'Fat Bottomed Girls' (sound file).


Here the note extraction process enables access to the partials for manipulation and resynthesis of an entirely new 'voice' (sound file).


And the new sound can be remixed with the original to give an interesting creative result... (sound file).

