Music transcription is a complex cognitive task that requires a trained musician to
listen to a piece of music and write down the notes played and their timing. The task
is further complicated when the music is polyphonic, with several notes played
simultaneously, so that the musician must listen to the piece repeatedly to work out
the notes and their timing. This thesis
describes a polyphonic note detection system based on a simple masking technique
that can accurately transcribe chords and polyphonic piano music. The system,
developed in MATLAB, takes input files in .wav format. Note onsets are located
using Note Average Energy (NAE) onset detection and are used to
segment the music into note windows, which are then analysed using the FFT.
Following the extraction and compilation of the frequency peaks in each note
window, an iterative masking procedure is used to detect and successively extract the
notes and any associated harmonics. The masking procedure uses a database of note
masks compiled from multiple note examples, both monophonic and polyphonic.
The instrument modelled in this work is the Technics
KN800 PCM digital keyboard. Once a .wav file has been input, the system runs
automatically to completion and outputs a list of the notes played and their
timing. The run time, from input of the .wav file to output of the notes played,
depends on factors such as the length, complexity and degree of polyphony of the
musical piece. The thesis presents the results of testing the
system on isolated chords and music played at realistic tempos.
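
The iterative masking idea summarised above can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB of the actual system, and it is not the thesis implementation: the mask format (a list of harmonic frequencies with amplitudes relative to the fundamental), the equal-tempered note frequencies, the fixed harmonic weights, the threshold and the tolerance are all assumptions made for illustration, and the function names are hypothetical. Starting from the frequency peaks of one note window, the sketch repeatedly takes the lowest remaining peak, matches it to the note mask with the nearest fundamental, and subtracts the scaled mask from the associated harmonic peaks.

```python
# Hypothetical sketch of iterative masking on the spectral peaks of one note window.
# Mask format, harmonic weights, threshold and tolerance are illustrative assumptions.

def note_frequency(midi_number):
    """Equal-tempered frequency in Hz for a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_number - 69) / 12.0)


def make_mask(midi_number, harmonic_weights):
    """Build a mask: one (frequency, relative amplitude) pair per harmonic."""
    f0 = note_frequency(midi_number)
    return [(f0 * (k + 1), w) for k, w in enumerate(harmonic_weights)]


def find_peak(peaks, freq, tolerance):
    """Index of the peak closest to freq within a relative tolerance, or None."""
    best = None
    for i, (f, _) in enumerate(peaks):
        if abs(f - freq) <= tolerance * freq:
            if best is None or abs(f - freq) < abs(peaks[best][0] - freq):
                best = i
    return best


def detect_notes(peaks, masks, threshold=0.1, tolerance=0.02):
    """Explain the lowest remaining peak with a note mask, subtract it, and repeat."""
    work = [[f, a] for f, a in sorted(peaks)]  # mutable copy, low to high frequency
    detected = []
    while True:
        remaining = [p for p in work if p[1] > threshold]
        if not remaining:
            return detected
        lowest = remaining[0]
        f_low, a_low = lowest
        # Pick the note whose fundamental lies closest to the lowest peak.
        note = min(masks, key=lambda n: abs(masks[n][0][0] - f_low))
        if abs(masks[note][0][0] - f_low) <= tolerance * f_low:
            detected.append(note)
            # Subtract the scaled mask from each harmonic peak that is present.
            for f_h, w in masks[note][1:]:
                i = find_peak(work, f_h, tolerance)
                if i is not None:
                    work[i][1] = max(0.0, work[i][1] - a_low * w)
        # The lowest peak is now either explained or unmatched; remove it
        # in both cases so that the loop always terminates.
        lowest[1] = 0.0


# Usage: a synthetic C major triad (MIDI 60, 64, 67) built from the same masks.
masks = {m: make_mask(m, [1.0, 0.5, 0.25]) for m in range(48, 84)}
chord = masks[60] + masks[64] + masks[67]
print(detect_notes(chord, masks))  # [60, 64, 67]
```

In this synthetic example the third harmonic of C4 and the second harmonic of G4 are kept as separate peaks less than 1 Hz apart. In a real spectrum such harmonics would typically merge into a single peak, which is one reason a database of masks compiled from recorded examples is likely to be more robust than the fixed weights assumed here.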