Academic Commons


Ground-Truth Transcriptions of Real Music from Force-Aligned MIDI Syntheses

Turetsky, Robert J.; Ellis, Daniel P. W.

Many modern polyphonic music transcription algorithms are framed as statistical pattern recognition problems. Without a large corpus of real-world music transcribed at the note level, however, these algorithms cannot take advantage of supervised learning methods, and they have difficulty reporting a quantitative measure of their performance, such as a Note Error Rate. We attempt to remedy this situation by exploiting publicly available MIDI transcriptions. By force-aligning audio synthesized from a MIDI transcription with the raw audio of the song it represents, we can associate each note event in the MIDI data with the precise time in the raw audio at which that note is likely to be expressed. These alignments will support the creation of a polyphonic transcription system trained on labeled segments of produced music. Because the MIDI transcriptions we find vary in quality, however, an integral step in the process is automatically evaluating the integrity of each alignment before its transcription is used in any training set of labeled examples. Comparing a library of 40 published songs with freely available MIDI files, we were able to align 31 (78%). We are building a collection of over 500 MIDI transcriptions matching songs in our commercial music collection, for a potential total of 35 hours of note-level transcriptions, or some 1.5 million note events.
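The alignment step described above can be illustrated with dynamic time warping (DTW), a standard technique for aligning two feature sequences that play the same material at different tempos. The sketch below is not the authors' implementation. It assumes per-frame feature vectors (for example, spectral frames) have already been computed for both the synthesized and the raw audio at a common hop size; feature extraction is not shown. The function names, the cosine frame distance, and the `max_mean_cost` threshold used as a crude alignment-integrity check are all hypothetical choices made for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two feature vectors (0 when they point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat silent frames as maximally dissimilar
    return 1.0 - dot / (na * nb)


def dtw_path(synth, raw):
    """Align synthesized-audio frames to raw-audio frames with basic DTW.

    Returns the warping path as (synth_frame, raw_frame) pairs and the
    mean local cost along that path.
    """
    n, m = len(synth), len(raw)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning synth[:i] with raw[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cosine_distance(synth[i - 1], raw[j - 1])
            D[i][j] = c + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the final cell to recover the lowest-cost path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(D[i - 1][j - 1], i - 1, j - 1),
                 (D[i - 1][j], i - 1, j),
                 (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    path.reverse()
    return path, D[n][m] / len(path)


def map_onsets(onsets_sec, path, hop_sec):
    """Map MIDI note onset times (in the synthesized audio) to raw-audio times."""
    first_raw = {}
    for s, r in path:
        first_raw.setdefault(s, r)  # earliest raw frame matched to each synth frame
    last_synth = max(first_raw)
    mapped = []
    for t in onsets_sec:
        s = min(int(round(t / hop_sec)), last_synth)
        mapped.append(first_raw[s] * hop_sec)
    return mapped


def alignment_is_usable(mean_cost, max_mean_cost=0.2):
    """Hypothetical integrity check: reject alignments whose mean cost is too high."""
    return mean_cost <= max_mean_cost
```

For instance, if the raw recording holds its first chord for one extra frame, `dtw_path([[1, 0], [0, 1], [1, 1]], [[1, 0], [1, 0], [0, 1], [1, 1]])` returns the path `[(0, 0), (0, 1), (1, 2), (2, 3)]`, and `map_onsets([0.0, 0.5, 1.0], path, 0.5)` shifts the second and third onsets to 1.0 s and 1.5 s. A single mean-cost threshold is only one simple way to flag a poor MIDI transcription; the record above does not specify how the authors evaluate alignment integrity.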


Also Published In

Title
ISMIR 2003

More About This Work

Academic Units
Electrical Engineering
Publisher
International Symposium on Music Information Retrieval
Published Here
June 29, 2012