Academic Commons

Presentations (Communicative Events)

Mining Large-Scale Music Data Sets

Ellis, Daniel P. W.; Bertin-Mahieux, Thierry

Large collections of music audio are now common and present an interesting research opportunity: what statistical patterns and structure can be discovered across thousands or millions of examples? Unfortunately, copyright restrictions can interfere with access to such collections, so we have developed the Million Song Dataset, which includes derived features but not the original audio, to support commercial-scale music analysis on a common research database. The audio features are augmented by a wide range of metadata, including lyrics, tags, and listener playcounts. Now that the database is ready, we have begun analyzing the content, including tasks such as identifying cover songs, which is significantly harder in such a large collection.

More About This Work

Academic Units: Electrical Engineering
Published Here: July 12, 2012

Notes

Presented at Information Theory and Applications Workshop, February 5-10, 2012, San Diego, Calif.
