2024 Proceedings

DLfM '24: Proceedings of the 11th International Conference on Digital Libraries for Musicology


Direct labelling of form of Classical-period piano sonata movements from audio recordings

  • Paul Burger and J. P. Jacobs
Musical form is the overall structure of a piece of music. Labelling musical form types from raw audio alone (for the purpose of, e.g., querying online music databases) is a relatively unexplored area of music information retrieval research. This study investigates the use of self-similarity matrices, computed from features derived from the raw audio, as input to a convolutional neural network that labels eight form types found in the movements of Classical-period piano sonatas by Mozart, Beethoven, Haydn, Clementi and Czerny. Specifically, the focus on pieces for solo piano allows the use of piano-roll features generated from the raw audio by state-of-the-art piano transcription software. To our knowledge, this is the first work to propose and explore passing the entire self-similarity matrix to a convolutional neural network for overall musical form recognition. The method circumvents the potential difficulties of inferring form labels bottom-up from audio segment boundary detection and segment matching, by generating form labels directly from the audio. Self-similarity matrices based on velocity piano rolls (whose values relate to the velocity of the notes being played) outperformed other self-similarity matrix types, achieving a macro-average ROC-AUC score of 0.823 and a coverage score of 2.045 on a custom data set compiled from verified musicological sources. The study is posed as a multi-label rather than a multi-class classification problem, as several piano sonata movements were found to carry more than one form label.
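As an illustration of the kind of input described above, a self-similarity matrix can be computed from a piano-roll feature matrix by comparing every pair of time frames. The sketch below uses cosine similarity on a toy velocity piano roll; the function name, toy data, and choice of similarity measure are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def self_similarity_matrix(piano_roll):
    """Cosine self-similarity between time frames of a piano roll.

    piano_roll: array of shape (n_pitches, n_frames), e.g. MIDI velocities.
    Returns an (n_frames, n_frames) matrix; for non-negative features the
    values lie in [0, 1], with 1 on the diagonal of non-silent frames.
    """
    frames = piano_roll.T.astype(float)              # one row per time frame
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                          # keep silent frames at 0
    unit = frames / norms
    return unit @ unit.T

# Toy example: 3 pitches, 4 frames; frames 0 and 2 are identical chords.
roll = np.array([[64,  0, 64, 0],
                 [ 0, 80,  0, 0],
                 [32,  0, 32, 0]])
ssm = self_similarity_matrix(roll)
print(ssm.shape)  # (4, 4)
```

Repeated sections of a piece show up as high-similarity diagonals in such a matrix, which is what makes it a plausible image-like input for a convolutional network.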

Acoustic classification of guitar tunings with deep learning

  • Edward Hulme, David Marshall, Kirill Sidorov and Andrew Jones
A guitar tuning is the allocation of pitches to the open strings of the guitar. A wide variety of guitar tunings are featured in genres such as blues, classical, folk, and rock. Standard tuning provides a convenient placing of intervals and a manageable selection of fingerings. However, numerous other tunings are frequently used as they offer different harmonic possibilities and playing methods.
A robust method for the acoustic classification of guitar tunings would provide the following benefits for digital libraries for musicology: (i) guitar-tuning tags could be assigned to music recordings and used to better organise, retrieve, and analyse music in digital libraries; (ii) tuning classification could be integrated into an automatic music transcription system, facilitating the production of more accurate and fine-grained symbolic representations of guitar recordings; (iii) insights acquired through guitar-tuning research would help when designing systems for indexing, analysing, and transcribing other string instruments.
Neural networks offer a promising approach for the automated identification of guitar tunings as they can learn useful features for complex discriminative tasks. Furthermore, they can learn directly from unstructured data, thereby reducing the need for elaborate feature extraction techniques.
Thus, we evaluate the potential of neural networks for the acoustic classification of guitar tunings. A dataset of authentic song recordings, which featured polyphonic acoustic guitar performances in various tunings, was compiled and annotated. Additionally, a dataset of synthetic polyphonic guitar audio in 5 different tunings was generated with sample-based audio software and tablatures. Using audio converted into log mel spectrograms and chromagrams as input, convolutional neural networks were trained to classify guitar tunings. The resulting models were tested using unseen data from disparate recording conditions. The best performing systems attained a classification accuracy of 97.5% (2 tuning classes) and 73.9% (5 tuning classes).
This research provides evidence that neural networks can classify guitar tunings from music audio recordings; contributes novel annotated datasets of authentic and synthetic guitar audio, which can serve as a benchmark for future guitar-tuning research; and proposes new methods for the collection, annotation, processing, and synthetic generation of guitar data.
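As a minimal illustration of one of the input representations mentioned above, the sketch below folds a magnitude spectrum into a 12-bin chroma vector with NumPy. The function, toy spectral frame, and A-based pitch-class convention are illustrative assumptions, not the paper's feature-extraction code.

```python
import numpy as np

def chroma_from_spectrum(magnitudes, freqs, fmin=27.5):
    """Fold one magnitude spectrum frame into a 12-bin chroma vector.

    magnitudes, freqs: 1-D arrays of equal length (e.g. one STFT frame).
    Bins below fmin are ignored; pitch class 0 corresponds to A (fmin = A0).
    """
    chroma = np.zeros(12)
    valid = freqs >= fmin
    # semitone distance from A0, then wrapped to a pitch class
    pitches = 12 * np.log2(freqs[valid] / fmin)
    classes = np.round(pitches).astype(int) % 12
    np.add.at(chroma, classes, magnitudes[valid])
    return chroma

# Toy frame: energy at 440 Hz (A4) and 880 Hz (A5) folds into one class.
freqs = np.array([220.0, 440.0, 880.0, 660.0])
mags = np.array([0.0, 1.0, 0.5, 0.0])
c = chroma_from_spectrum(mags, freqs)
```

Because octave-related partials collapse into the same bin, chroma features emphasise harmonic content over timbre, which is why they are a natural complement to log mel spectrograms for tuning classification.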

Svara-forms and coarticulation in carnatic music: An investigation using deep clustering

  • Thomas Nuttall, Xavier Serra and Lara Pearson
Across musical genres worldwide, there are many styles in which the shortest conceptual units (e.g., notes) are often performed with ornamentation rather than as static pitches. Carnatic music, a style of art music from South India, is one example. In this style, ornamentation can include slides and wide oscillations that barely rest on the theoretical pitch implied by the svara (note) name. The highly ornamented and oscillatory qualities of the style, in which the same svara may be performed in several different ways, mean that transcription from audio to symbolic notation is a challenging task. However, according to the grammar of the Carnatic style, there are a limited number of ways that a svara may be realized in a given rāga (melodic framework), and these depend to some extent on immediate melodic context and svara duration. In theory, therefore, it should be possible to identify not only svaras but also the various characteristic ways any given svara is performed, referred to here as ‘svara-forms’.
In this paper we present a dataset of 1,530 manually created svara annotations in a single performance of a composition in rāga Bhairavi, performed by the senior Carnatic vocalist Sanjay Subrahmanyan. We train a recurrent neural network and sequence classification model, DeepGRU, on the extracted pitch time series of the predominant vocal melody corresponding to these annotations to learn an embedding that classifies svara label with 87.6% test accuracy. We demonstrate how such embeddings can be used to cluster svaras that have similar forms and hence elucidate the distinct svara-forms that exist in this performance, whilst assisting in their automatic identification. Furthermore, we compare the melodic features of our 54 svara-form clusters to illustrate their unique character and demonstrate the dependency between these cluster allocations and the immediate melodic context in which these svaras are performed.
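Svara renditions vary in duration, so before pitch contours can be embedded, classified, or clustered, they are commonly resampled to a fixed length and centred. The sketch below shows this preprocessing step in NumPy; it is an illustrative assumption about one typical step, not the paper's DeepGRU pipeline.

```python
import numpy as np

def normalise_contour(pitch_cents, n_points=50):
    """Resample a variable-length pitch contour (in cents) to a fixed
    length and subtract its mean, so that svara renditions of different
    durations and absolute pitch heights become directly comparable."""
    x_old = np.linspace(0.0, 1.0, num=len(pitch_cents))
    x_new = np.linspace(0.0, 1.0, num=n_points)
    resampled = np.interp(x_new, x_old, pitch_cents)
    return resampled - resampled.mean()

# Two rise-and-fall gestures of different lengths align after normalisation.
a = normalise_contour([0, 100, 200, 100, 0], n_points=5)
b = normalise_contour([0, 50, 100, 150, 200, 150, 100, 50, 0], n_points=5)
```

After such normalisation, distances between contours reflect melodic shape rather than tempo or register, which is what clustering into svara-forms requires.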

(Re-)capturing the emotional geography of lost venues: A case study of the Willow Community Digital Archive

  • Rachel Cowgill, David Bainbridge, Alan Dix, Victoria Hoyle, Vicki Fong and David Thomas

The loss of many high-street music venues in recent years has highlighted their connectedness to place and communities. Understanding the emotional geographies of these venues, as experienced by their patrons, is key to explaining the outcry that can accompany such closures. In these circumstances it can be challenging to try to (re)capture the intangible elements that defined a lost venue and widen the scope for musicological enquiry. This paper sets out to address that challenge by exploring methods developed by the Willow Community Digital Archive to co-create a community archive in celebration of The Willow, a family-run restaurant-cum-nightclub that operated in York, UK, for over 40 years. Further, we report on how these methods informed the crafting of a general-purpose digital library system to form the archive. We also detail some initial experiments with ChatGPT, embedded into the archive, to investigate its potential to encourage visitors to engage with and inspire further contributions to the archive.

Popular musical arrangements in the nineteenth century home: A study of the Harmonicon supported by digital tools

  • David Lewis and Kevin Page
Musicologists often remove all traces of the scaffolding used to construct their scholarship at the point of completion, presenting information about bibliographic and evidential sources but not describing the tools and digital resources used. This makes it harder to assess the state of digital support for musicology. In this paper, we consider both outcome and scaffolding, presenting a musicological study built upon digitised library resources that made use of digital tools, and then considering the digital affordances the study required.
We explore the musical content of the music periodicals The Harmonicon (1823-1833) and The Musical Library (1834-1837), considering what it tells us about music-making and reception in early nineteenth-century England. Journals such as these are important both for bringing a wide range of music into the home and for adapting music written for the concert hall and the opera house to the domestic sphere through musical arrangement. Since this music was more affordable to many than concert tickets, its selection and deployment in such volumes would have been critical in shaping an audience's musical tastes. At the same time, the editor was compelled to tailor the music to the abilities and interests of the audience in an economically highly challenging environment.
This musicological study was supported by digital tools at multiple stages in the process. We describe the interaction between tools and scholarship, reflecting on where these were strong, but also considering opportunities for future development. We do this in terms of an iterative model of research, digitisation and editing, acknowledging that research must be able to continue despite imperfections and absences in tools, resources and digital data.

Open Edirom: From hybrid music edition to open data publication

  • Lena Frömmel, Tobias Bachmann, Anna Plaksin and Andreas Münzmay
The OPEN Edirom project is developing a digital edition of incidental music for Goethe’s play Faust, representing an innovative initiative within the realm of music philology and MEI/TEI edition. Embracing the "data first" principle, OPEN Edirom prioritizes making its content openly accessible, thereby enabling diverse potential uses for researchers and performers. Our aim is to present the scholarly text and music edition in its entirety, incorporating its various forms of data, i.e. music, texts, source images, metadata, and annotations, all displayed with Edirom software.
The piece we edit in this project is Goethe’s renowned play Faust I, as adapted by Carl Seydelmann, along with the corresponding music composed by Peter Joseph von Lindpaintner for the Court Theatre in Stuttgart. The work premiered in 1832.
This paper delves into the concept of music edition as open data publication and delineates its advantages over analog and hybrid editions in terms of reusability and alignment with the FAIR principles. It also addresses the challenges encountered in data preparation, both specific to incidental music and in general data processing. Furthermore, we propose solutions and recommendations for similar projects based on our insights.

A preliminary proposal for a systematic GABC encoding of Gregorian chant

  • Martha E. Thomae, David Rizo, Eliseo Fuentes-Martínez, Cristina Alís Raurich, Elsa De Luca and Jorge Calvo-Zaragoza

In recent years, several approaches have addressed the encoding of the different music scripts used for plainchant. One such approach is the GABC format. While GABC is a comprehensive symbolic representation of square notation, its lack of a formal specification often leads to ambiguities, which must be avoided in the specification of any encoding format. Sometimes the simple trial-and-error approach of entering GABC code into an engraving system (such as Illuminare, Scrib.io, or the GABC Transcription Tool) can resolve such an ambiguity. However, these engraving systems are inconsistent with one another when rendering GABC, sometimes displaying different music for the same code snippet. This paper presents a systematic approach to encoding Gregorian chant originally written in Aquitanian neumes and square notation, eliminating ambiguities inherent in the GABC specification. By formalizing the grammar of GABC, we address the challenge of inaccurate renderings in current music notation software. Our methodology includes developing a “Systematic GABC” (S-GABC) with a critical and scientific mindset to ensure the longevity of the notation. The paper demonstrates our system’s effectiveness in standardizing Gregorian chant encoding, offering significant contributions to digital musicology and enhancing the accuracy of musical-heritage digitization.
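To illustrate why a formal grammar matters: GABC interleaves lyric syllables with note groups in parentheses, and every tool must segment these pairs the same way for renderings to agree. The sketch below tokenizes a deliberately simplified GABC-like string into (syllable, notes) pairs; real GABC is far richer (clefs, bars, alterations, spacing), and this toy tokenizer is an illustration, not the proposed S-GABC grammar.

```python
import re

def tokenize_gabc(snippet):
    """Split a simplified GABC-like string into (syllable, notes) pairs.

    Handles only the basic syllable(notes) pattern; anything outside a
    balanced pair of parentheses is treated as lyric text.
    """
    return re.findall(r"([^()]*)\(([^()]*)\)", snippet)

pairs = tokenize_gabc("Ky(f)ri(gf)e(e)")
print(pairs)  # [('Ky', 'f'), ('ri', 'gf'), ('e', 'e')]
```

A formal specification pins down exactly such segmentation rules, so that different engraving systems cannot legally disagree about what a snippet denotes.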

Navigating the RISM data with RISM Online

  • Andrew Hankinson and Laurent Pugin

In 2021, the RISM Digital Center introduced RISM Online. This represented a shift in how we present the RISM data to a global audience, supporting new methods of digital research and keeping the RISM project central to modern music scholarship. RISM Online is designed from the ground up to move past a simple descriptive catalogue, treating the results of over 70 years of indexing, collating, organizing, and curating musical source descriptions and authorities as a significant and valuable data resource in itself. In this paper we explore the shift from catalogue to dataset more closely, looking at some of the unique and valuable information captured by RISM that can be of use to data-driven musicology. Along the way, we identify how RISM Online makes this data available through the tools and access points we have built. Finally, we report on ongoing experiments with the RISM data as we seek to exploit the relationships captured therein, an active area of future work.

FACETS: A tool for improved exploration of large symbolic music collections

  • Tiange Zhu, Raphaël Fournier-S'Niehotta and Philippe Rigaux

Large collections of symbolic music documents need efficient information retrieval tools. We introduce FACETS, a versatile tool for the exploration and management of such collections. FACETS is a scalable and flexible content-based search engine offering melodic and rhythmic querying modes. For improved navigation, a faceted interface orders the results to reduce information overload, and it can also serve as a primary entry point to the tool. FACETS is available as a standalone Docker image and a GitHub repository, aiming to help musicologists, composers, MIR researchers, and the interested public.
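As a generic illustration of the kind of melodic querying a content-based engine supports, the sketch below matches a query against a tiny corpus by exact interval sequence, which makes the search transposition-invariant. The function names and toy corpus are assumptions for illustration, not the FACETS implementation.

```python
def intervals(pitches):
    """Successive semitone intervals: a transposition-invariant melody key."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

def find_melody(query, corpus):
    """Return ids of corpus melodies containing the query as a contiguous
    subsequence of intervals (an exact-interval melodic match)."""
    q = intervals(query)
    hits = []
    for doc_id, pitches in corpus.items():
        seq = intervals(pitches)
        if any(seq[i:i + len(q)] == q for i in range(len(seq) - len(q) + 1)):
            hits.append(doc_id)
    return hits

corpus = {
    "piece_a": [60, 62, 64, 65, 67],   # C D E F G
    "piece_b": [67, 69, 71, 72, 74],   # the same tune a fifth higher
    "piece_c": [60, 64, 67],           # C E G arpeggio
}
print(find_melody([62, 64, 65], corpus))  # matches piece_a and piece_b
```

Because the query is reduced to intervals, the transposed piece_b matches too; rhythmic querying could be sketched analogously over duration ratios.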

An online tool for semi-automatically annotating music scores for optical music recognition

  • Stanisław Graczyk, Zuzanna Piniarska, Mateusz Kałamoniak, Tomasz Łukaszewski and Ewa Łukasik

The paper describes OMRAT, an online tool for the semi-automatic annotation of music scores for Optical Music Recognition (OMR) systems. OMRAT uses deep neural networks, machine learning, and music notation ontologies at different stages to detect musical objects, establish relationships between them, and convert them into the machine-readable MEI format. A human editor verifies the output of the recognition stage, correcting potential errors and removing incorrect labels as needed. The tool can create training/testing datasets for OMR systems, and its output may also be used in notation editors or audio synthesizers.

JazzDAP: Collaborative research tools for digital jazz archives

  • Kevin Allain and Tillman Weyde

This paper introduces JazzDAP, a novel web platform for exploration, analysis, and collaboration in the jazz music domain. The platform integrates advanced music information retrieval techniques with user-friendly interfaces tailored for musicologists, archivists, and jazz enthusiasts. It employs a contour-based algorithm for pattern recognition, enabling users to search for specific musical motifs, with filters based on metadata such as artist, location, and year of recording. Users can listen to audio sections or MIDI excerpts from the matches and delve into detailed metadata, including the years of recordings, the prevalence of specific patterns, and information about the artists associated with them. Visualizations aid in uncovering trends, evolution, and connections in the development of jazz. A key innovation of the platform is the introduction of workflow objects, which allow users to save elements of interest, accompanied by notes, as named workflows, and to engage in collaborative discussions. Users can use workflows to annotate, share insights, and communicate with each other, fostering a community-driven exploration of jazz music. This collaborative aspect enhances the platform’s utility for researchers and enthusiasts alike, aiming to create a dynamic environment for the exchange of knowledge and discoveries. The paper outlines the platform’s structure, highlights its key features, and presents preliminary user feedback. We believe this work opens new avenues for the exploration and understanding of jazz music, offering a valuable resource for researchers, archivists, and enthusiasts passionate about the intricate patterns that shape the genre.
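One simple and widely known form of contour-based matching reduces a melody to its up/down/repeat steps (the Parsons code) and searches for the query's step pattern in a longer line. The sketch below illustrates that idea; the function names and toy data are assumptions for illustration, not JazzDAP's actual algorithm.

```python
def parsons_code(pitches):
    """Parsons code: 'u'(p), 'd'(own), or 'r'(epeat) per successive step."""
    steps = []
    for a, b in zip(pitches, pitches[1:]):
        steps.append("u" if b > a else "d" if b < a else "r")
    return "".join(steps)

def contour_matches(query_pitches, solo_pitches):
    """Start offsets in a solo where the query's contour occurs,
    regardless of transposition or exact interval sizes."""
    q = parsons_code(query_pitches)
    s = parsons_code(solo_pitches)
    return [i for i in range(len(s) - len(q) + 1) if s[i:i + len(q)] == q]

solo = [60, 63, 62, 62, 65, 64, 64, 67]   # toy solo line (MIDI pitches)
lick = [70, 72, 71, 71]                   # up, down, repeat
print(parsons_code(lick))       # "udr"
print(contour_matches(lick, solo))
```

Because only the direction of motion is compared, the same motif is found wherever its shape recurs, even when the pitches and intervals differ, which suits the variation-heavy melodic language of jazz.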