GaMaDHaNi: Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music

Nithya Shikarpur, Krishna Maneesha Dendukuri, Yusong Wu, Antoine Caillon, Cheng-Zhi Anna Huang

GaMaDHaNi Samples Page

Paper: [Link]

This is the project page for GaMaDHaNi: “Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music”, accepted at ISMIR 2024. It provides audio and figure examples for the instances referenced in the paper. The index on the left-hand side helps you navigate the samples; each item is listed as “<section number> <relevant section name>” from the paper.

Abstract

Hindustani music is a performance-driven oral tradition that exhibits the rendition of rich melodic patterns. In this paper, we focus on generative modeling of singers’ vocal melodies extracted from audio recordings, as the voice is musically prominent within the tradition. Prior generative work in Hindustani music models melodies as coarse discrete symbols, which fails to capture the rich expressive melodic intricacies of singing. Thus, we propose to use a finely quantized pitch contour as an intermediate representation for hierarchical audio modeling. We propose GaMaDHaNi, a modular two-level hierarchy consisting of a generative model on pitch contours and a pitch-contour-to-audio synthesis model. We compare our approach to non-hierarchical audio models and hierarchical models that use a self-supervised intermediate representation, through a listening test and qualitative analysis. We also evaluate the audio model’s ability to faithfully represent the pitch contour input using the Pearson correlation coefficient. By using pitch contours as an intermediate representation, we show that our model may be better equipped to listen and respond to musicians in a human-AI collaborative setting by highlighting two potential interaction use cases: (1) primed generation, and (2) coarse pitch conditioning.
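As a rough illustration of the Pearson-correlation check mentioned in the abstract, the sketch below compares a pitch contour given to an audio model with a pitch contour re-extracted from the generated audio. This page does not describe the paper's exact setup, so the following are assumptions made only for illustration: the function name `pitch_pearson`, contours stored as equal-length lists of floats (for example in cents), and unvoiced frames marked as NaN and skipped.

```python
import math


def pitch_pearson(input_pitch, output_pitch):
    """Pearson correlation between two equal-length pitch contours.

    Assumption (not from the paper): unvoiced frames are marked as NaN,
    and any frame that is NaN in either contour is skipped.
    """
    if len(input_pitch) != len(output_pitch):
        raise ValueError("contours must have the same number of frames")

    # Keep only frames that are voiced in both contours.
    pairs = [
        (x, y)
        for x, y in zip(input_pitch, output_pitch)
        if not (math.isnan(x) or math.isnan(y))
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two frames voiced in both contours")

    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant contour")

    return cov / math.sqrt(var_x * var_y)


# Hypothetical contours in cents; NaN marks an unvoiced frame.
conditioning = [0.0, 100.0, float("nan"), 300.0, 250.0]
resynthesized = [5.0, 98.0, 180.0, 310.0, 240.0]
print(pitch_pearson(conditioning, resynthesized))
```

A value close to 1 would suggest that the generated audio follows the shape of the input contour. Because the Pearson coefficient ignores constant offsets and scaling, it measures agreement in contour shape, not absolute pitch accuracy.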

Note: The sound of a tanpura (drone) is added to the background of the generated samples highlighted below. The use of a tanpura is common in Hindustani vocal performance, and is added here to simulate that sound.

Fig 2: The overall hierarchical generation structure of GaMaDHaNi, comprising the Pitch Generator, the Spectrogram Generator, and a vocoder. During inference, given an optional short melodic input (a ‘prime’), the Pitch Generator produces a pitch continuation and the Spectrogram Generator produces a spectrogram conditioned on the resulting pitch.

Our code will be released soon! Updates will be posted here.