Update: This paper won the ‘best special call paper award’ at ISMIR 2022.
In this study, we use audio information (pitch and voicing) together with video information (the x and y coordinates of a singer's right and left wrists) to identify the raga of 12 s clips. We show that video data, if incorporated correctly, can help correct mistakes made by a classifier that uses audio alone. We support the quantitative results with qualitative commentary.
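To make the input description concrete, here is a minimal sketch of how the six per-frame signals named above (pitch, voicing, and the x, y coordinates of both wrists) could be stacked into one array per 12 s clip. This is an illustration, not the paper's pipeline: the frame rate, the assumption that audio and video features are already resampled to a shared timeline, the function name `stack_clip_features`, and the random placeholder data are all hypothetical.

```python
import numpy as np

FRAME_RATE_HZ = 100  # assumed shared frame rate; not taken from the paper
CLIP_SECONDS = 12    # clip length stated in the text


def stack_clip_features(pitch, voicing, wrists):
    """Stack per-frame audio and video features into a (T, 6) array.

    pitch   : (T,)   pitch contour
    voicing : (T,)   voiced/unvoiced flag per frame
    wrists  : (T, 4) columns: left x, left y, right x, right y
    Hypothetical helper; assumes all inputs share one timeline.
    """
    pitch = np.asarray(pitch, dtype=float)
    voicing = np.asarray(voicing, dtype=float)
    wrists = np.asarray(wrists, dtype=float)
    if wrists.ndim != 2 or wrists.shape[1] != 4:
        raise ValueError("wrists must have shape (T, 4)")
    if not (len(pitch) == len(voicing) == len(wrists)):
        raise ValueError("all features must have the same number of frames")
    # 1-D inputs become single columns; the wrist array contributes four.
    return np.column_stack([pitch, voicing, wrists])


# Placeholder data standing in for one clip (random values, not real features).
n_frames = FRAME_RATE_HZ * CLIP_SECONDS
rng = np.random.default_rng(0)
clip = stack_clip_features(
    rng.normal(size=n_frames),
    rng.integers(0, 2, size=n_frames),
    rng.normal(size=(n_frames, 4)),
)
```

An array like `clip` could be passed to a single classifier, or its audio and video columns could be sent to separate models whose outputs are combined later; the summary above does not say which approach the paper takes.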