Remember the AI Song Contest from March? The contest, hosted by VPRO, was Eurovision with a twist: teams had to use AI for some aspect of their song entry. This expanded the focus to include not only the final song, but also teams’ collaboration process with AI behind the scenes. Thirteen teams from eight different countries competed, bringing together 61 musicians, developers, and scientists. As judges on the AI Panel, we came up with survey questions to guide teams in documenting how they incorporated AI into their songwriting process. We then carried out a qualitative analysis to better understand what musicians and developers needed when co-creating a song with AI, the fundamental challenges they faced, and the creative strategies they came up with to overcome some of these challenges.
Every song was different, both creatively and in what the team wanted to convey. Team Uncanny Valley trained DDSP on koala calls to raise awareness of forest fires in Australia, while team DataDada took a goofy and witty approach to “Rock the World”, generating the phrase by curating a list of frequently used English words and trying out which pairs sounded fun in French. Team COMPUTD / Shuman&Angel-Eye used AI to write a song about the experience of writing a song using AI, while team OVGneUrovision created an interactive system where they could jam back-and-forth with various ML models.
We found that teams faced three common challenges when co-composing with AI: the AI was not easily decomposable (not easy to tweak individual musical components), not context-aware (not fully aware of the musical context it was generating for), and not easily steerable (not easy to ask for music with a certain mood or effect). To work around these limitations, rather than using one giant end-to-end model, teams often used a combination of smaller, independent models that aligned well with the building blocks of music (e.g. melodies, chords), and then stitched those smaller pieces together by hand. Teams also often generated hundreds of samples and then manually curated them post-hoc, or used another algorithm to filter and rank the model outputs.
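To make the "generate many, then filter and rank" workflow concrete, here is a minimal sketch. It is not any team's actual pipeline: `sample_melody` is a random walk standing in for a real generative model, and `score` is a hypothetical heuristic (reward notes in C major, penalize large leaps) standing in for whatever ranking algorithm or human judgment a team might apply.

```python
import random

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C major scale


def sample_melody(rng, length=8):
    """Stand-in for a generative model: a random walk over MIDI pitches."""
    pitch = 60  # start at middle C
    melody = []
    for _ in range(length):
        # Step by up to a fifth and clamp to the range C3..C6.
        pitch = min(84, max(48, pitch + rng.randint(-7, 7)))
        melody.append(pitch)
    return melody


def score(melody):
    """Hypothetical heuristic: reward in-scale notes, penalize leaps larger than a major third."""
    in_scale = sum(p % 12 in C_MAJOR for p in melody) / len(melody)
    leaps = sum(abs(b - a) > 4 for a, b in zip(melody, melody[1:]))
    return in_scale - 0.1 * leaps


def generate_and_rank(n_samples=200, top_k=5, seed=0):
    """Generate many candidates, then keep only the top_k by score."""
    rng = random.Random(seed)
    candidates = [sample_melody(rng) for _ in range(n_samples)]
    return sorted(candidates, key=score, reverse=True)[:top_k]


shortlist = generate_and_rank()
for melody in shortlist:
    print(round(score(melody), 2), melody)
```

In practice the shortlist would still go to a human for the final pick; the ranking step only reduces how many samples someone has to listen to.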
Ultimately, teams not only had to manage the “flare and focus” aspects of the creative process, but also had to juggle them with a parallel process of exploring and curating multiple ML models and outputs. These findings reflect a need to design ML models that take musicians’ needs and workflow into core consideration.
ISMIR presentation
As ISMIR 2020 is virtual this year, we are also trying out a new format for this blog post. First up is our 4-minute ISMIR video presentation to give you a summary of our paper. Immediately below, we unroll our poster, with a few fun before-and-after musical examples to illustrate how teams used AI in their song entries.
Workaround strategy for AI not being decomposable: Team Dadabots x Portrait XO initially wanted to take an end-to-end approach by training on audio to generate their vocal tracks. However, when the generated vocals were nonsensical, there wasn’t an easy way for them to tweak the lyrics or melody in the audio. Still, they listened closely, and found a moment where they could transcribe the lyrics, and it was a lower voice singing “I’ll marry you, punk come”. They riffed along and composed a duet. In the short excerpt below, you’ll first hear the generated lower voice, and then a higher voice sung by Portrait XO joining in.
| Excerpt from final song | (audio) |
Workaround strategy for AI not being context-aware: Team Uncanny Valley first generated lyrics and melodies separately. As trying to find pairs that go well together could really slow down the creative process, they came up with a clever way to match up the generated lyrics and melody according to their stress patterns. Below, you can hear two raw examples of the matches, and then how they sound in the final song.
| | Lyrics: Beautiful the World | Lyrics: Welcome home |
|---|---|---|
| Raw synthesis of matched lyrics and melody | (audio) | (audio) |
| Excerpts from final song | (audio) | (audio) |
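As an illustration of matching lyrics to melodies by stress pattern, here is a minimal sketch. It is our own simplification, not Team Uncanny Valley's implementation. It assumes each lyric comes with a hand-annotated stress string (one character per syllable, "1" for stressed), that melodies are given as note onsets in beats, and that a note is stressed when it falls on beat 1 or 3 of a 4/4 bar. The melody names and onsets are made up for the example.

```python
def melody_stress(onsets_in_beats, beats_per_bar=4):
    """Mark a note as stressed ("1") if it starts on the downbeat or the mid-bar beat."""
    strong = {0.0, beats_per_bar / 2}
    return "".join(
        "1" if onset % beats_per_bar in strong else "0"
        for onset in onsets_in_beats
    )


def stress_match(lyric_stress, mel_stress):
    """Fraction of positions that agree; 0.0 if syllable and note counts differ."""
    if not lyric_stress or len(lyric_stress) != len(mel_stress):
        return 0.0
    agree = sum(a == b for a, b in zip(lyric_stress, mel_stress))
    return agree / len(lyric_stress)


def best_melody_for(lyric_stress, melodies):
    """Return the name of the melody whose stress pattern best fits the lyric."""
    return max(
        melodies,
        key=lambda name: stress_match(lyric_stress, melody_stress(melodies[name])),
    )


# Assumed annotations: "WEL-come HOME" -> 101, "BEAU-ti-ful the WORLD" -> 10001.
lyrics = {"Welcome home": "101", "Beautiful the World": "10001"}

# Hypothetical generated melodies, given as note onsets in beats.
melodies = {
    "melody_a": [0, 1, 2],
    "melody_b": [0, 0.5, 1, 1.5, 2],
    "melody_c": [1, 2, 3],
}

for text, stress in lyrics.items():
    print(text, "->", best_melody_for(stress, melodies))
```

A stricter or looser score (for example, allowing melismas so that note and syllable counts need not match) would be a natural extension, but the exact criteria the team used are not described here.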
Workaround strategy for AI not being steerable: Team Dadabots x Portrait XO felt repeating their chorus a second time would be too boring and wanted some variation. But there wasn’t a knob that they could turn to make the music sound “darker” or a cadence to sound more “elaborate”. To achieve these musical effects, they had to try out three different models and interfaces to rewrite and to reharmonize. Below, you’ll hear the first chorus from the final song, and then the modified ending of the second chorus, which is an extended Bach-like cadence generated using Coconet.
| | First chorus | Second chorus (with extended Bach-like cadence) |
|---|---|---|
| Excerpt from final song | (audio) | (audio) |
Acknowledgements
This blog post is based on our ISMIR 2020 paper titled AI Song Contest: Human-AI Co-Creation in Songwriting, authored by Cheng-Zhi Anna Huang, Hendrik Vincent Koops, Ed Newton-Rex, Monica Dinculescu, and Carrie J. Cai.
@inproceedings{huang2020ai,
    title={AI Song Contest: Human-AI Co-Creation in Songwriting},
    author={Huang, Cheng-Zhi Anna and Koops, Hendrik Vincent and Newton-Rex, Ed and Dinculescu, Monica and Cai, Carrie J.},
    booktitle={International Society for Music Information Retrieval (ISMIR)},
    year={2020}
}
Thank you to Karen van Dijk from VPRO, who organized the AI Song Contest and supported us throughout the process. Special thanks to teams Uncanny Valley and Sandra Uitdenbogerd, and Dadabots x Portrait XO, for allowing us to use excerpts from their AI Song Contest entries and raw material as examples. We thank all teams for their amazing contributions that made this research possible! We also thank the organizing team at ISMIR 2020 for recommending the Visme Infographic Maker, which we used to make our poster.
