Last March we launched the Bach Doodle with the goal of making music composition more approachable. Users entered their own melodies and used a machine learning model to harmonize them in the style of Bach chorales. We compiled the melodies and harmonizations that users submitted into a new open source dataset.

When we put this dataset together, I was excited to find out what was in it, and most importantly, whether any of you entered the same melody. I spent some time creating a set of interactive visualizations to dig into this, and the results were super interesting! (Spoilers: more of you know the Pirates of the Caribbean theme than the Star Wars one, but neither holds a candle to Megalovania from Undertale.)

Ode to Joy was among the top 2000 melodies most often entered by users. It was harmonized 1358 times.


In three days, the web app received more than 50 million queries for harmonizations from around the world. Users could choose to rate their compositions and contribute them to a public dataset, which we are releasing here. We hope the community finds this dataset useful for applications ranging from ethnomusicological studies to music education, games, and improving machine learning models.

When deciding what data to collect from users, one of the things I was most excited about was grouping melodies by the country they were composed in. After the Quick, Draw! dataset was released, Ian Johnson did a super interesting analysis that showed how drawing styles are very regional: what users drew for “outlet” around the world changed based on what outlets actually look like in that part of the world.

My hypothesis was that something very similar would happen for compositions: in every country people would enter the melodies they’re most familiar with – childhood melodies, popular songs, or national anthems. And this proved to be true! In Japan, people entered a cute childhood song about tulips, or the popular jingle you hear in every FamilyMart store. In Romania, they entered the most famous classic Romanian composition. In Taiwan, they entered the National Flag Anthem, and music from animated movies.

Explore these regional hits and more with our collection of visualizations for the dataset, download the dataset, or read more about it in the paper that introduced it!

How to Cite

If you use the Bach Doodle Dataset in your work, you can cite:

Cheng-Zhi Anna Huang, Curtis Hawthorne, Adam Roberts, Monica Dinculescu,
James Wexler, Leon Hong and Jacob Howcroft. "The Bach Doodle: Approachable
music composition with machine learning at scale."
International Society for Music Information Retrieval, 2019.

You can also use the following BibTeX entry:

@inproceedings{bachdoodle2019,
   author = {Cheng-Zhi Anna Huang and Curtis Hawthorne and Adam Roberts
   and Monica Dinculescu and James Wexler and Leon Hong and Jacob Howcroft},
   title = {The {B}ach {D}oodle: Approachable music composition with machine learning at scale},
   booktitle = {International Society for Music Information Retrieval (ISMIR)},
   year = {2019},
   url = {https://goo.gl/magenta/bach-doodle-paper}
}