CocoChorales is a dataset of over 1,400 hours of audio mixtures of four-part chorales performed by 13 instruments, all synthesized with realistic-sounding generative models. CocoChorales contains mixes, sources, and MIDI data, as well as annotations for note expression (e.g., per-note volume and vibrato) and synthesis parameters (e.g., multi-f0).

Dataset

We created CocoChorales using two generative models produced by Magenta: Coconet and MIDI-DDSP. The dataset was created in two stages. First, we used a trained Coconet model to generate a large set of four-part chorales in the style of J.S. Bach. The output of this first stage is a set of note sequences, stored as MIDI, to which we assign a tempo and add random timing variations to each note (for added realism).
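
To make the timing-variation step concrete, here is a minimal sketch that jitters every note onset and offset in a generated MIDI file. The use of the pretty_midi library, the file names, and the jitter scale are our own assumptions for illustration, not the pipeline's actual code.

```python
import random

import pretty_midi

JITTER_STD = 0.01  # assumed jitter scale, in seconds

midi = pretty_midi.PrettyMIDI("chorale.mid")  # placeholder Coconet output
for instrument in midi.instruments:
    for note in instrument.notes:
        # Shift each onset/offset by a small random amount, keeping
        # the note's end strictly after its start.
        note.start = max(0.0, note.start + random.gauss(0.0, JITTER_STD))
        note.end = max(note.start + 1e-3, note.end + random.gauss(0.0, JITTER_STD))
midi.write("chorale_humanized.mid")
```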

In the second stage, we use MIDI-DDSP to synthesize these MIDI files into audio, resulting in audio clips that sound like the chorales were performed by live musicians. This MIDI-DDSP model was trained on URMP. We define a set of ensembles that consist of the following instruments, in Soprano, Alto, Tenor, Bass (SATB) order:

  • String Ensemble: Violin 1, Violin 2, Viola, Cello.
  • Brass Ensemble: Trumpet, French Horn, Trombone, Tuba.
  • Woodwind Ensemble: Flute, Oboe, Clarinet, Bassoon.
  • Random Ensemble: Each SATB part is randomly assigned an instrument from one of the following pools (a sampling sketch follows this list):
    • Soprano: Violin, Flute, Trumpet, Clarinet, Oboe.
    • Alto: Violin, Viola, Flute, Clarinet, Oboe, Saxophone, Trumpet, French Horn.
    • Tenor: Viola, Cello, Clarinet, Saxophone, Trombone, French Horn.
    • Bass: Cello, Double Bass, Bassoon, Tuba.
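
Below is a minimal sketch of sampling a Random Ensemble from the pools above, assuming each part is drawn independently and uniformly (the dataset's actual sampling scheme may differ):

```python
import random

# Per-part instrument pools for the Random Ensemble, transcribed from the
# list above (names are illustrative strings, not official identifiers).
RANDOM_ENSEMBLE_POOLS = {
    "soprano": ["violin", "flute", "trumpet", "clarinet", "oboe"],
    "alto": ["violin", "viola", "flute", "clarinet", "oboe",
             "saxophone", "trumpet", "french_horn"],
    "tenor": ["viola", "cello", "clarinet", "saxophone",
              "trombone", "french_horn"],
    "bass": ["cello", "double_bass", "bassoon", "tuba"],
}

def sample_random_ensemble(rng: random.Random) -> dict:
    """Draw one instrument per SATB part."""
    return {part: rng.choice(pool) for part, pool in RANDOM_ENSEMBLE_POOLS.items()}

print(sample_random_ensemble(random.Random(0)))
```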

Each instrument in the ensemble is synthesized separately, with annotations for the high-level expression controls used for each note (e.g., vibrato, note volume, note brightness, etc.; all expressions are shown here, with more details in Sections 3.2 and B.3 of the MIDI-DDSP paper) as well as detailed low-level annotations for synthesis parameters (e.g., f0’s, amplitudes of each harmonic, etc.). Because the MIDI-DDSP model skews sharp, we randomly applied pitch augmentation to the f0’s (see Figure 2, here) to correct for this. The four audio clips, one per instrument in the ensemble, are then mixed together to produce an example in the dataset.
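
For intuition, the sketch below applies a random pitch shift to an f0 contour and returns the shift so it can be stored as an annotation. The shift range and frame length are assumptions; the cents-to-frequency-ratio conversion (a ratio of 2^(cents/1200)) is standard.

```python
import numpy as np

def augment_pitch(f0_hz: np.ndarray, rng: np.random.Generator,
                  max_cents: float = 20.0):
    """Shift an f0 contour by a random number of cents.

    Returns the shifted contour and the applied shift, mirroring the
    "amount of pitch augmentation applied" annotation described below.
    """
    cents = rng.uniform(-max_cents, max_cents)  # assumed range
    ratio = 2.0 ** (cents / 1200.0)             # cents -> frequency ratio
    return f0_hz * ratio, cents

f0 = np.full(250, 440.0)  # one second of A4 at the 250 Hz frame rate
f0_aug, shift = augment_pitch(f0, np.random.default_rng(0))
```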

Because all of the data in CocoChorales originates from generative models, all of the annotations correspond perfectly to the audio data. In total, the dataset contains 240,000 examples: 60,000 mixes from each of the four ensemble types above. Each ensemble has its own train/validation/test split. All of the audio is 16 kHz, 16-bit PCM. Each example contains (a loading sketch follows this list):

  • A mixture
  • Source audio for all four instruments
    • Gain applied to each source
  • MIDI with tempo and precise timing
  • The name of the ensemble with instrument names
  • Note expression annotations for every note:
    • Volume, Volume Fluctuation, Volume Peak Position, Vibrato, Brightness, and Attack Noise used by MIDI-DDSP to synthesize every note (see Sections 3.2 and B.3 of the MIDI-DDSP paper for more details)
  • Synthesis parameters for every source, at a frame rate of 250 Hz:
    • Fundamental frequency (f0), amplitude, amplitude of all harmonics, filtered noise parameters
    • Amount of pitch augmentation applied
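
As a rough guide to working with one example, the sketch below loads a mixture, its stems, and its MIDI, and computes the frame/sample alignment. The directory layout and file names here are placeholders; see the detailed dataset description linked below for the real structure.

```python
from pathlib import Path

import pretty_midi
import soundfile as sf

# Placeholder layout for a single example directory.
example = Path("cocochorales/string_track000001")

mix, sr = sf.read(str(example / "mix.wav"))  # 16 kHz, 16-bit PCM
stems = {p.stem: sf.read(str(p))[0]
         for p in sorted((example / "stems_audio").glob("*.wav"))}
midi = pretty_midi.PrettyMIDI(str(example / "all_src.mid"))

# Synthesis parameters are frame-based at 250 Hz, so each frame spans
# sr / 250 = 64 audio samples at 16 kHz.
samples_per_frame = sr // 250
```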

Further Details

A detailed view of the contents of the CocoChorales dataset is provided at this link.

Download

For download instructions, please see this GitHub page. The compressed version of the full dataset is 2.9 TB, and the uncompressed version is larger than 4 TB. A “tiny” version is also available for download.

MD5 Hashes for all zipped files in the download are provided here.
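
To verify a download against the published hashes, a standard-library MD5 check suffices (the file name below is a placeholder):

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file in streaming chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(md5sum("cocochorales_tiny.zip"))  # compare against the published MD5
```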

License

The CocoChorales dataset was made by Yusong Wu and is available under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license.

How to Cite

If you use CocoChorales in your work, we ask that you cite the following paper where it was introduced:

Yusong Wu, Josh Gardner, Ethan Manilow, Ian Simon, Curtis Hawthorne, and Jesse Engel.
“The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling.”
arXiv preprint, arXiv:2209.14458, 2022.

You can also use the following bibtex entry:

@article{wu2022chamber,
  title = {The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling},
  author = {Wu, Yusong and Gardner, Josh and Manilow, Ethan and Simon, Ian and Hawthorne, Curtis and Engel, Jesse},
  journal = {arXiv preprint arXiv:2209.14458},
  year = {2022},
}