The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and (synthesized) audio of human-performed, tempo-aligned expressive drumming. The dataset contains 1,150 MIDI files and over 22,000 measures of drumming.

Contents

License

Creative Commons License

The dataset is made available by Google LLC under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Dataset

To enable a wide range of experiments and encourage comparisons between methods on the same data, we created a new dataset of drum performances recorded in MIDI format. We hired professional drummers and asked them to perform in multiple styles to a click track on a Roland TD-11 electronic drum kit. We also recorded the aligned, high-quality synthesized audio from the TD-11 and include it in the release.

The Groove MIDI Dataset (GMD), has several attributes that distinguish it from existing ones:

  • The dataset contains about 13.6 hours, 1,150 MIDI files, and over 22,000 measures of drumming.
  • Each performance was played along with a metronome set at a specific tempo by the drummer.
  • The data includes performances by a total of 10 drummers, with more than 80% of duration coming from hired professionals. The professionals were able to improvise in a wide range of styles, resulting in a diverse dataset.
  • The drummers were instructed to play a mix of long sequences (several minutes of continuous playing) and short beats and fills.
  • Each performance is annotated with a genre (provided by the drummer), tempo, and anonymized drummer ID.
  • Most of the performances are in 4/4 time, with a few examples from other time signatures.
  • In addition to the MIDI recordings that are the primary source of data for the experiments in this work, we captured the synthesized audio outputs of the drum set and aligned them to within 2ms of the corresponding MIDI files.

A train/validation/test split configuration is provided for easier comparison of model accuracy on various tasks.

Split Beats Fills Measures (approx.) Hits Duration (minutes)
Train 378 519 17752 357618 648.5
Validation 48 76 2269 44044 82.2
Test 77 52 2193 43832 84.3
Total 503 647 22214 445494 815.0


For more information about how the dataset was created and several applications of it, please see the paper where it was introduced: Learning to Groove with Inverse Sequence Transformations.

For an example application of the dataset, see our blog post on GrooVAE.

MIDI Data

Format

The Roland TD-11 splits the recorded data into separate tracks: one for meta-messages (tempo, time signature, key signature), one for control changes (hi-hat pedal position), and one for notes. The control changes are set on channel 0 and the notes on channel 9 (the canonical drum channel). To simplify processing of this data, we made two adustments to the raw MIDI files before distributing:

  • We merged all messages (meta, control change, and note) to a single track.
  • We set all messages to channel 9 (10 if 1-indexed).

Drum Mapping

The Roland TD-11 used to record the performances in MIDI uses some pitch values that differ from the General MIDI (GM) Specifications. Below we show how the Roland mapping compares to GM. Please take note of these discrepancies during playback and training. The final column shows the simplified mapping we used in our paper.

Pitch Roland Mapping GM Mapping Paper Mapping Frequency
36 Kick Bass Drum 1 Bass (36) 88067
38 Snare (Head) Acoustic Snare Snare (38) 102787
40 Snare (Rim) Electric Snare Snare (38) 22262
37 Snare X-Stick Side Stick Snare (38) 9696
48 Tom 1 Hi-Mid Tom Mid Tom (48) 13145
50 Tom 1 (Rim) High Tom High Tom (50) 1561
45 Tom 2 Low Tom Low Tom (45) 3935
47 Tom 2 (Rim) Low-Mid Tom Mid Tom (48) 1322
43 Tom 3 (Head) High Floor Tom Low Tom (45) 11260
58 Tom 3 (Rim) Vibraslap Low Tom (45) 1003
46 HH Open (Bow) Open Hi-Hat Open Hi-Hat (46) 3905
26 HH Open (Edge) N/A Open Hi-Hat (46) 10243
42 HH Closed (Bow) Closed Hi-Hat Closed Hi-Hat (42) 31691
22 HH Closed (Edge) N/A Closed Hi-Hat (42) 34764
44 HH Pedal Pedal Hi-Hat Closed Hi-Hat (42) 52343
49 Crash 1 (Bow) Crash Cymbal 1 Crash Cymbal (49) 720
55 Crash 1 (Edge) Splash Cymbal Crash Cymbal (49) 5567
57 Crash 2 (Bow) Crash Cymbal 2 Crash Cymbal (49) 1832
52 Crash 2 (Edge) Chinese Cymbal Crash Cymbal (49) 1046
51 Ride (Bow) Ride Cymbal 1 Ride Cymbal (51) 43847
59 Ride (Edge) Ride Cymbal 2 Ride Cymbal (51) 2220
53 Ride (Bell) Ride Bell Ride Cymbal (51) 5567

Control Changes

The TD-11 also records control changes specifying the position of the hi-hat pedal on each hit. We have preserved this information under control 4.

Download

GMD is available as a zip file containing the MIDI and WAV files as well as metadata in CSV format.

groove-v1.0.0.zip

Size: 4.76GB
SHA256: 21559feb2f1c96ca53988fd4d7060b1f2afe1d854fb2a8dcea5ff95cf3cce7e9

A MIDI-only version of the dataset is also available.

groove-v1.0.0-midionly.zip

Size: 3.11MB
SHA256: 651cbc524ffb891be1a3e46d89dc82a1cecb09a57c748c7b45b844c4841dcc1e

The metadata file (info.csv) has the following fields for every MIDI/WAV pair:

Field Description
drummer An anonymous string ID for the drummer of the performance.
session A string ID for the recording session (unique per drummer).
id A unique string ID for the performance.
style A string style for the performance formatted as “<primary>/<secondary>”. The primary style comes from the Genre List below.
bpm An integer tempo in beats per minute for the performance.
beat_type Either “beat” or “fill”
time_signature The time signature for the performance formatted as “<numerator>-<denominator>”.
midi_filename Relative path to the MIDI file.
audio_filename Relative path to the WAV file (if present).
duration The float duration in seconds (of the MIDI).
split The predefined split the performance is a part of. One of “train”, “validation”, or “test”.

Genre List: afrobeat, afrocuban, blues, country, dance, funk, gospel, highlife, hiphop, jazz, latin, middleeastern, neworleans, pop, punk, reggae, rock, soul


TensorFlow Dataset

The model can be trivially loaded as a tf.data.Dataset using TensorFlow Datasets (TFDS).

For example, you can iterate through the dataset using just the following lines of code:

import tensorflow as tf
import tensorflow_datasets as tfds

# tfds works in both Eager and Graph modes
tf.enable_eager_execution()

# Load the full GMD with MIDI only (no audio) as a tf.data.Dataset
dataset = tfds.load(
    name="groove/full-midionly",
    split=tfds.Split.TRAIN,
    try_gcs=True)

# Build your input pipeline
dataset = dataset.shuffle(1024).batch(32).prefetch(
    tf.data.experimental.AUTOTUNE)
for features in dataset.take(1):
  # Access the features you are interested in
  midi, genre = features["midi"], features["style"]["primary"]

We have also included predefined configurations for preprocessing the data in various ways. For example, if you want to train on 2-measure examples and also want to use audio at 16KHz, you can load "groove/2bar-16000hz". The full list of available features and predefined configurations in the TFDS documentation. If you wish to use settings not reflected in an existing configuration, you can create your own GrooveConfig and pass it to the builder_config argument in tfds.load.

How to Cite

If you use the Groove MIDI Dataset in your work, please cite the paper where it was introduced:

Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck, and David Bamman.
"Learning to Groove with Inverse Sequence Transformations."
  International Conference on Machine Learning (ICML), 2019.

You can also use the following BibTeX entry:

@inproceedings{groove2019,
    Author = {Jon Gillick and Adam Roberts and Jesse Engel and Douglas Eck and David Bamman},
    Title = {Learning to Groove with Inverse Sequence Transformations},
    Booktitle = {International Conference on Machine Learning (ICML)},
    Year = {2019},
}

Acknowledgements

We’d like to thank the following primary contributors to the dataset:

Additional drumming provided by: Jon Gillick, Mikey Steczo, Sam Berman, and Sam Hancock.