The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and (synthesized) audio of human-performed, tempo-aligned expressive drumming. The dataset contains 1,150 MIDI files and over 22,000 measures of drumming.
License
The dataset is made available by Google LLC under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
Dataset
Update: If you’re looking for a dataset suitable for drum transcription or other audio-focused applications, see our Expanded Groove MIDI Dataset.
To enable a wide range of experiments and encourage comparisons between methods on the same data, we created a new dataset of drum performances recorded in MIDI format. We hired professional drummers and asked them to perform in multiple styles to a click track on a Roland TD-11 electronic drum kit. We also recorded the aligned, high-quality synthesized audio from the TD-11 and include it in the release.
The Groove MIDI Dataset (GMD) has several attributes that distinguish it from existing datasets:
- The dataset contains about 13.6 hours, 1,150 MIDI files, and over 22,000 measures of drumming.
- Each performance was played along with a metronome, at a tempo chosen by the drummer.
- The data includes performances by a total of 10 drummers, with more than 80% of duration coming from hired professionals. The professionals were able to improvise in a wide range of styles, resulting in a diverse dataset.
- The drummers were instructed to play a mix of long sequences (several minutes of continuous playing) and short beats and fills.
- Each performance is annotated with a genre (provided by the drummer), tempo, and anonymized drummer ID.
- Most of the performances are in 4/4 time, with a few examples from other time signatures.
- Four drummers were asked to record the same set of 10 beats in their own style. These are included in the test set split, labeled `eval-session/groove1-10`.
- In addition to the MIDI recordings that are the primary source of data for the experiments in this work, we captured the synthesized audio outputs of the drum set and aligned them to within 2 ms of the corresponding MIDI files.
A train/validation/test split configuration is provided for easier comparison of model accuracy on various tasks.
Split | Beats | Fills | Measures (approx.) | Hits | Duration (minutes) |
---|---|---|---|---|---|
Train | 378 | 519 | 17752 | 357618 | 648.5 |
Validation | 48 | 76 | 2269 | 44044 | 82.2 |
Test | 77 | 52 | 2193 | 43832 | 84.3 |
Total | 503 | 647 | 22214 | 445494 | 815.0 |
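These statistics can be recomputed directly from the `info.csv` metadata described under Download below. A minimal sketch (the `groove/info.csv` path assumes the archive has been unpacked to `groove/`):

```python
import csv
from collections import defaultdict

stats = defaultdict(lambda: {"beats": 0, "fills": 0, "seconds": 0.0})

# Tally beats, fills, and total duration per predefined split
with open("groove/info.csv", newline="") as f:
    for row in csv.DictReader(f):
        s = stats[row["split"]]
        s["beats" if row["beat_type"] == "beat" else "fills"] += 1
        s["seconds"] += float(row["duration"])

for split, s in stats.items():
    print(split, s["beats"], s["fills"], round(s["seconds"] / 60, 1))
```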
For more information about how the dataset was created and several applications of it, please see the
paper where it was introduced: Learning to Groove with Inverse Sequence Transformations.
For an example application of the dataset, see our blog post on GrooVAE.
MIDI Data
Format
The Roland TD-11 splits the recorded data into separate tracks: one for meta messages (tempo, time signature, key signature), one for control changes (hi-hat pedal position), and one for notes. The control changes are set on channel 0 and the notes on channel 9 (the canonical drum channel). To simplify processing, we made two adjustments to the raw MIDI files before distributing (see the sketch after this list):
- We merged all messages (meta, control change, and note) to a single track.
- We set all messages to channel 9 (10 if 1-indexed).
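Both properties can be checked quickly with the third-party mido library; a minimal sketch, with a hypothetical file path:

```python
import mido

mid = mido.MidiFile("groove/drummer1/session1/example.mid")  # hypothetical path
print("tracks:", len(mid.tracks))  # expect a single merged track

for msg in mid.tracks[0]:
    if hasattr(msg, "channel"):  # meta messages carry no channel
        assert msg.channel == 9
```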
Drum Mapping
The Roland TD-11 used to record the performances assigns some MIDI pitch values that differ from the General MIDI (GM) specification. The table below shows how the Roland mapping compares to GM; please take note of these discrepancies during playback and training. The "Paper Mapping" column shows the simplified mapping we used in our paper, and the "Frequency" column shows how many times each pitch occurs in the dataset.
Pitch | Roland Mapping | GM Mapping | Paper Mapping | Frequency |
---|---|---|---|---|
36 | Kick | Bass Drum 1 | Bass (36) | 88067 |
38 | Snare (Head) | Acoustic Snare | Snare (38) | 102787 |
40 | Snare (Rim) | Electric Snare | Snare (38) | 22262 |
37 | Snare X-Stick | Side Stick | Snare (38) | 9696 |
48 | Tom 1 | Hi-Mid Tom | High Tom (50) | 13145 |
50 | Tom 1 (Rim) | High Tom | High Tom (50) | 1561 |
45 | Tom 2 | Low Tom | Low-Mid Tom (47) | 3935 |
47 | Tom 2 (Rim) | Low-Mid Tom | Low-Mid Tom (47) | 1322 |
43 | Tom 3 (Head) | High Floor Tom | High Floor Tom (43) | 11260 |
58 | Tom 3 (Rim) | Vibraslap | High Floor Tom (43) | 1003 |
46 | HH Open (Bow) | Open Hi-Hat | Open Hi-Hat (46) | 3905 |
26 | HH Open (Edge) | N/A | Open Hi-Hat (46) | 10243 |
42 | HH Closed (Bow) | Closed Hi-Hat | Closed Hi-Hat (42) | 31691 |
22 | HH Closed (Edge) | N/A | Closed Hi-Hat (42) | 34764 |
44 | HH Pedal | Pedal Hi-Hat | Closed Hi-Hat (42) | 52343 |
49 | Crash 1 (Bow) | Crash Cymbal 1 | Crash Cymbal (49) | 720 |
55 | Crash 1 (Edge) | Splash Cymbal | Crash Cymbal (49) | 5567 |
57 | Crash 2 (Bow) | Crash Cymbal 2 | Crash Cymbal (49) | 1832 |
52 | Crash 2 (Edge) | Chinese Cymbal | Crash Cymbal (49) | 1046 |
51 | Ride (Bow) | Ride Cymbal 1 | Ride Cymbal (51) | 43847 |
59 | Ride (Edge) | Ride Cymbal 2 | Ride Cymbal (51) | 2220 |
53 | Ride (Bell) | Ride Bell | Ride Cymbal (51) | 5567 |
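For preprocessing, the last column of this table is easy to encode as a lookup table. A sketch of the simplified paper mapping as a Python dict (the helper name is ours):

```python
# Raw TD-11 pitch -> simplified pitch used in the paper (see table above)
PAPER_MAPPING = {
    36: 36,                          # Kick -> Bass (36)
    38: 38, 40: 38, 37: 38,          # Snare head/rim/x-stick -> Snare (38)
    48: 50, 50: 50,                  # Tom 1 -> High Tom (50)
    45: 47, 47: 47,                  # Tom 2 -> Low-Mid Tom (47)
    43: 43, 58: 43,                  # Tom 3 -> High Floor Tom (43)
    46: 46, 26: 46,                  # Open hi-hat bow/edge -> Open Hi-Hat (46)
    42: 42, 22: 42, 44: 42,          # Closed hi-hat bow/edge/pedal -> Closed Hi-Hat (42)
    49: 49, 55: 49, 57: 49, 52: 49,  # Crashes -> Crash Cymbal (49)
    51: 51, 59: 51, 53: 51,          # Ride bow/edge/bell -> Ride Cymbal (51)
}

def simplify_pitch(pitch: int) -> int:
    """Collapse a raw TD-11 pitch into the paper's simplified vocabulary."""
    return PAPER_MAPPING[pitch]
```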
Control Changes
The TD-11 also records control changes specifying the position of the hi-hat pedal on each hit. We have preserved this information under control 4.
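A minimal sketch, again using mido and a hypothetical path, that collects these pedal positions:

```python
import mido

pedal_positions = []
# Iterating a MidiFile yields its messages in playback order
for msg in mido.MidiFile("groove/drummer1/session1/example.mid"):
    if msg.type == "control_change" and msg.control == 4:
        pedal_positions.append(msg.value)  # pedal position, 0-127
```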
Download
GMD is available as a zip file containing the MIDI and WAV files as well as metadata in CSV format.
groove-v1.0.0.zip
Size: 4.76GB
SHA256: 21559feb2f1c96ca53988fd4d7060b1f2afe1d854fb2a8dcea5ff95cf3cce7e9
A MIDI-only version of the dataset is also available.
groove-v1.0.0-midionly.zip
Size: 3.11MB
SHA256: 651cbc524ffb891be1a3e46d89dc82a1cecb09a57c748c7b45b844c4841dcc1e
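Either archive can be verified against its SHA256 after downloading; a small sketch in Python:

```python
import hashlib

def sha256sum(path, chunk_size=1 << 20):
    """Hash a file in chunks to avoid loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "21559feb2f1c96ca53988fd4d7060b1f2afe1d854fb2a8dcea5ff95cf3cce7e9"
assert sha256sum("groove-v1.0.0.zip") == expected
```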
The metadata file (`info.csv`) has the following fields for every MIDI/WAV pair:
Field | Description |
---|---|
drummer | An anonymous string ID for the drummer of the performance. |
session | A string ID for the recording session (unique per drummer). |
id | A unique string ID for the performance. |
style | A string style for the performance formatted as “<primary>/<secondary>”. The primary style comes from the Genre List below. |
bpm | An integer tempo in beats per minute for the performance. |
beat_type | Either “beat” or “fill”. |
time_signature | The time signature for the performance formatted as “<numerator>-<denominator>”. |
midi_filename | Relative path to the MIDI file. |
audio_filename | Relative path to the WAV file (if present). |
duration | The float duration in seconds (of the MIDI). |
split | The predefined split the performance is a part of. One of “train”, “validation”, or “test”. |
Genre List: afrobeat, afrocuban, blues, country, dance, funk, gospel, highlife, hiphop, jazz, latin, middleeastern, neworleans, pop, punk, reggae, rock, soul
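A minimal sketch, assuming pandas and an archive unpacked to `groove/`, that tallies performances by split and primary genre:

```python
import pandas as pd

info = pd.read_csv("groove/info.csv")
# "style" is formatted "<primary>/<secondary>"; keep the primary genre
info["primary_style"] = info["style"].str.split("/").str[0]
print(info.groupby(["split", "primary_style"]).size())
```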
TensorFlow Dataset
The dataset can be trivially loaded as a `tf.data.Dataset` using TensorFlow Datasets (TFDS). For example, you can iterate through it using just the following lines of code:
```python
import tensorflow as tf
import tensorflow_datasets as tfds

# tfds works in both Eager and Graph modes
tf.enable_eager_execution()

# Load the full GMD with MIDI only (no audio) as a tf.data.Dataset
dataset = tfds.load(
    name="groove/full-midionly",
    split=tfds.Split.TRAIN,
    try_gcs=True)

# Build your input pipeline
dataset = dataset.shuffle(1024).batch(32).prefetch(
    tf.data.experimental.AUTOTUNE)

for features in dataset.take(1):
  # Access the features you are interested in
  midi, genre = features["midi"], features["style"]["primary"]
```
We have also included predefined configurations for preprocessing the data in various ways. For example, if you want to train on 2-measure examples and also want to use audio at 16 kHz, you can load `"groove/2bar-16000hz"`. The full list of available features and predefined configurations is in the TFDS documentation. If you wish to use settings not reflected in an existing configuration, you can create your own `GrooveConfig` and pass it to the `builder_config` argument of `tfds.load`.
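For instance, continuing from the setup above, an audio-enabled configuration loads the same way (the `"audio"` feature key is our assumption about the audio configs):

```python
import tensorflow_datasets as tfds

# 2-measure examples with 16 kHz synthesized audio
dataset = tfds.load("groove/2bar-16000hz", split="train", try_gcs=True)

for features in dataset.take(1):
  audio = features["audio"]  # assumed key for the 16 kHz waveform
  midi = features["midi"]
```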
How to Cite
If you use the Groove MIDI Dataset in your work, please cite the paper where it was introduced:
Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck, and David Bamman.
"Learning to Groove with Inverse Sequence Transformations."
International Conference on Machine Learning (ICML), 2019.
You can also use the following BibTeX entry:
```bibtex
@inproceedings{groove2019,
    Author = {Jon Gillick and Adam Roberts and Jesse Engel and Douglas Eck and David Bamman},
    Title = {Learning to Groove with Inverse Sequence Transformations},
    Booktitle = {International Conference on Machine Learning (ICML)},
    Year = {2019},
}
```
Acknowledgements
We’d like to thank the following primary contributors to the dataset:
- Dillon Vado (of Never Weather)
- Jonathan Fishman (of Phish)
- Michaelle Goerlitz (of Wild Mango)
- Nick Woodbury (of SF Contemporary Music Players)
- Randy Schwartz (of El Duo)
Additional drumming provided by: Jon Gillick, Mikey Steczo, Sam Berman, and Sam Hancock.