Editorial Note: One of the most rewarding experiences when putting out something for the world to use is to see someone build upon it. This is why we were very excited to see that a year after we open-sourced the code and model checkpoints for an arbitrary image stylization network architecture, Reiichiro Nakano had ported the model to TensorFlow.js. We reached out to Rei after noticing his demo online and he graciously agreed to contribute his code and model checkpoint to Magenta.js as the seed of our new image library @magenta/image. In this post, he shares his experience porting a deep learning model to TensorFlow.js, as well as optimizing it for a fast browser experience.
Shortly after deeplearn.js was released in 2017, I used it to port one of my favorite deep learning algorithms, neural style transfer, to the browser. One year later, deeplearn.js has evolved into TensorFlow.js, libraries for easy browser-based style transfer have been released, and my original demo no longer builds. So I started looking for a new project.
One of the main points of feedback I received from the community was that people wanted to provide their own style images to be used for stylization. Most style transfer models in the browser, including mine, are based on Johnson et al. 2016, which requires training a separate neural network for each style image. This means that in order to create pastiches of their own artwork, artists would have to train a separate model and port it to the browser, a process that requires a powerful GPU, several hours of training, and non-trivial technical know-how. A more desirable solution would be to use a model that can already perform fast style transfer on any pair of content and style images, and port that to the browser. Read full post.
Generating long pieces of music is a challenging problem, as music contains structure at multiple timescales, from millisecond timings to motifs to phrases to repetition of entire sections. We present Music Transformer, an attention-based neural network that can generate music with improved long-term coherence. Here are three piano performances generated by the model:
I’m a musician and a creative technologist with Google’s Pie Shop, an experience design studio tasked with translating the complex concepts behind emerging technologies at Google into tangible exhibits. For the last year or so I’ve been thinking about and designing tools that help musicians make use of Magenta’s musical models.
The project began as a browser-based tool, but this summer the Pie Shop team and I also turned it into an interactive installation in the form of a latent space of melodies that you can walk on.
As a musician – someone who spent a lot of time studying and attempting to master music theory – I was initially very skeptical about applying machine learning to music. However, as a technologist and composer who uses computers as part of my music making, I saw pretty quickly how artistically interesting the idea of a musical palette could be. Read full post.
MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) is a dataset composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. This new dataset enables us to train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave.
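The "six orders of magnitude" figure follows from the two endpoints quoted above once ~0.1 ms is written in seconds. A quick check, using only those two approximate values:

```latex
\[
\log_{10}\!\left(\frac{100\ \text{s}}{0.1\ \text{ms}}\right)
  = \log_{10}\!\left(\frac{10^{2}\ \text{s}}{10^{-4}\ \text{s}}\right)
  = \log_{10} 10^{6}
  = 6
\]
```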
Here’s an excerpt of music composed by a Music Transformer model by Huang et al. trained on MIDI data transcribed from the piano audio in the dataset and then synthesized using a WaveNet model also trained using MAESTRO.
We are making MAESTRO available under a Creative Commons Attribution Non-Commercial Share-Alike license. More information and download links are on the MAESTRO dataset webpage.
Full details about the dataset and our Wave2Midi2Wave process are available in our paper: Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset. Read full post.
We introduce Piano Genie, an intelligent controller that maps 8-button input to a full 88-key piano in real time:
Piano Genie is in some ways reminiscent of video games such as Rock Band and Guitar Hero that are accessible to novice musicians, with the crucial difference that users can freely improvise on Piano Genie rather than re-enacting songs from a fixed repertoire. You can try it out yourself via our interactive web demo! Read full post.
Inspired by Steve Reich’s Music for 18 Musicians, I used machine learning to create a visual to go along with it:
It uses videos recorded from train windows, with landscapes that move from right to left, to train a machine learning (ML) algorithm. First, it learns how to predict the next frame of the videos by analyzing examples. Then it produces a frame from a first picture, then another frame from the one just generated, and so on. The output becomes the input of the next calculation step. So, except for the initial image that I chose, all the other frames were generated by the algorithm. In other words, the process is a feedback loop made of an artificial neural network. Read full post.
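The feedback loop described above can be sketched in a few lines of Python. This is only an illustration of feeding each output back in as the next input: `generate_frames` and `shift_left` are hypothetical names, and `shift_left` is a toy stand-in for the trained next-frame predictor, not the author's network.

```python
def generate_frames(predict_next, first_frame, num_frames):
    """Build a sequence by repeatedly feeding the last frame back into the model."""
    frames = [first_frame]  # only the first frame is chosen by hand
    for _ in range(num_frames - 1):
        # The previous output becomes the input of the next step.
        frames.append(predict_next(frames[-1]))
    return frames


def shift_left(frame):
    """Toy stand-in for a trained model: scroll every pixel row one step left."""
    return [row[1:] + row[:1] for row in frame]


# A tiny 1x3 "image"; a real frame would be a full grid of pixel values.
frames = generate_frames(shift_left, [[1, 2, 3]], num_frames=4)
```

With a real model, `predict_next` would wrap the network's forward pass. The loop itself stays the same.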
Many of the generative models in Magenta.js require music to be input as a symbolic representation like MIDI. But what if you only have audio?
Try out the demo app Piano Scribe shown below to see the library in action for yourself. If you don’t have recordings of a piano handy, you can try singing to it, and it will do its best!
Learn how to use the library in your own app in the documentation and share what you make using #madewithmagenta! Read full post.
Previously, we introduced MusicVAE, a hierarchical variational autoencoder over musical sequences. In this post, we demonstrate the use of MusicVAE to model a particular type of sequence: individual measures of General MIDI music with optional underlying chords.
General MIDI is a symbolic music representation that uses a standard set of 128 instrument sounds; this restriction to predefined instruments like “Honky-Tonk Piano” and “SynthStrings 1” often results in a cheesy sound reminiscent of old video game music. We use General MIDI here as a basic representation to explore polyphonic music generation with multiple instruments, not because we expect it to make a comeback.
With that out of the way, here is a CodePen that demonstrates a few of the things you can do with such a model: Read full post.
I’m one of those people who always loved music but never became a musician, and was left feeling vaguely wistful about what could have been. That is until a couple of years ago, when something connected and I found a way to make a lot more room for music in my life while not straying too far from the path I was already on professionally.
The key realization was that even though I was not a musician, I could take my existing skills and interests in software development and design and use them as a lens to point toward music. This illuminated the direction I’ve been heading in ever since: Exploring intersections between music, software, design, and AI - and having a blast doing it. Read full post.
Here is a simple demo we made with it that plays an endless stream of MusicVAE samples: Read full post.
When a painter creates a work of art, she first blends and explores color options on an artist’s palette before applying them to the canvas. This process is a creative act in its own right and has a profound effect on the final work.
Musicians and composers have mostly lacked a similar device for exploring and mixing musical ideas, but we are hoping to change that. Below we introduce MusicVAE, a machine learning model that lets us create palettes for blending and exploring musical scores.
As an example, listen to this gradual blending of 2 different melodies, A and B. We’ll explain how this morph was achieved throughout the post.
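A common way to blend two sequences with an autoencoder is to encode each one to a latent vector and then step between the two vectors. The sketch below assumes plain linear interpolation and made-up 3-dimensional latent codes; `interpolate`, `z_a`, and `z_b` are hypothetical names, and the excerpt above does not say which interpolation scheme MusicVAE uses.

```python
def interpolate(z_a, z_b, num_steps):
    """Return num_steps latent vectors evenly spaced from z_a to z_b, inclusive."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    path = []
    for i in range(num_steps):
        t = i / (num_steps - 1)  # 0.0 at melody A, 1.0 at melody B
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Hypothetical latent codes standing in for encoded melodies A and B.
z_a = [0.0, 1.0, -2.0]
z_b = [4.0, 1.0, 2.0]
path = interpolate(z_a, z_b, num_steps=5)
```

Each vector in `path` would then be passed through the decoder to produce one intermediate melody.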
Part of the goal of Magenta is to close the loop between artistic creativity and machine learning. Earlier this year, we released NSynth (Neural Audio Synthesis), a new approach to audio synthesis using neural networks. To make the algorithm more accessible to musicians, we created playable interfaces such as the Sound Maker and the Ableton Live plugin. We’ve been delighted to see the creative uses of the algorithm, from industrial dubstep to scenic atmospherics.
As an experiment in making machine learning even more tactile, immediate, playable, and fun, we’ve collaborated with Creative Lab to create NSynth Super: an open source hardware version of the instrument. Accessibility and community are key to our mission, and this hardware release is no different. On GitHub, you’ll find instructions and a list of materials and tools you’ll need to make your own NSynth Super. We’re excited to hear the new sounds and music you create with it. Learn more at g.co/nsynthsuper. Read full post.
Onsets and Frames is our new model for automatic polyphonic piano music transcription. Using this model, we can convert raw recordings of solo piano performances into MIDI.
For example, have you ever made a recording of yourself improvising at the piano and later wanted to know exactly what you played? This model can automatically transcribe that piano recording into a MIDI pianoroll that could be used to play the same music on a synthesizer or as a starting point for sheet music. Automatic transcription opens up many new possibilities for analyzing music that isn’t readily available in notated form and for creating much larger training datasets for generative models.
We’re able to achieve a new state of the art by using CNNs and LSTMs to predict pitch onset events and then using those predictions to condition framewise pitch predictions.
Editorial Note: We’re excited to feature a guest blog post by another member of our extended community, Hanoi Hantrakul, whose team recently won the Outside Lands Hackathon by building an interactive application based on NSynth. Read full post.
We present Performance RNN, an LSTM-based recurrent neural network designed to model polyphonic music with expressive timing and dynamics. Here’s an example generated by the model:
Update (01/03/19): Try out the new magic-sketchpad game!
Try the sketch-rnn demo.
For mobile users on a cellular data connection: The size of this first demo is around 5 MB of data. Every time you change the model in the demo, you will use another 5 MB of data.
We made an interactive web experiment that lets you draw together with a recurrent neural network model called sketch-rnn. Read full post.
Editorial Note: One of the best parts of working on the Magenta project is getting to interact with the awesome community of artists and coders. Today, we’re very happy to have a guest blog post by one of those community members, Parag Mital, who has implemented a fast sampler for NSynth to make it easier for everyone to generate their own sounds with the model. Read full post.
I review (with animations!) backprop and truncated backprop through time (TBPTT), and introduce a multi-scale adaptation of TBPTT to hierarchical recurrent neural networks that has logarithmic space complexity. I wished to use this to study long-term dependencies, but the implementation got too complicated and kind of collapsed under its own weight. Finally, I lay out some reasons why long-term dependencies are difficult to deal with, going above and beyond the well-studied sort of gradient vanishing that is due to system dynamics.
Last summer at Magenta, I took on a somewhat ambitious project. Whereas most of Magenta was working on the symbolic level (scores, MIDI, pianorolls), I felt that this left out several important aspects of music, such as timbre and phrasing. Instead, I decided to work on a generative model of real audio. Read full post.
Sketch-RNN, a generative model for vector drawings, is now available in Magenta. For an overview of the model, see the Google Research blog from April 2017, Teaching Machines to Draw (David Ha). For the technical machine learning details, see the arXiv paper A Neural Representation of Sketch Drawings (David Ha and Douglas Eck).
Vector drawings of flamingos from our Jupyter notebook.
In a previous post, we described the details of NSynth (Neural Audio Synthesis), a new approach to audio synthesis using neural networks. We hinted at further releases to enable you to make your own music with these technologies. Today, we’re excited to follow through on that promise by releasing a playable set of neural synthesizer instruments:
- An interactive AI Experiment made in collaboration with Google Creative Lab that lets you interpolate between pairs of instruments to create new sounds.
- A MaxForLive Device that integrates into both Max MSP and Ableton Live. It allows you to explore the space of NSynth sounds through an intuitive grid interface.
One of the goals of Magenta is to use machine learning to develop new avenues of human expression. And so today we are proud to announce NSynth (Neural Synthesizer), a novel approach to music synthesis designed to aid the creative process.
Unlike a traditional synthesizer which generates audio from hand-designed components like oscillators and wavetables, NSynth uses deep neural networks to generate sounds at the level of individual samples. Learning directly from data, NSynth provides artists with intuitive control over timbre and dynamics and the ability to explore new sounds that would be difficult or impossible to produce with a hand-tuned synthesizer.
The acoustic qualities of the learned instrument depend on both the model used and the available training data, so we are delighted to release improvements to both:
- A dataset of musical notes an order of magnitude larger than other publicly available corpora.
- A novel WaveNet-style autoencoder model that learns codes that meaningfully represent the space of instrument sounds.
A full description of the dataset and the algorithm can be found in our arXiv paper. Read full post.
Magenta was first announced to the public nearly one year ago at Moogfest, a yearly music festival in Durham, NC that brings together artists, futurist thinkers, inventors, entrepreneurs, designers, engineers, scientists, and musicians to explore emerging sound technologies.
This year we will be returning to continue the conversation, share what we’ve built in the last year, and help you make music with Magenta. Read full post.
Google Creative Lab just released A.I. Duet, an interactive experiment which lets you play a music duet with the computer. You no longer need code or special equipment to play along with a Magenta music generation model. Just point your browser at A.I. Duet and use your laptop keyboard or a MIDI keyboard to make some music. You can learn more by reading Alex Chen’s Google Blog post. A.I. Duet is a really fun way to interact with a Magenta music model. As A.I. Duet is open source, it can also grow into a powerful tool for machine learning research. I learned a lot by experimenting with the underlying code.
Read full post.
We are excited to announce our new RL Tuner algorithm, a method for enhancing the performance of an LSTM trained on data using Reinforcement Learning (RL). We create an RL reward function that teaches the model to follow certain rules, while still allowing it to retain information learned from data. We use RL Tuner to teach concepts of music theory to an LSTM trained to generate melodies. The two videos below show samples from the original LSTM model, and the same model enhanced using RL Tuner.
Read full post.
Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur have extended image style transfer by creating a single network which performs more than one stylization of an image. The paper has also been summarized in a Google Research Blog post. The source code and trained models behind the paper are being released here.
The model creates a succinct description of a style. These descriptions can be combined to create new mixtures of styles. Below is a picture of Picabo stylized with a mixture of 3 different styles. Adjust the sliders below the image to create more styles.
(or Learning Music Learned From Music)
A few days ago, DeepMind posted audio synthesis results that included .wav files generated from a training data set of hours of solo piano music. Each wave file (near the bottom of their post) is 10 seconds long, and sounds very much like piano music. I took a closer look at these samples. Read full post.
The Magenta team is happy to announce our first step toward providing an easy-to-use interface between musicians and TensorFlow. This release makes it possible to connect a TensorFlow model to a MIDI controller and synthesizer in real time.
Don’t have your own MIDI keyboard? There are many free software components you can download and use with our interface. Find out more details on setting up your own TensorFlow-powered MIDI rig in the README. Read full post.
One of the difficult problems in using machine learning to generate sequences, such as melodies, is creating long-term structure. Long-term structure comes very naturally to people, but it’s very hard for machines. Basic machine learning systems can generate a short melody that stays in key, but they have trouble generating a longer melody that follows a chord progression, or follows a multi-bar song structure of verses and choruses. Likewise, they can produce a screenplay with grammatically correct sentences, but not one with a compelling plot line. Without long-term structure, the content produced by recurrent neural networks (RNNs) often seems wandering and random.
But what if these RNN models could recognize and reproduce longer-term structure? Read full post.
This past June, Magenta, in partnership with the Artists and Machine Intelligence group, hosted the Music, Art and Machine Intelligence (MAMI) Conference in San Francisco. MAMI brought together artists and researchers to share their work and explore new ideas in the burgeoning space intersecting art and machine learning. Read full post.
Magenta’s primary goal is to push the envelope in research on music and art generation. Another goal of ours is to teach others about that research. This includes disseminating important works in the field in one place, a resource that, if curated, will be valuable to the community for years to come. Read full post.
We are excited to release our first tutorial model, a recurrent neural network that generates music. It serves as an end-to-end primer on how to build a recurrent network in TensorFlow. It also demonstrates a sampling of what’s to come in Magenta. In addition, we are releasing code that converts MIDI files to a format that TensorFlow can understand, making it easy to create training datasets from any collection of MIDI files. Read full post.
We’re happy to announce Magenta, a project from the Google Brain team that asks: Can we use machine learning to create compelling art and music? If so, how? If not, why not? We’ll use TensorFlow, and we’ll release our models and tools in open source on our GitHub. We’ll also post demos, tutorial blog postings and technical papers. Soon we’ll begin accepting code contributions from the community at large. If you’d like to keep up on Magenta as it grows, you can follow us on our blog, watch our GitHub repo, and join our discussion group. Read full post.