Lyria RealTime API
Lyria team
For the last few years, we have continued to explore how different ways of interacting with generative AI technologies for music can lead to new creative possibilities. A primary focus has been on what we refer to as “live music models”, which can be controlled by a user in real-time.
Lyria RealTime is Google DeepMind’s latest model developed for this purpose, and we are excited to share an experimental API that anyone can use to explore the technology, create some jams, develop an app, or build their own musical instruments. You can try a demo app now in Google AI Studio, fork it to build your own, or have a look at the API documentation.
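To give a feel for the API, here is a minimal sketch of opening a session and streaming audio with the google-genai Python SDK. The model name, method names, and config fields follow the experimental documentation at the time of writing and may change, so treat the API documentation as the source of truth.

```python
# Minimal sketch: stream music from Lyria RealTime with the google-genai Python SDK.
# Method and field names follow the experimental API docs at time of writing and may
# evolve; consult the API documentation for the current shape.
import asyncio

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY", http_options={"api_version": "v1alpha"})

async def main():
    async with client.aio.live.music.connect(model="models/lyria-realtime-exp") as session:
        # Steer the model with one or more weighted text prompts.
        await session.set_weighted_prompts(
            prompts=[types.WeightedPrompt(text="minimal techno", weight=1.0)]
        )
        # Optional manual overrides, such as tempo and sampling temperature.
        await session.set_music_generation_config(
            config=types.LiveMusicGenerationConfig(bpm=120, temperature=1.0)
        )
        await session.play()  # start the continuous 48kHz stereo stream

        async for message in session.receive():
            # Each message carries a chunk of raw PCM audio for your playback buffer.
            chunk = message.server_content.audio_chunks[0].data
            ...  # hand `chunk` to an audio output of your choice

asyncio.run(main())
```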
Here are a few interfaces we have open sourced in Google AI Studio for inspiration that you can easily fork and make your own:
PromptDJ
Our most fully-featured demo allows you to add prompts and use sliders to control their relative impact on the music. Advanced Settings let you try out manual overrides for different musical aspects like note density, tempo, and key.
Try it now!
PromptDJ MIDI
With PromptDJ MIDI, you can use a virtual MIDI controller to mix together text descriptors (that you can edit) and produce a single stream of music. You can even map the knobs to a physical MIDI controller via WebMIDI, as Toro y Moi did during the I/O preshow.
Try it now!
PromptDJ Pad
PromptDJ Pad harkens back to our earlier experiments with latent space interfaces NSynth Super and MusicVAE Beat Blender, allowing you to easily explore the space between four editable prompts.
Try it now!
Capabilities
With Lyria RealTime, it is possible to traverse the space of multi-instrumental audio: explore the never-before-heard music between genres, unusual instrument combinations, or abstract concepts.
The core capabilities of the model and API are:
- Generates a continuous stream of 48kHz stereo music.
- Low latency – maximum of 2 seconds between control change and effect.
- Latent space steering based on a mixture of text descriptors.
- Manual control over music features (see the config sketch after this list):
  - Tempo and key.
  - Options to reduce or silence particular instrument groups (drums, bass, other).
  - Control for density of note onsets.
  - Control for spectral brightness.
  - Sampling temperature and top-k settings (“chaos” control).
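These manual controls correspond to fields on the generation config in the google-genai SDK. The sketch below assumes the field names from the experimental documentation at the time of writing; check the API documentation for the current schema and value ranges.

```python
# Sketch of the manual overrides listed above, assuming the field names used by the
# experimental google-genai SDK (check the API documentation for the current schema).
from google.genai import types

config = types.LiveMusicGenerationConfig(
    bpm=128,            # tempo; key/scale can also be set via the config (see docs)
    density=0.7,        # density of note onsets, from sparse to busy
    brightness=0.6,     # spectral brightness, from dark to bright
    mute_drums=False,   # silence the drum group
    mute_bass=False,    # silence the bass group
    temperature=1.2,    # sampling temperature ("chaos" control)
    top_k=40,           # top-k sampling
)

# Applied on a live session (see the connection sketch above):
#   await session.set_music_generation_config(config=config)
```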
Interfaces for Live Music Models
One of the things we are most excited about with live music models is the number of novel interfaces they make possible by mapping human actions to musical controls. This harkens back to our earlier work with Magenta.js and the large number of applications it and other earlier Magenta technologies spawned. We hope the Lyria RealTime API will empower even more creativity by developers.
Live music models introduce a different interaction paradigm than text-to-song generators, which have impressive capabilities but lack the instantaneous feedback loops available to players of traditional instruments. The goal of models like Lyria RealTime is to put the human more deeply in the loop, centering the experience on the joy of the process over the final product. The higher bandwidth channel of communication and control often results in outputs that are more unique and personal, as every action the player takes (or doesn’t) has an effect.
In Lyria RealTime, the ability to adjust prompt mixtures and quickly hear the results lets players efficiently explore the sonic landscape to find novel textures and loops. Real-time interactivity also opens the possibility of this latent exploration becoming its own kind of musical performance: interpolating through the space while anchoring on the audio context produces a structure similar to a DJ set or an improvisation session. Beyond performance, it can also provide interactive soundscapes for physical spaces like art installations or virtual spaces like video games.
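As a concrete example, here is a small sketch of one such exploration: crossfading the prompt weights from one text descriptor to another over a few seconds, using the same session object as in the connection sketch above. The prompts, step count, and timing are illustrative, and very small weights may need to be dropped rather than sent, depending on what the API accepts.

```python
# Sketch: crossfade between two text prompts by repeatedly re-sending their weights.
# Assumes an open `session` as in the earlier connection sketch; values are illustrative.
import asyncio

from google.genai import types

async def crossfade(session, start="ambient piano", end="drum and bass",
                    steps=20, seconds=10.0):
    for i in range(steps + 1):
        mix = i / steps  # 0.0 -> pure start prompt, 1.0 -> pure end prompt
        await session.set_weighted_prompts(
            prompts=[
                types.WeightedPrompt(text=start, weight=1.0 - mix),
                types.WeightedPrompt(text=end, weight=mix),
            ]
        )
        await asyncio.sleep(seconds / steps)
```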
Our first public experiment with Lyria RealTime was MusicFX DJ, which we developed last year in collaboration with Google Labs. MusicFX DJ allows you to create and conduct a continuous flow of music, and we worked with producers and artists to make the tool more inspiring and useful to musicians and amateurs alike.
At this year’s I/O, Toro y Moi (Chaz Bear) took Lyria RealTime for a spin on stage before the keynote, using a different interface that he operated via a physical MIDI controller. Chaz’s performance leaned deeply into the live nature of the model, improvising with it to lead the crowd on a sonic journey full of surprises for himself and the audience.
How it Works
Live generative music is particularly difficult because it requires real-time generation (i.e. a real-time factor greater than 1, generating 2 seconds of audio in less than 2 seconds), causal streaming (i.e. online generation), and low-latency controllability.
Lyria RealTime overcomes these challenges by adapting the MusicLM architecture to perform block autoregression. The model generates a continuous stream of music in sequential chunks, each conditioned on the previous audio output and a style embedding for the next chunk. By manipulating the style embedding (a weighted average of text or audio prompt embeddings), players can shape and morph the music in real time, mixing together different styles, instruments, and musical attributes.
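To make the mixing step concrete, here is an illustrative sketch of how a style vector can be formed as a weighted average of prompt embeddings. The embedding inputs and dimensionality here are stand-ins, not the model’s actual components.

```python
# Illustrative sketch of the style-embedding mix described above: the per-chunk style
# vector is a weighted average of the players' prompt embeddings.
import numpy as np

def mix_style(prompt_embeddings: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Weighted average of prompt embeddings, normalized by the total weight."""
    w = np.asarray(weights, dtype=np.float32)
    e = np.stack(prompt_embeddings)        # shape: (num_prompts, embed_dim)
    return (w[:, None] * e).sum(axis=0) / w.sum()

# Each generated chunk is conditioned on the previous audio plus this style vector,
# so moving the weights between chunks morphs the music smoothly.
```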
Future Work
We are currently working on the next generation of real-time models with higher quality, lower latency, more interactivity, and on-device operability, to create truly playable instruments and live accompaniment. Stay tuned as we continue working with communities of musicians and developers on these technologies.