
Google researchers have created an artificial intelligence (AI) music generator that turns text prompts into minutes-long musical compositions, according to a research paper published by the company.
MusicLM, as the model is called, treats conditional music generation as a hierarchical sequence-to-sequence modeling task.
It generates music at 24 kHz that remains consistent over several minutes. MusicLM can also transform a whistled or hummed melody into other instruments.
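To put the stated 24 kHz output rate in perspective, a quick back-of-the-envelope calculation shows how many raw audio samples a minutes-long piece implies. This is pure arithmetic from the sample rate reported in the paper, not anything about MusicLM's internals:

```python
SAMPLE_RATE_HZ = 24_000  # MusicLM outputs audio at 24 kHz, per the paper

def num_samples(duration_seconds: float) -> int:
    """Number of raw audio samples in a clip of the given length."""
    return int(SAMPLE_RATE_HZ * duration_seconds)

# A five-minute piece at 24 kHz works out to 7.2 million samples:
print(num_samples(5 * 60))  # 7200000
```

This is why generating coherent audio directly at the sample level over several minutes is considered hard: the model has to stay consistent across millions of output steps.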
Google released a number of audio snippets created with MusicLM, including shorter tracks generated from “rich captions” and several “story mode” clips.
The following is an example of a “rich caption” text prompt used by Google to construct the music for an arcade game.
“The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff,” the prompt reads.
“The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.”
The “story mode” clip examples were created using various shorter text prompts, including:
- Time to meditate (0:00-0:15), time to wake up (0:15-0:30), time to run (0:30-0:45), time to give 100% (0:45-0:60)
- Electronic song played in a videogame (0:00-0:15), meditation song played next to a river (0:15-0:30), fire (0:30-0:45), fireworks (0:45-0:60)
Google’s MusicLM can also produce music clips inspired by paintings — by way of written descriptions of the artworks — including Salvador Dali’s iconic “The Persistence of Memory”.
MusicLM is also capable of simulating human vocals. While it appears to capture the tone and overall timbre of voices, they don’t sound completely authentic.
