I had trouble connecting the music to the transitions and morphs. I perceived it just as music overlaid on fun-to-see GAN-generated images, though clearly that’s not the aim. The music is beautiful and the imagery is intriguing, however.
The glasses always seem to be present when the bass/808s are hitting, so is there something that maps the sound to the images?
What is it about the algorithms that makes the images 'dance' so quickly between the 3.5 beat and the 1? Is it because there are static risers that move so quickly through the wave spectrum?
Wait... is light skin mapped to when highs dominate and dark skin to lows?
I'm glad you like it! Actually, compared to the linked post, I don't do any manual latent-space representation selection; it's just a bit of "smart" signal processing. I've written a framework that makes it really simple to do these visualizations (not open-source yet). Here's one more example: https://www.youtube.com/watch?v=X4r4njUjE2M
It could just be my brain, but it seems like there is a loose correlation between the mouths in the video and the lyrics.
In Phantom Part II they mostly have their mouths closed. In La La Land it varies but the mouths are mostly open. If you focus on the mouth you'll get little mental radar blips where the mouth could be tracking what is being said.
You could follow me on twitter @tsmcalister. I'll post there once it's released. Depends a lot on how much time I have to work on it. Hopefully by the end of the month!
For some reason this made me yearn for a GAN that generates motorcycles riding through landscapes. Love the storytelling potential of your work, great stuff!
Yeah, when I read the title I was hoping the GAN visualizations would reveal the underlying structure of the music somehow, or that the images would have a more compelling link to the music. EDIT: just realized it might be much more interesting to train GANs on movie soundtracks, thus forming a link between music and image
You may notice that the visual changes occur at the same frequency as the music's tempo. But you're right, there is no influence on the content itself.
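For illustration, here is a minimal sketch (not the author's unreleased framework) of one way to lock latent-space transitions to the tempo so the imagery appears to pulse on the beat. The latent vectors, the easing exponent, and the ping-pong scheme are all made-up stand-ins:

```python
import numpy as np

def beat_synced_latents(z_a, z_b, bpm, fps, seconds, ease=3.0):
    """Interpolate between two latent vectors so each transition
    completes exactly on a beat. ease > 1 concentrates motion near
    the beat, giving a pulsing, 'dancing' feel."""
    beat_period = 60.0 / bpm                    # seconds per beat
    t = np.arange(int(seconds * fps)) / fps     # frame timestamps
    phase = (t % beat_period) / beat_period     # 0..1 within each beat
    eased = phase ** ease                       # most movement lands on the beat
    beat_idx = (t // beat_period).astype(int)   # which beat we're on
    frames = []
    for p, b in zip(eased, beat_idx):
        # Ping-pong between the two latents on alternating beats.
        start, end = (z_a, z_b) if b % 2 == 0 else (z_b, z_a)
        frames.append((1 - p) * start + p * end)
    return np.stack(frames)

rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=512), rng.normal(size=512)
frames = beat_synced_latents(z_a, z_b, bpm=120, fps=30, seconds=2)
```

Each frame's latent would then be fed to the generator; at 120 bpm every half-second lands exactly on one of the two endpoints.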
GANs are an interesting frontier. Videos are much more engaging than photos. The next logical step is to make a real-time GAN, like a videogame you can walk around in.
Imagine using a Vive to explore a GAN interactively. You'd be able to control the GAN using vive controllers and by walking around your room.
Right now it takes 163ms to render a 1024x1024 frame on a K80 GPU. That's 6 FPS, which is within an order of magnitude of 60FPS.
I haven't timed a 256x256 GAN, but presumably it would be 16x faster to generate. If so, then you'd be able to achieve 98FPS.
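As a quick sanity check, the arithmetic behind those figures looks like this. Note the 16x factor assumes runtime scales linearly with output pixel count, which is only a rough approximation for a real GAN:

```python
# Frame-time arithmetic from the numbers quoted above.
frame_ms_1024 = 163.0                 # measured: one 1024x1024 frame on a K80
fps_1024 = 1000.0 / frame_ms_1024     # milliseconds per second / ms per frame

# A 256x256 frame has 16x fewer pixels; IF generation time scaled
# linearly with pixel count (a big assumption), the estimate would be:
pixel_ratio = (1024 * 1024) / (256 * 256)   # = 16
fps_256_estimate = fps_1024 * pixel_ratio

print(round(fps_1024, 1), round(fps_256_estimate, 1))
```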
Someone should train a 256x256 FFHQ and make a 90FPS interactive renderer for it.
Unfortunately it's not possible to take a large GAN like 1024x1024 FFHQ and only generate a 256x256 image. Each GAN is trained for a specific size, so you're stuck with 6 FPS at 1024x1024. I wish the FFHQ authors had saved a 256x256 checkpoint during training.
Training a 256x256 GAN from scratch costs somewhere in the range of $150 GCE credits. But you might be able to bootstrap a 256x256 FFHQ using the weights from the 1024x1024 FFHQ (aka transfer learning). That might train a lot faster.
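A generic sketch of what that warm start could look like: copy tensors from the big checkpoint wherever names and shapes match, and leave the rest at their fresh initialization. The checkpoint layout and layer names below are invented for illustration; real StyleGAN checkpoints are structured differently.

```python
import numpy as np

def warm_start(small_params, big_params):
    """Copy every tensor from a pretrained (larger) checkpoint into a
    smaller model wherever both the name and the shape match, leaving
    everything else untouched. Returns the names that were copied."""
    copied = []
    for name, value in small_params.items():
        if name in big_params and big_params[name].shape == value.shape:
            small_params[name] = big_params[name].copy()
            copied.append(name)
    return copied

# Toy stand-ins: the shared low-resolution layer matches in name and
# shape; the high-resolution layer exists only in the big model.
big = {"g/4x4/conv": np.ones((512, 512)),
       "g/1024x1024/conv": np.ones((16, 32))}
small = {"g/4x4/conv": np.zeros((512, 512)),
         "g/256x256/to_rgb": np.zeros((3, 64))}
copied = warm_start(small, big)
```

The same idea is what `strict=False` partial state-dict loading does in most deep learning frameworks.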
There is also the recent NoGAN technique, which skips progressive growing by pretraining the generator: https://github.com/jantic/DeOldify/#what-is-nogan Supposedly it speeds up GAN training by a huge amount.
This is so incredible and well executed. It's crazy to see how quickly GANs are moving (e.g. check out this tweet [1] by Ian Goodfellow on 4.5 years of GAN progress) ... excited to see what they can do a few years from now!
Is there shareable / repeatable code behind it? It's cool, but without seeing how it was produced, it could just as well be a fancy output of video editing software.
This may be GANs as the author stated, but the end result looks surprisingly similar to a bunch of pixel shaders making transitions between source and target images, with those transitions driven either by pure algorithms and/or derived from blurred versions of the images themselves.
I implemented a music visualizer ages ago using similar concepts (pure algorithms though, no real images). It happened when nVidia released the first affordable consumer video card with decent shader support; I think it was the 6600 GT. My animation part, which made the video dance to the music, was a bit more sophisticated though.
Regarding the music synchronization, OK, this is ancient stuff.
However, in terms of graphics, this strikes me as different from anything that was possible before the recent advances in GANs. During the era you're talking about, the art of shader-based music visualizers was being pushed by projects like Milkdrop 2, and nowadays a lot of similar research still happens on Shadertoy, and the demoscene, of course, hasn't stopped blowing people's minds.
But this is on another level entirely. It's as if the content, and seemingly human concepts themselves, are being smoothly animated.
this strikes me as different from anything that was possible before the recent advances in GANs
Well, that's because you didn't see my vis. It looked just like the one in the linked GAN video, with similar transitions, except that all the imagery was generated by math formulas running in pixel shaders instead of pre-made bitmaps/videos.
Actually, I also played with actual music video clips as the source of the imagery, and the results were really cool, but beyond experimenting at home I couldn't really do anything with that part due to copyright, etc.
I guess one way to make these effects dance to music would be to make a Mel spectrogram of the audio, then somehow use the shapes in the spectrogram to apply deltas to the rendered frames.
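That idea could be sketched roughly as follows. For simplicity this uses log-spaced bands over a plain STFT magnitude as a stand-in for a true mel spectrogram, and the FFT/hop/band sizes are arbitrary choices:

```python
import numpy as np

def band_deltas(samples, sr=44100, n_fft=2048, hop=512, n_bands=8):
    """Frame-to-frame energy changes in log-spaced frequency bands of
    an STFT magnitude. The deltas could drive per-frame distortion of
    the rendered video."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    mags = np.stack([
        np.abs(np.fft.rfft(window * samples[i * hop:i * hop + n_fft]))
        for i in range(n_frames)
    ])                                            # (frames, n_fft//2 + 1)
    # Group FFT bins into log-spaced bands, roughly mel-like.
    edges = np.unique(
        np.geomspace(1, mags.shape[1] - 1, n_bands + 1).astype(int))
    bands = np.stack([mags[:, a:b].mean(axis=1)
                      for a, b in zip(edges[:-1], edges[1:])], axis=1)
    return np.diff(bands, axis=0)                 # (frames - 1, n_bands)

sr = 44100
t = np.arange(sr) / sr                            # one second of audio
tone = np.sin(2 * np.pi * 440 * t)                # a 440 Hz test tone
deltas = band_deltas(tone, sr=sr)
```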
The music animation code worked something like this:
1) Each of my pixel shaders was driven by, let's say, 32 parameters (I don't remember the exact value).
2) The code would generate a first and a second set of said parameters (with random values) and start transitioning (lerp) between the two, with a transition length of about 60 seconds.
3) Upon completion of the transition, the first set would be replaced by the second, and the second by a freshly generated third set, ad infinitum.
4) Steps 2 and 3 allowed for non-stop fluid motion.
5) The lerp value (the degree of transition between sets) for each parameter would be modulated by sound: one FFT band per parameter, also passed through a synth-like attack/decay envelope.
6) Finally, there was a beat-detection part which, upon detecting a beat, would invert the lerp direction.
There were more steps and various tricks to make it more interesting and non-repetitive, but I'm not writing an article here ;)
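The steps above could be sketched roughly like this, in Python rather than shaders, with random numbers standing in for the real FFT bands and beat detector:

```python
import numpy as np

rng = np.random.default_rng(1)
N_PARAMS = 32                      # shader parameters per set (step 1)
FPS = 60
TRANSITION_FRAMES = 60 * FPS       # ~60 s per transition (step 2)

set_a = rng.random(N_PARAMS)       # current parameter set
set_b = rng.random(N_PARAMS)       # target parameter set

def envelope(band_energy, prev, attack=0.5, decay=0.05):
    """Synth-like attack/decay follower for the FFT bands (step 5)."""
    rate = np.where(band_energy > prev, attack, decay)
    return prev + rate * (band_energy - prev)

env = np.zeros(N_PARAMS)
direction = 1.0                    # flipped on each detected beat (step 6)
frames = []
for frame in range(3 * TRANSITION_FRAMES):
    # Stand-ins for real audio analysis: one FFT band per parameter,
    # plus a naive once-per-second beat flag.
    band_energy = rng.random(N_PARAMS)
    if frame % FPS == 0:
        direction = -direction
    env = envelope(band_energy, env)
    base_t = (frame % TRANSITION_FRAMES) / TRANSITION_FRAMES
    # Per-parameter lerp value: base progress modulated by sound.
    t = np.clip(base_t + direction * 0.2 * env, 0.0, 1.0)
    frames.append((1 - t) * set_a + t * set_b)
    # Step 3: on completion, roll the sets forward, ad infinitum.
    if frame % TRANSITION_FRAMES == TRANSITION_FRAMES - 1:
        set_a, set_b = set_b, rng.random(N_PARAMS)
```

Each element of `frames` would then be handed to a pixel shader as its parameter vector for that frame.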
The end result was quite artistic. The visualizer was part of a much bigger enterprise-grade media playback / management / delivery / scheduling platform I developed for the hospitality industry.