Michael Mignano: Hey everyone, and welcome back to Generative Now. This is the show where we talk to the builders who are creating the world's most exciting AI products and companies. I'm Michael Mignano, a partner at Lightspeed, and today we've got an awesome conversation. I spoke with Mikey Shulman, CEO and co-founder of Suno AI, about the future of AI-generated music. We talked about why audio has been a bit behind the curve relative to other modalities like text and video, and how AI-generated music could change how we experience music moving forward. So take a listen to this conversation.
Michael Mignano: Hey, Mikey.

Mikey Shulman: Hey, it's great to be here.

Michael Mignano: I am so excited to talk about Suno. It is definitely one of the most fun AI products I've played with, and it seems like others who have used it all have similar feelings. So I'm excited to get into it, excited to get into the story. Before we do that, I think we have to go back a little bit, because from what I understand, your interests and passions growing up are very aligned with what you're working on right now.

Mikey Shulman: Yeah, definitely.
I've definitely been doing music for much longer than I've been doing AI. I played a lot of piano as a kid, taught myself a few other instruments, and played bass in a lot of bands in high school and college in New York, in some very small clubs that maybe you've heard of. I really, really fell in love with it. I remember very distinctly the first show I ever played, with a four-piece band. At the end of the night I got a very small stack of bills from the promoter, and I remember thinking, wow, this is criminal: I know I had more fun than the audience did tonight. It wasn't a very good show, but it's just so fun to make music with people. And I guess it comes full circle: having done a lot of stuff between then and now, being able to let people make music is a lot of fun, and they enjoy it. It definitely feels good that we make something that makes people smile.
Michael Mignano: It sounds like you were entertaining people, and you were getting paid for it. What about the studio? Did you and your bandmates ever spend any time in the studio? Did you try to record music, or were you only ever playing live?

Mikey Shulman: We did.
We recorded one EP. I'll say I'm certainly no seasoned studio musician, so take this all with a grain of salt, but it's a lot of takes, it's laborious, and, at least for me, in many ways less fun than just playing for fun. I actually remember one track that was a really good take, and I kind of slipped off my chair a little bit at the very, very end, and we ended up having to basically redo the whole thing. It was really sad. I remember thinking, wow, that would never have happened if we weren't recording.

Michael Mignano: So it seems like
you were very focused, at least for part of your life, on music. But you studied physics at Harvard, which is a very different thing from music, and then you ended up at Kensho. Tell us how that happened, and tell us a little bit about Kensho for people who don't know it.
Mikey Shulman: Total accident, honestly. Really a right-place, right-time type of thing for me. I was in the last year of grad school, and I was introduced to some people at Kensho, one of whom, Martin, is one of my co-founders now. We went to lunch because I was local, and at lunch they said, when do you want to come interview? I said, I don't know, I'm a grad student, I can do whatever I want. They said, how about right now? So we went upstairs and I interviewed, and I did very poorly, but they decided to give me a chance. Like I said, right place, right time; a lot of stuff luckily went right at Kensho.

Kensho was a company that did machine learning for financial services. We did a lot of NLP: there are a lot of documents that are really financially relevant, and we made sense of them in an automated way. We were acquired by S&P Global in 2018. In some sense this was amazing: we got acquired by a company with one of the biggest, if not the biggest, collections of financial documents you might want to play with, so you're kind of a kid in a candy store. We got lots of good data to train on, we had a lot of fun, we learned a lot, and we met a lot of great people.

The impetus for starting Suno was that we did one audio project, which was learning to transcribe earnings calls: speech-to-text on public-company earnings calls. I don't know if you've ever read an earnings call transcript, Mike.
Michael Mignano: I have. When I was at Spotify, I would try to listen to most of them, but whenever I couldn't listen, I would read the transcripts. Pretty dense information.

Mikey Shulman: Yeah, not real page-turners. But there's a good chance, then, that you read an S&P transcript.

Michael Mignano: Oh, okay.

Mikey Shulman: A lot of people resell
the S&P ones, and those transcripts are perfect. And pre-2019, they really couldn't be done with any automation or machine learning. People would ask, why don't you just go to your favorite cloud provider and get speech-to-text from them? The answer is that it will come back with so many errors that it will take longer to fix than to start from scratch. And of course, if you have 20 years of perfectly transcribed training data sitting in the basement and you're a machine learning company, your eyes kind of light up. That was our first foray into audio AI, and the rest is history. We fell in love with it. Audio is so beautiful, it's so much more human, it's messy, and there's so much interesting stuff in it. And it is so far behind images and text. That was definitely true in 2019 when we started on that project, and in some sense it's more true now, if you just think about everything you've seen happen in images and text. But there's nothing fundamental that says audio has to be behind forever. That was, I guess, the kick in the pants we needed to say, okay, let's go do this right. We love audio, we love AI. Let's do this.

Michael Mignano: Yeah. I mean,
that's an observation we had at Spotify often, and Daniel Ek, the CEO, has talked about this: audio is something that everyone consumes. We all listen to music; many of us listen to podcasts, audiobooks, the radio. It is massive in terms of the amount of attention that gets allocated to consuming audio content, but for some reason it hasn't been valued in the same way as, say, video.

Mikey Shulman: I think things are just
beginning to change now. But even rewind just six months or so, and audio looks a lot like text did in 2019, which is to say there are a lot of single-purpose, very task-specific models. We look at text a lot to understand the landscape. Imagine something like named entity recognition, which we did at Kensho: here's a piece of text, pull out all of the people and places and companies in it. You train a model to do only this one thing, and you will always be extremely data-limited: you're basically limited by how much labeled data you can get your hands on. Audio looks a lot like that across its tasks: how much labeled data can I get for speech-to-text, or how much labeled data can I get for text-to-speech? These models end up small and somewhat brittle because of the lack of all that data. Now go back to text: no one would ever think about training a named entity recognizer model. You would just paste some text into GPT and say, give me the named entities in this piece of text. That's because there was a paradigm shift: forget about the single-purpose model; train a very large self-supervised model on as much unlabeled text as you can get your hands on. Okay, then let's just do the same thing for audio: train a large self-supervised model on as much audio as we can get our hands on, and then figure out how to make it do the things we need it to do. I think you're just beginning to see that happen in audio.
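The paradigm shift described here, from a labeled-data-hungry single-purpose model to prompting one large model, can be sketched in a few lines. This is an illustrative contrast only: `call_llm` and `toy_llm` are hypothetical stand-ins for whatever large-model API you have, not any real service.

```python
# Old style: a single-purpose NER model, bounded by labeled training data.
def ner_task_specific(text, trained_model):
    """Only ever does this one thing, and only as well as its labels allow."""
    return trained_model.predict(text)

# New style: steer one large self-supervised model with a prompt.
def ner_with_prompt(text, call_llm):
    """call_llm is a hypothetical stand-in for any large-model API."""
    prompt = f"List the people, places, and companies in this text:\n{text}"
    return call_llm(prompt)

# A toy backend so the sketch executes end to end; a real LLM would go here.
def toy_llm(prompt):
    known_entities = ["Kensho", "S&P Global", "New York"]
    return [e for e in known_entities if e in prompt]

print(ner_with_prompt("Kensho was acquired by S&P Global in 2018.", toy_llm))
# ['Kensho', 'S&P Global']
```

The point of the contrast is that the second function needs no task-specific training at all; the prompt does the steering.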
Michael Mignano: So, if I could, maybe my understanding of what you're saying, probably in a much simpler way, because I'm obviously not nearly as close to the research as you are: it sounds like lack of training data has been a big limitation. Gathering up a bunch of text doesn't seem that hard; there's a treasure trove of it sitting on the internet. Audio, maybe not so much. Is that kind of what you're saying?
Mikey Shulman: Big time. Big time. So there's no Common Crawl or Pile for audio. And even if you had it, which you don't, it's really hard to work with audio. You can't easily inspect your data; you can't easily search over your data. In text, if you have two data points and you want to know whether they're the same, whether two sentences are the same, that's really easy to do. In audio, that's really hard to do. So you just have to be much more careful with how you arrange and organize your data, in a way that you don't have to in text. And so, yeah, it makes people not want to do it.
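The point about comparing data points is concrete enough to sketch. Assuming nothing beyond the standard library, this toy shows why "are these two data points the same?" is one cheap comparison in text but needs a similarity measure with a tolerance in audio:

```python
import math
import random

# Text: exact equality is one comparison.
print("earnings call transcript" == "earnings call transcript")  # True

# Audio: the "same" sound rarely matches sample-for-sample. Different
# encodings, offsets, and noise mean raw equality fails, so you fall
# back to a similarity score against a threshold (cosine, in this toy).
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
rng = random.Random(0)
noisy = [x + 1e-3 * rng.gauss(0, 1) for x in tone]  # same tone, tiny noise

print(tone == noisy)  # False: raw samples differ
dot = sum(a * b for a, b in zip(tone, noisy))
norm = math.sqrt(sum(a * a for a in tone) * sum(b * b for b in noisy))
print(dot / norm > 0.99)  # True: similar enough to call a duplicate
```

Real deduplication pipelines use audio fingerprints rather than raw correlation, but the asymmetry with text is the same.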
Michael Mignano: Got it. So how much of Kensho's focus was on this work you were doing around audio and transcription of earnings calls? Was it just a sliver of the work, and where did it happen along Kensho's arc?

Mikey Shulman: It was a part, but
not the majority at all. Most of what we were doing was text, and it started maybe a year after the acquisition, after we learned where all the data lived and what all the interesting projects were. Don't get me wrong, there are a lot of interesting things still to do in financial services with audio, but ultimately there's way more you can do with audio outside of financial services, and that's really what drew us to go do it. Financial services is also, for good reason, a somewhat more risk-averse domain, and there are so many interesting things to do with text there that it will be hard to look away from those text-based projects.

Michael Mignano: Got it. So
you're running machine learning at Kensho, then obviously you go through the acquisition, and you're inspired by this challenge around audio. Did you and your co-founders (I believe all your co-founders were at Kensho) leave specifically to go work on this audio project that became Suno?

Mikey Shulman: Pretty much. We knew that, as
much as we loved our jobs, pursuing audio in a financial services company doesn't necessarily make sense. I think we didn't know exactly the form our product would take, but we knew there were big opportunities.
Michael Mignano: So tell us a little bit about that founding journey. I'd love to hear about the early days of Suno.

Mikey Shulman: You know, it's
funny. We looked around for a while, and we talked to a lot of people doing things in audio, and in some sense one of the biggest learnings was what I said before: people really don't like working with audio. Maybe that's what made us special: we really liked working with audio, but most people don't. We knew that the quote-unquote right way to do things would be to train a big self-supervised model, a "foundation model," to use the buzzword. We always said we wanted to do it right, so that things are future-proof and give us capabilities very easily compared to training things in the old-style way. And honestly, the instant we left Kensho we weren't sure we would do music as opposed to speech. We knew speech very well, and a lot of people told us, speech is this really big market; don't focus on music, that would be a really bad idea. Then a couple of things happened. One is that, by virtue of all four of us being musicians and audiophiles, we kind of couldn't help ourselves from starting to do music. And then there was another really big realization. We put out an open-source project called Bark, which is a text-to-speech model, and it got good uptake from the community, got a lot of GitHub stars. There was a little sign-up form on the Bark README where we would ask people what they were interested in, and the biggest thing people wanted was music. This was a big aha moment for us: here's this text-to-speech model, it got a lot of popularity, people like it, and the biggest thing everybody wants is not text-to-speech, it's music. So if you can't help yourself from trying to do music, and everybody in the community is telling you to go do music, that's a pretty strong signal. That was a little over six months ago, so we're just past six months from releasing our first music model. We've released a few more since then, and we haven't really looked
back.

Michael Mignano: So you come out with Bark, and you're saying developers in the open-source community are saying, hey, we want to use this for music, we want to build music applications with this model.

Mikey Shulman: That's right. In fact, we had
a little Discord server, and we would see people trying to pull music out of Bark, albeit poorly. That's a real clue: here's somebody abusing your model to try to do something it's not meant to do, because they really want to do it.

Michael Mignano: Yeah, what was that like?
Did it work? Was it actually spitting out music? What did that sound like?

Mikey Shulman: I will decline to say what is and is not music. No, it spit out little bits of music sometimes, not terribly reliably, obviously. Sometimes it would sing, sometimes it would be background music, and it was a total crapshoot. But this was a model meant to do something very, very different.

Michael Mignano: Got it. Maybe talk
a little bit about the model and how it worked. Give us the simplified version of how this model works, maybe compared to some of the other modalities we've talked about, text and video.

Mikey Shulman: It's no secret we're
big fans of Transformers; this is probably our text background. Not all audio is done with Transformers; a lot of audio is done with diffusion, and these two methods have pros and cons depending on the modality. But we are big fans. Some of it was inspired by academic work, but there really was very little out there in the open source around doing things in audio with Transformers, so there was a lot of basic research and technology we had to build. We had the good fortune, by picking something like Transformers, where so much work is being done in text, of having a lot of open-source work we could borrow. If you go look at the source code for Bark, you'll see, in big capital letters, that we thank and attribute a lot of the code to, for example, Andrej Karpathy's nanoGPT, which is a really easy, stripped-down implementation of GPT. Having resources like that available really let us focus on the bits we love, which is really understanding audio and trying to model audio correctly. So Bark was a series of a few Transformer models that you need to turn text into, ultimately, nice-sounding speech, or sometimes, in some abusive use cases, nice-sounding music. It's crazy to think that was a year ago. We've advanced a lot, and the open-source community and the research community have also advanced a lot since then.
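For readers who want the shape of "a series of a few Transformer models," here is a minimal sketch of the staged design described in Bark's public repository: text to semantic tokens, semantic to coarse codec tokens, coarse to fine codec tokens, which a neural codec then decodes into a waveform. Every function and name here is a hypothetical stand-in, wired with toy lambdas only so the pipeline runs; it is not Suno's actual code.

```python
def generate_speech(text, models):
    """Bark-style staged generation: each stage is its own transformer.

    models: dict of callables standing in for the real networks.
    """
    semantic = models["text_to_semantic"](text)      # what to say
    coarse = models["semantic_to_coarse"](semantic)  # rough acoustics
    fine = models["coarse_to_fine"](coarse)          # full codec detail
    return models["codec_decode"](fine)              # tokens -> waveform

# Toy stand-ins so the sketch executes end to end.
toy_models = {
    "text_to_semantic": lambda t: list(t.encode()),
    "semantic_to_coarse": lambda s: [x % 1024 for x in s],
    "coarse_to_fine": lambda c: [(x, x) for x in c],
    "codec_decode": lambda f: [x / 1024 for pair in f for x in pair],
}

wave = generate_speech("hello", toy_models)
print(len(wave))  # two toy "samples" per input byte
```

Splitting generation into stages like this is what lets each transformer stay close to the text modeling recipes the team borrowed from.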
Michael Mignano: So developers wanted music, and you as founders were very passionate about music; it sounds like you were all musicians. But beyond the passion, why music from a product direction? What's the opportunity for generative music?

Mikey Shulman: I think there's
a lot, and I think we only focus on some of it. I'll tell you the corner we focus on, which I think is the most interesting and the most fun: making music that doesn't exist. The way we think about this is, let's just look at the music landscape right now. Most people have a streaming service like Spotify or Apple Music, and they sit there passively listening to music much of the time, maybe not even paying attention to it. But there is so much more to experiencing music than just that, and one very big thing is creating music. I guess I'm lucky to have experienced that joy of creating music with people, but there's usually a pretty big barrier to doing it: getting pretty good at an instrument, or getting pretty good at some complicated piece of software like Ableton or Logic or even GarageBand. You can think of generative AI as a tool, a means to an end, for letting everybody take the sounds in their head and make really nice music out of them. One thing we often think about is that the gaming industry is 30 to 50 times bigger than the music industry, and if you wanted to be really reductive about why, it's because you are a very active participant in gaming. So one thing to think about is, what are the other 49/50ths of music that we've just yet to build a compelling product experience around? It's not only "I sit in front of my computer, I make music, and I share it with someone," although that is really, really big. There are lots of things, like doing it with people: you often play games with people, and that can be some of the most fun gaming experiences. We talk a lot about having collaborative concerts, or little jam sessions. There are, I think, totally unexplored social dynamics around what happens when you let people make music for their friends and share it the same way you might share an image. So there's just so much that is unexplored. We're just beginning to explore the creation bit now, but there's a lot more to come after that.

Michael Mignano: Maybe to set the context
for, I'm sure, a lot of what we're going to be talking about: what is the product today? Just talk about what Suno does, specifically, today.

Mikey Shulman: Today we let you, in a couple of ways, make songs that don't exist right now. There is a song that's in your head, there's something going on in your life right now, and you can turn that into a song, the same way you might write in a journal or take a picture of something. Music is a really human form of expression that I think we're all hardwired to want to make. You can do something as simple as typing in "make me a reggae song about podcasting," or you can go a little deeper: you can bring your own lyrics, and you can really learn the different ways you can prompt these models, whether that's playing with line breaks, or saying this is a verse and this is a chorus, or describing the music with adjectives most people would not think to use. Out will pop a song, and hopefully you enjoy it, and, I think most importantly, hopefully you've enjoyed the process of arriving at the final product.
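As a concrete illustration of the kind of prompting described here, this is the sort of structured input being talked about: lyrics with explicit section markers and line breaks, plus a separate style description. The exact tags and field names are hypothetical; the conversation does not specify Suno's real prompt format.

```python
# Hypothetical prompt structure: section markers and line breaks shape the
# song, and a separate style string carries the descriptive adjectives.
lyrics = """\
[Verse]
Plug in the mic, hit record tonight
Stories in stereo, riding the light

[Chorus]
Press play, press play
We're podcasting in a reggae way
"""

style = "laid-back reggae, warm bass, unexpected adjectives welcome"

request = {"lyrics": lyrics, "style": style}
print("[Chorus]" in request["lyrics"])  # True
```

The interesting knob, per the conversation, is exactly this split between structural markup in the lyrics and free-form adjectives in the style description.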
Michael Mignano: Now, at scale, when people are doing this at scale, what do you think the role of generative music will be in relation to non-AI-generated music? Fast forward five years from now: are my kids, instead of listening to Taylor Swift, going to be listening to AI-generated music? Is there a hard line, or how will these things coexist?

Mikey Shulman: I'm not necessarily
20:24AI is a set of tools and those tools
20:26will come to artists as well and while
20:28we are not focused on that other people
20:30are and I'm sure there's going to be a
20:31lot of work um where artists do things
20:34with AI but um I think the answer is
20:37definitely both and um you like
20:41listening to Taylor Swift because you
20:42have some connection with her and her
20:44music and what she's writing about and
20:46you will also like listening to the song
20:49that your best friend made for you
20:51because you have a very different kind
20:52of connection with your best friend and
20:55um I think you know when I said
20:59uh we're trying to figure out what the
21:00other 49 50ths of Music really is that
21:03is to say I don't think that first 50th
21:04is going away it's like really really
21:09so I yeah I I think I think there is a a
21:12pretty big Continuum here that we're
21:14trying to explore how much do you all as
a team think about the format of music as it exists today? Right now music is delivered digitally via streaming services like Spotify, and it's organized into these algorithmic playlists that get delivered to people. As a result, the actual creative format of the music is often optimized by the artist and the producers to play into that: something like two and a half to three minutes, getting right to the hook to catch you as quickly as possible. How much do you think about the format of music as it exists today, and how do you think the format will evolve in a world where much of it is AI-generated?
Mikey Shulman: It's a really good question. We think about this a lot, and I think the short answer is, we don't know, but we know it's going to change, and we know it is changing. Like you mentioned, songs are now two and a half to three minutes; five years ago they used to be longer, and ten years ago longer still, and they're trying to get to the hook faster, for various reasons having to do with how streams are paid out. You have things like TikTok now, where you're taking 60 seconds of an existing song and doing something with it. I think AI will certainly accelerate some of the changes in format, but we don't exactly know what they are, and we're playing around with them a lot. One thing I believe pretty firmly: right now we make songs that are one to two minutes, but a one-minute song is very different from a one-minute clip of an existing song. How do you condense the journey of a whole song that used to be three minutes into one minute? I don't think that's for us to define. It's for us to make the tools that let people make the music, and then that new art form will emerge from people making stuff. And then, as a corollary, when you have a lot of one-minute songs, not one-minute clips of songs, what does a playlist mean? A playlist maybe takes different forms. Maybe I make an album, and my album doesn't have 11 songs on it; maybe it has 25 songs, because it's still that listening experience, and it's a new way to tell stories in smaller chunks. It's like a book with three-page chapters.

Michael Mignano: Yeah. It seems like
the delivery mechanism of music has so much to do with the actual creative format of music. So it strikes me that however we're all listening to music will probably inform what people end up wanting to do with Suno as a tool. Do you imagine a world in which a Suno-generated song is delivered in the same way that, say, a Taylor Swift song is, and we're listening to them back to back? Or do they happen in completely different contexts and experiences?

Mikey Shulman: I think it is certainly
possible that you're listening to those things back to back, but there will certainly be new experiences from this new art form, and we're trying to figure out what they are. I think it would be a little reductive of us to think it has to be all one or all the other. One thing about us: we are almost indifferent as to what you do with the songs you create with Suno. If you want to put them on YouTube, that's good; we're not going to try to prevent you from doing that, and we're not going to try to take a cut if you make a lot of money. But we're basically just as happy if you shared that song with your three college roommates, you all got a good laugh out of it, and that's where the song lives forever and ever. And again, I think there's a lot in between Spotify and that three-person group text message.
Michael Mignano: It's interesting to think about where this all could be going. Maybe taking image models as a comparison: it feels like the creativity, and almost the art form, of making an image will naturally change, like the experience of making it will change, as the models get faster and faster. Right now I go into Midjourney, I type in a prompt, and I get four images at low resolution. That experience then inspires the way I create for that medium, and I imagine as it gets faster and faster I'll probably start to create differently, maybe in a more dynamic way. Now, going back to Suno and applying that to Suno: I imagine as your models and your experience get faster, it will also impact the creative experience, but it might impact the consumption experience as well. Can we imagine a world where music is almost dynamically being created on the fly for our own tastes?

Mikey Shulman: A hundred percent. And I think, you
know, right now music gets streamed to you on the fly, and there's a really interesting discovery problem of figuring out what music out there is going to be most in line with my weirdo tastes. In other modalities, user-generated content basically makes for increasingly small niches that really appeal to people, and ultimately what generative AI will let you do is create that really micro niche that has only one person who really, really likes it. You can't support a whole industry for that one person, but generative AI can make you stuff on the fly, tailored to your exact tastes, and it can keep learning from the things you tell it, to make that music better and better. So I think that is yet another experience in music that doesn't exist today, that we are excited about eventually being able to tackle.

Michael Mignano: So
we talked about how the delivery mechanism inspires the creative process, but it almost feels like, based on what you're saying, the creative process could inspire the delivery mechanism in the future. If music is so fundamentally easy to create in this near future, and dynamic and personalized, maybe we don't listen to songs the way we do now, where we just listen to the three-minute or even the one-minute song and then skip to the next one. It could be something entirely different.

Mikey Shulman: A hundred percent. I think one of the
big unknowns here (I'm 100% sure you're right, and I'm also 100% sure I don't know the answer to exactly what it looks like) is ultimately what the sharing dynamics and the social dynamics look like. Just to be clear about that: right now, when we stream music, there's a bunch of artists who make music, and the rest of us just passively listen to it. We stream it; it's very unidirectional. There is some connection between artists and fans, but it's somewhat cursory. That kind of resembles, let's say, the high end of Instagram, where you have people spending six figures making posts, and they have a lot of followers and make their livelihoods that way. But there's lots of things missing in music that exist in other modalities, whether it's the social graph that exists on Facebook, or the tail end of Instagram, where someone with a locked account and 17 followers posts pictures of their kids. That doesn't exist in music today, and it can exist in music. The exact nature of how these things get shared and liked and remixed and used for additional creative inspiration: we don't know those details, but they're going to matter a lot in how these things ultimately get distributed.
28:41yeah it's fascinating um so what's gone
28:44on so far since you shifted to music
28:47there's a Discord there's a website
28:49I believe you also had a really exciting
28:50partnership with Microsoft that you
28:52announced recently maybe talk us through
28:54some of the highlights over the past
28:55couple of months yeah um so we started
28:58on Discord you
29:00know we had a Discord from our open
29:01source uh work and it was an amazing way
29:04to get our stuff out to the community
29:06and see how people liked it um
29:09in November we um released our first
29:12non-Discord experience a web app
29:15and people liked that even more and I
29:17think since then yeah it's been it's
29:18been super exciting so the
29:20partnership with Microsoft is
29:22great they um they've integrated our
29:25kind of free tier into Copilot which is
29:27a set of lots of different creative
29:29experiences and just like you may
29:31want to make an image as an
29:35outlet or for any other creative reason
29:37you may want to do the same with a song
29:39how did that come together I mean that's
29:40that's huge it's funny
29:43I think um I've heard at least
29:45rumors that somebody had shown the
29:48very thin web app to Satya over
29:50Thanksgiving and he played with it and
29:52he really loved it so obviously um we
29:55were pretty stoked
29:58to get our songs into Microsoft
30:01Copilot eventually that is an amazing
30:05way to spread the word about how
30:06enjoyable making music really is uh you
30:09know we've just announced a
30:11little Valentine's Day experience um I
30:14think music is so important when we
30:17think about let's say soundtracking our
30:19lives and these different experiences
30:21that we can bring to people that are
30:22tailored to for example Valentine's
30:25Day and how do we get people making
30:27songs for one another and spreading love
30:29and spreading Joy um really really
30:31exciting people people seem to like it a
30:33lot um so yeah I don't know it's
30:36been a really fun couple of
30:39months um and we built a product that
30:42makes people smile speaking of product you know so
30:45much of your background has been in um
30:47machine learning engineering research
30:49now in this role leading Suno uh I
30:52imagine you're spending so much of your
30:53time on product designing product
30:56thinking about the product experience
30:58how do you approach building
31:00product at Suno um to connect users with
31:03with this model that you've developed
31:05yeah I'll caveat this with I don't think
31:07I'm terribly good at product but there's
31:08certainly people here who are I
31:10think there's a few important
31:12principles that we think
31:14about um I'd say the first is uh one
31:18thing we say is like Aesthetics matter
31:20uh which is to say it's very easy with a
31:22machine learning background to just be
31:24super obsessed with whatever
31:25quantitative metrics you have about how
31:27good your models are and ultimately all
31:29of these quantitative metrics are
31:31flawed and they're especially flawed in
31:33in music just because they're not very
31:35mature and ultimately your ears are the
31:38best judges of what's going on and so
31:39you need to listen to the stuff that is
31:42coming out of your model and that is
31:43like a big uh driving ethos both for the
31:46product and for the whole
31:48company um another big thing that
31:51we think about is just like what are the
31:53workflows that people want to do that
31:56are intuitive for people to make
31:58music and it can be tough actually with
32:00a lot of um people at the company who
32:03really love music to
32:06tend to make things that
32:09resemble the workflows that
32:11professionals like to use to make music
32:13and you know one thing like we always
32:15tell ourselves like my mom does not want
32:17to make music the same way a
32:18professional producer wants to make
32:20music and for example I think once
32:22you've talked about
32:24something like I want to stem this out
32:25and I want to take that kick drum and I
32:27want to take out the low end like I
32:29promise you that's not how my mom thinks
32:31gone too far yeah it's gone too far and
32:32so um thinking about it very much from
32:36the perspective of here's somebody with
32:39very specific tastes in music who's
32:41never produced a bit of music in their
32:42lives how do they intuitively think
32:45about this process and um either
32:48fortunately or unfortunately it means
32:50that we end up building things that are
32:52often not suited for professional
32:54musicians and it's just kind of not the
32:56goal right now the last thing I'll say
33:00uh it is also very easy uh with like a
33:04lot of academics to think that you can
33:07just first principles reason your way
33:09toward what is the best experience here
33:11and um I think this is an
33:14empirical science where um let's say
33:17because we have backgrounds in music we
33:18actually can't intuit the way novices
33:21want to use it and so we need to run a
33:24lot of experiments and we need to ship a
33:26lot of features and we need to see how
33:28people like them and then we can kind of go from
33:32there speaking of product and the
33:34product experience you talked earlier
33:36about how now you
33:38can bring your own lyrics you can just
33:40describe it but it sounds like it's
33:41mostly coming from you know a text
33:44to song approach as you think about the
33:47products Evolution what are the other
33:48ways in which you might enable people to
33:51create music outside of just text yeah
33:54it's a really good question I'll say
33:55before I answer it I'll just say like uh
33:57I'll be unkind to machine learning
33:59people because I am a machine learning
34:00person like you know people will think
34:02like oh this is a text to music model
34:04and I think that is an extremely
34:06non-user Centric way of thinking about
34:08what we're doing and if all you can do
34:10is think about A to B text to music or
34:13tapping on the table to music
34:15we will never build something
34:18people want to use and so we really try
34:19to think hard about it from the other
34:21end and think about like what inspires
34:23people to want to make music and how can
34:25we get them to express that and so yes
34:28describing stuff is great but not
34:30everybody is in the habit of writing
34:31lyrics maybe there are other ways of
34:34describing things whether that's tapping
34:36your pencil on the table and recording
34:39that um or maybe you sang into your
34:42microphone for some inspiration um I
34:46think maybe you can come with your own
34:47samples of music and kind of describe
34:49why you like them and start that as
34:51a process kind of like you might do mood
34:53boarding in Pinterest um there's lots of
34:56stuff there I think um one of
34:58the workflows that we are really into is
35:01what we call soundtracking your life and
35:02it's like what are all of the random
35:04things that are happening to you today
35:06and how are you going to kind of show
35:07those to a model as inspiration for the
35:10sounds that are in your head um and
35:14maybe it's a really loud car horn you
35:16know like whatever it is um so we try to
35:19keep a pretty open mind there um we've
35:21got a lot of stuff coming that I don't
35:23want to
35:25tease out just yet but I think
35:27um I think we are very cognizant of the
35:31fact that text to music is going to
35:34be limiting it's really really cool and
35:36then on the output side you know you've
35:38you've clearly made some very
35:39intentional choices um obviously in
35:42terms of the quality which is phenomenal
35:45um but also on the limitations right I
35:48believe uh you know you make it very
35:50difficult if not impossible for anyone
35:52to make anything that may be infringing
35:54of existing artist rights maybe talk
35:56a little bit about that we
35:58always said we were going to do audio
36:00the right way and technically that
36:02meant um kind of the foundation model
36:04approach and then you know ethically and
36:06legally that means um not trying
36:11to infringe on um existing artists and
36:14trying to do this in a way that is
36:16artist friendly and this is not only
36:18because it's ethical and legal and moral
36:20it's because it's also what we think is
36:22the future of how people want to do
36:24music it's that other 49/50ths that we were
36:27talking about before of people aren't
36:29going to lose their connections to
36:30artists um but there's just a lot of
36:33other experiences that we can do and so
36:35um we don't let you say make me a Taylor
36:38Swift song um actually our models don't
36:41even know who Taylor Swift is but
36:43instead we try to gently
36:50nudge users into the behavior that we
36:52think is the long lasting one which is
36:54making the original music that's in your
36:55head um and the same for other people's
37:00and um one thing that you know is
37:02actually nice for us is there's a lot of
37:04these cover type things where you know
37:06maybe you could have uh Taylor Swift
37:08doing Enter Sandman by Metallica And I
37:12think those tend to go very viral and I
37:14think that is really not the future uh
37:18of how people want to do music and so
37:19we're very happy to kind of let those
37:21things go viral and then kind of um
37:24evaporate somewhat quickly and I think
37:26the analogy that I have not with your
37:28platform exactly exactly so you can't do
37:30that with our platform you can do that
37:31with someone else's it is almost
37:32certainly illegal um it is very viral
37:35and I don't think that is the long
37:36lasting use case I think the analogy
37:38that I have in my head for making
37:40covers for Taylor Swift covering
37:42Enter Sandman is uh the first time
37:45you played with GPT and you made a
37:46Shakespeare sonnet about drinking coffee
37:49and then you made another one and then
37:50the third time you were like this is
37:52cute but this is not what I want to do
37:54and that doesn't mean that GPT isn't
37:56extremely cool and useful because it is
37:59it's just not for writing parody
38:01Shakespeare sonnets the comparison to
38:03GPT makes me think you know one of the
38:05things that people knock it
38:06about is that you know it sort of
38:10when you try to get it to do things that
38:11are really creative it sort of hits this
38:14kind of upper bounds and maybe that's
38:16because it's you know it's
38:19trained on the world's
38:20knowledge so it sort of it it bumps up
38:23against the limits of that knowledge and
38:25that information um thinking about music
38:28though music is obviously so creative
38:30and if you think about like the greatest
38:31artists ever so many of them became the
38:35greatest artists ever because they did
38:37something that no one had ever done
38:38before right um and so do you imagine a
38:42world in which Suno and maybe just
38:44music models in general can do things
38:46that have never been done before or is
38:50it always going to kind of hit up
38:51against the upper limits of
38:53everything that came before yeah I
38:55think we can make music that is
38:57kind of above the limits of what
38:59we've seen right now um and I think that
39:01for a few reasons one is you know humans
39:03keep making music that is above the
39:05limits that humans continue to see and
39:07so they are you know standing on the
39:10shoulders of giants you get inspiration
39:12from everything not just from other
39:13musicians but you get inspiration from
39:14the car horn that that honked outside of
39:17your window um and uh there's a lot of
39:22like uh new ways that humans can be
39:24inspired by these models as well so am I
39:27using autotune when I record my album
39:29like that did something that was
39:31impossible for a human to do to sing
39:33that perfectly and then the tool let me
39:35do it or that I use an effects pedal on
39:37my guitar that was impossible to pull
39:39out of my amp but that thing did it and
39:41so I think this is kind of a natural
39:44progression and I don't know where the
39:45line is of what constitutes AI and what
39:47does not but um I think that's kind of
39:50always been a part of this and I'll tell
39:53you one thing that I'm actually pretty
39:54excited about um if if you think about
39:57the progression of Music um more
40:02recently progress in music looks like
40:04things that are sonically
40:05interesting so like interesting sounds
40:07but not necessarily more interesting
40:09chord changes um and I think uh AI has
40:14the potential to kind of bring that back
40:16where you do more interesting stuff
40:18melodically and harmonically it also
40:21obviously sonically but um I think
40:26it is a way for humans to express the
40:29sounds that are in their heads and
40:31sometimes there's a block um
40:34that kind of artificially sets the bar
40:35for humans to love you know one of the
40:37things that's come up on on this podcast
40:39a couple of times is this notion of
40:42training on the entirety of
40:44the internet um definitely feels like
40:46it's a little bit of a wild west right
40:48now um you know the the open AI New York
40:51Times case comes to mind um and
40:53everything that's
40:54being discussed there like how do you
40:56think this shakes out are there changes
40:58coming and what will that mean for the
41:00future of these large foundational
41:03models yeah I think there definitely
41:05are uh changes that are going to come I
41:08think some will be across the modalities
41:11uh others may be modality specific I
41:13think um music is particularly
41:15interesting because there are just much
41:17more established uh ways of uh
41:21interacting with rights holders and
41:24um just for example
41:29let's take that New York Times open AI
41:31lawsuit where uh the New York Times is
41:33upset that you can get an entire New
41:35York Times article out of GPT with the
41:37right prompt um there are you know if I
41:41have a restaurant and I stream some song
41:43there are like very obvious ways that
41:46are established that I can um compensate
41:49the owners of that song the master and
41:51the songwriter and that doesn't exist in
41:53text and so I think it's not going to be
41:55a blanket solution that applies across the
41:57whole industry I think some things will
41:59and some things won't um the thing
42:01that's going to be really tricky here I
42:03think is actually uh geographical
42:05differences
42:07between um let's say different countries
42:10and um I'm honestly a little bit
42:12worried about how things will shake out
42:14is OpenAI going to have to have
42:16different models in the EU and in the US
42:19and stuff like that and just um that
42:22makes it very difficult and kind of
42:24prone to I don't know hacking and
42:27people are going to be using like vpns
42:28to access different countries models and
42:33um I think it's really really early I
42:36think people think this is going to get
42:38sorted out in the next six months and I
42:40don't even think the OpenAI
42:42lawsuit will get sorted out in the next
42:43six months and even that won't sort
42:44everything out for the whole
42:46industry so you know for us at Suno
42:48especially with music being somewhat
42:49farther behind images and text we watch
42:52this stuff very closely you know we
42:54uh try to always do the legal and moral
42:58and ethical thing and these things
42:59aren't always the same something that's
43:01legal may still be not artist friendly
43:03the thing that we stress is it's
43:04super early and uh I don't think
43:06there are any scenarios that are amazing
43:09for us or terrible for us that I can
43:12foresee in the next year it
43:15seems like OpenAI um is you know
43:17doing licensing deals with media
43:23publishers and you have to imagine a
43:25world in which those media publishers
43:26obviously then benefit in some way maybe
43:28you know maybe it helps drive
43:30subscriptions or I don't know maybe
43:32there's a new version of these models
43:33where you know you can use certain ones
43:35that output New York Times content and
43:37certain ones that don't do you
43:39imagine similar types of deals happening
43:41in music can you imagine a world
43:43in which you know I know we keep
43:45mentioning Taylor Swift the Taylor
43:47Swifts of the world are collaborating
43:49directly with music models um to create
43:53you know generative Taylor Swift music
43:56yeah 100% so you know Google is
43:58starting to do this with their Lyria
44:00project and they have a few big artists
44:02um on board there but you know if we
44:05think about it like let's fast forward a
44:07few years and the licensing
44:13climate is a little less uncertain and
44:16maybe we can let you prompt the model
44:17with a Taylor Swift song and like this
44:19was the big inspiration and there's
44:21something that is akin to the way people
44:23pay out for sampling now but it's
44:24obviously a little bit different
44:26um and then all you had to do Mike when
44:29you wanted to make a Taylor Swift
44:30inspired song was like pay for that
44:33sample and I don't know how much it will
44:34be and I don't know how much will
44:35actually go to Taylor Swift but like we
44:37will be able to let you do that and we
44:39can actually do that right now we just
44:40can't let you do that because we don't
44:42know how to pay Taylor Swift for it
44:44right right so it's technically possible
44:46it's just not um the infrastructure for
44:50that from a royalty perspective is not
44:51yet there exactly what's next for Suno
44:54where do you go from
44:56here what are you looking at over
44:57the next couple of quarters yeah uh a
44:59lot of exciting stuff so um we've got a
45:02new model that we're going to release
45:03soon um it is kind of better in all of
45:06the ways that we think about so we're
45:07really really excited about it just um
45:09to go tactical for a second when we
45:11think about model quality we think
45:14about audio fidelity like does it sound
45:15like it was crisply recorded with uh
45:18beautiful hardware um we think about
45:20song quality is it catchy does it make
45:22me feel um and we think about
45:24controllability when I asked for
45:25something does that something come out
45:27um and because music is still so early
45:30we're still making gains
45:31across all of these so really really
45:34exciting um lots of new ways that we
45:37want people to interact with stuff so um
45:40whether that is different ways of
45:41prompting the model whether that's
45:43different interfaces um into our product
45:45we're like really excited about all of
45:48those um yeah those are the
45:52big ones uh there's a couple of
45:53other secret things coming down the pike
45:55that we're not ready to talk
45:56about just yet but I think you know the
45:58future of music is really
46:00really big and I think you know people
46:02think a lot about how streaming almost
46:04killed the music industry uh and people
46:07think that AI will almost kill the music
46:09industry and I think it's really
46:10really the opposite I think we are
46:12quite confident that there is another
46:1349/50ths of music that um we among
46:17others are going to try to
46:19uncover uh always got to ask is Suno
46:21hiring we're always hiring uh um so
46:25we're in Cambridge
46:26Massachusetts um kind of a a fun place
46:29uh for for uh building companies it's
46:32something we've done here before um
46:34always looking for um the best talent
46:37across software and machine learning and
46:39product uh and design and music if you
46:42uh really love building stuff and you
46:44really love music this is probably a
46:46good home for you Mikey thank you so
46:48much this has been so much fun we really
46:50appreciate all the time awesome thanks
46:51so much this was great
46:53happy to be here let's do it
46:55again sometime see you thanks
46:57Mike thank you so much for listening to
46:59generative now if you liked what you
47:01heard please rate and review the episode
47:03that really does help and of course
47:05subscribe to the podcast on platforms
47:07like Spotify YouTube and apple podcasts
47:10if you'd like to learn more you can
47:12follow us at Lightspeed VP on YouTube
47:15Twitter LinkedIn Instagram and
47:17generative now is produced by Lightspeed
47:19in partnership with pod people I am
47:21Michael Mignano and we will be back next
47:23week with another awesome conversation