00:08yeah so Suno enables you to make any
00:11song you can imagine um I think ER is
00:14when we first started talking um things
00:15have evolved a bit um since we released
00:18Bark I think we've we've really
00:20dialed in and have been focused on on
00:23music um and so um yeah it's a tool that
00:27today uh you input a simple text prompt
00:30um and you receive um you know something
00:33like an 80-second song which you can continue
00:36and do all sorts of fun stuff with um in
00:39the fullness of time I think you'll be
00:40able to prompt with other modalities um
00:43not just text but perhaps other audio
00:46images maybe even video
00:48potentially um and receive um songs that
00:53are longer or shorter than 80 seconds um
00:56so little bite-size snacks um or perhaps
01:00um the Grand Vision being um you know
01:03perhaps like these these infinite songs
01:05or or streams of whatever you'd like to
01:12to there are four of us so larger larger
01:15founding team Mike Mikey he uh was born
01:18in New Jersey um but um yeah I think he
01:22he has a lot of family in in Israel um
01:25and I believe his uh believe his wife
01:27studied studied or did med school um in
01:30Israel but I'm I'm not sure about that
01:39speaks it's been really neat and and
01:43flattering to to see the interest and
01:47in Suno and and these these various models
01:50I think uh Bark maybe maybe starting
01:53there was was a bit of a surprise to us
01:55candidly I think I I mentioned this over
01:57email when we first spoke but um we we
02:01really viewed Bark as um a research
02:05milestone that that really proved out
02:06that there was a there there for using
02:08these Transformer-based models um for
02:10for audio um and so we we thought it was
02:14a really cool demo essentially and and
02:17proof point that this could work um for
02:19audio um and of course we we use speech
02:21uh given that it's the the most paved
02:23path um within audio certainly relative
02:26to music and and sound effects um and
02:31subdomains um but we were just totally
02:34surprised and overwhelmed by the
02:35interest with with Bark and I think we
02:38had always been interested in in
02:40flipping over that music card and and
02:41seeing if this could work for music as
02:43well I think uh most of the team at Suno
02:45has been musicians or music lovers if
02:48they don't play themselves long before
02:50we we programmed um and so it was always
02:52something we wanted to uh to try but I
02:55think we we were really
02:58um emboldened to to take that leap by
03:01seeing the community essentially bully
03:02Bark into singing and so um you know
03:06best case scenario um it sounded like
03:08drunken karaoke worst case scenario it
03:11sounded like total chaos when people
03:13would try to get Bark to sing but um
03:16people were were were loving it
03:18nonetheless and so that's when we put
03:19out these these text to music models um
03:22which we've been um working on since
03:26it's actually crazy to think that was
03:27only six months ago um that we
03:30made this move into to really focusing
03:32on on music um and it's been a really
03:37then you you have the privilege of
03:40speaking with the the worst programmer
03:42at Suno that that's me um I technically
03:46like I I've I've taken the the Coursera
03:48courses and the Andrew Ng classes but I'm
03:50I'm awful I'm I'm Dreadful um with with
03:53programming but just about everybody
03:55else is extremely technically proficient
03:58um I'm I'm kind of the
04:00the odd bird here um but um yeah it's
04:03it's it's a great team I I love um
04:05working with everybody I'm really
04:07inspired by um the the small group that
04:12together no that's uh that's Mikey
04:15Michael Shulman um who who we were
04:17talking about earlier um I'm just one of
04:20the the co-founders I I think
04:22technically uh Co although um you know
04:27on um a little bit everything right now
04:34days yeah that's cool I'm sure I I got
04:37to talk about this with with with Mikey
04:39I'm sure he's he's seen this but so is
04:41nikud when you do or do not include the
04:44vowels fascinating is it is it smart
04:47enough to intuit um with without nikud
04:50like if you do not include the
04:53B I'll uh I'll have to come back to you
04:56on this I I have to confess ignorance um
05:00um I suspect this may be one of those
05:02cases of of emergent behavior where as
05:06the models scale they they intuit
05:08things that you didn't
05:09explicitly program them to to understand
05:12but I I I I must con confess ignorance
05:15again I I'm not exactly sure but these
05:17are one of my favorite things about
05:19working in the space is those moments of
05:22um emergent behavior and and within the
05:24audio domain just to wonk out for a
05:25minute they're almost like the most
05:27delightful to me uh because so subtle um
05:31these these like signs of of
05:34intelligence um we previously came from
05:37more of like an Enterprise ASR
05:38background doing transcription for uh
05:41earnings calls um and and like Financial
05:43Services more generally and we trained
05:46this um this amazing speech recognition
05:49model that similarly like could intuit
05:51currency symbols um like uh we we um if
05:54you just said you know like uh you know
05:57five five bucks um five bucks uh would
05:59like do dollar sign 5 and and punctuate
06:02it correctly in this amazing way and
06:04there was this feature/bug similarly
06:07that just blew my mind I had a similar
06:09reaction to that as as this nikud thing
06:11that that you mentioned phenomenon and
06:14um we we actually had to rein this in
06:16but the model would attempt to impute a
06:18currency based on somebody's accent and
06:20the surrounding context um so if a if a
06:24if a British um you know if you said
06:27like five uh that's that's example um
06:30but like you know certain certain Curr
06:34be well yeah maybe it would do like 5
06:36AUD in that case or something like
06:39amazing and our CEO at the time was um
06:42was was English and so um yeah we we ran
06:44all sorts of tests like having him you
06:46know say say stff because our our fake
06:48fake British accents were were awful but
06:51um yeah anyway that uh on on the
06:54generative side that was discriminative
06:56but on the generative side I'm I'm
06:57always like thrilled to uncover these
06:59these Easter eggs and it's like one of
07:00my favorite things um to to learn about
07:07you yeah TBD I mean I'm happy to share a
07:11sneak peek at the road map um and then
07:14we can talk about um audio prompting
07:17specifically um some of the promises
07:19some of the risks and concerns that that
07:21we're thinking about um so I guess at
07:25the highest level um we're focused on
07:30then control then speed um in in that
07:34so to date like while inpainting is an
07:37example of like a controllability feature
07:39is something that we're we're
07:40researching actively we've up until this
07:44point and we'll continue to do so try to
07:45push the envelope on on quality and get
07:48the songs to a place where they're worth
07:50editing um honestly they were pretty bad
07:54um for a long time I'm a bit of a harsh
07:56critic I I think the songs still leave a
07:59lot to be desired um and so we'll
08:02continue to improve performance before
08:05focusing more on on um you know creating
08:08variations or all these editing
08:10workflows that said like the the control
08:14controllability feature that I'm I'm most
08:15excited for is inpainting um today uh
08:19you know I'll make a song and I love it
08:22and there's a 5-second section in the
08:24middle that's just not quite right and
08:27today there's only one blunt instrument
08:29available to you which is to regenerate
08:32the entire thing and you lose the parts
08:35you didn't like but you also lose the
08:36parts that you loved and so what we want
08:38to do is enable folks to highlight a
08:42specific section within their song and
08:43then iteratively prompt um and and kind
08:46of nudge that in the right direction uh
08:49so we just call this inpainting pulling
08:52um you know analogies and inspiration
08:54from from like Midjourney um or
08:56Playground or or some of the the huge
08:59gen roads and making um feedback loops a
09:02little bit tighter from idea to waveform
09:05uh you know six months ago it took a
09:08minute or two to generate something like
09:10a 30 second clip um with the new
09:12streaming architecture you can get um
09:14the song to start playing in in around
09:16eight seconds um which is a huge
09:18difference and so I think continuing to
09:21push things closer toward real
09:23instruments where I want to strum the
09:26guitar and it happens instantly um I
09:28think that that's the north star for
09:30these models as well that they have the
09:32tightest possible feed feedback loops
09:34between your idea to the sound you're
09:39um as for voice prompting um and and
09:43prompting with with other modalities
09:45it's it's always struck me as funny that
09:47we limit the prompts for a music model
09:50to text um there's an old Frank Zappa
09:53quote that I love and you may be
09:57um writing about music is like
09:59dancing about architecture and I think
10:02sometimes I personally lack the
10:05vocabulary to describe what I want to
10:08come across in the song um and so we're
10:12very excited about a future where you
10:15can actually hum or perhaps sing um part
10:18of your prompt maybe enhance it with
10:21some text maybe you don't even need it
10:23um and and that is like a big part of or
10:26all of your prompt is actually audio
10:27based audio inputs audio outputs
10:31um the primary design constraint that
10:34we're working with is that we want to
10:35focus The Experience on creating
10:37original music just like the text based
10:39prompts and so um we do want
10:45to avoid enabling non-consensual voice
10:48cloning um I'm sure you saw the fake
10:51Drake stuff from Ghostwriter um and you
10:54know I thought it was a a really catchy
10:56song um but I don't think it's
11:00uh a sustainable future um and so I
11:02thought it was was really neat and and
11:04frankly I enjoyed the song If if I have
11:07to be honest um we're not trying to be
11:09like moral absolutist or too
11:11philosophical here but but I think we're
11:13trying to approach this where we're
11:16hopeful that in the future like AI will
11:18be something that we use with artists or
11:20to create new artists not something that
11:22you do to artists so from Drake's
11:25perspective or Travis Scott which was
11:27another one of the songs like
11:29it can't feel good to wake up and and
11:33you know hear a song using your likeness
11:35that you didn't sanction and so all this
11:40Grimes is a different one you know and
11:42like all the Elf.Tech stuff is amazing
11:44um and and super interesting I mean
11:45Holly Herndon was was a pioneer here but
11:49to your point like night and day
11:50difference between like opt-in versus um
11:54uh very much um an unwilling participant
11:57in their own music and so or their own
12:00is where the quotes should go not not the
12:02music but um that's all to say like we
12:06are very excited about features such as
12:08like humming a melody and having it come
12:10out in a full symphonic orchestra or
12:13tapping your pencil on the table and
12:14turning it into hi-hats or the beat
12:16for your hip-hop loop um you know
12:19something like this I think is is a huge
12:21Tailwind for creation what we want to
12:24avoid is somebody playing Taylor Swift
12:26over their iPhone into the Suno input
12:29field um and and setting up our users um
12:34for for ourselves and our users up for
12:37for trouble from a copyright perspective
12:40so um that's all to say it's something
12:43we're we're thinking about actively
12:44we're starting to design our mobile
12:46experience and I think that form factor
12:48will lend itself nicely to audio inputs
12:53desktop um but we're approaching this
12:56very thoughtfully very carefully um um
12:59where we want this to be a tool for
13:01Creative expression um but but not at
13:05the expense of um you know the canon of
13:11music me as you point out it's such a
13:14rapidly evolving extremely Dynamic area
13:16of the law um especially across
13:18countries and it's been amazing to see
13:20how Global the user base is but that's
13:22just more markets um that we um you know
13:25need to take the various local copyright
13:29um into account I will say like what we
13:31do given that this is a moving Target a
13:34little bit is that we try to hold
13:36ourselves to a super high standard
13:37especially on the output side and so um
13:41that's why we've started blocking um
13:44folks from from inputting lyrics to
13:47existing songs within their prompts and
13:50we've actually gone a step further um
13:54and you know candidly we we've
13:57antagonized some users um doing this but
14:00we we don't allow folks to prompt within
14:02the style field um like in the style of
14:06artists um and so what we're really
14:09focused on is making sure that the
14:12resulting songs from Suno are not
14:14infringing on existing rights holders'
14:18rights um and so we've we've continued
14:21to to pour more time energy money um
14:25into making sure that the songs that
14:27come out of Suno are totally new de
14:30novo works um and then yeah to your
14:33point like super super evolving
14:35questions I think we we try to approach
14:37this super pragmatically um we're
14:40gonna um you know train train the best
14:43models we can with the data that we're
14:45allowed to use um and so that's that's
14:47kind of the the posture um but we're
14:49watching all this closely um lots to be
14:52decided by the courts and we'll adapt
15:01yep it's all it's all homegrown and as
15:03I'm sure you understand um you know
15:05these these things are primarily data
15:08and of course algorithmic Innovations
15:11but but the data is a huge source of
15:12edge for us and so we we don't disclose
15:15the specific training regimes for the
15:19models uh the model certainly not um and
15:22to our knowledge not the data sets
15:25either but um yeah I think everybody is
15:27is um yeah that it's fully homegrown
15:33Suno um there no there there are a few
15:36other so I mean there have been a lot of
15:38non-lyrical or or alyrical ones so like
15:41even even Stable Audio and and some of
15:43the ones that you you called out MusicGen
15:44um Riffusion um is is a newer um
15:49believe a little smaller company who who
15:51does do lyrical music generation Uberduck
15:53um although I believe limited to
15:56English perhaps rap only
15:59um so yeah there there are some some
16:02some folks around I do think we're at an
16:04inflection point where I expect um the
16:07space will heat up I think um I think
16:11more of these tools will emerge I'm
16:14hopeful that Suno will be um you know
16:19of the future of those tools um we're
16:23we're trying hard to to to be part of it
16:26um but yeah I think I think you're going
16:27to see a lot of a lot of companies and
16:29and um entering the space and and
16:32becoming more interested in
16:37this yeah um I guess for for starters we
16:40have we have a lot of um love and
16:42appreciation for for Discord it
16:44was very helpful and continues to be
16:47very useful for bootstrapping a
16:49community um I think the the primary
16:53consideration from a product perspective
16:55was the ability to just have Your Design
16:59constraints um when when building out
17:04experience um and then also to to your
17:08um before you even get there I think
17:11there is a a barrier to entry for folks
17:14to to to use your tool via via Discord
17:18um my family for example I don't think
17:21had played with Suno until we we um had
17:24the web app um it was kind of pulling
17:29think whenever you're you know trying to
17:31get a tool in people's hands it's
17:33important to take that into
17:34consideration where Discord is
17:36incredible is um having a community
17:42essentially um folks sharing notes on
17:45best practices for for prompting and
17:47stuff and I do think that some of our
17:50heaviest um you know you know some of
17:54our most valued power users um prefer to
17:56use the model via Discord even today
17:58today um that said like um yeah web has
18:02opened up a lot of new opportunities for
18:06and remains to be seen if if this is
18:09true but I'm hopeful that a mobile
18:11experience will um will will be similar
18:16uh to to web so it's also something
18:18we're thinking about is what would a Suno
18:22experience on on your your iPhone or
18:24your Android um look and and feel like
18:29what unique opportunities does that form
18:31factor unlock that you can't do on
18:35Discord um and those are things that
18:37we're we're thinking about a lot I guess
18:39last thing I'll say is for for some time
18:42like it's been really neat to see this
18:44this Evolution like originally people
18:46were trying Suno to
18:48see what music AI could make they had
18:51played with ChatGPT they had used Midjourney
18:53and they wanted to see where the
18:55AI was at in terms of music um we're
19:00moving into new user cohorts and and
19:02groups where they're coming to Suno to
19:05see what music they can make as a person
19:07and they don't particularly care about
19:09AI um and it's becoming more of the how
19:12than the what and I think that's been
19:16really exciting to see is how um as we
19:20now move away from uh you know that that
19:23sort of white hot Center of people who
19:25are already interested in AI also
19:27interested in music and live on Discord
19:29um sort of that initial user Persona um
19:32into folks who are you know just
19:35interested in in music and and and have
19:38a less lesser interest in in AI and
19:41they're not on Discord I think it's been
19:43like really helpful to have these other
19:45uh channels for folks to play with the
19:46tools and um and and yeah we we get a
19:51ton of interesting feedback from those
19:52different communities yeah it's it's
19:55been cool just to see folks who are less
19:57deep in AI and um how how they're
20:00using the tools get their feedback and
20:02and and all that I think that'll be uh a
20:05big part of our story for for this
20:12uh um we we were we were really excited
20:16about the the partnership and and
20:17continue to be um I think there's a lot
20:19we can do together um it's it's just fun
20:23I I will say I think it I think it
20:25brings a little bit of levity to to um
20:27it it's a it's a cool additional
20:29dimension to the Copilot experience
20:31I'll say um and for us it's it's just
20:34been you know we have a ton of respect
20:38um that platform obviously and um it's
20:41it's just been fun to to chat with those
20:43folks about like you know their their
20:45thoughts on on the product and the
20:46experience as well but yeah it's been um
20:49I think it's it's been pretty pretty
20:50lowkey um it's it's been cool to see
20:53again the different use cases that
20:54present themselves on that platform
20:56versus our um you know Discord or web
20:59app and stuff but um yeah I think
21:02um I think that was really cool I think
21:05um you know that was that was just more
21:07of the the POC it'll be neat to see these
21:09things roll out potentially more more
21:11broadly we we'll see um where where
21:13things go but um yeah just just a lot of
21:18fun honestly I know it's uh not very
21:20maybe not the most intellectual thing to
21:22say but it's uh that's kind of the the
21:25primary we we've really enjoyed it
21:32yeah I think that's the the body of work
21:33is um let's let's figure out ways to
21:36confirm it's the the user's voice and I
21:39think there there are ways to do this
21:40that we're we're exploring um but I'm
21:43personally very very bullish on or
21:45excited about I I have a terrible voice
21:49um the ability to kind of sing into my
21:52phone or even hum into my phone and and
21:55um get essentially like the Next
21:57Generation of of autotune and you know
22:00you you'll never reach the quality or
22:02the the soul or the emotion of um you
22:07know Adele or or somebody like this or
22:09you know some of my favorite singers
22:10don't have conventionally beautiful
22:11voices um but they they hit you in
22:15certain places Neil Young being a
22:18special example of that for me where um
22:20you know Bob Bob Dylan is is another you
22:22know surface level not not beautiful
22:25but exactly um and so you know I don't
22:28think like Beauty um so to speak is is
22:31the only ingredient um I think I think
22:33soul and delivery has a huge component
22:36of it but um I think um it's it's
22:39something we're thinking about is is
22:42yeah that that full spectrum of of
22:44product experiences um where you can
22:47again prompt with with audio as well as
22:51modalities um whilst like ensuring that
22:54they're original um original works that
22:57are coming coming out of um out of the
23:00engine so yeah I I think the core Tech
23:03is there it's more like product
23:05guidelines and um and and designing this
23:09like super thoughtfully um that's that's
23:12really important to us in more of the
23:18question yeah I I do I do have a
23:20question for you I guess the main other
23:22thing I would share is like we
23:25um I mentioned that shift from
23:28it's it's been really neat to see folks
23:31who are really excited about AI already
23:33continuing to use the the tool in that
23:34Community growing additionally we're
23:37we're starting to experience folks who
23:40are playing with AI kind of for the
23:42first time music lovers and stuff and so
23:45we will be um we'll be doing a series of
23:48experiences I think this year where um
23:52the question why would I make a song
23:54instead of listen to all the great music
23:56out there is sort of implicitly answered
23:58um as is what should I make a song about
24:01because a lot of times people get into
24:03the app and then you're presented with
24:05the blinking cursor problem where it's
24:09not clear what you where you should even
24:11start and so um one example this will
24:14will come out in two weeks um the the
24:18Val a Valentine's Day experience um
24:21which will be a little simpler a bit
24:23more guided um more of like a uh you
24:27know a different user experience um that
24:30uh enable folks to make you know love
24:32songs and and Valentine's Day songs for
24:34for their crushes and um friends and
24:37family and stuff so um a little bit
24:41different than than the core experience
24:42but excited for some of these um drops
24:46or or side experiences that'll sort of
24:49let folks dip a toe in and experience
24:53the tech um in a little bit more of a
24:56whimsical playful um experience um and
25:01so I'm I'm really looking forward to
25:02those I think they're going to be a lot
25:04of fun um yeah as for questions like you
25:08mentioned uh you expressed a lot of
25:11interest in in in voice cloning like
25:14having played with the tool um what else
25:17um jumps to mind that would you know
25:21exciting uh and and Powerful for
25:28thank you yeah I I would love to I would
25:30love to hear their their thoughts I
25:31think um yeah it's um you you said
25:35something a lot of interesting things
25:37there but I I think we we are like
25:40really really laser focused on the
25:42consumer experience um it's been really
25:44cool to see a handful of quite a few now
25:48actually Grammy winning artists using
25:49the tool um but the the the core user
25:54that we're focused on right now is folks
25:55who are experiencing making music for
25:56the first time and so that's folks like
25:59you um I think there's a spectrum here
26:02of total first-time user to like Beyoncé
26:05um and we're we're definitely
26:08skewing toward toward consumer um for
26:11For Better or Worse um I'm I'm
26:14personally very excited about it um but
26:17um yeah I think some of these things
26:19will be addressed in in V3 which which
26:21I'll leave you with this um it's it's
26:23coming very soon this this next version
26:24of the model it's trained we're testing
26:26it internally um the
26:30primary improvements will
26:33be um just better better quality
26:38generally um Improvement to the to the
26:40sonics I I grew up in a a small town in
26:43the rural South and we were kind of out
26:46of range of the radio station in the
26:48city and as you drive closer it would
26:50kind of come into into quality and I
26:53feel like each successive model
26:55generation of Suno is getting closer to to
26:57that proverbial city um so better audio
27:01quality one two is better prompt
27:03adherence um so primarily that means
27:06lyrics sometimes today you'll put in
27:08lyrics and the model will just be like
27:10nah I'm not singing
27:11that and that is very frustrating and
27:14that bothers me deeply um so that will
27:19be addressed um improved not fixed um
27:22but but a step in the right direction
27:24song structure tags um will um work
27:28better things like intro outro Verse
27:31Chorus um I bucket that under prompt
27:34adherence so the songs will have like a mo
27:36more coherent like structure rather than
27:39just dropping you into a chorus and then
27:42abruptly stopping um which happens right
27:45now so you can expect more build more
27:51down very frustrating to me uh because
27:53you can't land the plane um and so um
27:58that's that's coming and then um yeah I
28:02would say I'd say those are those are
28:04the the big ones and then part of the
28:06last one I suppose is continuation
28:07consistency I but between us I I I quite
28:11dislike the current continuation
28:13workflow I think it's very clunky I
28:14think it is uh not elegant I don't think
28:17it works um as well as it should um and
28:21so improved continuation
28:23consistency and then as part of that
28:25like I think we'll continue to
28:28revisit if clicking the ellipses to the
28:31right of the clip pressing continue
28:33typing in your new lyrics like if if
28:35that's really the optimal user
28:37experience because um I think in fact I
28:40know we can uh do something a bit bit
28:43cleaner easier to use and so um I think
28:46the UX around continuation will will change
28:48uh to make it a little bit easier to
28:50extend your favorite songs um so yeah I
28:53mean similar to Midjourney it's early
28:55days like um if you rewind uh you know
28:58a year on on Midjourney I think they
29:00were on V2 or V3 or something and so um
29:04I think that's like the the perfect
29:05analog is these things will not be fixed
29:07they will be improved um it's it's you
29:10know exciting though to follow that
29:13directional arrow of progress and and uh
29:15you know super exciting to see where
29:26actually um I I think no I think I think
29:31that would be very exciting um I would
29:34be that would be very cool I I would be
29:36thrilled uh to to see that what would
29:41excited is if there's just line in a
29:45song that the greatest producer of all
29:47time is bagging groceries or something
29:51and it's it's this like idea of untapped
29:54potential um and so I think what's more
29:56exciting to me and the reason I'm really
30:00energized by focusing on on the consumer
30:03side of the barbell is if we could
30:07make folks artists who otherwise never
30:10would have been so you hear a lot of
30:14people who got into programming because
30:15they were playing video games and they
30:17wanted to hack the experience and they
30:19sort of realized only later that they
30:21were programming and so similarly here
30:25like rather than focusing on you know um
30:29helping amazing artists win win more
30:31Grammys and stuff I think what would be
30:34really cool to me is
30:35that initial spark where we change mind
30:40shift where it's like maybe I could make
30:41music for people who otherwise would
30:43have just been passive consumers for for
30:45their whole life the identity shift for
30:48the non-musician to be like no I could
30:51maybe that's available to me like
30:53changing the menu of options for people
30:56I think is is really exciting and you
30:59know that doesn't necessarily mean they
31:00have to use Suno as their only
31:02instrument it's more of the identity
31:03shift of like I can listen to music I
31:06enjoy listening to music but I also can
31:09make music and that's another arrow in
31:10my quiver of of life um and so I think I
31:13think that would be what would be most
31:15exciting to me is like doing that for as
31:21possible cool well thank thank you for
31:23your time uh let's do this again
31:25sometime it was it was great to talk to
31:26you and uh hear your perspective and and
31:29everything and looking forward to