WO2020121225A1 - Automated music production - Google Patents
Automated music production
- Publication number
- WO2020121225A1 (PCT/IB2019/060674)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- music
- musical
- audio
- production
- segments
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H7/00—Instruments in which the tones are synthesised from a data store, e.g. computer organs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0033—Recording/reproducing or transmission of music for electrophonic musical instruments
- G10H1/0041—Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
- G10H1/0058—Transmission between separate instruments or between individual components of a musical system
- G10H1/0066—Transmission between separate instruments or between individual components of a musical system using a MIDI interface
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/056—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/111—Automatic composing, i.e. using predefined musical rules
- G10H2210/115—Automatic composing, i.e. using predefined musical rules using a random process to generate a musical note, phrase, sequence or structure
- G10H2210/121—Automatic composing, i.e. using predefined musical rules using a random process to generate a musical note, phrase, sequence or structure using a knowledge base
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/151—Music Composition or musical creation; Tools or processes therefor using templates, i.e. incomplete musical sections, as a basis for composing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/571—Chords; Chord sequences
- G10H2210/576—Chord progression
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/011—Files or data streams containing coded musical information, e.g. for transmission
- G10H2240/016—File editing, i.e. modifying musical data files or streams as such
- G10H2240/021—File editing, i.e. modifying musical data files or streams as such for MIDI-like files or data streams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/075—Musical metadata derived from musical analysis or for use in electrophonic musical instruments
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/075—Musical metadata derived from musical analysis or for use in electrophonic musical instruments
- G10H2240/085—Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/131—Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
Definitions
- This disclosure relates to automated music production, in which music is produced in digital musical notation format and rendered therefrom into audio format.
- AI: artificial intelligence
- An AI music production system provided by a related technology offers users significantly more choice and flexibility in the control they have over the production of AI music, which extends for example to control over its composition and/or arrangement by sophisticated AI engines. Users can cause tracks in multiple musical styles and of high musical quality to be generated according to customised music production parameters, which drive various aspects of the AI composition and arrangement processes in a simple, effective and intuitive manner.
- The rendering process in particular can cause a significant delay between the time at which the user provides the music production parameters and the time at which the audio render becomes available for listening.
- a significant factor is the computing resources that are required to render digital music (i.e. music in digital musical notation format) into a self-contained audio render.
- multiple virtual instruments (synthesisers) are used in conjunction with various digital effects processing in order to achieve an overall musical effect.
- Careful musical variation is generally needed to achieve the desired musical effect, which typically means introducing some form of time-modulation into the settings applied by the virtual instruments, effects etc. (referred to as automation).
- automation generally means that, in order to facilitate customisation, custom tracks will have to be rendered from “scratch” by driving the virtual instruments and digital effects individually according to events encoded in the digital track to be rendered. This is time consuming as a consequence of the significant computational resources that are required. For example, it may take several minutes to fully render a digital track of even medium complexity into audio format.
- the present invention allows a “preview” audio render of a digital track to be generated extremely quickly before a “full” audio render of that track is available.
- the full audio render is generated in the manner described above, whereas the preview track is an approximation created using pre-generated music segments stored in audio format. Audio data of the pre-generated music segments are sequenced in order to provide a reasonable and informative approximation of the final audio render before the generation of the latter has completed (and, in some embodiments, before it has even begun).
- a first aspect of the present invention provides a computer-implemented method of rendering music into audio format, the method comprising: receiving, at a music production system, one or more music production parameters defined by a user for producing a custom piece of music; using the user-defined music production parameters to produce a custom piece of music in digital musical notation format; rendering the custom piece of music into audio format for outputting to the user; and creating, for outputting to the user before the rendering of the custom piece of music has completed, a preview audio render thereof, using pre-generated music segments stored in audio format, the music segments having been generated by producing multiple sections of music according to different predetermined music production parameters, and rendering the multiple sections of music into audio format.
- the pre-generated music segments are stored in association with production metadata indicating the predetermined music production parameters used to produce them.
- the preview render is created by matching sections of the custom piece of music to different ones of the pre-generated music segments, based on the user-defined music production parameters and the production metadata, and sequencing audio data of the different pre-generated music segments, the preview render comprising the
- lower-level music production parameters may be determined in dependence on these (such as musical parts for different sections), which can be used to produce the custom piece of music and which can also be compared to corresponding low-level parameters used to generate the music segments as indicated by the production metadata, in order to carry out the above matching.
- the step of rendering the custom piece of music into audio format may be instigated in response to a full render request received from the user after the preview render has been made available for outputting to the user. This provides the user with an opportunity to listen to the preview before deciding whether or not to instigate the full rendering process (for example, having listened to the preview render, he may decide to make further changes to the music production parameters). This is beneficial in terms of system resources, as it prevents resources from being consumed in generating full audio renders that are unwanted.
- At least two of the pre-generated music segments may differ in at least one of the following respects: musical dynamics, duration, tempo, melodic content, musical parts, combination of musical parts, musical function of musical parts, or instrument or sound characteristics. That or those differences may be indicated by the music production metadata.
- the music segments may have been generated initially in digital musical notation format and rendered therefrom into audio format.
- At least two of the music segments in digital musical notation format may have different: automation settings, dynamics settings, instrument or sound settings, symbolic music data, digital effects settings. That or those differences may be indicated by the music production metadata.
- the one or more music production parameters comprise at least one of: a duration for the custom piece of music, and a timing for a musical event (such as a concentration of musical intensity) within the custom piece of music - which may be referred to herein as a “sync point”.
- the one or more user-defined music production parameters may be processed by at least one music production component of the music production system so as to autonomously determine one or more further music production parameters based thereon, wherein the one or more autonomously-determined music production parameters are used to produce the custom piece of music, and the sections of the custom piece of music are matched to said different ones of the pre-generated music segments by comparing the autonomously-determined music production parameters with the production metadata.
- the one or more autonomously-determined music production parameters may comprise at least one of: one or more musical part parameters for the custom piece of music, one or more section parameters defining the sections of the custom piece of music (e.g. their durations and/or relative ordering), and one or more composition settings for the custom piece of music (e.g. for selecting, from a set of available probabilistic sequence models of a composition engine, one or more probabilistic sequence models for autonomously composing music for at least one section of the custom piece of music).
- the one or more user-defined music production parameters may indicate one or more user requirements (such as a user-defined track duration and/or one or more sync points (each) having a user-defined timing), and further music production parameter(s) may be determined for producing the custom piece of music in a way that conforms to the user-defined duration and/or sync point(s) (the user’s requirements).
- the further music production parameter(s) may for example define the sections of the custom piece of music (e.g. their ordering and/or duration) and/or one or more musical attributes of each section (e.g. its musical parts, a musical function of musical parts, its musical type such as verse, chorus, middle eight etc.) in a way that satisfies the user requirements and makes sense musically.
- the music production component may for example be an artificial intelligence music production component.
- the sections of the custom piece of music may be matched to said different ones of the pre-generated music segments by comparing the user-defined music production parameters with the production metadata.
- the custom piece of music may exhibit musical variations determined from the music production parameters, and the sections are matched to the pre-generated music segments to approximate the musical variations in the preview render.
- the musical variations may be determined from an intensity curve defined by the music production parameters and the sections are matched to the pre-generated music segments by matching a portion of the intensity curve in each section to at least one of the pre-generated music segments.
- At least one of the music segments may have been generated with a flat intensity curve having a single intensity value across a duration of that music segment. Further or alternatively, at least one of the music segments may have been generated with a time-varying intensity curve having different intensity values within a duration of that music segment.
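- By way of illustration only, the matching of intensity-curve portions to pre-generated segments could be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `Clip` type, the `match_sections` function, the mean-intensity sampling and the nearest-value rule are all hypothetical, and every clip is assumed to have been generated with a flat intensity curve.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical pre-generated segment with flat-intensity production metadata."""
    name: str
    intensity: float  # single intensity value the clip was generated with


def match_sections(sections, curve, clips, samples=8):
    """Match each (start, end) section to the clip whose flat intensity is
    closest to the mean of the intensity curve over that section.

    `curve` is any callable mapping time in seconds to an intensity value.
    """
    matched = []
    for start, end in sections:
        width = (end - start) / samples
        # Sample the curve at evenly spaced midpoints and average the samples.
        mean = sum(curve(start + (i + 0.5) * width) for i in range(samples)) / samples
        best = min(clips, key=lambda c: abs(c.intensity - mean))
        matched.append(best.name)
    return matched


if __name__ == "__main__":
    clips = [Clip("low", 0.2), Clip("mid", 0.5), Clip("high", 0.9)]
    # Assumed example curve: intensity rising linearly over a 30 second track.
    print(match_sections([(0, 10), (10, 20), (20, 30)], lambda t: t / 30.0, clips))
```

A segment generated with a time-varying intensity curve would need a richer comparison (for example, comparing sampled values point by point), which this sketch does not attempt.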
- the musical variations are introduced by varying at least one of the following: dynamic settings, automation settings, tempo, composition settings, musical parts, a musical function of at least one musical part, and instrument or sound settings.
- the custom piece of music may be a custom arrangement of an existing piece of music and the music production parameters may comprise one or more arrangement parameters for determining the custom arrangement.
- the custom arrangement may be an arrangement of pre-determined music elements.
- At least one of the predetermined music elements may be re-composed automatically, by a composition engine of the music production system, to fit the custom arrangement.
- the music production parameters may comprise one or more composition parameters and a composition engine of the music production system may be caused to autonomously compose at least one music element for use in at least one section of the custom arrangement based on the composition parameters.
- the music production parameters may comprise one or more performance parameters which are used to introduce performance variation into at least one section of the custom arrangement.
- Another aspect of the invention provides a particularly efficient mechanism for creating an audio render of a custom piece of music, in which an overall music production process is implemented in a distributed fashion between a local user device and a remote music production system in communication therewith.
- a user device for creating an audio render of a custom piece of music comprises: a network interface for communicating with a remote music production system; a user interface for receiving user inputs from a user of the user device; memory; one or more processors configured to execute computer-readable music production code which is configured, when executed, to cause the one or more processors to carry out the following operations: downloading from the remote music production system and storing in the memory of the user device a set of pre-generated music segments in audio format; processing user inputs received at the user interface to determine at least one music production parameter for producing a custom piece of music; generating and transmitting to the music production system at least one electronic message comprising the at least one music production parameter; receiving from the music production system arrangement instructions for creating at the user device an audio render of the custom piece of music; and sequencing audio data of at least two of the pre-generated music segments according to the arrangement instructions, and thereby creating an audio render of the custom piece of music comprising the sequenced audio data.
- a further aspect of the invention provides a computer program product comprising computer-readable code stored on a non-transitory computer readable storage medium, which is configured, when executed on one or more processors, to carry out any of the above operations.
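- The final sequencing operation at the user device could, purely as an illustration, look like the sketch below. The instruction format (a list of dictionaries with a `clip_id` key and an optional `length` key) and the function name `sequence_clips` are assumptions made for this example; the disclosure does not specify a wire format. Audio data is represented as plain lists of samples to keep the sketch self-contained.

```python
def sequence_clips(clip_store, instructions):
    """Concatenate locally stored audio clips in the order given by the
    arrangement instructions.

    clip_store   -- dict mapping a clip id to its list of audio samples
                    (the previously downloaded pre-generated segments)
    instructions -- list of dicts, each with a "clip_id" and, optionally,
                    a "length" (number of samples to use from that clip);
                    both key names are hypothetical
    """
    render = []
    for step in instructions:
        samples = clip_store[step["clip_id"]]
        length = step.get("length", len(samples))
        render.extend(samples[:length])
    return render


if __name__ == "__main__":
    store = {"intro": [0.0, 0.1, 0.2], "chorus": [0.9, 0.8]}
    plan = [{"clip_id": "intro"}, {"clip_id": "chorus"}, {"clip_id": "intro", "length": 2}]
    print(sequence_clips(store, plan))
```

Because the clips are already held in local memory, only the short instruction list has to cross the network for each new arrangement.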
- Figure 1 shows a schematic block diagram of a music production system
- Figure 2 shows how an incoming job request may be handled by a music production system
- Figure 3 shows a high-level overview of a music production system with the core system components arranged in a stack;
- Figure 5 illustrates one example architecture of a composition engine for generating music segments for multiple musical parts
- Figure 6 shows a flow chart for a method of generating a track in response to a request from a user
- Figure 7 shows a schematic illustration of a possible structure of a settings database
- Figure 8 illustrates a hierarchical selection mechanism for selecting track settings
- Figure 9 shows a schematic block diagram of an application programming interface
- Figure 10 shows a flow diagram illustrating a method of editing a musical track
- Figure 11 shows a schematic block diagram of a music production system which incorporates preview rendering functions
- Figure 13 illustrates by example the principles by which musical variations may be approximated in a preview render
- Figure 14 illustrates by example the principles by which a preview render may be created
- Figure 16 shows a system architecture in which preview rendering functions may be implemented effectively.
- music is pre-generated in a music production system in a way that can be arranged subsequently in a customizable fashion.
- a pre-generated piece of music may be referred to herein as a track or song.
- a track comprises a set of pre-generated music segments in digital musical notation format that can be arranged in different ways.
- an “audio clip pack” is created - that is, a number of musical segments, but in audio format (audio clips) that can be combined later.
- metadata describing the segment is stored (production metadata). That metadata describes how the segment is permitted to be used in a musical arrangement.
- a user can subsequently select any pre-generated track via a front-end interface provided by the music production system (such as a website, API etc.). Once selected, they can input certain user-set parameters, such as (i) track duration and (ii) point(s) at which the music should climax (sync points).
- These user-set parameters are passed to a back-end of the music production system, where an arrangement is automatically created that satisfies the parameters.
- the metadata associated with each segment - which describes how that segment can be used in the arrangement - is, at this point, used to create an arrangement that uses the segments in ways they are permitted to be used.
- the term “arrangement” refers to a set of instructions for sequencing audio data of selected audio clips (preview rendering instructions 132 in Figure 11 - see below). In the described examples, this is a two-step process in which an arrangement envelope is generated based on the user-set parameters, and then a selection of the audio clips is made, using the associated metadata, to fit the arrangement envelope.
- the user-set parameters and the metadata are used in combination to select audio clips for the preview arrangement and determine an order in which the selected audio clips are to be sequenced. That is, the metadata is used to determine an order in which audio data of the audio clips are to be sequenced, so that the clips are not used in orders that don't work musically.
- the requisite segments from the audio clip pack are then sequenced in the order the arrangement dictates. This provides a preview of the track that the user can listen to virtually immediately.
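- The two-step process described above (an arrangement envelope derived from the user-set parameters, followed by a metadata-constrained selection of clips) might be sketched as follows. Everything concrete here is an assumption for illustration: the linear fall-off of the envelope around the sync point, the fixed section length, the `may_follow` metadata field and the greedy selection are not taken from the disclosure.

```python
def build_envelope(duration, sync_point, section_len):
    """Step 1: one target intensity per section, peaking near the sync point.

    Intensity falls off linearly with the distance between the section
    midpoint and the sync point (an assumed, deliberately simple shape).
    """
    count = int(duration // section_len)
    envelope = []
    for i in range(count):
        midpoint = (i + 0.5) * section_len
        envelope.append(max(0.0, 1.0 - abs(midpoint - sync_point) / duration))
    return envelope


def select_clips(envelope, metadata):
    """Step 2: greedily pick, for each section, the permitted clip whose
    intensity is closest to the target.

    metadata maps a clip name to {"intensity": float, "may_follow": set};
    a clip may open the arrangement if None is in its may_follow set.
    """
    order, previous = [], None
    for target in envelope:
        allowed = [n for n, m in metadata.items() if previous in m["may_follow"]]
        if not allowed:
            raise ValueError(f"no clip is permitted to follow {previous!r}")
        previous = min(allowed, key=lambda n: abs(metadata[n]["intensity"] - target))
        order.append(previous)
    return order


if __name__ == "__main__":
    pack = {
        "intro": {"intensity": 0.3, "may_follow": {None}},
        "verse": {"intensity": 0.6, "may_follow": {"intro", "verse", "chorus"}},
        "chorus": {"intensity": 0.9, "may_follow": {"verse", "chorus"}},
    }
    env = build_envelope(duration=40, sync_point=30, section_len=10)
    print(env, select_clips(env, pack))
```

A greedy pass can reach a dead end where no clip is permitted next, which is why the sketch raises an error in that case; a search with backtracking would be one alternative.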
- the AI music production system can use AI to compose and/or produce original music.
- a feature of the above AI music production system is an application programming interface (API) that gives developers access to the full power of the AI composition and production system, allowing a user to automatically create professional quality, customised music at scale. It is noted however that the invention is not contingent on the provision of an API.
- An API is one mechanism by which a user can provide custom music production parameters (composition, arrangement etc.) for processing within the music production system. However, in general these can be provided by any suitable mechanism (such as a Web interface). It will therefore be appreciated that all descriptions below in relation to the API apply equally to a context in which the music production parameters are provided by some other means.
- the described API is an API for audio and MIDI. That is, with the API a user can generate both audio files and their underlying compositions in MIDI format. This description focuses on the audio generation aspects, which allow a user to:
- a broad range of applications can be powered using the API, including video creation, games, music making, generating music to accompany visual or other content in a variety of contexts, podcasting and content automation.
- the music of the track is first arranged in digital musical notation format and then rendered into audio format.
- the arrangement can be an arrangement of newly-composed music segments, existing music segments (in an editing context) or a combination of both.
- FIG. 3 shows a block diagram of the AI music production system which gives a high-level overview of some of its core functions that are described in further detail later.
- ML: machine learning
- a composition engine 2 and a production engine 3 broadly represent two core aspects of the system’s functionality. These are shown arranged as layers in a stack, with the composition engine below the production engine to reflect their respective functions. Different possible structures of the stack are described later, but these all broadly follow this division between composition and production.
- MIDI is a standardised and widely used way of representing scores, but the term applies more generally to other formats, including bespoke formats.
- the composition engine preferably operates based on machine learning (ML) as described later.
- The terms “music segment” and “musical segment” are synonymous and refer generally to any segment of music in digital musical notation format unless the format is otherwise specified.
- Each segment can for example be a musical bar, a fraction of a bar (e.g. crotchet, quaver, semiquaver length segments etc.) or a sequence of multiple bars depending on the context.
- a music segment can be a segment within a longer musical score.
- a musical score can be made up of multiple musical parts (corresponding to different performative voices e.g. vocal parts, instruments, left and right-hand parts for a particular instrument etc.).
- each part is generally scored on a separate staff (although a chord part for example could be scored using chord symbols) and viewed from this perspective each music segment could correspond to a bar, a fraction of a bar or sequence of bars for one of the parts.
- The term “MIDI segment” refers to a music segment in MIDI format.
- Individual MIDI segments can be embodied in separate MIDI files or data streams, or different MIDI segments can be embodied within the same MIDI file or data stream. It is also possible to embody MIDI segments for different musical parts within the same MIDI file or data stream, e.g. using different MIDI channels for different parts, as is known in the art. Accordingly, in the following description, MIDI loops and individual segments of a MIDI loop or part may both be referred to as music segments. It will be clear in context what is being referred to.
- a core function of the production engine 3 is taking a set of one or more MIDI segments and converting them to audio data that can be played back - referred to herein as audio rendering.
- a self-contained representation of the track in audio format is referred to herein as an “audio render”. This is a complex process in which typically multiple virtual instruments and audio effects (reverb, delay, compression, distortion etc.) are carefully chosen to render different MIDI segments as individual audio data, which are “mixed” (combined) synergistically to form a final “track” having a desired overall musical and sonic effect or “soundscape”, where the track is essentially a musical recording.
- the role of the production engine is analogous to that of a human music producer and the production engine can be configured based on expert human knowledge. However, in use, the production process is an entirely automated process driven by a comparatively small number of selected production parameters.
- the production engine is also an AI component, and can be implemented either as an expert (rules-based), non-ML system, an ML system or a combination of rules-based and ML processing.
- One key service provided by the system is the creation of a piece of music, in the form of an audio track (e.g. WAV, AIFF, mp3 etc.), “from scratch”. This involves the composition engine creating MIDI segments that form the basis of the track, which is then produced by the production engine by synthesising audio parts according to the MIDI segments and mixing them in the manner outlined above.
- This is referred to herein as a “full stack” service.
- a benefit of the system architecture is its ability to offer individual parts of the functionality of the production engine or the composition engine as services.
- One such service is “production as a service”, whereby a composer can provide to the system MIDI segments that he has composed, where in this context it is the AI system that assumes the role of producer, creating a finished audio track from those MIDI segments.
- This offers the functions of the production engine as a standalone service and is essentially the opposite of MIDI as a service.
- Production as a service is particularly useful for composers who lack production skills or inclination.
- Another important service in the present context provides the ability to edit/re-arrange existing tracks/compositions.
- All of the services can be accessed via an access component 14 in the form of an application programming interface (API), such as a web API, whereby API requests and responses are transmitted and received between an external device and an API server of the system via a computer network such as the Internet.
- the access component 14 comprises a computer interface to receive internal and external requests as described later.
- composition refers to the creation of the essential musical elements that make up a track, which are then arranged to create a piece of music with convincing long-term structure. These can both fall within the remit of a composer, or they can be quite separate stages, and historically this has been dependent to a certain extent on the style of music.
- composition and arrangement can essentially be performed as one.
- composition as it is used herein can refer to composition that incorporates arrangement or element composition depending on the context.
- the composition, arrangement and performance functions can be implemented as essentially standalone functions of the production engine, which take MIDI segments from the composition engine, and arrange and humanise them respectively.
- the MIDI segments could be short loops that are strictly time quantised to fractions (e.g. 1/16 or 1/32) of a bar. These can then be arranged (e.g. according to a verse-chorus type structure), and performance can be added by adding a degree of variation (temporal, velocity, pitch etc.) to approximate an imperfect human performance.
- humanisation in particular is an optional component, and may not be desirable for every type of music (e.g. certain styles of electronica).
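- The quantisation and humanisation steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Note` type, the function names, the 4/4 bar (so that 1/16 of a bar is 0.25 beats), and the sizes of the timing and velocity variations. It is not the method used by the described system.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Note:
    start: float    # onset in beats
    pitch: int      # MIDI note number
    velocity: int   # MIDI velocity, 1..127

def quantise(note, grid=0.25):
    """Snap the onset to the nearest grid position (0.25 beats = 1/16 of a 4/4 bar)."""
    return replace(note, start=round(note.start / grid) * grid)

def humanise(notes, rng, max_shift=0.02, max_velocity_change=8):
    """Add small random timing and velocity variation to approximate a human performance."""
    humanised = []
    for note in notes:
        start = max(0.0, note.start + rng.uniform(-max_shift, max_shift))
        velocity = note.velocity + rng.randint(-max_velocity_change, max_velocity_change)
        humanised.append(replace(note, start=start, velocity=min(127, max(1, velocity))))
    return humanised

# A strictly quantised loop, followed by an optional humanised version of it.
loop = [quantise(Note(s, 60, 90)) for s in (0.0, 0.26, 0.49, 0.77)]
performed = humanise(loop, random.Random(0))
```

Because humanisation is optional, a caller could simply use `loop` unchanged for styles where strict quantisation is preferred.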
- A possible structure of the composition engine 2 is described below. First, certain underlying principles that feed into the design of the composition engine 2 are discussed.
- a Probabilistic Sequence Model (PSM) is a component which determines a probability distribution over sequences of values or items. This distribution can either be learned from a dataset of example sequences or fixed a priori, e.g. by a domain expert. By choosing an appropriate dataset or encoding suitable expert knowledge, a PSM can be made to reflect typical temporal structures in the domain of interest, for example, typical chord or note sequences in music.
- a PSM can be used to generate sequences according to its distribution by sampling one item at a time from the implied probability distribution over possible next items given a prefix of items sampled so far. That is, each item is selected according to a probability distribution of possible items that is generated by the PSM based on one or more of the items that have been chosen already.
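- As an illustration of sampling one item at a time given a prefix, the following sketch uses a first-order Markov chain over chord symbols. The transition table, its probabilities and the function names are hypothetical; a learned PSM could condition on more of the prefix than the last item.

```python
import random

# Hypothetical first-order Markov chain: each entry maps the previous item
# to a probability distribution over possible next items.
TRANSITIONS = {
    "C": {"F": 0.4, "G": 0.4, "Am": 0.2},
    "F": {"C": 0.5, "G": 0.5},
    "G": {"C": 0.7, "Am": 0.3},
    "Am": {"F": 0.6, "G": 0.4},
}

def next_distribution(prefix):
    """Distribution over next items given the prefix (here only the last item is used)."""
    return TRANSITIONS[prefix[-1]]

def sample_sequence(start, length, rng):
    """Generate a sequence by repeatedly sampling the next item given the items so far."""
    sequence = [start]
    while len(sequence) < length:
        dist = next_distribution(sequence)
        items = list(dist)
        weights = [dist[item] for item in items]
        sequence.append(rng.choices(items, weights=weights, k=1)[0])
    return sequence

progression = sample_sequence("C", 8, random.Random(0))
```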
- the items are music segments, which may for example correspond to a fraction of a bar (e.g. 1/16, 1/32 etc.) at the level of the composition engine but which can be segments of any length depending on how the PSM is configured. Each music segment can for example correspond to an individual note or chord at a particular point in the sequence.
- examples of PSMs include Markov chains, probabilistic grammars, and recurrent neural networks with a probabilistic final layer (softmax etc.).
- a musical event is a complex object that can be described in terms of a potentially unbounded number of aspects or attributes pertaining to the event, including intrinsic properties such as pitch, duration, vibrato etc., but also the event’s relationships with its context, such as the underlying harmony, its position in time, whether a note is higher or lower than the previous note, etc. Focusing on a limited number of these “viewpoints” allows a PSM to focus on capturing the probabilistic structure in certain aspects of musical sequences (in order to obtain a tractable model) whilst leaving others to be dealt with by some other system.
- Two PSMs can be coordinated by sharing one or more viewpoints; for example, values for a viewpoint can be generated from one PSM and fed in as constraints on the sampling space from the other. This vastly reduces the complexity of the modelling problem.
- a modular approach to working with viewpoints means that PSMs can easily be created to model arbitrary combinations of viewpoints, whilst ensuring consistent coordination between the PSMs, both during training and generation.
- A “divide and conquer” approach to solving the complex composition problem is to provide specialised PSMs for particular musical attributes (in particular styles).
- one PSM may specialise in producing chord symbols with durations, and another might specialise in chord symbols together with melody note pitches and durations.
- each PSM can focus on modelling its combination of attributes accurately, leading to high-quality, musically convincing output.
- the loose coupling of PSMs means that they can be used freely in combinations chosen at the point of servicing a composition request, allowing the system to be flexible in the choice of numbers and kinds of parts that can be generated for one composition.
- Certain PSMs can be used in a way which allows the outputs of one to be the (perhaps partial) inputs of another. For example, a PSM over melody notes with chord symbols could be conditioned to match the chord symbol produced by a different PSM. This promotes coherence between parts, and allows the composition engine 2 to take advantage of the modularity of the multiple PSM approach without sacrificing musical quality.
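- The conditioning of one PSM on a viewpoint produced by another can be sketched as a restriction of the sampling space. In this hypothetical example, a melody model holds weights over (chord, pitch) pairs, and the chord value supplied by a separate chord model limits which pitches may be sampled. The weight table and names are assumptions for illustration only.

```python
import random

# Hypothetical melody model: an unnormalised weight for each (chord, pitch) pair.
JOINT_WEIGHTS = {
    ("C", "C4"): 3, ("C", "E4"): 2, ("C", "G4"): 2,
    ("F", "F4"): 3, ("F", "A4"): 2, ("F", "C5"): 1,
    ("G", "G4"): 3, ("G", "B4"): 2, ("G", "D5"): 1,
}

def sample_pitch(chord, rng):
    """Sample a pitch from only those items whose chord viewpoint matches the given chord."""
    allowed = {pitch: w for (c, pitch), w in JOINT_WEIGHTS.items() if c == chord}
    pitches = list(allowed)
    return rng.choices(pitches, weights=[allowed[p] for p in pitches], k=1)[0]

def melody_for(chords, rng):
    """Generate one melody pitch per chord, constrained by the shared chord viewpoint."""
    return [sample_pitch(chord, rng) for chord in chords]

chords = ["C", "F", "G", "C"]   # stands in for the output of a separate chord PSM
melody = melody_for(chords, random.Random(1))
```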
- FIG 4 shows further details of one possible configuration of the composition engine 2 according to the principles set out above.
- the task is divided between multiple neural networks but these could be other forms of PSM as indicated.
- the composition engine 2 is shown having an input 402 and an output 404, which are an internal input and output respectively.
- the composition engine input 402 is configured to receive requests for MIDI segments, each having a job identifier (ID) assigned as described below.
- a key function of the composition engine is generating musically cooperating music segments for different musical parts, which are structured to be performed simultaneously to create a coherent piece of music.
- the MIDI segments can be MIDI “loops” which can be looped (repeated) in order to build up a more complex track. If different MIDI loops are provided for different musical parts, these can be looped simultaneously to achieve the effect of the parts playing together. Alternatively, multiple parts can be captured in a single MIDI loop. However, the principles can be extended such that the composition engine 2 provides longer sections of music, and even a complete section of music for each part that spans the duration of the track.
- Music segment(s) for multiple musical parts can be requested in a single job request. Where different passages of music are requested separately (e.g. verse and chorus), these can be requested by separate job requests, though the possibility of requesting such passages of music in a single job request (e.g. requesting verse and chorus together) is also viable.
- job request(s) correspond to the job requests of Figure 2 (described below), but are labelled 406a, 406b in Figure 4. Note that these job requests could be received directly from an external input of the access component (see Figure 1, below), or be received as an internal job request as explained with reference to Figure 2.
- Each job request comprises the job ID and a set of musical composition parameters, which in this example are:
- composition layer 2 is shown to comprise a plurality of composition modules, labelled 408A and 408B.
- Each composition module is in the form of a trained neural network, each of which has been trained on quite specific types of musical training data such that it can generate music in a particular style.
- the composition modules are referred to as networks, but the description applies equally to other forms of ML or PSM composition module.
- composition parameters in each job request 406a, 406b are used both to select an appropriate one of the networks 408A, 408B and also as inputs to the selected network.
- each of the predetermined styles is associated with a respective plurality of networks.
- Figure 4 shows the first networks 408A associated with a first style (Style A) and the second networks 408B associated with a second style (Style B).
- a composition controller 408 of the composition engine 2 selects an appropriate subset of the networks to service that job request.
- the network subset is selected on the basis that it is associated with the musical style specified in the job request.
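- One way to picture the selection of networks by style is a simple registry lookup. The registry contents, the request fields (`style`, `parts`) and the network labels below are hypothetical placeholders, not names taken from the described system.

```python
# Hypothetical registry: style name -> the networks trained for that style.
NETWORKS_BY_STYLE = {
    "style_a": {"chords": "CN_A", "melody": "MN_A", "harmony": "HN_A"},
    "style_b": {"chords": "CN_B", "melody": "MN_B"},
}

def select_networks(job_request):
    """Return the subset of networks for the requested style and musical parts."""
    available = NETWORKS_BY_STYLE[job_request["style"]]
    # If no parts are named, use every network registered for the style.
    parts = job_request.get("parts", list(available))
    missing = [part for part in parts if part not in available]
    if missing:
        raise ValueError(f"no network for parts {missing} in style {job_request['style']}")
    return {part: available[part] for part in parts}

selected = select_networks({"job_id": "job-1", "style": "style_a", "parts": ["chords", "melody"]})
```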
- multiple parts (e.g. chords and melody) can be requested in the same job request. This applies both to internal and external requests to the composition engine 2.
- MIDI segment(s) generated in response to each job request 406a, 406b are stored in a job database (24, Figure 1) in association with the assigned job ID.
- MIDI segments could be stored in a separate database and all description pertaining to the job database in this context applies equally to the separate database in that event.
- networks associated with a particular style cooperate to produce a plurality of musically cooperating elements. This is achieved by providing outputs of the networks as input to other networks in a hierarchical relationship.
- Figure 5 shows three networks associated with Style A: chord (CN), melody (MN) and harmony (HN), which correspond to the first networks 408A in Figure 4.
- each of the networks CN, MN and HN is shown configured to receive as inputs composition parameters 502 determined by the composition controller 408 of the composition engine 2 in the manner described above. Although shown as the same input, the networks need not receive exactly the same parameters, and each can, for example, receive a different selection of the composition parameters.
- the chords network CN is configured to generate a chord sequence (progression) 504 based on the parameters 502. This need not be MIDI, and could for example be a symbolic chord representation, but it may be convenient (though not essential) to convert it to MIDI for subsequent processing.
- the generated chord sequence is stored in the job database in association with the applicable job ID.
- the melody network MN receives, as input, the generated chord sequence 504 and generates a melody 506 based on the chord sequence 504 and the composition settings 502, to accompany the chord sequence in a musical fashion. That is, the melody 506 is built around the chord progression 504 in the musical sense.
- the generated melody 506 is also stored in the job database 24 in association with the applicable job ID.
- the melody 506 is inputted to the harmony network HN.
- the harmony network HN generates, based on the composition settings 502 and the melody 506, a harmony 508 which it outputs as a MIDI segment, which is a harmonization of the melody 506 in the musical sense.
- the harmonization network HN may also receive the chord sequence 504 as input, so that it can harmonize the melody 506 and also fit the harmony 508 to the chord sequence 504.
- the generated harmony 508 is also stored in the job database 24 in association with the applicable job ID.
- chord sequence 504, melody 506 and harmony 508 can be requested in the same job request, and in that event are stored together in the job database 24 in association with the same job ID.
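- The hierarchical flow from the chords network to the melody network to the harmony network, with each result stored against the job ID, can be sketched as follows. The three network functions are deterministic stubs standing in for trained models, and the in-memory dictionary stands in for the job database 24. All names, the `bars` parameter and the note choices are assumptions for illustration.

```python
job_database = {}   # stands in for the job database: job ID -> stored parts

ROOT = {"C": 60, "Am": 57, "F": 53, "G": 55}   # MIDI note number of each chord root

def chord_network(params):
    # Stub: returns a fixed progression; a trained network would generate one.
    return ["C", "Am", "F", "G"][: params["bars"]]

def melody_network(params, chords):
    # Stub: one note per chord, seven semitones above the chord root.
    return [ROOT[chord] + 7 for chord in chords]

def harmony_network(params, melody, chords=None):
    # Stub: four semitones below each melody note; the chords input is optional.
    return [pitch - 4 for pitch in melody]

def compose(job_id, params):
    """Run the networks in sequence, feeding each output into the next network."""
    chords = chord_network(params)
    melody = melody_network(params, chords)
    harmony = harmony_network(params, melody, chords)
    job_database[job_id] = {"chords": chords, "melody": melody, "harmony": harmony}
    return job_database[job_id]

result = compose("job-1", {"bars": 4})
```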
- the output of each network can be, but need not be, MIDI. It could be some other digital musical notation format, such as a bespoke format (see above). It may be convenient, where the output is not MIDI, to convert it to MIDI later, but this is not essential.
- Networks can also take, as input, external MIDI, such as a user-generated or library MIDI segment and compose around this.
- Another example of input that a network can compose to is percussion, which can be user or ML generated.
- the percussion can for example drive the rhythm of the composed segments, or the emphasis that is placed on certain notes (where emphasis/velocity is handled at the composition engine 2).
- FIG. 1 is a schematic block diagram illustrating one possible configuration of a music production pipeline 1 of the music production system.
- the music production system is organised into four layers or components. It will be evident from the following that there may be some overlap between functionality of the individual layers or components, but the following description illustrates clearly how the generation of a piece of music is organised in the music production system.
- the music production system operates to receive a group of settings, which will be described in more detail later, and generates a piece of music.
- a piece of music is referred to as a ‘track’, but it will be understood that the system can produce music of any length / character.
- the track may be generated as a musical score in a digital musical score notation, such as MIDI, or in audio.
- a conversion layer may be provided within the system which converts a notation score into MIDI. It will be appreciated that this conversion layer could form part of the composition engine itself or could form part of another layer in the system that could receive a score and convert to MIDI for the purpose of using MIDI.
- a production management component (controller) 13 manages the layers of the system in the manner described below.
- the controller 13 handles both internal and external requests, and instigates functions at one or more of the layers as needed in order to service each request.
- Reference numeral 2 denotes the composition engine.
- the composition engine operates to receive a group of settings, which will be described in more detail later, and generates MIDI segments to be arranged and produced into a track. It generates segments of music in a symbolic format, to be arranged and produced into a track. It uses a collection of PSMs to generate the segments of music. These PSMs have been trained on datasets of music tracks chosen to exemplify a particular musical style.
- the composition engine determines which PSMs to employ on the basis of the input settings.
- Reference numeral 4 denotes an arrangement layer.
- the arrangement layer has the job of arranging the MIDI segments, produced by the composition engine 2 into a musical arrangement. The arrangement layer can be considered to operate in two phases.
- in a first phase, it receives arrangement parameters, which will be described later, and produces from those parameters a musical arrangement as an envelope defining timing and required sequences etc.
- the arrangement functionality of the arrangement layer is marked 6.
- This envelope defines the musical arrangement of a piece.
- these settings can be used to request MIDI segments from the composition engine 2, through the production manager.
- a second phase of the arrangement layer is the sequencing function 8.
- MIDI segments are sequenced according to the arrangement envelope into a finished piece of music.
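- The sequencing of segments according to an arrangement envelope can be sketched as below. The envelope format (a list of named sections with a start bar and a length in bars) and the segment format (a loop length plus notes as (bar offset, pitch) pairs) are assumptions chosen for illustration; the description does not specify these data structures.

```python
def sequence(envelope, segments):
    """Place each section's loop on the timeline, repeating it to fill the section."""
    timeline = []
    for section in envelope:
        loop = segments[section["name"]]
        repeats = section["bars"] // loop["bars"]
        for i in range(repeats):
            offset = section["start_bar"] + i * loop["bars"]
            timeline.extend((offset + bar, pitch) for bar, pitch in loop["notes"])
    return timeline

# Hypothetical verse-chorus envelope and two-bar loops for each section.
envelope = [
    {"name": "verse", "start_bar": 0, "bars": 4},
    {"name": "chorus", "start_bar": 4, "bars": 2},
]
segments = {
    "verse": {"bars": 2, "notes": [(0, 60), (1, 62)]},
    "chorus": {"bars": 2, "notes": [(0, 67), (1, 65)]},
}
timeline = sequence(envelope, segments)
```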
- the MIDI segment may be provided by the composition engine (as mentioned earlier) or may be accessed from a pre-existing library of suitable MIDI segments, which can be generated in advance by the composition engine 2.
- the production management component 13 may for example check the library to see if suitable pre-existing MIDI is available, and if not instigate a request to the composition engine 2 to generate suitable MIDI.
- the library check can be performed at the composition engine 2 in response to a request, or alternatively the library check can be omitted altogether.
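- The library check followed by a fallback request to the composition engine can be sketched as a lookup with a generator function as fallback. The key format and the function names are hypothetical, and the sketch does not say where in the system the check is performed.

```python
def obtain_midi(key, library, generate):
    """Return (segment, source): use a library segment if one exists, else generate one."""
    if key in library:
        return library[key], "library"
    return generate(key), "composition_engine"

# Hypothetical library keyed by (style, part), and a stub composition engine.
library = {("style_a", "chords"): ["C", "F", "G", "C"]}

def stub_engine(key):
    return [f"generated-{key[1]}"]

from_library = obtain_midi(("style_a", "chords"), library, stub_engine)
generated = obtain_midi(("style_a", "melody"), library, stub_engine)
```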
- MIDI segments may be introduced by an external user as will be described in more detail later.
- the arrangement layer 4 provides an arranged piece of music in MIDI form. In some situations, this 'raw' piece of music might be suitable for some purposes. However, in those circumstances, it will not be playable in any useful form. Therefore, a performance layer 10 is provided which adds performance quality structure to the piece of music produced by the arrangement layer 4.
- the arrangement layer generates a musical arrangement structure using the settings, which has a set of time sequenced sections for which it then requests MIDI from the composition engine (or elsewhere, e.g. from a library), and which in turn are sequenced according to the arrangement structure.
- the performance layer outputs a performance quality piece of music in MIDI. There are many applications where this is useful. However, similarly, there are other applications where an audio version of the piece of music is required. For this, an audio rendering layer 12 (audio engine) is provided which outputs a performance quality piece of music rendered in audio.
- the conversion or rendering of a piece of music from MIDI to audio can be done in a number of different ways, and will not be described further as these include ways that are known in the art.
- the music production engine has an access component 14 which can be implemented in the form of an API (application programming interface).
- This access component enables communication within the music production system (in particular, the production management component 13 can communicate with the composition engine 2 via the access component 14 - see below), and also enables functionality to be provided to external users.
- the side of the access component 14 facing the music production system will be considered to be responsible for internal routing between the layers via the production management component, whereas the side facing away will be responsible for inputs and outputs from an external user. It will be appreciated that this is entirely diagrammatic and that the API could be implemented in any suitable way.
- an API is implemented using a piece of software executing on a processor within the API to implement the functions of the API.
- the API has at least one external input 16 for receiving job requests from an external user and at least one external output 18 for returning completed jobs to an external user.
- the API enables communication between the internal layers of the music production system as will be described.
- a request for tags can be input by a user which retrieves a list of tags which are usable in providing settings to create a musical track.
- Tags include musical styles such as piano, folk et cetera.
- a full list of tags is given below by way of example only.
- Tags are held in a tags store 20. Such a request can also be used to request settings that are useable within the system if desired.
- Metadata tags can be defined, such as mood and genre tags.
- genre tags include: Piano, Folk, Rock, Ambient, Cinematic, Pop, Chillout, Corporate, Drum and Bass, Synth Pop.
- mood tags include: Uplifting, Melancholic, Dark, Angry, Sparse, Meditative, Sci-fi, Action, Emotive, Easy listening, Tech, Aggressive, Tropical, Atmospheric. It may be that the system is configured such that only certain combinations of genre and mood tags are permitted, but this is a design choice. Note that this is not an exhaustive list of tags - any suitable set of tags can be used as will become apparent in due course when the role of the tags in selecting composition and production settings within the system is described.
- a library query can be provided at the input 16. The library query generates a search that returns a paginated list of audio library tracks, which are held in a tracks store 22 or, alternatively, in the jobs database 24. These can be stored in an editable format which is described later. These are tracks which have been already created by the music production system or uploaded to the library from some other place. They are stored in a fashion which renders them suitable for later editing, as will be described in the track production process.
- the library query for tracks returns the following parameters:
- Job ID - this is a unique identity of a track which has been identified, and in particular is the unique ID allowing the track to be edited
- Tags - these are tags associated with the track identifying the style
- Duration - this denotes the length of the piece of music.
- the length of a piece of music is generally around 3 minutes.
- pieces of music may be generated for a number of purposes and may have any suitable duration. As will be appreciated, these are just examples, and the request can return different parameters in different implementations.
- the input 16 can also take requests to create jobs.
- the jobs can be of different types.
- a first type of job is to create an audio track.
- the user may supply a number of audio track create settings which include:
- the system is capable of making some autonomous decisions based on minimal information. For example, the system is capable of creating an audio track if it is just supplied with the duration.
- the production management component 13 itself will determine tags, tempo and sync points in that event.
- the system is capable of generating a track with no input settings - any of the settings can be selected autonomously by the system if they are not provided in the track request.
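- The autonomous selection of any settings missing from a track request can be sketched as filling in defaults. The field names, the tag list, the tempo range and the empty default for sync points are all assumptions; the 90-second default duration is used only because the description mentions it as one possible preconfigured value.

```python
import random

AVAILABLE_TAGS = ["Piano", "Folk", "Rock", "Ambient"]   # illustrative subset of genre tags
DEFAULT_DURATION_S = 90                                  # one possible preconfigured default

def complete_settings(request, rng):
    """Return a copy of the request with any missing settings chosen by the system."""
    settings = dict(request)
    settings.setdefault("duration_s", DEFAULT_DURATION_S)
    if "tags" not in settings:
        settings["tags"] = [rng.choice(AVAILABLE_TAGS)]
    if "tempo_bpm" not in settings:
        settings["tempo_bpm"] = rng.randint(80, 140)     # hypothetical tempo range
    settings.setdefault("sync_points", [])
    return settings

partial = complete_settings({"duration_s": 30}, random.Random(0))
empty = complete_settings({}, random.Random(0))
```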
- the production management component can also generate settings for one or more than one of the layers based on the musical style. When generating a complete track this involves generating, based on the style, both audio production parameters for the audio production engine 3 and composition parameters for the composition engine 2, as described in more detail below.
- a request for an audio track involves use of all of the components of the music production system, including the audio rendering layer to produce a track rendered in audio.
- a request to create a MIDI track uses the composition engine, the arrangement layer and performance layer to produce a track in MIDI. It does not use the audio rendering layer.
- the arrangement layer and performance layer are optional components and the system can be implemented without these.
- the composition engine 2 can be configured to generate fully-arranged MIDI with humanization where desired.
- a second type of request is to edit an existing audio track.
- Tracks are stored in a track library identified by unique job identifiers, in the manner described below.
- a user must supply the ID of the job to edit. Note that this could be achieved by carrying out the library query mentioned earlier in order to identify the correct job ID for the track that needs to be edited.
- the user can provide a new duration for the track.
- the tempo and sync points can be defined.
- the output of this is a new version of the existing track, edited as defined by the new settings.
- the existing duration can be used if the user does not wish to change the duration and wishes to edit some other aspect(s) of the track (or the system could even be configured to select a duration autonomously if none is provided but a change of duration is nonetheless desired).
- the system is able to handle edit requests because sufficient information about the decisions made by the system at every stage is stored in the job database 24 against the track ID as described below.
- the system may also be equipped to handle requests to edit a MIDI track as described later. These can be handled in much the same way as audio track edit requests, but the resulting output is MIDI rather than audio.
- a human user can provide a job request 30 in step 1 at the input 16 of the API 14.
- the job request 30 can in principle be any of the job types which have been described above, but the present part of the description relates to creation of an audio track or MIDI track.
- the job request 30 defines at least one parameter for defining the creation of those tracks, as described above. Alternatively, as noted, the job request 30 may define no parameters, and all parameters may in that event be selected autonomously by the system.
- a job identifier is assigned to the job request 30. This is referred to herein as ID A.
- the job is then assigned to the production job queue 32 which is associated with the production manager 13.
- the allocation of the job ID A to the production queue is denoted by step 3.
- the production manager operates to produce a track.
- the production manager 13 has access to the arrangement layer 4, the performance layer 10 and the audio rendering layer 12. Note that in Figure 2 the performance layer is not shown separately but is considered to be available to the production manager as needed.
- the production manager 13 operates in association with the arrangement layer 4 according to an artificial intelligence model embodied in the production layer. This can be embodied by a decision tree which incorporates human expertise and knowledge to guide the production layer through production of a track, however other implementations are possible. For example, as noted already, the production engine can be implemented using ML. This decision tree causes the production manager 13 to access the arrangement layer 4 as indicated at step 5.
- the arrangement layer 4 operates to provide a musical arrangement which consists of at least timing and desired time signature (number of beats in a bar) and returns an arrangement envelope to the production manager 13 as shown in step 5a.
- the production manager 13 is then activated to request MIDI segments which will be sequenced into the arrangement provided by the arrangement layer 4.
- this is just one possible implementation that is described by way of example.
- the system can be implemented without one or both of the arrangement layer 4 and performance layer 10, with the functions of these layers when desired handled elsewhere in the system, e.g. incorporated into the operation of the composition engine 2.
- This request can also be applied through an API input, referred to herein as the internal API input 17.
- the production manager 13 can generate a plurality of MIDI job requests; for example these are shown in Figure 2 labelled B1, B2, B3 respectively.
- Each of the MIDI job requests is applied to the internal input 17 of the API 14.
- the API 14 assigns job identifiers to the MIDI job requests, indicated as ID B1, ID B2 and ID B3, and these jobs labelled with the unique identifiers are supplied to the MIDI jobs queue 34 in step 8.
- the identifiers are returned to the production manager 13. This is shown by step 7.
- the jobs with their unique identifiers are assigned to the composition engine 2, which can generate individual MIDI segments using artificial intelligence/machine learning.
- the composition engine has been trained as described above.
- the composition engine 2 outputs MIDI segments as indicated at step 9 into the job database 24.
- the MIDI segments could be stored in a separate database or could be stored in the same job database as other completed jobs to be described. Each MIDI segment is stored in association with its unique identifier so that it can be recalled.
- the production manager 13 periodically polls the API 14 to see whether or not the jobs identified by ID B1, ID B2 and ID B3 have been completed as described in the next paragraph. This is shown at step 10. When they are ready for access, they are returned to the production manager 13, which can supply them to the arrangement layer for sequencing as described above. The sequenced segments are returned via the production manager 13 either to an output (when a MIDI track is desired), or to the audio rendering layer 12 (step 12) when an audio track is required.
- Assigning job IDs in this way has various benefits. Because the job ID is assigned to a request when that request is received, a response to that request comprising the job ID can be returned immediately by the API 14 to the source of the request, before the request has actually been actioned (which, depending on the nature of the request, could take several seconds or more, particularly in the case of audio). For example, a response to a request for audio or MIDI can be returned before the audio or MIDI has actually been generated or retrieved. The source of the request can then use the returned job ID to query the system (repeatedly if necessary) as to whether the requested data (e.g. audio or MIDI) is ready, and when ready the system can return the requested data in response. This avoids the need to keep connections open whilst the request is processed, which has benefits in terms of reliability and security.
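- The pattern of returning a job ID immediately and letting the requester poll for the result can be sketched with a small in-memory class. The class, its method names and the ID format are hypothetical, and a real deployment would expose these operations over a web API rather than as direct method calls.

```python
import itertools

class JobApi:
    """Minimal in-memory sketch of job submission and polling by job ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._results = {}   # job ID -> result, or None while the job is pending

    def submit(self, request):
        """Assign a job ID and return it immediately, before any work is done."""
        job_id = f"job-{next(self._ids)}"
        self._results[job_id] = None
        return job_id

    def complete(self, job_id, data):
        """Called by a worker when the requested data has been produced."""
        self._results[job_id] = data

    def poll(self, job_id):
        """Report whether the job is ready and, if so, return its data."""
        data = self._results[job_id]
        return {"ready": data is not None, "data": data}

api = JobApi()
job_id = api.submit({"type": "midi", "style": "style_a"})
before = api.poll(job_id)             # the requester polls while the job is pending
api.complete(job_id, ["C", "F", "G"])
after = api.poll(job_id)              # a later poll returns the requested data
```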
- FIG. 1 shows a flow chart for the process that eventually results in a full audio render of a desired track.
- a request for an audio track is one of the job types mentioned above which can be received at the input 16 of the API 14.
- the API provides a computer interface for receiving a request for an audio track.
- an audio track is an audio rendered piece of music of any appropriate length. It is assumed that it is a completed piece of music in the sense that it can be rendered in audio data and listened to as a complete musical composition.
- the incoming request is assigned a Job ID.
- the request can include one or more parameters for creating an audio track.
- where no parameters are supplied, the system can use a default track creation process involving, for example, default parameters.
- Such default parameters would be produced at the production management component 13 responsive to the request at the input 16.
- a default duration may be preconfigured at 90s.
- Other default lengths are possible.
- Based on the request multiple musical parts are determined. These may be determined at the production management component 13 based on input parameters in the request supplied at the input 16, or from parameters generated by the production management component. Alternatively, the musical parts may be provided in the request itself by the user making the request. In this case, musical parts may be extracted from the request by the production management component 13. This provides the music production system with extensive flexibility.
- the determination of the musical parts is shown in step S602. Audio production settings are also generated from the request. This is shown in step S603. Note that steps S602 and S603 could be carried out in sequence or in parallel. They may be carried out by the production management component, or any suitable component within the music production system.
- the audio production settings and musical parts are supplied to the audio rendering component, at step S604.
- a sequence of musical segments in digital musical notation format is supplied to the audio rendering component.
- This sequence is generated by the composition engine or obtained elsewhere and is in the form of MIDI segments.
- These MIDI segments can be generated as described earlier in the present description, although they do not need to be generated in this way.
- an arranged sequence of MIDI segments could be supplied to the audio rendering component 12.
- This arranged sequence could be derived from the arrangement component 4 as described earlier, or could be an arranged sequence generated by a combined composition and arrangement engine.
- an arranged MIDI sequence could be provided by the user who made the audio track request.
- the audio rendering component 12 uses the audio production settings, the musical parts and the MIDI sequence to render audio data of an audio track at step S605.
- the audio track is returned to the user who made the request through the output port 18 of the API component.
- the production management component 13 uses one or more tags to access a database of settings labelled 23 in Figure 1.
- the tag or tags may be defined in the request which is input at the input 16, or may be generated by the production management component from information in the input request, or generated autonomously at the production management component.
- tags appropriate to that style parameter can be requested from the tags database 20.
- one or more tags may be selected at random by the production management component 13.
- the structure of the database of settings 23 is shown in Figure 7.
- the database 23 is queryable using tags, because each arrangement settings database object is associated with one or more of the tags. There is no limit to the number of tags which may be associated with a single arrangement settings object.
- the database of arrangement settings objects can be queried by providing one or multiple tags and returning all arrangement settings objects which are marked with all of the provided tags.
- An arrangement settings object 01 is shown in the database 23 associated with tags T1 and T2, but the object 01 can be associated with any number of tags.
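The tag query described above, which returns every settings object marked with all of the provided tags, can be sketched as a subset test. This is a minimal illustration only: the `SettingsObject` class, the object names and the in-memory list standing in for the database 23 are assumptions, not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SettingsObject:
    """Hypothetical stand-in for an arrangement settings object."""
    name: str
    tags: frozenset  # any number of tags may be attached


def query_by_tags(database, tags):
    """Return every object that is marked with ALL of the provided tags."""
    wanted = set(tags)
    return [obj for obj in database if wanted <= obj.tags]


# Illustrative contents; the names and tags are made up for the example.
database = [
    SettingsObject("O1", frozenset({"T1", "T2"})),
    SettingsObject("O2", frozenset({"T2"})),
    SettingsObject("O3", frozenset({"T1", "T2", "T3"})),
]

matches = [obj.name for obj in query_by_tags(database, ["T1", "T2"])]
```

Querying with more tags can only narrow the result, which fits the use of tags to describe a style such as a genre, mood or set of instruments.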
- Each arrangement settings object comprises three groups of settings.
- In this example, there is a group of arrangement settings 70, a group of composition settings 72 and a group of audio settings 74. This is just an example and there can be more or fewer groups of settings. As will be appreciated, the grouping of the settings reflects the architecture of the system, which can be designed flexibly as noted. For example, arrangement settings 70 may be incorporated in the composition settings 72 where arrangement is handled as part of composition.
- the groups have been defined to co-operate in a finished musical piece in accordance with the style indicated by the tag(s).
- tags can define such things as genre/mood/instruments.
- the settings recalled by the production management component 13 from the settings database 23 are used to control production of the music.
- a particular collection of settings can be selected from each group for each musical part, or one or more of the settings may apply to multiple musical parts.
- An instrument is selected for each part from the group of audio settings for the particular tag or tags. This is denoted by crosshatching in Figure 8.
- One way of selecting the instrument for each part is to select it randomly from the group of settings appropriate to that part.
- Within the audio settings there may be a category of settings associated with each part, for example bass, melody, harmony et cetera.
- a particular sound for the instrument is chosen by selecting a setting from a group of sound settings. This selection may be at random. One or more audio effects may be selected for each sound. Once again, this may be selected at random from a group of audio effects appropriate to the particular sound.
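The random selection of an instrument, then a sound for that instrument, then an audio effect for that sound can be sketched as a walk through nested groups of settings. The dictionary layout, the preset names and the effect names below are illustrative assumptions; the description does not specify how the audio settings are stored.

```python
import random

# Hypothetical audio settings for one tag: part -> instrument -> sound -> effects.
AUDIO_SETTINGS = {
    "bass": {
        "synth bass": {
            "sub preset": ["compressor"],
            "acid preset": ["distortion", "delay"],
        },
    },
    "melody": {
        "piano": {
            "grand piano": ["reverb"],
            "upright piano": ["reverb", "chorus"],
        },
    },
}


def choose_sound(part, rng):
    """Pick an instrument, a sound and one effect at random for a part."""
    instruments = AUDIO_SETTINGS[part]
    instrument = rng.choice(sorted(instruments))
    sounds = instruments[instrument]
    sound = rng.choice(sorted(sounds))
    effect = rng.choice(sounds[sound])
    return {"part": part, "instrument": instrument,
            "sound": sound, "effects": [effect]}


rng = random.Random(0)  # seeded only so that the example is repeatable
selection = [choose_sound(part, rng) for part in ("bass", "melody")]
```

Because each choice is drawn only from the group appropriate to the previous choice, every combination produced is one the settings author considered compatible.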
- the production management component 13 uses a decision tree in which knowledge about the suitability of particular instruments for particular parts, particular sounds for particular instruments, and particular audio effects has been embedded.
- the term "sound" in this context means a virtual instrument preset.
- Virtual instrument is a term of art and means a software synthesiser, and a virtual instrument preset refers to a particular virtual instrument preferably together with a set of one or more settings for configuring that virtual instrument.
- the virtual instrument preset defines a particular virtual instrument and the timbre or sonic qualities of the virtual instrument. Different virtual instrument presets can relate to the same or different virtual instruments. E.g. for a virtual instrument which emulates a piano, there might be a preset which makes the virtual instrument sound like a grand piano, and another which makes it sound like an upright piano. It is these presets that the system selects between when choosing the sound for an instrument.
- composition settings associated with the tag can be supplied to the composition engine 2 for controlling the output of MIDI segments to incorporate into the track.
- the arrangement settings 70 associated with the tag can be applied to the arrangement layer 4 for use in determining how the MIDI segments from the composition engine should be arranged as governed by the tag.
- Finished tracks are stored in the job database 24 in connection with the job ID that was assigned to the incoming request.
- the track may be stored in terms of the settings (track settings 80) which were selected to generate it, along with the sequenced MIDI and/or the un-sequenced MIDI loop(s) or other segment(s) output from the composition engine 2, instead of as the audio data itself. Then, this sequenced MIDI can be supplied to the audio rendering component 12 with the musical parts and the selected audio production settings (as in step S604 of the flow of Figure 6) to regenerate the track.
- the track settings 80 are made up of not only the selected audio settings, but also the composition settings and arrangement settings. That is to say, the track settings 80 contain all of the choices made by the production management component and thus all of the settings needed to completely reproduce a track. In order to reproduce an identical track, these stored track settings 80 can be used at step S604 in Figure 6 to create a duplicate track. In this context, the track settings 80 are referred to as reproducibility settings.
- the assigned job ID (ID A) constitutes an identifier of the track.
- the track settings 80 are stored in the job database 24 in association with the track identifier ID A.
- the identifiers ID B1, ID B2 and ID B3 are stored in the job database 24 in association with the track identifier ID A such that the pieces of MIDI used to build the track can be retrieved using the track identifier ID A. These can be sequenced or un-sequenced MIDI segments, or a combination of both.
- the information stored in the job database 24 in association with ID A is sufficiently comprehensive that the track can be reproduced using that information at a later time.
- FIG. 10 shows an edit request 52 being received at the API 14 in step S1102.
- the edit request 52 is shown to comprise a job ID 54 of a track to be edited and at least one new setting 56 according to which the track should be edited.
- An edit request is in effect a request to create a brand new track, but doing so using at least one of the settings and/or MIDI segments that were used to generate an earlier track.
- the track being edited can be an audio track or a MIDI track.
- a response 59 to the edit request 52 is returned to a source of the request 52.
- the response 59 comprises a job ID 58 which is a job ID assigned to the edit request 52 itself.
- this job ID 58 of the edit request 52 itself is different to the job ID 54 of the track to be edited, which was assigned to an earlier request that caused that track to be created (this earlier request could have been a request to create the track from scratch or could itself have been a request to edit an existing track).
- the edit request 52 is provided to the production management component 13 in the manner described above.
- the production manager 13 queries (S1108) the job database 24 using the job ID 54 in order to retrieve the track settings 80 associated with the job ID 54, which it receives at step S1110.
- the track settings 80 comprise one or more references to MIDI segments used to create the track; these can also be retrieved by the production manager 13 if needed.
- references can be in the form of job IDs where the MIDI segments are stored in the job database 24 or they can be references to a separate database in which the MIDI segments are held. From this point, the method proceeds in the same way as described with reference to Figure 6 but for the fact that the track settings used to create the edited version of the track are a combination of one or more of the track settings 80 retrieved from the job database 24 and the one or more new settings 56 provided in the edit request 52.
- a new setting 56 is a track duration, which a user can provide if he wants to create a longer or shorter version of an existing track.
- all of the original track settings 80 can be used to create the edited version of the track, along with the original MIDI segments, but with the new duration substituted for the original duration.
- new MIDI segments could be composed that are more suitable for the new duration, which involves an internal request to the composition engine 2. This is just a simple example and more complex track editing is envisaged.
- the production manager 13 may in fact select such new setting(s) 56 itself in response to the edit request 52, for example by selecting additional settings based on a setting indicated in the edit request 52 or by selecting new setting(s) autonomously by some other means.
- the job ID 58 assigned to the edit request 52 is stored in the job database 24 in the same way as for other requests along with the track settings for the edited track which are labelled 80'.
- the track settings 80' are the settings that have been used to generate the edited version of the track and as noted these are made up of a combination of one or more of the original track settings 80 with the new setting(s) 56 determined in response to the edit request 52 in the manner described above.
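The edit flow described above (look up the stored settings by the job ID of the existing track, overlay the new settings, and store the result under a fresh job ID) can be sketched as follows. The dictionary standing in for the job database 24, the ID format and the setting names are hypothetical.

```python
import itertools

# Hypothetical in-memory stand-in for the job database 24.
job_database = {}
_job_counter = itertools.count(1)


def store_job(track_settings, midi_refs):
    """Store settings and MIDI references under a newly assigned job ID."""
    job_id = f"job-{next(_job_counter)}"
    job_database[job_id] = {
        "settings": dict(track_settings),
        "midi_refs": list(midi_refs),
    }
    return job_id


def handle_edit_request(job_id, new_settings):
    """Combine the stored settings with the new ones; the original is kept."""
    original = job_database[job_id]
    edited_settings = {**original["settings"], **new_settings}
    # The edit request gets its own job ID, distinct from the edited track's.
    return store_job(edited_settings, original["midi_refs"])


original_id = store_job({"duration_s": 60, "instrument": "piano"},
                        ["ID B1", "ID B2", "ID B3"])
edited_id = handle_edit_request(original_id, {"duration_s": 90})
```

Keeping the original entry untouched means an edited track can itself be edited later, since each version remains reproducible from its own stored settings.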
- the audio rendering functions referred to above are used in the present context to generate what is referred to herein as a “full” (desired) audio render of an arranged track.
- This may be a custom arrangement of an existing track which a user has caused to be created by providing arrangement parameters such as duration, sync points/intensity curve etc.
- MIDI is arranged, sequenced and automated in accordance with the arrangement parameters in the manner described above to provide a complete track in digital music notation format. This is then rendered into audio format by the audio rendering component 12 as described above, thereby generating the full audio render.
- preview rendering functionality allows a “preview” render of the custom arrangement to be created very quickly (typically within the space of a few seconds or less, but in any event in less time than it takes to generate the full audio render) so that the user may get some sense of the musical structure of the custom arrangement he has caused to be generated before the full audio render is available.
- the preview render is only an approximation of the full audio render and will generally be of lower musical quality.
- an acceptable trade-off between musical quality and preview rendering speed may be attained.
- the preview rendering functionality is provided at least in the contexts of audio track creation and audio track editing as described above.
- Figure 11 shows a functional block diagram of the above music production system into which a preview controller 130 has been incorporated. Note that Figure 11 does not show every detail of the music production system but only shows the components of it that are considered relevant in the present context. In particular, the description of the flow of messages and data within the music production system is not repeated for the sake of conciseness. However, it will be appreciated that the steps and functions described in relation to Figure 11 may be carried out in accordance with the messaging and data flows described above in relation to the earlier Figures.
- Figure 11 shows the production manager 13 receiving a set of arrangement parameters 100 of the kind described above, such as track length (duration), sync point/intensity curve etc. These may for example be received in a track edit request comprising an identifier of an existing track to be edited.
- Reference numeral 102 denotes the track ID of the existing track in Figure 11.
- the production manager 13 processes the arrangement parameters 100 and track ID 102 in the manner described above so as to cause the arrangement layer 4 to generate a custom arrangement of the digital track in digital musical notation format.
- the custom arrangement 110 comprises MIDI tracks containing MIDI sequences which are generated by sequencing MIDI segments 104 stored in association with the track ID 102.
- the custom arrangement 110 comprises rendering settings 112 for rendering the MIDI tracks such as automation settings 112a, virtual instrument (synthesiser) settings (e.g. virtual instrument preset(s)) 112b and audio effects (FX) settings 112c over the duration of the track.
- these are determined by modifying track settings 106 stored in association with the existing track ID 102 in accordance with the arrangement parameters 100 provided by the user.
- the audio rendering component 12 processes the custom arrangement 110 to produce a full audio render thereof denoted by reference numeral 114.
- preview rendering steps are carried out as described below, to provide a quick but nonetheless informative preview audio render.
- the first of these intermediate steps is the generation of an arrangement envelope which is denoted in Figure 11 by reference numeral 108.
- the arrangement envelope 108 is generated by the arrangement function 6 of the arrangement layer 4 from a set of available “section templates” 116 which are selected in dependence on the user-defined arrangement parameters 100.
- the arrangement envelope 108 defines a sequence of musical sections for the custom arrangement 110 each of which is assigned one of the section templates 116 by the arrangement function 6.
- Figure 11 shows three sections 118a, 118b and 118c of the arrangement envelope 108; however, it will be appreciated that this is purely an example and an arrangement envelope can have any number of musical sections. Musical sections could for example correspond to a verse, chorus, middle-eight etc. of a track to be structured in that manner.
- the arrangement envelope 108 also defines a duration 120a, 120b and 120c for each section 118a, 118b and 118c.
- a typical section may have a duration that is a whole number of musical measures (e.g. 2 bars, 4 bars, 8 bars, 16 bars etc.). However, this is not an absolute rule and sections of unrestricted duration may be incorporated in order to accommodate a user-defined duration for the whole arrangement. For example, such sections could be included at the start and/or end of the arrangement envelope 108 or at a suitable pause point(s) within it.
- Figure 11 shows a time-varying intensity curve 122 defined over the cumulative duration of the sequence of sections.
- This can be used to provide musical variation across the various sections and within each section individually, by defining changes in musical intensity over time. In the present example, this is the mechanism by which more granular musical variation is incorporated into the custom arrangement 110.
- the intensity curve 122 is preferably set in accordance with the arrangement parameters 100. For example, it may be defined by one or more sync points of the arrangement parameters 100 or the user may be offered more fine-grained control over the shape and structure of the intensity curve 122 depending on the implementation and/or the user’s preferences.
- the intensity curve 122 is shown gradually rising to an intensity peak 123 throughout the first section 118a and the majority of the second section 118b before dropping off again somewhat more rapidly for the remainder of section 118b and the third and final section 118c.
- the intensity peak 123 could for example be defined by a sync point of the arrangement parameters 100.
- the MIDI sequencing component 8 of the arrangement layer 4 therefore receives the arrangement envelope 108 and associated intensity curve 122 and sequences the MIDI music segments 104 associated with the track ID 102 in order to populate the defined musical sections 118a, 118b and 118c with composed MIDI in accordance with the respective section templates 116a, 116b, 116c associated therewith.
- the result is corresponding musical sections 124a, 124b, 124c in the custom arrangement 110 having the defined durations 120a, 120b and 120c and populated with sequenced MIDI spanning within those respective durations which has been selected from the existing music segments 104 in accordance with the applicable section templates 116a, 116b and 116c.
- the intensity curve 122 is used to introduce musical variation across the corresponding sections 124a, 124b and 124c of the custom arrangement 110.
- the intensity curve 122 can be used to set and vary note “velocity” across the duration of the custom arrangement by modulating a velocity of each note based on the value of the intensity curve 122 at a time of that note.
- Note velocity is a known concept in electronic music production and sets the dynamics according to which a note is rendered by a virtual instrument (analogous to how hard or soft a traditional musical instrument is played). This provides a simple but effective way of varying the musical dynamics across the custom arrangement 110 in accordance with the user-defined arrangement parameters 100.
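Velocity modulation of this kind can be sketched by sampling the intensity curve at each note's time and scaling the result into the MIDI velocity range. The sketch assumes a curve given as sorted (time, intensity) points with intensity normalised to 0..1, linear interpolation between points, and a linear intensity-to-velocity mapping; none of these choices is stated in the description.

```python
def intensity_at(curve, t):
    """Linearly interpolate a curve given as sorted (time, intensity) points."""
    if t <= curve[0][0]:
        return curve[0][1]
    for (t0, i0), (t1, i1) in zip(curve, curve[1:]):
        if t <= t1:
            return i0 + (i1 - i0) * (t - t0) / (t1 - t0)
    return curve[-1][1]  # hold the last value after the final point


def modulate_velocities(notes, curve, v_min=1, v_max=127):
    """Return copies of the notes with velocity set from the curve."""
    modulated = []
    for note in notes:
        intensity = intensity_at(curve, note["time"])
        velocity = round(v_min + intensity * (v_max - v_min))
        modulated.append({**note, "velocity": velocity})
    return modulated


# Illustrative curve rising to a peak at t=8 and then falling away.
curve = [(0.0, 0.0), (8.0, 1.0), (12.0, 0.5)]
notes = [
    {"time": 0.0, "pitch": 60},
    {"time": 4.0, "pitch": 64},
    {"time": 8.0, "pitch": 67},
]
velocities = [n["velocity"] for n in modulate_velocities(notes, curve)]
# velocities == [1, 64, 127]
```

The lower bound of 1 is used because a note-on with velocity 0 is conventionally treated as a note-off in MIDI.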
- the intensity curve 122 can be used to introduce any desired musical variation within the custom arrangement by modifying one or more of the automation settings 112a, settings of the virtual instruments 112b and settings associated with the audio effects 112c (such as FX settings, send channels/audio routings etc.). This can be achieved via a mapping function which maps the intensity curve 122 at different points in time to whichever settings are desired and in any desired manner.
- the mapping function can be section-specific as described below.
- the preview controller 130 of Figure 11 controls the generation of a preview render of the custom arrangement 110 for outputting to the user before the full audio render 114 is available.
- the preview controller 130 is shown receiving the arrangement envelope 108 and associated intensity curve 122 generated by the arrangement function 6, which it uses to determine a set of preview rendering instructions 132 for creating a preview render using audio data of a predetermined “audio clip pack” (see below). This is significantly quicker than the full rendering process, hence the preview render can be made available to the user much more quickly than the full audio render 114.
- Figure 12 shows an example of a section template 116n in further detail.
- Figure 12 is a high-level block diagram illustrating certain structure of the section template 116n.
- the structure of the section template 116n can be embodied using any suitable data format. Reference is made to both Figure 11 and Figure 12 in the description below.
- the section template 116n defines one or more musical parts 202a for use in a section that is arranged according to that section template 116n.
- the section template 116n also defines a musical function 202b for each of those musical parts.
- the musical parts 202a may for example correspond to different instruments (such as piano, bass, drums/percussion, strings, synth etc.) and each of those musical parts may have a musical function 202b such as lead, harmony, chords etc.
- the section template 116n also defines a mapping function 202c for a section arranged according to that template.
- the mapping function 202c defines a mapping of intensity values to the rendering settings 112. It is this mapping function 202c that defines how each intensity value on the intensity curve 122 is mapped to one or more of the rendering settings 112, such as automation 112a. That is to say, the mapping function 202c defines how the intensity curve 122 maps onto the rendering settings 112 of the custom arrangement 110 and therefore allows the one or more settings in question to be modulated over the duration of the custom arrangement 110 according to the intensity curve 122.
- the section template 116n also defines musical intensity limits 202d, which are lower and upper limits on the possible intensity values for any section arranged according to that template (denoted Imin and Imax respectively). This accounts for the fact that certain arrangements of musical parts 202a and their musical functions 202b may only be musically appropriate within certain intensity limits. For example, it may be that a certain arrangement of musical parts is not considered appropriate for overly “soft” dynamics, hence the lower intensity limit Imin in that event would be increased to prevent that section template from being used with intensity values below that limit.
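The structure of a section template (musical parts with their musical functions, a mapping function and intensity limits) can be sketched as a small data class. The field names, the example mapping and the parameter names `filter_cutoff_hz` and `reverb_send` are assumptions made for illustration; the description leaves the data format open.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SectionTemplate:
    """Hypothetical in-memory form of a section template 116n."""
    parts: Dict[str, str]                         # musical part -> musical function
    mapping: Callable[[float], Dict[str, float]]  # intensity -> rendering settings
    i_min: float                                  # lower intensity limit (Imin)
    i_max: float                                  # upper intensity limit (Imax)

    def supports(self, intensity: float) -> bool:
        """True if the template may be used at this intensity value."""
        return self.i_min <= intensity <= self.i_max

    def settings_for(self, intensity: float) -> Dict[str, float]:
        """Map an intensity value to rendering settings, within the limits."""
        if not self.supports(intensity):
            raise ValueError(f"intensity {intensity} outside template limits")
        return self.mapping(intensity)


def example_mapping(intensity: float) -> Dict[str, float]:
    # Assumed linear mapping; a real mapping function could take any form.
    return {
        "filter_cutoff_hz": 200.0 + 7800.0 * intensity,
        "reverb_send": 1.0 - intensity,
    }


template = SectionTemplate(
    parts={"piano": "chords", "bass": "harmony", "strings": "lead"},
    mapping=example_mapping,
    i_min=0.3,  # raised lower limit: not suitable for very soft dynamics
    i_max=1.0,
)
settings = template.settings_for(0.5)
```

A selector choosing among templates could call `supports` first and skip any template whose limits exclude the intensity values in the section being arranged.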
- the MIDI sequencing function 8 uses the above information within the section template 116n to arrange, sequence and automate MIDI in a section of music having that template.
- the intensity curve 122 can be used to drive an essentially infinite range of musical variation within sections in accordance with the respective mapping function assigned to each section. Hence there are infinite possibilities when it comes to introducing musical variation into the final audio render 114.
- the scope for introducing musical variation is much more limited. This is an acceptable trade-off that is made in order to be able to provide the preview render quickly, and by implementing the preview rendering method disclosed below an adequate degree of musical variation can still be introduced to approximate the finer musical variation that will eventually be exhibited in the final audio render 114.
- each section template 116n has associated with it a set of pre-generated audio clips (that is, music segments that have been pre-rendered into audio format) which is referred to herein as a “section audio clip pack” 204n for the section template 116n.
- Each section audio clip pack 204 is associated with the corresponding section template 116n.
- Each audio clip 208 in the audio clip pack 204 is an audio render of a section of the track in question that has been rendered at a particular section duration 210 and with a particular intensity setting or settings 212.
- a track is associated with a set of section templates which may be used to arrange that track.
- the section audio clip packs of Figure 12 that are associated with those section templates constitute a “track audio clip pack” containing all of the audio clips that may be used to create a preview render of that track.
- the track audio clip pack is associated with the track ID 102 of that track.
- a section audio clip pack 204n may have an associated clip pack ID 206 and/or the track audio clip pack may have an associated clip pack ID (which may comprise the applicable track identifier 102). That is to say, in order to generate each audio clip in the audio clip pack, the MIDI music segments 104 for track ID 102 have been arranged and rendered according to the section template 116n in exactly the same way as described above - i.e.
- intensity value Imin i.e. the lower intensity limit set in the section template 116n
- intensity value Imax i.e. the upper intensity limit set in the section template 116n
- Figure 14 schematically illustrates, by example, how a preview audio render 314 may be generated using audio clip packs.
- the arrangement envelope 108 of Figure 11 is shown together with the associated intensity curve 122 defined across the arrangement envelope’s cumulative duration.
- Each section 118a, 118b, 118c is matched to one of the available section clip packs based on the section template 116a, 116b and 116c assigned to that section.
- the associated section clip pack is the clip pack associated with that section template, and the section clip packs associated with sections 118a, 118b and 118c are denoted by reference numerals 204a, 204b and 204c in Figure 14.
- an audio clip from the associated section clip pack 204a, 204b and 204c is selected based on the duration of that section 120a, 120b and 120c and the portion of the intensity curve 122 within that section.
- the portions of the intensity curve 122 within the first, second and third sections 118a, 118b and 118c respectively are denoted by reference numerals 122a, 122b and 122c.
- audio data of multiple audio clips may be used for a single section when generating the preview render.
- different musical parts may be rendered as separate audio clips, which can be mixed together within a given section of the preview render.
- a section associated with a particular section template could be divided into multiple sub-sections, and audio data from different clips of the section audio clip pack associated with that section could be used in the different sub-sections (in this respect, it is noted that the term “section” includes a sub-section of this nature unless otherwise indicated). Whilst this marginally increases the complexity of the preview rendering process, the impact is negligible, and it is still significantly faster to create a preview render in this way than it is to generate a full audio render from scratch.
- Certain audio data may also be incorporated into a section at a time that is determined flexibly. For example, a pre-rendered drum fill could be introduced at a point in a section defined by the intensity curve, to match its expected timing in the full render.
- the matching of the intensity curve 122a to the audio clips is an approximate matching in which the audio clip that was rendered with intensity settings closest to that portion of the intensity curve is selected.
- the selected audio clip is not expected to exactly match the corresponding section of the final audio render 114.
- audio clips of duration 4 bars are selected for the first and second sections 118a, 118b from the respective associated section clip packs 204a, 204b, and an audio clip of duration 2 bars is selected for the third section 118c from the associated section clip pack 204c, so as to match the section durations 120a, 120b and 120c defined in the arrangement envelope 108.
- the portion of the intensity curve 122a remains at relatively low intensity values, therefore the audio clip of the correct duration rendered at the minimum intensity setting Imin (i.e. flat intensity curve at Imin) is selected to approximate that section.
- This clip is denoted by reference numeral 314a.
- the closest matching clip is that rendered at the maximum intensity setting Imax (i.e. flat intensity curve at Imax), hence the audio clip rendered at that intensity setting is selected.
- This clip is denoted by reference numeral 314b.
- the two-bar clip at the medium intensity setting d is selected as the closest approximation of the portion of the intensity curve 122c in that section. This clip is denoted by reference numeral 314c.
- the preview audio render 314 is created by sequencing the selected audio clips 314a, 314b and 314c in the order of the corresponding sections 118a, 118b and 118c.
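The clip selection and sequencing described above can be sketched as a nearest-match search within each section's clip pack. The sketch assumes that each clip was rendered at a single flat intensity level and that a curve portion is summarised by the mean of its sampled values; the description only requires an approximate match, so other distance measures would be equally valid. The clip IDs and intensity levels are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioClip:
    """Hypothetical metadata for one pre-rendered audio clip 208."""
    clip_id: str
    bars: int          # section duration the clip was rendered at
    intensity: float   # flat intensity setting the clip was rendered with


def select_clip(clip_pack, bars, curve_portion):
    """Pick the clip of the right duration whose intensity is closest."""
    target = sum(curve_portion) / len(curve_portion)  # assumed summary: mean
    candidates = [clip for clip in clip_pack if clip.bars == bars]
    if not candidates:
        raise LookupError(f"no clip of {bars} bars in this pack")
    return min(candidates, key=lambda clip: abs(clip.intensity - target))


def make_pack(prefix):
    """Build an illustrative pack: 2 and 4 bar clips at three levels."""
    levels = (("min", 0.2), ("mid", 0.6), ("max", 1.0))
    return [AudioClip(f"{prefix}-{bars}bar-{name}", bars, level)
            for bars in (2, 4) for name, level in levels]


# (section name, duration in bars, sampled portion of the intensity curve)
sections = [
    ("a", 4, [0.1, 0.2, 0.3]),   # stays low
    ("b", 4, [0.7, 1.0, 0.9]),   # contains the peak
    ("c", 2, [0.7, 0.6, 0.5]),   # falls away
]

# Sequencing: selected clips in the order of their sections.
preview = [select_clip(make_pack(name), bars, portion).clip_id
           for name, bars, portion in sections]
# preview == ["a-4bar-min", "b-4bar-max", "c-2bar-mid"]
```

The effect is that a continuously varying curve is replaced by one constant level per section, which is why the preview only approximates the dynamics of the full render.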
- Figure 13 illustrates how, by applying the above steps, the continuously time-varying intensity curve 122 is approximated as a series of intensity step changes shown on the right-hand side of Figure 13.
- the intensity settings can be used to introduce rich dynamic variations within the final render 114 that will not be fully captured in the preview render 314.
- the preview render 314 will nonetheless be a reasonable approximation of the overall musical structure of the final render 114 that is still in progress. This is because its constituent audio clips within the section clip packs 204a, 204b and 204c have been pre-produced and rendered using the same music production pipeline as the custom arrangement 110 and using intensity settings which at least approximately match the time-varying intensity settings used to produce the custom arrangement 110.
- Figure 15 shows a preview rendering component 300 which receives the preview rendering instructions 132 from the preview controller 130 and creates the preview render 314 from a set of stored section clip packs 204 in the manner described above. These audio clip packs 204 make up the track audio clip pack for the track in question.
- the preview rendering instructions 132 are shown to define a sequence of three musical sections and, for each of those sections, a section clip pack ID denoted by reference numerals 206a, 206b and 206c for sections 118a, 118b and 118c respectively.
- the instructions 132 also define the respective durations 120a, 120b and 120c of those sections and respective intensity settings 322a, 322b and 322c for each of those sections, which are used for matching to the audio clips in the associated section clip pack.
- the instructions 132 may identify the audio clips to be sequenced.
- the preview rendering instructions 132 can comprise any clip identification data identifying audio clips within the track audio clip pack to be sequenced in order to create the preview render, as well as specifying the manner in which the audio data of those clips should be sequenced.
- each audio clip 208 in each section clip pack 204n has associated production metadata which, in this example, indicates the duration and intensity settings according to which it was generated.
- This production metadata is stored in the section audio clip pack 204n in association with its constituent audio clips.
- intensity settings 322a, 322b and 322c are determined by the preview controller by approximately matching the portions of the intensity curve in each of those sections 122a, 122b and 122c to an appropriate audio clip in the section clip pack 204a, 204b, 204c associated with that section, based on the production metadata therein.
- the preview rendering instructions 132 indicate which section clip packs to use and which individual clips within those section clip packs to use to generate the preview render 314. All the preview rendering component 300 needs to do is sequence audio data of the identified audio clips in accordance with the instructions 132.
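Since the instructions already name the clip pack and the clip to use for each section, the work left to the preview rendering component is a lookup and a concatenation. The sketch below assumes a simple list-of-dicts instruction format and represents audio data as short lists of sample values; both are illustrative, and any joining treatment between clips (such as crossfades) is left out.

```python
def render_preview(instructions, stored_packs):
    """Concatenate the audio data of the identified clips, in order."""
    audio = []
    for step in instructions:
        pack = stored_packs[step["clip_pack_id"]]   # section clip pack
        audio.extend(pack[step["clip_id"]])         # clip audio data
    return audio


# Hypothetical locally stored packs: pack ID -> clip ID -> audio samples.
stored_packs = {
    "pack-a": {"4bar-min": [0.1, 0.2], "4bar-max": [0.8, 0.9]},
    "pack-b": {"4bar-min": [0.1, 0.1], "4bar-max": [0.9, 0.8]},
    "pack-c": {"2bar-mid": [0.5]},
}

# Hypothetical preview rendering instructions for three sections.
instructions = [
    {"clip_pack_id": "pack-a", "clip_id": "4bar-min"},
    {"clip_pack_id": "pack-b", "clip_id": "4bar-max"},
    {"clip_pack_id": "pack-c", "clip_id": "2bar-mid"},
]

preview_audio = render_preview(instructions, stored_packs)
```

Because no synthesis takes place at this stage, the cost of the preview is dominated by reading stored audio, which is what makes it fast relative to the full render.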
- arrangement parameters in this case, defining the intensity curve
- the invention is not limited in this respect, and can be applied in the context of any music production parameters - such as arrangement and/or composition parameters.
- the underlying principles remain the same in that case.
- clip packs can be generated containing audio clips that have been pre-composed with different composition settings and rendered into audio format.
- clips within a section template may have different melodic content. This is a consequence of them having been rendered from digital music having different symbolic music data (i.e. different musical notes).
- Arrangement parameters may also be used to determine composition in the final render to some extent.
- the arrangement layer may request that the composition layer re-compose a certain section of an existing track in order to better fit a custom arrangement.
- the preview render can use a pre-composed segment selected to approximate the expected output of the composition layer.
- the pre-generated music segments can comprise pre-generated audio renders of the user’s own MIDI compositions.
- Figure 16 shows a schematic block diagram for a particularly efficient hardware architecture for implementing the preview rendering functionality.
- the main audio production pipeline and the preview controller 130 are implemented at a remote music production computer system 350 that is accessed from a local user device 352 via a network 354 such as the Internet.
- the main music production pipeline is denoted by reference numeral 1 and comprises the composition, arrangement, production and audio rendering functionality described extensively above.
- the preview rendering component 300 that actually generates the preview render 314 is implemented at the local user device 352.
- the set of section audio clip packs 204 associated with its track ID 102 - which constitute the track audio clip pack associated with that track - are downloaded from the remote music production computer system 350 to the local user device 352 via the network 354 and are stored in local storage of the user device 352 (step S0).
- This set contains the section audio clip pack associated with every possible section template that can be used to generate that track (so if there are twenty section templates for the arrangement layer 4 to choose from, the twenty associated audio clip packs are downloaded.
- the user provides user inputs at a user interface 356 of the local user device 352 in order to vary the arrangement parameters 100 of the track in the manner described above.
- These arrangement parameters 100 are transmitted to the remote music production computer system 350 in one or more electronic messages such as an edit request to allow the above steps to be carried out at the music production computer system 350 (for example, using the API architecture described above, or via a Web interface etc.). This is shown as step S2.
- the main music production pipeline 1 begins the process of arrangement that will eventually result in the custom arrangement 110.
- the preview controller 130 generates, at the earliest opportunity, the preview rendering instructions 132 referred to above and transmits these back to the user device at step S4, so that the preview rendering component 300 can create the preview audio render 314 using the pre-downloaded section audio clip packs in accordance with those instructions whilst the remote processing within the main audio production pipeline 1 is still ongoing.
- the preview renders 314 can therefore be created and played out to the user at the local user device 352 promptly after the user has provided his preferred arrangement parameters 100.
- the generation of the full audio render 114 can be instigated automatically, in response to the user’s initial edit request whilst the user is listening to the preview of the edit.
- the full audio render of an arrangement only begins after a separate request is sent by the user, after they have listened to a preview of that arrangement and decided they want the full audio render version. This is beneficial because not all previews have to be turned into full renders: the user may not wish to proceed on the basis of a current preview render and may wish to make further changes first. This can result in a significant saving of computational resources, by discouraging the generation of unwanted full renders.
- When the full audio render 114 is eventually completed some time later, it is transmitted to the user device 352 for playing out to the user. This may be several minutes after the user has provided his preferred arrangement settings 100, but during that time he has already had an opportunity to preview an approximation of its musical structure.
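The local side of this flow can be sketched as follows: clip packs are cached on the device (step S0), and once preview rendering instructions arrive (step S4) the device concatenates the named clips into a preview. This is a minimal sketch under stated assumptions, not the patent's implementation: the `PreviewInstruction` and `LocalDevice` names, the shape of the instructions, and the use of plain lists of floats as audio samples are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PreviewInstruction:
    """One entry of the preview rendering instructions (hypothetical shape)."""
    template_id: str  # which section audio clip pack to use
    clip_id: str      # which clip inside that pack


@dataclass
class LocalDevice:
    """Stands in for the user device: a clip-pack cache plus a preview renderer."""
    clip_packs: dict = field(default_factory=dict)  # template_id -> {clip_id: samples}

    def store_clip_packs(self, packs):
        # Step S0: keep the downloaded section clip packs in local storage.
        self.clip_packs.update(packs)

    def render_preview(self, instructions):
        # After step S4: concatenate the cached audio named by each instruction.
        samples = []
        for ins in instructions:
            samples.extend(self.clip_packs[ins.template_id][ins.clip_id])
        return samples


device = LocalDevice()
device.store_clip_packs({
    "intro": {"a": [0.0, 0.1]},
    "verse": {"a": [0.2, 0.3], "b": [0.4, 0.5]},
})
preview = device.render_preview([
    PreviewInstruction("intro", "a"),
    PreviewInstruction("verse", "b"),
    PreviewInstruction("verse", "a"),
])
print(preview)  # [0.0, 0.1, 0.4, 0.5, 0.2, 0.3]
```

Because the audio is already on the device, only the small instruction list has to cross the network before playback can start, which is the point of splitting the work this way.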
- the various components referred to above and in particular the production management component 13, the production engine 3 (that is, the audio rendering component 12, the performance component 10 and the arrangement component 4), the composition engine 2, the preview controller 130 and the preview rendering component 300 are functional components of the system that are implemented in software. That is, the composition system comprises one or more processing units - such as general purpose CPUs, special purpose processing units such as GPUs or other specialized processing hardware, or a combination of general and special purpose processing hardware - configured to execute computer-readable instructions (code) which cause the one or more processing units to implement the functionality of each component described herein. Specialized processing hardware such as GPUs may be particularly appropriate for implementing certain parts of the ML functionality of the composition engine 2, and of the other components where those are also implemented using ML.
- the processing unit(s) can be embodied in a computer device or network of cooperating computer devices, such as a server or network of servers.
- the system refers to the overall system which may, as noted, include the user device 352 at which certain functionality is implemented.
- Figure 9 shows a schematic block diagram illustrating some of the structure of the API 14, which is shown to comprise a computer interface 42 and a request manager 44 coupled to the computer interface 42.
- the request manager 44 manages the requests received at the computer interface 42 as described above. In particular, the request manager 44 allocates each request to an appropriate one of the job queues 31 and assigns a unique job identifier (ID) to each request (both internal and external).
- the job IDs serve various purposes which are described later.
- the API 14 can be implemented as a server (API server) or server pool.
- the request manager 44 can be realized as a pool of servers, and the computer interface 42 can be provided at least in part by a load balancer which receives requests on behalf of the server pool and allocates each request to one of the servers of the server pool 44, which in turn allocates it to the appropriate job queue.
- the API 14 is in the form of at least one computer device (such as a server) and any associated hardware configured to perform the API functions described herein.
- the computer interface 42 represents the combination of hardware and software that sends and receives requests.
- the request manager 44 represents the combination of hardware and software that manages those requests. Requests are directed to a network address of the computer interface, such as a URL or URI associated therewith.
- the API 14 can be a Web API, with at least one Web address provided for this purpose.
- One or multiple such network addresses can be provided for receiving incoming requests.
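The request manager's two duties described above, allocating each request to a job queue and assigning it a unique job ID, can be sketched in a few lines. This is only an illustration: the `RequestManager` class, the queue kinds (`"compose"`, `"arrange"`, `"render"`) and the payload shape are assumptions for this example, and UUIDs are just one convenient way to obtain unique identifiers.

```python
import uuid
from collections import deque


class RequestManager:
    """Toy request manager: one FIFO queue per request kind (kinds are hypothetical)."""

    def __init__(self, kinds):
        self.queues = {kind: deque() for kind in kinds}

    def submit(self, kind, payload):
        """Assign a unique job ID and append the job to the queue for `kind`."""
        if kind not in self.queues:
            raise ValueError(f"no job queue for request kind {kind!r}")
        job_id = uuid.uuid4().hex
        self.queues[kind].append({"job_id": job_id, "payload": payload})
        return job_id


manager = RequestManager(["compose", "arrange", "render"])
first = manager.submit("arrange", {"track_id": "t1"})
second = manager.submit("render", {"track_id": "t1"})
print(first != second, len(manager.queues["arrange"]))
```

In a deployment like the one described, a load balancer would sit in front of several such managers; the sketch keeps everything in one process for clarity.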
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Auxiliary Devices For Music (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
The invention relates to a method in which at least one user-defined music production parameter is received at a music production system and used to produce a custom piece of music in a digital musical notation format, which is rendered into audio format for outputting to the user. A preview audio render is created for outputting to the user before the rendering step has completed. This uses pre-generated music segments stored in audio format, the music segments having been generated by producing a plurality of sections of music according to different predetermined music production parameters and rendering the plurality of sections of music into audio format. The preview render is created by matching sections of the custom piece of music to different pre-generated music segments and sequencing audio data of the different pre-generated music segments, the preview render comprising the sequenced audio data. This uses production metadata indicating the predetermined music production parameters used to generate the music segments.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1820266.3 | 2018-12-12 | | |
| GB1820266.3A GB2581319B (en) | 2018-12-12 | 2018-12-12 | Automated music production |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020121225A1 (fr) | 2020-06-18 |
Family
ID=65147312
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2019/060674 (Ceased), WO2020121225A1 (fr) | 2018-12-12 | 2019-12-11 | Automated music production |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB2581319B (fr) |
| WO (1) | WO2020121225A1 (fr) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114005424A (zh) * | 2021-09-16 | 2022-02-01 | 北京灵动音科技有限公司 | Information processing method, apparatus, electronic device and storage medium |
| US20220100820A1 (en) * | 2019-01-23 | 2022-03-31 | Sony Group Corporation | Information processing system, information processing method, and program |
| WO2022160054A1 (fr) * | 2021-01-29 | 2022-08-04 | 1227997 B.C. Ltd. | Audio processing and artificial intelligence system and methodology for automatically composing, performing, mixing and compiling large collections of music |
| US11450301B2 (en) * | 2018-05-24 | 2022-09-20 | Aimi Inc. | Music generator |
| CN115881063A (zh) * | 2021-09-23 | 2023-03-31 | 北京小米移动软件有限公司 | Music generation method, apparatus and storage medium |
| CN119759215A (zh) * | 2025-03-06 | 2025-04-04 | 保利文化传播有限公司 | Method and system for interaction between virtual digital cultural relics and an audience |
| WO2025245618A1 (fr) * | 2024-05-30 | 2025-12-04 | Npi Systems Ltd. | Systems and methods for real-time musical accompaniment using artificial intelligence |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112037745B (zh) * | 2020-09-10 | 2022-06-03 | 电子科技大学 | Music composition system based on a neural network model |
| EP4020458B1 (fr) * | 2020-12-28 | 2026-02-11 | Bellevue Investments GmbH & Co. KGaA | Method for template-based generation of variants of a song |
| EP4024392A1 (fr) * | 2020-12-31 | 2022-07-06 | Bellevue Investments GmbH & Co. KGaA | Energy-based song construction method and system |
| GB2615223B (en) * | 2021-03-31 | 2024-07-24 | Daaci Ltd | System and methods for automatically generating a musical composition having audibly correct form |
| GB2615222B (en) * | 2021-03-31 | 2024-07-24 | Daaci Ltd | System and methods for automatically generating a musical composition having audibly correct form |
| GB2605440A (en) * | 2021-03-31 | 2022-10-05 | Daaci Ltd | System and methods for automatically generating a muscial composition having audibly correct form |
| GB2615224A (en) * | 2021-03-31 | 2023-08-02 | Daaci Ltd | System and methods for automatically generating a musical composition having audibly correct form |
| GB2615221B (en) * | 2021-03-31 | 2024-07-24 | Daaci Ltd | System and methods for automatically generating a musical composition having audibly correct form |
| CN113903367B (zh) * | 2021-09-30 | 2023-06-16 | 湖南卡罗德钢琴有限公司 | Acquisition and restoration method based on a fully intelligent piano system |
| US12505820B2 (en) * | 2022-05-05 | 2025-12-23 | Lemon, Inc. | Approach to automatic music remix based on style templates |
| EP4614492A1 (fr) * | 2024-03-06 | 2025-09-10 | Bellevue Investments GmbH & Co. KGaA | Generative music system using rule-based algorithms and AI models |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090088877A1 (en) * | 2005-04-25 | 2009-04-02 | Sony Corporation | Musical Content Reproducing Device and Musical Content Reproducing Method |
| US20150148927A1 (en) * | 2003-01-07 | 2015-05-28 | Medialab Solutions Corp. | Systems and methods for portable audio synthesis |
| US20170092247A1 (en) * | 2015-09-29 | 2017-03-30 | Amper Music, Inc. | Machines, systems, processes for automated music composition and generation employing linguistic and/or graphical icon based musical experience descriptors |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2001086628A2 (fr) * | 2000-05-05 | 2001-11-15 | Sseyo Limited | Computerised production of sound sequences |
| US20060180007A1 (en) * | 2005-01-05 | 2006-08-17 | Mcclinsey Jason | Music and audio composition system |
| US7863511B2 (en) * | 2007-02-09 | 2011-01-04 | Avid Technology, Inc. | System for and method of generating audio sequences of prescribed duration |
| US11972746B2 (en) * | 2018-09-14 | 2024-04-30 | Bellevue Investments Gmbh & Co. Kgaa | Method and system for hybrid AI-based song construction |
- 2018-12-12: GB application GB1820266.3A, patent GB2581319B (en), status: Active
- 2019-12-11: WO application PCT/IB2019/060674, patent WO2020121225A1 (fr), status: Ceased (not active)
Also Published As
| Publication number | Publication date |
|---|---|
| GB2581319B (en) | 2022-05-25 |
| GB2581319A (en) | 2020-08-19 |
| GB201820266D0 (en) | 2019-01-30 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2020121225A1 (fr) | Automated music production | |
| US12051394B2 (en) | Automated midi music composition server | |
| US12039959B2 (en) | Automated music composition and generation system employing virtual musical instrument libraries for producing notes contained in the digital pieces of automatically composed music | |
| KR102459109B1 (ko) | Music generator | |
| US10854180B2 (en) | Method of and system for controlling the qualities of musical energy embodied in and expressed by digital music to be automatically composed and generated by an automated music composition and generation engine | |
| US11037538B2 (en) | Method of and system for automated musical arrangement and musical instrument performance style transformation supported within an automated music performance system | |
| US12205565B2 (en) | Music generator generation of continuous personalized music | |
| US10964299B1 (en) | Method of and system for automatically generating digital performances of music compositions using notes selected from virtual musical instruments based on the music-theoretic states of the music compositions | |
| US9070351B2 (en) | Adjustment of song length | |
| US11024275B2 (en) | Method of digitally performing a music composition using virtual musical instruments having performance logic executing within a virtual musical instrument (VMI) library management system | |
| KR20220128672A (ko) | Music content generation | |
| US20240038205A1 (en) | Systems, apparatuses, and/or methods for real-time adaptive music generation | |
| JP2025540804A (ja) | Method, system and computer program for generating an audio output file | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19895051; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20/09/2021) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19895051; Country of ref document: EP; Kind code of ref document: A1 |