WO2007133760A2 - Method and system for music information retrieval - Google Patents

Method and system for music information retrieval

Info

Publication number
WO2007133760A2
Authority
WO
WIPO (PCT)
Prior art keywords
music
database
user
audio clip
advertisements
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2007/011599
Other languages
English (en)
Other versions
WO2007133760A3 (fr)
Inventor
Frank Geshwind
Todd Carter
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OWL MULTIMEDIA Inc
Original Assignee
OWL MULTIMEDIA Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/715,863 (US20070214133A1)
Application filed by OWL MULTIMEDIA Inc
Publication of WO2007133760A2
Publication of WO2007133760A3
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/632 Query formulation
    • G06F16/634 Query by example, e.g. query by humming
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121 Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131 Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141 Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process

Definitions

  • the present invention relates to music information retrieval in general, and more particularly to systems and methods for searching or finding music with music, by searching, e.g., for music from a library that has a sound that is similar to a given sound provided as a search query, and to methods and systems for tracking revenue generated by these computer-user interactions.
  • searching e.g., for music from a library that has a sound that is similar to a given sound provided as a search query
  • methods and systems for tracking revenue generated by these computer-user interactions include, inter alia, systems that allow a user to discover unknown music, and systems that allow a user to look for music based directly on queries formed from sounds that the user likes.
  • the metadata does not fully characterize the sound of the music, and so the searches fall short in many respects when a user is looking for a particular "sound" or "feel" of the music in any but the coarsest of senses (i.e., a particular artist or genre can be found, but one has difficulty, for example, finding music that contains sounds similar to the guitar solo in a particular recording that the user has on his computer).
  • Some related art systems are based on musical audio features, or are content based. These typically characterize the digital signals that comprise the music tracks, and relate to the whole music track.
  • U.S. Patent 7,081,579, which is incorporated by reference in its entirety, recites "determining an average value of the coefficients for each characteristic from each said part of said selected song file." It calls for utilizing a whole-music-track characterizing technique, wherein the system parameters are averaged to characterize an entire music track.
  • Such systems have several disadvantages. Typically the features available to practitioners today do not fully capture the richness of human perception of media. Also, it is often beyond the capacity of currently available algorithms to fully characterize and represent the complexity of an entire media track, song, performance or program.
  • the present invention relates in part to the use of "clips" (sub-portions of the media files), that is, smaller sections of media files that are statistically more likely to have a single "character" or sound or quality.
  • Some related art systems use, for example, excerpted music clips (sub-portions of the whole track) for audio summarization. This allows users to browse collections and hear portions of the track(s) without taking the time to hear the whole track. But these systems do not teach using these clips for searching, active learning or query refining in accordance with an embodiment of the present invention.
  • the present invention relates to finding music based on the sound of segments of music taken from a possibly larger piece of music.
  • Present-day text-based information retrieval is largely based on the notion of a "key word".
  • text-based information retrieval systems provide a means for users to search for documents that contain a particular word or phrase.
  • the system and method provides ways for users to search for music based on "key sounds" analogous to key words.
  • complex queries can be generated by combining clips and other information in accordance with an embodiment of the present invention.
  • U.S. Patent No. 6,674,452 which is incorporated herein by reference in its entirety, describes a Graphical User Interface for building complex music information retrieval queries by combining elements of a query. Also a use of music "segmentation" is discussed in U.S. Patent No. 5,918,223, which is incorporated herein by reference in its entirety, and which describes systematic splitting of music files into smaller pieces for analysis, primarily to combine the results of such splitting by averaging the data. It also describes using the segmented data on a predetermined library of music in order to characterize segments within the predetermined library.
  • U.S. Patent No. 7,081,579 also discusses "section processing" in which a single representative segment is selected for music in a predetermined library, by comparing each segment to the averaged track. While elements of these related systems can be used in conjunction with the methods and systems of the present invention, these related art systems neither teach nor contemplate the present invention, including but not limited to the way in which clips are used to specify and refine queries, the way data is indexed and searched in the database, and the way in which results are provided.
  • the present invention relates in part to more efficient ways of performing content based searches. Indeed a very large database can be required in order to systematically catalog sounds within pieces of music, over a possibly large library of music - larger, a priori, than the database required to catalog a single sound summary for each piece of music.
  • the present invention relates to methods for using content based features and approximate similarity techniques, such as but not limited to approximate nearest neighbor algorithms and locality sensitive hashing to efficiently store and index information about a library of music, and efficiently search through this index.
  • the present invention relates in part to methods and systems for choosing and displaying advertisements in connection with music search, discovery and recommendation.
  • Related art systems exist for displaying advertisements in connection with search results, such as US Patent 6,269,361, which is incorporated herein by reference in its entirety, and which describes a system for influencing the position for a search listing within a search result list generated by an Internet search engine, based on search terms comprising one or more keywords.
  • because the present invention relates in part to searching for music based on the sound features of the music, it analogously relates to influencing the position for a search listing within a search result list generated by an Internet search engine, based on search terms comprising one or more music features, something which the related art neither teaches nor contemplates.
  • a web site comprises a web server with web pages and files including client application code and server code, databases, and other components, each as described herein and additionally comprising those standard elements of a web server, known to those of skill in the art.
  • the client application provides an interface allowing a user to specify a first audio clip (the query).
  • the query clip is comprised of one or more clips, segments or time windows of sound taken from a potentially larger music, sound, audio or media file.
  • this larger music file is specified and supplied from the user's computer, and/or from a library of music files on the web server, and/or from third-party music collections and/or servers.
  • This query clip is processed by the client application to produce a characteristic set of query sound features.
  • the query sound features are passed to the server by the client application.
  • the server additionally comprises a database of sound features for a large library of music clips.
  • the server processes the query sound features by searching the database to find those music clips that are closest to or match the query sound features. References to the resulting/corresponding music files (the query results) are passed back to the client application.
  • the client application displays the query results.
  • the client is additionally comprised of components that allow the user to do one or more of: play back or preview the sound clips corresponding to the results, refine the query results, get additional information related to the results, conduct new queries, download one or more results, label or tag, rate or review one or more results, share one or more results, create a new musical composition comprising one or more results, purchase copies of the music files returned, generate and purchase ringtones and purchase other merchandise associated or affiliated with the results.
  • users select one or more sound clip examples from one or more sources including but not limited to the user's personal library, and/or search results from embodiments of the present invention. Sound features from this collection of music clips are generated in accordance with the methods disclosed herein. These sound features are used to filter sets of music, audio tracks and/or clips to create search results for the user. These results are generated and presented as a personalized search in accordance with the search and recommendation system disclosed herein.
  • the filter is used to generate a live feed of new music that is of potential interest to the user in accordance with an embodiment of the present invention.
  • the present invention in accordance with an embodiment comprises a system for receiving, processing and storing new music files from one or more new music file providers, a system for filtering this collection of new music files to determine a subset of the new music files estimated to be of interest to a user in accordance with the filter as described herein, and a system for providing the results of such a process to the user that could include, but is not limited to, XML feeds standard in the art such as RSS or ATOM feeds, or, for another example, by periodic or real-time email alerts to the user(s) as soon as new music is encountered that is deemed to be of interest to the user.
  • Such refinement and creation of modified queries is accomplished in accordance with the present invention by the methods and systems disclosed herein, and in part using the methods and systems disclosed in U.S. Patent Application 11/230,949, filed 9/15/2005, Geshwind et al., System and Method for Document Analysis, Processing and Information Extraction, which is incorporated herein by reference in its entirety.
  • a computer based method for searching a music library comprises the steps of receiving an audio clip from a user; computing musical features of the audio clip; transmitting the musical features of the audio clip to a server; and receiving a segment of a music file from the server determined to be similar to the audio clip by comparing the musical features of the audio clip to musical features associated with segments of a plurality of music files stored in the music library to find the segment from the segments of the plurality of music files stored in the music library that is similar to the audio clip.
  • a system for searching a music library comprises a music library and a client device connected to a server over a communications network.
  • the music library comprises a plurality of music files and a plurality of musical features associated with segments of the plurality of music files.
  • the client device, associated with a user and connected to a communications network, selects an audio clip, plays said audio clip and computes musical features of the audio clip.
  • the server receives the musical features of the audio clip from the client device over the communications network and compares the musical features of the audio clip to the musical features stored in the music library to find a segment from segments of the plurality of music files that is similar to the audio clip.
  • a computer medium comprises a code for searching a music library.
  • the code comprises instructions for: receiving an audio clip from a user; computing musical features of the audio clip; transmitting the musical features of the audio clip to a server; and receiving a segment of a music file from the server determined to be similar to the audio clip by comparing the musical features of the audio clip to musical features associated with segments of a plurality of music files stored in the music library to find the segment from the segments of the plurality of music files stored in the music library that is similar to the audio clip.
  • the present invention accepts input music and/or audio clip in a set of predetermined formats which can include, without limitation, music formats known in the art such as WAV, MP3, and AAC formats.
  • a suitable decoder/decompression element for decoding/decompressing the input audio into raw digital audio samples.
  • advertisements are accepted from advertisers and are selected for display along with music search, discovery and recommendation results.
  • Advertisers can be but are not limited to music owners, publishers or artists.
  • Advertisers are provided with a system in accordance with an embodiment of the present invention, in order to specify music content and other advertising that the advertisers wish to promote in specified contexts.
  • the system is comprised of an interface that allows the advertiser to specify this context by associating music features with advertisements.
  • the context occurs in an embodiment of the present invention, when the music features associated with an advertisement are sufficiently similar to music features corresponding to a search query.
  • Associated databases to track these specifics, to record the display of the advertisements and other associated events such as but not limited to clicking by the user on the advertisements, user account and billing information, are provided in accordance with an embodiment of the present invention.
  • the advertisements are displayed when the associated data arises in connection with a user conducting a query using the systems described herein, wherein the data matches the data associated with the advertisement including but not limited to the sound of specified music and/or other music metadata associated with the advertisements as described herein.
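The matching rule described above, in which an advertisement is displayed when its associated music features are sufficiently similar to the query features, can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_ads`, the dictionary layout of an advertisement, the choice of Euclidean distance and the `max_distance` threshold are hypothetical and are not taken from the disclosure.

```python
import math


def select_ads(query_features, ads, max_distance):
    """Return ids of ads whose features lie within max_distance of the query.

    `ads` is a list of dicts with hypothetical keys "id" and "features".
    Euclidean distance is an assumption; the text only requires that the
    features be "sufficiently similar".
    """
    matches = []
    for ad in ads:
        distance = math.dist(query_features, ad["features"])
        if distance <= max_distance:
            matches.append((distance, ad["id"]))
    # Closest advertisements first.
    return [ad_id for _, ad_id in sorted(matches)]


example_ads = [
    {"id": "ad1", "features": [0.0, 0.0]},
    {"id": "ad2", "features": [5.0, 5.0]},
    {"id": "ad3", "features": [0.5, 0.0]},
]
selected = select_ads([0.0, 0.0], example_ads, max_distance=1.0)
```

A production system would also record each display and click in the associated databases mentioned above; that bookkeeping is omitted here.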
  • a computer based method for selecting and displaying advertisements comprises the steps of receiving an audio clip from a user; computing musical features of the audio clip and transmitting the musical features of the audio clip to a server.
  • the computer based method further comprises the step of receiving a set of advertisements from the server determined to be relevant in the context of the audio clip by comparing the musical features of the audio clip to musical features stored in a database and associated with a plurality of advertisements stored in the database to find the set of advertisements from the database that is determined to be relevant in the context of the audio clip.
  • a system for selecting and displaying advertisements comprises a client device, a server and a database.
  • the client device, associated with a user and connected to a communications network, receives an audio clip from the user and computes musical features of said audio clip.
  • the server receives the musical features of the audio clip from the client device over the communications network.
  • the server determines a set of advertisements to be relevant in the context of the audio clip by comparing the musical features of the audio clip to musical features stored in a database and associated with a plurality of advertisements stored in the database to find the set of advertisements from the database that is determined to be relevant in the context of the audio clip.
  • the server transmits the set of advertisements to the client device over the communications network.
  • a computer medium comprises a code for selecting and displaying advertisements.
  • the code comprises instructions for receiving an audio clip from a user, computing musical features of the audio clip, transmitting the musical features of the audio clip to a server, and receiving a set of advertisements from the server determined to be relevant in the context of the audio clip by comparing the musical features of the audio clip to musical features stored in a database and associated with a plurality of advertisements stored in the database to find the set of advertisements from the database that is determined to be relevant in the context of the audio clip.
  • Figure 3 shows a high-level client side block diagram in accordance with an embodiment of the present invention
  • Figure 5A shows a block diagram of a clip feature vector calculation system in accordance with an embodiment of the present invention
  • Figure 5C shows a block diagram of normalized temporal feature computation in accordance with an embodiment of the present invention
  • Figure 6 shows a block diagram of a system for building a server-side clip feature vector database in accordance with an embodiment of the present invention
  • the server (206) calculates hash function scores for the query sent in step 250, performs a pre-search based on hash function matching in step 255, and then performs a refined search based on, for example but not limited to, Euclidean norm distance of music features restricted to the subset of matches from the hash function pre-search in step 260.
  • the refined search can be based on other similarity measures including but not limited to diffusion distance as described in the references cited herein.
  • the server (206) then sends music tracks and clips corresponding to the refined search results to the client application (204) in step 265.
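The two-stage search of steps 250 to 265 (a hash-based pre-search followed by a refined Euclidean-distance search over the pre-search matches) might look roughly like the sketch below. The disclosure does not specify the hash functions, so the sign-of-random-projection hash used here is an assumed stand-in for a locality sensitive hash, and all function names are hypothetical.

```python
import math
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Random projection directions for an assumed sign-based hash."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(clips, planes):
    """Map each hash key to the ids of the clips that share it."""
    index = {}
    for clip_id, vec in clips.items():
        index.setdefault(hash_key(vec, planes), []).append(clip_id)
    return index


def search(query, clips, index, planes, r=3):
    """Pre-search by hash bucket, then rank candidates by Euclidean distance."""
    candidates = index.get(hash_key(query, planes), [])
    ranked = sorted(candidates, key=lambda cid: math.dist(query, clips[cid]))
    return ranked[:r]


library = {"a": [1.0, 0.0, 0.5], "b": [-1.0, 2.0, 0.0], "c": [0.9, 0.1, 0.5]}
planes = make_hyperplanes(n_bits=4, dim=3)
index = build_index(library, planes)
results = search([1.0, 0.0, 0.5], library, index, planes)
```

A single hash table can miss near neighbors that fall into a different bucket; practical systems usually query several independent tables, which this sketch leaves out for brevity.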
  • FIG. 3 shows a high-level client side block diagram in accordance with an embodiment of the present invention.
  • a user (202) opens a query file on the user's computer in step 305, via the client application (204).
  • the file is played and a selection is made, generating a query request in step 310.
  • the query is comprised of the clip features as described herein.
  • Figure 4 shows some details of this clip selection process in accordance with an embodiment of the present invention.
  • a circular buffer is kept. This buffer holds the decoded sample values of the music (e.g., PCM samples), for a fixed time window such as 10 seconds.
  • a predetermined sized window such as a ten second window advances by one second of music file for every one second of real time. This repeats until the user hits the search button (or, e.g., manually grabs and drags the selection window) in step 420.
  • the current buffer is used to generate a search query vector in accordance with an embodiment of the present invention in step 425.
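The circular buffer used for clip selection can be sketched with a bounded deque. This is a minimal illustration, assuming decoded PCM samples arrive in chunks during playback; the class name `ClipBuffer` and its methods are hypothetical.

```python
from collections import deque


class ClipBuffer:
    """Keeps only the most recent `window_seconds` of decoded samples."""

    def __init__(self, sample_rate, window_seconds=10):
        # A deque with maxlen silently discards the oldest samples.
        self.samples = deque(maxlen=sample_rate * window_seconds)

    def feed(self, decoded_samples):
        """Append newly decoded samples as playback advances."""
        self.samples.extend(decoded_samples)

    def snapshot(self):
        """Copy the current window, e.g. when the user hits search."""
        return list(self.samples)


# Tiny numbers for illustration: 4 samples per second, 2 second window.
buffer = ClipBuffer(sample_rate=4, window_seconds=2)
buffer.feed(range(10))
window = buffer.snapshot()
```

The snapshot would then be passed to the feature computation to produce the search query vector of step 425.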
  • the results of the query are sent from the server (206) to the client (204) in step 315.
  • the results are displayed on the user's computer in step 320; optionally, the user (202) creates a refined query request in step 325, and the process is repeated either with a whole new query, or with a refined query in step 330.
  • users (202) can use a clip from any one of the result tracks of the first query as a seed (i.e., a selected clip) for a new query.
  • a Mel-filter spectral weighting is applied in step 520 (this can reduce, e.g., the 512 frequency samples per time bin to, say, 40 frequency bins), and a logarithm is taken in step 525. This produces the Mel-Table.
  • the results are further processed to produce spectral features as shown in Figure 5B, and temporal features as shown in Figure 5C.
  • Figure 5B shows a block diagram of normalized spectral feature computation in accordance with an embodiment of the present invention.
  • the Mel-Table generated from the process depicted in Figure 5A is used to compute spectral features.
  • a DCT in frequency (for each time bin) is computed in step 540, and the 18 lowest-frequency samples are kept in step 545.
  • the mean and covariance of these 18-dimensional vectors, over the set of time bins, are computed in step 550.
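Steps 540 to 550 can be sketched in plain Python as follows. The input is assumed to be a Mel-Table given as a list of time bins, each holding the log Mel energies for that bin. The unnormalized DCT-II and the population covariance (dividing by the number of time bins) are assumptions, since the text does not fix either convention; the function names are hypothetical.

```python
import math


def dct_ii(values):
    """Unnormalized DCT-II of one time bin (scaling is an assumption)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def spectral_features(mel_table, keep=18):
    """Mean vector and covariance matrix of the lowest `keep` DCT coefficients."""
    coeffs = [dct_ii(row)[:keep] for row in mel_table]  # steps 540 and 545
    t = len(coeffs)
    d = len(coeffs[0])
    mean = [sum(c[j] for c in coeffs) / t for j in range(d)]
    cov = [
        [
            sum((c[i] - mean[i]) * (c[j] - mean[j]) for c in coeffs) / t
            for j in range(d)
        ]
        for i in range(d)
    ]  # step 550
    return mean, cov


# Two time bins of 40 Mel bands each, constant within a bin for easy checking.
toy_table = [[1.0] * 40, [3.0] * 40]
mean, cov = spectral_features(toy_table)
```

With real audio a numerical library would normally be used for the transform; the explicit loops here only make the individual steps visible.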
  • FIG. 6 shows a block diagram of a system for building a server-side clip feature vector database in accordance with an embodiment of the present invention.
  • N = 10 seconds
  • M = desired window shift
  • the algorithm shown loops over each track in a library in step 605, and a series of clips of length N seconds, with M second shifts in step 610. That is, for each track, a sequence of N second clips is produced by taking as a window the first N seconds of the then current track, and then shifting the window by M seconds to get the next window, etc.
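The windowing loop of steps 605 and 610 can be illustrated as below. The function name `clip_windows` is hypothetical, and dropping a trailing partial window shorter than N seconds is an assumption; the text does not say how the end of a track is handled.

```python
def clip_windows(track_seconds, n=10.0, m=1.0):
    """Return (start, end) times of N-second clips shifted by M seconds."""
    if n <= 0 or m <= 0:
        raise ValueError("window length and shift must be positive")
    windows = []
    k = 0
    # Keep only windows that fit entirely inside the track (assumption).
    while k * m + n <= track_seconds:
        windows.append((k * m, k * m + n))
        k += 1
    return windows


windows = clip_windows(13.0, n=10.0, m=1.0)
```

Each window would then be passed through the clip feature vector calculation and stored in the server-side database along with a reference to its track and offset.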
  • the query result is returned in step 850, which consists of the R closest music clips from within the set L, where the notion of closest is, for example but not limited to, in the sense of Euclidean distance. In other embodiments other distance functions can be used including without limitation diffusion distance as taught in the cited references.
  • Use of the webpage comprises use of the search interface as described in Figure 1, and then the corresponding use of the additional elements in the corresponding way, to play the result clips in any desired order, refine the search, and perform new searches.
  • Some embodiments additionally comprise a system and method for controlling and tracking revenue, and selling of advertisement and promotion related to the use of the information retrieval systems described herein, in accordance with an embodiment of the present invention.
  • advertisements can be promoted based on their relationship to the content being searched.
  • the present invention enables the promotion of music directly through the sound of the music.
  • Some embodiments of the present invention in this regard are comprised of a database disposed to receive, store, and serve information about an amount paid or to be paid for the promotion of a particular song (or artist, or for any of the songs from a collection, etc.).
  • a particular aspect of the present invention in this regard relates to the automated or assisted refinement of queries by using the results of a first query, computing statistics on metadata and other features from the set of results of this first query, and using these results to create a refined query in the style of the fr_rnatr_bin algorithms described in U.S. Patent Application 11/230,949.
  • this query refinement information can be presented to the user as a characterization of the clip, with an interface that allows the user to select elements of this characterization to refine the query.
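One simple way to compute statistics on the metadata of a first set of results, and to present them as a characterization the user can pick from, is to count the most frequent values per field. The sketch below does only that; it does not reproduce the algorithms of U.S. Patent Application 11/230,949, and the function name, field names and result layout are hypothetical.

```python
from collections import Counter


def metadata_summary(results, fields=("artist", "genre"), top=3):
    """Most common metadata values among the results of a first query."""
    summary = {}
    for field in fields:
        counts = Counter(r[field] for r in results if field in r)
        summary[field] = counts.most_common(top)
    return summary


first_results = [
    {"artist": "A", "genre": "folk"},
    {"artist": "B", "genre": "folk"},
    {"artist": "A", "genre": "rock"},
]
summary = metadata_summary(first_results)
```

An interface could list these values next to the results, and a selected value (for example the genre "folk") could be added as a constraint on the refined query.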
  • One example comprises a musical racing game played by a player and an opponent.
  • Game play comprises the opponent picking a challenge: the player is to start with a seed song or genre or artist (say, "Enya"), and a (typically very different) target song or genre or artist (say “Metallica”).
  • the player's goal is to try to jump from the seed to the target through music recommendations generated by the system, so the player:
  • The player's score for the round is computed from a predetermined formula, such as 10 minus the number of iterations that it takes to get from seed to target.
  • a game can consist of a variant of the game of Monopoly wherein, among other adaptations, the concepts of cities and real estate are replaced by the concepts of genres and artists.
  • Other elements of the game are adapted to the music industry in similar ways.
  • Game play proceeds by music recommendation events as described herein instead of the rolling of a die.
  • Players buy and sell the right to promote artists, and must pay each other when searches produce hits that contain artists owned by the other players.
  • Some embodiments additionally comprise bonus points if the player finds some new music that the opponent likes, or if the player comes across the "secret artist of the day", etc.
  • Music fingerprinting is the process of identifying music from an audio segment instance of the music, and can involve the identification of artist, title, genre, album, performance date or instance and other metadata, from algorithmically "listening" to the music.
  • a music fingerprint in this regard is a data summary of the music or a segment of the music, from which the music can be uniquely identified as described.
  • the music features described herein are used as a fingerprint of the music. Indeed, one finds that in practicing an embodiment of the search invention as disclosed herein, the music file from which the search query arises, when it happens to also be in the database/music library, is returned as the first/best result of the query.
  • a straight comparison can be conducted in a neighborhood of each of the resulting target clips within their corresponding full music files (e.g., via a local matched filter using the query clip as the filter), to produce an additional score of confidence or match.
  • a result can be returned only if this score is greater than a pre-determined threshold.
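The confirmation step described above, a local comparison of the query clip against a neighborhood of each candidate location followed by a threshold test, can be sketched with a normalized correlation. The use of normalized correlation as the matched-filter score, the function names and the example threshold are assumptions made for illustration.

```python
import math


def best_match_score(query, signal, center, radius):
    """Highest normalized correlation of `query` within center +/- radius."""
    n = len(query)
    q_norm = math.sqrt(sum(x * x for x in query))
    best = 0.0
    lo = max(0, center - radius)
    hi = min(len(signal) - n, center + radius)
    for start in range(lo, hi + 1):
        segment = signal[start:start + n]
        s_norm = math.sqrt(sum(x * x for x in segment))
        if q_norm == 0 or s_norm == 0:
            continue  # silent segment or silent query: no meaningful score
        score = sum(a * b for a, b in zip(query, segment)) / (q_norm * s_norm)
        best = max(best, score)
    return best


def is_confirmed(query, signal, center, radius, threshold=0.95):
    """Return a result only if the score exceeds the chosen threshold."""
    return best_match_score(query, signal, center, radius) > threshold


track = [0.0, 0.0, 1.0, -2.0, 3.0, 0.0, 0.0, 1.0]
score = best_match_score([1.0, -2.0, 3.0], track, center=2, radius=2)
```

A score close to 1 means the query samples appear almost exactly at some offset in the neighborhood; real recordings would need tolerance for encoding differences, which the threshold provides.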
  • tags or labels such as labels provided by users, to describe clips.
  • Such embodiments comprise one or more interface elements allowing users to specify tags associated with a clip, to specify tags to be used as queries for searches, or to augment queries, and a database for storing and retrieving the tags and linking the tags with the associated clips. These tags can then be used as additional feature data in any of the embodiments described herein.
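A database that links tags with clips in both directions can be illustrated with two in-memory maps. This is a hypothetical stand-in for the tag database, not the disclosed design; the class name `TagStore`, its methods, and the lower-casing of tags are assumptions.

```python
from collections import defaultdict


class TagStore:
    """In-memory stand-in for a database linking tags and clips."""

    def __init__(self):
        self._clips_by_tag = defaultdict(set)
        self._tags_by_clip = defaultdict(set)

    def add_tag(self, clip_id, tag):
        tag = tag.strip().lower()  # normalization is an assumption
        self._clips_by_tag[tag].add(clip_id)
        self._tags_by_clip[clip_id].add(tag)

    def tags_for(self, clip_id):
        """All tags attached to one clip."""
        return sorted(self._tags_by_clip.get(clip_id, ()))

    def clips_for(self, tags):
        """Clips carrying every one of the given tags (a tag query)."""
        wanted = [t.strip().lower() for t in tags]
        if not wanted:
            return []
        sets = [self._clips_by_tag.get(t, set()) for t in wanted]
        return sorted(set.intersection(*sets))


store = TagStore()
store.add_tag("clip1", "Guitar Solo")
store.add_tag("clip1", "live")
store.add_tag("clip2", "live")
```

To use tags as additional feature data, the set of clips returned by `clips_for` could restrict or re-rank the results of a sound-based search.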
  • a system and method allowing a user to search for lyrics within music, and more particularly to search for the offset of a given textually specified lyric(s) into a segment of digital audio known or believed to contain the corresponding sung, spoken, voiced or otherwise uttered lyric(s).
  • the present system comprises a search query specification element (1000), a song or song database element (1010), a search element (1020), a controlling element (1030) and a result presenting element (1040).
  • a user enters a query with the query specification element (1000), the query comprising one or more words of text.
  • the controller receives this query request and causes the search element (1020) to search the database element (1010), to find one or more results which are then presented by the result presenting element (1040).
  • a result comprises the specification of a segment of digital audio, together with a time offset t, such that at approximately the time "t" within the audio segment, the lyrics corresponding to the search query are uttered, according to the search algorithm within (1020).
  • the controlling element (1030) comprises a client-server Internet application, comprising one or more client applications (i.e., including but not limited to computer programs, scripts, web pages, Java code, JavaScript, Ajax and the like), and one or more server applications.
  • the query specification element (1000) comprises a text entry field on a webpage served by the server and rendered by the client of the controlling element (1030).
  • the database (1010) comprises a set of digital audio segments, and a set of corresponding lyrics files.
  • the audio segments are, for example, audio recordings of performed music.
  • the lyrics files contain the text of the lyrics of the songs in the corresponding music files, but they do not necessarily have a priori information about the precise or approximate time-offset within the music, at which any given lyric is uttered (although in some embodiments, such information is also in the database and can be used to generate or augment the search results).
  • the search element (1020) comprises database access components, and an algorithm or collection of algorithms for finding the offset of lyric utterance given the target lyric(s), a music file, and a lyrics file containing the target lyric(s).
  • a user types a word or phrase into a search box, and receives one or more short audio clips containing the word (together with relevant meta-information so that the user will know from which audio pieces the corresponding clips were taken, perhaps how to buy the songs, etc.).
  • M_ij = Sound_Similarity_Matrix(audio_file, win_step, win_len)
  • audio_file : source audio file to search (or an index or pointer to such a file)
  • win_step : window step size for the similarity computation
  • win_len : the length of a window for the similarity computation
  • audio_1 = pre_process(audio_file) % (in one embodiment, pre_process does nothing and simply returns the whole file; in another embodiment, pre_process filters audio_file and returns only that portion of audio_file that corresponds to speech segments, with the intervening portions removed.)
  • feat_i = get_features(win) %
  • these can be, e.g., FFT, MFCC, cepstral, temporal samples (i.e., the identity function) or filtered sub-samples, to name just a few; others are possible
  • Compute M_ij = similarity(feat_i, feat_j) % similarity can be, e.g., inner product or any other similarity measure
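The Sound_Similarity_Matrix routine outlined above might look like the following in Python. This is a sketch under stated assumptions: audio is a plain list of samples, pre_process is taken to be the identity embodiment, features are the raw temporal samples, and similarity is the inner product. The helper name `windows` is hypothetical.

```python
def windows(samples, win_step, win_len):
    """Split samples into windows of win_len taken every win_step samples."""
    return [samples[i:i + win_len]
            for i in range(0, len(samples) - win_len + 1, win_step)]

def get_features(win):
    """Identity features (raw temporal samples), one of the options listed in the text."""
    return list(win)

def similarity(feat_i, feat_j):
    """Inner-product similarity between two feature vectors."""
    return sum(a * b for a, b in zip(feat_i, feat_j))

def sound_similarity_matrix(samples, win_step, win_len):
    """Return M where M[i][j] is the similarity between windows i and j."""
    feats = [get_features(w) for w in windows(samples, win_step, win_len)]
    return [[similarity(fi, fj) for fj in feats] for fi in feats]
```

Swapping `get_features` for an FFT or MFCC extractor would change only that one function; the windowing and the pairwise comparison stay the same.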
  • lyrics_file : textual lyrics file for the lyrics to audio_file
  • Offset : one or more offsets into audio_file, approximately where the lyrics are believed to be uttered Algorithm:
  • a user or other source can provide additional information about the alignment between textual lyrics and utterances within an audio file.
  • the database can simply be augmented with pre-computed data on this alignment, and this can be used to conduct the searches described.
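Where the database already holds pre-computed alignment data, the lyric search reduces to a lookup. The sketch below assumes a hypothetical layout in which each song maps to a list of (word, offset in seconds) pairs in sung order; the function name `search_lyrics` and that layout are illustrative and do not come from the source.

```python
def search_lyrics(query, alignment_db):
    """Return (song_id, offset_seconds) pairs where the query words are uttered.

    alignment_db maps song_id -> list of (word, offset_seconds) in sung order.
    """
    words = query.lower().split()
    results = []
    if not words:
        return results
    for song_id, aligned in alignment_db.items():
        sung = [w.lower() for w, _ in aligned]
        # Slide the query phrase along the sung words and report the offset
        # of the first word of every exact phrase match.
        for i in range(len(sung) - len(words) + 1):
            if sung[i:i + len(words)] == words:
                results.append((song_id, aligned[i][1]))
    return results
```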
  • the methods and systems described herein are used to present a user with a first lyrics-to-utterance alignment. The user examines this alignment and listens to the corresponding audio files, and corrects the offsets. This corrected data is then entered into a database. The user can be the same as the user in the embodiments described elsewhere or another user.
  • speech recognition algorithms are also used to align textual lyrics with audio utterances, as known to one of skill in the art, in combination with or instead of certain of the elements described herein.
  • Some embodiments of the present invention are additionally comprised of relevance feedback mechanisms.
  • Such an embodiment is comprised of a search or recommendation system as disclosed herein, and one or more mechanisms for measuring the user's reaction to the search recommendation results.
  • Such mechanisms can be comprised of active interface elements, for example like the "thumbs up" and "thumbs down" interface on a standard TiVo remote control (see, for example, the TiVo Series2 DVR Viewers Guide, pages 8-9, in the section entitled "TiVo Suggestions"), or a rating on a scale of 1 to 10, or some other rating or feedback system known to one of skill in the art, and can also be comprised of passive relevance assessment elements such as the number of times or amount of time that a user listens to a particular result, information about the use of rewind, fast forward or skip buttons, use of or changes to the volume settings, and the like.
  • Relevance assessment can be comprised of personal/individual information such as that relating to the user's prior choices, contents of the user's library, and the like, and relevance assessment can also be comprised of community data such as collaborative filtering data, methods and techniques.
  • a classifier such as those standard in the art including but not limited to those based on kernel methods, support vector machines, classification and regression trees, nearest neighbor classifiers and the like, and/or recommendation systems such as those additionally disclosed herein, is trained on a first set of data.
  • a search or recommendation is performed in accordance with the present invention.
  • the user is allowed to interact with the results to produce relevance information as disclosed herein. This relevance information is then used to re-train the classifier or relevance method.
  • the search or recommendation results can then be reordered, and/or a new search or recommendation performed in accordance with the relevance modified data, and new results provided.
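The train, search, feedback, re-train and reorder loop described above can be illustrated with a nearest neighbor classifier, one of the classifier families the text names. This is a minimal sketch: the class `NearestNeighborRelevance`, the +1/-1 labels standing in for thumbs up and thumbs down, and the `reorder` helper are assumptions made for the example.

```python
def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class NearestNeighborRelevance:
    """1-nearest-neighbor relevance model over feature vectors."""

    def __init__(self, examples):
        # examples: list of (feature_vector, label), label +1 (liked) or -1 (disliked)
        self.examples = list(examples)

    def retrain(self, feedback):
        """Fold new (feature_vector, label) feedback into the training set."""
        self.examples.extend(feedback)

    def score(self, features):
        """Label of the closest training example; 0 if there is no training data."""
        if not self.examples:
            return 0
        nearest = min(self.examples, key=lambda ex: distance(ex[0], features))
        return nearest[1]

def reorder(results, model):
    """Stable sort of (track_id, features) so predicted-relevant tracks come first."""
    return sorted(results, key=lambda r: -model.score(r[1]))
```

After the user rates some results, calling `retrain` with those ratings and then `reorder` on the same result list yields the relevance-modified ordering.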
  • the present invention can additionally be used as an automatic seek button for looking for music on a digital radio, or as a method for creating playlists.
  • Certain embodiments of the present invention comprise systems for creating new music by mixing existing music, sounds, audio data, clips or samples.
  • Such an embodiment comprises one or more search and/or recommendation engines as disclosed herein, as well as components for mixing returned results into a destination track. The process can be iterated while keeping a persistent destination track.
  • Such an embodiment can comprise music mixing elements standard in the art including but not limited to slicing, fade-in, fade-out, special effects, echo, reverb, loudness adjustments, pitch adjustments, synchronization elements and the like.
  • An embodiment of the present invention comprises a method for finding similar users by measuring the similarity of the users' music collections in accordance with the methods disclosed herein. Additionally, such a system can create a virtual merged music collection comprised of the results, collections and preferences of the two users, for example as a component in an online social networking website.
  • An embodiment of the present invention comprises a system for specifying a series of clips from one or more sources. Such a series will be called a multiclip herein.
  • a multiclip provides a way for a music search engine to learn a user's preferences and to conduct queries by allowing users to identify, select, and search on regions of auditory interest within a music, audio, or media file from the user's computer.
  • a multiclip provides for a summary of a piece of music.
  • a multiclip is used to provide one or more clips sought to characterize the beginning, middle, and end of a piece of music. A search is then conducted in accordance with the present invention and the result provided to the user.
  • each sound in a library is automatically summarized using techniques known to those of skill in the art. Such techniques include but are not limited to identifying representative clips by forming a similarity matrix from the collection of segments of the sound at a given timescale (or at a plurality of timescales), and then taking the representative clips to correspond to regions of support of the top few eigenvectors of the similarity matrix.
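The eigenvector-based summarization above can be sketched with power iteration. This example is simplified and rests on assumptions: it uses only the single top eigenvector rather than the top few, takes a precomputed symmetric, non-negative similarity matrix as input, and defines the region of support as the entries within a hypothetical `fraction` of the largest magnitude.

```python
def top_eigenvector(matrix, iterations=200):
    """Power iteration for the dominant eigenvector of a symmetric non-negative matrix."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def representative_segments(matrix, fraction=0.5):
    """Indices of segments in the region of support of the top eigenvector."""
    v = top_eigenvector(matrix)
    peak = max(abs(x) for x in v)
    return [i for i, x in enumerate(v) if peak > 0 and abs(x) >= fraction * peak]
```

Segments that are mutually similar reinforce one another under repeated multiplication, so they dominate the eigenvector, while an isolated segment decays toward zero.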
  • each piece of music in a library of music may be summarized by a multiclip comprising a few clips within the piece of music, together with the order of occurrence or the location of occurrence of the clips.
  • Some embodiments of the present invention are comprised of components for advertising.
  • the advertisements are stored in a database and are rendered in response to advertising opportunities as disclosed herein.
  • an advertising system comprises a music search, discovery, and/or recommendation service as described herein, a database of advertisements wherein the advertisements are associated with music features, and a web client server application as described, wherein the web client is comprised of a display comprising a music search section and an advertising section as shown in Figure 1.
  • search query data are sent to the server.
  • the server returns search results in response to the query data as described herein.
  • the server searches through the advertisement database to find advertisements whose associated musical features, as described herein, also match or are similar to the search query features.
  • Such features can include but are not limited to music features such as the spectral and temporal features described herein, as well as music metadata.
  • the advertising results can be ordered in a number of ways including but not limited to according to the degree of match, according to a price to be paid for or an expected return on rendering the advertisement, or a combination of those elements.
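One way to combine degree of match with price when ordering advertisements is sketched below. The `Ad` record, the cosine measure, the `min_match` cutoff and the product of match and bid are all illustrative assumptions; the source only says that match, price, expected return, or a combination can be used.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    ad_id: str
    features: list   # music features associated with the ad
    bid: float       # price to be paid for rendering the ad

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def rank_ads(query_features, ads, min_match=0.5):
    """Order sufficiently matching ads by match degree times bid, highest first."""
    scored = []
    for ad in ads:
        match = cosine(query_features, ad.features)
        if match >= min_match:
            scored.append((match * ad.bid, ad.ad_id))
    scored.sort(reverse=True)
    return [ad_id for _, ad_id in scored]
```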
  • the server sends search results back to the client application and also sends advertising results back to the client application.
  • the client renders the search results and the advertising results in their respective sections of the client application display area.
  • An embodiment of the present invention can comprise an advertising customer interface and advertising database analogous to similar systems known in the art and incorporating the elements described herein, and a system and method for the selection and rendering of advertisements in accordance with the present invention.
  • An advertising customer interface in accordance with an embodiment of the present invention comprises a customer interface such as but not limited to a web-based advertising customer client-server application.
  • applicant will call the advertising customer client-server application the customer client-server application (and customer application, customer client, customer server, etc), and will call the music search and discovery application the end-user client-server application (and end-user application, end-user server, end-user client, etc).
  • the customer application is illustrated by the block diagram in Figure 11. As depicted in Figure 11, the application has an entrance block (1150) by which a customer can choose to login to the system or register for an account.
  • the login block gets the user's credentials, such as id and password, and tests for validity. If the credentials are valid control is passed to the account summary block (1168) and otherwise back to the login/registration block (1150). If, from the entrance block (1150), the user chooses to register for a new account, control is passed to the registration block (1162).
  • the registration block collects user's account information such as contact first and last name, company name, identification of the set of music that the customer wishes to bid on, mailing and billing addresses and the like. This information is placed in the database for later activation. Once activated, an account is created for the user.
  • the account summary block (1168) displays welcome and summary information, such as but not limited to, the user's name and address, account balances, number of active advertising campaigns that the user has within the system, the number of impressions and clicks that the user's advertisements have received by use of the system, within the past accounting period, and other information about the account and account activity as dictated by the particulars of the application. From the account summary block (1168), the user may choose to manage advertising campaigns or logout. If the user selects logout, control is passed back to the entrance block (1150), and if the user selects to manage advertising campaigns, control is passed to the advertising campaigns management block (1172).
  • control is passed to the preview block (1180), where the user can review the choices just made and select "OK", in which case control is passed to a database entry block (1184) and the new or edited campaign is entered into the database, or cancel, in which case no entry is made into the database. In either case control is then passed back to the management block (1172). At any time the user can select cancel from blocks (1176) or (1180), and control is then passed back to the management block (1172). From the management block (1172) the user may also choose to list or browse the set of advertising campaigns that the user has within the system, in which case control is passed to a campaign listing block (1194).
  • block diagram is meant to illustrate a particular embodiment and is not meant to be limiting.
  • the individual functions and interface elements described need not be implemented as separate blocks or elements, and can be embodied, for example, in the logic and instructions of client/server code as server-side and client side scripts and programs.
  • An advertising database in accordance with an embodiment of the present invention is comprised of a database, the database being comprised of advertising customer information such as but not limited to contact name and address, login credentials such as user id and password, encrypted and made secure by methods known in the art, billing and other information, and a specification of which music is associated with the advertiser.
  • the database is also comprised of the information for all bids entered into the system, and all advertising content and data associated with advertisements entered into the system and this can include but is not limited to specific URL/links that a user is to be sent to if the user clicks on the associated advertisement; image and/or text information for the display of the advertisement; and/or sound to be played when the advertisement or a "play button" portion thereof is clicked.
  • each advertisement is associated with music data that can include but is not limited to the specification of a music clip, the music sound features associated with that clip, and/or music metadata.
  • when a search is conducted on a website in accordance with the present invention, as depicted in Figure 12, and the user's search is similar to the track, the track is selected in the pre-search step (1255) and further selected in the refinement step (1260); the customer's account is updated in step (1262) to indicate that the advertisement was displayed to an end-user; an advertisement is generated, which can include, for example and without limitation, images and text stored in the advertising database; and the advertisement is sent to the client application in step (1265).
  • the advertisement is rendered by the client application in step (1270).
  • step (1275) if a user clicks on the advertisement, the client informs the server application (for example but not limited to by passing an XML message to the server), and in step (1280) the server updates the customer's account to reflect the fact that a click of the advertisement has happened, and can include other relevant statistics, for example but without limitation, the date and time of the click, and certain information that may optionally be known about the user such as age, gender, and location.
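The account updates in steps (1262) and (1280) amount to simple impression and click bookkeeping. The sketch below assumes an in-memory `CustomerAccount` record; the class, its method names and the keyword-argument handling of optional user details are hypothetical, and message transport (for example the XML message mentioned above) is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CustomerAccount:
    impressions: int = 0
    clicks: list = field(default_factory=list)

    def record_impression(self):
        """Step (1262): the advertisement was displayed to an end-user."""
        self.impressions += 1

    def record_click(self, when=None, **user_info):
        """Step (1280): store the click time plus any optional user details."""
        when = when or datetime.now(timezone.utc)
        self.clicks.append({"time": when, **user_info})
```

Keeping the optional user details as free-form keyword arguments mirrors the text's point that age, gender and location may or may not be known for a given click.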
  • a customer wishes to promote a first track that is relatively unknown to the general public - for example but not limited to a new piece of music by an up-and-coming artist, and the customer wishes to have this first track associated with a second track, for example but not limited to the case that the second track is of a similar genre and/or style as the first track, and the second track is more popular and well known.
  • the customer uses the system as described, providing the second track to the system in order to determine the music features to associate with the ad, and providing data about the first track in connection with the advertising content of the ad.
  • the ad can include but is not limited to text, images and sounds associated with the first track, and can optionally include a statement that end-users who like the second track may wish to consider purchase of the first track.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Library & Information Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to systems and methods for searching for or finding music with music, which include, for example, searching a library for music that contains a sound similar to a given sound supplied as a search query, and to methods and systems for tracking the revenue generated by these user-computer interactions, for promoting music, and for selling advertising space. These include systems that allow the user to discover music the user does not know, and systems that allow the user to search for music directly from queries formed from sounds that the user likes. In some embodiments, these queries consist of a clip or a relatively small segment of a larger media file. A client-server system comprises web graphical elements, advertisements and/or other affiliate revenue links, elements supporting the music query, a music player, a database, elements for matching music clips with clips from a library, and elements for presenting results.
PCT/US2007/011599 2006-05-12 2007-05-14 Method and system for music information retrieval Ceased WO2007133760A2 (fr)

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
US79997306P 2006-05-12 2006-05-12
US79997406P 2006-05-12 2006-05-12
US60/799,973 2006-05-12
US60/799,974 2006-05-12
US81171306P 2006-06-07 2006-06-07
US81169206P 2006-06-07 2006-06-07
US60/811,713 2006-06-07
US60/811,692 2006-06-07
US85571606P 2006-10-31 2006-10-31
US60/855,716 2006-10-31
US11/715,863 2007-03-07
US11/715,863 US20070214133A1 (en) 2004-06-23 2007-03-07 Methods for filtering data and filling in missing data using nonlinear inference

Publications (2)

Publication Number Publication Date
WO2007133760A2 true WO2007133760A2 (fr) 2007-11-22
WO2007133760A3 WO2007133760A3 (fr) 2008-02-21

Family

ID=38694537

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/011599 Ceased WO2007133760A2 (fr) 2006-05-12 2007-05-14 Method and system for music information retrieval

Country Status (1)

Country Link
WO (1) WO2007133760A2 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4068272A4 (fr) * 2019-11-26 2022-12-07 Sony Group Corporation Information processing device, information processing method, and information processing program

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7859551B2 (en) * 1993-10-15 2010-12-28 Bulman Richard L Object customization and presentation system
US7490107B2 (en) * 2000-05-19 2009-02-10 Nippon Telegraph & Telephone Corporation Information search method and apparatus of time-series data using multi-dimensional time-series feature vector and program storage medium
US20020194601A1 (en) * 2000-12-01 2002-12-19 Perkes Ronald M. System, method and computer program product for cross technology monitoring, profiling and predictive caching in a peer to peer broadcasting and viewing framework

Also Published As

Publication number Publication date
WO2007133760A3 (fr) 2008-02-21

Similar Documents

Publication Publication Date Title
US20070276733A1 (en) Method and system for music information retrieval
US20070282860A1 (en) Method and system for music information retrieval
Typke A survey of music information retrieval systems
Bogdanov et al. Semantic audio content-based music recommendation and visualization based on user preference examples
Logan et al. A Music Similarity Function Based on Signal Analysis.
US7505959B2 (en) System and methods for the automatic transmission of new, high affinity media
US8438168B2 (en) Scalable music recommendation by search
Hoashi et al. Personalization of user profiles for content-based music retrieval based on relevance feedback
US10229196B1 (en) Automatic selection of representative media clips
Knees et al. A survey of music similarity and recommendation from music context data
Knees et al. A music search engine built upon audio-based and web-based similarity measures
US8280889B2 (en) Automatically acquiring acoustic information about music
US20060217828A1 (en) Music searching system and method
US20160267177A1 (en) Music steering with automatically detected musical attributes
US20200394988A1 (en) Spoken words analyzer
US20070124293A1 (en) Audio search system
US20120331386A1 (en) System and method for providing acoustic analysis data
Logan et al. A content-based music similarity function
Bogdanov et al. Content-based music recommendation based on user preference examples
US20080275904A1 (en) Method of Generating and Methods of Filtering a User Profile
WO2007133760A2 (fr) Method and system for music information retrieval
KR20010094312A (ko) Method for electronic commerce of music data using a computer network
Hoashi et al. Comparison of User Ratings of Music in Copyright-free Databases and On-the-market CDs
WO2002001548A1 (fr) System for characterizing pieces of music
Shao User-centric music information retrieval

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07777056

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC, EPO FORM 1205A OF 25.02.09

122 Ep: pct application non-entry in european phase

Ref document number: 07777056

Country of ref document: EP

Kind code of ref document: A2