WO2012120531A2 - Method for fast and accurate detection of audio content match - Google Patents

Method for fast and accurate detection of audio content match

Info

Publication number
WO2012120531A2
WO2012120531A2 (PCT/IN2012/000076)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
matched
sample
match
links
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IN2012/000076
Other languages
English (en)
Other versions
WO2012120531A3 (fr)
Inventor
Makarand Prabhakar Karanjkar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of WO2012120531A2 publication Critical patent/WO2012120531A2/fr
Publication of WO2012120531A3 publication Critical patent/WO2012120531A3/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING OR CALCULATING; COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 — Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 — Retrieval characterised by using metadata automatically derived from the content

Definitions

  • the present invention relates to a method for detection of audio content match.
  • Advertisement slots for advertising before and in between specific events are constantly rising in cost per second, and advertisers are therefore naturally interested in independent monitoring of the broadcasting of the advertisement they have paid for, in the slot they have paid for. Further, there is also an interest amongst advertisers in trying to quantify the impact or reception of their advertisement across a target area of coverage. All these factors are driving a need to monitor audio content as it is being broadcast, as close to real time as possible, if not actually in real time. This allows for rapid adjustment and potential recalibration of the advertisement or content broadcast strategy. Traditionally, a lot of the monitoring has happened using manual listening techniques, with logs recording time and frequency of playback being produced for the content being monitored.
  • Mazer discloses a method which includes steps of receiving a set of broadcast information; converting the set of broadcast information into a frequency representation of the set of broadcast information; dividing the frequency representation into a predetermined number of frequency segments, each frequency segment representing one of the frequency bands associated with the semitones of the music scale; forming an array, wherein the number of elements in the array corresponds to the predetermined number of frequency segments, and wherein each frequency segment with a value greater than a threshold value is represented by binary 1 and all other frequency segments are represented by binary 0; comparing the array to a set of reference arrays, each reference array representing a previously identified unit of information; and determining, based on the comparison, whether the set of broadcast information is the same as any of the previously identified units of broadcast information.
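The band-thresholding scheme described above can be sketched as follows. This is a minimal illustration only: it uses equal-width bands rather than the semitone-aligned bands the patent describes, and the function names (`band_array`, `matches`) and the 0.5 relative threshold are choices made here, not taken from the source.

```python
import numpy as np

def band_array(block, n_bands=12, threshold=0.5):
    """Convert one block of broadcast audio into a binary array:
    one element per frequency band, 1 if that band's energy exceeds
    the threshold fraction of the strongest band's energy."""
    spectrum = np.abs(np.fft.rfft(block))
    # Split the spectrum into n_bands equal segments and sum each
    bands = np.array([seg.sum() for seg in np.array_split(spectrum, n_bands)])
    return (bands > threshold * bands.max()).astype(int)

def matches(candidate, reference):
    """A candidate block matches a reference array if every band bit agrees."""
    return np.array_equal(candidate, reference)
```

A reference array would be precomputed from each known unit of information, and each incoming block's array compared against the full reference set.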
  • a method for recognizing an audio sample is disclosed in U.S. Patent No. 7,346,512 B2 issued to Wang and Smith, which locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings.
  • Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints.
  • Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints.
  • landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed.
  • the file is identified with the sample.
  • the method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artefacts, or transmission dropouts.
  • the sample can be identified in a time proportional to the logarithm of the number of entries in the database. Given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.
  • U.S. Patent No. 6,990,453 B2, also issued to Wang and Smith, discloses a similar recognition method for audio samples.
  • Additionally, US Patent No. 7,580,832 B2 issued to Allamanche et al. discloses an apparatus for producing a fingerprint signal from an audio signal, which includes a means for calculating energy values for frequency bands of segments of the audio signal which are successive in time, so as to obtain, from the audio signal, a sequence of vectors of energy values; a means for scaling the energy values to obtain a sequence of scaled vectors; and a means for temporal filtering of the sequence of scaled vectors to obtain a filtered sequence which represents the fingerprint, or from which the fingerprint may be derived.
  • a fingerprint is produced which is robust against disturbances due to problems associated with coding or with transmission channels, and which is especially suited for mobile radio applications.
  • US Patent No. 7,529,659 B2 issued to Wold discloses a system for determining an identity of a received work.
  • the system receives audio data for an unknown work.
  • the audio data is divided into segments.
  • the system generates a signature of the unknown work from each of the segments.
  • Reduced dimension signatures are then generated from at least a portion of the signatures.
  • the reduced dimension signatures are then compared to reduced dimensions signatures of known works that are stored in a database.
  • a list of candidates of known works is generated from the comparison.
  • the signatures of the unknown works are then compared to the signatures of the known works in the list of candidates.
  • the unknown work is then identified as the known work having signatures matching within a threshold.
  • An object of the present invention is to provide a method for audio content match detection which is fast and capable of accurately detecting a match for an audio content.
  • Another object of the present invention is to provide a method for audio content match detection, which is capable of improving match times for a sample of audio exposed to noise and distortion, in inverse proportion to the amount of noise in the sample.
  • Figure 1a shows a flow chart of the method for generating sets of links of a sample audio in accordance with the present invention
  • Figure 1b shows a flow chart of the method for audio content match detection in accordance with the present invention
  • Figure 2 shows a graph of signal value against time for a portion of an audio sample
  • Figure 3 shows a graph of the Fast Fourier Transform obtained for the portion of the audio sample
  • Figure 4 shows a three-dimensional spectrogram of each portion of the audio sample
  • Figure 5 shows a top view of figure 4 illustrating the threshold peaks of the spectrogram
  • Figure 6 shows frequency peak links
  • Figures 7a and 7b show the links of peaks selected to form a set.

Detailed Description of the Invention
  • the present invention provides a method for detection of audio content match.
  • the method is fast and capable of accurately detecting match for an audio content. Further, the method is capable of improving match times for a sample of audio exposed to noise and distortion, in inverse proportion to the amount of noise in the sample.
  • the method is also capable of matching a relatively noise-free audio sample very quickly, with incremental noise in the audio sample resulting in a gradual increase in the time to match.
  • the method offers a substantial improvement in the ability to reject false positive matches, resulting from the use of a technique whereby for the same set of matched feature points, a greater amount of information relating to matching of the clips is derived, significantly improving the ability to reject false positives.
  • the method (100) starts at step (10).
  • a predefined digital audio sample is divided into a plurality of equal portions by using digital sampling.
  • a preferred embodiment uses a .WAV file, representing the analog values of the sampled audio, encoded into 16 bits and sampled at 8000 Hz, with blocks of 1024 such samples. This is illustrated in figure 2.
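The blocking step can be sketched as below. `split_into_blocks` is an illustrative name, and dropping an incomplete trailing block is an assumption, since the source does not say how a final partial block is handled.

```python
import numpy as np

def split_into_blocks(samples, block_size=1024):
    """Divide a 1-D array of audio samples into equal portions of
    block_size samples, dropping any incomplete trailing block."""
    n_blocks = len(samples) // block_size
    return samples[:n_blocks * block_size].reshape(n_blocks, block_size)
```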
  • the Fast Fourier Transform (FFT) for each of the portion of the digital audio sample is computed as shown in figure 3.
  • a preferred embodiment uses a set of 1024 audio samples, to which a Hamming Window is applied, and an FFT computed, with a time overlap of 128 samples, for each FFT.
  • Frequency information for each portion of audio sample is placed along the 'Y' axis of a graph, and each frequency profile (FFT) computed is stacked up along the 'X' axis representing time. With frequency amplitude along the z-axis, the graph provides a frequency-time representation of the audio sample.
  • the FFTs computed are stacked together to form a spectrogram as shown in figure 4.
  • the spectrogram may also involve using overlapping equally sized portions of the audio sample, to improve the time resolution of the representation.
  • a preferred embodiment uses a time overlap of 128 audio samples.
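The windowing, FFT, and stacking of steps (20)–(25) might look like the following sketch. The source says "a time overlap of 128 samples"; the code reads that as a hop of block_size − 128, which is an interpretation — if the hop itself is 128 samples, swap the expression accordingly.

```python
import numpy as np

def spectrogram(samples, block_size=1024, overlap=128):
    """Stack Hamming-windowed FFT magnitudes of overlapping blocks.
    Rows are time frames (x axis), columns are frequency bins (y axis)."""
    hop = block_size - overlap          # assumed reading of "overlap of 128"
    window = np.hamming(block_size)
    frames = []
    for start in range(0, len(samples) - block_size + 1, hop):
        block = samples[start:start + block_size] * window
        frames.append(np.abs(np.fft.rfft(block)))
    return np.array(frames)
```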
  • at step (30), significant and dominant peaks for each portion of the audio record are selected: the absolute value of the FFT is computed first, followed by its logarithm; peaks are then detected using a threshold computed by applying a point spread to each peak in the FFT and accumulating and rescaling the resulting spread values, which yields a smooth threshold representative of the relative signal strength at each frequency along the FFT.
  • the result of applying the threshold produces a set of peaks that are truly dominant for that block of audio samples, without suffering from clustering in a single portion of the spectrum, and these peaks are marked along a frequency (y axis) versus time (x axis) as illustrated in figure 5.
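The point-spread thresholding of step (30) might look like the following sketch. The Gaussian spread, the 0.9 rescaling factor, and the use of a running maximum to accumulate the spread values are all assumptions, since the patent does not give the exact point-spread function or accumulation rule.

```python
import numpy as np

def pick_dominant_peaks(fft_block, spread=10.0, scale=0.9):
    """Select dominant peaks from one FFT block: take the log magnitude,
    spread every local maximum with a Gaussian, accumulate the spreads
    into a smooth threshold, rescale it, and keep only the maxima that
    still rise above it."""
    mag = np.log(np.abs(fft_block) + 1e-12)
    mag -= mag.min()                       # work with non-negative values
    n = len(mag)
    is_peak = np.zeros(n, dtype=bool)
    is_peak[1:-1] = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    x = np.arange(n)
    threshold = np.zeros(n)
    for i in np.flatnonzero(is_peak):      # point spread around each peak
        threshold = np.maximum(
            threshold, mag[i] * np.exp(-((x - i) ** 2) / (2.0 * spread ** 2)))
    return np.flatnonzero(is_peak & (mag > scale * threshold))
```

Because each peak raises the threshold in its own neighbourhood, a small peak close to a large one is suppressed, which is what prevents the clustering in a single portion of the spectrum that the text mentions.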
  • the improvement starts at step (35).
  • the selection of peaks to link is based upon criteria such as a predetermined frequency range relative to the peak under consideration, and/or the time distance with respect to the peak under consideration, and/or the value of the peak relative to the peak under consideration, to form frequency links of various fixed lengths (see figure 6).
  • the selected peaks are linked to produce peak frequency links, which are clubbed into sets, each set containing distinguishing frequency peak links produced from the same set of peaks, each link connecting at least two peaks.
  • the peak link frequency values are concatenated to produce a string of numbers, which represent a frequency peak link. Each of these strings (representing the frequency peak link) and a frame/time of occurrence of the first peak within that link, is clubbed together to form a complete feature.
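Feature construction could be sketched as follows. Linking each run of consecutive selected peaks is a simplification of the patent's frequency-range and time-distance criteria, and the `-` separator in the concatenated string, like the name `build_features`, is an assumed encoding introduced for this illustration.

```python
def build_features(peaks, link_length=3):
    """peaks: list of (frame, freq_bin) tuples sorted by frame.
    Link link_length consecutive peaks, concatenate their frequency
    values into a string key, and attach the frame of the first peak
    to form a complete feature."""
    features = []
    for i in range(len(peaks) - link_length + 1):
        link = peaks[i:i + link_length]
        key = "-".join(str(freq) for _, freq in link)
        features.append((key, link[0][0]))   # (string key, start frame)
    return features
```

Running the same peak list through several values of `link_length` (e.g. 5 for clean audio, 3 for noisy audio) yields the multiple fixed-length sets the database stores.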
  • the clubbed frequency peak link sets are stored in a database.
  • the database stores sets of features, of various peak links of fixed lengths (constructed using a varying number of peaks). An inverse relationship is maintained between the number of features in a set and their length (number of peaks). For example, to detect clear (noise free) audio, five peaks are clubbed together to form a set, whereas to detect unclear or an audio sample subjected to noise, a set containing three peaks are selected.
  • the premise is that the probability of a match for longer peak links is smaller for a noisy sample, relative to that of finding a match of a shorter peak link.
  • step (45) the method ends.
  • the method (200) starts at step (110).
  • at step (120), the audio which is to be matched with the audio sample is subjected to a Fast Fourier Transform (FFT) computation.
  • at step (130), the computed FFTs of the audio to be matched are stacked together to form a spectrogram.
  • at step (140), the significant and dominant peaks from the spectrogram are selected.
  • step (160) searching the database for a first set of matching peak frequency links, and combining each element of each set of peak frequency links into an exhaustive set of combination pairs without repeating combinations, similarly, searching the database for a second set of matching peak frequency links, and combining each element of each set of peak frequency links into an exhaustive set of combination pairs without repeating combinations extracted from the database.
  • these sets are compared with the sets stored in the database of the audio sample to detect an audio match.
  • the search yields a list of matching peak links, and their corresponding frame (time) of occurrence (refer Computation below).
  • the set of peak links, computed from the audio sample, for which matches were found in the database is used to form peak link pairs amongst themselves ordered over time, resulting in a list of n!/(2·(n−2)!) = n(n−1)/2 pairs of peak links.
  • Each of the peak links is paired with peak links that follow it in time.
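The pairing of matched links ordered over time can be sketched with `itertools.combinations`, which yields exactly n(n−1)/2 pairs for n links; the `(key, frame)` tuple shape is an assumption about how a matched link is represented.

```python
from itertools import combinations

def link_pairs(matched_links):
    """Pair each matched peak link with every link that follows it in
    time, yielding n*(n-1)/2 ordered pairs for n matched links.
    Each link is assumed to be a (key, frame) tuple."""
    ordered = sorted(matched_links, key=lambda link: link[1])  # sort by frame
    return list(combinations(ordered, 2))
```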
  • the matched links retrieved from the database are also paired.
  • the average of td1, td2, td3 is computed. The time differences are then recomputed and expressed as a percentage of the computed average. Similar treatment is applied to td1', td2', td3'. A comparison is then made between these two sets of times, each computed as a percentage of the average time difference between link pairs. This makes the method (100) robust to distortions in playback speed.
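The playback-speed normalisation can be sketched as below: expressing each inter-link time difference as a percentage of the average makes the profile invariant to a uniform speed change. The function name and the list-of-frames input are illustrative choices, not from the source.

```python
def normalized_time_diffs(frames):
    """Time differences between successive matched links, each expressed
    as a percentage of their average; a uniform playback-speed change
    scales every difference and the average alike, leaving the
    percentages unchanged."""
    diffs = [b - a for a, b in zip(frames, frames[1:])]
    avg = sum(diffs) / len(diffs)
    return [100.0 * d / avg for d in diffs]
```

Comparing these percentage profiles between the sample's link pairs and the database's link pairs, rather than the raw time differences, is what tolerates speed distortion.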
  • because each matched peak link is paired with at least one other, each link contributes at least two bits of information towards the computation of the result.
  • the greater the number of pairs the greater the number of bits of information that each peak link contributes towards the computation of the result.
  • This additional information directly contributes to the ability of the method (100) to distinguish between a real match and a false positive in a much more robust way.
  • the speedup for a matching case and graceful degradation in speed for a miss during the matching method/search is obtained by using sets of longer peak link pairs initially, followed by a check using successively shorter peak link sets.
  • the database thus stores multiple sets of peak links of various fixed length (number of peaks), corresponding to originally profiled audio samples.
  • the number of peaks computed from the audio sample (subjected to noise) will match fewer of the peaks computed from a potentially matching sample, profiled and stored (in the form of frequency peak links) in the database earlier. This is on account of distortion resulting in missing or misaligned peaks in the audio sample due to the presence of noise.
  • the set of longer length links are used first, since the probability of finding a longer matching link is lowered due to missing or misaligned peaks in the audio sample, and a fewer number of potential matches are found.
  • the method (100 & 200) provides the advantage of increased accuracy in detecting a match for an audio sample containing some disturbance. Further, the method (100 & 200) improves match times for a sample of audio exposed to noise and distortion, in inverse proportion to the amount of noise in the sample. Furthermore, the method (100 & 200) matches a relatively noise-free audio sample very quickly, with incremental noise in the audio sample resulting in a gradual increase in the time to match. Moreover, the method (100 & 200) provides a substantial improvement in the ability to reject false positive matches, resulting from the use of a technique whereby, for the same set of matched feature points, a greater amount of information relating to the matching of the clips is derived, significantly improving the ability to reject false positives.

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The present invention relates to a method for detecting a match of audio content. The method is fast and capable of accurately detecting a match for an audio content. Further, the method improves match times for an audio sample exposed to noise and distortion, in inverse proportion to the amount of noise present in the sample. The method can also match a relatively noise-free audio sample very quickly, with incremental noise in the audio sample resulting in a gradual increase in the time required for a match. Moreover, the method offers a substantial improvement in the ability to reject false positive matches, resulting from the use of a technique whereby, for the same set of matched feature points, a greater amount of information relating to the matching of the clips is derived, significantly improving the ability to reject false positives.
PCT/IN2012/000076 2011-02-02 2012-02-02 Method for fast and accurate detection of audio content match Ceased WO2012120531A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN292MU2011 2011-02-02
IN292/MUM/2011 2011-02-02

Publications (2)

Publication Number Publication Date
WO2012120531A2 true WO2012120531A2 (fr) 2012-09-13
WO2012120531A3 WO2012120531A3 (fr) 2013-03-14

Family

ID=46582734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2012/000076 Ceased WO2012120531A2 (fr) 2011-02-02 2012-02-02 Method for fast and accurate detection of audio content match

Country Status (1)

Country Link
WO (1) WO2012120531A2 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2731030A1 (fr) * 2012-11-13 2014-05-14 Samsung Electronics Co., Ltd Music information searching method and apparatus thereof
CN108039178A (zh) * 2017-12-15 2018-05-15 奕响(大连)科技有限公司 Audio similarity determination method based on time-domain and frequency-domain Fourier transforms
CN108091346A (zh) * 2017-12-15 2018-05-29 奕响(大连)科技有限公司 Audio similarity determination method based on local Fourier transform
CN109073614A (zh) * 2016-04-13 2018-12-21 株式会社岛津制作所 Data processing device and data processing method
CN110019922A (zh) * 2017-12-07 2019-07-16 北京雷石天地电子技术有限公司 Audio climax recognition method and device
CN114218244A (zh) * 2022-02-23 2022-03-22 华谱科仪(北京)科技有限公司 Online chromatograph database updating method, data identification method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5437050A (en) 1992-11-09 1995-07-25 Lamb; Robert G. Method and apparatus for recognizing broadcast information using multi-frequency magnitude detection
US6990453B2 (en) 2000-07-31 2006-01-24 Landmark Digital Services Llc System and methods for recognizing sound and music signals in high noise and distortion
US7529659B2 (en) 2005-09-28 2009-05-05 Audible Magic Corporation Method and apparatus for identifying an unknown work
US7580832B2 (en) 2004-07-26 2009-08-25 M2Any Gmbh Apparatus and method for robust classification of audio signals, and method for establishing and operating an audio-signal database, as well as computer program

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004062813A (ja) * 2002-07-31 2004-02-26 Sony Corp Apparatus and method for generating a unique content identifier, and computer program
KR100774585B1 (ko) * 2006-02-10 2007-11-09 삼성전자주식회사 Music information retrieval method using modulation spectrum and apparatus therefor

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5437050A (en) 1992-11-09 1995-07-25 Lamb; Robert G. Method and apparatus for recognizing broadcast information using multi-frequency magnitude detection
US6990453B2 (en) 2000-07-31 2006-01-24 Landmark Digital Services Llc System and methods for recognizing sound and music signals in high noise and distortion
US7346512B2 (en) 2000-07-31 2008-03-18 Landmark Digital Services, Llc Methods for recognizing unknown media samples using characteristics of known media samples
US7580832B2 (en) 2004-07-26 2009-08-25 M2Any Gmbh Apparatus and method for robust classification of audio signals, and method for establishing and operating an audio-signal database, as well as computer program
US7529659B2 (en) 2005-09-28 2009-05-05 Audible Magic Corporation Method and apparatus for identifying an unknown work

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2731030A1 (fr) * 2012-11-13 2014-05-14 Samsung Electronics Co., Ltd Music information searching method and apparatus thereof
US9659092B2 (en) 2012-11-13 2017-05-23 Samsung Electronics Co., Ltd. Music information searching method and apparatus thereof
CN109073614A (zh) * 2016-04-13 2018-12-21 株式会社岛津制作所 Data processing device and data processing method
CN109073614B (zh) * 2016-04-13 2020-07-14 株式会社岛津制作所 Data processing device and data processing method
CN110019922A (zh) * 2017-12-07 2019-07-16 北京雷石天地电子技术有限公司 Audio climax recognition method and device
CN108039178A (zh) * 2017-12-15 2018-05-15 奕响(大连)科技有限公司 Audio similarity determination method based on time-domain and frequency-domain Fourier transforms
CN108091346A (zh) * 2017-12-15 2018-05-29 奕响(大连)科技有限公司 Audio similarity determination method based on local Fourier transform
CN114218244A (zh) * 2022-02-23 2022-03-22 华谱科仪(北京)科技有限公司 Online chromatograph database updating method, data identification method and device

Also Published As

Publication number Publication date
WO2012120531A3 (fr) 2013-03-14

Similar Documents

Publication Publication Date Title
CN102959624B (zh) System and method for audio media recognition
US9093120B2 (en) Audio fingerprint extraction by scaling in time and resampling
JP5362178B2 (ja) Extraction and matching of characteristic fingerprints from audio signals
KR101109303B1 (ko) Audio copy detector
US8492633B2 (en) Musical fingerprinting
US20130139673A1 (en) Musical Fingerprinting Based on Onset Intervals
US20060229878A1 (en) Waveform recognition method and apparatus
WO2012120531A2 (fr) Method for fast and accurate detection of audio content match
US9659092B2 (en) Music information searching method and apparatus thereof
US20060155399A1 (en) Method and system for generating acoustic fingerprints
EP2973034B1 (fr) Procédés et systèmes permettant d'organiser et de rechercher une base de données d'enregistrements de contenus multimédias
US20150310008A1 (en) Clustering and synchronizing multimedia contents
US20140280233A1 (en) Methods and Systems for Arranging and Searching a Database of Media Content Recordings
Kim et al. Robust audio fingerprinting using peak-pair-based hash of non-repeating foreground audio in a real environment
WO2003088534A1 (fr) Identification de contenu audio basee sur des caracteristiques
George et al. Scalable and robust audio fingerprinting method tolerable to time-stretching
CA2439596C (fr) Procede et appareil permettant d'identifier des fichiers electroniques
Bardeli Robust identification of time-scaled audio
Kusuma et al. Audio Fingerprint Application for the Media Industry
HK1190473A (en) Extraction and matching of characteristic fingerprints from audio signals
HK1190473B (en) Extraction and matching of characteristic fingerprints from audio signals
Sonje et al. Audio Retrieval using Hash-Index Search Technique
Lykartsis et al. ASSESSMENT OF FEATURE EXTRACTION METHODS IN AUDIO FINGERPRINTING
HK1181913B (en) System and method for audio media recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12740407

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12740407

Country of ref document: EP

Kind code of ref document: A2