WO2016188329A1 - Audio processing method, device and terminal - Google Patents
Audio processing method, device and terminal
- Publication number
- WO2016188329A1 (PCT/CN2016/081999)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- character
- value
- time
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/061—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/005—Non-interactive screen display of musical or status data
- G10H2220/011—Lyrics displays, e.g. for karaoke applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
Definitions
- The invention relates to the field of audio processing technologies, and in particular to an audio processing method, device, and terminal.
- An Internet audio library contains a large number of audio files, such as songs and song clips.
- Applications of Internet audio are also increasing, for example karaoke (K song) systems and song listening systems.
- Many application scenarios require an audio file to be divided into paragraphs. For example, implementing a segmented chorus in a karaoke system usually requires dividing the song into paragraphs, and listening closely to a particular song segment in a song listening system likewise requires dividing the song into paragraphs.
- Paragraph division of audio files is usually performed manually. This is inefficient, cannot meet users' demands on audio files, and reduces the intelligence of audio processing.
- An embodiment of the present invention provides an audio processing method, apparatus, and terminal.
- The technical solution is as follows:
- An embodiment of the present invention provides an audio processing method, including:
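- acquiring file data of a target audio file; constructing a correlation feature sequence according to correlation feature data between constituent elements of the file data; optimizing the correlation feature sequence according to the total number of preset paragraphs; determining a paragraph change time according to the value of at least one feature element in the optimized correlation feature sequence; and dividing the target audio file into paragraphs of the total number of preset paragraphs according to the paragraph change time.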
- In the embodiments of the present invention, the target audio file can be divided into paragraphs according to the correlation between the constituent elements of its file data, such as the similarity between character single sentences, the time interval between character single sentences, or the correlation between audio frames. This can improve segmentation efficiency and the intelligence of audio processing.
- A subtitle feature sequence may be constructed according to the similarity between the at least one character single sentence in the subtitle file corresponding to the target audio file. The subtitle feature sequence is optimized according to the total number of preset paragraphs, a paragraph change time is determined according to the value of at least one character feature element in the optimized subtitle feature sequence, and the target audio file is then divided into paragraphs of the total number of preset paragraphs according to the paragraph change time. This process exploits the similarity of character single sentences between subtitle paragraphs: dividing the target audio file based on the similarity of the character single sentences in the subtitle file can improve segmentation efficiency and the intelligence of audio processing.
- A time feature sequence may be constructed according to the time interval between the at least one character single sentence in the subtitle file corresponding to the target audio file. The value of each time feature element in the time feature sequence is adjusted according to the total number of preset paragraphs, a paragraph change time is determined according to the adjusted value of at least one time feature element, and the target audio file is then divided into paragraphs of the total number of preset paragraphs according to the paragraph change time. This process exploits the time interval of character single sentences between subtitle paragraphs: dividing the target audio file based on the time interval between the character single sentences in the subtitle file can improve segmentation efficiency and the intelligence of audio processing.
- A peak feature sequence may be constructed according to the correlation of at least one audio frame included in the audio data of the target audio file. The peak feature sequence is regularized, a paragraph change time is determined according to the value of at least one peak feature element in the regularized peak feature sequence, and the target audio file is divided into paragraphs according to the paragraph change time. This process exploits the correlation of audio frames between audio passages to divide the target audio file into paragraphs, which can improve segmentation efficiency and the intelligence of audio processing.
- FIG. 1 is a flowchart of an audio processing method according to an embodiment of the present invention.
- FIG. 2 is a flowchart of another audio processing method according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present disclosure.
- FIG. 4 is a schematic structural view of an embodiment of the building unit shown in FIG. 3;
- FIG. 5 is a schematic structural diagram of an embodiment of the optimization unit shown in FIG. 3;
- FIG. 6 is a schematic structural diagram of an embodiment of the optimization processing unit shown in FIG. 5;
- FIG. 7 is a schematic structural view of an embodiment of the determining unit shown in FIG. 3;
- FIG. 8 is a flowchart of an audio processing method according to an embodiment of the present invention.
- FIG. 9 is a flowchart of another audio processing method according to an embodiment of the present invention.
- FIG. 10 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present disclosure.
- FIG. 11 is a schematic structural view of an embodiment of the building unit shown in FIG. 10;
- FIG. 12 is a schematic structural view of an embodiment of the adjusting unit shown in FIG. 10;
- FIG. 13 is a schematic structural view of an embodiment of the determining unit shown in FIG. 10;
- FIG. 14 is a flowchart of an audio processing method according to an embodiment of the present invention.
- FIG. 15 is a flowchart of another audio processing method according to an embodiment of the present invention.
- FIG. 16 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present disclosure.
- FIG. 17 is a schematic structural diagram of an embodiment of the acquiring unit shown in FIG. 16;
- FIG. 18 is a schematic structural view of an embodiment of the building unit shown in FIG. 16;
- FIG. 19 is a schematic structural view of an embodiment of the regular processing unit shown in FIG. 16;
- FIG. 20 is a schematic structural view of an embodiment of the determining unit shown in FIG. 16.
- An audio file may include, but is not limited to, a song, a song fragment, and the like.
- A subtitle file may include, but is not limited to, lyrics, a lyrics fragment, and the like.
- An audio file can correspond to a subtitle file.
- A subtitle file consists of at least one character single sentence arranged in order. Taking song A as an example, the subtitle file corresponding to song A can be expressed as follows:
- The unit of time is usually ms. For example, the above [641, 770] is used to describe the time attribute of the character single sentence "a1 a2 a3 a4 a5 a6 a7 a8", where "641" represents the character single sentence "a 1 a 2
- The order of the character single sentences included in the subtitle file can be determined. For example, according to the description of the subtitle file corresponding to song A above, the character single sentence "a1 a2 a3 a4 a5 a6 a7 a8" is the first character single sentence, "b1 b2 b3 b4 b5 b6 b7 b8" is the second character single sentence, "c1 c2 c3 c4 c5 c6 c7 c8" is the third character single sentence, and so on.
- The character single sentences "a1 a2 a3 a4 a5 a6 a7 a8" and "b1 b2 b3 b4 b5 b6 b7 b8" are preceding character single sentences of "c1 c2 c3 c4 c5 c6 c7 c8"; the character single sentences "b1 b2 b3 b4 b5 b6 b7 b8" and "c1 c2 c3 c4 c5 c6 c7 c8" are subsequent character single sentences of "a1 a2 a3 a4 a5 a6 a7 a8"; and so on.
- The character single sentence "a1 a2 a3 a4 a5 a6 a7 a8" is the adjacent preceding character single sentence of "b1 b2 b3 b4 b5 b6 b7 b8";
- the character single sentence "b1 b2 b3 b4 b5 b6 b7 b8" is the adjacent subsequent character single sentence of "a1 a2 a3 a4 a5 a6 a7 a8"; and so on.
- An audio file can be divided into multiple audio passages, and the audio passages usually have a certain repeatability. Correspondingly, a subtitle file can be divided into multiple subtitle paragraphs that have a certain similarity; that is, there is a certain similarity between the character single sentences contained in different subtitle paragraphs.
- The embodiment of the present invention can exploit this similarity of character single sentences between subtitle paragraphs, and divide the target audio file into paragraphs based on the similarity of the character single sentences in the subtitle file.
- An audio file can be divided into multiple audio passages, and there is usually a long pause, that is, a long time interval, between audio passages. Correspondingly, a subtitle file can be divided into multiple subtitle paragraphs separated by a long time interval; that is, there is a long time interval between the character single sentences of adjacent subtitle paragraphs.
- The embodiment of the present invention can exploit this time interval of character single sentences between subtitle paragraphs, and divide the target audio file into paragraphs based on the time interval between the character single sentences in the subtitle file.
- An audio file includes audio data, and the audio data (e.g., PCM data) can be obtained by decoding the audio file.
- The audio data of an audio file may include at least one audio frame; that is, the audio data of an audio file may be represented as a frame sequence composed of multiple audio frames in order.
- An audio file can be divided into multiple audio passages, and the audio passages usually have a certain repeatability; that is, there is a certain correlation between the audio frames contained in different audio passages.
- The embodiment of the present invention can exploit this correlation of audio frames between audio passages to divide the target audio file into paragraphs.
- An embodiment of the present invention provides an audio processing method, which specifically includes: acquiring file data of a target audio file; constructing a correlation feature sequence according to correlation feature data between constituent elements of the file data; optimizing the correlation feature sequence according to the total number of preset paragraphs; determining a paragraph change time according to the value of at least one feature element in the optimized correlation feature sequence; and dividing the target audio file into paragraphs of the total number of preset paragraphs according to the paragraph change time.
- The target audio file can thus be divided into paragraphs according to the correlation between the constituent elements of its file data, such as the similarity between character single sentences, the time interval between character single sentences, or the correlation between audio frames. This can improve segmentation efficiency and the intelligence of audio processing.
- FIG. 1 is a flowchart of an audio processing method according to an embodiment of the present invention; the method may include the following steps S101 to S105.
- An audio file corresponds to a subtitle file.
- The Internet audio library stores a plurality of audio files, the attributes of each audio file, and the subtitle file corresponding to each audio file. The attributes of an audio file may include, but are not limited to, the audio features of the audio file, the identifier of the audio file, and so on.
- The subtitle file corresponding to the target audio file may be obtained from the Internet audio library. The specific acquisition manner may include, but is not limited to: searching the Internet audio library for the subtitle file corresponding to the target audio file according to the identifier of the target audio file, and obtaining the subtitle file found; or extracting the audio features of the target audio file and matching them against the audio features of the audio files in the Internet audio library, thereby locating the target audio file in the Internet audio library and obtaining the corresponding subtitle file.
- Assume that the target audio file is song A.
- For the structure of the subtitle file corresponding to song A, refer to the example shown in this embodiment.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in order.
- p(0) can be used to represent the first character single sentence "a1 a2 a3 a4 a5 a6 a7 a8".
- p(1) can be used to represent the second character single sentence "b1 b2 b3 b4 b5 b6 b7 b8".
- p(2) can be used to represent the third character single sentence "c1 c2 c3 c4 c5 c6 c7 c8".
- By analogy, p(N-1) is used to represent the Nth character single sentence.
- The subtitle feature sequence can be used to reflect the similarity between the at least one character single sentence.
- A similarity algorithm may first be used to calculate the similarity between the at least one character single sentence. Here the similarity between each character single sentence and each of its subsequent character single sentences needs to be calculated: the similarity between p(0) and p(1), between p(0) and p(2), ..., between p(0) and p(N-1); the similarity between p(1) and p(2), between p(1) and p(3), ..., between p(1) and p(N-1); and so on.
- The similarity algorithm may include, but is not limited to, the edit distance algorithm (Levenshtein distance), the longest common subsequence (LCS) algorithm, the Heckel algorithm, Greedy String Tiling (GST), and the like.
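The pairwise comparison described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it uses the edit distance option from the list, and the function names and the normalization by the longer sentence length are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sentences are treated as identical
    return 1.0 - edit_distance(a, b) / longest


def pairwise_similarity(sentences):
    """Map (i, j) with i < j to the similarity between p(i) and p(j)."""
    return {(i, j): similarity(sentences[i], sentences[j])
            for i in range(len(sentences))
            for j in range(i + 1, len(sentences))}
```

Each sentence is compared only with the sentences that follow it, so N sentences give N(N-1)/2 similarity values.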
- The subtitle feature sequence may be constructed according to the number and order of the at least one character single sentence and the calculated similarity.
- The constructed subtitle feature sequence s(n) includes a total of N character feature elements, namely s(0), s(1), ..., s(N-1).
- The value of s(0) can be used to describe the similarity between p(0) and its subsequent character single sentences;
- the value of s(1) can be used to describe the similarity between p(1) and its subsequent character single sentences; and so on.
- The total number of preset paragraphs may be set according to the user's actual segmentation requirements for the target audio file. Let M (M is a positive integer and M>1) denote the total number of preset paragraphs. The purpose of optimizing the subtitle feature sequence s(n) according to M is to make the optimized subtitle feature sequence s(n) divisible into exactly M subtitle paragraphs, so as to meet the actual segmentation requirements for the target audio file.
- The optimized subtitle feature sequence s(n) can be divided into M subtitle paragraphs, and the values of the character feature elements in s(n) describe the similarity between character single sentences.
- From these values, the turning points of the M subtitle paragraphs can be determined, and the start and end times of the M subtitle paragraphs can then be obtained from the subtitle file.
- the target audio file is divided into paragraphs of the total number of preset paragraphs according to the paragraph change time. Since the audio file and the subtitle file correspond to each other, according to the start and end time of the obtained M subtitle segments, the target audio file can be segmented correspondingly to obtain M audio passages.
- In this embodiment, the subtitle feature sequence is constructed according to the similarity between the at least one character single sentence in the subtitle file corresponding to the target audio file, the subtitle feature sequence is optimized according to the total number of preset paragraphs, a paragraph change time is determined according to the value of at least one character feature element in the optimized subtitle feature sequence, and the target audio file is then divided into paragraphs of the total number of preset paragraphs according to the paragraph change time. This process exploits the similarity of character single sentences between subtitle paragraphs: dividing the target audio file based on the similarity of the character single sentences in the subtitle file can improve segmentation efficiency and the intelligence of audio processing.
- FIG. 2 is a flowchart of another audio processing method according to an embodiment of the present invention; the method may include the following steps S201 to S213.
- Assume that the target audio file is song A.
- For the structure of the subtitle file corresponding to song A, refer to the example shown in this embodiment.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in order.
- p(0) can be used to represent the first character single sentence "a1 a2 a3 a4 a5 a6 a7 a8".
- p(1) can be used to represent the second character single sentence "b1 b2 b3 b4 b5 b6 b7 b8".
- p(2) can be used to represent the third character single sentence "c1 c2 c3 c4 c5 c6 c7 c8".
- By analogy, p(N-1) is used to represent the Nth character single sentence.
- For step S201 of this embodiment, refer to step S101 of the embodiment shown in FIG. 1; details are not repeated here.
- S202 Determine, according to the quantity of the at least one character single sentence, the number of character feature elements that construct the subtitle feature sequence.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in order, that is, the number of the at least one character single sentence is N. This step therefore determines that the number of character feature elements of the subtitle feature sequence is also N, that is, the length of the subtitle feature sequence is N.
- The constructed subtitle feature sequence s(n) includes a total of N character feature elements, namely s(0), s(1), ..., s(N-1).
- S203 Determine an index of each character feature element that constructs the subtitle feature sequence according to an order of each character single sentence in the at least one character single sentence.
- The N character single sentences of the subtitle file are arranged in the order p(0), p(1), ..., p(N-1). Assume that in the subtitle feature sequence s(n), s(0) corresponds to p(0), s(1) corresponds to p(1), and so on, and s(N-1) corresponds to p(N-1). Then the index of s(0) in the subtitle feature sequence s(n) is 1, that is, it is the first character feature element; the index of s(1) is 2, that is, it is the second character feature element; and so on, and the index of s(N-1) is N, that is, it is the Nth character feature element.
- the initial value may be set according to actual needs.
- the initial value may be assumed to be 0.
- the specific processing procedure of this step S205 may include the following s11-s13:
- The similarity algorithm may include, but is not limited to, an edit distance algorithm, a longest common substring algorithm, a Heckel algorithm, a greedy string matching algorithm, and the like.
- The calculated similarity is normalized to the interval [0, 1]. If the similarity between two character single sentences is equal to 0, the two character single sentences are completely different; if the similarity between two character single sentences is equal to 1, the two character single sentences are exactly the same.
- s13: Determine whether the extracted maximum similarity is greater than a preset similar threshold, and change the value of the corresponding character feature element according to the determination result.
- The preset similar threshold may be set according to actual needs. It may be represented by Th, with 0 ≤ Th ≤ 1.
- The target value may be set according to actual needs, and the target value is greater than the initial value. In this embodiment, the target value may be set to 1. According to the example shown in step s12, it is determined, for example, whether Q02 is greater than the preset similar threshold Th.
- The constructed subtitle feature sequence is s(n), composed of the N character feature elements s(0), s(1), ..., s(N-1). The values of the character feature elements in s(n) form a sequence consisting of 0s and 1s.
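A minimal sketch of how such a 0/1 sequence could be built is given below. Several points are assumptions for illustration rather than the patent's stated procedure: the element changed in step s13 is taken to be the one for the earlier sentence of the best-matching pair, the default similarity uses Python's `difflib.SequenceMatcher` as a stand-in for the algorithms listed above, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def default_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not one of the algorithms named in the text."""
    return SequenceMatcher(None, a, b).ratio()


def build_subtitle_feature_sequence(sentences, th, sim=default_similarity,
                                    initial=0, target=1):
    """Return s(n): one element per sentence, set to `target` when the
    sentence's maximum similarity to any subsequent sentence exceeds `th`."""
    n = len(sentences)
    s = [initial] * n                      # every element starts at the initial value
    for i in range(n - 1):                 # the last sentence has no successor
        best = max(sim(sentences[i], sentences[j]) for j in range(i + 1, n))
        if best > th:                      # step s13: compare with threshold Th
            s[i] = target
    return s
```

The length of the result always equals the number of character single sentences, matching the statement that s(n) has N elements.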
- Steps S202 to S206 of this embodiment may be specific refinement steps of step S102 of the embodiment shown in FIG. 1.
- S208: Determine whether the quantity is within the fault tolerance interval corresponding to the total number of preset paragraphs. If the determination result is yes, the process proceeds to step S210; if the determination result is no, the process proceeds to step S209.
- M (M is a positive integer and M>1) represents the total number of preset paragraphs.
- The fault tolerance interval corresponding to the total number M of preset paragraphs can be expressed as [M-u, M+u] (u is an integer), where u denotes an integer range that can be set according to actual needs.
- This step determines whether the number of character feature elements whose value is 1 in the subtitle feature sequence s(n) lies within the interval [M-u, M+u]. If the determination result is yes, the subtitle feature sequence s(n) can be divided into M subtitle paragraphs, which meets the actual segmentation requirements for the target audio file. If the determination result is no, the subtitle feature sequence s(n) cannot be divided well into M subtitle paragraphs, the actual segmentation requirements for the target audio file cannot be met, and some adjustment is needed.
- the adjustment process of this step may include the following steps s21-s22:
- The value of the preset similar threshold Th needs to be increased according to the preset step size, and step s13 above is performed again to adjust the value of each character feature element in the subtitle feature sequence.
- The preset similar threshold is decreased according to the preset step size to adjust the value of each character feature element in the subtitle feature sequence.
- That is, the value of the preset similar threshold Th needs to be decreased according to the preset step size, and step s13 above is performed again to adjust the value of each character feature element in the subtitle feature sequence.
- The preset step size may be set according to actual needs. It may be a fixed step size, that is, the value of the preset similar threshold Th is increased or decreased by the same step each time; it may also be a random step size, that is, the value of the preset similar threshold Th is increased or decreased by a different step each time.
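The adjustment loop can be sketched as below, assuming a fixed step size. The direction rule (raise Th when there are too many 1s, lower it when there are too few), the iteration cap, and the name `optimize_threshold` are assumptions for illustration; `max_sims[i]` stands for the maximum similarity already extracted for sentence i.

```python
def optimize_threshold(max_sims, m, u, th=0.5, step=0.05, max_iter=100):
    """Adjust Th until the count of 1s in s(n) lies in [m - u, m + u].

    Returns the final 0/1 sequence and the threshold that produced it.
    """
    s = [1 if q > th else 0 for q in max_sims]
    for _ in range(max_iter):              # cap iterations so the loop always ends
        ones = sum(s)
        if m - u <= ones <= m + u:         # inside the fault tolerance interval
            break
        if ones > m + u:
            th = min(1.0, th + step)       # too many 1s: raise the threshold
        else:
            th = max(0.0, th - step)       # too few 1s: lower the threshold
        s = [1 if q > th else 0 for q in max_sims]   # redo step s13
    return s, th
```

A random step size, as the text also allows, would replace `step` with a freshly drawn value on each iteration.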
- Steps S207 to S209 of this embodiment may be specific refinement steps of step S103 of the embodiment shown in FIG. 1.
- For example, if the target indexes are 5 and 11, the character single sentences at which paragraphs turn can be located in the subtitle file as the 5th and the 11th character single sentences. The 5th character single sentence is the starting position of a subtitle paragraph, that is, the 1st to 4th character single sentences in the subtitle file constitute one subtitle paragraph; the 11th character single sentence is the starting position of another subtitle paragraph, that is, the 5th to 10th character single sentences in the subtitle file constitute one subtitle paragraph.
- The subtitle file records the time attribute of each character single sentence, including its start time, duration, and end time, so this step can read the paragraph change time from the subtitle file. Following the example in this embodiment:
- since the 1st to 4th character single sentences in the subtitle file constitute one subtitle paragraph, the paragraph change time read is the end time of the 4th character single sentence and the start time of the 5th character single sentence;
- since the 5th to 10th character single sentences in the subtitle file constitute one subtitle paragraph, the paragraph change time read is the end time of the 10th character single sentence and the start time of the 11th character single sentence.
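The lookup from target indexes to paragraph change times can be sketched as follows. The `Sentence` type, its field names, and the choice to represent each change time as an (end of previous sentence, start of turning sentence) pair are assumptions for illustration; indexes are 1-based as in the text.

```python
from typing import List, NamedTuple, Tuple


class Sentence(NamedTuple):
    """Hypothetical time attribute of one character single sentence, in ms."""
    start_ms: int
    duration_ms: int

    @property
    def end_ms(self) -> int:
        return self.start_ms + self.duration_ms


def paragraph_change_times(feature_seq: List[int],
                           sentences: List[Sentence]) -> List[Tuple[int, int]]:
    """For each element equal to 1, return (end of previous sentence,
    start of this sentence). Index 1 is skipped: it has no predecessor."""
    changes = []
    for pos, value in enumerate(feature_seq):
        index = pos + 1                    # 1-based target index
        if value == 1 and index > 1:
            changes.append((sentences[pos - 1].end_ms, sentences[pos].start_ms))
    return changes
```

With target indexes 5 and 11, the function returns the end of sentence 4 paired with the start of sentence 5, and the end of sentence 10 paired with the start of sentence 11, as in the example above.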
- Step S210 to step S212 of this embodiment may be a specific refinement step of step S104 of the embodiment shown in FIG. 1.
- the start and end time of the M subtitle segments can be obtained according to step S210 - step S212.
- the target audio file is divided into paragraphs of the total number of preset paragraphs according to the paragraph change time. Since the audio file and the subtitle file correspond to each other, according to the start and end time of the obtained M subtitle segments, the target audio file can be segmented correspondingly to obtain M audio passages.
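- As a rough sketch of this segmentation step, the function below turns a list of paragraph change times into M time ranges. The function name `split_ranges`, the use of a single cut point per change (for instance the start time of the turning sentence), and the millisecond unit are assumptions made for illustration; the text only states that the audio is divided according to the paragraph change times.

```python
def split_ranges(total_ms, cut_points_ms):
    """Return (start, end) ranges covering [0, total_ms], cut at each
    paragraph change time. M-1 cut points yield M ranges."""
    bounds = [0] + sorted(cut_points_ms) + [total_ms]
    return list(zip(bounds[:-1], bounds[1:]))


# Two change times give three audio passages (illustrative values).
passages = split_ranges(12000, [4000, 10000])
```

- The resulting ranges could then be handed to whatever audio library actually slices the samples; that part is outside this sketch.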
- For step S213 of this embodiment, reference may be made to step S105 of the embodiment shown in FIG. 1, and details are not described herein again.
- The subtitle feature sequence may be constructed according to the similarity between the at least one character single sentence in the subtitle file corresponding to the target audio file; the subtitle feature sequence is optimized according to the total number of preset paragraphs; a paragraph change time is determined according to the value of at least one character feature element in the optimized subtitle feature sequence; and the target audio file is then divided into the preset total number of paragraphs according to the paragraph change time. The audio processing process uses the similarity between character single sentences across subtitle paragraphs, and segments the target audio file based on the similarity of the character single sentences in the subtitle file, which can improve segmentation efficiency and make audio processing more intelligent.
- The structure and function of the audio processing device provided by the embodiment of the present invention are described in detail below with reference to FIGS. 3 to 7. It should be noted that the apparatus shown in FIGS. 3 to 7 can run in a terminal and be applied to the method shown in FIGS. 1 to 2.
- FIG. 3 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention.
- the apparatus may include: an obtaining unit 301, a constructing unit 302, an optimizing unit 303, a determining unit 304, and a segmenting unit 305.
- The obtaining unit 301 is configured to acquire a subtitle file corresponding to the target audio file, where the subtitle file is composed of at least one character single sentence arranged in sequence.
- An audio file corresponds to a subtitle file.
- A plurality of audio files, the attributes of each audio file, and the subtitle file corresponding to each audio file are stored in the Internet audio library, where the attributes of an audio file may include, but are not limited to, the audio features of the audio file, the identifier of the audio file, and the like.
- The obtaining unit 301 may obtain the subtitle file corresponding to the target audio file from the Internet audio library.
- The specific acquisition manner may include, but is not limited to: searching the Internet audio library for the subtitle file corresponding to the target audio file according to the identifier of the target audio file, and obtaining the subtitle file found; or extracting the audio features of the target audio file and matching them against the audio features of the audio files in the Internet audio library, thereby locating the target audio file in the Internet audio library and obtaining the corresponding subtitle file.
- Suppose the target audio file is song A; for the structure of the subtitle file corresponding to song A, refer to the example shown in this embodiment.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in sequence.
- p(0) can be used to represent the first character single sentence "a1a2a3a4a5a6a7a8";
- p(1) can be used to represent the second character single sentence "b1b2b3b4b5b6b7b8";
- p(2) can be used to represent the third character single sentence "c1c2c3c4c5c6c7c8";
- and so on, p(N-1) is used to represent the Nth character single sentence.
- the constructing unit 302 is configured to construct a caption feature sequence according to the similarity between the at least one character single sentence, the caption feature sequence including at least one character feature element.
- the subtitle feature sequence can be used to reflect the similarity between the at least one character single sentence.
- The constructing unit 302 may calculate the similarity between the at least one character single sentence by using a similarity algorithm. The similarity between each character single sentence and each of its subsequent character single sentences needs to be calculated: the similarity between p(0) and p(1), between p(0) and p(2), ..., and between p(0) and p(N-1); the similarity between p(1) and p(2), between p(1) and p(3), ..., and between p(1) and p(N-1); and so on.
- the similarity algorithm may include, but is not limited to, an edit distance algorithm, a longest common substring algorithm, a Heckel algorithm, a greedy string matching algorithm, and the like.
- the constructing unit 302 may construct the caption feature sequence according to the number, order, and calculated similarity of the at least one character single sentence.
- The constructed subtitle feature sequence s(n) includes a total of N character feature elements, namely s(0), s(1)...s(N-1).
- The value of s(0) can be used to describe the similarity between p(0) and its subsequent character single sentences;
- the value of s(1) can be used to describe the similarity between p(1) and its subsequent character single sentences; and so on.
- the optimization unit 303 is configured to optimize the sequence of the subtitle features according to the total number of preset paragraphs.
- The total number of preset paragraphs may be set according to the user's actual segmentation requirements for the target audio file. Assuming that M (M is a positive integer and M>1) represents the total number of preset paragraphs, the optimization unit 303 optimizes the subtitle feature sequence s(n) according to the total number M of preset paragraphs, so that the optimized subtitle feature sequence s(n) can be divided into exactly M subtitle paragraphs, which meets the actual segmentation requirement for the target audio file.
- the determining unit 304 is configured to determine a paragraph change time according to the value of the at least one character feature element in the optimized caption feature sequence.
- The optimized subtitle feature sequence s(n) can be divided into exactly M subtitle paragraphs, and the values of the character feature elements in the subtitle feature sequence s(n) describe the similarity between character single sentences. The determining unit 304 can therefore determine the turning points of the M subtitle paragraphs according to the values of the character feature elements in the optimized subtitle feature sequence s(n), and further obtain the start and end times of the M subtitle paragraphs from the subtitle file.
- the segmentation unit 305 is configured to divide the target audio file into segments of the total number of preset paragraphs according to the paragraph change time.
- the segmentation unit 305 can correspondingly segment the target audio file according to the start and end time of the obtained M subtitle segments, and obtain M audio passages.
- The subtitle feature sequence may be constructed according to the similarity between the at least one character single sentence in the subtitle file corresponding to the target audio file; the subtitle feature sequence is optimized according to the total number of preset paragraphs; a paragraph change time is determined according to the value of at least one character feature element in the optimized subtitle feature sequence; and the target audio file is then divided into the preset total number of paragraphs according to the paragraph change time. The audio processing process uses the similarity between character single sentences across subtitle paragraphs, and segments the target audio file based on the similarity of the character single sentences in the subtitle file, which can improve segmentation efficiency and make audio processing more intelligent.
- The constructing unit 302 may include: a quantity determining unit 401, an index determining unit 402, a value setting unit 403, a value changing unit 404, and a sequence construction unit 405.
- the quantity determining unit 401 is configured to determine the number of character feature elements that construct the subtitle feature sequence according to the number of the at least one character single sentence.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in sequence, that is, the number of the at least one character single sentence is N. The quantity determining unit 401 can then determine that the number of character feature elements of the subtitle feature sequence is also N, that is, the length of the subtitle feature sequence is N.
- The constructed subtitle feature sequence s(n) includes a total of N character feature elements, namely s(0), s(1)...s(N-1).
- the index determining unit 402 is configured to determine an index of each character feature element that constructs the subtitle feature sequence according to an order of each character single sentence in the at least one character sentence.
- the order of the N character single sentences of the subtitle file is arranged as p(0), p(1)...p(N-1), and it is assumed that the subtitle feature sequence s(n): s(0) corresponds to p(0) , s(1) corresponds to p(1), and so on, and s(N-1) corresponds to p(N-1). Then, the index of s(0) in the subtitle feature sequence s(n) is 1, that is, the first character feature element; the index of s(1) is 2, that is, the second character feature element; and so on, The index of s(N-1) is N, that is, the Nth character feature element.
- the value setting unit 403 is configured to set a value of each character feature element that constructs the subtitle feature sequence to an initial value.
- the initial value may be set according to actual needs.
- the initial value may be assumed to be 0.
- The value changing unit 404 is configured to: for any target character single sentence among the at least one character single sentence, if the maximum similarity between the target character single sentence and its subsequent character single sentences is greater than a preset similarity threshold, change the value of the character feature element corresponding to the target character single sentence from the initial value to a target value.
- The specific processing of the value changing unit 404 may include the following A-C:
- A. Use a similarity algorithm to calculate the similarity between the at least one character single sentence. The similarity between each character single sentence and each of its subsequent character single sentences needs to be calculated: the similarity between p(0) and p(1), between p(0) and p(2), ..., and between p(0) and p(N-1); the similarity between p(1) and p(2), between p(1) and p(3), ..., and between p(1) and p(N-1); and so on.
- the similarity algorithm may include, but is not limited to, an edit distance algorithm, a longest common substring algorithm, a Heckel algorithm, a greedy string matching algorithm, and the like.
- The similarity obtained by the calculation is normalized to the interval [0, 1]. If the similarity between two character single sentences is equal to 0, the two character single sentences are completely different; if the similarity between two character single sentences is equal to 1, the two character single sentences are exactly the same.
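- One way to obtain such a normalized similarity is sketched below using the edit distance, which is one of the algorithms the text lists. The normalization formula (one minus the distance divided by the longer length) and the function names are assumptions made for illustration; the text does not specify how the normalization is done.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1 for identical strings, 0 for strings that
    share nothing position-wise at equal length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sentences are treated as identical
    return 1.0 - edit_distance(a, b) / longest
```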
- The preset similarity threshold may be set according to actual needs; it may be represented by Th, with 0≤Th≤1.
- The target value may be set according to actual needs, and the target value is greater than the initial value. In this embodiment, the target value may be set to 1. According to the example shown in this embodiment, it is determined, for example, whether Q02 is greater than the preset similarity threshold Th.
- the sequence construction unit 405 is configured to construct the subtitle feature sequence according to the number, index, and value of the character feature elements that construct the subtitle feature sequence.
- The constructed subtitle feature sequence is s(n), which is composed of N character feature elements s(0), s(1)...s(N-1); the values of the character feature elements in s(n) form a sequence consisting of 0s and 1s.
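- The construction of s(n) described above can be sketched as follows. The function name `build_feature_sequence` and the placeholder similarity `exact_match` are hypothetical and introduced only for this example; any similarity function normalized to [0, 1] could be passed in instead.

```python
def exact_match(a, b):
    # Placeholder similarity for the demo only: 1.0 if identical, else 0.0.
    return 1.0 if a == b else 0.0


def build_feature_sequence(sentences, sim, th, initial=0, target=1):
    """s(i) becomes `target` when the maximum similarity between sentence i
    and any later sentence exceeds the threshold `th`; otherwise it keeps
    the initial value."""
    n = len(sentences)
    s = [initial] * n
    for i in range(n):
        later = [sim(sentences[i], sentences[j]) for j in range(i + 1, n)]
        if later and max(later) > th:
            s[i] = target
    return s


demo_s = build_feature_sequence(["a", "b", "a", "c"], exact_match, th=0.5)
```

- Note that the last sentence has no subsequent sentence, so its element always keeps the initial value in this sketch.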
- The subtitle feature sequence may be constructed according to the similarity between the at least one character single sentence in the subtitle file corresponding to the target audio file; the subtitle feature sequence is optimized according to the total number of preset paragraphs; a paragraph change time is determined according to the value of at least one character feature element in the optimized subtitle feature sequence; and the target audio file is then divided into the preset total number of paragraphs according to the paragraph change time. The audio processing process uses the similarity between character single sentences across subtitle paragraphs, and segments the target audio file based on the similarity of the character single sentences in the subtitle file, which can improve segmentation efficiency and make audio processing more intelligent.
- FIG. 5 is a schematic structural diagram of an embodiment of an optimization unit shown in FIG. 3; the optimization unit 303 may include: a quantity statistics unit 501, a determination unit 502, and an optimization processing unit 503.
- the quantity statistics unit 501 is configured to count the number of character feature elements whose value in the subtitle feature sequence is the target value. According to the example of the embodiment shown in Fig. 4, the number statistic unit 501 needs to count the number of character feature elements whose value is 1 in the subtitle feature sequence s(n).
- the determining unit 502 is configured to determine whether the quantity is located in a fault tolerance interval corresponding to the total number of preset paragraphs.
- Assuming that M (M is a positive integer and M>1) represents the total number of preset paragraphs, the fault tolerance interval corresponding to the total number M of preset paragraphs can be expressed as [M-u, M+u] (u is an integer), where u denotes the integer tolerance range and can be set according to actual needs.
- The determining unit 502 needs to determine whether the number of character feature elements whose value is 1 in the subtitle feature sequence s(n) lies within the interval [M-u, M+u]. If so, the subtitle feature sequence s(n) can be divided into M subtitle paragraphs, which meets the actual segmentation requirement for the target audio file. If not, the subtitle feature sequence s(n) cannot be divided well into M subtitle paragraphs, the actual segmentation requirement for the target audio file cannot be met, and some adjustment is needed.
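- The interval check can be written in a few lines. This is a sketch under the assumption that the fault tolerance interval is the closed range from M-u to M+u; the function name `within_tolerance` is hypothetical.

```python
def within_tolerance(s, m, u, target=1):
    """True when the number of elements equal to `target` lies in the
    closed interval [m - u, m + u]."""
    count = sum(1 for value in s if value == target)
    return m - u <= count <= m + u
```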
- The optimization processing unit 503 is configured to, if the determination result is no, adjust the preset similarity threshold so as to adjust the value of each character feature element in the subtitle feature sequence.
- FIG. 6 is a schematic structural diagram of an embodiment of the optimization processing unit shown in FIG. 5 .
- the optimization processing unit 503 includes: a first adjustment unit 601 and a second adjustment unit 602.
- The first adjusting unit 601 is configured to: if the quantity is greater than the maximum fault tolerance value in the fault tolerance interval corresponding to the total number of preset paragraphs, increase the preset similarity threshold according to a preset step size to adjust the value of each character feature element in the subtitle feature sequence.
- In this case, the first adjusting unit 601 needs to increase the value of the preset similarity threshold Th according to the preset step size and re-adjust the value of each character feature element in the subtitle feature sequence.
- The second adjusting unit 602 is configured to: if the quantity is less than the minimum fault tolerance value in the fault tolerance interval corresponding to the total number of preset paragraphs, decrease the preset similarity threshold according to a preset step size to adjust the value of each character feature element in the subtitle feature sequence.
- In this case, the second adjusting unit 602 needs to decrease the value of the preset similarity threshold Th according to the preset step size and re-adjust the value of each character feature element in the subtitle feature sequence.
- The preset step size may be set according to actual needs. It may be a fixed step size, that is, the same step is used each time to increase or decrease the value of the preset similarity threshold Th.
- The preset step size may also be a random step size, that is, a different step is used each time to increase or decrease the value of the preset similarity threshold Th.
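- The adjustment performed by the two adjusting units can be sketched as a loop with a fixed step size. The function name `optimize_threshold`, the input `max_sims` (the maximum similarity of each sentence to its subsequent sentences, assumed to be precomputed), the clamping of Th to [0, 1], and the `max_rounds` safety cap are all assumptions added for this example; the text does not say how termination is guaranteed.

```python
def optimize_threshold(max_sims, m, u, th=0.5, step=0.05, max_rounds=100):
    """Raise Th when too many elements reach the target value, lower it when
    too few do, and stop once the count lies in [m - u, m + u]."""
    s = []
    for _ in range(max_rounds):
        s = [1 if q > th else 0 for q in max_sims]
        count = sum(s)
        if count > m + u:
            th = min(1.0, th + step)   # first adjusting unit: increase Th
        elif count < m - u:
            th = max(0.0, th - step)   # second adjusting unit: decrease Th
        else:
            break                      # count is inside the tolerance interval
    return th, s


demo_th, demo_seq = optimize_threshold(
    [0.9, 0.8, 0.2, 0.1, 0.95, 0.3], m=2, u=0, th=0.1, step=0.25)
```

- With a fixed step the loop can oscillate around the interval without ever landing in it, which is why this sketch caps the number of rounds; a random or shrinking step is another possible choice.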
- The subtitle feature sequence may be constructed according to the similarity between the at least one character single sentence in the subtitle file corresponding to the target audio file; the subtitle feature sequence is optimized according to the total number of preset paragraphs; a paragraph change time is determined according to the value of at least one character feature element in the optimized subtitle feature sequence; and the target audio file is then divided into the preset total number of paragraphs according to the paragraph change time. The audio processing process uses the similarity between character single sentences across subtitle paragraphs, and segments the target audio file based on the similarity of the character single sentences in the subtitle file, which can improve segmentation efficiency and make audio processing more intelligent.
- FIG. 7 is a schematic structural diagram of an embodiment of the determining unit 304 shown in FIG. 3; the determining unit 304 may include: a target index acquiring unit 701, a positioning unit 702, and a time reading unit 703.
- the target index obtaining unit 701 is configured to obtain, from the optimized sequence of the caption features, a target index corresponding to the character feature element whose value is the target value.
- the target index obtaining unit 701 can obtain the target index of 5 and 11.
- the locating unit 702 is configured to locate a character single sentence of a paragraph turn in the subtitle file according to the target index.
- For example, if the target indexes are 5 and 11, the positioning unit 702 can locate the character single sentences at which paragraphs turn in the subtitle file as the 5th and the 11th character single sentences. The 5th character single sentence is the starting position of a subtitle paragraph, that is, the 1st to 4th character single sentences in the subtitle file constitute a subtitle paragraph; the 11th character single sentence is the starting position of another subtitle paragraph, that is, the 5th to 10th character single sentences in the subtitle file constitute a subtitle paragraph.
- the time reading unit 703 is configured to read the paragraph change time from the subtitle file according to the character single sentence of the paragraph transition.
- The time reading unit 703 can read the paragraph change times from the subtitle file.
- Since the 1st to 4th character single sentences in the subtitle file constitute a subtitle paragraph, the paragraph change time read is the end time of the 4th character single sentence and the start time of the 5th character single sentence;
- since the 5th to 10th character single sentences in the subtitle file constitute a subtitle paragraph, the paragraph change time read is the end time of the 10th character single sentence and the start time of the 11th character single sentence.
- The subtitle feature sequence may be constructed according to the similarity between the at least one character single sentence in the subtitle file corresponding to the target audio file; the subtitle feature sequence is optimized according to the total number of preset paragraphs; a paragraph change time is determined according to the value of at least one character feature element in the optimized subtitle feature sequence; and the target audio file is then divided into the preset total number of paragraphs according to the paragraph change time. The audio processing process uses the similarity between character single sentences across subtitle paragraphs, and segments the target audio file based on the similarity of the character single sentences in the subtitle file, which can improve segmentation efficiency and make audio processing more intelligent.
- the embodiment of the invention further discloses a terminal, which can be a PC (Personal Computer), a notebook computer, a mobile phone, a PAD (tablet computer), an in-vehicle terminal, a smart wearable device and the like.
- An audio processing device may be included in the terminal.
- For the structure and function of the device refer to the related description of the embodiment shown in FIG. 3 to FIG. 7 , and details are not described herein.
- The subtitle feature sequence may be constructed according to the similarity between the at least one character single sentence in the subtitle file corresponding to the target audio file; the subtitle feature sequence is optimized according to the total number of preset paragraphs; a paragraph change time is determined according to the value of at least one character feature element in the optimized subtitle feature sequence; and the target audio file is then divided into the preset total number of paragraphs according to the paragraph change time. The audio processing process uses the similarity between character single sentences across subtitle paragraphs, and segments the target audio file based on the similarity of the character single sentences in the subtitle file, which can improve segmentation efficiency and make audio processing more intelligent.
- A person skilled in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware, and the program may be stored in a computer readable storage medium.
- the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.
- FIG. 8 is a flowchart of an audio processing method according to an embodiment of the present invention. the method may include the following steps S801 to S805.
- An audio file corresponds to a subtitle file.
- The subtitle file includes at least one character single sentence and the key information of each character single sentence. The key information of a character single sentence includes:
- ID: an identifier;
- start_time: a start time;
- end_time: an end time.
- A plurality of audio files, the attributes of each audio file, and the subtitle file corresponding to each audio file are stored in the Internet audio library, where the attributes of an audio file may include, but are not limited to, the audio features of the audio file, the identifier of the audio file, and the like.
- The subtitle file corresponding to the target audio file may be obtained from the Internet audio library. The specific acquisition manner may include, but is not limited to: searching the Internet audio library for the subtitle file corresponding to the target audio file according to the identifier of the target audio file, and obtaining the subtitle file found; or extracting the audio features of the target audio file and matching them against the audio features of the audio files in the Internet audio library, thereby locating the target audio file in the Internet audio library and obtaining the corresponding subtitle file.
- Suppose the target audio file is song A; for the structure of the subtitle file corresponding to song A, refer to the example shown in this embodiment.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in sequence.
- p(0) can be used to represent the first character single sentence "a1a2a3a4a5a6a7a8";
- p(1) can be used to represent the second character single sentence "b1b2b3b4b5b6b7b8";
- p(2) can be used to represent the third character single sentence "c1c2c3c4c5c6c7c8";
- and so on, p(N-1) is used to represent the Nth character single sentence.
- the sequence of time features can be used to reflect the degree of time interval between the at least one character single sentence.
- the time feature sequence can be constructed according to the number, order, and calculated time interval of the at least one character single sentence.
- t(n) is used to represent the time feature sequence.
- the constructed time feature sequence t(n) includes a total of N time feature elements, which are t(0), t(1)...t(N-1), respectively.
- t(0) can be set to 0
- the value of t(1) is used to represent the time interval between p(1) and p(0)
- the value of t(2) is used to represent the time interval between p(2) and p(1); and so on, the value of t(N-1) is used to represent the time interval between p(N-1) and p(N-2).
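- The construction of t(n) can be sketched as follows. The `Sentence` class mirrors the ID, start_time, and end_time key information mentioned earlier, but the class itself, the function name, the millisecond unit, and in particular the choice to measure the interval as the start time of a sentence minus the end time of the previous one are assumptions; the text only says that t(i) represents the time interval between p(i) and p(i-1).

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    # Hypothetical record holding the key information of one character single sentence.
    id: int
    start_time: int  # assumed to be in milliseconds
    end_time: int


def build_time_feature_sequence(sentences):
    """t(0) is 0; t(i) is the gap between the end of sentence i-1 and the
    start of sentence i (assumed definition of the time interval)."""
    t = [0] * len(sentences)
    for i in range(1, len(sentences)):
        t[i] = sentences[i].start_time - sentences[i - 1].end_time
    return t


demo_t = build_time_feature_sequence([
    Sentence(0, 0, 1000),
    Sentence(1, 1200, 2000),
    Sentence(2, 5000, 6000),
])
```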
- The total number of preset paragraphs may be set according to the user's actual segmentation requirements for the target audio file. Assuming that M (M is a positive integer and M>1) represents the total number of preset paragraphs, the value of each time feature element in the time feature sequence t(n) is adjusted according to M, so that exactly the turning points corresponding to the M subtitle paragraphs can be extracted from the adjusted time feature sequence t(n), which meets the actual segmentation requirement for the target audio file.
- The values of the time feature elements in the adjusted time feature sequence t(n) reflect the turning points corresponding to the M subtitle paragraphs. This step may then obtain the start and end times of the M subtitle paragraphs from the subtitle file according to the value of at least one time feature element in the adjusted time feature sequence.
- the target audio file is divided into paragraphs of the total number of preset paragraphs according to the paragraph change time. Since the audio file and the subtitle file correspond to each other, according to the start and end time of the obtained M subtitle segments, the target audio file can be segmented correspondingly to obtain M audio passages.
- The time feature sequence may be constructed according to the time intervals between the at least one character single sentence in the subtitle file corresponding to the target audio file; the value of each time feature element in the time feature sequence is adjusted according to the total number of preset paragraphs; a paragraph change time is determined according to the value of at least one time feature element in the adjusted time feature sequence; and the target audio file is then divided into the preset total number of paragraphs according to the paragraph change time. The audio processing process uses the time intervals between character single sentences across subtitle paragraphs, and segments the target audio file based on the time intervals between the character single sentences in the subtitle file, which can improve segmentation efficiency and make audio processing more intelligent.
- FIG. 9 is a flowchart of another audio processing method according to an embodiment of the present invention.
- the method may include the following steps S901 to S905.
- Suppose the target audio file is song A; for the structure of the subtitle file corresponding to song A, refer to the example shown in this embodiment.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in sequence.
- p(0) can be used to represent the first character single sentence "a1a2a3a4a5a6a7a8";
- p(1) can be used to represent the second character single sentence "b1b2b3b4b5b6b7b8";
- p(2) can be used to represent the third character single sentence "c1c2c3c4c5c6c7c8";
- and so on, p(N-1) is used to represent the Nth character single sentence.
- For step S901 of this embodiment, reference may be made to step S801 of the embodiment shown in FIG. 8, and details are not described herein again.
- S902. Determine, according to the number of the at least one character single sentence, the number of time feature elements used to construct the time feature sequence.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in sequence, that is, the number of the at least one character single sentence is N. This step may then determine that the number of time feature elements of the time feature sequence is also N, that is, the length of the time feature sequence is N. Assuming that t(n) is used to represent the time feature sequence, the constructed time feature sequence t(n) includes N time feature elements, namely t(0), t(1)...t(N-1).
- S903. Determine an index of each time feature element that constructs the time feature sequence according to an order of each character single sentence in the at least one character single sentence.
- the order of the N character single sentences of the subtitle file is arranged as p(0), p(1)...p(N-1), assuming that the time feature sequence t(n): t(0) corresponds to p(0) , t(1) corresponds to p(1), and so on, and t(N-1) corresponds to p(N-1). Then, the index of t(0) in the time feature sequence t(n) is 1, that is, the first time feature element; the index of t(1) is 2, that is, the second time feature element; and so on, The index of t(N-1) is N, that is, the Nth time feature element.
- step S904 may include the following steps s11-s12:
- The constructed time feature sequence is t(n), which is composed of N time feature elements t(0), t(1)...t(N-1).
- Steps S902 to S905 of this embodiment may be specific refinement steps of step S802 of the embodiment shown in FIG. 8.
- S907 Adjust a value of the found time feature element to a target value, and adjust a value of the time feature element other than the found time feature element in the time feature sequence to a reference value.
- The target value and the reference value may be set according to actual needs. In the embodiment of the present invention, the target value may be set to 1 and the reference value to 0.
- the specific processing of the step S906-S907 may be: first traversing the value of each time feature element in the time feature sequence t(n), and finding the time feature element corresponding to the maximum value; excluding the found time feature element, The values of each time feature element in the time feature sequence t(n) are traversed, and the time feature element corresponding to the largest value is found therefrom; the traversal process is repeated until the M-1 maximum values are found. Finally, the M-1 maximum values found in the time feature sequence t(n) are all adjusted to 1, and the other values are adjusted to 0.
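- The adjustment in steps S906-S907 can be sketched as follows. Instead of repeatedly traversing the sequence, this sketch sorts the indexes by value and takes the first M-1, which selects the same elements; the function name `mark_turning_points` and the tie-breaking rule (earlier position wins) are assumptions added here.

```python
def mark_turning_points(t, m, target=1, reference=0):
    """Set the M-1 largest elements of t to `target` and every other
    element to `reference`."""
    # sorted() is stable, so equal values keep their original order.
    order = sorted(range(len(t)), key=lambda i: t[i], reverse=True)
    chosen = set(order[:max(m - 1, 0)])
    return [target if i in chosen else reference for i in range(len(t))]


# Six intervals and M = 3 paragraphs, so the two largest gaps are marked.
demo_marks = mark_turning_points([0, 200, 3000, 150, 2500, 100], m=3)
```

- In the demo, the marked elements sit at 0-based positions 2 and 4, which correspond to the 3rd and 5th character single sentences when counted from 1.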
- Steps S906 to S907 of this embodiment may be specific refinement steps of step S803 of the embodiment shown in FIG. 8. Since M subtitle paragraphs correspond to M-1 paragraph turning points, steps S906-S907 allow the M-1 paragraph turning points corresponding to the M subtitle paragraphs to be extracted from the adjusted time feature sequence t(n), thereby meeting the actual segmentation requirements for the target audio file.
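- The search-and-adjust procedure of steps S906-S907 can be illustrated with a minimal Python sketch. The function name `binarize_top_gaps` and the sample values are hypothetical, and the tie-breaking rule (the earliest element wins when two values are equal) is an assumption that the text does not specify.

```python
from typing import List


def binarize_top_gaps(t: List[int], m: int, target: int = 1, reference: int = 0) -> List[int]:
    """Set the M-1 largest elements of t to `target` and all others to `reference`."""
    if not 1 < m <= len(t):
        raise ValueError("need 1 < M <= N")
    found = set()
    for _ in range(m - 1):
        # Traverse the sequence again, skipping elements found in earlier passes.
        # max() returns the first maximum, so ties go to the earliest element (assumption).
        best = max((i for i in range(len(t)) if i not in found), key=lambda i: t[i])
        found.add(best)
    return [target if i in found else reference for i in range(len(t))]


# Hypothetical time feature sequence (gaps in milliseconds) and M = 3 paragraphs.
print(binarize_top_gaps([0, 200, 2500, 150, 3000, 100], m=3))  # [0, 0, 1, 0, 1, 0]
```

Repeating the traversal M-1 times mirrors the wording of the text; sorting the indexes by value once would give the same result with less work.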
- The character single sentence of the paragraph turn can be located in the subtitle file as the 5th character single sentence; that is, the 5th character single sentence is the starting position of a subtitle paragraph, and the 1st to 4th character single sentences in the subtitle file constitute a subtitle paragraph.
- This step can read the paragraph change time from the subtitle file. According to the example shown in this embodiment, the 1st to 4th character single sentences in the subtitle file constitute a subtitle paragraph, so the read paragraph change time is the end time of the 4th character single sentence and the start time of the 5th character single sentence.
- Step S908 to step S910 of this embodiment may be a specific refinement step of step S804 of the embodiment shown in FIG. 8. According to step S908-step S910, the start and end time of the M subtitle segments can be obtained.
- the target audio file is divided into paragraphs of the total number of preset paragraphs according to the paragraph change time. Since the audio file and the subtitle file correspond to each other, according to the start and end time of the obtained M subtitle segments, the target audio file can be segmented correspondingly to obtain M audio passages.
- For step S911 of this embodiment, reference may be made to step S805 of the embodiment shown in FIG. 8, and details are not described herein.
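- As an illustration of the final segmentation, the sketch below cuts decoded mono PCM samples into one slice per subtitle paragraph. It assumes that times are given in milliseconds and that the audio is already available as a flat sample sequence; the function name `split_samples` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_samples(samples: Sequence[int], sample_rate: int,
                  paragraphs_ms: List[Tuple[int, int]]) -> List[Sequence[int]]:
    """Cut mono PCM samples into one slice per (start_ms, end_ms) paragraph."""
    pieces = []
    for start_ms, end_ms in paragraphs_ms:
        # Convert millisecond timestamps to sample positions.
        first = start_ms * sample_rate // 1000
        last = end_ms * sample_rate // 1000
        pieces.append(samples[first:last])
    return pieces


samples = list(range(40))  # 40 samples at 10 Hz = 4 seconds of toy audio
parts = split_samples(samples, 10, [(0, 1500), (1500, 4000)])
print([len(p) for p in parts])  # [15, 25]
```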
- The time feature sequence may be constructed according to the time interval between the at least one character single sentence in the subtitle file corresponding to the target audio file; the value of each time feature element in the time feature sequence is adjusted according to the total number of preset paragraphs; the paragraph change time is determined according to the value of at least one time feature element in the adjusted time feature sequence; and the target audio file is then divided into the total number of preset paragraphs according to the paragraph change time.
- The audio processing process uses the time intervals between character single sentences at subtitle paragraph boundaries to segment the target audio file, which can improve segmentation efficiency and make audio processing more intelligent.
- The structure and function of the audio processing device provided by the embodiment of the present invention are described in detail below with reference to FIG. 10 to FIG. 13. It is to be noted that the apparatus shown in FIGS. 10 to 13 can run in a terminal so as to be applied to the method shown in FIGS. 8 to 9.
- FIG. 10 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention.
- the apparatus may include: an obtaining unit 1001, a constructing unit 1002, an adjusting unit 1003, a determining unit 1004, and a segmenting unit 1005.
- The obtaining unit 1001 is configured to acquire a subtitle file corresponding to the target audio file, where the subtitle file is composed of at least one character single sentence arranged in order.
- An audio file corresponds to a subtitle file.
- the subtitle file includes at least one character single sentence and key information of each character single sentence; the key information of a character single sentence includes: an identifier (ID), a start time (start_time), and an end time (end_time).
- A plurality of audio files, the attributes of each audio file, and the subtitle file corresponding to each audio file are stored in the Internet audio library, where the attributes of an audio file may include, but are not limited to, the audio features of the audio file and the identifier of the audio file.
- the obtaining unit 1001 may obtain the subtitle file corresponding to the target audio file from the Internet audio library.
- the specific acquisition manner may include, but is not limited to: searching for the target audio file in the Internet audio library according to the identifier of the target audio file.
- Assuming that the target audio file is song A, the structure of the subtitle file corresponding to song A can be referred to the example shown in this embodiment.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in order.
- p(0) can be used to represent the first character single sentence "a1 a2 a3 a4 a5 a6 a7 a8".
- p(1) can be used to represent the second character single sentence "b1 b2 b3 b4 b5 b6 b7 b8".
- p(2) can be used to represent the third character single sentence "c1 c2 c3 c4 c5 c6 c7 c8".
- And so on; p(N-1) is used to represent the Nth character single sentence.
- The constructing unit 1002 is configured to construct a time feature sequence according to a time interval between the at least one character single sentence, the time feature sequence including at least one time feature element.
- The time feature sequence can be used to reflect the length of the time intervals between the at least one character single sentence.
- The constructing unit 1002 first calculates the time interval between the at least one character single sentence: the time interval between p(1) and p(0) is p(1).start_time-p(0).end_time; the time interval between p(2) and p(1) is p(2).start_time-p(1).end_time; and so on, the time interval between p(N-1) and p(N-2) is p(N-1).start_time-p(N-2).end_time.
- The constructing unit 1002 may then construct the time feature sequence according to the number and order of the at least one character single sentence and the calculated time intervals.
- The constructed time feature sequence t(n) includes a total of N time feature elements, namely t(0), t(1)...t(N-1).
- The value of t(0) can be set to 0; the value of t(1) is used to represent the time interval between p(1) and p(0); the value of t(2) is used to represent the time interval between p(2) and p(1); and so on, the value of t(N-1) is used to represent the time interval between p(N-1) and p(N-2).
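- The construction of t(n) described above can be sketched in a few lines of Python. The `Sentence` class, the function name, and the use of milliseconds are assumptions made for illustration; only the fields ID, start_time, and end_time come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    """One character single sentence with its key information (times in ms, assumed)."""
    id: int
    start_time: int
    end_time: int


def build_time_feature_sequence(sentences: List[Sentence]) -> List[int]:
    """Return t(0)..t(N-1) with t(0) = 0 and t(k) = p(k).start_time - p(k-1).end_time."""
    t = [0] * len(sentences)
    for k in range(1, len(sentences)):
        t[k] = sentences[k].start_time - sentences[k - 1].end_time
    return t


# Hypothetical subtitle file with three character single sentences.
subtitles = [
    Sentence(0, 1000, 4000),
    Sentence(1, 4200, 7000),
    Sentence(2, 9500, 12000),
]
print(build_time_feature_sequence(subtitles))  # [0, 200, 2500]
```

The sequence has the same length N as the subtitle file, so the position of each element identifies the character single sentence it belongs to.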
- the adjusting unit 1003 is configured to adjust the value of each time feature element in the time feature sequence according to the total number of preset segments.
- The total number of preset paragraphs may be set according to the user's actual segmentation requirements for the target audio file. Assuming that M (M is a positive integer and M>1) represents the total number of preset paragraphs, the adjusting unit 1003 adjusts the value of each time feature element in the time feature sequence t(n) according to M, so that the turning points corresponding to the M subtitle paragraphs can be extracted from the adjusted time feature sequence t(n), thereby meeting the actual segmentation requirements for the target audio file.
- the determining unit 1004 is configured to determine a paragraph change time according to the value of the at least one time feature element in the adjusted time feature sequence.
- The values of the time feature elements in the adjusted time feature sequence t(n) reflect the turning points corresponding to the M subtitle paragraphs, so the determining unit 1004 can obtain the start and end times of the M subtitle paragraphs from the subtitle file according to the value of at least one time feature element in the adjusted time feature sequence.
- the segmentation unit 1005 is configured to divide the target audio file into segments of the total number of preset paragraphs according to the paragraph change time.
- the segmentation unit 1005 can correspondingly segment the target audio file according to the start and end time of the obtained M subtitle segments, and obtain M audio passages.
- The time feature sequence may be constructed according to the time interval between the at least one character single sentence in the subtitle file corresponding to the target audio file; the value of each time feature element in the time feature sequence is adjusted according to the total number of preset paragraphs; the paragraph change time is determined according to the value of at least one time feature element in the adjusted time feature sequence; and the target audio file is then divided into the total number of preset paragraphs according to the paragraph change time.
- The audio processing process uses the time intervals between character single sentences at subtitle paragraph boundaries to segment the target audio file, which can improve segmentation efficiency and make audio processing more intelligent.
- The constructing unit 1002 may include: a quantity determining unit 1101, an index determining unit 1102, a value setting unit 1103, and a sequence construction unit 1104.
- The quantity determining unit 1101 is configured to determine the number of time feature elements of the time feature sequence according to the number of the at least one character single sentence.
- The subtitle file is composed of N (N is a positive integer) character single sentences arranged in order, that is, the number of the at least one character single sentence is N. The quantity determining unit 1101 may therefore determine that the number of time feature elements of the time feature sequence is also N, that is, the length of the time feature sequence is N.
- The constructed time feature sequence t(n) includes N time feature elements, namely t(0), t(1)...t(N-1).
- the index determining unit 1102 is configured to determine an index of each time feature element that constructs the time feature sequence according to an order of each character single sentence in the at least one character single sentence.
- The N character single sentences of the subtitle file are arranged in the order p(0), p(1)...p(N-1). Assume that in the time feature sequence t(n), t(0) corresponds to p(0), t(1) corresponds to p(1), and so on, and t(N-1) corresponds to p(N-1). Then the index of t(0) in the time feature sequence t(n) is 1, that is, t(0) is the first time feature element; the index of t(1) is 2, that is, t(1) is the second time feature element; and so on, the index of t(N-1) is N, that is, t(N-1) is the Nth time feature element.
- The value setting unit 1103 is configured to, for any target character single sentence in the at least one character single sentence, set the time interval between the target character single sentence and its adjacent preceding character single sentence as the value of the time feature element corresponding to the target character single sentence.
- the specific processing of the value setting unit 1103 may include the following A-B:
- the sequence construction unit 1104 is configured to construct the time feature sequence according to the number, index, and value of the temporal feature elements that construct the time feature sequence.
- the time characteristic sequence constructed is t(n), and t(n) is composed of N time feature elements t(0), t(1)...t(N-1), and the time feature sequence
- FIG. 12 is a schematic structural diagram of an embodiment of an adjustment unit shown in FIG. 10; the adjustment unit 1003 may include an element search unit 1201 and a value adjustment unit 1202.
- The element searching unit 1201 is configured to search the time feature sequence for the time feature elements with the largest values, where the number of elements searched for is the total number of preset paragraphs minus one.
- That is, the element searching unit 1201 needs to find the time feature elements corresponding to the first M-1 maximum values in the time feature sequence t(n).
- the value adjustment unit 1202 is configured to adjust the value of the found time feature element to the target value, and adjust the value of the time feature element other than the found time feature element in the time feature sequence to a reference value.
- The target value and the reference value may be set according to actual needs. In the embodiment of the present invention, the target value may be set to 1 and the reference value to 0.
- The specific processing of the element searching unit 1201 and the value adjustment unit 1202 may be as follows: first, the element searching unit 1201 traverses the value of each time feature element in the time feature sequence t(n) and finds the time feature element with the maximum value; after excluding the found time feature element, it traverses the values of the remaining time feature elements in t(n) again and finds the time feature element with the largest value; the traversal is repeated until M-1 maximum values have been found. Finally, the value adjustment unit 1202 adjusts the M-1 maximum values found in the time feature sequence t(n) to 1 and adjusts all other values to 0.
- the determining unit 1004 may include: a target index acquiring unit 1301 , a positioning unit 1302 , and a time reading unit 1303 .
- the target index obtaining unit 1301 is configured to obtain, from the adjusted time feature sequence, a target index corresponding to a time feature element whose value is a target value.
- The target index obtaining unit 1301 needs to acquire the target indexes corresponding to the time feature elements whose value is 1, that is, the indexes of M-1 time feature elements need to be acquired.
- the locating unit 1302 is configured to locate a character single sentence of a paragraph turn in the subtitle file according to the target index.
- The positioning unit 1302 can locate the character single sentence of a paragraph turn in the subtitle file as the 5th character single sentence; that is, the 5th character single sentence is the starting position of a subtitle paragraph, and the 1st to 4th character single sentences in the subtitle file constitute a subtitle paragraph. In the same way, the character single sentences of the M-1 paragraph turns can be located.
- the time reading unit 1303 is configured to read the paragraph change time from the subtitle file according to the character single sentence of the paragraph transition.
- The key information of each character single sentence, including its start time and end time, is recorded in the subtitle file, and the time reading unit 1303 reads the paragraph change time from the subtitle file. According to the example shown in this embodiment, the 1st to 4th character single sentences in the subtitle file constitute a subtitle paragraph, so the read paragraph change time is the end time of the 4th character single sentence and the start time of the 5th character single sentence.
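- The work of the three sub-units (obtaining target indexes, locating the turning sentences, reading the times) can be sketched as follows. This is a minimal illustration, not the patented implementation: sentences are represented as hypothetical (start_time, end_time) tuples in milliseconds, and positions are counted from 0 in the code, whereas the text counts indexes from 1.

```python
from typing import List, Tuple

# Each sentence is (start_time, end_time); milliseconds are an assumption.
Sentence = Tuple[int, int]


def paragraph_times(adjusted: List[int], sentences: List[Sentence],
                    target: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) for each subtitle paragraph.

    A sentence whose time feature element equals `target` starts a new paragraph.
    """
    # 0-based positions of the turning sentences (the text counts from 1).
    turns = [k for k, value in enumerate(adjusted) if value == target]
    bounds = [0] + turns + [len(sentences)]
    # Each paragraph runs from the start of its first sentence to the end of its last.
    return [(sentences[a][0], sentences[b - 1][1])
            for a, b in zip(bounds, bounds[1:]) if a < b]


sentences = [(0, 900), (1000, 1900), (2000, 2900), (3000, 3900),
             (9000, 9900), (10000, 10900)]
adjusted = [0, 0, 0, 0, 1, 0]  # the fifth sentence is a paragraph turn
print(paragraph_times(adjusted, sentences))  # [(0, 3900), (9000, 10900)]
```

In this toy input the paragraph change times are 3900 (end time of the 4th sentence) and 9000 (start time of the 5th sentence), matching the example in the text.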
- the embodiment of the invention further discloses a terminal, which can be a PC (Personal Computer), a notebook computer, a mobile phone, a PAD (tablet computer), an in-vehicle terminal, a smart wearable device and the like.
- An audio processing device may be included in the terminal.
- For the structure and function of the device refer to the related description of the embodiment shown in FIG. 10 to FIG. 13 , and details are not described herein.
- FIG. 14 is a flowchart of an audio processing method according to an embodiment of the present invention; the method may include the following steps S1401 to S1405.
- S1401 Acquire audio data of a target audio file, where the audio data includes at least one audio frame.
- An audio file includes audio data, and audio data (for example, PCM data) can be obtained by decoding an audio file (for example, PCM decoding). This step can decode the target audio file to obtain audio data of the target audio file.
- The audio data may include at least one audio frame; that is, the audio data may be represented as a frame sequence composed of the at least one audio frame arranged in order.
- S1402. Construct a peak feature sequence according to the correlation of the at least one audio frame, the peak feature sequence including at least one peak feature element.
- the sequence of peak features can be used to reflect the similarity of the at least one audio frame.
- The correlation of the at least one audio frame may first be calculated by using a correlation calculation formula, which yields a correlation function sequence of the at least one audio frame. If r() is used to represent the correlation function, the correlation calculation obtains r(n), r(n+1), r(n+2)...r(N-2), r(N-1).
- A peak feature sequence can then be constructed by performing maximum value calculation, peak value calculation, and the like on the correlation function sequence of the at least one audio frame.
- the peak feature sequence is represented by v(n).
- the constructed peak feature sequence v(n) includes a total of N peak feature elements, respectively v(0), v(1)...v(N-1).
- v(0) can be used to describe the correlation between the audio frame x(0) and its subsequent audio frame; v(1) can be used to describe the correlation between x(1) and its subsequent audio frame; and so on.
- The peak feature sequence v(n) may be regularized by using a scan interval corresponding to a preset interval coefficient.
- the purpose of the regularization process is to make the peak feature sequence v(n) have only one maximum peak in the scan interval corresponding to the preset interval coefficient, so as to ensure the accuracy of the subsequent paragraph division.
- S1404 Determine a paragraph change time according to the value of the at least one peak feature element in the regularized peak feature sequence.
- The value of each peak feature element in the regularized peak feature sequence v(n) may be used to describe the correlation between audio frames, so this step may determine the time at which the audio paragraph changes according to the value of at least one peak feature element in the regularized peak feature sequence.
- S1405 Perform segmentation on the target audio file according to the paragraph change time.
- the target audio file can be segmented according to the time at which the obtained audio passage changes.
- The peak feature sequence may be constructed according to the correlation of the at least one audio frame included in the audio data of the target audio file; the peak feature sequence is regularized; the paragraph change time is determined according to the value of at least one peak feature element in the regularized peak feature sequence; and the target audio file is segmented according to the paragraph change time.
- The audio processing process uses the correlation between audio frames across audio paragraphs to segment the target audio file, which can improve segmentation efficiency and make audio processing more intelligent.
- FIG. 15 is a flowchart of another audio processing method according to an embodiment of the present invention. the method may include the following steps S1501 to S1510.
- S1501 Acquire a type of the target audio file, the type includes: a two-channel type or a mono type.
- a plurality of audio files and attributes of each audio file are stored in the Internet audio library, wherein the attributes of the audio file may include, but are not limited to, an audio feature of the audio file, an identifier of the audio file, a type of the audio file, and the like.
- The type of the target audio file may be obtained from the Internet audio library. The specific acquisition manner may include, but is not limited to: searching for the type of the target audio file in the Internet audio library according to the identifier of the target audio file; or extracting the audio features of the target audio file and matching them against the audio features of the audio files in the Internet audio library, thereby locating the target audio file in the Internet audio library and obtaining its type.
- If the type of the target audio file is the mono type, the content output by the target audio file from the single channel is decoded to obtain audio data; or, if the type of the target audio file is the two-channel type, one channel is selected from the two channels and the content output by the target audio file from the selected channel is decoded to obtain audio data, or the two channels are processed into a mixed channel and the content output by the target audio file from the mixed channel is decoded to obtain audio data.
- If the type of the target audio file is the mono type, the target audio file outputs audio content through one channel, and this step decodes the audio content output by that channel to obtain audio data.
- If the type of the target audio file is the two-channel type, the target audio file outputs audio content through two channels. In this step, the audio content output by one of the channels can be selected and decoded to obtain audio data. Alternatively, the two channels can first be processed into a mixed channel by a processing method such as Downmix, and the audio content output by the mixed channel is then decoded to obtain audio data.
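- The two options for the two-channel type can be illustrated on already-decoded samples. The sketch below assumes interleaved stereo PCM (left, right, left, right, ...) and uses a plain average as the downmix; the text only names Downmix as an example and does not specify the mixing rule, so both the function `to_mono` and the averaging are assumptions.

```python
from typing import List, Sequence


def to_mono(interleaved: Sequence[int], mode: str = "mix") -> List[int]:
    """Reduce interleaved stereo PCM (L, R, L, R, ...) to one channel.

    mode "left" / "right" selects one channel; "mix" averages both channels.
    """
    left = list(interleaved[0::2])
    right = list(interleaved[1::2])
    if mode == "left":
        return left
    if mode == "right":
        return right
    if mode == "mix":
        # Integer average; floor division rounds toward negative infinity.
        return [(l + r) // 2 for l, r in zip(left, right)]
    raise ValueError(f"unknown mode: {mode}")


stereo = [100, 300, -200, 200, 50, 51]
print(to_mono(stereo, "left"))  # [100, -200, 50]
print(to_mono(stereo, "mix"))   # [200, 0, 50]
```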
- Steps S1501 to S1502 of this embodiment may be specific refinement steps of step S1401 of the embodiment shown in FIG. 14.
- S1503 Perform correlation calculation on each audio frame in the at least one audio frame to obtain a correlation function sequence corresponding to the at least one audio frame.
- the correlation of the at least one audio frame may be calculated using a correlation calculation formula, which may be expressed as follows:
- The correlation function sequence of the at least one audio frame obtained by the above formula (1) is r(n), r(n+1), r(n+2)...r(N-2), r(N-1).
- S1504 Perform a maximum value calculation on the correlation function sequence corresponding to the at least one audio frame to generate a reference sequence.
- the reference sequence can be expressed as D(n).
- the reference sequence can be obtained by using a maximum value calculation formula, and the maximum value calculation formula can be expressed as follows:
- max() is a maximum value finding function.
- the reference sequence D(n) obtained by the above formula (2) includes a total of N elements, which are d(0), d(1)...d(N-1), respectively.
- S1505 Perform peak value calculation on the reference sequence to obtain the peak feature sequence.
- the peak feature sequence is represented by v(n).
- the constructed peak feature sequence v(n) includes a total of N peak feature elements, respectively v(0), v(1)...v(N-1).
- v(0) can be used to describe the correlation between the audio frame x(0) and its subsequent audio frame; v(1) can be used to describe the correlation between x(1) and its subsequent audio frame; and so on.
- Steps S1503 to S1505 of this embodiment may be specific refinement steps of step S1402 of the embodiment shown in FIG. 14.
- S1506 Acquire a scan interval corresponding to a preset interval coefficient.
- The preset interval coefficient may be set according to actual needs. If the preset interval coefficient is Q, the scan interval corresponding to the preset interval coefficient may be [i-Q/2, i+Q/2] (where i is an integer and 0 ≤ i ≤ N-1).
- The peak feature sequence is regularized by using the scan interval corresponding to the preset interval coefficient: the value of the peak feature element corresponding to the maximum peak in the scan interval is set to a target value, and the values of the other peak feature elements in the scan interval are set to a reference value.
- The target value and the reference value may be set according to actual needs.
- The target value may be set to 1 and the reference value to 0.
- Steps S1506 to S1507 regularize the peak feature sequence v(n) so that v(n) has only one maximum peak in the scan interval corresponding to the preset interval coefficient.
- Steps S1506 to S1507 of this embodiment may be specific refinement steps of step S1403 of the embodiment shown in FIG. 14.
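- One possible reading of the regularization step is sketched below: an element keeps the target value only if it is the largest peak inside the scan interval centered on it. The function name `regularize_peaks`, the assumption that v(n) is non-negative with zeros where there is no peak, and the tie-breaking rule (the earliest peak wins) are illustrative choices that the text does not specify.

```python
from typing import List


def regularize_peaks(v: List[float], q: int, target: int = 1, reference: int = 0) -> List[int]:
    """Keep v(i) only if it is the largest peak in the scan interval [i - Q/2, i + Q/2]."""
    half = q // 2
    out = [reference] * len(v)
    for i, value in enumerate(v):
        # Clip the scan interval to the valid index range 0..N-1.
        lo = max(0, i - half)
        hi = min(len(v) - 1, i + half)
        window = v[lo:hi + 1]
        # list.index returns the first maximum, so ties keep the earliest peak.
        if value > 0 and lo + window.index(max(window)) == i:
            out[i] = target
    return out


v = [0, 3, 0, 5, 0, 0, 0, 0, 2, 0]  # hypothetical peak feature sequence
print(regularize_peaks(v, q=4))  # [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
```

With Q = 4, the peak of height 3 at position 1 is suppressed because the higher peak at position 3 lies in its scan interval, while the isolated peak at position 8 survives.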
- S1509 Calculate a paragraph change time according to the target index and a sampling rate of the target audio file.
- The target index may be divided by the sampling rate of the target audio file to obtain the paragraph change time. For example, if the obtained target index is i and the sampling rate is f, the paragraph change time is i/f.
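- The conversion from target index to paragraph change time can be written out as a short sketch. It assumes that the target indexes are the 0-based positions of the elements equal to the target value in the regularized sequence and that dividing by the sampling rate yields seconds; the function name and sample values are hypothetical.

```python
from typing import List


def change_times(regularized: List[int], sample_rate: float, target: int = 1) -> List[float]:
    """Return the paragraph change times: target index i divided by sampling rate f."""
    return [i / sample_rate for i, value in enumerate(regularized) if value == target]


# Toy regularized sequence with target values at positions 3 and 8, and f = 2.
print(change_times([0, 0, 0, 1, 0, 0, 0, 0, 1, 0], sample_rate=2.0))  # [1.5, 4.0]
```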
- S1510 Perform segmentation on the target audio file according to the paragraph change time.
- the target audio file can be segmented according to the time at which the obtained audio passage changes.
- The peak feature sequence may be constructed according to the correlation of the at least one audio frame included in the audio data of the target audio file; the peak feature sequence is regularized; the paragraph change time is determined according to the value of at least one peak feature element in the regularized peak feature sequence; and the target audio file is segmented according to the paragraph change time.
- The audio processing process uses the correlation between audio frames across audio paragraphs to segment the target audio file, which can improve segmentation efficiency and make audio processing more intelligent.
- The structure and function of the audio processing device provided by the embodiment of the present invention are described in detail below with reference to FIG. 16 to FIG. 20. It is to be noted that the apparatus shown in FIGS. 16 to 20 can run in a terminal so as to be applied to the method shown in FIGS. 14 to 15.
- FIG. 16 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention.
- the apparatus may include: an obtaining unit 1601, a constructing unit 1602, a regularizing processing unit 1603, a determining unit 1604, and a segmenting unit 1605.
- the obtaining unit 1601 is configured to acquire audio data of a target audio file, where the audio data includes at least one audio frame.
- An audio file includes audio data, and audio data (for example, PCM data) can be obtained by decoding an audio file (for example, PCM decoding).
- the obtaining unit 1601 may decode the target audio file to obtain audio data of the target audio file.
- The audio data may include at least one audio frame; that is, the audio data may be represented as a frame sequence composed of the at least one audio frame arranged in order.
- the constructing unit 1602 is configured to construct a peak feature sequence according to the correlation of the at least one audio frame, the peak feature sequence including at least one peak feature element.
- the sequence of peak features can be used to reflect the similarity of the at least one audio frame.
- The constructing unit 1602 may calculate the correlation of the at least one audio frame by using a correlation calculation formula, which yields a correlation function sequence of the at least one audio frame. If r() is used to represent the correlation function, the correlation calculation obtains r(n), r(n+1), r(n+2)...r(N-2), r(N-1).
- The constructing unit 1602 may then construct a peak feature sequence by performing maximum value calculation, peak value calculation, and the like on the correlation function sequence of the at least one audio frame.
- the peak feature sequence is represented by v(n).
- the constructed peak feature sequence v(n) includes a total of N peak feature elements, respectively v(0), v(1)...v(N-1).
- v(0) can be used to describe the correlation between the audio frame x(0) and its subsequent audio frame; v(1) can be used to describe the correlation between x(1) and its subsequent audio frame; and so on.
- the regularization processing unit 1603 is configured to perform regular processing on the peak feature sequence.
- the regularization processing unit 1603 may adopt a scan interval corresponding to a preset interval coefficient to The peak feature sequence v(n) is structured.
- the purpose of the regularization process is to make the peak feature sequence v(n) have only one maximum peak in the scan interval corresponding to the preset interval coefficient, so as to ensure the accuracy of the subsequent paragraph division.
- the determining unit 1604 is configured to determine a paragraph change time according to the value of the at least one peak feature element in the regularized peak feature sequence.
- Because the values of the peak feature elements in the regularized peak feature sequence v(n) describe the correlation between audio frames, the determining unit 1604 may determine the time at which the audio paragraph changes according to the value of at least one peak feature element in the regularized peak feature sequence.
- The segmenting unit 1605 is configured to segment the target audio file according to the paragraph change time.
- The segmenting unit 1605 may divide the target audio file into paragraphs at the obtained times at which the audio paragraph changes.
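The division step described above can be illustrated with a minimal sketch. It assumes that the paragraph change times are already known in seconds and that the total duration of the file is available; the function name `split_by_change_times` and the sample values are hypothetical and do not come from the embodiment.

```python
def split_by_change_times(duration, change_times):
    """Return (start, end) pairs that cover [0, duration], cut at each change time."""
    # Ignore duplicates and any change time outside the playable range.
    cuts = sorted(t for t in set(change_times) if 0 < t < duration)
    bounds = [0.0] + cuts + [float(duration)]
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical 180-second file with three paragraph change times.
segments = split_by_change_times(180.0, [42.5, 97.0, 140.25])
print(segments)
```

With three change times the sketch yields four paragraphs; how the paragraphs are then stored or played back is left open here.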
- In this embodiment of the present invention, a peak feature sequence may be constructed according to the correlation of the at least one audio frame included in the audio data of the target audio file; the peak feature sequence is regularized; a paragraph change time is determined according to the value of at least one peak feature element in the regularized peak feature sequence; and the target audio file is segmented according to the paragraph change time. This audio processing uses the correlation between audio frames across audio paragraphs to divide the target audio file into paragraphs, which can improve segmentation efficiency and make audio processing more intelligent.
- The obtaining unit 1601 may include: a type obtaining unit 1701 and a decoding unit 1702.
- the type obtaining unit 1701 is configured to acquire a type of the target audio file, and the type includes: a two-channel type or a mono type.
- a plurality of audio files and attributes of each audio file are stored in the Internet audio library, wherein the attributes of the audio file may include, but are not limited to, an audio feature of the audio file, an identifier of the audio file, a type of the audio file, and the like.
- The type obtaining unit 1701 may obtain the type of the target audio file from the Internet audio library. Specific obtaining manners may include, but are not limited to: searching the Internet audio library for the type of the target audio file according to the identifier of the target audio file; or extracting the audio features of the target audio file and matching them against the audio features of the audio files in the Internet audio library, thereby locating the target audio file in the Internet audio library and obtaining its type.
- The decoding unit 1702 is configured to: if the type of the target audio file is a mono type, decode the content that the target audio file outputs from the mono channel to obtain audio data; or, if the type of the target audio file is a two-channel type, select one channel from the two channels and decode the content that the target audio file outputs from the selected channel to obtain audio data, or process the two channels into a mixed channel and decode the content that the target audio file outputs from the mixed channel to obtain audio data.
- If the type of the target audio file is a mono type, the target audio file outputs audio content through one channel, and the decoding unit 1702 needs to decode the audio content output by that channel to obtain audio data.
- If the type of the target audio file is a two-channel type, the target audio file outputs audio content through two channels; the decoding unit 1702 may select the audio content output by one channel and decode it to obtain audio data, or the decoding unit 1702 may first process the two channels into a mixed channel by using a processing method such as downmixing and then decode the audio content output by the mixed channel to obtain audio data.
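The two options for a two-channel file can be sketched as follows. This is only an illustration under stated assumptions: it works on already decoded sample pairs rather than on the encoded file, and it uses a simple average as the downmix, which is one common choice and not necessarily the one intended by the embodiment. The names `select_channel` and `downmix` are hypothetical.

```python
def select_channel(frames, channel=0):
    """Keep the samples of one channel from a list of (left, right) pairs."""
    return [frame[channel] for frame in frames]


def downmix(frames):
    """Mix two channels into one by averaging each (left, right) pair."""
    return [(left + right) / 2 for left, right in frames]


# Hypothetical two-channel PCM samples as (left, right) pairs.
stereo = [(100, 300), (-200, 200), (0, 50)]
left_only = select_channel(stereo, 0)
mixed = downmix(stereo)
print(left_only, mixed)
```

Either result is a single sample sequence, which is the form assumed by the later correlation and peak calculations.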
- FIG. 18 is a schematic structural diagram of an embodiment of the constructing unit shown in FIG. 16; the constructing unit 1602 may include a correlation calculation unit 1801, a generating unit 1802, and a sequence obtaining unit 1803.
- The correlation calculation unit 1801 is configured to perform correlation calculation on each audio frame in the at least one audio frame to obtain a correlation function sequence corresponding to the at least one audio frame.
- The correlation calculation unit 1801 may calculate the correlation of the at least one audio frame by using a correlation calculation formula, and the correlation calculation formula may be expressed as formula (1) in the embodiment shown in FIG. 2.
- The correlation function sequence of the at least one audio frame obtained by formula (1) is r(n), r(n+1), r(n+2)...r(N-2), r(N-1).
- The generating unit 1802 is configured to perform maximum value calculation on the correlation function sequence corresponding to the at least one audio frame to generate a reference sequence.
- The reference sequence may be represented as D(n); the generating unit 1802 may obtain the reference sequence by using a maximum value calculation formula, and the maximum value calculation formula may be expressed as formula (2) in the embodiment shown in FIG. 2.
- The reference sequence D(n) obtained by formula (2) includes a total of N elements: d(0), d(1)...d(N-1).
- The sequence obtaining unit 1803 is configured to perform peak finding calculation on the reference sequence to obtain the peak feature sequence.
- the peak feature sequence is represented by v(n).
- the constructed peak feature sequence v(n) includes a total of N peak feature elements, respectively v(0), v(1)...v(N-1).
- v(0) can be used to describe the correlation between the audio frame x(0) and its subsequent audio frame; v(1) can be used to describe the correlation between x(1) and its subsequent audio frame; and so on.
- Through the peak finding calculation, the values of the respective peak feature elements of the peak feature sequence v(n) can be obtained.
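The peak finding calculation itself is not spelled out in this part of the description, so the following is only a sketch under an assumption: an element of the reference sequence is treated as a peak when it is larger than its left neighbour and not smaller than its right neighbour, and all other positions are set to 0. The function name `peak_features` and the sample reference values are hypothetical.

```python
def peak_features(d):
    """Return v(n) of the same length as d(n): d(n) at local maxima, 0 elsewhere.

    Assumption: a local maximum is an interior element that is greater than its
    left neighbour and not smaller than its right neighbour.
    """
    n = len(d)
    v = [0.0] * n
    for i in range(1, n - 1):
        if d[i] > d[i - 1] and d[i] >= d[i + 1]:
            v[i] = float(d[i])
    return v


# Hypothetical reference sequence D(n) with N = 9 elements.
reference = [0.1, 0.4, 0.2, 0.2, 0.9, 0.3, 0.5, 0.5, 0.1]
v = peak_features(reference)
print(v)
```

The output keeps N elements, matching the statement that v(n) consists of v(0), v(1)...v(N-1).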
- FIG. 19 is a schematic structural diagram of an embodiment of the regularization processing unit shown in FIG. 16; the regularization processing unit 1603 may include an interval obtaining unit 1901 and a regularizing unit 1902.
- The interval obtaining unit 1901 is configured to acquire a scan interval corresponding to the preset interval coefficient.
- The preset interval coefficient may be set according to actual needs. If the preset interval coefficient is Q, the scan interval corresponding to the preset interval coefficient may be [i-Q/2, i+Q/2] (where i is an integer and 0 ≤ i ≤ N-1).
- The regularizing unit 1902 is configured to regularize the peak feature sequence by using the scan interval corresponding to the preset interval coefficient: the value of the peak feature element corresponding to the maximum peak within the scan interval is set to a target value, and the values of the other peak feature elements within the scan interval are set to an initial value.
- The target value and the initial value may be set according to actual needs. In this embodiment of the present invention, the target value may be set to 1 and the initial value to 0.
- The purpose of regularizing the peak feature sequence v(n) is to ensure that v(n) has only one maximum peak within the scan interval corresponding to the preset interval coefficient, so as to ensure the accuracy of the subsequent paragraph division.
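One way to reach the stated goal (at most one peak marked with the target value inside any scan interval of width Q) is sketched below. The greedy strategy used here, which accepts peaks from strongest to weakest and rejects any peak within Q positions of an accepted one, is an assumption made for illustration; the description only states the outcome, not the scanning procedure. The function name `regularize` and the sample values are hypothetical.

```python
def regularize(v, q, target=1, initial=0):
    """Mark the dominant peaks of v with `target` and everything else with `initial`.

    Accepted peaks are always more than q positions apart, so any interval
    [i - q/2, i + q/2] contains at most one element equal to `target`.
    """
    out = [initial] * len(v)
    kept = []
    # Visit candidate peaks from the largest value down; ties go to the lower index.
    order = sorted((i for i, x in enumerate(v) if x > 0), key=lambda i: (-v[i], i))
    for i in order:
        if all(abs(i - j) > q for j in kept):
            kept.append(i)
            out[i] = target
    return out


# Hypothetical peak feature sequence v(n) and interval coefficient Q = 2.
peaks = [0.0, 0.4, 0.0, 0.0, 0.9, 0.0, 0.5, 0.0, 0.0]
regularized = regularize(peaks, q=2)
print(regularized)
```

In this example the peak at index 6 is dropped because it lies within Q positions of the stronger peak at index 4, while the peak at index 1 survives.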
- The determining unit 1604 may include a target index obtaining unit 2001 and a time calculation unit 2002.
- The target index obtaining unit 2001 is configured to obtain, from the regularized peak feature sequence, a target index corresponding to a peak feature element whose value is the target value.
- For example, the obtained target index may be i.
- The time calculation unit 2002 is configured to calculate a paragraph change time according to the target index and a sampling rate of the target audio file.
- The time calculation unit 2002 may obtain the paragraph change time by dividing the target index by the sampling rate of the target audio file: if the obtained target index is i and the sampling rate is f, the paragraph change time is i/f.
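The index-to-time conversion can be sketched as below. The sketch assumes that the index counts positions at the sampling rate f, so that i/f is directly a time in seconds; the function name `paragraph_change_times` and the sample values are hypothetical.

```python
def paragraph_change_times(regularized, sampling_rate, target=1):
    """Return i / f for every index i whose value equals the target value."""
    return [i / sampling_rate
            for i, value in enumerate(regularized)
            if value == target]


# Hypothetical regularized sequence and a deliberately small sampling rate f = 2.
times = paragraph_change_times([0, 1, 0, 0, 1, 0, 0, 0, 0], sampling_rate=2)
print(times)
```

Each returned value corresponds to one target index, so two target indices give two paragraph change times.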
- the embodiment of the invention further discloses a terminal, which can be a PC (Personal Computer), a notebook computer, a mobile phone, a PAD (tablet computer), an in-vehicle terminal, a smart wearable device and the like.
- An audio processing device may be included in the terminal.
- For the structure and function of the device refer to the related description of the embodiment shown in FIG. 16 to FIG. 20, and details are not described herein.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Claims (31)
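- An audio processing method, comprising: acquiring file data of a target audio file; constructing a correlation feature sequence according to correlation feature data between constituent elements of the file data; optimizing the correlation feature sequence according to a preset total number of paragraphs; determining a paragraph change time according to the value of at least one feature element in the optimized correlation feature sequence; and dividing the target audio file into paragraphs of the preset total number according to the paragraph change time.
- The method according to claim 1, wherein constructing the correlation feature sequence according to the correlation feature data between the constituent elements of the file data comprises: the file data being a subtitle file, the subtitle file consisting of at least one character single sentence in sequence; and constructing a subtitle feature sequence according to the similarity between the at least one character single sentence, the subtitle feature sequence comprising at least one character feature element.
- The method according to claim 2, wherein constructing the subtitle feature sequence according to the similarity between the at least one character single sentence comprises: determining the number of character feature elements for constructing the subtitle feature sequence according to the number of the at least one character single sentence; determining the index of each character feature element for constructing the subtitle feature sequence according to the order of the character single sentences in the at least one character single sentence; setting the value of each character feature element for constructing the subtitle feature sequence to an initial value; for any target character single sentence among the at least one character single sentence, if the maximum similarity between the target character single sentence and the character single sentences subsequent to it is greater than a preset similarity threshold, changing the value of the character feature element corresponding to the target character single sentence from the initial value to a target value; and constructing the subtitle feature sequence according to the number, indexes, and values of the character feature elements for constructing the subtitle feature sequence.
- The method according to claim 3, wherein optimizing the correlation feature sequence according to the preset total number of paragraphs comprises: counting the number of character feature elements whose value is the target value in the subtitle feature sequence; determining whether the number falls within a fault-tolerance interval corresponding to the preset total number of paragraphs; and if not, adjusting the preset similarity threshold so as to adjust the values of the character feature elements in the subtitle feature sequence.
- The method according to claim 4, wherein, if not, adjusting the preset similarity threshold so as to adjust the values of the character feature elements in the subtitle feature sequence comprises: if the number is greater than the maximum fault-tolerance value in the fault-tolerance interval corresponding to the preset total number of paragraphs, increasing the preset similarity threshold by a preset step size so as to adjust the values of the character feature elements in the subtitle feature sequence; and if the number is less than the maximum fault-tolerance value in the fault-tolerance interval corresponding to the preset total number of paragraphs, decreasing the preset similarity threshold by the preset step size so as to adjust the values of the character feature elements in the subtitle feature sequence.
- The method according to claim 5, wherein determining the paragraph change time according to the value of at least one feature element in the optimized correlation feature sequence comprises: obtaining, from the optimized subtitle feature sequence, a target index corresponding to a character feature element whose value is the target value; locating, in the subtitle file, the character single sentence at which the paragraph turns according to the target index; and reading the paragraph change time from the subtitle file according to the character single sentence at which the paragraph turns.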
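- The method according to claim 1, wherein constructing the correlation feature sequence according to the correlation feature data between the constituent elements of the file data comprises: the file data being a subtitle file, the subtitle file consisting of at least one character single sentence in sequence; and constructing a time feature sequence according to the time intervals between the at least one character single sentence, the time feature sequence comprising at least one time feature element.
- The method according to claim 7, wherein constructing the time feature sequence according to the time intervals between the at least one character single sentence comprises: determining the number of time feature elements for constructing the time feature sequence according to the number of the at least one character single sentence; determining the index of each time feature element for constructing the time feature sequence according to the order of the character single sentences in the at least one character single sentence; for any target character single sentence among the at least one character single sentence, setting the time interval between the target character single sentence and its adjacent preceding character single sentence as the value of the time feature element corresponding to the target character single sentence; and constructing the time feature sequence according to the number, indexes, and values of the time feature elements for constructing the time feature sequence.
- The method according to claim 8, wherein optimizing the correlation feature sequence according to the preset total number of paragraphs comprises: searching the time feature sequence for the time feature elements with the largest values, the number of elements searched for being the preset number of paragraphs minus 1; and adjusting the values of the found time feature elements to a target value, and adjusting the values of the time feature elements in the time feature sequence other than the found time feature elements to a reference value.
- The method according to claim 9, wherein determining the paragraph change time according to the value of at least one feature element in the optimized correlation feature sequence comprises: obtaining, from the adjusted time feature sequence, a target index corresponding to a time feature element whose value is the target value; locating, in the subtitle file, the character single sentence at which the paragraph turns according to the target index; and reading the paragraph change time from the subtitle file according to the character single sentence at which the paragraph turns.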
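- The method according to claim 1, wherein constructing the correlation feature sequence according to the correlation feature data between the constituent elements of the file data comprises: the file data being audio data, the audio data comprising at least one audio frame; and constructing a peak feature sequence according to the correlation of the at least one audio frame, the peak feature sequence comprising at least one peak feature element.
- The method according to claim 11, wherein constructing the peak feature sequence according to the correlation of the at least one audio frame comprises: performing correlation calculation on each audio frame in the at least one audio frame to obtain a correlation function sequence corresponding to the at least one audio frame; performing maximum value calculation on the correlation function sequence corresponding to the at least one audio frame to generate a reference sequence; and performing peak finding calculation on the reference sequence to obtain the peak feature sequence.
- The method according to claim 12, wherein optimizing the correlation feature sequence according to the preset total number of paragraphs comprises: acquiring a scan interval corresponding to a preset interval coefficient; and regularizing the peak feature sequence by using the scan interval corresponding to the preset interval coefficient, setting the value of the peak feature element corresponding to the maximum peak within the scan interval corresponding to the preset interval coefficient to a target value, and setting the values of the peak feature elements within the scan interval other than the peak feature element corresponding to the maximum peak to an initial value.
- The method according to claim 13, wherein determining the paragraph change time according to the value of at least one feature element in the optimized correlation feature sequence comprises: obtaining, from the regularized peak feature sequence, a target index corresponding to a peak feature element whose value is the target value; and calculating the paragraph change time according to the target index and the sampling rate of the target audio file.
- The method according to claim 11, wherein acquiring the file data of the target audio file comprises: acquiring the type of the target audio file, the type comprising a two-channel type or a mono type; if the type of the target audio file is the mono type, decoding the content that the target audio file outputs from the mono channel to obtain audio data; and if the type of the target audio file is the two-channel type, selecting one channel from the two channels and decoding the content that the target audio file outputs from the selected channel to obtain audio data, or processing the two channels into a mixed channel and decoding the content that the target audio file outputs from the mixed channel to obtain audio data.
- The method according to any one of claims 1 to 10, wherein the subtitle file comprises at least one character single sentence and key information of each character single sentence; and the key information of a character single sentence comprises: an identifier, a start time, and an end time.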
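- An audio processing apparatus, comprising: an obtaining unit, configured to acquire file data of a target audio file; a constructing unit, configured to construct a correlation feature sequence according to correlation feature data between constituent elements of the file data; an optimizing unit, configured to optimize the correlation feature sequence according to a preset total number of paragraphs; a determining unit, configured to determine a paragraph change time according to the value of at least one feature element in the optimized correlation feature sequence; and a segmenting unit, configured to divide the target audio file into paragraphs of the preset total number according to the paragraph change time.
- The apparatus according to claim 17, wherein the constructing unit is configured to, where the file data is a subtitle file consisting of at least one character single sentence in sequence, construct a subtitle feature sequence according to the similarity between the at least one character single sentence, the subtitle feature sequence comprising at least one character feature element; or the constructing unit is configured to, where the file data is a subtitle file consisting of at least one character single sentence in sequence, construct a time feature sequence according to the time intervals between the at least one character single sentence, the time feature sequence comprising at least one time feature element; or the constructing unit is configured to, where the file data is audio data comprising at least one audio frame, construct a peak feature sequence according to the correlation of the at least one audio frame, the peak feature sequence comprising at least one peak feature element.
- The apparatus according to claim 18, wherein the constructing unit comprises: a number determining unit, configured to determine the number of character feature elements for constructing the subtitle feature sequence according to the number of the at least one character single sentence; an index determining unit, configured to determine the index of each character feature element for constructing the subtitle feature sequence according to the order of the character single sentences in the at least one character single sentence; a value setting unit, configured to set the value of each character feature element for constructing the subtitle feature sequence to an initial value; a value changing unit, configured to, for any target character single sentence among the at least one character single sentence, change the value of the character feature element corresponding to the target character single sentence from the initial value to a target value if the maximum similarity between the target character single sentence and the character single sentences subsequent to it is greater than a preset similarity threshold; and a sequence constructing unit, configured to construct the subtitle feature sequence according to the number, indexes, and values of the character feature elements for constructing the subtitle feature sequence.
- The apparatus according to claim 18, wherein the constructing unit comprises: a number determining unit, configured to determine the number of time feature elements for constructing the time feature sequence according to the number of the at least one character single sentence; an index determining unit, configured to determine the index of each time feature element for constructing the time feature sequence according to the order of the character single sentences in the at least one character single sentence; a value setting unit, configured to, for any target character single sentence among the at least one character single sentence, set the time interval between the target character single sentence and its adjacent preceding character single sentence as the value of the time feature element corresponding to the target character single sentence; and a sequence constructing unit, configured to construct the time feature sequence according to the number, indexes, and values of the time feature elements for constructing the time feature sequence.
- The apparatus according to claim 18, wherein the constructing unit comprises: a correlation calculation unit, configured to perform correlation calculation on each audio frame in the at least one audio frame to obtain a correlation function sequence corresponding to the at least one audio frame; a generating unit, configured to perform maximum value calculation on the correlation function sequence corresponding to the at least one audio frame to generate a reference sequence; and a sequence obtaining unit, configured to perform peak finding calculation on the reference sequence to obtain the peak feature sequence.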
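- The apparatus according to claim 19, wherein the optimizing unit comprises: a counting unit, configured to count the number of character feature elements whose value is the target value in the subtitle feature sequence; a judging unit, configured to determine whether the number falls within a fault-tolerance interval corresponding to the preset total number of paragraphs; and an optimization processing unit, configured to, if the determination result is negative, adjust the preset similarity threshold so as to adjust the values of the character feature elements in the subtitle feature sequence.
- The apparatus according to claim 22, wherein the optimization processing unit comprises: a first adjusting unit, configured to, if the number is greater than the maximum fault-tolerance value in the fault-tolerance interval corresponding to the preset total number of paragraphs, increase the preset similarity threshold by a preset step size so as to adjust the values of the character feature elements in the subtitle feature sequence; and a second adjusting unit, configured to, if the number is less than the maximum fault-tolerance value in the fault-tolerance interval corresponding to the preset total number of paragraphs, decrease the preset similarity threshold by the preset step size so as to adjust the values of the character feature elements in the subtitle feature sequence.
- The apparatus according to claim 19, wherein the determining unit comprises: a target index obtaining unit, configured to obtain, from the optimized subtitle feature sequence, a target index corresponding to a character feature element whose value is the target value; a locating unit, configured to locate, in the subtitle file, the character single sentence at which the paragraph turns according to the target index; and a time reading unit, configured to read the paragraph change time from the subtitle file according to the character single sentence at which the paragraph turns.
- The apparatus according to claim 19, wherein the optimizing unit comprises: an element searching unit, configured to search the time feature sequence for the time feature elements with the largest values, the number of elements searched for being the preset number of paragraphs minus 1; and a value adjusting unit, configured to adjust the values of the found time feature elements to a target value, and adjust the values of the time feature elements in the time feature sequence other than the found time feature elements to a reference value.
- The apparatus according to claim 19, wherein the determining unit comprises: a target index obtaining unit, configured to obtain, from the adjusted time feature sequence, a target index corresponding to a time feature element whose value is the target value; a locating unit, configured to locate, in the subtitle file, the character single sentence at which the paragraph turns according to the target index; and a time reading unit, configured to read the paragraph change time from the subtitle file according to the character single sentence at which the paragraph turns.
- The apparatus according to any one of claims 19 to 26, wherein the subtitle file comprises at least one character single sentence and key information of each character single sentence; and the key information of a character single sentence comprises: an identifier, a start time, and an end time.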
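- The apparatus according to claim 19, wherein the optimizing unit comprises: an interval obtaining unit, configured to acquire a scan interval corresponding to a preset interval coefficient; and a regularizing unit, configured to regularize the peak feature sequence by using the scan interval corresponding to the preset interval coefficient, set the value of the peak feature element corresponding to the maximum peak within the scan interval corresponding to the preset interval coefficient to a target value, and set the values of the peak feature elements within the scan interval other than the peak feature element corresponding to the maximum peak to an initial value.
- The apparatus according to claim 19, wherein the determining unit comprises: a target index obtaining unit, configured to obtain, from the regularized peak feature sequence, a target index corresponding to a peak feature element whose value is the target value; and a time calculation unit, configured to calculate the paragraph change time according to the target index and the sampling rate of the target audio file.
- The apparatus according to claim 19, wherein the obtaining unit comprises: a type obtaining unit, configured to acquire the type of the target audio file, the type comprising a two-channel type or a mono type; and a decoding unit, configured to, if the type of the target audio file is the mono type, decode the content that the target audio file outputs from the mono channel to obtain audio data; or configured to, if the type of the target audio file is the two-channel type, select one channel from the two channels and decode the content that the target audio file outputs from the selected channel to obtain audio data, or process the two channels into a mixed channel and decode the content that the target audio file outputs from the mixed channel to obtain audio data.
- A terminal, comprising the audio processing apparatus according to any one of claims 17 to 30.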
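The subtitle-based variant built on time intervals between sentences can be sketched as follows. Several points are assumptions made only for illustration: each sentence is reduced to a (start time, end time) pair in seconds, the interval for a sentence is taken as its start time minus the end time of the preceding sentence, the first sentence gets an interval of 0, and the paragraph change time is read as the start time of the sentence at the turn. The function name `change_times_from_gaps` and the sample data are hypothetical.

```python
def change_times_from_gaps(sentences, total_paragraphs, target=1, reference=0):
    """Pick the (total_paragraphs - 1) largest gaps and return the start times there."""
    # Time feature sequence: gap between each sentence and the one before it.
    gaps = [0.0] + [sentences[i][0] - sentences[i - 1][1]
                    for i in range(1, len(sentences))]
    # Indices of the largest gaps; ties go to the earlier sentence.
    wanted = total_paragraphs - 1
    top = sorted(range(1, len(gaps)), key=lambda i: (-gaps[i], i))[:wanted]
    # Optimized sequence: target value at the chosen indices, reference value elsewhere.
    features = [target if i in top else reference for i in range(len(gaps))]
    return [sentences[i][0] for i, value in enumerate(features) if value == target]


# Hypothetical subtitle sentences as (start, end) times in seconds.
sentences = [(0.0, 4.0), (4.5, 9.0), (15.0, 20.0),
             (20.4, 25.0), (33.0, 38.0), (38.2, 42.0)]
turns = change_times_from_gaps(sentences, total_paragraphs=3)
print(turns)
```

Requesting three paragraphs selects the two largest gaps, so the sketch returns two change times.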
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2018513709A JP6586514B2 (ja) | 2015-05-25 | 2016-05-13 | オーディオ処理の方法、装置及び端末 |
| EP16799218.9A EP3340238B1 (en) | 2015-05-25 | 2016-05-13 | Method and device for audio processing |
| US15/576,198 US20180158469A1 (en) | 2015-05-25 | 2016-05-13 | Audio processing method and apparatus, and terminal |
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510271014.1A CN105047202B (zh) | 2015-05-25 | 2015-05-25 | 一种音频处理方法、装置及终端 |
| CN201510271769.1A CN105047203B (zh) | 2015-05-25 | 2015-05-25 | 一种音频处理方法、装置及终端 |
| CN201510270567.5 | 2015-05-25 | ||
| CN201510271769.1 | 2015-05-25 | ||
| CN201510270567.5A CN104978961B (zh) | 2015-05-25 | 2015-05-25 | 一种音频处理方法、装置及终端 |
| CN201510271014.1 | 2015-05-25 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016188329A1 (zh) | 2016-12-01 |
Family
ID=57393734
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/081999 Ceased WO2016188329A1 (zh) | 2015-05-25 | 2016-05-13 | 一种音频处理方法、装置及终端 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20180158469A1 (zh) |
| EP (1) | EP3340238B1 (zh) |
| JP (1) | JP6586514B2 (zh) |
| WO (1) | WO2016188329A1 (zh) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10567461B2 (en) | 2016-08-04 | 2020-02-18 | Twitter, Inc. | Low-latency HTTP live streaming |
| US10674222B2 (en) * | 2018-01-19 | 2020-06-02 | Netflix, Inc. | Techniques for generating subtitles for trailers |
| CN109213974B (zh) * | 2018-08-22 | 2022-12-20 | 北京慕华信息科技有限公司 | 一种电子文档转换方法及装置 |
| CN111863043B (zh) * | 2020-07-29 | 2022-09-23 | 安徽听见科技有限公司 | 音频转写文件生成方法、相关设备及可读存储介质 |
| CN112259083B (zh) * | 2020-10-16 | 2024-02-13 | 北京猿力未来科技有限公司 | 音频处理方法及装置 |
| CN113591921B (zh) * | 2021-06-30 | 2024-07-19 | 北京旷视科技有限公司 | 图像识别方法及装置、电子设备、存储介质 |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6243676B1 (en) * | 1998-12-23 | 2001-06-05 | Openwave Systems Inc. | Searching and retrieving multimedia information |
| JP2001175294A (ja) * | 1999-12-21 | 2001-06-29 | Casio Comput Co Ltd | 音声分析装置及び音声分析方法 |
| CN1595397A (zh) * | 2004-07-14 | 2005-03-16 | 华南理工大学 | 可听文本的自动制作和播放的方法 |
| CN1685345A (zh) * | 2002-11-01 | 2005-10-19 | 三菱电机株式会社 | 用于挖掘视频内容的方法 |
| CN1983276A (zh) * | 2005-11-15 | 2007-06-20 | 国际商业机器公司 | 定位和检索以压缩数字格式存储的数据内容的方法和装置 |
| JP2007206183A (ja) * | 2006-01-31 | 2007-08-16 | Yamaha Corp | カラオケ装置 |
| CN104978961A (zh) * | 2015-05-25 | 2015-10-14 | 腾讯科技(深圳)有限公司 | 一种音频处理方法、装置及终端 |
| CN105047202A (zh) * | 2015-05-25 | 2015-11-11 | 腾讯科技(深圳)有限公司 | 一种音频处理方法、装置及终端 |
| CN105047203A (zh) * | 2015-05-25 | 2015-11-11 | 腾讯科技(深圳)有限公司 | 一种音频处理方法、装置及终端 |
Family Cites Families (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB9917985D0 (en) * | 1999-07-30 | 1999-09-29 | Scient Generics Ltd | Acoustic communication system |
| JP4203308B2 (ja) * | 2002-12-04 | 2008-12-24 | パイオニア株式会社 | 楽曲構造検出装置及び方法 |
| US20060173692A1 (en) * | 2005-02-03 | 2006-08-03 | Rao Vishweshwara M | Audio compression using repetitive structures |
| EP1785891A1 (en) * | 2005-11-09 | 2007-05-16 | Sony Deutschland GmbH | Music information retrieval using a 3D search algorithm |
| US20080300702A1 (en) * | 2007-05-29 | 2008-12-04 | Universitat Pompeu Fabra | Music similarity systems and methods using descriptors |
| JP5130809B2 (ja) * | 2007-07-13 | 2013-01-30 | ヤマハ株式会社 | 楽曲を制作するための装置およびプログラム |
| US8428949B2 (en) * | 2008-06-30 | 2013-04-23 | Waves Audio Ltd. | Apparatus and method for classification and segmentation of audio content, based on the audio signal |
| CN102467939B (zh) * | 2010-11-04 | 2014-08-13 | 北京彩云在线技术开发有限公司 | 一种歌曲音频切割装置及方法 |
| CN102956238B (zh) * | 2011-08-19 | 2016-02-10 | 杜比实验室特许公司 | 用于在音频帧序列中检测重复模式的方法及设备 |
| GB2523973B (en) * | 2012-12-19 | 2017-08-02 | Magas Michela | Audio analysis system and method using audio segment characterisation |
| US9183849B2 (en) * | 2012-12-21 | 2015-11-10 | The Nielsen Company (Us), Llc | Audio matching with semantic audio recognition and report generation |
| CN104347067B (zh) * | 2013-08-06 | 2017-04-12 | 华为技术有限公司 | 一种音频信号分类方法和装置 |
| CN104347068B (zh) * | 2013-08-08 | 2020-05-22 | 索尼公司 | 音频信号处理装置和方法以及监控系统 |
| EP2963651A1 (en) * | 2014-07-03 | 2016-01-06 | Samsung Electronics Co., Ltd | Method and device for playing multimedia |
2016
- 2016-05-13 WO PCT/CN2016/081999 patent/WO2016188329A1/zh not_active Ceased
- 2016-05-13 JP JP2018513709A patent/JP6586514B2/ja active Active
- 2016-05-13 EP EP16799218.9A patent/EP3340238B1/en active Active
- 2016-05-13 US US15/576,198 patent/US20180158469A1/en not_active Abandoned
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3340238A4 * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104978961A (zh) * | 2015-05-25 | 2015-10-14 | 腾讯科技(深圳)有限公司 | 一种音频处理方法、装置及终端 |
| CN104978961B (zh) * | 2015-05-25 | 2019-10-15 | 广州酷狗计算机科技有限公司 | 一种音频处理方法、装置及终端 |
| CN119132305A (zh) * | 2024-08-26 | 2024-12-13 | 平安科技(深圳)有限公司 | 一种翻译方法、装置、设备及其存储介质 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2018522288A (ja) | 2018-08-09 |
| JP6586514B2 (ja) | 2019-10-02 |
| EP3340238A4 (en) | 2019-06-05 |
| US20180158469A1 (en) | 2018-06-07 |
| EP3340238A1 (en) | 2018-06-27 |
| EP3340238B1 (en) | 2020-07-22 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16799218; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 15576198; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 2018513709; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2016799218; Country of ref document: EP |
