WO2023045730A1 - Audio and video processing method, apparatus, device and storage medium - Google Patents

Publication number
WO2023045730A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
video
edited
text data
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/116650
Other languages
English (en)
French (fr)
Inventor
郑炜明
郦橙
付雪伦
黄益修
夏瑞
郑鑫
鲍琳
王维斯
丁辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to KR1020237044829A (patent KR102919002B1, ko)
Priority to JP2023578889A (patent JP7764507B2, ja)
Priority to EP22871780.7A (patent EP4344225A4, en)
Publication of WO2023045730A1 (patent WO2023045730A1, zh)
Priority to US18/395,118 (patent US20240127860A1, en)
Anticipated expiration
Legal status: Ceased

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Definitions

  • the present disclosure relates to the field of data processing, and in particular to an audio and video processing method, device, equipment and storage medium.
  • an embodiment of the present disclosure provides an audio and video processing method, which can improve the accuracy of audio and video editing and simplify user operations.
  • the present disclosure provides an audio and video processing method, the method comprising:
  • the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited is processed.
  • the method also includes:
  • the preset keywords or the preset mute segments in the text data are displayed according to a preset second display style.
  • the first editing entry corresponds to a first editing card, and a one-key delete control is set on the first editing card; after the preset keyword or the preset mute segment in the text data is displayed according to the preset second display style in response to a trigger operation on the first editing entry, the method further includes:
  • the preset keyword or the preset mute segment is deleted from the text data.
  • the method also includes:
  • the human voice in the audio and video to be edited is enhanced.
  • the method also includes:
  • the method also includes:
  • normalization processing is performed on the loudness of the volume in the audio and video to be edited.
  • the method also includes:
  • in response to a trigger operation on the smart clip control, the music volume and the human voice volume in the audio and video segment within the previous preset time period of the audio and video to be edited are adjusted to obtain a volume-adjusted audio and video segment; wherein the music volume in the volume-adjusted audio and video segment is inversely proportional to the human voice volume.
  • the preset operation includes a selection operation, and processing, based on the preset operation, the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited includes:
  • the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited is displayed.
  • the preset operation includes a delete operation, and processing, based on the preset operation, the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited includes:
  • the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited is deleted.
  • the preset operation includes a modification operation, and processing, based on the preset operation, the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited includes:
  • the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited is replaced.
  • the method also includes:
  • a first audio and video clip is generated based on the first text data and the timbre information in the audio and video to be edited;
  • the first audio and video segment is added to the audio and video to be edited.
  • the present disclosure also provides an audio and video processing device, the device comprising:
  • the first display module is used to display the text data corresponding to the audio and video to be edited; wherein, the text data has a mapping relationship with the audio and video timestamp of the audio and video to be edited;
  • the second display module is used to display the audio and video to be edited according to the time axis track;
  • a determining module configured to determine the audio and video timestamp corresponding to the target text data as the target audio and video timestamp in response to a preset operation triggered for the target text data in the text data;
  • the editing module is configured to process the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited based on the preset operation.
  • the present disclosure provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is made to implement the above method.
  • the present disclosure provides a device, including: a memory, a processor, and a computer program stored on the memory and operable on the processor; when the processor executes the computer program, the above method is implemented.
  • the present disclosure provides a computer program product, where the computer program product includes a computer program/instruction, and when the computer program/instruction is executed by a processor, the above method is implemented.
  • the embodiment of the present disclosure provides an audio and video processing method that displays the text data corresponding to the audio and video to be edited; in response to a preset operation triggered for target text data in the text data, determines the audio and video time stamp corresponding to the target text data as the target audio and video time stamp; and, based on the preset operation, processes the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited. The method can therefore improve the accuracy of audio and video editing, simplify user operations, and lower the threshold for user operations.
  • FIG. 1 is a flowchart of an audio and video processing method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of an audio and video processing interface provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of another audio and video processing interface provided by an embodiment of the present disclosure.
  • FIG. 4 is a flowchart of another audio and video processing method provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of another audio and video processing interface provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of another audio and video processing interface provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an audio and video processing device provided by an embodiment of the present disclosure.
  • Fig. 8 is a schematic structural diagram of an audio and video processing device provided by an embodiment of the present disclosure.
  • referring to FIG. 1, which is a flowchart of an audio and video processing method provided by an embodiment of the present disclosure, the method includes:
  • S101: Display text data corresponding to audio and video to be edited.
  • the text data has a mapping relationship with the audio and video time stamp of the audio and video to be edited, and the audio and video time stamp is used to indicate the playing time of each frame of the audio and video.
  • audio and video to be edited include but are not limited to recorded audio and video, audio and video obtained based on a script, and the like.
  • the text data can be obtained by speech recognition of the audio and video to be edited, or it can be a script. When the text data is a script, the text data can be matched against the audio and video to be edited to obtain the mapping relationship between the text data and the audio and video time stamps of the audio and video to be edited. Speech recognition methods include, but are not limited to, ASR (Automatic Speech Recognition) technology.
  • the text data can be displayed on the interface.
  • the interface is shown in FIG. 2
  • area P in FIG. 2 shows the displayed text data.
  • the text data of different users can be determined, such as the text data of user a and user b shown in FIG. 2 .
  • S102: Display the audio and video to be edited according to the time axis track.
  • the audio and video to be edited can be displayed on the interface according to the time axis track.
  • area Q in FIG. 2 shows the displayed audio and video to be edited.
  • step 102 is not specifically limited.
  • the preset operations include, but are not limited to, selection operations, deletion operations, and modification operations. Since the text data has a mapping relationship with the audio and video time stamp of the audio and video to be edited, for the target text data in the text data, the target audio and video time stamp corresponding to the target text data can be determined according to the mapping relationship.
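  • The lookup described above can be illustrated with a minimal sketch. It assumes the text data is held as a list of tokens, each carrying its own start and end time in seconds; the `Token` class and the `target_timestamp` function are hypothetical names introduced here for illustration and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One recognized word or character (hypothetical structure)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def target_timestamp(tokens, first, last):
    """Return the (start, end) time range covering tokens[first..last], inclusive."""
    if not 0 <= first <= last < len(tokens):
        raise IndexError("selection out of range")
    return tokens[first].start, tokens[last].end


tokens = [
    Token("hello", 0.0, 0.4),
    Token("um", 0.5, 0.7),
    Token("world", 0.9, 1.4),
]
# Selecting the second and third tokens yields the range that spans both.
selection = target_timestamp(tokens, 1, 2)
```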
  • S104: Based on the preset operation, process the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited.
  • the corresponding audio and video segment in the audio and video to be edited can be determined based on the audio and video time stamp. Processing the audio and video segment corresponding to the target audio and video time stamp realizes text-based audio and video editing: because editing the text edits the linked audio and video segment, the audio and video can be edited with high accuracy.
  • the preset operation includes a selection operation, and processing, based on the preset operation, the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited includes: displaying, according to a preset first display style, the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited.
  • the first display style is, for example, highlighting.
  • FIG. 3 shows a schematic diagram of another interface. Referring to FIG. 3, based on the selection operation, the target text data can be highlighted, and the audio and video segment corresponding to the target audio and video timestamp can be highlighted on the time axis track; the highlighted part is shown by the dotted line in FIG. 3.
  • the preset operation includes a delete operation, and processing, based on the preset operation, the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited includes: based on the delete operation, deleting the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited.
  • the target text data may be deleted, and the audio and video segment corresponding to the target audio and video time stamp may be deleted.
  • a delete control may be displayed, and in response to a trigger operation on the delete control, the target text data and the audio and video segment corresponding to the target audio and video timestamp are deleted.
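  • A minimal sketch of the linked deletion follows. It assumes tokens are `(text, start, end)` tuples and that the remaining content is shifted earlier to close the gap; the disclosure does not state how the timeline is re-timed after a deletion, so that choice and the function name `delete_span` are assumptions.

```python
def delete_span(tokens, first, last):
    """Remove tokens[first..last] and report the time range to cut from the media.

    tokens is a list of (text, start, end) tuples (hypothetical layout).
    Later tokens are shifted earlier by the removed duration (an assumption).
    """
    cut_start = tokens[first][1]
    cut_end = tokens[last][2]
    removed = cut_end - cut_start
    kept = list(tokens[:first])
    for text, start, end in tokens[last + 1:]:
        kept.append((text, start - removed, end - removed))
    return kept, (cut_start, cut_end)


tokens = [("hello", 0.0, 0.5), ("um", 0.5, 1.0), ("world", 1.0, 1.5)]
remaining, cut_range = delete_span(tokens, 1, 1)
```

  • The returned `cut_range` is the span that an editor would remove from the audio and video track so that text and media stay aligned.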
  • the preset operation includes a modification operation, and processing, based on the preset operation, the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited includes: obtaining the modified text data; generating an audio and video segment based on the modified text data and the timbre information in the audio and video to be edited, as the audio and video segment to be modified; and replacing the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited with the audio and video segment to be modified.
  • the target text data can be modified.
  • a modification control can be displayed; in response to a trigger operation on the modification control, the modified text data is generated according to the received modification content, and the audio and video segment corresponding to the target audio and video time stamp is replaced with the audio and video segment to be modified, so as to realize the modification of the audio and video to be edited.
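  • The replacement step can be sketched as a splice on a sample buffer. This is an illustration under stated assumptions: the audio is a plain Python list of samples, and `new_clip` stands in for the output of the timbre-based generation step, which is not implemented here. The name `replace_segment` is hypothetical.

```python
def replace_segment(samples, sample_rate, start, end, new_clip):
    """Replace the samples between start and end (seconds) with new_clip.

    new_clip is assumed to come from a separate voice-generation step
    that is outside the scope of this sketch.
    """
    a = round(start * sample_rate)
    b = round(end * sample_rate)
    if not 0 <= a <= b <= len(samples):
        raise ValueError("time range outside the audio")
    return samples[:a] + list(new_clip) + samples[b:]


original = [0, 1, 2, 3, 4, 5, 6, 7]  # toy audio at 4 samples per second
edited = replace_segment(original, 4, 0.5, 1.0, [9, 9, 9])
```

  • The new clip may be longer or shorter than the segment it replaces, so any text-to-timestamp mapping for later content would need to be shifted accordingly.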
  • in the audio and video processing method provided by the embodiments of the present disclosure, the text data corresponding to the audio and video to be edited is displayed; in response to a preset operation triggered for target text data in the text data, the audio and video timestamp corresponding to the target text data is determined as the target audio and video timestamp; and, based on the preset operation, the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited is processed. The method can thus edit audio and video based on text: since there is a mapping relationship between the text and the audio and video timestamps, editing the text edits the associated audio and video segment.
  • invalid modal particles such as "um", "uh" and "that", as well as silent segments, usually appear in dialogue. Therefore, to ensure the continuity of the dialogue, there is a need to delete these invalid modal particles and silent segments from the audio and video to be edited.
  • the audio and video processing method of the embodiment of the present disclosure also includes:
  • Step 401: display the first editing entry for preset keywords or preset mute segments.
  • the text data corresponding to the audio and video to be edited can be detected to determine the preset keywords or preset mute segments in the text data, and when there are preset keywords or preset mute segments in the text data, the first editing entry is displayed.
  • the control shown in area A in FIG. 3 is the first editing entry, and the information of "modification suggestion 01: remove invalid modal particles" is displayed on the first editing entry.
  • the preset keywords can include vocabulary such as invalid modal particles, and there are many ways to determine the preset keywords in the text data; for example, natural language processing technology can be used to determine the preset keywords in the text data.
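  • As a minimal illustration, the sketch below flags preset keywords by plain set lookup. This is a deliberately simple stand-in rather than the language processing the text refers to; the keyword list reuses the examples from the description, and `find_keywords` is a hypothetical name.

```python
# Example keywords taken from the description; the set itself is an assumption.
PRESET_KEYWORDS = {"um", "uh", "that"}


def find_keywords(words, keywords=PRESET_KEYWORDS):
    """Return the indices of words that match a preset keyword.

    Matching ignores case and surrounding punctuation.
    """
    hits = []
    for index, word in enumerate(words):
        if word.lower().strip(".,!?") in keywords:
            hits.append(index)
    return hits


hits = find_keywords(["Um,", "I", "think", "uh", "yes"])
```

  • Plain matching also flags meaningful uses of a word such as "that", which is one reason a user may want to exclude individual keywords before deleting.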
  • the preset mute segment is determined according to the interval between the audio and video timestamps corresponding to two adjacent characters; for example, when the interval is greater than a preset threshold, it is determined that there is a preset mute segment between the two adjacent characters.
  • the mute segment can be displayed in the form of spaces on the interface.
  • the display length of the mute segment can be determined according to the value of the interval.
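  • The interval rule can be sketched as follows, assuming tokens are `(text, start, end)` tuples. The threshold of 0.8 seconds and the scale of one space per 0.5 seconds are illustrative values only, and both function names are hypothetical.

```python
def find_silences(tokens, threshold=0.8):
    """Return (index, silence_start, silence_end) for each gap above threshold.

    index is the position of the token that precedes the silence.
    """
    silences = []
    for i in range(len(tokens) - 1):
        gap_start = tokens[i][2]      # end of the current token
        gap_end = tokens[i + 1][1]    # start of the next token
        if gap_end - gap_start > threshold:
            silences.append((i, gap_start, gap_end))
    return silences


def silence_display(gap, seconds_per_space=0.5):
    """Render a silence as spaces whose count grows with its duration."""
    return " " * max(1, round(gap / seconds_per_space))


tokens = [("a", 0.0, 0.5), ("b", 0.5, 1.0), ("c", 3.0, 3.5)]
silences = find_silences(tokens)
```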
  • Step 402: in response to the trigger operation on the first edit entry, display the preset keywords or preset mute segments in the text data according to the preset second display style.
  • the trigger operation for the first editing entry includes but not limited to click operation, voice instruction, and touch track.
  • the second display style may be highlighting or other display styles, which are not specifically limited here.
  • Fig. 5 shows a schematic diagram of an interface.
  • preset keywords “uh”, “um” and “that” are highlighted in the interface, as shown by the dotted line.
  • Step 403: in response to the trigger operation on the one-key delete control, delete the preset keyword or the preset mute segment from the text data.
  • the first editing entry corresponds to the first editing card, and a one-key delete control is set on the first editing card.
  • the first editing card is displayed, and the displaying manner of the first editing card includes but not limited to a drop-down option, a floating window, and the like.
  • the first editing card is shown in area B in FIG. 5; the number of occurrences can be counted for each preset keyword, and the preset keywords and the corresponding number of occurrences can be displayed in the first editing card.
  • in response to a trigger operation for a target keyword in the preset keywords, the target keyword is removed from the preset keywords, and the number of occurrences of the preset keywords displayed in the first editing card is modified synchronously, so that users can remove keywords that are not invalid modal particles by clicking or similar operations and prevent them from being deleted with one click.
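  • The counting and exclusion behaviour of the editing card can be sketched with `collections.Counter`. The word list and the helper names `keyword_counts` and `one_click_delete` are hypothetical, and the sketch works on text only, leaving the linked media cut aside.

```python
from collections import Counter


def keyword_counts(words, keywords):
    """Count how often each preset keyword occurs in the text."""
    return Counter(word for word in words if word in keywords)


def one_click_delete(words, keywords):
    """Drop every word that is still in the preset keyword set."""
    return [word for word in words if word not in keywords]


words = ["um", "so", "that", "um", "works"]
keywords = {"um", "that"}
counts = keyword_counts(words, keywords)   # shown on the editing card

# The user marks "that" as a real word, so it is removed from the set
# before the one-click delete runs.
keywords.discard("that")
cleaned = one_click_delete(words, keywords)
```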
  • the deletion operation of preset keywords or preset silent segments can be presented in the form of an editing card, providing one-click operation, saving editing time, simplifying user operations, and lowering the threshold for users to use.
  • the audio and video processing method further includes: displaying a voice enhancement control on the second editing card; in response to a trigger operation on the voice enhancement control, performing enhancement processing on the human voice in the audio and video to be edited.
  • the second editing entry for the audio and video to be edited is displayed, and the second editing entry corresponds to the second editing card, and the voice enhancement control is set on the second editing card.
  • noise detection can be performed based on audio and video to be edited, and when noise is detected, a second editing entry is displayed.
  • the control shown in area C in FIG. 2 is the second editing entry, and the message "Enhancement Suggestion: Speech Enhancement" is displayed on the second editing entry.
  • a second edit card is displayed.
  • the second editing card is shown in area D in Fig. 6.
  • the voice enhancement control "Enhanced Voice" is displayed in the second editing card.
  • in response to a trigger operation on the voice enhancement control, the human voice is enhanced; the trigger operation includes but is not limited to a click operation, a voice command, and a touch track.
  • the voice enhancement operation can be presented in the form of an edit card, providing a one-button operation, which can enhance the user's voice to satisfy the listening experience, simplify the user operation, and lower the threshold for the user to use.
  • the audio and video processing method further includes: based on the music genre of the audio and video to be edited and/or the content in the text data corresponding to the audio and video to be edited, determining the soundtrack corresponding to the audio and video to be edited; and adding the soundtrack to the audio and video segment to be edited.
  • multiple tags can be preset, with a mapping relationship between each tag and one or more soundtracks. Based on the music genre of the audio and video to be edited and/or the content in the text data corresponding to the audio and video to be edited, the tag corresponding to the music genre and/or the content in the text data is determined, and the soundtrack corresponding to the audio and video to be edited is determined based on the mapping relationship between the tag and the soundtrack.
  • for example, the theme of the content is determined to be "sports" based on natural language processing technology, the soundtrack corresponding to the "sports" tag is then determined as the soundtrack corresponding to the audio and video to be edited, and the soundtrack is added to the audio and video segment to be edited.
  • the corresponding label is determined, and the soundtrack corresponding to the label is used as the soundtrack corresponding to the audio and video to be edited, and the soundtrack is added to the audio and video segment to be edited.
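  • The tag-to-soundtrack mapping can be sketched as a dictionary lookup. The tags, file names, and the `pick_soundtrack` function are hypothetical placeholders; how a tag is derived from the genre or the text content is not shown.

```python
# Hypothetical preset table: each tag maps to one or more soundtracks.
TAG_TO_SOUNDTRACKS = {
    "sports": ["upbeat_01.mp3", "upbeat_02.mp3"],
    "news": ["calm_01.mp3"],
}


def pick_soundtrack(tags, table=TAG_TO_SOUNDTRACKS):
    """Return the first soundtrack of the first tag that has an entry, else None."""
    for tag in tags:
        candidates = table.get(tag)
        if candidates:
            return candidates[0]
    return None


choice = pick_soundtrack(["sports"])
```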
  • the soundtrack can be intelligently recommended based on the content and genre of the text data to meet the scene requirements for adding soundtracks, enrich the diversity of listening experience, improve the listening experience, simplify user operations, and lower the threshold for users to use.
  • the audio and video processing method further includes: displaying a loudness equalization control on the third editing card; in response to a trigger operation on the loudness equalization control, performing normalization processing on the loudness of the volume in the audio and video to be edited.
  • a third editing entry for the audio and video to be edited is displayed, the third editing entry corresponds to the third editing card, and a loudness equalization control is set on the third editing card.
  • the volume loudness detection can be performed based on the audio and video to be edited, and a third editing entry is displayed when it is detected that the audio and video to be edited does not satisfy the preset loudness equalization condition.
  • the third editing card is displayed, and in response to the trigger operation on the loudness equalization control, the loudness of the volume in the audio and video to be edited is normalized, for example, so that the loudness of the volume in the audio and video to be edited falls within a preset range.
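  • A minimal sketch of the normalization idea follows. It uses the RMS level in decibels relative to full scale as a simple stand-in for a perceptual loudness measure, which is an assumption: the disclosure does not name a loudness metric. The target of -20 dB and both function names are illustrative.

```python
import math


def rms_db(samples):
    """RMS level of the samples in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def normalize(samples, target_db=-20.0):
    """Scale the samples so their RMS level equals target_db."""
    level = rms_db(samples)
    if level == float("-inf"):
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** ((target_db - level) / 20)
    return [s * gain for s in samples]


loud = [0.5, -0.5, 0.5, -0.5]
balanced = normalize(loud)
```

  • Applying the same target to each speaker or section brings their levels into a common range.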
  • the loudness equalization operation can be presented in the form of an editing card, providing one-click operation, which can improve the listening experience, simplify user operations, and lower the threshold for users to use.
  • the audio and video processing method further includes: displaying the smart clip control on the fourth editing card; in response to a trigger operation on the smart clip control, adjusting the music volume and the human voice volume in the audio and video segment to obtain a volume-adjusted audio and video segment.
  • the fourth editing entry for the audio and video to be edited is displayed, and the fourth editing entry corresponds to the fourth editing card, and the smart clip control is set on the fourth editing card.
  • the fourth editing card is displayed, and in response to the trigger operation for the smart clip control, the music volume and the human voice volume are adjusted; for example, the human voice volume is increased by a first volume value and the music volume is decreased by a second volume value, or the music volume is decreased by a third volume value in the audio and video segment where a human voice is detected, to obtain the volume-adjusted audio and video segment.
  • the volume of the music in the audio and video segment is inversely proportional to the volume of the human voice.
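  • The inverse relationship can be sketched as a per-frame music gain of `k / voice_level`, capped at a maximum. The constant `k`, the cap, and the per-frame level representation are assumptions made for illustration; `duck_music` is a hypothetical name.

```python
def duck_music(voice_levels, k=0.1, max_gain=1.0):
    """Return one music gain per frame, inversely proportional to voice level.

    voice_levels holds a non-negative voice level per frame. Frames without
    voice keep the music at max_gain.
    """
    gains = []
    for level in voice_levels:
        if level <= 0:
            gains.append(max_gain)
        else:
            gains.append(min(max_gain, k / level))
    return gains


gains = duck_music([0.0, 0.1, 0.5])
```

  • Louder speech therefore yields a lower music gain, while frames without speech keep the music at full level.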
  • title generation can also be realized; for example, in response to a trigger operation on the smart clip control, the currently selected second text data and the second audio and video segment corresponding to the second text data are determined, and the second text data and the second audio and video segment are copied and pasted to the preset title area to realize the effect of the clip.
  • the smart clip function can be presented in the form of an editing card, providing one-click operation, realizing the effect of clips, simplifying user operations, and lowering the threshold for users to use.
  • the audio and video processing method further includes: when receiving an adding operation for the first text data in the text data, generating a first audio and video segment based on the first text data and the timbre information in the audio and video to be edited; based on the position information of the first text data in the text data, determining the first audio and video timestamp corresponding to the first text data; and, based on the first audio and video timestamp, adding the first audio and video segment to the audio and video to be edited.
  • the first text data may be obtained in response to an input operation, or may be obtained based on copying existing text data.
  • the timbre information of each user can be obtained according to the audio and video to be edited.
  • the corresponding first audio and video time stamp is determined according to the position information of the first text data in the text data, and the first audio and video segment is added at the position of the first audio and video time stamp.
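  • A minimal sketch of deriving the insertion time from the text position follows. It assumes tokens are `(text, start, end)` tuples, that the new clip is placed at the end time of the preceding token, and that later tokens shift by the clip duration; these rules and the name `insert_clip` are assumptions. Generating the clip itself is not shown.

```python
def insert_clip(tokens, position, new_text, clip_duration):
    """Insert new_text before tokens[position] and return (tokens, insert_time).

    The insertion time is the end of the preceding token, or 0.0 when the
    text is added at the very beginning.
    """
    insert_at = tokens[position - 1][2] if position > 0 else 0.0
    new_token = (new_text, insert_at, insert_at + clip_duration)
    shifted = [
        (text, start + clip_duration, end + clip_duration)
        for text, start, end in tokens[position:]
    ]
    return list(tokens[:position]) + [new_token] + shifted, insert_at


tokens = [("hello", 0.0, 0.5), ("world", 0.5, 1.0)]
updated, insert_time = insert_clip(tokens, 1, "big", 0.5)
```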
  • the aforementioned editing entry can be automatically displayed based on the detection result, or can be displayed on the interface in response to a trigger operation.
  • timbre cloning and voice broadcasting technologies are used to clone the timbre and intelligently generate audio and video segments based on the added text, which realizes the addition of audio and video segments based on text input, reduces the time cost and editing cost caused by re-recording, and simplifies user operations.
  • the present disclosure also provides an audio and video processing device.
  • referring to FIG. 7, which is a schematic structural diagram of an audio and video processing device provided by an embodiment of the present disclosure.
  • the device includes:
  • the first display module 701 is configured to display the text data corresponding to the audio and video to be edited; wherein, the text data has a mapping relationship with the audio and video time stamp of the audio and video to be edited.
  • the second display module 702 is configured to display the audio and video to be edited according to the time axis track.
  • the determining module 703 is configured to determine an audio and video time stamp corresponding to the target text data as a target audio and video time stamp in response to a preset operation triggered for the target text data in the text data.
  • the editing module 704 is configured to process the audio and video segment corresponding to the target audio and video time stamp in the audio and video to be edited based on the preset operation.
  • the audio and video processing device also includes:
  • the first processing module is configured to display a first editing entry for a preset keyword or a preset silent segment; in response to a trigger operation on the first editing entry, display the preset keyword or the preset silent segment in the text data according to a preset second display style.
  • the first editing entry corresponds to a first editing card, and the first editing card is provided with a one-click delete control; the first editing module is further configured to delete the preset keyword or the preset silent segment from the text data in response to a trigger operation on the one-click delete control.
  • the audio and video processing device also includes:
  • the second processing module is configured to display a voice enhancement control on the second editing card; in response to a trigger operation on the voice enhancement control, perform enhancement processing on the human voice in the audio and video to be edited.
  • the audio and video processing device also includes:
  • the first adding module is configured to determine the soundtrack corresponding to the audio and video to be edited based on the music genre of the audio and video to be edited and/or the content in the text data corresponding to the audio and video to be edited; add the soundtrack to the audio and video segment to be edited.
  • the audio and video processing device also includes:
  • the third processing module is configured to display the loudness equalization control on the third editing card; in response to a trigger operation on the loudness equalization control, normalize the loudness of the volume in the audio and video to be edited.
  • the audio and video processing device also includes:
  • the fourth processing module is configured to display a smart teaser control on the fourth editing card; in response to a trigger operation on the smart teaser control, adjust the music volume and the voice volume of the audio and video segment within a preset time period at the beginning of the audio and video to be edited, to obtain a volume-adjusted audio and video segment; wherein the music volume in the volume-adjusted audio and video segment is inversely proportional to the voice volume.
  • the preset operation includes a selection operation, and the editing module 704 is specifically configured to: display the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited according to a preset first display style.
  • the preset operation includes a delete operation, and the editing module 704 is specifically configured to: based on the delete operation, delete the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited.
  • the preset operation includes a modification operation, and the editing module 704 is specifically configured to: obtain the modified text data corresponding to the modification operation; generate an audio and video segment based on the modified text data and the timbre information in the audio and video to be edited, as the to-be-modified audio and video segment; replace the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited with the to-be-modified audio and video segment.
  • the audio and video processing device also includes:
  • the second adding module is configured to, when receiving an adding operation for first text data in the text data, generate a first audio and video segment based on the first text data and the timbre information in the audio and video to be edited; based on the position information of the first text data in the text data, determine the first audio and video timestamp corresponding to the first text data; based on the first audio and video timestamp, add the first audio and video segment to the audio and video to be edited.
  • in the audio and video processing device provided by the embodiments of the present disclosure, the text data corresponding to the audio and video to be edited is displayed; in response to a preset operation triggered on target text data in the text data, the audio and video timestamp corresponding to the target text data is determined as the target audio and video timestamp; and, based on the preset operation, the audio and video segment corresponding to the target audio and video timestamp in the audio and video to be edited is processed. It can be seen that the audio and video processing method provided by the embodiments of the present disclosure can edit audio and video based on text. Because there is a mapping relationship between the text and the audio and video timestamps, editing the text edits the associated audio and video segments in lockstep.
  • an embodiment of the present disclosure also provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device implements the audio and video processing method described in the embodiments of the present disclosure.
  • the embodiment of the present disclosure also provides a computer program product, the computer program product includes a computer program/instruction, and when the computer program/instruction is executed by a processor, the audio and video processing method described in the embodiment of the present disclosure is implemented.
  • an embodiment of the present disclosure also provides an audio and video processing device, as shown in FIG. 8, which may include:
  • processor 801, memory 802, input device 803, and output device 804. The number of processors 801 in the audio and video processing device may be one or more, and one processor is taken as an example in FIG. 8.
  • the processor 801, the memory 802, the input device 803, and the output device 804 may be connected through a bus or in other ways; connection through a bus is taken as an example in FIG. 8.
  • the memory 802 can be used to store software programs and modules, and the processor 801 executes various functional applications and data processing of the audio and video processing device by running the software programs and modules stored in the memory 802.
  • the memory 802 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like.
  • the memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage devices.
  • the input device 803 can be used to receive input numeric or character information, and to generate signal input related to user settings and function control of the audio and video processing device.
  • the processor 801 loads the executable files corresponding to the processes of one or more application programs into the memory 802 according to the following instructions, and the processor 801 runs the application programs stored in the memory 802, so as to realize the various functions of the above-mentioned audio and video processing device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

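Provided are an audio and video processing method, apparatus, device, and storage medium. The method includes: displaying text data corresponding to audio and video to be edited, wherein the text data has a mapping relationship with audio and video timestamps of the audio and video to be edited; displaying the audio and video to be edited along a timeline track; in response to a preset operation triggered on target text data in the text data, determining the audio and video timestamp corresponding to the target text data as a target audio and video timestamp; and, based on the preset operation, processing the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.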

Description

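Audio and video processing method, apparatus, device, and storage medium
Cross-reference to related applications
The present disclosure claims priority to Chinese Patent Application No. 202111109213.4, filed on September 22, 2021 and entitled "Audio and video processing method, apparatus, device, and storage medium", the entire contents of which are incorporated into the present disclosure by reference.
Technical field
The present disclosure relates to the field of data processing, and in particular to an audio and video processing method, apparatus, device, and storage medium.
Background
As information on the Internet grows ever richer, watching audio and video has become an everyday form of entertainment. To improve the viewing experience, editing the audio and video before publication is an important step.
At present, for fine-grained changes such as trimming filler words, the user typically listens to the audio and video repeatedly while fine-tuning the start and end points of the cut. The operation is cumbersome, and the accuracy of the edit leaves room for improvement.
Technical solution
To solve the above technical problem, or at least partially solve it, embodiments of the present disclosure provide an audio and video processing method that can improve the accuracy of audio and video editing and simplify user operations.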
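In a first aspect, the present disclosure provides an audio and video processing method, the method including:
displaying text data corresponding to audio and video to be edited, wherein the text data has a mapping relationship with audio and video timestamps of the audio and video to be edited;
displaying the audio and video to be edited along a timeline track;
in response to a preset operation triggered on target text data in the text data, determining the audio and video timestamp corresponding to the target text data as a target audio and video timestamp;
based on the preset operation, processing the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In an optional implementation, the method further includes:
displaying a first editing entry for a preset keyword or a preset silent segment;
in response to a trigger operation on the first editing entry, displaying the preset keyword or the preset silent segment in the text data in a preset second display style.
In an optional implementation, the first editing entry corresponds to a first editing card on which a one-click delete control is provided; after the preset keyword or the preset silent segment in the text data is displayed in the preset second display style in response to the trigger operation on the first editing entry, the method further includes:
in response to a trigger operation on the one-click delete control, deleting the preset keyword or the preset silent segment from the text data.
In an optional implementation, the method further includes:
displaying a voice enhancement control on a second editing card;
in response to a trigger operation on the voice enhancement control, enhancing the human voice in the audio and video to be edited.
In an optional implementation, the method further includes:
determining a soundtrack for the audio and video to be edited based on the music genre of the audio and video to be edited and/or the content of the text data corresponding to the audio and video to be edited;
adding the soundtrack to the audio and video segment to be edited.
In an optional implementation, the method further includes:
displaying a loudness equalization control on a third editing card;
in response to a trigger operation on the loudness equalization control, normalizing the loudness of the volume in the audio and video to be edited.
In an optional implementation, the method further includes:
displaying a smart teaser control on a fourth editing card;
in response to a trigger operation on the smart teaser control, adjusting the music volume and the voice volume of the audio and video segment within a preset time period at the beginning of the audio and video to be edited, to obtain a volume-adjusted audio and video segment, wherein the music volume in the volume-adjusted audio and video segment is inversely proportional to the voice volume.
In an optional implementation, the preset operation includes a selection operation, and processing, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp includes:
displaying, in a preset first display style, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In an optional implementation, the preset operation includes a delete operation, and processing, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp includes:
based on the delete operation, deleting the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In an optional implementation, the preset operation includes a modification operation, and processing, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp includes:
obtaining modified text data corresponding to the modification operation;
generating an audio and video segment based on the modified text data and the timbre information in the audio and video to be edited, as a to-be-modified audio and video segment;
replacing, with the to-be-modified audio and video segment, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In an optional implementation, the method further includes:
when an adding operation for first text data in the text data is received, generating a first audio and video segment based on the first text data and the timbre information in the audio and video to be edited;
determining a first audio and video timestamp corresponding to the first text data based on position information of the first text data in the text data;
adding the first audio and video segment to the audio and video to be edited based on the first audio and video timestamp.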
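In a second aspect, the present disclosure further provides an audio and video processing apparatus, the apparatus including:
a first display module, configured to display text data corresponding to audio and video to be edited, wherein the text data has a mapping relationship with audio and video timestamps of the audio and video to be edited;
a second display module, configured to display the audio and video to be edited along a timeline track;
a determining module, configured to, in response to a preset operation triggered on target text data in the text data, determine the audio and video timestamp corresponding to the target text data as a target audio and video timestamp;
an editing module, configured to process, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In a third aspect, the present disclosure provides a computer-readable storage medium storing instructions that, when run on a terminal device, cause the terminal device to implement the above method.
In a fourth aspect, the present disclosure provides a device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above method when executing the computer program.
In a fifth aspect, the present disclosure provides a computer program product including a computer program/instructions that, when executed by a processor, implement the above method.
Compared with the related art, the technical solutions provided by embodiments of the present disclosure have the following advantages:
Embodiments of the present disclosure provide an audio and video processing method that displays text data corresponding to audio and video to be edited; in response to a preset operation triggered on target text data in the text data, determines the audio and video timestamp corresponding to the target text data as a target audio and video timestamp; and, based on the preset operation, processes the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp. The method can therefore improve the accuracy of audio and video editing, simplify user operations, and lower the barrier to use.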
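Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
To describe the technical solutions in the embodiments of the present disclosure or in the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of an audio and video processing method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an audio and video processing interface provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another audio and video processing interface provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of another audio and video processing method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another audio and video processing interface provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another audio and video processing interface provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an audio and video processing apparatus provided by an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an audio and video processing device provided by an embodiment of the present disclosure.
Detailed description
To make the above objects, features, and advantages of the present disclosure easier to understand, the solutions of the present disclosure are further described below. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways other than those described here. Obviously, the embodiments in the specification are only some, not all, of the embodiments of the present disclosure.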
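An embodiment of the present disclosure provides an audio and video processing method. FIG. 1 is a flowchart of an audio and video processing method provided by an embodiment of the present disclosure. The method includes:
S101: Display text data corresponding to the audio and video to be edited.
The text data has a mapping relationship with the audio and video timestamps of the audio and video to be edited, and an audio and video timestamp indicates the playback time of each audio and video frame.
In embodiments of the present disclosure, the audio and video to be edited include, but are not limited to, recorded audio and video and audio and video produced from a script. The text data may be obtained by performing speech recognition on the audio and video to be edited, or it may be a script. When the text data is a script, the text data can be matched against the audio and video to be edited to obtain the mapping relationship between the text data and the audio and video timestamps. Speech recognition methods include, but are not limited to, ASR (Automatic Speech Recognition) technology.
In this embodiment, the text data may be displayed on an interface. As an example, in the interface shown in FIG. 2, region P shows the displayed text data. When the audio and video to be edited contain the speech of different users, the text data of each user can be determined, such as the text data of user a and user b shown in FIG. 2.
S102: Display the audio and video to be edited along a timeline track.
In this embodiment, the audio and video to be edited may be displayed on the interface along a timeline track. As an example, region Q in FIG. 2 shows the displayed audio and video to be edited.
It should be noted that the execution order of step S102 is not specifically limited.
S103: In response to a preset operation triggered on target text data in the text data, determine the audio and video timestamp corresponding to the target text data as the target audio and video timestamp.
In this embodiment, the preset operation includes, but is not limited to, a selection operation, a delete operation, and a modification operation. Because the text data has a mapping relationship with the audio and video timestamps of the audio and video to be edited, the target audio and video timestamp corresponding to the target text data can be determined from the mapping relationship.
S104: Based on the preset operation, process the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In embodiments of the present disclosure, the corresponding audio and video segment in the audio and video to be edited can be determined from an audio and video timestamp. By processing the segment that corresponds to the target audio and video timestamp, audio and video are edited through text: editing the text edits the corresponding audio and video segment in lockstep, which enables high-precision editing of the audio and video.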
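The mapping used in S101 to S104 can be pictured with a minimal Python sketch. It assumes a hypothetical transcript layout in which each recognized token carries its own start and end time in seconds; the names `Token`, `transcript`, and `target_timestamp` are illustrative and do not come from the disclosure.

```python
from typing import List, Tuple

# Hypothetical layout: (text, start_seconds, end_seconds) for each recognized token.
Token = Tuple[str, float, float]

def target_timestamp(tokens: List[Token], first: int, last: int) -> Tuple[float, float]:
    """Return the time range covered by tokens[first..last], inclusive."""
    if not 0 <= first <= last < len(tokens):
        raise IndexError("selection is outside the transcript")
    return tokens[first][1], tokens[last][2]

transcript: List[Token] = [
    ("hello", 0.0, 0.4),
    ("um", 0.5, 0.8),
    ("world", 1.0, 1.5),
]

# Selecting "um world" in the text resolves to the 0.5 s to 1.5 s range.
print(target_timestamp(transcript, 1, 2))
```

Whatever the preset operation is (select, delete, modify), it can start from this lookup: the text selection gives token indices, and the indices give the time range of the segment to process.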
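In an optional implementation, the preset operation includes a selection operation, and processing, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp includes: displaying that audio and video segment in a preset first display style.
As an example, the first display style is highlighting. FIG. 3 shows a schematic diagram of another interface. Referring to FIG. 3, based on the selection operation, the target text data can be highlighted, and the audio and video segment corresponding to the target audio and video timestamp can be highlighted on the timeline track; the highlighted parts are shown by the dashed lines in FIG. 3.
In an optional implementation, the preset operation includes a delete operation, and processing, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp includes: based on the delete operation, deleting that audio and video segment.
Based on the delete operation, the target text data can be deleted, and the audio and video segment corresponding to the target audio and video timestamp can be deleted as well. For example, as shown in FIG. 3, after the target text data is selected, a delete control can be displayed; in response to a trigger operation on the delete control, the target text data and the audio and video segment corresponding to the target audio and video timestamp are deleted.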
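As a rough illustration of the linked deletion, the sketch below removes the selected tokens and the audio samples they cover, then shifts the timestamps of later tokens so that the text-to-timestamp mapping stays valid. It is a simplification under stated assumptions: audio is a plain list of samples, video frames are ignored, and `delete_range` is a hypothetical helper rather than anything named in the disclosure.

```python
from typing import List, Tuple

Token = Tuple[str, float, float]  # (text, start_seconds, end_seconds), assumed layout

def delete_range(tokens: List[Token], samples: List[float], sample_rate: int,
                 first: int, last: int) -> Tuple[List[Token], List[float]]:
    """Delete tokens[first..last] and the audio they cover; return the new state."""
    if not 0 <= first <= last < len(tokens):
        raise IndexError("selection is outside the transcript")
    start, end = tokens[first][1], tokens[last][2]
    lo, hi = round(start * sample_rate), round(end * sample_rate)
    removed = end - start
    kept_samples = samples[:lo] + samples[hi:]
    # Tokens after the cut move earlier by the removed duration.
    kept_tokens = list(tokens[:first]) + [
        (text, s - removed, e - removed) for text, s, e in tokens[last + 1:]
    ]
    return kept_tokens, kept_samples
```

Shifting the later timestamps is a design choice of this sketch; an editor could equally keep the original timestamps and record the cut in a separate edit list.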
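In an optional implementation, the preset operation includes a modification operation, and processing, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp includes: obtaining modified text data corresponding to the modification operation; generating an audio and video segment based on the modified text data and the timbre information in the audio and video to be edited, as a to-be-modified audio and video segment; and replacing, with the to-be-modified audio and video segment, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
Based on the modification operation, the target text data can be modified. For example, as shown in FIG. 3, after the target text data is selected, a modify control can be displayed; in response to a trigger operation on the modify control, modified text data is generated from the received modification content. A to-be-modified audio and video segment is then generated from the modified text data and the timbre information, and it replaces the audio and video segment corresponding to the target audio and video timestamp, thereby modifying the audio and video to be edited.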
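The replacement step can be sketched as a splice over the audio samples. The disclosure does not specify how the voice-cloned segment is synthesized, so `fake_synthesize` below is only a stand-in that returns silence; it and `splice` are hypothetical names used for illustration.

```python
from typing import List

def splice(samples: List[float], sample_rate: int,
           start: float, end: float, clip: List[float]) -> List[float]:
    """Replace the samples in [start, end) seconds with clip."""
    lo, hi = round(start * sample_rate), round(end * sample_rate)
    if not 0 <= lo <= hi <= len(samples):
        raise ValueError("time range is outside the audio")
    return samples[:lo] + list(clip) + samples[hi:]

def fake_synthesize(text: str, sample_rate: int) -> List[float]:
    """Stand-in for a voice-cloning TTS engine: 0.1 s of silence per character."""
    return [0.0] * (len(text) * sample_rate // 10)

original = [1.0] * 20                      # 2 s of audio at 10 samples per second
clip = fake_synthesize("hi", 10)           # segment generated for the modified text
edited = splice(original, 10, 0.5, 1.0, clip)
print(len(edited))
```

Because the generated segment rarely has the same duration as the one it replaces, a real editor would also need to update the timestamps of the text that follows.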
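In the audio and video processing method provided by embodiments of the present disclosure, text data corresponding to the audio and video to be edited is displayed; in response to a preset operation triggered on target text data in the text data, the audio and video timestamp corresponding to the target text data is determined as the target audio and video timestamp; and, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp is processed. The method can thus edit audio and video through text. Because the text is mapped to audio and video timestamps, editing the text edits the corresponding audio and video segment in lockstep, which enables high-precision editing. In addition, displaying text data that is mapped to audio and video timestamps presents the audio and video content intuitively; compared with related-art solutions in which the user edits the audio and video content directly, this simplifies user operations and lowers the barrier to use.
Building on the above embodiments, audio and video processing scenarios call for functions such as trimming filler words, adding a soundtrack, and producing teasers in order to improve the listening experience. The method of the embodiments of the present disclosure makes these functions convenient to implement and lowers the barrier to use, as described below.
In an optional implementation, conversations commonly contain filler words such as "嗯" (um), "呃" (uh), and "那个" (roughly "you know"), as well as silent segments. To keep the conversation coherent, there is therefore a need to edit the audio and video to be edited so as to delete such filler words and silent segments.
Therefore, as shown in FIG. 4, the audio and video processing method of the embodiments of the present disclosure further includes:
Step 401: Display a first editing entry for a preset keyword or a preset silent segment.
In this embodiment, the displayed text data corresponding to the audio and video to be edited can be examined to identify preset keywords or preset silent segments in it, and the first editing entry is displayed when the text data contains a preset keyword or a preset silent segment. As an example, the control shown in region A of FIG. 3 is the first editing entry, and it displays the message "Revision suggestion 01: remove filler words".
The preset keywords may include filler words and similar vocabulary. They can be identified in the text data in several ways, for example by matching, or by natural language processing.
A preset silent segment is determined from the interval between the audio and video timestamps of two adjacent characters; for example, when the interval exceeds a preset threshold, a preset silent segment is determined to exist between the two adjacent characters. A silent segment can be shown on the interface as blank space, and optionally its displayed length can be determined from the size of the interval.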
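The silent-segment rule described above (a gap between adjacent characters that exceeds a preset threshold) can be sketched as follows. The token layout and the 0.5 s default threshold are assumptions made for illustration only.

```python
from typing import List, Tuple

Token = Tuple[str, float, float]  # (text, start_seconds, end_seconds), assumed layout

def find_silent_gaps(tokens: List[Token], threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_end) for every pause longer than threshold seconds."""
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(tokens, tokens[1:]):
        if next_start - prev_end > threshold:
            gaps.append((prev_end, next_start))
    return gaps

tokens = [("a", 0.0, 0.4), ("b", 0.5, 0.9), ("c", 2.0, 2.3)]
# Only the pause between "b" and "c" is longer than 0.5 s.
print(find_silent_gaps(tokens))
```

Each returned range can then be rendered as blank space in the text view, with a width derived from the gap length.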
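Step 402: In response to a trigger operation on the first editing entry, display the preset keyword or the preset silent segment in the text data in a preset second display style.
The trigger operation on the first editing entry includes, but is not limited to, a click, a voice command, and a touch trajectory. The second display style may be highlighting or another display style, and is not specifically limited here.
FIG. 5 shows a schematic diagram of an interface in which the preset keywords "呃", "嗯", and "那个" are highlighted, as indicated by the dashed lines.
Step 403: In response to a trigger operation on the one-click delete control, delete the preset keyword or the preset silent segment from the text data.
In embodiments of the present disclosure, the first editing entry corresponds to a first editing card on which a one-click delete control is provided. In response to a trigger operation on the first editing entry, the first editing card is displayed; it may be presented as, for example, a drop-down option or a floating window, without limitation.
For example, referring to FIG. 5, the first editing card is shown in region B. The number of occurrences of each preset keyword can be counted, and the preset keywords and their counts are displayed on the first editing card.
Optionally, in response to a trigger operation on a target keyword among the preset keywords, the target keyword is removed from the preset keywords and the occurrence counts shown on the first editing card are updated accordingly. This lets the user, by clicking or a similar operation, exclude keywords that are not actually filler words so that they are not removed by the one-click deletion.
In this embodiment, deletion of preset keywords or preset silent segments is presented as an editing card with a one-click operation, which saves editing time, simplifies user operations, and lowers the barrier to use.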
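The matching-based variant of this flow can be sketched in a few lines: count each preset keyword for the editing card, let the user exclude a keyword, then delete the rest in one pass. The English filler set and the function names are hypothetical; a full implementation would also cut the audio and video covered by each removed token.

```python
from collections import Counter
from typing import List, Set, Tuple

Token = Tuple[str, float, float]  # (text, start_seconds, end_seconds), assumed layout

def count_fillers(tokens: List[Token], fillers: Set[str]) -> Counter:
    """Occurrences of each preset keyword, as shown on the editing card."""
    return Counter(text for text, _, _ in tokens if text in fillers)

def remove_fillers(tokens: List[Token], fillers: Set[str]) -> List[Token]:
    """One-click deletion: drop every token whose text is a preset keyword."""
    return [tok for tok in tokens if tok[0] not in fillers]

tokens = [("um", 0.0, 0.2), ("hello", 0.3, 0.7), ("um", 0.8, 1.0), ("like", 1.1, 1.3)]
fillers = {"um", "like"}
print(count_fillers(tokens, fillers))

# The user marks "like" as a real word, so it is excluded before deletion.
fillers = fillers - {"like"}
print(remove_fillers(tokens, fillers))
```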
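In an optional implementation, the audio and video processing method further includes: displaying a voice enhancement control on a second editing card; and, in response to a trigger operation on the voice enhancement control, enhancing the human voice in the audio and video to be edited.
In this embodiment, a second editing entry for the audio and video to be edited is displayed; the second editing entry corresponds to a second editing card on which a voice enhancement control is provided. For example, noise detection can be performed on the audio and video to be edited, and the second editing entry is displayed when noise is detected. As an example, the control shown in region C of FIG. 2 is the second editing entry, and it displays the message "Enhancement suggestion: voice enhancement". Then, in response to a trigger operation on the second editing entry, the second editing card is displayed.
Referring to FIG. 6, the second editing card is shown in region D. It displays the voice enhancement control "Enhance voice"; in response to a trigger operation on this control, the human voice in the audio and video to be edited is enhanced. Trigger operations include, but are not limited to, a click, a voice command, and a touch trajectory.
In this embodiment, voice enhancement is presented as an editing card with a one-click operation, which enhances the user's voice to improve the listening experience, simplifies user operations, and lowers the barrier to use.
In an optional implementation, the audio and video processing method further includes: determining a soundtrack for the audio and video to be edited based on the music genre of the audio and video to be edited and/or the content of the text data corresponding to the audio and video to be edited; and adding the soundtrack to the audio and video segment to be edited.
In this embodiment, multiple tags can be set in advance, each mapped to one or more soundtracks. Based on the music genre of the audio and video to be edited and/or the content of the corresponding text data, the tag matching the music genre and/or the content is determined, and the soundtrack for the audio and video to be edited is then determined from the mapping between tags and soundtracks.
As one example, natural language processing determines that the topic of the content in the text data corresponding to the audio and video to be edited is "sports"; the soundtrack mapped to the "sports" tag is then taken as the soundtrack for the audio and video to be edited and added to the audio and video segment to be edited.
As another example, the corresponding tag is determined from the music genre of the audio and video to be edited, and the soundtrack mapped to that tag is taken as the soundtrack for the audio and video to be edited and added to the audio and video segment to be edited.
In this embodiment, a soundtrack can be recommended intelligently based on the content and genre of the text data, which meets the need to add a soundtrack, enriches and improves the listening experience, simplifies user operations, and lowers the barrier to use.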
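The tag-to-soundtrack lookup can be illustrated with a small sketch. The disclosure leaves the topic detection to natural language processing; here a naive keyword overlap stands in for it, and the tags, keywords, and file names are all invented for the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical preset tags, each mapped to one or more soundtracks.
TAG_TO_TRACKS: Dict[str, List[str]] = {
    "sports": ["upbeat_01.mp3"],
    "travel": ["acoustic_02.mp3"],
}
# Stand-in for NLP topic detection: keywords that hint at each tag.
TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "sports": {"match", "goal", "training"},
    "travel": {"flight", "hotel", "beach"},
}

def pick_tag(text: str) -> Optional[str]:
    """Return the tag whose keywords overlap the text most, or None if none match."""
    words = set(text.lower().split())
    scores = {tag: len(words & keywords) for tag, keywords in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def recommend_tracks(text: str) -> List[str]:
    """Map the transcript text to soundtracks through its tag."""
    tag = pick_tag(text)
    return TAG_TO_TRACKS.get(tag, []) if tag else []

print(recommend_tracks("The match ended with a late goal"))
```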
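In an optional implementation, the audio and video processing method further includes: displaying a loudness equalization control on a third editing card; and, in response to a trigger operation on the loudness equalization control, normalizing the loudness of the volume in the audio and video to be edited.
In this embodiment, a third editing entry for the audio and video to be edited is displayed; the third editing entry corresponds to a third editing card on which a loudness equalization control is provided. For example, loudness detection can be performed on the audio and video to be edited, and the third editing entry is displayed when the audio and video to be edited are found not to satisfy a preset loudness equalization condition. Then, in response to a trigger operation on the third editing entry, the third editing card is displayed, and in response to a trigger operation on the loudness equalization control, the loudness of the volume in the audio and video to be edited is normalized, for example so that it falls within a preset range.
In this embodiment, loudness equalization is presented as an editing card with a one-click operation, which improves the listening experience, simplifies user operations, and lowers the barrier to use.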
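One simple way to bring loudness into a preset range is to scale the signal to a target RMS level. The sketch below does that for samples in the range -1.0 to 1.0; it is an assumption-laden simplification (plain RMS in dBFS with a hypothetical -20 dBFS target), not the loudness measure used by the disclosure, which is not specified.

```python
import math
from typing import List

def rms_dbfs(samples: List[float]) -> float:
    """RMS level in dB relative to full scale; -inf for silence or empty input."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def normalize(samples: List[float], target_dbfs: float = -20.0) -> List[float]:
    """Scale samples so their RMS level reaches target_dbfs, clamping to [-1, 1]."""
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return list(samples)          # nothing to scale
    gain = 10 ** ((target_dbfs - level) / 20)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

quiet = [0.01, -0.01] * 100           # about -40 dBFS
louder = normalize(quiet)
print(round(rms_dbfs(louder), 3))
```

Production tools usually measure perceived loudness over time windows rather than a single RMS value, so this should be read as the idea only.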
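In an optional implementation, the audio and video processing method further includes: displaying a smart teaser control on a fourth editing card; and, in response to a trigger operation on the smart teaser control, adjusting the music volume and the voice volume of the audio and video segment within a preset time period at the beginning of the audio and video to be edited, to obtain a volume-adjusted audio and video segment.
In this embodiment, a fourth editing entry for the audio and video to be edited is displayed; the fourth editing entry corresponds to a fourth editing card on which a smart teaser control is provided. In response to a trigger operation on the fourth editing entry, the fourth editing card is displayed; in response to a trigger operation on the smart teaser control, the music volume and the voice volume of the audio and video segment within the preset time period at the beginning of the audio and video to be edited are adjusted. For example, the voice volume is increased by a first volume value and the music volume is reduced by a second volume value, or, in audio and video segments where a human voice is detected, the music volume is reduced by a third volume value, to obtain the volume-adjusted audio and video segment.
The music volume in the volume-adjusted audio and video segment is inversely proportional to the voice volume.
Optionally, the smart teaser control displayed on the fourth editing card can also be used to generate an opening. For example, in response to a trigger operation on the smart teaser control, the currently selected second text data and the second audio and video segment corresponding to it are determined, and both are copied and pasted into a preset opening area, producing the teaser effect.
In this embodiment, the smart teaser function is presented as an editing card with a one-click operation, which produces the teaser effect, simplifies user operations, and lowers the barrier to use.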
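The inverse relationship between music volume and voice volume resembles what audio engineers call ducking. A minimal sketch, assuming per-frame voice levels between 0 and 1 and two hypothetical tuning constants `k` and `max_gain`:

```python
from typing import List

def music_gain(voice_level: float, k: float = 0.1, max_gain: float = 1.0) -> float:
    """Gain for the music track: inversely proportional to the voice level, capped."""
    if voice_level <= 0:
        return max_gain               # no voice detected, music stays at full level
    return min(max_gain, k / voice_level)

def duck(music: List[float], voice_levels: List[float]) -> List[float]:
    """Apply the per-frame gain to a list of per-frame music levels."""
    return [m * music_gain(v) for m, v in zip(music, voice_levels)]

# Music is left alone while nobody speaks and pulled down as the voice gets louder.
print(duck([1.0, 1.0, 1.0], [0.0, 0.4, 0.8]))
```

The cap keeps very quiet speech from boosting the music above its original level; the disclosure itself only states the inverse proportionality.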
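In an optional implementation, the audio and video processing method further includes: when an adding operation for first text data in the text data is received, generating a first audio and video segment based on the first text data and the timbre information in the audio and video to be edited; determining a first audio and video timestamp corresponding to the first text data based on position information of the first text data in the text data; and adding the first audio and video segment to the audio and video to be edited based on the first audio and video timestamp.
In this embodiment, the first text data may be obtained in response to an input operation, or by copying existing text data. The timbre information of each user can be obtained from the audio and video to be edited. When the first text data is added, the corresponding first audio and video timestamp is determined from the position information of the first text data in the text data, and the first audio and video segment is added at the position of the first audio and video timestamp.
It should be noted that the editing entries described above may be displayed automatically based on a detection result, or displayed on the interface in response to a trigger operation.
In this embodiment, timbre cloning and voice broadcast (text-to-speech) technologies are used to clone the timbre for the added text and intelligently generate audio and video segments. This makes it possible to add audio and video segments through text input, reduces the time and editing costs of re-recording, and simplifies user operations.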
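How a text position turns into an insertion timestamp can be sketched as follows: new text inserted before token `index` starts where the previous token ends, and everything after it moves later by the duration of the generated segment. The token layout and helper names are assumptions; the duration would come from whatever synthesizer produces the segment.

```python
from typing import List, Tuple

Token = Tuple[str, float, float]  # (text, start_seconds, end_seconds), assumed layout

def insertion_timestamp(tokens: List[Token], index: int) -> float:
    """Timestamp at which text inserted before tokens[index] should start."""
    if not 0 <= index <= len(tokens):
        raise IndexError("insert position is outside the transcript")
    return 0.0 if index == 0 else tokens[index - 1][2]

def insert_text(tokens: List[Token], index: int, text: str, duration: float) -> List[Token]:
    """Insert a new token of the given duration and shift the tokens after it."""
    at = insertion_timestamp(tokens, index)
    shifted = [(t, s + duration, e + duration) for t, s, e in tokens[index:]]
    return list(tokens[:index]) + [(text, at, at + duration)] + shifted

tokens = [("a", 0.0, 0.5), ("b", 0.5, 1.0)]
print(insert_text(tokens, 1, "new", 0.25))
```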
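Based on the above method embodiments, the present disclosure further provides an audio and video processing apparatus. FIG. 7 is a schematic structural diagram of an audio and video processing apparatus provided by an embodiment of the present disclosure. The apparatus includes:
a first display module 701, configured to display text data corresponding to audio and video to be edited, wherein the text data has a mapping relationship with audio and video timestamps of the audio and video to be edited.
a second display module 702, configured to display the audio and video to be edited along a timeline track.
a determining module 703, configured to, in response to a preset operation triggered on target text data in the text data, determine the audio and video timestamp corresponding to the target text data as a target audio and video timestamp.
an editing module 704, configured to process, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In an optional implementation, the audio and video processing apparatus further includes:
a first processing module, configured to display a first editing entry for a preset keyword or a preset silent segment, and, in response to a trigger operation on the first editing entry, display the preset keyword or the preset silent segment in the text data in a preset second display style.
In an optional implementation, the first editing entry corresponds to a first editing card on which a one-click delete control is provided; the first editing module is further configured to, in response to a trigger operation on the one-click delete control, delete the preset keyword or the preset silent segment from the text data.
In an optional implementation, the audio and video processing apparatus further includes:
a second processing module, configured to display a voice enhancement control on a second editing card, and, in response to a trigger operation on the voice enhancement control, enhance the human voice in the audio and video to be edited.
In an optional implementation, the audio and video processing apparatus further includes:
a first adding module, configured to determine a soundtrack for the audio and video to be edited based on the music genre of the audio and video to be edited and/or the content of the text data corresponding to the audio and video to be edited, and add the soundtrack to the audio and video segment to be edited.
In an optional implementation, the audio and video processing apparatus further includes:
a third processing module, configured to display a loudness equalization control on a third editing card, and, in response to a trigger operation on the loudness equalization control, normalize the loudness of the volume in the audio and video to be edited.
In an optional implementation, the audio and video processing apparatus further includes:
a fourth processing module, configured to display a smart teaser control on a fourth editing card, and, in response to a trigger operation on the smart teaser control, adjust the music volume and the voice volume of the audio and video segment within a preset time period at the beginning of the audio and video to be edited, to obtain a volume-adjusted audio and video segment, wherein the music volume in the volume-adjusted audio and video segment is inversely proportional to the voice volume.
In an optional implementation, the preset operation includes a selection operation, and the editing module 704 is specifically configured to: display, in a preset first display style, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In an optional implementation, the preset operation includes a delete operation, and the editing module 704 is specifically configured to: based on the delete operation, delete the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In an optional implementation, the preset operation includes a modification operation, and the editing module 704 is specifically configured to: obtain modified text data corresponding to the modification operation; generate an audio and video segment based on the modified text data and the timbre information in the audio and video to be edited, as a to-be-modified audio and video segment; and replace, with the to-be-modified audio and video segment, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
In an optional implementation, the audio and video processing apparatus further includes:
a second adding module, configured to, when an adding operation for first text data in the text data is received, generate a first audio and video segment based on the first text data and the timbre information in the audio and video to be edited; determine a first audio and video timestamp corresponding to the first text data based on position information of the first text data in the text data; and add the first audio and video segment to the audio and video to be edited based on the first audio and video timestamp.
The explanations of the audio and video processing method in the foregoing embodiments also apply to the audio and video processing apparatus of this embodiment and are not repeated here.
In the audio and video processing apparatus provided by embodiments of the present disclosure, text data corresponding to the audio and video to be edited is displayed; in response to a preset operation triggered on target text data in the text data, the audio and video timestamp corresponding to the target text data is determined as the target audio and video timestamp; and, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp is processed. The audio and video processing method provided by embodiments of the present disclosure can thus edit audio and video through text. Because the text is mapped to audio and video timestamps, editing the text edits the corresponding audio and video segment in lockstep, which enables high-precision editing. In addition, displaying text data that is mapped to audio and video timestamps presents the audio and video content intuitively; compared with related-art solutions in which the user edits the audio and video content directly, this simplifies user operations and lowers the barrier to use.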
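In addition to the above method and apparatus, embodiments of the present disclosure further provide a computer-readable storage medium storing instructions that, when run on a terminal device, cause the terminal device to implement the audio and video processing method described in the embodiments of the present disclosure.
Embodiments of the present disclosure further provide a computer program product including a computer program/instructions that, when executed by a processor, implement the audio and video processing method described in the embodiments of the present disclosure.
In addition, embodiments of the present disclosure further provide an audio and video processing device, which, as shown in FIG. 8, may include:
a processor 801, a memory 802, an input device 803, and an output device 804. The audio and video processing device may have one or more processors 801; one processor is taken as an example in FIG. 8. In some embodiments of the present disclosure, the processor 801, the memory 802, the input device 803, and the output device 804 may be connected through a bus or in other ways; connection through a bus is taken as an example in FIG. 8.
The memory 802 can be used to store software programs and modules, and the processor 801 executes the various functional applications and data processing of the audio and video processing device by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like. In addition, the memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. The input device 803 can be used to receive input numeric or character information and to generate signal input related to user settings and function control of the audio and video processing device.
Specifically, in this embodiment, the processor 801 loads the executable files corresponding to the processes of one or more application programs into the memory 802 according to the following instructions, and the processor 801 runs the application programs stored in the memory 802, thereby implementing the various functions of the above audio and video processing device.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.
The above are only specific implementations of the present disclosure, provided to enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not to be limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.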

Claims (15)

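  1. An audio and video processing method, the method comprising:
    displaying text data corresponding to audio and video to be edited, wherein the text data has a mapping relationship with audio and video timestamps of the audio and video to be edited;
    displaying the audio and video to be edited along a timeline track;
    in response to a preset operation triggered on target text data in the text data, determining the audio and video timestamp corresponding to the target text data as a target audio and video timestamp;
    based on the preset operation, processing the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
  2. The method according to claim 1, wherein the method further comprises:
    displaying a first editing entry for a preset keyword or a preset silent segment;
    in response to a trigger operation on the first editing entry, displaying the preset keyword or the preset silent segment in the text data in a preset second display style.
  3. The method according to claim 2, wherein the first editing entry corresponds to a first editing card on which a one-click delete control is provided; and after displaying, in response to the trigger operation on the first editing entry, the preset keyword or the preset silent segment in the text data in the preset second display style, the method further comprises:
    in response to a trigger operation on the one-click delete control, deleting the preset keyword or the preset silent segment from the text data.
  4. The method according to claim 1, wherein the method further comprises:
    displaying a voice enhancement control on a second editing card;
    in response to a trigger operation on the voice enhancement control, enhancing the human voice in the audio and video to be edited.
  5. The method according to claim 1, wherein the method further comprises:
    determining a soundtrack for the audio and video to be edited based on the music genre of the audio and video to be edited and/or the content of the text data corresponding to the audio and video to be edited;
    adding the soundtrack to the audio and video segment to be edited.
  6. The method according to claim 1, wherein the method further comprises:
    displaying a loudness equalization control on a third editing card;
    in response to a trigger operation on the loudness equalization control, normalizing the loudness of the volume in the audio and video to be edited.
  7. The method according to claim 1, wherein the method further comprises:
    displaying a smart teaser control on a fourth editing card;
    in response to a trigger operation on the smart teaser control, adjusting the music volume and the voice volume of the audio and video segment within a preset time period at the beginning of the audio and video to be edited, to obtain a volume-adjusted audio and video segment, wherein the music volume in the volume-adjusted audio and video segment is inversely proportional to the voice volume.
  8. The method according to claim 1, wherein the preset operation comprises a selection operation, and processing, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp comprises:
    displaying, in a preset first display style, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
  9. The method according to claim 1, wherein the preset operation comprises a delete operation, and processing, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp comprises:
    based on the delete operation, deleting the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
  10. The method according to claim 1, wherein the preset operation comprises a modification operation, and processing, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp comprises:
    obtaining modified text data corresponding to the modification operation;
    generating an audio and video segment based on the modified text data and the timbre information in the audio and video to be edited, as a to-be-modified audio and video segment;
    replacing, with the to-be-modified audio and video segment, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
  11. The method according to claim 1, wherein the method further comprises:
    when an adding operation for first text data in the text data is received, generating a first audio and video segment based on the first text data and the timbre information in the audio and video to be edited;
    determining a first audio and video timestamp corresponding to the first text data based on position information of the first text data in the text data;
    adding the first audio and video segment to the audio and video to be edited based on the first audio and video timestamp.
  12. An audio and video processing apparatus, the apparatus comprising:
    a first display module, configured to display text data corresponding to audio and video to be edited, wherein the text data has a mapping relationship with audio and video timestamps of the audio and video to be edited;
    a second display module, configured to display the audio and video to be edited along a timeline track;
    a determining module, configured to, in response to a preset operation triggered on target text data in the text data, determine the audio and video timestamp corresponding to the target text data as a target audio and video timestamp;
    an editing module, configured to process, based on the preset operation, the audio and video segment in the audio and video to be edited that corresponds to the target audio and video timestamp.
  13. A computer-readable storage medium storing instructions that, when run on a terminal device, cause the terminal device to implement the method according to any one of claims 1-11.
  14. A device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1-11.
  15. A computer program product, comprising a computer program/instructions that, when executed by a processor, implement the method according to any one of claims 1-11.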
PCT/CN2022/116650 2021-09-22 2022-09-02 Audio and video processing method, apparatus, device, and storage medium Ceased WO2023045730A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020237044829A KR102919002B1 (ko) 2021-09-22 2022-09-02 오디오/비디오 처리 방법 및 장치, 디바이스 및 저장 매체
JP2023578889A JP7764507B2 (ja) 2021-09-22 2022-09-02 音声ビデオ処理方法、装置、機器及び記憶媒体
EP22871780.7A EP4344225A4 (en) 2021-09-22 2022-09-02 AUDIO/VIDEO PROCESSING METHOD AND APPARATUS, DEVICE AND STORAGE MEDIUM
US18/395,118 US20240127860A1 (en) 2021-09-22 2023-12-22 Audio/video processing method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111109213.4 2021-09-22
CN202111109213.4A CN115914734A (zh) 2021-09-22 2021-09-22 Audio and video processing method, apparatus, device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/395,118 Continuation US20240127860A1 (en) 2021-09-22 2023-12-22 Audio/video processing method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023045730A1 true WO2023045730A1 (zh) 2023-03-30

Family

ID=85719279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/116650 Ceased WO2023045730A1 (zh) 2021-09-22 2022-09-02 Audio and video processing method, apparatus, device, and storage medium

Country Status (6)

Country Link
US (1) US20240127860A1 (zh)
EP (1) EP4344225A4 (zh)
JP (1) JP7764507B2 (zh)
KR (1) KR102919002B1 (zh)
CN (1) CN115914734A (zh)
WO (1) WO2023045730A1 (zh)

Also Published As

Publication number Publication date
JP2024523464A (ja) 2024-06-28
US20240127860A1 (en) 2024-04-18
JP7764507B2 (ja) 2025-11-05
CN115914734A (zh) 2023-04-04
KR102919002B1 (ko) 2026-01-28
KR20240013879A (ko) 2024-01-30
EP4344225A4 (en) 2024-10-02
EP4344225A1 (en) 2024-03-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22871780

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202327087401

Country of ref document: IN

Ref document number: 11202309775Y

Country of ref document: SG

ENP Entry into the national phase

Ref document number: 2023578889

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2022871780

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20237044829

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020237044829

Country of ref document: KR

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112023027241

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 2022871780

Country of ref document: EP

Effective date: 20231222

WWE Wipo information: entry into national phase

Ref document number: 11202309775Y

Country of ref document: SG

ENP Entry into the national phase

Ref document number: 112023027241

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20231222

NENP Non-entry into the national phase

Ref country code: DE