CN113473170A - Live broadcast audio processing method and device, computer equipment and medium - Google Patents

Live broadcast audio processing method and device, computer equipment and medium

Info

Publication number
CN113473170A
CN113473170A (application number CN202110807055.3A)
Authority
CN
China
Prior art keywords
audio stream
audio
stream
voice
live
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110807055.3A
Other languages
Chinese (zh)
Other versions
CN113473170B (en)
Inventor
何思远 (He Siyuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Fanxing Huyu IT Co Ltd
Original Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Fanxing Huyu IT Co Ltd filed Critical Guangzhou Fanxing Huyu IT Co Ltd
Priority to CN202110807055.3A priority Critical patent/CN113473170B/en
Publication of CN113473170A publication Critical patent/CN113473170A/en
Application granted granted Critical
Publication of CN113473170B publication Critical patent/CN113473170B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/262 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4398 Processing of audio elementary streams involving reformatting operations of audio signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44222 Analytics of user selections, e.g. selection of programmes or purchase activity
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the application discloses a live audio processing method and device, a computer device, and a medium, belonging to the field of computer technology. The method includes the following steps: receiving multiple audio streams of a target song during live broadcasting in a live broadcast room, where the multiple audio streams include a human voice audio stream of a target object, a mixed audio stream of the target object, and an original audio stream; playing, in the live broadcast room, based on a first audio stream; and in response to the audio quality information corresponding to the human voice audio stream meeting a switching condition, switching the first audio stream to a second audio stream among the multiple audio streams and playing the second audio stream in the live broadcast room. The method can switch among the human voice audio stream, the mixed audio stream, and the original audio stream, so that a variety of sounds can be played in the live broadcast room, thereby meeting the needs of different audiences and improving the live broadcast effect.

Description

Live broadcast audio processing method and device, computer equipment and medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a live audio processing method and device, computer equipment and a medium.
Background
With the development of audio processing technology, its applications have become increasingly widespread. In the field of live broadcasting, for example, audio processing technology is used to process the anchor's voice, and the processed voice is played in the live broadcast room.
In the related art, the anchor's terminal mixes the anchor's voice with the accompaniment audio of a song and then sends the mix to the audience terminals, so that the audience in the live broadcast room hears the anchor's voice together with the song's accompaniment. In this approach, however, only the voice and accompaniment can be played in the live broadcast room; the played sound is monotonous and cannot meet the needs of different audiences, which degrades the live broadcast effect.
Disclosure of Invention
The embodiment of the application provides a live audio processing method and device, computer equipment and a medium, which realize switching among multiple paths of audio streams and improve the live effect. The technical scheme is as follows:
In one aspect, a live audio processing method is provided, where the method includes:
receiving a plurality of audio streams of a target song in a live broadcasting process of a live broadcasting room, wherein the plurality of audio streams comprise a voice audio stream of a target object, a mixed audio stream of the target object and an original audio stream, and the mixed audio stream is obtained by mixing the voice of the target object and an accompaniment audio of the target song;
playing in the live broadcast room based on a first audio stream, wherein the first audio stream is any one of the multiple audio streams;
and in response to the audio quality information corresponding to the human voice audio stream meeting a switching condition, switching the first audio stream to a second audio stream among the multiple audio streams, and playing in the live broadcast room based on the second audio stream, where the second audio stream is different from the first audio stream and the audio quality information is used to indicate the audio quality of a corresponding audio clip.
In one possible implementation manner, the switching the first audio stream to a second audio stream among the multiple audio streams in response to the audio quality information corresponding to the human voice audio stream meeting a switching condition includes:
switching the first audio stream to the second audio stream in response to an automatic switching function for audio streams being in an on state and the audio quality information corresponding to the human voice audio stream meeting the switching condition.
In another possible implementation manner, before the switching the first audio stream to a second audio stream among the multiple audio streams in response to the audio quality information corresponding to the human voice audio stream meeting a switching condition and playing in the live broadcast room based on the second audio stream, the method further includes:
parsing the stream message corresponding to the human voice audio stream to obtain the audio quality information of each audio clip.
In another possible implementation manner, the playing based on the second audio stream in the live broadcast room includes:
determining, based on a first timestamp of the currently playing audio clip, a second timestamp in the second audio stream adjacent to the first timestamp, the second timestamp being located after the first timestamp;
and when the currently playing audio clip finishes playing, playing, in the live broadcast room, the audio clip corresponding to the second timestamp in the second audio stream.
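The timestamp handover described above can be sketched minimally as follows; the helper name and millisecond offsets are illustrative assumptions, not part of the patent:

```python
def find_second_timestamp(second_stream_timestamps, first_timestamp):
    """Return the earliest timestamp in the second audio stream that lies
    after the timestamp of the currently playing clip, or None if no
    later clip exists. Timestamps are assumed millisecond offsets."""
    later = [ts for ts in second_stream_timestamps if ts > first_timestamp]
    return min(later) if later else None
```

When the current clip finishes, playback would resume from the clip at the returned timestamp, so the switch lands on the adjacent clip rather than replaying already-heard audio.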
In another possible implementation, the method further includes:
and when the voice audio stream is being played, playing based on the original audio stream in response to a trigger operation on the play control associated with the original audio stream.
In another aspect, a live audio processing method is provided, and the method includes:
in the live broadcast process of a live broadcast room, acquiring voice of a target object sent according to a target song to obtain voice audio stream of the target object;
mixing the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object;
and sending the voice audio stream and the mixed audio stream to a target server, wherein the target server is used for acquiring an original audio stream of the target song and sending the voice audio stream, the mixed audio stream and the original audio stream to a first terminal.
In one possible implementation, before sending the human voice audio stream and the mixed audio stream to a target server, the method further includes:
and respectively identifying each audio clip in the human voice audio stream to obtain audio quality information corresponding to the human voice audio stream, wherein the audio quality information is used for representing the audio quality of the corresponding audio clip.
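The patent does not specify how each audio clip is identified and scored; as a purely illustrative stand-in, a per-clip quality proxy could be as simple as normalized RMS energy (a real system would more plausibly use pitch accuracy against the target song or a learned model):

```python
def score_clip(samples):
    """Map a clip of float PCM samples in [-1, 1] to a quality score in
    [0, 1]. RMS energy is only a placeholder heuristic, not the
    identification method claimed by the patent."""
    if not samples:
        return 0.0
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return min(1.0, rms)
```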
In another aspect, a live audio processing system is provided, which includes a first terminal, a target server, and a second terminal;
the second terminal is used for acquiring a voice of a target object sent out according to a target song in a live broadcast process of a live broadcast room, obtaining a voice audio stream of the target object, mixing the voice of the target object with accompaniment audio of the target song to obtain a mixed audio stream of the target object, and sending the voice audio stream and the mixed audio stream to the target server;
the target server is used for acquiring an original audio stream of the target song and sending the voice audio stream, the mixed audio stream and the original audio stream to the first terminal;
the first terminal is configured to receive the human voice audio stream, the mixed audio stream, and the original audio stream in a live broadcast process of the live broadcast room, and to play, in the live broadcast room, based on a first audio stream, where the first audio stream is any one of the multiple audio streams;
the first terminal is configured to switch, in response to that audio quality information corresponding to the human voice audio stream meets a switching condition, the first audio stream to a second audio stream in the multiple audio streams, and play the second audio stream in the live broadcast room based on the second audio stream, where the second audio stream is different from the first audio stream, and the audio quality information is used to indicate audio quality of a corresponding audio clip.
In another aspect, a live audio processing apparatus is provided, the apparatus including:
the audio stream receiving module is used for receiving a plurality of audio streams of a target song in a live broadcasting process of a live broadcasting room, wherein the plurality of audio streams comprise a voice audio stream of a target object, a mixed audio stream of the target object and an original audio stream, and the mixed audio stream is obtained by mixing the voice of the target object and an accompaniment audio of the target song;
the playing module is used for playing in the live broadcast room based on a first audio stream;
and the audio stream switching module is configured to switch the first audio stream to a second audio stream among the multiple audio streams in response to the audio quality information corresponding to the human voice audio stream meeting a switching condition, and to play in the live broadcast room based on the second audio stream, where the second audio stream is different from the first audio stream and the audio quality information is used to indicate the audio quality of a corresponding audio clip.
In a possible implementation manner, the audio stream switching module is configured to switch the first audio stream to the second audio stream in response to that an automatic switching function for an audio stream is in an on state and that audio quality information corresponding to the human voice audio stream meets the switching condition.
In another possible implementation manner, the apparatus further includes:
and the quality information acquisition module is used for analyzing the stream message corresponding to the human voice audio stream to obtain the audio quality information of each audio clip.
In another possible implementation manner, the audio stream switching module includes:
a timestamp determining unit, configured to determine, based on a first timestamp of the currently played audio clip, a second timestamp in the second audio stream adjacent to the first timestamp, where the second timestamp is located after the first timestamp;
and an audio clip playing unit, configured to play, in the live broadcast room, the audio clip corresponding to the second timestamp in the second audio stream when the currently played audio clip finishes playing.
In another possible implementation manner, the playing module is further configured to:
when the voice audio stream is being played, play based on the original audio stream in response to a trigger operation on the play control associated with the original audio stream.
In another aspect, a live audio processing apparatus is provided, the apparatus including:
the voice acquisition module is used for acquiring voice of a target object sent out according to a target song in the live broadcast process of a live broadcast room to obtain voice audio stream of the target object;
the mixing module is used for mixing the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object;
and the audio stream sending module is configured to send the human voice audio stream and the mixed audio stream to a target server, where the target server is configured to acquire an original audio stream of the target song and send the human voice audio stream, the mixed audio stream, and the original audio stream to a first terminal.
In one possible implementation, the apparatus further includes:
and the quality information acquisition module is used for respectively identifying each audio clip in the voice audio stream to obtain audio quality information corresponding to the voice audio stream, wherein the audio quality information is used for representing the audio quality of the corresponding audio clip.
In another aspect, a computer device is provided, which includes a processor and a memory, where at least one program code is stored, and loaded and executed by the processor to implement the operations performed in the live audio processing method according to the above aspect.
In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, the at least one program code being loaded and executed by a processor to implement the operations performed in the live audio processing method according to the above aspect.
In another aspect, a computer program product or computer program is provided, comprising program code stored in a computer-readable storage medium; a processor loads and executes the program code to implement the operations performed in the live audio processing method according to the above aspect.
The method, apparatus, computer device, and storage medium provided by the embodiments of the present application acquire the human voice audio stream, the mixed audio stream, and the original audio stream corresponding to the target song during live broadcasting in the live broadcast room. While playing based on the first audio stream, when the audio quality information corresponding to the human voice audio stream meets the switching condition, the first audio stream is switched to a second audio stream different from the first audio stream. This realizes switching among the multiple audio streams and allows various sounds to be played in the live broadcast room, thereby meeting the needs of different audiences and improving the live broadcast effect.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic diagram of a live audio processing system provided in an embodiment of the present application;
fig. 2 is a flowchart of a live audio processing method provided in an embodiment of the present application;
fig. 3 is a flowchart of another live audio processing method provided in an embodiment of the present application;
fig. 4 is a flowchart of another live audio processing method provided in an embodiment of the present application;
fig. 5 is a schematic diagram of switching audio streams according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an embodiment of the present application for generating an audio stream;
fig. 7 is a schematic structural diagram of a live audio processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of another live audio processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a live audio processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of another live audio processing apparatus provided in an embodiment of the present application;
fig. 11 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
It will be understood that the terms "first," "second," and the like as used herein may be used herein to describe various concepts, which are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, the first switching condition may be referred to as a second switching condition, and the second switching condition may be referred to as the first switching condition without departing from the scope of the present application.
As used herein, at least one includes one, two, or more than two; a plurality includes two or more than two; each refers to each one of the corresponding plurality; and any refers to any one of the plurality. For example, if a plurality of audio clips includes 3 audio clips, each audio clip refers to each of the 3 audio clips, and any audio clip refers to any one of the 3 audio clips, which may be the first, the second, or the third.
Fig. 1 is a schematic diagram of a live audio processing system according to an embodiment of the present application. Referring to fig. 1, the live audio processing system includes at least one first terminal 101 (1 is taken as an example in fig. 1), a target server 102, and a second terminal 103. Each first terminal 101 is connected to the target server 102 through a wireless or wired network, and the second terminal 103 is connected to the target server 102 through a wireless or wired network.
The first terminal 101 and the second terminal 103 have installed thereon a target application served by the target server 102, through which the first terminal 101 and the second terminal 103 can implement functions such as data transmission, message interaction, and the like. Optionally, the first terminal 101 and the second terminal 103 are computers, mobile phones, tablet computers or other terminals. Optionally, the target application is a target application in the operating systems of the first terminal 101 and the second terminal 103, or a target application provided by a third party. For example, the target application is a live application having a live function, and of course, the live application can also have other functions, such as a shopping function, a navigation function, a game function, and the like. Optionally, the target server 102 is a background server of the target application or a cloud server providing services such as cloud computing and cloud storage.
Based on the live audio processing system shown in fig. 1, the method provided in the embodiment of the present application can be applied in a live singing scene. For example, the anchor sings a song in a live broadcast room through the second terminal. The second terminal collects the anchor's voice, mixes it with the accompaniment audio of the song to obtain a mixed audio stream, and sends the human voice audio stream corresponding to the voice together with the mixed audio stream to the live broadcast server. The live broadcast server also obtains the original audio stream of the song and sends the human voice audio stream, the mixed audio stream, and the original audio stream to the first terminal.
Fig. 2 is a flowchart of a live audio processing method according to an embodiment of the present application. The method of this embodiment is performed by the first terminal. Referring to fig. 2, the method includes the following steps:
201. and the first terminal receives the multi-channel audio stream of the target song in the live broadcasting process of the live broadcasting room.
The multiple audio streams include a human voice audio stream of a target object, a mixed audio stream of the target object, and an original audio stream, where the mixed audio stream is obtained by mixing the voice of the target object with the accompaniment audio of the target song. The target object is the anchor of the live broadcast, the target song is any song, the human voice audio stream is obtained by collecting the voice that the target object utters according to the target song, and the original audio stream is the original audio of the target song.
202. The first terminal plays in the live broadcast room based on a first audio stream, wherein the first audio stream is any audio stream in the multi-channel audio stream.
The first audio stream may be any one of the human voice audio stream, the mixed audio stream, or the original audio stream. The first audio stream includes a plurality of audio clips; after the first terminal receives the multiple audio streams, it plays the audio clips in the first audio stream in the live broadcast room.
203. And the first terminal responds that the audio quality information corresponding to the human voice audio stream meets the switching condition, switches the first audio stream into a second audio stream in the multi-channel audio stream, and plays the second audio stream in the live broadcast room based on the second audio stream, wherein the second audio stream is different from the first audio stream.
The second audio stream is an audio stream, among the multiple audio streams, that is different from the first audio stream. For example, if the first audio stream is the human voice audio stream, the second audio stream is either the mixed audio stream or the original audio stream. The audio quality information is used to represent the audio quality of a corresponding audio clip in the human voice audio stream. The switching condition is the condition under which the first audio stream is switched to the second audio stream: if the first audio stream is the human voice audio stream, it is switched to the original audio stream or the mixed audio stream when the audio quality information meets the corresponding switching condition; if the first audio stream is the original audio stream, it is switched to the human voice audio stream or the mixed audio stream when the audio quality information meets the corresponding switching condition; and if the first audio stream is the mixed audio stream, it is switched to the human voice audio stream or the original audio stream when the audio quality information meets the corresponding switching condition.
In the embodiment of the application, the first terminal determines whether to switch the current first audio stream according to the audio quality of each audio clip in the human voice audio stream, and switches the first audio stream to the corresponding second audio stream when the audio quality information meets the switching condition, so that the switching of the audio streams is realized.
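One way to picture this per-clip decision is a simple threshold policy. The stream labels, thresholds, and the direction of each switch below are assumptions for illustration only, since the patent leaves the concrete switching conditions open:

```python
def select_second_stream(first_stream, vocal_quality, good=0.8, poor=0.4):
    """Return the stream to switch to, or the current stream if no
    switching condition is met. Thresholds are illustrative guesses."""
    if vocal_quality >= good and first_stream != "vocal":
        return "vocal"      # vocals are clean: feature the anchor's voice
    if vocal_quality < poor and first_stream != "original":
        return "original"   # vocals are poor: fall back to the original song
    if poor <= vocal_quality < good and first_stream != "mixed":
        return "mixed"      # middling quality: voice plus accompaniment
    return first_stream     # no switching condition met
```

Under this sketch the first terminal evaluates the vocal-quality score of each incoming clip and only changes streams when the score crosses a boundary, which avoids flapping between streams on small fluctuations.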
In the method provided by the embodiment of the present application, the human voice audio stream, the mixed audio stream, and the original audio stream corresponding to the target song are obtained during live broadcasting in the live broadcast room. While playing based on the first audio stream, when the audio quality information corresponding to the human voice audio stream meets the switching condition, the first audio stream is switched to a second audio stream different from the first audio stream. This realizes switching among the multiple audio streams and allows various sounds to be played in the live broadcast room, thereby meeting the needs of different audiences and improving the live broadcast effect.
Fig. 3 is a flowchart of another live audio processing method according to an embodiment of the present application. The method of this embodiment is performed by the second terminal. Referring to fig. 3, the method includes the following steps:
301. and the second terminal acquires the voice of the target object according to the target song in the live broadcast process of the live broadcast room, and obtains the voice audio stream of the target object.
The target object is the anchor of the live broadcast. The anchor sings the target song in the live broadcast room, and the second terminal collects the voice of the target object, thereby obtaining a human voice audio stream, that is, the voice of the anchor singing the song.
302. And the second terminal mixes the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object.
The second terminal obtains the accompaniment audio of the target song and mixes the collected voice with the accompaniment audio to obtain a mixed audio stream; that is, the mixed audio stream contains both the voice and the accompaniment audio.
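A bare-bones sample-level mix of voice and accompaniment might look like the following; the gain values and float-PCM representation are assumptions, and a production mixer would also resample and time-align the two signals before summing:

```python
def mix_frames(voice, accompaniment, voice_gain=1.0, accomp_gain=0.6):
    """Mix two clips of float PCM samples in [-1, 1], clipping the sum
    to the valid range. Gain defaults are illustrative, not from the
    patent."""
    return [max(-1.0, min(1.0, voice_gain * v + accomp_gain * a))
            for v, a in zip(voice, accompaniment)]
```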
303. The second terminal sends the human audio stream and the mixed audio stream to the target server.
The target server can also obtain the original audio stream of the target song, and it sends the human voice audio stream, the mixed audio stream, and the original audio stream to the first terminal, so that the first terminal receives all three audio streams and can switch among them during playback.
In the method provided by the embodiment of the present application, the human voice audio stream is obtained from the collected voice, and the voice is mixed with the accompaniment audio of the target song to obtain the mixed audio stream, so that both the human voice audio stream and the mixed audio stream corresponding to the target song are obtained. These streams are sent to the first terminal through the target server, which can also send the acquired original audio stream to the first terminal. The first terminal can then switch among the received multiple audio streams, so that a variety of sounds can be played in the live broadcast room, improving the live broadcast effect.
Fig. 4 is a flowchart of another live audio processing method according to an embodiment of the present application. The method involves interaction among a first terminal, a second terminal, and a target server. Referring to fig. 4, the method includes the following steps:
401. During the live broadcast in the live broadcast room, the second terminal collects the voice of the target object according to the target song to obtain the human voice audio stream of the target object.
The target object is the anchor of the live broadcast, and the target song is any song. While the target object performs the live broadcast through the second terminal, the second terminal collects the voice produced by the target object and processes it to obtain the human voice audio stream. The second terminal can collect the voice of the target object through a microphone.
In one possible implementation, a target application is installed on the second terminal, for example a live broadcast application, a video application, or another application supporting live broadcast. The target object sings the target song in a live broadcast through the target application, and the second terminal collects the voice produced by the target object.
In a possible implementation, the second terminal collects the voice into a sound card, the sound card applies sound effects to the voice, that is, tunes the voice, and the second terminal then obtains the human voice audio stream based on the processed voice.
In one possible implementation, the second terminal encodes the collected voice based on a target audio protocol to obtain the human voice audio stream, where the human voice audio stream includes a plurality of audio data packets and an audio header declaration. Each audio data packet contains an audio segment corresponding to the voice. The audio header declaration contains audio format information and decoding information and is used to initialize a decoder, so that the decoder decodes the audio data packets based on the audio header declaration to obtain the audio segments they contain. The audio header declaration is sent only once, with the first audio data packet; it does not need to be sent again as long as the audio format is not modified. If the audio format is modified, the audio header declaration is re-sent, the re-sent declaration containing the modified audio format information and decoding information.
For example, the target audio protocol is the Real Time Messaging Protocol (RTMP), a stream-pushing protocol designed around the container structure of Flash Video (FLV). FLV supports an audio format, a video format, and a script information format, and also supports user-defined data types; on the basis of FLV, the user can define data types for the multiple audio streams. For example, see Table 1 below for the data types of FLV:
TABLE 1
[Table 1 is reproduced as images in the original publication (Figure BDA0003166991860000071 and Figure BDA0003166991860000081); it lists the FLV data types together with their identifiers.]
The live audio is the audio mixed with the accompaniment audio. The identifiers on the left side of Table 1 are the identifiers corresponding to each data type in the audio protocol.
The audio header declaration contains the following data:
[The audio header declaration data is reproduced as an image in the original publication (Figure BDA0003166991860000082).]
The audio data packet contains the following data:
[The audio data packet data is reproduced as an image in the original publication (Figure BDA0003166991860000083).]
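The actual header-declaration and packet contents are given only as images in the original publication. As an illustrative sketch only (the field names, sizes, and layout below are assumptions, not the real protocol data), a stream can be serialized so that the audio header declaration is sent once before the first audio data packet:

```python
import struct

def serialize_stream(segments, sample_rate=44100, channels=2):
    """Hypothetical serialization: one header declaration, then data packets.

    The field layout is an illustrative assumption. Re-sending the header
    when the audio format changes is omitted for brevity.
    """
    out = []
    # Audio header declaration: audio format information + decoding
    # information, sent once with the first packet.
    header = struct.pack(">BIB", 0x01, sample_rate, channels)
    out.append(("header", header))
    for ts, payload in segments:
        # Each audio data packet carries one audio segment and its timestamp.
        packet = struct.pack(">I", ts) + payload
        out.append(("packet", packet))
    return out

stream = serialize_stream([(0, b"seg0"), (20, b"seg1")])
```

The receiver uses the single header entry to initialize its decoder, then decodes every subsequent packet against it.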
In addition, in a possible implementation, after obtaining the human voice audio stream, the second terminal identifies each audio segment in the stream to obtain the corresponding audio quality information, and adds the audio quality information to the stream message corresponding to the human voice audio stream. After receiving that stream message, the first terminal can parse it to obtain the audio quality information of each audio segment.
Optionally, the audio quality information is expressed as a reference score: a larger reference score indicates higher audio quality of the audio segment, and a smaller reference score indicates lower audio quality.
Optionally, the second terminal may score each audio segment according to the degree of matching between each audio segment in the human voice audio stream and the original recording of the target song, so as to obtain the reference score: the higher the matching degree, the higher the reference score; the lower the matching degree, the lower the reference score. Alternatively, the second terminal may obtain the reference score of each audio segment in other ways; the embodiment of the application does not limit how the reference score is obtained.
It should be noted that the embodiment of the application takes the second terminal obtaining the audio quality information only as an example. In another embodiment, the second terminal sends the human voice audio stream to the target server, and the target server identifies each audio segment in the human voice audio stream to obtain the corresponding audio quality information and adds it to the stream message corresponding to the human voice audio stream.
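The scoring and stream-message steps above can be sketched as follows. This is a minimal illustration: the pitch-difference scoring rule and the message field names ("quality", "timestamp", "score") are assumptions, since the patent does not fix a concrete scoring algorithm or message schema.

```python
def score_segment(segment_pitch, original_pitch):
    """Assumed scoring: the closer the sung pitch track is to the
    original recording's pitch track, the higher the reference score
    (0-100)."""
    if not original_pitch:
        return 0.0
    diffs = [abs(a - b) for a, b in zip(segment_pitch, original_pitch)]
    avg_diff = sum(diffs) / len(diffs)
    return max(0.0, 100.0 - avg_diff)

def build_stream_message(segments, original):
    """Attach one reference score per audio segment, keyed by timestamp,
    so the receiver can parse per-segment audio quality information."""
    return {"quality": [
        {"timestamp": ts, "score": score_segment(pitch, original[ts])}
        for ts, pitch in segments
    ]}

msg = build_stream_message([(0, [60, 62])], {0: [60, 62]})
```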
402. The second terminal mixes the voice of the target object with the accompaniment audio of the target song to obtain the mixed audio stream of the target object, and sends the human voice audio stream and the mixed audio stream to the target server.
The mixed audio stream includes the human voice and the accompaniment audio of the target song. The accompaniment audio is stored by the second terminal, sent to the second terminal by other equipment, or acquired in other ways, which is not limited in the embodiment of the application.
In a possible implementation, because the voice and the accompaniment audio are two separate channels of audio, it is difficult to ensure that their volumes are mutually adapted when they are mixed directly, so the volume of the voice and the volume of the accompaniment audio need to be adjusted before mixing. The second terminal displays, on the live broadcast interface of the live broadcast room, a third volume control associated with the voice and a fourth volume control associated with the accompaniment audio. The target object can adjust the volume of the voice by adjusting the third volume control, and adjust the volume of the accompaniment audio by adjusting the fourth volume control, until the two volumes are appropriate. The second terminal adjusts the volume of the voice in response to an adjustment operation on the third volume control, adjusts the volume of the accompaniment audio in response to an adjustment operation on the fourth volume control, and then mixes the voice and the accompaniment audio based on the adjusted volumes to obtain the mixed audio stream.
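The volume adjustment and mixing described above can be sketched as follows. This is a minimal illustration assuming 16-bit PCM samples; the gain parameters stand in for the values set via the third and fourth volume controls, which the patent does not specify numerically.

```python
def mix(voice, accompaniment, voice_gain=1.0, accomp_gain=1.0):
    """Mix two equal-length sample streams after applying the volumes
    chosen via the (hypothetical) third and fourth volume controls."""
    mixed = []
    for v, a in zip(voice, accompaniment):
        s = v * voice_gain + a * accomp_gain
        # Clamp to the 16-bit PCM range to avoid overflow after mixing.
        mixed.append(max(-32768, min(32767, int(s))))
    return mixed

out = mix([1000, -2000], [500, 500], voice_gain=0.8, accomp_gain=0.5)
```

Applying the gains before summation is what makes the two sources "mutually adapted": either gain can be changed independently without re-capturing the audio.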
In addition, in a possible implementation, the second terminal plays the corresponding audio segments in the live broadcast room based on the obtained mixed audio stream; that is, the target object can hear the played mix of the voice and the accompaniment audio.
It should be noted that the embodiment of the application takes the second terminal performing the mixing only as an example. In another embodiment, the second terminal sends the collected voice to the target server, and the target server mixes the voice with the accompaniment audio of the target song to obtain the mixed audio stream.
403. The target server obtains an original audio stream of the target song and sends the human voice audio stream, the mixed audio stream and the original audio stream to the first terminal.
In the embodiment of the application, the target server can identify the received human voice audio stream or mixed audio stream to determine the target song, and then acquire the original audio stream of the target song from a song database in the target server, or send a song acquisition request carrying the song identifier of the target song to other computer equipment, which then sends the original audio stream of the target song to the target server. Of course, the target server may also obtain the original audio stream in other ways, which is not limited in the embodiment of the application.
In addition, during the live broadcast, switching between any two of the human voice audio stream, the mixed audio stream, and the original audio stream needs to be possible, so the timestamps of the three streams must be consistent. Absolute timestamps are adopted; that is, the first audio segments of the human voice audio stream, the mixed audio stream, and the original audio stream correspond to the same timestamp. For example, the audio segments in all three streams begin at 0 seconds and end at 300 seconds.
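The absolute-timestamp requirement can be expressed as a simple consistency check. This is a minimal sketch; representing each stream as a list of (timestamp, segment) pairs is an assumption for illustration.

```python
def timestamps_aligned(*streams):
    """All streams must share an identical sequence of absolute
    timestamps, so that any segment index refers to the same moment
    in the human voice, mixed, and original audio streams."""
    ts_lists = [[seg[0] for seg in stream] for stream in streams]
    return all(ts == ts_lists[0] for ts in ts_lists)

ok = timestamps_aligned(
    [(0, "v0"), (1, "v1")],   # human voice audio stream
    [(0, "m0"), (1, "m1")],   # mixed audio stream
    [(0, "o0"), (1, "o1")],   # original audio stream
)
```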
404. The first terminal receives the voice audio stream, the mixed audio stream and the original audio stream sent by the target server.
In the embodiment of the application, to ensure that the audio streams can be switched in time, the first terminal receives all three audio streams sent by the target server, so that it does not need to acquire an audio stream from the target server at the moment of switching. Optionally, the first terminal establishes three audio buffers to store the human voice audio stream, the mixed audio stream, and the original audio stream, so that when switching it can read the buffered streams directly, improving switching efficiency.
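The three-buffer arrangement can be sketched as below. This is a minimal illustration; the class, its method names, and the buffer capacity are assumptions, not part of the patent.

```python
from collections import deque

class StreamBuffers:
    """Three local buffers, one per audio stream, so that switching can
    read segments directly instead of fetching from the server at
    switch time."""

    def __init__(self, maxlen=1024):
        self.buffers = {name: deque(maxlen=maxlen)
                        for name in ("vocal", "mixed", "original")}

    def push(self, name, segment):
        self.buffers[name].append(segment)

    def next_segment(self, name):
        # Returns None when the requested buffer is empty.
        return self.buffers[name].popleft() if self.buffers[name] else None

b = StreamBuffers()
b.push("vocal", (0, b"v0"))
```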
405. The first terminal plays in the live broadcast room based on the first audio stream.
After receiving the three audio streams sent by the target server, the first terminal may decode any one of them to obtain its audio segments, and play the segments in sequence according to their timestamps.
In a possible implementation, after the first terminal first receives the multiple audio streams sent by the target server, it decodes the human voice audio stream by default to obtain its audio segments, and plays them in sequence according to their timestamps.
406. The first terminal switches the first audio stream to a second audio stream in response to the audio quality information corresponding to the human voice audio stream satisfying the switching condition, and plays the second audio stream in the live broadcast room.
In the embodiment of the application, while playing based on any audio stream, the first terminal continuously obtains the audio quality information corresponding to the human voice audio stream and judges whether it satisfies the switching condition. If it does, the first terminal switches the first audio stream to a second audio stream, the second audio stream being different from the first audio stream; if it does not, playing continues based on the first audio stream.
The first audio stream may be the human voice audio stream, the original audio stream, or the mixed audio stream, and under different switching conditions it is switched to a corresponding second audio stream. The following cases are included:
the first method comprises the following steps: the first audio stream is a human voice audio stream.
In one possible implementation, when the audio quality information of a plurality of consecutive audio segments in the human voice audio stream satisfies a first switching condition, the first terminal switches the human voice audio stream to the mixed audio stream and plays based on the mixed audio stream in the live broadcast room. The first switching condition means that the audio quality of the plurality of consecutive audio segments is higher than a reference audio quality, the consecutive segments including the currently played segment. Optionally, when the audio quality information includes reference scores of the audio segments, the first terminal switches the human voice audio stream to the mixed audio stream if the reference scores of a plurality of consecutive audio segments are greater than a first threshold. The first threshold is any value; for example, with a first threshold of 20, when the reference scores of 2 consecutive audio segments are greater than 20 points, the human voice audio stream is switched to the mixed audio stream.
Optionally, the first terminal switches the human voice audio stream to the mixed audio stream when the audio quality information of the first number of audio segments in the human voice audio stream satisfies the first switching condition. Wherein the first number of audio segments is a consecutive number of audio segments, the first number being any number, such as a first number of 2, 3, 4 or other numbers.
Optionally, after switching to the mixed audio stream, the first terminal continues to obtain the audio quality information of consecutive audio segments in the human voice audio stream. While subsequent consecutive segments still satisfy the first switching condition, it continues to play based on the mixed audio stream. When they no longer satisfy the first switching condition, that is, when the audio quality information of a plurality of consecutive segments satisfies the second switching condition, the first terminal switches the mixed audio stream to the human voice audio stream or the original audio stream; the second case below describes how the mixed audio stream is switched to the human voice audio stream or the original audio stream.
In another possible implementation, when the audio quality information of a plurality of consecutive audio segments in the human voice audio stream satisfies a second switching condition, the first terminal switches the human voice audio stream to the original audio stream and plays based on the original audio stream in the live broadcast room. The second switching condition means that the audio quality of the plurality of consecutive audio segments is lower than the reference audio quality, the consecutive segments including the currently played segment. Optionally, when the audio quality information includes reference scores, the first terminal switches the human voice audio stream to the original audio stream when the reference scores of a plurality of consecutive audio segments are smaller than the first threshold. For example, with a first threshold of 20, when the reference scores of 3 consecutive audio segments are less than 20 points, the human voice audio stream is switched to the original audio stream.
Optionally, the first terminal switches the human voice audio stream to the original audio stream when the audio quality information of a second number of audio segments in the human voice audio stream satisfies the second switching condition, where the second number of audio segments are consecutive and the second number is any number, such as 2, 3, 4, or another number.
The second number may be the same as or different from the first number. For example, in order to let the audience in the live broadcast room hear the anchor's singing, the trigger condition for switching to the mixed audio stream may be set more loosely than the trigger condition for switching to the original audio stream, that is, the first number may be made smaller than the second number.
Optionally, after switching to the original audio stream, the first terminal continues to obtain the audio quality information of consecutive audio segments in the human voice audio stream. While subsequent consecutive segments still satisfy the second switching condition, it continues to play based on the original audio stream. When they no longer satisfy the second switching condition, that is, when the audio quality information of a plurality of consecutive segments satisfies the first switching condition, the first terminal switches the original audio stream to the human voice audio stream or the mixed audio stream; the third case below describes how the original audio stream is switched to the human voice audio stream or the mixed audio stream.
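The first and second switching conditions for the human voice audio stream can be sketched as a single decision function. This is a minimal illustration using the example values from the text (threshold 20 points, 2 consecutive segments above it to switch to the mix, 3 consecutive segments below it to switch to the original); the function name and score-window representation are assumptions.

```python
FIRST_THRESHOLD = 20   # example reference-score threshold from the text
FIRST_NUMBER = 2       # consecutive segments above threshold -> mixed
SECOND_NUMBER = 3      # consecutive segments below threshold -> original

def decide_switch(recent_scores):
    """Decide the target stream while the human voice audio stream plays.

    recent_scores: reference scores of the most recent consecutive audio
    segments, ending with the currently played one.
    """
    # First switching condition: recent consecutive segments all above
    # the threshold -> switch to the mixed audio stream.
    if len(recent_scores) >= FIRST_NUMBER and all(
            s > FIRST_THRESHOLD for s in recent_scores[-FIRST_NUMBER:]):
        return "mixed"
    # Second switching condition: recent consecutive segments all below
    # the threshold -> switch to the original audio stream.
    if len(recent_scores) >= SECOND_NUMBER and all(
            s < FIRST_THRESHOLD for s in recent_scores[-SECOND_NUMBER:]):
        return "original"
    return "vocal"
```

Because FIRST_NUMBER is smaller than SECOND_NUMBER, switching to the mix is triggered more loosely than switching to the original, matching the intent that the audience should hear the anchor's singing when its quality allows.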
Second case: the first audio stream is the mixed audio stream.
While playing based on the mixed audio stream, the first terminal obtains in real time the audio quality information of the audio segment in the human voice audio stream corresponding to the currently played segment, and determines based on that information whether the mixed audio stream needs to be switched. The timestamp of the played segment is the same as the timestamp of the obtained segment in the human voice audio stream; for example, if the first terminal is playing the 5th-second audio segment of the mixed audio stream, it obtains the audio quality information of the 5th-second audio segment of the human voice audio stream. When the audio quality information of a plurality of consecutive audio segments in the human voice audio stream satisfies the second switching condition, the first terminal switches the mixed audio stream to the human voice audio stream and plays based on it in the live broadcast room, or switches the mixed audio stream to the original audio stream and plays based on it in the live broadcast room.
Optionally, the first terminal obtains the audio quality information of a third number of consecutive audio segments in the human voice audio stream. When the audio quality of all of the third number of consecutive segments is not higher than the reference audio quality, it switches the mixed audio stream to the original audio stream; when the audio quality of all segments except the last one is not higher than the reference audio quality, it switches the mixed audio stream to the human voice audio stream. That is, when the second switching condition is that the audio quality of the third number of consecutive segments is not higher than the reference audio quality, the mixed audio stream is switched to the original audio stream; and when only the segments other than the last are not higher than the reference audio quality, the mixed audio stream is switched to the human voice audio stream. The third number may be the same as or different from the first and second numbers.
For example, with a third number of 5, the first terminal obtains the audio quality of 5 consecutive audio segments in the current human voice audio stream. When the audio quality of the first 4 segments is not higher than the reference audio quality and that of the 5th segment is higher, it switches the mixed audio stream to the human voice audio stream; when the audio quality of all 5 consecutive segments is not higher than the reference audio quality, it switches the mixed audio stream to the original audio stream.
Third case: the first audio stream is the original audio stream.
While playing based on the original audio stream, the first terminal obtains in real time the audio quality of the audio segment in the human voice audio stream corresponding to the currently played segment, and determines based on the audio quality information whether the original audio stream needs to be switched. The timestamp of the played segment is the same as the timestamp of the obtained segment in the human voice audio stream; for example, if the first terminal is playing the 5th-second audio segment of the original audio stream, it obtains the audio quality of the 5th-second audio segment of the human voice audio stream. When the audio quality information of a plurality of consecutive audio segments in the human voice audio stream satisfies the first switching condition, the first terminal switches the original audio stream to the human voice audio stream and plays based on it in the live broadcast room, or switches the original audio stream to the mixed audio stream and plays based on it in the live broadcast room.
Optionally, the first terminal obtains the audio quality of a fourth number of consecutive audio segments in the human voice audio stream. When the audio quality of all of the fourth number of consecutive segments is higher than the reference audio quality, it switches the original audio stream to the mixed audio stream; when the audio quality of all segments except the last one is higher than the reference audio quality, it switches the original audio stream to the human voice audio stream. That is, when the first switching condition is that the audio quality of the fourth number of consecutive segments is higher than the reference audio quality, the original audio stream is switched to the mixed audio stream; and when only the segments other than the last are higher than the reference audio quality, the original audio stream is switched to the human voice audio stream. The fourth number may be the same as or different from the first, second, and third numbers.
For example, with a fourth number of 5, the first terminal obtains the audio quality of 5 consecutive audio segments in the current human voice audio stream. When the audio quality of the first 4 segments is higher than the reference audio quality and that of the 5th segment is not, it switches the original audio stream to the human voice audio stream; when the audio quality of all 5 consecutive segments is higher than the reference audio quality, it switches the original audio stream to the mixed audio stream.
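The last-segment exception used in the second and third cases follows one pattern: if every segment in the window crosses the reference quality, switch all the way (to the original stream, or to the mix); if every segment except the last crosses it, switch back to the human voice stream. A sketch under the example window size of 5 (the function name and parameters are assumptions):

```python
def decide_from_mixed(qualities, reference, third_number=5):
    """Decision while the mixed audio stream is playing, using the
    last-segment exception: all segments at or below the reference ->
    original stream; all but the last at or below, last above -> back
    to the human voice stream; otherwise keep playing the mix."""
    if len(qualities) < third_number:
        return "mixed"
    window = qualities[-third_number:]
    if all(q <= reference for q in window):
        return "original"
    if all(q <= reference for q in window[:-1]) and window[-1] > reference:
        return "vocal"
    return "mixed"
```

The decision while the original stream is playing is symmetric, with the comparisons reversed (all above the reference switches to the mix, all but the last above switches to the human voice stream).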
The switching manner corresponding to the first switching condition and that corresponding to the second switching condition may each be implemented alone, or the two may be combined. When combined, the switching conditions include both the first and the second switching condition: when the human voice audio stream is currently playing, it is switched to the mixed audio stream if the first switching condition is satisfied, or to the original audio stream if the second switching condition is satisfied; when the mixed audio stream is currently playing, it is switched to the human voice audio stream or the original audio stream if the second switching condition is satisfied; and when the original audio stream is currently playing, it is switched to the human voice audio stream or the mixed audio stream if the first switching condition is satisfied.
In one possible implementation, when the audio quality information satisfies the switching condition but playback has reached the last audio segment of the human voice audio stream, the first terminal displays prompt information indicating that switching the audio stream failed. That is, when playing the last audio segment, no audio stream switching is performed even if the audio quality satisfies the switching condition.
In a possible implementation, the first terminal must first enable an automatic switching function in the live broadcast room: automatic switching of the audio stream is performed when the function is enabled, and not performed when it is not. That is, the first terminal switches the first audio stream to the second audio stream in response to the automatic switching function for the audio stream being enabled and the audio quality information corresponding to the human voice audio stream satisfying the switching condition.
Optionally, the live broadcast interface of the live broadcast room displayed by the first terminal includes an automatic switching control. When the automatic switching function is disabled, the user triggers the automatic switching control, and the terminal, in response to the triggering operation, sets the automatic switching function to the enabled state, thereby enabling automatic switching.
The above describes the first terminal automatically switching the audio stream according to the audio quality information; in another embodiment, the user may switch the audio stream manually. The live broadcast interface of the first terminal's live broadcast room includes a play control for each audio stream. The user triggers the play control associated with the second audio stream, and the first terminal, in response to the triggering operation, switches the first audio stream to the second audio stream and plays the second audio stream in the live broadcast room. For example, while the first terminal is playing audio segments of the human voice audio stream, the user triggers the play control associated with the mixed audio stream, and the first terminal switches the human voice audio stream to the mixed audio stream and plays its audio segments; or the user triggers the play control associated with the original audio stream, and the first terminal switches the human voice audio stream to the original audio stream and plays its audio segments.
Whether audio streams are switched automatically or manually, when playing based on the switched-to second audio stream the continuity of the audio before and after switching must be ensured; that is, switching must keep the timestamps continuous, stable, and monotonically increasing. For example, referring to fig. 5, the first terminal decodes the received human voice audio stream, mixed audio stream, and original audio stream. The timestamps of four audio segments of the human voice audio stream are ID1, ID2, ID3, and ID4; those of four segments of the mixed audio stream are T1, T2, T3, and T4; and those of four segments of the original audio stream are F1, F2, F3, and F4, where ID1, T1, and F1 are equal, ID2, T2, and F2 are equal, ID3, T3, and F3 are equal, and ID4, T4, and F4 are equal. Switching is performed so that the timestamps increase in sequence.
Therefore, in a possible implementation, during switching the first terminal determines, based on the first timestamp of the currently played audio segment, a second timestamp in the second audio stream that is adjacent to and after the first timestamp. When the currently played segment finishes, the audio segment corresponding to the second timestamp in the second audio stream is played in the live broadcast room, ensuring that the segment after switching is adjacent to the segment before switching.
For example, when switching from the human voice audio stream to the mixed audio stream, the timestamps of four segments of the human voice audio stream are ID1, ID2, ID3, and ID4, the segment corresponding to ID1 is currently playing, and the timestamps of four segments of the mixed audio stream are T1, T2, T3, and T4. There are three cases at switching time. First case: if T1 < ID1 < T2 < ID2, the audio segment corresponding to T2 is played after switching. Second case: if ID1 < ID2 < T1 < T2, the stream is switched after the segment corresponding to ID2 finishes playing, and the segment corresponding to T1 is played after switching. Third case: if T1 < T2 < T3 < T4 < ID1, the audio streams are not switched.
For example, if the first timestamp of the currently played segment of the human voice audio stream is 50 seconds and the adjacent second timestamp in the original audio stream is 51 seconds, the segment corresponding to 51 seconds in the original audio stream is played when the current segment finishes. As another example, if the first timestamp of the segment in the human voice audio stream before switching is 50 seconds and the adjacent second timestamp in the original audio stream is 52 seconds, the segment corresponding to 52 seconds in the original audio stream is played after the current segment and the segment corresponding to 51 seconds finish playing.
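Finding the second timestamp described above amounts to a lookup for the first timestamp in the target stream strictly after the currently played one. A minimal sketch (the function name and list-of-timestamps representation are assumptions):

```python
import bisect

def next_switch_timestamp(target_timestamps, current_ts):
    """Find the 'second timestamp': the first timestamp in the target
    stream strictly after the currently played segment's timestamp.
    target_timestamps must be sorted in increasing order."""
    i = bisect.bisect_right(target_timestamps, current_ts)
    return target_timestamps[i] if i < len(target_timestamps) else None

# Currently at 50 s in the human voice stream; the original stream has
# one segment per second over a 300-second song.
ts = next_switch_timestamp(list(range(0, 300)), 50)
```

Returning None when no later timestamp exists corresponds to the case where playback has reached the last segment and no switch is performed.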
In addition to the audio stream switching modes described above, while playing based on the human voice audio stream the first terminal can respond to a triggering operation on the play control associated with the original audio stream by also playing based on the original audio stream. Optionally, the first terminal decodes the human voice audio stream and the original audio stream separately to obtain the audio segments of each, mixes the decoded segments of the human voice audio stream with those of the original audio stream to obtain mixed audio, and plays the mixed audio.
Optionally, to ensure that the volume of each audio segment in the human voice audio stream is matched to the volume of each audio segment in the original audio stream, the live broadcast interface of the live broadcast room displayed by the first terminal includes a first volume control associated with the human voice audio stream and a second volume control associated with the original audio stream, and the user can adjust these volume controls to adjust the volume of the corresponding audio stream. When the user performs an adjustment operation on the first volume control, the first terminal adjusts the volume of the human voice audio stream in response to that operation; when the user performs an adjustment operation on the second volume control, the first terminal adjusts the volume of the original audio stream in response to that operation.
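A minimal sketch of how the two volume controls could act on the decoded segments before mixing. This is illustrative only; the patent does not specify a sample format, so 16-bit signed PCM samples and linear gains are assumed here:

```python
def mix_with_gains(vocal_samples, accomp_samples, vocal_gain=1.0, accomp_gain=1.0):
    """Mix two equal-length lists of 16-bit PCM samples, applying the
    gain set by the first (human voice) and second (original/accompaniment)
    volume controls, and clip the sum to the 16-bit range."""
    mixed = []
    for v, a in zip(vocal_samples, accomp_samples):
        s = int(v * vocal_gain + a * accomp_gain)
        mixed.append(max(-32768, min(32767, s)))  # clip to int16 range
    return mixed

# Halving the accompaniment volume relative to the vocal:
print(mix_with_gains([1000, -1000], [500, 500], vocal_gain=1.0, accomp_gain=0.5))
# -> [1250, -750]
```

Clipping after summation keeps the mix within range when both controls are near maximum, at the cost of distortion; a production mixer would more likely normalize or soft-limit instead.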
Referring to the schematic diagram of audio stream generation shown in fig. 6, in the related art the sound card processes the voice of the target object by hard mixing: the sound card collects the voice, acquires the accompaniment audio of the target song, outputs a single mixed audio stream in which the voice and the accompaniment audio are mixed, and sends the mixed audio stream to the first terminal through the target server; the first terminal receives only the mixed audio stream and cannot receive the voice alone. In the embodiment of this application, the sound card instead processes the collected voice of the target object by soft mixing: the sound card collects the voice, acquires the accompaniment audio of the target song, and outputs both a human voice audio stream and a mixed audio stream, which are sent to the first terminal through the target server; the target server can also acquire the original audio stream and send it to the first terminal.
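The difference between hard and soft mixing can be illustrated with a sketch: hard mixing exposes only the already-mixed signal, while soft mixing keeps the dry vocal stream available alongside the mix. This is a hypothetical illustration; the patent does not describe the sound-card processing at this level of detail:

```python
def hard_mix(vocal, accompaniment):
    """Hard mixing: only the already-mixed 16-bit PCM stream leaves the sound card."""
    return [max(-32768, min(32767, v + a)) for v, a in zip(vocal, accompaniment)]

def soft_mix(vocal, accompaniment):
    """Soft mixing: the dry human voice stream is output alongside the mix,
    so both streams can be sent to the first terminal via the target server."""
    return vocal, hard_mix(vocal, accompaniment)

vocal_stream, mixed_stream = soft_mix([100, 200], [10, 20])
print(vocal_stream)   # -> [100, 200]
print(mixed_stream)   # -> [110, 220]
```

With hard mixing, the vocal samples are consumed inside the mixer and cannot be recovered downstream; soft mixing is what makes per-stream switching and per-stream volume control at the first terminal possible.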
In the method provided by the embodiment of this application, during live broadcasting in a live broadcast room, the second terminal obtains a human voice audio stream from the collected voice and mixes the voice with the accompaniment audio of the target song to obtain a mixed audio stream, thereby obtaining the human voice audio stream and the mixed audio stream corresponding to the target song. These streams are sent to the first terminal through the target server, and the target server also sends the acquired original audio stream to the first terminal, so the first terminal obtains the human voice audio stream, the mixed audio stream, and the original audio stream. While playing based on a first audio stream, when the audio quality information corresponding to the human voice audio stream meets the switching condition, the first terminal can switch the first audio stream to a second audio stream different from the first audio stream. This realizes switching among multiple audio streams, so that various sounds can be played in the live broadcast room, thereby meeting the needs of different audiences and improving the live broadcast effect.
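The end-to-end decision summarized above — fall back to a different stream when the vocal quality degrades — might be sketched as follows. The numeric quality score, the threshold, and the stream names are assumptions for illustration; the patent leaves the concrete switching condition open:

```python
def choose_stream(current_stream, vocal_quality, threshold=0.6, auto_switch=True):
    """Return the stream the first terminal should play next.

    If automatic switching is enabled and the per-segment quality score of
    the human voice audio stream falls below the threshold (one possible
    'switching condition'), fall back from a vocal-dependent stream to the
    original audio stream; otherwise keep playing the current stream.
    """
    if auto_switch and current_stream in ("vocal", "mixed") and vocal_quality < threshold:
        return "original"
    return current_stream

print(choose_stream("mixed", vocal_quality=0.3))                     # -> original
print(choose_stream("mixed", vocal_quality=0.9))                     # -> mixed
print(choose_stream("mixed", vocal_quality=0.3, auto_switch=False))  # -> mixed
```

The `auto_switch` flag mirrors the "automatic switching function in an on state" condition of the embodiments; with it off, the terminal keeps the current stream regardless of quality.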
Fig. 7 is a schematic structural diagram of a live audio processing apparatus according to an embodiment of the present application. Referring to fig. 7, the apparatus includes:
the audio stream receiving module 701 is configured to receive multiple audio streams of a target song during a live broadcast process in a live broadcast room, where the multiple audio streams include a vocal audio stream of a target object, a mixed audio stream of the target object, and an original audio stream, and the mixed audio stream is obtained by mixing a vocal sound of the target object and an accompaniment audio of the target song;
a playing module 702, configured to play in a live broadcast room based on a first audio stream;
the audio stream switching module 703 is configured to switch, in response to the audio quality information corresponding to the human voice audio stream meeting a switching condition, the first audio stream to a second audio stream in the multiple audio streams, and to play in the live broadcast room based on the second audio stream, where the second audio stream is different from the first audio stream, and the audio quality information is used to represent the audio quality of the corresponding audio segment.
With the apparatus provided by this embodiment of the application, during live broadcasting in a live broadcast room, the human voice audio stream, the mixed audio stream, and the original audio stream corresponding to the target song are acquired, and while playing based on a first audio stream, when the audio quality information corresponding to the human voice audio stream meets the switching condition, the first audio stream can be switched to a second audio stream different from it. This realizes switching among multiple audio streams and allows various sounds to be played in the live broadcast room, thereby meeting the needs of different audiences and improving the live broadcast effect.
In another possible implementation manner, referring to fig. 8, the audio stream switching module 703 is configured to switch the first audio stream to the second audio stream in response to the automatic switching function for audio streams being in an on state and the audio quality information corresponding to the human voice audio stream meeting the switching condition.
In another possible implementation, referring to fig. 8, the apparatus further includes:
the quality information obtaining module 704 is configured to parse the stream packets corresponding to the human voice audio stream to obtain the audio quality information of each audio segment.
In another possible implementation manner, referring to fig. 8, the audio stream switching module 703 includes:
a timestamp determining unit 7031, configured to determine, based on a first timestamp of a currently played audio segment, a second timestamp adjacent to the first timestamp in the second audio stream, where the second timestamp is located after the first timestamp;
and an audio clip playing unit 7032, configured to play, in the live broadcast room, the audio clip corresponding to the second timestamp in the second audio stream when the currently played audio clip finishes playing.
In another possible implementation manner, the playing module 702 is further configured to:
and, when playing based on the human voice audio stream, responding to the trigger operation on the playing control associated with the original audio stream by playing based on the original audio stream.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
It should be noted that: in the live audio processing apparatus provided in the foregoing embodiment, when processing live audio, only the division of the functional modules is exemplified, and in practical applications, the functions may be distributed by different functional modules as needed, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the live audio processing apparatus and the live audio processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
Fig. 9 is a schematic structural diagram of a live audio processing apparatus according to an embodiment of the present application. Referring to fig. 9, the apparatus includes:
a voice acquisition module 901, configured to acquire a voice of a target object sent according to a target song in a live broadcast process in a live broadcast room, so as to obtain a voice audio stream of the target object;
a mixing module 902, configured to mix a vocal sound of the target object with an accompaniment audio of the target song to obtain a mixed audio stream of the target object;
the audio stream sending module 903 is configured to send the human voice audio stream and the mixed audio stream to the target server, where the target server is configured to acquire the original audio stream of the target song and send the human voice audio stream, the mixed audio stream, and the original audio stream to the first terminal.
With the apparatus provided by this embodiment of the application, the human voice audio stream is obtained from the collected voice, and the voice is mixed with the accompaniment audio of the target song to obtain the mixed audio stream, thereby obtaining the human voice audio stream and the mixed audio stream corresponding to the target song. The obtained streams are sent to the first terminal through the target server, and the target server can also send the acquired original audio stream to the first terminal, so that the first terminal can subsequently switch audio streams based on the received multiple audio streams, allowing various sounds to be played in the live broadcast room and improving the live broadcast effect.
In another possible implementation, referring to fig. 10, the apparatus further includes:
the quality information obtaining module 904 is configured to identify each audio segment in the human voice audio stream, to obtain audio quality information corresponding to the human voice audio stream, where the audio quality information is used to indicate audio quality of the corresponding audio segment.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
It should be noted that: in the live audio processing apparatus provided in the foregoing embodiment, when processing live audio, only the division of the functional modules is exemplified, and in practical applications, the functions may be distributed by different functional modules as needed, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the live audio processing apparatus and the live audio processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
The embodiment of the present application further provides a terminal, where the terminal includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor, so as to implement the operation executed in the live audio processing method of the foregoing embodiment.
Fig. 11 is a schematic structural diagram of a terminal 1100 according to an embodiment of the present application. The terminal 1100 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 1100 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
The terminal 1100 includes: a processor 1101 and a memory 1102.
Processor 1101 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1101 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor: the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering the content that the display screen needs to display. In some embodiments, processor 1101 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 can also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one program code for execution by processor 1101 to implement the live audio processing method provided by method embodiments herein.
In some embodiments, the terminal 1100 may further include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102 and peripheral interface 1103 may be connected by a bus or signal lines. Various peripheral devices may be connected to the peripheral interface 1103 by buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, display screen 1105, camera assembly 1106, audio circuitry 1107, positioning assembly 1108, and power supply 1109.
The peripheral interface 1103 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102 and the peripheral device interface 1103 may be implemented on separate chips or circuit boards, which is not limited by this embodiment.
The Radio Frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electric signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1104 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, the display screen 1105 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 1101 as a control signal for processing. At this point, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, disposed on the front panel of terminal 1100; in other embodiments, there may be at least two display screens 1105, respectively disposed on different surfaces of the terminal 1100 or in a folded design; in still other embodiments, display 1105 may be a flexible display disposed on a curved or folded surface of terminal 1100. The display screen 1105 may even be arranged in a non-rectangular, irregular pattern, i.e., an irregularly shaped screen. The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
Camera assembly 1106 is used to capture images or video. Optionally, camera assembly 1106 includes a front camera and a rear camera. The front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1106 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 1107 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1101 for processing or inputting the electric signals to the radio frequency circuit 1104 to achieve voice communication. For stereo capture or noise reduction purposes, multiple microphones may be provided, each at a different location of terminal 1100. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1107 may also include a headphone jack.
Positioning component 1108 is used to locate the current geographic position of terminal 1100 for purposes of navigation or LBS (Location Based Service). The Positioning component 1108 may be a Positioning component based on the united states GPS (Global Positioning System), the chinese beidou System, the russian glonass Positioning System, or the european union galileo Positioning System.
Power supply 1109 is configured to provide power to various components within terminal 1100. The power supply 1109 may be alternating current, direct current, disposable or rechargeable. When the power supply 1109 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 1100 can also include one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyro sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.
Acceleration sensor 1111 may detect acceleration levels in three coordinate axes of a coordinate system established with terminal 1100. For example, the acceleration sensor 1111 may be configured to detect components of the gravitational acceleration in three coordinate axes. The processor 1101 may control the display screen 1105 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1111. The acceleration sensor 1111 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1112 may detect a body direction and a rotation angle of the terminal 1100, and the gyro sensor 1112 may cooperate with the acceleration sensor 1111 to acquire a 3D motion of the user with respect to the terminal 1100. From the data collected by gyroscope sensor 1112, processor 1101 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensor 1113 may be disposed on a side bezel of terminal 1100 and/or underlying display screen 1105. When the pressure sensor 1113 is disposed on the side frame of the terminal 1100, the holding signal of the terminal 1100 from the user can be detected, and the processor 1101 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed at the lower layer of the display screen 1105, the processor 1101 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1105. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1114 is configured to collect a fingerprint of the user, and the processor 1101 identifies the user according to the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the user is authorized by the processor 1101 to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. Fingerprint sensor 1114 may be disposed on the front, back, or side of terminal 1100. When a physical button or vendor Logo is provided on the terminal 1100, the fingerprint sensor 1114 may be integrated with the physical button or vendor Logo.
Optical sensor 1115 is used to collect ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the display screen 1105 based on the ambient light intensity collected by the optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the display screen 1105 is reduced. In another embodiment, processor 1101 may also dynamically adjust the shooting parameters of camera assembly 1106 based on the ambient light intensity collected by optical sensor 1115.
A proximity sensor 1116, also referred to as a distance sensor, is provided on the front panel of terminal 1100. Proximity sensor 1116 is used to capture the distance between the user and the front face of terminal 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal 1100 gradually decreases, the processor 1101 controls the display screen 1105 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal 1100 gradually increases, the processor 1101 controls the display screen 1105 to switch from the dark-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 11 does not constitute a limitation of terminal 1100, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
The embodiment of the present application further provides a server, where the server includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor, so as to implement the operations executed in the live audio processing method of the foregoing embodiment.
Fig. 12 is a schematic structural diagram of a server 1200 according to an embodiment of the present application. The server 1200 may vary considerably in configuration and performance, and may include one or more processors (CPUs) 1201 and one or more memories 1202, where the memory 1202 stores at least one program code, and the at least one program code is loaded and executed by the processor 1201 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server may also include other components for implementing the functions of the device, which are not described herein again.
The embodiment of the present application further provides a computer-readable storage medium, where at least one program code is stored in the computer-readable storage medium, and the at least one program code is loaded and executed by a processor to implement the operations executed in the live audio processing method of the foregoing embodiment.
Embodiments of the present application also provide a computer program product or computer program comprising program code stored in a computer-readable storage medium. The program code is loaded and executed by a processor to implement the operations performed in the live audio processing method of the above-described embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only an alternative embodiment of the present application and is not intended to limit the present application, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (12)

1. A method of live audio processing, the method comprising:
receiving a plurality of audio streams of a target song in a live broadcasting process of a live broadcasting room, wherein the plurality of audio streams comprise a voice audio stream of a target object, a mixed audio stream of the target object and an original audio stream, and the mixed audio stream is obtained by mixing the voice of the target object and an accompaniment audio of the target song;
playing in the live broadcast room based on a first audio stream, wherein the first audio stream is any one of the multiple audio streams;
and in response to the audio quality information corresponding to the human voice audio stream meeting a switching condition, switching the first audio stream to a second audio stream in the plurality of audio streams, and playing in the live broadcast room based on the second audio stream, wherein the second audio stream is different from the first audio stream, and the audio quality information is used for representing the audio quality of a corresponding audio clip.
2. The method of claim 1, wherein switching the first audio stream to the second audio stream of the multiple audio streams in response to the audio quality information corresponding to the human voice audio stream satisfying a switching condition comprises:
and switching the first audio stream to the second audio stream in response to that an automatic switching function for the audio stream is in an on state and the audio quality information corresponding to the human voice audio stream meets the switching condition.
3. The method of claim 1, wherein before the first audio stream is switched to the second audio stream of the plurality of audio streams in response to the audio quality information corresponding to the vocal audio stream satisfying the switching condition, and before playing in the live broadcast room based on the second audio stream, the method further comprises:
parsing the stream packets corresponding to the vocal audio stream to obtain the audio quality information of each audio clip.
4. The method of any of claims 1-3, wherein the playing based on the second audio stream in the live broadcast room comprises:
determining a second timestamp in the second audio stream adjacent to the first timestamp based on a first timestamp of a currently playing audio clip, the second timestamp being located after the first timestamp;
and when the currently played audio clip is played, playing the audio clip corresponding to the second timestamp in the second audio stream in the live broadcast room.
5. The method of claim 1, further comprising:
and when the voice audio stream is played, responding to the triggering operation of the playing control associated with the original audio stream, and playing based on the original audio stream.
6. A method of live audio processing, the method comprising:
in the live broadcast process of a live broadcast room, acquiring voice of a target object sent according to a target song to obtain voice audio stream of the target object;
mixing the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object;
and sending the voice audio stream and the mixed audio stream to a target server, wherein the target server is used for acquiring an original audio stream of the target song and sending the voice audio stream, the mixed audio stream and the original audio stream to a first terminal.
7. The method of claim 6, wherein prior to sending the vocal audio stream and the mixed audio stream to a target server, the method further comprises:
and respectively identifying each audio clip in the human voice audio stream to obtain audio quality information corresponding to the human voice audio stream, wherein the audio quality information is used for representing the audio quality of the corresponding audio clip.
8. A live audio processing system is characterized by comprising a first terminal, a target server and a second terminal;
the second terminal is used for acquiring a voice of a target object sent out according to a target song in a live broadcast process of a live broadcast room, obtaining a voice audio stream of the target object, mixing the voice of the target object with accompaniment audio of the target song to obtain a mixed audio stream of the target object, and sending the voice audio stream and the mixed audio stream to the target server;
the target server is used for acquiring an original audio stream of the target song and sending the voice audio stream, the mixed audio stream and the original audio stream to the first terminal;
the first terminal is configured to receive the vocal audio stream, the mixed audio stream, and the original audio stream in the live broadcast process of the live broadcast room, and to play in the live broadcast room based on a first audio stream, wherein the first audio stream is any one of the received multiple audio streams;
the first terminal is configured to switch, in response to that audio quality information corresponding to the human voice audio stream meets a switching condition, the first audio stream to a second audio stream in the multiple audio streams, and play the second audio stream in the live broadcast room based on the second audio stream, where the second audio stream is different from the first audio stream, and the audio quality information is used to indicate audio quality of a corresponding audio clip.
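The fallback behavior described for the first terminal can be sketched as a simple priority chain: keep the vocal stream while quality holds, otherwise step down toward the original recording. The stream names and the single-step demotion policy are assumptions for illustration; the claims only require switching to a different stream when the condition is met.

```python
def choose_stream(current: str, quality_ok: bool) -> str:
    """Keep playing the current stream while the vocal quality is acceptable;
    otherwise demote one step along vocal -> mixed -> original."""
    order = ["vocal", "mixed", "original"]
    if quality_ok:
        return current
    idx = order.index(current)
    return order[min(idx + 1, len(order) - 1)]
```

Because all three streams are delivered to the terminal in parallel, such a switch can be made locally per segment without any round trip to the server.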
9. A live audio processing apparatus, comprising:
an audio stream receiving module, configured to receive a plurality of audio streams of a target song during a live broadcast in a live broadcast room, where the plurality of audio streams comprise a vocal audio stream of a target object, a mixed audio stream of the target object, and an original audio stream, the mixed audio stream being obtained by mixing the voice of the target object with the accompaniment audio of the target song;
a playing module, configured to play audio in the live broadcast room based on a first audio stream, where the first audio stream is any one of the plurality of audio streams;
and an audio stream switching module, configured to switch, in response to audio quality information corresponding to the vocal audio stream meeting a switching condition, the first audio stream to a second audio stream in the plurality of audio streams and play audio in the live broadcast room based on the second audio stream, where the second audio stream is different from the first audio stream, and the audio quality information indicates the audio quality of a corresponding audio segment.
10. A live audio processing apparatus, comprising:
a voice collection module, configured to collect, during a live broadcast in a live broadcast room, the voice that a target object produces according to a target song to obtain a vocal audio stream of the target object;
a mixing module, configured to mix the voice of the target object with the accompaniment audio of the target song to obtain a mixed audio stream of the target object;
and an audio stream sending module, configured to send the vocal audio stream and the mixed audio stream to a target server, where the target server is configured to acquire an original audio stream of the target song and send the vocal audio stream, the mixed audio stream, and the original audio stream to a first terminal.
11. A computer device, comprising a processor and a memory, wherein the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to perform the operations performed in the live audio processing method of any one of claims 1 to 5, or to perform the operations performed in the live audio processing method of claim 6 or 7.
12. A computer-readable storage medium, wherein at least one program code is stored in the storage medium, and the at least one program code is loaded and executed by a processor to implement the operations performed in the live audio processing method of any one of claims 1 to 5, or to implement the operations performed in the live audio processing method of claim 6 or 7.
CN202110807055.3A 2021-07-16 2021-07-16 Live audio processing method, device, computer equipment and medium Active CN113473170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110807055.3A CN113473170B (en) 2021-07-16 2021-07-16 Live audio processing method, device, computer equipment and medium


Publications (2)

Publication Number Publication Date
CN113473170A true CN113473170A (en) 2021-10-01
CN113473170B CN113473170B (en) 2023-08-25

Family

ID=77880873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110807055.3A Active CN113473170B (en) 2021-07-16 2021-07-16 Live audio processing method, device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN113473170B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220409A (en) * 2021-12-14 2022-03-22 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and computer device
CN116170613A (en) * 2022-09-08 2023-05-26 腾讯音乐娱乐科技(深圳)有限公司 Audio stream processing method, computer device and computer program product

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000003183A (en) * 1999-06-07 2000-01-07 Yamaha Corp Karaoke machine
CN105161120A (en) * 2015-08-27 2015-12-16 广州酷狗计算机科技有限公司 Original and accompanying singing switching method and apparatus
CN106548792A (en) * 2015-09-17 2017-03-29 阿里巴巴集团控股有限公司 Intelligent sound box device, mobile terminal and music processing method
WO2017101260A1 (en) * 2015-12-15 2017-06-22 广州酷狗计算机科技有限公司 Method, device, and storage medium for audio switching
CN107093419A (en) * 2016-02-17 2017-08-25 广州酷狗计算机科技有限公司 A kind of dynamic vocal accompaniment method and apparatus
CN106024033A (en) * 2016-06-15 2016-10-12 北京小米移动软件有限公司 Playing control method and apparatus
WO2018130577A1 (en) * 2017-01-10 2018-07-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio decoder, audio encoder, method for providing a decoded audio signal, method for providing an encoded audio signal, audio stream, audio stream provider and computer program using a stream identifier
CN108848394A (en) * 2018-07-27 2018-11-20 广州酷狗计算机科技有限公司 Net cast method, apparatus, terminal and storage medium
CN109348239A (en) * 2018-10-18 2019-02-15 北京达佳互联信息技术有限公司 Live broadcast segment processing method, device, electronic equipment and storage medium
WO2020078142A1 (en) * 2018-10-18 2020-04-23 北京达佳互联信息技术有限公司 Live streaming segment processing method and apparatus, electronic device and storage medium
CN110267081A (en) * 2019-04-02 2019-09-20 北京达佳互联信息技术有限公司 Live stream processing method, device, system, electronic equipment and storage medium
US20200234684A1 (en) * 2019-04-02 2020-07-23 Beijing Dajia Internet Information Technology Co., Ltd. Live stream processing method, apparatus, system, electronic apparatus and storage medium

Also Published As

Publication number Publication date
CN113473170B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
CN108093268B (en) Live broadcast method and device
CN109348247B (en) Method and device for determining audio and video playing time stamp and storage medium
CN111246236B (en) Interactive data playing method, device, terminal, server and storage medium
CN113596516B (en) Method, system, equipment and storage medium for chorus of microphone and microphone
CN111464830B (en) Method, device, system, equipment and storage medium for image display
CN111083507B (en) Method and system for connecting to wheat, first main broadcasting terminal, audience terminal and computer storage medium
CN111093108B (en) Sound and picture synchronization judgment method and device, terminal and computer readable storage medium
CN111355974A (en) Method, apparatus, system, device and storage medium for virtual gift giving processing
CN112492331B (en) Live broadcast method, device, system and storage medium
CN109874043B (en) Video stream sending method, video stream playing method and video stream playing device
CN110139116B (en) Live broadcast room switching method and device and storage medium
CN112118477A (en) Virtual gift display method, device, equipment and storage medium
CN109587549B (en) Video recording method, device, terminal and storage medium
CN110958464A (en) Live broadcast data processing method and device, server, terminal and storage medium
CN111010588A (en) Live broadcast processing method, device, storage medium and device
CN110808021A (en) Audio playing method, device, terminal and storage medium
CN107896337B (en) Information popularization method and device and storage medium
CN108900921A (en) Even wheat live broadcasting method, device and storage medium
CN111726670A (en) Information interaction method, device, terminal, server and storage medium
CN113473170B (en) Live audio processing method, device, computer equipment and medium
CN111586444B (en) Video processing method and device, electronic equipment and storage medium
CN111294551B (en) Method, device and equipment for audio and video transmission and storage medium
CN113141538A (en) Media resource playing method, device, terminal, server and storage medium
CN111131272B (en) Scheduling method, device and system of stream server, computing equipment and storage medium
CN116206584B (en) Chorus processing method, server, terminal, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant