WO2022156336A1 - Audio data processing method, apparatus, device, storage medium, and program product - Google Patents

Audio data processing method, apparatus, device, storage medium, and program product

Info

Publication number
WO2022156336A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal processing
result
processing strategy
optimization
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/131404
Other languages
English (en)
French (fr)
Inventor
曹木勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to JP2023544240A (patent JP7597300B2, ja)
Priority to KR1020237027570A (patent KR20230130730A, ko)
Priority to EP21920712.3A (patent EP4283617A4, en)
Publication of WO2022156336A1 (patent WO2022156336A1, zh)
Priority to US17/991,239 (patent US12477069B2, en)
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03G CONTROL OF AMPLIFICATION
    • H03G3/00 Gain control in amplifiers or frequency changers
    • H03G3/20 Automatic control
    • H03G3/30 Automatic control in amplifiers having semiconductor devices
    • H03G3/3005 Automatic control in amplifiers having semiconductor devices in amplifiers suitable for low-frequencies, e.g. audio amplifiers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/85 Providing additional services to players
    • A63F13/87 Communicating with other players during game play, e.g. by e-mail or chat
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03G CONTROL OF AMPLIFICATION
    • H03G3/00 Gain control in amplifiers or frequency changers
    • H03G3/20 Automatic control
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Definitions

  • the present application relates to the field of computer technology, and in particular, to an audio data processing method, apparatus, device, storage medium and program product.
  • a user of a mobile terminal can make a system call with other users (e.g., user B) through the system call mode.
  • user A can make a system call (i.e., make a phone call) with user B through the aforementioned system call mode in a phone call scenario.
  • the application layer of game application X often needs to share the system call mode provided by the terminal system layer of the mobile terminal.
  • the application layer and the terminal system layer will each run the same type of signal processing unit (that is, voice optimization components with the same function).
  • Embodiments of the present application provide an audio data processing method, apparatus, device, storage medium, and program product, which can improve the effect of voice optimization in game scenarios.
  • an embodiment of the present application provides an audio data processing method, executed by a computer device, the method including: in a game voice mode, acquiring a signal processing result associated with a first pre-signal processing strategy in an application layer of a business application, wherein the first pre-signal processing strategy includes at least one first optimization component; according to the signal processing result, controlling, at the application layer, the switch state of a second optimization component in a second pre-signal processing strategy in the terminal system layer, or controlling the switch state of the first optimization component in the first pre-signal processing strategy, wherein the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy; and obtaining uplink voice data of a first user corresponding to the business application in the game voice mode, and performing voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.
  • an embodiment of the present application provides an audio data processing method, executed by a computer device, the method including: in a game voice mode, acquiring a signal processing result associated with a first pre-signal processing strategy in an application layer of a business application, wherein the first pre-signal processing strategy includes at least one first optimization component; and according to the signal processing result, controlling, at the application layer, the switch state of a second optimization component in a second pre-signal processing strategy in the terminal system layer, wherein the second pre-signal processing strategy includes at least one second optimization component.
  • an embodiment of the present application provides an audio data processing method, executed by a computer device, the method including: in a game voice mode, acquiring a signal processing result associated with a first pre-signal processing strategy in an application layer of a business application, wherein the first pre-signal processing strategy includes at least one first optimization component; and according to the signal processing result, controlling the switch state of a second optimization component in a second pre-signal processing strategy in the terminal system layer, or controlling the switch state of the first optimization component in the first pre-signal processing strategy, wherein the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.
  • An aspect of an embodiment of the present application provides an apparatus for processing audio data, the apparatus comprising:
  • the processing result acquisition module is used to acquire, in the game voice mode, the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application; wherein the first pre-signal processing strategy includes at least one first optimization component;
  • the component control module is used to control, at the application layer and according to the signal processing result, the switch state of the second optimization component in the second pre-signal processing strategy in the terminal system layer, or to control the switch state of the first optimization component in the first pre-signal processing strategy; wherein the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy;
  • the voice optimization module is used to obtain the uplink voice data of the first user corresponding to the business application in the game voice mode, and to perform voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.
  • An aspect of an embodiment of the present application provides an apparatus for processing audio data, the apparatus comprising:
  • the processing result acquisition module is used to acquire, in the game voice mode, the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application; wherein the first pre-signal processing strategy includes at least one first optimization component;
  • the component control module is configured to control, at the application layer and according to the signal processing result, the switch state of the second optimization component in the second pre-signal processing strategy in the terminal system layer; wherein the second pre-signal processing strategy includes at least one second optimization component.
  • An aspect of an embodiment of the present application provides an apparatus for processing audio data, the apparatus comprising:
  • the processing result acquisition module is used to acquire, in the game voice mode, the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application; wherein the first pre-signal processing strategy includes at least one first optimization component;
  • the component control module is configured to control, according to the signal processing result, the switch state of the second optimization component in the second pre-signal processing strategy in the terminal system layer, or to control the switch state of the first optimization component in the first pre-signal processing strategy; wherein the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.
  • An aspect of the embodiments of the present application provides a computer device, including: a processor and a memory;
  • the processor is connected to the memory, where the memory is used to store a computer program, and when the computer program is executed by the processor, the computer device executes the method provided by the embodiments of the present application.
  • An aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program is adapted to be loaded and executed by a processor, so that a computing device having the processor executes the method provided by the embodiments of the present application.
  • embodiments of the present application provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the method provided by the embodiments of the present application.
  • a computer device (for example, a mobile terminal)
  • the second optimization component (that is, the voice optimization component in the second pre-signal processing strategy)
  • controlling the turning on and off of the first optimization component in the first pre-signal processing strategy (that is, the voice optimization component in the first pre-signal processing strategy)
  • the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.
  • the embodiments of the present application propose that one or more voice optimization components in the terminal system layer can be enabled or disabled from the application layer according to the aforementioned signal processing results (that is, the algorithm comparison results corresponding to the voice optimization components with the same function). Voice optimization components with the same optimization function therefore run at either the application layer or the terminal system layer, but not both, so that sound quality impairment of the uplink voice data is reduced at its source. It can be understood that the number and type of the second optimization components that are enabled or disabled in the terminal system layer are not limited here.
  • when acquiring the uplink voice data of the first user in the game voice mode, the computer device can quickly perform voice optimization on the uplink voice data in the game voice mode based on the first optimization component and the second optimization component, which have different functions; this improves the voice optimization effect in game scenarios while reducing sound quality damage.
  • FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a business mode division provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a voice data processing flow provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a scenario of voice interaction in a game scene provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of an audio data processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a scenario of a test list provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a scenario for determining an optimal signal processing strategy associated with a sound quality parameter provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a solution, provided by an embodiment of the present application, for controlling the turning on and off of each voice optimization component in voice pre-signal processing.
  • FIG. 9 is a schematic diagram of an audio data processing method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a scenario of a resource configuration interface provided by an embodiment of the present application.
  • FIG. 11 is a schematic flowchart for providing a dual-speak service in different types of languages provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of another audio data processing method provided by an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of another audio data processing method provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an audio data processing apparatus provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Game voice mode: a voice mode, parallel to the media mode and the call mode, that the terminal system provides according to the voice requirements and characteristics of game application scenarios.
  • Sample rate: also known as the sampling frequency, the number of samples per second extracted from a continuous signal to form a discrete signal, measured in hertz (Hz). The higher the sample rate, the more accurately the discrete signal represents the original. Commonly used sampling rates are 8 kHz, 16 kHz, 44.1 kHz, and 48 kHz.
  • Sampling bits: the number of bits per sample, also called the sample value, is a parameter used to measure the fluctuation of the sound. It is the number of binary digits of the digital sound signal that the sound card uses when recording and playing sound files. Commonly used values are 8 bits, 16 bits, and 32 bits; mobile phone platforms generally use 16 bits.
  • Number of channels: the number of sound channels, usually determined by the hardware device. Common configurations are mono and dual-channel (stereo). Mono sound is played through one speaker, while dual-channel sound can be played through two speakers; the left and right channels generally carry different parts of the sound, which gives a stronger sense of space.
  • Noise suppression: the voice data collected by voice acquisition tools usually includes both valid voice data, such as human voice and music, and useless noise data, such as ambient sound. Noise suppression is a technology that eliminates or reduces, as far as possible, the influence of noise on the overall speech effect according to the characteristics of the speech data.
  • Automatic gain control: an automatic control method that adjusts the gain of the amplifier circuit according to the signal strength, mainly used to enhance the signal strength of valid voice data.
  • Echo cancellation: an echo is a sound wave that is reflected or repeated, or a sound signal that, after being transmitted over the network and played at the remote end, is captured and retransmitted by the remote end so that it returns to the original speaker. Eliminating these sounds with a signal processing algorithm or device is echo cancellation.
  • Dynamic range control: dynamically adjusts the audio output amplitude. When the volume is high it is appropriately suppressed, and when the volume is low it is appropriately increased, so that the volume always stays within a suitable range. It is usually used to control the audio output power so that the speaker is not damaged and audio remains clearly audible at low playback volume.
  • Voice pre-processing: the technology of processing the original voice data before it is encoded and sent, so that the processed voice signal better reflects the essential characteristics of the voice.
  • Voice pre-processing usually includes noise suppression, echo cancellation, automatic gain control, and similar technologies.
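The sampling parameters defined in the glossary above (sample rate, sampling bits, number of channels) together determine how much raw audio the capture stage produces. The following is a minimal sketch of that arithmetic; the function names are hypothetical and are not taken from the application.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw (uncompressed) PCM data rate in bytes per second."""
    if bits_per_sample % 8 != 0:
        raise ValueError("bits_per_sample must be a whole number of bytes")
    return sample_rate_hz * (bits_per_sample // 8) * channels


def pcm_frame_bytes(sample_rate_hz: int, bits_per_sample: int,
                    channels: int, frame_ms: int) -> int:
    """Size in bytes of one audio frame lasting frame_ms milliseconds."""
    samples_per_channel = sample_rate_hz * frame_ms // 1000
    return samples_per_channel * (bits_per_sample // 8) * channels


if __name__ == "__main__":
    # 16 kHz, 16-bit, mono: a typical voice-call configuration.
    print(pcm_bytes_per_second(16000, 16, 1))
    # 48 kHz, 16-bit, stereo: a typical media-playback configuration.
    print(pcm_bytes_per_second(48000, 16, 2))
    # One 20 ms frame of 16 kHz, 16-bit mono audio.
    print(pcm_frame_bytes(16000, 16, 1, 20))
```

The gap between the two configurations illustrates why a mode that must serve both voice calls and media playback has to choose its capture and playback parameters deliberately.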
  • FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • the network architecture may include a service server 2000 and a cluster of user terminals.
  • a user terminal cluster may include one or more user terminals, and the number of user terminals will not be limited here.
  • the multiple user terminals here may specifically include a user terminal 3000a, a user terminal 3000b, a user terminal 3000c, . . . , and a user terminal 3000n.
  • user terminal 3000a, . . . , user terminal 3000b can each be connected to the service server 2000 through a network, so that each user terminal in the user terminal cluster can exchange data with the service server 2000 over that network connection.
  • the service server 2000 shown in FIG. 1 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • this embodiment of the present application may select a user terminal (for example, the user terminal used by user A) from the user terminal cluster shown in FIG. 1 as the target user terminal.
  • the user terminal 3000a shown in FIG. 1 is used as the target user terminal, and the target user terminal may integrate a service application with an audio data processing function (for example, an audio data collection and playback function).
  • the service application may specifically include an entertainment client (e.g., a game client), a social client, an office client, a live streaming client, and other application clients with audio data collection and playback functions.
  • the target user terminal (for example, user terminal 3000a) may specifically include: a mobile terminal carrying an audio data processing function, such as a smart phone, a tablet computer, a notebook computer, and a wearable device.
  • the application types corresponding to entertainment clients (e.g., game clients) may be collectively referred to as game types, and the application types corresponding to social clients (e.g., clients such as QQ and WeChat), office clients (e.g., enterprise clients), and live streaming clients may be collectively referred to as non-game types.
  • different service modes can be adaptively selected according to the application type of the service application, so as to carry out different types of voice interaction services in different service scenarios.
  • the service modes here may specifically include a system media mode 21a (also referred to as a "media mode"), a system call mode 21b (also referred to as a "voice call mode" or a "call mode") and a game voice mode 21c.
  • the target user terminal can configure the service mode of the service application as the system media mode 21a by default when the user (i.e., the first user) has no voice dual-talk requirement (i.e., does not need to perform voice interaction).
  • when the user (i.e., the first user) has a voice dual-talk requirement (i.e., voice interaction is required), the target user terminal can also intelligently identify the application type of the business application that needs to perform voice interaction, and then adaptively select a service mode according to that application type.
  • for example, when the application type of the business application is a game type, the game voice mode 21c is selected; when the application type of the business application is a non-game type, the system call mode 21b is selected.
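The mode selection described above can be sketched as a small decision function. This is an illustration only: the enum and function names are hypothetical, and the mapping of game types to the game voice mode 21c and non-game types to the system call mode 21b is an assumption inferred from how the three modes are described in this document.

```python
from enum import Enum


class ServiceMode(Enum):
    SYSTEM_MEDIA = "21a"   # default when no voice interaction is needed
    SYSTEM_CALL = "21b"    # voice calls in non-game scenarios
    GAME_VOICE = "21c"     # voice interaction in game scenarios


class AppType(Enum):
    GAME = "game"
    NON_GAME = "non-game"


def select_service_mode(needs_voice_interaction: bool, app_type: AppType) -> ServiceMode:
    """Pick a service mode from the voice requirement and the application type."""
    if not needs_voice_interaction:
        # No dual-talk requirement: stay in the default media mode.
        return ServiceMode.SYSTEM_MEDIA
    if app_type is AppType.GAME:
        return ServiceMode.GAME_VOICE
    # Assumed mapping: every non-game type falls back to the system call mode.
    return ServiceMode.SYSTEM_CALL


if __name__ == "__main__":
    print(select_service_mode(False, AppType.GAME))
    print(select_service_mode(True, AppType.GAME))
    print(select_service_mode(True, AppType.NON_GAME))
```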
  • the system media mode 21a may be used to instruct the target user terminal to play audio data of music or video programs for the current user (i.e., the first user).
  • the system call mode 21b can be used to instruct the target user terminal to let the current user (i.e., the first user) make a system call with another user (i.e., the second user) in a non-game scenario; the second user may be a user selected by the first user in the service application as the party to call.
  • the game voice mode 21c can be used to instruct the target user terminal to provide a brand-new voice interaction service in the game scene.
  • that is, in the game scene the user (i.e., the first user) can make a game voice call with another user (i.e., the third user) in the game voice mode 21c; the third user may be a game user in the same game camp as the first user.
  • the voice environment in the game scene is more complex, and it is necessary to take into account the quality of voice calls and the quality of media playback.
  • the existing user terminal only provides the system call mode 21b, suited to call scenes, and the system media mode 21a, suited to music playback scenes, without considering a combination of the two, so the result in game scenes is poor. Therefore, improving the sound quality of system media playback while preserving the two-way voice call experience has become the key to improving the voice experience of game users.
  • the mobile intelligent terminal provides a game voice mode 21c parallel to the system call mode 21b and the system media mode 21a to achieve the purpose of optimizing voice services in game application scenarios.
  • the game voice mode 21c is a voice mode applied in the game business or in the game scene, which aims to optimize the player's voice experience for the game scene.
  • effective optimization measures are taken at each stage of voice collection, processing, and configuration for game application scenarios, so as to provide gamers with smooth game voice and high-quality game sound effects.
  • the game voice mode is applicable not only to game scenarios but also to other business scenarios with the same or similar voice processing requirements, that is, any voice business scenario that requires both voice call quality and media playback sound quality, such as live video streaming and video conferencing. This is not limited here.
  • the game voice mainly goes through two stages: the voice data collection stage and the voice data playback stage.
  • FIG. 3 shows a schematic diagram of the voice data processing flow.
  • The voice data collection stage includes:
  • Voice capture: voice is usually input into the mobile phone through the microphone.
  • the microphone converts the sound wave into a voltage signal, which is then sampled, converting the continuous voltage signal into a digital signal that the computer can process.
  • the indicators that affect the quality of the collected voice signal mainly include the sampling rate, the number of sampling bits and the number of channels. The higher the sample rate, the more sound samples are taken per second, and the higher the final audio quality.
  • Voice signal preprocessing: preprocessing the data collected by the microphone to improve the quality of the voice data.
  • the pre-processing process usually includes audio processing algorithms such as echo cancellation, automatic gain control and noise suppression.
  • Voice encoding: compressing the collected digital voice signal to reduce the transmission bit rate for digital transmission.
  • Transmission: sending the encoded voice data through the network to the designated voice server, so that other users can listen to the user's voice data through the server.
  • The voice data playback stage includes, in sequence:
  • Receiving voice data: obtaining the voice data of other users from the designated voice server for playback.
  • Decoding: the process corresponding to encoding, that is, decoding the received encoded voice data and converting digital signals into analog signals.
  • Post-processing: the decoded voice data may stutter because of problems such as packet loss, which affects audio playback, so the decoded voice data is adjusted and optimized in a post-processing step.
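The encode, transmit, decode, and post-process steps above can be illustrated with a toy round trip. This is only a sketch under stated assumptions: the "codec" is plain 16-bit PCM packing rather than a real speech codec, lost packets are represented as `None`, and the post-processing shown (repeating the last good frame) is one simple concealment approach chosen for illustration, not the method of the application. All function names are hypothetical.

```python
import struct
from typing import List, Optional

Frame = List[float]  # samples in the range [-1.0, 1.0]


def encode(frame: Frame) -> bytes:
    """Toy encoder: clip to [-1, 1] and pack as little-endian 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in frame]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def decode(payload: bytes) -> Frame:
    """Toy decoder: unpack 16-bit PCM back to floats."""
    count = len(payload) // 2
    return [v / 32767 for v in struct.unpack(f"<{count}h", payload)]


def receive_and_post_process(packets: List[Optional[bytes]], frame_len: int) -> List[Frame]:
    """Decode received packets; replace each lost packet with the last good frame."""
    last_good: Frame = [0.0] * frame_len  # silence until the first packet arrives
    output: List[Frame] = []
    for packet in packets:
        if packet is not None:
            last_good = decode(packet)
        output.append(list(last_good))
    return output


if __name__ == "__main__":
    first, second = [0.0, 0.5, -0.5, 1.0], [0.25, 0.25, 0.25, 0.25]
    # The middle packet is lost in transit.
    played = receive_and_post_process([encode(first), None, encode(second)], frame_len=4)
    print(len(played), played[1] == played[0])
```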
  • the target user terminal can start the cooperation mechanism between the application layer and the terminal system layer in the game voice mode, and can then, according to the cooperation mechanism and the algorithm comparison results (i.e., the signal processing results), adaptively choose, for each optimization function, which one of the two voice optimization components with that function to enable: the one in the application layer or the one in the terminal system layer.
  • real-time voice processing can then be performed on the uplink voice data of the current user (that is, the above-mentioned first user) collected in real time in this game scenario, so as to improve the voice optimization effect on the uplink voice data and further improve the voice interaction experience between game users.
  • FIG. 4 is a schematic diagram of a scenario of voice interaction in a game scenario provided by an embodiment of the present application.
  • the application type of the service application in the user terminal 10a shown in FIG. 4 may be the above game type.
  • the user terminal 10a can switch the service mode of the service application from the system media mode to the game voice mode, so that user 1 shown in FIG. 4 can make a game voice call with user 2 shown in FIG. 4 (that is, the above-mentioned third user).
  • the user terminal 10a shown in FIG. 4 may be the above-mentioned target user terminal with the audio data processing function. It can be understood that, when user 1 shown in FIG. 4 needs to perform voice interaction with the user terminal 20a corresponding to user 2 shown in FIG. 4, the voice of user 1 is first voice-optimized, so that the voice-optimized voice of user 1 can be sent, as the target voice optimization result corresponding to the uplink voice data, to the user terminal 20a corresponding to user 2, where it can be played through the speaker shown in FIG. 4.
  • the voice of user 1 collected by the microphone of the user terminal 10a may be collectively referred to as the voice uplink signal; that is, the audio frames obtained after spectrum analysis of the sound signal collected by the microphone may be collectively referred to as uplink voice data.
  • the voice-optimized voice of user 1 played by the speaker of the user terminal 20a may be collectively referred to as the voice downlink signal; that is, the audio frames of the sound signal transmitted to the speaker for playback are called downlink voice data.
  • the voice-optimized voice of other users (for example, user 2) played by the speaker of the user terminal 10a may likewise be referred to as a voice downlink signal.
  • When the user terminal 10a shown in FIG. 4 collects the voice of user 1 (that is, the above-mentioned voice uplink signal) in real time through the microphone, the uplink voice data corresponding to the voice uplink signal can be obtained. Further, the optimal signal processing strategy can be obtained through negotiation between the application layer of the above-mentioned service application and the terminal system layer.
  • Under the optimal signal processing strategy, the first optimization component enabled in the application layer and the second optimization component enabled in the terminal system layer perform voice optimization on the uplink voice data of user 1.
  • The second optimization component here is different from the first optimization component. In addition, the first optimization component enabled in the application layer (that is, the first optimization component enabled in the first pre-signal processing strategy) has the same optimization function as the second optimization component disabled in the second pre-signal processing strategy, and the second optimization component enabled in the terminal system layer has the same optimization function as the first optimization component disabled in the first pre-signal processing strategy.
  • the voice optimization components in the first pre-signal processing strategy may be collectively referred to as first optimization components
  • the speech optimization components in the second pre-signal processing strategy may be collectively referred to as second optimization components.
  • The voice optimization here is the pre-processing described above, including but not limited to acoustic echo cancellation (Acoustic Echo Cancellation, AEC), noise suppression (Noise Suppression, NS), and automatic gain control (Automatic Gain Control, AGC).
  • The echo mainly refers to the voice that the speaker (for example, the aforementioned user 1) sends to other people (for example, the aforementioned user 2) through his or her own communication device (for example, the aforementioned user terminal 10a) and that is then heard again by the speaker.
  • The echo cancellation involved in the embodiments of the present application mainly refers to a processing scheme in which the target user terminal (for example, the aforementioned user terminal 10a) uses a certain algorithm device (for example, an echo cancellation component) to cancel the echo.
  • The noise mainly refers to the sound signal that is collected by the target user terminal (for example, the aforementioned user terminal 10a) and emitted by objects other than the speaker (for example, the aforementioned user 1).
  • The noise suppression involved in the embodiments of the present application mainly refers to a processing scheme in which the target user terminal (for example, the aforementioned user terminal 10a) eliminates such noise through a certain algorithm device (for example, a noise suppression component).
  • The gain control involved in the embodiments of the present application mainly refers to a processing scheme in which the target user terminal (for example, the aforementioned user terminal 10a) uses a certain algorithm device (for example, a gain control component) to intelligently adjust the energy of the speech signal according to the range of human auditory perception, so that the speech signal is better perceived.
  • If the user terminal 10a chooses, based on the algorithm comparison result, to enable the first optimization component 11 (e.g., the echo suppression component) in the first pre-signal processing strategy in the application layer, it needs to synchronously disable the second optimization component in the second pre-signal processing strategy that has the same optimization function as the first optimization component 11 (e.g., the echo suppression component in the second pre-signal processing strategy).
  • This means that, in the embodiment of the present application, when the target user terminal collects the voice of user 1 (that is, the uplink voice data of the first user) in the game scene in real time through the microphone, a voice optimization component with a given optimization function only needs to run in either the application layer or the terminal system layer. This ensures that each optimization function runs only once, which fundamentally avoids the waste of computing resources caused by running the same optimization function repeatedly.
  • The terminal used by user 1 (for example, the user terminal 10a shown in FIG. 4) can determine the application type of the service application running in the user terminal 10a and then switch the service mode of the service application from the system media mode to the game voice mode, so that the user terminal 10a can collect the voice of user 1 in real time in the game voice mode and optimize it to obtain the voice-optimized voice of user 1 shown in FIG. 4.
  • The user terminal 10a can broadcast the voice-optimized voice of user 1 to other teammates in the camp where user 1 is located (for example, user 2, who can be another game user in the same camp as user 1).
  • In this way, the terminals of the other teammates (for example, the user terminal 20a shown in FIG. 4, which corresponds to user 2) can play the received voice-optimized voice of user 1.
  • If the user terminal 10a detects that the application type of the above-mentioned business application belongs to a non-game type (for example, a social type), the user terminal 10a can intelligently switch the business mode of the business application from the system media mode to the system call mode, so as to execute the second type of voice call service in the system call mode. The second type of voice call service may be the voice interaction service corresponding to the system call type in the non-game scenario.
  • For example, in a social scenario, user 1 shown in FIG. 4 can be allowed to send a system call request corresponding to the system call type to user 2 shown in FIG. 4.
  • For the specific implementation in which the target user terminal, through the above-mentioned application layer, controls the opening and closing of the second optimization component in the second pre-signal processing strategy in the terminal system layer and performs voice optimization on the uplink voice data of the first user, reference may be made to the embodiments corresponding to FIG. 5 to FIG. 15 below.
  • FIG. 5 is a schematic flowchart of an audio data processing method provided by an embodiment of the present application.
  • The method is executed by a computer device. For example, the method can be executed by a user terminal (for example, the above-mentioned target user terminal, which can be the user terminal 10a in the above-mentioned embodiment corresponding to FIG. 4), by a service server (for example, the above-mentioned service server 2000 shown in FIG. 1), or by the user terminal and the service server interacting and cooperating.
  • this embodiment is described by taking the method being executed by a user terminal as an example.
  • The audio data processing method may include at least one of the following steps:
  • Step S101 in the game voice mode, obtain the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application;
  • The target user terminal can obtain the sound quality index of the business application in the game voice mode and then configure the sound quality parameters of the business application according to that sound quality index (the sound quality parameters here may include, but are not limited to, the voice sampling rate and the number of voice channels). Further, the target user terminal obtains the terminal type of the terminal to which the service application belongs and searches the test list associated with the service application for a test type matching the terminal type. If a test type matching the terminal type is found in the test list, the target user terminal can, based on the sound quality parameter, obtain from the test list the first test processing result obtained by adopting the first pre-signal processing strategy and the second test processing result obtained by adopting the second pre-signal processing strategy.
  • The first pre-signal processing strategy is a pre-signal processing strategy in the application layer of the service application.
  • The second pre-signal processing strategy is the pre-signal processing strategy in the terminal system corresponding to the test terminal type.
  • the target user terminal can determine the optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy based on the first test processing result and the second test processing result, And the determined optimal signal processing strategy can be used as the signal processing result associated with the first pre-signal processing strategy.
  • The system resource package of the service application can also be loaded in the target user terminal in advance, and the system resource data of the service application can then be obtained by parsing the system resource package.
  • After the system resource data is initialized, the service mode of the service application can be initially configured as the system media mode.
  • the embodiment of the present application may enter the system media mode by default after completing the initialization processing of the system resource data, so that in the system media mode, the application display interface of the service application can be output according to the system resource data after the initialization processing , so as to output the multimedia data (eg, video frame data and audio frame data, etc.) of the service application in the application display interface.
  • The application display interface may include a voice control for instructing the first user to initiate a voice interaction service. When the first user needs to perform voice interaction with other users, the first user can trigger the voice control that is currently in the off state, so that the target user terminal can respond to the voice activation operation performed by the first user on the voice control and automatically detect the application type of the service application that initiates the voice interaction service.
  • If the target user terminal determines that the application type of the service application that initiates the voice interaction service belongs to the game type, the target user terminal can determine that the current service scene is a game scene and then switch the business mode of the business application running in the target user terminal from the system media mode to the game voice mode. For example, in the game scene, a first voice call instruction associated with the game type is generated, and based on the first voice call instruction the service mode of the service application running in the target user terminal is switched from the system media mode to the game voice mode, so that the above-mentioned first type of voice call service can subsequently be executed in the game voice mode.
  • the target user terminal can refine some sound quality parameters associated with the above voice dual-talk requirement according to the sound quality index of the service application.
  • the target user terminal may allow the first user (ie, the user using the target user terminal) to set the voice sampling rate and the number of voice channels corresponding to the target user terminal in the game voice mode.
  • If the target user terminal determines that the application type of the business application that currently initiates the voice interaction service belongs to a non-game type, the target user terminal can determine that the current business scene is a non-game scene and then switch the service mode of the service application running in the target user terminal from the system media mode to the system voice mode. For example, a second voice call instruction associated with the non-game type is generated in the non-game scene, and based on the second voice call instruction the service mode of the service application running in the target user terminal is switched from the system media mode to the system voice mode, so that voice interaction with other users (for example, the above-mentioned second user) can be performed in the system voice mode to execute the above-mentioned second type of voice call service.
  • the game voice mode and the system voice mode provided by the embodiments of the present application are both business modes for providing different types of voice call services in the above-mentioned voice dual-talk scenario.
  • the target user terminal can intelligently enter the above-mentioned game voice mode when the application type is the game type to execute the above-mentioned first type of voice call service.
  • the target user terminal may also intelligently enter the above-mentioned system voice mode when the application type is a non-game type, so as to execute the above-mentioned second type of voice call service.
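  • The mode switch described above can be sketched as follows. This is only an illustrative sketch: the names `ServiceMode` and `select_voice_mode` and the string labels for the application types are assumptions made for the example and do not appear in the application.

```python
from enum import Enum


class ServiceMode(Enum):
    """Business modes named in the description (identifiers are hypothetical)."""
    SYSTEM_MEDIA = "system media mode"
    GAME_VOICE = "game voice mode"
    SYSTEM_VOICE = "system voice mode"


def select_voice_mode(application_type: str) -> ServiceMode:
    """Return the mode to switch to when a voice interaction service starts.

    Assumption: the application type is available as a plain string and
    only the value "game" denotes the game type.
    """
    if application_type == "game":
        return ServiceMode.GAME_VOICE    # first type of voice call service
    return ServiceMode.SYSTEM_VOICE      # second type of voice call service


# The service application starts in the system media mode by default and
# leaves it only when the first user triggers the voice control.
current_mode = ServiceMode.SYSTEM_MEDIA
current_mode = select_voice_mode("game")
```

  • Keeping the decision in one function means that the detection of the application type stays separate from whatever instruction is later generated to perform the actual switch.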
  • the target user terminal involved in the embodiments of the present application may include, but is not limited to, a mobile terminal having the above voice data processing function. Therefore, the setting of the voice sampling rate corresponding to the target user terminal involved in the embodiment of the present application may mainly include setting the uplink sampling rate and the downlink sampling rate of the terminal.
  • The setting of the number of voice channels of the target user terminal involved in the embodiment of the present application mainly refers to setting the number of channels of the voice. For example, the number of channels may be set to two according to the sound quality index of the target user terminal.
  • The voice sampling rate (for example, the uplink sampling rate and the downlink sampling rate) here may be the number of times that the recording component of the target user terminal samples the sound signal in a unit sampling period.
  • The voice sampling rate may include, but is not limited to, 4 kHz, 8 kHz, and 48 kHz. It should be understood that the value of the voice sampling rate reflects how faithfully and naturally the recording component can restore the user's voice.
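  • As a small worked example of what the sampling rate means in practice, the sketch below computes how many samples one audio frame contains at each of the rates mentioned above. The 20 ms frame length and the function name `samples_per_frame` are assumptions chosen for illustration; the application does not specify a frame length.

```python
def samples_per_frame(sampling_rate_hz: int, frame_ms: int) -> int:
    """Number of samples captured in one frame of the given duration."""
    return sampling_rate_hz * frame_ms // 1000


FRAME_MS = 20  # assumed frame length, not taken from the application

for rate_hz in (4_000, 8_000, 48_000):
    # A higher rate yields more samples for the same stretch of speech,
    # which is why it can restore the voice more faithfully.
    print(f"{rate_hz} Hz -> {samples_per_frame(rate_hz, FRAME_MS)} samples per frame")
```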
  • The first user can perform voice interaction through the voice dual-talk service provided by the voice interaction system in the target user terminal. That is, when the microphone in the target user terminal collects the voice signal of the first user (that is, the voice of the first user), the voice signal of the first user can be subjected to spectrum analysis in the game voice mode according to the above-mentioned uplink sampling rate, so as to sample the uplink voice data of the first user in the game voice mode.
  • The target user terminal can perform voice optimization on the uplink voice data and then send the voice-optimized voice signal of the first user (that is, the voice of the first user after voice optimization) to other communication peers (for example, the terminal corresponding to the above-mentioned third user), so that the voice-optimized voice of the first user is played through the respective speakers of those communication peers.
  • The target user terminal can also receive the voice-optimized voice signal of the third user transmitted by other communication peers and process it according to the above-mentioned downlink sampling rate to obtain the downlink voice data to be transmitted to the speaker of the target user terminal. In this way, when the downlink voice data is played through the speaker of the target user terminal, the voice-optimized voice of the third user can be restored for the first user as accurately as possible.
  • The test terminal type in the test list 301a may be a test type corresponding to one or more test terminals. It can be understood that the test terminal types here can include, but are not limited to, models of one or more brands; optionally, the test terminal types here can also include the system types and system versions of the terminal environment systems corresponding to these models.
  • When the developer corresponding to the business application develops the business application with the above-mentioned game voice mode, the business application can be integrated and installed in advance in the test terminal corresponding to each known model used for testing. Performance testing can then be carried out on the pre-signal processing strategies (for example, the first pre-signal processing strategy in the above-mentioned application layer and the second pre-signal processing strategy in the above-mentioned terminal system layer), so as to obtain, for the same known model (that is, the same test type) and under specific sound quality parameters, the optimized performance of each voice optimization component in the application layer and the optimized performance of the voice optimization component with the corresponding function in the terminal system layer.
  • For example, the n models under brand A (n is a positive integer) can be the test types T1, ..., Tn shown in FIG. 6: the test type T1 can be model 1 of brand A, the test type T2 can be model 2 of brand A, and so on, up to the test type Tn, which can be model n of brand A.
  • With the sound quality parameter set to the sound quality parameter D1 (for example, the sampling rate of the uplink voice is 8 kHz, the sampling rate of the downlink voice is 8 kHz, and the channel configuration is mono, such as the left channel), the developer can use the first pre-signal processing strategy in the application layer and the second pre-signal processing strategy in the terminal system layer to test the voice test effect of the test terminal whose model is the test type T1 under the sound quality parameter D1.
  • The test processing result obtained when each voice optimization component in the above-mentioned application layer (that is, the first optimization components such as the first echo cancellation component for echo cancellation, the first noise suppression component for noise suppression, and the first gain control component for gain adjustment) performs test optimization on the uplink voice data (for example, the uplink voice data R1 used for performance testing) can be the test processing result of the application layer associated with the sound quality parameter D1 shown in FIG. 6.
  • Similarly, the test processing result obtained when the second optimization components in the terminal system layer (such as the second echo cancellation component, the second noise suppression component, and the second gain control component) test and optimize the same uplink voice data R1 may be the test processing result of the terminal system layer associated with the sound quality parameter D1 shown in FIG. 6.
  • The test processing result corresponding to the first echo cancellation component in the application layer may be the first test result 31a shown in FIG. 6. The voice optimization component in the terminal system layer that has the same optimization function as the first echo cancellation component can be the above-mentioned second echo cancellation component, and the test processing result obtained by using the second echo cancellation component to perform echo cancellation on the uplink voice data R1 can be the second test result 31b shown in FIG. 6.
  • The test processing result corresponding to the first noise suppression component in the application layer may be the first test result 32a shown in FIG. 6. The voice optimization component in the terminal system layer that has the same optimization function as the first noise suppression component can be the above-mentioned second noise suppression component, and the test processing result obtained by using the second noise suppression component to perform noise suppression on the uplink voice data R1 can be the second test result 32b shown in FIG. 6.
  • The test processing result corresponding to the first gain control component in the application layer may be the first test result 33a shown in FIG. 6. The voice optimization component in the terminal system layer that has the same optimization function as the first gain control component may be the second gain control component, and the test processing result obtained by using the second gain control component to perform gain adjustment on the uplink voice data R1 may be the second test result 33b shown in FIG. 6.
  • With the sound quality parameter set to the sound quality parameter D2 (for example, the sampling rate of the uplink voice is 8 kHz, the sampling rate of the downlink voice is 16 kHz, and the channel configuration is mono, such as the left channel), the developer can also use the first pre-signal processing strategy in the application layer and the second pre-signal processing strategy in the terminal system layer to test the voice test effect of another test terminal, whose model is the test type Tn, under the sound quality parameter D2.
  • The test processing result obtained when each voice optimization component in the above-mentioned application layer (that is, the first optimization components such as the first echo cancellation component for echo cancellation, the first noise suppression component for noise suppression, and the first gain control component for gain adjustment) performs test optimization on another piece of uplink voice data (for example, the uplink voice data R2 used for performance testing) can be the test processing result of the application layer associated with the sound quality parameter D2 shown in FIG. 6.
  • Similarly, the test processing result obtained when the second optimization components in the terminal system layer (such as the second echo cancellation component, the second noise suppression component, and the second gain control component) test and optimize the same uplink voice data R2 may be the test processing result of the terminal system layer associated with the sound quality parameter D2 shown in FIG. 6.
  • The test processing result corresponding to the first echo cancellation component in the application layer may be the first test result 34a shown in FIG. 6. The voice optimization component in the terminal system layer that has the same optimization function as the first echo cancellation component can be the above-mentioned second echo cancellation component, and the test processing result obtained by using the second echo cancellation component to perform echo cancellation on the uplink voice data R2 may be the second test result 34b shown in FIG. 6.
  • The test processing result corresponding to the first noise suppression component in the application layer may be the first test result 35a shown in FIG. 6. The voice optimization component in the terminal system layer that has the same optimization function as the first noise suppression component can be the above-mentioned second noise suppression component, and the test processing result obtained by using the second noise suppression component to perform noise suppression on the uplink voice data R2 can be the second test result 35b shown in FIG. 6.
  • The test processing result corresponding to the first gain control component in the application layer may be the first test result 36a shown in FIG. 6. The voice optimization component in the terminal system layer that has the same optimization function as the first gain control component may be the above-mentioned second gain control component, and the test processing result obtained by using the second gain control component to perform gain adjustment on the uplink voice data R2 may be the second test result 36b shown in FIG. 6.
  • When performance testing is performed on the first optimization component in the application layer and on the second optimization component with the same optimization function in the terminal system layer, the test processing results of each known model under different sound quality parameters can be obtained in advance. The developer can then construct the test list 301a shown in FIG. 6 from the above-mentioned test terminal types, sound quality parameters, application layer test processing results, and terminal system layer test processing results. In this way, when the first user needs to perform the above-mentioned voice interaction service with other users in the above-mentioned game voice mode, the test list 301a can be quickly searched for a test terminal type that matches the terminal type of the terminal to which the service application currently belongs (that is, the above-mentioned target user terminal).
  • If a match is found, the target user terminal can, based on the sound quality parameter set by the current user (that is, the above-mentioned first user) according to the sound quality index of the service application (for example, the above-mentioned sound quality parameter D1), quickly obtain from the test list 301a the first test processing result obtained by adopting the above-mentioned first pre-signal processing strategy and the second test processing result obtained by adopting the above-mentioned second pre-signal processing strategy.
  • The target user terminal can then, according to the voice test results of the voice optimization components with the same optimization function, quickly determine the optimal signal processing strategy from the above-mentioned first pre-signal processing strategy and second pre-signal processing strategy.
  • The first test processing result here may specifically include the first test result 31a corresponding to the above-mentioned first echo cancellation component (that is, the AEC component in the application layer), the first test result 32a corresponding to the above-mentioned first noise suppression component (that is, the NS component in the application layer), and the first test result 33a corresponding to the above-mentioned first gain control component (that is, the AGC component in the application layer).
  • The second test processing result here may specifically include the second test result 31b corresponding to the above-mentioned second echo cancellation component (that is, the AEC component in the terminal system layer), the second test result 32b corresponding to the above-mentioned second noise suppression component (that is, the NS component in the terminal system layer), and the second test result 33b corresponding to the above-mentioned second gain control component (that is, the AGC component in the terminal system layer).
  • Similarly, if the sound quality parameter set by the first user according to the sound quality index is another sound quality parameter shown in the above-mentioned FIG. 6 (for example, the sound quality parameter D2), the first test processing result obtained by adopting the first pre-signal processing strategy and the second test processing result obtained by adopting the second pre-signal processing strategy can be obtained in the same way and are not listed one by one here.
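  • The test list lookup described in step S101 can be sketched as a table keyed by test terminal type and sound quality parameter. Everything concrete in the sketch is an assumption for illustration: the names `SoundQuality`, `TEST_LIST`, and `lookup_test_results`, the representation of a test result as a single numeric score, and the score values themselves are hypothetical and are not taken from FIG. 6.

```python
from typing import Dict, NamedTuple, Optional


class SoundQuality(NamedTuple):
    """Sound quality parameter: sampling rates in Hz and number of channels."""
    uplink_hz: int
    downlink_hz: int
    channels: int


LayerResults = Dict[str, Dict[str, float]]

# Hypothetical test list: (test terminal type, sound quality parameter)
# maps to per-layer test results for AEC, NS, and AGC. Scores are made up.
TEST_LIST: Dict[tuple, LayerResults] = {
    ("T1", SoundQuality(8000, 8000, 1)): {
        "application": {"AEC": 0.8, "NS": 0.6, "AGC": 0.7},
        "system": {"AEC": 0.7, "NS": 0.9, "AGC": 0.7},
    },
}


def lookup_test_results(terminal_type: str,
                        quality: SoundQuality) -> Optional[LayerResults]:
    """Return the first ("application") and second ("system") test
    processing results, or None when no matching entry exists."""
    return TEST_LIST.get((terminal_type, quality))


results = lookup_test_results("T1", SoundQuality(8000, 8000, 1))
```

  • Returning `None` for an unknown terminal type leaves room for a different path when the model was not covered by the pre-testing.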
  • Step S102, the application layer controls the switch state of the second optimization component in the second pre-signal processing strategy in the terminal system layer, or controls the switch state of the first optimization component in the first pre-signal processing strategy;
  • That is, according to the signal processing result, the application layer controls turning on or off the second optimization component in the second pre-signal processing strategy in the terminal system layer, or controls turning on or off the first optimization component in the first pre-signal processing strategy. The first pre-signal processing strategy includes at least one first optimization component, and the second pre-signal processing strategy includes at least one second optimization component.
  • the number of the first optimization components included in the first pre-signal processing strategy and the number of the second optimization components included in the second pre-signal processing strategy are the same, eg, three.
  • Each first optimization component in the first pre-signal processing strategy has a second optimization component with the same optimization function in the second pre-signal processing strategy; correspondingly, each second optimization component in the second pre-signal processing strategy has a first optimization component with the same optimization function in the first pre-signal processing strategy.
  • The first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.
  • The first optimization component turned on in the first pre-signal processing strategy has the same optimization function as the second optimization component turned off in the second pre-signal processing strategy, and the second optimization component turned on in the second pre-signal processing strategy has the same optimization function as the first optimization component turned off in the first pre-signal processing strategy.
  • The target user terminal determines, according to the signal processing result, which second optimization components are enabled in the second pre-signal processing strategy and which second optimization components are disabled in the second pre-signal processing strategy.
  • The target user terminal can start a coordination mechanism between the application layer and the terminal system layer of the terminal to which the service application belongs according to the aforementioned signal processing result, and then, based on the coordination mechanism, let the application layer control turning on or off the second optimization component in the second pre-signal processing strategy in the terminal system layer.
  • The target user terminal may use the second optimization component that is turned off in the second pre-signal processing strategy as the first cooperative component and, in the application layer, enable the first optimization component in the first pre-signal processing strategy that has the same optimization function as the first cooperative component.
  • The target user terminal may use the second optimization component that is enabled in the second pre-signal processing strategy as the second cooperative component and, in the application layer, disable the first optimization component in the first pre-signal processing strategy that has the same optimization function as the second cooperative component.
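  • The complementary switch control of step S102 can be sketched as follows. The sketch assumes that the signal processing result is available as a mapping from each optimization function to the layer chosen to run it; the function name `switch_states`, the labels "application" and "system", and the dictionary representation are assumptions for illustration only.

```python
from typing import Dict, Tuple

FUNCTIONS = ("AEC", "NS", "AGC")  # echo cancellation, noise suppression, gain control


def switch_states(chosen_layer: Dict[str, str]) -> Tuple[Dict[str, bool], Dict[str, bool]]:
    """Derive on/off states for the first (application layer) and second
    (terminal system layer) optimization components.

    Each function is enabled in exactly one layer, so the same optimization
    function never runs twice.
    """
    first: Dict[str, bool] = {}
    second: Dict[str, bool] = {}
    for function in FUNCTIONS:
        layer = chosen_layer[function]
        if layer not in ("application", "system"):
            raise ValueError(f"unknown layer for {function}: {layer!r}")
        second[function] = layer == "system"
        # A second component that is off is mirrored by enabling the first
        # component with the same function, and the other way around.
        first[function] = not second[function]
    return first, second


first_on, second_on = switch_states(
    {"AEC": "application", "NS": "system", "AGC": "application"}
)
```

  • Deriving the application layer state from the system layer state in a single place makes the "exactly one layer per function" property hold by construction rather than by a separate check.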
  • The speech optimization algorithms of the first optimization components in the above-mentioned first pre-signal processing strategy may include at least one of the following: a first echo cancellation algorithm for performing echo cancellation at the application layer (the first optimization component corresponding to the first echo cancellation algorithm is the above-mentioned first echo cancellation component); a first noise suppression algorithm for performing noise suppression at the application layer (the corresponding first optimization component is the above-mentioned first noise suppression component); and a first gain control algorithm for performing gain adjustment at the application layer (the corresponding first optimization component is the above-mentioned first gain control component).
  • The speech optimization algorithms of the second optimization components in the above-mentioned second pre-signal processing strategy may include at least one of the following: a second echo cancellation algorithm for performing echo cancellation at the terminal system layer (the second optimization component corresponding to the second echo cancellation algorithm is the above-mentioned second echo cancellation component); a second noise suppression algorithm for performing noise suppression at the terminal system layer (the corresponding second optimization component is the above-mentioned second noise suppression component); and a second gain control algorithm for performing gain adjustment at the terminal system layer (the corresponding second optimization component is the above-mentioned second gain control component).
  • The signal processing result obtained by the target user terminal may be obtained through the following steps. The target user terminal obtains the first echo cancellation result corresponding to the first echo cancellation algorithm from the first test processing result, and obtains the second echo cancellation result corresponding to the second echo cancellation algorithm from the second test processing result. Based on the first echo cancellation result and the second echo cancellation result, it selects the optimal echo cancellation algorithm from the first echo cancellation algorithm and the second echo cancellation algorithm, and uses the optimal echo cancellation algorithm as the first optimal signal processing strategy associated with the sound quality parameter.
  • The target user terminal may also obtain the first noise suppression result corresponding to the first noise suppression algorithm from the first test processing result, and obtain the second noise suppression result corresponding to the second noise suppression algorithm from the second test processing result. Based on the first noise suppression result and the second noise suppression result, it selects the optimal noise suppression algorithm from the first noise suppression algorithm and the second noise suppression algorithm, and uses the optimal noise suppression algorithm as the second optimal signal processing strategy associated with the sound quality parameter.
  • The target user terminal may further obtain the first gain control result corresponding to the first gain control algorithm from the first test processing result, and obtain the second gain control result corresponding to the second gain control algorithm from the second test processing result. Based on the first gain control result and the second gain control result, it selects the optimal gain control algorithm from the first gain control algorithm and the second gain control algorithm, and uses the optimal gain control algorithm as the third optimal signal processing strategy associated with the sound quality parameter.
  • The target user terminal may determine the first optimal signal processing strategy, the second optimal signal processing strategy and the third optimal signal processing strategy as the signal processing results associated with the first pre-signal processing strategy.
  • FIG. 7 is a schematic diagram of a scenario for determining an optimal signal processing strategy associated with a sound quality parameter provided by an embodiment of the present application.
  • The first test processing result 401a shown in FIG. 7 may be the test processing result of the application layer associated with the sound quality parameter D1 in the embodiment corresponding to FIG. 6 (that is, the first test processing result associated with the sound quality parameter D1).
  • The test result 41a in the first test processing result 401a may be the first test result 31a in the above-mentioned embodiment corresponding to FIG. 6; that is, the test result 41a shown in FIG. 7 may be the first echo cancellation result, corresponding to the first echo cancellation algorithm, obtained from the first test processing result 401a.
  • The test result 42a in the first test processing result 401a may be the first test result 32a in the embodiment corresponding to FIG. 6; that is, the test result 42a shown in FIG. 7 may be the first noise suppression result, corresponding to the first noise suppression algorithm, obtained from the first test processing result 401a.
  • The test result 43a in the first test processing result 401a may be the first test result 33a in the above-mentioned embodiment corresponding to FIG. 6; that is, the test result 43a shown in FIG. 7 may be the first gain control result, corresponding to the first gain control algorithm, obtained from the first test processing result 401a.
  • The second test processing result 401b shown in FIG. 7 may be the test processing result of the terminal system layer associated with the sound quality parameter D1 in the embodiment corresponding to FIG. 6 (that is, the second test processing result associated with the sound quality parameter D1).
  • The test result 41b in the second test processing result 401b may be the second test result 31b in the embodiment corresponding to FIG. 6; that is, the test result 41b shown in FIG. 7 may be the second echo cancellation result, corresponding to the second echo cancellation algorithm, obtained from the second test processing result 401b.
  • The test result 42b in the second test processing result 401b may be the second test result 32b in the embodiment corresponding to FIG. 6; that is, the test result 42b shown in FIG. 7 may be the second noise suppression result, corresponding to the second noise suppression algorithm, obtained from the second test processing result 401b.
  • The test result 43b in the second test processing result 401b may be the second test result 33b in the embodiment corresponding to FIG. 6; that is, the test result 43b shown in FIG. 7 may be the second gain control result, corresponding to the second gain control algorithm, obtained from the second test processing result 401b.
  • The specific process by which the target user terminal determines the first optimal signal processing strategy according to the first echo cancellation result (for example, the test result 41a shown in FIG. 7) and the second echo cancellation result (for example, the test result 41b shown in FIG. 7) can be described as follows: the target user terminal can obtain the first echo cancellation result corresponding to the first echo cancellation algorithm from the first test processing result, and obtain the second echo cancellation result corresponding to the second echo cancellation algorithm from the second test processing result; further, the target user terminal may perform a first comparison between the optimized quality corresponding to the first echo cancellation result and the optimized quality corresponding to the second echo cancellation result to obtain a first comparison result.
  • It can be understood that, as shown in FIG. 7, the target user terminal can determine the voice test effect of the first optimization component and the second optimization component with the same optimization function according to the test result 41a and the test result 41b. For example, by comparing the voice test effect V11 of the first echo cancellation component at the application layer with the voice test effect V12 of the second echo cancellation component at the terminal system layer, it can be determined whether the test result 41a is better than the test result 41b.
  • If the first comparison result shown in FIG. 7 indicates that the test result 41a is better than the test result 41b, the optimized quality corresponding to the first echo cancellation result is better than the optimized quality corresponding to the second echo cancellation result, and the first echo cancellation algorithm in the first pre-signal processing strategy can be used as the first optimal signal processing strategy associated with the sound quality parameter. Conversely, if the first comparison result shown in FIG. 7 indicates that the test result 41b is better than the test result 41a, the optimized quality corresponding to the second echo cancellation result is better than the optimized quality corresponding to the first echo cancellation result, and the second echo cancellation algorithm in the second pre-signal processing strategy can be used as the first optimal signal processing strategy associated with the sound quality parameter.
  • Optionally, if the test result 41a is the same as the test result 41b, either the first echo cancellation algorithm in the first pre-signal processing strategy or the second echo cancellation algorithm in the second pre-signal processing strategy can be used as the first optimal signal processing strategy.
  • The specific process by which the target user terminal determines the second optimal signal processing strategy according to the first noise suppression result (for example, the test result 42a shown in FIG. 7) and the second noise suppression result (for example, the test result 42b shown in FIG. 7) can be described as follows: the target user terminal can obtain the first noise suppression result corresponding to the first noise suppression algorithm from the first test processing result, and obtain the second noise suppression result corresponding to the second noise suppression algorithm from the second test processing result; further, the target user terminal may perform a second comparison between the optimized quality corresponding to the first noise suppression result and the optimized quality corresponding to the second noise suppression result to obtain a second comparison result.
  • It can be understood that, as shown in FIG. 7, the target user terminal can determine the voice test effect of each voice optimization component with the same optimization function according to the test result 42a and the test result 42b. For example, by comparing the voice test effect V21 of the first noise suppression component at the application layer with the voice test effect V22 of the second noise suppression component at the terminal system layer, it can be determined whether the test result 42a is better than the test result 42b.
  • If the second comparison result shown in FIG. 7 indicates that the test result 42a is better than the test result 42b, the optimized quality corresponding to the first noise suppression result is better than the optimized quality corresponding to the second noise suppression result, and the first noise suppression algorithm in the first pre-signal processing strategy can be used as the second optimal signal processing strategy associated with the sound quality parameter. Conversely, if the second comparison result shown in FIG. 7 indicates that the test result 42b is better than the test result 42a, the optimized quality corresponding to the second noise suppression result is better than the optimized quality corresponding to the first noise suppression result, and the target user terminal can use the second noise suppression algorithm in the second pre-signal processing strategy as the second optimal signal processing strategy associated with the sound quality parameter. Similarly, optionally, if the test result 42a is the same as the test result 42b, either the first noise suppression algorithm in the first pre-signal processing strategy or the second noise suppression algorithm in the second pre-signal processing strategy may be used as the second optimal signal processing strategy.
  • The specific process by which the target user terminal determines the third optimal signal processing strategy according to the first gain control result (for example, the test result 43a shown in FIG. 7) and the second gain control result (for example, the test result 43b shown in FIG. 7) can be described as follows: the target user terminal can obtain the first gain control result corresponding to the first gain control algorithm from the first test processing result, and obtain the second gain control result corresponding to the second gain control algorithm from the second test processing result; further, the target user terminal may perform a third comparison between the optimized quality corresponding to the first gain control result and the optimized quality corresponding to the second gain control result to obtain a third comparison result.
  • It can be understood that, as shown in FIG. 7, the target user terminal can determine the voice test effect of each voice optimization component with the same optimization function according to the test result 43a and the test result 43b. For example, by comparing the voice test effect V31 of the first gain control component at the application layer with the voice test effect V32 of the second gain control component at the terminal system layer, it can be determined whether the test result 43a is better than the test result 43b.
  • If the third comparison result shown in FIG. 7 indicates that the test result 43a is better than the test result 43b, the optimized quality corresponding to the first gain control result is better than the optimized quality corresponding to the second gain control result, and the first gain control algorithm in the first pre-signal processing strategy can be used as the third optimal signal processing strategy associated with the sound quality parameter. Conversely, if the third comparison result shown in FIG. 7 indicates that the test result 43b is better than the test result 43a, the optimized quality corresponding to the second gain control result is better than the optimized quality corresponding to the first gain control result, and the second gain control algorithm in the second pre-signal processing strategy can be used as the third optimal signal processing strategy associated with the sound quality parameter.
  • Similarly, optionally, if the test result 43a is the same as the test result 43b, either the first gain control algorithm in the first pre-signal processing strategy or the second gain control algorithm in the second pre-signal processing strategy can be used as the third optimal signal processing strategy.
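  • The three comparisons described above can be sketched as follows. This is a minimal illustration, not the method as claimed: it assumes that each test result can be reduced to a single numeric quality score (higher is better), which the description does not specify. The names `select_optimal`, `first_test` and `second_test` and all score values are hypothetical. On a tie the description allows either algorithm; this sketch arbitrarily keeps the application-layer one.

```python
from typing import Dict

FIRST = "first"    # algorithm in the first pre-signal processing strategy (application layer)
SECOND = "second"  # algorithm in the second pre-signal processing strategy (terminal system layer)


def select_optimal(first_results: Dict[str, float],
                   second_results: Dict[str, float]) -> Dict[str, str]:
    """Pick, per optimization function, the algorithm with the better test result.

    Scores are hypothetical quality values; a higher score means better quality.
    """
    optimal = {}
    for function, first_quality in first_results.items():
        second_quality = second_results[function]
        if second_quality > first_quality:
            optimal[function] = SECOND
        else:
            # Covers "first is better" and the tie case (either choice is allowed).
            optimal[function] = FIRST
    return optimal


# Hypothetical scores standing in for test results 41a/42a/43a and 41b/42b/43b.
first_test = {"echo_cancellation": 0.90, "noise_suppression": 0.60, "gain_control": 0.70}
second_test = {"echo_cancellation": 0.80, "noise_suppression": 0.75, "gain_control": 0.70}

signal_processing_result = select_optimal(first_test, second_test)
print(signal_processing_result)
```

  • With these made-up scores, echo cancellation and gain control stay at the application layer and noise suppression is taken from the terminal system layer.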
  • The target user terminal can determine that the current terminal type belongs to a new model. In this case, when the uplink voice data of the first user (for example, the above-mentioned voice data R3) is obtained through the microphone in the game voice mode, the target user terminal can perform real-time voice optimization on the uplink voice data through the first pre-signal processing strategy to obtain the first voice optimization result in real time, and can perform real-time voice optimization on the uplink voice data through the second pre-signal processing strategy to obtain the second voice optimization result in real time. Further, the target user terminal may, based on the first voice optimization result and the second voice optimization result, analyze the voice optimization effects of the voice optimization components with the same optimization function.
  • When the target user terminal determines that its own model does not belong to a new model, it can, in the above-mentioned game scenario, perform real-time voice optimization on the above-mentioned voice data of the first user obtained in real time through each voice optimization control in the application layer, and thereby obtain the first voice optimization result corresponding to each voice optimization control in the application layer.
  • Similarly, in the above-mentioned game scenario, the target user terminal can also perform real-time voice optimization on the above-mentioned voice data of the first user obtained in real time through each voice optimization control in the terminal system layer, and thereby obtain the second voice optimization result corresponding to each voice optimization control in the terminal system layer.
  • For the specific way in which the target user terminal compares the voice optimization effects of the voice optimization components with the same optimization function, reference may be made to the above description of comparing the voice test effects of the voice optimization components with the same optimization function, which will not be repeated here.
  • the first optimization component in the first pre-signal processing strategy may include at least one of the following: the above-mentioned first echo cancellation component, the above-mentioned first noise suppression component, and the above-mentioned first gain control component.
  • the second optimization component in the second pre-signal processing strategy may include at least one of the following: the above-mentioned second echo cancellation component, the above-mentioned second noise suppression component, and the above-mentioned second gain control component.
  • Both the first echo cancellation component and the second echo cancellation component can be used for echo cancellation; both the first noise suppression component and the second noise suppression component can be used for noise suppression; and both the first gain control component and the second gain control component can be used for gain adjustment.
  • The embodiment of the present application proposes that, in the game voice mode, corresponding switches can be provided for the application layer to control the turning on and off of each part of the pre-signal processing scheme (that is, each voice optimization component), so as to ensure that voice optimization components with the same optimization function run either at the application layer or at the terminal system layer. In this way, during real-time voice optimization in the game scenario (that is, real-time human voice optimization), the performance consumption of the whole voice optimization process can be reduced, and the voice interaction experience in the game scenario can be improved.
  • In addition, in the game voice mode, the embodiment of the present application can also avoid wasting terminal system resources (for example, computing resources of the CPU (Central Processing Unit)), so that the power consumption of the terminal can be effectively reduced.
  • FIG. 8 is a schematic diagram of a scenario for controlling each voice optimization component in a voice pre-signal processing solution provided by an embodiment of the present application to be turned on and off.
  • The voice pre-signal processing solution here can be the related processing performed by the target user terminal to improve the clarity and loudness of the uplink voice data; for example, the related processing can include echo cancellation, noise suppression, automatic gain, and so on.
  • A voice pre-signal processing scheme that includes the above-mentioned first pre-signal processing strategy and the above-mentioned second pre-signal processing strategy is taken as an example to illustrate the specific process of controlling, in the application layer, the turning on and off of each voice optimization component in the voice pre-signal processing scheme.
  • The application layer 601a shown in FIG. 8 may be the application layer of the above-mentioned service application, and the voice pre-signal processing scheme corresponding to the application layer 601a may be the above-mentioned first pre-signal processing strategy.
  • In this way, the first optimization components in the first pre-signal processing strategy include at least the voice optimization component 61a, the voice optimization component 62a, and the voice optimization component 63a shown in FIG. 8.
  • The voice optimization component 61a shown in FIG. 8 may be the above-mentioned first echo cancellation component for performing echo cancellation; similarly, the voice optimization component 62a shown in FIG. 8 may be the above-mentioned first noise suppression component for performing noise suppression; similarly, the voice optimization component 63a shown in FIG. 8 may be the above-mentioned first gain control component for performing gain adjustment.
  • The terminal system layer 602a shown in FIG. 8 may be the bottom system layer of the terminal to which the above-mentioned service application belongs (that is, the above-mentioned target user terminal), and the voice pre-signal processing scheme corresponding to the terminal system layer 602a may be the above-mentioned second pre-signal processing strategy. In this way, the second optimization components in the second pre-signal processing strategy include at least the voice optimization component 61b, the voice optimization component 62b, and the voice optimization component 63b shown in FIG. 8.
  • The voice optimization component 61b shown in FIG. 8 may be the above-mentioned second echo cancellation component for performing echo cancellation; similarly, the voice optimization component 62b shown in FIG. 8 may be the above-mentioned second noise suppression component for performing noise suppression; similarly, the voice optimization component 63b shown in FIG. 8 may be the above-mentioned second gain control component for performing gain adjustment.
  • Corresponding switches may be provided in the application layer 601a shown in FIG. 8 to help the application layer 601a control the turning on and off of each voice optimization component in the terminal system layer 602a shown in FIG. 8.
  • The switch K11 in the application layer 601a shown in FIG. 8 can be used to control the voice optimization component 61a shown in FIG. 8, and the switch K12 in the application layer can be used to control the voice optimization component 61b in the terminal system layer 602a shown in FIG. 8. It can be understood that, since the voice optimization component 61a in the application layer 601a has the same optimization function as the voice optimization component 61b in the terminal system layer 602a, the target user terminal can, based on the coordination mechanism (also referred to as a negotiation mechanism) between the application layer 601a and the terminal system layer 602a, choose in the application layer 601a whether to turn on (or turn off) the voice optimization component 61b in the second pre-signal processing strategy in the terminal system layer 602a.
  • For example, as shown in FIG. 8, the target user terminal can control, in the application layer 601a, the enabling of the voice optimization component 61a in the first pre-signal processing strategy. That is, the target user terminal can generate a first control instruction for controlling the service switch 64a to close the switch K11 and disconnect the switch K12. The first control instruction can be used to instruct the target user terminal to use the second optimization component that is turned off in the second pre-signal processing strategy (for example, the voice optimization component 61b in FIG. 8) as the first cooperative component, and to enable, in the first pre-signal processing strategy, the first optimization component (for example, the voice optimization component 61a shown in FIG. 8) that has the same optimization function as the first cooperative component.
  • The switch K21 in the application layer 601a can be used to control the voice optimization component 62a shown in FIG. 8, and the switch K22 in the application layer can be used to control the voice optimization component 62b in the terminal system layer 602a shown in FIG. 8. It can be understood that, since the voice optimization component 62a in the application layer 601a has the same optimization function as the voice optimization component 62b in the terminal system layer 602a, the target user terminal can, based on the coordination mechanism (also referred to as a negotiation mechanism) between the application layer 601a and the terminal system layer 602a, choose in the application layer 601a whether to turn on (or turn off) the voice optimization component 62b in the second pre-signal processing strategy in the terminal system layer 602a.
  • For example, as shown in FIG. 8, the target user terminal may control, in the application layer 601a, the enabling of the voice optimization component 62b in the second pre-signal processing strategy. That is, the target user terminal may generate a second control instruction for controlling the service switch 64b to close the switch K22 and disconnect the switch K21. The second control instruction can be used to instruct the target user terminal to use the second optimization component that is turned on in the second pre-signal processing strategy (for example, the voice optimization component 62b in FIG. 8) as the second coordination component, and to turn off, in the first pre-signal processing strategy, the first optimization component (for example, the voice optimization component 62a shown in FIG. 8) that has the same optimization function as the second coordination component.
  • The switch K31 in the application layer 601a can be used to control the voice optimization component 63a shown in FIG. 8, and the switch K32 in the application layer can be used to control the voice optimization component 63b in the terminal system layer 602a shown in FIG. 8. Since the voice optimization component 63a has the same optimization function as the voice optimization component 63b, the target user terminal can, based on the coordination mechanism (also referred to as a negotiation mechanism) between the application layer 601a and the terminal system layer 602a, choose in the application layer 601a whether to turn on (or turn off) the voice optimization component 63b in the terminal system layer 602a.
  • For example, the target user terminal can generate a third control instruction for controlling the service switch 64c to close the switch K31 and disconnect the switch K32. In this case, the target user terminal uses the second optimization component that is turned off in the second pre-signal processing strategy (for example, the voice optimization component 63b in FIG. 8) as a new first cooperative component, and enables, in the first pre-signal processing strategy, the first optimization component (for example, the voice optimization component 63a shown in FIG. 8) that has the same optimization function as the new first cooperative component.
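  • The switch arrangement of FIG. 8 can be modelled roughly as below. This is only a sketch under assumptions: the class `ServiceSwitch` and its fields are hypothetical, and a closed switch is taken to mean that the component it controls is running. The point it illustrates is that each service switch keeps exactly one of its two switches closed, so that a given optimization function runs at only one layer.

```python
from dataclasses import dataclass


@dataclass
class ServiceSwitch:
    """Hypothetical model of one service switch (for example 64a) and its switch pair."""
    name: str
    app_switch: str      # for example "K11": controls the application-layer component
    system_switch: str   # for example "K12": controls the terminal-system-layer component
    app_closed: bool = False
    system_closed: bool = False

    def apply(self, run_at_application_layer: bool) -> None:
        """Apply a control instruction: close one switch of the pair, disconnect the other."""
        self.app_closed = run_at_application_layer
        self.system_closed = not run_at_application_layer

    def active_layer(self) -> str:
        """Layer whose component is running (meaningful after apply() has been called)."""
        return "application" if self.app_closed else "system"


switches = {
    "echo_cancellation": ServiceSwitch("64a", "K11", "K12"),
    "noise_suppression": ServiceSwitch("64b", "K21", "K22"),
    "gain_control": ServiceSwitch("64c", "K31", "K32"),
}

# The three control instructions of the FIG. 8 example.
switches["echo_cancellation"].apply(run_at_application_layer=True)    # close K11, disconnect K12
switches["noise_suppression"].apply(run_at_application_layer=False)   # close K22, disconnect K21
switches["gain_control"].apply(run_at_application_layer=True)         # close K31, disconnect K32

active = {function: switch.active_layer() for function, switch in switches.items()}
print(active)
```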
  • Step S103, acquiring the uplink voice data of the first user corresponding to the service application in the game voice mode, and performing voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.
  • Based on the first optimization component enabled in the above-mentioned first pre-signal processing strategy and the second optimization component enabled in the above-mentioned second pre-signal processing strategy, the target user terminal can further perform voice optimization on the uplink voice data of the first user collected in real time in the game scenario, so as to ensure the clarity and loudness of the uplink voice data currently entered into the target user terminal.
  • In this way, when the target user terminal is in the game voice mode, the voice of the first user can be transmitted to the communication peer (that is, the terminal corresponding to the third user) with higher clarity and loudness. The downlink voice data played by the speaker of the communication peer may thus be the voice of the first user after voice optimization processing.
  • In the game voice mode, a computer device may acquire the signal processing result associated with the first pre-signal processing strategy in the application layer of the service application. It can be understood that each first optimization component in the first pre-signal processing strategy has the same optimization function as the corresponding second optimization component in the second pre-signal processing strategy. Therefore, in the subsequent process of real-time game voice processing (that is, voice optimization of the uplink voice data), the repeated running of voice optimization components with the same function can be effectively avoided in the game voice mode.
  • The embodiment of the present application proposes that one or more second optimization components in the terminal system layer can be controlled to be turned on or off at the application layer according to the aforementioned signal processing result (that is, the algorithm comparison result corresponding to the voice optimization components with the same function), so that voice optimization components with the same function run either at the game application layer or at the terminal system layer, and the sound quality damage to the uplink voice data is reduced at its source. It can be understood that the number and type of the second optimization components that are enabled or disabled in the terminal system layer are not limited here.
  • When the computer device acquires the uplink voice data of the first user in the game voice mode, it can quickly perform voice optimization on the collected uplink voice data based on the first optimization component that is turned on and the second optimization component that is turned on, which can improve the voice optimization effect in the game scenario while reducing the sound quality damage.
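  • To make the "run each function at exactly one layer" idea concrete, the sketch below passes one frame of uplink samples through only the enabled components. Everything here is an assumption for illustration: the stage functions `noise_gate` and `normalize_gain` are toy placeholders rather than real noise suppression or gain control algorithms, and the processing order (terminal system layer first, then application layer) is assumed, not stated in the description.

```python
from typing import Callable, Dict, List, Tuple

Frame = List[float]
Stage = Callable[[Frame], Frame]


def noise_gate(frame: Frame, threshold: float = 0.05) -> Frame:
    """Toy stand-in for noise suppression: zero out very small samples."""
    return [0.0 if abs(sample) < threshold else sample for sample in frame]


def normalize_gain(frame: Frame, target_peak: float = 0.5) -> Frame:
    """Toy stand-in for gain control: scale the frame so its peak equals target_peak."""
    peak = max((abs(sample) for sample in frame), default=0.0)
    if peak == 0.0:
        return list(frame)
    return [sample * target_peak / peak for sample in frame]


def optimize_uplink(frame: Frame,
                    system_stages: Dict[str, Stage], system_enabled: Dict[str, bool],
                    app_stages: Dict[str, Stage], app_enabled: Dict[str, bool]
                    ) -> Tuple[Frame, List[Tuple[str, str]]]:
    """Run only the enabled components and record which (layer, function) pairs ran."""
    applied = []
    layers = (("system", system_stages, system_enabled),
              ("application", app_stages, app_enabled))
    for layer, stages, enabled in layers:
        for function, stage in stages.items():
            if enabled.get(function, False):
                frame = stage(frame)
                applied.append((layer, function))
    return frame, applied


stages = {"noise_suppression": noise_gate, "gain_control": normalize_gain}
system_enabled = {"noise_suppression": True, "gain_control": False}
app_enabled = {"noise_suppression": False, "gain_control": True}

optimized, applied = optimize_uplink([0.01, -0.2, 0.1, -0.02],
                                     stages, system_enabled, stages, app_enabled)
print(applied)
print(optimized)
```

  • Because the two enable maps are complementary, each function appears once in `applied`, which is the property the coordination mechanism is meant to guarantee.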
  • FIG. 9 is a schematic diagram of an audio data processing method provided by an embodiment of the present application.
  • The method may be executed by a user terminal (for example, a target user terminal, which may be the user terminal 3000a shown in FIG. 1), and may specifically include at least one of the following steps S201 to S213:
  • Step S201, when the first user accesses the service application, obtaining a system resource package for loading the service application, and parsing the system resource package to obtain system resource data of the service application;
  • Step S202 initialize the system resource data, and initially configure the service mode of the service application as the system media mode based on the system resource data after the initialization process.
  • FIG. 10 is a schematic diagram of a scenario of a resource configuration interface provided by an embodiment of the present application. It can be understood that, in a game scenario, the game user A shown in FIG. 10 may be the user 1 in the embodiment corresponding to FIG. 4 above.
  • The system resource package for loading the service application can be obtained from the service server shown in FIG. 10, and the obtained system resource package can then be parsed by the encoder in the target user terminal to obtain the system resource data of the service application. Further, the target user terminal can also initialize the system resource data, and can then output the resource configuration interface in FIG. 10 based on the system resource data after initialization processing. As shown in FIG. 10, the resource configuration interface can be used to dynamically output the multimedia data in the system resource data after initialization processing; the multimedia data here may include, but is not limited to, the image frames and audio frames shown in FIG. 10.
  • This embodiment of the present application may initially configure the service mode of the service application as the system media mode based on the system resource data after initialization processing, so that the media audio data shown in FIG. 10 (that is, the aforementioned audio frame data and video frame data) can be played through the speaker in the resource configuration interface shown in FIG. 10.
  • The target user terminal can also perform the following step S203, so that the display interface of the service application can be switched from the resource configuration interface 800a shown in FIG. 10 to an application display interface including a voice control.
  • The service mode of the service application can be switched from the current system media mode to the above-mentioned game voice mode, so that voice interaction can be performed in the game voice mode.
  • Step S203 outputting the application display interface of the service application based on the system resource data after the initialization process
  • the application display interface includes a voice control for instructing the first user to initiate a voice interaction service.
  • Step S204 detecting the application type of the service application in response to the first user's voice-on operation for the voice control
  • Step S205 when detecting that the application type of the business application is a game type, generate a first voice call instruction associated with the game type, and then switch the business mode of the business application from the system media mode to the game based on the first voice call instruction voice mode.
  • the target user terminal may directly switch the service mode of the service application from the system media mode to the game voice mode when detecting that the application type of the service application is a game type.
  • Step S206 in the game voice mode, obtain the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application;
  • Step S207 at the application layer, control the turning on and off of the second optimization component in the second pre-signal processing strategy in the terminal system layer, or control the turning on and off of the first optimization component in the first pre-signal processing strategy;
  • Step S208 acquiring the uplink voice data of the first user corresponding to the service application in the game voice mode, based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization enabled in the second pre-signal processing strategy A component that optimizes the upstream voice data in the game voice mode.
  • For the specific implementation of steps S205 to S208, reference may be made to the description of steps S101 to S104 in the embodiment corresponding to FIG. 5, which will not be repeated here.
  • Step S209 using the uplink voice data after voice optimization as the target voice optimization result corresponding to the uplink voice data;
  • Step S210 the target voice optimization result is sent to the terminal corresponding to the third user associated with the first user, so that the terminal corresponding to the third user plays the voice-optimized uplink voice data through the speaker in the game voice mode;
  • both the first user and the third user are game users in the same game camp in the game voice mode.
  • If the target user terminal detects that the application type of the currently running business application is a non-game type, it can instead jump to the following steps S211 to S213, so that the above-mentioned first user can make a system call with other users (e.g., the second user) in the system call mode.
  • Step S211 when detecting that the application type of the business application is a non-game type, generate a second voice call instruction associated with the non-game type, and switch the business mode of the business application from the system media mode to the system call mode based on the second voice call instruction;
  • Step S212 when determining the call type of the voice interactive service as the system call type based on the system call mode, sending a system call request corresponding to the system call type to the second user through the service application;
  • the second user is a user selected by the first user in the service application and requesting to make a system call
  • Step S213 when the second user responds to the system call request, establish a system communication channel between the first user and the second user, and conduct a system call based on the system communication channel.
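  • The mode switching in steps S204, S205 and S211 can be pictured as a small state transition. The following is only a minimal sketch under stated assumptions: the names `ServiceMode` and `switch_on_voice_enabled` and the string values for the application type are hypothetical and do not come from the embodiment.

```python
from enum import Enum


class ServiceMode(Enum):
    """The three service modes named in the description."""
    SYSTEM_MEDIA = "system_media"
    GAME_VOICE = "game_voice"
    SYSTEM_CALL = "system_call"


def switch_on_voice_enabled(current: ServiceMode, app_type: str) -> ServiceMode:
    """Pick the next mode when the first user turns on the voice control.

    app_type is a hypothetical string label; "game" stands for the game type,
    anything else is treated as a non-game type.
    """
    if current is not ServiceMode.SYSTEM_MEDIA:
        # Assumption: a switch is only made from the initial system media mode.
        return current
    if app_type == "game":
        return ServiceMode.GAME_VOICE   # game type: steps S204-S205
    return ServiceMode.SYSTEM_CALL      # non-game type: step S211
```

  • For instance, calling `switch_on_voice_enabled(ServiceMode.SYSTEM_MEDIA, "game")` yields the game voice mode, while any non-game label yields the system call mode.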
  • step S1 shown in FIG. 11 may be executed to initialize system resources.
  • the target user terminal may analyze the above
  • step S2 shown in FIG. 11 can be performed according to the system resource data after the initialization process, so that the target user terminal enters the system media mode by default.
  • The target user terminal can initially configure the business mode of the service application as the system media mode.
  • step S3 shown in FIG. 11 may be performed to initiate a voice call at the application layer of the target user terminal.
  • The target user terminal may execute step S4 shown in FIG. 11 to determine the application type of the service application that initiates the voice call. If the application type of the service application is a game type, step S5 shown in FIG. 11 may be executed to enter the game voice mode, that is, the first user can make a voice call with other users (for example, the above-mentioned third user) in the game scene in the game voice mode. Otherwise, step S11 shown in FIG. 11 may be executed to enter the system call mode, that is, the first user can make a system call with other users (for example, the above-mentioned second user) in a non-game scenario in the system call mode.
  • The target user terminal may further perform step S6 to set the voice sampling rate of the target user terminal (for example, the uplink and downlink sampling rates) and the number of channels, so as to ensure the uplink and downlink voice quality; the voice sampling rate and the number of channels here can be the above sound quality parameters.
  • The target user terminal can further perform step S7, that is, the target user terminal can enable the pre-voice processing algorithm of the application layer according to the above-mentioned algorithm comparison effect, and disable the pre-voice processing algorithm of the terminal system layer.
  • the target user terminal may also turn off the pre-voice processing algorithm of the application layer while enabling the pre-voice processing algorithm of the terminal system layer.
  • Voice optimization components with the same optimization function in the target user terminal work either at the application layer or at the terminal system layer. That is, the embodiment of the present application can ensure, as far as possible, that for a first optimization component at the application layer and a second optimization component at the terminal system layer that have the same optimization function, the voice processing algorithm of only one of the two components is working at any given time, so that power consumption can be reduced as much as possible and the best voice quality can be provided.
  • A multi-terminal game voice call can be made in the game voice scene. That is, in the game voice scene, the target user terminal can optimize the uplink voice data of the first user collected in real time through the first optimization component and the second optimization component determined by the above negotiation, and can then send the optimized voice of the first user to other users.
  • step S9 shown in FIG.
  • the business mode of the business application can be switched from the aforementioned game voice mode back to the system media mode shown in FIG. 11 .
  • the optimized-processed voices of other users transmitted by the terminals corresponding to other users may be played through the system media mode.
  • The first user corresponding to the target user terminal may hear, in the system media mode, the optimized voices of other users (that is, the above-mentioned third user).
  • The first user can turn off the voice control so that voice optimization is no longer performed on the uplink voice data of the first user, that is, the first user does not need to send the optimized voice of the first user to other users in the game scene at this time.
  • step S10 shown in FIG. 11 may be executed to exit the current game system.
  • the target user terminal may release the relevant system resource data.
  • When the first user listens to music in the target user terminal, the target user terminal can work in the above-mentioned system media mode; when the first user makes a phone call in the target user terminal, the target user terminal can work in the above-mentioned system call mode; optionally, when the first user makes a game voice call in the target user terminal, the target user terminal can work in the above-mentioned game voice mode.
  • the voice interaction system involved in the embodiments of the present application may include the following two modules.
  • One module is the game voice mode in the target user terminal, which may exist in parallel with the aforementioned system call mode and system media mode in the target user terminal. .
  • The uplink and downlink voice sampling rates and the numbers of channels configured based on the sound quality index of the target user terminal do not affect each other.
  • Another module is the pre-signal processing scheme running at the application layer.
  • the target user terminal can intelligently adjust the pre-signal processing scheme of the application layer according to the voice processing effect of the terminal system layer. In this way, through the cooperative work of the two modules, the target user terminal can improve the voice interaction experience between game users in the game scene.
  • The computer device enters the game voice mode when detecting that the application type of the business application is the game type, and then, in the game voice mode, can adaptively control at the application layer, according to the aforementioned signal processing results (that is, the comparison results of the algorithms corresponding to the voice optimization components with the same function), the turning on or off of one or more second optimization components in the terminal system layer, so that voice optimization components with the same optimization function run either at the game application layer or at the terminal system layer, which reduces the sound quality damage to the uplink voice data at the source. It can be understood that the number and type of second optimization components that are turned on or off in the terminal system layer are not limited here.
  • When acquiring the uplink voice data of the first user in the game voice mode, the computer device (for example, the target user terminal) can also quickly perform voice optimization on the uplink voice data in the game voice mode based on the first optimization component that is turned on and the second optimization component that is turned on, which can improve the voice optimization effect in game scenarios while reducing sound quality damage.
  • The embodiment of the present application may also enter the system call mode when it is detected that the application type of the service application is a non-game type, so that the first user can make a system call with other users in the system call mode.
  • FIG. 12 is a schematic flowchart of another audio data processing method provided by an embodiment of the present application.
  • The method is executed by a computer device. For example, the method can be executed by a user terminal (for example, the above-mentioned target user terminal, which can be the user terminal 10a in the above-mentioned embodiment corresponding to FIG. 4), can be executed by a service server (for example, the above-mentioned service server 2000 shown in FIG. 1), or can be executed by the user terminal and the service server interacting and cooperating.
  • this embodiment is described by taking the method being executed by a user terminal as an example.
  • the audio data processing method may include at least one of the following steps
  • Step S301 in the game voice mode, obtain the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application;
  • Step S302 according to the signal processing result, the application layer controls the switch state of the second optimization component in the second pre-signal processing strategy in the terminal system layer.
  • the first pre-signal processing strategy includes at least one first optimization component
  • the second pre-signal processing strategy includes at least one second optimization component
  • According to the signal processing result, the application layer determines the second optimization component that needs to be turned on in the second pre-signal processing strategy in the terminal system layer, and/or, according to the signal processing result, the application layer determines the second optimization component that needs to be turned off in the second pre-signal processing strategy in the terminal system layer.
  • For a second optimization component that needs to be turned on, the application layer controls the second optimization component to be turned on if the second optimization component is in the off state.
  • For a second optimization component that needs to be turned off, the application layer controls the second optimization component to be closed if its current state is the on state, and if the current state of the second optimization component is the closed state, the second optimization component is kept closed.
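  • The state-dependent control described above (switch a component only when its current state differs from the required state, otherwise keep it as it is) can be sketched as follows. This is an illustrative sketch only: the function name `apply_switch_states`, the component labels, and the representation of states as booleans are assumptions made for the example.

```python
from typing import Dict, List


def apply_switch_states(current: Dict[str, bool], desired: Dict[str, bool]) -> List[str]:
    """Bring second optimization components to their desired on/off state.

    current maps a hypothetical component label (e.g. "aec") to True (on) or
    False (off); a component missing from current is assumed to be off.
    Returns the list of control actions actually issued.
    """
    actions = []
    for name, want_on in desired.items():
        if current.get(name, False) == want_on:
            continue  # already in the required state: keep it unchanged
        current[name] = want_on
        actions.append(("open " if want_on else "close ") + name)
    return actions
```

  • A component that is already in the required state produces no action, which matches the "keep on" and "keep closed" branches of the description.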
  • the switching state of the first optimization component in the first pre-signal processing strategy is controlled at the application layer.
  • The above step S302 includes: according to the signal processing result, determining the second optimization component that is enabled in the second pre-signal processing strategy, and determining the second optimization component that is disabled in the second pre-signal processing strategy;
  • the second optimization component that is turned off in the second pre-signal processing strategy is used as the first cooperative component, and the first optimization component that has the same optimization function as the first cooperative component is enabled in the first pre-signal processing strategy.
  • The above step S301 includes: acquiring the terminal type of the terminal to which the service application belongs, and searching for a test type that matches the terminal type in a test list associated with the service application; if a test type matching the terminal type is found in the test list, obtaining from the test list, based on the sound quality parameter, the first test processing result obtained by adopting the first pre-signal processing strategy and the second test processing result obtained by adopting the second pre-signal processing strategy; based on the first test processing result and the second test processing result, determining the optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and taking the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.
  • When the uplink voice data of the first user is acquired through the microphone in the game voice mode, voice optimization is performed on the uplink voice data through the first pre-signal processing strategy to obtain a first voice optimization result, and voice optimization is performed on the uplink voice data through the second pre-signal processing strategy to obtain a second voice optimization result; based on the first voice optimization result and the second voice optimization result, the optimal signal processing strategy associated with the sound quality parameter is determined from the first pre-signal processing strategy and the second pre-signal processing strategy, and the optimal signal processing strategy is taken as the signal processing result associated with the first pre-signal processing strategy.
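  • The two ways of obtaining comparison results described above (looking them up in the test list by terminal type, or measuring them on live uplink voice data) can be sketched as below. This is a sketch under explicit assumptions: the test list is modeled as a dictionary keyed by terminal type, quality results are modeled as numeric scores, and the live measurement is assumed to be used only when no matching test type is found; the names `Scores`, `TestList`, `get_processing_results` and `measure_live` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model: optimization function label -> quality score.
Scores = Dict[str, float]
# Hypothetical test list: terminal type -> (first results, second results).
TestList = Dict[str, Tuple[Scores, Scores]]


def get_processing_results(
    terminal_type: str,
    test_list: TestList,
    measure_live: Callable[[], Tuple[Scores, Scores]],
) -> Tuple[Scores, Scores]:
    """Return (application-layer results, terminal-system-layer results)."""
    entry = test_list.get(terminal_type)
    if entry is not None:
        # A matching test type exists: reuse the pre-computed test results.
        return entry
    # Assumed fallback: compare both strategies on live uplink voice data.
    return measure_live()
```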
  • The above method further includes: acquiring the uplink voice data of the first user corresponding to the service application in the game voice mode, and performing voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.
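  • As a rough illustration of optimizing uplink voice data with only the enabled components of both layers, the sketch below chains the enabled components over one frame of samples. It is not the embodiment's implementation: the processing order (terminal system layer first, then application layer), the `Frame` type, and the toy components used in the usage note are all assumptions.

```python
from typing import Callable, Dict, List

Frame = List[float]                    # hypothetical: one frame of samples
Component = Callable[[Frame], Frame]   # hypothetical optimization component


def optimize_uplink(
    frame: Frame,
    second_components: Dict[str, Component],
    second_enabled: Dict[str, bool],
    first_components: Dict[str, Component],
    first_enabled: Dict[str, bool],
) -> Frame:
    """Run only the enabled components of each layer over one uplink frame."""
    # Assumed order: terminal system layer first, then application layer.
    layers = ((second_components, second_enabled), (first_components, first_enabled))
    for components, enabled in layers:
        for function, component in components.items():
            if enabled.get(function, False):
                frame = component(frame)
    return frame
```

  • With a toy gain component enabled at the terminal system layer and a toy noise component enabled at the application layer, each frame passes through exactly those two components and skips every disabled one.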
  • the above method further includes: when the first user accesses the business application, acquiring a system resource package for loading the business application, and analyzing the system resource package to obtain system resource data of the business application;
  • the system resource data is initialized, and the service mode of the service application is initially configured as the system media mode based on the system resource data after the initialization process.
  • the above-mentioned method further includes: using the voice-optimized uplink voice data as a target voice optimization result corresponding to the uplink voice data; sending the target voice optimization result to a terminal corresponding to a third user associated with the first user , so that the terminal corresponding to the third user plays the voice-optimized uplink voice data through the speaker in the game voice mode.
  • The application layer of the business application has the right to control the switch state of the voice optimization component in the terminal system layer, so that the business application can flexibly control the switch state of the voice optimization component in the terminal system layer based on actual business requests or needs, to ensure the voice optimization effect in this mode.
  • FIG. 13 is a schematic flowchart of another audio data processing method provided by an embodiment of the present application.
  • The method is executed by a computer device. For example, the method can be executed by a user terminal (for example, the above-mentioned target user terminal, which can be the user terminal 10a in the above-mentioned embodiment corresponding to FIG. 4), can be executed by a service server (for example, the above-mentioned service server 2000 shown in FIG. 1), or can be executed by the user terminal and the service server interacting and cooperating.
  • this embodiment is described by taking the method being executed by a user terminal as an example.
  • the audio data processing method may include at least one of the following steps
  • Step S401 in the game voice mode, obtain the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application;
  • Step S402 according to the signal processing result, control the switch state of the second optimization component in the second pre-signal processing strategy in the terminal system layer, or control the switch state of the first optimization component in the first pre-signal processing strategy;
  • the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.
  • the first pre-signal processing strategy includes at least one first optimization component
  • the second pre-signal processing strategy includes at least one second optimization component
  • The first optimization component turned on in the first pre-signal processing strategy has the same optimization function as the second optimization component turned off in the second pre-signal processing strategy, and the second optimization component turned on in the second pre-signal processing strategy has the same optimization function as the first optimization component turned off in the first pre-signal processing strategy.
  • step S402 may be executed by the application layer of the service application, or may be executed by the terminal system layer, or may be executed jointly by the application layer and the terminal system layer.
  • the switching state of the first optimization component in the first pre-signal processing strategy is controlled by the application layer
  • the switching state of the second optimization component in the second pre-signal processing strategy is controlled by the terminal system layer.
  • The signal processing results need to be synchronized between the application layer and the terminal system layer, or the first optimization component and/or the second optimization component that needs to be turned on and off needs to be synchronized.
  • The above step S402 includes: according to the signal processing result, determining the second optimization component to be enabled in the second pre-signal processing strategy, and determining the second optimization component to be disabled in the second pre-signal processing strategy;
  • the second optimization component to be disabled in the second pre-signal processing strategy is closed, and the first optimization component in the first pre-signal processing strategy that has the same optimization function as the closed second optimization component is enabled;
  • the second optimization component to be enabled in the second pre-signal processing strategy is enabled, and the first optimization component in the first pre-signal processing strategy that has the same optimization function as the enabled second optimization component is disabled.
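  • The complementary relationship in step S402 can be sketched as deriving the application-layer switch states from the terminal-system-layer decision. The sketch assumes that every optimization function exists in both layers and that states are plain booleans keyed by a hypothetical function label; `plan_first_components` and `is_complementary` are illustrative names.

```python
from typing import Dict


def plan_first_components(second_enabled: Dict[str, bool]) -> Dict[str, bool]:
    """Enable a first optimization component exactly where the second
    optimization component with the same function is disabled."""
    return {function: not enabled for function, enabled in second_enabled.items()}


def is_complementary(first_enabled: Dict[str, bool], second_enabled: Dict[str, bool]) -> bool:
    """Check that each function is handled by exactly one of the two layers."""
    return first_enabled.keys() == second_enabled.keys() and all(
        first_enabled[function] != second_enabled[function]
        for function in second_enabled
    )
```

  • When the two layers control their own components separately, a check like `is_complementary` is one possible way to confirm that the synchronized decisions do not leave a function running in both layers or in neither.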
  • For the second optimization component that needs to be turned on in the second pre-signal processing strategy, if the current state of the second optimization component is the closed state, the second optimization component is controlled to open, and if the current state is on, the second optimization component is kept on; for the second optimization component that needs to be turned off in the second pre-signal processing strategy, if the current state of the second optimization component is on, the second optimization component is controlled to close, and if the current state of the second optimization component is the closed state, the second optimization component is kept closed.
  • The above step S401 includes: acquiring the terminal type of the terminal to which the service application belongs, and searching for a test type matching the terminal type in a test list associated with the service application; if a test type matching the terminal type is found in the test list, obtaining from the test list, based on the sound quality parameter, the first test processing result obtained by adopting the first pre-signal processing strategy and the second test processing result obtained by adopting the second pre-signal processing strategy; based on the first test processing result and the second test processing result, determining the optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and taking the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.
  • When the uplink voice data of the first user is acquired through the microphone in the game voice mode, voice optimization is performed on the uplink voice data through the first pre-signal processing strategy to obtain a first voice optimization result, and voice optimization is performed on the uplink voice data through the second pre-signal processing strategy to obtain a second voice optimization result; based on the first voice optimization result and the second voice optimization result, the optimal signal processing strategy associated with the sound quality parameter is determined from the first pre-signal processing strategy and the second pre-signal processing strategy, and the optimal signal processing strategy is taken as the signal processing result associated with the first pre-signal processing strategy.
  • the above method further includes: when the first user accesses the business application, acquiring a system resource package for loading the business application, and analyzing the system resource package to obtain system resource data of the business application;
  • the system resource data is initialized, and the service mode of the service application is initially configured as the system media mode based on the system resource data after the initialization process.
  • The method further includes: acquiring the uplink voice data of the first user corresponding to the service application in the game voice mode, and performing voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.
  • one or more voice optimization components in the terminal system layer are controlled to be turned on or off according to the aforementioned signal processing results, so that the voice optimization components with the same optimization function can either run at the application layer, or It runs at the terminal system layer, so that the sound quality damage of the uplink voice data can be reduced from the root, and the voice optimization effect in the game scene can be improved.
  • FIG. 14 is a schematic structural diagram of an audio data processing apparatus provided by an embodiment of the present application.
  • the audio data processing apparatus 1 may include at least one of the following: a processing result acquisition module 12 , a component control module 13 and a voice optimization module 14 .
  • the audio data processing device may further include at least one of the following: a resource package acquisition module 15, an initialization module 16, an application interface output module 17, a voice enabling module 18, a game mode switching module 11, a call mode switching module 19, A call request sending module 20 , a communication channel establishing module 21 , a target result determining module 22 , a target result sending module 23 , and a voice closing module 24 .
  • the processing result acquisition module 12 is used to acquire the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application in the game voice mode; wherein the first pre-signal processing strategy includes at least one first optimization component;
  • the processing result acquisition module 12 includes: a sound quality index acquisition unit 121, a terminal type search unit 122, a test result acquisition unit 123, an optimal strategy determination unit 124, an optimization result acquisition unit 125 and a processing result determination unit 126;
  • the sound quality index obtaining unit 121 is configured to obtain the sound quality index of the business application in the game voice mode, and configure the sound quality parameter of the business application according to the sound quality index of the business application;
  • a terminal type search unit 122 configured to obtain the terminal type of the terminal to which the service application belongs, and search for a test type that matches the terminal type in the test list associated with the service application;
  • the test result obtaining unit 123 is used to obtain the first test processing result obtained by adopting the first pre-signal processing strategy from the test list based on the sound quality parameter if the test type matching the terminal type is found in the test list, and obtain the second test processing result obtained by adopting the second pre-signal processing strategy;
  • the first pre-signal processing strategy is the pre-signal processing strategy in the application layer of the service application;
  • the second pre-signal processing strategy is the pre-signal processing strategy in the terminal system layer corresponding to the test terminal type;
  • the optimal strategy determining unit 124 is configured to determine, based on the first test processing result and the second test processing result, the optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy strategy, using the determined optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.
  • the speech optimization algorithm of the first optimization component in the first pre-signal processing strategy includes at least one of the following: a first echo cancellation algorithm for performing echo cancellation at the application layer, a first noise suppression algorithm for performing noise suppression at the application layer, and a first gain control algorithm for performing gain adjustment at the application layer;
  • the speech optimization algorithm of the second optimization component in the second pre-signal processing strategy includes at least one of the following: a second echo cancellation algorithm for performing echo cancellation at the terminal system layer, a second noise suppression algorithm for performing noise suppression at the terminal system layer, and a second gain control algorithm for performing gain adjustment at the terminal system layer.
  • the optimal strategy determination unit 124 includes: a first selection subunit 1241, a second selection subunit 1242, a third selection subunit 1243 and an optimal strategy determination subunit 1244;
  • the first selection subunit 1241 is used to obtain the first echo cancellation result corresponding to the first echo cancellation algorithm from the first test processing result, obtain the second echo cancellation result corresponding to the second echo cancellation algorithm from the second test processing result, select, based on the first echo cancellation result and the second echo cancellation result, the optimal echo cancellation algorithm from the first echo cancellation algorithm and the second echo cancellation algorithm, and take the optimal echo cancellation algorithm as the first optimal signal processing strategy associated with the sound quality parameter.
  • the first selection subunit 1241 is specifically configured to obtain the first echo cancellation result corresponding to the first echo cancellation algorithm from the first test processing result, and obtain the second echo cancellation result corresponding to the second echo cancellation algorithm from the second test processing result;
  • the first selection subunit 1241 is also specifically configured to perform a first comparison between the optimized quality corresponding to the first echo cancellation result and the optimized quality corresponding to the second echo cancellation result, to obtain a first comparison result;
  • the first selection subunit 1241 is also specifically configured to select the first pre-signal processing strategy for the first pre-signal processing strategy if the first comparison result indicates that the optimized quality corresponding to the first echo cancellation result is better than the optimized quality corresponding to the second echo cancellation result.
  • an echo cancellation algorithm as the first optimal signal processing strategy associated with the sound quality parameter
  • the first selection subunit 1241 is also specifically configured to process the second pre-signal if the first comparison result indicates that the optimized quality corresponding to the second echo cancellation result is better than the optimized quality corresponding to the first echo cancellation result.
  • the second echo cancellation algorithm in the strategy acts as the first optimal signal processing strategy associated with the sound quality parameter.
  • the second selection subunit 1242 is configured to obtain the first noise suppression result corresponding to the first noise suppression algorithm from the first test processing result, obtain the second noise suppression result corresponding to the second noise suppression algorithm from the second test processing result, select the optimal noise suppression algorithm from the first noise suppression algorithm and the second noise suppression algorithm based on the first noise suppression result and the second noise suppression result, and take the optimal noise suppression algorithm as the second optimal signal processing strategy associated with the sound quality parameter.
  • the second selection subunit 1242 is specifically configured to obtain the first noise suppression result corresponding to the first noise suppression algorithm from the first test processing result, and obtain the second noise suppression result corresponding to the second noise suppression algorithm from the second test processing result;
  • the second selection subunit 1242 is further specifically configured to perform a second comparison between the optimized quality corresponding to the first noise suppression result and the optimized quality corresponding to the second noise suppression result, to obtain a second comparison result;
  • the second selection subunit 1242 is further specifically configured to, if the second comparison result indicates that the optimized quality corresponding to the first noise suppression result is better than the optimized quality corresponding to the second noise suppression result, take the first noise suppression algorithm in the first pre-signal processing strategy as the second optimal signal processing strategy associated with the sound quality parameter;
  • the second selection subunit 1242 is also specifically configured to, if the second comparison result indicates that the optimized quality corresponding to the second noise suppression result is better than the optimized quality corresponding to the first noise suppression result, take the second noise suppression algorithm in the second pre-signal processing strategy as the second optimal signal processing strategy associated with the sound quality parameter.
  • the third selection subunit 1243 is configured to obtain the first gain control result corresponding to the first gain control algorithm from the first test processing result, obtain the second gain control result corresponding to the second gain control algorithm from the second test processing result, select the optimal gain control algorithm from the first gain control algorithm and the second gain control algorithm based on the first gain control result and the second gain control result, and take the optimal gain control algorithm as the third optimal signal processing strategy associated with the sound quality parameter.
  • the third selection subunit 1243 is specifically configured to obtain the first gain control result corresponding to the first gain control algorithm from the first test processing result, and obtain the second gain control result corresponding to the second gain control algorithm from the second test processing result;
  • the third selection subunit 1243 is also specifically configured to perform a third comparison between the optimized quality corresponding to the first gain control result and the optimized quality corresponding to the second gain control result, to obtain a third comparison result;
  • the third selection subunit 1243 is also specifically configured to, if the third comparison result indicates that the optimized quality corresponding to the first gain control result is better than the optimized quality corresponding to the second gain control result, take the first gain control algorithm in the first pre-signal processing strategy as the third optimal signal processing strategy associated with the sound quality parameter;
  • the third selection subunit 1243 is also specifically configured to, if the third comparison result indicates that the optimized quality corresponding to the second gain control result is better than the optimized quality corresponding to the first gain control result, take the second gain control algorithm in the second pre-signal processing strategy as the third optimal signal processing strategy associated with the sound quality parameter.
  • the optimal strategy determination subunit 1244 is used to determine the first optimal signal processing strategy, the second optimal signal processing strategy and the third optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.
  • for the specific implementation of the first selection subunit 1241, the second selection subunit 1242, the third selection subunit 1243 and the optimal strategy determination subunit 1244, reference may be made to the above description of the specific embodiment for determining the signal processing result, which will not be repeated here.
  • the optimization result obtaining unit 125 is used to, if no test type matching the terminal type is found in the test list, obtain the uplink voice data of the first user through the microphone in the game voice mode, perform voice optimization on the uplink voice data through the first pre-signal processing strategy to obtain a first voice optimization result, and perform voice optimization on the uplink voice data through the second pre-signal processing strategy to obtain a second voice optimization result;
  • the processing result determination unit 126 is used for determining the optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy based on the first speech optimization result and the second speech optimization result , taking the determined optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.
  • step S101 and step S102 in the example will not be repeated here.
  • the component control module 13 is configured to, according to the signal processing result, control, at the application layer, the switch state of the second optimization component in the second pre-signal processing strategy in the terminal system layer, or control the switch state of the first optimization component in the first pre-signal processing strategy;
  • the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy; the first optimization component enabled in the first pre-signal processing strategy has the same optimization function as the second optimization component disabled in the second pre-signal processing strategy, and the second optimization component enabled in the second pre-signal processing strategy has the same optimization function as the first optimization component disabled in the first pre-signal processing strategy;
  • the component control module 13 includes: a coordination mechanism starting unit 131, a component control unit 132, a first component opening unit 133 and a second component opening unit 134;
  • the coordination mechanism starting unit 131 is configured to start the coordination mechanism between the application layer and the terminal system layer of the terminal to which the service application belongs according to the signal processing result;
  • the component control unit 132 is configured to control, at the application layer and based on the coordination mechanism, the enabling and disabling of the second optimization component in the second pre-signal processing strategy in the terminal system layer;
  • the first component enabling unit 133 is configured to, in the application layer, use the second optimization component that is disabled in the second pre-signal processing strategy as the first cooperative component, and enable, in the first pre-signal processing strategy, the first optimization component having the same optimization function as the first cooperative component;
  • the second component enabling unit 134 is configured to, in the application layer, use the second optimization component enabled in the second pre-signal processing strategy as the second cooperative component, and disable, in the first pre-signal processing strategy, the first optimization component having the same optimization function as the second cooperative component.
  • for the specific implementation of the coordination mechanism starting unit 131, the component control unit 132, the first component enabling unit 133 and the second component enabling unit 134, reference may be made to the description of step S102 in the embodiment corresponding to FIG. 5 above, which will not be repeated here.
  • the voice optimization module 14 is used to obtain the uplink voice data of the first user corresponding to the business application in the game voice mode, and to perform voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.
  • the first optimization component in the first pre-signal processing strategy at least includes: a first echo cancellation component, a first noise suppression component and a first gain control component; the second optimization component in the second pre-signal processing strategy at least includes: a second echo cancellation component, a second noise suppression component and a second gain control component; the first echo cancellation component and the second echo cancellation component are both used for echo cancellation; the first noise suppression component and the second noise suppression component are both used for noise suppression; both the first gain control component and the second gain control component are used for gain adjustment.
  • the resource package acquisition module 15 is configured to acquire a system resource package for loading the business application when the first user accesses the business application, parse and process the system resource package, and obtain system resource data of the business application;
  • the initialization module 16 is configured to perform initialization processing on the system resource data, and initially configure the service mode of the service application as the system media mode based on the system resource data after the initialization processing.
  • the application interface output module 17 is used for outputting the application display interface of the service application based on the system resource data after initialization processing; the application display interface includes a voice control for instructing the first user to initiate a voice interaction service;
  • the voice activation module 18 is used to detect the application type of the service application in response to the voice activation operation of the first user for the voice control;
  • the voice enabling module 18 can, when detecting that the application type of the business application is a game type, notify the game mode switching module 11 to generate a first voice call instruction associated with the game type and, based on the first voice call instruction, switch the business mode of the business application from the system media mode to the game voice mode.
  • the voice enabling module 18 may also, when detecting that the application type of the business application is a non-game type (for example, a social type), notify the call mode switching module 19 to generate a second voice call instruction associated with the non-game type and, based on the second voice call instruction, switch the business mode of the service application from the system media mode to the system call mode.
  • the call request sending module 20 is configured to send a system call request corresponding to the system call type to the second user through the service application when the call type of the voice interactive service is determined as the system call type based on the system call mode; the second user is the user, selected by the first user in the service application, with whom a system call is requested;
  • the communication channel establishment module 21 is configured to establish a system communication channel between the first user and the second user when the second user responds to the system call request, and conduct a system call based on the system communication channel.
  • the target result determination module 22 is used to use the uplink voice data after voice optimization as the target voice optimization result corresponding to the uplink voice data;
  • the target result sending module 23 is used to send the target voice optimization result to the terminal corresponding to the third user associated with the first user, so that the terminal corresponding to the third user plays the voice-optimized uplink voice data through the speaker in the game voice mode; optionally, both the first user and the third user are game users in the same game camp in the game voice mode.
  • the voice closing module 24 is configured to switch the service mode of the service application from the game voice mode to the system media mode in response to the voice close operation of the first user for the voice control.
  • for the specific implementation of the processing result acquisition module 12, the component control module 13 and the voice optimization module 14, reference may be made to the descriptions of steps S101 to S103 in the embodiment corresponding to FIG. 5, which will not be repeated here.
  • for the specific implementation of the target result determination module 22, the target result sending module 23, and the voice closing module 24, reference may be made to the descriptions of steps S201 to S213 in the embodiment corresponding to FIG. 9, which will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • An exemplary embodiment of the present application further provides an audio data processing apparatus, which is used to execute the method embodiment shown in FIG. 12 , and the apparatus may include at least one of the following: a processing result acquisition module and a component control module.
  • the processing result acquisition module is used to acquire the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application in the game voice mode; wherein, the first pre-signal processing strategy includes at least one first optimization component.
  • the component control module is used to control the switch state of the second optimization component in the second pre-signal processing strategy in the terminal system layer at the application layer according to the signal processing result; wherein, the second pre-signal processing strategy includes at least one Second optimization component.
  • An exemplary embodiment of the present application further provides an audio data processing apparatus, the apparatus is configured to execute the method embodiment shown in FIG. 13 , and the apparatus may include at least one of the following: a processing result acquisition module and a component control module.
  • the processing result acquisition module is used to acquire the signal processing result associated with the first pre-signal processing strategy in the application layer of the business application in the game voice mode; wherein, the first pre-signal processing strategy includes at least one first optimization component.
  • the component control module is used to, according to the signal processing result, control the switch state of the second optimization component in the second pre-signal processing strategy in the terminal system layer, or control the switch state of the first optimization component in the first pre-signal processing strategy; wherein, the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.
  • FIG. 15 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 1000 may be a user terminal, and the user terminal may be the above-mentioned target user terminal;
  • in this case, the computer device 1000 may include: a processor 1001, a network interface 1004 and a memory 1005; in addition, the computer device 1000 may further include: a user interface 1003 and at least one communication bus 1002.
  • the communication bus 1002 is used to realize the connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (eg, a WI-FI interface).
  • the memory 1005 may be high-speed RAM memory or non-volatile memory, such as at least one disk memory.
  • the memory 1005 may optionally also be at least one storage device located remote from the aforementioned processor 1001. As shown in FIG. 15 , the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 in the computer device 1000 can also provide a network communication function
  • the optional user interface 1003 can also include a display screen (Display) and a keyboard (Keyboard).
  • in the computer device 1000 shown in FIG. 15, the network interface 1004 can provide a network communication function, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 can be used to call the device control application stored in the memory 1005
  • the description of the processing device 1 will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • the embodiment of the present application also provides a computer storage medium, and the computer storage medium stores the computer program executed by the aforementioned audio data processing apparatus 1; the computer program includes program instructions, and when the processor executes the program instructions, it can execute the audio data processing method described in the embodiment corresponding to FIG. 5 or FIG. 9 or FIG. 12 or FIG. 13 or in other method embodiments. Therefore, the description will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • the embodiments of the present application further provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the audio data processing method described in the foregoing embodiment corresponding to FIG. 5 or FIG. 9 or FIG. 12 or FIG. 13 or in other method embodiments; therefore, it will not be repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), and the like.
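The per-function selection performed by the selection subunits 1241 to 1243, together with the fallback used when the terminal type is missing from the test list, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `QualityScores` type, the numeric "higher is better" scores, the layer labels `"application"` and `"system"`, and the tie-break toward the application layer are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

FUNCTIONS = ("echo_cancellation", "noise_suppression", "gain_control")


@dataclass(frozen=True)
class QualityScores:
    """Hypothetical optimized-quality scores per function; higher is better."""
    scores: Dict[str, float]


def select_optimal_strategy(first: QualityScores, second: QualityScores) -> Dict[str, str]:
    """Compare the two strategies function by function and pick one layer for each.

    `first` holds results of the application-layer (first) strategy and
    `second` those of the terminal-system-layer (second) strategy.
    """
    result = {}
    for fn in FUNCTIONS:
        # Assumption: ties go to the application layer.
        result[fn] = "application" if first.scores[fn] >= second.scores[fn] else "system"
    return result


def signal_processing_result(
    terminal_type: str,
    test_list: Dict[str, Tuple[QualityScores, QualityScores]],
    measure_live: Callable[[], Tuple[QualityScores, QualityScores]],
) -> Dict[str, str]:
    """Use stored test results when the terminal type is listed; otherwise
    measure both strategies on live uplink voice data (delegated to the
    hypothetical `measure_live` callback)."""
    entry: Optional[Tuple[QualityScores, QualityScores]] = test_list.get(terminal_type)
    if entry is None:
        entry = measure_live()
    first, second = entry
    return select_optimal_strategy(first, second)
```

The returned mapping plays the role of the signal processing result: one chosen layer per optimization function, which the component control module can then turn into switch states.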

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Control Of Amplification And Gain Control (AREA)
  • Optical Recording Or Reproduction (AREA)

Abstract

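An audio data processing method, apparatus, device, storage medium, and program product. The method includes: in a game voice mode, obtaining a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application (S101); according to the signal processing result, controlling, at the application layer, the switch state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, or controlling the switch state of a first optimization component in the first pre-signal processing strategy, where the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy (S102); and, upon obtaining uplink voice data of a first user in the game voice mode, performing voice optimization on the uplink voice data based on the enabled first optimization component and the enabled second optimization component (S103).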

Description

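Audio data processing method, apparatus, device, storage medium, and program product
This application claims priority to Chinese Patent Application No. 202110088769.3, filed on January 22, 2021 and entitled "Audio data processing method, apparatus, device, and readable storage medium", which is incorporated herein by reference in its entirety.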
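Technical Field
This application relates to the field of computer technology, and in particular to an audio data processing method, apparatus, device, storage medium, and program product.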
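Background
At present, a user of a mobile terminal (for example, user A) can make a system call with another user (for example, user B) through the system call mode. For example, in a telephone call scenario, user A can make a system call (that is, a phone call) with user B through the system call mode.
Therefore, when user A runs a game application (for example, game application X) on the mobile terminal and makes a system call (that is, a phone call) with user B, the application layer of game application X often needs to share the system call mode of the terminal system layer of the mobile terminal. Consequently, when the mobile terminal indiscriminately enables every signal processing unit (that is, every voice optimization component) in the voice pre-signal processing scheme of the system call mode, both the application layer and the terminal system layer perform voice optimization on the captured voice of user A using signal processing units of the same function type (that is, voice optimization components with the same function). Voice optimization components with the same function therefore run repeatedly, which increases system overhead and, because of the repeated processing, damages sound quality and degrades the voice optimization effect.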
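Summary
Embodiments of this application provide an audio data processing method, apparatus, device, storage medium, and program product that can improve the voice optimization effect in game scenarios.
In one aspect, an embodiment of this application provides an audio data processing method performed by a computer device, the method including: in a game voice mode, obtaining a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, where the first pre-signal processing strategy includes at least one first optimization component; according to the signal processing result, controlling, at the application layer, the switch state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, or controlling the switch state of a first optimization component in the first pre-signal processing strategy, where the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy; and obtaining uplink voice data, in the game voice mode, of a first user corresponding to the service application, and performing voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.
In one aspect, an embodiment of this application provides an audio data processing method performed by a computer device, the method including: in a game voice mode, obtaining a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, where the first pre-signal processing strategy includes at least one first optimization component; and, according to the signal processing result, controlling, at the application layer, the switch state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, where the second pre-signal processing strategy includes at least one second optimization component.
In one aspect, an embodiment of this application provides an audio data processing method performed by a computer device, the method including: in a game voice mode, obtaining a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, where the first pre-signal processing strategy includes at least one first optimization component; and, according to the signal processing result, controlling the switch state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, or controlling the switch state of a first optimization component in the first pre-signal processing strategy, where the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.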
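In one aspect, an embodiment of this application provides an audio data processing apparatus, the apparatus including:
a processing result acquisition module, configured to obtain, in a game voice mode, a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, where the first pre-signal processing strategy includes at least one first optimization component;
a component control module, configured to, according to the signal processing result, control, at the application layer, the switch state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, or control the switch state of a first optimization component in the first pre-signal processing strategy, where the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy; and
a voice optimization module, configured to obtain uplink voice data, in the game voice mode, of a first user corresponding to the service application, and to perform voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.
In one aspect, an embodiment of this application provides an audio data processing apparatus, the apparatus including:
a processing result acquisition module, configured to obtain, in a game voice mode, a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, where the first pre-signal processing strategy includes at least one first optimization component; and
a component control module, configured to, according to the signal processing result, control, at the application layer, the switch state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, where the second pre-signal processing strategy includes at least one second optimization component.
In one aspect, an embodiment of this application provides an audio data processing apparatus, the apparatus including:
a processing result acquisition module, configured to obtain, in a game voice mode, a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, where the first pre-signal processing strategy includes at least one first optimization component; and
a component control module, configured to, according to the signal processing result, control the switch state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, or control the switch state of a first optimization component in the first pre-signal processing strategy, where the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.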
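In one aspect, an embodiment of this application provides a computer device, including a processor and a memory;
the processor is connected to the memory, the memory is configured to store a computer program, and the computer program, when executed by the processor, causes the computer device to perform the method provided in the embodiments of this application.
In one aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, the computer program being adapted to be loaded and executed by a processor so that a computer device having the processor performs the method provided in the embodiments of this application.
In one aspect, an embodiment of this application provides a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method provided in the embodiments of this application.
In the embodiments of this application, a computer device (for example, a mobile terminal) may, in the game voice mode and according to the signal processing result, control at the application layer the enabling and disabling of the second optimization component in the second pre-signal processing strategy in the terminal system layer (that is, a voice optimization component in the second pre-signal processing strategy), or control the enabling and disabling of the first optimization component in the first pre-signal processing strategy (that is, a voice optimization component in the first pre-signal processing strategy). It can be understood that the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy. The embodiments of this application thus propose that one or more voice optimization components in the terminal system layer can be enabled or disabled from the application layer according to the signal processing result (that is, the algorithm comparison result for voice optimization components having the same function), so that voice optimization components having the same optimization function run either in the application layer or in the terminal system layer. This reduces, at its source, the sound quality damage to the uplink voice data. The number and type of second optimization components enabled or disabled in the terminal system layer are not limited here. Further, upon obtaining the uplink voice data of the first user in the game voice mode, the computer device can quickly perform voice optimization on that data based on the first optimization component and the second optimization component, which have different functions, thereby improving the voice optimization effect in game scenarios while reducing sound quality damage.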
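Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of this application;
FIG. 2 is a schematic diagram of a division of service modes according to an embodiment of this application;
FIG. 3 is a schematic diagram of a voice data processing flow according to an embodiment of this application;
FIG. 4 is a schematic diagram of a scenario of voice interaction in a game scenario according to an embodiment of this application;
FIG. 5 is a schematic flowchart of an audio data processing method according to an embodiment of this application;
FIG. 6 is a schematic diagram of a scenario of a test list according to an embodiment of this application;
FIG. 7 is a schematic diagram of a scenario of determining the optimal signal processing strategy associated with a sound quality parameter according to an embodiment of this application;
FIG. 8 is a schematic diagram of a scenario of controlling the enabling and disabling of each voice optimization component in a voice pre-signal processing scheme according to an embodiment of this application;
FIG. 9 is a schematic diagram of an audio data processing method according to an embodiment of this application;
FIG. 10 is a schematic diagram of a scenario of a resource configuration interface according to an embodiment of this application;
FIG. 11 is a schematic flowchart for providing different types of voice double-talk services according to an embodiment of this application;
FIG. 12 is a schematic flowchart of another audio data processing method according to an embodiment of this application;
FIG. 13 is a schematic flowchart of another audio data processing method according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of an audio data processing apparatus according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a computer device according to an embodiment of this application.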
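Detailed Description
The embodiments provided in this application may be implemented individually or combined in any manner to form new embodiments, all of which fall within the scope of protection of this application.
Before the embodiments of this application are described, some technical terms used in this application are defined.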
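1. Game voice mode (Game Voice Mode): a voice mode provided by the terminal system according to the voice requirements and characteristics of game application scenarios, alongside the media mode and the call mode.
2. Sample rate (Sample Rate): also called the sampling frequency, the number of samples per second taken from a continuous signal to form a discrete signal, measured in hertz (Hz). The higher the sample rate, the more precise the data. Common sample rates are 8 kHz, 16 kHz, 44.1 kHz, and 48 kHz.
3. Sample bit depth (Bits of Samples): also called the sample value, a parameter that measures the variation of sound amplitude; it is the number of binary digits of the digital sound signal used by the sound card when capturing and playing sound files. Common bit depths are 8, 16, and 32 bits, and mobile phone platforms generally use 16 bits.
4. Number of channels (Number of Channels): the number of sound channels, usually determined by the hardware. Common configurations are mono and two-channel (stereo). Mono sound can be played through only one speaker, while two-channel sound can be played through two speakers; the left and right channels usually carry different content, which gives a better sense of space.
5. Noise suppression (Noise Suppression): voice data captured by a voice capture tool usually contains both useful voice data, such as human voice and music, and useless noise data, such as ambient sound. Noise suppression is a technique that, based on the characteristics of the voice data, removes or reduces as far as possible the effect of noise on the overall voice result.
6. Automatic gain (Automatic Gain Control): an automatic control method in which the gain of an amplifier circuit is adjusted automatically according to signal strength; it is mainly used to strengthen the signal of the useful voice data.
7. Echo cancellation (Acoustic Echo Cancellation): echo is sound that is reflected or repeated by sound waves, or a sound signal that, after being transmitted over the network and played, is captured by the far end and transmitted back so that it returns to the speaker. Removing such sound with a signal processing algorithm or device is echo cancellation.
8. Dynamic control (Dynamic Range Compression): dynamic control, that is, dynamic range control, dynamically adjusts the audio output amplitude, suppressing the volume appropriately when it is high and raising it appropriately when it is low, so that the volume always stays within a suitable range. It is usually used to control the audio output power so that the speaker does not distort and the sound remains clearly audible at low playback volume.
9. Pre-processing (Front-End Process): voice pre-processing is the processing of raw voice data before encoding and sending, so that the processed voice signal better reflects the essential characteristics of the voice. It usually includes noise suppression, echo cancellation, and automatic gain.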
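The sample rate, bit depth, and channel count defined above together determine the raw (uncompressed PCM) data rate of captured voice. The following is a small sketch of that arithmetic; the function names and the 20 ms frame duration in the usage note are illustrative assumptions, not values taken from this application.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM data rate: samples per second x bytes per sample x channels."""
    return sample_rate_hz * bits_per_sample * channels // 8


def pcm_frame_bytes(sample_rate_hz: int, bits_per_sample: int, channels: int, frame_ms: int) -> int:
    """Size in bytes of one audio frame of `frame_ms` milliseconds."""
    samples_per_channel = sample_rate_hz * frame_ms // 1000
    return samples_per_channel * (bits_per_sample // 8) * channels
```

For example, 16 kHz, 16-bit, mono voice amounts to 32,000 bytes per second, and a hypothetical 20 ms frame of it occupies 640 bytes; 48 kHz, 16-bit stereo amounts to 192,000 bytes per second.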
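Refer to FIG. 1, which is a schematic structural diagram of a network architecture according to an embodiment of this application. As shown in FIG. 1, the network architecture may include a service server 2000 and a user terminal cluster.
The user terminal cluster may include one or more user terminals; the number of user terminals is not limited here. As shown in FIG. 1, the user terminals may include user terminal 3000a, user terminal 3000b, user terminal 3000c, ..., and user terminal 3000n. As shown in FIG. 1, user terminal 3000a, ..., and user terminal 3000n may each have a network connection to the service server 2000, so that every user terminal in the cluster can exchange data with the service server 2000 over that connection.
The service server 2000 shown in FIG. 1 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
For ease of understanding, in the embodiments of this application one user terminal in the cluster shown in FIG. 1 (for example, the user terminal used by user A) may be selected as the target user terminal; for example, user terminal 3000a shown in FIG. 1 may serve as the target user terminal. A service application with audio data processing functions (for example, audio data capture and playback) may be integrated in the target user terminal. The service application may include application clients with audio capture and playback functions, such as entertainment clients (for example, game clients), social clients, office clients, and live streaming clients. The target user terminal (for example, user terminal 3000a) may be a mobile terminal with audio data processing functions, such as a smartphone, tablet computer, notebook computer, or wearable device. In the embodiments of this application, the application type corresponding to entertainment clients (for example, game clients) is collectively referred to as the game type, and the application types corresponding to social clients (for example, QQ and WeChat clients), office clients (for example, enterprise clients), live streaming clients, and the like are collectively referred to as the non-game type.
It can be understood that a user terminal running the service application (for example, the target user terminal) can adaptively select different service modes according to the application type of the service application, so as to provide different types of voice interaction services in different service scenarios.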
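As shown in FIG. 2, the service modes here may include a system media mode 21a (also called the "media mode"), a system call mode 21b (also called the "voice call mode" or "call mode"), and a game voice mode 21c.
It should be understood that when the user (that is, the first user) has no need for two-way voice (that is, no voice interaction is needed), the target user terminal may configure the service mode of the service application as the system media mode 21a by default. Optionally, when the user (that is, the first user) does need two-way voice (that is, voice interaction is needed), the target user terminal may intelligently identify the application type of the service application that needs voice interaction and then adaptively select a service mode according to that type. For example, if the application type of the service application is the game type, the service mode of the service application is configured as the game voice mode 21c; if the application type is a non-game type, the service mode is configured as the system call mode 21b.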
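The mode selection rule described above reduces to a small decision function. The sketch below is only an illustration of that rule; the `ServiceMode` enum, the string value `"game"` for the application type, and the function name are assumptions made here.

```python
from enum import Enum


class ServiceMode(Enum):
    SYSTEM_MEDIA = "system_media"  # system media mode 21a
    SYSTEM_CALL = "system_call"    # system call mode 21b
    GAME_VOICE = "game_voice"      # game voice mode 21c


def select_service_mode(needs_voice_interaction: bool, app_type: str) -> ServiceMode:
    """Pick the service mode from the voice requirement and the application type."""
    if not needs_voice_interaction:
        # Default when the first user does not need two-way voice.
        return ServiceMode.SYSTEM_MEDIA
    if app_type == "game":
        return ServiceMode.GAME_VOICE
    # Any non-game type (social, office, live streaming, ...) uses the call mode.
    return ServiceMode.SYSTEM_CALL
```

For instance, `select_service_mode(True, "social")` yields the system call mode, while the same call with `"game"` yields the game voice mode.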
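The system media mode 21a may be used to instruct the target user terminal to play audio data of music or video programs for the current user (that is, the first user).
The system call mode 21b may be used to instruct the target user terminal, in a non-game scenario, to let the current user (that is, the first user) make a system call with another user (that is, the second user, who may be the user selected by the first user in the service application as the party with whom a system call is requested).
The game voice mode 21c may be used to instruct the target user terminal to provide a new voice interaction service in game scenarios. For example, in the game voice mode 21c, the user (that is, the first user) can make a game voice call directly with another user (for example, a third user, who may be a game user in the same game camp as the first user in the game voice mode 21c).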
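Unlike chat software, the voice environment in game scenarios is more complex, and both voice call quality and media playback quality must be taken into account. Existing user terminals provide only the system call mode 21b, which suits call scenarios, and the system media mode 21a, which suits music playback scenarios; they do not consider scenarios that combine the two, and they perform poorly in game scenarios. How to improve system media playback quality while preserving the two-way voice call experience has therefore become key to improving the voice experience of game users. As shown in FIG. 2, in this application the mobile smart terminal optimizes voice services for game application scenarios by providing the game voice mode 21c alongside the system call mode 21b and the system media mode 21a. A mobile application can also choose the most suitable voice mode according to its own service characteristics and requirements, so as to integrate voice services quickly and efficiently. The game voice mode 21c is a voice mode applied in game services or game scenarios and is intended to optimize the player's voice experience there. In the game voice mode 21c, effective optimizations are applied to voice capture, processing, configuration, and other stages for game application scenarios, giving players smooth game voice and high-quality game sound.
It should be noted that this application is mainly applied in the game voice industry, so the newly proposed voice mode is called the "game voice mode". It should be understood that the "game voice mode" applies not only to game scenarios but also to other service scenarios with the same or similar voice processing requirements, such as all voice service scenarios that must balance voice call quality and media playback quality, for example live video streaming and video conferencing. This application does not limit this.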
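Taking a game scenario as an example, game voice mainly goes through two stages: a voice data capture stage and a voice data playback stage. FIG. 3 shows a schematic diagram of the voice data processing flow.
The voice data capture stage includes, in order:
1. Voice signal capture: voice is usually input to the phone through a microphone. The microphone converts sound waves into a voltage signal, which is then sampled, turning the continuous voltage signal into a digital signal that a computer can process. The main indicators that affect the quality of the captured voice signal are the sample rate, the sample bit depth, and the number of channels. The higher the sample rate, the more sound samples are taken per second and the higher the resulting audio quality.
2. Voice signal pre-processing: the data captured by the microphone is pre-processed to improve the quality of the voice data. Pre-processing usually includes audio processing algorithms such as echo cancellation, automatic gain, and noise suppression.
3. Encoding: voice encoding compresses the captured digital voice signal to reduce the transmission bit rate for digital transmission.
4. Transmission: the encoded voice data is sent over the network to a designated voice server, so that other users can hear the user's voice data through the server.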
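The first capture step above (sampling a continuous signal and storing each sample with a fixed bit depth) can be sketched with a synthetic tone standing in for the microphone voltage. This is a minimal illustration under assumptions: the sine-wave input, the function name, and the rounding scheme are choices made here rather than details from this application.

```python
import math
from typing import List


def sample_and_quantize(freq_hz: float, duration_s: float,
                        sample_rate_hz: int, bits: int = 16) -> List[int]:
    """Sample a full-scale sine wave and quantize it to signed integers.

    The sine wave stands in for the continuous voltage signal produced by
    the microphone; each sample is rounded to the nearest integer level.
    """
    full_scale = 2 ** (bits - 1) - 1  # 32767 for 16-bit samples
    count = int(round(duration_s * sample_rate_hz))
    return [
        int(round(full_scale * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)))
        for i in range(count)
    ]
```

Sampling a 1 kHz tone for 10 ms at 16 kHz gives 160 samples, each within the signed 16-bit range; doubling the sample rate doubles the number of samples that describe the same 10 ms of sound.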
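The voice data playback stage includes, in order:
5. Receiving voice data: obtaining the voice data of other users from the designated voice server for playback.
6. Decoding: decoding is the counterpart of encoding; the received encoded voice data is decoded, converting the digital signal into an analog signal.
7. Post-processing: because of problems such as packet loss, the decoded voice data may stutter or otherwise play back poorly, so the decoded voice data needs to be adjusted and optimized in a post-processing step.
8. Playback: the audio data is played through devices such as speakers and headphones.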
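It can be understood that the target user terminal can start a coordination mechanism between the application layer and the terminal system layer in the game voice mode and then, under that mechanism and according to the algorithm comparison result (that is, the signal processing result), adaptively choose to enable exactly one voice optimization component for each optimization function from among the voice optimization components of the application layer and those of the terminal system layer. When the application layer and the terminal system layer work together under this coordination mechanism, the uplink voice data of the current user (that is, the first user) captured in real time in the game scenario can be processed in real time, which improves the voice optimization effect on the uplink voice data and thus the voice interaction experience between game users.
For ease of understanding, refer to FIG. 4, which is a schematic diagram of a scenario of voice interaction in a game scenario according to an embodiment of this application. It can be understood that, in a game scenario, the application type of the service application in user terminal 10a shown in FIG. 4 may be the game type described above. In this case, user terminal 10a may switch the service mode of the service application from the system media mode to the game voice mode, so that user 1 shown in FIG. 4 (that is, the first user) can make a game voice call with user 2 shown in FIG. 4 (that is, the third user) in the game voice mode.
It can be understood that user terminal 10a shown in FIG. 4 may be the target user terminal with audio data processing functions described above. When user 1 shown in FIG. 4 needs to interact by voice, through user terminal 10a, with user terminal 20a corresponding to user 2 shown in FIG. 4, the captured voice of user 1 may first be optimized in user terminal 10a, so that the optimized voice of user 1 can be sent, as the target voice optimization result corresponding to the uplink voice data, to user terminal 20a corresponding to user 2 and then played in user terminal 20a through the speaker shown in FIG. 4. In the embodiments of this application, the voice of user 1 captured by the microphone of user terminal 10a (corresponding to the voice control in application display interface 100a shown in FIG. 4) is collectively referred to as the voice uplink signal; that is, the audio frames obtained by spectrum analysis of the sound signal captured by the microphone are collectively referred to as uplink voice data. Likewise, the optimized voice of user 1 played by the speaker of user terminal 20a (corresponding to the playback control in application display interface 200a shown in FIG. 4) is collectively referred to as the voice downlink signal; that is, the audio frames of the sound signal passed to the speaker for playback are referred to as downlink voice data. Similarly, the optimized voice of other users (for example, user 2) played by the speaker of user terminal 10a may also be referred to as the voice downlink signal.
It can be understood that, in a game scenario, after user terminal 10a shown in FIG. 4 captures the voice of user 1 in real time through the microphone (that is, the voice uplink signal), it can obtain the uplink voice data corresponding to that signal and then apply the optimal signal processing strategy negotiated jointly by the application layer of the service application and the terminal system layer. For example, it can perform voice optimization on the uplink voice data of user 1 using the first optimization component that was negotiated to be enabled in the application layer and the second optimization component that was negotiated to be enabled in the terminal system. Note that the second optimization component here is different from the first optimization component. In addition, the first optimization component enabled in the application layer (that is, the first optimization component enabled in the first pre-signal processing strategy) has the same optimization function as the second optimization component disabled in the second pre-signal processing strategy, and the second optimization component enabled in the terminal system has the same optimization function as the first optimization component disabled in the first pre-signal processing strategy. It should be understood that, in the embodiments of this application, the voice optimization components in the first pre-signal processing strategy are collectively referred to as first optimization components, and those in the second pre-signal processing strategy as second optimization components. Through the coordination mechanism between the application layer and the terminal system layer, a voice optimization component with a given optimization function is guaranteed to run either in the application layer or in the terminal system layer, which effectively avoids running components with the same optimization function twice.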
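The voice optimization here is the preprocessing described above. It mainly includes, but is not limited to, acoustic echo cancellation (Acoustic Echo Cancellation, AEC), noise suppression (Noise Suppression, NS), and automatic gain control (Automatic Gain Control, AGC).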
在进行回声消除(AEC)的过程中,回声主要是指说话者(例如,前述用户1)通过自己通信设备(例如,前述用户终端10a)发送给其他人(例如,前述用户2)的语音又重新回到自己的听筒里的现象。本申请实施例所涉及的回声消除主要是指目标用户终端(例如,前述用户终端10a)通过一定的算法装置(例如,回声消除组件)来消除这种回声的处理方案。
在进行噪声抑制(NS)的过程中,噪声主要是指由目标用户终端(例如,前述用户终端10a)采集到的除说话人(例如,前述用户1)之外其他物体所发出的声音信号,基于此,本申请实施例所涉及的噪声抑制主要是指用于目标用户终端(例如,前述用户终端10a)通过一定的算法装置(例如,噪声抑制组件)来消除这种噪声的处理方案。
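In automatic gain control (AGC), the target user terminal (for example, the aforementioned user terminal 10a) uses an algorithm component (for example, a gain control component) to adjust the energy of the voice signal according to the range of human auditory perception, so that the voice signal can be perceived better.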
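It should be understood that if the user terminal 10a, based on the algorithm comparison result, chooses to enable the first optimization component 11 (for example, an echo cancellation component) of the first pre-signal processing strategy in the application layer, it must at the same time disable, in the terminal system layer, the second optimization component 21 that has the same optimization function as the first optimization component 11. The second optimization component 21 may be the echo cancellation component in the second pre-signal processing strategy of the terminal system layer, disabled under the control of the application layer. This means that in the embodiments of this application, when the target user terminal collects the voice of user 1 in the game scenario in real time through the microphone (that is, the uplink voice data of the first user), only one voice optimization component needs to run for each optimization function, either in the application layer or in the terminal system layer. Each optimization function therefore runs exactly once, which removes at the root the waste of computing resources caused by running the same optimization function twice.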
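It can be understood that, as shown in FIG. 4, the voice control on the application display interface 100a is initially off, and while it is off the service mode of the service application running on the user terminal 10a may be the system media mode. When user 1 (here mainly a game user, for example, game user A) chooses to turn on this voice control, the terminal used by user 1 (for example, the user terminal 10a shown in FIG. 4) may determine the application type of the service application running on it and then switch the service mode of the service application from the system media mode to the game voice mode, so that the user terminal 10a can collect and optimize the voice of user 1 in real time in the game voice mode and obtain the voice-optimized voice of user 1 shown in FIG. 4. Further, the user terminal 10a may broadcast the voice-optimized voice of user 1 to the other teammates in the camp of user 1 (for example, user 2, who may be another game user in the same camp as user 1). In this way, when the terminal used by such a teammate (for example, the user terminal 20a shown in FIG. 4) has the playback control shown in FIG. 4 turned on (for example, the speaker is turned on in the game scenario), it can play the received voice-optimized voice of user 1.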
可选的,若上述用户终端10a检测到上述业务应用的应用类型属于非游戏类型(例如,社交类型),则该用户终端10a(即上述目标用户终端)可以智能将该业务应用的业务模式由系统媒体模式切换为系统通话模式,以在该系统通话模式下执行第二类语音通话业务,该第二类语音通话业务可以为非游戏场景下的系统通话类型所对应的语音交互业务。比如,在社交场景下,可以允许图4所示的用户1向图4所示的用户2发送系统通话类型对应的系统通话请求,进而可以在图4所示的用户2所对应的终端(即上述用户终端20a)响应该系统通话请求(比如,该用户2确认接收用户1的来电请求)时,建立该用户1与用户2之间的系统通信信道,以通过该系统通信信道进行系统通话。
目标用户终端(例如,图4所示的用户终端10a)通过上述应用层控制开启和关闭终端系统层内的第二前置信号处理策略中的第二优化组件,以及对该第一用户的上行语音数据进行语音优化的具体实现方式,可以参见下述图5-图15所对应实施例。
请参见图5,图5是本申请实施例提供的一种音频数据处理方法的流程示意图。该方法由计算机设备执行,例如该方法可以由用户终端(例如,上述目标用户终端,该目标用户终端可以为上述图4所对应实施例中的用户终端10a)执行,也可以由业务服务器(如,上述图1所示的业务服务器2000)执行,还可以由用户终端和业务服务器交互配合执行。为便于理解,本实施例以该方法由用户终端执行为例进行说明。其中,该音频数据处理方法可以包括以下步骤S101-步骤S103中的至少一个步骤:
步骤S101,在游戏语音模式下,获取与业务应用的应用层内的第一前置信号处理策略相关联的信号处理结果;
具体的,目标用户终端可以在游戏语音模式下,获取业务应用的音质指标,进而可以根据业务应用的音质指标,配置业务应用的音质参数(这里的音质参数可以包括但不限于语音采样率和语音声道数)。进一步的,目标用户终端获取业务应用所属终端的终端类型,在与业务应用相关联的测试列表中查找与终端类型相匹配的测试类型。若在测试列表中查找到与终端类型相匹配的测试类型,则目标用户终端可以基于音质参数从测试列表中获取采用第一前 置信号处理策略所得到的第一测试处理结果,且获取采用第二前置信号处理策略所得到的第二测试处理结果。其中,第一前置信号处理策略为业务应用的应用层内的前置信号处理策略。第二前置信号处理策略为测试终端类型所对应的终端系统内的前置信号处理策略。进一步的,目标用户终端可以基于第一测试处理结果和第二测试处理结果,从第一前置信号处理策略和第二前置信号处理策略中确定与音质参数相关联的最优信号处理策略,并可以将确定出的最优信号处理策略作为与第一前置信号处理策略相关联的信号处理结果。
可以理解的是,目标用户终端在执行步骤S101之前,还可以预先在该目标用户终端中加载该业务应用的系统资源包,进而可以在对系统资源包进行解析处理之后,得到该业务应用的系统资源数据,这样,当目标用户终端对该系统资源数据进行初始化处理之后,可以根据初始化处理后的系统资源数据将该业务应用的业务模式初始配置为系统媒体模式。应当理解,本申请实施例可以在完成系统资源数据的初始化处理之后,默认进入该系统媒体模式,以便于可以在该系统媒体模式下,根据初始化处理后的系统资源数据输出业务应用的应用显示界面,以在该应用显示界面中输出该业务应用的多媒体数据(例如,视频帧数据以及音频帧数据等)。可以理解的是,该应用显示界面中可以包含用于指示第一用户发起语音交互业务的语音控件,这样,当第一用户需要与其他用户进行语音交互时,则可以选择触发当前处于关闭状态的语音控件,以使该目标用户终端可以响应第一用户针对该语音控件执行的语音开启操作,进而可以自动检测发起语音交互业务的业务应用的应用类型。
可以理解的是,若目标用户终端确定发起该语音交互业务的业务应用的应用类型属于游戏类型,则该目标用户终端可以确定当前的业务场景为游戏场景,进而可以将该目标用户终端中所运行的业务应用的业务模式由系统媒体模式切换为游戏语音模式,例如,在该游戏场景中生成与该游戏类型相关联的第一语音通话指令,并可以基于第一语音通话指令将该目标用户终端中所运行的业务应用的业务模式由系统媒体模式切换为游戏语音模式,以便于后续可以在该游戏语音模式下,执行上述第一类语音通话业务。其中,可以理解的是,该目标用户终端可以在该游戏语音模式下,根据该业务应用的音质指标,细化一些与上述语音双讲需求相关联的音质参数。比如,该目标用户终端可以允许上述第一用户(即使用该目标用户终端的用户)在该游戏语音模式下,设置该目标用户终端对应的语音采样率和语音声道数。
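Optionally, it can be understood that if the target user terminal determines that the application type of the service application currently initiating the voice interaction service is a non-game type, the target user terminal may determine that the current service scenario is a non-game scenario and switch the service mode of the service application running on it from the system media mode to the system call mode. For example, it may generate, in the non-game scenario, a second voice call instruction associated with the non-game type and, based on the second voice call instruction, switch the service mode from the system media mode to the system call mode, so that voice interaction with other users (for example, the aforementioned second user) can take place in the system call mode to perform the aforementioned second type of voice call service.
It follows that the game voice mode and the system call mode provided in the embodiments of this application are two service modes that provide different types of voice call services in the aforementioned voice double-talk scenario. By determining the application type of the service application that initiates the voice call service, the target user terminal can intelligently enter the game voice mode when the application type is a game type, to perform the first type of voice call service. Optionally, the target user terminal can also intelligently enter the system call mode when the application type is a non-game type, to perform the second type of voice call service.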
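The mode switching described above can be sketched as a small state function. This is only an illustrative sketch: the names `Mode`, `on_voice_enabled`, and `on_voice_disabled`, and the string `"game"` used for the application type, are assumptions introduced here and do not come from the source.

```python
from enum import Enum


class Mode(Enum):
    """Service modes named in the text (identifiers are hypothetical)."""
    SYSTEM_MEDIA = "system_media"
    GAME_VOICE = "game_voice"
    SYSTEM_CALL = "system_call"


def on_voice_enabled(app_type: str) -> Mode:
    """Pick the mode to enter when the user turns the voice control on.

    A game-type application enters the game voice mode; any other
    application type enters the system call mode.
    """
    return Mode.GAME_VOICE if app_type == "game" else Mode.SYSTEM_CALL


def on_voice_disabled(current: Mode) -> Mode:
    """Turning the voice control off returns to the default system media mode."""
    return Mode.SYSTEM_MEDIA
```

The default mode after initialization is the system media mode, so a caller would start from `Mode.SYSTEM_MEDIA` and apply these two functions as the voice control is toggled.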
可以理解的是,在本申请实施例所涉及的目标用户终端可以包括但不限于具备上述语音数据处理功能的移动终端。所以,本申请实施例所涉及的设置目标用户终端对应的语音采样率,主要可以包括设置终端的上行采样率和下行采样率。此外,本申请实施例所涉及的设置目标用户终端的语音声道数主要是指设置语音的通道数目,例如,可以根据目标用户终端的音质指标,将通道数目设置为双声道。
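It should be understood that the voice sampling rate here (for example, the uplink sampling rate and the downlink sampling rate) is the number of times per unit time (typically per second) that the recording component of the target user terminal samples the sound signal. The voice sampling rate may include, but is not limited to, 4 kHz, 8 kHz, and 48 kHz. The value of the voice sampling rate reflects how faithfully and naturally the recording component can reproduce the user's voice.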
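As a worked illustration of how the sampling rate and channel count translate into data volume, the following sketch computes the size of one audio frame. The 20 ms frame length and 16-bit sample depth are assumptions chosen for the example; the source does not specify them.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int = 20) -> int:
    """Number of samples per channel in one frame of frame_ms milliseconds."""
    return sample_rate_hz * frame_ms // 1000


def bytes_per_frame(sample_rate_hz: int, channels: int = 1,
                    bits_per_sample: int = 16, frame_ms: int = 20) -> int:
    """Raw PCM size of one frame across all channels."""
    per_channel = samples_per_frame(sample_rate_hz, frame_ms)
    return per_channel * channels * bits_per_sample // 8
```

Under these assumptions, an 8 kHz mono stream yields 160 samples (320 bytes) per frame, while a 48 kHz stereo stream yields 960 samples per channel (3840 bytes) per frame.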
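For example, in the game voice mode, the first user can carry out voice interaction through the voice double-talk service provided by the voice interaction system in the target user terminal. That is, when the microphone of the target user terminal collects the sound signal of the first user (the voice of the first user), the terminal may, in the game voice mode, perform spectrum analysis on that sound signal according to the uplink sampling rate, to sample the uplink voice data of the first user in the game voice mode. The target user terminal may then perform voice optimization on the uplink voice data and send the voice-optimized sound signal of the first user (the voice-optimized voice of the first user) to the other communication peers (for example, the terminal of the aforementioned third user), each of which plays it through its own speaker. Likewise, the target user terminal can receive the voice-optimized sound signal of the third user transmitted by another communication peer and, after performing spectrum analysis on it according to the downlink sampling rate, obtain the downlink voice data to be passed to the speaker of the target user terminal. When this downlink voice data is played through the speaker, the voice-optimized voice of the third user is reproduced for the first user as accurately as possible.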
为便于理解,请参见图6,图6是本申请实施例提供的一种测试列表的场景示意图。其中,测试列表301a中的测试终端类型可以为一个或者多个测试终端所对应的测试类型。可以理解的是,这里的测试终端类型可以包含但不限于一个或者多个品牌的机型;可选的,这里的测试终端类型还可以包含这些机型所对应的终端环境系统的系统类型以及系统版本等。
可以理解的是,业务应用对应的开发人员在研发出具有上述游戏语音模式的业务应用时,可以预先将该业务应用集成安装在各个用于进行测试的已知机型所对应的测试终端中,以在这些已知机型所对应的测试终端中分别使用多种前置信号处理策略(比如,上述应用层内的第一前置信号处理策略和上述终端系统层内的第二前置信号处理策略)进行性能测试,以测试得到同一已知机型(即同一测试类型)在特定音质参数下的应用层的中的各语音优化组件的优化性能,和终端系统层中的对应功能的语音优化组件的优化性能。
为便于理解,这里以测试终端类型为单个品牌的机型为例,该品牌下的n(这里的n为正整数)个机型可以为图6所示的测试类型T1、…、测试类型Tn。比如,测试类型T1可以为品牌A的机型1,测试类型T2可以为品牌A的机型2,以此类推,测试类型Tn可以为品牌A的机型n。
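It can be understood that, to test the optimization performance of the voice optimization components in the application layer and in the terminal system layer under different sound quality parameters on the same model, developers may set the sound quality parameter to sound quality parameter D1 (for example, an uplink voice sampling rate of 8 kHz, a downlink voice sampling rate of 8 kHz, and a mono channel configuration such as the left channel) and then, using the first pre-signal processing strategy in the application layer and the second pre-signal processing strategy in the terminal system layer, obtain the voice test effect of the test terminal of test type T1 under sound quality parameter D1.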
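For example, during the performance test, the test processing result obtained by using the voice optimization components in the application layer (first optimization components such as the first echo cancellation component for echo cancellation, the first noise suppression component for noise suppression, and the first gain control component for gain adjustment) to perform test optimization on uplink voice data (for example, uplink voice data R1 used for the performance test) may be the test processing result of the application layer associated with sound quality parameter D1 shown in FIG. 6. In addition, the test processing result obtained by using the voice optimization components in the terminal system layer (second optimization components such as the second echo cancellation component for echo cancellation, the second noise suppression component for noise suppression, and the second gain control component for gain adjustment) to perform test optimization on the same uplink voice data R1 may be the test processing result of the terminal system layer associated with sound quality parameter D1 shown in FIG. 6.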
为便于理解,本申请实施例可以假设应用层中的第一回声消除组件所对应的测试处理结果可以为图6所示的第一测试结果31a;此时,在终端系统层中的与前述第一回声消除组件具有相同优化功能的语音优化组件可以为上述第二回声消除组件,如图6所示,使用该第二回声消除组件对上行语音数据R1进行回声消除后所得到的测试处理结果可以为图6所示的 第二测试结果31b。
又比如,应用层中的第一噪声抑制组件所对应的测试处理结果可以为图6所示的第一测试结果32a;此时,在终端系统层中的与前述第一噪声抑制组件具有相同优化功能的语音优化组件可以为上述第二噪声抑制组件,如图6所示,使用该第二噪声抑制组件对上行语音数据R1进行噪声抑制后所得到的测试处理结果可以为图6所示的第二测试结果32b。
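As another example, the test processing result corresponding to the first gain control component in the application layer may be the first test result 33a shown in FIG. 6. The voice optimization component in the terminal system layer that has the same optimization function as the first gain control component may be the second gain control component; as shown in FIG. 6, the test processing result obtained by using the second gain control component to perform gain adjustment on uplink voice data R1 may be the second test result 33b shown in FIG. 6.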
此外,以此类推,如图6所示,开发人员还可以在设定音质参数为音质参数D2(例如,上行语音采样率为8kHz,下行语音采样率为16kHz,左声道等单声道数量)的情况下,使用应用层中的第一前置信号处理策略和终端系统层内的第二前置信号处理策略,测试得到该机型为测试类型Tn的另一测试终端在该音质参数D2时的语音测试效果。
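For example, during another performance test, the test processing result obtained by using the voice optimization components in the application layer (first optimization components such as the first echo cancellation component for echo cancellation, the first noise suppression component for noise suppression, and the first gain control component for gain adjustment) to perform test optimization on other uplink voice data (for example, uplink voice data R2 used for the performance test) may be the test processing result of the application layer associated with sound quality parameter D2 shown in FIG. 6. In addition, the test processing result obtained by using the voice optimization components in the terminal system layer (second optimization components such as the second echo cancellation component for echo cancellation, the second noise suppression component for noise suppression, and the second gain control component for gain adjustment) to perform test optimization on the same uplink voice data R2 may be the test processing result of the terminal system layer associated with sound quality parameter D2 shown in FIG. 6.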
同理,为便于理解,本申请实施例可以假设应用层中的第一回声消除组件所对应的测试处理结果可以为图6所示的第一测试结果34a;此时,在终端系统层中的与前述第一回声消除组件具有相同优化功能的语音优化组件可以为上述第二回声消除组件,如图6所示,使用该第二回声消除组件对上行语音数据R2进行回声消除后所得到的测试处理结果可以为图6所示的第二测试结果34b。
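Similarly, the test processing result corresponding to the first noise suppression component in the application layer may be the first test result 35a shown in FIG. 6. The voice optimization component in the terminal system layer that has the same optimization function as the first noise suppression component may be the second noise suppression component; as shown in FIG. 6, the test processing result obtained by using the second noise suppression component to perform noise suppression on uplink voice data R2 may be the second test result 35b shown in FIG. 6.
Similarly, the test processing result corresponding to the first gain control component in the application layer may be the first test result 36a shown in FIG. 6. The voice optimization component in the terminal system layer that has the same optimization function as the first gain control component may be the second gain control component; as shown in FIG. 6, the test processing result obtained by using the second gain control component to perform gain adjustment on uplink voice data R2 may be the second test result 36b shown in FIG. 6.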
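In summary, after performance tests have been run on the first optimization components in the application layer and on the second optimization components with the same optimization functions in the terminal system layer, the test processing results of each known model under different sound quality parameters are available in advance. Developers can then construct the test list 301a of FIG. 6 from the test terminal types, the sound quality parameters, the test processing results of the application layer, and the test processing results of the terminal system layer. In this way, when the first user needs to carry out the voice interaction service with other users in the game voice mode, the test type matching the terminal type of the terminal on which the service application currently runs (the target user terminal) can be looked up quickly in the test list 301a. For example, based on the sound quality parameter set by the current user (the first user) according to the sound quality index of the service application (for example, sound quality parameter D1), the target user terminal can quickly obtain from the test list 301a the first test processing result obtained with the first pre-signal processing strategy and the second test processing result obtained with the second pre-signal processing strategy. The target user terminal can then compare the test results of the voice optimization components that have the same optimization function and, based on their voice test effects, quickly determine from the first and second pre-signal processing strategies the optimal signal processing strategy for each optimization function on the current terminal type (the current model) under the specific sound quality parameter. The optimal signal processing strategy so determined is used as the signal processing result associated with the first pre-signal processing strategy, so that the following step S102 can be performed.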
比如,在第一用户(即当前用户)根据音质指标所设置的音质参数为上述图6所示的音质参数D1的情况下,这里的第一测试处理结果具体可以包含上述第一回声消除组件(即应用层中的AEC组件)所对应的第一测试结果31a、上述第一噪声抑制组件(即应用层中的NS组件)所对应的第一测试结果32a、和上述第一增益控制组件(应用层中的即AGC组件)所对应的第一测试结果33a。其中,这里的第二测试处理结果具体可以包含上述第二回声消除组件(即终端系统层中的AEC组件)所对应的第二测试结果31b、上述第二噪声抑制组件(即终端系统层中的NS组件)所对应的第二测试结果32b、和上述第二增益控制组件(即终端系统层中的AGC组件)所对应的第二测试结果33b。
同理,在第一用户根据音质指标所设置的音质参数为上述图6所示的其他音质参数(例如,上述音质参数D2)的情况下,同样可以从上述测试列表301a中快速得到采用第一前置信号处理策略所得到的第一测试处理结果和采用第二前置信号处理策略所得到的第二测试处理结果。这里将不对与其他音质参数相关联的第一测试处理结果、和与其他音质参数相关联的第二测试处理结果进行一一列举。
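The test-list lookup in step S101 can be pictured as a keyed table. The sketch below is a minimal illustration under stated assumptions: the dictionary layout, the keys `"app_layer"` and `"system_layer"`, the model name, and every numeric score are hypothetical placeholders introduced for the example, not data or structures from the source.

```python
from typing import Dict, Optional, Tuple

Scores = Dict[str, float]

# Hypothetical test list keyed by (test type, sound quality parameter).
# The scores are made-up placeholders standing in for test processing results.
TEST_LIST: Dict[Tuple[str, str], Dict[str, Scores]] = {
    ("brand-A-model-1", "D1"): {
        "app_layer": {"aec": 0.82, "ns": 0.70, "agc": 0.75},
        "system_layer": {"aec": 0.78, "ns": 0.85, "agc": 0.75},
    },
}


def lookup_test_results(terminal_type: str,
                        quality_param: str) -> Optional[Tuple[Scores, Scores]]:
    """Return (first, second) test processing results, or None for an unknown model."""
    entry = TEST_LIST.get((terminal_type, quality_param))
    if entry is None:
        return None  # no matching test type: the caller must fall back
    return entry["app_layer"], entry["system_layer"]
```

A `None` return corresponds to the case where no matching test type is found and the terminal has to evaluate both strategies on live audio instead.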
步骤S102,根据信号处理结果,在应用层控制终端系统层内的第二前置信号处理策略中的第二优化组件的开关状态,或控制第一前置信号处理策略中的第一优化组件的开关状态;
可选地,根据信号处理结果,在应用层控制开启和关闭终端系统层内的第二前置信号处理策略中的第二优化组件,或控制开启和关闭第一前置信号处理策略中的第一优化组件;其中,第一前置信号处理策略中包括至少一个第一优化组件,第二前置信号处理策略中包括至少一个第二优化组件。在一些实施例中,第一前置信号处理策略中包含的第一优化组件的数量和第二前置信号处理策略中包含的第二优化组件的数量相同,如均为3个。并且,第一前置信号处理策略中的每一个第一优化组件,在第二前置信号处理策略中都有一个与之具有相同优化功能的第二优化组件;相应地,第二前置信号处理策略中的每一个第二优化组件,在第一前置信号处理策略中都有一个与之具有相同优化功能的第一优化组件。
可选地,第一前置信号处理策略中开启的第一优化组件不同于在第二前置信号处理策略中开启的第二优化组件。
可选地,第一前置信号处理策略中开启的第一优化组件与第二前置信号处理策略中关闭的语音优化组件具备同一优化功能,且第二前置信号处理策略中开启的第二优化组件与第一前置信号处理策略中关闭的第一优化组件具备同一优化功能。
具体的,目标用户终端根据信号处理结果,确定第二前置信号处理策略中开启的第二优化组件,以及确定第二前置信号处理策略中关闭的第二优化组件。例如,目标用户终端可以根据前述信号处理结果,启动应用层与业务应用所属终端的终端系统层之间的协同机制,进而可以基于协同机制在应用层控制开启和关闭终端系统层内的第二前置信号处理策略中的第二优化组件。进一步的,目标用户终端可以在应用层中,将第二前置信号处理策略中关闭的第二优化组件作为第一协同组件,且在第一前置信号处理策略中开启与第一协同组件具有相同优化功能的第一优化组件。进一步的,目标用户终端可以在应用层中,将第二前置信号处理策略中开启的第二优化组件作为第二协同组件,且在第一前置信号处理策略中关闭与第二协同组件具有相同优化功能的第一优化组件。
可以理解的是,上述第一前置信号处理策略中的第一优化组件的语音优化算法可以包括以下至少一种:用于在应用层进行回声消除的第一回声消除算法(该第一回声消除算法对应的第一优化组件为上述第一回声消除组件)、用于在应用层进行噪声抑制的第一噪声抑制算法 (该第一噪声抑制算法对应的第一优化组件为上述第一噪声抑制组件)、和用于在应用层进行增益调整的第一增益控制算法(该第一增益控制算法对应的第一优化组件为上述第一增益控制组件)。同理,上述第二前置信号处理策略中的第二优化组件的语音优化算法可以包括以下至少一种:用于在终端系统层进行回声消除的第二回声消除算法(该第二回声消除算法对应的第二优化组件为上述第二回声消除组件)、用于在终端系统层进行噪声抑制的第二噪声抑制算法(该第二噪声抑制算法对应的第二优化组件为上述第二噪声抑制组件)、和用于在终端系统层进行增益调整的第二增益控制算法(该第二增益控制算法对应的第二优化组件为上述第二增益控制组件)。
此时,目标用户终端所获取到的信号处理结果可以是由下述步骤所得到的:从第一测试处理结果中获取第一回声消除算法所对应的第一回声消除结果,并从第二测试处理结果中获取第二回声消除算法所对应的第二回声消除结果,进而可以基于第一回声消除结果和第二回声消除结果,从第一回声消除算法和第二回声消除算法中选取最优回声消除算法,从而可以将最优回声消除算法作为与音质参数相关联的第一最优信号处理策略。进一步的,目标用户终端还可以从第一测试处理结果中获取第一噪声抑制算法所对应的第一噪声抑制结果,并从第二测试处理结果中获取第二噪声抑制算法所对应的第二噪声抑制结果,进而可以基于第一噪声抑制结果和第二噪声抑制结果,从第一噪声抑制算法和第二噪声抑制算法中选取最优噪声抑制算法,从而可以将最优噪声抑制算法作为与音质参数相关联的第二最优信号处理策略。进一步的,目标用户终端可以从第一测试处理结果中获取第一增益控制算法所对应的第一增益控制结果,并从第二测试处理结果中获取第二增益控制算法所对应的第二增益控制结果,进而可以基于第一增益控制结果和第二增益控制结果,从第一增益控制算法和第二增益控制算法中选取最优增益控制算法,进而可以将最优增益控制算法作为与音质参数相关联的第三最优信号处理策略。进一步的,目标用户终端可以将第一最优信号处理策略、第二最优信号处理策略和第三最优信号处理策略,确定为与第一前置信号处理策略相关联的信号处理结果。
为便于理解,请参见图7,图7是本申请实施例提供的一种确定与音质参数相关联的最优信号处理策略的场景示意图。其中,如图7所示的第一测试处理结果401a可以为上述图6所对应实施例中的与音质参数D1相关联的应用层的测试处理结果(即与音质参数D1相关联的第一测试处理结果)。其中,该第一测试处理结果401a中的测试结果41a可以为上述图6所对应实施例中的第一测试结果31a,即图7所示的测试结果41a可以为从第一测试处理结果401a中获取到的第一回声消除算法对应的第一回声消除结果。其中,该第一测试处理结果401a中的测试结果42a可以为上述图6所对应实施例中的第一测试结果32a,即图7所示的测试结果42a可以为从第一测试处理结果401a中获取到的第一噪声抑制算法对应的第一噪声抑制结果。其中,该第一测试处理结果401a中的测试结果43a可以为上述图6所对应实施例中的第一测试结果33a,即图7所示的测试结果43a可以为从第一测试处理结果401a中获取到的第一增益控制算法对应的第一增益控制结果。
如图7所示的第二测试处理结果401b可以为上述图6所对应实施例中的与音质参数D1相关联的终端系统层的测试处理结果(即与音质参数D1相关联的第二测试处理结果)。其中,该第二测试处理结果401b中的测试结果41b可以为上述图6所对应实施例中的第二测试结果31b,即图7所示的测试结果41b可以为从第二测试处理结果401b中获取到的第二回声消除算法对应的第二回声消除结果。其中,该第二测试处理结果401b中的测试结果42b可以为上述图6所对应实施例中的第二测试结果32b,即图7所示的测试结果42b可以为从第二测试处理结果401b中获取到的第二噪声抑制算法对应的第二噪声抑制结果。其中,该第二测试处理结果401b中的测试结果43b可以为上述图6所对应实施例中的第二测试结果33b,即图7所示的测试结果43b可以为从第二测试处理结果401b中获取到的第二增益控制算法对应的第二增益控制结果。
可以理解的是,该目标用户终端根据第一回声消除结果(例如,上述图7所示的测试结 果41a)和第二回声消除结果(例如,上述图7所示的测试结果41b),确定出第一最优信号处理策略的具体过程可以描述为:目标用户终端可以从第一测试处理结果中获取第一回声消除算法所对应的第一回声消除结果,并从第二测试处理结果中获取第二回声消除算法所对应的第二回声消除结果;进一步的,目标用户终端可以将第一回声消除结果对应的优化质量与第二回声消除结果对应的优化质量进行第一比较,以得到第一比较结果。其中,可以理解的是,如图7所示,目标用户终端可以根据测试结果41a和测试结果41b确定出具有同一优化功能的第一优化组件和第二优化组件的语音测试效果。比如,通过比较上述应用层中的第一回声消除组件在应用层的语音测试效果V11和上述终端系统层中的第二回声消除组件在终端系统层的语音测试效果V12,可以判断测试结果41a是否优于测试结果41b。这样,若图7所示的第一比较结果指示测试结果41a优于测试结果41b,则表明前述第一回声消除结果对应的优化质量优于前述第二回声消除结果对应的优化质量,进而可以将第一前置信号处理策略中的第一回声消除算法作为与音质参数相关联的第一最优信号处理策略;反之,若图7所示的第一比较结果指示测试结果41b优于测试结果41a,则表明第二回声消除结果对应的优化质量优于第一回声消除结果对应的优化质量,进而可以将第二前置信号处理策略中的第二回声消除算法作为与音质参数相关联的第一最优信号处理策略。可选的,应当理解,若测试结果41a与测试结果41b相同,则可以将第一前置信号处理策略中的第一回声消除算法或者第二前置信号处理策略中的第二回声消除算法作为第一最优信号处理策略。
可以理解的是,该目标用户终端根据第一噪声抑制结果(例如,上述图7所示的测试结果42a)和第二噪声抑制结果(例如,上述图7所示的测试结果42b),确定出第二最优信号处理策略的具体过程可以描述为:目标用户终端可以从第一测试处理结果中获取第一噪声抑制算法所对应的第一噪声抑制结果,并从第二测试处理结果中获取第二噪声抑制算法所对应的第二噪声抑制结果;进一步的,目标用户终端可以将第一噪声抑制结果对应的优化质量与第二噪声抑制结果对应的优化质量进行第二比较,得到第二比较结果。其中,可以理解的是,如图7所示,目标用户终端可以根据测试结果42a和测试结果42b确定出具有同一优化功能的各语音优化组件的语音测试效果。比如,通过比较上述应用层中的第一噪声抑制组件在应用层的语音测试效果V21和上述终端系统层中的第二噪声抑制组件在终端系统层的语音测试效果V22,可以判断该测试结果42a是否优于测试结果42b;这样,若图7所示的第二比较结果指示测试结果42a优于测试结果42b,则表明前述第一噪声抑制结果对应的优化质量优于前述第二噪声抑制结果对应的优化质量,进而可以将第一前置信号处理策略中的第一噪声抑制算法作为与音质参数相关联的第二最优信号处理策略;反之,若图7所示的第二比较结果指示测试结果42b优于测试结果42a,则表明第二噪声抑制结果对应的优化质量优于第一噪声抑制结果对应的优化质量,则该目标用户终端可以将第二前置信号处理策略中的第二噪声抑制算法作为与音质参数相关联的第二最优信号处理策略。同理,可选的,若测试结果42a与测试结果42b相同,则可以将第一前置信号处理策略中的第一噪声抑制算法或者第二前置信号处理策略中的第二噪声抑制算法作为第二最优信号处理策略。
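It can be understood that the process by which the target user terminal determines the third optimal signal processing strategy from the first gain control result (for example, test result 43a shown in FIG. 7) and the second gain control result (for example, test result 43b shown in FIG. 7) can be described as follows. The target user terminal obtains the first gain control result corresponding to the first gain control algorithm from the first test processing result and the second gain control result corresponding to the second gain control algorithm from the second test processing result. It then performs a third comparison between the optimization quality of the first gain control result and that of the second gain control result, obtaining a third comparison result. As shown in FIG. 7, the target user terminal can determine from test result 43a and test result 43b the voice test effects of the voice optimization components that have the same optimization function. For example, by comparing the voice test effect V31 of the first gain control component in the application layer with the voice test effect V32 of the second gain control component in the terminal system layer, it can judge whether test result 43a is better than test result 43b. If the third comparison result shown in FIG. 7 indicates that test result 43a is better than test result 43b, the optimization quality of the first gain control result is better than that of the second gain control result, and the first gain control algorithm in the first pre-signal processing strategy is used as the third optimal signal processing strategy associated with the sound quality parameter. Conversely, if the third comparison result indicates that test result 43b is better than test result 43a, the optimization quality of the second gain control result is better, and the second gain control algorithm in the second pre-signal processing strategy is used as the third optimal signal processing strategy associated with the sound quality parameter. Similarly, and optionally, if test result 43a and test result 43b are the same, either the first gain control algorithm in the first pre-signal processing strategy or the second gain control algorithm in the second pre-signal processing strategy may be used as the third optimal signal processing strategy.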
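The three pairwise comparisons (echo cancellation, noise suppression, gain control) follow the same pattern, which can be sketched as one loop. This is an illustrative sketch only: it assumes each test result can be reduced to a single numeric quality score where higher is better, and it resolves ties in favor of the application layer, which is one of the two choices the text permits. The function name and the layer labels are hypothetical.

```python
from typing import Dict


def select_optimal(first_results: Dict[str, float],
                   second_results: Dict[str, float]) -> Dict[str, str]:
    """For each optimization function, pick the layer with the better score.

    first_results holds application-layer scores, second_results holds
    terminal-system-layer scores. Ties go to the application layer
    (an arbitrary but permitted choice).
    """
    decision = {}
    for func in sorted(first_results.keys() & second_results.keys()):
        if first_results[func] >= second_results[func]:
            decision[func] = "app_layer"
        else:
            decision[func] = "system_layer"
    return decision
```

The returned mapping plays the role of the signal processing result: one optimal choice per optimization function.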
可选的,可以理解的是,若目标用户终端在测试列表(例如,上述测试列表301a)中未查找到与当前的终端类型相匹配的测试类型,则该目标用户终端可以确定当前的终端类型属于新的机型,从而可以在游戏语音模式下通过麦克风获取到第一用户的上行语音数据(例如,上述语音数据R3)时,进一步通过第一前置信号处理策略对上行语音数据(例如,上述语音数据R3)进行语音优化(即进行实时语音优化),以实时得到第一语音优化结果,且可以通过第二前置信号处理策略对上行语音数据(例如,上述语音数据R3)进行语音优化(即进行实时语音优化),以实时得到第二语音优化结果;进一步的,该目标用户终端可以基于第一语音优化结果和第二语音优化结果,从第一前置信号处理策略和第二前置信号处理策略中确定出与音质参数相关联的最优信号处理策略,进而可以将确定出的最优信号处理策略作为与第一前置信号处理策略相关联的信号处理结果。
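It can be understood that when the target user terminal determines that its own model is a new model, it may, in the game scenario, use the voice optimization components in the application layer to perform real-time voice optimization on the uplink voice data of the first user acquired in real time, obtaining the first voice optimization result corresponding to the voice optimization components in the application layer. Likewise, it may use the voice optimization components in the terminal system layer to perform real-time voice optimization on the same uplink voice data, obtaining the second voice optimization result corresponding to the voice optimization components in the terminal system layer. For how the target user terminal compares the voice optimization effects of voice optimization components having the same optimization function, refer to the above description of the voice test effects of such components; details are not repeated here.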
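The fallback for a model missing from the test list can be sketched as follows. The sketch assumes the two strategies are supplied as callables that return per-function quality scores for a frame of uplink audio; how such scores would actually be measured is not specified in the source, so `run_app_layer` and `run_system_layer` are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]
Evaluator = Callable[[List[float]], Scores]


def get_results(terminal_type: str,
                test_list: Dict[str, Dict[str, Scores]],
                run_app_layer: Evaluator,
                run_system_layer: Evaluator,
                uplink_frame: List[float]) -> Tuple[Scores, Scores, str]:
    """Return (first result, second result, source of the results).

    Known models use pre-computed test results; unknown models run both
    strategies on the live uplink frame and use those results instead.
    """
    entry = test_list.get(terminal_type)
    if entry is not None:
        return entry["app_layer"], entry["system_layer"], "test_list"
    first = run_app_layer(uplink_frame)      # first voice optimization result
    second = run_system_layer(uplink_frame)  # second voice optimization result
    return first, second, "live"
```

Either way the caller ends up with a pair of results that can be compared per optimization function.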
第一前置信号处理策略中的第一优化组件可以包括以下至少一种:上述第一回声消除组件、上述第一噪声抑制组件和上述第一增益控制组件。第二前置信号处理策略中的第二优化组件可以包括以下至少一种:上述第二回声消除组件、上述第二噪声抑制组件和上述第二增益控制组件。其中,第一回声消除组件和第二回声消除组件均可以用于进行回声消除;第一噪声抑制组件和第二噪声抑制组件均可以用于进行噪声抑制;第一增益控制组件和第二增益控制组件均可以用于进行增益调整。
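To prevent voice optimization components with the same optimization function from running in both the application layer and the terminal system layer, the embodiments of this application propose that, in the game voice mode, corresponding switches be provided so that the application layer can turn each part of the pre-signal processing scheme (each voice optimization component) on and off, ensuring that voice optimization components with the same optimization function run either in the application layer or in the terminal system layer. This reduces the performance cost of the whole voice optimization flow during real-time voice optimization in the game scenario and thus improves the voice interaction experience in the game scenario. In addition, in the game voice mode the embodiments of this application avoid wasting terminal system resources (for example, the computing resources of the CPU (Central Processing Unit)), which effectively reduces the power consumption of the terminal.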
为便于理解,请参见图8,图8是本申请实施例提供的一种控制语音前置信号处理方案中各语音优化组件开启和关闭的场景示意图。应当理解,这里的语音前置信号处理方案可以为上述目标用户终端为提升上行语音数据的清晰性、响度等而做的相关处理,比如,相关处理可以包括回声消除、噪声抑制、自动增益等。为便于理解,这里以语音前置信号处理方案包括上述第一前置信号处理策略和上述第二前置信号处理策略为例,以阐述在应用层中控制 语音前置信号处理方案中的各语音优化组件的开启和关闭的具体过程。
如图8所示的应用层601a可以为上述业务应用的应用层,该应用层601a所对应的语音前置信号处理方案可以为上述第一前置信号处理策略,这样,该第一前置信号处理策略中的第一优化组件至少包括:图8所示的语音优化组件61a、语音优化组件62a、语音优化组件63a。应当理解,其中,图8所示的语音优化组件61a可以为上述用于进行回声消除的第一回声消除组件;同理,图8所示的语音优化组件62a可以为上述用于进行噪声抑制的第一噪声抑制组件;同理,图8所示的语音优化组件63a可以为上述用于进行增益调整的第一增益控制组件。
如图8所示的终端系统层602a可以为上述业务应用所属终端(即上述目标用户终端)的底层系统层,该终端系统层602a所对应的语音前置信号处理方案可以为上述第二前置信号处理策略,这样,该第二前置信号处理策略中的第二优化组件至少包括:图8所示的语音优化组件61b、语音优化组件62b、语音优化组件63b。应当理解,其中,图8所示的语音优化组件61b可以为上述用于进行回声消除的第二回声消除组件;同理,图8所示的语音优化组件62b可以为上述用于进行噪声抑制的第二噪声抑制组件;同理,图8所示的语音优化组件63b可以为上述用于进行增益调整的第二增益控制组件。
应当理解,为避免具有相同功能的各语音优化组件的重复运行,本申请实施例提出可以在图8所示的应用层601a中提供相应的开关,帮助该应用层601a控制图8所示的终端系统层602a中的各语音优化组件的开启和关闭。
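For example, switch K11 in the application layer 601a shown in FIG. 8 may be used to control the voice optimization component 61a shown in FIG. 8, and switch K12 in the application layer may be used to control the voice optimization component 61b in the terminal system layer 602a shown in FIG. 8. Because the voice optimization component 61a in the application layer 601a and the voice optimization component 61b in the terminal system layer 602a have the same optimization function, the target user terminal can, according to the collaboration mechanism (also called the negotiation mechanism) between the application layer 601a and the terminal system layer 602a, choose whether to enable (or disable) from the application layer 601a the voice optimization component 61b in the second pre-signal processing strategy of the terminal system layer 602a. For example, as shown in FIG. 8, the target user terminal may enable the voice optimization component 61a of the first pre-signal processing strategy in the application layer 601a; that is, it may generate a first control instruction that controls the service switch 64a to close switch K11 (connecting it) and open switch K12 (disconnecting it). The first control instruction may instruct the target user terminal to take the disabled second optimization component in the second pre-signal processing strategy (for example, the voice optimization component 61b in FIG. 8) as the first collaborative component and to enable, in the first pre-signal processing strategy, the first optimization component that has the same optimization function as the first collaborative component (for example, the voice optimization component 61a shown in FIG. 8).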
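Similarly, switch K21 in the application layer 601a may be used to control the voice optimization component 62a shown in FIG. 8, and switch K22 in the application layer may be used to control the voice optimization component 62b in the terminal system layer 602a shown in FIG. 8. Because the voice optimization component 62a in the application layer 601a and the voice optimization component 62b in the terminal system layer 602a have the same optimization function, the target user terminal can, according to the collaboration mechanism (also called the negotiation mechanism) between the application layer 601a and the terminal system layer 602a, choose whether to enable (or disable) from the application layer 601a the voice optimization component 62b in the second pre-signal processing strategy of the terminal system layer 602a. For example, as shown in FIG. 8, the target user terminal may, from the application layer 601a, enable the voice optimization component 62b of the second pre-signal processing strategy; that is, it may generate a second control instruction that controls the service switch 64b to close switch K22 (connecting it) and open switch K21 (disconnecting it). The second control instruction may instruct the target user terminal to take the enabled second optimization component in the second pre-signal processing strategy (for example, the voice optimization component 62b in FIG. 8) as the second collaborative component and to disable, in the first pre-signal processing strategy, the first optimization component that has the same optimization function as the second collaborative component (for example, the voice optimization component 62a shown in FIG. 8).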
同理,应用层601a中的开关K31可以用于控制图8所示的语音优化组件63a,应用层中的开关K32可以用于控制图8所示的终端系统层602a中的语音优化组件63b。可以理解的是,由于图8所示的应用层601a中的语音优化组件63a与图8所示的终端系统层602a中的语音 优化组件63b具有相同的优化功能,所以,该目标用户终端可以根据应用层601a与终端系统层602a之间的协同机制(也可以称之为协商机制),选择是否在该应用层601a中控制开启(或者关闭)终端系统层602a内的第二前置信号处理策略中的语音优化组件63b。其中,该目标用户终端可以生成用于控制业务开关64c关闭开关K31,且断开开关K32的第三控制指令的具体实现方式可以参见对上述第一控制指令的描述,这里将不再继续进行赘述。此时,目标用户终端将第二前置信号处理策略中关闭的第二优化组件(例如,图8的语音优化组件63b)作为新的第一协同组件,并可以在第一前置信号处理策略中开启与该新的第一协同组件具有相同优化功能的第一优化组件(例如,图8所示的语音优化组件63a)。
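The switch arrangement of FIG. 8 can be sketched as a mapping from a per-function decision to switch states. The switch names K11 through K32 are taken from the text; the dictionary layout, the function name, and the `"app_layer"` and `"system_layer"` labels are assumptions made for this illustration. `True` means the switch is closed (component enabled).

```python
from typing import Dict

# (application-layer switch, terminal-system-layer switch) per optimization function.
SWITCHES = {
    "aec": ("K11", "K12"),
    "ns": ("K21", "K22"),
    "agc": ("K31", "K32"),
}


def switch_states(decision: Dict[str, str]) -> Dict[str, bool]:
    """Derive switch states so that exactly one component per function is enabled."""
    states = {}
    for func, (app_switch, system_switch) in SWITCHES.items():
        run_in_app = decision[func] == "app_layer"
        states[app_switch] = run_in_app         # closed: application-layer component on
        states[system_switch] = not run_in_app  # closed: system-layer component on
    return states
```

With echo cancellation and gain control assigned to the application layer and noise suppression assigned to the terminal system layer, this reproduces the configuration described for FIG. 8: K11, K22, and K31 closed, and K12, K21, and K32 open.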
步骤S103,获取业务应用对应的第一用户在游戏语音模式下的上行语音数据,基于第一前置信号处理策略中开启的第一优化组件和第二前置信号处理策略中开启的第二优化组件,对游戏语音模式下的上行语音数据进行语音优化。
应当理解,该目标用户终端可以进一步基于上述第一前置信号处理策略中开启的第一优化组件和上述第二前置信号处理策略中开启的第二优化组件,对该游戏场景下实时采集到的第一用户的上行语音数据进行语音优化,以确保当前录入至该目标用户终端的上行语音数据的清晰度和响度。这样,当该目标用户终端在该游戏语音模式下,可以将具有较高清晰度和响度的第一用户的声音传递至通信对端(即上述第三用户所对应的终端)。这样,由该通信对端的扬声器所播放的下行语音数据可以为语音优化处理后的第一用户的声音。
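Step S103 can be sketched as a chain in which each optimization function is applied once, by whichever layer's component is enabled. This is a simplified illustration: the processing order (echo cancellation, then noise suppression, then gain control) is an assumption, the components are modeled as plain callables, and on a real device the terminal system layer would typically process the audio before the application receives it rather than being interleaved as shown here.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[str]
Component = Callable[[Frame], Frame]


def optimize_uplink(frame: Frame,
                    decision: Dict[str, str],
                    app_components: Dict[str, Component],
                    system_components: Dict[str, Component],
                    order: Sequence[str] = ("aec", "ns", "agc")) -> Frame:
    """Run each optimization function exactly once, using the enabled layer's component."""
    for func in order:
        if decision[func] == "app_layer":
            component = app_components[func]
        else:
            component = system_components[func]
        frame = component(frame)
    return frame
```

Because the decision names exactly one layer per function, no optimization function is ever applied twice to the same uplink frame.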
在本申请实施例中,计算机设备(例如,用作移动终端的目标用户终端)可以在游戏语音模式下,获取与业务应用的应用层内的第一前置信号处理策略相关联的信号处理结果;其中,可以理解的是,该第一前置信号处理策略中的每个第一优化组件与第二前置信号处理策略中的对应第二优化组件具有相同的优化功能。所以,在后续游戏实时语音的人声处理(即上行语音数据的语音优化)过程中,可以在该游戏语音模式下有效地解决具有同一功能的语音优化组件的重复运行的现象。比如,本申请实施例提出可以根据前述信号处理结果(即具有相同功能的语音优化组件所对应的算法对比结果),在应用层控制开启或者关闭终端系统层中的一个或者多个第二优化组件,从而可以让具有同一功能的语音优化组件要么是运行在游戏应用层,要么是运行在终端系统层,这样,可以从根源上减少上行语音数据的音质损伤。可以理解的是,这里将不对终端系统层中所开启或者关闭的第二优化组件的数量和类型进行限定。进一步的,计算机设备在获取到第一用户在游戏语音模式下的上行语音数据时,则可以快速基于开启的第一优化组件和开启的第二优化组件,协同对采集到的上行语音数据进行语音优化,进而可以在降低音质损伤的情况下,提升游戏场景下的语音优化效果。
请参见图9,图9是本申请实施例提供的一种音频数据处理方法的示意图。如图9所示,该方法可以由用户终端(例如,目标用户终端,该目标用户终端可以为上述图1所示的用户终端3000a)执行,该方法具体可以包含以下步骤S201~S213中的至少一个步骤:
步骤S201,在第一用户访问业务应用时,获取用于加载业务应用的系统资源包,对系统资源包进行解析处理,得到业务应用的系统资源数据;
步骤S202,对系统资源数据进行初始化处理,基于初始化处理后的系统资源数据将业务应用的业务模式初始配置为系统媒体模式。
为便于理解,请参见图10,图10是本申请实施例提供的一种资源配置界面的场景示意图。可以理解的是,在游戏场景下,如图10所示的游戏用户A可以为上述图4所对应实施例中的用户1。
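As shown in FIG. 10, when game user A starts the service application shown in FIG. 10 on the target user terminal, the system resource package used to load the service application can be obtained from the service server shown in FIG. 10 and parsed by the encoder in the target user terminal to obtain the system resource data of the service application. The target user terminal may then initialize the system resource data and, based on the initialized system resource data, output the resource configuration interface of FIG. 10. As shown in FIG. 10, the resource configuration interface may be used to dynamically output the multimedia data in the initialized system resource data, which may include, but is not limited to, the image frames and audio frames shown in FIG. 10. It can be understood that the embodiments of this application may initially configure the service mode of the service application as the system media mode based on the initialized system resource data, so that the media audio data shown in FIG. 10 (that is, the aforementioned audio frame data and video frame data) can be played through the speaker in the resource configuration interface shown in FIG. 10. After completing the system configuration, the target user terminal may perform the following step S203 and switch the display interface of the service application from the resource configuration interface 800a shown in FIG. 10 to an application display interface that includes a voice control. When game user A then triggers the voice control, which is in the off state, in the application display interface, the service mode of the service application can be switched from the current system media mode to the game voice mode, so that voice interaction takes place in the game voice mode.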
步骤S203,基于初始化处理后的系统资源数据输出业务应用的应用显示界面;
其中,应用显示界面中包含用于指示第一用户发起语音交互业务的语音控件。
步骤S204,响应第一用户针对语音控件的语音开启操作,检测业务应用的应用类型;
步骤S205,在检测到业务应用的应用类型为游戏类型时,生成与游戏类型相关联的第一语音通话指令,进而可以基于第一语音通话指令将业务应用的业务模式由系统媒体模式切换为游戏语音模式。
可选的,目标用户终端在执行完上述步骤S204之后,还可以在检测到业务应用的应用类型为游戏类型时,直接将业务应用的业务模式由系统媒体模式切换为游戏语音模式。
步骤S206,在游戏语音模式下,获取与业务应用的应用层内的第一前置信号处理策略相关联的信号处理结果;
步骤S207,根据信号处理结果,在应用层控制开启和关闭终端系统层内的第二前置信号处理策略中的第二优化组件,或控制开启和关闭第一前置信号处理策略中的第一优化组件;
其中,应当理解,目标用户终端在应用层中根据算法对比结果控制开启和关闭第一前置信号处理策略中的第一优化组件的具体实现方式,可以参见上述图5所对应实施例中,对控制开启和关闭终端系统层内的第二前置信号处理策略中的第二优化组件的具体过程的描述,这里将不再继续进行赘述。
步骤S208,获取业务应用对应的第一用户在游戏语音模式下的上行语音数据,基于第一前置信号处理策略中开启的第一优化组件和第二前置信号处理策略中开启的第二优化组件,对游戏语音模式下的上行语音数据进行语音优化。
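For specific implementations of step S206 to step S208, refer to the description of step S101 to step S103 in the embodiment corresponding to FIG. 5 above; details are not repeated here.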
步骤S209,将语音优化后的上行语音数据作为上行语音数据对应的目标语音优化结果;
步骤S210,将目标语音优化结果发送给与第一用户相关联的第三用户对应的终端,以使第三用户对应的终端在游戏语音模式下通过扬声器播放语音优化后的上行语音数据;
可选地,第一用户与第三用户均为游戏语音模式下处于同一游戏阵营中的游戏用户。
可选的,可以理解的是,当计算机设备在执行完上述步骤S204之后,若目标用户终端检测到当前运行的业务应用的应用类型属于非游戏类型时,还可以跳转执行下述步骤S211-步骤S213,以在系统通话模式下使得上述第一用户可以和其他用户(例如,第二用户)进行系统通话。
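Step S211: when it is detected that the application type of the service application is a non-game type, generate a second voice call instruction associated with the non-game type, and switch the service mode of the service application from the system media mode to the system call mode based on the second voice call instruction;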
步骤S212,在基于系统通话模式将语音交互业务的通话类型确定为系统通话类型时,通过业务应用向第二用户发送系统通话类型对应的系统通话请求;
其中,第二用户为第一用户在业务应用中所选择的请求进行系统通话的用户;
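Step S213: when the second user responds to the system call request, establish a system communication channel between the first user and the second user, and hold the system call over the system communication channel.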
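For ease of understanding, refer further to FIG. 11, which is a schematic flowchart for providing different types of voice double-talk services according to an embodiment of this application. As shown in FIG. 11, after the first user starts the service application on the target user terminal, step S1 shown in FIG. 11 may be performed to initialize the system resources; for example, the target user terminal may initialize the system resource data obtained by parsing. Step S2 shown in FIG. 11 may then be performed based on the initialized system resource data so that the target user terminal enters the system media mode by default; specifically, the target user terminal may initially configure the service mode of the service application as the system media mode. Further, when the first user needs to carry out voice interaction with other users, step S3 shown in FIG. 11 may be performed to initiate a voice call at the application layer of the target user terminal. The target user terminal may then perform step S4 shown in FIG. 11 to determine the application type of the service application that initiated the voice call. If the application type is a game type, step S5 shown in FIG. 11 may be performed to enter the game voice mode, in which the first user can hold a voice call in the game scenario with other users (for example, the aforementioned third user). Otherwise, step S11 shown in FIG. 11 may be performed to enter the system call mode, in which the first user can hold a system call in a non-game scenario with other users (for example, the aforementioned second user).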
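As shown in FIG. 11, after performing step S5 the target user terminal may further perform step S6 to set the voice sampling rate of the terminal (for example, the uplink and downlink sampling rates shown in FIG. 11) and the number of channels, so as to guarantee the uplink and downlink voice quality; the voice sampling rate and the number of channels here may be the aforementioned sound quality parameters. Further, as shown in FIG. 11, the target user terminal may perform step S7: according to the aforementioned algorithm comparison result, it may enable the pre-voice processing algorithm of the application layer and disable the pre-voice processing algorithm of the terminal system layer. Optionally, the target user terminal may instead enable the pre-voice processing algorithm of the terminal system layer while disabling that of the application layer. This ensures that voice optimization components with the same optimization function in the target user terminal work either in the application layer or in the terminal system layer. In other words, between a first optimization component in the application layer and the second optimization component with the same optimization function in the terminal system layer, the embodiments of this application ensure as far as possible that only one component's voice processing algorithm is working at any time, which minimizes power consumption and provides the best voice quality.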
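Further, as shown in FIG. 11, when the first user performs step S8 shown in FIG. 11 with other users in the game scenario, a multi-terminal game voice call can be held in the game voice scenario. During the game voice call, the target user terminal may use the first optimization components and second optimization components determined through the above negotiation to optimize the uplink voice data of the first user collected in real time, and then send the optimized voice of the first user to the other users. When the first user no longer needs to send optimized voice to the other game users in the same camp, step S9 shown in FIG. 11 may be performed in the game scenario; for example, the target user terminal may respond to the first user's voice-off operation on the voice control and switch the service mode of the service application from the game voice mode back to the system media mode shown in FIG. 11. It should be understood that, in the game scenario, the embodiments of this application may also play, in the system media mode, the optimized voices of other users transmitted by their terminals; for example, the first user of the target user terminal can hear the optimized voices of other users (the aforementioned third user) in the system media mode. With the voice control turned off, voice optimization of the first user's uplink voice data need not continue; that is, the optimized voice of the first user no longer needs to be sent to the other users in the game scenario.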
应当理解,如图11所示,当第一用户运行完上述业务应用中的游戏之后,可以执行图11所示的步骤S10,以退出当前的游戏系统,此时,该目标用户终端可以释放相关的系统资源数据。
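It can be understood that when the first user listens to music on the target user terminal, the terminal may work in the system media mode; when the first user makes a phone call on the target user terminal, the terminal may work in the system call mode; and, optionally, when the first user uses game voice on the target user terminal, the terminal may work in the game voice mode. It should be understood that the voice interaction system involved in the embodiments of this application may contain the following two modules. One module is the game voice mode in the target user terminal, which can coexist in the target user terminal with the system call mode and the system media mode. In the game voice mode, the uplink and downlink voice sampling rates and the number of channels configured based on the sound quality index of the target user terminal do not affect one another. The other module is the pre-signal processing scheme running in the application layer; for example, the target user terminal can intelligently adjust the pre-signal processing scheme of the application layer according to the voice processing effect of the terminal system layer. Through the cooperation of the two modules, the target user terminal can improve the voice interaction experience between game users in the game scenario.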
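In the embodiments of this application, a computer device (for example, the target user terminal) enters the game voice mode when it detects that the application type of the service application is a game type. In the game voice mode it can adaptively, according to the aforementioned signal processing result (the algorithm comparison result for voice optimization components with the same function), enable or disable one or more second optimization components in the terminal system layer from the application layer, so that voice optimization components with the same optimization function run either in the game application layer or in the terminal system layer, which reduces at the root the sound quality damage to the uplink voice data. The number and types of second optimization components enabled or disabled in the terminal system layer are not limited here. Further, when the computer device obtains the uplink voice data of the first user in the game voice mode, it can quickly perform voice optimization on that data based on the enabled first optimization components and the enabled second optimization components, improving the voice optimization effect in the game scenario while reducing sound quality damage. Optionally, the embodiments of this application may also enter the system call mode when the application type of the service application is detected to be a non-game type, so that the first user can hold a system call with other users in the system call mode.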
请参见图12,图12是本申请实施例提供的另一种音频数据处理方法的流程示意图。该方法由计算机设备执行,例如该方法可以由用户终端(例如,上述目标用户终端,该目标用户终端可以为上述图4所对应实施例中的用户终端10a)执行,也可以由业务服务器(如,上述图1所示的业务服务器2000)执行,还可以由用户终端和业务服务器交互配合执行。为便于理解,本实施例以该方法由用户终端执行为例进行说明。其中,该音频数据处理方法可以包括以下步骤S301-步骤S302中的至少一个步骤:
步骤S301,在游戏语音模式下,获取与业务应用的应用层内的第一前置信号处理策略相关联的信号处理结果;
步骤S302,根据信号处理结果,在应用层控制终端系统层内的第二前置信号处理策略中的第二优化组件的开关状态。
其中,第一前置信号处理策略中包括至少一个第一优化组件,第二前置信号处理策略中包括至少一个第二优化组件。
例如,根据信号处理结果,在应用层确定终端系统层内的第二前置信号处理策略中需要开启的第二优化组件,和/或,根据信号处理结果,在应用层确定终端系统层内的第二前置信号处理策略中需要关闭的第二优化组件。之后,对于第二前置信号处理策略中需要开启的第二优化组件,如果该第二优化组件的当前状态为关闭状态,则应用层控制该第二优化组件开启,如果该第二优化组件的当前状态为开启状态,则保持该第二优化组件开启;对于第二前置信号处理策略中需要关闭的第二优化组件,如果该第二优化组件的当前状态为开启状态,则应用层控制该第二优化组件关闭,如果该第二优化组件的当前状态为关闭状态,则保持该第二优化组件关闭。
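The state-aware control described above (toggle a component only when its current state differs from the required state) can be sketched as follows. The function name and the `set_enabled` callback are hypothetical; the callback stands in for whatever mechanism the application layer actually uses to switch a terminal-system-layer component.

```python
from typing import Callable, Dict, List


def apply_desired_states(current: Dict[str, bool],
                         desired: Dict[str, bool],
                         set_enabled: Callable[[str, bool], None]) -> List[str]:
    """Bring components to their desired on/off state, touching only those that differ.

    Returns the names of the components whose state was actually changed.
    """
    changed = []
    for name, want_on in desired.items():
        if current.get(name) != want_on:
            set_enabled(name, want_on)  # issue the control call only when needed
            current[name] = want_on
            changed.append(name)
        # otherwise the component already has the required state and is left as-is
    return changed
```

Calling it a second time with the same desired states issues no further control calls, which matches the "keep it enabled" and "keep it disabled" branches in the text.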
可选地,根据信号处理结果,在应用层控制第一前置信号处理策略中的第一优化组件的开关状态。
在一些实施例中,上述步骤S302包括:根据信号处理结果,确定第二前置信号处理策略中开启的第二优化组件,以及确定第二前置信号处理策略中关闭的第二优化组件;在应用层中,将第二前置信号处理策略中关闭的第二优化组件作为第一协同组件,且在第一前置信号处理策略中开启与该第一协同组件具有相同优化功能的第一优化组件;在应用层中,将第二前置信号处理策略中开启的第二优化组件作为第二协同组件,且在第一前置信号处理策略中 关闭与该第二协同组件具有相同优化功能的第一优化组件。
在一些实施例中,上述步骤S301包括:获取业务应用所属终端的终端类型,在与业务应用相关联的测试列表中查找与终端类型相匹配的测试类型;若在测试列表中查找到与终端类型相匹配的测试类型,则基于音质参数从测试列表中获取采用第一前置信号处理策略所得到的第一测试处理结果,且获取采用第二前置信号处理策略所得到的第二测试处理结果;基于第一测试处理结果和第二测试处理结果,从第一前置信号处理策略和第二前置信号处理策略中确定与音质参数相关联的最优信号处理策略,将最优信号处理策略作为与第一前置信号处理策略相关联的信号处理结果。
可选地,若在测试列表中未查找到与终端类型相匹配的测试类型,则在游戏语音模式下通过麦克风获取到第一用户的上行语音数据时,通过第一前置信号处理策略对上行语音数据进行语音优化,得到第一语音优化结果,且通过第二前置信号处理策略对上行语音数据进行语音优化,得到第二语音优化结果;基于第一语音优化结果和第二语音优化结果,从第一前置信号处理策略和第二前置信号处理策略中确定与音质参数相关联的最优信号处理策略,将最优信号处理策略作为与第一前置信号处理策略相关联的信号处理结果。
在一些实施例中,上述方法还包括:获取业务应用对应的第一用户在游戏语音模式下的上行语音数据,基于第一前置信号处理策略中开启的第一优化组件和第二前置信号处理策略中开启的第二优化组件,对游戏语音模式下的上行语音数据进行语音优化。
在一些实施例中,上述方法还包括:在第一用户访问所述业务应用时,获取用于加载业务应用的系统资源包,对系统资源包进行解析处理,得到业务应用的系统资源数据;对系统资源数据进行初始化处理,基于初始化处理后的系统资源数据将业务应用的业务模式初始配置为系统媒体模式。
在一些实施例中,上述方法还包括:将语音优化后的上行语音数据作为上行语音数据对应的目标语音优化结果;将目标语音优化结果发送给与第一用户相关联的第三用户对应的终端,以使第三用户对应的终端在游戏语音模式下通过扬声器播放所述语音优化后的上行语音数据。
在本申请实施例中,通过提供一种游戏语音模式,在该模式下,业务应用的应用层有权限对终端系统层内的语音优化组件的开关状态进行控制,从而使得业务应用能够根据实际业务请求或需求,灵活控制终端系统层内的语音优化组件的开关状态,保证在该模式下的语音优化效果。
请参见图13,图13是本申请实施例提供的另一种音频数据处理方法的流程示意图。该方法由计算机设备执行,例如该方法可以由用户终端(例如,上述目标用户终端,该目标用户终端可以为上述图4所对应实施例中的用户终端10a)执行,也可以由业务服务器(如,上述图1所示的业务服务器2000)执行,还可以由用户终端和业务服务器交互配合执行。为便于理解,本实施例以该方法由用户终端执行为例进行说明。其中,该音频数据处理方法可以包括以下步骤S401-步骤S402中的至少一个步骤:
步骤S401,在游戏语音模式下,获取与业务应用的应用层内的第一前置信号处理策略相关联的信号处理结果;
步骤S402,根据信号处理结果,控制终端系统层内的第二前置信号处理策略中的第二优化组件的开关状态,或控制第一前置信号处理策略中的第一优化组件的开关状态;其中,第一前置信号处理策略中开启的第一优化组件,不同于在第二前置信号处理策略中开启的第二优化组件。
其中,第一前置信号处理策略中包括至少一个第一优化组件,第二前置信号处理策略中包括至少一个第二优化组件。
可选地,第一前置信号处理策略中开启的第一优化组件与第二前置信号处理策略中关闭的语音优化组件具备同一优化功能,且第二前置信号处理策略中开启的第二优化组件与第一 前置信号处理策略中关闭的第一优化组件具备同一优化功能。
可选地,步骤S402可以由业务应用的应用层执行,也可以由终端系统层执行,或者由应用层和终端系统层共同配合执行。例如,由应用层控制第一前置信号处理策略中的第一优化组件的开关状态,由终端系统层控制第二前置信号处理策略中的第二优化组件的开关状态。在这种情况下,应用层和终端系统层之间需要同步信号处理结果,或者同步需要开启和关闭的第一优化组件和/或第二优化组件。
在一些实施例中,上述步骤S402包括:根据信号处理结果,确定第二前置信号处理策略中开启的第二优化组件,以及确定第二前置信号处理策略中关闭的第二优化组件;将第二前置信号处理策略中关闭的第二优化组件进行关闭,且在第一前置信号处理策略中开启与关闭的第二优化组件具有相同优化功能的第一优化组件;将第二前置信号处理策略中开启的第二优化组件进行开启,且在第一前置信号处理策略中关闭与开启的第二优化组件具有相同优化功能的第一优化组件。可选地,对于第二前置信号处理策略中需要开启的第二优化组件,如果该第二优化组件的当前状态为关闭状态,则控制该第二优化组件开启,如果该第二优化组件的当前状态为开启状态,则保持该第二优化组件开启;对于第二前置信号处理策略中需要关闭的第二优化组件,如果该第二优化组件的当前状态为开启状态,则控制该第二优化组件关闭,如果该第二优化组件的当前状态为关闭状态,则保持该第二优化组件关闭。
在一些实施例中,上述步骤S401包括:获取业务应用所属终端的终端类型,在与业务应用相关联的测试列表中查找与终端类型相匹配的测试类型;若在测试列表中查找到与终端类型相匹配的测试类型,则基于音质参数从测试列表中获取采用第一前置信号处理策略所得到的第一测试处理结果,且获取采用第二前置信号处理策略所得到的第二测试处理结果;基于第一测试处理结果和第二测试处理结果,从第一前置信号处理策略和第二前置信号处理策略中确定与音质参数相关联的最优信号处理策略,将最优信号处理策略作为与第一前置信号处理策略相关联的信号处理结果。
可选地,若在测试列表中未查找到与终端类型相匹配的测试类型,则在游戏语音模式下通过麦克风获取到第一用户的上行语音数据时,通过第一前置信号处理策略对上行语音数据进行语音优化,得到第一语音优化结果,且通过第二前置信号处理策略对上行语音数据进行语音优化,得到第二语音优化结果;基于第一语音优化结果和第二语音优化结果,从第一前置信号处理策略和第二前置信号处理策略中确定与音质参数相关联的最优信号处理策略,将最优信号处理策略作为与第一前置信号处理策略相关联的信号处理结果。
在一些实施例中,上述方法还包括:在第一用户访问所述业务应用时,获取用于加载业务应用的系统资源包,对系统资源包进行解析处理,得到业务应用的系统资源数据;对系统资源数据进行初始化处理,基于初始化处理后的系统资源数据将业务应用的业务模式初始配置为系统媒体模式。
可选地,上述步骤S402之后还包括:获取业务应用对应的第一用户在游戏语音模式下的上行语音数据,基于第一前置信号处理策略中开启的第一优化组件和第二前置信号处理策略中开启的第二优化组件,对游戏语音模式下的上行语音数据进行语音优化。
在本申请实施例中,通过根据前述信号处理结果,控制开启或者关闭终端系统层中的一个或者多个语音优化组件,从而可以让具有同一优化功能的语音优化组件要么是运行在应用层,要么是运行在终端系统层,这样,可以从根源上减少上行语音数据的音质损伤,提升游戏场景下的语音优化效果。
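In addition, for details not described in the embodiments of FIG. 12 and FIG. 13, refer to the descriptions of related content in the other embodiments of this application; they are not repeated here.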
请参见图14,图14是本申请实施例提供的一种音频数据处理装置的结构示意图。其中,该音频数据处理装置1可以包括以下至少之一:处理结果获取模块12,组件控制模块13和语音优化模块14。可选的,该音频数据处理装置还可以包括以下至少之一:资源包获取模块 15,初始化模块16,应用界面输出模块17,语音开启模块18,游戏模式切换模块11,通话模式切换模块19,通话请求发送模块20,通信信道建立模块21,目标结果确定模块22,目标结果发送模块23,语音关闭模块24。
处理结果获取模块12,用于在游戏语音模式下,获取与业务应用的应用层内的第一前置信号处理策略相关联的信号处理结果;其中,第一前置信号处理策略中包括至少一个第一优化组件;
其中,处理结果获取模块12包括:音质指标获取单元121,终端类型查找单元122,测试结果获取单元123,最优策略确定单元124,优化结果获取单元125和处理结果确定单元126;
音质指标获取单元121,用于在游戏语音模式下,获取业务应用的音质指标,根据业务应用的音质指标,配置业务应用的音质参数;
终端类型查找单元122,用于获取业务应用所属终端的终端类型,在与业务应用相关联的测试列表中查找与终端类型相匹配的测试类型;
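The test result acquisition unit 123 is configured to: if a test type matching the terminal type is found in the test list, obtain from the test list, based on the sound quality parameter, the first test processing result obtained with the first pre-signal processing strategy and the second test processing result obtained with the second pre-signal processing strategy; the first pre-signal processing strategy is the pre-signal processing strategy in the application layer of the service application; the second pre-signal processing strategy is the pre-signal processing strategy in the terminal system corresponding to the test terminal type;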
最优策略确定单元124,用于基于第一测试处理结果和第二测试处理结果,从第一前置信号处理策略和第二前置信号处理策略中确定与音质参数相关联的最优信号处理策略,将确定出的最优信号处理策略作为与第一前置信号处理策略相关联的信号处理结果。
其中,第一前置信号处理策略中的第一优化组件的语音优化算法包括以下至少一种:用于在应用层进行回声消除的第一回声消除算法、用于在应用层进行噪声抑制的第一噪声抑制算法、和用于在应用层进行增益调整的第一增益控制算法;第二前置信号处理策略中的第二优化组件的语音优化算法包括以下至少一种:用于在终端系统层进行回声消除的第二回声消除算法、用于在终端系统层进行噪声抑制的第二噪声抑制算法、和用于在终端系统层进行增益调整的第二增益控制算法。
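The optimal strategy determination unit 124 includes: a first selection subunit 1241, a second selection subunit 1242, a third selection subunit 1243, and an optimal strategy determination subunit 1244;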
第一选取子单元1241,用于从第一测试处理结果中获取第一回声消除算法所对应的第一回声消除结果,从第二测试处理结果中获取第二回声消除算法所对应的第二回声消除结果,基于第一回声消除结果和第二回声消除结果,从第一回声消除算法和第二回声消除算法中选取最优回声消除算法,将最优回声消除算法作为与音质参数相关联的第一最优信号处理策略;
其中,第一选取子单元1241,具体用于从第一测试处理结果中获取第一回声消除算法所对应的第一回声消除结果,从第二测试处理结果中获取第二回声消除算法所对应的第二回声消除结果;
第一选取子单元1241,还具体用于将第一回声消除结果对应的优化质量与第二回声消除结果对应的优化质量进行第一比较,得到第一比较结果;
第一选取子单元1241,还具体用于若第一比较结果指示第一回声消除结果对应的优化质量优于第二回声消除结果对应的优化质量,则将第一前置信号处理策略中的第一回声消除算法作为与音质参数相关联的第一最优信号处理策略;
可选的,第一选取子单元1241,还具体用于若第一比较结果指示第二回声消除结果对应的优化质量优于第一回声消除结果对应的优化质量,则将第二前置信号处理策略中的第二回声消除算法作为与音质参数相关联的第一最优信号处理策略。
第二选取子单元1242,用于从第一测试处理结果中获取第一噪声抑制算法所对应的第一 噪声抑制结果,从第二测试处理结果中获取第二噪声抑制算法所对应的第二噪声抑制结果,基于第一噪声抑制结果和第二噪声抑制结果,从第一噪声抑制算法和第二噪声抑制算法中选取最优噪声抑制算法,将最优噪声抑制算法作为与音质参数相关联的第二最优信号处理策略;
其中,第二选取子单元1242,具体用于从第一测试处理结果中获取第一噪声抑制算法所对应的第一噪声抑制结果,从第二测试处理结果中获取第二噪声抑制算法所对应的第二噪声抑制结果;
第二选取子单元1242,还具体用于将第一噪声抑制结果对应的优化质量与第二噪声抑制结果对应的优化质量进行第二比较,得到第二比较结果;
第二选取子单元1242,还具体用于若第二比较结果指示第一噪声抑制结果对应的优化质量优于第二噪声抑制结果对应的优化质量,则将第一前置信号处理策略中的第一噪声抑制算法作为与音质参数相关联的第二最优信号处理策略;
可选的,第二选取子单元1242,还具体用于若第二比较结果指示第二噪声抑制结果对应的优化质量优于第一噪声抑制结果对应的优化质量,则将第二前置信号处理策略中的第二噪声抑制算法作为与音质参数相关联的第二最优信号处理策略。
第三选取子单元1243,用于从第一测试处理结果中获取第一增益控制算法所对应的第一增益控制结果,从第二测试处理结果中获取第二增益控制算法所对应的第二增益控制结果,基于第一增益控制结果和第二增益控制结果,从第一增益控制算法和第二增益控制算法中选取最优增益控制算法,将最优增益控制算法作为与音质参数相关联的第三最优信号处理策略;
其中,第三选取子单元1243,具体用于从第一测试处理结果中获取第一增益控制算法所对应的第一增益控制结果,从第二测试处理结果中获取第二增益控制算法所对应的第二增益控制结果;
第三选取子单元1243,还具体用于将第一增益控制结果对应的优化质量与第二增益控制结果对应的优化质量进行第三比较,得到第三比较结果;
第三选取子单元1243,还具体用于若第三比较结果指示第一增益控制结果对应的优化质量优于第二增益控制结果对应的优化质量,则将第一前置信号处理策略中的第一增益控制算法作为与音质参数相关联的第三最优信号处理策略;
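Optionally, the third selection subunit 1243 is further specifically configured to: if the third comparison result indicates that the optimization quality of the second gain control result is better than that of the first gain control result, use the second gain control algorithm in the second pre-signal processing strategy as the third optimal signal processing strategy associated with the sound quality parameter.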
最优策略确定子单元1244,用于将第一最优信号处理策略、第二最优信号处理策略和第三最优信号处理策略,确定为与第一前置信号处理策略相关联的信号处理结果。
其中,第一选取子单元1241,第二选取子单元1242,第三选取子单元1243和最优策略确定子单元1244的具体实现方式,可以参见上述对确定信号处理结果的具体实施方式的描述,这里将不再继续进行赘述。
可选的,优化结果获取单元125,用于若在测试列表中未查找到与终端类型相匹配的测试类型,则在游戏语音模式下通过麦克风获取到第一用户的上行语音数据时,通过第一前置信号处理策略对上行语音数据进行语音优化,得到第一语音优化结果,且通过第二前置信号处理策略对上行语音数据进行语音优化,得到第二语音优化结果;
处理结果确定单元126,用于基于第一语音优化结果和第二语音优化结果,从第一前置信号处理策略和第二前置信号处理策略中确定与音质参数相关联的最优信号处理策略,将确定出的最优信号处理策略作为与第一前置信号处理策略相关联的信号处理结果。
其中,音质指标获取单元121,终端类型查找单元122,测试结果获取单元123,最优策略确定单元124,优化结果获取单元125和处理结果确定单元126的具体实现方式可以参见上述图5所对应实施例中对步骤S101和步骤S102的描述,这里将不再继续进行赘述。
组件控制模块13,用于根据信号处理结果,在应用层控制终端系统层内的第二前置信号 处理策略中的第二优化组件的开关状态,或控制第一前置信号处理策略中的第一优化组件的开关状态;
其中,第一前置信号处理策略中开启的第一优化组件不同于在第二前置信号处理策略中开启的第二优化组件;第一前置信号处理策略中开启的第一优化组件与第二前置信号处理策略中关闭的语音优化组件具备同一优化功能,且第二前置信号处理策略中开启的第二优化组件与第一前置信号处理策略中关闭的第一优化组件具备同一优化功能;
其中,组件控制模块13包括:协同机制启动单元131,组件控制单元132,第一组件开启单元133和第二组件开启单元134;
协同机制启动单元131,用于根据信号处理结果启动应用层与业务应用所属终端的终端系统层之间的协同机制;
组件控制单元132,基于协同机制在应用层控制开启和关闭终端系统层内的第二前置信号处理策略中的第二优化组件;
第一组件开启单元133,用于在应用层中,将第二前置信号处理策略中关闭的第二优化组件作为第一协同组件,且在第一前置信号处理策略中开启与第一协同组件具有相同优化功能的第一优化组件;
第二组件开启单元134,用于在应用层中,将第二前置信号处理策略中开启的第二优化组件作为第二协同组件,且在第一前置信号处理策略中关闭与第二协同组件具有相同优化功能的第一优化组件。
其中,协同机制启动单元131,组件控制单元132,第一组件开启单元133和第二组件开启单元134的具体实现方式,可以参见上述图5所对应实施例中对步骤S102的描述,这里将不再继续进行赘述。
语音优化模块14,用于获取业务应用对应的第一用户在游戏语音模式下的上行语音数据,基于第一前置信号处理策略中开启的第一优化组件和第二前置信号处理策略中开启的第二优化组件,对游戏语音模式下的上行语音数据进行语音优化。
其中,第一前置信号处理策略中的第一优化组件至少包括:第一回声消除组件、第一噪声抑制组件和第一增益控制组件;第二前置信号处理策略中的第二优化组件至少包括:第二回声消除组件、第二噪声抑制组件和第二增益控制组件;第一回声消除组件和第二回声消除组件均用于进行回声消除;第一噪声抑制组件和第二噪声抑制组件均用于进行噪声抑制;第一增益控制组件和第二增益控制组件均用于进行增益调整。
可选的,资源包获取模块15,用于在第一用户访问业务应用时,获取用于加载业务应用的系统资源包,对系统资源包进行解析处理,得到业务应用的系统资源数据;
初始化模块16,用于对系统资源数据进行初始化处理,基于初始化处理后的系统资源数据将业务应用的业务模式初始配置为系统媒体模式。
应用界面输出模块17,用于基于初始化处理后的系统资源数据输出业务应用的应用显示界面;应用显示界面中包含用于指示第一用户发起语音交互业务的语音控件;
语音开启模块18,用于响应第一用户针对语音控件的语音开启操作,检测业务应用的应用类型;
可以理解的是,该语音开启模块18可以在检测到该业务应用的应用类型为游戏类型时,通知游戏模式切换模块11在检测到业务应用的应用类型为游戏类型时,生成与游戏类型相关联的第一语音通话指令,基于第一语音通话指令将业务应用的业务模式由系统媒体模式切换为游戏语音模式。
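Optionally, when the voice enabling module 18 detects that the application type of the service application is a non-game type (for example, a social type), it may notify the call mode switching module 19 to generate a second voice call instruction associated with the non-game type and, based on the second voice call instruction, switch the service mode of the service application from the system media mode to the system call mode.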
通话请求发送模块20,用于在基于系统通话模式将语音交互业务的通话类型确定为系统通话类型时,通过业务应用向第二用户发送系统通话类型对应的系统通话请求;第二用户为第一用户在业务应用中所选择的请求进行系统通话的用户;
通信信道建立模块21,用于在第二用户响应系统通话请求时,建立第一用户与第二用户之间的系统通信信道,基于系统通信信道进行系统通话。
可选的,目标结果确定模块22,用于将语音优化后的上行语音数据作为上行语音数据对应的目标语音优化结果;
目标结果发送模块23,用于将目标语音优化结果发送给与第一用户相关联的第三用户对应的终端,以使第三用户对应的终端在游戏语音模式下通过扬声器播放语音优化后的上行语音数据;可选地,第一用户与第三用户均为游戏语音模式下处于同一游戏阵营中的游戏用户。
可选的,语音关闭模块24,用于响应第一用户针对语音控件的语音关闭操作,将业务应用的业务模式由游戏语音模式切换回系统媒体模式。
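For specific implementations of the processing result acquisition module 12, the component control module 13, and the voice optimization module 14, refer to the description of step S101 to step S103 in the embodiment corresponding to FIG. 5 above; details are not repeated here. Further, for specific implementations of the resource package acquisition module 15, the initialization module 16, the application interface output module 17, the voice enabling module 18, the game mode switching module 11, the call mode switching module 19, the call request sending module 20, the communication channel establishment module 21, the target result determination module 22, the target result sending module 23, and the voice disabling module 24, refer to the description of step S201 to step S213 in the embodiment corresponding to FIG. 9 above; details are not repeated here. The beneficial effects of using the same method are likewise not repeated.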
本申请一示例性实施例还提供了一种音频数据处理装置,该装置用于执行如图12所示的方法实施例,该装置可以包括以下至少之一:处理结果获取模块和组件控制模块。
处理结果获取模块,用于在游戏语音模式下,获取与业务应用的应用层内的第一前置信号处理策略相关联的信号处理结果;其中,第一前置信号处理策略中包括至少一个第一优化组件。
组件控制模块,用于根据信号处理结果,在应用层控制终端系统层内的第二前置信号处理策略中的第二优化组件的开关状态;其中,第二前置信号处理策略中包括至少一个第二优化组件。
本申请一示例性实施例还提供了一种音频数据处理装置,该装置用于执行如图13所示的方法实施例,该装置可以包括以下至少之一:处理结果获取模块和组件控制模块。
处理结果获取模块,用于在游戏语音模式下,获取与业务应用的应用层内的第一前置信号处理策略相关联的信号处理结果;其中,第一前置信号处理策略中包括至少一个第一优化组件。
组件控制模块,用于根据信号处理结果,控制终端系统层内的第二前置信号处理策略中的第二优化组件的开关状态,或控制第一前置信号处理策略中的第一优化组件的开关状态;其中,第一前置信号处理策略中开启的第一优化组件,不同于在第二前置信号处理策略中开启的第二优化组件。
对于上述装置实施例中未详细说明的细节,可结合参考相对应的方法实施例。
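Further, refer to FIG. 15, which is a schematic structural diagram of a computer device according to an embodiment of this application. As shown in FIG. 15, the computer device 1000 may be a user terminal, and the user terminal may be the aforementioned target user terminal. In this case, the computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005; in addition, the computer device 1000 may include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), for example, at least one disk memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in FIG. 15, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.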
其中,该计算机设备1000中的网络接口1004还可以提供网络通讯功能,且可选用户接口1003还可以包括显示屏(Display)、键盘(Keyboard)。在图15所示的计算机设备1000中,网络接口1004可提供网络通讯功能;而用户接口1003主要用于为用户提供输入的接口;而处理器1001可以用于调用存储器1005中存储的设备控制应用程序,以执行前文图5或者图9或者图12或者图13所对应实施例或其他方法实施例中对该音频数据处理方法的描述,也可执行前文图14所对应实施例中对该音频数据处理装置1的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
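In addition, it should be noted that an embodiment of this application further provides a computer storage medium. The computer storage medium stores the computer program executed by the audio data processing apparatus 1 mentioned above, and the computer program includes program instructions. When a processor executes the program instructions, it can perform the audio data processing method described in the embodiments corresponding to FIG. 5, FIG. 9, FIG. 12, or FIG. 13 or in other method embodiments; details are therefore not repeated here. The beneficial effects of using the same method are likewise not repeated. For technical details not disclosed in the computer storage medium embodiment of this application, refer to the description of the method embodiments of this application.
It can be understood that an embodiment of this application further provides a computer program product or computer program. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the audio data processing method described in the embodiments corresponding to FIG. 5, FIG. 9, FIG. 12, or FIG. 13 or in other method embodiments; details are therefore not repeated here. The beneficial effects of using the same method are likewise not repeated. For technical details not disclosed in the computer program product or computer program embodiment of this application, refer to the description of the method embodiments of this application.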
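A person of ordinary skill in the art can understand that all or part of the procedures of the foregoing method embodiments may be completed by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is merely preferred embodiments of this application and certainly cannot be used to limit the scope of the claims of this application. Therefore, equivalent variations made according to the claims of this application still fall within the scope covered by this application.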

Claims (34)

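  1. An audio data processing method, performed by a computer device, the method comprising:
    in a game voice mode, acquiring a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, wherein the first pre-signal processing strategy includes at least one first optimization component;
    according to the signal processing result, controlling, in the application layer, an on/off state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, or controlling an on/off state of a first optimization component in the first pre-signal processing strategy, wherein the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy; and
    acquiring uplink voice data of a first user corresponding to the service application in the game voice mode, and performing voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.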
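  2. The method according to claim 1, wherein the controlling, in the application layer and according to the signal processing result, of the on/off state of the second optimization component in the second pre-signal processing strategy in the terminal system layer comprises:
    determining, according to the signal processing result, the second optimization component to be enabled in the second pre-signal processing strategy and the second optimization component to be disabled in the second pre-signal processing strategy;
    in the application layer, using the second optimization component disabled in the second pre-signal processing strategy as a first cooperative component, and enabling, in the first pre-signal processing strategy, the first optimization component having the same optimization function as the first cooperative component; and
    in the application layer, using the second optimization component enabled in the second pre-signal processing strategy as a second cooperative component, and disabling, in the first pre-signal processing strategy, the first optimization component having the same optimization function as the second cooperative component.
  3. The method according to claim 1, wherein the first optimization component in the first pre-signal processing strategy includes at least one of the following: a first echo cancellation component, a first noise suppression component, and a first gain control component; the second optimization component in the second pre-signal processing strategy includes at least one of the following: a second echo cancellation component, a second noise suppression component, and a second gain control component; the first echo cancellation component and the second echo cancellation component are both used for echo cancellation; the first noise suppression component and the second noise suppression component are both used for noise suppression; and the first gain control component and the second gain control component are both used for gain adjustment.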
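  4. The method according to claim 1, wherein the acquiring, in the game voice mode, of the signal processing result associated with the first pre-signal processing strategy in the application layer of the service application comprises:
    in the game voice mode, configuring a sound quality parameter of the service application according to a sound quality index of the service application;
    acquiring a terminal type of the terminal to which the service application belongs, and searching a test list associated with the service application for a test type matching the terminal type;
    if a test type matching the terminal type is found in the test list, acquiring from the test list, based on the sound quality parameter, a first test processing result obtained by using the first pre-signal processing strategy, and acquiring a second test processing result obtained by using the second pre-signal processing strategy; and
    determining, based on the first test processing result and the second test processing result, an optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and using the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.
  5. The method according to claim 4, wherein the voice optimization algorithm of the first optimization component in the first pre-signal processing strategy includes at least one of the following: a first echo cancellation algorithm for performing echo cancellation in the application layer, a first noise suppression algorithm for performing noise suppression in the application layer, and a first gain control algorithm for performing gain adjustment in the application layer; and the voice optimization algorithm of the second optimization component in the second pre-signal processing strategy includes at least one of the following: a second echo cancellation algorithm for performing echo cancellation in the terminal system layer, a second noise suppression algorithm for performing noise suppression in the terminal system layer, and a second gain control algorithm for performing gain adjustment in the terminal system layer.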
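  6. The method according to claim 5, wherein the determining, based on the first test processing result and the second test processing result, of the optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and the using of the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy, comprise:
    acquiring, from the first test processing result, a first echo cancellation result corresponding to the first echo cancellation algorithm, acquiring, from the second test processing result, a second echo cancellation result corresponding to the second echo cancellation algorithm, selecting an optimal echo cancellation algorithm from the first echo cancellation algorithm and the second echo cancellation algorithm based on the first echo cancellation result and the second echo cancellation result, and using the optimal echo cancellation algorithm as a first optimal signal processing strategy associated with the sound quality parameter;
    acquiring, from the first test processing result, a first noise suppression result corresponding to the first noise suppression algorithm, acquiring, from the second test processing result, a second noise suppression result corresponding to the second noise suppression algorithm, selecting an optimal noise suppression algorithm from the first noise suppression algorithm and the second noise suppression algorithm based on the first noise suppression result and the second noise suppression result, and using the optimal noise suppression algorithm as a second optimal signal processing strategy associated with the sound quality parameter;
    acquiring, from the first test processing result, a first gain control result corresponding to the first gain control algorithm, acquiring, from the second test processing result, a second gain control result corresponding to the second gain control algorithm, selecting an optimal gain control algorithm from the first gain control algorithm and the second gain control algorithm based on the first gain control result and the second gain control result, and using the optimal gain control algorithm as a third optimal signal processing strategy associated with the sound quality parameter; and
    determining the first optimal signal processing strategy, the second optimal signal processing strategy, and the third optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.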
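  7. The method according to claim 6, wherein the acquiring of the first echo cancellation result and the second echo cancellation result, the selecting of the optimal echo cancellation algorithm from the first echo cancellation algorithm and the second echo cancellation algorithm based on the first echo cancellation result and the second echo cancellation result, and the using of the optimal echo cancellation algorithm as the first optimal signal processing strategy associated with the sound quality parameter, comprise:
    acquiring, from the first test processing result, the first echo cancellation result corresponding to the first echo cancellation algorithm, and acquiring, from the second test processing result, the second echo cancellation result corresponding to the second echo cancellation algorithm;
    comparing the optimization quality corresponding to the first echo cancellation result with the optimization quality corresponding to the second echo cancellation result to obtain a first comparison result;
    if the first comparison result indicates that the optimization quality corresponding to the first echo cancellation result is better than the optimization quality corresponding to the second echo cancellation result, using the first echo cancellation algorithm in the first pre-signal processing strategy as the first optimal signal processing strategy associated with the sound quality parameter; and
    if the first comparison result indicates that the optimization quality corresponding to the second echo cancellation result is better than the optimization quality corresponding to the first echo cancellation result, using the second echo cancellation algorithm in the second pre-signal processing strategy as the first optimal signal processing strategy associated with the sound quality parameter.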
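  8. The method according to claim 6, wherein the acquiring of the first noise suppression result and the second noise suppression result, the selecting of the optimal noise suppression algorithm from the first noise suppression algorithm and the second noise suppression algorithm based on the first noise suppression result and the second noise suppression result, and the using of the optimal noise suppression algorithm as the second optimal signal processing strategy associated with the sound quality parameter, comprise:
    acquiring, from the first test processing result, the first noise suppression result corresponding to the first noise suppression algorithm, and acquiring, from the second test processing result, the second noise suppression result corresponding to the second noise suppression algorithm;
    comparing the optimization quality corresponding to the first noise suppression result with the optimization quality corresponding to the second noise suppression result to obtain a second comparison result;
    if the second comparison result indicates that the optimization quality corresponding to the first noise suppression result is better than the optimization quality corresponding to the second noise suppression result, using the first noise suppression algorithm in the first pre-signal processing strategy as the second optimal signal processing strategy associated with the sound quality parameter; and
    if the second comparison result indicates that the optimization quality corresponding to the second noise suppression result is better than the optimization quality corresponding to the first noise suppression result, using the second noise suppression algorithm in the second pre-signal processing strategy as the second optimal signal processing strategy associated with the sound quality parameter.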
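  9. The method according to claim 6, wherein the acquiring of the first gain control result and the second gain control result, the selecting of the optimal gain control algorithm from the first gain control algorithm and the second gain control algorithm based on the first gain control result and the second gain control result, and the using of the optimal gain control algorithm as the third optimal signal processing strategy associated with the sound quality parameter, comprise:
    acquiring, from the first test processing result, the first gain control result corresponding to the first gain control algorithm, and acquiring, from the second test processing result, the second gain control result corresponding to the second gain control algorithm;
    comparing the optimization quality corresponding to the first gain control result with the optimization quality corresponding to the second gain control result to obtain a third comparison result;
    if the third comparison result indicates that the optimization quality corresponding to the first gain control result is better than the optimization quality corresponding to the second gain control result, using the first gain control algorithm in the first pre-signal processing strategy as the third optimal signal processing strategy associated with the sound quality parameter; and
    if the third comparison result indicates that the optimization quality corresponding to the second gain control result is better than the optimization quality corresponding to the first gain control result, using the second gain control algorithm in the second pre-signal processing strategy as the third optimal signal processing strategy associated with the sound quality parameter.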
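  10. The method according to claim 4, wherein the method further comprises:
    if no test type matching the terminal type is found in the test list, then, when the uplink voice data of the first user is acquired through a microphone in the game voice mode, performing voice optimization on the uplink voice data by using the first pre-signal processing strategy to obtain a first voice optimization result, and performing voice optimization on the uplink voice data by using the second pre-signal processing strategy to obtain a second voice optimization result; and
    determining, based on the first voice optimization result and the second voice optimization result, an optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and using the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.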
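  11. The method according to claim 1, wherein the method further comprises:
    when the first user accesses the service application, acquiring a system resource package for loading the service application, and parsing the system resource package to obtain system resource data of the service application; and
    initializing the system resource data, and initially configuring the service mode of the service application as a system media mode based on the initialized system resource data.
  12. The method according to claim 11, wherein the method further comprises:
    outputting an application display interface of the service application based on the initialized system resource data, the application display interface containing a voice control for instructing the first user to initiate a voice interaction service;
    detecting the application type of the service application in response to a voice enabling operation performed by the first user on the voice control; and
    when it is detected that the application type of the service application is a game type, switching the service mode of the service application from the system media mode to the game voice mode.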
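  13. The method according to claim 12, wherein the method further comprises:
    when it is detected that the application type of the service application is a non-game type, switching the service mode of the service application from the system media mode to a system call mode;
    when the call type of the voice interaction service is determined to be a system call type based on the system call mode, sending a system call request corresponding to the system call type to a second user through the service application, the second user being the user selected by the first user in the service application for the system call; and
    when the second user responds to the system call request, establishing a system communication channel between the first user and the second user, and conducting the system call over the system communication channel.
  14. The method according to claim 12, wherein the method further comprises:
    in response to a voice disabling operation performed by the first user on the voice control, switching the service mode of the service application from the game voice mode back to the system media mode.
  15. The method according to any one of claims 1 to 14, wherein the method further comprises:
    using the voice-optimized uplink voice data as a target voice optimization result corresponding to the uplink voice data; and
    sending the target voice optimization result to a terminal corresponding to a third user associated with the first user, so that the terminal corresponding to the third user plays the voice-optimized uplink voice data through a speaker in the game voice mode.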
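  16. An audio data processing method, performed by a computer device, the method comprising:
    in a game voice mode, acquiring a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, wherein the first pre-signal processing strategy includes at least one first optimization component; and
    according to the signal processing result, controlling, in the application layer, an on/off state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, wherein the second pre-signal processing strategy includes at least one second optimization component.
  17. The method according to claim 16, wherein the controlling, in the application layer and according to the signal processing result, of the on/off state of the second optimization component in the second pre-signal processing strategy in the terminal system layer comprises:
    determining, according to the signal processing result, the second optimization component to be enabled in the second pre-signal processing strategy and the second optimization component to be disabled in the second pre-signal processing strategy;
    in the application layer, using the second optimization component disabled in the second pre-signal processing strategy as a first cooperative component, and enabling, in the first pre-signal processing strategy, the first optimization component having the same optimization function as the first cooperative component; and
    in the application layer, using the second optimization component enabled in the second pre-signal processing strategy as a second cooperative component, and disabling, in the first pre-signal processing strategy, the first optimization component having the same optimization function as the second cooperative component.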
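  18. The method according to claim 16, wherein the acquiring, in the game voice mode, of the signal processing result associated with the first pre-signal processing strategy in the application layer of the service application comprises:
    acquiring a terminal type of the terminal to which the service application belongs, and searching a test list associated with the service application for a test type matching the terminal type;
    if a test type matching the terminal type is found in the test list, acquiring from the test list, based on the sound quality parameter, a first test processing result obtained by using the first pre-signal processing strategy, and acquiring a second test processing result obtained by using the second pre-signal processing strategy; and
    determining, based on the first test processing result and the second test processing result, an optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and using the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.
  19. The method according to claim 18, wherein the voice optimization algorithm of the first optimization component in the first pre-signal processing strategy includes at least one of the following: a first echo cancellation algorithm for performing echo cancellation in the application layer, a first noise suppression algorithm for performing noise suppression in the application layer, and a first gain control algorithm for performing gain adjustment in the application layer; and the voice optimization algorithm of the second optimization component in the second pre-signal processing strategy includes at least one of the following: a second echo cancellation algorithm for performing echo cancellation in the terminal system layer, a second noise suppression algorithm for performing noise suppression in the terminal system layer, and a second gain control algorithm for performing gain adjustment in the terminal system layer.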
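  20. The method according to claim 19, wherein the determining, based on the first test processing result and the second test processing result, of the optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and the using of the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy, comprise:
    acquiring, from the first test processing result, a first echo cancellation result corresponding to the first echo cancellation algorithm, acquiring, from the second test processing result, a second echo cancellation result corresponding to the second echo cancellation algorithm, selecting an optimal echo cancellation algorithm from the first echo cancellation algorithm and the second echo cancellation algorithm based on the first echo cancellation result and the second echo cancellation result, and using the optimal echo cancellation algorithm as a first optimal signal processing strategy associated with the sound quality parameter;
    acquiring, from the first test processing result, a first noise suppression result corresponding to the first noise suppression algorithm, acquiring, from the second test processing result, a second noise suppression result corresponding to the second noise suppression algorithm, selecting an optimal noise suppression algorithm from the first noise suppression algorithm and the second noise suppression algorithm based on the first noise suppression result and the second noise suppression result, and using the optimal noise suppression algorithm as a second optimal signal processing strategy associated with the sound quality parameter;
    acquiring, from the first test processing result, a first gain control result corresponding to the first gain control algorithm, acquiring, from the second test processing result, a second gain control result corresponding to the second gain control algorithm, selecting an optimal gain control algorithm from the first gain control algorithm and the second gain control algorithm based on the first gain control result and the second gain control result, and using the optimal gain control algorithm as a third optimal signal processing strategy associated with the sound quality parameter; and
    determining the first optimal signal processing strategy, the second optimal signal processing strategy, and the third optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.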
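  21. The method according to claim 18, wherein the method further comprises:
    if no test type matching the terminal type is found in the test list, then, when the uplink voice data of the first user is acquired through a microphone in the game voice mode, performing voice optimization on the uplink voice data by using the first pre-signal processing strategy to obtain a first voice optimization result, and performing voice optimization on the uplink voice data by using the second pre-signal processing strategy to obtain a second voice optimization result; and
    determining, based on the first voice optimization result and the second voice optimization result, an optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and using the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.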
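  22. An audio data processing method, performed by a computer device, the method comprising:
    in a game voice mode, acquiring a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, wherein the first pre-signal processing strategy includes at least one first optimization component; and
    according to the signal processing result, controlling an on/off state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, or controlling an on/off state of a first optimization component in the first pre-signal processing strategy, wherein the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.
  23. The method according to claim 22, wherein the controlling, according to the signal processing result, of the on/off state of the second optimization component in the second pre-signal processing strategy in the terminal system layer, or of the on/off state of the first optimization component in the first pre-signal processing strategy, comprises:
    determining, according to the signal processing result, the second optimization component to be enabled in the second pre-signal processing strategy and the second optimization component to be disabled in the second pre-signal processing strategy;
    disabling the second optimization component to be disabled in the second pre-signal processing strategy, and enabling, in the first pre-signal processing strategy, the first optimization component having the same optimization function as the disabled second optimization component; and
    enabling the second optimization component to be enabled in the second pre-signal processing strategy, and disabling, in the first pre-signal processing strategy, the first optimization component having the same optimization function as the enabled second optimization component.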
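  24. The method according to claim 22, wherein the acquiring, in the game voice mode, of the signal processing result associated with the first pre-signal processing strategy in the application layer of the service application comprises:
    acquiring a terminal type of the terminal to which the service application belongs, and searching a test list associated with the service application for a test type matching the terminal type;
    if a test type matching the terminal type is found in the test list, acquiring from the test list, based on the sound quality parameter, a first test processing result obtained by using the first pre-signal processing strategy, and acquiring a second test processing result obtained by using the second pre-signal processing strategy; and
    determining, based on the first test processing result and the second test processing result, an optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and using the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.
  25. The method according to claim 24, wherein the voice optimization algorithm of the first optimization component in the first pre-signal processing strategy includes at least one of the following: a first echo cancellation algorithm for performing echo cancellation in the application layer, a first noise suppression algorithm for performing noise suppression in the application layer, and a first gain control algorithm for performing gain adjustment in the application layer; and the voice optimization algorithm of the second optimization component in the second pre-signal processing strategy includes at least one of the following: a second echo cancellation algorithm for performing echo cancellation in the terminal system layer, a second noise suppression algorithm for performing noise suppression in the terminal system layer, and a second gain control algorithm for performing gain adjustment in the terminal system layer.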
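  26. The method according to claim 25, wherein the determining, based on the first test processing result and the second test processing result, of the optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and the using of the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy, comprise:
    acquiring, from the first test processing result, a first echo cancellation result corresponding to the first echo cancellation algorithm, acquiring, from the second test processing result, a second echo cancellation result corresponding to the second echo cancellation algorithm, selecting an optimal echo cancellation algorithm from the first echo cancellation algorithm and the second echo cancellation algorithm based on the first echo cancellation result and the second echo cancellation result, and using the optimal echo cancellation algorithm as a first optimal signal processing strategy associated with the sound quality parameter;
    acquiring, from the first test processing result, a first noise suppression result corresponding to the first noise suppression algorithm, acquiring, from the second test processing result, a second noise suppression result corresponding to the second noise suppression algorithm, selecting an optimal noise suppression algorithm from the first noise suppression algorithm and the second noise suppression algorithm based on the first noise suppression result and the second noise suppression result, and using the optimal noise suppression algorithm as a second optimal signal processing strategy associated with the sound quality parameter;
    acquiring, from the first test processing result, a first gain control result corresponding to the first gain control algorithm, acquiring, from the second test processing result, a second gain control result corresponding to the second gain control algorithm, selecting an optimal gain control algorithm from the first gain control algorithm and the second gain control algorithm based on the first gain control result and the second gain control result, and using the optimal gain control algorithm as a third optimal signal processing strategy associated with the sound quality parameter; and
    determining the first optimal signal processing strategy, the second optimal signal processing strategy, and the third optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.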
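  27. The method according to claim 24, wherein the method further comprises:
    if no test type matching the terminal type is found in the test list, then, when the uplink voice data of the first user is acquired through a microphone in the game voice mode, performing voice optimization on the uplink voice data by using the first pre-signal processing strategy to obtain a first voice optimization result, and performing voice optimization on the uplink voice data by using the second pre-signal processing strategy to obtain a second voice optimization result; and
    determining, based on the first voice optimization result and the second voice optimization result, an optimal signal processing strategy associated with the sound quality parameter from the first pre-signal processing strategy and the second pre-signal processing strategy, and using the optimal signal processing strategy as the signal processing result associated with the first pre-signal processing strategy.
  28. The method according to any one of claims 22 to 27, wherein the method further comprises:
    acquiring uplink voice data of a first user corresponding to the service application in the game voice mode, and performing voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.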
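  29. An audio data processing apparatus, the apparatus comprising:
    a processing result acquiring module, configured to acquire, in a game voice mode, a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, wherein the first pre-signal processing strategy includes at least one first optimization component;
    a component control module, configured to, according to the signal processing result, control, in the application layer, an on/off state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, or control an on/off state of a first optimization component in the first pre-signal processing strategy, wherein the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy; and
    a voice optimization module, configured to acquire uplink voice data of a first user corresponding to the service application in the game voice mode, and perform voice optimization on the uplink voice data in the game voice mode based on the first optimization component enabled in the first pre-signal processing strategy and the second optimization component enabled in the second pre-signal processing strategy.
  30. An audio data processing apparatus, the apparatus comprising:
    a processing result acquiring module, configured to acquire, in a game voice mode, a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, wherein the first pre-signal processing strategy includes at least one first optimization component; and
    a component control module, configured to control, in the application layer and according to the signal processing result, an on/off state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, wherein the second pre-signal processing strategy includes at least one second optimization component.
  31. An audio data processing apparatus, the apparatus comprising:
    a processing result acquiring module, configured to acquire, in a game voice mode, a signal processing result associated with a first pre-signal processing strategy in an application layer of a service application, wherein the first pre-signal processing strategy includes at least one first optimization component; and
    a component control module, configured to, according to the signal processing result, control an on/off state of a second optimization component in a second pre-signal processing strategy in a terminal system layer, or control an on/off state of a first optimization component in the first pre-signal processing strategy, wherein the first optimization component enabled in the first pre-signal processing strategy is different from the second optimization component enabled in the second pre-signal processing strategy.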
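  32. A computer device, comprising: a processor and a memory;
    the processor being connected to the memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program so that the computer device performs the method according to any one of claims 1 to 15, the method according to any one of claims 16 to 21, or the method according to any one of claims 22 to 28.
  33. A computer-readable storage medium storing a computer program, the computer program being adapted to be loaded and executed by a processor so that a computer device having the processor performs the method according to any one of claims 1 to 15, the method according to any one of claims 16 to 21, or the method according to any one of claims 22 to 28.
  34. A computer program product or computer program, comprising computer instructions stored in a computer-readable storage medium, wherein a processor reads the computer instructions from the computer-readable storage medium and executes them to implement the method according to any one of claims 1 to 15, the method according to any one of claims 16 to 21, or the method according to any one of claims 22 to 28.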
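The selection logic recited in claims 4 to 10 (look up the terminal type in a test list, compare the per-function results of the two strategies, and fall back to processing live uplink voice data when the terminal type is not listed) can be sketched as follows. This is a hedged illustration, not the claimed implementation: the numeric quality scores, the "higher is better" convention, the tie-breaking rule, and the names `select_optimal`, `signal_processing_result`, and `measure_live` are all assumptions introduced here.

```python
from typing import Callable, Dict, Tuple

FUNCTIONS = ("echo_cancellation", "noise_suppression", "gain_control")
Scores = Dict[str, float]          # optimization quality per function (assumed numeric)
TestEntry = Tuple[Scores, Scores]  # (first/application-layer, second/system-layer)


def select_optimal(first: Scores, second: Scores) -> Dict[str, str]:
    """Pick, per function, the layer whose result has the better quality.

    Ties go to the terminal system layer; the claims do not specify
    tie-breaking, so this rule is an assumption.
    """
    return {
        f: "application" if first[f] > second[f] else "system"
        for f in FUNCTIONS
    }


def signal_processing_result(
    terminal_type: str,
    test_list: Dict[str, TestEntry],
    measure_live: Callable[[], TestEntry],
) -> Dict[str, str]:
    entry = test_list.get(terminal_type)
    if entry is None:
        # Terminal type not in the test list: run both strategies on live
        # uplink voice data and compare those results instead.
        entry = measure_live()
    first, second = entry
    return select_optimal(first, second)
```

The returned mapping plays the role of the signal processing result: it names, for each optimization function, which layer's component should be enabled.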
PCT/CN2021/131404 2021-01-22 2021-11-18 音频数据处理方法、装置、设备、存储介质及程序产品 Ceased WO2022156336A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2023544240A JP7597300B2 (ja) 2021-01-22 2021-11-18 音声データ処理方法と装置及びコンピュータ機器とプログラム
KR1020237027570A KR20230130730A (ko) 2021-01-22 2021-11-18 오디오 데이터 처리 방법 및 장치, 디바이스, 저장매체, 그리고 프로그램 제품
EP21920712.3A EP4283617A4 (en) 2021-01-22 2021-11-18 AUDIO DATA PROCESSING METHOD AND DEVICE, DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT
US17/991,239 US12477069B2 (en) 2021-01-22 2022-11-21 Audio data processing method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110088769.3A CN114822570B (zh) 2021-01-22 2021-01-22 一种音频数据处理方法、装置、设备及可读存储介质
CN202110088769.3 2021-01-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/991,239 Continuation US12477069B2 (en) 2021-01-22 2022-11-21 Audio data processing method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022156336A1 true WO2022156336A1 (zh) 2022-07-28

Family

ID=82524619

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131404 Ceased WO2022156336A1 (zh) 2021-01-22 2021-11-18 音频数据处理方法、装置、设备、存储介质及程序产品

Country Status (6)

Country Link
US (1) US12477069B2 (zh)
EP (1) EP4283617A4 (zh)
JP (1) JP7597300B2 (zh)
KR (1) KR20230130730A (zh)
CN (1) CN114822570B (zh)
WO (1) WO2022156336A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114822570B (zh) * 2021-01-22 2023-02-14 腾讯科技(深圳)有限公司 一种音频数据处理方法、装置、设备及可读存储介质
CN115430156B (zh) * 2022-08-16 2024-10-18 中国联合网络通信集团有限公司 游戏期间的呼叫方法、呼叫装置及主叫用户终端
GB2636385A (en) * 2023-12-11 2025-06-18 Sony Interactive Entertainment Europe Ltd Method for adjusting an audio mix of a video game
CN121771168A (zh) * 2024-09-29 2026-03-31 华为技术有限公司 一种处理多媒体数据的方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090204922A1 (en) * 2008-02-13 2009-08-13 Microsoft Corporation Techniques to manage communications resources for a multimedia conference event
CN108762926A (zh) * 2018-05-29 2018-11-06 努比亚技术有限公司 一种系统优化方法、终端及计算机可读存储介质
CN108854062A (zh) * 2018-06-24 2018-11-23 广州银汉科技有限公司 一种移动游戏的语音聊天模块
CN109165091A (zh) * 2018-07-03 2019-01-08 南昌黑鲨科技有限公司 一种优化应用运行质量的方法、移动终端及存储介质
CN109343902A (zh) * 2018-09-26 2019-02-15 Oppo广东移动通信有限公司 音频处理组件的运行方法、装置、终端及存储介质
CN110704191A (zh) * 2019-09-29 2020-01-17 Oppo广东移动通信有限公司 一种游戏优化方法、游戏优化装置及移动终端

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3425548B2 (ja) * 2000-02-14 2003-07-14 コナミ株式会社 ビデオゲーム装置、ビデオゲームにおけるアナウンス音声出力方法及びアナウンス音声出力プログラムが記録されたコンピュータ読み取り可能な記録媒体
US7142335B2 (en) * 2002-07-22 2006-11-28 Eastman Kodak Company Method and apparatus for transparency scanning with a duplex reflective scanner
US7102615B2 (en) * 2002-07-27 2006-09-05 Sony Computer Entertainment Inc. Man-machine interface using a deformable device
US20100098266A1 (en) * 2007-06-01 2010-04-22 Ikoa Corporation Multi-channel audio device
JP5436793B2 (ja) * 2008-04-04 2014-03-05 株式会社バンダイナムコゲームス ゲーム動画配信システム
WO2009132270A1 (en) * 2008-04-25 2009-10-29 Andrea Electronics Corporation Headset with integrated stereo array microphone
US20120310652A1 (en) * 2009-06-01 2012-12-06 O'sullivan Daniel Adaptive Human Computer Interface (AAHCI)
JP2012238964A (ja) * 2011-05-10 2012-12-06 Funai Electric Co Ltd 音分離装置、及び、それを備えたカメラユニット
KR20130106462A (ko) 2012-03-19 2013-09-30 엔에이치엔엔터테인먼트 주식회사 움직임 센서를 이용하여 모바일 단말에서 음성 입력을 제어하기 위한 장치, 방법 및 컴퓨터 판독 가능한 기록매체
CN103617797A (zh) * 2013-12-09 2014-03-05 腾讯科技(深圳)有限公司 一种语音处理方法,及装置
US10034088B2 (en) * 2014-11-11 2018-07-24 Sony Corporation Sound processing device and sound processing method
CN106920559B (zh) * 2017-03-02 2020-10-30 奇酷互联网络科技(深圳)有限公司 通话音的优化方法、装置及通话终端
CN107610698A (zh) * 2017-08-28 2018-01-19 深圳市金立通信设备有限公司 一种实现语音控制的方法、机器人及计算机可读存储介质
CN107920176A (zh) * 2017-11-19 2018-04-17 天津光电安辰信息技术股份有限公司 一种用于语音通信系统的音质优化装置
CN107966910B (zh) 2017-11-30 2021-08-03 深圳Tcl新技术有限公司 语音处理方法、智能音箱及可读存储介质
CN108762607A (zh) * 2018-04-28 2018-11-06 努比亚技术有限公司 一种游戏交流方法、终端及计算机可读存储介质
CN110176244B (zh) * 2018-06-19 2023-10-03 腾讯科技(深圳)有限公司 回声消除方法、装置、存储介质和计算机设备
CN109147784B (zh) * 2018-09-10 2021-06-08 百度在线网络技术(北京)有限公司 语音交互方法、设备以及存储介质
CN109065065A (zh) * 2018-09-27 2018-12-21 南昌努比亚技术有限公司 通话方法、移动终端及计算机可读存储介质
CN110996153B (zh) * 2019-12-06 2021-09-24 深圳创维-Rgb电子有限公司 基于场景识别的音画品质增强方法、系统和显示器
CN113836345B (zh) * 2020-06-23 2026-02-24 索尼公司 信息处理设备、信息处理方法以及计算机可读存储介质
CN111739549B (zh) * 2020-08-17 2020-12-08 北京灵伴即时智能科技有限公司 声音优化方法及声音优化系统
CN111933184B (zh) * 2020-09-29 2021-01-08 平安科技(深圳)有限公司 一种语音信号处理方法、装置、电子设备和存储介质
CN114822570B (zh) * 2021-01-22 2023-02-14 腾讯科技(深圳)有限公司 一种音频数据处理方法、装置、设备及可读存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4283617A4

Also Published As

Publication number Publication date
KR20230130730A (ko) 2023-09-12
EP4283617A4 (en) 2024-07-03
US12477069B2 (en) 2025-11-18
JP7597300B2 (ja) 2024-12-10
JP2024510367A (ja) 2024-03-07
CN114822570A (zh) 2022-07-29
EP4283617A1 (en) 2023-11-29
US20230146871A1 (en) 2023-05-11
CN114822570B (zh) 2023-02-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21920712; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2023544240; Country of ref document: JP)
ENP Entry into the national phase (Ref document number: 20237027570; Country of ref document: KR; Kind code of ref document: A)
WWE Wipo information: entry into national phase (Ref document number: 1020237027570; Country of ref document: KR)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2021920712; Country of ref document: EP; Effective date: 20230822)
WWR Wipo information: refused in national office (Ref document number: 1020237027570; Country of ref document: KR)
WWC Wipo information: continuation of processing after refusal or withdrawal (Ref document number: 1020237027570; Country of ref document: KR)