CN113473305B

CN113473305B - Audio processing device

Info

Publication number: CN113473305B
Application number: CN202010244320.7A
Authority: CN
Inventors: 王英剑
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2026-03-17
Anticipated expiration: 2040-03-31
Also published as: CN113473305A

Abstract

This application discloses an audio processing device that includes a first housing, in which a speaker unit and a first audio acquisition module adjacent to the speaker unit are disposed; a second housing connected to the first housing; and a second audio acquisition module disposed on the inner surface of the second housing and adjacent to its end. Because the first audio acquisition module is placed near the speaker unit in the first housing and the second audio acquisition module is placed on the inner surface of the second housing connected to the first housing, ambient noise and the user's voice audio can be acquired from different locations. Noise interference on the voice audio can therefore be eliminated based on the mixed audio, containing different audio components, acquired by the first and second audio acquisition modules, without the need for complex speech recognition algorithms.

Description

Audio processing device

Technical Field

The present application relates to the field of audio processing technologies, and in particular, to an audio processing apparatus.

Background

In recent voice control technology, a user's voice instructions are picked up by the microphone of an earphone, and the resulting audio signal is processed by a processor in the earphone or in a terminal connected to the earphone. However, the microphone of the earphone or of the terminal is located at some distance from the user's mouth. Wireless earphones in particular are becoming smaller and smaller, so that the main body of the earphone sits near the ear, and sound emitted forward from the mouth must travel an indirect path before it reaches the microphone. As a result, a large amount of environmental sound, particularly environmental noise, is inevitably mixed into the audio received by the microphone.

Audio data containing a large amount of environmental noise makes subsequent audio recognition very difficult. In the prior art, noise is usually identified in the received mixed audio data by a software algorithm, but because environmental noise is complex, the accuracy of such recognition schemes is low.

Disclosure of Invention

The embodiment of the application provides an audio processing device, which improves the recognition accuracy of voice data in acquired audio data under the influence of noise.

To achieve the above object, an embodiment of the present application provides an audio processing apparatus, including:

a first housing, in which a speaker unit and a first audio acquisition module adjacent to the speaker unit are arranged;

a second housing connected to the first housing; and

at least one second audio acquisition module arranged on the inner surface of the second housing and adjacent to the end of the second housing.

According to the audio processing device provided by the embodiment of the application, the first audio acquisition module is arranged near the speaker unit in the first housing, and the second audio acquisition module is arranged on the inner surface of the second housing connected to the first housing. Environmental noise and the user's voice audio can therefore be acquired from different positions, and the influence and interference of noise on the voice audio can be eliminated based on the mixed audio, containing different audio components, acquired by the first audio acquisition module and the second audio acquisition module, without a complex voice recognition algorithm.

The foregoing is only an overview of the technical solution of the present application. Specific embodiments are set out below so that the technical means of the application can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the application are more readily apparent.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

Fig. 1 is a schematic view of an application scenario of an audio processing apparatus according to an embodiment of the present application;

Fig. 2 is a schematic diagram of an audio processing apparatus according to the present application;

Fig. 3 is an exploded schematic view of an audio processing device according to the present application.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As the pace of life increases, people's time becomes fragmented, and people are increasingly accustomed to doing several things at once to make full use of that fragmented time. For example, while commuting, a person may use headphones to listen to music or educational audio, or use a phone screen to watch video content. Users with long commutes spend a relatively large amount of time on the road and may often need to use the headset of a mobile terminal to make calls while travelling. However, when a user listens to audio or talks through headphones outdoors or on public transportation such as a bus or subway, the surroundings often contain continuous environmental noise, such as the conversation of nearby passengers, the running noise of vehicles such as automobiles or subway trains, or wind noise while walking. This noise can severely interfere with the user's ability to hear the audio. During a call in particular, the sound emitted from the user's mouth must travel through the air to the microphone of the headset, so the audio signal received by the microphone may be a mixture of the user's voice and noise. This severely degrades the quality of the voice collected by the headset and may prevent the other party from hearing what the speaker says.

With the development of voice control technology in recent years, a user can also use the microphone of the headset outside of calls to give voice instructions. The voice instruction is collected by the microphone of the headset and transmitted to the mobile terminal to which the headset is connected, which performs the operation that the instruction represents. For the mobile terminal executing the instruction, the audio collected by the microphone must be of high quality so that it is sufficiently clear. However, as described above, the user's earphone is located at a distance from the user's mouth. Wireless earphones, which have recently become popular with more users because they are small and portable, are even farther from the mouth, since the body of the earphone typically sits near the ear. The microphone of such an earphone therefore tends to pick up more ambient noise, which severely affects the user's ability to make calls and give voice commands. In particular, when a specific phrase must first be uttered to wake up a function through the earphone, it is even harder to ensure that the mobile terminal or the earphone recognizes that phrase accurately under the influence of environmental noise.

Fig. 1 is a schematic view of an application scenario of an audio processing apparatus according to an embodiment of the present application. As shown in fig. 1, a user may use headphones in various outdoor environments, for example while riding or running. When the user wants to give a voice instruction to the headset or to the terminal connected to the headset, the user can simply speak the desired instruction, such as "hello, heaven fairy", while wearing the headset. The voice instruction leaves the user's mouth as a sound wave and propagates through the air. Because the user is outdoors, various environmental sounds may be present. As shown in fig. 1, wind noise, automobile engine noise, the track impact noise of subway trains, and so on may exist around the user. The sound wave of the user's voice may therefore be superimposed on and mixed with these external sounds while propagating to the microphone of the earphone, so that the microphone receives mixed audio. The audio received by the microphone may then be decoded and recognized by the earphone or by a terminal connected to the earphone in order to obtain the user's audio data contained in it, i.e., the voice instruction.

However, in the presence of such environmental noise, it is very difficult to recognize voice audio data in the mixed audio data by means of a software algorithm alone. In the embodiment of the present application shown in fig. 2 and 3, the audio processing apparatus includes a plurality of audio collection modules, which may be disposed near the speaker unit and at the end of the housing, respectively. In a scenario such as that shown in fig. 1, the speaker portion of the audio processing device, used as a headset, is typically inserted into or held against the user's ear. The audio collection module located near the speaker can therefore directly collect signals conducted through the bones and muscles near the ear, while the audio collection module located at the end of the housing collects voice audio and ambient noise conducted through the air. Audio data with different audio components can thus be collected at these different locations and processed together. For example, because the audio collection module near the speaker is inserted into or held close to the user's ear, the audio it collects contains less ambient noise, but the user's voice in it is also lower in intensity and lower in frequency. In contrast, a microphone located at the end of the housing, outside the ear, collects the user's voice at high intensity, but the ambient noise mixed into it is also strong. These recordings with different audio components may be processed in a processor to obtain the user's voice audio data.
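
The application does not specify how the two kinds of recordings are combined. As a minimal sketch of one possibility, assuming the two channels are time-aligned and split into frames, the low-noise in-ear channel can act as a gate that decides which frames of the louder outer channel are kept. The function names and the threshold value are hypothetical and do not come from the application.

```python
import math

def frame_rms(samples):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def gate_outer_by_inner(inner_frames, outer_frames, threshold=0.05):
    """Keep outer-microphone frames only while the in-ear channel shows speech.

    inner_frames / outer_frames: lists of frames (lists of floats), assumed
    to be time-aligned. threshold is a hypothetical tuning value.
    """
    gated = []
    for inner, outer in zip(inner_frames, outer_frames):
        if frame_rms(inner) >= threshold:
            gated.append(list(outer))          # wearer is speaking: pass through
        else:
            gated.append([0.0] * len(outer))   # noise only: mute the frame
    return gated
```

In this sketch the in-ear channel is never played back or recognized itself; it only marks when the wearer is speaking, which matches the observation that it carries little noise but also a weak, low-frequency voice signal.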

For example, a user may use the audio processing apparatus of the present application while transporting logistics objects. Such a user may drive a transport vehicle for long periods. A takeaway rider, for example, may ride a two-wheeled vehicle such as a bicycle or a motorcycle and must keep both hands on the handlebars to control the direction of travel. If the rider wants to handle an order or call a recipient while travelling, the rider may need to stop the vehicle before operating the mobile terminal, which seriously reduces transportation efficiency, or may operate the mobile terminal with one hand while riding, which makes riding dangerous. In this case, the audio processing apparatus of the embodiment of the present application can help the rider operate the mobile terminal by voice, so that both transportation efficiency and riding safety are taken into account.

For example, a rider may issue a voice wake-up instruction to the audio processing apparatus in order to wake up the mobile terminal. While riding, the rider may speak a designated wake-up instruction such as "hello, fairy". This speech is conducted through the bones and muscles near the ear to the audio acquisition module located near the speaker unit, and it also propagates through the air, together with ambient noise, to the second audio acquisition module provided on the surface of the housing. The audio collected by the module near the speaker unit makes it possible to recognize that the rider has spoken the wake-up instruction and to exclude ambient noise, such as wind noise, from the mixed audio collected by the second audio acquisition module. A voice wake-up instruction of sufficient audio intensity can then be transmitted to the mobile terminal, or directly to a processing module built into the audio processing apparatus, to perform the wake-up operation. Voice commands subsequently issued by the rider may be recognized and transmitted to the mobile terminal or the built-in processing module in a similar manner, so that the specific command is executed.
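
The wake-up flow described above can be illustrated with a small check that a wake phrase found in the air-conducted channel really came from the wearer. This is only a sketch under stated assumptions: a keyword spotter (not described in the application) is assumed to report the frame span of the phrase, and the in-ear channel is assumed to have been reduced to one speech/no-speech flag per frame. The name `wearer_spoke` and the `min_overlap` value are hypothetical.

```python
def wearer_spoke(keyword_span, inner_active, min_overlap=0.6):
    """Return True if the in-ear channel was active for enough of the keyword.

    keyword_span: (start, end) frame indices, end exclusive, where a
        hypothetical keyword spotter found the wake phrase in the outer channel.
    inner_active: list of booleans, one per frame, True where the in-ear
        channel showed bone-conducted speech.
    """
    start, end = keyword_span
    if end <= start:
        return False
    active = sum(1 for flag in inner_active[start:end] if flag)
    return active / (end - start) >= min_overlap
```

A wake phrase spoken by a nearby passenger would reach the outer microphone but produce little activity in the in-ear channel, so a check of this kind would reject it.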

Accordingly, the first audio collection module is provided near the speaker unit in the first housing, and the second audio collection module is provided on the inner surface of the second housing connected to the first housing. Environmental noise and the user's voice audio can thus be collected from different positions, so that the influence and interference of noise on the voice audio can be eliminated based on the mixed audio, containing different audio components, acquired by the first audio collection module and the second audio collection module, without a complicated voice recognition algorithm.

The foregoing illustrates the technical principles and an exemplary application framework of the embodiments of the present application. Specific technical solutions of the embodiments are described in further detail below by means of a plurality of embodiments.

Fig. 2 is a schematic diagram of an audio processing apparatus according to the present application, and fig. 3 is an exploded schematic diagram of an audio processing apparatus according to the present application.

As shown in fig. 2 and 3, the audio processing device of the present application may be embodied as various headsets, in-ear headsets, or other various audio processing devices having a speaker and a microphone. A neck-mounted earphone is shown as an example in fig. 2 and 3, but it is easily understood that the scheme of the present application can also be applied to other devices having a microphone and a speaker to improve recognition accuracy for user voice data.

As shown in fig. 2 and 3, the audio processing apparatus 1 of the present application may include a first housing 11, a second housing 12, a first audio collection module 15, a second audio collection module 13, and a speaker unit 14.

The speaker unit 14 and the first audio collection module 15 may be disposed within the first housing 11. In an embodiment of the application, the audio processing device 1 may be a neck-mounted earphone, and the first housing 11 may be a housing of an earphone part inserted into the ear of the user. The speaker unit 14 may be a moving iron unit or other sound producing unit. The first audio acquisition module 15 may be placed inside the first housing 11 as a built-in microphone of the audio processing device 1 and may be adjacent to the speaker unit 14. In this case, when the user uses the audio processing device 1 of the embodiment of the present application, the earphone part is put into the ear canal, and thus, the first audio collection module 15 can enter the ear canal of the user along with the first housing 11 of the earphone part, so that audio data conducted through bones and/or muscles near the user's ear can be collected when the user gives a voice command.

In other embodiments of the application, the audio processing device 1 may also be another form of headphone, such as a head-mounted or ear-mounted headphone. In this case, because the first audio acquisition module 15 is adjacent to the speaker unit 14, it is in close proximity to the user's ear during use, so that it can still acquire audio data conducted through the bones and/or muscles near the user's ear when the user gives voice instructions.

Furthermore, the audio processing device 1 may comprise two earphone parts. As shown in fig. 2 and 3, for example, the audio processing device 1 may include two first housings 11 on both sides, and thus a speaker unit 14 and a first audio collection module 15 may be included in each first housing 11. Of course, in the embodiment of the present application, the first audio collection module 15 may be provided only in the first housing 11 on one side, or the speaker unit 14 may be provided only in the first housing 11 on one side, and only the first audio collection module 15 may be provided in the first housing 11 on the other side.

In the case of the neck-mounted earphone shown in fig. 2 and 3, the second housing 12 is connected to the first housing 11 through the first connection member 18. For example, the first connection member 18 may be a cord, both ends of which may be connected to connection holes on the surfaces of the second housing 12 and the first housing 11. A data line may also be placed inside the cord, so that the cord serves as both a connection member and a data transmission member. In the case where the audio processing device 1 is a suspension earphone, the second housing 12 may be joined directly to the first housing 11.

Further, as shown in fig. 2 and 3, the audio processing device 1 may include two second housings 12, and the two second housings 12 may be connected by a connection portion 17. The connection portion 17 may be hollow tubular, so that power supply lines and data lines for supplying power and transmitting data to the second audio collection module 13 in the second housing 12 and the speaker unit 14 and the first audio collection module 15 in the first housing 11 connected to the second housing 12 may be accommodated in the connection portion 17. Further, a power supply component of the audio processing device 1, such as a battery, may be accommodated in the connection portion 17.

The second housing 12 may contain therein a second audio acquisition module 13, which second audio acquisition module 13 may be disposed at an inner surface of the second housing 12 and may be disposed at an end of the second housing 12 as shown in fig. 2 and 3. Thus, when the user uses the audio processing device 1, the first housing 11 may be in the ear canal of the user or close to the ear of the user, while the second housing 12 may be outside the ear of the user, so that the first audio acquisition module 15 may acquire audio conducted through the bones and muscles of the user and the second audio acquisition module 13 may acquire audio data conducted through the air.

Furthermore, the second audio acquisition module 13 may comprise a plurality of audio acquisition units facing in different directions. For example, as shown in fig. 2 and 3, the second audio collection module 13 may include a first audio collection unit 131 and a second audio collection unit 132 disposed front and back in the length direction of the second housing 12. For example, the first audio collection unit 131 may be disposed in front of the second audio collection unit 132, i.e., disposed closer to the end of the second housing 12 than the second audio collection unit 132, and thus the second audio collection unit 132 may be disposed closer to the junction of the second housing 12 and the connection portion 17 than the first audio collection unit 131. In this case, the first audio collection unit 131 may have an audio collection direction toward the end of the second housing 12, i.e., toward the front, so as to be able to collect voice audio of the user and environmental noise, and the second audio collection unit 132 may be provided to have a collection direction toward a direction other than the front. In other words, the second audio acquisition unit 132 may be arranged to acquire ambient noise. For example, the second audio collection unit 132 may be disposed toward the top of the second housing 12 so as to collect environmental noise around the head of the user. In this case, since the first audio collection unit 131 is located in front of the second audio collection unit 132 and also has a collection direction toward the front, voice audio emitted in the user's mouth can be collected more by the first audio collection unit 131, and the second audio collection unit 132 can collect relatively more environmental noise.
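
The application leaves open how the outputs of the two units are combined. One simple possibility, shown here only as a sketch, treats the upward-facing unit as a noise reference, fits a single gain by least squares, and subtracts the scaled reference from the forward-facing unit. The function name is hypothetical, and the sketch assumes sample-aligned signals and a reference that contains mostly noise; any voice that leaks into the reference would be partly subtracted as well.

```python
def subtract_noise_reference(front, reference):
    """Remove the part of `front` that is explained by `reference`.

    front: samples from the forward-facing unit (voice plus noise).
    reference: samples from the upward-facing unit (assumed mostly noise).
    A single gain is fitted by least squares; a practical device would more
    likely use an adaptive multi-tap filter.
    """
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        return list(front)  # no reference signal, nothing to subtract
    gain = sum(f * r for f, r in zip(front, reference)) / energy
    return [f - gain * r for f, r in zip(front, reference)]
```

For instance, if the forward-facing unit records a voice signal plus half of the noise seen by the reference unit, and the voice is uncorrelated with the noise, the fitted gain is 0.5 and the residual is the voice alone.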

Further, in the embodiment of the present application, as shown in fig. 2 and 3, in the case where the audio processing apparatus 1 is a neck headphone, the second housing 12 may be fixed above the first housing 11 so that the second audio collection module 13 may be disposed above the first audio collection module 15. With this structure, the second audio collection module 13 can be closer to the user's mouth to better collect the voice audio of the user through air conduction.

In addition, in the embodiment of the present application, openings may be provided on the first housing 11 and the second housing 12 at positions corresponding to the first audio collection module 15 and the second audio collection module 13 to facilitate audio access, and the openings may be grid-shaped, so that the first audio collection module 15 and the second audio collection module 13 may be protected without affecting the audio sound wave access.

In an embodiment of the present application, the second housing 12 may further include two side plates 122, 123 and a top plate 121, and a partition plate 124 between the two side plates 122 and 123. In this case, the strength of the second housing 12 may be reinforced by the partition 124, and the second audio collection module 13 may be disposed on the partition 124 accordingly. For example, the first and second audio collection units 131 and 132 may be disposed on the partition 124 so that the partition 124 may serve as a support base for the first and second audio collection units 131 and 132.

For example, the second audio collection module 13 may be disposed on the side of the partition 124 facing the top plate 121. In the case where the second audio collection module 13 includes the first audio collection unit 131 and the second audio collection unit 132, the first audio collection unit 131 may be disposed on the side of the partition 124 facing the top plate 121, and the second audio collection unit 132 may be disposed on the side of the partition 124 facing the junction of the two side plates 122 and 123, i.e., toward the outside of the second housing 12.

Furthermore, according to an embodiment of the present application, the audio processing apparatus may further comprise a voice wake-up function switch assembly, which may be used to enable or disable the voice wake-up function that a user performs through the audio processing apparatus. When the voice wake-up function is enabled, the user can perform wake-up by speaking, using the first audio acquisition module 15 and the second audio acquisition module 13. Specifically, the first audio collection module 15 and the second audio collection module 13 may collect a voice instruction uttered by the user, and the instruction can be identified accurately from the environmental noise and the user's voice audio collected at their different positions. The instruction is then sent to a mobile terminal connected to the audio processing device to wake up the mobile terminal, or is used to wake up other functional modules of the audio processing device itself. In contrast, when the voice wake-up function is disabled, the user's specific voice instruction is not collected, or, if it is collected and recognized, no wake-up operation is performed and the instruction is not transmitted to the mobile terminal. False triggering in scenes where voice wake-up is not desired can thus be avoided.
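
The behaviour of the switch can be summarised in a few lines. The class below is a hypothetical model written for illustration, not an interface defined by the application; it assumes that wake-phrase detection happens elsewhere and is passed in as a boolean.

```python
class VoiceWakeSwitch:
    """Hypothetical model of the voice wake-up function switch."""

    def __init__(self, enabled=True):
        self.enabled = enabled

    def handle(self, wake_phrase_detected):
        """Return the action to take for one detection result."""
        if not self.enabled:
            return "ignore"    # switch off: never wake, even on a detection
        return "wake" if wake_phrase_detected else "idle"
```

With the switch off, a detected wake phrase yields "ignore", which corresponds to the case in which the instruction is recognized but neither acted on nor forwarded to the mobile terminal.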

In addition, according to an embodiment of the present application, the above-mentioned voice wake-up function switch assembly may further include a scene recognition unit, so that a scene where a user is located can be determined by receiving the environmental audio collected by the second audio collection module, and the voice wake-up function is automatically turned on or off according to a recognition result of the scene.
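
How the scene is recognized is not described. As a rough sketch under strong assumptions, the scene could be labelled from the ambient level alone and mapped to a switch state through a policy table. The threshold, the scene labels, and the policy below are all hypothetical illustrations; a real scene recognition unit would likely use richer features than level.

```python
import math

def ambient_level_db(samples):
    """RMS level of ambient samples in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-9))  # floor avoids log10(0)

def classify_scene(samples, loud_db=-20.0):
    """Label the scene 'loud' or 'quiet' using a hypothetical level threshold."""
    return "loud" if ambient_level_db(samples) >= loud_db else "quiet"

# Hypothetical policy: which scenes keep voice wake-up enabled.
WAKE_POLICY = {"loud": True, "quiet": False}

def wake_enabled_for(samples, policy=WAKE_POLICY):
    """Return the switch state that the policy assigns to the current scene."""
    return policy[classify_scene(samples)]
```

Keeping the policy in a separate table means the mapping from scene to switch state can be changed without touching the classifier, since which scenes should allow voice wake-up is a product decision that the application does not fix.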

It should be noted that the above embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present application.

Claims

1. An audio processing apparatus, comprising:

a first housing, wherein a speaker unit and a first audio acquisition module adjacent to the speaker unit are disposed within the first housing;

a second housing connected to the first housing, the second housing being connected via a connecting part; and

at least one second audio acquisition module disposed on the inner surface of the second housing and adjacent to the end of the second housing, the second audio acquisition module including a plurality of audio acquisition units facing different directions;

wherein the first housing is the housing of the earphone part that is inserted into the user's ear, and the first housing and the second housing are connected by a first connector.

2. The audio processing apparatus according to claim 1, wherein the connecting part has two ends connected to the second housing, and at least one of the two ends is connected to the first housing.

3. The audio processing apparatus according to claim 1, wherein the second audio acquisition module comprises a first audio acquisition unit and a second audio acquisition unit, and

the first audio acquisition unit is positioned in front of the second audio acquisition unit along the length of the second housing and has an audio acquisition direction facing forward.

4. The audio processing apparatus according to claim 3, wherein the second audio acquisition unit has an audio acquisition direction toward the top of the second housing.

5. The audio processing apparatus according to claim 3, wherein the second audio acquisition module is disposed above the first audio acquisition module.

6. The audio processing apparatus according to claim 3, wherein the first audio acquisition module is disposed between the first audio acquisition unit and the second audio acquisition unit.

7. The audio processing apparatus according to claim 1, wherein the first audio acquisition module is disposed on the side of the audio processing apparatus closer to the user, and the speaker unit is disposed on the side farther away from the user.

8. The audio processing apparatus according to claim 1, wherein openings are provided on the surfaces of the first housing and the second housing at positions corresponding to the positions of the first audio acquisition module and the second audio acquisition module.

9. The audio processing apparatus of claim 1, wherein the second housing comprises two side plates and a top plate and a partition located between the two side plates.

10. The audio processing apparatus according to claim 9, wherein the second audio acquisition module is disposed on the side of the partition facing the top plate.

11. The audio processing apparatus according to claim 9, wherein the second audio acquisition module includes a first audio acquisition unit and a second audio acquisition unit, and the first audio acquisition unit is disposed on the side of the partition facing the top plate, and the second audio acquisition unit is disposed on the side of the partition facing the joint of the two side plates.

12. The audio processing device according to claim 1, wherein the two ends of the first connector are respectively fixed in the connection holes on the first housing and the second housing.