WO2023071519A1 - Audio information processing method, electronic device, system, product and medium - Google Patents


Info

Publication number
WO2023071519A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
audio information
alarm
alarm sound
position information
Prior art date
Legal status
Ceased
Application number
PCT/CN2022/116528
Other languages
English (en)
French (fr)
Inventor
王志超
王宇
Current Assignee
Beijing Honor Device Co Ltd
Original Assignee
Beijing Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Honor Device Co Ltd
Priority to US18/291,854 (US20240411507A1)
Priority to EP22885410.5A (EP4354900B1)
Publication of WO2023071519A1 (in Chinese)
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
        • G06 COMPUTING OR CALCULATING; COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
                    • G06F3/16 Sound input; Sound output
                        • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
                • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
                    • G10K11/18 Methods or devices for transmitting, conducting or directing sound
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04M TELEPHONIC COMMUNICATION
                • H04M1/00 Substation equipment, e.g. for use by subscribers
                    • H04M1/60 Substation equipment, e.g. for use by subscribers including speech amplifiers
                        • H04M1/6016 Substation equipment, e.g. for use by subscribers including speech amplifiers in the receiver circuit
                        • H04M1/6033 Substation equipment, e.g. for use by subscribers including speech amplifiers for providing handsfree use or a loudspeaker mode in telephone sets
                            • H04M1/6041 Portable telephones adapted for handsfree use
                                • H04M1/6058 Portable telephones adapted for handsfree use involving the use of a headset accessory device connected to the portable telephone
                                    • H04M1/6066 Portable telephones adapted for handsfree use involving the use of a headset accessory device connected to the portable telephone including a wireless connection
                    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
                        • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
                            • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
                                • H04M1/72409 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by interfacing with external accessories
                                    • H04M1/72412 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by interfacing with external accessories using two-way short-range wireless interfaces
                            • H04M1/72448 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
                                • H04M1/72454 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
            • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
                • H04R1/00 Details of transducers, loudspeakers or microphones
                    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
                        • H04R1/1083 Reduction of ambient noise
                • H04R3/00 Circuits for transducers
                    • H04R3/005 Circuits for transducers for combining the signals of two or more microphones
                • H04R5/00 Stereophonic arrangements
                    • H04R5/033 Headphones for stereophonic communication
                    • H04R5/04 Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
                • H04R2205/00 Details of stereophonic arrangements covered by H04R5/00 but not provided for in any of its subgroups
                    • H04R2205/024 Positioning of loudspeaker enclosures for spatial sound reproduction
            • H04S STEREOPHONIC SYSTEMS
                • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
                    • H04S7/30 Control circuits for electronic adaptation of the sound field
                        • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
                            • H04S7/303 Tracking of listener position or orientation
                                • H04S7/304 For headphones
                • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
                    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field

Definitions

  • the present application relates to the technical field of audio processing, and in particular to an audio information processing method, electronic equipment, system, computer program product, and computer-readable storage medium.
  • Even if noise-cancelling earphones have a transparency mode, in which they do not completely block ambient sound, they still cannot protect the user's safety when the surroundings are noisy.
  • the present application provides an audio information processing method, an electronic device, a computer program product, and a computer-readable storage medium.
  • The purpose is to ensure that a user wearing a noise-cancelling headset is still alerted to alarm sounds in the surroundings.
  • In a first aspect, the present application provides an audio information processing method applied to an electronic device. The method includes: acquiring audio information, which is obtained by collecting sound from the environment in which the electronic device is located; determining that the audio information includes an alarm sound; determining first position information of the alarm sound based on the audio information; determining a first sound that includes second position information, where both the first position information and the second position information identify the sound-source direction of the alarm sound, and the second position information is the same as or different from the first position information; and playing the first sound.
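The five steps of the first aspect can be sketched as a single pass over one captured audio frame. This is a minimal Python sketch under stated assumptions, not the applicant's implementation: the detector, localizer, renderer and playback callables (`contains_alarm`, `localize`, `render_cue`, `play`) and the `Position` type are hypothetical placeholders, and the second position information is simply taken to equal the first.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Position:
    """Hypothetical representation of position information (degrees)."""
    azimuth_deg: float          # horizontal direction of the sound source
    elevation_deg: float = 0.0  # vertical direction, if available


def process_audio(
    audio: Sequence[float],
    contains_alarm: Callable[[Sequence[float]], bool],
    localize: Callable[[Sequence[float]], Position],
    render_cue: Callable[[Position], List[float]],
    play: Callable[[List[float]], None],
) -> Optional[Position]:
    """Run one pass of the method; return the first position information or None."""
    # Steps 1-2: the audio was captured from the environment; check for an alarm sound.
    if not contains_alarm(audio):
        return None
    # Step 3: first position information (sound-source direction) of the alarm sound.
    first_position = localize(audio)
    # Step 4: first sound carrying second position information; here the
    # second position information is assumed equal to the first.
    first_sound = render_cue(first_position)
    # Step 5: play the first sound, e.g. by sending it to the earphone.
    play(first_sound)
    return first_position
```

Keeping each step behind a callable lets the detection model, localization algorithm and rendering method described in the following items be swapped independently.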
  • The first position information and the second position information may be the position of the alarm sound relative to the user, or the absolute position of the alarm sound.
  • The first and second position information being the same means that the two values are equal; being different means that the two values are approximately equal, that is, their difference is within a certain range. For example, if both are angle values, their difference is within a certain angular range, such as 1°.
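Comparing two angle values against a tolerance such as 1° needs care near the 0°/360° boundary, where 359.5° and 0.3° are less than 1° apart. A minimal sketch, assuming the position information is an azimuth in degrees; the wraparound handling is our addition and is not described in the text.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def same_direction(a_deg: float, b_deg: float, tolerance_deg: float = 1.0) -> bool:
    """True if two azimuths count as the same position within the tolerance."""
    return angular_difference(a_deg, b_deg) <= tolerance_deg


print(same_direction(359.5, 0.3))  # True: the two are 0.8° apart
print(same_direction(10.0, 12.0))  # False: 2° exceeds the 1° tolerance
```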
  • By acquiring audio information collected from the environment of the electronic device and, when that audio information includes an alarm sound, playing a first sound that carries the sound-source direction of the alarm sound, the method ensures that the user is alerted to an alarm sound nearby by the first sound, even while wearing earphones.
  • Before the first sound including the second position information is determined, the method further includes: determining that the current audio information and the previous audio information containing the alarm sound were not acquired within a preset time period of each other.
  • The method may also include: determining that the current audio information and the previous audio information containing the alarm sound were acquired within the preset time period; determining that the difference between the first position information of the alarm sound in the current audio information and the first position information of the alarm sound in the previous audio information is within a preset range, and detecting that the alarm sound in the current audio information and the alarm sound in the previous audio information belong to the same sound; generating a distance coefficient, which characterizes the energy gain of the current audio information relative to the previous audio information containing the alarm sound; determining a second sound that includes the second position information and the energy gain; and playing the second sound.
  • When the difference between the first position information of the alarm sound in the current audio information and that in the previous audio information containing the alarm sound is within a preset range, and the two alarm sounds belong to the same sound, there are two consecutive alarm sounds around the user. Therefore, a second sound that identifies the sound-source direction of the alarm sound and carries an energy gain is played, ensuring that the user is alerted by the second sound.
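The choice between the first sound and the second sound can be sketched as a small decision function. All names and default values here (`AlarmEvent`, `choose_cue`, a 5 s window, a 10° direction tolerance) are hypothetical, and the `same_sound` flag stands in for either of the similarity checks described in later items. The excerpt does not say what is played when a previous alarm falls inside the window but the direction or the sound differs; falling back to the first sound in that case is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmEvent:
    """Hypothetical record of one piece of audio information containing an alarm."""
    timestamp_s: float   # when the audio information was acquired
    azimuth_deg: float   # first position information of the alarm sound


def choose_cue(
    current: AlarmEvent,
    previous: Optional[AlarmEvent],
    same_sound: bool,
    window_s: float = 5.0,        # hypothetical preset time period
    tolerance_deg: float = 10.0,  # hypothetical preset range for the direction
) -> str:
    """Return "first" (direction only) or "second" (direction plus energy gain)."""
    # No previous alarm within the preset time period: play the first sound.
    if previous is None or current.timestamp_s - previous.timestamp_s > window_s:
        return "first"
    # Both alarms within the period: compare directions, wrapping at 0/360 degrees.
    d = (current.azimuth_deg - previous.azimuth_deg) % 360.0
    if min(d, 360.0 - d) <= tolerance_deg and same_sound:
        return "second"
    # Not specified in the excerpt; assumed fallback.
    return "first"
```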
  • Playing the first sound includes: sending the first sound to an earphone, which plays the first sound.
  • Playing the second sound includes: sending the second sound to an earphone, which plays the second sound.
  • Determining the first position information of the alarm sound based on the audio information includes: performing sound source localization of the alarm sound on the audio information with a microphone-array-based sound source localization algorithm, to obtain the first position information of the alarm sound.
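FIG. 3c refers to a generalized cross-correlation delay estimation algorithm, but this excerpt does not give its details. A common instance is GCC-PHAT: estimate the time difference of arrival between two microphones from the phase-weighted cross-power spectrum, then convert it to an angle with a far-field model. The sketch below follows that textbook approach under assumptions (two microphones, known spacing, speed of sound 343 m/s); it is not necessarily the applicant's variant, and a single pair cannot distinguish front from back, which is one reason a phone would combine more microphones.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)


def gcc_phat_delay(sig, ref, fs, max_tau=None, interp=1):
    """Estimate the time delay (s) of `sig` relative to `ref` using GCC-PHAT.

    A positive result means `sig` arrives later than `ref`. `interp` > 1
    interpolates the correlation for sub-sample resolution.
    """
    n = len(sig) + len(ref)                      # zero-pad to avoid circular wrap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12               # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=interp * n)       # generalized cross-correlation
    max_shift = interp * n // 2
    if max_tau is not None:                      # limit to physically possible lags
        max_shift = max(1, min(int(interp * fs * max_tau), max_shift))
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(interp * fs)


def doa_azimuth_deg(tau, mic_distance_m):
    """Far-field direction of arrival for one microphone pair.

    0 degrees is broadside (perpendicular to the microphone axis); positive
    angles point toward the reference microphone.
    """
    s = np.clip(SPEED_OF_SOUND * tau / mic_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

Passing `max_tau=mic_distance_m / SPEED_OF_SOUND` restricts the peak search to delays that the microphone spacing can physically produce, which makes the estimate more robust in reverberant surroundings.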
  • Determining the first position information of the alarm sound based on the audio information includes: determining third position information of the alarm sound based on the audio information, where the third position information identifies the sound-source direction of the alarm sound relative to the electronic device; and performing coordinate transformation on the third position information to obtain the first position information of the alarm sound.
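The text does not specify the coordinate transformation. One plausible planar version, assumed here purely for illustration, rotates a device-relative azimuth (the third position information) into the listener's head frame, given the headings of the device and of the head in a shared world frame (for example from the phone's and the headset's orientation sensors). The function names and this choice of frames are hypothetical.

```python
def wrap_deg(angle_deg: float) -> float:
    """Wrap an angle to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def device_to_head_azimuth(
    azimuth_device_deg: float,  # third position information: direction relative to the device
    device_yaw_deg: float,      # device heading in a shared world frame (assumed known)
    head_yaw_deg: float,        # listener's head heading in the same frame (assumed known)
) -> float:
    """Transform a device-relative azimuth into a head-relative azimuth.

    All angles use the same convention (e.g. counter-clockwise positive).
    Elevation and device roll/pitch are ignored in this planar sketch.
    """
    azimuth_world = azimuth_device_deg + device_yaw_deg   # device frame -> world frame
    return wrap_deg(azimuth_world - head_yaw_deg)         # world frame -> head frame
```

A full implementation would likely use 3-D rotations (for example quaternions from the sensors) rather than yaw angles alone.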
  • Determining the first sound that includes the second position information includes: acquiring a standard sound; and processing the standard sound based on the first position information of the alarm sound to obtain the first sound, which includes the second position information.
  • Processing the standard sound to obtain the first sound includes: obtaining the head-related impulse response (HRIR) values corresponding to the first position information of the alarm sound; and convolving the standard sound with each of the HRIR values to obtain the first sound.
  • Processing the standard sound to obtain the first sound may instead include: obtaining the head-related transfer function (HRTF) values corresponding to the first position information of the alarm sound; and applying a Fourier transform to the standard sound and multiplying the result by the HRTF values to obtain the first sound.
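The two rendering options above are equivalent when the HRTF is the Fourier transform of the HRIR and the signals are zero-padded to the full linear-convolution length. A minimal NumPy sketch, assuming one HRIR per ear of equal length has already been looked up for the first position information (for example from an HRTF database; the lookup itself is not shown, and the function names are hypothetical):

```python
import numpy as np


def render_with_hrir(standard: np.ndarray, hrir_left: np.ndarray, hrir_right: np.ndarray) -> np.ndarray:
    """Time domain: convolve the standard sound with the left- and right-ear HRIRs."""
    left = np.convolve(standard, hrir_left)
    right = np.convolve(standard, hrir_right)
    return np.stack([left, right])  # shape (2, len(standard) + len(hrir) - 1)


def render_with_hrtf(standard: np.ndarray, hrir_left: np.ndarray, hrir_right: np.ndarray) -> np.ndarray:
    """Frequency domain: Fourier-transform the standard sound and multiply by the HRTFs."""
    if len(hrir_left) != len(hrir_right):
        raise ValueError("left and right HRIRs are assumed to have equal length")
    # Zero-pad to the full linear-convolution length so that the product of
    # spectra is equivalent to linear (not circular) convolution.
    n = len(standard) + len(hrir_left) - 1
    spectrum = np.fft.rfft(standard, n=n)
    hrtf_left = np.fft.rfft(hrir_left, n=n)    # HRTF = Fourier transform of the HRIR
    hrtf_right = np.fft.rfft(hrir_right, n=n)
    left = np.fft.irfft(spectrum * hrtf_left, n=n)
    right = np.fft.irfft(spectrum * hrtf_right, n=n)
    return np.stack([left, right])
```

Direct convolution is simple for short cues; the frequency-domain route becomes cheaper as the HRIR or the standard sound grows longer.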
  • Detecting whether the alarm sound in the current audio information and the alarm sound in the previous audio information containing the alarm sound belong to the same sound includes: converting the current audio information and the previous audio information from the time domain to the frequency domain, respectively, to obtain their amplitude spectra; and using the two amplitude spectra to calculate the similarity between the current audio information and the previous audio information, to obtain a calculation result indicating whether the two belong to the same sound.
  • Calculating this similarity from the amplitude spectra includes: using the Pearson correlation function to calculate a similarity value between the current audio information and the previous audio information containing the alarm sound. If the similarity value is greater than a threshold, the two belong to the same sound; if it is not greater than the threshold, they do not.
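The spectrum-plus-Pearson check can be sketched in a few lines of NumPy. The threshold of 0.8 and the function names are hypothetical, and the frames are assumed to share length and sample rate so that the spectral bins line up. Because Pearson correlation ignores overall scale, a louder repetition of the same alarm still counts as the same sound; the loudness change is what the distance coefficient captures separately.

```python
import numpy as np


def amplitude_spectrum(frame: np.ndarray) -> np.ndarray:
    """Time domain -> frequency domain, keeping only the magnitude."""
    return np.abs(np.fft.rfft(frame))


def is_same_sound(current: np.ndarray, previous: np.ndarray, threshold: float = 0.8):
    """Pearson correlation of two amplitude spectra compared with a threshold.

    Returns (same_sound, similarity_value). Frames are assumed to have the
    same length and sample rate, so that spectral bins line up.
    """
    if len(current) != len(previous):
        raise ValueError("frames must have the same length")
    a = amplitude_spectrum(current)
    b = amplitude_spectrum(previous)
    r = float(np.corrcoef(a, b)[0, 1])   # Pearson correlation coefficient
    return r > threshold, r
```

The same function applies unchanged to the variant that first extracts the alarm sound from each piece of audio information and then compares the two extracted alarm sounds.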
  • Calculating the similarity from the amplitude spectra may instead include: using a classification model to predict whether the current audio information and the previous audio information containing the alarm sound belong to the same sound.
  • Detecting whether the two alarm sounds belong to the same sound may instead include: extracting the alarm sound from the current audio information and from the previous audio information containing the alarm sound, respectively; and judging whether the two extracted alarm sounds belong to the same alarm sound.
  • Judging whether the two extracted alarm sounds belong to the same alarm sound includes: converting each extracted alarm sound from the time domain to the frequency domain to obtain its amplitude spectrum; and using the two amplitude spectra to calculate the similarity of the two extracted alarm sounds, to obtain a calculation result indicating whether they belong to the same alarm sound.
  • Calculating this similarity includes: using the Pearson correlation function to calculate a similarity value between the two extracted alarm sounds. If the similarity value is greater than the threshold, the two extracted alarm sounds belong to the same alarm sound; if it is not greater than the threshold, they do not.
  • Calculating this similarity may instead include: using a classification model to predict whether the two extracted alarm sounds belong to the same alarm sound.
  • The method further includes: determining that the distance coefficient is within a distance-coefficient range.
  • The method may also include: determining that the distance coefficient exceeds the distance-coefficient range; determining a third sound that includes the second position information and the energy gain represented by the endpoint value of the distance-coefficient range; and playing the third sound.
  • Using the endpoint value of the distance-coefficient range as the distance coefficient to determine and play the third sound prevents a distance coefficient that is too large or too small from making the sound played with the energy gain too loud or too soft.
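The distance coefficient and its range check can be sketched as follows. The text says only that the coefficient characterizes the energy gain of the current audio information relative to the previous one; taking it as the ratio of frame energies, applying it to the cue as an amplitude factor equal to its square root, and the range (0.5, 2.0) are all assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples of one piece of audio information."""
    return sum(x * x for x in frame)


def distance_coefficient(current: Sequence[float], previous: Sequence[float]) -> float:
    """Energy gain of the current frame relative to the previous alarm frame (assumed ratio)."""
    return frame_energy(current) / max(frame_energy(previous), 1e-12)


def scale_cue(
    cue: Sequence[float],
    coefficient: float,
    coef_range: Tuple[float, float] = (0.5, 2.0),  # hypothetical distance-coefficient range
) -> Tuple[List[float], float]:
    """Apply the energy gain to the cue, using the range endpoint if it is exceeded."""
    lo, hi = coef_range
    clamped = min(max(coefficient, lo), hi)  # in range: second sound; out of range: third sound
    gain = math.sqrt(clamped)                # energy gain -> amplitude factor
    return [s * gain for s in cue], clamped
```

An approaching alarm that doubles in amplitude yields a coefficient of 4.0, which this sketch clamps to the upper endpoint 2.0, so the cue is made louder but not alarmingly so.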
  • the manner of determining whether the audio information includes an alarm sound includes: calling an alarm sound detection model to detect whether the audio information includes an alarm sound, and obtaining a detection result, which is used to represent whether the audio information includes an alarm sound.
  • The present application provides an electronic device, including one or more processors, a memory, and a wireless communication module. The memory and the wireless communication module are coupled to the one or more processors, and the memory is used to store computer program code that includes computer instructions. When the one or more processors execute the computer instructions, the electronic device performs the audio information processing method according to any one of the first aspect.
  • the present application provides a computer storage medium for storing a computer program, and when the computer program is executed, it is specifically used to implement the audio information processing method according to any one of the first aspect.
  • The present application provides a computer program product. When the computer program product runs on a computer, the computer is enabled to perform the audio information processing method according to any one of the first aspect.
  • The present application provides an audio information processing system, including an electronic device and an earphone. The electronic device is configured to perform the audio information processing method according to any one of the first aspect, and the earphone is configured to interact with the electronic device and to play the first sound, the second sound, or the third sound in response to the electronic device.
  • FIG. 1 is an application scenario diagram according to an embodiment of the present application.
  • FIG. 2a is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 2b is a software architecture diagram of the electronic device according to an embodiment of the present application.
  • FIG. 3a is a display diagram of a noise-cancelling earphone according to an embodiment of the present application.
  • FIG. 3b is an interface display diagram according to an embodiment of the present application.
  • FIG. 3c is a schematic diagram of the generalized cross-correlation delay estimation algorithm according to an embodiment of the present application.
  • FIG. 4 is a sequence diagram of an audio information processing method according to Embodiment 1 of the present application.
  • FIG. 5 is a diagram showing the position of the alarm sound relative to the user according to an embodiment of the present application.
  • FIG. 6 is another application scenario diagram according to an embodiment of the present application.
  • FIG. 7 is a sequence diagram of an audio information processing method according to Embodiment 2 of the present application.
  • “One or more” refers to one, two, or more than two; “and/or” describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone, where A and B may be singular or plural.
  • The character “/” generally indicates an “or” relationship between the associated objects before and after it.
  • References in this specification to “one embodiment”, “some embodiments”, or the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • Appearances of the phrases “in one embodiment”, “in some embodiments”, “in some other embodiments”, “in other embodiments”, and the like in various places in this specification do not necessarily all refer to the same embodiment; rather, they mean “one or more but not all embodiments”, unless specifically stated otherwise.
  • the terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless specifically stated otherwise.
  • “A plurality of” in the embodiments of the present application means two or more. It should be noted that in the description of the embodiments of the present application, words such as “first” and “second” are used only to distinguish between descriptions, and cannot be understood as indicating or implying relative importance or order.
  • the embodiment of the present application proposes a method for processing audio information.
  • the audio information processing method provided in the embodiment of the present application can be applied to the application scenario shown in FIG. 1 .
  • Through interaction between the mobile phone and the noise-cancelling headset, the user can be alerted when there is a dangerous alarm sound nearby.
  • Fig. 2a shows a composition example of an electronic device provided by an embodiment of the present application.
  • The structure of the mobile phone in this application scenario is also as shown in FIG. 2a.
  • The method is also applicable to interaction between the noise-cancelling headset and other electronic devices, such as tablet computers, desktop computers, laptops, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal digital assistants (PDA), and wearable electronic devices.
  • the electronic device 200 may include a processor 210, an external memory interface 220, an internal memory 221, a display screen 230, an antenna 1, an antenna 2, a mobile communication module 240, a wireless communication module 250, an audio module 260 and the like.
  • the structure shown in this embodiment does not constitute a specific limitation on the electronic device.
  • the electronic device may include more or fewer components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 210 may include one or more processing units, for example: the processor 210 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural network processor (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
  • a memory may also be provided in the processor 210 for storing instructions and data.
  • the memory in processor 210 is a cache memory.
  • The memory may hold instructions or data that the processor 210 has just used or uses cyclically. If the processor 210 needs the instructions or data again, it can retrieve them directly from this memory, which avoids repeated access and reduces the waiting time of the processor 210, thereby improving system efficiency.
  • the external memory interface 220 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • The external memory card communicates with the processor 210 through the external memory interface 220 to implement a data storage function, for example, saving files such as music and video in the external memory card.
  • the internal memory 221 may be used to store computer-executable program codes including instructions.
  • the processor 210 executes various functional applications and data processing of the electronic device 200 by executing instructions stored in the internal memory 221 .
  • the internal memory 221 may include an area for storing programs and an area for storing data.
  • The program storage area can store an operating system, applications required by at least one function (such as a sound playing function and an image playing function), and the like.
  • The data storage area can store data created during use of the electronic device (such as audio data and a phone book).
  • the internal memory 221 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the processor 210 executes various functional applications and data processing of the electronic device by executing instructions stored in the internal memory 221 and/or instructions stored in a memory provided in the processor.
  • the electronic device realizes the display function through the GPU, the display screen 230 , and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 230 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
  • the electronic device can realize the shooting function through the ISP, the camera, the video codec, the GPU, the display screen 230 and the application processor.
  • the wireless communication function of the electronic device can be realized by the antenna 1, the antenna 2, the mobile communication module 240, the wireless communication module 250, the modem processor and the baseband processor.
  • the mobile communication module 240 can provide wireless communication solutions including 2G/3G/4G/5G applied to electronic devices.
  • the mobile communication module 240 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 240 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves through the antenna 1 for radiation.
  • The wireless communication module 250 can provide wireless communication solutions including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR).
  • the wireless communication module 250 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 250 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 210 .
  • the wireless communication module 250 can also receive the signal to be sent from the processor 210 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the Bluetooth module in the wireless communication module 250 is used to implement short-distance communication between the electronic device 200 and other electronic devices, for example, the electronic device 200 interacts with the noise-canceling headset through the Bluetooth module.
  • the bluetooth module can be an integrated circuit or a bluetooth chip or the like.
  • the electronic device 200 can implement audio functions, such as music playback and recording, through the audio module 260, the speaker 270A, the receiver 270B, the microphone 270C, the earphone interface 270D, and the application processor.
  • the audio module 260 is used for converting digital audio information into an output analog audio signal, and is also used for converting an analog audio input into a digital audio signal.
  • the audio module 260 may also be used to encode and decode audio signals.
  • the audio module 260 may be set in the processor 210, or some functional modules of the audio module 260 may be set in the processor 210.
  • the speaker 270A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • the user can listen to music or to a hands-free call through the speaker 270A of the electronic device 200.
  • the speaker 270A can be used to play the three-dimensional reminder sound mentioned in the embodiment of this application.
  • the receiver 270B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the receiver 270B can be placed close to the human ear to receive the voice.
  • the microphone 270C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 270C to input a sound signal into the microphone 270C.
  • the electronic device 200 may be provided with at least one microphone 270C.
  • the electronic device 200 may be provided with two microphones 270C, which may also implement a noise reduction function in addition to collecting sound signals.
  • the electronic device 200 can also be provided with three, four or more microphones 270C to form a microphone array to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the microphone 270C is used to collect the sound of the external environment where the electronic device is located.
  • an operating system runs on top of the above components.
  • An application program can be installed and run on the operating system.
  • Fig. 2b is a block diagram of the software structure of the electronic device according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can consist of a series of application packages. As shown in FIG. 2b, the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, and Bluetooth.
  • the application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions. As shown in Figure 2b, the application framework layer can include window manager, content provider, phone manager, resource manager, notification manager, view system, etc.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the phone manager is used to provide the communication functions of the electronic device, for example, management of the call status (connected, hung up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • the notification manager can also be a notification that appears on the top status bar of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window.
  • for example, the notification manager may prompt text information in the status bar, issue a prompt sound, vibrate the electronic device, or flash the indicator light.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the Android Runtime includes core library and virtual machine.
  • the Android runtime is responsible for the scheduling and management of the Android system.
  • when an application cold-starts, it runs in the Android runtime. The Android runtime obtains the status parameters of the application's optimized file, uses them to judge whether the optimized file has become outdated because of a system upgrade, and returns the judgment result to the application control module.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • a system library can include multiple function modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of various commonly used audio and video formats, as well as still image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG2, H.262, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, compositing and layer processing, etc.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • the noise-canceling headset can generally be a Bluetooth headset.
  • the bluetooth earphone is an earphone that supports the bluetooth communication protocol.
  • the Bluetooth communication protocol may be a Classic Bluetooth protocol (BR/EDR) or a Bluetooth Low Energy (BLE) protocol. It may also be another new type of Bluetooth protocol launched in the future.
  • the version of the Bluetooth communication protocol can be any of the following: 1.0 series version, 2.0 series version, 3.0 series version, 4.0 series version, and other series of versions based on future releases.
  • the bluetooth earphone of the embodiment of the present application generally refers to a double bluetooth earphone composed of a left earphone and a right earphone, which can provide stereo sound effect for the user.
  • Common dual Bluetooth headsets include traditional in-ear Bluetooth headsets and true wireless stereo (TWS) Bluetooth headsets.
  • a traditional in-ear Bluetooth headset eliminates the wire between the earphones and the audio source, but the left earphone and the right earphone still need to be connected by a wire to synchronize the audio signals.
  • a TWS Bluetooth headset eliminates not only the cable between the earphones and the audio source but also the cable between the left earphone and the right earphone.
  • Both the left earphone and the right earphone are provided with a bluetooth module, and the left earphone and the right earphone can transmit data through the bluetooth protocol.
  • Both the left earphone and the right earphone include microphones, that is to say, in addition to the function of audio playback, the main earphone and the auxiliary earphone also have the function of audio collection.
  • the Bluetooth headset in the embodiments of the present application can support one or more of the following applications (profiles): HSP (Headset Profile), HFP (Hands-free Profile), A2DP (Advanced Audio Distribution Profile), and AVRCP (Audio/Video Remote Control Profile).
  • the HSP application represents a headset application, and provides basic functions required for communication between the electronic device and the headset.
  • Bluetooth earphones can be used as audio input and output interfaces of electronic equipment.
  • the HFP application stands for the hands-free application.
  • the HFP application adds some extended functions on the basis of the HSP application.
  • the Bluetooth headset can control the call process of the terminal, such as answering, hanging up, rejecting, voice dialing, etc.
  • the A2DP application is an advanced audio transmission application.
  • A2DP can use the chip in the earphone to stack data to achieve high-definition sound.
  • the AVRCP application is an audio and video remote control application.
  • the AVRCP application defines how to control the characteristics of streaming media, including: pause, stop, start playback, volume control and other types of remote control operations.
  • the noise-cancelling earphones can be provided with an activation button for the earphone intelligent reminder alarm sound function.
  • an activation button 101 of the earphone smart reminder alarm sound function is set on the right earphone, and the activation button 101 may include a first position 11 and a second position 22 .
  • when the activation button 101 is at the first position 11, the earphone intelligent reminder alarm sound function is activated; when the activation button 101 is at the second position 22, the function is turned off.
  • the activation button of the earphone intelligent reminder alarm sound function may be the same button as the button for other functions of the noise-canceling earphone, or it may be a separate button.
  • after the noise-canceling headset activates the earphone intelligent reminder alarm sound function, whenever it determines that there is an alarm sound around the user, it can play an alarm sound to the user.
  • the type of alarm sound played by the noise-canceling headset to the user can be set. Referring again to the example in FIG. 3a, an alarm sound selection button 102 is set on the left earphone, and the alarm sound is selected by triggering the alarm sound selection button 102.
  • the user clicks the alarm sound selection button 102, and the noise-canceling earphone responds to the user's click operation to make a voice broadcast and select the alarm sound.
  • the alarm sound can be divided into three modes: default alarm sound, intelligently recommended alarm sound and manual selection of alarm sound.
  • the default alarm sound is the alarm sound set by the system, and the intelligent recommended alarm sound can provide different alarm sounds in combination with the operating status of the noise-canceling earphones.
  • in the manual selection mode, the user can click the alarm sound selection button 102 to choose among different alarm sounds, such as the horns of different types of vehicles.
  • FIG. 3a uses a head-mounted Bluetooth headset as an example for illustration, but this does not constitute a limitation to the Bluetooth headset involved in the embodiment of the present application.
  • the start button 101 and the alarm sound selection button 102 shown in FIG. 3a are physical keys, and in some embodiments, the start button 101 and the alarm sound selection button 102 may also be virtual keys.
  • the left earphone or the right earphone of the Bluetooth headset can be set with a virtual button, and the earphone intelligent reminder alarm function can be activated by triggering the virtual button.
  • the triggering of the virtual button can also be set in various forms.
  • for example, in some embodiments, the earphone intelligent reminder alarm sound function can be activated or deactivated by touches of different durations; in other embodiments, by different numbers of touches; in still other embodiments, by triggering different positions.
  • the left earphone or the right earphone of the bluetooth earphone can also be set with a virtual button, and different alarm sounds can be selected by triggering the virtual button.
  • the triggering of the virtual button can also be set in various forms. In some embodiments, different alarm sounds can be selected by touches of different durations; in other embodiments, by different numbers of touches; in still other embodiments, by triggering different positions.
  • starting and stopping the earphone intelligent reminder alarm sound function, and selecting among different alarm sounds, can also be controlled through the electronic device.
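The touch-based triggering described above can be sketched as a small dispatch function. This is a minimal illustration: the 1.5 s long-press threshold, the double-tap rule, and the names `Action` and `classify_touch` are hypothetical choices, not values from the embodiment.

```python
from enum import Enum, auto


class Action(Enum):
    TOGGLE_REMINDER = auto()   # start/stop the intelligent reminder alarm sound function
    NEXT_ALARM_SOUND = auto()  # switch to the next selectable alarm sound
    NONE = auto()


# Hypothetical threshold: a touch at least this long counts as a long press.
LONG_PRESS_SECONDS = 1.5


def classify_touch(duration_s: float, tap_count: int) -> Action:
    """Map one touch gesture on the virtual key to an action (illustrative mapping)."""
    if duration_s >= LONG_PRESS_SECONDS:
        return Action.TOGGLE_REMINDER    # distinguished by touch duration
    if tap_count == 2:
        return Action.NEXT_ALARM_SOUND   # distinguished by number of touches
    return Action.NONE
```

A position-based variant would add the touched region of the earphone as a third input.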
  • for example, the Bluetooth headset settings interface of the electronic device presents four items: earphone intelligent reminder alarm sound, active noise reduction, gesture, and alarm sound selection. The user can turn on the activation button of each item to enable the corresponding function. In the example shown in FIG. 3b, the earphone intelligent reminder alarm sound is activated, and the functions of the other three items are disabled.
  • when the earphone intelligent reminder alarm sound is activated, the noise-canceling earphone connected to the mobile phone via Bluetooth can interact with the mobile phone to provide an alarm sound reminder when there is a dangerous alarm sound around the user.
  • when alarm sound selection is activated, the user can, through a manual input operation, select the alarm sound to be played when a dangerous alarm sound appears around the user.
  • alarm sound selection is an item with a sub-interface: when the user slides or clicks its activation button, the alarm sound selection function is activated and its sub-interface is displayed.
  • the sub-interface for selecting the alarm sound shows four modes, which are default alarm sound, intelligently recommended alarm sound, user-defined and manually selected alarm sound.
  • the default alarm sound, the intelligently recommended alarm sound and the manual selection of the alarm sound can be as described above.
  • Customization can be understood as an alarm sound that can be edited and customized by the user. In the example shown in Fig. 3b, the default alarm sound is enabled, and the other three modes are disabled.
  • a sub-interface for manually selecting the alarm sound is displayed, as shown in the example in FIG. 3 b .
  • the sub-interface for manually selecting the warning sound includes four kinds of vehicle warning sounds, and the user can select the warning sound through the start buttons of different vehicles.
  • vehicle 1 is in the activated state, and the other three vehicles are in the closed state.
  • the aforementioned electronic devices such as mobile phones and noise-cancelling earphones can also be equipped with an alarm sound detection model, which has the function of predicting whether the audio information input to the alarm sound detection model contains alarm sounds.
  • the alarm sound detection model can use basic network models such as a convolutional neural network (CNN) or a long short-term memory (LSTM) network.
  • Convolutional neural networks usually include: input layer, convolution layer (Convolution Layer), pooling layer (Pooling layer), fully connected layer (Fully Connected Layer, FC) and output layer.
  • the first layer of a convolutional neural network is the input layer, and the last layer is the output layer.
  • Convolution Layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • in a convolutional layer, a neuron is connected only to some of the neurons in the adjacent layer.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels.
  • Pooling layer: the convolutional layer usually produces features with a large dimension. The pooling layer divides such a feature into several regions and takes the maximum or average value of each region, obtaining a new feature with a smaller dimension.
  • Fully connected layer: combines all local features into global features and is used to calculate the final score of each category.
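The pooling and fully connected operations described above can be illustrated with a small NumPy sketch. It assumes non-overlapping square pooling regions and a single dense layer; the function names `pool2d` and `fully_connected` are illustrative, not taken from the embodiment.

```python
import numpy as np


def pool2d(feature: np.ndarray, size: int, mode: str = "max") -> np.ndarray:
    """Non-overlapping pooling: split the feature map into size x size regions
    and keep the maximum (or mean) of each region."""
    h, w = feature.shape
    h_out, w_out = h // size, w // size
    # Drop edge rows/columns that do not fill a whole region, then view as blocks.
    blocks = feature[: h_out * size, : w_out * size].reshape(h_out, size, w_out, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))


def fully_connected(features: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Flatten the local features into one global vector and compute one score per class."""
    return weights @ features.ravel() + bias
```

For a 4x4 map `np.arange(16).reshape(4, 4)`, 2x2 max pooling keeps `[[5, 7], [13, 15]]`, halving each dimension as the pooling description states.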
  • a long-short-term memory artificial neural network usually includes an input layer, a hidden layer, and an output layer.
  • the input layer is composed of at least one input node. When the LSTM network is unidirectional, the hidden layer includes only a forward hidden layer; when the LSTM network is bidirectional, the hidden layer includes a forward hidden layer and a backward hidden layer.
  • the hidden nodes are each connected to the output nodes and output their calculation results to them; the output nodes perform calculations on the outputs of the hidden layer and output data.
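The unidirectional and bidirectional hidden layers described above can be sketched with a standard LSTM cell in NumPy. This is a minimal illustration, assuming the common gate layout (input, forget, candidate, output) stacked in one weight matrix; it is not presented as the detection model's actual architecture, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked as input gate, forget gate, candidate, output gate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c = f * c_prev + i * g         # long-term (cell) memory
    h = o * np.tanh(c)             # short-term (hidden) output
    return h, c


def run_unidirectional(xs, W, U, b, H):
    """Forward hidden layer only: process the sequence from first to last frame."""
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h)
    return np.stack(outputs)


def run_bidirectional(xs, fwd_params, bwd_params, H):
    """Forward and backward hidden layers; their outputs are concatenated per frame."""
    fwd = run_unidirectional(xs, *fwd_params, H)
    bwd = run_unidirectional(xs[::-1], *bwd_params, H)[::-1]  # re-align to original order
    return np.concatenate([fwd, bwd], axis=1)
```

In a detector, the per-frame hidden outputs would feed the output nodes (for example, a dense layer producing an alarm/no-alarm score).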
  • the alarm sound detection model can be trained in the following ways:
  • the original model of alarm sound detection can choose basic network models such as CNN and LSTM.
  • the training samples include: samples containing alarm sounds and samples not containing alarm sounds, and the training samples are marked to indicate whether the samples contain alarm sounds.
  • the alarm sound in a training sample can be, for example, the whistle (horn) of a vehicle.
  • for example, the training samples may include samples containing the whistles of different types of motor vehicles, such as automobiles and motorcycles, as well as samples containing other alarm sounds, such as alarm bells.
  • the alarm bell can be understood as the alarm sound when special vehicles such as ambulances, police cars, and fire trucks are running.
  • the training samples are input into the original alarm sound detection model, and the original alarm sound detection model detects whether the training samples contain the alarm sound, and obtains the detection result.
  • a loss function is used to calculate the loss value between the detection result and the labeling result of each training sample, giving the loss value of the model.
  • loss functions such as cross-entropy loss function and weighted loss function can be used to calculate the loss value, or a combination of multiple loss functions can be used to calculate multiple loss values.
  • the model convergence condition may be that the loss value of the model is less than or equal to a preset loss threshold. That is to say, the loss value of the model can be compared with the loss threshold. If the loss value of the model is greater than the loss threshold, it can be judged that the loss value of the model does not meet the model convergence conditions. Conversely, if the loss value of the model is less than or equal to the loss threshold, it can be judged that the model loss value meets the model convergence condition.
  • alternatively, a loss value of the model can be calculated for each training sample. In this case, training is considered complete only when the loss value of every training sample meets the model convergence condition; conversely, as long as the loss value of any one training sample does not meet the convergence condition, the subsequent steps are executed.
  • the trained model can be used in the audio information processing method proposed in the following embodiments to detect whether the audio information input to the model contains an alarm sound.
  • the parameter update value of the model is calculated according to the loss value of the model, and the original model of alarm sound detection is updated with the parameter update value of the model. And use the updated model to continue to process the training samples, get the detection results, and continue to execute the subsequent process until the loss value of the model meets the convergence conditions of the model.
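The training procedure above (detect, compute loss, check convergence against a threshold, update parameters, repeat) can be sketched as follows. To keep the sketch short and runnable, a logistic-regression classifier on precomputed feature vectors stands in for the CNN/LSTM detector; the threshold, learning rate, and function name are assumptions. The `per_sample` flag switches between the two convergence conditions described: model-level mean loss, or every sample's loss.

```python
import numpy as np


def train_until_converged(X, y, loss_threshold=0.1, lr=0.5, max_epochs=10_000,
                          per_sample=False):
    """Detect -> loss -> convergence check -> parameter update, repeated.

    A logistic-regression classifier stands in for the CNN/LSTM detector.
    y holds the labels: 1.0 = contains an alarm sound, 0.0 = does not.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    eps = 1e-12  # avoids log(0)
    for epoch in range(max_epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # detection result: P(alarm sound)
        losses = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))  # cross-entropy
        # Convergence: mean loss <= threshold, or every sample's loss <= threshold.
        model_loss = losses.max() if per_sample else losses.mean()
        if model_loss <= loss_threshold:
            return w, b, epoch
        grad = p - y                             # d(cross-entropy)/d(logit)
        w = w - lr * (X.T @ grad) / len(y)       # parameter update values
        b = b - lr * grad.mean()
    return w, b, max_epochs
```

The per-sample condition is stricter, so it never converges in fewer iterations than the mean-loss condition on the same data.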
  • the sound can be localized based on the sound source localization algorithm of the microphone array and the like.
  • the sound source localization algorithm uses a microphone array for sound localization.
  • Commonly used sound source localization algorithms mainly fall into three categories: localization technology based on high-resolution spectrum estimation, localization technology based on steerable beamforming (Beamforming), and localization technology based on TDOA.
  • the implementation principle of the TDOA-based sound localization algorithm is simple. It is generally divided into two parts: delay estimation and sound source localization.
  • delay estimation calculates the difference in arrival time of the signals at different microphones, and sound source localization then calculates, from this time difference, the angle from which the sound arrives from the sound source.
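For a single pair of microphones, the step from time difference to angle can be sketched with the common far-field approximation, where the extra path length is d·sin(θ) = c·τ. The speed of sound (343 m/s) and the sign convention (positive τ means microphone 1 hears the sound later, giving a positive angle toward microphone 2) are assumptions of this sketch.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 degrees C)


def tdoa_to_angle(tau_s: float, mic_spacing_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Far-field angle of arrival in degrees, measured from the broadside of a two-mic pair."""
    ratio = c * tau_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp: noise can push |ratio| slightly above 1
    return math.degrees(math.asin(ratio))
```

With two microphones 0.18 m apart, a delay of half the maximum possible (0.09 m / 343 m/s) corresponds to 30 degrees.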
  • the time delay estimation algorithm mainly includes the time delay estimation method based on correlation analysis, the time delay estimation method based on phase spectrum estimation, the time delay estimation method based on parameter estimation, etc.
  • the most widely used is the time delay estimation method based on correlation analysis.
  • the generalized cross-correlation function method (GCC) in the time delay estimation method based on correlation analysis introduces a weighting function to adjust the cross power spectral density, so as to optimize the performance of time delay estimation.
  • the generalized cross-correlation function has many variants, one of which is the generalized cross-correlation with phase transform (GCC-PHAT).
  • the generalized cross-correlation function delay estimation algorithm estimates the delay value according to the peak value of the cross-correlation function of two microphone signals.
  • the target signal received by each element of the microphone array comes from the same sound source. Therefore, there is a strong correlation between the signals of each channel.
  • therefore, by computing the correlation function between every two signals, the time delay between two microphone observation signals can be determined.
  • the signals $x_1(t)$ and $x_2(t)$ received by two microphones in the array are given by Equation 1:
    $$x_1(t) = a_1 s(t-\tau_1) + n_1(t), \qquad x_2(t) = a_2 s(t-\tau_2) + n_2(t)$$
  • where $t$ is time, $s(t)$ is the sound source signal, $a_1$ and $a_2$ are the amplitude attenuation factors of the two channels, $n_1(t)$ and $n_2(t)$ are the environmental noise, and $\tau_1$ and $\tau_2$ are the propagation times of the signal from the sound source to the two microphone array elements.
  • GCC-PHAT computes the phase-transform-weighted cross-correlation of the two signals:
    $$R_{12}(\tau) = \mathrm{IFFT}\left\{\psi(\omega)\, X_1(\omega)\, X_2^*(\omega)\right\}, \qquad \psi(\omega) = \frac{1}{\left|X_1(\omega)\, X_2^*(\omega)\right|}$$
  • where $X_1(\omega)$ and $X_2(\omega)$ are the Fourier transforms (FFT) of $x_1(t)$ and $x_2(t)$, $\omega$ is the angular frequency of the signal received by the microphones, $(\cdot)^*$ denotes the complex conjugate of $X_2(\omega)$, and $\psi(\omega)$ is the phase transformation weighting function applied to the conjugate product (the cross-power spectrum).
  • after the inverse Fourier transform (IFFT), peak detection is performed on $R_{12}(\tau)$, and the peak position gives the output
    $$\hat{\tau}_{12} = \arg\max_{\tau} R_{12}(\tau) = \tau_1 - \tau_2,$$
    which is the time difference between the two microphone signals.
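The GCC-PHAT procedure described above (cross-power spectrum, phase-transform weighting, IFFT, peak detection) can be sketched in Python with NumPy. This is a minimal sketch, assuming discrete signals sampled at `fs` Hz and zero-padding so that the circular correlation equals the linear one; `gcc_phat` and the optional `max_tau` search limit are illustrative names, not part of the embodiment.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None, eps=1e-12):
    """Estimate tau12 = tau1 - tau2 (seconds) between two microphone signals."""
    n = len(x1) + len(x2)                 # zero-pad: circular correlation == linear
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)              # cross-power spectrum X1(w) X2*(w)
    cross /= np.abs(cross) + eps          # PHAT weighting psi(w) = 1 / |X1 X2*|
    r = np.fft.irfft(cross, n=n)          # back to the lag domain (IFFT)
    max_shift = n // 2
    if max_tau is not None:               # optionally limit to physically possible lags
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    r = np.concatenate((r[len(r) - max_shift:], r[:max_shift + 1]))
    shift = int(np.argmax(np.abs(r))) - max_shift   # peak detection
    return shift / fs
```

A positive result means the sound reached microphone 1 later than microphone 2. In practice `max_tau` would be set to the microphone spacing divided by the speed of sound, which both speeds up the search and rejects spurious peaks.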
  • GCC-PHAT uses only the signals of two microphones. When there are more than two microphones, other methods can be used for delay estimation, such as the sound source localization algorithm based on steered response power with phase transform (SRP-PHAT).
  • the basic principle of the SRP-PHAT algorithm is as follows: for each hypothetical sound source position, the phase-transform-weighted generalized cross-correlation (GCC-PHAT) functions of all microphone pairs are evaluated at that position and summed to give the steered response power (SRP); the point in the whole sound source space at which the SRP value is largest is taken as the estimate of the sound source location.
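A simplified SRP-PHAT search can be sketched as follows. The sketch assumes a planar microphone array, a far-field (plane-wave) source, a search over horizontal azimuth only on a 1-degree grid, and c = 343 m/s; a full implementation would search a 2D or 3D grid of candidate positions. For each pair (i, j) and candidate direction, the PHAT-weighted cross-spectrum is steered to the expected delay τ_i − τ_j and the real parts are summed.

```python
import numpy as np


def srp_phat_azimuth(signals, mic_xy, fs, c=343.0, n_angles=360):
    """Far-field SRP-PHAT over azimuth for a planar array.
    signals: (M, N) array of microphone signals; mic_xy: (M, 2) positions in metres.
    Returns (best azimuth in degrees, steered response power per candidate angle)."""
    M, N = signals.shape
    X = np.fft.rfft(signals, axis=1)
    omega = 2 * np.pi * np.fft.rfftfreq(N, d=1.0 / fs)      # angular frequency per bin
    angles = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    u = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit vectors toward candidates
    power = np.zeros(n_angles)
    for i in range(M):
        for j in range(i + 1, M):
            cross = X[i] * np.conj(X[j])
            cross /= np.abs(cross) + 1e-12                   # PHAT weighting
            tau_ij = (u @ (mic_xy[j] - mic_xy[i])) / c       # expected tau_i - tau_j per angle
            # GCC-PHAT of pair (i, j) evaluated at the expected delay, for every angle.
            power += np.real(np.exp(1j * np.outer(tau_ij, omega)) @ cross)
    best = int(np.argmax(power))
    return float(np.degrees(angles[best])), power
```

Evaluating the correlation in the frequency domain at the exact expected delay avoids rounding each candidate delay to a whole sample, which matters for small arrays such as a headset or phone.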
  • this embodiment provides a method for processing audio information, and the method for processing audio information provided by this embodiment can be used in the application scenario in FIG. 1 .
  • the processing method of the audio information includes steps:
  • S401: the mobile phone acquires audio information.
  • the audio information is obtained by collecting the sound of the external environment, and the sound of the external environment may be collected through a microphone.
  • both the mobile phone and the noise-canceling headset are equipped with microphones. Therefore, the microphone in the mobile phone or the microphone in the noise-canceling headset can collect the sound of the external environment to obtain audio information.
  • the mobile phone or the noise-canceling earphone can collect the sound of the external environment periodically or in real time to obtain audio information when the noise-canceling earphone is in the running state.
  • the noise-canceling earphone collects the sound of the external environment, obtains audio information, and transmits the audio information to the mobile phone through a channel connected to the mobile phone such as a Bluetooth channel.
  • the following description takes as an example the case in which the noise-canceling headset collects the sound of the external environment to obtain audio information, and the mobile phone obtains this audio information and executes the following steps.
  • S402: the mobile phone calls the alarm sound detection model to detect whether the audio information contains an alarm sound and obtains a detection result indicating whether the audio information contains an alarm sound.
  • the alarm sound detection model has the function of predicting whether the audio information input to the alarm sound detection model contains the alarm sound. Therefore, after acquiring the audio information of the external environment, the alarm sound detection model can be used to detect whether the audio information contains the alarm sound, and obtain the detection result.
  • in this example, after the mobile phone obtains the audio information, it calls the alarm sound detection model to detect whether the audio information contains an alarm sound and obtains the detection result.
  • the noise-canceling earphone can also call the alarm sound detection model to detect whether the audio information contains the alarm sound, obtain the detection result, and then transmit the detection result to the mobile phone. In this way, the mobile phone does not need to execute step S402.
  • if the detection result indicates that the audio information contains an alarm sound, steps S403 and S404 are executed; if it indicates that the audio information does not contain an alarm sound, the process returns to step S401.
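The control flow of the steps above can be sketched as one pass over a captured audio frame. Detection, localization, and the S404 reminder are passed in as callbacks because their implementations are described elsewhere; `AlarmPosition`, `process_frame`, and the callback names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class AlarmPosition:
    azimuth_deg: float  # horizontal direction angle of the alarm sound relative to the user


def process_frame(
    audio: Any,
    detect: Callable[[Any], bool],               # S402: alarm sound detection model
    localize: Callable[[Any], AlarmPosition],    # S403: sound source localization
    remind: Callable[[AlarmPosition], None],     # S404: placeholder for the reminder step
) -> Optional[AlarmPosition]:
    """One pass over audio acquired in S401. Returns None when no alarm sound is
    detected, in which case the caller goes back to S401 for the next frame."""
    if not detect(audio):
        return None
    position = localize(audio)
    remind(position)
    return position
```

A caller would run this in a loop over periodically or continuously collected frames while the noise-canceling headset is running.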
  • alarm sounds mentioned in the embodiments of the present application can all be understood as the alarm sounds mentioned in the foregoing content, such as whistles or alarm bells of various types of motor vehicles.
  • S403: the mobile phone uses the audio information to locate the alarm sound and obtains the position information of the alarm sound relative to the user.
  • the mobile phone may use the sound source localization algorithm based on the microphone array proposed in the foregoing content, and use audio information to perform sound source localization for the warning sound.
  • the mobile phone uses the audio information collected by the microphone of the left earphone of the noise-canceling earphone and the audio information collected by the microphone of the right earphone to locate the sound source of the alarm sound, and obtain the location information of the alarm sound relative to the user.
  • the position information generally includes the horizontal direction angle θ of the alarm sound relative to the user.
  • FIG. 5 shows an example of the horizontal direction angle θ of the alarm sound relative to the user (referring to the center point of the user's head).
  • the position information of the alarm sound relative to the user obtained in this step refers to the position information of the alarm sound relative to the earphone.
  • the position information of the alarm sound relative to the user mentioned in the following content refers to the position information of the alarm sound relative to the earphone.
  • the obtained position information of the warning sound relative to the user can be understood as the relative position information of the warning sound.
  • the mobile phone uses the audio information to locate the alarm sound to obtain the location information of the alarm sound relative to the user, which may also refer to obtaining the absolute position of the alarm sound.
  • alternatively, the noise-canceling earphone can itself use the audio information collected by the microphones of the left and right earphones, together with the microphone-array-based sound source localization algorithm described above, to locate the alarm sound and obtain the position information of the alarm sound relative to the user.
  • the noise-canceling headset can then transmit the obtained direction angle of the alarm sound relative to the user to the mobile phone. In this way, the mobile phone does not need to execute step S403.
  • in other embodiments, the mobile phone can obtain the audio information collected by its built-in microphone array, use this audio information to locate the sound source of the alarm sound, and obtain the position information of the alarm sound relative to the mobile phone.
  • because there may be a relative angle between the mobile phone and the user, after the mobile phone obtains the position information of the alarm sound relative to the mobile phone from its own microphone array, it needs to perform a coordinate transformation on this position information to obtain the position information of the alarm sound relative to the user.
  • the coordinate conversion of the position information of the warning sound relative to the mobile phone can be performed to obtain the position information of the warning sound relative to the user.
  • the position information of the alarm sound relative to the mobile phone can be transformed based on the earth coordinate system to obtain the position information of the alarm sound relative to the user.
  • the coordinate transformation can also be performed based on another coordinate system shared by the mobile phone and the noise-canceling headphones.
  • to support the coordinate transformation, the noise-cancelling headset needs to calculate its attitude angle relative to the earth coordinate system. Therefore, the noise-canceling headset needs to be equipped with an acceleration sensor and an angular velocity sensor, usually of the same type as those of the mobile phone.
  • the mobile phone uses the detection data of its own acceleration sensor and angular velocity sensor to calculate the attitude angle of the mobile phone.
  • the noise-canceling earphone uses the detection data of its own acceleration sensor and angular velocity sensor to calculate the attitude angle of the earphone.
  • the mobile phone acquires the attitude angle of the earphone, uses the attitude angles of the mobile phone and the earphone to determine the conversion relationship between the coordinate systems of the earphone and the mobile phone, and uses this conversion relationship to process the position information of the alarm sound relative to the mobile phone to obtain the position information of the alarm sound relative to the user.
  • the attitude angles of the mobile phone and the noise-cancelling earphone can be calculated from the detection data of their acceleration sensors and angular velocity sensors using conventional methods, which are not described here.
  • likewise, using the attitude angles of the mobile phone and the earphone to determine the conversion relationship between their coordinate systems, and using this relationship to convert the position information of the alarm sound relative to the mobile phone into position information relative to the user, can follow conventional methods and is not described here.
  • in another implementation, the mobile phone acquires the audio information collected by its built-in microphone array and transmits it to the noise-canceling earphone.
  • the noise-canceling earphone uses the audio information to locate the alarm sound and obtain the position information of the alarm sound relative to the user.
  • because the audio information of the external environment is collected by the microphone array of the mobile phone, when the noise-canceling headset localizes the alarm sound from this audio information using the microphone-array-based sound source localization algorithm described in the foregoing content, the result is the position information of the alarm sound relative to the mobile phone.
  • the noise-canceling earphone then performs the coordinate transformation described above on this position information to obtain the position information of the alarm sound relative to the user.
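For the horizontal direction angle alone, the coordinate transformation described above reduces to adding and subtracting yaw angles. The sketch below is a minimal illustration under stated assumptions: only the yaw components of the attitude angles are used, both yaws are measured against the same earth coordinate system, and the function names are hypothetical. A full implementation would use the complete attitude (for example, rotation matrices built from pitch, roll and yaw).

```python
def wrap_deg(angle_deg: float) -> float:
    """Wrap an angle in degrees into the interval [0, 360)."""
    return angle_deg % 360.0


def phone_to_user_azimuth(theta_phone_deg: float,
                          yaw_phone_deg: float,
                          yaw_headset_deg: float) -> float:
    """Convert a horizontal azimuth from the phone frame to the headset (user) frame.

    Hypothetical simplification: only the yaw component of each attitude angle is
    used, both yaws are measured against the same earth coordinate system, and all
    angles increase in the same rotational direction.
    """
    # Step 1: phone frame -> earth frame.
    theta_earth_deg = theta_phone_deg + yaw_phone_deg
    # Step 2: earth frame -> headset frame (relative to where the user's head faces).
    return wrap_deg(theta_earth_deg - yaw_headset_deg)


if __name__ == "__main__":
    # Phone sees the alarm at 30 deg; phone yaw is 90 deg, headset yaw is 45 deg.
    print(phone_to_user_azimuth(30.0, 90.0, 45.0))  # 75.0
```

Wrapping into [0, 360) keeps the result comparable with the angle grid of the stored HRTF/HRIR pairs.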
  • the mobile phone detects whether the audio information and the previous audio information containing the alarm sound were obtained within a preset time period.
  • if they were not obtained within the preset time period, steps S405 to S407 are performed; if they were obtained within the preset time period, steps S408 to S413 are performed.
  • the previous audio information containing the alarm sound refers to the most recent audio information containing the alarm sound that the mobile phone detected before the current audio information containing the alarm sound.
  • if the current audio information and the previous audio information containing the alarm sound were obtained within the preset time period, the mobile phone has detected two alarm sounds in quick succession, so the user may need to be specifically reminded of the alarm sound.
  • the preset time period can be set according to actual needs. Because step S404 is meant to screen out two consecutive alarm sounds, the preset time period should not be set too long.
  • the preset time period can be set to 30 seconds.
  • based on the position information of the alarm sound relative to the user, the mobile phone processes the standard reminder sound to obtain a three-dimensional reminder sound.
  • the three-dimensional reminder sound can be understood as a directional reminder sound.
  • the standard reminder sound can be processed with three-dimensional sound technology to obtain a reminder sound that carries direction information. When this reminder sound is output to the user, the user can perceive the direction of the alarm sound.
  • the mobile phone pre-stores a plurality of standard reminder sounds, and the user can pre-set the standard reminder sound for the alarm reminder by using the alarm sound selection method described in the foregoing content.
  • the mobile phone can display the alarm sound selection interface shown in FIG. 3b to prompt the user to set the standard reminder sound.
  • the standard reminder sound can be understood as an alarm sound without noise; it can usually be a vehicle horn.
  • the mobile phone also pre-stores a plurality of head-related transfer function (HRTF) values, usually set in pairs for the left and right earphones. That is, the HRTF values are divided into HRTF values of the left earphone and, for each of them, a corresponding HRTF value of the right earphone.
  • each pair of left- and right-earphone HRTF values corresponds to one angle value of the alarm sound relative to the user.
  • for example, taking the human head as the center point, the 360° circle at a certain distance from the center point can be divided into multiple angle values.
  • for example, the 360° around the center point can be divided equally into multiple angle values.
  • the number of angle values can be set according to actual conditions.
  • the head-related transfer function (HRTF) values are calculated as shown in Formula 2:
    $$H_L = \frac{P_L}{P_0}, \qquad H_R = \frac{P_R}{P_0}$$
  • where $P_L$ and $P_R$ are the frequency-domain complex sound pressures produced by the sound source at the left and right ears, respectively;
  • $P_0$ is the frequency-domain complex sound pressure produced by the sound source at the position of the head center with the head removed, defined in Formula 3:
    $$P_0 = j\,\frac{k\,\rho_0\,c\,Q_0}{4\pi r}\,e^{-jkr}, \qquad k = \frac{2\pi f}{c}$$
  • $\rho_0$ is the density of the medium (air);
  • $c$ is the speed of sound, about 344 m/s in air at normal temperature;
  • $Q_0$ is the intensity of the sound source;
  • $r$ is the distance from the sound source to the head center;
  • $f$ is the frequency of the sound.
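As a worked check of Formulas 2 and 3, the sketch below evaluates $P_0$ for a point source and normalizes ear pressures by it. It assumes the free-field point-source form $P_0 = j k \rho_0 c Q_0 e^{-jkr} / (4\pi r)$ with $k = 2\pi f / c$, an air density of 1.21 kg/m³ (an assumed typical value, not from the text), and c = 344 m/s from the text. Because $kc = 2\pi f$, the magnitude of $P_0$ simplifies to $\rho_0 f Q_0 / (2r)$, which the example prints alongside. Function names are illustrative.

```python
import cmath
import math

AIR_DENSITY = 1.21      # kg/m^3, assumed typical value for air at room temperature
SPEED_OF_SOUND = 344.0  # m/s, value given in the text


def free_field_pressure(f: float, r: float, q0: float,
                        rho0: float = AIR_DENSITY, c: float = SPEED_OF_SOUND) -> complex:
    """P0 (Formula 3): pressure of a point source at distance r with the head absent."""
    k = 2.0 * math.pi * f / c  # wavenumber
    return 1j * k * rho0 * c * q0 / (4.0 * math.pi * r) * cmath.exp(-1j * k * r)


def hrtf_pair(p_left: complex, p_right: complex, p0: complex) -> tuple:
    """Formula 2: normalize the ear pressures by the free-field reference P0."""
    return p_left / p0, p_right / p0


if __name__ == "__main__":
    p0 = free_field_pressure(f=1000.0, r=1.5, q0=1e-3)
    # |P0| reduces to rho0 * f * q0 / (2 r), because k * c = 2 * pi * f.
    print(abs(p0), AIR_DENSITY * 1000.0 * 1e-3 / (2 * 1.5))
```

In practice $P_L$ and $P_R$ come from measurement or simulation; the normalization only removes the source strength and propagation term so that the stored values depend on direction and head geometry.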
  • this step includes:
  • obtaining the HRTF value of the left earphone and the HRTF value of the right earphone corresponding to the position information.
  • the position information of the alarm sound relative to the user includes the horizontal direction angle of the alarm sound relative to the user. Using this horizontal direction angle as the screening factor, the multiple HRTF values stored in the mobile phone are filtered to obtain the HRTF value of the left earphone and the HRTF value of the right earphone that match it.
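The screening step can be sketched as a nearest-angle lookup over a table that divides 360° equally, as described above. This is a minimal sketch, assuming the stored pairs are keyed by azimuth in degrees; `build_angle_grid`, `angular_distance` and `select_pair` are hypothetical helper names, and the string labels stand in for real HRTF (or HRIR) data. Nearest-neighbour selection is one possible reading of "match"; interpolating between neighbouring angles is another option the text does not specify.

```python
def build_angle_grid(n: int) -> list:
    """Divide the 360 degrees around the head center equally into n angle values."""
    return [i * 360.0 / n for i in range(n)]


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, honoring the 0/360 wrap."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def select_pair(theta_deg: float, table: dict) -> tuple:
    """Use theta as the screening factor: return the stored (left, right) pair
    whose angle value is closest to theta."""
    best_angle = min(table, key=lambda angle: angular_distance(angle, theta_deg))
    return table[best_angle]


if __name__ == "__main__":
    # Placeholder labels stand in for real measured HRTF (or HRIR) data.
    table = {a: (f"L@{a:g}", f"R@{a:g}") for a in build_angle_grid(8)}
    print(select_pair(100.0, table))  # ('L@90', 'R@90')
    print(select_pair(350.0, table))  # ('L@0', 'R@0')
```

The same lookup applies unchanged to the HRIR table, since both are indexed by the same angle values.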
  • the standard reminder sound is Fourier-transformed and multiplied by the HRTF value of the left earphone and the HRTF value of the right earphone corresponding to the position information, respectively, to obtain the binaural output signals, that is, the three-dimensional reminder sound of the left earphone and that of the right earphone.
  • alternatively, the mobile phone can pre-store multiple head-related impulse response (HRIR) values, usually set in pairs for the left and right earphones. That is, the HRIR values are divided into HRIR values of the left earphone and, for each of them, a corresponding HRIR value of the right earphone.
  • the HRIR values of a pair of left and right earphones respectively correspond to an angle value of an alarm sound relative to the user.
  • the HRIR value of the left earphone and the HRIR value of the right earphone corresponding to the position information are obtained.
  • the location information of the alarm sound relative to the user includes: a horizontal direction angle of the alarm sound relative to the user. Use the horizontal direction angle of the alarm sound relative to the user as the screening factor, and filter multiple HRIR values stored in the mobile phone to obtain the HRIR value of the left earphone and the HRIR value of the right earphone that match the horizontal direction angle of the alarm sound relative to the user .
  • the standard reminder sound is convolved with the HRIR value of the left earphone and the HRIR value of the right earphone corresponding to the position information, and the binaural output signal is obtained, that is, the three-dimensional reminder sound of the left earphone and the three-dimensional reminder sound of the right earphone.
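The two rendering routes (multiplying the Fourier-transformed reminder sound by the HRTF pair, or convolving it with the HRIR pair) produce the same binaural output when the HRTF is the Fourier transform of the HRIR. The sketch below, assuming NumPy and a toy HRIR pair (hypothetical, not measured data), checks this equivalence; the function names are illustrative.

```python
import numpy as np


def render_time_domain(reminder, hrir_left, hrir_right):
    """HRIR route: convolve the standard reminder sound with each ear's HRIR."""
    return np.convolve(reminder, hrir_left), np.convolve(reminder, hrir_right)


def render_frequency_domain(reminder, hrir_left, hrir_right):
    """HRTF route: FFT the reminder, multiply by each ear's HRTF, inverse FFT.

    The HRTF is taken here as the FFT of the HRIR. Zero-padding to
    len(reminder) + len(hrir) - 1 makes the spectral product equal to linear
    (not circular) convolution. Assumes both HRIRs have the same length.
    """
    n = len(reminder) + len(hrir_left) - 1
    spectrum = np.fft.rfft(reminder, n)
    left = np.fft.irfft(spectrum * np.fft.rfft(hrir_left, n), n)
    right = np.fft.irfft(spectrum * np.fft.rfft(hrir_right, n), n)
    return left, right


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs // 10) / fs
    reminder = np.sin(2 * np.pi * 800 * t)  # stand-in for the standard reminder sound
    # Toy HRIR pair (hypothetical): source on the right, so the left ear hears it
    # later and quieter than the right ear.
    hrir_left = np.array([0.0, 0.0, 0.0, 0.5])
    hrir_right = np.array([1.0, 0.0, 0.0, 0.0])
    tl, tr = render_time_domain(reminder, hrir_left, hrir_right)
    fl, fr = render_frequency_domain(reminder, hrir_left, hrir_right)
    print(np.allclose(tl, fl), np.allclose(tr, fr))  # True True
```

Direct convolution is simple for short HRIRs; the FFT route scales better for long reminder sounds or long responses.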
  • the position information of the alarm sound relative to the user used above may be exactly the same as that obtained in step S403, approximately the same, or differ from it within a certain range.
  • the mobile phone sends the three-dimensional reminder sound to the noise-canceling earphone.
  • the mobile phone can send the three-dimensional reminder sound of the left earphone and the three-dimensional reminder sound of the right earphone to the noise-canceling earphone through a connection channel such as Bluetooth.
  • the noise-canceling earphone plays the three-dimensional reminder sound.
  • the left earphone of the noise-canceling earphone outputs the three-dimensional reminder sound of the left earphone
  • the right earphone outputs the three-dimensional reminder sound of the right earphone
  • when the alarm sound is detected in the audio of the external environment, the mobile phone uses the audio information to locate the alarm sound, obtains the position information of the alarm sound relative to the user, and processes the standard reminder sound based on this position information to obtain the three-dimensional reminder sound.
  • the noise-canceling earphone then plays the three-dimensional reminder sound, reminding the user that there is an alarm sound nearby and a potential safety problem.
  • the mobile phone determines whether the difference between the location information of the alarm sound relative to the user in the audio information and the location information of the alarm sound relative to the user in the previous audio information containing the alarm sound is within a preset range.
  • if the difference is not within the preset range, step S405 is executed.
  • if the difference is within the preset range, step S409 is executed.
  • a difference within the preset range indicates that two alarm sounds have appeared in succession from the same direction range, so the user needs to be specifically reminded of the alarm sound.
  • the preset range can be set according to actual conditions. Generally, it can be set so that the difference between the horizontal direction angles of the two alarm sounds relative to the user is smaller than a first threshold.
  • the first threshold may be set according to actual conditions, and in one example, the first threshold may be 5°.
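The branching in steps S404 and S408 can be sketched as below, using the example values from the text (a 30-second preset time period and a 5° first threshold). The `AlarmDetection` record and `choose_branch` function are hypothetical names; the angle difference is taken on the circle so that, for example, 358° and 2° are 4° apart, which is an assumption the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

PRESET_PERIOD_S = 30.0      # example preset time period from the text
FIRST_THRESHOLD_DEG = 5.0   # example first threshold from the text


@dataclass
class AlarmDetection:
    """One audio information item in which the alarm sound was detected (hypothetical type)."""
    timestamp_s: float   # acquisition time of the audio information
    azimuth_deg: float   # horizontal direction angle of the alarm sound relative to the user


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, honoring the 0/360 wrap."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def choose_branch(current: AlarmDetection, previous: Optional[AlarmDetection]) -> str:
    """Return 'S405' (plain 3D reminder) or 'S409' (go on to the same-sound check)."""
    # Step S404: was the previous alarm detected within the preset time period?
    if previous is None or current.timestamp_s - previous.timestamp_s > PRESET_PERIOD_S:
        return "S405"
    # Step S408: is the direction difference smaller than the first threshold?
    if angular_difference(current.azimuth_deg, previous.azimuth_deg) >= FIRST_THRESHOLD_DEG:
        return "S405"
    return "S409"


if __name__ == "__main__":
    prev = AlarmDetection(timestamp_s=0.0, azimuth_deg=90.0)
    print(choose_branch(AlarmDetection(10.0, 93.0), prev))  # S409
    print(choose_branch(AlarmDetection(40.0, 93.0), prev))  # S405
```

Treating the threshold as strict follows the text's "smaller than the first threshold".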
  • the mobile phone detects whether the audio information and the previous audio information including the alarm sound belong to the same sound.
  • whether the audio information detected by the mobile phone and the previous audio information containing the alarm sound belong to the same sound means: whether the alarm sound in the audio information detected by the mobile phone is the same as the alarm sound in the previous audio information containing the alarm sound belong to the same alarm sound.
  • the method for detecting whether the alarm sound in the audio information and the alarm sound in the previous audio information containing the alarm sound belong to the same alarm sound may include the following two methods.
  • first method: detect whether the audio information and the previous audio information containing the alarm sound belong to the same sound.
  • second method: extract the alarm sound from the audio information and from the previous audio information containing the alarm sound, respectively, and judge whether the two extracted alarm sounds belong to the same alarm sound.
  • when the alarm sound detection model detects that the audio information contains the alarm sound, it can also obtain the position of the alarm sound within the audio information. This position can therefore be used to extract the alarm sound from the audio information and from the previous audio information containing the alarm sound.
  • the following takes the first method as an example to describe in detail how to determine whether the alarm sound in the audio information and the alarm sound in the previous audio information containing the alarm sound belong to the same alarm sound.
  • the method of judging whether two alarm sounds belong to the same alarm sound can also refer to the following content.
  • the intensities of the two successive alarm sounds may differ, but if they come from the same sound source, their frequencies should be the same. Therefore, in a possible implementation, the magnitude spectrum can be used to judge whether the two audio information items containing the alarm sound belong to the same sound.
  • the specific implementation is as follows:
  • Each audio information containing the warning sound is converted from the time domain to the frequency domain to obtain an amplitude spectrum of each audio information containing the warning sound.
  • the amplitude spectrum of the audio information can be obtained by performing Fourier transform on the audio information including the warning sound.
  • the x-axis of the magnitude spectrum is frequency, and the y-axis is the magnitude of the audio information.
  • a similarity calculation is performed on the two audio information items, and the result indicates whether they belong to the same sound.
  • a Pearson correlation function may be used to perform similarity calculations on two audio messages containing alarm sounds before and after to obtain a similarity value.
  • n sampling points are taken from each of the two successive audio information items containing the alarm sound.
  • denoting the i-th sampling points of the two audio information items as $(X_i, Y_i)$, the Pearson correlation coefficient r is calculated by substituting them into Formula 4:
    $$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$
  • where $\bar{X}$ and $\bar{Y}$ are the means of $X_i$ and $Y_i$ over the n sampling points.
  • the correlation strength of the two audio information containing the warning sound can be judged by Table 1.
  • a threshold can be set based on the relationship between the Pearson correlation coefficient r and the correlation strength given in Table 1, for example 0.8. If the similarity value of the two audio information items containing the alarm sound is greater than the threshold, they belong to the same sound; if it is not greater than the threshold, they do not belong to the same sound.
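The first method of step S409 can be sketched as follows, assuming NumPy. The text does not say exactly which n sampling points enter Formula 4; this sketch assumes they are the bins of the two magnitude spectra computed on a common FFT grid. The threshold 0.8 is the example value from the text; the function names are hypothetical.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.8  # example threshold from the text


def magnitude_spectrum(audio: np.ndarray, n_fft: int) -> np.ndarray:
    """Time domain -> frequency domain; keep only the magnitude of each bin."""
    return np.abs(np.fft.rfft(audio, n_fft))


def pearson(x, y) -> float:
    """Formula 4: Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx * dx) * np.sum(dy * dy)))


def same_sound(audio_a: np.ndarray, audio_b: np.ndarray) -> bool:
    """True if the two clips' magnitude spectra correlate above the threshold."""
    n_fft = max(len(audio_a), len(audio_b))  # same frequency grid for both spectra
    r = pearson(magnitude_spectrum(audio_a, n_fft), magnitude_spectrum(audio_b, n_fft))
    return r > SIMILARITY_THRESHOLD


if __name__ == "__main__":
    fs = 16000
    t = np.arange(1600) / fs
    siren_far = 0.2 * np.sin(2 * np.pi * 1000 * t)   # same source, quieter
    siren_near = 0.9 * np.sin(2 * np.pi * 1000 * t)  # same source, louder
    other = 0.9 * np.sin(2 * np.pi * 3000 * t)       # different sound
    print(same_sound(siren_far, siren_near), same_sound(siren_far, other))  # True False
```

Because the Pearson coefficient is invariant to scaling, a louder repeat of the same alarm still scores close to 1, which matches the reasoning that intensity may change while frequency content does not.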
  • alternatively, whether two audio information items belong to the same sound can be predicted by a classification model, such as a binary classification model, a self-organizing map (SOM) model, or a support vector machine (SVM) model.
  • after training, the classification model takes two input signals, here the two audio information items containing the alarm sound, and predicts whether they belong to the same class, yielding a prediction result.
  • if the prediction result is 1, the two audio information items containing the alarm sound belong to the same sound; if the prediction result is 0, they do not.
  • if the detected audio information belongs to the same sound as the previous audio information containing the alarm sound, step S410 is performed; otherwise, step S405 is performed.
  • if the detected audio information belongs to the same sound as the previous audio information containing the alarm sound, an alarm sound from the same sound source has appeared twice in a row in the same direction relative to the user, so the user needs to be specifically reminded of the alarm sound.
  • steps S403, S404, S408 and S409 are not limited to the execution order shown in FIG. 4 and may be executed in parallel.
  • likewise, steps S404, S408 and S409 may be executed in parallel or in another order than that shown in FIG. 4.
  • the mobile phone generates a distance coefficient, where the distance coefficient is used to represent the energy gain of the audio information relative to the previous audio information including the warning sound.
  • if the energy of the audio information is greater than the energy of the previous audio information containing the alarm sound, the energy gain is positive, that is, the distance coefficient is greater than 1; if the energy of the audio information is smaller than that of the previous audio information containing the alarm sound, the energy gain is negative, that is, the distance coefficient is less than 1; if the two energies are the same, the energy gain is 0, that is, the distance coefficient is 1.
  • the distance coefficient gain can be calculated using Formula 5.
  • k is a constant.
  • the range of the distance coefficient can be set in advance, for example 0.1 to 10. After the distance coefficient is calculated in step S410, it is compared with this range. If the distance coefficient is within the range, the following steps are performed with it. If it is outside the range, the endpoint of the range closest to the generated distance coefficient (that is, the maximum or minimum value of the range) is used as the distance coefficient for the following steps.
  • using the endpoint value when the distance coefficient generated in step S410 is out of range prevents an excessively large or small distance coefficient from making the three-dimensional reminder sound with energy gain generated in the following steps too loud or too quiet.
  • based on the position information of the alarm sound relative to the user and the distance coefficient, the mobile phone processes the standard reminder sound to obtain a three-dimensional reminder sound with energy gain.
  • the method of obtaining the standard reminder sound and determining the HRTF value and the HRIR value is the same as that of the aforementioned step S405, and will not be described here.
  • in one implementation, the standard reminder sound is Fourier-transformed and then multiplied by the HRTF value of the left earphone and the HRTF value of the right earphone corresponding to the position information, respectively, to obtain the binaural output signals, that is, the three-dimensional reminder sound of the left earphone and that of the right earphone. These two signals are then each multiplied by the distance coefficient gain to obtain the three-dimensional reminder sounds with energy gain for the left and right earphones.
  • in another implementation, the standard reminder sound is convolved with the HRIR value of the left earphone and the HRIR value of the right earphone corresponding to the position information, respectively, to obtain the binaural output signals, that is, the three-dimensional reminder sound of the left earphone and that of the right earphone. These two signals are then each multiplied by the distance coefficient gain to obtain the three-dimensional reminder sounds with energy gain for the left and right earphones.
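Steps S410 and S411 can be sketched as: compute a distance coefficient, clamp it to the preset range, and scale both ears' three-dimensional reminder sounds by it. Formula 5 is not reproduced in the text, so the plain energy ratio below is a hypothetical placeholder, chosen only because it is greater than, less than, or equal to 1 in exactly the cases the text lists; the constant k is not modeled. The range 0.1 to 10 is the example from the text, and all function names are illustrative.

```python
import numpy as np

COEF_MIN, COEF_MAX = 0.1, 10.0  # example range of the distance coefficient from the text


def energy(audio) -> float:
    """Signal energy as the sum of squared samples."""
    return float(np.sum(np.square(audio)))


def placeholder_distance_coefficient(current, previous) -> float:
    """HYPOTHETICAL stand-in for Formula 5: the raw energy ratio.

    It is > 1, < 1 or == 1 in the same cases as the text's distance coefficient.
    The previous audio contains an alarm sound, so its energy is assumed non-zero.
    """
    return energy(current) / energy(previous)


def clamp_coefficient(coef: float) -> float:
    """Replace an out-of-range coefficient with the nearest endpoint of the range."""
    return min(max(coef, COEF_MIN), COEF_MAX)


def apply_distance_gain(left_3d, right_3d, coef: float):
    """Scale both ears' three-dimensional reminder sounds by the same clamped gain."""
    gain = clamp_coefficient(coef)
    return np.asarray(left_3d) * gain, np.asarray(right_3d) * gain


if __name__ == "__main__":
    previous = np.array([0.1, -0.2, 0.3])
    current = 2.0 * previous                  # the alarm got louder (source approaching)
    coef = placeholder_distance_coefficient(current, previous)
    print(coef, clamp_coefficient(100.0))     # 4.0 10.0
```

Applying one gain to both channels preserves the interaural level difference produced by the HRTF/HRIR pair, so the perceived direction is unchanged while the loudness tracks the approaching source.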
  • the mobile phone processes the standard reminder sound to obtain a three-dimensional reminder sound with energy gain. If the sound source of the alarm sound keeps approaching the user, the energy of the audio information acquired by the mobile phone the next time is greater than that of the audio information acquired the previous time. The energy gain is therefore positive and the distance coefficient is greater than 1, so the three-dimensional reminder sound with energy gain has more energy than the previous three-dimensional reminder sound, which helps ensure that the user is effectively reminded.
  • the mobile phone sends the three-dimensional reminder sound with energy gain to the noise-canceling earphone.
  • the mobile phone can send the three-dimensional reminder sound of the left earphone with energy gain and the three-dimensional reminder sound of the right earphone with energy gain to the noise-canceling earphone through a connection channel such as Bluetooth.
  • the noise-canceling earphone plays the three-dimensional reminder sound with energy gain.
  • the left earphone of the noise-cancelling earphone outputs the three-dimensional reminder sound with energy gain of the left earphone
  • the right earphone outputs the three-dimensional reminder sound of the right earphone with energy gain.
  • step S404 and steps S408 to S413 are optional. In some embodiments, when there is an alarm sound in the user's environment and the user is reminded of it through the noise-canceling earphone, step S404 and steps S408 to S413 may be skipped, and steps S405 to S407 are executed directly after step S403.
  • Embodiment 1 may also be performed by noise-canceling headphones.
  • in one example, the noise-canceling earphone completely replaces the mobile phone and implements the entire audio information processing method shown in FIG. 4. That is, after the earphone intelligent alarm sound reminder function is activated, the noise-canceling earphone uses its own microphone to collect the sound of the external environment during operation, obtains audio information, and uses the audio information to perform steps S402 to S405, steps S407 to S411, and step S413.
  • the earphone smart reminder alarm sound function is activated, and the microphone array of the mobile phone collects the sound of the external environment to obtain audio information.
  • the noise-canceling earphone uses the audio information to execute steps S402 to S405, steps S407 to S411, and step S413.
  • the user wears noise-cancelling headphones and a smart watch on his wrist, and the mobile phone establishes Bluetooth connections with the smart watch and the noise-cancelling headphones respectively.
  • the noise-canceling earphones and the smart watch can also exchange information through connection channels such as Bluetooth, so as to remind the user when there is a dangerous alarm sound around the user.
  • for details of the noise-canceling earphones and the smart watch, refer to the foregoing content; they are not repeated here.
  • a method for processing audio information provided in this embodiment includes:
  • the smart watch acquires the audio information obtained by the noise-canceling earphone.
  • the microphone of the noise-canceling headset collects the sound of the external environment to obtain audio information
  • the smart watch can obtain audio information through the Bluetooth channel.
  • alternatively, the noise-cancelling earphones can transmit the audio information to the mobile phone through a Bluetooth channel, and the mobile phone then transmits it to the smart watch through a Bluetooth channel.
  • the noise-canceling earphones can also transmit the audio information directly to the smart watch through a Bluetooth channel or a similar connection.
  • the smart watch calls the alarm sound detection model to detect whether the audio information contains the alarm sound, and obtains a detection result, which is used to represent whether the audio information contains the alarm sound.
  • the alarm sound detection model has the function of predicting whether the audio information input to the alarm sound detection model contains the alarm sound. Therefore, after acquiring the audio information of the external environment, the smart watch can use the alarm sound detection model to detect whether the audio information contains an alarm sound.
  • the smart watch pre-selects and stores a trained alarm sound detection model. After the smart watch acquires audio information, it invokes the alarm sound detection model to detect whether the audio information contains an alarm sound, and obtains the detection result.
  • the noise-canceling earphone can also call the alarm sound detection model to detect whether the audio information contains the alarm sound, obtain the detection result, and then transmit the detection result to the smart watch. In this way, the smart watch may not perform step S702.
  • if the detection result indicates that the audio information contains an alarm sound, steps S703 and S704 are executed; if it indicates that the audio information does not contain an alarm sound, the process returns to step S701.
  • the smart watch uses the audio information to locate the alarm sound, and obtains position information of the alarm sound relative to the user.
  • the smart watch can use the sound source localization algorithm based on the microphone array proposed in the foregoing content, and use audio information to perform sound source localization for the warning sound.
  • the smart watch uses the audio information collected by the microphone of the left earphone of the noise-canceling earphone and the audio information collected by the microphone of the right earphone to locate the sound source of the alarm sound, and obtain the position information of the alarm sound relative to the user.
  • the position information generally includes the horizontal direction angle θ of the alarm sound relative to the user.
  • alternatively, the noise-canceling earphone can itself use the audio information collected by the microphones of the left and right earphones and the microphone-array-based sound source localization algorithm described in the foregoing content to localize the alarm sound and obtain its position information relative to the user.
  • the noise-canceling earphones can then transmit the obtained direction angle of the alarm sound relative to the user to the smart watch. In this way, the smart watch may not perform step S703.
  • alternatively, the smart watch can obtain the audio information collected by the built-in microphone array of the mobile phone, for example through a Bluetooth channel.
  • the smart watch uses the audio information collected by the microphone array to locate the sound source of the alarm sound and obtain the position information of the alarm sound relative to the user.
  • for how the smart watch uses the audio information collected by the microphone array of the mobile phone to locate the sound source of the alarm sound and obtain the position information of the alarm sound relative to the user, refer to step S403 in Embodiment 1; the details are not repeated here.
  • the smart watch detects whether the audio information and the previous audio information including the alarm sound are obtained within a preset time period.
  • if they were not obtained within the preset time period, steps S705 to S707 are performed; if they were obtained within the preset time period, steps S708 to S713 are performed.
  • based on the position information of the alarm sound relative to the user, the smart watch processes the standard reminder sound to obtain a three-dimensional reminder sound.
  • the user can pre-set the standard reminder sound for the alarm reminder by using the alarm sound selection method described in the foregoing content.
  • the mobile phone can display the alarm sound selection interface shown in FIG. 3b to prompt the user to set the standard reminder sound.
  • the smart watch also pre-stores a plurality of head-related transfer function (HRTF) values, usually set in pairs for the left and right earphones. That is, the HRTF values are divided into HRTF values of the left earphone and, for each of them, a corresponding HRTF value of the right earphone.
  • the HRTF values of a pair of left and right earphones respectively correspond to an angle value of an alarm sound relative to the user.
  • the smart watch can also pre-store multiple Head Related Inpulse Response (HRIR) values; wherein, multiple Head Related Inpulse Response (HRIR) values are usually paired according to the left and right earphones. set up. That is, the multiple HRIR values are divided into multiple HRIR values of the left earphone and HRIR values of the right earphone corresponding to each HRIR value of the left earphone.
  • the HRIR values of a pair of left and right earphones respectively correspond to an angle value of an alarm sound relative to the user.
  • step S705 to process the standard reminder sound to obtain the three-dimensional reminder sound can be the same as the two possible implementation manners of step S405 in the first embodiment above, which will not be repeated here.
  • The smart watch sends the three-dimensional reminder sound to the noise-canceling earphones.
  • In some embodiments, the smart watch sends the left-earphone and right-earphone three-dimensional reminder sounds directly to the noise-canceling earphones.
  • In other embodiments, the smart watch sends the left-earphone and right-earphone three-dimensional reminder sounds to the noise-canceling earphones through the mobile phone.
  • The noise-canceling earphones play the three-dimensional reminder sound.
  • The left earphone of the noise-canceling earphones outputs the left-earphone three-dimensional reminder sound.
  • The right earphone outputs the right-earphone three-dimensional reminder sound.
  • When an alarm sound is detected in the audio of the external environment, the smart watch localizes it using the audio information and obtains the position information of the alarm sound relative to the user. Based on that position, it processes the standard reminder sound to obtain a three-dimensional reminder sound, which the noise-canceling earphones then play. This reminds the user that there is an alarm sound nearby and a safety risk.
  • The smart watch determines whether the difference between the position information of the alarm sound relative to the user in the audio information and that in the previous audio information containing the alarm sound is within a preset range.
  • If the smart watch determines that the difference is not within the preset range, step S705 is performed.
  • If the difference is within the preset range, step S709 is performed.
  • For how the smart watch performs step S708, refer to step S408 in Embodiment 1; details are not repeated here.
  • The smart watch detects whether the audio information and the previous audio information containing the alarm sound belong to the same sound.
  • For how the smart watch performs this check, refer to step S409 in Embodiment 1; details are not repeated here.
  • If the audio information belongs to the same sound as the previous audio information containing the alarm sound, step S710 is performed; otherwise, step S705 is performed.
  • The smart watch generates a distance coefficient that represents the energy gain of the audio information relative to the previous audio information containing the alarm sound.
  • For how the smart watch generates the distance coefficient, refer to step S410 in Embodiment 1; details are not repeated here.
  • Based on the position information of the alarm sound relative to the user and the distance coefficient, the smart watch processes the standard reminder sound to obtain a three-dimensional reminder sound with energy gain.
  • For how the smart watch obtains the three-dimensional reminder sound with energy gain, refer to step S411 in Embodiment 1; details are not repeated here.
  • The smart watch sends the three-dimensional reminder sound with energy gain to the noise-canceling earphones.
  • In some embodiments, the smart watch sends the left-earphone and right-earphone three-dimensional reminder sounds with energy gain directly to the noise-canceling earphones.
  • In other embodiments, the smart watch sends them to the noise-canceling earphones through the mobile phone.
  • The noise-canceling earphones play the three-dimensional reminder sound with energy gain.
  • The left earphone of the noise-canceling earphones outputs the left-earphone three-dimensional reminder sound with energy gain.
  • The right earphone outputs the right-earphone three-dimensional reminder sound with energy gain.
  • Another embodiment of this application further provides a computer-readable storage medium storing instructions. When the instructions are run on a computer or processor, the computer or processor performs one or more steps of any of the foregoing methods.
  • Another embodiment of this application further provides a computer program product including instructions.
  • When the computer program product runs on a computer or processor, the computer or processor performs one or more steps of any of the foregoing methods.
  • Another embodiment of this application further provides an audio processing system. The system includes an electronic device, such as a mobile phone or a smart watch, and earphones, which may be noise-canceling earphones. For the working processes of the electronic device and the earphones, refer to Embodiment 1 and Embodiment 2 above; details are not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Otolaryngology (AREA)
  • Telephone Function (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

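An audio information processing method, an electronic device, a system, a computer program product, and a computer-readable storage medium. The method includes the following steps:
- acquiring audio information, obtained by collecting the sound of the environment in which the electronic device is located;
- determining that the audio information includes an alarm sound;
- determining first position information of the alarm sound based on the audio information;
- determining a first sound that includes second position information, where both the first and second position information identify the sound source direction of the alarm sound; and
- playing the first sound.
The method acquires audio information from the sound of the environment in which the electronic device is located. When that audio information includes an alarm sound, the method plays a first sound that identifies the alarm sound's source direction. As a result, when an alarm sound occurs near the user, the user is alerted by the played first sound even while wearing earphones.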

Description

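Audio information processing method, electronic device, system, product, and medium
This application claims priority to Chinese Patent Application No. 202111248720.6, filed with the China National Intellectual Property Administration on October 26, 2021 and entitled "Audio information processing method, electronic device, system, product, and medium", which is incorporated herein by reference in its entirety.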
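Technical Field
This application relates to the field of audio processing technologies, and in particular, to an audio information processing method, an electronic device, a system, a computer program product, and a computer-readable storage medium.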
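Background
When a user wears noise-canceling earphones outdoors, the earphones block out the sounds around the user. Suppose a danger arises nearby, for example a vehicle approaching from behind. Because the earphones block ambient sound, the user cannot hear the vehicle's horn, which creates a safety problem.
Some noise-canceling earphones have a transparency mode, in which ambient sound is not completely blocked. Even so, the user's safety still cannot be ensured when the surroundings are noisy.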
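Summary
This application provides an audio information processing method, an electronic device, a computer program product, and a computer-readable storage medium, so that a user wearing noise-canceling earphones can still be reminded of an alarm sound in the surroundings.
To achieve the foregoing objective, this application provides the following technical solutions:
According to a first aspect, this application provides an audio information processing method applied to an electronic device. The method includes the following steps:
- acquiring audio information, obtained by collecting the sound of the environment in which the electronic device is located;
- determining that the audio information includes an alarm sound;
- determining first position information of the alarm sound based on the audio information;
- determining a first sound that includes second position information, where both the first and second position information identify the sound source direction of the alarm sound, and the second position information may be the same as or different from the first position information; and
- playing the first sound.
In this method, the first and second position information may refer either to the position of the alarm sound relative to the user or to the absolute position of the alarm sound. "The same" means the two values are equal. "Different" means the two values are approximately equal or within a certain range of each other. If both are angle values, their difference is within a certain angular range, for example 1°.
In short, the method acquires audio information from the sound of the environment in which the electronic device is located. When that audio information includes an alarm sound, the method plays a first sound that identifies the alarm sound's source direction. As a result, when an alarm sound occurs near the user, the user is alerted by the played first sound even while wearing earphones.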
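In a possible implementation, before the first sound including the second position information is determined, the method further includes: determining that the audio information and the previous audio information containing an alarm sound were not acquired within a preset time period.
In a possible implementation, the method further includes the following steps:
- determining that the audio information and the previous audio information containing an alarm sound were acquired within the preset time period;
- determining that the difference between the first position information of the alarm sound in the audio information and that in the previous audio information containing an alarm sound is within a preset range;
- detecting that the alarm sound in the audio information and the alarm sound in the previous audio information containing an alarm sound belong to the same sound;
- generating a distance coefficient, which represents the energy gain of the audio information relative to the previous audio information containing an alarm sound;
- determining a second sound that includes the second position information and the energy gain; and
- playing the second sound.
In this possible implementation, suppose the difference between the first position information of the two alarm sounds is within the preset range and the two alarm sounds belong to the same sound. This indicates that two consecutive alarm sounds have occurred around the user. Therefore, a second sound is played that identifies the sound source direction of the alarm sound and carries the energy gain, so that the user is emphatically reminded.
In a possible implementation, playing the first sound includes: sending the first sound to earphones, which play the first sound.
In a possible implementation, playing the second sound includes: sending the second sound to the earphones, which play the second sound.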
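In a possible implementation, determining the first position information of the alarm sound based on the audio information includes: localizing the alarm sound from the audio information using a microphone-array sound source localization algorithm, to obtain the first position information of the alarm sound.
In a possible implementation, determining the first position information of the alarm sound based on the audio information includes the following steps:
- determining third position information of the alarm sound based on the audio information, where the third position information identifies the sound source direction of the alarm sound relative to the electronic device; and
- performing coordinate conversion on the third position information to obtain the first position information of the alarm sound.
In a possible implementation, determining the first sound including the second position information includes: acquiring a standard sound, and processing it based on the first position information of the alarm sound to obtain the first sound, which includes the second position information.
In a possible implementation, processing the standard sound based on the first position information of the alarm sound to obtain the first sound includes: acquiring the head-related impulse response (HRIR) values corresponding to the first position information, and convolving the standard sound with each of the HRIR values to obtain the first sound.
In a possible implementation, processing the standard sound based on the first position information of the alarm sound to obtain the first sound includes: acquiring the head-related transfer function (HRTF) values corresponding to the first position information, Fourier-transforming the standard sound, and multiplying the result by the HRTF values to obtain the first sound.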
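In a possible implementation, detecting whether the alarm sound in the audio information and the alarm sound in the previous audio information containing an alarm sound belong to the same sound includes the following steps:
- converting both pieces of audio information from the time domain to the frequency domain to obtain their amplitude spectra; and
- computing the similarity between them from these amplitude spectra, to obtain a calculation result indicating whether they belong to the same sound.
In a possible implementation, computing this similarity from the amplitude spectra includes: computing a similarity value with the Pearson correlation function. If the similarity value is greater than a threshold, the two pieces of audio information belong to the same sound. If it is not greater than the threshold, they do not.
In a possible implementation, computing this similarity from the amplitude spectra includes: using a classification model to predict whether the two pieces of audio information belong to the same sound.
In a possible implementation, the same check includes: extracting the alarm sound from each of the two pieces of audio information, and determining whether the two extracted alarm sounds are the same alarm sound.
In a possible implementation, determining whether the two extracted alarm sounds are the same alarm sound includes the following steps:
- converting each extracted alarm sound from the time domain to the frequency domain to obtain its amplitude spectrum; and
- computing the similarity between the two extracted alarm sounds from these amplitude spectra, to obtain a calculation result indicating whether they are the same alarm sound.
In a possible implementation, computing this similarity includes: computing a similarity value with the Pearson correlation function. If the similarity value is greater than a threshold, the two extracted alarm sounds are the same alarm sound. If it is not greater than the threshold, they are not.
In a possible implementation, computing this similarity includes: using a classification model to predict whether the two extracted alarm sounds are the same alarm sound.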
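In a possible implementation, after the distance coefficient is generated, the method further includes: determining that the distance coefficient is within a distance coefficient range.
In a possible implementation, the method further includes the following steps:
- determining that the distance coefficient falls outside the distance coefficient range;
- determining a third sound that includes the second position information and the energy gain represented by an endpoint value of the range; and
- playing the third sound.
In this possible implementation, when the distance coefficient falls outside its range, the endpoint value of the range is used as the distance coefficient to determine and play the third sound. This prevents an excessively large or small coefficient from making the sound with energy gain too loud or too quiet.
In a possible implementation, determining that the audio information includes an alarm sound includes: invoking an alarm sound detection model to detect whether the audio information contains an alarm sound, to obtain a detection result indicating whether it does.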
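According to a second aspect, this application provides an electronic device, including one or more processors, a memory, and a wireless communication module. The memory and the wireless communication module are coupled to the one or more processors. The memory stores computer program code, which includes computer instructions. When the one or more processors execute the computer instructions, the electronic device performs the audio information processing method according to any implementation of the first aspect.
According to a third aspect, this application provides a computer storage medium configured to store a computer program. When executed, the computer program implements the audio information processing method according to any implementation of the first aspect.
According to a fourth aspect, this application provides a computer program product. When the computer program product runs on a computer, the computer performs the audio information processing method according to any implementation of the first aspect.
According to a fifth aspect, this application provides an audio information processing system, including an electronic device and earphones. The electronic device is configured to perform the audio information processing method according to any implementation of the first aspect. The earphones are configured to interact with the electronic device and, in response to it, play the first sound, the second sound, or the third sound.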
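Brief Description of the Drawings
FIG. 1 is a diagram of an application scenario according to an embodiment of this application;
FIG. 2a is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 2b is a software architecture diagram of an electronic device according to an embodiment of this application;
FIG. 3a shows noise-canceling earphones according to an embodiment of this application;
FIG. 3b shows an interface according to an embodiment of this application;
FIG. 3c is a schematic diagram of the generalized cross-correlation time delay estimation algorithm according to an embodiment of this application;
FIG. 4 is a sequence diagram of an audio information processing method according to Embodiment 1 of this application;
FIG. 5 shows the position information of an alarm sound relative to a user according to an embodiment of this application;
FIG. 6 is a diagram of another application scenario according to an embodiment of this application;
FIG. 7 is a sequence diagram of an audio information processing method according to Embodiment 2 of this application.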
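Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of this application with reference to the accompanying drawings. The terms used in the following embodiments are merely intended to describe specific embodiments and are not intended to limit this application. As used in the specification and the appended claims, the singular expressions "a", "an", "the", "the foregoing", "this", and "the one" also include expressions such as "one or more", unless the context clearly indicates otherwise. In the embodiments of this application, "one or more" means one, two, or more than two. "And/or" describes an association between objects and indicates that three relationships may exist. For example, "A and/or B" may mean: only A exists, both A and B exist, or only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
Reference in this specification to "one embodiment", "some embodiments", or the like means that a specific feature, structure, or characteristic described with that embodiment is included in one or more embodiments of this application. The phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like in different places therefore do not necessarily refer to the same embodiment. Instead, they mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
"A plurality of" in the embodiments of this application means two or more. In the description of the embodiments, terms such as "first" and "second" are used only to distinguish descriptions. They do not indicate or imply relative importance or order.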
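In the application scenario shown in FIG. 1, when a user wears noise-canceling earphones outdoors, the earphones block the sounds around the user. Suppose a danger arises nearby, for example the vehicle approaching from behind the user in FIG. 1. Because the earphones block ambient sound, the user cannot hear the vehicle's horn, which creates a safety problem.
Some noise-canceling earphones have a transparency mode, in which ambient sound is not completely blocked. Even so, the user's safety still cannot be ensured when the surroundings are noisy.
To address this problem, the embodiments of this application provide an audio information processing method that can be applied to the scenario shown in FIG. 1. In this scenario, the mobile phone and the noise-canceling earphones interact to remind the user when a dangerous alarm sound occurs nearby.
FIG. 2a shows an example composition of an electronic device according to an embodiment of this application, and the mobile phone in this scenario also has this structure. The alarm sound reminder need not be implemented by the mobile phone. Another electronic device can interact with the noise-canceling earphones instead, with the same component structure as shown in FIG. 2a. Examples include:
- a tablet computer;
- a desktop, laptop, or notebook computer;
- an ultra-mobile personal computer (UMPC);
- a handheld computer or a netbook;
- a personal digital assistant (PDA); or
- a wearable electronic device.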
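The electronic device 200 may include a processor 210, an external memory interface 220, an internal memory 221, a display 230, an antenna 1, an antenna 2, a mobile communication module 240, a wireless communication module 250, an audio module 260, and the like.
The structure illustrated in this embodiment does not constitute a specific limitation on the electronic device. In some other embodiments, the electronic device may include more or fewer components than shown, combine or split some components, or arrange components differently. The illustrated components may be implemented in hardware, software, or a combination of both.
The processor 210 may include one or more processing units. For example, it may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors.
A memory may also be disposed in the processor 210 to store instructions and data. In some embodiments, this memory is a cache that holds instructions or data the processor 210 has just used or uses cyclically. If the processor 210 needs them again, it can call them directly from this memory. This avoids repeated access and reduces the processor's waiting time, improving system efficiency.
The external memory interface 220 can connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device. The card communicates with the processor 210 through the external memory interface 220 to store data, for example music and video files.
The internal memory 221 stores computer-executable program code, which includes instructions. The processor 210 runs these instructions to execute the various functional applications and data processing of the electronic device 200. The internal memory 221 may include a program storage area and a data storage area:
- The program storage area may store an operating system and applications required by at least one function, such as sound or image playback.
- The data storage area may store data created while the device is used, such as audio data and a phone book.
The internal memory 221 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS). The processor 210 executes functional applications and data processing by running instructions stored in the internal memory 221 and/or in the memory disposed in the processor.
The electronic device implements its display function through the GPU, the display 230, the application processor, and the like. The GPU is a microprocessor for image processing that connects the display 230 and the application processor, and performs mathematical and geometric calculations for graphics rendering. The processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
The electronic device may implement a photographing function through the ISP, the camera, the video codec, the GPU, the display 230, the application processor, and the like.
The wireless communication function of the electronic device may be implemented through the antenna 1, the antenna 2, the mobile communication module 240, the wireless communication module 250, the modem processor, the baseband processor, and the like.
The mobile communication module 240 provides wireless communication solutions applied to the electronic device, including 2G/3G/4G/5G. Incoming electromagnetic waves are received through the antenna 1, filtered and amplified, and passed to the modem processor for demodulation. Outgoing signals modulated by the modem processor are amplified and radiated as electromagnetic waves through the antenna 1.
The wireless communication module 250 provides wireless communication solutions applied to the electronic device, including:
- wireless local area networks (WLAN), such as wireless fidelity (Wi-Fi) networks;
- Bluetooth (BT);
- global navigation satellite system (GNSS);
- frequency modulation (FM);
- near field communication (NFC); and
- infrared (IR).
The module may be one or more devices integrating at least one communication processing module. Incoming electromagnetic waves are received through the antenna 2, frequency-modulated and filtered, and sent to the processor 210. Outgoing signals from the processor 210 are frequency-modulated, amplified, and radiated as electromagnetic waves through the antenna 2.
In some embodiments, the Bluetooth module in the wireless communication module 250 provides short-range communication between the electronic device 200 and other electronic devices. For example, the electronic device 200 and the noise-canceling earphones interact through the Bluetooth module. The Bluetooth module may be an integrated circuit, a Bluetooth chip, or the like.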
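The electronic device 200 may implement audio functions, such as music playback and recording, through the audio module 260, the speaker 270A, the receiver 270B, the microphone 270C, the headset jack 270D, the application processor, and the like.
The audio module 260 converts digital audio information into an analog audio signal for output, and converts analog audio input into a digital audio signal. It may also encode and decode audio signals. In some embodiments, the audio module 260, or some of its functional modules, may be disposed in the processor 210.
The speaker 270A, also called a "loudspeaker", converts an audio electrical signal into a sound signal. The electronic device 200 can play music or a hands-free call through the speaker 270A.
In some embodiments, the speaker 270A may be used to play the three-dimensional reminder sound mentioned in the embodiments of this application.
The receiver 270B, also called an "earpiece", converts an audio electrical signal into a sound signal. When the electronic device 200 answers a call or a voice message, the user can listen by placing the receiver 270B close to the ear.
The microphone 270C, also called a "mic" or "sound transducer", converts a sound signal into an electrical signal. When making a call or sending a voice message, the user speaks close to the microphone 270C to input a sound signal. The electronic device 200 may be provided with the following microphone configurations:
- At least one microphone 270C.
- In some other embodiments, two microphones 270C, which can reduce noise in addition to collecting sound signals.
- In some other embodiments, three, four, or more microphones 270C forming a microphone array. The array can collect sound signals, reduce noise, identify sound sources, and support directional recording.
In some embodiments, the microphone 270C collects the sound of the external environment in which the electronic device is located.
In addition, an operating system such as iOS, Android, or Windows runs on the foregoing components, and applications can be installed and run on it.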
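FIG. 2b is a block diagram of the software structure of the electronic device according to an embodiment of this application.
A layered architecture divides software into several layers, each with a clear role and division of labor, and the layers communicate through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom:
- the application layer;
- the application framework layer;
- the Android runtime and system libraries; and
- the kernel layer.
The application layer may include a series of application packages. As shown in FIG. 2b, these may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, and Bluetooth.
The application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer, and includes some predefined functions. As shown in FIG. 2b, it may include a window manager, a content provider, a telephony manager, a resource manager, a notification manager, a view system, and the like.
The window manager manages window programs. It can obtain the display size, determine whether there is a status bar, lock the screen, take screenshots, and the like.
The content provider stores and retrieves data and makes the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, a phone book, and the like.
The telephony manager provides the communication functions of the electronic device, for example call state management such as connecting and hanging up.
The resource manager provides various resources for applications, such as localized strings, icons, images, layout files, and video files.
The notification manager lets an application display notification information in the status bar. It can convey notification-type messages that disappear automatically after a short time without user interaction, for example download-complete notices or message reminders. It can also present notifications in the following forms:
- a chart or scrolling text in the status bar at the top of the system, such as notifications from applications running in the background;
- a dialog window on the screen; or
- other prompts, such as text in the status bar, a prompt tone, vibration of the electronic device, or a flashing indicator light.
The view system includes visual controls, such as controls that display text or images, and can be used to build applications. A display interface may consist of one or more views. For example, a display interface with an SMS notification icon may include a view that displays text and a view that displays an image.
The Android runtime includes core libraries and a virtual machine and is responsible for scheduling and managing the Android system. In some embodiments of this application, an application cold start runs in the Android runtime. From it, the Android runtime obtains the optimized-file state parameters of the application. It then uses these parameters to determine whether the optimized file has become outdated because of a system upgrade, and returns the result to the application management and control module.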
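The core libraries consist of two parts: functions that the Java language needs to call, and the Android core libraries.
The application layer and the application framework layer run in the virtual machine, which executes their Java files as binary files. The virtual machine handles object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example a surface manager, media libraries, a three-dimensional graphics processing library (such as OpenGL ES), and a two-dimensional graphics engine (such as SGL).
The surface manager manages the display subsystem and blends 2D and 3D layers for multiple applications.
The media libraries support playback and recording of many common audio and video formats, as well as still image files. They can support multiple audio and video coding formats, such as MPEG2, H.262, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphics processing library implements three-dimensional graphics drawing, image rendering, compositing, layer processing, and the like.
The two-dimensional graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. It contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
Although the embodiments of this application use the Android system as an example, the basic principles also apply to electronic devices based on operating systems such as iOS and Windows.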
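In the embodiments of this application, the noise-canceling earphones are typically Bluetooth earphones, that is, earphones that support a Bluetooth communication protocol. The protocol may be:
- the classic Bluetooth EDR protocol;
- the classic Bluetooth BR protocol;
- the Bluetooth Low Energy (BLE) protocol; or
- another Bluetooth protocol type introduced in the future.
In terms of version, the protocol may belong to the 1.0, 2.0, 3.0, or 4.0 series, or to another series introduced in the future.
The Bluetooth earphones in the embodiments of this application generally consist of a left earphone and a right earphone and provide stereo sound. Common paired Bluetooth earphones are either conventional in-ear Bluetooth earphones or true wireless stereo (TWS) Bluetooth earphones:
- Conventional in-ear Bluetooth earphones have no cable to the audio source, but the left and right earphones still need a connecting cable to synchronize audio signals.
- TWS Bluetooth earphones have neither the cable to the audio source nor the cable between the left and right earphones.
The left and right earphones each contain a Bluetooth module and can exchange data through a Bluetooth protocol. Both also include microphones. In other words, the primary and secondary earphones can collect audio as well as play it.
The Bluetooth earphones in the embodiments of this application may support one or more of the following profiles: HSP (Headset Profile), HFP (Hands-free Profile), A2DP (Advanced Audio Distribution Profile), and AVRCP (Audio/Video Remote Control Profile).
HSP is the headset profile. It provides the basic functions required for communication between the electronic device and the earphones, and lets the Bluetooth earphones act as the audio input and output interface of the electronic device.
HFP is the hands-free profile. It adds extended functions to HSP that let the Bluetooth earphones control the terminal's calls, for example answering, hanging up, rejecting, and voice dialing.
A2DP is the advanced audio distribution profile. It can use the chip in the earphones to stack data to achieve high-definition sound.
AVRCP is the audio/video remote control profile. It defines how to control streaming media, including pause, stop, start playback, volume control, and other types of remote control operations.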
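In the embodiments of this application, the noise-canceling earphones may also be provided with an enable button for the smart earphone alarm sound reminder function. In the example shown in FIG. 3a, an enable button 101 is provided on the right earphone and has a first position 11 and a second position 22. In the first position 11, the smart earphone alarm sound reminder function is enabled; in the second position 22, it is disabled.
The enable button for this function may be the same button used for another function of the noise-canceling earphones, or a separate button.
After the function is enabled, when the noise-canceling earphones determine that there is an alarm sound around the user, they can play an alarm sound to the user, and the type of alarm sound played can be configured. In the example of FIG. 3a, an alarm sound selection button 102 is provided on the left earphone, and triggering it selects the alarm sound.
In some embodiments, when the user taps the alarm sound selection button 102, the noise-canceling earphones respond with a voice announcement that guides alarm sound selection. In one example, there are three alarm sound modes:
- default alarm sound, set by the system;
- smart recommended alarm sound, which can differ according to the running state of the noise-canceling earphones; and
- manually selected alarm sound, where the user taps the alarm sound selection button 102 to choose among alarm sounds, such as the horns of different types of vehicles.
FIG. 3a uses a headband Bluetooth headset as an example, which does not limit the Bluetooth earphones in the embodiments of this application. In addition, the enable button 101 and the alarm sound selection button 102 shown in FIG. 3a are physical buttons. In some embodiments, they may be virtual buttons instead.
A virtual button on the left or right earphone can be triggered to enable or disable the smart earphone alarm sound reminder function. The trigger can take various forms, for example:
- touches of different durations, in some embodiments;
- different numbers of touches, in other embodiments; or
- touches at different positions, in still other embodiments.
Similarly, a virtual button on the left or right earphone can be used to select different alarm sounds. The trigger can again take various forms:
- touches of different durations, in some embodiments;
- different numbers of touches, in other embodiments; or
- touches at different positions, in still other embodiments.
The electronic device can also enable and disable the smart earphone alarm sound reminder function and select different alarm sounds.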
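In the example of FIG. 3b, the Bluetooth earphone settings interface of the electronic device presents four items: smart earphone alarm sound reminder, active noise cancellation, gestures, and alarm sound selection. The user enables each item's function through its enable button. In the example shown in FIG. 3b, the smart earphone alarm sound reminder is enabled and the other three items are disabled.
When the smart earphone alarm sound reminder is enabled, noise-canceling earphones connected to the mobile phone over Bluetooth can interact with the phone to remind the user when a dangerous alarm sound occurs nearby.
When alarm sound selection is enabled, the user can manually choose which alarm sound is used for such reminders.
Alarm sound selection is an item with a sub-interface. When the user slides or taps its enable button, the alarm sound selection function is enabled and its sub-interface is displayed. In the example of FIG. 3b, the sub-interface shows four modes:
- default alarm sound;
- smart recommended alarm sound;
- custom, in which the user can edit a custom alarm sound; and
- manually selected alarm sound.
The default, smart recommended, and manually selected alarm sounds are as described above. In the example shown in FIG. 3b, the default alarm sound is enabled and the other three modes are disabled.
If the user enables manual alarm sound selection by sliding or tapping its enable button, the manual selection sub-interface is displayed, as in the example of FIG. 3b. In this example, the sub-interface lists the alarm sounds of four vehicles, and the user selects one through each vehicle's enable button. In the example shown in FIG. 3b, vehicle 1 is enabled and the other three vehicles are disabled.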
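The electronic devices described above, such as the mobile phone, and the noise-canceling earphones may also be provided with an alarm sound detection model. This model predicts whether audio information input to it contains an alarm sound. It may be based on a basic network model such as a convolutional neural network (CNN) or a long short-term memory (LSTM) network.
A convolutional neural network usually includes an input layer, a convolution layer, a pooling layer, a fully connected (FC) layer, and an output layer. Generally, the first layer is the input layer and the last layer is the output layer.
A convolution layer is a layer of neurons that convolves the input signal. In a convolution layer, a neuron may be connected to only some neurons of the adjacent layer. A convolution layer usually contains several feature planes, each consisting of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and these shared weights are the convolution kernel.
A pooling layer usually follows a convolution layer, whose output features have a large dimension. The pooling layer divides these features into several regions and takes the maximum or average of each region, producing new features of smaller dimension.
A fully connected layer combines all local features into global features and computes the final score of each class.
A long short-term memory (LSTM) network usually includes an input layer, hidden layers, and an output layer:
- The input layer consists of at least one input node.
- In a unidirectional LSTM, the hidden layers include only a forward hidden layer. In a bidirectional LSTM, they include a forward and a backward hidden layer.
- Each input node is connected to the forward and backward hidden-layer nodes and passes the input data to each of them.
- The hidden nodes in each hidden layer are connected to the output nodes and pass their computation results to them.
- The output nodes compute from the outputs of the hidden layers and output the data.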
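The alarm sound detection model can be trained as follows:
Construct an original alarm sound detection model, which may be based on a basic network model such as a CNN or an LSTM.
Obtain a large number of training samples, some containing an alarm sound and some not, each labeled with whether it contains one. The alarm sound in the training samples may be, for example, a vehicle horn. To let the trained model detect a wider variety of alarm sounds, samples can include the horns of different motor vehicles, such as cars and motorcycles, and other alarm sounds such as sirens. A siren is the warning sound of a special vehicle, such as an ambulance, police car, or fire engine, when it is in operation.
Input the training samples into the original model, which detects whether each sample contains an alarm sound and outputs a detection result.
Use a loss function to compute the loss between the detection results and the label of each training sample, giving the loss value of the model. In some embodiments, a cross-entropy loss, a weighted loss, or another loss function may be used, or several loss functions may be combined to compute several loss values.
Determine whether the loss value of the model meets the convergence condition.
In some embodiments, the convergence condition is that the loss value is less than or equal to a preset loss threshold. If the loss value is greater than the threshold, the condition is not met. If it is less than or equal to the threshold, the condition is met.
When there are multiple training samples, a loss value may be computed for each training sample. In this case, the model is regarded as converged only when every training sample's loss value meets the convergence condition. If any training sample's loss value does not meet it, the following update step is performed.
If the loss value meets the convergence condition, training is complete. The trained model can then be used in the audio information processing method of the following embodiments to detect whether input audio information contains an alarm sound.
If the loss value does not meet the convergence condition, parameter update values are computed from the loss value and used to update the original model. The updated model then processes the training samples again to produce new detection results. This process repeats until the loss value meets the convergence condition.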
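The loss-threshold training loop described above can be sketched as follows. This is only a minimal illustration under assumptions. A toy logistic-regression classifier on made-up feature clusters stands in for the CNN or LSTM alarm sound detection model. The model loss is the mean cross-entropy over all samples, not the per-sample check mentioned above. The names `train_until_converged`, `loss_threshold`, and `lr` are hypothetical.

```python
import numpy as np


def train_until_converged(features, labels, loss_threshold=0.05, lr=0.5, max_iters=10_000):
    """Gradient descent on a logistic-regression model until the loss meets the threshold.

    Returns (weights, bias, final_loss, iterations_used).
    """
    w = np.zeros(features.shape[1])
    b = 0.0
    eps = 1e-12  # avoids log(0)
    loss = float("inf")
    for it in range(max_iters):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # detection result per sample
        # Cross-entropy between detection results and labels (1 = contains an alarm sound)
        loss = -np.mean(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
        if loss <= loss_threshold:  # convergence condition: loss <= preset loss threshold
            return w, b, loss, it
        # Not converged: compute parameter updates from the loss gradient and continue
        grad = probs - labels
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b, loss, max_iters
```

In a real detector, the features would be derived from audio (for example, spectrogram frames), and the parameter updates would come from backpropagation through the network. The stopping logic stays the same.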
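In the embodiments of this application, sounds can be localized using, for example, a microphone-array sound source localization algorithm. Common sound source localization algorithms fall into three main categories:
- localization based on high-resolution spectral estimation;
- localization based on steerable beamforming; and
- localization based on time difference of arrival (TDOA).
TDOA-based sound localization is simple in principle and has two parts: delay estimation and source localization. Delay estimation computes the time difference of arrival between the signals from two microphones. Source localization then computes, from that time difference, the angle from which the sound arrives.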
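As a worked example of the second part, turning a time difference into an angle, the usual far-field approximation for two microphones a distance d apart is sin θ = cτ/d. Here θ is measured from the broadside of the microphone pair. The sketch below assumes this plane-wave model and uses the 344 m/s speed of sound quoted later in this document. The function name and the 0.18 m spacing in the usage note are illustrative assumptions, not values from the application.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, room-temperature value used elsewhere in this document


def tdoa_to_angle(tau_s: float, mic_spacing_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Far-field arrival angle in degrees (0 = broadside) from the TDOA between two mics.

    Under a plane-wave assumption, the extra path length c * tau equals
    mic_spacing * sin(theta), so theta = asin(c * tau / mic_spacing).
    """
    ratio = c * tau_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # noisy delay estimates can slightly exceed |1|
    return math.degrees(math.asin(ratio))
```

For a hypothetical 0.18 m spacing, a delay of 0.18 × sin(30°) / 344 s maps back to 30°. A single microphone pair cannot distinguish front from back, which is one reason arrays with more microphones are used.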
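Delay estimation algorithms mainly fall into three groups: those based on correlation analysis, on phase-spectrum estimation, and on parameter estimation. The most widely used is the generalized cross-correlation (GCC) method, a correlation-analysis approach. GCC applies a weighting function to the cross-power spectral density to improve delay estimation. Different weighting functions give different GCC variants, of which generalized cross-correlation with phase transform (GCC-PHAT) is the most widely used.
The GCC delay estimation algorithm estimates the delay from the peak of the cross-correlation function of two microphone signals. In a sound source localization system, every element of the microphone array receives the target signal from the same source, so the channel signals are strongly correlated. Ideally, computing the correlation function between each pair of signals determines the delay between the two microphones' observations.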
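The signals $x_1(t)$ and $x_2(t)$ received by two microphones in the array are given by Formula 1:
Formula 1
$$x_1(t) = a_1 s(t - \tau_1) + n_1(t)$$
$$x_2(t) = a_2 s(t - \tau_2) + n_2(t)$$
Here $t$ is time, $s(t)$ is the source signal, and $a_1$ and $a_2$ are the amplitude attenuation factors of the two propagation paths. $n_1(t)$ and $n_2(t)$ are ambient noise. $\tau_1$ and $\tau_2$ are the propagation times of the signal from the source to the two microphone elements.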
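FIG. 3c shows the principle of the generalized cross-correlation delay estimation algorithm.
In FIG. 3c:
- $X_1(\omega)$ and $X_2(\omega)$ are the Fourier transforms (FFT) of $x_1(t)$ and $x_2(t)$, respectively.
- $\omega$ is the angular frequency of the microphone signals.
- $(\cdot)^*$ denotes taking the complex conjugate of $X_2(\omega)$.
- $\varphi(\omega)$ is the phase transform weighting function, applied to the product of $X_1(\omega)$ and the conjugate of $X_2(\omega)$.
The weighted product is transformed back with an inverse Fourier transform (IFFT), and peak detection on the result gives $\tau_{12}$:
$$\hat{\tau}_{12} = \arg\max_{\tau} \, \mathcal{F}^{-1}\left\{ \varphi(\omega) X_1(\omega) X_2^*(\omega) \right\}(\tau), \qquad \varphi(\omega) = \frac{1}{\left| X_1(\omega) X_2^*(\omega) \right|}$$
Here $\tau_{12} = \tau_1 - \tau_2$ is the time difference between the two microphone signals.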
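The GCC-PHAT chain of FIG. 3c can be sketched with NumPy as follows. The steps are: FFT, multiplication of $X_1(\omega)$ by the conjugate of $X_2(\omega)$, phase-transform weighting, IFFT, and peak picking. This is an illustrative implementation under stated assumptions, not the application's code. The function name is a choice made for the example, as are the zero-padding to len(x1) + len(x2) (so the circular correlation does not wrap) and the optional `max_tau` search limit.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None, eps=1e-12):
    """Estimate tau_12 = tau_1 - tau_2 in seconds with GCC-PHAT.

    A positive result means the sound reaches microphone 2 before microphone 1.
    """
    n = len(x1) + len(x2)                 # zero-pad so circular correlation does not wrap
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)              # cross-power spectrum X1 * X2^*
    cross /= np.abs(cross) + eps          # PHAT weighting: phi(w) = 1 / |X1 X2^*|
    cc = np.fft.irfft(cross, n=n)         # back to the lag domain
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so index 0 corresponds to lag -max_shift and the centre to lag 0
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift  # peak detection
    return shift / fs
```

Under the signal model of Formula 1, a clip in which $x_1$ lags $x_2$ by five samples yields $\tau_{12} = 5/f_s$. The sign of the result therefore tells which microphone heard the sound first.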
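GCC-PHAT uses the signals of only two microphones. With more than two microphones, other methods can be used, such as the steered response power with phase transform (SRP-PHAT) source localization algorithm. SRP-PHAT computes, at each hypothetical source position, the sum of the GCC-PHAT functions of all microphone pairs. The point in the source space that maximizes this SRP value is taken as the estimated source position.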
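A compact SRP-PHAT sketch for the horizontal plane is shown below. It assumes far-field (plane-wave) propagation, a known 2-D microphone layout, and a grid of candidate azimuths. For each candidate, it evaluates every pair's PHAT-weighted cross-spectrum at the expected pair delay and sums the results. It then returns the azimuth with the largest steered power. The function and parameter names are hypothetical.

```python
import itertools

import numpy as np


def srp_phat_azimuth(signals, mic_xy, fs, c=344.0, n_angles=360, eps=1e-12):
    """Grid-search SRP-PHAT over far-field azimuths in the horizontal plane.

    signals: (M, N) array, one row per microphone.
    mic_xy:  (M, 2) microphone positions in metres.
    Returns the candidate azimuth in degrees with the largest steered response power.
    """
    signals = np.asarray(signals, dtype=float)
    mic_xy = np.asarray(mic_xy, dtype=float)
    n_fft = 2 * signals.shape[1]                       # zero-pad against wrap-around
    spectra = np.fft.rfft(signals, n=n_fft, axis=1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    angles = np.arange(n_angles) * 360.0 / n_angles
    units = np.stack([np.cos(np.radians(angles)), np.sin(np.radians(angles))], axis=1)
    power = np.zeros(n_angles)
    for i, j in itertools.combinations(range(len(mic_xy)), 2):
        g = spectra[i] * np.conj(spectra[j])
        g /= np.abs(g) + eps                           # PHAT weighting, as in GCC-PHAT
        # Expected tau_i - tau_j for a plane wave arriving from each candidate azimuth
        tau_ij = units @ (mic_xy[j] - mic_xy[i]) / c
        # GCC-PHAT of this pair evaluated at each expected (fractional) lag
        power += np.real(np.exp(2j * np.pi * np.outer(tau_ij, freqs)) @ g)
    return angles[np.argmax(power)]
```

The pair correlation is evaluated directly in the frequency domain rather than read from the nearest integer lag of an IFFT. This lets each candidate use its exact fractional delay, which matters for small arrays whose delays span only a few samples.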
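Embodiment 1
Based on the foregoing, this embodiment provides an audio information processing method that can be used in the application scenario of FIG. 1. Referring to FIG. 4, the method includes the following steps:
S401: The mobile phone acquires audio information.
The audio information is obtained by collecting the sound of the external environment with microphones. In the scenario shown in FIG. 1, both the mobile phone and the noise-canceling earphones have microphones. Either device's microphones can therefore collect the ambient sound to obtain the audio information.
After the smart earphone alarm sound reminder function is enabled, while the noise-canceling earphones are running, the mobile phone or the earphones can collect ambient sound periodically or in real time to obtain audio information.
If the noise-canceling earphones collect the ambient sound, they can transmit the resulting audio information to the mobile phone over a channel connected to the phone, such as a Bluetooth channel.
This embodiment is described using the following example: the noise-canceling earphones collect the ambient sound to obtain audio information, and the mobile phone acquires that audio information and performs the following steps.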
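S402: The mobile phone invokes the alarm sound detection model to detect whether the audio information contains an alarm sound, and obtains a detection result indicating whether it does.
As described above, the alarm sound detection model predicts whether input audio information contains an alarm sound. After the audio information of the external environment is acquired, the model can therefore check it for an alarm sound and produce a detection result.
In this embodiment, after acquiring the audio information, the mobile phone invokes the alarm sound detection model to check it and obtains the detection result.
In some other embodiments, the noise-canceling earphones invoke the alarm sound detection model and transmit the detection result to the mobile phone. In this case, the mobile phone need not perform step S402.
If the detection result indicates that the audio information contains an alarm sound, steps S403 and S404 are performed. Otherwise, the process returns to step S401.
The alarm sounds mentioned in the embodiments of this application are the alarm sounds described above, such as the horns of various types of motor vehicles or sirens.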
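S403: The mobile phone localizes the alarm sound using the audio information to obtain the position information of the alarm sound relative to the user.
In this embodiment, the mobile phone applies the microphone-array sound source localization algorithm described above to the audio information. Specifically, it uses the audio collected by the left and right earphone microphones to localize the alarm sound and obtain its position relative to the user. This position information generally includes the horizontal azimuth θ of the alarm sound relative to the user. FIG. 5 shows an example of θ, measured relative to the user, that is, the center point of the user's head.
Because the mobile phone localizes the sound from audio collected by the earphones, the position obtained in this step is actually the position of the alarm sound relative to the earphones. Likewise, every later reference to the position of the alarm sound relative to the user means its position relative to the earphones.
The position obtained with the microphone-array sound source localization algorithm is a relative position. Of course, in step S403, the position of the alarm sound relative to the user may also be its absolute position.
In some other embodiments, the noise-canceling earphones apply the microphone-array sound source localization algorithm described above to the audio collected by the left and right earphone microphones, obtaining the position of the alarm sound relative to the user. The earphones can then transmit the resulting azimuth to the mobile phone. In this case, the mobile phone need not perform step S403.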
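The mobile phone's own microphones can also collect ambient sound. Therefore, in some other embodiments of this application, the mobile phone localizes the alarm sound using audio collected by its built-in microphone array, obtaining the position of the alarm sound relative to the mobile phone.
There may be a relative angle between the mobile phone and the user. The mobile phone therefore also needs to convert this phone-relative position into the position of the alarm sound relative to the user.
The conversion can be performed in any coordinate system shared by the mobile phone and the noise-canceling earphones. In some embodiments, the orientations of both devices relative to the geodetic (earth) coordinate system are known, so the conversion can be performed in that system. Any other coordinate system common to both devices can also be used.
To support this conversion, the noise-canceling earphones need to compute their attitude angles relative to the geodetic coordinate system. They therefore need an accelerometer and an angular velocity sensor (gyroscope), usually of the same types as those in the mobile phone.
Specifically, the conversion proceeds as follows:
- The mobile phone computes its attitude angles from its own accelerometer and gyroscope data.
- The noise-canceling earphones compute their attitude angles from their own accelerometer and gyroscope data.
- The mobile phone obtains the earphones' attitude angles and uses both sets of angles to determine the transformation between the earphone and phone coordinate systems.
- The mobile phone applies this transformation to the phone-relative position of the alarm sound to obtain its position relative to the user.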
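In the simplest case, only the horizontal heading (yaw) of each device matters, and the conversion above reduces to adding and subtracting yaw angles. The sketch assumes every angle is in degrees, measured in the same rotational direction, and that both yaw angles are referenced to the same earth frame. A full implementation would use the complete attitude, for example rotation matrices or quaternions. The function name is hypothetical.

```python
def phone_to_user_azimuth(azimuth_phone_deg: float,
                          phone_yaw_deg: float,
                          earphone_yaw_deg: float) -> float:
    """Convert an alarm-sound azimuth from the phone's frame to the head (earphone) frame."""
    azimuth_earth = azimuth_phone_deg + phone_yaw_deg  # phone frame -> earth frame
    return (azimuth_earth - earphone_yaw_deg) % 360.0  # earth frame -> head frame, in [0, 360)
```

For example, suppose the phone hears a horn at 30° in its own frame. If both devices face the same way, the result is still 30°. If the user's head is turned 90° further than the phone, the result becomes 300°.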
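The mobile phone and the noise-canceling earphones compute their attitude angles from accelerometer and gyroscope data using conventional methods, which are not detailed here. Determining the transformation between the two coordinate systems, and using it to convert the phone-relative position into a user-relative position, also follows conventional methods and is not detailed here.
In some other embodiments of this application, the mobile phone acquires audio from its built-in microphone array and transmits it to the noise-canceling earphones. The earphones then use this audio to localize the alarm sound and obtain its position relative to the user.
Specifically, because the audio comes from the phone's microphone array, the earphones' localization with the microphone-array algorithm described above yields the position of the alarm sound relative to the mobile phone. The earphones then apply the coordinate conversion described above to obtain its position relative to the user.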
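S404: The mobile phone detects whether the audio information and the previous audio information containing an alarm sound were acquired within a preset time period.
If they were not acquired within the preset time period, steps S405 to S407 are performed. If the time difference between them is within the preset time period, steps S408 to S413 are performed.
The mobile phone acquires audio information periodically. The previous audio information containing an alarm sound is therefore the most recent audio, from detections before the current one, that was found to contain an alarm sound.
If both were acquired within the preset time period, the mobile phone has detected two alarm sounds in succession within a short time, and the user may need an emphatic reminder.
The preset time period can be set according to actual needs. Because step S404 is meant to identify two consecutive alarm sounds, the period should not be too long. In one example, it is 30 seconds.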
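S405: The mobile phone processes the standard reminder sound based on the position information of the alarm sound relative to the user to obtain a three-dimensional reminder sound.
The three-dimensional reminder sound is an alarm sound that carries direction. It is obtained by processing the standard reminder sound with three-dimensional audio technology. When it is played to the user, the user can perceive the direction of the alarm sound.
In this embodiment, the mobile phone pre-stores multiple standard reminder sounds, and the user can preset which one is used with the alarm sound selection method described above. Alternatively, before this step, the mobile phone can display the alarm sound selection interface shown in FIG. 3b to prompt the user to set the standard reminder sound.
The standard reminder sound is a noise-free alarm sound, usually a vehicle horn.
In some embodiments, the mobile phone also pre-stores multiple head-related transfer function (HRTF) values, usually set up in pairs for the left and right earphones. That is, there are multiple left-earphone HRTF values, each with a corresponding right-earphone HRTF value. Each left/right pair corresponds to one angle of the alarm sound relative to the user.
Typically, with the head as the center point, the 360° circle around it at a certain distance is divided into multiple angle values. Each angle value is assigned two HRTF values, one for the left earphone and one for the right earphone. In some embodiments, the 360° is divided equally, and the number of divisions can be set according to the actual situation.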
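When the circle is divided equally, choosing the stored pair for a measured azimuth becomes an index calculation. The sketch below assumes entry i of a hypothetical table holds the (left, right) pair measured at i × 360/N degrees, and it picks the nearest stored angle. The names are illustrative.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def nearest_angle_index(azimuth_deg: float, n_divisions: int) -> int:
    """Index of the stored angle closest to azimuth_deg on an equally divided 360° circle."""
    step = 360.0 / n_divisions
    return int(round((azimuth_deg % 360.0) / step)) % n_divisions  # wrap ~360° back to 0


def lookup_pair(table: Sequence[Tuple[T, T]], azimuth_deg: float) -> Tuple[T, T]:
    """Return the (left, right) entry whose angle is nearest to azimuth_deg."""
    return table[nearest_angle_index(azimuth_deg, len(table))]
```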
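The head-related transfer function (HRTF) is a sound localization processing technique. It is a model of human auditory perception, built from statistics over measurements of how the ears transform sounds arriving from different directions.
In this embodiment, the HRTF values are computed as shown in Formula 2:
Formula 2
$$H_L = \frac{P_L}{P_0}, \qquad H_R = \frac{P_R}{P_0}$$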
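In Formula 2, $P_L$ and $P_R$ are the complex frequency-domain sound pressures that the source produces at the left and right ears, respectively. $P_0$ is the frequency-domain sound pressure the source would produce at the head center with the head removed, defined in Formula 3:
Formula 3
$$P_0 = \frac{j k \rho_0 c Q_0}{4 \pi r} e^{-jkr}$$
In Formula 3:
- $\rho_0$ is the density of the medium (air).
- $c$ is the speed of sound, about 344 m/s in air at room temperature.
- $Q_0$ is the strength of the source.
- $k = 2\pi f / c$ is the wave number, where $f$ is the frequency of the sound.
- $r$ is the distance from the source to the listener, for example 1.5 m.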
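Based on the foregoing, one possible implementation of this step includes:
Acquire the standard reminder sound set by the user.
Use the position information of the alarm sound relative to the user to obtain the corresponding left-earphone and right-earphone HRTF values.
In some embodiments, the position information includes the horizontal azimuth of the alarm sound relative to the user. This azimuth is used as the selection key to search the HRTF values stored in the mobile phone for the matching left-earphone and right-earphone HRTF values.
Fourier-transform the standard reminder sound and multiply it by each of the two selected HRTF values. This gives the binaural output signals, namely the left-earphone and right-earphone three-dimensional reminder sounds.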
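The HRTF variant of this step can be sketched as below: Fourier transform, per-ear multiplication, and inverse transform. It assumes each stored HRTF is a one-sided spectrum on the same `rfft` grid that the reminder sound is zero-padded to. It also assumes the FFT length is at least the reminder length plus the HRIR length minus one, which avoids circular wrap-around. The function name is hypothetical.

```python
import numpy as np


def render_with_hrtf(reminder, hrtf_left, hrtf_right):
    """Frequency-domain binaural rendering of a mono reminder sound.

    hrtf_left / hrtf_right: one-sided spectra of length n_fft // 2 + 1 (assumed layout).
    Returns the left-earphone and right-earphone 3-D reminder sounds (length n_fft).
    """
    n_fft = 2 * (len(hrtf_left) - 1)
    if len(reminder) > n_fft:
        raise ValueError("reminder is longer than the FFT length implied by the HRTFs")
    spectrum = np.fft.rfft(reminder, n=n_fft)              # Fourier transform of the reminder
    left = np.fft.irfft(spectrum * hrtf_left, n=n_fft)     # multiply by left-ear HRTF
    right = np.fft.irfft(spectrum * hrtf_right, n=n_fft)   # multiply by right-ear HRTF
    return left, right
```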
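In some other embodiments, the mobile phone pre-stores multiple head-related impulse response (HRIR) values, usually set up in pairs for the left and right earphones. That is, there are multiple left-earphone HRIR values, each with a corresponding right-earphone HRIR value. Each left/right pair corresponds to one angle of the alarm sound relative to the user.
The head-related impulse response (HRIR) is a time-domain signal, and the head-related transfer function (HRTF) is its frequency-domain counterpart.
Based on this, another possible implementation of this step includes:
Acquire the standard reminder sound set by the user.
Use the position information of the alarm sound relative to the user to obtain the corresponding left-earphone and right-earphone HRIR values.
In some embodiments, the position information includes the horizontal azimuth of the alarm sound relative to the user. This azimuth is used as the selection key to search the HRIR values stored in the mobile phone for the matching left-earphone and right-earphone HRIR values.
Convolve the standard reminder sound with each of the two selected HRIR values. This gives the binaural output signals, namely the left-earphone and right-earphone three-dimensional reminder sounds.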
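The HRIR variant is a direct time-domain convolution for each ear. The minimal sketch below uses hypothetical function names. It also checks the relationship stated above: with a long enough FFT, convolving with an HRIR gives the same samples as multiplying by the corresponding HRTF in the frequency domain.

```python
import numpy as np


def render_with_hrir(reminder, hrir_left, hrir_right):
    """Time-domain binaural rendering: convolve the mono reminder with each ear's HRIR."""
    reminder = np.asarray(reminder, dtype=float)
    left = np.convolve(reminder, hrir_left)    # left-earphone 3-D reminder sound
    right = np.convolve(reminder, hrir_right)  # right-earphone 3-D reminder sound
    return left, right


def fft_equivalent(reminder, hrir):
    """Same result via FFT multiplication, zero-padded to the full linear-convolution length."""
    n = len(reminder) + len(hrir) - 1
    return np.fft.irfft(np.fft.rfft(reminder, n) * np.fft.rfft(hrir, n), n)
```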
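The position information of the alarm sound relative to the user used above may exactly match the position obtained in step S403, approximately match it, or differ from it within a certain range.
S406: The mobile phone sends the three-dimensional reminder sound to the noise-canceling earphones.
In some embodiments, the mobile phone sends the left-earphone and right-earphone three-dimensional reminder sounds to the noise-canceling earphones over a connection channel such as Bluetooth.
S407: The noise-canceling earphones play the three-dimensional reminder sound.
The left earphone outputs the left-earphone three-dimensional reminder sound, and the right earphone outputs the right-earphone three-dimensional reminder sound.
In this embodiment, when an alarm sound is detected in the ambient audio, the mobile phone localizes it from the audio information to obtain its position relative to the user. Based on that position, it processes the standard reminder sound into a three-dimensional reminder sound, which the noise-canceling earphones then play. This alerts the user to the nearby alarm sound and the safety risk.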
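S408: The mobile phone determines whether the difference between two positions is within a preset range. The two positions are the position of the alarm sound relative to the user in the current audio information, and the same position in the previous audio information containing an alarm sound.
If the difference is not within the preset range, step S405 is performed.
If the difference is within the preset range, step S409 is performed.
A difference within the preset range means two alarm sounds have occurred in succession from the same direction range, so the user needs an emphatic reminder.
The preset range can be set according to the actual situation. Generally, it requires the difference between the two horizontal azimuths of the alarm sound relative to the user to be less than a first threshold. The first threshold can also be set according to the actual situation; in one example, it is 5°.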
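S409: The mobile phone detects whether the audio information and the previous audio information containing an alarm sound belong to the same sound.
In some embodiments, this means the mobile phone checks whether the alarm sound in the current audio information and the alarm sound in the previous audio information are the same alarm sound.
In this step, this check can be performed in either of the following two ways.
First: detect whether the audio information and the previous audio information containing an alarm sound belong to the same sound.
Second: extract the alarm sound from each of the two pieces of audio information, and determine whether the two extracted alarm sounds are the same alarm sound.
When the alarm sound detection model detects an alarm sound in audio information, it can also locate where the alarm sound occurs within that audio information. This location can be used to extract the alarm sound from the current audio information and from the previous audio information containing an alarm sound.
The following uses the first way as an example to describe in detail how to determine whether the two alarm sounds are the same. In the second way, whether the two extracted alarm sounds are the same can be determined in the same manner.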
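Two successive alarm sounds may differ in intensity. If they come from the same source, however, their frequencies should be the same. Therefore, in a possible implementation, amplitude spectra are used to determine whether two successive pieces of audio information containing an alarm sound belong to the same sound, as follows:
Acquire the two successive pieces of audio information containing an alarm sound.
Convert each piece from the time domain to the frequency domain to obtain its amplitude spectrum, for example with a Fourier transform.
The x-axis of an amplitude spectrum is frequency and the y-axis is the amplitude of the audio information. The amplitude spectrum shows how energy is distributed over specified frequency bands.
Use the two amplitude spectra to compute the similarity of the two pieces of audio information, giving a result that indicates whether they belong to the same sound.
In some embodiments, the Pearson correlation function is used to compute a similarity value between the two pieces of audio information containing an alarm sound.
Specifically, take n sample points from each of the two pieces of audio information, and denote the paired sample points $(X_i, Y_i)$. Substituting them into Formula 4 gives the Pearson correlation coefficient r.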
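Formula 4
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
where $\bar{X}$ and $\bar{Y}$ are the means of the $X_i$ and the $Y_i$, respectively.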
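Once r is computed, Table 1 gives the strength of the correlation between the two pieces of audio information containing an alarm sound.
Table 1
r          Correlation strength
0.8-1.0    Very strong correlation
0.6-0.8    Strong correlation
0.4-0.6    Moderate correlation
0.2-0.4    Weak correlation
0.0-0.2    Very weak correlation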
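A threshold can be set based on the relationship between r and correlation strength in Table 1, for example 0.8. If the similarity value of the two pieces of audio information containing an alarm sound is greater than the threshold, they belong to the same sound. If it is not greater than the threshold, they do not.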
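The amplitude-spectrum comparison with a Pearson threshold can be sketched as follows. The example makes several assumptions. Both clips are compared on a 1024-point `rfft` grid (zero-padded or truncated), and no windowing is applied. The Pearson coefficient is computed over the spectrum bins, which play the role of the n paired samples $(X_i, Y_i)$ in Formula 4. The 0.8 threshold from the text is the default. Function names are hypothetical.

```python
import numpy as np


def magnitude_spectrum(clip, n_fft=1024):
    """Amplitude spectrum of a clip via a real FFT (no window, zero-padded or truncated)."""
    return np.abs(np.fft.rfft(clip, n=n_fft))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Formula 4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


def same_sound(clip_a, clip_b, threshold=0.8, n_fft=1024):
    """Return (is_same, r): same sound if r of the amplitude spectra exceeds the threshold."""
    r = pearson_r(magnitude_spectrum(clip_a, n_fft), magnitude_spectrum(clip_b, n_fft))
    return r > threshold, r
```

The coefficient is scale-invariant, so a quieter repeat of the same horn still scores about 1. This matches the reasoning above that intensity may change while frequency content does not.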
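In some other embodiments, a classification model, such as a binary classification model, a SOM model, or an SVM model, predicts whether the two pieces of audio information belong to the same sound. The classification model can be trained like the alarm sound detection model described above; details are not repeated here.
The trained classification model takes two input signals and predicts whether they belong to the same class. In this embodiment, the inputs are the two successive pieces of audio information containing an alarm sound. In one example, a prediction of 1 means they belong to the same sound, and 0 means they do not.
If the audio information and the previous audio information containing an alarm sound belong to the same sound, step S410 is performed. Otherwise, step S405 is performed.
If the mobile phone finds that they belong to the same sound, an alarm sound from the same source has occurred twice in succession from the same direction relative to the user. The user therefore needs an emphatic reminder.
Step S403 and steps S404, S408, and S409 need not follow the order shown in FIG. 4 and may run in parallel. Likewise, steps S404, S408, and S409 may run in parallel or in another order.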
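The branch decisions of steps S404, S408, and S409 can be combined into one function. This sketch rests on assumptions. Azimuths are compared on the circle, so 358° and 2° are 4° apart. The 30-second window and the 5° first threshold are the example values from the text. The S409 result is passed in as a boolean. The function and parameter names are hypothetical.

```python
def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def next_step(now_s, azimuth_deg, prev, is_same_sound,
              window_s=30.0, first_threshold_deg=5.0):
    """Return "S405" (plain 3-D reminder path) or "S410" (energy-gain reminder path).

    prev: (timestamp_s, azimuth_deg) of the previous alarm-containing clip, or None.
    """
    if prev is None:
        return "S405"
    prev_time_s, prev_azimuth_deg = prev
    if now_s - prev_time_s > window_s:                                         # S404
        return "S405"
    if angle_diff_deg(azimuth_deg, prev_azimuth_deg) >= first_threshold_deg:  # S408
        return "S405"
    if not is_same_sound:                                                      # S409
        return "S405"
    return "S410"
```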
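S410: The mobile phone generates a distance coefficient that represents the energy gain of the audio information relative to the previous audio information containing an alarm sound.
The energy gain and distance coefficient depend on how the two energies compare:
- If the current audio information has more energy than the previous audio information containing an alarm sound, the energy gain is positive and the distance coefficient is greater than 1.
- If it has less energy, the energy gain is negative and the distance coefficient is less than 1.
- If the two energies are equal, the energy gain is 0 and the distance coefficient is 1.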
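In some embodiments, let $z_1$ be the energy of the previous audio information containing an alarm sound and $z_2$ the energy of the later, current audio information. The distance coefficient gain can then be computed with Formula 5.
Formula 5
$$\mathrm{gain} = \frac{\log(z_2 + k)}{\log(z_1 + k)}$$
Here k is a constant. For the coefficient to behave as described above, $z_1 + k$ must be greater than 1 so that the denominator is positive.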
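A range for the distance coefficient, such as 0.1 to 10, can be preset. After step S410 computes the distance coefficient, it is compared with this range:
- If it lies within the range, the following steps use it as is.
- If it falls outside the range, the following steps use the endpoint of the range (its maximum or minimum) closest to the computed coefficient instead.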
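Formula 5 and the range check can be sketched as follows. The text does not specify the constant k, so k = 1.0 here is an assumption. It keeps log(z + k) non-negative for non-negative energies. The function also rejects a previous energy that would make the denominator zero or negative. The 0.1 to 10 range is the example from the text, and clamping with min/max picks the nearest endpoint. The function name is hypothetical.

```python
import math


def distance_coefficient(z_prev: float, z_curr: float, k: float = 1.0,
                         lo: float = 0.1, hi: float = 10.0) -> float:
    """gain = log(z_curr + k) / log(z_prev + k), clamped to [lo, hi] (Formula 5)."""
    if z_prev + k <= 1.0:
        raise ValueError("z_prev + k must exceed 1 so the denominator is positive")
    gain = math.log(z_curr + k) / math.log(z_prev + k)
    return min(hi, max(lo, gain))  # out-of-range values snap to the nearest endpoint
```

Step S411 then applies the coefficient by multiplying every sample of each ear's three-dimensional reminder sound by the returned value.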
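Replacing an out-of-range coefficient with the nearest endpoint keeps the coefficient from being too large or too small. Otherwise, the three-dimensional reminder sound with energy gain generated in the following steps could be too loud or too quiet.
S411: The mobile phone processes the standard reminder sound based on the position information of the alarm sound relative to the user and the distance coefficient, obtaining a three-dimensional reminder sound with energy gain.
In this step, the standard reminder sound is acquired, and the HRTF or HRIR values are determined, in the same way as in step S405; details are not repeated here.
In one possible implementation, the standard reminder sound is Fourier-transformed and multiplied by the left-earphone and right-earphone HRTF values for the position. This gives the binaural output signals, namely the left-earphone and right-earphone three-dimensional reminder sounds. Each is then multiplied by the distance coefficient gain to produce the left-earphone and right-earphone three-dimensional reminder sounds with energy gain.
In another possible implementation, the standard reminder sound is convolved with the left-earphone and right-earphone HRIR values for the position. This gives the binaural output signals, namely the left-earphone and right-earphone three-dimensional reminder sounds. Each is then multiplied by the distance coefficient gain to produce the left-earphone and right-earphone three-dimensional reminder sounds with energy gain.
In this embodiment, the mobile phone processes the standard reminder sound into a three-dimensional reminder sound with energy gain. If the alarm sound's source keeps approaching the user, each newly acquired piece of audio information has more energy than the one before. The energy gain is then positive and the distance coefficient is greater than 1. The three-dimensional reminder sound with energy gain is therefore louder than the previous one, which gives the user an emphatic reminder.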
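S412: The mobile phone sends the three-dimensional reminder sound with energy gain to the noise-canceling earphones.
In some embodiments, the mobile phone sends the left-earphone and right-earphone three-dimensional reminder sounds with energy gain to the noise-canceling earphones over a connection channel such as Bluetooth.
S413: The noise-canceling earphones play the three-dimensional reminder sound with energy gain.
The left earphone outputs the left-earphone three-dimensional reminder sound with energy gain, and the right earphone outputs the right-earphone three-dimensional reminder sound with energy gain.
Steps S404 and S408 to S413 are optional. In some embodiments, the user only needs a reminder through the noise-canceling earphones whenever an alarm sound occurs nearby. In that case, steps S404 and S408 to S413 are skipped, and steps S405 to S407 run directly after step S403.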
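The audio information processing method of Embodiment 1 can also be performed by the noise-canceling earphones.
In some embodiments, the noise-canceling earphones completely replace the mobile phone and perform the whole method shown in FIG. 4. After the smart earphone alarm sound reminder function is enabled, the running earphones collect ambient sound with their own microphones to obtain audio information. They then use this audio information to perform steps S402 to S405, S407 to S411, and S413.
In some other embodiments, after the smart earphone alarm sound reminder function is enabled, the mobile phone's microphone array collects ambient sound to obtain audio information. The noise-canceling earphones then use this audio information to perform steps S402 to S405, S407 to S411, and S413.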
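Embodiment 2
FIG. 6 shows another application scenario provided by this embodiment. The user wears noise-canceling earphones and a smart watch on the wrist, and the mobile phone has Bluetooth connections to both the smart watch and the earphones. In this scenario, the earphones and the smart watch can also exchange information over a connection channel such as Bluetooth, to remind the user when a dangerous alarm sound occurs nearby.
The basic components and software structure of the noise-canceling earphones and the smart watch are as described above; details are not repeated here.
Referring to FIG. 7, an audio information processing method provided by this embodiment includes:
S701: The smart watch acquires the audio information obtained by the noise-canceling earphones.
The earphones' microphones collect ambient sound to obtain audio information, and the smart watch can acquire it over a Bluetooth channel or the like.
In some embodiments, the noise-canceling earphones transmit the audio information to the mobile phone over a Bluetooth channel or the like, and the mobile phone forwards it to the smart watch over a Bluetooth channel.
In some other embodiments, the noise-canceling earphones transmit the audio information directly to the smart watch over a Bluetooth channel or the like.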
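S702: The smart watch invokes the alarm sound detection model to detect whether the audio information contains an alarm sound, and obtains a detection result indicating whether it does.
As described above, the alarm sound detection model predicts whether input audio information contains an alarm sound. After acquiring the ambient audio information, the smart watch can therefore use the model to check it. In this embodiment, the smart watch pre-stores a trained alarm sound detection model. After acquiring the audio information, it invokes the model to check it and obtains the detection result.
In some other embodiments, the noise-canceling earphones invoke the alarm sound detection model and transmit the detection result to the smart watch. In this case, the smart watch need not perform step S702.
If the detection result indicates that the audio information contains an alarm sound, steps S703 and S704 are performed. Otherwise, the process returns to step S701.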
S703、智能手表利用音频信息对告警声进行定位,得到告警声相对于用户的位置信息。
本实施例中,智能手表可利用前述内容提出的基于麦克风阵列的声源定位算法,利用音频信息对告警声进行声源定位。具体的,智能手表利用降噪耳机的左耳机的麦克风采集的音频信息,以及右耳机的麦克风采集的音频信息,对告警声进行声源定位,得到告警声相对于用户的位置信息,该位置信息一般包括告警声相对于用户的水平方向角θ。
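The patent does not say which microphone-array localization algorithm is used. As one common two-microphone approach, assumed here rather than taken from the source, the azimuth can be estimated from the time difference of arrival (TDOA) τ between the left and right earphone microphones under a far-field model, θ = arcsin(c·τ/d). The 0.18 m microphone spacing, the 343 m/s speed of sound, the brute-force cross-correlation, and the function names are illustrative choices.

```python
import math


def estimate_lag(left, right, max_lag):
    """Lag in samples by which `right` trails `left`, found by brute-force
    cross-correlation over lags in [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(left)):
            m = n + lag
            if 0 <= m < len(right):
                score += left[n] * right[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def azimuth_deg(left, right, fs, mic_distance=0.18, c=343.0):
    """Far-field azimuth in degrees; positive means the sound reached the
    left microphone first."""
    max_lag = int(math.ceil(mic_distance / c * fs))  # largest physical delay
    tau = estimate_lag(left, right, max_lag) / fs
    x = max(-1.0, min(1.0, c * tau / mic_distance))  # guard asin's domain
    return math.degrees(math.asin(x))


# A short click that reaches the right microphone 3 samples later.
pulse = [1.0, 0.6, -0.4]
left, right = [0.0] * 20, [0.0] * 20
left[5:8] = pulse
right[8:11] = pulse
print(estimate_lag(left, right, 9))  # 3
print(azimuth_deg(left, right, fs=16000))
```

With only two microphones this estimate cannot distinguish a source in front from one behind at the mirrored angle, so a practical system would need additional cues or microphones.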
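In other embodiments, the noise-cancelling earphones may themselves localize the alarm sound, applying the microphone-array-based sound source localization algorithm described above to the audio information captured by the left-earphone and right-earphone microphones, to obtain the position information of the alarm sound relative to the user. The earphones can then transmit the resulting azimuth of the alarm sound relative to the user to the smartwatch, in which case the smartwatch does not need to perform step S703.
It should also be noted that, because the microphones of the mobile phone can also capture sound from the external environment to obtain audio information, in some other embodiments of this application the smartwatch may obtain the audio information captured by the phone's built-in microphone array, for example over a Bluetooth channel. The smartwatch then uses this audio information to localize the alarm sound and obtain its position information relative to the user.
The way in which the smartwatch localizes the alarm sound from the audio information captured by the phone's microphone array, to obtain its position relative to the user, may be the same as in step S403 of Embodiment 1, and is not repeated here.
S704. The smartwatch detects whether the audio information and the previous audio information containing an alarm sound were obtained within a preset time period.
If the smartwatch detects that they were not obtained within the preset time period, steps S705 to S707 are performed; if the time difference between them is within the preset time period, steps S708 to S713 are performed.
S705. The smartwatch processes the standard reminder sound based on the position information of the alarm sound relative to the user to obtain a three-dimensional reminder sound.
It should be noted that the user may preset the standard reminder sound used for alarm reminders by using the alarm sound selection method described above. Alternatively, before this step is performed, the mobile phone may display the alarm sound selection interface shown in FIG. 3b to prompt the user to set the standard reminder sound.
In some embodiments, the smartwatch also stores multiple head-related transfer function (HRTF) values in advance. These HRTF values are usually arranged in left/right earphone pairs; that is, they are divided into multiple left-earphone HRTF values, each with a corresponding right-earphone HRTF value. Each pair of left and right HRTF values corresponds to one angle of the alarm sound relative to the user.
In other embodiments, the smartwatch may instead store multiple head-related impulse response (HRIR) values in advance. These HRIR values are likewise arranged in left/right earphone pairs: multiple left-earphone HRIR values, each with a corresponding right-earphone HRIR value. Each pair of left and right HRIR values corresponds to one angle of the alarm sound relative to the user.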
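Since each stored HRTF or HRIR pair corresponds to one angle, a natural way to use the table, assumed here, is to pick the pair whose angle is closest to the estimated θ on the circle. The table contents, the 90° spacing, and the convention that 90° lies to the user's left are placeholders, not measured data; the function names are hypothetical.

```python
def circular_distance(a, b):
    """Smallest angular distance in degrees between two azimuths."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_angle(theta, table_angles):
    """Stored angle closest to theta, accounting for 0/360 wrap-around."""
    return min(table_angles, key=lambda a: circular_distance(theta, a))


# Hypothetical table: angle (deg) -> (left HRIR, right HRIR). Placeholder values.
HRIR_TABLE = {
    0.0: ([1.0, 0.0], [1.0, 0.0]),    # straight ahead: identical ears
    90.0: ([1.0, 0.0], [0.0, 0.4]),   # left: right ear later and quieter
    180.0: ([0.7, 0.0], [0.7, 0.0]),  # behind: both ears attenuated
    270.0: ([0.0, 0.4], [1.0, 0.0]),  # right: left ear later and quieter
}


def lookup_pair(theta, table=HRIR_TABLE):
    """Return the (left, right) HRIR pair for the stored angle nearest theta."""
    return table[nearest_angle(theta, table.keys())]


print(nearest_angle(350.0, HRIR_TABLE.keys()))  # 0.0
```

A finer table, or interpolation between neighbouring pairs, would reduce audible jumps as the alarm source moves.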
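The smartwatch may perform step S705, processing the standard reminder sound to obtain the three-dimensional reminder sound, in either of the two possible implementations of step S405 in Embodiment 1; details are not repeated here.
S706. The smartwatch sends the three-dimensional reminder sound to the noise-cancelling earphones.
In some embodiments, the smartwatch sends the three-dimensional reminder sounds for the left and right earphones directly to the noise-cancelling earphones.
In other embodiments, the smartwatch sends the three-dimensional reminder sounds for the left and right earphones to the noise-cancelling earphones via the mobile phone.
S707. The noise-cancelling earphones play the three-dimensional reminder sound.
The left earphone of the noise-cancelling earphones outputs the left earphone's three-dimensional reminder sound, and the right earphone outputs the right earphone's three-dimensional reminder sound.
In this embodiment, when an alarm sound is detected in the audio of the external environment, the smartwatch uses the audio information to locate the alarm sound and obtain its position relative to the user, processes the standard reminder sound based on that position to obtain a three-dimensional reminder sound, and the noise-cancelling earphones then play it. This alerts the user that an alarm sound, and therefore a potential safety hazard, is present nearby.
S708. The smartwatch determines whether the difference between the position information of the alarm sound relative to the user in the current audio information and that in the previous audio information containing an alarm sound is within a preset range.
If the smartwatch determines that the difference between the two pieces of position information is not within the preset range, step S705 is performed.
If the smartwatch determines that the difference between the two pieces of position information is within the preset range, step S709 is performed.
For the specific process by which the smartwatch performs step S708, refer to step S408 of Embodiment 1; details are not repeated here.
S709. The smartwatch detects whether the audio information and the previous audio information containing an alarm sound belong to the same sound.
For how the smartwatch detects whether the audio information and the previous audio information containing an alarm sound belong to the same sound, refer to step S409 of Embodiment 1; details are not repeated here.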
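The claims describe this check as a Pearson correlation between the magnitude spectra of the two signals, compared against a threshold. A minimal sketch under that description follows; the naive DFT, the 0.8 threshold, and the function names are illustrative assumptions.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Magnitudes of the non-negative-frequency bins of a naive DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0.0 or sb == 0.0:
        return 0.0  # a flat spectrum carries no shape to compare
    return cov / (sa * sb)


def same_sound(current, previous, threshold=0.8):
    """Same sound if the spectral correlation exceeds the (hypothetical) threshold."""
    return pearson(magnitude_spectrum(current), magnitude_spectrum(previous)) > threshold
```

Because the Pearson coefficient is invariant to scaling, a louder repetition of the same siren still scores close to 1, which suits the case where the source is approaching.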
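If the smartwatch detects that the audio information and the previous audio information containing an alarm sound belong to the same sound, step S710 is performed; otherwise, step S705 is performed.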
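Steps S704, S708, and S709 together decide whether to play a plain three-dimensional reminder sound (S705 to S707) or one carrying an energy gain (S710 to S713). The sketch below encodes that branching. The 5-second window, the 30° angle range, the wrap-around angle difference, the data layout, and all names are hypothetical, since the text leaves the preset values unspecified.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AlarmObservation:
    timestamp: float      # seconds at which the audio information was obtained
    azimuth_deg: float    # azimuth of the alarm sound relative to the user
    samples: List[float]  # the audio frame containing the alarm sound


def angle_difference(a: float, b: float) -> float:
    """Smallest angular difference in degrees (assumes wrap-around at 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def choose_reminder(cur: AlarmObservation,
                    prev: Optional[AlarmObservation],
                    same_sound: Callable[[List[float], List[float]], bool],
                    window_s: float = 5.0,
                    max_angle_deg: float = 30.0) -> str:
    """Return "plain" for steps S705-S707 or "gain" for steps S710-S713."""
    # S704: no previous alarm, or it is outside the preset time period.
    if prev is None or cur.timestamp - prev.timestamp > window_s:
        return "plain"
    # S708: the direction changed by more than the preset range.
    if angle_difference(cur.azimuth_deg, prev.azimuth_deg) > max_angle_deg:
        return "plain"
    # S709: the two alarm sounds are not the same sound.
    if not same_sound(cur.samples, prev.samples):
        return "plain"
    return "gain"
```

Any same-sound test, such as a spectral correlation check, can be passed in as `same_sound`.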
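S710. The smartwatch generates a distance coefficient, which characterizes the energy gain of the audio information relative to the previous audio information containing an alarm sound.
For how the smartwatch generates the distance coefficient, refer to step S410 of Embodiment 1; details are not repeated here.
S711. The smartwatch processes the standard reminder sound based on the position information of the alarm sound relative to the user and the distance coefficient to obtain a three-dimensional reminder sound carrying an energy gain.
For how the smartwatch obtains the three-dimensional reminder sound carrying the energy gain from the position information and the distance coefficient, refer to step S411 of Embodiment 1; details are not repeated here.
S712. The smartwatch sends the three-dimensional reminder sound carrying the energy gain to the noise-cancelling earphones.
In some embodiments, the smartwatch sends the left earphone's and the right earphone's three-dimensional reminder sounds carrying the energy gain directly to the noise-cancelling earphones.
In other embodiments, the smartwatch sends the left earphone's and the right earphone's three-dimensional reminder sounds carrying the energy gain to the noise-cancelling earphones via the mobile phone.
S713. The noise-cancelling earphones play the three-dimensional reminder sound carrying the energy gain.
The left earphone of the noise-cancelling earphones outputs the left earphone's three-dimensional reminder sound carrying the energy gain, and the right earphone outputs the right earphone's.
Another embodiment of this application further provides a computer-readable storage medium storing instructions that, when run on a computer or processor, cause the computer or processor to perform one or more steps of any of the methods described above.
Another embodiment of this application further provides a computer program product containing instructions. When the computer program product runs on a computer or processor, it causes the computer or processor to perform one or more steps of any of the methods described above.
Another embodiment of this application further provides an audio processing system that includes an electronic device, such as a mobile phone or smartwatch, and earphones, which may be noise-cancelling earphones. The working processes of the electronic device and the earphones may be as described in Embodiments 1 and 2 and are not elaborated here.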

Claims (41)

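  1. An audio information processing method, applied to an electronic device, wherein the audio information processing method comprises:
    obtaining audio information, the audio information being obtained by capturing sound of an environment in which the electronic device is located;
    determining that the audio information comprises an alarm sound;
    determining first position information of the alarm sound based on the audio information, the first position information identifying a sound source direction of the alarm sound;
    determining a first sound, the first sound comprising second position information, the second position information identifying the sound source direction of the alarm sound and being the same as or different from the first position information; and
    playing the first sound.
  2. The audio information processing method according to claim 1, wherein before the determining a first sound, the first sound comprising second position information, the method further comprises:
    determining that the audio information and previous audio information containing an alarm sound were not obtained within a preset time period.
  3. The audio information processing method according to claim 2, further comprising:
    determining that the audio information and the previous audio information containing an alarm sound were obtained within the preset time period;
    when it is determined that a difference between the first position information of the alarm sound in the audio information and the first position information of the alarm sound in the previous audio information containing an alarm sound is within a preset range, and it is detected that the alarm sound in the audio information and the alarm sound in the previous audio information containing an alarm sound belong to the same sound, generating a distance coefficient, the distance coefficient characterizing an energy gain of the audio information relative to the previous audio information containing an alarm sound;
    determining a second sound, the second sound comprising the second position information and the energy gain; and
    playing the second sound.
  4. The audio information processing method according to any one of claims 1 to 3, wherein the playing the first sound comprises:
    sending the first sound to an earphone, the earphone playing the first sound.
  5. The audio information processing method according to claim 3, wherein the playing the second sound comprises:
    sending the second sound to an earphone, the earphone playing the second sound.
  6. The audio information processing method according to any one of claims 1 to 5, wherein the determining first position information of the alarm sound based on the audio information comprises:
    performing sound source localization on the alarm sound using the audio information, based on a microphone-array sound source localization algorithm, to obtain the first position information of the alarm sound.
  7. The audio information processing method according to any one of claims 1 to 5, wherein the determining first position information of the alarm sound based on the audio information comprises:
    determining third position information of the alarm sound based on the audio information, the third position information identifying the sound source direction of the alarm sound relative to the electronic device; and
    performing coordinate transformation on the third position information of the alarm sound to obtain the first position information of the alarm sound.
  8. The audio information processing method according to any one of claims 1 to 5, wherein the determining a first sound, the first sound comprising second position information, comprises:
    obtaining a standard sound; and
    processing the standard sound based on the first position information of the alarm sound to obtain the first sound, the first sound comprising the second position information.
  9. The audio information processing method according to claim 8, wherein the processing the standard sound based on the first position information of the alarm sound to obtain the first sound comprises:
    obtaining head-related impulse response (HRIR) values corresponding to the first position information of the alarm sound; and
    convolving the standard sound with each of the HRIR values to obtain the first sound.
  10. The audio information processing method according to claim 8, wherein the processing the standard sound based on the first position information of the alarm sound to obtain the first sound comprises:
    obtaining head-related transfer function (HRTF) values corresponding to the first position information of the alarm sound; and
    performing a Fourier transform on the standard sound and multiplying the result by the HRTF values to obtain the first sound.
  11. The audio information processing method according to claim 3, wherein the detecting that the alarm sound in the audio information and the alarm sound in the previous audio information containing an alarm sound belong to the same sound comprises:
    converting the audio information and the previous audio information containing an alarm sound from the time domain to the frequency domain, respectively, to obtain magnitude spectra of the audio information and of the previous audio information containing an alarm sound; and
    performing a similarity calculation on the audio information and the previous audio information containing an alarm sound using their magnitude spectra, to obtain a calculation result, the calculation result indicating whether the audio information and the previous audio information containing an alarm sound belong to the same sound.
  12. The audio information processing method according to claim 11, wherein the performing a similarity calculation on the audio information and the previous audio information containing an alarm sound using their magnitude spectra, to obtain a calculation result, comprises:
    performing the similarity calculation on the audio information and the previous audio information containing an alarm sound using a Pearson correlation function, to obtain a similarity value;
    wherein if the similarity value is greater than a threshold, the audio information and the previous audio information containing an alarm sound belong to the same sound, and if the similarity value is not greater than the threshold, the audio information and the previous audio information containing an alarm sound do not belong to the same sound.
  13. The audio information processing method according to claim 11, wherein the performing a similarity calculation on the audio information and the previous audio information containing an alarm sound using their magnitude spectra, to obtain a calculation result, comprises:
    predicting, using a classification model, whether the audio information and the previous audio information containing an alarm sound belong to the same sound.
  14. The audio information processing method according to claim 3, wherein the detecting that the alarm sound in the audio information and the alarm sound in the previous audio information containing an alarm sound belong to the same sound comprises:
    extracting an alarm sound from each of the audio information and the previous audio information containing an alarm sound; and
    determining whether the two extracted alarm sounds are the same alarm sound.
  15. The audio information processing method according to claim 14, wherein the determining whether the two extracted alarm sounds are the same alarm sound comprises:
    converting each of the two extracted alarm sounds from the time domain to the frequency domain to obtain magnitude spectra of the two extracted alarm sounds; and
    performing a similarity calculation on the two extracted alarm sounds using their magnitude spectra, to obtain a calculation result, the calculation result indicating whether the two extracted alarm sounds are the same alarm sound.
  16. The audio information processing method according to claim 15, wherein the performing a similarity calculation on the two extracted alarm sounds using their magnitude spectra, to obtain a calculation result, comprises:
    performing the similarity calculation on the two extracted alarm sounds using a Pearson correlation function, to obtain a similarity value;
    wherein if the similarity value is greater than a threshold, the two extracted alarm sounds are the same alarm sound, and if the similarity value is not greater than the threshold, the two extracted alarm sounds are not the same alarm sound.
  17. The audio information processing method according to claim 15, wherein the performing a similarity calculation on the two extracted alarm sounds using their magnitude spectra, to obtain a calculation result, comprises:
    predicting, using a classification model, whether the two extracted alarm sounds are the same alarm sound.
  18. The audio information processing method according to claim 3, wherein after the generating a distance coefficient, the method further comprises:
    determining that the distance coefficient is within a range of the distance coefficient.
  19. The audio information processing method according to claim 18, further comprising:
    determining that the distance coefficient is outside the range of the distance coefficient;
    determining a third sound, the third sound comprising the second position information and an energy gain represented by an endpoint value of the range of the distance coefficient; and
    playing the third sound.
  20. The audio information processing method according to any one of claims 1 to 19, wherein the determining that the audio information comprises an alarm sound comprises:
    invoking an alarm sound detection model to detect whether the audio information contains an alarm sound, to obtain a detection result, the detection result indicating whether the audio information contains an alarm sound.
  21. An electronic device, wherein the electronic device comprises:
    one or more processors, a memory, and a wireless communication module;
    wherein the memory and the wireless communication module are coupled to the one or more processors, the memory is configured to store computer program code, the computer program code comprises computer instructions, and when the one or more processors execute the computer instructions, the electronic device performs the audio information processing method according to any one of claims 1 to 20.
  22. A computer storage medium, configured to store a computer program, wherein when the computer program is executed, it implements the audio information processing method according to any one of claims 1 to 20.
  23. A computer program product, wherein when the computer program product runs on a computer, it causes the computer to perform the audio information processing method according to any one of claims 1 to 20.
  24. An audio information processing system, comprising an electronic device and an earphone, wherein the electronic device is configured to perform the audio information processing method according to any one of claims 1 to 20, and the earphone interacts with the electronic device and is configured to play the first sound, the second sound, or the third sound in response to the electronic device.
  25. An audio information processing method, applied to an electronic device, wherein the audio information processing method comprises:
    obtaining audio information, the audio information being obtained by capturing sound of an environment in which the electronic device is located;
    determining that the audio information comprises an alarm sound;
    determining first position information of the alarm sound based on the audio information, the first position information identifying a sound source direction of the alarm sound;
    determining that the audio information and previous audio information containing an alarm sound were not obtained within a preset time period;
    determining a first sound, the first sound comprising second position information, the second position information identifying the sound source direction of the alarm sound and being the same as or different from the first position information, wherein the first sound is a three-dimensional reminder sound, and the three-dimensional reminder sound is an alarm sound carrying a direction;
    sending the first sound to an earphone, the earphone playing the first sound;
    determining that the audio information and the previous audio information containing an alarm sound were obtained within the preset time period;
    when it is determined that a difference between the first position information of the alarm sound in the audio information and the first position information of the alarm sound in the previous audio information containing an alarm sound is within a preset range, and it is detected that the alarm sound in the audio information and the alarm sound in the previous audio information containing an alarm sound belong to the same sound, generating a distance coefficient, the distance coefficient characterizing an energy gain of the audio information relative to the previous audio information containing an alarm sound;
    determining a second sound, the second sound comprising the second position information and the energy gain; and
    playing the second sound.
  26. The audio information processing method according to claim 25, wherein the playing the second sound comprises:
    sending the second sound to an earphone, the earphone playing the second sound.
  27. The audio information processing method according to claim 25 or 26, wherein the determining first position information of the alarm sound based on the audio information comprises:
    performing sound source localization on the alarm sound using the audio information, based on a microphone-array sound source localization algorithm, to obtain the first position information of the alarm sound.
  28. The audio information processing method according to claim 25 or 26, wherein the determining first position information of the alarm sound based on the audio information comprises:
    determining third position information of the alarm sound based on the audio information, the third position information identifying the sound source direction of the alarm sound relative to the electronic device; and
    performing coordinate transformation on the third position information of the alarm sound to obtain the first position information of the alarm sound.
  29. The audio information processing method according to claim 25 or 26, wherein the determining a first sound, the first sound comprising second position information, comprises:
    obtaining a standard sound; and
    processing the standard sound based on the first position information of the alarm sound to obtain the first sound, the first sound comprising the second position information.
  30. The audio information processing method according to claim 29, wherein the processing the standard sound based on the first position information of the alarm sound to obtain the first sound comprises:
    obtaining head-related impulse response (HRIR) values corresponding to the first position information of the alarm sound; and
    convolving the standard sound with each of the HRIR values to obtain the first sound.
  31. The audio information processing method according to claim 29, wherein the processing the standard sound based on the first position information of the alarm sound to obtain the first sound comprises:
    obtaining head-related transfer function (HRTF) values corresponding to the first position information of the alarm sound; and
    performing a Fourier transform on the standard sound and multiplying the result by the HRTF values to obtain the first sound.
  32. The audio information processing method according to claim 25, wherein the detecting that the alarm sound in the audio information and the alarm sound in the previous audio information containing an alarm sound belong to the same sound comprises:
    converting the audio information and the previous audio information containing an alarm sound from the time domain to the frequency domain, respectively, to obtain magnitude spectra of the audio information and of the previous audio information containing an alarm sound; and
    performing a similarity calculation on the audio information and the previous audio information containing an alarm sound using their magnitude spectra, to obtain a calculation result, the calculation result indicating whether the audio information and the previous audio information containing an alarm sound belong to the same sound.
  33. The audio information processing method according to claim 32, wherein the performing a similarity calculation on the audio information and the previous audio information containing an alarm sound using their magnitude spectra, to obtain a calculation result, comprises:
    performing the similarity calculation on the audio information and the previous audio information containing an alarm sound using a Pearson correlation function, to obtain a similarity value;
    wherein if the similarity value is greater than a threshold, the audio information and the previous audio information containing an alarm sound belong to the same sound, and if the similarity value is not greater than the threshold, the audio information and the previous audio information containing an alarm sound do not belong to the same sound.
  34. The audio information processing method according to claim 32, wherein the performing a similarity calculation on the audio information and the previous audio information containing an alarm sound using their magnitude spectra, to obtain a calculation result, comprises:
    predicting, using a classification model, whether the audio information and the previous audio information containing an alarm sound belong to the same sound.
  35. The audio information processing method according to claim 25, wherein the detecting that the alarm sound in the audio information and the alarm sound in the previous audio information containing an alarm sound belong to the same sound comprises:
    extracting an alarm sound from each of the audio information and the previous audio information containing an alarm sound; and
    determining whether the two extracted alarm sounds are the same alarm sound.
  36. The audio information processing method according to claim 35, wherein the determining whether the two extracted alarm sounds are the same alarm sound comprises:
    converting each of the two extracted alarm sounds from the time domain to the frequency domain to obtain magnitude spectra of the two extracted alarm sounds; and
    performing a similarity calculation on the two extracted alarm sounds using their magnitude spectra, to obtain a calculation result, the calculation result indicating whether the two extracted alarm sounds are the same alarm sound.
  37. The audio information processing method according to claim 36, wherein the performing a similarity calculation on the two extracted alarm sounds using their magnitude spectra, to obtain a calculation result, comprises:
    performing the similarity calculation on the two extracted alarm sounds using a Pearson correlation function, to obtain a similarity value;
    wherein if the similarity value is greater than a threshold, the two extracted alarm sounds are the same alarm sound, and if the similarity value is not greater than the threshold, the two extracted alarm sounds are not the same alarm sound.
  38. The audio information processing method according to claim 36, wherein the performing a similarity calculation on the two extracted alarm sounds using their magnitude spectra, to obtain a calculation result, comprises:
    predicting, using a classification model, whether the two extracted alarm sounds are the same alarm sound.
  39. The audio information processing method according to claim 25, wherein after the generating a distance coefficient, the method further comprises:
    determining that the distance coefficient is within a range of the distance coefficient.
  40. The audio information processing method according to claim 39, further comprising:
    determining that the distance coefficient is outside the range of the distance coefficient;
    determining a third sound, the third sound comprising the second position information and an energy gain represented by an endpoint value of the range of the distance coefficient; and
    playing the third sound.
  41. The audio information processing method according to claim 25 or 26, or any one of claims 30 to 40, wherein the determining that the audio information comprises an alarm sound comprises:
    invoking an alarm sound detection model to detect whether the audio information contains an alarm sound, to obtain a detection result, the detection result indicating whether the audio information contains an alarm sound.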
PCT/CN2022/116528 2021-10-26 2022-09-01 Audio information processing method, electronic device, system, product, and medium Ceased WO2023071519A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/291,854 US20240411507A1 (en) 2021-10-26 2022-09-01 Audio information processing method, electronic device, system, product, and medium
EP22885410.5A EP4354900B1 (en) 2021-10-26 2022-09-01 Audio information processing method, corresponding electronic device and computer storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111248720.6A CN114189790B (zh) 2021-10-26 2021-10-26 Audio information processing method, electronic device, system, product, and medium
CN202111248720.6 2021-10-26

Publications (1)

Publication Number Publication Date
WO2023071519A1 true WO2023071519A1 (zh) 2023-05-04

Family

ID=80540443

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/116528 Ceased WO2023071519A1 (zh) 2021-10-26 2022-09-01 Audio information processing method, electronic device, system, product, and medium

Country Status (4)

Country Link
US (1) US20240411507A1 (zh)
EP (1) EP4354900B1 (zh)
CN (1) CN114189790B (zh)
WO (1) WO2023071519A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
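CN114189790B (zh) * 2021-10-26 2022-11-29 Beijing Honor Device Co., Ltd. Audio information processing method, electronic device, system, product, and medium
CN114760560B (zh) * 2022-03-23 2025-07-22 Goertek Inc. Sound signal processing method and apparatus, earphone device, and storage medium
CN115278468A (zh) * 2022-05-27 2022-11-01 Goertek Inc. Sound output method and apparatus, electronic device, and computer-readable storage medium
CN115623156B (zh) * 2022-08-30 2024-04-02 Honor Device Co., Ltd. Audio processing method and related apparatus
CN115273431B (zh) * 2022-09-26 2023-03-07 Honor Device Co., Ltd. Device retrieval method and apparatus, storage medium, and electronic device
US20250037550A1 (en) * 2023-07-24 2025-01-30 Samsung Electronics Co., Ltd. Method and electronic device for providing environmental audio alert on personal audio device
US20260089439A1 (en) * 2024-09-26 2026-03-26 Intel Corporation Apparatus, system, and method of direction-based sound event indication
CN121037738B (zh) * 2025-10-30 2026-02-06 Jiangsu Wurun Chuanlian Network Co., Ltd. Method and system for implementing an interruption function of smart earphones based on positioning technology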

Citations (6)

Publication number Priority date Publication date Assignee Title
US20140301556A1 (en) * 2012-04-09 2014-10-09 Dts, Inc. Directional based audio response to an external environment emergency signal
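CN107767697A (zh) * 2016-08-19 2018-03-06 Sony Corporation System and method for processing traffic sound data to provide driver assistance
US20180206038A1 (en) * 2017-01-13 2018-07-19 Bose Corporation Real-time processing of audio data captured using a microphone array
CN108600885A (zh) * 2018-03-30 2018-09-28 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Sound signal processing method and related products
CN110001512A (zh) * 2018-01-02 2019-07-12 Ford Global Technologies, LLC Motor vehicle having a sound capture device
CN114189790A (zh) * 2021-10-26 2022-03-15 Honor Device Co., Ltd. Audio information processing method, electronic device, system, product, and medium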

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
US20110026745A1 (en) * 2009-07-31 2011-02-03 Amir Said Distributed signal processing of immersive three-dimensional sound for audio conferences
US8797386B2 (en) * 2011-04-22 2014-08-05 Microsoft Corporation Augmented auditory perception for the visually impaired
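CN102624980A (zh) * 2012-03-06 2012-08-01 Huizhou TCL Mobile Communication Co., Ltd. Mobile-phone-based method for prompting about sudden environmental events detected via earphones, and mobile phone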
US10425717B2 (en) * 2014-02-06 2019-09-24 Sr Homedics, Llc Awareness intelligence headphone
US9788101B2 (en) * 2014-07-10 2017-10-10 Deutsche Telekom Ag Method for increasing the awareness of headphone users, using selective audio
US9800990B1 (en) * 2016-06-10 2017-10-24 C Matter Limited Selecting a location to localize binaural sound
KR101892028B1 (ko) * 2016-10-26 2018-08-27 Hyundai Motor Company Method for providing sound tracking information, sound tracking apparatus for a vehicle, and vehicle including the same
US10067737B1 (en) * 2017-08-30 2018-09-04 Daqri, Llc Smart audio augmented reality system
US11625222B2 (en) * 2019-05-07 2023-04-11 Apple Inc. Augmenting control sound with spatial audio cues
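CN111432305A (zh) * 2020-03-27 2020-07-17 Goertek Technology Co., Ltd. Earphone alarm method and apparatus, and wireless earphone
CN111398965A (zh) * 2020-04-09 2020-07-10 University of Electronic Science and Technology of China Dangerous-signal monitoring method and system based on a smart wearable device, and wearable device
CN111818441B (zh) * 2020-07-07 2022-01-11 Oppo (Chongqing) Intelligent Technology Co., Ltd. Sound effect implementation method and apparatus, storage medium, and electronic device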
US11467666B2 (en) * 2020-09-22 2022-10-11 Bose Corporation Hearing augmentation and wearable system with localized feedback

Non-Patent Citations (1)

Title
See also references of EP4354900A4

Also Published As

Publication number Publication date
CN114189790A (zh) 2022-03-15
US20240411507A1 (en) 2024-12-12
EP4354900A4 (en) 2024-10-30
EP4354900A1 (en) 2024-04-17
EP4354900B1 (en) 2025-08-06
CN114189790B (zh) 2022-11-29

Similar Documents

Publication Publication Date Title
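CN114189790B (zh) Audio information processing method, electronic device, system, product, and medium
EP3547712B1 (en) Method for processing signals, terminal device, and non-transitory readable storage medium
US10817251B2 (en) Dynamic capability demonstration in wearable audio device
CN108538320B (zh) Recording control method and apparatus, readable storage medium, and terminal
CN108600885B (zh) Sound signal processing method and related products
CN108521621B (zh) Signal processing method and apparatus, terminal, earphone, and readable storage medium
CN111696570B (zh) Speech signal processing method, apparatus, device, and storage medium
WO2014161309A1 (zh) Method and apparatus for implementing sound source localization on a mobile terminal
WO2018045536A1 (zh) Sound signal processing method, terminal, and earphone
WO2020020375A1 (zh) Speech processing method and apparatus, electronic device, and readable storage medium
CN115775564B (zh) Audio processing method and apparatus, storage medium, and smart glasses
CN111341307A (zh) Speech recognition method and apparatus, electronic device, and storage medium
CN107863110A (zh) Safety reminder method based on smart earphones, smart earphones, and storage medium
US12411653B2 (en) Method and electronic device for detecting ambient audio signal
WO2021244056A1 (zh) Data processing method and apparatus, and readable medium
US20260089447A1 (en) Smart glasses for hearing assistance, hearing assistance method, and auxiliary system
CN108962241A (zh) Location prompting method and apparatus, storage medium, and electronic device
CN117133282B (zh) Voice interaction method and electronic device
CN114360546B (zh) Electronic device and wake-up method thereof
CN115379433B (zh) Bluetooth device pairing method and apparatus
HK40072239A (zh) Audio information processing method, electronic device, system, product, and medium
HK40072239B (zh) Audio information processing method, electronic device, system, product, and medium
WO2019183904A1 (zh) Method for automatically identifying different human voices in audio
CN115166633B (zh) Sound source direction determination method and apparatus, terminal, and storage medium
CN116405593B (zh) Audio processing method and related apparatus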

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885410

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022885410

Country of ref document: EP

Ref document number: 22885410

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 18291854

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2022885410

Country of ref document: EP

Effective date: 20240108

NENP Non-entry into the national phase

Ref country code: DE

WWG Wipo information: grant in national office

Ref document number: 2022885410

Country of ref document: EP