Directional Microphone Array Technology
Although speech recognition technology has made great strides in recent years, it still has limitations that scientists are working hard to overcome. One major limitation is environmental noise, which restricts this exciting technology to close-talk or headset-based applications. Complementing speech recognition with microphone array technology effectively resolves this limitation. Microphone array technology is based on the principle that, with more than one microphone element deployed, the speech input processing software can imitate human hearing: it detects the direction of each sound source, classifies the sources accordingly, outputs only the sound of interest and suppresses all other unwanted sound.
The end result of microphone array processing is a clean speech signal with environmental and interference noise suppressed, which is well suited to speech recognition and communication purposes.
Sound waves emitted from a single source arrive at the elements of an array, which are spaced a certain distance apart, with different time delays. As the sound source moves, the arrival delay at each element also changes. (Figure 1)
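The relationship between source direction and arrival delay can be sketched in a few lines. This is a minimal illustration, not a description of any product: it assumes a uniformly spaced linear array, a far-field source (plane wave), an angle measured from straight ahead, and a speed of sound of 343 m/s. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def arrival_delays(num_mics, spacing_m, angle_deg):
    """Arrival delay (seconds) at each element, relative to element 0.

    Assumes a uniform linear array and a far-field plane wave arriving
    at angle_deg from broadside (0 = straight ahead of the array).
    """
    sin_theta = math.sin(math.radians(angle_deg))
    return [i * spacing_m * sin_theta / SPEED_OF_SOUND for i in range(num_mics)]


def angle_from_delay(delay_s, spacing_m):
    """Invert the relation: delay between two adjacent elements -> angle."""
    sin_theta = delay_s * SPEED_OF_SOUND / spacing_m
    sin_theta = max(-1.0, min(1.0, sin_theta))  # guard against rounding
    return math.degrees(math.asin(sin_theta))


delays = arrival_delays(num_mics=4, spacing_m=0.05, angle_deg=30.0)
print([round(d * 1e6, 1) for d in delays])  # delays in microseconds
print(angle_from_delay(delays[1], spacing_m=0.05))
```

A source straight ahead gives zero delay at every element, and the delays grow as the source moves off to one side.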
By carefully summing the signals received at each array element after correcting for these time delays, the signal quality (SNR) of the sound source is improved by a factor proportional to the number of receiving elements. This process is called “Beam Forming”. Moreover, with DSP-based processing, a process called “Adaptive Beam Forming” can be used to maximize this SNR improvement. A similar process called “Adaptive Interference-Cancellation” can be used to minimize or cancel an unwanted signal. On top of this, the time delay information can be used to determine where a sound source is located. We can then choose to maximize the signal quality of sound sources that come from pre-defined locations and to cancel those that lie outside them.
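The basic (non-adaptive) summation step can be sketched as follows. This is a simplified delay-and-sum example under stated assumptions: delays are whole numbers of samples, every channel carries the same source signal, and the helper names are hypothetical. The gain figure assumes noise of equal power that is uncorrelated between elements, which is the ideal case.

```python
import math


def delay_and_sum(channels, delays_samples):
    """Align each channel by its known delay (in samples) and average them."""
    n_out = min(len(ch) - d for ch, d in zip(channels, delays_samples))
    output = []
    for n in range(n_out):
        total = sum(ch[n + d] for ch, d in zip(channels, delays_samples))
        output.append(total / len(channels))
    return output


def array_gain_db(num_mics):
    """Ideal SNR gain in dB when the noise is uncorrelated between elements."""
    return 10.0 * math.log10(num_mics)


# The same source reaches three elements 0, 2 and 4 samples late.
source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
delays = [0, 2, 4]
channels = [[0.0] * d + source for d in delays]

aligned = delay_and_sum(channels, delays)
print(aligned)
print(round(array_gain_db(4), 2), "dB ideal gain for 4 elements")
```

After alignment the source adds coherently while uncorrelated noise does not, which is where the improvement proportional to the number of elements comes from.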
BITwave Digital Microphone Array Technology
BITwave’s Digital Microphone Array eliminates the need for a headset or close-talk microphone with a superior approach to noise suppression. It separates wanted sounds from unwanted noise sources in multiple stages, each using a different DSP algorithm, so that different types of noise are suppressed by different methods. The technology can be used in any application where hands-free operation is required.
In the first stage, an “Adaptive Beam Forming” algorithm is used to optimize the SNR of the wanted signal and to keep tracking the user’s voice within the Sweet Spot Area. An LED lights up when the user is within the accepted zone. Here, compared to a conventional beam former, our adaptive solution:
Moreover, our logarithmically spaced microphones work in conjunction with our DSP software to improve SNR over a wider frequency range.
The second stage is the “Adaptive Interference-Cancellation” stage. Its purpose is to suppress the man-made noise sources (those which contain directional information) that the array identifies as outside the ‘sweet zone’. A ‘null’ or ‘dead spot’ is produced adaptively in the direction of each identified noise source, as shown in Figure 4, to optimize the cancellation.
The number of noise sources that can be suppressed by nulls is limited only by the bandwidth of the noise source and the processing power of the DSP.
Moreover, by incorporating an innovative mixture of time-domain and frequency-domain cancellation, a very high amount of cancellation can be achieved.
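To give a feel for what adaptive cancellation means, here is a generic textbook sketch using a normalized LMS (NLMS) filter in the time domain. It is not BITwave’s algorithm, which is not described in detail here. The sketch assumes a separate reference channel that picks up only the interference, and an interference path that is a short FIR filter; the function name, tap count and step size are illustrative choices.

```python
import random


def nlms_cancel(primary, reference, num_taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `primary`.

    Returns the error signal, which is the cleaned output.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        norm = eps + sum(h * h for h in history)
        # Normalized LMS update: step size scaled by the input power.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        output.append(error)
    return output


# Demo: the primary channel contains only interference, so whatever is left
# after cancellation is a direct measure of how well the filter adapted.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
primary = [0.8 * noise[n] - 0.3 * (noise[n - 1] if n > 0 else 0.0)
           for n in range(len(noise))]

cleaned = nlms_cancel(primary, noise)
residual_power = sum(e * e for e in cleaned[-200:]) / 200
print(residual_power)
```

In a real array the wanted speech would also be present in the primary channel, and the filter would have to adapt without cancelling it.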
At the end of this juncture, the array has used directional cues and other information to separate wanted man-made sounds from unwanted man-made sounds. Both of these signals are then sent to a third stage.
In the third stage, the array primarily deals with the natural noise sources, those that are diffuse or otherwise lacking in directional information. Because these noises have no directional information, they are suppressed all the time.
This stage also functions adaptively and works by identifying noises that can further be removed from the wanted channel. It continuously monitors the noises and suppresses them in the frequency domain.
Here, we use a mathematically optimized algorithm that suppresses the natural noises in human speech. This differs from the simple spectral subtraction used by others in that the resulting voice has minimal distortion. With this algorithm, we are able to suppress noise more accurately and retain more intelligibility than other frequency-domain suppression algorithms.
Lastly, this third stage also performs some recovery of wanted signal information and then outputs the final processed signal.
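For comparison, the simple spectral subtraction mentioned above can be sketched as follows. This shows only the baseline technique, not the optimized algorithm used in this stage. It assumes the per-bin magnitude spectrum of one noisy frame and an estimate of the noise magnitude are already available; the function name and the floor value are illustrative.

```python
import math


def spectral_subtract(noisy_mag, noise_mag, floor=0.1):
    """Basic power spectral subtraction on one frame of magnitudes.

    Bins where the noise estimate exceeds the signal are clamped to a
    small fraction (`floor`) of the noisy magnitude instead of going
    negative. That clamping is one source of audible distortion.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        power = y * y - n * n
        min_power = (floor * y) ** 2
        cleaned.append(math.sqrt(max(power, min_power)))
    return cleaned


# One speech-dominated bin and one noise-dominated bin.
print(spectral_subtract([5.0, 1.0], [3.0, 2.0]))
```

The abrupt clamping of noise-dominated bins is what tends to make the output of plain spectral subtraction sound distorted.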
To determine when and how we want to optimize or suppress a sound source, every sound source, including noise, needs to be analyzed. Broadly, there are two types of noise:
Man-made noises: These include unwanted voices or specific environmental noise sources, such as fans and radios, that are outside the “sweet zone”. These noise sources have directional information.
Natural noises: These include noise generated by the microphones, circuit noise, and diffuse noise sources. These noise sources contain no directional information.
In many situations it is very difficult to be 100% sure whether the captured signal contains wanted information, unwanted information, or both. All sound sources are analyzed and tracked to determine their signal quality and location. This information is then used to decide whether to enhance or suppress each source and, if so, how to suppress it. For this, we use an innovative combination of multi-dimensional mathematical and fuzzy-logic modeling to accurately identify the sources. The robustness of this combination of algorithms shows in the presence of multiple, non-stationary noise sources, which often cripple other microphone arrays.
Together, these technologies produce astonishing levels of noise suppression and interference cancellation never achieved before. Processing different types of noise with different intelligent adaptive processes (as opposed to single-juncture cancellation) results in superior delivery of the desired voice and greater suppression of unwanted noise. The suppression of unwanted noise sources in the Digital Microphone Array is 24 dB on average and can exceed 30 dB.
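To put these decibel figures in perspective, they can be converted to plain ratios with the standard definitions. The helper names below are illustrative only.

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a power ratio."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude ratio."""
    return 10.0 ** (db / 20.0)


for db in (24.0, 30.0):
    print(db, "dB:",
          round(db_to_power_ratio(db), 1), "x power,",
          round(db_to_amplitude_ratio(db), 1), "x amplitude")
```

In other words, 24 dB of suppression reduces noise power by a factor of roughly 250, and 30 dB by a factor of 1000.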
So, for the first time, this advanced microphone array technology makes accurate dictation at the PC possible without wearing a headset microphone. The following are two examples of the substantial signal-to-noise ratio improvement our MicArray technology can provide in extreme environments.
Sample Audio Files