The pattern-recognition capabilities of deep learning, a branch of artificial intelligence (AI), have spurred advances in speech and voice recognition. Today, researchers in Belgium released a study in Nature Machine Intelligence introducing a new AI deep learning model with real-time, human-like hearing capabilities that performs 2,000 times faster than state-of-the-art machine-based hearing solutions.
The machine-based hearing market is a growth opportunity. The global speech and voice recognition market is projected to reach USD 28.3 billion by 2026, growing at a CAGR of 19.8 percent from 2018 to 2026, according to Fortune Business Insights. The worldwide hearing aid market is projected to reach USD 7.3 billion by 2025, growing at a CAGR of 4.6 percent from 2019 to 2025, according to Grand View Research.
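Projections like these are straightforward compound-growth arithmetic. The sketch below back-calculates an implied base-year value from the cited 2026 speech-recognition figure; the 2018 base value is an illustration derived here, not a number reported by Fortune Business Insights.

```python
# Hypothetical sketch of a compound annual growth rate (CAGR) projection.
# The 28.3 B / 19.8% figures come from the article; the 2018 base value
# is back-calculated for illustration, not a reported market figure.

def cagr_projection(start_value, rate, years):
    """Project a value forward at a compound annual growth rate."""
    return start_value * (1 + rate) ** years

# Back out an implied 2018 base from the cited 2026 figure (USD 28.3 B, 19.8% CAGR).
implied_2018 = 28.3 / (1 + 0.198) ** 8
projected_2026 = cagr_projection(implied_2018, 0.198, 8)
print(f"implied 2018 base: ~{implied_2018:.1f} B USD")
print(f"projected 2026:    ~{projected_2026:.1f} B USD")
```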
Outside the world of biophysicists, machine learning experts, and AI medical device researchers, the problems plaguing machine-based hearing are not common knowledge. Current auditory models used for feature extraction in hearing aids, machine-based hearing, robotics, and automated speech recognition systems share two major issues—they do not run in real time, and they require heavy computing resources.
Researchers Sarah Verhulst, Deepak Baby, and Arthur Van Den Broucke at Ghent University in Belgium set out to create a new type of auditory model to herald the next generation of machine-based hearing.
Understanding auditory models requires some knowledge of human hearing, a biophysical (as opposed to biochemical) process. Hearing begins with sound traveling from the outer ear through the ear canal to the eardrum (tympanic membrane), which separates the outer ear from the middle ear. The eardrum vibrates and transmits the sound, which is then amplified by three tiny bones (ossicles): the hammer (malleus), anvil (incus), and stirrup (stapes). Next, the amplified sound waves enter the inner ear and reach the cochlea, a fluid-filled coiled structure shaped like a snail’s shell. The sound sets the cochlear fluid in motion, causing the projections (stereocilia) on the hair cells lining the cochlea’s internal membrane (basilar membrane) to bend. This bending of the stereocilia opens ion channels, generating a signal that travels from the cochlea to the brainstem (medulla) via the auditory (cochlear) nerve.
Common existing models of cochlear mechanics each have drawbacks. Many introduce distortion, according to the researchers. The gammatone filterbank model “ignores the stimulus-level dependence of cochlear filtering.” Parallel filterbank architectures exclude otoacoustic emissions and longitudinal coupling. State-of-the-art transmission-line (TL) models, which are commonly used for cochlear mechanics, rely on a cascaded system—a computationally expensive approach that rules out parallel computing during filtering.
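The gammatone drawback can be seen in a few lines of code. Below is a minimal NumPy sketch of a gammatone filterbank (4th-order, with Glasberg–Moore ERB bandwidths): because the filtering is linear, scaling the input simply scales the output, with no change in frequency tuning—the stimulus-level dependence of real cochlear filtering is absent. All parameter choices (centre frequencies, durations) are illustrative, not taken from the study.

```python
# Minimal gammatone filterbank sketch, illustrating that its filtering is
# linear (level-independent), unlike the real cochlea's level-dependent tuning.
# Centre frequencies and durations below are illustrative choices.
import numpy as np

FS = 20_000  # Hz, matching the 20-kHz sampling rate used by CoNNear

def gammatone_ir(fc, fs=FS, dur=0.05, order=4):
    """Impulse response of one gammatone filter centred at fc (Hz)."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000 + 1)      # equivalent rectangular bandwidth
    b = 1.019 * erb                          # bandwidth parameter
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def filterbank(signal, centre_freqs):
    """Filter `signal` through a bank of gammatone channels (one row each)."""
    return np.stack([np.convolve(signal, gammatone_ir(fc), mode="same")
                     for fc in centre_freqs])

t = np.arange(FS) / FS
tone = np.sin(2 * np.pi * 1000 * t)          # 1-kHz probe tone
cfs = [500, 1000, 2000]                      # illustrative centre frequencies

quiet = filterbank(0.01 * tone, cfs)
loud = filterbank(1.0 * tone, cfs)
# Linear filtering: a 100x larger input yields an exactly 100x larger output,
# with no change in tuning -- the cochlea's level dependence is missing.
print(np.allclose(loud, 100 * quiet))        # True
```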
The Belgian researchers cite computational complexity as the gating factor that has kept cochlear traveling-wave models out of real-time preprocessing.
“This complexity motivated our search for an efficient model that matches the performance of state-of-the-art analytical TL models while offering real-time execution,” the researchers wrote.
The researchers named their hybrid AI model CoNNear, a fully convolutional encoder-decoder neural network. CoNNear transforms a 20-kHz sampled acoustic waveform into cochlear basilar-membrane (BM) waveforms, simulating the cochlea’s mechanical responses in real time. The model runs on parallel CPU computations, can be accelerated with GPU computing, and can be integrated with real-time auditory deep learning applications.
“CoNNear presents an architecture with differentiable equations and operates in real time (< 7.5 ms delay) at speeds 2000 times faster than state-of-the-art biophysically realistic models,” reported the researchers. “We have high hopes that the CoNNear framework will inspire a new generation of human-like machine hearing, augmented hearing and automatic speech-recognition systems.”