This laser mosquito killer is, and always has been, a PR whitewashing campaign for Intellectual Ventures' reputation.
This device has never been built, never been purchasable, and it is ALWAYS brought up whenever IV wants to talk about how cool they are.
And I say this as someone who loosely knows and was friends with a few people that worked there. They brought up this same invention when they were talking about their work. They eventually soured on the company, once they saw the actual sausage being made.
IV is a patent troll, shaking down people doing the real work of developing products.
They trot out this invention, and a handful of others, to appear like they are a public benefit. Never mind that most of these inventions don't really exist, have never been manufactured.
They hide the extent of their holdings, they hide the byzantine network of shell companies they use to mask their holdings, and they spend a significant amount of their money lobbying (bribing).
Why do they need to hide all of this?
Look at their front page, prominently featuring the "Autoscope", for fighting malaria. Fighting malaria sounds great, they're the good guys, right?
Now do a bit of web searching to try to find out what the Autoscope is and where it's being used. It's vaporware press release articles going back 8 years.
Look at their "spinouts" page, and try to find any real substance at all on these companies. It is all gossamer, marketing speak with nothing behind it when you actually go looking for it.
Meanwhile, they hold a portfolio of more than 40,000 patents, and they siphon off billions from the real economy.
Part of their "licensing agreement" is that you can't talk badly about them after they shake you down, or else the price goes up.
I did a similar project at 18. Needless to say I didn't have enough HW and SW skills to do much since I implemented the most naive form of the TDOA algorithms as well as the most inefficient way of estimating the time difference through cross correlation. I still learnt a lot and it led me to eventually getting a PhD in SAR systems, which are actually beamformers using the movement of the platform instead of an array
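For anyone wondering what the "most naive" cross-correlation time-difference estimate looks like, here is a rough sketch. It is not the code from either project mentioned here; the signal, the 7-sample delay and the 48 kHz rate are made up for illustration.

```python
import random

def estimate_lag(ref, other, max_lag):
    """Lag (in samples) by which `other` trails `ref`, via brute-force cross-correlation."""
    def score(lag):
        # Correlate ref[n] with other[n + lag] wherever both indices are valid.
        return sum(
            ref[n] * other[n + lag]
            for n in range(len(ref))
            if 0 <= n + lag < len(other)
        )
    return max(range(-max_lag, max_lag + 1), key=score)

rng = random.Random(0)
call = [rng.gauss(0.0, 1.0) for _ in range(300)]   # stand-in for a broadband call
delay = 7                                           # hypothetical delay in samples
mic_a = call + [0.0] * delay
mic_b = [0.0] * delay + call                        # same sound, arriving later
lag = estimate_lag(mic_a, mic_b, max_lag=20)
print(lag, "samples, i.e.", lag / 48_000 * 1e6, "microseconds at an assumed 48 kHz")
```

This costs O(N * max_lag) multiplications, which is why real implementations usually do the correlation with an FFT instead.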
What were the results of your study? I’ve heard that bat lungs are so sensitive that when they fly across the pressure differential of large turbines their capillaries basically explode
I would love to do something like that to track the bats in my garden, how feasible would it be for an amateur to do as a personal project?
Any good references on where to start?
Honestly, that sounds like amazing work. I wish I could afford to get out of enterprise software engineering and just do academic software development like that.
Then you can take a Jetson (or any I2S-capable hardware with a DSP or GPU on it) and chain 16 microphones per I2S port. It would seem a lot easier to assemble and program compared to an FPGA setup.
Not OP, but I looked into this a few years ago. It was more expensive then, and only went to 20 kHz. Higher frequencies are helpful if you're listening for the hiss of leaking gas, or corona discharge of an electric arc.
The Orin has 6xI2S ports internally, so that would work up to 16*6 = 96 microphones, which is a good number. But it looks like maybe only 3 are brought out & on different dev board connectors [1]? As with a lot of design, the devil is in the details. An FPGA could be easier to configure if you need more than 96 microphones.
Look up acoustic cameras on YouTube, there are some pretty impressive demonstrations of their capability. This is one of the companies I've been watching for a while, but it looks like FLIR and some other big names are getting into it: https://www.youtube.com/@gfaitechgmbh
The one use case that is both creepy and interesting to me is recording a public space and then after the fact 'zooming in' to conversations between individuals.
Armchair comment. I would LOVE to be a grad student again and try to pair it with ultrasound speaker arrays, for medical applications. Essentially a super HIFU (High-Intensity Focused Ultrasound) with live feedback. https://en.wikipedia.org/wiki/Focused_ultrasound
I'm doing my PhD in in-air ultrasound with phased arrays, and from talking to the medical guys at conferences and labs, it's soooo much harder in solids/liquids. The frequency is significantly higher, think 1-10 MHz instead of like 40 kHz, so any normal electronics are out the window.
Hey saw your message a while back in a thread talking about continuous glucose meters and feeling tired and fatigued etc.
Mind contacting me? I'd love to chat. My email is in my profile
I would love to see this come to our various mobile devices in a nicely packaged form. I think part of what is holding back assistants, universal-translators, etc, is poor audio. Both reducing noise and being able to detect direction has a huge potential to help (I want to live-translate a group conversation around a dining table, for example).
Firstly it would be great if my phone + headphones could combine the microphones to this end. But what if all phones in the immediate vicinity could cooperate to provide high quality directional audio? (Assuming privacy issues could be addressed).
For the hard of hearing like me the killer application would be live transcription in a noisy setting like a meetup or party, with source separation and grouping of speech from different speakers. Could be life-changing.
(Android's Live Transcribe is very good now but doesn't even try to separate which words are from different speakers.)
*Automatic speech recognition (ASR) systems have progressed to the point where humans can interact with computing devices using speech. However, the distance between a device and the speaker will cause a loss in speech quality and therefore impact the effectiveness of ASR performance. As such, there is a greater need to have reliable voice capture for far-field speech recognition. The launch of Amazon Echo devices prompted the use of far-field ASR in the consumer electronics space, as it allows its users to interact with the device from several meters away by using microphone array processing techniques.*
In general the position of the microphones in space must be known precisely for the phase shifting math to be done well, and the clocks on the phones would also need to be in sync at high precision, like 10x the highest frequency sound you're picking up. In other words, to within tens of microseconds. Also, if the array mic locations are not a simple straight line, circle, or other simple geometry, the computer code (i.e. math) to milk out an improved signal becomes very difficult.
Boeing ginned up a spherical version of these and used it on 787 prototypes to identify candidates for sound deadening material.
Apparently in loud situations like airplanes, audio illusions can make a sound appear to come from a different spot than it really is. And when you have a weight budget for sound dampening material it matters if you hit the 80/20 sweet spot or not.
If somebody wants to play around with Zynq 7010's - have a look at the EBAZ4205 board. They can be bought from Aliexpress (20-30€). These are former Bitcoin Mining controllers.
Some people reverse engineered the entire thing. It can be found in GitHub. And there's an adapter plate available for getting to the GPIOs.
For a less complex entry there are also Chinese FPGAs ("Sipeed" boards, which use a GoWin FPGA). They are quite capable and the IDE is free.
I'm a bit surprised by those long "arm" PCBs. They are already doing calibration to account for some relatively large offsets: why not place each sensor on its own PCB, mount them to some carrier structure, and let calibration deal with the rest?
Huh, you're right. I expected 24-inch-long PCBs to be quite a bit more expensive, but even 4-layer boards at those sizes are still available at discount prices. I guess such thin boards could be used to fill in edges of mixed-order panels? It does make me wonder why they say "the array" was $700. Maybe assembly was extremely expensive
It doesn't seem they were really able to benefit from it all that much, though: half of them arrived defective, and they had to do quite a lot of debugging to fix them.
Starting to see more & more of this with drones. In some cases, it's for military to detect drones nearby. In others, it's being used by drone delivery companies to detect other planes in the sky in a way that is cheaper, works in low-visibility, and doesn't use the same power requirements as radar.
A similar technique is very popular in industrial automation to spot leaks in compressed air pipes and their connections from far away. These leaks are extremely loud in the ultrasonic range. The result is overlaid on a camera picture.
I've always wanted this for a videoconferencing room. A microphone array around the screen should be able to dynamically focus on the active talkers and cancel out background noise and echoes to get much better sound quality than the muddy crap we usually get.
If there were a speaker array around the screens too, you might be able to localize the audio for each person so that it seems like the sound is coming from where their head is on the screen.
Microsoft Research had papers on speaker arrays that allowed speaker focus and noise cancelling a couple of decades ago. I think the technology eventually ended up in the Kinect.
I think Cisco had something similar in their large screen meeting room video conferencing systems that could do positional audio tracking of multiple people. Could be wrong, but I think that was at least 10 years or so ago, if not more.
I wish I could rent one to figure out which device in my office has a squealing capacitor. I can hear it well enough to be driven crazy by it, but not well enough to find it. I start disconnecting things to narrow it down but then convince myself that it's my ears ringing.
I'm unsure if I'll age out of this problem, or if worse hearing will just recreate it at different thresholds.
You might have some luck with a spectrum analyzer app[1]. A fixed-pitch whine should show up as a line on the waterfall graph. If you move the phone around to different locations, you might see the line getting stronger or weaker. You can also try rotating the phone to different orientations to see if it is coming from a particular direction.
I used this to locate an annoying squeal coming from some equipment at work once. And to confirm that it wasn't imaginary.
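If you'd rather script it than eyeball a waterfall, the same idea is just "find the strongest frequency bin". A rough sketch with a deliberately naive DFT; the 48 kHz sample rate, the 12 kHz whine and the 100 Hz hum are invented test values, and `min_hz` is a parameter I added so a loud low-frequency hum doesn't win.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate, min_hz=0.0):
    """Frequency (Hz) of the strongest bin of a naive DFT, ignoring DC."""
    n = len(samples)

    def magnitude(k):
        return abs(sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        ))

    # Only look at bins between min_hz and the Nyquist frequency.
    bins = [k for k in range(1, n // 2) if k * sample_rate / n >= min_hz]
    return max(bins, key=magnitude) * sample_rate / n

FS = 48_000   # assumed sample rate
N = 480       # 10 ms of audio, so bins are 100 Hz apart
recording = [
    1.0 * math.sin(2 * math.pi * 100 * i / FS)        # loud mains-like hum
    + 0.2 * math.sin(2 * math.pi * 12_000 * i / FS)   # quiet whine
    for i in range(N)
]
print(dominant_frequency(recording, FS))                # picks the hum
print(dominant_frequency(recording, FS, min_hz=1_000))  # picks the whine
```

Tracking the magnitude of that one bin as you move the phone around is the scripted version of watching the line get stronger or weaker.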
At a rough guess from the audio samples, that array is producing an acceptance angle much narrower than any Soundfield mic is capable of. The noise source is only 45 degrees off-axis; I'd say any first-order microphone polar pattern (i.e. those a Soundfield mic is capable of) would capture more of the noise than is demonstrated here.
Of course, you can improve on the rejection of off-axis sound by instead using a microphone with a more specialized polar pattern (e.g. a shotgun mic), but then you lose the property of the pattern being steerable merely by signal processing.
Lastly, such an array of dirt cheap pressure sensitive mic capsules with some clever computation behind them strikes me as the sort of thing you could throw Moore's law at, if you could justify the quantity. Whereas, Soundfield mics don't make much sense unless you're working with very precisely machined pressure-gradient capsules.
Still, I get the feeling it'll be a while yet before this technique starts looking viable for audio production work, but it's very interesting.
This is more or less the same principle of how Amazon Echo devices work, but on steroids.
Very neat. I would be surprised if you aren’t seeing some diminishing marginal returns from all those extra mics, but I guess you’re trying to capture azimuth deltas that Echo devices don’t really care about.
I was just doing research and landed on this exact page last night! I was wondering if anyone knows how someone could mic a room and record audio from only a specific area. For my use case I want to record a couch so I can watch TV with my friends online and remove their speech + show noise from the audio. Setting up some array of mics and using them for beam steering would probably work but there's not a lot of examples I could find on GitHub with code that works in real time.
From the article: "The simplest method of beamforming is delay-and-sum (DAS)". Measure the distance from a point (the couch) to each microphone, shift each signal in the time domain to compensate for the time the sound takes to travel from that point to the microphone (in practice, delay the nearer microphones so they line up with the farthest one), and add up the signals. Pretty trivial. Basically you want the microphones to receive the couch signal at the same time, even though they are different distances away.
Make sure there is enough variation in microphone distances for this method to be effective.
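A minimal offline sketch of that delay-and-sum recipe, in case it helps. Everything concrete here (2D room coordinates, mic positions, 48 kHz, whole-sample delays, a single click as the source) is an assumption for the demo; a real-time version would work on streaming blocks and probably use fractional delays.

```python
import math

def arrival_delays(mics, point, fs, c=343.0):
    """Extra travel time to each mic, in whole samples, relative to the nearest mic."""
    dists = [math.dist(m, point) for m in mics]
    nearest = min(dists)
    return [round((d - nearest) / c * fs) for d in dists]

def delay_and_sum(signals, mics, focus, fs, c=343.0):
    """Time-align every channel on `focus`, then average the channels."""
    lags = arrival_delays(mics, focus, fs, c)
    # Reading channel k at index i + lag is equivalent to delaying the nearer
    # mics so that sound from `focus` lines up across all channels.
    usable = min(len(s) - lag for s, lag in zip(signals, lags))
    return [
        sum(s[i + lag] for s, lag in zip(signals, lags)) / len(signals)
        for i in range(usable)
    ]

# Hypothetical room layout in metres, and a single click coming from the couch.
FS = 48_000
MICS = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
COUCH, TV = (4.0, 5.0), (-3.0, 1.5)
signals = []
for lag in arrival_delays(MICS, COUCH, FS):
    channel = [0.0] * 1000
    channel[100 + lag] = 1.0      # the click reaches farther mics later
    signals.append(channel)

on_couch = delay_and_sum(signals, MICS, COUCH, FS)
on_tv = delay_and_sum(signals, MICS, TV, FS)
print(max(on_couch), max(on_tv))
```

Steered at the couch, the clicks from all four channels land on the same sample and add up; steered at the TV position they stay smeared apart, which is the whole trick.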
I wonder how well this would work with laser microphones on a pane of glass. Can you infer keystrokes with near infrared laser? That is, can you identify the heatmap of keystroke events to infer which keyboard they're using, then replay the tape to identify the strings of characters being typed? Can you localize the turning of pages with UV?
This beamforming effect only works well when each sensor is getting a dramatic enough "different angle" on the signal that each one can use phase shifting to cancel out other noise, but with a laser there's not really any noise to cancel out (i mean you're just monitoring a vibrational spot on a window), and you also don't have a far enough "different angle" to shine from, if you're monitoring from one spot.
However having multiple lasers from multiple different locations might be able to create an improved signal if all signals are averaged, but it wouldn't really be due to the phase shifting that's used in beamforming.
Didn't Israeli students show that you can recover audio from the vibrations of bulb filament with a fast photo diode?
I'd test that with a CCD line sensor plus a wide aperture lens, reading it out at 8 kHz. Then you have 128 audio pixels that can cover an entire city.
Line of sight might be an issue there. I'm thinking more high-end clandestine eavesdropping. Fun fact: curtains are a pretty good defeat for laser microphones, but if the building is really old and made of solid stone, you can point at the rock instead!
The rock?! That’s incredible. I would have guessed it was too dense to pick up normal speaking volume. Then again, even the window glass vibration seems pretty magical to me.
Because the distance between the mics needs to be 1) large and 2) consistent. It would work with a grid but the mics near the middle would be "underutilized" (not maximally taken advantage of), and also in a grid the mathematics is horrendous, but with a circle it's simple.
Could this be combined with a smaller number of high quality mics and then machine learning or something else incorporating them to boost the overall quality while maintaining all the other features?
afaik, it really depends on the spatial structure of the audio field.
think nyquist sampling rates, applied to space, and you can't apply a low-pass filter just because you don't care about higher-order signals. that means that for any given audio environment, there will be some "spatial spectrum" of signal, and you need to sample it densely enough to avoid aliasing.
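to put rough numbers on that: for a uniformly spaced array the usual rule of thumb is element spacing of at most half a wavelength at the highest frequency present. a tiny sketch, assuming 343 m/s for the speed of sound (the function name is just mine):

```python
def max_mic_spacing(max_freq_hz, speed_of_sound=343.0):
    """Largest uniform element spacing (m) that avoids spatial aliasing: half a wavelength."""
    return speed_of_sound / (2.0 * max_freq_hz)

for f in (1_000, 8_000, 20_000):
    print(f"{f:>6} Hz -> {max_mic_spacing(f) * 1000:.1f} mm")
```

so covering the full audible band without aliasing needs mics under a centimetre apart, which is part of why irregular or sparse layouts get used instead.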
"As part of the calibration, the speed of sound is also a parameter which is optimized to obtain the best model of the system, which allows this whole procedure to act as a ridiculously overengineered thermometer."
Reminds me of the electronics adage: "all sensors are temperature sensors, some measure other things as well."
Back in high school, I built (with some parental assistance) an apparatus to measure how quickly the pressure would drop (in a pressurized cylinder) when a very small hole allowed air to leak out.
Turns out, not only can you measure temperature that way, but you can also extrapolate the graph out to find absolute zero (IIRC my result was out by about 20 kelvin, which I think is pretty damn good for a high-school-garage project).
I love these kinds of inadvertent measurements. One of my favorite examples is that a sufficiently accurate IMU can get you relatively accurate longitude measurements from the Coriolis effect.
Asahi Linux (and likely macOS too) uses the resistance of the speaker coils to detect overheating of those speakers and reduces the volume.
https://github.com/AsahiLinux/speakersafetyd
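I haven't checked how speakersafetyd actually does it, so this is not its algorithm, just the generic "resistance as thermometer" idea: copper's resistance rises by roughly 0.39% per kelvin, so a resistance reading maps back to a temperature. The 4 ohm / 4.8 ohm coil values are made up.

```python
ALPHA_COPPER = 0.00393   # 1/K, approximate temperature coefficient of copper near 20 C

def coil_temperature_c(r_now, r_ref, t_ref_c=20.0, alpha=ALPHA_COPPER):
    """Estimate coil temperature from its resistance, assuming a linear copper model."""
    return t_ref_c + (r_now / r_ref - 1.0) / alpha

# Hypothetical coil: 4.0 ohm at 20 C, now measuring 4.8 ohm.
print(f"{coil_temperature_c(4.8, 4.0):.0f} C")
```

A 20% rise in resistance already corresponds to a coil roughly 50 K hotter than the reference, so the effect is easy to measure.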
Interesting. If the voltage across the speaker voice coil can be sampled with enough sensitivity at a fast-enough rate, you have an undocumented microphone.
This is true of all speakers
Exactly.
Is that the same thing where a flat-earther tried to measure something with an expensive laser gyro and kept finding that Earth was rotating?
I think the most you can tell from an IMU or gyro is that there is a change in velocity in a direction aligning with East-West when there is a change in location and that the change in velocity is greater when the location changes in line with North-South. The change in velocity would be greater as one approaches the poles and lesser at the equator.
Thought experiment: suppose I zeroed my IMU at the North Pole and traveled in a straight line away from the pole along longitude zero, following the guidance of the IMU. By the time I got to 45° latitude, I’d be traveling westward at 1,180 km/h (Mach 0.95) to keep the IMU at zero.
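That figure is roughly the eastward speed of the ground itself at 45° latitude. A quick sanity check, assuming a spherical Earth with the WGS-84 equatorial radius and one rotation per sidereal day (both constants are my assumptions, not from the comment above):

```python
import math

EARTH_RADIUS_M = 6_378_137.0    # assumed: WGS-84 equatorial radius
SIDEREAL_DAY_S = 86_164.0905    # assumed: one full rotation relative to the stars

def surface_speed_kmh(latitude_deg):
    """Eastward speed of a point on the ground due to Earth's rotation."""
    omega = 2.0 * math.pi / SIDEREAL_DAY_S                       # rad/s
    radius_at_lat = EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))
    return omega * radius_at_lat * 3.6                           # m/s -> km/h

for lat in (0, 45, 90):
    print(f"{lat:>2} deg: {surface_speed_kmh(lat):7.1f} km/h")
```

With these constants the 45° value comes out a few km/h above 1,180, so the number in the thought experiment is in the right ballpark.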
The flat earther used a fibre optic gyro. You don't "zero" it; it continuously outputs a measurement of its own angular rate around its sensitive axis. A 3-axis gyro sitting still on Earth will read about 15 degrees/hour around wherever the Earth's axis is oriented.
Slight correction, latitude, not longitude.
The earth’s surface closer to the poles has less distance to travel for any rotation than the surface closer to the equator. As a result, inertial navigation systems used over long distances must be adjusted. IIRC, this is also the case for artillery firing computations.
https://www.oxts.com/blog/going-round-circles-earth-rotation...
https://www.britannica.com/science/latitude
Coriolis corrections are thrown into sniper ballistic calculations, too. Not a huge effect in most conditions, but not zero, and there have been a lot of long shots in the past two decades.
"Oi! Suzy!"
I believe this is one of the initial steps an aircraft INS uses to find north while it is aligning, but it's been too long since I had aircraft systems theory in the front of my brain.
Yes, from earth rotation the INS could figure out true north if the latitude is known. Or figure out the latitude if current heading is known. But normally it's aligned with a starting position from pilot input or GPS.
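To make the geometry concrete, here is an idealized sketch: a perfectly level, stationary, noise-free 3-axis gyro with axes x forward, y right, z down (my convention, not from any particular INS). In that toy case the horizontal part of the measured Earth rate points at true north and the vertical part gives the latitude. Real alignment has to fight bias and noise far larger than this suggests.

```python
import math

EARTH_RATE = 2.0 * math.pi / 86_164.0905   # rad/s, assumed sidereal rotation rate

def simulate_gyro(lat_deg, heading_deg):
    """Ideal rates seen by a level, stationary gyro (x forward, y right, z down)."""
    lat, hdg = math.radians(lat_deg), math.radians(heading_deg)
    north = EARTH_RATE * math.cos(lat)     # horizontal component, points to true north
    down = -EARTH_RATE * math.sin(lat)     # vertical component (points up in the north)
    return (north * math.cos(hdg), -north * math.sin(hdg), down)

def heading_and_latitude(wx, wy, wz):
    """Recover heading (degrees clockwise from true north) and latitude in degrees."""
    heading = math.degrees(math.atan2(-wy, wx)) % 360.0
    latitude = math.degrees(math.atan2(-wz, math.hypot(wx, wy)))
    return heading, latitude

rates = simulate_gyro(lat_deg=45.0, heading_deg=30.0)
print(heading_and_latitude(*rates))
print(math.degrees(EARTH_RATE) * 3600.0, "deg/hour total Earth rate")
```

The last line is where the "about 15 degrees/hour" figure comes from: 360 degrees per sidereal day.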
https://en.m.wikipedia.org/wiki/Inertial_measurement_unit
I just learned how the Duracell Powercheck© worked, which was done with temperature.
https://youtu.be/zsA3X40nz9w?si=oGg2wdUlLXSDxpsN
Is there one saying “All electronic devices are smoke machines, some can compute too”?
Similarly, diesel engines come with a reserve fuel supply that you can accidentally use once. (diesel engines will happily run on engine oil when warm)
"All diodes are light-emitting if you try hard enough"
All diodes are also light SENSING if you try hard enough.
You don't have to try hard. Just use it as a photodiode and it magically works. However, if it's inside a plastic case that blocks light, it doesn't.
Due to some law about entropy, efficient processes are necessarily reversible. That's why electric motors - some of the most efficient machines ever invented - are also generators.
> However, if it's inside a plastic case that blocks light, it doesn't.
You want an ordinary diode to allow current to flow easily when it senses light? Simple: shine a powerful laser at the plastic-encased diode and it will melt the plastic and liquify the metal, fusing it together and allowing current to flow again. See? You just needed to try harder.
All diodes are photodiodes; one has to be esp. careful with glass-encapsulated diodes. I have had that bite me before.
Ah, the light emitting resistor. The moment when you realize why it's called Ohm's Law.
"All diodes are light-emitting at least once"
Hahaha yea
I've seen that in electronics lab a few times. The "temporarily light emitting diode"
"Inside every amplifier is an oscillator trying to get out."
"All electronics are hand-warmers if miscalibrated correctly enough."
> Reminds me of the electronics adage: "all sensors are temperature sensors, some measure other things as well."
I wanna say that’s a Bob Pease quote but I can’t find an attribution to it.
I first encountered it in Elecia White's book Making Embedded Systems, but the attribution is anonymous and whom it's attributed to may have heard it elsewhere.
It does act as a thermometer, if and only if the altitude remains constant. The speed of sound fluctuates with both temperature and altitude
I’m not sure how the speed of sound could depend on altitude, even in principle. The air doesn’t know where it is!
Putting that aside, in an ideal gas, the speed of sound depends on the composition of the gas and the temperature and, interestingly, does not depend on pressure, and pressure is the main way that the altitude would affect the speed of sound. So measuring the speed of sound in air actually makes for a pretty good thermometer.
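For reference, the ideal-gas relation is c = sqrt(gamma * R * T / M), which inverts directly into a thermometer. A small sketch, assuming dry air with gamma = 1.4 and a mean molar mass of about 0.02896 kg/mol (those two values are my assumptions; humidity shifts both a little):

```python
import math

R = 8.314462618      # J/(mol*K), molar gas constant
GAMMA_AIR = 1.4      # assumed heat capacity ratio of dry air
M_AIR = 0.0289647    # kg/mol, assumed mean molar mass of dry air

def speed_of_sound(temp_k, gamma=GAMMA_AIR, molar_mass=M_AIR):
    """Ideal-gas speed of sound in m/s; note that pressure does not appear."""
    return math.sqrt(gamma * R * temp_k / molar_mass)

def temperature_from_speed(c, gamma=GAMMA_AIR, molar_mass=M_AIR):
    """Invert the relation: temperature in kelvin from a measured speed of sound."""
    return c * c * molar_mass / (gamma * R)

c20 = speed_of_sound(293.15)
print(f"{c20:.1f} m/s at 20 C")
print(f"{temperature_from_speed(c20 + 0.6) - 293.15:.2f} K warmer if c is 0.6 m/s higher")
```

Near room temperature the slope is roughly 0.6 m/s per kelvin, so a calibration that pins the speed of sound to a fraction of a m/s is resolving temperature to well under a degree, at least in this idealized model.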
https://en.wikipedia.org/wiki/Speed_of_sound
In liquids the speed of sound is related to the density, I would have thought similar for air but I see your point. Very insightful!
From your own link:
"The speed has a weak dependence on frequency and pressure in ordinary air, deviating slightly from ideal behavior."
"The speed of sound is raised by humidity. The difference between 0% and 100% humidity is about 1.5 m/s at standard pressure and temperature, but the size of the humidity effect increases dramatically with temperature."
"Slight" can matter significantly in an application like this.
Can an ideal gas of same volume, mass and temperature be brought to different pressures?
https://courses.lumenlearning.com/suny-physics/chapter/13-3-...
Not unless you change the average mass of the molecules.
An ideal gas’ pressure is a function of the number of particles per unit volume, its temperature, and nothing else. If you do anything involving adding or removing heat or changing the volume or pressure, you probably also need to know the specific heat at constant volume and the specific heat at constant pressure or, frequently, their ratio. That ratio is called the adiabatic index or the heat capacity ratio, it’s written as gamma, and it’s the last parameter in the speed of sound of an ideal gas. Interestingly, it doesn’t vary all that much between different gases.
Right, it gets even worse: air pressure is not only altitude-dependent but fluctuates even at constant altitude. The pressure (altitude) dependence is comparatively weak, though.
one might say air pressure changes constantly as we speak.
Isn't air pressure the only thing that microphones actually measure?
By definition, sure. But one always needs some effect which changes some electrical property. We can't just hook up an ADC (analog-to-digital converter) to thin air and hope for the best.
In practice most microphones measure the displacement of microscopic membranes, which are deformed by the air pressure. The next question then becomes how to measure microscopic movements of a tiny membrane. Turns out the membrane forms part of a capacitor and the electrical characteristics of capacitors depend on their geometry.
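As a back-of-the-envelope illustration of "capacitance depends on geometry", the parallel-plate formula is C = epsilon_0 * A / d. The capsule dimensions below (1 mm^2 membrane, 4 um gap, 10 nm deflection) are invented round numbers, not data for any real part.

```python
EPSILON_0 = 8.8541878128e-12   # F/m, vacuum permittivity

def plate_capacitance(area_m2, gap_m):
    """Parallel-plate capacitance in farads, ignoring fringing fields."""
    return EPSILON_0 * area_m2 / gap_m

# Hypothetical capsule geometry, for scale only.
area, gap, deflection = 1e-6, 4e-6, 10e-9
c_rest = plate_capacitance(area, gap)
c_pressed = plate_capacitance(area, gap - deflection)
print(f"rest: {c_rest * 1e12:.3f} pF, change: {(c_pressed - c_rest) * 1e15:.2f} fF")
```

The change is a tiny fraction of an already tiny capacitance, which is why the readout electronics sit right next to the membrane.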
That is not necessarily true.
There are at least 4 different types of microphones: condenser, which does in fact form part of a capacitor; dynamic, which is effectively a linear generator (coil attached to membrane); ribbon, which is a change in resistance as a small ribbon flexes; and piezoelectric, which is some black magic with crystals.
Sure, that's why I wrote most microphones.
There are also some exotic principles like laser or radar microphones using interferometry.
https://en.m.wikipedia.org/wiki/Laser_microphone
https://ieeexplore.ieee.org/document/7808865
I think popular is very situational though.
For me I see a lot more dynamic than condensers but I guess if you are talking about what is in like every single IOT thingamabob then you might be right there.
Fascinating. Is there a book about the history of microphones?
I find this to all be in the realm of "I don't believe you that any of this works at all" if I didn't have a lifetime of experience with the fruits of successfully-functioning microphones.
Air pressure differentials, to be precise!
Many types measure the derivative of air pressure. One that measures absolute air pressure can be used for calibration.
The speed of sound fluctuates with density. Altitude and temperature both change density.
Oh yeah. I realised this the day I discovered my fancy digital SLR was a thermometer: https://entropicthoughts.com/does-my-dslr-have-dead-pixels
A lot of people like myself consider heat a form of light, but I guess a photographer would be thinking just of visible light. They say that about 50% of the sun's light emissions come in the infrared frequencies.
That seems like a mistake since heat can transfer e.g. via contact without any electromagnetic emission. In fact, that is what I think happens with the sensor also, given that there is an IR filter in front of it.
But I may misunderstand your comment.
I once did a project to do multilateration of bats (the flying mammal) using an array of 4 microphones arranged in a big Y shape on the ground. Using the time difference of arrival at the four microphones, we could find the positions of each bat that flew over the array, as well as identify the species. It was used for an environmental study to determine the impact of installing wind turbines. Fun times.
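To make the time-difference-of-arrival idea concrete, here is a rough Python sketch. It is not the method used in that project: the Y-shaped mic layout, the speed of sound, and the brute-force grid search are all assumptions picked to keep the example short. A real system would more likely use a least-squares solver.

```python
import math

C = 343.0  # assumed speed of sound in m/s

# Hypothetical Y-shaped layout on the ground (metres): centre plus three arms.
MICS = [(0.0, 0.0, 0.0), (0.0, 10.0, 0.0), (-8.0, -6.0, 0.0), (8.0, -6.0, 0.0)]

def tdoas(source, mics=MICS, c=C):
    """Arrival-time differences (s) at mics 1..3 relative to mic 0."""
    t = [math.dist(source, m) / c for m in mics]
    return [ti - t[0] for ti in t[1:]]

def locate(measured, step=1.0, extent=20, height=20):
    """Brute-force search over a grid for the point whose TDOAs fit best."""
    best, best_err = None, float("inf")
    n = int(extent / step)
    for ix in range(-n, n + 1):
        for iy in range(-n, n + 1):
            for iz in range(1, int(height / step) + 1):  # above ground only
                p = (ix * step, iy * step, iz * step)
                err = sum((a - b) ** 2 for a, b in zip(tdoas(p), measured))
                if err < best_err:
                    best, best_err = p, err
    return best

true_bat = (3.0, -4.0, 7.0)          # made-up position for the demo
estimate = locate(tdoas(true_bat))   # noise-free measurements
```

With four microphones there are three independent time differences, which is just enough for the three unknown coordinates once the search is restricted to points above the ground.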
Reminds me of Intellectual Ventures' Optical Fence, developed to track and kill mosquitoes with short laser pulses.
As a side-effect of the precision needed to spatially locate the mosquitoes, they could detect different wing beat frequencies that allowed target discrimination by sex and species.
Where can I buy one?
This laser mosquito killer is, and always has been, a PR whitewashing campaign for Intellectual Ventures' reputation.
This device has never been built, never been purchasable, and it is ALWAYS brought up whenever IV wants to talk about how cool they are.
And I say this as someone who loosely knows and was friends with a few people that worked there. They brought up this same invention when they were talking about their work. They eventually soured on the company, once they saw the actual sausage being made.
IV is a patent troll, shaking down people doing the real work of developing products.
They trot out this invention, and a handful of others, to appear like they are a public benefit. Never mind that most of these inventions don't really exist, have never been manufactured.
They hide the extent of their holdings, they hide the byzantine network of shell companies they use to mask their holdings, and they spend a significant amount of their money lobbying (bribing).
Why do they need to hide all of this?
Look at their front page, prominently featuring the "Autoscope", for fighting malaria. Fighting malaria sounds great, they're the good guys, right? Now do a bit of web searching to try to find out what the Autoscope is and where it's being used. It's vaporware press release articles going back 8 years.
Look at their "spinouts" page, and try to find any real substance at all on these companies. It is all gossamer, marketing speak with nothing behind it when you actually go looking for it.
Meanwhile, they hold a portfolio of more than 40,000 patents, and they siphon off billions from the real economy. Part of their "licensing agreement" is that you can't talk badly about them after they shake you down, or else the price goes up.
They are rent-seeking parasites.
I don't think you can. This kind of laser device is wildly dangerous.
I did a similar project at 18. Needless to say I didn't have enough HW and SW skills to do much, since I implemented the most naive form of the TDOA algorithms as well as the most inefficient way of estimating the time difference through cross correlation. I still learnt a lot and it led me to eventually getting a PhD in SAR systems, which are actually beamformers using the movement of the platform instead of an array.
What were the results of your study? I’ve heard that bat lungs are so sensitive that when they fly across the pressure differential of large turbines their capillaries basically explode.
I would love to do something like that to track the bats in my garden. How feasible would it be for an amateur to do as a personal project? Any good references on where to start?
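For the curious, the "naive" cross-correlation approach looks roughly like the Python sketch below: slide one recording against the other and keep the lag with the highest score. The sample rate and the synthetic burst are invented for the demo, and this is the slow O(N * lags) version; an FFT-based correlation is the usual faster route.

```python
import math

def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref.

    A positive result means sig is a delayed copy of ref.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += r * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

fs = 48_000  # assumed sample rate in Hz
# Synthetic decaying chirp-like burst (made up for the demo).
burst = [math.exp(-i / 40.0) * math.sin(0.002 * i * i) for i in range(200)]
mic_a = [0.0] * 50 + burst + [0.0] * 150
mic_b = [0.0] * 67 + burst + [0.0] * 133   # same burst, 17 samples later

lag = estimate_delay(mic_a, mic_b, max_lag=40)
tdoa_seconds = lag / fs
```

The resulting time difference per microphone pair is what a TDOA solver takes as input.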
That sounds super interesting. Is there a write up somewhere of the project?
I had no idea they were mammals until this comment. I thought they were furry birds!
It is not unreasonable to think of bats as flying mice.
In Swedish that is almost exactly what they are called, bat translates to "fladdermus" which is "fladder" (flutter) and "mus" (mouse).
That sounds like a fun project. Was it part of a research grant?
Honestly, that sounds like amazing work. I wish I could afford to get out of enterprise software engineering and just do academic software development like that.
> bats (the flying mammal)
As opposed to?
Baseball bats?
Here is a nice article on a study of baseball bats using microphones. https://www.acs.psu.edu/drussell/bats/papers/AcousticsToday_...
I'm curious, why haven't you used TDM I2S microphones for your array and used PDM instead?
I understand that the ICS-52000 is relatively low cost ($2/100pcs) and there are even breakout boards available with 4 microphones, which can be chained to 8 or 16, like https://www.cdiweb.com/datasheets/notwired/ds-nw-aud-ics5200...
Then you can take a Jetson (or any I2S-capable hardware with a DSP or GPU on it) and chain 16 microphones per I2S port. It would seem a lot easier to assemble and program, compared to an FPGA setup.
Not OP, but I looked into this a few years ago. It was more expensive then, and only went to 20 kHz. Higher frequencies are helpful if you're listening for the hiss of leaking gas, or corona discharge of an electric arc.
The Orin has 6xI2S ports internally, so that would work up to 16*6 = 96 microphones, which is a good number. But it looks like maybe only 3 are brought out & on different dev board connectors [1]? As with a lot of design, the devil is in the details. An FPGA could be easier to configure if you need more than 96 microphones.
My notes:
ICS-52000 $3.50, 20 kHz
ICS-41350 $1.05, 40 kHz
SPH0641LU4H-1 $1.45, 80 kHz+
[1] https://docs.nvidia.com/jetson/archives/r34.1/DeveloperGuide...
Look up acoustic cameras on YouTube, there are some pretty impressive demonstrations of their capability. This is one of the companies I've been watching for a while, but it looks like FLIR and some other big names are getting into it: https://www.youtube.com/@gfaitechgmbh
The one use case that is both creepy and interesting to me is recording a public space and then after the fact 'zooming in' to conversations between individuals.
Armchair comment. I would LOVE to be a grad student again and try to pair it with ultrasound speaker arrays, for medical applications. Essentially a super HIFU (High-Intensity Focused Ultrasound) with live feedback. https://en.wikipedia.org/wiki/Focused_ultrasound
I do my PhD in in-air ultrasound with phased arrays and talk to the medical guys at conferences/labs, and it's soooo much harder in solids/liquids. The frequency is significantly higher, think 1-10 MHz instead of like 40 kHz, so any normal electronics are out the window.
Then, why not be a grad student again?
Maybe they want to afford dinner?
Hey saw your message a while back in a thread talking about continuous glucose meters and feeling tired and fatigued etc. Mind contacting me? I'd love to chat. My email is in my profile
TANSTAAFL, but student loans too.
I may be the FUS grad student you seek. Reach out via profile email if you want to chat. Cheers!
Medical applications would presumably require contact coupling and not through air?
I would love to see this come to our various mobile devices in a nicely packaged form. I think part of what is holding back assistants, universal-translators, etc, is poor audio. Both reducing noise and being able to detect direction has a huge potential to help (I want to live-translate a group conversation around a dining table, for example).
Firstly it would be great if my phone + headphones could combine the microphones to this end. But what if all phones in the immediate vicinity could cooperate to provide high quality directional audio? (Assuming privacy issues could be addressed).
For the hard of hearing like me the killer application would be live transcription in a noisy setting like a meetup or party, with source separation and grouping of speech from different speakers. Could be life-changing.
(Android's Live Transcribe is very good now but doesn't even try to separate which words are from different speakers.)
* Automatic speech recognition (ASR) systems have progressed to the point where humans can interact with computing devices using speech. However, the distance between a device and the speaker will cause a loss in speech quality and therefore impact the effectiveness of ASR performance. As such, there is a greater need to have reliable voice capture for far-field speech recognition. The launch of Amazon Echo devices prompted the use of far-field ASR in the consumer electronics space, as it allows its users to interact with the device from several meters away by using microphone array processing techniques.*
https://assets.amazon.science/da/c2/71f5f9fa49f585a4616e49d5...
In general the position of the microphones in space must be known precisely for the phase shifting math to be done well, and also the clocks on the phones would need to be in sync at high precision, like 10x the highest frequency sound you're picking up. In other words, to within ten-thousandths of a second or better. Also, if the array mic locations are not a simple straight line, circle, or other simple geometry, the computer code (i.e. math) to milk out an improved signal becomes very difficult.
I believe modern MacBook Pros already have multiple microphones that probably do some phased-array magic.
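A quick back-of-the-envelope in Python for that rule of thumb. The factor of 10 is taken from the comment above; the 4 kHz upper limit for speech and the 343 m/s speed of sound are assumptions of mine:

```python
C = 343.0  # assumed speed of sound, m/s

def sync_budget(f_max_hz, factor=10):
    """Timing tolerance (s) and the path-length error (m) it corresponds to."""
    tolerance_s = 1.0 / (factor * f_max_hz)
    return tolerance_s, C * tolerance_s

tol, path_err = sync_budget(4_000.0)  # 2.5e-5 s, about 8.6 mm of travel
```

So under these assumptions the phones would need to agree on time to a few tens of microseconds, and on relative position to within several millimetres.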
Pretty much every device does, the trick always was if it actually worked, which Apple is assuredly great at. (source: worked on Google Assistant)
It's already kind of implemented.
Boeing ginned up a spherical version of these and used it on 787 prototypes to identify candidates for sound deadening material.
Apparently in loud situations like airplanes, audio illusions can make a sound appear to come from a different spot than it really is. And when you have a weight budget for sound dampening material it matters if you hit the 80/20 sweet spot or not.
Wow, you can refocus the direction after the audio is recorded!
This would be cool to mix with VR, so you could hear different conversations as you move around a virtual room
If somebody wants to play around with Zynq 7010's - have a look at the EBAZ4205 board. They can be bought from Aliexpress (20-30€). These are former Bitcoin Mining controllers.
Some people reverse engineered the entire thing. It can be found in GitHub. And there's an adapter plate available for getting to the GPIOs.
For a less complex entry there are also Chinese FPGAs ("Sipeed" boards, which use a GoWin FPGA). They are quite capable and the IDE is free.
Xilinx tool chain is also no-cost.
I'm a bit surprised by those long "arm" PCBs. They are already doing calibration to account for some relatively large offsets: why not place each sensor on its own PCB, mount them to some carrier structure, and let calibration deal with the rest?
PCB manufacturing is cheap. I put 20 parts, 1.5 inch by 24 inch, into PCBWay and ended up with a final delivered cost of 240 dollars.
Not having to deal with wiring that many individual boards, and all the days of headaches tracking down issues, is well worth it in my book.
Huh, you're right. I expected 24-inch-long PCBs to be quite a bit more expensive, but even 4-layer boards at those sizes are still available at discount prices. I guess such thin boards could be used to fill in edges of mixed-order panels? It does make me wonder why they say "the array" was $700. Maybe assembly was extremely expensive.
It doesn't seem they were really able to benefit from it all that much, though: half of them arrived defective, and they had to do quite a lot of debugging to fix them.
Starting to see more & more of this with drones. In some cases, it's for military to detect drones nearby. In others, it's being used by drone delivery companies to detect other planes in the sky in a way that is cheaper, works in low-visibility, and doesn't use the same power requirements as radar.
Nice. It would be cool if this project could cleanly separate sources based on location.
That would be a bit like a lightfield camera, where you can edit the focusing parameters after the image has already been taken, but now with sound.
https://en.wikipedia.org/wiki/Light_field_camera
I believe it can, there's a demo under the "Directional Audio" section, unless I misunderstand you.
I’m still sad these didn’t become a thing. I don’t need a 48MP camera phone. No seriously. I do not.
If you can get a microlens array in front of that 48MP imager, you can have the light field camera you seek.
What is the most practical application for this technology? Could you use it to pinpoint sounds coming from a car like a squeak?
A similar technique is very popular in industrial automation to spot leaks in compressed air pipes and their connections from far away. These leaks are extremely loud in the ultrasonic range. It's overlaid with a camera picture.
That's ultra expensive gear.
I've always wanted this for a videoconferencing room. A microphone array around the screen should be able to dynamically focus on the active talkers and cancel out background noise and echoes to get much better sound quality than the muddy crap we usually get.
If there were a speaker array around the screens too, you might be able to localize the audio for each person so that it seems like the sound is coming from where their head is on the screen.
Shure sells a variety of array microphones (and the software) that handles similar things. I've never used one, but heh.
https://www.shure.com/en-US/products/microphones/mxa710
https://www.shure.com/en-US/products/microphones/mxa920
Beamforming is standard in modern conference room gear. It's being used for making a video focus on the active speaker and optimizing his audio.
Have a look at the "Meeting Owl" for example.
It works great up to a limit (around 5m) then you will need additional microphones closer to the speaker.
XMOS targets this space specifically.
https://www.xmos.com/ a descendant of the Transputer.
Microsoft Research had papers on speaker arrays that allowed speaker focus and noise cancelling a couple of decades ago. I think the technology eventually ended up in the Kinect.
I think Cisco had something similar in their large screen meeting room video conferencing systems that could do positional audio tracking of multiple people. Could be wrong, but I think that was at least 10 years or so ago, if not more.
You just need to buy actual video conferencing gear, this is par for the course.
I wish I could rent one to figure out which device in my office has a squealing capacitor. I can hear it well enough to be driven crazy by it, but not well enough to find it. I start disconnecting things to narrow it down but then convince myself that it's my ears ringing.
I'm unsure if I'll age out of this problem, or if worse hearing will just recreate it at different thresholds.
You might have some luck with a spectrum analyzer app[1]. A fixed-pitch whine should show up as a line on the waterfall graph. If you move the phone around to different locations, you might see the line getting stronger or weaker. You can also try rotating the phone to different orientations to see if it is coming from a particular direction.
I used this to locate an annoying squeal coming from some equipment at work once. And to confirm that it wasn't imaginary.
---
[1] On Android, I like these two:
Spectroid (https://play.google.com/store/apps/details?id=org.intoorbit....). If you use this, consider turning on the waterfall display in the settings.
Spectral Audio Analyzer (https://play.google.com/store/apps/details?id=radonsoft.net....). This has more color options for the waterfall display.
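If you would rather script the same idea than use an app, the core of it is just "find the strongest spectral line and watch its level as you move". A minimal Python sketch using a deliberately naive DFT; the sample rate, frame length, and 15 kHz test tone are assumptions for the demo, and real recordings would need windowing and averaging:

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (fine for short frames)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

def dominant_frequency(samples, fs):
    """Frequency (Hz) and magnitude of the strongest non-DC bin."""
    mags = magnitude_spectrum(samples)
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return k * fs / len(samples), mags[k]

fs = 48_000          # assumed sample rate, Hz
n = 480              # 10 ms frame, so bins are 100 Hz apart
whine_hz = 15_000.0  # made-up whine pitch for the demo
frame = [0.2 * math.sin(2 * math.pi * whine_hz * i / fs) for i in range(n)]
freq, level = dominant_frequency(frame, fs)
```

Comparing `level` for frames recorded at different spots in the room is the scripted equivalent of watching the waterfall line get brighter or dimmer.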
Phyphox is a great sensor suite app for undergrad Physics experiments, and it includes a spectrum analyzer. Also, it supports both iOS and Android.
The tech is beamforming .. the applications are AV conferencing, camera tracking, voice lift, or sound reinforcement
What about a soundfield microphone? Does about the same thing and the electronics can be done in the analogue domain.
At a rough guess from the audio samples, that array is producing an acceptance angle much narrower than any Soundfield mic is capable of. The noise source is only 45 degrees off-axis; I'd say any first-order microphone polar pattern (i.e. those a Soundfield mic is capable of) would capture more of the noise than is demonstrated here.
Of course, you can improve on the rejection of off-axis sound by instead using a microphone with a more specialized polar pattern (e.g. a shotgun mic), but then you lose the property of the pattern being steerable merely by signal processing.
Lastly, such an array of dirt cheap pressure sensitive mic capsules with some clever computation behind them strikes me as the sort of thing you could throw Moore's law at, if you could justify the quantity. Whereas, Soundfield mics don't make much sense unless you're working with very precisely machined pressure-gradient capsules.
Still, I get the feeling it'll be a while yet before this technique starts looking viable for audio production work, but it's very interesting.
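To put rough numbers on the first-order claim: a first-order polar pattern can be written as a + (1 - a) * cos(theta), with a between 0 (figure-8) and 1 (omni). A small Python sketch of the attenuation at 45 degrees off-axis; the helper names are hypothetical, and the parameter values for the named patterns are the usual textbook ones:

```python
import math

def first_order_gain(theta_deg, a):
    """Gain of a first-order pattern a + (1 - a) * cos(theta).

    a = 1 is omni, a = 0.5 cardioid, a = 0.25 hypercardioid, a = 0 figure-8.
    """
    return abs(a + (1.0 - a) * math.cos(math.radians(theta_deg)))

def rejection_db(theta_deg, a):
    """Level relative to on-axis, in dB (on-axis gain is 1)."""
    return 20.0 * math.log10(first_order_gain(theta_deg, a))

patterns = {"cardioid": 0.5, "hypercardioid": 0.25, "figure-8": 0.0}
at_45 = {name: rejection_db(45.0, a) for name, a in patterns.items()}
```

For a in that range the figure-8 is the best case at 45 degrees, and it only gets to about -3 dB, which fits the impression that the array is doing something a single first-order pattern cannot.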
This is more or less the same principle of how Amazon Echo devices work, but on steroids.
Very neat. I would be surprised if you aren’t seeing some diminished marginal returns from all those extra mics, but I guess you’re trying to capture azimuth deltas that Echo devices don’t really care about.
I was just doing research and landed on this exact page last night! I was wondering if anyone knows how someone could mic a room and record audio from only a specific area. For my use case I want to record a couch so I can watch TV with my friends online and remove their speech + show noise from the audio. Setting up some array of mics and using them for beam steering would probably work but there's not a lot of examples I could find on GitHub with code that works in real time.
You might look into OBS and/or VoiceMeeter to see how streamers selectively route audio while livestreaming/recording video/audio streams.
https://obsproject.com/
https://voicemeeter.com/
Loud show noise and your online friends' nearby audio is going to be reflected around the room as well as off of your bodies.
What you want isn't microphone or beamforming tech, it's echo cancellation the same as every videoconferencing software uses.
You just need to feed the show audio and friend audio in, and apply echo cancellation to each.
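A toy version of that idea in Python, as a sketch rather than what any particular videoconferencing product does: a normalized LMS (NLMS) adaptive filter learns how the known show audio arrives at the mic and subtracts it. The echo path, signal levels, tap count, and step size are all invented for the demo.

```python
import math
import random

def nlms_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `mic`.

    Returns the residual signal (what is left after echo removal).
    """
    w = [0.0] * taps    # adaptive estimate of the room echo path
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_est
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out

random.seed(0)
n = 4000
show = [random.uniform(-1, 1) for _ in range(n)]   # known TV audio
room = [0.0, 0.6, 0.0, -0.3, 0.1]                  # made-up echo path
echo = [sum(room[k] * show[i - k] for k in range(len(room)) if i - k >= 0)
        for i in range(n)]
voice = [0.05 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
mic = [e + v for e, v in zip(echo, voice)]

residual = nlms_cancel(show, mic)  # should approach `voice` once adapted
```

With two references (show audio and friends' audio) you would run the same thing once per reference, as the comment suggests.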
From the article: "The simplest method of beamforming is delay-and-sum (DAS)". Measure the distance from a point (couch) to each microphone, delay each signal in the time domain to compensate for the time the sound takes to travel from the point (couch) to that microphone, and add up the signals. Pretty trivial. Basically you want the microphones to receive the couch signal at the same time, even though they are different distances away.
Make sure there is enough variation in microphone distances for this method to be effective.
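The recipe above can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the mic positions, the couch and TV coordinates, the sample rate, and the simplification of rounding delays to whole samples (a real implementation would use fractional delays and real audio rather than single clicks).

```python
import math

C = 343.0    # assumed speed of sound, m/s
FS = 48_000  # assumed sample rate, Hz

def arrival_samples(point, mics, fs=FS, c=C):
    """Travel time from `point` to each mic, rounded to whole samples."""
    return [round(math.dist(point, m) / c * fs) for m in mics]

def steering_delays(focus, mics):
    """Delay per channel so sound from `focus` lines up on all channels."""
    arrivals = arrival_samples(focus, mics)
    latest = max(arrivals)  # the farthest mic hears the sound last
    return [latest - a for a in arrivals]

def delay_and_sum(channels, delays):
    """Shift each channel by its delay (in samples) and average them."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += ch[i - d]
    return [v / len(channels) for v in out]

def simulate(source, mics, n=1024):
    """One unit click emitted at t=0 from `source`, as heard by each mic."""
    chans = []
    for a in arrival_samples(source, mics):
        ch = [0.0] * n
        ch[a] = 1.0
        chans.append(ch)
    return chans

mics = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0)]  # hypothetical line array
couch = (0.3, 2.0)
tv = (3.0, 1.0)  # hypothetical interfering source

delays = steering_delays(couch, mics)
focused = max(delay_and_sum(simulate(couch, mics), delays))
off_target = max(delay_and_sum(simulate(tv, mics), delays))
```

A click from the couch adds up coherently to full amplitude, while the same click from the TV position lands at different sample offsets on each channel and stays at a fraction of that.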
This has been on my to-do list since forever! Nice work Ben Wang.
I wonder how well this would work with laser microphones on a pane of glass. Can you infer keystrokes with near infrared laser? That is, can you identify the heatmap of keystroke events to infer which keyboard they're using, then replay the tape to identify the strings of characters being typed? Can you localize the turning of pages with UV?
This beamforming effect only works well when each sensor is getting a dramatic enough "different angle" on the signal that each one can use phase shifting to cancel out other noise, but with a laser there's not really any noise to cancel out (I mean you're just monitoring a vibrational spot on a window), and you also don't have a far enough "different angle" to shine from, if you're monitoring from one spot.
However having multiple lasers from multiple different locations might be able to create an improved signal if all signals are averaged, but it wouldn't really be due to the phase shifting that's used in beamforming.
Didn't Israeli students show that you can recover audio from the vibrations of bulb filament with a fast photo diode?
I'd test that with a CCD line sensor plus a wide aperture lens, reading it out at 8 kHz. Then you have 128 audio pixels that can cover an entire city.
Line of sight might be an issue there. I'm thinking more high-end clandestine eavesdropping. Fun fact: curtains are a pretty good defeat for laser microphones, but if the building is really old and made of solid stone, you can point at the rock instead!
The rock?! That’s incredible. I would have guessed it was too dense to pick up normal speaking volume. Then again, even the window glass vibration seems pretty magical to me.
I wonder if there is a meaningful limit to number of listening zones. I’m imagining a 3d grid of virtual mics in a space, each with an AI behind it
Heck, train the model on the raw sensor data and you get the most awesome conference mics
Why a radial pattern and not a grid?
Because the distance between the mics needs to be 1) large and 2) consistent. It would work with a grid but the mics near the middle would be "underutilized" (not maximally taken advantage of), and also in a grid the mathematics is horrendous, but with a circle it's simple.
Could this be combined with a smaller number of high quality mics and then machine learning or something else incorporating them to boost the overall quality while maintaining all the other features?
afaik, it really depends on the spatial structure of the audio field.
think nyquist sampling rates, applied to space, and you can't apply a low-pass filter just because you don't care about higher-order signals. that means that for any given audio environment, there will be some "spatial spectrum" of signal, and you need to sample it densely enough to avoid aliasing.
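The spatial version of the Nyquist limit is easy to put numbers on. For a uniformly spaced array, the usual rule of thumb is that the mic spacing should be at most half the shortest wavelength of interest. A small Python sketch; the speed of sound and the example values are assumptions, and irregular array layouts behave differently:

```python
C = 343.0  # assumed speed of sound, m/s

def max_spacing(f_max_hz, c=C):
    """Largest uniform mic spacing (m) that avoids spatial aliasing."""
    return c / (2.0 * f_max_hz)

def alias_free_limit(spacing_m, c=C):
    """Highest frequency (Hz) a uniform array with this spacing samples cleanly."""
    return c / (2.0 * spacing_m)

spacing_for_8k = max_spacing(8_000.0)   # about 21 mm
limit_for_5cm = alias_free_limit(0.05)  # 3430 Hz
```

Which is why a handful of widely spaced high-quality mics cannot simply stand in for a dense array at high frequencies.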