
Acoustic Detection - from Conservation to Conflict Zones

  • Writer: Osinto HQ
  • Jan 20
  • 15 min read
Osinto - Acoustic Intelligence - Raspberry Pi Zero acoustic detector

A strike drone hurtles towards a target in Ukraine. A poacher takes a shot at an endangered bird in Bolivia. Unseen by human eyes, both are nonetheless being monitored - by AI-enhanced ‘artificial ears’. Tiny listening stations trigger appropriate responses: an air raid alert helping civilians seek shelter, or the dispatch of an anti-poaching enforcement team.


This is the world of acoustic monitoring - a fast-evolving discipline, increasingly powered by artificial intelligence (AI), that is gaining significant traction in the dramatically dissimilar sectors of conservation and armed conflict. The daily work and motivations of colonels and ecologists may be worlds apart, but their respective fields have a history of convergence in the acoustic domain stretching back more than a century.


The convergence of cheap consumer electronics and machine learning is reshaping two very different battlefields: humanity’s fight to protect the natural world, and the wars we wage against one another.


FROM BOMBS TO BIODIVERSITY - THE UNLIKELY ALLIANCE

Armies have been listening for the sounds of gunfire since the First World War, attempting to gain battlefield advantage by triangulating the positions of enemy artillery batteries. The field of bioacoustics (the study of animal sounds) predates even these early efforts at acoustic shot ranging, however: German-born Ludwig Koch captured birdsong on his father’s wax cylinder phonograph in 1889, at just eight years of age. He went on to join military intelligence before fleeing to Britain, where many years later his library of sound recordings would be acquired by the British Broadcasting Corporation, helping establish the BBC’s natural history sound library.


The development of radar during World War II would largely supersede combat uses of acoustic detection, on land at least. A series of concrete acoustic mirrors along England’s south coast was already under construction when the programme was cancelled in favour of the now-famous Chain Home radar system. A few still stand today, monuments to the former importance of acoustic intelligence in air defence.


A 4.5m-high concrete acoustic mirror in the United Kingdom [Source: Wikipedia]

POST WWII - ACOUSTICS GOES UNDERWATER

Sound travels far further in water than in air, and so it was the world’s navies that persisted with acoustic detection in the post-war years, whilst their colleagues on land and in the air focused on radar. As a result, few are more familiar with the ethereal sounds of cetacean song than naval sonar operators - many are trained to differentiate the rhythmic clicking of sperm whales from the similar sounds of enemy engines. It was the tireless hunt for enemy submarines that spurred much of the development of hydrophone technology, and as these listening devices improved, the background cacophony of natural marine noise grew ever louder.


Large undersea listening networks have been a staple of trans-Atlantic defence since the 1960s - the formerly classified SOSUS is now one of the best-known examples. The field of Passive Acoustic Monitoring (PAM) has continued to evolve, with projects like OASIS placing hydrophones on roaming networks of uncrewed vehicles, now able to detect sounds and relay data in real time.


In both conservation and military contexts it has often been the persistence of acoustic surveillance that has yielded the greatest insights, with ongoing observation and cheap data storage giving rise to huge resources of accumulated acoustic knowledge.


THE RESURGENCE OF ABOVE-GROUND ACOUSTIC INTELLIGENCE (ACINT)

Converging factors have led to the recent renaissance in aerial and terrestrial acoustic monitoring:


  1. Technology maturation

    • Cheap, powerful single-board computers - such as the Raspberry Pi - have become widely available, enabling efficient edge processing of acoustic data

    • Neural networks have been developed that run on this consumer grade hardware

    • High-quality microphones are widely and cheaply available

    • Data storage costs have fallen dramatically

    • Low cost lithium-ion batteries and solar panels have become widely available

    • Increased mobile (eg. GSM) and satellite (eg. Starlink) network deployments reduce the friction of maintaining communications in austere / remote environments


  2. Evolved battlefield threats

    • The proliferation of drones has fundamentally shifted the nature of modern warfare

    • Most existing radar systems are ill-suited to picking low-signature (often composite or carbon-fibre) threats out of low-altitude background clutter

    • Fibre-optic-tethered and autonomously navigating drones, often flying at low altitude, present little or no Radio Frequency (RF) signature, making detection extremely challenging

    • Active measures (such as radar) rapidly paint users as targets in a contested electromagnetic environment, necessitating means of passive detection that minimise or eliminate RF emissions

    • Ukraine’s Passive Acoustic Monitoring (PAM) network of 14,000 deployed ‘Sky Fortress’ nodes has proven highly performant and cost-effective at early detection of threats such as Shahed-derived one-way-attack / strike drones


  3. Conservation urgency

    • A global biodiversity crisis has driven demand for more advanced environmental monitoring - to help prevent species extinctions, for example, and to measure the biodiversity impact of human activities

    • Sporadic human observations are error-prone snapshots; ecology and conservation have long needed persistent, non-invasive and low-cost methods of observation

    • Autonomous recording units act as significant force multipliers in resource-constrained sectors

    • Projects like Cornell University’s BirdNET software and the British AudioMoth hardware have helped spur widespread adoption of acoustic monitoring within ecological survey work and conservation monitoring and enforcement operations


HOW ACOUSTIC DETECTION WITH MACHINE LEARNING ACTUALLY WORKS

Perhaps the biggest breakthrough in acoustic detection and monitoring has been pairing low-cost, consumer-grade hardware with Machine Learning (ML) algorithms - specifically Deep Neural Network (DNN) architectures such as the Residual Neural Network (ResNet) family.


These neural networks perform particularly well at classification tasks, such as classifying recorded sounds (or at least a representation of them).


The core architecture (sketched in code beneath the spectrogram below):

  • A microphone captures sound

  • A visual representation of the sound is created - often a mel spectrogram

  • The neural network classifies the sound, providing a percentage confidence score


A spectrogram showing a Tawny Owl call - most acoustic detectors are actually computer vision models which analyse these visual interpretations of recorded sounds.
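
As a minimal sketch of that pipeline in code (assuming the Python packages librosa and tflite-runtime; the model file bird_classifier.tflite and its input shape are hypothetical placeholders, not BirdNET’s actual distribution):

```python
import numpy as np
import librosa
import tflite_runtime.interpreter as tflite

# 1. Capture / load sound - here a 3-second mono clip at 48 kHz
audio, sr = librosa.load("clip.wav", sr=48000, mono=True, duration=3.0)

# 2. Create a visual representation - a log-scaled mel spectrogram
mel = librosa.feature.melspectrogram(
    y=audio, sr=sr, n_fft=2048, hop_length=512, n_mels=128
)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Normalise to [0, 1] and add batch / channel dimensions for the model
x = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-9)
x = x[np.newaxis, ..., np.newaxis].astype(np.float32)

# 3. Classify with a (hypothetical) TFLite neural network
interpreter = tflite.Interpreter(model_path="bird_classifier.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]

print(f"Top class index: {scores.argmax()}, confidence: {scores.max():.1%}")
```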

The key trade-offs involved in designing and operating such systems - illustrated in the sketch after this list - include:

  • The extent to which data processing takes place on-device vs in the cloud / on servers remote from the sensor itself

  • Microphone selection - eg. directional vs omnidirectional, a single microphone vs a local array

  • Data storage - eg. SD vs NVMe, on-device vs remote

  • Power consumption (closely related to the amount of on-device processing taking place) and availability - eg. mains, batteries, solar / wind charging

  • Communication - standalone systems vs GSM / satellite data links

  • Financial budgets - impacting both system and individual node level design choices

  • Coverage - wide area, lower fidelity vs small area, higher fidelity
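
In practice these trade-offs collapse into a handful of concrete per-node choices. A hedged illustration of how they interact - the field names and figures below are ours, for illustration only, not drawn from any particular product:

```python
from dataclasses import dataclass

@dataclass
class NodeConfig:
    """Illustrative design choices for a single acoustic sensor node."""
    on_device_inference: bool = True   # edge processing vs streaming audio to a server
    mic_type: str = "omnidirectional"  # "directional" narrows coverage, aids localisation
    storage: str = "sd"                # "sd" vs "nvme"; on-device buffer vs remote archive
    power_source: str = "solar"        # "mains", "battery" or "solar"
    uplink: str = "gsm"                # "none" (standalone), "gsm" or "satellite"
    unit_cost_usd: float = 500.0       # drives how many nodes a fixed budget can field

# Coverage trade-off: the same budget buys many simple nodes or a few capable ones
budget_usd = 50_000
simple = NodeConfig()
capable = NodeConfig(mic_type="directional", storage="nvme", unit_cost_usd=5_000)
print(f"{budget_usd / simple.unit_cost_usd:.0f} simple nodes "
      f"vs {budget_usd / capable.unit_cost_usd:.0f} capable nodes")
```

The last line echoes a theme that recurs throughout this piece: for a fixed budget, many simple nodes can often beat a few exquisite ones.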


There are significant limitations to (terrestrial) acoustic monitoring systems, including:

  • Range - perhaps most significantly - from tens or hundreds of metres to a few kilometres at best

  • Weather - impacting range significantly; particularly humidity, precipitation, wind and cloud cover

  • Background noise - traffic and other anthropogenic noise can significantly impact the signal-to-noise ratio of captured audio, and hence accuracy of detections, potentially resulting in both increased false positive and false negative detections


Acoustic detection generally excels at detecting whether a given noise is present (presence detection) and at cataloguing the diversity of sounds at a given location. Deriving precisely triangulated position information is more challenging (though not impossible), and of course any target emitting no noise - or noise below the detection threshold of the hardware - will be missed.


ACOUSTICS - ONE LAYER IN A MULTI-SENSORY SYSTEM

Acoustic detectors are perhaps best used as part of multi-sensory monitoring networks, where the ‘AI ears’ cue other sensor modalities - eg. a human-in-the-loop observer, electro-optical or other sensor types. This approach closely mirrors evolutionary development in predator species: an animal might scan for prey with its ears before bringing higher-fidelity optical sensors to bear - binocular eyes that may be limited by a narrow field of view but benefit from high fidelity and outstanding localisation ability.


It’s also worth mentioning that acoustic monitoring systems needn’t be limited to the audible spectrum of human hearing (c. 20 Hz to 20 kHz). Indeed, infrasonic and ultrasonic recording devices are already yielding fascinating insights in a conservation context, in species as diverse as elephants and bats - sensitive to tones as low as 1 Hz infrasound and as high as 200 kHz ultrasound respectively. The same can be applied in a defence / security context too.


PASSIVE ACOUSTIC MONITORING (PAM) - FROM ANALYSIS TO EXPERIMENTATION

We’ve combed through books, blog posts and academic research papers, attended workshops and hackathons, held meetings and calls and analysed an intriguing batch of promising startups in the acoustic detection space.


To better understand both the markets and underlying technologies we’ve also gone a step further - engaging in hands-on prototyping and experimentation.


We’ve confirmed that with commercial-off-the-shelf hardware almost anyone today can:

  • Build functioning acoustic detectors with unit costs in the $100 - $1,000 range

  • Use sensors to collect and label new acoustic training data

  • Feed labelled data back into neural network development for a process of continuous performance improvement

  • Rightsize hardware and software to optimise system cost, computational and power efficiency


Building v0.1

Our first experiment was zero-cost, testing just how accessible the field has become using only consumer-grade hardware we already had lying around the office. An old USB podcasting microphone was hooked up to a Raspberry Pi 3B+, thrown into a weatherproof box, plugged into the mains and connected to our workshop Wi-Fi.


The bundle of spare parts from which our first, functioning - zero-cost - prototype acoustic detector was assembled.

Using Cornell’s BirdNET algorithm via an open source project named BirdNET-Pi we had a working avian species detector up and running within a couple of hours. We learned that:

  • The ResNet-derived neural network architecture underlying BirdNET works well, running CPU inference on very modest hardware

  • The configuration of settings within the classifier can have a dramatic impact on the number of false positive and false negative identifications
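
To make the second point concrete: the minimum confidence threshold alone moves both error rates in opposite directions. A toy sketch with synthetic scores (the numbers are illustrative, not from our deployment):

```python
import numpy as np

rng = np.random.default_rng(42)
true_calls = rng.beta(8, 2, 500)  # genuine bird calls tend to score high
noise = rng.beta(2, 6, 500)       # wind / traffic / speech tends to score low

for threshold in (0.5, 0.7, 0.9):
    false_negatives = (true_calls < threshold).mean()  # real calls discarded
    false_positives = (noise >= threshold).mean()      # junk accepted as detections
    print(f"threshold {threshold}: miss {false_negatives:.0%} of calls, "
          f"accept {false_positives:.0%} of noise")
```

Raising the threshold trades missed detections for fewer false alarms; the right operating point depends on which mistake is more costly in a given deployment.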


On studying our results alongside those of others, and consulting the original BirdNET research paper, we also came to appreciate that this particular Convolutional Neural Network (CNN) could have been better optimised for deployment at our UK site had its training data been more relevant to the detections at our listening site. Our thesis (confirmed in discussion with expert ecologists) quickly became that curating a dataset specifically relevant to our monitoring location might yield materially better results than this more general-purpose classifier.


Iterating - v0.2

We then tested how far we could push things at low / no cost. Could we use the BirdNET paper, a cutting-edge Large Language Model (LLM) and an even more basic Raspberry Pi to create both our own device and our own neural network classifier? We aimed to:

    • Run neural network inference on the CPU of a Raspberry Pi Zero 2 W - an extremely low-cost ($15 RRP) and physically small device

    • Create our own neural network, with an architecture optimised for inference on this hardware

  • Have an LLM (Claude) write all the code for us, not just for a classifier, but a serviceable prototype front-end for users

    • Train our neural network using our own database of species-labelled bird identifications, collected using our v0.1 device and the BirdNET algorithm


Using a slightly better microphone (a c. $100 Rode VideoMicro directional condenser) with a dead cat windshield, and with Anthropic’s Claude LLM writing all the code, we successfully got a system up and running on the Pi Zero 2 W, running inference on our own neural net. What’s more, we’d trained the neural network on quite modest consumer hardware - a MacBook Pro M1 Max with 64GB of RAM. Model performance was quite poor, however - we immediately saw a huge number of false positives. On consulting the BirdNET paper more deeply, we quickly understood why.
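
For a flavour of what ‘training our own neural net’ involved, here is a compressed sketch of the kind of training loop used (PyTorch here; the tiny CNN, tensor shapes and class count are illustrative assumptions, not our exact LLM-generated code):

```python
import torch
import torch.nn as nn

# A deliberately small CNN over 1-channel mel-spectrogram "images" -
# small enough that CPU inference on a Pi Zero 2 W remains plausible.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 20),  # eg. 20 locally occurring species
)

optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch; in reality spectrograms and labels are loaded from disk
x = torch.randn(8, 1, 128, 282)   # (batch, channel, mel bins, time frames)
y = torch.randint(0, 20, (8,))

for epoch in range(3):
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimiser.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```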


Our crude first attempt at training a neural network for acoustic classification of avian species.

Our training dataset was far too narrow - we’d simply used every classification with >95% confidence to train our model, which is far too blunt an instrument. In retrospect we’d done an abysmal job of curating an appropriate dataset on which to train our classifier:

  • There were no negatives - the network hadn’t learned what a car sounded like (our test site was adjacent to a road) nor human speech (which we understand BirdNET has been taught to ignore for privacy reasons)

  • The database was limited in scope - if we hadn’t captured a species before, the model stood no chance of identifying it - and we hadn’t augmented our own data with any other sources (eg. Xeno-canto for birdsong)

  • Our data labels were far too coarse for effective real-world performance - a species label alone won’t cut it. Samples should be labelled alongside specialist ecologists, who can both confirm that a given recording is of a given species and add finer-grained labels such as estimated distance from the microphone, age of bird (eg. juvenile vs adult) and sex

  • We’d performed no data augmentation on our limited sample set - eg. mixing clean samples with wind noise, rain, cars, buses, airliners and other anthropogenic noise that might be encountered at the monitoring site (see the sketch below)
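
The simplest of those augmentations - mixing a clean call with background noise at a controlled signal-to-noise ratio - might look like this (a sketch assuming the librosa and soundfile packages; the file names are hypothetical):

```python
import numpy as np
import librosa
import soundfile as sf

call, sr = librosa.load("clean_tawny_owl.wav", sr=48000, mono=True)
noise, _ = librosa.load("roadside_traffic.wav", sr=48000, mono=True)
noise = np.resize(noise, call.shape)  # loop / trim the noise to the call's length

# Mix at a random SNR between 0 and 20 dB so training resembles field conditions
snr_db = np.random.uniform(0, 20)
call_power = np.mean(call ** 2)
noise_power = np.mean(noise ** 2)
scale = np.sqrt(call_power / (noise_power * 10 ** (snr_db / 10)))
augmented = call + scale * noise

# Peak-normalise and save as a new training sample
sf.write("augmented_tawny_owl.wav", augmented / np.max(np.abs(augmented)), sr)
```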


UK ecologists also helped us realise that BirdNET tends to perform best in North America, because the training data underlying the model skews there somewhat (eg. soundscape data recorded in the woodlands around Cornell University’s ornithology lab). It also suffers a little in performance terms because many of the recordings used in initial training are “too clean” compared with the real-world samples the model encounters when deployed. If the signal-to-noise ratio of your training data is mismatched to what the network encounters during inference, classifier performance will suffer.


The loss curve from training our first iteration of an acoustic classifier on a MacBook Pro M1 Max with 64GB RAM.

Future developments and DeepMind’s view

Google DeepMind are active in the bioacoustics field - their ‘Perch 2.0: The Bittern Lesson for Bioacoustics’ paper, which accompanied the release of their latest model, Perch 2.0, makes some interesting observations, perhaps most notably that:


  1. On audio problems, the best-performing Machine Learning (ML) models today are those using supervised learning - trained with labelled data

  2. “Supervised pre-training benefits particularly from having fine-grained labels” - the more granular the labelling of the training data, the better the model performance

  3. Dataset diversity is important - incorporating a mix of focused recordings of subjects and broader ‘soundscape’ recordings


We hypothesise that the same will hold true for acoustic classification in the defence and security domain, for example in counter-UAS / counter-drone deployments. Fine grained training data labels might prove to be the single most important factor in system performance.
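
What might ‘fine-grained’ look like in practice? A hypothetical label record - the schema and field names below are our own illustration, not Perch’s or BirdNET’s format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AcousticLabel:
    """One labelled training clip; the finer the fields, the better (we hypothesise)."""
    clip_path: str
    source_class: str                      # eg. "tawny_owl" or "fpv_quadcopter"
    verified_by_expert: bool               # ecologist / operator confirmation
    distance_m: Optional[float] = None     # estimated range from the microphone
    life_stage: Optional[str] = None       # "juvenile" / "adult" (conservation)
    sex: Optional[str] = None
    platform_detail: Optional[str] = None  # eg. airframe / motor type (counter-UAS)
    background: Optional[str] = None       # "roadside", "rain", "quiet woodland"...

label = AcousticLabel(
    clip_path="clips/0001.wav", source_class="tawny_owl",
    verified_by_expert=True, distance_m=40.0, life_stage="adult",
    background="roadside",
)
print(label)
```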


A functioning front-end for an acoustic classifier, prototyped with Claude in just a few hours, and running on a Raspberry Pi Zero 2W.

MARKET LANDSCAPE AND FUTURE OPPORTUNITIES


The two main markets for acoustic monitoring today break down as follows:


  1. DEFENCE AND SECURITY

    • Air defence - low cost-per-node layer eg. for early warning of strike drones | Example: Sky Fortress in Ukraine - $400-1k per node - 14,000 deployed, NATO funding deployment of another 15,000 [Source: United24 - July 2025]

    • Counter-UAS system component - both as systems deployed on drones for air-to-air engagement of enemy ISR drones, and as a last layer of defence against FPV strike drones, eg. directionally cueing shotgun-armed infantry [Anonymous sources | Oct 2025]

    • Perimeter security - first layer drone detection at prison / data centre / other critical infrastructure facilities

    • Gunfire detection / shot ranging - from small arms to artillery (most established), vehicle based and man portable | Example: Raytheon BBN Boomerang / Boomerang Warrior-X shot detection systems [Sources: Janes - July 2020 | Army Technology - May 2012 | Raytheon - Sept 2017]


  2. ECOLOGY AND CONSERVATION


A Ukrainian acoustic monitoring node - possibly Zvook / Sky Fortress

Air defence / counter-drone applications within the defence tech sector are particularly active today. Beyond the established defence primes, a notable number of startups and SMEs are developing acoustic detection technology, with ‘Shazam for drones’ becoming a recurring theme at defence tech hackathons across Europe in 2025. Notable players include, but are not limited to:



With unit costs for many of these solutions ranging from $5,000 to $50,000+, we’d say the Ukrainians are the ones to watch, with sub-$1k per-node pricing. Their simple systems appear to use commercial-off-the-shelf parts to maximise speed of deployment and reduce unit cost to a level that enables mass deployment. A large network of simple sensors is proving highly effective. Couple this with vast amounts of captured acoustic data with which to further refine their classification algorithms, and Ukrainian systems are likely today’s state-of-the-art. NATO member Lithuania is already reported to be testing Ukraine’s Sky Fortress system [Source: Militarnyi - July 2025] and the US has expressed interest too [Source: The War Zone - July 2024].


The vibrant ‘Nature Tech’ ecosystem is active too, as a sector map from the Nature Tech Collective shows, with 56 terrestrial bioacoustics companies / projects listed, and 19 aquatic. A key advantage the conservation sector has over its military peers is openness - and sheer volume of data. With a shared mission, and many players being not-for-profit entities, vast databases of acoustic data have been created and shared. Militaries are far less willing to share their data than conservationists! This might just mean conservationists are ahead of the defence sector in some aspects of applying ML to acoustic classification on Commercial-Off-The-Shelf (COTS) hardware. BirdNET, for example, was able to draw on 7,000 hours of Xeno-canto recordings and Cornell’s enviable library of 750,000 separate sound recordings covering over 10,000 species; the model ultimately used 226,078 audio samples in training.


KEY FINDING: IT ALWAYS COMES BACK TO TRAINING DATA


Whatever an acoustic classifier is pointed at - be it quadcopter or quail, missile or mistle thrush - perhaps the most important factor in system performance will be the quality of the data used to train the neural network classifier. The more representative the data is of the acoustic signals present in the deployment area, and the finer-grained the labels, the better the system will likely perform.


As the BirdNET authors note, “domain-specific data augmentation” (using techniques like mixup) “…is key to build models that are robust against high ambient noise levels” and to deal with overlapping signals.


Key Takeaway: Acoustic classifiers should be trained using data relevant to the environment in which they will be deployed.


LOOKING AHEAD

It seems likely that acoustic intelligence will become an embedded part of multi-modal sensory stacks across conservation, ecology, defence and security. Expect to see monitoring networks deployed with an acoustic layer in all manner of protected areas. So low is the cost of edge compute, simple microphones and a data connection that we should also expect these systems to be deployed at large (landscape) scale. Privacy concerns will become a factor - much as they are with the use of facial recognition in a law enforcement context - and will need to be considered as part of system design from the outset. As such, we’d expect early rollouts to be most frictionless in remote environments away from homes and urban areas - a good match with borders, nature reserves, prisons, power infrastructure and the like.


Sketches for a third version of our acoustic classifier - rapid iteration and feedback from real-world operations are the key to maintaining advantage. Get in touch if you'd like to collaborate on the next generation device.

Acoustic detection networks that don’t automatically detect and delete human speech, for example, risk being branded pervasive and invasive surveillance tools, able to collect and interpret vast volumes of intelligence. We’d temper these privacy concerns, however, given how popular Chinese surveillance cameras have proved in Western markets - despite their questionable security and, for example, the routing of highly sensitive private audio and video data directly through servers in mainland China.


The key differentiator of system performance will be the accuracy of the Machine Learning (ML) classifiers used. This in turn will largely depend on both the quantity and quality of the labelled training data utilised. Large acoustic datasets with accurate, fine-grained labelling are likely to be of significant and growing value. Those deploying sensor networks fastest will be best placed to accrue an unassailable advantage if they’re able to ‘spin up an effective data flywheel’ ahead of others.


We’d also expect players who co-develop hardware and software to be well placed to build advantage, optimising the two in synergy for optimal system performance. The proliferation of cheap security cameras from China, utilising highly performant system-on-chip architectures matched to the end use of the hardware, could be a model to emulate. However, we’d expect such custom silicon / circuit-board architectures to require substantial up-front subsidy and / or large sales volumes in order to provide satisfactory financial returns. Nation-state involvement might sway a business case, if national security or other surveillance requirements were accounted for alongside pure financial return.


Such challenges notwithstanding, AI-augmented acoustic detection technology represents a significant leap forward in humanity’s ability to monitor and understand our environment - potentially reducing our negative impact on the natural world and protecting civilian populations from military aggression in an increasingly unstable world where autonomous weapons are set to proliferate fast.


APPLYING OSINTO'S INTELLIGENCE, RESEARCH & DEVELOPMENT APPROACH

This is the first domain in which we have moved from analysis to action, getting hands-on with applied research. It’s been highly instructive to deepen our understanding of both the markets and the underlying technologies shaping their future direction. Bridging the gap between theory and deployment in this way has reinforced just how powerful the focused deployment of commercial-off-the-shelf technology can be. We’re not stopping here!


We’re interested in working with partners - in both the ecology / conservation and defence / security domains - on:

  • Acoustic monitoring system design

  • Market intelligence, consulting and executive briefing on acoustic monitoring

  • Prototype development / validation

  • Training data collection and labelling projects

  • Deployment partnerships

  • Further research and development


Whether it’s defending contested borders from drone incursions or protecting biodiversity from decline, the ability to truly listen - persistently, intelligently and at scale - is proving transformative. At Osinto we’ll continue to track these developments, and to play our part in this bright future, by continuing to experiment at the frontiers of what’s possible.


OUR RECOMMENDATIONS

  1. Keep acoustic recorder hardware simple and cheap, use widely available parts

  2. Deploy networks at scale - many simple nodes > a few smart nodes

  3. Fine-grained training data is the biggest determinant of system performance


If the article has sparked your interest, if you have any related questions or would like to talk to us in confidence about a project of your own, please don’t hesitate to contact us here or via email at hello@osinto.com.


