Classifying cetacean species in marine acoustic recordings
A random forest classified closely related cetacean species using acoustic features of echolocation clicks
classification
random forest
An overview of my master’s thesis on passive acoustic monitoring of cetaceans in the California Current.
Project Overview
NBHF Odontocetes
- Narrowband high-frequency (NBHF) clicks are produced by four species in the California Current Ecosystem
- P. dalli (Dall’s porpoise)
- P. phocoena (harbor porpoise)
- K. sima (dwarf sperm whale)*
- K. breviceps (pygmy sperm whale)*
*cryptic species
ADRIFT Survey
- ADRIFT survey data are processed to detect NBHF clicks, but no further steps are taken to classify the clicks to species
- This is a problem because the surveys are extensive and could be used to estimate the population abundance of the cryptic Kogia species
BANTER model
- Classifies groups of clicks and/or other types of vocalizations (events) using two stages:
- Employs one or more call classifiers at the first stage
- At the second stage, uses distributions of call classification probabilities, in addition to other event-level variables, to classify events
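The hand-off between the two stages can be illustrated with a minimal sketch. It assumes the first-stage call classifier has already produced per-click class probabilities, and it summarizes each event by the per-class mean of those probabilities, which is only one simple way to describe their distribution. The function name `summarize_events`, the feature names, and the example values are hypothetical and are not part of the BANTER package.

```python
from collections import defaultdict

def summarize_events(click_probs, event_vars=None):
    """Build one feature row per event for a second-stage classifier.

    click_probs: list of (event_id, {class_name: probability}) pairs,
                 one per click, as produced by some first-stage classifier.
    event_vars:  optional {event_id: {variable_name: value}} holding other
                 event-level variables (hypothetical example: duration).
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for event_id, probs in click_probs:
        counts[event_id] += 1
        for cls, p in probs.items():
            sums[event_id][cls] += p

    features = {}
    for event_id, n in counts.items():
        # Mean first-stage probability per class across the event's clicks.
        row = {f"mean_prob_{cls}": total / n
               for cls, total in sums[event_id].items()}
        row["n_clicks"] = n
        # Append any other event-level variables supplied by the caller.
        row.update((event_vars or {}).get(event_id, {}))
        features[event_id] = row
    return features

# Illustrative values only; not real data.
click_probs = [
    ("e1", {"Dalls": 0.8, "Kogia": 0.2}),
    ("e1", {"Dalls": 0.6, "Kogia": 0.4}),
    ("e2", {"Dalls": 0.1, "Kogia": 0.9}),
]
event_features = summarize_events(click_probs, {"e1": {"duration_s": 12.0}})
```

Each row of `event_features` would then be the input to an event-level classifier, so an event is labelled from the behaviour of all its clicks rather than from any single click.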
Objective
- Train a BANTER model to classify NBHF echolocation clicks in the California Current System
- Use ground-truth recordings to define classes for Dall’s porpoise and harbor porpoise
- Use drifter recordings from Baja to define the Kogia class
Knowns
- A randomForest classification model was trained to discriminate between clusters of NBHF clicks in the California Current (Griffiths et al. 2020)
- Ground-truth recordings of Dall’s porpoise had a strong affinity to one cluster
Unknowns
- To what extent is ambiguity reduced by:
- classifying events as opposed to individual clicks
- using a supervised approach to training the model
Methods
Define Events
For recordings with known species classification, PAMGuard databases were created and events were constructed using one of two methods:
Manual
- Click detections were reviewed with reference to features such as bearing, Wigner plot, and spectrum.
- Time intervals containing true clicks were designated as events with no attempt to exclude false positives.
MTC
- PAMGuard’s default harbor porpoise template was used in the matched template classifier (MTC) module to assign scores to all clicks (a higher score means a better match).
- An algorithm with three parameters (threshold, minimum clicks per event, minimum separation between events) then constructed events.
- An analyst reviewed the resulting events to identify those containing real clicks.
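The source does not spell out how the three parameters interact, so the following is a sketch of one plausible reading: clicks scoring below the threshold are dropped, the remaining clicks are grouped wherever the gap to the previous click is shorter than the minimum separation, and groups with too few clicks are discarded. The function name `build_events` and all parameter values are hypothetical.

```python
def build_events(clicks, threshold, min_clicks, min_separation):
    """Group scored clicks into candidate events.

    clicks:         iterable of (time_s, score) pairs.
    threshold:      minimum template-match score for a click to count.
    min_clicks:     minimum number of clicks for a group to become an event.
    min_separation: gap (s) at or above which a new event is started.

    Returns a list of (start_s, end_s, n_clicks) tuples.
    """
    # Keep only clicks that match the template well enough, in time order.
    times = sorted(t for t, score in clicks if score >= threshold)

    events = []
    current = []
    for t in times:
        # A long enough gap closes the current group.
        if current and t - current[-1] >= min_separation:
            if len(current) >= min_clicks:
                events.append((current[0], current[-1], len(current)))
            current = []
        current.append(t)
    # Close the final group.
    if len(current) >= min_clicks:
        events.append((current[0], current[-1], len(current)))
    return events

# Illustrative values only; not real detections.
clicks = [(0.0, 5), (0.5, 6), (1.0, 7), (100.0, 9),
          (200.0, 8), (200.2, 8), (200.4, 1), (200.6, 8)]
events = build_events(clicks, threshold=4, min_clicks=3, min_separation=60)
```

With these illustrative values, the isolated click at 100 s and the low-scoring click at 200.4 s are excluded, leaving two candidate events for the analyst to review.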
Question: How do the methods compare to one another?
Extract click features
- Events were processed using PAMpal to extract click features, with a 100-160 kHz bandpass filter applied
- False positives were removed by filtering out clicks with:
- duration > 0.02 s
- 3 dB bandwidth > 4 kHz
- The click channel with the lower noiseLevel was chosen, if applicable.
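The filtering and channel-selection steps can be sketched as follows. Each click is assumed to be a dictionary, and the keys `click_id`, `duration_s`, `bw_3db_khz`, and `noise_level` are hypothetical names chosen for illustration, not PAMpal's own column names. Applying the filters before the channel choice is also an assumption.

```python
MAX_DURATION_S = 0.02   # clicks longer than this are removed
MAX_BW_3DB_KHZ = 4.0    # clicks with a wider 3 dB bandwidth are removed

def filter_clicks(clicks):
    """Drop likely false positives, then keep one channel per click.

    clicks: list of dicts with the hypothetical keys
            click_id, duration_s, bw_3db_khz, noise_level.
    """
    kept = [
        c for c in clicks
        if c["duration_s"] <= MAX_DURATION_S
        and c["bw_3db_khz"] <= MAX_BW_3DB_KHZ
    ]

    # When the same click appears on several channels,
    # keep the measurement with the lower noise level.
    best = {}
    for c in kept:
        prev = best.get(c["click_id"])
        if prev is None or c["noise_level"] < prev["noise_level"]:
            best[c["click_id"]] = c
    return list(best.values())

# Illustrative values only; not real measurements.
clicks = [
    {"click_id": 1, "channel": 1, "duration_s": 0.0001, "bw_3db_khz": 3.0, "noise_level": -10.0},
    {"click_id": 1, "channel": 2, "duration_s": 0.0001, "bw_3db_khz": 3.1, "noise_level": -15.0},
    {"click_id": 2, "channel": 1, "duration_s": 0.0300, "bw_3db_khz": 2.0, "noise_level": -12.0},
    {"click_id": 3, "channel": 1, "duration_s": 0.0001, "bw_3db_khz": 9.0, "noise_level": -12.0},
]
clean = filter_clicks(clicks)
```

Here click 2 fails the duration filter, click 3 fails the bandwidth filter, and click 1 is kept on channel 2, which has the lower noise level.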
Question: Should other steps be taken to filter clicks?
Train model
- ntree=1000 for both stages of the model
- sample size left to default
Citations
Griffiths, Emily T., Frederick Archer, Shannon Rankin, Jennifer L. Keating, Eric Keen, Jay Barlow, and Jeffrey E. Moore. 2020. “Detection and Classification of Narrow-Band High Frequency Echolocation Clicks from Drifting Recorders.” The Journal of the Acoustical Society of America 147 (5): 3511–22. https://doi.org/10.1121/10.0001229.