Classifying cetacean species in marine acoustic recordings

A random forest classified closely related cetacean species using acoustic features of echolocation clicks

classification
random forest
An overview of my master’s thesis on passive acoustic monitoring of cetaceans in the California current.
Author

Jackson Vanfleet-Brown

Published

February 25, 2024

Project Overview

NBHF Odontocetes

  • Narrowband high-frequency (NBHF) clicks are produced by four species in the California Current Ecosystem
    • P. dalli (Dall’s porpoise)
    • P. phocoena (harbor porpoise)
    • K. sima (dwarf sperm whale)*
    • K. breviceps (pygmy sperm whale)*

*cryptic species

ADRIFT Survey

  • ADRIFT survey data are processed to detect NBHF clicks but no further steps are taken to classify to species
  • Problem because: surveys are extensive and could be used to estimate population abundance of cryptic Kogia species

A map of tracks of drifters deployed 2016-2022

BANTER model

  • Classifies groups of clicks and/or other types of vocalizations (events) using two stages:
    • Employs one or more call classifiers at the first stage
    • At second stage, uses distributions of call classification probabilities, in addition to other event-level variables, to classify events

Objective

  • Train a BANTER model to classify NBHF echolocation clicks in the California Current System
    • Use groundtruth recordings to define classes for Dall’s porpoise and harbor porpoise
    • Use drifter recordings from Baja to define Kogia class

Knowns

  • A randomForest classification model was trained to discriminate between clusters of NBHF clicks in the California Current (Griffiths et al. 2020)
  • Groundtruth recordings of Dall’s porpoise had a strong affinity to one cluster

Unknowns

  • To what extent is ambiguity reduced by:
    • classifying events as opposed to individual clicks
    • using a supervised approach to training the model

Methods

Define Events

For recordings with known species classification, PAMguard databases are created and events are constructed using one of two methods:

Manual

  • Click detections were reviewed referencing features such as bearing, wigner plot, and spectrum.
  • Time intervals containing true clicks were designated as events with no attempt to exclude false positives.

MTC

  • PAMguard’s default harbor porpoise template was used in the MTC module to assign scores to all clicks (higher score means better match)
  • An algorithm with three parameters (threshold, min clicks per event, min seperation between events) then constructed events.
  • Analyst reviewed to identify events containing real clicks.

Question: How do the methods compare to one another?

Extract click features

  • Events processed using PAMpal to extract click features with a bandpass filter 100-160 kHz
  • false positives removed by applying filters to remove
    • duration > .02 s
    • 3dB BW > 4kHz
  • Click channel with the lesser noiseLevel chosen, if applicable.

Question: should other steps be taken to filter clicks?

Train model

  • ntree=1000 for both stages of the model
  • sample size left to default

Citations

Griffiths, Emily T., Frederick Archer, Shannon Rankin, Jennifer L. Keating, Eric Keen, Jay Barlow, and Jeffrey E. Moore. 2020. “Detection and Classification of Narrow-Band High Frequency Echolocation Clicks from Drifting Recorders.” The Journal of the Acoustical Society of America 147 (5): 3511–22. https://doi.org/10.1121/10.0001229.