Abstract

Songbirds provide a powerful model system for studying sensory-motor learning. However, many analyses of birdsong require time-consuming, manual annotation of its elements, called syllables. Automated methods for annotation have been proposed, but these methods assume that audio can be cleanly segmented into syllables, or they require carefully tuning multiple statistical models. Here, we present TweetyNet: a single neural network model that learns how to segment spectrograms of birdsong into annotated syllables. We show that TweetyNet mitigates limitations of methods that rely on segmented audio. We also show that TweetyNet performs well across multiple individuals from two species of songbirds, Bengalese finches and canaries. Lastly, we demonstrate that using TweetyNet we can accurately annotate very large datasets containing multiple days of song, and that these predicted annotations replicate key findings from behavioral studies. In addition, we provide open-source software to assist other researchers, and a large dataset of annotated canary song that can serve as a benchmark. We conclude that TweetyNet makes it possible to address a wide range of new questions about birdsong.
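To make the frame-classification idea in the abstract concrete, the sketch below shows one way a network could label every time bin of a spectrogram and how those per-bin labels could then be collapsed into annotated syllable segments. This is a minimal illustrative example, not the released TweetyNet implementation: the class FrameClassifier, the helper frames_to_segments, and all layer sizes and label conventions here are assumptions chosen for brevity.

# Illustrative sketch of per-time-bin labeling of a spectrogram and conversion of
# frame labels into (onset, offset, label) segments. Not the authors' code; all
# names and hyperparameters are hypothetical.
import torch
import torch.nn as nn


class FrameClassifier(nn.Module):
    """Toy spectrogram-frame classifier: conv blocks -> bidirectional LSTM -> per-bin logits."""

    def __init__(self, n_freq_bins: int, n_classes: int, hidden_size: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(5, 5), padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),  # pool over frequency only, keep time resolution
            nn.Conv2d(32, 64, kernel_size=(5, 5), padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
        )
        reduced_freq = n_freq_bins // 4  # assumes n_freq_bins is divisible by 4
        self.rnn = nn.LSTM(
            input_size=64 * reduced_freq,
            hidden_size=hidden_size,
            batch_first=True,
            bidirectional=True,
        )
        self.classify = nn.Linear(2 * hidden_size, n_classes)

    def forward(self, spect: torch.Tensor) -> torch.Tensor:
        # spect: (batch, 1, n_freq_bins, n_time_bins)
        features = self.conv(spect)                      # (batch, 64, reduced_freq, n_time_bins)
        b, c, f, t = features.shape
        features = features.permute(0, 3, 1, 2).reshape(b, t, c * f)
        rnn_out, _ = self.rnn(features)                  # (batch, n_time_bins, 2 * hidden_size)
        return self.classify(rnn_out)                    # per-time-bin logits: (batch, t, n_classes)


def frames_to_segments(frame_labels, timebin_dur, silence_label=0):
    """Collapse a sequence of per-time-bin labels into (onset_s, offset_s, label) segments."""
    segments, start, current = [], None, None
    for i, lbl in enumerate(frame_labels):
        if start is None:
            if lbl != silence_label:
                start, current = i, lbl
        elif lbl != current:
            segments.append((start * timebin_dur, i * timebin_dur, current))
            start, current = (i, lbl) if lbl != silence_label else (None, None)
    if start is not None:
        segments.append((start * timebin_dur, len(frame_labels) * timebin_dur, current))
    return segments


# Example: predicted labels for each time bin of a spectrogram with 2.7 ms bins.
# predictions = model(spect).argmax(dim=-1)[0].tolist()
# segments = frames_to_segments(predictions, timebin_dur=0.0027)

Pooling only along the frequency axis keeps one prediction per spectrogram time bin, which is what lets the network's output be read out directly as syllable onsets, offsets, and labels without a separate audio-segmentation step.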

Download the paper here: https://doi.org/10.7554/eLife.63853

BibTeX:

@article{10.7554/eLife.63853,
  article_type = {journal},
  title = {Automated annotation of birdsong with a neural network that segments spectrograms},
  author = {Cohen, Yarden and Nicholson, David Aaron and Sanchioni, Alexa and Mallaber, Emily K and Skidanova, Viktoriya and Gardner, Timothy J},
  editor = {Goldberg, Jesse H and Calabrese, Ronald L and Brainard, Michael},
  volume = {11},
  year = {2022},
  month = {jan},
  pub_date = {2022-01-20},
  pages = {e63853},
  citation = {eLife 2022;11:e63853},
  doi = {10.7554/eLife.63853},
  url = {https://doi.org/10.7554/eLife.63853},
  abstract = {Songbirds provide a powerful model system for studying sensory-motor learning. However, many analyses of birdsong require time-consuming, manual annotation of its elements, called syllables. Automated methods for annotation have been proposed, but these methods assume that audio can be cleanly segmented into syllables, or they require carefully tuning multiple statistical models. Here, we present TweetyNet: a single neural network model that learns how to segment spectrograms of birdsong into annotated syllables. We show that TweetyNet mitigates limitations of methods that rely on segmented audio. We also show that TweetyNet performs well across multiple individuals from two species of songbirds, Bengalese finches and canaries. Lastly, we demonstrate that using TweetyNet we can accurately annotate very large datasets containing multiple days of song, and that these predicted annotations replicate key findings from behavioral studies. In addition, we provide open-source software to assist other researchers, and a large dataset of annotated canary song that can serve as a benchmark. We conclude that TweetyNet makes it possible to address a wide range of new questions about birdsong.},
  keywords = {songbirds, machine learning algorithms, automated annotation, canaries, bengalese finches, song syntax, neural network, sound event detection},
  journal = {eLife},
  issn = {2050-084X},
  publisher = {eLife Sciences Publications, Ltd},
}
