Abstract

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract, high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable.

Pipeline

Method overview. Given an input sound, we predict for each prototype a gain, a pitch shift, and low- and high-frequency filters at each time step; applying these transformations to the prototype generates the output. Prototypes and transformations are learned jointly with a reconstruction loss, in either a supervised or an unsupervised setting.
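To make the overview concrete, here is a minimal NumPy sketch under our own assumptions, not the authors' implementation: prototypes are non-negative magnitude spectrograms on a log-frequency axis (so a pitch shift is a translation along frequency bins), the low- and high-frequency filters are soft sigmoid masks, and the per-frame parameters are passed in directly instead of being predicted by the transformation networks. The names `transform_prototype`, `reconstruction_errors` and `loss`, as well as the `width` parameter, are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def transform_prototype(proto, gain, shift, low_cut, high_cut, width=2.0):
    """Apply per-frame gain, pitch shift and soft filters to a prototype.

    proto:    (F, T) non-negative magnitude spectrogram, log-frequency axis
    gain:     (T,) multiplicative gain per frame
    shift:    (T,) integer pitch shift per frame, in frequency bins
    low_cut:  (T,) high-pass cutoff per frame (attenuates bins below it)
    high_cut: (T,) low-pass cutoff per frame (attenuates bins above it)
    width:    softness of the sigmoid filter edges, in bins (assumed)
    """
    n_freq, n_frames = proto.shape
    bins = np.arange(n_freq)
    out = np.zeros((n_freq, n_frames))
    for t in range(n_frames):
        # Pitch shift: on a log-frequency axis, transposition is a translation;
        # bins shifted in from outside the range are zero-filled.
        s = int(np.clip(shift[t], -n_freq, n_freq))
        col = np.zeros(n_freq)
        if s >= 0:
            col[s:] = proto[: n_freq - s, t]
        else:
            col[:s] = proto[-s:, t]
        # Soft high-pass and low-pass masks (sigmoids centred on the cutoffs).
        highpass = sigmoid((bins - low_cut[t]) / width)
        lowpass = sigmoid((high_cut[t] - bins) / width)
        out[:, t] = gain[t] * col * highpass * lowpass
    return out


def reconstruction_errors(x, prototypes, params):
    """Mean squared error between input x and each transformed prototype.

    params[k] holds the per-frame parameters for prototype k; in the model
    they would be predicted from x by a transformation network.
    """
    return np.array([
        np.mean((x - transform_prototype(p, **params[k])) ** 2)
        for k, p in enumerate(prototypes)
    ])


def loss(errors, label=None):
    """Supervised: error of the labelled class's prototype.
    Unsupervised: error of the best-fitting prototype (as in clustering)."""
    return errors[label] if label is not None else errors.min()


rng = np.random.default_rng(0)
prototypes = rng.random((2, 16, 4))  # 2 toy prototypes, 16 bins x 4 frames
p = dict(gain=np.full(4, 0.5), shift=np.array([0, 1, 2, 1]),
         low_cut=np.full(4, 2.0), high_cut=np.full(4, 12.0))
x = transform_prototype(prototypes[1], **p)  # synthetic "input sound"
errors = reconstruction_errors(x, prototypes, [p, p])
print(errors.argmin())  # prints 1
```

Identification then amounts to taking the argmin of the errors: the predicted class (or cluster, without supervision) is the prototype that, once transformed, best reconstructs the input. During training, gradients of this loss would update both the prototypes and the networks predicting the transformation parameters; that part is left out of the sketch.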

Resources

If you find this project useful for your research, please cite:

@article{loiseau22amodelyoucanhear,
  title     = {A Model You Can Hear: Audio Identification with Playable Prototypes},
  author    = {Loiseau, Romain and Bouvier, Baptiste and Teytaut, Yann and Vincent, Elliot and Aubry, Mathieu and Landrieu, Loïc},
  journal   = {ISMIR},
  year      = {2022}
}

Acknowledgements

This work was supported in part by ANR project Ready3D ANR-19-CE23-0007 and HPC resources from GENCI-IDRIS (Grant 2020-AD011012096).

A Model You Can Hear: Audio Identification with Playable Prototypes

¹LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, France
²LASTIG, Univ. Gustave Eiffel, ENSG, IGN, F-94160 Saint-Mande, France
³STMS Lab, UMR 9912 (IRCAM, CNRS, Sorbonne University), Paris, France
⁴Inria and DIENS (ENS-PSL, CNRS, Inria)
