Conference papers

Hog and Subband power distribution image features for acoustic scene classification

Victor Bisot 1, 2; Slim Essid 1, 2; Gael Richard 1, 2
1 S2A - Signal, Statistique et Apprentissage
LTCI - Laboratoire Traitement et Communication de l'Information
Abstract : Acoustic scene classification is a difficult problem, mostly because of the high density of events occurring concurrently in audio scenes. To capture the occurrences of these events, we propose to use the Subband Power Distribution (SPD) as a feature. We extract it by computing the histogram of amplitude values in each frequency band of a spectrogram image. The SPD allows us to model the density of events in each frequency band. Our method is evaluated on a large acoustic scene dataset using support vector machines. We outperform previous methods when using the SPD in conjunction with the histogram of oriented gradients (HOG). To improve further, we also consider an approximation of the earth mover's distance kernel that compares histograms in a more suitable way. Using this so-called Sinkhorn kernel improves the results on most of the feature configurations. The best performance reaches a 92.8% F1 score.
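The SPD extraction described in the abstract (a histogram of amplitude values in each frequency band of a spectrogram) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `subband_power_distribution`, the number of bins, the value range shared across bands, and the per-band normalisation are all assumptions made for the example.

```python
import numpy as np


def subband_power_distribution(spectrogram, n_bins=10, value_range=None):
    """Hypothetical SPD sketch: one amplitude histogram per frequency band.

    spectrogram : 2-D array of shape (n_bands, n_frames), for example
                  log-magnitude values of a time-frequency representation.
    n_bins      : number of amplitude bins (assumed value, not from the paper).
    value_range : (low, high) amplitude range shared by all bands; defaults
                  to the min and max of the whole spectrogram (assumption).

    Returns an array of shape (n_bands, n_bins) whose rows sum to 1.
    """
    spec = np.asarray(spectrogram, dtype=float)
    if value_range is None:
        value_range = (spec.min(), spec.max())
    low, high = value_range
    if high <= low:
        # Degenerate case (constant spectrogram): widen the range artificially.
        high = low + 1.0

    spd = np.empty((spec.shape[0], n_bins))
    for band, row in enumerate(spec):
        # Count how often each amplitude level occurs in this band over time.
        counts, _ = np.histogram(row, bins=n_bins, range=(low, high))
        # Normalise so that each band is a distribution (assumption).
        spd[band] = counts / max(counts.sum(), 1)
    return spd


# Example on synthetic data standing in for a real spectrogram.
rng = np.random.default_rng(0)
fake_spectrogram = rng.random((64, 200))  # 64 bands, 200 frames
spd_features = subband_power_distribution(fake_spectrogram, n_bins=20)
```

Each row of `spd_features` describes how amplitude is distributed over time within one frequency band. Flattening the array would give a feature vector that could be passed to a classifier such as a support vector machine.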
Contributor : TelecomParis HAL
Submitted on : Friday, September 13, 2019 - 4:47:03 PM
Last modification on : Friday, January 14, 2022 - 4:24:01 PM


  • HAL Id : hal-02287266, version 1


Victor Bisot, Slim Essid, Gael Richard. Hog and Subband power distribution image features for acoustic scene classification. EUSIPCO, Sep 2015, Nice, France. pp.719-723. ⟨hal-02287266⟩


