Accompanying website for the paper "Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification" by Francesca Ronchini, Luca Comanducci, and Fabio Antonacci, submitted to DCASE 2024.

Abstract

In the past few years, text-to-audio models have emerged as a significant advancement in automatic audio generation. Although they represent impressive technological progress, the effectiveness of their use in the development of audio applications remains uncertain. This paper aims to investigate these aspects, specifically focusing on the task of classification of environmental sounds. This study analyzes the performance of two different environmental classification systems when data generated from text-to-audio models is used for training. Two cases are considered: a) when the training dataset is augmented by data coming from two different text-to-audio models; and b) when the training dataset consists solely of synthetic audio generated. In both cases, the performance of the classification task is tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, whereas the performance of the models drops when relying on only generated audio.
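The two training configurations described in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' pipeline: the function name `build_training_sets`, the file paths, and the `(path, label)` layout are all hypothetical. It shows only how the training lists differ between case a) and case b), with the real test data kept separate in both.

```python
import random


def build_training_sets(real_train, synthetic, seed=0):
    """Build the two training configurations (hypothetical helper).

    real_train and synthetic are lists of (path, label) pairs.
    """
    rng = random.Random(seed)
    # Case a): real training data augmented with generated audio.
    augmented = list(real_train) + list(synthetic)
    # Case b): training data made up of generated audio only.
    synthetic_only = list(synthetic)
    rng.shuffle(augmented)
    rng.shuffle(synthetic_only)
    return {"augmented": augmented, "synthetic_only": synthetic_only}


# Hypothetical file lists; the paths are placeholders, not the real dataset layout.
real_train = [("real/dog_bark_01.wav", "dog_bark"), ("real/siren_01.wav", "siren")]
synthetic = [("gen/dog_bark_01.wav", "dog_bark"), ("gen/siren_01.wav", "siren")]
real_test = [("real/dog_bark_99.wav", "dog_bark")]  # evaluation uses real data only

sets = build_training_sets(real_train, synthetic)
for name, items in sets.items():
    print(name, len(items))
```

In both configurations the classifier would be evaluated on `real_test`, which never enters either training list.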

Audio Examples

On this page, we present audio generated using AudioGen and AudioLDM2, both via a simple prompt and via ChatGPT prompts (namely AudioGengpt and AudioLDM2gpt). We present examples for each of the 10 classes in the UrbanSound8K (US8K) dataset: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music. For each class, we present three examples for each model.

1) air_conditioner

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt

2) car_horn

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt

3) children_playing

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt

4) dog_bark

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt

5) drilling

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt

6) engine_idling

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt

7) gun_shot

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt

8) jackhammer

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt

9) siren

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt

10) street_music

US8K example:

AudioGen
AudioGengpt
AudioLDM2
AudioLDM2gpt