ISCA Archive Interspeech 2023

Query Based Acoustic Summarization for Podcasts

Samantha Kotey, Rozenn Dahyot, Naomi Harte

Podcasts are a rich storytelling medium of long, diverse conversations. Typically, listeners preview an episode through an audio clip before deciding to consume the content. An automatic system that produces promotional clips by supporting acoustic queries would greatly benefit podcasters. Previous text-based methods do not use the acoustic signal directly or incorporate acoustically defined queries. Therefore, we propose a query-based summarization approach to produce audio clip summaries from podcast data. Leveraging unsupervised clustering methods, we apply our framework to the Spotify podcasts dataset. Audio signals, along with a pre-selected candidate query, are transformed into acoustic word embeddings. We initialize the cluster centroids with the query vector and obtain the final snippets by computing a global and local similarity score. Additionally, we apply our framework to the AMI meeting dataset and demonstrate how audio can successfully be utilized to perform summarization.
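
The query-seeded clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the global and local scores, so here the global score is assumed to be the cosine similarity between a segment and the query, and the local score the cosine similarity between a segment and the centroid of the query-seeded cluster. The function names, the farthest-first choice of the remaining centroids, the weight `alpha`, and the toy two-dimensional "embeddings" are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def query_seeded_kmeans(vectors, query, k=2, iters=20):
    """k-means in which centroid 0 is initialized with the query vector.

    Assumption: the other centroids are picked farthest-first from the
    data, which keeps this sketch deterministic.
    """
    centroids = [list(query)]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each segment to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def rank_segments(vectors, query, labels, centroids, top_n=2, alpha=0.5):
    """Rank segments of the query-seeded cluster by a combined score.

    Assumed scoring: alpha * (similarity to query, "global")
    + (1 - alpha) * (similarity to cluster 0's centroid, "local").
    """
    scored = []
    for i, v in enumerate(vectors):
        if labels[i] != 0:
            continue  # keep only segments in the query-seeded cluster
        global_score = cosine(v, query)
        local_score = cosine(v, centroids[0])
        scored.append((alpha * global_score + (1 - alpha) * local_score, i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_n]]

# Toy embeddings: segments 0-2 resemble the query, segments 3-5 do not.
segments = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2],
            [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
query = [1.0, 0.1]
labels, centroids = query_seeded_kmeans(segments, query, k=2)
snippets = rank_segments(segments, query, labels, centroids, top_n=2)
```

In this sketch, seeding one centroid with the query means that cluster 0 gathers the segments closest to the query, and the returned indices identify candidate snippets drawn from that cluster only.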