Identifying the acoustic properties that characterize how different literary
genres are read aloud can help give a more personal and human tone to the speech
of bots and automatic readers.
In this paper we consider
the following question: given speech segments of audiobooks, how well
can we classify them according to their literary genre? In this study
we consider three literary genres: children's books, horror and suspense,
and humor, with audiobooks taken from two free sources, LibriVox
and YouTube.
We ran four classification experiments: one for each of the three pairs of
genres, and one for all three genres together. We ran each experiment
twice, once with each of two network architectures: a Convolutional Neural
Network (CNN) and a Recurrent Neural Network (RNN).
Note that, throughout
a reading, some sections are more typical of the book’s
genre than others. Because the samples were taken sequentially throughout
the reading of the books and were short in duration, we did not expect
high classification rates. Nevertheless, with both architectures the accuracy
was at least 72% in every pairwise classification and at
least 57% in the three-class classification.