This paper reports on recent developments in the creation and analysis of very large databases of emotionally and attitudinally marked speech. These databases support research into concatenative methods for producing synthesised speech that can express the range of prosody and phonation styles needed to emulate human spoken interaction. The paper addresses the problem of ensuring high spontaneity in the speech corpus while collecting data of sufficient audio quality to allow signal analysis by automatic processing techniques. It suggests that a new grammar for spoken language will be required to describe such speech adequately.