Automatic identification of non-linguistic speech features (e.g. the speaker or the language of an utterance) is currently of practical interest. In this paper, we first set out the requirements that we think a statistical model used for non-linguistic feature identification should satisfy: it should capture both short-term and long-term correlations while maintaining a certain acoustic resolution. We then propose a model that satisfies these requirements and, at the same time, has the attractive property of requiring no transcribed speech material during training. An experimental evaluation of the approach for speaker recognition on the TIMIT database is presented, in which recognition rates of up to 99.2% are achieved.