Many intent understanding studies neglect the impact of paralinguistic information, leading to misunderstandings in speech interactions, particularly when the same text conveys different intentions depending on its paralinguistic cues. To address this issue, this study developed a Chinese multimodal spoken language intent understanding dataset that pairs identical texts with different spoken intentions. Our proposed attention-based BiLSTM model integrates textual and acoustic features and introduces an acoustic information gate mechanism that supplements or corrects the linguistic intention with the paralinguistic intention. Experimental results demonstrate that our multimodal integration model improves intent discrimination accuracy by 11.0% compared to models that incorporate only linguistic information. These results highlight the effectiveness of the proposed model for intent discrimination, particularly when identical texts carry different intentions.
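
To make the fusion idea concrete, the sketch below shows one way a gate can combine an attention-pooled BiLSTM text representation with utterance-level acoustic features. It is a minimal illustration only: the PyTorch layers, dimensions, and gating formula are assumptions for exposition, not the paper's exact architecture.

```python
# Minimal PyTorch sketch of an attention-based BiLSTM with an acoustic gate.
# Layer sizes, names, and the fusion formula are illustrative assumptions.
import torch
import torch.nn as nn

class GatedBiLSTMIntentClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 acoustic_dim=40, num_intents=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Attention over BiLSTM outputs (2*hidden_dim per time step).
        self.attn = nn.Linear(2 * hidden_dim, 1)
        # Project utterance-level acoustic features into the text space.
        self.acoustic_proj = nn.Linear(acoustic_dim, 2 * hidden_dim)
        # Gate decides, per dimension, how much acoustic information
        # supplements or corrects the linguistic representation.
        self.gate = nn.Linear(4 * hidden_dim, 2 * hidden_dim)
        self.classifier = nn.Linear(2 * hidden_dim, num_intents)

    def forward(self, token_ids, acoustic_feats):
        # token_ids: (batch, seq_len); acoustic_feats: (batch, acoustic_dim)
        h, _ = self.bilstm(self.embedding(token_ids))           # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)            # (B, T, 1)
        text_vec = (weights * h).sum(dim=1)                     # (B, 2H)
        acoustic_vec = torch.tanh(self.acoustic_proj(acoustic_feats))  # (B, 2H)
        g = torch.sigmoid(self.gate(torch.cat([text_vec, acoustic_vec], dim=-1)))
        fused = g * text_vec + (1 - g) * acoustic_vec           # gated fusion
        return self.classifier(fused)                           # intent logits

# Example forward pass with random inputs.
model = GatedBiLSTMIntentClassifier(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 12)), torch.randn(2, 40))
```

In this sketch, the sigmoid gate interpolates between the text and acoustic representations, so acoustic cues can override the text-only reading when the same sentence carries a different spoken intention; the specific interpolation is one plausible choice among several gating designs.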