Summarizing spoken content with neural approaches has attracted growing research interest, as sequence-to-sequence models have improved abstractive summarization performance. However, summarizing long meeting transcripts remains challenging. Meetings are multi-party spoken discussions in which information is topically diffuse, making it harder for neural models to distill and cover the essential content. Meeting summarization also cannot readily benefit from pre-trained language models, which typically have input length limitations. In this work, we build on the intuition that the topical structure of a meeting tends to correlate with its agenda. Based on this observation, we propose a dynamic sliding window strategy that decomposes the long source content of meetings into smaller, contextualized semantic chunks for more effective modeling, together with two methods for context boundary prediction that require no additional trainable parameters. Experimental results show that the proposed framework achieves state-of-the-art abstractive summarization performance on the AMI corpus and obtains higher factual consistency than competitive baselines.