Inter-sentence pauses are the silences between sentences in a paragraph or dialogue. They are an important aspect of long-form speech prosody, as they can affect the naturalness and effectiveness of communication. When evaluating the output of long-form speech synthesis systems, it is crucial to understand how sensitive commonly used listening tests are to variations in inter-sentence pause durations, since this sensitivity affects how useful such evaluations are. However, the perception of inter-sentence pauses in long-form speech synthesis is not well understood. Previous work often evaluates pause modelling together with other prosodic features, which makes it hard to study explicitly how differences in inter-sentence pause length are perceived. To fill this gap, we investigate the sensitivity of subjective listening tests to changes in the durations of inter-sentence pauses in long-form speech by comparing ground-truth audio samples with renditions whose pause durations have been manipulated. Using multiple datasets covering a variety of domains, we find that listening tests are not sensitive to variations in pause length unless these deviate far from the norm. The evaluation experiments in this study can be considered preliminary work whose findings will have implications for evaluation experiments run on actual synthesized long-form speech.