Voice faking poses significant societal challenges. The prevailing assumption is that unaltered human speech can always be considered genuine, whereas fake speech typically originates from text-to-speech (TTS) synthesis. We argue that this binary distinction is an oversimplification. For instance, maliciously altered playback speed can deceive listeners, as in the `Drunken Nancy Pelosi' incident. Similarly, audio clips can be edited ethically, e.g. for brevity or summarization in news reporting or podcasts, but editing can also create misleading narratives. In this paper, we propose a conceptual shift away from the longstanding binary paradigm of speech audio being either `fake' or `real'. Instead, we focus on pinpointing `voice edits', which encompass traditional modifications such as filters and cuts as well as neural synthesis. We delineate six categories of voice edits and curate a new challenge dataset, for which we present baseline voice edit detection systems.