ToBI is a prosody labeling system that transcribes American English prosody in terms of phonological tones and break indices. Previous work on automatic ToBI transcription requires additional information such as word boundaries and relies on modular feature extraction, with feature detectors and classifiers optimized separately. We investigate whether a neural network-based approach can also achieve high performance on automatic ToBI transcription without such additional information. Specifically, we address pitch accent detection and phrase boundary detection using the Wav2vec 2.0 model with only acoustic information. Our model is trained on the Boston University Radio News Corpus and evaluated on both the Boston University Radio News Corpus and the Boston Directions Corpus. It achieves an F1 score of 0.82 on pitch accent detection and 0.86 on phrase boundary detection. Code and model weights are available.
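
For reference, the F1 scores reported above are the harmonic mean of precision and recall over binary detection decisions. The following is a minimal sketch of that metric only, not the evaluation code of this work: the label granularity (one 0/1 decision per unit), the function name `f1_score`, and the example labels are all illustrative assumptions.

```python
def f1_score(reference, predicted):
    """Binary F1 for two equal-length sequences of 0/1 labels."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must have equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # correct detections
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarms
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # misses
    if tp == 0:
        # No correct detections: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = pitch accent present, 0 = absent.
reference = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
score = f1_score(reference, predicted)
print(f"F1 = {score:.2f}")
```

In this toy example there are three correct detections, one false alarm, and one miss, so precision and recall are both 0.75. The same function applies unchanged to phrase boundary labels.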