The remarkable success of large language models has highlighted the relationship between human language acquisition and statistical learning: humans, too, appear to glean lexical regularities through predictive, probabilistic computation. To test this hypothesis, we conducted an EEG study comparing the auditory responses of language-related cortical regions to speech streams containing novel trisyllabic words versus random syllable sequences. Distinctive time-frequency patterns reflecting a lexical learning effect were detected, particularly in the left primary auditory cortex. In addition, the middle temporal gyrus proved instrumental in the initial segmentation of the speech streams, whereas the anterior temporal lobe was more prominent in orchestrating top-down prediction based on established word structures. These findings reinforce the view that humans perceive and segment words by internalizing rule-based combinations in a predictive, statistical manner.
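The predictive statistical mechanism invoked here is commonly modeled as tracking transitional probabilities (TPs) between adjacent syllables: TPs are high within a word and drop at word boundaries, so boundaries can be placed where the TP dips. The sketch below illustrates this idea only; the syllables, stream construction, and threshold are all made up for the example and are not taken from the study's stimuli.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(next | current) from bigram and unigram counts."""
    bigrams = Counter(zip(syllables, syllables[1:]))
    unigrams = Counter(syllables[:-1])
    return {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}

def segment(syllables, tps, threshold=0.75):
    """Place a word boundary wherever the forward TP dips below threshold."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Toy stream: three hypothetical trisyllabic "words" concatenated so that
# within-word TP is 1.0 while between-word TP stays near 0.5.
lexicon = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["da", "ko", "ti"]}
order = ["A", "B", "A", "C", "B", "C"] * 10
stream = [syl for w in order for syl in lexicon[w]]

tps = transitional_probabilities(stream)
print(segment(stream, tps))  # recovers "tupiro", "golabu", "dakoti"
```

A learner (or model) exposed only to the continuous stream can thus recover the word inventory from the statistics alone, which is the bottom-up segmentation route; the abstract's top-down route corresponds to predicting upcoming syllables from word structures once they are established.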