Recent text-to-speech (TTS) models synthesize remarkably natural speech in both cross-lingual and code-mixed settings. However, words in code-mixed text are synthesized with unnatural accents because speaker-related features can carry linguistic features of the speaker's source language. To solve this problem, we propose ClariTTS, which synthesizes each word in a code-mixed text with an accent appropriate to that word's language. Specifically, we propose a feature-ratio normalized affine coupling layer for the flow-based TTS model, which disentangles speaker and linguistic features so that the accent of the target speaker's source language is not carried into the target language. Furthermore, we introduce a duration stabilization training objective to ensure stable duration prediction in code-mixed TTS. Experimental results demonstrate that ClariTTS reliably generates code-mixed speech with clear pronunciation while preserving speaker identity.