Code-switching (CS) is the phenomenon by which multilingual speakers
switch back and forth between their common languages in written or
spoken communication. CS may occur at the inter-utterance, intra-utterance
(mixing of words from multiple languages in the same utterance) and
even morphological (mixing of morphemes from different languages) levels.
CS presents serious challenges for language technologies such as automatic
speech recognition (ASR), language modeling, parsing, machine translation
(MT), information retrieval (IR) and extraction (IE), keyword search,
and semantic processing. A prime example is acoustic and language
modeling in ASR: models trained on a single language quickly break down
when given mixed-language input.
The lack of basic tools such as language models, part-of-speech
(POS) taggers and parsers trained on mixed-language data makes
downstream tasks even more challenging. Even for problems that are
largely considered solved for monolingual corpora, such as language
identification or POS tagging, performance degrades as the amount and
level of language mixing in the data increase.

This special event aims to bring together researchers interested in
solving the CS problem, and to raise community awareness of the (limited)
resources available and the ongoing work on CS, with particular emphasis
on work in the speech community. The format will consist of a short
introduction from the organizers followed by discussion. We previously held a
workshop on CS in conjunction with EMNLP 2014, for which we developed a shared
text-based task. We received 18 regular workshop submissions
and accepted 8. The goal of this event is to engage the speech processing
community already working in this area and to encourage new research by
those currently working primarily with monolingual corpora.

We will solicit participation from speech processing researchers working
on the analysis and/or processing of CS data. Topics of relevance to the
event will include the following: