Understanding medical conversations requires detecting entities such as Medications, Symptoms, Treatment, Conditions, and Diagnosis, which leads to large ontologies with overlapping spans. Popular solutions to Named Entity Recognition (NER), such as conditional random fields, sequence-to-sequence models, or the question-answering framework, are not well suited to this task. We address this problem by proposing a new model for the NER task -- an RNN transducer (RNN-T), which has hitherto been used only in speech recognition. These models are trained on paired input and output sequences without explicitly specifying the alignment between them, similar to other sequence-to-sequence models. In NER tasks, however, the alignment between words and labels is available from the human annotations. We propose a fixed alignment model that utilizes the given alignment while preserving the benefits of RNN-Ts, such as modeling output dependencies. We also propose a constrained alignment model in which users can specify a relaxation, and the model will learn an alignment within the given constraints. In other words, we propose a family of sequence-to-sequence models that can leverage alignments between input and target sequences when available. Through empirical experiments on a challenging real-world medical NER task with multiple ontologies, we demonstrate that our fixed alignment model outperforms the standard RNN-T model.
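The contrast between a standard RNN-T and the fixed alignment model can be illustrated with a toy computation. A minimal sketch, with made-up emit/blank probabilities standing in for the model's outputs (the real models use learned networks): the standard RNN-T loss marginalizes over every monotone alignment lattice path with a forward algorithm, whereas the fixed alignment loss scores only the single path given by the annotation.

```python
import math

# Toy RNN-T lattice: T input frames, U output labels.
# At each lattice node, the model either emits the next label
# (prob 0.6) or a blank that advances to the next frame (prob 0.4).
# These numbers are illustrative placeholders, not learned values.
T, U = 3, 2
EMIT = math.log(0.6)
BLANK = math.log(0.4)

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def rnnt_nll():
    """Standard RNN-T: forward algorithm sums over all alignments."""
    neg_inf = float("-inf")
    alpha = [[neg_inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            cands = []
            if t > 0:
                cands.append(alpha[t - 1][u] + BLANK)  # blank: advance frame
            if u > 0:
                cands.append(alpha[t][u - 1] + EMIT)   # emit: advance label
            if cands:
                alpha[t][u] = logsumexp(cands)
    # Terminate with a final blank from the last node.
    return -(alpha[T - 1][U] + BLANK)

def fixed_alignment_nll():
    """Fixed alignment: score only the single annotated path
    (here: emit both labels at frame 0, then blanks to the end)."""
    return -(2 * EMIT + 3 * BLANK)

# The marginal over all paths includes the fixed path, so the
# standard RNN-T negative log-likelihood is never larger.
print(rnnt_nll(), fixed_alignment_nll())
```

The gap between the two losses reflects the probability mass the standard model spreads over alignments that the annotation rules out; the fixed alignment model concentrates training signal on the annotated path alone.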