In this work, we investigate cross-lingual bottleneck (BN) features in hybrid ASR systems under a two-level DNN framework. The first-level DNNs serve as bottleneck feature extractors, while the second-level DNNs act not only as acoustic models but also as feature combination modules. We first study different feature configurations, including the bottleneck dimensionality, the need for delta processing, and the necessity of concatenating the BN features with standard target-language features. Further experiments then evaluate cross-lingual generalization in a more holistic manner using the optimized features. We also analyze the effect of adding more training data to the BN feature extractors, and observe performance improvements as more data become available. Finally, two different approaches to utilizing data from non-target languages are compared experimentally. The two approaches are shown to perform comparably, and the two-level DNN architecture benefits from either of them.
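
The two-level pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumed dimensions (40-dim input frames, a 39-dim bottleneck), not the paper's actual implementation: the first-level network yields bottleneck-layer activations, which are concatenated with the standard target-language features and fed to the second-level network.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(x, layers):
    """Forward pass through a stack of (W, b) layers with ReLU activations."""
    for W, b in layers:
        x = relu(x @ W + b)
    return x

# Hypothetical dimensions for illustration only
INPUT_DIM, HIDDEN_DIM, BN_DIM = 40, 128, 39

# First-level DNN: hidden layer narrowing into a bottleneck layer
# (weights are random here; in practice they are trained on source languages)
bn_extractor = [
    (rng.standard_normal((INPUT_DIM, HIDDEN_DIM)) * 0.1, np.zeros(HIDDEN_DIM)),
    (rng.standard_normal((HIDDEN_DIM, BN_DIM)) * 0.1, np.zeros(BN_DIM)),
]

def extract_bn_features(frames):
    """BN features = activations at the first-level network's bottleneck layer."""
    return mlp_forward(frames, bn_extractor)

# Second-level DNN input: BN features concatenated with the
# standard target-language features for the same frames
frames = rng.standard_normal((10, INPUT_DIM))          # 10 acoustic frames
bn_feats = extract_bn_features(frames)                 # shape (10, 39)
combined = np.concatenate([bn_feats, frames], axis=1)  # shape (10, 79)
print(combined.shape)
```

In this sketch the second-level DNN (acoustic model plus feature combiner) would simply take `combined` as its input; concatenation is what lets that network learn the feature combination jointly with the acoustic modeling.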