This paper discusses several aspects of multimodal/multimedia language resources: the use of metadata descriptions to make resources easy to locate, their collaborative annotation and exploitation via the Internet, the generation of synchronized media and text streams in distributed environments, and general annotation formats. Although these aspects may be discussed independently, they have to fit together seamlessly to offer users an exploitation environment that can cope with the huge amount of data available in modern multimedia corpora and that fully exploits current technological advances.