Jacob Chanyeol Choi
Jun 17, 2024
“Improving Multi-lingual Alignment Through Soft Contrastive Learning” has been accepted for presentation at NAACL 2024
We are excited to announce that our paper, “Improving Multi-lingual Alignment Through Soft Contrastive Learning,” has been accepted for presentation at NAACL 2024. The paper introduces a soft contrastive learning approach for aligning multi-lingual sentence embeddings, improving cross-lingual performance on tasks such as bitext mining and semantic textual similarity.
About the Research
Our work focuses on improving multi-lingual sentence representations by:
Aligning Multi-lingual Embeddings: Using sentence similarities measured by a pre-trained mono-lingual embedding model as alignment targets.
Soft Contrastive Learning: Training a multi-lingual model so that the similarity between cross-lingual embeddings aligns with the mono-lingual teacher model’s similarity scores (see the sketch after this list).
Experimental Results: Demonstrating that our soft-label contrastive loss significantly outperforms traditional hard-label methods on bitext mining and semantic textual similarity (STS) tasks across five languages.
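To make the idea concrete, here is a minimal PyTorch sketch of a soft contrastive objective in this spirit. It is an illustrative approximation rather than the exact loss from the paper: the function name, the use of in-batch similarities, and the temperature value are our own assumptions.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(student_src, student_tgt, teacher_src, temperature=0.05):
    # student_src / student_tgt: multi-lingual student embeddings of a batch of
    # source sentences and their translations, shape (batch, dim).
    # teacher_src: frozen mono-lingual teacher embeddings of the same source sentences.

    # In-batch cosine-similarity matrices: student (cross-lingual) and teacher (mono-lingual).
    s = F.normalize(student_src, dim=-1) @ F.normalize(student_tgt, dim=-1).T
    t = F.normalize(teacher_src, dim=-1) @ F.normalize(teacher_src, dim=-1).T

    # The teacher's similarities serve as soft targets in place of the one-hot
    # (hard) labels used by a standard contrastive loss.
    soft_targets = F.softmax(t / temperature, dim=-1)

    # Cross-entropy between the soft targets and the student's cross-lingual
    # similarity distribution, averaged over both translation directions.
    loss_fwd = -(soft_targets * F.log_softmax(s / temperature, dim=-1)).sum(-1).mean()
    loss_bwd = -(soft_targets * F.log_softmax(s.T / temperature, dim=-1)).sum(-1).mean()
    return 0.5 * (loss_fwd + loss_bwd)
```

Compared with a hard-label contrastive loss, which pushes each sentence only toward its exact translation and away from every other sentence in the batch, the soft targets allow semantically related non-translation pairs to contribute a graded signal; this is the intuition behind replacing hard labels with the teacher’s similarity scores.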
Key Benefits & Highlights
Our approach surpasses existing multi-lingual embedding models, including LaBSE, particularly on the Tatoeba dataset. Better-aligned multi-lingual embeddings benefit applications that require accurate and efficient multi-lingual text processing, such as machine translation, information retrieval, and cross-lingual understanding.
Acknowledgements
We extend our deepest gratitude to all co-authors and collaborators for their hard work and contributions to this remarkable research: Minsu Park, Seyeon Choi, Jun-Seong Kim, Jy-yong Sohn.
For more details, please visit our paper and access our code on GitHub.