Engineering, Company News

Improving Multi-lingual Alignment Through Soft Contrastive Learning Accepted at NAACL 2024

Jun 17, 2024

Jacob Chanyeol Choi

Conference badge for NAACL 2024 paper on multilingual AI embeddings

We are excited to announce that our paper, “Improving Multi-lingual Alignment Through Soft Contrastive Learning,” has been accepted for presentation at NAACL 2024. The paper introduces a soft contrastive learning objective that improves the cross-lingual alignment of sentence embeddings, boosting performance across a range of multi-lingual applications.


About the Research

Our work focuses on improving multi-lingual sentence representations by:

  • Aligning Multi-lingual Embeddings: Utilizing sentence similarity measured by a pre-trained mono-lingual embedding model.

  • Soft Contrastive Learning: Training a multi-lingual model where the similarity between cross-lingual embeddings aligns with the mono-lingual teacher model’s similarity scores.

  • Experimental Results: Demonstrating that our soft-label contrastive loss significantly outperforms traditional hard-label methods in bitext mining and STS tasks across five languages.
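To make the idea above concrete, here is a minimal sketch of a soft contrastive loss: instead of one-hot "translation pair" labels, the teacher's mono-lingual similarity distribution serves as the soft target for the student's cross-lingual similarities. This is an illustrative simplification, not the paper's implementation; the function and parameter names (e.g. `soft_contrastive_loss`, `temperature=0.05`) are our own assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_contrastive_loss(student_src, student_tgt, teacher_emb, temperature=0.05):
    """Illustrative soft contrastive loss (simplified sketch, not the paper's code).

    student_src: (n, d) student embeddings of source-language sentences
    student_tgt: (n, d) student embeddings of their translations
    teacher_emb: (n, d) mono-lingual teacher embeddings of the source sentences
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    # Student: cross-lingual cosine similarities between the two languages.
    s = l2norm(student_src) @ l2norm(student_tgt).T / temperature
    # Teacher: mono-lingual similarities provide the soft labels.
    t = l2norm(teacher_emb) @ l2norm(teacher_emb).T / temperature

    p_teacher = softmax(t, axis=1)
    log_p_student = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
    # Soft cross-entropy: the teacher distribution replaces hard one-hot labels.
    return -(p_teacher * log_p_student).sum(axis=1).mean()
```

With a hard one-hot `p_teacher` this reduces to the standard InfoNCE-style contrastive loss; the soft version additionally penalizes the student when its similarity ranking over negatives disagrees with the teacher's.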


Key Benefits & Highlights

Our approach surpasses existing multi-lingual embedding techniques, including LaBSE, especially on the Tatoeba dataset. This progress is crucial for applications needing accurate and efficient multi-lingual text processing, such as translation, information retrieval, and cross-lingual comprehension.


Acknowledgements

We extend our deepest gratitude to all co-authors and collaborators for their hard work and contributions to this remarkable research: Minsu Park, Seyeon Choi, Jun-Seong Kim, Jy-yong Sohn.

For more details, please visit our paper and access our code on GitHub.