“Improving Multi-lingual Alignment Through Soft Contrastive Learning” Accepted at NAACL 2024

Jacob Choi

Jun 17, 2024

We are excited to announce that our paper, “Improving Multi-lingual Alignment Through Soft Contrastive Learning,” has been accepted for presentation at NAACL 2024. This research introduces a soft contrastive learning objective that improves the cross-lingual alignment of sentence embeddings, boosting performance on downstream multi-lingual tasks.


About the Research

Our work focuses on improving multi-lingual sentence representations by:

  • Aligning Multi-lingual Embeddings: Using sentence similarity scores measured by a pre-trained mono-lingual embedding model as the supervision signal.

  • Soft Contrastive Learning: Training a multi-lingual student model so that the similarity between its cross-lingual embeddings matches the mono-lingual teacher model’s similarity scores (a code sketch follows this list).

  • Experimental Results: Demonstrating that our soft-label contrastive loss significantly outperforms conventional hard-label methods on bitext mining and semantic textual similarity (STS) tasks across five languages.
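
To illustrate the idea, here is a minimal PyTorch sketch of a soft-label contrastive objective. The function name, tensor shapes, and temperature value are illustrative assumptions rather than the exact recipe from the paper; the essential point is that the target distribution comes from the mono-lingual teacher’s similarity scores instead of hard one-hot labels.

```python
import torch
import torch.nn.functional as F


def soft_contrastive_loss(src_emb: torch.Tensor,
                          tgt_emb: torch.Tensor,
                          teacher_emb: torch.Tensor,
                          temperature: float = 0.05) -> torch.Tensor:
    """Soft-label contrastive loss for a batch of translation pairs (sketch).

    src_emb     -- student embeddings of source-language sentences, shape (B, d)
    tgt_emb     -- student embeddings of the paired target sentences, shape (B, d)
    teacher_emb -- frozen mono-lingual teacher embeddings of the
                   target-side sentences, shape (B, d_teacher)
    """
    # Cross-lingual similarity logits from the multi-lingual student.
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    student_logits = src @ tgt.T / temperature                  # (B, B)

    # Soft targets: pairwise similarities measured by the mono-lingual teacher.
    teach = F.normalize(teacher_emb, dim=-1)
    soft_targets = (teach @ teach.T / temperature).softmax(dim=-1)

    # Cross-entropy against the teacher's soft distribution, instead of the
    # hard one-hot labels (torch.eye(B)) used in standard contrastive loss.
    log_probs = student_logits.log_softmax(dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```

With one-hot targets this reduces to the usual hard-label contrastive loss; using the teacher’s soft targets lets near-paraphrases within a batch receive partial credit rather than being treated as pure negatives.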


Key Benefits & Highlights

Our approach surpasses existing multi-lingual embedding techniques, including LaBSE, especially on the Tatoeba dataset. This progress is important for applications that require accurate and efficient multi-lingual text processing, such as translation, information retrieval, and cross-lingual understanding.


Acknowledgements

We extend our deepest gratitude to all co-authors and collaborators for their hard work and contributions to this research: Minsu Park, Seyeon Choi, Jun-Seong Kim, Jy-yong Sohn.

For more details, please see our paper and access our code on GitHub.