“Improving Multi-lingual Alignment Through Soft Contrastive Learning” Accepted at NAACL 2024

Jacob Choi

Jun 17, 2024

We are excited to announce that our paper, “Improving Multi-lingual Alignment Through Soft Contrastive Learning,” has been accepted for presentation at NAACL 2024. This research introduces a soft contrastive learning objective that improves the cross-lingual alignment of sentence embeddings, boosting performance on downstream multi-lingual tasks.


About the Research

Our work focuses on improving multi-lingual sentence representations by:

  • Aligning Multi-lingual Embeddings: Using sentence similarity scores measured by a pre-trained mono-lingual embedding model as the supervision signal.

  • Soft Contrastive Learning: Training a multi-lingual student model so that the similarity between its cross-lingual embeddings matches the mono-lingual teacher model’s similarity scores (a code sketch follows this list).

  • Experimental Results: Demonstrating that our soft-label contrastive loss significantly outperforms conventional hard-label methods on bitext mining and semantic textual similarity (STS) tasks across five languages.
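
To illustrate the idea, here is a minimal PyTorch sketch of a soft-label contrastive objective. The function name, tensor shapes, and temperature value are illustrative assumptions rather than the exact recipe from the paper; the essential point is that the target distribution comes from the mono-lingual teacher’s similarity scores instead of hard one-hot labels.

```python
import torch
import torch.nn.functional as F


def soft_contrastive_loss(src_emb: torch.Tensor,
                          tgt_emb: torch.Tensor,
                          teacher_emb: torch.Tensor,
                          temperature: float = 0.05) -> torch.Tensor:
    """Soft-label contrastive loss for a batch of translation pairs (sketch).

    src_emb     -- student embeddings of source-language sentences, shape (B, d)
    tgt_emb     -- student embeddings of the paired target sentences, shape (B, d)
    teacher_emb -- frozen mono-lingual teacher embeddings of the
                   target-side sentences, shape (B, d_teacher)
    """
    # Cross-lingual similarity logits from the multi-lingual student.
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    student_logits = src @ tgt.T / temperature                  # (B, B)

    # Soft targets: pairwise similarities measured by the mono-lingual teacher.
    teach = F.normalize(teacher_emb, dim=-1)
    soft_targets = (teach @ teach.T / temperature).softmax(dim=-1)

    # Cross-entropy against the teacher's soft distribution, instead of the
    # hard one-hot labels (torch.eye(B)) used in standard contrastive loss.
    log_probs = student_logits.log_softmax(dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```

With one-hot targets this reduces to the usual hard-label contrastive loss; using the teacher’s soft targets lets near-paraphrases within a batch receive partial credit rather than being treated as pure negatives.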


Key Benefits & Highlights

Our approach surpasses existing multi-lingual embedding techniques, including LaBSE, especially on the Tatoeba dataset. This progress is important for applications that require accurate and efficient multi-lingual text processing, such as translation, information retrieval, and cross-lingual understanding.


Acknowledgements

We extend our deepest gratitude to all co-authors and collaborators for their hard work and contributions to this research: Minsu Park, Seyeon Choi, Jun-Seong Kim, Jy-yong Sohn.

For more details, please see our paper and access our code on GitHub.