- Unified Vision-Language Pre-Training for Image Captioning and VQA. AAAI 2020. [PDF] [github repo]
- Few-Shot Image and Sentence Matching via Gated Visual-Semantic Embedding. AAAI 2019. [PDF]
- Multi-Level Visual-Semantic Alignments with Relation-Wise Dual Attention Network for Image and Text Matching. IJCAI 2019. [PDF]
- Knowledge Aware Semantic Concept Expansion for Image-Text Matching. IJCAI 2019. [PDF]
- Position Focused Attention Network for Image-Text Matching. IJCAI 2019. [PDF]
- Neural Compatibility Ranking for Text-based Fashion Matching. SIGIR 2019. [PDF]
- Prototype-guided Attribute-wise Interpretable Scheme for Clothing Matching. SIGIR 2019. [PDF]
- HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs. AAAI 2020. [PDF]
- Composing Text and Image for Image Retrieval - An Empirical Odyssey. CVPR 2019. [PDF]
- Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts. EMNLP 2019. [PDF]
- VrR-VG: Refocusing Visually-Relevant Relationships. ICCV 2019. [PDF]
- Grounded Compositional Semantics for Finding and Describing Images with Sentences. TACL 2014. [PDF]
- Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation. MM 2019. [[PDF](https://dl.acm.org/citation.cfm?id=3351053)]