Performance Comparison of Cross-modal Retrieval
Performance of Commonly-used Datasets
(* indicates ensemble models, ^ indicates questionable authenticity)
Columns: Method_name, Concise_note, Sentence retrieval (R@1, R@5, R@10), Image retrieval (R@1, R@5, R@10)
DeViSE RCNN 4.8 16.5 27.3 5.9 20.1 29.6
SDT-RNN AlexNet 4.5 18.0 28.6 6.1 18.5 29.0
SDT-RNN RCNN 6.0 22.7 34.0 6.6 21.6 31.7
DeFrag AlexNet 5.9 19.2 27.3 5.2 17.6 26.5
DeFrag RCNN 12.6 32.9 44.0 9.7 29.6 42.5
m-RNN AlexNet 14.5 37.2 48.5 11.5 31.0 42.4
DVSA DepTree 14.8 37.9 50.0 11.6 31.4 43.8
DVSA RCNN 16.5 40.6 54.2 11.8 32.1 44.7
UVSE AlexNet 13.5 36.2 45.7 10.4 31.0 43.7
UVSE VggNet 18.0 40.9 55.0 12.5 37.0 51.5
NIC GoogleNet 20 -- 61 19 -- 64
m-CNN* OverFeat 14.9 35.9 49.0 11.8 34.5 48.0
m-CNN* VggNet 24.8 53.7 67.1 20.3 47.6 61.7
HM-LSTM RCNN 27.7 -- 68.6 24.4 -- 68.1
SPE VggNet 30.1 60.4 73.7 23.0 51.3 64.8
FV GMM+HGLMM 31.0 59.3 73.7 21.2 50.0 64.8
MFM VggNet 35.6 67.0 78.6 28.4 58.5 72.3
NAA ResNet 37.2 68.1 79.1 27.7 59.6 71.8
ITMeetsAL MobileNet 30.9 58.6 70.8 -- -- --
ITMeetsAL ResNet 40.1 67.8 79.2 -- -- --
2WayNet VggNet 43.4 63.2 -- 29.3 49.7 --
SCAN* BUTD 52.2 81.0 89.2 38.3 67.8 78.9
IMRAM BUTD, Image 48.5 78.1 85.3 32.0 61.4 73.9
IMRAM BUTD, Text 52.1 81.5 90.1 40.2 69.0 79.2
IMRAM BUTD, Full 54.7 84.2 91.0 41.0 69.2 79.9
Columns: Method_name, Concise_note, Sentence retrieval (R@1, R@5, R@10), Image retrieval (R@1, R@5, R@10)
DeViSE RCNN 4.5 18.1 29.2 6.7 21.9 32.7
SDT-RNN RCNN 9.6 29.8 41.1 8.9 29.8 41.1
DeFrag RCNN 14.2 37.7 51.3 10.2 30.8 44.2
DeFrag ftRCNN 16.4 40.2 54.7 10.3 31.4 44.5
DCCA AlexNet 16.7 39.3 52.9 12.6 31.0 43.0
NIC GoogleNet 17 -- 56 17 -- 57
DVSA DepTree 20.0 46.6 59.4 15.0 36.5 48.2
DVSA RCNN 22.2 48.2 61.4 15.2 37.7 50.5
UVSE AlexNet 14.8 39.2 50.9 11.8 34.0 46.3
UVSE VggNet 23.0 50.7 62.9 16.8 42.0 56.5
LRCN VggNet 23.6 46.6 58.3 17.5 40.3 50.8
m-CNN* OverFeat 20.1 44.2 56.3 15.9 40.3 51.9
m-CNN* VggNet 33.6 64.1 74.9 26.2 56.3 69.6
m-RNN AlexNet 18.4 40.2 50.9 12.6 31.2 41.5
m-RNN VggNet 35.4 63.8 73.7 22.8 50.7 63.1
FV GMM+HGLMM 35.0 62.0 73.8 25.0 52.7 66.0
HM-LSTM RCNN 38.1 -- 76.5 27.7 -- 68.8
SPE VggNet 40.3 68.9 79.9 29.7 60.1 72.1
sm-LSTM VggNet 42.4 67.5 79.9 28.2 57.0 68.4
sm-LSTM* VggNet 42.5 71.9 81.5 30.2 60.4 72.3
CSE ResNet 44.6 74.3 83.8 36.9 69.1 79.6
MDM VggNet 44.9 75.4 84.4 34.4 67.0 77.7
RRF-Net ResNet 47.6 77.4 87.1 35.4 68.3 79.9
CMPL MobileNet 40.3 66.9 76.7 30.4 58.2 68.5
CMPL ResNet 49.6 76.8 86.1 37.3 65.7 75.5
2WayNet VggNet 49.8 67.5 -- 36.0 55.6 --
MFM VggNet 50.2 78.1 86.7 38.2 70.1 80.2
VSE++ VggNet 41.3 69.1 77.9 31.4 60.0 71.2
VSE++ ResNet 52.9 80.5 87.2 39.6 70.1 79.5
TIMAM ResNet, Bert 53.1 78.8 87.6 42.6 71.6 81.9
TERN BUTD, Bert 53.2 79.4 86.0 41.1 71.9 81.2
DAN VggNet 41.4 73.5 82.5 31.8 61.7 72.5
DAN ResNet 55.0 81.8 89.0 39.4 69.2 79.1
NAA ResNet 55.1 80.3 89.6 39.4 68.8 79.9
SCO VggNet 44.2 74.1 83.6 32.8 64.3 74.9
SCO ResNet 55.5 82.0 89.3 41.1 70.5 80.1
Dual-Path VggNet 47.6 77.3 87.1 35.3 66.6 78.2
Dual-Path ResNet 55.6 81.9 89.5 39.1 69.2 80.9
ITMeetsAL VggNet 38.5 66.5 76.3 30.7 59.4 70.3
ITMeetsAL MobileNet 46.6 73.5 82.5 34.4 63.3 74.2
ITMeetsAL ResNet 56.5 82.2 89.6 43.5 71.8 80.2
CVSE++ ResNet 56.6 82.5 90.2 42.4 71.6 80.8
GXN ResNet 56.8 -- 89.6 41.5 -- 80.1
SMAN ResNet, Random 56.9 84.8 91.9 43.2 73.3 83.5
SMAN ResNet, Glove 57.3 85.3 92.2 43.4 73.7 83.4
M3A ResNet 58.1 82.8 90.1 44.7 72.4 81.1
Align2Ground BUTD -- -- -- 49.7 74.8 83.3
A3VSE BUTD 65.0 89.2 94.5 49.5 79.5 86.6
DXR ResNet, Bert 65.1 87.3 92.6 50.6 78.8 86.7
MTFN BUTD 63.1 85.8 92.4 46.3 75.3 83.6
MTFN BUTD, RR_no_STT 65.3 88.3 93.3 46.7 75.9 83.8
MTFN BUTD, RR_STT 65.3 88.3 93.3 52.0 80.1 86.1
R-SCAN BUTD, VrR-VG 66.3 90.6 96.0 51.4 77.8 84.9
SAVE ResNet 67.2 88.3 94.2 49.8 78.7 86.2
SCAN BUTD, T2I_AVE 61.8 87.5 93.7 45.8 74.4 83.0
SCAN BUTD, I2T_AVE 67.9 89.0 94.4 43.9 74.2 82.8
SCAN* BUTD, AVE+LSE 67.4 90.3 95.8 48.6 77.7 85.2
BFAN BUTD, prob 65.5 89.4 -- 47.9 77.6 --
BFAN BUTD, equal 64.5 89.7 -- 48.8 77.3 --
BFAN* BUTD 68.1 91.4 -- 50.8 78.4 --
CAMP BUTD 68.1 89.7 95.2 51.5 77.1 85.3
RDAN BUTD 68.1 91.0 95.9 54.1 80.9 87.2
GSLS ResNet, BUTD 68.2 89.1 94.5 43.4 73.5 82.5
Personality ResNeXt, Transformer 68.4 90.6 95.3 -- -- --
CASC ResNet 68.5 90.6 95.9 50.2 78.3 86.3
GVSE* BUTD 68.5 90.9 95.5 50.6 79.8 87.6
HAL SCAN_I2T 68.6 89.9 94.7 46.0 74.0 82.3
OAN BUTD 68.6 93.0 96.0 53.3 80.1 87.1
SAEM BUTD, Bert 69.1 91.0 95.1 52.4 81.1 88.1
MPL SCAN_I2T 69.4 89.9 95.4 47.5 75.5 83.1
LIWE BUTD, CLMR 64.0 88.3 93.3 46.8 76.4 84.5
LIWE BUTD, -Glove 66.4 88.9 94.1 47.5 76.2 84.9
LIWE BUTD, +Glove 69.6 90.3 95.6 51.2 80.4 87.2
PFAN BUTD, T2I 66.0 89.6 94.3 49.6 77.0 84.2
PFAN BUTD, I2T 67.6 90.0 93.8 45.7 74.7 83.6
PFAN* BUTD 70.0 91.8 95.0 50.4 78.7 86.1
PFAN++* BUTD 70.1 91.8 96.1 52.7 79.9 87.0
CAAN BUTD 70.1 91.6 97.2 52.8 79.0 87.9
DP-RNN BUTD 70.2 91.6 95.8 55.5 81.3 88.2
TERAN BUTD, Bert 70.8 90.9 95.5 56.5 81.2 88.2
HOAD BUTD 70.8 92.7 96.0 59.5 85.6 91.0
HOAD BUTD, +Dist 70.8 92.7 96.0 60.9 86.1 91.0
GOT SCAN_I2T 70.9 92.8 95.5 50.7 78.7 86.2
LGSGM BUTD 71.0 91.9 96.1 57.4 84.1 90.2
VSRN BUTD 70.4 89.2 93.7 53.0 77.9 85.7
VSRN* BUTD 71.3 90.6 96.0 54.7 81.8 88.2
SCG VggNet, Prod 57.2 85.1 92.1 40.1 69.5 79.5
SCG VggNet, Gated 71.8 90.8 94.8 49.3 76.4 85.6
SGM BUTD 71.8 91.7 95.5 53.5 79.6 86.5
Meta-SPN* BFAN* 72.5 93.2 96.7 53.3 80.2 87.2
CSCC BUTD, +GloVe 72.7 93.4 96.5 61.2 86.7 91.5
ADDR* BUTD, BFAN 71.3 91.5 96.4 54.0 80.0 87.6
ADDR* BUTD, SCAN 72.1 93.1 96.1 53.5 80.4 87.4
ADDR* BUTD, VSRN 73.0 92.5 96.6 55.6 82.0 88.9
AOQ* BUTD, SCAN 70.3 92.0 95.5 50.0 79.2 86.2
AOQ* BUTD, VSRN 72.8 91.8 95.8 55.3 82.2 88.4
AOQ* BUTD, BFAN 73.2 94.5 97.0 54.0 80.3 87.7
CVSE BUTD 73.5 92.1 95.8 52.9 80.4 87.8
SMFEA BUTD 73.7 92.5 96.1 54.7 82.1 88.4
IMRAM BUTD, Image 67.0 90.5 95.6 51.2 78.2 85.5
IMRAM BUTD, Text 68.8 91.6 96.0 53.0 79.0 87.1
IMRAM BUTD, Full 74.1 93.0 96.6 53.9 79.4 87.2
HAN BUTD 74.1 92.4 96.4 54.8 81.1 87.4
MMCA BUTD, Bert 74.2 92.8 96.4 54.8 81.4 87.8
SHAN BUTD, T2I 72.5 92.3 95.8 53.6 78.6 85.5
SHAN BUTD, I2T 70.6 91.7 95.5 50.5 77.1 85.2
SHAN BUTD, Full 74.6 93.5 96.9 55.3 81.3 88.4
WCGL BUTD 74.8 93.3 96.8 54.8 80.6 87.5
CCRS* BUTD, SCAN 70.1 92.0 96.0 52.3 79.9 86.8
CCRS* BUTD, BFAN 75.3 93.6 96.7 55.4 81.3 87.7
SSAMT BUTD, Bert 75.4 92.6 96.4 54.8 81.5 88.0
SAN^ VggNet 67.0 88.0 94.6 51.4 77.2 85.2
SAN^ ResNet 75.5 92.6 96.2 60.1 84.7 90.6
SAM VSRN 68.4 89.7 94.8 52.4 78.7 86.6
SAM^ CVSE^ 70.0 89.2 93.1 55.0 82.6 89.0
SAM SGR 75.9 92.4 96.6 57.6 83.1 89.7
GSMN BUTD, sparse 71.4 92.0 96.1 53.9 79.7 87.1
GSMN BUTD, dense 72.6 93.5 96.8 53.7 80.0 87.0
GSMN* BUTD 76.4 94.3 97.3 57.4 82.3 89.0
ADAPT BUTD, I2T 70.2 90.8 95.8 55.5 82.7 89.8
ADAPT BUTD, T2I 73.6 93.7 96.7 57.0 83.6 90.3
ADAPT* BUTD, +GloVe 76.6 95.4 97.6 60.7 86.6 92.0
SGRAF BUTD, SAF 73.7 93.3 96.3 56.1 81.5 88.0
SGRAF BUTD, SGR 75.2 93.3 96.6 56.2 81.0 86.5
SGRAF* BUTD 77.8 94.1 97.4 58.5 83.0 88.8
DSRAN BUTD, GRU 72.6 93.6 96.3 56.3 84.0 89.8
DSRAN BUTD, Bert 75.3 94.4 97.6 57.3 84.8 90.9
DSRAN* BUTD, GRU 74.9 94.5 97.0 58.6 85.8 91.3
DSRAN* BUTD, Bert 77.8 95.1 97.6 59.2 86.0 91.9
CAMERA BUTD, Bert 76.5 95.1 97.2 58.9 84.7 90.2
CAMERA* BUTD, Bert 78.0 95.1 97.9 60.3 85.9 91.7
CAEMCL BUTD 76.3 93.2 96.5 57.0 82.1 88.5
CAEMCL* BUTD 78.7 94.5 97.9 58.2 83.6 89.6
T-EMDE BUTD, SAF 75.2 94.2 97.1 57.1 82.2 88.3
T-EMDE BUTD, SGR 77.5 93.1 97.2 56.9 82.0 87.5
T-EMDE* BUTD, SGRAF 78.8 94.4 97.5 59.6 83.6 89.2
DIME BUTD, I2T, Bert 77.4 95.0 97.4 60.1 85.5 91.8
DIME BUTD, T2I, Bert 77.5 93.5 97.5 59.1 85.5 91.0
DIME* BUTD, Bert 81.0 95.9 98.4 63.6 88.1 93.0
PG* BUTD, +3loss 81.0 94.5 97.1 60.6 86.5 92.4
PG* BUTD, +GloVe 82.8 95.9 97.9 62.2 89.3 93.8
ACMM BUTD 80.0 95.5 98.2 50.2 76.8 84.7
ACMM* BUTD 85.2 96.7 98.4 53.8 79.8 86.8
GPO IN, BiGRU 77.1 94.5 97.1 58.5 84.1 89.6
GPO* IN+VG, BiGRU 80.7 96.4 98.3 60.8 86.3 92.3
GPO* IN+VG, Bert 85.3 97.2 98.9 66.7 89.9 94.0
GPO* WSL, Bert 88.7 98.9 99.8 76.1 94.5 97.1
Columns: Method_name, Concise_note, Sentence retrieval (R@1, R@5, R@10), Image retrieval (R@1, R@5, R@10)
STV combine-skip 33.8 67.7 82.1 25.9 60.0 74.6
DVSA RCNN 38.4 69.9 80.5 27.4 60.2 74.8
FV GMM+HGLMM 39.4 67.9 80.9 25.1 59.8 76.6
m-RNN VggNet 41.0 73.0 83.5 29.0 42.2 77.0
m-CNN* VggNet 42.8 73.1 84.1 32.6 68.6 82.8
UVSE VggNet 43.4 75.7 85.8 31.0 66.7 79.9
HM-LSTM RCNN 43.9 -- 87.8 36.1 -- 86.7
Order-emb VggNet 46.7 -- 88.9 37.9 -- 85.9
SPE VggNet 50.1 79.7 89.2 39.6 75.2 86.9
SEAM VggNet 50.7 81.4 90.9 40.3 75.7 87.4
sm-LSTM VggNet 52.4 81.7 90.8 38.6 73.4 84.6
sm-LSTM* VggNet 53.2 83.1 91.5 40.7 75.8 87.4
CMPL MobileNet 52.9 83.8 92.1 41.3 74.6 85.9
MDM VggNet 54.7 84.1 91.9 44.6 79.6 90.5
2WayNet VggNet 55.8 75.2 -- 39.7 63.3 --
CMPM ResNet 56.1 86.3 92.9 44.6 78.8 89.0
CSE ResNet 56.3 84.4 92.2 45.7 81.2 90.6
RRF-Net ResNet 56.4 85.3 91.5 43.9 78.1 88.6
ITMeetsAL VggNet 44.2 76.1 86.3 37.1 72.7 85.1
ITMeetsAL MobileNet 54.7 84.3 91.1 41.0 76.7 88.1
ITMeetsAL ResNet 58.5 85.3 92.1 48.3 82.0 90.6
MFM VggNet 58.9 86.3 92.4 47.7 81.0 90.9
CHAIN-VSE VggNet 51.6 82.0 91.3 38.6 75.1 87.2
CHAIN-VSE ResNet 59.4 88.0 94.2 43.5 79.8 90.2
NAA ResNet 61.3 87.9 95.4 47.0 80.8 90.1
TERN BUTD, Bert 63.7 90.5 96.2 51.9 85.6 93.6
VSE++ VggNet 57.2 86.0 93.3 45.9 79.4 89.1
VSE++ ResNet 64.6 90.0 95.7 52.0 84.3 92.0
Dual-Path VggNet 59.4 86.2 92.9 41.6 76.3 87.5
Dual-Path ResNet 65.6 89.8 95.5 47.1 79.9 90.0
DXR ResNet, Bert 67.0 93.0 97.6 56.8 88.2 94.9
Personality ResNeXt, Transformer 67.3 91.7 96.5 -- -- --
Align2Ground BUTD -- -- -- 56.6 84.9 92.8
SMAN ResNet, Random 67.9 90.6 96.2 58.8 87.0 93.7
SMAN ResNet, Glove 68.4 91.3 96.6 58.5 87.4 93.5
GXN ResNet 68.5 -- 97.9 56.6 -- 94.5
GSLS ResNet, BUTD 68.9 94.1 98.0 58.6 88.2 94.9
CVSE++ ResNet 69.1 92.2 96.1 55.6 86.7 93.8
PVSE ResNet 69.2 91.6 96.6 55.2 86.5 93.7
DSVE-Loc ResNet 69.8 91.9 96.6 55.9 86.9 94.0
SCO VggNet 66.6 91.8 96.6 55.5 86.6 93.8
SCO ResNet 69.9 92.9 97.5 56.7 87.5 94.8
R-SCAN BUTD, VrR-VG 70.3 94.5 98.1 57.6 87.3 93.7
M3A ResNet 70.4 91.7 96.8 58.4 87.1 94.0
SAVE ResNet 70.8 93.2 97.6 56.9 87.6 94.4
MPL SCAN_I2T 71.1 93.7 98.2 56.8 86.7 93.0
SAEM BUTD, Bert 71.2 94.1 97.7 57.8 88.6 94.9
SoDeep DSVE-Loc 71.5 92.8 97.1 56.2 87.0 94.3
OAN BUTD 71.7 96.4 99.3 60.2 88.6 94.5
GVSE* BUTD 72.2 94.1 98.1 60.5 89.4 95.8
CAMP BUTD 72.3 94.8 98.3 58.5 87.9 95.0
CASC ResNet 72.3 96.0 99.0 58.9 89.8 96.0
SCAN BUTD, T2I_AVE 70.9 94.5 97.8 56.4 87.0 93.9
SCAN BUTD, I2T_AVE 69.2 93.2 97.5 54.4 86.0 93.6
SCAN* BUTD, LSE+AVE 72.7 94.8 98.4 58.8 88.4 94.8
LIWE BUTD, -Glove 69.6 93.9 98.0 55.5 87.3 94.2
LIWE BUTD, CLMR 71.8 93.1 97.6 56.2 87.5 94.2
LIWE BUTD, +Glove 73.2 95.5 98.2 57.9 88.3 94.5
SGM BUTD 73.4 93.8 97.8 57.5 87.3 94.3
ParNet BUTD, NP 72.8 94.9 97.9 57.9 87.4 94.0
ParNet BUTD, P 73.5 94.5 98.3 58.3 88.2 94.1
MTFN BUTD 71.9 94.2 97.9 57.3 88.6 95.0
MTFN BUTD, RR_no_STT 74.3 94.9 97.9 57.5 88.8 95.0
MTFN BUTD, RR_STT 74.3 94.9 97.9 60.1 89.1 95.0
Meta-SPN BFAN, equal 74.4 95.0 98.3 58.6 87.6 94.3
RDAN BUTD 74.6 96.2 98.7 61.6 89.2 94.7
CVSE BUTD 74.8 95.1 98.3 59.9 89.4 95.2
MMCA BUTD, Bert 74.8 95.6 97.7 61.6 89.8 95.2
BFAN BUTD, prob 73.0 94.8 -- 58.0 87.6 --
BFAN BUTD, equal 73.7 94.9 -- 58.3 87.5 --
BFAN* BUTD 74.9 95.2 -- 59.4 88.4 --
SMFEA BUTD 75.1 95.4 98.3 62.5 90.1 96.2
DP-RNN BUTD 75.3 95.8 98.6 62.5 89.7 95.1
CCRS* BUTD, SCAN 70.9 94.3 98.0 57.3 87.6 94.3
CCRS* BUTD, BFAN 75.4 95.3 98.5 60.3 88.6 94.6
WCGL BUTD 75.4 95.5 98.6 60.8 89.3 95.3
CAAN BUTD 75.5 95.4 98.5 61.3 89.7 95.2
VSRN BUTD 74.0 94.3 97.8 60.8 88.4 94.1
VSRN* BUTD 76.2 94.8 98.2 62.8 89.7 95.1
ADAPT BUTD, I2T 74.5 94.2 97.9 62.0 90.4 95.5
ADAPT BUTD, T2I 75.3 95.1 98.4 63.3 90.0 95.5
ADAPT* BUTD 76.5 95.6 98.9 62.2 90.5 96.0
PFAN BUTD, T2I 75.8 95.9 99.0 61.0 89.1 95.1
PFAN BUTD, I2T 70.7 94.1 97.8 53.0 84.5 92.6
PFAN* BUTD 76.5 96.3 99.0 61.6 89.6 95.2
SCG VggNet, Prod 73.4 94.8 97.6 56.3 85.6 93.5
SCG VggNet, Gated 76.6 96.3 99.2 61.4 88.9 95.1
IMRAM BUTD, Image 76.1 95.3 98.2 61.0 88.6 94.5
IMRAM BUTD, Text 74.0 95.6 98.4 60.6 88.9 94.6
IMRAM BUTD, Full 76.7 95.6 98.5 61.7 89.1 95.0
SHAN BUTD, T2I 75.9 96.1 98.7 60.7 88.2 94.2
SHAN BUTD, I2T 73.0 95.8 97.9 58.5 87.3 94.0
SHAN BUTD, Full 76.8 96.3 98.7 62.6 89.6 95.8
PFAN++* BUTD 77.1 96.5 98.3 62.5 89.9 95.4
ADDR* BUTD, SCAN 76.1 95.5 98.4 61.2 88.9 94.8
ADDR* BUTD, BFAN 76.4 95.8 98.3 62.3 89.4 96.2
ADDR* BUTD, VSRN 77.4 96.1 98.9 63.5 90.7 96.7
CAMERA BUTD, Bert 75.9 95.5 98.6 62.3 90.1 95.2
CAMERA* BUTD, Bert 77.5 96.3 98.8 63.4 90.9 95.8
AOQ* BUTD, SCAN 74.1 95.2 98.5 59.8 88.6 95.0
AOQ* BUTD, BFAN 77.3 96.0 98.5 61.2 89.2 95.0
AOQ* BUTD, VSRN 77.5 95.5 98.6 63.5 90.5 95.8
TERAN BUTD, Bert 77.7 95.9 98.6 65.0 91.2 96.4
HOAD^ BUTD 77.0 96.1 98.7 65.1 93.1 97.9
HOAD^ BUTD, +Dist 77.8 96.1 98.7 66.2 93.0 97.9
TOD-Net VSE++ 68.6 92.0 96.9 54.5 85.3 92.4
TOD-Net Bert 75.8 95.3 98.4 61.8 89.6 95.0
TOD-Net* Bert 78.1 96.0 98.6 63.6 90.6 95.8
SSAMT BUTD, Bert 78.2 95.6 98.0 62.7 89.6 95.3
HAL SCAN_I2T 78.3 96.3 98.5 60.1 86.7 92.8
DSRAN BUTD, GRU 76.3 94.9 98.4 62.4 89.7 95.2
DSRAN BUTD, Bert 77.1 95.3 98.1 62.9 89.9 95.3
DSRAN* BUTD, GRU 78.0 95.6 98.5 64.2 90.4 95.8
DSRAN* BUTD, Bert 78.3 95.7 98.4 64.5 90.8 95.8
GSMN BUTD, sparse 76.1 95.6 98.3 60.4 88.7 95.0
GSMN BUTD, dense 74.7 95.3 98.2 60.3 88.5 94.6
GSMN* BUTD 78.4 96.4 98.6 63.3 90.1 95.7
HAN BUTD 78.7 96.4 98.8 65.4 90.5 95.3
DIME BUTD, I2T, Bert 77.9 95.9 98.3 63.0 90.5 96.2
DIME BUTD, T2I, Bert 77.2 95.5 98.5 62.3 90.2 95.8
DIME* BUTD, Bert 78.8 96.3 98.7 64.8 91.5 96.5
CSCC^ BUTD, +GloVe 78.8 96.1 99.0 66.6 92.5 96.4
CAEMCL BUTD 77.6 96.4 98.8 62.2 89.8 95.8
CAEMCL* BUTD 78.9 97.5 98.8 65.7 90.2 96.6
SGRAF BUTD, SAF 76.1 95.4 98.3 61.8 89.4 95.3
SGRAF BUTD, SGR 78.0 95.8 98.2 61.4 89.3 95.4
SGRAF* BUTD 79.6 96.2 98.5 63.2 90.7 96.1
T-EMDE BUTD, SAF 78.3 95.7 98.5 62.3 89.7 95.2
T-EMDE BUTD, SGR 77.1 95.9 98.5 61.6 89.5 95.1
T-EMDE* BUTD, SGRAF 79.6 96.3 98.7 63.5 90.4 95.6
SAM VSRN 74.6 93.6 97.5 61.5 89.6 94.9
SAM^ CVSE^ 79.8 95.1 97.7 67.0 93.0 97.3
SAM SGR 80.7 97.2 98.6 63.8 90.5 95.9
ACMM BUTD 81.9 98.0 99.3 58.2 87.3 93.9
ACMM* BUTD 84.1 97.8 99.4 60.7 88.7 94.9
PG* BUTD, +GloVe 84.0 95.8 97.8 63.9 88.9 95.6
SAN^ VggNet 74.9 94.9 98.2 60.8 90.3 95.7
SAN^ ResNet 85.4 97.5 99.0 69.1 93.4 97.2
GPO IN, BiGRU 76.5 95.3 98.5 62.9 90.6 95.8
GPO* IN+VG, BiGRU 80.0 97.0 99.0 64.8 91.6 96.5
GPO* IN+VG, Bert 82.2 97.5 99.5 68.1 92.9 97.2
GPO* WSL, Bert 85.6 98.0 99.4 73.1 94.3 97.7
Columns: Method_name, Concise_note, Sentence retrieval (R@1, R@5, R@10), Image retrieval (R@1, R@5, R@10)
DVSA RCNN 16.5 39.2 52.0 10.7 29.6 42.2
FV GMM+HGLMM 17.3 39.0 50.2 10.8 28.3 40.1
Order-emb VggNet 23.3 -- 65.0 18.0 -- 57.6
CSE ResNet 27.9 57.1 70.4 22.2 50.2 64.4
CMPL MobileNet 24.6 52.3 66.4 19.1 44.6 58.4
CMPM ResNet 31.1 60.7 73.9 22.9 50.2 63.8
TERN BUTD, Bert 38.4 69.5 81.3 28.7 59.7 72.7
Dual-Path VggNet 35.5 63.2 75.6 21.0 47.5 60.9
Dual-Path ResNet 41.2 70.5 81.1 25.3 53.4 66.4
VSE++ VggNet 32.9 61.7 74.7 24.1 52.8 66.2
VSE++ ResNet 41.3 71.1 81.2 30.3 59.4 72.4
GXN ResNet 42.0 -- 84.7 31.7 -- 74.6
SCO VggNet 40.2 70.1 81.3 31.3 61.5 73.9
SCO ResNet 42.8 72.3 83.0 33.1 62.9 75.5
CVSE++ ResNet 43.2 73.5 84.1 32.4 62.2 74.6
DXR ResNet, Bert 44.9 75.2 84.7 33.9 64.9 77.4
PVSE ResNet 45.2 74.3 84.5 32.4 63.0 75.0
R-SCAN BUTD, VrR-VG 45.4 77.9 87.9 36.2 65.5 76.7
SAVE ResNet 46.7 76.3 86.1 34.0 64.8 77.0
MPL SCAN_I2T 46.9 77.7 87.6 34.4 64.2 75.9
GVSE* BUTD 47.2 76.6 88.4 31.2 61.2 70.5
CASC ResNet 47.2 78.3 87.4 34.7 64.8 76.8
OAN BUTD 47.8 81.2 90.4 37.0 66.6 78.0
MTFN BUTD 44.7 76.4 87.3 33.1 64.7 76.1
MTFN BUTD, RR 48.3 77.6 87.3 35.9 66.1 76.1
M3A ResNet 48.9 75.2 84.4 38.3 65.7 76.9
A3VSE BUTD 49.3 81.1 90.2 39.0 68.0 80.1
GVSE* BUTD 49.9 77.4 87.6 38.4 68.5 79.7
SGM BUTD 50.0 79.3 87.9 35.3 64.9 76.5
CAMP BUTD 50.1 82.1 89.7 39.0 68.9 80.2
SCAN BUTD, I2T_LSE 46.4 77.4 87.2 34.4 63.7 75.7
SCAN* BUTD, AVE+LSE 50.4 82.2 90.0 38.6 69.3 80.4
GOT SCAN_I2T 50.5 80.2 89.8 38.1 66.8 78.5
PFAN* BUTD 50.8 83.9 89.1 39.5 69.5 80.8
Meta-SPN BFAN, equal 51.0 81.1 89.4 37.5 66.7 77.5
PFAN++* BUTD 51.2 84.3 89.2 41.4 70.9 79.0
HOAD BUTD 51.2 81.7 89.1 39.4 72.5 84.1
HOAD BUTD, +Dist 51.4 81.8 89.1 40.5 73.5 84.1
CAAN BUTD 52.5 83.3 90.9 41.2 70.3 82.9
VSRN* BUTD 53.0 81.1 89.4 40.5 70.6 81.1
CCRS* BUTD, SCAN 47.9 78.1 88.2 36.9 66.9 78.4
CCRS* BUTD, BFAN 53.1 81.8 90.2 38.3 67.8 78.6
IMRAM BUTD, Image 53.2 82.5 90.4 38.9 68.5 79.2
IMRAM BUTD, Text 52.0 81.8 90.1 38.6 68.1 79.1
IMRAM BUTD, Full 53.7 83.2 91.0 39.7 69.1 79.8
MMCA BUTD, Bert 54.0 82.5 90.7 38.7 69.7 80.8
SMFEA BUTD 54.2 -- 89.9 41.9 -- 83.7
CAMERA BUTD, Bert 53.1 81.3 89.8 39.0 70.5 81.5
CAMERA* BUTD, Bert 55.1 82.9 91.2 40.5 71.7 82.5
DSRAN BUTD, GRU 51.9 81.6 89.8 39.5 70.6 81.0
DSRAN BUTD, Bert 53.7 82.1 89.9 40.3 70.9 81.3
DSRAN* BUTD, GRU 54.4 83.5 91.3 41.5 71.9 82.1
DSRAN* BUTD, Bert 55.3 83.5 90.9 41.7 72.7 82.8
CSCC BUTD, +GloVe 55.6 83.6 91.2 40.8 73.2 84.3
TERAN BUTD, Bert 55.6 83.9 91.6 42.6 72.5 82.9
SAM VSRN 49.1 79.0 87.4 37.5 68.1 79.5
SAM SGR 55.7 83.2 91.2 40.5 69.7 80.5
SAM^ CVSE^ 56.4 82.4 90.1 42.3 73.9 84.5
SCG VggNet, Prod 49.9 78.9 88.1 33.2 62.4 74.7
SCG VggNet, Gated 56.6 84.5 92.0 39.2 68.0 81.3
AOQ* BUTD, SCAN 51.2 82.5 90.1 39.4 69.7 80.4
AOQ* BUTD, VSRN 55.1 83.3 90.8 41.1 71.5 82.0
AOQ* BUTD, BFAN 57.3 84.5 91.7 40.1 69.2 80.1
ADDR* BUTD, BFAN 54.3 84.0 91.5 40.1 69.2 80.6
ADDR* BUTD, VSRN 56.6 85.3 90.4 42.5 71.9 82.0
ADDR* BUTD, SCAN 57.3 86.0 92.7 41.8 72.0 81.3
SSAMT BUTD, Bert 57.7 84.2 90.8 40.8 70.5 80.5
SGRAF BUTD, SAF 53.3 82.3 90.1 39.8 69.0 80.2
SGRAF BUTD, SGR 56.9 83.2 90.5 40.2 69.0 79.8
SGRAF* BUTD 57.8 84.9 91.6 41.9 70.7 81.3
T-EMDE BUTD, SAF 56.7 -- 90.7 40.3 -- 80.4
T-EMDE BUTD, SGR 57.0 -- 91.0 40.0 -- 80.1
T-EMDE* BUTD, SGRAF 59.1 -- 91.8 41.8 -- 81.7
DIME BUTD, I2T, Bert 56.1 83.2 91.1 40.2 70.7 81.4
DIME BUTD, T2I, Bert 55.3 82.4 90.2 39.7 70.3 81.0
DIME* BUTD, Bert 59.3 85.4 91.9 43.1 73.0 83.1
SAN^ ResNet 65.4 89.4 94.8 46.2 77.4 86.6
ACMM BUTD 63.5 88.0 93.6 36.7 65.1 76.7
ACMM* BUTD 66.9 89.6 94.9 39.5 69.6 81.1
GPO IN, BiGRU 55.1 81.9 89.9 40.9 70.6 81.5
GPO* IN+VG, BiGRU 59.8 86.1 92.8 42.7 72.8 83.3
GPO* IN+VG, Bert 62.5 87.8 94.0 46.0 75.8 85.7
GPO* WSL, Bert 68.1 90.2 95.2 52.7 80.2 88.3
PG* BUTD, +GloVe 68.7 88.7 93.0 46.2 77.8 85.5
Performance of Identity-aware Datasets
Columns: Method_name, Concise_note, Text-to-Image (R@1, R@5, R@10)
DSSL ResNet 32.43 55.08 63.19
Performance of CUHK-PEDES
Columns: Method_name, Concise_note, Text-to-Image (R@1, R@5, R@10)
LSTM-Q+I VggNet 17.19 -- 57.82
GNA-RNN VggNet 19.05 -- 53.64
IATV VggNet 25.94 -- 60.48
PWM-ATH VggNet 27.14 49.45 61.02
GLA ResNet 43.58 66.93 76.26
Dual-Path VggNet 32.15 54.42 64.30
Dual-Path ResNet 44.40 66.26 75.07
CMPM MobileNet 44.02 -- 77.00
CMPL MobileNet 49.37 -- 79.27
MCCL MobileNet, CL 48.21 -- 78.27
MCCL MobileNet 50.58 -- 79.06
MIA VggNet 48.00 70.70 79.30
MIA ResNet 53.10 75.00 82.90
A-GANet ResNet 53.14 74.03 82.95
PMA VggNet 47.02 68.54 78.06
PMA ResNet 53.81 73.54 81.23
TIMAM ResNet, Bert 54.51 77.56 84.78
CMAAM MobileNet 55.13 76.14 83.77
ITMeetsAL MobileNet 51.85 73.36 81.27
ITMeetsAL ResNet 55.72 76.15 84.26
ViTAA ResNet 55.97 75.84 83.52
FTD ResNet 57.84 78.33 85.43
MGEL VggNet 52.68 74.37 83.11
MGEL MobileNet 59.21 79.16 85.88
MGEL ResNet 60.27 80.01 86.74
SSAN VggNet 55.52 76.17 83.45
SSAN ResNet 61.37 80.15 86.73
NAFS ResNet, Bert 59.94 79.86 86.70
NAFS +RVN 61.50 81.19 87.51
DSSL ResNet 59.98 80.41 87.56
DSSL +RR 62.33 82.11 88.01
LapsCore CMPL 53.33 -- 83.20
LapsCore NAFS 63.40 -- 87.80
Performance of ICFG-PEDES
Columns: Method_name, Concise_note, Text-to-Image (R@1, R@5, R@10)
Dual-Path ResNet 38.99 59.44 68.41
CMPL ResNet 43.51 65.44 74.26
MIA ResNet 46.49 67.14 75.18
SCAN ResNet 50.05 69.65 77.21
ViTAA ResNet 50.98 68.79 75.78
SSAN ResNet 54.23 72.63 79.53
Performance of CUB-Flowers
Columns: Method_name, Concise_note, CUB Image-to-Text (R@1), CUB Text-to-Image (AP@50), Flowers Image-to-Text (R@1), Flowers Text-to-Image (AP@50)
FV GMM+HGLMM 36.5 35.6 54.8 52.8
Word2Vec 38.6 33.5 54.2 52.1
Word-NN CNN 51.0 43.3 60.7 56.3
Word-NN CNN-RNN 56.8 48.7 65.6 59.6
IATV Triplet 52.5 52.4 64.3 64.9
IATV VggNet 61.5 57.6 68.4 70.1
CMPM MobileNet 62.1 64.6 66.1 67.7
CMPL MobileNet 64.3 67.9 68.9 69.7
TIMAM ResNet, Bert 67.7 70.3 70.6 73.7
LapsCore CMPL 68.0 66.0 75.2 71.4
LapsCore CMP_adv 72.3 69.5 77.9 73.3
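
For readers who want to reproduce the R@K columns reported in the tables above, the following is a minimal sketch of how Recall@K is commonly computed from a query-by-candidate similarity matrix. It is an illustration under stated assumptions, not the evaluation script of any listed method: the function name `recall_at_k`, the nested-list input format, and the toy scores are all hypothetical, and individual papers may differ in details such as test splits or fold averaging.

```python
from typing import List, Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    ground_truth: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Return the percentage of queries with a correct match in the top k.

    similarity[i][j] is the score between query i and candidate j
    (higher means more similar). ground_truth[i] lists the candidate
    indices that count as correct for query i. Both formats are
    assumptions made for this sketch.
    """
    hits = 0
    for query_idx, scores in enumerate(similarity):
        # Rank candidate indices from most to least similar.
        ranked: List[int] = sorted(
            range(len(scores)), key=lambda j: scores[j], reverse=True
        )
        # A query counts as a hit if any correct candidate is in the top k.
        if set(ranked[:k]) & set(ground_truth[query_idx]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (hypothetical): 3 queries, 3 candidates, one correct match each.
toy_similarity = [
    [0.9, 0.1, 0.3],  # correct candidate 0 is ranked first
    [0.2, 0.4, 0.8],  # correct candidate 1 is ranked second
    [0.5, 0.6, 0.1],  # correct candidate 2 is ranked third
]
toy_ground_truth = [[0], [1], [2]]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy_similarity, toy_ground_truth, k):.1f}")
```

In sentence retrieval the queries are images and the candidates are captions (so each query typically has several correct candidates), while in image retrieval the roles are swapped; the same function covers both cases by changing what is passed as `ground_truth`.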