- 🌱 I’m currently an AI Resident at FPT Software AI Center (AIC) and a former AI Engineer at the Data & AI Lab (DAL), VNG Corporation.
- (Large) Multimodal Model Reasoning: Multimodal Large Language Models (MLLMs), Vision-Language Models (VLMs).
- (Large) Multimodal Understanding: Vision-Language Compositionality, Structured Representation.
- Efficient (Large) Multimodal Models: Parameter-Efficient Fine-Tuning (PEFT) for efficient training, Small Models for efficient inference, Token Merging for efficient input.

My research experience to date spans Intelligent Industrial Systems, Multimodal Learning, and Image/Video Understanding, including:

- [2023-Present] Efficient Cross-Modal Learning & Understanding: Video-Language Matching, Parameter-Efficient Fine-Tuning (PEFT), Multimodal Compositionality, Structured Representation (Scene Graph Generation).
- [2021-2023] Intelligent Industrial/Traffic Systems Applications: Tracked-Vehicle to Video Retrieval, Person/Vehicle Re-Identification, Person/Vehicle Tracking, Face Recognition/Verification.