Hello,
I need to match entities that have mixed data (text, numerical features, and images), with several images per entity.
Is it possible to plug in a custom architecture for building the entity representation, so that I can use a multi-input multimodal model?
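To make the request concrete, here is a rough, library-agnostic sketch of the kind of multi-input representation meant here. It assumes each modality has already been turned into a numeric vector by some upstream encoder, and it uses mean pooling plus concatenation purely as a stand-in for a real fusion network. Every name in it (`mean_pool`, `encode_entity`, `cosine`) is hypothetical and not part of any existing API.

```python
from statistics import fmean
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a variable number of equal-length vectors into one vector."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [fmean(v[i] for v in vectors) for i in range(dim)]


def encode_entity(
    text_vec: Sequence[float],
    numeric_feats: Sequence[float],
    image_vecs: Sequence[Sequence[float]],
) -> Vector:
    """Hypothetical fusion step: pool the images, then concatenate all modalities."""
    return list(text_vec) + list(numeric_feats) + mean_pool(image_vecs)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two entity representations."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two entities with a different number of images each (toy vectors).
entity_a = encode_entity([1.0, 0.0], [0.5], [[0.0, 2.0], [2.0, 0.0]])
entity_b = encode_entity([1.0, 0.0], [0.5], [[1.0, 1.0]])
similarity = cosine(entity_a, entity_b)
```

The point of the sketch is only that the number of images varies per entity while the final representation has a fixed size, which is what a custom representation hook would need to support.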
Thank you