💡 [REQUEST] - How to create multimodal embeddings for text, images and videos using this model #506
Closed
Description
起始日期 | Start Date
No response
实现PR | Implementation PR
No response
相关Issues | Reference Issues
No response
摘要 | Summary
I want to create embeddings for text, images and videos using the MiniCPM model, similar to what is possible with the LLaVA model. How can I create multimodal embeddings with this model?
基本示例 | Basic Example
There are a few multimodal embedding models, such as CLIP and LLaVA, that can be used to create embeddings for text and images as well as videos. This is what I have tried so far:
code ="""from transformers import AutoTokenizer, AutoModel, AutoImageProcessor
import torch
# Load the pre-trained model
model_path = "MLLM path "
model = AutoModel.from_pretrained(model_path, trust_remote_code = True, device_map = device)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code = True)
processor = AutoImageProcessor.from_pretrained(model_path, trust_remote_code = True)
# Preprocess the text
text = "This is a sample text"
inputs = processor(text, return_tensors="pt")
# Generate text embeddings
with torch.no_grad():
outputs = model(**inputs)
embeddings = outputs.last_hidden_state[:, 0, :]
# Use the embeddings
print(embeddings.shape) """
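
For comparison, a CLIP-style model already exposes this kind of interface in transformers. The sketch below is only illustrative: the checkpoint name openai/clip-vit-base-patch32 and the image URL are placeholders, not part of my actual setup. I would like an equivalent way to obtain text, image and (frame-wise) video embeddings from MiniCPM:

from transformers import CLIPModel, CLIPProcessor
from PIL import Image
import requests
import torch

# Any CLIP-style checkpoint should behave the same way here (placeholder checkpoint)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder image; a video could be handled by embedding sampled frames
image = Image.open(requests.get("https://example.com/sample.jpg", stream=True).raw)
text = "This is a sample text"

inputs = clip_processor(text=[text], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    # Projected embeddings in the shared text/image space
    text_emb = clip.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    image_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])

print(text_emb.shape, image_emb.shape)

With MiniCPM I am not sure which part of the model (for example, its vision encoder) should be called to obtain comparable image or video-frame embeddings, which is what I am asking about here.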
缺陷 | Drawbacks
I am trying to do the same with this model, but I run into an error when I do.
未解决问题 | Unresolved questions
No response