
Is the Mistral model support inconsistent or incomplete compared with the official implementation? #29533

Closed
@MrYxJ

Description

Feature request

Hello, thank you very much for creating the transformers library, which provides a very convenient way to use different models.

Recently, when I used the Mistral model with the transformers library (4.38.2) to run inference, the execution did not seem fully consistent with the official Mistral paper and the code they provide.

Specifically, first, Mistral has a pre-fill and chunking mechanism for encoding the prompt input (as shown in the following picture). Is this method included in the transformers library's implementation of Mistral's generate function?

[Image]
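
To make the question concrete, here is a minimal sketch of the chunking schedule as I understand it from the paper: the prompt is split into pieces of at most the window size, and each piece attends to itself plus the cached positions left by the previous window. The function name `chunked_prefill_plan` and its arguments are hypothetical, it only tracks token positions rather than tensors, and it is not taken from transformers or from the official Mistral code.

```python
def chunked_prefill_plan(prompt_len, window):
    """Illustrative only: list, per chunk, the positions processed
    and the cached positions that chunk can still see."""
    plan = []
    for start in range(0, prompt_len, window):
        end = min(start + window, prompt_len)
        # Assumption: the cache holds at most `window` earlier positions.
        cache_start = max(0, start - window)
        plan.append({
            "chunk": list(range(start, end)),
            "cache": list(range(cache_start, start)),
        })
    return plan


if __name__ == "__main__":
    for step in chunked_prefill_plan(prompt_len=10, window=4):
        print(step)
```

With a prompt of 10 tokens and a window of 4, this gives three chunks, and only the first one starts with an empty cache.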

Second, Mistral uses a Rolling Buffer Cache mechanism so that the keys and values in the cache are updated as the sliding window moves (as shown in the following picture).

[Image]
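
My understanding of the rolling buffer is that the cache has a fixed size equal to the window W, and the entry for position i is written to slot i mod W, so older entries are overwritten once i exceeds W. Below is a minimal sketch of that idea. The class `RollingBufferCache` and its methods are hypothetical names for illustration, and plain Python values stand in for the key/value tensors.

```python
class RollingBufferCache:
    """Illustrative fixed-size cache: position i is stored in slot i % window."""

    def __init__(self, window):
        self.window = window
        self.slots = [None] * window
        self.seen = 0  # number of entries written so far

    def update(self, kv):
        # Overwrite the oldest slot once the buffer is full.
        self.slots[self.seen % self.window] = kv
        self.seen += 1

    def ordered(self):
        """Return the cached entries from oldest to newest."""
        if self.seen <= self.window:
            return self.slots[:self.seen]
        pivot = self.seen % self.window
        return self.slots[pivot:] + self.slots[:pivot]


if __name__ == "__main__":
    cache = RollingBufferCache(window=3)
    for t in range(5):
        cache.update(f"kv{t}")
    print(cache.slots)      # physical layout after wrap-around
    print(cache.ordered())  # logical order, oldest first
```

The memory used never grows beyond the window size, no matter how long the sequence gets.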

When I read this part of the transformers source code, I found that the cache is implemented using the DynamicCache update method, which simply concatenates all the past keys and values with the current keys and values: https://github.com/huggingface/transformers/blob/b338a6c3b8eda29610d4d472cad8cd87cbfdaaed/src/transformers/cache_utils.py#L126C1-L132C105
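
As a toy illustration of the behavior I am describing (plain Python lists standing in for tensors, not the actual DynamicCache code), a concatenation-based update keeps every past entry, so the cache length grows with the sequence. The helper names `concat_update` and `windowed_view` are hypothetical.

```python
def concat_update(cache, new_kv):
    """Illustrative concatenation update: keep everything, append the new entry."""
    return cache + [new_kv]


def windowed_view(cache, window):
    """The entries a sliding window of size `window` would still need."""
    return cache[-window:]


if __name__ == "__main__":
    cache = []
    for t in range(5):
        cache = concat_update(cache, f"kv{t}")
    print(len(cache))               # grows with the number of tokens
    print(windowed_view(cache, 3))  # only the most recent entries matter
```

In this sketch the attention window only needs the last few entries, yet the concatenated cache still stores all of them, which is the difference in memory use I am asking about.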

Am I misunderstanding the logic of this code, or is the implementation not fully consistent with the Mistral paper? In addition, I read the official Mistral source code and found that they release two versions of the model inference code, one of which is a simple version. Is that the one your current implementation follows?

Motivation

I want to know how to use the transformers library with the Mistral model in the same way as described in the Mistral paper.
Did I read the code wrong, or do I need to make some additional changes on top of transformers?

Your contribution

I can try to fix this part of the code so that transformers matches the implementation described in the Mistral paper.
