Language detection method in REST API & CLI #631

Closed
@osma

Description

Currently, language detection is used only for the language filtering transformation, but there have been some requests to make this functionality available to users of the REST API as well, e.g. for selecting an appropriate model.

We could add a language detection method to the REST API, something like:

POST /detect-language

with the parameters text (the input text whose language to detect) and candidates (a list/array of language codes to consider, e.g. ["fi", "sv", "en"]). I'm not sure whether it's better to wrap the parameters as form data or as a JSON object. The return format would be a JSON object containing an array of results, something like this:

{
  "results": [
    {"language": "fi", "score": 0.85},
    {"language": "sv", "score": 0.3},
    {"language": "en", "score": 0.3},
    {"language": null, "score": 0.1}
  ]
}

Here, the scores are arbitrary values between 0.0 and 1.0, and a language of null stands for an unknown language. The scores wouldn't necessarily add up to 1.
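
To make the proposed shapes concrete, here is a rough client-side sketch in Python. It assumes the JSON-object variant of the request (form data is the other option left open above), and the helper names build_request and best_language are hypothetical, made up for illustration only. Nothing here reflects an existing API.

```python
import json
from typing import List, Optional


def build_request(text: str, candidates: List[str]) -> str:
    """Serialize the two proposed parameters as a JSON request body.

    Assumption: the endpoint accepts a JSON object; the issue leaves
    open whether form data would be used instead.
    """
    return json.dumps({"text": text, "candidates": candidates})


def best_language(response_body: str) -> Optional[str]:
    """Return the language code with the highest score.

    Returns None when the top result is the unknown language (null)
    or when the results array is empty.
    """
    results = json.loads(response_body)["results"]
    if not results:
        return None
    top = max(results, key=lambda r: r["score"])
    return top["language"]


# The example response from the description above.
example_response = """
{
  "results": [
    {"language": "fi", "score": 0.85},
    {"language": "sv", "score": 0.3},
    {"language": "en", "score": 0.3},
    {"language": null, "score": 0.1}
  ]
}
"""

print(build_request("Hei maailma", ["fi", "sv", "en"]))
print(best_language(example_response))  # prints: fi
```

Since the scores don't have to add up to 1, a client like this should only compare them with each other and not treat them as probabilities.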

There could also be a similar CLI command for symmetry.
