Description
Currently, language detection is used only by the language filtering transformation, but there have been requests to make it available to REST API users as well, e.g. for selecting an appropriate model.
We could add a language detection method to the REST API, something like
POST /detect-language
with the parameters text (the input text whose language to detect) and candidates (a list/array of language codes to consider, e.g. ["fi", "sv", "en"]). I'm not sure whether it's better to wrap the parameters in form data or in a JSON object. The return format would be a JSON object containing an array of results, something like this:
{
  "results": [
    {"language": "fi", "score": 0.85},
    {"language": "sv", "score": 0.3},
    {"language": "en", "score": 0.3},
    {"language": null, "score": 0.1}
  ]
}
Here, the scores are arbitrary values between 0.0 and 1.0, and the language null stands for an unknown language. The scores wouldn't necessarily add up to 1.
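To make the proposed response shape concrete, here is a minimal sketch of how detector output could be turned into it. The function name build_detection_response and the raw_scores input are hypothetical; they stand in for whatever the real detector returns. The sketch also assumes that results are limited to the requested candidates plus null and sorted by descending score, which the proposal above does not state explicitly.

```python
import json
from typing import Optional


def build_detection_response(
    raw_scores: dict[Optional[str], float],
    candidates: list[str],
) -> dict:
    """Shape detector scores into the proposed response format.

    raw_scores (hypothetical input) maps a language code, or None for
    "unknown language", to a score between 0.0 and 1.0.
    """
    # None is always allowed so that "unknown" can be reported.
    allowed = set(candidates) | {None}
    results = [
        {"language": lang, "score": score}
        for lang, score in raw_scores.items()
        if lang in allowed
    ]
    # Assumption: highest score first. Scores are not normalized,
    # so they need not add up to 1.
    results.sort(key=lambda r: r["score"], reverse=True)
    return {"results": results}


if __name__ == "__main__":
    response = build_detection_response(
        {"fi": 0.85, "sv": 0.3, "en": 0.3, "de": 0.2, None: 0.1},
        candidates=["fi", "sv", "en"],
    )
    # json.dumps serializes Python's None as JSON null.
    print(json.dumps(response, indent=2))
```

Using None as the dictionary key for the unknown language keeps the mapping to JSON null direct, without needing a sentinel string that could collide with a real language code.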
There could also be a similar CLI command for symmetry.
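As a rough illustration of what such a command could accept, here is a sketch of an argument parser that mirrors the two REST parameters. Everything in it is an assumption made for illustration: the command name, the positional text argument, and the comma-separated --candidates option are hypothetical, and argparse from the standard library is used only to keep the example self-contained, not because the project's CLI is built on it.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical detect-language command."""
    parser = argparse.ArgumentParser(
        prog="detect-language",
        description="Detect the language of a text (illustrative sketch).",
    )
    # Mirrors the REST parameter "text".
    parser.add_argument("text", help="input text whose language to detect")
    # Mirrors the REST parameter "candidates"; given as e.g. fi,sv,en.
    parser.add_argument(
        "--candidates",
        required=True,
        type=lambda value: value.split(","),
        help="comma-separated language codes to consider",
    )
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args()
    print(args.text, args.candidates)
```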