Lightweight CLD2 bindings for WebAssembly

aspriddell/cld2-wasm

cld2-wasm

A build of the CLD2 language detection library targeting WebAssembly, for use in browsers.

Setup

  1. Download cld2.js and cld2.wasm into your project, placing cld2.wasm so it can be loaded from http://<your-server-location>/cld2.wasm
  2. Import the module factory: import cld2Module from './cld2.js'
  3. Load the WASM module: const cld2 = await cld2Module()

cld2.getVersion()

Returns the version of the CLD2 source the module was built from (should be V2.0 - 20141016).
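The setup steps and the version check can be combined into a single loading helper. The following is a minimal sketch rather than part of cld2-wasm: `loadCld2` and `EXPECTED_VERSION` are hypothetical names, and the strict comparison assumes getVersion() returns the version text exactly as quoted above.

```javascript
// Version string quoted in this README; assumed to match getVersion() exactly.
const EXPECTED_VERSION = "V2.0 - 20141016";

// Hypothetical helper (not part of cld2-wasm).
// `cld2Module` is the default export of cld2.js, i.e. the async module factory.
async function loadCld2(cld2Module) {
  const cld2 = await cld2Module();
  const version = cld2.getVersion();
  if (version !== EXPECTED_VERSION) {
    // Only warn: a build from a different source version may still work.
    console.warn(`Unexpected CLD2 build version: ${version}`);
  }
  return cld2;
}

// Usage inside an ES module, assuming cld2.js sits next to the calling script:
//   import cld2Module from './cld2.js';
//   const cld2 = await loadCld2(cld2Module);
```

Passing the factory in as an argument keeps the helper independent of where cld2.js is hosted.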

cld2.detectLanguage(text)

Attempts to detect the most probable language for the given text, returning the language id (langId), the language code (langCode), and whether the result is reliable (isReliable).

const {langId, langCode, isReliable} = cld2.detectLanguage("The quick brown fox jumps over the lazy dog");
console.log(`Detected language ${langCode} (${isReliable ? "reliable result" : "not confident"})`);
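Callers may prefer to discard results that are flagged as unreliable. A minimal sketch, assuming the return shape shown above; `detectOrNull` is a hypothetical helper and not part of the library:

```javascript
// Hypothetical helper (not part of cld2-wasm): returns the detected language
// code, or null when the result is reported as unreliable.
// `cld2` is the loaded module obtained from `await cld2Module()`.
function detectOrNull(cld2, text) {
  const { langCode, isReliable } = cld2.detectLanguage(text);
  return isReliable ? langCode : null;
}

// Usage:
//   const code = detectOrNull(cld2, "The quick brown fox jumps over the lazy dog");
//   if (code === null) { /* fall back to a default language */ }
```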

cld2.UNKNOWN_LANGUAGE_ID

(Constant) The id that represents an unknown language.

cld2.getLanguageName(langId)

Given a language id (integer), returns the formatted language name.

const unknownLangName = cld2.getLanguageName(cld2.UNKNOWN_LANGUAGE_ID);

// unknownLangName = "Unknown"
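detectLanguage and getLanguageName can be chained to get a display name directly from text. The sketch below assumes langId from detectLanguage is the same kind of id that getLanguageName accepts; `detectLanguageName` is a hypothetical helper, not part of cld2-wasm.

```javascript
// Hypothetical helper (not part of cld2-wasm): detects the language of `text`
// and returns its formatted name. Unreliable results are mapped to the
// unknown language id, so the caller gets the "unknown" name instead of a guess.
function detectLanguageName(cld2, text) {
  const { langId, isReliable } = cld2.detectLanguage(text);
  const id = isReliable ? langId : cld2.UNKNOWN_LANGUAGE_ID;
  return cld2.getLanguageName(id);
}

// Usage:
//   const name = detectLanguageName(cld2, "The quick brown fox jumps over the lazy dog");
```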

Building

  1. Shallow clone the original repo: git clone --depth=1 https://github.com/CLD2Owners/cld2.git
  2. cd into cld2
  3. Add cld2_emscripten.cc to the internal folder
  4. Using emcc, build to WASM:
docker run --rm -v "$(pwd)":/src -u "$(id -u):$(id -g)" -w /src/internal emscripten/emsdk \
  emcc -O3 -w -Wno-narrowing -lembind -o ../cld2.js \
  -s EXPORT_ES6=1 -s MODULARIZE=1 -s EXPORT_NAME="'cld2'" \
  cldutil.cc cldutil_shared.cc compact_lang_det.cc compact_lang_det_hint_code.cc \
  compact_lang_det_impl.cc debug.cc fixunicodevalue.cc \
  generated_entities.cc generated_language.cc generated_ulscript.cc  \
  getonescriptspan.cc lang_script.cc offsetmap.cc  scoreonescriptspan.cc \
  tote.cc utf8statetable.cc  \
  cld_generated_cjk_uni_prop_80.cc cld2_generated_cjk_compatible.cc  \
  cld_generated_cjk_delta_bi_4.cc generated_distinct_bi_0.cc  \
  cld2_generated_quadchrome_2.cc cld2_generated_deltaoctachrome.cc \
  cld2_generated_distinctoctachrome.cc  cld_generated_score_quad_octa_2.cc cld2_emscripten.cc
  5. Collect cld2.js and cld2.wasm from the cld2 folder

License

Licensed under MIT. See license.md for more info.
