LENU is a Python library that helps you understand and work with Legal Entity Names in the context of the Legal Entity Identifier (LEI) Standard (ISO 17442) as well as the Entity Legal Form (ELF) Code List Standard (ISO 20275).
The library uses Machine Learning with Transformers and scikit-learn. It provides and uses pre-trained ELF Detection models published at https://huggingface.co/Sociovestix. The code, as well as the LEI data and the models, is distributed under the Creative Commons Zero v1.0 Universal license.
The project was started in November 2021 as a collaboration between the Global Legal Entity Identifier Foundation (GLEIF) and Sociovestix Labs, with the goal of exploring how Machine Learning can help detect the legal form (ELF Code) from a legal name.
It provides:
- an interface to download LEI and ELF Code data from GLEIF's public website
- an interface to train and make use of Machine Learning models to classify ELF Codes from given Legal Names
- an interface to use pre-trained ELF Detection models published on https://huggingface.co/Sociovestix
LENU requires:
- Python (>=3.8, <3.10)
- scikit-learn - Provides Machine Learning functionality for token based modelling
- transformers - Downloading and applying Neural Network Models
- pytorch - Machine Learning Framework to train Neural Network Models
- pandas - For reading and handling data
- Typer - Adds the command line interface
- requests and pydantic - For downloading LEI data from GLEIF's website
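Because only a narrow Python range is supported, it can help to check your interpreter before installing. The snippet below is a small sketch, not part of LENU; the helper name `is_supported_python` is hypothetical and simply encodes the ">=3.8, <3.10" requirement stated above.

```python
import sys

# Supported range taken from the requirements list: >=3.8 and <3.10
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 10)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version lies in [3.8, 3.10)."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "supported" if is_supported_python() else "NOT supported"
    print(f"Python {current} is {status} by LENU's stated requirements")
```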
Via PyPI:
pip install lenu
From GitHub:
pip install git+https://github.com/Sociovestix/lenu
Editable install from a locally cloned repository:
git clone https://github.com/Sociovestix/lenu
pip install -e lenu
Create folders for the LEI and ELF Code data and for storing your models:
mkdir data
mkdir models
Download LEI data and ELF Code data into your data folder:
lenu download
Train a (default) ELF Code Classification model. An ELF Classification model is always Jurisdiction-specific and is trained on Legal Names from that Jurisdiction.
Examples:
lenu train DE # Germany
lenu train US-DE # United States - Delaware
lenu train IT # Italy
# enable logging to see more information like the number of samples and accuracy
lenu --enable-logging train CH
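To illustrate what a token-based, Jurisdiction-specific classifier can look like, here is a minimal scikit-learn sketch. It is not LENU's actual default model: the toy training names, the `TRAINING_DATA` dictionary, the `train_jurisdiction_model` helper, and the choice of TF-IDF plus logistic regression are all assumptions for illustration. The ELF Codes are the ones shown in the example outputs of this README.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data for a single Jurisdiction (DE).
# Real training data would come from LEI records of that Jurisdiction.
TRAINING_DATA = {
    "DE": [
        ("Schmidt Logistik KG", "8Z6G"),
        ("Becker Immobilien KG", "8Z6G"),
        ("Wagner & Partner KG", "8Z6G"),
        ("Meyer Software GmbH", "2HBR"),
        ("Fischer Maschinenbau GmbH", "2HBR"),
        ("Weber Consulting GmbH", "2HBR"),
        ("Schulz und Koch GbR", "FR3V"),
        ("Hoffmann Praxis GbR", "FR3V"),
        ("Klein Architekten GbR", "FR3V"),
    ],
}


def train_jurisdiction_model(jurisdiction: str):
    """Fit one classifier per Jurisdiction on (legal name, ELF Code) pairs."""
    names, elf_codes = zip(*TRAINING_DATA[jurisdiction])
    model = make_pipeline(
        # Lowercases names and splits them into word tokens,
        # so legal-form suffixes like "KG" or "GmbH" become features
        TfidfVectorizer(),
        LogisticRegression(max_iter=1000),
    )
    model.fit(list(names), list(elf_codes))
    return model


model = train_jurisdiction_model("DE")
print(model.predict(["Hans Müller KG"]))
```

Keeping one model per Jurisdiction mirrors the README's point that legal forms, and the words that signal them, differ between Jurisdictions.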
Identify the ELF Code of a legal name using a model. The tool returns the best-scoring ELF Codes:
lenu elf DE "Hans Müller KG"
#    ELF Code  Entity Legal Form name Local name                     Score
# 0  8Z6G      Kommanditgesellschaft                              0.979568
# 1  V2YH      Stiftung des privaten Rechts                       0.001141
# 2  OL20      Einzelunternehmen, eingetragener Kaufmann, ein...  0.000714
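A ranked list like the one above can be produced from any classifier that outputs one score per candidate ELF Code (for example scikit-learn's `predict_proba`). The helper below is a hypothetical sketch, not LENU's implementation; `top_k_elf_codes` and the example scores are made up for illustration.

```python
import numpy as np
import pandas as pd


def top_k_elf_codes(scores, elf_codes, k=3):
    """Return the k highest-scoring ELF Codes as a DataFrame, best first."""
    scores = np.asarray(scores, dtype=float)
    # argsort is ascending, so reverse it and keep the first k indices
    best = np.argsort(scores)[::-1][:k]
    return pd.DataFrame(
        {"ELF Code": [elf_codes[i] for i in best], "Score": scores[best]}
    )


# Hypothetical classifier output for one legal name (values are illustrative)
codes = ["2HBR", "8Z6G", "FR3V", "V2YH"]
probs = [0.05, 0.90, 0.03, 0.02]
print(top_k_elf_codes(probs, codes, k=3))
```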
You can also use pre-trained models, which is recommended in most cases:
# Model available at https://huggingface.co/Sociovestix/lenu_DE
lenu elf Sociovestix/lenu_DE "Hans Müller KG"
#    ELF Code  Entity Legal Form name Local name         Score
# 0  8Z6G      Kommanditgesellschaft                  0.999445
# 1  2HBR      Gesellschaft mit beschränkter Haftung  0.000247
# 2  FR3V      Gesellschaft bürgerlichen Rechts       0.000071
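Transformer classification models typically emit one raw score (logit) per label, and a softmax is the standard way to turn those into probability-like scores that sum to 1, similar to the Score column above. Whether LENU applies exactly this step internally is an assumption here; the sketch below only illustrates a numerically stable softmax on made-up logits.

```python
import numpy as np


def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=float)
    # Subtracting the max avoids overflow in exp() and does not change the result
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()


# Hypothetical logits for three candidate ELF Codes (values are illustrative)
elf_codes = ["8Z6G", "2HBR", "FR3V"]
probs = softmax([4.0, 1.0, 0.5])
for code, p in sorted(zip(elf_codes, probs), key=lambda item: item[1], reverse=True):
    print(f"{code}  {p:.6f}")
```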
Feel free to reach out to either Sociovestix Labs or GLEIF if you need support in using this library, in utilizing LEI data in general, or in case you would like to contribute to this library in any form.