Stars
"Galahad". Goal: enable linguists to experiment with different taggers and use the result in other INT products
Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.
High-quality datasets, tools, and concepts for LLM fine-tuning.
Scalable data preprocessing and curation toolkit for LLMs
Retrieval Augmented Generation (RAG) chatbot powered by Weaviate
Evaluation of language models on mono- or multilingual tasks.
Training LLMs with QLoRA + FSDP
Using open source LLMs to build synthetic datasets for direct preference optimization
Implementation of Nougat Neural Optical Understanding for Academic Documents
Compute complexity metrics from Universal Dependencies
andreasvc / readability
Forked from mmautner/readability
Measure the readability of a given text using surface characteristics
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
GEITje 7B: a large open Dutch language model
Code for Multilingual Eval of Generative AI paper published at EMNLP 2023
Multilingual Large Language Models Evaluation Benchmark
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
benchmarks for evaluating MT models
MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki
Robust recipes to align language models with human and AI preferences
German Alpaca Dataset (Cleaned + Translated)
Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
A package for handy processing of semantic graphs such as AMR, with a special focus on standardized evaluation
Tools for creating TrueType fonts for written sign language in the SignWriting script based on the ISWA 2010
LLM-based autonomous agent that conducts comprehensive online research on any given topic