Tables of many kinds are widely used to store and present information. To process large numbers of tables automatically and extract useful insights, researchers have proposed a series of deep-learning models for various table-based tasks, e.g., table question answering (TQA), table-to-text (T2T), text-to-SQL (NL2SQL) and table fact verification (TFV). Recently, Large Language Models (LLMs) and the more powerful Multimodal Large Language Models (MLLMs) have opened up new possibilities for processing tabular data: a single general model can process diverse tables and fulfill different tabular tasks based on the user's natural language instructions. We refer to these LLMs specialized for tabular tasks as Tabular LLMs. In this repository, we collect a list of papers on recent Tabular (M)LLMs and divide them into the following categories based on their key ideas.
Table of Contents:
- Survey of Tabular LLMs and table understanding
- Prompting LLMs for different tabular tasks, e.g., in-context learning, prompt engineering and integrating external tools.
- Training LLMs for better table understanding ability, e.g., training existing LLMs by instruction fine-tuning or post-pretraining.
- Developing agents for processing tabular data, e.g., developing a copilot for processing Excel tables.
- Empirical studies or benchmarks for evaluating LLMs' table understanding ability, e.g., exploring the influence of various table types or table formats.
- Multimodal table understanding, e.g., training MLLMs to understand diverse table images and textual user requests.
- Table Understanding datasets, e.g., valuable datasets for model training and evaluation.
Task Names and Abbreviations:
Task Names | Abbreviations | Task Descriptions |
---|---|---|
Table Question Answering | TQA | Answering questions based on the table(s), e.g., answer look-up or computation questions about table(s). |
Table-to-Text | Table2Text or T2T | Generate a text based on the table(s), e.g., generate an analysis report given a financial statement. |
Text-to-Table | Text2Table | Generate structured tables based on input text, e.g., generate a statistical table based on the game summary. |
Table Fact Verification | TFV | Judging if a statement is true or false (or lacks enough evidence) based on the table(s). |
Text-to-SQL | NL2SQL | Generate a SQL statement to answer the user question based on the database schema. |
Tabular Mathematical Reasoning | TMR | Solving mathematical reasoning problems based on the table(s), e.g., solve math word problems related to a table. |
Table-and-Text Question Answering | TAT-QA | Answering questions based on both table(s) and their related texts, e.g., answer questions given Wikipedia tables and their surrounding texts. |
Table Interpretation | TI | Interpreting basic table content and structure information, e.g., column type annotation, entity linking, relation extraction, cell type classification, etc. |
Table Augmentation | TA | Augmenting existing tables with new data, e.g., schema augmentation, row population, etc. |
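
To make the task descriptions above more concrete, here is a minimal Python sketch of how a single instruction-following model might be prompted for several of these tasks. It is an illustration only and is not taken from any paper in this list: the instruction templates, the function names `table_to_markdown` and `build_prompt`, and the choice of Markdown as the table serialization format are all assumptions.

```python
from typing import Sequence

# Hypothetical instruction templates, keyed by the abbreviations in the table above.
TASK_INSTRUCTIONS = {
    "TQA": "Answer the question based on the table.",
    "TFV": "Decide whether the statement is true or false based on the table.",
    "T2T": "Write a short description of the table.",
}


def table_to_markdown(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Serialize a flat table as a Markdown pipe table (one possible text format)."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row length does not match header length")
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


def build_prompt(task: str, header: Sequence[str],
                 rows: Sequence[Sequence[object]], query: str = "") -> str:
    """Join the task instruction, the serialized table, and an optional query."""
    parts = [TASK_INSTRUCTIONS[task], table_to_markdown(header, rows)]
    if query:
        parts.append(query)
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Example: a TQA prompt for a one-row table.
    print(build_prompt("TQA", ["City", "Population"], [["Oslo", 700000]],
                       "Question: Which city is listed?"))
```

The resulting string would then be sent to whichever LLM is being used. Other serializations (CSV, HTML, JSON) are equally plausible, and the influence of table format is one of the topics the benchmark papers below examine.
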
Title | Conference | Date | Pages |
---|---|---|---|
Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution | arxiv | 2024-08-20 | 49 |
Large Language Model for Table Processing: A Survey | arxiv | 2024-02-04 | 9 |
A Survey of Table Reasoning with Large Language Models | arxiv | 2024-02-13 | 9 |
Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey | arxiv | 2024-03-01 | 41 |
Transformers for Tabular Data Representation: A Survey of Models and Applications | TACL 2023 | | 23 |
Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks | IJCAI 2022 | 2022-01-24 | 15 |
Title | Conference | Date | Task | Code |
---|---|---|---|---|
HYTREL: Hypergraph-enhanced Tabular Data Representation Learning | NIPS 2023 | 2023-07-14 | TA, TI | Github |
FLAME: A small language model for spreadsheet formulas | AAAI 2024 | 2023-01-31 | Generating Excel Formulas | Github |
Title | Conference | Date | Task | Code |
---|---|---|---|---|
SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models | arxiv | 2024-03-06 | Manipulating Excels with LLM | Github |
EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records | arxiv | 2024-01-13 | TQA | Github |
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks | arxiv | 2024-01-10 | Data Analysis | Github |
DB-GPT: Empowering Database Interactions with Private Large Language Models | arxiv | 2023-12-29 | Data Analysis | Github |
ReAcTable: Enhancing ReAct for Table Question Answering | arxiv | 2023-10-01 | TQA | |
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models | NIPS 2023 | 2023-05-30 | Manipulating Excels with LLM | Github |
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT | arxiv | 2023-07-17 | Manipulating CSV table with LLM | |
Title | Conference | Date | Task | Code |
---|---|---|---|---|
PixT3: Pixel-based Table-To-Text Generation | ACL 2024 | 2023-11-16 | T2T | Github |
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy | arxiv | 2024-06-03 | TQA, TI | |
TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains | arxiv | 2024-04-30 | TQA, TFV | Github |
Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs | ACL 2024 | 2024-02-19 | TQA, TFV, T2T | |
Multimodal Table Understanding | ACL 2024 | 2024-02-15 | TQA, TFV, T2T, TI, TAT-QA, TMR | Github |
Title | Conference | Date | Task | Data Volume | Domain | Table Type | Data and Code |
---|---|---|---|---|---|---|---|
ENTRANT: A Large Financial Dataset for Table Understanding | Sci Data | 2024-07-04 | Cell Type Classification, Header Extraction, etc. | Millions of tables with cell attributes, as well as positional and hierarchical information | Financial | Flat tables and hierarchical tables | Github |
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering | arxiv | 2024-08-17 | TMR, TFV, Trend Forecasting and Chart Generation | 3681 tables and 20K samples | Tables collected from academic datasets such as WTQ and FeTaQA | Flat tables and a small number of hierarchical tables | Github |
DocTabQA: Answering Questions from Long Documents Using Tables | arxiv | 2024-08-21 | Table Generation based on question and document | 300 documents and 1.5k question-table pairs | Financial | Flat tables and hierarchical tables | Github |