This package is currently under development. See ASReview-statistics for the stable version compatible with ASReview LAB 0.19.x and earlier.
ASReview-datatools is an extension for the ASReview LAB software. It can be used to describe and clean your (input) data and to convert file formats via the command line.
The ASReview-datatools extension requires Python 3.6+ and ASReview LAB version 1 or later.
The easiest way to install the datatools extension is to install from PyPI:
pip install asreview-datatools
After installation of the datatools extension, asreview should automatically detect it. Test this by running:
asreview --help
If it lists asreview data describe, then the extension is successfully installed.
Describe the content of a dataset
asreview data describe MY_DATASET.csv
Export the results to a file (output.json)
asreview data describe MY_DATASET.csv -o output.json
Describe the van_de_schoot_2017 dataset from the benchmark platform.
asreview data describe benchmark:van_de_schoot_2017 -o output.json
{
  "asreviewVersion": "1.0rc2+14.gac96c1a",
  "apiVersion": "0.4+4.g3f54294",
  "data": {
    "items": [
      {
        "id": "n_records",
        "title": "Number of records",
        "description": "The number of records in the dataset.",
        "value": 6189
      },
      {
        "id": "n_relevant",
        "title": "Number of relevant records",
        "description": "The number of relevant records in the dataset.",
        "value": 43
      },
      {
        "id": "n_irrelevant",
        "title": "Number of irrelevant records",
        "description": "The number of irrelevant records in the dataset.",
        "value": 6146
      },
      {
        "id": "n_unlabeled",
        "title": "Number of unlabeled records",
        "description": "The number of unlabeled records in the dataset.",
        "value": 0
      },
      {
        "id": "n_missing_title",
        "title": "Number of records with missing title",
        "description": "The number of records in the dataset with missing title.",
        "value": 5
      },
      {
        "id": "n_missing_abstract",
        "title": "Number of records with missing abstract",
        "description": "The number of records in the dataset with missing abstract.",
        "value": 764
      },
      {
        "id": "n_duplicates",
        "title": "Number of duplicate records (basic algorithm)",
        "description": "The number of duplicate records in the dataset based on similar text.",
        "value": 104
      }
    ]
  }
}
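The JSON output above can be post-processed with a few lines of Python. The following is a minimal sketch, not part of the extension: the helper name stats_by_id is hypothetical, and the embedded JSON is a trimmed copy of the output shown above (only the "id" and "value" fields are kept). It looks up statistics by their id, checks that the label counts add up to the total, and computes the share of relevant records.

```python
import json

# Trimmed copy of the describe output shown above. In practice you would
# load the exported file instead, e.g. json.load(open("output.json")).
RAW = """
{
  "data": {
    "items": [
      {"id": "n_records", "value": 6189},
      {"id": "n_relevant", "value": 43},
      {"id": "n_irrelevant", "value": 6146},
      {"id": "n_unlabeled", "value": 0},
      {"id": "n_missing_title", "value": 5},
      {"id": "n_missing_abstract", "value": 764},
      {"id": "n_duplicates", "value": 104}
    ]
  }
}
"""


def stats_by_id(report):
    """Map each item's "id" to its "value" (hypothetical helper)."""
    return {item["id"]: item["value"] for item in report["data"]["items"]}


stats = stats_by_id(json.loads(RAW))

# Relevant, irrelevant, and unlabeled counts should add up to the total.
total = stats["n_relevant"] + stats["n_irrelevant"] + stats["n_unlabeled"]
assert total == stats["n_records"]

# Share of relevant records in the dataset.
prevalence = stats["n_relevant"] / stats["n_records"]
print(f"{stats['n_records']} records, {prevalence:.2%} relevant")
```

Indexing by id rather than by list position keeps the lookup stable if the order of the items changes.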
Convert the format of a dataset. For example, convert a RIS dataset into a CSV, Excel, or TAB dataset.
asreview data convert MY_DATASET.ris MY_OUTPUT.csv
Remove duplicate records with a simple and straightforward deduplication algorithm (see source code). The algorithm concatenates the title and abstract of each record and removes all non-alphanumeric tokens. Records that are duplicates after this normalization are then removed.
asreview data dedup MY_DATASET.ris
Export the deduplicated dataset to a file (output.csv)
asreview data dedup MY_DATASET.ris -o output.csv
Deduplicate the van_de_schoot_2017 dataset from the benchmark platform.
asreview data dedup benchmark:van_de_schoot_2017 -o van_de_schoot_2017_dedup.csv
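The deduplication steps described above (concatenate title and abstract, strip non-alphanumeric content, drop repeats) can be sketched in plain Python. This is an illustration under stated assumptions, not the extension's actual source code: the function names are hypothetical, "non-alphanumeric tokens" is read as any character outside A-Z, a-z, and 0-9, letter case is left unchanged, and records without any text are never treated as duplicates.

```python
import re


def normalize(title, abstract):
    """Concatenate title and abstract, then drop non-alphanumeric characters."""
    text = f"{title or ''} {abstract or ''}"
    # Assumption: "alphanumeric" means ASCII letters and digits only.
    return re.sub(r"[^A-Za-z0-9]", "", text)


def drop_duplicates(records):
    """Keep the first record for each normalized text, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record.get("title"), record.get("abstract"))
        # Assumption: records with no text at all are always kept.
        if key and key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


records = [
    {"title": "Deep learning: a review", "abstract": "We survey methods."},
    {"title": "Deep learning - a review", "abstract": "We survey methods"},
    {"title": "Another study", "abstract": None},
]

unique = drop_duplicates(records)
print(len(records) - len(unique), "duplicate(s) removed")
```

The first two records differ only in punctuation, so they normalize to the same text and the second one is dropped.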
This extension is MIT licensed.
Use the issue tracker or see more contact options in the ASReview LAB repository.