YAML datasets (YAMLd) is a tiny subset of YAML designed specifically for representing tabular data (such as CSV). It is particularly useful for datasets with numerous features or lengthy sequences that are hard to read.
- The file ends with a top-level key `data` or `dataset` that contains the list of entries in the dataset.
- Everything before that key is considered metadata.
features_description:
  name: 'employee name'
  age: 'age'
  city: 'closest city to address'
extra: 3.4
dataset:
  - name: 'John Doe'
    age: 30
    city: 'New York'
  - name: 'Jane Smith'
    age: 25
    city: 'San Francisco'
  - name: 'Bob Johnson'
    age: 35
    city: 'Chicago'
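To make the layout concrete, here is a minimal sketch in plain Python of how an already-parsed document could be split into metadata and entries. The `split_yamld` helper and the `parsed` dictionary are hypothetical illustrations, not part of the yamld API. The sketch assumes the file has already been loaded into a `dict` that preserves key order.

```python
# Hypothetical helper for illustration; not part of the yamld package.
DATASET_KEYS = ("data", "dataset")

def split_yamld(parsed):
    """Return (metadata, entries) from a parsed YAMLd mapping."""
    keys = list(parsed)
    # The dataset key, when present, is the last top-level key.
    if keys and keys[-1] in DATASET_KEYS:
        entries = parsed[keys[-1]]
        metadata = {k: parsed[k] for k in keys[:-1]}
    else:
        # No dataset key: everything is metadata.
        entries = []
        metadata = dict(parsed)
    return metadata, entries

parsed = {
    "features_description": {"name": "employee name", "age": "age"},
    "extra": 3.4,
    "dataset": [
        {"name": "John Doe", "age": 30, "city": "New York"},
        {"name": "Jane Smith", "age": 25, "city": "San Francisco"},
    ],
}
metadata, entries = split_yamld(parsed)
print(list(metadata))  # ['features_description', 'extra']
print(len(entries))    # 2
```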
Using a 'mini' version of this, you can keep the feature names on the first entry only and drop them from the remaining entries, as follows.
data:
  - name: 'John Doe'
    age: 30
    city: 'New York'
  - - 'Jane Smith'
    - 25
    - 'San Francisco'
  - - 'Bob Johnson'
    - 35
    - 'Chicago'
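One way to read the 'mini' form is that the first entry supplies the feature names and each later list is matched to those names by position. The sketch below shows that interpretation in plain Python. `expand_mini` is a hypothetical helper, and positional matching is an assumption about the intended semantics rather than a description of yamld's parser.

```python
def expand_mini(entries):
    """Turn a 'mini' dataset (first entry a dict, the rest lists) into dicts."""
    if not entries:
        return []
    first = entries[0]
    names = list(first)  # feature names, in the order they were written
    rows = [dict(first)]
    for values in entries[1:]:
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(values)}")
        # Pair each value with the feature name at the same position.
        rows.append(dict(zip(names, values)))
    return rows

data = [
    {"name": "John Doe", "age": 30, "city": "New York"},
    ["Jane Smith", 25, "San Francisco"],
    ["Bob Johnson", 35, "Chicago"],
]
rows = expand_mini(data)
print(rows[1])  # {'name': 'Jane Smith', 'age': 25, 'city': 'San Francisco'}
```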
You can also remove feature names completely.
Note: this is still experimental; use it with caution.
A simple example:
import yamld
# Read a dataframe
dataframe = yamld.read_dataframe('your_yamld_file.yaml')
# Print the metadata
print(dataframe.attrs)
# Write the dataframe to a file
yamld.write_dataframe('new_yamld_file.yaml', dataframe)
# Read YAMLd as a generator of `dict` values
dataset_gen = yamld.read_generator('your_yamld_file.yaml')
Note: this API is subject to change.
csv2yamld <your-csv-file>
yamld2csv <your-yamld-file>
For more details, use `csv2yamld -h` or `yamld2csv -h`.
Reading CSV can be annoying; here is a simple solution that opens the converted file in your editor:
csv2yamld <your-csv-file> --stdout | nvim -c 'set filetype=yaml' -
Of course, you can edit it, save it, and convert it back to CSV using `yamld2csv`.
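For readers curious what such a conversion produces, the following is a rough sketch using only Python's standard `csv` module. It is not how `csv2yamld` is implemented. The helper names `format_value` and `csv_to_yamld_text` are hypothetical, and the quoting rule (doubling a single quote inside a single-quoted string) is borrowed from standard YAML and assumed, not confirmed, to be accepted by YAMLd.

```python
import csv
import io

def format_value(text):
    """Render a CSV cell as a bare number or a single-quoted string."""
    try:
        float(text)  # anything Python parses as a number stays bare
        return text
    except ValueError:
        # Assumption: standard YAML escaping of ' inside single quotes.
        return "'" + text.replace("'", "''") + "'"

def csv_to_yamld_text(csv_text):
    """Render CSV text as a record-style `dataset:` block (illustrative only)."""
    lines = ["dataset:"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        prefix = "  - "      # the first field of an entry starts the list item
        for name, cell in row.items():
            lines.append(f"{prefix}{name}: {format_value(cell)}")
            prefix = "    "  # later fields align under the first
    return "\n".join(lines)

sample = "name,age,city\nJohn Doe,30,New York\nJane Smith,25,San Francisco\n"
print(csv_to_yamld_text(sample))
```

Treating every cell that `float()` accepts as a number is a deliberate simplification; a real converter would need a stricter rule for values such as `nan` or zero-padded identifiers.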
pip install -U yamld
To install without virtual environments, you can use pipx.
The main goal is to edit and view your data files with nothing but your text editor. Consequently, many YAML features are not parsed after the top-level keyword `data` or `dataset`, as including them could either introduce clutter or hinder parsing efficiency for the dataset. For example, YAML explicit type markings and comments are not supported.
- Dataset data types:
  - List: surrounded with `[]`
  - String: surrounded with `""` or `''`
  - Number
- No meta, only dataset. Example:
  - - 30
    - 'New York'
  - - 'Jane Smith'
    - 25
    - 'San Francisco'
- Meta only (simply remove the top-level `dataset` or `data` key). Example:
  META_VAL: "SOME_VALUE"
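The three dataset value types listed above can be told apart from their first and last characters. The sketch below is a naive illustration of that idea in plain Python, not yamld's actual parser. `parse_value` is a hypothetical name, and the sketch assumes list items are comma-separated and contain no nested brackets or embedded commas.

```python
def parse_value(text):
    """Classify a dataset value as a list, a string, or a number."""
    text = text.strip()
    # List: surrounded with []
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1].strip()
        # Naive split: assumes no commas or brackets inside items.
        return [parse_value(item) for item in inner.split(",")] if inner else []
    # String: surrounded with matching "" or ''
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    # Number: try int first, then float; anything else raises ValueError.
    try:
        return int(text)
    except ValueError:
        return float(text)

print(parse_value("'New York'"))     # New York
print(parse_value("30"))             # 30
print(parse_value("[1, 2.5, 'a']"))  # [1, 2.5, 'a']
```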