The 🤗 Hugging Face Hub facilitates the hosting and sharing of AI models and datasets (as well as demo applications), and NatLibFi now has an organization account in the Hugging Face Hub.
The data (models and datasets) in the HF Hub live in git repositories, and git can be used to handle the data (commit, push, pull...). However, direct integration of applications with the HF Hub is also supported via the huggingface_hub Python library, which is usable as a CLI tool as well.
Annif could have the functionality to push (and pull) projects or project sets to (and from) the HF Hub. It should be able to operate on project sets both for convenience and because ensemble projects require their base projects to be available too.
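Since ensemble projects depend on their base projects, the upload command would first need to resolve the full project set before bundling. A minimal sketch of that resolution, assuming a hypothetical mapping from project IDs to base project IDs (this is not Annif's actual internal representation):

```python
def resolve_project_set(selected, base_projects):
    """Return the selected project IDs plus, transitively, the base
    projects of any ensemble among them.

    selected: iterable of project IDs chosen by the user
    base_projects: dict mapping a project ID to the IDs of its base
        projects (absent or empty for non-ensemble projects);
        a hypothetical structure for illustration only
    """
    result = set()
    stack = list(selected)
    while stack:
        project_id = stack.pop()
        if project_id in result:
            continue  # already visited; also guards against cycles
        result.add(project_id)
        stack.extend(base_projects.get(project_id, []))
    return result
```

For example, selecting only an ensemble project would pull in its base projects: `resolve_project_set(["yso-ensemble-fi"], {"yso-ensemble-fi": ["yso-mllm-fi", "yso-fasttext-fi"]})` yields all three IDs.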
There could be a CLI command to push a set of projects to the HF Hub; for example, it would upload the specified projects to the NatLibFi/FintoAI-data-YSO repository.
The files and directories that need to be uploaded are:
- data/projects/<project-id> — the project directories
- data/vocabs/<vocab-id> — the vocabularies of the projects
- projects.{cfg,toml,d} — the configurations of the projects
Options for bundling and uploading
1. Single file
Bundle all files into one zip named yso-fi.zip (possibly including only the configs of the selected projects) and upload it to the root of the repo.
The filename could be derived from the glob pattern of the projects, or it could be a required argument of the upload command (as a 2nd argument, to be added to the above example).
This option would be easiest for downloads: just wget one file and unzip.
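The single-file option can be sketched with the standard-library zipfile module; the directory layout follows the data/projects and data/vocabs paths listed above, but the bundling function itself is hypothetical, not an existing Annif command:

```python
import os
import zipfile


def bundle_to_single_zip(data_dir, project_ids, vocab_ids, config_path, zip_path):
    """Bundle project dirs, vocab dirs and the projects config into one zip."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for project_id in project_ids:
            _add_dir(zf, os.path.join(data_dir, "projects", project_id))
        for vocab_id in vocab_ids:
            _add_dir(zf, os.path.join(data_dir, "vocabs", vocab_id))
        # The projects config goes into the archive as-is
        zf.write(config_path)


def _add_dir(zf, directory):
    """Recursively add every file under a directory, keeping relative paths."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            zf.write(os.path.join(root, name))
```

Called with relative paths (e.g. `bundle_to_single_zip("data", ["yso-fi"], ["yso"], "projects.cfg", "yso-fi.zip")`), the archive preserves the data/projects and data/vocabs structure, so unzipping at the Annif data root restores it.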
2. One file for projects and vocab, and one for projects configs
Bundle the projects and vocabulary directories into one zip and leave the projects config file uncompressed.
3. One file for projects, one for vocab, and one for projects configs
Bundle the selected projects into one zip (yso-fi.zip) and the vocabularies into another (yso.zip), and leave the projects config file uncompressed. Upload the projects zip to the data/projects directory and the vocab zip to data/vocabs.
4. Separate files for each project, vocab, and projects configs
Compress each project directory into its own zip (<project-id>.zip).
For downloads with this option, one should use e.g. wget --accept yso*-fi.zip for the projects.
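With per-project zips, selecting the right files on the client side amounts to glob matching on filenames; the same filtering that wget --accept does can be sketched with the standard-library fnmatch module (the yso*-fi.zip pattern is the one from the example above):

```python
from fnmatch import fnmatch


def select_zips(filenames, pattern):
    """Return the filenames matching a shell-style glob pattern,
    analogous to wget's --accept filtering."""
    return [name for name in filenames if fnmatch(name, pattern)]
```

For example, filtering a repo listing with the pattern `"yso*-fi.zip"` keeps `yso-fasttext-fi.zip` and `yso-mllm-fi.zip` but skips `yso.zip` and `projects.cfg`.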
Some details and ideas:
- There exists the upload_file method in the Python client library that could be used for this.
- There is also the ModelHubMixin class, which could help the integration.
- Logging in can be done with the huggingface-cli login command.
- See also the huggingface-cli upload CLI command (for commit message etc.).
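The upload could build directly on the huggingface_hub client's upload_file method. The sketch below is a hypothetical wrapper (the function names and the data/projects and data/vocabs target paths are assumptions based on the layout discussed above), together with a matching download helper using hf_hub_download for a later fetch feature:

```python
import os


def target_path(local_path, subdir):
    """Map a local file to its path in the repo, e.g. data/projects/yso-fi.zip."""
    return f"{subdir}/{os.path.basename(local_path)}"


def push_zips(repo_id, project_zips, vocab_zips, config_path, token=None):
    """Upload project and vocab zips plus the projects config to a HF Hub repo."""
    # Imported lazily so the pure helpers work without huggingface_hub installed
    from huggingface_hub import HfApi  # pip install huggingface_hub

    api = HfApi(token=token)
    uploads = [(p, "data/projects") for p in project_zips] + [
        (v, "data/vocabs") for v in vocab_zips
    ]
    for local_path, subdir in uploads:
        api.upload_file(
            path_or_fileobj=local_path,
            path_in_repo=target_path(local_path, subdir),
            repo_id=repo_id,
            commit_message=f"Upload {os.path.basename(local_path)}",
        )
    # The projects config stays uncompressed at the repo root
    api.upload_file(
        path_or_fileobj=config_path,
        path_in_repo=os.path.basename(config_path),
        repo_id=repo_id,
    )


def fetch_file(repo_id, filename, local_dir="."):
    """Download a single file (e.g. a project zip) from a HF Hub repo."""
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
```

Usage would look like `push_zips("NatLibFi/FintoAI-data-YSO", ["yso-fi.zip"], ["yso.zip"], "projects.cfg")`; authentication happens via the token argument or a prior huggingface-cli login.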
Downloading projects
We could also implement a feature to fetch projects from the HF Hub. But implementing this is probably best done only after the upload functionality; downloading from the HF Hub can also be done simply with wget or curl. However, if the download function is known to be coming, the hierarchy and structure of the data files in the repo should be designed with that in mind.