Dataset
A dataset (for the purpose of the HALO-DB) is characterized as follows:
- A dataset is an IPLD-based tree structure identified by its root CID
- (this may be, but is not limited to, an immutable folder of a file system)
- There is a (single) function extract_metadata(CID) -> metadata, which executes successfully on the root CID
- (otherwise, what the CID points to is not considered a dataset)
- the metadata returned by the function is sufficient to
- apply for a DOI
- render a meaningful dataset landing page
- consider the dataset FAIR
- build a search index to reasonably find a specific dataset within a collection of many datasets
- the function extract_metadata should be as convenient as possible to use, but must also allow sharing data of unknown types
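
The characterization above can be read as a simple contract: whatever the root CID points to counts as a dataset exactly when extract_metadata succeeds on it. The following is a minimal sketch of that contract, not an actual implementation. The in-memory FAKE_STORE, the NotADataset exception, the is_dataset helper and the "metadata.json" sidecar entry are all hypothetical names chosen for illustration; a real version would resolve the CID through IPLD. The sidecar is one possible way to support unknown data types, since the payload itself never has to be parsed.

```python
from typing import Any, Dict

Metadata = Dict[str, Any]

# Hypothetical stand-in for an IPLD store: root CID -> decoded tree.
# A real implementation would resolve the CID through an IPLD/IPFS client.
FAKE_STORE: Dict[str, Dict[str, Any]] = {
    "cid-with-metadata": {
        "metadata.json": {"title": "Example flight data"},
        "data.bin": b"\x00\x01",  # payload of an unknown type
    },
    "cid-without-metadata": {"data.bin": b"\x00\x01"},
}


class NotADataset(Exception):
    """Raised when no metadata can be extracted for a root CID."""


def extract_metadata(cid: str) -> Metadata:
    """Return the metadata for the tree rooted at `cid`, or raise NotADataset."""
    tree = FAKE_STORE.get(cid)
    if tree is None:
        raise NotADataset(f"cannot resolve {cid}")
    # Assumption: metadata travels in an explicit sidecar entry, so the
    # payload can be of any type and is never inspected here.
    sidecar = tree.get("metadata.json")
    if sidecar is None:
        raise NotADataset(f"{cid} carries no metadata")
    return dict(sidecar)


def is_dataset(cid: str) -> bool:
    """A CID refers to a dataset iff extract_metadata succeeds on it."""
    try:
        extract_metadata(cid)
    except NotADataset:
        return False
    return True
```

With this sketch, is_dataset("cid-with-metadata") holds, while is_dataset("cid-without-metadata") does not, although both trees contain the same payload.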
WPs
- specify more thoroughly which kinds of datasets are acceptable
- define the metadata format which is returned by
extract_metadata(likely STAC-based) - implement
extract_metadatain an open & shareable way (it’ll be used in many places)
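
Since the metadata format is still to be defined, the following is only a sketch of what a STAC-based return value might look like if extract_metadata produced a STAC Item. The top-level keys follow the STAC Item core fields; the helper name make_stac_item, the use of the root CID as the item id, the "data" asset key and the ipfs:// href are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def make_stac_item(root_cid: str, title: str, when: datetime) -> Dict[str, Any]:
    """Build a minimal STAC-Item-shaped dict for a dataset (illustrative only)."""
    if when.tzinfo is None:
        raise ValueError("`when` must be timezone-aware")
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": root_cid,            # assumption: the root CID doubles as item id
        "geometry": None,          # no spatial footprint in this sketch
        "properties": {"datetime": timestamp, "title": title},
        "links": [],
        "assets": {
            # assumption: the dataset tree is referenced via an ipfs:// URI
            "data": {"href": f"ipfs://{root_cid}", "title": title},
        },
    }


item = make_stac_item(
    "example-root-cid",
    "Example flight data",
    datetime(2020, 1, 24, tzinfo=timezone.utc),
)
```

A record of this shape would already cover a title and a timestamp for a landing page and a search index; which additional fields are needed for a DOI application is part of the work package above.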