HuBMAP Dataset schema

Overview:

This page describes the Dataset attributes available for querying via the HuBMAP parameterized search endpoint. Full Dataset schema information is available on the HuBMAP Search API documentation page: see the Dataset entry in the Schemas section at the bottom of that page.

Description:

A query string is built by pairing the schema attributes documented below with the values to match. Each attribute=value pair is a “term”. Terms are joined with the & character, and the entire query is attached to the base URL after a ? character, per web standards.

Query terms may also reference attributes nested deeper in the schema. Parameter names can be top-level attributes of any entity, or nested attributes written in dot notation. For example, /param-search/datasets?dataset_type=RNAseq queries the top-level Dataset attribute dataset_type, whereas /param-search/datasets?metadata.metadata.is_targeted=Yes queries the is_targeted attribute that is nested under metadata.metadata. (NOTE: The dual nesting of metadata.metadata will be updated to a single level, just metadata, soon.)

This example finds all Datasets of type RNAseq where specific molecules are not targeted for detection:

 GET https://search.api.hubmapconsortium.org/v3/param-search/datasets?dataset_type=RNAseq&metadata.metadata.is_targeted=No
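The query above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name build_query_url is hypothetical and not part of any HuBMAP client, and the base URL is taken from the example above. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Base URL copied from the example query in this page.
BASE_URL = "https://search.api.hubmapconsortium.org/v3/param-search/datasets"


def build_query_url(params, base_url=BASE_URL):
    """Join attribute=value terms with & and attach them after a ? character.

    build_query_url is a hypothetical helper for illustration only.
    """
    return f"{base_url}?{urlencode(params)}"


# A top-level attribute combined with a nested (dot notation) attribute.
url = build_query_url({
    "dataset_type": "RNAseq",
    "metadata.metadata.is_targeted": "No",
})
print(url)
```

urlencode also percent-encodes values that contain spaces or reserved characters, so terms do not need to be escaped by hand.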

Dataset Attributes

| Attribute | Type | Description |
|---|---|---|
| uuid | string | The HuBMAP unique identifier, intended for internal software use only. This is a 32-digit hexadecimal UUID, e.g. 461bbfdc353a2673e381f632510b0f17 |
| hubmap_id | string | A HuBMAP Consortium-wide unique identifier, randomly generated in the format HBM###.ABCD.### for every entity. |
| registered_doi | string | The DOI of the registered entity, e.g. 10.35079/hbm289.pcbm.487. This is set during the publication process and is currently available for certain Collections and Datasets. |
| doi_url | string | The URL from the DOI registry for this entity, e.g. https://doi.org/10.35079/hbm289.pcbm.487 |
| contains_human_genetic_sequences | boolean | True if the data contains any human genetic sequence information. Can only be set at CREATE/POST time. |
| group_name | string | The display name of the Globus group that the user who created this entity is a member of. |
| dbgap_sra_experiment_url | string | A URL linking the dataset to the associated uploaded data at dbGaP. |
| dbgap_study_url | string | A URL linking the dataset to the particular study on dbGaP that it belongs to. |
| data_access_level | string from data_access_level attribute values | One of the values: public, consortium. |
| status | string from status attribute values | One of: New, Processing, QA, Published, Error, Hold, Invalid. |
| antibodies | array of Antibody Schema | A list of antibodies used in the assay that created the dataset. |
| metadata.metadata | JSON-encoded string for a supported assay type schema | The assay-level metadata submitted by data providers with data. Provided as JSON. Metadata schemas per dataset_type are linked from the dataset type allowable values section. (NOTE: The dual nesting of metadata.metadata will be updated to a single level, just metadata, soon.) |
| dataset_type | string from dataset type allowable values | The type of data contained in the dataset (as derived from a specific assay type). |
| donor | Donor Object | The donor from which the tissue was taken for the assay. The sub-attributes under donor are specified in the Donor Schema. |
| origin_samples | Sample Object Array | The organ from which the tissue was taken for the assay. The sub-attributes under origin_samples are specified in the Sample Schema. This is modeled as an array because it is possible for data to be derived from multiple organs, but currently HuBMAP only has data derived from a single organ. |
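The identifier formats described for uuid and hubmap_id can be checked before a value is used in a query. The sketch below is illustrative only. It assumes that # in HBM###.ABCD.### stands for a decimal digit, that ABCD stands for four uppercase letters, and that uuid values are lowercase as in the example; none of these assumptions is confirmed by this page, and the function names are hypothetical.

```python
import re

# Assumption: a uuid is exactly 32 lowercase hexadecimal characters.
UUID_PATTERN = re.compile(r"[0-9a-f]{32}")

# Assumption: HBM, three digits, a dot, four uppercase letters, a dot, three digits.
HUBMAP_ID_PATTERN = re.compile(r"HBM\d{3}\.[A-Z]{4}\.\d{3}")


def is_uuid(value):
    """Return True if value looks like a 32-digit hexadecimal uuid."""
    return UUID_PATTERN.fullmatch(value) is not None


def is_hubmap_id(value):
    """Return True if value matches the assumed HBM###.ABCD.### layout."""
    return HUBMAP_ID_PATTERN.fullmatch(value) is not None


print(is_uuid("461bbfdc353a2673e381f632510b0f17"))
print(is_hubmap_id("HBM289.PCBM.487"))
```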

data_access_level attribute values

The data_access_level attribute of the Dataset Schema is one of the following enumerated values:

  • public
  • consortium

status attribute values

The status attribute of the Dataset Schema is one of the following enumerated values:

  • New
  • Processing
  • QA
  • Published
  • Error
  • Hold
  • Invalid
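Because data_access_level and status accept only the enumerated values listed above, query terms can be checked locally before a request is sent. This is a minimal sketch: check_enum_terms is a hypothetical helper, and the exact, case-sensitive comparison is an assumption, since this page does not state how the endpoint treats letter case.

```python
# Enumerated values copied from the two lists above.
ALLOWED_VALUES = {
    "data_access_level": {"public", "consortium"},
    "status": {"New", "Processing", "QA", "Published", "Error", "Hold", "Invalid"},
}


def check_enum_terms(params):
    """Return a message for each term whose value is outside its enumerated list.

    Attributes without an enumerated list are not checked.
    Assumption: values must match exactly, including letter case.
    """
    problems = []
    for name, value in params.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return problems


print(check_enum_terms({"status": "Published", "data_access_level": "public"}))
print(check_enum_terms({"status": "Done", "dataset_type": "RNAseq"}))
```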

dataset_type allowable values

The dataset_type attribute of the Dataset Schema is a value from the current, authoritative list of dataset types. The valid dataset types as of 8/26/2024 are listed below. Linked next to each dataset type is its metadata schema page. The metadata attributes listed for each dataset type are accessible below the Dataset.metadata.metadata attribute (e.g. Dataset.metadata.metadata.preparation_instrument_model).