HuBMAP Parameterized Search
Overview:
The HuBMAP parameterized search endpoints provide a simpler programmatic search mechanism than the full search-api `/search` endpoints. Both the `/param-search` and `/search` endpoints are backed by Elasticsearch indices, but the parameterized search facility follows a simple RESTful parameter scheme instead of the complex Elasticsearch JSON query syntax used by the full `/search` mechanism. The `/param-search` endpoint only supports searching for specific values of “allowable value” attributes “ANDed” together, whereas the full `/search` endpoint supports all of the logic and attribute types available in Elasticsearch queries.
This page documents the public usage of the `/param-search` endpoint and its variants. The full HuBMAP Search API documentation covers the `/param-search` endpoint in less detail, but also documents the more capable, but more complicated, `/search` endpoint.
Description:
The `/param-search/<entity-type>` endpoint of the HuBMAP Search API service is a RESTful search interface that allows simple attribute matching: attribute-value pairs are provided as query parameters at the end of the URL. Multiple query parameters can be provided, and they are “ANDed” together in the query logic. For example, the URL https://searchapi.service.endpoint/param-search/entity-type?param1=value1&param2=value2&param3=value3 will find all entities of type “entity-type” where entity.param1 equals “value1”, entity.param2 equals “value2”, and entity.param3 equals “value3”.
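Because the ANDed attribute-value pairs are ordinary URL query parameters, a standard query-string encoder is enough to build a request. The sketch below assumes the base URL used in the examples on this page; the helper name `build_param_search_url` is hypothetical. Passing the parameters as a Python dict means each attribute name appears at most once, matching the rule in the Inputs section.

```python
from typing import Dict
from urllib.parse import urlencode

# Base URL taken from the examples on this page (an assumption for other deployments).
BASE_URL = "https://search.api.hubmapconsortium.org/v3/param-search"


def build_param_search_url(entity_type: str, params: Dict[str, str]) -> str:
    """Hypothetical helper: build a /param-search URL.

    Each key/value pair becomes one `attribute=value` query parameter;
    the service ANDs all of them together.
    """
    # urlencode percent-encodes values and joins the pairs with "&".
    # Dotted names such as metadata.is_targeted pass through unchanged.
    return f"{BASE_URL}/{entity_type}?{urlencode(params)}"
```

For example, `build_param_search_url("datasets", {"dataset_type": "CODEX", "status": "Published"})` yields the URL for published CODEX datasets. Values containing spaces or reserved characters are percent-encoded, which is easy to get wrong when concatenating strings by hand.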
For an example of how to use the `produce-clt-manifest` option (described below), see the Example Query and Download page.
Inputs
- The type of entity to search for, provided as an in-URL resource parameter after the `/param-search/` endpoint name. Valid entity types are:
  - `donors`: see the Donor Schema for the list of queryable Donor parameters.
  - `samples`: see the Sample Schema for the list of queryable Sample parameters.
  - `datasets`: see the Dataset Schema for the list of queryable Dataset parameters.
- Attribute-value pairs as query parameters, i.e. `attribute-name=value`. At least one pair is required; the upper limit is set by the maximum URL length. Each attribute name can be included only once. Example call: `/param-search/datasets?dataset_type=CODEX&status=Published` returns all published datasets of type CODEX.
- The optional query parameter `produce-clt-manifest=true`. Instead of a list of matching entities, this produces a list of the unique ids of the matching datasets, in a manifest format directly usable by the [HuBMAP Command Line Transfer Tool](../clt/index.html) to download the full datasets via the HuBMAP Globus Endpoint.
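The input rules above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: `validate_inputs` is a hypothetical helper, the service performs its own validation regardless, and the optional manifest flag is assumed not to count toward the required attribute-value pair.

```python
from typing import Dict, Iterable, Tuple

# Entity types listed in the Inputs section.
VALID_ENTITY_TYPES = {"donors", "samples", "datasets"}


def validate_inputs(
    entity_type: str,
    pairs: Iterable[Tuple[str, str]],
    manifest: bool = False,
) -> Dict[str, str]:
    """Hypothetical client-side check of the documented input rules.

    Returns the pairs as a dict ready to be URL-encoded.
    """
    if entity_type not in VALID_ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    params: Dict[str, str] = {}
    for name, value in pairs:
        # Each attribute name may appear only once per query.
        if name in params:
            raise ValueError(f"attribute {name!r} given more than once")
        params[name] = value
    # At least one attribute-value pair is required. The manifest flag is
    # added afterwards and, by assumption, does not count toward this minimum.
    if not params:
        raise ValueError("at least one attribute-value pair is required")
    if manifest:
        params["produce-clt-manifest"] = "true"
    return params
```

Accepting a sequence of pairs rather than a dict is deliberate here: a dict would silently drop a repeated attribute name, while this version reports it.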
The parameter names can be top-level attributes of any of the entities, or they can be nested attributes. For example, `/param-search/datasets?dataset_type=CODEX` queries the top-level Dataset attribute `dataset_type`, whereas `/param-search/datasets?metadata.is_targeted=Yes` queries the `is_targeted` attribute nested under `metadata`.
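To make the AND semantics and dotted attribute names concrete, the sketch below evaluates query parameters against a single entity document held in memory. It is only a client-side approximation, not how the service or Elasticsearch evaluates queries. It assumes exact equality of values, and it assumes that a dotted path passing through an array (such as `origin_samples.organ`) matches if any element matches. The function names are hypothetical.

```python
from collections.abc import Mapping
from typing import Any, Dict, List


def values_at(node: Any, path: List[str]) -> List[Any]:
    """Collect every value reachable by following a dotted path.

    Lists are searched element-wise, so origin_samples.organ reaches
    the organ of each origin sample.
    """
    if not path:
        # At the leaf, expand a list so each element is compared on its own.
        return list(node) if isinstance(node, list) else [node]
    if isinstance(node, list):
        found: List[Any] = []
        for item in node:
            found.extend(values_at(item, path))
        return found
    if isinstance(node, Mapping) and path[0] in node:
        return values_at(node[path[0]], path[1:])
    # Missing attribute: nothing to match.
    return []


def matches(entity: Mapping, params: Dict[str, str]) -> bool:
    """Approximate /param-search matching: every pair must match (AND)."""
    return all(
        value in values_at(entity, name.split("."))
        for name, value in params.items()
    )
```

With a hypothetical document `{"dataset_type": "ATACseq", "origin_samples": [{"organ": "RL"}]}`, the parameters from the right-lung example below both match, so `matches` returns `True`; adding any pair whose attribute is absent makes it return `False`.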
Response
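Response Code 200:
When at least one entity matches the query, a 200 HTTP response code is returned. If the `produce-clt-manifest=true` query parameter is NOT included, the response body is a JSON array of all entities matching the query; if it is included, the body is the manifest of matching dataset ids.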
Response Code 303:
If the total response payload exceeds 10 MB, the response is returned via an S3 bucket: a 303 HTTP response code is returned along with the redirect URL from which the query results can be retrieved.
Response Code 404:
When no entities match the query, a 404 HTTP response code is returned.
Response Code 504:
The maximum query and response time is 30 seconds. If the response takes 30 seconds or longer, a 504 HTTP response code is returned, and you will need to constrain your query to return fewer results.
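A client typically needs one branch per documented status code. The sketch below separates that decision from networking so it can be tested on its own; `handle_response` and `ParamSearchTimeout` are hypothetical names, and treating a manifest body as UTF-8 text is an assumption.

```python
import json
from typing import Any, Optional, Tuple


class ParamSearchTimeout(Exception):
    """Hypothetical error for a 504: the query ran for 30 seconds or longer."""


def handle_response(
    status: int,
    body: bytes,
    location: Optional[str] = None,
    manifest: bool = False,
) -> Tuple[str, Any]:
    """Interpret a /param-search response by its HTTP status code.

    Returns ("results", payload) or ("redirect", url). `location` is the
    value of the response's Location header, if any.
    """
    if status == 200:
        text = body.decode("utf-8")
        # With produce-clt-manifest=true the body is a manifest, not JSON.
        return ("results", text if manifest else json.loads(text))
    if status == 303:
        # Payloads over 10 MB are served from S3; fetch this URL next.
        if not location:
            raise ValueError("303 response without a Location header")
        return ("redirect", location)
    if status == 404:
        # No entities matched the query: report an empty result.
        return ("results", "" if manifest else [])
    if status == 504:
        raise ParamSearchTimeout("query exceeded 30 seconds; add constraints")
    raise RuntimeError(f"unexpected HTTP status {status}")
```

Many HTTP clients follow a 303 automatically (Python's `urllib.request` does, and curl does with `-L`), in which case the caller only ever sees the final 200 response.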
Each document in the `entities` indices contains information about one entity (a Donor, Sample, or Dataset). Elements common to these documents are described below.
Examples:
To find all Datasets of type RNAseq where specific molecules are not targeted for detection, use this query:
GET https://search.api.hubmapconsortium.org/v3/param-search/datasets?dataset_type=RNAseq&metadata.is_targeted=No
A JSON array containing Dataset objects will be returned.
To find all ATACseq datasets (`dataset_type=ATACseq`) that were run on tissue from a right lung (`origin_samples.organ=RL`):
GET https://search.api.hubmapconsortium.org/v3/param-search/datasets?origin_samples.organ=RL&dataset_type=ATACseq
A JSON array containing Dataset objects will be returned.
To run the same query but produce a manifest file for downloading all of the data, instead of the JSON of all dataset information, add the `produce-clt-manifest=true` option:
GET https://search.api.hubmapconsortium.org/v3/param-search/datasets?origin_samples.organ=RL&dataset_type=ATACseq&produce-clt-manifest=true
This will produce a list of dataset ids in a format usable by the HuBMAP Command Line Transfer Tool to download the data. A Linux/macOS command-line example of how to produce a manifest file (`-L` makes curl follow the 303 redirect used for payloads over 10 MB):
curl -L "https://search.api.hubmapconsortium.org/v3/param-search/datasets?origin_samples.organ=RL&dataset_type=ATACseq&produce-clt-manifest=true" > manifest.out
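The same download can be scripted without curl. The following is a minimal sketch using only Python's standard library; `manifest_url` and `download_manifest` are hypothetical helpers, the base URL comes from the examples above, and the 60-second client timeout is an arbitrary choice that leaves headroom over the 30-second server limit.

```python
import shutil
import urllib.request
from typing import Dict
from urllib.parse import urlencode

# Base URL taken from the examples on this page.
BASE_URL = "https://search.api.hubmapconsortium.org/v3/param-search"


def manifest_url(entity_type: str, params: Dict[str, str]) -> str:
    """Hypothetical helper: the query URL with produce-clt-manifest=true appended."""
    query = dict(params)
    query["produce-clt-manifest"] = "true"
    return f"{BASE_URL}/{entity_type}?{urlencode(query)}"


def download_manifest(url: str, path: str = "manifest.out") -> str:
    """Save the manifest to `path`, like `curl -L ... > manifest.out`.

    urlopen follows the 303 redirect used for payloads over 10 MB and
    raises urllib.error.HTTPError for 404 (no matches) and 504 (timeout).
    """
    with urllib.request.urlopen(url, timeout=60) as resp:
        with open(path, "wb") as out:
            # Stream the body to disk rather than holding it all in memory.
            shutil.copyfileobj(resp, out)
    return path
```

Calling `download_manifest(manifest_url("datasets", {"origin_samples.organ": "RL", "dataset_type": "ATACseq"}))` requests the same URL as the curl command and writes `manifest.out`. Because the request is opened before the output file, a 404 or 504 leaves no empty manifest behind.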
Common document elements:
Stored documents are enhanced with the following attributes for convenient use within the JSON response.
Document Element | Description |
---|---|
ancestor_ids | A JavaScript array with identifiers for all ancestors of the entity |
ancestors | A JavaScript array with a JSON object for each ancestor of the entity |
descendant_ids | A JavaScript array with identifiers for all descendants of the entity |
descendants | A JavaScript array with a JSON object for each descendant of the entity |
display_subtype | A string with the name of the entity’s type |
donor | A JSON object with information for the Donor associated with the entity |
immediate_ancestors | A JavaScript array with a JSON object for the subset of ancestors which are “a parent to” the entity |
immediate_descendants | A JavaScript array with a JSON object for the subset of descendants which are “a child of” the entity |
index_version | A string indicating the version of the indexing software used to create the document in the index |
origin_sample | A JSON object with information for the ancestor Sample associated with the entity |
origin_samples | A JavaScript array with a JSON object for each ancestor Sample associated with the entity |
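As an illustration of how these common elements can be consumed, the sketch below summarizes a JSON array returned by `/param-search`: it counts entities by `display_subtype` and collects the union of their `ancestor_ids`. The function name `summarize` is hypothetical, and missing elements are tolerated rather than assumed to be present in every document.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Set, Tuple


def summarize(entities: Iterable[Dict[str, Any]]) -> Tuple[Counter, Set[str]]:
    """Hypothetical summary of a /param-search result array.

    Returns (counts by display_subtype, union of all ancestor_ids).
    """
    by_type: Counter = Counter()
    ancestors: Set[str] = set()
    for entity in entities:
        # display_subtype names the entity's type; fall back if absent.
        by_type[entity.get("display_subtype", "unknown")] += 1
        # ancestor_ids lists identifiers of all ancestors of the entity.
        ancestors.update(entity.get("ancestor_ids", []))
    return by_type, ancestors
```

Working from `ancestor_ids` rather than the full `ancestors` objects keeps the summary small, since identifiers are enough to de-duplicate shared ancestors across results.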