HuBMAP EPICs Overview

Externally Processed Integrated Collection(s)

What is an EPIC?

An EPIC (Externally Processed Integrated Collection) is a processed or analyzed dataset generated by a lab. An EPIC can be derived by analyzing one or more primary or derived datasets. The dataset(s) processed to create the EPIC are considered the “parent” dataset(s) (see Fig 1).

Examples:

EPICs Figure 1

Figure 1: EPIC dataset example (2D segmentation mask) and the upstream, parent dataset (PhenoCycler).

Please contact the HuBMAP Helpdesk if you require assistance or have questions about EPICs.

Types of EPICs

We’ve defined three types of EPICs: (1) segmentation masks, (2) object by analyte, and (3) 3D reconstruction. Together, these three types should support a broad range of analyses.

There are currently no plans to define new types of EPICs. If you have derived data you would like to upload that doesn’t map to one of these dataset types, please contact the HuBMAP Helpdesk.

EPIC Example Type 1

Segmentation Mask: Published

This is a published specification that has been tested, is stable, and can be used to submit relevant EPIC datasets. Segmentation of images is performed manually or by an algorithm that predicts the edges of structures. Structures may be nuclear membranes, cell membranes, or larger structures such as tubules. Segmentation mask EPICs work with 2D and 3D images (see Fig 2).

A 3D EPIC will include:

If a 3D segmentation mask is based on a 3D image that’s been constructed from 2D serial sections, the “parent dataset” must be a 3D Reconstruction EPIC (see 3D Reconstruction below).

EPICs Figure 2

Figure 2: Segmentation mask EPICs can be derived from 2D or 3D images.

EPIC Example Type 2

Object by Analyte: Draft

This specification is undergoing final testing using example datasets. Object x analyte EPICs are data matrices containing the measured levels of a set of analytes across a set of objects such as cells or nuclei (see Fig 3). For example, this could be a matrix of barcoded cells x gene expression values or a matrix of barcoded cells x metabolite levels. The values might be raw computed gene expression values or normalized values spanning multiple datasets. Alternatively, an object x analyte dataset could simply denote a novel set of annotations for an existing analysis. In this case, it would not include the main data matrix.

EPICs Figure 3

Figure 3: Object x analyte EPICs capture non-image-based analyses.
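
To make the matrix layout concrete, here is a minimal sketch in plain Python. It is illustrative only: the cell barcodes, gene names, values, and the `column_for` helper are all hypothetical, and the sketch does not represent the file format defined by the EPIC specification.

```python
# Hypothetical object x analyte matrix: rows are objects (barcoded cells),
# columns are analytes (genes). All names and values are made up.
objects = ["AAACCTG-1", "AAACGGG-1", "AAAGATG-1"]
analytes = ["CD19", "MS4A1", "CD3E"]
matrix = [
    [5.0, 3.0, 0.0],
    [0.0, 0.0, 7.0],
    [4.0, 6.0, 1.0],
]

# Optional per-object annotations (for example, lab-derived cell types).
# An annotation-only dataset would carry this mapping without the matrix.
annotations = {
    "AAACCTG-1": "B cell",
    "AAACGGG-1": "T cell",
    "AAAGATG-1": "B cell",
}


def column_for(analyte):
    """Return the measured level of one analyte for every object."""
    j = analytes.index(analyte)
    return {obj: row[j] for obj, row in zip(objects, matrix)}


cd19 = column_for("CD19")
```

Keeping the object and analyte labels alongside the values is what lets a reader map any cell in the matrix back to a specific barcode and gene.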

EPIC Example Type 3

3D Reconstruction: Draft

This specification is in need of example datasets that can be used for testing. 3D reconstructions are pseudo 3D volumes generated from serial sections (see Fig 4). All images used to create the 3D reconstruction must be uploaded prior to uploading this EPIC. True 3D images, such as those from the lightsheet assay, should be uploaded as primary data rather than with this EPIC type. A 3D reconstruction might not be a comprehensive 3D volume, for example, if one of the serial sections is missing. Hence, these are considered “pseudo” 3D volumes and are sometimes referred to as “2.5D”. There is also no expectation with regard to the type of images used in a 3D reconstruction. For example, the reconstruction could include Visium, histology, and CODEX images.

EPICs Figure 4

Figure 4: A 3D reconstruction is typically a 3D image created by combining a set of 2D images.
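
The “pseudo 3D” idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a reconstruction method: the section indices, the tiny 2x2 images, and the `stack_sections` helper are hypothetical, and real reconstructions also involve image registration, which is omitted here.

```python
# Hypothetical serial sections keyed by section index; section 2 is missing.
# Each section is a tiny 2x2 grayscale image stored as nested lists.
sections = {
    0: [[1, 1], [1, 1]],
    1: [[2, 2], [2, 2]],
    3: [[4, 4], [4, 4]],
}


def stack_sections(sections):
    """Stack 2D sections in index order; missing indices become None."""
    first, last = min(sections), max(sections)
    return [sections.get(z) for z in range(first, last + 1)]


volume = stack_sections(sections)

# Record which section indices have no image, so the gap stays explicit.
missing = [
    z for z, plane in enumerate(volume, start=min(sections)) if plane is None
]
```

Keeping the gap explicit, rather than silently closing it, preserves the true spacing between the sections that are present.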

Cell Type Annotations

A critical component of EPICs is capturing lab-derived cell type annotations, specifically annotations curated by data providers with expert knowledge about the datasets. Although this is not yet possible, cell type annotations from EPICs will soon allow Data Portal users to query datasets with questions like “which datasets have B cells?”

For Segmentation Mask EPICs, the ingestion process collects the distinct object types in the masks, the annotation tools used, whether the mask is 2D or 3D, and more. This information is integrated into the database that underlies the Data Portal searches. The Data Portal is currently being updated to make use of this new information so that users can filter datasets by these details (e.g., “which contain FTUs annotated by FUSION?”). With the finalization of the Object x Analyte EPIC, the Data Portal will be expanded to allow for queries like “which RNA-seq datasets contain cell type annotations from Azimuth?”
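
As a rough illustration of the kind of filtering described above, the following Python sketch searches a handful of metadata records. Everything in it is an assumption made for illustration: the record fields (`object_types`, `annotation_tools`, `is_3d`), the dataset IDs, and the `find_datasets` helper are hypothetical and do not reflect the Data Portal's actual schema or API.

```python
# Hypothetical metadata records of the kind collected during ingestion.
datasets = [
    {"id": "DS-1", "object_types": {"FTU"},
     "annotation_tools": {"FUSION"}, "is_3d": False},
    {"id": "DS-2", "object_types": {"cell", "nucleus"},
     "annotation_tools": {"manual"}, "is_3d": True},
    {"id": "DS-3", "object_types": {"FTU", "cell"},
     "annotation_tools": {"manual"}, "is_3d": False},
]


def find_datasets(records, object_type, tool):
    """Return IDs of records whose masks contain object_type and list tool."""
    return [
        r["id"]
        for r in records
        if object_type in r["object_types"] and tool in r["annotation_tools"]
    ]


# "Which datasets contain FTUs annotated by FUSION?"
hits = find_datasets(datasets, "FTU", "FUSION")
```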

Uploading EPICs

Data providers can submit EPICs via the HuBMAP Ingest UI.

For more information on registering data via the Ingest UI, see Data Submission Guide - registering data.

Minimum Upload Requirements

Like non-EPIC data uploads, EPICs have file requirements for successful data submission (Fig 6):

Figure 6: Required metadata, contributor, and data files (and directories) for EPIC datasets.