ATACseq

NOTE: Several versions of this metadata schema have been created over time. The (Latest) version contains most attributes, but there may be some deprecated attributes in the older versions for which data has been collected. HuBMAP is in the process of creating a reference which combines all of these versions into a single view. That reference will be available here once completed.

Version 3 (Latest)

Attribute	Type	Description	Allowable Values	Required
analyte_class	Allowable Value	Analytes are the target molecules being measured with the assay.	`Chromatin` `DNA` `DNA + RNA` `Endogenous fluorophores` `Fluorochrome` `Lipid` `Metabolite` `Nucleic acid and protein` `Peptide` `Polysaccharide` `Protein` `RNA`	True
acquisition_instrument_vendor	Allowable Value	An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass.	`Akoya Biosciences` `Andor` `BGI Genomics` `Bruker` `Cytiva` `Evident Scientific (Olympus)` `GE Healthcare` `Hamamatsu` `Huron Digital Pathology` `Illumina` `In-House` `Ionpath` `Keyence` `Leica Biosystems` `Leica Microsystems` `Motic` `NanoString` `Resolve Biosciences` `Sciex` `Standard BioTools (Fluidigm)` `Thermo Fisher Scientific` `Zeiss Microscopy`	True
acquisition_instrument_model	Allowable Value	Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data.	`Aperio AT2` `Aperio CS2` `Axio Observer 3` `Axio Observer 5` `Axio Observer 7` `Axio Scan.Z1` `BZ-X710` `BZ-X800` `BZ-X810` `CosMx Spatial Molecular Imager` `Custom: Multiphoton` `Digital Spatial Profiler` `DM6 B` `DNBSEQ-T7` `EVOS M7000` `HiSeq 2500` `HiSeq 4000` `Hyperion Imaging System` `IN Cell Analyzer 2200` `Lightsheet 7` `MALDI timsTOF Flex Prototype` `MIBIscope` `MoticEasyScan One` `NanoZoomer 2.0-HT` `NanoZoomer S210` `NanoZoomer S360` `NanoZoomer S60` `NanoZoomer-SQ` `NextSeq 2000` `NextSeq 500` `NextSeq 550` `NovaSeq 6000` `NovaSeq X` `NovaSeq X Plus` `Orbitrap Eclipse Tribrid` `Orbitrap Fusion Lumos Tribrid` `Phenocycler-Fusion 1.0` `Phenocycler-Fusion 2.0` `PhenoImager Fusion` `Q Exactive` `Q Exactive HF` `Q Exactive UHMR` `QTRAP 5500` `Resolve Biosciences Molecular Cartography` `SCN400` `STELLARIS 5` `TissueScope LE Slide Scanner` `Unknown` `VS200 Slide Scanner` `Xenium Analyzer` `Zyla 4.2 sCMOS`	True
source_storage_duration_value	Numeric	How long was the source material (parent) stored, prior to this sample being processed.		True
source_storage_duration_unit	Allowable Value	The time duration unit of measurement	`hour` `month` `day` `minute` `year`	True
time_since_acquisition_instrument_calibration_value	Numeric	The amount of time since the acquisition instrument was last serviced by the vendor. This provides a metric for assessing drift in data capture.		False
time_since_acquisition_instrument_calibration_unit	Allowable Value	The time unit of measurement	`Column-by-column` `Not applicable` `Row-by-row` `Snake-by-columns` `Snake-by-rows`	False
contributors_path	Textfield	The path to the file with the ORCID IDs for all contributors of this dataset (e.g., “./extras/contributors.tsv” or “./contributors.tsv”). This is an internal metadata field that is just used for ingest.		True
data_path	Textfield	The top-level directory containing the raw and/or processed data. For a single dataset upload this might be “.”, whereas for a data upload containing multiple datasets this would be the directory name for the respective dataset. For instance, if the data is within a directory called “TEST001-RK”, use the syntax “./TEST001-RK” for this field. If there are multiple directory levels, use the format “./TEST001-RK/Run1/Pass2”, in which “Pass2” is the subdirectory where the single dataset’s data is stored. This is an internal metadata field that is just used for ingest.		True
barcode_read	Allowable Value	Which read file contains the cell or capture spot barcode. This should be included when constructing sequencing libraries with a non-commercial kit. This field is required if the source material is barcoded. This field is used to determine which analysis pipeline to run.	`Read 2 (R2)` `Read 1 (R1)` `Not applicable`	True
barcode_size	Allowable Value	Length of the cell or capture spot barcode in base pairs. Cell and capture spot barcodes are, for example, 3 x 8 bp sequences that are spaced by constant sequences, the offsets. This should be included when constructing sequencing libraries with a non-commercial kit. This field is required if the source material is barcoded. This field is used to determine which analysis pipeline to run.	`14` `16` `40` `8,8,8` `8,6` `Not applicable`	True
umi_read	Allowable Value	Which read file contains the UMI barcode. This should be included when constructing sequencing libraries with a non-commercial kit.	`Read 2 (R2)` `Read 1 (R1)` `Not applicable`	True
umi_size	Allowable Value	Length of the UMI barcode in base pairs. This should be included when constructing sequencing libraries with a non-commercial kit. This field is required if UMIs are present. This field is used to determine which analysis pipeline to run.	`8` `9` `10` `12` `Not applicable`	True
assay_input_entity	Allowable Value	This is the entity from which the analyte is being captured. For example, for bulk sequencing this would be “tissue”, while it would be “single cell” for single cell sequencing. This field is used to determine which analysis pipeline to run.	`area of interest` `single cell` `single nucleus` `spot` `tissue (bulk)`	True
number_of_input_cells_or_nuclei	Numeric	How many cells or nuclei were input to the assay? This is typically not available for preparations working with bulk tissue.		False
library_adapter_sequence	Textfield	5’ and/or 3’ read adapter sequences used as part of the library preparation protocol to render the library compatible with the sequencing protocol and instrumentation. This should be provided as comma-separated list of key:value pairs (adapter name:sequence).		True
library_average_fragment_size	Numeric	Average size of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation. Numeric value in base pairs (bp).		True
library_input_amount_value	Numeric	The amount of cDNA, after amplification, that was used for library construction.		False
library_input_amount_unit	Allowable Value	unit of library input amount value	`ng` `ul`	False
library_output_amount_value	Numeric	Total amount (e.g., nanograms) of library after the clean-up step that follows the final PCR amplification. Answer the question: What is the Qubit measured concentration (ng/ul) times the elution volume (ul) after the final clean-up step?		False
library_output_amount_unit	Allowable Value	Units of library final yield.	`ng` `ul`	False
library_concentration_value	Numeric	The concentration value of the pooled library samples submitted for sequencing.		True
library_concentration_unit	Allowable Value	Unit of library concentration value.	`ng/ul` `nM`	True
library_layout	Allowable Value	Whether the library was generated for single-end or paired-end sequencing.	`paired-end` `single-end`	True
number_of_pcr_cycles_for_indexing	Numeric	Number of PCR cycles performed in order to add adapters and amplify the library. This does not include the cDNA amplification which is captured in the “number of iterations of cDNA amplification” field.		True
library_preparation_kit	Allowable Value	Reagent kit used for library preparation	`10X Genomics; Automated Library Construction Kit` `24 rxns; PN 1000428` `10X Genomics; Chromium Next GEM Automated Single Cell 5' Kit v2` `24 rxns; PN 1000290` `10X Genomics; Chromium Next GEM Automated Single Cell 5' Kit v2` `4 rxns; PN 1000298` `10X Genomics; Chromium Next GEM Single Cell 3' GEM` `Library & Gel Bead Kit v3.1` `16 rxns; PN 1000121` `10X Genomics; Chromium Next GEM Single Cell 3' HT Kit v3.1` `48 rxns; PN 1000348` `10X Genomics; Chromium Next GEM Single Cell 3' HT Kit v3.1` `8 rxns; PN 1000370` `10X Genomics; Chromium Next GEM Single Cell 3' Kit v3.1` `16 rxns; PN 1000268` `10X Genomics; Chromium Next GEM Single Cell 3' Kit v3.1` `4 rxns; PN 1000269` `10X Genomics; Chromium Next GEM Single Cell 5' Kit v2` `16 rxns; PN 1000263` `10X Genomics; Chromium Next GEM Single Cell 5' Kit v2` `4 rxns; PN 1000265` `10X Genomics; Chromium Next GEM Single Cell Fixed RNA Hybridization & Library Kit` `4 rxns; PN 1000415` `10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle` `16 rxn; PN 1000283` `10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle` `4 rxn; PN 1000285` `10X Genomics; Chromium Single Cell 3' GEM` `Library & Gel Bead Kit v3` `4 rxns PN 1000092` `10X Genomics; Chromium Single Cell 3' Library & Gel Bead Kit` `4 rxns; PN 120267` `10X Genomics; Visium CytAssist Spatial Gene Expression for FFPE` `Human Transcriptome` `11 mm` `2 reactions; PN 1000522` `10X Genomics; Visium CytAssist Spatial Gene Expression for FFPE` `Human Transcriptome` `6.5mm` `4 reactions; PN 1000520` `10X Genomics; Visium Spatial for FFPE Gene Expression Kit` `Human Transcriptome` `1 slides` `4 reactions; PN 1000338` `10X Genomics; Visium Spatial for FFPE Gene Expression Kit` `Mouse Transcriptome` `4 rxns; PN 1000339` `10X Genomics; Visium Spatial Gene Expression Slide and Reagent Kit` `1 slides` `4 reactions; PN 1000187` `10X Genomics; Visium Spatial Gene Expression Slide and Reagent Kit` `4 slides` `16 reactions; PN 1000184` `Custom` `Illumina; TruSeq Stranded mRNA Library Prep (48 samples); PN 20020594` `Illumina; TruSeq Stranded mRNA Library Prep (96 samples); PN 20020595` `New England BioLabs; NEBNext Ultra II RNA Library Prep Kit for Illumina; PN E7770` `Parse Biosciences; Evercode WT Mini v2 Kit` `12 rxns; PN ECW02010` `Parse Biosciences; Evercode WT v2 Kit` `48 rxns; PN ECW02030)`	True
sample_indexing_kit	Allowable Value	Indexes are needed for multiplexing sequencing libraries for simultaneous sequencing (pooling) and proper attachment to the Illumina flowcell. Each indexing kit would have a number of compatible sequences (“sample indexing sets”) that are used to label some number of samples (the number of sets depend on the kit).	`10X Genomics; Chromium i7 Sample Index Plate (96 rxn); PN 220103` `10X Genomics; Dual Index Kit TS` `Set A; PN 1000251` `10X Genomics; Dual Index Kit TT` `Set A (96 rxn); PN 1000215` `10X Genomics; Single Index Kit N` `Set A (96 rxn); PN 1000212` `Custom` `Illumina; IDT for Illumina - TruSeq RNA UD Indexes v2 (96 Indexes` `96 Samples); PN 20040871` `Illumina; TruSeq RNA CD Index Plate (96 Indexes` `96 Samples); PN 20019792` `Illumina; TruSeq RNA Single Indexes Set A (12 Indexes` `48 Samples); PN 20020492` `Illumina; TruSeq RNA Single Indexes Set B (12 Indexes` `48 Samples); PN 20020493` `Integrated DNA Technologies: Custom DNA Oligos` `NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-AB` `NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-CD` `NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-EF` `NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-GH` `Not applicable` `Parse Biosciences; Fragmentation Reagents; PN WX100` `Parse Biosciences; UDI Plate - WT; PN UDI1001`	True
sample_indexing_set	Textfield	The specific sequencing barcode index set used, selected from the sample indexing kit. Example: For 10X this might be “SI-GA-A1”, for Nextera “N505 - CTCCTTAC”		True
is_technical_replicate	Allowable Value	Is the sequencing reaction run in replicate, “Yes” or “No”. If “Yes”, FASTQ files in dataset need to be merged.	`Yes` `No`	True
expected_entity_capture_count	Numeric	Number of cells, nuclei or capture spots expected to be captured by the assay. For Visium this is the total number of spots covered by tissue, within the capture area.		False
sequencing_reagent_kit	Allowable Value	Reagent kit used for sequencing	`Custom` `Illumina` `HiSeq 3000/4000 PE Cluster Kit PE-410-1001` `PN 1000283, Illumina` `NextSeq 1000/2000 P2 Reagent v3 Kit (100 Cycles)` `PN 20046811, Illumina` `NextSeq 1000/2000 P2 Reagent v3 Kit (200 Cycles)` `PN 20046812, Illumina` `NextSeq 1000/2000 P2 Reagent v3 Kit (300 Cycles)` `PN 20046813, Illumina` `NextSeq 2000 P3 Reagent Kit (300 Cycles)` `PN 20040561, Illumina` `NextSeq 2000 P3 Reagents Kit (100 Cycles)` `PN 20040559, Illumina` `NextSeq 500/550 Hi Output Kit 150 Cycles` `v2.5` `PN 20024907, Illumina` `NextSeq 500/550 Hi Output Kit 75 Cycles v2.5` `PN 20024906, Illumina` `NextSeq 500/550 Mid Output Kit 150 Cycles v2.5` `PN 20024904, Illumina` `NovaSeq 6000 S1 Reagent Kit (200 Cycles)` `PN 20012864, Illumina` `NovaSeq 6000 S1 Reagent v1.5 Kit (100 Cycles)` `PN 20028319, Illumina` `NovaSeq 6000 S1 Reagent v1.5 Kit (200 Cycles)` `PN 20028318, Illumina` `NovaSeq 6000 S1 Reagent v1.5 Kit (300 Cycles)` `PN 20028317, Illumina` `NovaSeq 6000 S2 Reagent v1.5 Kit (100 Cycles)` `PN 20028316, Illumina` `NovaSeq 6000 S4 Reagent Kit v1.5 (300 cycles)` `PN 20028312, Illumina` `NovaSeq 6000 S4 Reagent v1.5 Kit (200 Cycles)` `PN 20028313, Illumina` `NovaSeq 6000 SP Reagent v1.5 Kit (100 Cycles)` `PN 20028401, Illumina` `NovaSeq X Series 1.5B Reagent Kit (100 Cycle)` `PN 20104703, Illumina` `NovaSeq X Series 1.5B Reagent Kit (200 Cycle)` `PN 20104704, Illumina` `NovaSeq X Series 1.5B Reagent Kit (300 Cycle)` `PN 20104705, Illumina` `NovaSeq X Series 10B Reagent Kit (100 Cycle)` `PN 20085596, Illumina` `NovaSeq X Series 10B Reagent Kit (200 Cycle)` `PN 20085595, Illumina` `NovaSeq X Series 10B Reagent Kit (300 Cycle)` `PN 20085594`	True
sequencing_read_format	Textfield	Number of sequencing cycles in each round of sequencing (i.e., Read1, i7 index, i5 index, and Read2). This is reported as a comma-delimited list. Example: For 10X snATAC-seq (R1,Index,R2,R3) this might be: 50,8,16,50. For SNARE-seq2 this might be: 75,94,8,75		True
transposition_method	Allowable Value	Modality of capturing accessible chromatin molecules. For example, this would be the type of kit that was used.	`bulkATACseq` `sciATACseq` `Custom` `scATACseq`	True
sequencing_batch_id	Textfield	The ID for the sequencing run. This could, for example, be the chip ID and should allow users the ability to determine which samples were processed together in a sequencing run. It is recommended that data providers prefix the ID with the center name, to prevent values overlapping across centers.		False
capture_batch_id	Textfield	A lab-generated ID to identify which cells were captured at the same time. This would, for example, be an ID to denote which datasets were derived from a single 10X Genomics Chromium Controller run. In the case of the 10X Controller this could be the chip ID and would allow users the ability to determine which samples were processed together in a Chromium controller. It is recommended that data providers prefix the ID with the center name, to prevent values overlapping across centers.		False
preparation_instrument_vendor	Allowable Value	The manufacturer of the instrument used to prepare (staining/processing) the sample for the assay. If an automatic slide staining method was indicated this field should list the manufacturer of the instrument.	`10x Genomics` `Hamamatsu` `HTX Technologies` `In-House` `Leica Biosystems` `Not applicable` `Roche Diagnostics` `SunChrom` `Thermo Fisher Scientific`	False
preparation_instrument_model	Allowable Value	Manufacturers of a staining system instrument may offer various versions (models) of that instrument with different features. Differences in features or sensitivities may be relevant to processing or interpretation of the data.	`AutoStainer XL` `Chromium Connect` `Chromium Controller` `Chromium iX` `Chromium X` `Discovery Ultra` `EVOS M7000` `M3+ Sprayer` `M5 Sprayer` `NanoZoomer S210` `NanoZoomer S360` `NanoZoomer S60` `Not applicable` `ST5020 Multistainer` `Sublimator` `SunCollect Sprayer` `TM-Sprayer` `Visium CytAssist`	False
metadata_schema_id	Textfield	The string that serves as the definitive identifier for the metadata schema version and is readily interpretable by computers for data validation and processing. Example: 22bc762a-5020-419d-b170-24253ed9e8d9		True
preparation_instrument_kit	Allowable Value	The reagent kit used with the preparation instrument.	`10X Genomics; Chromium Next GEM Chip G Single Cell Kit` `16 rxns; PN 1000127` `10X Genomics; Chromium Next GEM Chip G Single Cell Kit` `48 rxns; PN 1000120` `10X Genomics; Chromium Next GEM Chip K Automated Single Cell Kit` `48 rxns; PN 1000289` `10X Genomics; Chromium Next GEM Chip K Single Cell Kit` `16 rxns; PN 1000287` `10X Genomics; Chromium Next GEM Chip K Single Cell Kit` `48 rxns; PN 1000286` `10X Genomics; Chromium Next GEM Chip Q Single Cell Kit` `16 rxns; PN 1000422` `10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle` `16 rxn; PN 1000283` `10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle` `4 rxn; PN 1000285` `10X Genomics; Visium FFPE Reagent Kit v2-Small` `PN 1000436` `Custom`	False
preparation_protocol_doi	Link	DOI for the protocols.io page that describes the assay or sample procurement and preparation. For example, for an imaging assay, the protocol might include staining of a section through the creation of an OME-TIFF file. In this case the protocol would include any image processing steps required to create the OME-TIFF file. Example: https://dx.doi.org/10.17504/protocols.io.eq2lyno9qvx9/v1		True
is_targeted	Allowable Value	Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay (“Yes” or “No”). The CODEX analyte is protein.	`Yes` `No`	True
parent_sample_id	Textfield	Unique HuBMAP or SenNet identifier of the sample (i.e., block, section or suspension) used to perform this assay. For example, for an RNAseq assay, the parent would be the suspension, whereas for one of the imaging assays, the parent would be the tissue section. If an assay comes from multiple parent samples, then this should be a comma-separated list. Example: HBM386.ZGKG.235, HBM672.MKPK.442 or SNT232.UBHJ.322, SNT329.ALSK.102		True
barcode_offset	Allowable Value	Positions in the read at which the cell or capture spot barcodes start. Cell and capture spot barcodes are, for example, 3 x 8 bp sequences that are spaced by constant sequences (the offsets). First barcode at position 0, then 38, then 76. This should be included when constructing sequencing libraries with a non-commercial kit.	`0` `8` `20` `1,27` `0,38,76` `10,48,86` `Not applicable`	True
umi_offset	Allowable Value	Position in the read at which the UMI barcode starts. This should be included when constructing sequencing libraries with a non-commercial kit.	`0` `16` `36` `Not applicable`	True
dataset_type	Allowable Value	The specific type of dataset being produced.	`10X Multiome` `2D Imaging Mass Cytometry` `ATACseq` `Auto-fluorescence` `Cell DIVE` `CODEX` `Confocal` `CosMx` `CyCIF` `DBiT` `DESI` `Enhanced Stimulated Raman Spectroscopy (SRS)` `GeoMx (nCounter)` `GeoMx (NGS)` `HiFi-Slide` `Histology` `LC-MS` `Light Sheet` `MALDI` `MERFISH` `MIBI` `Molecular Cartography` `MUSIC` `nanoSPLITS` `PhenoCycler` `Resolve` `RNAseq` `RNAseq (with probes)` `Second Harmonic Generation (SHG)` `SIMS` `SNARE-seq2` `Stereo-seq` `Thick section Multiphoton MxIF` `Visium (no probes)` `Visium (with probes)` `Xenium`	True
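
The `barcode_offset` and `barcode_size` descriptions in the table above can be read together: each offset marks where a barcode segment starts in the read, and the matching size gives that segment's length. The following is a minimal Python sketch of that reading, assuming both fields are comma-separated lists of equal length and that offsets are 0-based. The helper names (`parse_int_list`, `extract_barcode`) and the example read are hypothetical and are not part of any HuBMAP pipeline.

```python
def parse_int_list(value):
    """Parse a field such as "0,38,76" into [0, 38, 76]."""
    return [int(part) for part in value.split(",")]


def extract_barcode(read, barcode_offset, barcode_size):
    """Concatenate the barcode segments found in `read`.

    Returns None when either field is "Not applicable".
    """
    if "Not applicable" in (barcode_offset, barcode_size):
        return None
    offsets = parse_int_list(barcode_offset)
    sizes = parse_int_list(barcode_size)
    if len(offsets) != len(sizes):
        raise ValueError("barcode_offset and barcode_size must list the same number of segments")
    segments = []
    for start, length in zip(offsets, sizes):
        segment = read[start:start + length]
        if len(segment) != length:
            raise ValueError("read is too short for the declared barcode layout")
        segments.append(segment)
    return "".join(segments)


# Hypothetical 84-base read: three 8 bp barcode segments separated by 30 bp spacers,
# so the segments start at positions 0, 38 and 76.
read = "AAAAAAAA" + "N" * 30 + "CCCCCCCC" + "N" * 30 + "GGGGGGGG"
print(extract_barcode(read, "0,38,76", "8,8,8"))  # AAAAAAAACCCCCCCCGGGGGGGG
```

Treating the two fields as parallel lists keeps the single-segment case (for example offset `0` with size `16`) and the multi-segment case on the same code path.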

SNARE-seq2 / sciATACseq / snATACseq Version 1

Attribute	Type	Description	Allowable Values	Required
version	Allowable Value	Version of the schema to use when validating this metadata.	[‘1’]	True
description	Textfield	Free-text description of this assay.		True
donor_id	Textfield	HuBMAP Display ID of the donor of the assayed tissue.		True
tissue_id	Textfield	HuBMAP Display ID of the assayed tissue.		True
execution_datetime	Datetime	Start date and time of assay, typically a date-time stamped folder generated by the acquisition instrument. YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s, hh is the hour with leading zeros, mm are the minutes with leading zeros.		True
protocols_io_doi	Textfield	DOI for protocols.io referring to the protocol for this assay.		True
operator	Textfield	Name of the person responsible for executing the assay.		True
operator_email	Textfield	Email address for the operator.		True
pi	Textfield	Name of the principal investigator responsible for the data.		True
pi_email	Textfield	Email address for the principal investigator.		True
assay_category	Allowable Value	Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence.	[‘sequence’]	True
assay_type	Allowable Value	The specific type of assay being executed.	[‘SNARE-seq2’, ‘sciATACseq’, ‘snATACseq’]	True
analyte_class	Allowable Value	Analytes are the target molecules being measured with the assay.	[‘DNA’]	True
is_targeted	boolean	Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay.		True
acquisition_instrument_vendor	Textfield	An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass.		True
acquisition_instrument_model	Textfield	Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data.		True
is_technical_replicate	boolean	If TRUE, fastq files in dataset need to be merged.		True
library_id	Textfield	A library ID, unique within a TMC, which allows corresponding RNA and chromatin accessibility datasets to be linked.		True
sc_isolation_protocols_io_doi	Textfield	Link to a protocols document answering the question: How were single cells separated into a single-cell suspension?		True
sc_isolation_entity	Allowable Value	The type of single cell entity derived from isolation protocol.	[‘whole cell’, ‘nucleus’, ‘cell-cell multimer’, ‘spatially encoded cell barcoding’]	True
sc_isolation_tissue_dissociation	Textfield	The method by which tissues are dissociated into single cells in suspension.		True
sc_isolation_enrichment	Allowable Value	The method by which specific cell populations are sorted or enriched.	[‘none’, ‘FACS’]	True
sc_isolation_quality_metric	Textfield	A quality metric by visual inspection prior to cell lysis or defined by known parameters such as wells with several cells or no cells. This can be captured at a high level. “OK” or “not OK”, or with more specificity such as “debris”, “clump”, “low clump”.		True
sc_isolation_cell_number	Numeric	Total number of cell/nuclei yielded post dissociation and enrichment.		True
transposition_input	Numeric	Number of cell/nuclei input to the assay.		True
transposition_method	Allowable Value	Modality of capturing accessible chromatin molecules.	[‘SNARE-Seq2-AC’, ‘bulkATACseq’, ‘snATACseq’, ‘sciATACseq’]	True
transposition_transposase_source	Allowable Value	The source of the Tn5 transposase and transposon used for capturing accessible chromatin.	[‘10X snATAC’, ‘In-house’, ‘Nextera’, ‘10X multiome’]	True
transposition_kit_number	Textfield	If Tn5 came from a kit, provide the catalog number.		False
library_construction_protocols_io_doi	Textfield	A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”. DOI for protocols.io referring to the protocol for this assay.		True
library_layout	Allowable Value	State whether the library was generated for single-end or paired-end sequencing.	[‘single-end’, ‘paired-end’]	True
library_adapter_sequence	Textfield	Adapter sequence to be used for adapter trimming.		True
cell_barcode_read	Textfield	Which read file(s) contains the cell barcode. Multiple cell_barcode_read files must be provided as a comma-delimited list (e.g. file1,file2,file3). This field is not required for barcoding by single-cell combinatorial indexing.		False
cell_barcode_offset	Textfield	Positions in the read at which the cell barcodes start. Cell barcodes are, for example, 3 x 8 bp sequences that are spaced by constant sequences (the offsets). First barcode at position 0, then 38, then 76. (Does not apply to sciATACseq, SNARE-seq and BulkATAC.)		False
cell_barcode_size	Textfield	Length of the cell barcode in base pairs. Cell barcodes are, for example, 3 x 8 bp sequences that are spaced by constant sequences, the offsets. (Does not apply to sciATACseq, SNARE-seq and BulkATAC.)		False
library_pcr_cycles	Numeric	Number of PCR cycles to enrich for accessible chromatin fragments.		True
library_pcr_cycles_for_sample_index	Numeric	Number of PCR cycles performed for library generation (figure in Descriptions section)		True
library_final_yield	Numeric	Total ng of library after final pcr amplification step.		True
library_final_yield_unit	Allowable Value	Units for library_final_yield	[‘ng’]	False
library_average_fragment_size	Numeric	Average size in basepairs (bp) of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation.		True
sequencing_reagent_kit	Textfield	Reagent kit used for sequencing		True
sequencing_read_format	Textfield	Slash-delimited list of the number of sequencing cycles for, for example, Read1, i7 index, i5 index, and Read2.		True
sequencing_read_percent_q30	Numeric	Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …)		True
sequencing_phix_percent	Numeric	Percent PhiX loaded to the run		True
contributors_path	Textfield	Relative path to file with ORCID IDs for contributors for this dataset.		True
data_path	Textfield	Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions.		True
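
The `sequencing_read_percent_q30` description above spells out only the weighted sum. A short Python sketch of the full calculation follows, assuming the weights are the number of bases contributed by each read type and that the sum is divided by the total number of bases (the usual definition of a weighted average). The function name `weighted_q30` and the numbers are illustrative only.

```python
def weighted_q30(bases_per_read, q30_per_read):
    """Base-weighted average of per-read Q30 percentages.

    bases_per_read: number of bases for each read type (e.g. UMI, R2).
    q30_per_read:   Q30 percentage reported for the same read types.
    """
    if len(bases_per_read) != len(q30_per_read):
        raise ValueError("both lists must describe the same read types")
    total_bases = sum(bases_per_read)
    if total_bases == 0:
        raise ValueError("at least one read type must contribute bases")
    weighted_sum = sum(b * q for b, q in zip(bases_per_read, q30_per_read))
    return weighted_sum / total_bases


# Hypothetical run: 10 UMI bases at 80% Q30 and 90 R2 bases at 90% Q30.
print(weighted_q30([10, 90], [80.0, 90.0]))  # 89.0
```

Because longer reads contribute more bases, their Q30 values dominate the result, which is why a plain mean of the per-read percentages would not match this field.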

SNARE-seq2 / sciATACseq / snATACseq Version 0

Attribute	Type	Description	Allowable Values	Required
donor_id	Textfield	HuBMAP Display ID of the donor of the assayed tissue.		True
tissue_id	Textfield	HuBMAP Display ID of the assayed tissue.		True
execution_datetime	Datetime	Start date and time of assay, typically a date-time stamped folder generated by the acquisition instrument. YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s, hh is the hour with leading zeros, mm are the minutes with leading zeros.		True
protocols_io_doi	Textfield	DOI for protocols.io referring to the protocol for this assay.		True
operator	Textfield	Name of the person responsible for executing the assay.		True
operator_email	Textfield	Email address for the operator.		True
pi	Textfield	Name of the principal investigator responsible for the data.		True
pi_email	Textfield	Email address for the principal investigator.		True
assay_category	Allowable Value	Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence.	[‘sequence’]	True
assay_type	Allowable Value	The specific type of assay being executed.	[‘scRNAseq-10xGenomics’, ‘snRNAseq-10xGenomics-v2’, ‘snRNAseq-10xGenomics-v3’, ‘scRNAseq’, ‘sciRNAseq’, ‘snRNAseq’, ‘SNARE2-RNAseq’]	True
analyte_class	Allowable Value	Analytes are the target molecules being measured with the assay.	[‘RNA’]	True
is_targeted	boolean	Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay.		True
acquisition_instrument_vendor	Textfield	An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass.		True
acquisition_instrument_model	Textfield	Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data.		True
sc_isolation_protocols_io_doi	Textfield	Link to a protocols document answering the question: How were single cells separated into a single-cell suspension?		True
sc_isolation_entity	Textfield	The type of single cell entity derived from isolation protocol		True
sc_isolation_tissue_dissociation	Textfield	The method by which tissues are dissociated into single cells in suspension.		True
sc_isolation_enrichment	Textfield	The method by which specific cell populations are sorted or enriched.		False
sc_isolation_quality_metric	Textfield	A quality metric by visual inspection prior to cell lysis or defined by known parameters such as wells with several cells or no cells. This can be captured at a high level.		True
sc_isolation_cell_number	Numeric	Total number of cell/nuclei yielded post dissociation and enrichment		True
rnaseq_assay_input	Numeric	Number of cell/nuclei input to the assay		True
rnaseq_assay_method	Textfield	The kit used for the RNA sequencing assay		True
library_construction_protocols_io_doi	Textfield	A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”.		True
library_layout	Allowable Value	State whether the library was generated for single-end or paired-end sequencing.	[‘single-end’, ‘paired-end’]	True
library_adapter_sequence	Textfield	Adapter sequence to be used for adapter trimming		True
library_id	Textfield	A library ID, unique within a TMC, which allows corresponding RNA and chromatin accessibility datasets to be linked.		True
is_technical_replicate	boolean	Is the sequencing reaction run in replicate, TRUE or FALSE.		True
cell_barcode_read	Textfield	Which read file contains the cell barcode		True
cell_barcode_offset	Textfield	Position(s) in the read at which the cell barcode starts.		True
cell_barcode_size	Textfield	Length of the cell barcode in base pairs		True
library_pcr_cycles	Numeric	Number of PCR cycles to amplify cDNA		True
library_pcr_cycles_for_sample_index	Numeric	Number of PCR cycles performed for library indexing		True
library_final_yield_value	Numeric	Total number of ng of library after final pcr amplification step. This is the concentration (ng/ul) * volume (ul)		True
library_final_yield_unit	Allowable Value	Units of final library yield	[‘ng’]	False
library_average_fragment_size	Numeric	Average size in basepairs (bp) of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation.		True
sequencing_reagent_kit	Textfield	Reagent kit used for sequencing		True
sequencing_read_format	Textfield	Slash-delimited list of the number of sequencing cycles for, for example, Read1, i7 index, i5 index, and Read2.		True
sequencing_read_percent_q30	Numeric	Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …)		True
sequencing_phix_percent	Numeric	Percent PhiX loaded to the run		True
contributors_path	Textfield	Relative path to file with ORCID IDs for contributors for this dataset.		True
data_path	Textfield	Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions.		True
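
The `execution_datetime` format described above (YYYY-MM-DD hh:mm with leading zeros) can be checked before submission. The sketch below is one possible check in Python, not the validator HuBMAP uses. The function name `parse_execution_datetime` is hypothetical. The regular expression is there because `datetime.strptime` on its own also accepts values without leading zeros.

```python
import re
from datetime import datetime

# Four-digit year, then two-digit month, day, hour and minute.
_EXECUTION_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}")


def parse_execution_datetime(value):
    """Return a datetime for a value in YYYY-MM-DD hh:mm form, or raise ValueError."""
    if not _EXECUTION_DATETIME.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD hh:mm with leading zeros, got {value!r}")
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(value, "%Y-%m-%d %H:%M")


print(parse_execution_datetime("2020-03-07 09:05"))  # 2020-03-07 09:05:00
```

A value such as `2020-3-7 9:05` fails the pattern check and raises `ValueError`.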

bulkATACseq Version 1

Attribute	Type	Description	Allowable Values	Required
version	Allowable Value	Version of the schema to use when validating this metadata.	[‘1’]	True
description	Textfield	Free-text description of this assay.		True
donor_id	Textfield	HuBMAP Display ID of the donor of the assayed tissue.		True
tissue_id	Textfield	HuBMAP Display ID of the assayed tissue.		True
execution_datetime	Datetime	Start date and time of assay, typically a date-time stamped folder generated by the acquisition instrument. YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s, hh is the hour with leading zeros, mm are the minutes with leading zeros.		True
protocols_io_doi	Textfield	DOI for protocols.io referring to the protocol for this assay.		True
operator	Textfield	Name of the person responsible for executing the assay.		True
operator_email	Textfield	Email address for the operator.		True
pi	Textfield	Name of the principal investigator responsible for the data.		True
pi_email	Textfield	Email address for the principal investigator.		True
assay_category	Allowable Value	Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence.	[‘sequence’]	True
assay_type	Allowable Value	The specific type of assay being executed.	[‘bulkATACseq’]	True
analyte_class	Allowable Value	Analytes are the target molecules being measured with the assay.	[‘DNA’]	True
is_targeted	boolean	Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay.		True
acquisition_instrument_vendor	Textfield	An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass.		True
acquisition_instrument_model	Textfield	Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data.		True
bulk_transposition_input_number_nuclei	Textfield	A number (no comma separators)		True
bulk_atac_cell_isolation_protocols_io_doi	Textfield	Link to a protocols document answering the question: How was tissue stored and processed for cell/nuclei isolation?		True
is_technical_replicate	boolean	Is this a sequencing replicate?		True
library_adapter_sequence	Textfield	Adapter sequence to be used for adapter trimming		True
library_average_fragment_size	Numeric	Average size in basepairs (bp) of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation.		True
library_concentration_value	Numeric	The concentration value of the pooled library samples submitted for sequencing.		True
library_concentration_unit	Allowable Value	Unit of library_concentration_value	[‘nM’]	False
library_construction_protocols_io_doi	Textfield	A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”.		True
library_creation_date	Datetime	Date and time of library creation. YYYY-MM-DD, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s.		False
library_final_yield_value	Numeric	Total mass of library in ng after the final PCR amplification step, calculated as concentration (ng/ul) * volume (ul).		True
library_final_yield_unit	Allowable Value	Units of final library yield	[‘ng’]	False
library_id	Textfield	A library ID, unique within a TMC, which allows corresponding RNA and chromatin accessibility datasets to be linked.		True
library_layout	Allowable Value	State whether the library was generated for single-end or paired end sequencing.	[‘single-end’, ‘paired-end’]	True
library_pcr_cycles	Numeric	Number of PCR cycles performed in order to add adapters and amplify the library. Usually, this includes 5 pre-amplification cycles followed by 0-5 additional cycles determined by qPCR.		True
library_preparation_kit	Textfield	Reagent kit used for library preparation		True
sample_quality_metric	Textfield	This is a quality metric by visual inspection. This should answer the question: Are the nuclei intact and are the nuclei free of significant amounts of debris? This can be captured at a high level, “OK” or “not OK”.		True
sequencing_phix_percent	Numeric	Percent PhiX loaded to the run		True
sequencing_read_format	Textfield	Slash-delimited list of the number of sequencing cycles for, for example, Read1, i7 index, i5 index, and Read2.		True
sequencing_read_percent_q30	Numeric	Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …)		True
sequencing_reagent_kit	Textfield	Reagent kit used for sequencing		True
transposition_kit_number	Textfield	If Tn5 came from a kit, provide the catalog number.		False
transposition_method	Textfield	Modality of capturing accessible chromatin molecules. The kit used, for example.		True
transposition_transposase_source	Textfield	The source of the Tn5 transposase and transposon used for capturing accessible chromatin.		True
contributors_path	Textfield	Relative path to file with ORCID IDs for contributors for this dataset.		True
data_path	Textfield	Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions.		True
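
Several rows of the Version 1 table above constrain their values: the allowable-value lists, the execution_datetime format, and the "no comma separators" rule for bulk_transposition_input_number_nuclei. The following is a minimal sketch of checking one metadata row against those constraints. It is not the official validator. The function name `check_row` is hypothetical, and the code assumes that "leading zeros" means fixed-width fields and that the nuclei count is a non-negative integer.

```python
import re
from datetime import datetime

# Allowable values copied from the Version 1 table above.
REQUIRED_ALLOWED = {
    "version": {"1"},
    "assay_category": {"sequence"},
    "assay_type": {"bulkATACseq"},
    "analyte_class": {"DNA"},
    "library_layout": {"single-end", "paired-end"},
}
# These fields are not required, so an empty value is accepted.
OPTIONAL_ALLOWED = {
    "library_concentration_unit": {"nM"},
    "library_final_yield_unit": {"ng"},
}

# Assumption: "leading zeros" means every component is fixed-width.
DATETIME_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}")


def check_row(row: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems found in one metadata row."""
    errors = []
    for field, allowed in REQUIRED_ALLOWED.items():
        if row.get(field) not in allowed:
            errors.append(f"{field}: expected one of {sorted(allowed)}")
    for field, allowed in OPTIONAL_ALLOWED.items():
        value = row.get(field, "")
        if value and value not in allowed:
            errors.append(f"{field}: expected one of {sorted(allowed)}")

    stamp = row.get("execution_datetime", "")
    if DATETIME_PATTERN.fullmatch(stamp) is None:
        errors.append("execution_datetime: expected YYYY-MM-DD hh:mm")
    else:
        try:
            # Rejects impossible values such as month 13 or 30 February.
            datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        except ValueError:
            errors.append("execution_datetime: not a real date or time")

    # Assumption: the nuclei count is a non-negative integer.
    nuclei = row.get("bulk_transposition_input_number_nuclei", "")
    if re.fullmatch(r"[0-9]+", nuclei) is None:
        errors.append("bulk_transposition_input_number_nuclei: digits only")
    return errors
```

The regular expression runs before `datetime.strptime` because `strptime` on its own accepts values without leading zeros, such as `2020-1-05 3:04`.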

bulkATACseq Version 0

Attribute	Type	Description	Allowable Values	Required
version	Allowable Value	Version of the schema to use when validating this metadata.	[‘1’]	True
description	Textfield	Free-text description of this assay.		True
donor_id	Textfield	HuBMAP Display ID of the donor of the assayed tissue.		True
tissue_id	Textfield	HuBMAP Display ID of the assayed tissue.		True
execution_datetime	Datetime	Start date and time of assay, typically a date-time stamped folder generated by the acquisition instrument. YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s, hh is the hour with leading zeros, mm are the minutes with leading zeros.		True
protocols_io_doi	Textfield	DOI for protocols.io referring to the protocol for this assay.		True
operator	Textfield	Name of the person responsible for executing the assay.		True
operator_email	Textfield	Email address for the operator.		True
pi	Textfield	Name of the principal investigator responsible for the data.		True
pi_email	Textfield	Email address for the principal investigator.		True
assay_category	Allowable Value	Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence.	[‘sequence’]	True
assay_type	Allowable Value	The specific type of assay being executed.	[‘bulkATACseq’]	True
analyte_class	Allowable Value	Analytes are the target molecules being measured with the assay.	[‘DNA’]	True
is_targeted	boolean	Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay.		True
acquisition_instrument_vendor	Textfield	An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass.		True
acquisition_instrument_model	Textfield	Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data.		True
bulk_transposition_input_number_nuclei	Textfield	A number (no comma separators)		True
bulk_atac_cell_isolation_protocols_io_doi	Textfield	Link to a protocols document answering the question: How was tissue stored and processed for cell/nuclei isolation?		True
is_technical_replicate	boolean	Is this a sequencing replicate?		True
library_adapter_sequence	Textfield	Adapter sequence to be used for adapter trimming		True
library_average_fragment_size	Numeric	Average size in basepairs (bp) of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation.		True
library_concentration_value	Numeric	The concentration value of the pooled library samples submitted for sequencing.		True
library_concentration_unit	Allowable Value	Unit of library_concentration_value	[‘nM’]	False
library_construction_protocols_io_doi	Textfield	A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”.		True
library_creation_date	Datetime	Date and time of library creation. YYYY-MM-DD, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s.		False
library_final_yield_value	Numeric	Total mass of library in ng after the final PCR amplification step, calculated as concentration (ng/ul) * volume (ul).		True
library_final_yield_unit	Allowable Value	Units of final library yield	[‘ng’]	False
library_id	Textfield	A library ID, unique within a TMC, which allows corresponding RNA and chromatin accessibility datasets to be linked.		True
library_layout	Allowable Value	State whether the library was generated for single-end or paired end sequencing.	[‘single-end’, ‘paired-end’]	True
library_pcr_cycles	Numeric	Number of PCR cycles performed in order to add adapters and amplify the library. Usually, this includes 5 pre-amplification cycles followed by 0-5 additional cycles determined by qPCR.		True
library_preparation_kit	Textfield	Reagent kit used for library preparation		True
sample_quality_metric	Textfield	This is a quality metric by visual inspection. This should answer the question: Are the nuclei intact and are the nuclei free of significant amounts of debris? This can be captured at a high level, “OK” or “not OK”.		True
sequencing_phix_percent	Numeric	Percent PhiX loaded to the run		True
sequencing_read_format	Textfield	Slash-delimited list of the number of sequencing cycles for, for example, Read1, i7 index, i5 index, and Read2.		True
sequencing_read_percent_q30	Numeric	Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …)		True
sequencing_reagent_kit	Textfield	Reagent kit used for sequencing		True
transposition_kit_number	Textfield	If Tn5 came from a kit, provide the catalog number.		False
transposition_method	Textfield	Modality of capturing accessible chromatin molecules. The kit used, for example.		True
transposition_transposase_source	Textfield	The source of the Tn5 transposase and transposon used for capturing accessible chromatin.		True
contributors_path	Textfield	Relative path to file with ORCID IDs for contributors for this dataset.		True
data_path	Textfield	Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions.		True
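
The Required column in the bulkATACseq tables above marks four attributes as False: library_concentration_unit, library_creation_date, library_final_yield_unit, and transposition_kit_number. The following is a minimal sketch of finding empty cells in every other column. It assumes the metadata is supplied as tab-separated text with one header row, which this page does not state. The function name and the sample input are hypothetical.

```python
import csv
import io

# Attributes marked Required = False in the bulkATACseq tables above.
OPTIONAL_FIELDS = {
    "library_concentration_unit",
    "library_creation_date",
    "library_final_yield_unit",
    "transposition_kit_number",
}


def empty_required_cells(tsv_text: str) -> list[tuple[int, str]]:
    """Return (row_number, field) pairs for required cells that are empty.

    Assumption: input is tab-separated with a single header row.
    Row numbers count data rows, starting at 1.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    for row_number, row in enumerate(reader, start=1):
        for field, value in row.items():
            if field is None or field in OPTIONAL_FIELDS:
                # A None key holds surplus cells beyond the header; skipped here.
                continue
            # DictReader fills cells missing from a short row with None.
            if value is None or not value.strip():
                problems.append((row_number, field))
    return problems


# Hypothetical three-column excerpt, for illustration only.
sample = "library_id\tlibrary_creation_date\tdata_path\nLIB1\t\t\n"
missing = empty_required_cells(sample)
```

This sketch treats any column not listed as optional as required, so it checks only the columns present in the header and does not detect a required column that is absent altogether.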