RNAseq
NOTE: Several versions of this metadata schema have been created over time. The latest version (marked “current”) contains most attributes, but older versions may include deprecated attributes for which data has been collected. HuBMAP is creating a reference that combines all of these versions into a single view. That reference will be available here once completed.
RNAseq Version 5 (current)
Attribute | Type | Description | Allowable Values | Required |
---|---|---|---|---|
analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | Chromatin DNA DNA + RNA Endogenous fluorophores Fluorochrome Lipid Metabolite Nucleic acid and protein Peptide Polysaccharide Protein RNA | True |
acquisition_instrument_vendor | Allowable Value | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. | Akoya Biosciences Andor BGI Genomics Bruker Cytiva Evident Scientific (Olympus) GE Healthcare Hamamatsu Huron Digital Pathology Illumina In-House Ionpath Keyence Leica Biosystems Leica Microsystems Motic NanoString Resolve Biosciences Sciex Standard BioTools (Fluidigm) Thermo Fisher Scientific Zeiss Microscopy | True |
acquisition_instrument_model | Allowable Value | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | Aperio AT2 Aperio CS2 Axio Observer 3 Axio Observer 5 Axio Observer 7 Axio Scan.Z1 BZ-X710 BZ-X800 BZ-X810 CosMx Spatial Molecular Imager Custom: Multiphoton Digital Spatial Profiler DM6 B DNBSEQ-T7 EVOS M7000 HiSeq 2500 HiSeq 4000 Hyperion Imaging System IN Cell Analyzer 2200 Lightsheet 7 MALDI timsTOF Flex Prototype MIBIscope MoticEasyScan One NanoZoomer 2.0-HT NanoZoomer S210 NanoZoomer S360 NanoZoomer S60 NanoZoomer-SQ NextSeq 2000 NextSeq 500 NextSeq 550 NovaSeq 6000 NovaSeq X NovaSeq X Plus Orbitrap Eclipse Tribrid Orbitrap Fusion Lumos Tribrid Phenocycler-Fusion 1.0 Phenocycler-Fusion 2.0 PhenoImager Fusion Q Exactive Q Exactive HF Q Exactive UHMR QTRAP 5500 Resolve Biosciences Molecular Cartography SCN400 STELLARIS 5 TissueScope LE Slide Scanner Unknown VS200 Slide Scanner Xenium Analyzer Zyla 4.2 sCMOS | True |
source_storage_duration_value | Numeric | How long the source material (parent) was stored before this sample was processed. | | True |
source_storage_duration_unit | Allowable Value | The time duration unit of measurement | hour month day minute year | True |
time_since_acquisition_instrument_calibration_value | Numeric | The amount of time since the acquisition instrument was last serviced by the vendor. This provides a metric for assessing drift in data capture. | | False |
time_since_acquisition_instrument_calibration_unit | Allowable Value | The time unit of measurement | Column-by-column Not applicable Row-by-row Snake-by-columns Snake-by-rows | False |
contributors_path | Textfield | The path to the file with the ORCID IDs for all contributors of this dataset (e.g., “./extras/contributors.tsv” or “./contributors.tsv”). This is an internal metadata field that is just used for ingest. | | True |
data_path | Textfield | The top level directory containing the raw and/or processed data. For a single dataset upload this might be “.”, whereas for a data upload containing multiple datasets this would be the directory name for the respective dataset. For instance, if the data is within a directory called “TEST001-RK” use syntax “./TEST001-RK” for this field. If there are multiple directory levels, use the format “./TEST001-RK/Run1/Pass2” in which “Pass2” is the subdirectory where the single dataset’s data is stored. This is an internal metadata field that is just used for ingest. | | True |
barcode_read | Allowable Value | Which read file contains the cell or capture spot barcode. This should be included when constructing sequencing libraries with a non-commercial kit. This field is required if the source material is barcoded. This field is used to determine which analysis pipeline to run. | Read 2 (R2) Read 1 (R1) Not applicable | True |
barcode_size | Allowable Value | Length of the cell or capture spot barcode in base pairs. Cell and capture spot barcodes are, for example, 3 x 8 bp sequences that are spaced by constant sequences, the offsets. This should be included when constructing sequencing libraries with a non-commercial kit. This field is required if the source material is barcoded. This field is used to determine which analysis pipeline to run. | 14 16 40 8,8,8 8,6 Not applicable | True |
umi_read | Allowable Value | Which read file contains the UMI barcode. This should be included when constructing sequencing libraries with a non-commercial kit. | Read 2 (R2) Read 1 (R1) Not applicable | True |
umi_size | Allowable Value | Length of the UMI barcode in base pairs. This should be included when constructing sequencing libraries with a non-commercial kit. This field is required if UMIs are present. This field is used to determine which analysis pipeline to run. | 8 9 10 12 Not applicable | True |
assay_input_entity | Allowable Value | This is the entity from which the analyte is being captured. For example, for bulk sequencing this would be “tissue”, while it would be “single cell” for single cell sequencing. This field is used to determine which analysis pipeline to run. | area of interest single cell single nucleus spot tissue (bulk) | True |
number_of_input_cells_or_nuclei | Numeric | How many cells or nuclei were input to the assay? This is typically not available for preparations working with bulk tissue. | | False |
library_adapter_sequence | Textfield | 5’ and/or 3’ read adapter sequences used as part of the library preparation protocol to render the library compatible with the sequencing protocol and instrumentation. This should be provided as a comma-separated list of key:value pairs (adapter name:sequence). | | True |
library_average_fragment_size | Numeric | Average size of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation. Numeric value in base pairs (bp). | | True |
library_input_amount_value | Numeric | The amount of cDNA, after amplification, that was used for library construction. | | True |
library_input_amount_unit | Allowable Value | Unit of library input amount value. | ng ul | True |
library_output_amount_value | Numeric | Total amount (e.g., nanograms) of library after the clean-up step of the final PCR amplification step. Answer the question: What is the Qubit-measured concentration (ng/ul) times the elution volume (ul) after the final clean-up step? | | False |
library_output_amount_unit | Allowable Value | Units of library final yield. | ng ul | False |
library_concentration_value | Numeric | The concentration value of the pooled library samples submitted for sequencing. | | True |
library_concentration_unit | Allowable Value | Unit of library concentration value. | ng/ul nM | True |
library_layout | Allowable Value | Whether the library was generated for single-end or paired-end sequencing. | paired-end single-end | True |
number_of_pcr_cycles_for_indexing | Numeric | Number of PCR cycles performed in order to add adapters and amplify the library. This does not include the cDNA amplification which is captured in the “number of iterations of cDNA amplification” field. | | True |
library_preparation_kit | Allowable Value | Reagent kit used for library preparation | 10X Genomics; Automated Library Construction Kit 24 rxns; PN 1000428 10X Genomics; Chromium Next GEM Automated Single Cell 5' Kit v2 24 rxns; PN 1000290 10X Genomics; Chromium Next GEM Automated Single Cell 5' Kit v2 4 rxns; PN 1000298 10X Genomics; Chromium Next GEM Single Cell 3' GEM Library & Gel Bead Kit v3.1 16 rxns; PN 1000121 10X Genomics; Chromium Next GEM Single Cell 3' HT Kit v3.1 48 rxns; PN 1000348 10X Genomics; Chromium Next GEM Single Cell 3' HT Kit v3.1 8 rxns; PN 1000370 10X Genomics; Chromium Next GEM Single Cell 3' Kit v3.1 16 rxns; PN 1000268 10X Genomics; Chromium Next GEM Single Cell 3' Kit v3.1 4 rxns; PN 1000269 10X Genomics; Chromium Next GEM Single Cell 5' Kit v2 16 rxns; PN 1000263 10X Genomics; Chromium Next GEM Single Cell 5' Kit v2 4 rxns; PN 1000265 10X Genomics; Chromium Next GEM Single Cell Fixed RNA Hybridization & Library Kit 4 rxns; PN 1000415 10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle 16 rxn; PN 1000283 10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle 4 rxn; PN 1000285 10X Genomics; Chromium Single Cell 3' GEM Library & Gel Bead Kit v3 4 rxns PN 1000092 10X Genomics; Chromium Single Cell 3' Library & Gel Bead Kit 4 rxns; PN 120267 10X Genomics; Visium CytAssist Spatial Gene Expression for FFPE Human Transcriptome 11 mm 2 reactions; PN 1000522 10X Genomics; Visium CytAssist Spatial Gene Expression for FFPE Human Transcriptome 6.5mm 4 reactions; PN 1000520 10X Genomics; Visium Spatial for FFPE Gene Expression Kit Human Transcriptome 1 slides 4 reactions; PN 1000338 10X Genomics; Visium Spatial for FFPE Gene Expression Kit Mouse Transcriptome 4 rxns; PN 1000339 10X Genomics; Visium Spatial Gene Expression Slide and Reagent Kit 1 slides 4 reactions; PN 1000187 10X Genomics; Visium Spatial Gene Expression Slide and Reagent Kit 4 slides 16 reactions; PN 1000184 Custom Illumina; TruSeq Stranded mRNA Library Prep (48 samples); PN 20020594 Illumina; TruSeq Stranded mRNA Library Prep (96 samples); PN 20020595 New England BioLabs; NEBNext Ultra II RNA Library Prep Kit for Illumina; PN E7770 Parse Biosciences; Evercode WT Mini v2 Kit 12 rxns; PN ECW02010 Parse Biosciences; Evercode WT v2 Kit 48 rxns; PN ECW02030 | True |
sample_indexing_kit | Allowable Value | Indexes are needed for multiplexing sequencing libraries for simultaneous sequencing (pooling) and proper attachment to the Illumina flowcell. Each indexing kit would have a number of compatible sequences (“sample indexing sets”) that are used to label some number of samples (the number of sets depends on the kit). | 10X Genomics; Chromium i7 Sample Index Plate (96 rxn); PN 220103 10X Genomics; Dual Index Kit TS Set A; PN 1000251 10X Genomics; Dual Index Kit TT Set A (96 rxn); PN 1000215 10X Genomics; Single Index Kit N Set A (96 rxn); PN 1000212 Custom Illumina; IDT for Illumina - TruSeq RNA UD Indexes v2 (96 Indexes 96 Samples); PN 20040871 Illumina; TruSeq RNA CD Index Plate (96 Indexes 96 Samples); PN 20019792 Illumina; TruSeq RNA Single Indexes Set A (12 Indexes 48 Samples); PN 20020492 Illumina; TruSeq RNA Single Indexes Set B (12 Indexes 48 Samples); PN 20020493 Integrated DNA Technologies: Custom DNA Oligos NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-AB NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-CD NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-EF NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-GH Not applicable Parse Biosciences; Fragmentation Reagents; PN WX100 Parse Biosciences; UDI Plate - WT; PN UDI1001 | True |
sample_indexing_set | Textfield | The specific sequencing barcode index set used, selected from the sample indexing kit. Example: For 10X this might be “SI-GA-A1”, for Nextera “N505 - CTCCTTAC” | | True |
is_technical_replicate | Allowable Value | Is the sequencing reaction run in replicate, “Yes” or “No”. If “Yes”, FASTQ files in dataset need to be merged. | Yes No | True |
expected_entity_capture_count | Numeric | Number of cells, nuclei or capture spots expected to be captured by the assay. For Visium this is the total number of spots covered by tissue, within the capture area. | | False |
sequencing_reagent_kit | Allowable Value | Reagent kit used for sequencing | Custom Illumina; HiSeq 3000/4000 PE Cluster Kit PE-410-1001; PN 1000283 Illumina; NextSeq 1000/2000 P2 Reagent v3 Kit (100 Cycles); PN 20046811 Illumina; NextSeq 1000/2000 P2 Reagent v3 Kit (200 Cycles); PN 20046812 Illumina; NextSeq 1000/2000 P2 Reagent v3 Kit (300 Cycles); PN 20046813 Illumina; NextSeq 2000 P3 Reagent Kit (300 Cycles); PN 20040561 Illumina; NextSeq 2000 P3 Reagents Kit (100 Cycles); PN 20040559 Illumina; NextSeq 500/550 Hi Output Kit 150 Cycles; v2.5; PN 20024907 Illumina; NextSeq 500/550 Hi Output Kit 75 Cycles v2.5; PN 20024906 Illumina; NextSeq 500/550 Mid Output Kit 150 Cycles v2.5; PN 20024904 Illumina; NovaSeq 6000 S1 Reagent Kit (200 Cycles); PN 20012864 Illumina; NovaSeq 6000 S1 Reagent v1.5 Kit (100 Cycles); PN 20028319 Illumina; NovaSeq 6000 S1 Reagent v1.5 Kit (200 Cycles); PN 20028318 Illumina; NovaSeq 6000 S1 Reagent v1.5 Kit (300 Cycles); PN 20028317 Illumina; NovaSeq 6000 S2 Reagent v1.5 Kit (100 Cycles); PN 20028316 Illumina; NovaSeq 6000 S4 Reagent Kit v1.5 (300 cycles); PN 20028312 Illumina; NovaSeq 6000 S4 Reagent v1.5 Kit (200 Cycles); PN 20028313 Illumina; NovaSeq 6000 SP Reagent v1.5 Kit (100 Cycles); PN 20028401 Illumina; NovaSeq X Series 1.5B Reagent Kit (100 Cycle); PN 20104703 Illumina; NovaSeq X Series 1.5B Reagent Kit (200 Cycle); PN 20104704 Illumina; NovaSeq X Series 1.5B Reagent Kit (300 Cycle); PN 20104705 Illumina; NovaSeq X Series 10B Reagent Kit (100 Cycle); PN 20085596 Illumina; NovaSeq X Series 10B Reagent Kit (200 Cycle); PN 20085595 Illumina; NovaSeq X Series 10B Reagent Kit (300 Cycle); PN 20085594 | True |
sequencing_read_format | Textfield | Number of sequencing cycles in each round of sequencing (i.e., Read1, i7 index, i5 index, and Read2). This is reported as a comma-delimited list. Example: For 10X snATAC-seq (R1,Index,R2,R3) this might be: 50,8,16,50. For SNARE-seq2 this might be: 75,94,8,75 | | True |
sequencing_batch_id | Textfield | The ID for the sequencing run. This could, for example, be the chip ID and should allow users the ability to determine which samples were processed together in a sequencing run. It is recommended that data providers prefix the ID with the center name, to prevent values overlapping across centers. | | False |
capture_batch_id | Textfield | A lab-generated ID to identify which cells were captured at the same time. This would, for example, be an ID to denote which datasets were derived from a single 10X Genomics Chromium Controller run. In the case of the 10X Controller this could be the chip ID and would allow users the ability to determine which samples were processed together in a Chromium controller. It is recommended that data providers prefix the ID with the center name, to prevent values overlapping across centers. | | False |
preparation_instrument_vendor | Allowable Value | The manufacturer of the instrument used to prepare (staining/processing) the sample for the assay. If an automatic slide staining method was indicated this field should list the manufacturer of the instrument. | 10x Genomics Hamamatsu HTX Technologies In-House Leica Biosystems Not applicable Roche Diagnostics SunChrom Thermo Fisher Scientific | False |
preparation_instrument_model | Allowable Value | Manufacturers of a staining system instrument may offer various versions (models) of that instrument with different features. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | AutoStainer XL Chromium Connect Chromium Controller Chromium iX Chromium X Discovery Ultra EVOS M7000 M3+ Sprayer M5 Sprayer NanoZoomer S210 NanoZoomer S360 NanoZoomer S60 Not applicable ST5020 Multistainer Sublimator SunCollect Sprayer TM-Sprayer Visium CytAssist | False |
metadata_schema_id | Textfield | The string that serves as the definitive identifier for the metadata schema version and is readily interpretable by computers for data validation and processing. Example: 22bc762a-5020-419d-b170-24253ed9e8d9 | | True |
amount_of_input_analyte_value | Numeric | The amount of RNA or DNA input to the assay, typically measured by a Qubit, BioAnalyzer, or TapeStation. In most single cell/nuclei assays, this value isn’t available. | | False |
number_of_iterations_of_cdna_amplification | Numeric | This is the amplification of the cDNA prior to library construction. This is typically a PCR amplification, while for linear amplification methods like aRNA this would be the number of rounds of aRNA. | | True |
preparation_instrument_kit | Allowable Value | The reagent kit used with the preparation instrument. | 10X Genomics; Chromium Next GEM Chip G Single Cell Kit 16 rxns; PN 1000127 10X Genomics; Chromium Next GEM Chip G Single Cell Kit 48 rxns; PN 1000120 10X Genomics; Chromium Next GEM Chip K Automated Single Cell Kit 48 rxns; PN 1000289 10X Genomics; Chromium Next GEM Chip K Single Cell Kit 16 rxns; PN 1000287 10X Genomics; Chromium Next GEM Chip K Single Cell Kit 48 rxns; PN 1000286 10X Genomics; Chromium Next GEM Chip Q Single Cell Kit 16 rxns; PN 1000422 10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle 16 rxn; PN 1000283 10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle 4 rxn; PN 1000285 10X Genomics; Visium FFPE Reagent Kit v2-Small PN 1000436 Custom | False |
preparation_protocol_doi | Textfield | DOI for the protocols.io page that describes the assay or sample procurement and preparation. For example, for an imaging assay, the protocol might include staining of a section through the creation of an OME-TIFF file. In this case the protocol would include any image processing steps required to create the OME-TIFF file. Example: https://dx.doi.org/10.17504/protocols.io.eq2lyno9qvx9/v1 | | True |
is_targeted | Allowable Value | Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay (“Yes” or “No”). The CODEX analyte is protein. | Yes No | True |
parent_sample_id | Textfield | Unique HuBMAP or SenNet identifier of the sample (i.e., block, section or suspension) used to perform this assay. For example, for an RNAseq assay, the parent would be the suspension, whereas, for one of the imaging assays, the parent would be the tissue section. If an assay comes from multiple parent samples then this should be a comma-separated list. Example: HBM386.ZGKG.235, HBM672.MKPK.442 or SNT232.UBHJ.322, SNT329.ALSK.102 | | True |
umi_offset | Allowable Value | Position in the read at which the UMI barcode starts. This should be included when constructing sequencing libraries with a non-commercial kit. | 0 16 36 Not applicable | True |
dataset_type | Allowable Value | The specific type of dataset being produced. | 10X Multiome 2D Imaging Mass Cytometry ATACseq Auto-fluorescence Cell DIVE CODEX Confocal CosMx CyCIF DBiT DESI Enhanced Stimulated Raman Spectroscopy (SRS) GeoMx (nCounter) GeoMx (NGS) HiFi-Slide Histology LC-MS Light Sheet MALDI MERFISH MIBI Molecular Cartography MUSIC nanoSPLITS PhenoCycler Resolve RNAseq RNAseq (with probes) Second Harmonic Generation (SHG) SIMS SNARE-seq2 Stereo-seq Thick section Multiphoton MxIF Visium (no probes) Visium (with probes) Xenium | True |
barcode_offset | Allowable Value | Positions in the read at which the cell or capture spot barcodes start. Cell and capture spot barcodes are, for example, 3 x 8 bp sequences that are spaced by constant sequences (the offsets). First barcode at position 0, then 38, then 76. This should be included when constructing sequencing libraries with a non-commercial kit. | 0 8 20 1,27 0,38,76 10,48,78 10,48,86 Not applicable | True |
amount_of_input_analyte_unit | Allowable Value | Units of amount of entity input to assay value | ug ng | False |
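
The comma-delimited values used by `barcode_offset`, `barcode_size`, and `sequencing_read_format` can be read programmatically. The following is a minimal Python sketch, not part of the HuBMAP ingest or analysis pipelines: the function names `parse_int_list` and `extract_segments` and the synthetic read are hypothetical, and it assumes offsets are 0-based positions in the read, as suggested by the “First barcode at position 0” wording in the `barcode_offset` description.

```python
def parse_int_list(value):
    """Parse a comma-delimited metadata value such as "0,38,76" into ints.

    Returns an empty list for "Not applicable".
    """
    if value.strip().lower() == "not applicable":
        return []
    return [int(part) for part in value.split(",")]


def extract_segments(read, offsets, sizes):
    """Slice one segment per (offset, size) pair out of a read sequence.

    Offsets are treated as 0-based (an assumption based on the schema wording).
    """
    if len(offsets) != len(sizes):
        raise ValueError("offsets and sizes must have the same number of entries")
    segments = []
    for offset, size in zip(offsets, sizes):
        end = offset + size
        if end > len(read):
            raise ValueError(f"segment {offset}:{end} runs past the end of the read")
        segments.append(read[offset:end])
    return segments


# Synthetic read: three 8 bp barcodes separated by 30 bp constant spacers.
read = "AAAACCCC" + "N" * 30 + "GGGGTTTT" + "N" * 30 + "ACGTACGT"

offsets = parse_int_list("0,38,76")  # a barcode_offset value
sizes = parse_int_list("8,8,8")      # a barcode_size value
barcode = "".join(extract_segments(read, offsets, sizes))
print(barcode)  # AAAACCCCGGGGTTTTACGTACGT

# sequencing_read_format lists the number of cycles per sequencing round.
total_cycles = sum(parse_int_list("50,8,16,50"))
print(total_cycles)  # 124
```

A value of “Not applicable” yields an empty list, so a caller can skip barcode extraction for libraries that are not barcoded.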
RNAseq Version 2
Attribute | Type | Description | Allowable Values | Required |
---|---|---|---|---|
analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | Chromatin DNA DNA + RNA Endogenous fluorophores Fluorochrome Lipid Metabolite Nucleic acid and protein Peptide Polysaccharide Protein RNA | True |
acquisition_instrument_vendor | Allowable Value | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. | Akoya Biosciences Andor BGI Genomics Bruker Cytiva Evident Scientific (Olympus) GE Healthcare Hamamatsu Huron Digital Pathology Illumina In-House Ionpath Keyence Leica Biosystems Leica Microsystems Motic NanoString Resolve Biosciences Sciex Standard BioTools (Fluidigm) Thermo Fisher Scientific Zeiss Microscopy | True |
acquisition_instrument_model | Allowable Value | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | Aperio AT2 Aperio CS2 Axio Observer 3 Axio Observer 5 Axio Observer 7 Axio Scan.Z1 BZ-X710 BZ-X800 BZ-X810 CosMx Spatial Molecular Imager Custom: Multiphoton Digital Spatial Profiler DM6 B DNBSEQ-T7 EVOS M7000 HiSeq 2500 HiSeq 4000 Hyperion Imaging System IN Cell Analyzer 2200 Lightsheet 7 MALDI timsTOF Flex Prototype MIBIscope MoticEasyScan One NanoZoomer 2.0-HT NanoZoomer S210 NanoZoomer S360 NanoZoomer S60 NanoZoomer-SQ NextSeq 2000 NextSeq 500 NextSeq 550 NovaSeq 6000 NovaSeq X NovaSeq X Plus Orbitrap Eclipse Tribrid Orbitrap Fusion Lumos Tribrid Phenocycler-Fusion 1.0 Phenocycler-Fusion 2.0 PhenoImager Fusion Q Exactive Q Exactive HF Q Exactive UHMR QTRAP 5500 Resolve Biosciences Molecular Cartography SCN400 STELLARIS 5 TissueScope LE Slide Scanner Unknown VS200 Slide Scanner Xenium Analyzer Zyla 4.2 sCMOS | True |
source_storage_duration_value | Numeric | How long the source material (parent) was stored before this sample was processed. | | True |
source_storage_duration_unit | Allowable Value | The time duration unit of measurement | hour month day minute year | True |
time_since_acquisition_instrument_calibration_value | Numeric | The amount of time since the acquisition instrument was last serviced by the vendor. This provides a metric for assessing drift in data capture. | | False |
time_since_acquisition_instrument_calibration_unit | Allowable Value | The time unit of measurement | Column-by-column Not applicable Row-by-row Snake-by-columns Snake-by-rows | False |
contributors_path | Textfield | The path to the file with the ORCID IDs for all contributors of this dataset (e.g., “./extras/contributors.tsv” or “./contributors.tsv”). This is an internal metadata field that is just used for ingest. | | True |
data_path | Textfield | The top level directory containing the raw and/or processed data. For a single dataset upload this might be “.”, whereas for a data upload containing multiple datasets this would be the directory name for the respective dataset. For instance, if the data is within a directory called “TEST001-RK” use syntax “./TEST001-RK” for this field. If there are multiple directory levels, use the format “./TEST001-RK/Run1/Pass2” in which “Pass2” is the subdirectory where the single dataset’s data is stored. This is an internal metadata field that is just used for ingest. | | True |
barcode_read | Allowable Value | Which read file contains the cell or capture spot barcode. This should be included when constructing sequencing libraries with a non-commercial kit. This field is required if the source material is barcoded. This field is used to determine which analysis pipeline to run. | Read 2 (R2) Read 1 (R1) Not applicable | True |
barcode_size | Allowable Value | Length of the cell or capture spot barcode in base pairs. Cell and capture spot barcodes are, for example, 3 x 8 bp sequences that are spaced by constant sequences, the offsets. This should be included when constructing sequencing libraries with a non-commercial kit. This field is required if the source material is barcoded. This field is used to determine which analysis pipeline to run. | 14 16 40 8,8,8 8,6 Not applicable | True |
umi_read | Allowable Value | Which read file contains the UMI barcode. This should be included when constructing sequencing libraries with a non-commercial kit. | Read 2 (R2) Read 1 (R1) Not applicable | True |
umi_size | Allowable Value | Length of the UMI barcode in base pairs. This should be included when constructing sequencing libraries with a non-commercial kit. This field is required if UMIs are present. This field is used to determine which analysis pipeline to run. | 8 9 10 12 Not applicable | True |
assay_input_entity | Allowable Value | This is the entity from which the analyte is being captured. For example, for bulk sequencing this would be “tissue”, while it would be “single cell” for single cell sequencing. This field is used to determine which analysis pipeline to run. | area of interest single cell single nucleus spot tissue (bulk) | True |
number_of_input_cells_or_nuclei | Numeric | How many cells or nuclei were input to the assay? This is typically not available for preparations working with bulk tissue. | | False |
library_adapter_sequence | Textfield | 5’ and/or 3’ read adapter sequences used as part of the library preparation protocol to render the library compatible with the sequencing protocol and instrumentation. This should be provided as a comma-separated list of key:value pairs (adapter name:sequence). | | True |
library_average_fragment_size | Numeric | Average size of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation. Numeric value in base pairs (bp). | | True |
library_input_amount_value | Numeric | The amount of cDNA, after amplification, that was used for library construction. | | True |
library_input_amount_unit | Allowable Value | Unit of library input amount value. | ng ul | True |
library_output_amount_value | Numeric | Total amount (e.g., nanograms) of library after the clean-up step of the final PCR amplification step. Answer the question: What is the Qubit-measured concentration (ng/ul) times the elution volume (ul) after the final clean-up step? | | False |
library_output_amount_unit | Allowable Value | Units of library final yield. | ng ul | False |
library_concentration_value | Numeric | The concentration value of the pooled library samples submitted for sequencing. | | True |
library_concentration_unit | Allowable Value | Unit of library concentration value. | ng/ul nM | True |
library_layout | Allowable Value | Whether the library was generated for single-end or paired-end sequencing. | paired-end single-end | True |
number_of_pcr_cycles_for_indexing | Numeric | Number of PCR cycles performed in order to add adapters and amplify the library. This does not include the cDNA amplification which is captured in the “number of iterations of cDNA amplification” field. | | True |
library_preparation_kit | Allowable Value | Reagent kit used for library preparation | 10X Genomics; Automated Library Construction Kit 24 rxns; PN 1000428 10X Genomics; Chromium Next GEM Automated Single Cell 5' Kit v2 24 rxns; PN 1000290 10X Genomics; Chromium Next GEM Automated Single Cell 5' Kit v2 4 rxns; PN 1000298 10X Genomics; Chromium Next GEM Single Cell 3' GEM Library & Gel Bead Kit v3.1 16 rxns; PN 1000121 10X Genomics; Chromium Next GEM Single Cell 3' HT Kit v3.1 48 rxns; PN 1000348 10X Genomics; Chromium Next GEM Single Cell 3' HT Kit v3.1 8 rxns; PN 1000370 10X Genomics; Chromium Next GEM Single Cell 3' Kit v3.1 16 rxns; PN 1000268 10X Genomics; Chromium Next GEM Single Cell 3' Kit v3.1 4 rxns; PN 1000269 10X Genomics; Chromium Next GEM Single Cell 5' Kit v2 16 rxns; PN 1000263 10X Genomics; Chromium Next GEM Single Cell 5' Kit v2 4 rxns; PN 1000265 10X Genomics; Chromium Next GEM Single Cell Fixed RNA Hybridization & Library Kit 4 rxns; PN 1000415 10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle 16 rxn; PN 1000283 10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle 4 rxn; PN 1000285 10X Genomics; Chromium Single Cell 3' GEM Library & Gel Bead Kit v3 4 rxns PN 1000092 10X Genomics; Chromium Single Cell 3' Library & Gel Bead Kit 4 rxns; PN 120267 10X Genomics; Visium CytAssist Spatial Gene Expression for FFPE Human Transcriptome 11 mm 2 reactions; PN 1000522 10X Genomics; Visium CytAssist Spatial Gene Expression for FFPE Human Transcriptome 6.5mm 4 reactions; PN 1000520 10X Genomics; Visium Spatial for FFPE Gene Expression Kit Human Transcriptome 1 slides 4 reactions; PN 1000338 10X Genomics; Visium Spatial for FFPE Gene Expression Kit Mouse Transcriptome 4 rxns; PN 1000339 10X Genomics; Visium Spatial Gene Expression Slide and Reagent Kit 1 slides 4 reactions; PN 1000187 10X Genomics; Visium Spatial Gene Expression Slide and Reagent Kit 4 slides 16 reactions; PN 1000184 Custom Illumina; TruSeq Stranded mRNA Library Prep (48 samples); PN 20020594 Illumina; TruSeq Stranded mRNA Library Prep (96 samples); PN 20020595 New England BioLabs; NEBNext Ultra II RNA Library Prep Kit for Illumina; PN E7770 Parse Biosciences; Evercode WT Mini v2 Kit 12 rxns; PN ECW02010 Parse Biosciences; Evercode WT v2 Kit 48 rxns; PN ECW02030 | True |
sample_indexing_kit | Allowable Value | Indexes are needed for multiplexing sequencing libraries for simultaneous sequencing (pooling) and proper attachment to the Illumina flowcell. Each indexing kit would have a number of compatible sequences (“sample indexing sets”) that are used to label some number of samples (the number of sets depends on the kit). | 10X Genomics; Chromium i7 Sample Index Plate (96 rxn); PN 220103 10X Genomics; Dual Index Kit TS Set A; PN 1000251 10X Genomics; Dual Index Kit TT Set A (96 rxn); PN 1000215 10X Genomics; Single Index Kit N Set A (96 rxn); PN 1000212 Custom Illumina; IDT for Illumina - TruSeq RNA UD Indexes v2 (96 Indexes 96 Samples); PN 20040871 Illumina; TruSeq RNA CD Index Plate (96 Indexes 96 Samples); PN 20019792 Illumina; TruSeq RNA Single Indexes Set A (12 Indexes 48 Samples); PN 20020492 Illumina; TruSeq RNA Single Indexes Set B (12 Indexes 48 Samples); PN 20020493 Integrated DNA Technologies: Custom DNA Oligos NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-AB NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-CD NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-EF NanoString Technologies; GeoMx Seq Code Pack; PN GMX-NGS-SEQ-GH Not applicable Parse Biosciences; Fragmentation Reagents; PN WX100 Parse Biosciences; UDI Plate - WT; PN UDI1001 | True |
sample_indexing_set | Textfield | The specific sequencing barcode index set used, selected from the sample indexing kit. Example: For 10X this might be “SI-GA-A1”, for Nextera “N505 - CTCCTTAC” | | True |
is_technical_replicate | Allowable Value | Is the sequencing reaction run in replicate, “Yes” or “No”. If “Yes”, FASTQ files in dataset need to be merged. | Yes No | True |
expected_entity_capture_count | Numeric | Number of cells, nuclei or capture spots expected to be captured by the assay. For Visium this is the total number of spots covered by tissue, within the capture area. | | False |
sequencing_reagent_kit | Allowable Value | Reagent kit used for sequencing | Custom Illumina; HiSeq 3000/4000 PE Cluster Kit PE-410-1001; PN 1000283 Illumina; NextSeq 1000/2000 P2 Reagent v3 Kit (100 Cycles); PN 20046811 Illumina; NextSeq 1000/2000 P2 Reagent v3 Kit (200 Cycles); PN 20046812 Illumina; NextSeq 1000/2000 P2 Reagent v3 Kit (300 Cycles); PN 20046813 Illumina; NextSeq 2000 P3 Reagent Kit (300 Cycles); PN 20040561 Illumina; NextSeq 2000 P3 Reagents Kit (100 Cycles); PN 20040559 Illumina; NextSeq 500/550 Hi Output Kit 150 Cycles; v2.5; PN 20024907 Illumina; NextSeq 500/550 Hi Output Kit 75 Cycles v2.5; PN 20024906 Illumina; NextSeq 500/550 Mid Output Kit 150 Cycles v2.5; PN 20024904 Illumina; NovaSeq 6000 S1 Reagent Kit (200 Cycles); PN 20012864 Illumina; NovaSeq 6000 S1 Reagent v1.5 Kit (100 Cycles); PN 20028319 Illumina; NovaSeq 6000 S1 Reagent v1.5 Kit (200 Cycles); PN 20028318 Illumina; NovaSeq 6000 S1 Reagent v1.5 Kit (300 Cycles); PN 20028317 Illumina; NovaSeq 6000 S2 Reagent v1.5 Kit (100 Cycles); PN 20028316 Illumina; NovaSeq 6000 S4 Reagent Kit v1.5 (300 cycles); PN 20028312 Illumina; NovaSeq 6000 S4 Reagent v1.5 Kit (200 Cycles); PN 20028313 Illumina; NovaSeq 6000 SP Reagent v1.5 Kit (100 Cycles); PN 20028401 Illumina; NovaSeq X Series 1.5B Reagent Kit (100 Cycle); PN 20104703 Illumina; NovaSeq X Series 1.5B Reagent Kit (200 Cycle); PN 20104704 Illumina; NovaSeq X Series 1.5B Reagent Kit (300 Cycle); PN 20104705 Illumina; NovaSeq X Series 10B Reagent Kit (100 Cycle); PN 20085596 Illumina; NovaSeq X Series 10B Reagent Kit (200 Cycle); PN 20085595 Illumina; NovaSeq X Series 10B Reagent Kit (300 Cycle); PN 20085594 | True |
sequencing_read_format | Textfield | Number of sequencing cycles in each round of sequencing (i.e., Read1, i7 index, i5 index, and Read2). This is reported as a comma-delimited list. Example: For 10X snATAC-seq (R1,Index,R2,R3) this might be: 50,8,16,50. For SNARE-seq2 this might be: 75,94,8,75 | True | |
sequencing_batch_id | Textfield | The ID for the sequencing run. This could, for example, be the chip ID and should allow users the ability to determine which samples were processed together in a sequencing run. It is recommended that data providers prefix the ID with the center name, to prevent values overlapping across centers. | False | |
capture_batch_id | Textfield | A lab-generated ID to identify which cells were captured at the same time. This would, for example, be an ID to denote which datasets were derived from a single 10X Genomics Chromium Controller run. In the case of the 10X Controller this could be the chip ID and would allow users the ability to determine which samples were processed together in a Chromium controller. It is recommended that data providers prefix the ID with the center name, to prevent values overlapping across centers. | False | |
preparation_instrument_vendor | Allowable Value | The manufacturer of the instrument used to prepare (staining/processing) the sample for the assay. If an automatic slide staining method was indicated this field should list the manufacturer of the instrument. | 10x Genomics Hamamatsu HTX Technologies In-House Leica Biosystems Not applicable Roche Diagnostics SunChrom Thermo Fisher Scientific |
False |
preparation_instrument_model | Allowable Value | Manufacturers of a staining system instrument may offer various versions (models) of that instrument with different features. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | AutoStainer XL Chromium Connect Chromium Controller Chromium iX Chromium X Discovery Ultra EVOS M7000 M3+ Sprayer M5 Sprayer NanoZoomer S210 NanoZoomer S360 NanoZoomer S60 Not applicable ST5020 Multistainer Sublimator SunCollect Sprayer TM-Sprayer Visium CytAssist |
False |
metadata_schema_id | Textfield | The string that serves as the definitive identifier for the metadata schema version and is readily interpretable by computers for data validation and processing. Example: 22bc762a-5020-419d-b170-24253ed9e8d9 | True | |
amount_of_input_analyte_value | Numeric | The amount of RNA or DNA input to the assay, typically measured by a Qubit, BioAnalyzer, or TapeStation. In most single cell/nuclei assays, this value isn’t available. | False | |
amount_of_input_analyte_unit | Textfield | Units of amount of entity input to assay value | False | |
number_of_iterations_of_cdna_amplification | Numeric | This is the amplification of the cDNA prior to library construction. This is typically a PCR amplification, while for linear amplification methods like aRNA this would be the number of rounds of aRNA. | True | |
preparation_instrument_kit | Allowable Value | The reagent kit used with the preparation instrument. | 10X Genomics; Chromium Next GEM Chip G Single Cell Kit 16 rxns; PN 1000127 10X Genomics; Chromium Next GEM Chip G Single Cell Kit 48 rxns; PN 1000120 10X Genomics; Chromium Next GEM Chip K Automated Single Cell Kit 48 rxns; PN 1000289 10X Genomics; Chromium Next GEM Chip K Single Cell Kit 16 rxns; PN 1000287 10X Genomics; Chromium Next GEM Chip K Single Cell Kit 48 rxns; PN 1000286 10X Genomics; Chromium Next GEM Chip Q Single Cell Kit 16 rxns; PN 1000422 10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle 16 rxn; PN 1000283 10X Genomics; Chromium NextGem Single Cell Multiome ATAC + Gene Expression Reagent Bundle 4 rxn; PN 1000285 10X Genomics; Visium FFPE Reagent Kit v2-Small PN 1000436 Custom |
False |
preparation_protocol_doi | Textfield | DOI for the protocols.io page that describes the assay or sample procurement and preparation. For example, for an imaging assay, the protocol might include staining of a section through the creation of an OME-TIFF file. In this case the protocol would include any image processing steps required to create the OME-TIFF file. Example: https://dx.doi.org/10.17504/protocols.io.eq2lyno9qvx9/v1 | True | |
is_targeted | Allowable Value | Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay (“Yes” or “No”). The CODEX analyte is protein. | Yes No |
True |
parent_sample_id | Textfield | Unique HuBMAP or SenNet identifier of the sample (i.e., block, section or suspension) used to perform this assay. For example, for an RNAseq assay, the parent would be the suspension, whereas for one of the imaging assays, the parent would be the tissue section. If an assay comes from multiple parent samples then this should be a comma-separated list. Example: HBM386.ZGKG.235, HBM672.MKPK.442 or SNT232.UBHJ.322, SNT329.ALSK.102 | True | |
barcode_offset | Allowable Value | Positions in the read at which the cell or capture spot barcodes start. Cell and capture spot barcodes are, for example, 3 x 8 bp sequences that are spaced by constant sequences (the offsets). First barcode at position 0, then 38, then 76. This should be included when constructing sequencing libraries with a non-commercial kit. | 0 8 20 1,27 0,38,76 10,48,78 10,48,86 Not applicable |
True |
umi_offset | Allowable Value | Position in the read at which the UMI barcode starts. This should be included when constructing sequencing libraries with a non-commercial kit. | 0 16 36 Not applicable |
True |
dataset_type | Allowable Value | The specific type of dataset being produced. | 10X Multiome 2D Imaging Mass Cytometry ATACseq (bulk) Auto-fluorescence Cell DIVE CODEX Confocal CosMx CyCIF DBiT DESI Enhanced Stimulated Raman Spectroscopy (SRS) GeoMx (nCounter) GeoMx (NGS) HiFi-Slide Histology LC-MS Light Sheet MALDI MERFISH MIBI Molecular Cartography PhenoCycler RNAseq (bulk) scATACseq scRNAseq Second Harmonic Generation (SHG) SIMS SNARE-seq2 snATACseq snRNAseq Thick section Multiphoton MxIF Visium Xenium |
True |
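
As an illustration of the comma-delimited `sequencing_read_format` convention described in the table above, here is a minimal Python sketch that splits a value into per-read cycle counts. The function name `parse_read_format` is hypothetical and is not part of any HuBMAP tooling; the sketch assumes only that the value is a comma-separated list of non-negative integers.

```python
def parse_read_format(value: str) -> list[int]:
    """Split a comma-delimited sequencing_read_format value into cycle counts.

    Hypothetical helper for illustration only; not part of HuBMAP tooling.
    """
    parts = [part.strip() for part in value.split(",")]
    # Reject empty entries and anything that is not a plain integer.
    if any(not part.isdigit() for part in parts):
        raise ValueError(f"expected comma-delimited integers, got {value!r}")
    return [int(part) for part in parts]


# The 10X snATAC-seq example from the table (R1, Index, R2, R3).
cycles = parse_read_format("50,8,16,50")
print(cycles)
```

Returning a list of integers keeps the order of the reads intact, so a caller can map positions to read names (for example R1, Index, R2, R3) for the assay at hand.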
bulk-RNA Version 1
Attribute | Type | Description | Allowable Values | Required |
---|---|---|---|---|
version | Allowable Value | Version of the schema to use when validating this metadata. | [‘1’] | True |
description | Textfield | Free-text description of this assay. | True | |
donor_id | Textfield | HuBMAP Display ID of the donor of the assayed tissue. | True | |
tissue_id | Textfield | HuBMAP Display ID of the assayed tissue. | True | |
execution_datetime | Datetime | Start date and time of assay, typically a date-time stamped folder generated by the acquisition instrument. YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s, hh is the hour with leading zeros, mm are the minutes with leading zeros. | True | |
protocols_io_doi | Textfield | DOI for protocols.io referring to the protocol for this assay. | True | |
operator | Textfield | Name of the person responsible for executing the assay. | True | |
operator_email | Textfield | Email address for the operator. | True | |
pi | Textfield | Name of the principal investigator responsible for the data. | True | |
pi_email | Textfield | Email address for the principal investigator. | True | |
assay_category | Allowable Value | Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence. | [‘sequence’] | True |
assay_type | Allowable Value | The specific type of assay being executed. | [‘bulkATACseq’] | True |
analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | [‘DNA’] | True |
is_targeted | Allowable Value | Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay. | [‘Yes’, ‘No’] | True |
acquisition_instrument_vendor | Textfield | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. | True | |
acquisition_instrument_model | Textfield | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | True | |
bulk_transposition_input_number_nuclei | Textfield | A number (no comma separators) | True | |
bulk_atac_cell_isolation_protocols_io_doi | Textfield | Textfield to a protocols document answering the question: How was tissue stored and processed for cell/nuclei isolation | True | |
is_technical_replicate | Allowable Value | Is this a sequencing replicate? | [‘Yes’, ‘No’] | True |
library_adapter_sequence | Textfield | Adapter sequence to be used for adapter trimming | True | |
library_average_fragment_size | Numeric | Average size in basepairs (bp) of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation. | True | |
library_concentration_value | Numeric | The concentration value of the pooled library samples submitted for sequencing. | True | |
library_concentration_unit | Allowable Value | Unit of library_concentration_value | [‘nM’] | False |
library_construction_protocols_io_doi | Textfield | A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”. | True | |
library_creation_date | Datetime | Date of library creation. YYYY-MM-DD, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s. | False | |
library_final_yield_value | Numeric | Total number of ng of library after the final PCR amplification step. This is the concentration (ng/ul) * volume (ul). | True | |
library_final_yield_unit | Allowable Value | Units of final library yield | [‘ng’] | False |
library_id | Textfield | A library ID, unique within a TMC, which allows corresponding RNA and chromatin accessibility datasets to be linked. | True | |
library_layout | Allowable Value | State whether the library was generated for single-end or paired end sequencing. | [‘single-end’, ‘paired-end’] | True |
library_pcr_cycles | Numeric | Number of PCR cycles performed in order to add adapters and amplify the library. Usually, this includes 5 pre-amplification cycles followed by 0-5 additional cycles determined by qPCR. | True | |
library_preparation_kit | Textfield | Reagent kit used for library preparation | True | |
sample_quality_metric | Textfield | This is a quality metric by visual inspection. This should answer the question: Are the nuclei intact and are the nuclei free of significant amounts of debris? This can be captured at a high level, “OK” or “not OK”. | True | |
sequencing_phix_percent | Numeric | Percent PhiX loaded to the run | True | |
sequencing_read_format | Textfield | Slash-delimited list of the number of sequencing cycles for, for example, Read1, i7 index, i5 index, and Read2. | True | |
sequencing_read_percent_q30 | Numeric | Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …) | True | |
sequencing_reagent_kit | Textfield | Reagent kit used for sequencing | True | |
transposition_kit_number | Textfield | If Tn5 came from a kit, provide the catalog number. | False | |
transposition_method | Textfield | Modality of capturing accessible chromatin molecules. The kit used, for example. | True | |
transposition_transposase_source | Textfield | The source of the Tn5 transposase and transposon used for capturing accessible chromatin. | True | |
contributors_path | Textfield | Relative path to file with ORCID IDs for contributors for this dataset. | True | |
data_path | Textfield | Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions. | True |
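
The `execution_datetime` format above (YYYY-MM-DD hh:mm with leading zeros) can be checked mechanically. The following is a minimal Python sketch under two assumptions that the table does not state: that `hh` is a 24-hour value, and that no timezone is recorded. The function name `parse_execution_datetime` is hypothetical.

```python
from datetime import datetime

# Assumed mapping of "YYYY-MM-DD hh:mm" onto strptime codes (24-hour clock).
EXECUTION_DATETIME_FORMAT = "%Y-%m-%d %H:%M"


def parse_execution_datetime(value: str) -> datetime:
    """Parse an execution_datetime value, insisting on leading zeros.

    Hypothetical helper for illustration only.
    """
    parsed = datetime.strptime(value, EXECUTION_DATETIME_FORMAT)
    # strptime tolerates missing leading zeros, so round-trip to enforce them.
    if parsed.strftime(EXECUTION_DATETIME_FORMAT) != value:
        raise ValueError(f"leading zeros required, got {value!r}")
    return parsed


print(parse_execution_datetime("2021-03-07 09:05"))
```

The round-trip comparison is a design choice: it rejects values such as `2021-3-7 9:05` that `strptime` alone would accept.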
bulk-RNA Version 0
Attribute | Type | Description | Allowable Values | Required |
---|---|---|---|---|
donor_id | Textfield | HuBMAP Display ID of the donor of the assayed tissue. | True | |
tissue_id | Textfield | HuBMAP Display ID of the assayed tissue. | True | |
execution_datetime | Datetime | Start date and time of assay, typically a date-time stamped folder generated by the acquisition instrument. YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s, hh is the hour with leading zeros, mm are the minutes with leading zeros. | True | |
protocols_io_doi | Textfield | DOI for protocols.io referring to the protocol for this assay. | True | |
operator | Textfield | Name of the person responsible for executing the assay. | True | |
operator_email | Textfield | Email address for the operator. | True | |
pi | Textfield | Name of the principal investigator responsible for the data. | True | |
pi_email | Textfield | Email address for the principal investigator. | True | |
assay_category | Allowable Value | Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence. | [‘sequence’] | True |
assay_type | Allowable Value | The specific type of assay being executed. | [‘bulk-RNA’] | True |
analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | [‘RNA’] | True |
is_targeted | Allowable Value | Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay. | [‘Yes’, ‘No’] | True |
acquisition_instrument_vendor | Textfield | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. | True | |
acquisition_instrument_model | Textfield | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | True | |
bulk_rna_isolation_protocols_io_doi | Textfield | Link to a protocols document answering the question: How was tissue stored and processed for RNA isolation? | True | |
bulk_rna_yield_value | Numeric | RNA (ng) per Weight of Tissue (mg). Answer the question: How much RNA in ng was isolated? How much tissue in mg was initially used for isolating RNA? Calculate the yield by dividing total RNA isolated by amount of tissue used to isolate RNA from (ng/mg). | True | |
bulk_rna_yield_units_per_tissue_unit | Allowable Value | RNA amount per Tissue input amount. Valid values should be weight/weight (ng/mg). | [‘ng/mg’] | True |
bulk_rna_isolation_quality_metric_value | Numeric | RIN value | True | |
rnaseq_assay_input_value | Numeric | RNA input amount value to the assay | True | |
rnaseq_assay_input_unit | Allowable Value | Units of RNA input amount to the assay | [‘ug’] | False |
rnaseq_assay_method | Textfield | The kit used for the RNA sequencing assay | True | |
library_construction_protocols_io_doi | Textfield | A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”. | True | |
library_layout | Allowable Value | State whether the library was generated for single-end or paired end sequencing. | [‘single-end’, ‘paired-end’] | True |
library_adapter_sequence | Textfield | Adapter sequence to be used for adapter trimming. | True | |
library_pcr_cycles_for_sample_index | Numeric | Number of PCR cycles performed for library indexing | True | |
library_final_yield_value | Numeric | Total number of ng of library after the final PCR amplification step. This is the concentration (ng/ul) * volume (ul). | True | |
library_final_yield_unit | Allowable Value | Units of final library yield | [‘ng’] | False |
library_average_fragment_size | Numeric | Average size in basepairs (bp) of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation. | True | |
sequencing_reagent_kit | Textfield | Reagent kit used for sequencing | True | |
sequencing_read_format | Textfield | Slash-delimited list of the number of sequencing cycles for, for example, Read1, i7 index, i5 index, and Read2. | True | |
sequencing_read_percent_q30 | Numeric | Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …) | True | |
sequencing_phix_percent | Numeric | Percent PhiX loaded to the run | True | |
contributors_path | Textfield | Relative path to file with ORCID IDs for contributors for this dataset. | True | |
data_path | Textfield | Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions. | True |
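
The two derived quantities in the table above, `bulk_rna_yield_value` (ng of RNA per mg of tissue) and `library_final_yield_value` (concentration times volume), are simple arithmetic. A minimal Python sketch follows; the function names and the input numbers are hypothetical and chosen only for illustration.

```python
def rna_yield_ng_per_mg(total_rna_ng: float, tissue_mg: float) -> float:
    """bulk_rna_yield_value: total RNA isolated (ng) divided by tissue used (mg)."""
    if tissue_mg <= 0:
        raise ValueError("tissue_mg must be positive")
    return total_rna_ng / tissue_mg


def library_final_yield_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """library_final_yield_value: concentration (ng/ul) multiplied by volume (ul)."""
    return concentration_ng_per_ul * volume_ul


# Illustrative numbers only: 1500 ng of RNA isolated from 30 mg of tissue,
# and a final library at 2.5 ng/ul in 20 ul.
print(rna_yield_ng_per_mg(1500, 30))    # 50.0 (ng/mg)
print(library_final_yield_ng(2.5, 20))  # 50.0 (ng)
```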
scRNAseq Version 3
Attribute | Type | Description | Allowable Values | Required |
---|---|---|---|---|
version | Allowable Value | Version of the schema to use when validating this metadata. | [‘3’] | True |
description | Textfield | Free-text description of this assay. | True | |
donor_id | Textfield | HuBMAP Display ID of the donor of the assayed tissue. | True | |
tissue_id | Textfield | HuBMAP Display ID of the assayed tissue. | True | |
execution_datetime | Datetime | Start date and time of assay, typically a date-time stamped folder generated by the acquisition instrument. YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s, hh is the hour with leading zeros, mm are the minutes with leading zeros. | True | |
protocols_io_doi | Textfield | DOI for protocols.io referring to the protocol for this assay. | True | |
operator | Textfield | Name of the person responsible for executing the assay. | True | |
operator_email | Textfield | Email address for the operator. | True | |
pi | Textfield | Name of the principal investigator responsible for the data. | True | |
pi_email | Textfield | Email address for the principal investigator. | True | |
assay_category | Allowable Value | Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence. | [‘sequence’] | True |
assay_type | Allowable Value | The specific type of assay being executed. The UMI sequence length in the 10xGenomics-v2 kit is 10 base pairs and the length in the 10xGenomics-v3 kit is 12 base pairs. | [‘scRNAseq-10xGenomics-v2’, ‘scRNAseq-10xGenomics-v3’, ‘snRNAseq-10xGenomics-v2’, ‘snRNAseq-10xGenomics-v3’, ‘scRNAseq’, ‘sciRNAseq’, ‘snRNAseq’, ‘SNARE2-RNAseq’] | True |
analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | [‘RNA’] | True |
is_targeted | Allowable Value | Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay. | [‘Yes’, ‘No’] | True |
acquisition_instrument_vendor | Textfield | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. | True | |
acquisition_instrument_model | Textfield | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | True | |
sc_isolation_protocols_io_doi | Textfield | Textfield to a protocols document answering the question: How were single cells separated into a single-cell suspension? | True | |
sc_isolation_entity | Allowable Value | The type of single cell entity derived from isolation protocol | [‘whole cell’, ‘nucleus’, ‘cell-cell multimer’, ‘spatially encoded cell barcoding’] | True |
sc_isolation_tissue_dissociation | Textfield | The method by which tissues are dissociated into single cells in suspension. | True | |
sc_isolation_enrichment | Allowable Value | The method by which specific cell populations are sorted or enriched. | [‘none’, ‘FACS’] | True |
sc_isolation_quality_metric | Textfield | A quality metric by visual inspection prior to cell lysis or defined by known parameters such as wells with several cells or no cells. This can be captured at a high level. | True | |
sc_isolation_cell_number | Numeric | Total number of cell/nuclei yielded post dissociation and enrichment | True | |
rnaseq_assay_input | Numeric | Number of cell/nuclei input to the assay | True | |
rnaseq_assay_method | Textfield | The kit used for the RNA sequencing assay | True | |
library_construction_protocols_io_doi | Textfield | A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”. | True | |
library_layout | Allowable Value | State whether the library was generated for single-end or paired end sequencing. | [‘single-end’, ‘paired-end’] | True |
library_adapter_sequence | Textfield | Adapter sequence to be used for adapter trimming | True | |
library_id | Textfield | A library ID, unique within a TMC, which allows corresponding RNA and chromatin accessibility datasets to be linked. | True | |
is_technical_replicate | Allowable Value | Is the sequencing reaction run in replicate, “Yes” or “No”? | [‘Yes’, ‘No’] | True |
cell_barcode_read | Textfield | Which read file(s) contains the cell barcode. Multiple cell_barcode_read files must be provided as a comma-delimited list (e.g. file1,file2,file3). | False | |
umi_read | Textfield | Which read file(s) contains the UMI (unique molecular identifier) barcode. | True | |
umi_offset | Numeric | Position in the read at which the umi barcode starts. | True | |
umi_size | Numeric | Length of the umi barcode in base pairs. | True | |
cell_barcode_offset | Textfield | Position(s) in the read at which the cell barcode starts. | False | |
cell_barcode_size | Textfield | Length of the cell barcode in base pairs | False | |
expected_cell_count | Numeric | How many cells are expected? This may be used in downstream pipelines to guide selection of cell barcodes or segmentation parameters. | False | |
library_pcr_cycles | Numeric | Number of PCR cycles to amplify cDNA | True | |
library_pcr_cycles_for_sample_index | Numeric | Number of PCR cycles performed for library indexing | True | |
library_final_yield_value | Numeric | Total number of ng of library after the final PCR amplification step. This is the concentration (ng/ul) * volume (ul). | True | |
library_final_yield_unit | Allowable Value | Units of final library yield | [‘ng’] | False |
library_average_fragment_size | Numeric | Average size in basepairs (bp) of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation. | True | |
sequencing_reagent_kit | Textfield | Reagent kit used for sequencing | True | |
sequencing_read_format | Textfield | Slash-delimited list of the number of sequencing cycles for, for example, Read1, i7 index, i5 index, and Read2. | True | |
sequencing_read_percent_q30 | Numeric | Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …) | True | |
sequencing_phix_percent | Numeric | Percent PhiX loaded to the run | True | |
contributors_path | Textfield | Relative path to file with ORCID IDs for contributors for this dataset. | True | |
data_path | Textfield | Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions. | True |
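
The `sequencing_read_percent_q30` description above gives only the numerator of the weighted average (bases times Q30 for each read, summed). The sketch below assumes the sum is then divided by the total number of bases, which is what "weighted average" usually implies but is not spelled out in the table. The function name `weighted_q30` and the example numbers are hypothetical.

```python
def weighted_q30(reads: list[tuple[int, float]]) -> float:
    """Base-weighted average Q30 across reads.

    Each item is (number_of_bases, percent_q30) for one read, e.g. UMI or R2.
    Dividing by total bases is an assumption; the table shows only the sum.
    """
    total_bases = sum(bases for bases, _ in reads)
    if total_bases <= 0:
        raise ValueError("at least one read with a positive base count is required")
    return sum(bases * q30 for bases, q30 in reads) / total_bases


# Illustrative numbers only: a 10-base read at 90% Q30 and a 30-base read at 94% Q30.
print(weighted_q30([(10, 90.0), (30, 94.0)]))  # 93.0
```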
scRNAseq Version 2
Attribute | Type | Description | Allowable Values | Required |
---|---|---|---|---|
version | Allowable Value | Version of the schema to use when validating this metadata. | [‘2’] | True |
description | Textfield | Free-text description of this assay. | True | |
donor_id | Textfield | HuBMAP Display ID of the donor of the assayed tissue. | True | |
tissue_id | Textfield | HuBMAP Display ID of the assayed tissue. | True | |
execution_datetime | Datetime | Start date and time of assay, typically a date-time stamped folder generated by the acquisition instrument. YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s, hh is the hour with leading zeros, mm are the minutes with leading zeros. | True | |
protocols_io_doi | Textfield | DOI for protocols.io referring to the protocol for this assay. | True | |
operator | Textfield | Name of the person responsible for executing the assay. | True | |
operator_email | Textfield | Email address for the operator. | True | |
pi | Textfield | Name of the principal investigator responsible for the data. | True | |
pi_email | Textfield | Email address for the principal investigator. | True | |
assay_category | Allowable Value | Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence. | [‘sequence’] | True |
assay_type | Allowable Value | The specific type of assay being executed. | [‘scRNAseq-10xGenomics-v2’, ‘scRNAseq-10xGenomics-v3’, ‘snRNAseq-10xGenomics-v2’, ‘snRNAseq-10xGenomics-v3’, ‘scRNAseq’, ‘sciRNAseq’, ‘snRNAseq’, ‘SNARE2-RNAseq’] | True |
analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | [‘RNA’] | True |
is_targeted | Allowable Value | Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay. | [‘Yes’, ‘No’] | True |
acquisition_instrument_vendor | Textfield | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. | True | |
acquisition_instrument_model | Textfield | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | True | |
sc_isolation_protocols_io_doi | Textfield | Textfield to a protocols document answering the question: How were single cells separated into a single-cell suspension? | True | |
sc_isolation_entity | Allowable Value | The type of single cell entity derived from isolation protocol | [‘whole cell’, ‘nucleus’, ‘cell-cell multimer’, ‘spatially encoded cell barcoding’] | True |
sc_isolation_tissue_dissociation | Textfield | The method by which tissues are dissociated into single cells in suspension. | True | |
sc_isolation_enrichment | Allowable Value | The method by which specific cell populations are sorted or enriched. | [‘none’, ‘FACS’] | True |
sc_isolation_quality_metric | Textfield | A quality metric by visual inspection prior to cell lysis or defined by known parameters such as wells with several cells or no cells. This can be captured at a high level. | True | |
sc_isolation_cell_number | Numeric | Total number of cell/nuclei yielded post dissociation and enrichment | True | |
rnaseq_assay_input | Numeric | Number of cell/nuclei input to the assay | True | |
rnaseq_assay_method | Textfield | The kit used for the RNA sequencing assay | True | |
library_construction_protocols_io_doi | Textfield | A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”. | True | |
library_layout | Allowable Value | State whether the library was generated for single-end or paired end sequencing. | [‘single-end’, ‘paired-end’] | True |
library_adapter_sequence | Textfield | Adapter sequence to be used for adapter trimming | True | |
library_id | Textfield | A library ID, unique within a TMC, which allows corresponding RNA and chromatin accessibility datasets to be linked. | True | |
is_technical_replicate | Allowable Value | Is the sequencing reaction run in replicate, “Yes” or “No”? | [‘Yes’, ‘No’] | True |
cell_barcode_read | Textfield | Which read file(s) contains the cell barcode. Multiple cell_barcode_read files must be provided as a comma-delimited list (e.g. file1,file2,file3). | False | |
cell_barcode_offset | Textfield | Position(s) in the read at which the cell barcode starts. | False | |
cell_barcode_size | Textfield | Length of the cell barcode in base pairs | False | |
expected_cell_count | Numeric | How many cells are expected? This may be used in downstream pipelines to guide selection of cell barcodes or segmentation parameters. | False | |
library_pcr_cycles | Numeric | Number of PCR cycles to amplify cDNA | True | |
library_pcr_cycles_for_sample_index | Numeric | Number of PCR cycles performed for library indexing | True | |
library_final_yield_value | Numeric | Total number of ng of library after the final PCR amplification step. This is the concentration (ng/ul) * volume (ul). | True | |
library_final_yield_unit | Allowable Value | Units of final library yield | [‘ng’] | False |
library_average_fragment_size | Numeric | Average size in basepairs (bp) of sequencing library fragments estimated via gel electrophoresis or bioanalyzer/tapestation. | True | |
sequencing_reagent_kit | Textfield | Reagent kit used for sequencing | True | |
sequencing_read_format | Textfield | Slash-delimited list of the number of sequencing cycles for, for example, Read1, i7 index, i5 index, and Read2. | True | |
sequencing_read_percent_q30 | Numeric | Q30 is the weighted average of all the reads (e.g. # bases UMI * q30 UMI + # bases R2 * q30 R2 + …) | True | |
sequencing_phix_percent | Numeric | Percent PhiX loaded to the run | True | |
contributors_path | Textfield | Relative path to file with ORCID IDs for contributors for this dataset. | True | |
data_path | Textfield | Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions. | True |
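
Several attributes in the table above are restricted to a fixed list of allowable values. As a rough sketch of how a metadata row could be checked against those lists, the Python below copies the scRNAseq Version 2 lists for a few attributes into a dictionary. The names `ALLOWED_V2` and `invalid_fields` are hypothetical, and this is not the validator HuBMAP uses.

```python
# Allowable values copied from the scRNAseq Version 2 table (subset of attributes).
ALLOWED_V2: dict[str, set[str]] = {
    "assay_type": {
        "scRNAseq-10xGenomics-v2", "scRNAseq-10xGenomics-v3",
        "snRNAseq-10xGenomics-v2", "snRNAseq-10xGenomics-v3",
        "scRNAseq", "sciRNAseq", "snRNAseq", "SNARE2-RNAseq",
    },
    "sc_isolation_entity": {
        "whole cell", "nucleus", "cell-cell multimer",
        "spatially encoded cell barcoding",
    },
    "sc_isolation_enrichment": {"none", "FACS"},
    "library_layout": {"single-end", "paired-end"},
    "is_targeted": {"Yes", "No"},
    "is_technical_replicate": {"Yes", "No"},
}


def invalid_fields(row: dict[str, str]) -> dict[str, str]:
    """Return the attributes present in `row` whose value is not allowable.

    Attributes missing from `row` are ignored; checking required attributes
    is outside the scope of this sketch.
    """
    return {
        field: row[field]
        for field, allowed in ALLOWED_V2.items()
        if field in row and row[field] not in allowed
    }


# Hypothetical row: "paired" is not one of the allowable library_layout values.
row = {"assay_type": "snRNAseq", "library_layout": "paired", "is_targeted": "Yes"}
print(invalid_fields(row))
```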
scRNAseq Version 1
Attribute | Type | Description | Allowable Values | Required |
---|---|---|---|---|
version | Allowable Value | Version of the schema to use when validating this metadata. | [‘1’] | True |
description | Textfield | Free-text description of this assay. | True | |
donor_id | Textfield | HuBMAP Display ID of the donor of the assayed tissue. | True | |
tissue_id | Textfield | HuBMAP Display ID of the assayed tissue. | True | |
execution_datetime | Datetime | Start date and time of assay, typically a date-time stamped folder generated by the acquisition instrument. YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month with leading 0s, and DD is the day with leading 0s, hh is the hour with leading zeros, mm are the minutes with leading zeros. | True | |
protocols_io_doi | Textfield | DOI for protocols.io referring to the protocol for this assay. | True | |
operator | Textfield | Name of the person responsible for executing the assay. | True | |
operator_email | Textfield | Email address for the operator. | True | |
pi | Textfield | Name of the principal investigator responsible for the data. | True | |
pi_email | Textfield | Email address for the principal investigator. | True | |
assay_category | Allowable Value | Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence. | [‘sequence’] | True |
assay_type | Allowable Value | The specific type of assay being executed. | [‘scRNAseq-10xGenomics’, ‘snRNAseq-10xGenomics-v2’, ‘snRNAseq-10xGenomics-v3’, ‘scRNAseq’, ‘sciRNAseq’, ‘snRNAseq’, ‘SNARE2-RNAseq’] | True |
analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | [‘RNA’] | True |
is_targeted | Allowable Value | Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay. | [‘Yes’, ‘No’] | True |
acquisition_instrument_vendor | Textfield | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. | | True |
acquisition_instrument_model | Textfield | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | | True |
sc_isolation_protocols_io_doi | Textfield | Link to a protocols document answering the question: How were single cells separated into a single-cell suspension? | | True |
sc_isolation_entity | Allowable Value | The type of single cell entity derived from isolation protocol | [‘whole cell’, ‘nucleus’, ‘cell-cell multimer’, ‘spatially encoded cell barcoding’] | True |
sc_isolation_tissue_dissociation | Textfield | The method by which tissues are dissociated into single cells in suspension. | | True |
sc_isolation_enrichment | Allowable Value | The method by which specific cell populations are sorted or enriched. | [‘none’, ‘FACS’] | True |
sc_isolation_quality_metric | Textfield | A quality metric by visual inspection prior to cell lysis or defined by known parameters such as wells with several cells or no cells. This can be captured at a high level. | | True |
sc_isolation_cell_number | Numeric | Total number of cells/nuclei yielded post dissociation and enrichment | | True |
rnaseq_assay_input | Numeric | Number of cells/nuclei input to the assay | | True |
rnaseq_assay_method | Textfield | The kit used for the RNA sequencing assay | | True |
library_construction_protocols_io_doi | Textfield | A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”. | | True |
library_layout | Allowable Value | State whether the library was generated for single-end or paired-end sequencing. | [‘single-end’, ‘paired-end’] | True |
library_adapter_sequence | Textfield | Adapter sequence to be used for adapter trimming | | True |
library_id | Textfield | A library ID, unique within a TMC, which allows corresponding RNA and chromatin accessibility datasets to be linked. | | True |
is_technical_replicate | Allowable Value | Whether the sequencing reaction is run in replicate. | [‘Yes’, ‘No’] | True |
cell_barcode_read | Textfield | Which read file contains the cell barcode | | True |
cell_barcode_offset | Textfield | Position(s) in the read at which the cell barcode starts. | | True |
cell_barcode_size | Textfield | Length of the cell barcode in base pairs | | True |
library_pcr_cycles | Numeric | Number of PCR cycles to amplify cDNA | | True |
library_pcr_cycles_for_sample_index | Numeric | Number of PCR cycles performed for library indexing | | True |
library_final_yield_value | Numeric | Total mass of library, in ng, after the final PCR amplification step, calculated as concentration (ng/µL) * volume (µL). | | True |
library_final_yield_unit | Allowable Value | Units of final library yield | [‘ng’] | False |
library_average_fragment_size | Numeric | Average size in base pairs (bp) of sequencing library fragments, estimated via gel electrophoresis or Bioanalyzer/TapeStation. | | True |
sequencing_reagent_kit | Textfield | Reagent kit used for sequencing | | True |
sequencing_read_format | Textfield | Slash-delimited list of the number of sequencing cycles for each read, for example Read1, i7 index, i5 index, and Read2. | | True |
sequencing_read_percent_q30 | Numeric | Percentage of bases with a quality score of Q30 or higher, computed as the base-weighted average over all reads, e.g. (# bases UMI * q30 UMI + # bases R2 * q30 R2 + …) / (total # bases). | | True |
sequencing_phix_percent | Numeric | Percent PhiX loaded to the run | | True |
contributors_path | Textfield | Relative path to file with ORCID IDs for contributors for this dataset. | | True |
data_path | Textfield | Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions. | | True |
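
As an illustration of how the Allowable Value columns and the `execution_datetime` format in the Version 1 table could be checked before submission, here is a minimal sketch. It is not the HuBMAP validator: the names `ALLOWED_V1`, `OPTIONAL_V1`, and `check_row` are hypothetical, and a row is assumed to be a plain dictionary of strings keyed by attribute name.

```python
import re
from datetime import datetime

# Allowable values copied from the scRNAseq Version 1 table above.
ALLOWED_V1 = {
    "version": {"1"},
    "assay_category": {"sequence"},
    "assay_type": {
        "scRNAseq-10xGenomics", "snRNAseq-10xGenomics-v2",
        "snRNAseq-10xGenomics-v3", "scRNAseq", "sciRNAseq",
        "snRNAseq", "SNARE2-RNAseq",
    },
    "analyte_class": {"RNA"},
    "is_targeted": {"Yes", "No"},
    "sc_isolation_entity": {
        "whole cell", "nucleus", "cell-cell multimer",
        "spatially encoded cell barcoding",
    },
    "sc_isolation_enrichment": {"none", "FACS"},
    "library_layout": {"single-end", "paired-end"},
    "is_technical_replicate": {"Yes", "No"},
    "library_final_yield_unit": {"ng"},
}
# Fields marked Required = False in the table.
OPTIONAL_V1 = {"library_final_yield_unit"}

# YYYY-MM-DD hh:mm with leading zeros; strptime alone accepts "2020-1-5".
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}")


def check_row(row):
    """Return a list of human-readable problems found in one metadata row."""
    errors = []
    for field, allowed in ALLOWED_V1.items():
        value = row.get(field, "")
        if value == "":
            if field not in OPTIONAL_V1:
                errors.append(f"{field}: required value is missing")
            continue
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    stamp = row.get("execution_datetime", "")
    if not DATETIME_RE.fullmatch(stamp):
        errors.append("execution_datetime: expected YYYY-MM-DD hh:mm")
    else:
        try:
            datetime.strptime(stamp, "%Y-%m-%d %H:%M")  # rejects month 13, hour 25, ...
        except ValueError:
            errors.append("execution_datetime: not a valid date and time")
    return errors
```

The sketch covers only the enumerated fields and the datetime format; free-text and numeric attributes would need their own checks.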
scRNAseq Version 0
Attribute | Type | Description | Allowable Values | Required |
---|---|---|---|---|
version | Allowable Value | Version of the schema to use when validating this metadata. | [‘2’] | True |
description | Textfield | Free-text description of this assay. | | True |
donor_id | Textfield | HuBMAP Display ID of the donor of the assayed tissue. | | True |
tissue_id | Textfield | HuBMAP Display ID of the assayed tissue. | | True |
execution_datetime | Datetime | Start date and time of the assay, typically taken from a date-time stamped folder generated by the acquisition instrument. Format: YYYY-MM-DD hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the hour, and mm is the minutes, each with leading zeros. | | True |
protocols_io_doi | Textfield | DOI for protocols.io referring to the protocol for this assay. | | True |
operator | Textfield | Name of the person responsible for executing the assay. | | True |
operator_email | Textfield | Email address for the operator. | | True |
pi | Textfield | Name of the principal investigator responsible for the data. | | True |
pi_email | Textfield | Email address for the principal investigator. | | True |
assay_category | Allowable Value | Each assay is placed into one of the following 4 general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence. | [‘sequence’] | True |
assay_type | Allowable Value | The specific type of assay being executed. | [‘scRNAseq-10xGenomics-v2’, ‘scRNAseq-10xGenomics-v3’, ‘snRNAseq-10xGenomics-v2’, ‘snRNAseq-10xGenomics-v3’, ‘scRNAseq’, ‘sciRNAseq’, ‘snRNAseq’, ‘SNARE2-RNAseq’] | True |
analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | [‘RNA’] | True |
is_targeted | Allowable Value | Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay. | [‘Yes’, ‘No’] | True |
acquisition_instrument_vendor | Textfield | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. | | True |
acquisition_instrument_model | Textfield | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | | True |
sc_isolation_protocols_io_doi | Textfield | Link to a protocols document answering the question: How were single cells separated into a single-cell suspension? | | True |
sc_isolation_entity | Allowable Value | The type of single cell entity derived from isolation protocol | [‘whole cell’, ‘nucleus’, ‘cell-cell multimer’, ‘spatially encoded cell barcoding’] | True |
sc_isolation_tissue_dissociation | Textfield | The method by which tissues are dissociated into single cells in suspension. | | True |
sc_isolation_enrichment | Allowable Value | The method by which specific cell populations are sorted or enriched. | [‘none’, ‘FACS’] | True |
sc_isolation_quality_metric | Textfield | A quality metric by visual inspection prior to cell lysis or defined by known parameters such as wells with several cells or no cells. This can be captured at a high level. | | True |
sc_isolation_cell_number | Numeric | Total number of cells/nuclei yielded post dissociation and enrichment | | True |
rnaseq_assay_input | Numeric | Number of cells/nuclei input to the assay | | True |
rnaseq_assay_method | Textfield | The kit used for the RNA sequencing assay | | True |
library_construction_protocols_io_doi | Textfield | A link to the protocol document containing the library construction method (including version) that was used, e.g. “Smart-Seq2”, “Drop-Seq”, “10X v3”. | | True |
library_layout | Allowable Value | State whether the library was generated for single-end or paired-end sequencing. | [‘single-end’, ‘paired-end’] | True |
library_adapter_sequence | Textfield | Adapter sequence to be used for adapter trimming | | True |
library_id | Textfield | A library ID, unique within a TMC, which allows corresponding RNA and chromatin accessibility datasets to be linked. | | True |
is_technical_replicate | Allowable Value | Whether the sequencing reaction is run in replicate. | [‘Yes’, ‘No’] | True |
cell_barcode_read | Textfield | Which read file(s) contain the cell barcode. Multiple cell_barcode_read files must be provided as a comma-delimited list (e.g. file1,file2,file3). | | False |
cell_barcode_offset | Textfield | Position(s) in the read at which the cell barcode starts. | | False |
cell_barcode_size | Textfield | Length of the cell barcode in base pairs | | False |
expected_cell_count | Numeric | How many cells are expected? This may be used in downstream pipelines to guide selection of cell barcodes or segmentation parameters. | | False |
library_pcr_cycles | Numeric | Number of PCR cycles to amplify cDNA | | True |
library_pcr_cycles_for_sample_index | Numeric | Number of PCR cycles performed for library indexing | | True |
library_final_yield_value | Numeric | Total mass of library, in ng, after the final PCR amplification step, calculated as concentration (ng/µL) * volume (µL). | | True |
library_final_yield_unit | Allowable Value | Units of final library yield | [‘ng’] | False |
library_average_fragment_size | Numeric | Average size in base pairs (bp) of sequencing library fragments, estimated via gel electrophoresis or Bioanalyzer/TapeStation. | | True |
sequencing_reagent_kit | Textfield | Reagent kit used for sequencing | | True |
sequencing_read_format | Textfield | Slash-delimited list of the number of sequencing cycles for each read, for example Read1, i7 index, i5 index, and Read2. | | True |
sequencing_read_percent_q30 | Numeric | Percentage of bases with a quality score of Q30 or higher, computed as the base-weighted average over all reads, e.g. (# bases UMI * q30 UMI + # bases R2 * q30 R2 + …) / (total # bases). | | True |
sequencing_phix_percent | Numeric | Percent PhiX loaded to the run | | True |
contributors_path | Textfield | Relative path to file with ORCID IDs for contributors for this dataset. | | True |
data_path | Textfield | Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions. | | True |
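
Several attributes in this table pack more than one value into a single text field. The sketch below shows one way a downstream script might unpack the comma-delimited `cell_barcode_read` and the slash-delimited `sequencing_read_format`, and derive `library_final_yield_value` from concentration and volume. The function names and the example values are assumptions for illustration only and are not taken from HuBMAP's pipelines.

```python
def parse_barcode_reads(value):
    """Split a comma-delimited cell_barcode_read value into file names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def parse_read_format(value):
    """Split a slash-delimited sequencing_read_format into cycle counts."""
    return [int(part) for part in value.split("/")]


def final_yield_ng(concentration_ng_per_ul, volume_ul):
    """Library yield in ng: concentration (ng/ul) multiplied by volume (ul)."""
    return concentration_ng_per_ul * volume_ul


print(parse_barcode_reads("file1,file2,file3"))  # ['file1', 'file2', 'file3']
print(parse_read_format("28/8/0/91"))            # hypothetical run: [28, 8, 0, 91]
print(final_yield_ng(12.5, 40))                  # hypothetical library: 500.0
```

`parse_read_format` assumes every slash-separated entry is a whole number of cycles and raises `ValueError` otherwise, which surfaces malformed entries early.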