WGS

NOTE: Several versions of this metadata schema have been created over time. The (Latest) version contains most attributes, but there may be some deprecated attributes in the older versions for which data has been collected. HuBMAP is in the process of creating a reference which combines all of these versions into a single view. That reference will be available here once completed.

Version 1 (no longer accepting data)

| Attribute | Type | Description | Allowable Values | Required |
| --- | --- | --- | --- | --- |
| version | Allowable Value | Version of the schema to use when validating this metadata. | ['1'] | True |
| description | Textfield | Free-text description of this assay. | | True |
| donor_id | Textfield | HuBMAP Display ID of the donor of the assayed tissue. | | True |
| tissue_id | Textfield | HuBMAP Display ID of the assayed tissue. | | True |
| execution_datetime | Datetime | Start date and time of the assay, typically taken from the date-time stamped folder generated by the acquisition instrument. Format: YYYY-MM-DD hh:mm, where YYYY is the year, MM the month, DD the day, hh the hour, and mm the minutes, with MM, DD, hh, and mm zero-padded to two digits. | | True |
| protocols_io_doi | Textfield | DOI for protocols.io referring to the protocol for this assay. | | True |
| operator | Textfield | Name of the person responsible for executing the assay. | | True |
| operator_email | Textfield | Email address for the operator. | | True |
| pi | Textfield | Name of the principal investigator responsible for the data. | | True |
| pi_email | Textfield | Email address for the principal investigator. | | True |
| assay_category | Allowable Value | Each assay is placed into one of the following four general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence. | ['sequence'] | True |
| assay_type | Allowable Value | The specific type of assay being executed. | ['WGS'] | True |
| analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | ['DNA'] | True |
| is_targeted | Allowable Value | Specifies whether the assay targets one or more specific molecules for detection or measurement. | ['Yes', 'No'] | True |
| acquisition_instrument_vendor | Textfield | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color, or signals representing the molecular mass. | | True |
| acquisition_instrument_model | Textfield | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | | True |
| gdna_fragmentation_quality_assurance | Allowable Value | Whether the gDNA integrity is good enough for WGS. This is usually checked by running a gel. | ['Pass', 'Fail'] | True |
| dna_assay_input_value | Numeric | Amount of DNA input into library preparation. | | True |
| dna_assay_input_unit | Allowable Value | Units of DNA input into library preparation. | ['ug'] | False |
| library_construction_method | Textfield | Describes the DNA library preparation kit: the method of isolating gDNA, fragmenting it, and generating sequencing libraries. | | True |
| library_construction_protocols_io_doi | Textfield | A link to the protocol document containing the library construction method (including version) that was used. | | True |
| library_layout | Allowable Value | States whether the library was generated for single-end or paired-end sequencing. | ['single-end', 'paired-end'] | True |
| library_adapter_sequence | Textfield | The adapter sequence to be used for adapter trimming, starting with the 5' end (e.g. 5-ATCCTGAGAA). | | True |
| library_final_yield | Numeric | Total amount of library after the final PCR amplification step. | | True |
| library_final_yield_unit | Allowable Value | Units of the total amount of library after the final PCR amplification step. | ['ng'] | False |
| library_average_fragment_size | Numeric | Average size in base pairs (bp) of sequencing library fragments, estimated via gel electrophoresis or Bioanalyzer/TapeStation. | | True |
| sequencing_reagent_kit | Textfield | Reagent kit used for sequencing. | | True |
| sequencing_read_format | Textfield | Slash-delimited list of the number of sequencing cycles for each read segment, for example Read1, i7 index, i5 index, and Read2. | | True |
| sequencing_read_percent_q30 | Numeric | Q30 is the weighted average of all the reads (e.g. # bases UMI × q30 UMI + # bases R2 × q30 R2 + …). | | True |
| sequencing_phix_percent | Numeric | Percent PhiX loaded onto the run. | | True |
| contributors_path | Textfield | Relative path to file with ORCID IDs for contributors for this dataset. | | True |
| data_path | Textfield | Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions. | | True |

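
The constraints in the Version 1 table above lend themselves to a simple automated check. The following is a minimal sketch only, not HuBMAP's actual validator: the names `ALLOWED`, `OPTIONAL`, and `check_row` are hypothetical, only the enumerated attributes and `execution_datetime` are covered, and it assumes one metadata row has already been read into a dict of strings.

```python
from datetime import datetime

# Allowable values copied from the Version 1 table (enumerated attributes only).
ALLOWED = {
    "version": {"1"},
    "assay_category": {"sequence"},
    "assay_type": {"WGS"},
    "analyte_class": {"DNA"},
    "is_targeted": {"Yes", "No"},
    "gdna_fragmentation_quality_assurance": {"Pass", "Fail"},
    "library_layout": {"single-end", "paired-end"},
    "dna_assay_input_unit": {"ug"},
    "library_final_yield_unit": {"ng"},
}

# Attributes marked Required = False in the table; they may be left empty.
OPTIONAL = {"dna_assay_input_unit", "library_final_yield_unit"}

DATETIME_FORMAT = "%Y-%m-%d %H:%M"  # YYYY-MM-DD hh:mm


def check_row(row):
    """Return a list of human-readable problems found in one metadata row."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = row.get(field, "").strip()
        if not value:
            if field not in OPTIONAL:
                problems.append(f"{field}: required value is missing")
        elif value not in allowed:
            problems.append(f"{field}: {value!r} not in {sorted(allowed)}")

    # strptime tolerates missing leading zeros, so round-trip the parsed
    # value and compare it with the input to enforce zero padding.
    stamp = row.get("execution_datetime", "").strip()
    try:
        parsed = datetime.strptime(stamp, DATETIME_FORMAT)
        if parsed.strftime(DATETIME_FORMAT) != stamp:
            raise ValueError("missing leading zeros")
    except ValueError:
        problems.append(f"execution_datetime: {stamp!r} is not YYYY-MM-DD hh:mm")
    return problems
```

Calling `check_row` on a row with `library_layout` set to `paired end` (no hyphen) or `execution_datetime` set to `2020-1-5 9:05` would report one problem for each. Free-text and numeric attributes would need their own checks.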
Version 0

| Attribute | Type | Description | Allowable Values | Required |
| --- | --- | --- | --- | --- |
| donor_id | Textfield | HuBMAP Display ID of the donor of the assayed tissue. | | True |
| tissue_id | Textfield | HuBMAP Display ID of the assayed tissue. | | True |
| execution_datetime | Datetime | Start date and time of the assay, typically taken from the date-time stamped folder generated by the acquisition instrument. Format: YYYY-MM-DD hh:mm, where YYYY is the year, MM the month, DD the day, hh the hour, and mm the minutes, with MM, DD, hh, and mm zero-padded to two digits. | | True |
| protocols_io_doi | Textfield | DOI for protocols.io referring to the protocol for this assay. | | True |
| operator | Textfield | Name of the person responsible for executing the assay. | | True |
| operator_email | Textfield | Email address for the operator. | | True |
| pi | Textfield | Name of the principal investigator responsible for the data. | | True |
| pi_email | Textfield | Email address for the principal investigator. | | True |
| assay_category | Allowable Value | Each assay is placed into one of the following four general categories: generation of images of microscopic entities, identification & quantitation of molecules by mass spectrometry, imaging mass spectrometry, and determination of nucleotide sequence. | ['sequence'] | True |
| assay_type | Allowable Value | The specific type of assay being executed. | ['WGS'] | True |
| analyte_class | Allowable Value | Analytes are the target molecules being measured with the assay. | ['DNA'] | True |
| is_targeted | Allowable Value | Specifies whether the assay targets one or more specific molecules for detection or measurement. | ['Yes', 'No'] | True |
| acquisition_instrument_vendor | Textfield | An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color, or signals representing the molecular mass. | | True |
| acquisition_instrument_model | Textfield | Manufacturers of an acquisition instrument may offer various versions (models) of that instrument with different features or sensitivities. Differences in features or sensitivities may be relevant to processing or interpretation of the data. | | True |
| gdna_fragmentation_quality_assurance | Allowable Value | Whether the gDNA integrity is good enough for WGS. This is usually checked by running a gel. | ['Pass', 'Fail'] | True |
| dna_assay_input_value | Numeric | Amount of DNA input into library preparation. | | True |
| dna_assay_input_unit | Allowable Value | Units of DNA input into library preparation. | ['ug'] | False |
| library_construction_method | Textfield | Describes the DNA library preparation kit: the method of isolating gDNA, fragmenting it, and generating sequencing libraries. | | True |
| library_construction_protocols_io_doi | Textfield | A link to the protocol document containing the library construction method (including version) that was used. | | True |
| library_layout | Allowable Value | States whether the library was generated for single-end or paired-end sequencing. | ['single-end', 'paired-end'] | True |
| library_adapter_sequence | Textfield | The adapter sequence to be used for adapter trimming, starting with the 5' end (e.g. 5-ATCCTGAGAA). | | True |
| library_final_yield | Numeric | Total amount of library after the final PCR amplification step. | | True |
| library_final_yield_unit | Allowable Value | Units of the total amount of library after the final PCR amplification step. | ['ng'] | False |
| library_average_fragment_size | Numeric | Average size in base pairs (bp) of sequencing library fragments, estimated via gel electrophoresis or Bioanalyzer/TapeStation. | | True |
| sequencing_reagent_kit | Textfield | Reagent kit used for sequencing. | | True |
| sequencing_read_format | Textfield | Slash-delimited list of the number of sequencing cycles for each read segment, for example Read1, i7 index, i5 index, and Read2. | | True |
| sequencing_read_percent_q30 | Numeric | Q30 is the weighted average of all the reads (e.g. # bases UMI × q30 UMI + # bases R2 × q30 R2 + …). | | True |
| sequencing_phix_percent | Numeric | Percent PhiX loaded onto the run. | | True |
| contributors_path | Textfield | Relative path to file with ORCID IDs for contributors for this dataset. | | True |
| data_path | Textfield | Relative path to file or directory with instrument data. Downstream processing will depend on filename extension conventions. | | True |
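
Two of the sequencing attributes above involve a small calculation. The sketch below shows one possible way to parse a slash-delimited `sequencing_read_format` value and to compute a base-weighted Q30 percentage in the spirit of `sequencing_read_percent_q30`. The function names, the cycle counts, and the per-segment Q30 figures are invented for illustration. The table gives only the weighted sum, so dividing by the total number of bases is an assumption, as is using cycle counts as a stand-in for base counts.

```python
def parse_read_format(read_format):
    """Split a slash-delimited cycle list such as '151/8/8/151' into ints."""
    return [int(part) for part in read_format.split("/")]


def weighted_q30(bases, q30_percent):
    """Base-weighted mean of per-segment Q30 percentages.

    bases        -- number of bases in each read segment
    q30_percent  -- Q30 percentage reported for each segment, same order
    """
    if len(bases) != len(q30_percent):
        raise ValueError("bases and q30_percent must have the same length")
    total = sum(bases)
    if total <= 0:
        raise ValueError("total number of bases must be positive")
    # Weighted sum (# bases * q30 per segment), normalised by total bases.
    return sum(b * q for b, q in zip(bases, q30_percent)) / total


# Hypothetical run with segments Read1, i7 index, i5 index, Read2.
cycles = parse_read_format("151/8/8/151")
q30 = weighted_q30(cycles, [95.0, 90.0, 90.0, 85.0])
```

Weighting by segment length keeps short index reads from having the same influence on the reported value as the much longer Read1 and Read2.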