Contributors | Affiliation | Role |
---|---|---|
Saito, Mak A. | Woods Hole Oceanographic Institution (WHOI) | Principal Investigator |
Saunders, Jaci | Woods Hole Oceanographic Institution (WHOI) | Scientist |
York, Amber D. | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager |
Methods & Sampling:
Samples were handled as described in (Saunders et al., submitted) and (McIlvin et al., 2021). There are a total of 107,579 unique peptide sequences from 56,543 protein groups (88,251 proteins). Proteins were extracted from biomass collected on a quarter section of a 142 mm 0.2 µm Supor filter (Pall Corporation) after pre-filtration through a 3.0 µm Supor filter. Proteins were extracted using a modified SP3 magnetic bead method (Hughes et al., 2014). Extracted proteins were quatified using the bicinchoninic acid method (Thermo Scientific Micro BCA protein assay kit) with an albumin protein reference standard. Extracted protein was purified and digested with trypsin. Purified and digested protein was then injected into an online nanoflow 2D active modulation liquid chromatography separation (McIlvin et al., 2021). Eluent flowed inline into the ion source of a Thermo Fusion quadrupole-Orbitrap mass spectrometer (Thermo Scientific). MS/MS spectra (and mass spec methods) are available at the PRIDE repository under accession PXD030684. Peptide to spectrum matching (PSM) on the MS/MS data was conducted with SEQUEST HT using Thermo Proteome Discoverer v 2.1 software against a database of predicted proteins from an assembled metagenome from the central Pacific Ocean. Additional protein inference and FDR calculations were conducted with Scaffold v 4.8.7. Identification was conducted with decoy false discovery rates (FDR) with a threshold of 95% minimum for peptides (FDR=0.1%) and a threshold of 99% (1 peptide minimum) for proteins (FDR=1.6%). Protein level inference for parsimony-based assignments of specific proteins was conducted using experiment-wide grouping with binary peptide-protein weights in Scaffold. All PSM and protein quantification in Scaffold was conducted using Exclusive Counts (not double-counted, PSM assignment to protein groups using parsimony).
Location: Pelagic central Pacific Ocean from 20 – 1250 m depth
Data Processing:
The raw mass spectra files were searched against SEQUEST within Proteome Discoverer v2.2 software. Raw mass spectra files are available in ProteomeXchange via the PRIDE database with accession # PXD030684 and 10.6019/PXD030684. Processed files were then loaded into Proteome Software and exclusive peptide reports were exported. The files were modified to scale and normalize the spectral counts at across the entire dataset. The peptide report was too large to work within Excel and was modified in Pandas/Python to produce a CSV file. Lowest Common Ancestor (LCA) analysis was conducted with METATRYP v 2 (Saunders et al 2020). Grouping of taxonomic and KO levels was conducted with pandas in python.
BCO-DMO data manager processing notes:
* Un-named first data column with row index removed (verified with submitter).
File |
---|
ProteOMZ Exclusive Peptide Level Spectral Counts filename: proteomz_peptide_spec_counts.csv (Comma Separated Values (.csv), 1.75 GB) MD5:81ec05e6f8abbd6e2ac31a3c26963053 ProteOMZ Exclusive Peptide Level Spectral Counts data table in csv format. See Parameters section for data table column names, descriptions, and units. |
Parameter | Description | Units |
MSMS_sample_name | RAW file name; crossreferences to proteomexchange or OPP repository | unitless |
Stn | Station number | unitless |
Depth | Depth of sampling | meters |
Longitude | Longitude | decimal degrees |
Latitude | Latitude | decimal degrees |
DepthGroup | K-means cluster of depths across transect | unitless |
Region | Hierarchical cluster of stations across transect | unitless |
Peptide_sequence | Unique Peptide sequence; this is the most unique identifier | unitless |
Protein_name | Metagenome ID of best inferred protein from Scaffold protein inference; needed to map to protein IDs. | unitless |
Exclusive_Sum_PSM | Exclusive Sum of +2 +3 +4 data = total unnormalized spectral counts; Quantitative Value | count |
Scaling_Factor | Scaling factor used to normalize samples across experiment to control for identification bias of PSMs | unitless |
Calculated_Total_Protein | Concentration of protein per L of seawater passed through filter | ug/L |
Scaled_Corrected_Exclusive_Sum | Scaled and normalized exclusive sum of spectral counts | sc/L |
blast_accession | Accession number in NCBI of best blast hit for query of ORF from protein name identified | unitless |
blast_best_hit_taxon_id | best NCBI taxon ID for best blast hit in NCBI for query of ORF from protein name identified | unitless |
KO | accession number of KEGG Orthology (KO) identifier for ORF from protein name | unitless |
Group | NCBI taxonomic group for ORF from protein name | unitless |
Domain | NCBI taxonomic domain for ORF from protein name | unitless |
Phylum | NCBI taxonomic phylum for ORF from protein name | unitless |
Class | NCBI taxonomic class for ORF from protein name | unitless |
Order | NCBI taxonomic order for ORF from protein name | unitless |
Family | NCBI taxonomic family for ORF from protein name | unitless |
Genus | NCBI taxonomic genus for ORF from protein name | unitless |
Species | NCBI taxonomic species for ORF from protein name | unitless |
LCA_Group | Lowest Common Ancestor for groups of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
LCA_Domain | Lowest Common Ancestor for domains of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
LCA_Phylum | Lowest Common Ancestor for phyla of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
LCA_Class | Lowest Common Ancestor for classes of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
LCA_Order | Lowest Common Ancestor for orders of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
LCA_Family | Lowest Common Ancestor for families of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
LCA_Genus | Lowest Common Ancestor for genuses of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
LCA_Level | lowest taxonomic level of the Lowest Common Ancestor of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
LCA_taxon | lowest taxonomic identification of the Lowest Common Ancestor of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_Group | All groups of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_Domain | All domains of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_Phylum | All phyla of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_Class | All classes of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_Order | All orders of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_Family | All families of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_Genus | All genuses of all NCBI taxons from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_KO | All KEGG Orthology (KO) identifiers from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_KO_Orthology | All KEGG Orthology (KO) orthologies from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_KO_Class | All KEGG Orthology (KO) classes from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_KO_Path | All KEGG Orthology (KO) paths from ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_Gene_Name | All KEGG Orthology (KO) gene names form ORFs containing peptide sequence in the entire metagenome | unitless |
Combo_Gene_Description | All KEGG Orthology (KO) gene descriptions containing peptide sequence in the entire metagenome | unitless |
Combo_EC | All Enzyme Commission (E.C.) numbers from ORFs from ORFs containing peptide sequence in the entire metagenome | unitless |
Best_Peptide_identification_probability | Metric of peptide quality | unitless |
Best_Sequest_XCorr_Only_deltaCn | Metric of peptide quality | unitless |
Best_Sequest_XCorr_Only_XCorr | Metric of peptide quality | unitless |
Number_of_identified_plus2H_spectra | Count of +2H spectra | count |
Number_of_identified_plus3H_spectra | Count of +3H spectra | count |
Number_of_identified_plus4H_spectra | Count of +4H spectra | count |
Median_Retention_Time | Median retention time | minutes |
Total_TIC | Total Ion Current | unitless |
SOM_Label | Taxon and KO grouping label for SOM neural net | unitless |
SOM_Label_Flag | Flag for SOM neural net that indicates whether more than one group listed condensed into SOM Label | unitless |
All_Other_Proteins | All other ORFs in metagenome that contain the peptide sequences. Used to calculate LCA and Combos for taxonomy and KO annotations. | unitless |
Dataset-specific Instrument Name | McLane in situ filtration pump |
Generic Instrument Name | McLane Pump |
Generic Instrument Description | McLane pumps sample large volumes of seawater at depth. They are attached to a wire and lowered to different depths in the ocean. As the water is pumped through the filter, particles suspended in the ocean are collected on the filters. The pumps are then retrieved and the contents of the filters are analyzed in a lab. |
Website | |
Platform | R/V Falkor |
Report | |
Start Date | 2016-01-16 |
End Date | 2016-02-11 |
Description | Project: Using Proteomics to Understand Oxygen Minimum Zones (ProteOMZ)
More information is available from the ship operator at https://schmidtocean.org/cruise/investigating-life-without-oxygen-in-the...
Additional cruise information is available from the Rolling Deck to Repository (R2R): https://www.rvdata.us/search/cruise/FK160115 |
From Schmidt Ocean Institute's ProteOMZ Project page:
Rising temperatures, ocean acidification, and overfishing have now gained widespread notoriety as human-caused phenomena that are changing our seas. In recent years, scientists have increasingly recognized that there is yet another ingredient in that deleterious mix: a process called deoxygenation that results in less oxygen available in our seas.
Large-scale ocean circulation naturally results in low-oxygen areas of the ocean called oxygen deficient zones (ODZs). The cycling of carbon and nutrients – the foundation of marine life, called biogeochemistry – is fundamentally different in ODZs than in oxygen-rich areas. Because researchers think deoxygenation will greatly expand the total area of ODZs over the next 100 years, studying how these areas function now is important in predicting and understanding the oceans of the future. This first expedition of 2016 led by Dr. Mak Saito from the Woods Hole Oceanographic Institution (WHOI) along with scientists from University of Maryland Center for Environmental Science, University of California Santa Cruz, and University of Washington aimed to do just that, investigate ODZs.
During the 28 day voyage named “ProteOMZ,” researchers aboard R/V Falkor traveled from Honolulu, Hawaii to Tahiti to describe the biogeochemical processes that occur within this particular swath of the ocean’s ODZs. By doing so, they contributed to our greater understanding of ODZs, gathered a database of baseline measurements to which future measurements can be compared, and established a new methodology that could be used in future research on these expanding ODZs.
In support of obtaining deeper knowledge of major biogeochemically relevant proteins to inform a mechanistic understanding of global marine biogeochemical cycles.
A Gordon and Betty Moore Foundation Program.
Forging a new paradigm in marine microbial ecology:
Microbes in the ocean produce half of the oxygen on the planet and remove vast amounts of carbon dioxide, a greenhouse gas, from the atmosphere. Yet, we have known surprisingly little about these microscopic organisms. As we discover answers to some long-standing puzzles about the roles that marine microorganisms play in supporting the ocean’s food webs and driving global elemental cycles, we realized that we still need to learn much more about what these organisms do and how they do it—including how they evolved and contribute to our ocean's health and productivity.
The Marine Microbiology Initiative seeks to gain a comprehensive understanding of marine microbial communities, including their diversity, functions and behaviors; their ecological roles; and their origins and evolution. Our focus has been to enable researchers to uncover the principles that govern the interactions among microbes and that govern microbially mediated nutrient flow in the sea. To address these opportunities, we support leaders in the field through investigator awards, multidisciplinary team research projects, and efforts to create resources of broad use to the research community. We also support development of new instrumentation, tools, technologies and genetic approaches.
Through the efforts of many scientists from around the world, the initiative has been catalyzing new science through advances in methods and technology, and to reduce interdisciplinary barriers slowing progress. With our support, researchers are quantifying nutrient pools in the ocean, deciphering the genetic and biochemical bases of microbial metabolism, and understanding how microbes interact with one another. The initiative has five grant portfolios:
Individual investigator awards for current and emerging leaders in the field.
Multidisciplinary projects that support collaboration across disciplines.
New instrumentation, tools and technology that enable scientists to ask new questions in ways previously not possible.
Community resource efforts that fund the creation and sharing of data and the development of tools, methods and infrastructure of widespread utility.
Projects that advance genetic tools to enable development of experimental model systems in marine microbial ecology.
We also bring together scientists to discuss timely subjects and to facilitate scientific exchange.
Our path to marine microbial ecology was a confluence of new technology that could accelerate science and an opportunity to support a field that was not well funded relative to potential impact. Around the time we began this work in 2004, the life sciences were entering a new era of DNA sequencing and genomics, expanding possibilities for scientific research – including the nascent field of marine microbial ecology. Through conversations with pioneers inside and outside the field, an opportunity was identified: to apply these new sequencing tools to advance knowledge of marine microbial communities and reveal how they support and influence ocean systems.
After many years of success, we will wind down this effort and close the initiative in 2021. We will have invested more than $250 million over 17 years to deepen understanding of the diversity, ecological activities and evolution of marine microbial communities. Thanks to the work of hundreds of scientists and others involved with the initiative, the goals have been achieved and the field has been profoundly enriched; it is now positioned to address new scientific questions using innovative technologies and methods.
Funding Source | Award |
---|---|
Gordon and Betty Moore Foundation: Marine Microbiology Initiative (MMI) | |
Schmidt Ocean Institute (SOI) |