Contributors | Affiliation | Role |
---|---|---|
Saito, Mak A. | Woods Hole Oceanographic Institution (WHOI) | Principal Investigator |
Ake, Hannah | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager |
York, Amber D. | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager |
These data are part of the Ocean Protein Portal "ProteOMZ" dataset (https://proteinportal.whoi.edu/; Saito et al., 2019).
The raw mass spectra files were searched using SEQUEST within Proteome Discoverer v2.2 software. Processed files were then loaded into Proteome Software, and protein and peptide reports as well as FASTA files were exported. The files were modified slightly to map to the Protein Portal data model for submission to BCO-DMO.
Preprocessing:
-Date, time, filter min, filter max, lat, lon, and cruise columns were added based on information from the Falkor 160115 Event log and CTD log.
-Column names reformatted to comply with BCO-DMO standards.
Dataset version 3 revision (2019-02-24): replaces the earlier data version dated 2018-05-25.
* Values "#N/A" changed to "Unknown", which has a different meaning than blank values. Unknown: "Protein functional and taxonomic annotations are marked as 'Unknown' for protein sequences which did not have any significant hits to known reference sequences or motifs in the metagenome annotation database."
* Event log and McLane pump log were updated to fix lat/lon, date/time issues. Since these were sources of information in this dataset, this dataset is also being updated. See respective "processing notes" sections for these two logs for detailed information about changes to those data sources. event log: https://www.bco-dmo.org/dataset/708384 pump log: https://www.bco-dmo.org/dataset/708495
* ISO_DateTime_UTC timestamp added from the date and time columns in the McLane pump log dataset. The date and time columns were in the HST time zone, so 10 hours were added to convert the time to UTC.
* Columns for the maximum and minimum pump filter size added (min 0.2, max 3.0).
* Some station and target depth combinations did not exist in the McLane pump log, so the missing values were added to the McLane pump log. The following columns in this dataset come from the pump log: cruise, cast, date, time, lat, lon, depth (ISO_DateTime_UTC is derived from the local date and time). Where there were two matched casts in the pump log for a station and target depth, the first cast for the station was used. The only differences between the two possible casts were the date/time columns.
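The HST-to-UTC step described above can be sketched in Python as follows. This is a minimal illustration, not the BCO-DMO processing code: the function name `hst_to_iso_utc`, the input formats (`YYYY-MM-DD` and `HH:MM:SS`), and the sample values are assumptions. HST is treated as a fixed UTC-10 offset with no daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# HST is a fixed offset of UTC-10 (no daylight saving time).
HST = timezone(timedelta(hours=-10), name="HST")


def hst_to_iso_utc(date_ymd: str, time_hms: str) -> str:
    """Combine a local HST date and time and return an ISO 8601 UTC timestamp.

    The input formats (YYYY-MM-DD, HH:MM:SS) are assumed for illustration.
    """
    local = datetime.strptime(f"{date_ymd} {time_hms}", "%Y-%m-%d %H:%M:%S")
    local = local.replace(tzinfo=HST)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical sample values: 18:30 HST is 04:30 UTC on the following day.
print(hst_to_iso_utc("2016-01-17", "18:30:00"))  # 2016-01-18T04:30:00Z
```

Attaching the offset and converting, rather than adding 10 hours to the clock time by hand, lets the date roll over correctly when the local time falls in the afternoon or evening.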
Dataset Version 3: This file revision (2022-06-06) replaces a previous revision of dataset version 3 from 2019-02-24. This is still dataset version 3 because previous revisions of version 3 were not made public.
* Data from source file "ProteOMZ_proteins_for_OPP.csv" was imported into the BCO-DMO data system for this dataset. This file "ProteOMZ_proteins_for_OPP.csv" is from Ocean Protein Portal "ProteOMZ" dataset v3 (file version 2022-06-06)
** In the BCO-DMO data system missing data identifiers are displayed according to the format of data you access. For example, in csv files it will be blank (null) values. In Matlab .mat files it will be NaN values. When viewing data online at BCO-DMO, the missing value will be shown as blank (null) values.
* Column names adjusted to conform to BCO-DMO naming conventions designed to support broad re-use by a variety of research tools and scripting languages. [Only numbers, letters, and underscores. Cannot start with a number.] For example, date_y-m-d was changed to date_ymd.
* ISO DateTime with timezone (UTC) column added in ISO 8601 format from local date and times in HST.
* ".0" removed from ncbi_id values to correspond to the integer identifier at NCBI.
* Data table attached to dataset as Data File:"737620_v3_proteomz-proteins.csv"
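Two of the cleanup steps listed above, the column renaming and the removal of ".0" from ncbi_id values, could be approximated as shown below. This is a sketch under stated assumptions, not the procedure BCO-DMO actually ran: both function names are hypothetical, the notes give only one renaming example (date_y-m-d to date_ymd), and prefixing an underscore to names that start with a digit is an assumption of this sketch.

```python
import re


def normalize_column_name(name: str) -> str:
    """Keep only letters, digits, and underscores.

    Dropping disallowed characters reproduces the one documented example
    (date_y-m-d -> date_ymd). Prefixing "_" when the result starts with a
    digit is an assumption, not a documented BCO-DMO rule.
    """
    cleaned = re.sub(r"[^0-9A-Za-z_]", "", name)
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def clean_ncbi_id(value: str) -> str:
    """Strip a trailing '.0' so the value matches an integer NCBI identifier."""
    return value[:-2] if value.endswith(".0") else value


print(normalize_column_name("date_y-m-d"))  # date_ymd
print(clean_ncbi_id("1219.0"))              # 1219 (illustrative ID only)
```

Treating ncbi_id as text rather than converting it to a number leaves other values in the column unchanged.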
File |
---|
737620_v3_proteomz-proteins.csv (Comma Separated Values (.csv), 261.46 MB) MD5:f9e2a796920a457f8e89752bd74012cc Primary data file for dataset ID 737620, version 3 |
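Because the file is large (261.46 MB), it can be worth checking a download against the MD5 value listed above. The sketch below uses Python's standard `hashlib` and reads the file in chunks. The helper names `md5_of_file` and `matches_expected` are hypothetical, and the local path is whatever location the file was saved to.

```python
import hashlib

# MD5 value taken from the file listing above.
EXPECTED_MD5 = "f9e2a796920a457f8e89752bd74012cc"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True when the file's MD5 digest equals the published checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("737620_v3_proteomz-proteins.csv")` would return `True` for an intact copy saved under that name in the working directory.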
Parameter | Description | Units |
---|---|---|
sample_id | Unique sample name for the specific filter collected (station/depth/version if applicable) | unitless |
MS_MS_sample_name | Unique name for the mass spec sample and run | unitless |
station_id | The identifier for the station | unitless |
depth_m | Cast depth where sample was taken | meters |
latitude_dd | Latitude of station | decimal degrees |
longitude_dd | Longitude of station | decimal degrees |
date_ymd | Date of sampling (local time zone HST) | unitless |
time_hms | Time of sampling (local time zone HST) | unitless |
minimum_filter_size_microns | Minimum size of the collection filter | microns |
maximum_filter_size_microns | Maximum size of the collection filter | microns |
cruise_id | The unique cruise identifier | unitless |
protein_id | The specific name of the full-length protein sequence assembled in the metagenome that was used for peptide identification. This identifier uniquely identifies the protein within this dataset and the FASTA file (see Related Datasets). | unitless |
spectral_count_sum | Spectral count of each protein | count |
molecular_weight_kDa | Molecular weight of the full length protein sequences | kDa |
protein_name | Descriptive name of the function of the protein | unitless |
ncbi_id | NCBI Taxonomy organism identifier (for ncbi_name) | taxon |
ncbi_name | NCBI Taxonomy name (corresponding to ncbi_id) | verbatimIdentification |
kegg_id | The KEGG Orthology entry identifier for the best KEGG match. | unitless |
kegg_description | Description of the function of the specific KEGG protein group | unitless |
kegg_pathway | Description of the cellular pathway that the KEGG protein is a part of. | unitless |
pfams_id | Protein family (Pfam) ID number | unitless |
pfams_name | Protein family (Pfam) description | unitless |
uniprot_id | Uniprot database ID number | unitless |
enzyme_comm_id | Enzyme Commission ID number | unitless |
ISO_DateTime_UTC | Datetime with timezone (UTC) of sampling in ISO 8601 format | unitless |
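As an illustration of how the columns above might be consumed, the following sketch totals spectral_count_sum per sample_id with Python's standard `csv` module. It assumes a CSV whose header uses the parameter names in the table, treats blank cells as missing values (as described in the processing notes), and runs on a small in-memory sample. The function name and all row values are invented for the example.

```python
import csv
import io
from collections import defaultdict


def spectral_counts_by_sample(handle):
    """Sum spectral_count_sum for each sample_id, skipping blank (missing) cells."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        raw = row["spectral_count_sum"]
        if raw == "":  # blank cell = missing value in the csv export
            continue
        totals[row["sample_id"]] += float(raw)
    return dict(totals)


# Invented rows for illustration only; not values from the dataset.
sample = io.StringIO(
    "sample_id,protein_id,spectral_count_sum,protein_name\n"
    "S1_40m,prot_a,3,Unknown\n"
    "S1_40m,prot_b,,Unknown\n"
    "S2_100m,prot_a,5,hypothetical transporter\n"
)
print(spectral_counts_by_sample(sample))  # {'S1_40m': 3.0, 'S2_100m': 5.0}
```

Reading row by row keeps memory use low for the full file. Note that "Unknown" in annotation columns such as protein_name is a real value, distinct from a blank cell.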
Dataset-specific Instrument Name | Alpkem Autosampler |
Generic Instrument Name | Alpkem RFA300 |
Dataset-specific Description | Used in nutrient analysis |
Generic Instrument Description | A rapid flow analyser (RFA) that may be used to measure nutrient concentrations in seawater. It is an air-segmented, continuous flow instrument comprising a sampler, a peristaltic pump which simultaneously pumps samples, reagents and air bubbles through the system, analytical cartridge, heating bath, colorimeter, data station, and printer. The RFA-300 was a precursor to the smaller Alpkem RFA/2 (also RFA II or RFA-2). |
Dataset-specific Instrument Name | SeaBird SBE19 CTD |
Generic Instrument Name | CTD Sea-Bird |
Dataset-specific Description | Used for water sampling |
Generic Instrument Description | Conductivity, Temperature, Depth (CTD) sensor package from SeaBird Electronics, no specific unit identified. This instrument designation is used when specific make and model are not known. See also other SeaBird instruments listed under CTD. More information from Sea-Bird Electronics. |
Dataset-specific Instrument Name | Technicon AutoAnalyzer II |
Generic Instrument Name | Technicon AutoAnalyzer II |
Dataset-specific Description | Used to measure phosphate and ammonium |
Generic Instrument Description | A rapid flow analyzer that may be used to measure nutrient concentrations in seawater. It is a continuous segmented flow instrument consisting of a sampler, peristaltic pump, analytical cartridge, heating bath, and colorimeter. See more information about this instrument from the manufacturer. |
Dataset-specific Instrument Name | Trace Metal Rosette |
Generic Instrument Name | Trace Metal Bottle |
Dataset-specific Description | Used for nutrient sampling |
Generic Instrument Description | Trace metal (TM) clean rosette bottle used for collecting trace metal clean seawater samples. |
Website | |
Platform | R/V Falkor |
Report | |
Start Date | 2016-01-16 |
End Date | 2016-02-11 |
Description | Project: Using Proteomics to Understand Oxygen Minimum Zones (ProteOMZ)
More information is available from the ship operator at https://schmidtocean.org/cruise/investigating-life-without-oxygen-in-the...
Additional cruise information is available from the Rolling Deck to Repository (R2R): https://www.rvdata.us/search/cruise/FK160115 |
From Schmidt Ocean Institute's ProteOMZ Project page:
Rising temperatures, ocean acidification, and overfishing have now gained widespread notoriety as human-caused phenomena that are changing our seas. In recent years, scientists have increasingly recognized that there is yet another ingredient in that deleterious mix: a process called deoxygenation that results in less oxygen available in our seas.
Large-scale ocean circulation naturally results in low-oxygen areas of the ocean called oxygen deficient zones (ODZs). The cycling of carbon and nutrients – the foundation of marine life, called biogeochemistry – is fundamentally different in ODZs than in oxygen-rich areas. Because researchers think deoxygenation will greatly expand the total area of ODZs over the next 100 years, studying how these areas function now is important in predicting and understanding the oceans of the future. This first expedition of 2016 led by Dr. Mak Saito from the Woods Hole Oceanographic Institution (WHOI) along with scientists from University of Maryland Center for Environmental Science, University of California Santa Cruz, and University of Washington aimed to do just that, investigate ODZs.
During the 28 day voyage named “ProteOMZ,” researchers aboard R/V Falkor traveled from Honolulu, Hawaii to Tahiti to describe the biogeochemical processes that occur within this particular swath of the ocean’s ODZs. By doing so, they contributed to our greater understanding of ODZs, gathered a database of baseline measurements to which future measurements can be compared, and established a new methodology that could be used in future research on these expanding ODZs.
Funding Source | Award |
---|---|
Gordon and Betty Moore Foundation: Marine Microbiology Initiative (MMI) | |
Schmidt Ocean Institute (SOI) |