Microbial eukaryotic focused metatranscriptome data from seawater collected in coastal California in May of 2015

Website: https://www.bco-dmo.org/dataset/745518

Data Type: Other Field Results

Version: 2

Version Date: 2018-10-15

Project

» Protistan, prokaryotic, and viral processes at the San Pedro Ocean Time-series (SPOT)

Contributors	Affiliation	Role
Caron, David	University of Southern California (USC)	Principal Investigator
Hu, Sarah K.	University of Southern California (USC)	Co-Principal Investigator, Contact
York, Amber D.	Woods Hole Oceanographic Institution (WHOI BCO-DMO)	BCO-DMO Data Manager

Abstract

Seawater was collected with Niskin bottles mounted on a CTD rosette at the San Pedro Ocean Time-series (SPOT) station off the coast of Southern California, near the surface (5 m) and at 150 and 890 m, in late May 2015. Raw sequence data were generated as part of a metatranscriptome study targeting the protistan community. Raw sequences are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database (SRA Study ID: SRP110974, BioProject: PRJNA391503). Sequences for BioProject PRJNA608423 will be available at NCBI on January 1, 2021. These data were published in Hu et al. (2018).

Coverage
Dataset Description
Data Files
Supplemental Files
Related Publications
Parameters
Instruments
Deployments
Project Information
Funding

Coverage

Spatial Extent: Lat:33.55 Lon:-118.4

Temporal Extent: 2015-05-20

Dataset Description

Methods & Sampling

Seawater was collected from the San Pedro Ocean Time-series (SPOT) station off the coast of Southern California near the surface (5 m) and at 150 and 890 m in late May 2015. Briefly, seawater was pre-filtered (80 µm) into 20 L carboys to minimize the presence of multicellular eukaryotes. Replicate samples (1.5-3.5 L each) from each depth were filtered onto sterile GF/F filters (nominal pore size 0.7 µm; Whatman International Ltd., Florham Park, NJ). Although sample handling (e.g., bringing samples to the surface) may have had some unavoidable impact on the results, filters were immediately placed in 1.5 mL of lysis buffer and flash frozen in liquid nitrogen within 40 min, away from light, to minimize RNA degradation.

Total RNA was extracted from each filter using an AllPrep DNA/RNA kit (Qiagen, Valencia, CA, #80204) with an in-line genomic DNA removal step (RNase-free DNase reagents, Qiagen #79254) (dx.doi.org/10.17504/protocols.io.hk3b4yn). Extracted RNA was quality checked, and low-biomass samples were pooled. Six replicates from the surface were processed and sequenced, while pairs of filters were pooled for the 150 and 890 m samples, yielding 3 and 4 replicates, respectively (Hu et al. 2018, Supporting Information Table S1). RNA concentrations were normalized before library preparation (Hu et al. 2018, Supporting Information). ERCC spike-in controls were added before sequence library preparation with the Kapa Stranded mRNA library preparation kit, which uses poly-A tail selection beads to select for eukaryotic mRNA (Kapa Biosystems, Inc., Wilmington, MA, #KK8420).

Also see:

https://www.protocols.io/view/sample-collection-from-the-field-for-downstream-mo-hisb4ee
https://www.protocols.io/view/rna-and-optional-dna-extraction-from-environmental-hk3b4yn

The associated assembly files are available on Zenodo (see Hu, S. K. (2017), DOI: 10.5281/zenodo.1202041). The assembly files were also published with the journal article Hu et al. (2018).

Related code can be found in the GitHub repository https://github.com/shu251/SPOT_metatranscriptome. The version of the code used for these publications is provided in the Supplemental Files section of this page.

Data Processing Description

Sequence adapters, low-quality bases (Phred score < 10, trimmed from the 5′ and 3′ ends and within a 25 bp sliding window), short sequences (< 50 bp), and sequences containing more than 50 consecutive As or Ts were removed using Trimmomatic v. 0.32 (Bolger et al., 2014). All quality-trimmed sequences were aligned to ERCC sequences using 'align_and_estimate_abundance.pl' from the Trinity v. 2.1.1 package (Grabherr et al., 2011). Reads that aligned to ERCC sequences were removed using a custom Perl script (available at https://github.com/shu251/SPOT_metatranscriptome).
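The quality-control rules above can be illustrated with a small pure-Python sketch. It is not Trimmomatic and not the authors' Perl script; it is a simplified re-implementation that assumes Phred+33 quality encoding and a sliding-window mean-quality threshold of 10 (the text gives the 25 bp window size but not the window threshold, so that value is an assumption). Its window rule, cutting at the start of the first failing window, may differ in detail from Trimmomatic's. The ERCC step is reduced to dropping read IDs that aligned to ERCC sequences. The names `trim_read`, `phred33`, and `remove_ercc_reads` are hypothetical.

```python
from typing import Iterable, List, Optional

# Thresholds from the text, except WINDOW_QUAL, which is an assumption.
MIN_QUAL = 10          # Phred cutoff for 5'/3' end trimming
WINDOW = 25            # sliding-window size (bp)
WINDOW_QUAL = 10       # ASSUMED minimum mean quality within a window
MIN_LEN = 50           # discard reads shorter than this after trimming
MAX_HOMOPOLYMER = 50   # discard reads with more than this many consecutive A or T


def phred33(qual_str: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual_str]


def trim_read(seq: str, quals: List[int]) -> Optional[str]:
    """Return the trimmed sequence, or None if the read should be discarded."""
    start, end = 0, len(seq)
    # Drop low-quality bases from the 5' end.
    while start < end and quals[start] < MIN_QUAL:
        start += 1
    # Drop low-quality bases from the 3' end.
    while end > start and quals[end - 1] < MIN_QUAL:
        end -= 1
    # Simplified sliding window: scan 5'->3' and cut at the start of the
    # first window whose mean quality falls below WINDOW_QUAL.
    for i in range(start, end - WINDOW + 1):
        if sum(quals[i:i + WINDOW]) / WINDOW < WINDOW_QUAL:
            end = i
            break
    trimmed = seq[start:end]
    if len(trimmed) < MIN_LEN:
        return None
    # Reject reads containing a poly-A or poly-T run longer than MAX_HOMOPOLYMER.
    run = MAX_HOMOPOLYMER + 1
    if "A" * run in trimmed or "T" * run in trimmed:
        return None
    return trimmed


def remove_ercc_reads(read_ids: Iterable[str], ercc_ids: Iterable[str]) -> List[str]:
    """Keep only reads whose IDs did not align to ERCC spike-in sequences."""
    ercc = set(ercc_ids)
    return [r for r in read_ids if r not in ercc]
```

For example, `trim_read("ACGT" * 20, phred33("I" * 80))` keeps the full 80 bp read, whereas a read of 60 consecutive As followed by 20 Cs is discarded by the poly-A rule.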

BCO-DMO Processing Description

[Changes discussed with and reviewed by the data submitter]
* Added a conventional header with dataset name, PI name, and version date.
* Modified parameter names to conform with BCO-DMO naming conventions.
* Blank values in this dataset are displayed as "nd" for "no data"; nd is the default missing-data identifier in the BCO-DMO system.
* Removed the column "object_status" and the blank columns "filename3" and "filename4".
* Added column SRA_ID_link with a link to the SRA run at NCBI.
* Removed the column "assembly", which contained the value "See related publication for access to assembly files"; this information was moved to the Methods & Sampling section and expanded there.
* For curatorial purposes, BCO-DMO forked the GitHub code repository https://github.com/shu251/SPOT_metatranscriptome and created a GitHub release (see https://github.com/BCODMO/SPOT_metatranscriptome/releases/tag/bcodmo_v1). The release .zip file was downloaded to BCO-DMO's servers and added to the dataset landing page as a supplemental file to satisfy NSF OCE data-sharing requirements.
* Changes in version 2: data for BioProject PRJNA608423 were added to the dataset.
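Because missing values are written as "nd", users of metaT.csv may want to convert them to a proper null when loading the file. The sketch below uses only the Python standard library and assumes the file's first row holds the column names listed under Parameters; the function name `read_bcodmo_csv` and the example rows are hypothetical, not values from the dataset.

```python
import csv
import io
from typing import Dict, List, Optional

MISSING = "nd"  # BCO-DMO's default missing-data identifier


def read_bcodmo_csv(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse BCO-DMO CSV text into row dicts, mapping "nd" cells to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {col: (None if val == MISSING else val) for col, val in row.items()}
        for row in reader
    ]


# Hypothetical example rows; real values come from metaT.csv.
example = "SRA_run,filename2\nRUN_A,read2.fastq.gz\nRUN_B,nd\n"
rows = read_bcodmo_csv(example)
```

To load the real file, read it with `open("metaT.csv", newline="")` and pass its contents to `read_bcodmo_csv`. With pandas, `pandas.read_csv("metaT.csv", na_values=["nd"])` achieves the same mapping.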

Data Files

File
metaT.csv
(Comma Separated Values (.csv), 11.80 KB)
MD5: cbec0cc726f0b64619a035aba3abdbfa
Primary data file for dataset ID 745518
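The MD5 checksums listed for metaT.csv and for the supplemental ZIP archive let users confirm that a download is complete and unaltered. A minimal sketch using Python's `hashlib`; the helper name `md5_of_file` is hypothetical, and the expected value is the checksum listed for metaT.csv.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_MD5 = "cbec0cc726f0b64619a035aba3abdbfa"  # listed checksum for metaT.csv
```

After downloading, `md5_of_file("metaT.csv") == EXPECTED_MD5` should be true. The same check applies to SPOT_metatranscriptome-bcodmo_v1.zip using its own listed checksum.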

Supplemental Files

File
SPOT metatranscriptome code from GitHub, release bcodmo_v1
filename: SPOT_metatranscriptome-bcodmo_v1.zip
(ZIP Archive (ZIP), 810.06 KB)
MD5: 369ccce50b2d833723c7f9ea7607e78d
Zip file containing the code required for data compilation and analysis for a eukaryotic-focused metatranscriptome survey in the North Pacific. This is release "bcodmo_v1" (https://github.com/BCODMO/SPOT_metatranscriptome/tree/bcodmo_v1).

Related Publications

Hu, S. (2017). RNA (and optional DNA) extraction from environmental samples (filters) v2 (protocols.io.hk3b4yn). Protocols.io. doi:10.17504/protocols.io.hk3b4yn

Hu, S. (2017). Sample collection from the field for downstream molecular analysis - microbial eukaryote-focused v1. Protocols.io. doi:10.17504/protocols.io.hisb4ee

Hu, S. (2018). Required code for data compilation and analysis for a eukaryotic-focused metatranscriptome survey in the North Pacific (Version bcodmo_v1) [Computer software]. GitHub. Retrieved October 15, 2018, from https://github.com/BCODMO/SPOT_metatranscriptome/tree/bcodmo_v1

Hu, S. K. (2017). Shifting Metabolic Priorities Among Key Protistan Taxa Within And Below The Euphotic Zone [Data set]. Zenodo. https://doi.org/10.5281/zenodo.1202041

Hu, S. K., Liu, Z., Alexander, H., Campbell, V., Connell, P. E., Dyhrman, S. T., … Caron, D. A. (2018). Shifting metabolic priorities among key protistan taxa within and below the euphotic zone. Environmental Microbiology. doi:10.1111/1462-2920.14259

Parameters

Parameter	Description	Units
SRA_run	SRA Run identifier at NCBI	unitless
SRA_run_link	URL for SRA Run Page at NCBI	unitless
SRA_study	SRA study identifier at NCBI	unitless
bioproject_accession	BioProject accession number at NCBI	unitless
biosample_accession	BioSample accession number at NCBI	unitless
library_ID	SRA title	unitless
title	Descriptive title of SRA accession	unitless
sample_name	Sample name	unitless
library_strategy	Library strategy ("AMPLICON")	unitless
library_source	Library source ("TRANSCRIPTOMIC" or "GENOMIC")	unitless
library_selection	Library selection ("PCR")	unitless
library_layout	Library layout ("paired")	unitless
platform	Sequencing platform ("Illumina")	unitless
instrument_model	Sequencing instrument model ("Illumina MiSeq")	unitless
design_description	Sequencing design description	unitless
filetype	Type of files	unitless
filename	Name of file 1 (see NCBI for access)	unitless
filename2	Name of file 2 (see NCBI for access)	unitless

Instruments

Dataset-specific Instrument Name	HiSeq
Generic Instrument Name	Automated DNA Sequencer
Dataset-specific Description	HiSeq High Output 125 bp paired-end (PE) sequencing was performed at the UPC Genome Core, University of Southern California, Los Angeles, CA (BioProject: PRJNA391503).
Generic Instrument Description	A DNA sequencer is an instrument that determines the order of deoxynucleotides in deoxyribonucleic acid sequences.

Dataset-specific Instrument Name
Generic Instrument Name	Niskin bottle
Generic Instrument Description	A Niskin bottle (a next generation water sampler based on the Nansen bottle) is a cylindrical, non-metallic water collection device with stoppers at both ends. The bottles can be attached individually on a hydrowire or deployed in 12, 24, or 36 bottle Rosette systems mounted on a frame and combined with a CTD. Niskin bottles are used to collect discrete water samples for a range of measurements including pigments, nutrients, plankton, etc.

Deployments

SPOT_Yellowfin_Cruises

Website	https://www.bco-dmo.org/deployment/754348
Platform	R/V Yellowfin
Start Date	2005-01-19
End Date	2018-07-18
Description	Monthly SPOT cruises aboard R/V Yellowfin to the San Pedro Ocean Time-series (SPOT) station (33°33′N, 118°24′W) in the San Pedro Channel. Deployment: SPOT; Platform: R/V Yellowfin; Platform type: vessel.
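The station position here is given in degrees and minutes, while the Coverage section lists decimal degrees (Lat 33.55, Lon -118.4). The short conversion below shows that the two agree; `dms_to_decimal` is a hypothetical helper written for illustration, using the usual convention that southern latitudes and western longitudes are negative.

```python
def dms_to_decimal(degrees: float, minutes: float = 0.0, seconds: float = 0.0,
                   hemisphere: str = "N") -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value


lat = dms_to_decimal(33, 33, hemisphere="N")   # 33 + 33/60
lon = dms_to_decimal(118, 24, hemisphere="W")  # -(118 + 24/60)
print(f"{lat:.2f}, {lon:.2f}")  # 33.55, -118.40
```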

Project Information

Protistan, prokaryotic, and viral processes at the San Pedro Ocean Time-series (SPOT)

Coverage: San Pedro Channel off the coast of Los Angeles

Planktonic marine microbial communities consist of a diverse collection of bacteria, archaea, viruses, protists (phytoplankton and protozoa), and small animals (metazoans). Collectively, these organisms are responsible for virtually all marine pelagic primary production; they form the basis of food webs and carry out a large fraction of respiratory processes. Microbial interactions include the traditional role of predation, but recent research also recognizes the importance of parasitism, symbiosis, and viral infection. Characterizing the response of pelagic microbial communities and processes to environmental influences is fundamental to understanding and modeling carbon flow and energy utilization in the ocean, yet very few studies have examined all of these assemblages together. This project comprises long-term (monthly) and short-term (daily) sampling at the San Pedro Ocean Time-series (SPOT) site. Analysis of the resulting datasets investigates co-occurrence patterns of microbial taxa (e.g., protist-virus and protist-prokaryote interactions, both positive and negative) to indicate which species consistently co-occur and potentially interact, followed by examination of gene expression to help define the underlying mechanisms. This study augments 20 years of baseline studies of microbial abundance, diversity, and rates at the site, and will enable detection of low-frequency changes in composition and potential ecological interactions among microbes, as well as their responses to changing environmental forcing factors. These responses have important consequences for higher trophic levels and ocean-atmosphere feedbacks. The broader impacts of this project include training graduate and undergraduate students, providing local high school students with summer lab experiences, and PI presentations at local K-12 schools, museums, aquaria, and informal learning centers in the region. Additionally, the PIs advise at the local, county, and state levels regarding coastal marine water quality.

This research project is unique in that it is a holistic study (including all microbes from viruses to small metazoa) of microbial species diversity and ecological activities at the SPOT site off the coast of Southern California. By studying all microbes simultaneously, this work aims to identify important ecological interactions among microbial species and the basis(es) for those interactions. This research involves (1) extensive analyses of prokaryotic (archaeal and bacterial) and eukaryotic (protistan and micro-metazoan) diversity via the sequencing of marker genes, (2) studies of whole-community gene expression by eukaryotes and prokaryotes to identify key functional characteristics of microorganismal groups and to detect active viral infections, and (3) metagenomic analysis of viruses and bacteria to aid interpretation of transcriptomic analyses using genome-encoded information. The project includes exploratory metatranscriptomic analysis of poorly understood aphotic and hypoxic-zone protists to examine their stratification, functions, and hypothesized prokaryotic symbioses.

Funding

Funding Source	Award
NSF Division of Ocean Sciences (NSF OCE)	OCE-1737409
