Dataset: Halanych-Kocot_2014_T1: NCBI accessions
Deployment: Halanych_lab_2011-16

Antarctic invertebrate collection locations and NCBI SRA accessions

Principal Investigator:

Kenneth M. Halanych (Auburn University)

Co-Principal Investigator:

Andrew Mahon (Central Michigan University)

BCO-DMO Data Manager:

Nancy Copley (Woods Hole Oceanographic Institution, WHOI BCO-DMO)

Project:

Genetic connectivity and biogeographic patterns of Antarctic benthic invertebrates (Antarctic Inverts)

Current State:

Preliminary and in progress

Version Date:

2016-12-22

Data URL:

https://www.bco-dmo.org/dataset-deployment/671645/data

Description

Methods & Sampling

Dataset acquisition description

From Halanych and Kocot (2014):

Taxon sampling
Transcriptomes were obtained for the 18 invertebrate species listed in Table 1. For most animals, adult body wall (epidermis, muscles, and underlying tissue) was used for transcriptomes. For entoprocts and small brachiopods (Novocrania anomala and Macandrevia cranium), whole-body tissue was employed; for larger brachiopods (Glottidia pyramidata, Hemithiris psittacea, and Laqueus californicus), extractions focused on mantle and muscle; for the solenogaster (a.k.a. neomenioid) aplacophoran mollusc Proneomenia custodiens, unhatched juveniles were used. Collection locality information is also given in Table 1. The three annelid transcriptomes (Magelona beckleyi, Paramphinome jeffreysii, and Phyllochaetopterus prolifica) have been previously published (Weigert et al., 2014) in a phylogenomics paper detailing basal annelid relations. The remainder were part of a phylogenomics project on lophotrochozoan relationships (Kocot, 2013).

Transcriptomic data collection
We follow the methods described in Weigert et al. (2014) and Kocot (2013), which are briefly summarized here. RNA was extracted with TRIzol (Invitrogen) and purified with the RNeasy kit (Qiagen; Valencia, CA) using on-column DNase digestion. In the case of the small juveniles of Proneomenia custodiens, RNA was extracted with the RNeasy Micro kit (Qiagen; Valencia, CA). The SMART cDNA library construction kit (Clontech Laboratories, Mountain View, CA) was used to construct cDNA libraries. These libraries were sequenced by The Genomic Services Lab at the Hudson Alpha Institute in Huntsville, Alabama, on the Illumina (San Diego, CA) HiSeq 2000 platform with 2x100 paired-end sequencing. To reduce memory usage during transcriptome assembly, sequencing data were digitally normalized to a k-mer coverage depth of 30 using the normalize-by-median.py script (Brown et al., 2012). Data were then assembled using Trinity (Grabherr et al., 2011; 8 June 2012 version) with default settings.
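
The idea behind digital normalization can be illustrated with a minimal sketch. This is not the normalize-by-median.py implementation, which uses a memory-efficient probabilistic counting structure; here an ordinary in-memory counter stands in for it. The function names, the k-mer size, and the toy reads are assumptions made for illustration. Only the coverage cutoff of 30 comes from the text above.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=20, cutoff=30):
    """Keep a read only if the median abundance of its k-mers,
    counted over the reads kept so far, is below the cutoff.

    k=20 is a hypothetical default; the source does not state k.
    """
    counts = Counter()
    kept = []
    for read in reads:
        words = kmers(read, k)
        if not words:
            continue  # read is shorter than k
        if median(counts[w] for w in words) < cutoff:
            kept.append(read)
            counts.update(words)  # only kept reads add to the counts
    return kept
```

Reads from regions that are already covered to the target depth are discarded, so highly expressed transcripts stop consuming assembly memory while rare ones are retained.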

Identification of genes
UniProt accession numbers for human toll-like receptor genes were obtained from InnateDB (Innate DB, 2014; Lynn et al., 2008; Breuer et al., 2013). Amino acid sequences for TLR1-TLR10 were obtained from the National Center for Biotechnology Information (NCBI; Table 2) and used as bait in a TBLASTN search (Altschul et al., 1990) against the assembled nucleotide transcriptomes. For these searches, an e-value of 10^-5 was employed and only top hits longer than 800 nucleotides (to filter out smaller gene fragments) were retained. Significant hits were translated to amino acid sequences with TransDecoder in the Trinity package (Grabherr et al., 2011), and redundant sequences were removed with cd-hit (Li and Godzik, 2006) using an identity (-c flag) of 1.0. To further confirm their identity, the translated hits were searched against the Swiss-Prot database using BLASTP with an e-value cutoff of 10^-5. As TransDecoder can yield multiple translations per nucleotide sequence, all translations were subjected to BLASTP, and only the best hits were retained. Sequences that returned the top hit with a significant e-value and matched to a TLR gene were retained and compared to the SMART database (SMART, 2014; Schultz et al., 1998; Letunic et al., 2012) using the "normal" search setting to identify protein domains.
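
The hit-filtering step could be sketched roughly as follows. The sketch assumes the TBLASTN results were saved in BLAST's standard 12-column tabular format, that "longer than 800 nucleotides" refers to the aligned span on the subject contig, and that "top hit" means the highest bit score per query; none of these details is stated in the source. The function name and the sample rows are invented for illustration.

```python
import csv
import io

# Column order of BLAST's default tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-5, min_nt=800):
    """Return {query: best subject} for hits passing both thresholds."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        evalue = float(row["evalue"])
        # Subject coordinates are in nucleotides; hits on the minus
        # strand have sstart > send, hence abs().
        span = abs(int(row["send"]) - int(row["sstart"])) + 1
        if evalue > max_evalue or span <= min_nt:
            continue
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or score > current[0]:
            best[row["qseqid"]] = (score, row["sseqid"])
    return {query: subject for query, (_, subject) in best.items()}


# Invented example rows (not real results).
demo = io.StringIO(
    "TLR1\tcontig_a\t35.0\t300\t150\t5\t1\t300\t1\t900\t1e-40\t150\n"
    "TLR1\tcontig_b\t40.0\t360\t170\t4\t1\t360\t1200\t100\t1e-60\t210\n"
    "TLR1\tcontig_c\t50.0\t200\t90\t2\t1\t200\t1\t600\t1e-70\t300\n"
)
print(filter_hits(demo))
```

In the invented rows, contig_c has the strongest score but spans only 600 nucleotides, so it is dropped as a likely fragment before the best hit is chosen.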

Previous phylogenetic analyses (e.g., Zheng et al., 2005; Davidson et al., 2008) on TLR genes typically used only the TIR domain for analyses. These analyses are limited to about 140-180 amino acids in length. Here, we aligned recovered TLR contigs from lophotrochozoans and Priapulus with human and Drosophila TLR genes using MAFFT ver. 7 (Katoh and Standley, 2013). Even after reducing the alignment to the conserved TIR domain, sequences were highly variable, resulting in a questionable alignment. Considering that the TIR domain is shared among several gene families (Aravind et al., 2001) and the variability of the alignment, we question whether a tree produced from these sequences accurately represents gene genealogy. This suspicion was confirmed by using the T-REX server (Boc et al., 2012) for a RAxML-VI-HPC (Stamatakis, 2006) analysis (parameters included PROTGAMMA, WAG, 100 bootstraps) that yielded a tree with very low nodal support values. Roughly 50% of the nodes reported a bootstrap value less than 40%, and exploring different parameters did little to improve results. We used a much broader taxon sampling than for previous TIR trees, which likely influenced variability of the alignment. Due to the questionable reliability of such a phylogenetic analysis, results are not included herein.
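
A summary such as "roughly 50% of the nodes below 40%" can be computed from a tree file in a few lines. The sketch below assumes the tree is a Newick string in which bootstrap support is written as a numeric label directly after each closing parenthesis; the function name and the toy tree are hypothetical and are not taken from the study.

```python
import re


def low_support_fraction(newick, threshold=40.0):
    """Fraction of labelled internal nodes with support below threshold."""
    # A support label is the number that directly follows ')'.
    supports = [float(s) for s in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]
    if not supports:
        raise ValueError("no support labels found in tree")
    low = sum(1 for s in supports if s < threshold)
    return low / len(supports)


# Toy tree with three labelled internal nodes (35, 90, 12).
toy = "((A:0.1,B:0.2)35:0.05,(C:0.1,D:0.1)90:0.02,(E:0.1,F:0.3)12:0.01);"
print(low_support_fraction(toy))
```

A regular expression is enough for this single statistic; a dedicated tree-parsing library would be the safer choice for trees with quoted or otherwise unusual labels.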

Data Processing Description

Dataset Processing Description

BCO-DMO Processing notes:
- added conventional header with dataset name, PI name, version date
- modified parameter names to conform with BCO-DMO naming conventions
- added links to NCBI accession pages
- converted lat and lon to decimal degrees
- replaced commas with underscores or semi-colons
- reformatted to flat table by adding columns for taxon1 and taxon2
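
Two of the processing steps above, the coordinate conversion and the comma replacement, can be sketched as follows. The original coordinate format is not stated, so the sketch assumes degrees, minutes, and seconds with a hemisphere letter; the function names and example values are hypothetical. The sign convention (north and east positive) follows the parameter descriptions for this dataset.

```python
def to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and west are returned as negative values.
    """
    hemi = hemisphere.upper()
    if hemi not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemi in ("S", "W") else value


def replace_commas(text, replacement="_"):
    """Replace commas (and a following space) so a field is safe
    inside a comma-delimited export."""
    return text.replace(", ", replacement).replace(",", replacement)


print(to_decimal_degrees(64, 30, 0, "S"))
print(replace_commas("Ross Sea, Antarctica", ";"))
```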

More information about this dataset deployment

Funding

Award Number	Funding Source
PLR-1043670	NSF Office of Polar Programs (formerly NSF PLR)
PLR-1043745	NSF Office of Polar Programs (formerly NSF PLR)

Instruments

Automated DNA Sequencer

Supplied Name: Illumina (San Diego, CA) HiSeq 2000 platform at The Genomic Services Lab at the Hudson Alpha Institute in Huntsville, Alabama

Supplied Description:

Instrument Type

Generic Name: Automated DNA Sequencer

Acronym: Automated Sequencer

Generic Description:

General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemiluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step.

Thermal Cycler

Supplied Name:

Supplied Description:

Instrument Type

Generic Name: Thermal Cycler

Generic Description:

A thermal cycler or "thermocycler" is a general term for a type of laboratory apparatus, commonly used for performing polymerase chain reaction (PCR), that is capable of repeatedly altering and maintaining specific temperatures for defined periods of time. The device has a thermal block with holes where tubes with the PCR reaction mixtures can be inserted. The cycler then raises and lowers the temperature of the block in discrete, pre-programmed steps. They can also be used to facilitate other temperature-sensitive reactions, including restriction enzyme digestion or rapid diagnostics.

(adapted from http://serc.carleton.edu/microbelife/research_methods/genomics/pcr.html)

Parameters

Supplied Name	Supplied description	Supplied Units	Standard Name
taxon1	taxonomic group	unitless	taxon
taxon2	more specific taxonomic group	unitless	taxon
species	taxonomic genus and species name	unitless	species
locality	location of specimen collection	unitless	region
latitude	latitude; north is positive	decimal degrees	lat
longitude	longitude; east is positive	decimal degrees	lon
NCBI_SRA_accession	SRA NCBI GenBank accession number	unitless	accession_number
link_NCBI_SRA_accession	link to SRA NCBI GenBank accession	unitless	external_link
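
For readers who want to load the table programmatically, the following is a minimal sketch. It assumes the data are exported as tab-delimited text with a header row that uses the supplied names listed above; the `Record` class, the `read_records` function, and the sample row (including its placeholder accession and link) are invented for illustration and do not come from the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Record:
    taxon1: str
    taxon2: str
    species: str
    locality: str
    latitude: float   # decimal degrees; north is positive
    longitude: float  # decimal degrees; east is positive
    accession: str
    link: str


def read_records(handle):
    """Parse tab-delimited rows into Record objects, checking ranges."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        lat = float(row["latitude"])
        lon = float(row["longitude"])
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"coordinates out of range: {lat}, {lon}")
        records.append(Record(
            row["taxon1"], row["taxon2"], row["species"], row["locality"],
            lat, lon,
            row["NCBI_SRA_accession"], row["link_NCBI_SRA_accession"],
        ))
    return records


# Placeholder row; every value here is made up.
sample = io.StringIO(
    "taxon1\ttaxon2\tspecies\tlocality\tlatitude\tlongitude\t"
    "NCBI_SRA_accession\tlink_NCBI_SRA_accession\n"
    "GroupA\tGroupB\tGenus species\tPlaceholder locality\t-64.5\t-62.26\t"
    "SRR0000000\thttps://example.org/SRR0000000\n"
)
print(read_records(sample))
```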
