Contributors | Affiliation | Role |
---|---|---|
Halanych, Kenneth M. | Auburn University | Principal Investigator |
Mahon, Andrew | Central Michigan University | Co-Principal Investigator |
Copley, Nancy | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager |
From Halanych and Kocot (2014):
Taxon sampling
Transcriptomes were obtained for the 18 invertebrate species listed in Table 1. For most animals, adult body wall (epidermis, muscles, and underlying tissue) was used for transcriptomes. For entoprocts and small brachiopods (Novocrania anomala and Macandrevia cranium), whole-body tissue was used; for larger brachiopods (Glottidia pyramidata, Hemithiris psittacea, and Laqueus californicus), extractions focused on mantle and muscle; for the solenogaster (a.k.a. neomenioid) aplacophoran mollusc Proneomenia custodiens, unhatched juveniles were used. Collection locality information is also given in Table 1. The three annelid transcriptomes (Magelona beckleyi, Paramphinome jeffreysii, and Phyllochaetopterus prolifica) were previously published (Weigert et al., 2014) in a phylogenomics paper detailing basal annelid relationships. The remainder were part of a phylogenomics project on lophotrochozoan relationships (Kocot, 2013).
Transcriptomic data collection
We follow the methods described in Weigert et al. (2014) and Kocot (2013), which are briefly summarized here. RNA was extracted with TRIzol (Invitrogen) and purified with the RNeasy kit (Qiagen; Valencia, CA) using on-column DNase digestion. In the case of the small juveniles of Proneomenia custodiens, RNA was extracted with the RNeasy Micro kit (Qiagen; Valencia, CA). The SMART cDNA library construction kit (Clontech Laboratories, Mountain View, CA) was used to construct cDNA libraries. These libraries were sequenced by The Genomic Services Lab at the Hudson Alpha Institute in Huntsville, Alabama, on the Illumina (San Diego, CA) HiSeq 2000 platform with 2x100 paired-end sequencing. To reduce memory usage during transcriptome assembly, sequencing data were digitally normalized to a k-mer coverage depth of 30 using the normalize-by-median.py script (Brown et al., 2012). Data were then assembled using Trinity (Grabherr et al., 2011; 8 June 2012 version) with default settings.
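The digital normalization step can be illustrated with a minimal Python sketch. This is not the normalize-by-median.py script itself: it uses an exact in-memory counter rather than a probabilistic one, ignores read pairing, and the function names and the k-mer size of 20 are assumptions made only for illustration (the text above specifies the coverage cutoff of 30 but not k).

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a read."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=20, cutoff=30):
    """Keep a read only if the median count of its k-mers is below `cutoff`.

    k=20 is an illustrative assumption; cutoff=30 matches the coverage
    depth stated in the methods. Counts are updated only with k-mers from
    reads that are kept, so coverage of abundant transcripts levels off
    near the cutoff.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k; simply skipped in this toy version
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept
```

Because reads from highly expressed transcripts stop being added once their k-mers reach the cutoff, the assembler sees far fewer redundant reads, which is where the memory saving comes from.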
Identification of genes
UniProt accession numbers for human toll-like receptor genes were obtained from InnateDB (InnateDB, 2014; Lynn et al., 2008; Breuer et al., 2013). Amino acid sequences for TLR1-TLR10 were obtained from the National Center for Biotechnology Information (NCBI; Table 2) and used as bait in a TBLASTN search (Altschul et al., 1990) against the assembled nucleotide transcriptomes. For these searches, an e-value cutoff of 10^-5 was employed and only top hits longer than 800 nucleotides (to filter out smaller gene fragments) were retained. Significant hits were translated to amino acid sequences with TransDecoder in the Trinity package (Grabherr et al., 2011), and redundant sequences were removed with cd-hit (Li and Godzik, 2006) using an identity (-c flag) of 1.0. To further confirm their identity, the translated hits were searched against the Swiss-Prot database using BLASTP with an e-value cutoff of 10^-5. As TransDecoder can yield multiple translations per nucleotide sequence, all translations were subjected to BLASTP, and only the best hits were retained. Sequences whose top hit had a significant e-value and matched a TLR gene were retained and compared to the SMART database (SMART, 2014; Schultz et al., 1998; Letunic et al., 2012) using the "normal" search setting to identify protein domains.
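The hit-filtering rule described above (e-value cutoff of 10^-5, contigs longer than 800 nucleotides, best hit retained) might be sketched as follows. Several things here are assumptions rather than details from the methods: that the search results are in the 12-column tabular BLAST format, that "longer than 800 nucleotides" refers to the length of the hit contig, and that the best hit is the one with the lowest e-value (ties broken by bit score). The function name and the `contig_lengths` mapping are hypothetical.

```python
import csv


def filter_hits(blast_lines, contig_lengths, max_evalue=1e-5, min_length=800):
    """Return the best qualifying hit per query as {query: (subject, evalue, bitscore)}.

    Assumes tab-separated rows with the columns: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    `contig_lengths` maps each subject (contig) ID to its length in nucleotides.
    """
    best = {}
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # not significant at the chosen cutoff
        if contig_lengths.get(subject, 0) <= min_length:
            continue  # contig too short; likely a gene fragment
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best
```

The same function could be reused for the BLASTP confirmation step by passing protein lengths and a suitable `min_length`, although the methods do not state a length filter for that step.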
Previous phylogenetic analyses (e.g., Zheng et al., 2005; Davidson et al., 2008) on TLR genes typically used only the TIR domain for analyses. These analyses are limited to about 140-180 amino acids in length. Here, we aligned recovered TLR contigs from lophotrochozoans and Priapulus with human and Drosophila TLR genes using MAFFT ver. 7 (Katoh and Standley, 2013). Even after reducing the alignment to the conserved TIR domain, sequences were highly variable, resulting in a questionable alignment. Considering that the TIR domain is shared among several gene families (Aravind et al., 2001) and the variability of the alignment, we questioned whether a tree produced from these sequences would accurately represent gene genealogy. This suspicion was confirmed by using the T-REX server (Boc et al., 2012) for a RAxML-VI-HPC (Stamatakis, 2006) analysis (parameters included PROTGAMMA, WAG, 100 bootstraps) that yielded a tree with very low nodal support values. Roughly 50% of the nodes reported a bootstrap value less than 40%, and exploring different parameters did little to improve results. We used a much broader taxon sampling than previous TIR trees did, which likely influenced the variability of the alignment. Due to the questionable reliability of such a phylogenetic analysis, results are not included herein.
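A support summary like the one quoted above (share of nodes with bootstrap below 40%) can be computed from a Newick tree with a short Python sketch. It assumes that bootstrap values are stored as internal-node labels directly after each closing parenthesis, and that taxon names contain no parentheses. The function names are hypothetical, and this is not the procedure the authors report using.

```python
import re


def support_values(newick):
    """Extract internal-node support labels, e.g. the 85 in ')85:0.12'."""
    return [float(v) for v in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def fraction_below(values, threshold=40.0):
    """Return the fraction of support values strictly below `threshold`."""
    if not values:
        return 0.0
    return sum(v < threshold for v in values) / len(values)
```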
BCO-DMO Processing notes:
- added conventional header with dataset name, PI name, version date
- modified parameter names to conform with BCO-DMO naming conventions
- reformatted to flat table by adding columns for taxon1 and taxon2
File |
---|
Hal_Kocot_2014_T3.csv (Comma Separated Values (.csv), 3.62 KB) MD5:e60f294b68124067f55f77353b92ec59 Primary data file for dataset ID 671662 |
Parameter | Description | Units |
---|---|---|
taxon1 | clade Lophotrochozoa or Ecdysozoa | unitless |
E_Value | description | unitless |
taxon2 | more specific taxonomic group | unitless |
BLASTP_Query | description | unitless |
TBLAST_Bait | description | unitless |
Hit_Name | description | unitless |
Description | description | unitless |
Species_best_hit | description | unitless |
Start_Methionine | description | unitless |
AA_length | length of the amino acid sequence | amino acids |
Stop_codon | yes/no whether stop codon is present | unitless |
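For readers working with the primary data file, the following Python sketch shows one way to read the flat table and count rows per taxonomic group. The column names come from the parameter table above. Skipping lines that begin with `#` is an assumption about how the added header is stored, and the function names are hypothetical.

```python
import csv
from collections import Counter


def read_rows(handle):
    """Yield one dict per data row, skipping '#' comment lines (assumed header style)."""
    data_lines = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(data_lines)


def hits_per_group(rows, column="taxon2"):
    """Count rows per value of `column` (taxon2 = more specific taxonomic group)."""
    return Counter(row[column] for row in rows)


# Example use with the primary data file named in this record:
# with open("Hal_Kocot_2014_T3.csv", newline="") as fh:
#     print(hits_per_group(read_rows(fh)))
```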
Dataset-specific Instrument Name | Illumina (San Diego, CA) HiSeq 2000 platform at The Genomic Services Lab at the Hudson Alpha Institute in Huntsville, Alabama |
Generic Instrument Name | Automated DNA Sequencer |
Generic Instrument Description | General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemoluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step. |
Dataset-specific Instrument Name | |
Generic Instrument Name | Thermal Cycler |
Generic Instrument Description | A thermal cycler or "thermocycler" is a general term for a type of laboratory apparatus, commonly used for performing polymerase chain reaction (PCR), that is capable of repeatedly altering and maintaining specific temperatures for defined periods of time. The device has a thermal block with holes where tubes with the PCR reaction mixtures can be inserted. The cycler then raises and lowers the temperature of the block in discrete, pre-programmed steps. They can also be used to facilitate other temperature-sensitive reactions, including restriction enzyme digestion or rapid diagnostics.
(adapted from http://serc.carleton.edu/microbelife/research_methods/genomics/pcr.html) |
Website | |
Platform | Auburn University lab |
Start Date | 2011-08-01 |
End Date | 2016-07-31 |
Description | Invertebrate genomics |
Extracted from the NSF award abstract:
The research will explore the genetics, diversity, and biogeography of Antarctic marine benthic invertebrates, seeking to overturn the widely accepted suggestion that benthic fauna do not constitute a large, panmictic population. The investigators will sample adults and larvae from undersampled regions of West Antarctica that, combined with existing samples, will provide significant coverage of the western hemisphere of the Southern Ocean. The objectives are: 1) To assess the degree of genetic connectivity (or isolation) of benthic invertebrate species in the Western Antarctic using high-resolution genetic markers. 2) To begin exploring planktonic larvae spatial and bathymetric distributions for benthic shelf invertebrates in the Bellingshausen, Amundsen and Ross Seas. 3) To continue to develop a Marine Antarctic Genetic Inventory (MAGI) that relates larval and adult forms via DNA barcoding.
Funding Source | Award |
---|---|
NSF Office of Polar Programs (formerly NSF PLR) (NSF OPP) | |
NSF Office of Polar Programs (formerly NSF PLR) (NSF OPP) |