SEQenv: A pipeline capable of annotating genetic sequences based on environment descriptive terms occurring within their records and/or in relevant literature.

Given a set of sequence files (in FASTA format), SEQenv retrieves highly similar sequences from public repositories (such as SILVA and GenBank). From each of those records, text fields carrying environmental context information (such as the reference title and the isolation source) are then extracted. Existing links to PubMed abstracts are also followed and the relevant abstracts collected.
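
As a rough illustration of the extraction step, the sketch below pulls the reference title, the isolation source and the PubMed identifier out of a GenBank-style flat-file record using only the Python standard library. The record text, its placeholder values and the function name `extract_context` are made up for illustration; this is an assumption about how such a step could look, not SEQenv's actual code.

```python
import re

# A made-up, heavily shortened GenBank-style record (placeholder values only).
RECORD = "\n".join([
    "REFERENCE   1  (bases 1 to 1400)",
    "  TITLE     Bacterial diversity in a coastal lagoon",
    " " * 12 + "sediment",
    "   PUBMED   00000000",
    "FEATURES             Location/Qualifiers",
    "     source          1..1400",
    " " * 21 + '/isolation_source="lagoon sediment"',
])


def extract_context(record: str) -> dict:
    """Collect text fields that may carry environmental context."""
    context = {"titles": [], "isolation_sources": [], "pubmed_ids": []}

    # A TITLE value may wrap onto continuation lines indented by 12 spaces.
    for match in re.finditer(r"^  TITLE +(.+(?:\n {12}\S.*)*)", record, re.M):
        context["titles"].append(" ".join(match.group(1).split()))

    # Qualifier values are quoted and may span several lines.
    for match in re.finditer(r'/isolation_source="([^"]+)"', record):
        context["isolation_sources"].append(" ".join(match.group(1).split()))

    # PubMed identifiers are plain digit strings.
    context["pubmed_ids"] = re.findall(r"^ +PUBMED +(\d+)", record, re.M)
    return context


context = extract_context(RECORD)
print(context)
```

The collected titles and isolation sources are the free text handed on to term identification, while the PubMed identifiers indicate which abstracts would still need to be fetched.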

Once the relevant pieces of text for each matching sequence have been gathered, they are processed by a text mining module that identifies any Environment Ontology (EnvO) environment descriptive terms mentioned in them.
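
To make the idea of term identification concrete, here is a minimal dictionary-matching sketch. It assumes a tiny hand-written term list (the example terms quoted for the ENVIRONMENTS tagger on this page) in place of the full ontology, and the names `TERMS` and `tag_terms` are hypothetical. The actual text mining module may work quite differently, for example by handling synonyms and plural forms.

```python
import re
from collections import Counter

# Tiny stand-in dictionary; a real run would load term names and synonyms
# from the Environment Ontology instead of this hand-written list.
TERMS = ["coral reef", "cultivated land", "glacier", "pelagic", "forest", "lagoon"]

# Longer terms are tried first so multi-word terms win over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def tag_terms(text: str) -> Counter:
    """Count case-insensitive, whole-word mentions of each dictionary term."""
    return Counter(m.group(1).lower() for m in _PATTERN.finditer(text))


counts = tag_terms("Lagoon sediment bacteria; isolates from a coral reef near the lagoon.")
print(counts)
```

Exact whole-word matching keeps the sketch short, but it misses inflected forms such as "forests"; a production tagger would need to account for those.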

The identified EnvO terms, along with their mention frequencies, are then subjected to clustering analysis and multivariate statistics. As a result, tag clouds and heatmaps of environment descriptive terms characterizing different sets of sequences (e.g. originating from different samples) are generated.
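
The sketch below shows one way term frequencies could be compared across samples before clustering. The per-sample counts are invented, and the choice of Bray-Curtis dissimilarity is an assumption made for illustration; this page does not state which distance measure or clustering method the pipeline uses.

```python
from collections import Counter

# Made-up term counts per sample (illustrative values only).
samples = {
    "sample_A": Counter({"lagoon": 6, "sediment": 3, "forest": 1}),
    "sample_B": Counter({"lagoon": 3, "sediment": 6, "forest": 1}),
    "sample_C": Counter({"glacier": 8, "forest": 2}),
}


def relative_frequencies(counts: Counter) -> dict:
    """Scale raw mention counts so each sample's terms sum to 1."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def bray_curtis(p: dict, q: dict) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared terms."""
    terms = set(p) | set(q)
    diff = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in terms)
    total = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in terms)
    return diff / total


profiles = {name: relative_frequencies(c) for name, c in samples.items()}
names = sorted(profiles)
distances = {
    (a, b): bray_curtis(profiles[a], profiles[b])
    for a in names
    for b in names
    if a < b
}
for pair, d in distances.items():
    print(pair, round(d, 2))
```

A pairwise dissimilarity table of this kind is a typical input for hierarchical clustering, and the relative-frequency profiles are the sort of values a heatmap of terms per sample would display.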

A detailed SEQenv presentation is available here (by Dr. Christopher Quince).

Biological Importance

  • Characterize sequences from novel environments based on the environmental context of highly similar known sequences

  • Identify potential sample contamination sequences

  • Add standardized environmental context to already deposited, plain-text annotated sequences

Biological Examples

  • Annotation of microbial 16S rRNA sequences from pit-latrine samples collected in Vietnam and Tanzania

  • Lagoon sediment sample annotation

Availability: an input form for processing user-submitted sequences will be made available at this site. The pipeline components are all open source software and will be made available either here or via links to their dedicated web pages.

Sister Projects

  • ENVIRONMENTS: a standalone command line application capable of identifying environment descriptive terms, such as "coral reef, cultivated land, glacier, pelagic, forest, lagoon", in text.

  • ENVIRONMENTS and EOL: From Plain Text to Enriched Encyclopedia of Life (EOL) Contents
    A project aiming at processing the EOL Taxon pages to extract descriptions of their environmental context.

Team: Umer Ijaz, Anastasis Oulas, Simon Berger, Christina Pavloudi, Julia Schnetzer, Evangelos Pafilis#, Christopher Quince# *# (*: main software developers, #: correspondence)

Collaborators - Related Projects: Pier Luigi Buttigieg, Renzo Kottmann, MicroB3, Environment Ontology (EnvO)

Maintained: at the Uni. of Glasgow and the Inst. of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR) Crete, Greece

Funding: European COST Action ES 1103 Microbial ecology & the earth system: collaborating for insight and success with the new generation of sequencing tools