HEATSTER - a platform for heat stress transcription factors (HSFs)

HEATSTER platform

HEATSTER aims to develop a central platform for work with heat stress transcription factors (HSFs). Currently, the platform provides a manually curated dataset that includes 848 plant HSF sequences from 33 plant species (database version 1.0) and an extended dataset of 65 species (database version 2.0). The extended dataset was identified with the annotation tool accessible on this webserver, which allows the automatic identification and classification of HSFs. Furthermore, a tool for visualization of HSF class motifs is provided. Dataset version 1.0 of the HEATSTER database was curated by Lutz Nover and Klaus-Dieter Scharf, while dataset version 2.0 was collected by Jannik Berz and Stefan Simm. The annotation tool was developed by Ingo Ebersberger and Sebastian Schuster.
The website and visualization tool were developed by Jannik Berz.
Critical comments and suggestions for further improvement are highly appreciated. Refer to Stefan Simm or Ingo Ebersberger for bioinformatic aspects or HSF related questions.


The HEATSTER database

Plant HSFs are encoded by multigene families consisting of 15 to 50 members. Based on conserved sequence and structural properties, plant HSFs can be classified into three classes: HsfA, HsfB and HsfC. Within each class, members are further divided into subclasses indicated by numbers (e.g. HsfA1 vs. HsfA2), and multiple members within one subclass are differentiated by lowercase letters (e.g. HsfA1a vs. HsfA1b). An overview of the HSF families in the selected plant species and a comprehensive compilation of further details about plant HSF structure, function and evolution are given by Scharf et al., 2012. The HSF names used in the database indicate the plant species, abbreviated in a five-letter code, and the HSF type. The Gene ID numbers refer to the gene/locus number used in the corresponding reference database.
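
The class, subclass and member levels of this nomenclature can be separated mechanically. The following is a minimal sketch, assuming the HSF type is written exactly as in the examples above ("Hsf", a class letter, an optional subclass number and an optional lowercase member letter). The function and type names are hypothetical and not part of HEATSTER, and the species prefix is deliberately not handled because its exact layout is not specified here.

```python
import re
from typing import NamedTuple, Optional


class HsfType(NamedTuple):
    hsf_class: str            # "A", "B" or "C"
    subclass: Optional[int]   # e.g. 1 in HsfA1
    member: Optional[str]     # e.g. "a" in HsfA1a


# Assumed spelling: Hsf + class letter + optional number + optional lowercase letter.
# A member letter is only accepted when a subclass number is present.
_HSF_TYPE = re.compile(r"^Hsf([ABC])(?:(\d+)([a-z])?)?$")


def parse_hsf_type(name: str) -> HsfType:
    """Split an HSF type such as 'HsfA1a' into class, subclass and member."""
    match = _HSF_TYPE.match(name)
    if match is None:
        raise ValueError(f"not a recognised HSF type: {name!r}")
    hsf_class, number, letter = match.groups()
    return HsfType(hsf_class, int(number) if number else None, letter)
```

For example, `parse_hsf_type("HsfA1a")` separates the class `A`, the subclass `1` and the member `a`, while `parse_hsf_type("HsfB2")` leaves the member empty.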

The HEATSTER database contains the full-length amino acid and open reading frame nucleotide sequences of heat stress transcription factors (HSFs) compiled for plant species with completed sequencing projects. Version 1.0 from January 2014 includes 32 manually curated angiosperm species with 26 eudicots and 6 monocots. The assignment of HSF sequences for database version 1.0 was performed on three levels:

  • homology search with known HSF sequences in publicly available EST, cDNA, and protein databases
  • refinement of identified HSF sequences by BLAST search in plant genome databases
  • classification of newly identified HSF sequences based on conserved functional and signature sequence motifs according to the widely accepted nomenclature for plant HSFs (Nover et al., 1996, 2001)

Approximately 5 to 10% of all plant HSF sequence information available in databases like NCBI or Phytozome (until January 2014) turned out to be incomplete or incorrectly annotated due to sequencing errors and/or wrongly predicted splice sites. For this reason HEATSTER version 1.0 was created. HSF sequences in this database have been subjected to manual curation, and artefacts introduced by sequencing or annotation errors have been removed to the best of our knowledge. The curation step was aided by homology comparison and scanning for conserved signature sequence motifs within the predicted genomic region and in the adjacent 5' and 3' flanking genome sequences. However, we are aware that in these cases our proposed annotation remains tentative until proven by experimental evidence.
HEATSTER version 1.0 should be used for the HSF classification of Monocotyledons and Eudicotyledons. Version 1.0 harbours the essential functional motifs like DBD, OD, NLS, NES, AHA and RD as continuous signature sequence motifs.
HEATSTER version 2.0 from September 2016 extended the taxon compilation to 65 species including 1 Phaeophyta, 3 Rhodophyta and 61 species of the Viridiplantae, where the latter taxon comprises 5 Chlorophyta, 1 Bryophyta, 1 Lycopodiidae, 1 basal Magnoliophyta, 3 Gymnosperms, 16 Monocotyledons and 34 Eudicotyledons. In contrast to version 1.0, the HEATSTER HSF dataset version 2.0 was identified using the following procedure:

  • Search for HSF candidates in the collected proteomes via the HEATSTER annotation tool
  • Creation of a signature motif library

All sequences of HEATSTER version 2.0 were extracted directly from the databases Phytozome, NCBI, Dendrome, Bambogdb, Banana hub, Kazusa Database and Cucurbit Genomics Database (September 2016). Conserved signature motifs within the predicted HSF sequences were identified with MEME and MAST (Bailey et al., 2009).
Version 2.0 is applicable for the HSF classification of all Viridiplantae species. Because Chlorophyta, Gymnosperms and other plant ranks were included, the training data are phylogenetically more diverse and the resulting motifs are more general. As a consequence, functional motifs like DBD, OD, NLS, NES, AHA and RD can be present as partial signature motifs in version 2.0. In addition, new functional motifs not described so far can be assigned to HSF sequences.
While using the HEATSTER database for your research purposes, we recommend considering the rules associated with the use of the corresponding reference databases, especially in those cases where the genome sequencing projects are not published yet and the data are under protection of the Fort Lauderdale guidelines for large scale sequencing projects. We acknowledge the free access to the genomic reference databases used to build up the HSF database.


Signature motif library creation

Conserved signature motifs within the predicted HSF sequences were identified with MEME, TOMTOM and MAST (Bailey et al., 2009). For each HSF sub-class, (co-)orthologs of the selected plant species were used. The sub-classes were assigned according to the widely accepted nomenclature of Nover et al. (Nover et al., 1996; Nover et al., 2001). The sequences of each HSF sub-class were used to perform a motif search via MEME (Bailey et al., 2009). While for dataset version 1.0 a plain MEME search was conducted, dataset version 2.0 was created using different parameter sets for motif width (10-20 aa; 15-25 aa; 20-30 aa; 25-35 aa; 30-40 aa) and site coverage (-OOPS; -ZOOPS; -Any). Further, for each HSF sub-class we created a decoy database containing the randomly shuffled sequences of the HSF sub-class and performed the motif search ten times for each parameter setting. The identification of the same signature motif in the decoy database was counted as a false positive (FP) and used to calculate a false discovery rate (FDR). All signature motifs with an FDR below a threshold of 0.1 were selected as signature motifs for the HSF sub-classes. Further, signature motifs with an identity above 95% were merged via TOMTOM at the overlapping parts and extended at both ends. Finally, the signature motifs of the individual HSF sub-classes were cross-validated by searching them in the signature motif libraries of the other HSF sub-classes with MAST to detect sub-class specific signature motifs.
While the signature library of dataset version 1.0 was created using only monocotyledons and eudicotyledons, the library of dataset version 2.0 was created using all 65 species. For this reason, version 1.0 might contain more conserved DBD and OD motifs, while version 2.0 provides motifs conserved among all species and some smaller signature motifs for a better discrimination between the HSF classes.
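
The parameter scan and the decoy-based filtering described above can be illustrated with a small sketch. It is not the HEATSTER implementation: the function names are hypothetical, no MEME call is made, and the way true and false positives are counted per motif is an assumption, since only the use of decoy hits as false positives and the threshold of 0.1 are stated. The FDR is computed with the usual definition FP / (FP + TP).

```python
import random
from itertools import product

# Parameter sets named in the text (widths in amino acids, site coverage models).
MOTIF_WIDTHS = [(10, 20), (15, 25), (20, 30), (25, 35), (30, 40)]
SITE_MODELS = ["OOPS", "ZOOPS", "Any"]
REPEATS = 10  # motif search repeated ten times per parameter setting


def search_plan():
    """List every (min_width, max_width, site_model, repeat) combination."""
    return [(lo, hi, model, rep)
            for (lo, hi), model in product(MOTIF_WIDTHS, SITE_MODELS)
            for rep in range(1, REPEATS + 1)]


def shuffle_sequence(seq, rng):
    """Return a decoy with the same residue composition in random order."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def build_decoys(sequences, rng):
    """Shuffle every sequence of one HSF sub-class (name -> sequence)."""
    return {name: shuffle_sequence(seq, rng) for name, seq in sequences.items()}


def false_discovery_rate(fp, tp):
    """Standard FDR = FP / (FP + TP); how FP and TP are counted is assumed."""
    if fp + tp == 0:
        raise ValueError("no hits to evaluate")
    return fp / (fp + tp)


def select_motifs(counts, threshold=0.1):
    """Keep motifs whose FDR is strictly below the threshold.

    counts maps a motif name to a (fp, tp) tuple.
    """
    return sorted(m for m, (fp, tp) in counts.items()
                  if false_discovery_rate(fp, tp) < threshold)
```

With five width ranges, three site models and ten repeats, `search_plan()` enumerates 150 searches per HSF sub-class. A motif found once in the decoys and nine times in the real sequences has an FDR of exactly 0.1 and is therefore not selected under a strict "below 0.1" rule.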


Annotation tool

The HEATSTER annotation tool provides a platform that allows the automatic classification and annotation of unknown HSF sequences. The tool was developed on the basis of the motif search tool MAST (Bailey et al., 1998) and can analyze either a single amino acid sequence or a set of multiple sequences at the same time. The classification and annotation are performed in two successive steps of repeated searches in a motif database, which was generated from a training set of selected HSF sequences to identify HSF class-specific and subclass-specific conserved amino acid sequence motifs, respectively.
The first search step determines the HSF class (A, B or C) to which the query sequence belongs. During the second step the particular HSF type within the identified class is assigned (sub-classification, e.g. A1, A2, ..., B1, B2, ...). As dataset version 1.0 contains the more conserved DBD and OD motifs, this dataset is used for the classification.
The output of the single sequence analysis indicates the presence of the two most conserved HSF domains (DBD, and OD with the HR-A and HR-B regions). A sequence is identified as an HSF and assigned to an HSF type according to the best e-value, which reflects the similarity of the identified motifs to the corresponding consensus motifs. For a more detailed examination at the sequence level, the alignment to the identified functional domains and subclass-specific sequence motifs can be visualized.
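
The two-step decision can be sketched as follows, assuming that the motif searches have already produced one e-value per class motif set and one per subclass motif set. The dictionaries, labels and function names are hypothetical stand-ins for the tool's internal data, and the numbers in the usage note are made up for illustration.

```python
def best_label(evalues):
    """Return the label with the smallest (best) e-value, or None if empty."""
    if not evalues:
        return None
    return min(evalues, key=evalues.get)


def classify(class_evalues, subclass_evalues):
    """Step 1: pick the HSF class. Step 2: pick a subclass within that class.

    class_evalues    maps "A", "B", "C" to an e-value.
    subclass_evalues maps labels such as "A1" or "B2" to an e-value.
    """
    hsf_class = best_label(class_evalues)
    if hsf_class is None:
        return None, None
    # Only subclasses of the class chosen in step 1 are considered in step 2.
    within_class = {label: e for label, e in subclass_evalues.items()
                    if label.startswith(hsf_class)}
    return hsf_class, best_label(within_class)
```

Calling `classify({"A": 1e-40, "B": 1e-12, "C": 1e-8}, {"A1": 1e-30, "A2": 1e-35, "B1": 1e-50})` first settles on class A and then on A2; the better e-value of B1 is ignored because the class has already been fixed in the first step.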
For analyzing more than one sequence at the same time, the batch analysis pipeline can be used. This pipeline takes any number of sequences and allows the search for HSF sequences in more complex protein databases. As MAST has only a limited capacity for searching a database with many sequences, such as a whole proteome, the sequence database is first filtered using a profile Hidden Markov Model (HMM) that describes the HSF DBD. Thereby, only sequences containing this domain are retained as potential HSFs for further analysis, while all other sequences are discarded. The rest of the classification process is identical to the one with a single query sequence. As output, a table is provided that lists all sequences classified as putative HSFs together with the presumable HSF type (class and subclass). Further, a visualization is available for the HSF sequences, which shows the motifs (motif sets of version 1.0 and 2.0) matched onto the provided sequence. The annotation tool relies on MAST (Bailey et al., 1998) and HMMER (Finn et al., 2011).
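
The control flow of such a batch run (read sequences, keep only those passing the DBD filter, classify the remainder, collect a table) might look like the sketch below. It does not call HMMER or MAST: `has_dbd` and `classify` are placeholders that the caller has to supply, and all names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def batch_classify(records: Iterable[Tuple[str, str]],
                   has_dbd: Callable[[str], bool],
                   classify: Callable[[str], str]) -> Dict[str, str]:
    """Filter first, then classify only the sequences that pass the filter.

    has_dbd  stands in for the profile HMM search for the HSF DBD.
    classify stands in for the two-step motif-based classification.
    """
    table = {}
    for name, seq in records:
        if not has_dbd(seq):
            continue  # discarded: no DBD detected
        table[name] = classify(seq)
    return table
```

Filtering before classification keeps the expensive motif search restricted to the small fraction of a proteome that carries the DBD, which is the motivation given above for the HMM step.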


Visualization tool

The HEATSTER visualization tool allows comparative studies among HSFs and other sequences by analyzing their specific domain and motif composition. By this, it allows the user to detect signature motifs which might be specific for different HSF classes. Further, it is possible to detect differences between known classified HSFs and user defined sequences by matching their motif and domain composition.
During the analysis, a set of HSF class specific motifs is mapped onto the HSF sequences. For this purpose, a plain and an extended variant of the visualization tool are provided. In the plain version the user selects HSF sequences from species stored in the HEATSTER database version 2.0. In the extended version the user can submit a query sequence and compare it to a set of different plant HSF classes. In both modes, the user can use the signature motif library of version 1.0, version 2.0 or both.
The visualization tool relies on a modified version of MAST (Bailey et al., 1998) from the MEME Suite. The appearance and functionality of the output provided by the visualization tool resemble the standard MAST output. Each sequence is represented as a continuous line. The matched signature motifs and domains are indicated as colored boxes on top of the corresponding sequence. The colors are chosen randomly and independently for version 1.0 and 2.0. Thus, similar colors do not imply any relation between motifs within a version or between versions. By expanding the “Show/Hide” menu and clicking on any colored box, the sequence logo of the underlying motif is displayed.
Further, for each sequence an e-value indicates the overall expectation of seeing all signature motifs of an HSF class matched onto the sequence by chance. The combined p-value indicates the overall p-value of all signature motifs matched onto the sequence. The visualization tool offers the option to display the individual amino acids of the sequence by clicking the arrow beside the sequence name. By moving the gray boxes, the user can slide through the sequence. Start and stop positions of the displayed sequence section are shown inside the box. For each signature motif matched at a specific position the corresponding p-value is shown, and the sequence positions matching the motif are indicated by a + sign.


Deposited Data

The HEATSTER database contains HSF sequence information and classification for 33 plant species (version 1.0) and 65 species (version 2.0), respectively. All database information of HEATSTER version 1.0 or version 2.0 is available for download. Deposited HSF-like and HSF-related lists are available, as well as motif and domain sets of the different HSF sub-classes. The motifs used for classification of the HSFs do not necessarily cover the whole functional domain. Further, a list of all analyzed plant species is deposited, which contains a link to the downloaded proteome in the reference database. The proteomes were downloaded in April 2016.




Figure: Species included in the HEATSTER databases. The taxonomic tree includes all proteomes of species included in the search for HSFs. The Phaeophyta (dark grey) and Rhodophyta (light grey) are used as outgroups because they are not assigned to the Viridiplantae. All other species are color coded according to their taxonomic rank. The HEATSTER database v 2.0 includes green algae (chlorophyta, light green), mosses (bryophyta, orange), club-mosses (lycopodiophyta, cyan), gymnosperms (red), basal magnoliophyta (yellow), monocots (blue) and eudicots (dark green). The species included in the manually curated HEATSTER database v 1.0 are marked with an asterisk and are limited to monocots and eudicots. The species are named using the five-letter abbreviation (first three letters = genus and last two letters = species epithet), and the full list with their complete names can be downloaded in the HEATSTER data section.
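
The five-letter abbreviation rule stated in the caption can be written down as a short function. This is only a sketch of that rule: the function name is hypothetical, the capitalisation of the result is an assumption, and the codes stored in HEATSTER should be taken from the downloadable species list rather than derived this way.

```python
def five_letter_code(binomial: str) -> str:
    """Build a code from the first three genus letters and two epithet letters."""
    parts = binomial.split()
    if len(parts) < 2:
        raise ValueError(f"expected 'Genus epithet', got {binomial!r}")
    genus, epithet = parts[0], parts[1]
    if len(genus) < 3 or len(epithet) < 2:
        raise ValueError(f"name too short for a five-letter code: {binomial!r}")
    # Assumed capitalisation: first letter upper case, the rest lower case.
    return genus[:3].capitalize() + epithet[:2].lower()
```

Under this rule, "Arabidopsis thaliana" becomes "Arath" and "Solanum lycopersicum" becomes "Solly".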


Overview of plant species and reference databases:

Name Short name Family Data origin

HSF domains and sequence motifs

DBD: HSF DNA-binding domain with the central and mostly conserved winged helix-turn-helix motif
OD: HSF oligomerization domain, formation of homo- and hetero-oligomeric HSF complexes
HR-A/B: heptad pattern of hydrophobic amino acid (aa) residues forming the OD, the N-terminal region (part A) is separated from the C-terminal part B by a linker region of 6 aa residues in class B HSFs, and additionally inserted 21 aa residues in class A or 7 aa residues in class C HSFs
NLS: HSF nuclear localization sequence, monopartite (m) or bipartite (b)
NES: HSF nuclear export sequence
AHA: HSF transcription activator motifs, consisting of aromatic (A), large hydrophobic (H) and acidic (A) aa residues, typically identified in class A HSFs
RD: HSF transcription repression domain, identified in distinct class B HSFs
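
As a worked example of the HR-A/B linker lengths listed above: class B HSFs have a 6 aa linker, class A HSFs carry 21 additional residues (27 aa in total) and class C HSFs carry 7 additional residues (13 aa in total). The sketch below only encodes this arithmetic; the names are hypothetical, and individual sequences may deviate from these nominal lengths.

```python
BASE_LINKER = 6                              # aa residues between HR-A and HR-B in class B
INSERTIONS = {"A": 21, "B": 0, "C": 7}       # additional aa residues per class


def expected_linker_length(hsf_class: str) -> int:
    """Nominal HR-A/B linker length for an HSF class."""
    return BASE_LINKER + INSERTIONS[hsf_class]


def class_from_linker(length: int):
    """Return the class whose nominal linker length matches, else None."""
    for hsf_class in INSERTIONS:
        if expected_linker_length(hsf_class) == length:
            return hsf_class
    return None
```

A linker length is a useful hint but not a classifier on its own, which is why `class_from_linker` returns `None` for any length other than the three nominal values.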






Figure: General domain structure of the three major HSF classes in plants.
The three different HSF classes HsfA, HsfB and HsfC (Nover et al., 1996, 2001) are characterized by specific sequences and composition of their functional domains involved in DNA binding (DBD, yellow), oligomerization (HR-A/B region, green), nuclear localization and export (NLS and NES, orange), and transcriptional activation and repression (AHA and RD, blue). The hallmark of all HSFs is the conserved N-terminal DNA-binding domain with a central helix-turn-helix motif flanked by two β-strands on each side. The oligomerization domain (OD) consists of two adjacent heptameric hydrophobic repeat regions (HR-A/B), which are separated by 6 amino acid (aa) residues in non-plant and all plant class B HSFs. This linker (solid line) is extended by an additional 21 aa residues in class A and 7 aa residues in class C HSFs, respectively. Intracellular localization is regulated by nuclear localization sequences (NLS) and nuclear export sequences (NES) in the C-terminal activation domain (CTAD). Neither class B nor class C HSFs contain an NES. Short sequence motifs formed by aromatic, large hydrophobic and acidic aa residues (AHA) are typical for class A HSFs and required for transcriptional activation, while in class B HSFs transcriptional repression is associated with the conserved repression domain (RD). Dotted lines represent less conserved sequence regions of variable length without annotated functional motifs. However, they might harbor conserved short sequence motifs relevant for sub- and sub-sub classification of individual HSFs. HEATSTER offers motif sets of v.1 & v.2. For further details see information. (Figure adapted from Scharf et al., 2012)

Version

  • Website version 1.1
  • Database version 1.0 & 2.0
  • Annotation tool version 1.0
  • Visualization tool version 1.0

How to cite HEATSTER

When using HEATSTER please cite:
Scharf KD, Berberich T, Ebersberger I, Nover L. The plant heat stress transcription factor (Hsf) family: Structure, function and evolution. Biochim Biophys Acta (2012) 1819:104-119, doi:10.1016/j.bbagrm.2011.10.002

For HSF annotation, please use the widely accepted nomenclature suggested in:
Scharf KD, Berberich T, Ebersberger I, Nover L. The plant heat stress transcription factor (Hsf) family: Structure, function and evolution. Biochim Biophys Acta (2012) 1819:104-119, doi:10.1016/j.bbagrm.2011.10.002

References

Scharf KD, Berberich T, Ebersberger I, Nover L. The plant heat stress transcription factor (Hsf) family: Structure, function and evolution. Biochim Biophys Acta (2012) 1819:104-119, doi:10.1016/j.bbagrm.2011.10.002

Bailey, T.L., et al. (2009) MEME SUITE: tools for motif discovery and searching, Nucleic acids research, 37, W202-208.

Finn, R.D., Clements, J. and Eddy, S.R. (2011) HMMER web server: interactive sequence similarity searching, Nucleic acids research, 39, W29-37.

Nover, L., et al. (2001) Arabidopsis and the heat stress transcription factor world: how many heat stress transcription factors do we need?, Cell Stress Chaperones, 6, 177-189.

Nover, L., et al. (1996) The Hsf world: classification and properties of plant heat stress transcription factors, Cell Stress Chaperones, 1, 215-223.

Further, Bootstrap, jQuery, and sorttable were used. The underlying HSF database relies on MySQL. The website is designed using HTML, CSS and PHP, and the annotation tool is written in Perl.



Contact

This site is hosted and maintained by:

  • Dr. Stefan Simm
    Department of Molecular Cell Biology of Plants - group of Prof. Dr. Enrico Schleiff
    Max-von-Laue-Str. 9 (Campus Riedberg)
    60438 Frankfurt am Main
    N200 / 3. OG Room 06
    Germany
    Phone +496979829289
    Mail: Simm@bio.uni-frankfurt.de

  • Prof. Dr. Ingo Ebersberger
    Applied bioinformatics - group of Prof. Dr. Ingo Ebersberger
    Biologicum, Campus Riedberg
    Gebäudeteil B, 3.OG
    Max-von-Laue-Straße 13
    60438 Frankfurt am Main
    Germany

Critical comments and suggestions for further improvement are highly appreciated. Refer to Stefan Simm or to Ingo Ebersberger for HSF-related queries and bioinformatics aspects.


Disclaimer

For acknowledging the use of the HEATSTER platform please refer to:
Scharf KD, Berberich T, Ebersberger I, Nover L. The plant heat stress transcription factor (Hsf) family: Structure, function and evolution. Biochim Biophys Acta (2012) 1819:104-119, doi:10.1016/j.bbagrm.2011.10.002

For HSF annotation, please use the widely accepted nomenclature suggested in:
Scharf KD, Berberich T, Ebersberger I, Nover L. The plant heat stress transcription factor (Hsf) family: Structure, function and evolution. Biochim Biophys Acta (2012) 1819:104-119, doi:10.1016/j.bbagrm.2011.10.002

This file is part of HEATSTER database. HEATSTER database is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. HEATSTER database is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with HEATSTER database. If not, see GNU licence.

For usage and license of included tools, please refer to MEME Suite and HMMER. While using the HEATSTER database for your research purposes we recommend considering the rules associated with the use of the corresponding reference databases, especially in those cases where the genome sequencing projects are not published yet and the data are under protection of the Fort Lauderdale guidelines for large scale sequencing projects.

Data privacy statement

By using our website you consent to the collection, processing and use of data as described below. Our website can be visited without registration. Data such as the pages called or the names of retrieved files, together with date and time, are stored on the server for statistical purposes, without any direct relation of these data to your person. Personal data, in particular name, address or e-mail address, are collected on a voluntary basis as far as possible. Without your consent, the data will not be passed on to third parties.

Privacy policy for cookies

Currently, our website is not using cookies.

Source: Data privacy sample

Homepage: 2.0
Database: version 1.0 and 2.0 included.