In plants, the age of systems biology has accelerated the investigation of complex molecular mechanisms underlying intricate developmental and physiological processes. Since plants are anchored to their environment, they cannot escape from stresses by simply moving away. Instead, plants have developed a wide range of mechanisms to cope with environmental fluctuations. This plasticity generally involves changes at the level of DNA, RNA, protein and metabolites, resulting in complex phenotypes governed by multiple genes. Advanced genetic and molecular tools have led to tremendous progress in revealing the genetic architecture but also the regulatory mechanisms of complex traits (Mochida and Shinozaki, 2011). The development of molecular profiling techniques nowadays enables the high-throughput and affordable acquisition of large omics data sets, such as for transcriptomics, proteomics and metabolomics.
While substantial efforts are being made to generate large omics data sets, there is a growing need to develop platforms to integrate these data and derive models describing biological interactions in plants. In this context, networks have rapidly become an attractive approach to manage, display and contextualize these large data sets in order to obtain a system level and molecular understanding of biological key processes (Barabási and Oltvai, 2004; Usadel et al., 2009; Costa et al., 2015; Silva et al., 2016).
Biological networks are generally classified by the nature of the compounds and interactions involved. These networks can be derived from various molecular data resulting in, e.g., gene expression networks (correlation or co-expression networks), protein-protein interaction (PPI) networks, metabolic networks and signaling networks. Graphically, networks are represented as an ensemble of components (nodes or vertices) and interactions depicted by links (edges) connecting pairs of nodes. Such interaction maps provide an attractive framework to study the organizational structure of complex systems and have found many applications in plants (Jiménez-Gómez, 2014).
The fast development of transcriptomic technologies, as compared to other analytical platforms, has supported a range of studies on genetic and environmental perturbations at the transcriptome level in many organisms. Co-expression networks have grown in popularity in the last years as they enable the integration of large transcriptional data sets (Li et al., 2015; Liseron-Monfils and Ware, 2015). Co-expression network analysis allows the simultaneous identification, clustering and exploration of thousands of genes with similar expression patterns across multiple conditions (co-expressed genes). The main procedure for co-expression network inference is explained in Box 1 and illustrated in Figure 1. Briefly, a similarity score (i.e., correlation coefficient) is calculated from the pairwise comparison of the gene expression patterns for each possible pair of genes. Above a certain threshold, genes and gene pairs form a list of nodes and corresponding edges from which the network is constructed. As a rule, the guilt-by-association principle is applied stating that genes sharing the same function or that are involved in the same regulatory pathway will tend to present similar expression profiles and hence form clusters or modules in the network (Wolfe et al., 2005). Thus, within the same module, genes of known function can be used to predict the function of co-expressed unknown genes (Rhee and Mutwil, 2014).
Box 1. Network Inference
Constructing a network of genes from expression data generally consists of the following steps: first a measure of similarity or relatedness is calculated for each of the possible gene pairs. The resulting list of gene pairs is then filtered using a threshold value for the similarity score. The remaining gene pairs form a list of edges from which the network is constructed (Figure 1). As an optional next step, modules of highly related genes can be extracted from the network using gene prioritization approaches.
Gene expression values are usually log2 transformed before calculating the similarity score in order to scale the values to the same dynamic range.
Several measures are used to determine a similarity score between gene pairs, each with its specific strengths and weaknesses. Simple Pearson or Spearman correlation is often used and performs well compared to more sophisticated methods, both in terms of finding gene relationships and performance on large data sets (Song et al., 2012; Ballouz et al., 2015). Pearson is the most popular correlation measure, although it assumes a linear correlation, normally distributed values and is sensitive to outliers. Spearman's rank correlation is more robust, but also less powerful. Another often used measure that can describe non-linear relations between genes is called Mutual Information (MI) (Meyer et al., 2008). Song et al. (2012) found that in many situations MI does not perform better than correlation. They proposed “bi-weight mid-correlation” (bicor) as an attractive alternative correlation measure that is more robust than Pearson correlation.
When the similarity scores between all gene pairs have been determined, a cutoff is applied to select the gene pairs that should be connected in the network. This can be an arbitrary cutoff, but there are several ways to make a more informed choice. Lee et al. (2004) selected only the top 0.5% most positively and the top 0.5% most negatively correlated pairs. Bassel et al. (2011) chose a cutoff that results in a network following a power-law distribution, using the Weighted Gene Co-expression Network Analysis (WGCNA) package (Langfelder Langfelder and Horvath, 2008). Butte and Kohane (2000) used random permutations of the expression data to determine a cutoff for significant interactions. Other approaches calculate a p-value based on the null hypothesis that the correlation between two genes is 0.
Zhang and Horvath (2005) proposed to use soft thresholds instead of hard cutoffs, to produce weighted gene networks and preserve the underlying continuous nature of the correlation. However, visualizing these networks is challenging since the directly linked neighbors of a node are difficult to identify.
Correlation networks do not distinguish between direct and indirect interactions. The ARACNE algorithm (Margolin et al., 2006; Meyer et al., 2008) addresses this by pruning edges based on the analysis of gene triplets. If genes A, B, and C are fully connected in the network and the edge between A and C has the lowest weight, this edge could actually be an indirect interaction of A and C through B.
Correlation networks have undirected edges, since no causality can be inferred from two connected genes, although work has been published to address this (Opgen-Rhein and Strimmer, 2007). Regression methods are well-suited to find directed edges, since they try to find the set of genes that best predict the expression of a given target gene. However, because regression methods are generally computational demanding, the set of possible predictor genes is often limited to known transcription factors (Vignes et al., 2011; Marbach et al., 2012). In addition, Bayesian networks also allow the inclusion of prior knowledge, but their application is even more computationally challenging and not feasible for large sets of genes (Tamada et al., 2003; Imoto et al., 2004; Werhli and Husmeier, 2008).
Figure 1. Co-expression network inference pipeline. The biological question addressed drives the strategy for the co-expression network analysis: prior knowledge can be used to identify guide-genes and co-expression databases can be queried to investigate gene co-expression patterns across multiple conditions. Similarity in gene expression patterns is calculated using correlation coefficients (Pearson, Spearman…). A user defined threshold (in this example set at 0.8) enables the selection of genes with high co-expression scores. Significantly co-expressed genes are reported in the binary adjacency matrix as 1. A clustering algorithm is applied on the adjacency matrix to infer networks of significantly co-expressed genes. In the resulting network, significantly co-expressed genes are depicted as numbered nodes (vertices) linked by edges (links). The length of the edges is relative to the expression similarity of the connected genes, with a short edge corresponding to a high co-expression value. A “path” corresponds to the number of edges connecting two nodes (the shortest path from node 9 to 4 is 4 edges). Hubs are identified as highly connected nodes (node 1) and group of connected genes form modules (nodes 1–7). Network properties can be described by different parameters such as:
•The connectivity of a network corresponds to the total number of links in the network.
•The node degree corresponds to the number of connections of a node with other nodes in the network (node 4 has a node degree of 3).
•The betweenness of a node corresponds to the sum of the shortest paths connecting all pair of nodes in the network, passing through that specific node. The betweenness of node 8 corresponds to the sum of the shortest path the connecting node 10–9, 3–9, 4–9 etc…).
The two main applications for co-expression network analysis are to find novel genes involved in the biological process under investigation and to suggest the biological process a gene is involved in. Intuitively, reliable networks are needed to infer meaningful gene function predictions. Such networks heavily depend on a combination of decisions taken throughout the network inference process. From the quality, type and availability of the input data, the correlation coefficient and inference algorithm used, to the prior knowledge, the experimental and computational resources, any negligence can result in unreliable networks and subsequent misleading biological interpretations.
Caveats and opportunities of co-expression network analyses have been discussed previously (Usadel et al., 2009). When handling large data sets, co-expression networks can become very complex which limits their biological interpretation (Usadel et al., 2009). In addition, in contrast to regulatory networks, and because of their static representation, co-expression networks do not provide per se information on the nature of the regulatory relationship of connected genes (Stuart et al., 2003). Careful application of network analysis tools and strategies is thus important to maximize the information extraction, to disentangle reliable network connections and to infer true biological meaning.
In this review, we aim to provide an overview of the different strategies to employ during or after the co-expression network construction with the common aim of exploiting the full predictive potential of co-expression networks. The application of these strategies is illustrated by examples of recent studies. Particular attention is given to available and promising bioinformatics tools. Finally, we will speculate on network aspects worth developing in the near future to strengthen their inference power for a comprehensive understanding of the regulation of important biological processes.
Data Availability for Co-Expression Network Analysis
In the post-genomic era, the reduction of costs for large scale and high-throughput measurement technologies, such as for transcriptomics, has to the extensive collection of gene expression profiles capturing changes in gene expression during development, between different treatments or tissues, etc.
In addition, the sequenced genomes of model plants (e.g., Arabidopsis, medicago, and poplar) and economically important crops (e.g., tomato, potato, tobacco, rice, and soybean) strongly improve our understanding of transcriptional dynamics.
The compendia of generated data led to the development of publicly available gene expression databases (Table 1). These databases still largely contain microarray data and many of them are related to the model plant Arabidopsis. In recent years, RNA-sequencing, using next-generation high-throughput sequencing technologies (RNA-seq) has proven to be a powerful tool for whole transcriptome profiling with enhanced sensitivity for the discovery of new transcripts and enhanced specificity such as for the examination of allele-specific expression. The power of these sequencing technologies has enabled co-expression network analysis in species without a sequenced genome and, as a result, has opened the way for new applications (see Section Comparative Co-expression Network Analysis). RNA-seq based co-expression network construction is still in its infancy (Iancu et al., 2012; Ballouz et al., 2015) but the foreseen predominance of next generation sequencing tools in the coming years will certainly enrich existing databases for the benefit of network studies. Microarrays are still commonly used for transcriptome analysis because they are relatively cheap and their analysis is highly standardized. Comprehensive microarray gene expression sets are available in public repositories such as the Gene Expression Omnibus (GEO, Edgar et al., 2001), Genevestigator (Hruz et al., 2008) or Array Express (Parkinson, 2004). Other tools, such as the online bio-analytical resource for plant biology (BAR, Winter et al., 2007), provide interactive interfaces for the exploratory visualization of gene expression variation.
Table 1. Overview of available resources for co-expression network analysis.
Co-expression networks allow the simultaneous investigation of multiple gene co-expression patterns across a wide range of conditions. As a result, publicly available transcriptome data sets represent valuable resources for such analysis. It has been reported that nearly one in four studies uses public data to address a biological problem without generating new raw data (Rung and Brazma, 2013). The reuse of such data strengthens the need for reliable expression studies. A correct experimental design, the proper execution of the wet lab experiments and thorough annotation of the data are essential prerequisites for successful subsequent reuse (Brazma, 2003).
Several gene co-expression databases are available to help researchers in their investigations (reviewed in Brady and Provart, 2009; Usadel et al., 2009; Table 1). These databases provide user-friendly interfaces to facilitate access to the data and most of them also offer integrated data processing tools. ATTED-II (Obayashi et al., 2007, 2014) allows condition specific searches for co-expressed genes in several plant species. For Arabidopsis, CressExpress (Srinivasasainagendra et al., 2008) in addition allows selection of data sets based on a quality score to filter out “bad” microarrays. GeneMANIA (Warde-Farley et al., 2010) uses a large set of functional data of various types (predicted interactions, correlations, physical interactions and shared protein domains) to display all predicted interactions for a query gene list in an interactive network. The probabilistic functional gene network AraNet (Lee et al., 2015b) provides a measure to assess the connectivity of the query genes used in regard to the generated network. Additionally, AraNet integrates enrichment analysis tools for network components for gene ontology terms and biochemical pathways (Mapman, BioCyc and KEGG) (see Section Gene Prioritization). A popular platform for network inference is Cytoscape (Shannon et al., 2003). This open source program with its many plugins and apps allows the integration, visualization and analyses of network data (Saito et al., 2012).
Data Selection for Co-Expression Network Analysis
Publicly available gene expression databases can be queried using two main approaches. These approaches are reported in the literature as “non-targeted” (or “global”) and “targeted” (or “guided-gene”) approaches (Aoki et al., 2007). The use of one or the other approach is largely determined by the biological question addressed and the available knowledge.
The non-targeted approach provides a global overview of co-expression patterns of multiple genes across many conditions. This approach is also termed knowledge-independent or condition-independent, as no a priori information is used to construct the network. As an example, Mao et al. (2009) built an Arabidopsis gene co-expression network using gene expression data from 1094 non-redundant Affymetrix ATH1 arrays from the AtGenExpress consortium. This data set represented nine categories of experimental conditions, such as environmental stresses, hormonal treatments and developmental stages. The resulting network consisted of 6206 nodes and 512,936 edges. These “global” networks are generally used to describe the overall set of connections predicted to occur between gene pairs. Separated modules of functionally related genes can be identified and enable further gene prioritization (see Section Gene Prioritization).
In these global networks, also designated as condition-independent, weak interactions or interactions only occurring under specific conditions are easily missed. This can be circumvented by specifically selecting data from experiments that are relevant to the biological question addressed (Saito et al., 2008; Usadel et al., 2009). The resulting condition-dependent networks provide insights on specific biological processes (Atias et al., 2009). Illustratively, by selecting 138 samples from publicly available gene expression data sets exclusively from mature imbibed Arabidopsis seeds, Bassel et al. (2011) established a seed specific network. This SeedNet enabled the identification of modules associated with seed traits such as germination and dormancy. Childs et al. (2011) reported the improved predictive power for gene functional annotation of such condition-dependent networks. One of the limits of this approach is that the elucidation of system wide properties, such as intersecting biological pathways and genes exhibiting pleiotropic effects, might be overlooked.
An alternative approach allows to mimic condition-dependent data set selection, while using the full potential of gene expression data sets. This approach consists of pre-clustering the samples prior to network construction. In this case, a clustering algorithm is directly applied to the normalized expression matrix (genes × conditions) to partition the input samples into a defined number of groups based on their overall expression similarity. Co-expression networks are then built from each of the clusters obtained. Using this technique, Feltus et al. (2013) have shown that such an unsupervised pre-clustering approach improved capturing of co-expressed genes and the representation of unique biological terms in the derived network modules.
When experimental data have elucidated key components of specific pathways, a guide-gene approach can help to identify novel members of the same pathway in a more targeted manner (Itkin et al., 2013). These known genes, also called bait or seed genes, are used as input genes to build a seeded co-expression network. For example, Yang et al. (2011) used this approach to identify new candidate genes involved in cell-wall biosynthesis. They first established a list of 121 genes known to be involved in cell-wall biosynthesis and by querying available data sets with these seed genes, the initial list was extended to 694 potential candidate genes.
Strategies combining guide-gene queries and condition-dependent approaches may empower the predictive power of co-expression networks. For instance, Li et al. (2009) implemented a pipeline based on QUBIC, a QUalitative BIClustering algorithm, to select the conditions under which seed genes of the plant cell-wall biosynthesis pathway in Arabidopsis were found to be co-expressed among a total set of 351 conditions. These conditions were then used to generate networks of co-expressed gene modules.
Once a co-expression network is obtained, biological relevant information can be mined by gene prioritization. This process consists of integrating diverse data sources to allow the ranking of the nodes in the network and to identify groups of functionally related genes, down to important putative regulatory genes. A panel of databases and tools are available to facilitate the integration of gene information in the network (Table 1).
In nature, a variety of biological networks have displayed evidence of scale-free behavior (Barabási and Oltvai, 2004; Albert, 2005; Atias et al., 2009). Such networks are characterized by a distribution of nodes following a power law distribution. Graphically, this type of network displays a relatively large number of low-connected nodes and a few nodes with a high connectivity, the so called “hubs.” Even though, the assumption of a power law distribution is stated in numerous studies, statistical analyses have also refuted this approach (Khanin and Wit, 2006; Lima-Mendez and Van Helden, 2009).
The network topology encodes preliminary evidences for the understanding of the underpinning biological organization and reveals biological relevant information on the functional importance of individual nodes (Atias et al., 2009). Parameters derived from network local properties such as clustering coefficient, node degree (number of connected nodes), betweenness and centrality are commonly used for node ranking (Pavlopoulos et al., 2011). Nodes with a higher rank, i.e., with a high degree of connection and a high clustering coefficient, are identified as major hubs and are also likely associated to essential genes in the network (Provero, 2002; Carlson et al., 2006). The phenomenon, describing the link between connectivity and essentiality is termed the “lethality-centrality rule” (Jeong et al., 2001). Several studies have associated the non-trivial topological features of scale free networks to an essential buffering system for biological networks robustness and environmental responses (Levy and Siegal, 2008; Fu et al., 2009; Lachowiec et al., 2015).
Groups of highly connected genes in a network tend to form modules. Extracting modules from the network is thus a commonly used approach to generate manageable graph subunits for further study (Aoki et al., 2007; Mao et al., 2009). For this purpose, several clustering algorithms are available. These algorithms can be categorized into hierarchical and non-hierarchical algorithms. Hierarchical clustering algorithms identify clusters by iteratively assigning nodes to clusters. In a first step, weights are assigned to the network vertices, using for instance the calculated correlation coefficient. Clusters are then built from high weight vertices and progressively expanded by including neighboring vertices. The number of final clusters varies, for instance depending on a chosen threshold. A variety of hierarchical clustering methods are available including Weighted Gene Correlation Network Analysis (WGCNA) (Langfelder and Horvath, 2008), Markov Cluster Algorithm (MCL) (Enright et al., 2002; Mao et al., 2009), Normalization Engine for Matching Organizations (NeMo) (Rivera et al., 2010) and Improved Principal Component Analysis (IPCA) (Li M. et al., 2008; Fukushima et al., 2012). Mutwil et al. (2010) suggested a novel Heuristic Cluster Chiseling Algorithm (HCCA). For each node in the network, this algorithm generates node vicinity networks by collecting all nodes within n steps away from the seed node. Non-hierarchical approaches, such as K-mean clustering (Stuart et al., 2003), identify a certain number of modules given the input cluster criteria instead.
The performance of the different clustering algorithms can be assessed by evaluating the functional coherence of the predicted modules and inform, in return, the user on the best clustering algorithm to use (Lysenko et al., 2011). MORPH, an algorithm developed by Tzfadia et al. (2012), combines a guide-gene approach with data set selection and clustering to enable finding the best combination of gene expression data and network clustering to optimally associate candidate genes with a given target pathway.
Modules are often used as the starting point for more detailed studies as they considerably reduce the global network complexity. A panel of tools can be employed to further mine these modules (Table 1). These tools enable the functional annotation of nodes and modules and to unravel the nature of the gene-gene relationships.
Enrichment analysis for the genes within a module is the most widely used technique to associate modules with particular functions. Under the “guilt-by-association” rule, these functional modules provide a powerful framework for the identification of new genes relevant to biological processes and their functional annotation in the absence of strong a priori knowledge. These enrichment analyses mostly rely on annotation databases (Table 1). The most popular ones are the gene ontology (GO) database (Ashburner et al., 2000) and manually curated databases for metabolite pathways such as the Kyoto Encyclopedia for Genes and Genomes (KEGG) (Kanehisa and Goto, 2000), Mapman (Thimm et al., 2004), or BioCyc (Caspi et al., 2014).
Phenotypic data can also be used with the a priori expectation that clustered genes collaborate to control the same phenotypic trait. For example, Mutwil et al. (2010) successfully associated an individual cluster with a specific biological function using phenotypic data and tissue-dependent expression profiles for each gene in the cluster. Similarly, Ficklin et al. (2010) used phenotypic information of rice mutant lines to identify clusters of genes enriched for mutant phenotypic terms such as “sterile” or “dwarf.” In another study, Lee et al. (2010) showed that genes whose disruption is associated with embryonic lethality and pigmentation were significantly more interlinked in the AraNet network than expected by chance, corroborating the aforementioned centrality-essentiality theory.
Other available data can help to unravel the nature of the links connecting genes in the network. Co-expression networks are undirected networks as the edges between two genes do not indicate the direction of the interaction. Additionally, the co-expression link between two connected genes might also indicate an indirect interaction. To further unravel the gene regulatory dynamics in such modules, known gene-gene interactions can be displayed on the network and help to identify gene regulatory relationships (Ulitsky and Shamir, 2009).
One of the common approaches to identify regulatory relationships is to focus on known transcription factors and their known targets in the network. As transcription factors regulate the expression of many genes in the genome, one might also expect to find them as highly connected nodes in the network or connected to hub genes. The range of interactions of a transcription factor is defined by its binding capacity to specific cis-regulatory elements (motifs) identified in the promoter region of its target genes. Consequently, the search for such motifs in the nodes located in the vicinity of identified transcription factors can be a complementary source to functionally annotate genes and infer potential gene regulatory relationships (Vandepoele et al., 2009).
In their approach, Ma et al. (2013) used a bottom-up approach by first creating sub networks of genes based on motif enrichment for specific cis-regulatory elements and then identifying co-expression modules in those sub-networks.
Gene interaction information can also be retrieved from other data sources. The development and application of genome-wide methods for detecting protein-protein interactions, such as yeast two-hybrid (Brückner et al., 2009) or affinity purification methods coupled to mass spectrometry (Morris et al., 2014) have increased available interactome data. The InterProScan (Quevillon et al., 2005) or STRING (Szklarczyk et al., 2014) databases can be investigated to retrieve known physical interactions, both structurally resolved and experimentally validated. Knowledge on genetic interactions enables further inferring of functional relationships between genes and pathways. Besides data storage in databases, information on gene function and interactions can also be found embedded in textual data (Hakala et al., 2015). Text mining methods applied to literature resources, such as PubMed articles, help to extract additional information using manual curation efforts (Szakonyi et al., 2015) or semi-automated tools such as PubTator (Wei et al., 2012).
Previously mentioned data mining approaches essentially rely on available knowledge. Ample knowledge is available for Arabidopsis, but for other less well-studied plant species, the lack of knowledge regarding gene annotation and interactions severely limits network analysis using gene prioritization. Comparing networks from different species can provide an additional source of knowledge for gene functional annotation and gene connectivity using gene orthologs information and network alignment (see Section Comparative Co-Expression Network Analysis). As an example, Lee et al. (2015a) used conserved functional gene associations from networks inferred for Arabidopsis, worm, human and yeast as an additional source of data for the RiceNet, which was initially limited to rice-specific data sets.
The availability of these complementary data has opened the way to integrated approaches for function prediction studies. Multiple independent lines of evidence provide confidence for network functional gene associations. Kourmpetis et al. (2011) employed the Bayesian Markov Random Fields (BMRF) model to integrate protein sequence information, gene expression and protein-protein interaction data in their function prediction approach in Arabidopsis. They demonstrated that the model for network integration had the best performance when all of these data sources were used. One of the best examples of data integration is provided by GeneMANIA. This prediction server relies on a Gaussian Markov Random Fields-based method for protein function prediction combining multiple networks (Warde-Farley et al., 2010).
Together with computational methods, these tools, mobilizing and integrating prior knowledge and network features, have contributed to the establishment of diverse strategies to prioritize candidate genes for further experimentation (Table 2).
Table 2. Examples of strategies used for co-expression network analysis in regard to the respective biological question addressed.
Co-Expression Network Applications
eQTL Based Co-Expression Networks
Advances in “genetical genomics” have greatly benefited the elucidation of the genetic loci controlling transcription and the inference of regulatory mechanisms underlying complex phenotypic traits. The concept of “genetical genomics” was first introduced by Jansen and Nap in 2001 (Jansen and Nap, 2001), marking a new turn in genetic studies. The basic idea of this approach is to join classical genetic linkage analysis (Quantitative trait Loci (QTL) analysis) with gene expression studies (Keurentjes et al., 2007). The variation in gene expression is regarded as a quantitative trait for which the genetic basis (expression QTL, eQTLs) is investigated in mapping populations, such as recombinant inbred line (RIL) populations. In plants, “genetical genomics” has proven to be a successful strategy to dissect complex traits in a number of studies (for reviews see Joosen et al., 2009; Kliebenstein, 2009; Ligterink et al., 2012).
Detected eQTLs for a specific gene can be classified into “local” or “distant” eQTLs depending on whether they co-localize with the physical position of the studied gene or are located elsewhere in the genome, respectively (Rockman and Kruglyak, 2006). eQTLs can also be classified as cis- or trans-acting based on the location of the associated causal polymorphism in the gene under study or elsewhere in the genome, respectively. Consequently, distant eQTLs are always trans-acting, while local eQTLs can be cis-acting, if the associated causal polymorphism resides in the gene under study, or trans-eQTLs when they are caused by a closely linked allelic variation in a trans-acting factor. Allele specific expression analysis can specifically determine whether a local eQTL is trans or cis-acting (for review see Kliebenstein, 2009).
A common feature of global eQTL studies is the identification of trans-eQTL hotspots (Keurentjes et al., 2007; West et al., 2007). These eQTL hotspots correspond to a high number of co-locating trans-eQTLs in one region of the genome, indicating a hotspot for transcriptional regulation (Kliebenstein, 2009). Due to their analogy to high degree nodes in a network, cis-eQTLs located in these hotspots are sought as candidate master regulators affecting the expression of genes with a trans-eQTL in that same region (West et al., 2007). A regulatory relationship can be inferred by correlating gene expression profiles between the cis-eQTL candidate regulators and their potential downstream trans regulated genes. An iterative group analysis can be used to detect significant associations (Breitling et al., 2004; Keurentjes et al., 2007; Wang et al., 2014). Keurentjes et al. (2007) established a regulatory network for genes involved in the transition of flowering based on eQTL data. The GIGANTEA (GI) protein, known to be involved in the circadian clock controlled flowering time pathway, was identified as a regulator. Phenotypic QTLs associated with flowering and the circadian clock were also identified at the genetic locus of GI. Similarly, Wang et al. (2014) identified eight regulatory groups and their target genes for heading time in rice RILs. One regulatory group centered on Ghd7, an important regulator in heading time and yield potential in rice, was identified with a cis-eQTL connected to nine genes with trans-eQTLs. The network was validated by inspecting the transcript abundance of downstream-regulated targets and supported by co-localizing phenotypic QTLs for yield and heading time. These studies illustrate the usefulness of eQTL based co-expression analysis to guide the identification of candidate genes controlling quantitative traits. Other studies combined eQTL with co-expression analysis to identify regulator candidates underlying eQTLs (Terpstra et al., 2010; Flassig et al., 2013).
Interestingly, eQTL studies have also reported noteworthy properties of eQTLs in regard to their regulatory and evolutionary significance. cis-eQTLs were found to be highly inheritable with a larger genetic effect when compared to trans-eQTLs (Petretto et al., 2006; West et al., 2007; Kloosterman et al., 2012). cis-eQTLs were also found to be more consistent across different genetic backgrounds (Cubillos et al., 2012) and more robust to environmental perturbations (Cubillos et al., 2014), while genes with trans-eQTLs were more frequently reported as tissue or organ specific (Drost et al., 2010; Kloosterman et al., 2012).
QTLs tend to cover large regions of the genome, typically spanning hundreds of genes, and finding the actual gene that causes the observed trait variation is a formidable task. The capacity of gene co-expression networks to handle genome-wide data and filter out genes based on their correlation coefficients offers an attractive approach to prioritize genes. This strategy was successfully applied in the identification of EARLY FLOWERING 3 (ELF3), and its implication in shade avoidance response (Jimenez-Gomez et al., 2010). In this study, a network was built for each of the 363 candidate genes underlying the main phenotypic QTL for shade avoidance, connecting each candidate gene to co-expressed genes across 1.388 (selected) experiments. The eQTLs available for the investigated RIL population allowed pruning of the networks to keep only the co-expressed genes with a cis-eQTL, which is indicative of a regulatory relationship (Hansen et al., 2008). In a similar approach, Chan et al. (2011) used co-expression analysis to prioritize candidate genes resulting from a genome wide association study (GWAS). Alternatively, co-expression networks can be used prior to eQTL analysis (Kliebenstein et al., 2006; Kerwin et al., 2011). Kliebenstein et al. (2006) implemented an a priori network eQTL approach by calculating the mean expression value of the genes within each pre-determined network and using this as a quantitative trait in a subsequent QTL analysis.
One main advantage of eQTL analysis is that regulatory insights can be gained without prior knowledge. Information on the nature of the inferred interaction in such an approach, combined with co-expression network analysis, can substantially accelerate understanding of molecular regulatory interactions (Figure 2). However, the link between phenotype and transcript variation is not always straightforward as changes are also likely to occur at the protein or metabolite levels. The additional integration of other omics data available as QTLs for protein (pQTL) or metabolite (mQTL) variation (Wentzell et al., 2007; Kerwin et al., 2011) can bridge the gap between genotype and phenotype, providing an in-depth understanding of causal mechanisms. As an example, Kerwin et al. (2011) identified overlapping eQTLs and mQTLs for circadian time and glucosinolate variation in Arabidopsis. Specifically, AOP2, a 2-oxoglutarate-dependent dioxygenase, was identified as a potential regulator. Altered AOP2 function resulted in changes in expression of clock output genes, suggesting a causal relationship between changes in clock function and metabolite content.
Figure 2. Schematic representation of gene prioritization strategies. Gene sets of different expression values (shades of green) are used for co-expression network inference. Genes with co-expression values above a user defined threshold (dark green nodes) form nodes and edges in the network. Various additional data can then be used to enrich and extract biological relevant information from the network. Enrichment analysis tools such as gene ontology terms (pink contour nodes) can be used to functionally annotate unknown genes (question marked node) clustered in the vicinity. Prior knowledge can also help to highlight known gene-gene interactions (dotted line) and cis-regulatory motif (purple flags) can suggest local regulatory interactions (arrows) between transcription factors (TF node) and their target genes (flagged nodes). Gene regulatory relationships can also be extracted from time series data. Algorithms can extract causal regulatory relationships from shifted gene expression patterns in time series data. Co-localization of trans- and cis-eQTLs (hotspots) can also infer regulatory relationships between genes with a cis-eQTL (orange contour node) and genes with trans-eQTLs (blue contour node). Additional information can be gained from comparisons with networks of other species (yellow nodes) by orthology and network alignment (dotted lines).
High-Resolution Co-Expression Networks
Co-expression networks offer a conceptual framework to study gene interactions. However, their static representation does not capture all possible gene relationships as these do not operate simultaneously due to spatial and temporal variation in gene expression.
Temporal Resolution for Dynamic Co-Expression Networks
In response to developmental or environmental stimuli, plants undergo global transcriptional reprogramming. Monitoring transcriptional changes over time can provide more insight into the cascade of biological processes involved in the signal perception, transduction and final response.
Using time series data sets throughout seed development, Le et al. (2010) identified seed specific transcription factors active in different compartments and tissues of the seed at unique moments of seed development, suggesting a chronology of specific regulatory programs triggering seed development.
Time series experiments are often used to examine the dynamics of gene expression. Wei et al. (2013) used six time points during growth of poplar roots in low nitrogen conditions. GO categories associated with signal transduction were identified for differentially expressed gene sets in the early time points of the response (6 and 24 h), while categories associated with organ morphogenesis were prevalent throughout the later time points (48 and 96 h). By reducing the time scale to minutes, Krouk et al. (2010) observed that within 3 min following nitrate addition in Arabidopsis, functional categories such as ribosomal proteins were over-represented, suggesting the rapid activation of key elements of the translation machinery to synthesize proteins required for nitrogen acquisition. Combining time series and co-expression network analysis can unveil gene interactions associated with the dynamics of transcriptional programs. Global expression patterns can be obtained from the expression similarity calculated across samples collected at different time points. This approach is well suited to find modules of simultaneous expressed genes and gene interactions but is not well suited for time lagged regulations since all genes influencing the expression of downstream target genes are not necessarily captured within a same time point (experiment). This results in complex relationships between co-regulated genes, including co-expression, time shifted and inverted relationships (Zhang et al., 2005): an activated transcription factor gene first has to be transcribed and the resulting mRNA translated before it in turn can activate its downstream targets. The delay further depends on the dynamics of the regulation, and for instance the presence of network motifs like feed forward or negative feedback loops (Alon, 2007).
Windram et al. (2012) dissected the infection response of Arabidopsis to Botrytis cinerea
35 UNBELIEVABLE COOKING HACKS
Опубликовано: 12.02.2018 | Автор: Милена
Всего 6 комментариев.
Gordon and Betty Moore Foundation https://www. moore.org/ (grant number 2013-10-29). Received by BH and MG.
10. Engelhardt Institute of Molecular Biology of Russian Academy of Sciences, Moscow 2013; 14:89– 99. doi: 10.1038/nrg3394. 24. Corominas-Faja B, Santangelo E, Cuyàs E.
doi: 10.1038/nature05357. Supplementary Notes 1 (Fragments) Key to Figure 1 of the main text doi: 10.1038/nature05357. SUPPLEMENTARY INFORMATION. Fragment.
whb 1038. Страница: 16. Hansa whb 1038 Инструкция по эксплуатации онлайн [16/24]. Станица устройства.
4. Закрыть дверцу. 5. Вынуть (повесить) сливной шланг. 10. Нажать на кнопку Старт/Пауза. Степень защиты. Hansa WHC 1038 S. 6,0 кг A.