But what is proteomics? Proteomics is the global study of proteins that are expressed in a given organ, tissue, or cell line. This approach provides unique insights into biological systems that cannot be provided by genomic or transcriptomic approaches, simply because there are many more proteins than protein-coding genes (Wienkoop, Baginsky, and Weckwerth 2010). Proteomics has been used for systematic purposes, qualitative and quantitative profiling, and evaluation of the functions of proteins that are present in plant cells, tissues or organelles. Therefore, proteomics is also a good tool to elucidate the elements that are involved in stress perception and transduction, and some reviews covering this area have already been published (Jorrín et al. 2006; Thurston et al. 2005).
The process of proteomics research in plant breeding follows a path that begins with the identification of stress-response proteins through comparison between stressed and control plants. Studies of proteome responses to stress generally compare protein profiles of resistant or tolerant organisms, such as wild plants, mutants of genetic model species such as Arabidopsis, crop plants (especially rice, wheat, and maize), or transgenics, with those of susceptible or non-tolerant individuals (Cooper and Farrant 2002). Following the numerous attempts to improve cultivars through classical crossing, several lines are available that have different degrees of tolerance (Salekdeh and Komatsu 2007). Differential proteins, selected from contrasts between resistant/tolerant versus susceptible/non-tolerant plants, or between optimal versus stressed growth conditions, are taken as candidates involved in the stress response. The detection of these candidate proteins may allow correlations with the stress and tolerance trait. Plant growth, the level and duration of stress, and plant phenotyping are relevant topics for stress proteome study (Salekdeh and Komatsu 2007). Irrespective of which stress is applied and what plant species is utilized, most of the differential proteins identified appear to be either constitutively present (pre-formed defenses) or specifically induced in the resistant/tolerant plants (Cooper and Farrant 2002).
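The stressed-versus-control contrast described above can be illustrated with a minimal sketch. It assumes that each protein already has normalized abundance values for replicate samples, and it uses a log2 fold-change cutoff as the selection rule. The protein names, the abundance values, and the threshold of 1.0 (a two-fold change) are hypothetical and are not taken from any of the cited studies.

```python
from math import log2
from statistics import mean


def log2_fold_change(stressed, control):
    """Log2 ratio of mean abundance in stressed versus control replicates."""
    return log2(mean(stressed) / mean(control))


def candidate_proteins(abundances, threshold=1.0):
    """Return proteins whose absolute log2 fold change reaches the threshold.

    `abundances` maps a protein name to a dict with 'stressed' and 'control'
    lists of normalized abundances (hypothetical input layout).
    """
    candidates = {}
    for protein, groups in abundances.items():
        lfc = log2_fold_change(groups["stressed"], groups["control"])
        if abs(lfc) >= threshold:
            candidates[protein] = lfc
    return candidates


# Hypothetical normalized abundances, three biological replicates per group
abundances = {
    "protein_1": {"stressed": [4.0, 4.2, 3.8], "control": [1.0, 1.1, 0.9]},
    "protein_2": {"stressed": [1.0, 1.1, 0.9], "control": [1.0, 1.0, 1.0]},
    "protein_3": {"stressed": [0.5, 0.5, 0.5], "control": [2.0, 2.0, 2.0]},
}

candidates = candidate_proteins(abundances)
for name, lfc in sorted(candidates.items()):
    print(f"{name}: log2 fold change = {lfc:+.2f}")
```

A fold-change cutoff alone does not account for replicate variability, so in practice it would normally be combined with a statistical test on the replicate measurements.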
The course of a standard proteomics experiment often includes the following procedures: experimental design, sampling, tissue/cell or organelle preparation, protein extraction/fractionation/purification, labelling/modification, separation, MS analysis, protein identification, and statistical analysis of data and validation (Jorrín-Novo et al. 2009). The extraction of proteins is a crucial step in reaching the later stages of protein detection and identification. At this stage it is necessary to extract and solubilize proteins. Several extraction protocols are available, but two types of protocols are mostly used for plant material: tissue homogenization in buffer-based media, or in organic-solvent media (TCA-acetone, phenol, precipitation protocols). In order to achieve maximum efficiency in the extraction stage, capturing the greatest possible diversity of proteins is necessary, and this is often accomplished by combining different procedures. To be considered an ideal method, the extraction protocol should be reproducible, while at the same time it should reduce the level of contaminants and minimize artifactual protein degradation and modification (Carpentier et al. 2005; Rossignol et al. 2006).
Separation techniques may involve either gel-based or gel-free approaches. For gel-based studies, 1-DE and 2-DE are the preferred techniques used in combination with MS (Lilley and Dupree 2006; Jorrín, Maldonado, and Castillejo 2007; Görg et al. 2009). One of the major criticisms of 2-DE is its low precision, with relative standard deviations reported to fall in the range of 15-70%. Major sources of variability for this technique may include the transfer between the first and the second dimension, the analyst's expertise and the detection of separated proteins (Schröder et al. 2008).
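For readers unfamiliar with the precision measure quoted above, the relative standard deviation (RSD) is the sample standard deviation expressed as a percentage of the mean. The following minimal sketch computes it for one protein spot measured on replicate gels. The spot volumes are hypothetical values chosen for illustration only.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """Return the RSD (coefficient of variation) of replicate values, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    avg = mean(values)
    if avg == 0:
        raise ValueError("mean is zero; RSD is undefined")
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    return 100.0 * stdev(values) / avg


# Hypothetical normalized volumes of one spot across four replicate 2-DE gels
spot_volumes = [1.00, 1.20, 0.80, 1.00]
rsd = relative_standard_deviation(spot_volumes)
print(f"RSD = {rsd:.1f}%")
```

With these illustrative values the RSD is about 16%, which would sit at the lower end of the 15-70% range reported for 2-DE.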
Gel-free liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis, called shotgun proteomics (Leitner and Lindner 2009), can increase the number of different proteins that can be identified from complex samples, compared to more traditional gel-based approaches. Shotgun proteomics has become the method of choice for the analysis of complex protein mixtures (Wolters et al. 2001; Gerster et al. 2010). However, the combination of SDS-PAGE, band cutting, trypsin digestion, and LC separation of the resulting peptides is the most powerful proteomics tool to cover the majority of proteins (de Godoy et al. 2006; Tribl et al. 2008).
The so-called 'second generation' technologies for quantitative proteomics, which include difference gel electrophoresis (DIGE), isotope-coded affinity tags (ICAT) (Shiio and Aebersold 2006), isobaric tags for relative and absolute quantitation (iTRAQ) (Wiese et al. 2007; Gan et al. 2007), and stable isotope labelling by amino acids in cell culture (SILAC) (Nelson et al. 2007; Palmblad et al. 2008), are now beginning to be successfully applied to plants for quantitative and large-scale proteomics studies. The gel-free multi-dimensional protein identification technology (MudPIT) is particularly well suited for the identification of hydrophobic proteins (Tjalsma et al. 2004; Görg et al. 2009) and allows the detection of a much larger number of proteins compared to gel-based methods, its drawback being the lack of quantitative data (Bayer et al. 2006). The gel-based 2-D DIGE technique is adequate for quantitative proteomics, and requires only a small amount of protein (0.025-0.050 mg) compared with 2-DE (ca. 0.7-1.0 mg) and therefore avoids the limitation of the existence of highly abundant proteins in the protein samples (Majeran et al. 2005; Ndimba et al. 2005; Casati et al. 2005; Dunkley et al. 2006).
To investigate highly complex proteomes, label-free approaches based on LC-MS with an ion trap (IT) or Fourier transform mass spectrometer have been used (Wang et al. 2006). The simplicity and cost-effectiveness of this technique make its validation with plant extracts desirable (Jorrín, Maldonado, and Castillejo 2007).
Although Bottom-up Proteomics (analysis of proteolytic peptide mixtures) remains the predominant platform, top-down strategies (analysis of intact proteins) should allow a more complete characterization of the proteome, including protein isoforms and post-translational modifications (PTM). All these aspects have been discussed in detail in recent reviews (Aebersold and Mann 2003; Cravatt et al. 2007; Zubarev and Mann 2007; Molina et al. 2007; Good et al. 2007).
Using classical quadrupole and ion trap mass analyzers, intact protein masses can be determined with standard deviations in the range of 2-5 kDa. The use of Fourier transform ion cyclotron resonance mass spectrometry (Meng et al. 2007; Marshall et al. 2009) can avoid the problems relating to complex mixtures of protein isoforms, which may complicate the determination of protein mass (Katz et al. 2007; Bräutigam et al. 2008). Surprisingly, it has been reported that some sets of proteins can only be detected by a specific technology (Komatsu et al. 2006; McDonald et al. 2006; Wu et al. 2006), which is in agreement with the idea that a combination of different methodologies is still needed to characterize entire proteomes.
Some innovations in the field of proteomics have allowed leveraging of resources to better detect and identify proteins. In the past few years, the development of new Orbitrap instruments and of dissociation methods such as electron-transfer dissociation has opened up new possibilities in proteome analysis. The mass spectrometer, despite constant improvement in terms of machines, software and protocols, has reached the limit of its capacity (Jorrín, Maldonado, and Castillejo 2007).
Proteomics platforms have a number of restrictions, such as sensitivity, resolution and speed of data capture. They also face a number of challenges, such as deeper proteome coverage, proteomics of unsequenced "orphan" organisms (Carpentier et al. 2005), top-down proteomics (Han et al. 2006) and protein quantitation (Cox and Mann 2007). These restrictions and challenges arise from the huge diversity of proteins, with widely differing physical and chemical characteristics, that are present in organisms.
Finally, in silico proteomics, although it is as yet only applicable where the full genomic sequence is available (i.e., Arabidopsis and rice), is useful in both predicting and validating experimental data (Heazlewood et al. 2007). Because of the large amounts of data generated by proteomics analyses over the past years, there have been efforts to form databases where proteomics information can be deposited and made available to the scientific community: the PPDB, http://ppdb.tc.cornell.edu (Sun et al. 2009); the PODB, http://proteome.dc.affrc.go.jp/Soybean/; the Organellome, http://podb.nibb.ac.jp/Organellome (Mano et al. 2007); and the knowledge-based UniProt (Jorrín-Novo et al. 2009).
Efforts to form a searchable database of MS/MS reference spectra have been implemented by committees such as the Subcommittee of the Multinational Arabidopsis Steering Committee, through projects such as the "Green Proteome" (Weckwerth et al. 2008; Hummel et al. 2007) and Plant Proteomics in Europe (COST Action FA0603). The database permits authentic protein identification through a genome-independent approach, since newly generated MS/MS spectra can be matched against previous experimental MS/MS spectra. This approach allows semi-quantitative analysis at the same time as spectrum matching.
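To make the idea of matching a new MS/MS spectrum against stored reference spectra concrete, the sketch below scores a query spectrum against a small library using cosine similarity on binned peak intensities. This is a generic illustration of spectral matching and is not the scoring scheme of any particular database named above. The library entries, the peak lists, and the 1.0 m/z bin width are hypothetical.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins; peaks are (m/z, intensity) pairs."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0 = no shared bins, 1 = same shape)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library, bin_width=1.0):
    """Return (entry name, score) for the best-scoring reference spectrum."""
    scored = ((name, cosine_similarity(query, ref, bin_width))
              for name, ref in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical reference library and query spectrum
library = {
    "peptide_A": [(175.1, 40.0), (304.2, 100.0), (433.2, 60.0)],
    "peptide_B": [(147.1, 80.0), (276.2, 100.0), (405.2, 30.0)],
}
query = [(175.1, 35.0), (304.2, 100.0), (433.3, 55.0)]

name, score = best_library_match(query, library)
print(f"best match: {name} (cosine similarity = {score:.3f})")
```

Because the comparison uses only the spectra themselves, no genome sequence is needed, which is the sense in which such matching is genome-independent.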
Initiatives have also begun to create a guide for conducting proteomics experiments to achieve more consistent results, because many papers contain errors in the experimental design, the analysis, and the interpretation of the data (Nesvizhskii, Vitek, and Aebersold 2007). More consistent data cannot rely upon speculation, especially when the genome or transcriptome of the species being studied is still unknown. Analysis of the greatest possible number of proteins, rather than only a fraction, also improves the consistency of results. Therefore, the HUPO's Proteomic Standard Initiative has developed guidance modules (Orchard and Hermjakob 2008) that have been translated into Minimal Information about a Proteomic Experiment (MIAPE) documents. The MIAPE documents recommend proteomics techniques that should be considered and followed when conducting a proteomics experiment. Proteomics journals should be, and in fact are, extremely strict in recommending that investigators follow the MIAPE standards for publishing a proteomics experiment (Jorrín-Novo et al. 2009).
What are the protein profiles that are found in plants under abiotic stress?
According to individual studies and reviews, few proteins are specific for the type of stress applied (Bolwell et al. 2001; Cooper and Farrant 2002; Skylas et al. 2002; Hajheidari et al. 2007). For some differential proteins, multiple isoforms or specific PTMs may be detected, each responding differently according to the stress applied (Hammond-Kosack et al. 1998). Proteins that are expressed in response to the same stressors clearly confer a physiological advantage under stress conditions, and thus are simultaneously potential targets for marker-assisted selection and rational candidate genes for the identification of quantitative trait loci.
Drought stress, metal toxicity, and salt-osmotic stress are the types of abiotic stress that are most often investigated. In contrast to the intensive study of the influence of water and nutrient status on plant proteomes, studies of plant responses to light and temperature stress are rare. Various sources of plant material were examined in proteome experiments: leaves and cotyledons, roots, fruits, phloem and xylem saps, apoplastic fluid, entire seedlings, shoots, stem segments, seeds, nuclear fractions, gametophores and meristem tissue.
Drought conditions may induce proteins related to detoxifying reactive oxygen species (ROS) (Hajheidari et al. 2007), but many other abiotic stresses can enhance production of ROS resulting from photosynthesis, respiration, and NADPH oxidase (Hammond-Kosack et al. 1998). This observation makes sense, since most stressors increase production of ROS in plants. Cells exposed to high amounts of ROS may be damaged. ROS also act as secondary messengers involved in the stress-response signal transduction pathway. Therefore, to detoxify the cell, i.e., remove excess ROS, plants have two mechanisms. The most important ways to combat ROS are those that involve SOD (Hajheidari et al. 2005), the water-water cycle, the ascorbate-glutathione cycle, glutathione peroxidase, and catalase (del Río et al. 2006). In the early stages of drought stress, many proteins associated with root morphogenesis and carbon/nitrogen metabolism are stimulated, which may contribute to drought avoidance by enhancing root growth (Yoshimura et al. 2008). 2-Cysteine peroxiredoxin is a protein that can be synthesized under drought stress, and belongs to the group that reduces H2O2 and alkyl hydroperoxide (Dietz et al. 2002). This protein constitutes an important alternative for detoxification under oxidative stress conditions. Small heat shock proteins (sHSPs) are also induced by heat and drought stresses. HSPs function as chaperones and play an integral role in protein folding and assembly (Sun, Van Montagu, and Verbruggen 2002). Therefore, sHSPs are promising protein markers for marker-assisted breeding programs to increase stress tolerance. The response to drought stress (Hajheidari et al. 2005) also involves the expression of cytosolic Cu-Zn SOD, cyclophilin, nucleoside-diphosphate kinase, a nascent polypeptide-associated complex α-chain, and the large subunit of RubisCO. Nucleoside diphosphate kinase (NDPK) is also more strongly expressed after heat and drought stress (Galvis et al. 2001; Moon et al. 2010).
NDPK uses ATP to maintain the cellular levels of CTP, GTP, and UTP (Moon et al. 2010) and cooperates in cellular redox regulation. The overexpression of AtNDPK2 leads to decreased constitutive ROS levels and increased tolerance to multiple environmental stresses.
Actin depolymerizing factor 4 (ADF) is also correlated with responses to drought and salt stress (Salekdeh et al. 2002; Ali and Komatsu 2006; Yan et al. 2010). ADF is related to osmoregulation under osmotic stress. This group of proteins is involved in the regulation of different cellular processes including cytokinesis, remodeling of actin filaments, cytoplasmic streaming, and signal transduction events (Dong et al. 2001). The up-regulation of ADF under drought and salt stress indicates that this protein might be associated with dynamic reorganization of the cytoskeleton during drought stress. Redox proteins such as glutathione dehydrogenase (At1g19570) are affected in stress regulation (Morgenthal et al. 2007; Wienkoop et al. 2008).
Mitogen-activated protein kinases (MAPKs) are upstream regulators of many aspects of plant cell signaling. MAPK cascades usually require three components: MAPK kinase kinases (MPKKKs), which phosphorylate MAPK kinases (MPKKs), which phosphorylate MAPKs, which phosphorylate diverse proteins (Chinnusamy et al. 2004; Ren et al. 2008). After MAPK is activated, it further activates transcription factors in the nucleus, or phospholipid-cleaving enzymes in the cytoplasm. This set of enzymes is related to stress response (Cheong et al. 2002; Xu et al. 2003; Chinnusamy et al. 2004; Hu et al. 2006). MPK4 and MPK6 have received considerable attention for their role in abiotic stress signaling. Post-translational activation of these two kinases is stimulated by cold, low humidity, salt, wounding, reactive oxygen species, and touch (Yuasa et al. 2001; Ichimura et al. 2000).
Salt stress responses involve the substrate-binding proteins of ABC transporters. Products including H+-transporting ATPases, signal transduction-related proteins, transcription/translation-related proteins, detoxifying enzymes, amino acid and purine biosynthesis-related proteins, proteolytic enzymes, HSPs and carbohydrate metabolism-associated proteins are also involved in salt stress (Des Marais and Juenger 2010).
Excessive light enhances production of proteins involved in photosynthesis, as well as some known light stress-related proteins, such as HSP, dehydroascorbate reductase, and SOD (Cushman and Bohnert 2000). The cold stress response leads to accumulation of dehydrins and low-temperature-induced protein (Uno et al. 2000).
2.3 Metabolomic approach
What is metabolomics? Metabolomics is the untargeted analysis of the set of metabolites that are produced by an organism. The metabolome is this set of metabolites, specifically low-molecular-weight molecules (typically below 3000 m/z), present in a cell, tissue or organ in a particular physiological or developmental state (Oliver et al. 1998). It is the layer downstream from large-scale analysis of RNA (transcriptomics) and proteins (proteomics) (Weckwerth 2003; Bino et al. 2004).
Understanding the metabolome is important to elucidate the complex network related to abiotic stress. The idea that metabolites are only the final product of gene expression is outmoded (Hollywood, Brison, and Goodacre 2006). It is increasingly understood that metabolites themselves regulate macromolecular operations through, for example, feedback inhibition and as signaling molecules. The cellular processes are in reality intimately networked, with many feedback loops, and thus should be represented as dynamic protein complexes interacting with neighborhoods of metabolites (Caspi 2006). Metabolomics analyses are therefore destined to provide an integrated perspective of the functional status of an organism (Dixon et al. 2006). More than this perspective, metabolome investigation is complementary to transcriptomics and proteomics, and may have special advantages. While changes in the levels of individual enzymes may be expected to have little effect on metabolic fluxes, they can and do have significant effects on the concentrations of a variety of individual metabolites. In addition, as the 'downstream' result of gene expression, changes in the metabolome are amplified relative to changes in the transcriptome and the proteome, which is likely to allow for increased sensitivity (Dixon et al. 2006). Finally, it is known that metabolic fluxes are regulated not only by gene expression but also by post-transcriptional and post-translational events, and as such, the metabolome can be considered to be closer to the phenotype (Siritunga and Sayre 2003).
Metabolomics is not intended to identify a particular metabolite or set of metabolites, as is done in traditional phytochemical studies. The broader purpose of this technique is to evaluate not just a very small fraction of the metabolism, but the maximum possible number of metabolites. This is because none of the existing techniques allows the evaluation of all the metabolites that are present in an organism (Ryan and Robards 2006). To capture all of them, different analytical platforms must be combined, considering that plant metabolites have different chemical properties (Fernie et al. 2004; Moco et al. 2007). Their differences are based on the degree of volatility, polarity and concentration in a given tissue (Weckwerth 2003). Because of this wide variability of physico-chemical characteristics, metabolomics studies are usually based on substances with certain chemical affinities.
The most widely used model for studying this platform is Arabidopsis thaliana, but other species including food plants such as tomato and potato have been investigated by means of this approach (Catchpole et al. 2005; Kristensen et al. 2005; Keurentjes et al. 2006; Moco et al. 2006; Leiss et al. 2009; Kunin et al. 2009).
Metabolomics investigations are based on techniques that include nuclear magnetic resonance (NMR), Fourier transform ion cyclotron resonance coupled with mass spectrometry (FT-ICR-MS), and separation-based techniques such as gas chromatography and liquid chromatography coupled with mass spectrometry (LC-MS and GC-MS). These analytical tools can profile the impact of time, stress, nutritional status, and environmental perturbation on hundreds of metabolites simultaneously, resulting in massive, complex data sets. This information, in association with transcriptomics and proteomics, has the capacity to produce a more holistic view of the composition of food and feed products, to optimize crop trait development, and to enhance diet and health (Dixon et al. 2006).
Samples intended for this approach are prepared using rapid freezing that stops enzyme activity. Subsequently, metabolites are extracted by different methods, e.g. with methanol to extract semi-polar metabolites. The extract can then be analyzed by many different methods and approaches (Hollywood et al. 2006).
Because of the unique structural composition and three-dimensional configuration of each compound, NMR yields a specific spectrum for each substance. The advantage of this method is that it is highly reproducible and nondestructive, and can also quantify the metabolites (Verpoorte et al. 2007). Despite this, metabolites that are present in smaller quantities will not be detected. While NMR uses magnetic resonance, all the other metabolomics platforms use mass spectrometry (MS) for identification (Ward et al. 2003, Kikuchi et al. 2004).
The most widely used metabolomics platforms are MS combined with chromatographic separations, because of the availability and relatively low cost of these techniques. With MS, metabolites are ionized (charged) and their mass-to-charge ratios (m/z) are measured using electric and magnetic fields in a mass analyzer. These mass-to-charge ratios are specific for each metabolite. The disadvantage of the MS platform is that quantification of the substances is difficult and can generally only be measured in relative terms. The reproducibility is lower compared to NMR, although the MS method is much more sensitive. "Hyphenated" techniques of LC-MS (Yamazaki et al. 2003a and b) combine retention times (the time needed to pass through the column that separates the compounds) with MS for identification, normally, of the nonvolatile metabolites, particularly the semi-polar secondary metabolites such as flavonoids, alkaloids and glucosinolates, but also sugars and amino acids (De Vos et al. 2007). GC-MS is widely used to analyze low-molecular-weight volatiles (Fiehn et al. 2000; Roessner et al. 2001). Analytical methods for metabolic fingerprinting analyses of crude extracts, with no previous separation steps, involve Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometry and time-of-flight (TOF) mass spectrometry (Aharoni et al. 2002; Brown et al. 2005), which cover a broad range of substances (Hagel and Facchini 2007). In FT-ICR-MS, the mass-to-charge ratios are established from the cyclotron frequency of the ions in a fixed magnetic field (Heeren et al. 2004). Although FT-ICR-MS has a higher sensitivity and resolution compared to NMR, it mainly gives the elemental composition of a metabolite (through MS) without providing much extra information about the chemical structures of the molecules (Macel et al. 2010).
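As a worked illustration of the mass-to-charge ratio, the sketch below computes the expected m/z of a protonated ion from a neutral monoisotopic mass. It assumes positive-mode ionization by addition of protons, which is only one of several possible ionization routes and is not specified in the text. The function name is hypothetical, and glucose is used simply as a familiar example compound.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_protonated(neutral_mass, charge=1):
    """Expected m/z of an [M + zH]z+ ion, assuming ionization by protonation."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Glucose, C6H12O6: neutral monoisotopic mass of about 180.06339 Da
glucose_mz = mz_protonated(180.06339)
print(f"[M+H]+ of glucose: m/z = {glucose_mz:.4f}")

# The same neutral mass carrying two protons appears at roughly half the m/z
print(f"[M+2H]2+: m/z = {mz_protonated(180.06339, charge=2):.4f}")
```

The calculation shows why the measured quantity is m/z rather than mass: the same molecule appears at different positions in the spectrum depending on its charge state.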
In contrast to transcriptome studies (but in common with protein analysis), no tools are available for amplification of metabolites, and consequently sensitivity is a major issue. Metabolites have huge chemical differences, and are often present in a wide dynamic range. All of these challenges need to be adequately addressed by the analysis strategy employed.
Because of the huge amount of data that can be generated by the techniques mentioned above, as also occurs in proteomics, data processing is required (Lommen 2009). For data analysis, knowledge of bioinformatics is required (Sumner et al. 2003; Smilde et al. 2005). Data can be examined by multivariate statistics such as principal components analysis (PCA), non-metric multi-dimensional scaling (NMDS), and partial least squares discriminant analysis (PLS-DA) (Westerhuis et al. 2008). These multivariate methods will show whether the metabolome, and to a certain extent also which metabolites, differ between treatments or species. To investigate the behavior of individual metabolites, Student's t-tests or univariate analyses of variance (ANOVA) can be used in combination with correction for false discovery rates (FDR) (Macel et al. 2010).
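The false discovery rate correction mentioned above can be sketched as follows. The example applies the Benjamini-Hochberg procedure, one common FDR method, to p-values that are assumed to come from per-metabolite t-tests. The text does not state which FDR procedure the cited authors used, and the metabolite names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from per-metabolite t-tests (stressed versus control)
metabolites = ["proline", "sucrose", "malate", "citrate", "glycine"]
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]

adjusted = benjamini_hochberg(p_values)
significant = [name for name, q in zip(metabolites, adjusted) if q <= 0.05]

for name, p, q in zip(metabolites, p_values, adjusted):
    print(f"{name}: p = {p:.3f}, adjusted p = {q:.4f}")
print("significant at FDR 0.05:", significant)
```

In this illustration, two metabolites with raw p-values just under 0.05 no longer pass after correction, which is the practical effect of controlling the FDR when many metabolites are tested at once.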
With the intention of gathering the largest possible amount of metabolomics data, a web-accessible system was created. The PlantMetabolomics.org (PM) website allows public consultation of MS-based plant metabolomics experimental results from multiple analytical and separation techniques. PM has extensive annotation links between the identified metabolites and metabolic pathways in AraCyc (Mueller et al. 2003) at The Arabidopsis Information Resource (Rhee et al. 2003) and the Plant Metabolic Network (www.plantcyc.org), the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa 2004), and MetNetDB (Wurtele et al. 2007). The rationale for the development of PM as an information portal is to provide free public access to experimental data, along with cross-references to related genetic, chemical, and pathway information. The portal also serves as an information resource for the field of metabolomics by providing tutorials on how to conduct metabolomics experiments. It describes minimum reporting standards (Fiehn et al. 2007; Sumner et al. 2007) for plant metabolomics experiments, based on the recommendations of the Metabolomics Standards Initiative (MSI). In addition, PM contains background information about the experimental design and tools that can be used to analyze the collected data (Bais et al. 2010).