Addressing a variety of questions within Earth science disciplines entails the inference of the spatio-temporal distribution of parameters of interest based on observations of related quantities. Such estimation problems often represent inverse problems that are formulated as linear optimization problems. Computational limitations arise when the number of observations and/or the size of the discretized state space become large, especially if the inverse problem is formulated in a probabilistic framework and therefore aims to assess the uncertainty associated with the estimates.

This work proposes two approaches to lower the computational costs and memory requirements for large linear space-time inverse problems, taking the Bayesian approach for estimating carbon dioxide (CO2) emissions and uptake (a.k.a. fluxes) as a prototypical example. The first algorithm can be used to efficiently multiply two matrices, as long as one can be expressed as a Kronecker product of two smaller matrices, a condition that is typical when multiplying a sensitivity matrix by a covariance matrix in the solution of inverse problems. The second algorithm can be used to compute a posteriori uncertainties directly at aggregated spatio-temporal scales, which are the scales of most interest in many inverse problems.

Both algorithms have significantly lower memory requirements and computational complexity relative to direct computation of the same quantities (O(n2.5) vs. O(n3)). For an examined benchmark problem, the two algorithms yielded a three and six order of magnitude increase in computational efficiency, respectively, relative to direct computation of the same quantities. Sample computer code is provided for assessing the computational and memory efficiency of the proposed algorithms for matrices of different dimensions.

**Introduction**

Addressing a variety of questions within Earth science disciplines including environmental science, hydrology, geology, geophysics, and biogeochemistry entails the inference of the spatio-temporal distribution of parameters of interest based on observations of related quantities. Such estimation problems often represent inverse problems, with examples including the estimation of hydraulic conductivity or other aspects of subsurface structure using hydraulic head, tracer, or remote sensing measurements (e.g. Aster et al., 2012; Hyndman et al., 2007); the identification of environmental contaminant sources using downstream concentrations (e.g. Atmadja and Bagtzoglou, 2001; Liu and Zhai, 2007; Michalak and Kitanidis, 2003; Zhang and Chen, 2007), the characterization of atmospheric and oceanic processes (Bennett, 2002), and the quantification of budgets of atmospheric trace gases using atmospheric observations (e.g. Ciais et al., 2011; Enting, 2002). Such inverse problems are often formulated as linear optimization problems. Even when the physics and/or chemistry relating the unobserved field to the measurements yield a nonlinear problem, the inverse problem is often solved through iterative application of a linear approximation (e.g. Kitanidis, 1995). Computational limitations arise when the number of observations '' and/or the size of the discretized state space '' become large, especially if the inverse problem is formulated in a probabilistic framework and therefore aims to assess the uncertainty associated with the estimates.

This work proposes approaches for addressing these computational limitations. We take the Bayesian approach for estimating carbon dioxide (CO2) emissions and uptake (a.k.a. fluxes) as a prototypical example of a spatio-temporal inverse problem, and use it to illustrate the proposed tools. We use the study of Gourdji et al. (2012) as a computational benchmark.

**A prototypical spatiotemporal inverse problem**

Gourdji et al. (2012) used atmospheric concentration measurements of CO2 to constrain CO2 fluxes in North America at a 1o longitude by 1o latitude spatial resolution (=2635 land regions for 50oW to 170oW and 10oN to 70o N) and a 3-hourly temporal frequency over the period of December 24, 2003, to December 31, 2004 (=2992 periods over 374 days). The implemented setup resulted in =8503 observations and =7,883,920 parameters to be estimated. This high spatiotemporal resolution was primarily motivated by the desire to avoid "aggregation errors," i.e. biases in the estimated fluxes caused by prescribing spatial and temporal patterns that cannot be adjusted through the estimation. The resolutions that can be resolved by observations are often coarser, however, as are the scales that are of most scientific interest. In the case of Gourdji et al. (2012), the estimates were therefore aggregated a posteriori to monthly and annual temporal resolution for interpretation. The a priori spatiotemporal error covariance was assumed separable, with exponential decay in correlation in both space and time. As a result, the prior covariance matrix could be expressed as a Kronecker product of matrices describing the spatial and temporal covariances, respectively.

**Bayesian framework for linear inverse problems**

Stochastic linear inverse problems are often formulated in a Bayesian framework by requiring the minimization of an objective function that can be written as:

(1)

where is a known vector of measurements, is a matrix that describes the relationship between measurements and the unknown field , is the covariance matrix of the model-data mismatch errors, is the prior estimate of , and is a (square and symmetric) prior error covariance matrix, describing deviations between the true field and the prior . The first term in Eq. (1) penalizes differences between available observations and those that would result from an estimated underlying field, while the second is a regularization term that penalizes departures from the prior, or more generally any type of desired structure.

The solution to the Bayesian linear inverse problem, defined as the estimate of that minimizes the objective function in Eq. (1), can be expressed as:

(2)

and the a posteriori uncertainty covariance of the estimated can be written as:

(3)

For small and , implementing Eq. (2) and (3) is straightforward. As inverse problems are solved using increasingly more observations and are used to estimate parameters at increasingly high spatiotemporal resolutions, as in the prototypical Gourdji et al. (2012) example, the number of floating point operations required to implement these equations becomes prohibitive.

A closer look at Eqs. (2) and (3) shows that the first computational bottleneck occurs due to the cost of multiplying the matrices and . The second is the cost of computing and storing a dense with dimensions. Paradoxically, as noted previously, the scales of ultimate interest are often coarser than the native resolution of , and these covariances are frequently aggregated a posteriori in space and/or time by summing or averaging the corresponding entries in .

In this work, we propose a computational approach for evaluating , and by extension for very large inverse problems, for the case where the covariance matrix can be expressed as a Kronecker product of two smaller matrices. This is typical of spatiotemporal inverse problems where the space-time covariance is assumed separable, or simpler problems that only consider covariance in space or in time, rather than both. We further present an approach for directly calculating the a posteriori error covariance at aggregated scales, without the intermediary step of first computing the full . We use the Gourdji et al. (2012) problem as a computational benchmark for evaluating the performance of the proposed approaches relative to a direct implementation of Eqs. (2) and (3). Code demonstrating the implementation of both methods for a toy example is available as supplementary material.

**Efficient method for the multiplication of any matrix with a matrix expressed as a Kronecker product**

One key step in the solution of a linear inverse problem is the matrix multiplication between the forward operator and the prior error covariance matrix . If can be factored as a Kronecker product, then the matrices formed by their multiplication can be computed in blocks.

**Algorithm**

Any matrix that can be expressed as a Kronecker product can be defined based on matrices and and denoted as , where:

(4)

For a square covariance matrix , both and are also square. For the prototypical case examined here, is expressed as the Kronecker product of the temporal covariance and the spatial covariance, both of which decay exponentially with separation distance or lag:

(5)

where is the variance in space and time, and represent the separation distances/lags between estimation locations in space and time, respectively, and and are the spatial and temporal correlation range parameters, respectively. In this case, and . This defines a block matrix with blocks, each defined as a square matrix with elements. As the Kronecker product is not commutative, the arrangement of the temporal and spatial covariance in Eq. (5) determines the design of .

Returning to the more generic case, the multiplication of any matrix by proceed as follows:

Divide into column blocks each with dimension (nÃ-r)

(6)

Multiply each block of by the elements of the first column of and add these blocks (). If an element of is zero then skip the multiplication; if it is one then add the column block of without performing scalar multiplication.

Multiply the resulting matrix by to obtain the first column block of .

Repeat steps 2 and 3 for the remaining columns of and the corresponding blocks of .

Overall,

(7)

This algorithm can also be used for the multiplication of matrices where the first matrix is a Kronecker product of two smaller matrices, through the cyclical permutation property of transposes.

For and Eqs.(6) and (7) become:

(8)

(9)

The multiplication of and where is a block diagonal (e.g. there is correlation in space but not in time) is a special case of the algorithm where is an identity matrix.

**Floating point operations**

The number of floating point operations required for a direct multiplication of a matrix by a matrix can be expressed as a function of the dimensions of these matrices (for details see; Golub and Van Loan (1996)):

(10)

Similarly, the cost of the "indirect" multiplication algorithm presented in the last section is:

(11)

Equation (11) is based on the fact that steps 2 and 3 are each repeated times. The relative computational performance of the indirect method can therefore be expressed as:

(12)

For and this simplifies to:

(13)

Note that the number of observations '' does not affect the relative performance of the algorithm. Asymptotically, equation 13 approaches zero with increasing and . For the Gourdji et al. (2012) problem, this ratio is 7.14E-04, a savings of over 99.9% in the computational cost relative to the direct approach.

For the more generic case of multiplying and , consider the special simplifying case where and ; the floating point operations required by the direct and indirect methods become:

(14)

(15)

This results in an asymptotic complexity of for the direct method, and for the indirect method. The computational cost of the indirect approach presented is thus lower than that for Strassen's algorithm , but greater than that for the Coppersmith-Winograd algorithm . The Coppersmith-Winograd algorithm, however, is only useful for extremely large matrices that cannot be handled by present-day computers . The direct method is more economical only for , and the relative cost of the indirect method decreases exponentially thereafter. The overall computational cost could be reduced further if the matrix multiplications in Step 3 of the algorithm were computed through the Strassen or Coppersmith and Winograd algorithm. For a special case where D is composed of zeros and ones, the computational cost can also be further reduced by avoiding scalar multiplications, as listed in Step 2 of the algorithm.

Two other methods have been proposed for reducing the cost of matrix multiplication in inverse problems in special circumstances. For the special case of a regular estimation grid and Toeplitz covariances, the Fast Fourier (FF) method gives an exact answer and has a computational complexity of , which is lower than the method proposed here, but has higher memory requirements. In addition, the algorithm presented here can significantly outperform the FF method for sparse Toeplitz covariances, as it can take advantage of the sparseness and structure of the covariance matrices. For an irregular estimation grid, an approximate method based on a hierarchical framework has been proposed by Saibaba and Kitanidis (2012). Like the FF method, this method also has a computational complexity of , but it can only be used for very specific covariance functions and structured matrices . In addition, errors due to the approximations used in this approach compound in the case of large inverse problems.

**Other computational benefits of the indirect approach**

Beyond the economies in floating point operations, the indirect method also dramatically decreases the random access memory requirements for matrix multiplication, because the proposed approach eliminates the need to create or store the full matrix B (or Q). Thus, again taking the special case of and as an example, the memory requirement for storing and is , whereas it is if is explicitly stored in memory.

In addition, the indirect approach is fault tolerant and amenable to distributed parallel programming or "out of core" matrix multiplication, as each column block of (or ) can be obtained separately without any communication between processors populating the individual blocks.

In the case of the solution of an inverse problem, the multiplication of can also be completed block by block:

(16)

where and represents column blocks of the matrix as defined earlier. The computational efficiency of the matrix multiplication of, and can be further improved if the symmetry of is taken into account (see details on quadratic forms; ). However there are no "Basic Linear Algebra Subroutines" that take this property into account, and additional work would be required to develop these methods for application in inverse problems and statistics.

**Computation of aggregated a posteriori covariance in large linear space-time inverse problems**

The a posteriori covariance matrix (; Eq.(3)) is typically dense, and calculating is a computational bottleneck for large inverse problems. For example, computing explicitly for the Gourdji et al. (2012) problem would require approximately 1.06E+18 floating point operations, and over 56 terabytes of RAM. We propose an algorithm for computing the a posteriori covariance directly at aggregated scales, which are typically the scales of most interest as described in Sect. 1, without explicitly calculating. We use the estimation of a posteriori uncertainties at the native spatial scales (1o Ã- 1o grid-scale in the prototypical example) but for estimates averaged across larger temporal scales as an example.

**Algorithm**

The algorithm is presented for a design consistent with Eqs. (4) and (5), i.e. where the diagonal blocks describe the spatial covariance, and the off-diagonal blocks describe the decay of this covariance with time. The particular design framework of used in this study does not hinder the application of the proposed algorithm for obtaining a posteriori covariances and cross-covariances in inverse problems where has a different design, or where the aggregation is to be conducted over a different desired dimension.

The calculation of the a posteriori covariance at the native spatial resolution aggregated temporally over a desired time period proceeds as follows:

Sum all blocks of the matrix corresponding to the time periods between periods and over which the a posteriori uncertainties are to be aggregated, where , . For expressed as a Kronecker product as given in section 2.1, this is the sum of all entries between and in , multiplied by (spatial covariance):

(17)

where represents the sum of all blocks between and .

Sum all column blocks of the (see, Eq. (9)):

(18)

where represents the sum of all column blocks as shown in Eq. (9) between and

Compute the aggregated grid-scale a posteriori covariance for the estimates averaged over the desired time-periods:

(19)

whereis the covariance of temporally averaged for time periods to .

**Floating point operations**

The number of floating point operations required for the direct calculation of (Eq. (3)) and its aggregation over k time periods is compared to the calculation of the aggregated using the algorithm described above. The floating point operations required for multiplying by and by , for adding to , for taking the inverse of , and for dividing the aggregated covariance by are excluded in the floating point operation count, because the cost of these operations is the same for both approaches. Of course, computational savings can be achieved for both by following the algorithm outlined in Sect. 2 for the matrix multiplications.

The number of floating point operations required for obtaining grid scale a posteriori covariance from the direct method can be divided into four sequential operations: (1) matrix multiplication of and , (2) matrix multiplication of and , (3) subtraction of : from , and (4) summation of all , (spatial covariance) blocks of . The floating point operations for these four calculations are:

(20)

For the algorithm proposed in section 3.1, five operations are required to obtain aggregated a posteriori covariance for the desired time-period. These are: (1) summation of , (spatial covariance) blocks of , (2) summation of , blocks of , (3) multiplication of and , (4) multiplication of and , and (5) subtraction of from . The last three of these are all part of Step 3 of the algorithm. The floating point operations for these five calculations are:

(21)

Asymptotically, approaches zero with increasing, and . For the Gourdji et al. (2012) problem, this ratio is 5.34E-07 when evaluating the a posteriori covariance aggregated over the full year (k=), a savings of over 99.9999% of the computational cost relative to the direct approach.

For the special simplifying case where (i.e. ) and , the ratio of floating point operations required by the direct and the indirect methods becomes:

(22)

This results in an asymptotic complexity of for the direct method, and for the indirect method. The reduced memory requirements are arguably even more important, however, as the proposed algorithm makes it possible to compute a posteriori covariances at any temporal resolution without explicitly creating

**Conclusion**

We propose two algorithms to lower the computational cost and memory requirements for large linear space-time inverse problems. The proposed matrix multiplication algorithm can be implemented with any matrices, as long as one of them can be expressed as a Kronecker product of smaller matrices, making it broadly applicable in other areas of statistics and signal processing, among others . The presented a posteriori covariance computation algorithm can provide aggregated uncertainty covariances even for extremely large space-time inverse problems with dramatically decreased computational and memory requirements.

The mounting availability of massive volumes of data (e.g., satellite observations) will further increase the computational cost associated with the solution of inverse problems in the Earth sciences. Beyond the approaches presented here, more work needs to be done to increase the efficiency of other parts of the inverse problem, including the matrix inversion operation required in the solution of the large systems of linear equations associated with inverse problems.

**Appendix A: Description of the Submitted Code**

Two Matlab code files demonstrating the application of the methods proposed in this manuscript are included as supplementary material. The Matlab script file "HQ_HQHt.m" allows users to experiment with different sizes of random covariance matrices in a Kronecker product form and computes HQ and HQHT using the direct method as well as the method presented in Sect. 2.1. The second Matlab script file "Uncertainty_Computations.m" allows users to experiment with random matrices for computing a posteriori covariances aggregated either over all time-periods or for specified time-periods. A detailed description of the codes is also given at the beginning of the script files.

**Acknowledgements**

This material is based upon work supported by the National Aeronautics and Space Administration under Grant No. NNX12AB90G and the National Science Foundation under Grant No. 1047871.