Program in Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA

Abstract

Background

Cryo-electron tomography emerges as an important component for structural system biology. It not only allows the structural characterization of macromolecular complexes, but also the detection of their cellular localizations in near living conditions. However, the method is hampered by low resolution, missing data and low signal-to-noise ratio (SNR). To overcome some of these difficulties and enhance the nominal resolution one can align and average a large set of subtomograms. Existing methods for obtaining the optimal alignments are mostly based on an exhaustive scanning of all but discrete relative rigid transformations (i.e. rotations and translations) of one subtomogram with respect to the other.

Results

In this paper, we propose gradient-guided alignment methods based on two popular subtomogram similarity measures, a real space as well as a Fourier-space constrained score. We also propose a stochastic parallel refinement method that increases significantly the efficiency for the simultaneous refinement of a set of alignment candidates. We estimate that our stochastic parallel refinement is on average about 20 to 40 fold faster in comparison to the standard independent refinement approach. Results on simulated data of model complexes and experimental structures of protein complexes show that even for highly distorted subtomograms and with only a small number of very sparsely distributed initial alignment seeds, our combined methods can accurately recover true transformations with a substantially higher precision than the scanning based alignment methods.

Conclusions

Our methods increase significantly the efficiency and accuracy for subtomogram alignments, which is a key factor for the systematic classification of macromolecular complexes in cryo-electron tomograms of whole cells.

Introduction

Cryo-electron tomography emerges as an important component for structural system biology approaches

Subtomogram alignment and classification methods

We demonstrate on realistically simulated data of models and real macromolecular structures that for highly distorted subtomograms, even given a small number of evenly sampled initial angles with a large interval of 60° or 45°, our method can accurately recover true transformation with very high precision.

Methods

Here we provide a gradient-guided refinement framework for subtomogram alignment that minimizes a dissimilarity score as defined by the squared sum of the differences between a parameter fixed function and a function whose parameters are optimized. We consider two types of dissimilarity scores for subtomogram alignments, which both incorporate missing wedge corrections: A real space constrained dissimilarity score (Section 2.2) and a Fourier space constrained dissimilarity score (Section 2.4). In addition, we adapt our refinement protocol also to the case where the rotational search is restricted to only certain axis of rotations, for instance when the search is constrained to rotations around a membrane surface normal when membrane bound complexes are aligned (Section 2.5). In principle, it is beneficial to refine independently each of the candidate solutions from an exhaustive rotational scanning, however this is computationally expensive and not feasible for large scale subtomogram classifications, which is necessary in whole cell tomography. We therefore provide also a stochastic parallel refinement framework (Section 2.3) to efficiently reduce the total number of refinement steps.

Parameter definitions

For simplicity, we denote two subtomograms as two integrable functions ^{3 }→ ℝ. For **a **∈ ℝ^{3 }, let _{a }be the translation operator (_{a}**x**) := **x **- **a**). For a rotation _{R }be the rotation operator, such that (Λ_{R}**x**) := ^{-1}(**x**)]. **R**. In this case, (_{a}Λ_{R}**x**) = **R**^{-1}(**x **- **a**)).

The rigid transformation parameters combine both rotation and translation and are expressed as **a**) = (_{1}, _{2}, _{3})^{⊤}, where (^{⊤ }are Euler angles in the 'ZYZ' convention **a **= (_{1},_{2},_{3})^{⊤}. In addition, for simplicity, we denote the combined rigid transformation operator _{β }:= _{a}Λ_{R}.

Local optimization of subtomogram alignment based on a real space constrained dissimilarity score (RCS)

We now describe the gradient-guided refinement for the subtomogram alignment, given a coarse initial solution for **a**. The goal is to identify a local optimal solution given the current values of

To address this problem, Förster

, where

The normalized subtomogram transforms can be defined as

where

Then the constrained correlation is calculated as

Because of the subtomogram normalization, this constrained correlation is equivalent to a constrained dissimilarity score:

For a given initial guess of the rotation _{a }that minimizes the distance criteria _{R }and _{a}, we seek to obtain an increment Λ_{ΔR }and corresponding _{Δa }so that

Since

Let **x**_{j},

According to the Levenberg-Marquardt algorithm, Δ**a**) can be obtained by computing

Here **f **and **g**_{β }are the vector representations

and

**J **is the Jacobian matrix whose **E**) converts a matrix **E **to a diagonal matrix consisting of only diagonal elements of **E**; λ is a damping factor to control the rate of convergence.

The final result of this section provides the refined alignment parameters _{2 }= _{1 }+ Δ_{1 }and **a**_{2 }= **a**_{1 }+ Δ**a**_{1 }given the initial parameter set _{1 }and **a**_{1}. To perform a complete alignment refinement this process must be repeated iteratively until convergence is achieved (next section).

Stochastic parallel refinement process

To carry out a global optimization it is necessary to perform multiple refinement runs starting each time from a different candidate rotation angle. However, to carry out these individual optimizations independently is time consuming, which would prevent large-scale applications of subtomogram alignments. Therefore, we propose a stochastic parallel refinement framework to prioritize for those candidate transform parameters with smaller dissimilarity scores (Figure _{1},..., _{m}, where each **a**) consists of both rotation and translation parameters. The choice of which _{j }to refine next is stochastically decided according to a probability obtained from _{j }with smaller

Flow chart

**Flow chart**. Flow chart of the stochastic parallel refinement process.

We define a sampling probability that considers both rank and magnitude of

Then for _{j }is proportional to _{j }with

where _{1 }= 1 and _{j }and _{j-1 }is at least 10^{t/(m-1)}, and _{m}/_{1 }≥ 10^{t}.

To further enhance the computational efficiency, similar candidate transforms _{j }and _{k }is defined as the Frobenius norm

∀

To terminate the optimization process, at each iteration the ratio between the smallest and the initial minimum score is calculated. The iterative process is terminated when convergence is achieved, which in turn is identified by a linear regression ratio ^{regress }over the minimal scores in the last iterations. In case convergence cannot be achieved the optimization is terminated after a large number of iterations ^{max_iter}.

Similar to other stochastic optimization methods, such as genetic algorithms, our method also stores and evolves a population of candidate solutions. However, our method represents solutions by continuous values, and improves individual solutions by gradually refining them. By contrast, genetic algorithms usually encode solutions in strings of discrete bits, and generate new solutions by applying mutation and recombination on multiple existing solutions.

In this section we have introduced a parallel iterative refinement method that relies on a dissimilarity measure and local optimization process as described in Section 2.2. In the following section, we introduce another refinement method based on a different dissimilarity measure between subtomograms.

Local optimization of subtomogram alignment based on a Fourier space constrained subtomogram dissimilarity score (FCS)

After having introduced an iterative refinement process, and introduced a dissimilarity measure in Section 2.2, we now test the refinement process further with a second dissimilarity score. This new score is based on a constrained dissimilarity score computed directly in Fourier space

By properties of the Fourier transform

, given a fixed initial

where

Let _{j},

Because the above score is based on complex functions, the Levenberg-Marquardt algorithm cannot be directly applied. Therefore we derive a new version of the Levenberg-Marquardt algorithm for complex functions. In this version, Δ

where

and where

Here **f **and **g**_{β }are vector representations of the Fourier transform of the two subtomograms

and

**J **is the Jacobian matrix whose

In summary, in this section a Fourier-based similarity score is introduced and combined with a Levenberg-Marquardt algorithm adapted for complex functions.

Constrained rotational search around a rotation axis

If knowledge about the macromolecule's preferred orientation is available, it is beneficial to reduce the rotational search space to a range of only those preferred orientations. Then a significantly smaller number of rigid candidate transformations is sufficient to find the optimal alignment. For example, when the macro-molecules are membrane-bound protein complexes (e.g.

To minimize distortions due to the interpolation step in rigid transformations, one wants to reduce the number of sequential transformations for the original subtomograms. Therefore, we perform the constrained search by rotating only _{R }while keeping the original subtomogram

where _{f }and _{g }are the rotations of _{n }represents a rotation around the z-axis, defined in the form of (^{⊤}. During the refinement process, _{f }and _{g }are kept constant, and the only rotational parameter to be optimized is

Generating simulated cryo-electron tomograms

For a reliable assessment of the method, tomograms must be simulated as realistic as possible. We follow a previously applied methodology for realistically simulating the tomographic image formation

Initial density maps at 4 nm resolution are generated and used as samples for simulating electron micrograph images at different tilt angles. The tilt angles are set within a certain maximal range with steps of 1°. As a result our data contains a wedge-shaped region in Fourier space for which no data has been measured (missing wedge effects), similar to experimental measurements. The missing wedge effect leads to distortions of the density maps in real spaces. To generate realistic micrographs, noise is added to the images and the resulting image map is convoluted with a Contrast Transfer Function (CTF), which describes the imaging in the transmission electron microscope in a linear approximation. Any negative contrast values beyond the first zero of the CTF are eliminated. We also consider the modulation Transfer Function (MTF) of a typical detector used in whole cell tomography, and convolute the density map with the corresponding MTF. The CTF and MTF describe distortions from interactions between electrons and the specimen and distortions due to the image detector ^{-3}m, the defocus value = -4 × 10^{-6}m, the MTF corresponded to a realistic electron detector

Finally, we use a backprojection reconstruction algorithm to generate a tomogram from the individual 2D micrographs that were generated at the various tilt angles

Simulated subtomograms from phantom model

**Simulated subtomograms from phantom model**. (a) Density map of an unsymmetric phantom model consisting of four different 3D Gaussian functions. This density map is used to simulate subtomograms of 32^{3 }voxels. (b) A slice of the reconstructed tomograms at different levels of noise (∞, 1, 0.5, 0.1), and different tilt angle ranges leading to different levels of missing wedge distortions. The isosurface are generated using the Chimera software package

All our methods are implemented in MATLAB.

Results

We test our methods on phantom models and actual structures of protein complexes.

Pairwise alignment of subtomograms from phantom models

To assess the general performance, 100 pairs of subtomograms with randomly placed phantom models were generated for different SNR levels and tilt angle ranges (Figure ^{regress }≤ 0.001 and ^{max_iter }= 1000. We test our approach with respect to two factors. First, the average alignment error obtained from the refinement and second, the number of iterative steps that are needed to determine the optimal solution.

We show that even at a low SNR level of 0.5 and a typical range of tilt angles between -70° and +70° our method can still achieve a very low alignment error (Table

**Alignment rotation error**. Subtomogram alignment error in terms of the difference in the determined and true rotational angle of the subtomograms. Shown are the medians and median absolute deviations of all 100 subtomogram alignments. Bold font shows all the alignments with errors larger than 5°, which are considered inaccurate.

60° angle interval

SNR

∞

1

0.5

0.1

∞

1

0.5

0.1

Tilt

±90°

0.71 ± 0.49

3.3 ± 2.8

2.6 ± 1.4

**14 ± 9.3**

0.89 ± 0.54

2.6 ± 2.1

2.4 ± 1.1

**8.5 ± 4.5**

±80°

0.85 ± 0.54

2.5 ± 1.8

3.5 ± 2.4

**21 ± 14**

1.1 ± 0.61

2.2 ± 1.6

3.2 ± 2.2

**12 ± 7.7**

±70°

1.2 ± 0.53

1.9 ± 1.3

3.1 ± 1.7

**19 ± 12**

2 ± 0.86

2.1 ± 1

2.9 ± 1.3

**16 ± 11**

±60°

0.97 ± 0.49

2 ± 0.97

3.7 ± 2.4

**49 ± 45**

1.5 ± 0.82

2.4 ± 1.2

3.8 ± 2.1

**34 ± 30**

±50°

1.8 ± 0.9

2.9 ± 1.6

**7 ± 5.2**

**87 ± 63**

2.6 ± 1.1

3.4 ± 1.8

**6.3 ± 4.2**

**43 ± 37**

±40°

1.6 ± 1

**9 ± 8.3**

**55 ± 53**

**123 ± 31**

**15 ± 14**

**92 ± 40**

**106 ± 37**

**113 ± 26**

45° angle interval

SNR

∞

1

0.5

0.1

∞

1

0.5

1

Tilt

±90°

0.58 ± 0.25

1.1 ± 0.57

2 ± 0.84

**8.1 ± 2.7**

0.7 ± 0.34

1.1 ± 0.47

1.7 ± 0.77

**5.9 ± 2.4**

±80°

0.79 ± 0.31

1.4 ± 0.6

2.4 ± 0.93

**11 ± 4.5**

1.2 ± 0.49

1.7 ± 0.73

2.3 ± 0.97

**7.9 ± 3**

±70°

1 ± 0.26

1.8 ± 0.69

2.7 ± 1.2

**8.4 ± 3.1**

1.5 ± 0.46

2.1 ± 0.7

2.5 ± 1.1

**7.9 ± 2.5**

±60°

1 ± 0.42

1.6 ± 0.66

2.7 ± 0.86

**10 ± 4.8**

1.7 ± 0.68

2.1 ± 0.84

2.8 ± 0.84

**9.2 ± 4.5**

±50°

2 ± 0.77

2.4 ± 0.93

2.7 ± 1

**14 ± 11**

2.6 ± 1.1

2.9 ± 1.1

2.9 ± 1.1

**11 ± 5.1**

±40°

1.5 ± 0.79

2.5 ± 1.1

**5.7 ± 3.6**

**107 ± 27**

**9.4 ± 7.8**

**5.7 ± 3.3**

**7.7 ± 5.4**

**111 ± 19**

RCS

FCS

Our method therefore allows substantially larger sampling interval while maintaining a high accuracy in subtomogram alignment.

Using a sampling angle interval as large as 60° has major advantages in terms of computational efficiency. For the standard exhaustive scanning at 5° intervals a total of 168,634 candidate orientations must be processed while at 60° rotational intervals only 108 candidate orientations are refined. Also our method can in general achieve a small error for the translation of subtomograms that cannot be reached by an FFT based exhaustive sampling, which on average cannot be less than 0.5 (Table

**Alignment translation error**. Subtomogram alignment error in terms of the difference in the Euclidean distance between determined and true subtomogram translations. Shown are the medians and median absolute deviations of all 100 subtomogram alignments.

60° angle interval, RCS

SNR

∞

1

0.5

0.1

Tilt

±90°

0.035 ± 0.023

0.16 ± 0.12

0.19 ± 0.12

0.96 ± 0.66

±80°

0.045 ± 0.029

0.24 ± 0.2

0.21 ± 0.15

1.3 ± 0.89

±70°

0.078 ± 0.037

0.25 ± 0.17

0.3 ± 0.18

1.3 ± 0.74

±60°

0.068 ± 0.036

0.19 ± 0.12

0.43 ± 0.3

2.2 ± 1.3

±50°

0.14 ± 0.078

0.26 ± 0.17

0.65 ± 0.51

2.3 ± 1.3

±40°

0.15 ± 0.092

0.74 ± 0.64

1.7 ± 1.3

3.2 ± 1.6

60° angle interval, FCS

SNR

∞

1

0.5

0.1

Tilt

±90°

0.047 ± 0.023

0.12 ± 0.081

0.11 ± 0.053

0.49 ± 0.31

±80°

0.053 ± 0.03

0.15 ± 0.1

0.18 ± 0.1

0.85 ± 0.66

±70°

0.11 ± 0.057

0.13 ± 0.074

0.21 ± 0.1

0.95 ± 0.58

±60°

0.11 ± 0.061

0.2 ± 0.094

0.3 ± 0.15

1.6 ± 1.2

±50°

0.19 ± 0.1

0.28 ± 0.16

0.44 ± 0.26

1.8 ± 1.2

±40°

0.61 ± 0.54

3.3 ± 2.7

4.3 ± 2.6

6.2 ± 3

45° angle interval, RCS

SNR

∞

1

0.5

0.1

Tilt

±90°

0.031 ± 0.014

0.072 ± 0.031

0.12 ± 0.049

0.43 ± 0.21

±80°

0.051 ± 0.027

0.11 ± 0.051

0.17 ± 0.072

0.69 ± 0.32

±70°

0.063 ± 0.024

0.14 ± 0.052

0.21 ± 0.1

0.63 ± 0.24

±60°

0.076 ± 0.036

0.15 ± 0.068

0.23 ± 0.1

0.89 ± 0.5

±50°

0.11 ± 0.055

0.2 ± 0.094

0.28 ± 0.14

1.3 ± 0.95

±40°

0.14 ± 0.071

0.31 ± 0.17

0.67 ± 0.47

6.2 ± 5.3

45° angle interval, FCS

SNR

∞

1

0.5

0.1

Tilt

±90°

0.033 ± 0.016

0.071 ± 0.03

0.094 ± 0.032

0.29 ± 0.13

±80°

0.061 ± 0.031

0.1 ± 0.052

0.13 ± 0.062

0.46 ± 0.21

±70°

0.08 ± 0.04

0.12 ± 0.05

0.17 ± 0.075

0.49 ± 0.22

±60°

0.1 ± 0.052

0.17 ± 0.091

0.22 ± 0.073

0.72 ± 0.36

±50°

0.19 ± 0.094

0.22 ± 0.083

0.24 ± 0.12

0.93 ± 0.5

±40°

0.76 ± 0.64

0.51 ± 0.36

0.82 ± 0.6

9.8 ± 4.1

In addition, the parallel stochastic refinement process reduces considerably the number of refinement iterations that are needed to reach a good solution in an optimization. At a rotational sampling of 60°, there are 108 candidate orientations that can potentially serve as starting points for a refinement process. Without the parallel stochastic optimization method, a refinement of a candidate orientation takes on average about 60 iterations per run. When all candidate orientations are refined independently a total of about 6480 iterative refinement steps are needed to find the global optimum among all candidate orientations. However, our parallel stochastic refinement process reaches convergence already within only 200-300 iterative refinement steps (Figure

Convergence example

**Convergence example**. Top panels: The minimum dissimilarity scores obtained at different iterations subtracted from the true dissimilarity score. Bottom panels: The difference

Computation speedup

**Computation speedup**. Computational speed up of the stochastic parallel optimization method compared to the traditional exhaustive refinement method. Shown is the ratio of the number of iterations needed to find the optimal solution for the exhaustive and stochastic parallel optimization methods (The numbers show the fold increase in number of iterations when the exhaustive method is used). Shown are the median deviations of all 100 subtomogram alignments for the RCS method (left column) and FCS method (right column) for optimizations using a rotational search at 60° intervals (a) and 45° intervals (b), respectively.

Next, we test the alignment when the search space is constrained to rotations around a single axis. When rotational sampling is performed at 60° and 45° intervals, only 6 and 8 initial candidate rotation angles are used, respectively. The alignment performance is shown in Tables

**Constrained alignment rotation error**. Constraining the search to rotations around a single axis. Subtomogram alignment error in terms of the difference in the determined and true rotational angle. Shown are the medians and median absolute deviations of all 100 subtomogram alignments. Bold font shows all the alignments with errors larger than 5°, which are considered inaccurate.

60° angle interval

SNR

∞

1

0.5

0.1

∞

1

0.5

0.1

Tilt

±90°

0.2 ± 0.14

0.31 ± 0.15

0.55 ± 0.26

4.1 ± 1.8

0.21 ± 0.13

0.44 ± 0.24

0.62 ± 0.33

3.1 ± 1.7

±80°

0.29 ± 0.19

0.55 ± 0.38

0.89 ± 0.62

4.5 ± 3

0.44 ± 0.29

0.67 ± 0.41

1 ± 0.73

3.2 ± 1.6

±70°

0.43 ± 0.25

0.67 ± 0.38

0.81 ± 0.54

**5.4 ± 3.9**

0.57 ± 0.37

0.85 ± 0.53

0.84 ± 0.43

3.8 ± 2.4

±60°

0.6 ± 0.4

0.99 ± 0.79

1.5 ± 1.2

**5.9 ± 4.9**

0.81 ± 0.64

1.3 ± 1.1

1.7 ± 1.3

3.8 ± 3

±50°

1.1 ± 0.93

1.3 ± 1.1

1.4 ± 0.94

**5 ± 4.1**

1.7 ± 1.4

1.7 ± 1.4

2 ± 1.3

4.1 ± 3.6

±40°

1.7 ± 1.7

2.3 ± 2.2

**7 ± 6.9**

**42 ± 38**

3.9 ± 3.7

3.1 ± 2.9

4 ± 3.6

**42 ± 39**

45° angle interval

SNR

∞

1

0.5

0.1

∞

1

0.5

0.1

Tilt

±90°

0.2 ± 0.12

0.35 ± 0.16

0.42 ± 0.25

3.1 ± 1.7

0.19 ± 0.12

0.38 ± 0.19

0.45 ± 0.28

2.5 ± 1.2

±80°

0.18 ± 0.12

0.31 ± 0.21

0.61 ± 0.3

3.9 ± 1.7

0.35 ± 0.23

0.5 ± 0.34

0.5 ± 0.36

2.5 ± 1.4

±70°

0.28 ± 0.15

0.47 ± 0.29

0.56 ± 0.31

3.9 ± 2.1

0.5 ± 0.34

0.64 ± 0.43

0.63 ± 0.45

2.8 ± 1.7

±60°

0.43 ± 0.23

0.49 ± 0.28

0.72 ± 0.37

4.5 ± 3

0.67 ± 0.46

0.64 ± 0.39

0.92 ± 0.46

2.7 ± 1.7

±50°

0.65 ± 0.41

0.89 ± 0.53

0.93 ± 0.64

**5.2 ± 3.4**

0.99 ± 0.67

1.1 ± 0.72

1.1 ± 0.81

3.1 ± 2.2

±40°

0.98 ± 0.87

1.2 ± 0.9

2 ± 1.7

**12 ± 11**

1.6 ± 1.2

1.7 ± 1.3

1.6 ± 1.2

**9.6 ± 9.2**

RCS

FCS

**Constrained alignment translation error**. Constraining the search to rotations around a single axis. Subtomogram alignment error in terms of the difference in the Euclidean distance between determined and true subtomogram translations. Shown are the medians and median absolute deviations of all 100 subtomogram alignments.

60° angle interval, RCS

SNR

∞

1

0.5

1

Tilt

±90°

0.018 ± 0.0053

0.047 ± 0.017

0.08 ± 0.023

0.28 ± 0.1

±80°

0.027 ± 0.011

0.069 ± 0.029

0.1 ± 0.048

0.37 ± 0.17

±70°

0.037 ± 0.017

0.085 ± 0.04

0.13 ± 0.059

0.45 ± 0.28

± 60°

0.055 ± 0.028

0.14 ± 0.083

0.19 ± 0.11

0.59 ± 0.36

±50°

0.1 ± 0.067

0.18 ± 0.12

0.24 ± 0.12

0.74 ± 0.42

±40°

0.27 ± 0.25

0.49 ± 0.4

0.94 ± 0.83

3.3 ± 2.3

60° angle interval, FCS

SNR

∞

1

0.5

0.1

Tilt

±90°

0.018 ± 0.0062

0.05 ± 0.017

0.074 ± 0.021

0.23 ± 0.097

±80°

0.027 ± 0.013

0.063 ± 0.025

0.076 ± 0.029

0.24 ± 0.091

±70°

0.034 ± 0.015

0.077 ± 0.032

0.098 ± 0.036

0.32 ± 0.16

±60°

0.055 ± 0.033

0.12 ± 0.063

0.18 ± 0.099

0.46 ± 0.28

±50°

0.11 ± 0.075

0.16 ± 0.099

0.2 ± 0.11

0.6 ± 0.36

±40°

0.39 ± 0.35

0.36 ± 0.27

0.57 ± 0.44

3.5 ± 2.8

45° angle interval, RCS

SNR

∞

1

0.5

0.1

Tilt

±90°

0.015 ± 0.005

0.054 ± 0.014

0.063 ± 0.019

0.24 ± 0.086

±80°

0.024 ± 0.0072

0.056 ± 0.018

0.086 ± 0.028

0.29 ± 0.12

±70°

0.033 ± 0.0095

0.075 ± 0.028

0.11 ± 0.04

0.38 ± 0.17

±60°

0.046 ± 0.019

0.1 ± 0.036

0.16 ± 0.051

0.48 ± 0.22

±50°

0.067 ± 0.035

0.14 ± 0.06

0.22 ± 0.098

0.59 ± 0.27

±40°

0.22 ± 0.18

0.31 ± 0.21

0.44 ± 0.29

2.3 ± 1.7

45° angle interval, FCS

SNR

∞

1

0.5

0.1

Tilt

±90°

0.017 ± 0.0053

0.048 ± 0.014

0.06 ± 0.016

0.21 ± 0.074

±80°

0.021 ± 0.0069

0.052 ± 0.018

0.073 ± 0.025

0.2 ± 0.067

±70°

0.03 ± 0.011

0.065 ± 0.023

0.098 ± 0.03

0.26 ± 0.085

±60°

0.043 ± 0.017

0.088 ± 0.033

0.13 ± 0.044

0.32 ± 0.11

±50°

0.07 ± 0.032

0.14 ± 0.056

0.18 ± 0.073

0.44 ± 0.19

±40°

0.17 ± 0.11

0.24 ± 0.16

0.33 ± 0.18

1.5 ± 1.2

When the information about the orientation of the membrane surface normal is included in the search process, the alignment accuracy increases significantly for subtomograms at high distortion levels. Without surface normal information, the alignment fails for subtomograms at very low SNR of 0.1, resulting in average angluar alignment errors of at least 10°. With surface normal information, the average anglular alignment errors are less than 6° even for subtomograms generated from a small tilt angle range of ±50°.

Next, we further test our alignment methods for refining the density maps of the complexes by averaging over all aligned subtomograms. For each complex, we generated 1000 subtomograms (at SNR 0.5, tilt angle range ±60°) containing randomly oriented models. We then aligned the tomograms against the initial templates with a rotational sampling of 60° angle intervals. From the resulting averaged density maps it can be seen that our methods can successfully recover the initial structures (Figure

Averaged subtomogram after alignment

**Averaged subtomogram after alignment**. Averaged subtomograms. Left, aligned using RCS. Right, aligned using FCS.

Pairwise alignment of subtomograms from real macromolecular complexes

A whole cell cryo-electron tomogram consists of instances of macromolecular complexes of different types. In principle, these instances can be segmented into individual subtomograms and classified after pairwise alignments. Therefore, subtomogram alignment and classification is fundamental for successful structural systems biology analysis of complexes using whole cell tomograms. In this section, we test our methods on subtomograms of four macromolecular complexes obtained from the Protein Data Bank (PDB id

We perform all pairwise alignments between all 80 subtomograms with sampling of 60° rotational angle intervals. After alignment the resulting dissimilarity score matrix for subtomogram classification is significantly improved in comparison to the dissimilarity score matrix generated from the initial starting structures (Figure

Pairwise alignment of protein complexes

**Pairwise alignment of protein complexes**. (a) Dissimilarity score matrices for subtomogram classification. The matrix elements representing the same complexes are in consecutive order. (Top row) Dissimilarity score matrix based on the initial subtomogram orientations before alignment for (left column) RCS score and (right column) FCS score. (Bottom row) RCS and FCS score matrices after subtomogram alignments. The alignment is performed at a sampling with 60° rotation angle intervals. (b) Density maps of complexes generated after averaging of the aligned subtomograms in the same class. (Left column) isosurface of the density distribution in single subtomogram for each complex. (Middle and right columns) isosurface of the resulting density maps generated by averaging the 20 subtomograms aligned with the RCS and FCS scores, respectively.

After classification and alignment, the resulting averaged tomograms are very similar to the original density maps. The distortions, as evident in the individual subtomograms are greatly reduced after averaging (Figure

Conclusion

In this paper, we have proposed a new gradient-based method for high precision subtomogram alignments. Combined with the RCS and FCS scores, this method can achieve significantly lower alignment errors in comparison to an exhaustive sampling method. We show that this accuracy can already be reached with only a relatively small number of sampled candidate orientations, for example at rotational intervals of 60° and 45°. The improvement in performance when using rotational intervals of 45° instead of 45° intervals is only marginal, indicating that 60° intervals are already sufficient for most alignments. We further extended the method to a special case when the alignment search is constrained to rotations around a single axis. For instance, alignment of membrane bound complexes allow the rotational search to be restricted to rotations around an axis parallel to a surface normal. This constrained alignment can achieve even higher alignment precision and is more robust to distortions in subtomograms, even when only 6 to 8 initial rotation angle candidates are used.

The RCS and FCS scores both have certain advantages. In contrast to FCS the RCS score takes into account the contrast difference between subtomograms. On the other hand, the FCS score has closed form partial derivatives with respect to the translation parameters, therefore introducing less numerical instability in the gradient refinement process. Moreover it is more efficiently computed because a smaller number of computational intensive rigid transform operations are needed.

Moreover, we have proposed a very efficient stochastic parallel refinement method, which is able to find the global optimum with only a small fraction of iterations in comparison to the independent sampling and refinement with the same sampling angle intervals. Together, these improvements increase significantly the efficiency and accuracy for subtomogram alignments, which is a key factor for the systematic classification of macromolecular complexes in cryo-electron tomograms of whole cells.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

F.A. and M.X conceived the project. M.X. performed the research and carried out experiments. F.A. and M.X. wrote the manuscript.

Acknowledgements

The authors would like to thank Dr. Martin Beck and Dr. Kay Gruenwald for providing valuable suggestions and tomography simulation code. This work is supported by the Human Frontier Science Program grant RGY0079/2009-C to F.A., Alfred P. Sloan Research foundation grant to F.A.; NIH grant 1R01GM096089 and 2U54RR022220 to F.A.; NSF grant CAREER 1150287 to F.A.. F.A. is a Pew Scholar in Biomedical Sciences, supported by the Pew Charitable Trusts.

This article has been published as part of