Nencki Institute of Experimental Biology, Polish Academy of Sciences, Pasteura 3, Warszawa, Poland

Abstract

Background

Quantitative analysis of changes in dendritic spine morphology has become a topic of considerable interest in contemporary neuroscience. However, the diversity of the dendritic spine population may seriously influence the results of measurements of spine morphology. The detection of differences in spine morphology between control and test groups is often compromised by the number of dendritic spines taken for analysis. To estimate the impact of dendritic spine diversity, we performed Monte Carlo simulations examining various experimental setups and statistical approaches. Confocal images of dendritic spines from dissociated hippocampal cultures were used to create a set of variables exploited as the simulation resources.

Results

The tabulated results of the simulations given in this article provide the number of dendritic spines required to detect hidden morphological differences between control and test groups in terms of spine head-width, length and area. Of these three variables, changes in head-width are the most easily detected. Simulations of changes occurring in a subpopulation of spines reveal a strong dependence of detectability on the statistical approach applied. Analysis based on comparing the percentage of spines in subclasses is less sensitive than direct comparison of the relevant variables describing spine morphology.

Conclusions

We evaluated the effect of sampling and of systematic morphological variation on detecting differences in spine morphology. The results provided here may serve as a guideline in selecting the number of samples to be studied in a planned experiment. Our simulations might be a step towards the development of a standardized method of quantitative comparison of dendritic spine morphology, in which different sources of error are considered.

Background

Dendritic spines are short (with the typical length up to 2-3

The enormous diversity of spines has been recognized since spines were first observed

Concerns have been raised that irreproducible or even contradictory results were reported in sets of experiments in which qualitatively similar results had been expected

Different kinds of sampling problems arise depending on whether we compare different spine populations or track changes over time in live imaging of individual spines. In several experimental situations one must compare images of different samples taken at specific time points. These include (a) comparisons of spine morphology in transgenic versus wild-type animals, (b) models of neurodegenerative diseases, (c) studies of the influence of environmental factors, (d) the effect of pharmacological treatment, (e) characteristics of different parts of the brain, (f) different types of cells, and (g) studies using electron microscopy. We focus on experiments in which measurements based on snapshots of different spines are analyzed.

The aim of our paper is to study the effectiveness of quantitative comparative methods in various experimental setups by means of Monte Carlo simulation. We estimate the limits on method sensitivity resulting from the sampling problem. Such estimates might serve as a guideline in selecting the number of samples in a new experiment or in evaluating the sensitivity of experiments that have already been performed. It must be stressed that other sources of variation are also present, originating from the preparation of experimental samples, the choice of dendrite and brain area, and the individual features of animals. Because of these factors, the estimates of method sensitivity resulting from sampling issues should be treated as an upper (best-case) limit.

The simplest setup for comparing spine morphology comprises two groups of samples, the treatment and the control, with selected subsets of spines assigned to each group. The simulations were performed by introducing systematic changes into the treatment group in a controlled way, while the spines in the control group were assigned randomly from a previously created database. Morphological changes were assessed with statistical tests in which the datum is the value of a given variable averaged over the sample. Alternatively, the distributions of variables could be compared using the Kolmogorov-Smirnov test, which can reveal changes that occur only in a subpopulation of spines. We investigated whether the introduced differences could be recovered while varying the number of samples, the number of spines per sample, the magnitude of the introduced changes, the variable studied, and the statistical test and its p-value. We examined both changes affecting the entire spine population and changes affecting only certain subpopulations. We focused on both the false negative rate and the false positive rate, that is, on the Type II error (actual differences between populations go undetected) and the Type I error (we conclude that the groups differ while all spines actually originate from the same population). The latter case was simulated by comparing two control groups. In our analysis we focus on studies that measure the spine length, the head-width and the cross-sectional area (see Section “Methods” for details).
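A single simulation run of the kind described above can be sketched in Python 3 with NumPy and SciPy. This is an illustrative sketch, not the authors' implementation (which used Python 2.5.4): the spine database is replaced by a hypothetical pool of 2499 lognormally distributed head-width values in arbitrary units, each sample is assumed to be a random draw of spines from the pooled database, and the helper names `draw_group` and `simulation_run` are our own.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the spine database: 2499 head-width values
# drawn from a lognormal distribution (arbitrary units).
rng = np.random.default_rng(42)
pool = rng.lognormal(mean=-0.5, sigma=0.35, size=2499)

def draw_group(rng, pool, n_samples, spines_per_sample):
    """Return an (n_samples, spines_per_sample) array; each row is one sample
    of spines drawn at random (without replacement) from the pooled database."""
    return np.array([rng.choice(pool, size=spines_per_sample, replace=False)
                     for _ in range(n_samples)])

def simulation_run(rng, pool, n_samples=10, spines_per_sample=60,
                   growth=0.2, alpha=0.01):
    """One simulated experiment; returns (t-test rejects H0, K-S rejects H0)."""
    control = draw_group(rng, pool, n_samples, spines_per_sample)
    # Systematic change: every spine in the treatment group grows linearly.
    treatment = (1.0 + growth) * draw_group(rng, pool, n_samples, spines_per_sample)
    # t-test: the datum is the variable averaged over each sample.
    t_p = stats.ttest_ind(control.mean(axis=1), treatment.mean(axis=1)).pvalue
    # K-S test: compares pooled distributions, ignoring sample membership.
    ks_p = stats.ks_2samp(control.ravel(), treatment.ravel()).pvalue
    return t_p < alpha, ks_p < alpha

runs = [simulation_run(rng, pool, growth=0.2) for _ in range(200)]
print("t-test detection rate:", np.mean([t for t, _ in runs]))
```

With `growth > 0` a run without rejection is a Type II error; with `growth=0` the function compares two control groups, and any rejection is a Type I error.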

Besides the parameters on which we focused our analysis, many other two- and three-dimensional quantities describing the morphology of dendritic spines in confocal (and less frequently electron microscopy^{a}) images are commonly studied. These parameters describe (a) the overall size of the spines, (b) details such as head size or neck length, which relate spine morphology to spine structure and function, and (c) combinations of spine shape with fluorescence intensity. Several algorithms dedicated to two-dimensional^{b} and three-dimensional spine segmentation in confocal stacks have been proposed^{c}.

Methods

The resource used in the Monte Carlo simulations was a database of variables describing the morphology of 2499 dendritic spines originating from 34 cells that were used as controls in other experiments. The simulations were implemented in Python 2.5.4 using Scientific Python Library 0.7.0. In a single simulation run, two groups were created, a given statistical test was performed, and the outcome was recorded. This procedure was repeated many times in order to assess the false negative and false positive rates (2000 and 10000 simulation runs, respectively).
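The number of runs bounds the precision of the estimated rates. Assuming independent runs, an estimated rate behaves like a binomial proportion, so its Monte Carlo standard error is √(p(1−p)/N). The short Python sketch below (the function name is hypothetical) evaluates it for a false negative rate near the 5% target with 2000 runs and for a false positive rate near a p-value of 0.001 with 10000 runs.

```python
import math

def mc_standard_error(rate, n_runs):
    """Binomial standard error of a rate estimated from n_runs independent runs."""
    return math.sqrt(rate * (1.0 - rate) / n_runs)

se_false_negative = mc_standard_error(0.05, 2000)    # rate near 5%, 2000 runs
se_false_positive = mc_standard_error(0.001, 10000)  # rate near 0.001, 10000 runs
print(f"{se_false_negative:.4f} {se_false_positive:.5f}")  # prints "0.0049 0.00032"
```

Both errors are small in absolute terms, although near 0.001 the relative error is still about 30%.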

Preparation of dissociated cultures

Hippocampal dissociated cultures from P0 Wistar rats were prepared as follows. Brains were removed and hippocampi were isolated on ice in Dissociation Medium (DM; in mM: 81.8 Na_{2}SO_{4}; 30 K_{2}SO_{4}; 5.8 MgCl_{2}; 0.25 CaCl_{2}; 1 HEPES, pH 7.4; 20 glucose; 1 kynurenic acid; 0.001% phenol red). Hippocampi were later incubated twice for 15 minutes at 37°C [...] CO_{2} in a humidified incubator for 2 weeks. All experiments were performed at 14 to 19 days in vitro (DIV). Cells were transfected at 10 DIV using Effectene (Qiagen) according to the manufacturer’s protocol, with a plasmid carrying RFP under

Confocal imaging and image analysis

Images were acquired with a Leica TCS SP5 confocal microscope and a PL Apo 40×/1.25 NA oil-immersion objective, using the 561 nm line of a diode-pumped solid-state laser at 10% transmission and a pixel count of 1024 × 1024. A series of z-stacks was acquired for a cell with a step of 0.4

**(A) Low-power image of a neuron with marked spines (red contours) selected for simulations.** Only clearly distinguishable, transversely protruding spines located on secondary dendrites were taken. Scale bar: 10 μm. **(B)** For each spine we measured the cross-sectional area, the head-width and the length. For a symmetric spine there is no ambiguity in the definition of the head-width or the length. However, for bent spines, taking the distance between the spine tip and the spine foot as the length underestimates it. It is also unclear which distance should represent the spine head-width (dashed red lines). For this reason we used a virtual spine skeleton to measure the length and required the head-width line to be perpendicular to this skeleton. **(C)** Magnification of a neuron with marked spines selected for simulations (Scale bar: 1

**The probabilities of committing Type II Error (concluding a “false negative”) while analyzing the results of simulated linear growth of three quantities: head-width (dashed blue line), spine area (dotted red line) and length (solid green line).** The plots correspond to linear growth by 50% **(A)**, 20% **(B)** and 10% **(C)**. The same number of samples (x axis) was used for the control and the treatment groups; each sample contained 60 spines. The groups were compared with the t-test at p-value 0.01. An ensemble of 2000 simulation runs was used to calculate the probabilities.

Results and discussion

We have simulated an experiment with two groups (control and treatment) of

To eliminate systematic differences in spine morphology due to the location of spines on dendrites of different rank, special care was taken to acquire images of spines on secondary dendrites. Because of this restriction and the limited resolution of the optical microscope, we could clearly measure the morphology of roughly 30-90 spines per confocal stack (1024 × 1024 pixels). One factor contributing to the total measurement variation is the uncertainty in determining the spine shape, which is limited by the optical resolution of the microscope. In our experiment we used RFP, which gave a microscope resolution (FWHM) of 0.187

**Comparison of the effectiveness of the Mann-Whitney-Wilcoxon two-tailed u-test (solid lines) and Student’s two-tailed t-test (dotted lines).** A linear growth of 20% was simulated, and both tests were applied at p-value 0.001. An ensemble of 2000 simulations was used, with 60 spines per sample. For

Simulation results with various numbers of cells, spines per cell, test p-values, magnitudes of systematic change and parameters studied were used to determine the minimal number of analyzed cells per group that keeps the false negative rate below 5%. The systematic changes were modeled as linear growth, and the test groups were compared with Student’s two-tailed t-test.

| **Modeled change in spine variable** | **t-test p-value** | **Parameter studied** | **Required cells per group (15 spines/cell)** | **Required cells per group (30 spines/cell)** | **Required cells per group (60 spines/cell)** |
|---|---|---|---|---|---|
| 10% | 0.001 | area | 166 | 84 | 43 |
| 10% | 0.001 | length | 137 | 70 | 35 |
| 10% | 0.001 | head-width | 66 | 35 | 19 |
| 20% | 0.001 | area | 47 | 24 | 14 |
| 20% | 0.001 | length | 39 | 21 | 13 |
| 20% | 0.001 | head-width | 21 | 12 | 8 |
| 50% | 0.001 | area | 13 | 8 | 6 |
| 50% | 0.001 | length | 11 | 7 | 5 |
| 50% | 0.001 | head-width | 7 | 5 | 4 |
| 10% | 0.01 | area | 119 | 62 | 31 |
| 10% | 0.01 | length | 101 | 51 | 27 |
| 10% | 0.01 | head-width | 49 | 26 | 14 |
| 20% | 0.01 | area | 33 | 19 | 10 |
| 20% | 0.01 | length | 29 | 15 | 8 |
| 20% | 0.01 | head-width | 15 | 9 | 6 |
| 50% | 0.01 | area | 9 | 6 | 4 |
| 50% | 0.01 | length | 8 | 5 | 4 |
| 50% | 0.01 | head-width | 5 | 4 | 3 |

In the first stage, we considered a theoretical situation in which the value of each morphometric variable under investigation grew linearly (by 10%, 20% or 50%) for every spine in the treatment group. Spine length, head-width and cross-sectional area were considered independently. The groups were compared using Student’s t-test. The computed false negative rate is presented in Figure

From (Figure ^{d} The spine length (the variable for which kurtosis is largest) is an important variable reflecting the difference between long filopodia and short stubby spines. The distribution of spine length is given in (Figure

**The distribution of spine length has a highly non-Gaussian character, which manifests in its large kurtosis (7.97 for the distribution studied versus 2.02 for the distribution of spine head-width and 0 for the Gaussian distribution).** We parametrized the length distribution as a superposition of three Gaussian functions that might represent different classes of spines, yet there is still a clear deviation between the curve and the points to which it was fitted for lengths >5
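The kurtosis values quoted here follow the convention in which a Gaussian distribution has kurtosis 0, i.e. the excess (Fisher) kurtosis: the standardized fourth central moment minus 3. In SciPy this is what `scipy.stats.kurtosis` returns with `fisher=True`; whether the authors computed it this way is not stated. The sketch below applies it to synthetic data only: a Gaussian sample and a hypothetical right-skewed lognormal sample standing in for spine lengths.

```python
import numpy as np
from scipy import stats

def excess_kurtosis(values):
    """Standardized fourth central moment minus 3 (0 for a Gaussian)."""
    return stats.kurtosis(values, fisher=True, bias=True)

rng = np.random.default_rng(0)
gaussian = rng.normal(loc=0.0, scale=1.0, size=100_000)
# Hypothetical heavy-tailed "lengths": a lognormal has a long right tail.
lengths = rng.lognormal(mean=0.0, sigma=0.6, size=100_000)

print(round(excess_kurtosis(gaussian), 2))  # close to 0
print(round(excess_kurtosis(lengths), 2))   # well above 0
```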

The distribution could be reasonably well parametrized by a superposition of three Gaussian functions, with the exception of points in the tail representing very long filopodia (i.e., filopodia with a length greater than 5
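One way to obtain such a parametrization is a least-squares fit of a sum of three Gaussian functions to the normalized histogram of lengths. The authors' fitting procedure and parameters are not given here, so the following Python sketch is only an illustration under assumptions: it uses `scipy.optimize.curve_fit`, a hypothetical synthetic sample made of three overlapping classes, and hand-picked starting values `p0`. Unlike real data, the synthetic sample has no tail of very long filopodia, so it does not reproduce the misfit described above.

```python
import numpy as np
from scipy.optimize import curve_fit

def three_gaussians(x, a1, m1, s1, a2, m2, s2, a3, m3, s3):
    """Sum of three Gaussian bumps (amplitude a, centre m, width s)."""
    return (a1 * np.exp(-0.5 * ((x - m1) / s1) ** 2)
            + a2 * np.exp(-0.5 * ((x - m2) / s2) ** 2)
            + a3 * np.exp(-0.5 * ((x - m3) / s3) ** 2))

def fit_length_histogram(lengths, p0, bins=40):
    """Least-squares fit of three_gaussians to the normalized histogram."""
    density, edges = np.histogram(lengths, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    params, _ = curve_fit(three_gaussians, centres, density, p0=p0, maxfev=20000)
    residuals = density - three_gaussians(centres, *params)
    return params, centres, residuals

# Hypothetical lengths: three overlapping classes (arbitrary units).
rng = np.random.default_rng(0)
lengths = np.concatenate([rng.normal(0.8, 0.15, 3000),
                          rng.normal(1.5, 0.30, 2000),
                          rng.normal(3.0, 0.60, 500)])
p0 = [1.4, 0.8, 0.15, 0.5, 1.5, 0.3, 0.07, 3.0, 0.6]
params, centres, residuals = fit_length_histogram(lengths, p0)
print(np.round(params, 2))
```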

The simulation results for other settings (p-values, number of spines per sample, etc.), in the form of the minimal number of samples that have to be analyzed to push the false negative rate below 5%, are shown in Table
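The search behind such a table can be sketched as follows: for a fixed number of spines per cell, increase the number of cells per group until the estimated false negative rate drops below 5%. The Python sketch below assumes that one cell corresponds to one sample, that per-cell means are computed from spines drawn at random from a hypothetical lognormal pool, that every treated spine grows linearly, and that groups are compared with a two-tailed Student's t-test; the function names and the default number of runs are illustrative, and its output is not expected to match the tabulated values.

```python
import numpy as np
from scipy import stats

def false_negative_rate(rng, pool, n_cells, spines_per_cell, growth, alpha, n_runs):
    """Fraction of simulated experiments in which a real linear growth of the
    variable is missed by a two-tailed t-test on per-cell mean values."""
    misses = 0
    for _ in range(n_runs):
        ctrl = [rng.choice(pool, spines_per_cell, replace=False).mean()
                for _ in range(n_cells)]
        trt = [(1.0 + growth) * rng.choice(pool, spines_per_cell, replace=False).mean()
               for _ in range(n_cells)]
        if stats.ttest_ind(ctrl, trt).pvalue >= alpha:
            misses += 1
    return misses / n_runs

def minimal_cells(rng, pool, spines_per_cell, growth, alpha,
                  target=0.05, n_runs=500, max_cells=300):
    """Smallest number of cells per group whose estimated false negative
    rate falls below `target`; None if `max_cells` is not enough."""
    for n_cells in range(2, max_cells + 1):
        if false_negative_rate(rng, pool, n_cells, spines_per_cell,
                               growth, alpha, n_runs) < target:
            return n_cells
    return None

rng = np.random.default_rng(0)
pool = rng.lognormal(mean=-0.5, sigma=0.35, size=2499)  # hypothetical data
print(minimal_cells(rng, pool, spines_per_cell=60, growth=0.5, alpha=0.01))
```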

In Figure

Changes in Spine Subpopulations

Because spines vary widely in morphology and biochemical composition, a model in which every spine in the treatment group is enlarged may be unrealistic. More realistically, only a certain subpopulation of spines may be affected, or the size of the changes may depend on the morphological characteristics of the spine. Here we discuss two models of spine maturation in which the changes occur only in a subset of spines, and we determine whether these changes can be seen in averaged data. The first mimics a situation in which more mature spines appear at the cost of filopodia, shifting the spectrum of spines. It thus illustrates the hypothesis that filopodia are the precursors of dendritic spines

Two distinct dynamic processes

Changes could affect spine subpopulations in innumerable ways, and such changes modify the spine spectrum. We analyzed only exemplary cases to illustrate the qualitative features that arise. As an example of spine maturation, we simulated the case in which the control group (measuring spine lengths) was compared with a test group in which there was a 50% probability of eliminating each spine with a length greater than 2
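The elimination step of this maturation model can be written in a few lines. In the Python sketch below, the function name and the synthetic lognormal lengths are hypothetical, the threshold of 2 is taken from the text and assumed to be in the same units as the data, and only the elimination is modelled; any other aspect of the simulated maturation is not represented.

```python
import numpy as np

def eliminate_long_spines(lengths, rng, threshold=2.0, p_eliminate=0.5):
    """Remove each spine longer than `threshold` independently with
    probability `p_eliminate`; shorter spines are always kept."""
    lengths = np.asarray(lengths, dtype=float)
    is_long = lengths > threshold
    drop = is_long & (rng.random(lengths.shape) < p_eliminate)
    return lengths[~drop]

rng = np.random.default_rng(0)
control = rng.lognormal(mean=0.0, sigma=0.5, size=2000)  # hypothetical lengths
treated = eliminate_long_spines(control, rng)
# The mean drops because part of the long tail is removed.
print(control.size, treated.size, control.mean() > treated.mean())
```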

The general difference between the simulated models of changes in spine subpopulations (maturation of spines at the cost of filopodia versus the growth of small spines) is shown in (Figure [...] μm^{2}). Similar behaviour was observed with other settings.

**False positive rates (probabilities of committing Type I Error) when two distinct populations of spines are compared directly by means of the two-sided Kolmogorov-Smirnov test.** The variations of the mean values (per sample) result not only from sampling diversity but also from systematic factors (specific to each studied sample) affecting the measured parameter; see Results for details. An ensemble of 10000 simulation runs was used to calculate the probabilities. The p-value was set at 0.001 (blue solid line). We have used

False positive results

To analyse false positive results, two control groups were created in an identical way and then compared. In contrast to the false negative rate discussed above, which depended on many factors, the false positive rate is determined by the significance level of the test (i.e., the p-value). We evaluated whether the test p-value coincided with the actual false positive rate. These values need not coincide: for example, the t-test requires certain conditions, such as normality of the distribution and homogeneity of variance, to be satisfied. For the parameters shown in Table
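This check can be sketched in Python: two control groups are built identically from a hypothetical lognormal pool, compared many times, and the fraction of rejections is taken as the empirical false positive rate. As an assumption beyond the text, the sketch also reports Welch's variant of the t-test (`equal_var=False` in `scipy.stats.ttest_ind`), which drops the equal-variance requirement; since per-cell means of 60 spines are close to normal here, both rates should stay near the nominal p-value.

```python
import numpy as np
from scipy import stats

def false_positive_rates(rng, pool, n_cells=10, spines_per_cell=60,
                         alpha=0.01, n_runs=2000):
    """Compare two identically built control groups; every rejection of H0
    is a Type I error. Returns (Student's rate, Welch's rate)."""
    student = welch = 0
    for _ in range(n_runs):
        a = [rng.choice(pool, spines_per_cell, replace=False).mean()
             for _ in range(n_cells)]
        b = [rng.choice(pool, spines_per_cell, replace=False).mean()
             for _ in range(n_cells)]
        student += int(stats.ttest_ind(a, b).pvalue < alpha)
        welch += int(stats.ttest_ind(a, b, equal_var=False).pvalue < alpha)
    return student / n_runs, welch / n_runs

rng = np.random.default_rng(0)
pool = rng.lognormal(mean=-0.5, sigma=0.35, size=2499)  # hypothetical data
print(false_positive_rates(rng, pool, n_runs=1000))
```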

However, with the Kolmogorov-Smirnov test there may be a discrepancy between the actual false positive rate and the p-value of the test. This can occur because the null hypothesis of the Kolmogorov-Smirnov test assumes that the spines were drawn from the same distribution, but it does not account for systematic errors affecting spine morphology in the particular animals, cells, or dendrites included in the study. Systematic errors might originate from various factors, including (a) differences in sample preparation, (b) individual features of the selected animals or cells, and (c) differences in spine morphology due to the distance from the soma, etc.

An important feature of the Kolmogorov-Smirnov test is that it does not take into account from which sample within the group a particular spine originates. Therefore, a positive outcome of the K-S test means that the two populations of spines are unlikely to have originated from the same distribution, but it does not mean that the positive outcome was caused by a systematic influence on spines in the treatment group. The difference between populations could have been caused by abnormal spine morphology in even a single animal.

In our simulations we analyzed the outcomes of the Kolmogorov-Smirnov test when the parameters of each spine were subjected to an additional systematic perturbation specific to its sample. Such perturbations were modeled by drawing, for each sample, a factor from a Gaussian distribution with expectation value 1.0, while the variance
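The effect of such sample-specific perturbations on the Kolmogorov-Smirnov test can be sketched as follows. The sketch assumes that the factor multiplies the variable of every spine in a sample and that two control groups are compared with `scipy.stats.ks_2samp` after pooling their spines; the pool is hypothetical lognormal data and `sigma` is left as a free parameter. Note that NumPy's `normal` takes the standard deviation, i.e. the square root of the variance.

```python
import numpy as np
from scipy import stats

def ks_false_positive_rate(rng, pool, sigma, n_cells=10, spines_per_cell=60,
                           alpha=0.001, n_runs=1000):
    """Type I error rate of a pooled two-sample K-S test when every sample
    (cell) carries its own multiplicative factor drawn from N(1, sigma^2)."""
    hits = 0
    for _ in range(n_runs):
        pooled = []
        for _group in range(2):  # two control groups, built identically
            factors = rng.normal(loc=1.0, scale=sigma, size=n_cells)
            spines = [f * rng.choice(pool, spines_per_cell, replace=False)
                      for f in factors]
            pooled.append(np.concatenate(spines))
        if stats.ks_2samp(pooled[0], pooled[1]).pvalue < alpha:
            hits += 1
    return hits / n_runs

rng = np.random.default_rng(0)
pool = rng.lognormal(mean=-0.5, sigma=0.35, size=2499)  # hypothetical data
for sigma in (0.0, 0.1, 0.2):
    print(sigma, ks_false_positive_rate(rng, pool, sigma, n_runs=200))
```

With `sigma=0` the rate stays at or below the nominal p-value; as `sigma` grows, the pooled distributions of two control groups differ more often, and the K-S test reports differences that do not exist.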

**The detectability of changes in spine subpopulations may strongly depend on the statistical test used.** For example, the growth of many small spines is detected much more easily with the K-S test than with the t-test. For the maturation of filopodia the situation was the opposite

Comparison of the Number of Spines in Subclasses

Another way of detecting changes in dendritic spine morphology is to divide the studied population into discrete subclasses and to compare the percentage of spines in each subclass between the test groups. There is no standard classification, and different researchers may use different criteria. The division into subclasses may be based on an absolute criterion (i.e., whether some spine variable is below or above a certain threshold) or on a relative criterion (i.e., the ratio of two spine variables). A popular criterion divides the population into thin, mushroom and stubby spines and filopodia

Another possibility is to threshold a single spine variable, creating only two subclasses. In this setting, the sensitivity of the test was studied in the following model: spines were classified as “large” or “small” depending on their cross-sectional area, with a threshold value of 0.65
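The loss of sensitivity caused by such a two-class division can be illustrated with a Python sketch. It assumes hypothetical lognormal cross-sectional areas, the threshold of 0.65 in the same units as the data, and a two-tailed t-test applied to the per-cell fraction of “large” spines; the test actually used for comparing subclass percentages is not specified here. Both tests are applied to the same simulated cells, so the two detection rates are directly comparable.

```python
import numpy as np
from scipy import stats

def detection_rates(rng, pool, threshold=0.65, n_cells=15, spines_per_cell=60,
                    growth=0.1, alpha=0.01, n_runs=500):
    """Detection rates of a linear growth of spine area by (a) a t-test on
    per-cell mean area and (b) a t-test on the per-cell fraction of spines
    with area above `threshold`, using the same simulated data for both."""
    direct = by_class = 0
    for _ in range(n_runs):
        ctrl = np.array([rng.choice(pool, spines_per_cell, replace=False)
                         for _ in range(n_cells)])
        trt = (1.0 + growth) * np.array([rng.choice(pool, spines_per_cell, replace=False)
                                         for _ in range(n_cells)])
        direct += int(stats.ttest_ind(ctrl.mean(axis=1), trt.mean(axis=1)).pvalue < alpha)
        frac_ctrl = (ctrl > threshold).mean(axis=1)  # fraction of "large" spines
        frac_trt = (trt > threshold).mean(axis=1)
        by_class += int(stats.ttest_ind(frac_ctrl, frac_trt).pvalue < alpha)
    return direct / n_runs, by_class / n_runs

rng = np.random.default_rng(0)
pool = rng.lognormal(mean=-0.6, sigma=0.5, size=2499)  # hypothetical areas
print(detection_rates(rng, pool))
```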

**The direct morphometric measurements are much more sensitive (lower false negative rate) than the measurements based on comparing the number of spines in subclasses.**

Conclusions

We have described several issues in the quantitative analysis of spines that could interfere with the final conclusions drawn from a study. The results of the simulations set the minimal number of samples and spines that have to be analyzed to achieve an assumed false negative rate. The tabulated results might be helpful in optimizing the experimental setup. Specifically, we have observed that:

The simulation results show that systematic changes of the same magnitude are detected more easily when the head-width, rather than the spine length or the cross-sectional area, is studied. Indeed, most of the positive results reported concern changes in head-width. This may also partly reflect the fact that head-width is the variable studied by most researchers interested in the morphology of dendritic spines, owing to known correlations that link it to (a) the postsynaptic density and (b) spine stability or spine head enlargement after various forms of stimulation

For large changes (i.e., changes greater than 50%) in any of the parameters studied, differences between the groups can be detected easily because the false negative rate decays quickly (roughly exponentially) with the number of animals or samples. However, for more subtle changes (i.e., changes smaller than 10%) the decay is very slow (Figure

In the situations that we studied, the t-test was slightly more sensitive than the u-test (when the datum is the average value over the spines belonging to a specific animal or sample). It should be noted that the results of the Wilcoxon-Mann-Whitney test differed among software packages, as reported in
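A comparison of this kind, with the per-cell mean as the datum, can be sketched in Python with `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu`. The data are hypothetical lognormal values and the parameters are illustrative. The sketch also computes the smallest two-sided p-value the exact u-test can reach, 2/C(n1+n2, n1), attained when the two groups are completely separated: with 5 cells per group it is 2/252 ≈ 0.008, so at a p-value of 0.001 the exact test cannot reject at all, however large the effect.

```python
import math
import numpy as np
from scipy import stats

def min_exact_u_pvalue(n1, n2):
    """Smallest two-sided p-value of the exact Mann-Whitney U test (no ties)."""
    return 2.0 / math.comb(n1 + n2, n1)

def t_vs_u_detection(rng, pool, n_cells=8, spines_per_cell=60,
                     growth=0.2, alpha=0.01, n_runs=500):
    """Detection rates of the t-test and the u-test on per-cell mean values."""
    t_hits = u_hits = 0
    for _ in range(n_runs):
        ctrl = [rng.choice(pool, spines_per_cell, replace=False).mean()
                for _ in range(n_cells)]
        trt = [(1.0 + growth) * rng.choice(pool, spines_per_cell, replace=False).mean()
               for _ in range(n_cells)]
        t_hits += int(stats.ttest_ind(ctrl, trt).pvalue < alpha)
        u_hits += int(stats.mannwhitneyu(ctrl, trt, alternative="two-sided").pvalue < alpha)
    return t_hits / n_runs, u_hits / n_runs

rng = np.random.default_rng(0)
pool = rng.lognormal(mean=-0.5, sigma=0.35, size=2499)  # hypothetical data
print(min_exact_u_pvalue(5, 5))
print(t_vs_u_detection(rng, pool, growth=0.1))
```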

Studies of changes occurring in spine populations reveal that two different situations may exist: (1) changes that affect filopodia, which lie in the tail of the spine length distribution, are more easily found by comparing mean values than by comparing distributions; (2) in contrast, changes that occur in small spines are more easily found by comparing distributions, and would be “buried in the noise” if only average values were evaluated. These examples might represent two general cases: (1) changes in a small number of spines for which the values of some describing parameter are large, and (2) changes in a numerous population whose measured values make small contributions to the mean.

Detecting differences between groups by means of the Kolmogorov-Smirnov test (or any other test that compares only the shapes of distributions) could lead to underestimation of the false positive rate and to the false conclusion that there are significant differences between the groups when there are none (i.e., a Type I error). This is due to contributions to the variation of morphological parameters that are specific to cells, samples or animals and do not originate from the sampling process. The magnitude of these contributions is difficult to estimate and may depend on several experimental factors. The simulation results show that if this magnitude exceeds a certain value, the actual false positive rate is much higher than the assumed p-value. This problem concerns changes in subpopulations, which may be detected using a test that probes the distributions but might not be detected by comparing mean values for each animal. In such cases, the claimed result should be confirmed independently. One possible confirmation could be obtained by studying, for each animal, the fraction of spines in the subpopulation indicated by the comparison of distributions.

It has been customary to classify spines into subpopulations such as stubby, mushroom, thin, and filopodia. However, whether there is an actual distinction between subpopulations or we observe a continuum of shapes

Starting from 1995 (especially in the studies of long term potentiation)

Although there is no straightforward prescription for the optimal method and size of the spine population to be analyzed, special attention should be paid to understanding the origin of, and estimating, the false negative and false positive rates of the statistics performed. Misapplication may lead to a high rate of non-repeatability and to unwarranted conclusions.

Endnotes

^{a}The comparative analysis of the morphology of dendritic spines was developed starting from the quantification of electron-microscopy section images. The shape parameter usually taken into consideration is the cross-sectional area. In single-section estimates, each spine is observed only through an arbitrary cross-section or projection. The measured quantities depend significantly on the way the section cuts the spine, which introduces a large, uncontrollable source of variation. When high-resolution optical microscopes and three-dimensional reconstruction of serial-sectioning electron microscopy (SSEM) were introduced, it became possible to use these techniques to quantify the parameters. The three-dimensional reconstruction of SSEM images is very labor-intensive

^{b}There was a study

^{c}There was a study

^{d}Kurtosis is one of the measures of how a given distribution differs from the Gaussian function. Kurtosis is based on the fourth moment of the population and vanishes for the Gaussian function. The largest values of kurtosis were found for spine length (7.96) and for area (7.38). For spine head-width, kurtosis had a much smaller value (2.02). These observations do not exactly coincide with the observed fact that changes in spine area are the most difficult to detect. However, other details of the distribution, such as higher moments, could also be important.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

BR and JW conceived of the study. BR developed the algorithms and performed the simulations. JW designed the imaging experiment. ZS and MB performed the experiments and analyzed the data, KK contributed the data. BR, GW, LK and JW wrote the original manuscript. GW and LK commented on the manuscript with important intellectual contributions. All authors read and approved the final manuscript.

Acknowledgements

We thank Ben Warhurst for reviewing the manuscript, Marcin Wawrzyniak and Piotr Michaluk for their critical remarks. The work was supported by ERA-NET NEURON MODDIFSYN (B.R., J.W. and L.K.), National Science Centre Dec-2011/01/D/NZ3/00163 (J.W.) and POIG-008 grant (G.W.).