Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, 76019, USA

Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA

Abstract

Background

Biological networks offer us a new way to investigate the interactions among different components and address the biological system as a whole. In this paper, a reverse-phase protein microarray (RPPM) is used for the quantitative measurement of proteomic responses.

Results

To discover the signaling pathway responsive to RPPM, a new structure learning algorithm for Bayesian networks is developed based on mutual information, conditional independence, and graph immorality. Trusted biological networks are thus predicted by the new approach. As an application example, we investigate signaling networks of ataxia telangiectasia mutated (ATM). The study was carried out at different time points under different dosages for cell lines with and without gene transfection. To validate the performance of the proposed algorithm, comparison experiments were also carried out using three well-known networks. In these experiments, our approach produces more reliable networks with a relatively small number of wrong connections, especially in mid-size networks. Using the proposed method, we predicted different networks for ATM under different doses of radiation treatment, and those networks were compared with results from eight different protein-protein interaction (PPI) databases.

Conclusions

By using a new protein microarray technology in combination with a new computational framework, we demonstrate an application of the methodology to the study of biological networks of ATM cell lines under low-dose ionizing radiation.

Background

Bayesian networks are widely applied to a variety of domains such as business, engineering, and medicine.

To perform efficient inference and represent the dependency relationships correctly, an optimal structure is constructed to maximize the probabilistic fit to the given data. Determining the optimal network by learning the structure of Bayesian networks has been explored over the last decade, including the development of searching and scoring schemes. The search aims to find the structure that has the highest score among all possible ones. Since the search space grows exponentially with the number of variables (nodes), the problem is known to be NP-hard.

Until now, several scoring functions have been developed, including the well-known Cooper-Herskovits scoring function used in the K2 algorithm.

The goal of this study is to infer the proteomic signaling pathways affected by DNA damage, DNA repair, cell cycle checkpoints, and cell apoptosis under the influence of different radiation dosages. An emerging protein microarray technology, called the reverse-phase protein microarray (RPPM), in conjunction with quantum dot (Qdot) nanotechnology, is used as the detection system. We study the proteomic responses at different time points (1 h, 6 h, 24 h, 48 h, and 72 h) under different dosages (4 cGy, 10 cGy, 50 cGy, 1 Gy, and 5 Gy).

To infer the signaling pathways under different radiation dosages, in this paper we propose a new Bayesian network structure learning algorithm that uses mutual information, conditional independence, and the immorality property of graphs. Our method has two important features. First, the algorithm does not provide a direction for every edge in a predicted network. Since a signaling pathway is composed of successive and oriented interactions of molecules, even a small number of edges with incorrect directions can have a significant effect on biological network analysis. To avoid misleading results, we therefore aim to report the most trusted edges, even though a completely directed graph is not produced. Second, we focus on reducing wrong edges, even at the price of missing some edges. In other words, reliable though incomplete information is reported, as opposed to complete but uncertain information. To achieve these two goals, we initially exclude edges with low mutual information, and then strictly carry out a conditional independence test and an immorality test for each candidate edge in order to remove incorrect edges. In the following sections, we first introduce the main steps of the proposed methodology. Then we use well-known standard networks to evaluate the performance of the algorithm. Finally, proteomic networks for ATM cell lines under different radiation dosages are presented.

Methods

Bayesian networks and MDL scoring function

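Consider a finite set $X = \{X_1, X_2, \ldots, X_n\}$ of discrete random variables. A Bayesian network $G$ over these $n$ variables is a directed acyclic graph in which each node represents a variable $X_i$, and it encodes the joint probability distribution

$$P_G(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid Pa(X_i)\big), \qquad (1)$$

where $Pa(X_i)$ denotes the set of parents of $X_i$ in $G$.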
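As a small illustration of how a joint probability factorizes over a Bayesian network, the sketch below evaluates the joint probability of one full assignment from per-node conditional probability tables. The three-node network, its variable names, and all probability values are hypothetical and are not taken from this study.

```python
# Hypothetical three-node network A -> C <- B with binary variables.
# parents[node] lists the parents; cpt[node] maps parent values -> P(node = 1).
parents = {"A": (), "B": (), "C": ("A", "B")}
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9},
}


def joint_probability(assignment):
    """Multiply P(X_i | Pa(X_i)) over all nodes for one full assignment."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob
```

Summing `joint_probability` over all eight assignments of the toy network gives 1, which is a quick sanity check that the tables define a valid distribution.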

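Therefore, once we know the structure of a Bayesian network and the conditional probabilities of each node, we will know the joint probability distribution. The objective of this study is to infer the biological structure $G$ that best fits the given data $D$. Candidate structures are scored with the minimum description length (MDL) function

$$\mathrm{MDL}(G \mid D) = \frac{\log N}{2}\,|G| - LL(G \mid D) \qquad (2)$$

and

$$LL(G \mid D) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} N_{ijk} \log \frac{N_{ijk}}{N_{ij}}, \qquad (3)$$

where $N$ is the number of samples, $|G| = \sum_i (r_i - 1)\,q_i$ is the number of free parameters, $r_i$ is the number of states of $X_i$, $q_i$ is the number of configurations of the parents $Pa(X_i)$, $N_{ijk}$ is the number of samples in which $X_i$ takes its $k$-th value $x_{ik}$ while $Pa(X_i)$ takes its $j$-th configuration, and $N_{ij} = \sum_k N_{ijk}$.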
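The sketch below shows how an MDL-type score can be computed from counts for discrete data. It assumes one common form of the score, (log N / 2) times the number of free parameters minus the log-likelihood, so that lower values are better. The function name `mdl_score`, the data layout, and the toy data in the usage note are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def mdl_score(data, parents, arity):
    """MDL score of a structure for discrete data (lower is better).

    data    : list of dicts mapping variable name -> discrete value
    parents : dict mapping variable name -> tuple of parent names
    arity   : dict mapping variable name -> number of states r_i
    """
    n_samples = len(data)
    log_lik = 0.0
    n_params = 0
    for node, pa in parents.items():
        # N_ijk: count of (parent configuration j, node state k); N_ij: count of j.
        n_ijk = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        n_ij = Counter(tuple(row[p] for p in pa) for row in data)
        for (j, _k), count in n_ijk.items():
            log_lik += count * math.log(count / n_ij[j])
        # Free parameters: (r_i - 1) for each possible parent configuration.
        q_i = math.prod(arity[p] for p in pa)
        n_params += (arity[node] - 1) * q_i
    return 0.5 * math.log(n_samples) * n_params - log_lik
```

For toy data in which `Y` always copies `X`, the structure with the edge X -> Y receives a lower score than the empty structure, because the gain in log-likelihood outweighs the penalty for the extra parameter.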

Mutual information and conditional independence

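In our proposed algorithm, mutual information (MI) is used to decide which edges are more significant than others. More precisely, we decide the connection and orientation of the edges sequentially, in order of their MI. The mutual information between random variables $X$ and $Y$ is defined as follows:

$$I(X; Y) = \sum_{i} \sum_{j} p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(x_i)\,p(y_j)}, \qquad (4)$$

where $p(x_i, y_j)$ is the joint probability that $X$ takes its value $x_i$ and $Y$ takes its value $y_j$, and $p(x_i)$ and $p(y_j)$ are the corresponding marginal probabilities.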
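The standard definition of mutual information for discrete variables can be estimated directly from sample counts, as in the following sketch. The function name and the use of natural logarithms are choices made here for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    n_xy = Counter(zip(xs, ys))
    n_x = Counter(xs)
    n_y = Counter(ys)
    mi = 0.0
    for (x, y), c in n_xy.items():
        # p(x, y) * log( p(x, y) / (p(x) p(y)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (n_x[x] * n_y[y]))
    return mi
```

Two identical balanced binary sequences give log 2, the maximum for binary variables, while two sequences whose value pairs are uniformly spread give 0.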

In our algorithm, conditional independence (CI) is also used to find which edge is incorrect in a triangular structure. CI is defined as follows:

$X_i$ and $X_j$ are conditionally independent given $X_k$ if

$$P(X_i, X_j \mid X_k) = P(X_i \mid X_k)\,P(X_j \mid X_k). \qquad (5)$$

Therefore, whenever the edge under consideration closes a new triangle, we can test (5) for all three edges of the triangle. Based on the result of the CI test, we can update the network.
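The text does not state how (5) is tested on finite data. One simple option, assumed here purely for illustration, is to estimate the conditional mutual information I(X; Y | Z) from counts and treat the pair as conditionally independent when it falls below a small threshold. The threshold `eps` and both function names are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X; Y | Z) in nats for three discrete sequences."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), c in n_xyz.items():
        # p(x, y, z) * log( p(x, y | z) / (p(x | z) p(y | z)) ) in terms of counts.
        cmi += (c / n) * math.log(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
    return cmi


def is_conditionally_independent(xs, ys, zs, eps=0.01):
    """Treat X and Y as independent given Z when I(X; Y | Z) < eps (assumed threshold)."""
    return conditional_mutual_information(xs, ys, zs) < eps
```

In a triangle over three nodes, such a test would be applied to each pair of nodes conditioned on the third.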

Property of equivalence class and immorality

In searching and scoring scheme for learning structure of Bayesian networks,

Structures with three nodes and two edges

Algorithm

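The proposed algorithm starts from an empty network in which there is no edge between nodes. We calculate the MI between the two end nodes of every possible edge, and the edges whose MI is less than a threshold are excluded. The remaining candidate edges are then considered one at a time in descending order of MI, and each of them falls into one of three cases.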

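The first case is when the current edge closes a triangle with two edges that are already in the network. In this case, the CI test (5) is carried out for all three edges of the triangle, and the network is updated according to the result.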

The second case is when the current edge creates a cycle in the graph. Because a Bayesian network is an acyclic graph, the cycle must then contain at least one immorality. Since most edges with relatively low MI create cycles and are added after the correct network has already been constructed, we avoid wrong edges by applying an immorality test between the current edge and the edges linked to it. If there is no immorality, we do not use the current edge. As an example, given a structure with four nodes,

The third case covers all situations other than the two cases above. Here, we connect the current edge to the other edges without any test other than the MI test for the orientations of the edges. The pseudo code for our proposed algorithm is outlined in Figure
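The control flow of the three cases can be outlined as follows. This is a simplified sketch, not the authors' pseudo code: the function names, the data layout, and the placeholder callables `ci_test` and `immorality_test` are hypothetical, edge orientation is omitted, and a failed CI test here simply skips the current edge instead of re-examining all three edges of the triangle.

```python
def classify_edge(adj, u, v):
    """Return which of the three cases the candidate edge (u, v) falls into."""
    if adj[u] & adj[v]:
        return "triangle"          # shared neighbour: the edge closes a triangle
    # Depth-first search: is v already reachable from u in the current skeleton?
    stack, seen = [u], {u}
    while stack:
        node = stack.pop()
        if node == v:
            return "cycle"         # the edge would close a longer cycle
        for nxt in adj[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return "other"


def build_skeleton(mi, threshold, ci_test, immorality_test):
    """Add candidate edges in descending MI order, testing the first two cases.

    mi maps frozenset({u, v}) -> mutual information. ci_test and
    immorality_test are placeholder callables (adj, u, v) -> bool that
    return True when the edge should be kept.
    """
    nodes = set().union(*mi) if mi else set()
    adj = {n: set() for n in nodes}
    # Drop low-MI edges, then process the rest from highest to lowest MI.
    candidates = sorted((e for e in mi if mi[e] >= threshold),
                        key=lambda e: mi[e], reverse=True)
    for edge in candidates:
        u, v = sorted(edge)
        case = classify_edge(adj, u, v)
        if case == "triangle" and not ci_test(adj, u, v):
            continue
        if case == "cycle" and not immorality_test(adj, u, v):
            continue
        adj[u].add(v)
        adj[v].add(u)
    return adj
```

Processing edges from highest to lowest MI means that the strongest dependencies form the initial skeleton, so later and weaker edges are the ones subjected to the triangle and cycle tests.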

Pseudo code for suggested algorithm

Results and discussion

Algorithm evaluation

To evaluate the algorithm, we adopted three well-known networks: ASIA, CAR DIAGNOSIS2, and ALARM.

ASIA and CAR DIAGNOSIS2 networks. ASIA network has 8 nodes and 8 edges (A), and CAR DIAGNOSIS2 network has 18 nodes and 20 edges (B).

ALARM network. ALARM network has 37 nodes and 46 edges.

Results of structure learning for known networks

The ASIA network has 8 nodes and 8 edges. Since it is a small graph, the network predicted by our method does not contain any wrong edges (WE), and the number of missing edges (ME) is only 0.1 on average, as shown in Table

Result for the Asia network

| Method | ME | WOE | WE |
|---|---|---|---|
| Our Method | 0.1 | n/a | 0 |
| Hill-Climbing | 2.2 | 0.8 | 4.8 |
| K2 | 1 | 3.45 | 4.8 |

Result for the Car Diagnosis2 network

| Method | ME | WOE | WE |
|---|---|---|---|
| Our Method | 2 | n/a | 0.8 |
| Hill-Climbing | 2.35 | 5.9 | 8.4 |
| K2 | 1.4 | 9.4 | 16.3 |

Result for the Alarm network

| Method | ME | WOE | WE |
|---|---|---|---|
| Our Method | 6.05 | n/a | 3.85 |
| Hill-Climbing | 1.55 | 9.75 | 9.4 |
| K2 | 2.05 | 22.5 | 53.75 |

Result for Trustworthy Network

| Network | NETN | CETN | ACCURACY |
|---|---|---|---|
| ASIA | 4 | 4 | 100% |
| CAR DIAGNOSIS2 | 13.8 | 13 | 94% |
| ALARM | 29.7 | 26.8 | 90% |
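Error counts of the kind reported in the tables above can be obtained by comparing a predicted edge set with the reference network. The sketch below assumes that ME denotes missing edges, WE denotes wrong (extra) edges, and WOE denotes edges whose orientation is wrong; these expansions, the function name, and the toy graphs in the usage note are assumptions made for illustration.

```python
def edge_errors(true_edges, predicted_edges):
    """Count missing, wrongly oriented, and wrong edges of a directed prediction.

    Both arguments are sets of (source, target) tuples.
    """
    true_skel = {frozenset(e) for e in true_edges}
    pred_skel = {frozenset(e) for e in predicted_edges}
    missing = len(true_skel - pred_skel)      # ME: in the reference, not predicted
    wrong = len(pred_skel - true_skel)        # WE: predicted, not in the reference
    wrong_orientation = sum(                  # WOE: correct pair, reversed direction
        1 for (u, v) in predicted_edges
        if frozenset((u, v)) in true_skel and (u, v) not in true_edges
    )
    return {"ME": missing, "WOE": wrong_orientation, "WE": wrong}
```

For a reference network A -> B -> C -> D and a prediction B -> A, B -> C, A -> D, the function reports one missing edge (C-D), one wrong edge (A-D), and one edge with the wrong orientation (A-B).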

Learning structure of pathway in ATM cell

We applied the quantum dot reverse-phase protein microarray to profile the dynamic responses of several signaling pathways, including DNA damage, DNA repair, and cell cycle checkpoints, under ionizing radiation (IR). Ataxia telangiectasia mutated (ATM)-deficient (ATM-) and -proficient (transfected with a full-length ATM construct, ATM+) cells were treated with different doses of IR, and cell lysates were collected at different time points, serially diluted, and spotted on an array in triplicate. The arrays were then probed with specific antibodies. The intensities of all antibodies were normalized relative to those of the control and rescaled to values from zero to one. Sixty-seven antibodies were evaluated for the dynamic change of the network. The complete list of the antibodies is shown in Table

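67 antibodies used in the reverse-phase protein array for ATM radiation study

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| mTOR | b-catenin | Chk1 | E-Cad | MDM2 | p38 | p-p38 | pChk2 |
| **8** | **9** | **10** | **11** | **12** | **13** | **14** | **15** |
| pATM | Rb | pRb | Raf-1 | p-Src | PTEN | STAT3 | Caspase8 |
| **16** | **17** | **18** | **19** | **20** | **21** | **22** | **23** |
| IGF1-R | IRS-1 | GSK3ab | pGSK3ab | pMDM2 | pSTAT3 | AKT | pAKT |
| **24** | **25** | **26** | **27** | **28** | **29** | **30** | **31** |
| Caspase3 | DNAPK | pDNAPK | EGFR | pEGFR | NFkBp65 | pNFkB | NQO1 |
| **32** | **33** | **34** | **35** | **36** | **37** | **38** | **39** |
| p21 | p27 | p-PTEN | pRaf1 | Bcl-2 | pBcl-2 | Caspase9 | cdk4 |
| **40** | **41** | **42** | **43** | **44** | **45** | **46** | **47** |
| pErk | IkBa | pIkBa | JNK | Klotho | p16 | p53 | p-p53 |
| **48** | **49** | **50** | **51** | **52** | **53** | **54** | **55** |
| Smad3 | Src | Vimentin | sClu | ATM | Chk2 | Erk | HSP27 |
| **56** | **57** | **58** | **59** | **60** | **61** | **62** | **63** |
| IGFBP | pChk1 | pDNAPK | gH2AX | pIGF1-R(y1158.62.63) | pIGF1-R(y1162.63) | pIRS(Y896) | pIRS(Y1179) |
| **64** | **65** | **66** | | | | | |
| pJNK | p-mTOR | pSmad3 | | | | | |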

The expression data are normalized with respect to the actin concentration on each microarray chip. The expression level of each antibody is discretized into 2 to 4 values using minimum-entropy-based discretization. For each IR dose, we have a total of 30 samples for ATM+ and ATM- from the triplicates at different times. Among the 67 antibodies involved in the RPPM data, we select those that best distinguish ATM+ from ATM- using a feature selection method developed in our earlier work
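The exact discretization procedure is not detailed here. As a rough illustration of the idea behind entropy-based discretization, the sketch below finds a single cut point that minimizes the weighted class entropy of the two resulting bins, using class labels such as ATM+ and ATM-. The supervised binary formulation, the function names, and the toy values are assumptions; obtaining 3 or 4 levels would require applying the split recursively with a stopping rule.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (nats) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return the threshold that minimises the weighted class entropy of a binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_entropy = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * class_entropy(left)
                    + len(right) * class_entropy(right)) / n
        if weighted < best_entropy:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_entropy = weighted
    return best_cut
```

For values that separate the two classes perfectly, the returned cut lies midway between the largest value of one class and the smallest value of the other.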

Figures

Signal networks under the dosages of 4cGy (A), 10cGy (B), and 50cGy (C).

Signal networks under the dosages of 1Gy (A) and 5Gy (B).

Conclusions

Understanding the proteomic network structure reveals the inherent biological information flow, which can lead to more effective therapies and disease treatments. In this paper, by using a new protein microarray technology in combination with a new computational framework, we demonstrate an application of the methodology to the study of biological networks of ATM cells under ionizing radiation. Different networks were found for different radiation doses. The same technology can be extended to other biological problems. In future work, we intend to validate our findings by carrying out biological experiments.

Authors' contributions

Dongchul Kim and Jean Gao contributed the computational algorithm design and the manuscript writing. Xiaoyu Wang carried out the biological experiments for the RPPM data generation. Chin-Rang Yang was responsible for the overall project layout and direction.

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

This research was supported by the Department of Energy under Grant No. DEFG02-07ER64335.

This article has been published as part of