Web document 7.9. Output of the TREE-PUZZLE program for 13 globin proteins.

 

TREE-PUZZLE 5.2

 

Input file name: 13globins.phy

Type of analysis: tree reconstruction

Parameter estimation: approximate (faster)

Parameter estimation uses: neighbor-joining tree (for substitution process and rate variation)

 

Standard errors (S.E.) are obtained by the curvature method.

The upper and lower bounds of an approximate 95% confidence interval

for parameter or branch length x are x-1.96*S.E. and x+1.96*S.E.
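
As an illustration of the interval formula above, here is a minimal Python sketch. The function name confidence_interval is hypothetical and not part of TREE-PUZZLE; the two input pairs are copied from the branch-length table later in this output.

```python
def confidence_interval(x, se, z=1.96):
    """Return the (lower, upper) bounds x - z*S.E. and x + z*S.E."""
    return x - z * se, x + z * se


# Internal branch 22 in the branch-length table: length 1.28945, S.E. 0.20094
lower22, upper22 = confidence_interval(1.28945, 0.20094)
print(f"branch 22: {lower22:.5f} to {upper22:.5f}")

# Internal branch 21 in the branch-length table: length 0.07578, S.E. 0.12611
lower21, upper21 = confidence_interval(0.07578, 0.12611)
print(f"branch 21: {lower21:.5f} to {upper21:.5f}")
```

For branch 21 the lower bound is negative, so the approximate interval includes zero; this is the same internal branch that receives the lowest support value (62) in the quartet puzzling tree.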

 

 

SEQUENCE ALIGNMENT

 

Input data: 13 sequences with 170 amino acid sites

Number of constant sites: 4 (= 2.4% of all sites)

Number of site patterns: 168

Number of constant site patterns: 3 (= 1.8% of all site patterns)

 

 

SUBSTITUTION PROCESS

 

Model of substitution: Dayhoff (Dayhoff et al. 1978)

Amino acid frequencies (estimated from data set):

 

 pi(A) =  12.0%

 pi(R) =   2.1%

 pi(N) =   2.8%

 pi(D) =   5.8%

 pi(C) =   0.6%

 pi(Q) =   2.4%

 pi(E) =   5.4%

 pi(G) =   6.9%

 pi(H) =   5.3%

 pi(I) =   3.8%

 pi(L) =  10.7%

 pi(K) =   9.6%

 pi(M) =   1.8%

 pi(F) =   5.7%

 pi(P) =   3.3%

 pi(S) =   6.4%

 pi(T) =   5.0%

 pi(W) =   1.2%

 pi(Y) =   1.9%

 pi(V) =   7.5%

 

 

AMBIGUOUS CHARACTERS IN THE SEQUENCE (SEQUENCES IN INPUT ORDER)

 

               gaps  wildcards        sum   % sequence

 mbkangaroo      16          0         16        9.41%  

 mbharbor_p      16          0         16        9.41%  

 mbgray_sea      16          0         16        9.41%  

 alphahorse      28          0         28       16.47%  

 alphakanga      29          0         29       17.06%  

 alphadog        29          0         29       17.06%  

 betadog         23          0         23       13.53%  

 betarabbit      23          0         23       13.53%  

 betakangar      24          0         24       14.12%  

 globinlamp      24          0         24       14.12%  

 globinseal      20          0         20       11.76%  

 globinsoyb      28          0         28       16.47%  

 globininse      19          0         19       11.18%  

 -------------------------------------------------------

 Sum            295          0        295       13.35%  

 

 

The table above shows the number of gaps ('-') and other 'wildcard'

characters ('X', '?', etc.) and their percentage of the 170 columns

in the alignment.

Sequences with more than 50% ambiguous characters are marked with a '!' and

should be checked for sufficient overlap with the other sequences.

Sequences with 100% ambiguous characters do not hold any phylogenetic

information and had to be discarded from the analysis.
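
The percentages in the table can be reproduced from the gap counts and the alignment length of 170 columns. The sketch below is only a check on the arithmetic; the names gap_counts and percent_ambiguous are hypothetical, and the counts are copied from the table.

```python
ALIGNMENT_LENGTH = 170  # columns in the alignment, from the report

# Gap counts per sequence, in input order (no wildcards were reported)
gap_counts = {
    "mbkangaroo": 16, "mbharbor_p": 16, "mbgray_sea": 16,
    "alphahorse": 28, "alphakanga": 29, "alphadog": 29,
    "betadog": 23, "betarabbit": 23, "betakangar": 24,
    "globinlamp": 24, "globinseal": 20, "globinsoyb": 28,
    "globininse": 19,
}


def percent_ambiguous(gaps, wildcards=0, n_columns=ALIGNMENT_LENGTH):
    """Percentage of alignment columns that are gaps or wildcards."""
    return 100.0 * (gaps + wildcards) / n_columns


for name, gaps in gap_counts.items():
    print(f"{name:<12}{gaps:>6}{percent_ambiguous(gaps):>9.2f}%")

total_gaps = sum(gap_counts.values())
overall = 100.0 * total_gaps / (len(gap_counts) * ALIGNMENT_LENGTH)
print(f"{'Sum':<12}{total_gaps:>6}{overall:>9.2f}%")
```

The overall figure divides the summed gaps by all 13 x 170 = 2210 alignment cells, which is why it differs from a plain sum of the per-sequence percentages.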

 

 

 

SEQUENCE COMPOSITION (SEQUENCES IN INPUT ORDER)

 

              5% chi-square test  p-value

 mbkangaroo        passed          26.80% 

 mbharbor_p        passed          46.46% 

 mbgray_sea        passed          48.39% 

 alphahorse        passed          59.30% 

 alphakanga        passed          92.71% 

 alphadog          passed          23.85% 

 betadog           passed          66.49% 

 betarabbit        passed          53.16% 

 betakangar        passed          82.28% 

 globinlamp        passed          30.85% 

 globinseal        passed          35.59% 

 globinsoyb        passed          62.07% 

 globininse        passed          25.92% 

 

The chi-square test compares the amino acid composition of each sequence

to the frequency distribution assumed in the maximum likelihood model.

 

WARNING: Result of chi-square test may not be valid because of small

maximum likelihood frequencies and short sequence length!

 

 

IDENTICAL SEQUENCES

 

The sequences in each of the following groups are all identical. To speed

up computation please remove all but one of each group from the data set.

 

 All sequences are unique.

 

 

MAXIMUM LIKELIHOOD DISTANCES

 

Maximum likelihood distances are computed using the selected model of

substitution and rate heterogeneity.

 

  13

mbkangaroo  0.00000  0.21902  0.20052  2.17425  2.12014  2.14530  2.00001

            2.11096  2.14115  2.14558  2.10391  3.52637  2.48052

mbharbor_p  0.21902  0.00000  0.12093  2.07576  2.14874  2.02229  2.00365

            2.14664  2.17650  2.32790  2.20266  3.57537  2.61353

mbgray_sea  0.20052  0.12093  0.00000  2.17915  2.16199  2.05370  1.99534

            2.09580  2.17180  2.16355  2.10682  3.61475  2.64070

alphahorse  2.17425  2.07576  2.17915  0.00000  0.24644  0.21733  0.96849

            0.92659  1.11683  1.57220  1.47385  2.44582  2.80913

alphakanga  2.12014  2.14874  2.16199  0.24644  0.00000  0.29110  1.02345

            0.96551  1.06470  1.64357  1.57526  2.35081  2.97564

alphadog    2.14530  2.02229  2.05370  0.21733  0.29110  0.00000  1.04569

            0.97727  1.13342  1.53262  1.50049  2.45586  2.94082

betadog     2.00001  2.00365  1.99534  0.96849  1.02345  1.04569  0.00000

            0.15641  0.30774  1.66648  1.60807  2.59331  3.32967

betarabbit  2.11096  2.14664  2.09580  0.92659  0.96551  0.97727  0.15641

            0.00000  0.31486  1.69085  1.67968  2.45026  3.26310

betakangar  2.14115  2.17650  2.17180  1.11683  1.06470  1.13342  0.30774

            0.31486  0.00000  1.74675  1.68240  2.47610  3.33744

globinlamp  2.14558  2.32790  2.16355  1.57220  1.64357  1.53262  1.66648

            1.69085  1.74675  0.00000  0.09023  2.48297  2.50849

globinseal  2.10391  2.20266  2.10682  1.47385  1.57526  1.50049  1.60807

            1.67968  1.68240  0.09023  0.00000  2.62112  2.51371

globinsoyb  3.52637  3.57537  3.61475  2.44582  2.35081  2.45586  2.59331

            2.45026  2.47610  2.48297  2.62112  0.00000  2.51450

globininse  2.48052  2.61353  2.64070  2.80913  2.97564  2.94082  3.32967

            3.26310  3.33744  2.50849  2.51371  2.51450  0.00000

 

Average distance (over all possible pairs of sequences):  1.87503

                  minimum  : 0.09023,  maximum  : 3.61475

                  variance : 0.78451,  std.dev. : 0.88572
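
The summary figures are taken over all unordered pairs of sequences, that is, over the entries above the diagonal of the symmetric matrix (78 pairs for 13 sequences). The following Python sketch shows the idea on the 3 x 3 myoglobin corner of the matrix only, so its mean is not the 1.87503 reported for the full data set. The helper name pairwise_distances is hypothetical. The variance is left out because the report does not say whether a population or a sample variance is used.

```python
from itertools import combinations
from math import comb
from statistics import mean

names = ["mbkangaroo", "mbharbor_p", "mbgray_sea"]

# Upper-left 3 x 3 block of the maximum likelihood distance matrix
dist = [
    [0.00000, 0.21902, 0.20052],
    [0.21902, 0.00000, 0.12093],
    [0.20052, 0.12093, 0.00000],
]


def pairwise_distances(matrix):
    """Return the entries above the diagonal, one per unordered pair."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


values = pairwise_distances(dist)
print("pairs   :", len(values))
print("average :", round(mean(values), 5))
print("minimum :", min(values))
print("maximum :", max(values))

# Number of unordered pairs in the full 13-sequence matrix
pairs_full = comb(13, 2)
```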

 

 

RATE HETEROGENEITY

 

Model of rate heterogeneity: uniform rate

 

 

QUARTET STATISTICS (SEQUENCES IN INPUT ORDER)

 

 

 name       | resolved        | partly resolved | unresolved      | sum

 --------------------------------------------------------------------------

 mbkangaroo      194 [ 88.18%]       3 [  1.36%]      23 [ 10.45%]     220

 mbharbor_p      196 [ 89.09%]       1 [  0.45%]      23 [ 10.45%]     220

 mbgray_sea      195 [ 88.64%]       3 [  1.36%]      22 [ 10.00%]     220

 alphahorse      200 [ 90.91%]       1 [  0.45%]      19 [  8.64%]     220

 alphakanga      201 [ 91.36%]       1 [  0.45%]      18 [  8.18%]     220

 alphadog        199 [ 90.45%]       2 [  0.91%]      19 [  8.64%]     220

 betadog         199 [ 90.45%]       3 [  1.36%]      18 [  8.18%]     220

 betarabbit      199 [ 90.45%]       4 [  1.82%]      17 [  7.73%]     220

 betakangar      200 [ 90.91%]       4 [  1.82%]      16 [  7.27%]     220

 globinlamp      200 [ 90.91%]       0 [  0.00%]      20 [  9.09%]     220

 globinseal      201 [ 91.36%]       1 [  0.45%]      18 [  8.18%]     220

 globinsoyb      157 [ 71.36%]       1 [  0.45%]      62 [ 28.18%]     220

 globininse      219 [ 99.55%]       0 [  0.00%]       1 [  0.45%]     220

 --------------------------------------------------------------------------

  #quartets :    640 [ 89.51%]       6 [  0.84%]      69 [  9.65%]     715

 

The table shows the occurrences of fully resolved, partially, and

completely unresolved quartets for each sequence and their percentage

relative to the number of times the sequence occurs in the list of

quartets (i.e. 220 quartets out of 715 in total).

In a fully resolved quartet a single topology is supported, while for

partially resolved quartets two topologies are supported, and for completely

unresolved quartets none of the topologies (AB||CD, AC||BD, AD||BC) is favoured.

Note: Because 4 sequences are involved in each quartet, the per-sequence

numbers add up to four times the number of existing quartets.
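
The quartet counts quoted above follow from simple combinatorics, which the short Python check below illustrates. The variable names are hypothetical; the per-sequence counts are copied from the "resolved" column of the table.

```python
from math import comb

n_sequences = 13

# Every choice of 4 sequences out of 13 is one quartet
total_quartets = comb(n_sequences, 4)

# A given sequence is joined by any 3 of the remaining 12
quartets_per_sequence = comb(n_sequences - 1, 3)

print(total_quartets, quartets_per_sequence)

# Per-sequence counts of fully resolved quartets, in input order
resolved = [194, 196, 195, 200, 201, 199, 199, 199, 200, 200, 201, 157, 219]

# Each quartet is counted once for each of its 4 members
resolved_quartets = sum(resolved) // 4
print(resolved_quartets)
```

Dividing the column sum by four recovers the figure in the last row of the table, and 13 x 220 equals 4 x 715 for the same reason.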

 

Hint: The overall numbers in the last row give information about the

phylogenetic content of the dataset. The higher the percentage of partially

and unresolved quartets, the lower the content of phylogenetic information.

This can be visualized in more detail by likelihood mapping analysis.

 

 

 

TREE SEARCH

 

Quartet puzzling is used to choose from the possible tree topologies

and to simultaneously infer support values for internal branches.

 

Number of puzzling steps: 1000

Analysed quartets: 715

Fully resolved quartets:  640 (= 89.5%)

Partly resolved quartets: 6 (= 0.8%)

Unresolved quartets:      69 (= 9.7%)

 

Quartet trees are based on approximate maximum likelihood values

using the selected model of substitution and rate heterogeneity.

 

 

QUARTET PUZZLING TREE

 

Support for the internal branches of the unrooted quartet puzzling

tree topology is shown in percent.

 

This quartet puzzling tree is completely resolved.

 

 

                     :---globinsoyb

     :-------------95:            

     :               :---globininse

     :                            

     :               :---globinlamp

 :100:   :---------97:            

 :   :   :           :---globinseal

 :   :   :                        

 :   :   :           :---betadog  

 :   :-62:       :-89:            

 :       :   :100:   :---betarabbit

 :       :   :   :                

 :       :   :   :-------betakangar

 :       :-92:                    

 :           :       :---alphahorse

 :           :   :-76:            

 :           :100:   :---alphadog 

 :               :                

 :               :-------alphakanga

 :                                

 :                   :---mbharbor_p

 :-----------------97:            

 :                   :---mbgray_sea

 :                                

 :-----------------------mbkangaroo

 

 

Quartet puzzling tree (in CLUSTAL W notation):

 

(mbkangaroo,((globinsoyb,globininse)95,((globinlamp,globinseal)97,

(((betadog,betarabbit)89,betakangar)100,((alphahorse,alphadog)76,

alphakanga)100)92)62)100,(mbharbor_p,mbgray_sea)97);

 

 

BIPARTITIONS

 

The following bipartitions occurred at least once in all intermediate

trees that have been generated in the 1000 puzzling steps.

Bipartitions included in the quartet puzzling tree:

(bipartition with sequences in input order : number of times seen)

 

 ******...* ***  :  1000

 ***...**** ***  :  1000

 ***....... ...  :  999

 *********. .**  :  973

 *..******* ***  :  972

 ********** *..  :  953

 ***......* ***  :  915

 ******..** ***  :  892

 ***.*.**** ***  :  762

 ***....... .**  :  623
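
Each pattern above has one character per sequence in input order, with '*' and '.' marking the two sides of the split; the space after the tenth character is only a visual separator. The Python sketch below decodes a pattern and converts its count into a percentage. The function names are hypothetical, and rounding to the nearest percent is an assumption that happens to agree with the support values shown on the quartet puzzling tree.

```python
TAXA = [
    "mbkangaroo", "mbharbor_p", "mbgray_sea",
    "alphahorse", "alphakanga", "alphadog",
    "betadog", "betarabbit", "betakangar",
    "globinlamp", "globinseal", "globinsoyb", "globininse",
]


def split_sides(pattern, taxa=TAXA):
    """Return (star_side, dot_side) as two sets of sequence names."""
    marks = pattern.replace(" ", "")
    if len(marks) != len(taxa):
        raise ValueError("pattern length does not match number of taxa")
    stars = {t for t, m in zip(taxa, marks) if m == "*"}
    dots = set(taxa) - stars
    return stars, dots


def support_percent(times_seen, steps=1000):
    """Percentage of puzzling steps, rounded to the nearest integer
    (assumed rounding rule; integer arithmetic avoids float surprises)."""
    return (100 * times_seen + steps // 2) // steps


stars, dots = split_sides("*********. .**")
print(sorted(dots), support_percent(973))
```

Here the '.' side contains globinlamp and globinseal, the pair that carries the label 97 on the tree.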

 

Congruent bipartitions that occurred in 50% or less of the trees, not included in

the consensus tree:

(bipartition with sequences in input order : number of times seen)

 

 None (No congruent split not included)

 

Incongruent bipartitions not included in the consensus tree:

(bipartition with sequences in input order : number of times seen)

 

 ***..***** ***  :  226

 *********. ...  :  219

 ***......* *..  :  152

 *******..* ***  :  102

 ***....... ..*  :  29

 ***...***. ...  :  24

 ***...**** *..  :  24

 ******.... ...  :  20

 *********. ..*  :  18

 ******...* *..  :  17

 *.*....... ...  :  16

 *********. *..  :  14

 ********** ...  :  13

 ****..**** ***  :  12

 **........ ...  :  12

 ******.*.* ***  :  6

 ***...***. .**  :  4

 ******.... .**  :  2

 *.*......* *..  :  1

 

 

MAXIMUM LIKELIHOOD BRANCH LENGTHS ON CONSENSUS TREE (NO CLOCK)

 

Branch lengths are computed using the selected model of

substitution and rate heterogeneity.

 

 

                   :---------12 globinsoyb

             :----14

             :     :-----------13 globininse

 :----------22

 :           :           :-10 globinlamp

 :           :  :-------15

 :           :  :        :-11 globinseal

 :           :-21

 :              :             :-7 betadog

 :              :          :-16

 :              :          :  :-8 betarabbit

 :              :    :----17

 :              :    :     :--9 betakangar

 :              :---20

 :                   :       :-4 alphahorse

 :                   :    :-18

 :                   :    :  :--6 alphadog

 :                   :---19

 :                        :-5 alphakanga

 :

 :  :-2 mbharbor_p

 :-23

 :  :-3 mbgray_sea

 :

 :-1 mbkangaroo

 

 

         branch  length     S.E.   branch  length     S.E.

mbkangaroo    1  0.07886  0.03279      14  0.53760  0.20175

mbharbor_p    2  0.07110  0.02354      15  0.81918  0.14939

mbgray_sea    3  0.04859  0.01981      16  0.05776  0.03282

alphahorse    4  0.08771  0.03069      17  0.47954  0.09621

alphakanga    5  0.10811  0.03530      18  0.06137  0.03158

alphadog      6  0.13950  0.03661      19  0.31739  0.08389

betadog       7  0.08054  0.02747      20  0.31729  0.11569

betarabbit    8  0.07571  0.02674      21  0.07578  0.12611

betakangar    9  0.18201  0.04507      22  1.28945  0.20094

globinlamp   10  0.07676  0.02490      23  0.07198  0.03259

globinseal   11  0.01386  0.01908

globinsoyb   12  1.16848  0.23708     20 iterations until convergence

globininse   13  1.40863  0.26701     log L: -2925.01

 

 

Consensus tree with maximum likelihood branch lengths

(in CLUSTAL W notation):

 

(mbkangaroo:0.07886,((globinsoyb:1.16848,globininse:1.40863)95:0.53760,

((globinlamp:0.07676,globinseal:0.01386)97:0.81918,(((betadog:0.08054,

betarabbit:0.07571)89:0.05776,betakangar:0.18201)100:0.47954,((

alphahorse:0.08771,alphadog:0.13950)76:0.06137,alphakanga:0.10811)

100:0.31739)92:0.31729)62:0.07578)100:1.28945,(mbharbor_p:0.07110,

mbgray_sea:0.04859)97:0.07198);
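
The parenthetical tree string above is Newick-style: each "name:length" pair is a terminal branch, and each ")support:length" is an internal branch. As a rough sketch, the branch lengths can be pulled out with a regular expression rather than a full tree parser. The variable names are hypothetical, and the pattern assumes that every length is written with a decimal point, as it is in this output.

```python
import re

# Consensus tree string copied from the report, joined into one line
NEWICK = (
    "(mbkangaroo:0.07886,((globinsoyb:1.16848,globininse:1.40863)95:0.53760,"
    "((globinlamp:0.07676,globinseal:0.01386)97:0.81918,(((betadog:0.08054,"
    "betarabbit:0.07571)89:0.05776,betakangar:0.18201)100:0.47954,(("
    "alphahorse:0.08771,alphadog:0.13950)76:0.06137,alphakanga:0.10811)"
    "100:0.31739)92:0.31729)62:0.07578)100:1.28945,(mbharbor_p:0.07110,"
    "mbgray_sea:0.04859)97:0.07198);"
)

# A branch length is the decimal number that follows a colon
branch_lengths = [float(x) for x in re.findall(r":(\d+\.\d+)", NEWICK)]

# Sum of all branch lengths (total tree length)
tree_length = sum(branch_lengths)

print(len(branch_lengths), "branches")
print("longest branch:", max(branch_lengths))
print("total tree length:", round(tree_length, 5))
```

The count of 23 branches matches the numbering in the branch-length table: 13 terminal branches plus 10 internal ones.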

 

 

TIME STAMP

 

Date and time: Mon Aug 06 07:04:29 2007

Runtime (excl. input) : 4 seconds (= 0.1 minutes = 0.0 hours)

Runtime (incl. input) : 65 seconds (= 1.1 minutes = 0.0 hours)