Report on Genetic Distance Comparisons Between Confirmed Haplogroup R-U106 and R-P312 FamilyTreeDNA (FTDNA) Tested Haplotypes

 

Author: David Weston, Co-Administrator, R1b-U106/S21+ Research Group

 

Source Data:

 

1. R1b-U106/S21+ Research Group, Oct 2009

2. R1b and Subclades Project, Aug 2009

3. R-P312 and Subclades Project, Aug 2009

4. All FamilyTreeDNA-hosted European YDNA and Dual Geographic DNA Projects, Aug 2009

 

Reproduction Rights:

 

Permission to reproduce this report is granted for non-commercial use only on condition that it is kept in full with this credit page.  All other uses are prohibited without written permission from the author.

 

Disclaimer:

 

This report has not been subject to peer review.  Although every effort has been made to ensure its accuracy and completeness, it may contain errors of fact and/or omission.  Notification of errors and/or omissions critical to the report's conclusions may be sent to the author for document revision.

 

Table of Contents

 

Introduction

Methodology

Results

Discussion

Conclusions

 

Figure 1. 12-marker Haplotype Comparisons

Figure 2. 37-marker Haplotype Comparisons

Figure 3. 67-marker Haplotype Comparisons

 

Table 1. Summary Statistics for GD between R-U106 and R-P312 haplotypes

Table 2. Summary Statistics for GD between R-U106 haplotypes

Table 3. 12-marker Haplotype Comparison Data – R-U106 with R-P312

Table 4. 37-marker Haplotype Comparison Data – R-U106 with R-P312

Table 5. 67-marker Haplotype Comparison Data – R-U106 with R-P312

Table 6. 12-marker Haplotype Comparison Data – R-U106 with R-U106

Table 7. 37-marker Haplotype Comparison Data – R-U106 with R-U106

Table 8. 67-marker Haplotype Comparison Data – R-U106 with R-U106


Introduction

 

1. The question this report addresses is, “Can Genetic Distance (GD), i.e. the total difference in allele values between a YSNP-confirmed haplogroup R-U106 haplotype and a haplotype that has not been YSNP tested, be used to determine haplogroup status?”  To answer this question, GDs between confirmed haplogroup R-U106 and R-P312 haplotypes were calculated.  Haplogroup R-P312 is the closest relative of R-U106 in the YDNA haplogroup tree (ISOGG YDNA Haplogroup R tree, 2009).  Consequently, R-P312 is the haplogroup whose haplotypes are closest to, and most likely to be confused with, those of R-U106 in the absence of YSNP testing.  GDs between R-U106 haplotypes themselves were also calculated to see how their spread compares with that of the GDs between R-U106 and R-P312 haplotypes.  Because 12, 37 and 67 markers are the standard haplotype lengths tested by FamilyTreeDNA, the ability to determine R-U106 clade status using GD was considered at each of these lengths.

 

Methodology

 

2. Haplotype data for this report were compiled between August and October 2009 from the publicly available sources listed under ‘Source Data’ at the start of this report.  All haplotypes were tested by FamilyTreeDNA and confirmed as belonging to their respective haplogroups, including known subclades.

 

3.                  Genetic Distances (GD) were calculated between each R-U106 and R-P312 haplotype, and counted to determine their frequency and cumulative probability of occurrence.  For comparison purposes, GDs were similarly calculated between each R-U106 haplotype and the other R-U106 haplotypes within the sample set.  The results of these calculations are summarized in Figures 1 to 3 and Tables 1 to 8.
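The counting step described in this paragraph can be sketched in Python as follows. This is an illustration only, not the Excel procedure used for the report: the function names, the placeholder distance and the two-marker haplotypes are all hypothetical. It assumes every R-U106 haplotype is paired with every R-P312 haplotype, and that each unordered pair is counted once within R-U106, which is consistent with the totals in Tables 3 to 8.

```python
from collections import Counter
from itertools import combinations, product


def toy_distance(h1, h2):
    """Placeholder distance: sum of absolute allele differences (assumed)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def between_group_gds(group_a, group_b, distance=toy_distance):
    """One GD for every (a, b) pairing across two haplogroups."""
    return [distance(a, b) for a, b in product(group_a, group_b)]


def within_group_gds(group, distance=toy_distance):
    """One GD for every unordered pair inside a single haplogroup."""
    return [distance(a, b) for a, b in combinations(group, 2)]


def gd_table(gds):
    """Rows of (GD, Count, Freq, Cum Count, Cum Prob), as in Tables 3 to 8."""
    counts = Counter(gds)
    total = len(gds)
    rows, running = [], 0
    for gd in range(max(counts) + 1):
        running += counts[gd]  # a GD that never occurs counts as 0
        rows.append((gd, counts[gd], counts[gd] / total, running, running / total))
    return rows


# Hypothetical two-marker haplotypes, for illustration only.
group_a = [(13, 23), (13, 24)]
group_b = [(13, 24), (14, 25), (12, 24)]

for row in gd_table(between_group_gds(group_a, group_b)):
    print("{:>2} {:>5} {:.5f} {:>5} {:.5f}".format(*row))
```

Passing a different `distance` function changes how each pair is scored without altering the tabulation.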

 

4. GDs were calculated using an MS Excel 2003 Visual Basic for Applications (VBA) procedure as the sum of the differences between corresponding alleles (markers) for each haplotype pair.  The difference between Null and non-Null alleles was counted as 1.  The multi-copy markers DYS385, DYS389, DYS459, YCAII, CDY, DYS395 and DYS413 were assumed to be pair-wise equivalent, i.e. the ‘a’ copy in one haplotype corresponds to the ‘a’ copy in the second haplotype.  For DYS464, the allele GD was calculated as the difference between the sums of the a, b, c and d copies in each haplotype.  The alleles DYS19b and DYS464e, f and g were excluded from the GD calculations.  All figures and tables presented here were generated in MS Excel 2003.
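The rules in this paragraph can be sketched in Python as follows. This is a minimal illustration, not the author's VBA procedure. Several points are assumptions made for the sketch: haplotypes are held as dictionaries keyed by marker name, a Null allele is encoded as `None`, differences are taken as absolute values, two Nulls are treated as identical, and a Null DYS464 copy contributes 0 to the sum. The allele values in the example are hypothetical.

```python
from typing import Dict, Optional

# None stands for a Null allele (an encoding assumed for this sketch).
Haplotype = Dict[str, Optional[int]]

DYS464_COPIES = ("DYS464a", "DYS464b", "DYS464c", "DYS464d")
EXCLUDED = {"DYS19b", "DYS464e", "DYS464f", "DYS464g"}


def allele_difference(a: Optional[int], b: Optional[int]) -> int:
    """Difference for one marker; Null against non-Null counts as 1."""
    if a is None and b is None:
        return 0  # assumption: two Nulls are treated as identical
    if a is None or b is None:
        return 1
    return abs(a - b)  # assumption: absolute (step-wise) difference


def dys464_sum(h: Haplotype) -> int:
    """Sum of the a-d copies; a Null copy contributes 0 (assumption)."""
    return sum(h.get(copy) or 0 for copy in DYS464_COPIES)


def genetic_distance(h1: Haplotype, h2: Haplotype) -> int:
    """Sum of per-marker differences over the markers both haplotypes share."""
    skip = EXCLUDED | set(DYS464_COPIES)
    shared = [m for m in h1 if m in h2 and m not in skip]
    total = sum(allele_difference(h1[m], h2[m]) for m in shared)
    # DYS464 is compared as a whole: difference between the summed copies.
    if any(c in h1 for c in DYS464_COPIES) and any(c in h2 for c in DYS464_COPIES):
        total += abs(dys464_sum(h1) - dys464_sum(h2))
    return total


# Hypothetical partial haplotypes, for illustration only.
example_a = {"DYS393": 13, "DYS390": 23, "DYS385a": 11, "DYS385b": 14,
             "DYS464a": 15, "DYS464b": 15, "DYS464c": 17, "DYS464d": 17,
             "DYS19b": 15}
example_b = {"DYS393": 13, "DYS390": 24, "DYS385a": 11, "DYS385b": None,
             "DYS464a": 15, "DYS464b": 16, "DYS464c": 17, "DYS464d": 18,
             "DYS19b": 14}

print(genetic_distance(example_a, example_b))  # 1 (DYS390) + 1 (Null) + 2 (DYS464) = 4
```

In the example, DYS19b differs between the two haplotypes but adds nothing to the total because it is excluded, and DYS464 contributes 2 because its copies sum to 64 and 66.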

 

Results

 

5. Summary statistics for GDs between R-U106 and R-P312 haplotypes, and between R-U106 haplotypes, are shown in Tables 1 and 2.

 

Markers  Median  Min  Max  Range  1st Percentile  5th Percentile  95th Percentile  99th Percentile
12            4    0   17     17               0               1                8                9
37           18    3   44     41               9              12               26               30
67           27    5   53     50              12              19               35               39

Table 1. Summary Statistics for GD between R-U106 and R-P312 haplotypes.

 

Markers  Median  Min  Max  Range  1st Percentile  5th Percentile  95th Percentile  99th Percentile
12            3    0   15     15               0               1                7                8
37           17    0   39     39               5              11               25               28
67           24    0   47     47               7              16               32               35

Table 2. Summary Statistics for GD between R-U106 haplotypes.

 

6. Detailed statistics are presented in graphical format in Figures 1 to 3 and in tabular format in Tables 3 to 8 at the end of this report.

 

Discussion

 

7. The statistical and graphical analysis presented here shows similar distributions around the median values for the R-U106/R-P312 and R-U106/R-U106 GDs, with only small differences in spread.  90% of GDs between R-U106 and R-P312 haplotypes (the interval between the 5th and 95th percentiles) lie between 1 and 8 at 12 markers; 12 and 26 at 37 markers; and 19 and 35 at 67 markers.  By comparison, 90% of GDs between R-U106 haplotypes lie between 1 and 7 at 12 markers; 11 and 25 at 37 markers; and 16 and 32 at 67 markers.  The R-U106/R-P312 GDs are thus shifted to larger values by only 1 at 12 and 37 markers, and by 3 at 67 markers.  This shows that the large majority of R-U106 haplotypes are indistinguishable from R-P312 haplotypes in terms of GD, since an untested haplotype will show the same GD values whether it in fact belongs to R-U106 or R-P312.  12-marker R-U106 and R-P312 haplotypes remain indistinguishable except at the maximum observed GD values, and even there only 9 GDs out of 874,611 calculations were greater than the 12-marker R-U106/R-U106 maximum GD of 15, a frequency of 0.001%.
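The frequency quoted at the end of this paragraph can be checked directly from the Table 3 figures. The short Python sketch below uses the 12-marker sample sizes and the counts for GDs of 16 and 17 as given in Table 3; the variable names are illustrative only.

```python
# Sample sizes and counts copied from Table 3 of this report.
n_u106 = 783
n_p312 = 1117
counts_above_15 = {16: 8, 17: 1}  # GD -> number of comparisons

# Every R-U106 haplotype is compared with every R-P312 haplotype.
total_comparisons = n_u106 * n_p312
exceeding = sum(counts_above_15.values())
frequency = exceeding / total_comparisons

print(total_comparisons)   # 874611
print(exceeding)           # 9
print(f"{frequency:.3%}")  # 0.001%
```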

 

8. Where differences in GD between R-U106 and R-P312 haplotypes are evident is at the ends of the distributions for 37 and 67-marker haplotypes.  No R-U106/R-P312 GD of less than 3 was observed on 37 markers in 709,866 calculations, and none of less than 5 on 67 markers in 435,199 calculations.  A haplotype with a GD of 2 or less on 37 markers, or 4 or less on 67 markers, from a confirmed R-U106 haplotype can therefore be assumed to be R-U106 as well, to near certainty.  These are GDs typically observed between haplotypes related within the time since surnames have been in use.

 

9. For the R-U106/R-P312 calculations, a GD of less than 10 on 37 markers and less than 13 on 67 markers was observed in 1% of the comparisons.  A GD less than these values between an untested haplotype and a confirmed R-U106 haplotype thus provides a strong indicator that the untested haplotype belongs to haplogroup R-U106.  A GD of less than 13 on 37 markers and less than 20 on 67 markers was observed in 5% of the comparisons.  A GD less than these values with a confirmed R-U106 haplotype would be a good indicator that an untested haplotype is R-U106.

 

10. A GD of greater than 27 on 37 markers and greater than 34 on 67 markers was observed in 1% of the comparisons between R-U106 haplotypes.  A GD greater than these values between an untested haplotype and a confirmed R-U106 haplotype can thus be used as a strong indicator that the haplotype in question does not belong to haplogroup R-U106.  A GD of greater than 24 on 37 markers and greater than 31 on 67 markers was observed in 5% of the comparisons between R-U106 haplotypes.  A GD greater than these values with a confirmed R-U106 haplotype would be a good indicator that an untested haplotype is not R-U106.
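The thresholds given in paragraphs 8 to 10 can be gathered into a single lookup, sketched below in Python. The function name, dictionary keys and wording of the returned labels are illustrative and not part of the report. The sketch assumes the GD has been measured against one confirmed R-U106 haplotype and treats 12-marker comparisons as uninformative, in line with paragraph 7.

```python
# Thresholds taken from paragraphs 8 to 10; key names are illustrative.
THRESHOLDS = {
    37: {"near_certain_max": 2, "strong_below": 10, "good_below": 13,
         "good_not_above": 24, "strong_not_above": 27},
    67: {"near_certain_max": 4, "strong_below": 13, "good_below": 20,
         "good_not_above": 31, "strong_not_above": 34},
}


def interpret_gd(gd: int, markers: int) -> str:
    """Map a GD from a confirmed R-U106 haplotype to the report's indication."""
    if markers == 12:
        return "no reliable indication at 12 markers"
    if markers not in THRESHOLDS:
        raise ValueError("thresholds are only given for 12, 37 and 67 markers")
    t = THRESHOLDS[markers]
    if gd <= t["near_certain_max"]:
        return "R-U106 to near certainty"
    if gd < t["strong_below"]:
        return "strong indicator of R-U106"
    if gd < t["good_below"]:
        return "good indicator of R-U106"
    if gd > t["strong_not_above"]:
        return "strong indicator of not R-U106"
    if gd > t["good_not_above"]:
        return "good indicator of not R-U106"
    return "indeterminate"


for gd in (2, 9, 12, 18, 25, 28):
    print(gd, interpret_gd(gd, 37))
```

Most GDs fall in the indeterminate band (13 to 24 at 37 markers, 20 to 31 at 67 markers), which is where the two distributions overlap most heavily.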

 

Conclusions

 

11. Calculations of GD between untested haplotypes and confirmed R-U106 haplotypes will not indicate haplogroup R-U106 status in the majority of cases.  12-marker GD comparisons are not a reliable indicator of haplogroup R-U106 status under any circumstances.  37-marker and 67-marker GD comparisons will indicate haplogroup R-U106 status for GDs of 2 or less and 4 or less, respectively; these are GDs typically observed amongst related surname groups.  A GD of less than 10 on 37 markers and less than 13 on 67 markers provides a strong indicator of haplogroup R-U106 status.  A strong indicator that an untested haplotype does not belong to haplogroup R-U106 is given by a GD with a confirmed R-U106 haplotype of greater than 27 on 37 markers and greater than 34 on 67 markers.


Figure 1. 12-marker Haplotype Comparisons: panels (a) and (b).

 


Figure 2. 37-marker Haplotype Comparisons: panels (a) and (b).

 


Figure 3. 67-marker Haplotype Comparisons: panels (a) and (b).

 


Table 3. 12-marker Haplotype Comparison Data – R-U106 with R-P312.

 

GD    Count     Freq  Cum Count  Cum Prob
 0     3726  0.00426       3726   0.00426
 1    26149  0.02990      29875   0.03416
 2    78546  0.08981     108421   0.12396
 3   137829  0.15759     246250   0.28155
 4   171676  0.19629     417926   0.47784
 5   166667  0.19056     584593   0.66840
 6   129887  0.14851     714480   0.81691
 7    84360  0.09645     798840   0.91337
 8    45150  0.05162     843990   0.96499
 9    19270  0.02203     863260   0.98702
10     7181  0.00821     870441   0.99523
11     2653  0.00303     873094   0.99827
12     1021  0.00117     874115   0.99943
13      354  0.00040     874469   0.99984
14      105  0.00012     874574   0.99996
15       28  0.00003     874602   0.99999
16        8  0.00001     874610   1.00000
17        1  0.00000     874611   1.00000
18        0  0.00000     874611   1.00000

 

Sample Size: (783) R-U106 Haplotypes; (1117) R-P312 Haplotypes

 


Table 4. 37-marker Haplotype Comparison Data – R-U106 with R-P312.

 

GD    Count     Freq  Cum Count  Cum Prob
 0        0  0.00000          0   0.00000
 1        0  0.00000          0   0.00000
 2        0  0.00000          0   0.00000
 3        2  0.00000          2   0.00000
 4       12  0.00002         14   0.00002
 5       48  0.00007         62   0.00009
 6      175  0.00025        237   0.00033
 7      513  0.00072        750   0.00106
 8     1266  0.00178       2016   0.00284
 9     2861  0.00403       4877   0.00687
10     5648  0.00796      10525   0.01483
11    10442  0.01471      20967   0.02954
12    17334  0.02442      38301   0.05396
13    26383  0.03717      64684   0.09112
14    36654  0.05164     101338   0.14276
15    47353  0.06671     148691   0.20946
16    56113  0.07905     204804   0.28851
17    63403  0.08932     268207   0.37783
18    66709  0.09397     334916   0.47180
19    66743  0.09402     401659   0.56582
20    63143  0.08895     464802   0.65477
21    56489  0.07958     521291   0.73435
22    48199  0.06790     569490   0.80225
23    38432  0.05414     607922   0.85639
24    29822  0.04201     637744   0.89840
25    22447  0.03162     660191   0.93002
26    16258  0.02290     676449   0.95292
27    11370  0.01602     687819   0.96894
28     7655  0.01078     695474   0.97973
29     5280  0.00744     700754   0.98716
30     3483  0.00491     704237   0.99207
31     2218  0.00312     706455   0.99519
32     1325  0.00187     707780   0.99706
33      873  0.00123     708653   0.99829
34      522  0.00074     709175   0.99903
35      300  0.00042     709475   0.99945
36      163  0.00023     709638   0.99968
37       98  0.00014     709736   0.99982
38       66  0.00009     709802   0.99991
39       28  0.00004     709830   0.99995
40       20  0.00003     709850   0.99998
41        6  0.00001     709856   0.99999
42        5  0.00001     709861   0.99999
43        4  0.00001     709865   1.00000
44        1  0.00000     709866   1.00000
45        0  0.00000     709866   1.00000

 

Sample Size: (698) R-U106 Haplotypes; (1030) R-P312 Haplotypes

 


Table 5. 67-marker Haplotype Comparison Data – R-U106 with R-P312.

 

GD    Count     Freq  Cum Count  Cum Prob
 0        0  0.00000          0   0.00000
 1        0  0.00000          0   0.00000
 2        0  0.00000          0   0.00000
 3        0  0.00000          0   0.00000
 4        0  0.00000          0   0.00000
 5        1  0.00000          1   0.00000
 6        0  0.00000          1   0.00000
 7        2  0.00000          3   0.00001
 8        3  0.00001          6   0.00001
 9       11  0.00003         17   0.00004
10       35  0.00008         52   0.00012
11       63  0.00014        115   0.00026
12      178  0.00041        293   0.00067
13      365  0.00084        658   0.00151
14      740  0.00170       1398   0.00321
15     1431  0.00329       2829   0.00650
16     2418  0.00556       5247   0.01206
17     4160  0.00956       9407   0.02162
18     6444  0.01481      15851   0.03642
19     9244  0.02124      25095   0.05766
20    13200  0.03033      38295   0.08799
21    17309  0.03977      55604   0.12777
22    21918  0.05036      77522   0.17813
23    26048  0.05985     103570   0.23798
24    29931  0.06878     133501   0.30676
25    33043  0.07593     166544   0.38268
26    34645  0.07961     201189   0.46229
27    34355  0.07894     235544   0.54123
28    33368  0.07667     268912   0.61791
29    31376  0.07210     300288   0.69000
30    28212  0.06483     328500   0.75483
31    24658  0.05666     353158   0.81149
32    20687  0.04753     373845   0.85902
33    16541  0.03801     390386   0.89703
34    13046  0.02998     403432   0.92701
35     9625  0.02212     413057   0.94912
36     7114  0.01635     420171   0.96547
37     5031  0.01156     425202   0.97703
38     3587  0.00824     428789   0.98527
39     2332  0.00536     431121   0.99063
40     1578  0.00363     432699   0.99426
41     1023  0.00235     433722   0.99661
42      589  0.00135     434311   0.99796
43      385  0.00088     434696   0.99884
44      221  0.00051     434917   0.99935
45      119  0.00027     435036   0.99963
46       76  0.00017     435112   0.99980
47       48  0.00011     435160   0.99991
48       15  0.00003     435175   0.99994
49       12  0.00003     435187   0.99997
50        5  0.00001     435192   0.99998
51        3  0.00001     435195   0.99999
52        1  0.00000     435196   0.99999
53        3  0.00001     435199   1.00000
54        0  0.00000     435199   1.00000

 

Sample Size: (563) R-U106 Haplotypes; (773) R-P312 Haplotypes

 


Table 6. 12-marker Haplotype Comparison Data – R-U106 with R-U106

 

GD    Count     Freq  Cum Count  Cum Prob
 0     4251  0.01389       4251   0.01389
 1    19273  0.06295      23524   0.07684
 2    42087  0.13747      65611   0.21431
 3    58902  0.19239     124513   0.40670
 4    60925  0.19900     185438   0.60570
 5    50990  0.16655     236428   0.77225
 6    34344  0.11218     270772   0.88443
 7    19677  0.06427     290449   0.94871
 8     9789  0.03197     300238   0.98068
 9     4010  0.01310     304248   0.99378
10     1382  0.00451     305630   0.99829
11      403  0.00132     306033   0.99961
12      101  0.00033     306134   0.99994
13       16  0.00005     306150   0.99999
14        2  0.00001     306152   1.00000
15        1  0.00000     306153   1.00000
16        0  0.00000     306153   1.00000

 

Sample Size: (783) R-U106 Haplotypes

 


Table 7. 37-marker Haplotype Comparison Data – R-U106 with R-U106

 

GD    Count     Freq  Cum Count  Cum Prob
 0        9  0.00004          9   0.00004
 1       16  0.00007         25   0.00010
 2       21  0.00009         46   0.00019
 3       41  0.00017         87   0.00036
 4       61  0.00025        148   0.00061
 5      125  0.00051        273   0.00112
 6      273  0.00112        546   0.00224
 7      594  0.00244       1140   0.00469
 8     1149  0.00472       2289   0.00941
 9     2128  0.00875       4417   0.01816
10     3693  0.01518       8110   0.03334
11     6074  0.02497      14184   0.05831
12     8973  0.03689      23157   0.09520
13    12548  0.05158      35705   0.14678
14    16370  0.06730      52075   0.21408
15    19445  0.07994      71520   0.29401
16    21998  0.09043      93518   0.38445
17    23139  0.09512     116657   0.47957
18    22934  0.09428     139591   0.57385
19    21570  0.08867     161161   0.66252
20    18989  0.07806     180150   0.74059
21    16067  0.06605     196217   0.80664
22    13163  0.05411     209380   0.86075
23    10249  0.04213     219629   0.90288
24     7595  0.03122     227224   0.93411
25     5411  0.02224     232635   0.95635
26     3827  0.01573     236462   0.97208
27     2537  0.01043     238999   0.98251
28     1627  0.00669     240626   0.98920
29     1103  0.00453     241729   0.99373
30      690  0.00284     242419   0.99657
31      368  0.00151     242787   0.99808
32      212  0.00087     242999   0.99896
33      116  0.00048     243115   0.99943
34       72  0.00030     243187   0.99973
35       34  0.00014     243221   0.99987
36       14  0.00006     243235   0.99993
37       11  0.00005     243246   0.99997
38        6  0.00002     243252   1.00000
39        1  0.00000     243253   1.00000
40        0  0.00000     243253   1.00000

 

Sample Size: (698) R-U106 Haplotypes

 


Table 8. 67-marker Haplotype Comparison Data – R-U106 with R-U106

 

GD    Count     Freq  Cum Count  Cum Prob
 0        4  0.00003          4   0.00003
 1        9  0.00006         13   0.00008
 2       11  0.00007         24   0.00015
 3       12  0.00008         36   0.00023
 4       14  0.00009         50   0.00032
 5       23  0.00015         73   0.00046
 6       39  0.00025        112   0.00071
 7       31  0.00020        143   0.00090
 8       37  0.00023        180   0.00114
 9       79  0.00050        259   0.00164
10      164  0.00104        423   0.00267
11      278  0.00176        701   0.00443
12      484  0.00306       1185   0.00749
13      734  0.00464       1919   0.01213
14     1199  0.00758       3118   0.01971
15     1887  0.01193       5005   0.03164
16     2732  0.01727       7737   0.04891
17     4132  0.02612      11869   0.07502
18     5467  0.03456      17336   0.10958
19     7233  0.04572      24569   0.15530
20     8973  0.05672      33542   0.21202
21    10325  0.06526      43867   0.27728
22    11697  0.07394      55564   0.35122
23    12629  0.07983      68193   0.43105
24    12990  0.08211      81183   0.51316
25    12966  0.08196      94149   0.59512
26    12201  0.07712     106350   0.67224
27    11038  0.06977     117388   0.74201
28     9872  0.06240     127260   0.80441
29     8221  0.05196     135481   0.85637
30     6481  0.04097     141962   0.89734
31     4964  0.03138     146926   0.92872
32     3796  0.02399     150722   0.95271
33     2621  0.01657     153343   0.96928
34     1770  0.01119     155113   0.98047
35     1270  0.00803     156383   0.98850
36      749  0.00473     157132   0.99323
37      438  0.00277     157570   0.99600
38      280  0.00177     157850   0.99777
39      157  0.00099     158007   0.99876
40       89  0.00056     158096   0.99932
41       49  0.00031     158145   0.99963
42       29  0.00018     158174   0.99982
43       13  0.00008     158187   0.99990
44       10  0.00006     158197   0.99996
45        2  0.00001     158199   0.99997
46        2  0.00001     158201   0.99999
47        2  0.00001     158203   1.00000
48        0  0.00000     158203   1.00000

 

Sample Size: (563) R-U106 Haplotypes