Genealogy and Autosomal DNA Matches: Common Errors in “Proving” An Ancestor, and the Allure of Easy Gateway Ancestors

 

What we aim to accomplish in this post:

  1. Illustrate a common misconception in the use of autosomal DNA in genetic genealogy; namely, attributing a single shared chromosome region to a specific ancestor who appears in both pedigrees and is presumed to be the source of the DNA shared by the two people who tested.
  2. Define a spectrum of “proof” uses of autosomal DNA matching.

First, copying a label warning from previous posts:

Warning: Genetics is a field with a steep learning curve; thus Genetic Genealogy likewise has a steep learning curve. The following will be a bit technical. If it seems hard, don’t worry, it’s supposed to be.

With the rising use of direct-to-consumer (DTC) autosomal DNA testing (such as through AncestryDNA, 23andMe, or Family Tree DNA) for genealogy, we have also seen a great number of posts on blogs, Facebook, etc. about using chromosome browsers and matching chromosome segments to prove an ancestor.

And now there are services popping up on the internet that claim to be able to give you a DNA match to famous ancestors. As with many third-party “ethnicity” websites, my advice to the consumer is an old one: Caveat Emptor.

To explain why, we have to review many related concepts.

The first principle with which we must come to grips is this:

☞ DNA (of any kind) directly gives us a way to construct clades (represented in graphical form by cladograms), but does not construct family trees as we know them in genealogy.

And the corollary:

☞ Family trees are constructed from a variety of evidence and must accommodate the DNA data as empirical evidence.

 

Let us first compare a cladogram to a family tree.
What is a cladogram? A cladogram is a way of showing relationships among living things, the relationships based on morphology (classically, derived characteristics) or through DNA. There are many possible ways to draw a cladogram but my favorite is the horizontal format. A classic cladogram for us apes is thus:

 


Fig. 1: Cladogram of Apes, annotated.

 

 

Contrast this with the classic scheme for a descendant chart, often used in genealogy:


Fig. 2: A Simple Descendant Chart, a paradigm used much in genealogy.

 

 

The family tree, in this case represented by a descendant chart, includes a parent for each child. This is in contrast to the cladogram, which does not have a parent for each (or any) living organism included.

The practical import of this difference between a cladogram and a family tree will become more clear as we place DNA data on what I will call a genetic genealogy proof spectrum, for the lack of a better phrase. DNA data can be used to prove or disprove relationships within certain boundaries depending upon the nature of the relationship:


Fig. 3: Proof Spectrum of Autosomal DNA Use In Genealogy

 

Practical definitions in the “proof spectrum”:

“Defined” – the relationship between two people can only fit one model, either parent/child, or full siblings (or equivalent.) This is clear in autosomal DNA testing as parent/child are half-identical across all their autosomes and for the X between son and mother, or daughter and both parents. As has been noted by others, these autosomal tests are excellent paternity tests.

“Delimited” – there are a limited (tractable) number of possible genealogical relationships, the ambiguity of which can be eliminated by a DNA test of the right third party. The relationships are easy to determine (assuming one is able to do DNA testing) because relationships like grandparents and half siblings and aunts and uncles are seen readily by two people being half identical over large regions of several chromosomes. 3rd cousins are at the weak edge of this category, as the randomness of inheritance can on rare occasions lead to genealogical 3rd cousins not sharing any half-identical chromosome regions.

“Evidenced” – the existence of a common ancestor between two people is clear because of the presence of a statistically significant shared region of a chromosome, but the genealogical relationship cannot be defined (other than the relationship will not be in the set of the closer relationships which are described above) based on the DNA tests of the two individuals. There is no guarantee that any two distant genealogical cousins will share any statistically significant chromosome regions, and the likelihood of two cousins, beyond 4th cousins, sharing these regions is quite small.
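How quickly the spectrum thins out with distance can be put in rough numbers. The following is a minimal sketch, assuming the standard halving-per-generation rule for full cousins; it gives only the average expected fraction of autosomal DNA shared, and actual sharing scatters widely around that average because of the randomness of inheritance. The function name is my own, chosen for illustration.

```python
from fractions import Fraction


def expected_shared_fraction(cousin_degree: int) -> Fraction:
    """Average expected autosomal fraction shared by full n-th cousins.

    Each cousin sits (n + 1) generations below the shared ancestral couple,
    so the path through one shared ancestor has 2 * (n + 1) parent-child
    links. Two shared ancestors (the couple) give
    2 * (1/2) ** (2n + 2), which simplifies to (1/2) ** (2n + 1).
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return Fraction(1, 2) ** (2 * cousin_degree + 1)


for n in range(1, 7):
    frac = expected_shared_fraction(n)
    print(f"cousin degree {n}: 1/{frac.denominator} ({float(frac):.4%})")
```

These are expectations only. The average for distant cousins is small but never zero, while in practice many distant cousin pairs share no detectable region at all, which is exactly why the “Evidenced” category is so weak.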

Here now we come to a dilemma faced by those who are using autosomal testing for genealogy: Nearly all your DNA matches will be of distant cousins, and as more customers are gathered by the companies the number of your distant matches will grow to be very many. For example, today on AncestryDNA it is not unusual for an American to have over 5000 matches, yet only a handful will fall into categories closer than 4th cousin.

The few matches one may have who are close relatives may be either expected or be a shocking surprise, but because close relatives will share statistically significant identical regions on several to many chromosomes the closeness of a relationship will not be in doubt, and if the other party is willing to share information one can arrive at the identity of the common ancestors. [There are exceptions of course, see the end of this blog entry.]

However, given so many distant cousin matches, we are tempted to use these distant matches to try and “prove” our family trees, but all too often the novice falls into the following algorithm, which is a trap:

  • Step 1) Person B matches you (person A.)
  • Step 2) Ancestor Z is in B’s pedigree and in your pedigree.
  • Step 3) Therefore the match proves you (A) and B are descended from Z and the DNA you share came from Z.

The above algorithm is flawed. The reason is straightforward – how do you prove that the piece of DNA, a shared chromosome region, did indeed arise from ancestor Z and not from another ancestor?

Hint – you can’t, at least easily; we’ll cover that a bit more later.

We have a great many ancestors even from the time of the first European colonization of the Americas. For finding from which ancestor we inherit a shared chromosome region with another person we must come to terms with the many possible overlapping ancestors between any two people whose ancestors lived in the same part of a continent.
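To get a feel for “a great many”, here is a small sketch that counts pedigree slots by generation. It counts positions in the pedigree, not distinct people: with pedigree collapse the same person fills more than one slot, so the number of distinct ancestors is smaller. The function names are mine, for illustration only.

```python
def ancestor_slots(generations_back: int) -> int:
    """Number of pedigree slots exactly `generations_back` generations above you."""
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return 2 ** generations_back


def total_ancestor_slots(generations_back: int) -> int:
    """All slots from your parents (generation 1) up to `generations_back`."""
    return sum(ancestor_slots(g) for g in range(1, generations_back + 1))


for g in (5, 10, 12):
    print(f"{g} generations back: {ancestor_slots(g)} slots in that generation, "
          f"{total_ancestor_slots(g)} slots in total")
```

Any one of those slots, in your pedigree and in your match’s pedigree, is a candidate source for a shared chromosome region.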

Now the more aware consumer will jump up and shout CHROMOSOME BROWSER!!

Alas, it is not that simple.

Even with a chromosome browser available, DNA does not come with little labels attached saying from which person centuries ago said bit of DNA came through. A process of “triangulation” may be used, but its value as a proving-mechanism is based upon going from the known to the unknown.

Otherwise, without starting from a known ancestor (say a tested grandparent), if one has a group of people who are matches to each other for the same region of a particular chromosome one may be able to lay out the pedigrees of all the group members and discover that there is only a single person, or a couple, common to all pedigrees. In these cases the hypothesis will be that one ancestor (or couple) who is common in all the pedigrees is the source of the shared chromosome region, but if the effort to identify this common ancestor does not start from a (later, descended) known person we are still at a hypothesis and not a conclusion.  The shared DNA in these cases is simply evidence, not proof.
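The pedigree comparison just described amounts to a set intersection. Below is a minimal sketch, assuming each member of the matching group has a documented pedigree reduced to a set of ancestor names; the tester and ancestor names are hypothetical placeholders, and the function name is my own.

```python
def candidate_common_ancestors(pedigrees: dict) -> set:
    """Ancestors who appear in every documented pedigree of the group.

    The result is a list of hypotheses, not a proof: it reflects only the
    ancestors that happen to be documented. Undocumented or misattributed
    lines are invisible to this comparison.
    """
    if not pedigrees:
        return set()
    return set.intersection(*(set(p) for p in pedigrees.values()))


# Hypothetical group of three people who all match on the same region.
group = {
    "Tester 1": {"Ancestor Z", "Ancestor Q", "Ancestor R"},
    "Tester 2": {"Ancestor Z", "Ancestor S"},
    "Tester 3": {"Ancestor Z", "Ancestor Q", "Ancestor T"},
}

candidates = candidate_common_ancestors(group)
print(candidates)
```

Even when the intersection narrows to one name, the code has only compared paper trails. Nothing in it ties the shared chromosome region to that person.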

There are problems in the use of “triangulation” that may be overlooked by the inexperienced. My goal in this post is not to review triangulation, which is covered in blog posts by genetic genealogists and providers of support software, and in a few books now available. But I do want to point out two pitfalls about which to be aware: 1) the diploid problem, and 2) the existence of excessively common identical chromosome regions found in the customer sets of DTC testing companies. Both of these pitfalls may come into play as we tackle one of the instigations for this blog entry: the claim that one can know deep, often “gateway”, ancestors with DNA.

As noted at the beginning of this blog entry, there are now outfits claiming to be able to tell you your “Gateway” ancestor(s) if you submit your autosomal test results to them. These claims are unsubstantiated and, more to the point, do not follow proven methods of determining relationships.

Simply because one shares a small chromosome region, or even worse only a limited set of SNP alleles, with a group of other people all claiming descent from some particular ancestor does not mean that person is your ancestor too!

Let us look at an example, where you and three others (A, B, and C, not of your immediate family) all get autosomal tests, and submit your raw data to a service claiming to give you a “gateway” ancestor based on matching:


Fig. 4: Marketing vs. Biology

 

On the left of the picture is what the vendor may want to sell you; on the right is the conclusion that can be based on what your raw data says.

The reason it is so hard to determine from whom a shared chromosome region may arise is because of the vast number of arrangements of possible descent. Here are but two scenarios based on 4 people sharing a chromosome segment:


Fig. 5: For any set of distant cousin matches, there exists a vast number of possible descendancy paths from the most recent common ancestor of all those involved. Here two scenarios are provided for 4 people matching each other.

 

In Case 1, A and B are 4th cousins 1x removed. You and C are 6th cousins to A and 6th cousins 1x removed to B. Only A and B descend from the “Gateway Ancestor”, while said ancestor is a 4th great grand uncle (or aunt) to you and C. The reason some matching algorithm might declare that both you and C descend from the Gateway Ancestor is that both of you match two people (A and B) who in fact do descend from the Gateway Ancestor. However, the DNA you all share comes from a founding couple who were the parents of the Gateway Ancestor.
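The logic of Case 1 can be checked on a toy pedigree. This is only a sketch under simplifying assumptions: the generations are collapsed to two steps per line instead of the real depth, one parent per person is enough to carry the argument, and all the labels (`Founder_F`, `Sibling`, and so on) are hypothetical stand-ins for the people in the figure.

```python
# Toy pedigree: child -> list of known parents (depth shortened for illustration).
PARENTS = {
    "Gateway": ["Founder_F", "Founder_M"],
    "Sibling": ["Founder_F", "Founder_M"],  # sibling of the Gateway Ancestor
    "A_parent": ["Gateway"],
    "B_parent": ["Gateway"],
    "You_parent": ["Sibling"],
    "C_parent": ["Sibling"],
    "A": ["A_parent"],
    "B": ["B_parent"],
    "You": ["You_parent"],
    "C": ["C_parent"],
}


def ancestors(person, parents=PARENTS):
    """Return every ancestor reachable from `person` in the parent map."""
    found = set()
    stack = list(parents.get(person, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


testers = ["You", "A", "B", "C"]

# Ancestors common to all four testers: the real source of the shared DNA.
shared_by_all = set.intersection(*(ancestors(t) for t in testers))

# Who actually descends from the Gateway Ancestor?
descends_from_gateway = {t: "Gateway" in ancestors(t) for t in testers}

print(shared_by_all)
print(descends_from_gateway)
```

All four testers share only the founding couple, and only A and B have the Gateway Ancestor in their ancestry. A service that reasons “you match A and B, so you descend from the Gateway Ancestor” gets you and C wrong.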

Case 2 presents an even more insidious example of how descendancy over time will obscure actual DNA inheritance paths. In this case person B is indeed the 4th great grandchild of the Gateway Ancestor, an ancestor so named because one of their parents was either famous or came from a famous family line. Person A thinks they are descended from the same Gateway Ancestor, but unbeknown to them, their great grandparent was only a half sibling to B’s 2nd great grandparent, because of an unrecorded parentage (benignly so or not.) Meanwhile you and C, even though you match B, are the 2nd cousin 6x removed of the Gateway Ancestor, but not on the path with the famous person. And because you have some pedigree collapse going on, you are also the 2nd cousin 5x removed of the Gateway Ancestor, as well as being the 5th cousin 1x removed to C (besides also being 8th cousins with him.) Yet some matching service might declare that you are descended from the Gateway Ancestor, because you match B and probably several others who are indeed descended from the Gateway Ancestor.

By the time we get back to 4th, 5th, 6th and so on great grandparents there is an immense number of possible family connections. Even though you have only a low probability of having an autosomal match with any specific distant cousin, because you have millions of cousins since the time of colonialism you will eventually end up with a very large list of matches.
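The arithmetic behind “low probability, but a very large list” is simple multiplication. Here is a sketch, assuming each tested cousin matches you independently with the same small probability (a simplification); the input numbers are made-up round figures for illustration, not measured rates, and the function names are mine.

```python
def expected_match_count(num_tested_cousins: int, p_match: float) -> float:
    """Expected number of matches if each cousin matches with probability p_match."""
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be between 0 and 1")
    return num_tested_cousins * p_match


def prob_at_least_one_match(num_tested_cousins: int, p_match: float) -> float:
    """Chance of at least one match, assuming independent cousins."""
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be between 0 and 1")
    return 1.0 - (1.0 - p_match) ** num_tested_cousins


# Hypothetical inputs: one million tested distant cousins, 0.1% chance each.
cousins = 1_000_000
p = 0.001
print(expected_match_count(cousins, p))
print(prob_at_least_one_match(cousins, p))
```

With inputs like these, a per-cousin chance that looks negligible still yields a long match list, and none of those matches comes labeled with the ancestor responsible.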

Trying to make genealogical proof arguments out of single segment DNA matches is a challenge for which I believe few are prepared. To unravel these deep connections will take much time and money, as large groups of people will need to be tested, and uniparental DNA (Y and mitochondrial) testing may be required to rule out possible lines of descent. Given large enough data sets of tested individuals some innovations may eventually be accepted as “proof”, such as AncestryDNA’s “Circles”. However, the AncestryDNA Circles, and even more so AncestryDNA’s recently introduced New Ancestor Discoveries, are still in the early stages of being user tested and for now cannot alone be used as “proof” in genealogy.

So, beware if someone wants you to send them your raw data and some money, to get a certificate or document or just an email claiming you are descended from a Somebody or a Gateway Ancestor.

End note: As mentioned above, there are problems with DNA matching even for closer cousin relationships, for specific people. Here we find the wacky DNA world of closed communities – there exist human communities which have arisen from a small number of founders, whose descendants only marry with each other, and in these situations even multiple segment matches can be misleading. Well known examples include Ashkenazi Jews and Pacific Islanders. For people from such populations, even multi-segment matches across several chromosomes could still be distant cousins.

Additionally, if two people match closely and both are adoptees who do not know either of their birth parents, it may not be clear how to define the tested relationship without the serendipity of having other, closer, matches appear. Adoptees are recommended to “fish in all ponds” to increase the chance that serendipitous matches appear.


2 Comments

  1. Thank you for your intelligence analysis based on logic and math. I, too, have seen so much that is assumed in the result of many of these dna companies that they
    really cannot claim many of their results to be scientific. Sincere admiration for the
    concise description of some pitfalls of dna commercialism, and lack of total pristine scientific, mathematical and analytic logic…..

  2. Thank you for your post. I had to read it twice (no science background) but it’s quite a helpful analysis. I wanted to tell you that I’ve included your post in my NoteWorthy Reads post for this week: http://jahcmft.blogspot.com/2015/04/noteworthy-reads-11.html
