Friday, October 30, 2015

Inferences that can be made from a Sudanese Arab's Gedmatch results

A while back I said I'd be sharing the genomic results of both a Sudanese Arab man & a Nubian woman who were commercially tested (via 23andMe), because this would allow more SNPs to be used in analyses like admixture runs or PCAs.

The reason I wanted to do this is that, while Dobon et al. 2015 sampled a lot of Nubians & Sudanese Arabs for about 200,000 SNPs, which should allow for high-quality analyses, they used a poorly suited genotyping chip: one whose SNPs overlap badly with the chips used to genotype the samples in other datasets.

For example, if I recall correctly what one of the study's researchers told me via email, the admixture analysis above from Dobon et al. 2015 only used about 15,000 SNPs, and, indeed, the decent-enough PCAs I shared in my post about Dobon et al. also used a similarly low number of SNPs.
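To illustrate why a chip that overlaps poorly with other datasets costs so many SNPs: when datasets are merged, only the SNPs genotyped in all of them can be used. Here's a minimal Python sketch of that intersection step, assuming each dataset has already been reduced to a set of SNP IDs (rsIDs). The two "chips" are made-up toy panels whose sizes were picked so the overlap lands at 15,000, echoing the figure above; they are not the actual Dobon et al. or 23andMe SNP lists. Real merges (e.g. with PLINK) also have to deal with strand flips and genome build differences, which this ignores.

```python
from __future__ import annotations

from collections.abc import Iterable


def shared_snps(*datasets: Iterable[str]) -> set[str]:
    """Return the SNP IDs present in every dataset, i.e. what a merge can actually use."""
    if not datasets:
        return set()
    overlap = set(datasets[0])
    for snps in datasets[1:]:
        overlap &= set(snps)  # keep only SNPs also typed in this dataset
    return overlap


if __name__ == "__main__":
    # Hypothetical toy panels (NOT the real chips): 200,000 and 700,000 made-up rsIDs.
    chip_a = {f"rs{i}" for i in range(0, 200_000)}
    chip_b = {f"rs{i}" for i in range(185_000, 885_000)}
    usable = shared_snps(chip_a, chip_b)
    print(f"chip A: {len(chip_a):,}  chip B: {len(chip_b):,}  usable after merge: {len(usable):,}")
```

However many SNPs each chip has on its own, the merged analysis can only ever be as large as the intersection.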

For high-quality results, it's usually preferable for PCAs like the one below to use 100,000 SNPs or more:

The PCAs I shared are still good enough, and the inferences that can be drawn from them, such as Bejas appearing to be intermediate between certain Horn Africans (Somalis, Habeshas, Wolaytas, Oromos) & Sudanese Arabs + Nubians, aren't wrong at all. [note]
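For anyone wondering what these PCAs do with the genotypes: roughly, each SNP is centred and scaled, and individuals are then placed along the axes of greatest variation, so people with similar ancestry end up clustering together. Below is a minimal, pure-Python sketch of that idea on simulated data (two made-up populations with randomly drawn allele frequencies). It only computes PC1, by power iteration, and is an illustration under those toy assumptions, not the software Dobon et al. or David Wesolowski actually used.

```python
import random


def simulate_two_populations(n_per_pop=10, n_snps=500, seed=7):
    """Simulate 0/1/2 genotypes for two made-up populations.

    Each population gets its own allele frequency at every SNP, drawn
    uniformly from [0.05, 0.95]; SNPs are treated as independent.
    """
    rng = random.Random(seed)
    freqs = [(rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95)) for _ in range(n_snps)]
    genotypes, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            # Two allele draws per SNP give a diploid genotype of 0, 1 or 2.
            genotypes.append([(rng.random() < f[pop]) + (rng.random() < f[pop]) for f in freqs])
            labels.append(pop)
    return genotypes, labels


def standardize(genotypes):
    """Centre each SNP on 2p and scale by sqrt(p(1 - p)); drop monomorphic SNPs."""
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * len(col))  # sample allele frequency
        if 0 < p < 1:
            scale = (p * (1 - p)) ** 0.5
            columns.append([(g - 2 * p) / scale for g in col])
    return [list(row) for row in zip(*columns)]  # back to one row per individual


def first_pc(genotypes, iterations=200):
    """PC1 scores (up to scale) via power iteration on the individual-by-individual covariance."""
    z = standardize(genotypes)
    n, m = len(z), len(z[0])
    cov = [[sum(a * b for a, b in zip(z[i], z[j])) / m for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    genotypes, labels = simulate_two_populations()
    for label, score in zip(labels, first_pc(genotypes)):
        print(f"population {label}: PC1 = {score:+.3f}")
```

With strongly differentiated toy populations like these, the two groups should land on opposite sides of PC1; real populations are far less differentiated, which is one reason more SNPs help.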

On the other hand, admixture analyses become increasingly unstable, and therefore unreliable, the fewer SNPs you use, so such a low number of genetic markers is a real problem for them.
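A toy simulation can show why fewer SNPs means noisier ancestry estimates. The sketch below is not ADMIXTURE's algorithm; it assumes a single individual whose alleles come from two made-up source populations in a hypothetical 40/60 split, treats the source allele frequencies as known, estimates the split with a simple least-squares formula, and repeats this to see how much the estimate wobbles at different SNP counts.

```python
import random
import statistics


def simulate_individual(n_snps, alpha, rng):
    """Simulate two source populations and one admixed diploid individual.

    Toy assumptions (hypothetical): source allele frequencies are uniform on
    [0.05, 0.95], SNPs are unlinked, and each of the individual's alleles comes
    from source A with probability alpha.
    """
    p_a, p_b, x = [], [], []
    for _ in range(n_snps):
        pa = rng.uniform(0.05, 0.95)
        pb = rng.uniform(0.05, 0.95)
        p = alpha * pa + (1 - alpha) * pb
        genotype = (rng.random() < p) + (rng.random() < p)  # 0, 1 or 2 copies
        p_a.append(pa)
        p_b.append(pb)
        x.append(genotype / 2)  # observed allele fraction at this SNP
    return p_a, p_b, x


def estimate_alpha(p_a, p_b, x):
    """Least-squares estimate of alpha, using E[x - p_b] = alpha * (p_a - p_b)."""
    num = sum((xi - pb) * (pa - pb) for pa, pb, xi in zip(p_a, p_b, x))
    den = sum((pa - pb) ** 2 for pa, pb in zip(p_a, p_b))
    return num / den


def spread(n_snps, alpha=0.4, replicates=10, seed=1):
    """Mean and standard deviation of the estimate over independent replicates."""
    rng = random.Random(seed)
    estimates = [estimate_alpha(*simulate_individual(n_snps, alpha, rng))
                 for _ in range(replicates)]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (1_500, 15_000, 100_000):
        mean, sd = spread(n)
        print(f"{n:>7,} SNPs: mean estimate {mean:.3f}, spread (sd) {sd:.4f}")
```

The spread shrinks roughly with the square root of the SNP count. Real admixture runs with many components have further sources of instability (such as landing in different local optima from run to run) that this toy ignores, so it understates the problem rather than overstating it.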

So what I essentially wanted to do was circumvent this issue by taking two people who were not tested in this study and who were genotyped for anywhere between 500,000 and 1,000,000 SNPs (as at 23andMe), & running them through some admixture analyses in order to reliably compare them to modern populations in terms of, for example, just how much ancestry they may share with Somalis & Habeshas.

I'm happy to say the data very much supports what was visible in the PCAs I shared, which were ultimately owed to David Wesolowski, who runs the Eurogenes genome blog and project.

Basically, the Eurogenes K=36 results suggest that, in terms of more recent ancestry, Horn Africans such as Somalis & Tigrinyas are closer to one another, which explains why, in various PCAs like the one below, we cluster together rather than either population clustering with Sudanese populations:

By "more recent ancestry" I mean, roughly, ancestry from the last few thousand years. In terms of such ancestry, it's obvious that most Horn Africans of Cushitic & Ethiopian Semitic speaking populations are closer to each other than either is to Sudanese Arab or Nubian groups, as both the PCAs and this Sudanese Arab's admixture results suggest.

That said, another inference can be made: it seems quite evident that a lot of likely post-Neolithic ancestry is shared between Horn African groups like Somalis & Northern Sudanese groups like Sudanese Arabs.
I say this, once again, based on these Eurogenes K=36 results. First, it must be noted that these results were obtained via Gedmatch, which let me use David Wesolowski's K=36 admixture calculator; however, because they come from a calculator, they are skewed to some extent by the notorious calculator effect.

This is why, for example, the Eritrean Tigrinya, this Sudanese Arab chap & I are showing "East African", which, from what I understand, is a component that peaks in (and seems to be based on) the Maasai. In the original table David made for this analysis, things are more fine-tuned and not affected by the calculator effect, and Somalis and Tigrinyas lack components like "East African".

However, the Northeast African component here peaks in Somalis & is somewhat similar to the "Ethio-Somali" or "Lowland East Cushitic" components. And of course, even if it is only showing up here because of the calculator effect, the "East African" component carries Horn African-related ancestry with it.

As the global PCA below, which includes Dobon et al.'s Nubians & Sudanese Arabs (it's a bit messy because of the low number of SNPs used), suggests, this Sudanese Arab, along with those Nubians & Sudanese Arabs, is on a rather fundamental level very similar to Horn African populations & is essentially a mixture of West Eurasian ancestry and African ancestry related to the non-Niger-Congo-related ancestry in populations such as Dinkas. [note]

In fact, in this regard he is somewhat more similar to Tigrinyas than I am, since his West Eurasian ancestry levels are closer to theirs than Somalis' are, as you can see below:

However, the difference seems to be that he has much more "recent" West Eurasian-derived ancestry than either Somalis or Tigrinyas do, as evidenced by his higher "Arabian", "Near Eastern" & "Eastern Mediterranean" scores compared with either me or my Tigrinya colleague. [note]
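Comparing "recent" West Eurasian-related ancestry like this amounts to adding up several calculator components per person. A small Python sketch of that bookkeeping follows. The grouping (Arabian + Near Eastern + Eastern Mediterranean) just mirrors the components named above, the function name is hypothetical, and the percentages in the example are placeholder values, not this Sudanese Arab's, my own, or anyone's actual results.

```python
from __future__ import annotations


def group_scores(scores: dict[str, float], groups: dict[str, list[str]]) -> dict[str, float]:
    """Add calculator components together into broader, user-defined groups.

    scores: component name -> percentage for one person.
    groups: group name -> component names to sum into that group.
    Components not assigned to any group are summed under "Other".
    A component listed in two groups would be counted twice, so keep groups disjoint.
    """
    totals = {group: 0.0 for group in groups}
    assigned = set()
    for group, components in groups.items():
        for component in components:
            totals[group] += scores.get(component, 0.0)
            assigned.add(component)
    totals["Other"] = sum(value for name, value in scores.items() if name not in assigned)
    return totals


if __name__ == "__main__":
    # Placeholder percentages for one hypothetical person (not real results).
    person = {"Arabian": 10.0, "Near Eastern": 5.0, "Eastern Mediterranean": 2.5,
              "Northeast African": 30.0, "East African": 20.0}
    grouping = {"Recent West Eurasian-related": ["Arabian", "Near Eastern", "Eastern Mediterranean"]}
    print(group_scores(person, grouping))
```

Keeping the grouping explicit makes it easy to see which components went into a summed figure, which matters when a calculator effect is already distorting individual components.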

Another distinction seems to be that he has much more recent Nilo-Saharan speaker-related ancestry, as evidenced by his "Central African" score (which peaks in South Sudanese, if I recall correctly). This also explains why, in various admixture analyses available at Gedmatch, he consistently displays the Niger-Congo-related ancestry that is present in contemporary South Sudanese populations and seemingly also in populations like Darfurians.

The "East African" component-based ancestry in Horn Africans, while seemingly related to the non-Niger-Congo ancestry in Nilo-Saharan speakers such as Dinkas, isn't exactly like theirs and seems to be mostly to entirely prehistoric (extremely ancient) in origin. That's why we lack the likely ancient & substantial Niger-Congo-related ancestry present in populations like Dinkas: our non-West Eurasian ancestry is not derived from them.

However, in this Sudanese Arab's case it does seem as though he derives actual ancestry from groups like the South Sudanese, rather than only from prehistoric populations similar to much of the ancestry in them. This is quite understandable, as many Sudanese Arabs were Nilo-Saharan speakers prior to their Arabization, and Nubians are Nilo-Saharan speakers.

As for the signals of possibly post-Neolithic Horn African-related / Somali-like ancestry showing up in this Sudanese Arab: I would have liked to see a Nubian's results in these kinds of analyses, but to me this most likely reflects ancestry from actual early (and not-so-early) Cushites being present in Sudanese Arabs & Nubians. [note]

As both archaeological and linguistic studies suggest, some of the earliest inhabitants of Northern Sudan (as early as the Neolithic) were in fact people of Cushitic-speaking origin who were eventually assimilated by the Nilo-Saharan-speaking ancestors of populations like Nubians just a few thousand years ago. [2] [3]

The Kerma culture's people are suggested by some linguists to have been Cushitic speakers

Then there's, of course, the strong & long-standing presence of actual Cushitic speakers seemingly closely related to Horn African populations, such as Bejas [note], who are sometimes thought to be a relic not just of Sudan's early Cushitic-speaking past but also of the ancestors of Horn African Cushitic speakers having originally migrated into the Horn from more northerly regions such as Sudan.

Further linguistic, archaeological & genomic study is needed to confirm these possibilities of shared ancestry from, say, the last few thousand years or so between Nubians and Somalis, for example (the sampling of ancient genomes would be ideal)...

Northeast Africa
Nevertheless, for now those are the best inferences I can make using this one man as a proxy for Dobon et al.'s various samples, which he seems to represent. Granted, one should recall that, at least in terms of admixture levels (proportions of West Eurasian and African ancestry), Sudanese Arabs and Nubians seem quite heterogeneous.

For now, my bet would be that Nubians, for example, are a mixture of earlier Cushitic or more generally Afro-Asiatic speaking peoples (perhaps genetically similar to Somalis), populations from earlier periods of Egyptian history (perhaps similar to Copts), & populations similar to contemporary Nilo-Saharan speakers such as Dinkas, with Sudanese Arabs being all that + some later Arabian-related gene flow. [note]

Only further genomic study of these groups will tell; what I wrote in the above paragraph is honestly just an educated guess on my part based on our current data.

Reference List:


1. Eurogenes ANE K=7 mostly inflates the ENF & WHG-UHG scores of people run through it (Northwest Africans turn up as over 20% WHG-UHG, as opposed to ~15% in the more fine-tuned K=8), though its ENF scores are not inflated by much (Somalis go from 41% in K=8 to 42% in K=7). This is because the run was mostly designed to spot ANE-related ancestry, and not as much care was put into the other components. [note] The "Sub-Saharan African" score is simply the "East African" & "West African" components of ANE K=7 added together to neaten things up.

3. Notes on the Niger-Congo-related admixture in groups like the South Sudanese, by a friend more knowledgeable than I am in this subject: Link to note


  1. Great analysis. I think adding a Nilotic sample to the others would be good for visualizing the discontinuity. Although we assume readers know how it would look, assumptions are only assumptions.

    1. Thanks! Well, you can see the "discontinuity" in the PCAs, I guess, with Sudanese Arabs, Nubians & various Cushitic and Ethiopian Semitic speaking Horn African populations pulling between West Eurasians (+ North Africans) & populations such as the "South Sudanese".

  2. Excellent blog post. Thanks for this Awale.

    I always viewed north sudanese to be a real example of what some lay people view the horn to be.
    Mainly a legit admixture between 'real Arabians' + Egyptians and 'real nilotics' relatively recently. I guess I was sort of correct though they do seem to have "cushitic" admixture as well.

    1. "Excellent blog post. Thanks for this Awale."

      Thanks, man. And you're welcome as well. I found some new samples and plan to do yet another post on Sudanese Arabs and Nubians; hopefully I'll get to learn and refine some things.

      "I always viewed north sudanese to be a real example of what some lay people view the horn to be.
      Mainly a legit admixture between 'real Arabians' + Egyptians and 'real nilotics' relatively recently. I guess I was sort of correct though they do seem to have "cushitic" admixture as well."

      Yep, they are, in a way, what various people think Horners are. Granted, a lot of folks out there oddly think we're "Arab + Bantu", which highlights their ignorance, because there weren't any Bantu speakers or people culturally associated with them even in Southeastern Africa when our pastoralist ancestors first began hitting up the Horn around 3,000-4,000 years ago or so. ;-)

  3. @Awale Do you think the Northeast African component of K36 is a safe enough basis for assuming East Cushitic admixture/ancestry, or could South Cushitic ancestry still appear in the Northeast African category?