Saturday, June 9, 2018

Southeast & Southern African Ancient DNA

Alright, this post is long overdue and I've hopefully got some interesting data to share based on these now relatively new samples from Skoglund et al. 2017.

First, I'd like to compare modern Horn-Africans to the Tanzanian-Pastoralist sample, most likely of South-Erythraeic speaking origins. Then I'll address the intriguing "East-South Hunter-Gatherer cline", and finally I'll dig into what this says about South-Erythraeic speaker admixture in Southeast and Southern Africa. (If you're confused by the word "Erythraeic", go here.)

The Horn's later genomic tack-ons

One thing a person might catch right off the bat when reading the study is its point that modern Horn-Africans, Somalis included, seem, based on formal-stats, to share drift with both Neolithic Levantines and Iranians, whereas the Southeast African Pastoralist from about 3,000 years ago only seems to share drift with Neolithic Levantines:

We found that the 3,100 BP individual (Tanzania_Luxmanda_3100BP), associated with a Savanna Pastoral Neolithic archeological tradition, could be modeled as having 38% ± 1% of her ancestry related to the nearly 10,000-year-old pre-pottery farmers of the Levant (Lazaridis et al., 2016), and we can exclude source populations related to early farmer populations in Iran and Anatolia.


While these findings show that a Levant-Neolithic-related population made a critical contribution to the ancestry of present-day eastern Africans (Lazaridis et al., 2016), present-day Cushitic speakers such as the Somali cannot be fit simply as having Tanzania_Luxmanda_3100BP ancestry. The best fitting model for the Somali includes Tanzania_Luxmanda_3100BP ancestry, Dinka-related ancestry, and 16% ± 3% Iranian-Neolithic-related ancestry (p = 0.015). This suggests that ancestry related to the Iranian Neolithic appeared in eastern Africa after earlier gene flow related to Levant Neolithic populations, a scenario that is made more plausible by the genetic evidence of admixture of Iranian-Neolithic-related ancestry throughout the Levant by the time of the Bronze Age (Lazaridis et al., 2016) and in ancient Egypt by the Iron Age (Schuenemann et al., 2017).
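For a sense of scale, the ± figures in the quote can be turned into rough ranges. This is only a minimal sketch: it assumes each ± is one standard error and that a normal approximation is fair, neither of which the quoted passage spells out. The function name is purely illustrative.

```python
def approx_95_interval(estimate_pct, std_err_pct, z=1.96):
    """Rough 95% range for an ancestry estimate, ASSUMING the +/- value is one standard error."""
    low = estimate_pct - z * std_err_pct
    high = estimate_pct + z * std_err_pct
    # Ancestry proportions cannot fall outside 0-100%.
    return max(low, 0.0), min(high, 100.0)

# Estimates taken from the quoted passage (percent, +/- percent).
quoted = {
    "Luxmanda, Levant-Neolithic-related": (38.0, 1.0),
    "Somali, Iranian-Neolithic-related": (16.0, 3.0),
}

for label, (estimate, std_err) in quoted.items():
    low, high = approx_95_interval(estimate, std_err)
    print(f"{label}: roughly {low:.1f}% to {high:.1f}%")
```

Under those assumptions the Somali figure would span roughly 10% to 22%, so even the low end of the study's estimate is a sizeable amount.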

Seems somewhat sensible but also surprising to me. Based on previous data (both formal-stats and ADMIXTURE based), I'd have expected Somalis not to have Iranian-Neolithic admixture. In my past view, that ancestry would've possibly come to the Horn in two waves: one a little before 3,000ybp from Sudan, tacked onto Highlander Erythraeic speaking populations like Agaws, and one around 2,500-3,000ybp owed to the Proto-Ethiosemitic speaking community.



Dinka 54.1
Natufian 40.2
Iran-Chalcolithic 4.4
Mota 1.3



Natufian 46.7
Dinka 35.3
Mota 9.3
Iran-Chalcolithic 8.7 



Natufian 41
Mota 33.5
Dinka 25.5
Iran-Chalcolithic 0

But now, as you can see above, both formal-stat methods like those of this study and nMonte utilizing PCA positions (Global10 in the case above and all cases below) find that Somalis, Habeshas and Agaws have ancestry related to Neolithic Iranians and Caucasus Hunter-Gatherers (the kind of ancestry that makes up about 60-70% of Chalcolithic Iranians).
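To make the nMonte idea a bit more concrete: it looks for the mix of source populations whose weighted-average PCA position lands closest to the target. The sketch below is only a toy stand-in and not nMonte itself (which, as far as I know, is an R script that uses a Monte Carlo search); it simply brute-forces every whole-percent mixture. The function names and the three-dimensional coordinates are invented for illustration and are not Global10 values.

```python
from itertools import product
from math import dist

def mix(weights, sources):
    """Weighted average of the source coordinates (weights are fractions summing to 1)."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]

def fit_mixture(target, sources):
    """Try every whole-percent mixture and keep the one closest to the target."""
    best_weights, best_distance = None, float("inf")
    for combo in product(range(101), repeat=len(sources) - 1):
        if sum(combo) > 100:
            continue
        # The last source takes whatever percentage is left over.
        percents = combo + (100 - sum(combo),)
        weights = [p / 100 for p in percents]
        d = dist(mix(weights, sources), target)
        if d < best_distance:
            best_weights, best_distance = weights, d
    return best_weights, best_distance

# Hypothetical 3-D coordinates, invented for illustration (NOT real Global10 values).
sources = {
    "source_A": [0.10, 0.02, -0.01],
    "source_B": [-0.05, 0.08, 0.03],
    "source_C": [0.02, -0.06, 0.07],
}
# Build a toy target that is exactly 60% source_A and 40% source_B.
target = mix([0.6, 0.4, 0.0], list(sources.values()))

weights, distance = fit_mixture(target, list(sources.values()))
for name, w in zip(sources, weights):
    print(f"{name} {100 * w:.1f}")
print(f"distance {distance:.4f}")
```

The percentages come out in the same shape as the result blocks in this post, and the distance is the measure of how tight the fit is: the lower it is, the better the chosen sources reproduce the target's position.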

That said, I'd say these nMonte results are vastly more sensible than the study's findings. 16 ± 3% is frankly senseless for Somalis. It would require them to be so Bronze-Age Levantine-like in ancestry that there is simply no way past formal-stat runs like those of Pickrell et al. 2013 or ADMIXTURE runs like those of Hodgson et al. 2014 wouldn't have clearly picked up on it, nor would models like these fail so miserably with nMonte:



Dinka 57.8
Levant Bronze-Age 42.2
Mota 0



Dinka 55.6
Yemenite-Jew 44.4
Mota 0

There's also no way we wouldn't have an abundance of very recent-looking Y-DNA and mtDNA ties with populations like Arabians and Levantines, which we largely don't, as I've outlined in the past. So, I'd say something like 3-5% Chalcolithic-Iranian-like ancestry owed, most likely, to being about 8-10% derived from a likely ancient Arabian population- :



Dinka 53.7
Natufian 34.6
Yemenite-Jew 9.7
Mota 2



Dinka 53.6
Natufian 33.7
Saudi 10.4
Mota 2.3

-is much more sensible. I say "likely ancient Arabian" because I managed to send Davidski over at Eurogenes 3 Copts to average out and put into the Global10 PCA, so that we could see how well they fit for Horners like Somalis and Tigrinyas in comparison to the Saudi and Yemenite-Jewish samples:



Dinka 52.7
Natufian 35.3
Egyptian-Copt 9.3
Mota 2.7



Natufian 36
Dinka 32.4
Egyptian-Copt 19.7
Mota 11.9



Dinka 34.5
Natufian 34.3
Yemenite-Jew 20.7
Mota 10.5



Dinka 34.5
Natufian 31.6
Saudi 23.1
Mota 10.8

The better fit is only slightly in favor of the Arabian groups, but it is still there, and nMonte will consistently choose them over Egyptian-Copts if both are present. This has me wondering if my long-time friend and I were incorrect about the "earlier wave" I mentioned before, and if Southwestern Arabians speaking Proto-Ethiosemitic are responsible for all of the later MENA admixture in the Horn.
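The 3-5% and 8-10% figures suggested earlier for Somalis also imply something about the Arabian-like source itself. A quick back-of-the-envelope sketch, assuming purely for illustration that all of the Chalcolithic-Iranian-like ancestry arrived through that one source; the function name is hypothetical:

```python
def implied_source_share(target_component_pct, source_contribution_pct):
    """Percent of the source's own ancestry that must be the component in question."""
    return 100.0 * target_component_pct / source_contribution_pct

component_range = (3.0, 5.0)      # Chalcolithic-Iranian-like ancestry in Somalis (percent)
contribution_range = (8.0, 10.0)  # ancestry derived from the Arabian-like source (percent)

# Smallest implied share: least component spread over the largest contribution.
lowest = implied_source_share(component_range[0], contribution_range[1])
# Largest implied share: most component packed into the smallest contribution.
highest = implied_source_share(component_range[1], contribution_range[0])
print(f"Implied share in the source: {lowest:.1f}% to {highest:.1f}%")
```

Taking the extremes of both ranges, the source would need to be somewhere between 30% and 62.5% Chalcolithic-Iranian-like for the two sets of figures to line up.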

As for how Somalis got it... I honestly can't say with any certainty. We need more ancient DNA from the Horn, Egypt, Sudan and Arabia. From the northern Ethiopian-Highlands, from the northerly areas of the Somali Peninsula, from Yemen, from Sudan, from Egypt... Only then can we be definitive about all of this.

Nevertheless, I suppose it's possible, given the presence of Musnad inscriptions across northerly areas of the Somali Peninsula [5], that, despite not linguistically shifting, our ancestors too were affected by migrants from Southwestern Arabia. Or perhaps this is a sign of inter-mixing within the Horn itself? I doubt the latter more because we don't show all that much Mota-related ancestry.

But on that note, you may be wondering why the 3,000ybp pastoralist has so much Mota ancestry. In truth, I don't believe that is Mota-related ancestry from the Horn itself but more likely admixture from Hunter-Gatherers found in Southeast & Southern Africa:



Dinka 47
Natufian 44.9
South-Africa-2000ybp 8.1



Dinka 44.4
Natufian 43.9
Malawi-Hora-Holocene 11.7



Natufian 41
Mota 33.5
Dinka 25.5

For one, I think the sheer magnitude of Mota-like ancestry is a bit hard to sell, especially considering how much Natufian-like ancestry is still left over. nMonte probably just prefers Mota because he has far more Dinka-like ancestry than the Southeast and Southern African HGs. In reality, my bet, especially given the presence of mtDNA L0f (often found in groups like Southeast African HGs) even among modern South-Erythraeic speaker descended peoples, is that the scenario went something like this:

  • South-Erythraeic speaking pastoralists made up of mostly Dinka-like ("East-African") and Natufian/LNF-related ancestry began migrating into Southeast Africa before 3,000ybp.
  • Once they got to areas like Southeast and Southern Africa, they not only started contributing ancestry to some local Hunter-Gatherer populations but acquired admixture from them as well.

This would explain why the 3,000ybp pastoralist sample is ~20% or so less Dinka-like than Somalis according to Skoglund et al. 2017 (~10% in nMonte runs). Time and more ancient DNA will either affirm or refute the above... And as for why ancestry owed to Southeast and Southern African HGs could be mistaken for Mota-like ancestry, that will be addressed in the next section of this post.

But, I'd conclude this section by pointing out that the story of the Horn's admixtures looks like this so far to my eyes:

  • Most likely somewhere in the Egypt-Sudan area, Dinka-like and Natufian-like people intermix over time to form the peoples who, to this day, make up the most significant portion of most Erythraeic and Ethiosemitic speaking Horn-Africans' ancestry. (I'd say this is what the Tanzanian Pastoralist is overwhelmingly descended from.)
  • These people are, in my humble opinion, a part of the Sudanese-Neolithic and begin moving into the Horn sometime around 5,000-7,000ybp. They eventually also acquire admixture from the earlier inhabitants of areas such as the Ethiopian Highlands (some of whom were most likely Omotic speakers) at levels of 1-25% over the last several millennia.
  • Also, around 2,500-3,000ybp, the Horn begins to see some slight incursions from Southwestern Arabia bringing new layers of Anatolian and Iranian Neolithic related ancestry (as well as Ethiopian-Semitic) into the region. And, to my complete surprise, even Omotic speakers such as Aris were not spared from eventually acquiring this sort of ancestry:


Mota 62.8
Dinka 18.1
Natufian 10.8
Saudi 8.3



Mota 60.7
Dinka 19.9
Natufian 11.7
Saudi 7.7 

I'm still a little taken aback by this and for months was skeptical (still slightly am), but if various distinct analysis methods keep finding the same basic result, namely that modern Horn-Africans (including Somalis and Aris) have post-Chalcolithic influences from the Middle-East whilst the 3,000ybp pastoralist lacks these elements, then it must indeed be the case.

The East-South Hunter-Gatherer cline

Now this concerns why the Malawi Hunter-Gatherer ("Hora-Holocene") from about 8,100 years ago can serve, to some extent, as a stand-in for Mota. The study has discovered something quite intriguing: there once existed a cline between Southern African HGs (a more "pure" version of the modern San) and East-African HGs (essentially the "East African" cluster I've always been on about):

The genetic cline correlates to geography, running along a north-south axis with ancient individuals from Ethiopia (~4,500 BP), Kenya (~400 BP), Tanzania (both ~1,400 BP), and Malawi (~8,100–2,500 BP), showing increasing affinity to southern Africans (both ancient individuals and present-day Khoe-San). The seven individuals from Malawi show no clear heterogeneity, indicating a long-standing and distinctive population in ancient Malawi that persisted for at least ~5,000 years (the minimum span of our radiocarbon dates) but which no longer exists today.

Some of the later individuals along this cline do seem to have Erythraeic speaker related admixture alongside the deeper layer of South-African HG (SA-HG) and East-African (EA) ancestry:



South-Africa-2000ybp 46.5
Dinka 29.1
Tanzania-Luxmanda-3000ybp 18.2
Onge 6.2

With even later individuals, such as Kenya-500ybp and Tanzania-Pemba-700ybp, acquiring admixture from the Bantu-Expansion. But more on that in the next section... What's most interesting for me to note in this section is that, before the arrival of South-Erythraeic speakers and the eventual arrival of Bantu speakers, Southeast Africa seems to have been a sort of nexus point between ancient peoples rich in ancestry related to the East-African cluster and ancient peoples rich in South-African HG-related ancestry.

And before what was likely the Proto-Agäw-East-South Erythraeic speaking community swooped in, the Horn too, based on Mota and modern Omotic speakers like Aris, was probably also a part of this nexus point. I'm also reminded of longstanding reports and archaeology purporting that the indigenous population of Southern Somalia were "San-like" Hunter-Gatherers. [5]

What's interesting about this is that it implies, at least to me, that just north of the Horn, in areas such as Sudan (North & South) and Chad, there was likely a much more pristine "East-African" population. After all, you can find a more pristine South-African HG population once you go deep enough into Southern Africa's pre-history and, of course, populations very rich in such East-African ancestry (Dinkas et al.) live in that general vicinity even today.

So, Southeast Africa and the Horn were probably once genetically sandwiched between these two clusters, one probably around Sudan and Chad and one mainly stationed around Southern Africa. It was the introduction of Natufian-like ancestry from North-Africa (discounting areas of Northeast Africa south of Egypt), and of West-African related ancestry by the likes of Bantu and Nilotic speakers, that broke up this zone's prior genomic diversity.

But, if you're wondering about the "Onge" the ancient Zanzibar sample is showing, it also seems to pop up in Mota and in the Malawi HG from 8,100ybp:



Dinka 58.2
South-Africa-2000ybp 24.9
Onge 9.5
Natufian 7.4


Malawi Hora-Holocene

South-Africa-2000ybp 68.5
Dinka 18.8
Natufian 5.4
Onge 5.3
Tianyuan 2

It's not real Eurasian admixture as these individuals would seem unadmixed if analyzed using formal-stat methods like qpAdm (plus, it's way too broad to be real. I mean, Natufian, Onge and Tianyuan?!). It just seems to me that the Global10 PCA is probably picking up on how the ancient East-African cluster related ancestry in them has some mild sort of affinity for Eurasians. What to actually focus on are the Dinka-like and South-Africa HG-like elements.

South-Erythraeic speaker admixture in Southeast and Southern Africa

As some readers may know, I've pointed out several times in the past, based on modern DNA, that there was Horn-African related admixture in Southeast Africa (admixture similar to most of the ancestry in ethnic groups like Somalis and Oromos), something past academics have also argued using modern DNA and even linguistics as well as cultural anthropology.

And it's now quite nice to get to say that ancient DNA backs this up. The Savanna Pastoral Neolithic brought pastoralism to Southeast and Southern Africa as well as certain cultural elements (e.g. mat-tents, as mentioned here) and, intriguingly, some admixture as well.

The Tanzania-Luxmanda sample unfortunately has significant Southern-African HG related ancestry, so she may not prove a pristine enough example of the early population that moved into Southeast Africa from the Horn and eventually contributed to groups such as the Maasai, Tutsis, Datoogas and so on, but she'll have to do for now:



Tanzania-Luxmanda-3000ybp 50.8
Dinka 48.9



South-Africa-2000ybp 89.3
Tanzania-Luxmanda-3000ybp 10.7


Dinka 46.4
South-Africa-2000ybp 25.2
Tanzania-Luxmanda-3000ybp 22.7
Onge 5.7

And it seems like the Hadza are, similar to Aris, a modern relic of the old East-South cline, albeit with some admixture from Erythraeic speaking people from the Horn, similar to much of the ancestry in Somalis and the Tanzania-Luxmanda sample. So, in the end, Southeast Africa is quite the demographically interesting place, having been contributed to by the East-South cline, later Erythraeic, Nilotic and Bantu speaking migrants, and some minor post Iron-Age MENA elements in groups such as coastal Swahilis. On top of this, we're probably underestimating the effect Mambuti (Mbuti) related Hunter-Gatherers had over time, as I personally wasn't even able to put them in my runs given that they aren't present in the Global10 datasheet.

But at any rate, I think I'll leave it at that for now regarding this paper's myriad of intriguing findings. Hope this was interesting for anybody reading. 


  1. Oh just another question. If this really is post Iron Age admixture that happened after the first Natufian-like admixture event, wouldn't we expect to find ANE-type DNA in modern Horners? But that hasn't been found in sub-Sahara yet?

  2. When you say "Southern African," what is this inclusive of? Is it just Khoisan or does this capture Ngunis, Sothos, etc.? My understanding is there's significant variance between these groupings.

  3. Thought I left a comment on this, perhaps it's stuck in comment purgatory somehow. Can you distinguish usage of "Southeast & Southern African" for your audience? I think it would be useful as Khoisan, Nguni, and Sotho populations each possess somewhat overlapping but largely distinct profiles.

  4. Salam brother
    Where did u disappear bro
    I check your blog daily for updated articles.
    I hope you are doing ok

    1. Got busy with Uni, work and personal stuff for the last few years. Might come back sometime soon, though.