Somali mtDNA frequencies

As I've said before, I don't touch upon uniparental data (Y-DNA & mtDNA) too much, but I still thought sharing this data would be rather interesting, especially since I've gathered the mtDNA markers of about 75 ethnic Somalis who are on 23andme to set alongside the data from Mikkelsen et al., which has a sample size of 190 Somalis.
What's ultimately intriguing, and why I've taken the time to quickly share these frequencies, is how much the study & the 23andme set correlate.

I've essentially colored things based on the larger mtDNA lineage each subclade belongs to. All L2 lineages are colored blue, M lineages are colored crimson and so on.

The only meaningful difference would be, for example, that in Mikkelsen et al. the non-L lineages (M & N) are slightly more common at roughly 37-39%, whilst this goes down to 31-32% in the 23andme set (granted, the sample size is lower, I suppose).
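For anyone who'd like to gauge whether that gap could simply be down to the smaller 23andme sample, here's a rough sketch in Python. The counts are my own assumption, back-calculated from the approximate percentages above (72 of 190 is about 38%, 24 of 75 is 32%); they are not exact figures from either dataset. The function names are just illustrative. It computes a Wilson score interval for each set and a plain two-proportion z statistic.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (roughly 95% when z=1.96)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def two_proportion_z(s1, n1, s2, n2):
    """z statistic for the difference between two proportions (pooled SE)."""
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (s1 / n1 - s2 / n2) / se


# ASSUMED counts, reconstructed from the rounded percentages in the post.
mikkelsen_n, mikkelsen_non_l = 190, 72   # ~38% non-L (M & N)
set23_n, set23_non_l = 75, 24            # 32% non-L (M & N)

mikkelsen_ci = wilson_interval(mikkelsen_non_l, mikkelsen_n)
set23_ci = wilson_interval(set23_non_l, set23_n)
z_stat = two_proportion_z(mikkelsen_non_l, mikkelsen_n, set23_non_l, set23_n)

print("Mikkelsen et al. non-L interval:", mikkelsen_ci)
print("23andme set non-L interval:", set23_ci)
print("z statistic:", z_stat)
```

With these assumed counts the two intervals overlap and the z statistic stays well under 1.96 in magnitude, which would fit the idea that the difference is within what the smaller sample alone could produce. Treat it as a sanity check and nothing more.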

Otherwise the noticeable correlation is quite neat, I must say: L3 is the most prevalent non-M & N lineage, L2a is very common in both sets, and there's a general overlap in markers (the same markers appearing again).

At any rate, I figured this was interesting data and worth putting in one little place like this blog post for some to view. I'm not going to go into much detail here, like the history of this or that Haplogroup. Sharing these frequencies should be enough because, as I said, uniparentals are not my forte; some blunders on my part, pointed out by the helpful Zam in the comment section and leading to a remodelling of this post, are testament to that...

This is essentially the mtDNA data of about 265 Somalis, most of whom are unrelated [check notes] and in various cases from differing regions across Greater Somalia, drawn from three different sources. They correlate, which in my humble opinion speaks rather strongly to how "representative" they are, so enjoy this interesting data.

1. Methodology behind how the 23andme samples were gathered is gone into here.

Not everything Cushitic south of the Horn is South Cushitic

Not all of what seem to be Cushitic genetic influences south of the Horn of Africa should be perceived as South Cushitic admixture. This is something I've hoped to mention for a while now, for various reasons.

Map of languages spoken in Kenya

Some reasons are overt & obvious and likely known to anyone at all familiar with a country like Kenya: there's a strong "East Cushitic" presence (Lowland East to be exact) via the Oromos, Rendilles & Somalis all over Northern Kenya, with Somalis even being the predominant ethnic group in Northeastern Kenya.

Other examples go deeper than that, though, with anthropologists like Daniel Stiles explicitly stating that the Azanians of antiquity seemed to be culturally more East Cushitic (as in more similar to Lowland East, Sidamic and Dullay speakers). [1]

Mainland Southeast Africa

Azanians were essentially a rather advanced culture in antiquity (going back to the 1st Century CE or earlier) which traded with outside groups like Greeks, Romans & South Arabians and are thought to have been found perhaps as far south as the Rufiji river in Tanzania. [2] It says something that, upon examining sites associated with these "Azanians" in Northern Kenya, Daniel Stiles figured they looked more like an East Cushitic culture than a South Cushitic one.

"The solution that Mous proposes is that the Mbugu people originally spoke a form of (Old Kenyan) Cushitic but, probably some time after arriving in the Pare mountains (49), there was a rapid shift to a completely Bantu (Pare) grammar (83)
The IMb lexicon has various sources, including Nilotic (Maasai), Southern Cushitic (Gorwaa and related languages), Eastern Cushitic (represented nowadays by Boraana Oromo and Dahalo, but possibly closer to Yaaku (Heine 1975)), and Bantu (notably Shambaa and phonologically adapted NMb).  
Mous argues convincingly that the Eastern Cushitic contributions are from an Old Kenyan Cushitic source and constitute the oldest lexical elements in IMb. This hypothesis is consistent with the most likely migratory route of the Mbugu and, if we assume that the Mbugu have always had a cattle culture, it is supported by the fact that much of the most detailed cattle terminology in IMb is from this source (43). 
Mous addresses the possibility that some of the Eastern Cushitic words in IMb could have come from Taita (Sagala, Davida) which although Bantu has undergone significant Cushitic influence (see Ehret & Nurse 1981). Of the IMb words shared with Taita that Mous lists on p.36 none that I am aware of have cognates in Mijikenda, despite the fact that there has been a significant Mijikenda influence on Taita (Sagala). This suggests either that “the Taita words in Inner Mbugu are remnants of an Old Kenyan Cushitic presence” (36) or that they are later borrowings from Taita which were either taken before the Mijikenda influx or were selectively borrowed in order to ‘screen out’ obviously Bantu words. The first scenario seems the more likely."

Above are excerpts from a paper on the Mbugu language, a mixed Bantu language from Tanzania with strong Cushitic elements, some of which are not entirely South Cushitic but instead East/Eastern Cushitic in nature and from a rather old source. [3] And this linguist isn't alone in finding this influence, since Roland Kießling found substantial East Cushitic influences in Mbugu as well. [4]

There's also the presence of lineages like E-V32 and E-V22 among groups like the Turkana in Kenya and even some Maasais, as seen in a recent paper.

E-V32 is very common in East Cushites like Somalis & Oromos [3] and even "North Cushites" like Bejas [5], while E-V22, going by this new paper (Trombetta et al. 2015), seems to possibly be extremely common among Sahos, who are Lowland East Cushitic speakers like Somalis & Oromos. [6]

However, E-V32 and E-V22 are also common in various Nilo-Saharan speakers, although those speakers of Nilo-Saharan are mostly from Western Sudan [8], as a friend notes below:

"Turkana are Northwest Kenyans and Maasai migrated from there recently, so it seems plausible that they have some East Cushitic admixture, which could explain the V32. Or it could be from their Nilo-Saharan ancestry, although V32 is mostly found in Western Sudan."

So these lineages could seriously be owed to East Cushitic input, or perhaps not and I'm just speculating too much here. But it is a fact that there has been intermingling between Nilo-Saharans and East Cushites; Rendilles are a living testament to this, with the substantial Nilotic influences in their culture that nearly everyone who's ever studied them in one way or another has noted.

I wouldn't at all put it beyond these Turkanas and certain Maasais to honestly have some East Cushitic admixture.

One thing I and some others have noticed that could be an autosomal distinction between South and East Cushites is what turns up at the higher Ks of various ADMIXTURE analyses, whether in studies or in those made by independent genome bloggers running ancestry projects.

ADMIXTURE analysis from Hodgson et al. 2014 [7]


And that's that at K=5 (look at the numbers to the left on the ADMIXTURE analysis from Hodgson et al.) Somalis and other Horn populations like Oromos and Habeshas, alongside Wolaytas, show more "Northern West Asian"-like influences by showing "European" at this K alongside "Arabian".

Likewise, in various ADMIXTURE runs like HarappaWorld, Eurogenes K13 and Dodecad Globe13, Somalis will consistently show "Mediterranean" and sometimes also "Caucasian"-like influences, while Maasais and some other seemingly mostly South Cushitic-admixed groups will be utterly bereft of such influences in the same analyses (granted, the Maasai show some "East Med" in K13 but no "West Med").

I and others think this may be indicative of gene flow carrying West Eurasian ancestry into populations like Agaws & Somalis that perhaps never managed to reach South Cushites.

But the thing is, I haven't managed to confirm this, as most of these runs tend to lump all of Pagani et al.'s Oromos together for these kinds of analyses, which skews things given their heterogeneity. I've been told by some reliable individuals that Borana Oromos may also lack such "Mediterranean" to "Caucasian"-like influences.

You can also easily see that Ari Cultivators, who are Horn-residing Omotic speakers with non-negligible Cushitic admixture, don't show such influences, just like the Maasai. While I can't of course confirm this, I think it's more plausible that the admixture in them is East Cushitic rather than Southern, but we'll need more data and perhaps ancient genomes to really be sure.

So the point is, this may not entirely be indicative of a true "East-South genetic divide" among Cushitic speakers, but it's the closest thing we've got on an autosomal level that I, and those I tend to discuss East African population genetics with, have noticed. It could turn out to not be much or anything...

But this could still prove relevant: if a Southeast African population demonstrates more "Mediterranean" to "Caucasian"-like influences in various ADMIXTURE analyses, this could be indicative of more recent gene flow from East Cushites, like how Datogs/Datoogas show tiny hints of the pink "Early European Farmer" component (essentially an equivalent of "Mediterranean" in this run) at K=20 in Lazaridis et al. 2013's ADMIXTURE analysis [8]:

Some of the Somalis used in this study in some cases may not be entirely "representative" and the "Afars" aren't actually Afars

But this is merely educated speculation on my part. We clearly need more study into this, but nevertheless the point behind this entire post is quite simple: not everything Cushitic in Southeast Africa or South of the Horn of Africa, Sudan and South Sudan or Northern Kenya should be assumed to be entirely South Cushitic. There are noticeable influences from East Cushites via archaeology, linguistics and, at the very least, genetic hints.

1. In Cruciani et al. E-V32 is listed as E-M78γ ("The Gamma cluster", basically).

2. It's been brought to my attention via the massive Hirbo thesis that Iraqw do show E-V22, so it's rather the E-V32 showing up in some of these Southeast African Cushitic-admixed groups that perhaps has the more dubious origin.

Meager comments on new Haplogroup E paper

This new paper, which goes quite a bit into Haplogroup E3b / E1b1b / E-M215, has a lot one could touch upon and frankly, uniparental (Y-DNA & mtDNA) data isn't my forte. [1] My knowledge of it is quite basic, or at least not comparable to how much I know when it comes to autosomal DNA based data.

But some meager comments from a layman like me wouldn't hurt, specifically pertaining to some of the data we now have in this paper on the E-M215 lineages carried by various Cushitic-speaking populations (or those with substantial Cushitic ancestry).

The above chart is basically a duplication of the one Ethio-Helix' author made over at his blog when he touched upon this paper.

Many interesting things were discussed in the comment section of his blog, and there was some great back and forth about this paper on a forum here. I'm not necessarily recommending that forum, though: while it has some great members, it is also overrun with a lot of troll-ish, bigoted to unstable types.

Many things are quite interesting here, like the fact that E-M293* essentially seems to be tied to and likely spread by South Cushites, as it and E-V32 & E-V22 are the only E-M215 lineages found in this paper's Southeast African populations, all of whom are known to have substantial Cushitic ancestry mostly to entirely via what seemed to be South Cushitic speakers.

It has also been tied in the past to the spread of pastoralism across Southeast & Southern Africa [2], pastoralism being something South Cushites seemingly had a great part in spreading across the region.

However, it's not all that likely that E-V32 and E-V22 are owed to South Cushites, for reasons a friend noted to me just recently when I asked if they could be:

"Possible, but doubtful considering the South Cushites from Tanzania sampled by Hirbo (and the Datog, who are mostly of South Cushitic ancestry) lack E-V32 completely. Iraqw have some V12* though. These Tanzanians don't seem bottlenecked since they have the highest M293 diversity, so if proto-South Cushites had some V32, you would expect it to be preserved in the Tanzanians. 
Turkana are Northwest Kenyans and Maasai migrated from there recently, so it seems plausible that they have some East Cushitic admixture, which could explain the V32. Or it could be from their Nilo-Saharan ancestry, although V32 is mostly found in Western Sudan."

Now, given the uniparental data of groups like Iraqw who are actual modern South Cushitic speakers; it seems that South Cushites were paternally E-M215 (E1b1b / E3b) & T? [3] [4]

The current lack of J1 or J2 or even J* in all these South Cushitic-admixed populations, including South Cushitic speakers like the Iraqw, is interesting to say the least. In the Horn, J lineages can be found in Omotic speakers, Ethiopian Semitic speakers and Cushitic speakers alike, from Afars to Amharas to Gedeos or even Somalis to some extent. [4]

The current lack of this Haplogroup among substantially South Cushitic admixed groups and even groups like Iraqws who currently speak a South Cushitic language is puzzling to me to say the least...

Other interesting things to note would be that, despite their incredibly close linguistic ties, Sahos & Afars seem quite distinct in terms of Y-DNA lineages. For one, in both this new paper and Plaster et al. the predominant E-M215 lineage among Afars seems to be E-V6, whilst E-V22 is the predominant E-M215 lineage among Sahos.

It's also remarkable that pretty much all 94 of this paper's Saho samples are E-M215 carriers, with perhaps 2% being J1 or A-M13 or T or what have you, when Afars actually tend to show a notable amount of J1, A-M13 and T:
More study will need to be conducted, but for now it seems rather evident that, despite their very close linguistic ties (Saho & Afar are often classified under one rather close "Saho-Afar" subbranch of Lowland East Cushitic), there's a pretty stark difference in Y-DNA lineages between these two groups so far.

Other than that I find the amount of non-E-M215 in those Djiboutian Somalis quite interesting as this in my humble opinion means they're most likely predominantly Haplogroup T carriers.

An old mapping of Haplogroup T's spread

Somalis from Djibouti are mostly of the Dir clan (of the Issa sub-clan to be exact), and so far, in Ethiopia and elsewhere, Dir clan Somalis tend to be mostly T carriers and something like 20-30% E-V32 [5], while the percentages are often the inverse for other Somalis like various Darods, who tend to be 70-80% E-V32 and 20-30% (or less) T [5] with a small minority of J1 and A-M13 carriers.

Otherwise, I don't have too many other things to note that I possibly didn't note in this forum thread. I recommend going over to Ethio-Helix' blog post about this study and engaging in discussions about it there.

The thing is, I don't tend to touch upon Haplogroups in great detail because 1) I'm not that knowledgeable when it comes to them and 2) Ethio-Helix does enough when it comes to uniparental data that me doing so would just translate into re-hashing everything he's already posted and adding some speculation here and there.

Nevertheless, I thought some of these things I and some others noticed would be interesting to mention here, so enjoy and, of course, do read the actual study if you're truly interested in all this.

1. The Youtube videos are recordings of the Afar & Saho languages respectively that I've uploaded onto my Youtube channel. I've uploaded recordings of various Cushitic languages onto it, from South Cushitic languages to Central Cushitic languages to various East Cushitic (Sidamic & Lowland East) ones, and in time I'll add more. You can go into the description section of any given video I've uploaded to know where the recording comes from.

2. It's been brought to my attention via the massive Hirbo thesis that Iraqw do show E-V22, so it's rather the E-V32 showing up in some of these Southeast African Cushitic-admixed groups that perhaps has the more dubious origin.

Anthromadness News and things to come

In light of what could be considered a bit of a slump these last weeks in terms of my activity, I've decided to start working on this blog a good degree more. I've also decided to open up a new sub-blog of sorts.

Anthromadness News

Anthromadness News will operate a good deal like Dienekes' Blog often does nowadays. I'll mostly just share a new and interesting study/paper on population genetics, history, linguistics or archaeology on there, often with nothing more than its abstract or an excerpt, along with a possible image from the study/paper and perhaps some short notes on my part as to what the paper could entail (I'll likely only do this rarely):

As with the above three examples, it'll be a good chance for me to spread around some studies/papers I have almost no intention of commenting on over at this blog (in detail, at least). At times a paper may also just be "pay-walled", and it would be somewhat of a breach of copyright for me to share too much of its details until the pay-wall is lifted. Though I could easily create my usual sort of blog post on a study/paper's data a while after I share it on Anthromadness News.

As for this blog, I have decided to edit it up a bit, if you haven't noticed. The old Leonardo Da Vinci sketch background is gone, as is the old template, and while this particular template is very sleek and simplistic, which is to my liking, it does indeed feel as though I'm somewhat "imitating" David over at Eurogenes or Dienekes' blog.

While I do like the simplicity of this current theme; I will try and work on giving this blog back a bit of its own flavor/identity perhaps with a custom made banner and a logo or even a new background and paid for theme at some point. I'll see. 

You'll also notice new features like the new "followers"/members feature (go ahead and follow... I have noticed that I do have a consistent following via page views) and the blogroll, where I'm recommending a handful of genome blogs I really approve of. Granted, this doesn't mean I agree with the authors of those blogs on everything (for example, I don't at all agree with Dienekes' belief that Anatolia is where the Indo-European languages originated), but they've done stellar work in many regards and this is me expressing some respect.

I also highly recommend Anthrogenica, which is about the only anthro/genome forum I've come across not teeming with mentally unstable people, bigots and trolls; it's a very good place to discuss population genetics, anthropology and linguistics without having to come across any unsightly characters. Many of the members there are also immensely knowledgeable, with a few even having studied (or currently studying) things like Medical Genetics or Linguistics, so you can learn quite a bit.

As for actual posting well... I've got many things cooking up. For those of you who were intrigued by the whole Dobon et al. debacle well... I'm actually still working on getting ADMIXTURE results and even more PCAs for those Sudanese samples. It'll take quite a bit of time to get all this done on my own but if you're truly still interested in those samples; stay tuned. 

On the other hand, I've also got my hands on the GEDmatch kit no. of a Sudanese Arab bloke and am working on getting the raw genotype data of a Nubian woman (neither is involved with Dobon et al.). I've found through the Sudanese Arab so far that much of his results seem to somewhat back up what we've learned via Dobon et al.'s samples, and I'll touch on both his various ADMIXTURE results and the Nubian woman's results in due time. [note]

A bit of a tease of the results to come. The Tigrinya is an Eritrean friend of mine and somewhat of an old "mentor" of sorts to me, I'm Awale/ a rather typical ethnic Somali and that's of course the Sudanese Arab I'm in contact with. 

The above ADMIXTURE results were acquired via GEDmatch (Eurogenes K36 calculator) and are in some cases a bit prone to the infamous Calculator Effect, like with how we all show "East African" (in this run that's a component that peaks in Maasais), so the above results are not at all 100% accurate. I'll touch upon what these results and various others (including those of the Nubian woman) mean in due time; this is, again, just a tease.

As for other posts, some may have noticed that my very first blog post about "Race" is also gone (I haven't deleted it and it's archived elsewhere as well). I'm essentially going to come out with a much more comprehensive, accurate and perhaps even more hulking post to replace it, and once that's out I'll consider still making a link to the old post available.

A good Anthrogenica post I'll be sharing in my re-modelling of my first ever blog post

I'll also be coming out with various other posts on completely different subject matters from the ones on Horner & Sudanese genetics or the re-modelling of my very first blog post, so stay tuned. I've been a little inactive lately and just wanted to briefly touch upon some things to come and some of this blog's new features.


1. For those curious; the Haplogroups of the three Northeast Africans would be Y-DNA E-V32 & mtDNA N1a for me/Awale, Y-DNA E-V32 & mtDNA T for my Tigrinya compatriot and Y-DNA J1e & mtDNA L2c for the Sudanese Arab chap. Our Haplogroup results and raw genotype data were made available to us via 23andme.