EgyptSearch Forums: New African pop. study, Scheinfeldta, Soib, Tishkoff

...

Author

Topic: New African pop. study, Scheinfeldta, Soib, Tishkoff

the lioness,
Member
Member # 17353

posted

Working toward a synthesis of archaeological, linguistic, and genetic data for inferring African population history

1. Laura B. Scheinfeldt (a),
2. Sameer Soi (b), and
3. Sarah A. Tishkoff (a,c,1)

Author Affiliations

a. Department of Genetics,
b. Genomics and Computational Biology Graduate Group, and
c. Department of Biology, University of Pennsylvania, Philadelphia, PA 19104

Abstract

Although Africa is the origin of modern humans, the pattern and distribution of genetic variation and correlations with cultural and linguistic diversity in Africa have been understudied. Recent advances in genomic technology, however, have led to genomewide studies of African samples. In this article, we discuss genetic variation in African populations contextualized with what is known about archaeological and linguistic variation. What emerges from this review is the importance of using independent lines of evidence in the interpretation of genetic and genomic data in the reconstruction of past population histories.

* genetic variation
* human evolution
* mitochondrial DNA

Disentangling past population histories is a formidably complicated task that benefits from the synthesis of archaeological, linguistic, and genetic data. Archaeology permits insights into ancient technology and culture and provides a timetable for the emergence of innovations. Historical linguistic data complement the archaeological record by contributing an independent phylogenetic analysis of language relationships and providing clues about ancient population migration and admixture events. Similarly, genetic data provide an independent data source to understand the biological relationships among modern peoples and likely points of origin and expansion of their ancestors. Undoubtedly, the specific details of human demographic history are more complex than any synthesis can account for, but we are focusing here on the overlap among the archaeological, linguistic, and genetic data collected in Africa to make inferences about African demographic history.
African Language Family Classification

Africa is home to almost a third of all modern languages, encompassing >2,000 ethno-linguistic groups (1) that have largely been classified into four language families: Niger-Kordofanian, Afroasiatic, Nilo-Saharan, and Khoesan. As displayed in Fig. 1, Niger-Kordofanian languages are spoken throughout western Africa, eastern Africa, central Africa, and southern Africa and include the common Bantu languages. The Afroasiatic language family includes languages spoken in northern, central, and eastern Africa such as Cushitic, Chadic, Semitic, and ancient Egyptian. The Nilo-Saharan language family is spoken predominantly in central and eastern Africa and includes the Sudanic and Nilotic languages. The Khoesan language family, which includes languages that contain click consonants and is spoken by hunter–gatherer populations in eastern (Hadza and Sandawe) and southern Africa [the San, referred to here as “southern African Khoesan” (“SAK”)], is the most contentious of the African language families because there is so much divergence among the Hadza, Sandawe, and SAK languages (2, 3).

Fig. 1.

Map of Africa colored by the language family spoken in each region (adapted from ref. 29). The Afroasiatic language family is shown in purple, the Nilo-Saharan language family is shown in pink, the Khoesan language family is shown in blue, and the Niger-Kordofanian language family is shown in yellow.
Modern Human Origins and Migration out of Africa

The earliest emergence of anatomically modern humans in the fossil record occurred in eastern Africa 200–150 thousand years ago (kya) (4–7). Although the earliest dated modern humans outside of Africa were identified in the Middle East ~90 kya (5, 8–11), there was no continuous occupation of regions outside of Africa until ~60–40 kya; modern human remains are documented in Papua New Guinea 60–40 kya (12), southwest Asia ~35 kya, Europe ~40 kya, and mainland Asia ~35 kya (5). Therefore, over half of modern human history took place within Africa exclusively, and understanding patterns of variation within Africa is critical for the elucidation of modern human demographic history.

Genetic data from extant modern humans complement the fossil record in the reconstruction of modern human origins. The uniparentally inherited mitochondrial DNA (mtDNA) and nonrecombinant portion of the Y chromosome (NRY) are two loci that have been extensively studied in human populations, in part because they represent the maternal and paternal population histories, respectively, in a population sample and in part because they do not undergo recombination and, therefore, lineages can be more easily traced back to a single common ancestor. Unfortunately, the mtDNA and NRY loci are single loci, which are susceptible to the effects of natural selection and genetic drift because they have smaller effective population sizes relative to the autosomes and because any selective pressure will impact the entire locus. Thus, combined mtDNA, NRY, and autosomal data are necessary for a thorough understanding of any population history.

MtDNA, NRY, and autosomal DNA studies demonstrate that the highest levels of genetic variation are present in African samples relative to non-Africans, consistent with a model of African ancestry for all modern humans (e.g., refs. 1, 13–19). Further, phylogenetic analysis of mtDNA and NRY variation reveals that the deepest phylogenetic clades are found exclusively in African samples and all non-African lineages derive from a subset of these African lineages (15, 16, 20–24). Consistent with the archaeological record, estimates of the time to the most recent ancestor (TMRCA) for the mtDNA lineages give an age range of ~200–100 kya (23–26) and similar results have been published for NRY lineages, ~200–65 kya (26–28). Therefore, the genetic data corroborate a model in which modern humans arose in Africa 200–100 kya and subsequently, one or more populations split off and migrated out of Africa. The migration out of Africa was accompanied by a population bottleneck, which resulted in a reduction in genetic diversity in non-African populations relative to Africans (29).
Middle Stone Age in Africa

The Middle Stone Age, which took place ~250–40 kya (30), is a period in the archaeological record that indicates a significant change in culture and subsistence technology in Africa. Several sites in eastern, central, and southern Africa contain artifacts consistent with a shift in technology and population expansion ~75–55 kya, including hunting weapons, indications of increased plant utilization, signs of increased marine exploitation, and evidence of large-scale movement of red ochre (used for art), stone, and shell ornaments (30–33). It is tempting to speculate that these developments are tied to improvements in human communication; however, the reconstruction of proto-languages does not extend back this far in time; therefore, there is no empirical way to establish when or where human language emerged. Interestingly, an analysis of mtDNA data estimates a population expansion in Africa 70 kya (34), consistent with the archaeological evidence from the late Middle Stone Age. Furthermore, we would not expect to see the same signal of expansion in non-African populations given that the extreme bottleneck associated with the migration out of Africa most likely obscures more ancient demographic signals.
Neolithic in Africa

The Neolithic period, beginning ~10 kya, included the development of agriculture and animal domestication in Africa, with concomitant changes in population demographics due to population growth and migration to new regions. Below we discuss several such movements including the spread of agriculture, the spread of pastoralism, and the dispersal of affiliated language groups and genetic lineages. It is important to note, however, that these associations among linguistic, archaeological, and genetic data are not presented here to paint a simple picture of migration or replacement, but rather to illustrate that large-scale movements of technology and culture have resulted in detectable amounts of gene flow among the involved peoples and that the interpretation of extant genetic patterns benefits from an understanding of the combined data.
Neolithic in Northern Africa.

Approximately 14 kya, climatic changes associated with the end of the Last Glacial Maximum resulted in regions around the world becoming more favorable to human exploitation. Northern Africa is one such region, and ~13 kya, novel technologies (“Natufian”) thought to be the immediate precursor to agricultural technologies emerged and were associated with semisedentary subsistence and population expansions in northeastern Africa (35). Moreover, before the emergence of the Natufian styled artifacts, the archaeological record includes two artifact styles, the “Geometric Kebaran” and the “Mushabian” associated with Middle Eastern and Northern African populations, respectively (35). The archaeological evidence suggests the peoples using these assemblages interacted for well over 1,000 years, and linguistic evidence suggests that the peoples using these assemblages may have spoken some form of proto-Afroasiatic (35, 36). Although the origins of the Afroasiatic language family remain contentious, linguistic data generally support a model in which the Afroasiatic language family arose in Northern Africa >10 kya (36). Moreover, analyses of the Cushitic branch of the Afroasiatic language family suggest that proto-Cushitic arose and diversified at least 7 kya, and this likely took place in Ethiopia (37).

Intriguingly, the origin and diversification of proto-Afroasiatic is consistent with the spread of intensive plant collection in the archaeological record, and some interpret this pattern to represent a model in which proto-Afroasiatic speakers developed the novel subsistence technology resulting in the expansion and spread of their Afroasiatic descendants in the region (37). Some examples of the relevant linguistic data include reconstructed Chadic root words for “porridge” and “sorghum” and the Cushitic root words for “grain” and “wheat” (37). Because these and other root words are present in many of the Chadic and Cushitic languages, it is assumed that they were present in the proto-Chadic and proto-Cushitic languages and therefore must be as old as those proto-languages (37).

The genetic data appear to be consistent with the archaeological and linguistic data indicative of extensive population interactions between North African and Middle Eastern populations. A recent NRY study explores the distribution of haplogroups in a sample of African, Middle Eastern, and European males (38). Whereas a subclade of haplogroup E (M35) appears to have arisen in eastern Africa over 20 kya and subsequently spread to the Middle East and Europe, haplogroup J (M267) appears to have arisen in the Middle East over 20 kya and subsequently spread into northern Africa (38). A recent study of genomewide autosomal microsatellite markers reports that Middle Eastern and African samples share the highest number of alleles that are also absent in other non-African samples, consistent with bidirectional gene flow (1). In addition, a recent study of domestic goat mtDNA and NRY variation reports similar findings as well as evidence of trade along the Strait of Gibraltar (39). The combined archaeological, linguistic, and genetic data, therefore, suggest bidirectional migration of peoples between northern Africa and the Levant for at least the past ~14 ky.
Neolithic in Sahel.

There is increasing archaeological, linguistic, and genetic evidence that the Sahel has been an important region for bidirectional migration between western and eastern Africa (1, 40–42). Linguistic evidence indicates population interactions for ~20–10 kya between the Nilo-Saharan and Afroasiatic speakers in this region (43). The combined linguistic and archaeological data support a model in which the Nilo-Saharan language family arose in eastern Sudan >10 kya and Nilo-Saharan speakers subsequently migrated westward to Lake Chad and southward into southern Sudan (1, 44). Linguistic data also suggest that ~7 kya, proto-Chadic Afroasiatic speakers migrated from the Sahara into the Lake Chad Basin (45). This possibility is supported by an analysis of NRY variation that finds that the pattern and distribution of haplogroup R (V88) are consistent with the emergence of proto-Chadic ~7 kya and subsequent expansion of this linguistic group into the Lake Chad Basin (46). Whereas the inferred migration route is not consistent between NRY and mtDNA analyses, perhaps due to sex-biased migration, studies of mtDNA corroborate a model in which Sahel is a corridor for bidirectional migration between eastern and western Africa and, on the basis of the distribution of haplogroup L3f3, the proto-Chadic speakers expanded from eastern Africa into the Lake Chad Basin (47, 48).
The Spread of Pastoralism.

Archaeological data suggest that the emergence of animal husbandry in northeastern Africa took place as early as ~11 kya (49). Archaeological studies in Nabta Playa (in Egypt’s Western Desert) reveal a spectrum of artifacts consistent with pastoralism and adaptation to the desert environment, including particular pottery styles (Khartoum tradition), evidence of well technology, and cattle burials (49, 50). By ~8 kya, evidence is present of imported (from the Middle East) sheep or goat remains in northeastern Africa (e.g., ref. 50). Some controversy persists in the archaeological community regarding whether cattle domestication was developed in northern Africa or imported from the Middle East; however, recent DNA analysis of extant indigenous African bovine taurine and zebu cattle (51) supports a model in which the earliest emergence of pastoralism involving taurine cattle took place in northeastern Africa and subsequently spread westward and southward (51). A recent analysis of NRY variation in 13 eastern and southern African population samples suggests that the spread of pastoralism from eastern Africa to southern Africa was accompanied by migration of pastoral peoples as well as pastoral technology as evidenced by the distribution of NRY haplogroup M293 (and the subclade E3b1f-M293) (22). Furthermore, the most likely source for this migration based on the samples included in Henn et al. (22) would have been the southern Nilotic speaking Datog (because the haplotype frequency and diversity of M293 is highest in the Datog) ~2 kya (22).

Ehret (52) inferred the history of pastoralism in Africa from a linguistic analysis of shared cognates. His findings support a relatively ancient emergence of pastoralism in northeastern Africa corresponding to Eastern Sudanic, Central Sudanic, and possibly Southern Cushitic speakers, followed by the subsequent spread of cattle keeping to western and southern Africa (52). The relatively ancient emergence of pastoralism in the archeological record is supported by the reconstruction of proto-Cushitic languages. For example, there are at least two words for cattle that are thought to be relatively old, one in Northern Cushitic and the other in Central Cushitic. In proto-Cushitic, the word “hlee,” which translates to “head of cattle,” is related to the Southern Cushitic (Mbugu) word “hline,” which translates to “heifer” (52), and so on. Furthermore, estimates of linguistic diversity of vocabulary related to cattle suggest that cattle keeping arose in northeastern Africa and subsequently spread to western and southern Africa (52).

Ehret (52) also argues that the spread of cattle milking was separate and more recent than the spread of cattle keeping. He discusses the assumption that the spread of cattle milking would require some discernible impact on the language used to discuss it (52). For example, the proto-Bantu word for milk is related to the proto-Bantu word for breast, but there are several root words for milk (many likely borrowed from other languages) among the Bantu languages. However, there is only one root word for milking (literally to squeeze). This observation is interpreted to support a model in which a Bantu population in Tanzania borrowed the word (possibly from the southern Cushitic speakers) representing milking as well as the actual technology related to cattle milking and subsequently spread the technology to other Bantu speaking populations (52).

The shift from food gathering to food producing inferred from African archaeological and linguistic data also resulted in a detectable genetic signal. This relationship between subsistence, culture, and biology due to gene/culture coevolution is one that has been of special interest in human genetics studies. Models of Darwinian (i.e., positive) selection are consistent with subsistence being an environmental factor that can have a profound effect on patterns of genetic variation, and the emergence of agriculture and pastoralism is tied to increased population densities and dietary changes. Thus, genetic variants that conveyed a selective advantage in this shift in diet from foraging to animal and plant products would have persisted and increased in frequency in agricultural and pastoralist communities.

Lactase persistence is one of the better studied examples of gene/culture coevolution (e.g., 53, 54). In most mammals, once an individual is weaned, it loses the ability to produce the enzyme lactase-phlorizin hydrolase (LPH), which is necessary to digest the sugar lactose present in milk without gastric distress (55). The majority of humans do not express this enzyme as adults (referred to as the “lactase nonpersistence” phenotype) (56). Several widespread mutations, however, result in the continued production of LPH into adulthood, a trait often referred to as lactase persistence (57). The distribution of the lactase persistence phenotype is intriguing given what is known about subsistence patterns worldwide (Fig. 2). Lactase persistence is present at high frequency in Northern European dairying and African pastoralist populations; at moderate frequency in southern European and Middle Eastern populations; and at low frequency in nonpastoral Asian, Pacific, American, and African populations (55). In Europeans, the most common mutation associated with lactase persistence is thought to be a regulatory mutation located upstream of the gene that encodes LPH (a T at position −13910), within intron 13 of the neighboring MCM6 gene (56, 58). Further, this mutation is located within a large linkage disequilibrium block that is thought to have arisen ~20–2 kya, consistent with recent positive selection related to the emergence of cattle domestication and milk consumption ~10 kya in the Middle East (59, 60).

Fig. 2.

Global map showing the frequency of the lactase persistence trait for populations reported in Ingram et al. (55) and citations therein. Lactase persistence is shaded in black.

In African populations, the lactase persistence phenotype is generally highest in pastoral populations (55–57, 61, 62). However, with the exception of the Fulani and Hausa populations (62), other African pastoralist populations do not have the T-13910 mutation associated with the lactase persistence trait (57, 61). Recent studies have identified at least three additional and independent mutations that are associated with lactase persistence in East African pastoralist populations: C-14010, which is most common in Kenya and Tanzania (57); G-13907, which is present at low to moderate frequency in northeast Africa (57, 61); and G-13915, which is most common in the Middle East (60) and northeastern Africa (57, 61) and may be associated with camel domestication in the Middle East ~6 kya (60). Tishkoff et al. (57) demonstrated that all three variants result in significant increases in gene expression levels driven by the lactase promoter.

The most common variant within Africa associated with lactase persistence (C-14010) is also located within an extremely large linkage disequilibrium block (2 Mb) and is thought to have arisen ~6.8–2.7 kya in either the agropastoralist Afroasiatic populations that migrated into Kenya and Tanzania from Ethiopia within the past 5,000 years or the Nilo-Saharan pastoralist populations that migrated into the region from southern Sudan within the past 3,000 years, and the variant then subsequently spread throughout pastoral populations in eastern Africa relatively rapidly, consistent with the spread of pastoralism into sub-Saharan Africa ~4.5 kya (57). The estimates of the selection coefficients of the African mutations (0.035–0.097) are among the highest reported for modern humans, and intuitively this makes sense given not only the increased nutritional value of drinking milk as an adult but also the increased source of water in regions such as the Sahara where dehydration and diarrhea can cause death.
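To give a feel for what selection coefficients of this size imply, here is a minimal Python sketch. It is not the method used in ref. 57. It assumes a simple deterministic model of genic selection in which the odds of the favored allele are multiplied by (1 + s) each generation. The function names, the starting and ending frequencies, and the model itself are illustrative assumptions and do not come from the article.

```python
import math

def next_frequency(p, s):
    """One generation of genic selection (assumed model): the favored
    allele has relative fitness 1 + s, the alternative allele 1."""
    return p * (1 + s) / (1 + s * p)

def generations_to_reach(p_start, p_end, s):
    """Generations needed to move from p_start to p_end.

    Under the model above the odds p / (1 - p) grow by a factor of
    (1 + s) per generation, so the answer has a closed form.
    """
    odds_start = p_start / (1 - p_start)
    odds_end = p_end / (1 - p_end)
    return math.log(odds_end / odds_start) / math.log(1 + s)

# The two ends of the range quoted in the text (0.035 and 0.097);
# the 1% -> 50% frequency change is a hypothetical example.
for s in (0.035, 0.097):
    t = generations_to_reach(0.01, 0.5, s)
    print(f"s = {s}: about {t:.0f} generations")
```

Under these assumptions the larger coefficient gets the allele to the same frequency in well under half the number of generations. Real trajectories also depend on dominance, drift, and demography, which this sketch ignores.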
Bantu Expansion.

In Sub-Saharan Africa, the long-range exchange networks of Neolithic technology and associated spread of Bantu languages (which we refer to here as the “Bantu expansion” for the sake of simplicity) have had a major influence on biological and cultural diversity in sub-Saharan Africa. On the basis of archaeological and linguistic data, the Bantu languages and associated agricultural and iron age technologies are thought to have originated in Nigeria or Cameroon (63) ~5,000 years ago (64, 65) and spread relatively rapidly across sub-Saharan Africa. The extent to which this was associated with the migration of populations vs. a diffusion of language and technology among populations has been debated.

The linguistic classification of the ~600 Bantu languages is interpreted to represent several dispersals throughout sub-Saharan Africa (e.g., ref. 66). Ehret (67) argues that proto-Bantu diverged into several daughter clades, all but one of which are spoken only in the northwestern region of the Bantu-speaking areas (i.e., western central Africa), and the other of which was a forest Savanna Bantu clade. Ehret (67) goes on to argue that the forest Savanna Bantu clade diverged into several daughter clades, including the Savanna Bantu clade, and this diversification is linked to the spread of Bantu languages into central and southern Africa. The Savanna Bantu clade includes most of the contemporary languages spoken in eastern Africa, southeastern Africa, southwestern Africa, and the southern Savanna belt. This reconstruction supports a model in which proto-Bantu emerged in western central Africa ~5,000 years ago and diversified and spread across the rainforest for ~2,000 years before the first archaeological evidence of eastern Bantu speakers in the Great Lakes region (67).

Archaeological evidence related to the Bantu expansion largely focuses on the distribution of Early and Late Iron Age sites in Africa. Phillipson argues that the Eastern Bantu languages likely arose in western central Africa around the time of the emergence of Early Iron Age artifacts consistent with cattle keeping, but that the spread of Eastern Bantu languages is associated with the distribution of “later Iron Age” sites in central and southern Africa (68).

There is also a genetic signature of past population movements thought to be associated with the Bantu expansion. The large majority of genetic analyses have focused on mtDNA and NRY data. Overall, both datasets tie particular mtDNA (e.g., L0a, L2a, L3b, and L3e) (25, 69–73) and NRY [e.g., E3a (M2/M180), E2 (M75), and B2a (M150)] (22, 65, 72) lineages to the Bantu expansion, because they are found in the highest frequencies in extant Bantu-speaking populations. Interestingly, comparative studies of mtDNA and NRY variation suggest different maternal and paternal population histories related to the Bantu expansion (71, 72). Specifically, NRY variation in regions affected by the Bantu expansion is low relative to mtDNA variation and consists almost exclusively of haplogroup lineages associated with the Bantu expansion (71). Conversely, the mtDNA haplogroup lineages in the same samples include lineages associated with the Bantu expansion as well as lineages that are thought to have been present in the region before the Bantu expansion (21). This discrepancy is largely attributed to sex-biased migration and gene flow due to the practice of patrilocality and/or polygyny (71, 74), both of which are common in present-day Bantu-speaking populations. Moreover, this pattern of sex-biased gene flow is documented independently in other regions of the world such as the Pacific Islands (75, 76). Both loci, however, are more susceptible to genetic drift than autosomal loci because of their relatively smaller effective population sizes; therefore, some of the differential male/female patterns may be attributed to chance. A recent analysis of genomewide autosomal data is consistent with a large genetic impact of the Bantu expansion on most of sub-Saharan Africa, as evidenced by the presence of Niger-Kordofanian ancestry in many central, eastern, and southern African populations (1). In addition, Tishkoff et al. (1) documented evidence from their analysis of genomewide autosomal loci of a distinct Bantu migration from eastern to southern Africa, which is consistent with the archaeological and linguistic evidence of dispersal of Bantu technology and languages from the Great Lakes region of East Africa (67).
Contemporary African Genetic and Linguistic Variation

Scholars have studied language relationships within a cladistic framework since at least the early 19th century (77), and given the parallels in linguistic and genetic change over time, it is not unreasonable to use linguistic affiliations as a way of grouping individuals for genetic study. Several studies have demonstrated a correlation between linguistic and genetic variation, including cases in Europe (78, 79), Asia (80), the Pacific (75, 76, 81, 82), and the Americas (83–86). The main difficulty in these studies lies in the interpretation of linguistic similarities among populations. Whereas language sharing obviously results from some degree of contact among peoples, the horizontal transmission of language can occur with little to no genetic exchange. Likewise, there can be genetic exchange with little or no linguistic exchange. Therefore, the degree of correlation between genetic and linguistic variation varies depending on the populations being studied.

Studies of genetic variation within Africa, as mentioned above, have found extensive amounts of genetic variation relative to non-Africans owing to the fact that the “out of Africa” bottleneck significantly reduced genetic variation in non-Africans; however, most genetic studies of African populations are limited by the number of population samples included. More recent work has improved the understanding of genetic variation in Africa with a survey of genomewide genetic variation in geographically and ethnically diverse African samples (1). Tishkoff et al. (1) analyzed 1,327 genomewide autosomal microsatellite and insertion/deletion polymorphisms in 121 African population samples and a comparative sample of 1,394 non-Africans. The authors (1) studied population structure and relationships using the program STRUCTURE (87), among other phylogenetic analyses. The STRUCTURE program uses a model-based Bayesian clustering approach to identify genetic subpopulations and assign individuals probabilistically to these subpopulations on the basis of their genotypes, while simultaneously estimating ancestral population allele frequencies. The program STRUCTURE places individuals into K clusters, where K is chosen in advance and is varied across independent runs, and individuals can have membership in multiple clusters (87). Tishkoff et al. (1) inferred 14 ancestral population clusters globally as well as within Africa and found that the African samples cluster geographically as well as linguistically and ethnically (Table 1). In addition to the STRUCTURE analysis, the authors (1) constructed a neighbor-joining tree on the basis of pairwise population genetic distances that showed that the African samples clustered primarily by geographic region and to a lesser extent by linguistic affiliation with a few notable exceptions. The pygmies from central Africa, for example, clustered near the southern African San.
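The probabilistic assignment that STRUCTURE performs can be illustrated with a much simpler calculation. The Python sketch below is not STRUCTURE itself and not the admixture model used in ref. 1. It assumes the cluster allele frequencies are already known and that each individual belongs to exactly one of K clusters (a no-admixture simplification), then applies Bayes' rule to a single diploid genotype. The function name, array layout, and toy frequencies are all hypothetical.

```python
import numpy as np

def cluster_posterior(genotype, allele_freqs, prior=None):
    """Posterior probability that one individual belongs to each cluster.

    genotype     : list over loci of (allele_index, allele_index)
    allele_freqs : array of shape (K, n_loci, n_alleles), assumed known
    prior        : optional length-K prior; uniform if omitted
    """
    freqs = np.asarray(allele_freqs, dtype=float)
    k = freqs.shape[0]
    if prior is None:
        prior = np.full(k, 1.0 / k)
    log_post = np.log(np.asarray(prior, dtype=float))
    for locus, alleles in enumerate(genotype):
        for allele in alleles:
            # Multiply in the frequency of each observed allele copy.
            # Heterozygote factors of 2 are the same for every cluster
            # and cancel when normalizing, so they are left out.
            log_post = log_post + np.log(freqs[:, locus, allele])
    log_post -= log_post.max()      # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: K = 2 clusters, 2 loci, 2 alleles per locus.
toy_freqs = [
    [[0.9, 0.1], [0.8, 0.2]],   # cluster 0
    [[0.2, 0.8], [0.3, 0.7]],   # cluster 1
]
toy_genotype = [(0, 0), (0, 1)]
post = cluster_posterior(toy_genotype, toy_freqs)
print(post)
```

STRUCTURE additionally estimates the allele frequencies themselves and, in its admixture model, lets one individual draw ancestry from several clusters. Both steps are omitted here.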

Table 1.

Inferred population clusters using the STRUCTURE analysis of autosomal microsatellite and insertion/deletion polymorphism data from global populations adapted from ref. 1

Several studies have looked at the relationship between genetic and linguistic variation in African samples (1, 21, 22, 40, 88–90). For example, an NRY study of Nilo-Saharan, Niger-Congo, and Afroasiatic speakers in Sudan revealed a strong correlation (Mantel test: r = 0.31, P = 0.007) between linguistic and NRY variation (40), and in this case the correlation between linguistic and genetic variation was stronger than the correlation between geographic and genetic distances (Mantel test: r = 0.29, P = 0.025). Similarly, a study of mtDNA and NRY variation in 40 African samples representing all four language families reports a significant correlation between genetic and linguistic distances (Mantel of NRY, r = 0.32, P = 0.001; Mantel of mtDNA, r = 0.23, P = 0.016) (71).

The single-locus studies of genetic and linguistic correlation are consistent with the regression analysis reported by Tishkoff et al. (1) that documents significant correlations between linguistic and genetic distances within the Niger-Kordofanian and Nilo-Saharan language families after correction for geographic distances. To further explore the relationship among genetic and linguistic variation in Africa, we used the published dataset of genomewide data from Tishkoff et al. (1) that includes 103 population samples (n ≥ 10) that speak languages representing all four African language families. We first performed a Mantel test to determine to what extent genetic and linguistic distances are correlated within language families. Not surprisingly, all three tests showed that linguistic and genetic distances were significantly correlated (with 100,000 permutations): Niger-Kordofanian, r = 0.32, P = 9.99 × 10^−6; Nilo-Saharan, r = 0.29, P = 9.99 × 10^−6; and Afroasiatic, r = 0.27, P = 9.99 × 10^−6 (the linguistic relationships among the Khoesan speakers are not clearly understood and therefore did not permit the construction of a linguistic distance matrix needed to perform a Mantel test); and the correlation coefficient is >25% in all three tests.
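For readers unfamiliar with the Mantel test, a minimal permutation version can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' software, which the article does not name. The function name, the one-sided P value convention, and the toy matrices are hypothetical. Under the convention used here (counting the observed arrangement as one permutation), the smallest P value attainable with 100,000 permutations is 1/100,001, roughly 9.99 × 10^−6, which may be why the three reported P values are identical.

```python
import numpy as np

def mantel_test(dist_a, dist_b, n_perm=10_000, seed=0):
    """Permutation Mantel test for two square, symmetric distance matrices.

    Returns (r, p): r is the Pearson correlation between the upper-triangle
    entries, p is a one-sided permutation P value.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)            # use each population pair once
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Relabel populations: permute rows and columns together.
        a_perm = a[np.ix_(perm, perm)]
        if np.corrcoef(a_perm[iu], b[iu])[0, 1] >= r_obs:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    p = (hits + 1) / (n_perm + 1)
    return r_obs, p

# Toy data standing in for genetic and linguistic distance matrices.
rng = np.random.default_rng(1)
coords = rng.normal(size=(12, 2))
genetic = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
noise = rng.normal(scale=0.1, size=genetic.shape)
linguistic = np.abs(genetic + (noise + noise.T) / 2)
np.fill_diagonal(linguistic, 0.0)

r, p = mantel_test(genetic, linguistic, n_perm=999)
print(f"r = {r:.2f}, P = {p:.3f}")
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations. Shuffling population labels keeps each matrix internally consistent while breaking any association between the two.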

Because we and others (1) have established a significant correlation between linguistic affiliation and genetic variation within three of the African language families, we wanted to explore to what degree samples plotted by genetic distance cluster by language family. We used multidimensional scaling (MDS) to construct a two-dimensional plot of a pairwise genetic distance matrix taken from the above-mentioned 103 population samples (1). Consistent with the mtDNA and NRY studies discussed above (40, 71), our genomewide analysis of microsatellite data shows that populations generally cluster on the basis of both geographic region and linguistic classification. Fig. 3 demonstrates that populations generally separate by linguistic affiliation along dimension 1. Dimension 2 separates the SAK speakers from all other Africans including the eastern Khoesan speakers, the Hadza and Sandawe, that cluster closely with other eastern Africans. Another interesting pattern that emerges in the MDS plot that is consistent with previous work (1) is the clustering of the Afroasiatic Chadic speakers with the Nilo-Saharan speakers, which may reflect a past language shift (1).
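The idea behind an MDS plot can be sketched with classical (Torgerson) MDS in Python. The article does not say which MDS variant or software was used, so this is an assumed stand-in. The function name and toy data are hypothetical. The input is any symmetric matrix of pairwise distances and the output is one coordinate pair per population.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical MDS: embed points so that Euclidean distances between
    them approximate the entries of the distance matrix `dist`."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering   # double centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]       # largest eigenvalues
    top_vals = np.clip(eigvals[order], 0.0, None)    # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy example: six "populations" whose true positions are two-dimensional.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
toy_dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

embedding = classical_mds(toy_dist, n_dims=2)
print(embedding)   # column 0 is dimension 1, column 1 is dimension 2
```

The recovered coordinates match the original positions only up to rotation and reflection, which is why the axes of an MDS plot carry no intrinsic meaning beyond the ordering of points along them.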

Fig. 3.

Multidimensional scaling (MDS) analysis of autosomal microsatellite data from Tishkoff et al. (1). A pairwise genetic distance matrix using (δμ)² (as described in ref. 1) was constructed for populations with a sample size of n ≥ 10 and used for MDS analysis. Populations are colored on the basis of linguistic affiliation. The Afroasiatic speakers are shown in purple, the Nilo-Saharan speakers are shown in pink, the Khoesan speakers are shown in blue, and the Niger-Kordofanian speakers are shown in yellow. The x axis represents dimension 1 and the y axis represents dimension 2.
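
The distance named in the caption can be illustrated with a short sketch. It assumes the usual definition of (δμ)² for microsatellites, namely the squared difference in mean repeat count between two populations, averaged over loci; ref. 1 should be consulted for the exact variant used. The function names and the toy repeat counts below are hypothetical.

```python
import numpy as np


def delta_mu_squared(pop_a, pop_b):
    """(delta mu)^2 between two populations.

    Each input has shape (chromosomes, loci) and holds microsatellite repeat counts.
    """
    mean_a = np.asarray(pop_a, dtype=float).mean(axis=0)  # per-locus mean, population A
    mean_b = np.asarray(pop_b, dtype=float).mean(axis=0)  # per-locus mean, population B
    return float(np.mean((mean_a - mean_b) ** 2))         # average over loci


def pairwise_delta_mu_squared(populations):
    """Return (names, matrix) for a dict mapping population name to repeat counts."""
    names = list(populations)
    n = len(names)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = delta_mu_squared(populations[names[i]], populations[names[j]])
            matrix[i, j] = matrix[j, i] = d
    return names, matrix


# Toy repeat counts at two loci (hypothetical data).
toy_pops = {
    "X": [[10, 12], [12, 14]],  # per-locus means 11 and 13
    "Y": [[14, 13], [16, 13]],  # per-locus means 15 and 13
    "Z": [[11, 13], [11, 13]],  # per-locus means 11 and 13
}
names, dm = pairwise_delta_mu_squared(toy_pops)
print(names)
print(dm)
```

A matrix built this way is the kind of input that a Mantel test or an MDS analysis takes.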

Because the distribution of language families in Africa roughly follows a geographic distribution (Fig. 1), we also performed MDS within geographic regions that include at least three language families. In central Africa (Fig. S1), the samples cluster by language family with a few notable exceptions. For example, the Fulani, who are nomadic pastoralists that speak a Niger-Kordofanian language and reside across central and western Africa, do not cluster with other Niger-Kordofanian-speaking populations. Moreover, the Fulani are distinguished from other African samples at K = 14 in Tishkoff et al.’s (1) STRUCTURE analysis. Morphological analyses of the Fulani have been interpreted to suggest a Middle Eastern origin (91), and there has been some speculation based on linguistic data that the Fulani migrated to central Africa from northern Africa or the Middle East (91). In addition, there is evidence of shared recent ancestry among the Fulani and European/Middle Eastern samples from studies of mtDNA (92), NRY (40), and autosomal microsatellites (1) and from the presence in this population of the mutation associated with lactose tolerance in Europeans (T-13910) (62).

Whereas previous work on mtDNA (92) is consistent with a West African origin for the Fulani (consistent with other Bantu speakers), the NRY data reveal that the Fulani share recent ancestry with Nilo-Saharan and Afroasiatic speaking populations (40). As in other cases where the maternal and paternal patterns of population history are not in agreement, this result could reflect differential patterns of Fulani male and female migration and gene flow, the influence of genetic drift, or some combination of the two. A more recent analysis of genomewide autosomal data shows that the Fulani cluster most closely with the Chadic- and Central Sudanic-speaking populations (1). This result is consistent with our MDS analysis, in which both Fulani samples cluster most closely with the Chadic- and Central Sudanic-speaking populations, as well as with the Baggara (Semitic). The clustering of the Baggara near the Fulani is also consistent with Tishkoff et al. (1), who report that the Baggara share ancestry with the Fulani and with the Chadic speakers.

To a lesser extent, the Hausa from Nigeria and Cameroon cluster more closely with the Niger-Kordofanian speakers along dimension 2 (Fig. S1). This result is consistent with previous genetic analysis (1) and with linguistic analysis of the Hausa that suggests extensive interaction between the Hausa (who speak an Afroasiatic Chadic language) and Niger-Kordofanian speakers as evidenced by an analysis of loanwords (93).

In eastern Africa (Fig. S2), dimension 1 separates the Afroasiatic and Niger-Kordofanian samples, and dimension 2 separates the Nilo-Saharan samples. As in Fig. 3, the Hadza and Sandawe do not separate from the eastern African samples along either dimension to any large extent, although they do cluster close to each other (Fig. S2), and this pattern is consistent with extensive regional gene flow with neighboring populations. The other noteworthy pattern in this plot involves the Luo (Fig. S2), who speak a Western Nilotic language but cluster separately from other Nilo-Saharan speakers along dimension 1, together with Bantu-speaking populations. This clustering is consistent with previous findings that the Luo show predominantly Bantu ancestry (1) and may reflect high levels of admixture among the Luo and geographically nearby Bantu populations (94).
History of Hunter–Gatherer Populations.

As mentioned previously, the classification of languages within the Khoesan language family is contentious given the high diversity within each subclade and extreme divergence among them (95, 96), particularly for the Sandawe and Hadza. A common classification, therefore, groups the three languages spoken in South Africa into a separate branch (SAK) from the more divergent Sandawe and Hadza (97). One interpretation of this extreme linguistic diversity is that the last common ancestor of the language family must be extremely ancient, and Ehret (95) estimates the TMRCA to be at least 20 kya (which approaches the limit in timescale to linguistic reconstruction). The Sandawe and SAK are more similar to each other linguistically than either one is to the Hadza. Geographically, however, the Sandawe and Hadza are extremely close to each other (150 km apart in Tanzania), and both are geographically distant from the SAK populations residing in southern Africa.

A recent study of mtDNA and NRY variation investigates the genetic relationship among the Hadza, Sandawe, and SAK (21). The authors find that in general, the Hadza and Sandawe are more genetically similar to each other than either one is to the SAK. However, the Sandawe and SAK share ancient mtDNA lineages, which may suggest an ancient common ancestry. For example, mtDNA haplogroup L0d is present at high frequency in the SAK and at low frequency in the Sandawe, but is not present in the Hadza samples (21), and the TMRCA estimate of the SAK and Sandawe L0d lineages is ancient (~60 kya) (21). Similarly, the SAK and Sandawe share NRY haplogroup A (M91), which is not present in the Hadza samples (21). On the other hand, haplogroup L4g is common in both the Sandawe and the Hadza and absent from the SAK samples, and the TMRCA for the Sandawe and Hadza L4g is more recent (~25 kya) (21). In addition, all three samples share NRY haplogroup B2b (M112) (21). The authors (21) discuss more than one interpretation of these results. The absence of mtDNA haplogroup L0d and NRY haplogroup A (M91) from the Hadza could reflect loss due to genetic drift because there is evidence of a recent bottleneck in the Hadza (98). Alternatively, the pattern of haplogroup variation could reflect an ancient linguistic and genetic divergence of the Hadza from the SAK. Moreover, the authors (21) performed a likelihood analysis to estimate the time of divergence among the populations and found that the divergence between the Hadza and the Sandawe was >20 kya and the divergence between the Hadza/Sandawe and the SAK was >40 kya. Additional studies of mtDNA and NRY variation have identified ancient shared lineages among the SAK and the Hadza as well as several other eastern African populations (28, 38, 99–101). Consistent with the mtDNA and NRY data, our MDS analysis shows that the Hadza and Sandawe cluster closely together with each other and with other eastern African populations (Fig. 3). Additionally, the Hadza are slightly farther from the SAK than the Sandawe along both dimensions (Fig. 3).

Tishkoff et al. (1) provide evidence for an ancient common ancestry of Khoesan and Pygmy populations, suggesting the possibility of a proto-Khoesan hunter–gatherer population in eastern Africa that diverged >30 kya. STRUCTURE analysis revealed that the pygmies cluster together with other hunter–gatherer samples, including the SAK, Hadza, and Sandawe at low K values (K = 3), and then differentiate at higher K values (K = 5) (Table 1). The analysis also shows that the Mbuti pygmies cluster with the SAK at higher K values (K = 7), which could be due to either common ancestry or more recent gene flow. In addition, recent work on mtDNA, NRY, and autosomal data estimated the TMRCA of the pygmy and agricultural populations to be approximately 70–60 kya and the TMRCA of western and eastern pygmies to be approximately 20 kya (73, 102, 103). The findings of Tishkoff et al. (1) raise the possibility that the pygmy populations, who have lost their indigenous language, once spoke some form of proto-Khoesan with click consonants. Interestingly, linguistic analysis of the SAK suggests that they originated in eastern Africa and possibly as far north as Ethiopia before migrating into southern Africa, consistent with the identification of rock art in the Sandawe homeland and in southern Africa that is thought to be related to Khoesan speakers (104). There is further evidence that, although there has not been recent gene flow among these populations, there has been recent admixture between the Sandawe and neighboring populations as well as between the pygmies and neighboring populations, and this recent admixture may be obscuring the more ancient relationships among the hunter–gatherer populations (1). Future analyses that incorporate data from across the genome together with full-likelihood or approximate Bayesian computation methods will be necessary to more fully understand these complex population histories.
Conclusions

We have presented here a synthesis of the archaeological, linguistic, and genetic data used to infer African population history. The general picture that emerges is that genetic variation in Africa is structured geographically and to a lesser extent linguistically. This is consistent with the fact that populations in close geographic proximity to each other as well as populations that speak linguistically similar languages are more likely to exchange genes. The pattern of genetic variation in Africa is also consistent with geographic barriers limiting gene flow, as exemplified by the geographic/genetic distinction between northern African and sub-Saharan African populations. When we focus, however, on particular exceptions to these broad patterns, we are able to more fully appreciate the complex population histories that have contributed to extant patterns of genetic variation. The development of sequencing and genotyping technologies is advancing at an unprecedented rate and is allowing for the genotyping of millions of single-nucleotide polymorphisms and the sequencing of millions of nucleotides across populations. These data, coupled with computational methods for inferring demographic parameters and testing demographic models (e.g., maximum likelihood and approximate Bayesian computation), are well powered to refine our understanding of past African population histories. The incorporation of archaeological and linguistic data will be important for establishing testable hypotheses and elucidating the evolutionary processes (or forces) that have shaped the genomic landscape in Africa.

Posts: 42922 | From: , | Registered: Jan 2010 | IP: Logged |
