EgyptSearch Forums: Post A Reply

...

EgyptSearch Forums » Egyptology » Genomic Ancestry of North Africans Supports Back-to-Africa Migrations Brenna M. Henn

[QUOTE]Originally posted by the lioness,: [QB]
Materials and Methods

Samples and Data Generation

A total of 152 individuals representing seven different North African locations and the Basque Country were included in the present study. Informed consent was obtained from all of them. Samples were genotyped on the Affymetrix 6.0 chip, and after quality control filtering for missing loci and close relatives, 125 North African individuals remained: 18 from North Morocco, 16 from South Morocco, 18 from Western Sahara, 19 from Algeria, 18 from Tunisia, 17 from Libya and 19 from Egypt. Further information on the samples may be found in Table S1. In addition, 20 individuals from the Spanish Basque Country were included in the analysis. Data are publicly available at: bhusers.upf.edu/dcomas/.

To study the population structure and the genetic influence of migrants in the region, a database was built including African and European populations from HapMap3 [43], western African populations [20], and 20 Qataris from the Arabian Peninsula [44] as Near Eastern representatives. Written informed consent was obtained from the participants and analyses were performed anonymously. The project obtained ethics approval from the Institutional Review Board of the institution involved in the sampling (Comitè Ètic d'Investigació Clínica - Institut Municipal d'Assistència Sanitària (CEIC-IMAS) in Barcelona, Spain).

Population Structure

An unsupervised clustering algorithm, ADMIXTURE [25], was run on our seven new North African populations, Spanish Basque, Near Eastern Qatari, western Africans, HapMap3 Kenyan Luhya, Maasai and Italian Tuscans. Nine values of k (k = 2 through 10) were tested successively. Log likelihoods for each value of k are available in Figure S1B. Fst based on allele frequencies was calculated in ADMIXTURE for each identified cluster at k = 8.
Given the high heterogeneity of the Qatari population, we present the individuals with the lowest sub-Saharan, European and North African ancestries and higher Near Eastern ancestry, based on ADMIXTURE. Multidimensional scaling (MDS) was applied to the pairwise identity-by-state (IBS) matrix of 279,528 SNPs using PLINK 1.07 software [45]. The top three MDS components were plotted together using R 2.11.1. Population divergence estimates from the cluster-based allele frequencies from ADMIXTURE (k = 5–8) were obtained using the equation given in [46]. The cluster-based allele frequencies will be less biased by recent migration between populations. Estimates of population divergence, though potentially older if migration is unaccounted for in the Fst estimate, are unlikely to be younger if the range of Ne sizes is realistic.

Phasing

In previous work, imputation accuracy was tested in a sample of Algerian Mozabites and other populations from the Human Genome Diversity Project (HGDP-CEPH) [37]. Among all the African populations, the Mozabites had the poorest imputation accuracy when the sub-Saharan Yoruban sample was used to predict allele states [37]. For this reason, we used multiple populations for phase inference. North African, Qatari and Basque genotypes were phased using BEAGLE 3.0 software [47]. Phased haplotypes from three HapMap3 populations (i.e. Maasai, Yoruba, and Tuscans) were used as seeds for haplotype inference; each HapMap3 population was randomly sub-sampled to 30 individuals in order to prevent over-representation of haplotypes from a single geographic region. The Basque, Qatari and all North African populations were phased with the same three seed populations to prevent discrepancies based solely on different haplotype seeds.

Inference of IBD

We estimated the amount of DNA shared identically by descent (IBD) using the GERMLINE software [27], with a 5 cM threshold to eliminate false positive IBD matches.
All segments of 5 cM or greater shared IBD between pairs of individuals were summed, and histograms were created for sharing within each North African population.

PCA-Based Local Ancestry Assignment

Local ancestry was assigned with a new PCA-based method, PCADMIX. This method uses phased genotype data (i.e., haplotypes) to determine exact posterior probabilities along each chromosome. PCADMIX relies on Principal Components Analysis (PCA) to quantify the information that each SNP contributes to distinguishing the ancestry of a genomic region. PCADMIX is publicly available at sites.google.com/site/PCADMIX.

We use Singular Value Decomposition in R to perform PCA on the phased genotypes of the ancestral representatives. We project admixed individuals onto the principal components, and compute the observed ancestry “score” for a haplotype i in the jth window as the weighted average L_j g_ij, where g_ij is a column vector of the haplotype's alleles (coded as 0 or 1) in window j, standardized by the mean and standard deviation of that SNP's frequency in the ancestral populations. L_j is a matrix for which the entry in the kth row, lth column is the loading of SNP l in the window on principal component k.

We use a forward-backward algorithm to identify the probability of ancestry at each window, conditional on the ancestry scores. For the forward-backward algorithm in our HMM, we used a haploid version of the transition and emission probabilities in the Viterbi algorithm of Bryc et al. [20]. The transition probability is defined by p, the probability of recombination between windows, and q_j, the frequency of the target population's chromosomes in the admixing ancestral pool.

First, SNPs in the ancestral populations are thinned so that the remaining SNPs have r² < 0.8, in order to remove highly linked alleles from different populations, which can lead to spurious ancestry transitions.
Second, chromosomes for each individual in a population are artificially strung together to create two haploid genomes for the individual; this step increases the amount of information used for PCA, and it is of special relevance given that Europeans, Near Easterners and North Africans are differentiated with an Fst of only ~0.05. Then, PCA on a number k≤3 of ancestral populations is performed and the admixed population is projected into the determined k≤3 PCA space. PC loadings are used as weights in a weighted average of the allele values in a window of 40 SNPs. These window scores are then used as observed values in an HMM to assign posterior probabilities to the ancestry in each window (where chromosomes were considered separately). Information on using PCADMIX in Egyptians is available in Figure S8. Additional performance testing and details of the implementation for this approach are available in [28], Texts S1, S2, S3 and Figure S9.

Estimates of Migration Parameters

We tabulated the length and number of genomic tracts (i.e. phased haplotypes) assigned to particular population ancestries for the South Moroccan and Egyptian population samples (see above for PCA-based local ancestry assignment). We used a posterior probability threshold of 0.8, optimized for concordance with ADMIXTURE ancestry proportions (Figure 5A). The maximum likelihood estimate of the time of migration is sensitive to the minimum detectable length of migrant tracts. That is, as migrant tracts recombine with non-migrants and become smaller in size, we are less likely to detect them. Histograms of the cumulative number of migrant tracts of different lengths, for all individuals, were visualized (Figure S10), and we observe a reduction in the number of short migrant tracts in the 0.5 to 1.5 cM bins, inconsistent with a constant or punctual migration model.
Rather, this reduction can be understood as a reduction in our ability to detect short migrant segments due to insufficient SNP density or haplotype variation that is not present in our source population. We therefore chose a 3 cM threshold as the minimal length of migrant tracts to be considered. Theoretically, under an isolation followed by migration model and with a 3 cM tract length threshold, we have power to detect relatively recent migrations occurring within the past generations [30]. We modify Pool and Nielsen [30] equation 10, with for the likelihood that a segment is of length Morgans given that it is longer than the cutoff length in a model with constant migration rate starting at time in a chromosome of length . Similarly, we estimated a likelihood of for punctuated migration occurring generations ago, which neglects chromosomal edge effects, an approximation justified by the fact that for a large majority of tracts.

Supporting Information

Figure S1. (A) ADMIXTURE results for k = 10 ancestral clusters in our North African populations, Spanish Basque, Near Eastern Qatari, western Africans, HapMap3 Kenyan Luhya and Maasai and Italian Tuscans. (B) Log likelihoods for each of the k clusters tested. (TIF)

Figure S2. We used multidimensional scaling (MDS) to discriminate clusters of genetic variation within Africa and neighboring regions. MDS was applied to the pairwise, individual identity-by-state (IBS) matrix of 279,500 SNPs using PLINK 1.07 software [45]. Component 3 versus component 4 (A) and component 1 versus component 2 versus component 3 (B) were plotted using R 2.11.1. Population colors match Figure S1A (k = 10). North African populations are all indicated in turquoise. (TIF)

Figure S3. Long runs of homozygosity compared across North African populations and neighbors. --homozyg --homozyg-window-kb 5000 --homozyg-window-het 1 --homozyg-window-missing 1 --homozyg-snp 25 --homozyg-kb 500 --homozyg-gap 100. (TIF)

Figure S4. Implementation of PCADMIX.
(A) A principal components analysis is first run for k = 3 ancestral populations. The proportion of Population A's ancestry in an admixed individual is estimated by a given haplotype's (black square) distance from the line connecting the means of PC1 and PC2 for the two other populations, as a proportion of the haplotype's distance from all edges. (B) Simulated ancestry assignment with and without LD filtering. The black arrow indicates a region of simulated European ancestry that is incorrectly classified (at a posterior probability calling threshold of 0.9) as African when no linkage disequilibrium (LD) filtering is used, and whose ancestry is left undecided when LD filtering is implemented (r² < 0.8). (TIF)

Figure S5. Comparison of ADMIXTURE and PCADMIX ancestry estimations in (A) South Moroccans and (B) Egyptians. In both cases PCADMIX was required to assign ancestry with a posterior probability of 0.95. The 0.95 threshold substantially reduces the proportion of the genome assigned by PCADMIX. In South Moroccans, the reduction in assigned ancestry occurs primarily in the European and to a lesser extent in the Berber component. For the Egyptians, the stricter threshold dramatically reduces the assigned Near Eastern (or Arabic) ancestry. (TIF)

Figure S6. (A) PCADMIX applied to a South Moroccan individual using Saharawi, Basque and Luhya as ancestral populations. Segments are assigned to ancestries with a posterior probability higher than 0.8. (B) PCADMIX applied to the same South Moroccan individual as in (A) using Tunisian, Basque and Luhya as the ancestral populations. Segments are assigned to ancestries with a posterior probability higher than 0.8. (TIF)

Figure S7. We capture admixture proportions by independently running LAMP [29] for estimating local ancestry using the Tunisian Berber, European Basque and sub-Saharan Luhya source populations. Sub-Saharan ancestry appears concordant with ADMIXTURE and PCADMIX.
Tracts of “Maghrebi” ancestry appear shorter than those inferred in PCADMIX, although this may be attributed to the use of the high-Maghrebi but low-diversity Tunisian Berbers. Results are shown for chromosome 1 (A) and the X chromosome (B). (TIF)

Figure S8. Shown is the admixture deconvolution for chromosome 1 using PCADMIX for 19 Egyptian individuals (n = 38). Initially we assigned ancestry for k = 3 ancestral populations (Maghreb: SAH, European: BAS, Sub-Saharan: MKK) using a 0.8 posterior probability threshold, shown in (A,B). Then we assumed a different set of 3 ancestral populations (Maghreb: SAH, European: BAS, Near Eastern: QAT), shown in (C,D). In the third step, we assumed that the sub-Saharan ancestry assigned in (A) represented truly divergent sub-Saharan haplotypes, given the high Fst between this ancestry and all others. (E) We layered these haplotypes on top of the (C) (Maghreb, European, Near Eastern) deconvoluted chromosomes. (TIF)

Figure S9. (A) We present the average assigned ancestry (>0.8 posterior probability) across chromosome 1 for each of 4 ancestries assigned in the Egyptians: Maghrebi (Saharawi), European (Basque), Near Eastern (Qatari), Sub-Saharan (Maasai). (TIF)

Figure S10. (A) Distribution of the number and length (in centimorgans) of migrant sub-Saharan (Luhya) tracts found in the South Moroccan population. (B) Distribution of the number and length (in centimorgans) of migrant sub-Saharan (Maasai) tracts found in the Egyptian population. The red bar indicates the minimum threshold cutoff employed in the migration parameter analysis. Please note the different scales along the X-axis. (TIF)

Table S1. Name, sample size and country of origin for populations newly genotyped in the present study as well as for populations published previously. References are included in the table. (DOC)

Table S2. Additional estimates of Fst after removing putative admixture events. (DOC)

Table S3.
Significance of the comparisons of ancestry assignment using PCADMIX and ADMIXTURE. (DOC)

Text S1. Assigning local ancestry with PCADMIX. (DOC)

Text S2. Concordance between ADMIXTURE and PCADMIX. (DOC)

Text S3. Chromosome 1 Ancestry Deviations. (DOC)

[IMG]http://picturestack.com/422/87/eOdPicture3Sl5.png[/IMG]
[IMG]http://picturestack.com/422/87/w1kPicture4j1D.png[/IMG]
[IMG]http://picturestack.com/422/87/doBPicture51kg.png[/IMG]
[IMG]http://picturestack.com/422/87/E8MPicture6KL6.png[/IMG]
[IMG]http://picturestack.com/422/87/1ZDPicture7XOg.png[/IMG]
[IMG]http://picturestack.com/422/87/fZcPicture84R3.png[/IMG]
[IMG]http://picturestack.com/422/87/hcNPicture9qEz.png[/IMG]
[/QB][/QUOTE]
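
For anyone trying to follow the "Inference of IBD" step in the quoted Methods, here is a minimal Python sketch of the bookkeeping it describes: keep only segments of at least 5 cM, sum them per pair of individuals, and bin the totals for a histogram. It is not the authors' code. The input layout (id, id, length in cM), the sample IDs, the function names and the 10 cM bin width are all assumptions made for illustration, and this is not GERMLINE's actual output format.

```python
from collections import defaultdict

MIN_CM = 5.0  # minimum segment length quoted in the Methods


def total_ibd_per_pair(segments, min_cm=MIN_CM):
    """Sum IBD segment lengths (cM) per unordered pair of individuals.

    `segments` is an iterable of (id_a, id_b, length_cm) tuples; this
    layout is hypothetical, not a real GERMLINE file format.
    """
    totals = defaultdict(float)
    for id_a, id_b, length_cm in segments:
        if length_cm >= min_cm:  # drop short, likely false-positive matches
            pair = tuple(sorted((id_a, id_b)))  # (A, B) and (B, A) are one pair
            totals[pair] += length_cm
    return dict(totals)


def histogram(values, bin_width):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = defaultdict(int)
    for value in values:
        counts[int(value // bin_width)] += 1
    return {index * bin_width: n for index, n in sorted(counts.items())}


# Made-up segments for three hypothetical individuals.
segments = [
    ("TUN01", "TUN02", 6.0),
    ("TUN02", "TUN01", 9.5),   # same pair, listed in the other order
    ("TUN01", "TUN03", 4.0),   # below 5 cM, ignored
    ("TUN02", "TUN03", 12.0),
]

totals = total_ibd_per_pair(segments)
hist = histogram(totals.values(), bin_width=10.0)
print(totals)
print(hist)
```

Sorting each pair of IDs before using it as a key is what lets the two TUN01/TUN02 segments add up to a single total instead of being counted as two different pairs.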