We review studies of genomic data obtained by sequencing hominin fossils with particular emphasis on the unique information that ancient DNA (aDNA) can provide about the demographic history of humans and our closest relatives. We concentrate on nuclear genomic sequences that have been published in the past few years. In many cases, particularly in the Arctic, the Americas, and Europe, aDNA has revealed historical demographic patterns in a way that could not be resolved by analyzing present-day genomes alone. Ancient DNA from archaic hominins has revealed a rich history of admixture between early modern humans, Neanderthals, and Denisovans, and has allowed us to disentangle complex selective processes. Information from aDNA studies is nowhere near saturation, and we believe that future aDNA sequences will continue to change our understanding of hominin history.
The genomics revolution is well under way. At the time that the first human genomic sequences were obtained (1, 2), it was almost inconceivable that within 15 y thousands of genomes from people around the world would be sequenced, many to a high depth of coverage (3). It was probably even less conceivable that partial or complete genomic sequences would be obtained from hundreds of modern human fossils (4–6), several Neanderthal fossils (7, 8), and even fossils of a previously unknown sister group of Neanderthals, called Denisovans (9, 10) (Fig. 1). Some of these ancient genomes have been sequenced to such high depth that their error rates rival those of high-coverage sequences from present-day humans.
The wealth of present-day and ancient genomic data has greatly increased what is demanded of population geneticists. When only a few marker loci could be studied, chiefly blood groups, allozymes, and microsatellites, gross descriptive statistics such as heterozygosity, Wright’s FST, and various genetic distances were sufficient to characterize broad patterns of population differentiation. Application of these classic methods was pioneered by Luca Cavalli-Sforza and his many collaborators. As early as 1964, Cavalli-Sforza et al. (11) published a phylogenetic tree of 15 human populations based on a total of 20 alleles at 5 loci, mostly blood groups, for which adequate published data were available. The authors superimposed the tree on a world map to suggest past dispersal routes. Their map is surprisingly consistent with more recent studies based on vastly more data. Only the connection of Maori to Native Americans disagrees with the currently accepted view that the Maori descended from Polynesians (12).
At present, not only can geneticists elucidate broad patterns of relationship among populations, but they can also provide detailed answers to historical questions of relevance to archaeology and paleoanthropology. When, where, and from what source did particular human populations arise? Who admixed with whom and when did the admixture take place? Are obvious changes in the archaeological record the result of population replacement or cultural innovation? Did past cultures leave any genetic descendants? As we will discuss, analysis of ancient DNA (aDNA) has been successful in answering several of these questions, but has also raised new questions in the process. Importantly, aDNA provides a temporal dimension to genetic studies that would be inaccessible with present-day genomes alone, and only now is the full significance of aDNA being explored.
One of the major problems that prevented the widespread sequencing of hominin aDNA for several years was contamination. Genetic material extracted and sequenced from a tissue sample of a living individual will consist largely of DNA fragments from that individual (i.e., endogenous fragments) if standard laboratory practices are followed. In contrast, because aDNA is so scarce and fragmented, most of the genetic material extracted from fossils tends to be exogenous, usually either from environmental microbes or humans who handled the fossil (13). The latter type of DNA is especially troublesome, as present-day human DNA is similar in sequence to endogenous aDNA from hominin fossils, and can introduce biases in downstream analyses.
Although some of the first studies of nuclear aDNA from archaic hominins had problems with contamination (14, 15), there have been substantial experimental and computational innovations for mitigating its effect in contemporary studies. In the past decade, researchers have developed two broad sets of approaches to correct for contamination in their aDNA samples, allowing for the study of previously unusable sequences.
First, it is now a standard practice to extract aDNA under strict clean-room conditions—including UV radiation, bleach treatment of surfaces, and filtered air systems—so as to minimize the proportion of exogenous DNA in the fossil extracts (13). Additionally, at the time of DNA library construction, scientists incorporate unique adapters to tag molecules that are present at the moment of extraction (16), to prevent additional molecules accidentally added during subsequent sequencing steps from being confused with endogenous molecules.
Second, after the DNA has been sequenced, several bioinformatic tools can be used to either remove contaminant reads or estimate the proportion of those reads present in a DNA library. A common practice is to estimate the rate of contamination using mitochondrial DNA (mtDNA), which is much more abundant than nuclear DNA and hence is sequenced to a much higher coverage. For highly divergent populations (e.g., Neanderthals), one can use diagnostic positions that distinguish the ancient group from present-day humans and assess how many discordant reads are present at each position (17, 18). For modern human populations (e.g., ancient Europeans), one can check for reads that diverge from the consensus sequence or that do not contain molecular signatures consistent with aDNA (19, 20). There are also more sophisticated contamination rate estimation methods that use larger subsets of the data, including sex chromosomes (21, 22) and entire autosomal genomes (7, 23). Additionally, one can use patterns of cytosine deamination at the ends of fragments, a form of postmortem chemical damage typical of aDNA, to filter out sequenced reads that do not display this signature and are therefore not likely to be ancient (24).
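The two simplest ideas in the preceding paragraph, counting discordant reads at diagnostic positions and retaining only reads that carry a terminal deamination signature, can be illustrated with a minimal sketch. This is not the method of refs. 17, 18, or 24; the names `DiagnosticSite`, `estimate_contamination`, and `shows_deamination` are hypothetical, the contamination estimate is a plain read-count proportion that ignores sequencing error and damage, and the deamination check assumes a double-stranded library convention (C-to-T at the 5′ end, G-to-A at the 3′ end) and inspects only the terminal base of each read.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticSite:
    """One mtDNA position where the ancient group and present-day humans differ.

    All fields are illustrative assumptions, not a published data format.
    """
    endogenous_allele: str   # allele expected in the ancient individual
    contaminant_allele: str  # allele expected in present-day human contaminants
    observed_bases: str      # bases observed in the reads covering this site


def estimate_contamination(sites):
    """Return the fraction of informative reads carrying the contaminant allele.

    Reads showing neither allele are treated as uninformative and skipped.
    """
    endogenous = 0
    contaminant = 0
    for site in sites:
        endogenous += site.observed_bases.count(site.endogenous_allele)
        contaminant += site.observed_bases.count(site.contaminant_allele)
    informative = endogenous + contaminant
    if informative == 0:
        raise ValueError("no informative reads at the diagnostic sites")
    return contaminant / informative


def shows_deamination(read, reference):
    """Return True if an aligned read has a terminal deamination-like mismatch.

    `read` and `reference` are assumed to be gap-free, equal-length strings
    on the reference strand. Only the first and last bases are checked.
    """
    if not read or len(read) != len(reference):
        raise ValueError("read and reference must be non-empty and equal length")
    five_prime = reference[0] == "C" and read[0] == "T"
    three_prime = reference[-1] == "G" and read[-1] == "A"
    return five_prime or three_prime


if __name__ == "__main__":
    sites = [
        DiagnosticSite("A", "G", "AAAAAAAAAG"),
        DiagnosticSite("C", "T", "CCCCCCCCCT"),
    ]
    print(estimate_contamination(sites))

    aligned = [("TGCA", "CGCA"), ("CGCA", "CGCA"), ("ACGA", "ACGG")]
    kept = [read for read, ref in aligned if shows_deamination(read, ref)]
    print(kept)
```

In practice, a filter of this kind trades data for purity: it discards endogenous reads that happen to be undamaged, so the retained set is smaller but less likely to include present-day contaminants.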
The sequencing and analysis of genomes from Neanderthals and their relatives has been nothing short of revolutionary. First, the question of interbreeding between Neanderthals and modern humans—posed by paleoanthropologists over 30 y ago—has now been convincingly answered (7, 8). Additionally, a sister group of Neanderthals, called Denisovans, was discovered and its relationship to Neanderthals and humans established (9, 10).
Neanderthals. Despite the overlapping ranges of Neanderthals and modern humans in Europe and western Asia for at least 10,000 y, there was no widely accepted archaeological evidence that Neanderthals and modern humans interacted or interbred. Krings et al. (25) found that mtDNA sequences obtained from a Neanderthal fossil lay outside the clade composed of all mtDNA sequences from modern humans. This pattern of reciprocal monophyly has been confirmed in many later studies of Neanderthal mtDNA (17). Although the mtDNA tree was consistent with the hypothesis that there was no admixture between the two groups, it did not provide conclusive evidence against it. In fact, reciprocal monophyly would be seen with significant probability even if there had been substantial admixture (26). Before the sequencing of the first Neanderthal genome, an analysis of present-day human samples had indicated there might have been high levels of archaic ancestry in both European and West African genomes, likely stemming from a diverged hominin group (27).