Accurate genomic prediction of human height

September 21, 2017 at 12:37 am in Data Source, Science

They used to say this couldn’t be done:

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ~40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ~0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ~20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
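The abstract quotes both a correlation (~0.65) and a share of variance (~40 percent); the two are linked by squaring. A minimal sketch of that arithmetic follows. The 7 cm within-sex standard deviation is an illustrative assumption, not a figure from the paper.

```python
import math

r = 0.65                       # predicted-vs-actual correlation quoted in the abstract
r_squared = r ** 2             # share of variance captured by the predictor

# Assumption (not from the paper): within-sex height SD of about 7 cm.
assumed_sd_cm = 7.0

# Spread of actual height around the prediction under a simple linear model.
residual_sd_cm = assumed_sd_cm * math.sqrt(1.0 - r_squared)

print(f"R^2 = {r_squared:.2f}")                  # R^2 = 0.42
print(f"residual SD = {residual_sd_cm:.1f} cm")  # residual SD = 5.3 cm
```

Under that assumed spread, a predictor capturing ~40 percent of variance trims the typical error by only about a quarter (from 7 cm to roughly 5.3 cm), because the residual scales with the square root of the uncaptured variance.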

While I don’t find “within a few centimeters” to be especially impressive, the question is still “what’s next?”

The authors on the paper are Louis Lello, Steven G Avery, Laurent Tellier, Ana Vazquez, Gustavo de los Campos, and Stephen D. H. Hsu.

1 buddyglass September 21, 2017 at 1:02 am

Are their findings predicated on a single nutritional context? That is, they can predict height, based on genetics, but only if the country where the individual will be raised is known?

If not, then wouldn’t this imply nutrition doesn’t play as large a role in height? And if that’s the case, then does that mean the gain in height over the past couple hundred years is largely the result of “breeding” and not improved nutrition?

2 Roy LC September 21, 2017 at 2:03 am

The connection between nutrition and height is pretty darn robust, however in the same cultural group in a first world country you are not going to get a lot of variation in nutrition, or at least the kind that aids achieving full height.

As to they said it couldn’t be done, who is this they? Because I have known a lot of geneticists since I was pretty short back in the 70s and predicting size was one they were always very sure they could do, unlike hair, eye and skin color, etc…

3 Careless September 21, 2017 at 11:50 pm

predicting size was one they were always very sure they could do, unlike hair, eye and skin color, etc…

They didn’t think those would be easy? Ouch.

4 Ricardo September 21, 2017 at 7:14 am

The R-squared is 0.4. That means 60% of the variation in height is not captured by their data on individual genes.

5 Hazel Meade September 21, 2017 at 10:32 am

Correct response. They’re basically saying it’s 40% genetic and 60% environmental.

6 Rick Hyatt September 21, 2017 at 10:59 am

You’re both wrong. It’s >40% *SNP additive* genetic as their polygenic score recovers most but not quite all of the SNP heritability (note the asymptote with sample size), then another likely 30-40% rare or interacting genetics (where the variants are either not recorded by the sequencing or their effect is too complex to model in a simple way), then the remaining 20%, as estimated from twin/adoption/family studies, is split between shared-environment and nonshared (which includes developmental noise, somatic mutations, measurement error, and is not generally what people think of as ‘environmental’).

> That is, they can predict height, based on genetics, but only if the country where the individual will be raised is known?

Earlier studies don’t show much modification of the polygenic scores by cohort. There will be some but not as much as you think.

> I’m calling B.S. on this study. As implied by buddyglass above, what these researchers are doing is curve fitting data with a N-th order polynomial, that any Matlab package can do for you trivially, so the equation fits the data (but gives no predictive power out of sample, going forward).

Ray is, as usual, wrong and has not bothered to read the paper. They validate out-of-sample.

> Not really. By simply increasing the sample size you can get the genetic prediction of IQ to about 40% of variance. That’s the main takeaway from this study.

Maz: no. ‘40%’ is not universal, it’s simply how well it does for height. Other traits have much lower or higher SNP heritability (because of differences in how well the traits are measured or their genetic architecture eg Alzheimers is quite low once you take out APOE). You can see this in their GCTA estimate of education: hardly 10%. As far as IQ goes, the GCTAs usually are around 25-30%. That sets the limit for GWASes like this. To do better, you’ll need better SNP panels/whole-genomes, better measurements than the very bad IQ tests used in the UK Biobank and elsewhere, or better methods like MTAG. Still, if you look at the sample sizes in SSGAC and elsewhere, expect IQ polygenic scores up to 25% in the next year or two.

> What’s the practical application of this pseudo-information?

Embryo selection would let you increase height of male children by ~2 inches; substantial biases favoring taller men in income and career accomplishment have long been documented. That’s far from nothing. It also serves as proof of concept that the method works as it far exceeds the previous height polygenic score of 20%, has the predicted phase transition, works on real data, and predicts out of sample. This means it will work for the other hundreds of traits in UKBB such as diseases, and that is surely *very* practical. Not to mention IQ. Height is an interesting trait on its own, since it’s been selected for in humans in various directions over the past few thousands of years (in addition to the Pygmies long before that), is closely connected to human developmental and growth processes, and has genetic correlations with other traits like BMI which are important. Lots of things. This is indeed big news.

> While they could generally predict the appearance of the results of a mating, they weren’t able then, and can’t now, predict if a dog would be more intelligent than its litter mates or a good hunter.

Dog breeders don’t have much funding or interest in doing it right. Animal breeders, cow breeders especially, can tell you with high accuracy whether the breeding value of one cow is higher than its siblings.

> With crispr the sky is the limit.

CRISPR isn’t that great. A SNP is usually not the causal variant, it merely ‘tags’ the causal variant by being on the same chunk of chromosome, which has the causal variant in linkage disequilibrium with the measured SNP. Works for prediction but it means that if you edit the SNP to one setting, most of the time it does nothing. This is also why polygenic scores weaken when transferring to other races, especially African ones: different chromosome chunking patterns mean the identified SNP no longer travels with the causal variant. Since the IQ SNPs have a small effect size as it is, typically 50% of your edits do nothing… you’ll need a *lot* of edits to get anywhere.

7 Maz September 21, 2017 at 1:44 pm

With more reliable IQ measurement than in most studies the GCTA heritability is surely closer to 40% than 25-30%.

8 Rick Hyatt September 21, 2017 at 2:39 pm

Closer to 50%, I would expect. But that merely raises the question of where you are going to get those 100k-1m much more accurate IQ measurements + genomes? Not from the UKBB or SSGAC, that’s for sure. My guess is that in practice GWASes will shift to using multiple bad measurements which can be combined into a single good estimate, which is how the best present IQ PGS has been computed: http://www.biorxiv.org/content/early/2017/07/07/160291.1

> Still, it’s a stretch to say “accurate genetic prediction of height” since the genes involved only get you 40% accuracy (loosely speaking), not 100% accuracy.

It’s twice as good as previous, offers a wealth of loci to investigate for biological insights, and is close to as good as can be done. ‘accurate’ is an accurate description. Also, if you require 100% accuracy to be ‘accurate’, then no one has ever done an accurate measurement of height, because height measurements will vary due to mood, spine, gravity (astronauts become taller), age (taller then shorter), and person conducting it by easily a centimeter from day to day.

9 Hwite September 21, 2017 at 11:05 am

Incorrect: they can detect only part of the variance as being due to specific genes. Twin studies have established its genetic component is greater.

10 Hazel Meade September 21, 2017 at 1:05 pm

ok, so it’s 40% these particular genes, and 60% “something else”.

Still, it’s a stretch to say “accurate genetic prediction of height” since the genes involved only get you 40% accuracy (loosely speaking), not 100% accuracy.

11 Dr. D September 21, 2017 at 9:49 pm

The authors stated that 40% of the variation in height could be explained by the 20,000 single nucleotide polymorphisms that they knew of. That still leaves other significant sources of genetic variation.
1. SNPs that have not been described
2. More radical mutations (deletions, insertions, frame shift)
3. Epigenetic effects (methylation, position effect, scaffold attachment), some of which may be affected by the environment
Probably much more than 40% of height variation is due to genetics, but I doubt we will be able to select for height based on individual genes any time soon.

On another note, this research has not been reviewed, why? Trying to lead other teams down a rathole? Least publishable unit?

12 Rick Hyatt September 22, 2017 at 10:58 am

> On another note, this research has not been reviewed, why? Trying to lead other teams down a rathole? Least publishable unit?

Because Nature takes a goddamn year to publish papers like these (or worse, I’ve seen papers in Biorxiv take longer to reach their final homes), and that is an enormous opportunity cost for a method that could and should be applied to all GWASes >100k. The results are obviously correct and what is expected. And if you want ‘peer reviewed’ papers, don’t get your bockers in a bunch – there were several peer reviewed lasso papers before this.

13 The Anti-Gnostic September 21, 2017 at 8:37 am

In other words, Nature sets the height of the bar, Nurture determines whether you clear it.

14 Hazel Meade September 21, 2017 at 1:16 pm

It’s probably a bit more complex than that due to epigenetic effects. Maybe there’s a height gene that gets “turned on” in some environments, and that raises the bar a bit, and then you still need the nutritional intake necessary to max out the extra growth that gene allows.

15 Ray Lopez September 21, 2017 at 10:40 am

I’m calling B.S. on this study. As implied by buddyglass above, what these researchers are doing is curve fitting data with a N-th order polynomial, that any Matlab package can do for you trivially, so the equation fits the data (but gives no predictive power out of sample, going forward).

This is not ‘machine learning’ but using a Taylor series and least squares type analysis, techniques that are several hundred years old.

In short, a genetically identical person in the Third World will have a different height than a person in the First World, and your model cannot predict that unless you back-fit the formula to fit the data, which is trivial. Another ‘publish or perish’ science paper it seems.

16 Iamthep September 21, 2017 at 11:12 am

Compressed sensing is quite new. Terence Tao was involved with some of the theory.

See https://terrytao.wordpress.com/2007/04/13/compressed-sensing-and-single-pixel-cameras/
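To make the disagreement above concrete, here is a minimal sketch of sparse (L1-penalized) regression checked on held-out data. Everything in it is simulated and hypothetical: the sample sizes, the ten "causal" predictors, the 50 percent heritability, and the penalty `lam` are illustrative choices, and the plain proximal-gradient solver is a stand-in, not the authors' code.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink each entry toward zero by t; entries within t of zero become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by proximal gradient descent."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

rng = np.random.default_rng(0)
n_train, n_test, p, k = 800, 200, 500, 10    # hypothetical sizes

# Simulated standardized predictors; only k of the p columns affect the trait.
X = rng.standard_normal((n_train + n_test, p))
beta = np.zeros(p)
beta[rng.choice(p, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)

signal = X @ beta
noise = rng.standard_normal(n_train + n_test) * signal.std()  # ~50% "heritability"
y = signal + noise

X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

# Fit on the training rows only, then score rows the model never saw.
b_hat = lasso_ista(X_tr, y_tr - y_tr.mean(), lam=0.5)
n_active = int(np.count_nonzero(b_hat))
r_out = np.corrcoef(X_te @ b_hat, y_te)[0, 1]

print("active predictors:", n_active)
print("out-of-sample correlation:", round(float(r_out), 2))
```

The L1 penalty sets most coefficients exactly to zero, which is the sense in which a predictor "activates" only a subset of SNPs, and the score on the held-out rows is what separates prediction from in-sample curve fitting.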

17 David Wright September 21, 2017 at 2:42 am

IQ!

18 Steve Sailer September 21, 2017 at 5:02 am

IQ is likely to be closer to last than to next. It’s extremely complicated.

19 Maz September 21, 2017 at 6:06 am

Not really. By simply increasing the sample size you can get the genetic prediction of IQ to about 40% of variance. That’s the main takeaway from this study.

20 chuck martel September 21, 2017 at 9:26 am

What’s the practical application of this pseudo-information?

21 Maz September 21, 2017 at 9:29 am

What’s “pseudo” about it? It will of course be used for embryo selection, among other things.

22 chuck martel September 21, 2017 at 9:50 am

If a Dalmatian bitch is bred to a Dalmatian dog, the chances are pretty good that the puppies will be white with black spots, a fact that’s been no surprise for centuries. In fact, the differences in domestic breeds of the same species can be credited to uneducated farmers and hunters of the past who were able to understand animal and plant reproduction. While they could generally predict the appearance of the results of a mating, they weren’t able then, and can’t now, predict if a dog would be more intelligent than its litter mates or a good hunter.

23 The Anti-Gnostic September 21, 2017 at 11:58 am

Dalmatians are bred for a certain phenotype and to be content running around a big, loud wagon. Border collies and German shepherds are bred for more complex tasks requiring more brainpower and a more predatory (but not too predatory) temperament.

One practical “application” with humans is simply the recognition that equality of educational inputs will not, and cannot, yield equality of outcomes, so education is more appropriately left to the private sector.

24 prior_test3 September 21, 2017 at 3:27 am

‘the question is still “what’s next?”’

Testing such height predictions on groups of children receiving significantly different amounts of protein over 2 decades, refining the nurture/nature parameters? With parental consent and adequate remuneration, undoubtedly. And no pesky bioethical panels interfering in the pursuit of knowledge, preventing any brave truth seekers from taking the necessary steps to a much better world.

25 jb September 21, 2017 at 7:20 am

Before long, genetic tests will be able to explain a large part of the variation of a whole range of attributes, but no one will care because it won’t tell you anything more about yourself than all your friends know anyway.

26 Alex September 21, 2017 at 8:33 am

Something like that. I expect there will be novel applications of these studies and techniques but they are very tough to predict in advance. The story that it will lead to many high IQ kids is wrong.

27 Hwite September 21, 2017 at 9:56 am

Why? Even if we didn’t have crispr we could still accomplish a ~10 IQ point increase in intelligence (see Hsu) over what it would otherwise be if we had that data with embryo selection. With crispr the sky is the limit.

28 Alex September 21, 2017 at 10:43 am

We can already do IVF. You can take gametes from smart people. It is already more powerful than anything within reach. Nobody does it. Also, nobody is even interested in it or writing about it. Nobody cares about realities here. (Nobody has demonstrated CRISPR for massively polygenic traits so that’s a complete unknown.) This is all signaling and PR. I have to recommend Dale Carrico’s blog despite all its hot air just to give him credit for being anti-futurist. You will be more accurate in predicting the future by reading about social trends in the NY Times than by using someone’s PR framework to make predictions. Okay, that is the last I ever write about this subject here.

29 Hwite September 21, 2017 at 10:56 am

“We can already do IVF. You can take gametes from smart people. It is already more powerful than anything within reach.”

Incomparable: people don’t like cuckoldry. They want smart children: but they want them to be their children. When it can be done, people will demand it. Only an idiot or an ideologue would deny it.

30 Hazel Meade September 21, 2017 at 1:13 pm

People only select for intelligence in other people’s gametes if they can’t produce any of their own. I.e. egg and sperm donors.

31 Cyrus September 21, 2017 at 9:04 pm

Get the cost of full-genome sequencing down another order of magnitude, and fertility clinics will be marketing a “best of both of you” service. Make ten embryos, sequence each, pick the one(s) whose genomes the statistics favor for implantation. No cuckoldry required.
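A rough sketch of the "make ten embryos, pick the best-scoring one" arithmetic. It rests on stated assumptions, none of which come from the thread or the paper: a 7 cm within-sex height SD, predictor scores that vary among full siblings with half the population variance of the score, a predictor that works as well within families as across them, and every embryo being viable. The function name and defaults are hypothetical.

```python
import numpy as np

def expected_gain_cm(n_embryos, r_squared, sd_cm, n_sims=200_000, seed=0):
    """Monte Carlo estimate of the mean predicted-height gain (cm) from
    implanting the top-scoring embryo out of n_embryos siblings."""
    rng = np.random.default_rng(seed)
    # Assumption: sibling scores have half the population variance of the score.
    sibling_score_sd = np.sqrt(r_squared / 2.0) * sd_cm
    scores = rng.standard_normal((n_sims, n_embryos)) * sibling_score_sd
    # Gain is measured relative to the family average (score 0 here).
    return float(scores.max(axis=1).mean())

gain_cm = expected_gain_cm(n_embryos=10, r_squared=0.40, sd_cm=7.0)
print(f"expected gain: {gain_cm:.1f} cm ({gain_cm / 2.54:.1f} in)")
```

Under these assumptions the gain comes out near 4.8 cm, a little under 2 inches. Because it scales with the square root of the variance captured, doubling R-squared raises the gain by only about 40 percent.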

32 Matt H September 21, 2017 at 9:55 am

What’s Next?

Embryo Selection for Height in China by 2020. Embryo Selection for IQ by 2025 in China. Banned in US by 2026. Elites in the US start going abroad for embryo selection in large numbers by 2032. By then enough time has passed to see the results, and US elites feel competitive pressure. The US legalizes the practice in 2035.

33 harpersnotes September 21, 2017 at 10:11 am

Perhaps .. A few centimeters is environmental influences and developmental noise. Think in terms of animal breeding. Breed this cow, not that cow, is like natural language in computer language typology. Implement these genes not those genes is assembly language. In animal breeding there is tight control over the environment, chemically as well as socially and so on. Most of that is to prevent environmental influences that decrease the targeted traits. Similarly, I would think most of the ‘few centimeters’ in humans are decreases in height due to nutritional deficiencies and so on. But even there environmental and developmental noise is tough, so ‘missing heritability.’ Now, along with Crispr technologies and reiterated embryo-selection, there will also eventually be very tight control over genes and gene expression. As with all these similar sorts of technologies the press focuses on the use of them on humans, but the more immediate economic impacts are likely to be in agriculture and animal husbandry and all that entails such as building materials and biofuels. Not to downplay the human angle which in the near term is potentially highly significant for various gene related diseases, but the long term full potential would seem to be at least fifteen years (a teenager) later than non-human potential. – Potentially Chinese developed Von Neumann’s (super-geniuses.) In the past we’ve had tighter control over environments for humans’ development than genes (even with twins and degrees-of-relatedness studies, Mendelian Randomization, and so on.) This proof of concept breakthrough makes it clear that in the long run that will reverse.

34 Tom Warner September 21, 2017 at 1:27 pm

“actual heights of most individuals in validation samples are within a few cm of the prediction”

Guessing >50% of people’s height within 3cm does not strike me as an impressive result. You could probably do that just by asking gender and ethnic origin. I wonder if that’s not how they’re doing it – using DNA markers of ethnic origin, ie circumstantial evidence of likely height, rather than really finding any direct height-determining genes.

35 Maz September 21, 2017 at 2:04 pm

There were only white Brits in the sample and sex effects were controlled by z-scoring height within sexes (i.e. heights of men and women were rescaled so that mean=0 and SD=1).
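Z-scoring within sexes, as described above, can be sketched as follows. The heights and labels are made-up illustrative values, and `zscore_within_groups` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def zscore_within_groups(values, groups):
    """Rescale values so that each group separately has mean 0 and SD 1."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    z = np.empty_like(values)
    for g in np.unique(groups):
        mask = groups == g
        z[mask] = (values[mask] - values[mask].mean()) / values[mask].std()
    return z

# Made-up example data (cm), for illustration only.
heights_cm = [178, 185, 171, 164, 158, 170]
sex = ["M", "M", "M", "F", "F", "F"]

z = zscore_within_groups(heights_cm, sex)
print(np.round(z, 2))
```

After this rescaling, the average male-female height difference is gone from the data, so a predictor cannot score well merely by picking up sex.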

36 Handle September 21, 2017 at 2:57 pm

Hopefully results like this are starting to make the anti-hereditarians / blank-slatists / human neurouniformitists feel just a little gun-shy about making their same old ideology-based assertions, out of accurate fear that they are all going to look pretty dumb in the coming years.

37 DevOps Dad September 21, 2017 at 8:01 pm

The solution is to gamble with the blank-slatist ideologues. Have them place bets on whether a randomly selected one-year-old Border Collie can outthink a randomly selected one-year-old Chow Chow or Basset Hound.

Of course, the Border collie would receive only half or 1/3 of the hours of training its opponents would receive.

“If you’re playing a poker game and you look around the table and can’t tell who the sucker is, it’s you.”
— Paul Newman

38 Jake the Peg September 22, 2017 at 6:04 am

Both my legs have the same DNA but one’s shorter than the other.

