I won’t double indent; these are all his words:
“I agree with your general take on pricing and expect prices to continue to fall, ultimately approaching marginal costs for common use cases over the next couple years.
A few recent data points to establish the trend, and why we should expect it to continue for at least a couple years…
- OpenAI reduced core LLM pricing by two-thirds last year.
- StabilityAI has recently reduced prices on Stable Diffusion down to a base of $0.002 / image – now you get 500 images / dollar. This is a >90% reduction from OpenAI’s original DALLE2 pricing.
- OpenAI has also recently reduced their embeddings price by 99.8% – not a typo! You can now index all 200M+ papers on Semantic Scholar for $500K-2M, depending on your approach.
- Emad from StabilityAI projects a ~1M-fold cost improvement over the next 10 years – responding to Chamath, who had predicted a 1000X improvement
- continued application of RLHF and similar techniques – these techniques can let a model match the quality of one with ~100X more parameters (already used heavily at OpenAI, Anthropic, and Google – but in limited use elsewhere)
- the CarperAI “Open Instruct” project – also affiliated with (part of?) StabilityAI, aims to match OpenAI’s current production models with an open source model, expected in 2023
- 8-bit and maybe even 4-bit inference – simply by rounding weights off to fewer bits of precision, you cut memory requirements and inference compute costs with minimal performance loss (see the quantization sketch after this list)
- pruning for sparsity – it turns out some LLMs work just as well if you set 60% of the weights to zero (though this likely isn’t true if you’re using Chinchilla-optimal training) – see the pruning sketch after this list
- mixture-of-experts techniques – another take on sparsity, which lets each input activate only a few dedicated expert sub-blocks of the overall network, improving speed and cost (see the routing sketch after this list)
- distillation – a technique by which larger, more capable models are used to train smaller models to similar performance within certain domains (sketched after this list) – Replit has a great writeup on how they created their first release codegen model in just a few weeks this way!
- distributed training techniques, including approaches that work on consumer devices, and “reincarnation” techniques that allow you to re-use compute rather than constantly re-training from scratch
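To make the 8-bit point concrete, here is a minimal sketch of round-to-nearest weight quantization in plain numpy – illustrative only, using a made-up weight matrix; production schemes such as LLM.int8() add per-channel scales and outlier handling:

```python
import numpy as np

def quantize_int8(w):
    """Round-to-nearest 8-bit quantization: store int8 weights plus one float scale."""
    scale = np.max(np.abs(w)) / 127.0                      # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scale

# Hypothetical weight matrix: int8 storage cuts memory ~4x vs float32,
# and the reconstruction error stays small relative to the weights.
w = (np.random.randn(512, 512) * 0.02).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
print("max abs error:", float(np.max(np.abs(w - w_hat))))
```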
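And a minimal sketch of the pruning idea – unstructured magnitude pruning that zeroes the smallest 60% of a hypothetical weight matrix; turning that sparsity into real speed and cost savings also requires sparse kernels or structured sparsity:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.6):
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    threshold = np.quantile(np.abs(w), sparsity)   # cutoff below which weights are dropped
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(512, 512).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.6)
print("fraction of weights zeroed:", float(1.0 - mask.mean()))   # ~0.6
```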
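The mixture-of-experts point can be sketched in a few lines as well – a toy top-2 router over made-up expert matrices, just to show why compute scales with the number of experts actually used rather than the total:

```python
import numpy as np

def moe_layer(x, experts, gate, top_k=2):
    """Toy mixture-of-experts layer: each token runs only its top-k experts.

    x:       (tokens, d) activations
    experts: list of (d, d) expert matrices
    gate:    (d, num_experts) router matrix
    """
    logits = x @ gate                                      # (tokens, num_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)             # softmax over experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in np.argsort(probs[t])[-top_k:]:            # the chosen experts for this token
            out[t] += probs[t, e] * (x[t] @ experts[e])
    return out

d, num_experts = 64, 8
x = np.random.randn(10, d)
experts = [np.random.randn(d, d) * 0.1 for _ in range(num_experts)]
gate = np.random.randn(d, num_experts) * 0.1
y = moe_layer(x, experts, gate)   # only 2 of the 8 experts run per token
```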
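Finally, the core of distillation is just a loss that pushes a small student model’s output distribution toward a large teacher’s – a minimal sketch with made-up logits; in practice this is mixed with the usual cross-entropy loss on real data:

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return float(np.mean(np.sum(p_t * (np.log(p_t + 1e-9) - np.log(p_s + 1e-9)), axis=-1)))

# Hypothetical logits over a 32K-token vocabulary for a small batch.
teacher_logits = np.random.randn(4, 32000)
student_logits = np.random.randn(4, 32000)
print("distillation loss:", distillation_loss(student_logits, teacher_logits))
```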
And this is all assuming that the weights from a leading model never leak – that would be another way things could quickly get much cheaper… ”
TC again: All worth a ponder; I do not have personal views on these specific issues, and of course we will see. And here is Nathan on Twitter.