A One Parameter Equation That Can Exactly Fit Any Scatter Plot

In a very surprising paper Steven Piantadosi shows that a simple function of one parameter (θ) can fit any collection of ordered pairs {Xi,Yi} to arbitrary precision. In other words, the same simple function can fit any scatter plot exactly, just by choosing the right θ. The intuition comes from chaos theory. We know from chaos theory that simple functions can produce seemingly random, chaotic behavior and that tiny changes in initial conditions can quickly result in entirely different outcomes (the butterfly effect). What Piantadosi shows is that the space traversed in these functions by changing θ is so thick that you can reverse the procedure to find a function that fits any scatter plot. He gives two examples:

Note that he doesn’t present the actual θ’s used because they run to hundreds or thousands of digits. Moreover, if you are off by just one digit then you will quickly get a very different picture! Nevertheless, precisely because tiny changes in θ have big effects you can fit any scatter plot to arbitrary precision.

Aside from the wonderment at the result, the paper also tells us that Occam’s Razor is wrong. Overfitting is possible with just one parameter and so models with fewer parameters are not necessarily preferable even if they fit the data as well or better than models with more parameters. His conclusion is exactly right:

The result also emphasizes the importance of constraints on scientific theories that are enforced independently from the measured data set, with a focus on careful a priori consideration of the class of models that should be compared.

Comments

"In a very surprising paper Steven Piantadosi shows that a simple function of one parameter (θ) can fit any collection of ordered pairs {Xi,Yi} to arbitrary precision."

He would say it, wouldn't he?

I'm struck by the irrelevance of many of these comments. Read the (well written) paper. It says very clearly that you can fit an arbitrary set of data with a continuous (in fact infinitely differentiable) function depending on a single real variable. The significance is that "counting parameters" is not a good measure of complexity. That's all. You may or may not find that interesting. I do.

For whatever reason, Marginal Revolution has cultivated a low standard of regular commenters.

And racists. Don't forget racists. Is Tyler Cowen secretly jesus (something something fan club)?

If you allow enough digits in the parameter, it seems that you can trivially encode any image. The 'parameter' is the binary JPG data, while the function is the JPG decoder.
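(To make that quip concrete, here is a minimal Python sketch of it taken literally; the file name is hypothetical. The "parameter" is just the file's bytes read as one enormous integer, and the "function" is whatever decodes those bytes back into an image.)

    # "elephant.jpg" is a hypothetical file name used only for illustration.
    with open("elephant.jpg", "rb") as fh:
        data = fh.read()

    theta = int.from_bytes(data, "big")           # the single "parameter"
    recovered = theta.to_bytes(len(data), "big")  # hand this to any JPG decoder
    assert recovered == data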

Yes I can't see why it would be considered in any way "surprising" that you can encode a countable number of pixels in a countable number of digits.

I came in here to say exactly this, well done.

The problem is that discussions of JPGs don't get to use exciting scientific terms like chaos theory, the butterfly effect, or even "function."

So you are unlikely to dupe an economist who is bad at math into re-posting anything about them.

I imagine the full paper shows you can do this for a continuous function, i.e. something you might find in a statistical model. Of course an image can be encoded using discrete math, but the issue isn't one of encoding. It's one of fitting a supposedly explanatory continuous function.

Normally you fit a continuous model to discrete data. If you start with a continuous function, then there is no need for fitting.

Furthermore, the paper states that between any two points the function produces a sine wave. So even though the points can be arbitrarily close together, this method will not produce a useful continuous function.

Since nobody seems to want to read the paper, here's the function:
https://www.screencast.com/t/xn3OnMB1rQ

f_θ(x) = m^(rx)(θ) = sin²(2^(rx) arcsin(√θ))
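(For anyone who wants to try it, here is that formula transcribed directly into Python; just a sketch, with r treated as a free precision constant as in the paper.)

    import math

    def f(theta, x, r=8):
        """f_theta(x) = sin^2(2^(r*x) * arcsin(sqrt(theta)))."""
        return math.sin(2 ** (r * x) * math.asin(math.sqrt(theta))) ** 2

    print(f(0.5, 0.25))  # evaluate at one arbitrary point, just to show it runs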

He specifically says "a simple function of one parameter". The JPG decoder is not simple.

Now all he needs to do is find some useful purpose for this "discovery"

Being able to decompose messy functions into simpler stuff is actually super useful, e.g. Fourier analysis. For example, commenters above are saying that this isn't a surprising discovery, since you can stuff the output of a JPEG encoder into the number and use a JPEG decoder as the function. Consider the opposite, however: if the paper gives us thetas that are shorter than the output of a JPEG encoder, great, you have a better way of encoding images than JPEG!

The difficulty will be not in finding useful purposes for it, but in finding ones that are both useful and better than what exists. There are already established methods for reducing information to a function with a few parameters, and your computer will not work without them.

Yes. Also, saying this shows Occam's razor is wrong is like saying that if you find that one number that completely specifies the entire history of the universe and every particle in it then you have no need for scientific laws or theory. Now if only we could figure out what that number is!

Occam's razor doesn't say anything about the number of parameters. It just says we should prefer a model that is more parsimonious. A model can be more parsimonious by having fewer parameters, or by having a simpler form of dependence on the parameters it has etc. Information criteria are often used to give precision to this intuition.
The main problem with Occam's Razor is basically the opposite - there are so many dimensions of counting assumptions that it can basically become a tautology: The best explanation is the one that is best when we count the assumptions "correctly".

Agreed. It seems to me there is no refutation of Occam here (but it is still interesting). Occam says only that simplest explanations are best (most likely to be true). This just shows that all the complexity in a graph can be stuffed into a single parameter (of one thousand digits!).

Also: "The result also emphasizes the importance of constraints on scientific theories that are enforced independently from the measured data set, with a focus on careful a priori consideration of the class of models that should be compared." Yes!! But how did we forget this in the first place?

Now apply this insight to "global warming" and see what happens.

Yes. Claude Shannon and all that.

Umm. What? Of course it can. Allow an arbitrarily precise number to be used; this gives you an infinite set. Trivially, this can be mapped to another infinite set. Guess you gave up in math before real analysis?

Is it trivial to show that for a continuous function?

My real analysis is rusty... but I don't think this is possible in continuous space. Arbitrary precision, n dimensions? Yes. But don't you get a Cantor's diagonal issue in continuous space?

In continuous space (assuming the function is differentiable), just set the digits of the parameter to be the coefficients of a Taylor expansion of the function. This should fit the function to arbitrary precision, depending on how many terms of the expansion you use and how many significant digits you keep for the coefficients.

Alternatively, the Fourier series gives an (infinite-dimensional) linearly independent basis for the continuous functions; the digits in the parameter can represent the weights on each term of such a basis.
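(A toy illustration of that digit-packing idea, as my own sketch rather than anything from the paper: a few series coefficients, each rounded to a fixed number of decimal digits, are concatenated into one big integer "parameter" and unpacked again. The block width is an arbitrary choice, and coefficients are assumed smaller than 10 in absolute value.)

    DIGITS = 6                 # decimal digits kept per coefficient
    SCALE = 10 ** DIGITS
    BASE = 2 * SCALE * 10      # room for |coefficient| < 10 plus a sign bit

    def pack(coeffs):
        # Concatenate fixed-width digit blocks (plus a sign bit) into one integer.
        param = 0
        for c in coeffs:
            block = int(round(abs(c) * SCALE)) * 2 + (1 if c < 0 else 0)
            param = param * BASE + block
        return param

    def unpack(param, n):
        # Recover n coefficients from the packed integer parameter.
        coeffs = []
        for _ in range(n):
            param, block = divmod(param, BASE)
            sign = -1 if block % 2 else 1
            coeffs.append(sign * (block // 2) / SCALE)
        return list(reversed(coeffs))

    # e.g. the first few Taylor coefficients of exp(x): 1, 1, 1/2, 1/6
    theta = pack([1.0, 1.0, 0.5, 1.0 / 6.0])
    print(unpack(theta, 4))    # ~[1.0, 1.0, 0.5, 0.166667]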

https://en.wikipedia.org/wiki/Space-filling_curve

I have to agree with the other commenters: this is a neat result, but basically equivalent to the trick that you can use to compress any file into one that is 0 bytes long... at the cost of a very long file name.

I tend to agree with the other commenters as well... Or are we all missing something crucial here?

Prof. Tabarrok knows more about the progress of philosophy than everyone combined.

True, that does not really make the cut for using Occam's Razor, but as noted by Prof. Tabarrok, 'Occam’s Razor is wrong.'

Cleverly encoding a long list of number pairs in one number is like defining the moral philosophy of a society in a list of the beliefs of every person within it. The ultimate moral relativism.

Great post!

This is a plot of the function from 0 to 1 using theta = 0.00242
The period of the wave is inversely proportional to x. The rate at which the wave period decreases is directly proportional to theta. http://www.wolframalpha.com/input/?i=sin%5E2(+2%5E(8x)arcsin(.00242%5E0.5))+from+0+to+1
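(A rough way to reproduce that plot locally, assuming, as the Wolfram Alpha query suggests, r = 8 and theta = 0.00242; just a sketch using numpy and matplotlib.)

    import numpy as np
    import matplotlib.pyplot as plt

    theta = 0.00242
    x = np.linspace(0, 1, 20000)   # dense grid to resolve the fast oscillation near x = 1
    y = np.sin(2.0 ** (8 * x) * np.arcsin(np.sqrt(theta))) ** 2

    plt.plot(x, y, linewidth=0.5)
    plt.xlabel("x")
    plt.ylabel("sin^2(2^(8x) arcsin(sqrt(theta)))")
    plt.show()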

I don't want to be a pedant but... the function only reproduces node positions. Our brains literally "connect the dots" and see an elephant. However, the links between nodes are assumed by our brains but NOT provided by the function.

The illusion of the elephant arises just because the function is plotted at discrete X positions and our brains link them. If the function is plotted continuously, the elephant is gone and we just see a sinusoidal wave with period reducing to zero.

The large amount of "information" is in our brains, not in the 1 parameter model.

No, the information is there. It's an elephant, to some resolution. If you'd like a higher-resolution elephant, you'll need more information.

If you'd like a completely smooth continuous elephant then I'm afraid you're going to be disappointed when you take a close look at an elephant's skin someday. Or your own: https://www.gettyimages.com/photos/skin-microscope

Ceci n'est pas un éléphant? (This is not an elephant?)

I don't understand how what you're describing differs from literally any depiction of an elephant. Even in the most high-fidelity digital image of an elephant, "the function only reproduces node positions. Our brains literally 'connect the dots' and see an elephant. However, the links between nodes are assumed by our brains but NOT provided by the function."

If you want to take this to an even more solipsistic direction, even in viewing an elephant with our eyes, we're only connecting the dots formed by reflected light rays, we're not actually "seeing" an "elephant".

You need a specific list of X values to get dots. Without ~100 specific X values, the plot is just a periodic sinusoidal wave. This vital input is not acknowledged.

This is not a discussion on the nature of perception. Just try to replicate the article. Try to reproduce the plot.

But...isn't the point to produce the node positions such that they give the impression of the elephant? A sufficiently dense plot of dots can give the impression of a continuous line. I believe the paper is saying that if they want to produce a continuous line it would be possible with the right input, though it might not be trivial to find the appropriate input. I don't think the point is to produce only barely-interpretable nodal plots, but that these are rather just proofs of concept, no?

"Note that the only shown x values are integers, and between these integers are the rapidly oscillating sin patterns implied by (8)."

A function that merely reproduces data points is worthless. We already have the data points, we are interested in interpolation or extrapolating something useful from them. And this method completely fails on that measure. As others have pointed out, despite being a continuous function there is literally nothing informative *between* the data points (which is why the continuous function is not even shown in the paper).

The paper is a bit of a joke (as others have observed) but has a serious point: counting the number of parameters is still frequently taken seriously as a measure of model complexity. For example, the number explicitly appears in the BIC/AIC formulae. We can and should do better.

If you'll forgive the self-promotion, you may find this interesting:
https://arxiv.org/abs/1705.01166
Which even cites some economists (Sims, Rational Inattention... AER) who kept looking at things which statisticians discarded as too ugly.

The model presented isn't all that complex, though. The complexity we see in the output comes almost entirely from the complexity of the input.

Others have mentioned Occam's Razor, but even with Occam's Razor the simplest model isn't always preferable. In fact, the simplest model that agrees with all observations about the thing being modeled is the one preferred.

Well there are various meanings for complexity. Their formula is indeed simple and has one parameter. But the point they are making (I think) is that these are terrible measures of complexity, because so much can be hidden in the parameter.

Models never agree with all observations, there is always some noise. So interesting interpretations of Occam's razor need to talk about tradeoffs... find a way to think at the margin.

See also https://en.wikipedia.org/wiki/Tupper%27s_self-referential_formula, another tongue-in-cheek overselling of what is essentially an encoding algorithm.

'Occam’s Razor is wrong'

Um, no.

'Occam's (or Ockham's) razor is a principle attributed to the 14th century logician and Franciscan friar William of Ockham. Ockham was the village in the English county of Surrey where he was born.

The principle states that "Entities should not be multiplied unnecessarily." Sometimes it is quoted in one of its original Latin forms to give it an air of authenticity:

"Pluralitas non est ponenda sine neccesitate"
"Frustra fit per plura quod potest fieri per pauciora"
"Entia non sunt multiplicanda praeter necessitatem"

In fact, only the first two of these forms appear in his surviving works and the third was written by a later scholar. William used the principle to justify many conclusions, including the statement that "God's existence cannot be deduced by reason alone." That one didn't make him very popular with the Pope.

Many scientists have adopted or reinvented Occam's Razor, as in Leibniz's "identity of observables" and Isaac Newton stated the rule: "We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances."

The most useful statement of the principle for scientists is
"when you have two competing theories that make exactly the same predictions, the simpler one is the better."

In physics we use the razor to shave away metaphysical concepts. The canonical example is Einstein's theory of special relativity compared with Lorentz's theory that ruler's contract and clocks slow down when in motion through the ether. Einstein's equations for transforming spacetime are the same as Lorentz's equations for transforming rulers and clocks, but Einstein and Poincaré recognised that the ether could not be detected according to the equations of Lorentz and Maxwell. By Occam's razor it had to be eliminated.' http://math.ucr.edu/home/baez/physics/General/occam.html

That an economist might not understand Occam's Razor is not all that surprising, of course, even when using just a single data point.

"when you have two competing theories that make exactly the same predictions, the simpler one is the better"

That's fine. The more interesting questions are first deciding what simpler means, when one theory isn't just the other plus a spandrel. And second, deciding how much better a theory has to be, to justify a little more complexity.

Simpler does not correspond to the number of parameters, one can be fairly confident, except that as noted by Prof. Tabarrok, 'Occam’s Razor is wrong,' so who cares, right?

Take a list of number pairs and remove the last ten, and apply the "theory" to the list to produce the parameter. Now generate the next ten numbers. Do they match within the predicted precision?

Encoding systems such as MPEG work because, in most cases, they predict the missing data with enough precision, assuming the data represents images of a common nature. Most such encoding fails for PowerPoint slides.

Astounding that you can write a comment longer than the post which inspired it, and yet completely miss the point.

That Prof. Tabarrok does not understand Occam's Razor seemed to be the main takeaway of several commenters here (admittedly, others point to his mathematical inability, if that is what you mean by the main point). I just decided to include information for anyone to read so they can make up their own mind concerning Prof. Tabarrok's observation that 'Occam’s Razor is wrong.'

I don't think he's missed the point. Taking this version, "when you have two competing theories that make exactly the same predictions, the simpler one is the better", Tabarrok's statement that Occam's razor is wrong is as absurd as it sounds. I imagine trying to produce a GDP trend or temperature time series by trying to find the exact precise theta (which is completely undetermined) would be quite a formidable grid search. And given the chaotic nature of the function, you would have to revise that parameter with each new data point. Think Google flu trends. The first year it was a success in predicting flu cases; the second year it was way off. That's what using this function to build a theory would look like. I would like to see Professor Tabarrok's response to all the comments, though. Perhaps we're all missing something.

This seems surprising, but the more I think about it the less surprising it is. If you're using a number of arbitrary precision, you can essentially encode information in the digits of that number. That doesn't seem to be what they're doing here, but it's one way to make this work.

It's not surprising at all. It's standard undergrad "philosophy", similar to monkeys on typewriters.

Anyone who read the post has already "expanded a number into a scatterplot of arbitrary resolution" because Alex posted an image of a scatter plot - literally a long number that the computer unpacked into an image.

Not really surprising when the parameter is incredibly complicated. Like some commenters have said, this isn't data - it's image compression. The author makes it look like a regression because it's boring otherwise. It has nothing to do with Occams razor and no implications for modeling anything.

The Occam's razor comment is off-base, but there is still an important point brought up by the paper: counting the number of free parameters in a model is not a good metric for evaluating model complexity. So we should throw out AIC/BIC and instead use alternatives like the number of bits needed to encode a model.

Is counting bits really an improvement though? Doesn't the number of bits depend on the encoding scheme?

The number of bits under optimal encoding is one measure. This is a step up from counting parameters.

There's a nice set of ideas under Minimum Description Length, which counts both the bits needed for your parameters, and the bits needed to describe your errors. This gives a trade-off between simpler and more accurate models, which you must have... in order to compare at the margin. Whether it's the correct trade-off is another question.
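(A back-of-the-envelope sketch of that bookkeeping, with made-up digit counts, purely to illustrate why "one parameter" is not automatically the cheaper model once you charge for the precision each parameter requires.)

    import math

    def parameter_bits(n_params, sig_digits):
        # bits needed to write down each parameter to sig_digits decimal digits
        return n_params * math.ceil(sig_digits * math.log2(10))

    # One theta that must be specified to ~1000 digits vs. three ordinary
    # coefficients known to ~6 digits each:
    print(parameter_bits(1, 1000))  # ~3322 bits
    print(parameter_bits(3, 6))     # ~60 bits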

Somewhat; see https://en.wikipedia.org/wiki/Kolmogorov_complexity#Invariance_theorem

I have long believed that Occam's Razor is more a definition of truth than a criterion for "discerning" it. When there are several explanations, we choose to adopt the most elegant/parsimonious/aesthetic. Anything else is just logical positivism. I was probably influenced in this by Feyerabend.

The equation is sin²(2^(rx) arcsin √θ). It is amazing that you can get so much out of this function.

So, theory matters.

Very interesting. Thanks for the pointer!

However for real-world modeling and analytics engaged in by most businesses, I am not sure about the implications.

The models used by most banks or marketing companies for purposes of assessing credit risk, profitability, or response to offers, among other things, are still based on relatively simple functions, e.g. logistic and linear regression, gradient boosting, bagging. Also, the distributions of features used for predictive modeling in the real world are seldom truly continuous. E.g., FICO score: there are only some 300-odd values it can take.

So given these realities, I guess the conventional modeling wisdom of using fewer features for ensuring stable model performance still holds.

This is merely substituting the higher complexity of more parameters with the higher complexity of a much larger parameter value.

So, it doesn't seem to disprove Occam's Razor. Indeed, it seems to just add additional proof in favor of Occam's Razor.

Given a choice between a 1-parameter function with 5 significant digits and a 1-parameter function with hundreds of digits, which is likely to be more universally applicable?

Occam's Razor says if the output is similar we should prefer the short version.

Reading this post carelessly, I misattributed the Occam's Razor view to the author of the paper, but he explicitly makes exactly your point towards the end:

"More generally, uncritical use of a “parameter counting” approach ignores the fact that a single real-valued parameter potentially contains an unboundedly large amount of information since a real number requires an infinite number of bits to specify...Quantitatively, parameter-counting methods should be dispreferred relative to model comparisons based on measures that incorporate the precision required of real-valued parameters, including Minimum Description Length [17]. The result also emphasizes the importance of constraints on scientific theories that are enforced independently from the measured data set, with a focus on careful a priori consideration of the class of models that should be compared [4]."

What was Taleb saying about economists?

Should've removed the quotes since I put blockquote.

"Aside from the wonderment at the result, the paper also tells us that Occam’s Razor is wrong. Overfitting is possible with just one parameter and so models with fewer parameters are not necessarily preferable even if they fit the data as well or better than models with more parameters."

Good lord. No, the paper tells you that the complexity of your theory is not solely measured by the "number of parameters." The complexity of each parameter is also important, because in theory (and now, in practice) you can pack an arbitrary amount of information into a single real number. Obviously!

This is not by any means a new result.

> Aside from the wonderment at the result, the paper also tells us that Occam’s Razor is wrong.

The No Free Lunch theorem already tells us that Occam's Razor (or any learning rule) is untrue in half of all mathematically possible worlds. The thing is, though, that it's extremely true in our world. Just like how most real-world functions are smooth, whereas the vast majority of mathematically possible functions are non-smooth.

One of the most persistent properties about our universe is that the physical behavior of proximate numbers tends to be similar. E.g. 88.35 year olds have similar mortality rates to 88.34 year olds compared to 22.54 year olds. The kind of function in that paper breaks this by making very small differences generate very large effects. That's an interesting, but ultimately unnatural, phenomenon.

Doug,

first of all, the NFL theorems assume that all possibilities are equally likely -- despite their different complexity. So they *assume* that Ockham's razor is false; they don't *show* it. This assumption in turn is usually justified by the principle of indifference, which says that all possibilities are equally likely if nothing else is known about them. But actually, their complexity *is* known, and their complexity is not the same. In these cases Ockham's razor (which says that simpler possibilities are more likely than others) is usually taken to overrule the principle of indifference. So the use of the principle of indifference is at least not obviously justified here. One could of course argue that it is more plausible than Ockham's razor even in cases of different complexities, since it is extremely hard to justify the Ockham claim that simpler possibilities are more probable. But the principle of indifference also has problems of its own, because there seems to be no unique way to "slice up" the possibility space into separate possibilities, which leads to inconsistent probability distributions. This is known as Bertrand's paradox. So it seems the NFL theorem stands on similarly shaky legs as Ockham's razor.

Second, you cannot both accept the NFL theorem and also claim to know that our world is in fact simple. According to the NFL theorem, all extrapolation methods of known data into the future are equivalent. So according to the NFL theorem tomorrow all oh-so-smooth physical laws could just as well change totally as they could stay the same.

As others have said, this is not a new result. The whole thing is in fact reminiscent of a piece of the history of mathematics about 150 years old: the discovery by the famous mathematician Cantor that the set of real numbers (or points on the line) and its square (the set of points in the plane) have the "same number of elements", that is, that these two sets can be put in a perfect correspondence. In other words, Cantor had shown that points in the plane (or, for that matter, in a space of any dimension, even countably infinite) could be represented without repetition or ambiguity by *one* real parameter.

Cantor wrote to Dedekind (another major mathematical figure of the late nineteenth century) about his discovery, in two letters showing his huge excitement, and was congratulated by his colleague and friend. In subsequent letters, Cantor began to make very bold statements, very close to "The existence of an equation fθ with this property highlights that “parameter counting” fails as a measure of model complexity [...]", a sentence from the abstract of the paper under discussion. In substance, Cantor claimed to have destroyed the "theory of dimension" of the geometers, with the dimension of a geometric object defined as the number of parameters needed to locate a point in it. Dedekind tried to damp his enthusiasm, noting that the "parameters" used by geometers were supposed (explicitly or not) to vary "continuously" as the point they represent moves, which was not the case in Cantor's construction, so that dimension theory was safe.

From this time it was clear that "parameter counting" as a measure of the "complexity" of a system in a class of systems made sense only if the parameters were forced to describe the state of the system in a certain restricted way (continuously, for instance, or in other contexts "differentiably", "analytically", "algebraically", "functorially", etc.) and for a certain, corresponding class of systems. The paper in question seems to reinvent the wheel in rediscovering this old piece of wisdom.

The large correspondence between Dedekind and Cantor is still a very readable, very interesting document for people interested in mathematics and their history.

Thank you, I was hoping you would comment on this post.

Thanks! Is there a book you recommend to read up more on this?

Googling for the correspondence between Dedekind and Cantor, some forum pointed me to:

https://www.amazon.com/Kant-Hilbert-William-Bragg-Ewald/dp/0198505361

From page 843 on, the early correspondence between the two is given, and more is given later. Joël should know better links.

This result is interesting because of the differentiability of the generating system and the fact that it is deterministic. Kolmogorov did a bunch of these interesting toy problems (most famously, a continuous-everywhere, differentiable-nowhere function).

However, connecting this to the razor by saying Occam ~ number of parameters is to be naive about a pretty major literature. You really need to add measure theory (probability) to get a useful definition. Then the problem has been very well studied and useful metrics defined (e.g. VC dimension, shattering dimensions on split sets, etc.). A classic example is using sin(x/w) to fit points on (0,π)×(-1,1) by just picking w in (0,1]. One parameter, yet many types of fits and overfits are possible depending on assumptions about how points can be drawn.
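(A crude, brute-force sketch of that classic example, not the textbook analytic construction: with a single frequency parameter, sign(sin(w*x)) can realize an arbitrary labelling of a few fixed points once w is allowed to be large enough.)

    import math

    xs = [1.0, 2.0, 3.0, 4.0]
    labels = [+1, -1, -1, +1]   # the labelling we want the one-parameter model to realize

    w = 0.01
    while not all(math.copysign(1, math.sin(w * x)) == lab for x, lab in zip(xs, labels)):
        w += 0.01               # scan upward until some frequency produces the requested signs

    print(w)                    # e.g. w ~ 1.58 works for this labelling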

Note that he doesn’t present the actual θ’s used because they run to hundreds or thousands of digits. Moreover, if you are off by just one digit then you will quickly get a very different picture!

I'd argue that it's effectively a multiple parameter function that is merely encoding the multiple parameters into a string of digits. All you have to do to revive Occam is measure it on the number of bits required for the input and output representations instead of the number of parameters.

I'm surprised that you're surprised. Of course a single parameter can be used to fit anything if you let the parameter have enough significant figures.

Here's another famous example of this from 2001: https://en.wikipedia.org/wiki/Tupper%27s_self-referential_formula

That one uses computational universality rather than dynamical chaos to encode the image into the parameter.

Both cases point towards a vaguer, hand wavy truth, which is that there's a tradeoff between expressive power and analytic tractability. As you let the language used to express the model grow in expressive power, it quickly becomes so powerful that it can express "anything", or at least a large enough class of things that understanding them in general is impossible, and only special cases can be understood. In this case, a little bit of nonlinearity is enough to draw a picture of an elephant.

You’re working off a dated, partial mathematical interpretation of Occam’s razor.

For a better interpretation see here:

https://medium.com/computational-science/deriving-occams-razor-using-bayes-theorem-6bf9676700d9

The complexity of your constants is relevant!

Haven't read the paper, but there is a really easy way to set up a one-parameter smooth function to perfectly fit a bunch of points.

The trick: simply encode a Turing machine in the function. The function is evaluated at integer points only, so we can let the function do whatever it wants at non-integer points in order to maintain smoothness. Choose the input constant to encode the state of RAM, and choose a smooth function which performs the action of a CPU under one clock-cycle, then do a back-of-the-envelope calculation for how many clock cycles are needed for the program you're interested in to run.

For example, the numeric constant could just be the machine code for an image display program, then a decimal point, then a .jpeg file.

All you need is the trick of defining it only on integer values. Just take an enumeration h of (Q × Q)^(<ω) (a countable set) and define the value on input n to be the minimal continuous curve connecting those points.

Now if you want it to be infinitely differentiable you need to do a bit more and instead find some nice continuously differentiable basis and fit them to the points, but it still doesn't require coding any Turing machines.

This post is proof that economists and academics in particular need to learn more math. The two results presented are not new and have been presented many times in the past. There are at least 3 other comments that have pointed this out. My respect for GMU has dropped down a notch. Looks like the Koch brothers are paying top dollar to stack the faculty with woefully underinformed but politically convenient academics who will happily push their agendas.

As a mathematician I actually disagree. If anything economists are too involved/knowledgeable about math for their own good.

I mean, I'm a logician, so I love oddities in set theory and the weirdness of uncountable sets, but spending a term as an undergrad in an econ class proving theorems about under what set-theoretic assumptions axiom set 1 for preference relations is equivalent to axiom set 2 was a crazy waste of economists' time. I mean, there is never a situation in which a real-world economic problem turns on whether or not these axiomatizations of preference relations may diverge if CH fails.

More practically, everyone can't know everything and just as I don't know standard economic facts about classes of macro models it's reasonable for them to focus on their field rather than trivia results like this.

arbitrary precision > 0
=============

I see a lot of this starting idea and it seems to be giving solutions to some old and critical math problems by letting the computation count grow at the same rate as precision increases. This, at its heart, is information theory but is also queuing theory over distribution nets. Shannon dunnit; give the guy a boost, he is up there with Einstein.

Thus, also from the paper: "This approach is related to attempts to encode computations into chaotic dynamical systems [8, 9]."
==============
Don't give me a formula; give me your measurements and, at some precision, I can give you the minimum state-variable form. They find a set of quants algebraic around the bound precision. These quants work as well as any other set to compute the formula. You design a spatial finite computer optimized for the data. This is Shannon optimum coding theory. The big step is assuming the process under study is entropy expanding, which implies that any other theory of the process is a node-by-node transformation of your optimal processor design. This is singularity stuff, our last great discovery; everything blatter will be embedded in the singularity, just beyond our reach.

For those who still find this paper "very surprising", e.g., a typical economist with an exaggerated impression of his feel for mathematics, here is a rough idea of what the paper does.

The author wants to find a function f(θ, x) of two variables θ and x with the following property: for any y_1,...,y_n, you can find θ between 0 and 1 such that f(θ, 1) approximates y_1, ..., f(θ, n) approximates y_n. One can assume each y_i lies between 0 and 0.5, without loss of generality.

To do this he uses the function m(θ) = 4θ(1 - θ), with θ between 0 and 1. Its graph is a single "camel hump" joining (0,0) -> (0.5,1) -> (1,0), and the range is the same interval [0, 1], this time with every point except one taken twice.

Thus, you can iterate m on [0, 1]. If you iterate twice, you get "two camel humps", the first between 0 and 0.5, and the second between 0.5 and 1.
Iterate 100 times, you get 2^99 camel humps, and the values in [0,1) are taken twice in each hump.

Thus the more you iterate, the more number of humps you get: given any small enough interval for θ to take values in, and any real number y between 0 and 1, a sufficiently large iteration of the function will give you a smaller subinterval on which the function approximates y to an a priori fixed extent.

Thus, the author's idea is to simply take f(θ, x) to be the rx-th iterate of the function m (where r is a sufficiently large integer depending on how closely you want to approximate): the point is that for integer x one can give a closed expression for the x-th iterate of m, which then makes sense even when x is not an integer.

Then f(θ, 1) has a large number (2^(r - 1)) of humps as a function of θ in [0, 1], f(θ, 2) a still much larger number, f(θ, 3) even larger, and so on. Get an interval where f(θ, 1) approximates y_1, then a smaller subinterval where f(θ, 2) approximates y_2, and so on.

To do this quantitatively, the author uses a somewhat clever and cute trick: he realizes that m can be "transformed" into a function that truncates the digits of a number written in binary form (read the paper for more details).

So Joël, I don't even know if something conceptually as nontrivial as Cantor's idea is involved.
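(To make the mechanism concrete, here is a small end-to-end sketch along those lines. It is a toy version with hand-picked constants, not the paper's exact construction, and double precision limits it to a handful of points. Each y becomes z = arcsin(sqrt(y))/pi, the leading r binary digits of the z's are concatenated into one number u, and theta = sin^2(pi*u); evaluating f_theta at x = 0, 1, 2, ... then shifts u left by r*x bits, exposing one encoded block at a time.)

    import math

    r = 12                                 # bits stored per data point
    ys = [0.12, 0.85, 0.40, 0.66]          # toy data, each value in [0, 1]

    u = 0.0
    for i, y in enumerate(ys):
        z = math.asin(math.sqrt(y)) / math.pi   # z lies in [0, 1/2]
        block = int(z * 2 ** r)                 # leading r bits of z
        u += block / 2 ** (r * (i + 1))         # append as the i-th r-bit block of u

    theta = math.sin(math.pi * u) ** 2          # the single parameter

    def f(theta, x, r=r):
        return math.sin(2 ** (r * x) * math.asin(math.sqrt(theta))) ** 2

    for x, y in enumerate(ys):
        print(y, round(f(theta, x), 3))         # each value comes back to within ~1e-3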

https://en.wikipedia.org/wiki/Tupper%27s_self-referential_formula

One could simulate this with either a space-filling curve (https://en.wikipedia.org/wiki/Space-filling_curve) rescaled to fill a rectangle of size (Xmax - Xmin) * (Ymax - Ymin), or any sufficiently quickly oscillating function, like y = sin(wx) for large enough w.

This construction doesn't seem any more deep than that of Tupper's self-referential formula.

This is not a new point. Here's a philosopher and statistician making the same observation 10 years ago: http://www.princeton.edu/~harman/Papers/Induction-Stat.pdf (see p. 18)

https://en.wikipedia.org/wiki/Turing_machine

Once I saw that the function required a high precision parameter, I assumed that parameter would just encode the x,y pairs. It sort of does, but more subtly using continuous functions. The information has to be somewhere.
