The Mirage of Data Portability

In The Facebook Trials: It’s Not “Our” Data I wrote:

Facebook hasn’t taken our data—they have created it.

…Moreover, it’s the prospect of profits that has led Facebook and Google to invest in the technology and tools that have created “our data.” The more difficult it is to profit from data, the less data there will be. Proposals to require data to be “portable” miss this important point. Try making your Facebook graph portable before joining Facebook.

In an important post, Will Rinehart adds detail:

Contrary to the claims of portability proponents, however, it isn’t data that gives Facebook power.

Facebook’s technology stack, the suite of technologies that it uses behind the scenes, clearly shows the importance of scaling, as much of the architecture was developed in-house to address the unique problems facing Facebook’s vast troves of data. Facebook created BigPipe to dynamically serve pages faster, Haystack to efficiently store billions of photos, Unicorn for searching the social graph, TAO for storing graph information, Peregrine for querying, and MysteryMachine to help with end-to-end performance analysis. Nearly all of this design is open for others to use, and has been a significant boon to programmers in the ecosystem. The company also invested billions in content delivery networks to quickly deliver video, and it split the cost of an undersea cable with Microsoft to speed up information travel.

The vast investment that Facebook has put into programs for understanding and processing its users’ data points to the fundamental flaw in the argument for data portability.

…Requiring data portability does little to deal with the very real challenges that face the competitors of Facebook, Amazon, and Google. Entrants cannot merely compete by collecting the same kind of data. They need to build better sets of tools to understand information and make it useful for consumers.


"…Requiring data portability does little to deal with the very real challenges that face the competitors of Facebook"

Maybe. But a lot of that technology in the stack seems directed toward providing things that are much more valuable to Facebook than to its users. I'm not on Facebook, but my impression of the users I know is that they would prefer a much simpler service -- something like the Facebook of 10 years ago (basically messenger plus photos and posts by friends and little else). If a competing service were available that offered that experience and was interoperable with Facebook, I think FB would face a significant risk of mass defection.

This is not true of Amazon and Google, though, where keeping customers depends very little on network effects (it doesn't matter whether or not your friends also shop on Amazon or use Google Maps to navigate).

The alternative to data portability is of course standardised third party access like a wholesale service (much like permitting cross network interactions between utility networks). Perhaps there would be more merit in arguing for greater standardisation and access to data points via APIs across different platforms. That might mean that portability isn't that necessary - if I could communicate across networks, like sending an FB post to a Twitter user, or allowing a Twitter user to access an Instagram photo gallery.

Effectively this would split FB into two potential businesses: FB wholesale, which provides data storage and structures which are accessible to third party retail network businesses; and FB retail, which continues to offer its existing user interfacing product to retail customers.

Third party access would also potentially ease barriers to new entrants, and remove the incentive for incumbent network businesses like FB to acquire any new businesses whose network reach could ultimately threaten their core business (like Instagram).

Thirdly, third party access via standardised data formats would also provide competitive tension against existing ad loads and invasive privacy standards on FB, and could provide a mechanism for existing FB users to leave FB without leaving the FB wholesale service (and losing their retained data). This might be preferable to the current situation.

Obviously this would reduce the incentive to invest in new data structures and services. Incumbent networks would also need to find a way to continue to earn a fee for storing exited users' data and photos which are now used by a third party provider. And I’m sure there would be technical complexities around ‘third party access for data’ which are infinitely more challenging than I can appreciate right now.

Facebook uses technology Facebook created to run Facebook - so?

This is like arguing that only Gmail offers what Gmail offers - and that there is no way for a competitor to offer something the same, or even comparable.

Take another look at those tech pieces. They can be used in a number of large applications. As a FB naysayer, it's cool that they've opened these up to others to use.

I've taken a look at several of them, and like you said they can be used in (very) large applications. I'm guessing the largest 1%?
The point of the post cited by Alex is that "...[it] has been a significant boon to programmers in the ecosystem". What programmers? The 1% who actually develop software that benefits from that stack?

That doesn't seem like a lot of value for society. And should I subtract from that all the wasted effort thanks to countless developers using those same technologies for problems that don't need them? ("it's cool, let's use it anyway!")

That 1% of largest applications provides a lot of services.

"("it's cool, let's use it anyway!")" You do definitely have a point here.

'They can be used in a number of large applications.'

Again, so? Not to dismiss in any sense the usefulness of GNU/open source software, but that is not the sort of thing that either Prof. Cowen or Prof. Tabarrok has apparently ever considered worth discussing on this web site.

Notice that the GPL plays a role in several companies' development of software, not just Facebook's. Here is some reporting concerning that from a year ago - 'On November 27, Red Hat, IBM, Google, and Facebook announced that they would give infringers of their GPL software up to a 30-day hold-off period during which an accused infringer could cure a GPL violation after one was brought to their attention by the copyright holder, and a 60 day “statute of limitations” on an already-cured infringement when the copyright holder has never notified the infringer of the violation. In both cases, there would be no penalty: no damages, no fees, probably no lawsuit; for the infringer who promptly cures their infringement.'

"Again, so? " With this you both dismiss and admit that you were wrong in the previous comment. With dozens of unrelated bombast following.

'With this you both dismiss'

Yep, I am dismissing your examples.

'and admit that you were wrong in the previous comment'

When pointing out that the GPL requires precisely such sharing? With an actual link about the part Facebook plays as a member of the GPL community?

'With dozens of unrelated bombast'

I'm guessing you mean words, because I just might have used a dozen words before linking to what was pretty major GPL news at the time.

That Facebook does not ignore the GPL is to Facebook's credit, by the way. It just has basically nothing to do with Prof. Tabarrok's attempt to create a manticore.

"When pointing out that the GPL requires precisely such sharing?"

The conversation was about the application tools they have created, so yes, you responded with a wall of words unrelated to what we are talking about. On a subject you brought up.

Actually yes - the Gmail service is superb and it is quite hard to offer something similar (and this is a *mail* service, something that has been around for over 3 decades). So why would you expect data portability to have a noticeable effect on a service that has been around for only a few years?

Sure, we could have a facebook competitor. But would this regulation really matter? Can't they fetch their own data if they manage the feat of making something similar to facebook?

Some people have woken up to Trump's victory and are lashing out. They have caught Facebook in their wild flailing about. Bad news for Facebook. Of course Facebook wasn't doing anything anyone did not know - nor did they do anything they were not praised for doing when it helped elect Obama. But the mob needs a scapegoat and it is Facebook's turn. So tough luck for them (although all they have to do is hold out until the fuss dies down and then the Great and the Good on the Left will be defending them again. Look at Tom Brokaw)

Still, the Twitter Mob insists there is a problem. Is there? Some people seem to think so. What is to be done? Lots of ideas floating around. This seems to merge several of them. Making data portable is really about helping people leave Facebook. It is so much work to re-connect with all your Middle School friends if you move social media platforms. So you are stuck unless you can take all your address book and so on with you.

That does not really solve the problem.

The issue is really that Facebook asked nicely about our data and we said yes. Only it turns out that data analysis is "emergent". They get a lot more out of it than we thought they could and it may well add up to a Very Bad Thing.

So the real solution, apart from not giving them any data at all, is to insist that they cannot use that data without our permission and even then only for the purpose they expressly told us, and even then we should be allowed to withdraw. Basically the ethics and consent process that is common in most of academia these days.

The more I read articles like that, the more actual sympathy I feel for Trump.

The whole is such a mess of vague accusations, associations, insinuations, lack of criminal specifics and general ad hominem, that it makes me think Trump is LESS likely to be guilty of a criminal offence. There doesn't seem to be any criminal "centre" to the investigation. Just the pursuit of tawdry details used to nail people for procedural crimes of lying to the FBI. If so much smoke, where is the fire?

Trump's critics refuse to specify what exactly he is supposed to be guilty of, and dissemble endlessly. What exactly is his crime? They are convinced he has done something terrible, but what? It seems to mostly consist of being elected, and not being Hillary Clinton. The investigation certainly makes Trump seem unsavoury and a bit dim, but it also makes his critics look irrational and viciously deranged.

If hints of malfeasance make you think Trump is less likely to be guilty, that is irrational. That's not how math works. That's not how Bayes' Theorem works. You are absolutely wrong that Trump's critics are simply refusing to specify what exactly he's guilty of -- they have little alternative. Critics are waiting for the DOJ investigators to do it and perhaps also elected reps. if the Congress flips. Why should pundits have to state what Trump is guilty of? They are relatively less powerful compared to the DOJ and the Congress so pundit opinion has to exhibit caution. These things I've written are plain and simple so the mystery is why you don't see it the way I do.

Unfortunately for you, Brian, I'm better at Bayes' Theorem. Perhaps because my priors are better formulated.

I expect an extensive search to have uncovered prima facie serious wrongdoing already. If serious wrongdoing, I expected search convergence, not divergence. If serious wrongdoing, I expected much more smoke at this stage. If serious wrongdoing, I would NOT expect Stormy Daniels and other trivia to be a main avenue of pursuit.

Given the limited discovery and the scope of the search, I have thus downgraded my odds for (serious wrongdoing). Please note my background priors for ANY US politician are P(clean) = 0.2, P(minor malfeasance) = 0.7, P(serious crime) = 0.1

Basically, I think if you put any senior US politician or businessman through the wringer the way Mueller is, you'll come up with a list of minor stuff. The laws are written such that it is almost impossible not to make some minor transgressions in a successful life. So I'm very unimpressed so far.
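
The update this comment describes can be sketched in a few lines of Python. The priors are the ones stated in the comment; the likelihoods are purely hypothetical numbers chosen for illustration, since the comment gives none, and a different choice would move the result in the other direction.

```python
# Priors as stated in the comment above.
priors = {"clean": 0.2, "minor": 0.7, "serious": 0.1}

# HYPOTHETICAL likelihoods, P(evidence | hypothesis), where the evidence is
# "a long, wide search that has surfaced only minor items so far".
# These values are assumptions for illustration, not taken from the thread.
likelihoods = {"clean": 0.3, "minor": 0.8, "serious": 0.2}


def bayes_update(priors, likelihoods):
    """Return the posterior P(hypothesis | evidence) for each hypothesis."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())  # P(evidence)
    return {h: value / total for h, value in unnormalized.items()}


posterior = bayes_update(priors, likelihoods)
for hypothesis, probability in posterior.items():
    print(f"P({hypothesis} | evidence) = {probability:.3f}")
```

The posterior for "serious" drops below its prior only when the likelihood assumed for that hypothesis is lower than the overall probability of the evidence. If the evidence is instead taken to be more likely under serious wrongdoing, the same function pushes the number up, which is where the disagreement in this thread actually sits.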

This might work if minor malfeasance has a negative correlation with serious wrongdoing. That would be an unusual understanding of human nature. The correlation is probably positive.

So Melania is not fluent in 5 languages and you are not a statistician as you claim to be. Are you detecting a pattern yet?

Another reason for scepticism:

Many of the "promising" avenues of investigation have obviously run into the ground. The Russian collusion arc (the main point of the investigation?!?) seems to be a complete bust now; we've already found every contact Trump's extended circle had with Russia and there's nothing to hang anyone with. Just a string of tawdry business deals that never panned out. The Russian information campaign was indeed real but laughably small and ineffective, and the Russians seem as surprised as anyone at Trump's triumph.

Team Dem set out to hang Trump for Treason and is now trying to settle for a campaign finance hush payment in the $100k range and a side of sexual smears?!? Well, fine, try and make it stick, but jeez, the way you lot are carrying on is pathetic.

Manafort built his career in political work for foreign dictators. I suggest that you don't know what Manafort did when he worked for Trump. In that case your opinion that there's "nothing to hang anyone with" is uninformed and premature. You seem to be satisfied with your armchair perspective and I think that's an inappropriate position from which to draw a conclusion. The GOP shares your preferences. They don't want to look up a blocked caller number near the time that the Trump Tower meeting was discussed. How difficult could that be? Not difficult at all.

More 'effing supposition, hearsay, and insinuation.

I am not a political naif and know how the world works. I don't like it, but there's nothing here beyond the "normal" level of sleaze in our ruling class. You talk about Bayes Theorem; which is the bigger prior to be concealed? Minor sleaze and indiscretions or serious crimes?

Get some evidence with information content, not noise.

I agree minor sleaze is more probable than serious wrongdoing in a randomly selected politician in the absence of information about the character of that politician.

However, the random selection process doesn't apply here. The DOJ has acquired details known and unknown to the public and the media have done some work also. Obviously we know the character of the man because he sought to make himself known.

If I may, I might also add that the whole "Russia hacked the election" thing cost your team a LOT of credibility and downgrades the weight I give your subsequent assessments here. It increases the odds that your perspective is seriously distorted by mood affiliation.

I thought, from the start, the Russian operation was a limited 7-figure operation with spoiler intent that had pretty much zero effect. But I indulged your team whilst the allegations became ever more vague and grandiose, from hacked voting machines to 100's of millions and a grand Kremlin plan to install Trump (rather than just cause chaos). I had to listen whilst liberal family members assured me that they had seen "secret" information that was going to reveal the horrifying scale in the coming weeks, but it never did. Even after the underwhelming indictments (half of which are frankly legally dubious and the other half are mostly procedural crap which will never be prosecuted), the ridiculous grandstanding continued to the point of hauling Zuckerberg over the coals... I felt sorry for the guy. You vote Dem all your life, donate generously, and then get humiliated by your patrons for a non-event that you couldn't control.

It was all a bust. My original assessment of scale and effect seems about right. So I'm not well disposed to this follow up act.

To say Russia certainly did not tip the election would seem to make sense if the margin of victory was big and yet it wasn't. I think the margin of victory in WI, MI, PA totals about 0.56% so just about anything could have been the last straw on the camel's back. A few hundred cubicle workers in Eastern Europe leveraging the internet, possibly. Jill Stein, likewise. Comey, likewise.

The weakness of the Russia argument is that Stein and Comey also tipped the election, but that doesn't make the argument wrong.

It doesn't matter to me what the "liberal team" said because I'm not on anybody's team, so rationally my credibility is not affected by team cred.

... and my cred is boosted when you let me have the last word here and on the other threads.

BTW, there are critics, ex-prosecutors that have for many months been stating exactly what crimes they believe were committed. You're not looking much if you haven't encountered these people. Still they haven't got the investigatory powers of the DOJ and Congress and ex-prosecutors haven't had access to seized materials from the raids so, as I suggested, forbearance is reasonable.

And yet you do not specify them. Lose another point.

What are the crimes and what is the evidence? I am infuriated with the vagueness and hearsay in this matter from Trump's critics.

It seems you're getting emotional. Try researching "Akerman" on evidence of obstruction and conspiracy and quid pro quo. If you don't think obstruction is a serious wrongdoing then make yourself comfortable in the banana republic.

Why would a Trump-allied Congress have informed you of evidence? Consider the possibilities of individual dishonesty and concealment and dissembling. There are constraints on smoke production. If the Congress flips in November, check for smoke after the flip.

You make my point fully, through tedious Newsweek and Daily Beast crud. The charges sought in your references are not directly about a crime, but procedural violations like "obstruction" and "lying to the FBI".

In the absence of evidence of a real crime, serious or otherwise, (where is that Russian treason, exactly? What is the statute infringed?) we have a 3-ring circus whose job is to provoke and trawl about until someone makes a procedural error in covering up some ridiculous indiscretion and then Mueller et al. jump on them shouting "gotcha". As an exercise in the deep state it is loathsome; a system where everyone can be found guilty of something under infinite investigation and power resides in prosecutorial discretion and judicial activism.

I remember Ken Starr. This is twice as ridiculous as that was.

So, I am pleased I "researched" your material. This analyst and statistician is certainly infuriated by the dissembling displayed. Given the vaguery and animus of the allegations and the inability of Trump's critics to point to real crimes rather than procedural ones, I will leave it to history to decide which of us is "emotional".

If obstruction is not a crime then how do you explain what happened on July 27, 1974? Wikipedia seems to think obstruction can get 20 years and conspiracy with Russia 5 years.

Akerman worked on the Watergate prosecution so I wouldn't call him crud.

Alistair, why didn't Vekselberg's cousin pay Cohen directly? Instead the money was put where the Stormy payment came from. That's an account that appears to have been created for facilitating Trumpy activities. Presumably Cohen has his own bank account.

"Facebook hasn’t taken our data—they have created it." ?

In 1950, did a Ma Bell telephone book "create" personal data or merely organize it well in a convenient and relatively portable form? Do encyclopedias or dictionaries "create" data or just provide a useful service in organizing existing data?

I believe it was copyrighted information.

Not in the U.S. - a collection of facts is not copyrightable. 'Feist Publications, Inc., v. Rural Telephone Service Co., 499 U.S. 340 (1991), was a decision by the Supreme Court of the United States establishing that information alone without a minimum of original creativity cannot be protected by copyright. In the case appealed, Feist had copied information from Rural's telephone listings to include in its own, after Rural had refused to license the information. Rural sued for copyright infringement. The Court ruled that information contained in Rural's phone directory was not copyrightable and that therefore no infringement existed.' (Of course, this decision is later than the 1950s.)

... and eventually, as the publishing of phone numbers got more ubiquitous and the stakes got higher, people demanded the ability to opt out. Same cycle with caller ID blocking. As now with FB.

The tech creates a powerful way to aggregate and publish data. Eventually, the users of the aggregated data overreach, and so a tipping point of public outcry is reached.

Just because I use a telephone does not mean ATT has a right to own my number and publish it against my wishes.

Yes, I think this is the best comment.

If Bissell and Drake invent technology for drilling below the ground to find oil and Daniel Plainview comes along and drinks your milkshake by extracting the oil below your land, did they "not take your oil -- they have created it"?

I thought it was spelled Fascbook.

We're headed towards a collision between European style "personality and moral rights" and American copyright laws. The French and the EU seem to have the upper hand - a consequence of Brexit.

Also none of the FB technologies listed here are new. They are just more efficient and brazen implementations (because of centralization) of processes that also exist in more decentralized form. A successful FB-busting strategy should focus on mandating that closed systems like FB provide public APIs to the data (possibly with some kind of time delay to protect FB's economic rights). This is where moral rights come in - I predict legal challenges, at least in Europe, that will breach the walls of these walled gardens.

>Also none of the FB technologies listed here are new

Doing something at scale is new. My mom loves these technologies because she can stay in touch with her children and grandchildren who are spread all over the world.

Social networks are essentially a scaled technology that grows because they work as expected across a very broad array of places, equipment and skill levels.

The legal challenges will break up these behemoths by building barriers to entry.

It is worth noting that the question has only come up when the entrenched powerful were surprised by a challenge to their power which came through these networks.

I don't trust anyone in this situation.

'It is worth noting that the question has only come up' in 2011, when Facebook signed the FTC consent decree concerning ensuring customer data was kept private.

'Former Federal Trade Commission officials say that Facebook Inc. appears to have breached a 2011 consent agreement to safeguard users’ personal information and may be facing hundreds of millions of dollars in fines.

The agency could fine Facebook up to $40,000 per violation per day -- which could add up quickly with millions of users involved -- if it finds the social media giant broke its earlier promises to protect user data, they say.'
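
The "add up quickly" arithmetic in the quoted report can be sketched as follows. The $40,000 ceiling comes from the quote; the violation counts, and the assumption that each affected user counts as one violation on a single day, are hypothetical and say nothing about how the FTC would actually count.

```python
# Ceiling quoted in the report above, in dollars per violation per day.
MAX_FINE_PER_VIOLATION_PER_DAY = 40_000


def max_exposure(violations: int, days: int = 1) -> int:
    """Upper bound on the fine, assuming every violation is fined at the ceiling."""
    return MAX_FINE_PER_VIOLATION_PER_DAY * violations * days


# HYPOTHETICAL violation counts, one violation per affected user.
for violations in (1_000, 1_000_000):
    print(f"{violations:>9,} violations x 1 day -> ${max_exposure(violations):,}")
```

This is only the theoretical ceiling under those assumptions; it is not an estimate of any fine the agency would impose.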

Feature not bug. There seems to be so little awareness that everything people complain about with FB is central to the business model and routine. Being manipulated by Russkies drives people apoplectic, but being manipulated in exactly the same way by Amazon not so much.

They don’t care about the Russkies, it’s all about any weapon to hand to attack Trump.

If the Russkies were using Facebook to AstroTurf anti-fracking hysteria (just as a completely hypothetical, never to be imagined in the real world example) there would be no real concern.

I get your point that the topic is opportunistic. (Although I do in fact think the Clinton wing of neocons has a Russia fixation that predates the Trump stuff).

Nevertheless, the dialog around the outrage is myopic and ignorant of how FB works. But I don't think they can contain the thread. Already we are seeing the criticism of FB regarding Russian trolls opening up doors to all sorts of complaints. I believe the howling about bots opened some eyes to all of it. Sort of broke the spell.

'They don’t care about the Russkies'

You are of course right. It is about Facebook violating the 2011 FTC consent decree (see above).

We are manipulable, and it is difficult to admit the degree to which we are manipulable. Presently we are looking for someone to blame for this manipulability without recognizing how our own nature contributes.

Apparently, Facebook knows more about me than I know about me. That isn't that much.

I think it is better for everyone that facebook open sources the technologies they have created than the data they have collected.

I'm surprised nobody is talking about the privacy issues around this:

Mormons rulz!


All those promises that voluntarily cataloging our DNA wouldn't be used against us.

I know, I know, they supposedly got a really bad dude, albeit one that apparently put himself out to pasture already.

But of course that's how it always starts.

The cryptoscene and the blockchain it rides on are coming to eat the Facebook/Google lunch ... check out @instartoken as the go-to app for monetizing every consumer's data, seamlessly.

I'm looking forward to reading Radical Markets by Eric Posner on this topic

I think this article conflates two things. There is our public/private data, and there are infrastructures to share both kinds of data.

If I share pictures on Instagram that's intentional, and yes there is a big infrastructure to support it. But if Instagram deduces things about me and my travels, and sells that metadata to advertisers or political operatives, I don't think "but there is an infrastructure" matters much.

Keep your eye on the ball. The loss of privacy is about unintentional disclosures, compiled in petabyte databases, and shared incestuously by organizations of all types.

For that reason privacy is functionally dead in the US, until such time as we decide some rules for "gray data" retention.

By the way, at least a dozen organizations with whom I have relationships have sent me updated privacy rules in the last week. Does anyone know what's going on? Is this their reaction to Facebook's scandals or was there a change in law that triggers these new revisions?

The General Data Protection Regulation (GDPR) (EU)

That is interesting. It would be nice if I got a peripheral benefit.

Correct. In addition, although the GLBA (Gramm-Leach-Bliley Act) has been around for years, colleges and universities are about to face closer scrutiny of how they use and protect constituents' data.

AlexT: "Nearly all of this design is open for others to use, and has been a significant boon to programmers in the ecosystem" - as long as you play by Facebook rules. Recall the UK programmer during the Zuckerberg testimony that was cut off by FB after years of profitable collaboration because they had the temerity to dispute FB in public.

"They need to build better sets of tools to understand information and make it useful for consumers."

For "understand" read "sell" and for "consumers" read those who would manipulate behavior.

That said, there are dozens of companies who can build these tools, and many different ways that data can be monetized. However, due to network effects, only the first movers will successfully acquire data.

Should also add React and Flux to the publicly available technologies Facebook started. As front ends grow in complexity, FB has certainly pioneered ways to maintain GUIs much more simply and predictably at scale.

Rinehart is wronger than a sizzling NY strip at a vegan restaurant. FB's data is its #1 asset. No data means no $$$ means no tech. He has cause and effect completely backwards.

Facebook users don't need "end-to-end performance analysis". A paid-for-service competitor wouldn't invest in things meant for advertisers.

That said, portability doesn't need to be implemented by Facebook. A would-be competitor can just ask for your permission to export your graph with or without Facebook's help. Yes, it would be easier if Facebook helped but not absolutely necessary.
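
What "exporting your graph" might look like on the competitor's side can be sketched as a round trip through a neutral file format. Everything here is hypothetical: the `PortableProfile` fields, the `portable-graph/0` format label, and the function names are invented for illustration and do not correspond to any real Facebook export or API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PortableProfile:
    """HYPOTHETICAL minimal record for one user and their friend links."""
    user_id: str
    display_name: str
    friends: List[str] = field(default_factory=list)


def export_graph(profiles: List[PortableProfile]) -> str:
    """Serialize profiles to JSON under a made-up format label."""
    payload = {
        "format": "portable-graph/0",  # invented label, not a real standard
        "profiles": [asdict(profile) for profile in profiles],
    }
    return json.dumps(payload, indent=2)


def import_graph(payload: str) -> List[PortableProfile]:
    """Rebuild profiles from the JSON produced by export_graph."""
    data = json.loads(payload)
    return [PortableProfile(**entry) for entry in data["profiles"]]


graph = [
    PortableProfile("u1", "Alice", friends=["u2"]),
    PortableProfile("u2", "Bob", friends=["u1"]),
]
restored = import_graph(export_graph(graph))
print(restored == graph)
```

The sketch moves only identifiers and friend links. It does nothing about whether the friends on the other end of each link have consented or joined the new service, which is the harder part of the portability question raised in the post.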

For those who wish for a public API to their own data, go to and read the documentation. You can see nearly everything. The only exceptions will be where someone else’s privacy takes precedence over yours (for example, seeing who has blocked you or deleted your friend request).

Would-be competitors would take care of it. The issue of "data portability" seems to me more appropriately expressed as whether the API is good for exporting or if the competitor has to resort to web scraping. If Facebook wants to be friendly to advertisers then much can be exported, but advertisers probably don't (yet) have a neural net wanting to stream entire posted videos, so Facebook can drag its feet when it comes to exporting that kind of thing.

The listed technologies solve the problems that Facebook has at trying to run Facebook as a profitable monopolist social network, not problems that face its users in trying to share messages and pictures.

One can imagine an alternative history in which social networking evolved on peer-to-peer lines (or more realistically, peer-to-peer with certain superpeer nodes), that would require little that did not exist in 1999, and would be difficult to make monopoly profits at. (Superpeers would have made normal operating profits, akin to paying a few dollars for SMTP service that doesn't involve Google spying on your email.)

Data portability has been one of the major successes of this technology era.

Users upload their data to Facebook, so presumably if they wished to retain this data (in its original form) they could do so just by archiving it?

Facebook processes this user data (and other data users have agreed to provide, such as location). But to the extent this processed data differs from what the user uploaded it would seem to be Facebook's data now and not the user's.

Complaining that Facebook uses data you provided to it for its own purposes seems like complaining that broadcasters put ads in their broadcasts: that's the basic business model that pays the bills.

Although I remain mystified at the grip Facebook seems to have on many of its users, and the persistent, frequent "I just need to check ..." behaviors produced by this.

The argument should be about the degree to which your data (that you agree to put on Facebook) is used, manipulated, and sold to other companies at our expense.
