Why forced data portability is a mistake

“Let people export their Facebook data into competing social media services” — what could sound better?  It would seem to make the market more contestable and give consumers more choice — what could be wrong with that?

Alex already has pointed out that it is not “your data.”  And here is Alex on the mirage of data portability.  I would like to add a few points:

1. Presumably data portability would be imposed on Facebook’s competitors and potential competitors as well.  That would mean all future competing firms would have to slot their products into a Facebook-compatible template.  Let’s say that 17 years from now someone has a virtual reality social network innovation: does it have to be “exportable” into Facebook and other competitors?  It’s hard to think of any better way to stifle innovation.

2. Which are the eligible firms/services to which exportability is going to be required?  Do you really want government to be certifying the “legitimate players” in the market?  And branding other competitors and potential competitors as not worthy enough to deserve this legal protection?

3. How about when you tag other people?  If I’m only on Facebook, and only wish to be on Facebook, and you wish to export your tags of me and other me-related content to some other service, how much am I losing control?  Or do the regulators somehow mandate a “only data about you and you alone” standard?  Is that really going to be possible to enforce?

4. Didn’t we just go nuts on Mark Zuckerberg and call him before the commissars for exporting too much data to external services?  And in many and probably most of these cases with user consent?

5. Does a mandate for exportability also imply a mandate for importability?  Does Facebook have to swallow and digest your photos from Pinterest?  From Russian and Chinese services?  From very difficult to make compatible systems?  We might end up with walled gardens in any case.  I find it striking that many critics of the tech companies hold two directly incompatible views: a) you have to make them post everything, and b) you need to make them liable for what they put up.

Comments

The rules for the new money tech.
Keep your personal data in your plastic card, which can also verify, silently, that you meet any account restrictions you have agreed to. The system needs only know that you are a live thumb print, and that your cash device is not counterfeit.

We don't all have smart plastic cards, but some platforms, like Telegram, do a very good job of letting you trade and meet contracts without revealing your goodie. In Telegram, I think, your social media is encrypted, except for your chosen parties.

Behind the scenes, we have AI machines running autonomously, organizing your anonymous contracts such that goods are optimally distributed and there are no observable shortages that do not get immediately marked to market. So, no need for Facebook or Google or Swift: we end up with a shell like Telegram, and optimizing bots running Python always keep us optimum. We see the most informative stuff, according to our data contract with the All. No human needed except us and our desire for goodies.

How does it hide you walking down a street, walking into and through stores, hide you driving your car, parking your car at home, at work, while shopping, while working out at the gym, or hiking in a park?

The story about the repo man in WaPo?? was interesting. To get data on where cars to be repo'd are currently located, the repo man is producing data for free to a data broker.

"Repo agents are responsible for the majority of the billions of license plate scans produced nationwide. But they don’t control the information. Most of that data is owned by Digital Recognition Network (DRN), a Fort Worth company that is the largest provider of license-plate-recognition systems. And DRN sells the information to insurance companies, private investigators — even other repo agents.

DRN is a sister company to Vigilant Solutions, which provides the plate scans to law enforcement, including police and U.S. Immigration and Customs Enforcement. Both companies declined to respond to questions about their operations. The potential misuse of the plate data has drawn criticism from privacy groups. A federal court in Nevada ruled in January that the scans do not amount to unwarranted surveillance because they are essentially snapshots taken in public."
--- The surprising return of the repo man
https://wapo.st/2L04xz7

“Exportable” isn’t the same as “importable”. You could export the data in any form you like. It’s up to each service that wants to use it to learn how to do so. That’s how everything from bookmarks to contact lists works now. Standards efforts can reduce the effort, but that runs years late.

Obvious. TC's handwringing is not necessary here.

I think the claims in (1) and (2) are not major problems with the policy. The government could simply require the data be able to be exported in some standard, human-readable format that is publicly documented (a CSV format where what each variable means is documented), and leave it up to the competing sites to determine if they want to do the work to implement support for importing from the CSV files produced by Facebook or other competitors. A mandate for importability does not seem to be implied by this and would be a really bad idea as you mentioned in (5).
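As a rough illustration of the documented-CSV idea above, here is a minimal Python sketch. The column names, the `SCHEMA` dictionary, and the functions `export_posts` and `import_posts` are all hypothetical, invented for this example; nothing here reflects any real service's export format.

```python
import csv
import io

# Hypothetical schema: each exported column maps to its public documentation.
SCHEMA = {
    "post_id": "Opaque identifier assigned by the exporting service",
    "created_at": "Creation time, ISO 8601, UTC",
    "text": "Body of the post as the user typed it",
}

def export_posts(posts):
    """Write posts as CSV text, keeping only the documented columns."""
    buf = io.StringIO()
    # extrasaction="ignore" silently drops fields that are not in the schema.
    writer = csv.DictWriter(buf, fieldnames=list(SCHEMA), extrasaction="ignore")
    writer.writeheader()
    for post in posts:
        writer.writerow(post)
    return buf.getvalue()

def import_posts(csv_text):
    """What a competing site might do if it chooses to support the format."""
    return list(csv.DictReader(io.StringIO(csv_text)))

sample = [{
    "post_id": "1",
    "created_at": "2018-05-01T12:00:00Z",
    "text": "Hello, world",
    "internal_rank": 0.7,  # service-internal field, not part of the export
}]
exported = export_posts(sample)
roundtrip = import_posts(exported)
```

The exporter is only obliged to produce the documented file. Whether anyone writes the matching `import_posts` is left to the competitor, which is the distinction between exportability and importability drawn above.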

Big Government dictating every aspect of the data analysis industry?

Hey, Google and Facebook would happily define the data based on what data they had in say 2010, to be exported in 2020, so any competitor would need to invest $100 billion to use that data as seeds for collecting enough data on a hundred million people to be competitive.

I bet Google has invested at least a thousand dollars per person in its database over the past quarter century collecting and organizing the data from thousands of sources.

How can Google export the data collected from your activities over the past quarter century, especially when it is not tagged as specific to you personally?

Exporting all the data that relates to a person seems impracticable and probably infeasible.

But one can imagine an intermediate solution. I imagine a user can authorise a third party to access his/her Facebook data. So I can use a different social network, but still see part of the Facebook activity on a different platform.

For an example, check the Payment Services Directive (EU), which would allow users to instruct their bank to share their data (for instance, with a third-party app or financial service provider).

https://en.wikipedia.org/wiki/Payment_Services_Directive

A similar directive could be designed for photos, connections, etc. It would look like sharing your Instagram update in your Twitter feed. The difference is that Facebook could not block such sharing, because it would be written in the law.

'Exporting all the data that relates to a person seems impracticable and probably infeasible.'

You do know that Facebook already provides this service, right?

Accessing & Downloading Your Information - https://www.facebook.com/help/1701730696756992?helpref=hc_global_nav

I used the word "export" wrongly. I did not mean asking Facebook to download the data in bulk, but asking Facebook to show my data to a third party in real time, via an API. Facebook does this, but in a very limited way, and it is up to them who gets access to FB data. A directive would put the user in charge of who can access their FB data.
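To make the "user in charge" idea concrete, here is a minimal Python sketch of a consent check sitting in front of a data API. Everything in it is an assumption made for illustration: the `ConsentRegistry` class, the `fetch` function, the scope names, and the in-memory `store` are hypothetical and do not describe Facebook's API or the mechanics of any directive.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRegistry:
    """Hypothetical record of which third parties each user has authorised."""
    grants: dict = field(default_factory=dict)  # (user, app) -> set of scopes

    def grant(self, user, app, scopes):
        # The user, not the platform, adds scopes for a third-party app.
        self.grants.setdefault((user, app), set()).update(scopes)

    def revoke(self, user, app):
        # The user can withdraw all access from an app at any time.
        self.grants.pop((user, app), None)

    def allowed(self, user, app, scope):
        return scope in self.grants.get((user, app), set())

def fetch(registry, store, user, app, scope):
    """Return one scope of the user's data, but only if the user granted it."""
    if not registry.allowed(user, app, scope):
        raise PermissionError(f"{app} has no '{scope}' grant from {user}")
    return store[user][scope]

registry = ConsentRegistry()
store = {"ann": {"photos": ["beach.jpg"], "friends": ["bob"]}}
registry.grant("ann", "other-network", {"photos"})
photos = fetch(registry, store, "ann", "other-network", "photos")
```

In this sketch the platform never decides which app may call `fetch`; it only checks the grant the user recorded. A request for the `friends` scope would raise `PermissionError` until the user grants it.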

Fair enough, but the point of consent becomes more difficult. Plus, unlike exporting data that belongs to the user (in the sense of EU law), an API does come with a cost to Facebook, and would allow a competitor to piggyback on Facebook. I am not saying that I personally have much of a problem with this, but it is not a completely frivolous objection. (The major difference to net neutrality is that in the case of net neutrality, I pay for access to the Internet, which is the service being provided by the company that took my money. In the case of an API, Facebook receives no payment from me to provide that service, at least under the current framework.)

Even electronic anonymity does not stop you being tracked via meta data. To put this debate in context I think we need to relate our electronic identity to our physical persona, and see if we are judging both by the same standard.

It is impossible, for example, to be invisible in public, although it is relatively trivial to be unnoteworthy. Likewise in the physical world, it is not possible to avoid a video capture which could be viewed and reviewed at a later date, but it is difficult and often illegal to correlate this public knowledge with private knowledge, such as where we live, who our friends are, our political views and what our salaries or bank balances might be.

We have had centuries of struggle to establish our freedoms, and we have had many examples of the downsides of those freedoms being restricted, and we should insist on the same freedoms in our electronic lives as we do in our physical lives - it should not be an option to sign away those freedoms in multi-page Ts & CS designed to protect the violators of our rights instead of recognising and protecting them.

The penalties for doing so should be crippling to the companies concerned, to the company directors and officers concerned, and to the company investors concerned, so that there is no doubt where the axe will fall and so that everyone who has a duty of oversight is liable for any violation of law or principle.

#1 is not a problem.

This very website can be read with edge, chrome, firefox, opera and a long list of other web browsers. The HTML/CSS data and scripts can be read by any of them.

The MR site is "portable". Does this "portability" stifle innovation?

Right. Someone used Jupyter Notebook a little while back to reprocess public MR data. Everyone approved.

I commented then that they could do much more with the private data Alex and Tyler claim is "not ours" - the email and IP addresses associated with each comment.

Basically Tyler is making assertions here that he knows are wrong, that he probably wouldn't follow through on himself.

Social media always has embedded social agreements. "Not yours" is just a stupid hard-line rejection of that reality.

This doesn't seem to be a fair analogy. Sure, web pages are portable in the very generic sense that web servers listen on port 80/443, clients can handle HTML/CSS/JavaScript, and they both agree to use HTTP/TCP, but because everyone now makes this choice (or in the case of data portability would be induced to do so) the barriers to exit are enormous and there isn't much innovation in *this* particular area.

Most of the innovation now is on top of these protocols/technologies, or adjacent to them. In a sense this standardization pushes innovation either up a level, to services built on top, or orthogonally into new areas. Just as the adoption of IP at the network layer shifted innovation more towards the transport layer, the adoption of TCP/UDP shifted it more to the application layer.

I think the point of #1 is more about forced, premature standardization. Once we get to this point, 'horizontal' competitive innovation tends to slow, and it's more practical and profitable to move into other areas.

If data is in any reasonably defined format it can be reprocessed, at a later date, into bright shiny new formats.

You see it every day, when email is translated into html for web viewing.
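A small Python sketch of that kind of reprocessing, using only the standard library. The sample message and the `email_to_html` function are made up for this example, and it assumes a plain-text, single-part message; real webmail rendering handles far more cases.

```python
import html
from email import message_from_string

# Made-up plain-text message in the long-established header/body format.
RAW = (
    "From: alice@example.com\n"
    "Subject: Lunch & learn\n"
    "\n"
    "See you at <noon>.\n"
)

def email_to_html(raw):
    """Re-render a single-part text email as an HTML fragment."""
    msg = message_from_string(raw)
    # Escape every field so characters like & and < survive in the new format.
    subject = html.escape(msg["Subject"] or "")
    sender = html.escape(msg["From"] or "")
    body = html.escape(msg.get_payload())
    return (
        f"<article><h1>{subject}</h1>"
        f"<p>From: {sender}</p><pre>{body}</pre></article>"
    )

page = email_to_html(RAW)
```

The old format needed no knowledge of HTML when it was defined. Being well specified was enough for a later program to translate it.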

Staggering. This is a blog ... with an RSS feed ... arguing that standards based content syndication is impossible, or if not impossible infeasible, or if not infeasible undesirable.

Walk outside and ask the first ten people you meet if they know what RSS stands for, what it is, or if they use it. How many times, total, do you think you'll hear 'yes' ?

Exactly as many as would know what SMTP stands for. Yet email remains usable in spite of mass ignorance.

When I read a numbered list of concerns on MR, I'm used to them being intractable points of political conflict or hard and bitter pills to swallow about the social order. Most of these seem like eminently solvable technical problems.

Actually, with the exception of 3, all of them can either be considered long solved technical problems or long solved technical non-problems.

3 is an interesting point that has absolutely nothing to do with data portability per se, but goes to the very heart of what personal data means. And in a sense, it will never be solved, if only because different legal systems have different frameworks in which the question exists. Though as Prof. Cowen's example shows, he is not aware of American law surrounding this point - a picture taken of you in a public place without commercial interest does not belong to you, and you have absolutely zero rights regarding what the photographer does with that image in the sense of having zero claim to any copyright interest (at least in such an example).

'Alex already has pointed out that it is not “your data.”'

And he was wrong.

'It’s hard to think of any better way to stifle innovation.'

Really? Tagging as a concept seems to have worked out pretty well. Maybe you have heard of this data network called the world wide web? Hard as it might be to imagine, you can actually share data using it. (And HTML was predated by SGML, and was followed by XML - it is the concept that is powerful, just like TCP/IP.)

Further, you are aware that Facebook itself has ported its data, right?

'Which are the eligible firms/services to which exportability is going to be required?'

Are you familiar with the financial industry? Amazingly, such questions did not seem to cause problems when creating common standards for data transfer. Then there is this thing called the Internet - amazingly, all devices connected to it seem to be able to use it at a fundamental data exchange level, if not in every single aspect.

'How about when you tag other people?'

This is a real problem that has nothing to do with data portability. I have never had anything to do with Facebook, but undoubtedly exist as a shadow user.

'Didn’t we just go nuts on Mark Zuckerberg and call him before the commissars for exporting too much data to external services? '

What 'we'? And the question of consent seems to have flown right over your head.

'Does a mandate for exportability also imply a mandate for importability?'

No.

'Does Facebook have to swallow and digest your photos from Pinterest?'

No.

From Russian and Chinese services?

No.

From very difficult to make compatible systems?

See the point about tagging above. If you run a photo sharing site that does not support video files, then all files marked *.mp4, *.ogg, *.webm, etc will not be supported - obviously.

'I find it striking that many critics of the tech companies hold two directly incompatible views: a) you have to make them post everything, and b) you need to make them liable for what they put up.'

What is even more striking is how some people can continue to show they have absolutely no awareness of what they continue to write about, even after repeated attempts to demonstrate just how much education is required before they can even begin to understand the subject.

Where does the comparison to the financial industry come from? All the data interfaces and transfers are about decreasing transaction costs. The transactions are the result of some good or service exchanged. The demand for these transactions services is extraordinary since little occurs without them.

The data collection and analysis is designed to generate income. That is why the structures of data collection have been built.

As you say there are already standards of data available. Then what?

Much of this data has no value outside of its sharing context. A photo of downtown Vancouver is one of millions taken, only gaining value and relevance through sharing and connecting it to some context: who was there, when, for what event, etc.

I'm not complaining that this data is available, or defending the practices of these behemoths. But these systems are useful because of their complexity. The data outside of those contexts can be made useful only with extraordinary effort.

So the question that Tyler is asking is whether government will require that the data be exportable in a way that makes it useful. Because outside of the software structure it likely isn't.

'Where does the comparison to the financial industry come from?'

Who defined the 'financial industry?' The point was in reference to this - 'Which are the eligible firms/services to which exportability is going to be required?'

'All the data interfaces and transfers are about decreasing transaction costs.'

Actually, when it comes to something like FATCA or various compliance regulations, that is not true. https://www.irs.gov/businesses/corporations/foreign-account-tax-compliance-act-fatca

'The data collection and analysis is designed to generate income.'

Compliance regulations are about combatting terrorism and proliferation. In the broad sense, they are what tripped Cohen up, by the way. https://en.wikipedia.org/wiki/Bank_regulation_in_the_United_States#Anti-money_laundering_and_anti-terrorism

'So the question that Tyler is asking is whether government will require that the data be exportable in a way that makes it useful.'

Let me be extremely generous, and say that is at least a possible interpretation. One that has absolutely no technical merit, but it might be what Prof. Cowen meant.

'Because outside of the software structure it likely isn't.'

What data structure? Are you familiar with just how much meta-information is attached to a digital image taken by a smart phone? Of course, if you add a caption or identify the people in the picture, you have added information. And the way that Facebook stores it is eminently comparable to the already existing EXIF data.

We are talking about data, after all. Talking about whether that data will be used in a fashion that is identical to what Facebook does is simply an incorrect perspective on data portability, as any database programmer will tell you. If you want 100% Facebook functionality, you have to use Facebook. At least until Facebook changes something, as it does on a regular basis.

There's a reason why Zuckerberg said facebook would welcome government regulation.

I suppose there may be good economic arguments why no policy should be allowed which threatens Facebook's business practices and position in the market.

Both Alex and Tyler lean hard on a confusion between data and metadata.

My Christmas pictures are clearly my Christmas pictures. Perhaps social media has extra data about when and from where I shared them.

How did that become not mine?

And as I say, why would anyone lean hard on that not being mine?

These discussions would be a lot better if people stopped using "mine."

"I requested FB to configure FB's system to allow Tyler Cowen to see this photograph that I made." You didn't give up the copyright, but you did ask a third-party to do something to a system that they built, own, and run. We can argue about how informed your consents were about what they could do with the knowledge that the request had been made.

I chose pictures for a reason. Under current law I believe there is an implicit Copyright, declaration not required. Facebook may have some boilerplate about assignment - but it relies on an unsubstantiated chain of possession.

What if I mail you my private photos (controlled distribution) and you post them (per boilerplate, but clearly uncontrolled distribution)?

If you are Facebook you paper it over with assumptions, and make it "someone else's fault."

If you are Tyler you try to say it was never yours anyway.

IANAL

There is some deep history on trying to do public hypertext systems which track copyright of every item. The designers of those systems all went mad before completion. Which is why the World Wide Web was initially mute on copyright. It avoided the madness. But in so doing it created a lot of grey areas that persist today.

Have you ever visited a web page that says "Copyright. All rights reserved"? That is nonsensical, right? A web page exists to be copied to your computer for reading. Copyright, in its original meaning, is simply incompatible with information systems which work by copying.

Social media just represents new gray areas and new agreed principles for those gray areas.

And as I say, I think "not yours" is the stupidest possible response.

Data portability means that if someone gains access to my Facebook account, or breaches Facebook's security, they can easily impersonate me on every other social network I don't already have an account for.

Facebook as an OAuth identity provider already means that.

'if someone gains access to my Facebook account'

Yes, if that happens today, you will have problems. Including the following one - 'they can easily impersonate me on every other social network I don't already have an account for.'

What's the (logical) difference between "it's not your data" and "you didn't build that"?

Easy. One arbitrarily assigns ownership to a corporation, one recognises diffuse interests.

The digital equivalent of "you didn't build that " would be a demand that all data be free. Cue Richard Stallman.

You are misunderstanding what is required by data portability.

All facebook, or any other competitor, would have to do is allow the user to export data encoding their contributions to the site (presumably with exceptions for contributions which can't be exported without violating privacy protections for another user). There is NO REQUIREMENT that this data can be uploaded in a way that makes sense on any other competing platform. For instance, there would be no requirement on tumblr to figure out how to map their concept of reblogging on to facebook's notion of comments in order to comply with data portability.

Secondly, data portability generally wouldn't cover the kind of content that is reasonably viewed as being created by facebook, e.g., history of what kind of clicks you make. Rather it is about ensuring that the companies can't use the threat of you losing all the posts and pictures you've put up on their service to ensure you don't jump ship to a better service.
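The two rules described above (export the user's own contributions, minus anything that would expose another user, and minus service-generated records such as click history) can be sketched in a few lines of Python. The record fields `author`, `kind`, and `tagged`, the `USER_CONTRIBUTED` set, and the `exportable` function are hypothetical names chosen for this example, not any real platform's data model.

```python
# Hypothetical item kinds that count as the user's own contributions.
USER_CONTRIBUTED = {"post", "photo", "comment"}

def exportable(items, user):
    """Select the items a user could take with them under the rules above."""
    result = []
    for item in items:
        if item["author"] != user:
            continue  # someone else's contribution
        if item["kind"] not in USER_CONTRIBUTED:
            continue  # service-generated record, e.g. click history
        if set(item.get("tagged", [])) - {user}:
            continue  # mentions another user; withheld under this policy
        result.append(item)
    return result

items = [
    {"author": "ann", "kind": "post", "tagged": []},
    {"author": "ann", "kind": "photo", "tagged": ["bob"]},
    {"author": "ann", "kind": "click", "tagged": []},
    {"author": "bob", "kind": "post", "tagged": ["ann"]},
]
mine = exportable(items, "ann")
```

Withholding a tagged photo entirely is only one possible policy. Stripping the tags and exporting the photo itself would be another, and which of the two a regulator would accept is exactly the open question raised in point 3 of the post.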

--

Personally I think the whole privacy worry thing is going the wrong way and that the danger to avoid is a situation where only the government and a few big corps know people's dirty secrets so I have no qualms about this allowing more access to data.

What about a weaker requirement, just that companies can't have rules preventing people from writing interoperable software? I was under the impression that you don't see other social media apps interoperating with Facebook because doing so would be a violation of their terms of service and get all of your users banned. (Ideally, you could just use Facebook's API without restrictions based on competition, but I assume that would just lead to companies not putting anything important in their API)
