Should computers have their own websites?

January 14, 2012 at 5:59 am in Science, Web/Tech

I think so, if not now, then very soon:

Websites designed to be read by computers rather than humans could make it easier to share and use data, says Stephen Wolfram, creator of “computational knowledge engine” Wolfram Alpha. Writing in a blog post, he suggests that “.data” should join the likes of .com, .org and .net as a new top-level domain (TLD) for organisations to share data in a standard form, creating a “data web” that would run in parallel with the ordinary web.

Under Wolfram’s scheme, a website like wolfram.com would be accompanied by wolfram.data. A human visitor to wolfram.data would just see a list of publicly available databases, but a computer would be able to access and interact with the data itself.

Of course, this kind of data sharing is already possible thanks to application programming interfaces (APIs), the software instructions published by many web services that allow programmers to combine data in creative ways, such as plotting Twitter updates on a Google map. Each organisation’s API is different though, which can make them hard to use. Wolfram’s proposal would put data in a standard location and format, making it easier to access.

For the pointer I thank Michelle Dawson.

joshua January 14, 2012 at 6:18 am

As commenters on the original Wolfram blog post, the linked article, and elsewhere have pointed out, the suggestion of a .data TLD may be a commendable attempt to use the new TLD rules but is completely backwards to how websites work and it would make much more sense to encourage a data. subdomain. Due to the existing TLDs, there’s no way to know if Wordpress.data would refer to Wordpress.com or Wordpress.org, whereas data.wordpress.com and data.wordpress.org would solve both problems. You also can trust that data-dot subdomains belong to the domain owner, whereas a similarly-named domain on a data-dot TLD could belong to a squatter or worse.
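The ambiguity described above can be shown with a minimal sketch. The helper names `to_data_tld` and `to_data_subdomain` are hypothetical, and the domains are only the examples from the comment; nothing here performs a real DNS lookup.

```python
# Hypothetical helpers illustrating the naming argument; no network access.

def to_data_tld(domain: str) -> str:
    """Swap the last label (the TLD) for 'data', as a .data scheme would imply."""
    name, _, _old_tld = domain.rpartition(".")
    return f"{name}.data"


def to_data_subdomain(domain: str) -> str:
    """Prefix the existing domain with a 'data' label instead."""
    return f"data.{domain}"


sites = ["wordpress.com", "wordpress.org"]

# Both sites collapse onto one .data name, so the mapping cannot be reversed.
tld_names = {to_data_tld(s) for s in sites}

# Each site keeps its own distinct name under the subdomain convention.
subdomain_names = {to_data_subdomain(s) for s in sites}

print(sorted(tld_names))
print(sorted(subdomain_names))
```

Two different owners end up competing for a single .data name, while the subdomain form stays under each owner's existing registration.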

Turkey Vulture January 14, 2012 at 6:41 am

This was my first thought, and it still seems right.

Chase Saunders January 14, 2012 at 7:32 am

Exactly. And Wolfram knows as well… this is a PR stunt.

The fact that there is already an explosion of APIs demonstrates that this isn’t necessary, as does the existing RSS feed discovery mechanism, which can easily handle somedomain.com vs. somedomain.org because the links are embedded within the HTML of each site.
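For readers who have not seen feed discovery, here is a rough sketch of how a program might find the feed links embedded in a page's HTML. It assumes the common `<link rel="alternate" type="application/rss+xml" href="...">` convention; the class name `FeedLinkFinder`, the function `discover_feeds` and the sample page are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Media types conventionally used to advertise feeds (assumed set for this sketch).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {key: (value or "") for key, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            # Resolve relative links against the page the HTML came from.
            self.found.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.found


page = (
    '<html><head>'
    '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
    '</head><body>Hello</body></html>'
)
print(discover_feeds(page, "http://somedomain.org/"))
```

Because the link lives inside each site's own page, the same markup served from somedomain.com would resolve to that site's feed instead, with no separate naming scheme needed.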

anon January 14, 2012 at 10:40 am

+1

What is needed are more voluntary data standards.

Aaron January 14, 2012 at 11:09 am

Agreed. They should create some sort of engineering taskforce for the internet. Then people could draft standards and leave requests for comments on those standards.

The Original D January 14, 2012 at 11:30 am

Dry, very dry.

NAME REDACTED January 15, 2012 at 2:23 am

lol, good one.

NK January 14, 2012 at 9:35 pm

I disagree. As far as I can tell, .data wouldn’t have to contain the same data as .com. All he is saying is that data should be represented differently, easily readable and understandable to computers.
IBM.data may or may not contain data from IBM.net, but so what? If it didn’t contain data you (your computers, that is) needed, you simply wouldn’t use IBM.data and you would point your computers to – for example – the official IBM data domain HAL.data.

How difficult is that? And how difficult is it to provide data.xml on IBM.com to explain where your .data sites are? Squatting: SSL certificates anyone?

Aaron January 14, 2012 at 10:25 pm

And who would own whitehouse.data? The government, or the porn company that owns (owned?) whitehouse.com? If the idea is that you could check whitehouse.com and get a pointer to whitehouse.data for…um…data(?) then you’ve just created two problems:

- You’ve generated two recursive DNS queries where one will do. Now I have to ask the root name servers who’s responsible (what servers serve data) for whitehouse.gov *and* now whitehouse.data.

- Only one of the two aforementioned organizations can get whitehouse.data. So there’s no good mechanism to associate an existing TLD with its .data counterpart.

There are plenty of other reasons .data is a bad idea – take for example it would be shitting up the Internet with yet another annoying TLD – but mostly it’s just a question of Wolfram trying to reinvent the wheel.

This is pretty much *the* primary use case for subdomains.

david January 14, 2012 at 6:22 am

So… RDF? Except even more clunky and separated, which is what strangles RDF right now?

It’s not like attempts to achieve the semantic web are new… If there’s one thing computers excel at, it’s at following arcane instructions embedded in pages. TLDs are for humans.

The fundamental problem, that non-human data is easy to game and that humans are, in the end, the ultimate test of interesting content, is never dealt with.

Swedo January 14, 2012 at 7:35 am

The subdomain ‘www’ used to be the place for webpages by convention. ‘data’ or ‘api’ may also be useful conventions, but it’s hardly revolutionary.

Silas Barta January 16, 2012 at 12:03 pm

That would be a different (and better!) idea. Wolfram is proposing to replace the .com or .org with .data, which is stupid — you don’t know which TLD you’d be replacing!

Using subdomains to specify a data-only interface would be better, and parallels the practice of using the m. subdomain (e.g., m.example.com) for mobile versions of sites. It also allows nesting of subdomains — foo.data.example.com

NAME REDACTED January 14, 2012 at 7:53 am

1) The .data domain idea is stupid.
2) We already have web pages that are just for computers. Tyler wouldn’t know this, not being a techie, but there are lots of computer only pages. The .json file format is a good example.

Anonymous coward January 14, 2012 at 7:56 am

“Each organisation’s API is different though”
And there is a good reason for that: the nature of the organizations’ data itself is different. Attempts to shoehorn all potentially existing data into a single schema (for non-trivial meanings of ‘schema’) are doomed to failure for fundamental philosophical reasons, to say nothing of the organizational complications (design-by-committee etc.). Futile though they ultimately are, such attempts hold an irresistible charm for certain types of minds. Wolfram’s ‘A New Kind of Science’ is a textbook example of this type.

NAME REDACTED January 14, 2012 at 8:25 am

The XML format and the JSON format do exactly this.

Anonymous coward January 14, 2012 at 9:17 am

XML and JSON fall under the ‘trivial meanings of schema’ heading. By themselves they are nothing but key-value pairs on steroids, same as ASN.1, HTTP headers etc. To make use of the data in these formats, you (or the program) have to know what the keys and values mean.
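A small sketch of that point: two JSON documents can parse identically and share the same keys while meaning different things. The field names and the Celsius/Fahrenheit convention below are invented for illustration and are not taken from any real API.

```python
import json

# Hypothetical payloads from two imaginary services.
doc_a = '{"id": 7, "temp": 21.5}'  # assumption: service A reports Celsius
doc_b = '{"id": 7, "temp": 70.7}'  # assumption: service B reports Fahrenheit

a = json.loads(doc_a)
b = json.loads(doc_b)

# The parser is satisfied: same keys, same value types.
same_shape = a.keys() == b.keys()


def fahrenheit_to_celsius(value):
    return (value - 32) * 5 / 9


# Only out-of-band knowledge of the units makes the two values comparable.
b_in_celsius = round(fahrenheit_to_celsius(b["temp"]), 1)

print(same_shape, a["temp"], b_in_celsius)
```

The format guarantees the structure can be read; the agreement on what `temp` means has to come from somewhere else.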

NAME REDACTED January 14, 2012 at 9:45 am

Fair enough.

Rahul January 14, 2012 at 8:00 am

Looks like a solution in search of a problem. The Semantic Web is hardly a new idea. The separation of data and formatting is already happening with CSS. RDF, JSON, XML etc. are all intended to solve the same problem and are doing quite well.

Besides, he needs to figure out the incentives; he wants the websites to do all the heavy lifting that’s currently done by the data-miners. Would they oblige, and why?

Foo January 14, 2012 at 10:53 am

Regarding incentives, data mining of information that was not designed to be data mined tends to use a site’s bandwidth and computing resources in a very inefficient fashion, which can possibly lead to significantly increased costs if many people engage in that (or if few people do on a massive scale).

Furthermore data miners often have no incentive to save the site’s resources, and so might not bother to do so even if it is possible.

On the other hand, if the site provides an API that is complete and easy to use, everyone will use that since it saves development costs and works better, and the site gets to control the activity.

Rahul January 14, 2012 at 11:12 am

There are easier ways to block bandwidth hoggers and you need to do it anyways since data-miners aren’t the only ones abusing things. I don’t think developing an API is an efficient site-abuse solution.

Michael Stack January 14, 2012 at 10:41 am

Most sites already communicate in exactly this way, albeit not in the ‘.data’ TLD. These APIs are largely hidden from visitors but they are there.

Brian Moore January 14, 2012 at 10:47 am

The programmer responses to this post are very good, and they reinforce the point that this already happens, a lot.

NAME REDACTED January 14, 2012 at 11:07 am

…constantly.

Mark January 14, 2012 at 11:34 am

I don’t know what’s worse: Wolfram’s pomposity in “inventing” things that people have been doing for ages, with the twist of proposing a much worse way of doing it, or Tyler’s credulity in linking to it.

C’mon, does the insight “we could have…. computers… exchanging data with other… computers!!!” really strike you as noteworthy? It’s like something a Jeff Goldblum character would say.

JWatts January 14, 2012 at 10:19 pm

“I don’t know what’s worse: Wolfram’s pomposity in “inventing” things that people have been doing for ages, with the twist of proposing a much worse way of doing it, or Tyler’s credulity in linking to it.”

There’s nothing wrong with exposing new ideas. Not all ideas are necessarily good or viable and this one is clearly not a particularly good idea. But to bash people for just linking to it is a form of arrogance and does more harm than good.

Mark January 16, 2012 at 11:14 am

But this is not a new idea. And Tyler has absolutely no background on which to base his judgment “I think so, if not now, then very soon.” I don’t feel bad for being irritated by people who make pronouncements about areas completely outside their expertise.

Tangurena January 15, 2012 at 11:55 am

Wolfram’s pomposity. We already have “web pages for computers” and they’re called web services.

The first paragraph of this review of Wolfram’s book should give you an idea of how pompous Wolfram is:

>Once, I was one of the authors of a paper on cellular automata. Lawyers for Wolfram Research Inc. threatened to sue me, my co-authors and our employer, because one of our citations referred to a certain mathematical proof, and they claimed the existence of this proof was a trade secret of Wolfram Research.
http://cscs.umich.edu/~crshalizi/reviews/wolfram/

disclaimer: I own a copy of the book.

Marcos January 16, 2012 at 1:37 pm

Wolfram proposing a worse way to do it is by far the worst offender.

If he was just proposing that we start doing what we are currently doing, we could just comply. But asking for something worse has the potential to get people listening, and creates the possibility of getting what he asks for.

Davian February 1, 2012 at 6:45 am

You’ve got to be kidding me, it’s so transparently clear now!

CJC January 14, 2012 at 11:38 am

Well, take a look at the Google vs. Amazon rant by a Google engineer:

https://plus.google.com/112678702228711889851/posts/eVeouesvaVX

There are caveats for the rant, mainly that the guy is fairly low level at Google, etc., but his description of what happens at Amazon (they tend to design towards “platforms”) is what you talk about in this post.

anon January 14, 2012 at 3:49 pm

@CJC: Very interesting post at the link. Thanks for the pointer. And although I’m not an engineer, I have worked with many of them on a few large projects, and Steve Yegge’s point about designing towards platforms sure sounds correct to me.

I also really like his tag line: “Someday my foot won’t fit in my mouth.”
Exactly how I feel most days.

Brian Holtz January 14, 2012 at 12:18 pm

Wolfram admits that a .data TLD would just be a publicity stunt. Instead of publicity stunts, he should roll up his sleeves and understand/promote/improve one of the many solutions already available in this space.

One of the best solutions is Yahoo Query Language, which lets you query the Web as if it’s a database. Wolfram should have tried it at http://developer.yahoo.com/yql/console/ before posting his article.

CBBB January 14, 2012 at 12:44 pm

Stephen Wolfram – Jesus Christ talk about overrated people. Guys like Tyler who don’t know any better like him.

Matt Young January 14, 2012 at 1:45 pm

BSON web bots to the rescue!

A BSON web bot is itself a database format, a nested graph format, a BSON expression format. And BSON web bots cruise the Inner Tubes, looking for semantic pattern matches.

Two BSON bots can execute over each other, there is no fixed object layer.

matt mcknight January 14, 2012 at 3:30 pm

Microformats are being encouraged by Google to imbue existing web content with semantic information.

Rather than a domain, web frameworks such as Ruby on Rails allow you to request the data at the other end of the URL for a page, either by extension (.xml or .json instead of .html) or by setting the Accept HTTP header.
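As a rough, framework-neutral sketch of that idea (this is not Rails code): a server can pick the response format from the URL extension first and fall back to a request header. The sketch uses `Accept`, the standard header for content negotiation. The function name `pick_format` and the format table are hypothetical, and quality values such as `q=0.9` are deliberately ignored.

```python
import posixpath

# Assumed mapping from URL extension to media type for this sketch.
FORMATS = {
    ".json": "application/json",
    ".xml": "application/xml",
    ".html": "text/html",
}


def pick_format(path, accept=""):
    """Return the media type to serve for a request path and Accept header."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in FORMATS:
        return FORMATS[extension]
    # No usable extension: take the first listed media type we can serve.
    for item in accept.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in FORMATS.values():
            return media_type
    return "text/html"


print(pick_format("/person/1234.json"))
print(pick_format("/person/1234", "application/xml"))
print(pick_format("/person/1234"))
```

The same URL path serves both humans and programs, which is why no separate domain is required for the data view.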

Matt Young January 15, 2012 at 12:04 am

Check out MongoDB. Their engine uses the BSON format, they transmit in BSON, and their queries and data inserts use JSON as the human-readable text and query interface. So the client-server model has gone: query and data are merging into the same BSON format (after translating a query from JSON to BSON). Ultimately, the query is commutative; it doesn’t matter what a web site is, and the Mongo engine ultimately has no knowledge of what is the query and what is the data. Try writing a general-purpose semantic search machine. The concept of the web has been completely subsumed. No more client-server.

Martin Keegan January 15, 2012 at 7:22 am

This proposal is barely serious.

Why should anyone pay the .data registrar a fee to do what they can do already on their existing domain?

Web APIs already exist, and it can be highly advantageous for them to share the same infrastructure as the rest of one’s site’s web content. Cookies, in particular, can be used for authentication both to APIs and human-directed content, and cookies do not generally work across TLDs; if I have site.com/person/1234 and site.com/api/person/1234, I can use the same cookie, but site.com/person/1234 and site.data/person/1234 might not work so well. Does this run up against the JavaScript same-origin policy as well?
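To make the cookie and same-origin question concrete, here is a simplified sketch. It treats an origin as the (scheme, host, port) triple and uses a reduced form of cookie domain matching (exact host or a dot-separated suffix); real browsers apply further rules, such as public-suffix checks, that are left out. The function names are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a web origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)


def cookie_domain_matches(host, cookie_domain):
    """Simplified check: would a cookie scoped to cookie_domain be sent to host?"""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


print(same_origin("http://site.com/person/1234", "http://site.com/api/person/1234"))
print(same_origin("http://site.com/person/1234", "http://site.data/person/1234"))
print(cookie_domain_matches("data.site.com", "site.com"))
print(cookie_domain_matches("site.data", "site.com"))
```

Under these assumptions a path such as /api stays in the same origin, a data. subdomain is a different origin for scripts but can still share a cookie scoped to site.com, and a site.data host shares neither.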

Kudos to Wolfram for getting non-techies thinking about the importance of APIs, but his proposal is a non-starter to the point of crankiness.

Matt Young January 15, 2012 at 9:24 am

Been there done that, so many standards.
They have one thing in common, serialized formats, and there really is only one generalized format for serial data, namely nested stores like JSON, XML, and all the variations. All of them are based on the same principle: a graph that can be represented in a linear sequence with two operators to identify the position of any element in the graph. Is the element a sibling, or a descendant?

Then look at storage structure, in the browser and in the server: mostly moving to a B-tree retrieval format, indexed records. Modern browsers come complete with indexed storage.

What is in the indexed DB? Nested stores of DOM trees in the browser will become common, an extension of the intelligent cache model, keep structure close to its access point. But what is a DOM tree? A complete serialized BSON format.

Search patterns will have to be processed by the local indexed DB first; that is where the bookmarks, cookies and tags for the user are kept. Don’t matter how many APIs you present to user interfaces, underneath we are rapidly moving toward a model of indexed DBs talking to each other, talking the same language.

As Eric Schmidt would say, “Get over it.”

Repair Calgary Computer January 15, 2012 at 1:44 pm

I wonder if we’d have to own the data domains too or if they’d just come with the computer. Interesting idea.

Luke Bayes January 26, 2012 at 11:42 am

I’m a technologist who is largely ignorant of macroeconomics. I often find myself disagreeing with the conclusions of the posts here but appreciate how articulate the authors are and assume they have a depth of expertise and knowledge that I don’t. That this post would even get traction here diminishes my faith in this blog in some small way.

Tyler,

Please don’t lend credulous support to proposals that you don’t really understand. It distracts from an otherwise coherent, intelligent reading experience.

I can’t fathom why anyone that understands how the internet works would support, much less propose something like this. Has a registrar bought Wolfram?
