The Universal Data Model

February 10th, 2008

Several people objected to the proposal posted on Stefan’s blog from the WSO2 guys. Mark Baker and Bill de hÓra argue vehemently against it, saying we should move toward a universal data model such as RDF/OWL. While I agree that the proposed solution is not necessarily foolproof or what should ultimately be adopted, it’s good that we’re discussing this, and we need *something* like it.

This looks like another long, never-ending, I-won’t-listen-to-you type of debate, but I’ll chime in with a cent or two.

Universal All the Way to the Data Model?

Should we just use a data format like RDF/OWL, which can be molded into just about anything? Paul Fremantle does a good job of answering this:

The problem of more general media types is that they just push the problem elsewhere. For example, we could do everything with RDF triples, but then you have just pushed the problem to be one of finding a suitable ontology.

One of the best things about the web is how minimal the number of truly centralized services are - and media types just doesn’t seem - to me - to be in the same class as DNS root servers and IP address allocation.

We need more protocols like AtomPub

AtomPub does a great job of generalizing things so you can do useful work without knowing much about a service. That generality comes at a cost in functionality, though. To those who say we need more things like AtomPub: I would like to see proposals. Either you make the protocol more generic, which reduces the functionality you can reach without knowing anything about the service, or you make it more specific, which increases that functionality but reduces the number of use cases it applies to. Either way it does not solve the problem at hand, because some problems are always left outside the solution.

Also, even AtomPub has to be extended for real services. We have OpenSearch extensions, GData extensions, security extensions, and so on. When I encounter one of these extensions, how do I figure out what it is? Sure, it’s easy if it’s some universal extension. But try inventing a real (semi-complex) service without your own extensions or your own microformat. It’s nearly impossible.
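
To make the extension question concrete, here is a minimal Python sketch of what a generic client can actually do when it meets an extended Atom entry. The entry itself is a made-up example; only the Atom, OpenSearch and GData namespace URIs are real. The client can tell *that* something foreign is present, but nothing in the document tells it what the extension means or where to read about it.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical entry mixing core Atom with OpenSearch and GData extensions.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
       xmlns:gd="http://schemas.google.com/g/2005">
  <title>Example</title>
  <opensearch:totalResults>42</opensearch:totalResults>
  <gd:comments/>
</entry>"""


def foreign_namespaces(xml_text):
    """Return the sorted non-Atom namespaces used by elements in the document."""
    root = ET.fromstring(xml_text)
    found = set()
    for element in root.iter():
        tag = element.tag
        # ElementTree spells qualified names as "{namespace}local".
        if isinstance(tag, str) and tag.startswith("{"):
            namespace = tag[1:].split("}", 1)[0]
            if namespace != ATOM_NS:
                found.add(namespace)
    return sorted(found)


if __name__ == "__main__":
    for ns in foreign_namespaces(ENTRY):
        print(ns)
```

All the client gets back is a list of opaque namespace URIs. Whether any of them resolves to something useful is exactly the discovery gap discussed here.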

I’m all for AtomPub and more things like it. But people should keep these two things in mind.

Get your head out of the Internet and into the intranet

Read Stu:

Media type proliferation is a governance problem. On the Internet, the IANA is the governing body. In an Intranet, …. it depends on your governance model. What’s clear is that having everyone’s IT department register their own vnd media type seems both silly and untenable because those media types will not likely be general. So they’ll have their own corporate & partners registry.

Sure, you can say those media types should be general. See the above, though.

Cultural Issues

There is this idea that we should stay far, far away from anything that even remotely reminds anyone of WSDL (even if it’s not an IDL) because it can be misused. I can’t believe that no one is throwing up their arms against this idea. I suppose this will make me very non-cool, but just because a tool can be used in a bad way doesn’t mean that you should never use it. I can use an ice pick to kill someone, but that doesn’t mean one shouldn’t have ice picks. (If you’re twisted you could even view this versatility as a strength of the ice pick ;-))

Update: Just to clarify, I agree with much of what Mark/Bill say. But, to say that the *idea* is completely misplaced seems wrong to me. I still think we need a way to:

a) Discover more about a media type which I create for my intranet/business partners/etc. All I need is a way to associate a media type with a link.

b) Discover more about XML extensions or microformats embedded in things like AtomPub.
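
As a rough illustration of point (a), here is a minimal Python sketch of associating a media type with a link. Everything specific in it is an assumption made for the example: the `vnd.example` media types, the registry URLs, and especially the `describedby` Content-Type parameter, which is not a registered parameter and is not part of the proposal under discussion. It only shows how little machinery the idea needs.

```python
from email.message import Message

# Hypothetical corporate/partner registry: media type -> link to its description.
REGISTRY = {
    "application/vnd.example.invoice+xml": "http://registry.example.com/types/invoice",
}


def split_content_type(header_value):
    """Split a Content-Type header value into (media_type, params)."""
    message = Message()
    message["Content-Type"] = header_value
    # get_params() returns (name, value) pairs; the first pair is the media type itself.
    params = dict(message.get_params()[1:])
    return message.get_content_type(), params


def description_link(header_value, registry=REGISTRY):
    """Return a link describing the media type, or None if nothing is known.

    A "describedby" parameter on the header (an assumption of this sketch)
    takes precedence over the registry entry.
    """
    media_type, params = split_content_type(header_value)
    return params.get("describedby") or registry.get(media_type)


if __name__ == "__main__":
    print(description_link("application/vnd.example.invoice+xml; charset=utf-8"))
    print(description_link("image/jpeg"))
```

The registry lookup corresponds to the "corporate & partners registry" Stu describes, while the parameter lets a server point at a description directly without any central list.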

2 Responses to “The Universal Data Model”

  1. Stu Says:

    Regarding “pushing the problem elsewhere”: The whole point is that you want to be able to break problems down with appropriate levels of indirection. It’s the whole point behind abstraction, and it’s a major driver for RESTful architectures.

    For example, there is a classic tradeoff between generic representations and specific representations. In XML, I could say

    value1
    value3
    value3

    or

    value2
    value3

    There are pros and cons to either approach. From one POV, RDF is a well specified case of the former, and raises interoperability to the level of logic instead of just syntax, like what we (try, sometimes) do with relational databases.

    OTOH, I wouldn’t want to transfer or interpret bitmap images as RDF triples.

    So, if I want to associate metadata with the JPG, either I:
    1. provide a wrapped representation with another format, say, an Atom entry (Nick Gall’s suggestion in my blog comments), or more commonly, HTML and Microformats
    2. provide an RDF content negotiated representation for the image, see this note as an example
    3. add an HTTP header or parameter to Content-Type (this debate)
    4. modify JPG itself to enable URI hooks (not bloody likely)

    These probably could be generalized to any media type that allows for extensions or mixed vocabularies.

  2. Stu Says:

    Egads, I didn’t escape my XML there. Here was the example:

    <set name="stuff">
    <row>
    <column key="true" name="a">value1</column>
    <column name="b">value2</column>
    <column name="c">value3</column>
    </row>
    </set>

    or

    <stuff key="a">
    <a>value1</a>
    <b>value2</b>
    <c>value3</c>
    </stuff>
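
The tradeoff between the two representations can be sketched in a few lines of Python. This is an illustration only, and it assumes the generic form was meant to carry the same three columns (`a`, `b`, `c`) as the specific one. The generic reader works for any `set`/`row`/`column` document without knowing its vocabulary; the specific reader has to know the `stuff` element and its children in advance.

```python
import xml.etree.ElementTree as ET

GENERIC = """<set name="stuff">
  <row>
    <column key="true" name="a">value1</column>
    <column name="b">value2</column>
    <column name="c">value3</column>
  </row>
</set>"""

SPECIFIC = """<stuff key="a">
  <a>value1</a>
  <b>value2</b>
  <c>value3</c>
</stuff>"""


def read_generic(xml_text):
    """Read any <set>/<row>/<column> document; column names come from the data."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.findall("row"):
        rows.append({col.get("name"): col.text for col in row.findall("column")})
    return root.get("name"), rows


def read_specific(xml_text):
    """Read the <stuff> vocabulary; the element names are hard-coded here."""
    root = ET.fromstring(xml_text)
    if root.tag != "stuff":
        raise ValueError("unknown vocabulary: " + root.tag)
    return root.tag, [{name: root.findtext(name) for name in ("a", "b", "c")}]


if __name__ == "__main__":
    print(read_generic(GENERIC))
    print(read_specific(SPECIFIC))
```

Both readers recover the same data, but only the generic one keeps working when a new vocabulary shows up. The price is that the meaning of each column has to be agreed on somewhere else.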
