Using Galaxy to discover schemas

February 12th, 2008

I agree that stuffing a URI in an HTTP Content-Type header isn’t the best way to discover more metadata about a mime-type. And maybe decentralized mime types are a bad idea.

But there are still times when you may encounter unknown elements and want to find a schema. Here’s a quick example of how you could use Galaxy to do this.

Searching for schemas by XML element name

1. Download Galaxy (this is a snapshot, but it’ll be released soon enough)

2. Run Galaxy:

$ java -jar galaxy-web-standalone-1.0-beta-2-SNAPSHOT.jar

3. Post your schema to Galaxy:

$ curl -v --data-binary @myschema.xsd -u admin:admin --header "Slug: myschema.xsd" --header "X-Artifact-Version: 1.0" http://localhost:8080/api/registry/Default%20Workspace

This adds your schema to the workspace “Default Workspace”. (Workspaces are just folders.)

4. Query Galaxy for a schema whenever you discover an XML element you don’t know [1]:

$ curl -v -u admin:admin "http://localhost:8080/api/registry?q=select%20artifact%20where%20xmlschema.targetNamespace%20=%20'http://www.example.org/test/'%20and%20xmlschema.element%20=%20'testElement'"

5. Parse the result for /feed/entry/content/@src to get links to any schemas that describe that XML element.
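Step 5 can be scripted. Here is a minimal Python sketch, assuming the response is a standard Atom feed in the http://www.w3.org/2005/Atom namespace. The sample feed and the function name schema_links are made up for illustration; I haven’t copied them from real Galaxy output.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace; assumed to be what the registry feed uses.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def schema_links(feed_xml):
    """Return the src attribute of every /feed/entry/content element."""
    root = ET.fromstring(feed_xml)  # root is the <feed> element
    links = []
    for content in root.findall("atom:entry/atom:content", ATOM_NS):
        src = content.get("src")
        if src:  # skip entries whose content is inline rather than linked
            links.append(src)
    return links


# Hypothetical response body, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>myschema.xsd</title>
    <content src="/api/registry/Default%20Workspace/myschema.xsd"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    for link in schema_links(SAMPLE_FEED):
        print(link)
```

In practice you would feed it the body returned by the curl command in step 4 instead of SAMPLE_FEED.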

Searching by media type

Let’s pretend that you didn’t actually have to have your media type approved by IANA, or just didn’t care. We can also look up schemas by media type. We’ll pick up the process above at step #4.

4 (version 1). Log in to the web interface for Galaxy. Select the schema. Add a new property to the metadata where the id is “mediaType”, the description is “Media Type” and the value is “application/vnd.myContentType+xml”.

4 (version 2). Modify the metadata via Atom:

a) Get the atom representation of the schema:

$ curl -v -u admin:admin http://localhost:8080/api/registry/Default%20Workspace/myschema.xsd\;atom > entry.atom

b) Edit the entry.atom file and add the following to the entry/metadata element:

<property name="mediaType" value="application/vnd.myContentType+xml" />

c) PUT the new atom representation of the schema:

$ curl -v -u admin:admin -T entry.atom http://localhost:8080/api/registry/Default%20Workspace/myschema.xsd\;atom

5. Search for schemas associated with your media type:

$ curl -v -u admin:admin "http://localhost:8080/api/registry?q=select%20artifact%20where%20mediaType%20=%20'application/vnd.myContentType%2Bxml'"

6. Parse the result for /feed/entry/content/@src to get links to any schemas that describe that media type.
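If you’d rather not edit entry.atom by hand in step 4 (version 2) b, the change can be scripted. This is only a sketch: I’m assuming the entry contains a single metadata element and that the new property should live in the same namespace as that element. The sample entry and its urn:example:galaxy namespace are placeholders, not Galaxy’s real format.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def add_media_type(entry_xml, media_type):
    """Append <property name="mediaType" .../> to the entry's metadata element."""
    root = ET.fromstring(entry_xml)
    # Find the first element called "metadata", whatever its namespace is.
    metadata = next(
        (el for el in root.iter() if local_name(el.tag) == "metadata"), None
    )
    if metadata is None:
        raise ValueError("no metadata element found in entry")
    # Reuse the metadata element's namespace for the new property (assumption).
    ns = ""
    if metadata.tag.startswith("{"):
        ns = metadata.tag[: metadata.tag.find("}") + 1]
    prop = ET.SubElement(metadata, ns + "property")
    prop.set("name", "mediaType")
    prop.set("value", media_type)
    return ET.tostring(root, encoding="unicode")


# Hypothetical entry, for illustration only.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>myschema.xsd</title>
  <metadata xmlns="urn:example:galaxy"></metadata>
</entry>"""

if __name__ == "__main__":
    print(add_media_type(SAMPLE_ENTRY, "application/vnd.myContentType+xml"))
```

Note that ElementTree may rewrite namespace prefixes (ns0, ns1, …) when it serializes. The document means the same thing, but it won’t be byte-for-byte what you downloaded.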

Conclusions

Pros: Easy to find and manage schemas.

Cons: It’s not universal. You need to know about Galaxy. Decentralized media types are evidently bad (see Mark’s post).

Endnotes

1. Hooray for URL encoding. In case you’re wondering, the q query parameter is really:

select artifact where xmlschema.targetNamespace = 'http://www.example.org/test/' and xmlschema.element = 'testElement'

More friendly URLs are on the way…
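Until then, you don’t have to encode the query by hand. A small Python sketch, using only the standard library; the helper name registry_query_url is mine, and the base URL is the one from the curl examples:

```python
from urllib.parse import quote, unquote

# Registry endpoint used in the curl examples in this post.
BASE_URL = "http://localhost:8080/api/registry"


def registry_query_url(query, base_url=BASE_URL):
    """Build a registry search URL with the query percent-encoded into q."""
    # safe="" also encodes '/', '=', "'" and '+', so a media type such as
    # application/vnd.myContentType+xml keeps its plus sign (as %2B).
    return base_url + "?q=" + quote(query, safe="")


if __name__ == "__main__":
    element_query = (
        "select artifact where xmlschema.targetNamespace = "
        "'http://www.example.org/test/' and xmlschema.element = 'testElement'"
    )
    url = registry_query_url(element_query)
    print(url)
    # Decoding the q parameter gives back the readable query.
    print(unquote(url.split("?q=", 1)[1]))
```

This encodes more characters than the hand-written URLs above (for example = becomes %3D), but both forms decode to the same query string.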

3 Responses to “Using Galaxy to discover schemas”

  1. Paul Brown Says:

    For that kind of query, it would be nice to have a simple API to register generate query tokens, along the lines of an application-specific TinyURL, but with the additional feature to dereference the token into the query.

  2. Ross Mason Says:

    Paul, I was thinking the same thing. The problem with schemes such as TinyURL is that they are only (really) machine readable, which is ok, but will prove harder to work with over time. It would be nice to combine short/concise urls that are readable. Another way would be to provide aliases and let the user define what the short form URL is.

  3. netzooid » Blog Archive » Galaxy 1.0-beta-2 is released! Mule NetBoot, Maven plugin, improved Atom API, and more… Says:

    [...] We now index XML schemas. Which allows you to do things like this. [...]
