A Short History of Web Discovery
December 22nd, 2004
Search engines were the first automated web discoverers. HTML and HTML metadata were processed and indexed, making them searchable. Then advances made Word documents, PDFs, and other files searchable. Then RSS. Now search engines have been adding more metadata beyond the pathetic HTML metadata that has always been there.
I think this could be taken a step further, though. Search engines are a short distance from indexing FOAF, which would make everyone who publishes a FOAF description findable via a quick search. This information could also be surfaced by a browser plugin, much as Firefox has become RSS-aware and shows you an orange icon whenever a feed is found.
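To make the browser-plugin idea concrete, here is a minimal sketch of how a tool might spot descriptors advertised in a page's `<link>` tags, the same mechanism that feed autodiscovery relies on. The MIME-type table, the sample page, and the names `DescriptorFinder` and `find_descriptors` are my own assumptions for illustration, not an established API; treating `application/rdf+xml` as a FOAF pointer follows a common convention rather than a guarantee.

```python
from html.parser import HTMLParser

# MIME types a discovery tool might watch for. This table is an assumption
# made for illustration; a real tool would likely support more types.
DESCRIPTOR_TYPES = {
    "application/rss+xml": "RSS feed",
    "application/atom+xml": "Atom feed",
    "application/rdf+xml": "RDF description (e.g. FOAF)",
}


class DescriptorFinder(HTMLParser):
    """Collects (kind, href) pairs from <link> tags with a known type."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if mime in DESCRIPTOR_TYPES and href:
            self.found.append((DESCRIPTOR_TYPES[mime], href))


def find_descriptors(html):
    """Return every descriptor link advertised in the given HTML text."""
    finder = DescriptorFinder()
    finder.feed(html)
    return finder.found


# Hypothetical page used only to exercise the sketch.
SAMPLE_PAGE = """
<html><head>
<title>Example</title>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rss+xml" href="/index.rss">
<link rel="meta" type="application/rdf+xml" title="FOAF" href="/foaf.rdf">
</head><body></body></html>
"""

if __name__ == "__main__":
    for kind, href in find_descriptors(SAMPLE_PAGE):
        print(kind, href)
```

A plugin or crawler built along these lines would only need a new entry in the table to recognize a new kind of descriptor, which is roughly what makes the approach attractive.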
One could imagine descriptors for all sorts of things. For instance, a descriptor for merchandise could contain pictures, a description, and a link to buy. Aggregate these and you could have a distributed Amazon. Or there could be a descriptor for an organization's contact information. All this information would then be available to both the browser and the search engine.
So I have a couple questions:
- Why isn’t this being done?
- What other XML vocabularies are there out there that pertain to this?
- Are there better ways to discover this information than brute-force web crawling?
- How do web services (particularly WSDL/SOAP/UDDI) fit into this?
December 24th, 2004 at 7:37 am
I kept waiting for you to connect this to Christmas and the joy of the holidays and all that.
I’m still waiting.