Sigfrid Lundberg's Stuff

OpenSearch, RSS and OPML as XML Webservices for information retrieval

Sigfrid Lundberg's Stuff 2009-07-09

At the Royal Library we have been building an infrastructure for publishing digitized material: collections of digital images, usually with very little accompanying text.

The images have been catalogued by library staff using the image database system Cumulus from Canto. Cumulus is designed as a digital asset management system for people in the graphics industry. Canto produces an advanced web user interface, but it is rather poor at disseminating the collections on the web. For example, images published that way never appear in Google Images. Syndication of our content has not been possible, since the system lacks any concept of standardized metadata, which is essential for all collaboration between institutions in the library community.

To keep the relative ease of use for the people handling the images while still providing access to our collections on the Internet, we have built an entirely new web interface. For performance reasons, we decided that it should not access the Cumulus database directly, but rather a mirror of it in our Oracle database. Furthermore, we wanted a REST XML web service layer.

The latter goal was achieved with two syndication services built on top of Oracle. The two services were written as Java servlets and support the Outline Processor Markup Language (OPML) and OpenSearch together with RSS. The rationale for choosing OpenSearch was that we wanted mainstream Internet standards rather than typical library standards such as SRU. OpenSearch is promoted by A9, a subsidiary of Amazon, and there are at least pledges of support from other big players such as Google and Yahoo.
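
To make the OpenSearch side more concrete, here is a minimal client-side sketch. An OpenSearch 1.1 description document advertises a URL template such as `{searchTerms}` and `{startPage?}` (the `?` marks an optional parameter); the client fills it in and reads an RSS 2.0 response, which may carry `opensearch:totalResults`. The template URL, the sample response and the function names below are hypothetical; they are not the interface of the Royal Library's servlets.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple
from urllib.parse import quote

# OpenSearch 1.1 namespace; responses may carry elements such as totalResults in it.
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Matches {name} and {name?}; the trailing '?' marks an optional parameter.
_PARAM = re.compile(r"\{([^{}?]+)(\??)\}")


def expand_template(template: str, params: Dict[str, str]) -> str:
    """Fill an OpenSearch URL template with percent-encoded values.

    Optional parameters that are not supplied become empty strings;
    a missing required parameter raises KeyError."""
    def fill(match) -> str:
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote(params[name], safe="")
        if optional:
            return ""
        raise KeyError(name)

    return _PARAM.sub(fill, template)


def parse_results(rss_text: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (totalResults, [(title, link), ...]) from an RSS 2.0 response."""
    channel = ET.fromstring(rss_text).find("channel")
    total = int(channel.findtext(f"{{{OS_NS}}}totalResults", default="0"))
    items = [(item.findtext("title", default=""), item.findtext("link", default=""))
             for item in channel.findall("item")]
    return total, items


# Hypothetical response; the real services' output is not reproduced in the post.
SAMPLE = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<channel><title>Search results</title>
<opensearch:totalResults>2</opensearch:totalResults>
<item><title>Image 1</title><link>http://example.org/img/1</link></item>
<item><title>Image 2</title><link>http://example.org/img/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    url = expand_template("http://example.org/search?q={searchTerms}&page={startPage?}",
                          {"searchTerms": "danish cities"})
    print(url)  # http://example.org/search?q=danish%20cities&page=
    print(parse_results(SAMPLE))
```

Because the response is plain RSS, any feed reader can consume a search result; that is much of the appeal of OpenSearch over library-specific protocols.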

OPML was not as obvious a choice. However, OpenSearch is basically a syndication protocol, and as such OPML is a natural candidate, since its most common use is to provide subject-structured access to feeds. An early version of the system is available on our web site; the first version was released about a year ago. Our technology choices make it straightforward to syndicate the content. The gadget on this page is an example of this. It shows the OPML and RSS from the first collection of images released this way, Danmarksbilleder, a collection of historical images from a few Danish cities. The widget itself is "The Original Feed Widget", a nifty thing from grazr.com. You can make one of your own by entering the URI http://www.kb.dk/opml_category_service/danmarksbilleder/?opml_mode=shallow&subject=8 in the form on grazr.com.
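
A feed widget like the one above essentially walks an OPML outline tree and follows the outlines that point at feeds. The sketch below rebuilds the category-service URI from its parameters and flattens an OPML document into (depth, text, feed URL) tuples. The base URI and the parameter names `opml_mode` and `subject` are copied from the URI above, but their semantics are not documented here; the sample OPML and the function names are hypothetical, and the code does not fetch anything over the network.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple
from urllib.parse import urlencode

# Base URI and parameter names copied from the example in the post.
BASE = "http://www.kb.dk/opml_category_service/danmarksbilleder/"


def category_url(subject: int, mode: str = "shallow") -> str:
    """Build a category-service URI; urlencode keeps the dict's key order."""
    return BASE + "?" + urlencode({"opml_mode": mode, "subject": subject})


def _walk(node: ET.Element, depth: int) -> Iterator[Tuple[int, str, Optional[str]]]:
    # OPML nests <outline> elements; feed entries carry an xmlUrl attribute.
    for child in node.findall("outline"):
        yield depth, child.get("text", ""), child.get("xmlUrl")
        yield from _walk(child, depth + 1)


def opml_outlines(opml_text: str) -> List[Tuple[int, str, Optional[str]]]:
    """Flatten the OPML body into (depth, text, feed URL or None) tuples."""
    body = ET.fromstring(opml_text).find("body")
    return list(_walk(body, 0))


# Hypothetical OPML; the real service's output may be structured differently.
SAMPLE_OPML = """<opml version="2.0"><head><title>Danmarksbilleder</title></head><body>
<outline text="Subject A">
  <outline text="Subject A.1" type="rss" xmlUrl="http://example.org/rss/a1"/>
</outline>
<outline text="Subject B" type="rss" xmlUrl="http://example.org/rss/b"/>
</body></opml>"""

if __name__ == "__main__":
    print(category_url(8))
    for depth, text, feed in opml_outlines(SAMPLE_OPML):
        print("  " * depth + text, feed or "")
```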

The use of truly simple de facto Internet standards is one advantage. But we also needed a way to build traditional web content on top of our two web services: sites such as Danmarksbilleder, mentioned above, and its "cousins" Kistebilleder, Daells varehus and Partiprogrammer. All of these are actually delivered by one single servlet, which is most appropriately described as a mashup engine.

The engine is written in-house and supports multiple skins (as the examples above show, Danmarksbilleder looks different from Kistebilleder).

In this application a skin is an XML document containing the layout of the HTML page (it is little more than the HTML itself). Apart from HTML tags, it also contains <kb:include/> tags identified by their id attribute. Each kb:include tag corresponds to a REST XML web service and is connected to an XSLT script via a configuration file. The mashup engine reads the skin at initialization; upon a request it retrieves the required OPML or RSS, transforms the content, pastes the resulting fragments into the skin's DOM tree and finally delivers the page to the client.
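
The include-and-transform cycle just described can be sketched in a few lines. This is not the engine's code: the namespace URI for the `kb:` prefix, the shape of the configuration (a mapping from include id to a fetch function and a transform) and all function names are assumptions. Since the Python standard library has no XSLT processor, plain Python callables stand in for the XSLT scripts here; with lxml installed, `lxml.etree.XSLT` could play that role instead.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple

# Hypothetical namespace URI for the kb: prefix; the real one is not given in the post.
KB_NS = "http://example.org/hypothetical/kb-skin"
INCLUDE_TAG = f"{{{KB_NS}}}include"

Fetch = Callable[[], str]                       # returns RSS/OPML text from a service
Transform = Callable[[ET.Element], ET.Element]  # stands in for an XSLT script


def load_skin(skin_xml: str) -> ET.Element:
    """Parse the skin once, at initialization."""
    return ET.fromstring(skin_xml)


def render(skin: ET.Element, config: Dict[str, Tuple[Fetch, Transform]]) -> str:
    root = copy.deepcopy(skin)  # per-request copy; the cached skin stays untouched
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for include in list(root.iter(INCLUDE_TAG)):
        fetch, transform = config[include.get("id")]
        fragment = transform(ET.fromstring(fetch()))
        fragment.tail = include.tail  # keep any text that followed the include tag
        parent = parents[include]
        index = list(parent).index(include)
        parent.remove(include)
        parent.insert(index, fragment)
    return ET.tostring(root, encoding="unicode")


def rss_to_list(doc: ET.Element) -> ET.Element:
    """Turn RSS items into an HTML <ul> of links."""
    ul = ET.Element("ul")
    for item in doc.iter("item"):
        li = ET.SubElement(ul, "li")
        a = ET.SubElement(li, "a", href=item.findtext("link", default=""))
        a.text = item.findtext("title", default="")
    return ul


# Hypothetical skin and feed, for illustration only.
SKIN = ('<html xmlns:kb="http://example.org/hypothetical/kb-skin">'
        '<body><h1>Danmarksbilleder</h1><kb:include id="latest"/></body></html>')
RSS = ('<rss version="2.0"><channel><title>t</title>'
       '<item><title>Image 1</title><link>http://example.org/img/1</link></item>'
       '</channel></rss>')

if __name__ == "__main__":
    skin = load_skin(SKIN)
    print(render(skin, {"latest": (lambda: RSS, rss_to_list)}))
```

Parsing the skin once and deep-copying it per request mirrors the "read at initialization" behaviour while keeping concurrent requests from modifying a shared tree.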

The mashup engine is not very well written; I can do better. Also, the current set of services requires one Oracle schema per disseminated collection, which is wasteful. We haven't yet had time to fix the first problem, but the second is about to be solved, since we have a second generation of web services in the pipeline for release real soon. In that application we have just one single Oracle database for all editions served. I will report on it once it is online.

NB

My name is Sigfrid Lundberg. The stuff I publish here may, or may not, be of interest for anyone else.

On this site there is material on photography, music, literature and other stuff I enjoy in life. However, most of it is related to my profession as an Internet programmer and software developer in the area of digital libraries at the Royal Library, Copenhagen (Denmark) and, before that, Lund University (Sweden).

The content here does not reflect the views of my past or present employers.

Creative Commons License
This entry (OpenSearch, RSS and OPML as XML Webservices for information retrieval) within Sigfrid Lundberg's Stuff, by Sigfrid Lundberg is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.