Sigfrid Lundberg's Stuff 2009-07-09
At the Royal Library we have been building an infrastructure for publishing digitized material. The material consists of collections of digital images, usually with very little accompanying text.
The images have been cataloged by library staff using the image database system Cumulus from Canto. Cumulus is designed as a digital asset management system for people in the graphics industry. Canto offers an advanced web user interface, but it is rather poor at disseminating collections on the web. For example, images published that way never appear in Google Images. Syndication of our content has not been possible, since the system has lacked any concept of standardized metadata, which is essential for all collaboration between institutions in the library community.
To take advantage of the relative ease of use for the people handling the images, but still be able to provide access to our collections on the Internet, we have built an entirely new web interface. For performance reasons, we decided that it should not access the Cumulus database directly, but a mirror in our Oracle DB. Furthermore, we wanted a REST XML web service layer.
The latter goal was achieved with two syndication services built on top of Oracle. The two services were written as Java servlets and support Outline Processor Markup Language (OPML) and OpenSearch, together with RSS. We chose OpenSearch because we wanted mainstream Internet standards rather than typical library standards such as SRU. OpenSearch is promoted by A9, a subsidiary of Amazon, and there are at least pledges of support from other big players such as Google and Yahoo.
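To give an idea of what a client of such a service has to do, here is a minimal sketch of reading an OpenSearch response delivered as RSS. It is written in Python for brevity (the services themselves are Java servlets). The OpenSearch 1.1 namespace and its totalResults, startIndex and itemsPerPage elements come from the OpenSearch specification; the sample document, its example.org URLs and the function name parse_opensearch_rss are made up for illustration and are not taken from our services.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OpenSearch 1.1 specification.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Invented sample response; a real service would return this over HTTP.
SAMPLE_RSS = """<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Hypothetical search result</title>
    <link>http://example.org/search?query=harbour</link>
    <description>Made-up data for illustration</description>
    <opensearch:totalResults>42</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>2</opensearch:itemsPerPage>
    <item><title>Image one</title><link>http://example.org/image/1</link></item>
    <item><title>Image two</title><link>http://example.org/image/2</link></item>
  </channel>
</rss>"""


def parse_opensearch_rss(xml_text):
    """Return the paging information and the items of an OpenSearch RSS response."""
    channel = ET.fromstring(xml_text).find("channel")

    def paging(name):
        # OpenSearch elements live in their own namespace inside <channel>.
        return int(channel.findtext("{%s}%s" % (OPENSEARCH_NS, name)))

    return {
        "total_results": paging("totalResults"),
        "start_index": paging("startIndex"),
        "items_per_page": paging("itemsPerPage"),
        "items": [
            (item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")
        ],
    }


result = parse_opensearch_rss(SAMPLE_RSS)
print(result["total_results"], "hits, showing", len(result["items"]))
```

The point is that nothing library-specific is needed: any RSS-aware tool can read the items, and the three OpenSearch elements are enough to page through a result set.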
OPML was a less obvious choice. However, OpenSearch is basically a syndication protocol, and as such OPML is a natural candidate, since its most common use is to provide subject-structured access to feeds. An early version of the system is available on our web site; the first version was released about a year ago. Our technology choices make it straightforward to syndicate the content. The gadget on this page is an example of this. It shows the OPML and RSS from the first collection of images released in this way, Danmarksbilleder, a collection of historical images from a few Danish cities. The widget itself is "The Original Feed Widget", a nifty thing from grazr.com. You can make one of your own by filling in the URI http://www.kb.dk/opml_category_service/danmarksbilleder/?opml_mode=shallow&subject=8 in the form on grazr.com.
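What a widget like that does with the OPML can be sketched roughly as follows, again in Python. The outline element and its text and xmlUrl attributes are standard OPML; the sample document, the example.org feed URLs and the function names are hypothetical, and I make no claim here about what the two query parameters in the URI above mean beyond what their names suggest.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# The URI quoted in the text, split over two lines for readability.
OPML_URI = ("http://www.kb.dk/opml_category_service/danmarksbilleder/"
            "?opml_mode=shallow&subject=8")

# Invented OPML document standing in for a response from the service.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Hypothetical subject tree</title></head>
  <body>
    <outline text="Cities">
      <outline text="Copenhagen" type="rss" xmlUrl="http://example.org/rss?subject=1"/>
      <outline text="Aarhus" type="rss" xmlUrl="http://example.org/rss?subject=2"/>
    </outline>
    <outline text="Unsorted" type="rss" xmlUrl="http://example.org/rss?subject=3"/>
  </body>
</opml>"""


def collect_feeds(node, path=()):
    """Yield (subject path, feed URL) for every outline that points at a feed."""
    for child in node.findall("outline"):
        child_path = path + (child.get("text", ""),)
        url = child.get("xmlUrl")
        if url:
            yield child_path, url
        # Outlines may be nested to any depth, so recurse.
        yield from collect_feeds(child, child_path)


def feeds_in_opml(xml_text):
    """Return all feeds in an OPML document, in document order."""
    return list(collect_feeds(ET.fromstring(xml_text).find("body")))


# The request parameters are ordinary query string parameters.
print(parse_qs(urlsplit(OPML_URI).query))

for subject_path, feed_url in feeds_in_opml(SAMPLE_OPML):
    print(" / ".join(subject_path), "->", feed_url)
```

Each feed URL found this way can then be fetched and read as ordinary RSS, which is all a feed widget needs.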
The use of truly simple de facto Internet standards is one advantage. But we also needed a way to build traditional web content on top of our two web services, for sites such as Danmarksbilleder mentioned above and its "cousins" Kistebilleder, Daells varehus and Partiprogrammer. All of these are actually delivered by a single servlet, which is most appropriately described as a mashup engine.
The engine is written in-house, and supports multiple skins (as seen in the examples above -- Danmarksbilleder differs from Kistebilleder).
In this application a skin is an XML document that holds the layout of the HTML page (it is little more than the HTML itself). Apart from HTML tags it also contains <kb:include/> tags, identified by their id attribute. Each kb:include tag corresponds to a REST XML web service and is connected to an XSLT script via a configuration file. The mashup engine reads the skin at initialization. Upon a request it retrieves the required OPML or RSS, transforms the content, pastes the resulting fragments into the skin DOM tree and finally delivers the content to the client.
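The flow just described can be sketched as below. This is not the actual engine, which is a Java servlet using XSLT; it is a small Python illustration under several assumptions. The kb namespace URI, the skin, the configuration dictionary, the canned service responses and all function and class names are invented. Plain Python functions stand in for the XSLT scripts, and a stub replaces the HTTP requests so that the example runs on its own.

```python
import copy
import xml.etree.ElementTree as ET

KB_NS = "http://example.org/kb"  # hypothetical namespace for kb:include

# Hypothetical skin: little more than the HTML, plus two include points.
SKIN = """<html xmlns:kb="http://example.org/kb">
  <body>
    <div id="menu"><kb:include id="subjects"/></div>
    <div id="main"><kb:include id="results"/></div>
  </body>
</html>"""


def outlines_to_ul(opml_root):
    """Stand-in for an XSLT script: OPML outlines become an HTML list."""
    ul = ET.Element("ul")
    for outline in opml_root.iter("outline"):
        ET.SubElement(ul, "li").text = outline.get("text")
    return ul


def items_to_ul(rss_root):
    """Stand-in for an XSLT script: RSS item titles become an HTML list."""
    ul = ET.Element("ul")
    for item in rss_root.iter("item"):
        ET.SubElement(ul, "li").text = item.findtext("title")
    return ul


# Stand-in for the configuration file: include id -> (service URL, transform).
CONFIG = {
    "subjects": ("http://example.org/opml", outlines_to_ul),
    "results": ("http://example.org/rss", items_to_ul),
}

# Canned responses so that no network access is needed.
CANNED = {
    "http://example.org/opml":
        '<opml version="2.0"><body><outline text="Cities"/></body></opml>',
    "http://example.org/rss":
        '<rss version="2.0"><channel><item><title>Image one</title></item>'
        '</channel></rss>',
}


def fetch(url):
    """Stub for the HTTP GET a real engine would perform."""
    return CANNED[url]


class MashupEngine:
    def __init__(self, skin_text, config, fetch):
        self.skin = ET.fromstring(skin_text)  # the skin is parsed once
        self.config = config
        self.fetch = fetch

    def render(self):
        page = copy.deepcopy(self.skin)  # never modify the shared skin
        parents = {child: parent for parent in page.iter() for child in parent}
        for include in list(page.iter("{%s}include" % KB_NS)):
            url, transform = self.config[include.get("id")]
            fragment = transform(ET.fromstring(self.fetch(url)))
            # Put the fragment exactly where the include tag was.
            parent = parents[include]
            position = list(parent).index(include)
            fragment.tail = include.tail
            parent.remove(include)
            parent.insert(position, fragment)
        return ET.tostring(page, encoding="unicode", method="html")


print(MashupEngine(SKIN, CONFIG, fetch).render())
```

Copying the parsed skin for each request keeps the per-request work down to fetching, transforming and pasting, which is the reason for reading the skin at initialization.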
The mashup engine is not very well written. I can do better. Also, the current set of services requires one Oracle schema per collection disseminated, which is wasteful. We haven't had the time to fix the first problem yet, but the second is about to be solved, since we have a second generation of web services in the pipeline for release very soon. That application uses a single Oracle database for all editions served. I will report on it when it is online.
My name is Sigfrid Lundberg. The stuff I publish here may, or may not, be of interest to anyone else.
On this site there is material on photography, music, literature and other stuff I enjoy in life. However, most of it is related to my profession as an Internet programmer and software developer within the area of digital libraries. I have worked in that role at the Royal Danish Library, Copenhagen (Denmark) and, before that, at Lund University Library (Sweden).
The content here does not reflect the views of my employers. They are now all past employers, since I retired 1 May 2023.
This entry (OpenSearch, RSS and OPML as XML Webservices for information retrieval) within Sigfrid Lundberg's Stuff, by Sigfrid Lundberg, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.