Sigfrid Lundberg's Stuff 2009-07-06
I'm a terrible nerd. To use software like content management systems or blog software for building my web site is unthinkable for me.
I have a substantial collection of old stuff written for various purposes in very different formats; it is a must to be able to integrate that material into the site. Some of this material are hard-core XML documents in text encoding initiative or docbook XML. There are also documents in legacy formats, such as RTF and (GNU) troff. I'm still using troff for production of new texts, usually via transform from XML using XSLT. The more recent XML documents are using client side XSLT for viewing.
This material is growing. I see it as an extension of my CV, but some of it could be interesting for users since it is general documentation for how to solve certain kinds of problems. I also wanted to seemlessly integrate other, more lightweight, material. Like this article
As I mentioned the other day, I wanted a new navigation system. There are two requirements on that: (i) it should be easy to follow for users and (ii) the pages in it should be good landing pages for search engines. I felt that there was a need for more text that would generate hits in search engines without out being irrelevant for users and misrepresent the content. Finally I wanted the site to be somewhat like a blog. In spite of the improved look and feel I also wanted that all the material already published on the site should retain the current URIs.
The old site was just plain files, I did no scripting whatsoever on that site. I couldn't, however, possibly manage a navigation system a manually, so some scripting had to be involved in generating the site.
To generate the navigation system, I have catalog all existing 'static' material. In particular I had to index manually using keywords or "tags". I decided what material to include and wrote manually a single monolithic metadata file in XML using using the atom syndication format.
Having this file it was easy to write a xslt tranform that generated the browse by year and subject menues appearing in the left column on most pages (that took 61 lines). Then there is another xslt transform aggregating the title, summary and link data into menus, such as the one in XML processing. This took about 200 lines of xslt.
Now, these two xslt scripts take into account only the older kind of static material. I also wanted a new kind of bloggish 'dynamic' material. I needed to integrate the two kinds of material in a single structure. Just as I wrote the metadata for the older set of material in Atom feed document, I write the bloggish kind of stuff in Atom entry documents. They live in a file system /entries under the document root.
To integrate the two I have a nifty little perl script that does two things. It reads the Atom feed for the and parses it into a Document Object Model (DOM) object. Then it traverses the /entries file system, parses each entry. First it drops a transform into html of each entry. The entry itself is entered into the global DOM, which is finally printed into a complete Atom feed. This took 65 lines of perl and 84 lines of xslt.
To get a blog style home page I had to sort the entire set inversely in temporal order, print the most recent entry on the home page and finally print a few pointers to more recent entries. These tasks took another 250 lines of xslt. In addition I have a utility which generates the /entries directories and prints a skeleton entry, which generates another 100 lines of perl which generates the skeleton using XML DOM.
It takes less than a second to rebuild the entire site. Before the refurbishment, I used to copy all files to my server. In order to implement incremental update, I've created a CVS repository on my server. Now I check in everything using CVS after building and testing, and then I publish my stuff by checking out in the servers document root. This will scale up to a some thousand blog entries. Then I'll ingest my entries in Oracle Berkeley DB XML and replace most xslt with XML Query.
Or I'll do that anyway, for the fun of it.
My name is Sigfrid Lundberg. The stuff I publish here may, or may not, be of interest for anyone else.
On this site there is material on photography, music, literature and other stuff I enjoy in life. However, most of it is related to my profession as an Internet programmer and software developer within the area of digital libraries. I have been that at the Royal Danish Library, Copenhagen (Denmark) and, before that, Lund university library (Sweden).
The content here does not reflect the views of my employers. They are now all past employers, since I retired 1 May 2023.
This entry (About this site) within Sigfrid Lundberg's Stuff,
by
Sigfrid Lundberg
is licensed under a
Creative Commons
Attribution-ShareAlike 3.0 Unported License.