Newspaper archive - Transformation of Quark XPress data into a relational database

The goal of this project was to develop an online archive of the newspaper "Volksblatt" (http://www.volksblatt.li). As an additional feature, the archive also carries the current issue, so readers far away can read the up-to-date newspaper and don't have to wait until it is delivered. The archive is available at http://archiv.volksblatt.li.

The newspaper data (one Quark XPress file per page) is converted to XML files, which are then imported into the database.

Transformation of the unstructured Quark data into structured XML data

To automate the transformation process, an additional work step was introduced for the newspaper team: the journalists define the page layout and the article flow by clicking on every article and image.

A server daemon then produces XML files from these enriched Quark files.

Import of the XML files via XSL into an arbitrary relational database

These XML files are then transformed via XSL and imported directly into the database. A dedicated XSL file maps the markup and transforms the XML structure into SQL statements, which are then executed against the database. To use SQL directly in an XSL file, we use the Saxon SQL extensions.
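
The mapping step can be illustrated with a minimal sketch. The project itself does this in XSL with the Saxon SQL extensions; the Python below is only a stand-in for the same idea (walk the XML, emit one SQL statement per article, execute it). The element names (`page`, `article`, `title`, `body`), the `article` table, and the use of SQLite are assumptions made for illustration, not the archive's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical page XML; the real element and attribute names are not documented here.
SAMPLE_PAGE = """
<page date="2004-05-12" number="3">
  <article id="a1">
    <title>Example headline</title>
    <body>Example body text.</body>
  </article>
  <article id="a2">
    <title>Second headline</title>
    <body>More text.</body>
  </article>
</page>
"""

INSERT_ARTICLE = (
    "INSERT INTO article (id, page_date, page_number, title, body) "
    "VALUES (?, ?, ?, ?, ?)"
)


def page_to_statements(xml_text):
    """Map each <article> element to a parameterized INSERT statement."""
    page = ET.fromstring(xml_text)
    statements = []
    for article in page.findall("article"):
        params = (
            article.get("id"),
            page.get("date"),
            int(page.get("number")),
            article.findtext("title"),
            article.findtext("body"),
        )
        statements.append((INSERT_ARTICLE, params))
    return statements


def import_page(conn, xml_text):
    """Execute all statements for one page inside a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        for sql, params in page_to_statements(xml_text):
            conn.execute(sql, params)


# Assumed target table; an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id TEXT PRIMARY KEY, page_date TEXT, "
    "page_number INTEGER, title TEXT, body TEXT)"
)
import_page(conn, SAMPLE_PAGE)
```

Because the statements are generated from the XML structure alone, the same mapping could target any relational database by swapping the connection, which matches the "arbitrary relational database" goal.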

Connectivity between server daemons

To move data and files between the different servers, we use a set of cron scripts and shell scripts. All files (PDFs and pictures) are copied with scp (secure copy). We also have hot-in and hot-out directories, where new data is uploaded and then imported by the server daemons.

Of course, the files are backed up once the import process is done.
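
The hot-in workflow could look roughly like the following sketch. The directory names, the `*.xml` pattern, and the `import_file` callback are hypothetical; the actual daemons and their file layout are not described here. A file is moved to the backup directory only after its import has succeeded.

```python
import shutil
from pathlib import Path


def process_hot_dir(hot_in, backup_dir, import_file):
    """Import every XML file found in hot_in, then move it to backup_dir.

    import_file is a caller-supplied function (hypothetical) that imports one
    file and raises on failure; a failed file therefore stays in hot_in.
    """
    hot_in, backup_dir = Path(hot_in), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    imported = []
    for path in sorted(hot_in.glob("*.xml")):
        import_file(path)
        # Back up the file only after the import succeeded.
        shutil.move(str(path), str(backup_dir / path.name))
        imported.append(path.name)
    return imported
```

A cron entry could then call such a function every few minutes, which would keep the servers decoupled: uploaders only write into the hot-in directory, and the importer only reads from it.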