Elsevier

Journal of Web Semantics

Volume 6, Issue 4, November 2008, Pages 243-249
Semantic annotation and search of cultural-heritage collections: The MultimediaN E-Culture demonstrator

https://doi.org/10.1016/j.websem.2008.08.001

Abstract

In this article we describe a Semantic Web application for semantic annotation and search in large virtual collections of cultural-heritage objects, indexed with multiple vocabularies. During the annotation phase we harvest, enrich and align collection metadata and vocabularies. The semantic-search facilities support keyword-based queries of the graph (currently 20 M triples), resulting in semantically grouped result clusters, all representing potential semantic matches of the original query. We show two sample search scenarios. The annotation and search software is open source and is already being used by third parties. All software is based on established Web standards, in particular HTML/XML, CSS, RDF/OWL, SPARQL and JavaScript.

Introduction

The main objective of the MultimediaN E-Culture project is to demonstrate how novel Semantic Web and presentation technologies can be deployed to provide better indexing and search support within large virtual collections of cultural-heritage resources. The architecture is fully based on open Web standards, in particular XML, RDF/OWL and SPARQL. The central hypothesis underlying this work is that the use of explicit background knowledge in the form of ontologies, vocabularies and thesauri is particularly useful for information retrieval in knowledge-rich domains.

The cultural-heritage domain is such a knowledge-rich domain. Collection holders traditionally spent considerable effort on the (manual) indexing process of collection objects. Many institutions use and develop controlled vocabularies to standardize the indexing process. The result is that the domain is dominated by a multitude of vocabularies for different subareas in many different languages. Some efforts have been made to develop collection-spanning vocabularies, such as the Getty vocabularies (see further), but it is clear that the domain is too large and diverse to be covered by a single (set of) vocabulary(ies). There is also significant variation in the annotation structure for collection objects, although many institutions use a format that is, or can be interpreted as, a specialization of Dublin Core.
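To illustrate what treating an annotation format as a specialization of Dublin Core can mean in practice, the following is a minimal sketch, not the project's actual mapping. It assumes that collection-specific properties are declared as subproperties of Dublin Core properties; all property names, object identifiers and values (`museumA:painter`, `museumB:maker`, and so on) are hypothetical.

```python
# Hypothetical collection-specific properties, each declared as a
# specialization (in RDF terms: rdfs:subPropertyOf) of a Dublin Core property.
SUB_PROPERTY_OF = {
    "museumA:painter": "dc:creator",
    "museumB:maker": "dc:creator",
    "museumA:depicts": "dc:subject",
}

# Made-up metadata from two collections that use different annotation formats.
TRIPLES = [
    ("museumA:obj1", "museumA:painter", "Painter X"),
    ("museumB:obj7", "museumB:maker", "Maker Y"),
    ("museumA:obj1", "museumA:depicts", "Harbour scene"),
]


def as_dublin_core(triples, sub_property_of):
    """Return the triples with each property lifted to its top superproperty."""
    lifted = []
    for subject, prop, value in triples:
        seen = set()
        # Follow the subproperty chain; `seen` guards against cyclic mappings.
        while prop in sub_property_of and prop not in seen:
            seen.add(prop)
            prop = sub_property_of[prop]
        lifted.append((subject, prop, value))
    return lifted


def subjects_with(triples, prop):
    """Return the sorted subjects that have at least one value for `prop`."""
    return sorted({s for s, p, _ in triples if p == prop})


creators = subjects_with(as_dublin_core(TRIPLES, SUB_PROPERTY_OF), "dc:creator")
```

Under these assumptions a single query on `dc:creator` reaches objects from both collections, even though neither collection uses that property directly.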

Due to the abundance of vocabularies, the availability of existing semantic annotations of cultural objects, and the fact that this is mainly publicly accessible information (or at least a willingness to make it accessible), cultural heritage appears to be an ideal candidate for application of Semantic Web technology. With the growth of the World-Wide Web collection holders have been increasingly interested in making their collections available online. There are large international initiatives to make inter-collection access possible, for example the European “Europeana” initiative.1 The key problems in inter-collection search lie in the different annotation formats and vocabularies used by collection holders.

The E-Culture project started out with the goal to show that inter-collection search can be achieved at relatively low cost with Semantic Web technology. The approach that we have taken roughly consists of three elements:

  • (i)

    Providing facilities for harvesting, enriching and aligning collection metadata and vocabularies.

  • (ii)

    Providing facilities for semantic search through the resulting graph, including various presentation mechanisms for the search results.

  • (iii)

    Providing facilities for users to add metadata and/or content.

In this article we report on the results with respect to the first two components; work on the third component is under way and is discussed under future work. The following premises underlie our approach:

  • The project does not develop new ontologies/vocabularies but solely uses existing ones. The project may, however, develop vocabulary extensions, in particular through vocabulary alignments.

  • The project uses existing metadata of multiple collections.

The online version of the demonstrator can be found at: http://e-culture.multimedian.nl/demo/search.

Readers are encouraged to first take a look at the demonstrator before reading on. We suggest you consult the tutorial (linked from the online demo page) which provides a sample walk-through of the search functionality. Please note that this is a product of an ongoing project. Visitors should expect the demonstrator to change. We are incorporating more collections and vocabularies and are also extending the annotation, search and presentation functionality.

Due to space limitations this article is a summary of the key ingredients of the MultimediaN E-Culture demonstrator, which won the Semantic Web Challenge in 2006. Readers should consult the references provided for details. Section 2 describes the semantic annotation process of collections. In Section 3 we discuss the search architecture and some details of the graph-search algorithm. Section 4 provides a peek at the demonstrator through two sample search scenarios. Research issues arising from the endeavour are discussed in Section 5.

Section snippets

Semantic annotation: collection data, metadata and vocabularies

At this point we have collected descriptions of 200,000 objects from six collections annotated with a range of thesauri and several proprietary controlled keyword lists, which adds up to 20 million triples (detailed statistics are available from http://e-culture.multimedian.nl/demo/). The objects in the collections come from the Rijksmuseum Amsterdam,2 the National Museum of Ethnology,3 the Royal Tropical Institute,4 the

Technical architecture

The technical baseline of the MultimediaN E-Culture demonstrator is formed by the ClioPatria software, built on top of SWI-Prolog and its (Semantic) Web libraries.14 Fig. 2 gives an overview of the architecture. The reader is referred elsewhere for detailed information about ClioPatria [11], [12], [13]. The software is freely available under a GPL license.15

ClioPatria provides two APIs on top of the

Sample search scenarios

In this section we give two sample scenarios of the use of the MultimediaN E-Culture demonstrator. Readers are invited to try these out themselves (see the link in Section 1). It should be noted that the collection is continuously extended, so the actual search results are likely to vary over time.
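To give a rough idea of how a keyword query over a graph can produce semantically grouped result clusters, the following is a minimal sketch over a toy graph. It is not the demonstrator's graph-search algorithm, which is described in the full text. It assumes that a keyword is matched against node labels and that each collection object found is clustered by the chain of properties linking it to the match. All identifiers, properties and labels are made up for illustration.

```python
from collections import defaultdict, deque

# Toy graph of (subject, property, object) edges; every name is hypothetical.
EDGES = [
    ("obj:painting1", "dc:creator", "ex:artist1"),
    ("obj:painting2", "dc:subject", "ex:cubism"),
    ("ex:artist1", "ex:style", "ex:cubism"),
    ("obj:vase1", "dc:subject", "ex:ceramics"),
]
LABELS = {
    "ex:artist1": "Example Artist",
    "ex:cubism": "Cubism",
    "ex:ceramics": "Ceramics",
}
# The nodes that count as collection objects, i.e. as search results.
OBJECTS = {"obj:painting1", "obj:painting2", "obj:vase1"}


def search(keyword, max_steps=3):
    """Group the objects reachable from label matches by their property path."""
    # Index incoming edges so the graph can be walked from a match to objects.
    incoming = defaultdict(list)
    for source, prop, target in EDGES:
        incoming[target].append((source, prop))

    starts = [n for n, label in LABELS.items() if keyword.lower() in label.lower()]
    clusters = defaultdict(set)
    queue = deque((node, ()) for node in starts)
    visited = set(starts)  # simplification: each node is reached via one path only

    while queue:
        node, path = queue.popleft()
        if node in OBJECTS:
            clusters[path].add(node)  # the path is the key of the result cluster
            continue
        if len(path) >= max_steps:
            continue
        for source, prop in incoming[node]:
            if source not in visited:
                visited.add(source)
                queue.append((source, (prop,) + path))

    return {path: sorted(objects) for path, objects in clusters.items()}


results = search("cubism")
```

In this toy graph, `search("cubism")` yields one cluster for objects whose subject is the matched concept and another for objects whose creator works in the matched style, so each cluster corresponds to a different way in which a result relates to the keyword.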

Discussion

Over the past 2.5 years the E-Culture demonstrator has grown from 4000 to 200,000 objects. We are now planning large-scale deployment in the context of the European digital heritage portal europeana.eu, where we intend to grow to a collection of 12–14 M objects from museums, libraries and archives. We discuss here the lessons we have learned so far, including the main research challenges we see from our perspective.

Acknowledgements

We are grateful to Marco de Niet, Annelies van Nispen (Digital Heritage Netherlands22), Marie-France van Orsouw and Annemiek Teesing (Netherlands Institute for Cultural Heritage23) for their valuable input. This research would not have been possible without the gracious support of the collection owners: the Rijksmuseum Amsterdam, the National Museum of Ethnology, the Royal Tropical Institute, The Netherlands Institute for Art History, the Royal Library, and

References (13)

  • L. Hollink et al.

    Patterns of semantic relations to improve image content search

    J. Web Semant.

    (2007)
  • A. Amin et al.

    Understanding cultural heritage experts’ information seeking needs

  • L.M. Aroyo et al.

    CHIP demonstrator: semantics-driven recommendations and museum tour generation

  • V. de Boer et al.

    Extracting instances of relations from web documents using redundancy

  • M. Hildebrand et al.

    Facet: a browser for heterogeneous semantic web repositories

  • A. Miles, T. Baker, R. Swick, Best practice recipes for publishing RDF vocabularies, Working draft, W3C,...