metade.org

dr. patrick sinclair’s blog

Archive for the ‘semweb’ Category

The Semantic Web: a Medical Perspective


On Saturday I gave a guest lecture for a master’s course in Medical Informatics at the Universidade do Porto, Portugal.

The title of the lecture was: “The Semantic Web: a Medical Perspective”.

I described the issues surrounding the sharing of data in the medical domain, and how the current web fails to address the needs of users wishing to integrate different data sets. I then introduced the semantic web, approaching it from the perspective of linking data rather than via the top-down route of formal ontologies and logic.

Although presenting a full lecture was a new experience for me, I think it went well. I gave the presentation in English rather than trying a mixture of rusty Portuguese and English, although looking back now perhaps I should have gone with Portuguese. It’s so easy to drop in a typical English expression or word that non-native English speakers won’t have a clue about!

I’ve made the slides for the presentation available here: slides (PDF, 7 MB).

Written by metade

June 9th, 2007 at 9:08 pm

Posted in presentation, semweb

The Ups and Downs of Image Retrieval


Last week I gave a joint IAM seminar with Jon Hare, as part of the IAM 2006/2007 seminar series.

The seminar was officially called “Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and Bottom-up Approaches”, although we decided that what we were really talking about was “The Ups and Downs of Image Retrieval”.

It was basically a follow-up to the paper we presented at the Mastering the Gap workshop at the European Semantic Web Conference last year.

We described the two basic approaches to image retrieval (semantic-based and content-based) and introduced the issue of the semantic gap in image retrieval, in particular by looking at user requests to real picture libraries and archives. We then went on to describe how semantic web techniques might be used to overcome some of these issues, although this is challenging when dealing with real archive data. Finally, Jon showed his amazing semantic space technique, which allows users to search collections of unannotated images using text keywords.

The slides are available as a PDF (5.7 MB). Because the presentation included movies and demos, they are also available as an interactive QuickTime movie in two sizes: large (118 MB) and small (13 MB).

The seminar itself was recorded and podcast on the IAM seminar feed, but I don’t think it’s visible outside of ECS. We’ll try to get it put somewhere sensible soon.

Written by metade

May 28th, 2007 at 3:37 pm

Semantic Web resources and tools for Cultural Heritage


After working in the “semantic web meets cultural heritage” domain for a number of years, I’ve become familiar with many useful resources and tools. I thought it would be worth sharing some of them; hopefully they will be helpful to someone out there!

Ontologies

The CIDOC CRM is an extremely rich core ontology for describing cultural heritage documentation. I know a fair bit about it, having used it in the Sculpteur and eCHASE projects, but it’s best to look at the official CIDOC CRM site for more details.

Whilst I don’t think the VRA Core model is as powerful as CIDOC CRM, it seems to be more accessible and easier to get to grips with. The W3C Multimedia Semantics Incubator Group describe how VRA Core can be used in cultural heritage documentation in their report on Image Annotation on the Semantic Web.

SKOS: Simple Knowledge Organization System

SKOS provides an OWL ontology for modelling knowledge organisation systems such as controlled lists, vocabularies and thesauri. It provides the means to describe thesaurus concepts and how they relate to each other (e.g. preferred and alternative labels, broader and narrower concepts, concept schemes and so on).

The best starting point is the SKOS Core Guide, which shows you how to go about modelling a thesaurus in SKOS.
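To make the SKOS vocabulary a little more concrete, here is a minimal Python sketch (standard library only) that describes a tiny thesaurus as subject/predicate/object triples using the SKOS Core properties mentioned above, and derives the skos:narrower links from skos:broader. The example.org namespace and the two concepts are hypothetical, and labels are kept as plain strings for brevity; a real project would use an RDF library that distinguishes literals from URIs.

```python
# Minimal sketch: a tiny thesaurus modelled with SKOS Core properties as plain triples.
# The example.org namespace and the concepts are hypothetical illustrations.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/thesaurus/"

def concept(uri, pref_label, alt_labels=(), broader=None, scheme=EX + "scheme"):
    """Return the triples describing one skos:Concept (labels are plain strings here)."""
    triples = [
        (uri, RDF_TYPE, SKOS + "Concept"),
        (uri, SKOS + "inScheme", scheme),
        (uri, SKOS + "prefLabel", pref_label),
    ]
    triples += [(uri, SKOS + "altLabel", alt) for alt in alt_labels]
    if broader is not None:
        triples.append((uri, SKOS + "broader", broader))
    return triples

def add_narrower(triples):
    """skos:narrower is the inverse of skos:broader, so derive it from the broader links."""
    inverse = [(o, SKOS + "narrower", s) for s, p, o in triples if p == SKOS + "broader"]
    return triples + inverse

thesaurus = add_narrower(
    [(EX + "scheme", RDF_TYPE, SKOS + "ConceptScheme")]
    + concept(EX + "sculpture", "sculpture")
    + concept(EX + "bust", "bust", alt_labels=["portrait bust"], broader=EX + "sculpture")
)
```

Recording only the broader links and deriving the narrower ones keeps a single source of truth for the hierarchy; SKOS itself defines the two properties as inverses.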

Geonames

Geonames is a fantastic resource that provides information on locations in semantic web format (i.e. RDF). For example, check out Southampton in RDF! Their website has a nice Google Maps interface, but the really great thing is the Geonames web service that lets you match a query string (e.g. “Southampton, England”) to a specific place. I’ve used this service to add rich geographical information (e.g. latitude and longitude) to cultural archives that only had ambiguous text entries for place information.
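A minimal Python sketch of that kind of lookup, using only the standard library. It assumes the present-day GeoNames JSON search service at http://api.geonames.org/searchJSON, which takes a free-text q parameter, a maxRows limit and a registered username, and returns matches in a "geonames" list whose lat/lng values are strings. These details may have changed since this post was written, so treat them as assumptions and check the GeoNames web service documentation; the "demo" username is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the GeoNames JSON search service; verify against current docs.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"

def build_search_url(place_text, username, max_rows=1):
    """Build a search URL for free text such as 'Southampton, England'."""
    params = urllib.parse.urlencode({"q": place_text, "maxRows": max_rows, "username": username})
    return GEONAMES_SEARCH + "?" + params

def best_match(response):
    """Return (name, latitude, longitude) for the top hit, or None if nothing matched."""
    hits = response.get("geonames", [])
    if not hits:
        return None
    top = hits[0]
    # The service returns coordinates as strings, so convert them to floats.
    return top["name"], float(top["lat"]), float(top["lng"])

def geocode(place_text, username):
    """Query the service over HTTP and return the best match (needs network access)."""
    with urllib.request.urlopen(build_search_url(place_text, username)) as resp:
        return best_match(json.load(resp))
```

For example, geocode("Southampton, England", "demo") would return the top candidate's name and coordinates. Raising max_rows returns several candidates, which can help when an archive's place text is ambiguous.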

D2R Server

D2R Server provides a mapping mechanism to publish relational databases on the Semantic Web. The databases are exposed as RDF, and can be queried using SPARQL.
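Querying such an endpoint follows the standard SPARQL protocol: the query is sent as the query parameter of an HTTP GET, and the client can ask for results in the SPARQL JSON results format via the Accept header. Here is a minimal Python sketch; http://localhost:2020/sparql is assumed to be where a local D2R Server is listening (adjust it to your installation), and a server that ignores the Accept header may send XML results instead.

```python
import json
import urllib.parse
import urllib.request

# Assumed local D2R Server endpoint; adjust host, port and path to your setup.
ENDPOINT = "http://localhost:2020/sparql"

def build_request(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET request, as the SPARQL protocol allows."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

def rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    return [
        {v: b[v]["value"] for v in variables if v in b}  # unbound variables are omitted
        for b in results["results"]["bindings"]
    ]

def select(query, endpoint=ENDPOINT):
    """Run a SELECT query against the endpoint (needs a running server)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as resp:
        return rows(json.load(resp))

# Example query: a few resources that have an rdfs:label.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE { ?s rdfs:label ?label } LIMIT 10
"""
```

Calling select(QUERY) against a running server returns one dict per result row, which is often easier to work with than the raw bindings.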

Semantic Web Frameworks

There are loads of semantic web development frameworks, and there are a number of useful resources describing them too! Check out:

Ones I’ve found particularly interesting are:

Miscellaneous RDF Stuff

If you need to read/write RDF by hand, use N3 rather than RDF/XML - it’s a lot easier!

Also, if you need to write any code to generate RDF data, consider using N-Triples. Each triple goes on a separate line, so you don’t need to worry about setting up the RDF/XML document structure correctly.
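As an illustration, here is a small Python sketch that emits N-Triples: each statement is written as subject, predicate, object and a terminating full stop on its own line, with URIs in angle brackets and literals in double quotes. The escaping follows the RDF 1.1 N-Triples grammar, which allows UTF-8 text but requires backslash, double quote, newline and carriage return to be escaped inside literals. The helper names and the tagged-tuple term format are hypothetical; in production code an RDF library's serializer is the safer choice.

```python
# Minimal N-Triples writer sketch (standard library only).
# Terms are tagged tuples: ("uri", "http://...") or ("literal", "text"[, lang]).

def escape_literal(text):
    """Escape the characters N-Triples does not allow raw inside a quoted literal."""
    return (text.replace("\\", "\\\\")  # backslash first, so later escapes survive
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def term(t):
    """Serialize one term: <uri> or "literal" with an optional @lang tag."""
    kind = t[0]
    if kind == "uri":
        return "<" + t[1] + ">"
    if kind == "literal":
        lang = t[2] if len(t) > 2 else None
        return '"' + escape_literal(t[1]) + '"' + ("@" + lang if lang else "")
    raise ValueError("unknown term kind: %r" % (kind,))

def triple_line(s, p, o):
    """One statement per line, terminated by ' .'."""
    return "%s %s %s ." % (term(s), term(p), term(o))

def write_ntriples(triples, out):
    """Write each (s, p, o) triple to a text stream, one line per triple."""
    for s, p, o in triples:
        out.write(triple_line(s, p, o) + "\n")
```

Because every line is independent, the output can be appended to, concatenated or split without any document-level bookkeeping, which is exactly why it is convenient for generated data.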

Written by metade

May 12th, 2007 at 9:18 am

Posted in semweb

Exposing AllegroGraph as a Joseki SPARQL end-point


Back when I was working on the Sculpteur project, we had real issues with semantic web triple store performance, particularly with regard to the amount of data the stores could handle.

So we took a more traditional relational database/Z39.50 SRW approach, finally resulting in the OpenMKS system.

So it’s great to see how far triple stores have come: the AllegroGraph triple store server from Franz Inc. can handle billions of triples. They offer a free version that supports up to 50 million triples, which is a lot of data! For example, the E-Culture demonstrator (part of the Multimedian project in the Netherlands) incorporates a number of cultural heritage vocabularies/thesauri, WordNet and three museum/archive collections, and takes up only 9 million triples (according to this paper by Schreiber et al.).

So I’ve been experimenting with AllegroGraph, in particular as a back-end for an mSpace interface. Although AllegroGraph supports SPARQL queries, it doesn’t expose an HTTP SPARQL endpoint (at least I couldn’t find one!).

Here’s how I went about exposing AllegroGraph through a Joseki SPARQL endpoint. It’s not the most efficient way of doing it, but it was the easiest to implement and it works!

AllegroGraph provides a Jena layer, so, inspired by the D2R Server code, I created a Jena Assembler for the AllegroGraph Jena graph. With this, you can set up a Joseki dataset:


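# Assumes the usual rdf:, rdfs: and ja: prefixes, plus an ag: prefix for the
# AllegroGraph assembler vocabulary, are declared at the top of the Joseki config.
_:allegro rdf:type ja:RDFDataset ;
    rdfs:label "Allegro" ;
    ja:defaultGraph [
        rdfs:label "Allegro Graph" ;
        a ag:AllegroModel ;
        ag:modelName "test" ;          # name of the AllegroGraph model
        ag:modelLocation "/tmp/ag/"    # directory holding the AllegroGraph model
    ] .

# Registers the custom class and the Java class that builds it
ag:AllegroModel
    rdfs:subClassOf ja:Object ;
    ja:assembler "org.metade.allegrograph.AllegroGraphAssembler" .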

I’ve put together a Maven project with the code, which you can download; maybe it’ll be of use to someone. Give me a shout if you need help getting it going.

As I mentioned, this isn’t the best approach. We’ve done some basic testing on it, and although it copes with some complex queries far better than other systems we’ve tried, it can be slow on others. There is a lot of overhead in this approach. Off the top of my head (i.e. this might not be very accurate!), each SPARQL query goes through:

  • Jena ARQ SPARQL library
  • Jena graph model
  • AllegroGraph Jena wrapper
  • AllegroGraph Java layer
  • AllegroGraph native store

But it works…

In terms of future work, AllegroGraph supports SPARQL queries natively, so we’re planning to develop a SPARQL endpoint that talks to it directly. It will be interesting to compare the two approaches, but my guess is that the direct SPARQL queries will be a lot faster.
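The HTTP side of such a direct endpoint is fairly small. Below is a minimal Python sketch of how a SPARQL protocol endpoint might extract the query from an incoming request: from the query parameter of a GET or of a form-encoded POST, or from the raw body of a POST sent as application/sparql-query (that last form comes from the later SPARQL 1.1 protocol). Passing the query on to AllegroGraph's own SPARQL support is left as a hypothetical run_query callback, since that API isn't described here.

```python
import urllib.parse

FORM = "application/x-www-form-urlencoded"
SPARQL_QUERY = "application/sparql-query"

def extract_query(method, query_string="", content_type="", body=b""):
    """Return the SPARQL query text carried by an HTTP request, or None."""
    if method == "GET":
        values = urllib.parse.parse_qs(query_string).get("query")
        return values[0] if values else None
    if method == "POST":
        # Ignore parameters such as "; charset=UTF-8" after the media type.
        media_type = content_type.split(";")[0].strip().lower()
        if media_type == FORM:
            values = urllib.parse.parse_qs(body.decode("utf-8")).get("query")
            return values[0] if values else None
        if media_type == SPARQL_QUERY:
            return body.decode("utf-8")
    return None

def handle(method, query_string, content_type, body, run_query):
    """Return (status, payload); run_query is a hypothetical callback into the store."""
    query = extract_query(method, query_string, content_type, body)
    if query is None:
        return 400, "Missing SPARQL query"
    return 200, run_query(query)
```

Keeping request parsing separate from the store call means the same front-end could sit in front of either the Jena-based path or a direct one, which would make the comparison between the two approaches easier.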

Written by metade

April 18th, 2007 at 12:06 pm

Posted in code, semweb, sparql