metade.org

dr. patrick sinclair’s blog

Archive for the ‘code’ Category

We Need @ Social Innovation Camp

without comments

This weekend I attended Social Innovation Camp (SICamp), where I had an absolutely fantastic time! SICamp is an experiment to bring people together over a weekend and get them to build/prototype web apps that will drive social change - check out what the Guardian had to say about it on Saturday and then on Sunday after the projects were presented.

I worked on the we-need.org project, which was led by Craig Griffin from Fresh Voice. The idea is to provide a web site where people with disabilities are able to express their needs through an accessible web interface rather than having their needs assessed by the system as it is currently (i.e. by filling in a 50-odd page form). The site would then aggregate and present interactive visualisations of the needs in a given community so they can be more efficiently handled. Hopefully the video of the pitch we presented on Sunday will be made available online soon, where our team explained the concept far better than I can!

Over the weekend, we built a Ruby on Rails application where users could use a basic HTML form to express their needs. We also experimented with a graphical radial interface, which is better suited to people with certain types of disabilities. We used the Geokit Rails plugin to geocode users’ addresses so that their needs could be plotted on a map, and Simile Exhibit to prototype the visualisation of the aggregated needs of communities on that map.

As mentioned during the project presentation, we needed sample data, so I put together some Ruby to randomly generate users and their needs. So here is the infamous Ruby “random needs generator” one-liner:


(0..rand(3)).to_a.map { |a| rand(@@need_count)+1 }.uniq
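
For anyone unpicking the one-liner, here is a minimal expanded sketch of what it does. It assumes the needs have ids running from 1 to `need_count`; the method name `random_need_ids` and the value 10 are made up for illustration and are not taken from the we-need code.

```ruby
# Expanded sketch of the one-liner above.
# Assumption: needs have ids 1..need_count. `random_need_ids` is a hypothetical name.
def random_need_ids(need_count)
  how_many = rand(3) + 1                               # (0..rand(3)) has 1 to 3 elements
  picks = Array.new(how_many) { rand(need_count) + 1 } # each id falls in 1..need_count
  picks.uniq                                           # drop repeats, so fewer ids may remain
end

puts random_need_ids(10).inspect
```

Because of the `uniq`, a generated user always ends up with between one and three distinct needs.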

I’ve bravely put all of the we-need code on github, but do keep in mind that it was all built in less than 24 hours!

I had a really amazing time, worked with some fantastic people on We Need and also really enjoyed the post-SICamp discussions in the pub on Sunday! Big thanks to the SICamp team for organising everything - I’m already looking forward to the next SICamp meetup!

Written by metade

December 9th, 2008 at 11:48 pm

Posted in code, sicamp

My take on regression testing CSS

with one comment

As we add new features to the BBC Music Beta, we have more pages to check before making a new release.

We’re using test-driven development on the code generating the pages, but obviously these techniques don’t cover the visual appearance of the site. Because we reuse the same visual modules on several pages, a CSS change made for one page can cause bugs to creep in unexpectedly on others.

For example, the code generating the links module is reused on three different pages:

And to illustrate the problem, I just noticed a CSS quirk with the background of the links module on that third link. So I started thinking about how one might go about regression testing CSS and hacked a simple solution together using CutyCapt, ImageMagick’s compare tool and Ruby Rake.

The result is illustrated here:

Stable version of BBC Music Beta artist profile page

Development version of BBC Music Beta artist profile page

Difference between the stable and development versions of the site

The first image is a screen capture of the stable version of an artist’s profile page, taken directly from the live BBC Music Beta. The second image is taken from our development version of the site. The third image shows the difference between the two - in our upcoming release we are shuffling some of the modules around so these changes are very noticeable.

At the moment, my tool is very basic: you give it the stable and development host names, and a list of paths to test (example configuration). It uses CutyCapt to pull down each of the paths from the stable and development hosts, and then runs the ImageMagick compare tool between each pair of images. It then produces a very simple HTML file that displays all of the pages being tested.
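
The capture-and-compare step can be sketched roughly as follows. This is not the tool's actual code: the host names, the path, the file naming scheme and the helper names `slug` and `commands_for` are all assumptions made for illustration. It only builds the command lines (CutyCapt's `--url`/`--out` options and ImageMagick's `compare first second diff` form) rather than running them.

```ruby
# Hypothetical hosts and paths; the real tool reads these from its configuration.
STABLE_HOST = "stable.example.org"
DEV_HOST    = "dev.example.org"
PATHS       = ["/music/artists/example"]

# Turn a URL path into a file name stem, e.g. "/a/b" -> "a_b".
def slug(path)
  path.gsub(/[^A-Za-z0-9]+/, "_").gsub(/\A_+|_+\z/, "")
end

# Build the three commands for one path: two screen captures and one comparison.
# Each command is an argv array, suitable for passing to system(*argv).
def commands_for(path, stable_host: STABLE_HOST, dev_host: DEV_HOST)
  stem   = slug(path)
  stable = "#{stem}-stable.png"
  dev    = "#{stem}-dev.png"
  diff   = "#{stem}-diff.png"
  [
    ["CutyCapt", "--url=http://#{stable_host}#{path}", "--out=#{stable}"],
    ["CutyCapt", "--url=http://#{dev_host}#{path}", "--out=#{dev}"],
    ["compare", stable, dev, diff]
  ]
end

PATHS.each do |path|
  commands_for(path).each { |argv| puts argv.join(" ") }
end
```

In a Rake task, each argv array could be handed to `system(*argv)`, with the resulting image names written into the HTML report.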

While we’re just using this tool informally at the moment, it’s already been really useful to catch unexpected CSS bugs on our site.

Written by metade

November 15th, 2008 at 2:57 pm

BBC Music/MusicBrainz bookmarklet

with 5 comments

At BBC Audio and Music Interactive, I’m one of the software engineers working on the BBC Music Discovery team. This week we launched the BBC Music Beta, which focuses in particular on publishing information about the artists broadcast on the BBC. You can read more about the site on Tom Scott’s blog and BBC Radio Labs.

Matthew Shorter describes at the bottom of his post how to use MusicBrainz to find a given artist. Here’s a little something to make that a touch easier: a BBC Music/MusicBrainz bookmarklet!

Drag this BBC Music/MusicBrainz link to the bookmarks bar in your browser. Now, when you’re on an artist page (e.g. Coldplay), click on the bookmarklet to switch between the BBC Music and MusicBrainz pages for that artist.

Enjoy!

Written by metade

July 30th, 2008 at 11:20 pm

Posted in bbc, code, music, musicbrainz

Improving music recommendations step one: ignoring bad data

without comments

When I presented my music recommendations hack at Mashed last weekend, I showed some examples by randomly browsing around the artists and brands pages.

When I came to the Gilles Peterson show, I was surprised that the system was recommending artists such as ‘The Automatic’ and ‘Arctic Monkeys’.

These struck me as extremely unusual recommendations for a show featuring “Latin, funk, soul and hip-hop”, but I suspected that the data rather than the system was at fault. I had a quick look at the source data that had been fed into the system for this show and found:

  • The Wombats (1)
  • My Chemical Romance (1)
  • Hard-Fi (1)
  • Gideon Conn (1)
  • Armand Van Helden (1)
  • Editors (1)

Looking at this list, the recommendations actually make sense: there is very little data for the show, and what there is doesn’t even look correct!

This data has been generated from the digital playout system, but we are unable to track some of the shows, especially specialist music shows such as Gilles Peterson’s. The DJ might play directly off their own vinyl/CD/computer/other crazy device, or the show might be pre-recorded.

So what I’ve done is simply ignore brands with a low average artist play count (<=1.0), which should avoid this kind of situation.
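
As a rough illustration of that filter (not the code running on the site), here is a minimal sketch. The sample data, the constant `MIN_AVERAGE` and the method names are assumptions; only the "average artist play count <= 1.0" rule comes from the description above.

```ruby
# Hypothetical sample data: brand name => { artist name => play count }.
BRAND_PLAYS = {
  "Specialist show" => { "Artist A" => 1, "Artist B" => 1, "Artist C" => 1 },
  "Daytime show"    => { "Artist A" => 12, "Artist D" => 7, "Artist E" => 1 }
}

MIN_AVERAGE = 1.0 # brands at or below this average are ignored

# Mean number of plays per artist for one brand.
def average_play_count(artist_counts)
  return 0.0 if artist_counts.empty?
  artist_counts.values.sum.to_f / artist_counts.size
end

# Keep only brands whose average play count is above the threshold.
def reliable_brands(brand_plays, threshold = MIN_AVERAGE)
  brand_plays.reject { |_brand, counts| average_play_count(counts) <= threshold }
end

puts reliable_brands(BRAND_PLAYS).keys.inspect
```

A brand where every artist has been logged exactly once averages 1.0 and is dropped, which is the situation the list above shows.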

I also want to point out that there is a basic API in place, although it still needs documenting. Just add '.json' at the end of brand/artist/last.fm profile URLs to get a JSON feed of the data.

Written by metade

June 26th, 2008 at 2:20 pm

My Mashed 2008 Hack: Recommending BBC radio shows and artists

with 2 comments

Mashed08: London, June 21/2 2008

I’ve just returned from Mashed 2008 where I formed part of the BBC Radio Labs contingent.

We were providing all sorts of fun things for people to play with, from live BBC Radio audio streams and feeds of what track is being played on air to archives of both the audio and metadata feeds. All of the details are available on the BBC Audio and Music Interactive at Mashed 2008 site.

One of the things that I was directly involved in was the “How many times brands have played artists” data set. By matching the music tracks played on air to MusicBrainz artists, and then working out which radio show each track was played on, we can build an index of which artists were played on which shows. For example, we can see which artist Jo Whiley has played the most, or work out who’s been playing the Arctic Monkeys the most.
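
To make the shape of that index concrete, here is a minimal sketch. It assumes each play has already been matched to a brand (show) and an artist; the `Play` struct, the sample names and the helper methods are illustrative and not part of the actual data set tooling.

```ruby
# A play event, already matched to a brand (show) and an artist. Hypothetical structure.
Play = Struct.new(:brand, :artist)

SAMPLE_PLAYS = [
  Play.new("Show X", "Artist A"),
  Play.new("Show X", "Artist A"),
  Play.new("Show X", "Artist B"),
  Play.new("Show Y", "Artist B"),
  Play.new("Show Y", "Artist B")
]

# Build brand => { artist => play count }.
def brand_artist_counts(plays)
  index = Hash.new { |hash, brand| hash[brand] = Hash.new(0) }
  plays.each { |play| index[play.brand][play.artist] += 1 }
  index
end

# Which artist has this brand played the most?
def top_artist_for(index, brand)
  index.fetch(brand).max_by { |_artist, count| count }&.first
end

# Which brand has played this artist the most?
def top_brand_for(index, artist)
  index.max_by { |_brand, counts| counts.fetch(artist, 0) }&.first
end

index = brand_artist_counts(SAMPLE_PLAYS)
puts top_artist_for(index, "Show X")
puts top_brand_for(index, "Artist B")
```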

It is also a great resource for recommending artists and shows to people. So what I did for Mashed was feed this data into the Semantic Space engine, developed at the University of Southampton by Jon Hare, and build a web app around it: music-recommendations.metade.org.

The site lets you browse around artists and shows, and view lists of other artists and shows the system has recommended. It also provides recommendations based on a last.fm profile top artist feed.

There is a little more detail on how the technique works on the site (hint: it’s based on latent semantic analysis), and I intend to carry on working with Jon to improve both the quality of the recommendations and how they are visualised.
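
To give a feel for how play counts can turn into recommendations, here is a deliberately simplified sketch. It is not the Semantic Space engine and it is not latent semantic analysis: it skips the dimensionality reduction entirely and just ranks shows by cosine similarity of their raw artist play-count vectors. All names and numbers are made up.

```ruby
# Cosine similarity between two sparse play-count vectors (artist => count).
def cosine_similarity(a, b)
  dot    = a.sum { |artist, count| count * b.fetch(artist, 0) }
  norm_a = Math.sqrt(a.values.sum { |c| c * c })
  norm_b = Math.sqrt(b.values.sum { |c| c * c })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Rank every other brand by similarity to the target brand, most similar first.
def similar_brands(brands, target, limit = 3)
  target_counts = brands.fetch(target)
  brands.reject { |name, _counts| name == target }
        .map { |name, counts| [name, cosine_similarity(target_counts, counts)] }
        .sort_by { |_name, score| -score }
        .first(limit)
end

# Hypothetical data: brand => { artist => play count }.
BRANDS = {
  "Indie show" => { "Artist A" => 5, "Artist B" => 3 },
  "Rock show"  => { "Artist A" => 4, "Artist B" => 2, "Artist C" => 1 },
  "Jazz show"  => { "Artist D" => 6 }
}

similar_brands(BRANDS, "Indie show").each do |name, score|
  puts format("%-10s %.3f", name, score)
end
```

The point of a technique like latent semantic analysis is to go beyond this: two shows can come out as similar even when they share no artists directly, which plain cosine similarity on raw counts cannot do.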

Written by metade

June 23rd, 2008 at 8:57 am

Every day should be Hack Day!

with one comment

Back from Hack Day London - it was absolutely incredible! I had really great fun and did a lot of hacking/coding all night long!

I worked with Jon Hare, and we went for the augmented reality web service, which was pretty crazy. We made the ARToolKit, which you can see in the video below, into a web service!

Although it’s a little rough around the edges still, you can play with our hack online at: http://multimedia.ecs.soton.ac.uk/artheworld

I’ve also posted the slides from our 90-second presentation on Slideshare:

Written by metade

June 19th, 2007 at 1:27 pm

Fun Augmented Reality Stuff

without comments

For my PhD I had loads of fun with tangible augmented reality interfaces. I posted a couple of videos of my work on Google Video a few weeks back, which show the kinds of things I ended up doing:

This work was done using the ARToolKit, an augmented reality toolkit that does the computer vision tracking of the marker cards you can see in the video.

I’m very tempted to build something based on this for Hack day!

In collaboration with a couple of guys from the research lab I work at, we recently had the crazy idea to make the ARToolKit into a web service.

The basic idea is as follows: you take a photo of an ARToolKit marker in a fun place, upload it to the web service and it renders some 3D content over the card using the ARToolKit.

I started putting together a basic web app using the Ruby Camping framework. I even managed to get it to upload the result to Flickr if the user decides they like the resulting image.

Here’s a screenshot of what we have so far, plus the Flickr account for the service.

The app still needs a lot of work (e.g. letting people choose which 3D model they want rendered), but the basics are there for a nice hack.

What we really need to do is work out an actual application/scenario for this - there probably isn’t one and it’s just a bit of fun, pointless hackery. But I’m quite intrigued to investigate some kind of wacky “physical meets the digital” experiment…

Written by metade

June 12th, 2007 at 8:39 pm

Posted in code, hackdaylondon

Exposing AllegroGraph as a Joseki SPARQL end-point

with 4 comments

Back when I was working on the Sculpteur project, we had real issues with semantic web triple store performance, in particular with regard to the amount of data they could handle.

So we took a more traditional relational database/Z39.50 SRW approach, finally resulting in the OpenMKS system.

So it’s great to see how far triple stores have come: the AllegroGraph triple store server from Franz Inc. can handle billions of triples. They offer a free version that supports up to 50 million triples - that’s a lot of data! For example, the E-Culture demonstrator (part of the Multimedian project in the Netherlands) incorporates a number of cultural heritage vocabularies/thesauri, Wordnet and three museum/archive collections and takes up only 9 million triples (according to this paper by Schreiber et al.).

So I’ve been experimenting with AllegroGraph, in particular so it can act as a back-end to an mSpace interface. Although AllegroGraph supports SPARQL queries, it doesn’t expose an HTTP SPARQL endpoint (at least I couldn’t find one!).

Here’s how I went about exposing AllegroGraph through a Joseki SPARQL endpoint. It’s not the most efficient way of doing it, but it was the easiest to implement and it works!

AllegroGraph provides a Jena layer, so inspired by the D2R server code, I created a Jena Assembler for the AllegroGraph Jena graph. With this, you can then set up a Joseki data set:


_:allegro rdf:type ja:RDFDataset ;
    rdfs:label "Allegro" ;
    ja:defaultGraph
        [ rdfs:label "Allegro Graph" ;
          a ag:AllegroModel ;
          ag:modelName "test" ;           # the name of the AllegroGraph model
          ag:modelLocation "/tmp/ag/" ;   # the location of the AllegroGraph model
        ] ;
    .

ag:AllegroModel
    rdfs:subClassOf ja:Object ;
    ja:assembler "org.metade.allegrograph.AllegroGraphAssembler" ;
    .

I’ve assembled a Maven project with the code, which you can download; maybe it’ll be of use to someone. Give me a shout if you need help getting it going.
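
Once the endpoint is up, any HTTP client can query it by passing the SPARQL text in a `query` parameter, as the SPARQL protocol describes. Here is a minimal Ruby sketch that builds such a request URL. The endpoint address is an assumption (it depends entirely on how your Joseki instance is configured), and the helper name `sparql_query_uri` is made up.

```ruby
require "uri"

# Assumed endpoint address; adjust to match your own Joseki configuration.
ENDPOINT = "http://localhost:2020/sparql"

# Build a GET URL that carries the SPARQL query in the `query` parameter.
def sparql_query_uri(endpoint, query)
  uri = URI.parse(endpoint)
  uri.query = URI.encode_www_form(query: query)
  uri
end

SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

puts sparql_query_uri(ENDPOINT, SAMPLE_QUERY)
# To actually run it against a live endpoint:
#   require "net/http"
#   puts Net::HTTP.get(sparql_query_uri(ENDPOINT, SAMPLE_QUERY))
```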

Now, I mentioned that this isn’t the best approach. We’ve done some basic testing on it, and although it manages to cope with some complex queries far better than other systems we’ve tried, it can be slow on others. There is a lot of overhead in this approach. Off the top of my head (i.e. this might not be very accurate!), each SPARQL query goes through:

  • Jena ARQ SPARQL library
  • Jena graph model
  • AllegroGraph Jena wrapper
  • AllegroGraph Java layer
  • AllegroGraph native store

But it works…

In terms of future work, AllegroGraph supports SPARQL queries directly, so we’re planning on developing a SPARQL endpoint that talks to it without going through the Jena layers. It will be interesting to compare the two approaches, but my guess is that the direct SPARQL queries will be a lot faster.

Written by metade

April 18th, 2007 at 12:06 pm

Posted in code, semweb, sparql