Archive for May, 2007

Notes on Two days at XTech 2007

May 21, 2007

I was lucky enough to be one of the developers at Talis to attend XTech 2007 (well, the first two days of it), and what a great conference it was. Being in the centre of Paris didn’t hurt either – my 21st-floor Novotel room gave me a wonderful view of the city, overlooking the River Seine (perfect!).

I sometimes find it easy to develop tunnel vision: you’ve got a job to do, something to implement, and a deadline, and you’re focussed on that instead of on the wider issues around what you are doing and the technologies you are using. So it was good to go to XTech and see what others are doing in the semantic web area.

Ubiquitous Computing

One of the themes on the first day was ubiquitous computing, with discussion around location-aware devices, i.e., mobile devices with sensors that can interact with their physical environment. One of the talks on this theme was from Claus Dahl of Imity, who has developed a Bluetooth client application that scans your immediate environment to “see” what other Bluetooth objects are around. The Imity client can keep, tag and share a history of these objects with other people who have the client installed. One of the interesting points Dahl made was that three months’ worth of personal location history is hard to fake, as physical data is “stickier” than online data; this history, in effect, is identity. The Imity client is open source and they are planning an API for it.

Location-aware devices and software such as the Imity client open up new opportunities for social networking because, as you share your tags, you can find people you’ve met or people who have attended the same events as you over time. Check out imity.com for more on the social networking opportunities this creates.

Though not mentioned in the Imity talk, these kinds of social networking opportunities also highlight privacy issues: if your phone is Bluetooth-enabled, it can be discovered by anyone with an Imity client. This leads me on to one of the keynote speeches, Adam Greenfield’s “Everywhere: expectation, emergence, reality”. His theme was that networking technology is no longer a PC on our desk with an ethernet cable but is “in the woodwork” everywhere around us. He quoted Mark Weiser: “The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” Greenfield takes the view that all the components are now in place to enable this infiltration of devices into our everyday lives, and that this imperceptible and pervasive aspect of computer technology has worrying implications for privacy.

The usual counterargument is that if you’re honest you shouldn’t be afraid, as you have nothing to hide. Greenfield argued, however, that all societies survive thanks to a veneer of hypocrisy, which is the oil that keeps the social wheels turning (a good definition of politeness, perhaps). It’s worrying enough that the average Londoner is apparently caught on camera up to 300 times a day (see the Liberty site); the ubiquity of these devices provides even more opportunities for surveillance and “reality mining”, i.e., harvesting facts about our behaviour. There are also issues around who controls this information.

Among his recommendations for how society should deal with these systems: systems should “default to harmlessness” and not unnecessarily embarrass, humiliate or shame their users (for example, by being imprecise about a user’s location), and systems should be deniable, i.e., offer users the ability to opt out at any time. Greenfield is an excellent speaker and I was impressed enough with his talk to order his book.

Open Data

Another theme was Open Data. The talk I attended, by Alf Eaton, discussed how Semantic Web technologies can facilitate the sharing of scientific data and experimental results. For example, searching across all scientific literature cannot currently be done, as most of it exists only as PDFs. What would be useful is machine-readable documents and semantic browsing: for example, a link that says “show me all papers about this gene”. Semantic browsing around this data is only possible if things have identifiers, i.e., each paper should have a DOI (Digital Object Identifier) assigned to it. Tools also need to be made available for people to collaborate and to publish their scientific data and experiments for others to see. There was discussion as well around how scientists doing experiments need to use structured data, and the ontologies being created to enable this. An interesting issue brought up in the question and answer session concerned historical data: how do you generate a unique ID for, say, the population of Britain between 1957 and the present, and what sorts of things should be uniquely identified in the first place?
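As a trivial sketch of my own (not from the talk) of why identifiers matter: a DOI is just a string, but prepending a standard resolver turns it into a URL that both people and machines can follow, and that is what makes links like “all papers about this gene” feasible. 10.1000/182 is the DOI of the DOI Handbook itself.

import java.net.URL;

public class DoiExample {
    public static void main(String[] args) throws Exception {
        String doi = "10.1000/182";              // an identifier, not a location
        URL link = new URL("http://dx.doi.org/" + doi);  // 2007-era DOI resolver
        System.out.println(link);                // a dereferenceable URL
    }
}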

This links with a great keynote by Gavin Starks, who has a background in astrophysics. His talk was about climate change, which he started off by showing us some scary statistics on CO2 concentration levels and world temperature, as well as satellite images of how much the polar ice caps have shrunk in the last decade. There is, apparently, a proven correlation between CO2 levels and the temperature of the planet. I haven’t yet seen “An Inconvenient Truth”, but Starks recommends it as presenting the scientific data very well. Starks then went on to present what he and his colleagues are doing about climate change: the launch of AMEE, a semantic web platform and “generic algorithmic engine” surrounded by an API and consisting of data supplied by DEFRA, the Royal Society, Global Cool and the 0c Climate group (he said that even Rupert Murdoch has apparently bought into this). The platform enables things like toolkits for schools; tradespeople can use it to build energy profiles for their businesses; campaigners can use it to collaborate with each other; and so on.

I feel depressed about climate change, and I’ve always thought that getting anything done about it would be nigh impossible, as that would mean world governments having to take the lead and collaborate to tackle the problem (fat chance!). However, thanks to semantic web technology new social networks are emerging, so perhaps we will see a more grass-roots movement of concerned citizens who are willing to collaborate for the survival of our species and others. I was going to say the survival of our planet, but as Starks, with his astrophysics background, pointed out, the planet doesn’t care; the planet will survive no matter what we do to it. It’s the living beings on the planet, including ourselves, that we need to be concerned about. By the way, AMEE stands for Avoiding Mass Extinction Engine.

Tutorials

XPath 2.0, XQuery 1.0 and XSLT 2.0 Explained was a detailed and useful tutorial by Priscilla Walmsley, which I can’t do justice to as I only attended for a morning. What I did pick up is that there are about 110 built-in functions in XPath 2.0, and that you can also write your own. Priscilla took us through the XPath/XQuery data model (the document node is now the top-level node; it was formerly called the root node, and the term “root” is no longer used). The new version is strongly typed (which has apparently caused some heated argument from people who think it should be more like a scripting language). XPath 2.0 also adds new comparison operators – eq, ne, lt, le, gt, ge – which are better to use than the old =, != etc., as they are optimized for performance. See her new book here.
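To make the operator point concrete, here is a minimal sketch of my own (not from the tutorial). It assumes Saxon on the classpath, since Saxon implements XPath 2.0 behind the standard JAXP interfaces while the stock JDK factory only supports XPath 1.0; the expression and file name are made up for illustration.

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ValueComparisonDemo {
    public static void main(String[] args) throws Exception {
        // Saxon's factory gives us XPath 2.0 semantics via the JAXP API.
        XPathFactory factory = new net.sf.saxon.xpath.XPathFactoryImpl();
        XPath xpath = factory.newXPath();

        // 'lt' is a value comparison: each operand must be a single atomic
        // value, which is what lets the processor optimize it. The old '<'
        // is a general comparison that quantifies over whole sequences.
        String expr = "count(//book[price lt 20])";
        Double cheap = (Double) xpath.evaluate(
                expr, new InputSource("books.xml"), XPathConstants.NUMBER);
        System.out.println(cheap + " books under 20");
    }
}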

Browser Technologies

Though I’ve done front-end web development with old-fashioned ASP and with ASP.NET, I’m not all that familiar with the W3C specifications for HTML and CSS (my excuse is that I haven’t had time to delve into them in detail, as I just needed to get a job done quickly), so the talk by Molly Holzschlag was aimed at developers like me. One of the more frustrating aspects of web development is browsers rendering the same thing differently, or things working in one browser and not in another, so this was an interesting and useful talk on browser interoperability and why browsers work the way they do.

One of the problems is fractured specifications (linked to this, there was some heated debate on why two specifications, XHTML and HTML 5, are needed), along with ambiguities within the specifications themselves. I hope I’m not wrong, but I got the impression that, for example, one browser may have been developed to comply with one version of a specification and another browser with a different version, or that different browsers have historically been developed against different specifications altogether. No wonder, then, that different browsers work, well, differently! As well as some interesting historical analysis of the way browser technology has developed, she also made some suggestions on the way forward to improve browser interoperability, such as:

Evolve tools and get community feedback.
Work to common standards and clarify any ambiguities in the W3C specifications.
Have transparent and open development cycles, rather than closed, competitive and secretive development efforts, and work from common use cases.

Time to Leave

The last talk I attended before I left was RSS Remixing by Ian Davis, who demoed the APIs surrounding our Talis platform. He showed how different sets of retrieved RSS results can be used to augment each other: for example, you can do a search of bibliographic data and augment the results with book jacket images, reviews, and data from Wikipedia.
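The general idea, sketched very roughly below in code of my own (this is not Ian’s demo code, and the feed URLs are placeholders), is to pull items from two feeds and join them on a shared key – here, naively, the item title:

import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssRemix {

    // Parse an RSS feed and map each item's title to its description.
    static Map<String, String> itemsByTitle(String feedUrl) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(feedUrl);
        Map<String, String> items = new HashMap<String, String>();
        NodeList nodes = doc.getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            items.put(text(item, "title"), text(item, "description"));
        }
        return items;
    }

    static String text(Element item, String tag) {
        NodeList list = item.getElementsByTagName(tag);
        return list.getLength() > 0 ? list.item(0).getTextContent() : "";
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URLs: one feed of bibliographic search results,
        // one of reviews; merge the two on the item title.
        Map<String, String> books =
                itemsByTitle("http://example.org/bib-search.rss");
        Map<String, String> reviews =
                itemsByTitle("http://example.org/reviews.rss");
        for (Map.Entry<String, String> book : books.entrySet()) {
            String review = reviews.get(book.getKey());
            System.out.println(book.getKey() + ": " + book.getValue()
                    + (review != null ? " | review: " + review : ""));
        }
    }
}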

I thoroughly enjoyed my two days at XTech, and met some really nice people as well as hearing some useful stuff.


Lucene Sorting Again

May 14, 2007

I’ve been fixing various issues in a story I’m implementing that uses Lucene’s sorting functionality. I ran into the issue that if you specify a field to sort by and that field is not indexed, IndexSearcher throws a Java runtime exception. That’s a bit annoying, as a typed exception could be caught and dealt with more easily. So what we’ve done is retrieve the indexed field names via

searcher.getIndexReader().getFieldNames(
        IndexReader.FieldOption.INDEXED)

(where searcher is our IndexSearcher instance – getIndexReader() is an instance method, not a static one), which returns a Collection, and then checked that every field named in the sort criteria appears in that Collection.
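Put together, the guard looks roughly like the sketch below. The class and exception names are ours for illustration, not Lucene’s; the getFieldNames and getSort calls are the real Lucene 2.x API.

import java.util.Collection;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class SortValidation {

    /** The typed exception we wished Lucene threw (our own, hypothetical). */
    public static class UnindexedSortFieldException extends Exception {
        public UnindexedSortFieldException(String message) {
            super(message);
        }
    }

    /** Rejects a Sort that names any field not indexed in this index. */
    public static void checkSortFields(IndexSearcher searcher, Sort sort)
            throws UnindexedSortFieldException {
        Collection indexed = searcher.getIndexReader()
                .getFieldNames(IndexReader.FieldOption.INDEXED);
        SortField[] fields = sort.getSort();
        for (int i = 0; i < fields.length; i++) {
            String name = fields[i].getField();
            // getField() is null for score/doc sorts, which need no field.
            if (name != null && !indexed.contains(name)) {
                throw new UnindexedSortFieldException(
                        "Cannot sort on unindexed field: " + name);
            }
        }
    }
}

Run before IndexSearcher.search(query, sort), this turns Lucene’s runtime surprise into an exception we can catch and report sensibly.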