Category Archives: search engines

The search engine and Us

To my horror I realise that it is nearly a year since I wrote an article for this blog. I have been busy blogging elsewhere, and it is not for want of topics. I’ll do a quick post – a cautionary tale about search-engine optimisation and branding.

My church supports a charity which for many years was known as USPG – standing for the United Society for the Propagation of the Gospel. Its origins lie in missionary activity in the days of the British Empire, but now its work emphasises partnership with churches overseas (our particular area of support is a project in Malawi).

A few years ago the Society decided to rebrand itself as the ‘United Society’ or simply ‘Us’. This got rid of the awkward colonialist connotations of the full name, and also sounded inclusive and pally. But you can see the problem – a search on ‘Us’ will find instances of the personal pronoun, or pages about the United States, if indeed ‘us’ is not treated as a stopword and excluded altogether from searches.
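To make the stopword problem concrete, here is a minimal sketch in Python of how a simple search system might prepare a query. The stopword list and the function name `index_terms` are made up for illustration; real search engines use their own lists and far more elaborate processing.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "it", "we", "us"}


def index_terms(query):
    """Lowercase the query, split it on whitespace and drop stopwords."""
    return [word for word in query.lower().split() if word not in STOPWORDS]


print(index_terms("Us"))    # nothing left to search on
print(index_terms("USPG"))  # the acronym survives as a distinctive term
print(index_terms("United Society for the Propagation of the Gospel"))
```

Under these assumptions ‘Us’ disappears altogether, because lowercasing makes the brand name indistinguishable from the pronoun, while ‘USPG’ survives as a single distinctive term.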

(Actually, even leaving out search engines, there’s a problem with ‘Us’ in conversation. Saying “St Filofax’s is supporting Us this month” is ambiguous. It reminds me of the workplace I had which named its servers after parts of the body (don’t ask….). It was very hard dictating IP addresses or URLs involving the one called ‘colon’.)

So back to USPG it went. Nice idea, but really any rebranding needs to take account of the search engine test.

What happened to subject gateways?

An ‘ILRT alumni and friends’ group has sprung up on Facebook, with several dozen members. We reminisce about events we attended and projects we worked on, and share photos of former staff jollies and awaydays, conference visits, giveaway items and so on. Many of us had something to do with the resource discovery site Intute (I still have an Intute plastic drinks coaster). It covered all subjects and subsumed ILRT’s own SOSIG (Social Science Information Gateway). As well as cross-searching, Intute also had add-ons such as newsfeeds and a personalised space for tagging and exporting content. It ran from 2006 until 2011, with content frozen till 2014, when the site was finally closed.

(Earlier, when I worked at NISS – now Eduserv, which has been in the news this week – we had a ‘Directory of Networked Resources’, which also covered all subjects. As well as cataloguing all resources on the NISS site, anyone could catalogue and submit a resource. It ceased to work in about 2002, I believe because there wasn’t the staff time available to check the resources that had been submitted, and by then it was only duplicating other gateways, including those that became Intute. BUBL met a similar fate.)

What went wrong with Intute, apart from the general shortage of funds after the credit crunch? We were proud of SOSIG, but it was apparent, when I worked on merging it with other subject gateways, that they were at different levels of development and detail. The claimed advantage of Intute was that it enabled cross-searching, but it wasn’t so clear how much of an advantage that was. If you are a historian, do you need to be able to find resources relating to chemistry? And how much do you need a facility to export content, when most people are comfortable with cut and paste, and have their own way of storing information that they need to keep?

More generally, we don’t seem to hear much about subject gateways any more. A search on the term suggests that the main people who run ‘subject gateways’ now are professional organisations. Gateways were labour-intensive – SOSIG had a distributed team of subject experts cataloguing resources, plus a team at ILRT keeping the whole thing going. And the whole process of resource discovery has changed. Better search engine algorithms have lessened the need for gateways: relevant, popular resources rise to the top of the list of search results, so there is less need to go browsing through a catalogue.

But they haven’t gone away altogether. For a few years now I’ve been an editor of the Digital Classicist Wiki, which collects and catalogues ‘digital projects and tools of relevance to classicists’ (this last term is quite generously interpreted). The wiki works by voluntary labour from interested people, and is part of the Digital Classicist hub, which in turn is hosted by KCL’s Department of Digital Humanities. We meet for a sprint once a month, with a suggested list of resources to be added or edited, and contribute at other times. The interface is not flashy, and the tagging by category doesn’t catch everything (this is one of the areas I like to work on); the categories themselves are a folksonomy* of what people find useful, not a systematic hierarchical classification scheme. But it’s got over 500 entries in it now. Because of its focus on digital projects, it doesn’t attempt to link to every Classics-related website – for example, it doesn’t cover Classics departments or most journals – but it does include the sort of resources which were once found on Intute.

I don’t know whether similar sites exist for other subjects. But it would seem that in the case of Classics at least, we have reverted to the earlier model of subject gateway – created by volunteer experts and with basic functionality – as being more sustainable than the relatively short life of sites such as Intute.

(* I don’t use ‘folksonomy’ pejoratively – traditional classification schemes such as UDC or DDC would struggle with some of the categories which are useful for digital resources in classics. A topic for another post!)

How to identify a name

If I do a Google search on my own name (let’s face it, we all do vanity searches from time to time) I see at the bottom of the results page a warning sentence ‘Some results may have been removed under data protection law in Europe’. This message is now appearing after all sets of search results where Google thinks you’ve searched on a personal name.

My name is mildly unusual (though I share it with a sometime First Lady of a large U.S. State), but it’s clearly a name, even though my surname is also a common noun. My husband’s name (unique, we believe, with a very unusual surname in Europe) also generates the message. A quick test on other names produces mixed results. A Chinese name is identified as such, for example. Quotes and capital letters are disregarded and initials + surname are also recognised as a potential name. However….

I know someone called ‘Fondant Fancy’. OK, that isn’t actually her name, but she has a name which could also easily be a kind of cake. Searching on it produces references to her, as well as cake recipes. But no warning message. Nor does searching for a bogus surname invented by a friend produce the message, even in combination with a real forename. Nor does searching on my surname, or my husband’s, with an invented forename.

This system for identifying names is obviously never going to work 100%. Clearly they have a list of forenames (so my name gets identified) and surnames (so my friend’s bogus name does not). But some kinds of name are going to cause problems:

  • very unusual forenames or surnames. It’s now quite common to ‘invent’ names out of bits of other names, for example, in an effort to be unique. (And to cause a lifetime of problems every time your child has to give their name. But at least it fools Google.)
  • names where both parts are unusual (at least according to Google’s lists).
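My guess at the mechanism can be sketched in a few lines of Python. This is purely a guess modelled on the behaviour described above, not Google’s actual method; the two name lists and the function `looks_like_name` are invented placeholders.

```python
import re

# Placeholder lists, invented for illustration; a real system would
# presumably draw on much larger lists from directories and other sources.
FORENAMES = {"jane", "mary", "john"}
SURNAMES = {"baker", "smith"}


def looks_like_name(query):
    """Guess whether a query is a personal name.

    Quotes and capital letters are disregarded. The last word must be a
    known surname, and every earlier word must be a known forename or a
    single-letter initial.
    """
    # Keeps only runs of unaccented a-z, so this sketch ignores
    # accented and non-Latin names.
    tokens = re.findall(r"[a-z]+", query.lower())
    if len(tokens) < 2:
        return False
    *given, surname = tokens
    if surname not in SURNAMES:
        return False
    return all(len(part) == 1 or part in FORENAMES for part in given)


print(looks_like_name('"Jane Baker"'))    # known forename and surname
print(looks_like_name("J. Baker"))        # initial plus known surname
print(looks_like_name("Fondant Fancy"))   # neither part is on the lists
print(looks_like_name("Zorblina Baker"))  # invented forename, real surname
```

With these placeholder lists the first two queries count as names and the last two do not, which matches the pattern in my tests: a name is only recognised when every part is already on a list.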

I imagine Google has some way of trawling directories of known names, and other sources, in order to get its raw material. Meanwhile there will soon be ways of getting at information which has been withdrawn from Google search results in the European domain. Sites will spring up offering searching at Google.com, or comparing the results of Google.com with Google.co.uk, or even offering to search pages which are found by one and not the other.
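The comparison such a site would make is simple enough. Here is a rough sketch in Python, assuming the two sets of results have already been collected as lists of URLs; the function name and the example.org addresses are made up, and no real searching is done.

```python
def only_in_first(first_results, second_results):
    """Return the URLs in first_results that are missing from second_results,
    keeping the original order."""
    seen = set(second_results)
    return [url for url in first_results if url not in seen]


# Made-up result lists standing in for the two versions of the search.
global_results = [
    "https://example.org/profile",
    "https://example.org/old-news-story",
    "https://example.org/recipes",
]
european_results = [
    "https://example.org/profile",
    "https://example.org/recipes",
]

print(only_in_first(global_results, european_results))
```

Anything returned here appears in the first list but not the second, which is exactly the information that the removal was meant to make harder to find.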