Where should the digital humanities live?

Don’t get me wrong. The cluster of work that bears the label ‘digital humanities’ is important; very important. I’ve spent the last decade or so of my working life in the gap between historians and application developers, trying to make sure that digital tools get designed in the ways historians need them to be designed. Projects digitising books; collaborative editing platforms; institutional repositories; Open Access journal platforms; web archives: I’ve done a similar job, more or less well, in each case. As well as that, I was (and remain) founding co-convener of the Digital History seminar at the Institute of Historical Research, which looks to showcase finished historical scholarship that would have been impossible without the digital, broadly defined.

But there is a problem with how we understand the term, I think. I receive the term as signifying a community of practice, of scholars employing new technological means to achieve the same ends as they did before ‘the digital’. And as that community of practice grows, one would naturally expect a degree of self-consciousness within it as to the distinctiveness of what we’re all doing. This is inevitable, and almost certainly helpful, as new journals, conferences and online spaces appear in which work that might be too innovative for traditional channels can be published, and in which discussions about method can take place safely.

My worry is over the institutional location of this activity. Several universities have spotted the potential of locating DH people together, and so there are several Schools or Faculties or Departments of Digital Humanities, all centres of real excellence, in universities in the UK and elsewhere. It’s an institutional means of nurturing something important, and it seems to work. My concern is with the long term.

As in all large organisations, the internal structures of universities have their own force in determining the shape of the work that goes on within them. Structures shape cultures and cultures influence behaviours. It’s nobody’s doing, but the effect is real.

A department has a head, who usually sits at the same table as the head of History, or Philosophy; and funds run down these channels, and reporting lines back up. And my concern is that this Digital Humanities, this enterprise that starts to be treated (in institutional terms) as a discipline in its own right, could become a silo. The unintended consequence of creating a permanent space in which to foster the new approach is that Dr So-and-So in English, or Philosophy, can say “Oh, a digital approach, you say? You want DH – they’re over in the Perkins Building.” Enterprising individuals and projects can and do bridge these gaps between departments; but the effect of the existence of the silo on the general consciousness has to be reckoned with, and mitigating the effect takes time and effort.

Put it this way. When Microsoft Word came within the reach of university budgets, no-one proposed that a Department of Word-Processed Humanities be set up – although word-processing was a technology that became ubiquitous in a short space of time, and had profound and widespread and general effects on a crucial element of academic practice – just like the digital humanities. And right now, there are no Schools of Social Humanities, to foster communities of practice in the most effective use of Twitter for dissemination and impact. Both these were disruptive technologies which were (and are) promoted across departments, faculties and whole institutions until they needed (or need) promoting no longer.

The end game for a Faculty of DH should be that the use of the tools becomes so integrated within Classics, French and Theology that it can be disbanded, having done its job. DH isn’t a discipline; it’s a cluster of new techniques that give rise to new questions; but they are still questions of History, or Philosophy, or Classics; and it is in those spaces that the integration needs eventually to take place.

Wikipedia, authority and the free rider problem

I am a selfish Wikipedian. By which I mean that, while I am very happy to use Wikipedia, I have not been very serious about contributing to it. There is a small handful of pages for which I keep the further reading (reasonably) up to date, and which I correct if a particularly egregious error appears. But it is sporadic, and one of the first things to be squeezed out if life gets busy.

And I wonder whether there aren’t real gains for historians from helping Wikipedia become truly authoritative, gains which are obscured by natural disincentives in the way our scholarly ecosystem works.

Firstly, the disincentives. One is a residual wariness of something that can be edited by ‘just anyone’. I myself have dissuaded students from citing Wikipedia as an authority in itself, as part of what I am teaching is the ability to go to the scholarly article that is cited in Wikipedia, and indeed beyond it to the primary source. But my experience is that, in matters of fact, Wikipedia is very reliable unless it concerns a highly charged topic (the significance of Margaret Thatcher, say). And even the making of that judgement is an important part of learning to think critically about what it is we read.

Perhaps more significant is the fact that Wikipedia appears to be edited by no-one in particular. One of the contradictions of modern academic life is that most scholars would, I think, assert the existence of a common good, the pursuit of knowledge, towards which we work in some abstract sense. At the same time, the ways in which we are habituated to achieve that end are fundamentally about competition between scholars for scarce resources: attention, leading to esteem, leading to career advancement.

We write books and articles, which help us get and then keep a job. A smaller but growing number write blogs like this one, and tweet about those blogs. Part of this is about ‘impact’ (that is to say, increasing our share of those scarce quanta of public attention). And all of it depends on being identified as the creator of an item of intellectual property: tweet, blog post, article, book, media interview. Few, even at the wildest edges of the Open Access movement, propose licensing of scholarly outputs without attribution, even if a work may be licensed for the most radical of remixing. All depends on being known.

But Wikipedia doesn’t credit its authors, or at least not in a prominent and easily reportable way. And so the question arises: even though contributing to Wikipedia is to the common good, what is in it for me?

The answer may depend on a more speculative and more risky model of collaborative work, but one which holds out the prospect of a genuinely authoritative resource, made by authorities. And that in turn should reward the best published work, in the good old-fashioned and citable way, by channelling readers to it. (It would be even better for works available Open Access.)

But it depends on everyone jumping together. As long as some contribute, but others only consume, there remains a classic economist’s ‘free rider’ problem. When people use a resource without ‘paying’ (in the form of their own time, and their own particular expertise) then the cost of production is unevenly spread, and the quality of the product diminished. But if editing Wikipedia became a genuinely widespread enterprise amongst scholars, then even if my contribution is not recognised with each and every edit, my ‘main’ work (if it is any good) will be cited and integrated into the fabric of Wikipedia by others. And we might get a more informed public debate about each and every matter, which looks like impact to me. Perhaps I should get more serious about this now.

What use is a personal tweet archive?

A little while ago I wrote a post about the need to plan for archiving the digital “papers” of historians. In that post I talked about research data (what we used to call “notes”); about the systems that form the bridge between that data and the writing process; and about written outputs themselves, and their various iterations. It looked forward to a time when all these digital objects, in multiple formats but from one mind, are available to future students of the way the discipline has developed.

What that post neglected was data about the way I publicise my work. Perhaps one of the reasons we’ve been slow to think about this is that, at one time, most academics didn’t need to. Apart from giving papers at gatherings of the learned, the task of publicising one’s work belonged to the publisher. And if one’s publisher was the right one, then the work would inevitably end up in the hands of the small group of people who needed to know about it. And whilst the media don is not a new phenomenon, most historians might have thought such self-publicity outside the academy something of an embarrassment, even rather vulgar.

How times change. Universities are training their staff in dealing with the traditional media and in the most effective way of using social media. And this opens up a new category of data that ought to be archived, if only to understand how the push for ‘impact’ actually played out in these early years. And some of it is being archived. The Library of Congress are archiving every tweet, although it isn’t yet clear how that archive may be made available for use. The UK Web Archive, along with other national web archives, have been archiving selected blogs (including this one) for several years, and the EU-funded BlogForever project is looking to join those projects up. But this approach, valuable though it is, separates the content from the author, and from the rest of their digital archive. Whilst that link might be retrievable at a higher discovery layer, something important is still lost.

But now the helpful folk at Twitter, in a move that ought to be applauded, have made it very quick and easy to download an archive of one’s own tweets, right back to the beginning. And so I did: 1682 tweets, over 14.5 months. But what to do with it?

Straight away, scrolling through a long CSV file starts to tell the story of the making of other things: the first retweet of someone else’s work which was subsequently to influence my own; the first traces of an idea, or even of a question I was beginning to ask, which spawned a blog post, and then a paper. I also find that I shared at least one link in more than two thirds of my tweets, which sounds public-spirited until I add that a good proportion were my own posts. I can start mining the data for key terms and themes, and how they ebbed and flowed.
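For anyone wanting to try the same exercise, here is a minimal sketch in Python of the kind of counting involved. It assumes the archive is a CSV file with columns named `timestamp`, `text` and `expanded_urls` (roughly what the download provides, but check your own file), and that each timestamp begins with the year and month; the function name, the sample rows and the stand-in domain `example.org/blog` are all illustrative.

```python
import csv
import io
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
TERM_RE = re.compile(r"[#@]?\w+")  # plain words, #hashtags and @handles


def summarise_archive(csv_text, own_domain, stopwords=frozenset()):
    """Summarise a tweet archive supplied as CSV text.

    Assumes columns named "timestamp", "text" and "expanded_urls".
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # A tweet shares a link if it has an expanded URL or a URL in its text.
    linked = [r for r in rows
              if r["expanded_urls"].strip() or URL_RE.search(r["text"])]
    # Of those, how many point at the author's own site?
    own = [r for r in linked if own_domain in r["expanded_urls"]]
    # Term counts per month, keyed "YYYY-MM", with URLs stripped out first.
    terms_by_month = {}
    for r in rows:
        month = r["timestamp"][:7]
        text = URL_RE.sub(" ", r["text"]).lower()
        terms = [t for t in TERM_RE.findall(text) if t not in stopwords]
        terms_by_month.setdefault(month, Counter()).update(terms)
    return {
        "tweets": len(rows),
        "share_with_links": len(linked) / len(rows) if rows else 0.0,
        "share_of_links_to_own_site": len(own) / len(linked) if linked else 0.0,
        "terms_by_month": terms_by_month,
    }


# Hypothetical sample rows, for illustration only.
sample = """timestamp,text,expanded_urls
2012-11-02 09:15:00 +0000,New post on #webarchiving http://t.co/x,http://example.org/blog/post1
2012-11-05 10:00:00 +0000,Reading about #webarchiving and #DH,
2012-12-01 08:30:00 +0000,Worth a look http://t.co/y,http://example.com/article
"""

summary = summarise_archive(sample, "example.org/blog", {"on", "and", "about"})
print(summary["share_with_links"], summary["terms_by_month"]["2012-11"].most_common(1))
```

Comparing the monthly counters side by side is one crude way of seeing terms ebb and flow; a real archive would want a proper stopword list.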

It would be useful if there was a way to keep this data fresh, of course, to avoid going back to Twitter for a new download every so often. And, thanks to @mhawksey, there is a simple way of doing this, using Google Drive. Martin explains all here, with a handy video set-up guide.

And so I now have a cloud-based archive of my tweets, complete with a basic search and browse web interface. This is now a lazy man’s look-up of old tweets and the resources they pointed to, searchable by handle, hashtag or key term.

But perhaps this is something about which most people are lazy. Social media provides us with an overwhelming stream of quite-interesting things, in amongst which are nuggets of gold. Those nuggets I can manage in the old way, by recording them properly, perhaps in a bibliography. I might even read them, one day. But the quite-interesting stuff, whilst being too much ever to record properly, will probably remain quite interesting. And so this provides a middle way between formal curation of a webliography and just searching the live web (which assumes I can remember enough about what I’m looking for).

Might this archive now change my future tweeting ? Early days to judge perhaps. But I think it may, since I may now retweet and share in preference to using favourites, in order to get a link to a resource into the archive. I can also imagine starting to use personal hashtags, as a way of structuring my own archive at the same time as I tweet. Real-time curation perhaps ?
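The look-up by handle, hashtag or key term can be approximated in a few lines, which also shows how personal hashtags would act as a filing system. This is a hypothetical sketch in Python, not Martin’s code: it assumes each archived tweet is a dictionary with a `text` field, and the function name and sample tweets are my own.

```python
import re


def search_tweets(tweets, handle=None, hashtag=None, term=None):
    """Return the tweets that match every criterion supplied.

    Handles and hashtags must not run on into a longer name; key terms
    match as plain substrings. All matching ignores case.
    """
    patterns = []
    if handle:
        patterns.append(re.compile(r"@" + re.escape(handle.lstrip("@")) + r"\b", re.IGNORECASE))
    if hashtag:
        patterns.append(re.compile(r"#" + re.escape(hashtag.lstrip("#")) + r"\b", re.IGNORECASE))
    results = []
    for tweet in tweets:
        text = tweet["text"]
        if not all(p.search(text) for p in patterns):
            continue
        if term and term.lower() not in text.lower():
            continue
        results.append(tweet)
    return results


# Hypothetical archived tweets.
tweets = [
    {"text": "Thanks @mhawksey for the set-up guide"},
    {"text": "Notes on web archives #webarchiving"},
    {"text": "@mhawksey's archive sheet now searchable #webarchiving"},
]

print(len(search_tweets(tweets, handle="mhawksey", hashtag="webarchiving")))
```

A personal hashtag would simply be passed as `hashtag`, so tagging at the moment of tweeting and retrieving later become the same operation.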

And I might share it too. Since this is now unambiguously my own data, rather than Twitter’s, I can license it for reuse by others in larger corpora for analysis. Imagine a pooled archive of the tweets of many historians. Now that would be interesting.

Open Access and open licensing

Much of the recent concern about Open Access in the UK, at least for the humanities, has not been about the general principle, but rather about the means.

In my hearing, however, perhaps at least as much consternation was in reaction to the prospect of subsequently licensing those outputs for re-use using one or other of the Creative Commons suite of licences. CC allows various degrees of redistribution, and re-use, without further recourse to the author, but with credit given. Commercial use can be restricted (or not); the making of derivative works can be provided for (or not). You can Meet the Licenses here.
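The two switches described above (commercial use, and derivative works) can be made concrete with a deliberately simplified model in Python. It captures only the headline elements of the six current CC licences: BY (attribution), NC (non-commercial), ND (no sharing of adapted versions) and SA (share-alike). The licence deeds and legal code are what actually govern, and the function names here are my own.

```python
# The six current Creative Commons licences; all of them include BY (attribution).
CC_LICENCES = {"CC BY", "CC BY-SA", "CC BY-ND", "CC BY-NC", "CC BY-NC-SA", "CC BY-NC-ND"}


def elements(licence):
    """Split a licence name such as 'CC BY-NC-SA' into {'BY', 'NC', 'SA'}."""
    if licence not in CC_LICENCES:
        raise ValueError(f"unknown licence: {licence}")
    return set(licence.split(" ", 1)[1].split("-"))


def permits(licence, commercial=False, adapted=False):
    """Does the licence allow redistribution of this kind? (Simplified.)"""
    parts = elements(licence)
    if commercial and "NC" in parts:
        return False  # NC: non-commercial use only
    if adapted and "ND" in parts:
        return False  # ND: adapted versions may not be shared
    return True


def conditions(licence):
    """Obligations attached to reuse under the licence. (Simplified.)"""
    parts = elements(licence)
    obligations = ["credit the creator"]  # BY is common to all six
    if "SA" in parts:
        obligations.append("share adaptations under the same licence")
    return obligations


for name in sorted(CC_LICENCES):
    print(name, permits(name, commercial=True, adapted=True), conditions(name))
```

Only CC BY answers yes to both switches, which is one way of seeing why it is described as the most liberal of the set.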

As an advocate of greater Open Access in the humanities, I suspect that Research Councils UK made a tactical error in suggesting that it intended to enforce the most liberal of these licences. CC-BY ‘lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation.’ Here’s why I think the focus on CC-BY has been a mistake, at this point.

Personally, I have never quite been convinced that ‘full’ or ‘real’ OA was dependent on maximally open licensing. I see free availability of the content for reading and citation as quite distinct from the subsequent reuse of that content in other ways. Both are desirable, but can be decoupled without damage. A move to any form of OA represents a major cultural change, albeit one that is necessary. Given this, I would rather see an OA article with all rights reserved (as a staging post) than not see that article at all. And to couple the two too closely risks the first goal by too strong an insistence on the second. Over time, cultures can and do change; but we ought to practise the art of the possible.

More generally, it isn’t yet clear to me what re-use of a traditional history article looks like. Quotation (with a reference) is a mode historians understand; so is citation as an authority in paraphrase. Both are possible from an article with all rights reserved. Compilation of readers and anthologies would be made easier by CC, but doesn’t require CC-BY. It also isn’t clear what ‘remixing’ of traditional historical writing looks like if it doesn’t involve quotation. Historians are also well used to acknowledging a seminal work in a footnote (or even once only in foreword or acknowledgments) without quoting it directly, but is this all that giving ‘credit’ for ‘remixing’ an idea really means? If so, there is little to fear; but I’m not sure we know, yet.

Over time, there will be possibilities for data-mining in corpora of scholarly articles, but we ought to think further about whether this can be accommodated without full CC-BY. Much turns on the question of what counts as a derivative work in the context of an aggregated database, and what the output to the user is; and on whether an insistence on non-commercial re-use shuts down important future possibilities that we can’t yet foresee.

It may be that CC-BY is the right default option; my feeling is that it probably will be. But I think we should probably take more time to document some of these use cases, in order to plan a movement towards licensing for historical writing that is neither more restrictive nor more liberal than it need be, and allows scholars to dip in their toes without plunging in up to the neck. For now, there are horses we should avoid scaring, lest they bolt.

Implementing Google Authorship

Some time ago I read this useful post from IonLeap about an impending shift in the method Google uses to rank pages. Put briefly, it involves a move from ranking content by the number of links that point to it, to a system based on the author.

How do Google propose to do that? Well, it’s based around forming connections between an author’s content and their Google+ profile, if they have one. As well as having potentially very significant impacts on where authors choose to maintain their ‘hub’ – the profile around which everything else revolves – it promises to help authors tie together their work, and to give it greater exposure in Google search.

How is it done? Very simply – and the post above gives step-by-step instructions, also available here. Once you have a G+ profile, a single line of code within your blog template does the job; in my case, in the link to the G+ profile over on the right. Leave it for a couple of weeks, and this begins to happen.

Not only does this post score highly on a very general search, it shows the photo from my G+ profile (if the user is logged into Google), and links to it. [See note 1 below]

I’ve yet to see whether this will lead to increased traffic to the blog; it’s early days. But it strikes me as a quick and easy thing to implement for bloggers, and I don’t see an obvious cost or much risk.
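For bloggers who want to check that their template actually carries the markup, here is a small sketch in Python using only the standard library’s HTML parser. It assumes the authorship link is an `<a>` or `<link>` element with a `rel="author"` attribute pointing at the profile; the profile address in the example is a placeholder, and the check says nothing about how Google itself treats the link.

```python
from html.parser import HTMLParser


class AuthorLinkFinder(HTMLParser):
    """Collect the href of every <a> or <link> element whose rel includes 'author'."""

    def __init__(self):
        super().__init__()
        self.author_links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be valueless (None).
        rel_values = (attributes.get("rel") or "").lower().split()
        if "author" in rel_values and attributes.get("href"):
            self.author_links.append(attributes["href"])


def find_author_links(html):
    finder = AuthorLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.author_links


# Placeholder profile address; substitute your own.
template = (
    '<div class="sidebar">'
    '<a href="https://plus.google.com/PROFILE_ID" rel="author">Google+</a> '
    '<a href="/about">About</a>'
    '</div>'
)
print(find_author_links(template))
```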

[Note 1. As @j_w_baker rightly points out, how the ranking works may be influenced by the 'filter bubble', and I didn't test these results before implementing the change. But they seem roughly comparable between machines so far. I'd be very interested to hear any before-and-after findings from others.]

Religion, politics and law in contemporary Britain: a web archive

[This is an expanded version of a post first published in the UK Web Archive blog.]

It has been over two years in the making, but I am delighted to be able to say that my own special collection in the UK Web Archive is now online.

UKWA (for which I am engagement and liaison lead, based at the British Library) collects and preserves websites of scholarly and cultural importance for the UK web domain. UKWA already collects some 11,000 sites, and holds more than 50,000 instances in total, with series of snapshots of some sites going back the best part of a decade. That’s a lot of data, and so one of the ways into the archive is by means of special collections of sites on a particular theme.

A couple of years ago, long before coming to the BL, I joined a project at the Library which brought together a group of scholars to guest-curate special collections on our research interests. I had become interested in the sharpening of the terms of debate about the place of religion in British public life, particularly since 9/11 and the London bombings in 2005. I’ve long been interested in public debate about church and state; but until relatively recently this happened by means of the print press, public oratory, ephemeral publication and the broadcast media. It struck me that a good deal of this debate had already moved online, and so new ways of capturing and preserving it were going to be needed. And so, the ‘politics of religion collection’ (as it was then known) was born. (See these posts on my progress.)

I fairly soon realised why I’m not an archivist, since all sorts of unfamiliar questions hove into view. When archiving the web, what is the base unit? A whole domain, such as www.bbc.co.uk? Or a single URL? Several sites, like those of the National Secular Society or the Christian Institute, were central to my concerns, and so could be included whole. But what does one do with a single post on a PR blog about the handling of the sharia law row by Rowan Williams and his staff? In fact, the collection is a mixture of whole domains and individual directories or pages from larger sites; an uneasy compromise, but a necessary one.
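That mixture of whole domains, directories and single pages amounts to a set of scoping rules, which might be sketched in Python as below. This is an illustration only, not how UKWA’s crawler is configured: the three kinds of rule, the function name and the example hosts are hypothetical, and the prefix test is a plain string comparison that does not normalise scheme or letter case.

```python
from urllib.parse import urlsplit


def in_scope(url, scope):
    """Is the URL covered by any rule in the (hypothetical) scope list?

    Each rule is (kind, value), where kind is:
      "domain": the host itself or any of its subdomains
      "prefix": any URL beginning with value (e.g. one directory)
      "page":   a single URL, ignoring a trailing slash
    """
    host = (urlsplit(url).hostname or "").lower()
    for kind, value in scope:
        if kind == "domain":
            domain = value.lower()
            if host == domain or host.endswith("." + domain):
                return True
        elif kind == "prefix":
            if url.startswith(value):
                return True
        elif kind == "page":
            if url.rstrip("/") == value.rstrip("/"):
                return True
        else:
            raise ValueError(f"unknown rule kind: {kind}")
    return False


# Hypothetical collection scope mixing the three kinds of unit.
scope = [
    ("domain", "campaign.example.org"),
    ("prefix", "http://news.example.net/religion/"),
    ("page", "http://pr-blog.example.com/2008/02/handling-the-row.html"),
]

print(in_scope("http://news.example.net/religion/2010/story.html", scope))
```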

Also (and I may as well come straight out with it), the collection is selective, and thus in a real sense subjective. As a watcher of contemporary religious politics, against the backdrop of recent history, my impression is that the place of religious ideas, symbols and organisations in public life is more contested than it has been for decades. Historians are traditionally wary of assessing the significance of present trends, since it leaves hostages to fortune and later events. Yet all archival choices from a pool of material not defined in advance by provenance involve some judgements as to significance; and historians are as well suited as any to make those judgements. And so I have put the collection together now to enable future historians to begin to answer the questions which I anticipate will be significant. (See an older post on why I think historians should engage with this way of working.)

There were other issues. Were I the archivist for a particular organisation, I’d have no problem with getting permission to add material to my archive: everything produced in-house would be in view. The problem for web archiving is that we’re dealing with other people’s copyright work, and so an individual permission is needed for each site. I have a long list of sites which I would dearly love to add to the collection, but for which (for various reasons) we’ve had no response. So, if you are the owner of Protest the Pope, or Holy Redundant, or Christians in Politics, please get in touch. For now, even if the collection cannot be anything like comprehensive, I do hope that it is at least coherent.

There are particular strengths, and some gaps. It includes many campaigning organisations, both secularist and religious, and is heavy on the conservative Christian groups about which I myself know most. It is very light on non-Christian faiths, since I know the field much less well. It is still very much open, however, and so suggestions of sites that ought to be included are very welcome, via this blog or at the UKWA Nominate a Site page.

What can you do with it? For now, there is a simple browse function, and the collection can be searched on its own. And over time, all sorts of uses will present themselves, which we can’t currently imagine. But the data is there: a growing longitudinal series of timed instances of websites, identified as thematically related; that is to say, an archive.
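To make ‘a longitudinal series of timed instances’ concrete: each capture is a URL plus the moment it was taken, and grouping captures by URL gives one timeline per site. The sketch below, in Python, assumes 14-digit `YYYYMMDDhhmmss` timestamps of the kind used by Wayback-style archives; the capture list and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_timelines(captures):
    """Group (url, timestamp) captures into a sorted timeline per URL.

    Assumes 14-digit YYYYMMDDhhmmss timestamps, which sort correctly as strings.
    """
    timelines = defaultdict(list)
    for url, stamp in captures:
        timelines[url].append(stamp)
    return {url: sorted(stamps) for url, stamps in timelines.items()}


def captures_per_year(timelines):
    """Count captures per year for each URL."""
    return {url: Counter(stamp[:4] for stamp in stamps)
            for url, stamps in timelines.items()}


# Hypothetical captures, for illustration only.
captures = [
    ("http://campaign.example.org/", "20100315120000"),
    ("http://campaign.example.org/", "20080101093000"),
    ("http://campaign.example.org/", "20100920080000"),
    ("http://news.example.net/religion/", "20110101000000"),
]

timelines = build_timelines(captures)
print(captures_per_year(timelines))
```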

Reflections on Academic Writing Month 2012

As AcWriMo draws to a close, I thought it worth reflecting both on my own participation and on what it might tell us about the enterprise of academic writing more generally.

As it happened, on November 1st I was already in something of a purple patch with regard to my own book. I had tried a new approach (which I blogged about here) which was working very well indeed. It still is, and I don’t think I have written many more words this month than I would have otherwise. But I do think AcWriMo has helped, in that there has been a surprising amount of mutual support via Twitter, as I and others have checked in to report progress day by day.

More broadly, AcWriMo has prompted a good deal of interesting reflection on good practice for writing. Valuable posts for me included these from ThesisWhisperer and US Intellectual History, and several others that stressed the formation of a writing habit, by small daily steps. If AcWriMo becomes an annual fixture (which I hope it does), then it could hold open a space each year not only to make a determined effort at actual writing, but also to step back and think about what we do as academic authors, and how.

Two broader thoughts also present themselves. Firstly, as @jfwinters observed, AcWriMo has shown up a gap in general training provision for new graduate students. I remember a rather perfunctory graduate training course on how to structure a piece of work, but little on the day-to-day discipline of getting words on paper. My strong impression is that if graduate students get any guidance at all, it is by the happy accident of having a supervisor who thinks it a priority, rather than because it is an integral part of learning the academic life.

Also, if we have AcWriMo, how about Friendly Peer Review Month (FrPeReMo)? There have been a number of interesting ventures recently in Open Peer Review, in which peer review becomes an iterative process conducted in the open, as prelude (or even substitute) for formalised and anonymous peer review as managed by publishers. Part of the success of AcWriMo is that it makes one accountable to others. Why not extend the principle to some kind of mutual critique of written work (as writing) – the deal being “I’ll comment constructively on your writing if you will on mine”? Thinking back, I don’t think anyone at all (apart from my supervisor) read my thesis before it reached proof-reading stage, and I’m sure it would have been better if they had. I need not be able to comment on the content of your writing, but I can surely come to it purely as a reader, and a fellow writer.

Early thoughts on ORCID

It was very encouraging last week to see the launch of ORCID, a service which (on the face of it) would seem to offer a solution to one of the key problems of decentralised scholarship and publication: how to connect, in a machine-readable way, all of your published output.

The problem is obvious, really. I have published nine articles over a period of six years: some in journals, some in annual volumes that look a bit like journals, others in collections of essays. At the same time, I’ve published a host of book reviews, some in print journals which exist online, others in online-only publications like Reviews in History. And, I’ve also had a hand in a number of funded research projects which issued in various semi-published reports. And the only means of connecting them together (and distinguishing them from the doubtless admirable work of Peter Webster the oceanographer, and Peter Webster the archaeologist) is this blog, which exists only in unstructured HTML, and so isn’t easily picked up by automated services.

So: a unique identifier for each researcher, managed centrally somehow, seems to make a lot of sense. And so here it is: my shiny new ORCID ID. ORCID will allow you to quickly associate articles that have Digital Object Identifiers with your ID, along with patents. And quick off the mark is ImpactStory, which can then digest your ORCID ID and pull together various impact metrics. (Here’s a first attempt for my own.) It is early days for the service, since it only launched last week; but there is already an issue, even now.

A while ago, I argued in Research Fortnight that the arts and humanities were in danger of being left behind as the pace towards gold open access picks up. And I think there is a risk of the arts and humanities getting left behind here as well. Allow me to demonstrate why.

Of my nine published articles, only one of them appeared in a journal which routinely assigns DOIs. The rest appeared only in print. OK (you might think): that’s because some of them are on the old side, and so the picture ought to improve over time. Answer: yes, up to a point.

Next year this forthcoming article in Parliamentary History should come with a DOI, certainly. However, another paper on Michael Ramsey will appear in a printed volume some time in 2014, which is unlikely to appear as an e-book. Similarly, my book on Michael Ramsey, slated for 2013/14, will appear only in print. And the book I have just proposed (due for completion in 2014 and thus for publication in 2015) won’t either, unless things change very quickly.

Some of this could be solved by allowing the association of ISBNs with researcher IDs in ORCID; but the first release doesn’t support it, which I confess was a source of amazement to me. I also have no fewer than 20 items in an institutional repository, all of them with structured metadata which could usefully be integrated in some way; but perhaps that is in the pipeline.

But more generally, ORCID seems to me to be a system that suits the natural sciences, where most if not all publication happens in journals that are available electronically, and from publishers of the size to be able to afford to implement DOIs. This simply isn’t how humanities publication works; and I don’t see a clear way in which that will change any time soon.
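Part of what makes an ORCID iD machine-readable is that it is checkable: the sixteenth character is a check character, computed with the ISO 7064 MOD 11-2 algorithm, so that a mistyped identifier can usually be caught before it is matched against anything. A minimal sketch in Python follows; the function names are my own, and 0000-0002-1825-0097 is an identifier commonly used as an example in ORCID’s documentation.

```python
def orcid_check_character(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length, the digits and the final check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not all(c in "0123456789" for c in compact[:15]):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
```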

Why historians should care about web archiving

Someone said to me at a conference recently (not his exact words), “if we can’t get historians interested in web archives, then who can we reach?” But so far, there hasn’t been much visible engagement between contemporary historians and web archives, even though those archives are now well established at national memory institutions such as the Library of Congress or the British Library. [Full disclosure: the latter employs me, but this post represents a personal view, not the Library’s.] And as an historian who has been involved with web archives since before coming to the BL, I think this needs to change.

The evidence is mounting of how vulnerable the web actually is. One study found that 11% of content shared via social media had disappeared within a year, with a further 7% lost each year after that: a startling rate. And since there was a time lag between the migration of the archival record into a digital-only mode and the establishment of web archives, there is already a large hole in the record from perhaps the mid-nineties to the mid-noughties. A recent post of mine over at the UK Web Archive blog showed just how significant are some of the sites that now exist only in web archives; and that’s only the ones the UKWA managed to capture in time. We can only guess at what is now lost forever.
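Taken at face value, and assuming that both figures are percentages of the content originally shared (so that losses accumulate linearly), the cumulative loss after n years works out as follows:

```latex
\begin{aligned}
L(n) &= 11\% + 7\% \times (n - 1), \qquad n \ge 1 \\
L(2) &= 18\%, \quad L(3) = 25\%, \quad L(5) = 39\%
\end{aligned}
```

On that reading, more than half of what was shared would be gone within seven years, since L(7) = 53%.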

So, in twenty or thirty years’ time, historians of the very late twentieth century will have reason to regret that no-one thought to keep their primary sources safe for them. But there is another problem. It is a brave historian who writes on the very recent past, a remote subject indeed; I myself wrote an article in 2004 that extended up to 1990, and not without some unease about the hostages to scholarly fortune it gave. And so most of the historians who have the greatest personal stake in archiving the web right now haven’t yet entered the profession. I would argue that historians are uniquely well-placed to view the present in relation to the past, and thus to anticipate those aspects of the present for which there is most need for a record. But it would take a significant change in culture for historians working now to start taking a hand in preserving sources for our successors.

“But this isn’t my job”, the response might be. “Surely this is what archivists are for? (It always used to be.)” Granted, in a pre-digital world, institutional archivists in government, civil society and the churches concentrated on capturing unpublished materials produced in-house, took in those personal archives that were offered to them, and left the copyright libraries to pick up books and journals. If the ephemeral stuff in the cracks didn’t survive, then such was life. Now, the volume of words is so much greater, and the means of disseminating them so dispersed, that archivists as a profession (already an undervalued and underpaid one, I might add) can’t hope even to see everything of note, let alone arrange to capture it.

So: we need a new model of archival curation, based on a partnership between archivists, scholars and the public. The technical means are there; it simply needs a new form of engagement, and we historians can help make it happen.

A Heisenberg Principle of web archiving?

Whatever it means to real scientists, the famous ‘uncertainty principle’ of Werner Heisenberg is sometimes popularly taken to mean that it is impossible closely to observe something without in some way altering it. It’s also a conundrum that has faced anthropologists when observing cultures far removed from their own: how far does the consciousness of being observed alter the behaviour of the subject?

I’ve been publishing in print in the traditional way for some years now, and everyone knows that books are (in theory) permanent, that they find their way into libraries; and so one writes conscious that the words cannot be unwritten. Writing for the web, however, has had a more transient aesthetic: I can write with the freedom that comes from knowing that (in a site I control) I can retrospectively edit at will, should I choose to. There are good scholarly reasons not to, to do with making my work reliably citable; but in the final analysis I am not bound by them.

The visibility of web archiving by national memory institutions is not yet high. In addition, if the UK Web Archive considers a site important enough to archive, then it must gain explicit permission; and by no means all website owners give that consent. This blog is already being archived by the UK Web Archive (last crawl in April 2012); but had I been at all concerned about the things I write having a permanent existence, then I could have withheld permission.

On the horizon is a major piece of legislation that could subtly but importantly change things: the Legal Deposit Libraries (Non-print Works) Regulations 2013 (see the most recent public consultations here). As and when these successfully negotiate the passage through Parliament, any website in the .uk domain could be archived for posterity without the explicit consent of the owner.

The change in the law in itself isn’t my main point, however: the effect of increasing consciousness of it is. Put simply: will some words that might have been written in 2012 not be written in 2014 because the author was conscious that they could not later be retracted? I think it likely. Would it be a ‘bad thing’? I don’t suppose we know yet; but we ought to be thinking about it.