Archive for January, 2013

Example page from the Text Mining course

The Institute of Historical Research now offer a wide selection of digital research training packages designed for historians and made available online on History SPOT. Most of these have been mentioned on this blog from time to time, and hopefully some of you will have had a good look at them. These courses are freely available; we only ask that you register for History SPOT to access them (which is a free and easy process). Full details of our online and face-to-face courses can also be found on the IHR website.

I thought that it might be useful to talk a little more about these courses on the blog and provide a brief sample. Over the coming months I will publish a series of posts, one for each of our training courses, and give you a little sneak peek so that you have a better idea of what to expect.

I have chosen the Text Mining module as the first, chiefly because it is probably the one that best exemplifies what we are trying to do: to make digital tools accessible to historians through a series of introductory training courses. The Text Mining for Historians module does just this, beginning with the very simple and moving gradually towards the more complex.

Text mining is not a tool in itself, but a series of tools that enable us to explore, interrogate, and analyse large bodies of text. Imagine, if you will, that you have gathered together a corpus of text – perhaps it’s a diary or series of diaries from a particular period, perhaps it’s a series of publications on a particular subject, or maybe it’s a set of official records spanning many decades or even centuries. Normally you would wade through these documents one at a time and take notes. Text mining allows you to automate certain elements of this task and helps you to discover trends and connections that you might never find by reading the texts using traditional methods.

This training module takes you from the theory (i.e. what is text mining all about) through to its application for historical texts, and eventually on to the more complex areas of what is called topic modelling, natural language processing, and named entity recognition.  In this post I’m going to quote from the opening section of this course as it gives a description of what historians might consider a good use for text mining.  In this example we are looking at the Old Bailey Trial accounts used on the popular Old Bailey Proceedings Online website:

 ****

Would you like to know how often the word ‘guilty’ appears in the Old Bailey trial accounts? The answer is findable using a standard search engine on the Old Bailey Online website (it’s 182612). How about how many people were found guilty? The answer is 163261. What about the number of defendants found guilty of murder? The answer is 1518. These last two figures are not possible to find through the standard search engine as they are an entirely different type of question; we are not looking for how many times the word ‘guilty’ appears in the proceedings but how many trials resulted in a guilty verdict. We want to discover something meaningful within the body of texts, automatically rather than manually checking each and every trial account.

This is a relatively simple example of text mining where the original documents have been marked up and tagged by surname, given name, alias, offence, verdict, and punishment. To calculate those results manually you would have to work your way through 197,745 criminal trial accounts (some 127 million words in total).
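The difference between the two kinds of question can be sketched in a few lines of Python. This is only an illustration: the three trial records are invented, and the element and attribute names (`trial`, `text`, `offence`, `verdict`, `category`) are hypothetical stand-ins, not the actual markup scheme used by the Old Bailey Online.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample records. The tag and attribute names are assumptions
# made for this sketch, not the real Old Bailey Online schema.
SAMPLE = """
<trials>
  <trial id="t1">
    <text>The prisoner pleaded not guilty. The jury found him guilty.</text>
    <offence category="theft"/>
    <verdict category="guilty"/>
  </trial>
  <trial id="t2">
    <text>She said she was not guilty, and the jury agreed.</text>
    <offence category="murder"/>
    <verdict category="notGuilty"/>
  </trial>
  <trial id="t3">
    <text>Found guilty of the wilful murder of his neighbour.</text>
    <offence category="murder"/>
    <verdict category="guilty"/>
  </trial>
</trials>
"""


def count_word(root, word):
    """Count how often a word appears in the trial texts (a plain search)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(
        len(pattern.findall(trial.findtext("text", "")))
        for trial in root.iter("trial")
    )


def count_verdicts(root, verdict, offence=None):
    """Count trials whose tagged verdict (and optionally offence) matches."""
    total = 0
    for trial in root.iter("trial"):
        v = trial.find("verdict")
        o = trial.find("offence")
        if v is None or v.get("category") != verdict:
            continue
        if offence is not None and (o is None or o.get("category") != offence):
            continue
        total += 1
    return total


root = ET.fromstring(SAMPLE)
word_hits = count_word(root, "guilty")                       # word occurrences
guilty_trials = count_verdicts(root, "guilty")               # tagged verdicts
guilty_murders = count_verdicts(root, "guilty", offence="murder")
print(word_hits, guilty_trials, guilty_murders)
```

In this toy sample the word ‘guilty’ appears more often than there are guilty verdicts, because ‘not guilty’ also contains the word. Only the tagged verdicts answer the second kind of question.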

This form of text mining, however, is little more than an advanced search engine – useful but limited. As the creators of the Old Bailey Online themselves admit (and have attempted to redress in a subsequent project):

‘Analyzing this kind of data by decade, or trial type, or defendant gender etc., can re-enforce the categories, the assumptions, and the prejudices the user brings to each search and those applied by the team that provided the XML markup when the digital archive was first created’.

– Dan Cohen et al, ‘Data Mining with Criminal Intent’, Final White Paper (31 August 2011), p. 12.

In other words the search options and text tagging were emphasising and reinforcing a pre-determined expectation of what the resource creators believed was the important data. Text mining tools can help to explore alternative questions more openly.

The Data Mining with Criminal Intent (DMCI) project has done just this by enabling researchers not only to query the Old Bailey site but also to export the results to a Zotero library, where they can be managed, and from there to Voyeur and other text mining tools for analysis and visualisation.

The team behind the project uses the example of an investigator trying to understand the role poison might have had in murder cases. Using the search engine brings up 448 entries for ‘poison’ but doesn’t tell us much about what this means. Using Zotero and Voyeur it is possible to filter out the stop words and legal terminology common in all entries to find out what other words commonly appear near to the word ‘poison’. Through this method of text mining it was possible to conclude that poison was probably more commonly administered through drinks such as coffee than through food (see pp. 6-7 of the white paper report ‘Data Mining with Criminal Intent’).
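The idea of looking for words that appear near ‘poison’ can be sketched as a simple collocation count. The following is a minimal Python illustration, not a description of how Voyeur works internally: the three sentences are invented, and the stop-word list, the list of legal terms, and the window of five words either side are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Tiny illustrative lists; a real analysis would use much fuller ones.
STOP_WORDS = {
    "the", "a", "of", "and", "in", "to", "was", "he", "she", "it",
    "that", "her", "his", "had", "with", "some", "into", "for", "i",
}
LEGAL_TERMS = {"prisoner", "court", "jury", "witness", "indicted"}


def collocates(texts, keyword, window=5):
    """Count words found within `window` tokens either side of `keyword`,
    ignoring stop words, legal terms and the keyword itself."""
    ignore = STOP_WORDS | LEGAL_TERMS | {keyword}
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token != keyword:
                continue
            nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(t for t in nearby if t not in ignore)
    return counts


# Invented sentences, written only to exercise the function.
SAMPLE = [
    "The prisoner put poison into the coffee that she drank.",
    "The witness said the poison was mixed in her coffee.",
    "He was indicted for putting poison in the bread.",
]

counts = collocates(SAMPLE, "poison")
print(counts.most_common(3))
```

On these made-up sentences ‘coffee’ comes out as the most frequent neighbour of ‘poison’. On a real corpus the choice of stop words and window size would change the results considerably, so both need to be documented.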

****

If you would like to have a look at this module, please register for History SPOT for free and follow the instructions (http://historyspot.org.uk). If you would like further information about this course and the others that the IHR offer, please have a look at our Research Training pages on the IHR website.

Read Full Post »

When you read a blog post about History, what are you looking for? If you own a blog, do you write posts about historical topics? Why do you do this? What do you get out of it? These are all questions of interest to the Blogging for Historians project.

The project examines the purpose behind blogging for academic purposes, either as an individual or as an institution. It looks at ideas about best practice as well as the hopes and desires of those writing or reading the posts. The idea is to gather a wider body of evidence regarding what people involved in History-related disciplines think of blogging and why they may give it a go. The project comprises the following:

  • A series of podcasted interviews with practitioners in archives, libraries and history departments who blog about History in one form or another.
  • A workshop (details to follow) about History blogging to be held in the Institute of Historical Research
  • An online survey asking for thoughts and ideas about blogging

A crucial part of the research for the Blogging for Historians project will derive from the survey.  This is live now and it would be brilliant if you could take a moment of your time to fill it in.  The survey is very short and should take less than five minutes to complete.  It is broken down into three sections:

  1. Using blogs
  2. Creating and managing blogs
  3. Personal details

It is the first two sections that will be of most interest and will hopefully raise some interesting thoughts, ideas and questions. Essentially the survey asks why we create blogs, what we hope to gain from them, and how we access blog posts as readers. It also asks what we gain by reading blogs. From this survey it is hoped that we can better understand the processes and the many reasons why blogs have become such a successful forum for writing, reading, and discussion over the last few years, and what impact or importance this has now, and might have in the future, for the History discipline.

I would be very grateful if you could fill in this survey.  It doesn’t matter if you own a blog or just visit them (or even if you don’t visit them – I would be interested in that too).  The survey is interested principally in History-related blogs, but this does not necessarily mean academic or professional.  There are a variety of History-related blogs out there, all of which have something useful and interesting to offer. 

Access to the survey can be found from this link:

Blogging for Historians Online Survey

It should take no longer than five minutes to complete and personal details will be kept confidential.  Statistics from the results of the survey alongside my thoughts and analysis will appear on this blog early in 2013. 

For more details about the Blogging for Historians project see its own blog here: Blogging for Historians Blog

The project is funded through the SMKE scheme.  For further details about this project see here: SMKE website

Read Full Post »

Sport and Leisure History seminar
5 November 2012
Dion Georgiou (Queen Mary, University of London)
“The Drab Suburban Streets Were Metamorphosed into a Veritable Fairyland”: Spectacle, Ritual and Festivity in the Ilford Hospital Carnival, 1905-1914

Dion Georgiou, one of the conveners for the Sport and Leisure History seminar, discusses the relationship between leisure and the suburban. The focus of this podcast is on a series of carnivals held between 1905 and 1914 in the London suburb of Ilford. The carnivals were held to raise the money needed to establish a new hospital for the community in Ilford. Dion analyses the carnival through local newspapers – in this instance the Ilford Reporter and Ilford Guardian – to gain an insight into concepts of spectacle, ritual, and festival. The carnival reflects both the old and the new, and Dion seems particularly interested in the idea of the carnival as a tradition invented through repetition.

To listen to this podcast click here.

Read Full Post »

Today we are presenting the second of our recent additions to online training.

The Institute of Historical Research are very pleased to announce the launch of our first extensive and comprehensive online training course: Building and Using Databases for Historical Research.  The online course covers the entire life-cycle of creating and using a relational database and can be undertaken at any time and completed at your own pace.

Depending on the type of data that you are using to carry out historical research, databases – such as Microsoft Access – can be an essential tool for the historian.  However, few courses teach databases with historical data in mind or the needs of the historian.  We believe that this course can fill that gap.  The IHR have been running face-to-face training in Databases for a very long time, so the expansion to also provide the course online was an obvious choice for us.

Module 1 from the Databases course

 

Here is the information that we have on our website:

The aim of this training course is to equip you with the skills required to build and utilise a relational database suited to historical research. It is a non-tutor led course that can be completed at your own pace and at a time of your own choosing.

This course is a continuation of the free online course Designing databases for historical research handbook, which provides a free introduction for historians who wish to create databases. Building and Using Databases for Historical Research takes you through the entire process of creating and using databases and is, therefore, a much larger and more comprehensive course. As such, we recommend that you work your way through the Designing databases for historical research handbook before embarking on this course.

When you register for this course you will work through three modules that look at the following aspects of building and using databases:

Module 1 introduces the tools and techniques used in building a database for historical research. It covers the process of constructing related tables to accommodate your data, as well as introducing a number of practical measures that you can employ to control the quality of the data that you create. The Module also addresses what you need to do to incorporate existing data into a newly-constructed database.

 

Sample page from the Databases course

Module 2 introduces the numerous ways that database tools can help you ask research questions of your data, ranging from simply finding individual instances of information at the micro level, through to providing complex networking and record linkage overviews. This Module also provides a basic introduction to employing queries to highlight statistical patterns in large bodies of data through aggregation tools.
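To give a flavour of what related tables, data-quality controls and aggregation queries look like in practice, here is a minimal sketch. It uses Python's built-in `sqlite3` module rather than Microsoft Access, purely so that the example is self-contained; the table names, columns, permitted values and records are all invented for illustration and are not taken from the course materials.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce links between tables

# Two related tables. The CHECK constraints are one simple way of
# controlling the quality of the data as it is entered.
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    surname   TEXT NOT NULL,
    parish    TEXT NOT NULL
);
CREATE TABLE event (
    event_id  INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    kind      TEXT NOT NULL CHECK (kind IN ('baptism', 'marriage', 'burial')),
    year      INTEGER NOT NULL CHECK (year BETWEEN 1500 AND 1900)
);
""")

# Invented records, for illustration only.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Smith", "Ilford"), (2, "Jones", "Ilford"), (3, "Brown", "Barking")],
)
conn.executemany(
    "INSERT INTO event (person_id, kind, year) VALUES (?, ?, ?)",
    [(1, "baptism", 1701), (1, "marriage", 1725),
     (2, "baptism", 1703), (3, "burial", 1760)],
)

# A value outside the permitted list is rejected by the database.
try:
    conn.execute(
        "INSERT INTO event (person_id, kind, year) VALUES (1, 'christening', 1701)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# An aggregation query: how many events of each kind per parish?
rows = conn.execute("""
    SELECT p.parish, e.kind, COUNT(*) AS n
    FROM event AS e
    JOIN person AS p ON p.person_id = e.person_id
    GROUP BY p.parish, e.kind
    ORDER BY p.parish, e.kind
""").fetchall()

for parish, kind, n in rows:
    print(parish, kind, n)
```

The same design would carry over to Access or any other relational database: people are stored once, events point back to them, and counts are derived by query rather than typed in by hand.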

 

Module 3 addresses two main aspects of using a database in a historical research project: ‘managing’ the database and generating research output. The former element introduces various methods for ensuring good practice in terms of file and version control, back up and documentation – all important aspects of making sure the database is useful to your research; whilst the latter looks at ways of extracting data in various formats (including visual) to share with other historians.

 

The course costs £99, which includes access to the online materials, discussion forums and example data for a four-month period. The course ends with a final exercise where you can test the knowledge that you have gained and receive some feedback.

For further information check out the IHR research training pages or have a look at the Designing Databases for Historical Research Handbook which contains more information on the course as well.

 

Read Full Post »


After a period of tests, the introductory module of the new online course on Palaeography and Manuscript Studies is now available. InScribe provides a set of materials suitable both for someone interested in exploring Palaeography for the first time and for those in need of a refresher. Graduate students, academics and members of the general public undertaking this introductory module will become familiar with the most important writing styles (scripts) of the medieval period with particular reference to the English context; they will be able to explore a number of newly digitised manuscripts; and they will acquire some transcription practice.

Screenshot from the InScribe course

The module includes short videos in which experts in the field discuss relevant topics. Moreover, transcription can be practised in the new Transcription Tool developed in collaboration with KCL.

 

Screenshot of the Transcription Tool

Later in the year, we will release new modules that will provide advanced online training on Diplomatic, Script and Translation, Codicology and Illumination. The introductory module is free of charge.

To try InScribe click here. Note that you will need to register (for free) to gain access to the module.

 

Read Full Post »

British History in the Long Eighteenth Century seminar
24 October 2012
Adam Crymble (King’s College, London University)
Profiling Irish Crime in London, 1801-1820

Through a combination of close and distant reading of the online Old Bailey Proceedings, in conjunction with the Middlesex criminal registers and the 1841 Census of England and Wales, Adam Crymble has been able to discover a seasonal pattern to crime committed by Irish immigrants in the London area between 1801 and 1820. The conclusion that Irish immigrants tended to commit crimes more often in the autumn contrasts with what Adam has found for most other criminals in London. Crime is generally less prominent in the summer, we are told, when there are more resources available, but more prevalent in the deep winter when resources become scarce. The Irish, however, appear different because many of them returned home for the winter months (amongst other reasons).

Image of the Old Bailey (illustration from a book about the trial of Helen Duncan)

The primary aim of Adam Crymble’s talk to the British History in the Long Eighteenth Century seminar is to give a picture of Irish life in London in the first two decades of the nineteenth century. He is particularly interested in crime and in whether the types of crime for which Irish settlers were likely to be prosecuted differed from those of criminals of English or other nationality. The difficulty here, however, is identifying someone as Irish in the trial proceedings. Keyword searching of the records produces very poor, uneven, and inaccurate results, and nationality is rarely recorded. Therefore, a comparison with the Middlesex criminal registers, which record place of birth, helps to identify and confirm some of their identities. Bringing in the 1841 census enables further cross-linking by surname. Whilst Adam would not claim that these methods achieve complete accuracy, he believes that the results are sufficient to support sound conclusions about the nature of crime committed by Irish immigrants in this period.
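For readers curious about what this kind of cross-linking involves, the general idea can be sketched in Python. This is not Adam Crymble’s method, which the podcast describes only in outline: the records below are invented, and matching on a normalised full name is a deliberately naive assumption that would produce false matches for common names in any real dataset.

```python
from collections import Counter
from datetime import date

# Invented toy records, for illustration only.
TRIALS = [
    {"name": "Patrick  Murphy", "date": date(1805, 10, 30)},
    {"name": "MARY KELLY", "date": date(1810, 9, 12)},
    {"name": "John Smith", "date": date(1812, 1, 15)},
    {"name": "Ann Brown", "date": date(1815, 12, 6)},
]
REGISTER = [
    {"name": "patrick murphy", "birthplace": "Ireland"},
    {"name": "Mary Kelly", "birthplace": "Ireland"},
    {"name": "John Smith", "birthplace": "London"},
]


def normalise(name):
    """Lower-case a name and collapse repeated spaces."""
    return " ".join(name.lower().split())


def link_birthplaces(trials, register):
    """Attach a birthplace to each trial where the name matches a register
    entry exactly after normalisation; unmatched trials get None."""
    birthplace_by_name = {normalise(r["name"]): r["birthplace"] for r in register}
    return [
        dict(t, birthplace=birthplace_by_name.get(normalise(t["name"])))
        for t in trials
    ]


def trials_by_month(linked, birthplace):
    """Count linked trials per calendar month for one birthplace."""
    return Counter(t["date"].month for t in linked if t["birthplace"] == birthplace)


linked = link_birthplaces(TRIALS, REGISTER)
irish_by_month = trials_by_month(linked, "Ireland")
unlinked = [t["name"] for t in linked if t["birthplace"] is None]

print(dict(irish_by_month))
print(unlinked)
```

Keeping a list of the records that could not be linked matters as much as the matches themselves, since it shows how much of the source material any seasonal pattern actually rests on.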

This paper draws out some interesting new themes and patterns in criminal activity as recorded in the Old Bailey trials, further adding to our picture of life and crime in eighteenth-century London.

For more on Adam Crymble’s research see his profile at King’s College London.

 

To listen to this podcast click here.

Read Full Post »

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

4,329 films were submitted to the 2012 Cannes Film Festival. This blog had 24,000 views in 2012. If each view were a film, this blog would power 6 Film Festivals

Click here to see the complete report.

Read Full Post »
