Week 10: Data Mining and Distant Reading

This week Jeri and I are leading the discussion. She has already posted an excellent overview of the readings, so I thought I would look at the sites and tools.

With Criminal Intent was a response to the Digging into Data challenge in 2009. It combines a specialized API with a personal research environment and visualization tools. The data is all from the records of the Old Bailey.

Let me just say that Dan Cohen is right about the importance of a good API. It makes a huge difference. I mucked around with the Old Bailey website when I was working on my Master’s in Edinburgh – we talked about its utility in a class on material culture in 18th-century Britain. It was fun to poke around but hard to get anywhere. The API developed for With Criminal Intent is so much more useful, because you can drill down so quickly.

Compare the two search pages:

Old Bailey Search
Old Bailey search

The old search page (top left) was oriented more towards punishments, verdicts and specific persons. The API (bottom left), on the other hand, looks more towards general categories and helps you narrow down to subcategories of punishment or offence. Moreover, once you’ve started the search you can further narrow by the existing categories, based on what the results are.

Old Bailey API

To explain: I ran a search for offence category Theft, subcategory shoplifting, where the victim is female. I was then able to see the rate of punishments for qualifying crimes – the top being transportation with 144 sentences. From here I can further narrow my search, view results, or move the data into Zotero or Voyeur.

What this API allows me to do that the old search did not is to generalize while still narrowing down. Not only did the creators of the API make gender a category for analysis, but they also defined for the users the subcategories of offences, verdicts, and punishments.
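For readers who think in code, the "generalize while still narrowing down" pattern described above can be sketched roughly as follows. This is only a minimal illustration, assuming a handful of made-up records: the field names (`offence`, `subcategory`, `victim_gender`, `punishment`), the helper functions `narrow` and `tally`, and the sample values are all hypothetical, and none of it reflects the actual Old Bailey API or real trial data.

```python
from collections import Counter

# Hypothetical toy records. Field names and values are illustrative only;
# they are NOT the real Old Bailey API schema or real trial data.
RECORDS = [
    {"offence": "theft", "subcategory": "shoplifting",
     "victim_gender": "female", "punishment": "transportation"},
    {"offence": "theft", "subcategory": "shoplifting",
     "victim_gender": "female", "punishment": "transportation"},
    {"offence": "theft", "subcategory": "shoplifting",
     "victim_gender": "female", "punishment": "imprisonment"},
    {"offence": "theft", "subcategory": "shoplifting",
     "victim_gender": "male", "punishment": "transportation"},
    {"offence": "theft", "subcategory": "pickpocketing",
     "victim_gender": "female", "punishment": "death"},
    {"offence": "deception", "subcategory": "fraud",
     "victim_gender": "female", "punishment": "imprisonment"},
]


def narrow(records, **facets):
    """Keep only the records that match every given facet value."""
    return [r for r in records
            if all(r.get(key) == value for key, value in facets.items())]


def tally(records, field):
    """Count how often each value of `field` occurs in the records."""
    return Counter(r[field] for r in records)


# Step 1: narrow by pre-defined categories.
hits = narrow(RECORDS, offence="theft", subcategory="shoplifting",
              victim_gender="female")

# Step 2: generalize over what is left by tallying punishments.
breakdown = tally(hits, "punishment")
print(breakdown.most_common())

# Step 3: narrow again using a category that surfaced in the results.
further = narrow(hits, punishment="transportation")
print(len(further))
```

The point of the sketch is that every step returns individual records, so the aggregate view (the tally) and the individual entries remain available at the same time.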

With Criminal Intent is, in my opinion, a good model for data mining in history. Note that from the API you can directly access the raw source, the actual entries. While a historian using the site can look at larger trends, they can also zoom in on each and every instance if they want.

Compare that functionality with Google’s Ngram or Moretti’s graphs of novels. As Moretti points out, on the graphs individual works are only “tiny dots in the graph of figure 2, indistinguishable from all the others.” ((Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (London, New York: Verso, 2005), 8.)) From Google Ngrams you can move to book search for a year or set of years, probably best done by opening a window. You cannot, however, narrow the search beyond the date and the general language corpus.

What do we make of these sites? What do they make of history? Which are tools and which are methodologies? Any advanced search option gives you choices of which parameters to narrow, but those parameters are pre-defined.

Do these tools, or methodologies, change the way we formulate and ask questions of our historical data? If nothing else, they certainly alter what we can discover, and in very little time.

Week 5: Public History (on the web)

The title of this week in Clio Wired is Public History and the bulk of the reading list is web sites where the general public can engage with history. The sites are:

Two of the sites – Deerfield and Freedom – use a pre-content site, a sort of splash page. The image and text presentation is decidedly reminiscent of the initial physical encounter with a museum exhibit: a wall with text summarizing the exhibit and a representative image or two. Both of the sites were developed by museum groups, so the transfer of layout theory makes sense. As a web user, I don’t know that I like having to take an additional click to start exploring content, although to be fair you can go from the first page of Freedom to the collections. However, I can see how the layout might create a mental note for the user, “This content comes from a museum,” which could increase users’ trust in the site content, given Rosenzweig and Thelen’s 1998 finding that Americans trust museums for “truth” in history.1

Three of the sites are very clearly about mapping history, geolocating photographs and the like: Cleveland Historical, HistoryPin, PhillyHistory. Of these, I find myself most frustrated by HistoryPin because it has the least information about each image or object in its collection. Cleveland and Philly provide contextual information for each image, Philly going so far as to provide quotes for some of them. These all have mobile app versions of their content, and I would be curious to find out how much of the text data is displayed in the mobile versions. I can see that someone developing a purely mobile app might think photos were the best way to engage people in history, and yet I can’t help but think that the addition of context allows for a richer experience.

In fact, that’s my overall impression of these sites. The ones with more context, more ability to “dig down,” as they say, are more engaging to me. What is the object or image, where did it come from, how did it end up on the web site, what does it mean? As a historian, I’m used to asking these questions and figuring them out for myself, but as a curator one of your jobs is to provide some guidance to your visitors. For me, it’s not enough just to put things on the web. You have to give clues, at the very least, to help visitors understand why it was worth digitizing and putting online in the first place.

1 Roy Rosenzweig and David Thelen, The Presence of the Past: Popular Uses of History in American Life. New York: Columbia University Press, 1998.

Week 4: Design, Standards, and Usability

I enjoyed reading the various essays from A List Apart regarding design and usability, but the piece for the week which most engaged me was the article by Elings and Weibel regarding shared metadata standards for museums, archives, and libraries.

My job at the historic house and what I am now doing with CHNM both come down to assigning keywords, metadata, to historic documents and (at the house) objects. One of my roles at the house was to propose, evaluate, and define new keywords for our relational database. As a result, I’m aware of the benefit of a controlled vocabulary, as well as the challenges which accompany it.

As I think about it, the challenges fit well with the readings about design and architecture of websites. Both situations force the builder or implementer to look at the audience, or audiences, they plan to serve, and how the audience(s) will interact with the data they provide.

Access issue: what window?

I’m currently reading the design- and building-oriented sections of Cohen and Rosenzweig’s Digital History, and remembered something I keep forgetting to mention. One of the issues which kept coming up about websites and web-based history interactives when I was with the museum was the fact that some minority populations primarily access the web via mobile phones (and not necessarily smartphones, either). How should/will this impact the design of sites which want to do truly “public” history?

Week 3: What Is Digital History?

(For those not in the class, the syllabus is online.)

The readings for this week took me from familiar ground to unfamiliar and back again. Some of them touched on some topics I’d been wanting to blog about but hadn’t quite gotten a hold of, including the history and nature of hypertext and the illusion of permanence of digital media. I will talk about hypertext in a moment, but I want to reflect on the question of presentation of information which is raised both by Cohen and Rosenzweig in Digital History and in the JAH article “The Promise of Digital History”.

Both of these works include discussions about how the material (source, image, analysis) is presented to the user. While design and layout are elements of print publishing, they seem to me less vital in that medium than they are for digital history. In fact, the need to consider interactivity, presentation, and layout when creating digital history is part of why I feel that it has many similarities to museum work. Creating an exhibit in physical space and creating a digital history require the curator/historian/team to tackle many of the same questions.

There are design questions: font choices and color schemes, and how long captions/label text should be. There is the inevitable uncertainty of whether anyone will read your text at all, because they might simply look at your lovely objects/pictures and move on, completely missing the carefully crafted context you’ve provided. Then there are questions about layout (how do we want to organize the flow? rooms in a line or independent path creation?) and interactive/hands-on elements (do we use them? what do they do if we add them?).

Interactivity and layout for digital historians lead my thoughts to HyperCard. The first personal computer I ever interacted with was the Apple Macintosh (IIe, I think) that my Dad brought home. Most of my first experiences in digital narrative were built using HyperCard; as a result I tend to conceptualize digital text in HyperCard form, at least in the draft stages. “The Promise of Digital History” and the chapters from Digital History made me realize that I have been thinking in an essentially traditional manner and also inspired a model for alternatives.

I had been thinking in terms of a very linear narrative. If it were a museum exhibit, you would go in via a door or space marked “Entrance” and all the rooms would have only two doors, one in and one out; guided like a historic house or self-guided like many Smithsonian exhibits. There might be hyperlinks which opened small windows of commentary, but the history would otherwise progress in the traditional way. My HyperCard model is a copy of Douglas Adams’ Hitchhiker’s Guide omnibus edition that technically belonged to my older sister. There may have been some hyperlinked words with sounds or images, and I believe there was a built-in standard dictionary, but the only interesting deviation from a published version was that the file opened to a window with a large red button reading Don’t Panic, which you clicked to open the text.

The other HyperCard experience I remember is a definite contrast. Its physical layout was more a group of rooms opening onto each other in a variety of ways, some of them utterly unexpected, and the stories you uncovered ran every which way, overlapping and underlapping in truly delightful ways. It was a game-story called The Manhole.1 Although the stories were set, the fact that you could discover them in your own order at your own pace was exciting. Simply the fact that the story was discovered and not just in front of you was exciting.

Reading Adams’ work on the computer was entertaining, but playing The Manhole was interesting. Which is why I think it’s worthwhile for digital historians to consider Patrick Gallagher’s comment regarding “a pursuit of content versus a delivery of content.”2 The digital medium has such potential to engage users in a pursuit of content, and while it may not always be feasible, it must not be forgotten.

1 Some images from the game can be seen at Smackerel.net.
2 “The Promise of Digital History,” Journal of American History, September 2008. The comment is in response to a question regarding similarities between museum exhibits and digital history, located roughly halfway down the page.

Alternatives to Delicious

According to the news, Delicious (an online bookmark manager) is being sent the way of the Dodo by Yahoo. I hadn’t even realized that Yahoo had acquired Delicious, which I’ve been using off and on for years now.

So, the question becomes: if Delicious is dead, what do we use instead? Suggestions thus far include:

  • Licorize – looks more focused on group work (tagline: “for the world wide tribe”)
  • Diigo
  • Xmarks
  • Evernote – I already use this, but with more emphasis on the “note” aspect. Still, it could work.

Comments or additions from the crowd?