Seeking a system

I’ve been trying to find a way of organizing the manuscript sources for my dissertation that is readable, makes sense, and can be exported to do interesting things if the need arises.

At present, most of my sources are in a spreadsheet that I use to record documents as I photograph them in the archive, with columns for year, month, day, sender, receiver, gist, and archival information; I add the transcription later. I had set up what is essentially a vertical spreadsheet to capture the essential data alongside a large text field for the transcription, but that was in a rather old version of FileMaker Pro and only had enough records to serve as a test case.

(I used to be very good at building relational databases in FileMaker, making use of the special features and functions, etc. I have a fondness for it because as a kid I would play around with the database option in ClarisWorks, which FileMaker grew out of.)

A new version of FileMaker, even at the individual educator price point, is well above what I’m willing to pay, and it feels a bit odd to use FileMaker after years of hanging out with people who write their own code. Thus, I have been considering my options. Ideally I’d like a relational database with tables for individuals, letters, locations, and repositories (the last is negotiable), and the ability to export data as a CSV so I can plug the information into something like R to play around with it. If nothing else, I would like a large text field with formatting options for entering and reading transcriptions.
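To make the wish list concrete, here is a minimal sketch of such a database using Python's built-in sqlite3 module. Everything about the layout is an assumption for illustration: the table and column names simply mirror my spreadsheet columns, and the helper functions `build_db` and `export_letters_csv` are hypothetical names, not part of any existing tool.

```python
import csv
import io
import sqlite3

# Assumed schema: one table per entity, with letters pointing at the others.
SCHEMA = """
CREATE TABLE individuals  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE locations    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE repositories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE letters (
    id INTEGER PRIMARY KEY,
    year INTEGER, month INTEGER, day INTEGER,
    sender_id INTEGER REFERENCES individuals(id),
    receiver_id INTEGER REFERENCES individuals(id),
    sender_location_id INTEGER REFERENCES locations(id),
    receiver_location_id INTEGER REFERENCES locations(id),
    repository_id INTEGER REFERENCES repositories(id),
    gist TEXT,
    archival_info TEXT,
    transcription TEXT
);
"""


def build_db(path=":memory:"):
    """Create the (hypothetical) tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def export_letters_csv(conn):
    """Flatten letters back into spreadsheet-style rows and return CSV text."""
    query = """
    SELECT l.year, l.month, l.day,
           s.name  AS sender,
           r.name  AS receiver,
           sl.name AS sender_location,
           rl.name AS receiver_location,
           repo.name AS repository,
           l.gist
    FROM letters l
    LEFT JOIN individuals s     ON s.id = l.sender_id
    LEFT JOIN individuals r     ON r.id = l.receiver_id
    LEFT JOIN locations sl      ON sl.id = l.sender_location_id
    LEFT JOIN locations rl      ON rl.id = l.receiver_location_id
    LEFT JOIN repositories repo ON repo.id = l.repository_id
    ORDER BY l.year, l.month, l.day
    """
    cur = conn.execute(query)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)  # missing values are written as empty cells
    return buf.getvalue()
```

Writing the string returned by `export_letters_csv(conn)` to a file should give something that R's `read.csv` can open. The transcription is left out of the export on purpose, since long text blocks tend to make a CSV awkward to work with.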

Airtable, which is hosted online and thus would be accessible from any computer (and backed up externally), does not have a good way to view large text blocks, as far as I can tell, so it’s out of the running.

I already use Zotero to keep track of secondary sources, and it is apparently possible to get metadata out of it and into a CSV. However, the notes field has always felt small to me, and there is not (currently) a place to track the locations of the author and recipient of a letter except within the transcription itself. It’s not ideal, but it is still an option.

Another option is Omeka S, which is still in beta. It has the ability to create resource templates with whatever elements you want from various existing vocabularies, has media storage and a mapping plugin, and it is possible to import from CSVs and output via API. The main sticking points are that I’d have to come up with a standardized “title” for the documents and work out a specific mapping of my current metadata fields to a mix of Dublin Core and Bibliographic Ontology elements. Of the options that don’t require me to (re)learn how to code, Omeka S is the biggest contender after just continuing to use a spreadsheet.
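As a rough illustration of those two sticking points, the sketch below builds a standardized title from a spreadsheet row and relabels the remaining fields with vocabulary terms. The particular choices are assumptions, not a settled mapping: the "sender to receiver, date" title pattern, the column names, and the pairing of each column with a given Dublin Core (`dcterms:`) or Bibliographic Ontology (`bibo:`) term are all just one plausible arrangement, and `FIELD_TO_TERM`, `make_title`, and `to_item` are hypothetical names.

```python
# Assumed mapping from my spreadsheet columns to vocabulary terms.
FIELD_TO_TERM = {
    "sender": "dcterms:creator",
    "receiver": "bibo:recipient",
    "gist": "dcterms:abstract",
    "archival_info": "dcterms:source",
}


def iso_date(row):
    """Combine the year, month, and day columns into YYYY-MM-DD."""
    return f"{int(row['year']):04d}-{int(row['month']):02d}-{int(row['day']):02d}"


def make_title(row):
    """Build a standardized title, e.g. 'Sender to Receiver, 1800-01-02'."""
    return f"{row['sender']} to {row['receiver']}, {iso_date(row)}"


def to_item(row):
    """Relabel one spreadsheet row (a dict) with vocabulary terms."""
    item = {term: row[field] for field, term in FIELD_TO_TERM.items()}
    item["dcterms:title"] = make_title(row)
    item["dcterms:date"] = iso_date(row)
    return item
```

Each row read with `csv.DictReader` could be passed through `to_item` before import. Letters with a missing month or day would need extra handling that this sketch does not attempt.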

There is a forthcoming system that looks promising, Tropy, but it’s not yet in beta and I need something sooner rather than later. Maybe for the next project?

I could, of course, try to remember my lessons in PHP and MySQL from the autumn of 2013, when I built a small but functional database with a sample set of data. Or I could try to learn Ruby or Django in order to fork and modify Project Quincy to suit my own needs. I’m not entirely sure the benefit of either would justify the time it would cost.

At this point, I’m sticking to my spreadsheet for metadata (currently using Google Sheets, although I know how fickle Google can be) and keeping transcriptions in a separate location. Any suggestions of alternative (well-documented) open-source database solutions are welcome.
