A very large forest

This week and next are text mining and topic modeling in the class on programming for historians. I’ve been reading around on both topics (I present next week, on topic modeling), and I keep shifting back and forth from “okay, this makes sense” to “wait, what?”

It is the problem of forest and trees. Individual trees I can identify: oh, look, a sugar maple, or oh, a short line of Python. It’s when I start trying to understand the whole forest (just what is this Text Mining thing, anyway? How can I use topic modeling when I don’t have texts yet?) that I run into trouble.

Further, I admit to being a little daunted by the process, largely because my era is late 18th and early 19th century. Which means I’ll come up against the problems described and solved by Ted Underwood. It’s fantastic that he’s found a solution which works, but 4,600 rules for solving spelling errors is rather overwhelming. I’m trying not to lose sight of the trees in the forest.

What I really need is to play with the tools, and for that I need some texts. Does anyone know of a friendly, downloadable corpus? Preferably one from the late 18th or early 19th century?

Mapping Correspondents

When I started to think about trying to map the addresses (well, cities and states/countries) of the correspondents in my manuscript collection, my first visual was the Mapping the Republic of Letters Project, with the lines showing the to and from of the letters. That would be so neat! Until I realized that for me, all roads lead to Liverpool (and very occasionally, Sedgwick near Kendal).

What can web scraping do for me?

After last week, I was convinced that web scraping (especially with wget) was a nifty tool, but I wasn’t sure how useful it would be to me. After all, most of the data I’m working with and putting into my database is coming from an archive collection which doesn’t even have a detailed finding aid. The names, dates, summaries, and everything else are created by me as I go through the hundreds of photos I take each time I visit the archive.

All Things are New in the Morning

I did no coding this past weekend. Saturday I read and did work for my other class (cholera and yellow fever!) and Sunday, despite the drizzle, I went out and enjoyed my favourite season of the year by picking apples.

When I sat down again with my editdocs file, which has given me so many headaches over the past few weeks, I immediately saw where certain things were going wrong. I had missing spaces in some of my echoes which were interfering with the code (“select” needed to be “ select ”). In the space of a couple of hours I was able to get the page to display preselected information, something which I’d been trying to do for over a week.

I’m not done with my CRUD yet. I’m slowly writing in the update code for editdocs, which has to update or create for all four join tables. I want to be sure I’m writing things properly so I’m not rushing through it. Plus, ever since we turned on error display I’m getting some odd error messages that I can’t quite understand (the variable isn’t undefined on line 29! Line 29 is where I tell you what it is! There’s an = and everything!).

Still, taking a break and looking at something other than a screen made it much easier to see where I was writing errors and where I need to go. Next time I hit a wall when coding, I will get up and go for a walk or something.

(The title of this post is a line from the poem “Blake Tells the Tiger the Tale of the Tailor” from A Visit to William Blake’s Inn by Nancy Willard)

Variable Overload?

Today I worked on the RUD in my database CRUD (Create, Read, Update, Delete). I was trying to get the RU to work. For the sake of efficiency, I put my desire to display sender and receiver on hold and just made a table pulling straight from the documents database. That worked and I added in links to the “full data” and “edit” pages, only one of which sort of exists at the moment. Most of today was spent working on edit.

Taking some excellent advice, I broke things out into pieces. First step: pulling the record ID as stored in the URL (written into the URL on the showdocs page) into the code on the editdocs page. Note that the snippets below assume you’ve connected to the database somewhere upcode.

if (isset($_GET['kp_doc_id'])) {
    echo "Isset Success. <br />";
} else {
    echo "Problems";
}

Then get the id and echo it to make sure the value is set correctly.

if (isset($_GET['kp_doc_id'])) {
    echo "Isset Success. <br />";
    // make sure the 'id' value is valid
    if (is_numeric($_GET['kp_doc_id']) && $_GET['kp_doc_id'] > 0) {
        // get 'id' from URL
        $id = $_GET['kp_doc_id'];
        echo $id;
    }
}
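One thing worth knowing about the check above: is_numeric also accepts strings such as "1.5" or "1e3", which would pass the greater-than-zero test without being usable record IDs. A stricter variant, sketched here with PHP's built-in filter_var and FILTER_VALIDATE_INT, might look like the following. The helper name parse_doc_id is made up for illustration; only the kp_doc_id key comes from the snippets above.

```php
<?php
// Hypothetical helper: returns the id as an int, or null if it is missing or invalid.
function parse_doc_id(array $query): ?int
{
    if (!isset($query['kp_doc_id'])) {
        return null;
    }
    // FILTER_VALIDATE_INT accepts whole numbers only; min_range rejects 0 and negatives.
    $id = filter_var(
        $query['kp_doc_id'],
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    return $id === false ? null : $id;
}

// On the editdocs page this would be called with the query string values.
$id = parse_doc_id($_GET);
if ($id === null) {
    echo "Problems";
} else {
    echo "Editing record " . $id;
}
```

Returning null for every failure case keeps the calling code down to a single check before anything touches the database.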

Once I got all of that to work, I followed Sasha’s lead and added in a $stmt. However, I seem to be running into problems binding more than a few variables and plugging them into the form. Everything works fine if I only bind one or two, but with all the ones I need it seems to grind to a halt.
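For comparison, here is a minimal sketch of binding several values in one prepared statement. Everything in it is an assumption rather than the code from the editdocs page: it uses PDO with an in-memory SQLite database so that it can run on its own, and the documents table, its columns other than kp_doc_id, and the sample row are all invented for illustration.

```php
<?php
// Assumption: PDO with in-memory SQLite stands in for the real database connection.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table and sample row; only kp_doc_id is a name from the post.
$pdo->exec('CREATE TABLE documents (
    kp_doc_id INTEGER PRIMARY KEY,
    doc_date TEXT,
    summary TEXT,
    notes TEXT
)');
$pdo->exec("INSERT INTO documents VALUES (1, '1805-03-14', 'Letter from Liverpool', 'damaged')");

// Update: one named placeholder per value, all supplied in a single array.
$update = $pdo->prepare(
    'UPDATE documents
        SET doc_date = :doc_date, summary = :summary, notes = :notes
      WHERE kp_doc_id = :id'
);
$update->execute([
    ':doc_date' => '1805-03-15',
    ':summary'  => 'Letter from Liverpool, redated',
    ':notes'    => 'damaged at fold',
    ':id'       => 1,
]);

// Read: fetch the row as an associative array, ready to fill the form fields.
$select = $pdo->prepare('SELECT doc_date, summary, notes FROM documents WHERE kp_doc_id = :id');
$select->execute([':id' => 1]);
$row = $select->fetch(PDO::FETCH_ASSOC);

// Escape before echoing into HTML.
echo htmlspecialchars($row['summary']);
```

Named placeholders keep each value tied to its column by name rather than by position, which makes a long list of bound values easier to check by eye. With the error mode set to exceptions, a placeholder that has no matching value raises an error instead of failing quietly.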

I’ve made the pages live anyway. Showdocs will give you access to editdocs (by clicking on edit). The code is on GitHub Gist, since it was longer than I cared to put here. Critiques welcome.

Parlez-vous code?

This semester I am continuing the trend of taking a digital (history) class. Although we’re calling it clio3, the name is properly Programming for Historians. The code and other work I generate will be going up in its own little corner of my webspace.

Hopefully I will finish the semester the proud creator of a working database into which I can input all of the various letters written by the family on whom my dissertation will be based, and with the database I will be able to conduct analysis (particularly location and movement). I’m excited to be building such a tool from scratch. I could have thrown something together in FileMakerPro (I managed several FMP databases at my previous job), but ever since I heard Jean Bauer talk about her Early American Foreign Service Database I’ve wanted to do the work myself, code and all.

I’ve been mucking about on the edges of code of various kinds for years. As a kid on MicroMUSE I learned the necessary commands to build myself an awesome house with an even more awesome treehouse in the back yard (I was 10, what can I say?). Working in FMP I wrote scripts whose syntax reminded me a little of playing around in the mux. I like the elegance and logic of coding languages, with the if and elseif, @desc and $variables. Unlike English, a language which bounces around and changes its mind about spelling and rules, code language seems to stay consistent once you’ve met it. I say seems to, because I’m still only just learning to speak these various languages, and I could be deluding myself. After all, in code-land I can only say “Parlez-vous anglais?” or “Où est le WC?” and not much else.