I’ve written about annotating OCRed PDFs before on the hypothes.is blog. This post is an attempt to slow down and expand on a few simple steps in that process: how to actually upload and host PDFs within a WordPress install. If you are familiar with the WP platform, this is likely nothing new. But I get asked about it often enough, usually by teachers entirely new to the platform, that I thought I’d write a brief blog/tutorial (blogtorial?). Combined with my post on getting started with your own WordPress install using Reclaim Hosting, this will get you set up with your own blog/website to host multiple formats of texts for annotation by a class.
This post is a continuation of the drum I’ve been beating lately about the utility and power of teachers compiling their own anthologies or course readers in some annotatable place online. In discussing this process before, though, I’ve mostly been talking about copying and pasting text into annotatable WordPress HTML posts or pages (or doing something similar in PressBooks or Scalar). There’s probably no need for most educators to go through the trouble of setting up Scalar or PressBooks to get a handful of texts online. And if you are working primarily with PDFs (if you are, say, reading a bunch of academic articles in that format), there’s a bit more to the story.
While this blog is WordPress focused, I’ve worked with teachers who have hosted PDFs for annotation in a variety of LMSs and CMSs. I’ll probably circle back and write some posts about these other spaces, but here are a few that I know hypothes.is plays well with: Blackboard, Moodle, and Drupal. And here are a few that I know it does not: Google Docs and Canvas. You can have students download PDFs from both Google Docs and Canvas (not to mention Dropbox, etc.) and collaboratively annotate them locally as outlined here. It’s not as nice as having an easy place online to point students to, but it works. And of course hypothes.is is working on interoperability with both those systems.
As I mentioned in the OCR blog from last year, it was a professor at NYU, Krystyna Michael, who first started pushing me to figure this whole thing out. I helped her OCR a PDF, but she did what I describe step-by-step below and hosted PDFs of essays and articles her class was reading on a WordPress page, including the one by Joan Didion above.
Your PDFs do need to be OCRed to be annotatable, though, so here’s a short video showing how to OCR a PDF with Adobe Acrobat Pro:
The How To
(Though it could have been annotated at the NMC website, I chose to download the PDF to my desktop and upload it to my WordPress blog for demonstration purposes. In other cases, the PDF of a reading may not exist yet online, or may not yet be OCRed, so I would have to be responsible for hosting it.)
That link will open as a .pdf webpage hosted in my WordPress blog installation.
I cannot make hypothes.is native on this page using the WordPress plugin, as I can on regular posts and pages within the blog. I can, however, activate the hypothes.is Chrome extension or bookmarklet and annotate the page myself or with others. I can also create a via proxy link to the page, which makes it natively annotatable without the extension/bookmarklet. Such a link could be made directly from a post/page of readings (as here), but would need to be done manually by changing the link above.
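For anyone building a whole page of readings, here is a minimal sketch of what "changing the link" amounts to: putting the via proxy address in front of the PDF's own URL. The function name and the example URL are hypothetical, made up for illustration, and I'm assuming the proxy lives at `https://via.hypothes.is/`.

```python
# Assumed address of the hypothes.is via proxy.
VIA_PREFIX = "https://via.hypothes.is/"


def via_link(pdf_url: str) -> str:
    """Return a via proxy link for a publicly reachable PDF URL."""
    # The proxy fetches the document itself, so it needs a full web address.
    if not pdf_url.startswith(("http://", "https://")):
        raise ValueError("Expected a full URL starting with http:// or https://")
    return VIA_PREFIX + pdf_url


# Hypothetical URL of a PDF uploaded to a WordPress media library.
example = "https://example.com/wp-content/uploads/2016/01/reading.pdf"
print(via_link(example))
```

The same result can be had by hand: paste the proxy address into the browser bar and then paste the PDF's URL directly after it.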
I can add a PDF to my WordPress install from within a page or post by clicking “Add media” just above the composition window. Below you can see I selected text to create a link.
I can also “Add media” unassociated with a particular post/page by going to Dashboard > Media > Add New.
I can then find the new media/PDF in my media “Library” and copy the URL to get a link I can paste/share elsewhere:
That URL is the same one I would get by creating the link inside a post/page, except that I’ve skipped creating a page/hyperlink for the URL to be linked from.
I haven’t looked into it yet, but I don’t think I can make the .pdf page itself private. But of course if your blog requires a login, then you probably don’t need to be concerned about copyright issues (trust me, I know as much about copyright law as Dr. Nick does about medicine). My annotations at that page, unless private or part of a group, will however be viewable, along with their textual referents, in the hypothes.is stream.
However, if my concern is keeping my group’s annotations private, I should create a new private group in hypothes.is for our collaborative annotation.
Someone else could go through the same process as me, downloading the same doc from NMC, uploading elsewhere, and annotating the PDF. And if they did so in public, those annotations would also appear on “my” version of the PDF. I did so myself here–adding a second annotation in the new location.
The synchronization above is due to a feature of the PDF format: each PDF carries a kind of fingerprint. That fingerprint allows for collaborative annotation of the same PDF in different locations, and even of local copies. (Jon Udell explains why in this blog.) There are ways to erase the original fingerprint and thus get a new and different version of a PDF, but I’m not going to go into that here. If you are worried about multiple classes annotating the same text over time, then simply use the hypothes.is groups feature.
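If you are curious what that fingerprint looks like, here is a rough sketch for peeking at it. It rests on an assumption of mine rather than anything confirmed above: that the fingerprint corresponds to the first entry of the `/ID` array stored in the PDF file, written as a hex string. The function name is hypothetical, and PDFs that lack an `/ID` entry, or store it in another form, simply return nothing.

```python
import re
from typing import Optional

# Matches the first hex string of an entry such as: /ID [<abc123...> <def456...>]
_ID_PATTERN = re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f]+)>")


def first_pdf_id(data: bytes) -> Optional[str]:
    """Return the first /ID entry found in raw PDF bytes, or None if absent."""
    matches = _ID_PATTERN.findall(data)
    if not matches:
        return None
    # A PDF saved incrementally can contain several trailers; use the last one.
    return matches[-1].decode("ascii").lower()


def fingerprint_of_file(path: str) -> Optional[str]:
    """Read a PDF from disk and return its first /ID entry, if any."""
    with open(path, "rb") as handle:
        return first_pdf_id(handle.read())
```

Running `fingerprint_of_file` on the copy I downloaded and on the copy I uploaded elsewhere should, under the assumption above, give the same value, which would be why annotations made in one place show up in the other.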