
(This is part 3: part 1 is here, part 2 is here, part 4 is here)
Scanning projects largely wrapped up in 2016. Slowly but surely (thanks largely to the tenacity of Jessica Sedgwick, then the BLC metadata consultant), our repositories began to implement a technical connector that allowed each of them to connect with Digital Commonwealth/DPLA.
A COLLABORATIVE HOME?
Although Northeastern was coordinating the construction of this collection, we tried to follow a distributed ownership model, starting with our first piece of digital infrastructure, our blog. We posted each entry on “What’s New,” Snell Library’s blog, but there was an understanding that content belonged to the community and could be repurposed by partners on their own institutional blogs.
With the digital collection coming together, the group started to talk about portals, connectors, and how we could make this a true collection, not just several thousand items contributed to Digital Commonwealth/DPLA. We discussed purchasing and owning separate digital infrastructure collaboratively, but because the project was designed as a “lightweight, nimble project that attempts to lay the technical and descriptive groundwork for cross-institutional collaboration through the technical infrastructure of the DPLA and Digital Commonwealth,” we decided against it. Separate digital infrastructure was neither lightweight nor nimble. In the end, we decided to create a widget that could live on multiple websites and to use Northeastern’s existing web tools to create a portal to the collection.
WIDGET TECH
To maintain the distributed ownership model, the group decided to create a tool that could search the collection directly from any website, essentially bringing the collection into each organization’s own website rather than having partners point to an external resource.
We looked at a widget created by Dean Farrell and Josh Wilson that allows a user to easily install a DPLA-branded search box on any website. When you type a word or phrase into the box, the widget goes to the DPLA and automatically populates a search. After clicking ‘search,’ the next screen that opens is a DPLA page with the results. Could this widget be modified to automatically populate desegregation search terms in addition to what has been typed in the box? At DPLAFest in 2016, where I was presenting on the Deseg project, I met Audrey Altman from DPLA, who played with the code. By using the search term ‘cat’, she determined that it was, in fact, possible. In 2017, Northeastern’s technical team (Eli Zoller, Ernesto Valencia) forked the GitHub code, modified it, and created a working widget for the project.
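To make the idea concrete, here is a minimal Python sketch of what "automatically populating" extra terms could look like. It is not the widget's actual code. The function name, the fixed terms, and the assumption that the DPLA search page takes its query in a "q" parameter are all illustrative.

```python
from urllib.parse import urlencode

# Assumption: the public DPLA search page reads its free-text query from "q".
DPLA_SEARCH_URL = "https://dp.la/search"


def build_search_url(user_query: str, fixed_terms: list[str]) -> str:
    """Combine what a visitor typed with project terms that are always added.

    Hypothetical helper: the fixed terms are placeholders, not the project's
    final search string.
    """
    words = user_query.split() + fixed_terms
    return f"{DPLA_SEARCH_URL}?{urlencode({'q': ' '.join(words)})}"


# A visitor types "busing"; the widget quietly adds the project terms.
print(build_search_url("busing", ["desegregation", "Boston"]))
# prints https://dp.la/search?q=busing+desegregation+Boston
```

The visitor still lands on a DPLA results page, so the partner site hosts nothing beyond the search box itself.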
SEARCH STRING
We then needed to determine the search terms that would bring up desegregation material from DPLA. Instead of solely using our common search string, we decided to riff on the model used by https://www.umbrasearch.org, which searches for items with a set of individual words rather than an LCSH subject heading. The group first constructed a boolean search that we thought would retrieve most items, including those from our collections: (school* OR educat*) AND Boston AND (desegregation OR segregation OR integration). In testing, we found a glitch: we received the expected number of results using Digital Commonwealth’s search tool but an unexpected number from DPLA, either higher or lower depending on which set of terms was listed first.
We reached out to Mark Breedlove at DPLA, who confirmed our suspicions: the search tool DPLA uses (Elasticsearch and Lucene) doesn’t honor “AND,” “OR,” or parentheses, so our boolean markers weren’t being observed. Although it was not ideal, we modified our search string to [ school* + integration + Boston ].
Mark suggested that we use the API instead of the DPLA search box. It would allow us to use a more robust search string that includes fielded data (DPLA doesn’t have an “Advanced” search function, just API availability). His suggested search to start was:
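For readers less used to boolean notation, the intent of that search can be spelled out as a small Python check run against an item's text. This is only a local illustration of the logic we had in mind. It is not how Digital Commonwealth or DPLA evaluate a query, and the function name is made up for the example.

```python
import re

TOPIC_WORDS = {"desegregation", "segregation", "integration"}


def matches_intended_search(text: str) -> bool:
    """(school* OR educat*) AND Boston AND (desegregation OR segregation OR integration)."""
    words = re.findall(r"[a-z]+", text.lower())
    # The trailing * in the search string means "any word starting with this".
    has_school = any(w.startswith(("school", "educat")) for w in words)
    has_boston = "boston" in words
    has_topic = any(w in TOPIC_WORDS for w in words)
    return has_school and has_boston and has_topic


print(matches_intended_search("Boston schools and integration, 1974"))  # True
print(matches_intended_search("Boston harbor integration"))             # False
```

All three groups must match, so the order in which the groups are written should make no difference to the result.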
curl -X GET 'https://api.dp.la/v2/items?api_key=[YOUR API KEY]&q=integration%20OR%20desegregation&sourceResource.spatial=boston&sourceResource.subject=school%20OR%20schools&fields=id%2CsourceResource.title%2CsourceResource.description&page_size=50'
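The same request can be assembled in a script, which makes the fielded parameters easier to read and change. The sketch below uses only the Python standard library and mirrors the parameters from Mark's suggestion. The function names are hypothetical, and a real API key is needed before the fetch step will return anything.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://api.dp.la/v2/items"


def build_items_url(api_key: str, page_size: int = 50) -> str:
    """Build the items query from the suggested search, one parameter per line."""
    params = {
        "api_key": api_key,
        "q": "integration OR desegregation",
        "sourceResource.spatial": "boston",
        "sourceResource.subject": "school OR schools",
        "fields": "id,sourceResource.title,sourceResource.description",
        "page_size": page_size,
    }
    # quote_via=quote encodes spaces as %20, matching the curl example.
    return f"{API_ENDPOINT}?{urlencode(params, quote_via=quote)}"


def fetch_items(api_key: str) -> dict:
    """Send the request and decode the JSON reply (needs network access and a valid key)."""
    with urlopen(build_items_url(api_key)) as response:
        return json.load(response)
```

Calling build_items_url("YOUR_KEY") returns the same URL as the curl line, with the placeholder key swapped in. What to do with the decoded results, such as rendering them on a page, is a separate step.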
The use of an API does allow robust searching and would allow us to retrieve much more specific results. Unfortunately, it requires the results of that search to be embedded into an owned website, which was not part of the initial scope.
WE MAKE A WEBSITE ANYWAY
For all of their wonderful aggregating power, DPLA and Digital Commonwealth are not set up for full-text searching of OCRed collections, something that our own Digital Repository is capable of. We decided to make use of the tools and infrastructure built by Northeastern’s Digital Scholarship Group (CERES) to make a Northeastern portal to the collections.
Although a Northeastern-specific portal was the primary reason bpsdesegregation.library.northeastern.edu was built, it started to serve as the collaborative collection portal, and partners later agreed to also use it as the place on the web for information about the collection to live. Currently, the website is where information about the widget lives, and it serves as a gathering place for a growing set of descriptive, informational, and pedagogical tools.