Islandora and the COAR Next Generation Repositories Report

Late last year, a working group of the Confederation of Open Access Repositories (COAR) released a report with recommendations to adopt "new technologies, standards, and protocols that will help repositories become more integrated into the web environment and enable them to play a larger role in the scholarly communication ecosystem." Islandora's own Institutional Repository Interest Group took up the report and measured Islandora against it, looking at both the current functionality available in Islandora 7.x and how we can best shape Islandora CLAW to meet these recommendations for the future (complete with issues in the CLAW GitHub so we can track our progress). They have shared their own results, written up by convenor Bryan Brown:

 

#1: Exposing Identifiers

The thrust of the recommendation here seems to be implementing the best practices listed at http://signposting.org/ regarding typed HTTP links. I’m not sure what Islandora 7.x is doing in terms of typed HTTP links, but I’m assuming nothing beyond whatever Drupal 7 does by default. It could certainly be doing more, but there’s a lot to chew on in the best practices when deciding what actually needs to be done and how it should be done for different types of objects. CLAW, being a linked data application that operates primarily via HTTP, should definitely be doing these things. I’ve made a use case for this at https://github.com/Islandora-CLAW/CLAW/issues/860.
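
To make "typed HTTP links" concrete, here is a minimal Python sketch of the kind of Link header a landing page could send. The relation types "cite-as" and "describedby" come from the Signposting patterns; the DOI, the repository URL, and the function name are made up for illustration and are not anything Islandora does today.

```python
def format_link_header(links):
    """Serialize (target URI, relation type, extra attributes) tuples
    into a single HTTP Link header value."""
    parts = []
    for target, rel, attrs in links:
        params = [f'rel="{rel}"'] + [f'{key}="{value}"' for key, value in attrs.items()]
        parts.append(f"<{target}>; " + "; ".join(params))
    return ", ".join(parts)


# Hypothetical links for one object's landing page.
links = [
    # The persistent identifier that should be used when citing the object.
    ("https://doi.org/10.5555/example", "cite-as", {}),
    # A machine-readable metadata record describing the object.
    ("https://repo.example.org/islandora/object/ir:1/datastream/MODS/view",
     "describedby", {"type": "application/xml"}),
]

header = format_link_header(links)
print("Link: " + header)
```

A crawler that understands these relation types can find the persistent identifier and the metadata record without scraping the HTML.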

 

#2: Declaring Licenses at a Resource Level

Very similar to Behavior #1 (Exposing Identifiers), this recommends using best practices from http://signposting.org/ to expose, via typed HTTP links, the URI for the license that best describes a resource. Good in theory, but not all licenses have machine-readable URIs, so this would require either migrating existing free-text licenses to ones that have a URI or, in the case of special one-off licenses, creating URIs for local licenses (which wouldn’t be very interoperable). COAR recommends using Creative Commons licenses since they have readily available URIs, but CC licenses aren’t really a good fit for scholarly works since publishing introduces a lot of issues that CC licenses don’t cover. As for the human-readable part, that’s just a matter of your metadata and your theming. 7.x and CLAW both should be able to display human-readable rights statements, but neither can do the HTTP link part currently. CLAW use case at https://github.com/Islandora-CLAW/CLAW/issues/860.
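
The migration problem can be sketched in a few lines of Python. The lookup table and function name below are hypothetical; only the Creative Commons URIs and the "license" relation type are real. Anything not in the table has no URI to expose, which is exactly the gap described above.

```python
# Hypothetical mapping from normalized free-text rights statements to license URIs.
LICENSE_URIS = {
    "cc by 4.0": "https://creativecommons.org/licenses/by/4.0/",
    "cc by-nc 4.0": "https://creativecommons.org/licenses/by-nc/4.0/",
    "cc0 1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
}


def license_link(rights_text):
    """Return a Link header value for a known license, or None when the
    free-text statement has no machine-readable URI."""
    normalized = " ".join(rights_text.lower().split())
    uri = LICENSE_URIS.get(normalized)
    if uri is None:
        return None
    return f'<{uri}>; rel="license"'


print(license_link("CC  BY 4.0"))
print(license_link("Contact the author for permission"))
```

The second call returns None: a one-off rights statement like that would need either a locally minted URI or a manual cleanup pass before it could be exposed this way.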

 

#3: Discovery through Navigation

Even more emphasis on using the best practices at http://signposting.org/. 7.x’s Islandora Google Scholar module adds a link to the PDF for citation/thesis objects as an HTML meta tag, but that’s it. It’s easy to see how adding this as a typed HTTP link, especially for compound objects, would help a machine learn about the different parts of a larger meta-object. This feature would be nice for 7.x, but as a Linked Data Application CLAW should definitely have it. Covered again by https://github.com/Islandora-CLAW/CLAW/issues/860.
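
From the consumer side, discovering the parts of a compound object could look something like the Python sketch below. It assumes the landing page advertises each part with the "item" relation type from the Signposting patterns; the header value and URLs are invented, and the regular expression is a deliberate simplification rather than a full Link header parser.

```python
import re
from collections import defaultdict

# Simplified pattern: only matches links whose first parameter is a quoted rel.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def links_by_rel(header_value):
    """Group the target URIs found in a Link header value by relation type."""
    grouped = defaultdict(list)
    for target, rels in LINK_RE.findall(header_value):
        # A rel attribute may hold several space-separated relation types.
        for rel in rels.split():
            grouped[rel].append(target)
    return dict(grouped)


# Hypothetical Link header for a two-page book object.
header = (
    '<https://repo.example.org/object/book:1/page/1>; rel="item", '
    '<https://repo.example.org/object/book:1/page/2>; rel="item", '
    '<https://repo.example.org/object/book:1/mods>; rel="describedby"'
)

parts = links_by_rel(header)
print(parts["item"])
```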

 

#4: Interacting with Resources (Annotation, Commentary and Review)

Members of the IR IG are not sold on this one for use in university IRs. Perhaps there are very specific types of repo systems where peer review, comments and annotations are useful, such as aggregators or publishing platforms. In a university IR, it seems like it could actually hinder adoption because faculty might not want folks interacting with their scholarship, and would request mediation for such things, which would slow down already overworked IR staff. Drupal already has tons of modules for things like this, so you could probably modify one to work with Islandora objects in 7.x, and in CLAW you wouldn’t even have to write any code, just turn the module on and configure it. Turning those annotations into linked data on the object would be a bit more difficult, but that difficulty would be more in deciding how the metadata should look than in how to implement it.

 

#5: Resource Transfer

This seems to be suggesting a modern form of OAI-PMH, but in a way that includes assets in the transfer. There is a strong recommendation for ResourceSync, which we have no experience with but which looks like it would do the job. 7.x will probably never have this, but CLAW should focus on it. Use case at https://github.com/Islandora-CLAW/CLAW/issues/857.
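
For a sense of what ResourceSync involves, its simplest document is a Resource List: a Sitemap file with one extra metadata element declaring the capability. The Python sketch below builds one with the standard library. The two namespaces and the "resourcelist" capability value come from the ResourceSync specification; the function name, URLs, and timestamps are placeholders, and a real implementation would also need the other documents the framework defines.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"   # Sitemap namespace
RS = "http://www.openarchives.org/rs/terms/"         # ResourceSync namespace
ET.register_namespace("", SM)
ET.register_namespace("rs", RS)


def build_resource_list(resources, at):
    """Build a minimal Resource List from (URL, last-modified) pairs.
    `at` is the datetime at which the list was generated."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    ET.SubElement(urlset, f"{{{RS}}}md", {"capability": "resourcelist", "at": at})
    for loc, lastmod in resources:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = loc
        ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical repository assets: a metadata record and the PDF it describes.
resources = [
    ("https://repo.example.org/object/ir:1/mods.xml", "2018-03-01T12:00:00Z"),
    ("https://repo.example.org/object/ir:1/thesis.pdf", "2018-03-01T12:00:00Z"),
]

xml_text = build_resource_list(resources, at="2018-03-02T00:00:00Z")
print(xml_text)
```

Unlike an OAI-PMH response, the list points directly at the files themselves, so a harvester can fetch the assets as well as the metadata.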

 

#6: Batch Discovery

We aren’t really sure how this differs from Behavior #5 (Resource Transfer), since this seems to be a use case where someone used “Resource Transfer” technology to put all of your repo’s stuff in an aggregator so that it could be found in multiple places. If you take care of #5, you have already taken care of #6. Covered by use case https://github.com/Islandora-CLAW/CLAW/issues/857.

 

#7: Collecting and Exposing Activities

This seems to be a mash-up of #4 and #5: capture interactions, turn them into metadata that you expose, and then push that metadata along with the rest of your data with ResourceSync. There are a LOT of recommendations for possible ways to do this, which underscores the fact that there’s not a clear standard for this and probably not a lot of consumers for this kind of data either. This seems like a “nice to have”, not a “have to have”.

 

#8: Identification of Users

This seems like a good idea, and ORCID seems like the obvious best choice in a scholarly context. We don’t know much about the other two ID systems involved (Social Network Identities and WebID). Perhaps they would be good for folks who don’t have an ORCID, but then again perhaps this could be a good way to get people to use and understand ORCID. Use of ORCID could potentially lock out non-academic users, which may be a bug or a feature depending on your goals. Whichever you pick, the problem is going to be getting something that people use across the web in order to deliver on the promises outlined in this section. In an age where people are wary about privacy and the web knowing too much about them, we don’t think this one would get as much broad adoption as COAR thinks.
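
One small thing a repository can do before storing an ORCID iD is confirm that it is well formed. The last character of an iD is a check digit computed with ISO 7064 MOD 11-2, so the check fits in a few lines of Python. The function names here are our own, and passing this check only means the iD is well formed, not that it is registered to anyone.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first
    15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the 0000-0000-0000-000X shape and the trailing check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

The first iD is the sample record ORCID publishes for the fictitious Josiah Carberry and passes the check; the second has its final digit altered and fails.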

 

#9: Authentication of Users

We don’t understand how this is different from #8; the two go together to such a degree that separating them is only confusing.

 

#10: Exposing Standardized Usage Metrics

This is a nice dream, but much harder than it sounds. Current generation repositories are pretty close to doing all they can in terms of capturing views/downloads on objects, although client-side triggers are better than server-side ones in order to avoid problems with caching. Piwik seems to be a winner in the international community due to its focus on privacy and flexibility (although it does require setting up your own Piwik server). Standardizing the way usage stats are exposed from the same repo is a good idea as well, but none of us have experience with SUSHI or COUNTER.

All this can be done to perfect aggregation of usage stats on the same repo, but aggregating/summing stats from external sources is not going to be a practical option until there is a centralized source that does this with a solid API.

 

#11: Preserving Resources

While we agree with the sentiment here, we’re not sure they are saying anything new. Fedora should take care of the actual preservation bits, and Islandora has always requested least-common-denominator open format file types for archival master datastreams and used derivative processes to spin into other formats.