Enhancing Library Discovery with
Linked Open Data

IGeLU/ELUNA Show & Tell, August 1, 2016

Steve Meyer, Data Strategist, UW-Madison Libraries

A Little Context

library.wisc.edu/experiments/linked-data

Sample URLs

Catalog Enhanced with Linked Data:
https://search-ld.library.wisc.edu/

Info Card

Screenshot of Picasso info card

Example In Context

Just for Fun - UW-Madison Catalog

How does this work?
Data Flow

Systems in Play

System/Data Source (Role): Description/Relevant Pieces

Local Catalog (User Interface)
  • Ruby on Rails + Solr web app
  • Client of Alma APIs
    circ availability, request functionality, etc.
  • Think Blacklight/VuFind

Alma (Resource Management Platform)
  • JSON-LD bibliographic data API
  • JSON-LD API returns entity URIs
    based on LCNAF authority control

VIAF, Virtual International Authority File (Identity Hub)
  • Inbound: resolve Person data from LCNAF IDs
  • Outbound: Person entities link to other data
    related resources/entities on Web

Getty Vocabularies (Linked Open Data source)
  • Contains cited entity descriptions

DBpedia (Linked Open Data source)
  • Linked Data descriptions from Wikipedia info

Wikidata (Linked Open Data source)
  • Linked Data descriptions from Wikipedia info
  • Contains cited entity descriptions

Bibliographic Identity Crawl

High level diagram for crawling bibliographic data

Step 1: Alma JSON-LD Creator Data

"creator":[
  {
    "@id":"http://id.loc.gov/authorities/names/n78086005",
    "label":"Picasso, Pablo, 1881-1973.",
    "sameAs":"http://viaf.org/viaf/sourceID/LC|n78086005"
  },
  {
    "@id":"http://id.loc.gov/authorities/names/n79006977",
    "label":"Stein, Gertrude, 1874-1946.",
    "sameAs":"http://viaf.org/viaf/sourceID/LC|n79006977"
  }
]
Documentation
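
As a minimal sketch (not taken from the Alma documentation), the creator array above can be read with Ruby's standard JSON library to collect the LCNAF URI and the VIAF lookup URI for each creator. The `creators_from` helper and the enclosing `{ "creator": [...] }` wrapper object are assumptions made for illustration.

```ruby
require "json"

# Hypothetical helper: returns one hash per entry in the record's "creator" array.
def creators_from(json_ld)
  record = JSON.parse(json_ld)
  record.fetch("creator", []).map do |creator|
    { lcnaf: creator["@id"], label: creator["label"], viaf_lookup: creator["sameAs"] }
  end
end

# Assumed wrapper object around the creator data shown on the slide.
sample = <<~JSON
  { "creator": [
    { "@id": "http://id.loc.gov/authorities/names/n78086005",
      "label": "Picasso, Pablo, 1881-1973.",
      "sameAs": "http://viaf.org/viaf/sourceID/LC|n78086005" }
  ] }
JSON

creators_from(sample).each { |c| puts "#{c[:label]} -> #{c[:viaf_lookup]}" }
```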

I Have: LCNAF URI; I Need: VIAF URI


Linked Data Classifieds

URI Want Ads

Looking for Love

LCNAF URI seeking VIAF companion. Non-smoker. Must be schema:sameAs compatible. owl:sameAs a plus. Looking for care-free companionship. Love nature, painting, abstraction.

Travel With Me!

VIAF URI seeking DBpedia companion. Must be owl:sameAs compatible. schema:sameAs should not contact. Let's travel the world wide web together.

Step 2.1: Resolving VIAF URI
(and corresponding data!)

http://viaf.org/viaf/sourceID/LC|n78086005

redirects to

http://viaf.org/viaf/15873

and then to

http://viaf.org/viaf/15873/

(Note: trace the HTTP redirect responses and look closely at the VIAF authority cluster documentation to understand the URI/URL semantics.)
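
One way to trace the redirect chain is sketched below with Ruby's standard library. The `trace_redirects` helper, the injectable `fetch` lambda, and the status codes in the stub are assumptions for illustration; the stub only mirrors the three URLs listed above and is not a recording of real VIAF responses.

```ruby
require "net/http"
require "uri"

# Default fetcher: one HEAD request, returns [status_code, location_or_nil].
HTTP_HEAD = lambda do |url|
  uri = URI.parse(url.gsub("|", "%7C")) # "|" must be percent-encoded before parsing
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  [response.code.to_i, response["location"]]
end

# Follows up to `limit` redirects and returns every URL visited, in order.
def trace_redirects(url, fetch: HTTP_HEAD, limit: 5)
  chain = [url]
  limit.times do
    status, location = fetch.call(chain.last)
    break unless (300..399).cover?(status) && location
    chain << URI.join(chain.last.gsub("|", "%7C"), location).to_s
  end
  chain
end

# Illustrative stub so the sketch runs offline; the status codes are assumed.
STUB = {
  "http://viaf.org/viaf/sourceID/LC|n78086005" => [301, "http://viaf.org/viaf/15873"],
  "http://viaf.org/viaf/15873"                 => [303, "http://viaf.org/viaf/15873/"],
  "http://viaf.org/viaf/15873/"                => [200, nil]
}.freeze

puts trace_redirects("http://viaf.org/viaf/sourceID/LC|n78086005", fetch: ->(url) { STUB.fetch(url) })
```

Omitting the `fetch:` argument would issue real HEAD requests instead of using the stub.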

Which VIAF Graph Entity?


<rdf:RDF>
  <rdf:Description rdf:about="http://viaf.org/viaf/15873/"/>
  <rdf:Description rdf:about="http://viaf.org/viaf/15873"/>
  <rdf:Description rdf:about="http://viaf.org/viaf/sourceID/BAV%7CADV12539089#skos:Concept"/>
  <rdf:Description rdf:about="http://viaf.org/viaf/sourceID/LC%7Cn++78086005#skos:Concept"/>
  ...
</rdf:RDF>

Ask the Graph, Use What is Known

Already known:

{
  "@id":"http://id.loc.gov/authorities/names/n78086005",
  "label":"Picasso, Pablo, 1881-1973.",
  "sameAs":"http://viaf.org/viaf/sourceID/LC|n78086005"
}

Query:

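# `graph` is an RDF.rb graph loaded from the VIAF RDF response.
sameas_uri  = RDF::URI.new("http://schema.org/sameAs")
creator_uri = RDF::URI.new("http://id.loc.gov/authorities/names/n78086005")
# Explicit braces pass the pattern as a hash, which newer Ruby/RDF.rb versions expect.
graph.query({predicate: sameas_uri, object: creator_uri})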

VIAF Entity Resolved

The query produces the following RDF triple.

Subject <http://viaf.org/viaf/15873>
Predicate <http://schema.org/sameAs>
Object <http://id.loc.gov/authorities/names/n78086005>
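
To make the pattern match concrete without requiring the RDF.rb gem, here is a rough pure-Ruby sketch of what such a query does: triples are three-element arrays and `nil` acts as a wildcard. The `match` helper is hypothetical, and the sample triples are a small subset assembled from the slides, not VIAF's full response.

```ruby
SCHEMA_SAME_AS = "http://schema.org/sameAs"

# Stand-in for a graph: [subject, predicate, object] arrays.
TRIPLES = [
  ["http://viaf.org/viaf/15873", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://schema.org/Person"],
  ["http://viaf.org/viaf/15873", SCHEMA_SAME_AS, "http://id.loc.gov/authorities/names/n78086005"],
  ["http://viaf.org/viaf/15873", SCHEMA_SAME_AS, "http://dbpedia.org/resource/Pablo_Picasso"]
].freeze

# Returns the triples matching the pattern; nil means "any value".
def match(triples, subject: nil, predicate: nil, object: nil)
  triples.select do |s, p, o|
    (subject.nil? || subject == s) &&
      (predicate.nil? || predicate == p) &&
      (object.nil? || object == o)
  end
end

hit = match(TRIPLES, predicate: SCHEMA_SAME_AS,
                     object: "http://id.loc.gov/authorities/names/n78086005").first
puts hit[0] # the subject is the VIAF entity we were looking for
```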

Step 2.2: Find VIAF Person Entity Links

Example: Find the DBpedia URI

sameas_uri     = RDF::URI.new("http://schema.org/sameAs")
entity_subject = RDF::URI.new("http://viaf.org/viaf/15873")
other_entities = graph.query({subject: entity_subject, predicate: sameas_uri})
# Keep the first statement whose object is a DBpedia resource URI.
other_entities.find { |statement| statement.object.to_s.start_with?("http://dbpedia.org/resource/") }

Resulting Triple

Subject <http://viaf.org/viaf/15873>
Predicate <http://schema.org/sameAs>
Object <http://dbpedia.org/resource/Pablo_Picasso>

Rinse, Repeat...

Until all 3 Linked Open Data source URIs are found
  1. <http://vocab.getty.edu/ulan/500009666-agent>
  2. <http://www.wikidata.org/entity/Q5593>
  3. <http://dbpedia.org/resource/Pablo_Picasso>
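
The "repeat until found" check can be sketched as a prefix lookup over the sameAs URIs collected so far. This is an illustrative sketch only: the `source_uris` and `all_found?` helpers are hypothetical, and the prefixes are simply read off the three URIs listed above.

```ruby
# Prefixes taken from the three URIs listed on the slide.
SOURCE_PREFIXES = {
  getty:    "http://vocab.getty.edu/ulan/",
  wikidata: "http://www.wikidata.org/entity/",
  dbpedia:  "http://dbpedia.org/resource/"
}.freeze

# Maps each source to the first matching sameAs URI, or nil if none was found.
def source_uris(same_as_uris)
  SOURCE_PREFIXES.transform_values do |prefix|
    same_as_uris.find { |uri| uri.start_with?(prefix) }
  end
end

# True once every source has a URI.
def all_found?(found)
  found.values.none?(&:nil?)
end

found = source_uris([
  "http://id.loc.gov/authorities/names/n78086005",
  "http://dbpedia.org/resource/Pablo_Picasso",
  "http://www.wikidata.org/entity/Q5593"
])
puts all_found?(found) # Getty is still missing, so another lookup is needed
```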

Crawl or Query?

Method: Crawl
  Pros
  • Simple HTTP requests to fetch data
  • Keep data processing based in RDF triples
  Cons
  • Each response only references related entities by URI
  • Related data requires subsequent HTTP requests

Method: Query
  Pros
  • A few queries encapsulate all required data to
    significantly reduce number of HTTP requests
  Cons
  • Introduces complexity in the form of another technology, SPARQL
  • Results processing taken out of flow of RDF/triples

Sample SPARQL

Browse View
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?influenced ?influencedGivenName ?influencedSurname ?influencedSameAs
WHERE {
  { 
    <http://dbpedia.org/resource/Pablo_Picasso> dbo:influenced ?influenced . 
  }
  UNION
  {
    ?influenced dbo:influencedBy <http://dbpedia.org/resource/Pablo_Picasso> .
  }
  ?influenced foaf:givenName ?influencedGivenName .
  ?influenced foaf:surname ?influencedSurname .
  OPTIONAL { 
    ?influenced owl:sameAs ?influencedSameAs . 
    FILTER regex(STR(?influencedSameAs), "viaf.org").
  }
}
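
Sending a query like this takes a single HTTP GET. The sketch below only builds the request URL. It assumes DBpedia's public endpoint at `http://dbpedia.org/sparql` and a `format` parameter for requesting JSON results, both of which are assumptions not stated in the slides; `query` is the standard SPARQL protocol parameter, and the `sparql_url` helper is hypothetical.

```ruby
require "uri"

# Assumed endpoint; adjust for the SPARQL service actually in use.
ENDPOINT = "http://dbpedia.org/sparql"

# Builds a GET URL carrying the query text as a form-encoded parameter.
def sparql_url(query, endpoint: ENDPOINT)
  params = URI.encode_www_form(query: query, format: "application/sparql-results+json")
  "#{endpoint}?#{params}"
end

# A shortened version of the query above, for illustration.
query = <<~SPARQL
  PREFIX dbo: <http://dbpedia.org/ontology/>
  SELECT ?influenced WHERE {
    <http://dbpedia.org/resource/Pablo_Picasso> dbo:influenced ?influenced .
  }
SPARQL

puts sparql_url(query)
```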

Crawl vs. Query Comparison

Picasso's influence according to DBpedia

  • SPARQL query: 1 HTTP request
  • HTTP Crawl: 54 HTTP requests
    • Initial HTTP request for primary subject
    • 1 HTTP request per influence related entity
      Picasso was an influence upon 53 other entities in DBpedia

How does this work?
Code

BibCard

BibCard README: https://github.com/UW-Madison-Library/bibcard

BibCard Caveats!

  • This is an experimental reference implementation
  • Not fully documented yet
  • We are still evaluating the code

With that out of the way, let's dive in!

Design Goals

Environment / Context

Library Discovery ⊃ Web Application

Technical Considerations

  1. Maximize Perceived Speed via
    • Caching Mechanisms
    • Asynchronous Enhancement
  2. Programming Model: Object Oriented

BibCard Core Responsibilities

  1. Fetch data that can be cached
  2. Instantiate Person object

1) Fetch Cacheable Data

# Given an LCNAF or VIAF URI,
# Resolve the related URIs,
# Query for relevant data points,
# Return a micrograph
data = BibCard.person_data("http://id.loc.gov/authorities/names/n78086005")

puts data

# <http://viaf.org/viaf/15873> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
# <http://viaf.org/viaf/15873> <http://schema.org/deathDate> "1973-04-09" .
# <http://viaf.org/viaf/15873> <http://schema.org/sameAs> <http://id.loc.gov/authorities/names/n78086005> .
# ...

Micrograph

A small heterogeneous collection of RDF triples from multiple sources

<http://viaf.org/viaf/85312226> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
<http://viaf.org/viaf/85312226> <http://schema.org/name> "Tim Berners-Lee"@en .
<http://viaf.org/viaf/85312226> <http://schema.org/birthDate> "1955-06-08" .
<http://viaf.org/viaf/85312226> <http://schema.org/sameAs> <http://dbpedia.org/resource/Tim_Berners-Lee> .
<http://viaf.org/viaf/85312226> <http://schema.org/sameAs> <http://www.wikidata.org/entity/Q80> .
<http://dbpedia.org/resource/Tim_Berners-Lee> <http://dbpedia.org/ontology/influencedBy> <http://dbpedia.org/resource/Paul_Otlet> .
<http://dbpedia.org/resource/Tim_Berners-Lee> <http://dbpedia.org/ontology/influencedBy> <http://dbpedia.org/resource/Jon_Postel> .
<http://dbpedia.org/resource/Tim_Berners-Lee> <http://dbpedia.org/ontology/abstract> "Professor Sir Timothy John Berners-Lee, OM, KBE, FRS, FREng, FRSA, DFBCS (born 8 June 1955), also known as TimBL, is an English computer scientist, best known as the inventor of the World Wide Web. He made a proposal for an information management system in March 1989, and he implemented the first successful communication between a Hypertext Transfer Protocol (HTTP) client and server via the Internet sometime around mid-November of that same year.Berners-Lee is the director of the World Wide Web Consortium (W3C), which oversees the Web's continued development. He is also the founder of the World Wide Web Foundation, and is a senior researcher and holder of the Founders Chair at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).He is a director of the Web Science Research Initiative (WSRI), and a member of the advisory board of the MIT Center for Collective Intelligence.  In 2011 he was named as a member the Board of Trustees of the Ford Foundation.In 2004, Berners-Lee was knighted by Queen Elizabeth II for his pioneering work. In April 2009, he was elected a foreign associate of the United States National Academy of Sciences. He was honoured as the \"Inventor of the World Wide Web\" during the 2012 Summer Olympics opening ceremony, in which he appeared in person, working with a vintage NeXT Computer at the London Olympic Stadium. He tweeted \"This is for everyone\", which instantly was spelled out in LCD lights attached to the chairs of the 80,000 people in the audience." .
<http://dbpedia.org/resource/Paul_Otlet> <http://xmlns.com/foaf/0.1/givenName> "Paul" .
<http://dbpedia.org/resource/Paul_Otlet> <http://xmlns.com/foaf/0.1/surname> "Otlet" .
<http://dbpedia.org/resource/Paul_Otlet> <http://www.w3.org/2002/07/owl#sameAs> "http://viaf.org/viaf/54277595" .
<http://dbpedia.org/resource/Jon_Postel> <http://xmlns.com/foaf/0.1/givenName> "Jon" .
<http://dbpedia.org/resource/Jon_Postel> <http://xmlns.com/foaf/0.1/surname> "Postel" .
<http://www.wikidata.org/entity/Q80> <http://www.wikidata.org/prop/direct/P69> <http://www.wikidata.org/entity/Q73094> .
<http://www.wikidata.org/entity/Q80> <http://www.wikidata.org/prop/direct/P69> <http://www.wikidata.org/entity/Q5369138> .
<http://www.wikidata.org/entity/Q80> <http://www.wikidata.org/prop/P69> <http://www.wikidata.org/entity/statement/q80-166977F2-448C-48BB-B4B1-C14A3714DE82> .
<http://www.wikidata.org/entity/Q80> <http://www.wikidata.org/prop/P69> <http://www.wikidata.org/entity/statement/Q80-E892C384-4E42-4E6E-9A3D-6E6FE3201954> .
<http://www.wikidata.org/entity/Q80> <http://schema.org/description> "Web developer" .
<http://www.wikidata.org/entity/Q80> <http://www.wikidata.org/prop/direct/P937> <http://www.wikidata.org/entity/Q42944> .
<http://www.wikidata.org/entity/Q73094> <http://www.w3.org/2000/01/rdf-schema#label> "The Queen's College" .
<http://www.wikidata.org/entity/statement/q80-166977F2-448C-48BB-B4B1-C14A3714DE82> <http://www.wikidata.org/prop/statement/P69> <http://www.wikidata.org/entity/Q73094> .
<http://www.wikidata.org/entity/Q5369138> <http://www.w3.org/2000/01/rdf-schema#label> "Emanuel School" .
<http://www.wikidata.org/entity/statement/Q80-E892C384-4E42-4E6E-9A3D-6E6FE3201954> <http://www.wikidata.org/prop/statement/P69> <http://www.wikidata.org/entity/Q5369138> .
<http://www.wikidata.org/entity/Q42944> <http://www.w3.org/2000/01/rdf-schema#label> "European Organization for Nuclear Research" .
Info card for Tim Berners-Lee

SPARQL JSON Response...

{
    "head":{
        "link":[],
        "vars":["abstract","foundedDate","location","thumbnail","depiction"]
    },
    "results":{
        "distinct":false,"ordered":true,
        "bindings":[
            {
                "abstract":{
                    "type":"literal",
                    "xml:lang":"en",
                    "value":"Gertrude Stein (February 3, 1874 \u2013 July 27, 1946) was an American writer of novels, poetry and plays. Born in the Allegheny West neighborhood of Pittsburgh, Pennsylvania, and raised in Oakland, California, Stein moved to Paris in 1903, making France her home for the remainder of her life. A literary innovator and pioneer of Modernist literature, Stein\u2019s work broke with the narrative, linear, and temporal conventions of the 19th-century. She was also known as a collector of Modernist art.In 1933, Stein published a kind of memoir of her Paris years, The Autobiography of Alice B. Toklas, written in the voice of Toklas, her life partner. The book became a literary bestseller and vaulted Stein from the relative obscurity of cult literary figure into the light of mainstream attention."
                },
                "thumbnail":{
                    "type":"uri",
                    "value":"http://commons.wikimedia.org/wiki/Special:FilePath/Gertrude_Stein_1935-01-04.jpg?width=300"
                },
                "depiction":{
                    "type":"uri",
                    "value":"http://commons.wikimedia.org/wiki/Special:FilePath/Gertrude_Stein_1935-01-04.jpg"
                }
            }
        ]
    }
}

Gets Translated Into N-triples for Micrograph

<http://dbpedia.org/resource/Gertrude_Stein> <http://dbpedia.org/ontology/abstract> "Gertrude Stein (February 3, 1874 \u2013 July 27, 1946) was an American writer of novels, poetry and plays. Born in the Allegheny West neighborhood of Pittsburgh, Pennsylvania, and raised in Oakland, California, Stein moved to Paris in 1903, making France her home for the remainder of her life. A literary innovator and pioneer of Modernist literature, Stein\u2019s work broke with the narrative, linear, and temporal conventions of the 19th-century. She was also known as a collector of Modernist art.In 1933, Stein published a kind of memoir of her Paris years, The Autobiography of Alice B. Toklas, written in the voice of Toklas, her life partner. The book became a literary bestseller and vaulted Stein from the relative obscurity of cult literary figure into the light of mainstream attention." .
<http://dbpedia.org/resource/Gertrude_Stein> <http://dbpedia.org/ontology/thumbnail> "http://commons.wikimedia.org/wiki/Special:FilePath/Gertrude_Stein_1935-01-04.jpg?width=300" .
(Which is a little awkward)
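
A rough sketch of this translation step follows, using only Ruby's standard library. It is not BibCard's code: the `to_ntriples` and `object_term` helpers are hypothetical, and the variable-to-predicate mapping covers only the two predicates visible in the N-Triples above. Unlike the output shown, this sketch writes bindings of type "uri" as URI terms rather than quoted literals, which is one possible way to reduce the awkwardness.

```ruby
require "json"

# Maps SPARQL result variables to predicates (the two shown in the slide's output).
PREDICATES = {
  "abstract"  => "http://dbpedia.org/ontology/abstract",
  "thumbnail" => "http://dbpedia.org/ontology/thumbnail"
}.freeze

# Characters that must be escaped inside an N-Triples string literal.
ESCAPES = { "\\" => "\\\\", "\"" => "\\\"", "\n" => "\\n", "\r" => "\\r" }.freeze

# Serializes one SPARQL JSON binding as an N-Triples object term.
def object_term(binding)
  return "<#{binding['value']}>" if binding["type"] == "uri"
  "\"#{binding['value'].gsub(/[\\"\n\r]/, ESCAPES)}\""
end

# Returns one N-Triples line per mapped variable in each result row.
def to_ntriples(subject, sparql_json)
  JSON.parse(sparql_json).dig("results", "bindings").flat_map do |row|
    row.select { |var, _| PREDICATES.key?(var) }.map do |var, binding|
      "<#{subject}> <#{PREDICATES[var]}> #{object_term(binding)} ."
    end
  end
end

sample = '{"results":{"bindings":[{"thumbnail":{"type":"uri",' \
         '"value":"http://commons.wikimedia.org/wiki/Special:FilePath/Gertrude_Stein_1935-01-04.jpg?width=300"}}]}}'
puts to_ntriples("http://dbpedia.org/resource/Gertrude_Stein", sample)
```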

2) Instantiate Code Objects

Instantiating a BibCard::Person

Pseudo Code: instantiate a BibCard::Person within a Rails app & using Rails' caching

Code Flow: Instantiating Person Object

  1. Get micrograph for URI
    use web app framework's cache
  2. Load a Spira repository from the raw data
  3. Using the repository graph, find VIAF URI
  4. Instantiate BibCard::Person with VIAF URI
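
The four steps above can be sketched in plain Ruby as follows. Everything here is a stand-in: `TinyCache` imitates a web framework cache with expiry, `Person` imitates BibCard::Person, the `fetcher` lambda takes the place of the micrograph fetch, and a simple line scan replaces loading a Spira repository. None of it is BibCard's actual implementation.

```ruby
# Hypothetical stand-in for a web framework's cache with expiry.
class TinyCache
  def initialize
    @store = {}
  end

  # Returns the cached value for key, or runs the block and stores its result.
  def fetch(key, expires_in:, now: Time.now)
    entry = @store[key]
    return entry[:value] if entry && entry[:expires_at] > now
    value = yield
    @store[key] = { value: value, expires_at: now + expires_in }
    value
  end
end

# Hypothetical stand-in for BibCard::Person: holds only the VIAF URI.
Person = Struct.new(:viaf_uri)

def person_for(lcnaf_uri, cache:, fetcher:)
  # Step 1: get the micrograph (N-Triples text) through the cache.
  ntriples = cache.fetch("micrograph:#{lcnaf_uri}", expires_in: 3600) { fetcher.call(lcnaf_uri) }
  # Steps 2-3: find the subject that is schema:sameAs the LCNAF URI.
  line = ntriples.lines.find { |l| l.include?("<http://schema.org/sameAs> <#{lcnaf_uri}>") }
  # Step 4: instantiate the person with the VIAF URI (nil if none was found).
  line && Person.new(line[/\A<([^>]+)>/, 1])
end

sample = %(<http://viaf.org/viaf/15873> <http://schema.org/sameAs> <http://id.loc.gov/authorities/names/n78086005> .\n)
calls = 0
fetcher = ->(_uri) { calls += 1; sample }
cache = TinyCache.new
2.times { person_for("http://id.loc.gov/authorities/names/n78086005", cache: cache, fetcher: fetcher) }
puts calls # the second lookup is served from the cache
```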

Ruby Spira

"The name is from Latin, for 'breath of life'--it's time to give those resource URIs some character." - Spira: A Linked Data ORM for Ruby

"Spira is a framework for using the information in RDF.rb repositories as model objects."

- github.com/ruby-rdf/spira

Using Spira Objects in View Code

Web application view code for rendering HTML for a person's alma maters

Spira Model Objects

Intended to look similar to Rails ActiveRecord associations

BibCard: Here There Be Dragons...

  • Data Hacks
  • Awkward Code
  • Unreliable Endpoints

URI Hack #1: Dealing with Linked Data Dead Ends

This truly makes me squirm :(
Offensive code: modifying URI

Reason

  • VIAF entity with a link to Getty data is a schema:Person
  • VIAF entity therefore links to Getty's corresponding entity of rdf:type schema:Person
  • Useful Getty data is attached to entity of rdf:type gvp:PersonConcept
  • Getty data set does not appear to link its own PersonConcept to its own data for the schema:Person

URI Hack #2: Finding URIs by String Match

Code example: finding URI by string matching

Request Timeouts

Code example: handling request timeouts
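
A minimal sketch of the general idea, using Ruby's standard `timeout` library: wrap the slow call, and fall back to "no card" when it takes too long. The `with_timeout` wrapper and the simulated slow call are assumptions for illustration, not the code from the slide.

```ruby
require "timeout"

# Hypothetical wrapper: runs the block, returning `fallback` if it exceeds the limit.
def with_timeout(seconds, fallback: nil)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  fallback
end

# Simulate an endpoint that responds too slowly.
card_data = with_timeout(0.05) { sleep 0.5; "micrograph" }
puts card_data.nil? ? "skip the info card" : "render the info card"
```

Net::HTTP's own open and read timeout errors inherit from Timeout::Error, so the same rescue clause would also cover a real HTTP call made inside the block.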

Request Timeout Consequences

Functionality can only be a progressive enhancement
(Linked Data info cards are additive, not core data)

  • Caching should be neither too short, nor too long
  • Content added to the page via asynchronous requests

Closing Thoughts:
How come this is even possible?

  • Authority file data was well positioned to be Linked Data
  • Major library systems players are cooperating
    Vendors, Maintainers of National and/or Union Catalogs
  • Well established, flexible technology patterns
    caching, async web apps, object models
  • Having a major data hub in VIAF

Image Credits

Image; Creator; License
  • And they're off!; Jun; Attribution, Share Alike
  • Man With Mustache Seeks Love; Matt Niemi; Attribution, Non-Commercial, No Derivatives
  • Red Flag; joe christiansen; Attribution
  • Breath of Life; KellyB.; Attribution
  • here there be...; Chris Blakeley; Attribution, Non-Commercial, No Derivatives
  • Dead End, DeKalb, IL; Tadson Bussey; Attribution, No Derivatives
  • an open book; kate hiscock; Attribution