URIs Referencing Text Sections

A Proposal for Writing Fragids

Version: 2021-10-26

Some Preliminary Background

RDF

  • Resource Description Framework
  • Method of making machine-actionable assertions that are semantically interoperable
  • (Better: Assertion Framework)
  • Allows for cross-project inferences

RDF Data Model

  • Graph structure
  • Every datum is a triple: subject, predicate (!), object
  • Subject: URI
  • Predicate: URI
  • Object: URI or literal (data)
  • w3.org/TR/rdf11-primer
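
The data model above can be sketched in a few lines of Python. This is only an illustration, not an RDF library: `Literal`, `Triple`, `is_uri`, and `make_triple` are hypothetical names, and the URI check is deliberately rough (it only looks for a scheme followed by a colon).

```python
import re
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain data value, allowed only in object position (hypothetical helper)."""
    value: str


class Triple(NamedTuple):
    subject: str               # must be a URI
    predicate: str             # must be a URI
    obj: Union[str, Literal]   # a URI or a literal


def is_uri(value: object) -> bool:
    """Rough check: a string that starts with a scheme such as 'http:' or 'urn:'."""
    return isinstance(value, str) and re.match(r"[A-Za-z][A-Za-z0-9+.\-]*:", value) is not None


def make_triple(subject, predicate, obj) -> Triple:
    """Build a triple, enforcing which positions may hold a literal."""
    if not is_uri(subject):
        raise ValueError("subject must be a URI")
    if not is_uri(predicate):
        raise ValueError("predicate must be a URI")
    if not (is_uri(obj) or isinstance(obj, Literal)):
        raise ValueError("object must be a URI or a literal")
    return Triple(subject, predicate, obj)


# A graph is simply a set of triples.
graph = {
    make_triple(
        "http://dbpedia.org/resource/Henry_V_(play)",
        "http://purl.org/dc/elements/1.1/creator",
        "http://dbpedia.org/resource/William_Shakespeare",
    ),
    make_triple(
        "http://dbpedia.org/resource/Henry_V_(play)",
        "http://purl.org/dc/elements/1.1/title",
        Literal("Henry V"),
    ),
}
```

A literal in subject position is rejected, which is the one asymmetry of the model.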

IRI / URI

  • Internationalized / Uniform Resource Identifier
  • RFC 3987 (IRI), RFC 3986 (URI)
  • Includes URNs, URLs
  • scheme ":" hier-part [ "?" query ] [ "#" fragment ]

URI Examples

  • foo://example.com:8042/over/there?name=ferret#nose
  • ftp://ftp.is.co.za/rfc/rfc1808.txt
  • http://www.ietf.org/rfc/rfc2396.txt
  • ldap://[2001:db8::7]/c=GB?objectClass?one
  • mailto:John.Doe@example.com
  • news:comp.infosystems.www.servers.unix
  • tel:+1-816-555-1212
  • telnet://192.0.2.16:80/
  • urn:oasis:names:specification:docbook:dtd:xml:4.1.2
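
The generic syntax can be checked against these examples with the regular expression given in RFC 3986, Appendix B. The sketch below wraps it in a small function; `split_uri` and the dictionary keys are names chosen here for illustration.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_PARTS = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    """Split a URI reference into its five generic components.

    Components that are absent come back as None; the path is always a string.
    """
    m = URI_PARTS.match(uri)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_uri("foo://example.com:8042/over/there?name=ferret#nose")
# parts["scheme"] is "foo" and parts["fragment"] is "nose"

urn = split_uri("urn:oasis:names:specification:docbook:dtd:xml:4.1.2")
# no authority, query, or fragment: everything after "urn:" is the path
```

The fragment is the only component that the rest of this proposal writes into.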

URI requirements

  • Persistence
  • Uniqueness
  • Interoperability

Texts

We refer to texts
either abstractly ("The Iliad says...")
or concretely ("In this manuscript the Iliad says...")

Conceptual: work (version)
  • The Iliad (Alexander's translation)
  • Frankenstein (Mel Brooks's Young Frankenstein)
Material: text-bearing object (= scriptum)
  • this particular manuscript (one-item set)
  • this book (many-item set)
  • this digital file (many-item set)

The task

Write RDF triples about texts

Requirements

  • Scope: non-digital texts
  • Support assertions about a text qua work or scriptum
  • Precision: up to tokens, characters
  • Support discontinuous phrases
  • Independent of any data source (the text itself may be absent)
  • Media-agnostic (text, audio, video)

What we can do now

Shakespeare wrote Henry V.
@prefix db: <http://dbpedia.org/resource/> .
@prefix dce: <http://purl.org/dc/elements/1.1/> .
db:Henry_V_\(play\) dce:creator db:William_Shakespeare .
(Turtle syntax)
Plato's Republic quotes from Homer's Iliad.
@prefix db: <http://dbpedia.org/resource/> .
@prefix cito: <http://purl.org/spar/cito/> .
db:Republic_\(Plato\) cito:cites db:Iliad .
Thanks a lot.
(Where exactly?)

Desideratum: specificity

For centuries we have created and consumed
specific, human-readable references
that are persistent, unique, and interoperable.
How to convert this convention to URIs?

Studies in Text Reuse

Plato Resp. 328e6 quotes Il. 6.211
"the threshold of old age"
@prefix db: <http://dbpedia.org/resource/> .
@prefix cito: <http://purl.org/spar/cito/> .
db:Republic_\(Plato\) cito:cites db:Iliad .

Linguistics

In Shakespeare's Henry VI, part 2, 1.4.32, 'Henry' is at once both grammatical object and grammatical subject.
"The duke yet lives that Henry shall depose"
@prefix db: <http://dbpedia.org/resource/> .
@prefix la: <http://example.org/linguistic-annotation/> . # hypothetical vocabulary (example.org)
@prefix olia: <http://purl.org/olia/olia.owl#> .
db:Henry_VI\,_Part_2 la:hasFeature olia:Subject .
db:Henry_VI\,_Part_2 la:hasFeature olia:DirectObject .

Translation Studies

At Psalm 31.2 (31.1; LXX 30.2) הַטֵּה is translated κλῖνον by the Septuagint
"incline"
@prefix saws: <http://purl.org/saws/ontology#> .
?????? saws:isDirectTranslationOf ??????
(How do you even get started?)

Challenges

  • URI for a specific portion of text?
  • Texts with multiple reference systems
  • Texts with ambiguous reference systems
  • How to conceptualize/approach works?

Previous approaches

General

Ad hoc

Problems

  • urn:cts: — misleading; not registered with IANA
  • (In praise of tag URIs [RFC 4151].)
  • Namespaces, subnamespaces administered how?
  • URIs exposed to collision (uniqueness in jeopardy)
  • How to find URIs?
  • Domain model incomplete, underdeveloped (FRBR has problems, and is applied inconsistently)
  • Many texts have multiple "canonical" reference systems (uniqueness in jeopardy)
  • Do URIs persist?
  • Dependent upon specific digital corpora (interoperability in jeopardy)

Step back

Rethink the problem

  • Don't invent a new URI system
  • Co-opt URIs already in use
  • Approach the problem from the perspective of the URI writer, not the data provider/curator
  • Define a domain model of texts (go beyond FRBR)
  • Define a domain model of reference systems
  • Define a rigorous syntax
  • Start with tractable examples

A proposal

Writing Fragids

Fragid

The Basic Idea

  1. Start with a base URI for a work or scriptum (then specify which you mean).
  2. Declare a reference system.
  3. Use the reference system to navigate to smaller units.
  4. Perhaps point to individual tokens/characters.
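
The four steps can be sketched as a small builder function. The output syntax is inferred solely from the draft examples in this deck (version `wf0`, parameters `a`, `t`, `r`, units joined by `:`, an optional token after `::`); `build_wf` and its parameter names are hypothetical, and only the value `w` for `a` is attested in those examples, so `a` is passed through unchecked.

```python
from typing import Iterable, Optional


def build_wf(base_uri: str, a: str, t: str, r: str,
             units: Iterable, token: Optional[str] = None) -> str:
    """Assemble a draft Writing Fragid URI.

    base_uri -- step 1: URI of the work or scriptum
    a        -- step 1: how to treat the base URI ("w" = work in the draft examples)
    t, r     -- step 2: reference system type ("m" material, "l" logical)
                and reference scriptum ("." = the base URI itself)
    units    -- step 3: successively smaller units, e.g. [14, 2]
    token    -- step 4: optional token pointer, e.g. "Henry[1]"
    """
    if t not in ("m", "l"):
        raise ValueError("t must be 'm' (material) or 'l' (logical)")
    path = ":".join(str(u) for u in units)
    if token is not None:
        path += "::" + token
    return f"{base_uri}#$wf0:a={a};t={t};r={r};{path}$"


burkert = build_wf("http://www.worldcat.org/oclc/860129739", "w", "m", ".", [14, 2])
# burkert == "http://www.worldcat.org/oclc/860129739#$wf0:a=w;t=m;r=.;14:2$"
```

The function performs no escaping, so whether characters such as `$`, `;`, or `[` are safe inside a fragment remains an open question.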

Draft Examples

Walter Burkert, Lore and Science in Ancient Pythagoreanism, page 14 line 2
http://www.worldcat.org/oclc/860129739#$wf0:a=w;t=m;r=.;14:2$
  1. $wf0: — version number
  2. a=w; — treat the base URI as a work (not a scriptum)
  3. t=m; — ref. system type is material (not logical)
  4. r=.; — the reference scriptum is the base URI, qua scriptum
  5. 14:2 — material unit 14, subunit 2
Plato, Resp. 328e6
http://dbpedia.org/resource/Republic_(Plato)#$wf0:a=w;t=m;r=http://www.worldcat.org/oclc/1688842;2:328:5:6$
  1. $wf0: — version number
  2. a=w; — treat the base URI as a work (not a scriptum)
  3. t=m; — ref. system type is material (not logical)
  4. r=http://www.worldcat.org/oclc/1688842; — the reference scriptum is Stephanus's edition
  5. 2:328:5:6 — material units 2 (volume), 328 (page), 5 (subpage), 6 (line)
Problems: the volume number is unexpected, since the conventional citation omits it; Stephanus's material reference system can be applied only imprecisely to any other version
Shakespeare, Henry VI, part 2, 1.4.32, 'Henry'
http://dbpedia.org/resource/Henry_VI,_Part_2#$wf0:a=w;t=l;r=http://www.worldcat.org/oclc/935285574;1:4:32::Henry[1]$
  1. $wf0: — version number
  2. a=w; — treat the base URI as a work (not a scriptum)
  3. t=l; — ref. system type is logical (not material)
  4. r=http://www.worldcat.org/oclc/935285574; — the reference scriptum is Warren's edition
  5. 1:4:32 — logical units 1 (act), 4 (scene), 32 (line)
  6. ::Henry[1] — first instance of 'Henry'
Problem: can't be applied to Japanese, Chinese, etc. versions; URI needs a mechanism to specify restriction only to English versions of the work.
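
To show how the draft examples decompose, here is a minimal parsing sketch. It assumes only what the three examples exhibit: a fragment wrapped in `$...$`, a `wf` version prefix, `;`-separated `key=value` parameters, a final `:`-separated unit path, and an optional token after `::`. The names `WritingFragid` and `parse_wf` are hypothetical, and a reference scriptum URI that itself contains `;` would defeat this simple split.

```python
import re
from typing import NamedTuple, Optional


class WritingFragid(NamedTuple):
    base_uri: str
    version: str
    params: dict           # e.g. {"a": "w", "t": "m", "r": "."}
    units: list            # e.g. ["2", "328", "5", "6"]
    token: Optional[str]   # e.g. "Henry[1]", or None


WF_PATTERN = re.compile(r"^\$wf(\d+):(.*)\$$")


def parse_wf(uri: str) -> WritingFragid:
    """Split a draft Writing Fragid URI into its parts."""
    base_uri, sep, fragment = uri.partition("#")
    m = WF_PATTERN.match(fragment)
    if not sep or m is None:
        raise ValueError("no Writing Fragid found")
    version, body = m.group(1), m.group(2)
    # Everything before the last ';' is a parameter; the rest is the unit path.
    *pairs, path = body.split(";")
    params = dict(pair.split("=", 1) for pair in pairs)
    units_part, _, token = path.partition("::")
    return WritingFragid(base_uri, version, params, units_part.split(":"), token or None)


plato = parse_wf(
    "http://dbpedia.org/resource/Republic_(Plato)"
    "#$wf0:a=w;t=m;r=http://www.worldcat.org/oclc/1688842;2:328:5:6$"
)
# plato.params["r"] is the Stephanus edition; plato.units is ["2", "328", "5", "6"]
```

Splitting the parameters from the unit path on the last `;` works because the colons inside the reference scriptum URI never reach the unit splitter.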

WF Design

(In ten minutes? You must be joking.)

Domain model

  • Texts: works or scripta
  • Works
    • work-versions
    • conceptual
    • more like Bibframe than FRBR
    • can nest, overlap
  • Scripta
    • class of material objects
    • may be constrained
  • Scriptum readers
  • Text divisions: material or logical
Everything above needs thoughtful, in-depth discussion

Restrictions

  • Only scriptum readers should attempt to write WFs
  • Only uniquely enumerable divisions
  • Only writings with one material reference system and one logical reference system
  • Prime candidates: modern articles, books
Everything above needs thoughtful, in-depth discussion

Challenges

Writing Fragid challenges

  • Too confusing, theorized?
  • Epistemology: scriptum reader?
  • Things with too many URIs (scripta)
  • Things without any URIs (works, work-versions)
  • Restricting a work to a work-version
  • URI syntax conflicts?
  • How to handle end/footnotes?
  • Tokens based on spaces only?
  • How to handle errors

Viability

Will it work?

Breakthrough technology?
Utter failure?
Somewhere in-between?
Failure can be success.

Media

Writing Fragids Should be Media-Agnostic

  • TEI
  • HTML
  • Audio (e.g., audiobooks)
  • Video
  • Nothing
Each media format provider must write a specification stating the requirements for WF compliance.

TEI Akita

  • Experiment in WF-compliant TEI
  • Works on any flavor of TEI
  • No long URIs in the text structure: dependence exclusively upon @n
  • Stand-off Schematron validation + SQF
  • WF declarations made via processing instructions
(Oh, the name? Well...WF → "woof" → dog → early dog breed → akita)
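
As a rough illustration of what dependence upon @n could look like, the sketch below walks a TEI tree by matching each unit of a reference path against @n values at successively deeper levels. It is not the TEI Akita implementation: `find_by_n` and the sample markup are assumptions made for this example, and the real project relies on Schematron validation and processing instructions that are not modeled here.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def find_by_n(root: ET.Element, units: Iterable[str]) -> Optional[ET.Element]:
    """Follow a path of @n values downward; return the final element or None."""
    current = root
    for unit in units:
        # First descendant (document order) whose @n equals the wanted unit.
        match = next(
            (el for el in current.iter() if el is not current and el.get("n") == unit),
            None,
        )
        if match is None:
            return None
        current = match
    return current


# Hypothetical sample; element choice and nesting are assumptions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="act" n="1"><div type="scene" n="4">
<l n="32">The duke yet lives that Henry shall depose</l>
</div></div>
</body></text></TEI>"""

line = find_by_n(ET.fromstring(SAMPLE), ["1", "4", "32"])
# line.text holds the verse that the units 1:4:32 point to
```

Because only @n is consulted, the lookup does not care which TEI elements carry the numbering, which is in the spirit of working on any flavor of TEI.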

Status

WF Status

Writing Fragids

Join the conversation

Joel Kalvesmaki, kalvesmaki@gmail.com