Classical scholars traditionally identify passages of text using canonical citation schemes. We can derive from this practice a general model of texts as an ordered hierarchy of citation objects (the “OHCO2 model”), and can cite passages in the OHCO2 model using Canonical Text Services (CTS) URNs. CTS URNs express the relation the OHCO2 model draws between a citation hierarchy (such as “book/chapter/section”) and a complementary work hierarchy (“notional work/specific edition or translation/individual exemplar”). Analyses, or readings, of a text are also unique, citable objects. Since every reading or analysis tokenizes a text, each citable analysis can be aligned with a CTS URN identifying the token it analyzes. I will introduce a general model of this alignment (OHCO2-realigned citable analyses, or ORCA), and briefly illustrate how the ORCA model reduces many higher-order problems in text analysis to familiar, general operations on lists, sets and directed graphs.
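The two hierarchies and the alignment of analyses with tokens can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation discussed in the talk: the names `CtsUrn`, `parse_cts_urn`, `contains` and `analyses_within` are hypothetical, the parser ignores passage ranges and subreferences, and the token-level URNs and the "reading-N" labels are invented placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CtsUrn:
    """A simplified CTS URN: one namespace plus two dot-separated hierarchies."""
    namespace: str
    work: Tuple[str, ...]      # text group / work / version / exemplar
    passage: Tuple[str, ...]   # e.g. book / line / token


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split 'urn:cts:NAMESPACE:WORK:PASSAGE' into its components.

    Assumption: ranges ('-') and subreferences ('@') are not handled.
    """
    parts = urn.split(":")
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work, passage = parts[2], parts[3], parts[4]
    return CtsUrn(
        namespace,
        tuple(work.split(".")),
        tuple(passage.split(".")) if passage else (),
    )


def contains(outer: CtsUrn, inner: CtsUrn) -> bool:
    """True if `outer` is a prefix of `inner` in both hierarchies."""
    return (
        outer.namespace == inner.namespace
        and inner.work[: len(outer.work)] == outer.work
        and inner.passage[: len(outer.passage)] == outer.passage
    )


def analyses_within(scope: str, records: List[Tuple[str, str]]) -> List[str]:
    """Select analyses whose token URN falls inside the cited scope."""
    outer = parse_cts_urn(scope)
    return [label for urn, label in records
            if contains(outer, parse_cts_urn(urn))]


# Hypothetical analysis records: (URN of the analyzed token, placeholder label).
TOKENS = "urn:cts:greekLit:tlg0012.tlg001.msA.tokens:"
records = [
    (TOKENS + "1.1.1", "reading-1"),
    (TOKENS + "1.1.2", "reading-2"),
    (TOKENS + "1.2.1", "reading-3"),
]

book_one = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1")
line_one = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1")

print(contains(book_one, line_one))
print(analyses_within("urn:cts:greekLit:tlg0012.tlg001.msA:1.1", records))
```

In this sketch, asking which analyses belong to a passage is a prefix test on two tuples followed by a list filter, which is the kind of general list and set operation the abstract refers to.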
Back to Workshop IV: Mathematical Analysis of Cultural Expressive Forms: Text Data