Last modified April 9

The authoritative Open Library schema -- a specification of the database fields used to represent items like books and authors -- is a Python expression in the source repository, here.

A more readable version may be generated by executing that file; here it is as of 2007-08-30. (Asterisks indicate multi-valued fields. The types "string", "text", "url" and "date" are all currently represented in ThingDB as strings, but could be displayed or edited in different ways.)
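The type notation can be read mechanically. The following is a minimal sketch, not code from the Open Library repository: the function name `parse_field_type` and the `STORED_AS_STRING` set are hypothetical, introduced only to illustrate the asterisk convention and the string-backed types named above.

```python
# Hypothetical helper; not part of the Open Library source.
# Types that the text says are currently stored as plain strings.
STORED_AS_STRING = {"string", "text", "url", "date"}


def parse_field_type(spec):
    """Split a type such as "string*" into (base_type, multi_valued, storage)."""
    multi_valued = spec.endswith("*")
    base_type = spec.rstrip("*")
    # Types outside the string-backed set (e.g. "int") keep their own name.
    storage = "string" if base_type in STORED_AS_STRING else base_type
    return base_type, multi_valued, storage


print(parse_field_type("string*"))  # ('string', True, 'string')
print(parse_field_type("date"))     # ('date', False, 'string')
print(parse_field_type("int"))      # ('int', False, 'int')
```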

edition

| Field | Type | MARC Fields | Example (Description) |
| --- | --- | --- | --- |
| source_record_loc | string* | | "marc_records_scriblio_net/part01.dat:29834:543" (a locator for the source record data) |
| source_record_id | string* | | "LC:DLC:00000006" (a record identifier that is globally unique and that also can be constructed consistently from the contents of a record and an identifier for its source catalog) |
| author_identifier | string* | 100:abcd, 110:ab, 710:ab, 111:acdn, 711:acdn | "Twain, Mark, 1835-1910" (unique author id in some catalog) |
| contributions | string* | 700:abcde | "Illustrated by: Steve Bjorkman" |
| title | string | 245:ab clean_name | "The adventures of Tom Sawyer" |
| by_statement | string* | 245:c | "Herman Melville ; [illustrated by Barry Moser]" |
| sort_title | string | | "adventures of Tom Sawyer" |
| other_titles | string* | 246:a, 730:a-z, 740:apn | "Mark Twain's The Adventures of Tom Sawyer" |
| work_title | string | 240:amnpr, 130:a-z | (The 240 "work title" is used in the OCLC FRBR algorithm. The 130 is also used, and there should be either a 130 or a 240 in a record, but not both. It would be ideal if we could pick up either for the work title.) |
| edition | string | 250:ab | "2nd. edition" (information about this edition) |
| publisher | string | 260:b clean_name | "W. W. Norton & Co." |
| publish_place | string* | 260:a clean | "New York" |
| publish_date | date | 008:7-10 | "2006" |
| pagination | string | 300:a | "viii, 383 p. :" (full pagination information) |
| number_of_pages | int | 300:a biggest_decimal | 383 (largest decimal found) |
| subjects | string* | 600:abcd--x--v--y--z, 610:ab--x--v--y--z, 650:a--x--v--y--z, 651:a--x--v--y--z | "Runaway children -- Fiction" |
| subject_place | string* | 651:a*, 650:z* | "Venice (Italy)" |
| subject_time | string* | 600:y*, 650:y* | "20th century" |
| genre | string* | 600:v*, 650:v*, 651:v* | "Biography" |
| series | string* | 440:av, 490:av, 830:av | "Oxford world's classics" |
| language | string | 008:35-37 "ISO" tag | "ISO: tel" (coded or human-readable description of the text's language) |
| physical_format | string* | 245:h | |
| notes | string* | 5XX!505!520:a-z | |
| description | text | 520:a | |
| exerpts | text* | | |
| table_of_contents | text* | 505:art | |
| cover_image | url | | |
| scan_contributor | string | | |
| scan_sponsor | string | | |
| dewey_number | string* | 082:a | "914.3" |
| LC_classification | string | 050:ab | "BJ1533.C4 L49" |
| ISBN | string* | 020:a normalize_isbn, 024:a normalize_isbn | "9780393926033" (13-digit ISBN) |
| UCC_13 | string | | |
| UPC | string | | |
| ISMN | string | | |
| DOI | string | | |
| LCCN | string | 010:a normalize_lccn | "2006285320" |
| GTIN_14 | string | | |
| oca_identifier | string | | "albertgallatinja00stevrich" |

author

| Field | Type | MARC Fields | Example (Description) |
| --- | --- | --- | --- |
| identifier | string* | | "Twain, Mark, 1835-1910" (unique id in some catalog) |
| name | string | | "Mark Twain" (human-readable name) |
| birth_date | date | | "1835" |
| death_date | date | | "1910" |
| bio | text | | |

Please see Karen Coyle's earlier [notes](http://www.kcoyle.net/temp/PharosSchema+kc.html) on the schema, and also the tables and notes below, all of which inspired the working schema.

Possible new schema
-------------------

EDITION

| name | type | example/description |
| --- | --- | --- |
| source_name | STRING | |
| source_record_pos | INT | |
| work | ID-REF | |
| authors | ID-REFs | Tolkien, J. R. R. |
| contributors | STRINGs | "Illustrated by: Steve Bjorkman" |
| agencies/organizations | STRINGs | American Civil Liberties Union. Berkeley Chapter |
| title | STRING | The adventures of Tom Sawyer |
| "by" statement | STRINGs | Herman Melville ; [illustrated by Barry Moser] |
| sort title | INT | adventures of Tom Sawyer |
| other titles | STRINGs | Mark Twain's The Adventures of Tom Sawyer |
| edition | STRING | 2nd. edition |
| publisher | STRING | W. W. Norton & Co., |
| publish_place | STRING | New York : |
| publish_date | DATE | c2007. |
| number_of_pages | STRING | viii, 383 p. : |
| subjects | STRINGs | Runaway children -- Fiction |
| series | STRINGs | Oxford world's classics |
| notes | STRINGs | |
| BISAC_subject_categories | STRINGs | see definitions here |
| language_code | STRING | code from ISO 639-2/B; e.g., "tel" |
| language | STRING | human-readable description of the text's language, e.g., "Telugu" |
| physical_format | STRING | |
| description | HTML | |
| table of contents | STRINGs | |
| Dewey number | STRINGs | 914.3 |
| LC Classification | STRING | BJ1533.C4 L49 |
| cover_image | URL | |
| scan_contributor | STRING | |
| scan_sponsor | STRING | |
| ISBN_10 | STRING | 0393926036 |
| ISBN_13 | STRING | 9780393926033 |
| UCC_13 | STRING | |
| UPC | STRING | |
| ISMN | STRING | |
| DOI | STRING | |
| LCCN | STRING | |
| GTIN_14 | STRING | |
| oca_identifier | STRING | "albertgallatinja00stevrich" |

New EDITION with MARC and ONIX fields

| name | type | field | subfield/code | example/description |
| --- | --- | --- | --- | --- |
| source_name | STRING | | | |
| source_record_pos | INT | | | |
| work | ID-REF | | | |
| authors | ID-REFs | | | Tolkien, J. R. R. |
| authors (MARC) | | 100 | abcd | $a Tolkien, J. R. R. |
| authors (ONIX) | | Contributor | "A01" | |
| contributors | STRINGs | | | Morcock, Michael, 1939- , ed. |
| contributors (MARC) | | 700 w/o $t | abcde | $a Tolkien, J. R. R. |
| contributors (ONIX) | | Contributor | not "A01" | |
| agencies/organizations | STRINGs | | | American Civil Liberties Union. Berkeley Chapter |
| agencies/Organizations (MARC) | | 110, 710 | ab | $a American Civil Liberties Union. $b Berkeley chapter |
| agencies/Organizations (MARC) | | 111, 711 | acdn | $a IEEE Pacific Rim Conference on Multimedia $n (5th : $d 2004 : $c Tokyo, Japan) |
| agencies/Organizations (ONIX) | | CorporateName | | |
| title | STRING | | | The adventures of Tom Sawyer |
| title (MARC) | | 245 | ab | $a Moby Dick : $b or, The whale |
| "by" statement | STRINGs | | | Herman Melville ; [illustrated by Barry Moser] |
| "by" statement (MARC) | | 245 | c | $c Herman Melville ; [illustrated by Barry Moser] |
| "by" statement (ONIX) | | contributor ContributorRole | | |
| sort title | INT | | | adventures of Tom Sawyer |
| other titles | STRINGs | | | Mark Twain's The Adventures of Tom Sawyer |
| other titles (MARC) | | 246 | a | $a Beowolf |
| other titles (MARC) | | 730 | a-z* | $a Bible $p N.T. $p Luke. $l Greek. $s Codex Sinaiticus |
| other titles (MARC) | | 740 | apn | |
| work title | STRING | | | The Pickwick papers. |
| work title (MARC) [1] | | 240 | amnpr | $a The Pickwick papers. |
| work title (MARC) | | 130 | a-z* | $a Bible $p N.T. $p Luke. $l Greek. $s Codex Sinaiticus |
| edition | STRING | | | 2nd. edition |
| edition (MARC) | | 250 | ab | $a 4th ed. / $b revised and corrected |
| publisher | STRING | | | W. W. Norton & Co., |
| publisher (MARC) | | 260 | b | $b Macmillan |
| publisher (ONIX) | | PublisherName | | |
| publish_place | STRING | | | New York : |
| publish_place (MARC) | | 260 | a | $a New York : |
| publish_date [7] | DATE | | | 2006 |
| publish_date (MARC) [2] | | 008 | pos 7-10 | 2006 |
| publish_date (ONIX) [2] | | PublicationDate | | 2006 |
| number_of_pages | STRING | | | viii, 383 p. : |
| number_of_pages (MARC) | | 300 | a | $a 149 p. ; |
| number_of_pages (ONIX) | | NumberOfPages | | |
| subjects | STRINGs | | | Runaway children -- Fiction |
| subjects (MARC) | | 600 | abcdxvyz | $a Thomas, Olive, $d 1898-1920 $x Death and burial |
| subjects (MARC) | | 610 | abxvyz | $a Partito democratico |
| subjects (MARC) | | 650 | axvyz | $a Heads of state $z Italy $z Venice $v Biography |
| subjects (MARC) | | 651 | axvyz | $a Venice (Italy) $x History $y 1508-1797 |
| series | STRINGs | | | Oxford world's classics |
| series (MARC) | | 440 | av | $a Lecture notes in computer science $v 4407 |
| series (MARC) | | 490 | av | $a Österreichische Film ; $v 22 |
| series (MARC) | | 830 | av | $a Studi musicologici $v 6 |
| series (ONIX) | | ImprintName | | |
| notes | STRINGs | | | |
| notes (MARC) [3] | | 5XX | a-z | $a Includes bibliographical references |
| BISAC_subject_categories | STRINGs | | | see definitions here |
| language_code [7] | STRING | | | code from ISO 639-2/B; e.g., "tel" |
| language_code (MARC) | | 008 | pos. 35-37 | eng |
| language_code (ONIX) | | LanguageOfText | | |
| language [7] | STRING | | | human-readable description of the text's language, e.g., "Telugu" |
| physical_format | STRING | | | |
| physical_format (ONIX) | | ProductForm Description | | |
| description | HTML | | | |
| description (MARC) | | 520 | a | $a Carew Raleigh was the only surviving son of Sir Walter Raleigh. This work ... |
| description (ONIX) | | MainDescription | | |
| table of contents | STRINGs | | | |
| table of contents | | 505 | art | $t River of names / $r Dorothy Allison -- |
| dewey number | STRINGs | | | 914.3 |
| dewey number | | 082 | a | $a 914.3 |
| LC Classification | STRING | | | BJ1533.C4 L49 |
| LC Classification (MARC) | | 050 | ab | $a BJ1533.C4 $b L49 |
| cover_image | URL | | | |
| scan_contributor | STRING | | | |
| scan_sponsor | STRING | | | |
| ISBN_10 | STRING | | | 0393926036 |
| ISBN_10 (MARC) [4] | | 020 | a | $a 0195144953 (alk. paper) |
| ISBN_13 | STRING | | | 9780393926033 |
| ISBN_13 (MARC) [5] | | 020 | a | |
| ISBN_13 (MARC) [5] | | 024 | a | 9780399153501 |
| UCC_13 | STRING | | | |
| UPC | STRING | | | |
| ISMN | STRING | | | |
| DOI | STRING | | | |
| LCCN | STRING | | | |
| LCCN (MARC) [6] | | 010 | a | $a 2006285320 |
| GTIN_14 | STRING | | | |
| oca_identifier | STRING | | | "albertgallatinja00stevrich" |
| place_facet [7] | STRINGs | | | Venice (Italy) |
| place_facet (MARC) | | 651 | a | |
| place_facet (MARC) | | 650 | Z | |
| Genre [7] | STRINGs | | | Biography |
| Genre (MARC) | | 600, 650, 651 | v | |
| Genre (ONIX) | | | | |
| Time_facet [7] | STRINGs | | | 20th century |
| time_facet (MARC) | | 600, 650 | Y | 20th century |

[1] The 240 "work title" is used in the OCLC FRBR algorithm. The 130 is also used, and there should be either a 130 or a 240 in a record, but not both. It would be ideal if we could pick up either for the work title.
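As an illustration of note [1], the sketch below picks up a work title from either field. It assumes a hypothetical in-memory record shape (a list of dicts with a `tag` and a list of `(code, value)` subfield pairs), which is not the format of any particular MARC library. Checking 240 before 130 is an arbitrary choice here, since a record should carry only one of them.

```python
import string

# Subfield codes per tag, as listed in the table: 240 uses amnpr, 130 uses a-z.
WORK_TITLE_SUBFIELDS = {
    "240": set("amnpr"),
    "130": set(string.ascii_lowercase),
}


def work_title(fields):
    """Return the work title from a 240 field or, failing that, a 130 field."""
    for tag in ("240", "130"):
        for field in fields:
            if field["tag"] != tag:
                continue
            parts = [
                value.strip()
                for code, value in field["subfields"]
                if code in WORK_TITLE_SUBFIELDS[tag]
            ]
            if parts:
                return " ".join(parts)
    return None  # neither field is present


record = [{"tag": "240", "subfields": [("a", "The Pickwick papers.")]}]
print(work_title(record))  # The Pickwick papers.
```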

[2] There are two sources in the MARC record for date of publication. The 260 $c may contain characters beyond the year ("c1997" or "1946 [reprinted 1965]"). Positions 07-10 of the 008 field have a normalized date ("1997" or "1946"). Dates in the form used by the 260 will not be found outside of library records, so the 008 date can be substituted for them. For ONIX, the publication date often has month and day as well as year. For merging and for faceting, only the year should be used.
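A minimal sketch of the year extraction described in note [2]. The function names are hypothetical, the 008 field is assumed to be available as a plain fixed-width string, and the ONIX date is assumed to start with a four-digit year (for example "20060315").

```python
import re


def year_from_008(field_008):
    """Return the four-digit year at positions 07-10 of an 008 field, or None."""
    year = field_008[7:11]  # MARC positions are zero-based, so 07-10 is [7:11]
    return year if re.fullmatch(r"\d{4}", year) else None


def year_from_onix(publication_date):
    """Return the leading four-digit year of an ONIX publication date, or None."""
    match = re.match(r"\d{4}", publication_date.strip())
    return match.group(0) if match else None


# Build a 40-character 008 value piece by piece so the positions are easy to check.
sample_008 = "060315" + "s" + "2006" + "    " + "nyu" + " " * 11 + "000 0 " + "eng" + " d"
print(year_from_008(sample_008))   # 2006
print(year_from_onix("20060315"))  # 2006
```

Partially known dates such as "19uu" fail the four-digit check and come back as `None`, leaving the caller to decide how to handle them.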

[3] MARC has a wide range of notes that appear in fields that begin with "5". All notes EXCEPT the 505 (table of contents) and 520 (summary) can be placed in a notes field. Notes fields can be repeatable.
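The selection rule in note [3] can be sketched as follows. The record shape (a list of dicts with a `tag` and `(code, value)` subfield pairs) and the function names are assumptions made for this example only.

```python
import string

LETTER_CODES = set(string.ascii_lowercase)  # subfields a-z, as in the table


def is_notes_tag(tag):
    """True for 5XX tags other than 505 (table of contents) and 520 (summary)."""
    return (
        len(tag) == 3
        and tag.isdigit()
        and tag.startswith("5")
        and tag not in ("505", "520")
    )


def collect_notes(fields):
    """Return one string per notes field; notes fields may repeat."""
    notes = []
    for field in fields:
        if is_notes_tag(field["tag"]):
            parts = [v.strip() for code, v in field["subfields"] if code in LETTER_CODES]
            if parts:
                notes.append(" ".join(parts))
    return notes


record = [
    {"tag": "504", "subfields": [("a", "Includes bibliographical references")]},
    {"tag": "505", "subfields": [("t", "River of names /"), ("r", "Dorothy Allison --")]},
]
print(collect_notes(record))  # ['Includes bibliographical references']
```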

[4] The ISBN field is not necessarily "clean" – there can be trailing data, as in "0195144953 (alk. paper)". Take only the 10- or 13-character token, which should appear first. The token is all numeric, except that the final character can be "X".
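A minimal sketch of the cleanup in note [4]. The name `clean_isbn` is hypothetical and is not claimed to match the `normalize_isbn` function named in the working schema. The sketch only isolates the leading token and does not verify the check digit.

```python
import re

# A leading run of 13 or 10 characters: digits, with an optional final "X".
# The lookahead rejects runs of any other length (for example 11 digits).
_ISBN_TOKEN = re.compile(r"(\d{12}[\dXx]|\d{9}[\dXx])(?![\dXx])")


def clean_isbn(subfield_a):
    """Return the leading 10- or 13-character ISBN token, or None."""
    match = _ISBN_TOKEN.match(subfield_a.strip())
    return match.group(1).upper() if match else None


print(clean_isbn("0195144953 (alk. paper)"))  # 0195144953
print(clean_isbn("9780393926033"))            # 9780393926033
print(clean_isbn("(alk. paper)"))             # None
```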

[5] There are two possible locations for the ISBN_13 in MARC records. Records from some sources, including LC, will have the ISBN-13 in an 020 field. Many records will have two 020 fields, one with the ISBN-10 and one with the ISBN-13. Records from sources other than LC may have the ISBN-13 in the 024 field. There can be other 13-digit EANs in the 024 field, so the ISBN is identified by a "3" in the first indicator position.
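The two locations described in note [5] could be checked as in the sketch below. The record shape (dicts with `tag`, `ind1` and `(code, value)` subfield pairs) and the function name are assumptions for illustration, and the 13-digit test is deliberately simple: it does not check the prefix or the check digit.

```python
import re


def find_isbn13(fields):
    """Collect 13-digit values from 020 $a and from 024 $a where ind1 is "3"."""
    found = []
    for field in fields:
        tag = field["tag"]
        in_020 = tag == "020"
        in_024_ean = tag == "024" and field.get("ind1") == "3"
        if not (in_020 or in_024_ean):
            continue
        for code, value in field["subfields"]:
            if code != "a":
                continue
            # Leading run of exactly 13 digits; trailing text is ignored.
            match = re.match(r"\d{13}(?!\d)", value.strip())
            if match and match.group(0) not in found:
                found.append(match.group(0))
    return found


record = [
    {"tag": "020", "ind1": " ", "subfields": [("a", "0399153500 (alk. paper)")]},
    {"tag": "024", "ind1": "3", "subfields": [("a", "9780399153501")]},
]
print(find_isbn13(record))  # ['9780399153501']
```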

[6] The LCCN field is not necessarily "clean" – there can be trailing data ($a 3400058678 /rev). If you wish to use the LCCN for matching, take only the numeric token from the subfield.

[7] This field is a potential facet for display and selection.
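A minimal sketch of the LCCN cleanup in note [6]. The name `clean_lccn` is hypothetical and is not claimed to match the `normalize_lccn` function named in the working schema. It simply returns the first run of digits, so any alphabetic prefix or trailing revision marker is dropped.

```python
import re


def clean_lccn(subfield_a):
    """Return the first run of digits in an 010 $a value, or None."""
    match = re.search(r"\d+", subfield_a)
    return match.group(0) if match else None


print(clean_lccn("3400058678 /rev"))  # 3400058678
print(clean_lccn("  2006285320"))     # 2006285320
```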