The authoritative Open Library schema -- a specification of the database fields used to represent items like books and authors -- is a Python expression in the source repository, here.
A more readable version may be generated by executing that file; here it is as of 2007-08-30. (Asterisks indicate multi-valued fields. The types "string", "text", "url" and "date" are all currently represented in ThingDB as strings, but could be displayed or edited in different ways.)
edition
| Field | Type | MARC Fields | Example (Description) |
|---|---|---|---|
| source_record_loc | string* | | "marc_records_scriblio_net/part01.dat:29834:543" (a locator for the source record data) |
| source_record_id | string* | | "LC:DLC:00000006" (a record identifier that is globally unique and that also can be constructed consistently from the contents of a record and an identifier for its source catalog) |
| author_identifier | string* | 100:abcd, 110:ab, 710:ab, 111:acdn, 711:acdn | "Twain, Mark, 1835-1910" (unique author id in some catalog) |
| contributions | string* | 700:abcde | "Illustrated by: Steve Bjorkman" |
| title | string | 245:ab clean_name | "The adventures of Tom Sawyer" |
| by_statement | string* | 245:c | "Herman Melville ; [illustrated by Barry Moser]" |
| sort_title | string | | "adventures of Tom Sawyer" |
| other_titles | string* | 246:a, 730:a-z, 740:apn | "Mark Twain's The Adventures of Tom Sawyer" |
| work_title | string | 240:amnpr, 130:a-z | (The 240 "work title" is used in the OCLC FRBR algorithm. The 130 is also used, and there should be either a 130 or a 240 in a record, but not both. It would be ideal if we could pick up either for the work title.) |
| edition | string | 250:ab | "2nd. edition" (information about this edition) |
| publisher | string | 260:b clean_name | "W. W. Norton & Co." |
| publish_place | string* | 260:a clean | "New York" |
| publish_date | date | 008:7-10 | "2006" |
| pagination | string | 300:a | "viii, 383 p. :" (full pagination information) |
| number_of_pages | int | 300:a biggest_decimal | 383 (largest decimal found) |
| subjects | string* | 600:abcd--x--v--y--z, 610:ab--x--v--y--z, 650:a--x--v--y--z, 651:a--x--v--y--z | "Runaway children -- Fiction" |
| subject_place | string* | 651:a*, 650:z* | "Venice (Italy)" |
| subject_time | string* | 600:y*, 650:y* | "20th century" |
| genre | string* | 600:v*, 650:v*, 651:v* | "Biography" |
| series | string* | 440:av, 490:av, 830:av | "Oxford world's classics" |
| language | string | 008:35-37 "ISO" tag | "ISO: tel" (coded or human-readable description of the text's language) |
| physical_format | string* | 245:h | |
| notes | string* | 5XX!505!520:a-z | |
| description | text | 520:a | |
| exerpts | text* | ||
| table_of_contents | text* | 505:art | |
| cover_image | url | ||
| scan_contributor | string | ||
| scan_sponsor | string | ||
| dewey_number | string* | 082:a | "914.3" |
| LC_classification | string | 050:ab | "BJ1533.C4 L49" |
| ISBN | string* | 020:a normalize_isbn, 024:a normalize_isbn | "9780393926033" (13-digit ISBN) |
| UCC_13 | string | ||
| UPC | string | ||
| ISMN | string | ||
| DOI | string | ||
| LCCN | string | 010:a normalize_lccn | "2006285320" |
| GTIN_14 | string | ||
| oca_identifier | string | | "albertgallatinja00stevrich" |

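The number_of_pages row above derives an integer from the 300:a pagination string through a step named biggest_decimal, described only as "largest decimal found". A minimal sketch of what such a step could look like follows; the function body is an assumption made for illustration, not the code from the source repository.

```python
import re
from typing import Optional


def biggest_decimal(pagination: str) -> Optional[int]:
    """Return the largest run of decimal digits in a pagination string.

    Roman numerals such as "viii" are ignored because they contain no digits.
    Returns None when the string has no decimal number at all.
    """
    numbers = [int(token) for token in re.findall(r"\d+", pagination)]
    return max(numbers) if numbers else None


print(biggest_decimal("viii, 383 p. :"))  # 383
```

Taking the maximum rather than the first number means that a string with several page counts yields the largest of them, which matches the "largest decimal found" wording in the table.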
author
| Field | Type | MARC Fields | Example (Description) |
|---|---|---|---|
| identifier | string* | | "Twain, Mark, 1835-1910" (unique id in some catalog) |
| name | string | | "Mark Twain" (human-readable name) |
| birth_date | date | | "1835" |
| death_date | date | | "1910" |
| bio | text | | |

| name | type | example/description |
|---|---|---|
| source_name | STRING | |
| source_record_pos | INT | |
| work | ID-REF | |
| authors | ID-REFs | Tolkien, J. R. R. |
| contributors | STRINGs | "Illustrated by: Steve Bjorkman" |
| agencies/organizations | STRINGs | American Civil Liberties Union. Berkeley Chapter |
| title | STRING | The adventures of Tom Sawyer |
| "by" statement | STRINGs | Herman Melville ; [illustrated by Barry Moser] |
| sort title | INT | adventures of Tom Sawyer |
| other titles | STRINGs | Mark Twain's The Adventures of Tom Sawyer |
| edition | STRING | 2nd. edition |
| publisher | STRING | W. W. Norton & Co., |
| publish_place | STRING | New York : |
| publish_date | DATE | c2007. |
| number_of_pages | STRING | viii, 383 p. : |
| subjects | STRINGs | Runaway children -- Fiction |
| series | STRINGs | Oxford world's classics |
| notes | STRINGs | |
| BISAC_subject_categories | STRINGs | see definitions here |
| language_code | STRING | code from ISO 639-2/B; e.g., "tel" |
| language | STRING | human-readable description of the text's language, e.g., "Telugu" |
| physical_format | STRING | |
| description | HTML | |
| table of contents | STRINGs | |
| Dewey number | STRINGs | 914.3 |
| LC Classification | STRING | BJ1533.C4 L49 |
| cover_image | URL | |
| scan_contributor | STRING | |
| scan_sponsor | STRING | |
| ISBN_10 | STRING | 0393926036 |
| ISBN_13 | STRING | 9780393926033 |
| UCC_13 | STRING | |
| UPC | STRING | |
| ISMN | STRING | |
| DOI | STRING | |
| LCCN | STRING | |
| GTIN_14 | STRING | |
| oca_identifier | STRING | "albertgallatinja00stevrich" |
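The example values in the ISBN_10 and ISBN_13 rows above are the same number in two forms. As a sketch of how the two relate, the standard conversion prefixes "978" to the first nine digits and recomputes the check digit with alternating weights 1 and 3. The function name `isbn10_to_isbn13` is hypothetical and is not taken from the Open Library code.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN to its 13-digit form (978 prefix)."""
    if len(isbn10) != 10 or not isbn10[:9].isdigit():
        raise ValueError("expected a 10-character ISBN")
    # Drop the old check character and add the Bookland prefix.
    core = "978" + isbn10[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(digit) * (3 if index % 2 else 1)
                for index, digit in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("0393926036"))  # 9780393926033
```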

New EDITION with MARC and ONIX fields

| name | type | field | subfield/code | example/description |
|---|---|---|---|---|
| source_name | STRING | | | |
| source_record_pos | INT | | | |
| work | ID-REF | | | |
| authors | ID-REFs | | | Tolkien, J. R. R. |
| authors (MARC) | | 100 | abcd | $a Tolkien, J. R. R. |
| authors (ONIX) | | Contributor | "A01" | |
| contributors | STRINGs | | | Morcock, Michael, 1939- , ed. |
| contributors (MARC) | | 700 w/o $t | abcde | $a Tolkien, J. R. R. |
| contributors (ONIX) | | Contributor | not "A01" | |
| agencies/organizations | STRINGs | | | American Civil Liberties Union. Berkeley Chapter |
| agencies/Organizations (MARC) | | 110, 710 | ab | $a American Civil Liberties Union. $b Berkeley chapter |
| agencies/Organizations (MARC) | | 111, 711 | acdn | $a IEEE Pacific Rim Conference on Multimedia $n (5th : $d 2004 : $c Tokyo, Japan) |
| agencies/Organizations (ONIX) | | CorporateName | | |
| title | STRING | | | The adventures of Tom Sawyer |
| title (MARC) | | 245 | ab | $a Moby Dick : $b or, The whale |
| "by" statement | STRINGs | | | Herman Melville ; [illustrated by Barry Moser] |
| "by" statement (MARC) | | 245 | c | $c Herman Melville ; [illustrated by Barry Moser] |
| "by" statement (ONIX) | | contributor | ContributorRole | |
| sort title | INT | | | adventures of Tom Sawyer |
| other titles | STRINGs | | | Mark Twain's The Adventures of Tom Sawyer |
| other titles (MARC) | | 246 | a | $a Beowolf |
| other titles (MARC) | | 730 | a-z* | $a Bible $p N.T. $p Luke. $l Greek. $s Codex Sinaiticus |
| other titles (MARC) | | 740 | apn | |
| work title | STRING | | | The Pickwick papers. |
| work title (MARC) [1] | | 240 | amnpr | $a The Pickwick papers. |
| work title (MARC) | | 130 | a-z* | $a Bible $p N.T. $p Luke. $l Greek. $s Codex Sinaiticus |
| edition | STRING | | | 2nd. edition |
| edition (MARC) | | 250 | ab | $a 4th ed. / $b revised and corrected |
| publisher | STRING | | | W. W. Norton & Co., |
| publisher (MARC) | | 260 | b | $b Macmillan |
| publisher (ONIX) | | PublisherName | | |
| publish_place | STRING | | | New York : |
| publish_place (MARC) | | 260 | a | $a New York : |
| publish_date [7] | DATE | | | 2006 |
| publish_date (MARC) [2] | | 008 | pos 7-10 | 2006 |
| publish_date (ONIX) [2] | | PublicationDate | | 2006 |
| number_of_pages | STRING | | | viii, 383 p. : |
| number_of_pages (MARC) | | 300 | a | $a 149 p. ; |
| number_of_pages (ONIX) | | NumberOfPages | | |
| subjects | STRINGs | | | Runaway children -- Fiction |
| subjects (MARC) | | 600 | abcdxvyz | $a Thomas, Olive, $d 1898-1920 $x Death and burial |
| subjects (MARC) | | 610 | abxvyz | $a Partito democratico |
| subjects (MARC) | | 650 | axvyz | $a Heads of state $z Italy $z Venice $v Biography |
| subjects (MARC) | | 651 | axvyz | $a Venice (Italy) $x History $y 1508-1797 |
| series | STRINGs | | | Oxford world's classics |
| series (MARC) | | 440 | av | $a Lecture notes in computer science $v 4407 |
| series (MARC) | | 490 | av | $a Österreichische Film ; $v 22 |
| series (MARC) | | 830 | av | $a Studi musicologici $v 6 |
| series (ONIX) | | ImprintName | | |
| notes | STRINGs | | | |
| notes (MARC) [3] | | 5XX | a-z | $a Includes bibliographical references |
| BISAC_subject_categories | STRINGs | | | see definitions here |
| language_code [7] | STRING | | | code from ISO 639-2/B; e.g., "tel" |
| language_code (MARC) | | 008 | pos. 35-37 | eng |
| language_code (ONIX) | | LanguageOfText | | |
| language [7] | STRING | | | human-readable description of the text's language, e.g., "Telugu" |
| physical_format | STRING | | | |
| physical_format (ONIX) | | ProductForm | Description | |
| description | HTML | | | |
| description (MARC) | | 520 | a | $a Carew Raleigh was the only surviving son of Sir Walter Raleigh. This work ... |
| description (ONIX) | | MainDescription | | |
| table of contents | STRINGs | | | |
| table of contents (MARC) | | 505 | art | $t River of names / $r Dorothy Allison -- |
| dewey number | STRINGs | | | 914.3 |
| dewey number (MARC) | | 082 | a | $a 914.3 |
| LC Classification | STRING | | | BJ1533.C4 L49 |
| LC Classification (MARC) | | 050 | ab | $a BJ1533.C4 $b L49 |
| cover_image | URL | | | |
| scan_contributor | STRING | | | |
| scan_sponsor | STRING | | | |
| ISBN_10 | STRING | | | 0393926036 |
| ISBN_10 (MARC) [4] | | 020 | a | $a 0195144953 (alk. paper) |
| ISBN_13 | STRING | | | 9780393926033 |
| ISBN_13 (MARC) [5] | | 020 | a | |
| ISBN_13 (MARC) [5] | | 024 | a | 9780399153501 |
| UCC_13 | STRING | | | |
| UPC | STRING | | | |
| ISMN | STRING | | | |
| DOI | STRING | | | |
| LCCN | STRING | | | |
| LCCN (MARC) [6] | | 010 | a | $a 2006285320 |
| GTIN_14 | STRING | | | |
| oca_identifier | STRING | | | "albertgallatinja00stevrich" |
| place_facet [7] | STRINGs | | | Venice (Italy) |
| place_facet (MARC) | | 651 | a | |
| place_facet (MARC) | | 650 | z | |
| Genre [7] | STRINGs | | | Biography |
| Genre (MARC) | | 600, 650, 651 | v | |
| Genre (ONIX) | | | | |
| Time_facet [7] | STRINGs | | | 20th century |
| time_facet (MARC) | | 600, 650 | y | 20th century |

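The subjects, place_facet, Genre and time_facet rows all draw on the same 6XX subject fields, so one field can feed several schema fields at once. The sketch below illustrates that mapping under stated assumptions: fields arrive as display strings in the "$a ... $z ..." form used in the examples, the leading subfields are joined with spaces, and the subdivisions ($x, $v, $y, $z) are appended with " -- " in record order. The helper names `parse_subfields` and `subject_fields` are hypothetical.

```python
import re

# Leading (name or topic) subfields per tag; the rest are subdivisions.
HEAD_SUBFIELDS = {"600": "abcd", "610": "ab", "650": "a", "651": "a"}
SUBDIVISIONS = "xvyz"

# Which subfield of which tag feeds each facet, as listed in the table.
FACET_RULES = {
    "place_facet": {"651": "a", "650": "z"},
    "time_facet": {"600": "y", "650": "y"},
    "genre": {"600": "v", "650": "v", "651": "v"},
}


def parse_subfields(text):
    """Split '$a Foo $z Bar' into [('a', 'Foo'), ('z', 'Bar')]."""
    return [(code, value.strip())
            for code, value in re.findall(r"\$(\w)([^$]*)", text)]


def subject_fields(tag, text):
    """Build the subject string and facet lists for one 6XX field."""
    subfields = parse_subfields(text)
    head = " ".join(v for c, v in subfields if c in HEAD_SUBFIELDS[tag])
    parts = [head] + [v for c, v in subfields if c in SUBDIVISIONS]
    result = {"subjects": " -- ".join(p for p in parts if p)}
    for facet, rules in FACET_RULES.items():
        code = rules.get(tag)  # None when this tag does not feed the facet
        result[facet] = [v for c, v in subfields if c == code]
    return result


row = subject_fields("650", "$a Heads of state $z Italy $z Venice $v Biography")
print(row["subjects"])     # Heads of state -- Italy -- Venice -- Biography
print(row["place_facet"])  # ['Italy', 'Venice']
print(row["genre"])        # ['Biography']
```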
[1] The 240 "work title" is used in the OCLC FRBR algorithm. The 130 is also used, and there should be either a 130 or a 240 in a record, but not both. It would be ideal if we could pick up either for the work title.
[2] There are two sources in the MARC record for date of publication. The 260 $c may contain characters beyond the year ("c1997" or "1946 [reprinted 1965]"). Positions 07-10 of the 008 field have a normalized date ("1997" or "1946"). The dates as represented in the 260 will not be found outside of library records, so the 008 date can be substituted for it. For ONIX, the publication date often has month and day as well as year. For merging and for faceting, only the year should be used.
[3] MARC has a wide range of notes that appear in fields that begin with "5". All notes EXCEPT the 505 (table of contents) and 520 (summary) can be placed in a notes field. Notes fields can be repeatable.
[4] The ISBN field is not necessarily "clean" – there can be trailing data (0195144953 (alk. paper)). Take only the 10- or 13-character token, which should appear first. The token is all numeric EXCEPT that the final character can be "X".
[5] There are two possible locations for the ISBN_13 in MARC records. Records from some sources, including LC, will have the ISBN-13 in an 020 field. Many records will have two 020 fields, one with the ISBN-10 and one with the ISBN-13. Records from sources other than LC may have the ISBN-13 in the 024 field. There can be other 13-digit EANs in the 024 field, so the ISBN is identified by a "3" in the first indicator position.
[6] The LCCN field is not necessarily "clean" – there can be trailing data ($a 3400058678 /rev). If you wish to use the LCCN for matching, take only the numeric token from the subfield.
[7] This field is a potential facet for display and selection.
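The cleaning rules in footnotes [2] through [6] can be sketched in a few small functions. This is an illustration only: the function names are hypothetical, the record shape used by `isbn13_from_fields` (a list of tag, first indicator and $a value) is an assumption, and none of it is taken from the normalize_isbn or normalize_lccn code in the source repository.

```python
import re
from typing import Iterable, List, Optional, Tuple


def year_from_008(field_008: str) -> str:
    """Footnote 2: positions 07-10 of the 008 field hold the normalized year."""
    return field_008[7:11]


def year_from_text(value: str) -> Optional[str]:
    """Footnote 2: first four-digit run in 'c1997' or an ONIX date like '20060315'."""
    match = re.search(r"\d{4}", value)
    return match.group(0) if match else None


def is_general_note(tag: str) -> bool:
    """Footnote 3: every 5XX field except 505 and 520 goes into notes."""
    return len(tag) == 3 and tag.startswith("5") and tag not in ("505", "520")


def clean_isbn(subfield_a: str) -> Optional[str]:
    """Footnote 4: keep the leading token if it looks like an ISBN-10 or ISBN-13."""
    tokens = subfield_a.split()
    if tokens and re.fullmatch(r"\d{9}[\dXx]|\d{13}", tokens[0]):
        return tokens[0].upper()
    return None


def isbn13_from_fields(fields: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Footnote 5: ISBN-13s come from 020, or from 024 with first indicator '3'."""
    found = []
    for tag, first_indicator, subfield_a in fields:
        if tag == "020" or (tag == "024" and first_indicator == "3"):
            isbn = clean_isbn(subfield_a)
            if isbn and len(isbn) == 13:
                found.append(isbn)
    return found


def clean_lccn(subfield_a: str) -> Optional[str]:
    """Footnote 6: keep only the numeric token."""
    match = re.search(r"\d+", subfield_a)
    return match.group(0) if match else None


print(year_from_text("1946 [reprinted 1965]"))  # 1946
print(clean_isbn("0195144953 (alk. paper)"))    # 0195144953
print(clean_lccn("3400058678 /rev"))            # 3400058678
```

Each function returns None (or an empty list) rather than raising when no usable token is found, on the assumption that an importer would prefer to skip a malformed value and keep the rest of the record.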