About the Librarianship

futurelib

Open Library provides an exciting opportunity for the librarianship community. For the first time, we'll have an open, public, curated, universal catalog of all books. But this also presents an enormous challenge: we need to develop a schema for all that book information.

Like the MARC format, we'll want our schema to contain all the important bibliographic information that librarians want to collect about books. But we'll also want to take advantage of everything we've learned since MARC. We'll want to store some information that matters less to librarians but more to publishers (the kind of data the ONIX format stores) and to ordinary users. And we'll have to figure out how to present all this data in a way that makes sense to relatively untutored users.

We're currently calling this new schema futurelib and we hope to hold a series of meetings for geeks, librarians, and booklovers to come together and hash out a proposal. To aid in this discussion, we have a page of library terminology with definitions. Feel free to add new terms or definitions to the list.

In the meantime, you might be interested in our current draft schema which we threw together to get the demo up. There is also a crosswalk from the Infogami data elements to Dublin Core.
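To make the idea of a crosswalk concrete, here is a minimal sketch in Python. The source field names on the left (`authors`, `publish_date`, `isbn`, `subjects`, and so on) are assumptions made up for illustration and are not taken from the draft schema; only the Dublin Core element names on the right are standard.

```python
# Hypothetical source field names mapped to standard Dublin Core elements.
# The left-hand names are illustrative assumptions, not the real draft schema.
CROSSWALK = {
    "title": "title",
    "authors": "creator",
    "publisher": "publisher",
    "publish_date": "date",
    "isbn": "identifier",
    "language": "language",
    "subjects": "subject",
}


def to_dublin_core(record):
    """Map a source record (a dict) to Dublin Core, dropping unmapped fields."""
    dc = {}
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # no Dublin Core equivalent in this sketch
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(values)
    return dc
```

Every Dublin Core element is repeatable, which is why each value is stored as a list. Fields with no mapping are silently dropped here; a real crosswalk would probably want to report them instead.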

OLNs

But it's not just schemas. A universal catalog will also allow us to have a new, universal book identification scheme -- something akin to ISBNs or ISTNs, but for all books, not just recent ones. We're currently calling this scheme OLN for Open Library Number, but we'll need your help hashing out how it should work.

Some useful data

MARC language codes with translation
LC Class numbers outline. This file combines all of the documents from http://www.loc.gov/catdir/cpso/lcco/, and is in text format with tabs for the levels.
First two levels of LC Class numbers. This file lists the LC Classification at the level of one-character and two-character codes, each followed by a display form (e.g. H Social sciences (General), HA Statistics). Each code is separated from its display form by a tab character.
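As a rough illustration of how the two-level LC Class file could be used, the sketch below reads tab-separated lines into a dictionary and looks up the most specific class for a call number. The function names `parse_lc_classes` and `lookup` are hypothetical, and the code assumes one code and display form per line, as described above.

```python
def parse_lc_classes(lines):
    """Parse 'CODE<TAB>Display form' lines into a dict, skipping blank lines."""
    classes = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        code, _, label = line.partition("\t")
        classes[code.strip()] = label.strip()
    return classes


def lookup(classes, call_number):
    """Return the label for the two-letter class if known, else the one-letter class."""
    letters = ""
    for ch in call_number:
        if not ch.isalpha():
            break
        letters += ch.upper()
    for length in (2, 1):
        prefix = letters[:length]
        if len(prefix) == length and prefix in classes:
            return classes[prefix]
    return None
```

With the two example rows from the description loaded, `lookup(classes, "HA29 .S5")` would resolve to the two-character class and `lookup(classes, "H61")` to the one-character class.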

Merging

A big part of our problem is collecting book information from multiple sources and merging it together: merging records from publishers and from libraries, merging records for the same book held in different libraries, merging different editions of the same book, etc. We'll need your help developing algorithms that do this effectively, as well as correcting the results when the algorithms go wrong.

We're starting with a version of the merge algorithm developed for the MELVYL catalog.
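To give a feel for the problem, here is a deliberately naive sketch of duplicate detection using a normalized match key. It is not the MELVYL algorithm, whose details are not described on this page; the record fields (`title`, `author`, `year`) and the function names are assumptions for illustration only.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", unaccented.lower())
    return " ".join(cleaned.split())


def match_key(record):
    """Build a comparison key from hypothetical title/author/year fields."""
    return (
        normalize(record.get("title", "")),
        normalize(record.get("author", "")),
        record.get("year"),
    )


def group_candidates(records):
    """Group records sharing a match key; return only groups with duplicates."""
    groups = {}
    for record in records:
        groups.setdefault(match_key(record), []).append(record)
    return [group for group in groups.values() if len(group) > 1]
```

An exact-key approach like this misses records that differ by a typo or a variant title and can wrongly merge distinct editions published in the same year, which is why human correction of the results matters.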

There is also a page where we can lay out different options for creating Work sets (also known as FRBR-ization).

Data collection

We want as much data as we can get our hands on. If you know anyone with data or know how to get some, please get in touch.

If you are creating a data dump for us, it's best if you can use a standard format (MARC21, MARCXML, UNIMARC, etc.). Be sure each record includes fields that give:

your local record ID (MARC 001)
something that identifies the source of the record (your system or institution) (MARC 003)
the version date (either the last date the record was updated, or the date of the data dump) (MARC 005)
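If your dump is in MARCXML, a quick check along the following lines can flag records that lack any of the three control fields listed above. This is a minimal sketch using only the Python standard library; the function name `missing_control_fields` is hypothetical, and the code assumes the standard MARCXML namespace (`http://www.loc.gov/MARC21/slim`).

```python
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"
REQUIRED = ("001", "003", "005")  # record ID, source, version date


def missing_control_fields(marcxml):
    """Return (record_index, missing_tags) for each record lacking 001/003/005."""
    root = ET.fromstring(marcxml)
    problems = []
    # iter() also matches the root, so a lone <record> document works too.
    for index, record in enumerate(root.iter(MARC_NS + "record")):
        present = {
            field.get("tag")
            for field in record.findall(MARC_NS + "controlfield")
            if (field.text or "").strip()  # treat empty fields as missing
        }
        missing = [tag for tag in REQUIRED if tag not in present]
        if missing:
            problems.append((index, missing))
    return problems
```

An empty result means every record carries all three fields. Binary MARC21 or UNIMARC dumps would need a MARC parsing library instead of an XML parser.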

Further info

We're going to keep working on and discussing all these issues. For more information:

Join the librarianship list

History

Created March 4, 2009
272 revisions

March 4, 2009	Created by webchick	creating .en /about/lib page