pymarc - read, write and modify MARC bibliographic data

Project description

_|_|_|    _|    _|  _|_|_|  _|_|      _|_|_|  _|  _|_|    _|_|_|
_|    _|  _|    _|  _|    _|    _|  _|    _|  _|_|      _|
_|    _|  _|    _|  _|    _|    _|  _|    _|  _|        _|
_|_|_|      _|_|_|  _|    _|    _|    _|_|_|  _|          _|_|_|
_|              _|
_|          _|_|

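pymarc is a Python library for working with bibliographic data encoded in MARC21. It provides an API for reading, writing and modifying MARC records. It was mostly designed to be an emergency eject seat, for getting your data assets out of MARC and into some kind of saner representation. Over the years, however, it has also been used to create and modify MARC records, since despite repeated calls for it to die as a format, MARC seems to be living quite happily as a zombie.

Below are some common examples of how you might want to use pymarc. If you run into an example that you think should be here, please send a pull request.

Installation

You'll probably just want to use pip to install pymarc: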

pip install pymarc

If you'd like to download and install the latest source, you'll need git:

git clone git://gitlab.com/pymarc/pymarc.git

You'll also need setuptools. Once you have the source and setuptools, run the pymarc test suite to make sure things are in order with the distribution:

python setup.py test

And then install:

python setup.py install

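Reading

Most often you will have some MARC data and will want to extract data from it. Here's an example of reading a batch of records and printing out the title. If you are curious, this example uses the batch file available in the pymarc repository: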

from pymarc import MARCReader
with open('test/marc.dat', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record.title())

The pragmatic programmer : from journeyman to master /
Programming Python /
Learning Python /
Python cookbook /
Python programming for the absolute beginner /
Web programming : techniques for integrating Python, Linux, Apache, and MySQL /
Python programming on Win32 /
Python programming : an introduction to computer science /
Python Web programming /
Core python programming /
Python and Tkinter programming /
Game programming with Python, Lua, and Ruby /
Python programming patterns /
Python programming with the Java class libraries : a tutorial for building Web
and Enterprise applications /
Learn to program using Python : a tutorial for hobbyists, self-starters, and all
who want to learn the art of computer programming /
Programming with Python /
BSD Sockets programming from a multi-language perspective /
Design patterns : elements of reusable object-oriented software /
Introduction to algorithms /
ANSI Common Lisp /

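A pymarc.Record object has a few handy methods, like title, for getting at bits of a bibliographic record. Others include: author, isbn, subjects, location, notes, physicaldescription, publisher, pubyear, issn, issn_title. But really, to work with MARC data you need to understand the numeric field tags and subfield codes that are used to designate the various bits of information. There is a lot more hiding in a MARC record than these methods provide access to. For example, the title method extracts its information from the 245 field, subfields a and b. You can access 245a like so: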

print(record['245']['a'])

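Some fields, like subjects, can repeat. In cases like that you will want to use get_fields to get all of them as pymarc.Field objects, which you can then interact with further: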

for f in record.get_fields('650'):
    print(f)

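If you are new to MARC fields, Understanding MARC is a pretty good primer, and the Library of Congress' MARC 21 Formats page is a good reference once you understand the basics.

Writing

Here's an example of creating a record and writing it out to a file.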

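from pymarc import Record, Field

# Build an empty record and add a 245 (title statement) field to it.
record = Record()
record.add_field(
    Field(
        tag='245',
        indicators=['0', '1'],
        # In this version, subfields are a flat list of alternating codes and values.
        subfields=[
            'a', 'The pragmatic programmer : ',
            'b', 'from journeyman to master /',
            'c', 'Andrew Hunt, David Thomas.'
        ]))

# as_marc() returns the record as binary MARC, so the file is opened in binary mode.
with open('file.dat', 'wb') as out:
    out.write(record.as_marc())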

Updating

Updating works the same way: you read a record in, modify it, and then write it out again:

from pymarc import MARCReader
with open('test/marc.dat', 'rb') as fh:
    reader = MARCReader(fh)
    record = next(reader)
    record['245']['a'] = 'The Zombie Programmer'
with open('file.dat', 'wb') as out:
    out.write(record.as_marc())

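JSON and XML

If you find yourself using MARC data a fair bit, and distributing it, you may make other developers a bit happier by using the JSON or XML serializations. The main benefit of using XML or JSON is that the UTF8 character encoding is used, rather than the frustratingly archaic MARC8 encoding. Other developers will also be able to use standard JSON and XML reading/writing tools to get at the data they want, instead of some crazy MARC processing library like, ahem, pymarc.

XML

To parse a file of MARCXML records you can: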

from pymarc import parse_xml_to_array

records = parse_xml_to_array('test/batch.xml')

If you have a large XML file and would rather not read all of its records into memory, you can:

from pymarc import map_xml

def print_title(r):
    print(r.title())

map_xml(print_title, 'test/batch.xml')

Also, if you prefer, you can pass a file-like object instead of a path to both map_xml and parse_xml_to_array:

records = parse_xml_to_array(open('test/batch.xml'))
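
The point made earlier, that MARCXML can be read with standard XML tools, can be illustrated without pymarc at all. The following is only a sketch: it uses Python's xml.etree.ElementTree on a small hand-written record, and both the sample data and the helper name titles_from_marcxml are assumptions made for illustration, not part of pymarc.

```python
import xml.etree.ElementTree as ET

# MARCXML elements live in the Library of Congress "slim" namespace.
MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

# Hand-written sample record, for illustration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>01060cam  22002894a 4500</leader>
    <controlfield tag="001">11778504</controlfield>
    <datafield tag="245" ind1="1" ind2="4">
      <subfield code="a">The pragmatic programmer :</subfield>
      <subfield code="b">from journeyman to master /</subfield>
    </datafield>
  </record>
</collection>"""


def titles_from_marcxml(xml_text):
    """Return the 245$a value of every record in a MARCXML string."""
    root = ET.fromstring(xml_text)
    titles = []
    # iter() also matches the root, so a bare <record> document works too.
    for record in root.iter("{%s}record" % MARC_NS):
        sub = record.find(
            "marc:datafield[@tag='245']/marc:subfield[@code='a']", NS
        )
        if sub is not None:
            titles.append(sub.text)
    return titles


print(titles_from_marcxml(SAMPLE))
```

This trades pymarc's convenience methods for zero dependencies, which may be all a downstream consumer of your data needs.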

JSON

JSON support is fairly minimal: you can call a pymarc.Record's as_json() method to return JSON for a given MARC record:

from pymarc import MARCReader

with open('test/one.dat','rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record.as_json(indent=2))

{
  "leader": "01060cam  22002894a 4500",
  "fields": [
    {
      "001": "11778504"
    },
    {
      "010": {
        "ind1": " ",
        "subfields": [
          {
            "a": "   99043581 "
          }
        ],
        "ind2": " "
      }
    },
    {
      "100": {
        "ind1": "1",
        "subfields": [
          {
            "a": "Hunt, Andrew,"
          },
          {
            "d": "1964-"
          }
        ],
        "ind2": " "
      }
    },
    {
      "245": {
        "ind1": "1",
        "subfields": [
          {
            "a": "The pragmatic programmer :"
          },
          {
            "b": "from journeyman to master /"
          },
          {
            "c": "Andrew Hunt, David Thomas."
          }
        ],
        "ind2": "4"
      }
    },
    {
      "260": {
        "ind1": " ",
        "subfields": [
          {
            "a": "Reading, Mass :"
          },
          {
            "b": "Addison-Wesley,"
          },
          {
            "c": "2000."
          }
        ],
        "ind2": " "
      }
    },
    {
      "300": {
        "ind1": " ",
        "subfields": [
          {
            "a": "xxiv, 321 p. ;"
          },
          {
            "c": "24 cm."
          }
        ],
        "ind2": " "
      }
    },
    {
      "504": {
        "ind1": " ",
        "subfields": [
          {
            "a": "Includes bibliographical references."
          }
        ],
        "ind2": " "
      }
    },
    {
      "650": {
        "ind1": " ",
        "subfields": [
          {
            "a": "Computer programming."
          }
        ],
        "ind2": "0"
      }
    },
    {
      "700": {
        "ind1": "1",
        "subfields": [
          {
            "a": "Thomas, David,"
          },
          {
            "d": "1956-"
          }
        ],
        "ind2": " "
      }
    }
  ]
}
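
Because this output is plain JSON, a consumer does not need pymarc to read it. As a minimal sketch, assuming the structure shown above (a "fields" list of single-key objects), the standard json module is enough to pull out a subfield. The helper get_subfields is hypothetical and not part of pymarc.

```python
import json

# A trimmed copy of the as_json() output shown above.
RECORD_JSON = """
{
  "leader": "01060cam  22002894a 4500",
  "fields": [
    {"001": "11778504"},
    {"245": {"ind1": "1", "ind2": "4", "subfields": [
        {"a": "The pragmatic programmer :"},
        {"b": "from journeyman to master /"}
    ]}},
    {"650": {"ind1": " ", "ind2": "0", "subfields": [
        {"a": "Computer programming."}
    ]}}
  ]
}
"""


def get_subfields(record, tag, code):
    """Collect every value of subfield `code` in fields tagged `tag`."""
    values = []
    for field in record["fields"]:
        body = field.get(tag)
        # Control fields (such as 001) map to a plain string, not a dict.
        if not isinstance(body, dict):
            continue
        for subfield in body["subfields"]:
            if code in subfield:
                values.append(subfield[code])
    return values


record = json.loads(RECORD_JSON)
print(get_subfields(record, "245", "a"))
print(get_subfields(record, "650", "a"))
```

Returning a list rather than a single value keeps the helper usable for repeatable fields such as 650.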

If you want to parse a file of MARCJSON records you can:

from pymarc import parse_json_to_array

records = parse_json_to_array(open('test/batch.json'))

print(records[0])

=LDR  00925njm  22002777a 4500
=001  5637241
=003  DLC
=005  19920826084036.0
=007  sdubumennmplu
=008  910926s1957\\\\nyuuun\\\\\\\\\\\\\\eng\\
=010  \\$a   91758335
=028  00$a1259$bAtlantic
=040  \\$aDLC$cDLC
=050  00$aAtlantic 1259
=245  04$aThe Great Ray Charles$h[sound recording].
=260  \\$aNew York, N.Y. :$bAtlantic,$c[1957?]
=300  \\$a1 sound disc :$banalog, 33 1/3 rpm ;$c12 in.
=511  0\$aRay Charles, piano & celeste.
=505  0\$aThe Ray -- My melancholy baby -- Black coffee -- There's no you -- Doodlin' -- Sweet sixteen bars -- I surrender dear -- Undecided.
=500  \\$aBrief record.
=650  \0$aJazz$y1951-1960.
=650  \0$aPiano with jazz ensemble.
=700  1\$aCharles, Ray,$d1930-$4prf
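
The printed form above puts one field on each line: an equals sign, the tag, two spaces, then either control-field data or two indicator characters (with a backslash standing for a blank) followed by subfields introduced by a dollar sign. As a rough sketch under those assumptions, and not pymarc's own parser, a single line can be split as follows. The helper name parse_text_line is hypothetical, and the sketch does not handle literal dollar signs inside the data.

```python
def parse_text_line(line):
    """Split one '=TAG  ...' line into (tag, indicators, content)."""
    if not line.startswith("="):
        raise ValueError("expected a line starting with '='")
    tag, rest = line[1:4], line[6:]
    # The leader and control fields (001-009) have no indicators or
    # subfields, so their content is returned as a single string.
    if tag == "LDR" or tag < "010":
        return tag, None, rest.replace("\\", " ")
    # A backslash stands for a blank indicator.
    indicators = [char.replace("\\", " ") for char in rest[:2]]
    subfields = []
    # Each chunk after a '$' is a one-character code followed by its value.
    for chunk in rest[2:].split("$")[1:]:
        if chunk:
            subfields.append((chunk[0], chunk[1:]))
    return tag, indicators, subfields


print(parse_text_line("=245  04$aThe Great Ray Charles$h[sound recording]."))
print(parse_text_line(r"=650  \0$aJazz$y1951-1960."))
print(parse_text_line("=001  5637241"))
```

For real work, reading the records with pymarc and using its field and subfield accessors is the safer route; the sketch is only meant to show how the text layout maps onto tags, indicators and subfields.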

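Support

The pymarc developers encourage you to join the pymarc Google Group if you need help. Also, please feel free to use the issue tracker on GitLab to submit feature requests or bug reports. If you've got an itch to scratch, please scratch it, and send a merge request on GitLab.

If you start working with MARC you may feel like you need moral support in addition to technical support. The #code4lib channel on Libera is a good place for both.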

Project details

Release history

4.2.0 (this version): April 4, 2022
4.1.3: March 2, 2022
4.1.2: December 10, 2021
4.1.1: May 26, 2021
4.1.0: March 30, 2021
4.0.0: February 29, 2020
3.2.0: December 10, 2019
3.1.13: March 27, 2019
3.1.12: February 27, 2019
3.1.11: December 3, 2018
3.1.10: May 18, 2018
3.1.9: May 18, 2018
3.1.8: April 9, 2018
3.1.7: July 26, 2017
3.1.6: March 22, 2017
3.1.5: July 20, 2016
3.1.4: July 20, 2016
3.1.3: July 18, 2016
3.1.2: May 3, 2016
3.1.1: December 18, 2015
3.0.4: July 27, 2015
3.0.3: December 3, 2014
3.0.2: October 10, 2014
3.0.1: June 2, 2014
3.0.0: May 30, 2014
2.9.2: January 24, 2014
2.9.1: November 24, 2013
2.9.0
