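fhir-pyrate: FHIR-PYrate is a package that provides a high-level API to query FHIR servers for bundles of resources and return the structured information as pandas DataFrames. It can also be used to filter resources using RegEx and SpaCy and to download DICOM studies and series.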

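Project description

This package is meant to provide a simple abstraction to query FHIR resources and structure them as pandas DataFrames.

There are four main classes:

Ahoy: Authenticate on the FHIR API (Examples 1 and 2). Currently only BasicAuth and token authentication are supported.
Pirate: Extract and search for data via the FHIR API (Examples 1, 2, 3 and 4).
Miner: Search for keywords or phrases within diagnostic reports (Example 4).
DicomDownloader: Download complete studies or series (Example 2).

Disclaimer: We have tried to add tests for some public FHIR servers. However, because of the quality and quantity of their resources, we could not test them as extensively as the local FHIR server at our institute. If anything in the code only applies to our server, or if you have problems with authentication (or anything else), please create an issue or send us an email.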

Installation

Either Pip

The package can be installed from PyPI

pip install fhir-pyrate

or from GitHub (always the newest version).

pip install git+https://github.com/UMEssen/FHIR-PYrate.git

These two commands only install Pirate. If you also want to use Miner or DicomDownloader, you need to install them as extra dependencies with

pip install "fhir-pyrate[miner]" # only for miner
pip install "fhir-pyrate[downloader]" # only for downloader
pip install "fhir-pyrate[all]" # for both

Or Within Poetry

Poetry can be used for the same purpose. With PyPI, run the following commands.

poetry add fhir-pyrate
poetry install

When adding it from GitHub, there are different options, because until recently Poetry could only install from the master branch.

Poetry 1.2.0a2+:

poetry add git+https://github.com/UMEssen/FHIR-PYrate.git
poetry install

For previous versions, add the following line to your pyproject.toml file:

fhir-pyrate = {git = "https://github.com/UMEssen/FHIR-PYrate.git", branch = "main"}

and then run

poetry lock

In Poetry as well, the commands above only install Pirate. If you also want to use Miner or DicomDownloader, you need to install them as extra dependencies with

poetry add "fhir-pyrate[miner]" # only for miner
poetry add "fhir-pyrate[downloader]" # only for downloader
poetry add "fhir-pyrate[all]" # for both

or by adding the following to your pyproject.toml file:

fhir-pyrate = {git = "https://github.com/UMEssen/FHIR-PYrate.git", branch = "main", extras = ["all"]}

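Run Tests

When implementing new features, make sure that the existing ones still pass our unit tests. First set the environment variables FHIR_USER and FHIR_PASSWORD to the username and password of the FHIR server, then run the tests.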

poetry run python -m unittest discover tests

If you implement a new feature, please add a small test for it in tests. You can also use the tests as examples.

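Explanations & Examples

Please look at the examples folder for complete examples.

Ahoy

The Ahoy class is used to authenticate and is required by the Pirate and DicomDownloader classes.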

from fhir_pyrate import Ahoy

# Authorize via password
auth = Ahoy(
  username="your_username",
  auth_method="password",
  auth_url="<auth-url>",  # Placeholder: the URL for authentication
  refresh_url="<refresh-url>",  # Placeholder: the URL to refresh the authentication
)

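We accept the following authentication methods:

token: Pass a token you have already generated as a constructor argument.
password: Enter your password via a prompt.
env: Use the FHIR_USER and FHIR_PASSWORD environment variables (mainly used for the unit tests). You can also change their names with the change_environment_variable_name function.
keyring: To be implemented.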
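The env method relies on two environment variables. The following is a minimal sketch of how such variables can be read with the standard library; credentials_from_env is a hypothetical helper written for illustration and is not part of fhir_pyrate, and the demo values are placeholders.

```python
import os
from typing import Tuple


def credentials_from_env(
    user_var: str = "FHIR_USER",
    password_var: str = "FHIR_PASSWORD",
) -> Tuple[str, str]:
    """Hypothetical helper (not part of fhir_pyrate): read credentials from the environment."""
    try:
        return os.environ[user_var], os.environ[password_var]
    except KeyError as missing:
        # Fail early with a readable message if one of the variables is not set
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Demo values for illustration only; real credentials are set outside the script
os.environ["FHIR_USER"] = "demo_user"
os.environ["FHIR_PASSWORD"] = "demo_password"
username, password = credentials_from_env()
```

Keeping the variable names as parameters mirrors the idea that the names can be changed, without assuming anything about how Ahoy implements it.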

Pirate

Pirate can query any resource implemented in the FHIR API and is initialized as follows:

from fhir_pyrate import Pirate

auth = ...

# Init Pirate
search = Pirate(
    auth=auth,
    base_url="<fhir-url>",  # Placeholder, e.g. "http://hapi.fhir.org/baseDstu2"
    print_request_url=False, # If set to true, you will see all requests
)

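Pirate functions do one of three things:

1. They run the query, collect the resources and store them in a list of bundles.
- steal_bundles: single process, no timespan to specify
- steal_bundles_for_timespan: single process, a timespan can be specified
- sail_through_search_space: multiprocess, divide and conquer with many smaller timespans, uses steal_bundles_for_timespan
- trade_rows_for_bundles: multiprocess, takes a DataFrame as input and runs one query per row, uses steal_bundles
2. They take a list of bundles and build a DataFrame.
- bundles_to_dataframe: multiprocess
3. They are wrappers that combine the functionality of 1 and 2, or that set some specific parameters.
- query_to_dataframe: multiprocess, executes any function chosen with bundles_function (any of the functions in 1.) and then runs bundles_to_dataframe on the result.
- trade_rows_for_dataframe: multiprocess, executes steal_bundles and bundles_to_dataframe for each row of the DataFrame.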

Name	Type	Multiprocessing	DF Input?	Output
steal_bundles	1	No	No	List of FHIRObj bundles
steal_bundles_for_timespan	1	No	No	List of FHIRObj bundles
sail_through_search_space	1	Yes	No	List of FHIRObj bundles
trade_rows_for_bundles	1	Yes	Yes	List of FHIRObj bundles
bundles_to_dataframe	2	Yes	No	DataFrame
query_to_dataframe	3	Yes	Yes	DataFrame
trade_rows_for_dataframe	3	Yes	Yes	DataFrame

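BETA FEATURE: It is also possible to cache the bundles by using the bundle_caching parameter, which specifies a caching folder. This has not been tested extensively and has no cache invalidation mechanism.

A toy request for ImagingStudy: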

search = ...

# Make the FHIR call; query_to_dataframe returns a DataFrame
df = search.query_to_dataframe(
    bundles_function=search.sail_through_search_space,
    resource_type="ImagingStudy",
    date_init="2021-04-01",
    time_attribute_name="started",
    request_params={
      "modality": "CT",
      "_count": 5000,
    }
)

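The request_params argument is a dictionary that takes strings as keys (the FHIR identifiers) and anything as values. If a value is a list or a tuple, all of its values are used to build the request to the FHIR API.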
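To make the list handling concrete, here is a small sketch that serializes such a dictionary into a query string with the standard library. It only illustrates the general idea of expanding list values into repeated parameters; how Pirate builds its request internally is not shown here and may differ.

```python
from urllib.parse import urlencode

request_params = {
    "modality": ["CT", "MR"],  # a list: each value becomes its own parameter
    "_count": 5000,
}

# doseq=True expands sequences into repeated key=value pairs
query_string = urlencode(request_params, doseq=True)
print(query_string)  # modality=CT&modality=MR&_count=5000
```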

query_to_dataframe is a wrapper function. It collects the bundles produced by the function passed as bundles_function and then calls bundles_to_dataframe on them. In this case, we used sail_through_search_space.

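The sail_through_search_space function uses the multiprocessing module to speed up some queries. The multiprocessing works as follows: the time frame is divided into multiple timespans (as many as there are processes), and each smaller timespan is queried simultaneously. This is why sail_through_search_space needs a date_init and a date_end parameter. The defaults are date_init=2010-01-01 and today (the day the query is executed) for date_end.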
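The partitioning step can be sketched in a few lines. This is only an illustration of splitting a time frame into contiguous timespans, one per process; split_timespan is a hypothetical helper, and the boundaries chosen by sail_through_search_space may be computed differently.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_timespan(date_init: date, date_end: date, n_spans: int) -> List[Tuple[date, date]]:
    """Hypothetical helper: split a time frame into n_spans contiguous timespans."""
    total_days = (date_end - date_init).days
    # Evenly spaced boundaries; the first is date_init and the last is date_end
    bounds = [date_init + timedelta(days=total_days * i // n_spans) for i in range(n_spans + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


spans = split_timespan(date(2021, 4, 1), date(2021, 8, 1), n_spans=4)
for start, end in spans:
    print(start, "->", end)
```

Each (start, end) pair could then be handed to a separate worker, which is the divide-and-conquer idea described above.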

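A problematic aspect of the resources is that the date on which a resource was acquired is defined by different attributes. In addition, some resources use a fixed date, while others use a time period. You can specify the date attribute to use with time_attribute_name. The upstream documentation lists the attribute used for each resource type in a table; the example above uses started for ImagingStudy. The default attribute is _lastUpdated.

Resources whose date is based on a period (such as Encounter or Procedure) may cause duplicates in the multiprocessing, because one entry may belong to more than one of the generated timespans. Once you have built a DataFrame with your data, you can remove the duplicates by ID.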
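The following sketch shows why a period-based resource can be returned twice and how keeping one row per ID resolves it. The encounter, the timespans and the overlaps helper are made up for illustration, and the overlap rule is a simplification of how a server matches periods.

```python
from datetime import date

# A hypothetical Encounter whose period crosses the boundary between two timespans
encounter = {"id": "enc-1", "start": date(2021, 4, 28), "end": date(2021, 5, 3)}
timespans = [
    (date(2021, 4, 1), date(2021, 5, 1)),
    (date(2021, 5, 1), date(2021, 6, 1)),
]


def overlaps(resource: dict, span_start: date, span_end: date) -> bool:
    """True if the resource period intersects the half-open span [span_start, span_end)."""
    return resource["start"] < span_end and resource["end"] >= span_start


# Each timespan is queried separately, so the encounter is collected twice
rows = [encounter for start, end in timespans if overlaps(encounter, start, end)]

# Keep one row per ID (the same effect as dropping duplicates on the ID column)
unique_rows = list({row["id"]: row for row in rows}.values())
print(len(rows), len(unique_rows))  # 2 1
```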

`trade_rows_for_bundles`

If we already have an Excel sheet or CSV file with fhir_patient_ids (or any other identifier) and we want to request resources based on these identifiers, we can use the trade_rows_for_bundles function:

search = ...
# DataFrame containing FHIR patient IDs
patient_df = ...

# Collect all final diagnostic reports for the patients in patient_df
dr_bundles = search.trade_rows_for_bundles(
  patient_df,
  resource_type="DiagnosticReport",
  request_params={"_count": "100", "status": "final"},
  df_constraints={"subject": "fhir_patient_id"},
)

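We only need to specify the resource_type and the constraints we want to enforce from the DataFrame in df_constraints. This dictionary should contain pairs of (fhir_identifier, identifier_column), where fhir_identifier is the API search parameter and identifier_column is the column that stores the values we want to search for. Additionally, a system can be used to better identify the constraints of the DataFrame. For example, assume we have a DataFrame column called loinc_code that contains a number of different LOINC codes. Our df_constraints could then look as follows: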

df_constraints={"code": ("http://loinc.org", "loinc_code")}

This function also uses multiprocessing, but unlike before, it processes the rows of the DataFrame in parallel.
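As an illustration of what one query per row means, the sketch below turns a single row and a df_constraints dictionary into search parameters, using the FHIR token form system|value when a system is given. row_to_params is a hypothetical helper and the row values are made up; it is not the code used by trade_rows_for_bundles.

```python
from typing import Dict, Tuple, Union

Constraint = Union[str, Tuple[str, str]]


def row_to_params(row: Dict[str, str], df_constraints: Dict[str, Constraint]) -> Dict[str, str]:
    """Hypothetical helper: build the search parameters for one DataFrame row."""
    params = {}
    for fhir_identifier, constraint in df_constraints.items():
        if isinstance(constraint, tuple):
            # (system, column): use the FHIR token form "system|value"
            system, column = constraint
            params[fhir_identifier] = f"{system}|{row[column]}"
        else:
            # Plain column name: use the cell value as is
            params[fhir_identifier] = row[constraint]
    return params


# A made-up row standing in for one line of the DataFrame
row = {"fhir_patient_id": "123", "loinc_code": "718-7"}
params = row_to_params(
    row,
    {"subject": "fhir_patient_id", "code": ("http://loinc.org", "loinc_code")},
)
print(params)  # {'subject': '123', 'code': 'http://loinc.org|718-7'}
```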

`bundles_to_dataframe`

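The two functions described above return a list of FHIRObj bundles, which can then be converted to a DataFrame with this function.

bundles_to_dataframe offers three options for processing the bundles and extracting the relevant information:

Extract everything. In this case you can use the flatten_data function, which is already the default for process_function, so you do not actually need to specify anything.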

# Create bundles with Pirate
search = ...
bundles = ...
# Convert the returned bundles to a dataframe
df = search.bundles_to_dataframe(
    bundles=bundles,
)

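Use a processing function, in which you define exactly which attributes are needed by iterating through the entries and selecting the elements. The values added to the dictionary become the columns of the DataFrame. For an example of when it might make sense to do this, see Example 3.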

from typing import List, Dict
from fhir_pyrate.util.fhirobj import FHIRObj
# Create bundles with Pirate
search = ...
bundles = ...
def get_diagnostic_text(bundle: FHIRObj) -> List[Dict]:
    records = []
    for entry in bundle.entry or []:
        resource = entry.resource
        records.append(
            {
                "fhir_diagnostic_report_id": resource.id,
                "report_status": resource.text.status,
                "report_text": resource.text.div,
            }
        )
    return records
# Convert the returned bundles to a dataframe
df = search.bundles_to_dataframe(
    bundles=bundles,
    process_function=get_diagnostic_text,
)

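Extract only part of the information using the fhir_paths argument. Here you can pass a list of strings that follow the FHIRPath standard. For this purpose we use the fhirpath-py package, which relies on the antlr4 parser. You can also use tuples of the form (key, fhir_path), where key is the name of the column in which the information derived from that FHIRPath is stored.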

# Create bundles with Pirate
search = ...
bundles = ...
# Convert the returned bundles to a dataframe
df = search.bundles_to_dataframe(
    bundles=bundles,
    fhir_paths=["id", ("code", "code.coding"), ("identifier", "identifier[0].code")],
)

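Note 1 on FHIRPath: The standard also allows some primitive math operations such as modulus (mod) or integer division (div), which can be problematic if fields of a resource use these terms as attribute names. This is actually the case in many generated public FHIR resources. In that case the expression text.div cannot be used, and a processing function should be used instead (see the processing-function option above).

Note 2 on FHIRPath: Since column names can be specified with a (key, fhir_path) tuple, it is important to know that if a key is used several times for different pieces of information of the same resource, the field is filled only with the first occurrence that is not None.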

df = search.query_to_dataframe(
    bundles_function=search.steal_bundles,
    resource_type="DiagnosticReport",
    request_params={
        "_count": 1,
        "_include": "DiagnosticReport:subject",
    },
    # Only one fhir_paths argument can be passed per call, so the alternatives are commented out.
    # CORRECT EXAMPLE
    # In this case subject.reference is None for patient, so all patients will have their Patient.id
    # fhir_paths=[("patient", "subject.reference"), ("patient", "Patient.id")],
    # And Patient.id is None for DiagnosticReport, so they will have their subject.reference
    # fhir_paths=[("patient", "Patient.id"), ("patient", "subject.reference")],
    # WRONG EXAMPLE
    # In this case, only the first code will be stored
    # fhir_paths=[("code", "code.coding[0].code"), ("code", "code.coding[1].code")],
    # CORRECT EXAMPLE
    # Whenever we are working with codes, it is usually better to use the `where` argument and
    # to store the values using a meaningful name
    fhir_paths=[
        ("code_abc", "code.coding.where(system = 'ABC').code"),
        ("code_def", "code.coding.where(system = 'DEF').code"),
    ],
    stop_after_first_page=True,
)
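The rule from Note 2 (a repeated key keeps the first value that is not None) can be sketched independently of the library. coalesce_columns is a hypothetical helper and the input values are made up; it only mirrors the described behaviour and is not the library's implementation.

```python
from typing import Any, Dict, List, Optional, Tuple


def coalesce_columns(extracted: List[Tuple[str, Optional[Any]]]) -> Dict[str, Any]:
    """Hypothetical helper: keep, for each key, the first value that is not None."""
    row: Dict[str, Any] = {}
    for key, value in extracted:
        if row.get(key) is None:
            row[key] = value
    return row


# Values a DiagnosticReport might yield for the two "patient" paths
report_row = coalesce_columns([("patient", "Patient/123"), ("patient", None)])
# Two paths that both resolve to a value: the second code is silently lost
code_row = coalesce_columns([("code", "A"), ("code", "B")])
print(report_row, code_row)  # {'patient': 'Patient/123'} {'code': 'A'}
```

This is why distinct column names such as code_abc and code_def are the safer choice when several values are expected.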

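If you are not sure whether the same entry was collected multiple times (for example, when using sail_through_search_space with multiprocessing on resources that use a time period), use the drop_duplicates function of pandas. Pass the list of column names that should not contain duplicates as the subset argument, and all duplicate rows will be dropped.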

`query_to_dataframe`

This function is just a wrapper that can be used to combine any function of type 1 with bundles_to_dataframe. See the examples for some use cases.

`trade_rows_for_dataframe`

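The output of this function is similar to that of query_to_dataframe with bundles_function=trade_rows_for_bundles, with two main differences:

- Here, the bundles are retrieved and the DataFrame is computed straight away. With query_to_dataframe(bundles_function=trade_rows_for_bundles, ...), all bundles are retrieved first and only then converted to a DataFrame.
- If df_constraints are specified, they will end up in the final DataFrame.

You can find an example in Example 3.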

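Miner

Miner takes a DataFrame and searches it for a specific regular expression with the help of SpaCy. It is also possible to add a regular expression for text that should be excluded. Please use a regex checker (e.g. https://regex101.com/) to build your regular expressions.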

from fhir_pyrate import Miner

df_diagnostic_reports = ...  # Get a DataFrame
# Search for text where the word "Tumor" is present
miner = Miner(
    target_regex="Tumor*",
    decode_text=...# Here you can write a function that processes each single text (e.g. stripping, decoding)
)
df_filtered = miner.nlp_on_dataframe(
  df_diagnostic_reports,
  text_column_name="report_text",
  new_column_name="text_found"
)
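It can help to check a pattern with Python's re module before passing it to Miner. The sketch below inspects the pattern from the example above and a stricter alternative; it uses re directly, so any flags or preprocessing that Miner applies on top are not reflected, and the sample sentences are made up.

```python
import re

pattern = re.compile("Tumor*")

# "r*" means "zero or more r", so the pattern already matches "Tumo"
print(bool(pattern.search("Tumo")))      # True
print(bool(pattern.search("Tumorous")))  # True
print(bool(pattern.search("tumor")))     # False: matching is case-sensitive here

# A stricter alternative that requires the full word stem and ignores case
strict = re.compile(r"\btumor\w*", re.IGNORECASE)
print(bool(strict.search("Tumo")))                # False
print(bool(strict.search("kein Tumornachweis")))  # True
```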

DicomDownloader

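At our institute we have a DicomWebAdapter app that can be used to download studies and series from the PACS system of our hospital. The DicomDownloader uses the DicomWebClient together with the specific internal URL of each PACS to connect and download the images. We could not find a public system that offers something similar, so this class has only been tested on our internal FHIR server. If you have questions or would like some specific functionality to be usable at your institute, please do not hesitate to contact us or to open a pull request!

DicomDownloader downloads a complete study (StudyInstanceUID) or a specific series (StudyInstanceUID + SeriesInstanceUID).

The data can be downloaded either as DICOM (.dcm) or as NIfTI (.nii.gz). In the NIfTI case, an additional .dcm file is downloaded to store some metadata.

With the download_data_from_dataframe function, studies and series can be downloaded directly from the data of a given DataFrame. The columns that contain the study/series information can be specified. For an example of what the DataFrame should look like, see Example 2. A DataFrame is returned that lists the successfully downloaded study/series IDs, the deidentified IDs and the download folder names. In addition, a DataFrame with the failed studies is returned, together with the type of error and the traceback.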

from fhir_pyrate import DicomDownloader

auth = ...
# Initialize the Study Downloader
# Decide to download the data as NIfTis, set it to "dicom" for DICOMs
downloader = DicomDownloader(
  auth=auth,
  output_format="nifti",
  dicom_web_url=DICOM_WEB_URL, # Specify a URL of your DICOM Web Adapter
)

# Get some studies
df_studies = ...
# Download the series
successful_df, error_df = downloader.download_data_from_dataframe(
  df_studies,
  output_dir="out",
  study_uid_col="study_instance_uid",
  series_uid_col="series_instance_uid",
  download_full_study=False, # If we download the entire study, series_instance_uid will not be used
)

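Additionally, the download_data function can be used to download a single study or series given as parameters. In this case, the mapping information is returned as a list of dictionaries that can be used to build a mapping file.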

# Download only one series and get some download information
download_info = downloader.download_data(
  study_uid="1.2.826.0.1.3680043.8.498.24222694654806877939684038520520717689",
  series_uid="1.2.826.0.1.3680043.8.498.33463995182843850024561469634734635961",
  output_dir="out",
  save_metadata=True,
)
# Download only one full study
download_info_study = downloader.download_data(
  study_uid="1.2.826.0.1.3680043.8.498.24222694654806877939684038520520717689",
  series_uid=None,
  output_dir="out",
  save_metadata=True,
)

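Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".

Fork the project
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a pull request

Authors and acknowledgment

gakusai: initial idea, development, logo and figures
giuliabaldini: development, tests, new features

We would like to thank razorx89 for the endless knowledge.

License

This project is licensed under the MIT License.

Project status

The project is in active development.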

Release history

0.2.0b4 (pre-release): Sep 23, 2022
0.2.0b3 (pre-release): Sep 23, 2022
0.2.0b2 (pre-release): Sep 2, 2022
0.2.0b1 (pre-release): Aug 26, 2022
0.1.0 (this version): Aug 3, 2022
0.1.0b9 (pre-release): Jul 1, 2022
0.1.0b8 (pre-release): Jun 13, 2022
0.1.0b7 (pre-release): Jun 8, 2022
0.1.0b6 (pre-release): Jun 2, 2022
0.1.0b5 (pre-release): May 27, 2022
0.1.0b4 (pre-release): May 18, 2022
0.1.0b3
