
Quality control for phylogenetic pipelines using pytest

Project description


Phytest: Quality control for phylogenetic analyses.


Documentation: https://phytest-devs.github.io/phytest

Code: https://github.com/phytest-devs/phytest

Tutorials: https://github.com/phytest-devs?q=example


Installation

Install phytest using pip:

pip install phytest

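Quick start

Phytest is a tool for automating quality control checks on sequence, tree and metadata files during phylogenetic analyses. It ensures that a phylogenetic analysis meets user-defined quality control tests.

Here we will create example data files to run our tests on.

Create an alignment FASTA file example.fasta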

>Sequence_A
ATGAGATCCCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_B
ATGAGATCCCCGATAGCGAGCTAGXGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_C
ATGAGA--CCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_D
ATGAGATCCCCGATAGCGAGCTAGCGATNNNNNNNNNNNNNNNNNTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG

Create a tree file in Newick format, example.tree

(Sequence_A:1,Sequence_B:0.2,(Sequence_C:0.3,Sequence_D:0.4):0.5);
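To make the structure of this Newick string concrete, here is a minimal sketch that pulls the tip names and branch lengths out of it using only the Python standard library. The regular expression is an assumption that happens to fit this simple example (names made of letters, digits and underscores, every tip carrying a length). It is not a general Newick parser, and it is not how Phytest reads trees.

```python
import re

NEWICK = "(Sequence_A:1,Sequence_B:0.2,(Sequence_C:0.3,Sequence_D:0.4):0.5);"

# A tip is written as "name:length". The internal branch ":0.5" follows a
# closing parenthesis rather than a name, so this pattern skips it.
TIP_PATTERN = re.compile(r"(\w+):([0-9.]+)")

# Map each tip name to its branch length.
tips = {name: float(length) for name, length in TIP_PATTERN.findall(NEWICK)}
print(tips)
```

The four names recovered here are the same as the four sequence names in example.fasta, which is what the name-matching constraint in the next section relies on.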

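Writing a test file

We want to enforce the following constraints on our data:

  1. The alignment has 4 sequences

  2. The sequences have a length of 100

  3. The sequences only contain the characters A, T, G, C, N and -

  4. The sequences are only allowed to contain single-base deletions

  5. The longest stretch of Ns is 10

  6. The tree has 4 tips

  7. The tree is bifurcating

  8. The alignment and the tree have the same names

  9. All internal branches are longer than the given threshold

  10. There are no outlier branches in the tree

We can write these tests in a Python file example.py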

from phytest import Alignment, Sequence, Tree


def test_alignment_has_4_sequences(alignment: Alignment):
    alignment.assert_length(4)


def test_alignment_has_a_width_of_100(alignment: Alignment):
    alignment.assert_width(100)


def test_sequences_only_contains_the_characters(sequence: Sequence):
    sequence.assert_valid_alphabet(alphabet="ATGCN-")


def test_single_base_deletions(sequence: Sequence):
    sequence.assert_longest_stretch_gaps(max=1)


def test_longest_stretch_of_Ns_is_10(sequence: Sequence):
    sequence.assert_longest_stretch_Ns(max=10)


def test_tree_has_4_tips(tree: Tree):
    tree.assert_number_of_tips(4)


def test_tree_is_bifurcating(tree: Tree):
    tree.assert_is_bifurcating()


def test_aln_tree_match_names(alignment: Alignment, tree: Tree):
    aln_names = [i.name for i in alignment]
    tree.assert_tip_names(aln_names)


def test_all_internal_branches_lengths_above_threshold(tree: Tree, threshold=1e-4):
    tree.assert_internal_branch_lengths(min=threshold)


def test_outlier_branches(tree: Tree):
    # Here we create a custom function to detect outliers
    import statistics

    tips = tree.get_terminals()
    branch_lengths = [t.branch_length for t in tips]
    cut_off = statistics.mean(branch_lengths) + statistics.stdev(branch_lengths)
    for tip in tips:
        assert tip.branch_length < cut_off, f"Outlier tip '{tip.name}' (branch length = {tip.branch_length})!"
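The custom outlier test above flags any tip whose branch length is at least the mean plus one sample standard deviation of all tip branch lengths. As a worked example, the sketch below applies the same rule to the four tip lengths from example.tree without using Phytest. The dictionary `tip_lengths` is typed in by hand here for illustration.

```python
import statistics

# Tip branch lengths copied by hand from example.tree.
tip_lengths = {
    "Sequence_A": 1.0,
    "Sequence_B": 0.2,
    "Sequence_C": 0.3,
    "Sequence_D": 0.4,
}

lengths = list(tip_lengths.values())
mean = statistics.mean(lengths)
stdev = statistics.stdev(lengths)  # sample standard deviation (n - 1)
cut_off = mean + stdev

# The test asserts branch_length < cut_off, so anything >= cut_off fails.
outliers = [name for name, length in tip_lengths.items() if length >= cut_off]
print(f"mean={mean:.3f} stdev={stdev:.3f} cut_off={cut_off:.3f} outliers={outliers}")
```

With these values the cut-off is roughly 0.83, so only Sequence_A (branch length 1.0) exceeds it. With so few tips a single long branch also inflates the mean and standard deviation, so a stricter or more robust rule may be preferable for real data.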

Running Phytest

We can then run these tests on our data with phytest:

phytest examples/example.py -s examples/data/example.fasta -t examples/data/example.tree

Generate a report by adding --report report.html

From the output we can see that several tests failed:

FAILED examples/example.py::test_sequences_only_contains_the_characters[Sequence_B] - AssertionError: Invalid pattern found in 'Sequence_B'!
FAILED examples/example.py::test_single_base_deletions[Sequence_C] - AssertionError: Longest stretch of '-' in 'Sequence_C' > 1!
FAILED examples/example.py::test_longest_stretch_of_Ns_is_10[Sequence_D] - AssertionError: Longest stretch of 'N' in 'Sequence_D' > 10!
FAILED examples/example.py::test_outlier_branches - AssertionError: Outlier tip 'Sequence_A' (branch length = 1.0)!

Results (0.07s):
    15 passed
    4 failed
        - examples/example.py:12 test_sequences_only_contains_the_characters[Sequence_B]
        - examples/example.py:16 test_single_base_deletions[Sequence_C]
        - examples/example.py:20 test_longest_stretch_of_Ns_is_10[Sequence_D]
        - examples/example.py:32 test_outlier_branches
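The three per-sequence failures can be reproduced by hand. The sketch below is a plain-Python approximation of the alphabet, gap and N checks applied to the sequences from example.fasta. It is not Phytest's implementation, and the helper `longest_run` is a hypothetical name introduced only for this illustration.

```python
import re

# Sequences copied from example.fasta.
SEQUENCES = {
    "Sequence_A": "ATGAGATCCCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG",
    "Sequence_B": "ATGAGATCCCCGATAGCGAGCTAGXGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG",
    "Sequence_C": "ATGAGA--CCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG",
    "Sequence_D": "ATGAGATCCCCGATAGCGAGCTAGCGATNNNNNNNNNNNNNNNNNTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG",
}


def longest_run(sequence: str, char: str) -> int:
    """Return the length of the longest run of `char` (0 if it is absent)."""
    runs = re.findall(f"{re.escape(char)}+", sequence)
    return max((len(run) for run in runs), default=0)


failures = []
for name, seq in SEQUENCES.items():
    if set(seq) - set("ATGCN-"):          # constraint 3: allowed alphabet
        failures.append((name, "invalid character"))
    if longest_run(seq, "-") > 1:          # constraint 4: single-base deletions only
        failures.append((name, "gap stretch > 1"))
    if longest_run(seq, "N") > 10:         # constraint 5: at most 10 consecutive Ns
        failures.append((name, "N stretch > 10"))

print(failures)
```

Sequence_B contains an X, Sequence_C has a two-base gap, and Sequence_D has a run of more than 10 Ns, which matches the three sequence failures reported above. The total of 19 tests is consistent with the three per-sequence tests each running once for every one of the 4 sequences (12 tests), plus 2 alignment tests and 5 tree tests.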

For more information, see the documentation: https://phytest-devs.github.io/phytest
