Quality control for phylogenetic pipelines using pytest
Phytest: Quality control for phylogenetic analyses.
Documentation: https://phytest-devs.github.io/phytest
Code: https://github.com/phytest-devs/phytest
Tutorials: https://github.com/phytest-devs?q=example
Installation
Install phytest using pip:
pip install phytest
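Quick Start
Phytest is a tool for automating quality control checks on sequence, tree and metadata files during phylogenetic analyses. It ensures that a phylogenetic analysis meets user-defined quality control tests.
Here we create example data files to run our tests on.
Create an alignment fasta file example.fasta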
>Sequence_A
ATGAGATCCCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_B
ATGAGATCCCCGATAGCGAGCTAGXGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_C
ATGAGA--CCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_D
ATGAGATCCCCGATAGCGAGCTAGCGATNNNNNNNNNNNNNNNNNTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
Create a tree newick file example.tree
(Sequence_A:1,Sequence_B:0.2,(Sequence_C:0.3,Sequence_D:0.4):0.5);
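Before writing tests, it can help to confirm that the alignment file was saved as intended. The following is a minimal standard-library sketch and is not part of Phytest: the helper name `read_fasta` is hypothetical, and the script assumes example.fasta is in the current working directory.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a dict mapping record name to sequence.

    Hypothetical helper for illustration; it is not part of Phytest.
    """
    records = {}
    name = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # name is the first word after '>'
            records[name] = ""
        elif name is not None:
            records[name] += line  # a sequence may span several lines
    return records


if __name__ == "__main__":
    fasta = Path("example.fasta")  # assumed location of the file created above
    if fasta.exists():
        for name, seq in read_fasta(fasta).items():
            print(name, len(seq))
```

Each record should be listed once, with the same length for every sequence, since the file is meant to be an alignment.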
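Writing a test file
We want to enforce the following constraints on our data:
- The alignment has 4 sequences
- The sequences have a length of 100
- The sequences only contain the characters A, T, G, C, N and -
- The sequences are allowed to contain only single-base deletions
- The longest stretch of Ns is 10
- The tree has 4 tips
- The tree is bifurcating
- The alignment and the tree have the same names
- All internal branches are longer than a given threshold
- There are no outlier branches in the tree
We can write these tests in a Python file example.py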
from phytest import Alignment, Sequence, Tree


def test_alignment_has_4_sequences(alignment: Alignment):
    alignment.assert_length(4)


def test_alignment_has_a_width_of_100(alignment: Alignment):
    alignment.assert_width(100)


def test_sequences_only_contains_the_characters(sequence: Sequence):
    sequence.assert_valid_alphabet(alphabet="ATGCN-")


def test_single_base_deletions(sequence: Sequence):
    sequence.assert_longest_stretch_gaps(max=1)


def test_longest_stretch_of_Ns_is_10(sequence: Sequence):
    sequence.assert_longest_stretch_Ns(max=10)


def test_tree_has_4_tips(tree: Tree):
    tree.assert_number_of_tips(4)


def test_tree_is_bifurcating(tree: Tree):
    tree.assert_is_bifurcating()


def test_aln_tree_match_names(alignment: Alignment, tree: Tree):
    aln_names = [i.name for i in alignment]
    tree.assert_tip_names(aln_names)


def test_all_internal_branches_lengths_above_threshold(tree: Tree, threshold=1e-4):
    tree.assert_internal_branch_lengths(min=threshold)


def test_outlier_branches(tree: Tree):
    # Here we create a custom function to detect outliers
    import statistics

    tips = tree.get_terminals()
    branch_lengths = [t.branch_length for t in tips]
    cut_off = statistics.mean(branch_lengths) + statistics.stdev(branch_lengths)
    for tip in tips:
        assert tip.branch_length < cut_off, f"Outlier tip '{tip.name}' (branch length = {tip.branch_length})!"
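To see what the custom outlier test does on this tree, the cut-off can be worked out by hand. The sketch below uses only the standard library and copies the four tip branch lengths from example.tree into a plain dictionary, so it does not need Phytest or a tree parser. The variable names are illustrative.

```python
import statistics

# Tip branch lengths copied from example.tree.
tip_lengths = {
    "Sequence_A": 1.0,
    "Sequence_B": 0.2,
    "Sequence_C": 0.3,
    "Sequence_D": 0.4,
}

values = list(tip_lengths.values())
# Same rule as test_outlier_branches: mean plus one sample standard deviation.
cut_off = statistics.mean(values) + statistics.stdev(values)

# The test asserts branch_length < cut_off, so anything at or above it fails.
outliers = [name for name, length in tip_lengths.items() if length >= cut_off]
print(f"cut-off = {cut_off:.3f}")
print("outliers:", outliers)
```

The mean is 0.475 and the sample standard deviation is about 0.359, giving a cut-off of about 0.834. Only Sequence_A, with a branch length of 1.0, lies above it.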
Running Phytest
We can then run these tests on our data with phytest:
phytest examples/example.py -s examples/data/example.fasta -t examples/data/example.tree
Generate a report by adding --report report.html.
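When Phytest is one step in a larger pipeline, the same command can be launched from a script. This is a rough sketch, not an official Phytest interface: the helper `build_phytest_command` is hypothetical, it uses only the -s, -t and --report options shown here, and it assumes that a non-zero exit code means at least one test failed.

```python
import shutil
import subprocess


def build_phytest_command(test_file, sequence, tree, report=None):
    """Assemble the phytest command line as an argument list.

    Hypothetical helper; only the -s, -t and --report options from this
    page are used.
    """
    command = ["phytest", test_file, "-s", sequence, "-t", tree]
    if report is not None:
        command += ["--report", report]
    return command


if __name__ == "__main__":
    command = build_phytest_command(
        "examples/example.py",
        "examples/data/example.fasta",
        "examples/data/example.tree",
        report="report.html",
    )
    if shutil.which("phytest") is None:
        print("phytest is not installed; would run:", " ".join(command))
    else:
        # Assumption: a non-zero return code signals at least one failed test.
        result = subprocess.run(command)
        print("exit code:", result.returncode)
```

Building the command as a list avoids shell quoting problems when file paths contain spaces.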
Running the phytest command on the example data shows that several tests fail:
FAILED examples/example.py::test_sequences_only_contains_the_characters[Sequence_B] - AssertionError: Invalid pattern found in 'Sequence_B'!
FAILED examples/example.py::test_single_base_deletions[Sequence_C] - AssertionError: Longest stretch of '-' in 'Sequence_C' > 1!
FAILED examples/example.py::test_longest_stretch_of_Ns_is_10[Sequence_D] - AssertionError: Longest stretch of 'N' in 'Sequence_D' > 10!
FAILED examples/example.py::test_outlier_branches - AssertionError: Outlier tip 'Sequence_A' (branch length = 1.0)!
Results (0.07s):
15 passed
4 failed
- examples/example.py:12 test_sequences_only_contains_the_characters[Sequence_B]
- examples/example.py:16 test_single_base_deletions[Sequence_C]
- examples/example.py:20 test_longest_stretch_of_Ns_is_10[Sequence_D]
- examples/example.py:32 test_outlier_branches
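The three sequence failures correspond to an unexpected character, a gap longer than one base, and a run of Ns longer than 10. The following sketch shows one way such checks could be expressed in plain Python. It is not Phytest's implementation: the helpers `longest_run` and `invalid_characters` are hypothetical, and the sequences are short made-up strings rather than the example data.

```python
import re


def longest_run(sequence, character):
    """Return the length of the longest unbroken run of `character`."""
    runs = re.findall(f"{re.escape(character)}+", sequence)
    return max((len(run) for run in runs), default=0)


def invalid_characters(sequence, alphabet="ATGCN-"):
    """Return the sorted characters of `sequence` that are not in `alphabet`."""
    return sorted(set(sequence) - set(alphabet))


# Short made-up sequences that mimic the three kinds of failure.
toy = {
    "bad_character": "ATGXGA",
    "two_base_gap": "ATG--A",
    "long_n_run": "AT" + "N" * 11 + "GA",
}

for name, seq in toy.items():
    print(
        name,
        "invalid:", invalid_characters(seq),
        "longest gap:", longest_run(seq, "-"),
        "longest N run:", longest_run(seq, "N"),
    )
```

Comparing these values against the limits used in example.py (alphabet "ATGCN-", gaps of at most 1, N runs of at most 10) would flag each toy sequence for exactly one reason.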
For more information, see the documentation at https://phytest-devs.github.io/phytest.