pytest_wdl.data_types package

Submodules

pytest_wdl.data_types.bam module

Convert BAM to SAM for diff.

class pytest_wdl.data_types.bam.BamDataFile(local_path: pathlib.Path, localizer: Optional[pytest_wdl.localizers.Localizer] = None, **compare_opts)[source]

Bases: pytest_wdl.data_types.DataFile

Supports comparing BAM files. This uses pysam to convert each BAM to SAM, so that DataFile can carry out a regular diff on the resulting SAM files.

class pytest_wdl.data_types.bam.Sorting[source]

Bases: enum.Enum

An enumeration.

COORDINATE = 1
NAME = 2
NONE = 0

pytest_wdl.data_types.bam.assert_bam_files_equal(file1: pathlib.Path, file2: pathlib.Path, allowed_diff_lines: int = 0, min_mapq: int = 0, compare_tag_columns: bool = False)[source]

Compare two BAM files:
  • Convert them to SAM format.

  • Optionally re-sort the files by chromosome, position, and flag.

  • First, compare all lines using only a subset of columns that should be deterministic.

  • Next, filter the files by MAPQ and compare the remaining rows using all columns.

Parameters
  • file1 – First BAM to compare

  • file2 – Second BAM to compare

  • allowed_diff_lines – Number of lines by which the BAMs are allowed to differ (after being converted to SAM)

  • min_mapq – Minimum MAPQ used to filter reads when comparing all columns

  • compare_tag_columns – Whether to include tag columns (12+) when comparing all columns
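The two comparison passes described above can be sketched in plain Python on SAM text lines. This is a minimal illustration, not the pytest_wdl implementation: the helper names (`two_pass_diff`, `select_columns`, `filter_by_mapq`, `count_diff_rows`) are hypothetical, and the choice of QNAME, FLAG, RNAME and POS as the "deterministic" columns is an assumption. Conversion from BAM and re-sorting are left out.

```python
from typing import List, Sequence, Tuple

# Assumption: 0-based indices of QNAME, FLAG, RNAME and POS stand in for the
# "deterministic" columns; the subset pytest_wdl actually uses may differ.
DETERMINISTIC_COLUMNS = (0, 1, 2, 3)
MAPQ_COLUMN = 4             # MAPQ is the fifth mandatory SAM field
NUM_MANDATORY_COLUMNS = 11  # tag columns start at column 12


def count_diff_rows(rows1: Sequence[tuple], rows2: Sequence[tuple]) -> int:
    """Count row positions that differ, plus any difference in length."""
    mismatched = sum(1 for a, b in zip(rows1, rows2) if a != b)
    return mismatched + abs(len(rows1) - len(rows2))


def select_columns(lines: Sequence[str], columns: Sequence[int]) -> List[tuple]:
    """Keep only the given columns of each tab-separated line."""
    rows = []
    for line in lines:
        fields = line.split("\t")
        rows.append(tuple(fields[i] for i in columns))
    return rows


def filter_by_mapq(
    lines: Sequence[str], min_mapq: int, compare_tag_columns: bool
) -> List[tuple]:
    """Drop reads below min_mapq; optionally drop the tag columns as well."""
    rows = []
    for line in lines:
        fields = line.split("\t")
        if int(fields[MAPQ_COLUMN]) < min_mapq:
            continue
        if not compare_tag_columns:
            fields = fields[:NUM_MANDATORY_COLUMNS]
        rows.append(tuple(fields))
    return rows


def two_pass_diff(
    sam1: Sequence[str],
    sam2: Sequence[str],
    min_mapq: int = 0,
    compare_tag_columns: bool = False,
) -> Tuple[int, int]:
    """Return (differences in pass 1, differences in pass 2)."""
    first = count_diff_rows(
        select_columns(sam1, DETERMINISTIC_COLUMNS),
        select_columns(sam2, DETERMINISTIC_COLUMNS),
    )
    second = count_diff_rows(
        filter_by_mapq(sam1, min_mapq, compare_tag_columns),
        filter_by_mapq(sam2, min_mapq, compare_tag_columns),
    )
    return first, second


# Two alignments of the same reads; only the low-MAPQ read differs.
run_a = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1",
]
run_b = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
    "r2\t0\tchr1\t200\t5\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:2",
]

unfiltered = two_pass_diff(run_a, run_b, min_mapq=0)
filtered = two_pass_diff(run_a, run_b, min_mapq=20)
```

In this sketch, raising `min_mapq` removes the ambiguous low-quality read from the second pass, which is the situation the MAPQ filter appears designed for.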

pytest_wdl.data_types.bam.bam_to_sam(input_bam: pathlib.Path, output_sam: pathlib.Path, headers: Optional[Iterable[str]] = ('HD', 'SQ', 'RG'), min_mapq: Optional[int] = None, sorting: pytest_wdl.data_types.bam.Sorting = <Sorting.NONE: 0>)[source]

Use pysam to convert BAM to SAM.

pytest_wdl.data_types.bam.diff_bam_columns(file1: pathlib.Path, file2: pathlib.Path, columns: str) → int[source]

pytest_wdl.data_types.json module

class pytest_wdl.data_types.json.JsonDataFile(local_path: pathlib.Path, localizer: Optional[pytest_wdl.localizers.Localizer] = None, **compare_opts)[source]

Bases: pytest_wdl.data_types.DataFile

pytest_wdl.data_types.vcf module

Some tools that generate VCF files (callers) produce slightly different QUAL scores and other floating-point-valued fields when run on different hardware. This handler ignores the QUAL and INFO columns and compares only the genotype (GT) field of the sample columns. It works only for single-sample VCFs.

class pytest_wdl.data_types.vcf.VcfDataFile(local_path: pathlib.Path, localizer: Optional[pytest_wdl.localizers.Localizer] = None, **compare_opts)[source]

Bases: pytest_wdl.data_types.DataFile

pytest_wdl.data_types.vcf.diff_vcf_columns(file1: pathlib.Path, file2: pathlib.Path, compare_phase: bool = False) → int[source]
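The idea behind this handler, as described in the module summary, can be sketched as follows. This is an illustrative reduction of single-sample VCF records to comparable rows, not the library's code: `genotype_rows` is a hypothetical helper, and treating `compare_phase=False` as "ignore both the phasing separator and the allele order" is an assumption about what that option means.

```python
import re
from typing import Iterable, List, Tuple


def genotype_rows(
    vcf_lines: Iterable[str], compare_phase: bool = False
) -> List[Tuple[str, str, str, str, str]]:
    """Reduce single-sample VCF records to (CHROM, POS, REF, ALT, GT)."""
    rows = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # QUAL (6), FILTER (7) and INFO (8) are deliberately not read.
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        gt = sample_values[format_keys.index("GT")]
        if not compare_phase:
            # Assumption: without phase, 0|1, 1|0, 0/1 and 1/0 are equivalent.
            gt = "/".join(sorted(re.split(r"[/|]", gt)))
        rows.append((chrom, pos, ref, alt, gt))
    return rows


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1"
call_a = "chr1\t100\t.\tA\tG\t50.0\tPASS\tDP=10\tGT:DP\t0|1:10"
call_b = "chr1\t100\t.\tA\tG\t49.7\tPASS\tDP=12\tGT:DP\t1/0:12"

same_genotype = genotype_rows([header, call_a]) == genotype_rows([header, call_b])
same_with_phase = genotype_rows([header, call_a], compare_phase=True) == genotype_rows(
    [header, call_b], compare_phase=True
)
```

Here the two records differ in QUAL, INFO and the DP sample field, yet reduce to the same row unless phase is compared.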

Module contents

class pytest_wdl.data_types.DataFile(local_path: pathlib.Path, localizer: Optional[pytest_wdl.localizers.Localizer] = None, **compare_opts)[source]

Bases: object

A data file, which may be local, remote, or represented as a string.

Parameters
  • local_path – Path where the data file should exist after being localized.

  • localizer – Localizer object, for persisting the file on the local disk.

  • allowed_diff_lines – Number of lines by which the file is allowed to differ from another and still be considered equal.

  • compare_opts – Additional type-specific comparison options.

assert_contents_equal(other: Union[str, pathlib.Path, DataFile]) → None[source]

Assert the contents of two files are equal.

If allowed_diff_lines == 0, the files are compared using MD5 hashes; otherwise, their contents are compared using the Linux diff command.

Parameters

other – A DataFile, or a file path given as a string or pathlib.Path.

Raises

AssertionError if the files are different.
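The switch between hash comparison and line-based comparison can be illustrated with the standard library alone. This is a rough sketch of the strategy, not the method's actual code: `assert_contents_equal_sketch`, `md5_of` and `count_diff_lines` are hypothetical names, and `difflib` stands in for the diff command, so the line counts may not match what diff reports.

```python
import difflib
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hex MD5 digest of the file's bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def count_diff_lines(path1: Path, path2: Path) -> int:
    """Count lines that appear on only one side of the comparison."""
    lines1 = path1.read_text().splitlines()
    lines2 = path2.read_text().splitlines()
    diff = difflib.ndiff(lines1, lines2)
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


def assert_contents_equal_sketch(
    path1: Path, path2: Path, allowed_diff_lines: int = 0
) -> None:
    if allowed_diff_lines == 0:
        # Exact match required: comparing digests is enough.
        assert md5_of(path1) == md5_of(path2), "files differ"
    else:
        # Some slack allowed: count the differing lines instead.
        n = count_diff_lines(path1, path2)
        assert n <= allowed_diff_lines, f"{n} lines differ"


with tempfile.TemporaryDirectory() as tmp:
    expected = Path(tmp) / "expected.txt"
    actual = Path(tmp) / "actual.txt"
    expected.write_text("alpha\nbeta\ngamma\n")
    actual.write_text("alpha\nBETA\ngamma\n")

    same_md5 = md5_of(expected) == md5_of(actual)
    n_diff = count_diff_lines(expected, actual)
    # One changed line counts as one removal plus one addition here.
    assert_contents_equal_sketch(expected, actual, allowed_diff_lines=2)
```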

property path

set_compare_opts(**kwargs)[source]

Update comparison options.

Parameters

**kwargs – Comparison options to update.

class pytest_wdl.data_types.DefaultDataFile(local_path: pathlib.Path, localizer: Optional[pytest_wdl.localizers.Localizer] = None, **compare_opts)[source]

Bases: pytest_wdl.data_types.DataFile

pytest_wdl.data_types.assert_binary_files_equal(file1: pathlib.Path, file2: pathlib.Path, digest: str = 'md5') → None[source]

pytest_wdl.data_types.assert_text_files_equal(file1: pathlib.Path, file2: pathlib.Path, allowed_diff_lines: int = 0, diff_fn: Callable[[pathlib.Path, pathlib.Path], int] = <function diff_default>) → None[source]

pytest_wdl.data_types.compare_gzip(file1: pathlib.Path, file2: pathlib.Path)[source]

pytest_wdl.data_types.diff_default(file1: pathlib.Path, file2: pathlib.Path) → int[source]

Default diff command.

Parameters
  • file1 – First file to compare

  • file2 – Second file to compare

Returns

Number of different lines.
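Because the `diff_fn` parameter of `assert_text_files_equal` is typed as `Callable[[pathlib.Path, pathlib.Path], int]`, any function with the same shape as `diff_default` should be usable in its place. The following is a sketch of one such function; `diff_ignoring_blank_lines` is a hypothetical example, not part of pytest_wdl, and it uses `difflib` rather than an external diff command.

```python
import difflib
import tempfile
from pathlib import Path
from typing import List


def diff_ignoring_blank_lines(file1: Path, file2: Path) -> int:
    """Hypothetical diff function: count differing lines, skipping blank ones."""

    def meaningful_lines(path: Path) -> List[str]:
        return [line for line in path.read_text().splitlines() if line.strip()]

    diff = difflib.ndiff(meaningful_lines(file1), meaningful_lines(file2))
    # ndiff marks one-sided lines with a "- " or "+ " prefix.
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


with tempfile.TemporaryDirectory() as tmp:
    spaced = Path(tmp) / "spaced.txt"
    compact = Path(tmp) / "compact.txt"
    changed = Path(tmp) / "changed.txt"
    spaced.write_text("x\n\ny\n")
    compact.write_text("x\ny\n\n\n")
    changed.write_text("x\nz\n")

    blank_only = diff_ignoring_blank_lines(spaced, compact)
    real_change = diff_ignoring_blank_lines(spaced, changed)
    # With pytest_wdl installed, this function could presumably be passed as
    # diff_fn=diff_ignoring_blank_lines to assert_text_files_equal.
```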