Count

A simple example showing how the count task works. This task is also available from the command line.

[1] -

from os import getcwd
from os.path import dirname
from openvariant import count

dataset_folder = f'{dirname(getcwd())}/datasets/sample2'
annotation_path = f'{dirname(getcwd())}/datasets/sample2/annotation.yaml'

The count task counts the number of rows in the parsed result. It takes the following parameters:

base_path - Input path to explore and parse.
annotation_path - Path of the annotation file.
group_by - Key to group rows.
where - Filter expression.
cores - Maximum processes to run in parallel.
quite - Do not show progress while the parsing is running.
skip_files - Skip unreadable files and directories.

The following example shows a general use of the count task:

[2] -

result = count(base_path=dataset_folder, annotation_path=annotation_path, quite=True)
print(f"Total: {result[0]}")

Total: 18

One of the parameters of the count task is where, which applies a conditional filter to the rows. The supported operators are:

== - Equal.
!= - Not equal.
<= - Less than or equal to.
< - Less than.
>= - Greater than or equal to.
> - Greater than.
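
To make the meaning of these operators concrete, here is a minimal plain-Python sketch of a single-condition filter. It is not how OpenVariant parses or evaluates where expressions; the helper filter_rows, the OPERATORS table, and the sample rows (including the POSITION values) are all hypothetical and exist only for illustration.

```python
import operator

# The comparison operators listed above, mapped to Python's built-in equivalents.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
}


def filter_rows(rows, key, op, value):
    """Keep the rows whose `key` field satisfies `row[key] <op> value`."""
    compare = OPERATORS[op]
    return [row for row in rows if key in row and compare(row[key], value)]


# Hypothetical rows, invented for illustration only.
rows = [
    {"SYMBOL": "ATAD3C", "CANCER": "ACC", "POSITION": 1385000},
    {"SYMBOL": "ATAD3C", "CANCER": "MESO", "POSITION": 1392000},
    {"SYMBOL": "TP53", "CANCER": "ACC", "POSITION": 7670000},
]

matching = filter_rows(rows, "SYMBOL", "==", "ATAD3C")
print(len(matching))
```

A row is kept only when the comparison holds, so a filter such as SYMBOL == 'ATAD3C' reduces the rows that are counted.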

In addition, group_by groups the rows by the distinct values of the given key. The following example uses both parameters:

[3] -

result = count(base_path=dataset_folder, annotation_path=annotation_path, where="SYMBOL == 'ATAD3C'", group_by="CANCER", quite=True)
print(f"Total: {result[0]}")
print(f"Groups and count: {result[1]}")

Total: 2
Groups and count: {'MESO': 1, 'ACC': 1}
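
The shape of the result above (the total at index 0, a mapping from group value to count at index 1) can be sketched in plain Python. This is only an illustration of the grouping semantics, not OpenVariant's implementation: count_rows and the sample rows are hypothetical, and returning an empty dict when group_by is omitted is an assumption of this sketch, not documented behaviour.

```python
from collections import Counter


def count_rows(rows, group_by=None):
    """Return (total, groups) for an iterable of row dicts."""
    rows = list(rows)
    groups = {}
    if group_by is not None:
        # Tally the rows per distinct value of the grouping key.
        groups = dict(Counter(row[group_by] for row in rows if group_by in row))
    return len(rows), groups


# Hypothetical rows that already passed the where filter.
rows = [
    {"SYMBOL": "ATAD3C", "CANCER": "MESO"},
    {"SYMBOL": "ATAD3C", "CANCER": "ACC"},
]

total, groups = count_rows(rows, group_by="CANCER")
print(f"Total: {total}")
print(f"Groups and count: {groups}")
```

Unpacking the pair into total and groups reads more clearly than indexing with result[0] and result[1].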
