Count

A simple example showing how the count task works. The task is also available from the command line.

[1] - 
from os import getcwd
from os.path import dirname
from openvariant import count

dataset_folder = f'{dirname(getcwd())}/datasets/sample2'
annotation_path = f'{dirname(getcwd())}/datasets/sample2/annotation.yaml'

The count task counts the number of rows in the parsed result. It accepts the following parameters:

  • base_path - Input path to explore and parse.

  • annotation_path - Path of the annotation file.

  • group_by - Key to group rows.

  • where - Filter expression.

  • cores - Maximum processes to run in parallel.

  • quite - Do not show progress while the parsing is running.

  • skip_files - Skip unreadable files and directories.

The following example shows a general case of the count task:

[2] - 
result = count(base_path=dataset_folder, annotation_path=annotation_path, quite=True)
print(f"Total: {result[0]}")
Total: 18

One of the parameters of the count task is where, which applies a conditional filter to the rows. The supported operators are:

  • == - Equal.

  • != - Not equal.

  • <= - Less than or equal to.

  • < - Less than.

  • >= - Greater than or equal to.

  • > - Greater than.
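
To make the operator semantics concrete, the sketch below evaluates a single KEY OP VALUE condition against a row held as a plain dictionary. It is an illustration only and not OpenVariant's own parser: the OPERATORS table, the matches helper and the sample rows are hypothetical, and values are compared as given, with no type coercion.

```python
import operator

# Hypothetical lookup table: one entry per operator listed above.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
}


def matches(row, key, op, value):
    """Return True if row[key] satisfies the comparison `op` against `value`."""
    return OPERATORS[op](row[key], value)


# Made-up rows for illustration; not the contents of the sample dataset.
rows = [
    {"SYMBOL": "ATAD3C", "SCORE": 10},
    {"SYMBOL": "OTHER_GENE", "SCORE": 20},
]

# Keep only the rows whose SYMBOL equals 'ATAD3C'.
selected = [row for row in rows if matches(row, "SYMBOL", "==", "ATAD3C")]
print(len(selected))
```

A lookup table keeps the mapping between operator text and comparison in one place, so each of the six operators is handled by the same code path.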

In addition, group_by groups the rows by the distinct values of the given key. The following example uses both parameters:

[3] - 
result = count(base_path=dataset_folder, annotation_path=annotation_path, where="SYMBOL == 'ATAD3C'", group_by="CANCER", quite=True)
print(f"Total: {result[0]}")
print(f"Groups and count: {result[1]}")
Total: 2
Groups and count: {'MESO': 1, 'ACC': 1}
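
As a rough picture of what filtering and grouping amount to, the following pure-Python sketch filters a list of dictionaries and tallies the remaining rows per group. It assumes only the result shape visible in the output above (a total at index 0 and a dictionary of per-group counts at index 1). The count_rows helper and its rows are hypothetical, and returning an empty dictionary when no group key is given is a choice of this sketch, not documented OpenVariant behaviour.

```python
from collections import Counter


def count_rows(rows, predicate=None, group_key=None):
    """Count rows passing `predicate`, optionally tallied by `group_key`.

    Returns (total, groups), mirroring the shape of the result printed above.
    """
    kept = [row for row in rows if predicate is None or predicate(row)]
    # Counter keeps first-seen order, so groups appear as they are encountered.
    groups = dict(Counter(row[group_key] for row in kept)) if group_key else {}
    return len(kept), groups


# Made-up rows for illustration; not the contents of the sample dataset.
rows = [
    {"SYMBOL": "ATAD3C", "CANCER": "MESO"},
    {"SYMBOL": "ATAD3C", "CANCER": "ACC"},
    {"SYMBOL": "OTHER_GENE", "CANCER": "ACC"},
]

total, groups = count_rows(rows, lambda r: r["SYMBOL"] == "ATAD3C", "CANCER")
print(f"Total: {total}")
print(f"Groups and count: {groups}")
```

In this sketch the filter runs before the grouping, so the per-group counts always add up to the total.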