Count
A simple example showing how the count task works. This task is also available from the command line.
[1] -
from os import getcwd
from os.path import dirname
from openvariant import count
dataset_folder = f'{dirname(getcwd())}/datasets/sample2'
annotation_path = f'{dirname(getcwd())}/datasets/sample2/annotation.yaml'
The count task counts the number of rows in the parsed result. It takes the following parameters:
base_path - Input path to explore and parse.
annotation_path - Path of the annotation file.
group_by - Key to group rows by.
where - Filter expression.
cores - Maximum number of processes to run in parallel.
quite - Do not show progress while the parsing is running.
skip_files - Skip unreadable files and directories.
The following example shows a general case of the count task:
[2] -
result = count(base_path=dataset_folder, annotation_path=annotation_path, quite=True)
print(f"Total: {result[0]}")
Total: 18
One of the parameters of the count task is where, which applies a conditional filter to the rows before they are counted. The supported operators are:
== - Equal to.
!= - Not equal to.
<= - Less than or equal to.
< - Less than.
>= - Greater than or equal to.
> - Greater than.
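To make the meaning of these operators concrete, here is a minimal plain-Python sketch of the same six comparisons applied to rows held as dictionaries. It is not how OpenVariant evaluates a where expression; the matches helper, the POSITION field, and the row values are hypothetical and exist only for illustration.

```python
import operator

# Map each comparison symbol listed above to its Python equivalent.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
}


def matches(row, key, symbol, value):
    """Hypothetical helper: True when `row[key] <symbol> value` holds."""
    return OPERATORS[symbol](row[key], value)


# Hypothetical rows, invented for illustration only.
rows = [
    {"SYMBOL": "ATAD3C", "POSITION": 100},
    {"SYMBOL": "ATAD3C", "POSITION": 250},
    {"SYMBOL": "OTHER", "POSITION": 300},
]

# Keep only the rows that satisfy each condition.
equal_rows = [row for row in rows if matches(row, "SYMBOL", "==", "ATAD3C")]
upper_rows = [row for row in rows if matches(row, "POSITION", ">=", 250)]

print(len(equal_rows), len(upper_rows))
```

With the count task itself, the condition is instead written as a single string, as in the where argument of the next example.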
In addition, group_by groups the rows by the distinct values of the given key and counts each group separately. The following example uses both parameters:
[3] -
result = count(base_path=dataset_folder, annotation_path=annotation_path, where="SYMBOL == 'ATAD3C'", group_by="CANCER", quite=True)
print(f"Total: {result[0]}")
print(f"Groups and count: {result[1]}")
Total: 2
Groups and count: {'MESO': 1, 'ACC': 1}
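As a small follow-up, the sketch below shows one way to post-process a grouped result. It assumes the result can be unpacked into a total and a dictionary of per-group counts, as the result[0] and result[1] indexing above suggests. The values are copied from the output shown above so the snippet runs without OpenVariant; the ranked variable is introduced here for illustration.

```python
# Values copied from the output above; in practice they would come from count(...).
result = (2, {'MESO': 1, 'ACC': 1})

# Assumption: the first element is the total, the second the per-group counts.
total, groups = result

# Order groups by descending count, breaking ties alphabetically by group name.
ranked = sorted(groups.items(), key=lambda item: (-item[1], item[0]))

for name, rows_in_group in ranked:
    # Report each group's count and its share of the filtered total.
    print(f"{name}: {rows_in_group} ({rows_in_group / total:.0%})")
```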