Group by
A simple example showing how the group by task works. This task is also available from the command line.
[1] -
from os.path import dirname
from os import getcwd
from openvariant import group_by
dataset_folder = f'{dirname(getcwd())}/datasets/sample2'
annotation_path = f'{dirname(getcwd())}/datasets/sample2/annotation.yaml'
The group_by task groups the parsed rows by the value of an output field. It accepts the following parameters:
base_path - Input path to explore and parse.
annotation_path - Path of the annotation file.
script - Command line to execute on the result of the parsing.
key_by - Key used to group rows.
where - Filter expression.
cores - Maximum number of processes to run in parallel.
quite - Do not show progress while the parsing is running.
header - Show the header in the result.
skip_files - Skip unreadable files and directories.
The following example shows a general case of the group by task:
[2] -
for group, values, script_used in group_by(base_path=dataset_folder,
                                           annotation_path=annotation_path,
                                           script=None, key_by="CANCER", quite=True):
    print(f'Group: {group}')
    for row in values:
        print(row)
    print("\n")
Group: MESO
ACAP3 1p36.33 MESO
ACTRT2 1p36.32 MESO
AGRN 1p36.33 MESO
ANKRD65 1p36.33 MESO
ATAD3A 1p36.33 MESO
ATAD3B 1p36.33 MESO
ATAD3C 1p36.33 MESO
AURKAIP1 1p36.33 MESO
B3GALT6 1p36.33 MESO
Group: ACC
ACAP3 1p36.33 ACC
ACTRT2 1p36.32 ACC
AGRN 1p36.33 ACC
ANKRD65 1p36.33 ACC
ATAD3A 1p36.33 ACC
ATAD3B 1p36.33 ACC
ATAD3C 1p36.33 ACC
AURKAIP1 1p36.33 ACC
B3GALT6 1p36.33 ACC
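To make the grouping shown above more concrete, here is a minimal plain-Python sketch of the same idea: rows are collected into buckets keyed by the value of one field. It does not use OpenVariant and is not its implementation. The helper `group_rows` and the field name `BAND` are hypothetical names chosen for illustration; only `SYMBOL` and `CANCER` appear in the examples on this page.

```python
from collections import defaultdict

# A few rows taken from the output above.
# "BAND" is a hypothetical field name used only for this illustration.
rows = [
    {"SYMBOL": "ACAP3", "BAND": "1p36.33", "CANCER": "MESO"},
    {"SYMBOL": "ACTRT2", "BAND": "1p36.32", "CANCER": "MESO"},
    {"SYMBOL": "ACAP3", "BAND": "1p36.33", "CANCER": "ACC"},
    {"SYMBOL": "ACTRT2", "BAND": "1p36.32", "CANCER": "ACC"},
]


def group_rows(rows, key_by):
    """Collect rows into lists keyed by the value of the `key_by` field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_by]].append(row)
    return dict(groups)


grouped = group_rows(rows, key_by="CANCER")
for group, values in grouped.items():
    print(f"Group: {group}")
    for row in values:
        print("\t".join(row.values()))
```

Each distinct value of the key field becomes one group, which mirrors the role of key_by in the call above.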
One of the parameters of the group by task is where, which applies a conditional filter to the rows. The supported operators are:
== - Equal to.
!= - Not equal to.
<= - Less than or equal to.
< - Less than.
>= - Greater than or equal to.
> - Greater than.
The following example uses this parameter:
[3] -
for group, values, script_used in group_by(base_path=dataset_folder,
                                           annotation_path=annotation_path,
                                           script=None, where="SYMBOL == 'ATAD3C'",
                                           key_by="CANCER", quite=True):
    print(f'Group: {group}')
    for row in values:
        print(row)
    print("\n")
Group: MESO
ATAD3C 1p36.33 MESO
Group: ACC
ATAD3C 1p36.33 ACC
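As a rough illustration of what such a filter expression means, the sketch below evaluates a `FIELD OPERATOR VALUE` string against a row using Python's standard `operator` module. This is an assumption-laden sketch, not OpenVariant's parser: the `matches` helper is hypothetical, it expects whitespace around the operator, and it compares values as strings only.

```python
import operator

# Map each supported operator symbol to the equivalent Python function.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
}


def matches(row, where):
    """Return True if `row` satisfies a 'FIELD OP VALUE' expression.

    Hypothetical helper: values are compared as strings, so numeric
    comparisons would need an explicit conversion first.
    """
    field, op, value = where.split(maxsplit=2)
    value = value.strip("'\"")  # drop the quotes around the literal
    return OPERATORS[op](row[field], value)


rows = [
    {"SYMBOL": "ATAD3B", "CANCER": "MESO"},
    {"SYMBOL": "ATAD3C", "CANCER": "MESO"},
    {"SYMBOL": "ATAD3C", "CANCER": "ACC"},
]
filtered = [row for row in rows if matches(row, "SYMBOL == 'ATAD3C'")]
for row in filtered:
    print(row["SYMBOL"], row["CANCER"])
```

Filtering happens per row, so every group keeps only the rows that satisfy the condition, as in the output above.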
The group by task also has a script parameter, which lets the user execute a shell command on the parsed result. The following example counts the characters in each group of the parsed output:
[4] -
for group, values, script_used in group_by(base_path=dataset_folder,
                                           annotation_path=annotation_path,
                                           script="wc -m", key_by="CANCER", quite=True):
    print(f'Group: {group}')
    for row in values:
        print(row)
    print("\n")
Group: MESO
181
Group: ACC
172
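For readers curious how a shell command can consume a group's rows, the sketch below pipes a few lines of text into `wc -m` with Python's standard `subprocess` module. It only illustrates the general technique. How OpenVariant actually serializes rows and invokes the script is not shown on this page, so the `run_on_group` helper and the tab-separated, newline-terminated row format are assumptions. It also requires a system where `wc` is available.

```python
import subprocess


def run_on_group(lines, command):
    """Feed `lines` to a shell command on stdin and return its stdout.

    Hypothetical helper: each line is assumed to be newline-terminated.
    """
    text = "".join(f"{line}\n" for line in lines)
    result = subprocess.run(
        command,
        shell=True,           # the command is given as a shell string
        input=text,           # the rows are passed on standard input
        capture_output=True,
        text=True,
        check=True,           # raise if the command exits with an error
    )
    return result.stdout.strip()


# Assumed serialization: tab-separated fields, one row per line.
meso_rows = ["ATAD3C\t1p36.33\tMESO", "AGRN\t1p36.33\tMESO"]
count = run_on_group(meso_rows, "wc -m")
print(f"Group: MESO\n{count}")
```

With a command such as `wc -m`, the result is a single number per group rather than the rows themselves, which matches the shape of the output above.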