xb package

xb.neighborhood module

xb.neighborhood.nhood_squidpy(adata, sample_key='sample', radius=50, cluster_key='leiden', save=True, plot_path='./', cmap='inferno', vmax=None, vmin=None)[source]

Compute neighborhood enrichment based on Squidpy’s function

Args:

adata (AnnData): AnnData object with the cells of the experiment.

sample_key (str): name of the column that specifies the sample each cell belongs to. It should be a column present in adata.obs.

radius (int): radius to consider when computing the spatial neighbors, specified in the scale that adata.obsm[‘spatial’] is in (typically um).

cluster_key (str): name of the column where the cell type of each cell is specified. The neighborhood enrichment will be computed based on these groups.

save (boolean): whether to save the resulting plot in the path specified in ‘plot_path’.

cmap (str): name of the colormap used for the neighborhood enrichment plot.

vmax (int): maximum value to show in the neighborhood enrichment plot.

vmin (int): minimum value to show in the neighborhood enrichment plot.

results:

adata1: AnnData object with the neighborhood enrichment scores computed.

xb.plotting module

xb.plotting.generate_hex_colors(num_colors=70)[source]

Generate a list of hex colors.

Args:

num_colors (int): number of colors to generate.

results:

hex_colors (list): list of randomly generated colors.

xb.plotting.map_of_clusters(adata, key='leiden', clusters='all', size=8, background='white', figuresize=(10, 7), save=None, format='pdf')[source]

Make spatial plots based on a given adata object.

Args:

key (str): the column in adata.obs that you want to plot.

clusters (str or list): ‘all’ for plotting all clusters in a single plot, ‘individual’ for plots of individual genes, or a list such as [‘3’,’5’] (your groups between square brackets) to plot only some clusters.

size: size of the spots.

background (str): color of the background.

figuresize (tuple): size of the figure.

save (boolean or str): whether to save the figure. If so, give the path of the folder where you want to save it.

format (str): format in which to save the figure (e.g. ‘.pdf’, ‘.png’).

results:

None.

xb.plotting.plot_cell_counts(adata, plot_path: str, save=True, clustering_params={})[source]

Plot the histogram of the counts detected per cell

Args:

adata (AnnData): AnnData object with the information of cells profiled.

plot_path (str): path where to save the generated plot, if needed.

save (boolean): whether to save the generated plot or not.

clustering_params (dict): dictionary of parameters used for preprocessing and clustering the experiment.

results:

None.

xb.plotting.plot_domains(adata, groupby='nbd_domain')[source]

Generate the spatial plots of the domains previously identified

Args:

adata (AnnData): AnnData object with the information of cells profiled.

groupby (str): Name of the column in adata.obs where the domain information is stored.

results:

None

xb.calculating module

xb.calculating.alphashape_fun(points, alpha=0.1)[source]

Calculate the area of a cell

Args:

points (list of tuple): list of xy points found in a cell (e.g. [(1,2),(2,4)]).

alpha (float): alpha parameter to be tuned to define the cell border.

results:

area (float): Area of the cell.

xb.calculating.coexpression_calculation(exp, min_exp=0)[source]

Calculate coexpression between genes in a given dataset

Args:

exp (DataFrame): expression of cells profiled in a cell x gene format, where cells are rows and genes are columns.

min_exp (float): maximum expression at which a cell is still considered as not expressing a gene (typically 0).

results:

coexpression(DataFrame): coexpression DataFrame represented as a gene-by-gene matrix.

xb.calculating.compute_fmi(ground_truth, predicted)[source]

Compute the Fowlkes-Mallows index for two different clusterings

Args:

ground_truth (list): list of reference clusters given to cells profiled.

predicted (list): list of predicted/computed clusters for cells profiled.

results:

fmi_score (float): Fowlkes-Mallows index.
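For intuition, the Fowlkes-Mallows index can be sketched with pair counting in plain Python. This is a minimal illustration and not the xb implementation, whose internals are not shown in this reference; the name fowlkes_mallows_sketch is hypothetical.

```python
import math
from collections import Counter


def _pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def fowlkes_mallows_sketch(ground_truth, predicted):
    """Pair-counting Fowlkes-Mallows index between two label lists."""
    # Pairs of cells grouped together in both clusterings.
    both = sum(_pairs(c) for c in Counter(zip(ground_truth, predicted)).values())
    # Pairs grouped together in each clustering on its own.
    same_true = sum(_pairs(c) for c in Counter(ground_truth).values())
    same_pred = sum(_pairs(c) for c in Counter(predicted).values())
    if same_true == 0 or same_pred == 0:
        return 0.0
    return both / math.sqrt(same_true * same_pred)


# Same partition with different label names scores 1.0.
print(fowlkes_mallows_sketch([0, 0, 1, 1], [1, 1, 0, 0]))
```

The score depends only on which cells are grouped together, so renaming clusters does not change it.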

xb.calculating.compute_nmi(ground_truth, predicted)[source]

Compute normalized mutual information score for two different clusterings

Args:

ground_truth (list): list of reference clusters given to cells profiled.

predicted (list): list of predicted/computed clusters for cells profiled.

results:

nmi_score(float): normalized mutual information.
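As an illustration of what the score measures, normalized mutual information can be written with label entropies only. The sketch below assumes natural logarithms and normalization by the arithmetic mean of the two entropies; the normalization used by xb is not stated in this reference, and the name nmi_sketch is hypothetical.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy (natural log) of a list of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi_sketch(ground_truth, predicted):
    """Mutual information divided by the mean of the two label entropies."""
    h_true = _entropy(ground_truth)
    h_pred = _entropy(predicted)
    h_joint = _entropy(list(zip(ground_truth, predicted)))
    mutual_info = h_true + h_pred - h_joint
    denom = (h_true + h_pred) / 2.0
    # Two single-cluster labelings carry no information; treat them as identical.
    return mutual_info / denom if denom > 0 else 1.0


print(nmi_sketch([0, 0, 1, 1], [5, 5, 7, 7]))
```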

xb.calculating.compute_vi(ground_truth, predicted)[source]

Compute variation of information for comparing two different clusterings

Args:

ground_truth (list): list of reference clusters given to cells profiled.

predicted (list): list of predicted/computed clusters for cells profiled.

results:

vi_score(float): variation of information.
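Variation of information can be expressed through entropies as VI = 2 H(X, Y) - H(X) - H(Y). The following is a minimal sketch of that identity, assuming natural logarithms; it is not taken from the xb source, and the name variation_of_information_sketch is hypothetical.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy (natural log) of a list of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def variation_of_information_sketch(ground_truth, predicted):
    """VI = 2 * H(X, Y) - H(X) - H(Y); 0 means the partitions are identical."""
    h_joint = _entropy(list(zip(ground_truth, predicted)))
    return 2.0 * h_joint - _entropy(ground_truth) - _entropy(predicted)


print(variation_of_information_sketch([0, 0, 1, 1], [0, 1, 0, 1]))
```

Unlike the two indices above, lower values indicate more similar clusterings.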

xb.calculating.dispersion(reads_original, adata1)[source]

Calculate the distance between each read and its assigned cell

Args:

reads_original(DataFrame): information of all profiled reads.

adata1(AnnData): object with the expression and metadata of cells profiled, including spatial position.

results:

reads_assigned (DataFrame): information of all profiled reads, including distance to its closest cell.

xb.calculating.dist_nuc(reads_ctdsub)[source]

Compute the median distance between the nuclei and the edges of each cell, for all cells profiled

Args:

reads_ctdsub (DataFrame): DataFrame containing the information of the transcripts profiled, including their location in ‘x_location’ and ‘y_location’, as well as the cell they are assigned to, in ‘cell_id’.

results:

median_dist (float): Median distance of cell edges for all cells profiled.

xb.calculating.distance_calc(x1, y1, x2, y2)[source]

Calculate distance between two points

Args:

x1(float): x coordinate of the first point.

y1(float): y coordinate of the first point.

x2(float): x coordinate of the second point.

y2(float): y coordinate of the second point.

results:

distance (float): distance between the two points.
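A minimal sketch of such a helper, assuming the distance meant here is the Euclidean distance (the reference does not say which metric is used). The name distance_sketch is hypothetical.

```python
import math


def distance_sketch(x1, y1, x2, y2):
    """Euclidean distance between the points (x1, y1) and (x2, y2)."""
    return math.hypot(x2 - x1, y2 - y1)


print(distance_sketch(0.0, 0.0, 3.0, 4.0))  # 5.0
```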

xb.calculating.domainassign(plsin, adatadom)[source]

Assign cells to domains based on predefined polygons

Args:

plsin (DataFrame): Information of polygons defining domains.

adatadom (AnnData): cells profiled spatially in an AnnData object and with information of their spatial location in [‘x_centroid’] and [‘y_centroid’].

results:

None.
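Assigning cells to polygon-defined domains comes down to a point-in-polygon test on each cell centroid. The sketch below uses ray casting on plain Python lists; the real function works on a DataFrame of polygons and an AnnData object, so the input layout, the function names and the first-match rule are all assumptions made for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon vertex list."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_domains_sketch(centroids, polygons):
    """Label each (x_centroid, y_centroid) with the first polygon containing it."""
    labels = []
    for x, y in centroids:
        label = None
        for name, polygon in polygons.items():
            if point_in_polygon(x, y, polygon):
                label = name
                break
        labels.append(label)
    return labels


polygons = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
print(assign_domains_sketch([(5, 5), (25, 5), (15, 5)], polygons))
```

Cells that fall outside every polygon are labelled None in this sketch.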

xb.calculating.entropy(clustering)[source]

Compute entropy

Args:

clustering (list): list of clusters assigned to cells.

results:

entropy_value (float): entropy value computed.
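A short sketch of the entropy of a clustering, assuming Shannon entropy with natural logarithms (the base used by xb is not stated here). The name entropy_sketch is hypothetical.

```python
import math
from collections import Counter


def entropy_sketch(clustering):
    """Shannon entropy of the cluster-size distribution."""
    n = len(clustering)
    return -sum((c / n) * math.log(c / n) for c in Counter(clustering).values())


# Two equally sized clusters give log(2).
print(entropy_sketch(["a", "a", "b", "b"]))
```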

xb.calculating.hex_to_rgb(value)[source]

Convert a hex color code to RGB

Args:

value (str): hex code to be transformed (e.g. ‘#h4a4a2’).

results:

rgb_value (tuple): RGB value.
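The conversion can be sketched as follows, assuming a six-digit code with an optional leading ‘#’ and integer channels in the range 0-255. The name hex_to_rgb_sketch is hypothetical and the real function may return a different scale.

```python
def hex_to_rgb_sketch(value):
    """Convert '#rrggbb' (or 'rrggbb') to a tuple of three ints in 0-255."""
    digits = value.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a 6-digit hex color")
    # Each channel is two hexadecimal digits.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb_sketch("#ff8000"))  # (255, 128, 0)
```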

xb.calculating.negative_marker_purity_coexpression(adata_sp: AnnData, adata_sc: AnnData, key: str = 'celltype', pipeline_output: bool = True, minexp: float = 0.0)[source]

Negative marker purity aims to measure read leakage between cells in spatial datasets.

For this, we calculate the increase in reads assigned in spatial datasets to gene-cell type pairs with no or very low expression in scRNAseq.

Args:

adata_sp : AnnData; Annotated AnnData object with counts from spatial data.

adata_sc : AnnData; Annotated AnnData object with counts from scRNAseq data.

key : str; Celltype key in adata_sp.obs and adata_sc.obs.

pipeline_output : bool, optional; whether to use the function in the pipeline or not.

returns:

negative marker purity : float; Increase in proportion of reads assigned in spatial data to gene-cell type pairs with no or very low expression in scRNAseq.

xb.calculating.svf_moranI(adata1, sample_key='sample', radius=50.0)[source]

Compute spatially variable features using Moran’s I (squidpy implementation)

Args:

adata1 (AnnData): AnnData object of profiled cells.

sample_key (str): Column of adata1.obs where sample of origin of each cell is stored.

radius (float): Radius used to compute the spatial neighbors in sq.gr.spatial_neighbors. Given in the scale the spatial coordinates are in (typically um).

results:

adata1 (AnnData): AnnData object of profiled cells with the spatially variable features computed.

hs_results (DataFrame): DataFrame with the results of computing Moran’s I for each gene in the given input dataset, including pval, FDR and ranking of the gene.
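To make the statistic concrete, here is a small pure-Python sketch of Moran’s I for one gene, using binary weights (1 for pairs of cells within the radius, 0 otherwise). It is only an illustration under that assumption: it does not call squidpy, it does not compute p-values or FDR, and squidpy’s own weighting may differ. The name morans_i_sketch is hypothetical.

```python
import math


def morans_i_sketch(coords, values, radius):
    """Moran's I with binary weights for cell pairs closer than `radius`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_total = 0.0
    cross = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                weight_total += 1.0
                cross += dev[i] * dev[j]
    denom = sum(d * d for d in dev)
    if weight_total == 0 or denom == 0:
        return float("nan")
    return (n / weight_total) * cross / denom


coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(morans_i_sketch(coords, [1, 1, 0, 0], radius=1.0))  # spatially clustered
print(morans_i_sketch(coords, [1, 0, 1, 0], radius=1.0))  # alternating
```

Positive values indicate that neighboring cells have similar expression, negative values that they alternate.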

xb.comparing module

xb.comparing.combine_med(medians, tag)[source]

Combine precomputed medians into a single dataframe

Args:

medians(list): list of precomputed medians of expression.

tag (str): tag to be added as a column to the list of medians. Here, this is the method the medians were computed from.

results:

mm (DataFrame): medians formatted into a DataFrame.

xb.comparing.median_calculator(adata_dict, df_filt)[source]

Calculate median expression for cells profiled with each technology compared to a reference single cell RNAseq dataset

Args:

adata_dict (dict): dictionary including the names of the datasets analyzed as .keys() and the AnnData of each technology as .values(). It includes a reference scRNAseq dataset in ‘anno_scRNAseq’.

df_filt (DataFrame): dataframe including the list of genes to be compared in .index.

results:

means (dict): dictionary with the names of the datasets in .keys() and a list of the medians computed as .values().

genes_s (dict): dictionary with the names of the datasets in .keys() and a list of the names of the genes used to compute the medians as .values().

xb.domain_identification module

xb.domain_identification.adapt_banksy_for_multisample(adata, samplekey='sample')[source]

Modify the spatial coordinates of each sample in adata so that they can later be processed together by banksy

Args:

adata (AnnData): AnnData object with the cells of the experiment.

samplekey(str): name of the column in adata.obs where the sample of origin of each cell is stored.

results:

adata (AnnData): AnnData object with the cells of the experiment with modified adata.obs[‘spatial’], ready to perform banksy.

xb.domain_identification.compare_domains(adata, domain_keys: list, save=True, plot_path='./')[source]

Compare domains assigned by different methods using ARI. Generate heatmap comparing them

Args:

adata (AnnData): AnnData object with the cells of the experiment.

domain_keys(list): list of the column names in adata.obs where domains are stored.

save (boolean): whether to save plots or not.

plot_path(str): path to the folder where to save the resulting plots.

results:

ARI (DataFrame): DataFrame consisting of ARI computed between domain identification methods.
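The comparison can be pictured as a pairwise table of adjusted Rand index (ARI) values, one per pair of domain columns. The sketch below computes ARI from pair counts on plain lists; the column names ‘nbd’, ‘rbd’ and ‘banksy’ and both function names are hypothetical, and the heatmap step is left out.

```python
from collections import Counter


def _pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def adjusted_rand_index_sketch(labels_a, labels_b):
    """Adjusted Rand index between two label lists of equal length."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    index = sum(_pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(_pairs(c) for c in Counter(labels_a).values())
    sum_b = sum(_pairs(c) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / _pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)


def pairwise_ari_sketch(domains):
    """ARI for every ordered pair of domain labelings in a dict."""
    keys = list(domains)
    return {(p, q): adjusted_rand_index_sketch(domains[p], domains[q])
            for p in keys for q in keys}


domains = {"nbd": [0, 0, 1, 1], "rbd": [1, 1, 0, 0], "banksy": [0, 1, 0, 1]}
table = pairwise_ari_sketch(domains)
print(table[("nbd", "rbd")], table[("nbd", "banksy")])
```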

xb.domain_identification.define_palette(n_colors=50)[source]

Create a random palette of colors in hex format

Args:

n_colors (int): number of colors to be included in the palette.

results:

colorlist (list): list of generated colors in hex format.

xb.domain_identification.domains_by_banksy(adata, plot_path: str, banksy_params: dict, save=True)[source]

Identify domains with Banksy and assign them to the cells in adata

Args:

adata (AnnData): AnnData object with the cells of the experiment where Banksy will be computed.

save(boolean): whether to save the resulting object or not.

plot_path(str): path where to save the plots generated, if desired.

banksy_params(dict): parameters required to perform banksy.

results:

adata (AnnData): Original AnnData object with the cells of the experiment with domains identified assigned to cells.

adata_res (AnnData): AnnData object resulting from the identification of domains. It contains all intermediate information generated by Banksy.

xb.domain_identification.domains_by_nbd(adata, hyperparameters_nbd: dict)[source]

Define cellular domains by collapsing the cellular identity of neighboring cells and clustering

Args:

adata (AnnData): AnnData object with the cells of the experiment.

hyperparameters_nbd(dict): dictionary with all the parameters required to identify domains based on neighbors (neighbors based domains).

results:

adata (AnnData): Original AnnData object with expression of cells and the domain identified incorporated in a column in adata.obs.

adataneigh (AnnData): AnnData object where domains have been identified. Cells here include the identity of neighboring cells.

xb.domain_identification.domains_by_rbd(adata, hyperparameters_rbd: dict)[source]

Define cellular domains by collapsing the expression of cells around each cell (a.k.a. pseudobinning) and clustering

Args:

adata (AnnData): AnnData object with the cells of the experiment.

hyperparameters_rbd(dict): dictionary with all the parameters required to identify domains based on reads (read based domains).

results:

adata (AnnData): Original AnnData object with expression of cells and the domain identified incorporated in a column in adata.obs.

adataneigh (AnnData): AnnData object where domains have been identified. Cells here include the expression of neighboring cells collapsed into them.

xb.domain_identification.format_data_neighs(adata, sname, neighs=10)[source]

Redefine the expression of cells in adata by counting the neighboring cell types of each cell

Args:

adata (AnnData): AnnData object with the cells of the experiment.

sname (str): column in adata.obs where the cluster assigned to each cell is stored.

neighs(int): number of neighbors to consider when computing neighboring cells.

results:

adata1 (AnnData): AnnData object with neighboring cell types included in a cell-by-celltype matrix.
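The cell-by-celltype matrix described above can be sketched with a brute-force nearest-neighbor search. This illustration works on plain lists instead of AnnData and excludes the cell itself from its own neighborhood; whether xb does the same is not stated here, and the name neighbor_type_counts_sketch is hypothetical.

```python
import math
from collections import Counter


def neighbor_type_counts_sketch(coords, cell_types, neighs=10):
    """For each cell, count the cell types of its `neighs` nearest neighbors."""
    categories = sorted(set(cell_types))
    rows = []
    for i, point in enumerate(coords):
        # Brute force: rank all other cells by distance to the current cell.
        others = [j for j in range(len(coords)) if j != i]
        others.sort(key=lambda j: math.dist(point, coords[j]))
        counts = Counter(cell_types[j] for j in others[:neighs])
        rows.append([counts.get(c, 0) for c in categories])
    return categories, rows


coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
cell_types = ["T", "B", "T", "B"]
categories, rows = neighbor_type_counts_sketch(coords, cell_types, neighs=2)
print(categories)
print(rows)
```

Each row sums to neighs (or fewer when there are not enough cells), so rows can be clustered directly to define neighborhood-based domains.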

xb.domain_identification.format_data_neighs_colapse(adata, condit, neighs=10)[source]

Redefine the expression of cells in adata by collapsing the expression of their neighbors into each cell (a.k.a. pseudobinning)

Args:

adata (AnnData): AnnData object with the cells of the experiment.

condit(str): column in adata.obs where the sample each cell belongs to is stored.

neighs(int): number of neighbors to consider when collapsing the expression of neighboring cells.

results:

adata1 (AnnData): AnnData object with expression of cells collapsed from neighboring cells.

xb.domain_identification.spatial_plot(adata, groupby='nbd_domain', save=False, plot_path='./')[source]

Generate a spatial plot of each sample in an AnnData object, with cells colored as required

Args:

adata (AnnData): AnnData object with the cells of the experiment.

groupby (str): name of the column in adata.obs to use to color cells.

save (boolean): whether to save the resulting plots or not.

plot_path (str): if required, path where to save the resulting plots.

results:

None.

xb.formatting module

xb.formatting.batch_prep_xenium_data_for_baysor(files, outpath, CROP=True, COORDS=[1000, 5000, 1000, 5000])[source]

Run the function prep_xenium_data_for_baysor for multiple samples

Args:

files(list): list including the paths where the Xenium outputs are saved for each sample (output from the machine).

outpath(str): path where to store the resulting adata object.

CROP(boolean): whether to use a small Region of interest for segmentation.

COORDS(list): if CROP is used, coordinates of the crop in the form of [YMIN,YMAX,XMIN,XMAX].

results:

None.

xb.formatting.cell_area(adata_sp: AnnData, pipeline_output=True)[source]

Calculate the area of the imaged region using the convex hull and divide the total number of cells by that area. XY positions should be in um

Args:

adata_sp : AnnData, annotated AnnData object with counts from spatial data.

pipeline_output : bool, optional, whether to create the pipeline output.

results:

density : float, cell density (cells/um2).
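The density computation described above can be sketched in plain Python: build the convex hull of the cell centroids, measure its area with the shoelace formula, and divide the number of cells by that area. The hull algorithm (Andrew’s monotone chain) and all function names are choices made for this illustration, not necessarily what xb uses.

```python
def convex_hull_sketch(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area_sketch(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def cell_density_sketch(centroids):
    """Cells per unit area of the convex hull of the centroids."""
    area = polygon_area_sketch(convex_hull_sketch(centroids))
    return len(centroids) / area


centroids = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)]
print(cell_density_sketch(centroids))  # 5 cells over a 100-unit hull: 0.05
```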

xb.formatting.format_background(path)[source]

Convert the OME-TIFF background image (maximum intensity projection) to a .tiff image

Args:

path (str): path to the folder where the output of the Xenium machine is stored.

results:

None.

xb.formatting.format_baysor_output_to_adata(path: str, output_path: str)[source]

Format Baysor’s output to AnnData

Args:

path (str): path to the folder where Baysor’s output is stored.

output_path (str): path where to store the generated adata.

results:

adata (AnnData): AnnData object with the cells of the experiment.

xb.formatting.format_data_neighs(adata, sname, condit, neighs=10)[source]

Redefine the expression of cells in adata by counting the neighboring cell types of each cell

Args:

adata (AnnData): AnnData object with the cells of the experiment.

sname (str): column in adata.obs where the cluster assigned to each cell is stored.

neighs(int): number of neighbors to consider when computing neighboring cells.

results:

adata1 (AnnData): AnnData object with neighboring cell types included in a cell-by-celltype matrix.

xb.formatting.format_data_neighs_colapse(adata, sname, condit, neighs=10)[source]

Redefine the expression of cells in adata by collapsing the expression of their neighbors into each cell (a.k.a. pseudobinning)

Args:

adata (AnnData): AnnData object with the cells of the experiment.

sname(str): column in adata.obs where sample is stored.

condit(str): column in adata.obs where the sample each cell belongs to is stored.

neighs(int): number of neighbors to consider when collapsing the expression of neighboring cells.

results:

adata1 (AnnData): AnnData object with expression of cells collapsed from neighboring cells.

xb.formatting.format_to_adata(files: list, output_path: str, use_parquet=True, save=False, max_nucleus_distance=0, min_quality=10)[source]

Format xenium datasets (outputs from the machine, format as of 2024) to adata files and filter reads based on quality parameters

Args:

files(list): list including the paths where the Xenium outputs are saved for each sample (output from the machine).

output_path(str): path where to store the resulting adata object.

use_parquet (boolean): whether to use parquet files as input to generate the AnnData file (this is much faster).

save(boolean): whether to save the resulting object.

max_nucleus_distance: Maximum distance from the nuclei for reads to be kept in redefined cells.

min_quality(float): Define minimum quality (qv) of reads to keep in the analysis.

results:

adata: AnnData object with the formatted cells, containing only the reads that passed the established filters.

xb.formatting.format_xenium_adata(path, tag, output_path)[source]

Format xenium data (output from the machine) to adata format, using the original Xenium format (pre-release)

Args:

path(str): path to the folder where the output of the Xenium machine is stored.

tag (str): sample tag to be added to all cells formatted from the section.

output_path (str): path where to store the resulting adata object.

results:

adata: AnnData object with the formatted cells.

xb.formatting.format_xenium_adata_2023(path, tag, output_path)[source]

Format xenium data (output from the machine) to adata format, considering the format used by Xenium in Q1 2023

Args:

path (str): path to the folder where the output of the Xenium machine is stored.

tag (str): sample tag to be added to all cells formatted from the section.

output_path (str): path where to store the resulting adata object.

results:

adata: AnnData object with the formatted cells.

xb.formatting.format_xenium_adata_final(path, tag, output_path, use_parquet=True, save=True)[source]

Format xenium data (output from the machine) to adata format using the official up-to-date Xenium format

Args:

path(str): path to the folder where the output of the Xenium machine is stored, if requested.

tag (str): sample tag to be added to all cells formatted from the section.

output_path (str): path where to store the resulting adata object.

use_parquet (boolean): whether to use parquet files as input to generate the AnnData file (this is much faster).

save (boolean): whether to save the resulting object.

results:

adata: AnnData object with the formatted cells.

xb.formatting.format_xenium_adata_mid_2023(path, tag, output_path)[source]

Format xenium data (output from the machine) to adata format, considering the format used by Xenium in Q2 2023

Args:

path (str): path to the folder where the output of the Xenium machine is stored.

tag (str): sample tag to be added to all cells formatted from the section.

output_path (str): path where to store the resulting adata object.

results:

adata: AnnData object with the formatted cells.

xb.formatting.generate_random_color_variation(base_color, deviation=0.17)[source]

Generate variations of a reference color

Args:

base_color (str): reference hex color.

deviation (float): deviation from the base color that the resulting color should have.

results:

modified_hex_color (str): resulting hex color.

xb.formatting.keep_nuclei(adata1, overlaps_nucleus=1)[source]

Redefine cells in AnnData to keep only nuclear reads

Args:

adata1(AnnData): AnnData object with the cells of the experiment.

overlaps_nucleus (int): whether to keep only nuclear reads (1) or cytoplasmic reads (0) in the redefinition of cells.

results:

adata: AnnData object with the formatted cells.

xb.formatting.keep_nuclei_and_quality(adata1, tag: str, max_nucleus_distance=1, min_quality=20, save=True, output_path='')[source]

Redefine cell expression based on nuclear expression and quality of detected reads

Args:

adata1 (AnnData): AnnData object with the cells of the experiment before filtering reads based on quality or nuclear/non-nuclear location.

tag (str): sample tag to be added to the name of the saved file, if needed.

save(boolean): whether to save the resulting files.

output_path(str): if needed, where to save the resulting files.

max_nucleus_distance(float): Maximum distance from the nuclei for reads to be kept in redefined cells.

min_quality(float): Define minimum quality (qv) of reads to keep in the analysis.

results:

adata1nuc (AnnData): AnnData object with the cells redefined according to the input parameters.
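Conceptually, the function keeps the reads that are close enough to a nucleus and of sufficient quality, then rebuilds the per-cell counts from what remains. The sketch below shows that idea on a list of dictionaries. The field names ‘nucleus_distance’, ‘qv’ and ‘feature_name’, the inclusive thresholds and both function names are assumptions for illustration; the real function operates on an AnnData object.

```python
def filter_reads_sketch(reads, max_nucleus_distance=1.0, min_quality=20.0):
    """Keep reads within the nucleus distance limit and above the quality limit."""
    return [
        r for r in reads
        if r["nucleus_distance"] <= max_nucleus_distance and r["qv"] >= min_quality
    ]


def count_matrix_sketch(reads):
    """Rebuild per-cell gene counts as a nested dict {cell_id: {gene: count}}."""
    counts = {}
    for r in reads:
        cell = counts.setdefault(r["cell_id"], {})
        cell[r["feature_name"]] = cell.get(r["feature_name"], 0) + 1
    return counts


reads = [
    {"cell_id": "c1", "feature_name": "GeneA", "nucleus_distance": 0.0, "qv": 35.0},
    {"cell_id": "c1", "feature_name": "GeneA", "nucleus_distance": 0.5, "qv": 25.0},
    {"cell_id": "c1", "feature_name": "GeneB", "nucleus_distance": 3.0, "qv": 40.0},
    {"cell_id": "c2", "feature_name": "GeneB", "nucleus_distance": 0.0, "qv": 10.0},
]
kept = filter_reads_sketch(reads, max_nucleus_distance=1, min_quality=20)
print(count_matrix_sketch(kept))
```

In this toy input, cell c2 disappears entirely because its only read fails the quality threshold.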

xb.formatting.prep_xenium_data_for_baysor(XENIUM_DIR: str, OUT_DIR: str, CROP=True, COORDS=[15000, 16000, 15000, 16000])[source]

Format xenium datasets for use in Baysor segmentation

Args:

XENIUM_DIR (str): path where the Xenium output of the sample is saved (output from the machine).

OUT_DIR(str): path where to store the resulting adata object.

CROP(boolean): whether to use a small Region of interest for segmentation.

COORDS(list): if CROP is used, coordinates of the crop in the form of [YMIN,YMAX,XMIN,XMAX].

results:

None.
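The COORDS argument is easy to misread because it lists the Y limits before the X limits. A minimal sketch of the crop on a list of reads, assuming inclusive bounds and the ‘x_location’ / ‘y_location’ fields used elsewhere in this reference; the name crop_reads_sketch is hypothetical.

```python
def crop_reads_sketch(reads, coords):
    """Keep reads inside the crop given as [YMIN, YMAX, XMIN, XMAX]."""
    ymin, ymax, xmin, xmax = coords
    return [
        r for r in reads
        if xmin <= r["x_location"] <= xmax and ymin <= r["y_location"] <= ymax
    ]


reads = [
    {"x_location": 25.0, "y_location": 5.0},   # inside the crop
    {"x_location": 5.0, "y_location": 5.0},    # x too small
    {"x_location": 25.0, "y_location": 15.0},  # y too large
]
print(crop_reads_sketch(reads, [0, 10, 20, 30]))
```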

xb.preprocessing module

xb.preprocessing.main_preprocessing(adata, target_sum=100, mincounts=10, mingenes=3, neigh=15, npc=0, nuc=1, scale=False, hvg=False, default=False, total_clusters=30, norm=True, lg=True)[source]

Preprocess and cluster the cells in adata, given the parameters specified. This function is mainly used for simulating the performance of different preprocessing strategies

Args:

adata (AnnData): AnnData object with the cells of the experiment.

norm (boolean): whether to normalize cells or not.

target_sum (int or None): Target sum to use if the normalization is done based on library size. None is used for automatic calculation of library size.

lg (boolean): whether to log-transform cells.

mincounts (int): Minimum number of counts detected in a cell to pass the quality filters.

mingenes (int): Minimum number of genes expressed in a cell to pass the quality filters.

neigh (int): number of neighbors to use when calculating the nearest neighbors by sc.pp.neighbors().

npc (int): number of principal components to use when calculating the nearest neighbors by sc.pp.neighbors().

scale (boolean): whether to scale the data or not.

hvg (boolean): whether to select highly variable genes for further processing or not.

total_clusters (int): number of clusters to obtain in the process of clustering (±2).

default(boolean): whether the run is the original one or not.

nuc(int): DEPRECATED. NOT USED IN THIS FUNCTION.

results:

adata: AnnData object with the preprocessed and clustered cells according to the parameters specified.
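The effect of the norm, target_sum and lg options can be illustrated on a small count matrix. The sketch below assumes library-size normalization (each cell rescaled so that its counts sum to target_sum) followed by a log(1 + x) transform; it does not cover target_sum=None, filtering, scaling or clustering, and the name normalize_and_log_sketch is hypothetical.

```python
import math


def normalize_and_log_sketch(counts, target_sum=100, norm=True, lg=True):
    """Rescale each cell (row) to `target_sum` counts, then apply log(1 + x)."""
    out = []
    for cell in counts:
        row = [float(v) for v in cell]
        total = sum(row)
        if norm and total > 0:
            row = [v * target_sum / total for v in row]
        if lg:
            row = [math.log1p(v) for v in row]
        out.append(row)
    return out


counts = [[1, 3], [0, 0]]
print(normalize_and_log_sketch(counts, target_sum=100, lg=False))
print(normalize_and_log_sketch(counts, target_sum=100))
```

Cells with zero total counts are left unchanged here to avoid dividing by zero; in the documented function such cells would normally be removed earlier by the mincounts filter.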

xb.preprocessing.preprocess_adata(adata, save=True, clustering_params={}, output_path='output_path')[source]

Preprocess and cluster the cells in adata given the parameters specified.

Args:

adata (AnnData): AnnData object with the cells of the experiment.

save (boolean): whether to save the adata object once it has been processed.

clustering_params (dict): dictionary where the main preprocessing and clustering parameters are given.

output_path(str): path where to save the adata object in case that option is selected.

results:

adata: AnnData object with the preprocessed and clustered cells according to the parameters specified.

xb.simulating module

xb.simulating.allcombs(adata)[source]

Simulate preprocessing workflows and extract results based on it

Args:

adata (AnnData): AnnData object with the cells of the experiment.

results:

allres (DataFrame): Clustering obtained with different preprocessing workflows.

xb.simulating.allcombs_simulated(adata, default_key='class')[source]

Simulate preprocessing workflows and extract results based on it for simulated data

Args:

adata (AnnData): AnnData object with the cells of the experiment.

default_key(str): name of the column in adata.obs where the reference cell types/clusters are stored.

results:

allres (DataFrame): Clustering obtained with different preprocessing workflows.

xb.simulating.compute_fmi(ground_truth, predicted)[source]

Compute the Fowlkes-Mallows index for two different clusterings

Args:

ground_truth (list): list of reference clusters given to cells profiled.

predicted (list): list of predicted/computed clusters for cells profiled.

results:

fmi_score (float): Fowlkes-Mallows index.

xb.simulating.compute_vi(ground_truth, predicted)[source]

Compute variation of information for comparing two different clusterings

Args:

ground_truth (list): list of reference clusters given to cells profiled.

predicted (list): list of predicted/computed clusters for cells profiled.

results:

vi_score(float): variation of information.

xb.simulating.entropy(clustering)[source]

Compute entropy

Args:

clustering (list): list of clusters assigned to cells.

results:

entropy_value (float): entropy value computed.

xb.simulating.keep_nuclei_and_quality(adata1, overlaps_nucleus=1, qvmin=20)[source]

Redefine cell expression based on nuclear expression and quality of detected reads

Args:

adata1 (AnnData): AnnData object with the cells of the experiment before filtering reads based on quality or nuclear/non-nuclear location.

overlaps_nucleus(int): Keep reads overlapping nucleus only (1) or all (2).

qvmin(int): Define minimum quality (qv) of reads to keep in the analysis.

results:

adata1nuc (AnnData): AnnData object with the cells redefined according to the input parameters.

xb.simulating.main_preprocessing(adata, target_sum=100, mincounts=10, mingenes=3, neigh=15, npc=0, nuc=1, scale=False, hvg=False, default=False, total_clusters=30, default_resol=1.6, logstatus=True, normstatus=True)[source]

Preprocess and cluster cells in an AnnData object given some input parameters

Args:

adata(AnnData): AnnData object with the cells of the experiment before simulating the missegmentation.

target_sum(int or None): Target sum to use if the normalization is done based on library size. None is used for automatic calculation of library size.

mincounts (int): Minimum amount of counts detected in a cell to pass the quality filters.

mingenes (int): Minimum amount of genes expressed in a cell to pass the quality filters.

neigh (int): number of neighbors to use when calculating the nearest neighbors by sc.pp.neighbors().

npc (int): number of principal components to use when calculating the nearest neighbors by sc.pp.neighbors().

nuc (int): whether to use only nuclear reads (1) or all reads (0).

scale(boolean): whether to scale the data or not.

hvg(boolean): whether to select highly variable genes for further processing or not.

default(boolean): whether the run is the original one or not.

total_clusters (int): number of clusters to obtain in the process of clustering (+-2).

default_resol(float): clustering resolution to use as a default when clustering.

logstatus (boolean): whether to log-transform cells.

normstatus (boolean): whether to normalize cells or not.

results:

adata(AnnData): AnnData object after preprocessing and clustering.

xb.simulating.missegmentation_simulation(adata_sc_sub, missegmentation_percentage=0.1)[source]

Simulate missegmentation using a reference single cell data in adata form.

Args:

adata_sc_sub (AnnData): AnnData object with the cells of the experiment before simulating the missegmentation.

missegmentation_percentage (float): percentage of cells (%) that present missegmentation.

results:

adata_sc_sub (AnnData): AnnData object with the cells where missegmentation has been simulated according to input parameters.

xb.simulating.noise_adder(adata_sc, percentage_of_noise=0.1)[source]

Add noise to the input single cell data according to input parameters

Args:

adata_sc (AnnData): AnnData object with the cells of the experiment before adding noise.

percentage_of_noise (float): percentage of noise events (%) in relation to the total number of cells.

results:

adata_sc (AnnData): AnnData object with the cells where noise has been added.

xb.simulating.subset_of_single_cell(adata_sc_sub, markers, random_markers_percentage=0, reads_x_cell=None, number_of_markers=200, n_reads_x_gene=40, percentage_of_noise=0.1, ms_percentage=0.1)[source]

Transform a single cell data to present spatial characteristics

Args:

adata_sc_sub (AnnData): AnnData object with the cells obtained from single cell datasets before transforming them into spatial-like datasets.

markers (DataFrame): dataframe including the main markers identified per cluster.

random_markers_percentage (float): percentage of non-marker genes included randomly in the genes selected for the panel.

reads_x_cell (int, None): if int, final number of reads/cell required in the spatial-like datasets. If None, cells are not transformed.

number_of_markers (int): total number of genes to be included in the simulated dataset.

n_reads_x_gene (int): final number of reads/gene required in the spatial-like datasets.

percentage_of_noise (float): percentage of noise events (%) in relation to the total number of cells.

ms_percentage (float): percentage of cells (%) that present missegmentation.

results:

adata_sc (AnnData): AnnData object with the cells after transforming them into spatial-like datasets.

xb.Spage_main module

SpaGE [1]

@author: Tamim Abdelaal

This function integrates two single-cell datasets, spatial and scRNA-seq, and enhances the spatial data by predicting the expression of the spatially unmeasured genes from the scRNA-seq data. The integration is performed using the domain adaptation method PRECISE [2].

References

[1] Abdelaal T., Mourragui S., Mahfouz A., Reinders M.J.T. (2020) SpaGE: Spatial Gene Enhancement using scRNA-seq

[2] Mourragui S., Loog M., Reinders M.J.T., Wessels L.F.A. (2019) PRECISE: A domain adaptation approach to transfer predictors of drug response from pre-clinical models to tumors

class xb.Spage_main.PLS(n_components=10)[source]

Bases: object

Implement PLS to make it compliant with the other dimensionality reduction methods (simple class rewriting).

property components_

fit(X, y)[source]

get_components_()[source]

predict(X)[source]

set_components_(x)[source]

transform(X)[source]

class xb.Spage_main.PVComputation(n_factors, n_pv, dim_reduction='pca', dim_reduction_target=None, project_on=0)[source]

Bases: object

Attributes:

n_factors: int: Number of domain-specific factors to compute.
n_pv: int: Number of principal vectors.
dim_reduction_method_source: str: Dimensionality reduction method used for source data.
dim_reduction_target: str: Dimensionality reduction method used for source data.
source_components_numpy.ndarray, shape (n_pv, n_features): Loadings of the source principal vectors ranked by similarity to the target. Components are in the row.
source_explained_variance_ratio_: numpy.ndarray, shape (n_pv): Explained variance of the source on each source principal vector.
target_components_: numpy.ndarray, shape (n_pv, n_features): Loadings of the target principal vectors ranked by similarity to the source. Components are in the rows.
target_explained_variance_ratio_: numpy.ndarray, shape (n_pv): Explained variance of the target on each target principal vector.
cosine_similarity_matrix_: numpy.ndarray, shape (n_pv, n_pv): Scalar product between the source and the target principal vectors. Source principal vectors are in the rows while target’s are in the columns. If the domain adaptation is sensible, a diagonal matrix should be obtained.

compute_principal_vectors(source_factors, target_factors)[source]

Compute the principal vectors between the already computed sets of domain-specific factors, using the approach presented in [1,2]. IMPORTANT: The same genes have to be given for source and target, and in the same order.

Args:

source_factors: np.ndarray, shape (n_components, n_genes): Source domain-specific factors.
target_factors: np.ndarray, shape (n_components, n_genes): Target domain-specific factors.

results:

self: returns an instance of self.
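
As an illustration of the idea, principal vectors between two factor sets can be obtained from a singular value decomposition of their cross-product. The following is a minimal NumPy sketch, not the package's implementation: the function name principal_vectors and its signature are hypothetical, and the orthonormalisation step (QR decomposition) is an assumption about how the factors are prepared.

```python
import numpy as np


def principal_vectors(source_factors, target_factors, n_pv):
    """Hypothetical helper (not part of xb): principal vectors via SVD.

    Both inputs have shape (n_factors, n_genes) and must describe the
    same genes in the same order, with n_factors <= n_genes.
    """
    # Orthonormalise each factor set; columns of q_* span the factor space.
    q_source, _ = np.linalg.qr(source_factors.T)  # (n_genes, n_factors)
    q_target, _ = np.linalg.qr(target_factors.T)

    # The singular values of the cross-product are the cosines of the
    # principal angles between the two subspaces, sorted in decreasing order.
    u, _, vt = np.linalg.svd(q_source.T @ q_target)

    # Rotate each basis so that vector i of the source pairs with vector i
    # of the target; keep the n_pv most similar pairs (components in rows).
    source_pv = (q_source @ u).T[:n_pv]
    target_pv = (q_target @ vt.T).T[:n_pv]

    # Scalar products between source (rows) and target (columns) vectors.
    cosine_similarity = source_pv @ target_pv.T
    return source_pv, target_pv, cosine_similarity


rng = np.random.default_rng(0)
source = rng.normal(size=(5, 40))  # 5 factors over 40 genes (toy data)
target = rng.normal(size=(5, 40))
source_pv, target_pv, cosine_similarity = principal_vectors(source, target, n_pv=3)
```

In this sketch the similarity matrix is diagonal by construction, which matches the description of cosine_similarity_matrix_ in the attribute list.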

fit(X_source, X_target, y_source=None)[source]

Compute the common factors between two sets of data. IMPORTANT: The same genes have to be given for source and target, and in the same order.

Args:

X_source: np.ndarray, shape (n_components, n_genes): Source dataset.
X_target: np.ndarray, shape (n_components, n_genes): Target dataset.
y_source: np.ndarray, shape (n_components, 1) (optional, default to None): Optional output values, in case the dimensionality reduction method needs an output (for instance PLS).

results:

self: returns an instance of self.

transform(X, project_on=None)[source]

Projects data onto principal vectors.

Args:

X: numpy.ndarray, shape (n_samples, n_genes): Data to project.
project_on: int or bool, default to None: Which principal vectors the data should be projected on. 0 means source PVs, -1 means target PVs and 1 means both PVs. If None, set to class instance value.

results:

Projected data as a numpy.ndarray of shape (n_samples, n_factors).
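
The meaning of project_on can be sketched with a plain matrix product. This is an illustrative example under stated assumptions, not the class's code: the function project is hypothetical, and treating "both PVs" as a row-wise concatenation of the source and target vectors is an assumption.

```python
import numpy as np


def project(X, source_pv, target_pv, project_on=0):
    """Hypothetical helper (not part of xb): project X on principal vectors.

    X has shape (n_samples, n_genes); each PV matrix has shape (n_pv, n_genes).
    """
    if project_on == 0:
        pv = source_pv  # source principal vectors only
    elif project_on == -1:
        pv = target_pv  # target principal vectors only
    elif project_on == 1:
        # Assumption: "both" stacks source and target vectors row-wise.
        pv = np.concatenate([source_pv, target_pv], axis=0)
    else:
        raise ValueError("project_on must be 0, -1 or 1")
    # One column per selected principal vector.
    return X @ pv.T


# Toy data: 3 samples, 4 genes, 2 principal vectors per domain.
X = np.arange(12.0).reshape(3, 4)
source_pv = np.eye(4)[:2]
target_pv = np.eye(4)[2:]
projected_source = project(X, source_pv, target_pv, project_on=0)
projected_both = project(X, source_pv, target_pv, project_on=1)
```

Under the concatenation assumption, projecting on both sets yields twice as many columns as projecting on one set.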

xb.Spage_main.SpaGE(Spatial_data, RNA_data, n_pv, genes_to_predict=None)[source]

@author: Tamim Abdelaal

This function integrates two single-cell datasets, spatial and scRNA-seq, and enhances the spatial data by predicting the expression of the spatially unmeasured genes from the scRNA-seq data.

Args:

Spatial_data: Dataframe: Normalized spatial data matrix (cells X genes).
RNA_data: Dataframe: Normalized scRNA-seq data matrix (cells X genes).
n_pv: int: Number of principal vectors to find from the independently computed principal components, and used to align both datasets. This should be <= number of shared genes between the two datasets.
genes_to_predict: str array: list of gene names missing from the spatial data, to be predicted from the scRNA-seq data. Default is the set of different genes (columns) between scRNA-seq and spatial data.

results:

Imp_Genes: Dataframe: Matrix containing the predicted gene expressions for the spatial cells. Rows correspond to the spatial data rows (cells), and columns correspond to genes_to_predict.
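
The relationship between the shared genes, the n_pv constraint and the default for genes_to_predict can be sketched in plain Python. This is only an illustration of the rules stated in the argument descriptions: the helper split_genes is hypothetical and does not exist in the package, and raising a ValueError when n_pv is too large is an assumption.

```python
def split_genes(spatial_genes, rna_genes, n_pv, genes_to_predict=None):
    """Hypothetical helper (not part of xb): derive the gene sets SpaGE needs.

    Returns (shared_genes, genes_to_predict), keeping scRNA-seq column order.
    """
    spatial_set = set(spatial_genes)

    # Genes measured in both datasets are the ones used for the alignment.
    shared_genes = [g for g in rna_genes if g in spatial_set]

    # n_pv should not exceed the number of shared genes.
    if n_pv > len(shared_genes):
        raise ValueError("n_pv must be <= number of shared genes")

    # Default: every scRNA-seq gene that is absent from the spatial data.
    if genes_to_predict is None:
        genes_to_predict = [g for g in rna_genes if g not in spatial_set]

    return shared_genes, list(genes_to_predict)


# Toy gene names, used only for illustration.
shared, to_predict = split_genes(
    spatial_genes=["A", "B", "C"],
    rna_genes=["B", "C", "D", "E"],
    n_pv=2,
)
```

With real data, the gene lists would be the column names of Spatial_data and RNA_data.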

xb.Spage_main.gene_imputation(adata, sc_adata, new_genes: list)[source]

Function to impute genes using SpaGE.

xb.Spage_main.leave_one_out_validation(adata, sc_adata, genes: list)[source]

Function to validate the imputation of genes using SpaGE.

xb.Spage_main.process_dim_reduction(method='pca', n_dim=10)[source]

Default linear dimensionality reduction method. For each method, return a BaseEstimator instance corresponding to the method given as input.

Args:

method: str, default to ‘pca’: Method used for dimensionality reduction. Implemented: ‘pca’, ‘ica’, ‘fa’ (Factor Analysis), ‘nmf’ (Non-negative matrix factorisation), ‘sparsepca’ (Sparse PCA).
n_dim: int, default to 10: Number of domain-specific factors to compute.

results:

Classifier, i.e. BaseEstimator instance