xb package
xb.neighborhood module
- xb.neighborhood.nhood_squidpy(adata, sample_key='sample', radius=50, cluster_key='leiden', save=True, plot_path='./', cmap='inferno', vmax=None, vmin=None)
Compute neighborhood enrichment using Squidpy’s neighborhood enrichment function.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
sample_key (str): name of the column in adata.obs that specifies the sample each cell belongs to.
radius (int): radius used when computing the spatial neighbors, in the units of adata.obsm[‘spatial’] (typically µm).
cluster_key (str): name of the column in adata.obs that specifies the cell type of each cell. Neighborhood enrichment is computed between these groups.
save (boolean): whether to save the resulting plot to the path given in ‘plot_path’.
plot_path (str): path where the plot is saved when save is True.
cmap (str): name of the colormap used for the neighborhood enrichment plot.
vmax (int): maximum value shown in the neighborhood enrichment plot.
vmin (int): minimum value shown in the neighborhood enrichment plot.
- results:
adata1: AnnData object with the neighborhood enrichment scores computed.
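Squidpy’s neighborhood enrichment compares how often two clusters are spatial neighbors against the same count under random permutations of the cluster labels, and reports a z-score. The minimal standard-library sketch below illustrates that idea, assuming 2D coordinates and a fixed radius graph. `nhood_zscores` and its helpers are hypothetical, not the xb or Squidpy implementation, and the sketch builds one graph for all cells instead of one per sample.

```python
import math
import random
from itertools import combinations


def radius_pairs(coords, radius):
    """Index pairs (i, j), i < j, of cells within `radius` of each other."""
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= radius
    ]


def pair_counts(labels, pairs, categories):
    """Symmetric matrix counting neighbor edges between each pair of clusters."""
    index = {c: k for k, c in enumerate(categories)}
    counts = [[0] * len(categories) for _ in categories]
    for i, j in pairs:
        a, b = index[labels[i]], index[labels[j]]
        counts[a][b] += 1
        if a != b:
            counts[b][a] += 1
    return counts


def nhood_zscores(coords, labels, radius, n_perms=200, seed=0):
    """Hypothetical helper: z-scores of observed vs label-permuted neighbor counts."""
    rng = random.Random(seed)
    categories = sorted(set(labels))
    pairs = radius_pairs(coords, radius)  # the spatial graph stays fixed
    observed = pair_counts(labels, pairs, categories)
    shuffled = list(labels)
    null = []
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # permute cluster labels over fixed positions
        null.append(pair_counts(shuffled, pairs, categories))
    n = len(categories)
    z = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            values = [m[a][b] for m in null]
            mean = sum(values) / n_perms
            sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n_perms)
            z[a][b] = (observed[a][b] - mean) / sd if sd > 0 else 0.0
    return categories, z


# Two spatially separated groups: enriched within clusters, depleted between them.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels = ["A", "A", "A", "B", "B", "B"]
categories, z = nhood_zscores(coords, labels, radius=2)
```

Keeping the graph fixed and permuting only the labels is what makes the null distribution reflect "random cell-type placement" while preserving the tissue geometry.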
xb.plotting module
- xb.plotting.generate_hex_colors(num_colors=70)
Generate a list of random hex colors.
- Args:
num_colors (int): number of colors to generate.
- results:
hex_colors (list): list of randomly generated colors.
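As a rough illustration of what such a helper can look like, the sketch below draws uniformly random 24-bit colors with Python’s standard library. `random_hex_colors` and its `seed` argument are hypothetical (the seed is added only for reproducibility) and are not part of xb’s documented signature.

```python
import random


def random_hex_colors(num_colors=70, seed=None):
    """Return num_colors random colors formatted as '#rrggbb' strings."""
    rng = random.Random(seed)  # a fixed seed makes the palette reproducible
    return ["#{:06x}".format(rng.randrange(0x1000000)) for _ in range(num_colors)]


palette = random_hex_colors(5, seed=42)
```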
- xb.plotting.map_of_clusters(adata, key='leiden', clusters='all', size=8, background='white', figuresize=(10, 7), save=None, format='pdf')
Make spatial plots of the cells in a given AnnData object.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
key (str): the column in adata.obs that you want to plot.
clusters (str or list): ‘all’ to plot all clusters in a single plot, ‘individual’ to generate one plot per cluster, or a list of groups (e.g. [‘3’, ‘5’]) to plot only those clusters.
size (int): size of the spots.
background (str): color of the background.
figuresize (tuple): size of the figure.
save (boolean or str): whether to save the figure. To save it, pass the PATH of the folder where it should be saved.
format (str): format in which the figure is saved (e.g. ‘pdf’, ‘png’).
- results:
None.
- xb.plotting.plot_cell_counts(adata, plot_path: str, save=True, clustering_params={})
Plot the histogram of the counts detected per cell.
- Args:
adata (AnnData): AnnData object with the information of cells profiled.
plot_path (str): path where to save the generated plot, if needed.
save (boolean): whether to save the plot or not.
clustering_params (dict): dictionary of parameters used for preprocessing and clustering the experiment.
- results:
None.
- xb.plotting.plot_domains(adata, groupby='nbd_domain')
Generate the spatial plots of the previously identified domains.
- Args:
adata (AnnData): AnnData object with the information of cells profiled.
groupby (str): Name of the column in adata.obs where the domain information is stored.
- results:
None
xb.calculating module
- xb.calculating.alphashape_fun(points, alpha=0.1)
Calculate the area of a cell.
- Args:
points (list of tuple): list of xy points found in a cell (e.g. [(1,2),(2,4)]).
alpha (float): alpha parameter, to be tuned to define the cell border.
- results:
area (float): area of the cell.
- xb.calculating.coexpression_calculation(exp, min_exp=0)
Calculate the coexpression between genes in a given dataset.
- Args:
exp (DataFrame): expression of cells profiled in a cell x gene format, where cells are rows and genes are columns.
min_exp (float): maximum expression value at which a cell is still considered not to express a gene (typically 0).
- results:
coexpression (DataFrame): coexpression DataFrame represented as a gene-by-gene matrix.
- xb.calculating.compute_fmi(ground_truth, predicted)
Compute the Fowlkes-Mallows index for two different clusterings.
- Args:
ground_truth (list): list of reference clusters given to cells profiled.
predicted (list): list of predicted/computed clusters for cells profiled.
- results:
fmi_score (float): Fowlkes-Mallows index.
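The Fowlkes-Mallows index can be computed by pair counting: FMI = TP / sqrt((TP + FP) * (TP + FN)), where TP is the number of cell pairs grouped together in both clusterings, and TP + FP and TP + FN are the numbers of pairs grouped together in the predicted and the reference clustering respectively. A minimal standard-library sketch of that formula follows; the helper names are hypothetical, and xb may delegate to scikit-learn instead.

```python
import math
from collections import Counter


def n_pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def fowlkes_mallows(ground_truth, predicted):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)) via pair counting."""
    if len(ground_truth) != len(predicted):
        raise ValueError("both clusterings must label the same cells")
    tp = sum(n_pairs(c) for c in Counter(zip(ground_truth, predicted)).values())
    same_pred = sum(n_pairs(c) for c in Counter(predicted).values())  # TP + FP
    same_truth = sum(n_pairs(c) for c in Counter(ground_truth).values())  # TP + FN
    if tp == 0:
        return 0.0
    return tp / math.sqrt(same_pred * same_truth)


score = fowlkes_mallows([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
```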
- xb.calculating.compute_nmi(ground_truth, predicted)
Compute the normalized mutual information score for two different clusterings.
- Args:
ground_truth (list): list of reference clusters given to cells profiled.
predicted (list): list of predicted/computed clusters for cells profiled.
- results:
nmi_score(float): normalized mutual information.
- xb.calculating.compute_vi(ground_truth, predicted)
Compute the variation of information between two different clusterings.
- Args:
ground_truth (list): list of reference clusters given to cells profiled.
predicted (list): list of predicted/computed clusters for cells profiled.
- results:
vi_score(float): variation of information.
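Entropy, normalized mutual information and variation of information are all built from the same contingency counts. The sketch below uses natural logarithms and normalizes NMI by the arithmetic mean of the two entropies (the default `average_method` of scikit-learn’s `normalized_mutual_info_score`). Whether xb uses the same log base and normalization is an assumption, and all helper names are hypothetical.

```python
import math
from collections import Counter


def entropy(clustering):
    """Shannon entropy (natural log) of a list of cluster labels."""
    n = len(clustering)
    return -sum((c / n) * math.log(c / n) for c in Counter(clustering).values())


def mutual_information(a, b):
    """Mutual information (natural log) between two labelings of the same cells."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (n_xy / n) * math.log(n * n_xy / (count_a[x] * count_b[y]))
        for (x, y), n_xy in Counter(zip(a, b)).items()
    )


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the two entropies."""
    denom = (entropy(a) + entropy(b)) / 2
    return mutual_information(a, b) / denom if denom > 0 else 1.0


def variation_of_information(a, b):
    """VI = H(A) + H(B) - 2 I(A; B); it is 0 for identical partitions."""
    return entropy(a) + entropy(b) - 2 * mutual_information(a, b)
```

Unlike NMI, VI is a true distance between partitions: lower is better, and relabeling clusters does not change it.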
- xb.calculating.dispersion(reads_original, adata1)
Calculate the distance between each read and its assigned cell.
- Args:
reads_original (DataFrame): information of all profiled reads.
adata1 (AnnData): object with the expression and metadata of cells profiled, including spatial position.
- results:
reads_assigned (DataFrame): information of all profiled reads, including the distance to its closest cell.
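A minimal sketch of the read-to-cell distance idea, assuming reads are given as dicts with the ‘cell_id’, ‘x_location’ and ‘y_location’ fields used elsewhere in this module, that each cell is represented by its centroid, and that the distance is Euclidean. `read_dispersion` is a hypothetical name, not xb’s implementation, which works on DataFrames and AnnData.

```python
import math


def read_dispersion(reads, centroids):
    """Attach to each assigned read its Euclidean distance to its cell centroid.

    reads: list of dicts with 'cell_id', 'x_location' and 'y_location'.
    centroids: dict mapping cell_id -> (x_centroid, y_centroid).
    Reads whose cell_id has no centroid (e.g. unassigned reads) are skipped.
    """
    assigned = []
    for read in reads:
        centre = centroids.get(read["cell_id"])
        if centre is None:
            continue
        distance = math.hypot(read["x_location"] - centre[0], read["y_location"] - centre[1])
        assigned.append({**read, "distance": distance})
    return assigned
```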
- xb.calculating.dist_nuc(reads_ctdsub)
Compute the median distance from the nucleus to the edges of each cell, across all cells profiled.
- Args:
reads_ctdsub (DataFrame): DataFrame containing the information of the transcripts profiled, including their location in ‘x_location’ and ‘y_location’, as well as the cell they are assigned to, in ‘cell_id’.
- results:
median_dist (float): median distance from the nucleus to the cell edges across all cells profiled.
- xb.calculating.distance_calc(x1, y1, x2, y2)
Calculate the distance between two points.
- Args:
x1(float): x coordinate of the first point.
y1(float): y coordinate of the first point.
x2(float): x coordinate of the second point.
y2(float): y coordinate of the second point.
- results:
distance (float): distance between the two points.
- xb.calculating.domainassign(plsin, adatadom)
Assign cells to domains based on predefined polygons.
- Args:
plsin (DataFrame): Information of polygons defining domains.
adatadom (AnnData): cells profiled spatially in an AnnData object and with information of their spatial location in [‘x_centroid’] and [‘y_centroid’].
- results:
None.
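Assigning cells to polygon-defined domains reduces to a point-in-polygon test on each cell centroid. The sketch below uses the even-odd ray-casting rule. The input layout (a dict of centroids and a dict of polygon vertex lists), the ‘unassigned’ default and the first-match rule for overlapping polygons are all assumptions; xb itself may rely on a geometry library instead.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_domains(cells, polygons, default="unassigned"):
    """cells: dict id -> (x_centroid, y_centroid); polygons: dict domain -> vertices.

    Each cell gets the first domain whose polygon contains it, else `default`.
    """
    return {
        cell_id: next(
            (name for name, poly in polygons.items() if point_in_polygon(x, y, poly)),
            default,
        )
        for cell_id, (x, y) in cells.items()
    }
```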
- xb.calculating.entropy(clustering)
Compute the entropy of a clustering.
- Args:
clustering (list): list of clusters assigned to cells.
- results:
entropy_value(float): entropy value computed.
- xb.calculating.hex_to_rgb(value)
Convert a hex color code to RGB.
- Args:
value (str): hex code to convert (e.g. ‘#a4a4a2’).
- results:
rgb_value (tuple): RGB value.
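A minimal sketch of the conversion, assuming six-digit codes with an optional leading ‘#’ and integer channels in the 0-255 range. xb may instead return floats in 0-1 (as matplotlib expects), so treat the output range as an assumption.

```python
def hex_to_rgb(value):
    """Convert '#rrggbb' (or 'rrggbb') to an (r, g, b) tuple of ints in 0-255."""
    value = value.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a 6-digit hex color")
    # int(..., 16) also raises ValueError for non-hex digits such as 'h'
    return tuple(int(value[i:i + 2], 16) for i in range(0, 6, 2))
```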
- xb.calculating.negative_marker_purity_coexpression(adata_sp: AnnData, adata_sc: AnnData, key: str = 'celltype', pipeline_output: bool = True, minexp: float = 0.0)
Negative marker purity measures read leakage between cells in spatial datasets.
To do so, it calculates the increase in reads assigned in spatial datasets to gene/cell type pairs that show no or very low expression in scRNAseq.
- Args:
adata_sp : AnnData; annotated AnnData object with counts from spatial data.
adata_sc : AnnData; annotated AnnData object with counts from scRNAseq data.
key : str; cell type key in adata_sp.obs and adata_sc.obs.
pipeline_output : bool, optional; whether to use the function in the pipeline or not.
- returns:
negative marker purity : float; increase in the proportion of reads assigned in spatial data to gene/cell type pairs with no or very low expression in scRNAseq.
- xb.calculating.svf_moranI(adata1, sample_key='sample', radius=50.0)
Compute spatially variable features using Moran’s I (Squidpy implementation).
- Args:
adata1 (AnnData): AnnData object of profiled cells.
sample_key (str): column of adata1.obs where the sample of origin of each cell is stored.
radius (float): radius used to compute the spatial neighbors in sq.gr.spatial_neighbors, given in the units of the spatial coordinates (typically µm).
- results:
adata1 (AnnData): AnnData object of profiled cells with the computed SVFs.
hs_results (DataFrame): DataFrame with the results of computing Moran’s I for each gene in the given input dataset, including p-value, FDR and ranking of the gene.
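Global Moran’s I for a gene with values x is I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The sketch below uses binary weights (w_ij = 1 when two distinct cells lie within `radius`) to mirror the radius-based neighbor graph. Squidpy’s implementation differs in details such as weight normalization and p-value computation, and `morans_i` is a hypothetical helper.

```python
import math


def morans_i(values, coords, radius):
    """Global Moran's I with binary weights w_ij = 1 if i != j and d(i, j) <= radius."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    w_sum = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                w_sum += 1
    den = sum(d * d for d in dev)
    if w_sum == 0 or den == 0:
        return float("nan")  # undefined without neighbors or variance
    return (n / w_sum) * (num / den)


line = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i([1, 1, 0, 0], line, radius=1.0)    # similar values side by side
alternating = morans_i([1, 0, 1, 0], line, radius=1.0)  # neighbors always differ
```

Positive values indicate spatial clustering of expression, negative values a checkerboard-like pattern.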
xb.comparing module
- xb.comparing.combine_med(medians, tag)
Combine precomputed medians into a single DataFrame.
- Args:
medians (list): list of precomputed medians of expression.
tag (str): tag to be added as a column to the list of medians. Here, this is the method the medians were computed from.
- results:
mm (DataFrame): medians formatted into a DataFrame.
- xb.comparing.median_calculator(adata_dict, df_filt)
Calculate the median expression of cells profiled with each technology, compared to a reference single-cell RNAseq dataset.
- Args:
adata_dict (dict): dictionary with the names of the datasets analyzed as .keys() and the AnnData of each technology as .values(). It includes a reference scRNAseq dataset in ‘anno_scRNAseq’.
df_filt (DataFrame): DataFrame including the list of genes to be compared in .index.
- results:
means (dict): dictionary with the names of the datasets as .keys() and lists of the computed medians as .values().
genes_s (dict): dictionary with the names of the datasets as .keys() and, as .values(), lists of the names of the genes used to compute the medians.
xb.domain_identification module
- xb.domain_identification.adapt_banksy_for_multisample(adata, samplekey='sample')
Modify the spatial coordinates of each sample in adata so that they can later be processed together by Banksy.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
samplekey (str): name of the column in adata.obs where the sample of origin of each cell is stored.
- results:
adata (AnnData): AnnData object with the cells of the experiment and modified adata.obsm[‘spatial’], ready to run Banksy.
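Banksy builds each cell’s neighborhood from its spatial coordinates, so cells from different sections that share one coordinate frame could end up as neighbors. One plausible way to prevent this, sketched below as an assumption about what the function does, is to shift each sample along x so that consecutive samples are separated by a gap larger than the neighborhood radius. `offset_samples` and `gap` are hypothetical.

```python
def offset_samples(coords, samples, gap=1000.0):
    """Shift each sample along x so that samples occupy non-overlapping ranges.

    coords: list of (x, y); samples: list of sample labels of the same length.
    gap: spacing (in coordinate units) left between consecutive samples.
    """
    shifted = list(coords)
    offset = 0.0
    for sample in sorted(set(samples)):
        idx = [i for i, s in enumerate(samples) if s == sample]
        xs = [coords[i][0] for i in idx]
        shift = offset - min(xs)  # move this sample's left edge to `offset`
        for i in idx:
            x, y = coords[i]
            shifted[i] = (x + shift, y)
        offset += (max(xs) - min(xs)) + gap
    return shifted
```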
- xb.domain_identification.compare_domains(adata, domain_keys: list, save=True, plot_path='./')
Compare the domains assigned by different methods using the adjusted Rand index (ARI), and generate a heatmap comparing them.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
domain_keys (list): list of the column names in adata.obs where domains are stored.
save (boolean): whether to save plots or not.
plot_path (str): path to the folder where to save the resulting plots.
- results:
ARI (DataFrame): DataFrame of the ARI computed between domain identification methods.
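The adjusted Rand index corrects the pair-counting Rand index for chance agreement: ARI = (index - expected) / (max_index - expected), computed from the contingency table of the two labelings. The standard-library sketch below implements that formula and a pairwise matrix over several domain assignments; the helper names are hypothetical, and xb likely relies on scikit-learn’s `adjusted_rand_score` instead.

```python
from collections import Counter


def comb2(n):
    """Number of unordered pairs among n items (as float)."""
    return n * (n - 1) / 2


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_a)
    index = sum(comb2(c) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb2(c) for c in Counter(labels_a).values())
    sum_b = sum(comb2(c) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (index - expected) / (max_index - expected)


def ari_matrix(domains):
    """domains: dict method name -> labels. Returns nested dict of pairwise ARI."""
    names = list(domains)
    return {a: {b: adjusted_rand_index(domains[a], domains[b]) for b in names} for a in names}
```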
- xb.domain_identification.define_palette(n_colors=50)
Create a random palette of colors in hex format.
- Args:
n_colors (int): number of colors to include in the palette.
- results:
colorlist (list): list of generated colors in hex format.
- xb.domain_identification.domains_by_banksy(adata, plot_path: str, banksy_params: dict, save=True)
Identify cellular domains using Banksy.
- Args:
adata (AnnData): AnnData object with the cells of the experiment where Banksy will be computed.
save (boolean): whether to save the resulting object or not.
plot_path (str): path where to save the plots generated, if desired.
banksy_params (dict): parameters required to perform Banksy.
- results:
adata (AnnData): original AnnData object with the cells of the experiment, with the identified domains assigned to cells.
adata_res (AnnData): AnnData object resulting from the identification of domains. It contains all intermediate information generated by Banksy.
- xb.domain_identification.domains_by_nbd(adata, hyperparameters_nbd: dict)
Define cellular domains by counting the cell types of the neighbors of each cell and clustering the resulting profiles.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
hyperparameters_nbd(dict): dictionary with all the parameters required to identify domains based on neighbors (neighbors based domains).
- results:
adata (AnnData): Original AnnData object with expression of cells and the domain identified incorporated in a column in adata.obs.
adataneigh (AnnData): AnnData object where domains have been identified. Cells here include the identity of neighboring cells.
- xb.domain_identification.domains_by_rbd(adata, hyperparameters_rbd: dict)
Define cellular domains by collapsing the expression of the cells around each cell into it (a.k.a. pseudobinning) and clustering the result.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
hyperparameters_rbd(dict): dictionary with all the parameters required to identify domains based on reads (read based domains).
- results:
adata (AnnData): Original AnnData object with expression of cells and the domain identified incorporated in a column in adata.obs.
adataneigh (AnnData): AnnData object where domains have been identified. Cells here include the expression of neighboring cells collapsed into them.
- xb.domain_identification.format_data_neighs(adata, sname, neighs=10)
Redefine the expression of cells in adata by counting the neighboring cell types of each cell.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
sname (str): column in adata.obs where the cluster assigned to each cell is stored.
neighs(int): number of neighbors to consider when computing neighboring cells.
- results:
adata1 (AnnData): AnnData object with neighboring cell types included in a cell-by-celltype matrix.
- xb.domain_identification.format_data_neighs_colapse(adata, condit, neighs=10)
Redefine the expression of cells in adata by collapsing the expression of its neighbors into each cell (a.k.a. pseudobinning).
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
condit(str): column in adata.obs where the sample each cell belongs to is stored.
neighs(int): number of neighbors to consider when collapsing the expression of neighboring cells.
- results:
adata1 (AnnData): AnnData object with expression of cells collapsed from neighboring cells.
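Both neighborhood representations above start from the k nearest neighbors of each cell: counting their cell types gives a cell-by-celltype matrix, while summing their expression gives pseudobinned profiles. The brute-force sketch below shows both on plain lists. The helper names are hypothetical, and including the cell itself in its pseudobin (and summing rather than averaging) are assumptions; xb operates on AnnData and probably uses a spatial index for speed.

```python
import math
from collections import Counter


def knn(coords, k):
    """Brute-force k nearest neighbors of every cell (the cell itself excluded)."""
    result = []
    for i, p in enumerate(coords):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def neighbor_celltype_counts(coords, celltypes, k=10):
    """Cell-by-celltype matrix: how many of each cell's k neighbors have each type."""
    types = sorted(set(celltypes))
    rows = []
    for nbrs in knn(coords, k):
        counts = Counter(celltypes[j] for j in nbrs)
        rows.append([counts[t] for t in types])
    return types, rows


def pseudobin(coords, expression, k=10):
    """Sum each cell's expression vector with those of its k nearest neighbors."""
    n_genes = len(expression[0])
    binned = []
    for i, nbrs in enumerate(knn(coords, k)):
        members = [i] + nbrs  # assumption: the cell itself is part of its bin
        binned.append([sum(expression[j][g] for j in members) for g in range(n_genes)])
    return binned
```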
- xb.domain_identification.spatial_plot(adata, groupby='nbd_domain', save=False, plot_path='./')
Generate a spatial plot of each sample in an AnnData object, with cells colored as specified.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
groupby (str): name of the column in adata.obs used to color cells.
save (boolean): whether to save the resulting plots or not.
plot_path(str): if required, path where to save the resulting plots.
- results:
None.
xb.formatting module
- xb.formatting.batch_prep_xenium_data_for_baysor(files, outpath, CROP=True, COORDS=[1000, 5000, 1000, 5000])
Run prep_xenium_data_for_baysor for multiple samples.
- Args:
files (list): list including the paths where the Xenium outputs are saved for each sample (output from the machine).
outpath (str): path where to store the resulting adata object.
CROP (boolean): whether to use a small region of interest (ROI) for segmentation.
COORDS (list): if CROP is used, coordinates of the crop in the form [YMIN, YMAX, XMIN, XMAX].
- results:
None.
- xb.formatting.cell_area(adata_sp: AnnData, pipeline_output=True)
Calculate the area of the imaged region using its convex hull, and divide the total number of cells by this area. XY positions should be in µm.
- Args:
adata_sp : AnnData; annotated AnnData object with counts from spatial data.
pipeline_output : bool, optional; whether to create the pipeline output.
- results:
density : float
Cell density (cells/µm²).
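The density computation described above can be sketched with the standard library: build the convex hull of the cell centroids (Andrew’s monotone chain here), measure its area with the shoelace formula, and divide the cell count by that area. The helper names and the use of centroids as input are assumptions; with coordinates in µm the result is in cells/µm².

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated end points


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def cell_density(centroids):
    """Cells per unit area of the convex hull spanned by the cell centroids."""
    area = polygon_area(convex_hull(centroids))
    return len(centroids) / area if area > 0 else float("nan")
```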
- xb.formatting.format_background(path)
Convert the OME-TIFF background image (maximum intensity projection, “mipped”) to a .tiff image.
- Args:
path(str): path to the folder where the output of the Xenium machine is stored.
- results:
None
- xb.formatting.format_baysor_output_to_adata(path: str, output_path: str)
Format Baysor’s output to AnnData.
- Args:
path (str): path to the folder where Baysor’s output is stored.
output_path (str): path where to store the generated adata.
- results:
adata (AnnData): AnnData object with the cells of the experiment.
- xb.formatting.format_data_neighs(adata, sname, condit, neighs=10)
Redefine the expression of cells in adata by counting the neighboring cell types of each cell.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
sname (str): column in adata.obs where the cluster assigned to each cell is stored.
condit (str): column in adata.obs where the sample each cell belongs to is stored.
neighs (int): number of neighbors to consider when computing neighboring cells.
- results:
adata1 (AnnData): AnnData object with neighboring cell types included in a cell-by-celltype matrix.
- xb.formatting.format_data_neighs_colapse(adata, sname, condit, neighs=10)
Redefine the expression of cells in adata by collapsing the expression of its neighbors into each cell (a.k.a. pseudobinning).
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
sname(str): column in adata.obs where sample is stored.
condit(str): column in adata.obs where the sample each cell belongs to is stored.
neighs(int): number of neighbors to consider when collapsing the expression of neighboring cells.
- results:
adata1 (AnnData): AnnData object with expression of cells collapsed from neighboring cells.
- xb.formatting.format_to_adata(files: list, output_path: str, use_parquet=True, save=False, max_nucleus_distance=0, min_quality=10)
Format Xenium datasets (outputs from the machine, in the format current as of 2024) to AnnData files and filter reads based on quality parameters.
- Args:
files (list): list including the paths where the Xenium outputs are saved for each sample (output from the machine).
output_path (str): path where to store the resulting adata object.
use_parquet (boolean): whether to use parquet files as input to generate the AnnData file (considerably faster).
save (boolean): whether to save the resulting object.
max_nucleus_distance (float): maximum distance from the nucleus for reads to be kept in redefined cells.
min_quality (float): minimum quality (qv) of reads to keep in the analysis.
- results:
adata: AnnData object with the formatted cells, containing only reads that passed the established filters.
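The read-level filtering described above can be sketched as: drop unassigned reads, keep reads with qv at or above `min_quality` and with nucleus distance at or below `max_nucleus_distance`, then count reads per gene and cell. This is a minimal sketch on plain dicts, not xb’s implementation. The field names are assumed to mirror the Xenium transcript table (‘cell_id’, ‘feature_name’, ‘qv’, ‘nucleus_distance’), and the ‘UNASSIGNED’ marker and the inclusive thresholds are assumptions.

```python
from collections import Counter, defaultdict


def filter_and_count(transcripts, min_quality=10, max_nucleus_distance=0,
                     unassigned="UNASSIGNED"):
    """Keep reads passing the qv and nucleus-distance filters, then count
    reads per gene for every cell.

    transcripts: iterable of dicts with 'cell_id', 'feature_name', 'qv' and
    'nucleus_distance' keys (assumed to mirror Xenium transcript columns).
    """
    counts = defaultdict(Counter)
    for t in transcripts:
        if t["cell_id"] == unassigned:  # read not assigned to any cell
            continue
        if t["qv"] < min_quality or t["nucleus_distance"] > max_nucleus_distance:
            continue
        counts[t["cell_id"]][t["feature_name"]] += 1
    return {cell: dict(genes) for cell, genes in counts.items()}
```

With the default `max_nucleus_distance=0`, only reads lying on the nucleus itself are kept, which matches the nuclear-only redefinition of cells used elsewhere in this module.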
- xb.formatting.format_xenium_adata(path, tag, output_path)
Format Xenium data (output from the machine) to AnnData format, using the original Xenium format (pre-release).
- Args:
path (str): path to the folder where the output of the Xenium machine is stored.
tag (str): sample tag to be added to all cells formatted from the section.
output_path (str): path where to store the resulting adata object.
- results:
adata: AnnData object with the formatted cells.
- xb.formatting.format_xenium_adata_2023(path, tag, output_path)
Format Xenium data (output from the machine) to AnnData format, using the format Xenium used in Q1 2023.
- Args:
path (str): path to the folder where the output of the Xenium machine is stored.
tag (str): sample tag to be added to all cells formatted from the section.
output_path (str): path where to store the resulting adata object.
- results:
adata: AnnData object with the formatted cells.
- xb.formatting.format_xenium_adata_final(path, tag, output_path, use_parquet=True, save=True)
Format Xenium data (output from the machine) to AnnData format using the official up-to-date Xenium format.
- Args:
path (str): path to the folder where the output of the Xenium machine is stored.
tag (str): sample tag to be added to all cells formatted from the section.
output_path (str): path where to store the resulting adata object, if requested.
use_parquet (boolean): whether to use parquet files as input to generate the AnnData file (considerably faster).
save (boolean): whether to save the resulting object.
- results:
adata: AnnData object with the formatted cells.
- xb.formatting.format_xenium_adata_mid_2023(path, tag, output_path)
Format Xenium data (output from the machine) to AnnData format, using the format Xenium used in Q2 2023.
- Args:
path (str): path to the folder where the output of the Xenium machine is stored.
tag (str): sample tag to be added to all cells formatted from the section.
output_path (str): path where to store the resulting adata object.
- results:
adata: AnnData object with the formatted cells.
- xb.formatting.generate_random_color_variation(base_color, deviation=0.17)
Generate variations of a reference color.
- Args:
base_color (str): reference hex color.
deviation (float): deviation from the base color that the resulting color should have.
- results:
modified_hex_color (str): resulting hex color.
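One simple way to produce such variations, sketched below, is to jitter each RGB channel of the base color by a uniform random amount of up to plus or minus `deviation` on a 0-1 scale and clip the result. Interpreting `deviation` this way is an assumption (the original may perturb the color in HSV or another space), and `vary_color` and its `seed` argument are hypothetical.

```python
import random


def vary_color(base_color, deviation=0.17, seed=None):
    """Jitter each RGB channel of a '#rrggbb' color by up to +/- deviation (0-1 scale)."""
    rng = random.Random(seed)
    base = base_color.lstrip("#")
    channels = [int(base[i:i + 2], 16) / 255 for i in range(0, 6, 2)]
    # clip to the valid range so the result is always a legal color
    jittered = [min(1.0, max(0.0, c + rng.uniform(-deviation, deviation))) for c in channels]
    return "#" + "".join("{:02x}".format(round(c * 255)) for c in jittered)
```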
- xb.formatting.keep_nuclei(adata1, overlaps_nucleus=1)
Redefine cells in AnnData to keep only nuclear reads.
- Args:
adata1 (AnnData): AnnData object with the cells of the experiment.
overlaps_nucleus (int): whether to keep only nuclear reads (1) or cytoplasmic reads (0) in the redefinition of cells.
- results:
adata: AnnData object with the formatted cells.
- xb.formatting.keep_nuclei_and_quality(adata1, tag: str, max_nucleus_distance=1, min_quality=20, save=True, output_path='')
Redefine cell expression based on nuclear expression and the quality of detected reads.
- Args:
adata1 (AnnData): AnnData object with the cells of the experiment before filtering reads based on quality or nuclear/non-nuclear location.
tag (str): sample tag to add to the name of the saved files, if needed.
save (boolean): whether to save the resulting files.
output_path (str): if needed, where to save the resulting files.
max_nucleus_distance (float): maximum distance from the nucleus for reads to be kept in redefined cells.
min_quality (float): minimum quality (qv) of reads to keep in the analysis.
- results:
adata1nuc (AnnData): AnnData object with the cells redefined according to the input parameters.
- xb.formatting.prep_xenium_data_for_baysor(XENIUM_DIR: str, OUT_DIR: str, CROP=True, COORDS=[15000, 16000, 15000, 16000])
Format Xenium datasets for segmentation with Baysor.
- Args:
XENIUM_DIR (str): path where the Xenium output of the sample is saved (output from the machine).
OUT_DIR (str): path where to store the resulting adata object.
CROP (boolean): whether to use a small region of interest (ROI) for segmentation.
COORDS(list): if CROP is used, coordinates of the crop in the form of [YMIN,YMAX,XMIN,XMAX].
- results:
None.
xb.preprocessing module
- xb.preprocessing.main_preprocessing(adata, target_sum=100, mincounts=10, mingenes=3, neigh=15, npc=0, nuc=1, scale=False, hvg=False, default=False, total_clusters=30, norm=True, lg=True)
Preprocess and cluster the cells in adata, given the parameters specified. This function is mainly used for simulating the performance of different preprocessing strategies.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
norm (boolean): whether to normalize cells or not.
target_sum (int or None): target sum to use if the normalization is done based on library size. None is used for automatic calculation of library size.
lg (boolean): whether to log-transform cells.
mincounts (int): minimum number of counts detected in a cell to pass the quality filters.
mingenes (int): minimum number of genes expressed in a cell to pass the quality filters.
neigh (int): number of neighbors to use when calculating the nearest neighbors with sc.pp.neighbors().
npc (int): number of principal components to use when calculating the nearest neighbors with sc.pp.neighbors().
scale(boolean): whether to scale the data or not.
hvg(boolean): whether to select highly variable genes for further processing or not.
total_clusters (int): number of clusters to obtain in the process of clustering (+-2).
default(boolean): whether the run is the original one or not.
nuc(int): DEPRECATED. NOT USED IN THIS FUNCTION.
- results:
adata: AnnData object with the preprocessed and clustered cells according to the parameters specified.
- xb.preprocessing.preprocess_adata(adata, save=True, clustering_params={}, output_path='output_path')
Preprocess and cluster the cells in adata given the parameters specified.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
save (boolean): whether to save the adata object once it has been processed.
clustering_params (dict): dictionary where the main preprocessing and clustering parameters are specified.
output_path (str): path where to save the adata object if that option is selected.
- results:
adata: AnnData object with the preprocessed and clustered cells according to the parameters specified.
xb.simulating module
- xb.simulating.allcombs(adata)
Simulate different preprocessing workflows and extract the resulting clusterings.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
- results:
allres(DataFrame): Clustering obtained with different preprocessing workflows.
- xb.simulating.allcombs_simulated(adata, default_key='class')
Simulate different preprocessing workflows on simulated data and extract the resulting clusterings.
- Args:
adata (AnnData): AnnData object with the cells of the experiment.
default_key (str): name of the column in adata.obs where the reference cell types/clusters are stored.
- results:
allres (DataFrame): clustering obtained with different preprocessing workflows.
- xb.simulating.compute_fmi(ground_truth, predicted)
Compute the Fowlkes-Mallows index for two different clusterings.
- Args:
ground_truth (list): list of reference clusters given to cells profiled.
predicted (list): list of predicted/computed clusters for cells profiled.
- results:
fmi_score (float): Fowlkes-Mallows index.
- xb.simulating.compute_vi(ground_truth, predicted)
Compute the variation of information between two different clusterings.
- Args:
ground_truth (list): list of reference clusters given to cells profiled.
predicted (list): list of predicted/computed clusters for cells profiled.
- results:
vi_score(float): variation of information.
- xb.simulating.entropy(clustering)
Compute the entropy of a clustering.
- Args:
clustering (list): list of clusters assigned to cells.
- results:
entropy_value(float): entropy value computed.
- xb.simulating.keep_nuclei_and_quality(adata1, overlaps_nucleus=1, qvmin=20)
Redefine cell expression based on nuclear expression and the quality of detected reads.
- Args:
adata1 (AnnData): AnnData object with the cells of the experiment before filtering reads based on quality or nuclear/non-nuclear location.
overlaps_nucleus (int): keep only reads overlapping the nucleus (1) or all reads (2).
qvmin (int): minimum quality (qv) of reads to keep in the analysis.
- results:
adata1nuc (AnnData): AnnData object with the cells redefined according to the input parameters.
- xb.simulating.main_preprocessing(adata, target_sum=100, mincounts=10, mingenes=3, neigh=15, npc=0, nuc=1, scale=False, hvg=False, default=False, total_clusters=30, default_resol=1.6, logstatus=True, normstatus=True)
Preprocess and cluster cells in an AnnData object given some input parameters.
- Args:
adata (AnnData): AnnData object with the cells of the experiment before simulating the missegmentation.
target_sum (int or None): target sum to use if the normalization is done based on library size. None is used for automatic calculation of library size.
mincounts (int): minimum number of counts detected in a cell to pass the quality filters.
mingenes (int): minimum number of genes expressed in a cell to pass the quality filters.
neigh (int): number of neighbors to use when calculating the nearest neighbors with sc.pp.neighbors().
npc (int): number of principal components to use when calculating the nearest neighbors with sc.pp.neighbors().
nuc (int): whether to use only nuclear reads (1) or all reads (0).
scale (boolean): whether to scale the data or not.
hvg (boolean): whether to select highly variable genes for further processing or not.
default (boolean): whether the run is the original one or not.
total_clusters (int): number of clusters to obtain in the process of clustering (+-2).
default_resol (float): clustering resolution to use as a default when clustering.
logstatus (boolean): whether to log-transform cells.
normstatus (boolean): whether to normalize cells or not.
- results:
adata(AnnData): AnnData object after preprocessing and clustering.
- xb.simulating.missegmentation_simulation(adata_sc_sub, missegmentation_percentage=0.1)
Simulate missegmentation using reference single-cell data in AnnData form.
- Args:
adata_sc_sub (AnnData): AnnData object with the cells of the experiment before simulating the missegmentation.
missegmentation_percentage (float): percentage of cells (%) presenting missegmentation.
- results:
adata_sc_sub (AnnData): AnnData object with the cells where missegmentation has been simulated according to the input parameters.
- xb.simulating.noise_adder(adata_sc, percentage_of_noise=0.1)
Add noise to the input single-cell data according to the input parameters.
- Args:
adata_sc (AnnData): AnnData object with the cells of the experiment before adding noise.
percentage_of_noise (float): percentage of noise events (%) relative to the total number of cells.
- results:
adata_sc (AnnData): AnnData object with the cells where noise has been added.
- xb.simulating.subset_of_single_cell(adata_sc_sub, markers, random_markers_percentage=0, reads_x_cell=None, number_of_markers=200, n_reads_x_gene=40, percentage_of_noise=0.1, ms_percentage=0.1)
Transform single-cell data so that it presents spatial-like characteristics.
- Args:
adata_sc_sub (AnnData): AnnData object with the cells obtained from single-cell datasets before transforming them into spatial-like datasets.
markers (DataFrame): DataFrame including the main markers identified per cluster.
random_markers_percentage (float): percentage of non-marker genes included randomly in the genes selected for the panel.
reads_x_cell (int or None): if int, final number of reads/cell required in the spatial-like datasets. If None, cells are not transformed.
number_of_markers (int): total number of genes to be included in the simulated dataset.
n_reads_x_gene (int): final number of reads/gene required in the spatial-like datasets.
percentage_of_noise (float): percentage of noise events (%) relative to the total number of cells.
ms_percentage (float): percentage of cells (%) presenting missegmentation.
- results:
adata_sc (AnnData): AnnData object with the cells after transforming them into spatial-like datasets.
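Reaching a target number of reads per cell (the role of `reads_x_cell`) is commonly done by random downsampling of each cell’s reads without replacement. The sketch below shows that technique for one cell stored as a gene-to-count dict; whether xb downsamples this way is an assumption, and `downsample_counts` is a hypothetical helper.

```python
import random


def downsample_counts(counts, target_reads, seed=0):
    """Subsample one cell's reads without replacement down to target_reads."""
    if sum(counts.values()) <= target_reads:
        return dict(counts)  # cells already at or below the target are kept as-is
    rng = random.Random(seed)
    reads = [gene for gene, c in counts.items() for _ in range(c)]  # one entry per read
    out = {gene: 0 for gene in counts}
    for gene in rng.sample(reads, target_reads):
        out[gene] += 1
    return out
```

Sampling individual reads (rather than rescaling counts) keeps the result integer-valued and preserves the expected gene proportions of each cell.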
xb.Spage_main module
SpaGE [1]
@author: Tamim Abdelaal
This function integrates two single-cell datasets, spatial and scRNA-seq, and enhances the spatial data by predicting the expression of the spatially unmeasured genes from the scRNA-seq data. The integration is performed using the domain adaptation method PRECISE [2].
References
[1] Abdelaal T., Mourragui S., Mahfouz A., Reinders M.J.T. (2020) SpaGE: Spatial Gene Enhancement using scRNA-seq.
[2] Mourragui S., Loog M., Reinders M.J.T., Wessels L.F.A. (2019) PRECISE: A domain adaptation approach to transfer predictors of drug response from pre-clinical models to tumors.
- class xb.Spage_main.PLS(n_components=10)
Bases: object
Implement PLS so that it is compatible with the other dimensionality reduction methods (a simple class rewrite).
- property components_
- class xb.Spage_main.PVComputation(n_factors, n_pv, dim_reduction='pca', dim_reduction_target=None, project_on=0)
Bases: object
- Attributes:
- n_factors: int
Number of domain-specific factors to compute.
- n_pv: int
Number of principal vectors.
- dim_reduction_method_source: str
Dimensionality reduction method used for source data.
- dim_reduction_target: str
Dimensionality reduction method used for target data.
- source_components_: numpy.ndarray, shape (n_pv, n_features)
Loadings of the source principal vectors ranked by similarity to the target. Components are in the row.
- source_explained_variance_ratio_: numpy.ndarray, shape (n_pv)
Explained variance of the source on each source principal vector.
- target_components_: numpy.ndarray, shape (n_pv, n_features)
Loadings of the target principal vectors ranked by similarity to the source. Components are in the row.
- target_explained_variance_ratio_: numpy.ndarray, shape (n_pv)
Explained variance of the target on each target principal vector.
- cosine_similarity_matrix_: numpy.ndarray, shape (n_pv, n_pv)
Scalar product between the source and the target principal vectors. Source principal vectors are in the rows while target’s are in the columns. If the domain adaptation is sensible, a diagonal matrix should be obtained.
- compute_principal_vectors(source_factors, target_factors)
Compute the principal vectors between the already computed sets of domain-specific factors, using the approach presented in [1,2]. IMPORTANT: the same genes must be given for source and target, in the same order.
- Args:
- source_factors: np.ndarray, shape (n_components, n_genes)
Source domain-specific factors.
- target_factors: np.ndarray, shape (n_components, n_genes)
Target domain-specific factors.
- results:
self: returns an instance of self.
- fit(X_source, X_target, y_source=None)
Compute the common factors between two sets of data. IMPORTANT: the same genes must be given for source and target, in the same order.
- Args:
- X_source: np.ndarray, shape (n_samples, n_genes)
Source dataset.
- X_target: np.ndarray, shape (n_samples, n_genes)
Target dataset.
- y_source: np.ndarray, shape (n_samples, 1) (optional, default to None)
Optional output, for methods that require one (for instance PLS).
- results:
self: returns an instance of self.
- transform(X, project_on=None)
Project data onto principal vectors.
- Args:
- X: numpy.ndarray, shape (n_samples, n_genes)
Data to project.
- project_on: int or bool, default to None
Where data should be projected on. 0 means source PVs, -1 means target PVs and 1 means both PVs. If None, set to class instance value.
- results:
Projected data as a numpy.ndarray of shape (n_samples, n_factors).
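The principal vectors used by compute_principal_vectors follow from a singular value decomposition: with orthonormal source and target factors Ps and Pt (rows are factors), Ps Pt^T = U S V^T, the source principal vectors are U^T Ps, the target ones are V^T Pt, and their cosine similarity matrix is the diagonal S. The NumPy sketch below illustrates this under the assumption that factors may first need orthonormalizing (done here with a QR step). It is not the PRECISE code itself, and `principal_vectors` is a hypothetical name.

```python
import numpy as np


def principal_vectors(source_factors, target_factors):
    """Principal vectors between two factor sets (rows = factors, columns = genes).

    Returns (source_pv, target_pv, cosine_similarity, sigma); sigma holds the
    cosines of the principal angles, sorted in decreasing order.
    """
    # Orthonormalize the rows of each factor matrix via QR of the transpose.
    qs, _ = np.linalg.qr(source_factors.T)  # shape (n_genes, n_factors)
    qt, _ = np.linalg.qr(target_factors.T)
    u, sigma, vt = np.linalg.svd(qs.T @ qt)
    source_pv = u.T @ qs.T  # (n_factors, n_genes), ranked by similarity
    target_pv = vt @ qt.T
    cosine_similarity = source_pv @ target_pv.T  # equals diag(sigma) up to rounding
    return source_pv, target_pv, cosine_similarity, sigma
```

The diagonal cosine matrix is what the cosine_similarity_matrix_ attribute description above refers to: each source principal vector is matched only to its target counterpart.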
- xb.Spage_main.SpaGE(Spatial_data, RNA_data, n_pv, genes_to_predict=None)
@author: Tamim Abdelaal
This function integrates two single-cell datasets, spatial and scRNA-seq, and enhances the spatial data by predicting the expression of the spatially unmeasured genes from the scRNA-seq data.
- Args:
- Spatial_data: DataFrame
Normalized spatial data matrix (cells X genes).
- RNA_data: DataFrame
Normalized scRNA-seq data matrix (cells X genes).
- n_pv: int
Number of principal vectors to find from the independently computed principal components, used to align both datasets. This should be <= the number of shared genes between the two datasets.
- genes_to_predict: str array
List of gene names missing from the spatial data, to be predicted from the scRNA-seq data. Default is the set of genes (columns) present in the scRNA-seq data but not in the spatial data.
- results:
- Imp_Genes: DataFrame
Matrix containing the predicted gene expressions for the spatial cells. Rows correspond to the spatial data rows (cells), and columns to genes_to_predict.
- xb.Spage_main.gene_imputation(adata, sc_adata, new_genes: list)
Impute genes using SpaGE.
- xb.Spage_main.leave_one_out_validation(adata, sc_adata, genes: list)
Validate the imputation of genes using SpaGE.
- xb.Spage_main.process_dim_reduction(method='pca', n_dim=10)
Default linear dimensionality reduction method. For each method, return a BaseEstimator instance corresponding to the method given as input.
- Args:
- method: str, default to ‘pca’
Method used for dimensionality reduction. Implemented: ‘pca’, ‘ica’, ‘fa’ (Factor Analysis), ‘nmf’ (Non-negative matrix factorisation), ‘sparsepca’ (Sparse PCA).
- n_dim: int, default to 10
Number of domain-specific factors to compute.
- results:
Classifier, i.e. BaseEstimator instance