- Videos: 69
- Views: 881,354
Sanbomics
USA
Joined Apr 11, 2021
Bioinformatics tutorials with a focus on next-generation sequencing analysis. Topics cover RNAseq, single-cell RNAseq, Linux/shell usage, Python, R, phylogenetics, alignments, and more. I am a PhD student with over 12 years of programming experience and 7 years of genomics, sequencing, and bioinformatics experience. I have multiple peer-reviewed scientific publications as first author.
Twitter: @Sanbomics
Threads: @Sanbomics
Bluesky: @sanbomics.bsky.social
2024 updated single-cell guide - Part 2: RNA Integration and annotation
In this video I integrate the single-cell RNA data together with scVI and use multiple methods of label transfer from reference datasets. I then verify and annotate the individual clusters using known marker genes. This video covers advanced analysis steps, such as tuning hyperparameters in our scVI model, making custom reference datasets, and more.
Main notebook:
github.com/mousepixels/sanbomics_scripts/blob/main/sc2024/annotation_integration.ipynb
Example of bad mapping:
github.com/mousepixels/sanbomics_scripts/blob/main/sc2024/bad_mapping.ipynb
Part 1:
ruclips.net/video/cmOlCTGX4Ik/видео.html
0:00 Celltypist transfer
13:49 scVI transfer
21:13 Integration
29:22 Dim reduction
34:00 Annotation
Views: 3,755
Videos
2024 updated single-cell guide - Part 1: RNA preprocessing and quality control
7K views · 2 months ago
This is a comprehensive tutorial on the most up-to-date recommendations for single-cell sequencing. This is part 1 of a multi-part series. Here I download a dataset, remove background RNA, perform quality control, and remove low-quality cells. Part 2 will cover dimension reduction and cell annotation. We will eventually get to in-depth analysis and scATAC analysis. Notebook: github.com/mousepix...
Single-cell pseudotime and gene regulatory analysis with CellOracle
3.5K views · 6 months ago
CellOracle is a powerful suite of tools that can perform pseudotime analysis, gene regulatory network analysis, and in silico perturbation analysis on single-cell data in python. This is a simple tutorial covering the basics of CellOracle while analyzing a developmental pancreas dataset. Notebook: github.com/mousepixels/sanbomics_scripts/blob/main/celloracle_pseudotime_GRN.ipynb Reference: www....
Processing single-cell RNAseq counts with simpleaf (alevin-fry)
2.2K views · 9 months ago
Simpleaf is a faster and more efficient alternative to other counters, such as cellranger, and it works with other single-cell chemistries. It is a wrapper for Alevin-fry and is made by the same lab that created the Salmon RNAseq aligner. Simpleaf is still in development as of making this video. Do not be surprised if the workflow changes slightly in the future. Github notes: github.com/mousepi...
RNAseq mapping with Salmon for differential expression
7K views · 11 months ago
How do you process raw read data for the purpose of differential expression? In this video I map raw RNAseq reads using Salmon and follow up with differential expression analysis in R with Deseq2. notebook: github.com/mousepixels/sanbomics_scripts/blob/main/salmon_to_deseq.Rmd references: salmon.readthedocs.io/en/latest/salmon.html index preparation: combine-lab.github.io/alevin-tutorial/2019/s...
Pseudobulk single-cell analysis in Python with Scanpy and pyDeseq2
7K views · 11 months ago
It is now possible to do pseudobulk analysis directly in python on your scanpy object. I create the pseudobulk from single-cell data then analyze it with the python port of Deseq2. Notebook: github.com/mousepixels/sanbomics_scripts/blob/main/pseudobulk_pyDeseq2.ipynb
Differential expression in Python with pyDESeq2
18K views · 1 year ago
Analyze RNAseq counts data with a Python implementation of DESeq2. I cover basic differential expression analysis, PCA plots, GSEA, heatmaps, and volcano plots. Github: github.com/mousepixels/sanbomics_scripts/blob/main/PyDeseq2_DE_tutorial.ipynb The samples include normal human cell control and replicative senescence cells from NCBI accession GSE171663 0:00 Intro 0:30 Differential expression 7...
Single-cell background decontamination in R and Python with SoupX
4.8K views · 1 year ago
SoupX is an essential tool for ambient RNA decontamination in single-cell RNA sequencing data. Ambient RNA in solution is partitioned into droplets and confounds downstream analyses. This concise tutorial covers SoupX implementation for both R (Seurat objects) and Python (Scanpy objects), offering step-by-step guidance and expert insights to improve data quality and accuracy in your single-cell...
Can chatGPT do single-cell bioinformatic analysis?
26K views · 1 year ago
Here I test if chatGPT with the GPT-4 model can do basic single-cell RNA analysis. In short, the results are impressive.
Comparing single-cell RNA integration methods | Which is the best?
9K views · 1 year ago
Which single-cell integration method is the best? In this video I compare 5 different methods using 3 different challenging integration problems. I test Seurat CCA, Seurat RPCA, SCVI-tools, and Scanorama. I measure time and memory usage and also examine integration outcomes. Github: github.com/mousepixels/sanbomics_scripts/tree/main/integration_comparison Datasets: www.cell.com/cell/fulltext/S0...
Single-cell trajectory and pseudotime analysis with Monocle3 and Seurat in R
14K views · 1 year ago
In this video I perform trajectory analysis in R on a large dataset of cells undergoing dedifferentiation into iPSCs. I use Seurat to load, merge, and preprocess the data. I use Monocle3 to calculate pseudotime and create the trajectories. I go over basic analyses and plotting using Monocle3. Notebook: github.com/mousepixels/sanbomics_scripts/blob/main/monocle3_tutorial.Rmd References: www.cell...
Easy RNAseq volcano plot with one line of code
5K views · 1 year ago
Make a super easy and PRETTY volcano plot from differentially expressed genes with only one line of code. Plotting aesthetic figures can be challenging and/or time consuming. Here I show you how to make a pretty volcano plot without needing much prior coding knowledge. The plots are also highly customizable for more advanced users.
Applying random forest classifiers to single-cell RNAseq data
6K views · 1 year ago
Learn how to apply machine learning to single-cell data. Random forest is a powerful machine learning classifier and a great tool for analyzing single-cell RNAseq data. In addition to predicting classifications, you can extract the gene importance from the model as a way to identify genes that describe your populations. Here I use several examples to show you how to use the random forest model ...
Introduction to single cell ATAC data analysis in R
14K views · 1 year ago
This is a primer for single cell/nuclei ATAC-seq data analysis. What is single cell ATACseq? How do you perform basic scATAC-seq analysis in R? I describe what scATACseq is. Then I use Seurat and Signac to do data analysis using a recent Nature communications paper. I do preprocessing, clustering, differential accessibility analysis, RNA activity estimation, and I make various plots. Notebook: ...
Complete single-cell RNAseq analysis walkthrough | Advanced introduction
75K views · 1 year ago
This is a comprehensive introduction into single-cell analysis in python. I recreate the main single cell analyses from a recent Nature publication. I explain the basics of single-cell sequencing analysis and also introduce more advanced topics. I cover doublet removal, preprocessing, integration, clustering, cell identification, differential expression, gene-set enrichment, non-parametric stat...
Introduction to spatial sequencing data analysis
8K views · 1 year ago
Single-cell gene co-expression | single-cell RNAseq methods
4K views · 1 year ago
3 minute GSEA tutorial in R | RNAseq tutorials
24K views · 1 year ago
Simple guide to GSEA and plotting in python
9K views · 1 year ago
Convert h5ad anndata to a Seurat single-cell R object
9K views · 1 year ago
Guide to filtering and subsetting single-cell anndata and pandas objects | basic and advanced
6K views · 1 year ago
Label single-cells automatically in python | scVI label transfer
4.2K views · 1 year ago
Beautiful and customizable RNAseq volcano plots
11K views · 2 years ago
How to remove single-cell doublets in python
3.1K views · 2 years ago
Single-cell analysis with scVI machine-learning toolkit
8K views · 2 years ago
Single-cell integration in python with scanpy
8K views · 2 years ago
RNAseq volcano plot of differentially expressed genes
27K views · 2 years ago
How to do gene ontology analysis in python
15K views · 2 years ago
RNAseq analysis | Gene ontology (GO) in R
53K views · 2 years ago
Single-cell gene set activity with AUCell
4.9K views · 2 years ago
Hello, would you please make a tutorial on an advanced scRNA-seq workflow (based on a good paper) using R?
Please make this same tutorial for R🙏
Thank you for the informative tutorial video; it has been immensely beneficial to my scientific research!😄
Glad it was helpful!
great content, languages just different, don't have to be good or bad.
Exactly!
do i have to do this for each sample or what ?
You will have to run it, but not install it for every sample.
1:12, why do we have to predict doublet at each sample separately??
Because every sample is a little different. If there were more cells in one sample then the doublet rate will be higher for that sample. Also samples have different cell types etc
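The reply's point that the expected doublet rate rises with the number of cells in a sample can be illustrated with a small sketch. It assumes the commonly quoted droplet rule of thumb of roughly 0.8% doublets per 1,000 recovered cells; that constant, the function name, and the sample sizes are illustrative assumptions, not values from the video.

```python
# Rule-of-thumb ASSUMPTION: ~0.8% doublets per 1,000 recovered cells.
RATE_PER_1000_CELLS = 0.008


def expected_doublet_rate(n_cells: int) -> float:
    """Approximate fraction of droplets expected to be doublets."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    # Scale linearly with cell number and cap at 100%.
    return min(1.0, RATE_PER_1000_CELLS * n_cells / 1000)


# Hypothetical samples with different numbers of recovered cells.
samples = {"sample_A": 3_000, "sample_B": 10_000}
rates = {name: expected_doublet_rate(n) for name, n in samples.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} expected doublets")
```

Because each sample gets its own expected rate, a doublet caller run per sample can be given a prior that fits that sample, rather than one rate shared across the whole experiment.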
Well, it was really awesome. I'm still an undergraduate, so the bioinformatics part was a little hard to understand, but the Python code part was clear. Have you covered other integration methods, or could you record another video on them?
Yeah I actually have a video that compares multiple integration methods: ruclips.net/video/NFA2YGshATs/видео.html
Hi! Thanks so much for such a great tutorial! Have a naïve question of someone who just started in this world: When raw data is not available, for example, you can only download normalised filtered values, do you skip the pre-processing step? Is it correct to pre-process normalised values, let's say tmm? Again, thanks so much for all the videos!
Yeah if there are no raw counts then you will have to skip the ambient removal. Unfortunately, this is the only way sometimes.
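One way to tell whether a downloaded matrix still holds raw counts is to test whether every value is a non-negative whole number. The sketch below is only a heuristic under that assumption (normalised values that happen to be rounded would pass it); the function name and the example values are illustrative and do not come from the video.

```python
def looks_like_raw_counts(values) -> bool:
    """Heuristic: raw counts are non-negative whole numbers."""
    return all(v >= 0 and float(v).is_integer() for v in values)


# Hypothetical expression values for one cell.
raw = [0, 3, 12, 1, 0]
normalised = [0.0, 1.58, 3.7, 1.0, 0.0]

print(looks_like_raw_counts(raw))         # True
print(looks_like_raw_counts(normalised))  # False
```

If the check fails, steps that model counts directly, such as ambient RNA removal, are the ones to skip, as the reply says.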
Man, you absolutely saved me! Suddenly, everything makes sense now. Subscribed!
:)
Thank you so much for your videos! I am a grad student who recently started a single-cell project, and since I found your channel, your explanations and code have been getting me through this tough time. I was wondering if you are planning on covering cNMF in the future? It is something that I and our lab have had difficulty with. Thanks again!
I can definitely keep that in mind for a future video!
Thank you for this video! I am confused why you can use the Ensembl version 109 for TxImport--but you ran salmon with the Gencode transcripts fasta. Doesn't gencode differ in what transcripts are included versus Ensembl? Or this doesn't really matter?
Hi there, thanks. But I get this error. I changed the value to 6 but I still get it: !!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=154478, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 7
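The recommended value in this warning follows the scaling rule given in the STAR manual for small genomes, min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that arithmetic is below; the function name is illustrative. Note that the warning still reports 14, which may mean the changed value never reached the genome generation step, though that is only a guess from the message.

```python
import math


def recommended_sa_index_nbases(genome_length: int) -> int:
    """min(14, log2(GenomeLength)/2 - 1), rounded down to an integer."""
    if genome_length < 1:
        raise ValueError("genome_length must be positive")
    return min(14, math.floor(math.log2(genome_length) / 2 - 1))


# Genome size taken from the warning in the comment above.
print(recommended_sa_index_nbases(154_478))  # 7
```

The result would be passed as `--genomeSAindexNbases 7` when re-running genome generation, as the warning itself suggests.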
Your videos are amazing. Thanks a lot. Could I use 3050 with 64 GB RAM for this kind of analysis? Thanks a lot.
You can do a decent number of cells with 64 GB RAM. I would think you could handle around ~200k in memory at the same time without too many issues. Some steps/algorithms use a lot more memory though, so it is highly dependent on what you do. In my experience 64 GB won't be enough for large datasets/atlases, but you can definitely do small numbers of samples.
While this is technically correct, it is not very statistically sound. When adjusting with the Bonferroni and BH procedures, we typically change the cutoff point, not the actual p-value. Multiplying the p-value by the number of tests can lead to p-values greater than one (specifically for the Bonferroni method; the BH method already accounts for this by dividing by the rank), which is impossible since a p-value is a probability between 0 and 1. While the end conclusion is the same, it doesn't make sense from a statistical standpoint. You can absolutely use this method to get the right significance, but if you are presenting this to a statistician or publishing this work, you would need to adjust the p-value cutoff instead of the actual p-value, or change any p-value that is greater than 1 to be exactly 1, and even this is a little more nuanced than just multiplication.
This is a bit pedantic: of course probabilities don't go above 1. You can simply clip the data frame column to have a max value of 1.
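The two positions in this exchange can be reconciled in a few lines: multiply, then clip at 1. The sketch below uses plain Python lists rather than the data frame from the video, and the function names and example p-values are illustrative assumptions. The BH version also enforces monotonicity, which is the extra nuance beyond simple multiplication that the comment alludes to.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, clipped to a maximum of 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values: p * m / rank, made monotone and capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value
    # never exceeds the adjusted value of a larger raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
pvals = [0.01, 0.04, 0.03, 0.5]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Comparing the adjusted values against 0.05 gives the same calls as comparing the raw p-values against adjusted cutoffs, which is why both commenters reach the same conclusions.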
This man is an absolute godsend. I can't even begin to count the number of times he has come in clutch with a solution to issues I encounter in my personal projects and during my internship!
Glad I could help!
Thank you for this very helpful tutorial. is there a function for plotting the euclidean distance map for each sample?
@Sanbomics... good content but you really need to learn how to talk!
I done gone learned how to talk real good like enough. No idea what u r meaning. Such rude
we're still looking forward to the future part ;D
I know i know xD. I was going to start working on it this weekend. I have been very busy!
someday soon...
@@sanbomics we're all hoping for this series to be completed so we can implement it, we're rooting for you! we're grateful for anything you can share :D
Thanks for your work. It's really useful.
Can you cover batch correction in python?
I love this, thank you!
thank you very much. Great work. Maybe a dumb question, but how would you process bam files that contain SCdata from several cells? Essentially what I need is a table similar to yours with genes in the rows and cells in the columns (instead of whole bam files).
You can convert it back to fastq then run it through various single cell counters. e.g., if it is 10x data you can use cellranger bamtofastq then cellranger count
Thank you so much! This video is perfect for those who want to analyze scRNA-seq data!
First, thank you so much; your videos are very useful. I used featureCounts to generate the count table, but the percentage of Unassigned_NoFeatures is too high (around 50%). I checked that the same annotation file was used for the alignment and for generating the count table, and I also checked the strandedness of the assay, but I still have the same problem. I tried changing GTF.featureType from exon to gene, and the percentage of Unassigned_NoFeatures decreased to about 15%. These results suggest that I have a high content of intronic or intergenic regions in my data, but when I check in IGV I don't observe that. I don't know if you can help me with this or tell me whether these results are normal for human data. Thank you so much!!
Hmm. Sounds weird. It's hard for me to diagnose from here. See what happens if you use a pseudoaligner instead like salmon
You are definitely using the right annotation? xD
@@sanbomics Thank you so much for your feedback. I am trying now with Salmon
@@sanbomics yes, I check that several times 😅
Thanks for making very useful videos. I was wondering if you would like to make a video related to single cell analysis using Julius AI a data analysis AI.
Hello! I am curious if you will have a much expanded version of the Spatial Seq Data analysis, also curious if there is any Proteomic analysis tutorial coming up! Thanks for your great work!
I might do a visium HD and/or a xenium tutorial soon but i have a lot of things in the queue and not enough time xD
@@sanbomics Visium HD would be amazing! I'm new to spatial transcriptomics and my first project is on Visium HD data; I am desperately looking for a nice analysis workflow
Hey mate, I am getting this error. I followed all your steps except the filter one, please help. I have 5 columns in my matrix (M10, M11, M12, M3 and M5) with Ensembl gene IDs. The dds step is not working: Error in checkForExperimentalReplicates(object, modelMatrix) : The design matrix has the same number of samples and coefficients to fit, so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer
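The error text says the design has as many coefficients as samples, which is what happens when every sample ends up as its own group and no condition has replicates. The sketch below shows that check for a single-factor design with an intercept. It is written in Python to match the other examples here rather than R, the sample names come from the comment, and the condition labels are hypothetical.

```python
from collections import Counter


def check_replicates(sample_to_condition: dict) -> dict:
    """Compare sample count with coefficient count for a one-factor design."""
    counts = Counter(sample_to_condition.values())
    n_samples = len(sample_to_condition)
    # Intercept + (levels - 1) coefficients = number of distinct levels.
    n_coefficients = len(counts)
    return {
        "n_samples": n_samples,
        "n_coefficients": n_coefficients,
        "dispersion_estimable": n_samples > n_coefficients,
        "unreplicated": sorted(c for c, n in counts.items() if n < 2),
    }


names = ["M10", "M11", "M12", "M3", "M5"]

# Every sample is its own condition: 5 samples, 5 coefficients.
no_replicates = {name: name for name in names}

# HYPOTHETICAL grouping into two conditions with replicates.
grouped = dict(zip(names, ["treated", "treated", "treated", "control", "control"]))

print(check_replicates(no_replicates))
print(check_replicates(grouped))
```

If the metadata column used in the design holds the sample names rather than a shared condition label, the first case applies and the dispersion cannot be estimated.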
I never comment on youtube videos but thank you so much for this. It was so simple and straight to the point. I am new to coding and needed to get all of the outputs from 20 featurecount reads into one output file, and this was the only thing I could find that not only made sense, but also worked. Thank you thank you thank you!!!
Wooo glad it helped you!
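For readers with the same task, merging many featureCounts outputs into one table can be sketched with the standard library alone. This is not the code from the video. It assumes the default featureCounts layout (a leading `#` comment line, then a header of Geneid, Chr, Start, End, Strand, Length and one count column) and integer counts; the function names and file paths are hypothetical.

```python
import csv


def read_featurecounts(path):
    """Return {gene_id: count} from one featureCounts output file."""
    counts = {}
    with open(path, newline="") as handle:
        # Skip the leading '#' comment line that featureCounts writes.
        body = (line for line in handle if not line.startswith("#"))
        rows = csv.reader(body, delimiter="\t")
        next(rows)  # header: Geneid, Chr, Start, End, Strand, Length, <bam>
        for row in rows:
            counts[row[0]] = int(row[6])  # 7th column holds the counts
    return counts


def merge_featurecounts(paths_by_sample, out_path):
    """Write one gene-by-sample table; genes missing from a file get 0."""
    tables = {s: read_featurecounts(p) for s, p in paths_by_sample.items()}
    genes = sorted(set().union(*tables.values()))
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["Geneid", *tables])
        for gene in genes:
            writer.writerow([gene, *(tables[s].get(gene, 0) for s in tables)])
    return genes
```

A call such as `merge_featurecounts({"s1": "s1_counts.txt", "s2": "s2_counts.txt"}, "merged.tsv")` would produce a single tab-separated table with one column per sample.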
Do you have code for doing the same in RStudio? I've been trying to build a Seurat object from publicly available data, using the counts, positions and images, with no success. Thanks
Sorry :( only code for this in python atm. What step specifically are you having trouble with? The initial loading of the data?
Hi Sam, do you have a video of how you’re downloading the data from NCBI (papers) because that part I don’t understand.
I don't have a video. But I tweet about it sometimes if you follow me on twitter. I may make something like this in the future. You can check out my most recent video series for another example from a different dataset
Great video, I learned a lot. But I was wondering what device you use? I did the same analysis but I couldn't load the model because I'm out of memory.
this computer has 128 gb memory. But you can try the analysis with fewer samples if you want to follow along still
Dear Sanbomics! I am wondering what your first choice is for gene regulatory network analysis, both when you have only single-cell RNA-seq data and when you also have single-cell ATAC data?
I like scenic and scenic+ if you have paired ATAC data. Scenic and cell oracle are also good choices.
@@sanbomics Thank you so much
Thank you, very useful!
Thank you for the nice video. Regarding the part that makes the cell type fraction plot (from this part of the code to the end of that section: adata.obs.groupby(['sample']).count()), could you also please explain how to do it in R with a Seurat object? Thanks
Your older videos are also very helpful. Thank you for everything
They never got much love though xD
Hi! Thanks for this! Just wondering if you have tried comparing results from CellOracle with those from SCENIC?
I haven't, but there is someone in the lab working on that at the moment.
Hello, I'm using STAR to align RNA .fq files with --quantMode. When I cat ReadsPerGene.out.tab, it indicates that my reads only map to rRNAs and ncRNAs, but when I check in IGV it shows that my reads map to coding genes. How do I fix this? I'm trying to use ReadsPerGene.out.tab for further analysis.
Hi! I'm memory limited, so I can only load my dataset using the backed = 'r' option. How would I subset in this scenario?
I avoid backed at all costs haha. I know I have had to do this before.. but it is so infrequent that I don't remember how off the top of my head and I don't remember where I can find an example. Hope you figured it out, sorry for slow response
Not sure if the DeseqDataSet parameters have changed since this tutorial but I had to change clinical to metadata when running: dds = DeseqDataSet( counts = counts, metadata=pb.obs, design_factors="tumour")
Yup, it's changed a lot. I'll be remaking it soon!
there is no complex heat map package
Hi, may I know if it works with a .tsv file?
if you import the tsv first as a pandas dataframe
Could you please help how to make a violin plot using AUCell package like you did for dimplot?
Your tutorials are always incredibly helpful! Do you have a script that implements the UMAP-to-spatial transition animation for a Xenium dataset? We have a beautiful new dataset that we can share. sqiuatarizonadotedu. Thanks so much.
Hey! Could you please explain what these computed counts are?
Men U R a real hero
No you are the real hero! (The Boys reference)
Very cool video! Could you please tell us how to do something similar to your introduction with the umap transforming to the logo??
I have the video where I turn my cat into a UMAP. Let me know if that helps, if not, I can maybe post the code.
my name is zeinab bahari . you can find me in research gat... i need help in rna seq data analysis
If you need help you can check out sanbomics.com
Hi, thanks for your good video. How can I contact you, Dr.? I need some urgent help with my data analysis. Please help me
Hi, you can reach me through sanbomics.com
I'm not a scientist. I came here from cancer research. I gave it a thumbs up. Clear and informational.
Is there a reason you used CellTypist before integration? It means that the overclustering done by CellTypist is different to the overclustering done post-integration when annotating (which is making annotation a bit confusing in my case)
You can do it after depending on how many cells you have. With this many cells it becomes almost impossible because it requires a dense matrix.
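The reply's concern about dense matrices comes down to simple arithmetic: a dense matrix stores every cell-by-gene value, zeros included. The sketch below estimates that footprint. The 4 bytes per value (float32), the 30,000-gene figure, and the cell numbers are illustrative assumptions, not values from the video.

```python
def dense_matrix_gib(n_cells: int, n_genes: int, bytes_per_value: int = 4) -> float:
    """Memory needed to hold a dense cells x genes matrix, in GiB."""
    return n_cells * n_genes * bytes_per_value / 1024**3


# ASSUMED gene count; real datasets vary.
N_GENES = 30_000

for n_cells in (10_000, 100_000, 500_000):
    size = dense_matrix_gib(n_cells, N_GENES)
    print(f"{n_cells:>7} cells: {size:.1f} GiB dense")
```

Under these assumptions the dense footprint grows linearly with cell number, which is why a step that is comfortable on one sample can become impractical on a large integrated object.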