Sanbomics
  • 69 videos
  • 881,354 views
2024 updated single-cell guide - Part 2: RNA Integration and annotation
In this video I integrate the single-cell RNA data together with scVI and use multiple methods of label transfer from reference datasets. I then verify and annotate the individual clusters using known marker genes. This video covers advanced analysis steps, such as tuning hyperparameters in our scVI model, making custom reference datasets, and more.
Main notebook:
github.com/mousepixels/sanbomics_scripts/blob/main/sc2024/annotation_integration.ipynb
Example of bad mapping:
github.com/mousepixels/sanbomics_scripts/blob/main/sc2024/bad_mapping.ipynb
Part 1:
ruclips.net/video/cmOlCTGX4Ik/видео.html
0:00 Celltypist transfer
13:49 scVI transfer
21:13 Integration
29:22 Dim reduction
34:00 Annotation
Views: 3,755

Videos

2024 updated single-cell guide - Part 1: RNA preprocessing and quality control
7K views · 2 months ago
This is a comprehensive tutorial on the most up-to-date recommendations for single-cell sequencing. This is part 1 of a multi-part series. Here I download a dataset, remove background RNA, perform quality control, and remove low-quality cells. Part 2 will cover dimension reduction and cell annotation. We will eventually get to in-depth analysis and scATAC analysis. Notebook: github.com/mousepix...
Single-cell pseudotime and gene regulatory analysis with CellOracle
3.5K views · 6 months ago
CellOracle is a powerful suite of tools that can perform pseudotime analysis, gene regulatory network analysis, and in silico perturbation analysis on single-cell data in python. This is a simple tutorial covering the basics of CellOracle while analyzing a developmental pancreas dataset. Notebook: github.com/mousepixels/sanbomics_scripts/blob/main/celloracle_pseudotime_GRN.ipynb Reference: www....
Processing single-cell RNAseq counts with simpleaf (alevin-fry)
2.2K views · 9 months ago
Simpleaf is a faster and more efficient alternative to other counters, such as cellranger, and it works with other single-cell chemistries. It is a wrapper for Alevin-fry and is made by the same lab that created the Salmon RNAseq aligner. Simpleaf is still in development as of making this video. Do not be surprised if the workflow changes slightly in the future. Github notes: github.com/mousepi...
RNAseq mapping with Salmon for differential expression
7K views · 11 months ago
How do you process raw read data for the purpose of differential expression? In this video I map raw RNAseq reads using Salmon and follow up with differential expression analysis in R with DESeq2. Notebook: github.com/mousepixels/sanbomics_scripts/blob/main/salmon_to_deseq.Rmd References: salmon.readthedocs.io/en/latest/salmon.html Index preparation: combine-lab.github.io/alevin-tutorial/2019/s...
Pseudobulk single-cell analysis in Python with Scanpy and pyDeseq2
7K views · 11 months ago
It is now possible to do pseudobulk analysis directly in Python on your Scanpy object. I create the pseudobulk from single-cell data, then analyze it with the Python port of DESeq2. Notebook: github.com/mousepixels/sanbomics_scripts/blob/main/pseudobulk_pyDeseq2.ipynb
Differential expression in Python with pyDESeq2
18K views · 1 year ago
Analyze RNAseq counts data with a Python implementation of DESeq2. I cover basic differential expression analysis, PCA plots, GSEA, heatmaps, and volcano plots. Github: github.com/mousepixels/sanbomics_scripts/blob/main/PyDeseq2_DE_tutorial.ipynb The samples include normal human cell control and replicative senescence cells from NCBI accession GSE171663 0:00 Intro 0:30 Differential expression 7...
Single-cell background decontamination in R and Python with SoupX
4.8K views · 1 year ago
SoupX is an essential tool for ambient RNA decontamination in single-cell RNA sequencing data. Ambient RNA in solution is partitioned into droplets and confounds downstream analyses. This concise tutorial covers SoupX implementation for both R (Seurat objects) and Python (Scanpy objects), offering step-by-step guidance and expert insights to improve data quality and accuracy in your single-cell...
Can chatGPT do single-cell bioinformatic analysis?
26K views · 1 year ago
Here I test whether ChatGPT with the GPT-4 model can do basic single-cell RNA analysis. In short, the results are impressive.
Comparing single-cell RNA integration methods | Which is the best?
9K views · 1 year ago
Which single-cell integration method is the best? In this video I compare 5 different methods using 3 different challenging integration problems. I test Seurat CCA, Seurat RPCA, SCVI-tools, and Scanorama. I measure time and memory usage and also examine integration outcomes. Github: github.com/mousepixels/sanbomics_scripts/tree/main/integration_comparison Datasets: www.cell.com/cell/fulltext/S0...
Single-cell trajectory and pseudotime analysis with Monocle3 and Seurat in R
14K views · 1 year ago
In this video I perform trajectory analysis in R on a large dataset of cells undergoing dedifferentiation into iPSCs. I use Seurat to load, merge, and preprocess the data. I use Monocle3 to calculate pseudotime and create the trajectories. I go over basic analyses and plotting using Monocle3. Notebook: github.com/mousepixels/sanbomics_scripts/blob/main/monocle3_tutorial.Rmd References: www.cell...
Easy RNAseq volcano plot with one line of code
5K views · 1 year ago
Make a super easy and PRETTY volcano plot from differentially expressed genes with only one line of code. Plotting aesthetic figures can be challenging and/or time consuming. Here I show you how to make a pretty volcano plot without needing much prior coding knowledge. The plots are also highly customizable for more advanced users.
Applying random forest classifiers to single-cell RNAseq data
6K views · 1 year ago
Learn how to apply machine learning to single-cell data. Random forest is a powerful machine learning classifier and a great tool for analyzing single-cell RNAseq data. In addition to predicting classifications, you can extract the gene importance from the model as a way to identify genes that describe your populations. Here I use several examples to show you how to use the random forest model ...
Introduction to single cell ATAC data analysis in R
14K views · 1 year ago
This is a primer for single cell/nuclei ATAC-seq data analysis. What is single cell ATACseq? How do you perform basic scATAC-seq analysis in R? I describe what scATACseq is. Then I use Seurat and Signac to do data analysis using a recent Nature communications paper. I do preprocessing, clustering, differential accessibility analysis, RNA activity estimation, and I make various plots. Notebook: ...
Complete single-cell RNAseq analysis walkthrough | Advanced introduction
75K views · 1 year ago
This is a comprehensive introduction to single-cell analysis in Python. I recreate the main single cell analyses from a recent Nature publication. I explain the basics of single-cell sequencing analysis and also introduce more advanced topics. I cover doublet removal, preprocessing, integration, clustering, cell identification, differential expression, gene-set enrichment, non-parametric stat...
Introduction to spatial sequencing data analysis
8K views · 1 year ago
Single-cell gene co-expression | single-cell RNAseq methods
4K views · 1 year ago
3 minute GSEA tutorial in R | RNAseq tutorials
24K views · 1 year ago
Simple guide to GSEA and plotting in python
9K views · 1 year ago
Convert h5ad anndata to a Seurat single-cell R object
9K views · 1 year ago
Guide to filtering and subsetting single-cell anndata and pandas objects | basic and advanced
6K views · 1 year ago
Label single-cells automatically in python | scVI label transfer
4.2K views · 1 year ago
Beautiful and customizable RNAseq volcano plots
11K views · 2 years ago
How to remove single-cell doublets in python
3.1K views · 2 years ago
Single-cell analysis with scVI machine-learning toolkit
8K views · 2 years ago
Single-cell integration in python with scanpy
8K views · 2 years ago
RNAseq volcano plot of differentially expressed genes
27K views · 2 years ago
How to do gene ontology analysis in python
15K views · 2 years ago
RNAseq analysis | Gene ontology (GO) in R
53K views · 2 years ago
Single-cell gene set activity with AUCell
4.9K views · 2 years ago

Comments

  • @asshimul1168
    @asshimul1168 19 hours ago

    Hello, would you please make a tutorial on an advanced scRNA-seq workflow (based on a good paper) using R?

  • @asshimul1168
    @asshimul1168 5 days ago

    Please make this same tutorial for R🙏

  • @JUNPENGYOU-us7mk
    @JUNPENGYOU-us7mk 9 days ago

    Thank you for the informative tutorial video; it has been immensely beneficial to my scientific research!😄

    • @sanbomics
      @sanbomics 9 days ago

      Glad it was helpful!

  • @yanshixiong434
    @yanshixiong434 10 days ago

    Great content. Languages are just different; they don't have to be good or bad.

  • @sapienthought1103
    @sapienthought1103 12 days ago

    Do I have to do this for each sample, or what?

    • @sanbomics
      @sanbomics 10 days ago

      You will have to run it for every sample, but you only need to install it once.

  • @freenergy777
    @freenergy777 13 days ago

    1:12, why do we have to predict doublets for each sample separately?

    • @sanbomics
      @sanbomics 10 days ago

      Because every sample is a little different. If there are more cells in one sample, the doublet rate will be higher for that sample. Also, samples have different cell types, etc.
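The reply's point about cell number can be illustrated with a rough calculation. This is only a sketch: the rate of about 0.8% doublets per 1,000 recovered cells is an assumed rule of thumb for droplet-based protocols, not a figure from the video, and the sample names and cell counts are hypothetical.

```python
# Rough illustration only: the 0.8%-per-1,000-cells figure is an assumed
# rule of thumb for droplet-based protocols, not a value from the video.
ASSUMED_RATE_PER_1000_CELLS = 0.008


def expected_doublet_rate(n_cells_recovered: int) -> float:
    """Approximate fraction of recovered cells expected to be doublets."""
    return ASSUMED_RATE_PER_1000_CELLS * n_cells_recovered / 1000


# Hypothetical samples with different numbers of recovered cells.
samples = {"sample_A": 2000, "sample_B": 10000}
for name, n_cells in samples.items():
    rate = expected_doublet_rate(n_cells)
    print(f"{name}: {n_cells} cells, ~{rate:.1%} expected doublets")
```

Because the expected rate differs this much between samples, a doublet caller that is given all samples at once would be working with a single, misleading prior.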

  • @aytacoksuzoglu2975
    @aytacoksuzoglu2975 15 days ago

    Well, it was really awesome. I'm still an undergraduate, so the bioinformatics part was a little hard to understand, but the Python code part was clear. Have you covered other integration methods, or could you record another video on them?

    • @sanbomics
      @sanbomics 10 days ago

      Yeah I actually have a video that compares multiple integration methods: ruclips.net/video/NFA2YGshATs/видео.html

  • @frutitadelosmares
    @frutitadelosmares 16 days ago

    Hi! Thanks so much for such a great tutorial! I have a naive question from someone who just started in this world: when raw data is not available and you can only download normalised, filtered values, do you skip the pre-processing step? Is it correct to pre-process normalised values, let's say TMM? Again, thanks so much for all the videos!

    • @sanbomics
      @sanbomics 10 days ago

      Yeah if there are no raw counts then you will have to skip the ambient removal. Unfortunately, this is the only way sometimes.

  • @hrisivanov3150
    @hrisivanov3150 17 days ago

    Man, you absolutely saved me! Suddenly, everything makes sense now. Subscribed!

  • @TheXu122
    @TheXu122 18 days ago

    Thank you so much for your videos! I am a grad student who recently started a single-cell project, and since I found your channel, your explanations and code have been getting me through this tough time. I was wondering if you are planning on covering cNMF in the future? It is something that I and our lab have had difficulty with. Thanks again!

    • @sanbomics
      @sanbomics 10 days ago

      I can definitely keep that in mind for a future video!

  • @gregj3913
    @gregj3913 20 days ago

    Thank you for this video! I am confused about why you can use Ensembl version 109 for tximport when you ran Salmon with the GENCODE transcripts FASTA. Doesn't GENCODE differ from Ensembl in which transcripts are included? Or does this not really matter?

  • @MKShams
    @MKShams 20 days ago

    Hi there, thanks. But I have this error. I changed it to 6 but I still get it: WARNING: --genomeSAindexNbases 14 is too large for the genome size=154478, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 7
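For context, the value recommended in that warning matches the scaling rule given in the STAR manual for small genomes, min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that arithmetic follows; the function name is hypothetical, and rounding down to an integer is an assumption chosen to match the recommended value of 7.

```python
import math


def recommended_sa_index_nbases(genome_length: int) -> int:
    """min(14, log2(GenomeLength)/2 - 1), rounded down to an integer."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


# Genome size reported in the warning above.
print(recommended_sa_index_nbases(154478))
```

Note that the warning quotes 14, the default, so it may be worth checking that the changed value was actually passed to the genome generation step and that the index was regenerated afterwards.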

  • @caspase888
    @caspase888 24 days ago

    Your videos are amazing. Thanks a lot. Could I use 3050 with 64 GB RAM for this kind of analysis? Thanks a lot.

    • @sanbomics
      @sanbomics 10 days ago

      You can do a decent number of cells with 64 GB RAM. I would think you could handle around ~200k in memory at the same time without too many issues. Some steps/algorithms use a lot more memory though, so it is highly dependent on what you do. In my experience 64 GB won't be enough for large datasets/atlases, but you can def do small numbers of samples.

  • @JmandudE888
    @JmandudE888 25 days ago

    While this is technically correct, it is not very statistically sound. When applying the Bonferroni and BH procedures, we typically change the cutoff point, not the actual p-value. Multiplying the p-value by the number of tests can lead to p-values greater than one (specifically for the Bonferroni method; the BH method already accounts for this by dividing by the rank), which is impossible since a p-value is a probability between 0 and 1. While the end conclusion is the same, it doesn't make sense from a statistical standpoint. You can absolutely use this method to get the right significance, but if you are presenting this to a statistician or publishing this work, you would need to adjust the p-value cutoff instead of the actual p-value, or change any p-value that is greater than 1 to be exactly 1, but even this is a little more nuanced than just multiplication.

    • @sanbomics
      @sanbomics 10 days ago

      This is a bit pedantic: of course probabilities don't go above 1. You can simply clip the data frame column to have a max value of 1.
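To make the exchange above concrete, here is a minimal pure-Python sketch of both adjustments with the cap at 1 applied. It is not the code from the video; the function names and the example p-values are hypothetical.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """BH-adjusted p-values: p * m / rank, made monotone, capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of its own p * m / rank and every higher-ranked value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.5]  # hypothetical raw p-values
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Comparing adjusted p-values against the original threshold gives the same calls as comparing raw p-values against an adjusted threshold, which is why both conventions are seen in practice.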

  • @danielpintard7382
    @danielpintard7382 26 days ago

    This man is an absolute godsend. I can't even begin to count the number of times he has come in clutch with a solution to issues I encounter in my personal projects and during my internship!

    • @sanbomics
      @sanbomics 10 days ago

      Glad I could help!

  • @AbelDavid-qc5xy
    @AbelDavid-qc5xy 26 days ago

    Thank you for this very helpful tutorial. Is there a function for plotting the Euclidean distance map for each sample?

  • @azxcf2912
    @azxcf2912 27 days ago

    @Sanbomics... good content but you really need to learn how to talk!

    • @sanbomics
      @sanbomics 27 days ago

      I done gone learned how to talk real good like enough. No idea what u r meaning. Such rude

  • @mehdiraouine2979
    @mehdiraouine2979 28 days ago

    we're still looking forward to the future part ;D

    • @sanbomics
      @sanbomics 28 days ago

      I know i know xD. I was going to start working on it this weekend. I have been very busy!

    • @sanbomics
      @sanbomics 10 days ago

      someday soon...

    • @Dumbo-eo5ps
      @Dumbo-eo5ps 9 days ago

      @sanbomics We're all hoping for this series to be completed so we can implement it. We're rooting for you! We're grateful for anything you can share :D

  • @phuchoanglevn
    @phuchoanglevn 29 days ago

    Thanks for your work. It's really useful.

  • @AbelDavid-qc5xy
    @AbelDavid-qc5xy 29 days ago

    Can you cover batch correction in python?

  • @user-cj1sh8qu5h
    @user-cj1sh8qu5h 29 days ago

    I love this, thank you!

  • @benjaminwehnert1893
    @benjaminwehnert1893 1 month ago

    Thank you very much. Great work. Maybe a dumb question, but how would you process BAM files that contain single-cell data from several cells? Essentially what I need is a table similar to yours with genes in the rows and cells in the columns (instead of whole BAM files).

    • @sanbomics
      @sanbomics 28 days ago

      You can convert it back to fastq then run it through various single cell counters. e.g., if it is 10x data you can use cellranger bamtofastq then cellranger count

  • @young-kookkim5031
    @young-kookkim5031 1 month ago

    Thank you so much! This video is perfect for those who want to analyze scRNA-seq data!

  • @celiagonzalezgil57
    @celiagonzalezgil57 1 month ago

    Firstly, thank you so much; your videos are very useful. I used featureCounts to generate the count table, but the percentage of Unassigned_NoFeatures is too high (around 50%). I checked that the same annotation file was used for the alignment and for generating the count table, and I also checked the strandedness of the assay, but the problem persists. When I changed GTF.featureType from exon to gene, Unassigned_NoFeatures decreased to roughly 15%. These results suggest a high content of intronic or intergenic reads, but I don't observe that when I check in IGV. Could you help me with this, or tell me whether these results are normal for human data? Thank you so much!!

    • @sanbomics
      @sanbomics 28 days ago

      Hmm. Sounds weird. It's hard for me to diagnose from here. See what happens if you use a pseudoaligner instead like salmon

    • @sanbomics
      @sanbomics 28 days ago

      You are definitely using the right annotation? xD

    • @celiagonzalezgil57
      @celiagonzalezgil57 24 days ago

      @sanbomics Thank you so much for your feedback. I am trying now with Salmon.

    • @celiagonzalezgil57
      @celiagonzalezgil57 24 days ago

      @sanbomics Yes, I checked that several times 😅

  • @fsh9134
    @fsh9134 1 month ago

    Thanks for making very useful videos. I was wondering if you would like to make a video on single-cell analysis using Julius AI, a data analysis AI.

  • @yaseminsucu416
    @yaseminsucu416 1 month ago

    Hello! I am curious if you will have a much expanded version of the Spatial Seq Data analysis, also curious if there is any Proteomic analysis tutorial coming up! Thanks for your great work!

    • @sanbomics
      @sanbomics 28 days ago

      I might do a visium HD and/or a xenium tutorial soon but i have a lot of things in the queue and not enough time xD

    • @tamaterinha
      @tamaterinha 23 days ago

      @sanbomics Visium HD would be amazing! I'm new to spatial transcriptomics and my first project is on Visium HD data. I am desperately looking for a nice analysis workflow.

  • @ParthShah-hc8pw
    @ParthShah-hc8pw 1 month ago

    Hey mate, I'm getting this error. I followed all your steps except the filter one, please help. I have 5 columns in my matrix (M10, M11, M12, M3 and M5) with Ensembl gene IDs. The dds step is not working: Error in checkForExperimentalReplicates(object, modelMatrix) : The design matrix has the same number of samples and coefficients to fit, so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer

  • @Kelly-gg8eq
    @Kelly-gg8eq 1 month ago

    I never comment on youtube videos but thank you so much for this. It was so simple and straight to the point. I am new to coding and needed to get all of the outputs from 20 featurecount reads into one output file, and this was the only thing I could find that not only made sense, but also worked. Thank you thank you thank you!!!

    • @sanbomics
      @sanbomics 1 month ago

      Wooo glad it helped you!

  • @davidstivenarboledaprado8731
    @davidstivenarboledaprado8731 1 month ago

    Do you have code for doing the same in RStudio? I've been trying to build a Seurat object with publicly available data, using the counts, positions and images, with no success. Thanks.

    • @sanbomics
      @sanbomics 1 month ago

      Sorry :( only code for this in python atm. What step specifically are you having trouble with? The initial loading of the data?

  • @Amanda-re2vt
    @Amanda-re2vt 1 month ago

    Hi Sam, do you have a video of how you’re downloading the data from NCBI (papers) because that part I don’t understand.

    • @sanbomics
      @sanbomics 1 month ago

      I don't have a video. But I tweet about it sometimes if you follow me on twitter. I may make something like this in the future. You can check out my most recent video series for another example from a different dataset

  • @mhmmdbduh
    @mhmmdbduh 1 month ago

    Great video, I learned a lot. But I was wondering what device you use? I did the same analysis but I couldn't load the model because I ran out of memory.

    • @sanbomics
      @sanbomics 1 month ago

      this computer has 128 gb memory. But you can try the analysis with fewer samples if you want to follow along still

  • @ykoy1577
    @ykoy1577 1 month ago

    Dear Sanbomics! I am wondering what your first choice is for gene regulatory network analysis, both when you have only single-cell RNA-seq data and when you also have single-cell ATAC data?

    • @sanbomics
      @sanbomics 1 month ago

      I like SCENIC, and SCENIC+ if you have paired ATAC data. SCENIC and CellOracle are also good choices.

    • @ykoy1577
      @ykoy1577 1 month ago

      @sanbomics Thank you so much

  • @ionutiordachi695
    @ionutiordachi695 1 month ago

    Thank you, very useful!

  • @saraalidadiani5881
    @saraalidadiani5881 1 month ago

    Thank you for the nice video. Regarding the part that makes the cell type fraction plot (from this part of the code to the end of that section: adata.obs.groupby(['sample']).count()), could you also explain how to do it in R with a Seurat object? Thanks.
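For readers unfamiliar with the step referenced in that comment, the idea behind a cell type fraction table can be sketched in pandas as follows. This is not the notebook's code: the small DataFrame is a hypothetical stand-in for adata.obs, and the column name cell_type is an assumption. It does not answer the R question, but the same group-and-divide logic carries over to any data frame library.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs: one row per cell.
obs = pd.DataFrame({
    "sample": ["s1", "s1", "s1", "s2", "s2"],
    "cell_type": ["T cell", "T cell", "B cell", "T cell", "B cell"],
})

# Total cells per sample, the same idea as adata.obs.groupby(['sample']).count().
cells_per_sample = obs.groupby("sample").size()

# Cells per (sample, cell type), then divided by that sample's total.
counts = obs.groupby(["sample", "cell_type"]).size().unstack(fill_value=0)
fractions = counts.div(cells_per_sample, axis=0)
print(fractions)
```

Each row of fractions sums to 1, so it can be passed directly to a stacked bar plot with one bar per sample.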

  • @ykoy1577
    @ykoy1577 1 month ago

    Your older videos are also very helpful. Thank you for everything

    • @sanbomics
      @sanbomics 1 month ago

      They never got much love though xD

  • @lizheltamon
    @lizheltamon 1 month ago

    Hi! Thanks for this! Just wondering if you have tried comparing CellOracle results with those from SCENIC?

    • @sanbomics
      @sanbomics 1 month ago

      I haven't, but there is someone in the lab working on that at the moment.

  • @catalyst1918
    @catalyst1918 1 month ago

    Hello, I'm using STAR to align RNA .fq files with --quantMode. When I cat ReadsPerGene.out.tab, it indicates that my reads only map to rRNAs and ncRNAs, but when I check in IGV, my reads map to coding genes. How do I fix this? I'm trying to use ReadsPerGene.out.tab for further analysis.

  • @qwerty11111122
    @qwerty11111122 1 month ago

    Hi! I'm memory limited, so I can only load my dataset using the backed = 'r' option. How would I subset in this scenario?

    • @sanbomics
      @sanbomics 1 month ago

      I avoid backed at all costs haha. I know I have had to do this before.. but it is so infrequent that I don't remember how off the top of my head and I don't remember where I can find an example. Hope you figured it out, sorry for slow response

  • @gracegregory4846
    @gracegregory4846 1 month ago

    Not sure if the DeseqDataSet parameters have changed since this tutorial, but I had to change clinical to metadata when running: dds = DeseqDataSet(counts=counts, metadata=pb.obs, design_factors="tumour")

    • @sanbomics
      @sanbomics 1 month ago

      Yup, it's changed a lot. I'll be remaking it soon!

  • @SamipSapkota-zg8hy
    @SamipSapkota-zg8hy 1 month ago

    there is no complex heat map package

  • @zamhazri6240
    @zamhazri6240 1 month ago

    Hi, may I know if it works with a .tsv file?

    • @sanbomics
      @sanbomics 1 month ago

      Yes, if you import the TSV first as a pandas DataFrame.
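Importing a TSV as a pandas DataFrame, as the reply suggests, can look like the sketch below. The in-memory text and its column names are hypothetical placeholders; with a real file you would pass its path to pd.read_csv instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for a real .tsv file.
tsv_text = "gene\tlog2FoldChange\tpadj\nGENE1\t2.1\t0.001\nGENE2\t-1.4\t0.03\n"

# sep="\t" tells read_csv that the columns are tab-delimited.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df.shape)
```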

  • @divyamishra2641
    @divyamishra2641 1 month ago

    Could you please show how to make a violin plot using the AUCell package, like you did for the DimPlot?

  • @MrQiushenfeng
    @MrQiushenfeng 1 month ago

    Your tutorials are always incredibly helpful! Do you have a script that implements the UMAP-to-spatial transition animation for a Xenium dataset? We have a beautiful new dataset that we can share. sqiuatarizonadotedu. Thanks so much.

  • @aayushinotra7945
    @aayushinotra7945 1 month ago

    Hey! Could you please explain what these computed counts are?

  • @issanmitro
    @issanmitro 1 month ago

    Men U R a real hero

    • @sanbomics
      @sanbomics 1 month ago

      No you are the real hero! (The Boys reference)

  • @duadpeada5068
    @duadpeada5068 1 month ago

    Very cool video! Could you please tell us how to do something similar to your introduction with the umap transforming to the logo??

    • @sanbomics
      @sanbomics 1 month ago

      I have the video where I turn my cat into a UMAP. Let me know if that helps, if not, I can maybe post the code.

  • @zeinabbahari
    @zeinabbahari 1 month ago

    My name is Zeinab Bahari. You can find me on research gat... I need help with RNA-seq data analysis.

    • @sanbomics
      @sanbomics 1 month ago

      If you need help you can check out sanbomics.com

  • @zeinabbahari
    @zeinabbahari 1 month ago

    Hi, thanks for your good video. How can I reach you, Dr.? I need some urgent help with my data analysis. Please help me.

    • @sanbomics
      @sanbomics 1 month ago

      Hi, you can reach me through sanbomics.com

  • @cold_hardfacts
    @cold_hardfacts 1 month ago

    I'm not a scientist. I came here from cancer research. I gave it a thumbs up. Clear and informational.

  • @georgieb1326
    @georgieb1326 1 month ago

    Is there a reason you used CellTypist before integration? It means that the overclustering done by CellTypist is different to the overclustering done post-integration when annotating (which is making annotation a bit confusing in my case)

    • @sanbomics
      @sanbomics 1 month ago

      You can do it after depending on how many cells you have. With this many cells it becomes almost impossible because it requires a dense matrix.