Getting Started

To get started, you will need to get familiar with the command line (aka the shell, e.g. bash) and either R or Python. I have mainly used R, so my recommendations are biased towards R. See [Sanbomics](https://www.youtube.com/@sanbomics/featured) for Python advice. Below I outline the typical steps for analyzing omics data and highlight which tools you can use for each. Again, these are my recommendations, and there are other ways and tools to accomplish these steps.

Basic sequencing/bioinformatics analytics pipeline

| Step | Language | Tool |
| --- | --- | --- |
| Download data/FASTQs | shell | `fastq-dump` from the SRA Toolkit (if published), or download from your sequencing core. For data quality purposes, it is really important that you DO NOT change the filenames. |
| Align FASTQs to reference genome | shell | Bowtie for DNase-seq, Bowtie 2 for any other chromatin-based assays (e.g. ChIP-seq), STAR for RNA-seq |
| Call peaks (for chromatin-based assays) | shell / R | MACS2 for ChIP-seq, SEACR for CUT&RUN |
| Extract RNA-seq counts or chromatin-based assay reads under peaks | shell / R | HTSeq or Salmon for RNA-seq, `Rsubread::featureCounts()` for chromatin-based assays |
| Perform QC & filtering. Look for batch effects. Is sequencing depth higher or lower on specific sequencing days or library prep days? Are there any outliers? | shell / R | FastQC for a QC check of the FASTQ files. For the extracted reads, create several plots of sequencing depth against sample metadata such as library prep day, sequencing day, and sample group. |
| Differential analysis | R | DESeq2 for many types of genomic data, DiffBind for chromatin-based assays |
| Map peaks to nearest genes | R | ChIPseeker |
| Enrichment testing for differential genes or genes near differential peaks | R / web-based tools | Most enrichment testing uses a version of Fisher's exact test to ask whether genes annotated to a specific pathway or GO term make up a larger proportion of the significant genes than of the background genes. clusterProfiler for GO enrichment, KEGG pathway enrichment, and Reactome pathway enrichment. |
| TF enrichment analysis | shell | HOMER or MEME |
| Gene correlation analysis | R | WGCNA |
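
To make the QC step a bit more concrete, here is a minimal Python sketch of the depth checks described above: summarize sequencing depth by a metadata column (library prep day, sample group) and flag samples far from the overall median. The sample names, depths, field names, and the 3-fold cutoff are all made up for illustration; in practice you would read these values from your own count matrix and sample sheet, and you would plot them rather than only print them.

```python
from collections import defaultdict
from statistics import median

# Hypothetical example data: total reads per sample plus the metadata
# you want to compare sequencing depth against.
samples = [
    {"sample": "s1", "group": "control", "prep_day": "day1", "depth": 21_000_000},
    {"sample": "s2", "group": "treated", "prep_day": "day1", "depth": 19_500_000},
    {"sample": "s3", "group": "control", "prep_day": "day2", "depth": 8_200_000},
    {"sample": "s4", "group": "treated", "prep_day": "day2", "depth": 7_900_000},
    {"sample": "s5", "group": "control", "prep_day": "day2", "depth": 1_200_000},
]


def median_depth_by(samples, key):
    """Return {metadata value: median sequencing depth} for one metadata column."""
    depths = defaultdict(list)
    for s in samples:
        depths[s[key]].append(s["depth"])
    return {level: median(values) for level, values in depths.items()}


def flag_outliers(samples, fold=3.0):
    """List samples whose depth is more than `fold` times above or below the overall median.

    The 3-fold default is an arbitrary illustrative cutoff, not a standard.
    """
    overall = median(s["depth"] for s in samples)
    return [
        s["sample"]
        for s in samples
        if s["depth"] > overall * fold or s["depth"] < overall / fold
    ]


by_prep_day = median_depth_by(samples, "prep_day")
by_group = median_depth_by(samples, "group")
outliers = flag_outliers(samples)

print("median depth by prep day:", by_prep_day)
print("median depth by group:", by_group)
print("possible outliers:", outliers)
```

In this toy data the day2 libraries are much shallower than the day1 libraries while the groups are mixed across days, which is the kind of pattern that should make you suspect a batch effect before running any differential analysis.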

After all your analyses are done, you pore over your genes, make connections, and interpret the data.
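
Since the enrichment step in the pipeline rests on Fisher's exact test, it may help to see the calculation once by hand. The sketch below is a plain-Python version of the one-sided (over-representation) test, written as a hypergeometric upper tail. The function name and the example numbers are hypothetical, and this is not how any particular package implements it; real tools also correct the resulting p-values for testing many pathways at once.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_significant, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns the probability of seeing at least `n_overlap` pathway genes
    among `n_significant` genes drawn without replacement from a background
    of `n_background` genes, of which `n_in_pathway` belong to the pathway.
    """
    max_overlap = min(n_in_pathway, n_significant)
    # Number of ways to choose the significant gene set from the background.
    total = comb(n_background, n_significant)
    # Sum the hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_significant - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, 100 annotated to the pathway,
# 500 significant genes, 10 of which are in the pathway.
p = enrichment_p_value(
    n_background=20_000, n_in_pathway=100, n_significant=500, n_overlap=10
)
print(f"enrichment p-value: {p:.3g}")
```

By chance you would expect about 500 × 100 / 20,000 = 2.5 pathway genes among the significant genes here, so an overlap of 10 gives a small p-value. The choice of background (all annotated genes versus only the genes you actually tested) changes the answer, so it is worth checking what your tool uses.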

Resources

Bioinformatics Tutorials/Courses

The Data Carpentry Genomics Curriculum is taught to the incoming UPGG students every year. IMO, these modules are the most important:

Applied Computational Genomics Course at UU: Spring 2022. Taught by a bioinformatics legend. I find his raw genomic data processing content very insightful. This includes:

HarvardX Biomedical Data Science Open Online Training. This is a really good front-to-back course on bioinformatics. I would focus on:

Bioinformatics YouTube Channels