
Getting Started

To get started, you will need to get familiar with the command line (aka the shell, e.g. bash) and either R or Python. I have mainly used R, so my recommendations are biased towards R. See [Sanbomics](https://www.youtube.com/@sanbomics/featured) for Python advice. Below I outline the typical steps when analyzing omics data and highlight which tools you can use for each. Again, these are my recommendations, and there are other ways/tools to accomplish these steps.

Basic sequencing/bioinformatics analytics pipeline

| Step | Language | Tool |
| --- | --- | --- |
| Download data/fastqs | `shell` | SRA Toolkit `fastq-dump` (if published) or download from the sequencing core. For data quality purposes, it is really important that you DO NOT change the filenames. |
| Align fastqs to reference genome | `shell` | Bowtie for DNase-seq, Bowtie2 for other chromatin-based assays (e.g. ChIP-seq), STAR for RNA-seq |
| Call peaks (for chromatin-based assays) | `shell` / `R` | MACS2 for ChIP-seq, SEACR for CUT&RUN |
| Extract RNA-seq counts or chromatin-based assay reads under peaks | `shell` / `R` | HTSeq or Salmon for RNA-seq, `Rsubread::featureCounts()` for chromatin-based assays |
| Perform QC & filtering | `shell` / `R` | FastQC for a QC check of the fastq files. Look for batch effects: is sequencing depth higher or lower on specific sequencing days or library prep days? Are there any outliers? For the extracted reads, create several plots of sequencing depth against metadata such as library prep day, sequencing day, and sample group. |
| Differential analysis | `R` | DESeq2 for many types of genomic data, DiffBind for chromatin-based assays |
| Map peaks to nearest genes | `R` | ChIPseeker |
| Enrichment testing for differential genes or genes near differential peaks | `R` / web-based tools | Most enrichment tests are a version of Fisher's exact test: they ask whether genes annotated to a specific pathway or GO term make up a larger share of the significant genes than of the background gene set. clusterProfiler for GO enrichment, KEGG pathway enrichment, and Reactome pathway enrichment. |
| TF enrichment analysis | `shell` | HOMER or MEME |
| Gene correlation analysis | `R` | WGCNA |
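
To make the QC step in the table a bit more concrete, here is a minimal Python sketch of the two questions it asks: does sequencing depth differ by library prep day, and are there any outlier samples? The sample names, prep days, read counts, and the "below half the median" cutoff are all made up for illustration; in practice you would read these from your own metadata sheet and pick a threshold that suits your assay.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical metadata: (sample name, library prep day, total mapped reads).
samples = [
    ("s1", "prep_day_1", 21_500_000),
    ("s2", "prep_day_1", 19_800_000),
    ("s3", "prep_day_1", 20_900_000),
    ("s4", "prep_day_2", 20_200_000),
    ("s5", "prep_day_2", 22_100_000),
    ("s6", "prep_day_2", 6_300_000),
]


def depth_by_batch(rows):
    """Return the mean sequencing depth for each batch (e.g. prep day)."""
    groups = defaultdict(list)
    for _name, batch, depth in rows:
        groups[batch].append(depth)
    return {batch: mean(depths) for batch, depths in groups.items()}


def low_depth_outliers(rows, fraction=0.5):
    """Flag samples whose depth is below `fraction` of the median depth.

    The 0.5 default is an arbitrary illustrative cutoff, not a standard.
    """
    med = median(depth for _name, _batch, depth in rows)
    return [name for name, _batch, depth in rows if depth < fraction * med]


if __name__ == "__main__":
    for batch, avg in depth_by_batch(samples).items():
        print(f"{batch}: mean depth {avg:,.0f} reads")
    print("possible low-depth outliers:", low_depth_outliers(samples))
```

A summary like this does not replace plotting, but it is a quick way to see whether one prep day or one sample stands out before you move on to differential analysis.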

After all your analyses are done, you pore over your genes, make connections, and interpret the data.
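
Enrichment results are usually where that interpretation starts, so it helps to know what the test is doing. Below is a rough sketch of the one-sided Fisher's exact test (equivalently, the upper tail of the hypergeometric distribution) that the enrichment step in the table refers to. It uses only the Python standard library and is not the API of clusterProfiler or any other tool; the function name, argument names, and gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_significant, n_overlap):
    """One-sided Fisher's exact test via the hypergeometric upper tail.

    Probability of seeing at least `n_overlap` term genes among
    `n_significant` genes drawn without replacement from a background of
    `n_background` genes, of which `n_in_term` are annotated to the term.
    """
    max_overlap = min(n_in_term, n_significant)
    total = comb(n_background, n_significant)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_significant - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 15,000 background genes, 200 annotated to a
    # pathway, 500 significant genes, 20 of which are in the pathway.
    p = enrichment_p_value(15_000, 200, 500, 20)
    print(f"enrichment p-value: {p:.3g}")
```

The choice of background matters a lot here: using only the genes that were actually tested (rather than every gene in the genome) is generally the safer default. Real tools also correct the p-values for testing many terms at once, which this sketch leaves out.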

Resources

Bioinformatics Tutorials/Courses

This Data Carpentry Genomics Curriculum course is taught to the incoming UPGG students every year. IMO, these modules are the most important:

Applied Computational Genomics Course at UU: Spring 2022. Taught by a bioinformatics legend. I find his raw genomic data processing content very insightful. This includes:

HarvardX Biomedical Data Science Open Online Training. This is a really good front-to-back course on bioinformatics. I would focus on:

Bioinformatics YouTube Channels

Bioinformagician
Chatomics
- Bulk RNA-seq analysis
- Single cell RNA-seq analysis
StatQuest
Prof. Jean Fan - She does live coding sessions, which I find very helpful for seeing realistic coding, though she mainly focuses on spatial transcriptomics.
Duke’s computational biology reading group playlist on the Duke Center for Computational Thinking channel. This whole playlist is live coding, and they have an R package with code and data to walk through each tutorial.

Bioinformatics Books

Bioinformatics Data Skills (Official Website: here, Online PDF: here) - good all around
Modern Statistics for Modern Biology - great for stats concepts
Computational Biology - good all around for concepts and code

Getting Started in Bioinformatics