
Genomics Container

The genomics container is a dedicated compute environment for bioinformatics, pre-loaded with alignment, variant calling, quantification, and pipeline orchestration tools. SSH in and start processing data immediately — no installation required.

Alignment & mapping: samtools 1.22.1, bcftools 1.22, htslib 1.22.1, bowtie2 2.5.4, BWA, STAR 2.7.11b

Variant calling & analysis: GATK 4.6.2, PLINK 2.0, beagle 5.4, GLIMPSE2 2.0, Picard 3.4.0

Sentieon Genomics Suite 202503.02: High-performance, GATK-compatible variant calling with deterministic results and significantly faster runtimes. Includes quickstart scripts for whole-genome, exome, and tumor/normal pipelines in ~/sentieon_quickstart. Licensed and ready to use out of the box — no separate Sentieon license required.

Quantification & QC: kallisto 0.51.1, MultiQC, cutadapt

Pipeline orchestration: Nextflow 25.10

Data access: sra-toolkit (fastq-dump, fasterq-dump, prefetch)

Languages: Python 3.12 (numpy, pandas, polars, biopython, pysam, scanpy, anndata, scikit-bio, cyvcf2), R with Bioconductor (DESeq2, edgeR, limma, GenomicRanges, rtracklayer, Biostrings), Java (OpenJDK)

  1. Sign in at console.carolinacloud.io
  2. In the sidebar, click Create then choose Genomics
  3. Select your CPU tier — we recommend high-performance for alignment and variant calling workloads
  4. Use the sliders to configure vCPUs, RAM, and disk size (see recommendations below)
  5. Optionally paste your SSH public key
  6. Click Create
  7. Once the status changes to “Running”, use the SSH command on the instance card to connect

To create the same instance through the API:

curl -X POST https://console.carolinacloud.io/api/instance/ \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "resource_type": "container",
    "flavor": "genomics",
    "n_vcpus": 16,
    "mem_gib": 64,
    "disk_size_gib": 500,
    "cpu_tier": "high-performance"
  }'

Optional fields:

| Field | Description |
| --- | --- |
| name | Human-friendly name (e.g. wgs-pipeline-01) |
| cpu_tier | general-purpose (default) or high-performance ($0.01/vCPU/hr, recommended for genomics) |
| public_key | SSH public key for password-less access |
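
A request that sets all three optional fields could be assembled as in the sketch below. `build_create_payload` is a hypothetical helper, not part of the platform tooling, and the sizing values simply repeat the example request above. Values are interpolated as-is, so the sketch assumes they contain no double quotes or backslashes.

```shell
# Hypothetical helper: print a create-instance payload that includes the optional fields.
# $1 = name, $2 = cpu_tier, $3 = SSH public key (no double quotes or backslashes).
build_create_payload() {
  printf '{\n'
  printf '  "resource_type": "container",\n'
  printf '  "flavor": "genomics",\n'
  printf '  "n_vcpus": 16,\n'
  printf '  "mem_gib": 64,\n'
  printf '  "disk_size_gib": 500,\n'
  printf '  "name": "%s",\n' "$1"
  printf '  "cpu_tier": "%s",\n' "$2"
  printf '  "public_key": "%s"\n' "$3"
  printf '}\n'
}

# Usage (commented out so nothing is sent when the snippet is pasted):
# build_create_payload wgs-pipeline-01 high-performance "$(cat ~/.ssh/id_ed25519.pub)" |
#   curl -X POST https://console.carolinacloud.io/api/instance/ \
#     -H "Authorization: Bearer YOUR_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d @-
```

The key path `~/.ssh/id_ed25519.pub` is only an example; substitute whichever public key you want installed on the instance.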

Or with the ccloud CLI:

ccloud new container --cpus 16 --ram 64 --disk 500 \
  --flavor genomics \
  --tier high-performance \
  --name wgs-pipeline

Connect over SSH:

ssh -p <wan_ssh_port> ccloud@login.carolinacloud.io

The port and password are shown on your instance card. All tools are on $PATH and ready to use immediately.
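
After the first login, a quick check like the following can confirm that the tools your pipeline needs resolve on $PATH. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the executable names (for example `plink2` for PLINK 2.0) are assumptions based on each tool's usual binary name, so adjust the list to your workflow.

```shell
# Print every name from the argument list that is not found on $PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Executable names are assumptions; edit the list to match your pipeline.
missing=$(missing_tools samtools bcftools bowtie2 bwa STAR gatk plink2 picard \
  sentieon kallisto multiqc cutadapt nextflow fasterq-dump)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All listed tools are available."
fi
```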

| Workload | vCPUs | RAM | Disk | CPU tier |
| --- | --- | --- | --- | --- |
| Small panel sequencing | 4-8 | 16-32 GiB | 100-200 GiB | General purpose |
| Whole-exome | 8-16 | 32-64 GiB | 200-500 GiB | High performance |
| Whole-genome (30x) | 16-32 | 64-128 GiB | 500-1000 GiB | High performance |
| RNA-seq (STAR alignment) | 16+ | 64+ GiB | 200-500 GiB | High performance |
| Population genetics (GWAS) | 8-16 | 32-64 GiB | 200-500 GiB | Either |

STAR holds the whole genome index in memory, so budget at least 32 GiB of RAM for human genome indexing and alignment. Whole-genome variant calling with Sentieon or GATK benefits from high-performance CPUs and 16 or more cores.
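
A pipeline script can guard against an undersized instance before launching STAR. The sketch below is Linux-specific and assumes /proc/meminfo reflects the memory allocated to the instance, which may not hold in every container runtime. `has_enough_ram` is a hypothetical helper, and the 30 GiB threshold is an assumption chosen because the reported total is usually slightly below the nominal size.

```shell
# Succeed when total RAM (in KiB) covers the required amount (in GiB).
# $1 = required GiB, $2 = total KiB
has_enough_ram() {
  [ "$2" -ge $(( $1 * 1024 * 1024 )) ]
}

# MemTotal in /proc/meminfo is reported in KiB; skip the check if it is unavailable.
total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null || true)
if [ -n "$total_kib" ] && ! has_enough_ram 30 "$total_kib"; then
  echo "Warning: less than 30 GiB RAM reported; STAR on a human genome may run out of memory." >&2
fi
```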

Sentieon tools are on your $PATH and the license is pre-configured. Run Sentieon commands directly:

sentieon driver -r reference.fa -i input.bam \
  --algo Haplotyper output.vcf.gz

Quickstart scripts for common workflows are in ~/sentieon_quickstart. These can be used as templates for:

  • Germline whole-genome/exome variant calling
  • Somatic tumor/normal analysis
  • Joint genotyping

Sentieon implements GATK-compatible variant calling algorithms with deterministic results and substantially faster runtimes, which makes it well suited to production pipelines on high-performance instances.
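
For joint genotyping, the usual Sentieon pattern is a per-sample Haplotyper run in gVCF mode followed by a single GVCFtyper run over all samples. The sketch below is not taken from the quickstart scripts. `joint_genotype_cmd` is a hypothetical helper that only prints the command for review, the sample file names are placeholders, and paths containing spaces are not handled.

```shell
# Print the GVCFtyper command that jointly genotypes the given per-sample gVCFs.
# $1 = reference FASTA, $2 = output VCF, remaining arguments = per-sample gVCFs.
joint_genotype_cmd() {
  ref=$1; out=$2; shift 2
  cmd="sentieon driver -r $ref --algo GVCFtyper"
  for gvcf in "$@"; do
    cmd="$cmd -v $gvcf"
  done
  printf '%s %s\n' "$cmd" "$out"
}

# Per-sample step (run once per sample), shown as a comment:
# sentieon driver -r reference.fa -i sampleA.dedup.bam \
#   --algo Haplotyper --emit_mode gvcf sampleA.g.vcf.gz

# Review the printed command, then pipe it to sh to run it.
joint_genotype_cmd reference.fa joint.vcf.gz sampleA.g.vcf.gz sampleB.g.vcf.gz
```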

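An end-to-end example, from SRA download to variant calls and a QC report:

# Download paired-end reads from SRA
fasterq-dump SRR12345678 --threads 8

# Index the reference for BWA, samtools, and GATK
bwa index reference.fa
samtools faidx reference.fa
gatk CreateSequenceDictionary -R reference.fa

# Align with BWA; the variant callers below expect a read group (-R)
bwa mem -t 16 -R '@RG\tID:SRR12345678\tSM:SRR12345678\tPL:ILLUMINA' \
    reference.fa SRR12345678_1.fastq SRR12345678_2.fastq \
  | samtools sort -@ 4 -o aligned.bam
samtools index aligned.bam

# Mark duplicates and index the result
picard MarkDuplicates I=aligned.bam O=dedup.bam M=metrics.txt
samtools index dedup.bam

# Variant calling with Sentieon (fast, deterministic)
sentieon driver -r reference.fa -i dedup.bam \
  --algo Haplotyper variants.vcf.gz

# Or with GATK
gatk HaplotypeCaller -R reference.fa -I dedup.bam -O variants.vcf.gz

# QC report
multiqc .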
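
The thread counts in this workflow are hard-coded, and a process that is already running keeps the count it was started with. Because instances can be resized, a script may prefer to derive thread counts from the vCPUs visible at launch. The sketch below shows one possible split and is not tuning advice: `split_threads` is a hypothetical helper, and the one-in-four ratio for samtools sort is an assumption.

```shell
# Split the available vCPUs between bwa mem and samtools sort.
# Assumption: roughly one sort thread per four vCPUs, never fewer than one of each.
split_threads() {
  total=$1
  sort_threads=$(( total / 4 ))
  [ "$sort_threads" -ge 1 ] || sort_threads=1
  bwa_threads=$(( total - sort_threads ))
  [ "$bwa_threads" -ge 1 ] || bwa_threads=1
  echo "$bwa_threads $sort_threads"
}

# nproc is part of GNU coreutils; getconf is a fallback for other systems.
vcpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
threads=$(split_threads "$vcpus")
bwa_t=${threads% *}    # text before the space
sort_t=${threads#* }   # text after the space
echo "bwa mem -t $bwa_t ... | samtools sort -@ $sort_t ..."
```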

Need more cores mid-pipeline? Containers support live resizing without stopping:

curl -X PATCH https://console.carolinacloud.io/api/instance/<uuid>/ \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"n_vcpus": 32, "mem_gib": 128}'

Scale up for alignment, scale down for overnight analysis — pay only for what you use.
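
To resize from inside a pipeline script, the PATCH request above can be wrapped in a small function. This is a sketch under assumptions: `resize_payload` and `resize_instance` are hypothetical helpers, the `CCLOUD_API_KEY` environment variable is an invented name for wherever you keep your API key, and the scale-down sizes in the usage comment are arbitrary examples.

```shell
# Print the PATCH body for a resize request. $1 = vCPUs, $2 = RAM in GiB.
resize_payload() {
  printf '{"n_vcpus": %d, "mem_gib": %d}\n' "$1" "$2"
}

# Send the resize request. $1 = instance UUID, $2 = vCPUs, $3 = RAM in GiB.
# Reads the API key from the (hypothetical) CCLOUD_API_KEY environment variable.
resize_instance() {
  curl -X PATCH "https://console.carolinacloud.io/api/instance/$1/" \
    -H "Authorization: Bearer $CCLOUD_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(resize_payload "$2" "$3")"
}

# Example: scale up before alignment, back down afterwards.
# resize_instance <uuid> 32 128
# resize_instance <uuid> 8 32
```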