Genomics Container
The genomics container is a dedicated compute environment for bioinformatics, pre-loaded with alignment, variant calling, quantification, and pipeline orchestration tools. SSH in and start processing data immediately — no installation required.
What’s included
- Alignment & mapping: samtools 1.22.1, bcftools 1.22, htslib 1.22.1, bowtie2 2.5.4, BWA, STAR 2.7.11b
- Variant calling & analysis: GATK 4.6.2, PLINK 2.0, beagle 5.4, GLIMPSE2 2.0, Picard 3.4.0
- Sentieon Genomics Suite 202503.02: High-performance, GATK-compatible variant calling with deterministic results and significantly faster runtimes. Includes quickstart scripts for whole-genome, exome, and tumor/normal pipelines in ~/sentieon_quickstart. Licensed and ready to use out of the box; no separate Sentieon license is required.
- Quantification & QC: kallisto 0.51.1, MultiQC, cutadapt
- Pipeline orchestration: Nextflow 25.10
- Data access: sra-toolkit (fastq-dump, fasterq-dump, prefetch)
- Languages: Python 3.12 (numpy, pandas, polars, biopython, pysam, scanpy, anndata, scikit-bio, cyvcf2), R with Bioconductor (DESeq2, edgeR, limma, GenomicRanges, rtracklayer, Biostrings), Java (OpenJDK)
Via the Dashboard
- Sign in at console.carolinacloud.io
- In the sidebar, click Create then choose Genomics
- Select your CPU tier — we recommend high-performance for alignment and variant calling workloads
- Use the sliders to configure vCPUs, RAM, and disk size (see recommendations below)
- Optionally paste your SSH public key
- Click Create
- Once the status changes to “Running”, use the SSH command on the instance card to connect
Via the API

    curl -X POST https://console.carolinacloud.io/api/instance/ \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "resource_type": "container",
        "flavor": "genomics",
        "n_vcpus": 16,
        "mem_gib": 64,
        "disk_size_gib": 500,
        "cpu_tier": "high-performance"
      }'

Optional fields:
| Field | Description |
|---|---|
| name | Human-friendly name (e.g. wgs-pipeline-01) |
| cpu_tier | general-purpose (default) or high-performance ($0.01/vCPU/hr, recommended for genomics) |
| public_key | SSH public key for password-less access |
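
The optional fields can be added to the request body only when you have a value for them. The following is a minimal sketch of that idea, not an official client: `build_payload`, the `CCLOUD_API_KEY` variable, and the key path `~/.ssh/id_ed25519.pub` are illustrative assumptions, and the helper does no JSON escaping, so it only suits values without quotes or backslashes.

```bash
#!/usr/bin/env bash
# Sketch: build the create-instance payload, adding optional fields only when set.
# build_payload and CCLOUD_API_KEY are hypothetical names, not part of the API.

build_payload() {
  # $1 = optional instance name, $2 = optional SSH public key (either may be empty)
  local name="$1" public_key="$2"
  local json='"resource_type": "container", "flavor": "genomics", "n_vcpus": 16, "mem_gib": 64, "disk_size_gib": 500, "cpu_tier": "high-performance"'
  # No JSON escaping is done here: values must not contain quotes or backslashes.
  if [ -n "$name" ]; then
    json="$json, \"name\": \"$name\""
  fi
  if [ -n "$public_key" ]; then
    json="$json, \"public_key\": \"$public_key\""
  fi
  printf '{%s}\n' "$json"
}

# Send the request only when an API key is present in the environment.
if [ -n "${CCLOUD_API_KEY:-}" ]; then
  curl -X POST https://console.carolinacloud.io/api/instance/ \
    -H "Authorization: Bearer $CCLOUD_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "wgs-pipeline-01" "$(cat ~/.ssh/id_ed25519.pub)")"
fi
```

Calling `build_payload "" ""` produces the same required fields as the request above, with no optional fields attached.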
Via the CLI

    ccloud new container --cpus 16 --ram 64 --disk 500 \
      --flavor genomics \
      --tier high-performance \
      --name wgs-pipeline

Connecting

    ssh -p <wan_ssh_port> ccloud@login.carolinacloud.io

The port and password are shown on your instance card. All tools are on $PATH and ready to use immediately.
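
If you want to confirm the environment after your first login, a quick check along these lines can help. It is only a sketch: `missing_tools` is a hypothetical helper, and the binary names are assumed from the usual upstream names of the tools listed under "What's included" (for example, `STAR` is case-sensitive).

```bash
#!/usr/bin/env bash
# Sketch: report which of the expected tools are not found on $PATH.
# missing_tools is a hypothetical helper, not something shipped with the container.

missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v prints nothing and fails when the tool is absent.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Binary names assumed from the standard upstream distributions.
missing="$(missing_tools samtools bcftools bowtie2 bwa STAR gatk sentieon \
  kallisto multiqc nextflow fasterq-dump)"

if [ -n "$missing" ]; then
  echo "Not found on PATH:"
  echo "$missing"
else
  echo "All listed tools found"
fi
```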
Resource recommendations
| Workload | vCPUs | RAM | Disk | CPU tier |
|---|---|---|---|---|
| Small panel sequencing | 4-8 | 16-32 GiB | 100-200 GiB | General purpose |
| Whole-exome | 8-16 | 32-64 GiB | 200-500 GiB | High performance |
| Whole-genome (30x) | 16-32 | 64-128 GiB | 500-1000 GiB | High performance |
| RNA-seq (STAR alignment) | 16+ | 64+ GiB | 200-500 GiB | High performance |
| Population genetics (GWAS) | 8-16 | 32-64 GiB | 200-500 GiB | Either |
STAR alignment requires significant RAM for genome indexing — budget at least 32 GiB for human genome alignment. Whole-genome variant calling with Sentieon or GATK benefits from high-performance CPUs and 16+ cores.
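
Before launching a STAR run, it can be worth checking that the instance actually has the memory you budgeted. The sketch below reads `MemTotal` from `/proc/meminfo` and compares it with the 32 GiB floor mentioned above. It assumes `/proc/meminfo` reflects the container's own allocation, which depends on how the container runtime is configured, and `kb_to_gib` and `has_min_ram` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: warn if the instance has less RAM than a workload needs.
# Assumes /proc/meminfo reports the container's allocation (runtime-dependent).

# Convert a kB value to GiB, rounded to the nearest whole GiB.
kb_to_gib() {
  echo $(( ($1 + 524288) / 1048576 ))
}

# Succeed when total memory ($1, in kB) is at least $2 GiB after rounding.
has_min_ram() {
  [ "$(kb_to_gib "$1")" -ge "$2" ]
}

if [ -r /proc/meminfo ]; then
  total_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  if has_min_ram "$total_kb" 32; then
    echo "OK: $(kb_to_gib "$total_kb") GiB RAM, $(nproc) vCPUs"
  else
    echo "Warning: $(kb_to_gib "$total_kb") GiB RAM is below the 32 GiB suggested for STAR on a human genome" >&2
  fi
fi
```

Rounding to the nearest GiB is a deliberate choice here, since the kernel usually reports slightly less than the nominal allocation.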
Using Sentieon
Sentieon tools are on your $PATH and the license is pre-configured. Run Sentieon commands directly:

    sentieon driver -r reference.fa -i input.bam \
      --algo Haplotyper output.vcf.gz

Quickstart scripts for common workflows are in ~/sentieon_quickstart. These can be used as templates for:
- Germline whole-genome/exome variant calling
- Somatic tumor/normal analysis
- Joint genotyping
Sentieon provides the same variant calling algorithms as GATK but with deterministic results and substantially faster runtimes, making it well-suited for production pipelines on high-performance instances.
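
As a rough illustration of the joint genotyping workflow, the sketch below assembles a `GVCFtyper` command from a list of per-sample gVCFs and prints it rather than running it. It is not taken from the quickstart scripts: `joint_genotype_cmd` and all file names are hypothetical, and it assumes each gVCF was produced beforehand with `--algo Haplotyper --emit_mode gvcf`.

```bash
#!/usr/bin/env bash
# Sketch: print a Sentieon joint-genotyping command for a set of per-sample gVCFs.
# joint_genotype_cmd and the file names are hypothetical; paths must not contain spaces.

joint_genotype_cmd() {
  # $1 = reference FASTA, $2 = output VCF, remaining arguments = input gVCFs
  local ref="$1" out="$2" gvcf cmd
  shift 2
  cmd="sentieon driver -r $ref --algo GVCFtyper"
  # GVCFtyper takes one -v option per input gVCF.
  for gvcf in "$@"; do
    cmd="$cmd -v $gvcf"
  done
  printf '%s %s\n' "$cmd" "$out"
}

# Dry run: review the printed command before executing it yourself.
joint_genotype_cmd reference.fa joint.vcf.gz sample1.g.vcf.gz sample2.g.vcf.gz
```

Printing the command first makes it easy to check the inputs before committing compute time to a large cohort.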
Example: WGS pipeline from SRA to VCF
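
    # Download reads from SRA
    fasterq-dump SRR12345678 --threads 8

    # Index the reference (BWA index, plus the .fai and .dict files GATK requires)
    bwa index reference.fa
    samtools faidx reference.fa
    gatk CreateSequenceDictionary -R reference.fa

    # Align with BWA; GATK and Sentieon need a read group on every read
    # (the ID/SM/PL values here are placeholders, adjust them for your sample)
    bwa mem -t 16 -R '@RG\tID:SRR12345678\tSM:SRR12345678\tPL:ILLUMINA' \
      reference.fa SRR12345678_1.fastq SRR12345678_2.fastq \
      | samtools sort -@ 4 -o aligned.bam
    samtools index aligned.bam

    # Mark duplicates, then index the result for the variant callers
    picard MarkDuplicates I=aligned.bam O=dedup.bam M=metrics.txt
    samtools index dedup.bam

    # Variant calling with Sentieon (fast, deterministic)
    sentieon driver -r reference.fa -i dedup.bam \
      --algo Haplotyper variants.vcf.gz

    # Or with GATK
    gatk HaplotypeCaller -R reference.fa -I dedup.bam -O variants.vcf.gz

    # QC report
    multiqc .

Live resizing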
Need more cores mid-pipeline? Containers support live resizing without stopping:

    curl -X PATCH https://console.carolinacloud.io/api/instance/<uuid>/ \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"n_vcpus": 32, "mem_gib": 128}'

Scale up for alignment and scale down for overnight analysis, so you pay only for what you use.
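
If you resize at several points in a pipeline, wrapping the PATCH call in a small function keeps the steps readable. This is a sketch under assumptions: `resize_payload`, `resize_instance`, and the `CCLOUD_API_KEY` and `INSTANCE_UUID` variables are hypothetical names, and the scale-down size of 8 vCPUs and 32 GiB is only an illustrative value.

```bash
#!/usr/bin/env bash
# Sketch: wrap the resize request so a pipeline can scale up and down around a step.
# Function and variable names are hypothetical; sizes are illustrative.

# Build the JSON body for a resize: $1 = vCPUs, $2 = RAM in GiB.
resize_payload() {
  printf '{"n_vcpus": %d, "mem_gib": %d}\n' "$1" "$2"
}

# Send the PATCH request: $1 = instance UUID, $2 = vCPUs, $3 = RAM in GiB.
resize_instance() {
  curl -X PATCH "https://console.carolinacloud.io/api/instance/$1/" \
    -H "Authorization: Bearer $CCLOUD_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(resize_payload "$2" "$3")"
}

# Only call the API when both the key and the instance UUID are set.
if [ -n "${CCLOUD_API_KEY:-}" ] && [ -n "${INSTANCE_UUID:-}" ]; then
  resize_instance "$INSTANCE_UUID" 32 128   # scale up before alignment
  # ... run the alignment step here ...
  resize_instance "$INSTANCE_UUID" 8 32     # scale back down afterwards
fi
```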