Genomics Container

The genomics container is built for bioinformaticians running sequencing pipelines, variant analysis, and multi-omics workflows. It ships with a comprehensive set of command-line tools so you can start processing data immediately without spending hours on installation.

Alignment and mapping: samtools 1.22.1, bcftools 1.22, htslib 1.22.1, bowtie2 2.5.4, BWA, STAR 2.7.11

Variant calling and analysis: GATK 4.6, PLINK 2.0, beagle 5.4, GLIMPSE2 2.0, Sentieon Genomics Suite 202503.02

Quantification: kallisto 0.51.1

Pipeline orchestration: Nextflow 25.10

Utilities: sra-toolkit (fastq-dump, fasterq-dump, prefetch, etc.), MultiQC, cutadapt, gnuplot, picard

Languages and runtimes: Python 3.12 with scientific and bioinformatics libraries (numpy, pandas, polars, biopython, pysam, scanpy, anndata, etc.), R with Bioconductor (DESeq2, edgeR, limma, GenomicRanges, rtracklayer, Biostrings), Java (OpenJDK)

Picard 3.4.0 is installed and available on your PATH via the picard wrapper script. For example:

picard MarkDuplicates I=input.bam O=dedup.bam M=metrics.txt

The wrapper invokes Java with the correct JAR, so run Picard as picard ... rather than as java -jar picard.jar ...

The container includes the Sentieon Genomics Suite 202503.02 for high-performance, GATK-compatible variant calling and secondary analysis.

  • Installation location: tools are installed under ~/sentieon-genomics-202503.02 and added to your PATH, so you can run Sentieon commands directly (for example, sentieon driver ... within your workflows).
  • Environment variables: SENTIEON_INSTALL_DIR and SENTIEON_PYTHON are configured so Sentieon tools and Python integrations work out of the box.
  • Quickstart examples: example scripts and configs are available in ~/sentieon_quickstart, which you can use as templates for whole-genome/exome and tumor/normal pipelines.
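A quick way to confirm the Sentieon settings described above after connecting is a small check like the following. It is a minimal sketch: the function name check_sentieon_env is made up for illustration, and it only tests that the two environment variables named above are set, not that the license or binaries work.

```shell
# Report whether the Sentieon environment variables named in the docs are set.
# check_sentieon_env is a hypothetical helper, not part of the container image.
check_sentieon_env() {
    status=0
    if [ -n "${SENTIEON_INSTALL_DIR:-}" ]; then
        echo "SENTIEON_INSTALL_DIR=$SENTIEON_INSTALL_DIR"
    else
        echo "SENTIEON_INSTALL_DIR is not set"
        status=1
    fi
    if [ -n "${SENTIEON_PYTHON:-}" ]; then
        echo "SENTIEON_PYTHON=$SENTIEON_PYTHON"
    else
        echo "SENTIEON_PYTHON is not set"
        status=1
    fi
    # Non-zero return means at least one variable was missing.
    return "$status"
}
```

Calling check_sentieon_env at the top of a pipeline script lets the run stop early if it was launched on a flavor without Sentieon.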

In many cases you can adapt existing GATK-style workflows by swapping in the corresponding Sentieon commands to take advantage of improved speed and deterministic results. For full usage details, example pipelines, and advanced configuration, see the official Sentieon documentation.

Starting June 30, 2026, running genomics instances incur a surcharge of $0.06 / vCPU / hr on top of base compute to cover the Sentieon license. Until then, Sentieon is free for everyone; the window exists so existing users see the rate before billing diverges.

A few details:

  • The surcharge only applies while the instance is running. Stopped, exited, or migrating instances are billed for storage only (same as base compute).
  • It scales linearly with n_vcpus. A 16-vCPU genomics container costs an extra $0.96 / hr while running.
  • It only applies to flavor=genomics. The plaingenomics flavor is identical to genomics minus the Sentieon suite — same toolkit, same image, no Sentieon binaries, no surcharge. Pick plaingenomics if your pipeline doesn’t actually call into Sentieon.
  • The surcharge is included in the hourly cost shown on the instance card and the creation modal — there’s no separate line item on your invoice.
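The surcharge arithmetic above can be sketched as a small helper. The rate of $0.06 per vCPU per hour is taken from the text; the function name sentieon_surcharge is illustrative only, and the result covers the surcharge alone, not base compute or storage.

```shell
# Estimate the Sentieon surcharge for a running genomics instance.
# Rate from the pricing note above; sentieon_surcharge is a hypothetical helper.
SURCHARGE_PER_VCPU_HR="0.06"

sentieon_surcharge() {
    n_vcpus="$1"
    hours="${2:-1}"   # defaults to a single hour
    # awk does the decimal arithmetic that plain shell arithmetic cannot.
    LC_ALL=C awk -v n="$n_vcpus" -v h="$hours" -v r="$SURCHARGE_PER_VCPU_HR" \
        'BEGIN { printf "%.2f\n", n * h * r }'
}

sentieon_surcharge 16       # prints 0.96  (16 vCPUs for one hour)
sentieon_surcharge 16 24    # prints 23.04 (16 vCPUs running for a full day)
```

Because the surcharge only accrues while the instance is running, the hours argument should count running time, not wall-clock time since creation.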

If you’re unsure whether you need Sentieon: if you’re running GATK, BWA-MEM, samtools, or your own variant-calling stack and never invoke sentieon directly, you don’t. Use plaingenomics and skip the surcharge.
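One rough way to apply this rule to an existing pipeline is to search its scripts for the word sentieon. The sketch below is a heuristic only: suggest_flavor is a made-up helper, and a match in a comment or an unused file would still point it at genomics.

```shell
# Suggest a flavor by checking whether any file under a pipeline directory
# mentions "sentieon" as a whole word. suggest_flavor is a hypothetical helper.
suggest_flavor() {
    pipeline_dir="$1"
    if grep -rqw -- "sentieon" "$pipeline_dir" 2>/dev/null; then
        echo "genomics"
    else
        echo "plaingenomics"
    fi
}

# Usage: suggest_flavor ./my-pipeline
```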

  • Whole-genome and exome sequencing pipelines (alignment, variant calling, annotation)
  • RNA-seq analysis (STAR alignment, kallisto quantification, DESeq2 differential expression)
  • Multi-omics workflows combining genomic, transcriptomic, and proteomic data
  • Population genetics (PLINK, GLIMPSE2, beagle for imputation)
  • Nextflow pipelines that need all tools available in one environment

Workload               | vCPUs | RAM        | Disk
Small panel sequencing | 4-8   | 16-32 GiB  | 100-200 GiB
Whole-exome            | 8-16  | 32-64 GiB  | 200-500 GiB
Whole-genome           | 16-32 | 64-128 GiB | 500-1000 GiB
RNA-seq (STAR)         | 16+   | 64+ GiB    | 200-500 GiB

STAR needs substantial RAM both to build a genome index and to align against it. Budget at least 32 GiB for human genome alignment.
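The sizing guidance can be turned into a starting command with a small lookup. This is a sketch under stated assumptions: the workload keys (panel, exome, genome, rnaseq) are invented for the example, the values are the lower bounds of each range, and the function only prints a ccloud command rather than running it. It also assumes the CLI accepts plaingenomics for --flavor.

```shell
# Print (not run) a ccloud command sized from the lower bound of each range.
# size_for_workload and its workload keys are hypothetical.
size_for_workload() {
    case "$1" in
        panel)  cpus=4;  ram=16; disk=100 ;;
        exome)  cpus=8;  ram=32; disk=200 ;;
        genome) cpus=16; ram=64; disk=500 ;;
        rnaseq) cpus=16; ram=64; disk=200 ;;
        *) echo "unknown workload: $1" >&2; return 1 ;;
    esac
    flavor="${2:-genomics}"   # optional second argument overrides the flavor
    echo "ccloud new container --cpus $cpus --ram $ram --disk $disk --flavor $flavor --tier high-performance"
}

size_for_workload genome
size_for_workload rnaseq plaingenomics
```

Treat the printed values as a floor and scale up for larger cohorts or deeper coverage.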

To create a genomics container through the API:

curl -X POST https://console.carolinacloud.io/api/instance/ \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "resource_type": "container",
    "flavor": "genomics",
    "n_vcpus": 16,
    "mem_gib": 64,
    "disk_size_gib": 500,
    "cpu_tier": "high-performance"
  }'

Or via the CLI:

ccloud new container --cpus 16 --ram 64 --disk 500 \
  --flavor genomics --tier high-performance

Access is over SSH only; there is no web interface.

ssh -p <port> ccloud@login.carolinacloud.io

All tools are on $PATH and ready to use immediately after connecting.
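To verify this on first login, a short check can loop over the commands a pipeline depends on. This is a minimal sketch: check_tools is a made-up helper, and the executable names in the usage comment are assumptions about how the tools are named on this image (only picard is confirmed above).

```shell
# Report which of the named commands can be found on PATH.
# Returns non-zero if at least one is missing. check_tools is hypothetical.
check_tools() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Usage inside the container (executable names other than picard are assumed):
# check_tools samtools bcftools bowtie2 bwa STAR gatk plink2 kallisto nextflow picard
```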

Pass S3 credentials and an optional aws_prefill_bucket at creation time to pre-stage reference genomes, FASTQ files, or other inputs into ~/s3-prefill/ before the container is ready. Carolina Cloud-managed buckets are also auto-mounted at ~/ccloud-s3/. See S3 Integration for details.

The Claude Code CLI is pre-installed. Pass an Anthropic API key at creation time to pre-authenticate it for use in the container’s terminal. See AI Integration for details.

Coming soon: use our Anthropic key. The creation modal already shows a Use ours toggle next to the Anthropic API key field. It’s currently disabled — when it lands, you’ll be able to skip the key entirely and have Claude Code authenticated against a Carolina Cloud-managed Anthropic key, billed through your normal Carolina Cloud invoice. Until then, paste your own sk-ant-... key in the Use my own key field.

Between pipeline runs, a genomics container is typically idle. Enable auto-stop to have the container shut itself down once its CPU has been idle for a set timeout. Compute billing stops, your reference data and intermediate results stay on disk, and you can start the container again when the next job is ready.