Genomics Container

The genomics container is built for bioinformaticians running sequencing pipelines, variant analysis, and multi-omics workflows. It ships with a comprehensive set of command-line tools so you can start processing data immediately without spending hours on installation.

Pre-installed tools

Alignment, mapping, and SAM/BAM/VCF handling: samtools 1.22.1, bcftools 1.22, htslib 1.22.1, bowtie2 2.5.4, BWA, STAR 2.7.11

Variant calling and analysis: GATK 4.6, PLINK 2.0, Beagle 5.4, GLIMPSE2 2.0, Sentieon Genomics Suite 202503.02

Quantification: kallisto 0.51.1

Pipeline orchestration: Nextflow 25.10

Utilities: sra-toolkit (fastq-dump, fasterq-dump, prefetch, etc.), MultiQC, cutadapt, gnuplot, Picard

Languages and runtimes: Python 3.12 with scientific and bioinformatics libraries (numpy, pandas, polars, biopython, pysam, scanpy, anndata, etc.), R with Bioconductor (DESeq2, edgeR, limma, GenomicRanges, rtracklayer, Biostrings), Java (OpenJDK)

Picard usage

Picard 3.4.0 is installed and available on your PATH via the picard wrapper script.
Use commands like:

picard MarkDuplicates I=input.bam O=dedup.bam M=metrics.txt

The wrapper invokes Java with the correct JAR for you. Run Picard as picard ... rather than java -jar picard.jar ...
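If you drive Picard from a Python script instead of typing commands, the same rule applies: call the picard wrapper, not java -jar. Here is a minimal sketch using the standard library's subprocess module. The helper names and file paths are illustrative, and it assumes the KEY=VALUE argument style from the example above.

```python
import shutil
import subprocess


def markduplicates_argv(input_bam: str, output_bam: str, metrics: str) -> list[str]:
    """Build the argument list for the picard wrapper (not `java -jar picard.jar`)."""
    return [
        "picard",
        "MarkDuplicates",
        f"I={input_bam}",
        f"O={output_bam}",
        f"M={metrics}",
    ]


def run_markduplicates(input_bam: str, output_bam: str, metrics: str) -> None:
    """Run MarkDuplicates through the wrapper, raising if Picard exits non-zero."""
    if shutil.which("picard") is None:
        raise FileNotFoundError("picard wrapper not found on PATH")
    subprocess.run(markduplicates_argv(input_bam, output_bam, metrics), check=True)
```

Inside the container, run_markduplicates("input.bam", "dedup.bam", "metrics.txt") is equivalent to the command line shown above.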

Sentieon Genomics Suite

The container includes the Sentieon Genomics Suite 202503.02 for high-performance, GATK-compatible variant calling and secondary analysis.

Installation location: tools are installed under ~/sentieon-genomics-202503.02 and added to your PATH, so you can run Sentieon commands directly (for example, sentieon driver ... within your workflows).
Environment variables: SENTIEON_INSTALL_DIR and SENTIEON_PYTHON are configured so Sentieon tools and Python integrations work out of the box.
Quickstart examples: example scripts and configs are available in ~/sentieon_quickstart, which you can use as templates for whole-genome/exome and tumor/normal pipelines.

In many cases you can adapt existing GATK-style workflows by swapping in the corresponding Sentieon commands to take advantage of improved speed and deterministic results. For full usage details, example pipelines, and advanced configuration, see the official Sentieon documentation.
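The sketch below shows one germline calling step written both ways, to illustrate the swap. It is not a validated pipeline. The GATK form is the standard gatk HaplotypeCaller -R/-I/-O invocation. The Sentieon form follows the sentieon driver ... --algo Haplotyper pattern from Sentieon's documentation. File names and helper names are placeholders. Check the scripts in ~/sentieon_quickstart for any additional options your pipeline needs, such as recalibration tables or interval files.

```python
import subprocess


def germline_call_argv(engine: str, reference: str, bam: str, out_vcf: str,
                       threads: int) -> list[str]:
    """Return the command line for one germline variant-calling step.

    engine="gatk"     -> GATK HaplotypeCaller (threads is unused in this sketch)
    engine="sentieon" -> Sentieon's Haplotyper algorithm via `sentieon driver`
    """
    if engine == "gatk":
        return ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", out_vcf]
    if engine == "sentieon":
        return ["sentieon", "driver", "-t", str(threads), "-r", reference,
                "-i", bam, "--algo", "Haplotyper", out_vcf]
    raise ValueError(f"unknown engine: {engine!r}")


def call_variants(engine: str, reference: str, bam: str, out_vcf: str,
                  threads: int) -> None:
    """Run the chosen caller, raising CalledProcessError on a non-zero exit."""
    subprocess.run(germline_call_argv(engine, reference, bam, out_vcf, threads),
                   check=True)
```

On a 16-vCPU container, for example, you might call call_variants("sentieon", "ref.fa", "sample.dedup.bam", "sample.vcf.gz", threads=16). Passing the thread count explicitly avoids relying on CPU auto-detection, which can report host CPUs inside a container. Because the engine is a parameter, you can run both callers on the same inputs, and only the sentieon branch counts as Sentieon usage.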

Sentieon billing (usage-based)

Starting July 1, 2026, running Sentieon costs +$0.06 / vCPU / hr on top of base compute, charged only while a Sentieon tool is actually computing. It’s free for everyone until then.

A few details:

Metered, not flat. You’re charged only for the time a Sentieon tool is actually running. Downloading inputs, building indexes, running GATK steps, or sitting idle costs nothing extra, even on a genomics container. A pipeline that spends its first 20 minutes pulling FASTQs from S3 incurs no Sentieon surcharge for those 20 minutes.
It scales with n_vcpus: a 16-vCPU container pays an extra $0.96 / hr (16 × $0.06) while a Sentieon step is running, and $0 otherwise.
Only the Sentieon compute tools (alignment, sorting, variant calling) are metered; the bundled license server doesn’t count.
The instance card shows a live “Sentieon running” badge with the current surcharge whenever Sentieon is active. The badge can lag a start or stop by a few minutes. There’s no separate invoice line item; the surcharge is folded into your usage.
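The metering rule works out to rate × n_vcpus × hours during which Sentieon is actually running. The minimal estimator below applies that rule, using the $0.06 / vCPU / hr rate and the July 1, 2026 start date stated above. It assumes charges prorate continuously by the minute, since billing granularity isn't specified. The step list and function name are illustrative and not part of any Carolina Cloud API. Base compute is billed separately and is not included.

```python
from datetime import date

# Values from the pricing text above; re-check them against current pricing.
SENTIEON_RATE_USD_PER_VCPU_HR = 0.06
METERING_START = date(2026, 7, 1)


def sentieon_surcharge(n_vcpus: int, steps: list[tuple[str, float, bool]],
                       run_date: date) -> float:
    """Estimate the Sentieon surcharge in USD for one pipeline run.

    steps holds (name, minutes, is_sentieon) tuples; only Sentieon steps are
    metered. Assumes per-minute proration (an assumption, not documented).
    """
    if run_date < METERING_START:
        return 0.0  # free for everyone before metering starts
    sentieon_minutes = sum(minutes for _, minutes, is_sentieon in steps if is_sentieon)
    return round(SENTIEON_RATE_USD_PER_VCPU_HR * n_vcpus * sentieon_minutes / 60, 2)
```

For a 16-vCPU run that spends 20 minutes downloading, 90 minutes in Sentieon tools, and 5 minutes in MultiQC, only the 90 Sentieon minutes count: 16 × $0.06 × 1.5 h = $1.44.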

Don’t need Sentieon? Just don’t call it. Run GATK, BWA-MEM, samtools, or your own variant-calling stack, and you pay only normal compute. There’s nothing to configure and no flag to set.

When to use it

Whole-genome and exome sequencing pipelines (alignment, variant calling, annotation)
RNA-seq analysis (STAR alignment, kallisto quantification, DESeq2 differential expression)
Multi-omics workflows combining genomic, transcriptomic, and proteomic data
Population genetics and imputation (PLINK; GLIMPSE2 and Beagle for imputation)
Nextflow pipelines that need all tools available in one environment

Resource recommendations

Workload	vCPUs	RAM	Disk
Small panel sequencing	4-8	16-32 GiB	100-200 GiB
Whole-exome	8-16	32-64 GiB	200-500 GiB
Whole-genome	16-32	64-128 GiB	500-1000 GiB
RNA-seq (STAR)	16+	64+ GiB	200-500 GiB

STAR needs significant RAM both to generate a genome index and to align against it. Budget at least 32 GiB for the human genome; the 64+ GiB recommended above leaves headroom.

Creating a genomics container

curl -X POST https://console.carolinacloud.io/api/instance/ \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "resource_type": "container",
    "flavor": "genomics",
    "n_vcpus": 16,
    "mem_gib": 64,
    "disk_size_gib": 500,
    "cpu_tier": "high-performance"
  }'
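The same request can be sent from Python for scripts that create containers programmatically. This minimal sketch uses only the standard library (urllib.request and json). The URL and field names come from the curl example. The CCLOUD_API_KEY environment variable and the helper names are hypothetical. The response format isn't documented here, so the sketch just returns the parsed JSON.

```python
import json
import os
import urllib.request
from typing import Any

API_URL = "https://console.carolinacloud.io/api/instance/"


def build_create_request(api_key: str, n_vcpus: int = 16, mem_gib: int = 64,
                         disk_size_gib: int = 500,
                         cpu_tier: str = "high-performance") -> urllib.request.Request:
    """Build, but do not send, the POST that creates a genomics container."""
    payload = {
        "resource_type": "container",
        "flavor": "genomics",
        "n_vcpus": n_vcpus,
        "mem_gib": mem_gib,
        "disk_size_gib": disk_size_gib,
        "cpu_tier": cpu_tier,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_container(**sizes: Any) -> Any:
    """Send the request; the key comes from a hypothetical CCLOUD_API_KEY variable."""
    request = build_create_request(os.environ["CCLOUD_API_KEY"], **sizes)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, create_container(n_vcpus=32, mem_gib=128, disk_size_gib=1000) requests the top of the whole-genome range from the resource table. Splitting construction from sending lets you inspect the payload before anything is created.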

Or via the CLI:

ccloud new container --cpus 16 --ram 64 --disk 500 \
  --flavor genomics --tier high-performance

Access

SSH only. No web interface.

ssh -p <port> ccloud@login.carolinacloud.io

All tools are on $PATH and ready to use immediately after connecting.

S3 integration

Pass S3 credentials and an optional aws_prefill_bucket at creation time to pre-stage reference genomes, FASTQ files, or other inputs into ~/s3-prefill/ before the container is ready. Carolina Cloud-managed buckets are also auto-mounted at ~/ccloud-s3/. See S3 Integration for details.
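After connecting, you can check what was staged by walking both locations. This minimal sketch assumes only the two directory paths above. The suffix list and function name are illustrative; adjust them to match your inputs. Walking a large mounted bucket under ~/ccloud-s3/ can be slow, so you may want to point it at a subdirectory.

```python
from pathlib import Path

# Directory paths from the S3 integration text; the suffixes are illustrative.
STAGING_DIRS = [Path.home() / "s3-prefill", Path.home() / "ccloud-s3"]
INPUT_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fa", ".fa.gz", ".fasta",
                  ".bam", ".cram", ".vcf.gz")


def find_inputs(roots, suffixes=INPUT_SUFFIXES) -> list[Path]:
    """Return sorted file paths under each existing root that end with a suffix."""
    found: list[Path] = []
    for root in map(Path, roots):
        if not root.is_dir():
            continue  # e.g. no aws_prefill_bucket was set at creation time
        found.extend(p for p in root.rglob("*")
                     if p.is_file() and p.name.endswith(suffixes))
    return sorted(found)


if __name__ == "__main__":
    for path in find_inputs(STAGING_DIRS):
        print(path)
```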

AI integration

The Claude Code CLI is pre-installed. Pass an Anthropic API key at creation time to pre-authenticate it for use in the container’s terminal. See AI Integration for details.

Coming soon: use our Anthropic key. The creation modal already shows a Use ours toggle next to the Anthropic API key field, but it’s currently disabled. Once it’s enabled, you’ll be able to skip the key entirely and have Claude Code authenticated against a Carolina Cloud-managed Anthropic key, billed through your normal Carolina Cloud invoice. Until then, paste your own sk-ant-... key into the Use my own key field.

Auto-stop on idle

Between pipeline runs, a genomics container is typically idle. Enable auto-stop to have the container shut itself down once its CPU has been idle for a set timeout. Compute billing stops, your reference data and intermediate results stay on disk, and you can start the container again when the next job is ready.