Genomics Container
The genomics container is built for bioinformaticians running sequencing pipelines, variant analysis, and multi-omics workflows. It ships with a comprehensive set of command-line tools so you can start processing data immediately without spending hours on installation.
Pre-installed tools
Alignment and mapping: samtools 1.22.1, bcftools 1.22, htslib 1.22.1, bowtie2 2.5.4, BWA, STAR 2.7.11
Variant calling and analysis: GATK 4.6, PLINK 2.0, beagle 5.4, GLIMPSE2 2.0, Sentieon Genomics Suite 202503.02
Quantification: kallisto 0.51.1
Pipeline orchestration: Nextflow 25.10
Utilities: sra-toolkit (fastq-dump, fasterq-dump, prefetch, etc.), MultiQC, cutadapt, gnuplot, picard
Languages and runtimes: Python 3.12 with scientific and bioinformatics libraries (numpy, pandas, polars, biopython, pysam, scanpy, anndata, etc.), R with Bioconductor (DESeq2, edgeR, limma, GenomicRanges, rtracklayer, Biostrings), Java (OpenJDK)
Picard usage
Picard 3.4.0 is installed and available on your `PATH` via the `picard` wrapper script.
Use commands like:

    picard MarkDuplicates I=input.bam O=dedup.bam M=metrics.txt

The wrapper handles invoking Java with the correct JAR, so you should run Picard with `picard ...` and not `java -jar picard.jar ...`.
Sentieon Genomics suite
The container includes the Sentieon Genomics Suite 202503.02 for high-performance, GATK-compatible variant calling and secondary analysis.
- Installation location: tools are installed under `~/sentieon-genomics-202503.02` and added to your `PATH`, so you can run Sentieon commands directly (for example, `sentieon driver ...` within your workflows).
- Environment variables: `SENTIEON_INSTALL_DIR` and `SENTIEON_PYTHON` are configured so Sentieon tools and Python integrations work out of the box.
- Quickstart examples: example scripts and configs are available in `~/sentieon_quickstart`, which you can use as templates for whole-genome/exome and tumor/normal pipelines.
In many cases you can adapt existing GATK-style workflows by swapping in the corresponding Sentieon commands to take advantage of improved speed and deterministic results. For full usage details, example pipelines, and advanced configuration, see the official Sentieon documentation.
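To confirm the Sentieon environment is in place before starting a long run, a small check along these lines can help. This is only a sketch: the helper name `check_env_vars` is made up for illustration and is not part of Sentieon or the container; the two variable names are the ones listed above.

```shell
# Report whether each named environment variable is set and non-empty.
# Returns non-zero if any of them is missing.
# check_env_vars is a hypothetical helper, not a Sentieon command.
check_env_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      printf '%s=%s\n' "$name" "$value"
    else
      printf '%s is not set\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Variable names taken from the list above.
check_env_vars SENTIEON_INSTALL_DIR SENTIEON_PYTHON \
  || echo "Sentieon environment looks incomplete" >&2
```

On a `plaingenomics` container, which ships without Sentieon, you would expect this check to report the variables as unset.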
Sentieon licensing surcharge
Starting June 30, 2026, running genomics instances incur a surcharge of $0.06 / vCPU / hr on top of base compute to cover the Sentieon license. Until then it’s free for everyone; the window exists so existing users can see the rate before it is billed.
A few details:
- The surcharge only applies while the instance is running. Stopped, exited, or migrating instances pay storage-only (same as base compute).
- It scales linearly with `n_vcpus`. A 16-vCPU genomics container costs an extra $0.96 / hr while running.
- It only applies to `flavor=genomics`. The `plaingenomics` flavor is identical to `genomics` minus the Sentieon suite: same toolkit, same image, no Sentieon binaries, no surcharge. Pick `plaingenomics` if your pipeline doesn’t actually call into Sentieon.
- The surcharge is included in the hourly cost shown on the instance card and the creation modal; there’s no separate line item on your invoice.
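As a quick check on the arithmetic, the sketch below multiplies a vCPU count by the $0.06 rate quoted above. The function name `genomics_surcharge` is made up for illustration and is not part of the `ccloud` CLI; the result covers the surcharge only, not base compute.

```shell
# Hourly Sentieon surcharge in USD for a running genomics instance.
# Usage: genomics_surcharge <n_vcpus>
# The 0.06 rate is the per-vCPU, per-hour figure quoted above.
genomics_surcharge() {
  LC_ALL=C awk -v n="$1" -v rate=0.06 'BEGIN { printf "%.2f\n", n * rate }'
}

genomics_surcharge 16   # prints 0.96
genomics_surcharge 4    # prints 0.24
```

Because the surcharge only accrues while the instance is running, multiply the hourly figure by the hours you expect the container to be up, not by wall-clock time.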
Unsure whether you need Sentieon? If you’re running GATK, BWA-MEM, samtools, or your own variant-calling stack and never invoke `sentieon` directly, you don’t. Use `plaingenomics` and skip the surcharge.
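One rough way to apply this rule is to search your pipeline scripts for the `sentieon` command. The sketch below is a heuristic, not an official tool: `pick_flavor` is a hypothetical helper, and it will also match the word in comments or paths, so treat a `genomics` result as a prompt to look closer.

```shell
# Print "genomics" if any file under the given directory mentions
# "sentieon" as a whole word, otherwise print "plaingenomics".
# pick_flavor is a hypothetical helper; the flavor names come from the text above.
pick_flavor() {
  if grep -rqw -- 'sentieon' "$1" 2>/dev/null; then
    echo genomics
  else
    echo plaingenomics
  fi
}

# Example (the directory is a placeholder):
#   pick_flavor ~/my-pipeline
```

Assuming the CLI accepts either flavor name in its `--flavor` option, the output could be passed straight through, for example `--flavor "$(pick_flavor ~/my-pipeline)"`.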
When to use it
- Whole-genome and exome sequencing pipelines (alignment, variant calling, annotation)
- RNA-seq analysis (STAR alignment, kallisto quantification, DESeq2 differential expression)
- Multi-omics workflows combining genomic, transcriptomic, and proteomic data
- Population genetics (PLINK, GLIMPSE2, beagle for imputation)
- Nextflow pipelines that need all tools available in one environment
Resource recommendations
| Workload | vCPUs | RAM | Disk |
|---|---|---|---|
| Small panel sequencing | 4-8 | 16-32 GiB | 100-200 GiB |
| Whole-exome | 8-16 | 32-64 GiB | 200-500 GiB |
| Whole-genome | 16-32 | 64-128 GiB | 500-1000 GiB |
| RNA-seq (STAR) | 16+ | 64+ GiB | 200-500 GiB |
STAR alignment requires significant RAM for genome indexing. Budget at least 32 GiB for human genome alignment.
Creating a genomics container

    curl -X POST https://console.carolinacloud.io/api/instance/ \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "resource_type": "container",
        "flavor": "genomics",
        "n_vcpus": 16,
        "mem_gib": 64,
        "disk_size_gib": 500,
        "cpu_tier": "high-performance"
      }'

Or via the CLI:

    ccloud new container --cpus 16 --ram 64 --disk 500 \
      --flavor genomics --tier high-performance

Access
SSH only. No web interface.

    ssh -p <port> ccloud@login.carolinacloud.io

All tools are on `$PATH` and ready to use immediately after connecting.
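After your first login, a quick loop can confirm that the tools you depend on resolve on `PATH`. This is a sketch: `check_tools` is a made-up helper, and the executable names below are the customary ones for each tool (for example `STAR` and `plink2`), which is an assumption about how this image names them.

```shell
# Print the resolved path of each tool, or flag it as missing.
# Returns non-zero if any tool is not found on PATH.
# check_tools is a hypothetical helper, not something shipped in the container.
check_tools() {
  status=0
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      printf '%-12s %s\n' "$tool" "$path"
    else
      printf '%-12s MISSING\n' "$tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Executable names are assumed; adjust the list to the tools your pipeline uses.
check_tools samtools bcftools bowtie2 bwa STAR gatk plink2 kallisto nextflow picard \
  || echo "some tools were not found on PATH" >&2
```

Running the same check at the top of a pipeline script lets a job fail early instead of partway through an alignment.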
S3 integration
Pass S3 credentials and an optional `aws_prefill_bucket` at creation time to pre-stage reference genomes, FASTQ files, or other inputs into `~/s3-prefill/` before the container is ready. Carolina Cloud-managed buckets are also auto-mounted at `~/ccloud-s3/`. See S3 Integration for details.
AI integration
The Claude Code CLI is pre-installed. Pass an Anthropic API key at creation time to pre-authenticate it for use in the container’s terminal. See AI Integration for details.
Coming soon: use our Anthropic key. The creation modal already shows a Use ours toggle next to the Anthropic API key field. It’s currently disabled — when it lands, you’ll be able to skip the key entirely and have Claude Code authenticated against a Carolina Cloud-managed Anthropic key, billed through your normal Carolina Cloud invoice. Until then, paste your own sk-ant-... key in the Use my own key field.
Auto-stop on idle
Between pipeline runs, a genomics container is typically idle. Enable auto-stop to have the container shut itself down once its CPU has been idle for the timeout period: compute billing stops, your reference data and intermediate results stay on disk, and you can start it back up when the next job is ready.