3. DICOM I/O
1. Setup: Download Sample Data¶
We'll use the test dataset from HuggingFace which includes DICOM format radiotherapy data (RT-STRUCT, RT-DOSE, CT).
from huggingface_hub import snapshot_download
from pathlib import Path
import numpy as np
# Download the dataset (cached locally after first download)
data_path = snapshot_download(
repo_id="contouraid/dosemetrics-data",
repo_type="dataset",
revision="main" # Use latest version with DICOM data
)
data_path = Path(data_path)
print(f"✓ Data downloaded to: {data_path}")
print(f"\nAvailable datasets:")
for item in data_path.iterdir():
if item.is_dir():
print(f" - {item.name}")
Fetching 165 files: 100%|██████████| 165/165 [00:00<00:00, 574800.80it/s]
✓ Data downloaded to: /Users/amithkamath/.cache/huggingface/hub/datasets--contouraid--dosemetrics-data/snapshots/839ceab7ba71766265fd6a637fe799341bb0364f Available datasets: - test_subject - longitudinal - dicom
2. Basic Data Loading¶
The simplest way to load DICOM data is using load_structure_set(). This function:
- Automatically detects the DICOM format
- Loads RT-DOSE, RT-STRUCT, and CT data
- Returns a StructureSet object with convenient access methods
from dosemetrics.io import load_structure_set
# Load DICOM data
dicom_path = data_path / "dicom"
structures = load_structure_set(dicom_path)
print(f"✓ Loaded DICOM data from: {dicom_path}")
print(f"\nNumber of structures: {len(structures)}")
print(f"Structure names: {structures.structure_names}")
✓ Loaded DICOM data from: /Users/amithkamath/.cache/huggingface/hub/datasets--contouraid--dosemetrics-data/snapshots/839ceab7ba71766265fd6a637fe799341bb0364f/dicom Number of structures: 43 Structure names: ['PTV_Total', 'ParotidCon-PTV', 'ParotidIps-PTV', 'Parotids', 'Parotids-PTV', 'PharConst-PTV', 'PharynxConst', 'PTVHighOPT', 'PTV70', 'OralCavity', 'OpticNerve_R', 'OpticNerve_L', 'OCavity-PTV', 'Mandible-PTV', 'Mandible', 'Lungs', 'Lips', 'Lens_R', 'Lens_L', 'Larynx-PTV', 'Larynx', 'LacrimalGlands', 'Eyes', 'Cochlea_R', 'Cochlea_L', 'Chiasm', 'BrainStem_03', 'BrainStem', 'Brain', 'BrachialPlexus', 'Body', 'Pituitary', 'Posterior_Neck', 'RingPTVHigh', 'Shoulders', 'SpinalCord', 'SpinalCord_05', 'Submand-PTV', 'SubmandL-PTV', 'SubmandR-PTV', 'Submandibular', 'Thyroid', 'Thyroid-PTV']
3. Working with the StructureSet¶
The StructureSet object provides the same convenient interface regardless of the source format (DICOM or NIfTI).
# List all available structures with details
print("Available structures:")
print("-" * 60)
for i, name in enumerate(structures.structure_names, 1):
structure = structures.get_structure(name)
voxel_count = (structure.mask > 0).sum()
print(f"{i:2d}. {name:20s} - {voxel_count:7,d} voxels")
# Get a specific structure mask
ptv = structures.get_structure("PTV_Total")
print(f"\nPTV details:")
print(f" Shape: {ptv.mask.shape}")
print(f" Data type: {ptv.mask.dtype}")
print(f" Non-zero voxels: {(ptv.mask > 0).sum():,}")
print(f" Min value: {ptv.mask.min()}")
print(f" Max value: {ptv.mask.max()}")
# Check if structures are available
print(f"\nStructure availability:")
for check_name in ["Brainstem", "Chiasm", "OpticNerve_L"]:
available = check_name in structures
print(f" {check_name:20s}: {'✓' if available else '✗'}")
Available structures: ------------------------------------------------------------ 1. PTV_Total - 18,135 voxels 2. ParotidCon-PTV - 3,986 voxels 3. ParotidIps-PTV - 3,496 voxels 4. Parotids - 7,475 voxels 5. Parotids-PTV - 7,475 voxels 6. PharConst-PTV - 2,715 voxels 7. PharynxConst - 2,822 voxels 8. PTVHighOPT - 17,989 voxels 9. PTV70 - 18,135 voxels 10. OralCavity - 24,829 voxels 11. OpticNerve_R - 148 voxels 12. OpticNerve_L - 150 voxels 13. OCavity-PTV - 24,597 voxels 14. Mandible-PTV - 12,155 voxels 15. Mandible - 12,155 voxels 16. Lungs - 213,112 voxels 17. Lips - 5,909 voxels 18. Lens_R - 22 voxels 19. Lens_L - 21 voxels 20. Larynx-PTV - 20 voxels 21. Larynx - 3,237 voxels 22. LacrimalGlands - 420 voxels 23. Eyes - 3,966 voxels 24. Cochlea_R - 47 voxels 25. Cochlea_L - 45 voxels 26. Chiasm - 135 voxels 27. BrainStem_03 - 6,839 voxels 28. BrainStem - 4,592 voxels 29. Brain - 305,948 voxels 30. BrachialPlexus - 4,283 voxels 31. Body - 2,014,376 voxels 32. Pituitary - 98 voxels 33. Posterior_Neck - 154,430 voxels 34. RingPTVHigh - 107,732 voxels 35. Shoulders - 315,026 voxels 36. SpinalCord - 2,060 voxels 37. SpinalCord_05 - 10,879 voxels 38. Submand-PTV - 3,055 voxels 39. SubmandL-PTV - 1,557 voxels 40. SubmandR-PTV - 1,499 voxels 41. Submandibular - 3,064 voxels 42. Thyroid - 2,130 voxels 43. Thyroid-PTV - 1,435 voxels PTV details: Shape: (107, 512, 512) Data type: bool Non-zero voxels: 18,135 Min value: False Max value: True Structure availability: Brainstem : ✗ Chiasm : ✓ OpticNerve_L : ✓
4. Loading Dose Distribution (Recommended)¶
Use the high-level Dose class to load RT-DOSE files. This is the recommended approach for dose analysis.
from dosemetrics import Dose
# Find and load the RT-DOSE file
dose_files = list((dicom_path / "RTDOSE").glob("*.dcm"))
if dose_files:
dose = Dose.from_dicom(dose_files[0], name="Clinical")
print("Dose Distribution:")
print("-" * 60)
print(f" Name: {dose.name}")
print(f" Dimensions: {dose.shape}")
print(f" Max dose: {dose.max_dose:.2f} Gy")
print(f" Mean dose: {dose.mean_dose:.2f} Gy")
print(f" Min dose: {dose.min_dose:.2f} Gy")
print(f" Spacing: {dose.spacing} mm")
print(f" Origin: {dose.origin} mm")
else:
print("No RT-DOSE files found")
Dose Distribution: ------------------------------------------------------------ Name: Clinical Dimensions: (107, 100, 166) Max dose: 74.20 Gy Mean dose: 1.05 Gy Min dose: 0.00 Gy Spacing: (2.5, 2.5, 3.0) mm Origin: (-199.3532327, -145.5323869, 61.0) mm
5. Computing Dose Statistics¶
Combine the dose and structures to compute dose statistics.
from dosemetrics.metrics import dvh
# Compute dose statistics for a structure
if dose_files:
ptv = structures.get_structure("PTV_Total")
# Check if dose and structure are compatible
if dose.is_compatible_with_structure(ptv):
stats = dvh.compute_dose_statistics(dose, ptv)
print("PTV Dose Statistics:")
print("-" * 60)
print(f" Mean dose: {stats['mean_dose']:.2f} Gy")
print(f" Max dose: {stats['max_dose']:.2f} Gy")
print(f" Min dose: {stats['min_dose']:.2f} Gy")
print(f" D95: {stats['D95']:.2f} Gy")
print(f" D50: {stats['D50']:.2f} Gy")
print(f" D05: {stats['D05']:.2f} Gy")
# Compute DVH
dose_bins, volumes = dvh.compute_dvh(dose, ptv)
print(f"\nDVH computed with {len(dose_bins)} dose bins")
else:
print("⚠️ Warning: Dose and structure grids are incompatible")
print(f" Dose shape: {dose.shape}, spacing: {dose.spacing}")
print(f" Structure shape: {ptv.mask.shape}, spacing: {ptv.spacing}")
print(" This DICOM dataset has dose and structures at different resolutions.")
print(" Consider using resampling or working with aligned data.")
⚠️ Warning: Dose and structure grids are incompatible Dose shape: (107, 100, 166), spacing: (2.5, 2.5, 3.0) Structure shape: (107, 512, 512), spacing: (1.12, 1.12, 3.0) This DICOM dataset has dose and structures at different resolutions. Consider using resampling or working with aligned data.
6. Loading Multiple Dose Distributions¶
The new architecture allows you to load and compare multiple RT-DOSE files.
# Load multiple dose files if available
if len(dose_files) > 1:
doses = []
for i, dose_file in enumerate(dose_files[:3], 1):
d = Dose.from_dicom(dose_file, name=f"Plan_{i}")
doses.append(d)
print(f"{d.name}: Max dose = {d.max_dose:.2f} Gy")
# Try to compare doses for PTV
print(f"\nDose comparison for PTV_Total:")
ptv = structures.get_structure("PTV_Total")
# Check compatibility before computing statistics
compatible_doses = []
for d in doses:
if d.is_compatible_with_structure(ptv):
compatible_doses.append(d)
if compatible_doses:
for d in compatible_doses:
stats = dvh.compute_dose_statistics(d, ptv)
print(f" {d.name}: Mean = {stats['mean_dose']:.2f} Gy, D95 = {stats['D95']:.2f} Gy")
else:
print(" ⚠️ Note: Dose and structure grids have different resolutions")
print(f" Dose shape: {doses[0].shape}, Structure shape: {ptv.mask.shape}")
print(" This is common in DICOM data. Consider resampling for analysis.")
else:
print(f"Only one dose file available: {dose_files[0].name}")
Plan_1: Max dose = 74.20 Gy Plan_2: Max dose = 73.61 Gy Plan_3: Max dose = 72.57 Gy Dose comparison for PTV_Total:
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[12], line 13 11 ptv = structures.get_structure("PTV_Total") 12 for d in doses: ---> 13 stats = statistics.compute_dose_statistics(d, ptv) 14 print(f" {d.name}: Mean = {stats['mean_dose']:.2f} Gy, D95 = {stats['D95']:.2f} Gy") 15 else: File ~/Repositories/ContourAId/dosemetrics/src/dosemetrics/metrics/statistics.py:38, in compute_dose_statistics(dose, structure) 12 def compute_dose_statistics(dose: 'Dose', structure: 'Structure') -> Dict[str, float]: 13 """ 14 Compute comprehensive dose statistics for a structure. 15 (...) 36 >>> print(f"D95: {stats['D95']:.2f} Gy") 37 """ ---> 38 dose_values = dose.get_dose_in_structure(structure) 40 if len(dose_values) == 0: 41 return { 42 'mean_dose': 0.0, 43 'max_dose': 0.0, (...) 51 'D98': 0.0, 52 } File ~/Repositories/ContourAId/dosemetrics/src/dosemetrics/dose.py:134, in Dose.get_dose_in_structure(self, structure) 121 """ 122 Extract dose values within a structure mask. 123 (...) 131 ValueError: If dose and structure are not spatially compatible 132 """ 133 if not self.is_compatible_with_structure(structure): --> 134 raise ValueError( 135 f"Dose '{self.name}' (shape={self.shape}) is not compatible " 136 f"with structure '{structure.name}' (shape={structure.mask.shape}). " 137 f"Dose spacing: {self.spacing}, Structure spacing: {structure.spacing}" 138 ) 140 return self.dose_array[structure.mask] ValueError: Dose 'Plan_1' (shape=(107, 100, 166)) is not compatible with structure 'PTV_Total' (shape=(107, 512, 512)). Dose spacing: (2.5, 2.5, 3.0), Structure spacing: (1.12, 1.12, 3.0)
7. Low-Level DICOM Operations (Advanced)¶
For advanced use cases, you can access raw DICOM data using low-level functions. This returns dictionaries instead of high-level objects.
from dosemetrics.io import dicom_io
# Load raw data as dictionary (not recommended for typical use)
dicom_data_dict = dicom_io.load_dicom_folder(dicom_path, return_as_structureset=False)
print("Low-Level DICOM Data Dictionary:")
print("-" * 60)
print(f"Keys: {list(dicom_data_dict.keys())}")
if 'structures' in dicom_data_dict:
print(f"\nStructures: {len(dicom_data_dict['structures'])}")
for name in list(dicom_data_dict['structures'].keys())[:3]:
print(f" - {name}")
if 'dose_volumes' in dicom_data_dict:
print(f"\nDose volumes: {len(dicom_data_dict['dose_volumes'])}")
for name, data in dicom_data_dict['dose_volumes'].items():
print(f" - {name}: shape {data['array'].shape}")
print("\nNote: For most use cases, use the high-level Dose and StructureSet classes instead.")
7. Exploring DICOM File Structure¶
Let's examine the organization of DICOM files in the folder.
import os
print("DICOM Folder Structure:")
print("-" * 60)
for root, dirs, files in os.walk(dicom_path):
level = root.replace(str(dicom_path), '').count(os.sep)
indent = ' ' * 2 * level
folder_name = os.path.basename(root)
if folder_name:
print(f'{indent}{folder_name}/')
subindent = ' ' * 2 * (level + 1)
# Show first few files in each directory
dicom_files = [f for f in files if not f.startswith('.')]
if dicom_files:
for i, file in enumerate(dicom_files[:3]):
print(f'{subindent}{file}')
if len(dicom_files) > 3:
print(f'{subindent}... and {len(dicom_files) - 3} more files')
# Count DICOM file types
ct_files = list((dicom_path / "CT").glob("*")) if (dicom_path / "CT").exists() else []
rtdose_files = list((dicom_path / "RTDOSE").glob("*")) if (dicom_path / "RTDOSE").exists() else []
rtstruct_files = list((dicom_path / "RTSTRUCT").glob("*")) if (dicom_path / "RTSTRUCT").exists() else []
rtplan_files = list((dicom_path / "RTPLAN").glob("*")) if (dicom_path / "RTPLAN").exists() else []
print(f"\nDICOM File Summary:")
print(f" CT slices: {len(ct_files)}")
print(f" RT-DOSE files: {len(rtdose_files)}")
print(f" RT-STRUCT files: {len(rtstruct_files)}")
print(f" RT-PLAN files: {len(rtplan_files)}")
8. Analyzing Spatial Properties¶
Extract spatial information from DICOM data.
# Get spatial information from loaded structures
if hasattr(structures, 'spacing') and structures.spacing is not None:
voxel_spacing = np.array(structures.spacing)
# Get dimensions from any structure
sample_structure = structures.get_structure(structures.structure_names[0])
grid_dimensions = np.array(sample_structure.mask.shape)
physical_dimensions = voxel_spacing * grid_dimensions
print("Spatial Properties:")
print("-" * 60)
print(f"Grid dimensions (voxels): {grid_dimensions}")
print(f"Voxel spacing (mm): {voxel_spacing}")
print(f"Physical dimensions (mm): {physical_dimensions}")
print(f"Physical dimensions (cm): {physical_dimensions / 10}")
print(f"\nVoxel volume: {np.prod(voxel_spacing):.3f} mm³")
print(f"Total volume: {np.prod(physical_dimensions) / 1000:.1f} cm³")
else:
print("Spatial metadata not directly available.")
print("Try loading with dicom_io for full metadata access.")
9. Format Detection¶
DoseMetrics can automatically detect the format of data in a folder.
from dosemetrics.io import detect_folder_format
# Check what format a folder contains
format_type = detect_folder_format(dicom_path)
print(f"Detected format: {format_type}")
# Verify it's DICOM
assert format_type == 'dicom', f"Expected 'dicom', got '{format_type}'"
print("✓ Format is DICOM as expected")
10. Working with Individual Structures¶
Access and analyze individual structure masks from DICOM RT-STRUCT.
# Load and analyze Brainstem structure
if "Brainstem" in structures:
brainstem = structures.get_structure("Brainstem")
brainstem_mask = brainstem.mask
print("Brainstem Mask:")
print("-" * 60)
print(f" Shape: {brainstem_mask.shape}")
print(f" Data type: {brainstem_mask.dtype}")
print(f" Voxels in mask: {(brainstem_mask > 0).sum():,}")
if hasattr(structures, 'spacing') and structures.spacing is not None:
voxel_volume_mm3 = np.prod(structures.spacing)
structure_volume_cm3 = (brainstem_mask > 0).sum() * voxel_volume_mm3 / 1000
print(f" Physical volume: {structure_volume_cm3:.2f} cm³")
else:
print("Brainstem structure not found in dataset")
11. Comparing DICOM and NIfTI Workflows¶
DoseMetrics provides a unified interface regardless of the source format.
print("Unified API Comparison:")
print("=" * 60)
print("\nLoading data:")
print(" DICOM: structures = load_structure_set(dicom_path)")
print(" NIfTI: structures = load_structure_set(nifti_path)")
print("\nAccessing structures:")
print(" Both: structure = structures.get_structure('PTV')")
print(" Both: mask = structure.mask")
print("\nIterating structures:")
print(" Both: for name in structures.structure_names:")
print("\nAssigning types:")
print(" Both: load_structure_set(path, structure_type_mapping=mapping)")
print("\n✓ The API is format-agnostic!")
Summary¶
In this notebook, you learned how to:
- ✓ Load DICOM RT-DOSE and RT-STRUCT data using
load_structure_set() - ✓ Work with the StructureSet API for convenient access
- ✓ Access dose distributions from RT-DOSE files
- ✓ Assign custom structure types (TARGET, OAR, etc.)
- ✓ Access spatial metadata from DICOM files
- ✓ Use low-level DICOM I/O functions for advanced control
- ✓ Explore DICOM folder structure (CT, RT-DOSE, RT-STRUCT, RT-PLAN)
- ✓ Analyze spatial properties and volumes
- ✓ Detect data format automatically
Key API Functions¶
High-level¶
load_structure_set(folder_path)- Auto-detect format and load all dataload_structure_set(folder_path, format='dicom')- Explicitly load as DICOM
Low-level¶
dicom_io.load_dicom_folder(folder_path)- Load all DICOM files as dictionarydicom_io.load_rt_struct(file_path)- Load RT-STRUCT filedicom_io.load_rt_dose(file_path)- Load RT-DOSE file
Utilities¶
detect_folder_format(folder_path)- Detect data formatStructureType- Enum for structure classification
Key Differences: DICOM vs NIfTI¶
| Aspect | DICOM | NIfTI |
|---|---|---|
| Format | Clinical standard | Research format |
| Structure | Multiple files (RT-STRUCT, RT-DOSE, CT) | Single file per volume |
| Metadata | Rich clinical metadata | Basic spatial metadata |
| File Size | Larger (uncompressed) | Smaller (compressed) |
| Loading | Requires parsing multiple files | Direct array loading |
| Use Case | Clinical workflows, archives | Research, ML pipelines |
Next Steps¶
- NIfTI I/O: See nifti-io.ipynb for NIfTI operations
- Comparing Plans: Learn how to compare treatment plans in comparing-plans.ipynb
- API Documentation: Explore the full DoseMetrics API