Top 5 Image Denoising Datasets Every Computer Vision Researcher Must Know

A Complete Guide to Noisy-Clean Paired Datasets, Noise Types, Metrics, Models, and Implementation — With Final Year Project Angles for Researchers, PhD, M.Tech, and Final Year Students

AI enthusiast and academic researcher with a focus on deep learning, computer vision, and NLP. I write about IEEE-aligned project ideas, model architectures, and practical AI implementation guides for final year engineering students. Helping students bridge the gap between research papers and real-world code.

Who is this for? Final year B.Tech/M.Tech students building a denoising project, PhD researchers benchmarking new architectures, and CV practitioners who need to understand which dataset to trust and why. Every section is written to save you the 40+ hours of scattered paper-reading that most researchers go through before picking a dataset.

Introduction

What is Image Denoising?

Image denoising is the process of recovering a clean signal x from a corrupted observation y = x + n, where n represents noise introduced during image capture, transmission, or compression. It is one of the oldest and most foundational problems in image processing — and yet, in 2024, it remains wide open because real-world noise is far more complex than the toy Gaussian model that dominated two decades of research.

Modern denoising has evolved across three eras:

  • Filter-based era (1990s–2000s): Bilateral filters, non-local means (NLM), and BM3D exploited self-similarity and spatial statistics.

  • Discriminative learning era (2010s): CNNs such as DnCNN learned a residual mapping from noisy to clean image, dramatically outperforming classical methods on Gaussian noise benchmarks.

  • Transformer and diffusion era (2020s–present): Transformer models such as Restormer and SwinIR, streamlined convolutional baselines such as NAFNet, and score-based diffusion models set state-of-the-art results on real-noise benchmarks, using attention or large receptive fields to capture long-range context.

Each architectural leap was made possible — and measurable — only because the community agreed on common benchmark datasets.

Why Datasets Matter More Than You Think

A model's reported PSNR number is meaningless without knowing the dataset it was tested on. Here is why that matters for you:

  • Different noise types demand different architectures. A model trained on additive white Gaussian noise (AWGN) collapses on smartphone shot noise. If your evaluation dataset only contains AWGN, you will incorrectly conclude your model generalises.

  • Reproducibility depends on standardised splits. Using CBSD68 as your test set is a signal to reviewers that you are following established protocol. Using a random subset of ImageNet is a red flag.

  • Dataset licensing constrains deployment. Some datasets are research-only. If you plan to commercialise your denoising product, you need datasets with permissive licences.

  • Compute budget is gated by dataset size. SIDD-Full has 30,000+ image pairs; SIDD-Medium has 320 pairs. Knowing which variant to use can save a week of training.

How This Article is Structured

Each of the five dataset sections follows the same 12-subsection template: overview, origin, noise characteristics, image statistics, download/access, licence, how researchers use it, code to load it, reported state-of-the-art numbers, known limitations, research angles, and a quick-reference summary card. After the datasets, you get metrics, model benchmarks, data preparation recipes, a research gap radar, a week-by-week implementation roadmap, and the tooling ecosystem.

Read the article end to end the first time, then return to individual sections as a reference.


What Makes a Great Image Denoising Dataset?

Before diving into the five datasets, it is worth understanding the evaluation criteria so you can apply them to future datasets as well.

1. Noise Type Coverage

A dataset should clearly document whether it contains synthetic noise (AWGN, Poisson, speckle, JPEG artefacts) or real noise captured optically. Real noise is signal-dependent: the noise variance at a given pixel correlates with the brightness of that pixel. This heteroscedastic property means classical AWGN-trained models fail catastrophically on real photos. A great dataset either focuses tightly on one noise type (enabling controlled comparison) or explicitly covers multiple noise levels/types (enabling generalisability evaluation).
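To make the heteroscedastic point concrete, here is a minimal sketch that contrasts AWGN with a toy Poisson-Gaussian model. The function names and the `gain` and `read_sigma` values are illustrative assumptions, not the parameters of any real sensor.

```python
import numpy as np

def add_awgn(clean, sigma, rng):
    """Signal-independent noise: the same std at every pixel (clean in [0, 1])."""
    return clean + rng.normal(0.0, sigma, clean.shape)

def add_poisson_gaussian(clean, gain, read_sigma, rng):
    """Toy signal-dependent model with illustrative parameters.

    Shot noise: photon counts ~ Poisson(clean / gain), rescaled by gain,
    so the shot-noise variance is gain * clean.
    Read noise: additive Gaussian with std read_sigma.
    """
    shot = rng.poisson(clean / gain) * gain
    return shot + rng.normal(0.0, read_sigma, clean.shape)

rng = np.random.default_rng(0)
dark = np.full((256, 256), 0.1)    # a uniformly dark patch
bright = np.full((256, 256), 0.8)  # a uniformly bright patch

awgn_std = [float(np.std(add_awgn(p, 0.05, rng) - p)) for p in (dark, bright)]
pg_std = [float(np.std(add_poisson_gaussian(p, 0.01, 0.01, rng) - p))
          for p in (dark, bright)]

print("AWGN std (dark, bright):", awgn_std)
print("Poisson-Gaussian std (dark, bright):", pg_std)
```

Under AWGN both patches get the same noise level. Under the signal-dependent model the bright patch has the larger absolute noise, even though its signal-to-noise ratio is better than that of the dark patch.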

2. Paired vs Unpaired Data

Paired datasets provide a noisy image and a corresponding ground-truth clean image of the same scene. This allows full-reference metrics (PSNR, SSIM) to be computed and enables supervised training with MSE/L1 losses. Capturing true pairs is technically challenging: even a brief delay between two captures can introduce misalignment from camera shake or scene motion.
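As a reminder of what a full-reference metric needs, here is a minimal PSNR sketch in plain NumPy. It assumes both images share the same shape and value range; the function name `psnr` is our own choice for illustration.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

clean = np.zeros((8, 8))
shifted = clean + 0.1          # constant error of 0.1, so MSE = 0.01
print(psnr(clean, shifted))    # about 20 dB
```

Without the clean reference there is no `reference` argument to pass, which is exactly why unpaired datasets force you towards self-supervised training and no-reference metrics.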

Unpaired datasets contain only noisy images, with no clean reference. Training on them requires self-supervised or unsupervised approaches: Noise2Noise learns from two independent noisy captures of the same scene, while Noise2Void and Blind2Unblind need only single noisy images. These methods are increasingly relevant in medical imaging, satellite imagery, and other domains where clean ground truth is unobtainable.

3. Image Diversity

A dataset with 100 images, all photographed under the same ISO and lighting, will produce narrow benchmarks. Diversity means: variety of textures (smooth regions, fine detail, periodic patterns), variety of colour (indoor, outdoor, night, macro), variety of content (portraits, landscapes, architecture, documents), and variety of capture conditions (multiple cameras, multiple ISOs, multiple scenes). Greater diversity = more trustworthy generalisation claims.

4. Resolution and Quality

Training high-resolution models on a 200×200 pixel dataset teaches the network nothing about long-range dependencies. Conversely, a 50-megapixel dataset is unusable without significant preprocessing. The sweet spot for most academic research is full-frame sensor images (12–24 megapixels) that are cropped into 256×256 or 512×512 patches at training time.
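Cropping patches at training time can be sketched as below. This is a minimal example that assumes the noisy and clean arrays are already aligned and share the same shape; `random_paired_crop` is a hypothetical helper name.

```python
import numpy as np

def random_paired_crop(noisy, clean, patch_size, rng):
    """Cut the same random window out of a noisy/clean pair of (H, W[, C]) arrays."""
    if noisy.shape != clean.shape:
        raise ValueError("noisy and clean must have the same shape")
    h, w = noisy.shape[:2]
    if h < patch_size or w < patch_size:
        raise ValueError("image is smaller than the requested patch")
    top = int(rng.integers(0, h - patch_size + 1))   # upper bound is exclusive
    left = int(rng.integers(0, w - patch_size + 1))
    window = (slice(top, top + patch_size), slice(left, left + patch_size))
    return noisy[window], clean[window]

rng = np.random.default_rng(0)
clean_img = rng.uniform(0, 1, size=(600, 800, 3)).astype(np.float32)
noisy_img = clean_img + rng.normal(0, 0.1, clean_img.shape).astype(np.float32)
noisy_patch, clean_patch = random_paired_crop(noisy_img, clean_img, 256, rng)
print(noisy_patch.shape, clean_patch.shape)
```

The key design point is that one window is sampled and applied to both images, so the pair stays pixel-aligned.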

5. Licence and Accessibility

Not all "publicly available" datasets are free for all uses. Creative Commons licences, research-only restrictions, and institutional data-sharing agreements all affect what you can do with a dataset. Before building a research paper, an open-source repo, or a commercial product, check the dataset licence.


Dataset 1 — CBSD68

1.1 Overview

CBSD68 (Colour Berkeley Segmentation Dataset, 68 images) is the most widely used Gaussian denoising benchmark in computer vision. If you have read any CNN-based denoising paper published between 2017 and 2024, you have almost certainly seen a PSNR table with columns labelled σ=15, σ=25, and σ=50 evaluated on CBSD68. It is the community's de facto standard for controlled, reproducible evaluation of AWGN denoising.

1.2 Origin and History

CBSD68 is a subset of the Berkeley Segmentation Dataset 500 (BSD500), originally created at UC Berkeley for the study of image segmentation and boundary detection. The 500 images in BSD500 are split into training (200), validation (100), and test (200) sets. The 68 images in CBSD68 are the colour versions of the BSD68 test set — a subset of the 200 test images that were selected because they had been photographed in colour and had reliable, non-blurry originals.

The dataset was popularised for denoising benchmarking by Zhang et al. in the 2017 DnCNN paper (Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising, IEEE TIP), which made σ ∈ {15, 25, 50} AWGN the standard evaluation protocol.

1.3 Noise Characteristics

CBSD68 does not ship with noisy images. Instead, the community convention is to take the clean images and add synthetic AWGN at a chosen noise level σ at evaluation time. This has two implications:

  • The noisy images are perfectly reproducible as long as you fix the random seed and noise level.

  • The noise model is a simplification: real camera noise is signal-dependent (Poisson + Gaussian), spatially correlated in the Bayer pattern, and amplified nonlinearly by the camera's ISP pipeline. AWGN is none of these things.

Noise levels commonly evaluated: σ = 15 (mild, corresponds roughly to a low-ISO well-lit shot), σ = 25 (medium), σ = 50 (heavy, simulates high-ISO or compressed imagery).
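It helps to know how bad the noisy input is before any denoising. For unclipped AWGN the mean squared error equals σ², so the input PSNR is 20·log10(255/σ) on the 8-bit scale. The small sketch below evaluates this for the three standard levels; the function name is our own.

```python
import math

def noisy_input_psnr(sigma, peak=255.0):
    """PSNR (dB) of an image corrupted by unclipped AWGN with std sigma.

    sigma and peak must be on the same scale (here the 8-bit range 0..255).
    """
    return 20.0 * math.log10(peak / sigma)

for sigma in (15, 25, 50):
    print(f"sigma={sigma}: {noisy_input_psnr(sigma):.2f} dB")
```

This gives roughly 24.61 dB, 20.17 dB, and 14.15 dB. Clipping the noisy image to the valid range changes these values slightly, so treat them as a baseline against which a denoiser's reported PSNR gain can be read.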

1.4 Image Statistics

Attribute | Value
Number of images | 68
Colour space | RGB (3-channel)
Original resolution | ~481×321 or ~321×481 pixels
Content | Natural scenes: animals, architecture, textures, people
Source camera | Varied (multiple photographers in BSD500)
Clean ground truth | Yes (the original images serve as clean references)

1.5 Download and Access

The dataset is available from the official Berkeley Computer Vision Group repository and is mirrored on multiple research hubs.

1.6 Licence

BSD500 images were collected under a research-use licence from the Berkeley Computer Vision Group. Commercial use is not explicitly authorised. For academic research and publications, the dataset is universally accepted. Verify the original licence terms on the official Berkeley page before any commercial deployment.

1.7 How Researchers Use CBSD68

CBSD68 is used exclusively as a test set. The standard pipeline is:

  1. Train the model on a separate training set (e.g., 400 images from BSD400, or DIV2K patches).

  2. At evaluation, add AWGN to each CBSD68 image at σ ∈ {15, 25, 50}.

  3. Run inference and compute average PSNR and SSIM across all 68 images.

  4. Report results in the standard table format that reviewers expect.

The separation of training and test split is critical. Researchers who accidentally include BSD68 images in their training set will report inflated numbers — this is a known pitfall (see Section 16).
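One cheap safeguard against this kind of leakage is to compare file contents between your training and test folders. The sketch below is a minimal check under stated assumptions: the helper names are hypothetical, and it only catches byte-identical files, so resized or re-encoded copies of test images would slip through.

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def find_leaked_files(train_dir, test_dir, pattern="*.png"):
    """Return (train_name, test_name) pairs whose contents are byte-identical."""
    test_hashes = {file_digest(p): p.name
                   for p in Path(test_dir).rglob(pattern) if p.is_file()}
    leaks = []
    for p in sorted(Path(train_dir).rglob(pattern)):
        if not p.is_file():
            continue
        match = test_hashes.get(file_digest(p))
        if match is not None:
            leaks.append((p.name, match))
    return leaks
```

Calling `find_leaked_files("train_images", "CBSD68")` (both folder names are placeholders) should return an empty list before you start training.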

1.8 Code to Load and Evaluate on CBSD68

import os
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim

def add_awgn(img_np, sigma):
    """Add AWGN to a float32 image in [0, 1]."""
    noise = np.random.normal(0, sigma / 255.0, img_np.shape).astype(np.float32)
    return np.clip(img_np + noise, 0, 1)

def evaluate_cbsd68(model, dataset_path, sigma=25, seed=42):
    np.random.seed(seed)
    psnr_list, ssim_list = [], []
    
    image_files = sorted([f for f in os.listdir(dataset_path) if f.endswith('.png')])
    
    for fname in image_files:
        clean = np.array(Image.open(os.path.join(dataset_path, fname))).astype(np.float32) / 255.0
        noisy = add_awgn(clean, sigma)
        
        # model() should accept (H, W, 3) float32 and return the same shape
        denoised = model(noisy)
        
        psnr_val = psnr(clean, denoised, data_range=1.0)
        # channel_axis replaces the deprecated multichannel flag (scikit-image >= 0.19)
        ssim_val = ssim(clean, denoised, data_range=1.0, channel_axis=2)
        
        psnr_list.append(psnr_val)
        ssim_list.append(ssim_val)
    
    print(f"CBSD68 | σ={sigma} | PSNR: {np.mean(psnr_list):.2f} dB | SSIM: {np.mean(ssim_list):.4f}")
    return np.mean(psnr_list), np.mean(ssim_list)

1.9 State-of-the-Art Numbers on CBSD68

Model | Year | σ=15 PSNR | σ=25 PSNR | σ=50 PSNR
BM3D | 2007 | 33.52 | 30.71 | 27.38
DnCNN-C | 2017 | 33.90 | 31.24 | 27.95
FFDNet | 2018 | 33.87 | 31.21 | 27.96
RIDNet | 2019 | (not listed) | 31.40 | 28.18
SwinIR-DN | 2021 | 34.42 | 31.78 | 28.56
Restormer | 2022 | 34.81 | 32.00 | 28.75
NAFNet | 2022 | 34.75 | 31.92 | 28.67

PSNR in dB, higher is better. Numbers sourced from respective papers. Minor differences exist across implementations due to border handling.

1.10 Known Limitations

  • Synthetic noise only. The gap between AWGN benchmarks and real-world performance can be 2–5 dB, depending on the camera ISP.

  • Small test set. With only 68 images, the average PSNR has relatively high variance; a model can get lucky with 1–2 difficult textures and appear better than it is.

  • No high-resolution images. The ~321×481 px images do not challenge models on large-scale spatial dependencies.

  • Colour handling ambiguity. Some papers evaluate on greyscale converted versions; ensure you compare like with like.
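To put a number on the small-test-set concern above, you can bootstrap a confidence interval over the per-image PSNR values. This is a minimal sketch using a percentile bootstrap; the per-image scores in the demo are randomly generated placeholders, not results from any model.

```python
import numpy as np

def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of per-image scores."""
    values = np.asarray(values, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Each row is one resample (with replacement) of the image indices
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    low, high = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(low), float(high)

# Hypothetical per-image PSNR values for a 68-image test set
demo_rng = np.random.default_rng(1)
fake_psnr = demo_rng.normal(31.0, 2.0, size=68)
mean, low, high = bootstrap_mean_ci(fake_psnr)
print(f"mean {mean:.2f} dB, 95% CI [{low:.2f}, {high:.2f}]")
```

If two models' intervals overlap heavily, a 0.05 dB difference in the average is weak evidence that one is actually better.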

1.11 Research Angles for Final Year / PhD Students

  • Noise-level estimation: Build a blind denoising system that estimates σ from the image before denoising. Evaluate on CBSD68 with random σ ∈ [5, 75].

  • Lightweight architectures: Propose a mobile-friendly denoiser and show competitive PSNR on CBSD68 with <1M parameters.

  • Perceptual loss study: Show that models optimised purely for PSNR on CBSD68 fail perceptual quality metrics (LPIPS). Introduce a combined loss and measure both.

  • Frequency-domain denoising: Analyse noise behaviour in DCT/Wavelet domain and propose a transform-domain architecture benchmarked on CBSD68.
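For the noise-level estimation angle, a classical baseline worth comparing against is Immerkær's Laplacian-based estimator. The sketch below is a plain NumPy version for a single greyscale image; the function name is our own, and the estimate is only reliable on reasonably smooth content because strong texture inflates it.

```python
import numpy as np

def estimate_sigma_immerkaer(gray):
    """Estimate the AWGN std of a 2-D greyscale image (same units as the image).

    Applies the 3x3 mask [[1, -2, 1], [-2, 4, -2], [1, -2, 1]], which suppresses
    smooth image structure, then converts the mean absolute response to a std.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    response = (
        gray[:-2, :-2] - 2 * gray[:-2, 1:-1] + gray[:-2, 2:]
        - 2 * gray[1:-1, :-2] + 4 * gray[1:-1, 1:-1] - 2 * gray[1:-1, 2:]
        + gray[2:, :-2] - 2 * gray[2:, 1:-1] + gray[2:, 2:]
    )
    return float(np.sqrt(np.pi / 2) * np.abs(response).sum() / (6.0 * (h - 2) * (w - 2)))

rng = np.random.default_rng(0)
flat = 0.5 + rng.normal(0.0, 25 / 255, size=(256, 256))  # flat patch, sigma = 25
print(estimate_sigma_immerkaer(flat) * 255)               # close to 25
```

A learned estimator should beat this baseline on textured CBSD68 images, which makes the comparison a natural first experiment.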

1.12 Quick Reference Card

CBSD68 | 68 colour images | AWGN (synthetic, σ-configurable) | Research-only licence | Use as: test set | Primary metrics: PSNR, SSIM | Download: berkeley.edu / github.com/cszn


Dataset 2 — SIDD

2.1 Overview

SIDD (Smartphone Image Denoising Dataset) was introduced by Abdelhamed et al. at CVPR 2018. It is the most important real-noise benchmark of the last five years and has become the community standard for evaluating real-world denoising in the same way CBSD68 anchors synthetic denoising. If you are working on anything related to smartphone photography, low-light imaging, or real-noise denoising, SIDD is non-negotiable.

2.2 Origin and History

SIDD was created by researchers at York University and Microsoft Research. The motivating insight was that AWGN-trained models fail on real photos because real sensor noise has a fundamentally different structure. The authors used five modern smartphones (Apple iPhone 7, Google Pixel, Samsung Galaxy S6 Edge, Motorola Nexus 6, and LG G4) to capture 10 static indoor scenes, each under four distinct lighting conditions and two ISO settings, using both a standard and a noisy capture mode.

Clean reference images were obtained by averaging hundreds of burst-captured frames of the same static scene — a rigorous approach that produces high-fidelity ground truth by averaging out independent noise instances. This methodology makes SIDD ground truth far more reliable than many competing datasets that use lower-quality alignment-based methods.
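The reason averaging works is that independent, zero-mean noise shrinks by a factor of √N when N frames are averaged. The toy simulation below illustrates only that statistical effect; it assumes perfectly aligned frames and Gaussian noise, and it ignores the alignment and outlier handling a real capture pipeline needs.

```python
import numpy as np

rng = np.random.default_rng(0)
scene = rng.uniform(0.2, 0.8, size=(64, 64))  # stand-in for a static scene
sigma = 0.05                                   # per-frame noise std (illustrative)

def residual_std(num_frames):
    """Std of (average of num_frames noisy captures) minus the true scene."""
    frames = scene + rng.normal(0.0, sigma, size=(num_frames, *scene.shape))
    return float(np.std(frames.mean(axis=0) - scene))

for n in (1, 16, 100):
    print(f"N={n:3d}  measured={residual_std(n):.4f}  "
          f"predicted={sigma / np.sqrt(n):.4f}")
```

With 100 frames the residual noise drops to about a tenth of the single-frame level, which is why the averaged image can serve as a near-clean reference.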

2.3 Noise Characteristics

SIDD noise is real camera noise that includes:

  • Shot noise (Poisson): Signal-dependent noise proportional to photon count.

  • Read noise (Gaussian-like): Arising from electronics in the sensor readout circuitry.

  • Fixed pattern noise: Pixel-specific sensitivity variations baked into the sensor.

  • ISP artefacts: Demosaicing, sharpening, and tone-mapping in the camera pipeline alter noise statistics in nonlinear, camera-specific ways.

The noise is significantly more structured than AWGN. Critically, the noise level varies with brightness: the absolute noise variance grows with signal intensity, but the signal-to-noise ratio is worse in dark regions, so shadows look noisier than highlights. This signal-dependent property must be accounted for in any model trained on SIDD.
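With a paired dataset you can measure this signal dependence directly by binning pixels on clean intensity and computing the residual spread in each bin. The sketch below assumes float images in [0, 1]; the helper name is hypothetical and the demo uses synthetic data rather than SIDD images.

```python
import numpy as np

def noise_std_by_intensity(noisy, clean, num_bins=8):
    """Return (bin centres, residual std per bin), binned on clean intensity."""
    noisy = np.asarray(noisy, dtype=np.float64).ravel()
    clean = np.asarray(clean, dtype=np.float64).ravel()
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    bin_index = np.clip(np.digitize(clean, edges) - 1, 0, num_bins - 1)
    residual = noisy - clean
    centres, stds = [], []
    for b in range(num_bins):
        mask = bin_index == b
        if mask.sum() < 2:      # skip bins with too few pixels
            continue
        centres.append(0.5 * (edges[b] + edges[b + 1]))
        stds.append(residual[mask].std())
    return np.array(centres), np.array(stds)

# Synthetic pair whose noise variance grows linearly with the clean signal
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(256, 256))
noisy = clean + rng.normal(0.0, 1.0, clean.shape) * np.sqrt(0.01 * clean + 1e-4)
centres, stds = noise_std_by_intensity(noisy, clean)
print(np.round(centres, 3))
print(np.round(stds, 4))
```

A flat curve indicates AWGN-like noise; a rising curve is the signature of shot noise. Plotting this for each camera is a quick way to see how their noise profiles differ.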

2.4 Image Statistics

Attribute | Value
Full dataset pairs | 30,000 noisy-clean patches (SIDD-Full)
Medium variant pairs | 320 noisy-clean full-frame images (SIDD-Medium)
Benchmark (test) pairs | 1,280 patches; clean targets withheld by server
Resolution | Full-frame sensor outputs, various resolutions
Cameras | 5 smartphones
Scenes | 10 indoor, 4 lighting conditions, 2 ISO levels
Colour space | sRGB (post-ISP)

2.5 Download and Access

SIDD is hosted officially at the York University research portal:

  • Official page and all variants: https://www.eecs.yorku.ca/~kamel/sidd/

  • SIDD-Medium (recommended for training/dev): Linked from the official page — search "SIDD Medium Dataset" on the page.

  • SIDD Benchmark submission portal: https://www.eecs.yorku.ca/~kamel/sidd/benchmark.php — submit denoised results here; the server computes PSNR/SSIM against the withheld clean images.

  • Google Drive mirror is also linked from the official page for both SIDD-Small and SIDD-Medium.

2.6 Licence

SIDD is released for non-commercial research use only. You may use it for academic papers, thesis work, and public research benchmarks. Commercial deployment (e.g., in a product or mobile app) requires a separate licence agreement with the dataset creators.

2.7 How Researchers Use SIDD

SIDD has two standard usage modes:

Training: Use SIDD-Medium (320 pairs) or the patches extracted from SIDD-Full for supervised training. The community standard is to train on SIDD-Medium and evaluate on the SIDD benchmark. DND cannot supplement the training data because it provides test images only.

Evaluation: The SIDD benchmark withholds ground truth for the 1,280 test patches. You submit your denoised outputs to the evaluation server, which returns PSNR and SSIM. This prevents overfitting to the test set. Scores are automatically listed on the public leaderboard.

2.8 Code to Load SIDD Patches

import h5py
import numpy as np

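def load_sidd_medium(noisy_path, clean_path, noisy_key='x', clean_key='y'):
    """
    Load noisy/clean patch arrays that were exported to HDF5 (MATLAB v7.3) files.
    The file names and the keys 'x' and 'y' are placeholders, not an official
    layout: the SIDD-Medium download is organised as folders of PNG images, so
    this loader assumes you exported patches yourself. Inspect your files with
    list(f.keys()) and adjust the keys accordingly.
    noisy_path: e.g. SIDD_Medium_Srgb_noisy_matlab.mat
    clean_path: e.g. SIDD_Medium_Srgb_gt_matlab.mat
    """
    with h5py.File(noisy_path, 'r') as f:
        noisy = np.array(f[noisy_key])   # shape: (N, H, W, C) or (N, C, H, W)
    
    with h5py.File(clean_path, 'r') as f:
        clean = np.array(f[clean_key])

    if noisy.shape != clean.shape:
        raise ValueError(f"Shape mismatch: noisy {noisy.shape} vs clean {clean.shape}")

    # Transpose if channels-first (N, C, H, W) → (N, H, W, C)
    if noisy.ndim == 4 and noisy.shape[1] == 3:
        noisy = noisy.transpose(0, 2, 3, 1)
        clean = clean.transpose(0, 2, 3, 1)
    
    # Normalise to [0, 1]; assumes the stored values are 8-bit (0..255)
    noisy = noisy.astype(np.float32) / 255.0
    clean = clean.astype(np.float32) / 255.0
    
    print(f"Loaded {noisy.shape[0]} pairs, shape {noisy.shape[1:]}")
    return noisy, clean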
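If you work from the PNG folders instead, the first step is pairing each noisy file with its ground truth. The sketch below assumes that noisy and clean files sit in the same scene folder and differ only by the tags `NOISY_SRGB` and `GT_SRGB` in the file name. That naming is our understanding of the SIDD-Medium download, so verify it against your copy and change the tags if needed.

```python
from pathlib import Path

def pair_sidd_files(root, noisy_tag="NOISY_SRGB", clean_tag="GT_SRGB"):
    """Return sorted (noisy_path, clean_path) pairs found under root.

    A noisy file is kept only if a file with the same name, but with
    noisy_tag replaced by clean_tag, exists next to it.
    """
    pairs = []
    for noisy_path in sorted(Path(root).rglob(f"*{noisy_tag}*")):
        if not noisy_path.is_file():
            continue
        clean_name = noisy_path.name.replace(noisy_tag, clean_tag)
        clean_path = noisy_path.with_name(clean_name)
        if clean_path.exists():
            pairs.append((noisy_path, clean_path))
    return pairs
```

Each pair can then be opened with your image library of choice and cropped into training patches. Noisy files without a matching clean file are silently skipped, so compare `len(pairs)` with the expected pair count as a sanity check.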

2.9 State-of-the-Art Numbers on SIDD Benchmark

Model | Year | PSNR (dB) | SSIM
CBDNet | 2019 | 38.06 | 0.942
RIDNet | 2019 | 38.71 | 0.951
MPRNet | 2021 | 39.71 | 0.958
SwinIR-Real | 2021 | 39.96 | 0.960
Restormer | 2022 | 40.02 | 0.960
NAFNet | 2022 | 40.30 | 0.961
MixDenoiser | 2023 | 40.50 | 0.962

Benchmark PSNR/SSIM from the official SIDD leaderboard. Numbers may vary slightly across evaluation protocols.

2.10 Known Limitations

  • Indoor scenes only. SIDD has no outdoor, nighttime, or telephoto images. Models trained exclusively on SIDD may struggle on outdoor or macro photography.

  • sRGB space. The noise statistics in SIDD reflect post-ISP processing. Models trained on SIDD cannot be directly applied to RAW sensor data without adaptation.

  • Five cameras. While diverse, five smartphones do not cover DSLR, mirrorless, or medium-format noise profiles.

  • Static scenes only. Motion blur is absent; the clean ground truth averaging technique only works for static content.

2.11 Research Angles for Final Year / PhD Students

  • Camera-agnostic denoising: Train on data from two smartphones in SIDD and test on the remaining three. Measure the domain gap and propose a camera-metadata-conditioned architecture.

  • RAW vs sRGB denoising pipeline: Extend SIDD to RAW by pairing it with a known ISP model, then compare denoising-in-RAW vs denoising-in-sRGB.

  • Few-shot adaptation: Use only 10–20 SIDD pairs for fine-tuning a model pre-trained on CBSD68. Measure sample efficiency.

  • Self-supervised denoising on SIDD: Apply Noise2Noise or Blind2Unblind using only noisy inputs from SIDD; compare to fully supervised baseline.
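For the camera-agnostic angle, you need a train/test split by camera. The sketch below assumes that each SIDD scene folder name encodes the smartphone as its third underscore-separated field (for example `S6` or `IP`). Both that layout and the example folder names are assumptions to check against your download; the helper names are hypothetical.

```python
def camera_code(scene_folder_name):
    """Extract the assumed camera code (third '_' field) from a scene folder name."""
    parts = scene_folder_name.split("_")
    if len(parts) < 3:
        raise ValueError(f"Unexpected folder name: {scene_folder_name!r}")
    return parts[2]

def split_by_camera(scene_names, train_cameras):
    """Split scene folders into (train, test) lists by camera code."""
    train, test = [], []
    for name in scene_names:
        (train if camera_code(name) in train_cameras else test).append(name)
    return train, test

# Illustrative folder names only
scenes = [
    "0001_001_S6_00100_00060_3200_L",
    "0002_001_IP_00100_00160_3200_N",
    "0003_001_GP_00800_00350_5500_N",
]
train, test = split_by_camera(scenes, train_cameras={"S6", "IP"})
print(train)
print(test)
```

Splitting at the folder level, rather than at the patch level, keeps every patch from a held-out camera out of training.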

2.12 Quick Reference Card

SIDD | 320 full-frame pairs (Medium) / 30K patches (Full) / 1,280 test patches (Benchmark) | Real smartphone noise | Non-commercial research licence | Use as: training + benchmark evaluation | Primary metrics: PSNR, SSIM via server | Download: eecs.yorku.ca/~kamel/sidd/


Dataset 3 — DND

3.1 Overview

DND (Darmstadt Noise Dataset) is a real-noise benchmark released by Plötz and Roth at CVPR 2017. DND occupies a unique position in the denoising ecosystem: it provides only test images and no training data. Its purpose is pure evaluation. The clean ground truth is withheld by a server, preventing any possibility of test-set leakage. Together with SIDD, DND defines the two-pillar standard for real-noise denoising evaluation.

3.2 Origin and History

DND was created at TU Darmstadt, Germany. The key technical challenge the authors solved was obtaining reliable clean reference images for real-world scenes — without the motion blur that comes from averaging burst captures of non-static scenes. Their solution: photograph scenes using a DSLR with two different ISOs. The low-ISO image (nearly noise-free) serves as the clean reference; the high-ISO image (visibly noisy) is the input. Because both images are captured within seconds of each other with no scene movement, alignment is excellent.

The dataset uses four consumer cameras (Sony A7R II, Olympus OM-D E-M10, Panasonic Lumix DMC-GX7, Nikon D600) covering a range of sensor sizes and noise profiles — from micro four-thirds to full-frame 36MP.

3.3 Noise Characteristics

DND noise is real optical noise from high-ISO captures with:

  • Poisson shot noise dominating bright, well-exposed regions.

  • Gaussian read noise more visible in dark/shadow areas.

  • Luminance-only noise amplification in many cases (cameras suppress chroma noise aggressively in their ISPs).

  • Correlation between colour channels due to demosaicing.

Compared to SIDD, DND noise tends to be visually coarser at the patch level because high-ISO DSLR images amplify noise more aggressively than smartphone HDR pipelines. DND represents a harder denoising problem in terms of absolute noise magnitude.

3.4 Image Statistics

Attribute | Value
Number of image pairs | 50 high-ISO / low-ISO pairs
Patches per image | 20 (size 512×512 each)
Total test patches | 1,000
Resolution | Full-frame DSLR, up to 36 megapixels before cropping
Cameras | 4 (Sony, Olympus, Panasonic, Nikon)
Content | Indoor and outdoor scenes, diverse textures
Clean ground truth | Withheld; server-side evaluation only

3.5 Download and Access

DND is hosted at TU Darmstadt (noise.visinf.tu-darmstadt.de) and remains one of the cleanest examples of benchmark hosting in the community.

3.6 Licence

DND is available for academic, non-commercial research use. You must register and accept the terms on the official website before downloading. Commercial use is prohibited.

3.7 How Researchers Use DND

DND is a test-only benchmark. No training is done on DND data. The standard workflow is:

  1. Train your model on SIDD-Medium, or SIDD + CBSD68 (with synthetic noise), or a mix of real and synthetic data.

  2. Run inference on the 1,000 DND patches.

  3. Zip and upload the denoised patches to the DND benchmark server.

  4. The server scores and publishes your result.

Many papers that evaluate on SIDD also evaluate on DND to demonstrate cross-dataset generalisation. A model that is strong on SIDD but weak on DND has likely overfit to smartphone-specific noise characteristics.

3.8 Code to Load and Prepare DND Submissions

import os
import h5py
import numpy as np
import scipy.io as sio

def load_dnd_patches(dnd_data_path):
    """
    Load DND benchmark patches for inference.
    The layout assumed here follows the benchmark's reference Python script as
    we understand it: info.mat and images_srgb/0001.mat ... are MATLAB v7.3
    (HDF5) files, the image is stored under 'InoisySRGB', and each bounding box
    is a 1-indexed [row1, col1, row2, col2]. Check this against your download.
    Returns a list of (patch_id, noisy_patch) tuples, float32 in [0, 1].
    """
    patches = []
    with h5py.File(os.path.join(dnd_data_path, 'info.mat'), 'r') as infos:
        info = infos['info']
        bb = info['boundingboxes']  # (1, num_images) array of HDF5 references

        for i in range(bb.shape[1]):
            img_path = os.path.join(dnd_data_path, 'images_srgb', f'{i+1:04d}.mat')
            with h5py.File(img_path, 'r') as img_file:
                # h5py returns MATLAB arrays with reversed axes, hence the transpose
                img = np.float32(np.array(img_file['InoisySRGB']).T)

            # Dereference the boxes of image i: one row per patch (20 per image)
            boxes = np.array(info[bb[0][i]]).T
            for j, box in enumerate(boxes):
                r1, c1, r2, c2 = (int(v) for v in box[:4])
                patch = img[r1 - 1:r2, c1 - 1:c2, :].copy()
                patches.append((f"{i+1:04d}_{j+1:02d}", patch))

    return patches

def save_dnd_submission(denoised_patches, output_dir):
    """
    Save denoised patches as one .mat file per patch under the key
    'Idenoised_crop'. This mirrors our understanding of the submission kit;
    confirm the current format on the DND site before uploading.
    """
    os.makedirs(output_dir, exist_ok=True)
    for patch_id, patch in denoised_patches:
        patch = np.float32(np.clip(patch, 0, 1))
        sio.savemat(os.path.join(output_dir, f'{patch_id}.mat'), {'Idenoised_crop': patch})
    print(f"Saved {len(denoised_patches)} patches to {output_dir}")

3.9 State-of-the-Art Numbers on DND Benchmark

Model | Year | PSNR (dB) | SSIM
CBDNet | 2019 | 38.06 | 0.942
RIDNet | 2019 | 39.26 | 0.953
CycleISP | 2020 | 39.56 | 0.956
MPRNet | 2021 | 39.80 | 0.954
SwinIR-Real | 2021 | 39.96 | 0.952
Restormer | 2022 | 40.03 | 0.956
NAFNet | 2022 | 40.30 | 0.957

From the DND public leaderboard and respective papers. Always cross-reference with the live leaderboard at noise.visinf.tu-darmstadt.de for the most current scores.

3.10 Known Limitations

  • Test-only — no training set. You cannot train on DND. This is by design but can be limiting for researchers wanting to study training dynamics on real noise.

  • Static scenes, indoor bias. Similar to SIDD, dynamic scenes and extreme outdoor conditions are absent.

  • DSLR only. DND does not cover smartphone noise, making it complementary to, rather than a replacement for, SIDD.

  • 50 scenes × 4 cameras. The evaluation diversity, while better than synthetic-only benchmarks, is still limited compared to what a commercial product will encounter in the wild.

3.11 Research Angles for Final Year / PhD Students

  • Cross-dataset generalisation study: Train exclusively on SIDD-Medium and evaluate on DND without any DND-specific fine-tuning. Characterise the PSNR gap and its causes.

  • Camera-blind real denoising: Design a model that reads EXIF metadata (ISO, exposure time, camera model) and conditions its denoising on that information. Evaluate on DND's four cameras separately.

  • Unsupervised real denoising: Apply a Noise2Noise framework where both the noisy input and pseudo-clean target come from burst captures — simulate this scenario using DND pairs.

  • High-resolution patch denoising: Study how model performance changes as patch size increases from 128×128 to 512×512 using DND's large patches.

3.12 Quick Reference Card

DND | 50 image pairs → 1,000 test patches (512×512) | Real DSLR noise, 4 cameras | Non-commercial research licence | Use as: test-only benchmark | Primary metrics: PSNR, SSIM via server | Download: noise.visinf.tu-darmstadt.de


Dataset 4 — Kodak24

4.1 Overview

Kodak24 (also called the Kodak Lossless True Color Image Suite) is a collection of 24 high-quality, uncompressed colour photographs originally released by the Eastman Kodak Company in the early 1990s. Despite being over three decades old, Kodak24 remains one of the most used evaluation datasets in image restoration — appearing in papers on denoising, super-resolution, compression artefact reduction, and image enhancement. Its enduring relevance speaks to a simple truth: high-quality, artefact-free, diverse images remain scarce.

4.2 Origin and History

The Kodak dataset was digitised and released by Kodak as a standard image quality reference for the colour imaging and printing industry. The 24 images span a wide variety of photographic subjects: landscapes, architecture, people, animals, textiles, and close-up objects. The images were captured on professional photographic equipment and digitised at a time when CCD sensor quality was industry-leading.

In the denoising community, the dataset entered widespread use because it is freely hosted, small enough to test on quickly, and contains challenging textures (fine fabric, foliage, hair) that stress test frequency-preserving denoising algorithms.

4.3 Noise Characteristics

Like CBSD68, Kodak24 contains only clean images. Noisy versions are generated synthetically at test time. The standard evaluation protocol follows the same AWGN convention: add Gaussian noise at σ ∈ {15, 25, 50} and compute PSNR/SSIM of the denoised output vs the original.

What distinguishes Kodak24 from CBSD68 is the image resolution and quality: Kodak24 images are 768×512 pixels (or 512×768 in portrait orientation), about 2.5× the pixel count of BSD images (393,216 versus roughly 154,000 pixels). This makes Kodak24 a better test for models that need to handle large-scale spatial structure and long-range texture coherence.

4.4 Image Statistics

Attribute | Value
Number of images | 24
Resolution | 768×512 or 512×768 (landscape/portrait)
Colour space | 24-bit RGB (true colour, uncompressed)
Content | Landscapes, portraits, architecture, animals, textiles
Noise | None (synthetic AWGN added at eval time)
Clean ground truth | Yes (original images)

4.5 Download and Access

Kodak24 is freely hosted at multiple permanent locations:

# Download all 24 Kodak images via wget
for i in $(seq -w 1 24); do
  wget http://r0k.us/graphics/kodak/kodak/kodim${i}.png
done

4.6 Licence

The Kodak images were released by Eastman Kodak Company as a reference standard. They are widely considered to be in the public domain or at minimum freely usable for academic research. Multiple decades of use in peer-reviewed publications without any IP challenge supports this position. However, confirm the current status at the source URL if commercial use is planned.

4.7 How Researchers Use Kodak24

Kodak24 is used as a secondary test set alongside CBSD68. Papers that report results on both datasets provide stronger evidence of generalisation. Because Kodak24 has larger images and different photographic character than BSD images, it is particularly useful for revealing differences in texture rendering and edge preservation.

Some researchers also use Kodak24 as a training set or part of it — particularly in compression artefact removal, where there is no standard training/test split. For denoising, it is almost always used as a test set.

4.8 Code to Load and Evaluate on Kodak24

import os
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim

def evaluate_kodak24(model, dataset_path, sigma=25, seed=0):
    """Evaluate a denoising model on Kodak24 with AWGN at level sigma."""
    np.random.seed(seed)
    results = []
    
    image_files = sorted([f for f in os.listdir(dataset_path) 
                          if f.lower().startswith('kodim') and f.endswith('.png')])
    
    for fname in image_files:
        clean = np.array(Image.open(os.path.join(dataset_path, fname))).astype(np.float32) / 255.0
        noise = np.random.normal(0, sigma / 255.0, clean.shape).astype(np.float32)
        noisy = np.clip(clean + noise, 0, 1)
        
        denoised = model(noisy)  # model returns float32 [0,1]
        
        p = psnr(clean, denoised, data_range=1.0)
        # channel_axis replaces the deprecated multichannel flag (scikit-image >= 0.19)
        s = ssim(clean, denoised, data_range=1.0, channel_axis=2)
        results.append((fname, p, s))
    
    avg_psnr = np.mean([r[1] for r in results])
    avg_ssim = np.mean([r[2] for r in results])
    print(f"Kodak24 | σ={sigma} | PSNR: {avg_psnr:.2f} dB | SSIM: {avg_ssim:.4f}")
    return results

4.9 State-of-the-Art Numbers on Kodak24

Model | Year | σ=15 PSNR | σ=25 PSNR | σ=50 PSNR
BM3D | 2007 | 34.28 | 31.68 | 28.46
DnCNN-C | 2017 | 34.48 | 31.73 | 28.44
FFDNet | 2018 | 34.63 | 31.89 | 28.66
SwinIR-DN | 2021 | 35.34 | 32.44 | 29.18
Restormer | 2022 | 35.47 | 32.55 | 29.25

PSNR in dB. Kodak24 numbers are slightly higher than CBSD68 due to higher native resolution and different content distribution.

4.10 Known Limitations

  • Only 24 images. Variance on 24 samples is high; a single difficult image (e.g., kodim23 — the parrot with saturated colours) can swing the average by 0.3 dB.

  • Synthetic noise only. Same limitation as CBSD68.

  • Age of the dataset. Kodak images from the 1990s do not reflect modern camera characteristics (HDR, wide gamut, AI-processed colour science).

  • No standardised test/train split. Different papers use Kodak24 in different ways; always specify your protocol.
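The small-sample concern in the first bullet can be made concrete by reporting a confidence interval next to the mean. The following is a minimal sketch, assuming you already have one PSNR value per image; the function name `bootstrap_mean_ci` and the example scores are illustrative and not taken from any paper.

```python
import numpy as np

def bootstrap_mean_ci(values, n_boot=10000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of per-image scores."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=np.float64)
    # Resample the images with replacement and recompute the mean each time
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(lo), float(hi)

# Hypothetical per-image PSNRs (dB) for 24 images; replace with your own results
example_psnrs = np.linspace(29.0, 35.0, 24)
mean, lo, hi = bootstrap_mean_ci(example_psnrs)
print(f"mean={mean:.2f} dB, 95% CI=[{lo:.2f}, {hi:.2f}]")
```

If the intervals of two models overlap heavily, a small gap in the 24-image average is weak evidence on its own.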

4.11 Research Angles for Final Year / PhD Students

  • Colour-aware denoising: Kodak24's high-saturation images (the parrot, the flowers) are challenging because noise in saturated colours is perceptually more visible. Design a colour-space-aware loss function.

  • Perceptual quality on textures: Use Kodak24's rich fabric and foliage images to study the PSNR–perceptual quality trade-off. Show that PSNR-optimal denoising over-smooths fine textures.

  • Cross-resolution evaluation: Downsample Kodak24 images to BSD-equivalent resolution and measure performance difference; quantify resolution's role in benchmark scores.

  • Image quality without ground truth: Since Kodak24 clean images are available, use them to train/calibrate no-reference metrics (NIQE, BRISQUE) and evaluate whether they correlate with PSNR.
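For the last angle, the usual way to check whether a no-reference metric tracks PSNR is a rank correlation. Below is a small sketch of Spearman's rho in plain NumPy; it assumes there are no tied scores, and the example numbers are made up for illustration.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation (simplified: assumes no tied values)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # argsort of argsort gives the rank of each element
    rank_a = np.argsort(np.argsort(a)).astype(np.float64)
    rank_b = np.argsort(np.argsort(b)).astype(np.float64)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Hypothetical scores for five denoised images
psnr_scores = [31.2, 33.5, 29.8, 34.1, 30.4]   # higher = better
niqe_scores = [4.9, 3.8, 5.6, 3.5, 5.1]        # lower = better
rho = spearman_rho(psnr_scores, niqe_scores)
print(f"Spearman rho = {rho:.2f}")
```

Because NIQE and BRISQUE are lower-is-better, a metric that agrees with PSNR shows up as a strongly negative rho.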

4.12 Quick Reference Card

Kodak24 | 24 colour images | AWGN (synthetic) | Public domain / freely usable | Use as: secondary test set | Primary metrics: PSNR, SSIM | Download: r0k.us/graphics/kodak / github.com/cszn


Dataset 5 — PolyU

5.1 Overview

PolyU (Hong Kong Polytechnic University Real-World Noisy Image Dataset) is a real-noise dataset introduced by Xu et al. in 2018. PolyU is smaller than SIDD and DND but provides important complementary coverage: it spans multiple camera types (compact, DSLR, and smartphone) and multiple ISO levels, and it was collected with a different burst-capture protocol, which adds useful diversity. For researchers who want to test cross-dataset robustness or study real-noise structure without the complexity of SIDD's server-based evaluation, PolyU is a practical choice.

5.2 Origin and History

PolyU was created by the Image Computing Group at The Hong Kong Polytechnic University. The dataset was built to address a gap in the literature: SIDD focused heavily on smartphones, and DND used only a DSLR-style approach. PolyU explicitly includes a consumer-grade compact camera, reflecting real-world use cases where researchers, journalists, and citizen scientists use non-flagship equipment.

The clean reference images in PolyU were obtained by averaging 500 burst frames of the same static scene, producing very low-noise reference images that approach the true clean signal with high confidence.
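Why averaging 500 frames gives a near-clean reference can be seen with a toy simulation. The sketch below assumes zero-mean, independent Gaussian noise per frame (real sensor noise is signal-dependent, so this is only an approximation) and a hypothetical flat grey scene; under that assumption the residual noise standard deviation falls as σ/√N.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1                      # per-frame noise std (assumed i.i.d. Gaussian)
clean = np.full((64, 64), 0.5)   # hypothetical flat grey scene

def residual_std(n_frames):
    """Std of the error left after averaging n_frames noisy captures."""
    frames = clean + rng.normal(0.0, sigma, size=(n_frames, *clean.shape))
    return float((frames.mean(axis=0) - clean).std())

for n in (1, 25, 500):
    print(f"N={n:3d}  measured={residual_std(n):.4f}  predicted={sigma / np.sqrt(n):.4f}")
```

With N = 500 the predicted residual is about 22 times smaller than the single-frame noise, which is why the mean image can stand in for ground truth.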

5.3 Noise Characteristics

PolyU noise is real sensor noise from multiple camera types:

  • Compact camera (Canon PowerShot G7X Mark II): Smaller sensor, higher noise at equivalent ISO.

  • DSLR (Sony A7 II): Larger sensor, lower noise floor, different noise texture.

  • Smartphone (Apple iPhone 8, Huawei Mate 10): ISP-heavy pipeline with significant noise suppression already applied.

Because cameras are explicitly listed with metadata, PolyU is particularly useful for camera-conditioned denoising research where you want to evaluate how well a model generalises across hardware platforms.
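A simple way to report cross-hardware generalisation is to break the average PSNR down by camera. The sketch below assumes each result already carries a camera label; how you obtain that label (filename, metadata) depends on your copy of the dataset, and the labels and scores shown are placeholders.

```python
from collections import defaultdict
import numpy as np

def psnr_by_camera(results):
    """results: iterable of (camera_name, psnr_db) tuples -> {camera: mean PSNR}."""
    groups = defaultdict(list)
    for camera, value in results:
        groups[camera].append(value)
    return {camera: float(np.mean(vals)) for camera, vals in groups.items()}

# Hypothetical camera labels and scores
example = [("camera_A", 38.0), ("camera_A", 40.0), ("camera_B", 36.5)]
per_camera = psnr_by_camera(example)
for camera, value in sorted(per_camera.items()):
    print(f"{camera}: {value:.2f} dB")
```

A large spread between cameras is often more informative than the pooled average.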

5.4 Image Statistics

| Attribute | Value |
|---|---|
| Number of scenes | 100 |
| Cameras | 4 (Canon compact, Sony DSLR, iPhone, Huawei) |
| ISO levels | Multiple per scene (e.g., ISO 1600, 3200, 6400) |
| Total noisy/clean pairs | 400+ (100 scenes × multiple ISOs) |
| Resolution | Varies by camera (full sensor output) |
| Colour space | sRGB |
| Clean reference | Mean of 500 burst frames |

5.5 Download and Access

PolyU is hosted on the Hong Kong Polytechnic University research page and available for direct download:


# Clone the GitHub repository to get the data directly
git clone https://github.com/csjunxu/PolyU-Real-World-Noisy-Images-Dataset.git

5.6 Licence

PolyU is released for academic and non-commercial research use. The GitHub repository states this explicitly. Citation of the original paper (Xu et al., "Real-world Noisy Image Denoising: A New Benchmark," arXiv preprint, 2018) is required when using the dataset.

5.7 How Researchers Use PolyU

PolyU is commonly used in two ways:

As a secondary real-noise test set: Researchers train on SIDD-Medium, test on DND (via server) and SIDD benchmark, and then additionally report PolyU numbers to demonstrate cross-dataset generalisation. Strong performance on all three real-noise datasets is a compelling generalisation claim.

As a training set for multi-camera denoising: Because PolyU covers multiple camera types, it is used alongside SIDD as training data for camera-agnostic denoisers. The DSLR and compact camera data are especially valuable for filling in the camera coverage gaps in SIDD.

5.8 Code to Load PolyU Pairs

import os
import numpy as np
from PIL import Image
from glob import glob

def load_polyu_pairs(polyu_root):
    """
    Load PolyU noisy-clean pairs.
    Expects directory structure:
      polyu_root/
        Real_noisy/   <- noisy images
        Mean/         <- clean (mean-averaged) images
    """
    noisy_files = sorted(glob(os.path.join(polyu_root, 'Real_noisy', '*.png')))
    clean_files = sorted(glob(os.path.join(polyu_root, 'Mean', '*.png')))
    
    assert len(noisy_files) == len(clean_files), "Mismatch between noisy and clean files"
    
    pairs = []
    for nf, cf in zip(noisy_files, clean_files):
        noisy = np.array(Image.open(nf)).astype(np.float32) / 255.0
        clean = np.array(Image.open(cf)).astype(np.float32) / 255.0
        pairs.append((noisy, clean))
    
    print(f"Loaded {len(pairs)} PolyU noisy-clean pairs")
    return pairs

def evaluate_polyu(model, pairs):
    """Run inference and compute PSNR on PolyU pairs."""
    from skimage.metrics import peak_signal_noise_ratio as psnr
    psnr_list = []
    for noisy, clean in pairs:
        denoised = model(noisy)
        psnr_list.append(psnr(clean, denoised, data_range=1.0))
    print(f"PolyU PSNR: {np.mean(psnr_list):.2f} dB")
    return psnr_list

5.9 State-of-the-Art Numbers on PolyU

| Model | Year | PSNR (dB) | SSIM |
|---|---|---|---|
| BM3D | 2007 | 37.79 | 0.951 |
| CBDNet | 2019 | 38.12 | 0.953 |
| RIDNet | 2019 | 39.23 | 0.962 |
| AINDNet | 2020 | 39.54 | 0.965 |
| MPRNet | 2021 | 39.79 | 0.966 |
| Restormer | 2022 | 40.41 | 0.968 |

Numbers from respective papers. PolyU PSNR values are slightly higher than DND due to different noise levels in the dataset.

5.10 Known Limitations

  • Small by modern standards. Roughly 400 pairs is small for training deep networks, so PolyU is usually supplemented with SIDD.

  • Alignment challenges. Although the references are burst averages, some PolyU pairs show subtle misalignment between the noisy frame and its reference, which can distort PSNR estimates (usually by lowering them).

  • Static indoor scenes. PolyU scenes are predominantly indoor/still-life, limiting outdoor generalisation.

  • Less standardised splits. Unlike SIDD and DND, there is no official server-based evaluation; results are computed locally by different researchers using different crops, introducing some inconsistency.

5.11 Research Angles for Final Year / PhD Students

  • Multi-camera denoising network: Train a single network jointly on PolyU (compact + DSLR) and SIDD (smartphone) and test generalisation to unseen cameras.

  • ISO-conditioned denoising: PolyU provides multiple ISOs for the same scene. Design a model conditioned on ISO level and measure per-ISO PSNR improvement.

  • Burst photography denoising: Use PolyU's multiple captures of the same scene to implement a temporal denoising system (align frames, merge, denoise).

  • Alignment-aware training: Study the effect of subtle misalignment in PolyU pairs on model training stability. Propose a misalignment-robust loss function.

5.12 Quick Reference Card

PolyU | ~400 noisy-clean pairs | Real noise, 4 cameras (compact, DSLR, 2× smartphone) | Non-commercial research licence | Use as: training supplement + secondary test set | Primary metrics: PSNR, SSIM (local) | Download: github.com/csjunxu/PolyU-Real-World-Noisy-Images-Dataset


Image Denoising Metrics Explained

PSNR — Peak Signal-to-Noise Ratio

PSNR is the oldest and most universal denoising metric. It is defined as:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$

where MAX is the maximum pixel value (255 or 1.0) and MSE is the mean squared error between the denoised and clean image. PSNR is fast to compute, easy to interpret (higher = better), and universally reported. Its main weakness is poor correlation with human perception — two images with the same PSNR can look dramatically different, particularly when one has ringing artefacts and the other has texture blurring.

Typical range: 25–45 dB for denoising tasks. A difference of 0.5 dB is considered meaningful; a difference of 0.1 dB is at the noise floor of evaluation variance.
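The definition above translates directly into a few lines of NumPy. This is a minimal sketch for sanity-checking library results; `psnr_db` is an illustrative name, and both arrays are assumed to be on the same [0, MAX] scale.

```python
import numpy as np

def psnr_db(clean, estimate, max_val=1.0):
    """PSNR = 10 * log10(MAX^2 / MSE) for arrays on the same [0, max_val] scale."""
    clean = np.asarray(clean, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((clean - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

clean = np.full((8, 8), 0.5)
estimate = clean + 0.1                  # constant error of 0.1, so MSE = 0.01
value = psnr_db(clean, estimate)        # 10 * log10(1 / 0.01) = 20 dB
print(f"{value:.2f} dB")
```

The same arithmetic gives a useful reference point: unclipped AWGN at σ=25 on an 8-bit scale has MSE = 25², so the noisy input itself sits at 20·log₁₀(255/25) ≈ 20.17 dB.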

SSIM — Structural Similarity Index

SSIM measures the similarity between two images in terms of luminance, contrast, and structure (higher = better). Its maximum is 1 for identical images, and for natural image pairs it falls between 0 and 1 in practice, although the formula itself can return negative values for anti-correlated structure. It correlates better with human quality perception than PSNR for artefacts such as blurring and ringing. SSIM is reported alongside PSNR in almost every denoising paper. Its main limitations are sensitivity to global brightness shifts and poor discrimination at very high quality levels (scores above 0.98 are numerically almost indistinguishable even when the images look different).
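To see how the luminance, contrast, and structure terms combine, here is a simplified single-window version of the SSIM formula. This is a sketch only: the standard metric (for example `skimage.metrics.structural_similarity`) computes the same expression over local sliding windows and averages the result, so the numbers will differ. The constants follow the usual choice C1 = (0.01·L)² and C2 = (0.03·L)².

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM formula evaluated once over the whole image (no local windows)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

ramp = np.linspace(0.0, 1.0, 100).reshape(10, 10)
print(global_ssim(ramp, ramp))        # identical images
print(global_ssim(ramp, 1.0 - ramp))  # inverted structure
```

The inverted ramp produces a negative score, which shows that the formula is bounded above by 1 but is not confined to [0, 1].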

LPIPS — Learned Perceptual Image Patch Similarity

LPIPS uses the feature space of a pre-trained deep network (typically VGG or AlexNet) to measure perceptual similarity. Lower LPIPS = more perceptually similar. LPIPS is increasingly reported in high-quality restoration papers because it captures texture fidelity and naturalness that PSNR and SSIM miss. A model can achieve +0.3 dB PSNR over a competitor while having worse LPIPS; this is the over-smoothing problem, where PSNR-optimal outputs blur fine texture.

FSIM — Feature Similarity Index

FSIM measures similarity using phase congruency (a biologically motivated edge/feature detector) and gradient magnitude. It correlates strongly with human judgements on natural images and is robust to contrast changes that confuse SSIM. Less commonly reported than PSNR/SSIM but valuable for texture-heavy datasets like Kodak24.

NIQE — Naturalness Image Quality Evaluator

NIQE is a no-reference (blind) metric — it does not require a clean reference image. It measures how much the denoised image deviates from a statistical model of natural image patches. Lower NIQE = more natural. Useful when you do not have ground truth, such as when evaluating on your own real-world photos. Its limitation is that it is not always well-correlated with denoising-specific quality.

BRISQUE — Blind/Referenceless Image Spatial Quality Evaluator

BRISQUE is another no-reference metric that uses natural scene statistics (NSS) in the spatial domain. It produces a score where lower = better quality. BRISQUE is available in OpenCV's contrib quality module and in IQA-PyTorch, making it easy to use for automated quality screening of large batches.

Which Metric for Which Task?

| Scenario | Recommended Metrics |
|---|---|
| AWGN denoising benchmark (CBSD68, Kodak24) | PSNR + SSIM |
| Real-noise benchmark (SIDD, DND, PolyU) | PSNR + SSIM (server for SIDD and DND, local for PolyU) |
| Perceptual quality comparison | LPIPS + SSIM |
| Blind denoising on unlabelled data | NIQE or BRISQUE |
| Full research paper | PSNR + SSIM + LPIPS + NIQE |
| Mobile/latency-constrained eval | PSNR only (fast) |

Comparison Table — All 5 Datasets Across 12+ Attributes

| Attribute | CBSD68 | SIDD | DND | Kodak24 | PolyU |
|---|---|---|---|---|---|
| Year | 2001 (BSD) | 2018 | 2017 | 1990s | 2018 |
| # Images / Pairs | 68 images | 320 pairs (Medium) | 50 pairs → 1,000 patches | 24 images | ~400 pairs |
| Noise Type | Synthetic AWGN | Real (smartphone) | Real (DSLR) | Synthetic AWGN | Real (multi-camera) |
| Noise Level | σ ∈ {15,25,50} | Multiple ISOs | Multiple ISOs | σ ∈ {15,25,50} | Multiple ISOs |
| Ground Truth | Yes (synthetic) | Yes (burst avg) | Server-withheld | Yes (synthetic) | Yes (burst avg) |
| Resolution | ~481×321 | Full resolution | 512×512 patches | 768×512 | Full resolution |
| Camera Types | N/A | 5 smartphones | 4 DSLRs | N/A | 4 (compact+DSLR+phone) |
| Colour Space | RGB | sRGB | sRGB | RGB | sRGB |
| Use as Train | No (test only) | Yes | No | No | Yes |
| Use as Test | Yes | Yes (server) | Yes (server) | Yes | Yes (local) |
| Licence | Research | Non-commercial | Non-commercial | Public domain | Non-commercial |
| Best For | AWGN baselines | Smartphone noise | DSLR real noise | Secondary AWGN test | Multi-camera generalisation |
| Noise Model | Additive Gaussian | Shot + read + ISP | Shot + read | Additive Gaussian | Shot + read + ISP |

How to Choose the Right Dataset

By Noise Type

You are studying Gaussian denoising algorithms (theoretical analysis, new architectures on clean formulations): use CBSD68 and Kodak24. These are the community standard and ensure your results are directly comparable to hundreds of published papers.

You are working on real-world smartphone photography (mobile app, camera app, low-light mode): use SIDD as your primary benchmark. It is the most directly representative dataset for this use case.

You need to demonstrate cross-domain generalisation: test on both SIDD and DND. A model that scores well on both real-noise benchmarks from different camera families has demonstrated meaningful generalisation.

You want multi-camera coverage or are building a camera-agnostic system: use PolyU alongside SIDD for training, and DND + SIDD for evaluation.

By Domain

Medical imaging: None of the five datasets is directly appropriate. Adapt the methodology (paired capture of static phantoms) from SIDD to your imaging modality. CBSD68-style evaluation with Gaussian noise can serve as a sanity check.

Satellite imagery: Real satellite noise (quantisation + thermal + shot) is unlike any of the five datasets. Use them for pre-training and fine-tune on domain-specific data.

Document scanning: CBSD68 and synthetic noise evaluation can approximate scanner noise; supplement with your own scanned document pairs.

By Task

Blind denoising (unknown σ): CBSD68 with σ sampled uniformly from [0, 75]. Report results at σ=15, 25, 50 for comparability.

Non-blind denoising (known σ): CBSD68, Kodak24.

Real-noise denoising (unknown noise model): SIDD, DND, PolyU.

Self-supervised / unpaired denoising: SIDD (noisy frames only) or collect your own real noisy images.

By Compute Budget

GPU-constrained (≤ 8 GB VRAM, < 24h training): Train on SIDD-Small or SIDD-Medium (320 pairs). Evaluate on CBSD68 and SIDD benchmark.

Standard research (24 GB VRAM, 2–5 day training): Train on SIDD-Medium + DnCNN standard training set. Evaluate on CBSD68, Kodak24, SIDD, DND, PolyU.

Large-scale (multi-GPU, > 1 week training): Use SIDD-Full (30,000+ pairs) + large synthetic augmentation. Restormer and NAFNet scale in this regime.


Common Denoising Models Benchmarked

Understanding the landmark models is essential for positioning your own research contribution. Here is a concise reference guide to the eight most important models across the three eras.

BM3D (2007): The gold standard non-learning baseline. Groups similar patches into 3D arrays and applies collaborative filtering. Still competitive with early CNNs and serves as the lower bound that all deep models must beat. Available in Python via the bm3d package.

DnCNN (2017, Zhang et al., IEEE TIP): The paper that established residual learning for denoising. DnCNN learns the noise residual rather than the clean image, enabling very deep networks. Its blind version (DnCNN-B) handles a range of σ values. Benchmark: CBSD68 σ=25 PSNR 31.24 dB.

FFDNet (2018, Zhang et al., IEEE TIP): Extends DnCNN by accepting a noise level map as input, making it the first practical flexible-blind denoiser. Faster than DnCNN due to downsampling before convolution. CBSD68 σ=25 PSNR 31.21 dB.
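The two FFDNet ideas mentioned above, downsampling before convolution and a noise level map as an extra input, can be sketched in NumPy. This is an illustration of the input construction only, not the authors' code: the function name, the (H, W, C) layout, and the channel ordering are assumptions, and the convolutional network that consumes this tensor is omitted.

```python
import numpy as np

def ffdnet_style_input(noisy, sigma):
    """Rearrange a (H, W, C) image into (H/2, W/2, 4C) sub-images and append a noise-level map."""
    h, w, c = noisy.shape
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    # Split every 2x2 pixel block into 4 sub-images (reversible downsampling)
    sub = noisy.reshape(h // 2, 2, w // 2, 2, c).transpose(0, 2, 1, 3, 4)
    sub = sub.reshape(h // 2, w // 2, 4 * c)
    # Uniform noise-level map on the same [0, 1] scale as the image
    level_map = np.full((h // 2, w // 2, 1), sigma / 255.0, dtype=noisy.dtype)
    return np.concatenate([sub, level_map], axis=-1)

noisy = np.random.default_rng(0).random((64, 96, 3)).astype(np.float32)
net_input = ffdnet_style_input(noisy, sigma=25)
print(net_input.shape)  # 4 * 3 image channels + 1 noise-level channel
```

Because σ is an input rather than something baked into the weights, one network can be run at different noise levels, or with a spatially varying map, by changing the last channel.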

CBDNet (2019, Guo et al., CVPR): First major blind real-noise denoiser combining a noise estimation sub-network with a denoising sub-network. Trained jointly on synthetic and real noisy data. Benchmark: SIDD 38.06 dB.

RIDNet (2019, Anwar & Barnes, ICCV): Feature Attention modules + multi-scale architecture. Strong real-noise performance. SIDD 38.71 dB, DND 39.26 dB.

SwinIR (2021, Liang et al., ICCV Workshop): One of the first transformer-based image restoration models, built on Shifted Window attention. It reshaped the benchmark table: SwinIR-DN set CBSD68 σ=25 PSNR 31.78 dB, SwinIR-Real set SIDD 39.96 dB.

Restormer (2022, Zamir et al., CVPR): Multi-Dconv Head Transposed Attention — computes attention across channels rather than spatial tokens, enabling efficient full-resolution processing. Sets or matches SOTA on every major benchmark.
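The key property of attention across channels is that the attention map is C×C, so its size does not depend on the number of pixels. The following single-head NumPy sketch illustrates that idea under simplifying assumptions: the convolutional projections that produce Q, K, and V are omitted, the temperature is a plain float rather than a learned parameter, and the function name is hypothetical.

```python
import numpy as np

def transposed_attention(q, k, v, temperature=1.0):
    """
    Single-head channel-wise attention on (C, HW) feature matrices.
    Returns the (C, HW) output and the (C, C) attention map.
    """
    # L2-normalise each channel over its spatial positions
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)
    logits = (q @ k.T) * temperature                   # (C, C)
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)      # softmax over the last axis
    return attn @ v, attn

rng = np.random.default_rng(0)
channels, pixels = 8, 64 * 64
q, k, v = (rng.standard_normal((channels, pixels)) for _ in range(3))
out, attn = transposed_attention(q, k, v)
print(out.shape, attn.shape)
```

Spatial self-attention on the same input would need a 4096×4096 map; here it is 8×8, which is what makes full-resolution processing affordable.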

NAFNet (2022, Chen et al., ECCV): Removes all non-linear activations and replaces them with a single Simple Gate. Achieves competitive or superior PSNR with faster inference than Restormer. SIDD 40.30 dB — among the highest reported.
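The Simple Gate itself is very small: split the feature map in half along the channel axis and multiply the halves element-wise. A NumPy sketch, assuming a channels-last (H, W, 2C) layout:

```python
import numpy as np

def simple_gate(x):
    """Split (H, W, 2C) features into two halves along channels and multiply them."""
    first_half, second_half = np.split(x, 2, axis=-1)
    return first_half * second_half

features = np.full((2, 2, 4), 3.0)
gated = simple_gate(features)
print(gated.shape)  # channel count is halved
```

The product of two linear feature maps is already non-linear in the input, which is how the block works without ReLU or GELU.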


How to Prepare Noisy-Clean Pairs for Training

Synthetic Noise Generation

Synthetic noise is fully controllable and, with a fixed random seed, reproducible. The standard approach for AWGN benchmarks:

import numpy as np
from PIL import Image

def generate_awgn_pairs(clean_images, sigma_range=(10, 50)):
    """
    Generate AWGN noisy-clean pairs with random sigma per image.
    clean_images: list of float32 numpy arrays in [0, 1], shape (H, W, 3)
    Returns list of (noisy, clean, sigma) tuples.
    """
    pairs = []
    for img in clean_images:
        sigma = np.random.uniform(sigma_range[0], sigma_range[1])
        noise = np.random.normal(0, sigma / 255.0, img.shape).astype(np.float32)
        noisy = np.clip(img + noise, 0, 1)
        pairs.append((noisy, img, sigma))
    return pairs

def generate_mixed_noise(clean, sigma_g=25, lambda_p=30):
    """
    Poisson-Gaussian mixed noise (closer to real sensor noise than pure AWGN).
    sigma_g: Gaussian read noise std (in [0,255] scale)
    lambda_p: Poisson noise intensity scaling factor
    """
    # Poisson component (signal-dependent)
    poisson = np.random.poisson(clean * lambda_p) / lambda_p - clean
    # Gaussian component (signal-independent)
    gaussian = np.random.normal(0, sigma_g / 255.0, clean.shape).astype(np.float32)
    return np.clip(clean + poisson + gaussian, 0, 1).astype(np.float32)

Real Noise Handling

When working with real noisy-clean pairs (SIDD, DND, PolyU), apply the following preprocessing:

  • Alignment check: Verify that noisy and clean patches are spatially aligned. Apply sub-pixel alignment correction if needed (use optical flow or phase correlation).

  • Gamma and tone mapping: Ensure both noisy and clean images are in the same colour space (sRGB or linearised). Mixing them will confuse the network.

  • Outlier filtering: Remove pairs where the PSNR between noisy and clean is below 20 dB (likely a misaligned pair) or above 50 dB (likely a duplicate or calibration error).
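The alignment check in the first bullet can be done with phase correlation. The sketch below estimates an integer translation between two single-channel images using NumPy FFTs; it assumes a pure circular shift, works on greyscale arrays (convert colour images to luminance first), and does not handle sub-pixel offsets. The function name is illustrative.

```python
import numpy as np

def estimate_shift(reference, moving):
    """Integer (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1)) matches reference."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross = f_ref * np.conj(f_mov)
    cross /= np.abs(cross) + 1e-12           # keep only the phase
    corr = np.fft.ifft2(cross).real          # peak marks the displacement
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = reference.shape
    # Convert wrapped indices to signed shifts
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(0)
reference = rng.random((32, 32))
moving = np.roll(reference, (3, -5), axis=(0, 1))   # simulate a misaligned capture
dy, dx = estimate_shift(reference, moving)
print(dy, dx)
```

A non-zero estimate on a supposedly aligned pair is a signal to re-register the pair or drop it before training.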

Patch Extraction

def extract_patches(image, patch_size=256, stride=128):
    """Extract overlapping patches from an image for training."""
    H, W, C = image.shape
    patches = []
    for y in range(0, H - patch_size + 1, stride):
        for x in range(0, W - patch_size + 1, stride):
            patch = image[y:y+patch_size, x:x+patch_size, :]
            patches.append(patch)
    return patches

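For a full training set, extract 256×256 patches from each image pair with stride 128. The yield depends on resolution: an H×W image gives (⌊(H−256)/128⌋+1) × (⌊(W−256)/128⌋+1) patches, which is several hundred for a full-resolution smartphone frame. The 320 SIDD-Medium pairs therefore yield well over 50,000 patches, sufficient for initial training.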
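To budget disk space and epoch length before extracting anything, the patch count can be computed from the image size alone. This helper mirrors the loop bounds of a sliding window that drops partial windows; the 3000×4000 frame is a hypothetical 12-megapixel example, not a documented SIDD resolution.

```python
def count_patches(height, width, patch_size=256, stride=128):
    """Number of patches produced by a sliding window that drops partial windows."""
    if height < patch_size or width < patch_size:
        return 0
    rows = (height - patch_size) // stride + 1
    cols = (width - patch_size) // stride + 1
    return rows * cols

print(count_patches(512, 768))     # Kodak-sized image
print(count_patches(3000, 4000))   # hypothetical 12-megapixel frame
```

Multiplying the per-image count by the number of pairs gives the training-set size; increase the stride or subsample at random if the total is larger than you need.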

Augmentation

Standard augmentation for denoising: random horizontal flip, random vertical flip, and random 90/180/270° rotation. Do NOT apply augmentation separately to noisy and clean — augment both identically:

import random

def augment_pair(noisy, clean):
    """Apply identical augmentation to a noisy-clean pair."""
    # Random horizontal flip
    if random.random() > 0.5:
        noisy, clean = noisy[:, ::-1, :], clean[:, ::-1, :]
    # Random vertical flip
    if random.random() > 0.5:
        noisy, clean = noisy[::-1, :, :], clean[::-1, :, :]
    # Random rotation by k × 90° (k = 0 leaves the pair unchanged)
    k = random.randint(0, 3)
    noisy = np.rot90(noisy, k)
    clean = np.rot90(clean, k)
    return noisy.copy(), clean.copy()

Research Gap Radar — 5 Open Problems

These five open problems represent genuine opportunities for publishable contributions at conferences like CVPR, ICCV, ECCV, and IEEE TIP in 2024–2026.

Gap 1 — Cross-Domain Generalisation Without Ground Truth. Current models trained on SIDD smartphones fail measurably on DSLR cameras from DND. There is no established method for adapting a model trained on one camera family to another without paired ground truth from the target camera. This is the domain adaptation problem applied to noise — and it is far from solved.

Gap 2 — Efficient Transformers for Edge Deployment. Restormer and SwinIR achieve excellent PSNR but require 2–8 GB of GPU memory at inference. Deploying these on smartphones or IoT cameras is not practical. Designing transformer-class architectures with sub-100MB memory footprint and real-time inference on ARM CPUs is an open and commercially relevant problem.

Gap 3 — Perceptual Quality vs PSNR Alignment. Most SOTA models maximise PSNR/SSIM, which leads to over-smoothed textures that look unnatural. Diffusion-based denoisers (like GDP and DDRM) produce perceptually convincing textures but score lower PSNR. Designing a model that is simultaneously competitive on both is an open challenge.

Gap 4 — Denoising in Compressed/Pre-processed Pipelines. Real-world images go through JPEG compression, ISP processing, and AI-based noise suppression before the researcher ever sees them. The noise distribution in a JPEG social media upload is fundamentally different from SIDD's carefully controlled acquisitions. A dataset and model for this "post-pipeline" noise type would be a significant contribution.

Gap 5 — Video Denoising with Temporal Consistency. Image denoising models applied frame-by-frame to video produce temporal flickering artefacts that are visually objectionable even when per-frame PSNR is high. The gap between spatial and spatio-temporal denoising remains large on challenging sequences (night video, action sports). The benchmark datasets for video denoising (DAVIS, Set8) are underdeveloped compared to image benchmarks.


Implementation Roadmap — 8-Step Week-by-Week Guide

This roadmap is designed for a Final Year / M.Tech student with 2–3 months before submission and access to a single GPU (8–24 GB).

Week 1 — Environment Setup and Baseline Reproduction. Install PyTorch 2.x, clone the BasicSR framework (see Tools section), and reproduce DnCNN results on CBSD68 (σ=25). Target: ≥31.2 dB. If you cannot reproduce the baseline, your environment has a bug. Fix it now, not in Week 6.

Week 2 — Dataset Preparation. Download CBSD68, Kodak24, SIDD-Medium, and PolyU. Write patch extraction and augmentation scripts. Compute dataset statistics (mean, std, noise power spectrum). Visualise 10 noisy-clean pairs from each dataset. Confirm alignment is correct.

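Week 3 — Baseline Training on SIDD. Train DnCNN-C or FFDNet on SIDD-Medium patches and evaluate on the SIDD benchmark. Keep the AWGN-trained Week 1 model as your CBSD68 σ=25 anchor, since a model trained only on real noise is not comparable on synthetic benchmarks. Record both sets of baseline numbers. These are your comparison anchors.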

Week 4 — Implement Your Proposed Component. This is where your contribution lives. Examples: a new attention mechanism, a frequency-domain branch, a noise level estimation module, a new loss function. Implement in isolation and test it on a toy problem first.

Week 5 — Integrate and Train. Integrate your component into the baseline architecture. Train from scratch (or fine-tune from the baseline checkpoint). Evaluate on all five datasets. Expect the first run to underperform the baseline — this is normal.

Week 6 — Ablation Studies. Remove or modify your key component and measure the PSNR drop. A good ablation answers: "what specifically does this component contribute?" Run 3–5 variants.

Week 7 — Submit to SIDD and DND Benchmarks. Generate submission files for SIDD and DND servers. Submit and record official numbers. These are the numbers that go in your paper — they are more credible than locally computed scores.

Week 8 — Paper Writing. Structure your paper as: Introduction → Related Work → Method → Experiments → Conclusion. For the experiments section, report results in the standard table format: dataset × model × σ × PSNR / SSIM. Include visual comparisons on at least three challenging patches.


Tools and Frameworks

BasicSR: The most widely used PyTorch framework for image restoration. Supports DnCNN, FFDNet, RealESRGAN, ESRGAN, SwinIR, EDVR, and more. Well-documented, modular, and easy to extend for custom architectures. Repository: https://github.com/XPixelGroup/BasicSR

MMagic (formerly MMEditing): OpenMMLab's image and video editing framework. Broader scope than BasicSR but slightly more complex configuration. Useful for video denoising and multi-task restoration. Repository: https://github.com/open-mmlab/mmagic

OpenCV: The essential image I/O and preprocessing library. Use for image reading, resizing, colour space conversion (BGR ↔ RGB ↔ YCbCr), and visualisation. Available via pip install opencv-python.

scikit-image: Python library for image processing with clean APIs for PSNR, SSIM, and patch extraction. Use skimage.metrics.peak_signal_noise_ratio and skimage.metrics.structural_similarity. Available via pip install scikit-image.

IQA-PyTorch: A comprehensive PyTorch library for image quality metrics including PSNR, SSIM, LPIPS, FSIM, NIQE, BRISQUE, and more. The cleanest unified interface for metric computation in research code. Repository: https://github.com/chaofengc/IQA-PyTorch. Install: pip install pyiqa.


7 Common Mistakes Researchers Make

Mistake 1 — Training and Testing on Overlapping Data. The most damaging error. If any BSD68 images appear in your training set and you also evaluate on CBSD68, your test numbers are invalid. Always verify splits before training.

Mistake 2 — Not Fixing Random Seeds for Noise Generation. If you generate different AWGN realisations for each evaluation run, your PSNR numbers will vary by ±0.05 dB across runs, making ablation comparisons unreliable. Fix np.random.seed(0) at the start of every evaluation script.

Mistake 3 — Computing PSNR in the Wrong Colour Space. Some papers compute PSNR on the Y channel (luminance) of YCbCr, others on RGB. Y-channel PSNR is typically 0.5–1.0 dB higher than RGB PSNR for the same image. Always state which channel/colour space you use.

Mistake 4 — Ignoring Border Effects. Models that use padding can produce artefacts at image borders. Some evaluation protocols crop a border of 4–8 pixels before computing PSNR. If you do not match the protocol of the papers you compare to, your numbers are not comparable.

Mistake 5 — Reporting Only One Metric. PSNR alone is insufficient for a 2024 research paper. Always report SSIM alongside PSNR. For real-noise datasets, adding LPIPS strengthens the paper significantly.

Mistake 6 — Not Normalising Correctly. Models trained on [0, 255] and evaluated on [0, 1] produce meaningless PSNR numbers. Verify data ranges at every step: data loading, model input, model output, and metric computation.

Mistake 7 — Evaluating a Model Trained on Real Noise on Synthetic Benchmarks Without Retraining. A model trained purely on SIDD real noise will perform poorly on CBSD68 AWGN because the noise distribution is completely different. Either train separate models or train on mixed synthetic + real data and acknowledge this in your paper.
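Mistakes 3, 4, and 6 can all be guarded against by making the evaluation protocol explicit in code. The sketch below is one way to do that, not a standard API: `protocol_psnr` and `rgb_to_luma` are illustrative names, and the luma conversion uses full-range BT.601 weights, which differs from the MATLAB-style `rgb2ycbcr` scaling that some papers use.

```python
import numpy as np

def rgb_to_luma(img):
    """Full-range BT.601 luma from an RGB float image in [0, 1], shape (H, W, 3)."""
    return img @ np.array([0.299, 0.587, 0.114])

def protocol_psnr(clean, estimate, border=0, y_only=False):
    """PSNR with a data-range check, optional border crop, and optional Y-channel mode."""
    clean = np.asarray(clean, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if clean.shape != estimate.shape:
        raise ValueError("clean and estimate must have the same shape")
    # Mistake 6: refuse data that looks like it is still on a [0, 255] scale
    if clean.max() > 1.0 or estimate.max() > 1.5:
        raise ValueError("expected images scaled to [0, 1]")
    # Mistake 4: crop the same border as the papers you compare against
    if border > 0:
        clean = clean[border:-border, border:-border]
        estimate = estimate[border:-border, border:-border]
    # Mistake 3: make the colour space an explicit, reported choice
    if y_only:
        clean, estimate = rgb_to_luma(clean), rgb_to_luma(estimate)
    mse = np.mean((clean - estimate) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(1.0 / mse))

clean = np.full((16, 16, 3), 0.5)
estimate = clean.copy()
estimate[0, :, :] += 0.2   # error confined to the top border row
print(protocol_psnr(clean, estimate), protocol_psnr(clean, estimate, border=4))
```

Logging the `border` and `y_only` settings next to every reported number makes it much easier to compare against published tables later.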


Your Next Steps + Conclusion

A Practical Action Plan

You have now read a complete guide to the five most important image denoising datasets in the field. Here is how to turn that knowledge into research output.

If you are a Final Year B.Tech student: Start with CBSD68. Download it, write the noise-adding script from Section 1.8, run BM3D as a baseline, then train DnCNN from BasicSR. Getting your first PSNR number that beats BM3D on CBSD68 σ=25 is the milestone that tells you your pipeline works.

If you are an M.Tech student with a 6-month timeline: Follow the 8-week Implementation Roadmap above. Aim to submit results to both SIDD and DND benchmarks, since these server-validated numbers significantly strengthen your thesis.

If you are a PhD student: Your contribution should engage with at least one of the five open problems in the Research Gap Radar. Whether it is cross-domain generalisation, perceptual quality, or video denoising, pick the gap that aligns with your lab's existing expertise and propose a principled solution with rigorous empirical validation across multiple datasets.

What You Should Take Away

Image denoising is one of the few computer vision problems where the community has reached remarkable benchmark consensus. Reporting results on CBSD68, SIDD, and DND is a universally understood signal — reviewers know exactly what the numbers mean and what they should be. This consensus is a gift; use it by following the standard protocols precisely.

At the same time, do not let benchmark chasing substitute for genuine scientific insight. The most cited denoising papers — DnCNN, SwinIR, Restormer — succeeded not because they squeezed out 0.1 dB of PSNR, but because they introduced architectural ideas (residual learning, shifted window attention, transposed attention) that changed how the entire community thinks about the problem.

The datasets reviewed here will remain relevant for years. The models that beat them will come from researchers who understand the data deeply, not just the leaderboard numbers.


Further Reading and Resources

  • DnCNN Paper: Zhang et al., "Beyond a Gaussian Denoiser," IEEE TIP 2017.

  • SIDD Paper: Abdelhamed et al., "A High-Quality Denoising Dataset for Smartphone Cameras," CVPR 2018.

  • DND Paper: Plötz & Roth, "Benchmarking Denoising Algorithms with Real Photographs," CVPR 2017.

  • Restormer Paper: Zamir et al., "Restormer: Efficient Transformer for High-Resolution Image Restoration," CVPR 2022.

  • NAFNet Paper: Chen et al., "Simple Baselines for Image Restoration," ECCV 2022.

  • BasicSR Documentation: https://basicsr.readthedocs.io

  • IQA-PyTorch Documentation: https://iqa-pytorch.readthedocs.io