Top 5 Image Dehazing Datasets Every Computer Vision Researcher Must Know

A Complete Guide to Hazy-Clean Paired Datasets, Haze Types, Metrics, Models, and Implementation — With Final Year Project Angles for Researchers, PhD, M.Tech, and Final Year Students

AI enthusiast and academic researcher with a focus on deep learning, computer vision, and NLP. I write about IEEE-aligned project ideas, model architectures, and practical AI implementation guides for final year engineering students. Helping students bridge the gap between research papers and real-world code.

Who is this for? Final year B.Tech/M.Tech students building a dehazing project, PhD researchers benchmarking new architectures, and CV practitioners who need to understand which dataset to trust and why. Every section is written to save you the 40+ hours of scattered paper-reading that most researchers go through before picking a dataset.

Introduction

What is Image Dehazing?

Image dehazing is the process of recovering a clear, haze-free image J from a hazy observation I, where atmospheric scattering has degraded contrast, colour fidelity, and visibility. The classical physical model that governs this degradation is the Atmospheric Scattering Model (ASM):

$$I(x) = J(x) \cdot t(x) + A \cdot (1 - t(x))$$

where:

  • I(x) is the observed hazy image at pixel x

  • J(x) is the scene radiance (the clean image we want to recover)

  • t(x) is the transmission map — the fraction of light that reaches the camera without being scattered

  • A is the global atmospheric light (the colour of the haze, typically a bright greyish-white)

The transmission map is related to scene depth d(x) and the atmospheric scattering coefficient β by:

$$t(x) = e^{-\beta \cdot d(x)}$$

This means that distant objects (large d) have very low transmission — they are almost completely obscured by haze — while nearby objects retain most of their original appearance. Dehazing algorithms work by estimating t(x) and A from I(x) alone, then inverting the ASM to recover J(x).
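The forward model and its inversion can be sketched in a few lines of NumPy. This is a minimal illustration, not a dehazing method: it assumes t(x) and A are already known, which real algorithms must estimate. The function names, the toy depth map, and the `t_min` floor are assumptions introduced for the example.

```python
import numpy as np

def apply_asm(clean, depth, beta, A):
    """Synthesise a hazy image: I = J * t + A * (1 - t), with t = exp(-beta * d)."""
    t = np.exp(-beta * depth)[..., None]   # (H, W, 1) so it broadcasts over RGB
    hazy = clean * t + A * (1.0 - t)
    return hazy, t

def invert_asm(hazy, t, A, t_min=0.1):
    """Recover J = (I - A) / max(t, t_min) + A.
    The floor t_min is an assumed safeguard against dividing by near-zero t."""
    t = np.maximum(t, t_min)
    return np.clip((hazy - A) / t + A, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((4, 6, 3))                      # toy scene radiance in [0, 1]
depth = np.linspace(1.0, 10.0, 24).reshape(4, 6)   # toy depth map (assumed units)
hazy, t = apply_asm(clean, depth, beta=0.1, A=0.9)
restored = invert_asm(hazy, t, A=0.9)

print("max reconstruction error:", np.abs(restored - clean).max())
```

With the true t(x) and A the inversion is exact up to floating-point error. The difficulty of dehazing lies entirely in estimating those two quantities from I(x).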

Three Eras of Dehazing Research

Prior-based era (2000s–2014): Methods like Dark Channel Prior (DCP) by He et al. (CVPR 2009, TPAMI 2011) exploited the statistical observation that in most haze-free image patches, at least one colour channel has very low intensity. DCP-based dehazing became the foundational baseline that all subsequent methods compare against.
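The statistic behind DCP is easy to compute. The sketch below shows only the dark channel itself, not the full DCP pipeline (atmospheric light estimation and transmission refinement are omitted). The 15-pixel patch size, the edge padding, and the toy images are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(image, patch_size=15):
    """Per-pixel minimum over RGB, followed by a minimum over a local patch."""
    min_rgb = image.min(axis=2)
    pad = patch_size // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    windows = sliding_window_view(padded, (patch_size, patch_size))
    return windows.min(axis=(2, 3))

rng = np.random.default_rng(0)
clean = rng.random((64, 64, 3))
clean[..., 2] *= 0.05                      # toy scene: one channel is always dark
hazy = clean * 0.4 + 0.9 * (1.0 - 0.4)     # uniform haze with t = 0.4, A = 0.9

print("clean dark channel mean:", dark_channel(clean).mean())
print("hazy dark channel mean: ", dark_channel(hazy).mean())
```

In the toy scene the dark channel of the clean image stays near zero, while haze lifts it towards A · (1 − t). That lift is the cue DCP uses to estimate transmission.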

Deep learning era (2016–2020): CNNs such as DehazeNet, MSCNN, AOD-Net, and GFN learned end-to-end mappings from hazy to clean images, dramatically outperforming prior-based methods on benchmark datasets. These methods were trained and evaluated almost exclusively on synthetic haze.

Transformer and physics-guided era (2020–present): Models like FFA-Net, MAXIM, DehazeFormer, and Dehamer use attention mechanisms, multi-scale feature fusion, and explicit physical priors simultaneously. The critical challenge of this era is the synthetic-to-real gap: models trained on synthetic haze often fail on real outdoor haze despite achieving high PSNR on benchmarks.

Why Datasets Matter in Dehazing

Dehazing has a unique dataset challenge that other restoration tasks do not share: obtaining ground-truth clean images for real hazy scenes is extremely difficult. You cannot photograph the same scene on a clear day and a hazy day and simply compare them — the lighting, time of day, and seasonal variation will have changed. This fundamental difficulty has driven the community to develop multiple creative dataset collection strategies, each with its own trade-offs.

Understanding which dataset uses which strategy — and what limitations that introduces — is essential for correctly interpreting benchmark results and designing your own research.

How This Article is Structured

Each of the five dataset sections follows the same 13-subsection template: overview, origin, haze characteristics, image statistics, download/access, metadata block, licence, how researchers use it, code to load it, reported state-of-the-art numbers, known limitations, research angles, and a quick-reference summary card. After the datasets, you get metrics, model benchmarks, data preparation recipes, a research gap radar, a week-by-week implementation roadmap, and the tooling ecosystem.


What Makes a Great Image Dehazing Dataset?

1. Haze Type Coverage

Haze in the real world is not a single phenomenon. Homogeneous haze is uniform across the image — a simple fog where density does not vary much spatially. Heterogeneous (non-homogeneous) haze varies spatially — thick patches next to thin patches, common in morning mist and industrial smog. Dense haze almost completely obscures distant scene content with transmission values near zero. A great dataset should clearly document which type it contains, because models trained on homogeneous synthetic haze fail dramatically on non-homogeneous real haze.
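The difference between homogeneous and non-homogeneous haze can be made concrete by comparing transmission maps. The sketch below is a toy construction, not how any dataset in this guide was produced: it lets the scattering coefficient vary per pixel by adding a few Gaussian "puffs" of dense haze to a thin background. The helper name `smooth_beta_map` and all its parameters are hypothetical.

```python
import numpy as np

def smooth_beta_map(shape, n_puffs=3, base=0.02, peak=0.4, sigma=12.0, seed=0):
    """Toy spatially varying scattering coefficient: a thin background
    plus a few Gaussian puffs of dense haze at random positions."""
    rng = np.random.default_rng(seed)
    H, W = shape
    yy, xx = np.mgrid[0:H, 0:W]
    beta = np.full(shape, base, dtype=np.float64)
    for _ in range(n_puffs):
        cy, cx = rng.uniform(0, H), rng.uniform(0, W)
        beta += peak * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return beta

shape = (64, 64)
depth = np.full(shape, 10.0)   # constant depth isolates the effect of beta

t_homogeneous = np.exp(-0.1 * depth)                         # one beta everywhere
t_nonhomogeneous = np.exp(-smooth_beta_map(shape) * depth)   # beta varies per pixel

print("homogeneous t std:      ", t_homogeneous.std())
print("non-homogeneous t std:  ", t_nonhomogeneous.std())
print("non-homogeneous t range:", t_nonhomogeneous.min(), t_nonhomogeneous.max())
```

At constant depth the homogeneous map is flat, while the non-homogeneous map ranges from nearly opaque to nearly clear within one image. A model that has only seen the first kind has no reason to handle the second.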

2. Synthetic vs Real Haze

Synthetic haze is generated by applying the ASM to clean images using depth maps, yielding perfectly aligned hazy-clean pairs with precisely known A and t(x). Synthetic data enables controlled training and full-reference evaluation but misses the spectral, spatial, and dynamic complexity of real outdoor haze.

Real haze is physically present in the scene at capture time, whether it is natural fog or haze produced by a machine. Getting clean ground truth requires one of three approaches: using a haze machine in a controlled setting, indoors (I-Haze) or outdoors (O-Haze); capturing the scene before and after natural haze (challenging outdoors); or accepting that no ground truth is available (unpaired real-haze datasets like RTTS).

3. Indoor vs Outdoor Scenes

Indoor dehazing (I-Haze) and outdoor dehazing (O-Haze) involve fundamentally different illumination conditions, depth ranges, and haze density profiles. A model trained only on indoor data will fail outdoors because the depth range — and therefore the transmission variation across the image — is completely different.

4. Image Diversity and Scale

A dataset with 10 images is suitable only for evaluation, not training. A dataset with 10,000 image pairs covers a range of textures, depths, and haze densities sufficient for training robust models. Diversity also means coverage across time of day, season, and weather type.

5. Licence and Accessibility

The dehazing community is smaller than the denoising community and some datasets are less formally licensed. Always check whether a dataset allows commercial use, requires citation, or is restricted to academic research before building a product or open-source tool.


Dataset 1 — RESIDE

1.1 Overview

RESIDE (REalistic Single Image DEhazing) is a large-scale dehazing benchmark and the most widely used standard for dehazing evaluation. Published by Li et al. in IEEE TIP 2019, RESIDE combines synthetic indoor and outdoor hazy images at large scale with a carefully curated real-world evaluation set. If you have read any deep learning dehazing paper since 2018, you have almost certainly seen PSNR and SSIM numbers on RESIDE's SOTS (Synthetic Objective Testing Set) subset.

1.2 Origin and History

RESIDE was created by researchers at Hefei University of Technology and the University of Maryland. The authors recognised that existing dehazing datasets were either too small (a few dozen images) or evaluated only on synthetic data, creating a gap between benchmark performance and real-world results. RESIDE was designed to bridge this gap by providing multiple subsets covering different haze types, densities, and evaluation protocols.

The dataset was introduced alongside the RESIDE benchmark challenge and has been updated in multiple versions. RESIDE-V0 (the original) and RESIDE-6K (a curated 6,000-pair training subset) are the most commonly used variants in recent papers.

1.3 Haze Characteristics

RESIDE uses synthetic haze generated by applying the ASM to depth maps:

  • Indoor subset (ITS — Indoor Training Set): 13,990 hazy images generated from 1,399 clean indoor images using synthetic depth maps and multiple A and β values per image.

  • Outdoor subset (OTS — Outdoor Training Set): 313,950 hazy images from clean outdoor images with synthetic haze. The scale of OTS is unique in the dehazing field.

  • SOTS (Synthetic Objective Testing Set): 500 indoor + 500 outdoor test images with ground truth. This is the standard evaluation split.

  • HSTS (Hybrid Subjective Testing Set): 10 synthetic + 10 real hazy images for perceptual evaluation without ground truth on the real subset.

  • RTTS (Real-world Task-driven Testing Set): 4,322 real hazy images without ground truth, for qualitative evaluation only.

Haze parameters: atmospheric light A ∈ [0.7, 1.0] (uniform RGB), scattering coefficient β ∈ [0.04, 0.20] for outdoor, β ∈ [0.6, 1.8] for indoor.
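The way one clean image becomes several hazy training images can be sketched as follows. The parameter ranges are the ones quoted above; uniform sampling, the function names, and the toy depth map are assumptions made for illustration and may differ from the procedure used to build RESIDE. The count of 10 variants per image is simply what the ITS figures imply (13,990 / 1,399).

```python
import numpy as np

def sample_haze_params(rng, setting="indoor"):
    """Draw (A, beta) from the ranges quoted above (uniform sampling assumed)."""
    beta_range = (0.6, 1.8) if setting == "indoor" else (0.04, 0.20)
    A = rng.uniform(0.7, 1.0)
    beta = rng.uniform(*beta_range)
    return A, beta

def make_hazy_variants(clean, depth, n_variants, rng, setting="indoor"):
    """Apply the ASM to one clean image with several sampled (A, beta) pairs."""
    variants = []
    for _ in range(n_variants):
        A, beta = sample_haze_params(rng, setting)
        t = np.exp(-beta * depth)[..., None]
        variants.append(clean * t + A * (1.0 - t))
    return variants

rng = np.random.default_rng(0)
clean = rng.random((8, 8, 3))
depth = rng.uniform(0.5, 3.0, size=(8, 8))   # toy depth map, units assumed
variants = make_hazy_variants(clean, depth, n_variants=10, rng=rng)

# 10 variants per clean image reproduces the ITS total: 1,399 * 10 = 13,990
print(len(variants), 1399 * len(variants))
```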

1.4 Image Statistics

| Attribute | Value |
|---|---|
| Indoor training pairs (ITS) | 13,990 hazy + 1,399 clean |
| Outdoor training pairs (OTS) | 313,950 hazy + 8,970 clean |
| SOTS test pairs | 500 indoor + 500 outdoor |
| RTTS real images (no GT) | 4,322 |
| Resolution | Varies: 460×620 (indoor), up to 1024×1024 (outdoor) |
| Haze type | Synthetic homogeneous |
| Depth maps | Included for ITS |
| Colour space | RGB |

1.5 Download and Access

1.6 Dataset Metadata

| Field | Detail |
|---|---|
| Official Download | https://sites.google.com/view/reside-dehaze-datasets/ |
| Published | 2018 (RESIDE v0), updated 2019 (IEEE TIP paper) |
| License | Research / Non-commercial use |
| Authors | Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, Zhangyang Wang |
| File size | ITS: ~3 GB |
| Citation | Li et al., "Benchmarking Single-Image Dehazing and Beyond," IEEE TIP 2019 |

1.7 Licence

RESIDE is released for non-commercial research and education use only. All publications using RESIDE must cite the original IEEE TIP paper. Commercial use requires written permission from the dataset creators.

1.8 How Researchers Use RESIDE

Standard training protocol: Train on ITS (indoor) or OTS (outdoor) or both. Most recent papers use ITS for indoor evaluation and OTS for outdoor evaluation separately, since models trained on indoor data do not generalise well to outdoor haze due to different depth ranges.

Standard evaluation protocol: Report PSNR and SSIM on SOTS-Indoor (500 pairs) and SOTS-Outdoor (500 pairs) separately. Always specify which SOTS subset you are using — "SOTS" alone is ambiguous.

Qualitative evaluation: Run inference on RTTS images and include visual comparisons in your paper. Since RTTS has no ground truth, only qualitative assessment is possible.

1.9 Code to Load RESIDE

import os
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class RESIDEDataset(Dataset):
    """
    Dataset loader for RESIDE ITS or SOTS.
    
    Directory structure expected:
      root/
        hazy/   <- hazy images (named e.g. 1_1.png, 1_2.png for ITS)
        clear/  <- clean images (named e.g. 1.png for ITS)
    
    For ITS, multiple hazy images correspond to one clean image.
    The naming convention is {clean_id}_{haze_param_id}.png
    """
    def __init__(self, root, mode='its', transform=None):
        self.root = root
        self.transform = transform
        self.hazy_dir = os.path.join(root, 'hazy')
        self.clean_dir = os.path.join(root, 'clear')
        
        self.hazy_files = sorted(os.listdir(self.hazy_dir))
        
        if mode == 'its':
            # For ITS: extract clean image ID from hazy filename
            self.clean_map = {
                h: h.split('_')[0] + '.png' 
                for h in self.hazy_files
            }
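        else:
            # For SOTS: prefer an identically named clean file (1-to-1 layouts);
            # otherwise fall back to the {clean_id}_* prefix, as for ITS
            clean_files = set(os.listdir(self.clean_dir))
            self.clean_map = {
                h: h if h in clean_files else h.split('_')[0] + '.png'
                for h in self.hazy_files
            }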
    
    def __len__(self):
        return len(self.hazy_files)
    
    def __getitem__(self, idx):
        hazy_name = self.hazy_files[idx]
        clean_name = self.clean_map[hazy_name]
        
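        # Force 3-channel RGB so grayscale or RGBA files do not break the transpose below
        hazy = np.array(
            Image.open(os.path.join(self.hazy_dir, hazy_name)).convert('RGB')
        ).astype(np.float32) / 255.0
        
        clean = np.array(
            Image.open(os.path.join(self.clean_dir, clean_name)).convert('RGB')
        ).astype(np.float32) / 255.0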
        
        if self.transform:
            hazy, clean = self.transform(hazy, clean)
        
        # Convert to (C, H, W) for PyTorch
        hazy = hazy.transpose(2, 0, 1)
        clean = clean.transpose(2, 0, 1)
        
        return hazy, clean

def compute_psnr_ssim_reside(model, sots_root, split='indoor'):
    """Evaluate a dehazing model on RESIDE SOTS."""
    import torch
    from skimage.metrics import peak_signal_noise_ratio as psnr
    from skimage.metrics import structural_similarity as ssim
    
    dataset = RESIDEDataset(
        os.path.join(sots_root, split), 
        mode='sots'
    )
    psnr_list, ssim_list = [], []
    
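    for hazy_np, clean_np in dataset:
        # The dataset yields NumPy arrays in (C, H, W); convert to a tensor
        # and add a batch axis (assumes the model runs on the CPU)
        hazy_t = torch.from_numpy(hazy_np).unsqueeze(0)
        with torch.no_grad():
            output = model(hazy_t).squeeze(0)
        
        # Back to (H, W, C) for scikit-image
        output_np = output.permute(1, 2, 0).cpu().numpy().clip(0, 1)
        clean_np  = clean_np.transpose(1, 2, 0)
        
        psnr_list.append(psnr(clean_np, output_np, data_range=1.0))
        # channel_axis supersedes the deprecated multichannel flag (scikit-image 0.19+)
        ssim_list.append(
            ssim(clean_np, output_np, data_range=1.0, channel_axis=2)
        )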
    
    print(f"RESIDE SOTS-{split} | "
          f"PSNR: {np.mean(psnr_list):.2f} dB | "
          f"SSIM: {np.mean(ssim_list):.4f}")
    return np.mean(psnr_list), np.mean(ssim_list)

1.10 State-of-the-Art Numbers on RESIDE SOTS

SOTS-Indoor:

| Model | Year | PSNR (dB) | SSIM |
|---|---|---|---|
| DCP | 2011 | 16.62 | 0.818 |
| DehazeNet | 2016 | 21.14 | 0.847 |
| AOD-Net | 2017 | 20.29 | 0.877 |
| MSBDN | 2020 | 33.67 | 0.985 |
| FFA-Net | 2020 | 36.39 | 0.989 |
| MAXIM | 2022 | 38.11 | 0.991 |
| DehazeFormer-B | 2023 | 40.05 | 0.994 |

SOTS-Outdoor:

| Model | Year | PSNR (dB) | SSIM |
|---|---|---|---|
| DCP | 2011 | 19.13 | 0.815 |
| AOD-Net | 2017 | 24.14 | 0.920 |
| GDN | 2019 | 30.86 | 0.982 |
| MSBDN | 2020 | 33.48 | 0.982 |
| FFA-Net | 2020 | 33.57 | 0.984 |
| DehazeFormer-B | 2023 | 34.81 | 0.986 |

1.11 Known Limitations

  • Synthetic haze only (ITS/OTS/SOTS). The ASM-based synthetic haze is spatially homogeneous and does not capture real-world haze complexity. Models achieving 40+ dB on SOTS-Indoor can look visually poor on real outdoor photos.

  • Depth map quality for ITS. Indoor depth maps from datasets like NYU Depth V2 used in RESIDE generation have noise and errors that introduce artefacts in synthetic haze.

  • Scale imbalance. OTS is 300K+ images; most research groups train on ITS only due to compute constraints, meaning OTS is underutilised.

  • No heterogeneous haze. All RESIDE synthetic haze is generated with a single global A per image — real haze has spatially varying A, which this dataset does not capture.

1.12 Research Angles for Final Year / PhD Students

  • Synthetic-to-real domain adaptation: Train on RESIDE-ITS and evaluate on O-Haze or NH-Haze. Quantify the domain gap and propose an adaptation method.

  • Depth-aware dehazing: Use the depth maps included with RESIDE-ITS to design a depth-conditioned architecture that varies its processing by scene distance.

  • RTTS as an unsupervised training signal: Apply Noise2Noise-style or contrastive learning using RTTS real hazy images without ground truth.

  • Lightweight dehazing on OTS: Most papers train on ITS; use the full OTS scale to train a lightweight model and show that scale compensates for reduced model capacity.

1.13 Quick Reference Card

RESIDE | 13,990 indoor + 313,950 outdoor training pairs | SOTS: 1,000 test pairs | Synthetic homogeneous haze | Non-commercial research licence | Use as: training + standard benchmark | Primary metrics: PSNR, SSIM on SOTS | Download: sites.google.com/view/reside-dehaze-datasets/


Dataset 2 — O-Haze

2.1 Overview

O-Haze (Outdoor Haze dataset) is the first dataset to provide real outdoor hazy-clean image pairs captured using a professional haze machine. Released by Ancuti et al. at CVPR Workshop 2018, O-Haze addresses the fundamental limitation of purely synthetic benchmarks by providing genuine optical haze — the same atmospheric scattering physics that occurs in real foggy weather, reproduced in a controlled outdoor setting. It is one of the NTIRE 2018 and NTIRE 2019 challenge datasets, giving it significant community visibility.

2.2 Origin and History

O-Haze was created at the Multimedia Lab of Hasselt University, Belgium. The collection methodology was carefully designed: 45 outdoor scenes were photographed both with and without haze produced by a professional haze machine positioned to fill the scene. The haze machine produces water-droplet-based aerosol that mimics the optical properties of natural outdoor haze and fog. This approach yields perfectly aligned hazy-clean pairs — the camera is fixed, only the haze presence changes between the two captures.

O-Haze was one of the first datasets to allow researchers to quantitatively evaluate real-haze dehazing with full-reference metrics, filling a critical gap between synthetic benchmarks and purely qualitative real-world testing.

2.3 Haze Characteristics

The haze in O-Haze is real, generated optically with a professional haze/fog machine:

  • Haze type: Dense, relatively homogeneous fog produced by atomised water particles.

  • Spatial variation: Some spatial variation exists due to wind, outdoor air movement, and distance from the haze machine — making it more realistic than synthetic homogeneous haze.

  • Depth correlation: Distant objects are more obscured than near objects, consistent with the ASM.

  • Colour shift: Real haze introduces a whitish colour cast that varies slightly from image to image depending on ambient lighting and haze density — unlike the constant A used in synthetic datasets.

  • Haze density: Generally dense — visibility is significantly reduced in most image pairs.

2.4 Image Statistics

| Attribute | Value |
|---|---|
| Total image pairs | 45 (hazy + clean) |
| Training pairs | 35 |
| Validation pairs | 5 |
| Test pairs (withheld for challenge) | 5 |
| Resolution | 4964×3312 pixels (original, ~16 MP) |
| Standard evaluation resolution | Resized to 512×512 or 1024×1024 |
| Haze type | Real (haze machine), outdoor |
| Scene content | Gardens, streets, paths, vegetation |
| Colour space | RGB |

2.5 Download and Access

2.6 Dataset Metadata

| Field | Detail |
|---|---|
| Official Download | https://data.vision.ee.ethz.ch/cvl/ntire18//o-haze/ |
| Published | April 2018 (CVPR Workshop) |
| License | Non-commercial research use |
| Authors | Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte, Christophe De Vleeschouwer |
| File size | ~2.1 GB |
| Citation | Ancuti et al., "O-HAZE: A Dehazing Benchmark with Real Hazy and Haze-Free Outdoor Images," CVPRW 2018 |

2.7 Licence

O-Haze is available for non-commercial research and education use. Citation of the CVPR Workshop 2018 paper is required. The dataset was released as part of the NTIRE 2018 challenge and remains hosted by ETH Zurich's Computer Vision Lab.

2.8 How Researchers Use O-Haze

O-Haze is used almost exclusively as a test set due to its small size (45 pairs). The standard protocol is:

  1. Train on RESIDE-ITS or OTS (synthetic).

  2. Fine-tune or directly evaluate on O-Haze to assess real-haze generalisation.

  3. Report PSNR/SSIM on the 35-pair training set (some papers) or the 5-pair validation set.

  4. Submit to the NTIRE challenge evaluation server for official test numbers.

Papers that do well on RESIDE-SOTS but poorly on O-Haze are demonstrating the synthetic-to-real gap — and this pattern is common enough that O-Haze has become essential for any paper claiming real-world applicability.

2.9 Code to Load O-Haze

import os
import numpy as np
from PIL import Image
from glob import glob
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim

def load_ohaze_pairs(ohaze_root, split='train', target_size=(512, 512)):
    """
    Load O-Haze hazy-clean pairs.
    
    Expected structure:
      ohaze_root/
        hazy/   <- hazy images (e.g. 01_outdoor_hazy.jpg)
        GT/     <- clean ground truth (e.g. 01_outdoor_GT.jpg)
    """
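    hazy_files = sorted(glob(os.path.join(ohaze_root, 'hazy', '*.jpg')))
    gt_files   = sorted(glob(os.path.join(ohaze_root, 'GT', '*.jpg')))
    
    # Pairs are matched by sorted order, so the two folders must line up
    assert len(hazy_files) == len(gt_files), \
        f"Mismatch: {len(hazy_files)} hazy vs {len(gt_files)} clean"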
    
    pairs = []
    for hf, gf in zip(hazy_files, gt_files):
        hazy  = Image.open(hf).convert('RGB')
        clean = Image.open(gf).convert('RGB')
        
        if target_size:
            hazy  = hazy.resize(target_size, Image.LANCZOS)
            clean = clean.resize(target_size, Image.LANCZOS)
        
        hazy_np  = np.array(hazy).astype(np.float32) / 255.0
        clean_np = np.array(clean).astype(np.float32) / 255.0
        pairs.append((hazy_np, clean_np))
    
    print(f"Loaded {len(pairs)} O-Haze pairs at {target_size}")
    return pairs

def evaluate_ohaze(model_fn, pairs):
    """Evaluate a dehazing model on O-Haze pairs."""
    psnr_list, ssim_list = [], []
    for hazy, clean in pairs:
        dehazed = model_fn(hazy)
        p = psnr(clean, dehazed, data_range=1.0)
        # channel_axis supersedes the deprecated multichannel flag (scikit-image 0.19+)
        s = ssim(clean, dehazed, data_range=1.0, channel_axis=2)
        psnr_list.append(p)
        ssim_list.append(s)
    print(f"O-Haze | PSNR: {np.mean(psnr_list):.2f} dB | "
          f"SSIM: {np.mean(ssim_list):.4f}")
    return np.mean(psnr_list), np.mean(ssim_list)

2.10 State-of-the-Art Numbers on O-Haze

| Model | Year | PSNR (dB) | SSIM |
|---|---|---|---|
| DCP | 2011 | 15.78 | 0.561 |
| MSCNN | 2016 | 17.56 | 0.810 |
| AOD-Net | 2017 | 15.03 | 0.527 |
| GFN | 2018 | 21.55 | 0.844 |
| EPDN | 2019 | 22.57 | 0.863 |
| FFA-Net | 2020 | 22.12 | 0.858 |
| AECR-Net | 2021 | 23.43 | 0.879 |
| DehazeFormer-B | 2023 | 24.16 | 0.891 |

O-Haze PSNR values are substantially lower than RESIDE-SOTS values — this is expected due to real haze complexity. Do not compare absolute numbers across datasets.

2.11 Known Limitations

  • Only 45 pairs. Extremely small. Cannot be used for training from scratch — fine-tuning only.

  • Haze machine ≠ natural haze. The water-droplet aerosol from a haze machine has different particle size distribution and spectral properties than natural outdoor haze (combustion particles, dust, humidity). Models trained on O-Haze may not generalise to natural foggy weather.

  • Controlled outdoor setting. All 45 scenes were photographed in the same garden/outdoor facility. Scene diversity is very limited.

  • No depth information. Unlike RESIDE, O-Haze provides no auxiliary depth maps.

  • Resolution mismatch. The 16 MP originals are almost always resized to 512×512 or 1024×1024 for training, introducing rescaling artefacts.

2.12 Research Angles for Final Year / PhD Students

  • Domain gap measurement: Train a model on RESIDE-OTS and evaluate zero-shot on O-Haze. Report PSNR gap and identify which scene elements (distant vegetation, sky, shadows) suffer most.

  • Fine-tuning with limited real data: Demonstrate that fine-tuning on just 10 O-Haze pairs (from the 35 training pairs) improves generalisation. Study optimal fine-tuning strategies.

  • Perceptual quality on real haze: Show that PSNR-optimal dehazing on O-Haze over-saturates colours or produces halo artefacts around edges. Design a perceptual loss to fix this.

  • Multi-scale evaluation: Evaluate models at original O-Haze resolution (4964×3312) vs. resized resolution. Show the resolution degradation effect on haze removal quality.

2.13 Quick Reference Card

O-Haze | 45 real outdoor pairs | Real optical haze (haze machine) | Non-commercial research licence | Use as: fine-tuning + real-haze evaluation | Primary metrics: PSNR, SSIM | Download: data.vision.ee.ethz.ch/cvl/ntire18//o-haze/


Dataset 3 — I-Haze

3.1 Overview

I-Haze (Indoor Haze dataset) is the indoor counterpart to O-Haze, released by the same Hasselt University team at CVPR Workshop 2018. I-Haze provides real hazy-clean image pairs captured in an indoor environment using a professional haze machine. It is the primary benchmark for indoor dehazing evaluation and was also used in the NTIRE 2018 and 2019 challenge tracks.

3.2 Origin and History

I-Haze was motivated by the observation that indoor haze and smog (from cooking, industrial ventilation failures, fires, and cigarette smoke) are a significant real-world problem distinct from outdoor haze. The collection followed the same haze-machine methodology as O-Haze, with the machine filling a controlled indoor space to produce dense, near-uniform haze across different scenes and objects.

The dataset is notable for providing both a training split and a withheld test split evaluated via the NTIRE challenge server, enabling reproducible benchmark comparison with server-validated scores.

3.3 Haze Characteristics

The haze in I-Haze is real, optically generated indoor haze:

  • Haze distribution: More spatially uniform than O-Haze because indoor environments have less air movement. This brings I-Haze closer than O-Haze to the homogeneous-haze assumption behind the ASM.

  • Depth range: Shorter depth range (3–8 metres typical for indoor scenes) compared to outdoor. Transmission values are therefore generally higher than outdoor datasets — the haze is less extreme at typical indoor distances.

  • Colour temperature: Indoor scenes have mixed lighting (artificial + natural window light), creating complex interactions with the haze colour.

  • Haze density: Dense enough to significantly reduce contrast and produce visible whitish colour cast, but typically not as severe as Dense-Haze.
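The effect of depth range on transmission follows directly from t = e^(−β·d). The worked numbers below use a single assumed scattering coefficient (β = 0.1 per metre, chosen only for illustration; it is not a measured value for either dataset) to compare indoor distances with outdoor ones.

```python
import math

def transmission(beta, depth_m):
    """Fraction of scene light that survives a path of depth_m metres."""
    return math.exp(-beta * depth_m)

beta = 0.1  # assumed scattering coefficient per metre, for illustration only
for d in (3, 8, 50, 200):
    print(f"d = {d:>3} m -> t = {transmission(beta, d):.3f}")
# d =   3 m -> t = 0.741
# d =   8 m -> t = 0.449
# d =  50 m -> t = 0.007
# d = 200 m -> t = 0.000
```

At the same haze density, objects at 3 to 8 metres keep roughly half to three quarters of their signal, while objects tens of metres away are almost entirely replaced by atmospheric light.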

3.4 Image Statistics

| Attribute | Value |
|---|---|
| Total image pairs | 35 (hazy + clean) |
| Training pairs | 25 |
| Validation pairs | 5 |
| Test pairs (withheld) | 5 |
| Resolution | 2833×4256 pixels (original, ~12 MP) |
| Standard evaluation resolution | Resized to 512×512 or 1024×1024 |
| Haze type | Real (haze machine), indoor |
| Scene content | Indoor objects, furniture, artwork, household items |
| Colour space | RGB |

3.5 Download and Access

3.6 Dataset Metadata

| Field | Detail |
|---|---|
| Official Download | https://data.vision.ee.ethz.ch/cvl/ntire18//i-haze/ |
| Published | April 2018 (CVPR Workshop) |
| License | Non-commercial research use |
| Authors | Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte, Christophe De Vleeschouwer |
| File size | ~1.4 GB |
| Citation | Ancuti et al., "I-HAZE: A Dehazing Benchmark with Real Hazy and Haze-Free Indoor Images," CVPRW 2018 |

3.7 Licence

I-Haze is available for non-commercial research and education use. Citation of the CVPR Workshop 2018 paper is required. Hosted by ETH Zurich's Computer Vision Lab.

3.8 How Researchers Use I-Haze

I-Haze is primarily used as an indoor real-haze evaluation set. The standard protocol mirrors O-Haze:

  1. Train on RESIDE-ITS (synthetic indoor data).

  2. Evaluate on I-Haze training/validation pairs to measure synthetic-to-real indoor gap.

  3. Papers reporting both I-Haze and O-Haze results demonstrate generalisation across environments.

I-Haze is commonly paired with O-Haze in result tables, with papers reporting both to show indoor/outdoor coverage. A model that excels on RESIDE-ITS but scores poorly on I-Haze is relying on synthetic haze artefacts that do not generalise.

3.9 Code to Load I-Haze

import os
import numpy as np
from PIL import Image
from glob import glob

def load_ihaze_pairs(ihaze_root, target_size=(512, 512)):
    """
    Load I-Haze hazy-clean pairs.
    
    Expected structure:
      ihaze_root/
        hazy/   <- hazy images (e.g. 01_indoor_hazy.jpg)
        GT/     <- clean ground truth (e.g. 01_indoor_GT.jpg)
    """
    hazy_files = sorted(glob(os.path.join(ihaze_root, 'hazy', '*.jpg')))
    gt_files   = sorted(glob(os.path.join(ihaze_root, 'GT', '*.jpg')))
    
    assert len(hazy_files) == len(gt_files), \
        f"Mismatch: {len(hazy_files)} hazy vs {len(gt_files)} clean"
    
    pairs = []
    for hf, gf in zip(hazy_files, gt_files):
        hazy  = np.array(
            Image.open(hf).convert('RGB').resize(
                target_size, Image.LANCZOS
            )
        ).astype(np.float32) / 255.0
        clean = np.array(
            Image.open(gf).convert('RGB').resize(
                target_size, Image.LANCZOS
            )
        ).astype(np.float32) / 255.0
        pairs.append((hazy, clean))
    
    print(f"Loaded {len(pairs)} I-Haze pairs")
    return pairs

def patch_based_dehazing_eval(model_fn, pairs, patch_size=256, stride=256):
    """
    Evaluate model using patch-based inference (for memory-limited GPUs).
    Useful when evaluating at full resolution on I-Haze's large originals.
    """
    from skimage.metrics import peak_signal_noise_ratio as psnr
    from skimage.metrics import structural_similarity as ssim
    import numpy as np
    
    psnr_list, ssim_list = [], []
    for hazy, clean in pairs:
        H, W, _ = hazy.shape
        output = np.zeros_like(hazy)
        count  = np.zeros((H, W, 1))
        
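        # Patch origins; append a final origin flush with the border so the
        # edges are covered when H or W is not a multiple of the stride
        # (assumes the image is at least patch_size in both dimensions)
        ys = list(range(0, H - patch_size + 1, stride))
        xs = list(range(0, W - patch_size + 1, stride))
        if ys[-1] != H - patch_size:
            ys.append(H - patch_size)
        if xs[-1] != W - patch_size:
            xs.append(W - patch_size)
        
        for y in ys:
            for x in xs:
                patch = hazy[y:y+patch_size, x:x+patch_size]
                dehazed_patch = model_fn(patch)
                output[y:y+patch_size, x:x+patch_size] += dehazed_patch
                count[y:y+patch_size, x:x+patch_size] += 1
        
        # Average overlapping patches
        output = (output / count.clip(min=1)).clip(0, 1)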
        psnr_list.append(psnr(clean, output, data_range=1.0))
        # channel_axis supersedes the deprecated multichannel flag (scikit-image 0.19+)
        ssim_list.append(ssim(clean, output, data_range=1.0,
                               channel_axis=2))
    
    print(f"I-Haze patch eval | PSNR: {np.mean(psnr_list):.2f} dB | "
          f"SSIM: {np.mean(ssim_list):.4f}")
    return psnr_list, ssim_list

3.10 State-of-the-Art Numbers on I-Haze

| Model | Year | PSNR (dB) | SSIM |
|---|---|---|---|
| DCP | 2011 | 14.43 | 0.754 |
| MSCNN | 2016 | 15.22 | 0.785 |
| AOD-Net | 2017 | 16.72 | 0.820 |
| GFN | 2018 | 22.30 | 0.880 |
| EPDN | 2019 | 22.83 | 0.888 |
| FFA-Net | 2020 | 23.75 | 0.912 |
| AECR-Net | 2021 | 24.02 | 0.915 |
| DehazeFormer-B | 2023 | 25.14 | 0.927 |

3.11 Known Limitations

  • Only 35 pairs — the smallest dataset in this guide. Variance on 5 test pairs is extremely high; a 0.5 dB difference may not be statistically meaningful.

  • Single indoor setting. All 35 scenes are from the same laboratory space. Models fine-tuned on I-Haze may overfit to that specific room's characteristics.

  • Haze machine uniformity bias. Indoor air movement is minimal, making I-Haze haze even more spatially uniform than real indoor smoke or smog events.

  • No depth information and no metadata about haze density or A/β values.

3.12 Research Angles for Final Year / PhD Students

  • Mixed indoor/outdoor dehazing: Train jointly on RESIDE-ITS + I-Haze + O-Haze pairs and study whether joint training improves generalisation compared to separate models.

  • Haze density estimation from image cues: Design a sub-network that estimates haze density from the image and conditions the dehazing process accordingly — evaluate on I-Haze vs Dense-Haze.

  • Colour correction post-dehazing: Real haze introduces colour casts that synthetic-trained models often leave uncorrected. Design a post-processing colour calibration step evaluated on I-Haze.

  • Benchmark instability study: Evaluate 5 different models on I-Haze's 5 validation images and compute the variance of the PSNR estimate. Show that I-Haze is too small for reliable benchmark comparison and propose a bootstrap confidence interval reporting method.
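The bootstrap reporting idea in the last bullet can be sketched as below. The per-image PSNR values are hypothetical numbers invented for the example, not measured results, and the percentile bootstrap is one reasonable choice among several interval methods.

```python
import numpy as np

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-image scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=np.float64)
    # Resample image indices with replacement, one row per bootstrap replicate
    idx = rng.integers(0, len(scores), size=(n_resamples, len(scores)))
    means = scores[idx].mean(axis=1)
    low, high = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), low, high

# Hypothetical per-image PSNR values (dB) for two models on 5 validation images
psnr_model_a = [21.3, 24.8, 19.6, 26.1, 22.9]
psnr_model_b = [21.9, 25.0, 20.4, 26.3, 23.6]

for name, scores in (("A", psnr_model_a), ("B", psnr_model_b)):
    mean, low, high = bootstrap_ci(scores)
    print(f"model {name}: mean {mean:.2f} dB, 95% CI [{low:.2f}, {high:.2f}]")
```

In this made-up example the two means differ by 0.5 dB, yet the intervals overlap heavily because image-to-image variation dominates with only five samples. Reporting the interval alongside the mean makes that visible to readers.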

3.13 Quick Reference Card

I-Haze | 35 real indoor pairs | Real optical haze (haze machine) | Non-commercial research licence | Use as: indoor real-haze evaluation | Primary metrics: PSNR, SSIM | Download: data.vision.ee.ethz.ch/cvl/ntire18//i-haze/


Dataset 4 — NH-Haze

4.1 Overview

NH-Haze (Non-Homogeneous Haze dataset) is the first dataset to explicitly address spatially non-uniform haze — the most challenging and realistic form of haze present in real-world outdoor photography. Released by Ancuti et al. at CVPR Workshop 2020, NH-Haze was motivated by the recognition that all previous real-haze datasets (including O-Haze and I-Haze) still produced relatively uniform haze because the haze machine was operated at constant intensity. NH-Haze deliberately introduces spatial variation in haze density across each image.

4.2 Origin and History

NH-Haze was also created at Hasselt University, building directly on the O-Haze and I-Haze experience. The key innovation in data collection was operating the haze machine in a dynamic, non-uniform mode — varying the output intensity and position during capture to produce heterogeneous haze distributions. Additionally, NH-Haze was captured in different outdoor locations and across different times of day, introducing more scene and lighting diversity than O-Haze.

NH-Haze was the dehazing challenge dataset for NTIRE 2020 at CVPR, and NH-Haze 2 was released for NTIRE 2021. Both versions are commonly referenced as NH-Haze in papers.

4.3 Haze Characteristics

NH-Haze haze is real, non-homogeneous optical haze:

  • Non-uniform density: Haze thickness varies spatially across the image — some regions may have near-zero transmission while adjacent regions are relatively clear.

  • No global atmospheric light assumption: Because haze density varies, the standard ASM with a single global A does not hold. This is a direct challenge to DCP-based and most CNN-based dehazers that assume homogeneous haze.

  • Scene depth correlation preserved: Despite non-homogeneity, distant objects are still more obscured than near objects on average.

  • Natural-looking haze: The visual appearance is significantly more realistic than O-Haze, more closely resembling morning mist over varied terrain or industrial haze near sources.

4.4 Image Statistics

| Attribute | Value |
|---|---|
| NH-Haze v1 pairs | 55 (hazy + clean) |
| Training pairs (v1) | 45 |
| Validation pairs (v1) | 5 |
| Test pairs (v1, withheld) | 5 |
| NH-Haze v2 additional pairs | 25 |
| Resolution | ~5000×3000 pixels (original) |
| Standard evaluation resolution | 1600×1200 or 512×512 |
| Haze type | Real, non-homogeneous outdoor |
| Scene content | Diverse outdoor: forests, streets, buildings, fields |
| Colour space | RGB |

4.5 Download and Access

4.6 Dataset Metadata

| Field | Detail |
|---|---|
| Official Download | https://data.vision.ee.ethz.ch/cvl/ntire20/nh-haze/ |
| Published | June 2020 (CVPRW) |
| License | Non-commercial research use |
| Authors | Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte |
| File size | ~3.8 GB (v1) |
| Citation | Ancuti et al., "NH-HAZE: An Image Dehazing Benchmark with Non-Homogeneous Hazy and Haze-Free Images," CVPRW 2020 |

4.7 Licence

NH-Haze is available for non-commercial research and education use. Citation of the CVPR Workshop 2020 paper is required. Hosted by ETH Zurich.

4.8 How Researchers Use NH-Haze

NH-Haze serves two key roles:

As a hard real-haze test set: A model trained on RESIDE and evaluated on O-Haze/I-Haze may still be fooled by the relatively uniform haze in those datasets. NH-Haze is the stress test — models that maintain strong performance here have genuinely learned to handle spatially varying haze.

As a training set for non-homogeneous dehazing: With 45 training pairs, NH-Haze is borderline usable for fine-tuning (not full training). Most papers use it only for evaluation, but some recent works include it in multi-dataset training protocols alongside RESIDE.

4.9 Code to Load NH-Haze

import os
import numpy as np
from PIL import Image
from glob import glob

def load_nhhaze_pairs(nhhaze_root, version=1, target_size=None):
    """
    Load NH-Haze hazy-clean pairs.
    
    Expected structure:
      nhhaze_root/
        hazy/   <- non-homogeneous hazy images
        GT/     <- clean ground truth
    """
    hazy_files = sorted(glob(os.path.join(nhhaze_root, 'hazy', '*.png')))
    if not hazy_files:
        hazy_files = sorted(glob(os.path.join(nhhaze_root, 'hazy', '*.jpg')))
    
    gt_files = sorted(glob(os.path.join(nhhaze_root, 'GT', '*.png')))
    if not gt_files:
        gt_files = sorted(glob(os.path.join(nhhaze_root, 'GT', '*.jpg')))
    
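    # zip() would silently mis-pair or drop images if the folders differ in size
    if len(hazy_files) != len(gt_files):
        raise ValueError(f"Found {len(hazy_files)} hazy but {len(gt_files)} GT images")
    
    pairs = []
    for hf, gf in zip(hazy_files, gt_files):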
        hazy_img  = Image.open(hf).convert('RGB')
        clean_img = Image.open(gf).convert('RGB')
        
        if target_size:
            hazy_img  = hazy_img.resize(target_size, Image.LANCZOS)
            clean_img = clean_img.resize(target_size, Image.LANCZOS)
        
        pairs.append((
            np.array(hazy_img).astype(np.float32) / 255.0,
            np.array(clean_img).astype(np.float32) / 255.0
        ))
    
    print(f"Loaded {len(pairs)} NH-Haze v{version} pairs")
    return pairs

def visualise_transmission_map(hazy, clean, save_path=None):
    """
    Estimate and visualise an approximate transmission map for NH-Haze analysis.
    Uses a simplified, pixel-wise DCP-style estimate (no patch minimum, no
    refinement) for visualisation only. Low values indicate dense haze.
    """
    import matplotlib.pyplot as plt
    
    # Rough atmospheric light: per-channel 99.9th percentile of the hazy image
    A = np.percentile(hazy, 99.9, axis=(0,1))
    # DCP-style transmission: t = 1 - min_c(I_c / A_c), clipped to [0, 1]
    t_approx = 1.0 - np.min(hazy / A.clip(min=1e-6), axis=2)
    t_approx = t_approx.clip(0, 1)
    
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    axes[0].imshow(hazy); axes[0].set_title('Hazy input')
    axes[1].imshow(clean); axes[1].set_title('Clean GT')
    axes[2].imshow(t_approx, cmap='jet'); axes[2].set_title('Approx. transmission (low = dense haze)')
    for ax in axes: ax.axis('off')
    
    if save_path:
        plt.savefig(save_path, dpi=150, bbox_inches='tight')
    plt.show()

4.10 State-of-the-Art Numbers on NH-Haze

| Model | Year | PSNR (dB) | SSIM |
|---|---|---|---|
| DCP | 2011 | 10.57 | 0.521 |
| AOD-Net | 2017 | 15.40 | 0.651 |
| GDN | 2019 | 13.80 | 0.520 |
| FFA-Net | 2020 | 19.87 | 0.692 |
| AECR-Net | 2021 | 19.88 | 0.720 |
| DehazeFormer-B | 2023 | 20.66 | 0.748 |
| MB-TaylorFormer | 2023 | 21.08 | 0.762 |

NH-Haze PSNR values are noticeably lower than O-Haze and I-Haze — non-homogeneous haze is genuinely harder to remove.

4.11 Known Limitations

  • 55 pairs total — still a small dataset. Same high-variance benchmark concerns as O-Haze and I-Haze.

  • Controlled non-homogeneity. The haze machine variability is not the same as natural spatially varying haze from moving air masses, so there may still be a gap to fully natural conditions.

  • No depth maps or metadata. No auxiliary information about haze distribution.

  • Evaluation split inconsistency. Some papers evaluate on 45 training images (having seen the data); others use only the 5 validation images. This inconsistency makes cross-paper comparison unreliable — always check which split a paper uses.

4.12 Research Angles for Final Year / PhD Students

  • Non-homogeneous haze modelling: Extend the ASM to a spatially varying A(x) model and design a network that explicitly estimates A(x) as a spatial map rather than a scalar.

  • Unsupervised segmentation of haze regions: Use NH-Haze's spatial variation to develop an attention map that identifies thick vs thin haze regions and applies different processing.

  • NH-Haze + RESIDE joint training study: Does training on NH-Haze's 45 pairs jointly with RESIDE-OTS improve PSNR on NH-Haze vs RESIDE-only? Quantify the benefit of real heterogeneous data.

  • Transformer attention visualisation on NH-Haze: Show that transformer attention heads in SwinIR-style models attend differently to high-haze vs low-haze regions. Use NH-Haze's spatial variation as a natural probe.
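One way to prototype the spatially varying model in the first bullet is to replace the scalar A and β of the ASM with smooth maps A(x) and β(x), giving I(x) = J(x) · t(x) + A(x) · (1 − t(x)) with t(x) = e^(−β(x) · d(x)). The sketch below illustrates that assumption only: the helper names `smooth_field` and `synthesise_nonhomogeneous_haze`, the 4×4 coarse grid, and the parameter ranges are hypothetical choices, and the output is synthetic haze, not a reproduction of how NH-Haze was captured.

```python
import numpy as np

def smooth_field(h, w, grid=4, low=0.0, high=1.0, rng=None):
    """Smooth random map: a coarse random grid linearly interpolated to (h, w)."""
    rng = np.random.default_rng() if rng is None else rng
    coarse = rng.uniform(low, high, size=(grid, grid))
    xs = np.linspace(0, grid - 1, w)
    ys = np.linspace(0, grid - 1, h)
    knots = np.arange(grid)
    # Interpolate along the width first, then along the height
    rows = np.stack([np.interp(xs, knots, r) for r in coarse])            # (grid, w)
    field = np.stack([np.interp(ys, knots, rows[:, j]) for j in range(w)],
                     axis=1)                                              # (h, w)
    return field.astype(np.float32)

def synthesise_nonhomogeneous_haze(clean, depth, beta_range=(0.2, 2.0),
                                   A_range=(0.7, 1.0), seed=0):
    """Apply the ASM with spatially varying beta(x) and A(x) (illustrative only)."""
    rng = np.random.default_rng(seed)
    h, w, _ = clean.shape
    beta_map = smooth_field(h, w, low=beta_range[0], high=beta_range[1], rng=rng)
    A_map = smooth_field(h, w, low=A_range[0], high=A_range[1], rng=rng)[..., None]
    t_map = np.exp(-beta_map * depth)[..., None]      # t(x) = exp(-beta(x) * d(x))
    hazy = clean * t_map + A_map * (1.0 - t_map)      # I = J*t + A(x)*(1 - t)
    return np.clip(hazy, 0, 1).astype(np.float32), t_map[..., 0], A_map[..., 0]

# Random stand-ins for a clean image and a depth map normalised to [0, 1]
rng = np.random.default_rng(1)
clean = rng.uniform(0, 1, size=(32, 48, 3)).astype(np.float32)
depth = rng.uniform(0, 1, size=(32, 48)).astype(np.float32)
hazy, t_map, A_map = synthesise_nonhomogeneous_haze(clean, depth)
print(hazy.shape, float(t_map.min()), float(t_map.max()))
```

A network that predicts A(x) as a map can then be supervised directly against `A_map`, which is not possible with the real NH-Haze pairs because they ship without such annotations.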

4.13 Quick Reference Card

NH-Haze | 55 real outdoor pairs (v1) | Real non-homogeneous optical haze | Non-commercial research licence | Use as: hard real-haze benchmark | Primary metrics: PSNR, SSIM | Download: data.vision.ee.ethz.ch/cvl/ntire20/nh-haze/


Dataset 5 — Dense-Haze

5.1 Overview

Dense-Haze is a real-image dehazing dataset specifically designed for the most extreme haze conditions — where visibility is severely reduced and the clean scene is almost entirely obscured. Released by Ancuti et al. at ICIP 2019, Dense-Haze challenges dehazing models in a regime where standard methods collapse and was used as the NTIRE 2019 challenge dataset. It represents the hardest evaluation point in the standard dehazing benchmark suite.

5.2 Origin and History

Dense-Haze was created at Hasselt University as a direct response to feedback from the O-Haze and I-Haze challenges: that those datasets, while real, did not capture the severity of haze in extreme visibility conditions such as thick fog, dust storms, or smoke-filled environments. The dataset was collected with the haze machine operating at maximum intensity and positioned to produce the densest possible haze while still allowing meaningful differences between hazy and clean images.

Dense-Haze was the primary evaluation dataset for the NTIRE 2019 Challenge on Image Dehazing, giving it significant exposure and making it a standard inclusion in any paper claiming state-of-the-art on challenging real haze conditions.

5.3 Haze Characteristics

Dense-Haze haze is real, extremely dense optical haze:

  • Transmission values: Very low throughout the image — many regions have t(x) < 0.1, meaning less than 10% of the original scene light reaches the camera. This is beyond the range most ASM-based methods handle.

  • Colour destruction: Dense haze effectively destroys colour information in distant and mid-range regions, producing an almost uniformly white-grey image in severe cases.

  • Scene content recovery difficulty: Even with ground truth available, recovering fine textures and colours from densely hazy images is extremely challenging — the information is genuinely lost due to photon scattering.

  • Spatial structure: Despite extreme density, some spatial variation exists — objects very close to the camera are slightly more visible than background elements.
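To see why t(x) < 0.1 is so damaging, note that inverting the ASM gives J(x) = (I(x) − A) / t(x) + A, so any error in the hazy image is multiplied by 1/t(x). The following sketch assumes A and t are known exactly and uses random data as a stand-in for a real image; it shows how 8-bit quantisation error alone grows as transmission falls. The value A = 0.9 and the function name `invert_asm` are illustrative assumptions.

```python
import numpy as np

def invert_asm(hazy, A, t):
    """Recover J from I = J*t + A*(1 - t), assuming A and t are known exactly."""
    return (hazy - A) / t + A

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(64, 64, 3))
A = 0.9  # assumed global atmospheric light

rmse_by_t = {}
for t in (0.8, 0.3, 0.1):
    hazy = clean * t + A * (1.0 - t)
    hazy_8bit = np.round(hazy * 255.0) / 255.0   # simulate 8-bit storage
    recovered = invert_asm(hazy_8bit, A, t)
    rmse_by_t[t] = float(np.sqrt(np.mean((recovered - clean) ** 2)))
    print(f"t = {t:.1f} | RMSE after inversion: {rmse_by_t[t]:.4f}")
```

The RMSE grows roughly in proportion to 1/t, so at t = 0.1 the quantisation error is about ten times larger than at t = 1, before any error in estimating A or t is taken into account.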

5.4 Image Statistics

| Attribute | Value |
|---|---|
| Total image pairs | 55 (hazy + clean) |
| Training pairs | 45 |
| Validation pairs | 5 |
| Test pairs (withheld) | 5 |
| Resolution | ~1600×1200 pixels |
| Haze type | Real, extremely dense outdoor haze |
| Scene content | Outdoor: varying locations, trees, structures |
| Colour space | RGB |

5.5 Download and Access

bash

# Example: download Dense-Haze using wget after accepting terms on the official site
# Replace the URL with the actual download link provided after acceptance
wget -O dense_haze.zip "https://data.vision.ee.ethz.ch/cvl/ntire19/dense-haze/<token>/DenseHaze.zip"
unzip dense_haze.zip -d ./Dense-Haze/

5.6 Dataset Metadata

| Field | Detail |
|---|---|
| Official Download | https://data.vision.ee.ethz.ch/cvl/ntire19/dense-haze/ |
| Published | September 2019 (ICIP 2019) |
| License | Non-commercial research use |
| Authors | Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte, Luc Van Gool |
| File size | ~600 MB |
| Citation | Ancuti et al., "Dense-Haze: A Benchmark for Image Dehazing with Dense-Haze and Haze-Free Images," ICIP 2019 |

5.7 Licence

Dense-Haze is available for non-commercial research and education use. Citation of the ICIP 2019 paper is required. Hosted by ETH Zurich's Computer Vision Lab alongside O-Haze, I-Haze, and NH-Haze.

5.8 How Researchers Use Dense-Haze

Dense-Haze is used primarily as an extreme-conditions test set. Because it is the hardest benchmark in the standard suite, it clearly differentiates architectures:

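  • A model that achieves >20 dB on Dense-Haze with good SSIM would be genuinely handling severe haze; none of the results listed in Section 5.10 reach this level.

  • Models that achieve 15–17 dB on Dense-Haze, which is where current methods sit, are barely recovering structure.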

  • DCP and classic prior-based methods effectively fail (producing artefact-heavy results that may have lower PSNR than even the hazy input on some pairs).

The standard five-dataset evaluation protocol in competitive papers is: RESIDE-SOTS Indoor/Outdoor + O-Haze + I-Haze + NH-Haze + Dense-Haze. This suite covers synthetic, real-mild, real-moderate, real-non-homogeneous, and real-dense conditions comprehensively.

5.9 Code to Load Dense-Haze

import os
import numpy as np
from PIL import Image
from glob import glob
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim

def load_densehaze_pairs(densehaze_root, target_size=(512, 512)):
    """
    Load Dense-Haze hazy-clean pairs.
    
    Expected structure:
      densehaze_root/
        hazy/   <- densely hazy images
        GT/     <- clean ground truth
    """
    hazy_files = sorted(glob(os.path.join(densehaze_root, 'hazy', '*.png')))
    if not hazy_files:
        hazy_files = sorted(glob(os.path.join(densehaze_root, 'hazy', '*.jpg')))
    gt_files = sorted(glob(os.path.join(densehaze_root, 'GT', '*.png')))
    if not gt_files:
        gt_files = sorted(glob(os.path.join(densehaze_root, 'GT', '*.jpg')))
    
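    # zip() would silently mis-pair or drop images if the folders differ in size
    if len(hazy_files) != len(gt_files):
        raise ValueError(f"Found {len(hazy_files)} hazy but {len(gt_files)} GT images")
    
    pairs = []
    for hf, gf in zip(hazy_files, gt_files):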
        hazy  = Image.open(hf).convert('RGB')
        clean = Image.open(gf).convert('RGB')
        if target_size:
            hazy  = hazy.resize(target_size, Image.LANCZOS)
            clean = clean.resize(target_size, Image.LANCZOS)
        pairs.append((
            np.array(hazy).astype(np.float32) / 255.0,
            np.array(clean).astype(np.float32) / 255.0
        ))
    
    return pairs

def evaluate_densehaze_with_input_baseline(model_fn, pairs):
    """
    Evaluate dehazing model AND compute hazy-input PSNR for Dense-Haze.
    On very dense haze, some models perform WORSE than just passing
    the hazy image through — this comparison is informative.
    """
    model_psnr, input_psnr, ssim_list = [], [], []
    for hazy, clean in pairs:
        dehazed = model_fn(hazy)
        model_psnr.append(psnr(clean, dehazed, data_range=1.0))
        input_psnr.append(psnr(clean, hazy, data_range=1.0))
        # channel_axis replaces the deprecated multichannel flag (scikit-image >= 0.19)
        ssim_list.append(ssim(clean, dehazed, data_range=1.0,
                               channel_axis=2))
    
    print(f"Dense-Haze | Model PSNR: {np.mean(model_psnr):.2f} dB | "
          f"Input PSNR: {np.mean(input_psnr):.2f} dB | "
          f"Gain: {np.mean(model_psnr)-np.mean(input_psnr):+.2f} dB | "
          f"SSIM: {np.mean(ssim_list):.4f}")
    return model_psnr, input_psnr, ssim_list

5.10 State-of-the-Art Numbers on Dense-Haze

| Model | Year | PSNR (dB) | SSIM |
|---|---|---|---|
| DCP | 2011 | 10.06 | 0.382 |
| AOD-Net | 2017 | 13.14 | 0.414 |
| DCPDN | 2018 | 13.66 | 0.432 |
| EPDN | 2019 | 16.15 | 0.519 |
| FFA-Net | 2020 | 14.39 | 0.452 |
| MSBDN | 2020 | 15.37 | 0.491 |
| AECR-Net | 2021 | 15.80 | 0.466 |
| DehazeFormer-B | 2023 | 16.62 | 0.560 |

Note: FFA-Net's PSNR drops relative to MSBDN on Dense-Haze despite being superior on RESIDE. This illustrates that ranking changes between datasets — always report multiple benchmarks.

5.11 Known Limitations

  • Ground truth recoverability ceiling. When haze is this dense, the scene information is physically lost in the captured image. Even perfect dehazing is bounded by information theory — no algorithm can recover detail that was not captured by the sensor.

  • Only 55 pairs — same small-dataset variance concerns as the other Hasselt University datasets.

  • Constant haze density within images. Dense-Haze, despite being "dense," is still relatively spatially uniform — it does not combine extreme density with spatial variation (the way NH-Haze does at lower density).

  • Evaluation instability: Tiny evaluation sets mean that results on Dense-Haze have very wide confidence intervals. A 0.5 dB difference on 5 test images is not statistically meaningful.
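The point about five test images can be made concrete with an exact paired sign-flip test on per-image PSNR differences. The sketch below is one possible way to check significance, not a protocol prescribed by the dataset authors; the function name `paired_sign_flip_pvalue` and both score lists are hypothetical.

```python
from itertools import product

def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided sign-flip (paired permutation) test on per-image differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    # Under the null hypothesis each difference is equally likely to be + or -
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical per-image PSNR (dB) for two models on a 5-image test split.
# Model A wins on every image by roughly 0.5 dB.
model_a = [16.9, 15.4, 17.2, 14.8, 16.1]
model_b = [16.4, 14.9, 16.6, 14.4, 15.5]
p_value = paired_sign_flip_pvalue(model_a, model_b)
print(f"p-value: {p_value:.4f}")
```

With five images there are only 2^5 = 32 sign patterns, so the smallest achievable two-sided p-value is 2/32 = 0.0625. Even when one model wins on every image, the result cannot reach the conventional 0.05 threshold.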

5.12 Research Angles for Final Year / PhD Students

  • Information-theoretic limits of dehazing: Compute the maximum recoverable PSNR for Dense-Haze images given the transmission map. Show that the gap between current SOTA (~16 dB) and this theoretical maximum explains why further gains are fundamentally limited.

  • Diffusion models for dense dehazing: Standard regression-based models saturate around 16–17 dB on Dense-Haze. Score-based diffusion models can hallucinate plausible scene content from partial information — evaluate whether diffusion-based dehazing improves perceptual quality on Dense-Haze even when PSNR does not improve.

  • Multi-image fusion for dense haze: If multiple frames of the same scene are available (e.g., video), use temporal aggregation to improve Dense-Haze recovery. Even slight camera motion provides new viewing angles that contain different scene information.

  • Dense-haze pre-processing for downstream tasks: Evaluate whether Dense-Haze dehazing improves performance on downstream tasks like detection or segmentation on the dehazed images, even when the PSNR numbers seem mediocre.

5.13 Quick Reference Card

Dense-Haze | 55 real outdoor pairs | Real extreme-density optical haze | Non-commercial research licence | Use as: extreme-condition benchmark | Primary metrics: PSNR, SSIM | Download: data.vision.ee.ethz.ch/cvl/ntire19/dense-haze/


Image Dehazing Metrics Explained

PSNR — Peak Signal-to-Noise Ratio

PSNR measures the ratio of maximum signal power to distortion power in decibels. In dehazing, it compares the dehazed image against the clean ground truth:

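$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$

where MAX is the largest possible pixel value (1.0 for images normalised to [0, 1], 255 for 8-bit images) and MSE is the mean squared error between the dehazed image and the ground truth.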

Higher PSNR = better dehazing. Typical ranges: 30–40+ dB on RESIDE-SOTS (synthetic), 20–25 dB on O-Haze/I-Haze (real mild), 19–21 dB on NH-Haze (real non-homogeneous), 14–17 dB on Dense-Haze (extreme). Never compare absolute PSNR across datasets — the haze difficulty fundamentally changes the achievable score.
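As a quick sanity check on the formula, here is a minimal NumPy sketch. The function name `psnr_db` is illustrative; in practice a library implementation such as the scikit-image one used earlier in this guide is preferable.

```python
import numpy as np

def psnr_db(reference, estimate, max_val=1.0):
    """PSNR in dB between a ground-truth image and a dehazed estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

clean = np.zeros((4, 4, 3))
dehazed = np.full((4, 4, 3), 0.1)   # uniform error of 0.1, so MSE = 0.01
print(f"{psnr_db(clean, dehazed):.2f} dB")  # 10 * log10(1 / 0.01) = 20.00 dB
```

Because the score is logarithmic in MSE, each 10 dB step corresponds to a tenfold change in mean squared error.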

SSIM — Structural Similarity Index

SSIM evaluates luminance, contrast, and structural similarity simultaneously, producing a score that in practice lies between 0 and 1 (higher = better; the index can dip below 0 for strongly anti-correlated patches). In dehazing, SSIM is particularly informative because it detects halo artefacts and contrast errors that PSNR tolerates. A model with high PSNR but low SSIM is typically over-brightening the image globally without restoring structural detail.
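For reference, the standard single-window form of the index for two patches u and v (symbols chosen here to avoid clashing with the pixel position x used in the ASM) is:

$$\mathrm{SSIM}(u, v) = \frac{(2\mu_u \mu_v + C_1)(2\sigma_{uv} + C_2)}{(\mu_u^2 + \mu_v^2 + C_1)(\sigma_u^2 + \sigma_v^2 + C_2)}$$

where μ and σ² are the patch means and variances, σ_uv is the covariance, and C₁ = (K₁L)², C₂ = (K₂L)² are small stabilising constants with L the dynamic range of the pixel values (commonly K₁ = 0.01 and K₂ = 0.03). The reported score is the mean of this quantity over sliding local windows, which is why a global brightness shift that barely moves MSE can still lower SSIM through the luminance term.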

LPIPS — Learned Perceptual Image Patch Similarity

LPIPS uses deep network feature distances to measure perceptual similarity (lower = better). Increasingly reported in dehazing papers to detect the over-saturation and plastic-looking texture artefacts common in GAN-based methods. A dehazing model that scores 1 dB lower PSNR but has significantly better LPIPS is usually producing more visually pleasing results.

CIEDE2000 — Colour Difference Metric

CIEDE2000 measures perceptual colour difference between the dehazed and ground truth images, accounting for human colour perception nonlinearity. Particularly relevant for dehazing because haze introduces a strong colour cast (whitish or yellowish) that PSNR and SSIM may not penalise correctly. Lower CIEDE2000 = better colour fidelity.

FADE — Fog Aware Density Evaluator

FADE is a no-reference metric designed specifically for fog and haze assessment. It estimates haze density from the image without a clean reference, producing a score where lower = clearer image. FADE is useful for evaluating on unpaired real hazy images like RTTS, where no ground truth exists. Its limitations include poor calibration for non-natural images and sensitivity to over-brightening (which artificially reduces estimated haze density).

NIQE — Naturalness Image Quality Evaluator

NIQE measures deviation from natural image statistics without requiring a reference. In dehazing, NIQE helps detect when a model produces an unnaturally sharp or unnaturally smooth result — both common failure modes. Lower NIQE = more natural appearance.

Which Metric for Which Task?

| Scenario | Recommended Metrics |
|---|---|
| Synthetic haze (RESIDE SOTS) | PSNR + SSIM |
| Real mild haze (O-Haze, I-Haze) | PSNR + SSIM + LPIPS |
| Non-homogeneous haze (NH-Haze) | PSNR + SSIM + CIEDE2000 |
| Dense haze (Dense-Haze) | PSNR + SSIM + visual comparison |
| Unpaired real haze (RTTS) | FADE + NIQE (no-reference) |
| Full research paper | PSNR + SSIM + LPIPS + FADE |

Comparison Table — All 5 Datasets Across 12+ Attributes

| Attribute | RESIDE | O-Haze | I-Haze | NH-Haze | Dense-Haze |
|---|---|---|---|---|---|
| Year | 2018/2019 | 2018 | 2018 | 2020 | 2019 |
| # Pairs | 313K+ (OTS) / 14K (ITS) | 45 | 35 | 55 | 55 |
| Haze Type | Synthetic | Real (machine) | Real (machine) | Real, non-homogeneous | Real, dense |
| Setting | Indoor + Outdoor | Outdoor | Indoor | Outdoor | Outdoor |
| Ground Truth | Synthetic (exact) | Yes (captured) | Yes (captured) | Yes (captured) | Yes (captured) |
| Haze Density | Configurable | Medium-dense | Medium | Non-uniform | Extreme |
| Resolution | 460×620 to 1024+ | 4964×3312 | 2833×4256 | ~5000×3000 | ~1600×1200 |
| Depth Maps | Yes (ITS) | No | No | No | No |
| Use as Train | Yes (primary) | No (too small) | No (too small) | No (borderline) | No |
| Use as Test | Yes (SOTS) | Yes | Yes | Yes | Yes |
| Licence | Non-commercial | Non-commercial | Non-commercial | Non-commercial | Non-commercial |
| Typical PSNR | 35–40 dB | 22–24 dB | 23–25 dB | 19–21 dB | 14–17 dB |
| Challenge | None (not an NTIRE dataset) | NTIRE 2018 | NTIRE 2018 | NTIRE 2020/2021 | NTIRE 2019 |

How to Choose the Right Dataset

By Haze Type

You are studying haze physics and want controlled conditions: Use RESIDE with synthetic ASM haze. You control A and β precisely and can isolate their effects.

You need to demonstrate real-world applicability at moderate haze: Use O-Haze (outdoor) or I-Haze (indoor) for evaluation. Train on RESIDE, evaluate on both.

You are targeting UAV, road scene, or satellite imagery in real hazy conditions: Use NH-Haze — its non-homogeneous structure best matches real atmospheric conditions over varied terrain.

You are working on extreme conditions (wildfire smoke, thick industrial fog, dense coastal mist): Use Dense-Haze as your primary benchmark.

By Domain

Autonomous driving: RESIDE-OTS (outdoor synthetic) + NH-Haze (non-homogeneous) + RTTS (real road scenes, unpaired). Road scenes need both far-field dehazing (for navigation) and near-field accuracy (for pedestrian detection).

Indoor CCTV/surveillance: RESIDE-ITS + I-Haze. Indoor depth ranges and artificial lighting conditions are specific to this domain.

Medical imaging / endoscopy: None of the five datasets are directly applicable. Turbid medium scattering in tissue is governed by different physics. Use these datasets for pre-training only.

Satellite/aerial remote sensing: RESIDE-OTS provides the largest training set. NH-Haze best approximates heterogeneous cloud/haze cover patterns.

By Compute Budget

GPU-constrained (≤8 GB VRAM): Train on RESIDE-ITS only (14K pairs, easily fits). Evaluate on SOTS-Indoor + I-Haze + O-Haze.

Standard research (24 GB VRAM): Train on RESIDE-ITS + OTS subset (100K pairs). Evaluate on full five-dataset suite.

Large-scale (multi-GPU): Use full RESIDE-OTS (313K pairs) + data augmentation + multi-scale training. Evaluate on all five datasets plus RTTS qualitative.


Common Dehazing Models Benchmarked

DCP — Dark Channel Prior (He et al., CVPR 2009 / TPAMI 2011): The foundational prior-based method. Exploits the observation that at least one colour channel has near-zero intensity in most haze-free patches. Computationally efficient and requires no training data. Still the mandatory baseline. Fails on sky regions and white objects where the dark channel prior breaks.
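Since DCP is the mandatory baseline, a compact NumPy sketch of its core steps may help. This is a simplified illustration, not the authors' implementation: it omits the soft-matting or guided-filter refinement of the transmission map, and it averages the brightest dark-channel pixels to estimate A, whereas He et al. pick the highest-intensity pixel among them. The function names are hypothetical; ω = 0.95 and a lower bound of 0.1 on t follow the values commonly quoted from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(img, patch=15):
    """Minimum over colour channels, then minimum over a patch x patch window (odd patch)."""
    min_rgb = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    windows = sliding_window_view(padded, (patch, patch))   # (H, W, patch, patch)
    return windows.min(axis=(2, 3))

def dcp_dehaze(hazy, patch=15, omega=0.95, t_min=0.1):
    """Simplified DCP dehazing for a float image [H, W, 3] in [0, 1]."""
    dark = dark_channel(hazy, patch)
    # Atmospheric light: mean colour of the brightest 0.1% of dark-channel pixels
    n_top = max(1, int(dark.size * 0.001))
    flat_idx = np.argsort(dark.ravel())[-n_top:]
    A = hazy.reshape(-1, 3)[flat_idx].mean(axis=0)
    # Transmission estimate: t = 1 - omega * dark_channel(I / A)
    t = 1.0 - omega * dark_channel(hazy / np.maximum(A, 1e-6), patch)
    t = np.clip(t, t_min, 1.0)[..., None]
    # Invert the ASM: J = (I - A) / t + A
    J = (hazy - A) / t + A
    return np.clip(J, 0.0, 1.0), t[..., 0], A

# Random stand-in image with uniform synthetic haze (t = 0.5, A = 0.9)
rng = np.random.default_rng(0)
clean = rng.uniform(0, 1, size=(40, 60, 3))
hazy = clean * 0.5 + 0.9 * 0.5
dehazed, t_est, A_est = dcp_dehaze(hazy, patch=7)
print(dehazed.shape, t_est.shape, A_est)
```

The sky and white-object failure mentioned above is visible in the code: where the scene itself has no dark pixels, `dark_channel` is large, t is underestimated, and the inversion over-amplifies the result.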

DehazeNet (Cai et al., IEEE TIP 2016): The first CNN-based dehazing method. Learns the transmission map from hazy image patches using a shallow convolutional network. Pioneered the end-to-end learning paradigm for dehazing.

AOD-Net (Li et al., ICCV 2017): Reformulates the ASM to directly estimate a unified parameter K that captures both A and t in a single network pass. Extremely lightweight and fast — good for embedded deployment despite lower PSNR than modern methods.
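The reformulation can be written out explicitly. Starting from the inverted ASM, J(x) = (I(x) − A) / t(x) + A, AOD-Net folds A and t(x) into one variable K(x):

$$J(x) = K(x)\, I(x) - K(x) + b, \qquad K(x) = \frac{\frac{1}{t(x)}\,\bigl(I(x) - A\bigr) + (A - b)}{I(x) - 1}$$

where b is a constant bias (set to 1 by default in the paper). Substituting K(x) back in gives K(x)(I(x) − 1) + b = (I(x) − A) / t(x) + A, which recovers the inverted ASM, so the network only has to predict K(x) and the clean image follows from a single element-wise operation.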

GFN — Gated Fusion Network (Ren et al., CVPR 2018): Multi-scale feature fusion with learned gating. Strong performance on O-Haze and I-Haze at the time of publication.

FFA-Net (Qin et al., AAAI 2020): Feature Fusion Attention Network — channel and pixel attention mechanisms allow the network to adaptively weight features by their importance for haze removal. Significant jump over prior methods on RESIDE-SOTS.

MSBDN (Dong et al., CVPR 2020): Multi-Scale Boosted Dehazing Network with dense feature fusion. Strong on RESIDE outdoor scenes.

AECR-Net (Wu et al., CVPR 2021): Adaptive Enhancement and Contrastive Regularisation. Uses a contrastive loss between hazy and clean features to improve perceptual quality alongside PSNR.

DehazeFormer (Song et al., IEEE TIP 2023): Transformer-based architecture built on Swin-style window attention, with dehazing-specific changes including rescale layer normalisation (RescaleNorm) and a modified window attention mechanism. Reaches 40+ dB on RESIDE-SOTS-Indoor and gives strong results across the real-haze benchmarks.


How to Prepare Hazy-Clean Pairs for Training

Synthetic Haze Generation

If you want to create your own training data or augment existing datasets with synthetic haze:

import numpy as np
from PIL import Image
import cv2

def synthesise_haze_asm(clean_img, depth_map, beta_range=(0.05, 0.20), 
                         A_range=(0.7, 1.0)):
    """
    Generate a hazy image using the Atmospheric Scattering Model.
    
    Args:
        clean_img: float32 numpy array [H, W, 3] in [0, 1]
        depth_map: float32 numpy array [H, W] — scene depth, normalised to [0, 1]
        beta_range: scattering coefficient range
        A_range: atmospheric light range (per channel)
    
    Returns:
        hazy: synthesised hazy image
        t_map: transmission map
        A: atmospheric light vector
    """
    beta = np.random.uniform(*beta_range)
    A = np.random.uniform(A_range[0], A_range[1], size=(1, 1, 3)).astype(np.float32)
    
    # Normalise depth to a reasonable physical range (e.g., 0–10m)
    depth = depth_map * 10.0
    
    # Compute transmission map
    t_map = np.exp(-beta * depth).astype(np.float32)
    t_map = np.clip(t_map, 0.1, 1.0)  # prevent t=0 (complete obstruction)
    t_map = t_map[:, :, np.newaxis]   # shape [H, W, 1]
    
    # Apply ASM: I = J*t + A*(1-t)
    hazy = clean_img * t_map + A * (1.0 - t_map)
    hazy = np.clip(hazy, 0, 1).astype(np.float32)
    
    return hazy, t_map.squeeze(), A.squeeze()

def generate_hazy_dataset(clean_images, depth_maps, num_per_image=3):
    """
    Generate multiple hazy versions of each clean image using different A, β.
    Mimics RESIDE ITS generation where 10 hazy variants exist per clean image.
    """
    pairs = []
    for clean, depth in zip(clean_images, depth_maps):
        for _ in range(num_per_image):
            hazy, t_map, A = synthesise_haze_asm(clean, depth)
            pairs.append({
                'hazy': hazy, 
                'clean': clean, 
                't_map': t_map, 
                'A': A
            })
    return pairs

Real Haze Dataset Preprocessing

For O-Haze, I-Haze, NH-Haze, and Dense-Haze:

import numpy as np

def preprocess_real_haze_pairs(pairs, patch_size=256, stride=128, 
                                 min_haze_score=0.01):
    """
    Extract patches from real haze pairs with quality filtering.
    Filters out near-sky patches where haze dominates without scene content.
    """
    filtered_patches = []
    
    for hazy, clean in pairs:
        # Skip patches where clean image is nearly uniform (sky, wall)
        # and hazy image adds almost no structure
        H, W, _ = hazy.shape
        
        for y in range(0, H - patch_size + 1, stride):
            for x in range(0, W - patch_size + 1, stride):
                hazy_p  = hazy[y:y+patch_size, x:x+patch_size]
                clean_p = clean[y:y+patch_size, x:x+patch_size]
                
                # Quality filter: discard near-uniform patches
                clean_std = np.std(clean_p)
                if clean_std < 0.03:  # uniform patch (sky/wall)
                    continue
                
                # Haze filter: discard patches where haze is minimal
                haze_diff = np.mean(hazy_p) - np.mean(clean_p)
                if haze_diff < min_haze_score:
                    continue
                
                filtered_patches.append((hazy_p, clean_p))
    
    print(f"Extracted {len(filtered_patches)} quality patches")
    return filtered_patches

Augmentation for Dehazing

python

import random
import numpy as np

def augment_hazy_pair(hazy, clean):
    """
    Apply identical augmentation to hazy-clean pair.
    Dehazing-specific: avoid colour jitter on clean (distorts ground truth).
    """
    # Geometric augmentation (applied identically)
    if random.random() > 0.5:
        hazy  = hazy[:, ::-1, :].copy()
        clean = clean[:, ::-1, :].copy()
    if random.random() > 0.5:
        hazy  = hazy[::-1, :, :].copy()
        clean = clean[::-1, :, :].copy()
    k = random.randint(0, 3)
    hazy  = np.rot90(hazy, k).copy()
    clean = np.rot90(clean, k).copy()
    
    # Haze intensity augmentation (only applied to hazy, not clean)
    # Randomly scale the haze density to augment training distribution
    if random.random() > 0.7:
        alpha = random.uniform(0.8, 1.2)  # scale haze intensity
        # Move toward or away from the clean image
        hazy_aug = clean + alpha * (hazy - clean)
        hazy = np.clip(hazy_aug, 0, 1).astype(np.float32)
    
    return hazy, clean

Research Gap Radar — 5 Open Problems

Gap 1 — Synthetic-to-Real Domain Transfer. The community's best model achieves 40+ dB on RESIDE-SOTS-Indoor but only ~25 dB on I-Haze. This 15 dB gap represents a fundamental failure of synthetic-trained models on real data. No paper has yet demonstrated a principled solution that closes this gap without fine-tuning on real pairs. Domain adaptation, physics-informed augmentation, and self-supervised approaches are active but unsolved directions.

Gap 2 — Non-Homogeneous Haze Modelling. The ASM assumes a single global atmospheric light A — an assumption that fails for all real-world haze. NH-Haze made this problem visible, but current SOTA on NH-Haze (21 dB) is still substantially below what models achieve on homogeneous haze datasets. Designing architectures that explicitly model spatially varying A(x) and β(x) is an open problem.

Gap 3 — Video Dehazing with Temporal Consistency. Image dehazing models applied frame-by-frame to video produce temporal flickering. Real haze in video (dashcam footage, surveillance) is temporally correlated — adjacent frames share almost the same haze distribution. Exploiting this temporal correlation for consistency is largely unexplored compared to single-image methods.

Gap 4 — Dehazing for Downstream Tasks. Most dehazing papers optimise PSNR/SSIM — pixel-level fidelity. But the end goal is usually a downstream task: object detection, licence plate recognition, road segmentation. The question of whether PSNR improvement translates to downstream task improvement is not consistently answered. Task-aware dehazing loss functions are an open research direction.

Gap 5 — Nighttime and Coloured Haze. All five standard datasets contain daytime haze. Nighttime haze is fundamentally different: multiple light sources create spatially varying illumination, and haze interaction with artificial lights produces glow and halo artefacts that the ASM does not model. A benchmark and model suite for nighttime dehazing would be a significant contribution.


Implementation Roadmap — 8-Step Week-by-Week Guide

This roadmap targets a Final Year / M.Tech student with 2–3 months before submission and a single GPU (8–24 GB VRAM).

Week 1 — Environment and Baseline. Install PyTorch 2.x, clone BasicIR or the FFA-Net repository, and reproduce DCP results on RESIDE-SOTS-Indoor. Target baseline: ~16 dB (DCP). If you can get AOD-Net working: ~20 dB. These numbers confirm your evaluation pipeline is correct.

Week 2 — Dataset Setup. Download RESIDE-ITS (training) and SOTS (testing). Write the data loader from Section 12. Compute average hazy image statistics (mean brightness, contrast, estimated β distribution). Visualise 10 hazy-clean pairs. Download O-Haze and I-Haze for later evaluation.

Week 3 — Train a Baseline CNN. Train FFA-Net or a simplified MSBDN on RESIDE-ITS. Target: ≥30 dB on SOTS-Indoor after 100 epochs. If you cannot reach 30 dB, check your patch augmentation and normalisation.

Week 4 — Implement Your Contribution. Examples: a new attention mechanism (channel + spatial jointly), a transmission map estimation branch, a contrastive loss inspired by AECR-Net, a Swin Transformer block substitution. Implement and test in isolation.

Week 5 — Integrate and Train Full Model. Full training from scratch or from checkpoint. Evaluate on SOTS-Indoor/Outdoor every 50 epochs. Track both PSNR and SSIM.

Week 6 — Cross-Dataset Evaluation. Run your trained model (zero-shot, no fine-tuning) on O-Haze, I-Haze, NH-Haze, and Dense-Haze. Record all numbers. This is where most models reveal their weaknesses — and where your research narrative takes shape.

Week 7 — Ablation Studies. Remove your key component and retrain. Remove your loss function component and retrain. Show delta PSNR for each ablation. This is the most important section of your paper for reviewers.

Week 8 — Paper and Submission. Structure: Introduction → Related Work → Proposed Method → Experiments (tables across all 5 datasets) → Conclusion. Always include visual comparisons on at least 3 challenging images from each real-haze dataset.


Tools and Frameworks

BasicIR: The image restoration framework from XPixelGroup, the same team behind BasicSR. Specifically includes dehazing models (FFA-Net, AECR-Net) alongside other restoration tasks. Clean codebase, well-documented training configs. Repository: https://github.com/XPixelGroup/BasicIR

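BasicSR: The broader image and video restoration toolbox, covering tasks such as super-resolution, denoising, deblurring, and JPEG artefact removal. Dehazing is not one of its headline tasks, but its training scripts, loss functions, and metric evaluation utilities are reusable for a dehazing project. Repository: https://github.com/XPixelGroup/BasicSR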

DehazeFormer official code: https://github.com/IDKiro/DehazeFormer — includes training scripts for RESIDE and all five evaluation datasets. Best starting point for transformer-based dehazing.

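OpenCV: For image I/O, colour space conversion, and building the DCP baseline. pip install opencv-python. OpenCV has no dedicated dehazing module, but cv2.createCLAHE (contrast-limited adaptive histogram equalisation) can serve as a simple contrast-enhancement baseline. It boosts local contrast without estimating transmission or atmospheric light, so treat it as a lower bound rather than a dehazing method.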

scikit-image: PSNR, SSIM, and image quality metrics. pip install scikit-image.

IQA-PyTorch: Unified metric library for PSNR, SSIM, LPIPS, NIQE, FADE, and CIEDE2000. Essential for comprehensive evaluation. Repository: https://github.com/chaofengc/IQA-PyTorch. Install: pip install pyiqa.

FADE metric implementation: https://github.com/Utkarsh-Deshmukh/Fog-Aware-Density-Evaluator — Python implementation of the FADE no-reference fog density metric.


7 Common Mistakes Researchers Make

Mistake 1 — Reporting Only RESIDE-SOTS Numbers. A paper that reports 40 dB on SOTS-Indoor but does not evaluate on any real-haze dataset will face justified reviewer pushback. The community has largely moved to requiring at least one real-haze benchmark alongside RESIDE. Always include O-Haze or I-Haze results.

Mistake 2 — Confusing SOTS-Indoor and SOTS-Outdoor. These are separate test sets with different difficulty levels, and the same model can score several dB apart on the two. Both the size and the direction of that gap vary with the method and its training set. Reporting "SOTS PSNR: 36 dB" without specifying indoor or outdoor is ambiguous and will be flagged by reviewers.

Mistake 3 — Evaluating at the Wrong Resolution. O-Haze, I-Haze, NH-Haze, and Dense-Haze are captured at high resolution (2K–16 MP) but are almost always evaluated at 512×512 or 1024×1024. Papers differ in how they get there (resizing or cropping), which makes their numbers incomparable. Always state your evaluation resolution and whether you resized or cropped to reach it.

Mistake 4 — Training and Testing on the Same Real-Haze Dataset. With only 45 pairs in O-Haze, some researchers mistakenly use all 45 for both fine-tuning and evaluation. This leaks test images into training and inflates the reported numbers. Use the official train/validation/test split and evaluate only on the held-out split.
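A cheap guard against this mistake is to build the split once, store it, and assert that the subsets are disjoint before every run. The sketch below is plain Python. The 35/5/5 sizes and the rule of assigning by sorted ID are assumptions for illustration, so take the actual image lists from the dataset's official release.

```python
def make_split(pair_ids, n_val, n_test):
    """Deterministically split pair IDs into train/val/test by sorted order."""
    ids = sorted(pair_ids)
    n_train = len(ids) - n_val - n_test
    if n_train <= 0:
        raise ValueError("not enough pairs for the requested split sizes")
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def assert_no_leakage(split):
    """Raise ValueError if any pair ID appears in more than one subset."""
    train, val, test = (set(split[k]) for k in ("train", "val", "test"))
    overlap = (train & val) | (train & test) | (val & test)
    if overlap:
        raise ValueError(f"pairs shared between subsets: {sorted(overlap)}")


# Assumed sizes for a 45-pair dataset; verify against the official split.
split = make_split(range(1, 46), n_val=5, n_test=5)
assert_no_leakage(split)
```

Calling `assert_no_leakage` at the start of both the fine-tuning script and the evaluation script makes an accidental overlap fail loudly instead of silently inflating PSNR.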

Mistake 5 — Ignoring the Sky Region Problem. DCP and many CNN methods produce visible artefacts in sky regions (over-darkening in DCP, bright halo in CNNs). If your visual comparisons never show sky regions, reviewers will notice. Include at least one challenging sky-containing image in your qualitative results.

Mistake 6 — Claiming SOTA Without Real-Haze Evaluation. Achieving 40+ dB on RESIDE-SOTS-Indoor is impressive but no longer sufficient to claim state-of-the-art in 2024. The bar has shifted to include real-haze performance. Ensure your proposed method is competitive on at least NH-Haze or Dense-Haze.

Mistake 7 — Using PSNR Alone to Compare Methods on Dense-Haze. On Dense-Haze, the PSNR range is 10–17 dB — an extremely compressed range where small numerical differences are noise. Always supplement Dense-Haze evaluation with SSIM, LPIPS, and side-by-side visual comparisons. A model at 15.5 dB with SSIM=0.55 and visually good textures is clearly superior to one at 16.0 dB with SSIM=0.45 and blurry output.
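To see why half a dB means little in this range, it helps to convert PSNR back to an RMS pixel error. The sketch below inverts PSNR = 20 · log10(R / RMSE), assuming a data range R of 1.0. The helper name is illustrative.

```python
def rmse_from_psnr(psnr_db, data_range=1.0):
    """Invert PSNR = 20 * log10(data_range / RMSE) to get the RMS error."""
    return data_range * 10 ** (-psnr_db / 20.0)


# Compare the two Dense-Haze figures from the text with a typical SOTS score.
for label, value in [("15.5 dB", 15.5), ("16.0 dB", 16.0), ("40.0 dB", 40.0)]:
    rmse = rmse_from_psnr(value)
    print(f"{label}: RMSE = {rmse:.4f} of full range ({rmse * 255:.1f} grey levels)")
```

Both Dense-Haze figures correspond to an RMS error of roughly 16% of the full intensity range, so the two outputs are similarly far from the ground truth in pixel terms. That is why the structural and perceptual metrics carry more weight there.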


Your Next Steps + Conclusion

A Practical Action Plan

You now have a complete blueprint for image dehazing research. Here is how to convert it into output.

If you are a Final Year B.Tech student: Start with RESIDE-ITS and FFA-Net. Get a PSNR of ≥30 dB on SOTS-Indoor. Then add O-Haze evaluation — that cross-dataset number is what makes a thesis examiner nod approvingly.

If you are an M.Tech student with 6 months: Follow the 8-week roadmap. The key differentiator at this level is the cross-dataset evaluation table. Showing results on all five datasets, covering synthetic haze (RESIDE), mild real haze (O-Haze and I-Haze), non-homogeneous real haze (NH-Haze), and dense real haze (Dense-Haze), demonstrates the thoroughness that distinguishes a publication-ready thesis from an average one.

If you are a PhD student: Engage directly with the Research Gap Radar. The synthetic-to-real gap (Gap 1) and non-homogeneous haze modelling (Gap 2) are the two problems where a strong paper could make a lasting impact. Gap 4 (dehazing for downstream tasks) is particularly underexplored and fits a wide range of venues: not just the TIP journal and CVPR, but also ICCV and AAAI.

What the Community Has Learned

The history of dehazing benchmarking is a story of progressively harder datasets forcing the community to build progressively more realistic models. RESIDE established the first large-scale benchmark and revealed that deep learning could obliterate prior-based methods — on synthetic data. O-Haze and I-Haze exposed the synthetic-to-real gap. NH-Haze showed that even real-haze methods struggle with spatial variation. Dense-Haze identified the fundamental limits of information recovery.

Each dataset in this guide played a specific role in that progression. A researcher who understands all five — not just the one they plan to use — is equipped to position their work correctly in the literature, anticipate reviewer objections, and identify the next gap to close.

The datasets will remain relevant. The models that surpass current SOTA will come from researchers who ask not "how do I squeeze 0.2 dB on SOTS?" but "what is the next fundamental limitation this benchmark is revealing, and how do I address it?"


Further Reading and Resources

  • DCP Paper: He et al., "Single Image Haze Removal Using Dark Channel Prior," TPAMI 2011.

  • RESIDE Paper: Li et al., "Benchmarking Single-Image Dehazing and Beyond," IEEE TIP 2019.

  • FFA-Net Paper: Qin et al., "FFA-Net: Feature Fusion Attention Network for Single Image Dehazing," AAAI 2020.

  • AECR-Net Paper: Wu et al., "Contrastive Learning for Compact Single Image Dehazing," CVPR 2021.

  • DehazeFormer Paper: Song et al., "Vision Transformers for Single Image Dehazing," IEEE TIP 2023.

  • NTIRE 2020 Challenge Report: Ancuti et al., "NTIRE 2020 Challenge on NonHomogeneous Dehazing," CVPRW 2020.

  • IQA-PyTorch Documentation: https://iqa-pytorch.readthedocs.io

  • BasicIR Documentation: https://github.com/XPixelGroup/BasicIR

Originally published on Kaggle: Top 5 Image Dehazing Datasets for Researchers