Fig. 4

Overview of CONCH (CONtrastive learning from Captions for Histopathology). Illustration of the CONCH model architecture and the composition of its training dataset, which comprises approximately 1.17 million image-text pairs: 457,373 H&E-stained pairs and 713,595 IHC- and special-stained pairs. The figure also depicts the data processing pipeline, including object detection, caption splitting, and image-text matching, along with key performance metrics on zero-shot classification and cross-modal retrieval tasks.