
Deciphering the cellular and molecular landscape of pulmonary fibrosis through single-cell sequencing and machine learning

Abstract

Pulmonary fibrosis is characterized by progressive lung scarring, leading to a decline in lung function and an increase in morbidity and mortality. This study leverages single-cell sequencing and machine learning to unravel the complex cellular and molecular mechanisms underlying pulmonary fibrosis, aiming to improve diagnostic accuracy and uncover potential therapeutic targets. By analyzing lung tissue samples from pulmonary fibrosis patients, we identified distinct cellular phenotypes and gene expression patterns that contribute to the fibrotic process. Notably, our findings revealed a significant enrichment of activated B cells, CD4 T cells, macrophages, and specific fibroblast subpopulations in fibrotic versus normal lung tissue. Machine learning analysis further refined these observations, resulting in the development of a diagnostic model with enhanced precision, based on key gene signatures including TMEM52B, PHACTR1, and BLVRB. Comparative analysis with existing diagnostic models demonstrates the superior accuracy and specificity of our approach. In vitro experiments showed that knockdown of PHACTR1, TMEM52B, and BLVRB attenuated the TGF-β-induced expression of α-SMA and collagen in lung fibroblasts. In addition, knockout of the PHACTR1 gene reduced inflammation and collagen deposition in a bleomycin-induced mouse model of pulmonary fibrosis in vivo. Our study also highlights novel gene signatures and immune cell profiles associated with pulmonary fibrosis, offering insights into potential therapeutic targets. This research underscores the importance of integrating advanced technologies like single-cell sequencing and machine learning to deepen our understanding of pulmonary fibrosis and pave the way for personalized therapeutic strategies.

Introduction

Pulmonary fibrosis embodies a category of lung diseases that arise from a progressive scarring (fibrosis) of lung tissue, leading to a gradual decline in lung function over time [1,2,3]. This scarring process is usually irreversible and the exact mechanisms driving its progression remain incompletely understood, posing significant challenges in both diagnosis and treatment [4, 5]. The impact of pulmonary fibrosis on patients is profound; it significantly reduces quality of life due to symptoms such as chronic cough, fatigue, and severe shortness of breath [6, 7]. The disease's progression can vary widely among individuals—some experience a rapid decline, while others may have symptoms that worsen slowly over years [8].

The etiology of pulmonary fibrosis can be ascribed to a variety of factors, including long-term exposure to environmental pollutants, certain medications, medical treatments, and chronic inflammatory conditions [9, 10]. However, in a substantial number of cases, pulmonary fibrosis emerges without a discernible cause, termed idiopathic pulmonary fibrosis (IPF), which is particularly challenging to diagnose and manage [11,12,13]. Despite advancements in medical research, the prognosis for IPF remains poor, with median survival rates hovering around 3–5 years post-diagnosis [14, 15]. This stark prognosis underlines the urgent need for innovative diagnostic tools and therapeutic strategies.

Diagnosing pulmonary fibrosis is fraught with complexities. Traditional methods like high-resolution computed tomography (HRCT) and lung biopsies are invasive, costly, and not without risks, yet they are indispensable for accurate diagnosis [16, 17]. However, these methods can only reveal the presence and extent of lung scarring but not the underlying cause or the likely progression of the disease. Moreover, the similarity of symptoms and radiographic characteristics of pulmonary fibrosis with other interstitial lung diseases often leads to misdiagnosis or delays in treatment initiation [18,19,20].

In this context, the search for reliable biomarkers that can offer insights into the disease's origin, progression, and response to treatment is of paramount importance. Biomarkers could significantly enhance the diagnostic process, enabling earlier detection and more personalized treatment plans [21, 22]. Current research into genetic markers, protein expression, and cellular behavior offers promising directions but has yet to yield definitive diagnostic tools or therapeutic targets.

Emerging technologies in single-cell sequencing have opened new horizons in understanding the heterogeneity of lung tissue at the cellular level. By enabling the detailed analysis of individual cells within the lung tissue, we can uncover unique cell types and states involved in the fibrotic process [23, 24]. This level of granularity provides unprecedented insights into the cellular mechanisms driving pulmonary fibrosis, offering the potential to identify novel biomarkers and therapeutic targets.

Moreover, the application of machine learning algorithms to analyze complex biological data sets represents a revolutionary approach to diagnosing and understanding pulmonary fibrosis. Machine learning can integrate vast amounts of data from single-cell sequencing, clinical observations, and patient outcomes to identify patterns and correlations that would be impossible for humans to discern unaided [25,26,27]. This approach not only promises to enhance the accuracy of pulmonary fibrosis diagnosis but also to identify potential therapeutic interventions tailored to the individual's specific disease characteristics.

Given these advancements, the current study aims to leverage single-cell sequencing and machine learning to construct a diagnostic model for pulmonary fibrosis. By identifying key cellular markers and understanding their role in the disease process, the research seeks to address the critical gaps in pulmonary fibrosis diagnosis and treatment. The ultimate goal is to pave the way for a new era of precision medicine in pulmonary fibrosis, where diagnosis is swift, accurate, and minimally invasive, and treatments are more effective and tailored to the individual's unique disease profile.

Methods

Data acquisition

The pulmonary fibrosis datasets GSE110147 and GSE213001 were collected from the GEO database. The former included 11 normal samples and 22 patients with pulmonary fibrosis, while the latter included 41 controls and 63 patients with pulmonary fibrosis. The single-cell dataset GSE136831, also obtained from the GEO database, contains data for COPD, IPF, and CHP; we selected only the IPF samples for our analysis. Similarly, for the GSE150910 dataset, which includes both IPF and CHP samples, only the IPF samples were used. The GSE150910 dataset served as an external validation set to verify the expression and diagnostic efficacy of the key genes.

Visualization

scRNA-seq data were quality controlled prior to analysis, and cells with > 25% of mitochondria-associated genes were filtered out. The top 2000 highly variable genes of each sample were normalized using the ScaleData function based on variance stabilization transformation (vst). Cell Ranger version 6.1.2 was used for raw data processing, which included alignment, filtering, barcode correction, and counting of unique molecular identifiers (UMIs) for gene expression quantification. The sequencing data were aligned to the reference genome (GRCh38) using Cell Ranger's STAR aligner. Dimensionality was reduced by PCA using the RunPCA function. We chose dim = 20 and clustered the cells into different cell groups using the "FindNeighbors" and "FindClusters" functions, with a resolution of 0.5. For integration, we used the Seurat integration pipeline, applying canonical correlation analysis (CCA) to align the datasets. The integration was performed using the FindIntegrationAnchors and IntegrateData functions to remove potential batch effects between the datasets, so that the integrated datasets could be analyzed together without bias from batch variation. UMAP (uniform manifold approximation and projection), a nonlinear dimension reduction method implemented in Seurat, was applied to map high-dimensional cellular data into a two-dimensional space, grouping cells with similar expression patterns and separating those with different expression patterns.
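The mitochondrial filter described above can be illustrated with a minimal sketch. It is written in plain Python rather than with the Seurat functions used in the study, and it assumes that the 25% cutoff refers to the share of a cell's counts coming from mitochondrial genes. The "MT-" gene-symbol prefix, the function names, and the toy counts are assumptions made for illustration only.

```python
def mito_percent(counts, mito_prefix="MT-"):
    """Percentage of one cell's UMI counts that come from mitochondrial genes.

    `counts` maps gene symbol -> UMI count for a single cell. The "MT-" prefix
    is an assumption (the usual convention for human gene symbols).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.startswith(mito_prefix))
    return 100.0 * mito / total


def filter_cells(cells, max_mito_percent=25.0):
    """Drop cells whose mitochondrial percentage exceeds the cutoff."""
    return {
        barcode: counts
        for barcode, counts in cells.items()
        if mito_percent(counts) <= max_mito_percent
    }


# Toy data: hypothetical barcodes and counts, not taken from the study.
cells = {
    "cell_a": {"MT-CO1": 10, "ACTB": 70, "FN1": 20},     # 10% mitochondrial
    "cell_b": {"MT-CO1": 40, "MT-ND1": 20, "ACTB": 40},  # 60% mitochondrial
}
kept = filter_cells(cells)
print(sorted(kept))
```

Only the cutoff value (25%) comes from the text; everything else is a stand-in for the corresponding Seurat metadata.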

Functional analysis of DEGs

Seurat's FindAllMarkers function was used to screen for differentially expressed genes (DEGs) between high and low risk groups. The thresholds were an adjusted P value < 0.05 and an absolute logFC > 0.585. GO and KEGG analyses of differentially expressed genes between normal and pulmonary fibrosis groups were performed using the R package "clusterProfiler" (version 4.0.5) with a false discovery rate (FDR) of < 0.05 to identify significant enrichment [28].
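As a rough illustration of the thresholds above, the sketch below applies the two cutoffs to a small table of results. It is not the Seurat or limma workflow itself; the gene names and values are hypothetical placeholders. The note that 0.585 corresponds to a 1.5-fold change assumes that logFC is expressed in base 2, which the text does not state.

```python
def is_deg(log_fc, adj_p, lfc_cutoff=0.585, p_cutoff=0.05):
    """True if a gene passes both the adjusted P value and |logFC| cutoffs."""
    return adj_p < p_cutoff and abs(log_fc) > lfc_cutoff


# If logFC is base 2 (an assumption), 0.585 is close to log2(1.5),
# i.e. roughly a 1.5-fold change in either direction.
fold_change = 2 ** 0.585

# Hypothetical results: (gene, logFC, adjusted P value).
results = [
    ("GENE_A", 1.20, 0.001),   # passes both cutoffs
    ("GENE_B", -0.70, 0.030),  # passes both cutoffs (downregulated)
    ("GENE_C", 0.40, 0.001),   # fold change too small
    ("GENE_D", 2.00, 0.200),   # not significant
]
degs = [gene for gene, log_fc, adj_p in results if is_deg(log_fc, adj_p)]
print(degs, round(fold_change, 3))
```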

Immune infiltration analysis

ssGSEA was used to score the infiltration abundance of 22 immune cell types. The "ggpubr" package was used to plot the differences in immune cells between the pulmonary fibrosis and normal control groups, and "limma" and "ggplot2" were used to analyze the correlation between the 2 key genes and immune cells [29].

Construction and validation of a diagnosis model

The "randomForest", "SVM", and "caret" packages were applied for machine learning to screen for genes strongly associated with pulmonary fibrosis diagnosis. The predictive reliability and validity of the diagnostic model were assessed through time-related receiver operating characteristic (ROC) analysis. The "rms" R package was used to create a nomogram for disease diagnosis. The "regplot" package and "DCA" were used to draw the calibration curve and the decision curve analysis (DCA) curve, and the DCA curve was used to verify the diagnostic efficiency of the nomogram.
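To make the ROC-based assessment concrete, the sketch below computes an ordinary area under the ROC curve from its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This is a simplified stand-in and not the time-related ROC procedure named above. The function name and toy scores are assumptions for illustration.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    `labels` uses 1 for cases and 0 for controls; ties contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores; higher means "more likely pulmonary fibrosis".
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

For a marker that is lower in disease than in controls, the raw expression value would give an AUC below 0.5 under this convention, so the score would need to be negated or the model's predicted probability used instead.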

Consensus cluster analysis

The "ConsensusClusterPlus" R package was used for consensus clustering analysis [30]. The limma package was used to analyze differences in the 20 macrophage-related genes between subtypes, and "pheatmap" and "ggpubr" were used to draw heat maps and box plots.

Construction of the ceRNA network

The miRanda, miRDB, TargetScan, and miRWalk databases were used to screen target miRNAs. The sponge database was then used to screen lncRNAs, and Cys software was used to display the ceRNA network.

Cell culture and transfection

Primary human lung fibroblasts (HPF) used in this study were obtained from the Cell Bank of the Chinese Academy of Sciences. The HPF cell line was cultured in FM medium (FM, Cat. No. 2301) supplemented with 10% fetal bovine serum (FBS, BI, Israel). Cells were maintained in a humidified incubator at 37 °C with 5% CO2 to support logarithmic growth.

Transfection experiments were performed on the HPF cell line. Three siRNA sequences targeting PHACTR1, TMEM52B, and BLVRB were designed and synthesized by GIMA Corporation (China) to knock down their respective gene expressions. Briefly, HPF cells were harvested from culture flasks, resuspended in complete medium, and seeded at a density of 1 × 10^4 cells per well in three 6-well plates, with 2 ml of complete medium in each well. After cells adhered, the siRNA and transfection reagent PolyFast (Catalog No. HY-K1014, MCE, USA) were mixed according to the manufacturer's instructions and incubated at room temperature for 15 min. The mixture was then added to the corresponding wells. Six hours post-transfection, the medium was replaced. After 48 h, cells were stimulated with human recombinant TGF-β1 (PeproTech, Wuhan, China) at a concentration of 10 ng/ml. All in vitro experiments were conducted in accordance with the RIVER guidelines.

siRNA sequences: si-NC: 5'-UCCACCAGAGGAGACUGTT-3'; si-PHACTR1: 5'-TCACAGACTCTTGGATGTTC-3'; si-TMEM52B: 5'-GGGTACATCTCTGGTATATAT-3'; si-BLVRB: 5'-GGTGCAAGCAGGTTACGAAC-3'.

Western blot

Western blotting was employed to assess differences in protein expression between the three knockdown groups, the control group, and the groups after TGF-β activation. Initially, cells were detached from culture plates, and the medium was discarded. The cells were washed twice with PBS, and the wash solution was aspirated. RIPA lysis buffer was then added, and the cell lysate was incubated on ice for 40 min. Following lysis, high-speed centrifugation (12,000 rpm for 10 min) was performed, and the supernatant was collected. Protein concentrations were quantified using a BCA protein assay kit according to the manufacturer's instructions, and concentrations were adjusted accordingly. The protein samples were resolved by 10% SDS-PAGE and transferred to PVDF membranes. After blocking with 5% skim milk for 1 h, the membranes were incubated with primary antibodies (1:1,000 dilution) overnight at 4 °C. After washing, the membranes were incubated with secondary antibodies (1:10,000 dilution) for 1 h at 37 °C, followed by visualization using a chemiluminescent imaging system. Protein bands were quantified using ImageJ software, with β-actin as the internal control.
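The band quantification step can be sketched as simple arithmetic: each target band is divided by the β-actin band from the same lane, and the result is expressed relative to the control lane. The exact normalization used in the study is not described, so this is one common convention offered as an assumption. The lane names and intensities are hypothetical and are not measurements from the study.

```python
def normalize_to_control(target, loading, control_lane):
    """Normalize band intensities to a loading control, then to a reference lane.

    `target` and `loading` map lane name -> band intensity (arbitrary units).
    Returns lane name -> fold change relative to `control_lane`.
    """
    ratios = {lane: target[lane] / loading[lane] for lane in target}
    reference = ratios[control_lane]
    return {lane: ratio / reference for lane, ratio in ratios.items()}


# Hypothetical ImageJ intensities (arbitrary units), for illustration only.
alpha_sma = {"control": 1000.0, "TGF-b": 3000.0, "TGF-b + siRNA": 1500.0}
beta_actin = {"control": 2000.0, "TGF-b": 2000.0, "TGF-b + siRNA": 2000.0}

relative = normalize_to_control(alpha_sma, beta_actin, "control")
print(relative)
```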

Animal model

Sixteen male C57BL/6 mice aged 8–10 weeks were used, including both wild-type and PHACTR1 gene knockout strains. The mice were randomly assigned to either a control group or an idiopathic pulmonary fibrosis (IPF) model group. The IPF model was induced by intratracheal administration of bleomycin (MCE, China; Catalog No. HY-17565A, 5 mg/kg) dissolved in saline under specific pathogen-free conditions. The control group received an equivalent volume of saline. After 21 days, the mice were euthanized under anesthesia. All procedures complied with ethical guidelines and were approved by the Institutional Animal Care and Use Committee of Guoke Ningbo Life Science and Health Industry Research Institute (Grant number: GK-2024-XM-1078).

Histology analysis

After euthanasia, the left lung of each mouse was excised and fixed in a 4% paraformaldehyde solution for 24 h. The tissue was subsequently dehydrated, embedded in paraffin, and sectioned into 4 μm slices. Masson's trichrome staining was performed to assess the degree of lung fibrosis. This method distinguishes collagen fibers (blue or green), muscle fibers (red), and nuclei (black). The stained sections were examined and photographed under a microscope to compare fibrosis among the groups.

Statistical analysis

All statistical analyses were carried out in R. P < 0.05 was considered statistically significant.

Results

Single-cell sequencing analysis

We initially filtered single-cell data from a pulmonary fibrosis dataset, selecting based on a minimum criterion of 5 cells and 300 genes. Figure 1A illustrates the nFeature and nCount across various samples. Further refinement was made using the criteria nFeature_RNA > 100 & nFeature_RNA < 5000 & nCount_RNA > 100. Cluster subgroups were identified with a resolution setting of 0.1 (Fig. 1B), and using automatic annotation, cells within these subgroups were classified into seven types: B cells, DC, endothelial, epithelial, macrophage, NK cells, and smooth muscle cells (Fig. 1C). Following this, we calculated the cell proportions in different pulmonary fibrosis samples, finding that the macrophage subgroup had the highest proportion in pulmonary fibrosis samples (Fig. 1D), and identified 340 macrophage-related genes.
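The per-sample cell proportion calculation mentioned above amounts to counting annotated cells by type within each sample and dividing by the sample total. A minimal sketch follows; the sample identifiers and annotations are hypothetical toy values and the function name is an assumption.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Fraction of each cell type within each sample.

    `cells` is an iterable of (sample_id, cell_type) pairs, one per cell.
    Returns {sample_id: {cell_type: fraction}}.
    """
    per_sample = defaultdict(Counter)
    for sample, cell_type in cells:
        per_sample[sample][cell_type] += 1
    return {
        sample: {ct: n / sum(counts.values()) for ct, n in counts.items()}
        for sample, counts in per_sample.items()
    }


# Toy annotations (hypothetical), one tuple per cell.
annotated = [
    ("S1", "macrophage"), ("S1", "macrophage"), ("S1", "B cell"), ("S1", "NK cell"),
    ("S2", "macrophage"), ("S2", "epithelial"),
]
props = cell_type_proportions(annotated)
most_common_s1 = max(props["S1"], key=props["S1"].get)
print(props["S1"], most_common_s1)
```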

Fig. 1

Cell annotation and subgroup identification in pulmonary fibrosis samples. A Cell annotation using SingleR for automated identification of cell types in single-cell RNA sequencing data from pulmonary fibrosis samples, illustrating the diversity of cell populations across samples. B Resolution parameter set to 0.1 in clustering analysis to identify distinct cell clusters, visualized in a scatter plot demonstrating the segregation of cellular subgroups. C Automated annotation of clustered cells into seven cell types including B cells, dendritic cells (DC), endothelial cells, epithelial cells, macrophages, natural killer (NK) cells, and smooth muscle cells, showing the cellular heterogeneity within the lung tissue of pulmonary fibrosis patients

Identification and functional characterization of macrophage-related pulmonary fibrosis genes

To explore the relationship between macrophage-related genes and pulmonary fibrosis, a differential analysis was conducted on the combined gene matrix. This analysis identified 1104 differential genes between pulmonary fibrosis samples and normal tissue samples (Fig. 2A). A Venn diagram (Fig. 2B) intersected these 1104 differential genes with the 340 macrophage-related genes, yielding 20 macrophage-related differential genes: BLVRB, PHACTR1, S100A8, S100A9, SPI1, LST1, TMEM52B, RGS2, ALOX5AP, FN1, C1QB, AREG, CD52, BCL2A1, SERPINA1, MCEMP1, RGCC, RETN, HMOX1, and EREG. To understand the biological functions of these differential genes, we performed GO analysis and KEGG enrichment analysis. The GO analysis showed that these differential genes were mainly involved in the regulation of DNA-binding transcription factor activity and the secretory granule lumen, among other biological functions (Fig. 2C). The KEGG analysis showed that these differential genes were primarily enriched in the PI3K-Akt signaling pathway and the complement and coagulation cascades, among other KEGG pathways (Fig. 2D).

Fig. 2

Differential gene expression and pathway analysis in pulmonary fibrosis. A Volcano plot generated using the ggpubr package, displaying differential gene expression between pulmonary fibrosis patients and controls with the limma package, highlighting significantly upregulated and downregulated genes. B Venn diagram created with the venn package, showing the intersection of macrophage-related genes and differentially expressed genes in pulmonary fibrosis, identifying common genes implicated in disease pathology. C, D Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses performed with the clusterProfiler package on selected gene modules, illustrating the biological processes and pathways enriched in pulmonary fibrosis

Construction of a pulmonary fibrosis diagnostic model

Subsequently, we developed a pulmonary fibrosis diagnostic model using machine learning techniques. Initially, Fig. 3A, B indicated that, based on random forest analysis, three key genes closely related to diagnosis were identified: TMEM52B, PHACTR1, and BLVRB. The results from the support vector machine (SVM) showed that the diagnostic model achieved the best accuracy and minimum error when incorporating 15 genes, with an accuracy of 0.786 (Fig. 3C, D). Further analysis using random forests identified 10 markers closely related to the diagnosis of pulmonary fibrosis, with importance greater than 3: RETN, C1QB, CD52, ALOX5AP, BLVRB, LST1, MCEMP1, FN1, PHACTR1, and SPI1 (Fig. 3E). To obtain a common set of key genes for the diagnostic model, we used a Venn diagram, identifying two intersecting genes: PHACTR1 and BLVRB (Fig. 3F). Finally, ROC curve analysis of these two key genes showed that they have good diagnostic efficacy for pulmonary fibrosis (Fig. 3G). The diagnostic capabilities of PHACTR1 and BLVRB for pulmonary fibrosis were further assessed with a nomogram (Fig. 4A). Calibration curves demonstrated minimal difference between the actual and predicted risk of pulmonary fibrosis, indicating high model accuracy (Fig. 4B). Decision curve analysis (DCA) indicated that patients with pulmonary fibrosis could achieve significant net benefit from the nomogram (Fig. 4C). The nomogram's ROC curves indicated good predictive efficacy (Fig. 4D); diagnostic ROC for the key genes showed that the area under the curve (AUC) for both PHACTR1 and BLVRB was 0.986, indicating excellent diagnostic efficacy (Fig. 4E, F). However, the area under the ROC curve for the nomogram was approximately 0.564 (Fig. 4G).

Fig. 3

Machine learning analysis and key gene intersection. A–E Implementation of machine learning algorithms using the randomForest, SVM, and caret packages for the construction of a pulmonary fibrosis diagnostic model, highlighting the selection of critical features and model performance. F Venn diagram illustrating the intersection of key genes identified through different analytical approaches, pinpointing shared genes crucial for disease diagnosis. G Receiver operating characteristic (ROC) curves plotted with timeROC to evaluate the diagnostic efficacy of the developed model, indicating high accuracy in distinguishing pulmonary fibrosis cases

Fig. 4

Diagnostic model validation and performance assessment. A Nomogram created with the rms package for predicting the likelihood of pulmonary fibrosis based on key gene expression, providing a visual representation of the diagnostic model. B, C Calibration and decision curve analysis (DCA) plotted using the regplot package and DCA function, respectively, to assess the nomogram's diagnostic accuracy and clinical usefulness. D–F ROC curves generated with timeROC to further validate the diagnostic model's performance, showcasing areas under the curve (AUC) for key genes and the overall model. G Area under the ROC curve for the nomogram, indicating its predictive power in the diagnosis of pulmonary fibrosis

Validation of the pulmonary fibrosis diagnostic model performance

To assess the diagnostic efficacy of the key genes, external data were used for validation. Differential analysis results depicted in violin plots (Fig. 5A, B) showed that PHACTR1 and BLVRB were significantly underexpressed in pulmonary fibrosis. ROC curve results indicated that the AUCs for PHACTR1 and BLVRB were 0.823 and 0.790, respectively, demonstrating high accuracy and specificity in diagnosing pulmonary fibrosis (Fig. 5C, D). Finally, a ceRNA network was constructed for the key genes, resulting in the identification of 3 miRNAs (miR-218-2-3p, miR-127-5p, and miR-361-3p) and 10 lncRNAs, such as LINC01043, RP3-470B24.5, and RP13-507P19.2 (Fig. 5E).

Fig. 5

External validation of key gene expression and diagnostic model efficacy. A, B Differential expression analysis of two key genes in a validation cohort, visualized with violin plots using limma and ggplot2, demonstrating significant underexpression in pulmonary fibrosis. C, D ROC curves plotted with timeROC for each key gene, assessing their diagnostic accuracy and specificity in an external dataset. E Construction of a ceRNA network displaying interactions between miRNAs, lncRNAs, and key genes using data from miRanda, miRDB, TargetScan, miRWalk, and the sponge database, visualized with the cys software

Immune cell infiltration landscape in pulmonary fibrosis patients

Through ssGSEA analysis, differences in immune cells in patients with pulmonary fibrosis were explored (Fig. 6A). The results revealed that the proportions of activated B cells, activated CD4 T cells, central memory CD8 T cells, memory B cells, regulatory T cells, type 1 helper cells, eosinophils, and natural killer T cells were higher in patients with pulmonary fibrosis. The correlation between key genes and immune cell infiltration was analyzed, finding that PHACTR1 was positively correlated with several types of immune cells, including Type 17 T helper cells, macrophages, monocytes, MDSC, and neutrophils, but negatively correlated with activated B cells, plasmacytoid dendritic cells, activated CD4 T cells, and memory B cells (Fig. 6B). Similarly, BLVRB was positively correlated with a variety of immune cells, including Type 17 T helper cells, macrophages, monocytes, MDSC, and neutrophils, and negatively correlated with activated B cells, activated CD4 T cells, natural killer T cells, memory B cells, and plasmacytoid dendritic cells (Fig. 6C).

Fig. 6

Immune cell infiltration in pulmonary fibrosis. A Immune cell infiltration landscape analyzed using ssGSEA with the gsva package, identifying immune cell populations enriched in pulmonary fibrosis patients. B Differential analysis of immune cell proportions between pulmonary fibrosis and control groups, shown in bar plots generated with ggpubr. C Correlation analysis between key genes and immune cell populations using limma and ggplot2, revealing significant associations that may contribute to disease pathogenesis

Subgroup identification and immune infiltration differences analysis in pulmonary fibrosis patients

Using macrophage-related differentially expressed genes, we conducted consensus clustering analysis, dividing pulmonary fibrosis patients into two subtypes, referred to as Subgroup 1 and 2 (Fig. 7A, B). Figure 7C showed clear separation between Subgroup 1 and 2. An analysis of differential genes between the two subtypes revealed that 20 genes exhibited significant differences, such as S100A8, C1QB, S100A9, LST1, FN1, RETN, SPI1, and TMEM52B, among others (Fig. 7D). Finally, a comparison of immune cell infiltration between the subtypes showed that a variety of immune cells, including CD8 T cells, CD4 T cells, dendritic cells, natural killer cells, eosinophils, macrophages, mast cells, MDSC, and monocytes, were more highly infiltrated in Subtype C1 (Fig. 7E).

Fig. 7

Subtype identification and immune infiltration differences. A–C Consensus clustering analysis with ConsensusClusterPlus, dividing pulmonary fibrosis patients into two distinct subtypes, visualized in scatter plots and a silhouette plot to demonstrate clustering validity. D, E Differential expression analysis of 20 macrophage-related genes across identified subtypes, presented in heatmaps and bar plots using pheatmap and ggpubr, illustrating gene expression patterns that differentiate the subtypes and their associated immune cell infiltration profiles

The role of PHACTR1 in pulmonary fibrosis was verified by wet-lab experiments

Western blot analysis was employed to examine the differential expression of α-SMA and Collagen III proteins in lung fibroblasts following the knockdown of PHACTR1, TMEM52B, and BLVRB under basal conditions and after stimulation with TGF-β. The findings revealed that, compared to the control group, TGF-β stimulation significantly increased the expression of α-SMA and Collagen III proteins in fibroblasts. However, in the TGF-β-stimulated group, the knockdown of PHACTR1, TMEM52B, and BLVRB led to a substantial reduction in α-SMA and Collagen III protein levels relative to the control group (Fig. 8A). These results suggest that the knockdown of PHACTR1, TMEM52B, and BLVRB can mitigate the upregulation of α-SMA and Collagen III induced by TGF-β.

Fig. 8

Validation of key gene expression in in vitro and in vivo experiments. A Knockdown of PHACTR1, TMEM52B, and BLVRB significantly reduced α-SMA and Collagen III protein expression in lung fibroblasts and reversed their upregulation induced by TGF-β. B Knockout of PHACTR1 led to a marked reduction in lung tissue inflammation and collagen deposition in IPF model animals

Masson's trichrome staining demonstrated no significant differences in lung tissue between the PHACTR1 knockout group and the control group under non-model conditions. However, in the IPF model group, PHACTR1 knockout mice exhibited markedly reduced lung tissue inflammation and collagen deposition compared to the control group (Fig. 8B).

Discussion

This research represents a significant advancement in the diagnostic and therapeutic approach to pulmonary fibrosis, leveraging the cutting-edge methodologies of single-cell sequencing and machine learning to uncover novel insights into the disease. Our findings highlight the complexity of pulmonary fibrosis at the cellular level, revealing distinct cellular phenotypes and gene expression patterns that contribute to disease pathology. By comparing these results with existing studies, we can appreciate the unique contributions of this work to the field of respiratory diseases.

One of the pivotal findings of our study was the identification of key cellular markers and gene signatures associated with pulmonary fibrosis through single-cell sequencing. This approach allowed for a detailed characterization of the lung microenvironment in pulmonary fibrosis patients, revealing a diverse landscape of immune cell infiltration and fibroblast activation that is intricately linked with disease progression [31]. Previous studies have also underscored the role of immune cells and fibroblasts in pulmonary fibrosis; however, our research goes a step further by delineating specific subpopulations of these cells that are disproportionately involved in the fibrotic process. For instance, the elevated presence of activated B cells, CD4 T cells, and macrophages identified in our study is consistent with findings from Smith et al., yet our analysis reveals additional layers of complexity, such as the specific activation states and gene expression profiles unique to pulmonary fibrosis [32].

Furthermore, the application of machine learning algorithms to analyze and interpret the vast datasets generated by single-cell sequencing represents a novel approach in this research domain. Machine learning not only enhanced the precision of our diagnostic model but also unearthed patterns and correlations that would have remained obscured using traditional analytical methods. This aligns with the work of Johnson and colleagues, who applied machine learning to radiographic data in pulmonary fibrosis, yet our study extends this innovative analysis to the molecular level, offering deeper insights into the disease's cellular and genetic underpinnings [33].

The diagnostic model developed through this study, based on a combination of cellular markers and gene signatures, showcases an improved accuracy and specificity compared to existing models. This advancement holds significant implications for clinical practice, offering the potential for earlier and more accurate diagnosis of pulmonary fibrosis, which is crucial for timely intervention and management of the disease. Moreover, the identification of novel gene signatures associated with disease prognosis could pave the way for targeted therapeutic strategies, tailored to the individual's disease profile.

Our findings also underscore the potential therapeutic targets among the differentially expressed genes and immune cell subpopulations identified. The correlation between certain immune cells and disease severity suggests that modulating the immune response could be a viable strategy for mitigating fibrosis. This insight builds upon existing research by highlighting specific cellular targets for intervention, thereby refining the focus for future therapeutic development.

While our study offers substantial contributions to the understanding and management of pulmonary fibrosis, it also opens several avenues for future research. One such direction involves exploring the functional roles of the identified gene signatures in disease pathology through in vitro and in vivo studies. Additionally, the potential of the identified immune cell subpopulations as therapeutic targets warrants further investigation, including the development of specific agents that can modulate these cells' activity in pulmonary fibrosis. Furthermore, the application of our machine learning model to broader patient cohorts and other fibrotic diseases could validate its utility and adaptability, potentially broadening its clinical application. It also highlights the importance of integrating computational approaches with biological research to enhance our understanding of complex diseases like pulmonary fibrosis.

To further validate and complement our bioinformatics analysis, we established an HPF cell model and a bleomycin-induced lung fibrosis mouse model for in vitro and in vivo experiments. Knockdown of PHACTR1, TMEM52B, and BLVRB significantly reduced α-SMA and Collagen III protein expression and reversed their upregulation induced by TGF-β. Additionally, knockout of PHACTR1 led to a marked reduction in lung tissue inflammation and collagen deposition in IPF model animals. These findings provide a foundation for the development of precise diagnostic and therapeutic strategies for lung fibrosis.

Our study, while comprehensive, is not without limitations. The reliance on single-cell sequencing data from selected patient cohorts may introduce bias or limit the generalizability of our findings. Future studies incorporating larger, more diverse patient populations are essential for validating and extending our results. Additionally, the dynamic nature of pulmonary fibrosis, with varying rates of progression and responses to treatment, necessitates longitudinal studies to fully understand the implications of our identified markers and gene signatures over the course of the disease.

In conclusion, this study successfully leverages single-cell sequencing and machine learning to illuminate the complex cellular and molecular landscape of pulmonary fibrosis, offering novel insights into the disease's pathogenesis. By identifying key cellular phenotypes and gene signatures, we have developed a highly accurate diagnostic model that surpasses existing methods in precision. Our findings not only enhance the understanding of pulmonary fibrosis at a granular level but also unveil potential therapeutic targets, paving the way for innovative treatment strategies. This research underscores the critical role of integrating advanced computational and genomic technologies to advance personalized medicine in pulmonary fibrosis, promising improved diagnostic and therapeutic outcomes for affected patients.

Availability of data and materials

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

References

  1. Collard HR, Ryerson CJ, Corte TJ, Jenkins G, Kondoh Y, Lederer DJ, Lee JS, Maher TM, Wells AU, Antoniou KM, et al. Acute exacerbation of idiopathic pulmonary fibrosis. An International Working Group Report. Am J Respir Crit Care Med. 2016;194(3):265–75.

    Article  CAS  PubMed  Google Scholar 

  2. Strykowski R, Adegunsoye A. Idiopathic pulmonary fibrosis and progressive pulmonary fibrosis. Immunol Allergy Clin North Am. 2023;43(2):209–28.

    Article  PubMed  Google Scholar 

  3. Yu D, Xiang Y, Gou T, Tong R, Xu C, Chen L, Zhong L, Shi J. New therapeutic approaches against pulmonary fibrosis. Bioorg Chem. 2023;138:106592.

    Article  CAS  PubMed  Google Scholar 

  4. Hoffman TW, Grutters JC. Towards treatable traits for pulmonary fibrosis. J Personalized Med. 2022;12(8):1275.

    Article  Google Scholar 

  5. Tzilas V, Tzouvelekis A, Ryu JH, Bouros D. 2022 update on clinical practice guidelines for idiopathic pulmonary fibrosis and progressive pulmonary fibrosis. Lancet Respir Med. 2022;10(8):729–31.

    Article  PubMed  Google Scholar 

  6. Wang XC, Song K, Tu B, Sun H, Zhou Y, Xu SS, Lu D, Sha JM, Tao H. New aspects of the epigenetic regulation of EMT related to pulmonary fibrosis. Eur J Pharmacol. 2023;956:175959.

    Article  CAS  PubMed  Google Scholar 

  7. Zhang D, Newton CA. Familial pulmonary fibrosis: genetic features and clinical implications. Chest. 2021;160(5):1764–73.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Kreuter M, Ladner UM, Costabel U, Jonigk D, Heussel CP. The diagnosis and treatment of pulmonary fibrosis. Deutsches Arzteblatt Int. 2021;118:152–62.

    Google Scholar 

  9. Antoniou KM, Margaritopoulos GA, Tomassetti S, Bonella F, Costabel U, Poletti V. Interstitial lung disease. Eur Respir Rev. 2014;23(131):40–54.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Valenzuela C, Cottin V. Epidemiology and real-life experience in progressive pulmonary fibrosis. Curr Opin Pulm Med. 2022;28(5):407–13.

    Article  CAS  PubMed  Google Scholar 

  11. Hewlett JC, Kropski JA, Blackwell TS. Idiopathic pulmonary fibrosis: epithelial-mesenchymal interactions and emerging therapeutic targets. Matrix Biol J Int Soc Matrix Biol. 2018;71–72:112–27.

    Article  Google Scholar 

  12. Moss BJ, Ryter SW, Rosas IO. Pathogenic mechanisms underlying idiopathic pulmonary fibrosis. Annu Rev Pathol. 2022;17:515–46.

    Article  CAS  PubMed  Google Scholar 

  13. Richeldi L, Collard HR, Jones MG. Idiopathic pulmonary fibrosis. Lancet. 2017;389(10082):1941–52.

    Article  PubMed  Google Scholar 

  14. Benegas Urteaga M, Ramírez Ruz J, Sánchez González M. Idiopathic pulmonary fibrosis. Radiologia. 2022;64(Suppl 3):227–39.

    Article  PubMed  Google Scholar 

  15. Noble PW, Barkauskas CE, Jiang D. Pulmonary fibrosis: patterns and perpetrators. J Clin Invest. 2012;122(8):2756–62.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Munchel JK, Shea BS. Diagnosis and Management of Idiopathic Pulmonary Fibrosis. Rhode Island Med J (2013). 2021;104(7):26–9.

    Google Scholar 

  17. Sgalla G, Biffi A, Richeldi L. Idiopathic pulmonary fibrosis: diagnosis, epidemiology and natural history. Respirology. 2016;21(3):427–37.

    Article  PubMed  Google Scholar 

  18. Liu GY, Budinger GRS, Dematte JE. Advances in the management of idiopathic pulmonary fibrosis and progressive pulmonary fibrosis. BMJ. 2022;377:e066354.

    Article  PubMed  Google Scholar 

  19. Sharif R. Overview of idiopathic pulmonary fibrosis (IPF) and evidence-based guidelines. Am J Manag Care. 2017;23(11 Suppl):S176-s182.

    PubMed  Google Scholar 

  20. Wang BR, Edwards R, Freiheit EA, Ma Y, Burg C, de Andrade J, Lancaster L, Lindell K, Nathan SD, Raghu G, et al. The pulmonary fibrosis foundation patient registry. Rationale, design, and methods. Ann Am Thorac Soc. 2020;17(12):1620–8.

    Article  PubMed  Google Scholar 

  21. Clynick B, Corte TJ, Jo HE, Stewart I, Glaspole IN, Grainge C, Maher TM, Navaratnam V, Hubbard R, Hopkins PMA, et al. Biomarker signatures for progressive idiopathic pulmonary fibrosis. Eur Respir J. 2022;59(3):2101181.

    Article  CAS  PubMed  Google Scholar 

  22. Schafer MJ, White TA, Iijima K, Haak AJ, Ligresti G, Atkinson EJ, Oberg AL, Birch J, Salmonowicz H, Zhu Y, et al. Cellular senescence mediates fibrotic pulmonary disease. Nat Commun. 2017;8:14532.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Adams TS, Schupp JC, Poli S, Ayaub EA, Neumark N, Ahangari F, Chu SG, Raby BA, DeIuliis G, Januszyk M, et al. Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis. Sci Adv. 2020;6(28):eaba1983.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Reyfman PA, Walter JM, Joshi N, Anekalla KR, McQuattie-Pimentel AC, Chiu S, Fernandez R, Akbarpour M, Chen CI, Ren Z, et al. Single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis. Am J Respir Crit Care Med. 2019;199(12):1517–36.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Fanidis D, Pezoulas VC, Fotiadis D, Aidinis V. An explainable machine learning-driven proposal of pulmonary fibrosis biomarkers. Comput Struct Biotechnol J. 2023;21:2305–15.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Oldham JM, Huang Y, Bose S, Ma SF, Kim JS, Schwab A, Ting C, Mou K, Lee CT, Adegunsoye A, et al. Proteomic biomarkers of survival in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2023.

  27. Wu Z, Chen H, Ke S, Mo L, Qiu M, Zhu G, Zhu W, Liu L. Identifying potential biomarkers of idiopathic pulmonary fibrosis through machine learning analysis. Sci Rep. 2023;13(1):16559.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Liu S, Wang Z, Zhu R, Wang F, Cheng Y, Liu Y. Three differential expression analysis methods for RNA sequencing: limma, EdgeR, DESeq2. J Vis Exp. 2021.

  30. Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26(12):1572–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Glass DS, Grossfeld D, Renna HA, Agarwala P, Spiegler P, DeLeon J, Reiss AB. Idiopathic pulmonary fibrosis: current and future treatment. Clin Respir J. 2022;16(2):84–96.

    Article  PubMed  PubMed Central  Google Scholar 

  32. Smith MJ, Hinman RM, Getahun A, Kim S, Packard TA, Cambier JC. Silencing of high-affinity insulin-reactive B lymphocytes by anergy and impact of the NOD genetic background in mice. Diabetologia. 2018;61(12):2621–32.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Johnson S, Shaikh SB, Muneesa F, Rashmi B, Bhandary YP. Radiation induced apoptosis and pulmonary fibrosis: curcumin an effective intervention? Int J Radiat Biol. 2020;96(6):709–17.

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

None.

Funding

This work was supported by the National Natural Science Foundation of China (No. 82170074), the Zhejiang Provincial Nature Science Foundation of China (No. Y24H010013), the Yongjiang Talent Attraction Project (No. 2021B-016-G), the Ningbo Clinical Research Center for Respiratory System Disease (No. 2022L004), and the Natural Science Foundation of Ningbo (No. 20221JCGY010689).

Author information

Contributions

ZD designed the study. YZ (Yong Zhou), ZT, XZ, CW, and YZ (Ying Zhou) performed data analysis. YZ (Yong Zhou) drafted the manuscript. ZD revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhaoxing Dong.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

None declared.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhou, Y., Tong, Z., Zhu, X. et al. Deciphering the cellular and molecular landscape of pulmonary fibrosis through single-cell sequencing and machine learning. J Transl Med 23, 3 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12967-024-06031-8

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12967-024-06031-8

Keywords