The identification of enhancer–promoter interactions (EPIs), especially condition-specific ones, is important for the study of gene transcriptional regulation. Existing experimental approaches for EPI identification are still expensive, and available computational methods either do not consider or have low performance in predicting condition-specific EPIs.
We developed a novel computational method called EPIP to reliably predict EPIs, especially condition-specific ones. EPIP is capable of predicting interactions in samples with limited data as well as in samples with abundant data. Tested on more than eight cell lines, EPIP reliably identifies EPIs, with an average area under the receiver operating characteristic curve of 0.95 and an average area under the precision–recall curve of 0.73. Tested on condition-specific EPIs, EPIP correctly identified 99.26% of them. Compared with two recently developed methods, EPIP achieves better accuracy.
In this project, cell-line-specific enhancer-promoter interactions (EPIs) in human were analyzed. Enhancer data from two sources, active gene promoters, and Hi-C data from multiple labs in seven cell lines were used for this analysis. Using 31 features of the enhancer and promoter regions, including both common and region-specific features, we designed a machine learning model that predicts cell-specific EPIs with higher performance than the existing prediction tools and that also works with a varying number of features. To build the model, we used an ensemble supervised machine learning classifier named AdaBoostClassifier. Among the 31 features, 14 were region-specific and computed separately for the enhancer and for the promoter (28 in total), and 3 were common to both regions. The 14 region-specific features comprise 9 histone modification features, 4 transcription factor features and 1 chromatin accessibility feature. The features were divided into overlapping partitions, each of which trains a separate incremental learner in the ensemble classifier.
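The partitioning idea can be illustrated with a minimal sketch. It is not the EPIP implementation: the sliding-window scheme, the `size` and `stride` parameters, and the feature names are assumptions made for illustration only. It shows how overlapping partitions let a model fall back on the learners whose features are actually measured in a sample with limited data.

```python
from typing import List, Sequence, Set


def overlapping_partitions(features: Sequence[str], size: int, stride: int) -> List[List[str]]:
    """Slide a window of `size` features forward by `stride`.

    A stride smaller than the window size makes neighbouring partitions overlap.
    (Hypothetical scheme; the source does not describe how partitions are chosen.)
    """
    if not 0 < stride < size:
        raise ValueError("need 0 < stride < size for overlapping windows")
    partitions = []
    start = 0
    while True:
        partitions.append(list(features[start:start + size]))
        if start + size >= len(features):
            break
        start += stride
    return partitions


def usable_partitions(partitions: List[List[str]], available: Set[str]) -> List[List[str]]:
    """Keep only the partitions whose features are all measured in a sample."""
    return [p for p in partitions if set(p) <= available]


# Placeholder feature names, not the 31 features used in the project.
features = [f"feature_{i}" for i in range(7)]
partitions = overlapping_partitions(features, size=3, stride=2)

# A sample with limited data: only the first five features were measured.
measured = set(features[:5])
learners_to_use = usable_partitions(partitions, measured)
```

In this toy setup, one learner would be trained per partition, and only the learners returned by `usable_partitions` would vote for a sample that lacks some features.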
The model was tested on five different test data sets and compared with the state-of-the-art methods TargetFinder (Whalen et al., 2016) and Ripple (Roy et al., 2015). The model performed well on all five test data sets and outperformed both TargetFinder and Ripple in predicting cell-specific EPIs.
Talukder, A., Saadat, S., Li, X., & Hu, H. (2019). EPIP: a novel approach for condition-specific enhancer–promoter interaction prediction. Bioinformatics, 35(20), 3877-3883.
It is still challenging to predict interacting enhancer-promoter pairs (IEPs), partially because of our limited understanding of their characteristics. To understand IEPs better, here we studied the IEPs in nine cell lines and nine primary cell types.
By measuring the bipartite clustering coefficient of the graphs constructed from these experimentally supported IEPs, we observed that one enhancer is likely to interact with either none or all of the target genes of another enhancer. This observation implies that enhancers form clusters, and every enhancer in the same cluster interacts with almost every member of a set of genes and only this set of genes. We observed that an enhancer can be up to two megabase pairs away from other enhancers in the same cluster. We also noticed that although a fraction of these clusters of enhancers do overlap with super-enhancers, the majority of the enhancer clusters are different from the known super-enhancers.
Our study showed a new characteristic of IEPs, which may shed new light on distal gene regulation and the identification of IEPs.
We used chromatin contact data from five labs, multiple enhancer sources and active promoter regions to confirm an interesting phenomenon: enhancers work in clusters to form EPIs. From these data, we generated a total of ten EPI data sets. We calculated the Bipartite Clustering Coefficient (BCC) of the enhancer-promoter interaction networks extracted from the ten data sets and used it to test this hypothesis.
The BCC values of the enhancers were close to 1 in all ten EPI data sets. This means that enhancers tend not to share promoters partially: two enhancers share either all of their target promoters or none of them. In other words, a set of enhancers interacts with a set of promoters, and every enhancer in that set interacts with every promoter in the set. This indicates that enhancers take part in EPIs as separate clusters. We also found that these clusters are cell specific.
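To make the BCC reading concrete, here is a minimal sketch that computes a per-enhancer coefficient from a list of enhancer-promoter pairs. It assumes the common pairwise-overlap definition (shared promoters divided by the union of promoters, averaged over every other enhancer that shares at least one promoter); the exact formula used in the study may differ. The function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def enhancer_bcc(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per-enhancer bipartite clustering coefficient (assumed overlap-based definition)."""
    targets: Dict[str, Set[str]] = defaultdict(set)   # enhancer -> promoters it contacts
    partners: Dict[str, Set[str]] = defaultdict(set)  # promoter -> enhancers contacting it
    for enhancer, promoter in pairs:
        targets[enhancer].add(promoter)
        partners[promoter].add(enhancer)

    bcc: Dict[str, float] = {}
    for enhancer, proms in targets.items():
        # Second-order neighbours: other enhancers sharing at least one promoter.
        neighbours = set().union(*(partners[p] for p in proms)) - {enhancer}
        if not neighbours:
            continue  # coefficient is undefined for an enhancer with no neighbours
        overlaps = [
            len(proms & targets[other]) / len(proms | targets[other])
            for other in neighbours
        ]
        bcc[enhancer] = sum(overlaps) / len(overlaps)
    return bcc


# Toy network 1: e1 and e2 share all of their promoters (a clean cluster).
clustered = enhancer_bcc([("e1", "p1"), ("e1", "p2"), ("e2", "p1"), ("e2", "p2")])

# Toy network 2: e1 and e2 share only one of three promoters (partial sharing).
partial = enhancer_bcc([("e1", "p1"), ("e1", "p2"), ("e2", "p2"), ("e2", "p3")])
```

In the first toy network both enhancers score 1, which is the pattern described above. In the second, partial sharing pulls the coefficient down to 1/3.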
Talukder, A., Hu, H., & Li, X. (2021). An intriguing characteristic of enhancer-promoter interactions. BMC Genomics, 22(1), 1-13.
Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and contribute to their physical interactions. However, to date, we have limited knowledge about such TF pairs. To fill this void, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer–promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched with motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitated the discovery of the interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions, which facilitates generating meaningful hypotheses for experimental validation.
We annotated enhancer-promoter (EP) pairs in seven cell lines (GM12878, HMEC, HUVEC, IMR90, K562, KBM7 and NHEK) using FANTOM-annotated enhancer regions with active markers, active promoter regions of GENCODE-annotated gene transcripts, and chromatin contacts from Rao et al., 2014. We selected 649 non-redundant TF motifs from the known motifs in the JASPAR and CIS-BP databases (Khan et al., 2018; Weirauch et al., 2014). TF motif modules were predicted from the concatenated sequences of the EP pairs and mapped to the 649 non-redundant TF motifs. Only mapped motif pairs with one motif in the enhancer and the other in the promoter region were kept. The predicted motif pairs were analyzed for homogeneity and compared with the motifs of known interacting TFs from the BioGRID database (Stark et al., 2006). The predicted motif pairs were then used to train a lasso machine learning model to distinguish EPIs from non-interacting EP pairs.
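The filtering step that keeps only cross-region motif pairs can be sketched as follows. This is an illustration, not the EPmotifPair pipeline: it assumes each module is given as a list of motif hits with start positions in a concatenated sequence laid out as enhancer followed by promoter, and it ignores hits that straddle the boundary. The type alias, function name and motif names are hypothetical.

```python
from typing import List, Set, Tuple

# (motif name, start position in the concatenated enhancer+promoter sequence)
Hit = Tuple[str, int]


def cross_region_pairs(module_hits: List[Hit], enhancer_len: int) -> Set[Tuple[str, str]]:
    """Return (enhancer motif, promoter motif) pairs found in one motif module.

    Assumes the enhancer occupies positions [0, enhancer_len) and the promoter
    follows it; pairs with both motifs in the same region are discarded.
    """
    enhancer_motifs = {motif for motif, pos in module_hits if pos < enhancer_len}
    promoter_motifs = {motif for motif, pos in module_hits if pos >= enhancer_len}
    return {(e, p) for e in enhancer_motifs for p in promoter_motifs}


# Toy module: two hits fall in a 100-bp enhancer, one in the promoter.
hits = [("M1", 10), ("M2", 50), ("M3", 120)]
pairs = cross_region_pairs(hits, enhancer_len=100)
```

Each retained pair could then serve as one binary feature per EP pair when training a sparse linear classifier such as lasso.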
We predicted hundreds of TF motif pairs with one motif located in the enhancer and the other in the promoter region. These motif pairs were highly significant and mostly shared across different cell lines. The predicted motif pairs were also enriched with known interacting TF pairs in almost every cell line. We also found the predicted motif pairs useful for training the lasso machine learning model to identify EPIs efficiently. Based on the weights of the lasso model, we selected the 147 non-redundant motif pairs that contributed most to the classification of EPIs. We identified known TF pairs associated with 72 of the 147 motif pairs. Of these 72 motif pairs, 64 could be associated with interacting TF pairs.
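One way to quantify an enrichment of this kind is a one-sided hypergeometric test. The sketch below is an assumption for illustration: the source does not state which test was used, and the numbers are toy values rather than results from the study.

```python
from math import comb


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n pairs are drawn without replacement from N candidates,
    K of which correspond to known interacting TF pairs."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 10 candidate pairs, 5 known interacting, 5 predicted,
# and all 5 predicted pairs are known interacting.
p_value = hypergeom_upper_tail(k=5, K=5, n=5, N=10)
```

A small p-value under this model would indicate that the predicted motif pairs overlap known interacting TF pairs more often than expected by chance.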
Wang, S., Hu, H., & Li, X. (2022). A systematic study of motif pairs that may facilitate enhancer–promoter interactions. Journal of Integrative Bioinformatics, 19(1).