Associations between the urban exposome and type 2 diabetes: Results from penalised regression by least absolute shrinkage and selection operator and random forest models

Haykanush Ohanyan, Lützen Portengen, Oriana Kaplani, Anke Huss, Gerard Hoek, Joline W. J. Beulens, Jeroen Lakerveld, Roel Vermeulen

Research output: Contribution to journal › Article › Academic › peer-review



BACKGROUND: Type 2 diabetes (T2D) is thought to be influenced by environmental stressors such as air pollution and noise. Although environmental factors are interrelated, studies considering the exposome are lacking. We simultaneously assessed a variety of exposures in their association with prevalent T2D by applying penalised regression (Least Absolute Shrinkage and Selection Operator, LASSO), Random Forest (RF), and Artificial Neural Network (ANN) approaches. We contrasted the findings with single-exposure models that included risk factors consistently associated with T2D in previous studies.
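
For readers unfamiliar with the method, the general textbook form of an L1-penalised (LASSO) logistic regression for a binary outcome such as prevalent T2D is sketched below. This is an illustration, not the authors' exact specification: the abstract does not state the link function, the choice of the penalty parameter, or whether the adjustment covariates were penalised. The symbols (y_i for T2D status, x_i for the exposure vector, J for the number of exposures, lambda for the penalty strength) are assumptions introduced here.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \left\{
    -\frac{1}{n}\sum_{i=1}^{n}
      \bigl[\, y_i \log \pi_i + (1 - y_i)\log(1 - \pi_i) \,\bigr]
    + \lambda \sum_{j=1}^{J} \lvert \beta_j \rvert
  \right\},
\qquad
\pi_i = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + x_i^{\top}\beta)\bigr)}
```

The absolute-value penalty shrinks coefficients towards zero and sets some of them exactly to zero, which is why LASSO can be used to select variables from a large set of correlated exposures.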

METHODS: Baseline data (n = 14,829) of the Occupational and Environmental Health Cohort study (AMIGO) were enriched with 85 exposome factors (air pollution, noise, built environment, neighbourhood socio-economic factors, etc.) using participants' home addresses. Questionnaires were used to identify participants with T2D (n = 676; 4.6 %). Models in all statistical approaches were adjusted for individual-level socio-demographic variables.

RESULTS: Lower average home values, a higher share of non-Western immigrants and higher surface temperatures were related to a higher risk of T2D in the multivariable models (LASSO, RF). Selected variables differed between the two multivariable approaches, especially for weaker predictors. Some established risk factors (air pollutants) appeared in the univariate analysis but were not among the most important factors in the multivariable analysis. Other established factors (green space) did not appear in the univariate analysis but did appear in the multivariable analysis (RF). Average estimates of the prediction error (logLoss) from nested cross-validation showed that LASSO outperformed both the RF and ANN approaches.
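
The prediction error used for the model comparison, logLoss, is the mean binary cross-entropy between observed outcomes and predicted probabilities, where lower values indicate better predictions. A minimal sketch of the metric follows. The function name `log_loss`, the clipping constant `eps` and the toy inputs are assumptions for illustration and are not taken from the study.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy (log loss); lower is better.

    y_true: observed outcomes coded 0/1 (e.g. 1 = T2D reported).
    p_pred: predicted probabilities of the outcome being 1.
    eps:    clipping constant that keeps log() finite (an assumption).
    """
    if len(y_true) != len(p_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical toy data: an uninformative prediction of 0.5 for every
# participant gives a log loss of ln(2), about 0.693.
print(log_loss([1, 0], [0.5, 0.5]))
```

In a nested cross-validation, a metric like this would be computed on each outer test fold and averaged across folds, while tuning parameters are chosen in the inner folds only.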

CONCLUSIONS: Neighbourhood socio-economic and socio-demographic characteristics and surface temperature were consistently associated with the risk of T2D. For other physical-chemical factors, associations differed by analytical approach.

Original language: English
Article number: 107592
Journal: Environment International
Early online date: 18 Oct 2022
Publication status: Published - 1 Dec 2022


  • Deep learning
  • Machine learning
  • Neighbourhood socio-demographic characteristics
  • Neighbourhood socio-economic position
  • Temperature
