Ongoing research project · ExposUM 2024

EAD

Environmental Anomaly Detection and Health Impacts: Addressing PFAS Risks
An interdisciplinary project combining environmental data analysis, knowledge representation, anomaly detection, and visual analytics to better understand PFAS contamination and its potential impacts on ecosystems and public health.

PFAS · Ontology Engineering · Knowledge Graphs · Anomaly Detection · Visual Analytics · One Health

Project overview

PFAS are persistent pollutants detected in water, soil, wildlife, and human biological systems. EAD investigates how semantic technologies, machine learning, and visual exploration can be combined to detect anomalies, structure knowledge, and support explainable environmental analysis.

Why PFAS?

PFAS are a large family of highly persistent chemical compounds associated with major environmental and public health concerns. Their spread, persistence, and complex exposure pathways require more integrated analysis methods.

Core ambition

Build a comprehensive framework to integrate heterogeneous data, detect contamination patterns, and relate environmental exposure to potential health impacts.

Scientific perspective

The project follows a One Health vision, linking environmental quality, ecosystem dynamics, and human health through interoperable representations.

Main objectives

Knowledge representation

  • Construct and enrich an ontology set dedicated to environmental pollutants with an emphasis on PFAS.
  • Integrate structured and unstructured sources into a shared semantic model.
  • Support interoperability, reasoning, and reuse through knowledge graphs.
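As a rough illustration of what a shared semantic model makes possible, the sketch below stores a handful of facts as subject-predicate-object triples and answers one question over them: at which sites was a PFAS measured? Every identifier in it (the `ex:` prefix, `ex:measuredSubstance`, `ex:collectedFrom`, the sample and site names) is hypothetical and is not taken from the project's ontology. A real deployment would rely on an RDF store and a query language rather than plain Python lists.

```python
from collections import defaultdict

# Hypothetical triples; none of these identifiers come from the project's ontology.
TRIPLES = [
    ("ex:PFOA", "rdf:type", "ex:PFAS"),
    ("ex:PFOS", "rdf:type", "ex:PFAS"),
    ("ex:sample_001", "ex:measuredSubstance", "ex:PFOA"),
    ("ex:sample_001", "ex:collectedFrom", "ex:site_A"),
    ("ex:sample_002", "ex:measuredSubstance", "ex:PFOS"),
    ("ex:sample_002", "ex:collectedFrom", "ex:site_B"),
    ("ex:sample_003", "ex:measuredSubstance", "ex:Atrazine"),
    ("ex:sample_003", "ex:collectedFrom", "ex:site_C"),
]


def build_index(triples):
    """Map each (subject, predicate) pair to the set of its objects."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def sites_with_pfas(triples):
    """Return the sorted sites where at least one PFAS-typed substance was measured."""
    index = build_index(triples)
    # Substances explicitly typed as PFAS in the graph.
    pfas = {s for s, p, o in triples if p == "rdf:type" and o == "ex:PFAS"}
    sites = set()
    for subject, predicate, obj in triples:
        if predicate == "ex:measuredSubstance" and obj in pfas:
            sites |= index[(subject, "ex:collectedFrom")]
    return sorted(sites)


if __name__ == "__main__":
    print(sites_with_pfas(TRIPLES))
```

The point of the example is that the query never mentions PFOA or PFOS by name: it follows the class membership stated in the graph, so adding a new substance typed as `ex:PFAS` would extend the answer without changing the code.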

Data-driven analysis

  • Identify normal and abnormal behaviors through clustering and anomaly detection.
  • Explore spatio-temporal methods and deep learning models for contamination analysis.
  • Design interactive visual interfaces for transparent and explainable exploration.

Methodological pillars

Ontology engineering

Reuse and extend semantic resources to model pollutants, sources, exposure pathways, environmental effects, and health-related dimensions.

Hybrid anomaly detection

Combine clustering, deep learning, and semantic constraints to detect outliers, contamination zones, and emerging patterns in environmental datasets.
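One way to picture the hybrid idea is a detector that reports a measurement when either a data-driven score or a knowledge-based rule fires, and that records which one did. The sketch below is a simplified stand-in and not the project's method: it replaces clustering and deep learning with a robust z-score based on the median absolute deviation, and it represents the semantic constraint as a per-substance limit. The record fields, the `limits` mapping, and all numeric values are hypothetical placeholders, not regulatory thresholds or project data.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: treat every value as unremarkable.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(records, limits, z_threshold=3.5):
    """Flag records that are statistical outliers and/or exceed a per-substance limit.

    records: list of dicts with hypothetical keys "site", "substance", "value".
    limits:  dict mapping substance name to a placeholder limit (assumption).
    Returns a list of (site, substance, reasons) tuples.
    """
    by_substance = {}
    for rec in records:
        by_substance.setdefault(rec["substance"], []).append(rec)

    flagged = []
    for substance, recs in by_substance.items():
        scores = robust_z_scores([r["value"] for r in recs])
        limit = limits.get(substance)
        for rec, z in zip(recs, scores):
            reasons = []
            if abs(z) > z_threshold:
                reasons.append("statistical_outlier")
            if limit is not None and rec["value"] > limit:
                reasons.append("exceeds_limit")
            if reasons:
                flagged.append((rec["site"], substance, reasons))
    return flagged


if __name__ == "__main__":
    # Illustrative values only, in arbitrary units.
    records = [
        {"site": "A", "substance": "PFOA", "value": 4.0},
        {"site": "B", "substance": "PFOA", "value": 5.0},
        {"site": "C", "substance": "PFOA", "value": 6.0},
        {"site": "D", "substance": "PFOA", "value": 5.5},
        {"site": "E", "substance": "PFOA", "value": 60.0},
        {"site": "F", "substance": "PFOS", "value": 8.0},
        {"site": "G", "substance": "PFOS", "value": 9.0},
        {"site": "H", "substance": "PFOS", "value": 11.0},
    ]
    limits = {"PFOA": 50.0, "PFOS": 10.0}
    for row in flag_anomalies(records, limits):
        print(row)
```

Keeping the list of reasons alongside each flag is a deliberate choice: it lets a domain expert see whether a point stands out statistically, violates a stated constraint, or both.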

Visual analytics

Provide domain experts with interactive views, filters, and explainability features to inspect data, results, and uncertainty.

Current outcomes

Semantic resources

A first ontology and knowledge graph dedicated to PFAS and environmental exposure are available, following Linked Open Data principles.

Related resource: OntoPFAS

Exploration and explainability

A visualisation tool supports interaction with PFAS-related data, including geospatial inspection, filtering, and correlation analysis with explainability-oriented methods.

Clustering and interpolation

Initial work explores clustering approaches to mitigate sampling bias and improve trust in environmental analysis through confidence-aware spatial interpolation.
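To make the notion of confidence-aware interpolation concrete, the sketch below pairs a standard inverse distance weighting estimate with a simple confidence score that decays with the distance to the nearest sample. This is an assumption chosen for illustration, not the interpolation scheme used in the project; the function name, the `scale` parameter, the exponential decay, and the sample values are all hypothetical.

```python
import math


def idw_with_confidence(samples, x, y, power=2.0, scale=1.0):
    """Inverse distance weighting with a distance-based confidence score.

    samples: list of (x, y, value) tuples.
    Returns (estimate, confidence), where confidence lies in (0, 1] and
    shrinks as the query point moves away from the nearest sample.
    """
    if not samples:
        raise ValueError("at least one sample is required")

    weighted = []
    nearest = math.inf
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            # Query point coincides with a sample: return it with full confidence.
            return value, 1.0
        nearest = min(nearest, d)
        weighted.append((1.0 / d ** power, value))

    total = sum(w for w, _ in weighted)
    estimate = sum(w * v for w, v in weighted) / total
    # Hypothetical confidence model: exponential decay with nearest-sample distance.
    confidence = math.exp(-nearest / scale)
    return estimate, confidence


if __name__ == "__main__":
    samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
    print(idw_with_confidence(samples, 1.0, 0.0))
    print(idw_with_confidence(samples, 10.0, 0.0))
```

Returning the confidence next to the estimate means a map can shade or mask sparsely sampled areas instead of presenting every interpolated value as equally reliable.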

Towards enriched data hubs

The project aims to connect raw environmental measurements, structured semantic resources, and broader data platforms for more reusable PFAS knowledge infrastructures.

Project timeline

Phase 1: Foundations. Literature review, documentation, team meetings, and project coordination.
Phase 2: Ontology construction. Building and evaluating an ontology for environmental pollutants, with a focus on PFAS.
Phase 3: Information extraction. Developing models to populate the ontology from structured and textual data sources.
Phase 4: Anomaly detection and analytics. Designing data analysis approaches, contamination clustering strategies, and predictive models.
Phase 5: Visual exploration and reporting. Building expert-oriented interfaces, final evaluation, and dissemination.

Consortium

Partners

  • MISTEA (INRAE / Institut Agro)
  • LIRMM (CNRS / Université de Montpellier / Paul-Valéry Montpellier)
  • EPOC (CNRS)
  • PRODIG (CNRS)

Scientific leadership

The project is led by Pascal Neveu and Lylia Abrouk, in collaboration with specialists in machine learning, chemistry, geography, environmental health, and data journalism.