TrueCome: Effective data truth discovery based on fuzzy clustering with prior constraints

Date

2025

Publisher

Elsevier Inc.

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Data truth discovery is the process of determining accurate information from multiple conflicting data sources. Existing truth discovery schemes suffer from low efficiency and insufficient accuracy caused by noisy data and source unreliability, and face three key limitations in particular: (1) inability to leverage inter-attribute constraints for distance metric learning, (2) lack of effective mechanisms for distinguishing truth clusters from noisy streaming data, and (3) static source reliability estimation that fails to adapt to streaming data dynamics. Few of them can discover the truth for streaming data with noise. To overcome these problems, we propose TrueCome, a possibilistic C-Means truth discovery scheme that leverages constraints between different attributes of an object and applies dynamically updated data source reliability to discover truth for both static and streaming data. TrueCome contains two functional modules: distance learning and truth discovery. The distance learning module constructs a distance function by mining prior constraints on object attributes. The truth discovery module then obtains the true values of an object in three steps: data clustering based on data sample distance, data source reliability, and attribute weights; truth cluster identification by calculating cluster trust degrees; and truth acquisition derived from True Value Clusters (TVCs). In particular, TrueCome employs Maximum A Posteriori (MAP) estimation to adaptively update source reliability (i.e., source weight), allowing it to handle both static and streaming data effectively. Extensive experiments on two real-world datasets and one synthetic dataset demonstrate the superiority of TrueCome over several baselines in terms of accuracy and efficiency, particularly for streaming data with noise. We also validate the design rationality of TrueCome through ablation studies. © 2025 Elsevier Inc.
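The three truth discovery steps named in the abstract (clustering, truth cluster identification, truth acquisition) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single numeric attribute, plain squared Euclidean distance in place of the learned distance function, the standard possibilistic C-Means typicality formula, and fixed source weights (the MAP reliability update and attribute weights are omitted). The function name `pcm_truth` and the parameters `eta`, `m`, and `iters` are hypothetical.

```python
def pcm_truth(claims, weights, init_centers, eta=1.0, m=2.0, iters=50):
    """Illustrative sketch only, not TrueCome itself.

    claims       -- one numeric claimed value per source
    weights      -- assumed (fixed) reliability weight per source
    init_centers -- initial cluster centers
    eta, m       -- PCM bandwidth and fuzzifier (hypothetical defaults)
    """
    def typicalities(centers):
        # Standard PCM typicality: t = 1 / (1 + (d^2 / eta)^(1 / (m - 1)))
        return [
            [1.0 / (1.0 + ((x - c) ** 2 / eta) ** (1.0 / (m - 1.0)))
             for x in claims]
            for c in centers
        ]

    centers = list(init_centers)
    for _ in range(iters):
        typ = typicalities(centers)
        # Step 1: source-weighted center update (the weighting is an assumption)
        centers = [
            sum(w * t ** m * x for w, t, x in zip(weights, row, claims))
            / sum(w * t ** m for w, t in zip(weights, row))
            for row in typ
        ]

    # Step 2: cluster trust = reliability-weighted sum of typicalities
    typ = typicalities(centers)
    trust = [sum(w * t for w, t in zip(weights, row)) for row in typ]

    # Step 3: take the center of the most trusted cluster as the truth
    best = max(range(len(centers)), key=lambda i: trust[i])
    return centers[best], trust


# Four roughly agreeing sources and one low-reliability outlier
claims = [10.0, 10.2, 9.9, 10.1, 25.0]
weights = [1.0, 1.0, 1.0, 1.0, 0.2]
truth, trust = pcm_truth(claims, weights, init_centers=[10.0, 25.0])
print(round(truth, 2), [round(t, 2) for t in trust])
```

In this toy input the cluster around 10 collects almost all of the weighted typicality, so its center is returned as the truth, while the outlier cluster near 25 receives a low trust degree.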

Keywords

Data Analysis, Fuzzy Clustering, Noisy Data, Streaming Data, Truth Discovery

Source

Information Sciences

WoS Q Value

N/A

Scopus Q Value

Q1

Volume

717

Citation

Gao, L., Wu, F., Wang, J., Yan, Z., & Pedrycz, W. (2025). TrueCome: Effective Data Truth Discovery Based on Fuzzy Clustering with Prior Constraints. Information Sciences, 122290.