TrueCome: Effective data truth discovery based on fuzzy clustering with prior constraints
Abstract
Data truth discovery is the process of determining accurate information from multiple conflicting data sources. Existing truth discovery schemes suffer from low efficiency and insufficient accuracy caused by noisy data and unreliable sources. They face three key limitations: (1) they cannot leverage inter-attribute constraints for distance metric learning, (2) they lack effective mechanisms for distinguishing truth clusters from noisy streaming data, and (3) they estimate source reliability statically and therefore fail to adapt to the dynamics of streaming data. Few of them can discover the truth for noisy streaming data. To overcome these problems, we propose TrueCome, a truth discovery scheme based on possibilistic C-Means clustering that leverages constraints between different attributes of an object and applies dynamically updated source reliability to discover truth in both static and streaming data. TrueCome contains two functional modules: distance learning and truth discovery. The distance learning module constructs a distance function by mining prior constraints on object attributes. The truth discovery module then obtains the true values of an object in three steps: data clustering based on sample distance, source reliability, and attribute weights; truth cluster identification by calculating cluster trust degrees; and truth acquisition from the True Value Clusters (TVCs). In particular, TrueCome employs Maximum A Posteriori (MAP) estimation to adaptively update source reliability (i.e., source weight), which allows it to handle both static and streaming data effectively. Extensive experiments on two real-world datasets and one synthetic dataset demonstrate the superiority of TrueCome over several baselines in accuracy and efficiency, particularly for noisy streaming data. Ablation studies further validate the rationality of TrueCome's design. © 2025 Elsevier Inc.
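The three-step truth discovery loop summarized in the abstract (reliability-weighted clustering, truth cluster selection by trust degree, and a MAP update of source reliability) can be illustrated with a small, simplified sketch. It is not TrueCome's implementation. Several parts are assumptions chosen for illustration: an attribute-weighted Euclidean distance stands in for the learned distance function, each object's claims are split into two seed clusters, the trust degree is taken to be the reliability-weighted typicality mass of a cluster, and reliability is the MAP estimate of a source's accuracy under a Beta prior. All function names (`pcm`, `discover`, and so on) are hypothetical.

```python
import statistics


def weighted_sq_dist(x, c, attr_w):
    # Attribute-weighted squared Euclidean distance: an assumed stand-in
    # for TrueCome's learned distance function.
    return sum(a * (xi - ci) ** 2 for a, xi, ci in zip(attr_w, x, c))


def typicality(d2, eta, m):
    # Possibilistic C-means typicality (Krishnapuram-Keller form).
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


def pcm(points, src_w, centers, attr_w, eta=1.0, m=2.0, iters=100):
    """Source-weighted possibilistic C-means over one object's claims."""
    for _ in range(iters):
        t = [[typicality(weighted_sq_dist(p, c, attr_w), eta, m) for p in points]
             for c in centers]
        new = []
        for i, c in enumerate(centers):
            # Each claim contributes in proportion to source weight * typicality^m.
            coef = [w * t[i][j] ** m for j, w in enumerate(src_w)]
            total = sum(coef)
            if total == 0:
                new.append(c)
                continue
            new.append(tuple(sum(k * p[d] for k, p in zip(coef, points)) / total
                             for d in range(len(c))))
        shift = max(weighted_sq_dist(a, b, attr_w) for a, b in zip(new, centers))
        centers = new
        if shift < 1e-12:
            break
    t = [[typicality(weighted_sq_dist(p, c, attr_w), eta, m) for p in points]
         for c in centers]
    return centers, t


def initial_centers(points):
    # Hypothetical seeding: coordinate-wise median and the claim farthest from it.
    med = tuple(statistics.median(p[d] for p in points) for d in range(len(points[0])))
    far = max(points, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, med)))
    return [med, far]


def discover(claims, attr_w, rounds=5, a=2.0, b=2.0):
    """claims: {object: [(source, point), ...]} -> (truths, reliabilities).

    a and b are Beta prior parameters and must both be greater than 1.
    """
    sources = {s for obs in claims.values() for s, _ in obs}
    rel = {s: (a - 1) / (a + b - 2) for s in sources}  # start at the prior mode
    truths = {}
    for _ in range(rounds):
        hits = {s: 0.0 for s in sources}
        counts = {s: 0 for s in sources}
        for obj, obs in claims.items():
            srcs = [s for s, _ in obs]
            pts = [p for _, p in obs]
            centers, t = pcm(pts, [rel[s] for s in srcs], initial_centers(pts), attr_w)
            # Assumed trust degree: reliability-weighted typicality mass of a cluster.
            trust = [sum(rel[s] * t[i][j] for j, s in enumerate(srcs))
                     for i in range(len(centers))]
            best = max(range(len(centers)), key=trust.__getitem__)
            truths[obj] = centers[best]
            for j, s in enumerate(srcs):
                hits[s] += t[best][j]  # soft count of agreement with the truth
                counts[s] += 1
        # MAP (posterior mode) of each source's accuracy under a Beta(a, b) prior.
        rel = {s: (hits[s] + a - 1) / (counts[s] + a + b - 2) for s in sources}
    return truths, rel


if __name__ == "__main__":
    claims = {
        "o1": [("s1", (10.0,)), ("s2", (10.2,)), ("s3", (9.9,)),
               ("s4", (25.0,)), ("s5", (10.1,))],
        "o2": [("s1", (3.0,)), ("s2", (3.1,)), ("s3", (2.9,)),
               ("s4", (-8.0,)), ("s5", (3.0,))],
    }
    truths, rel = discover(claims, attr_w=(1.0,))
    print(truths)
    print(rel)
```

The possibilistic (rather than probabilistic) membership matters here: typicalities are not forced to sum to 1 across clusters, so a noisy claim such as source `s4`'s can have low typicality everywhere. It then contributes little to any truth value, and its source's MAP reliability drops, which in turn down-weights that source in the next round.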