Scaling Bayesian network discovery through incremental recovery

Castelo, J.R.; Siebes, Arno

Bayesian networks are a type of graphical model that, among other things, allows one to analyze the interactions among the variables in a database. A well-known problem with the discovery of such models from a database is the "problem of high dimensionality": the discovery of a network from a database with a moderate to large number of variables quickly becomes intractable. Most solutions to this problem have relied on prior knowledge about the structure of the network, e.g., through the definition of an order on the variables. With a growing number of variables, however, this becomes a considerable burden on the data miner. Moreover, mistakes in such prior knowledge have large effects on the final network. Another approach is to ask the database, rather than the expert, for insight into the structure of the final network. Our work follows this approach. More specifically, before we start recovering the network, we first cluster the variables based on a chi-squared measure of association. Then we use an incremental algorithm to discover the network. This algorithm uses the small networks discovered for the individual clusters of variables as its starting point. We illustrate the feasibility of our approach with some experiments. In particular, we show that in the case where one knows the network, and thus the order, our algorithm yields almost the same network, which is, moreover, still an I-map.
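The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the chi-squared measure is turned into clusters, so the choices here are assumptions made only for illustration: the Pearson chi-squared statistic on the pairwise contingency table, a fixed threshold, and clusters taken as connected components of the resulting association graph. The names `chi_squared`, `cluster_variables`, and the toy `data` are hypothetical and do not come from the report.

```python
from collections import Counter
from itertools import combinations


def chi_squared(xs, ys):
    """Pearson chi-squared statistic for two discrete columns of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def cluster_variables(data, threshold):
    """Assumed clustering rule: link two variables when their chi-squared
    statistic exceeds `threshold`, then return the connected components."""
    names = list(data)
    parent = {v: v for v in names}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in combinations(names, 2):
        if chi_squared(data[u], data[v]) > threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for v in names:
        clusters.setdefault(find(v), []).append(v)
    return sorted(sorted(c) for c in clusters.values())


# Toy binary data (hypothetical): A and B are identical, C and D are identical,
# and the two pairs are independent of each other.
data = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
    "D": [0, 1, 0, 1, 0, 1, 0, 1],
}
clusters = cluster_variables(data, threshold=3.84)
print(clusters)  # [['A', 'B'], ['C', 'D']]
```

The threshold 3.84 is approximately the 5% critical value of the chi-squared distribution with one degree of freedom, which fits pairs of binary variables only; variables with more states would need a threshold matched to their degrees of freedom. Under the approach in the abstract, each resulting cluster would then get its own small network, and those networks would seed the incremental algorithm.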

Additional Metadata
ACM	PROBABILITY AND STATISTICS (acm G.3), Problem Solving, Control Methods, and Search (acm I.2.8)
MSC	Data analysis (msc 62-07), Graphical methods (msc 62-09), Foundations and philosophical topics (msc 62A01), Measures of association (correlation, canonical correlation, etc.) (msc 62H20), Classification and discrimination; cluster analysis (msc 62H30), Searching and sorting (msc 68P10), Learning and adaptive systems (msc 68T05), Search theory (msc 90B40)
THEME	Information (theme 2)
Publisher	CWI
Series	Information Systems [INS]
Organisation	Database Architectures
Citation	Castelo, J. R., & Siebes, A. (1999). Scaling Bayesian network discovery through incremental recovery. Information Systems [INS]. CWI.

Free Full Text (Final Version, 618kb)
