Optimal Distribution of Restricted Ranges in Secure Statistical Database

One of the goals of statistical databases is to provide statistics about groups of individuals while protecting their privacy. Sometimes, by correlating enough statistics, sensitive data about an individual can be inferred. The problem of protecting against such indirect disclosures of confidential data is called the inference problem, and a mechanism that protects against them is called an inference control. A good inference control mechanism should be effective (it should provide security to a reasonable extent) and feasible (a practical way must exist to enforce it). At the same time, it should retain the richness of the information revealed to the users. During the last few years, several techniques have been developed for controlling inferences. One of the earliest inference controls for statistical databases restricts responses computed over query-sets that are too small or too large. However, this technique is easily subverted. Recently, some results were presented (see [Michalewicz & Chen, 1989]) for measuring the usability and security of statistical databases for different distributions of frequencies of statistical queries, based on the concept of multiranges. In this paper we use a genetic algorithm approach to maximize the usability of a statistical database while providing a reasonable level of security. We also discuss the importance of this new technique.