This is an expository paper. Here we propose a
decision-theoretic framework for addressing aspects of the problem of
confidentiality of information in publicly released data. Our basic premise is that
the problem needs to be conceptualized by looking at the actions of three
agents: a data collector, a legitimate data user, and an intruder. Here we
aim to prescribe the actions of the first agent, who desires to provide
useful information to the second agent but must protect against possible
misuse by the third. The first agent is under the constraint that the
released data must be made public to all; in some societies this may not be
the case.

A novel aspect of our paper is that all utilities, which are fundamental to decision making, are expressed
in terms of Shannon's information entropy. Thus what gets released is a distribution whose entropy maximizes
the expected utility of the first agent. This means that the distribution
that gets released will be different from that which generates the collected
data. The discrepancy between the two distributions can be assessed via the
Kullback-Leibler cross-entropy function. Our proposed strategy therefore
boils down to the notion that it is the information content of the data, not
the actual data, that gets masked. Current practice in "statistical disclosure limitation" masks the observed data
via transformations or cell suppression. These transformations are guided by
balancing what are known as "disclosure risks" and "data utility". The entropy-indexed utility functions we propose
are isomorphic to these two notions. Thus our approach provides a
formal link to current practice in statistical disclosure
limitation.
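To make the two information-theoretic quantities above concrete, the following minimal sketch computes the Shannon entropy of a discrete distribution and the Kullback-Leibler divergence between the distribution generating the collected data and a released distribution. It is illustrative only and does not reproduce the utility maximization proposed in the paper: the category probabilities are made up, the helper `mix_with_uniform` is a hypothetical masking step (shrinking toward the uniform distribution) chosen only because it raises entropy, and the direction D(collected || released) is an assumption.

```python
import math


def shannon_entropy(p):
    """Shannon entropy H(p) = -sum p_i log p_i, in nats (0 log 0 is taken as 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) = sum p_i log(p_i / q_i), in nats.

    Returns infinity if q assigns zero probability where p does not.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def mix_with_uniform(p, weight):
    """Hypothetical masking step (not the paper's method): shrink p toward uniform.

    weight=0 returns p unchanged; weight=1 returns the uniform distribution.
    """
    k = len(p)
    return [(1 - weight) * pi + weight / k for pi in p]


# Hypothetical distribution generating the collected data over four categories.
p_collected = [0.7, 0.2, 0.05, 0.05]
q_released = mix_with_uniform(p_collected, 0.5)

print(f"H(collected) = {shannon_entropy(p_collected):.3f} nats")
print(f"H(released)  = {shannon_entropy(q_released):.3f} nats")
print(f"D(collected || released) = {kl_divergence(p_collected, q_released):.3f} nats")
```

Here the released distribution has higher entropy than the generating one, and the divergence quantifies how far the release departs from it; any actual choice of released distribution would instead come from maximizing the first agent's expected utility.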