The Mann-Whitney test would not solve the problem of dependent data.

My name is Jacob Miller, and I am having trouble thinking of a model for measuring the probability of a positive ID.

For example, let's say I have n samples, each with x identifiable traits, and each sample is either control or treatment and either positive or negative.

Now, let's assume that there is a recipient for the data. All they see is the average number of controls who were positive, controls who were negative, treatments who were positive, and treatments who were negative.

Let's set a lower bound on the number of positive observations in a cohort (y >= 2). What is the probability that the recipient of the aggregated data could get a positive ID on one of the positive subjects, assuming the worst possible scenario (that there were only two positive observations)?

What process should I use to establish the probability of identification? Basically, what I want to know is, given a certain number of samples in treatment and control groups, who can read as either positive or negative, what is the probability that a recipient of the aggregated results could identify a single test subject between two time points?

For example:

Company A wants to know if its new ad campaign is working. It has one store location. It has sales data from before the start of the campaign. Firm A offers to monitor the effectiveness of the campaign. They have tools to measure whether an individual sees an ad, and also whether that turns into a purchase. However, they will only send reports with the aggregates of n individuals. What is the probability that, given that Company A also has individual sales data, a change in the aggregates sent by Firm A could result in the positive identification of a customer of Company A?

Thanks!

Yes, it is. The practice of bivariate screening is common, but not good. Not only can a variable that was not significant bivariately be significant in a multiple regression, but it can also affect the estimates of the other parameters.
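
A minimal sketch of how this can happen, using simulated data (everything here is an illustrative assumption, not data from this thread): `x2` is pure nuisance variation that is unrelated to `y` on its own, but `x1` is contaminated by it, so including `x2` alongside `x1` cleans up `x1` and makes `x2` strongly significant. The helper names `invert` and `ols_t_stats` are hypothetical, written in plain Python so the example has no dependencies.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols_t_stats(predictors, y):
    """Fit y on an intercept plus the predictors; return one t-statistic per predictor."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t = [beta[a] / math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return t[1:]  # drop the intercept


# Simulated data (assumed setup): y depends only on `signal`,
# x1 = signal + nuisance, x2 = nuisance.
rng = random.Random(0)
n = 500
signal = [rng.gauss(0, 1) for _ in range(n)]
nuisance = [rng.gauss(0, 1) for _ in range(n)]
x1 = [s + z for s, z in zip(signal, nuisance)]
x2 = nuisance
y = [s + rng.gauss(0, 0.5) for s in signal]

t_bivariate = ols_t_stats([x2], y)[0]    # x2 screened on its own
t_x1, t_x2 = ols_t_stats([x1, x2], y)    # x2 alongside x1

print(f"x2 alone:          t = {t_bivariate:.2f}")
print(f"x2 with x1 in model: t = {t_x2:.2f}")
```

In this setup the population correlation between `x2` and `y` is zero, so a bivariate screen would usually drop `x2`, yet in the two-predictor model its true coefficient is -1 and its t-statistic is large in magnitude.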

Peter
