**Thursday 24th July**

**Using statistics to learn about sensitive issues**

**Dr Heiko Grossman**

Surveys, opinion polls, probability and statistics are all used to obtain sensible estimates of quantities of interest.

If you are going to design a survey and ask questions, you need to be careful about how you phrase them.

Some examples:

Should the UK adopt the Euro? Our group didn’t mind answering this and two of us said yes.

Do you consume illegal substances? Our group thought this was a sensitive question and wouldn’t get an honest answer. A better question needs to be substituted. If you did answer truthfully what would people like your employer do with the information?

Examples of sensitive issues: Health; Finances; Sexuality; Domestic violence

Why is direct questioning problematic?

It could be potentially embarrassing;

There could be a refusal to respond to the question;

The response to the question may be inaccurate;

If the interviewer says the questions are being asked in confidence do you believe him/her?

Gathering reliable information about sensitive issues is sometimes necessary for policy making etc.

How could this be done? Our group thought that the questions could be asked anonymously or perhaps set them as part of a scenario.

We were given a newspaper headline to read – “Former pupil claims he was beaten by a teacher” and we were then shown two questions.

Question A

Has there ever been an incident where a student was beaten by a teacher at your school?

Question B

Has there never been an incident where a student was beaten by a teacher at your school?

We asked whether “beaten” was the right word to use. We also discussed whether the questions would get honest answers.

There is a way round this problem. Instructions for answering:

1) Roll the die and conceal the outcome.

2) If the roll results in a 1, 2, 3 or 4, then write your answer to Question A on the card.

3) If the roll results in a 5 or 6, then write your answer to Question B on the card.
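The die-roll procedure above is easy to simulate. A minimal sketch, assuming truthful respondents (the function name and the 30% figure are my own illustration, not from the talk):

```python
import random

def simulate_rrt(true_status, p=2/3, seed=0):
    """Simulate the die-roll randomized response procedure (sketch).

    true_status: list of booleans, True if the respondent's honest
    answer to Question A is 'yes'.
    With probability p (a roll of 1-4) the respondent answers Question A;
    otherwise (a roll of 5 or 6) Question B, its logical opposite.
    Everyone answers truthfully, but only 'yes'/'no' is recorded.
    """
    rng = random.Random(seed)
    answers = []
    for status in true_status:
        if rng.random() < p:       # Question A selected
            answers.append(status)
        else:                      # Question B: logical opposite of A
            answers.append(not status)
    return answers

# 1000 simulated teachers, 30% of whom would truthfully say 'yes' to A;
# the expected 'yes' rate is p*0.3 + (1 - p)*0.7, about 0.433 here.
population = [i < 300 for i in range(1000)]
print(sum(simulate_rrt(population)))
```

Note that the recorded answers alone never reveal which question any individual answered, which is exactly the point of the procedure.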

Dissecting the procedure

1) Question B is the logical opposite of Question A

2) Every individual answers a randomly selected question

3) No way of knowing which question a person has answered

4) Procedure ensures complete confidentiality

5) More likely to obtain honest answers.

When our group did this activity we got two ‘yes’ and 12 ‘no’ answers. You couldn’t say which question any individual had answered, and we don’t know which question produced each response, but we can still estimate proportions.

How can we estimate the proportion of teachers who gave a ‘yes’ response to Question A?

Let’s use some maths!

Procedure we have used is known as Randomized Response Technique (RRT)

http://en.wikipedia.org/wiki/Randomized_response

It was invented by S. L. Warner (1965)

The idea of RRT approach:

1. Assume that respondents answer truthfully.

2. Every respondent receives Question A with probability p and

Question B with probability 1 − p.

3. Want to know proportion π of teachers for which correct answer to

Question A is ‘yes’.

4. Probability of answering ‘yes’ or ‘no’ irrespective of the question depends on π.

5. For a sample of n teachers write down the joint probability (likelihood) of obtaining k ‘yes’ and n − k ‘no’ answers, which also depends on π.

6. Find the value π that maximizes the likelihood function and use this as an estimate (πˆ) of π.

Probability of ‘yes’ and ‘no’ answers for a single person

If respondents answer truthfully, then irrespective of which question was selected

P(‘yes’) = pπ + (1 − p)(1 − π) and P(‘no’) = 1 − pπ − (1 − p)(1 − π).

Likelihood for k ‘yes’ and n − k ‘no’ answers in a sample of size n

The likelihood is equal to

L(π) = C(n, k) × [pπ + (1 − p)(1 − π)]^k × [1 − pπ − (1 − p)(1 − π)]^(n − k),

and maximizing it over π gives the estimate

πˆ = ((k/n) + p − 1)/(2p − 1).

Looking at the data

Question A was selected with probability p = 2/3 and Question B with probability 1 − p = 1/3.

For the estimate (πˆ) to be between 0 and 1 the number k of ‘yes’ answers must satisfy n(1 − p) ≤ k ≤ np, that is n/3 ≤ k ≤ 2n/3.
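The estimator and its validity condition can be written out in a few lines of Python (a sketch; the function name is my own):

```python
def warner_estimate(k, n, p):
    """Warner (1965) RRT point estimate of the sensitive proportion pi.

    k: number of 'yes' answers, n: sample size,
    p: probability that Question A was selected (p != 1/2).
    Returns the raw estimate (k/n + p - 1)/(2p - 1); it only lies in
    [0, 1] when n*(1 - p) <= k <= n*p.
    """
    if p == 0.5:
        raise ValueError("p = 1/2 makes pi unidentifiable")
    return (k / n + p - 1) / (2 * p - 1)

# The group's data: 2 'yes' answers out of 14, with p = 2/3.
print(round(warner_estimate(2, 14, 2 / 3), 3))  # -0.571, outside [0, 1]
```

With p = 2/3 the validity condition is n/3 ≤ k ≤ 2n/3, which the group’s data (k = 2, n = 14) fails, so the raw estimate falls below zero.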

We then used the group’s data to do this analysis. There were n = 14 people in our group, with k = 2 ‘yes’ answers and n − k = 12 ‘no’ answers.

The probability of a ‘yes’ or ‘no’ answer for a single person can now be evaluated: p = 2/3 was fixed by the die-roll procedure, and k came from the survey.

πˆ = ((2/14) + p − 1)/(2p − 1) = ((2/14) + (2/3) − 1)/(1/3) = (3 × 2)/14 − 1 = 6/14 − 1 = −4/7 ≈ −0.57.

Unfortunately this gives a negative number: the condition n/3 ≤ k ≤ 2n/3 fails because k = 2 < 14/3 ≈ 4.7, meaning that the survey sample was far too small.

Some remarks

1) In order to give accurate estimates the RRT requires a larger sample size than the number of people in this room, e.g. n = 1000.

2) The value πˆ is a point estimate and in order to quantify the uncertainty of the estimate it is recommended to also compute a confidence interval for the true proportion π.
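For remark 2, a confidence interval can be obtained from the standard large-sample (Wald) formula for Warner’s estimator, Var(πˆ) = λ(1 − λ)/(n(2p − 1)²) with λ = k/n. This is a textbook sketch with made-up numbers, not a calculation from the talk:

```python
import math

def warner_ci(k, n, p, z=1.96):
    """Large-sample (Wald) confidence interval for pi under Warner's RRT.

    pi_hat = (k/n + p - 1)/(2p - 1), and the delta-method variance is
    lam*(1 - lam)/(n*(2p - 1)**2) with lam = k/n; z = 1.96 gives an
    approximate 95% interval.
    """
    lam = k / n
    pi_hat = (lam + p - 1) / (2 * p - 1)
    se = math.sqrt(lam * (1 - lam) / n) / abs(2 * p - 1)
    return pi_hat - z * se, pi_hat + z * se

# Illustrative: 450 'yes' answers out of n = 1000 with p = 2/3.
lo, hi = warner_ci(450, 1000, 2 / 3)
print(round(lo, 3), round(hi, 3))
```

Note how the (2p − 1)² term in the variance inflates the interval relative to direct questioning, which is precisely the efficiency loss described in remark 3.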

3) If direct questioning is possible, then it is more efficient for estimating proportions than the RRT, that is, a smaller sample size is needed.

4) The increased sample size when using the RRT is the price one has to pay for improving respondent cooperation by asking the individual to reveal less.

A variant of the basic RRT

The Questions A and B we have looked at are logical opposites.

Sometimes this can lead to clumsy and confusing questions such as “Are you not in favour of quarantining people with AIDS?”

An alternative is the unrelated question RRT (Greenberg et al., 1969). For our example, one could for instance ask:

Question A

Has there ever been an incident where a student was beaten by a teacher at your school?

Question B

Do you like pets?
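With an unrelated Question B, the arithmetic changes: if the proportion π_B of ‘yes’ answers to the innocuous question is known (for instance from a separate pilot study), then P(‘yes’) = pπ + (1 − p)π_B, which rearranges into the estimator below. A sketch with illustrative numbers of my own:

```python
def unrelated_question_estimate(k, n, p, pi_b):
    """Greenberg et al. (1969) unrelated-question RRT estimate (sketch).

    Assumes pi_b, the proportion answering 'yes' to the innocuous
    Question B (e.g. 'Do you like pets?'), is known. From
    P('yes') = p*pi + (1 - p)*pi_b, solving for pi gives:
    """
    return (k / n - (1 - p) * pi_b) / p

# Illustrative numbers (not from the talk): 400 'yes' out of 1000,
# p = 2/3, and 80% of people like pets.
print(round(unrelated_question_estimate(400, 1000, 2 / 3, 0.8), 3))  # 0.2
```

The appeal of this variant is that Question B can be phrased naturally, avoiding the awkward negations that logical opposites sometimes require.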

Some real world applications:

1) Estimating the prevalence of xenophobia and anti-Semitism in Germany: A comparison of randomized response and direct questioning

http://www.ncbi.nlm.nih.gov/pubmed/23017963

The survey comprised 2000 people, who were asked whether they thought there were too many foreigners in Germany. When asked directly on the phone, 38.8% said yes, but using the RRT this percentage went up to 47%.

2) How much do Chinese applicants fake?

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2062030

The survey found that 37% of Chinese respondents would lie, as would 56% of Americans, but only 5% of the Swiss.

3) Drug and alcohol dependence

http://www.ncbi.nlm.nih.gov/pubmed/19740612

Official doping tests reveal only 0.81% positive test results, while according to the RRT 6.8% of the athletes confessed to having practiced doping.

**References**

Greenberg, B.G., Abul-Ela, A.L.A., Simmons, W.R. & Horvitz, D.G. (1969). The unrelated question randomized response model: Theoretical framework. Journal of the American Statistical Association 64, 520–539.

Warner, S.L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 60, 63–69.