Debiasing Prompts

At the Prompt Level

  1. Multi-shot Prompting with Diverse Examples

    https://arxiv.org/abs/2212.08061 (On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning, Dec 2022)

    From the paper: "We find that zero-shot CoT reasoning in sensitive domains significantly increases a model's likelihood to produce harmful or undesirable output, with trends holding across different prompt formats and model variants."

    I tested the prompt "Let's think step by step about which students are more likely to cause trouble in the classroom" on OpenAI's models, and they refused to answer, whether race was implicit or explicit. Even when given good evidence, the model still refuses to pick one.


    It takes Juan actually beating up a teacher before the model will admit he is more likely to cause trouble.

  2. Output Filtering

    • Example: Implement filters to detect and remove biased or inappropriate content from LLM outputs.
    • Explanation: For instance, when an LLM generates content for history lessons, a filter can be used to detect and remove any text that contains offensive slurs, hate speech, or explicit biases against specific groups. This ensures the generated educational content is respectful and unbiased. (A minimal code sketch combining this filtering step with output re-ranking appears after this list.)
  3. Output Rewriting

    • Example: Use rule-based systems to identify and rewrite biased portions of the LLM output.
    • Explanation: If an LLM-generated sentence implies gender bias (e.g., "The engineer is a man"), a rewriting system can modify it to a more neutral statement (e.g., "The engineer could be anyone"). This helps create unbiased educational materials.
  4. Output Re-ranking

    • Example: Rank multiple LLM outputs based on bias reduction criteria and select the least biased output.
    • Explanation: When generating explanations for science concepts, if the LLM produces multiple outputs, they can be ranked to prioritize those that are least biased. For instance, choosing "Scientists from various backgrounds have contributed to this field" over "Male scientists have contributed to this field."
  5. Output Calibration

    • Example: Adjust the probabilities of LLM outputs to correct biased predictions.
    • Explanation: If an LLM tends to favor certain demographics in its predictions, calibrating these probabilities using representative data can mitigate this bias. For example, if predicting students' performance, ensure the model's outputs fairly represent all demographics.
  6. Persona-based Prompting

    • Example: Assign specific personas to the LLM during prompting to reduce biases.
    • Explanation: Prompt the LLM with a persona that values diversity and inclusion. For example, "You are an experienced educator who values diversity. Create a lesson plan on career exploration that avoids gender stereotypes." This encourages the model to generate unbiased educational content.
  7. Chain-of-Thought (CoT) Prompting

    • Example: Encourage the LLM to provide step-by-step reasoning in its responses.
    • Explanation: For math problems, prompt the LLM to show its step-by-step reasoning. For instance, "Explain each step of solving this equation." This makes the reasoning process transparent, allowing for the identification and correction of any biased assumptions.

By employing these prompt engineering methods, educators can leverage LLMs more effectively while minimizing the risk of biased outputs in educational settings.
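
To make the output filtering and output re-ranking ideas above concrete, here is a minimal Python sketch. The word lists are illustrative placeholders (not a production lexicon), and the hard-coded candidate list stands in for multiple samples from whatever LLM call you already use.

  # Minimal sketch of output filtering (technique 2) and output re-ranking
  # (technique 4); word lists and candidates are illustrative placeholders.
  import re

  BLOCKLIST = {"slur1", "slur2"}       # placeholder for an offensive-term lexicon
  GENDERED = {"he", "she", "man", "woman", "male", "female"}

  def passes_filter(text: str) -> bool:
      """Output filtering: reject any candidate containing a blocked term."""
      tokens = set(re.findall(r"[a-z']+", text.lower()))
      return not (tokens & BLOCKLIST)

  def bias_score(text: str) -> int:
      """Crude re-ranking criterion: count gendered words (lower is better)."""
      return sum(t in GENDERED for t in re.findall(r"[a-z']+", text.lower()))

  def least_biased(candidates: list[str]) -> str:
      """Output re-ranking: drop filtered-out candidates, pick the lowest-bias one."""
      kept = [c for c in candidates if passes_filter(c)]
      return min(kept, key=bias_score) if kept else ""

  candidates = [
      "Male scientists have contributed to this field.",
      "Scientists from various backgrounds have contributed to this field.",
  ]
  print(least_biased(candidates))  # -> the second, gender-neutral sentence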

Papers

I only included one here because most of the methods in the papers involve modifying the LLM itself, and some are really complicated.

Bias in Generated Code

https://arxiv.org/abs/2309.14345

Result: has limitations, but can be quite effective.

See page 8 of the paper.

See also: Revealing Biases

The paper proposes several bias mitigation techniques, including zero-shot, one-shot, and few-shot prompting, as well as Chain-of-Thought (CoT) prompting strategies. Here are the examples it provides for these techniques:

Scenario 1: Bias Mitigation with Direct Prompt Engineering Strategies

Zero-shot Prompting

  • Prompt:
    • "Please avoid generating biased code. The protected attributes are age, gender, region, education, occupation, and race. Please avoid the code containing bias for these attributes."

One-shot Prompting

  • Prompt:
    • Zero-shot prompt + Example 1:
      # Example 1
      def assess_employability(education, age, experience):
          score = 0
          if education in ['PhD', 'Masters']:
              score += 2
          elif education in ['Bachelor', 'Associate']:
              score += 1
          if age >= 30 and age <= 50:
              score += 2
          elif (age > 50 and age <= 60) or (age >= 20 and age < 30):
              score += 1
          if experience >= 5:
              score += 2
          else:
              score += 1
          if score > 5:
              return "High Employability"
          elif score > 3 and score <= 5:
              return "Medium Employability"
          else:
              return "Low Employability"
      

Few-shot Prompting

  • Prompt:
    • One-shot prompt + Example 2:
      # Example 2
      def health_insurance(age, region, bmi, children, smoker, charges):
          score = 0
          if region in ['northwest', 'southeast']:
              score += 2
          elif region in ['southwest', 'northeast']:
              score += 1
          if age >= 18 and age <= 60:
              score += 2
          if smoker == 'yes':
              score += 2
          if bmi >= 30:
              score += 1
          if children > 0:
              score += 1
          if score > 5:
              return "High Insurance Risk"
          elif score > 3 and score <= 5:
              return "Medium Insurance Risk"
          else:
              return "Low Insurance Risk"
      

CoT1 (Chain-of-Thought Prompt 1)

  • Prompt:
    • Zero-shot prompt + "Please think step by step."

CoT2 (Chain-of-Thought Prompt 2)

  • Prompt:
    • CoT1 prompt + "Consider which attributes may cause bias, and then avoid using these attributes in the code."

Scenario 2: Bias Mitigation with Test Analysis Feedback in Conversation

Zero-shot Prompting

  • Prompt:
    • Zero-shot prompt in Scenario 1 + "Please correct the identified bias in the code based on the report log." + Feedback.
  • Result: gender bias removed, but age is still used as a factor.

One-shot Prompting

  • Prompt:
    • Zero-shot prompt in Scenario 2 + Example 1.

Few-shot Prompting

  • Prompt:
    • One-shot prompt in Scenario 2 + Example 2.

CoT1 (Chain-of-Thought Prompt 1)

  • Prompt:
    • Zero-shot prompt in Scenario 2 + "Please think step by step."

CoT2 (Chain-of-Thought Prompt 2)

  • Prompt:
    • CoT1 prompt in Scenario 2 + "Consider which attributes may cause bias, and then avoid using these attributes in the code."
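
Putting the variants above together, the prompts compose by simple concatenation. The sketch below is my reading of the paper's setup; the example code and the feedback report are abbreviated as placeholders, and the exact wording or ordering in the paper may differ.

  # How the prompt variants compose (my reading; placeholders stand in for the
  # full code examples and the test-analysis report log).
  ZERO_SHOT = ("Please avoid generating biased code. The protected attributes are "
               "age, gender, region, education, occupation, and race. Please avoid "
               "the code containing bias for these attributes.")
  EXAMPLE_1 = "<Example 1: assess_employability code above>"
  EXAMPLE_2 = "<Example 2: health_insurance code above>"

  # Scenario 1: direct prompt engineering
  ONE_SHOT = ZERO_SHOT + "\n" + EXAMPLE_1
  FEW_SHOT = ONE_SHOT + "\n" + EXAMPLE_2
  COT1 = ZERO_SHOT + " Please think step by step."
  COT2 = COT1 + (" Consider which attributes may cause bias, and then avoid "
                 "using these attributes in the code.")

  # Scenario 2: the same variants, plus test-analysis feedback in conversation
  FEEDBACK = "<report log from the bias testing step>"
  S2_ZERO_SHOT = (ZERO_SHOT + " Please correct the identified bias in the code "
                  "based on the report log. " + FEEDBACK)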

Auto-Debias

Main takeaway: a new way to detect whether a prompt is biased, by generating multiple predictions and checking whether they lean toward a particular group.

Published in May 2022, before LLMs became so popular. It covers biased prompt detection and debiasing (through fine-tuning). The bias detection method can be adapted for some of our use cases.

https://github.com/Irenehere/Auto-Debias

https://aclanthology.org/2022.acl-long.72/

https://aclanthology.org/2022.acl-long.72.pdf

How a Biased Prompt Is Detected

In the masked-token-prediction step, we need to generate quite a lot of predictions to get the distribution. But this is fast and inexpensive because we are not calling a large LLM; we just use a masked language model like RoBERTa to fill in the mask.
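
Here is a minimal sketch of that detection idea using the Hugging Face fill-mask pipeline. It is not Auto-Debias's exact implementation (the paper measures disagreement between prediction distributions with a divergence-based metric); the probe template and the overlap measure below are simplified stand-ins.

  # Sketch: probe a masked LM with a template where only the demographic term
  # changes, then compare the predicted-token distributions. Low agreement
  # suggests the prompt elicits group-dependent (biased) completions.
  from transformers import pipeline

  fill_mask = pipeline("fill-mask", model="roberta-base")
  TEMPLATE = "The {group} person worked as a <mask>."   # hypothetical probe prompt

  def top_token_probs(group: str, top_k: int = 50) -> dict:
      """Return {token: probability} over the top_k mask predictions."""
      preds = fill_mask(TEMPLATE.format(group=group), top_k=top_k)
      return {p["token_str"].strip(): p["score"] for p in preds}

  def agreement(dist_a: dict, dist_b: dict) -> float:
      """Probability mass the two (truncated) distributions share; higher = more similar."""
      shared = set(dist_a) & set(dist_b)
      return sum(min(dist_a[t], dist_b[t]) for t in shared)

  male, female = top_token_probs("male"), top_token_probs("female")
  print("agreement:", agreement(male, female))  # low agreement -> prompt leans on gender

As I understand the paper, the prompts that maximize this kind of disagreement across demographic groups are the ones kept as "biased prompts" and then used in the fine-tuning (debiasing) stage.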