My Take on LLM Biases
https://news.ycombinator.com/item?id=40763835 - comment on "ChatGPT is biased against resumes with credentials that imply a disability" (washington.edu):
This is expected behavior if you understand that the output of any data-based modeling process (machine learning generally) is a concatenation of the cumulative input data distributions and nothing else. So of course a model will be biased against people hinting at disabilities: existing hiring departments are well known for discriminating and are regularly fined for it.
So the only data the model could possibly learn from cannot teach it any other traversal of the state space, because there are no giant databases of ethical hiring decisions.
Why don't those databases exist? Because ethical hiring doesn't happen at a wide enough scale to provide a larger state space than the data on biased hiring.
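To make that concrete, here is a minimal sketch (the synthetic data, feature names, and effect sizes are all mine, invented purely for illustration): fit a logistic regression on hiring labels that were generated with a built-in penalty for disability, and the model reads that penalty straight back out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

skill = rng.normal(size=n)                 # true qualification signal
disability = rng.binomial(1, 0.1, size=n)  # resume hints at a disability

# Historical labels: hiring departments discount disability regardless of skill.
hired = (skill - 1.5 * disability + rng.normal(scale=0.5, size=n)) > 0

X = np.column_stack([skill, disability])
model = LogisticRegression().fit(X, hired)

# Two candidates identical in skill, differing only in the disability feature:
same_skill = np.array([[1.0, 0], [1.0, 1]])
print(model.predict_proba(same_skill)[:, 1])
# The second probability comes out markedly lower: the "bias" the model
# exhibits is just the training distribution read back out.
```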
Ethical garbage in (all current training datasets) == ethical garbage out (all models, modulo response nerfing).
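And "response nerfing" is just a filter on the surface output. A sketch (the function names here are hypothetical, not any real system's API) of why it doesn't touch the learned function underneath:

```python
def biased_score(resume: dict) -> float:
    """Stand-in for a model fit on biased data (as in the sketch above)."""
    return 0.8 - 0.4 * resume.get("mentions_disability", 0)

def nerfed_response(resume: dict) -> str:
    """Post-hoc filter: refuse to answer when the protected attribute is set."""
    if resume.get("mentions_disability"):
        return "I can't rank candidates based on disability status."
    return f"score={biased_score(resume):.2f}"

# The refusal masks the surface response, but biased_score -- and anything
# built on top of it, like shortlisting or ranking -- still carries the
# training data's bias.
print(nerfed_response({"mentions_disability": 1}))
print(nerfed_response({"mentions_disability": 0}))
```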
It is mathematically impossible to create an "aligned" artificial intelligence that serves human goals if humans do not provide demonstration data that is ethical in nature, and we currently do not incentivize the creation of such data.
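One hedged way to formalize that claim (the notation is mine, not from the original comment): empirical risk minimization can only approximate the conditional behavior of the training distribution, so behavior with no mass in the data has no optimum to converge to.

```latex
% If training selects parameters by minimizing expected loss over distribution D,
% the fitted model approximates D's conditional behavior:
\[
  \hat{\theta} = \arg\min_{\theta}\; \mathbb{E}_{(x,y)\sim D}\big[\ell(f_{\theta}(x),\, y)\big]
  \quad\Longrightarrow\quad
  f_{\hat{\theta}}(x) \approx \mathbb{E}_{D}[\, y \mid x \,]
\]
% So if ethical demonstrations have (near-)zero mass under D, no choice of
% theta recovers them:
\[
  \Pr_{D}\big[\, y \text{ is an ethical demonstration} \,\big] \approx 0
  \quad\Longrightarrow\quad
  f_{\hat{\theta}} \text{ assigns (near-)zero mass to ethical behavior}
\]
```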