Decoding News Bias: Multi-Bias Detection in News Articles
Bhushan Santosh Shah, Deven Santosh Shah, Vahida Attar
Comprehensive framework for detecting multiple types of biases in news articles using LLM-based dataset annotation and transformer-based classification.
Implementation Summary
Overview
This research addresses the problem of bias detection in news articles, which is crucial for maintaining media integrity and promoting informed public discourse. Unlike prior studies that focus narrowly on political or gender bias, this work proposes a comprehensive framework for detecting multiple types of biases across domains using Large Language Models (LLMs) for dataset creation and transformer-based models for detection.
Related Work
Existing studies have mainly concentrated on political and media bias, particularly in detecting biased word choice, framing, and partisan leanings. Approaches such as DA-RoBERTa and Gaussian Mixture Models (GMM) have achieved notable success in specific domains, while others utilized linguistic, contextual, and demographic features for nuanced detection. More recently, LLMs like GPT-3.5 and GPT-4 have been explored for bias labeling and fake news detection. However, these efforts generally focus on single-bias classification and lack a unified, multi-bias framework — a gap this paper aims to fill.
Bias Types Considered
To capture diverse manifestations of bias, the study defines seven bias types (a minimal label-encoding sketch follows the list):
- Political Bias – Favoring or criticizing political ideologies or entities.
- Gender Bias – Unequal portrayal or treatment based on gender.
- Entity Bias – Overrepresentation or favoritism toward specific individuals or organizations.
- Racial Bias – Discrimination based on race, ethnicity, or culture.
- Religious Bias – Unfair treatment or framing of religious beliefs.
- Regional Bias – Unequal coverage based on geographic location.
- Sensationalism – Exaggeration or emotional framing to attract attention.
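To make the multi-label setup concrete, the sketch below shows one way to encode these seven categories as a binary label vector. The list name and helper function are illustrative assumptions, not taken from the paper:

```python
# Illustrative encoding of the seven bias categories as a multi-label
# 0/1 vector; names here are assumptions, not from the paper.
BIAS_TYPES = [
    "political", "gender", "entity", "racial",
    "religious", "regional", "sensationalism",
]

def encode_labels(active_biases: set[str]) -> list[int]:
    """Return a 7-dimensional 0/1 vector, one slot per bias type."""
    return [1 if bias in active_biases else 0 for bias in BIAS_TYPES]

# Example: an article flagged as politically biased and sensationalist.
print(encode_labels({"political", "sensationalism"}))
# -> [1, 0, 0, 0, 0, 0, 1]
```

An article may activate several slots at once, which is what distinguishes this task from the single-bias classification in prior work.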
Methodology
Dataset Collection
A dataset of 9,790 news articles was curated across six domains — Hollywood, Fashion, Finance, Religion, Politics, and Sports — using the Aylien API. After filtering, 4,886 articles exhibiting at least one bias type were retained.
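A minimal sketch of how such a domain-wise collection loop might look. The endpoint path, header names, and query parameters below are assumptions based on public Aylien News API documentation, not the authors' actual setup, and should be checked against the official docs:

```python
# Hypothetical collection loop against the Aylien News API.
# Endpoint, headers, and parameters are assumptions; consult the docs.
import requests

DOMAINS = ["Hollywood", "Fashion", "Finance", "Religion", "Politics", "Sports"]
ENDPOINT = "https://api.aylien.com/news/stories"  # assumed endpoint
HEADERS = {
    "X-AYLIEN-NewsAPI-Application-ID": "<APP_ID>",   # assumed header names
    "X-AYLIEN-NewsAPI-Application-Key": "<APP_KEY>",
}

articles = []
for domain in DOMAINS:
    resp = requests.get(
        ENDPOINT,
        headers=HEADERS,
        params={"text": domain, "language": "en", "per_page": 100},
    )
    resp.raise_for_status()
    articles.extend(resp.json().get("stories", []))
```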
Label Extraction via LLMs
To efficiently label this dataset, the authors used GPT-4o mini, chosen for its strong language understanding and cost-effectiveness. An instruction-based prompting strategy was adopted, where the model received definitions for each bias and labeled articles in a binary format. The entire dataset was labeled in ~6 hours, demonstrating scalability and consistency.
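A minimal sketch of this instruction-based prompting using the OpenAI Python client. The prompt wording and output parsing are illustrative assumptions, not the authors' exact prompt:

```python
# Illustrative instruction-based labeling with GPT-4o mini via the OpenAI
# Python client. Prompt text and parsing are assumptions, not the paper's.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INSTRUCTIONS = (
    "You are a news-bias annotator. Definitions for seven bias types "
    "(political, gender, entity, racial, religious, regional, "
    "sensationalism) are given below. For the article provided, output 1 "
    "if a bias is present and 0 otherwise, as seven comma-separated values.\n"
    "<bias definitions here>"
)

def label_article(text: str) -> list[int]:
    """Ask the model for one binary label per bias type."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # deterministic output for labeling consistency
        messages=[
            {"role": "system", "content": INSTRUCTIONS},
            {"role": "user", "content": text},
        ],
    )
    return [int(v) for v in response.choices[0].message.content.split(",")]
```

Constraining the model to a fixed binary output format is what makes the labels directly usable as multi-label training targets.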
Experiments
Several transformer-based models — BERT, RoBERTa, ALBERT, DistilBERT, and XLNet — were evaluated for multi-label bias detection. To address class imbalance, the authors used inverse frequency weighting during training. Models were trained on a T4 GPU with AdamW optimizer, learning rate scheduling, and Multilabel Stratified KFold splitting to preserve label distribution across splits.
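A minimal sketch of this training setup, assuming PyTorch, Hugging Face transformers, and the iterative-stratification package; the hyperparameters and placeholder labels are illustrative, not the paper's reported configuration:

```python
# Sketch: multi-label fine-tuning with inverse-frequency class weights
# and multilabel-stratified folds. Values are illustrative.
import numpy as np
import torch
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
from transformers import AutoModelForSequenceClassification

# Placeholder labels: shape (n_articles, 7), one 0/1 column per bias type.
labels = np.random.default_rng(0).integers(0, 2, size=(4886, 7))

# Inverse frequency weighting: rarer bias types get larger loss weights.
# (pos_weight = negatives / positives is one common form of this idea.)
pos_counts = labels.sum(axis=0)
pos_weight = torch.tensor((len(labels) - pos_counts) / pos_counts).float()
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)  # custom loop

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=7,
    problem_type="multi_label_classification",  # sigmoid + BCE per label
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Multilabel stratified splits preserve the label distribution per fold.
mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(
    mskf.split(np.zeros(len(labels)), labels)
):
    ...  # build DataLoaders from train_idx / val_idx and run the training loop
```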
Results
| Model | Notable Performance | F1 Highlights |
|---|---|---|
| BERT | Best overall performance | 0.89 (Political Bias), 0.80 (Sensational Bias) |
| RoBERTa | Close to BERT | 0.87 (Political Bias), 0.75 (Entity Bias) |
| ALBERT / DistilBERT / XLNet | Lower F1 due to fewer parameters and class imbalance | 0.38–0.56 (Racial/Regional Bias) |
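Per-label F1 scores like those in the table can be computed with scikit-learn; in this small sketch the prediction arrays are random placeholders standing in for real model outputs:

```python
# Per-label F1 as reported above; inputs are illustrative placeholders.
import numpy as np
from sklearn.metrics import f1_score

BIAS_TYPES = ["political", "gender", "entity", "racial",
              "religious", "regional", "sensationalism"]

y_true = np.random.default_rng(0).integers(0, 2, size=(100, 7))  # placeholder
y_pred = np.random.default_rng(1).integers(0, 2, size=(100, 7))  # placeholder

for name, f1 in zip(BIAS_TYPES, f1_score(y_true, y_pred, average=None)):
    print(f"{name}: F1 = {f1:.2f}")
```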
Key Observations
BERT consistently outperformed the other models across most bias categories, owing to the deep contextual representations learned during pre-training. Its superior performance, especially in political and sensational bias detection, underscores the importance of deep contextual embeddings for nuanced bias identification. In contrast, lighter models such as DistilBERT and ALBERT struggled to capture subtle linguistic cues, particularly in categories with limited data such as racial and regional biases.
Despite applying inverse frequency weighting to mitigate class imbalance, the dataset’s uneven distribution remained a key limiting factor. Models showed reduced precision in detecting underrepresented bias types, revealing the challenge of multi-bias detection in imbalanced real-world data. Furthermore, the LLM-based annotations, while efficient, introduced occasional inconsistencies — particularly misclassifying neutral or factual reporting as biased when political or religious entities were mentioned. These insights highlight the need for enhanced data augmentation, annotation refinement, and contextual calibration in future research.
Novelty and Contributions
- First multi-bias dataset covering seven bias types.
- LLM-driven dataset annotation method offering scalability and efficiency.
- Comprehensive evaluation of transformer models on a multi-label bias detection task.
- Highlights the need for contextual bias understanding beyond keyword-level detection.
Conclusion
The study demonstrates that combining LLMs for annotation with transformer models for classification enables effective multi-bias detection in news articles. BERT emerged as the most reliable model. However, limitations include potential annotation inconsistencies from the LLM and class imbalance in certain bias categories.
Future Work
- Improve LLM annotation reliability via human-in-the-loop validation.
- Expand dataset for underrepresented bias types.
- Explore advanced augmentation and robust splitting strategies.
- Develop systems that provide explanations for detected biases to enhance transparency.
In summary, this research establishes a strong foundation for multi-bias detection in media, pushing beyond traditional binary classification and paving the way for context-aware, ethical AI applications in journalism.