Compass Vision Achieves Best Performance in a Rigorous and Highly Realistic Deepfake Detection Evaluation

How Blackbird.AI’s digitally manipulated image detector performed on Deepfake‑Eval 2024, and why that matters for anyone who needs to trust what they see online.

On 15 October 2024, we released Compass Vision, our instant, explainable real‑vs‑fake checker. Drop in up to five images and get a verdict, plus confidence score and visual evidence. For commercial use, Compass Vision is also available with higher rate limits and a complete API. Within only a few weeks of that launch—using the same public model—we pointed Compass Vision at the most formidable challenge we could find.

Benchmarks Are Everywhere, But Most Are Stuck in the Past

Academic datasets such as FaceForensics++, DFDC, and Celeb-DF contain large volumes of images and video clips, but they are built on technology that is now dated. Most of the fakes in these collections were produced with Generative Adversarial Network (GAN) face-swapping techniques and rudimentary reenactment tools that were common around 2018. Today’s synthetic media generators, including diffusion models, advanced text-to-image systems, and readily available consumer face-swap apps, did not yet exist when these datasets were compiled.

Why Deepfake‑Eval 2024 Is Different

Created and curated by TrueMedia.org — a nonprofit best known for its free deepfake detection portal and election protection work — Deepfake-Eval 2024 was designed to reflect the internet as it exists today rather than the academic labs of years past.

  • Fresh content, fresh threats. Every single file was collected in 2024 after diffusion models and text-to-image apps experienced a surge in popularity.
  • Hard even for humans. Expert reviewers disagreed on fewer than 10 percent of the samples, which shows how convincing many of these fakes have become.
  • Full toolbox of tricks. From diffusion art and Style-Transfer to classic face swaps and GAN mash-ups, all the popular manipulation styles are represented.
  • True diversity. Memes, news photos, influencer selfies: 1,975 images drawn from 88 websites in 52 languages.

Because this benchmark originates from an organization that has already made significant public contributions to deepfake detection, our strong performance on Deepfake-Eval 2024 demonstrates Compass Vision’s effectiveness on a test set that the broader narrative attack research community trusts as the new gold standard.

Why Compass Vision Wasn’t in the First Paper

The Deepfake-Eval paper was posted on arXiv on March 24, 2025. At the time, Compass Vision was a fast-moving new entrant to the market. TrueMedia.org hadn’t yet included us in their benchmarking, but that’s no surprise: the deepfake detection landscape is shifting rapidly, and newer, more advanced solutions like ours are reshaping the competitive field in real time.

Once we secured the test set in mid-May and accounted for a few missing images, we conducted a rigorous, apples-to-apples evaluation using the same public API our customers use today.
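
For readers who want to run a similar apples-to-apples comparison on their own data, the sketch below shows how the standard detection metrics can be computed from per-image scores. The file name, column names, and the 0.5 decision threshold are illustrative assumptions on our part, not details of the benchmark protocol.

```python
# Minimal scoring-harness sketch (illustrative assumptions: the file name, column
# names, and the fixed 0.5 threshold are ours, not the benchmark's).
import pandas as pd
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

df = pd.read_csv("detector_scores.csv")   # columns: image_id, label, score
y_true = df["label"].to_numpy()           # 1 = manipulated, 0 = authentic
y_score = df["score"].to_numpy()          # detector confidence in [0, 1]
y_pred = (y_score >= 0.5).astype(int)     # same threshold for every system

print(f"Accuracy : {accuracy_score(y_true, y_pred):.3f}")
print(f"AUC      : {roc_auc_score(y_true, y_score):.3f}")
print(f"Recall   : {recall_score(y_true, y_pred):.3f}")
print(f"Precision: {precision_score(y_true, y_pred):.3f}")
print(f"F1 score : {f1_score(y_true, y_pred):.3f}")
```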

How Compass Vision Performed

The table below compares Compass Vision (November 2024 checkpoint) with the best commercial detector reported in Deepfake-Eval 2024.

Metric     | Best Commercial | Compass Vision
Accuracy   | 82.0%           | 86.7%
AUC        | 0.90            | 0.93
Recall     | 0.71            | 0.83
Precision  | 0.99            | 0.94
F1 Score   | 0.83            | 0.89

Table 1: Compass Vision (November 2024 checkpoint) compared with the best commercial detector from Deepfake-Eval 2024.

On Deepfake-Eval 2024, the leading commercial detector scored 82% accuracy, 0.90 AUC, 71% recall, 0.99 precision, and a 0.83 F1 score. Using an identical protocol and our publicly accessible November 2024 API, Compass Vision improved on nearly every key metric:

  • Accuracy increased to 86.7% from 82%, a meaningful gain given that accuracy and AUC are the primary metrics for judging deepfake detectors.
  • AUC improved to 0.931 from 0.90, indicating a stronger ability to separate genuine images from manipulated ones.
  • Recall rose to 83% from 71%, substantially reducing the number of fakes that slip through undetected.
  • Precision came in at 0.942, slightly down from 0.99, still keeping false positives rare while catching far more manipulated content.
  • F1 score rose to 0.89 from 0.83, reflecting a better balance between recall and precision.

Put simply, for roughly every eight manipulated images, Compass Vision catches one that the next most effective system would have missed, without increasing the false-positive burden on analysts.
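
The arithmetic behind that claim is straightforward, using the recall figures from Table 1:

```python
# Worked arithmetic behind the "one in eight" claim (recall values from Table 1).
fakes = 1000                                        # any convenient count of manipulated images
caught_next_best = 0.71 * fakes                     # 710 caught by the next-best system
caught_compass = 0.83 * fakes                       # 830 caught by Compass Vision
extra_caught = caught_compass - caught_next_best    # 120 additional fakes caught
print(fakes / extra_caught)                         # ~8.3, i.e. one extra catch per ~8 fakes
```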

Why the Results Matter

The results matter for three reasons: information integrity, operational efficiency, and future resilience.

  1. Information integrity. A single undetected fake can seed a viral hoax; higher recall lowers the odds of that kind of crisis.
  2. Operational efficiency. High precision lets teams focus on genuine risks rather than chasing false alarms.
  3. Future resilience. These numbers come from a model deployed shortly after launch, and ongoing updates are expected to push them toward the projected 90% threshold indicated by research.

Why We Didn’t Publish Sooner

  • Access to the research dataset required a formal process, which caused delays.
  • Some images were missing from the initial release. We contacted the authors to understand the impact of these missing items to ensure the final scoring remains transparent and reliable.

What’s Next for Compass Vision

  1. Stronger generalization. New generators from Google, OpenAI, and dozens of specialist commercial vendors appear monthly. Our R&D pipeline continually retrains on fresh, adversarially curated data to stay effective against techniques it has never encountered before.
  2. Video detection. Beyond images, we are also tackling deepfake videos. An early-access API for deepfake video detection has been provided to early adopter customers.

Figure 1. Compass Vision performs both Authenticity and Risk Analysis on every asset, whether the content is authentic or manipulated. In this example, it detects AI‑generated lighting inconsistencies, labels the image AI-Generated or Deepfake, and, because it depicts a nuclear plant explosion, issues a High-Risk alert. Supplemental signals (bot score, engagement, reach, and sentiment) provide analysts with immediate insight into potential impact.

How These Results Advance Blackbird.AI’s Narrative Intelligence Mission

Building on the benchmark results outlined above, Compass Vision’s leadership in digital manipulation detection is more than a technical milestone—it is a strategic accelerant for our end‑to‑end Narrative Intelligence Platform.

Blackbird.AI’s visual forensics engine flags every form of manipulation, from diffusion deepfakes to classic Photoshop cloning, splicing, style transfer, generative fill, and more. In the latest Deepfake-Eval 2024 benchmark, it surpassed the next-best commercial system (86.7% accuracy vs. 82%).

Detection alone is only one signal. Once Compass Vision labels an image as manipulated, that signal is passed into our multi-layer analytics to answer the broader narrative questions.

Blackbird.AI is the only company that combines narrative threat analysis, deepfake detection, and contextual claim checking in a single offering.

  • Narrative Detection – Which larger narratives—across text, audio, and video—does this manipulated asset reinforce or spawn?
  • Compass Context Check – Does the visual support or originate a narrative attack storyline?
  • Cohorts & Actor Mapping – Who is sharing it, and are they organic communities or coordinated networks of bots and trolls?
  • Manipulation & Bot Integrity Signals – Is synthetic engagement artificially amplifying its reach?
  • Trajectory Metrics (Volume, Reach, Growth) – Is the story gaining momentum across platforms and geographies?
  • Network Analysis – How are accounts structurally connected, and which nodes (influencers, botnets, troll farms) are critical for spread?

By fusing these contextual layers with the market’s most precise manipulation detection, Blackbird.AI lets customers move from pixel-level proof to strategic narrative insight. They see not only that an image is digitally manipulated but also why it matters, who benefits, and how quickly the story is spreading, enabling proactive intervention before the narrative becomes a perceived fact.

The Platform in Action

The screenshots below illustrate how Compass Vision’s manipulation signal flows into holistic narrative intelligence.

Figure 2. The manipulation alert cascades into a broader narrative that aggregates 489 posts, 22.9k engagements, and 152 authors, with cohort tags (e.g., Russian State Supporter) and growth-trend indicators, pinpointing who is amplifying the claim, how fast it is moving across platforms, and, via Compass Context, the surrounding context.

These visuals underscore how a single authenticity flag evolves into an actionable narrative context, empowering teams to intervene before manipulated media reshapes public perception.

Getting Started with Compass Vision

Self‑serve App. Experience Compass Vision in your browser: upload images and instantly view authenticity scores along with explanations.

Register Here

API integration. Programmatically embed state‑of‑the‑art manipulation detection into any workflow. After signing up, grab your API key and explore the endpoints for single-image or batch analysis in the fully annotated documentation.
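
As a rough illustration of what that integration could look like, the snippet below posts an image and reads back a verdict and confidence score. The endpoint URL, header, and response fields are placeholders we made up for this sketch, not the actual Compass Vision API; refer to the documentation linked below for the real endpoints and schema.

```python
# Hypothetical sketch only: the endpoint URL, auth header, and response fields
# below are placeholders, not the real Compass Vision API. See the official
# API documentation for the actual endpoints and schema.
import requests

API_KEY = "YOUR_API_KEY"
ENDPOINT = "https://api.example.com/v1/image-analysis"  # placeholder URL

with open("suspect_image.jpg", "rb") as f:
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"image": f},
        timeout=30,
    )
response.raise_for_status()
result = response.json()

# Illustrative fields: a verdict plus a confidence score, as described in this post.
print(result.get("verdict"), result.get("confidence"))
```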

API Documentation

For commercial use, Compass Vision is available with higher rate limits for both Self-serve Apps and API. 

Full Narrative Intelligence with Constellation. For cross‑platform monitoring, cohort analytics, and automated narrative reporting, request a live Constellation demo.

Constellation Demo From The Blackbird.AI Team

Figure 3. In addition to spotting manipulated visuals, Compass Vision surfaces authentic images reused out of context—classic misinformation—and shows Related Narratives propagating similar claims, revealing how genuine media can still mislead when reframed.

The Way Forward – Takeaways for Organization Leaders

  • Detection alone won’t save you from digital manipulation crises. Compass Vision’s 86.7% accuracy identifies the fakes, but understanding who is spreading them, which narratives they support, and how quickly they’re moving across platforms determines whether you can stop the damage. Invest in platforms that connect manipulation detection to narrative context and actor mapping.
  • Your November 2024 defense won’t work against the March 2025 attacks. New synthetic media generators emerge monthly, each with novel techniques that bypass yesterday’s detectors. Develop detection capabilities that continuously retrain on adversarial data rather than relying on static models trained on historical datasets from 2018.
  • The real ROI comes from what you prevent, not what you detect. High precision (94%) means your team spends time on genuine threats instead of chasing false positives. But the bigger value lies in that 83% recall rate—catching manipulated content before it reaches critical mass saves millions in crisis management compared to playing catch-up after a hoax goes viral.

The jump to 86.7% accuracy means Compass Vision catches roughly one additional fake for every eight manipulated images, without triggering more false alarms. That’s the difference between a viral hoax spreading unchecked and a manipulation stopped cold. As synthetic media generators proliferate and evolve monthly, Blackbird.AI’s immediate dominance in Deepfake-Eval 2024, using its November checkpoint, signals a crucial point: the detection arms race is winnable. The next frontier? Video deepfakes, where both the stakes and the technical challenges multiply exponentially.

Reference:

Chandra, N.A., Murtfeldt, R., Qiu, L., Karmakar, A., Lee, H., Tanumihardja, E., Farhat, K., Caffee, B., Paik, S., Lee, C. and Choi, J., 2025. Deepfake-eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024. arXiv preprint arXiv:2503.02857.

Available here: https://arxiv.org/pdf/2503.02857

  • To receive a complimentary copy of The Forrester External Threat Intelligence Landscape 2025 Report, visit here.
  • To learn more about how Blackbird.AI can help you in these situations, book a demo.

Abul Hasnat

Lead, Computer Vision

Yazid Lachachi

Senior Machine Learning Engineer

Naushad UzZaman

Chief Technology Officer

Naushad is the CTO and Co-founder of Blackbird.AI and leads a team of highly skilled experts, data scientists, and engineers who discover emergent threats to get ahead of real-world harm. UzZaman is responsible for developing Blackbird.AI’s Narrative Intelligence Platform, and along with his team, he has built a unique series of scalable ML, generative AI, and network analysis solutions that detect rare risk signals of threats.

Need help protecting your organization?

Book a demo today to learn more about Blackbird.AI.