How Does AI Detection Work?

Since the Copyleaks AI Detector was released in January 2023, the number of questions we have received about it and about how AI detection works almost rivals the number we get about generative AI itself.

 

Understandably, people want reassurance around generative AI; to get that, they need to feel confident in the technology providing the guardrails. That’s why we’ve compiled our 10 most commonly asked questions about AI detection, how it works, and other things you might be wondering about.

How Do AI Detectors Work?

When a large language model (LLM) writes a sentence, it draws on patterns learned from its pre-training data to produce statistically likely output, which often does not resemble the patterns of human writing. The difference becomes more apparent when the text is analyzed against a vast corpus of human writing.
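To make the idea of comparing text against a corpus of human writing concrete, here is a deliberately tiny Python sketch. It is an illustration only and not a description of the Copyleaks model: the function names, the simple word-frequency approach, and the two-sentence reference corpus are all assumptions chosen for brevity, and real detectors rely on far richer models and far more data.

```python
import math
from collections import Counter


def train_word_counts(corpus_sentences):
    """Count word frequencies in a (hypothetical) reference corpus of human text."""
    counts = Counter()
    for sentence in corpus_sentences:
        counts.update(sentence.lower().split())
    return counts


def avg_log_prob(text, counts):
    """Average per-word log-probability of `text` under the reference counts.

    Add-one smoothing gives unseen words a small nonzero probability.
    Scores closer to zero mean the text looks more like the reference corpus.
    """
    words = text.lower().split()
    if not words:
        raise ValueError("text must contain at least one word")
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for any unseen word
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words) / len(words)


# Toy reference corpus (an assumption for illustration).
human_corpus = ["the cat sat on the mat", "the dog sat on the rug"]
counts = train_word_counts(human_corpus)

familiar = avg_log_prob("the cat sat", counts)
unfamiliar = avg_log_prob("quantum flux capacitor", counts)
print(familiar > unfamiliar)  # the familiar phrase scores closer to zero
```

The sketch only shows the general principle that text can be scored by how typical it is of a reference corpus; it says nothing about which signals a production detector actually uses.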

If you want to learn about the methodology behind how AI detectors work, visit our AI Detector Testing Methodology page.

How was the Copyleaks AI detection model trained?

Most AI detectors simply look for AI-generated text or content. With the Copyleaks AI Detector, however, we take a slightly different approach.

 

First, since 2015, we’ve collected, ingested, and analyzed trillions of crawled and user-sourced content pages from thousands of universities and enterprises worldwide to train our models to understand how humans write. Because our AI Detector is looking for human text instead of AI-generated text, our technology can detect irregular sentence patterns commonly used by genAI more accurately. 

 

Also, by using AI technology, our AI Detector can accurately recognize the presence of other AI-generated text and the signals it leaves behind, adding another layer of accuracy.

How is your AI content detection any different from other detectors?

There are several significant differences between other detectors and our AI Detector.  

 

For example: 

  • Credible data at scale, coupled with machine learning and widespread adoption, allows us to continually refine and improve our ability to understand complex text patterns, resulting in over 99% accuracy that improves daily.

 

  • As an enterprise-based platform, we offer API and LMS integrations, allowing you to bring the power of the AI Detector directly to your native platform and at scale.

 

  • By examining each paragraph and sentence, our report highlights the specific elements of the text potentially written by AI and provides a confidence level.

 

  • It does not flag non-AI-based writing assistant features, unlike other detectors on the market.

 

  • We are GDPR-compliant and SOC 2 and SOC 3 certified. Learn more here.

How do you avoid AI detection false positives?

The chance that content written by a human is falsely labeled as AI-generated is 0.2%. Nevertheless, we strive to inspire authenticity and digital trust by creating secure environments to share ideas and learn confidently, and that comes with the responsibility to ensure complete accuracy, particularly around AI detection false positives.

 

To address this, we have taken several precautions, including: 

 

  • Our detection and the algorithms that power it are designed to detect human-written text rather than AI-generated text. Looking for AI text directly tends to be less accurate and increases the likelihood of false positives.

 

  • To help accelerate our learning and refine the models used, we implemented a feedback loop where users can rate the accuracy of the results, which allows us to continually use examples of false positives, rare as they may be, to improve. 

 

  • We introduce detection for new models only after thorough testing, and we release updates only once our internal testing reaches a high confidence threshold.
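To put the 0.2% false-positive figure in perspective, the short Python sketch below works through the arithmetic for a batch of human-written documents. The function names are hypothetical, and the second calculation assumes each scan is statistically independent, which is a simplifying assumption rather than something stated above.

```python
def expected_false_positives(num_human_docs: int, false_positive_rate: float = 0.002) -> float:
    """Expected number of human-written documents wrongly flagged as AI."""
    return num_human_docs * false_positive_rate


def prob_at_least_one_false_positive(num_human_docs: int, false_positive_rate: float = 0.002) -> float:
    """Chance that at least one human-written document is wrongly flagged.

    Assumes every scan is independent of the others (a simplifying assumption).
    """
    return 1.0 - (1.0 - false_positive_rate) ** num_human_docs


# Example: 1,000 human-written essays at the stated 0.2% rate.
print(expected_false_positives(1000))                    # about 2 essays
print(round(prob_at_least_one_false_positive(1000), 3))  # about 0.865
```

In other words, even a very low rate can produce a handful of false positives at scale, which is why the feedback loop and testing steps above matter.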

Does the Copyleaks AI Detector flag writing assistant tools like Grammarly as AI content?

Certain features of writing assistants can cause your content to be flagged by the AI Detector as AI-generated.

For example, Grammarly has a genAI-driven feature that rewrites your content to help improve it, shorten it, etc. As a result, this reworked content could get flagged as AI since it was rewritten by genAI. 

However, content edited with the Copyleaks Writing Assistant is not flagged as AI, nor is content that Grammarly changed only to fix grammatical errors, mechanical issues, etc., because these features use little or no genAI.

Read our analysis about writing assistant tools getting flagged as AI.

Why is there a minimum and maximum text requirement for some AI content scans?

Our models need a certain volume of text to accurately determine the presence of AI. The higher the character count, the easier it is for our technology to identify irregular patterns, which results in a higher confidence rating for AI detection.

The ideal text requirements for each of our AI offerings are as follows:

 

AI Detector Browser Extension

Minimum: 350 characters 

Maximum: 25,000 characters

 

AI Detector Web-Based Platform

Minimum: 255 characters

Maximum: 2,000 pages (there is no character maximum)
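If you want to check text against these limits before submitting it, a small pre-check along the following lines may help. This is a sketch under stated assumptions: the `LIMITS` table and `check_length` function are hypothetical names, Python's `len()` is used as a stand-in for however characters are actually counted, and the 2,000-page limit for the web-based platform is not modeled.

```python
# Character limits taken from the figures listed above.
LIMITS = {
    "browser_extension": {"min_chars": 350, "max_chars": 25_000},
    # No character maximum; the 2,000-page limit is not modeled here.
    "web_platform": {"min_chars": 255, "max_chars": None},
}


def check_length(text: str, product: str):
    """Return (ok, message) for `text` against the limits of `product`.

    Uses len(text) as the character count, which is an assumption.
    """
    limits = LIMITS[product]
    n = len(text)
    if n < limits["min_chars"]:
        return False, f"too short: {n} < {limits['min_chars']} characters"
    if limits["max_chars"] is not None and n > limits["max_chars"]:
        return False, f"too long: {n} > {limits['max_chars']} characters"
    return True, "ok"


sample = "a" * 300
print(check_length(sample, "browser_extension"))  # below the 350-character minimum
print(check_length(sample, "web_platform"))       # meets the 255-character minimum
```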

What models can you detect, and what’s the accuracy of each?

As of July 2024, we can detect text generated by the latest versions of the following models:

 

  • ChatGPT
  • Gemini
  • Claude 
  • Jasper 3
  • T5

 

For English text, detection accuracy varies slightly from model to model, though each is above 98.0%.

 

Depending on the type of content being tested, you may encounter slightly different results. Accordingly, we suggest conducting several tests to determine the success rate for your specific content type.
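One way to organize such tests is to scan samples whose origin you already know, record each verdict, and then tally the outcome. The Python sketch below assumes you have collected the verdicts yourself; `summarize_results` is a hypothetical helper, not part of any Copyleaks tool, and the sample data is made up for illustration.

```python
def summarize_results(results):
    """Summarize test scans.

    `results` is an iterable of (is_ai, flagged_as_ai) boolean pairs:
    the known origin of each sample and the verdict you recorded for it.
    """
    results = list(results)
    if not results:
        raise ValueError("results must not be empty")
    correct = sum(1 for is_ai, flagged in results if is_ai == flagged)
    human_verdicts = [flagged for is_ai, flagged in results if not is_ai]
    ai_verdicts = [flagged for is_ai, flagged in results if is_ai]
    return {
        "accuracy": correct / len(results),
        # Share of human-written samples wrongly flagged as AI.
        "false_positive_rate": sum(human_verdicts) / len(human_verdicts) if human_verdicts else None,
        # Share of AI-generated samples correctly flagged as AI.
        "detection_rate": sum(ai_verdicts) / len(ai_verdicts) if ai_verdicts else None,
    }


# Made-up verdicts for six samples: three AI-generated, three human-written.
sample_results = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
]
print(summarize_results(sample_results))
```

A handful of samples gives only a rough estimate, so the more samples of your own content type you include, the more meaningful the rates become.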

What languages do you support, and what is the accuracy of each?

The AI Detector offers more language options than any other solution on the market, including English, Spanish, French, Portuguese, German, Italian, Russian, Polish, Romanian, Dutch, Swedish, Czech, Norwegian, Korean, Japanese, Chinese (Simplified and Traditional), and more. Indonesian is the latest supported language, added with the release of the AI Detector V5 in July 2024. 

For a complete list of supported languages, click here

At the moment, English has the highest accuracy at 99.1%. We continue to develop our models to increase the accuracy across other supported languages, and there are plans to introduce accurate detection across dozens of additional languages.

What other AI content detection capabilities are you working on?

We are working on several capabilities, including:

 

  • Continued accuracy improvements for detecting AI text that has gone through a text spinner or otherwise been manipulated (e.g., by including deliberate typos).

 

  • Across-the-board accuracy improvements.   

 

  • The support of additional languages and models. 

 

We’ll continue to monitor the landscape and closely listen to user feedback to ensure we stay one step ahead of AI content generators and provide the most accurate results possible.  

For a more comprehensive list of frequently asked questions about the Copyleaks AI Detector and its capabilities, click here
