Since the release of the Copyleaks AI Detector in January 2023, the number of questions we have received about it and how AI detection works almost rivals the number we get about generative AI.
Understandably, people want reassurance around generative AI; to get that, they need to feel confident in the technology providing the guardrails. That’s why we’ve compiled our 10 most commonly asked questions about AI detection, how it works, and other things you might be wondering about.
When a Large Language Model writes a sentence, it draws on the patterns learned from its pre-training data to output a statistically generated sentence, which often does not resemble the patterns of human writing. The difference becomes more apparent when the text is analyzed against a vast corpus of human writing.
If you want to learn about the methodology behind how AI detectors work, visit our AI Detector Testing Methodology page.
Regarding how AI detectors work, most of them simply look for AI-generated text or content. However, with the Copyleaks AI Detector, we take a slightly different approach.
First, since 2015, we’ve collected, ingested, and analyzed trillions of crawled and user-sourced content pages from thousands of universities and enterprises worldwide to train our models to understand how humans write. Because our AI Detector is looking for human text instead of AI-generated text, our technology can detect irregular sentence patterns commonly used by genAI more accurately.
Also, by utilizing AI technology, our AI Detector can accurately recognize the presence of other AI-generated text and the signals it leaves behind, adding another layer of accuracy.
There are several significant differences between other detectors and our AI Detector.
For example:
The chance for content written by a human to be falsely labeled as AI-generated content is 0.2%. Nevertheless, we strive to inspire authenticity and digital trust by creating secure environments to share ideas and learn confidently, and that comes with the responsibility to ensure complete accuracy, particularly around AI detection false positives.
To address this, we have taken several precautions, including:
Certain features of writing assistants can cause your content to be flagged by the AI Detector as AI-generated.
For example, Grammarly has a genAI-driven feature that rewrites your content to help improve it, shorten it, etc. As a result, this reworked content could get flagged as AI since it was rewritten by genAI.
However, content edited with the Copyleaks Writing Assistant does not get flagged as AI, nor does content that Grammarly changed only to fix grammatical errors, mechanical issues, etc., because these features and functionalities use little or no genAI.
Read our analysis about writing assistant tools getting flagged as AI.
Our models need a certain volume of text to accurately determine the presence of AI. The higher the character count, the easier it is for our technology to determine irregular patterns, which results in a higher confidence rating for AI detection.
The ideal text requirements for each of our AI offerings are as follows:
AI Detector Browser Extension:
Minimum: 350 characters
Maximum: 25,000 characters
AI Detector Web-Based Platform:
Minimum: 255 characters
Maximum: 2,000 pages (There is no character maximum)
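As a rough illustration, the limits above can be checked before submitting text. This is a minimal sketch, not part of any Copyleaks API: the function name, the product keys, and the use of Python's `len()` as the character count are all assumptions made for the example, and the 2,000-page limit of the web-based platform is not modelled.

```python
# Hypothetical pre-submission check. The numeric limits come from the text
# above; every name here is illustrative and not a Copyleaks API.
LIMITS = {
    "browser_extension": (350, 25_000),
    "web_platform": (255, None),  # no character maximum; page limit not modelled
}


def check_length(text: str, product: str) -> str:
    """Return "too_short", "too_long", or "ok" for the given product."""
    minimum, maximum = LIMITS[product]
    # Assumption: characters are counted the way len() counts them.
    count = len(text)
    if count < minimum:
        return "too_short"
    if maximum is not None and count > maximum:
        return "too_long"
    return "ok"
```

For example, a 300-character passage would be reported as too short for the browser extension but acceptable for the web-based platform.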
As of July 2024, we can detect the latest models of the following LLMs:
For English text, detection accuracy varies slightly from model to model, though each is above 98.0%.
Given the type of content being tested, you may encounter slightly different results. Accordingly, we suggest conducting several tests to determine the success rate for your specific content type.
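One way to run such tests is to collect samples whose origin you already know, record what the detector reported for each, and tally the results. The sketch below assumes you record each outcome by hand; the class and function names are hypothetical and nothing here calls the Copyleaks service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledSample:
    is_ai_written: bool  # ground truth you already know about the sample
    flagged_as_ai: bool  # what the detector reported for it


def summarize(samples: list[LabeledSample]) -> dict[str, Optional[float]]:
    """Tally accuracy, false positive rate, and miss rate for your own tests."""
    if not samples:
        raise ValueError("at least one sample is required")
    human = [s for s in samples if not s.is_ai_written]
    ai = [s for s in samples if s.is_ai_written]
    correct = sum(s.is_ai_written == s.flagged_as_ai for s in samples)
    false_positives = sum(s.flagged_as_ai for s in human)
    missed = sum(not s.flagged_as_ai for s in ai)
    return {
        "accuracy": correct / len(samples),
        # Share of human-written samples wrongly flagged as AI.
        "false_positive_rate": false_positives / len(human) if human else None,
        # Share of AI-written samples that were not flagged.
        "miss_rate": missed / len(ai) if ai else None,
    }
```

Keeping the false positive rate separate from overall accuracy is a deliberate choice here, since a human-written text flagged as AI and an AI-written text that goes unflagged are different kinds of error.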
The AI Detector offers more language options than any other solution on the market, including English, Spanish, French, Portuguese, German, Italian, Russian, Polish, Romanian, Dutch, Swedish, Czech, Norwegian, Korean, Japanese, Chinese (Simplified and Traditional), and more. Indonesian is the latest supported language, added with the release of the AI Detector V5 in July 2024.
For a complete list of supported languages, click here.
At the moment, English has the highest accuracy at 99.1%. We continue to develop our models to increase the accuracy across other supported languages, and there are plans to introduce accurate detection across dozens of additional languages.
We are working on several capabilities, including:
We’ll continue to monitor the landscape and closely listen to user feedback to ensure we stay one step ahead of AI content generators and provide the most accurate results possible.
For a more comprehensive list of frequently asked questions about the Copyleaks AI Detector and its capabilities, click here.