
Inside the massive Chinese 8.7 billion data leak

by Simon Jones Tech Reporter
3rd Feb 26 2:46 pm

While anything China-related tends to yield extremely high numbers, the latest data leak is massive even by Chinese standards. On January 1st 2026, the Cybernews research team discovered 8.73 billion Chinese records exposed online.

The leaked data ranges from national ID numbers and home addresses to social media identifiers and email addresses, severely increasing identity theft and account takeover risks for individuals involved.

The exposed data was stored on a massive Elasticsearch cluster. Organizations and businesses use Elasticsearch because it supports rapid sorting, near-real-time data searching, and high scalability. For example, the cluster our team discovered contained 163 indices, housing billions upon billions of records.

The massive data cluster was discovered in the first days of 2026 and remained open for over three weeks. While there are no indications that the data was abused by malicious actors, if our researchers managed to find it, there’s no reason others couldn’t too.

“Despite the short exposure window, the scale of the dataset means that automated scraping during this period could have resulted in widespread secondary dissemination,” our researchers said.

Bob Diachenko, a Cybernews contributor, cybersecurity researcher, and owner of SecurityDiscovery.com, is behind this major discovery. According to him, the cluster’s metadata across multiple datasets shows that data was imported as recently as late 2025.

“The presence of timestamps and import dates points to a long-running aggregation effort rather than a single historical breach,” the team explained.

What information has the major Chinese data leak exposed?

Since exposed records are spread across multiple indices, they vary widely. The exposed records range from full names and poorly protected account passwords to messaging and social media identifiers.

According to the team, the exposed data aggregates personal identifiers, contact information, government-style identifiers, online account references, and credentials at an unprecedented scale.

The geographic distribution of the leaked records is limited, predominantly focusing on mainland China, with regional metadata spanning multiple Chinese provinces and cities.

Researchers note that the exposed cluster was highly organized and segmented, with thematic indices adhering to data type. For example, the team observed phone-centric, ID-centric, account-centric, and other types of datasets.

As the database contained no banner, no organization names, and no operator identifiers, the team could not confirm the identity of the data owner. At the same time, no public claim of ownership has emerged.

“The infrastructure was hosted on a bulletproof hosting provider, commonly associated with high-risk or non-compliant data operations. Moreover, the dataset structure and scale suggest intentional aggregation, not accidental logging or misconfiguration by a single consumer service,” our researchers said.

Interestingly, the datatypes present in the cluster matched the types of data that data brokers collect. At the same time, other services hosted on the server suggest that the personal and company information could have been abused by a malicious actor for financial fraud.

The team could not accurately evaluate how many individuals were exposed. While different indices contained duplicate data, the sheer volume of exposed records still suggests the number of exposed individuals could be in the hundreds of millions.
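To illustrate why a record count overstates the number of people affected, here is a minimal Python sketch. It is not the researchers' method: the dataset names, field names, and records are all invented for illustration, and it assumes every record carries a shared identifier, which real leaked data may not.

```python
# Hypothetical sample data: dataset names, fields, and values are invented
# for illustration and do not come from the exposed cluster.
sample_datasets = {
    "phone_centric": [
        {"national_id": "ID-001", "phone": "555-0101"},
        {"national_id": "ID-002", "phone": "555-0102"},
    ],
    "id_centric": [
        {"national_id": "ID-001", "name": "Person A"},
        {"national_id": "ID-003", "name": "Person C"},
    ],
    "account_centric": [
        {"national_id": "ID-002", "email": "b@example.com"},
        {"national_id": "ID-001", "email": "a@example.com"},
    ],
}


def count_records(datasets):
    """Total number of records, counting every duplicate."""
    return sum(len(records) for records in datasets.values())


def count_unique_individuals(datasets, key="national_id"):
    """Number of distinct values of `key` across all datasets.

    Assumes `key` identifies one person; records without it are skipped.
    """
    seen = set()
    for records in datasets.values():
        for record in records:
            value = record.get(key)
            if value is not None:
                seen.add(value)
    return len(seen)


if __name__ == "__main__":
    print("records:", count_records(sample_datasets))
    print("unique individuals:", count_unique_individuals(sample_datasets))
```

In this toy sample, six records describe only three people. The same effect at scale would explain why billions of records may correspond to hundreds of millions of individuals.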

Even though the 8.7 billion-record-strong dataset is no longer accessible, it was open for over three weeks, giving malicious actors ample time to scrape it. Our researchers believe attackers could utilize the data for multiple purposes.

For one, the exposed records included plaintext credentials, some with poorly protected passwords. This type of data is extremely useful for account takeovers, which give cybercriminals access to additional user details. Password information also enables cybercrooks to carry out credential stuffing attacks, as users often reuse the same passwords for multiple accounts.

Another major risk for individuals is identity theft. Since the dataset included tremendous amounts of PII, together with national identifiers, malicious actors may attempt to set up fraudulent accounts. ID numbers are often the key identifier that organizations and businesses demand when accounts are set up.

“This exposure demonstrates how large-scale personal data aggregation can persist outside regulatory oversight when hosted in permissive environments. Even without a confirmed owner, the dataset represents a systemic privacy risk affecting potentially hundreds of millions of individuals,” our researchers explained.
