CVE-2025-30065: Apache Parquet -- Remote Code Execution

Background

Apache Parquet is a columnar storage format widely used in the big data ecosystem — Hadoop, Spark, Flink, Hive, and many other data processing frameworks use Parquet files as a standard format for storing analytical data. If your organisation does any significant data engineering or analytics work, there’s a good chance Parquet files are flowing through your pipelines.

CVE-2025-30065 carries a CVSS score of 10.0 and sits in the parquet-avro module, the part of the Java library that reads Parquet files containing embedded Avro schemas. The attack vector is rated “network” because data pipelines commonly read Parquet files from remote sources (S3, GCS, HDFS, APIs): an attacker who can supply a malicious Parquet file to a pipeline achieves code execution on the processing server without ever connecting to that server directly.

This was patched in April 2025. It’s a particularly interesting vulnerability because the attack surface is files rather than network endpoints.

Technical Mechanism

Parquet files can embed Avro schemas. The parquet-avro module processes these embedded schemas when reading Parquet files. The vulnerability is in how this module deserialises Java objects from specific Parquet file metadata.

When the parquet-avro reader processes a Parquet file containing a specially crafted Avro schema section, it deserialises a Java object from the schema metadata without proper type validation. An attacker can use this to trigger a Java deserialisation gadget chain built from classes already available on the target’s classpath.

The attack flow in a typical data pipeline:

1. Attacker creates a malicious Parquet file containing a crafted embedded schema with a serialized gadget chain payload
2. The malicious file is placed somewhere the target reads from:
   - Uploaded to a data lake (S3, GCS, Azure Blob)
   - Sent via an API that accepts Parquet format
   - Injected into a data feed the pipeline reads from
3. The data pipeline reads the file using Apache Spark, Flink, or another Parquet-consuming framework
4. The parquet-avro module deserialises the embedded schema, triggering the gadget chain
5. The gadget chain executes OS commands on the Spark executor or pipeline node

The danger here is that data pipelines often read from partially trusted or untrusted sources (external data feeds, vendor files, customer uploads) and the engineers running them may not think of file parsing as a code execution risk.
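One way to make that trust boundary explicit is to check where a file comes from before any Parquet reader touches it. The following is a minimal Python sketch of such a gate, not a complete defence: the bucket names and prefixes in `TRUSTED_SOURCES` are hypothetical, and the check only restricts the origin of a file. It cannot tell whether the file's contents are malicious.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of (scheme, bucket, key prefix) entries.
# Replace with the locations your pipeline is actually meant to read.
TRUSTED_SOURCES = [
    ("s3", "analytics-internal", "curated/"),
    ("gs", "warehouse-exports", ""),
]


def is_trusted_source(uri: str) -> bool:
    """Return True only if the URI points inside an allowlisted location."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    # Reject keys that try to climb out of the allowlisted prefix.
    if ".." in key.split("/"):
        return False
    return any(
        parsed.scheme == scheme
        and parsed.netloc == bucket
        and key.startswith(prefix)
        for scheme, bucket, prefix in TRUSTED_SOURCES
    )
```

A pipeline could call `is_trusted_source(uri)` before scheduling a read and route anything else (customer uploads, vendor drops) to an isolated, low-privilege job.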

Real-World Exploitation Evidence

As of the initial CISA KEV addition, exploitation was confirmed in the wild, though detailed campaign documentation was limited at disclosure time. The severity and the breadth of the attack surface (essentially any Java application reading Parquet files) made it an immediate priority:

- Supply chain risk: the attack vector via data files means exploits don’t need network access to the victim; injecting a file into a shared data lake is sufficient
- Cloud environment targeting: Spark clusters in AWS, GCP, and Azure run with cloud IAM permissions; code execution on a Spark executor can lead to cloud credential theft and broader cloud account compromise
- Data exfiltration: pipelines processing sensitive analytical data are high-value targets
- Broad ecosystem impact: parquet-avro is a transitive dependency in many data frameworks; the blast radius is significant

Impact Assessment

- Remote code execution on data processing nodes (Spark workers, Flink nodes, ETL servers)
- Cloud credential theft: cloud execution environments have IAM roles; code execution on them exposes cloud credentials
- Data access and exfiltration: the processing pipeline has access to the data it processes; full exfiltration possible
- Lateral movement: from Spark worker nodes, attackers can move to cluster infrastructure
- Data manipulation: writing malicious data back to the pipeline, corrupting analytics
- Persistent access: backdoors in data processing infrastructure can be difficult to detect

Affected Versions

Product	Affected Versions	Fixed Version
Apache Parquet	1.15.0 and earlier	1.15.1

Note: only the parquet-avro module is affected. Deployments that use parquet-mr without the parquet-avro module (that is, without Avro schema support) are not affected.
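To see which hosts or images carry an affected jar, a quick filename scan can help. The sketch below is in Python and assumes jars follow the standard Maven naming `parquet-avro-<version>.jar`. It will miss shaded or "uber" jars that bundle parquet-avro classes under another name, and it treats a pre-release of the fixed version (for example a 1.15.1 snapshot) as fixed. The `scan` helper and the directory you point it at are illustrative.

```python
import re
from pathlib import Path

FIXED = (1, 15, 1)  # first fixed release, per the table above

# Standard Maven jar naming: <artifactId>-<version>.jar, optionally with a
# qualifier such as -SNAPSHOT after the three numeric components.
JAR_NAME = re.compile(r"^parquet-avro-(\d+)\.(\d+)\.(\d+)([.-].*)?\.jar$")


def affected_jar(filename: str) -> bool:
    """Return True if filename is a parquet-avro jar older than FIXED."""
    match = JAR_NAME.match(filename)
    if match is None:
        # Not a parquet-avro jar, or a naming scheme this sketch does not cover.
        return False
    version = tuple(int(part) for part in match.groups()[:3])
    return version < FIXED


def scan(root: str) -> list:
    """List affected parquet-avro jars found anywhere under root."""
    return sorted(str(p) for p in Path(root).rglob("*.jar") if affected_jar(p.name))
```

Running `scan` against an application's library directory gives a first inventory; dependency reports from the build tool remain the more reliable source.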

Remediation Steps

Update parquet-avro to version 1.15.1 or later in all dependencies

In Maven:

<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-avro</artifactId>
  <version>1.15.1</version>
</dependency>

In Gradle:

implementation 'org.apache.parquet:parquet-avro:1.15.1'

Check transitive dependencies — many frameworks include parquet-avro as a transitive dependency; you may need to force the version:
```
configurations.all {
  resolutionStrategy.force 'org.apache.parquet:parquet-avro:1.15.1'
}
```
Review data sources that feed Parquet files into your pipelines — if any are untrusted, add validation
Audit cloud IAM permissions for Spark/Flink execution roles — least privilege reduces the blast radius of any code execution
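Deploy sandboxing where applicable, for example containers with restricted privileges. The Java Security Manager is only an option on older runtimes: it has been deprecated for removal since Java 17 and can no longer be enabled from JDK 24 onward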
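For the IAM audit step above, a first pass can be automated by looking for wildcard grants in the policies attached to Spark or Flink execution roles. This Python sketch assumes AWS-style policy JSON and only inspects `Effect`, `Action`, and `Resource`. It ignores `NotAction`, `NotResource`, and conditions, so treat its output as a starting point for review rather than a verdict.

```python
import json


def _as_list(value):
    """IAM accepts either a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy_json: str) -> list:
    """Return (action, resource) pairs from Allow statements that grant
    '*' or 'service:*' actions, or that apply to every resource ('*')."""
    policy = json.loads(policy_json)
    findings = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action")):
            for resource in _as_list(stmt.get("Resource")):
                if action == "*" or action.endswith(":*") or resource == "*":
                    findings.append((action, resource))
    return findings
```

An execution role that only needs to read one prefix of one bucket should produce no findings here; anything it does report is worth narrowing before an attacker gets to use it.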

Detection Guidance

Data pipeline monitoring — look for:

- Unexpected process execution from Spark executor JVMs
- Unusual network connections from pipeline nodes
- Abnormal CPU or network throughput patterns that might indicate crypto mining or large data exfiltration
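The first item can be approximated with a simple parent/child check over a process snapshot. The Python sketch below is illustrative only: the `Proc` record, the process names, and the allowlist are assumptions, and the snapshot would have to come from whatever process listing or endpoint telemetry you already collect. It flags direct children of a JVM that are neither another JVM nor an expected worker such as a PySpark Python process.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    name: str


# Illustrative values; tune both sets for your environment.
JVM_NAMES = {"java"}
EXPECTED_CHILDREN = {"python", "python3"}  # e.g. PySpark workers
ALLOWED = JVM_NAMES | EXPECTED_CHILDREN    # JVMs may legitimately spawn JVMs


def unexpected_jvm_children(procs):
    """Return processes whose direct parent is a JVM and whose name is not allowed."""
    by_pid = {p.pid: p for p in procs}
    flagged = []
    for p in procs:
        parent = by_pid.get(p.ppid)
        if parent is not None and parent.name in JVM_NAMES and p.name not in ALLOWED:
            flagged.append(p)
    return flagged
```

A shell or download tool appearing directly under an executor JVM is the kind of result worth alerting on; grandchildren are not followed in this sketch.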

Cloud audit logs — if running in AWS/GCP/Azure:

- Look for unexpected API calls from Spark instance IAM roles
- Monitor for unusual data access patterns (unusually high volumes of GetObject requests, cross-region data movement)
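As a rough illustration of the first check on AWS, the Python sketch below reads a CloudTrail log file (a JSON object with a `Records` array) and counts calls made under a Spark role that fall outside an expected baseline. The role name `spark-executor-role` and the baseline set are hypothetical. Note that S3 object-level calls such as GetObject only appear if data event logging is enabled on the trail.

```python
import json
from collections import Counter

# Hypothetical role name and baseline; both are assumptions for illustration.
SPARK_ROLE = "spark-executor-role"
EXPECTED_CALLS = {
    ("s3.amazonaws.com", "GetObject"),
    ("s3.amazonaws.com", "PutObject"),
}


def unexpected_calls(log_json: str) -> Counter:
    """Count API calls made under SPARK_ROLE that are outside EXPECTED_CALLS."""
    counts = Counter()
    for record in json.loads(log_json).get("Records", []):
        arn = record.get("userIdentity", {}).get("arn", "")
        # Assumed-role sessions carry the role name inside the caller ARN.
        if f":assumed-role/{SPARK_ROLE}/" not in arn:
            continue
        call = (record.get("eventSource"), record.get("eventName"))
        if call not in EXPECTED_CALLS:
            counts[call] += 1
    return counts
```

An executor role that suddenly starts enumerating IAM or calling services it has never used is a stronger signal than raw volume. Volume thresholds for GetObject would need a per-pipeline baseline and are left out of this sketch.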

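Java security logging: on runtimes that still support the Java Security Manager (it is deprecated since Java 17 and can no longer be enabled from JDK 24), enable its logging to detect blocked reflection calls and class loading anomalies.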

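Suricata signature (for outbound connections from pipeline nodes). This is a coarse post-exploitation heuristic, not a signature for the exploit itself: it matches any payload beginning with four null bytes, so expect false positives. It also assumes a $DATA_PIPELINE_NETS address group is defined in your Suricata configuration: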

alert tcp $DATA_PIPELINE_NETS any -> any any (msg:"Possible CVE-2025-30065 Post-Exploit Callback from Pipeline"; flow:established,to_server; content:"|00 00 00 00|"; offset:0; depth:4; classtype:trojan-activity; sid:2034025; rev:1;)

Timeline

Date	Event
2025	CVE-2025-30065 discovered in Apache Parquet
1 April 2025	Apache releases Parquet 1.15.1 patching CVE-2025-30065
April 2025	CISA adds to Known Exploited Vulnerabilities catalogue
April 2025	Active exploitation confirmed in the wild
2025	Security teams across the data engineering community rush to update dependencies
