
CVE-2025-30065: Apache Parquet — Remote Code Execution

CVE Details

CVE ID: CVE-2025-30065
CVSS Score: 10.0
Severity: Critical
Vendor: Apache Software Foundation
Product: Apache Parquet
Patch Status: Available
Published: April 2, 2025
EPSS Score: 0.4%
CISA Patch Deadline: April 23, 2025 (federal deadline passed)

Background

Apache Parquet is a columnar storage format widely used in the big data ecosystem — Hadoop, Spark, Flink, Hive, and many other data processing frameworks use Parquet files as a standard format for storing analytical data. If your organisation does any significant data engineering or analytics work, there’s a good chance Parquet files are flowing through your pipelines.

CVE-2025-30065 is a CVSS 10.0 vulnerability in the parquet-avro module, the part of Apache Parquet that reads Parquet files containing embedded Avro schemas. The attack vector is rated “network” because data pipelines commonly read Parquet files from remote sources (S3, GCS, HDFS, APIs): an attacker who can get a malicious Parquet file into a pipeline achieves code execution on the processing server without ever connecting to that server directly.

This was patched in April 2025. It’s a particularly interesting vulnerability because the attack surface is files rather than network endpoints.

Technical Mechanism

Parquet files can embed Avro schemas, and the parquet-avro module parses these embedded schemas when it reads a file. The vulnerability lies in that schema handling: the module deserialises Java objects from specific Parquet file metadata.

When the parquet-avro reader processes a Parquet file containing a specially crafted Avro schema section, it deserialises a Java object from the schema metadata without validating its type. An attacker can use this to trigger a Java deserialisation gadget chain.
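The missing type check is the classic Java deserialisation flaw. As a generic illustration only (this is not parquet-avro's code and not the change shipped in 1.15.1), the sketch below uses the JDK's ObjectInputFilter (Java 9+) to allowlist the classes an ObjectInputStream may instantiate, so an unexpected class is rejected before it is instantiated. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical illustration of allowlist-based type validation during Java
// deserialisation. This is NOT parquet-avro code and NOT the 1.15.1 fix.
class AllowlistedDeserialisation {

    // The only class this reader is willing to instantiate.
    static final class SchemaName implements Serializable {
        private static final long serialVersionUID = 1L;
        final String value;

        SchemaName(String value) {
            this.value = value;
        }
    }

    // Allow SchemaName and reject every other class ("!*").
    static final ObjectInputFilter ALLOWLIST =
            ObjectInputFilter.Config.createFilter(SchemaName.class.getName() + ";!*");

    static byte[] serialise(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if the payload deserialises under the allowlist; false if the filter
    // rejects a class before an instance of it is created.
    static boolean isAccepted(byte[] payload) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            in.setObjectInputFilter(ALLOWLIST); // must be set before the first readObject()
            in.readObject();
            return true;
        } catch (InvalidClassException rejected) {
            return false; // thrown with "filter status: REJECTED"
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(serialise(new SchemaName("events")))); // true
        System.out.println(isAccepted(serialise(new ArrayList<String>())));  // false
    }
}
```

The design choice here is an allowlist rather than a blocklist: gadget chains are built from whatever classes happen to be on the classpath, so enumerating the few expected types is more robust than trying to name every dangerous one.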

The attack flow in a typical data pipeline:

  1. Attacker creates a malicious Parquet file containing a crafted embedded schema with a serialized gadget chain payload
  2. The malicious file is placed somewhere the target reads from:
    • Uploaded to a data lake (S3, GCS, Azure Blob)
    • Sent via an API that accepts Parquet format
    • Injected into a data feed the pipeline reads from
  3. The data pipeline reads the file using Apache Spark, Flink, or another Parquet-consuming framework
  4. The parquet-avro module deserialises the embedded schema, triggering the gadget chain
  5. The gadget chain executes OS commands on the Spark executor or pipeline node

The danger here is that data pipelines often read from partially trusted or untrusted sources (external data feeds, vendor files, customer uploads) and the engineers running them may not think of file parsing as a code execution risk.
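One way to act on this (an assumption on our part, not a step from the Apache advisory) is to gate which storage locations a parquet-avro job will read at all. The sketch below accepts a file URI only if it sits under an explicitly trusted prefix. The class name and the example bucket names are hypothetical, and it assumes Java 10+ for List.copyOf.

```java
import java.net.URI;
import java.util.List;
import java.util.Objects;

// Hypothetical pre-read gate: only pass files to a parquet-avro reader when they
// sit under an explicitly trusted prefix. Bucket and prefix names are made up.
class TrustedSourceGate {

    private final List<URI> trustedPrefixes;

    TrustedSourceGate(List<URI> trustedPrefixes) {
        for (URI prefix : trustedPrefixes) {
            // Require a trailing slash so "/parquet/" cannot match "/parquet-uploads/".
            if (prefix.getPath() == null || !prefix.getPath().endsWith("/")) {
                throw new IllegalArgumentException("Prefix must end with '/': " + prefix);
            }
        }
        this.trustedPrefixes = List.copyOf(trustedPrefixes);
    }

    boolean isTrusted(URI file) {
        // Resolve "." and ".." segments first, so "parquet/../incoming/x" is not let through.
        URI candidate = file.normalize();
        String scheme = candidate.getScheme();
        String path = candidate.getPath();
        if (scheme == null || path == null) {
            return false;
        }
        for (URI prefix : trustedPrefixes) {
            if (scheme.equalsIgnoreCase(prefix.getScheme())
                    && Objects.equals(candidate.getAuthority(), prefix.getAuthority())
                    && path.startsWith(prefix.getPath())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TrustedSourceGate gate = new TrustedSourceGate(
                List.of(URI.create("s3://example-curated-bucket/parquet/")));
        System.out.println(gate.isTrusted(
                URI.create("s3://example-curated-bucket/parquet/2025/part-0.parquet"))); // true
        System.out.println(gate.isTrusted(
                URI.create("s3://example-upload-bucket/incoming/part-0.parquet")));      // false
    }
}
```

A path check like this only helps if write access to the trusted prefixes is itself locked down; it complements, rather than replaces, upgrading to 1.15.1.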

Real-World Exploitation Evidence

As of the initial CISA KEV addition, exploitation was confirmed in the wild, though detailed campaign documentation was limited at disclosure time. The severity and the breadth of the attack surface (essentially any Java application that reads Parquet files through parquet-avro) made it an immediate priority:

  • Supply chain risk — the attack vector via data files means exploits don’t need network access to the victim; injecting a file into a shared data lake is sufficient
  • Cloud environment targeting — Spark clusters in AWS, GCP, and Azure run with cloud IAM permissions; code execution on a Spark executor can lead to cloud credential theft and broader cloud account compromise
  • Data exfiltration — pipelines processing sensitive analytical data are high-value targets
  • Broad ecosystem impact — parquet-avro is a transitive dependency in many data frameworks; the blast radius is significant

Impact Assessment

  • Remote code execution on data processing nodes (Spark workers, Flink nodes, ETL servers)
  • Cloud credential theft — cloud execution environments have IAM roles; code execution on them exposes cloud credentials
  • Data access and exfiltration — the processing pipeline has access to the data it processes; full exfiltration possible
  • Lateral movement — from Spark worker nodes, attackers can move to cluster infrastructure
  • Data manipulation — writing malicious data back to the pipeline, corrupting analytics
  • Persistent access — backdoors in data processing infrastructure can be difficult to detect

Affected Versions

Product | Affected Versions | Fixed Version
Apache Parquet | 1.15.0 and earlier | 1.15.1

Note: Only the parquet-avro module is affected. parquet-mr used without parquet-avro (that is, without Avro schema support) is not affected.

Remediation Steps

  1. Update parquet-avro to version 1.15.1 or later in all dependencies
  2. In Maven:
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-avro</artifactId>
      <version>1.15.1</version>
    </dependency>
  3. In Gradle:
    implementation 'org.apache.parquet:parquet-avro:1.15.1'
  4. Check transitive dependencies — many frameworks include parquet-avro as a transitive dependency; you may need to force the version:
    configurations.all {
      resolutionStrategy.force 'org.apache.parquet:parquet-avro:1.15.1'
    }
  5. Review the data sources that feed Parquet files into your pipelines; if any are untrusted, add validation, such as restricting which locations a parquet-avro job may read from and who can write to them
  6. Audit cloud IAM permissions for Spark/Flink execution roles — least privilege reduces the blast radius of any code execution
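  7. Deploy sandboxing where applicable. The Java Security Manager is deprecated for removal since Java 17 and cannot be enabled on JDK 24 or later, so on current JVMs rely on container or OS-level isolation instead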
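Step 4 is easy to get wrong, because a forced version only helps if it actually wins dependency resolution in every deployed artefact. The following is a minimal sketch, assuming jars follow the usual parquet-avro-<version>.jar naming, of a check that scans a classpath string and flags parquet-avro jars older than 1.15.1. The class and method names are hypothetical; wildcard classpath entries (such as a Spark jars/* directory) are not expanded, and vendor builds that backport the fix under an older version number would still be reported.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags parquet-avro jars older than the fixed release (1.15.1).
class ParquetAvroVersionCheck {

    // Matches names such as "parquet-avro-1.15.0.jar". Any suffix after the
    // major.minor.patch triple (e.g. "-SNAPSHOT") is ignored for the comparison.
    private static final Pattern JAR_NAME =
            Pattern.compile("parquet-avro-(\\d+)\\.(\\d+)\\.(\\d+)([.-][\\w.-]+)?\\.jar");

    private static final int[] FIXED = {1, 15, 1};

    // True if major.minor.patch sorts below 1.15.1.
    static boolean isBelowFixed(int major, int minor, int patch) {
        int[] version = {major, minor, patch};
        for (int i = 0; i < FIXED.length; i++) {
            if (version[i] != FIXED[i]) {
                return version[i] < FIXED[i];
            }
        }
        return false; // exactly 1.15.1
    }

    // Returns classpath entries that look like vulnerable parquet-avro jars.
    // Wildcard entries such as "/opt/spark/jars/*" are not expanded.
    static List<String> vulnerableEntries(String classpath) {
        List<String> hits = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            Matcher m = JAR_NAME.matcher(new File(entry).getName());
            if (m.matches()
                    && isBelowFixed(Integer.parseInt(m.group(1)),
                            Integer.parseInt(m.group(2)),
                            Integer.parseInt(m.group(3)))) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Checks this JVM's own classpath; a Spark executor's classpath may differ.
        List<String> hits = vulnerableEntries(System.getProperty("java.class.path", ""));
        System.out.println(hits.isEmpty()
                ? "No parquet-avro jar below 1.15.1 found"
                : "Vulnerable parquet-avro jars: " + hits);
    }
}
```

Running a check like this inside the deployed job (rather than only against the build file) is what confirms that the forced version from step 4 actually reached the cluster.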

Detection Guidance

Data pipeline monitoring — look for:

  • Unexpected process execution from Spark executor JVMs
  • Unusual network connections from pipeline nodes
  • Abnormal CPU patterns that might indicate crypto mining or large data exfiltration
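As a minimal sketch of the first signal (assuming Java 9+ and that the check runs inside the executor JVM itself), ProcessHandle can list processes descended from the current JVM and compare their executables against an expected set. The class name and the allowlisted path are hypothetical, and host-level telemetry (EDR, auditd) is a more robust source for this signal than an in-process check.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical watchdog sketch: report processes spawned (directly or indirectly)
// by this JVM whose executable is not on an expected list.
class ChildProcessWatch {

    // Executable paths of all descendants of the current JVM, where the OS exposes them.
    static List<String> descendantCommands() {
        return ProcessHandle.current().descendants()
                .map(p -> p.info().command().orElse("<unknown command, pid " + p.pid() + ">"))
                .collect(Collectors.toList());
    }

    // Commands not in the allowlist; anything unknown is treated as unexpected.
    static List<String> unexpected(List<String> commands, Set<String> allowed) {
        return commands.stream()
                .filter(command -> !allowed.contains(command))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical allowlist, e.g. Python workers launched by PySpark.
        Set<String> allowed = Set.of("/usr/bin/python3");
        List<String> flagged = unexpected(descendantCommands(), allowed);
        if (!flagged.isEmpty()) {
            System.err.println("Unexpected processes spawned by this JVM: " + flagged);
        }
    }
}
```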

Cloud audit logs — if running in AWS/GCP/Azure:

  • Look for unexpected API calls from Spark instance IAM roles
  • Monitor for unusual data access patterns (large GetObject requests, cross-region data movement)

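Java security logging: on JVMs that still support it, enable Java Security Manager logging (for example -Djava.security.debug=access,failure) to detect blocked reflection calls; on any JDK 9 or later, -Xlog:class+load records each class the JVM loads, which helps spot class-loading anomalies.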

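Suricata signature (a broad heuristic for outbound connections from pipeline nodes; $DATA_PIPELINE_NETS must be defined in your suricata.yaml, the sid is taken from the 1000000-1999999 local-use range, and the rule will need tuning because legitimate protocols can also begin with null bytes):

alert tcp $DATA_PIPELINE_NETS any -> $EXTERNAL_NET any (msg:"Possible CVE-2025-30065 Post-Exploit Callback from Pipeline"; flow:established,to_server; content:"|00 00 00 00|"; offset:0; depth:4; classtype:trojan-activity; sid:1000001; rev:1;)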

Timeline

Date | Event
2025 | CVE-2025-30065 discovered in Apache Parquet
1 April 2025 | Apache releases Parquet 1.15.1 patching CVE-2025-30065
April 2025 | CISA adds CVE-2025-30065 to the Known Exploited Vulnerabilities catalogue
April 2025 | Active exploitation confirmed in the wild
2025 | Security teams across the data engineering community rush to update dependencies