BUG: Validate path type in read_parquet, reject non-path/file-like #62979

YukunR · 2025-11-04T13:44:24Z

closes BUG: read_parquet fails when passing a list of remote files (gcs) #62922
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
No whatsnew entry required (small API validation change, no behavior change for valid inputs)

Description

The read_parquet documentation specifies that only str / os.PathLike or file-like objects are supported.
Previously, passing an unexpected container type such as a list would reach the engine backends and fail inconsistently.

This PR adds an early TypeError check at the public API boundary to provide clear, consistent behaviour across the pyarrow and fastparquet engines.

Tests are added for both valid and invalid path types.
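
For illustration, here is a minimal sketch of the kind of early check described above. It assumes the check accepts str, os.PathLike, or any object exposing a `read` attribute. The helper name `validate_parquet_path` and the `hasattr`-based file-like test are assumptions made for this example, not the code in this PR; only the error message follows the diff under review.

```python
import os
from io import BytesIO
from pathlib import Path


def validate_parquet_path(path):
    """Raise TypeError unless path is str, os.PathLike, or file-like.

    Hypothetical helper: the name and the hasattr-based file-like test
    are assumptions for this sketch, not pandas API.
    """
    is_path = isinstance(path, (str, os.PathLike))
    is_file_like = hasattr(path, "read")
    if not (is_path or is_file_like):
        raise TypeError(
            "read_parquet expected str/os.PathLike or file-like object, "
            f"got {type(path).__name__} type"
        )
    return path


# Accepted inputs pass through unchanged.
validate_parquet_path("data.parquet")
validate_parquet_path(Path("data.parquet"))
validate_parquet_path(BytesIO())

# A list is rejected before any engine sees it.
try:
    validate_parquet_path(["a.parquet", "b.parquet"])
except TypeError as err:
    list_error = str(err)
```

Under these assumptions a bytes value is also rejected, since bytes is neither str nor os.PathLike and has no `read` attribute.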


mroeschke · 2025-11-04T17:35:08Z

pandas/tests/io/test_parquet.py

+def test_read_parquet_valid_path_types(tmp_path, engine):
+    # GH #62922
+    df = pd.DataFrame({"a": [1]})
+    path = tmp_path / "test_read_parquet.parquet"
+    df.to_parquet(path, engine=engine)
+    # str
+    read_parquet(str(path), engine=engine)
+    # os.PathLike
+    read_parquet(pathlib.Path(path), engine=engine)
+    # file-like object
+    buf = BytesIO()
+    df.to_parquet(buf, engine=engine)
+    buf.seek(0)
+    read_parquet(buf, engine=engine)


This test is not needed

mroeschke · 2025-11-04T17:35:23Z

pandas/tests/io/test_parquet.py

+    bad_path_types = [
+        [str(path)],  # list
+        (str(path),),  # tuple
+        b"raw-bytes",  # bytes
+    ]


Testing the list case is sufficient

mroeschke · 2025-11-04T17:35:53Z

pandas/io/parquet.py

+        raise TypeError(
+            f"read_parquet expected str/os.PathLike or file-like object, "
+            f"got {type(path).__name__} type"
+        )


This should probably be done using get_handle in the _get_path_or_handle function.

YukunR · 2025-11-05

Hi, thanks for the review!

I looked into _get_path_or_handle more closely.

get_handle is only invoked when we already know path_or_handle is a string and not a directory. For invalid inputs like a list, we never reach that branch, because stringify_path passes them through unchanged.

Also, _get_path_or_handle is only used in PyArrowImpl.read, not in FastParquetImpl.read. So if we only rely on _get_path_or_handle to validate input, validation coverage would be asymmetric across engines.

So I propose validating the path type in read_parquet before engine dispatch. Alternatively, I can factor out a small _validate_parquet_path_arg(path) helper and call it at the top of both PyArrowImpl.read and FastParquetImpl.read.

Let me know which placement you prefer.
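
To make the placement question concrete, here is a toy model of the first option. Everything in it is hypothetical: `PyArrowLike`, `FastParquetLike`, `read_parquet_sketch`, and `_validate` are stand-in names, and the real engine classes do far more. The sketch only shows that a single check before engine dispatch covers both engines, whereas per-engine validation would need a call at the top of each `read` method.

```python
import os


def _validate(path):
    # Hypothetical check; treating "has a read attribute" as file-like
    # is an assumption of this sketch.
    if not (isinstance(path, (str, os.PathLike)) or hasattr(path, "read")):
        raise TypeError(
            "read_parquet expected str/os.PathLike or file-like object, "
            f"got {type(path).__name__} type"
        )


class _EngineStub:
    """Stand-in for an engine; counts how often read() is reached."""

    name = "stub"

    def __init__(self):
        self.calls = 0

    def read(self, path):
        self.calls += 1
        return f"{self.name} read {path!r}"


class PyArrowLike(_EngineStub):
    name = "pyarrow"


class FastParquetLike(_EngineStub):
    name = "fastparquet"


ENGINES = {"pyarrow": PyArrowLike(), "fastparquet": FastParquetLike()}


def read_parquet_sketch(path, engine):
    # Option 1: one check at the public boundary, before engine dispatch.
    _validate(path)
    return ENGINES[engine].read(path)


# Invalid input produces the same error for every engine,
# and no engine's read() is ever reached.
errors = {}
for engine_name in ENGINES:
    try:
        read_parquet_sketch(["a.parquet"], engine_name)
    except TypeError as err:
        errors[engine_name] = str(err)

# Valid input still reaches the selected engine.
result = read_parquet_sketch("a.parquet", "fastparquet")
```

With the second option, the same `_validate` call would move into both `read` methods, and any engine added later would have to remember to call it.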

BUG: validate path type in read_parquet; reject non-path/file-like (p…

26b56d6

…andas-devgh-62922)

YukunR changed the title ~~BUG: Validate path type in read_parquet, reject non-path/file-like (#62964)~~ BUG: Validate path type in read_parquet, reject non-path/file-like Nov 4, 2025

mroeschke requested changes Nov 4, 2025

YukunR Nov 5, 2025
