ENH: support reading directory in read_csv #61275

fangchenli · 2025-04-12T07:09:32Z

closes Support reading directory in read_csv bodo-ai/Bodo-Pandas-Collaboration#2
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

mroeschke · 2025-04-22T16:07:06Z

FWIW I recall the team being negative in the past about supporting reading directories of files, and we document just concatting DataFrames read from a directory: https://pandas.pydata.org/docs/user_guide/cookbook.html#reading-multiple-files-to-create-a-single-dataframe. Are we sure we want to include this?
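For context, the cookbook pattern referred to above can be sketched roughly as follows. This is a minimal illustration, not code from this PR: `read_csv_directory` is a hypothetical helper name, and the file names are made up for the demo.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_directory(directory: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Hypothetical helper: read every matching file and concatenate the results."""
    files = sorted(Path(directory).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory!r}")
    # Same idea as the cookbook recipe: one read_csv call per file, then concat.
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)


# Demo with two small files that share a schema.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "part_0.csv").write_text("a,b\n1,2\n")
    Path(tmp, "part_1.csv").write_text("a,b\n3,4\n")
    combined = read_csv_directory(tmp)
```

The whole directory is materialized as a single DataFrame here, which is the main difference from a per-file generator approach.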

datapythonista · 2025-04-30T22:05:42Z

> FWIW I recall the team being negative in the past about supporting reading directories of files

Do you remember the reason? This seems like a useful thing, as I think it's common for datasets to be split across several files with the same schema. There is some added complexity to this, but it seems consistent with other syntactic sugar we have in IO operations, such as decompressing and downloading.

datapythonista · 2025-05-20T22:31:56Z

Note that you've got the image from Will's book in this PR. This happened when we had to hard revert it from git history.

jbrockmendel · 2025-08-15T21:10:54Z

i think an unrelated file got added?

fangchenli · 2025-08-15T22:02:02Z

> i think an unrelated file got added?

Removed.

Copilot

Pull Request Overview

This PR adds support for reading from directories in pandas.read_csv, read_table, and read_fwf, enabling users to process multiple CSV files from both local folders and remote locations via fsspec. The feature returns a generator that yields DataFrames (or TextFileReaders when using chunking/iterator mode) for each file in the directory.

- Introduces `iterdir()` function to handle directory traversal for both local and remote paths
- Extends `read_csv`, `read_table`, and `read_fwf` to accept directories and return generators
- Updates error messages and exception types for better consistency and clarity
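To make the generator behavior described above concrete, here is a rough sketch that emulates it with the existing single-file `pd.read_csv`. It is not the PR's implementation: `iter_csv_frames` is a hypothetical name, and filtering on the `.csv` suffix is an assumption made for the example.

```python
import tempfile
from collections.abc import Iterator
from pathlib import Path

import pandas as pd


def iter_csv_frames(directory: str, **read_csv_kwargs) -> Iterator[pd.DataFrame]:
    """Hypothetical helper: lazily yield one DataFrame per CSV file in a directory."""
    for path in sorted(Path(directory).iterdir()):
        # Assumption for this sketch: only regular files ending in .csv are read.
        if path.is_file() and path.suffix == ".csv":
            yield pd.read_csv(path, **read_csv_kwargs)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("x\n1\n2\n")
    Path(tmp, "b.csv").write_text("x\n3\n")
    Path(tmp, "notes.txt").write_text("not a csv\n")
    frames = list(iter_csv_frames(tmp))
    row_counts = [len(df) for df in frames]
```

Because frames are produced one at a time, a caller can process each file and discard it instead of holding the whole directory in memory.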

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| pandas/io/common.py | Adds `iterdir()` function and helper functions to support directory iteration for local and remote filesystems |
| pandas/io/parsers/readers.py | Modifies `_read()` to handle directories and introduces `_multi_file_generator()` for yielding DataFrames from multiple files |
| pandas/tests/io/conftest.py | Adds fixtures for testing local/remote CSV directories and files |
| pandas/tests/io/test_common.py | Adds tests for the new `iterdir()` functionality and updates error message patterns |
| pandas/tests/io/test_fsspec.py | Updates test to use fixture instead of hardcoded filename |
| pandas/tests/io/parser/test_directory.py | New test file for directory reading functionality |
| pandas/tests/io/parser/test_compression.py | Adds fixture for empty zip file and updates test to use it |
| pandas/tests/io/parser/test_unsupported.py | Updates test to expect TypeError instead of ValueError for invalid file inputs |
| pandas/tests/io/parser/common/test_file_buffer_url.py | Updates error messages and exception types for consistency |
| doc/source/whatsnew/v3.0.0.rst | Documents the new directory reading feature |

Copilot · 2025-11-05T20:44:55Z

pandas/io/common.py

+            path_obj = PurePosixPath(file["name"])
+            if _match_file(
+                path_obj,
+                extensions,
+                glob,
+            ):
+                result.append(f"{scheme}://{path_obj}")  # type: ignore[arg-type]


[nitpick] Inconsistent naming: the variable `path_obj` is created from `file["name"]`, which is a string path, but the variable is already defined above (line 1466) for the single file case. Consider using a different variable name like `file_path_obj` to distinguish this from the earlier usage and improve readability.

Suggested change

-            path_obj = PurePosixPath(file["name"])
-            if _match_file(
-                path_obj,
-                extensions,
-                glob,
-            ):
-                result.append(f"{scheme}://{path_obj}")  # type: ignore[arg-type]
+            file_path_obj = PurePosixPath(file["name"])
+            if _match_file(
+                file_path_obj,
+                extensions,
+                glob,
+            ):
+                result.append(f"{scheme}://{file_path_obj}")  # type: ignore[arg-type]
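As an illustration of the filtering step this snippet performs, here is a self-contained sketch. Only the call shape (`path`, `extensions`, `glob`) and the `file["name"]` key come from the diff; the body of `match_file`, the `filter_listing` wrapper, and the sample listing are assumptions for the example and may differ from what the PR does.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def match_file(
    path: PurePosixPath, extensions: set[str] | None, glob: str | None
) -> bool:
    """Assumed semantics: a file must pass both filters when they are given."""
    if extensions is not None and path.suffix.lower() not in extensions:
        return False
    if glob is not None and not path.match(glob):
        return False
    return True


def filter_listing(
    listing: list[dict],
    scheme: str,
    extensions: set[str] | None = None,
    glob: str | None = None,
) -> list[str]:
    """Hypothetical wrapper: turn directory listing entries into scheme-qualified URLs."""
    result = []
    for entry in listing:
        file_path_obj = PurePosixPath(entry["name"])
        if match_file(file_path_obj, extensions, glob):
            result.append(f"{scheme}://{file_path_obj}")
    return result


# Made-up listing in the shape of dicts with a "name" key.
listing = [
    {"name": "bucket/data/part_0.csv"},
    {"name": "bucket/data/part_1.CSV"},
    {"name": "bucket/data/readme.md"},
]
urls = filter_listing(listing, "s3", extensions={".csv"}, glob="part_*")
```

`PurePosixPath` is used because remote object paths use forward slashes regardless of the local operating system.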

Copilot · 2025-11-05T20:44:56Z

doc/source/whatsnew/v3.0.0.rst

 - Support reading Stata 102-format (Stata 1) dta files (:issue:`58978`)
 - Support reading Stata 110-format (Stata 7) dta files (:issue:`47176`)
 - Switched wheel upload to **PyPI Trusted Publishing** (OIDC) for release-tag pushes in ``wheels.yml``. (:issue:`61718`)
+- Added support for reading from directories in :func:`pandas.read_csv`, including local folders and remote locations via ``fsspec``


The whatsnew entry should mention all affected functions (read_csv, read_table, and read_fwf) or use 'CSV reading functions' for accuracy, as the changes apply to multiple reader functions.

Suggested change

-- Added support for reading from directories in :func:`pandas.read_csv`, including local folders and remote locations via ``fsspec``
+- Added support for reading from directories in CSV reading functions (:func:`pandas.read_csv`, :func:`pandas.read_table`, and :func:`pandas.read_fwf`), including local folders and remote locations via ``fsspec``

WillAyd and others added 2 commits April 11, 2025 08:48

- 84d6bd3: Add Pandas Cookbook to Book Recommendations (pandas-dev#61271)
- 16cf492: bug fix

fangchenli added the IO CSV read_csv, to_csv label Apr 12, 2025

datapythonista force-pushed the main branch from 84d6bd3 to adec21f Compare April 12, 2025 16:03

fangchenli added 6 commits April 12, 2025 20:29

- b69fad1: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 822dffc: fix win related error
- 5637dca: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 3905f1c: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 361c41c: add encoding
- 02f93bd: fix import

fangchenli added 12 commits May 3, 2025 19:49

- c77158e: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 179f911: format
- d7bef62: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- db1c7ed: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 91a7956: improve test
- 8b5cdd4: debug for new fsspec
- 13c1258: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- abce2fd: debug min version fsspec
- 70bcb2a: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- b99b641: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 14d7afc: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 2a445f3: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory

fangchenli added 5 commits May 21, 2025 22:35

- 3173270: format
- f94a0bf: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 38bed64: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 2a66b92: Merge remote-tracking branch 'upstream' into read-csv-from-directory
- a2b65e1: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- b07bee1: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory

fangchenli changed the title ~~[WIP] ENH: support reading directory in read_csv~~ ENH: support reading directory in read_csv Jul 21, 2025

fangchenli added 7 commits July 21, 2025 17:05

- 65c6f9a: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 9e1c9c7: Merge branch 'main' into read-csv-from-directory
- e4902c7: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 7ed1c00: Merge remote-tracking branch 'origin/read-csv-from-directory' into read-csv-from-directory
- ca3f0fc: add note to 3.0 whatsnew doc
- 2b3d8d1: Merge branch 'main' into read-csv-from-directory
- 26102ab: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory

fangchenli requested a review from jbrockmendel August 8, 2025 19:11

fangchenli added 3 commits August 11, 2025 22:35

- 5ab4a1e: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- aaa95ae: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- dc8ec97: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory

fangchenli added 2 commits August 15, 2025 14:59

- 6d53b13: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- f792a15: remove cookbook img

fangchenli added 9 commits August 19, 2025 22:52

- e7fee01: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 61bef41: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- b14c470: Merge remote-tracking branch 'upstream' into read-csv-from-directory
- da1c1ed: Merge remote-tracking branch 'upstream' into read-csv-from-directory
- 1893382: Merge remote-tracking branch 'upstream' into read-csv-from-directory
- 9924ebe: Merge remote-tracking branch 'upstream/main' into read-csv-from-directory
- 90ba4c0: Merge remote-tracking branch 'upstream' into read-csv-from-directory
- 059d0ff: Merge branch 'main' into read-csv-from-directory
- 828a047: Merge branch 'main' into read-csv-from-directory

fangchenli requested a review from Copilot November 5, 2025 20:34

Copilot AI reviewed Nov 5, 2025

fangchenli added 2 commits November 5, 2025 15:49

- 0ffeaf2: Merge branch 'read-csv-from-directory' of https://github.com/fangchenli/pandas into read-csv-from-directory
- 1afe0e2: improve docs
