How to use pkgutil.get_data with csv.reader in Python?
I have a Python module that has a variety of data files (a set of csv files representing curves) that need to be loaded at runtime. The csv module works very well:
import csv

curvefile = "ntc.10k.csv"
raw = csv.reader(open(curvefile, 'rb'), delimiter=',')
But if I import this module into another script, I need to find the full path to the data file.
/project
    /shared
        curve.py
        ntc.10k.csv
        ntc.2k5.csv
    /apps
        script.py
I want script.py to refer to the curves just by their base filenames, not full paths. In the module code, I can use:
pkgutil.get_data("curve", "ntc.10k.csv")
which works very well at finding the file, but it returns the csv file already read in, whereas csv.reader requires the file handle itself. Is there any way to make these two modules play well together? They're both standard library modules, so I wasn't really expecting problems. I know I could start splitting the binary file data that pkgutil returns, but then I might as well not be using the csv library.
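(Roughly, by "splitting the data" I mean something like the following, parsing the returned data by hand instead of letting csv do it:)
import pkgutil

data = pkgutil.get_data("curve", "ntc.10k.csv")
# hand-rolled parsing of the returned data; on Python 3 get_data() returns
# bytes, so you would need to decode() first
rows = [line.split(',') for line in data.splitlines()]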
I know I can just use the following in the module code and forget about pkgutil, but it seems like pkgutil is exactly what this is for:
import csv
import os

this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, curvefile)
raw = csv.reader(open(DATA_PATH, "rb"))
I opened up the source code to get_data, and it is trivial to modify it to return the path to the file instead of the loaded data. The function below should do the trick: pass as_string=True to get the file read into memory, or as_string=False to get the path.
import os, sys
from pkgutil import get_loader

def get_data_smart(package, resource, as_string=True):
    """Rewrite of pkgutil.get_data() that actually lets the user determine if data should
    be returned read into memory (aka as_string=True) or just return the file path.
    """
    loader = get_loader(package)
    if loader is None or not hasattr(loader, 'get_data'):
        return None
    mod = sys.modules.get(package) or loader.load_module(package)
    if mod is None or not hasattr(mod, '__file__'):
        return None
    # Modify the resource name to be compatible with the loader.get_data
    # signature - an os.path format "filename" starting with the dirname of
    # the package's __file__
    parts = resource.split('/')
    parts.insert(0, os.path.dirname(mod.__file__))
    resource_name = os.path.join(*parts)
    if as_string:
        return loader.get_data(resource_name)
    else:
        return resource_name
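Usage then looks something like this (a sketch, reusing the module and file names from the question):
import csv

# as_string=False returns the path, which can be opened and handed to csv.reader
path = get_data_smart("curve", "ntc.10k.csv", as_string=False)
raw = csv.reader(open(path, "rb"), delimiter=',')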
It's not ideal, especially for very large files, but you can use StringIO to turn a string into something with a read() method, which csv.reader should be able to handle.
import csv
import pkgutil
from StringIO import StringIO  # Python 2; see the Python 3 sketch below

csvdata = pkgutil.get_data("curve", "ntc.10k.csv")
csvio = StringIO(csvdata)
raw = csv.reader(csvio)
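On Python 3, get_data() returns bytes, so a sketch of the same idea decodes first and wraps the text in io.StringIO:
import csv
import io
import pkgutil

csvdata = pkgutil.get_data("curve", "ntc.10k.csv")
# decode the bytes before wrapping them in a text stream for csv.reader
csvio = io.StringIO(csvdata.decode("utf-8"))
raw = csv.reader(csvio)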
Over 10 years after the question was asked, but I came here via Google and went down the rabbit hole posted in the other answers. Nowadays this seems to be more straightforward. Below is my implementation using the stdlib's importlib.resources, which returns the filesystem path to the package's resource as a string. It should work with Python 3.7+ (importlib.resources was added in 3.7).
import importlib.resources
import os

def get_data_file_path(package: str, resource: str) -> str:
    """
    Returns the filesystem path of a resource marked as package
    data of an installed Python package.

    :param package: string of the Python package the resource is
        located in, e.g. "mypackage.module"
    :param resource: string of the filename of the resource (do not
        include directory names), e.g. "myfile.png"
    :return: string of the full (absolute) filesystem path to the
        resource if it exists.
    :raises ModuleNotFoundError: in case the package `package` is not found.
    :raises FileNotFoundError: in case the file in `resource` is not
        found in the package.
    """
    # Guard against non-existing files, or else importlib.resources.path
    # may raise a confusing TypeError.
    if not importlib.resources.is_resource(package, resource):
        raise FileNotFoundError(f"Python package '{package}' resource '{resource}' not found.")
    with importlib.resources.path(package, resource) as resource_path:
        return os.fspath(resource_path)
Another way is to use json.loads() along with .decode(). Since get_data() retrieves the data as bytes, it needs to be decoded to a string in order to process it as JSON:
import json
import pkgutil
data_file = pkgutil.get_data('test.testmodel', 'data/test_data.json')
length_data_file = len(json.loads(data_file.decode()))
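The same decode-first idea works for the csv files from the original question, since csv.reader accepts any iterable of lines (a sketch, not part of this answer's code):
import csv
import pkgutil

data = pkgutil.get_data('curve', 'ntc.10k.csv')
# decode the bytes, then hand the lines to csv.reader
raw = csv.reader(data.decode().splitlines(), delimiter=',')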