How to use pkgutil.get_data with csv.reader in Python?
I have a Python module that has a variety of data files (a set of csv files representing curves) that need to be loaded at runtime. The csv module works very well:
import csv

curvefile = "ntc.10k.csv"
raw = csv.reader(open(curvefile, 'rb'), delimiter=',')
But if I import this module into another script, I need to find the full path to the data file.
/project
    /shared
        curve.py
        ntc.10k.csv
        ntc.2k5.csv
    /apps
        script.py
I want script.py to refer to the curves just by their base filenames, not full paths. In the module code, I can use:
pkgutil.get_data("curve", "ntc.10k.csv")
which works very well at finding the file, but it returns the csv file already read in, whereas csv.reader requires the file handle itself. Is there any way to make these two modules play well together? They're both standard library modules, so I wasn't really expecting problems. I know I could start splitting the binary file data that pkgutil returns, but then I might as well not be using the csv library.
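(Roughly, by "splitting the data" I mean something like the following, parsing the returned data by hand instead of letting csv do it:)
import pkgutil

data = pkgutil.get_data("curve", "ntc.10k.csv")
# hand-rolled parsing of the returned data; on Python 3 get_data() returns
# bytes, so you would need to decode() first
rows = [line.split(',') for line in data.splitlines()]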
I know I can just use the following in the module code and forget about pkgutil, but it seems like pkgutil is exactly what this is for:
import csv
import os

this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, curvefile)
raw = csv.reader(open(DATA_PATH, "rb"))
I opened up the source code to get_data, and it is trivial to modify it to return the path to the file instead of the loaded data. The function below should do the trick: pass as_string=True to get the file read into memory, or as_string=False to get the path.
import os, sys
from pkgutil import get_loader

def get_data_smart(package, resource, as_string=True):
    """Rewrite of pkgutil.get_data() that actually lets the user determine if data should
    be returned read into memory (aka as_string=True) or just return the file path.
    """
    loader = get_loader(package)
    if loader is None or not hasattr(loader, 'get_data'):
        return None
    mod = sys.modules.get(package) or loader.load_module(package)
    if mod is None or not hasattr(mod, '__file__'):
        return None
    # Modify the resource name to be compatible with the loader.get_data
    # signature - an os.path format "filename" starting with the dirname of
    # the package's __file__
    parts = resource.split('/')
    parts.insert(0, os.path.dirname(mod.__file__))
    resource_name = os.path.join(*parts)
    if as_string:
        return loader.get_data(resource_name)
    else:
        return resource_name
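Usage then looks something like this (a sketch, reusing the module and file names from the question):
import csv

# as_string=False returns the path, which can be opened and handed to csv.reader
path = get_data_smart("curve", "ntc.10k.csv", as_string=False)
raw = csv.reader(open(path, "rb"), delimiter=',')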
It's not ideal, especially for very large files, but you can use StringIO to turn a string into something with a read() method, which csv.reader should be able to handle.
import csv
import pkgutil
from StringIO import StringIO  # Python 2; see the Python 3 sketch below

csvdata = pkgutil.get_data("curve", "ntc.10k.csv")
csvio = StringIO(csvdata)
raw = csv.reader(csvio)
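On Python 3, get_data() returns bytes, so a sketch of the same idea decodes first and wraps the text in io.StringIO:
import csv
import io
import pkgutil

csvdata = pkgutil.get_data("curve", "ntc.10k.csv")
# decode the bytes before wrapping them in a text stream for csv.reader
csvio = io.StringIO(csvdata.decode("utf-8"))
raw = csv.reader(csvio)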
Over 10 years after the question was asked, but I came here via Google and went down the rabbit hole posted in the other answers. Nowadays this seems to be more straightforward. Below is my implementation using the stdlib's importlib.resources, which returns the filesystem path to the package's resource as a string. It should work with Python 3.7+ (importlib.resources was added in 3.7).
import importlib.resources
import os

def get_data_file_path(package: str, resource: str) -> str:
    """
    Returns the filesystem path of a resource marked as package
    data of an installed Python package.

    :param package: string of the Python package the resource is
        located in, e.g. "mypackage.module"
    :param resource: string of the filename of the resource (do not
        include directory names), e.g. "myfile.png"
    :return: string of the full (absolute) filesystem path to the
        resource if it exists.
    :raises ModuleNotFoundError: in case the package `package` is not found.
    :raises FileNotFoundError: in case the file in `resource` is not
        found in the package.
    """
    # Guard against non-existing files, or else importlib.resources.path
    # may raise a confusing TypeError.
    if not importlib.resources.is_resource(package, resource):
        raise FileNotFoundError(f"Python package '{package}' resource '{resource}' not found.")
    with importlib.resources.path(package, resource) as resource_path:
        return os.fspath(resource_path)
Another way is to use json.loads() along with .decode(). Since get_data() retrieves the data as bytes, it needs to be decoded to a string in order to process it as JSON:
import json
import pkgutil
data_file = pkgutil.get_data('test.testmodel', 'data/test_data.json')
length_data_file = len(json.loads(data_file.decode()))
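The same decode-first idea works for the csv files from the original question, since csv.reader accepts any iterable of lines (a sketch, not part of this answer's code):
import csv
import pkgutil

data = pkgutil.get_data('curve', 'ntc.10k.csv')
# decode the bytes, then hand the lines to csv.reader
raw = csv.reader(data.decode().splitlines(), delimiter=',')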