Find "string" in Text File - Add it to Excel File Using Python

2023-01-23 18:40

I ran a grep command and found several hundred instances of a string in a large directory of data. This file is 2 MB and has strings that I would like to extract out and put into an Excel file for easy access later. The part that I'm extracting is a path to a data file I need to work on later.

I have been reading about Python lately and thought I could somehow do this extraction automatically. But I'm a bit stumped how to start. I have this so far:

data = open(r"C:\python27\text.txt").read()  # raw string, otherwise "\t" is read as a tab

if "string" in data:

But then I'm not sure what to use to get out of the file what I want. Anything for a beginner to chew on?

EDIT

Here is some more info on what I was looking for. I have several hundred lines in a text file. Each line has a path and some strings like this:

/path/to/file:STRING=SOME_STRING, ANOTHER_STRING

What I would like from these lines are the paths of those lines with a specific "STRING=SOME_STRING". For example if the line looks like this, I want the path (/path/to/file) to be extracted to another file:

/path/to/file:STRING=SOME_STRING

All this is quite easily done with standard Python, but for Excel (xls or xlsx) files you would have to install a third-party library. However, if you just need a 2D table that can be opened in a spreadsheet, you can use Comma Separated Values (CSV) files. These are compatible with Excel and other spreadsheet software, and support for them comes built into Python.

As for searching a string inside a file, it is straightforward. You may not even need regular expressions for most things. What information do you want along with the string?
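For the line format shown in the question's edit, a minimal sketch using only plain string methods might look like the following. The function name extract_paths and the sample lines are made up for illustration; the default marker is the STRING=SOME_STRING text from the question.

```python
def extract_paths(lines, marker="STRING=SOME_STRING"):
    """Return the path part of every line that contains marker."""
    paths = []
    for line in lines:
        if marker in line:
            # partition splits at the first ':' only, so the rest of
            # the line may contain further colons without harm
            path, _sep, _rest = line.partition(":")
            paths.append(path)
    return paths


# Hypothetical sample lines in the format shown in the question
sample = [
    "/path/to/file:STRING=SOME_STRING, ANOTHER_STRING",
    "/other/file:STRING=DIFFERENT",
]
print(extract_paths(sample))
```

Note that "in" is a plain substring test, so a line containing STRING=SOME_STRING_2 would also match; that may or may not be what you want.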

Also, the "os" module in the standard library has some functions to list all files in a directory, or in a directory tree. The most straightforward is os.listdir(path).
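As a rough illustration of the two cases, os.listdir covers a single directory and os.walk covers a whole tree. The helper name list_files and its recursive flag are invented for this sketch.

```python
import os


def list_files(top, recursive=False):
    """Return the paths of regular files in top (or in the tree below it)."""
    if not recursive:
        # os.listdir returns bare names, so join them with the directory
        candidates = (os.path.join(top, name) for name in os.listdir(top))
        return sorted(p for p in candidates if os.path.isfile(p))
    found = []
    # os.walk yields (directory, subdirectory names, file names) per directory
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


print(list_files("."))
```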

String methods like "count" and "find" can be used in addition to "in" to locate the string in a file, or to count the number of occurrences.
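A small sketch of the three operations on one line of text; the sample line is taken from the question, and "MISSING" is just a placeholder for a string that is not there.

```python
text = "/path/to/file:STRING=SOME_STRING, ANOTHER_STRING"

print("STRING=" in text)     # True or False: is it there at all?
print(text.count("STRING"))  # number of non-overlapping occurrences
print(text.find("STRING="))  # index of the first occurrence
print(text.find("MISSING"))  # find returns -1 when nothing matches
```

Since find returns -1 rather than raising an error, check for that value before using the result as an index.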

And finally, the "csv" module can write a properly formatted file that can be read in any spreadsheet.
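A minimal sketch of writing such a file, assuming Python 3. The helper name write_table, the output name paths.csv, and the row contents are placeholders for illustration.

```python
import csv


def write_table(csv_path, header, rows):
    """Write a header row and data rows to csv_path as CSV."""
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical output name and rows, for illustration only
write_table(
    "paths.csv",
    ["Path", "Match"],
    [["/path/to/file", "STRING=SOME_STRING, ANOTHER_STRING"]],
)
```

The second field contains a comma, and csv.writer quotes it so it still lands in a single cell. That is the main reason to prefer the module over writing comma-joined strings by hand.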

Along the way, you can lean on Python's built-in list objects as an easy way to move data sets around.

Here is a sample program that counts how often the strings given on the command line occur in each file in a given directory, and assembles a .csv table with the counts:

# -*- coding: utf-8 -*-
# Written for Python 3
import csv
import os
import sys

output_name = "count.csv"


def find_in_file(path, string_list):
    # Count how many times each string occurs in the file at path
    with open(path, errors="replace") as file_:
        data = file_.read()
    return [data.count(string) for string in string_list]


def main():
    if len(sys.argv) < 3:
        print("Use %s directory_path <string1>[ string2 [...]]" % sys.argv[0])
        sys.exit(1)
    target_dir = sys.argv[1]
    string_list = sys.argv[2:]
    # newline="" keeps the csv module from writing blank rows on Windows
    with open(output_name, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        # One column for the file name, then one column per search string
        writer.writerow(["Filename"] + string_list)
        for filename in os.listdir(target_dir):
            path = os.path.join(target_dir, filename)
            if not os.path.isfile(path):
                continue  # skip subdirectories
            writer.writerow([filename] + find_in_file(path, string_list))


if __name__ == "__main__":
    main()

The steps to do this are as follows:

Make a list of all files in the directory (This isn't necessary if you're only interested in a single file)
Extract the names of those files that you're interested in
In a loop, read in those files line by line
See if the line matches your pattern
Extract the part of the line before the first : character

So, the code would look something like this, provided your text files are formatted the way you've shown in the question and that this format is reliably correct:

import glob
import os
import sys

dir_path = sys.argv[1]

# Use standard *NIX wildcards to get the file names, in this case
# all the files in the directory with a .txt extension
file_list = glob.glob(os.path.join(dir_path, '*.txt'))

with open('out_file.csv', 'w') as out_file:
    for filename in file_list:
        with open(filename, 'r') as in_file:
            for line in in_file:
                if 'STRING=SOME_STRING' in line:
                    # Keep only the part before the first ':' (the path)
                    out_file.write(line.split(':', 1)[0] + '\n')

This program would be run as python extract_paths.py path/to/directory and would give you a file called out_file.csv in your current directory.

This file can then be imported into Excel as a CSV file. If your input is less reliable than you've suggested, regular expressions might be a better choice.
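If you do go the regular-expression route, a sketch could look like this. The pattern assumes lines shaped like path:STRING=value, optionally followed by a comma and further values; the names LINE_RE and path_if_match are invented here, and the pattern would need adjusting for other layouts.

```python
import re

# Assumed line shape: <path>:STRING=<value>[, <more values>]
LINE_RE = re.compile(r"^(?P<path>[^:]+):STRING=(?P<value>[^,\s]+)")


def path_if_match(line, wanted="SOME_STRING"):
    """Return the path if the line's STRING value equals wanted, else None."""
    match = LINE_RE.match(line)
    if match and match.group("value") == wanted:
        return match.group("path")
    return None


print(path_if_match("/path/to/file:STRING=SOME_STRING, ANOTHER_STRING"))
print(path_if_match("/path/to/file:STRING=SOME_STRING_2"))
```

Unlike the plain substring test, this compares the whole value, so SOME_STRING_2 is not mistaken for SOME_STRING, and lines that do not fit the expected shape are skipped instead of producing a bogus path.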
