Getting one line in a huge file with bash
How can I get a particular line in a 3 GB text file? All the lines have:
- the same length, and
- are delimited by \n.
And I need to be able to get any line on demand.
How can this be done? Only one line needs to be returned.
If all the lines have the same length, the best way by far will be to use dd(1)
and give it a skip parameter.
Let the block size be the length of each line (including the newline), then you can do:
$ dd if=filename bs=<line-length> skip=<line_no - 1> count=1 2>/dev/null
The idea is to seek past all the previous lines (skip=<line_no - 1>) and read a single line (count=1). Because the block size is set to the line length (bs=<line-length>), each block is effectively a single line. Redirect stderr so you don't get the annoying stats at the end.
That should be much more efficient than streaming every line before the one you want through a program that reads them all and throws them away, as dd will seek directly to the position you want in the file and read only one line of data.
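A quick way to sanity-check the approach (the file name and contents here are made up for the demo):

```shell
# Build a demo file of five fixed-length lines; "line-0001\n" is 10 bytes.
printf 'line-%04d\n' 1 2 3 4 5 > records.txt
# Fetch line 3: skip the first two 10-byte blocks, read exactly one block.
dd if=records.txt bs=10 skip=2 count=1 2>/dev/null
# -> line-0003
```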
head -10 file | tail -1
returns line 10, though it is probably slow.
# print line number 52
sed -n '52p' file    # method 1
sed '52!d' file      # method 2
sed '52q;d' file     # method 3, efficient on large files
An awk alternative, where 3 is the line number.
awk 'NR == 3 {print; exit}' file.txt
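Both one-liners are easy to verify on a throwaway file (seq, the file name, and the line number 52 are just for the demo):

```shell
seq 100 > nums.txt                      # 100 numbered lines
sed '52q;d' nums.txt                    # prints 52
awk 'NR == 52 {print; exit}' nums.txt   # prints 52 as well
```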
If it's not a fixed-record-length file and you don't do some sort of indexing on the line starts, your best bet is to just use:
head -n N filespec | tail -1
where N is the line number you want.
This isn't going to be the best-performing piece of code for a 3 GB file, unfortunately, but there are ways to make it better.
If the file doesn't change too often, you may want to consider indexing it. By that I mean having another file with the line offsets in it as fixed length records.
So the file:
0000000000
0000000017
0000000092
0000001023
would give you a fast way to locate each line. Just multiply the desired line number, minus one, by the index record size and seek to that position in the index file.
Then use the value at that location to seek in the main file so you can read until the next newline character.
So for line 3, you would seek to 22 in the index file (the index record length is 10 characters plus one more for the newline). Reading the value there, 0000000092, would give you the offset to use into the main file.
Of course, that's not so useful if the file changes frequently although, if you can control what happens when things get appended, you can still add offsets to the index efficiently. If you don't control that, you'll have to re-index whenever the last-modified date of the index is earlier than that of the main file.
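A minimal sketch of that indexing scheme in Python (the function names and the 10-digit record format are illustrative assumptions, not part of the answer above):

```python
# build_index: write each line's byte offset as a zero-padded 10-digit
# record plus a newline, so every index record is exactly 11 bytes.
def build_index(datafile, indexfile):
    with open(datafile, 'rb') as data, open(indexfile, 'w') as idx:
        offset = 0
        for line in data:
            idx.write('%010d\n' % offset)
            offset += len(line)

# getline_indexed: seek straight to the index record for line `lineno`,
# read the stored offset, then seek to that offset in the main file.
def getline_indexed(datafile, indexfile, lineno):
    with open(indexfile, 'r') as idx:
        idx.seek((lineno - 1) * 11)    # 10 digits + 1 newline per record
        offset = int(idx.read(10))
    with open(datafile, 'rb') as data:
        data.seek(offset)
        return data.readline().decode()
```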
And, based on your update:
Update: If it matters, all the lines have the same length.
With that extra piece of information, you don't need the index - you can just seek immediately to the right location in the main file by multiplying the record length by the record number, minus one (assuming the values fit into your data types).
So something like this Python:
def getline(fhandle, reclen, recnum):
    fhandle.seek(reclen * (recnum - 1))   # record numbers are 1-based
    return fhandle.read(reclen)
Use q with sed to make the search stop after the line has been printed.
sed -n '11723{p;q}' filename
Python (minimal error checking):
#!/usr/bin/env python3
import sys

# by Dennis Williamson - 2010-05-08
# for http://stackoverflow.com/questions/2794049/getting-one-line-in-a-huge-file-with-bash
# seeks the requested line in a file with a fixed line length
# Usage: ./lineseek.py LINE FILE
# Example: ./lineseek.py 11723 data.txt

EXIT_SUCCESS = 0
EXIT_NOT_FOUND = 1
EXIT_OPT_ERR = 2
EXIT_FILE_ERR = 3
EXIT_DATA_ERR = 4

# could use a try block here
seekline = int(sys.argv[1])
filename = sys.argv[2]

# open in binary mode so seek() to a computed offset is well defined
try:
    if filename == '-':
        handle = sys.stdin.buffer   # seeking only works if stdin is a real file
    else:
        handle = open(filename, 'rb')
except IOError:
    print("File Open Error", file=sys.stderr)
    sys.exit(EXIT_FILE_ERR)

# measure the length of the first line
try:
    line = handle.readline()
    lineend = handle.tell()
    linelen = len(line)
except IOError:
    print("File I/O Error", file=sys.stderr)
    sys.exit(EXIT_FILE_ERR)

# it would be really weird if this happened
if lineend != linelen:
    print("Line length inconsistent", file=sys.stderr)
    sys.exit(EXIT_DATA_ERR)

# seek directly to the requested line and read it
handle.seek(linelen * (seekline - 1))
try:
    line = handle.readline()
except IOError:
    print("File I/O Error", file=sys.stderr)
    sys.exit(EXIT_FILE_ERR)

if len(line) != linelen:
    print("Line length inconsistent", file=sys.stderr)
    sys.exit(EXIT_DATA_ERR)

print(line.decode(), end='')
sys.exit(EXIT_SUCCESS)
Argument validation should be a lot better and there is room for many other improvements.
A quick Perl one-liner would work well for this too (substitute the desired line number for YOURLINENUMBER):
$ perl -ne 'if (YOURLINENUMBER..YOURLINENUMBER) {print $_; last;}' /path/to/your/file