Find and replace text in a 47GB large file [closed]
Closed 2 years ago.
I have to do some find-and-replace tasks on a rather big file, about 47 GB in size.
Does anybody know how to do this? I tried tools like TextCrawler, EditPad Lite and more, but none of them supports a file this large.
I'm assuming this can be done via the command line.
Do you have an idea how this can be accomplished?
Sed (stream editor for filtering and transforming text) is your friend.
sed -i 's/old text/new text/g' file
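Sed reads the file line by line and performs the transformation in a single pass, so it never has to hold the whole file in memory. The -i flag edits the file in place; GNU sed accepts it as written, while the BSD sed shipped with macOS expects a backup suffix after it (for example -i '').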
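If sed is not available, the same single-pass idea can be sketched in Python. This is only a rough illustration, not what sed does internally: the function name replace_in_file and its parameters are made up for the example, it assumes the file is text in a known encoding (UTF-8 here) with lines of reasonable length, and it needs enough free disk space for a second copy of the file.

```python
import os
import shutil
import tempfile


def replace_in_file(path, old, new, encoding="utf-8"):
    """Replace every occurrence of `old` with `new` in `path`, line by line.

    Hypothetical helper for illustration; assumes a text file whose
    individual lines fit comfortably in memory.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives next to the original so that the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
                open(path, "r", encoding=encoding, newline="") as src:
            for line in src:  # only one line is held in memory at a time
                dst.write(line.replace(old, new))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap the rewritten file into place
    except BaseException:
        # Clean up the partial copy if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling replace_in_file("file", "old text", "new text") would then play roughly the same role as the sed -i command above, although it does a literal string replacement rather than a regular-expression one.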
I use FART - Find And Replace Text by Lionello Lunesu.
It works very well on Windows 7 x64.
You can find and replace the text using this command:
fart -c big_filename.txt "find_this_text" "replace_to_this"
github
On Unix or Mac:
sed 's/oldstring/newstring/g' oldfile.txt > newfile.txt
fast and easy...
I solved the problem by first using split to break the large file into smaller pieces of 100 MB each.
If you are using a Unix-like system, you can pipe cat into sed to do this:
cat hosted_domains.txt | sed 's/com/net/g'
This example replaces every occurrence of com with net in a list of domain names; you can then redirect the output to a file.
For me, none of the tools suggested here worked well. TextCrawler ate all my computer's memory, sed didn't work at all, EditPad complained about memory...
The solution is to write your own script in Python, Perl or even C++.
Or use the tool PowerGREP; this is the easiest and fastest option.
I haven't tried fart; it is command-line only and may not be very user-friendly.
Some editors with a hex mode, such as UltraEdit, also work well.
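As a rough idea of what such a home-made script could look like, here is a minimal Python sketch. It does not depend on line breaks: it reads the file in fixed-size binary chunks and carries the last few bytes over to the next chunk, so a match that straddles a chunk boundary is still found. The function name replace_in_chunks, its parameters and the 64 MB default are assumptions made for this example, and it only handles a literal byte string, not a regular expression.

```python
def replace_in_chunks(src_path, dst_path, old, new, chunk_size=64 * 1024 * 1024):
    """Copy src_path to dst_path, replacing the bytes `old` with `new`.

    Hypothetical helper for illustration. Memory use is bounded by roughly
    chunk_size, however long the lines in the file are.
    """
    if not old:
        raise ValueError("the search pattern must not be empty")
    keep = len(old) - 1  # longest partial match that can end a buffer
    tail = b""           # unprocessed bytes carried over from the last chunk
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            pos = 0
            while True:
                hit = buf.find(old, pos)
                if hit == -1:
                    break
                dst.write(buf[pos:hit])  # text before the match, unchanged
                dst.write(new)           # the replacement itself
                pos = hit + len(old)
            # Hold back the last `keep` bytes: they may be the start of a
            # match that is completed by the next chunk.
            split = max(len(buf) - keep, pos)
            dst.write(buf[pos:split])
            tail = buf[split:]
        dst.write(tail)  # too short to contain a match, so write it as-is
```

For example, replace_in_chunks("big_filename.txt", "big_filename.new.txt", b"find_this_text", b"replace_to_this") writes a modified copy and leaves the original untouched, so the disk needs room for both files.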
I used
sed 's/[nN]//g' oldfile.fasta > newfile.fasta
to remove every n and N from my 7 GB file.
When I omitted the > newfile.fasta redirection, it took ages, because sed printed every line of the file to the terminal as it went.
With the redirection in place, it ran in a matter of seconds on an Ubuntu server.