How to do sed-like text replacement with Python?
I would like to enable all apt repositories in this file
cat /etc/apt/sources.list
## Note, this file is written by cloud-init on first boot of an instance
## modifications made here will not survive a re-bundle.
## if you wish to make changes you can:
## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
## or do the same in user-data
## b.) add sources in /etc/apt/sources.list.d
#
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main
## Major bug fix updates produced after the final release of the
## distribution.
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
## Uncomment the following two lines to add software from the 'backports'
## repository.
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu maverick partner
# deb-src http://archive.canonical.com/ubuntu maverick partner
deb http://security.ubuntu.com/ubuntu maverick-security main
deb-src http://security.ubuntu.com/ubuntu maverick-security main
deb http://security.ubuntu.com/ubuntu maverick-security universe
deb-src http://security.ubuntu.com/ubuntu maverick-security universe
# deb http://security.ubuntu.com/ubuntu maverick-security multiverse
# deb-src http://security.ubuntu.com/ubuntu maverick-security multiverse
With sed this is a simple sed -i 's/^# deb/deb/' /etc/apt/sources.list
What's the most elegant ("pythonic") way to do this?
You can do that like this:
import re

# Read every line first, then rewrite the file with the substitution applied.
with open("/etc/apt/sources.list", "r") as sources:
    lines = sources.readlines()
with open("/etc/apt/sources.list", "w") as sources:
    for line in lines:
        sources.write(re.sub(r'^# deb', 'deb', line))
The with statement ensures that the file is closed correctly, and re-opening the file in "w" mode empties the file before you write to it. re.sub(pattern, replace, string) is the equivalent of s/pattern/replace/ in sed/perl.
Edit: fixed syntax in example
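One difference is worth keeping in mind: sed's s/pattern/replace/ changes only the first match on each line unless the g flag is given, while re.sub changes every match unless count is passed. A minimal sketch (the sample line and its URL are made up for illustration):

```python
import re

# Hypothetical line containing the text "# deb" twice.
line = "# deb # deb http://example.invalid/ubuntu maverick main\n"

# re.sub replaces every match by default, like sed's s/pattern/replace/g.
all_matches = re.sub(r"# deb", "deb", line)

# count=1 stops after the first match, like plain s/pattern/replace/.
first_only = re.sub(r"# deb", "deb", line, count=1)

# With a ^ anchor only one match per line is possible, so the
# distinction does not matter for the sources.list substitution.
anchored = re.sub(r"^# deb", "deb", line)

print(all_matches, first_only, anchored, sep="")
```

For the anchored pattern used in the question the two behave the same, so no count argument is needed there.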
Authoring a homegrown sed replacement in pure Python with no external commands or additional dependencies is a noble task laden with noble landmines. Who would have thought?
Nonetheless, it is feasible. It's also desirable. We've all been there, people: "I need to munge some plaintext files, but I only have Python, two plastic shoelaces, and a moldy can of bunker-grade Maraschino cherries. Help."
In this answer, we offer a best-of-breed solution cobbling together the awesomeness of prior answers without all of that unpleasant not-awesomeness. As plundra notes, David Miller's otherwise top-notch answer writes the desired file non-atomically and hence invites race conditions (e.g., from other threads and/or processes attempting to concurrently read that file). That's bad. Plundra's otherwise excellent answer solves that issue while introducing yet more – including numerous fatal encoding errors, a critical security vulnerability (failing to preserve the permissions and other metadata of the original file), and premature optimization replacing regular expressions with low-level character indexing. That's also bad.
Awesomeness, unite!
import re, shutil, tempfile

def sed_inplace(filename, pattern, repl):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e "s/${pattern}/${repl}/" "${filename}"`.
    '''
    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

# Do it for Johnny.
sed_inplace('/etc/apt/sources.list', r'^\# deb', 'deb')
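One caveat about atomicity: shutil.move falls back to copying when the temporary directory is on a different filesystem from the target, and a copy is not atomic. A possible variation, sketched here under the assumption of Python 3.3+ and a writable target directory, creates the temporary file next to the target and swaps it in with os.replace. The name sed_inplace_atomic is made up for this sketch and is not part of the answer above.

```python
import os
import re
import shutil
import tempfile


def sed_inplace_atomic(filename, pattern, repl):
    """Apply re.sub(pattern, repl) to each line of filename, then swap the
    result into place with a single rename."""
    pattern_compiled = re.compile(pattern)

    # Create the temporary file in the target's own directory so that both
    # paths are on the same filesystem; a rename is only atomic in that case.
    target_dir = os.path.dirname(os.path.abspath(filename))
    with tempfile.NamedTemporaryFile(mode="w", dir=target_dir,
                                     delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    try:
        # Carry over permissions and timestamps, then rename over the target.
        shutil.copystat(filename, tmp_file.name)
        os.replace(tmp_file.name, filename)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        os.unlink(tmp_file.name)
        raise
```

On POSIX systems readers then see either the old file or the new one, never a partially written file. On Windows os.replace can fail while another process holds the target open.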
massedit.py (http://github.com/elmotec/massedit) does the scaffolding for you leaving just the regex to write. It's still in beta but we are looking for feedback.
python -m massedit -e "re.sub(r'^# deb', 'deb', line)" /etc/apt/sources.list
will show the differences (before/after) in diff format.
Add the -w option to write the changes to the original file:
python -m massedit -e "re.sub(r'^# deb', 'deb', line)" -w /etc/apt/sources.list
Alternatively, you can now use the api:
>>> import massedit
>>> filenames = ['/etc/apt/sources.list']
>>> massedit.edit_files(filenames, ["re.sub(r'^# deb', 'deb', line)"], dry_run=True)
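If installing massedit is not an option, a similar before/after preview can be approximated with the standard library's difflib. This is only a sketch of the idea, not how massedit is implemented; the function name preview_substitution and the sample lines are assumptions made for illustration.

```python
import difflib
import re


def preview_substitution(lines, pattern, repl, name="sources.list"):
    """Return a unified diff of the substitution without touching any file."""
    new_lines = [re.sub(pattern, repl, line) for line in lines]
    return list(difflib.unified_diff(lines, new_lines,
                                     fromfile=name,
                                     tofile=name + " (edited)"))


# Hypothetical two-line excerpt of a sources.list file.
sample = [
    "## comment\n",
    "# deb http://archive.canonical.com/ubuntu maverick partner\n",
]
diff = preview_substitution(sample, r"^# deb", "deb")
print("".join(diff), end="")
```

An empty list means the substitution would not change anything.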
This is such a different approach that I don't want to edit my other answer. I use nested with statements since I don't use 3.1 (where with A() as a, B() as b: works).
Might be a bit overkill to change sources.list, but I want to put it out there for future searches.
#!/usr/bin/env python
from shutil import move
from tempfile import NamedTemporaryFile

# mode="w" is needed on Python 3, where the default mode is binary.
with NamedTemporaryFile(mode="w", delete=False) as tmp_sources:
    with open("sources.list") as sources_file:
        for line in sources_file:
            if line.startswith("# deb"):
                tmp_sources.write(line[2:])
            else:
                tmp_sources.write(line)

# Both files are closed here; replace the original with the edited copy.
move(tmp_sources.name, sources_file.name)
This should ensure no race conditions of other people reading the file. Oh, and I prefer str.startswith(...) when you can do without a regexp.
If you are using Python3 the following module will help you: https://github.com/mahmoudadel2/pysed
wget https://raw.githubusercontent.com/mahmoudadel2/pysed/master/pysed.py
Place the module file into your Python3 modules path, then:
import pysed
pysed.replace(<Old string>, <Replacement String>, <Text File>)
pysed.rmlinematch(<Unwanted string>, <Text File>)
pysed.rmlinenumber(<Unwanted Line Number>, <Text File>)
If I want something like sed, then I usually just call sed itself using the sh library.
from sh import sed
sed(['-i', 's/^# deb/deb/', '/etc/apt/sources.list'])
Sure, there are downsides. Like maybe the locally installed version of sed isn't the same as the one you tested with. In my cases, this kind of thing can be easily handled at another layer (like by examining the target environment beforehand, or deploying in a docker image with a known version of sed).
Try pysed:
pysed -r '# deb' 'deb' /etc/apt/sources.list
You could do something like:
import re

# re.MULTILINE makes ^ match at the start of every line of the text.
p = re.compile(r"^# *deb", re.MULTILINE)
with open("sources.list", "r") as f:
    text = f.read()
with open("sources.list", "w") as f:
    f.write(p.sub("deb", text))
Alternatively (imho, this is better from an organizational standpoint) you could split your sources.list into pieces (one entry/one repository) and place them under /etc/apt/sources.list.d/
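If you really want to use a sed command without installing a new Python module, you could simply do the following:
import subprocess
# Pass the command as a list of arguments; a single string would need shell=True.
subprocess.call(["sed", "-i", "s/^# deb/deb/", "/etc/apt/sources.list"])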
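When calling sed this way it may help to check that it is installed and to fail loudly on a non-zero exit status. The sketch below assumes Python 3.7+ and pipes text through sed instead of using -i, since the -i flag behaves differently between GNU and BSD sed. The function name sed_filter is made up for illustration.

```python
import shutil
import subprocess


def sed_filter(script, text):
    """Pipe text through sed with the given script and return its output."""
    if shutil.which("sed") is None:
        raise RuntimeError("sed is not installed or not on PATH")
    result = subprocess.run(
        ["sed", "-e", script],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Called as sed_filter("s/^# deb/deb/", text), this returns the edited text and leaves writing it back to the caller.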
Not sure about elegant, but this ought to be pretty readable at least. For a sources.list it's fine to read all the lines beforehand; for something larger you might want to change it "in place" while looping through it.
#!/usr/bin/env python
# Open file for reading and writing
with open("sources.list", "r+") as sources_file:
    # Read all the lines
    lines = sources_file.readlines()
    # Rewind and truncate
    sources_file.seek(0)
    sources_file.truncate()
    # Loop through the lines, adding them back to the file.
    for line in lines:
        if line.startswith("# deb"):
            sources_file.write(line[2:])
        else:
            sources_file.write(line)
EDIT: Use a with statement for better file handling. I also forgot to rewind before truncating.
Cecil Curry has a great answer; however, it does not work for multiline regular expressions, because it applies the pattern to one line at a time. Multiline regular expressions are used more rarely, but they are handy sometimes.
Here is an improvement upon his sed_inplace function that allows it to function with multiline regular expressions if asked to do so.
WARNING: In multiline mode, it will read the entire file in, and then perform the regular expression substitution, so you'll only want to use this mode on small-ish files - don't try to run this on gigabyte-sized files when running in multiline mode.
import re, shutil, tempfile

def sed_inplace(filename, pattern, repl, multiline=False):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e "s/${pattern}/${repl}/" "${filename}"`.
    '''
    re_flags = 0
    if multiline:
        re_flags = re.M

    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern, re_flags)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            if multiline:
                content = src_file.read()
                tmp_file.write(pattern_compiled.sub(repl, content))
            else:
                for line in src_file:
                    tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

from os.path import expanduser
sed_inplace('%s/.gitconfig' % expanduser("~"), r'^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*', r'\1John Doe\2jdoe@example.com', multiline=True)
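To see why the whole file has to be read in multiline mode, here is a small in-memory comparison. The .gitconfig content is made up for illustration, and the pattern is a shortened form of the one above.

```python
import re

# Hypothetical .gitconfig content.
text = "[user]\n\tname = Old Name\n\temail = old@example.com\n"
pattern = re.compile(r"^(\[user\]$\n[ \t]*name = ).*$", re.M)

# Line by line: no single line contains both "[user]" and "name = ",
# so the pattern never matches and the text comes back unchanged.
per_line = "".join(pattern.sub(r"\1John Doe", line)
                   for line in text.splitlines(keepends=True))

# Whole text at once: the pattern can span the newline between the lines.
whole = pattern.sub(r"\1John Doe", text)

print(per_line == text)  # True
print(whole)
```

Only the whole-text substitution rewrites the name line, which is the behaviour the multiline flag switches on.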
Here's a one-module Python replacement for perl -p:
# Provide compatibility with `perl -p`
# Usage:
#
# python -mloop_over_stdin_lines '<program>'
# In, `<program>`, use the variable `line` to read and change the current line.
# Example:
#
# python -mloop_over_stdin_lines 'line = re.sub("pattern", "replacement", line)'
# From the perlrun documentation:
#
# -p causes Perl to assume the following loop around your
# program, which makes it iterate over filename arguments
# somewhat like sed:
#
# LINE:
# while (<>) {
# ... # your program goes here
# } continue {
# print or die "-p destination: $!\n";
# }
#
# If a file named by an argument cannot be opened for some
# reason, Perl warns you about it, and moves on to the next
# file. Note that the lines are printed automatically. An
# error occurring during printing is treated as fatal. To
# suppress printing use the -n switch. A -p overrides a -n
# switch.
#
# "BEGIN" and "END" blocks may be used to capture control
# before or after the implicit loop, just as in awk.
#
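import re
import sys

for line in sys.stdin:
    # The program may read and rebind `line`; at module level locals() is
    # globals(), so the new value is visible below.
    exec(sys.argv[1], globals(), locals())
    try:
        sys.stdout.write(line)
    except IOError as err:
        sys.exit('-p destination: %s' % err)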
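The same loop can be sketched as a function that is easier to test, using an explicit namespace dictionary instead of globals(). The name apply_program is made up for this sketch. As with the module above, exec runs arbitrary code, so the program string must come from a trusted source.

```python
import re


def apply_program(program, lines):
    """Run program once per line; the program reads and rebinds `line`."""
    code = compile(program, "<program>", "exec")
    # One namespace shared across lines, so the program can keep state
    # (counters, flags) between iterations.
    namespace = {"re": re}
    output = []
    for line in lines:
        namespace["line"] = line
        exec(code, namespace)
        output.append(namespace["line"])
    return output


edited = apply_program('line = re.sub("^# deb", "deb", line)',
                       ["# deb a\n", "## comment\n"])
print("".join(edited), end="")
```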
I wanted to be able to find and replace text but also include matched groups in the content I insert. I wrote this short script to do that:
https://gist.github.com/turtlemonvh/0743a1c63d1d27df3f17
The key component of that is something that looks like this:
print(re.sub(pattern, template, text).rstrip("\n"))
Here's an example of how that works:
# Find everything that looks like 'dog' or 'cat' followed by a space and a number
pattern = r"((cat|dog) (\d+))"
# Replace with 'turtle' and the number. '3' because the number is the 3rd matched group.
# The double '\' is needed because '\' must be escaped in a non-raw Python string (r"turtle \3" is equivalent)
template = "turtle \\3"
# The text to operate on
text = "cat 976 is my favorite"
Calling the above function with this yields:
turtle 976 is my favorite
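Put together as a runnable snippet, with raw strings so the backslashes need no doubling. The \g<3> form at the end is an addition that is not in the gist as quoted; it is the way re templates separate a group reference from a digit that follows it.

```python
import re

pattern = r"((cat|dog) (\d+))"
template = r"turtle \3"
text = "cat 976 is my favorite"

result = re.sub(pattern, template, text)
print(result)  # turtle 976 is my favorite

# \g<3> marks where the group number ends; r"turtle \30" would be read
# as a reference to group 30 instead.
with_zero = re.sub(pattern, r"turtle \g<3>0", text)
print(with_zero)  # turtle 9760 is my favorite
```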
[None of the answers above works properly for my case!]
I have a case of multiple key-value replacements in one file of around 1000 lines. After replacement, the file structure should stay the same. For example:
key1=value_tobe_replaced1
key2=value_tobe_replaced1
. .
. .
key1000=value_tobe_replaced1000
I've tried:
the voted answer from @elmotec for massedit.
answer from @Cecil Curry.
answer from @Keithel.
The three answers definitely helped me a lot, but after testing I found that the 1st and 2nd take nearly 40-50 seconds. The 3rd is not suitable for multiple replacements, so I fixed it.
Notice: refer to those answers before going on.
Here's my code:
Line replacement mode:
start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        for line in kf:
            line_to_write = ''
            match_flag = False
            for (key, value) in tuple_list:
                # print ' %s = %r' % (key, value)
                if not re.search(patten, line, flags=re.I):
                    continue
                line_to_write = re.sub(r'\$\({}\)'.format(key), value, line, flags=re.I)
                match_flag = True
            if not match_flag:
                line_to_write = line
            tmp_file.write(line_to_write)
shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)
time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)
time costs: 0:00:42.533879
File replacement mode:
start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        text = kf.read()
        for (key, value) in tuple_list:
            text = re.sub(patten, value, text, flags=re.M|re.I)
        tmp_file.write(text)
shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)
time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)
time costs: 0:00:00.348458
So if your case matches mine and your file is not too large, I suggest using file replacement mode.
How to do the replacement if the file is huge? I have no idea.
Hope this helps.
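For larger files, one possible direction is to compile a single regular expression covering every key and look each match up in a dictionary. Each line is then scanned once instead of once per key, and the file can be streamed line by line. This is a sketch, not a measured solution: it assumes the placeholders have the form $(key), as the regex in the line replacement code suggests, and that the list of pairs is not empty. The names make_replacer and replace_stream are made up.

```python
import io
import re


def make_replacer(pairs):
    """Build a function that replaces every $(key) placeholder in one pass.

    Assumes `pairs` is a non-empty list of (key, value) tuples.
    """
    lookup = {key.lower(): value for key, value in pairs}
    alternatives = "|".join(re.escape(key) for key in lookup)
    combined = re.compile(r"\$\((" + alternatives + r")\)", re.I)
    # A function as the replacement argument is called once per match and
    # its return value is inserted literally.
    return lambda line: combined.sub(
        lambda match: lookup[match.group(1).lower()], line)


def replace_stream(src, dst, replacer):
    """Copy src to dst line by line, applying replacer to each line."""
    for line in src:
        dst.write(replacer(line))


# Hypothetical keys and values, using in-memory files for the demonstration.
replacer = make_replacer([("host", "example.org"), ("port", "8080")])
src = io.StringIO("url=$(HOST):$(port)\nname=unchanged\n")
dst = io.StringIO()
replace_stream(src, dst, replacer)
print(dst.getvalue(), end="")
```

With real files, src and dst would be the opened source file and the temporary file from the code above.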