How do I compare RPM versions in Python?
I'm trying to find out how I can compare two lists of RPMs (currently installed, and available in the local repository) and see which RPMs are out of date. I've been tinkering with regexes, but there are so many different naming standards for RPMs that I can't get a good list to work with. I don't have the actual RPMs on my drive, so I can't do rpm -qif.
pattern1 = re.compile(r'^([a-zA-Z0-9_\-\+]*)-([a-zA-Z0-9_\.]*)-([a-zA-Z0-9_\.]*)\.(.*)')
for rpm in listOfRpms:
    packageInfo = pattern1.search(rpm[0]).groups()
    print packageInfo
This works for a vast majority but not all (2300 / 2400)
yum-metadata-parser-1.1.2-2.el5
('yum-metadata-parser', '1.1.2', '2', 'el5')    (this is what I need)
But none of these work, for instance, unless I break some others that worked before:
- wvdial-1.54.0-3
- xdelta-1.1.3-20
- xdelta-1.1.3-20_2
- xmlsec1-1.2.6-3
- xmlsec1-1.2.6-3_2
- ypbind-1.17.2-13
- ypbind-1.17.2-8
- ypserv-2.13-14
- zip-2.3-27
- zlib-1.2.3-3
- zlib-1.2.3-3_2
- zsh-4.2.6-1
In RPM parlance, 2.el5 is the release field; 2 and el5 are not separate fields. However, release need not have a . in it, as your examples show. Drop the \.(.*) from the end to capture the release field in one shot.
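As a minimal sketch of that change, here is the question's pattern with the trailing \.(.*) removed. The closing $ anchor and the name nvr_pattern are my own additions for illustration, not part of the original pattern.

```python
import re

# The question's pattern without the trailing \.(.*), so the whole release
# ends up in the third group. The final $ is an added assumption that the
# string holds nothing after the release.
nvr_pattern = re.compile(r'^([a-zA-Z0-9_\-\+]*)-([a-zA-Z0-9_\.]*)-([a-zA-Z0-9_\.]*)$')

for label in ("yum-metadata-parser-1.1.2-2.el5", "xdelta-1.1.3-20_2", "zsh-4.2.6-1"):
    name, version, release = nvr_pattern.search(label).groups()
    print(name, version, release)
```

With this, both 2.el5 and 20_2 come back as a single release value.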
So now you have a package name, version, and release. The easiest way to compare them is to use rpm's python module:
import rpm

# t1 and t2 are tuples of (version, release)
def compare(t1, t2):
    v1, r1 = t1
    v2, r2 = t2
    return rpm.labelCompare(('1', v1, r1), ('1', v2, r2))
What's that extra '1', you ask? That's epoch, and it overrides other version comparison considerations. Further, it's generally not available in the filename. Here, we're faking it to '1' for purposes of this exercise, but that may not be accurate at all. This is one of two reasons your logic is going to be off if you're going by file names alone.
The other reason that your logic may be different from rpm's is the Obsoletes field, which allows a package to be upgraded to a package with an entirely different name. If you're OK with these limitations, then proceed.
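To tie this back to the two lists in the question, here is a rough sketch of how a comparison function with the same return convention as compare could be applied to them. The names find_outdated, installed, available, and naive_compare are hypothetical, and naive_compare is only a stand-in so the example runs without the rpm module; it does not follow rpm's rules.

```python
def find_outdated(installed, available, compare):
    """Return names whose available (version, release) is newer than the installed one.

    installed, available: dicts mapping package name -> (version, release).
    compare: callable returning a negative, zero, or positive number.
    """
    outdated = []
    for name, current in sorted(installed.items()):
        candidate = available.get(name)
        if candidate is not None and compare(current, candidate) < 0:
            outdated.append(name)
    return outdated


def naive_compare(t1, t2):
    # Stand-in for illustration only: plain tuple comparison, NOT rpm's algorithm.
    return (t1 > t2) - (t1 < t2)


installed = {"zlib": ("1.2.3", "3"), "zip": ("2.3", "27")}
available = {"zlib": ("1.2.3", "4"), "zip": ("2.3", "27")}
print(find_outdated(installed, available, naive_compare))
```

In real use you would pass the rpm-backed compare instead of the stand-in.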
If you don't have the rpm python library at hand, here's the logic for comparing each of release, version, and epoch as of rpm 4.4.2.3:
- Search each string for alphabetic fields [a-zA-Z]+ and numeric fields [0-9]+ separated by junk [^a-zA-Z0-9]*.
- Successive fields in each string are compared to each other.
- Alphabetic sections are compared lexicographically, and the numeric sections are compared numerically.
- In the case of a mismatch where one field is numeric and one is alphabetic, the numeric field is always considered greater (newer).
- In the case where one string runs out of fields, the other is always considered greater (newer).
See lib/rpmvercmp.c in the RPM source for the gory details.
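To make the first rule concrete, here is a small sketch of just the field-splitting step. It only shows how a string breaks into fields; it is not a full comparison, and split_fields is a made-up helper name.

```python
import re

# Alphabetic runs or numeric runs; anything else is separator junk and is skipped.
_FIELD = re.compile(r'[a-zA-Z]+|[0-9]+')


def split_fields(s):
    """Split a version or release string into the fields that get compared."""
    return [int(f) if f.isdigit() else f for f in _FIELD.findall(s)]


print(split_fields("2.el5_9.1"))   # [2, 'el', 5, 9, 1]
print(split_fields("1.17.2"))      # [1, 17, 2]
```

Converting the numeric fields to integers is what makes 1.10 sort after 1.9, since 10 > 9 numerically even though "10" < "9" as strings.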
Here's a working program based off of rpmdev-vercmp from the rpmdevtools package. You shouldn't need anything special installed but yum (which provides the rpmUtils.miscutils python module) for it to work.
The advantage over the other answers is that you don't need to parse anything out; just feed it full RPM name-version strings like:
$ ./rpmcmp.py bash-3.2-32.el5_9.1 bash-3.2-33.el5.1
0:bash-3.2-33.el5.1 is newer
$ echo $?
12
Exit status 11 means the first one is newer, 12 means the second one is newer.
#!/usr/bin/python
import rpm
import sys
from rpmUtils.miscutils import stringToVersion

if len(sys.argv) != 3:
    print "Usage: %s <rpm1> <rpm2>" % sys.argv[0]
    sys.exit(1)

def vercmp((e1, v1, r1), (e2, v2, r2)):
    return rpm.labelCompare((e1, v1, r1), (e2, v2, r2))

# Split each argument into (epoch, version, release)
(e1, v1, r1) = stringToVersion(sys.argv[1])
(e2, v2, r2) = stringToVersion(sys.argv[2])

rc = vercmp((e1, v1, r1), (e2, v2, r2))
if rc > 0:
    print "%s:%s-%s is newer" % (e1, v1, r1)
    sys.exit(11)
elif rc == 0:
    print "These are equal"
    sys.exit(0)
elif rc < 0:
    print "%s:%s-%s is newer" % (e2, v2, r2)
    sys.exit(12)
Since the Python rpm package seems quite outdated and is not available in pip, I wrote a small implementation that works for most package versions, including the magic around ~ signs. This won't cover 100% of the real implementation, but it does the trick for most packages:
import re

def rpm_sort(elements):
    """ sort list elements using 'natural sorting': 1.10 > 1.9 etc...
        taking into account special characters for rpm (~) """

    alphabet = "~0123456789abcdefghijklmnopqrstuvwxyz-."

    def convert(text):
        return [int(text)] if text.isdigit() else ([alphabet.index(letter) for letter in text.lower()] if text else [1])

    def alphanum_key(key):
        return [convert(c) for c in re.split('([0-9]+)', key)]

    return sorted(elements, key=alphanum_key)
Tested with:
rpms = ['my-package-0.2.1-0.dev.20180810',
        'my-package-0.2.2-0~.dev.20181011',
        'my-package-0.2.2-0~.dev.20181012',
        'my-package-0.2.2-0',
        'my-package-0.2.2-0.dev.20181217']
self.assertEqual(rpms, rpm_sort(rpms))
Not covered
For the moment there is only one case I know of that is not covered, but others might pop up: this sort gives word~ > word, while according to the rpm specification the inverse should be true (this applies to any word ending with letters and then a final ~).
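RPM has Python bindings, which let you use rpmUtils.miscutils.compareEVR. Each argument is an (epoch, version, release) tuple: the middle element is the version and the third is the packaging release. Note that the first element is the epoch rather than the package name; the example below passes the same placeholder "foo" on both sides, so it does not affect the result. In the example below, I'm trying to figure out where 3.7.4a gets sorted.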
[root@rhel56 ~]# python
Python 2.4.3 (#1, Dec 10 2010, 17:24:35)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import rpmUtils.miscutils
>>> rpmUtils.miscutils.compareEVR(("foo", "3.7.4", "1"), ("foo", "3.7.4", "1"))
0
>>> rpmUtils.miscutils.compareEVR(("foo", "3.7.4", "1"), ("foo", "3.7.4a", "1"))
-1
>>> rpmUtils.miscutils.compareEVR(("foo", "3.7.4a", "1"), ("foo", "3.7.4", "1"))
1
Based on Owen S's excellent answer, I put together a snippet that uses the system RPM bindings if available, but falls back to a regex based emulation otherwise:
import re

try:
    from rpm import labelCompare as _compare_rpm_labels
except ImportError:
    # Emulate RPM field comparisons
    #
    # * Search each string for alphabetic fields [a-zA-Z]+ and
    #   numeric fields [0-9]+ separated by junk [^a-zA-Z0-9]*.
    # * Successive fields in each string are compared to each other.
    # * Alphabetic sections are compared lexicographically, and the
    #   numeric sections are compared numerically.
    # * In the case of a mismatch where one field is numeric and one is
    #   alphabetic, the numeric field is always considered greater (newer).
    # * In the case where one string runs out of fields, the other is always
    #   considered greater (newer).

    import warnings
    warnings.warn("Failed to import 'rpm', emulating RPM label comparisons")

    try:
        from itertools import zip_longest
    except ImportError:
        from itertools import izip_longest as zip_longest

    _subfield_pattern = re.compile(
        r'(?P<junk>[^a-zA-Z0-9]*)((?P<text>[a-zA-Z]+)|(?P<num>[0-9]+))'
    )

    def _iter_rpm_subfields(field):
        """Yield subfields as 2-tuples that sort in the desired order

        Text subfields are yielded as (0, text_value)
        Numeric subfields are yielded as (1, int_value)
        """
        for subfield in _subfield_pattern.finditer(field):
            text = subfield.group('text')
            if text is not None:
                yield (0, text)
            else:
                yield (1, int(subfield.group('num')))

    def _compare_rpm_field(lhs, rhs):
        # Short circuit for exact matches (including both being None)
        if lhs == rhs:
            return 0
        # Otherwise assume both inputs are strings
        lhs_subfields = _iter_rpm_subfields(lhs)
        rhs_subfields = _iter_rpm_subfields(rhs)
        for lhs_sf, rhs_sf in zip_longest(lhs_subfields, rhs_subfields):
            if lhs_sf == rhs_sf:
                # When both subfields are the same, move to next subfield
                continue
            if lhs_sf is None:
                # Fewer subfields in LHS, so it's less than/older than RHS
                return -1
            if rhs_sf is None:
                # More subfields in LHS, so it's greater than/newer than RHS
                return 1
            # Found a differing subfield, so it determines the relative order
            return -1 if lhs_sf < rhs_sf else 1
        # No relevant differences found between LHS and RHS
        return 0

    def _compare_rpm_labels(lhs, rhs):
        lhs_epoch, lhs_version, lhs_release = lhs
        rhs_epoch, rhs_version, rhs_release = rhs
        result = _compare_rpm_field(lhs_epoch, rhs_epoch)
        if result:
            return result
        result = _compare_rpm_field(lhs_version, rhs_version)
        if result:
            return result
        return _compare_rpm_field(lhs_release, rhs_release)
Note that I haven't tested this extensively for consistency with the C level implementation - I only use it as a fallback implementation that's at least good enough to let Anitya's test suite pass in environments where system RPM bindings aren't available.
A much simpler regex is /^(.+)-(.+)-(.+)\.(.+)\.rpm$/
I'm not aware of any restrictions on the package name (first capture). The only restriction on version and release is that they do not contain '-'. There is no need to code this explicitly: the uncaptured '-'s separate those fields, so a '-' inside one of them would split it into two fields, and the resulting captures therefore never contain a '-'. Only the first capture, the name, can contain a '-', because it greedily consumes all the extra '-'s first.
Then there's the architecture: this regex assumes no restriction on the architecture name, except that it does not contain a '.'.
The capture results are [name, version, release, arch]
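A quick sketch of that regex in Python, for illustration. The file name below is a made-up example with an x86_64 architecture suffix, and rpm_file_pattern is just a name I picked.

```python
import re

# Same pattern as above, written as a Python raw string.
rpm_file_pattern = re.compile(r'^(.+)-(.+)-(.+)\.(.+)\.rpm$')

# Hypothetical file name used only to show the four captures.
filename = "yum-metadata-parser-1.1.2-2.el5.x86_64.rpm"
name, version, release, arch = rpm_file_pattern.match(filename).groups()
print(name, version, release, arch)
```

Because the first group is greedy, the extra dashes in yum-metadata-parser stay in the name, and the dot inside 2.el5 stays in the release.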
Caveats from Owen's answer about relying on the rpm name alone still apply.
Now you have to compare the version strings, which is not straightforward. I don't believe that can be done with a regex. You'd need to implement the comparison algorithm.