How do I compare Rpm versions in python

2023-01-06 16:54

I'm trying to find out how I can compare two lists of RPMs (currently installed, and available in the local repository) and see which RPMs are out of date. I've been tinkering with regex, but there are so many different naming standards for RPMs that I can't get a good list to work with. I don't have the actual RPMs on my drive, so I can't do rpm -qif.

pattern1 = re.compile(r'^([a-zA-Z0-9_\-\+]*)-([a-zA-Z0-9_\.]*)-([a-zA-Z0-9_\.]*)\.(.*)')
for rpm in listOfRpms:
     packageInfo = pattern1.search(rpm[0]).groups()
     print packageInfo

This works for the vast majority, but not all (about 2300 of 2400):

  yum-metadata-parser-1.1.2-2.el5
('yum-metadata-parser', '1.1.2', '2', 'el5') **What I need

But none of these work, for instance, unless I break some others that worked before:

wvdial-1.54.0-3
xdelta-1.1.3-20
xdelta-1.1.3-20_2
xmlsec1-1.2.6-3
xmlsec1-1.2.6-3_2
ypbind-1.17.2-13
ypbind-1.17.2-8
ypserv-2.13-14
zip-2.3-27
zlib-1.2.3-3
zlib-1.2.3-3_2
zsh-4.2.6-1

In RPM parlance, 2.el5 is the release field; 2 and el5 are not separate fields. However, release need not have a . in it as your examples show. Drop the \.(.*) from the end to capture the release field in one shot.
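As a rough sketch of that change, here is the question's pattern with the trailing \.(.*) dropped. The $ anchor and the split_nvr helper name are my own additions for illustration, not part of the original pattern.

```python
import re

# The question's pattern minus the trailing \.(.*); the release is now
# captured whole. The '$' anchor is an added assumption so that the last
# group must run to the end of the string.
NVR_PATTERN = re.compile(
    r'^([a-zA-Z0-9_\-\+]*)-([a-zA-Z0-9_\.]*)-([a-zA-Z0-9_\.]*)$'
)

def split_nvr(label):
    """Split 'name-version-release' into a 3-tuple, or return None."""
    match = NVR_PATTERN.match(label)
    return match.groups() if match else None
```

With this, both yum-metadata-parser-1.1.2-2.el5 and the dot-less xdelta-1.1.3-20_2 should split into three fields. Package names containing characters outside the first character class (a '.', for example) would still need the class widened.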

So now you have a package name, version, and release. The easiest way to compare them is to use rpm's python module:

import rpm
# t1 and t2 are tuples of (version, release)
def compare(t1, t2):
    v1, r1 = t1
    v2, r2 = t2
    return rpm.labelCompare(('1', v1, r1), ('1', v2, r2))

What's that extra '1', you ask? That's epoch, and it overrides other version comparison considerations. Further, it's generally not available in the filename. Here, we're faking it to '1' for purposes of this exercise, but that may not be accurate at all. This is one of two reasons your logic is going to be off if you're going by file names alone.

The other reason that your logic may be different from rpm's is the Obsoletes field, which allows a package to be upgraded to a package with an entirely different name. If you're OK with these limitations, then proceed.

If you don't have the rpm python library at hand, here's the logic for comparing each of release, version, and epoch as of rpm 4.4.2.3:

Search each string for alphabetic fields [a-zA-Z]+ and numeric fields [0-9]+ separated by junk [^a-zA-Z0-9]*.
Successive fields in each string are compared to each other.
Alphabetic sections are compared lexicographically, and the numeric sections are compared numerically.
In the case of a mismatch where one field is numeric and one is alphabetic, the numeric field is always considered greater (newer).
In the case where one string runs out of fields, the other is always considered greater (newer).

See lib/rpmvercmp.c in the RPM source for the gory details.
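The five rules above can be sketched in a few lines of Python. This is a minimal illustration of only those rules, not a port of rpmvercmp.c; the vercmp name is hypothetical, and anything rpm does beyond the listed rules (such as the ~ handling mentioned elsewhere in this thread) is not covered.

```python
import re

# Alphabetic or numeric fields; everything else is junk and is skipped.
_FIELD = re.compile(r'[a-zA-Z]+|[0-9]+')

def vercmp(a, b):
    """Return 1 if a is newer, -1 if b is newer, 0 if they compare equal."""
    fields_a = _FIELD.findall(a)
    fields_b = _FIELD.findall(b)
    for fa, fb in zip(fields_a, fields_b):
        a_num, b_num = fa.isdigit(), fb.isdigit()
        if a_num != b_num:
            # Numeric vs. alphabetic: the numeric field wins.
            return 1 if a_num else -1
        if a_num:
            # Numeric fields compare as numbers, so 10 > 9.
            fa, fb = int(fa), int(fb)
        if fa != fb:
            return 1 if fa > fb else -1
    # All shared fields are equal: the string with fields left over wins.
    return (len(fields_a) > len(fields_b)) - (len(fields_a) < len(fields_b))
```

For example, vercmp('1.10', '1.9') gives 1 because the fields compare numerically, and vercmp('20', '20_2') gives -1 because the first string runs out of fields.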

Here's a working program based off of rpmdev-vercmp from the rpmdevtools package. You shouldn't need anything special installed but yum (which provides the rpmUtils.miscutils python module) for it to work.

The advantage over the other answers is you don't need to parse anything out, just feed it full RPM name-version strings like:

$ ./rpmcmp.py bash-3.2-32.el5_9.1 bash-3.2-33.el5.1
0:bash-3.2-33.el5.1 is newer
$ echo $?
12

Exit status 11 means the first one is newer, 12 means the second one is newer.

#!/usr/bin/python

import rpm
import sys
from rpmUtils.miscutils import stringToVersion

if len(sys.argv) != 3:
    print "Usage: %s <rpm1> <rpm2>" % sys.argv[0]
    sys.exit(1)

def vercmp((e1, v1, r1), (e2, v2, r2)):
    return rpm.labelCompare((e1, v1, r1), (e2, v2, r2))

(e1, v1, r1) = stringToVersion(sys.argv[1])
(e2, v2, r2) = stringToVersion(sys.argv[2])

rc = vercmp((e1, v1, r1), (e2, v2, r2))
if rc > 0:
    print "%s:%s-%s is newer" % (e1, v1, r1)
    sys.exit(11)

elif rc == 0:
    print "These are equal"
    sys.exit(0)

elif rc < 0:
    print "%s:%s-%s is newer" % (e2, v2, r2)
    sys.exit(12)

Since the Python rpm package seems quite outdated and isn't available from pip, I wrote a small implementation that works for most package versions, including the magic around ~ signs. It won't cover 100% of the real implementation, but it does the trick for most packages:

import re

def rpm_sort(elements):
    """ sort list elements using 'natural sorting': 1.10 > 1.9 etc...
        taking into account special characters for rpm (~) """

    alphabet = "~0123456789abcdefghijklmnopqrstuvwxyz-."

    def convert(text):
        return [int(text)] if text.isdigit() else ([alphabet.index(letter) for letter in text.lower()] if text else [1])

    def alphanum_key(key):
        return [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(elements, key=alphanum_key)

tested:

rpms = ['my-package-0.2.1-0.dev.20180810',
        'my-package-0.2.2-0~.dev.20181011',
        'my-package-0.2.2-0~.dev.20181012',
        'my-package-0.2.2-0',
        'my-package-0.2.2-0.dev.20181217']
self.assertEqual(rpms, rpm_sort(rpms))

Not covered

For the moment I know of only one case that is not covered, though others might pop up: this sort gives word~ > word, while according to the rpm specification the inverse should be true (this affects any word ending with letters followed by a final ~).

RPM has Python bindings, and yum's rpmUtils.miscutils module builds on them to provide compareEVR. It takes two (epoch, version, release) tuples: the middle element is the version and the third is the release (the packaging version). In the example below, "foo" sits in the first slot on both sides, so it does not affect the result. I'm trying to figure out where 3.7.4a gets sorted.

[root@rhel56 ~]# python
Python 2.4.3 (#1, Dec 10 2010, 17:24:35) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import rpmUtils.miscutils
>>> rpmUtils.miscutils.compareEVR(("foo", "3.7.4", "1"), ("foo", "3.7.4", "1"))
0
>>> rpmUtils.miscutils.compareEVR(("foo", "3.7.4", "1"), ("foo", "3.7.4a", "1")) 
-1
>>> rpmUtils.miscutils.compareEVR(("foo", "3.7.4a", "1"), ("foo", "3.7.4", "1")) 
1

Based on Owen S's excellent answer, I put together a snippet that uses the system RPM bindings if available, but falls back to a regex based emulation otherwise:

import re

try:
    from rpm import labelCompare as _compare_rpm_labels
except ImportError:
    # Emulate RPM field comparisons
    #
    # * Search each string for alphabetic fields [a-zA-Z]+ and
    #   numeric fields [0-9]+ separated by junk [^a-zA-Z0-9]*.
    # * Successive fields in each string are compared to each other.
    # * Alphabetic sections are compared lexicographically, and the
    #   numeric sections are compared numerically.
    # * In the case of a mismatch where one field is numeric and one is
    #   alphabetic, the numeric field is always considered greater (newer).
    # * In the case where one string runs out of fields, the other is always
    #   considered greater (newer).

    import warnings
    warnings.warn("Failed to import 'rpm', emulating RPM label comparisons")

    try:
        from itertools import zip_longest
    except ImportError:
        from itertools import izip_longest as zip_longest

    _subfield_pattern = re.compile(
        r'(?P<junk>[^a-zA-Z0-9]*)((?P<text>[a-zA-Z]+)|(?P<num>[0-9]+))'
    )

    def _iter_rpm_subfields(field):
        """Yield subfields as 2-tuples that sort in the desired order

        Text subfields are yielded as (0, text_value)
        Numeric subfields are yielded as (1, int_value)
        """
        for subfield in _subfield_pattern.finditer(field):
            text = subfield.group('text')
            if text is not None:
                yield (0, text)
            else:
                yield (1, int(subfield.group('num')))

    def _compare_rpm_field(lhs, rhs):
        # Short circuit for exact matches (including both being None)
        if lhs == rhs:
            return 0
        # Otherwise assume both inputs are strings
        lhs_subfields = _iter_rpm_subfields(lhs)
        rhs_subfields = _iter_rpm_subfields(rhs)
        for lhs_sf, rhs_sf in zip_longest(lhs_subfields, rhs_subfields):
            if lhs_sf == rhs_sf:
                # When both subfields are the same, move to next subfield
                continue
            if lhs_sf is None:
                # Fewer subfields in LHS, so it's less than/older than RHS
                return -1
            if rhs_sf is None:
                # More subfields in LHS, so it's greater than/newer than RHS
                return 1
            # Found a differing subfield, so it determines the relative order
            return -1 if lhs_sf < rhs_sf else 1
        # No relevant differences found between LHS and RHS
        return 0


    def _compare_rpm_labels(lhs, rhs):
        lhs_epoch, lhs_version, lhs_release = lhs
        rhs_epoch, rhs_version, rhs_release = rhs
        result = _compare_rpm_field(lhs_epoch, rhs_epoch)
        if result:
            return result
        result = _compare_rpm_field(lhs_version, rhs_version)
        if result:
            return result
        return _compare_rpm_field(lhs_release, rhs_release)

Note that I haven't tested this extensively for consistency with the C level implementation - I only use it as a fallback implementation that's at least good enough to let Anitya's test suite pass in environments where system RPM bindings aren't available.

A much simpler regex is /^(.+)-(.+)-(.+)\.(.+)\.rpm$/

I'm not aware of any restrictions on the package name (first capture). The only restriction on version and release is that they do not contain '-'. There is no need to code for this: the uncaptured '-'s separate those fields, so a '-' inside one would split it into two fields, and the resulting captures can never contain a '-'. Only the first capture, the name, can contain '-', because the greedy match consumes all the extra '-'s first.

Then there's the architecture: this regex assumes nothing about the architecture name except that it does not contain a '.'.

The capture results are [name, version, release, arch]
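Here is how that regex might look in Python, as a small sketch. The split_filename helper name and the ValueError on a non-match are my own choices for illustration.

```python
import re

# Same pattern as above, without the surrounding slashes.
RPM_FILENAME = re.compile(r'^(.+)-(.+)-(.+)\.(.+)\.rpm$')

def split_filename(filename):
    """Return [name, version, release, arch] for an RPM file name."""
    match = RPM_FILENAME.match(filename)
    if match is None:
        raise ValueError("not an RPM file name: %r" % (filename,))
    return list(match.groups())
```

Because each group is greedy, the name absorbs every '-' except the last two, and the release absorbs every '.' except the one in front of the architecture. So bash-3.2-32.el5_9.1.x86_64.rpm should split into bash, 3.2, 32.el5_9.1 and x86_64.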

Caveats from Owen's answer about relying on the rpm name alone still apply.

Now you have to compare the version strings, which is not straightforward. I don't believe that can be done with a regex. You'd need to implement the comparison algorithm.
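Once names are parsed and a comparison function is chosen, the original question (which installed packages are out of date?) comes down to a loop over the two lists. The sketch below is only one way to arrange it: find_outdated and its dict-based inputs are hypothetical, and the compare argument is assumed to behave like rpm.labelCompare from the answer above (positive when the first label is newer).

```python
def find_outdated(installed, available, compare):
    """List packages whose available label is newer than the installed one.

    installed, available: dicts mapping name -> (epoch, version, release)
    compare: labelCompare-style callable returning >0, 0 or <0
    """
    outdated = []
    for name, have in sorted(installed.items()):
        offered = available.get(name)
        # Packages missing from the repository are skipped, not flagged.
        if offered is not None and compare(offered, have) > 0:
            outdated.append((name, have, offered))
    return outdated
```

If the rpm module is installed, rpm.labelCompare can be passed as compare; otherwise any emulation with the same return convention should do. The caveats about a faked epoch and about Obsoletes apply here as well, since matching is by package name only.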
