Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?

2023-03-14 07:58 问答作者：

I'd like to be able to dump a dictionary containing long strings that I'd like to have in the block style for readability. For example:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block

PyYAML supports the loading of documents with this style but I can'开发者_如何学编程t seem to find a way to dump documents this way. Am I missing something?

import yaml

class folded_unicode(unicode): pass
class literal_unicode(unicode): pass

def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(folded_unicode, folded_unicode_representer)
yaml.add_representer(literal_unicode, literal_unicode_representer)

data = {
    'literal':literal_unicode(
        u'by hjw              ___\n'
         '   __              /.-.\\\n'
         '  /  )_____________\\\\  Y\n'
         ' /_ /=== == === === =\\ _\\_\n'
         '( /)=== == === === == Y   \\\n'
         ' `-------------------(  o  )\n'
         '                      \\___/\n'),
    'folded': folded_unicode(
        u'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n')}

print yaml.dump(data)

The result:

folded: >
  It removes all ordinary curses from all equipped items. Heavy or permanent curses
  are unaffected.
literal: |
  by hjw              ___
     __              /.-.\
    /  )_____________\\  Y
   /_ /=== == === === =\ _\_
  ( /)=== == === === == Y   \
   `-------------------(  o  )
                        \___/

For completeness, one should also have str implementations, but I'm going to be lazy :-)

pyyaml does support dumping literal or folded blocks.

Using `Representer.add_representer`

defining types:

class folded_str(str): pass

class literal_str(str): pass

class folded_unicode(unicode): pass

class literal_unicode(str): pass

Then you can define the representers for those types. Please note that while Gary's solution works great for unicode, you may need some more work to get strings to work right (see implementation of represent_str).

def change_style(style, representer):
    def new_representer(dumper, data):
        scalar = representer(dumper, data)
        scalar.style = style
        return scalar
    return new_representer

import yaml
from yaml.representer import SafeRepresenter

# represent_str does handle some corner cases, so use that
# instead of calling represent_scalar directly
represent_folded_str = change_style('>', SafeRepresenter.represent_str)
represent_literal_str = change_style('|', SafeRepresenter.represent_str)
represent_folded_unicode = change_style('>', SafeRepresenter.represent_unicode)
represent_literal_unicode = change_style('|', SafeRepresenter.represent_unicode)

Then you can add those representers to the default dumper:

yaml.add_representer(folded_str, represent_folded_str)
yaml.add_representer(literal_str, represent_literal_str)
yaml.add_representer(folded_unicode, represent_folded_unicode)
yaml.add_representer(literal_unicode, represent_literal_unicode)

... and test it:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
}

print yaml.dump(data)

result:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal

Using `default_style`

If you are interested in having all your strings follow a default style, you can also use the default_style keyword argument, e.g:

>>> data = { 'foo': 'line1\nline2\nline3' }
>>> print yaml.dump(data, default_style='|')
"foo": |-
  line1
  line2
  line3

or for folded literals:

>>> print yaml.dump(data, default_style='>')
"foo": >-
  line1

  line2

  line3

or for double-quoted literals:

>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\nline3"

Caveats:

Here is an example of something you may not expect:

data = {
    'foo': literal_str('this is a\nblock literal'),
    'bar': folded_unicode('this is a folded block'),
    'non-printable': literal_unicode('this has a \t tab in it'),
    'leading': literal_unicode('   with leading white spaces'),
    'trailing': literal_unicode('with trailing white spaces  '),
}
print yaml.dump(data)

results in:

bar: >-
  this is a folded block
foo: |-
  this is a
  block literal
leading: |2-
     with leading white spaces
non-printable: "this has a \t tab in it"
trailing: "with trailing white spaces  "

1) non-printable characters

See the YAML spec for escaped characters (Section 5.7):

Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the “\” character has no special meaning and non-printable characters are not available.

If you want to preserve non-printable characters (e.g. TAB), you need to use double-quoted scalars. If you are able to dump a scalar with literal style, and there is a non-printable character (e.g. TAB) in there, your YAML dumper is non-compliant.

E.g. pyyaml detects the non-printable character \t and uses the double-quoted style even though a default style is specified:

>>> data = { 'foo': 'line1\nline2\n\tline3' }
>>> print yaml.dump(data, default_style='"')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='>')
"foo": "line1\nline2\n\tline3"

>>> print yaml.dump(data, default_style='|')
"foo": "line1\nline2\n\tline3"

2) leading and trailing white spaces

Another bit of useful information in the spec is:

All leading and trailing white space characters are excluded from the content

This means that if your string does have leading or trailing white space, these would not be preserved in scalar styles other than double-quoted. As a consequence, pyyaml tries to detect what is in your scalar and may force the double-quoted style.

This can be relatively easily done, the only "hurdle" being how to indicate which of the spaces in the string, that needs to be represented as a folded scalar, needs to become a fold. The literal scalar has explicit newlines containing that information, but this cannot be used for folded scalars, as they can contain explicit newlines e.g. in case there is leading whitespace and also needs a newline at the end in order not to be represented with a stripping chomping indicator (>-)

import sys
import ruamel.yaml

folded = ruamel.yaml.scalarstring.FoldedScalarString
literal = ruamel.yaml.scalarstring.LiteralScalarString

yaml = ruamel.yaml.YAML()

data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=folded('this is a folded block\n'),
)

data['bar'].fold_pos = [data['bar'].index(' folded')]

yaml.dump(data, sys.stdout)

which gives:

foo: |
  this is a
  block literal
bar: >
  this is a
  folded block

The fold_pos attribute expects a reversable iterable, representing positions of spaces indicating where to fold.

If you never have pipe characters ('|') in your strings you could have done something like:

import re

s = 'this is a|folded block\n'
sf = folded(s.replace('|', ' '))  # need to have a space!
sf.fold_pos = [x.start() for x in re.finditer('\|', s)]  # | is special in re, needs escaping


data = dict(
    foo=literal('this is a\nblock literal\n'), 
    bar=sf,  # need to have a space
)

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

which also gives exactly the output you expect

继续阅读：python pyyaml

Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?

Using `Representer.add_representer`

Using `default_style`

Caveats:

1) non-printable characters

2) leading and trailing white spaces

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Using Representer.add_representer

Using default_style

Caveats:

1) non-printable characters

2) leading and trailing white spaces

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Using `Representer.add_representer`

Using `default_style`

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？