Validating a YAML document in Python
One of the benefits of XML is being able to validate a document against an XSD. YAML doesn't have this feature, so how can I validate that the YAML document I open is in the format expected by my application?
Given that JSON and YAML are pretty similar beasts, you could make use of JSON-Schema to validate a sizable subset of YAML. Here's a code snippet (you'll need PyYAML and jsonschema installed):
from jsonschema import validate
import yaml
schema = """
type: object
properties:
testing:
type: array
items:
enum:
- this
- is
- a
- test
"""
good_instance = """
testing: ['this', 'is', 'a', 'test']
"""
validate(yaml.safe_load(good_instance), yaml.safe_load(schema))  # passes
# Now let's try a bad instance...
bad_instance = """
testing: ['this', 'is', 'a', 'bad', 'test']
"""
validate(yaml.safe_load(bad_instance), yaml.safe_load(schema))
# Fails with:
# ValidationError: 'bad' is not one of ['this', 'is', 'a', 'test']
#
# Failed validating 'enum' in schema['properties']['testing']['items']:
# {'enum': ['this', 'is', 'a', 'test']}
#
# On instance['testing'][3]:
# 'bad'
One problem with this is that if your schema spans multiple files and you use "$ref" to reference the other files, then those other files will need to be JSON, I think. But there are probably ways around that. In my own project, I'm playing with specifying the schema using JSON files while the instances are YAML.
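One possible workaround is to pre-load the YAML sub-schemas yourself and hand them to the resolver, so "$ref" never has to read a JSON file from disk. A sketch using jsonschema's RefResolver (note that newer jsonschema versions deprecate RefResolver in favour of the referencing library; the file name testing.yaml is invented for illustration):

```python
import yaml
from jsonschema import RefResolver, validate

# Hypothetical file contents, inlined here so the sketch is self-contained.
sub_schema = yaml.safe_load("""
type: array
items:
  enum: [this, is, a, test]
""")
main_schema = yaml.safe_load("""
type: object
properties:
  testing:
    "$ref": "testing.yaml"
""")

# Pre-load the YAML sub-schema into the resolver's store, so the "$ref"
# is served from memory instead of a JSON file on disk.
resolver = RefResolver(base_uri="", referrer=main_schema,
                       store={"testing.yaml": sub_schema})
validate({"testing": ["this", "is"]}, main_schema, resolver=resolver)  # passes
```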
I find Cerberus to be very reliable with great documentation and straightforward to use.
Here is a basic implementation example:
my_yaml.yaml:
name: 'my_name'
date: 2017-10-01
metrics:
  percentage:
    value: 87
    trend: up
Defining the validation schema in schema.py:
{
    'name': {
        'required': True,
        'type': 'string'
    },
    'date': {
        'required': True,
        'type': 'date'
    },
    'metrics': {
        'required': True,
        'type': 'dict',
        'schema': {
            'percentage': {
                'required': True,
                'type': 'dict',
                'schema': {
                    'value': {
                        'required': True,
                        'type': 'number',
                        'min': 0,
                        'max': 100
                    },
                    'trend': {
                        'type': 'string',
                        'nullable': True,
                        'regex': '(?i)^(down|equal|up)$'
                    }
                }
            }
        }
    }
}
Using PyYAML to load the YAML document:
import yaml

def load_doc():
    with open('./my_yaml.yaml', 'r') as stream:
        return yaml.safe_load(stream)
Now, validating the YAML file is straightforward:
import ast
from cerberus import Validator

# schema.py contains a plain dict literal, so ast.literal_eval is a safer
# alternative to eval() for reading it back in.
with open('./schema.py', 'r') as f:
    schema = ast.literal_eval(f.read())

v = Validator(schema)
doc = load_doc()
print(v.validate(doc, schema))
print(v.errors)
Keep in mind that Cerberus is an agnostic data validation tool, which means that it can support formats other than YAML, such as JSON, XML and so on.
Try Rx, it has a Python implementation. It works on JSON and YAML.
From the Rx site:
"When adding an API to your web service, you have to choose how to encode the data you send across the line. XML is one common choice for this, but it can grow arcane and cumbersome pretty quickly. Lots of webservice authors want to avoid thinking about XML, and instead choose formats that provide a few simple data types that correspond to common data structures in modern programming languages. In other words, JSON and YAML.
Unfortunately, while these formats make it easy to pass around complex data structures, they lack a system for validation. XML has XML Schemas and RELAX NG, but these are complicated and sometimes confusing standards. They're not very portable to the kind of data structure provided by JSON, and if you wanted to avoid XML as a data encoding, writing more XML to validate the first XML is probably even less appealing.
Rx is meant to provide a system for data validation that matches up with JSON-style data structures and is as easy to work with as JSON itself."
You can load YAML document as a dict and use library schema to check it:
import datetime

from schema import Schema, And, Use, Optional, SchemaError
import yaml

schema = Schema(
    {
        'created': And(datetime.datetime),
        'author': And(str),
        'email': And(str),
        'description': And(str),
        Optional('tags'): And(str, lambda s: len(s) >= 0),
        'setup': And(list),
        'steps': And(list, lambda steps: all('=>' in s for s in steps),
                     error='Steps should be an array of strings '
                           'containing "=>" to separate '
                           'actions and expectations'),
        'teardown': And(list)
    }
)
with open(filepath) as f:
    data = yaml.safe_load(f)

try:
    schema.validate(data)
except SchemaError as e:
    print(e)
Yes - having support for validation is vital for lots of important use cases. See e.g. YAML and the importance of Schema Validation « Stuart Gunter
As already mentioned, there is Rx, available for various languages, and Kwalify for Ruby and Java.
See also the PyYAML discussion: YAMLSchemaDiscussion.
A related effort is JSON Schema, which even had some IETF standardization activity: draft-zyp-json-schema-03 - A JSON Media Type for Describing the Structure and Meaning of JSON Documents
I worked on a similar project where I needed to validate the elements of a YAML file.
At first I thought PyYAML tags were the simplest way. But I later decided to go with PyKwalify, which actually defines a schema for YAML.
PyYAML tags:
YAML has tag support, which lets us enforce basic type checks by prefixing a value with its data type, e.g. for an integer: !!int "123"
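For example, a tagged scalar is checked at load time (a quick sketch using PyYAML):

```python
import yaml

# !!int forces integer construction; a non-numeric scalar fails to load.
print(yaml.safe_load('count: !!int "123"'))  # {'count': 123}
try:
    yaml.safe_load('count: !!int "abc"')
except (ValueError, yaml.YAMLError):
    print('not an integer')
```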
More on PyYAML tags: http://pyyaml.org/wiki/PyYAMLDocumentation#Tags
This is good, but if you are going to expose this to the end user, it might cause confusion. I did some research into defining a schema for YAML instead:
- Validate the YAML with its corresponding schema for basic data type check.
- Custom validations like IP address, random strings can be added in schema.
- Have YAML schema separately leaving YAML data simple and readable.
PyKwalify:
There is a package called PyKwalify which serves this purpose: https://pypi.python.org/pypi/pykwalify
This package best fits my requirements. I tried it with a small example in my local setup, and it works. Here's the sample schema file:
# sample schema
type: map
mapping:
  Emp:
    type: map
    mapping:
      name:
        type: str
        required: yes
      email:
        type: str
      age:
        type: int
      birth:
        type: str
A valid YAML file for this schema:
---
Emp:
  name: "abc"
  email: "xyz@gmail.com"
  age: yy
  birth: "xx/xx/xxxx"
Thanks
Pydantic has not been mentioned.
From their example:
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []

# Parse your YAML into a dictionary, then validate against your model.
external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
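To go from YAML to a validated model, parse with PyYAML first and let pydantic report whatever doesn't fit. A sketch (the model is a reduced version of the one above):

```python
import yaml
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str = 'John Doe'

doc = yaml.safe_load("""
id: not-a-number
""")
try:
    User(**doc)
except ValidationError as e:
    print(e)  # explains that 'id' is not a valid integer
```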
These look good. The YAML parser can handle the syntax errors, and one of these libraries can validate the data structures.
- http://pypi.python.org/pypi/voluptuous/ (I've tried this one, it is decent, if a bit sparse.)
- http://discorporate.us/projects/flatland/ (not clear how to validate files at first glance)
You can use Python's yaml library to display the message, character, line and file for errors in your loaded file.
#!/usr/bin/env python
import yaml

with open("example.yaml", 'r') as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)
The error message can be accessed via exc.problem. Access exc.problem_mark to get a yaml.error.Mark object, which exposes the attributes name, column and line.
Hence you can create your own pointer to the issue:
pm = exc.problem_mark
# Mark.line and Mark.column are zero-based.
print("Your file {} has an issue on line {} at position {}".format(pm.name, pm.line + 1, pm.column + 1))
I wrapped some existing JSON-related Python libraries, aiming to be able to use them with YAML as well.
The resulting Python library mainly wraps:
- jsonschema - a validator for JSON files against JSON Schema files, wrapped to support validating YAML files against JSON Schema files in YAML format as well.
- jsonpath-ng - an implementation of JSONPath for Python, wrapped to support JSONPath selection directly on YAML files.
... and is available on GitHub:
https://github.com/yaccob/ytools
It can be installed using pip:
pip install ytools
Validation example (from https://github.com/yaccob/ytools#validation):
import ytools
ytools.validate("test/sampleschema.yaml", ["test/sampledata.yaml"])
What you don't get out of the box yet is validating against external schemas that are themselves in YAML format.
ytools doesn't provide anything that hasn't existed before - it just makes the application of some existing solutions more flexible and more convenient.
I'm not aware of a Python solution. But there is a Ruby schema validator for YAML called kwalify. You should be able to invoke it via subprocess if you don't come across a Python library.