What is the best file parsing solution for converting files?
I am looking for the best solution for custom file parsing for our enterprise import routines. Basically, I want to convert each incoming file format into one standard file format and have a single routine that imports that data into the database. I need to be able to create custom scripts for each client, since it's difficult to get customers to comply with a standard or template format. I have looked at PowerShell and IronPython so far, but I am not sure this is the route I want to go. I have also looked at tools such as Talend, a drag-and-drop style tool, which may or may not give me the flexibility I want. We are a .NET shop and have written custom code for this in the past, but I need something that is quicker to create than coding custom parsing functions each time we get a new file format.
Depending on the complexity and variability of your work, you should consider an ETL tool like SSIS (SQL Server Integration Services).
Python is wonderful for this kind of thing. That's what we use. Each new customer transfer is a new adventure, and Python gives us the flexibility to respond quickly.
Edit. All Python scripts that read files are "custom file parsers". Without an actual sample file, it's not sensible to provide a detailed example.
    with open("some file", "r") as source:
        for line in source:
            process(line)
That's about all there is to a "custom file parser". If you're parsing .csv or .xml files, then Python has modules for that. If you're parsing fixed-format files, you'd use string slicing operations. If you're parsing other files (X12? JSON? YAML?) you'll need appropriate parsers.
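For the .csv case, a minimal sketch using the standard library's csv module might look like the following. The sample data, the column names, and the helper name read_csv_rows are made up for illustration; a real client file would be opened with open() instead of io.StringIO.

```python
import csv
import io

# Hypothetical sample data standing in for a client's .csv file.
SAMPLE = "name,amount\nWidget,10.50\nGadget,3.25\n"

def read_csv_rows(source):
    """Return each row of a CSV file-like object as a dict keyed by the header row."""
    return list(csv.DictReader(source))

rows = read_csv_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["name"], row["amount"])
```

csv.DictReader returns every value as a string, so any numeric or date conversion is still up to the per-client script.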
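Tab-delimited.
    from collections import namedtuple
    RecordLayout = namedtuple('RecordLayout',['field1','field2','field3',...])
    def process( aLine ):
        # _make() builds the record from a sequence; strip the newline before splitting
        record = RecordLayout._make( aLine.rstrip('\n').split('\t') )
        ...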
Fixed layout.
    from collections import namedtuple
    RecordLayout = namedtuple('RecordLayout',['field1','field2','field3',...])
    def process( aLine ):
        # slice each column out by position
        fields = ( aLine[:10], aLine[10:20], aLine[20:30], ... )
        # _make() builds the record from a sequence of field values
        record = RecordLayout._make( fields )
        ...
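To tie this back to the question, here is one possible sketch of how small per-client parsers could feed a single import routine. Everything named here is an assumption for illustration: the StandardRecord layout, the client keys, the 10-character column widths, and the sample lines are all invented, not taken from any real file.

```python
from collections import namedtuple

# Hypothetical standard layout that the single import routine expects.
StandardRecord = namedtuple("StandardRecord", ["customer_id", "name", "amount"])

def parse_tab_delimited(line):
    """Hypothetical client A: three tab-separated fields."""
    customer_id, name, amount = line.rstrip("\r\n").split("\t")
    return StandardRecord(customer_id.strip(), name.strip(), float(amount))

def parse_fixed_width(line):
    """Hypothetical client B: three 10-character columns."""
    return StandardRecord(line[:10].strip(), line[10:20].strip(), float(line[20:30]))

# One small parser per client; the key names are made up.
PARSERS = {"client_a": parse_tab_delimited, "client_b": parse_fixed_width}

def convert(client, lines):
    """Turn a client's raw lines into standard records, skipping blank lines."""
    parse = PARSERS[client]
    return [parse(line) for line in lines if line.strip()]

records_a = convert("client_a", ["C001\tAcme\t12.50\n"])
fixed_line = "C002".ljust(10) + "Globex".ljust(10) + "7.25".ljust(10) + "\n"
records_b = convert("client_b", [fixed_line])
print(records_a + records_b)
```

With this shape, a new file format only needs one new parse function and one new entry in the dictionary; the code that loads StandardRecord rows into the database stays the same.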