Is there a FilterInputStream to convert \r to the local system newline?
I am one of the developers on a platform that, among other features, allows users to upload data files (from disparate sources) for processing with various scripts.
An issue keeps popping up with tab-separated data files from Excel for Mac. Excel for Mac (even on OS X) ends its lines with a bare CR character (\r); the Linux (and modern Mac) standard is LF (\n). (Windows uses CR LF, aka \r\n.) The scripts run on a Linux machine, so they fail to recognize a lone \r as a line terminator.
On the backend, we're feeding an InputStream into a JCR Node via its usual API. I'd like a FilterInputStream that does the line-ending conversion for us. Writing it ourselves isn't much code for the obvious cases, but if there's a canned library for this, we'd much prefer it, on the grounds that hopefully other people will have worked out the edge conditions for us.
Is there an open-source library that converts pretty much any of the standard line-ending formats into LF (or the system line separator) inside a FilterInputStream or other InputStream? A few Google searches didn't turn up anything obvious, but I'd be astonished if there isn't something.
If there isn't, what edge conditions are likely to shoot me in the foot writing this?
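If no library turns up, here is a minimal sketch of a hand-rolled filter, assuming the upload uses a byte-oriented encoding in which CR and LF are single bytes (ASCII, ISO-8859-1, UTF-8; not UTF-16). The class name NewlineNormalizingInputStream is hypothetical. It tries to cover the edge conditions that usually bite: a CR LF pair split across two reads of the underlying stream, a CR as the very last byte, a CR followed by another CR (two line breaks, not one), and the fact that FilterInputStream's inherited read(byte[], int, int) and skip() go straight to the wrapped stream, bypassing any translation in read().

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Objects;

/**
 * Hypothetical filter that rewrites CR LF and lone CR as LF; LF passes through.
 * Assumes CR (0x0D) and LF (0x0A) are single bytes in the stream's encoding
 * (ASCII, ISO-8859-1, UTF-8); it is NOT safe for UTF-16.
 */
class NewlineNormalizingInputStream extends FilterInputStream {

    NewlineNormalizingInputStream(InputStream source) {
        // A one-byte pushback buffer lets us peek past a CR even when the
        // CR LF pair straddles two reads of the underlying stream.
        super(new PushbackInputStream(source, 1));
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != '\r') {
            return b; // ordinary byte, LF, or -1 at end of stream
        }
        int next = in.read();
        if (next != '\n' && next != -1) {
            // Lone CR: keep the following byte (possibly another CR) for later.
            ((PushbackInputStream) in).unread(next);
        }
        return '\n'; // CR LF, lone CR, and CR at EOF all become LF
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // The inherited bulk read goes straight to the wrapped stream,
        // so route it through read() to apply the translation.
        Objects.checkFromIndexSize(off, len, buf.length);
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            int b = read();
            if (b == -1) {
                break;
            }
            buf[off + count++] = (byte) b;
            if (in.available() <= 0) {
                break; // return what we have rather than block for more
            }
        }
        return count == 0 ? -1 : count;
    }

    @Override
    public long skip(long n) throws IOException {
        // The inherited skip() would skip untranslated bytes; skip output bytes instead.
        long skipped = 0;
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // be explicit: mark/reset is not supported
    }
}
```

Wrapping the upload stream as `new NewlineNormalizingInputStream(upload)` before handing it to the JCR API would then give the scripts LF-only files. Reading byte by byte is fine for correctness; in practice you may want a BufferedInputStream underneath.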
Use BufferedReader.readLine(). That will parse and remove whatever line terminators are present. Then when writing each line to the back end, append whatever line terminator you like.
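A minimal sketch of that approach, assuming the whole file fits in memory and that the caller knows the file's charset; LineEndingNormalizer and normalize are hypothetical names. BufferedReader.readLine() treats \n, \r, and \r\n all as line terminators, so every input style is handled.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

class LineEndingNormalizer {
    /**
     * Hypothetical helper: reads every line (ended by \n, \r, or \r\n) and
     * rewrites it followed by the given terminator, returning the result in memory.
     */
    static byte[] normalize(InputStream raw, Charset cs, String eol) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Resources close in reverse order, so the writer is flushed before we read 'out'.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(raw, cs));
             Writer writer = new OutputStreamWriter(out, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line); // readLine() has already stripped the terminator
                writer.write(eol);
            }
        }
        return out.toByteArray();
    }
}
```

The result can be wrapped in a ByteArrayInputStream and handed to the JCR API as before. Pass "\n" for eol to match the Linux scripts, or System.lineSeparator() for the local convention. Note that readLine() doesn't report whether the last line had a terminator, so this version always adds one.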
Since you care about CSV files, do you care about empty lines in them? If not, simply write your own filter that collapses every run of consecutive \r and \n characters into a single \n, and you're set. Note that empty rows do not produce empty lines, since an empty row still contains its field separators.
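A sketch of that collapsing filter, written as a plain byte-level copy loop rather than a FilterInputStream; NewlineCollapser is a hypothetical name, and it assumes an encoding in which CR and LF are single bytes (ASCII, ISO-8859-1, UTF-8).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class NewlineCollapser {
    /**
     * Hypothetical helper: copies in to out, replacing every run of
     * consecutive '\r' and '\n' bytes with a single '\n'.
     * Genuinely blank lines are therefore dropped.
     */
    static void collapse(InputStream in, OutputStream out) throws IOException {
        boolean inRun = false; // true while we are inside a run of CR/LF bytes
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r' || b == '\n') {
                if (!inRun) {
                    out.write('\n'); // emit one LF for the whole run
                    inRun = true;
                }
            } else {
                out.write(b);
                inRun = false;
            }
        }
    }
}
```

An empty spreadsheet row exported as a lone tab survives: "a\tb\r\t\rc\n" comes out as "a\tb\n\t\nc\n". Since it reads one byte at a time, wrap the streams in BufferedInputStream and BufferedOutputStream in practice.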