开发者

getting string.substring(N) not to choke when N > string.length

I'm writing some code that takes a report from the mainframe and converts it to a spreadsheet.

They can't edit the code on the MF to give me a delimited file, so I'm stuck dealing with it as fixed width.

It's working okay now, but I need to get it more stable before I release it for testing.

My problem is that in any given line of data, say it could have three columns of numbers, each five chars wide at positions 10, 16, and 22. If on this one particular row, there's no data for the last two cols, it won't be padded with spaces; rather, the length of the string will be only 14. So, I can't just blindly have

dim s as string = someStream.readline
a = s.substring(10, 5)
b = s.substring(16, 5)
c = s.substring(22, 5)

because it'll choke when it substrings past the length of the string.

I know I could test the length of the string before processing each row, and I have automated the filling of some of the vsariables using a counter and a loop, and using the counter*theWidthOfTheGivenVariable to jump around, but this project was a dog to start with (come on! turning a report into a spreadsheet?), but there are many different types of rows (it's not just a grid), and the code's getting ugly fast. I'd like this to be clean, clear, and maintainable for the poor sucker that gets this after me.

If it matters, here's my code so far (it's really crufty at the moment). You can see some of my/its idiocy in the processSection#data subs

So, I'开发者_开发问答m wondering

1) is there a way baked in to .NET to have string.substring not error when reading past the end of a string without wrapping it in a try...catch?

and

2) would it be appropriate in this situation to write a new string class that inherits from string that has a more friendly substring function in it?

ETA: Thanks for all the advice and knowledge everyone. I'll go with the extension. Hopefully one of these years, I'll get my chops up enough to pay someone back in kind. :)


I would implement the whole thing a little differently -

Right now, you have to parse every character sent from the mainframe at least three times: once when you create the string, and once when you parse the string, and once as you create a string for each new variable. With fixed-width lines, I'd go for a state machine that reads from the stream character by character. This isn't as hard as it sounds, and it will perform loads better because it only processes each character once. It's how the .Net framework itself handles this kind of thing, if you look at functions like String.Format().


You could write an extension method that would take care of the logic you are trying to incorporate. I'm not a VB.NET guy at heart, but I think this is the correct syntax for an extension method:

''# Comment to fix broken prettify.js syntax highlighting for attributes in vb.net
<Extension()> _
Public Shared Function SafeSubstring(ByVal aString As String, ByVal startIndex As Integer, ByVal length As Integer) As String
    If (startIndex + length > aString.Length) Then
        Return aString.Substring(startIndex)
    Else
        Return aString.Substring(startIndex, length)
    End If
End Function


You can't derive from String, so option 2 is out of the question. I'd argue that it would be a bad use of inheritance anyway, and ReadLine() wouldn't return an instance of your new string...

You could write an extension method though (assuming you're using VB9) - SafeSubstring or something like that. I don't believe there's anything in the framework already.


Option 2 seems like potentially dangerous obfuscation and serious pain to the next developer unless you make it VERY clear somehow that the new API will not choke on certain array operations. (AS per others' comments, you can't inherit the string class, but you could simply wrap it... still, i don't think that this is the recommended approach.)

Option 1, the try/catch, is ugly and will be slow if you hit that error frequently.

I honestly think that your best approach would be to explicitly check for the bounds:

To make it clear and easy to read you could use a static utility method (perhaps contained in a utility class.) This actually gives you the best of both worlds: Cleaner code with custom bounds checking and it will be very clear to whomever picks up the code after you.


Maybe consider a different approach and use some kind of regular expression or sscanf-equivalent for parsing the lines, and avoid all the explicit substrings?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜