Regex help: ignore random blocks of data
I am doing a regex search on binary files and have just discovered a problem: every so often a 64-byte checksum is inserted, which throws my searches off. Is there a way to ignore these 64 bytes, regardless of where they appear in my data?
My regex is \x18\xC0\x40[\x42\x43][\x00\x01]\x00\x00\x00
My problem is illustrated below:
0230000000FF45198085B918C0404301
FFFFFFFFFFFFFFFFC03CCFFFFFFFFFFF
FFFFFFFFFFFFFFFF3C0CFFFFFFFFFFFF
FFFFFFFFFFFFFFFF0300F0FFFFFFFFFF
FFFFFFFFFFFFFFFF030F0FFFFFFF4700
000000B9000000003C8085B9EDDF0000
In my example, my regex obviously doesn't match, because the checksum sits between the bytes I need (18C0404301 at the end of the first row and 000000 at the start of the last row). This can happen at any point in the required data as well.
One observation about the checksum data: it always ends in 4700, and it is always 8 bytes of FF, followed by 3-4 bytes of values, followed by 4-5 bytes of FF again.
Any help would be greatly appreciated. Thanks, James
You should probably use two passes for your search. In the first pass you delete all of these checksum blocks, which should be easy enough to identify; in the second pass you do your actual search.
Otherwise, you'd have to allow for a checksum block after each byte of your expression, resulting in a very long and hard-to-read pattern.
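A minimal sketch of the two-pass idea in Python, using the sample bytes from the question. It assumes a checksum block is exactly 64 bytes, starts with 8 bytes of FF and ends with 47 00; the names `CHECKSUM`, `TARGET`, `strip_checksums` and `find_pattern` are made up for illustration.

```python
import re

# Assumption (from the question): a checksum block is 64 bytes long,
# begins with 8 bytes of FF and ends with 47 00.
# 8 (FF) + 54 (anything) + 2 (47 00) = 64 bytes.
CHECKSUM = re.compile(rb"\xFF{8}[\x00-\xFF]{54}\x47\x00")

# The original search pattern from the question, unchanged.
TARGET = re.compile(rb"\x18\xC0\x40[\x42\x43][\x00\x01]\x00\x00\x00")


def strip_checksums(data: bytes) -> bytes:
    """First pass: delete every block that looks like a checksum."""
    return CHECKSUM.sub(b"", data)


def find_pattern(data: bytes) -> list[int]:
    """Second pass: search the cleaned data.

    The returned offsets refer to the cleaned data, not the original file.
    """
    return [m.start() for m in TARGET.finditer(strip_checksums(data))]


# The hex dump from the question.
sample = bytes.fromhex(
    "0230000000FF45198085B918C0404301"
    "FFFFFFFFFFFFFFFFC03CCFFFFFFFFFFF"
    "FFFFFFFFFFFFFFFF3C0CFFFFFFFFFFFF"
    "FFFFFFFFFFFFFFFF0300F0FFFFFFFFFF"
    "FFFFFFFFFFFFFFFF030F0FFFFFFF4700"
    "000000B9000000003C8085B9EDDF0000"
)

offsets = find_pattern(sample)
print(offsets)
```

Keep in mind that the offsets are shifted by 64 bytes for every block removed in front of them, so if you need positions in the original file you would have to track the removed spans as well.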
\x18\xC0\x40[\x42\x43][\x00\x01][\x00-\xFF]*?\x00\x00\x00
Try this:
\x18\xC0\x40[\x42\x43][\x00\x01](?:\xFF{8}[\x00-\xFF]*?\x47\x00)?\x00{3}
Updated: this will work if the checksum can appear anywhere. I inserted linefeeds for readability.
\x18(?:\xFF{8}[\x00-\xFF]*?\x47\x00)?
\xC0(?:\xFF{8}[\x00-\xFF]*?\x47\x00)?
\x40(?:\xFF{8}[\x00-\xFF]*?\x47\x00)?
[\x42\x43](?:\xFF{8}[\x00-\xFF]*?\x47\x00)?
[\x00\x01](?:\xFF{8}[\x00-\xFF]*?\x47\x00)?
\x00(?:\xFF{8}[\x00-\xFF]*?\x47\x00)?
\x00(?:\xFF{8}[\x00-\xFF]*?\x47\x00)?
\x00
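Rather than typing the long pattern by hand, it could be generated. The Python sketch below is only an illustration: it assumes the checksum starts with 8 bytes of FF and ends with 47 00, as described in the question, and makes the checksum group optional so the pattern also matches when no checksum is present. `GAP`, `ATOMS` and `PATTERN` are made-up names.

```python
import re

# Assumption (from the question): a checksum block starts with 8 bytes of FF,
# continues with arbitrary bytes, and ends with 47 00.
# The trailing "?" makes the whole block optional.
GAP = rb"(?:\xFF{8}[\x00-\xFF]*?\x47\x00)?"

# One entry per byte of the original pattern.
ATOMS = [
    rb"\x18", rb"\xC0", rb"\x40",
    rb"[\x42\x43]", rb"[\x00\x01]",
    rb"\x00", rb"\x00", rb"\x00",
]

# Put an optional checksum block between every pair of bytes.
PATTERN = re.compile(GAP.join(ATOMS))

# The hex dump from the question.
sample = bytes.fromhex(
    "0230000000FF45198085B918C0404301"
    "FFFFFFFFFFFFFFFFC03CCFFFFFFFFFFF"
    "FFFFFFFFFFFFFFFF3C0CFFFFFFFFFFFF"
    "FFFFFFFFFFFFFFFF0300F0FFFFFFFFFF"
    "FFFFFFFFFFFFFFFF030F0FFFFFFF4700"
    "000000B9000000003C8085B9EDDF0000"
)

match = PATTERN.search(sample)
if match:
    # Offsets here refer to the original, unmodified data.
    print(match.start(), match.end())
```

Unlike the two-pass approach, this keeps the offsets of the original file, at the cost of a pattern that is harder to read.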