A question about reassembling a TCP stream
I'm implementing an IPS, and I'm a little confused when observing how Wireshark reassembles a TCP stream.
For example, the server transfers an HTML page to the client. The page is divided into 4 parts, each encapsulated in a TCP packet. Then the server pushes another 4 TCP packets to the client carrying JavaScript text.
My question is: I know I can determine their order from their Seq and Len values, but how can I determine the end of the HTML text? How can I know the HTML spans 4 TCP packets and not 5?
RFC 2616 section 4.4 states that the message length can be given in several ways:
- By the Content-Length header, if one is defined. (This is probably the case you're seeing, and it's relatively simple. If you know the position (seq + offset within the packet) of the start of the body and the message length, you can just add them to get the position of the end.)
- By chunked encoding. The RFC has the details, but each chunk carries its own length in a similar way, and there is a way of marking the final chunk.
- By multipart/byteranges (which you won't see unless the client asked for it, and it probably won't for an HTML document).
- Or by reading until the TCP connection is closed. (In particular, until a FIN packet is sent from the server to the client, which only happens on a clean close; you'd see an RST otherwise.)
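To make the first two cases concrete, here is a minimal Python sketch. It assumes you have already reassembled the server-to-client bytes into one contiguous buffer using Seq and Len; the function name find_message_end and its return convention are made up for illustration and are not part of Wireshark or any library. It does not handle multipart/byteranges or responses that never carry a body (such as 304 or replies to HEAD).

```python
def find_message_end(stream: bytes, start: int = 0):
    """Return the offset just past the HTTP message that begins at `start`.

    Returns None when the end cannot be determined from the bytes seen so
    far: either more data is needed, or the body is delimited only by the
    connection closing (FIN).
    """
    header_end = stream.find(b"\r\n\r\n", start)
    if header_end == -1:
        return None  # header block is not complete yet
    body_start = header_end + 4

    # Parse header lines, skipping the status line; names are lowercased.
    headers = {}
    for line in stream[start:header_end].split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    # Chunked encoding takes precedence over Content-Length.
    if b"chunked" in headers.get(b"transfer-encoding", b"").lower():
        pos = body_start
        while True:
            line_end = stream.find(b"\r\n", pos)
            if line_end == -1:
                return None  # chunk-size line not received yet
            # The size is hexadecimal; ignore any ";extension" after it.
            size = int(stream[pos:line_end].split(b";")[0], 16)
            if size == 0:
                # Final chunk: optional trailers end with a blank line.
                end = stream.find(b"\r\n\r\n", line_end)
                return None if end == -1 else end + 4
            # Skip the size line's CRLF, the chunk data, and its CRLF.
            pos = line_end + 2 + size + 2

    if b"content-length" in headers:
        end = body_start + int(headers[b"content-length"])
        return end if end <= len(stream) else None

    return None  # no explicit length: the body runs until the server's FIN
```

Calling it again with start set to the returned offset finds the end of the next response on the same connection, which is how the HTML page and the JavaScript that follows it can be told apart regardless of how many TCP packets each one happened to span.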