escaping CRLF in HTTP multipart/form-data content type (iOS)

2023-02-05 14:01 Q&A author:

I'm trying to post a file using the multipart/form-data content type, and I have a question:

Shouldn't I escape CRLFs when I write the content of a file? I found a code snippet on the web, and I think it might be wrong:

NSMutableURLRequest* req = [NSMutableURLRequest requestWithURL: url];
[req setHTTPMethod: @"POST"];

NSString* contentType = @"multipart/form-data; boundary=AaB03x";
[req setValue:contentType forHTTPHeaderField: @"Content-type"];

NSData* boundary = [@"\r\n--AaB03x\r\n" dataUsingEncoding:NSUTF8StringEncoding];
NSMutableData *postBody = [NSMutableData data];
[postBody appendData: boundary];
[postBody appendData: [@"Content-Disposition: form-data; name=\"datafile\"; filename=\"t.jpg\"\r\n" dataUsingEncoding:NSUTF8StringEncoding]];
[postBody appendData: [@"Content-Type: image/jpeg\r\n\r\n" dataUsingEncoding:NSUTF8StringEncoding]];
[postBody appendData: imageData];
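// The last delimiter is the closing one: the boundary followed by "--".
[postBody appendData: [@"\r\n--AaB03x--\r\n" dataUsingEncoding:NSUTF8StringEncoding]];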
[req setHTTPBody:postBody];

This is wrong because imageData might contain \r\n sequences, right? If so, is there a way to escape CRLFs in raw data? Or am I missing something?

Thanks in advance!

This is an interesting question. Looking at the multipart media type RFC (RFC 2046), it appears that it is up to the composing agent to make sure that the boundary does not appear in the encapsulated data. In addition, it states the following:

NOTE: Because boundary delimiters must not appear in the body parts being encapsulated, a user agent must exercise care to choose a unique boundary parameter value. The boundary parameter value in the example above could have been the result of an algorithm designed to produce boundary delimiters with a very low probability of already existing in the data to be encapsulated without having to prescan the data.

I interpret this to mean that in order to be sure that the boundary value doesn't appear in the encapsulated data, you would have to scan the data for the boundary value. Because this is an unacceptably expensive operation in most cases, it's expected that user agents will simply choose a value that has a very low probability of occurring in the data.

Consider the probability of the boundary in your example occurring in a random string of bytes (which, for the sake of argument, we will assume represents a JPEG image). The full string that would need to be matched in order to end your image data early would be "\r\n--AaB03x": 10 bytes, or 80 bits. Starting from any given position, the chance that the next 10 bytes are that sequence is one in 2^80. A 1 MB JPEG file has 2^20 bytes, or 2^23 bits. Even if every bit offset is counted as a possible starting point, the chance of the file containing the sequence is less than 2^23/2^80, or one in 2^57 (more than one hundred quadrillion). Since a real match has to be byte-aligned, only 2^20 starting points matter, and the bound tightens to 2^20/2^80, or one in 2^60.
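The arithmetic above can be checked with a few lines of code. This is only a sketch in Python (not the Objective-C of the question), and it keeps the same simplifying assumption that the file is uniformly random bytes, which a real JPEG is not. The variable names are made up for illustration.

```python
from fractions import Fraction

# Delimiter that would have to appear inside the file data to end the part early.
delimiter = b"\r\n--AaB03x"
pattern_bits = 8 * len(delimiter)      # 10 bytes -> 80 bits

file_bytes = 2 ** 20                   # a 1 MB file
file_bits = 8 * file_bytes             # 2^23 bits

# Chance that one fixed position holds exactly the delimiter (random data assumed).
per_position = Fraction(1, 2 ** pattern_bits)

# Union bound: number of candidate start positions times the per-position chance.
bound_bit_offsets = file_bits * per_position    # counts every bit offset
bound_byte_offsets = file_bytes * per_position  # counts byte-aligned offsets only

print("per position      : 1 in 2^%d" % pattern_bits)
print("bound (bit starts): 1 in %d" % bound_bit_offsets.denominator)
print("bound (byte starts): 1 in %d" % bound_byte_offsets.denominator)
```

Both figures are upper bounds on the chance of a collision, so the real probability for random data is smaller still.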

So, I think the answer is that to be 100% sure, you would have to check the data for the boundary sequence, and then use a different one if that boundary sequence exists in the data. But in practice, the chances of the boundary sequence occurring are small enough that it's not worth it.
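If you do want the 100% guarantee, the check-and-retry idea could look roughly like the sketch below. It is written in Python for brevity; `contains_delimiter` and `choose_boundary` are hypothetical helper names, not part of any library, and the "Boundary-" prefix is an arbitrary choice. The scan looks for "--" plus the boundary, which is slightly stricter than necessary but also covers data that starts with the delimiter.

```python
import secrets


def contains_delimiter(payload: bytes, boundary: str) -> bool:
    """Return True if the payload contains "--" followed by the boundary value."""
    return (b"--" + boundary.encode("ascii")) in payload


def choose_boundary(payloads, prefix: str = "Boundary-") -> str:
    """Pick a random boundary and retry until it occurs in none of the payloads."""
    while True:
        # 32 hex characters of randomness; the result stays under the
        # 70-character limit that RFC 2046 places on boundary values.
        boundary = prefix + secrets.token_hex(16)
        if not any(contains_delimiter(p, boundary) for p in payloads):
            return boundary


image_data = b"\xff\xd8 fake image bytes \r\n--AaB03x\r\n more bytes \xff\xd9"
print(contains_delimiter(image_data, "AaB03x"))   # the fixed boundary collides here
print(choose_boundary([image_data]))
```

With a random boundary of this length the loop will practically never run a second time, which is the point made above: the scan is optional insurance, not a necessity.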

Technically speaking, it is wrong because the trailing \r\n is not part of the boundary delimiter as defined in RFC 2046: the delimiter is the leading \r\n, then "--", then the boundary value. The trailing \r\n comes after the delimiter (and after any optional transport padding). In practice it shouldn't matter, because you are going to put that \r\n right after the delimiter anyway.
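To make the placement of each \r\n explicit, here is a small sketch that assembles a single-file body piece by piece. It is in Python rather than Objective-C, `build_multipart_body` is a hypothetical helper, and it assumes one file part with no preamble, following the RFC 2046 layout in which the closing delimiter is the boundary followed by "--".

```python
CRLF = b"\r\n"


def build_multipart_body(boundary: str, field_name: str, filename: str,
                         content_type: str, payload: bytes) -> bytes:
    """Assemble a multipart/form-data body containing exactly one file part."""
    dash_boundary = b"--" + boundary.encode("ascii")
    part_headers = CRLF.join([
        ('Content-Disposition: form-data; name="%s"; filename="%s"'
         % (field_name, filename)).encode("utf-8"),
        ("Content-Type: %s" % content_type).encode("ascii"),
    ])
    return b"".join([
        dash_boundary, CRLF,          # first delimiter line (no preamble before it)
        part_headers, CRLF,           # each header line ends with CRLF
        CRLF,                         # empty line separates headers from data
        payload,                      # raw bytes, written unescaped
        CRLF, dash_boundary, b"--",   # closing delimiter: CRLF + "--" + boundary + "--"
        CRLF,
    ])


body = build_multipart_body("AaB03x", "datafile", "t.jpg", "image/jpeg",
                            b"\xff\xd8\r\n\xff\xd9")
print(body)
```

The matching request header would be "Content-Type: multipart/form-data; boundary=AaB03x", with the boundary value given without the leading dashes.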

Also, I take it that only the whole delimiter sequence is to be avoided, not its subsequences, so a bare \r\n inside the file data is harmless and does not need escaping.
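A quick way to convince yourself of this is to split a body on the full delimiter and see what comes back. The following is a deliberately naive Python sketch; `extract_single_part` is a hypothetical helper that assumes a single part and is nowhere near a complete multipart parser.

```python
def extract_single_part(body: bytes, boundary: str) -> bytes:
    """Return the content of the first part, cutting at the first full delimiter."""
    dash_boundary = b"--" + boundary.encode("ascii")
    # Skip the opening delimiter line.
    start = body.index(dash_boundary + b"\r\n") + len(dash_boundary) + 2
    # The part ends where CRLF + "--" + boundary next appears.
    end = body.index(b"\r\n" + dash_boundary, start)
    headers, _, content = body[start:end].partition(b"\r\n\r\n")
    return content


def wrap(payload: bytes) -> bytes:
    """Wrap a payload in a one-part body using the boundary from the question."""
    return (b"--AaB03x\r\n"
            b'Content-Disposition: form-data; name="datafile"; filename="t.bin"\r\n'
            b"Content-Type: application/octet-stream\r\n\r\n"
            + payload +
            b"\r\n--AaB03x--\r\n")


# CRLFs and even "\r\n--" inside the data are fine: only the whole delimiter matters.
safe_payload = b"line one\r\nline two\r\n--not-the-boundary\r\n"
recovered_safe = extract_single_part(wrap(safe_payload), "AaB03x")

# A payload that contains the complete delimiter gets cut short.
unsafe_payload = b"abc\r\n--AaB03x\r\ndef"
recovered_unsafe = extract_single_part(wrap(unsafe_payload), "AaB03x")

print(recovered_safe == safe_payload)
print(recovered_unsafe)
```

The first payload survives intact, embedded CRLFs included, while the second is truncated at the point where the complete delimiter appears.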
