How to extract file data from an HTTP MIME-encoded message in Linux?

Posted: 2023-01-26 07:42

I have a program that accepts HTTP POSTs of files and writes the entire POST request to a file. I want to write a script that strips the HTTP headers and leaves only the binary file data. How can I do this?

The file content is shown below. The data between Content-Type: application/octet-stream and ------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3 is what I want:

POST /?user_name=vvvvvvvv&size=837&file_name=logo.gif& HTTP/1.1^M
Accept: text/*^M
Content-Type: multipart/form-data; boundary=----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
User-Agent: Shockwave Flash^M
Host: 192.168.0.198:9998^M
Content-Length: 1251^M
Connection: Keep-Alive^M
Cache-Control: no-cache^M
Cookie: cb_fullname=ddddddd; cb_user_name=cdc^M
^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filename"^M
^M
logo.gif^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filedata"; filename="logo.gif"^M
Content-Type: application/octet-stream^M
^M
GIF89an^@I^^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Upload"^M
^M
Submit Query^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3-

Do you want to do this while the file is being transferred, or after it has arrived?

Almost any scripting language should work. My AWK is a bit rusty, but...

awk '/^Content-Type: application\/octet-stream/,/^--------/'

That should print everything between the application/octet-stream line and the ---------- line. An awk range pattern is inclusive, though, so both of those lines are printed too, which means you'll have to do something a bit more complex:

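# state is 1 while we are inside the file data, 0 otherwise
BEGIN {state = 0}
{
    # A boundary line ends the data section
    if ($0 ~ /^------------/) {
        state = 0;
    }
    if (state == 1) {
        print $0
    }
    # Checked last so that the header line itself is not printed
    if ($0 ~ /^Content-Type: application\/octet-stream/) {
        state = 1;
    }
}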

The test for application\/octet-stream comes after the print statement so that the header line itself is not printed: state only becomes 1 once that line has already been processed.

Of course, being Unix, you could pipe the output of your program through awk and then save the file.
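The same state-machine idea can be sketched in Python, working on bytes so that binary data passes through unchanged. This is only an illustration: the name extract_between and the sample lines are made up here. Like the awk version, it still emits the blank line that follows the header and leaves the CR LF at the end of the data in place.

```python
def extract_between(lines,
                    start=b"Content-Type: application/octet-stream",
                    stop=b"------------"):
    """Yield the lines after a `start` line, up to the next `stop` line."""
    printing = False
    for line in lines:
        if line.startswith(stop):      # boundary reached: stop printing
            printing = False
        if printing:
            yield line
        if line.startswith(start):     # checked last, so the header is skipped
            printing = True


# Hypothetical sample, shaped like the upload in the question.
sample = [
    b'Content-Disposition: form-data; name="Filedata"; filename="logo.gif"\r\n',
    b"Content-Type: application/octet-stream\r\n",
    b"\r\n",
    b"GIF89a\x00\x01\r\n",
    b"------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3\r\n",
]
extracted = list(extract_between(sample))
```

With a real capture, a file opened in binary mode (open("input", "rb")) can be passed directly, since iterating over it yields lines as bytes.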

If you use Python, email.parser.Parser will allow you to parse a multipart MIME document.
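A minimal sketch of that approach, assuming the whole request has been saved as bytes. It uses email.message_from_bytes (the bytes counterpart of the Parser class) because the payload is binary. The function name extract_upload and the sample request are made up for illustration, and the HTTP request line has to be dropped first because it is not a MIME header.

```python
import email


def extract_upload(raw: bytes, field: str = "Filedata") -> bytes:
    """Return the payload of the form-data part whose name is `field`."""
    # The request line ("POST ... HTTP/1.1") is not a MIME header, so drop it.
    _request_line, _sep, rest = raw.partition(b"\r\n")
    message = email.message_from_bytes(rest)
    for part in message.walk():
        # Returns None for parts that have no Content-Disposition header.
        name = part.get_param("name", header="content-disposition")
        if name == field:
            return part.get_payload(decode=True)
    raise ValueError(f"no form-data part named {field!r}")


# Hypothetical request, shaped like the one in the question.
BOUNDARY = b"----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3"
payload = b"GIF89a\x00\x01\xff\xfe"
sample = b"\r\n".join([
    b"POST /?file_name=logo.gif HTTP/1.1",
    b"Content-Type: multipart/form-data; boundary=" + BOUNDARY,
    b"",
    b"--" + BOUNDARY,
    b'Content-Disposition: form-data; name="Filename"',
    b"",
    b"logo.gif",
    b"--" + BOUNDARY,
    b'Content-Disposition: form-data; name="Filedata"; filename="logo.gif"',
    b"Content-Type: application/octet-stream",
    b"",
    payload,
    b"--" + BOUNDARY + b"--",
    b"",
])
result = extract_upload(sample)
```

For a saved file, read it with open("input", "rb").read() and write the returned bytes to a new file opened with "wb".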

This may be a crazy idea, but I would try stripping the headers with procmail.

Look at the MIME::Tools suite for Perl. It has a rich set of classes; I'm sure you could put something together in just a few lines.

This probably contains some typos or something, but bear with me anyway. First determine the boundary (input is the file containing the data - pipe if necessary):

boundary=$(grep -a '^Content-Type: multipart/form-data; boundary=' input | sed 's/.*boundary=//' | tr -d '\r')

Then filter the Filedata part:

fd='Content-Disposition: form-data; name="Filedata"'
sed -n "/$fd/,/$boundary/p"

The last step is to filter out a few extra lines: the header lines up to and including the empty line, and the boundary itself. So change the last line of the previous step to:

sed -n "/$fd/,/$boundary/p" | sed '1,/^\r*$/d' | sed '$d'

sed -n "/$fd/,/$boundary/p" prints the lines between the Filedata header and the boundary (inclusive),
sed '1,/^\r*$/d' deletes everything up to and including the first empty line (so it removes the headers; the \r*, a GNU sed escape, allows for the carriage return that ends each line in this file) and
sed '$d' removes the last line (the boundary).

After this, you wait for Dennis (see comments) to optimize it and you get this:

sed "1,/$fd/d;/^$/d;/$boundary/,$d"

Now that you've come this far, scratch all of this and do what Ignacio suggested. The reason: this probably won't work reliably here, because GIF is binary data.
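For comparison, the same three steps (find the boundary, locate the Filedata part, drop its headers and the trailing boundary) can be sketched on raw bytes in Python, which sidesteps the line-oriented handling that makes sed unreliable here. The function name filedata_from_request and the sample request are made up for illustration, and the sketch assumes an unquoted boundary and CR LF line endings, as in the question.

```python
import re


def filedata_from_request(raw: bytes, field: bytes = b"Filedata") -> bytes:
    """Extract one form-data part from a saved request, byte for byte."""
    # Step 1: read the boundary from the request's Content-Type header.
    match = re.search(
        rb"^Content-Type: multipart/form-data; boundary=(.+?)\r?$",
        raw,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no multipart boundary found")
    # Each delimiter line is CR LF, two dashes, then the boundary.
    delimiter = b"\r\n--" + match.group(1)

    # Step 2: split on the delimiter and look for the wanted part.
    for part in raw.split(delimiter)[1:]:
        headers, sep, body = part.partition(b"\r\n\r\n")
        if sep and b'name="' + field + b'"' in headers:
            # Step 3: headers and delimiter are already gone at this point.
            return body
    raise ValueError("part not found")


# Hypothetical request; the payload deliberately contains CR LF sequences.
BOUNDARY = b"----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3"
payload = b"GIF89a\x00\x01\r\n\r\n\xff\xfe"
sample = b"\r\n".join([
    b"POST /?file_name=logo.gif HTTP/1.1",
    b"Content-Type: multipart/form-data; boundary=" + BOUNDARY,
    b"",
    b"--" + BOUNDARY,
    b'Content-Disposition: form-data; name="Filedata"; filename="logo.gif"',
    b"Content-Type: application/octet-stream",
    b"",
    payload,
    b"--" + BOUNDARY + b"--",
    b"",
])
result = filedata_from_request(sample)
```

Matching name="Filedata" as a plain substring is a shortcut; a proper MIME parser is the safer choice for anything beyond a quick script.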

Ah, it was a good exercise! Anyway, for the lovers of sed, here's the excellent page:

http://sed.sourceforge.net/sed1line.txt

Outstanding information.
