How to extract file data from an HTTP MIME-encoded message in Linux?
I have a program that accepts HTTP POST uploads of files and writes the entire POST request into a file. I want to write a script that deletes the HTTP headers and leaves only the binary file data. How can I do it?
The file content is below. The data between the Content-Type: application/octet-stream line and the ------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3 line is what I want:
POST /?user_name=vvvvvvvv&size=837&file_name=logo.gif& HTTP/1.1^M
Accept: text/*^M
Content-Type: multipart/form-data; boundary=----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
User-Agent: Shockwave Flash^M
Host: 192.168.0.198:9998^M
Content-Length: 1251^M
Connection: Keep-Alive^M
Cache-Control: no-cache^M
Cookie: cb_fullname=ddddddd; cb_user_name=cdc^M
^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filename"^M
^M
logo.gif^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filedata"; filename="logo.gif"^M
Content-Type: application/octet-stream^M
^M
GIF89an^@I^^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Upload"^M
^M
Submit Query^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3-
Do you want to do this while the file is being transferred, or after it has arrived?
Almost any scripting language should work. My AWK is a bit rusty, but...
awk '/^Content-Type: application\/octet-stream/,/^--------/'
That should print everything between the application/octet-stream and the ---------- lines. An awk range pattern is inclusive, though, so it also prints both of those lines, which means you'll have to do something a bit more complex:
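BEGIN { state = 0 }
{
    # A boundary line ends the file data.
    if ($0 ~ /^------------/) {
        state = 0
    }
    # State 2: inside the file data, so copy the line.
    if (state == 2) {
        print $0
    }
    # State 1: skip the part headers up to the blank (CR-only) line.
    if (state == 1 && $0 ~ /^\r?$/) {
        state = 2
    }
    if ($0 ~ /^Content-Type: application\/octet-stream/) {
        state = 1
    }
}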
The application/octet-stream test comes after the print statement because you want to set state to 1 only after you have seen the application/octet-stream line.
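The same state machine can also be run over the raw bytes instead of text lines, so the CRLF that precedes the boundary is not copied into the output and line breaks inside the binary data are restored exactly. A minimal Python sketch, assuming CRLF line endings, a single application/octet-stream part, and that only boundary lines start with twelve hyphens (as in the sample); the function name extract_octet_stream is hypothetical:

```python
def extract_octet_stream(raw: bytes) -> bytes:
    """Return the data of the application/octet-stream part in `raw`."""
    out = []
    state = 0  # 0: outside the file part, 1: in its headers, 2: in its data
    # split() removes every CRLF; join() below puts back the ones that sat
    # inside the file data, so embedded line breaks survive exactly.
    for line in raw.split(b"\r\n"):
        if line.startswith(b"------------"):  # a boundary ends the data
            state = 0
        if state == 2:
            out.append(line)
        if state == 1 and line == b"":  # blank line ends the part headers
            state = 2
        if line.startswith(b"Content-Type: application/octet-stream"):
            state = 1
    # The CRLF just before the boundary belongs to the boundary, so the
    # join adds no trailing CRLF to the file data.
    return b"\r\n".join(out)
```

Read the capture with open(path, "rb") and write the result with open(out_path, "wb") so no newline translation happens on either side.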
Of course, this being Unix, you could pipe your program's output through awk and then save the result to a file.
If you use Python, email.parser.Parser will let you parse a multipart MIME document.
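For example, a minimal sketch with the standard library's email package. It uses BytesParser, the bytes counterpart of email.parser.Parser, so the GIF's octets survive parsing. It assumes the capture starts with the HTTP request line followed by CRLF-terminated headers (which then parse like MIME headers), and the function name extract_form_field is hypothetical:

```python
from email.message import Message
from email.parser import BytesParser


def extract_form_field(raw: bytes, field: str = "Filedata") -> bytes:
    """Return the raw bytes of the form-data part named `field`.

    `raw` is the whole captured request: request line, HTTP headers, body.
    """
    # The request line ("POST /... HTTP/1.1") is not a header, so drop it;
    # the remaining header block plus body parses as a MIME message.
    _request_line, _, rest = raw.partition(b"\r\n")
    msg: Message = BytesParser().parsebytes(rest)
    if not msg.is_multipart():
        raise ValueError("request body is not multipart")
    for part in msg.get_payload():
        # Form field names live in the Content-Disposition header.
        if part.get_param("name", header="content-disposition") == field:
            # decode=True returns bytes; with no Content-Transfer-Encoding
            # the original octets come back unchanged.
            return part.get_payload(decode=True)
    raise KeyError(field)
```

Pass it the bytes of the saved request (opened in "rb" mode) and write the result to a file opened in "wb" mode. Because the parser locates the boundary from the Content-Type header itself, nothing about the boundary string has to be hard-coded.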
This may be a crazy idea, but I would try stripping the headers with procmail.
Look at the MIME::Tools suite for Perl. It has a rich set of classes; I'm sure you could put something together in just a few lines.
This probably contains a few typos, but bear with me anyway. First, determine the boundary (input is the file containing the data; pipe it in if necessary):
boundary=`grep '^Content-Type: multipart/form-data; boundary=' input | sed 's/.*boundary=//' | tr -d '\r'`
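In Python, the same boundary lookup can be done on the raw bytes; splitting the header block on CRLF keeps the carriage return out of the boundary value. A sketch, assuming CRLF-terminated headers and no semicolon inside a quoted boundary; find_boundary is a hypothetical name:

```python
def find_boundary(raw: bytes) -> bytes:
    """Return the multipart boundary declared in the request's Content-Type."""
    # Headers end at the first empty line; only that block is searched.
    headers, _, _body = raw.partition(b"\r\n\r\n")
    for line in headers.split(b"\r\n"):
        name, sep, value = line.partition(b":")
        if not sep or name.strip().lower() != b"content-type":
            continue
        # Parameters follow the media type, separated by semicolons.
        for param in value.split(b";")[1:]:
            key, _, val = param.strip().partition(b"=")
            if key.strip().lower() == b"boundary":
                return val.strip().strip(b'"')
    raise ValueError("no multipart boundary in the headers")
```

The returned value is the boundary as declared; each delimiter line in the body is "--" followed by it, which is why the body lines in the sample carry two more hyphens than the header.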
Then filter the Filedata part:
fd='Content-Disposition: form-data; name="Filedata"'
sed -n "/$fd/,/$boundary/p"
The last step is to filter out a few extra lines: the part headers up to and including the empty line, and the boundary itself. So change the previous command to:
sed -n "/$fd/,/$boundary/p" | sed '1,/^\r$/d' | sed '$d'
sed -n "/$fd/,/$boundary/p" filters the lines between the Filedata header and the boundary (inclusive), sed '1,/^\r$/d' deletes everything from the first line up to and including the first empty line (so it removes the part headers), and sed '$d' removes the last line (the boundary). Because the request uses CRLF line endings, the "empty" line actually holds a single carriage return, which is why the pattern is ^\r$ rather than ^$ (\r in a regex is a GNU sed extension).
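The same filtering (find the Filedata part, drop its headers up to the blank line, stop before the boundary) can be done on bytes rather than lines, which keeps the binary data intact. Per RFC 2046 the CRLF before each boundary line belongs to the delimiter, so splitting on CRLF + "--" + boundary leaves exactly the file bytes. A sketch, assuming body is everything after the blank line that ends the HTTP headers and boundary is the value from the Content-Type header (without the leading "--"); extract_part is a hypothetical name:

```python
def extract_part(body: bytes, boundary: bytes, field: bytes = b"Filedata") -> bytes:
    """Return the data of the form part whose Content-Disposition names `field`."""
    delimiter = b"\r\n--" + boundary
    # Prefix a CRLF so the very first delimiter line also matches.
    parts = (b"\r\n" + body).split(delimiter)
    needle = b'; name="' + field + b'"'
    # parts[0] is the (empty) preamble; the last piece is the "--" epilogue.
    for part in parts[1:]:
        # Each part: rest of the delimiter line, headers, blank line, data.
        head, sep, data = part.partition(b"\r\n\r\n")
        if sep and needle in head:
            return data
    raise KeyError(field)
```

Since partition stops at the first blank line, a CRLF CRLF sequence inside the image data is returned untouched, unlike with the line-based sed filters.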
After this, you wait for Dennis (see the comments) to optimize it, and you get this:
sed "1,/$fd/d;/^$/d;/$boundary/,\$d"
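Now that you've read this far, scratch all of it and do what Ignacio suggested. The reason: this probably won't work reliably here, because GIF is binary data and sed processes its input as lines of text, so any newline bytes inside the image are treated as line breaks.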
Ah, it was a good exercise! Anyway, for lovers of sed, here's an excellent page:
- http://sed.sourceforge.net/sed1line.txt
Outstanding information.