Using Python sockets to receive large HTTP requests

2023-01-20 23:18 Q&A author:

I am using Python sockets to receive web-style and SOAP requests. The code I have is:

import socket
svrsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = socket.gethostname()
svrsocket.bind((host,8091))
svrsocket.listen(1)
clientSocket, clientAddress = svrsocket.accept()
message = clientSocket.recv(4096)

Some of the SOAP requests I receive, however, are huge: 650 KB, and this could grow to several MB. Instead of the single recv I tried

message = b''  # recv returns bytes, so accumulate bytes
while True:
  data = clientSocket.recv(4096)
  if len(data) == 0:
    break
  message = message + data

but I never receive a 0-byte data chunk with Firefox or Safari, although the Python Socket HOWTO says I should.

What can I do to get round this?

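Unfortunately you can't solve this on the TCP level: HTTP defines its own connection management, see RFC 2616. Browsers keep the connection open after sending a request, so recv never returns an empty chunk. This basically means you need to parse the stream (at least the headers) to figure out where a message ends and when the connection could be closed.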

See related questions here - https://stackoverflow.com/search?q=http+connection

Hiya

Firstly, I want to reinforce what the previous answer said:

Unfortunately you can't solve this on the TCP level

Which is true, you can't. However, you can implement an HTTP parser on top of your TCP sockets, and that's what I want to explore here. Let's get started.

Problem and Desired Outcome

Right now we are struggling to find the end of a data stream. We expected our stream to end with a fixed ending, but now we know that HTTP does not define any message suffix.

And yet, we move forward.

There is one question we can now ask: "Can we ever know the length of the message in advance?" And the answer to that is YES! Sometimes...

You see, HTTP/1.1 defines a header called Content-Length, and as you'd expect it has exactly what we want: the length of the body in bytes. But there is something else in the shadows: Transfer-Encoding: chunked. Unless you really want to learn about it, we'll stay away from it for now.
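To make the "Sometimes..." a bit more concrete, here is a rough sketch of how you could check which case you're in once the headers are parsed. The function name body_framing and the plain dict of header strings are my own assumptions for illustration, not something from a library. It only detects chunked encoding, it doesn't decode it.

```python
def body_framing(headers):
    """Return ('chunked', None), ('length', n) or ('none', None).

    `headers` is assumed to be a dict of header name -> value strings.
    """
    # header names are case-insensitive, so normalise them before the lookup
    lowered = {name.strip().lower(): value.strip()
               for name, value in headers.items()}

    # when both headers are present, Transfer-Encoding wins over Content-Length
    encodings = [e.strip().lower()
                 for e in lowered.get('transfer-encoding', '').split(',')]
    if 'chunked' in encodings:
        return ('chunked', None)

    if 'content-length' in lowered:
        return ('length', int(lowered['content-length']))

    # for a request, neither header means there is no body at all
    return ('none', None)


print(body_framing({'Content-Length': '650000'}))
print(body_framing({'Transfer-Encoding': 'chunked'}))
```

Everything below only deals with the 'length' case.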

Solution

Here is a solution. You're not gonna know what some of these functions are at first, but if you stick with me, I'll explain. Alright... Take a deep breath.

Assuming conn is the connected socket we're reading the HTTP message from (clientSocket in the question):

...

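    rawheaders = recvheaders(conn)
    status, headers = dict_header(io.StringIO(rawheaders))
    # header values are strings, so convert before doing arithmetic
    l_content = int(headers['Content-Length'])

    # okay. we've got content length by magic

    message = b''
    buffersize = 4096
    while True:
        if l_content <= 0: break

        # never ask for more than what is left of this message
        data = conn.recv(min(buffersize, l_content))
        if not data: break  # peer closed the connection early
        message += data

        l_content -= len(data)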

...

As you can see, we enter the loop already knowing the Content-Length as l_content.

While we iterate, we keep track of the remaining content by subtracting the length of each chunk returned by recv from l_content.

When we've read at least as much data as Content-Length promised, we are done:

if l_content <= 0: break
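If you want to try the counting loop on its own, here is a small self-contained sketch of it. The name read_body and its parameters are made up for this example, and socket.socketpair() just stands in for a real browser connection so the thing runs without a network.

```python
import socket


def read_body(sock, content_length, already_read=b'', bufsize=4096):
    """Read until content_length body bytes have been collected.

    `already_read` holds body bytes that arrived together with the headers
    (assumed to be no longer than content_length).
    """
    chunks = [already_read]
    remaining = content_length - len(already_read)
    while remaining > 0:
        # never request more than what is left of this message
        data = sock.recv(min(bufsize, remaining))
        if not data:
            raise ConnectionError(
                'connection closed with %d bytes missing' % remaining)
        chunks.append(data)
        remaining -= len(data)
    return b''.join(chunks)


# demo: push a payload through a local socket pair in 1024-byte reads
client, server = socket.socketpair()
payload = b'x' * 5000
client.sendall(payload)
body = read_body(server, len(payload), bufsize=1024)
print(len(body) == len(payload))
client.close()
server.close()
```

Raising on an early close is a design choice here. Returning the partial body would also be possible, but then the caller can't tell a short message from a broken one.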

Frustration

Note: for some of these next bits I'm gonna keep the code simplified, because the real thing can be a bit dense.

So now you're asking, what is recvheaders(conn),
what is dict_header(io.StringIO(rawheaders)),
and HOW did we get headers['Content-Length']?!

For starters, recvheaders. The HTTP/1.1 spec doesn't define a message suffix, but it does define something useful: a terminator for the HTTP headers! Every header line ends with CRLF, aka \r\n, and the header section as a whole ends with an empty line. That means we know we've received all the headers once we read CRLF twice in a row (\r\n\r\n). So we can write a function like

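def recvheaders(sock):
    # read one byte at a time so we never consume any of the body (simple, but slow)
    rawheaders = b''
    while not rawheaders.endswith(b'\r\n\r\n'):
        byte = sock.recv(1)
        if not byte:
            break  # connection closed before the headers ended
        rawheaders += byte
    # ISO-8859-1 maps every byte to a character, so decoding cannot fail
    return rawheaders.decode('iso-8859-1')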
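A buffered variant of the idea above could look like the sketch below. It reads in bigger chunks, which means it will usually swallow the start of the body too, so it hands those extra bytes back to the caller. The name recv_headers_buffered, the max_size limit and the demo request are all assumptions of mine for illustration.

```python
import socket

HEADER_END = b'\r\n\r\n'  # blank line that closes the header section


def recv_headers_buffered(sock, bufsize=4096, max_size=65536):
    """Return (header_text, leftover) where leftover is the start of the body."""
    buffer = b''
    while HEADER_END not in buffer:
        if len(buffer) > max_size:
            raise ValueError('header section too large')
        data = sock.recv(bufsize)
        if not data:
            raise ConnectionError('connection closed before the headers ended')
        buffer += data
    # everything before the blank line is headers, everything after is body
    head, _, leftover = buffer.partition(HEADER_END)
    return head.decode('iso-8859-1'), leftover


# demo with a local socket pair standing in for the browser
client, server = socket.socketpair()
client.sendall(b'POST /soap HTTP/1.1\r\nContent-Length: 5\r\n\r\nhello')
head, leftover = recv_headers_buffered(server)
print(head.splitlines()[0])
print(leftover)
client.close()
server.close()
```

Whatever comes back as leftover has to be counted against Content-Length before you read the rest of the body.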

Next, parsing the headers.

import io

def dict_header(ioheaders: io.StringIO):
    """
    parses the head of an http message into the start line and headers
    """
    # here I expect ioheaders to be io.StringIO
    # the start line (request line or status line) is always the first line
    status = ioheaders.readline().strip()
    headers = {}
    for line in ioheaders:
        item = line.strip()
        if not item:
            break
        # headers look like this
        # 'Header-Name' : 'Value'
        item = item.split(':', 1)
        if len(item) == 2:
            key, value = item
            # drop the optional whitespace around the name and value
            headers[key.strip()] = value.strip()
    return status, headers

Here we read the status line then we continue to iterate over every remaining line and build [key,value] pairs from Header: Value with

    item = line.strip()
    item = item.split(':', 1)
    # We do split(':',1) to avoid cases like
    # 'Header' : 'foo:bar' -> ['Header','foo','bar']
    # when we want ---------> ['Header','foo:bar']

then we take that list and add it to the headers dict

    #unpacking
    #key = item[0], value = item[1]
    key, value = item
    headers[key.strip()] = value.strip()

BAM, we've created a map of headers

From there headers['Content-Length'] falls right out.
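One catch: header names are case-insensitive, so a client is free to send content-length instead of Content-Length, and a plain dict lookup would miss it. A small helper like this sketch gets around that. The name get_header is hypothetical, and it assumes headers is a dict like the one built above.

```python
def get_header(headers, name, default=None):
    """Look up a header without caring how its name is capitalised."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.strip().lower() == wanted:
            return value.strip()
    return default


headers = {'content-length': ' 42', 'Host': 'localhost:8091'}
print(int(get_header(headers, 'Content-Length')))
```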

So,

This structure will work as long as you can guarantee that you will always receive Content-Length. If you've made it this far, WOW, thanks for taking the time, and I hope this helped you out!

TL;DR: if you want to know the length of an HTTP message with sockets, write an HTTP parser.
