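Tracking down why curl cannot send a header with an empty value, and how to fix it

I ran into an odd problem this afternoon. Let's start with a curl example. Run these two commands: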

curl "www.baidu.com" -H"XXXX:" -v >/dev/null
curl "www.baidu.com" -H"XXXX:1" -v >/dev/null

Here is the result of the first command:

[root@vm12080024 ~]# curl "www.baidu.com" -H"XXXX:" -v >/dev/null
* About to connect() to www.baidu.com port 80
*   Trying 61.135.169.125... connected
* Connected to www.baidu.com (61.135.169.125) port 80
> GET / HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: www.baidu.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 25 Apr 2013 09:10:19 GMT
< Server: BWS/1.0
< Content-Length: 10502
< Content-Type: text/html;charset=utf-8
< Cache-Control: private
< Set-Cookie: BDSVRTM=17; path=/
< Set-Cookie: H_PS_PSSID=2240_2198_1463_1945_2201_1788_2250_2260_2287; path=/; domain=.baidu.com
< Set-Cookie: BAIDUID=D7136CF3FDF7D815ED0F017710590E45:FG=1; expires=Thu, 25-Apr-43 09:10:19 GMT; path=/; domain=.baidu.com
< Expires: Thu, 25 Apr 2013 09:10:19 GMT
< P3P: CP=" OTI DSP COR IVA OUR IND COM "
< Connection: Keep-Alive
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10502  100 10502    0     0   237k      0 --:--:-- --:--:-- --:--:-- 4424kConnection #0 to host www.baidu.com left intact

* Closing connection #0

And, for comparison, the second:

[root@vm12080024 ~]# curl "www.baidu.com" -H"XXXX:1" -v >/dev/null
* About to connect() to www.baidu.com port 80
*   Trying 61.135.169.125... connected
* Connected to www.baidu.com (61.135.169.125) port 80
> GET / HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: www.baidu.com
> Accept: */*
> XXXX:1
>
< HTTP/1.1 200 OK
< Date: Thu, 25 Apr 2013 09:10:22 GMT
< Server: BWS/1.0
< Content-Length: 10492
< Content-Type: text/html;charset=utf-8
< Cache-Control: private
< Set-Cookie: BDSVRTM=19; path=/
< Set-Cookie: H_PS_PSSID=2240_2299_1444_2132_1945_1788_2250_2254; path=/; domain=.baidu.com
< Set-Cookie: BAIDUID=C4F1B495285429AD74DBDBB39F76E704:FG=1; expires=Thu, 25-Apr-43 09:10:22 GMT; path=/; domain=.baidu.com
< Expires: Thu, 25 Apr 2013 09:10:22 GMT
< P3P: CP=" OTI DSP COR IVA OUR IND COM "
< Connection: Keep-Alive
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10492  100 10492    0     0   242k      0 --:--:-- --:--:-- --:--:-- 4419kConnection #0 to host www.baidu.com left intact

* Closing connection #0
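
To compare the two runs without reading the whole transcript, the request lines can be filtered out of the verbose output. This is only a minimal sketch, assuming bash and a POSIX sed; the function name sent_headers is made up for illustration and is not part of curl. curl writes the verbose trace to stderr, so it has to be redirected into the pipe.

```shell
#!/usr/bin/env bash
# sent_headers: hypothetical helper (not part of curl).
# Reads `curl -v` output on stdin and prints only the request lines,
# i.e. the lines that curl marks with a leading ">".
sent_headers() {
  # Keep lines starting with ">", strip the marker plus one optional space,
  # then drop any carriage returns.
  sed -n 's/^> \{0,1\}//p' | tr -d '\r'
}

# Usage (needs network access):
#   curl -sv "www.baidu.com" -H "XXXX:"  -o /dev/null 2>&1 | sent_headers
#   curl -sv "www.baidu.com" -H "XXXX:1" -o /dev/null 2>&1 | sent_headers
```

Running both commands and diffing the two outputs should leave only the custom header line as the difference.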

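Clearly, a header whose value is empty gets dropped somewhere in the HTTP exchange. So where is it dropped: on the client side or on the server side?
Let's first look at the RFC: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2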

> Empty HTTP headers are imho legal according to RFC. They are used in certain
> environments to indicate special things to back-end services.

The field-content does not include any leading or trailing LWS: linear white space occurring before the first non-whitespace character of the field-value or after the last non-whitespace character of the field-value. Such leading or trailing LWS MAY be removed without changing the semantics of the field value. Any LWS that occurs between field-content MAY be replaced with a single SP before interpreting the field value or forwarding the message downstream.

IMHO, sending an empty header is completely pointless. It shouldn’t be done, and parsers may not correctly parse these headers. Traditionally, people who want to circumvent such limitations when dealing with non-compliant components have specified “pseudo-empty” values like this:

XXX: ""

If you simply want to validate that a header field was sent as some form of boolean switch, consider sending a placeholder value like the above instead of an empty value.
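
Judging from the documents above, [ field-value ] is optional, so an empty header is allowed as far as the RFC is concerned. The drop must therefore happen on the client side: libcurl is throwing the empty header away. Let's dig into the part of libcurl that handles custom headers: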

CURLcode Curl_add_custom_headers(struct connectdata *conn,Curl_send_buffer *req_buffer)
{
  char *ptr;
  struct curl_slist *headers=conn->data->set.headers;

  while(headers) {
    ptr = strchr(headers->data, ':');
    if(ptr) {
      /* we require a colon for this to be a true header */

      ptr++; /* pass the colon */
      while(*ptr && ISSPACE(*ptr))
        ptr++;

      if(*ptr) {
        /* only send this if the contents was non-blank */

        if(conn->allocptr.host &&
           /* a Host: header was sent already, don't pass on any custom Host:
              header as that will produce *two* in the same request! */

           checkprefix("Host:", headers->data))
          ;
        else if(conn->data->set.httpreq == HTTPREQ_POST_FORM &&
                /* this header (extended by formdata.c) is sent later */
                checkprefix("Content-Type:", headers->data))
          ;
        else if(conn->bits.authneg &&
                /* while doing auth neg, don't allow the custom length since
                   we will force length zero then */

                checkprefix("Content-Length", headers->data))
          ;
        else if(conn->allocptr.te &&
                /* when asking for Transfer-Encoding, don't pass on a custom
                   Connection: */

                checkprefix("Connection", headers->data))
          ;
        else {
          CURLcode result = Curl_add_bufferf(req_buffer, "%s\r\n",
                                             headers->data);
          if(result)
            return result;
        }
      }
    }
    else {
      ptr = strchr(headers->data, ';');
      if(ptr) {

        ptr++; /* pass the semicolon */
        while(*ptr && ISSPACE(*ptr))
          ptr++;

        if(*ptr) {
          /* this may be used for something else in the future */
        }
        else {
          if(*(--ptr) == ';') {
            CURLcode result;

            /* send no-value custom header if terminated by semicolon */
            *ptr = ':';
            result = Curl_add_bufferf(req_buffer, "%s\r\n",
                                             headers->data);
            if(result)
              return result;
          }
        }
      }
    }
    headers = headers->next;
  }
  return CURLE_OK;
}

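Clearly, because of

while(*ptr && ISSPACE(*ptr))
        ptr++;

and the if(*ptr) check right after it, headers like xxxx: whose value is empty or only whitespace are skipped entirely. The code also shows another path: when the header contains no colon and ends with a semicolon, the else branch is taken, the semicolon is rewritten to a colon, and an empty HTTP header is sent. This was fixed in 7.23.0 (November 15 2011):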

Empty headers can be sent in HTTP requests by terminating with a semicolon
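
The branching in Curl_add_custom_headers above can be mimicked in a few lines of shell, which makes it easy to predict what happens to a given -H argument. This is a rough sketch under stated assumptions, not libcurl's actual behaviour in every case: the function name classify_header is hypothetical, bash is assumed, and the special cases for Host:, Content-Type:, Content-Length and Connection are left out.

```shell
#!/usr/bin/env bash
# classify_header: hypothetical helper that imitates the colon/semicolon
# branching of the libcurl function quoted above (special cases omitted).
# Prints "sent <header line>" or "dropped".
classify_header() {
  local h=$1 rest
  if [[ $h == *:* ]]; then
    rest=${h#*:}                              # text after the first colon
    rest=${rest#"${rest%%[![:space:]]*}"}     # skip leading whitespace
    if [[ -n $rest ]]; then
      printf 'sent %s\n' "$h"                 # non-blank value: passed on as is
    else
      printf 'dropped\n'                      # blank value: silently skipped
    fi
  elif [[ $h == *\;* ]]; then
    rest=${h#*";"}                            # text after the first semicolon
    rest=${rest#"${rest%%[![:space:]]*}"}
    if [[ -z $rest && $h == *\; ]]; then
      printf 'sent %s:\n' "${h%;}"            # trailing ";" is rewritten to ":"
    else
      printf 'dropped\n'
    fi
  else
    printf 'dropped\n'                        # neither colon nor semicolon
  fi
}

# Usage:
#   classify_header 'XXXX:'     # blank value
#   classify_header 'xxxxx;'    # semicolon form
```

Note that in this sketch, as in the quoted source, the semicolon has to be the very last character: a trailing space after it means the header is not sent.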

Now let's upgrade curl by hand and test it:

wget "http://curl.haxx.se/download/curl-7.30.0.tar.gz"
tar -zxvf curl-7.30.0.tar.gz
cd curl-7.30.0
./configure --prefix=/usr/local/curl && make && make install
cp  /usr/local/curl/bin/curl  /usr/bin/

Then check the curl version again:

curl -V

This shows:

[root@mingming-dev curl-7.30.0]# curl -V
curl 7.30.0 (x86_64-unknown-linux-gnu) libcurl/7.30.0 OpenSSL/1.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: IDN IPv6 Largefile NTLM NTLM_WB SSL libz
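
Since the semicolon syntax depends on libcurl 7.23.0 or later, a script can check the version before relying on it. The following is a small sketch, assuming bash; version_ge and libcurl_version_of are hypothetical helper names, and the parsing assumes the first line of curl -V has the "libcurl/X.Y.Z" form shown above with purely numeric fields.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when dotted version A >= B.
# Hypothetical helper; compares the first three numeric fields only.
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for i in 0 1 2; do
    x=${a[i]:-0}; y=${b[i]:-0}
    if (( 10#$x > 10#$y )); then return 0; fi
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 0
}

# libcurl_version_of: reads `curl -V` output on stdin and prints the
# libcurl version found on its first line.
libcurl_version_of() {
  sed -n '1s/.* libcurl\/\([0-9.]*\).*/\1/p'
}

# Usage:
#   if version_ge "$(curl -V | libcurl_version_of)" 7.23.0; then
#     echo "semicolon-terminated empty headers should be available"
#   fi
```

The check looks at the libcurl field rather than the curl tool version, because the header handling lives in the library.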

Now send the header terminated with a semicolon and see what happens:

curl "www.baidu.com" -H"xxxxx;" -v >/dev/null
[root@vm12080024 logs]# curl "www.baidu.com" -H"xxxxx;" -v >/dev/null
* About to connect() to www.baidu.com port 80 (#0)
*   Trying 61.135.169.105...
* Adding handle: conn: 0x25a9cb0
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x25a9cb0) send_pipe: 1, recv_pipe: 0
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to www.baidu.com (61.135.169.105) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.30.0
> Host: www.baidu.com
> Accept: */*
> xxxxx:
>
< HTTP/1.1 200 OK
< Date: Fri, 26 Apr 2013 01:47:37 GMT
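... (rest omitted)

So after upgrading, an empty header can be sent directly by terminating it with a semicolon :)

Going one step further: using this from PHP

Note: PHP's curl extension has to be rebuilt at this point, and libcurl must be upgraded first.

I won't go through rebuilding the curl extension here: get the source, recompile it, and overwrite the old module. Then run a test with test.php: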

<?php
// Report errors, warnings, parse errors and notices.
error_reporting(E_ERROR | E_WARNING | E_PARSE | E_NOTICE);
$url = 'http://skirt.sinaapp.com';
$method = 'POST';
// The first header ends with a semicolon, so libcurl >= 7.23.0 sends it with an empty value.
$header = array('X-SWS-Container-Meta-Expires-Rule;','X-SWS-Container-Meta-Expires-Rulesssss:222222222222222222222222');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// Value 1 is no longer accepted by libcurl 7.28.1 and later; 2 checks the certificate host name.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Record the outgoing request so curl_getinfo() returns it as "request_header".
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
$resp = curl_exec($ch);
$resp_info = curl_getinfo($ch);
curl_close($ch);
var_dump($header);
echo "=========================================================================\n";
var_dump($resp_info);

In theory the request should now contain an X-SWS-Container-Meta-Expires-Rule header with no value, and that is indeed what happens. Here is the test result:

[root@mingming-dev ~]# php test.php
array(2) {
  [0]=>
  string(34) "X-SWS-Container-Meta-Expires-Rule;"
  [1]=>
  string(63) "X-SWS-Container-Meta-Expires-Rulesssss:222222222222222222222222"
}
=========================================================================
array(22) {
  ["url"]=>
  string(25) "http://skirt.sinaapp.com/"
  ["content_type"]=>
  string(24) "text/html; charset=UTF-8"
  ["http_code"]=>
  int(200)
  ...........
  (middle part omitted for brevity)
  ...........
  ["request_header"]=>
  string(158) "POST / HTTP/1.1
Host: skirt.sinaapp.com
Accept: */*
X-SWS-Container-Meta-Expires-Rule:
X-SWS-Container-Meta-Expires-Rulesssss:222222222222222222222222

"
}

Happily, the request now contains X-SWS-Container-Meta-Expires-Rule: with no value. I'm writing this up in the hope that it helps anyone who runs into the same problem later.