My first HTTP server, soldat-http

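One day in the winter of 2008 I was chatting online with someone I did not know. He said he used to be interested in frameworks and the like, but that had changed, and his goal at the time was to write an HTTP server. I could agree with his view, but it did not particularly resonate with me. Come this spring, the project kicked off. One weekend our lead went on leave back to Hangzhou, leaving the rest of us staring at each other with integration testing to do. When problems came up I found I understood nothing, and I joked that I had to understand the lead's NIO framework before changing jobs. Unfortunately I left before I had really read it; luckily the lead open-sourced it, so I can still sneak a look now. I also vaguely remember setting myself a similar goal in the winter of 2009, something like writing an HTTP server within a year.

So I say that writing your own programs is driven by sentiment: just like those people who keep insisting they will never write another WordPress plugin, yet new versions keep getting released.

Well, in the end this goal was not really achieved. The current version merely works. I am hurrying it out the door as 2010 is about to end, to console myself: at least I did not come away from this year empty-handed.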
[Screenshot, 2010-12-28, 09:17:59 PM]

Next I will keep improving it and the underlying event-driven framework, and I also intend to complete a WSGI implementation for Jython.

Bayeux Protocol

Running a CometD demo is very simple: just create a Maven project (see CometD Howtos):
$ mvn archetype:generate -DarchetypeCatalog=http://cometd.org

Maven prompts you to choose an archetype. The choices cover CometD versions 1 and 2, Jetty 6 and Jetty 7 implementations, and a Dojo or jQuery client. Here you can pick the latest:
http://cometd.org -> cometd-archetype-dojo-jetty7 (2.0.0 – CometD archetype for creating a server-side event-driven web application)

Once the project is created, run mvn jetty:run and open http://127.0.0.1:8080/{artifactId}.

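CometD's protocol accommodates all the major browsers. On Chromium 5, for example, the Dojo client uses WebSocket, while on Firefox 3, which does not support WebSocket, it falls back to long-polling. Bayeux is an application protocol and CometD is an implementation of Bayeux, a relationship somewhat like the chicken and the egg.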

Having looked at the WebSocket protocol on Chromium yesterday, let's start with CometD's WebSocket implementation.
Handshake. The client requests /{artifactId}/cometd/handshake
with these headers:

GET /cometd-jetty/cometd/handshake HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: 127.0.0.1:8080
Origin: http://127.0.0.1:8080
Cookie: JSESSIONID=12jqq6hbsfkfic8vzqpevxtrw

This is the standard WebSocket handshake. The server returns:

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://127.0.0.1:8080
WebSocket-Location: ws://127.0.0.1:8080/cometd-jetty/cometd/handshake

The two sides have now established the WebSocket connection. The client sends JSON over the WebSocket to perform the Bayeux handshake:

[{"version":"1.0","minimumVersion":"0.9","channel":"/meta/handshake","supportedConnectionTypes":["websocket","long-polling","callback-polling"],"advice":{"timeout":60000,"interval":0},"id":"1"}]

The server returns JSON that issues a clientId, completing the handshake:

[{"channel":"/meta/handshake","clientId":"8g6dbnlqr2k6jfo1tdpaeb7iw","version":"1.0","successful":true,"minimumVersion":"1.0","id":"1","supportedConnectionTypes":["websocket","long-polling","callback-polling"]}]

The handshake is complete and the Bayeux connection is established.
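
To make the shape of this exchange concrete, here is a minimal sketch in plain JavaScript that builds a handshake message like the one captured above and pulls the clientId out of the reply. The function names buildHandshake and readHandshakeReply are made up for illustration and are not part of the CometD client API; the field values are copied from the capture.

```javascript
// Build a Bayeux /meta/handshake batch like the one captured above.
// buildHandshake is a hypothetical helper, not a CometD API.
function buildHandshake(nextId) {
  return [{
    version: "1.0",
    minimumVersion: "0.9",
    channel: "/meta/handshake",
    supportedConnectionTypes: ["websocket", "long-polling", "callback-polling"],
    advice: { timeout: 60000, interval: 0 },
    id: String(nextId)
  }];
}

// Return the clientId from a handshake reply, or throw if the reply
// is missing or reports failure. Also a hypothetical helper.
function readHandshakeReply(json, expectedId) {
  const replies = JSON.parse(json);
  const reply = replies.find(
    m => m.channel === "/meta/handshake" && m.id === expectedId
  );
  if (!reply || reply.successful !== true) {
    throw new Error("Bayeux handshake failed");
  }
  return reply.clientId;
}

// The server reply captured in the post.
const handshakeReply = '[{"channel":"/meta/handshake","clientId":"8g6dbnlqr2k6jfo1tdpaeb7iw","version":"1.0","successful":true,"minimumVersion":"1.0","id":"1","supportedConnectionTypes":["websocket","long-polling","callback-polling"]}]';

console.log(JSON.stringify(buildHandshake(1)));
console.log(readHandshakeReply(handshakeReply, "1"));
```

Every later message in the captures carries the clientId returned here, which is why the sketch treats it as the one value worth extracting.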

In the demo, the client registers a handshake listener:

    function _metaHandshake(handshake)
    {
        if (handshake.successful === true)
        {
            cometd.batch(function()
            {
                cometd.subscribe('/hello', function(message)
                {
                    dojo.byId('body').innerHTML += '<div>Server Says: ' + message.data.greeting + '</div>';
                });
                // Publish on a service channel since the message is for the server only
                cometd.publish('/service/hello', { name: 'World' });
            });
        }
    }

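So once the handshake completes, the client sends a batched request that subscribes to the /hello channel and publishes a JSON message to /service/hello. A message sent to a /service channel is private communication between that client and the server and is not forwarded to other clients.
The id distinguishes each request. According to the Bayeux spec, requests sent to /meta and /service channels must carry an id field, which is used to match each response to its request.
The requests are finally combined into a single JSON array: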

[{"channel":"/meta/subscribe","subscription":"/hello","id":"2","clientId":"8g6dbnlqr2k6jfo1tdpaeb7iw"},{"channel":"/service/hello","data":{"name":"World"},"id":"3","clientId":"8g6dbnlqr2k6jfo1tdpaeb7iw"}]

The server replies that the request with id=2 succeeded, so the subscription to /hello is in place:

[{"channel":"/meta/subscribe","successful":true,"id":"2","subscription":"/hello"}]

After that, the server sends back the message on the /hello channel:

[{"channel":"/hello","data":{"greeting":"Hello, World"}},{"channel":"/service/hello","successful":true,"id":"3"}]
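
The role of the id field can be illustrated with a small sketch that pairs each request in the batch with its reply and sets aside messages that are plain channel deliveries. The function name correlate is an assumption made for this example; the messages are the ones captured above.

```javascript
// Pair each outgoing Bayeux message with its reply by "id".
// Messages that answer no pending request are treated as deliveries.
// correlate is a hypothetical helper, not a CometD API.
function correlate(sent, received) {
  const pending = new Map(sent.map(m => [m.id, m]));
  const acks = [];
  const deliveries = [];
  for (const msg of received) {
    if (msg.id !== undefined && pending.has(msg.id)) {
      acks.push({ request: pending.get(msg.id), reply: msg });
      pending.delete(msg.id);
    } else {
      deliveries.push(msg);
    }
  }
  return { acks, deliveries, unanswered: [...pending.values()] };
}

// The batch sent by the client and the messages the server sent back.
const clientId = "8g6dbnlqr2k6jfo1tdpaeb7iw";
const sentBatch = [
  { channel: "/meta/subscribe", subscription: "/hello", id: "2", clientId },
  { channel: "/service/hello", data: { name: "World" }, id: "3", clientId }
];
const receivedMessages = [
  { channel: "/meta/subscribe", successful: true, id: "2", subscription: "/hello" },
  { channel: "/hello", data: { greeting: "Hello, World" } },
  { channel: "/service/hello", successful: true, id: "3" }
];

const result = correlate(sentBatch, receivedMessages);
console.log(result.acks.length, result.deliveries.length, result.unanswered.length);
// prints: 2 1 0
```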

The client also has to send connect requests periodically to keep the connection alive:

[{"channel":"/meta/connect","connectionType":"websocket","advice":{"timeout":0},"id":"4","clientId":"8g6dbnlqr2k6jfo1tdpaeb7iw"}]

The server replies that the connect succeeded:

[{"channel":"/meta/connect","advice":{"reconnect":"retry","interval":2500,"timeout":15000},"successful":true,"id":"4"}]

The connect request is what maintains the connection between client and server. The Bayeux specification says the following in two places:

A transport MUST maintain one and only one outstanding connect message. When a HTTP response that contains a /meta/connect response terminates, the client MUST wait at least the interval specified in the last received advice before following the advice to reestablish the connection

The client MUST maintain only a single outstanding connect message. If the server does not have a current outstanding connect and a connect is not received within a configured timeout, then the server SHOULD act as if a disconnect message has been received.
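
As a rough illustration of how a client might act on the advice object seen in the connect response earlier ({"reconnect":"retry","interval":2500,"timeout":15000}), here is a small sketch. The function name planReconnect is hypothetical, and treating a missing advice as "retry with no delay" is an assumption of this example rather than something taken from the captures.

```javascript
// Decide what to do after a /meta/connect response ends, based on the
// most recent advice. planReconnect is a hypothetical helper.
function planReconnect(advice) {
  const a = advice || {};
  // Assumption for this sketch: default to "retry" with no delay.
  const reconnect = a.reconnect || "retry";
  const interval = typeof a.interval === "number" ? a.interval : 0;
  switch (reconnect) {
    case "retry":
      // Wait at least `interval` ms, then send a new /meta/connect.
      return { action: "connect", delayMs: interval };
    case "handshake":
      // The server no longer knows this client: start over.
      return { action: "handshake", delayMs: interval };
    case "none":
      // The server asks the client not to reconnect.
      return { action: "stop", delayMs: null };
    default:
      throw new Error("unknown reconnect advice: " + reconnect);
  }
}

const plan = planReconnect({ reconnect: "retry", interval: 2500, timeout: 15000 });
console.log(plan.action, plan.delayMs);
// prints: connect 2500
```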

At this point the CometD client can subscribe and publish on the /hello channel.
On Chromium, all of this happens over a single WebSocket connection.

When disconnecting, the client sends the server:

[{"channel":"/meta/disconnect","id":"188","clientId":"a8iutjvfp7dtwhzrfujeonk5q"}]

The server responds:

[{"channel":"/meta/disconnect","successful":true,"id":"188"}]

So Bayeux can basically be understood as an application protocol on top of WebSocket.

Now look at what happens on Firefox 3.6. Firefox 3.6 does not support WebSocket, so all communication has to go through XHR.
The handshake is done with an XHR POST request:

POST /{artifactId}/cometd/handshake HTTP/1.1
Host: 127.0.0.1:8080
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: UTF-8,*
Keep-Alive: 115
Connection: keep-alive
Content-Type: application/json;charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: http://127.0.0.1:8080/{artifactId}/
Content-Length: 182
Cookie: JSESSIONID=fjnyxb28raih1cnaljrijl1ic
Pragma: no-cache
Cache-Control: no-cache

The server responds:

HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Set-Cookie: BAYEUX_BROWSER=df92-h8q89f416mutgbpxrwb8185u;Path=/
Content-Length: 213
Server: Jetty(7.1.5.v20100705)

[{"channel":"/meta/handshake","clientId":"9185k23lo482oq1po3ivxup2cj","version":"1.0","successful":true,"minimumVersion":"1.0","id":"1","supportedConnectionTypes":["websocket","long-polling","callback-polling"]}]

With the handshake complete, the client-defined callback runs and sends the Bayeux requests over a new XHR:
[{"channel":"/meta/subscribe","subscription":"/hello","id":"2","clientId":"9185k23lo482oq1po3ivxup2cj"},{"channel":"/service/hello","data":{"name":"World"},"id":"3","clientId":"9185k23lo482oq1po3ivxup2cj"}]

The server returns three Bayeux messages in a single response:

[{"channel":"/meta/subscribe","successful":true,"id":"2","subscription":"/hello"},{"channel":"/hello","data":{"greeting":"Hello, World"}},{"channel":"/service/hello","successful":true,"id":"3"}]

The client starts sending connect requests:

[{"channel":"/meta/connect","connectionType":"long-polling","advice":{"timeout":0},"id":"4","clientId":"9185k23lo482oq1po3ivxup2cj"}]

Note that the long-polling transport is used here, which Dojo picks according to the browser's capabilities.

Long-polling server implementations attempt to hold open each request until there are events to deliver; the goal is to always have a pending request available to use for delivering events as they occur, thereby minimizing the latency in message delivery.

If there is no new message, the server blocks for ten seconds and then returns:

[{"channel":"/meta/connect","successful":true,"id":"7"}]

As soon as the client receives this response it issues a new connect request.

When a new message arrives, the connect request blocked on the server returns immediately and carries the new message with it, for example:

[{"channel":"/hello","data":{"name":"555"},"id":"6"},{"channel":"/meta/connect","successful":true,"id":"619"}]

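If the new message was published by this client itself, it comes back in the response to the successful publish request and does not affect the connect request, for example: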

[{"channel":"/hello","data":{"name":"nihao"},"id":"715"},{"channel":"/hello","successful":true,"id":"715"}]

Disconnecting is again done by sending a Bayeux command to the server with an XHR POST:

[{"channel":"/meta/disconnect","id":"750","clientId":"9185k23lo482oq1po3ivxup2cj"}]

The server responds:

[{"channel":"/meta/disconnect","successful":true,"id":"750"}]

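That covers a CometD client implementing Bayeux over long-polling. Long-polling still achieves near-real-time delivery by pulling through connect requests, which remains different from the true push of WebSocket.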
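
The control flow of the connect loop described above can be sketched as follows. To keep the example self-contained and runnable without a server, send is a hypothetical synchronous stand-in for the XHR POST, and the canned responses are modelled on the captures in this post; a real transport would be asynchronous and would also honor the advice field.

```javascript
// Sketch of the long-polling connect loop.
// send(batch) is a hypothetical stand-in for the XHR POST: it takes a
// request batch and returns the response batch. The server would hold
// the real request open until it has data or its timeout expires.
function longPollLoop(send, clientId, onMessage, maxRounds) {
  let id = 0;
  for (let round = 0; round < maxRounds; round++) {
    const request = [{
      channel: "/meta/connect",
      connectionType: "long-polling",
      id: String(++id),
      clientId: clientId
    }];
    const response = send(request);
    let connected = false;
    for (const msg of response) {
      if (msg.channel === "/meta/connect") {
        connected = msg.successful === true;
      } else {
        // A delivery that rode back on the connect response.
        onMessage(msg);
      }
    }
    // Stop on failure; otherwise reconnect immediately.
    if (!connected) break;
  }
}

// Canned server responses: an idle timeout, a delivery, then a failure.
const canned = [
  [{ channel: "/meta/connect", successful: true, id: "1" }],
  [{ channel: "/hello", data: { name: "555" }, id: "6" },
   { channel: "/meta/connect", successful: true, id: "2" }],
  [{ channel: "/meta/connect", successful: false, id: "3" }]
];
const requests = [];
const received = [];
longPollLoop(
  batch => { requests.push(batch[0]); return canned[requests.length - 1]; },
  "9185k23lo482oq1po3ivxup2cj",
  msg => received.push(msg),
  10
);
console.log(requests.length, received.length);
// prints: 3 1
```

Only one connect request is outstanding at any time, which matches the requirement quoted from the Bayeux specification earlier.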

The post is brought to you by lekhonee v0.7

WebSocket Protocol

This afternoon I wrote a simple web IM program with Jetty's WebSocketServlet, which gave me my first glimpse of what WebSocket really looks like.

Server: Jetty 7.1.5
Client: Chromium 5.0.375.86

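Capturing packets with Wireshark gives the following data:
var _ws = new WebSocket("ws://127.0.0.1:8080/nothing")
This step creates the WebSocket. The browser performs a handshake with the server by sending this request: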

GET /nothing HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: 127.0.0.1:8080
Origin: http://127.0.0.1:8080

The client sends an Upgrade header, which is defined in RFC 2616, section 14.42:

The Upgrade general-header allows the client to specify what additional communication protocols it supports and would like to use if the server finds it appropriate to switch protocols.

Upgrade must also be listed in the Connection header to mark this as an upgrade request.
Connection is defined in RFC 2616, section 14.10:

The Connection general-header field allows the sender to specify options that are desired for that particular connection and MUST NOT be communicated by proxies over further connections.

The Origin header has not made it into an RFC yet and exists only as a draft standard. The W3C draft Cross-Origin Resource Sharing defines the Origin header as follows:

The Origin header indicates where the cross-origin request or preflight request originates from.

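The Origin header was proposed to address the potential danger of CSRF: through Origin, the server can learn where a request came from and judge whether it is legitimate. In other words, responsibility for the cross-origin security check is handed to the server while the browser adopts a policy of trust, which avoids the earlier approach of banning cross-origin requests outright.
Jetty 7's org.eclipse.jetty.servlets.CrossOriginFilter handles this header.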
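
A server-side check of this kind can be sketched in a few lines. This is only an illustration of the idea, not how Jetty's CrossOriginFilter is implemented; the function name isAllowedOrigin, the lower-cased header keys, and the decision to reject requests that carry no Origin at all are assumptions of this example.

```javascript
// Accept a request only if its Origin header is on an allow-list.
// isAllowedOrigin is a hypothetical helper. `headers` is assumed to be
// an object keyed by lower-cased header names.
function isAllowedOrigin(headers, allowedOrigins) {
  const origin = headers["origin"];
  // Assumption for this sketch: a missing Origin is rejected.
  if (origin === undefined) return false;
  return allowedOrigins.includes(origin);
}

const allowed = ["http://127.0.0.1:8080"];
console.log(isAllowedOrigin({ origin: "http://127.0.0.1:8080" }, allowed)); // prints: true
console.log(isAllowedOrigin({ origin: "http://evil.example" }, allowed));   // prints: false
```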

In addition, the handshake request may carry a Sec-WebSocket-Protocol header, which asks the server for a specific subprotocol (an application protocol).

The server answers:

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://127.0.0.1:8080
WebSocket-Location: ws://127.0.0.1:8080/nothing
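
The request and response captured above can be reproduced with a small sketch that parses the handshake request and builds the matching 101 reply. It follows the early-draft handshake shown in this post, with WebSocket-Origin and WebSocket-Location headers; the later RFC 6455 handshake uses Sec-WebSocket-Key and Sec-WebSocket-Accept instead. The function names are made up for illustration and this is not Jetty's code.

```javascript
// Parse a raw HTTP request head into method, path and lower-cased headers.
// parseRequest is a hypothetical helper.
function parseRequest(raw) {
  const lines = raw.split("\r\n").filter(l => l.length > 0);
  const [method, path] = lines[0].split(" ");
  const headers = {};
  for (const line of lines.slice(1)) {
    const idx = line.indexOf(":");
    headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
  }
  return { method, path, headers };
}

// Build the early-draft 101 response, or return null if the request
// is not a WebSocket upgrade. Also a hypothetical helper.
function buildHandshakeResponse(req) {
  const h = req.headers;
  const isUpgrade = req.method === "GET" &&
    (h["upgrade"] || "").toLowerCase() === "websocket" &&
    (h["connection"] || "").toLowerCase() === "upgrade";
  if (!isUpgrade) return null;
  return [
    "HTTP/1.1 101 Web Socket Protocol Handshake",
    "Upgrade: WebSocket",
    "Connection: Upgrade",
    "WebSocket-Origin: " + h["origin"],
    "WebSocket-Location: ws://" + h["host"] + req.path,
    "", ""
  ].join("\r\n");
}

const rawRequest = [
  "GET /nothing HTTP/1.1",
  "Upgrade: WebSocket",
  "Connection: Upgrade",
  "Host: 127.0.0.1:8080",
  "Origin: http://127.0.0.1:8080",
  "", ""
].join("\r\n");

const handshakeResponse = buildHandshakeResponse(parseRequest(rawRequest));
console.log(handshakeResponse);
```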

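The WebSocket connection is established. From here on, server and client can communicate bidirectionally, and the message body is simply the plain text passed to websocket.send(msg). Achieving the same thing over plain HTTP takes at least two connections between browser and server, whereas a WebSocket carries both directions on one. The WebSocket protocol does not yet specify a limit on how many connections a client may open to a server. On that kind of limit, however, RFC 2616 (HTTP/1.1) states: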

Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.

On this point the other protocol, Bayeux, already has an explicit limit:

the Bayeux protocol MUST NOT require any more than 2 HTTP requests to be simultaneously handled by a server in order to handle all application (Bayeux based or otherwise) requests from a client.

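At this point client and server can communicate in full duplex, which is the biggest advantage of a browser-level WebSocket implementation. Firefox 3.x, IE x.x and the like can only emulate WebSocket on top of the existing HTTP connection mechanism, for instance through long polling and callback polling, and in the end that cannot deliver true full-duplex communication.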


Nginx HTTP Push

A few days ago I came across an Nginx module for implementing Comet, and today I gave it a quick try. The author is Leo Ponomarev, and the project lives at http://pushmodule.slact.net/

Installation

The module has to be compiled into nginx. Download both nginx and nginx-push-module, then add one argument when running nginx's configure:
./configure --add-module=path/to/nginx_http_push_module

Usage

Write a very basic nginx configuration file:

events{
	worker_connections 1024;
}
http{
	server {
		listen	80;
		server_name	localhost;

		location /publish {
			set $push_channel_id $arg_id;

			push_publisher;

			push_store_messages on;
			push_message_timeout 2h;
			push_max_message_buffer_length 10;
			push_min_message_recipients 0;
		}

		location /subscribe{
			push_subscriber;

			push_subscriber_concurrency broadcast;
			set $push_channel_id $arg_id;
			default_type text/plain;
		}

	}
}

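This simple server defines two paths, one for publish and one for subscribe. All the relevant directives are explained on the project home page, so I won't repeat them here.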

Start nginx:
nginx -c /home/sun/nginxpush/nginx-push.conf

Open a terminal and access subscribe:
curl -X GET http://localhost/subscribe?id=0

You can see that the HTTP request blocks.

Open another terminal and access publish:
curl -X POST http://localhost/publish?id=0 -d "Hello World"

The subscriber now receives the string "Hello World" and its HTTP request completes.

A subscriber can filter messages by setting HTTP headers. For example:
curl -X POST http://localhost/publish?id=0 -d "Hello World"
curl -X GET http://localhost/subscribe?id=0 --verbose

* About to connect() to localhost port 80 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET /subscribe?id=0 HTTP/1.1
> User-Agent: curl/7.19.7 (i686-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3.3
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/0.8.28
< Date: Thu, 26 Nov 2009 09:45:25 GMT
< Content-Type: application/x-www-form-urlencoded
< Content-Length: 10
< Last-Modified: Thu, 26 Nov 2009 09:44:59 GMT
< Connection: keep-alive
< Etag: 0
< Vary: If-None-Match, If-Modified-Since
<
* Connection #0 to host localhost left intact
* Closing connection #0

HelloWorld

The response headers show that Last-Modified: Thu, 26 Nov 2009 09:44:59 GMT is the time of the last publish, and the Vary field points to two options:

  • If-None-Match
  • If-Modified-Since

The RFC explains the Vary header like this:

The Vary field value indicates the set of request-header fields that fully determines, while the response is fresh, whether a cache is permitted to use the response to reply to a subsequent request without revalidation.

That is, you can send If-Modified-Since to fetch only the data published after a given time:

curl -X GET -H "If-Modified-Since: Thu, 26 Nov 2009 09:44:50 GMT" http://localhost/subscribe?id=0 --verbose

This time subscribe blocks again instead of receiving the previously published data, which makes full use of HTTP semantics.

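With that, the approach to building web IM or a chat room on the push module becomes very clear: each browser holds one subscriber connection, which closes once a message is received. The client prints the message and then re-subscribes using the Last-Modified value from the response headers.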
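
The re-subscribe loop just described can be sketched like this. To keep it runnable without nginx, fetchOnce is a hypothetical synchronous stand-in for a blocking GET on /subscribe?id=0 that returns the message body and the Last-Modified value, or null when the sketch should stop. The second canned message and its timestamp are invented for the example, and the Etag / If-None-Match pair seen in the response headers is ignored here for brevity.

```javascript
// Sketch of a chat client loop on top of the push module.
// fetchOnce(headers) is a hypothetical stand-in for a blocking GET on
// the subscribe location. It returns { body, lastModified }, or null
// to end the loop in this sketch.
function chatLoop(fetchOnce, onMessage) {
  let since = null;
  for (;;) {
    // Ask only for messages newer than the last one we saw.
    const headers = since ? { "If-Modified-Since": since } : {};
    const res = fetchOnce(headers);
    if (res === null) break;
    onMessage(res.body);
    since = res.lastModified;
  }
}

// Canned responses; the second one is made up for illustration.
const cannedResponses = [
  { body: "Hello World", lastModified: "Thu, 26 Nov 2009 09:44:59 GMT" },
  { body: "second message", lastModified: "Thu, 26 Nov 2009 09:45:10 GMT" }
];
const seenHeaders = [];
const printed = [];
chatLoop(
  headers => {
    seenHeaders.push(headers);
    return cannedResponses[seenHeaders.length - 1] || null;
  },
  body => printed.push(body)
);
console.log(printed.join(" | "));
// prints: Hello World | second message
```

In a browser the same loop would be driven by XHR callbacks rather than a blocking call, but the header bookkeeping stays the same.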