Request Based Distributed Computing – A rough sketch

The Hypertext Computing (HTC) paradigm that I have written about in this blog is built on the following observations:

  • There is a fundamental equivalence between http resources and code that if executed would generate the resource
  • It is an accident of history that the scripting models of servers and clients on the web are different.
  • We have an opportunity to apply the lessons learnt about building secure scriptable clients to the building of servers and proxies.
  • While the WWW allows a programmer to ignore the network path to an information resource, as programmers we can’t (yet) ignore where computing will be done. The programmer’s choice of technology (framework, language etc.) carries with it an implicit choice about the location of computation (server or client).
  • Grid computing must integrate the client’s available computing power rather than assuming that ‘the cloud’ will do everything. As we anticipate processors with hundreds of cores, a bet against the computing power available at the edges of the network is a poor one.
  • The http protocol can be orthogonally extended so that instead of returning the resource at the given URL, a server may instead return code that will generate the resource when executed on a compatible virtual machine.

Doing this will enable us to:

  • Unify the programming models associated with delivering rich user experiences and satisfying http requests on client, proxy and servers.
  • Enable the location of code execution to be determined at run time based on criteria like availability of computing power, security and intellectual property concerns, rather than just on choice of technology as at present. This makes the location of code execution transparent to the end user and to the system designer.
  • Facilitate extremely lightweight distributed computing through code mobility from a canonical source to the computing environment that executes it.

Request Based Distributed Computing

[[Update: I have added a set of graphics that illustrate the RBDC architecture.]]

An alternative name for Hypertext Computing is “Request Based Distributed Computing”, which is the name that I will use for the remainder of this article. This informal sketch of the Request Based Distributed Computing paradigm involves extending the definitions of the http protocol, clients, proxies and servers.

http

In achieving the aim of request based distributed computing, this proposal does not break the power and security inherent in the request based http model. For example, it:

  • does not assume clients may be interrogated or polled by servers and
  • never expects clients to send code to a server for execution and
  • does not imply that servers become stateful and
  • does not assume that trust can be delegated to a third party process.

The http protocol defines resources, which are located using URLs. Request Based Distributed Computing is enabled by extending the definition of “resource” to include the “coderesource”, identified by an extension header field. Coderesources are http resources that are executable on a known Virtual Machine (VM). After a coderesource executes on a VM, the result is indistinguishable from the resource that a web server would send in response to the same URL. If an http resource returns a coderesource rather than the resource itself, a well-behaved resource returns code WITHOUT reference to the particular data passed via the URL and/or cookie. During a single invocation, a coderesource has full use of all the language features available to it (local variables and so on). A coderesource contains a single entry point. Coderesources are wrapped in XML that contains at least the following information:

  • The name and version of the VM that the code may be executed on. Well behaved VMs are always able to execute legacy version code according to the version number in the CodeResource.
  • A mark, respected by the serving VM, that controls mobility: Mobile (or not).
  • A mark, respected by the serving HTC, that controls execution: Executable (or not).

Note: a coderesource marked Not Mobile and Executable corresponds to the behaviour of today’s .php scripts.
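To make the wrapper concrete, here is a minimal Python sketch of what reading such an envelope might look like. The element name `coderesource`, the attribute names `vm`, `version`, `mobile` and `executable`, and the VM name `examplevm` are all invented for illustration; the proposal only says which information the XML must carry, not how it is spelled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical envelope: every element and attribute name here is an assumption.
SAMPLE = """<coderesource vm="examplevm" version="1.2" mobile="true" executable="true">
<code><![CDATA[def main(params, cookie): return "<p>hello</p>"]]></code>
</coderesource>"""


@dataclass
class CodeResource:
    vm: str           # name of the VM the code targets
    version: str      # VM version the code was written for
    mobile: bool      # may the code leave the serving machine?
    executable: bool  # may the serving machine execute it?
    code: str         # the code itself, with a single entry point


def parse_coderesource(xml_text: str) -> CodeResource:
    """Read the wrapper metadata and the code body from the XML envelope."""
    root = ET.fromstring(xml_text)
    if root.tag != "coderesource":
        raise ValueError("not a coderesource envelope")
    return CodeResource(
        vm=root.attrib["vm"],
        version=root.attrib["version"],
        # Missing marks default to the most restrictive setting.
        mobile=root.attrib.get("mobile", "false") == "true",
        executable=root.attrib.get("executable", "false") == "true",
        code=root.findtext("code", default=""),
    )
```

Under these assumptions, a coderesource that behaves like today’s .php scripts would carry `mobile="false"` and `executable="true"`.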

A coderesource gets its input from 4 sources:

  • URL parameters
  • A cookie
  • GETs on public http URLs
  • Private resources, such as a local database or a GET from a password-secured http URL

Code that gets its input from the first three sources only is mobile code. A large amount of today’s web code can be written in a mobile form, especially code that facilitates Rich Internet Applications, gadgets and distributed computing projects like SETI@home.
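The mobility rule can be written down as a simple set test. The sketch below is only an illustration: the `InputSource` enum and the `is_mobile` function are hypothetical names, and it assumes the set of sources a piece of code uses is already known.

```python
from enum import Enum


class InputSource(Enum):
    URL_PARAMETERS = 1
    COOKIE = 2
    PUBLIC_GET = 3        # GET on a public http URL
    PRIVATE_RESOURCE = 4  # local database, password-secured URL, ...


# Sources that travel with the request or are reachable from anywhere.
MOBILE_SOURCES = {
    InputSource.URL_PARAMETERS,
    InputSource.COOKIE,
    InputSource.PUBLIC_GET,
}


def is_mobile(sources_used):
    """Code is mobile if every input source it uses is one of the first three."""
    return set(sources_used) <= MOBILE_SOURCES
```

For example, `is_mobile([InputSource.COOKIE, InputSource.PUBLIC_GET])` is true, while any use of `InputSource.PRIVATE_RESOURCE` makes the result false.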

Code that refers to http://localhost resources is not mobile; however, code that refers to http://client is mobile. While http://localhost is understood to refer to a resource local to the server on which the code is found, http://client is introduced to stand for resources on the initiator of the http request. Code containing references to http://client is NOT executable on the server (or on a proxy), since only the client has access to its state.
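As a rough sketch of how these two special hosts constrain where code may run, the Python function below takes the URLs a coderesource refers to and returns the places it could execute. The function name and the location labels are hypothetical, and the sketch looks only at the two special hosts; it does not detect other private resources such as password-secured URLs.

```python
from urllib.parse import urlparse


def allowed_locations(referenced_urls):
    """Return where code may run, given the URLs it refers to."""
    hosts = {urlparse(u).hostname for u in referenced_urls}
    locations = {"server", "proxy", "client"}
    if "localhost" in hosts:
        # Not mobile: must stay on the server where the code is found.
        locations &= {"server"}
    if "client" in hosts:
        # Only the initiator of the request has access to its own state.
        locations &= {"client"}
    return locations
```

Under this reading, code that refers to both http://localhost and http://client yields an empty set: it cannot run anywhere as a single unit.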

In addition to the extension response header field that identifies the content of a resource as a coderesource, there is an extension request header field that indicates that the request is for mobile code, or that the requester is open to receiving mobile code. The absence of this request header indicates a standard http request, where the resource itself is expected. An RBDC proxy server that has a local VM may add this header, then trap the returned code, execute it and return the resource to the client as expected.

Request Based Distributed Computing (RBDC) Servers

An RBDC Server is an extension of the common web server. It includes at least one sandboxed Virtual Machine (VM) similar to .Net’s CLI or a JVM. A key point is that the same virtual machines are used on servers, proxies and clients. If the VM’s primitive instructions are extensible (e.g. like PHP extensions) then the mechanism of extension is by requesting coderesources from canonical RBDC Servers. VMs contain a look-aside code cache that operates using the http caching mechanism.

Code executed on behalf of a client will generate an error if it refers to http://client resources. If the request that resulted in the failure indicates that the requesting client has the capacity to execute code then the server may return the coderesource instead of the result.
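The server-side decision might be sketched as follows. This is a simulation under stated assumptions, not a real web server: the header names `X-RBDC-Accept-Code` and `X-RBDC-Coderesource` are invented (the proposal does not name its extension headers), `run_on_vm` stands in for the sandboxed VM, and the 500 status for the unrecoverable case is my own choice.

```python
ACCEPT_CODE = "X-RBDC-Accept-Code"   # hypothetical request header
IS_CODE = "X-RBDC-Coderesource"      # hypothetical response header


class ClientResourceError(Exception):
    """Raised by the simulated VM when server-side code touches http://client."""


def serve(request_headers, code, mobile, run_on_vm):
    """Return (status, headers, body) for one request."""
    try:
        # Normal case: execute on the server and return the resource.
        return 200, {}, run_on_vm(code)
    except ClientResourceError:
        # The code needs client state. Hand it over only if it is marked
        # mobile and the requester said it can execute code.
        if mobile and ACCEPT_CODE in request_headers:
            return 200, {IS_CODE: "1"}, code
        return 500, {}, "cannot execute: code requires http://client"
```

Note that the server never inspects client state and never receives code from the client; it only chooses between sending a result and sending the coderesource.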

RBDC compatible Proxies

RBDC compatible Proxies also include a VM. HTTP responses that are coderesources that flow through the proxy may be intercepted and executed on the proxy, with the resulting resource returned to the client. Naturally, if the client has specifically requested a coderesource, well behaved proxies will not attempt to execute it.
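A proxy following these rules could look roughly like the sketch below. It involves no real networking: `fetch_upstream` and `run_on_vm` are hypothetical callables standing in for the upstream request and the proxy’s VM, and the two header names are invented for illustration.

```python
ACCEPT_CODE = "X-RBDC-Accept-Code"   # hypothetical request header
IS_CODE = "X-RBDC-Coderesource"      # hypothetical response header


def proxy_handle(request_headers, fetch_upstream, run_on_vm):
    """Forward a request, executing any returned code on the client's behalf."""
    client_wants_code = ACCEPT_CODE in request_headers

    # The proxy has a VM, so it tells the server that code is acceptable.
    upstream_headers = dict(request_headers)
    upstream_headers[ACCEPT_CODE] = "1"
    response_headers, body = fetch_upstream(upstream_headers)

    if IS_CODE in response_headers and not client_wants_code:
        # Trap the code, run it here, and return an ordinary resource.
        plain_headers = {k: v for k, v in response_headers.items() if k != IS_CODE}
        return plain_headers, run_on_vm(body)

    # Either a plain resource, or code the client explicitly asked for.
    return response_headers, body
```

A thin client behind such a proxy sends ordinary requests and receives ordinary resources, unaware that the processing happened at the network perimeter.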

Proxies can be used on the perimeter of networks to automatically perform processing on behalf of thin clients.

RBDC compatible Clients

A representative example of an http client is a web browser. Clients that support Request Based Distributed Computing contain a Virtual Machine. The VM identifies and accesses ALL local resources via http://client. The local VM may satisfy requests for http://client without employing a full networking stack.

If a client is returned code that it can’t execute, it may re-request the URL with headers that ask the server to return the resource rather than its coderesource. That request can be trapped by a proxy, which executes the code and returns the result, or the code can be executed by the server and the result returned.
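The fallback can be sketched in Python as below. Everything named here is an assumption: the two header names are invented, `fetch`, `can_execute` and `run_on_vm` are hypothetical stand-ins for the network and the local VM, and the sketch takes "a request for the resource itself" to mean simply omitting the request header.

```python
ACCEPT_CODE = "X-RBDC-Accept-Code"   # hypothetical request header
IS_CODE = "X-RBDC-Coderesource"      # hypothetical response header


def client_get(url, fetch, can_execute, run_on_vm):
    """Fetch a URL, preferring local execution but falling back to the resource."""
    headers, body = fetch(url, {ACCEPT_CODE: "1"})
    if IS_CODE not in headers:
        return body                 # the server (or a proxy) did the work
    if can_execute(body):
        return run_on_vm(body)      # run the coderesource locally

    # Unsupported VM or version: ask again as a standard http request.
    headers, body = fetch(url, {})
    if IS_CODE in headers:
        raise RuntimeError("received code in response to a standard request")
    return body
```

The second request costs an extra round trip, which is the price of letting the client discover at run time that it cannot execute what it was sent.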

RBDC clients deprecate existing scripting solutions that are not compatible with RBDC. Scripts embedded in web pages can be references to coderesources available on the web or treated as anonymous functions on the VM. These scripts refer to the DOM via http://client.

RBDC clients have default sandbox security which may be relaxed by the user.

Conclusion

The Hypertext Computer paradigm (or Request Based Distributed Computing) is a small extension of the http protocol and notion of server, proxy and client. Rich Internet Applications, SOA architected applications and SETI@home type distributed computing alike can utilise a common unified programming model. No longer will technology dictate the locus of code execution – instead issues like availability of computing power, intellectual property and security will dictate this at run time.

Click here for discussion of RBDC compared to current technologies.

Comments

  1. Scott Sinclair says

    I’ll go out on a limb here and suggest this is why Java is what it is today…

    One word. Optimisation.

    If I run a Blender standard scene on my (yes yes) Origin 2000 16P running IRIX, it takes around 18 seconds. On the Macbook Pro – 53 seconds.

    If I run the same scene in angry mode i.e. multi-threading enabled, NUMA-aware etc. basically set up specifically for a MIPS super-duper-compuder, around .4 of a second.

    “You pay a high price for platform independence.” – Roger Duke.

    Probably the only quote I took away from Uni that has proved itself true time and time again.


    Scott Sinclair
    Technical Manager UQconnect
    The University of Queensland
    Brisbane, AUSTRALIA

    Sent from my anySIM’d 1.1.1 iPhone – now with added flavour!

  2. Hi David,

    I find the concept intriguing. Of course, it lends itself best to processes that are neither security-conscious nor data-centric. Or perhaps where data is as readily accessible to all participants.

    For instance an algorithm that was sorting large sets of data, where the data is accessible from a regular web-service. That web service can be consumed from both client (There are several WSDL client implementations in JavaScript) and server.

    What is most interesting is that you can be very near an implementation, if you restrict yourself to just JavaScript initially, since Aptana released their Jaxer server.

    I’ll keep my eyes on this and the best of luck onwards 🙂

    Cheers,
    PS