
BlocksDefinition


We're still experimenting with how best to use the wiki for collaborative design, so please be careful or ask on cocoon-dev before making changes to this page.

Reader's comments are welcome at the end of this page, possibly using footnote references to point to the original text.

This definition was authored by Stefano Mazzocchi and posted on cocoon-dev.

Part 1: introduction

A step back: what are the problems we are trying to solve

Cocoon is currently a framework implemented as an application.

A 'framework' is supposed to give services to entities included in it, while an application is supposed to be executed by a containing framework.

While the above might sound weird at first, this is a very common situation: an operating system is a framework implemented as an application run at boot time. At the same time, an application server is a framework implemented as an application. But even a browser is a framework implemented as an application.

So, there is no inherently bad design in this concept, *but* the framework must be implemented in such a way that it's inherently *easy* to deploy/install/plug-in/connect/attach/inject/link an internal application that must be executed by the framework.

Cocoon lacks this.

Let me give you an example: I would like to be able to package my stuff that I wrote to be run *by* cocoon and deploy it on Cocoon, maybe even at runtime.

The parallel is easily made: servlets and WAR archives. In a later release, the Servlet API introduced the concept of a WAR (Web ARchive) package that includes all the resources needed for the servlet/jsp-based web application to run, including libraries, resources, files and everything.

So, the parallel I want to draw is simple:

   WAR (Web ARchive) -> tomcat (or other servlet container)
   COB (COcoon Block) -> cocoon

so, a WAR package is for tomcat what a COB will be for Cocoon.

In very short terms: a way for you to deploy your stuff on Cocoon without hassle (including special libraries, resources and what not).

Are we really cloning the servlet API?

Many people from the pure J2EE world (even Apache people) believe that Cocoon is just an attempt to rewrite the servlet API for XML. In a sense it's true: the servlet API wasn't designed for pipelines and the deployment descriptor wasn't designed for serious URI space mapping.

So, while the Servlet API introduces components (servlets and filters) that are based on streams of bytes/chars, Cocoon introduces components designed to be part of a pipeline (since 1997 I thought about a way to allow servlet chaining to be feasible, that is probably what triggered the idea of pipeline components for Cocoon).

Anyway, looking at this parallel, Cocoon really lacks a way to deploy its applications easily within a 'naked' container that includes only the basic and default machinery.

                                    - o -

I'm pretty sure that if I stopped here and went on describing the schema of the COB descriptor file and so on, people would love it, thank me, run to their boss to tell them and blah blah.

Sure, we could stop here, we could clone the WAR concept inside Cocoon, allow you to deploy your stuff and you won't be missing anything.

But there are two things that the servlet API architects didn't consider (not even myself at that time since I was part of that group): polymorphism and inheritance.

Applying Avalon COP philosophy over again

If you ever worked with Avalon, you know the feeling: at first it doesn't make any sense at all. It's a mess of stupid and very abstract interfaces... but after a while, a pattern emerges and it sticks.

Some might think that Avalon (probably Cocoon itself) includes infecting 'memes' and I agree. (Look up the name on google if you don't know what I'm talking about)

Once you start using COP (component oriented programming), it's very hard to go back (so much so that many abuse it and over-componentize their systems... even Cocoon itself suffers from this problem on some parts).

COP is based on IoC (Inversion of Control) and SoC (Separation of Concerns) (for those who still don't know about them!) and while the servlet API makes extensive use of the IoC metapattern, SoC doesn't play a clear and defined role (they tried to patch it with RequestDispatcher, which is the biggest hack I have ever seen; I even voted against it but I was overruled).

Anyway, if the servlet API, internally, shows use of IoC and SoC, externally, from the WAR point of view, there is *absolutely* no notion of it: a WAR is a package that includes a single and isolated application.

Period. That's it. There are many mechanisms that enforce the clear separation between different WARs. So, they implement monolithic web applications and this is *by design*.

Improvements

Improvement #1: component-oriented deployment

Let me give you a possible use-case scenario.

Let us suppose that we implement WAR-like package deployment on top of Cocoon and that your application requires both PDF serialization and SVG->PNG rasterization.

Then, you implement another cocoon web application and you still require PDF generation.

Unfortunately, since WAR-like installation isolates the packages and their classloaders, you have to install the PDF serialization libraries twice.

Thus the idea of blocks as units of deployable service. Here is a picture:

  • case 1: WAR-like deployment
    +----------------+  +--------------------+
    |       +-------+|  |+-----+    +-------+|
    |       |  FOP  ||  || FOP |    | Batik ||
    |       +-------+|  |+-----+    +-------+|
    |                |  |                    |
    |    webapp1     |  |      webapp2       |
    +----------------+  +--------------------+
  • case 2: block-like deployment
           +-----+  +-------+
           | FOP |  | Batik |
           +-----+  +-------+
              |   \    |
              |    \   |
       +---------+  +---------+
       | webapp1 |  | webapp2 |
       +---------+  +---------+

The second case allows:

  • optimization of resources (libraries are not deployed more than needed)
  • separate distributions (different packages can be prepared and maintained by different groups independently, as long as the service contracts remain the same)

Improvement #2: polymorphic behavior

The above solution already improves on the WAR model, but we can do better than this. Another use-case scenario:

In the previous scenario, your web application required PDF serialization and, in fact, it mixes concerns if it depends *explicitly* on FOP since, later on, you might want to use another library/service that implements the same (for example iText or RenderX).

So, instead of depending on a particular *implementation* of a service behavior, if we make blocks depend on *behaviors* directly (considered as service contracts) we can implement polymorphic behavior of blocks.

Again, let's visualize it:

  • case 1: dependency on implementation
           +-----+  +-------+
           | FOP |  | Batik |
           +-----+  +-------+
              |   \    |
              |    \   |
       +---------+  +---------+
       | webapp1 |  | webapp2 |
       +---------+  +---------+
  • case 2: dependency on behavior
      +-----------+ +-----------+
      |  +-----+  | | +-------+ |
      |  | FOP |  | | | Batik | |
      |  +-----+  | | +-------+ |
      |  FO->PDF  | | SVG->PNG  |
      +-----------+ +-----------+
              |   \    |
              |    \   |
       +---------+  +---------+
       | webapp1 |  | webapp2 |
       +---------+  +---------+

Here, webapp1 requires "fo-pdf" serialization services but it does not care (nor should it!) which implementation of this service is actually installed in the system.

It is, in fact, the installer's concern to indicate which block implementing that behavior should be used in that system at that time.

Note that this allows several very interesting things:

  1. versioning: it is possible to install several different versions of the same block, try them out (even at runtime) and roll back if a version creates incompatibilities, without having to change anything in the blocks, only using the block manager (which is the part of cocoon responsible for deployment and configuration of blocks in the system).
  2. polymorphism: I can have different implementations of the same behavior and I can switch them simply by acting on the block manager, without having to touch a single configuration line in any block. The blocks are, in fact, sealed.
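
To make the two points above a bit more concrete, here is a minimal sketch of the bookkeeping a block manager would need. Everything in it is an assumption made for illustration: the class name, its methods, the block names and the behavior URI are hypothetical and are not part of any Cocoon API.

```python
class BlockManager:
    """Toy block manager: maps behavior URIs to the block chosen to serve them."""

    def __init__(self):
        self._installed = {}  # behavior URI -> list of installed block names
        self._active = {}     # behavior URI -> block name currently selected

    def install(self, block_name, behavior_uri):
        """Register a block as one implementation of a behavior."""
        self._installed.setdefault(behavior_uri, []).append(block_name)
        # The first installed implementation becomes the default choice.
        self._active.setdefault(behavior_uri, block_name)

    def select(self, behavior_uri, block_name):
        """Switch the active implementation; dependent blocks are not touched."""
        if block_name not in self._installed.get(behavior_uri, []):
            raise KeyError(f"{block_name} does not implement {behavior_uri}")
        self._active[behavior_uri] = block_name

    def resolve(self, behavior_uri):
        """Return the block currently serving a behavior."""
        return self._active[behavior_uri]


FO_PDF = "http://example.org/behaviors/fo-pdf/1.0"  # hypothetical behavior URI

manager = BlockManager()
manager.install("fop-block", FO_PDF)     # hypothetical block names
manager.install("itext-block", FO_PDF)

before = manager.resolve(FO_PDF)
manager.select(FO_PDF, "itext-block")    # polymorphism: swap the implementation
after = manager.resolve(FO_PDF)
manager.select(FO_PDF, "fop-block")      # roll back if the swap causes trouble
rolled_back = manager.resolve(FO_PDF)
```

The dependent webapp only ever asks for the behavior URI, so neither the switch nor the roll-back requires a change inside it.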

Improvement #3: block inheritance

The third step is to allow blocks to extend other blocks.

The idea is to be able to wrap a block with another one, creating an 'overriding' mechanism similar to the one used by OOP inheritance, where method calls fall back to the extended class if the extending class doesn't implement them.

Let us suppose we have the following block (very simple):

   block A implements http://mystuff.org/skin/1.1

         /stylesheets/changes2document.xslt
         /stylesheets/faq2document.xslt
         /stylesheets/document2html.xslt
         /resources/logo.gif

and let us suppose that we want to change the look and feel of that block. The first two stylesheets provide simply a way to adapt from more specific markup to the Document DTD. So, my block would need to change only the last two resources 'document2html.xslt' and 'logo.gif'.

The best solution is to allow my block to explicitly "extend" that block and inherit the resources that it doesn't contain.

   block B extends block A

         /stylesheets/document2html.xslt
         /resources/logo.gif

but then block B is still considered to implement behavior http://mystuff.org/skin/1.1 because the rest is inherited.

This mainly:

  • reduces block development and maintenance costs, because changes and bugfixes are directly inherited by all the extending blocks, thus allowing better SoC between the two groups maintaining the different blocks
  • eases customization: blocks can be adapted to specific personal needs simply by wrapping them, without the need for repackaging.
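
The fall-back lookup described above can be sketched in a few lines. This is only an illustration under assumptions: the Block class, its fields and the placeholder resource contents are hypothetical, and real blocks would serve resources through their sitemap rather than from a dictionary.

```python
class Block:
    """Toy block: a set of resources plus an optional extended block."""

    def __init__(self, name, resources, extends=None, behavior=None):
        self.name = name
        self.resources = dict(resources)  # resource path -> content
        self.extends = extends            # the block being extended, if any
        self._behavior = behavior

    @property
    def behavior(self):
        # An extending block inherits the behavior it does not declare itself.
        if self._behavior is not None:
            return self._behavior
        return self.extends.behavior if self.extends else None

    def lookup(self, path):
        """Return (owning block name, content), falling back to extended blocks."""
        block = self
        while block is not None:
            if path in block.resources:
                return block.name, block.resources[path]
            block = block.extends
        raise KeyError(path)


block_a = Block("A", {
    "/stylesheets/changes2document.xslt": "changes stylesheet from A",
    "/stylesheets/faq2document.xslt": "faq stylesheet from A",
    "/stylesheets/document2html.xslt": "html stylesheet from A",
    "/resources/logo.gif": "logo from A",
}, behavior="http://mystuff.org/skin/1.1")

block_b = Block("B", {
    "/stylesheets/document2html.xslt": "html stylesheet from B",
    "/resources/logo.gif": "logo from B",
}, extends=block_a)

overridden = block_b.lookup("/stylesheets/document2html.xslt")
inherited = block_b.lookup("/stylesheets/faq2document.xslt")
```

Block B carries only the two resources it changes; everything else, including the declared behavior, is found by walking up to block A.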

Part 2: technical details

Ok. Now that we have described where we want to go, let's describe how.

Cocoon Blocks

A Cocoon block is a zipped archive, just like JARs and WARs.

The suggested extension of a cocoon block is ".cob" (for COcoon Block).

The suggested MIME type is "application/x-cocoon-block".

A Cocoon Block (COB from now on) includes a directory called

  /BLOCK-INF

which contains all the block metadata and the resources that must not be directly referenceable from other blocks (for example, jars, classes or file resources made available thru the classloader). The directories

  /BLOCK-INF/classes
  /BLOCK-INF/jar

are used for classes and jar files. (This follows the WAR paradigm)

The main COB descriptor file is found at

  /BLOCK-INF/block.xml

This file contains markup with a cob-specific namespace and will include the following information:

  1. block implementation metadata:
       • unique URI identifier (this identifier will also be used as an address on where to locate the block and how to download it from the web!) (example: http://mystuff.org/dist/myblock-1.5.34.cob)
       • version (1.5.34)
       • short name (My Block)
       • description
       • author
       • URI of license (http://mystuff.org/dist/license)
       • URI of the distribution location (http://mystuff/dist/latest/myblock.cob)
       • ???
  2. role(s):
       • the URI(s) of the behavioral role(s) this block implements and exposes (optional)
  3. dependencies:
       • the URI(s) of the behavioral roles this block expects, along with the prefixes used by the block as shortcuts in protocol resolving (see below for the meaning of this) (optional)
  4. inheritance:
       • the URI of the block extended. (optional)
  5. sitemap:
       • the location inside the block file space of the sitemap (optional, if not found defaults to '/sitemap.xmap')
  6. configurations:
       • the configurations required for this block to function (optional)

Also, the /BLOCK-INF/ directory contains the 'roles' file for Avalon components:

  /BLOCK-INF/roles.xml
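
The schema of block.xml is not defined yet, so the following is only a sketch of what reading such a descriptor might look like. The namespace URI, every element and attribute name, and the 'myapp' and 'baseblock' URIs are invented for illustration; only the kinds of information (identifier, version, roles, dependencies with prefixes, inheritance, sitemap default) come from the list above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; the real cob namespace has not been chosen.
NS = {"cob": "http://example.org/cocoon/block/1.0"}

# Hypothetical descriptor; element and attribute names are assumptions.
DESCRIPTOR = """\
<block xmlns="http://example.org/cocoon/block/1.0"
       id="http://mystuff.org/dist/myblock-1.5.34.cob" version="1.5.34">
  <name>My Block</name>
  <license href="http://mystuff.org/dist/license"/>
  <implements behavior="http://mystuff.org/myapp/1.0"/>
  <requires behavior="http://mystuff.org/skin/1.1" prefix="skin"/>
  <extends block="http://mystuff.org/dist/baseblock-1.0.cob"/>
</block>
"""


def read_descriptor(xml_text):
    """Extract the metadata a block manager would need from a descriptor."""
    root = ET.fromstring(xml_text)
    sitemap = root.find("cob:sitemap", NS)
    extends = root.find("cob:extends", NS)
    return {
        "id": root.get("id"),
        "version": root.get("version"),
        "name": root.findtext("cob:name", namespaces=NS),
        "implements": [e.get("behavior")
                       for e in root.findall("cob:implements", NS)],
        # prefix -> behavior URI, used later when resolving block: URIs
        "requires": {e.get("prefix"): e.get("behavior")
                     for e in root.findall("cob:requires", NS)},
        "extends": extends.get("block") if extends is not None else None,
        # The sitemap location is optional and defaults to /sitemap.xmap.
        "sitemap": sitemap.get("src") if sitemap is not None else "/sitemap.xmap",
    }


descriptor = read_descriptor(DESCRIPTOR)
```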

Possible use-case scenario

Suppose you have your naked cocoon running in your favorite servlet container, and you want to deploy myblock.cob. Here is a possible sequence of actions on a hypothetical web interface on top of Cocoon (a-la Tomcat Manager):

  1. upload the myblock.cob to Cocoon
  2. Cocoon scans /BLOCK-INF/, reads block.xml and finds out the behaviors this block depends on as well as the block that it extends.
  3. the block manager connects to the uber "Cocoon Block Librarian" web service (hosted probably on cocoon.apache.org) and asks for the list of blocks that exhibit that required behavior.
  4. the librarian returns a list of those blocks, so the user chooses one; alternatively, the manager allows the user to deploy their own block that implements the required behavior, or to reuse already deployed blocks that implement it.
  5. Cocoon checks that all dependencies are met, then unpacks and installs the blocks
  6. For each block that exposes a sitemap, the deployment manager asks the deploying user where he/she wants to *mount* that block in the managed URI space or if he/she wants to keep them internal only (thus only available to the other blocks, but not mounted on the public URI space)
  7. for each block that requires installation-time configurations, the block manager will present the user information on how to configure the block.
  8. If no collisions in the URI spaces are found, the blocks are made available for servicing.
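
Step 5 of the scenario above (checking that all dependencies are met) could look roughly like the sketch below. The function name, the block names and the two example.org behavior URIs are hypothetical.

```python
def missing_behaviors(required, deployed):
    """Return the required behavior URIs that no deployed block implements.

    required: iterable of behavior URIs the uploaded block depends on.
    deployed: dict mapping a deployed block name to the behavior URIs it exposes.
    """
    provided = {uri for behaviors in deployed.values() for uri in behaviors}
    return sorted(set(required) - provided)


# Hypothetical state of a running Cocoon instance.
deployed = {
    "my-skin": ["http://mystuff.org/skin/1.1"],
    "fop-block": ["http://example.org/behaviors/fo-pdf/1.0"],
}
# Behaviors that the uploaded block declares as dependencies.
required = [
    "http://mystuff.org/skin/1.1",
    "http://example.org/behaviors/svg-png/1.0",
]

unmet = missing_behaviors(required, deployed)
```

A non-empty result is the list the block manager would take to the librarian in step 3; an empty one means the block can be unpacked and installed.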

Resource dereferencing

Security concerns aside, the above scenario shows one major issue: blocks are managed, deployed and mounted by the container. There is not (and there should not be) a way for a block to directly access another block, because this would ruin IoC.

So, one block doesn't know where the blocks it depends on are located, both on disk *and* on the URI space as well.

The proposed solution is to use block-specific protocols to identify the dereferenced resources.

For example, the myblock.cob/sitemap.xmap file could contain a global matcher which works like this:

  <map:match pattern="**/*.html">
     <map:generate src="{1}.xml"/>
     <map:transform src="block:skin:/stylesheets/document2html.xslt"/>
     <map:serialize/>
  </map:match>

please note the

  block:skin:/stylesheets/document2html.xslt

which indicates

  • block -> use the block protocol
  • skin -> use the 'skin' prefix to look up the block behavior URI and thus the block which implements it for this block (the block manager knows this)
  • /stylesheets/document2html.xslt -> it will ask the sitemap of the skin block to produce that resource.
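
The three-part resolution just described can be sketched as follows. This is an assumption-laden illustration: the function names, the two lookup tables and the block name "my-skin" are hypothetical, and a real implementation would hand the path to the target block's sitemap instead of returning it.

```python
def parse_block_uri(uri):
    """Split 'block:<prefix>:<path>' into (prefix, path)."""
    scheme, prefix, path = uri.split(":", 2)
    if scheme != "block" or not path.startswith("/"):
        raise ValueError(f"not a block URI: {uri}")
    return prefix, path


def dereference(uri, prefixes, wiring):
    """Resolve a block URI to (implementing block, path inside that block).

    prefixes: prefix -> behavior URI, as declared by the calling block.
    wiring:   behavior URI -> block chosen by the block manager.
    """
    prefix, path = parse_block_uri(uri)
    behavior = prefixes[prefix]   # the calling block only knows the behavior
    return wiring[behavior], path  # only the manager knows the actual block


prefixes = {"skin": "http://mystuff.org/skin/1.1"}
wiring = {"http://mystuff.org/skin/1.1": "my-skin"}  # hypothetical block name

target = dereference("block:skin:/stylesheets/document2html.xslt",
                     prefixes, wiring)
```

The calling block never learns where the skin block lives; it only names the prefix it declared, which keeps IoC intact.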

Dereferencing navigation

Not only does a sitemap need to connect to the resources contained in the blocks on which its block depends, but so do the resulting pages.

In fact, suppose you have a block that exposes a web service and another one that exposes a web application that wraps that web service. For sure, the generated web page will have to have a URI to connect to that service, since it's the client's browser that makes the call (unless we want to virtualize everything thru the sitemaps, but I wouldn't suggest it).

So, a possible solution is to use the "block:" protocol in the pages as well and have a URI-mapping transformer right before the serialization stage.

For example, markup like

   <form action="block:web-service:/post">...</form>

is transformed into

   <form action="/servizio-web/post">...</form>
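
As a rough illustration of that mapping step, the sketch below rewrites block: URIs using a table of mount points. It is a stand-in under assumptions: a real Cocoon transformer would work on SAX events rather than on a string with a regular expression, and the function name and the mounts table are hypothetical.

```python
import re

# Matches block:<prefix>:<path> up to the closing quote or whitespace.
BLOCK_URI = re.compile(r"""block:([A-Za-z0-9_-]+):(/[^"'\s]*)""")


def map_block_uris(markup, mounts):
    """Replace every block: URI with the public URI of the mounted block.

    mounts: prefix -> mount point chosen by the deployer at install time.
    """
    def replace(match):
        prefix, path = match.group(1), match.group(2)
        return mounts[prefix].rstrip("/") + path

    return BLOCK_URI.sub(replace, markup)


mounts = {"web-service": "/servizio-web"}
page = '<form action="block:web-service:/post">...</form>'
rewritten = map_block_uris(page, mounts)
```

The page author writes the prefix, and only the deployment decides that the service happens to be mounted at /servizio-web.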

Some design decisions taken

NO BEHAVIOR VALIDATION

I thought a lot about it but I think that having 'behavior description languages' (such as the WSDL-equivalent for blocks) is going to be terribly complicated, expensive to implement and hard to use and enforce, even for simple blocks which don't expose a sitemap and are just repositories for information.

For this reason, there is no validation taking place: if a block implements a particular behavior and exposes it thru its descriptor file, Cocoon automatically assumes it implements the behavior correctly.

In the future, we might think of adding a behavior description layer to enforce a little more validation, but I fear the complexity (for example) of validating stylesheets against a particular required behavior.

IMO, only human try/fail and patching will allow interoperability.

VERSIONING AS PART OF THE BEHAVIOR URI

The behavior URI *MUST* terminate with a /x.y that indicates the major.minor version of the behavior that a block implements.

On dependencies, each block must be able to specify the 'ranges' of versioning that it is known to work with. For example

   <block behavior="http://xml.apache.org/forrest/skin/1.x" prefix="skin"/>

But I haven't really thought about the patterns that could be used for this.

Please, help on this.
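
As one possible starting point for that discussion, here is a sketch of the simplest reading of a pattern like "1.x": the major version must match and "x" accepts any minor version. This interpretation is an assumption, not a decision, and the function names are hypothetical.

```python
def split_behavior(uri):
    """Split a behavior URI into (base URI, 'major.minor' version string)."""
    base, _, version = uri.rpartition("/")
    return base, version


def satisfies(implemented_uri, required_uri):
    """Check an implemented behavior against a required behavior pattern.

    Assumed rule: same base URI, same major version, and the required minor
    version is either 'x' (any) or equal to the implemented one.
    """
    impl_base, impl_version = split_behavior(implemented_uri)
    req_base, req_version = split_behavior(required_uri)
    if impl_base != req_base:
        return False
    impl_major, impl_minor = impl_version.split(".")
    req_major, req_minor = req_version.split(".")
    return impl_major == req_major and req_minor in ("x", impl_minor)


SKIN = "http://xml.apache.org/forrest/skin"
matches_wildcard = satisfies(SKIN + "/1.1", SKIN + "/1.x")
rejects_other_major = satisfies(SKIN + "/2.0", SKIN + "/1.x")
```

Anything richer (open-ended ranges, minimum minor versions) would need a pattern syntax that is still to be agreed on.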

CROSS-BLOCK SECURITY

Even if I don't think anybody is stupid enough to use a single Cocoon instance to run a full ISP and ask for sandboxing of the single blocks, cross-block security is a big concern, especially since you might be deploying components on the fly in a binary format.

So, first thing is to protect the /BLOCK-INF/ directory.

The second thing is to wrap each block with its own classloader, connected to the block dependency map, so that each class discovery is done only on the class space of the dependent blocks.
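
The effect of tying class discovery to the dependency map can be shown with a small model. It is only a sketch: real isolation would be done with Java classloaders, the class names are made up, and following dependencies transitively is an assumption the text above does not settle.

```python
def visible_classes(block, classes, depends_on):
    """Collect the class names a block may discover.

    A block sees its own classes plus those of the blocks it depends on
    (followed transitively here, which is an assumption).
    """
    seen = set()
    visible = set()
    stack = [block]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        visible |= classes.get(current, set())
        stack.extend(depends_on.get(current, ()))
    return visible


# Hypothetical class spaces for the blocks used in the earlier diagrams.
classes = {
    "webapp1": {"example.webapp1.Main"},
    "webapp2": {"example.webapp2.Main"},
    "fop": {"example.fop.PdfSerializer"},
    "batik": {"example.batik.PngRasterizer"},
}
depends_on = {"webapp1": ["fop"], "webapp2": ["fop", "batik"]}

webapp1_view = visible_classes("webapp1", classes, depends_on)
```

webapp1 can reach the FOP classes it declared a dependency on, but nothing from Batik or from webapp2.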

NOTE: this doesn't prevent people from using blocks as trojans, but we won't host blocks which don't come with the source code so we solve that problem.

COCOON MANAGER SECURITY

The cocoon manager might be a block itself that connects to specific cocoon internals and provides a web interface for them. So, it can be removed or disabled in production.

Also, the feature of automatic discovery of blocks thru the 'cocoon block library' can be turned off or substituted with your own (even the 'cocoon block library' could be a block, so you could have your own block library on your system instead of connecting to the apache one).

OPTIONAL COP

The block.xml file makes it *optional* to expose behaviors or to depend on them. This allows the COP model to nicely downgrade to the good old single-archive WAR paradigm for those who don't care about block polymorphism.

Conclusions

I think I have laid out a detailed plan on how to implement blocks and solve a number of issues we are having:

  • allow users to 'compose' Cocoon only with those modules they need
  • allow users to easily deploy their stuff on cocoon
  • allow users to easily reuse web applications components without sacrificing coherence, interoperability and easy extensibility
  • allow Cocoon to help users 'fill the gaps' by suggesting which components are required and fetching them automatically (apt-get like)
  • allow the Cocoon communities to clearly separate concerns between the core and the application-level stuff, thus allowing the cocoon community to really scale by massive development parallelization
  • allow, for the first time in the history of the web, the use of polymorphism, inheritance and COP at the web application level.

THANKS

I would like to thank Giacomo Pati and Carsten Ziegler for their great contribution and precious feedback.

----

CODA

Changes from version 1.0

  • added the concept of block inheritance
  • wrote a scenario for introduction of the COB model as an evolution of the WAR model.

  • added configurations to blocks
  • changed block.info and blocks.roles into block.xml and roles.xml
  • removed issues already identified by the first round of design

TODO

Blocks should be able to depend on 'ranges' of behavior versions.

Let's try to come up with a way to describe those ranges effectively.

The block manager should present the user with a form on how to configure the block

... thus the block should contain enough configuration metadata (default values, valid entries, etc.) to tell the block manager how to create the form to present. Should we use RDF for this, or are schemas good enough?

Which avalon container should we use?

The one we currently use (ECM) is not powerful enough. Is there already a container which is powerful enough to handle our needs as described here? If not, what do we do? Do we implement our own or work with the avalon people to fix theirs to meet our needs?

How do we implement the block manager?

Should it be a command line interface or a web interface, or both? What about security?

The 'uber library of cocoon blocks'

Where do we host it? How do we manage it? How do we provide the block discovery web service? Which technology do we use: SOAP or REST?

Should we "digitally sign" our blocks? If so, how?

Reader's comments

Please don't change the content above this line, except to add footnote references.

Question on versioning

Why must the behavior URI terminate with a /x.y that indicates the major.minor version of the behavior that a block implements? If each block must be able to specify the 'ranges' of versioning that it is known to work with, then why not adopt the functionality offered by XML attributes? For example:

   <block behavior="http://xml.apache.org/forrest/skin" start="1.x" end="2.0" prefix="skin"/>

How to do a first microstep towards the goal.

Taken from a mail from SylvainWallez on cocoon-dev.

> 
> The TreeProcessor works by creating an evaluation tree of 
> ProcessingNodes corresponding to sitemap statements. It asks a 
> TreeBuilder to create this tree and then handles requests with it.
> 
> The TreeBuilder reads the sitemap file (in an Avalon Configuration 
> object) and builds this tree by invoking a ProcessingNodeBuilder for 
> each element encountered in the sitemap. The ProcessingNodeBuilder in 
> turn creates an appropriate ProcessingNode that will be used at runtime 
> to "execute" the sitemap.
> 
> The ProcessingNode isn't created directly from the sitemap element, 
> since some sitemap elements don't always lead to identical processing 
> depending either on their attributes (e.g. <map:call resource=""> and 
> <map:call function="">) or the used components (e.g. <map:match> which 
> is different for regular Matcher and PreparedMatcher).
> 
> The DefaultTreeBuilder has a createComponentManager() method that 
> creates - guess what? - the CM that is to be used within the processing 
> tree to lookup components. In that default implementation, this is just 
> the "current" one (i.e. the one passed to "compose()").
> 
> But if you look at SitemapLanguage, which is a subclass of 
> DefaultTreeBuilder, you will notice that its createComponentManager() 
> method creates a new CocoonComponentManager and configures it with 
> <map:components>. So <map:components> defines components of the sitemap 
> just as cocoon.xconf defines them for the Cocoon object.
> 
> Adding a custom classloader to the sitemap to handle blocks should thus 
> be just a matter of giving that custom classloader to the created CM.
>