If you just want to get started editing a page:
Edit (to skip this step hack the bookmarklet to add an 'action=edit' param to the bookmarklet's query string)
Submit to save your edit to the workarea
Commit to save the updated file to SVN and trigger a staged build. (to skip this step click on the "Quick Commit" checkbox in the
Publish Site to deploy.
This section describes the current conditions of the ASF website publishing system and its deficiencies. It also discusses options the Infrastructure Team considered in addressing these problems with an eye towards our future needs.
The existing publishing system at Apache has evolved from the case where the organization's hardware consisted of a single machine. Websites have always been limited to using a combination of static content and cgi scripts in order to not overtax a machine simultaneously responsible for delivering (circa 2000-2003) over 1M hits and serving committers as our CVS master host.
The organization has since grown to encompass about three full cabinets worth of hardware and a pair of machines dedicated to serving mainly www.apache.org and project websites. The machines, eos and aurora, are some of our most expensive equipment and are located in two different datacenters to provide redundancy and failover capabilities. The current traffic load is roughly 20M hits a day for those machines.
However the publishing system involves running hourly
find jobs on
people.apache.org and pushing that content out to eos and aurora with
rsync. With roughly 300GB worth of content to scan it is no longer
possible to do this with a single
find job, so we now run them in
parallel: one find job per website. This puts an incredible load on
people.apache.org's ZFS array as there are roughly 100 sites to scan.
As good as ZFS is, the filesystem will not be able to keep up with this
load as the organization continues to promote new top-level projects.
Several years ago during the wiki craze at Apache, the Infrastructure Team was tasked with setting up a Confluence installation for our projects to use. Apache member Pier Fumagalli developed and offered the autoexport plugin as a way to provide Confluence-backed project websites, which was quickly adopted by several projects. The process involves rsyncing the autoexported pages from the machine hosting Confluence over to people.apache.org, where the standard publication system described above pushes those pages out to eos and aurora to be served live.
Over time we began to experience chronic problems with this particular setup. First off, different projects often wanted to use different and occasionally conflicting plugins for their sites. Secondly, plugins would often break during Confluence upgrades. The biggest offender was in fact the autoexport plugin and its reliance on Confluence internals. Virtually every upgrade was guaranteed to break it, and after a while Pier and other Java developers at Apache lost interest in supporting it. We asked around for people to support it, and were even willing to compensate folks for their time, but there were no takers. Confluence-backed websites were fully dependent on the autoexport plugin to have any chance of working, and the organization was caught between a rock and a hard place in deciding when it was possible to upgrade Confluence.
The other main problem with this configuration is that it makes URL deletions a nightmare. The autoexport plugin doesn't support URL deletions, and that limitation is carried through to the live sites via rsync.
Currently Apache's Confluence installation is hosted on thor, which is a Sun T5220 Sparc. It's by far our beefiest machine with 8 cores and 8 threads per core, and yet our Confluence service is dog slow. Our installation is simply out-scaling the software, and to keep it performing acceptably will require even more significant equipment investments going forward.
Anakia was a great tool 10 years ago. It is a competing technology to XSLT for dealing with raw XML content. Many projects still rely on anakia to generate their webpages but most of the web has moved on. It's time the ASF caught up with the times.
While Apache is still primarily a place for software developers to collaborate, some of the people who provide support for our press and legal efforts need to be able to contribute to www.apache.org. Expecting them to deal with tools like Anakia to roll their own builds of XML-based content is a non-starter.
Obviously with hourly crons pushing content out to our webservers there
will be delays as long as 2 hours between the time someone commits a change
and logs onto people.apache.org to
svn up the website, and the time it
actually gets synced to the live site. That has been the status quo at
Apache for several years and it simply isn't good enough any longer.
While there is a zoo of available Open Source CMSes to choose from, only a handful of them actually support exports of static content. Even fewer of them offer support for staging. Apache's project websites aren't like Twitter: they don't have rapidly changing content that needs to be updated and delivered in real-time. The sites are meant to provide stable resources for the public to gain necessary information about the software we develop.
While not an open source offering, Roy T. Fielding pursued a CQ5 installation for the organization's use. Roy demoed the featureset at ApacheCon US 2009 and the members of the Infrastructure Team who saw it were thoroughly impressed. It seemingly met all of our core requirements.
However conditions changed in 2010 for Roy, and he simply lost any free time he could have put to this effort. We had to eliminate this as an option going forward, but thank Roy and Day for their time and consideration.
Lenya had most of the features we were looking for, but ultimately was rejected as being insufficiently flexible for use as a foundation-wide CMS. Allowing projects the flexibility of deploying per-project site build technologies which were only limited by the software installation on the build host was the Infrastructure Team's preferred strategy.
In September 2010 Philip Gollucci, VP Infrastructure, gave the green light to a custom-built CMS for the ASF, to be developed primarily by one of the contracted System Administrators. After collecting feedback on the goals and requirements of several interested parties, the development work was undertaken with a goal of completing the work in 60 days or less, just in time for ApacheCon 2010 NA. Fortunately the goals were kept simple enough that the actual development time only spanned about 30 days.
The software follows the Unix development mantra of separate executables for independent activities. The key separation was to ensure content presentation was kept independent from content editing, using the addressability of the web to sew things together. The main advantage of this approach is that it imposes relatively few constraints on the content generation software: different projects may adopt different tools to build their websites, without any of the conflicts inherent in single-process plugin architectures like Confluence.
Although Dotiac::DTL, a Perl port of Django's templating library, was chosen
for use with www.apache.org, it is not a requirement that projects adopt
it. Any templating system that runs on FreeBSD may be used, provided
the necessary (Perl) glue code is written that makes the system compatible
with the CMS's build system.
The CMS relies on buildbot to provide automated builds and checkins of a project's staging site. Such builds are triggered instantly on commits to the project's site source material and are an essential component of the system.
The build system executes builds in parallel, so it is quite fast, even for a full site build.
Although it is strongly recommended that projects migrating to the CMS adopt markdown, it is not a hard requirement. In fact, the CodeMirror editor is also provided as an option for those who prefer to store their source content in raw html.
The CMS's overall design was influenced heavily by django's architecture. From the build system to the preferred template system to the webgui, the influences are clear and obvious to anyone familiar with django.
Instead of developing versioning support and a notification scheme into a database driven CMS, Apache's subversion infrastructure was chosen as the central data store for everything. The fact that the web interface to the CMS interacts with the subversion repository in a LAN environment, combined with the lightning-fast SSDs that serve as l2arc cache for the underlying FreeBSD ZFS filesystem, eliminates virtually all subversion network/disk latency. Subversion continues to scale past 1M commits to deliver high performance to Apache developers, as well as to our internal programs that rely on it.
The mod_perl based webgui is under 3500 LOC and takes full advantage of the httpd module API. Being an in-process application it is respectably fast and will scale well even on the limited hardware (a FreeBSD jail) that it runs on.
It was also designed for humans already familiar with the featureset
of the svn command-line tool, taking cues from the Emacs
However it is accessible even to those without any familiarity with
bookmarklet allows users to go
from a live webpage to a WYSIWYM editor session in 2 clicks. Submitting,
committing, and publishing those changes is just as simple and
straightforward. You may access the CMS anonymously
if you are not currently an Apache committer.
Because the webgui revolves around providing users with a temporary server-side working copy, the urls it generates are not meant to be bookmarked, and are forbidden from being shared with others. The fulcrum for sharing changes is the staging site, and the "commits are easy and cheap" concept is built into the webgui.
However the url for publishing a website may be considered appropriate
for writing a basic web service client app. Since the site is based in
subversion, developers may check out the site and commit directly from
their workstations instead of through the webgui, so it may be convenient
for project members to have a simple site publication script. This choice
is entirely up to each project, and a reference implementation is available
at http://s.apache.org/cms-cli. Virtually every resource on the site may
be directed to be served as
application/json simply by adding
to the query string, or by setting
application/json as being preferable to
text/html in the "Accept" request header.
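The "Accept" header route can be sketched as follows. This is a minimal illustration, assuming a hypothetical resource URL and a hypothetical q-value for text/html; it only builds the request and does not contact any server.

```python
from urllib.request import Request


def json_request(url: str) -> Request:
    """Build (but do not send) a request that prefers JSON over HTML."""
    # The lower q-value marks text/html as an acceptable but less preferred type.
    # The 0.5 weight is an arbitrary choice for this sketch.
    return Request(url, headers={"Accept": "application/json, text/html;q=0.5"})


# Hypothetical URL, used only for illustration.
req = json_request("https://cms.example.org/some/resource")
print(req.get_header("Accept"))
```

Passing the resulting object to urllib.request.urlopen would send it; whether a given resource honours the header is up to the server.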
In order to scale effectively to handling multi-gigabyte size websites, the
webgui relies on zfs clones to create per-user working copies. The alternative
algorithm would be to physically copy (with say
cp -R) working copy
trees, but such algorithms are O(N) whereas a zfs clone (essentially a copy-on-write
version of the original) is O(1).
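As a rough sketch of why the clone approach is cheap, the two zfs commands involved can be assembled like this. The dataset names and the naming scheme are assumptions made for illustration, not the webgui's actual layout, and the commands are only constructed, never executed.

```python
def clone_commands(source: str, user: str, stamp: str) -> list[list[str]]:
    """Return the zfs commands for a copy-on-write working copy of `source`."""
    snapshot = f"{source}@{user}-{stamp}"   # hypothetical snapshot naming scheme
    target = f"{source}-wc-{user}"          # hypothetical per-user dataset name
    return [
        ["zfs", "snapshot", snapshot],       # freeze the current state of the tree
        ["zfs", "clone", snapshot, target],  # writable clone sharing unchanged blocks
    ]


for argv in clone_commands("tank/www", "alice", "20101101"):
    print(" ".join(argv))
```

Neither command walks the file tree, so the cost does not grow with the size of the site, whereas cp -R has to visit every file.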
was developed by Paul Querna to provide an infrastructure for
distributing change notifications to our frontline webservers (eos and
aurora). This system is used by the CMS to convert site publication
requests into live publications, and will eventually supplant the
find + rsync architecture for site publication. It
is a key component of Apache's infrastructure and will continue to be
promoted going forward, even for those projects who elect not to use
Despite the above remarks, there is still room for supporting the generation of "dynamic" content, in the same fashion that Planet Apache works. Namely buildbot may be set up to run periodic builds of select urls that have dynamic content, and to subsequently publish the results of those builds. While it is possible to run these jobs more frequently than once an hour, it is not recommended because of the email notification traffic such builds generate.
Since the CMS relies on separate sections of svn for original content and staging versus publication, it is possible to configure more relaxed ACLs for content authors versus those capable of publication. The Infrastructure Team recommends that the content on www.apache.org be editable by the full committership, while publication remains restricted to members, committers with apsite karma, and members of the Infrastructure Team.
This section lists the requirements for projects electing to adopt the CMS.
The original source tree MUST have the following layout:
trunk/
    content/        (location of actual site content)
    lib/            (only required for projects using the standard perl build system)
        path.pm     (the analog of django's urls.py)
        view.pm     (the corresponding views)
    cgi-bin/        (optional cgi directory)
    templates/      (location of site-wide templates)
branches/           (optional branches, currently unused)
The source content MUST have a unique file extension for each
generated file. I.e. you cannot generate
the same source file living in the same directory. You must disambiguate
the paths to these resources using copies or svn externals (symlinks are
not supported, sorry).
There is a further restriction in that the webgui and build system treat
foo.page/ directories as attachment directories. This convention
prevents any files contained therein from being built, but they may be treated as
content components (e.g. html snippets and images) for an individual webpage.
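A minimal sketch of that convention, assuming it applies to any directory whose name ends in ".page" (the helper name and example paths are hypothetical):

```python
from pathlib import PurePosixPath


def is_attachment(path: str) -> bool:
    """True if `path` lives somewhere under a directory named like foo.page/."""
    parents = PurePosixPath(path).parent.parts
    return any(part.endswith(".page") for part in parents)


print(is_attachment("docs/intro.page/logo.png"))  # inside an attachment directory
print(is_attachment("docs/intro.md"))             # ordinary source file
```

A build loop following this convention would skip files for which the check is true and leave them available for the owning page to include.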
Moreover the source files MUST be UTF-8, no exceptions.
Content source files with
.md extensions are typically expected to
contain optional RFC-compliant (mail or http) headers at the top of the file, or YAML
headers as is customary in comparable, modern static site generation tools.
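To illustrate the mail-style variant, here is a small sketch that splits such headers from the body using Python's standard email parser. The header names and sample text are made up for the example, and the CMS itself does this in Perl, so this shows the file shape rather than its actual parser.

```python
from email import message_from_string

# Hypothetical source file: optional headers, a blank line, then markdown.
SOURCE = """Title: Example Page
Notice: hypothetical header

# Heading

Body text.
"""


def split_headers(text: str):
    """Return (headers, body) for a source file with optional leading headers."""
    msg = message_from_string(text)
    return dict(msg.items()), msg.get_payload()


headers, body = split_headers(SOURCE)
print(headers)
print(body)
```

A file that starts directly with markdown yields an empty header dict and the full text as the body, which matches the headers being optional.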
The build system is under 2000 LOC and relies on
lib/path.pm to provide a specially formatted
@patterns array to give the build system hints on which view to run for
a given source file. The patterns are checked in order, and if none of
the patterns match, the source file will simply be copied over to the build
tree. Each element of the
@patterns array is an arrayref which consists of
3 items: the pattern to test, the name of the view function to call, and a
hashref of named parameters to pass (by value) to the view function. The
patterns are tested against files based on their location rooted within the
lib/path.pm may also provide a hash
%dependencies mapping paths to array
refs. The keys list names of files which will also be rebuilt whenever a
file matching a value has changed.
(This is typically used for sitemaps.)
The filenames in the values and also
listed in the keys are rooted in the
content/ subdirectory. The
dependency calculation is transitive.
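The transitive calculation can be sketched as a simple worklist. This assumes the values are shell-style globs and uses hypothetical file names; the real table lives in lib/path.pm and may match files differently.

```python
from fnmatch import fnmatch

# Hypothetical table: each key is rebuilt when a file matching one of its values changes.
DEPENDENCIES = {
    "sitemap.html": ["docs/*.md"],
    "index.html": ["sitemap.html"],
}


def files_to_rebuild(changed: str, dependencies: dict) -> set:
    """Return every key that must be rebuilt, directly or transitively."""
    rebuild = set()
    queue = [changed]
    while queue:
        current = queue.pop()
        for target, sources in dependencies.items():
            if target in rebuild or target == changed:
                continue
            if any(fnmatch(current, pattern) for pattern in sources):
                rebuild.add(target)
                queue.append(target)  # a rebuilt file may trigger further rebuilds
    return rebuild


print(sorted(files_to_rebuild("docs/intro.md", DEPENDENCIES)))
```

Here a change to docs/intro.md rebuilds sitemap.html directly, and index.html follows because it depends on sitemap.html.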
The build system also requires each view function to
return 2 values, the first being the generated content, and the second
being the new file extension.
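Putting the pattern table and the view contract together, the dispatch might look roughly like the sketch below. It is written in Python for illustration only (the real system is Perl), treats patterns as regular expressions by assumption, and the view name, template parameter, and paths are hypothetical.

```python
import re


def render_page(path, source, **params):
    """Hypothetical view: returns the generated content and the new extension."""
    return f"<html><body>{source}</body></html>", "html"


VIEWS = {"render_page": render_page}

# Each entry: (pattern, view name, named parameters), checked in order.
PATTERNS = [
    (r"\.md$", "render_page", {"template": "page.html"}),
]


def build(path: str, source: str):
    """Return (output path, output content) for one source file."""
    for pattern, view_name, params in PATTERNS:
        if re.search(pattern, path):
            # Pass a copy so the view cannot alter the shared parameter table.
            content, ext = VIEWS[view_name](path, source, **dict(params))
            stem = path.rsplit(".", 1)[0]
            return f"{stem}.{ext}", content
    return path, source  # no pattern matched: copy the file over unchanged


print(build("docs/intro.md", "hi"))
print(build("images/logo.png", "raw"))
```

The first matching pattern wins, so more specific patterns would need to come before general ones in the table.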
The build system will always take the local path to
the current working directory for the build (branches are currently
Changes to either the
lib/ subdirs will trigger a full site
A detailed walkthrough is available for folks working on site design.
Note that the typical ASF::View
based views now support template preprocessing of source content by passing a
preprocess => 1
argument to the configured view in
With the introduction of
svn 1.7+ working copies, it becomes possible to plug in
a wide variety of functionally similar build systems to the standard perl system
described above- think
forrest, etc. If this interests you please
discuss the matter further on the
infrastructure@ mailing list. It is not unfair
to describe this CMS as simply a CI tool with a basic web browser interface.
This section describes the future plans for Apache Infrastructure as it relates to website publication.
After going live with www.apache.org, the next project we would like to tackle is the incubator website. It too is based on anakia, but thanks to Sam Ruby there is an xslt file available to help automate the conversion from xdoc to markdown sources. We would like to complete this migration by March 1, 2011.
After migrating the incubator site we will branch out to approach any Apache project still using anakia to convert to the CMS. This will of course be a project decision, but we hope the advantages of migration will be clear and well appreciated by pmc members. We hope to complete this process during the summer of 2011. Update: see ant adoption for new options for projects still stuck on Anakia.
The next long-term project to tackle is the eventual phaseout of Confluence backed websites. This will be an extensive project which will require development of content conversion tools, but the clock is ticking on how long we can continue to run Confluence without any support for the autoexport plugin. Update: see confluence adoption for new options for projects still stuck on Confluence.
The final long-term objective is to completely eliminate people.apache.org as the publication hub for Apache websites. Security considerations alone make this a worthwhile goal, and to make this happen we would like to mandate the adoption of at least svnpubsub for all projects by the end of 2012.
As of 1 Nov 2010, this ASF CMS system is now running the main www.apache.org site.
The code for the CMS itself is being developed by the Infrastructure Team, and you can follow its Subversion repository.
We are considering turning the CMS into a proper Apache project starting with
an incubator podling. If this interests you, please contact
and sign up!