Choosing a Storage Layer
Choosing a Storage Layer¶
When building a CMS, the choice of storage layer is one of the key decisions to take. Many factors must be considered, the good news is that with all the components and Bundles in the CMF, it takes extra care to provide the necessary extension points to ensure the CMF remains storage layer agnostic.
The goal of this tutorial is to explain the considerations and why Symfony CMF suggest PHPCR and PHPCR-ODM as the ideal basis for a CMS. However, all components and Bundles can be integrated with other solutions with a fairly small amount of work.
Requirements for a CMS Storage Layer¶
At the most fundamental level a CMS is about storing, so the first requirement is that a CMS must provide means to store content with different properties.
A CMS has very different storage needs than for example a system for processing orders. Do note however that it is entirely possible and very intended of the CMF initiative to enable developers to combine the CMF with a system for processing orders. So for example one could create a shopping solution using the CMF for storing the product catalog, while using another system for maintaining the inventory, customer data and orders. This leads to the second requirement, a CMS must provide means to reference content, both content stored inside the CMS, but also in other systems.
The actual content in a CMS tends to be organized in a tree like structure, mimicking a file system. Note that content authors might want to use different structures for how to organize the content and how to organize other aspects like the menu and the routing. This leads to the third requirement, a CMS must provide means to represent the content as a tree structure. Furthermore a fourth requirement is that a CMS should allow maintaining several independent tree structures.
In general data inside a CMS tends to be unstructured. So while several pages inside the CMS might be very similar, there is a good chance that there will be many permutations needing different extra fields, therefore a CMS must not enforce a singular schema for content. That being said, in order to better maintain the content structure and enabling UI layers from generically displaying content elements it is important to optionally be able to express rules that must be followed and that can also help attach additional semantic meaning. So a CMS must provide means to optionally define a schema for content elements.
This requirement actually also relates to another need, in that a CMS must make it easy for content authors to prepare a series of changes in a staging environment that then needs to go online in a single step. This means another requirement is that it is necessary that a CMS should support moving and exporting content between independent tree structures. Note that exporting can be useful also for backups.
When making changes it would however also be useful to be able to version the change sets, so that they remain available for historical purposes, but also to be able to revert whenever needed. Therefore the next requirement is that a CMS should provide the ability to version content.
As we live in a globalized world, websites need to provide content in multiple languages addressing different regions. However not all pieces of content need to be translated and others might only be eventually translated but until then the user should be presented the content in one of the available languages, so a CMS should provide the ability to store content in different languages, with optional fallback rules.
As a CMS usually tends to store an increasing amount of content it will become necessary to provide some way for users to search the content even when the user has only a very fuzzy idea about the content they are looking for, leading to the requirement that a CMS must provide full text search capabilities, ideally leveraging both the contents tree structure and the data schema.
Another popular need is limiting read and/or write access of content to specific users or groups. Ideally this solution would also integrate with the tree structure. So it would be useful if a CMS provides capabilities to define access controls that leverage the tree structure to quickly manage access for entire subtrees.
Finally not all steps in the content authoring process will be done by the same person. As a matter of fact there might be multiple steps all of which might not even be done by a person. Instead some of the steps might even be executed by a machine. So for example a photographer might upload a new image, a content author might attach the photo to some text, then the system automatically generates thumbnails and web optimized renditions and finally an editor decides on the final publication. Therefore a CMS should provide capabilities to assist in the management of workflows.
Here is a summary of the above requirements. Note some of the requirements have a must, while others only have a should. Obviously depending on your use case you might prioritize features differently:
- a CMS must provide means to store content with different properties;
- a CMS must provide means to reference content;
- a CMS must provide means to represent the content as a tree structure;
- a CMS must provide full text search capabilities;
- a CMS must not enforce a singular schema for content;
- a CMS must provide means to optionally define a schema for content elements;
- a CMS should allow maintaining several independent tree structures;
- a CMS should support moving and exporting content between independent tree structures;
- a CMS should provide the ability to version content;
- a CMS should provide the ability to store content in different languages, with optional fallback rules;
- a CMS should provides capabilities to define access controls;
- a CMS should provide capabilities to assist in the management of workflows.
Looking at the above requirements it becomes apparent that out the box an RDBMS is ill-suited to address the needs of a CMS. RDBMS were never intended to store tree structures of unstructured content. Really the only requirement RDBMS cover from the above list is the ability to store content, some way to reference content, keep multiple separate content structures and a basic level of access controls and triggers.
This is not a failing of RDBMS in the sense that they were simply designed for a different use case: the ability to store, manipulate and aggregate structured data. This makes them ideal for storing inventory and orders.
That is not to say that it is impossible to build a system on top of an RDBMS that addresses more or even all of the above topics. Some RDBMS natively support recursive queries, which can be useful for retrieving tree structures. Even if such native support is missing, there are algorithms like materialized path and nested sets that can enable efficient storage and retrieval of tree structures for different use cases.
The point is however that these all require algorithms and code on top of an RDBMS which also tightly bind your business logic to a particular RDBMS and/or algorithm even if some of them can be abstracted. So again using an ORM one could create a pluggable system for managing tree structures with different algorithms which prevent binding the business logic of the CMS to a particular algorithm.
However it should be said once more, that all Bundles and Components in the CMF are developed to enable any persistent storage API and we welcome contributions for adding implementations for other storage systems. So for example CMF RoutingBundle currently only provides Document classes for PHPCR ODM, but the interfaces defined in the Routing component are storage agnostic and we would accept a contribution to add Doctrine ORM support.
PHPCR essentially is a set of interfaces addressing most of the requirements from the above list. This means that PHPCR is totally storage agnostic in the sense that it is possible to really put any persistence solution behind PHPCR. So in the same way as an ORM can support different tree storage algorithms via some plugin, PHPCR aims to provide an API for the entire breath of CMS needs, therefore cleanly separating the entire business logic of your CMS from the persistence choice. As a matter of fact the only feature above not natively supported by PHPCR is support for translations.
Thanks to the availability of several PHPCR implementations supporting various kinds of persistence choices, creating a CMS on top of PHPCR means that end users are enabled to pick and choose what works best for them, their available resources, their expertise and their scalability requirements.
So for the simplest use cases there is for example a Doctrine DBAL based solution provided by the Jackalope PHPCR implementation that can use the SQLite RDBMS shipped with PHP itself. At the other end of the spectrum Jackalope also supports Jackrabbit which supports clustering and can efficiently handle data into the hundreds of gigabytes. By default Jackrabbit simply uses the file system for persistence, but it can also use an RDBMS. However future versions will support MongoDB and support for other NoSQL solutions like CouchDB or Cassandra is entirely possible. Again, switching the persistence solution would require no code changes as the business logic is only bound to the PHPCR interfaces.
As mentioned above using PHPCR does not mean giving up on RDBMS. In many ways, PHPCR can be considered a specialized ORM solution for CMS. However while PHPCR works with so called nodes, in an ORM people expect to be able to map class instances to a persistence layer. This is exactly what PHPCR ODM provides. It follows the same interface classes as Doctrine ORM while also exposing all the additional capabilities of PHPCR, like trees and versioning. Furthermore, it also provides native support for translations, covering the only omission of PHPCR for the above mentioned requirements list of a CMS storage solution.
This work, including the code samples, is licensed under a Creative Commons BY-SA 3.0 license.