Contents
- David’s 7 Simple Rules for creating a JCR Content Model and how they differ from traditional content model design
- Rule #1: Data first, structure later. Maybe.
- Rule #2: Drive the content hierarchy, don’t let it happen.
- Rule #3: Workspaces are for clone(), merge() and update().
- Rule #4: Beware of Same Name Siblings.
- Rule #5: References considered harmful.
- Rule #6: Files are files.
- Rule #7: IDs are evil.
David Nuescheler was a co-founder and CTO of Day Software AG, which Adobe acquired in 2010. David led the development of JSR-170, the Java Content Repository (JCR) application programming interface (API), a technology standard for content management. Based on this API, Day Software created the Day CQ web content management system, which Adobe rebranded as Adobe Experience Manager (AEM) after the acquisition. Today AEM is positioned as a leader among content management systems in Gartner’s Magic Quadrant.
Most AEM developers and architects today have a strong background in traditional Entity-Relationship (ER) data modeling with strong data types. The repositories they are used to are relational databases, where tables, data identification, and integrity are enforced at the repository level. JCR takes a different approach from traditional relational database design and ER modeling. Let’s see what it is all about.
David’s 7 Simple Rules for creating a JCR Content Model and how they differ from traditional content model design
To guide developers toward a clear approach to designing the repository, David came up with the following seven simple rules.
Rule #1: Data first, structure later. Maybe.
“I recommend not to worry about a declared data structure in an ERD sense. Initially,” says David. Declaring structure is expensive, and your application already relies on an implicit contract about structure anyway. Further constraints, such as data types or mandatory properties, should be applied only where data integrity requires them.
For example, if you are creating a blog post and want to record its modification date, there is no reason to declare a special node type for that. Since I implicitly trust my blog-writing application to put a “date” there anyway, there really is no need to declare the presence of a lastModified date in the form of a node type.
Note: This differs from traditional application design, where every piece of data must be properly structured and defined up front.
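As a rough illustration of "data first", the sketch below models a blog post as a plain Python dictionary standing in for an unstructured node. It is not the JCR API: `new_post` and `touch` are hypothetical helpers and the property names are only illustrative. The point is that the application, not a declared schema, guarantees that `lastModified` exists.

```python
from datetime import datetime, timezone


def new_post(title, text):
    """Create a schemaless 'node': just a bag of properties."""
    return {"title": title, "text": text}


def touch(post, now=None):
    """The application itself maintains lastModified; nothing declares it."""
    post["lastModified"] = (now or datetime.now(timezone.utc)).isoformat()
    return post


# A fixed timestamp is passed in only to keep the example reproducible.
post = touch(
    new_post("Hello JCR", "Data first, structure later."),
    now=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
print(post["lastModified"])  # 2020-01-01T00:00:00+00:00
```

If a constraint later turns out to matter for data integrity, it can be added at that point, without having blocked the first version of the content.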
Rule #2: Drive the content hierarchy, don’t let it happen.
According to David, “the content hierarchy is a very valuable asset”. He suggests that one should design it very carefully. Node names should be human readable, and the structure should resemble a file system of files and folders. It should allow for easy access control and containment.
For example, if we want to design a simple blogging system, we would start by making the content structure obvious. Initially, we would not even care about node types.
/content/blog
/content/blog/posts
/content/blog/posts/davids_content_model
/content/blog/posts/aem_as_a_cloud_service
/content/blog/comments/davids_content_model/i_like_it
/content/blog/comments/davids_content_model/i_like_it/i_disagree
From the above example, the content structure is obvious without any further explanation. What is less obvious is why the “comments” are not stored under the “posts”. The reason is that access control can be applied in a more granular way if we keep the comments in a separate branch that is a sibling of the posts branch.
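To make the access-control argument concrete, here is a minimal sketch of a prefix-based permission check over the blog layout above. The rule table, the principal names (`authors`, `visitors`), and the helpers `can_write` and `comments_path` are all assumptions made for illustration; this is not how JCR or AEM evaluates ACLs, only a model of why two sibling branches are easy to secure independently.

```python
# Hypothetical rules: (path prefix, principals allowed to write below it).
WRITE_RULES = [
    ("/content/blog/posts", {"authors"}),
    ("/content/blog/comments", {"authors", "visitors"}),
]


def can_write(principal, path):
    """Return True if the first rule covering `path` allows `principal`."""
    for prefix, allowed in WRITE_RULES:
        if path == prefix or path.startswith(prefix + "/"):
            return principal in allowed
    return False


def comments_path(post_path):
    """Map a post path to the parallel branch that holds its comments."""
    return post_path.replace("/content/blog/posts/", "/content/blog/comments/", 1)


post = "/content/blog/posts/davids_content_model"
print(comments_path(post))  # /content/blog/comments/davids_content_model
print(can_write("visitors", post))  # False
print(can_write("visitors", comments_path(post) + "/i_like_it"))  # True
```

With comments nested under each post, the same policy would need a rule per post (or a pattern match on every path) instead of one rule per branch.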
Rule #3: Workspaces are for clone(), merge() and update().
JCR introduces the very abstract concept of workspaces, which leaves a lot of developers unclear about what to do with them. According to David, if you have considerable overlap of “corresponding” nodes (essentially, nodes with the same UUID) in multiple workspaces, you are probably putting workspaces to good use.
David explains that workspaces are meant for things like:
– project versions
– different states of content, such as “development”, “QA” and “published”
Do not use workspaces for things like:
– user home directories
– distinct content for different target audiences like public, private, local, etc.
– mail-inboxes for different users
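David's "overlap of corresponding nodes" test can be sketched as a simple set comparison. In the example below each workspace is a hypothetical dictionary mapping node UUID to path, and the overlap measure (a Jaccard index) is an illustrative choice, not something defined by David or by the JCR specification.

```python
def overlap(ws_a, ws_b):
    """Fraction of node identifiers shared by two workspaces (Jaccard index)."""
    a, b = set(ws_a), set(ws_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical workspaces: node UUID -> path.
qa = {
    "u1": "/content/blog",
    "u2": "/content/blog/posts",
    "u3": "/content/blog/posts/draft",
}
published = {"u1": "/content/blog", "u2": "/content/blog/posts"}
alice_home = {"u7": "/home/alice"}
bob_home = {"u8": "/home/bob"}

print(round(overlap(qa, published), 2))  # 0.67
print(overlap(alice_home, bob_home))     # 0.0
```

The QA and published workspaces hold mostly the same nodes in different states, which is where clone(), merge() and update() are useful. The two home directories share nothing, so separate workspaces buy nothing there.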
Rule #4: Beware of Same Name Siblings.
Same Name Siblings (SNS) were introduced into the spec to allow compatibility with data structures that are designed for and expressed through XML, and are therefore extremely valuable to JCR.
However, SNS come with substantial overhead and complexity for the repository. For example, if an SNS is removed or reordered, this affects the paths of all the other SNS and their children.
It is better to use distinct node names than same-named siblings distinguished by index.
Use:
/content/myblog/posts/aem_is_wonderful
/content/myblog/posts/cloud_is_new_trend
instead of
/content/myblog[1]/post[1]
/content/myblog[1]/post[2]
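The path instability of same-name siblings is easy to reproduce in a small model. The sketch below assigns 1-based indexes to siblings in their current order, the way indexed paths address them; `sns_paths` is a hypothetical helper and `third_post` is an invented extra post, neither comes from the JCR API.

```python
def sns_paths(parent, name, items):
    """Assign 1-based same-name-sibling indexes in the current order."""
    return {f"{parent}/{name}[{i}]": item for i, item in enumerate(items, start=1)}


posts = ["aem_is_wonderful", "cloud_is_new_trend", "third_post"]
before = sns_paths("/content/myblog", "post", posts)

# Removing the first sibling shifts the index of every later one.
posts.remove("aem_is_wonderful")
after = sns_paths("/content/myblog", "post", posts)

print(before["/content/myblog/post[2]"])  # cloud_is_new_trend
print(after["/content/myblog/post[2]"])   # third_post
```

The same path now points at different content, so any stored link to `post[2]` silently changes meaning. With distinct names such as `/content/myblog/posts/cloud_is_new_trend`, removing one post leaves every other path untouched.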
Rule #5: References considered harmful.
References imply referential integrity. They add overhead for the repository, which has to maintain that integrity, and they are also costly from a content flexibility perspective. For example, when one document references another in the same repository, the two are linked at the repository level: the referencing document cannot be exported or imported on its own, because the target may not exist, and operations such as merge, update, restore, or clone are affected as well.
So in JCR, it is better either to model those references as “weak references” (in JCR v1.0 this essentially boils down to string properties that contain the UUID of the target node) or simply to use a path.
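A path-based link can be sketched as nothing more than a string property that the application resolves on demand. The dictionary `repo`, the `related` property, and the `resolve` helper below are assumptions for illustration, not JCR calls.

```python
# Hypothetical repository: path -> properties.
repo = {
    "/content/blog/posts/davids_content_model": {"title": "David's model"},
    "/content/blog/posts/aem_as_a_cloud_service": {
        "title": "AEM as a Cloud Service",
        # The link is stored as a plain path string, not a hard reference.
        "related": "/content/blog/posts/davids_content_model",
    },
}


def resolve(repo, path):
    """Look up a path-based link; a dangling link yields None, not an error."""
    return repo.get(path)


source = repo["/content/blog/posts/aem_as_a_cloud_service"]
print(resolve(repo, source["related"])["title"])  # David's model

# Nothing blocks removing the target; the link simply becomes dangling.
del repo["/content/blog/posts/davids_content_model"]
print(resolve(repo, source["related"]))  # None
```

The trade-off is that the application has to tolerate a missing target, which is usually cheaper than having the repository enforce integrity on every operation.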
Rule #6: Files are files.
If a content model exposes something that even remotely resembles a file or a folder, try to use (or extend from) nt:file, nt:folder and nt:resource.
David provides another “good rule of thumb”: if you need to store the filename and the mime-type, then nt:file / nt:resource is a very good match. If you could have multiple “files”, an nt:folder is a good place to store them. If you need to add meta information for your resource, let’s say an “author” or a “description” property, extend nt:resource, not nt:file.
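The shape this rule leads to can be sketched with nested dictionaries: the file name is the node name, and the payload and any extra metadata sit on the `jcr:content` child. The helper `make_file` and the node type `my:resource` (standing in for a custom type that extends nt:resource) are hypothetical; only the `nt:` and `jcr:` names come from the JCR built-in types.

```python
def make_file(name, mime_type, data, resource_type="nt:resource", **meta):
    """Build an nt:file-shaped node with its payload on the jcr:content child."""
    content = {
        "jcr:primaryType": resource_type,
        "jcr:mimeType": mime_type,
        "jcr:data": data,
    }
    content.update(meta)  # extra metadata goes on the resource, not the file
    return {name: {"jcr:primaryType": "nt:file", "jcr:content": content}}


attachments = {"jcr:primaryType": "nt:folder"}
attachments.update(
    make_file(
        "front.jpg",
        "image/jpeg",
        b"\xff\xd8",  # placeholder bytes, not a real image
        resource_type="my:resource",  # hypothetical subtype of nt:resource
        author="David",
    )
)

print(attachments["front.jpg"]["jcr:primaryType"])        # nt:file
print(attachments["front.jpg"]["jcr:content"]["author"])  # David
```

Keeping nt:file itself unchanged means generic tools that understand files and folders can still work with the content, while the custom properties travel with the resource.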
Rule #7: IDs are evil.
Another major difference is the use of IDs. In relational databases, IDs are used to express relations. Developers who are transitioning to JCR tend to use them in content models as well. If your content model is full of properties that end in “Id”, you are probably not leveraging the hierarchy properly.
Keep in mind also that items can be identified by path. Just as “symlinks” make far more sense to most users than hard links in a Unix filesystem, a path is a sensible way for most applications to refer to a target node.
Use:
/content/myblog/posts/iphone_shipping/attachments/front.jpg
instead of:
[Blog]
-- blogId
-- author
[Post]
-- postId
-- blogId
-- title
-- text
-- date
[Attachment]
-- attachmentId
-- postId
-- filename
+ resource (nt:resource)
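The relations that `postId` and `blogId` express in the ID-based model are already encoded in the path. The sketch below recovers them with Python's standard `pathlib`; the helpers `post_of` and `blog_of` are hypothetical and assume the `/content/<blog>/posts/<post>/attachments/<file>` layout used in this example.

```python
from pathlib import PurePosixPath


def post_of(attachment_path):
    """The post is two levels up: <post>/attachments/<file>."""
    return PurePosixPath(attachment_path).parent.parent


def blog_of(post_path):
    """The blog is two levels up: <blog>/posts/<post>."""
    return PurePosixPath(post_path).parent.parent


attachment = "/content/myblog/posts/iphone_shipping/attachments/front.jpg"
post = post_of(attachment)
blog = blog_of(post)

print(PurePosixPath(attachment).name)  # front.jpg
print(post)                            # /content/myblog/posts/iphone_shipping
print(blog)                            # /content/myblog
```

No lookup table or join is needed: containment in the hierarchy answers "which post does this attachment belong to" directly.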
Conclusion
In my 8 years of experience as an AEM developer and technical lead on a couple of really interesting and challenging projects, the above 7 rules for creating JCR content models have consistently held true. I hope you find them useful. Let me know if you would like to discuss further.