
Setting up Jackrabbit in a cluster environment

I want to set up Jackrabbit in a cluster (I am setting it up with Liferay).

I read this document: http://wiki.apache.org/jackrabbit/Clustering. Unfortunately it's very short, so I don't understand some of the concepts and best practices. Let me first explain my setup:

We have two WebLogic servers that share the same file system, and we deploy the same WAR to both. I use Oracle as the database (I have a connection pool configured in WebLogic and want to connect to it using JNDI).

As I understand from the docs, each node has to have a separate configuration with its own repository directory, workspace file system, and search index.

Both nodes share the PersistenceManager, the repository file system, and the data store (if I have one).
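Under the split described above, only a few elements of repository.xml should differ between the two WebLogic servers. A minimal sketch, assuming everything not shown (the shared persistence manager, file systems, and journal) is identical on both nodes: the `SearchIndex` element is the standard Jackrabbit Lucene index and is an assumption here, since the config in this question defines none, and `node_2` is a hypothetical id for the second server.

```xml
<!-- Node 1 repository.xml: per-node parts only (sketch; shared parts omitted) -->
<Workspace name="${wsp.name}">
    <!-- Shared PersistenceManager and FileSystem go here, identical on both nodes -->
    <!-- Lucene index under ${wsp.home}, which lives below the repository home;
         each node needs its own repository home so this index stays private -->
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${wsp.home}/index"/>
    </SearchIndex>
</Workspace>
<!-- The cluster id must be unique per node; the Journal inside it is shared -->
<Cluster id="node_1" syncDelay="2000">
    <!-- Shared Journal goes here, identical on both nodes -->
</Cluster>

<!-- Node 2 repository.xml: identical except for the cluster id (hypothetical value) -->
<Cluster id="node_2" syncDelay="2000">
    <!-- Same shared Journal as node 1 -->
</Cluster>
```

If both servers resolve `${rep.home}` to the same directory on the shared file system, they would also share the index and the `${rep.home}/revision.log` file used by the journal in the config below, which defeats the per-node split.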

Here are the questions:

  1. What is the workspace file system, and how is it different from the repository file system? And what is a workspace? As I understand it, a workspace is part of a repository, and a repository can have multiple workspaces, but the docs don't describe what a workspace actually is.

  2. I want the best performance. I won't have much content or many users (tens of simultaneous users), so I want to optimize page load time for faster rendering of the pages. What is the best practice: should I configure the PersistenceManager to use the database?

  3. Where should the repository file system point on each node?

  4. Where should the workspaces point on each node?

  5. Where should the workspace file system point?

I tried pointing all of them to my database, but I seem to get deadlocks (or the database is too slow).

I enabled logging and I see a lot of unnecessary reads. It looks like for each file upload Jackrabbit opens a connection, pre-caches all the files, closes the connection, and repeats this several times; uploading a very small file takes about a minute. Most likely something is wrong with my config.

Here is my config file:

<?xml version="1.0"?>
<Repository>
    <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="ISG"/>
        <param name="schema" value="oracle"/>
        <param name="schemaObjectPrefix" value="J_R_FS_"/>
    </FileSystem>

    <Security appName="Jackrabbit">
        <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"/>
        <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
            <param name="anonymousId" value="anonymous"/>
        </LoginModule>
    </Security>

    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="liferay"/>

    <Workspace name="${wsp.name}">
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.OraclePersistenceManager">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="ISG"/>
            <param name="tableSpace" value=""/>
            <param name="schema" value="oracle"/>
            <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_"/>
            <param name="externalBLOBs" value="false"/>
        </PersistenceManager>
        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="ISG"/>
            <param name="tableSpace" value=""/>
            <param name="schema" value="oracle"/>
            <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
        </FileSystem>
    </Workspace>

    <Versioning rootPath="${rep.home}/version">
        <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="ISG"/>
            <param name="schema" value="oracle"/>
            <param name="schemaObjectPrefix" value="J_V_FS_"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.OraclePersistenceManager">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="ISG"/>
            <param name="tableSpace" value=""/>
            <param name="schema" value="oracle"/>
            <param name="schemaObjectPrefix" value="J_V_PM_"/>
            <param name="externalBLOBs" value="false"/>
        </PersistenceManager>
    </Versioning>

    <Cluster id="node_1" syncDelay="2000">
        <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
            <param name="revision" value="${rep.home}/revision.log"/>
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="ISG"/>
            <param name="tableSpace" value=""/>
            <param name="schema" value="oracle"/>
            <param name="schemaObjectPrefix" value="J_C_"/>
        </Journal>
    </Cluster>
</Repository>


Liferay's official documentation recommends sharing Jackrabbit data using a database in a clustered scenario, not the file system.

Let's say you're using the file system on each of your Liferay nodes (which is the out-of-the-box Liferay configuration). Node A would not be able to access the Jackrabbit data on Node B and vice versa. As time goes by, the nodes become more and more out of sync. To get around this, you could create a network share and configure each node to point to the share. The problem with doing that is that it could result in file corruption if several Liferay nodes write at the same time.

This leaves you with two options: keep independent file systems and integrate a synchronization utility, or put the data in the database. Since file system synchronization is hokey at best, your best option is putting the Jackrabbit data in a database.

Using the database has pros and cons. It could decrease performance, true. On the other hand, the data becomes part of your regular disaster recovery strategy, and some could argue it's more portable.
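A minimal sketch of the database-backed option for the workspace content, reusing the JNDI convention from the question's config (`driver` set to `javax.naming.InitialContext`, `url` set to the JNDI name `ISG`). It assumes Jackrabbit 1.5/1.6, where the bundle persistence managers live in `org.apache.jackrabbit.core.persistence.bundle`, and that your version accepts the JNDI convention for them as well. Bundle managers store each node together with its properties as one record, which generally means fewer database round trips than the older `persistence.db` managers in the question's config. `J_BPM_` is a hypothetical table prefix, chosen so the new tables don't collide with the existing `J_PM_` ones.

```xml
<!-- Sketch: database-backed workspace shared by all cluster nodes.
     Assumes Jackrabbit 1.5/1.6 bundle persistence managers; in 2.x the
     pooled variant is org.apache.jackrabbit.core.persistence.pool.OraclePersistenceManager -->
<Workspace name="${wsp.name}">
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager">
        <!-- JNDI lookup of the WebLogic connection pool, as in the question's config -->
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="ISG"/>
        <!-- Hypothetical prefix: one set of tables per workspace, separate from J_PM_ -->
        <param name="schemaObjectPrefix" value="J_BPM_${wsp.name}_"/>
        <!-- Keep binaries in the database so every node can read them -->
        <param name="externalBLOBs" value="false"/>
    </PersistenceManager>
    <!-- FileSystem and SearchIndex elements omitted from this sketch -->
</Workspace>
```

Switching the persistence manager class does not convert existing content, so the data would have to be exported and re-imported after such a change.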

Edit: An AdvancedFileSystemHook was added at some point in version 5.2; it resolves the file corruption and locking concerns when using a shared network file system. To use it, change your portal-ext.properties file to use the AdvancedFileSystemHook, migrate your data to the shared location, and point your horizontal nodes to the shared location.


Is Jackrabbit mandatory? Liferay uses the storage engine to store "just" binary data; all the metadata is in Liferay's database, so you don't gain much from the JCR repository. This is unfortunate, but it's how the current implementation works.

Next: are you setting up a Jackrabbit cluster or a Liferay cluster? For a Jackrabbit cluster (in a single-node Liferay environment) I can't really help. If you cluster Liferay, you'll find some information in the administration guide (click the PDF link; sadly, the direct link to the clustering chapter in the HTML version is broken, but the chapter works in the PDF).

Some details on Liferay clustering:

Liferay expects the document library to be "atomic", that is, a document written on one of Liferay's nodes should be immediately readable on every other node in the Liferay cluster. The Jackrabbit solution you find in the administration guide makes Jackrabbit use a database for sharing. But you'll see that the recommended solution is not to use Jackrabbit at all, but AdvancedFileSystemHook: unlike the default FileSystemHook, it stores the documents in multiple subdirectories (it works on network shares; a SAN is recommended). The default FileSystemHook is limited by the number of files the OS allows in a single directory; AdvancedFileSystemHook circumvents this by creating multiple subdirectories (like a Unix mail spool directory). If it's just for "a few" documents (not reaching any OS limit), I expect FileSystemHook to work on a shared directory as well, but I'm not really sure about file-locking issues there.

Since you say you have tens of users, optimizing for maximum performance seems over the top. I wouldn't expect a noticeable difference between any of the possible solutions. Clustering at this order of magnitude is about failover (i.e. high availability) rather than performance, at least from Liferay's point of view.

If you're setting up a Liferay cluster, make sure you also follow all the other topics in that chapter, especially cache synchronization. Otherwise you might be fooled into believing that your document library cluster doesn't work when it's only a cache that's out of sync.
