This repository has been archived by the owner on Aug 15, 2023. It is now read-only.

Alfresco Transformation Options

ZachGerth edited this page Jan 23, 2018 · 10 revisions

Overview

To support asynchronous content transformation, both for migrations (so that migrating documents is not slowed by a renditioning service) and for a better user experience when uploading large documents, TSG has implemented a variety of transformation interfaces. These allow documents to be transformed via external services, via the internal Alfresco transformer, or via the Alfresco transformation server. Steps for configuring each are provided below.

Configuring the Asynchronous Transformer

To configure the asynchronous Alfresco embedded transformer, only a few steps are required in a typical Alfresco OpenContent build.

  1. Open alfrescoEmb-bean-config.xml and change the class of the "rendition" bean from "AlfrescoEmbInternalRenditionImpl" to "AlfrescoEmbQueueRenditionImpl".

  2. In the project-bean-config.xml, add the following bean:

    <bean id="alf-rendition-creator" class="com.tsgrp.opencontent.alfresco.rendition.AlfrescoEmbRenditionCreator" depends-on="SearchService">
       <property name="serviceRegistry" ref="ServiceRegistry"/>
       <!--
           <property name="tenantAdminService" ref="TenantAdminService"/>
           Only inject the tenant admin service if you are in a multi-tenant environment and this is one of the servers that needs to run this.
       -->
       <property name="primaryServer" value="${renditioning.primary.server}"/>
       <property name="maxThreads" value="${max.num.renditioning.threads}"/>
       <property name="maxNumRetries" value="${max.num.rendition.retries}"/>
       <property name="queueQuery" value="${queue.query}"/>
    </bean>

This is the bean that queries for documents that need a rendition and spins off threads to rendition them.

  3. In the same file, create the following beans to set Spring up to run the job:
    <bean id="renditionCreatorScheduler" class="org.springframework.scheduling.timer.MethodInvokingTimerTaskFactoryBean">
        <property name="targetObject" ref="alf-rendition-creator" />
        <property name="targetMethod" value="execute" />
    </bean>

    <bean id="scheduledRenditionTask" class="org.springframework.scheduling.timer.ScheduledTimerTask">
        <property name="delay" value="${rendition.ms.delay}" />
        <property name="period" value="${rendition.ms.period}" />
        <property name="timerTask" ref="renditionCreatorScheduler" />
    </bean>

    <bean id="rendition-creator-scheduler" class="org.springframework.scheduling.timer.TimerFactoryBean">
        <property name="scheduledTimerTasks">
            <list>
                <ref local="scheduledRenditionTask" />
            </list>
        </property>
    </bean>
  4. In alfresco-defaults.properties, make sure you have the query property configured:
  queue.query=ASPECT:tsg\\:renditioning AND -ASPECT:tsg\\:renditionFailed
  5. Finally, in your project-placeholders.properties, define the following properties:
  • max.num.renditioning.threads - The maximum number of renditioning threads that can run at once. For external Alfresco transformation servers, a good rule of thumb is N*2, where N is the number of cores on the server.
  • max.num.rendition.retries - The maximum number of times the bean will try to rendition a particular document. If the document fails to rendition this many times, it is marked as "Failed to Rendition" and the job will no longer pick it up.
  • rendition.ms.period - The amount of time, in milliseconds, the job waits before querying again for new documents. This is often set to 5000.
  • rendition.ms.delay - The amount of time, in milliseconds, between server startup (when the bean is bootstrapped) and the first run of the job.
  • renditioning.primary.server - Allows multiple servers to be configured as fallbacks when renditioning. Set this to 'true' in single-server environments, or on the server that will be responsible for transformations.

You're done! HPI will no longer rendition documents immediately. Instead, each document will be renditioned the next time the job picks it up.
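
As a rough illustration, the properties described above might look like this in project-placeholders.properties. Apart from the 5000 ms period mentioned above, every value here is an assumed placeholder (for example, 8 threads assumes a 4-core transformation server under the N*2 rule of thumb), so adjust them for your environment.

```properties
# Illustrative values only; tune for your environment.
# Assumes a 4-core transformation server: N*2 = 8 threads.
max.num.renditioning.threads=8
# Assumed retry limit before a document is marked "Failed to Rendition".
max.num.rendition.retries=3
# Poll for new documents every 5 seconds (the commonly used value).
rendition.ms.period=5000
# Assumed 60-second wait after startup before the first run.
rendition.ms.delay=60000
# Single-server environment, so this server performs the transformations.
renditioning.primary.server=true
```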

Configuring the Alfresco External Transformer

Just follow the steps above, and then add a few more properties. Because the Alfresco external transformation server utilizes the same Alfresco API, the steps above will work with a correctly configured external transformation server. Steps for configuring the external Alfresco transformation server can be found in the Alfresco documentation.

  1. Make sure you have these properties configured in your alfresco-global.properties:
### external transformation server connection properties ###
transformserver.username=alfresco
transformserver.password=tsg
transformserver.url=http://<server>:<port>/transformation-server

content.transformer.remoteServer.extensions.msg.pdf.supported=true
renditioning.is.primary.server=true
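  2. Add the primaryServer property to the alf-rendition-creator bean in your project-bean-config. Note that this section names the placeholder renditioning.is.primary.server, matching the property defined above, rather than renditioning.primary.server as in the previous section: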
    <bean id="alf-rendition-creator" class="com.tsgrp.opencontent.alfresco.rendition.AlfrescoEmbRenditionCreator" depends-on="SearchService">
       <property name="serviceRegistry" ref="ServiceRegistry"/>
       <property name="tenantAdminService" ref="TenantAdminService"/>
       <property name="maxThreads" value="${max.num.renditioning.threads}"/>
       <property name="maxNumRetries" value="${max.num.rendition.retries}"/>
       <property name="primaryServer" value="${renditioning.is.primary.server}"/>
       <property name="queueQuery" value="${queue.query}"/>
    </bean>

Configuring the JodConverter Transformer File Size Limits

The JodConverter has default size limits for several file extensions; files larger than these limits are not transformed. The defaults are embedded in the transformer.properties file in the Alfresco repository JAR. To override one, place the corresponding line in your alfresco-global.properties override file with the size value you deem correct.

content.transformer.JodConverter.extensions.xlsm.pdf.maxSourceSizeKBytes=1536
content.transformer.JodConverter.extensions.pptm.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.xls.pdf.maxSourceSizeKBytes=10240
content.transformer.JodConverter.extensions.sldm.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.xltx.pdf.maxSourceSizeKBytes=1536
content.transformer.JodConverter.extensions.potx.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.docx.pdf.maxSourceSizeKBytes=768
content.transformer.JodConverter.extensions.xlsx.pdf.maxSourceSizeKBytes=1536
content.transformer.JodConverter.extensions.pptx.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.xlam.pdf.maxSourceSizeKBytes=1536
content.transformer.JodConverter.extensions.ppt.pdf.maxSourceSizeKBytes=6144
content.transformer.JodConverter.extensions.docm.pdf.maxSourceSizeKBytes=768
content.transformer.JodConverter.extensions.xltm.pdf.maxSourceSizeKBytes=1536
content.transformer.JodConverter.extensions.dotx.pdf.maxSourceSizeKBytes=768
content.transformer.JodConverter.extensions.xlsb.pdf.maxSourceSizeKBytes=1536
content.transformer.JodConverter.extensions.sldx.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.ppsm.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.potm.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.txt.pdf.maxSourceSizeKBytes=5120
content.transformer.JodConverter.extensions.ppam.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.dotm.pdf.maxSourceSizeKBytes=768
content.transformer.JodConverter.extensions.doc.pdf.maxSourceSizeKBytes=10240
content.transformer.JodConverter.extensions.vsd.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.vsdx.pdf.maxSourceSizeKBytes=4096
content.transformer.JodConverter.extensions.ppsx.pdf.maxSourceSizeKBytes=4096
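
Because these limits are expressed in kilobytes, a limit you think of in megabytes needs converting first, conventionally by multiplying by 1024. As a sketch, assuming you wanted to allow .docx sources of up to 20 MB (an arbitrary figure chosen for illustration), the override might look like this:

```properties
# Hypothetical override: allow .docx -> .pdf sources up to 20 MB.
# 20 MB * 1024 KB/MB = 20480 KB
content.transformer.JodConverter.extensions.docx.pdf.maxSourceSizeKBytes=20480
```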

Configuring an external, third-party Transformer

Support for other transformers is currently limited. If the transformer supports a filesystem-based approach, where you drop files in a folder and a job picks them up and places the results in another folder when renditioning is complete, then it is supported. Follow the steps below.

  1. In the alfrescoEmb-bean-config.xml file, change the "rendition" bean to the "com.tsgrp.opencontent.alfresco.rendition.AlfrescoEmbRenditionImpl" class.
  2. In your project-bean-config, import the "transformation-filesystem-bean-config.xml" file.
  3. Similar to the above step, add the following beans to your project-bean-config:
	<task:annotation-driven/>
	<bean id="alf-rendition-retrieve-job" class="com.tsgrp.opencontent.alfresco.rendition.AlfrescoEmbRenditionApplier" depends-on="SearchService">
		<property name="serviceRegistry" ref="ServiceRegistry"/>
	</bean>
	<bean id="schedulerTask" class="org.springframework.scheduling.timer.MethodInvokingTimerTaskFactoryBean">
		<property name="targetObject" ref="alf-rendition-retrieve-job" />
		<property name="targetMethod" value="execute" />
	</bean>
	
	<bean id="scheduledTask" class="org.springframework.scheduling.timer.ScheduledTimerTask">
	    <property name="delay" value="300000" />
	    <property name="period" value="${rendition.ms.delay}" />
	    <property name="timerTask" ref="schedulerTask" />
	</bean>
	<bean class="org.springframework.scheduling.timer.TimerFactoryBean">
		<property name="scheduledTimerTasks">
			<list>
				<ref local="scheduledTask" />
			</list>
		</property>
	</bean>
  4. Define the following properties:
  • rendition.ms.delay - The delay between checks for documents that need a rendition applied. Because this checks the file system, it can be a rather slow process, and a delay 1 or 2 seconds longer than normal may be desirable.
  • filesystem.dropLocation - The location on the filesystem where a document that needs to be renditioned should be placed. Note that this can be a network drive, etc.
  • filesystem.transformedLocation - The location where documents will be placed once transformed.
  • filesystem.counterTries - The number of times the job will attempt to pick up a transformed document before giving up.
  • filesystem.millisBetweenTries - The number of milliseconds to wait between attempts to pick up a transformed document.
  • oc.rest.endpoint - The location of one of the smaller "receiver" OpenContent deployments that can return transformed documents.
  5. Deploy a new "filesystemTransform" OpenContent project to a server where the transformations take place. It is in charge of receiving requests from the server configured above and serving up the documents when requested. The above properties must also be defined in this project.
  6. You're done! Note that the server with the "primary" OpenContent deployed must be able to contact the servers that have the "filesystemTransform" OpenContent deployed, as this is how the transformed content is communicated.
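
As a rough sketch, the filesystem transformer properties listed above might be defined as follows. Every path, number, and URL here is an assumption made for illustration, and the exact form expected for oc.rest.endpoint is not specified on this page, so treat these as placeholders rather than recommended settings.

```properties
# Illustrative values only; every path, number, and URL below is an assumption.
# A couple of seconds longer than a 5000 ms polling interval, since the filesystem is checked.
rendition.ms.delay=7000
# Hypothetical folders read from and written to by the third-party transformer.
filesystem.dropLocation=/mnt/transform/drop
filesystem.transformedLocation=/mnt/transform/done
# Assumed: give up after 10 attempts, waiting 2 seconds between attempts.
filesystem.counterTries=10
filesystem.millisBetweenTries=2000
# Placeholder for the URL of a "filesystemTransform" OpenContent deployment.
oc.rest.endpoint=http://<server>:<port>/<receiver-context>
```

Remember that the same properties also need to be defined in the "filesystemTransform" project.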