ZCS-16214 #6

Open
wants to merge 1 commit into
base: develop
2 changes: 1 addition & 1 deletion build.xml
@@ -28,7 +28,7 @@
<property name='build.data.dir' value='${build.dir}/data/output' />
<property name='build.lib.dir' value='${build.dir}/lib' />

-<property name='jar.file' value='${build.lib.dir}/${name}-${version}z2.jar'/>
+<property name='jar.file' value='${build.lib.dir}/${name}-${version}z3.jar'/>

<target name='compile'
description="compiles the source"
@@ -97,6 +97,28 @@ public AntiSamyDOMScanner(Policy policy) {
public AntiSamyDOMScanner() throws PolicyException {
super();
}
+// Method to decode the Unicode escape sequences
+private String decodeUnicodeEscapes(String input) {
Member

As far as I can tell, the import-related regex was introduced in #4. Digging deeper, I found that it was added to work around an issue in the AntiSamy library (nahsra#24). According to the AntiSamy developer, that issue lies in the CSS parser AntiSamy uses and appears to be fixed in nahsra#108.
So if we upgrade the AntiSamy library, we may well be able to remove our custom handling and get rid of the security issue too.

Member Author

I have further investigated this, and here are my findings:
The fix mentioned in nahsra#108 refers to enabling the embedStyleSheets directive:
<directive name="embedStyleSheets" value="true"/>

According to the documentation https://github.com/nahsra/antisamy/wiki/AntiSamy-Directives,
the embedStyleSheets directive allows external stylesheets referenced through @import to be fetched and embedded into the sanitized output. Allowing CSS imports from external URLs is a dangerous practice. It exposes the application to security risks by allowing the inclusion of potentially malicious external CSS, which goes against AntiSamy’s purpose of ensuring secure input sanitization. Support for this feature in AntiSamy is deprecated and will be removed in a future release.

As described in #2, there remains an issue where media queries are stripped during sanitization. This behavior is attributed to the underlying third-party library (org.apache.xml.serialize.HTMLSerializer) used for document serialization within AntiSamy. The fix in nahsra#108 does not explicitly address this media query stripping issue.
Therefore, it is uncertain whether the upgrade resolves the media query serialization issue.

Suggested Next Steps:
Conduct a detailed review of the latest AntiSamy release to verify if the media query stripping issue (linked to HTMLSerializer) has been resolved.
Determine whether the upgrade allows us to eliminate our custom handling without compromising security.
Ensure the embedStyleSheets directive remains disabled to mitigate risks associated with remote CSS imports.
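
To make the last step checkable, a small guard could read the policy file and fail when embedStyleSheets is switched on. The following is a minimal sketch using only the JDK XML parser. The class name EmbedStyleSheetsCheck, the method name, and the choice to treat a missing directive as disabled are assumptions for illustration; none of them is part of this PR or of AntiSamy's API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of this PR or of the AntiSamy library.
class EmbedStyleSheetsCheck {

    // Returns true only if the policy XML contains
    // <directive name="embedStyleSheets" value="true"/>.
    // Assumption: a missing directive is treated as disabled.
    static boolean isEmbedStyleSheetsEnabled(String policyXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(policyXml.getBytes(StandardCharsets.UTF_8)));

            NodeList directives = doc.getElementsByTagName("directive");
            for (int i = 0; i < directives.getLength(); i++) {
                Element directive = (Element) directives.item(i);
                if ("embedStyleSheets".equals(directive.getAttribute("name"))) {
                    return "true".equalsIgnoreCase(directive.getAttribute("value").trim());
                }
            }
            return false;
        } catch (Exception e) {
            // Surface parse problems instead of silently reporting "disabled".
            throw new IllegalStateException("Could not parse policy XML", e);
        }
    }
}
```

A build or unit test could call isEmbedStyleSheetsEnabled on the contents of the deployed policy file and fail when it returns true, so the directive cannot be re-enabled unnoticed during an upgrade.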

+    try {
+        StringBuffer decodedString = new StringBuffer();
+        String regex = "\\\\([0-9a-fA-F]{4})";
+        // Compile the regex
+        Pattern pattern = Pattern.compile(regex);
+        Matcher matcher = pattern.matcher(input);
+
+        // Find all matches and replace them with the decoded character (quoted, so a decoded '$' or '\' is inserted literally)
+        while (matcher.find()) {
+            String hexValue = matcher.group(1);
+            int unicodeValue = Integer.parseInt(hexValue, 16);
+            matcher.appendReplacement(decodedString, Matcher.quoteReplacement(String.valueOf((char) unicodeValue)));
+        }
+        matcher.appendTail(decodedString);
+        return decodedString.toString().replaceAll("\\\\", "");
+    } catch (Exception e) {
+        // If decoding fails, just return the original string
+        return input;
+    }
+}

/**
* This is where the magic lives.
@@ -167,7 +189,7 @@ public CleanResults scan(String html) throws ScanException {
*/


-final String trimmedHtml = html;
+final String trimmedHtml = decodeUnicodeEscapes(html);

StringWriter out = new StringWriter();
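
For reviewers who want to see what the decoding step does to concrete inputs, here is a standalone sketch that mirrors decodeUnicodeEscapes. It is an illustration under assumptions, not the code in this PR: the class name UnicodeEscapeDemo is hypothetical, and the replacement is passed through Matcher.quoteReplacement because a bare "$" or "\" is treated specially by appendReplacement and would otherwise raise an exception.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class mirroring the decoding logic; not part of the PR.
class UnicodeEscapeDemo {

    // A backslash followed by exactly four hex digits, as in the PR.
    private static final Pattern FOUR_HEX_ESCAPE = Pattern.compile("\\\\([0-9a-fA-F]{4})");

    static String decode(String input) {
        StringBuffer out = new StringBuffer();
        Matcher m = FOUR_HEX_ESCAPE.matcher(input);
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '$' or '\' literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        // As in the PR, every remaining backslash is removed.
        return out.toString().replace("\\", "");
    }

    public static void main(String[] args) {
        System.out.println(decode("@\\0069mport url(x.css);")); // @import url(x.css);
        System.out.println(decode("\\0024"));                   // $
        System.out.println(decode("C:\\temp"));                 // C:temp
        System.out.println(decode("@\\69 mport"));              // @69 mport
    }
}
```

Two behaviors are worth keeping in mind when reviewing: the step runs over the whole HTML input, so backslashes in ordinary content are removed as well, and only escapes with exactly four hex digits are decoded, whereas CSS also permits shorter and longer hex escapes.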
