AntiSamy strips @ instead of encoding it when it falls within <> #24
Comments
I'm investigating this. Using the DOM parser with these settings, the output of .getCleanHTML() is only "firstname,lastname". Using the SAX parser, I get:
This use case is not possible with the current architecture, but I see the value. I think it's a candidate for 1.5.9.
I have a similar issue: when I scan the text "hello <hii world", the output is "hello".
I have a similar issue using the AntiSamy library, version 1.5.8. I wrote the following unit test case: Input:
Expected on
Actual:
I can see that in the
I also stumbled across this one. Are there any plans to work on this in the near future?
@spassarop, is this even possible or reasonable, or is it way too hard? I suspect 'too hard'.
Some points to cover and clarify to understand the current behavior:
In conclusion, there is not much to do here because of the underlying libraries that parse HTML and CSS. They expect text in their respective formats and parse it according to their target specs, so parsing invalid HTML will certainly produce odd results. If you fear someone may put HTML where it shouldn't be, the solution is to HTML-encode, not to filter: maybe the whole string, maybe only the fragments that must be HTML-free but get inserted into an HTML template. That depends entirely on the usage context. The most I can offer here concerns allowed empty tags: a logic change to remove them only if they are known tags that are not present in that policy fragment definition. I hope all this explanation makes the issues clear.
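A minimal sketch of the encode-rather-than-filter approach described above, written in Java since AntiSamy is a Java library. The class and method names (`HtmlEncodeSketch`, `encodeForHtml`) are hypothetical, and the hand-rolled encoder is for illustration only; a maintained encoder such as the OWASP Java Encoder's `Encode.forHtml` is the more usual production choice. Applied to the inputs reported in this thread, the angle brackets become entities, so the `@` and the text after `<` survive as text instead of being parsed as a tag.

```java
// Hypothetical illustration of HTML-encoding untrusted text instead of filtering it.
// A maintained library encoder is preferable in production code.
public final class HtmlEncodeSketch {

    private HtmlEncodeSketch() {}

    /** Replaces the five HTML-significant characters with entity references. */
    public static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: firstname,lastname&lt;name@mail.com&gt;
        System.out.println(encodeForHtml("firstname,lastname<name@mail.com>"));
        // Prints: firstname,lastname&lt;name@mail.com testing&gt;
        System.out.println(encodeForHtml("firstname,lastname<name@mail.com testing>"));
        // Prints: hello &lt;hii world
        System.out.println(encodeForHtml("hello <hii world"));
    }
}
```

This only suits fields that must contain no markup at all; where some HTML is allowed, AntiSamy (or a similar sanitizer) is still the tool, and the encoding applies to the HTML-free fragments only.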
@spassarop, you did a lot of analysis on this one. Are there any changes you are comfortable making that would improve anything?
Maybe this, so tags like
I'm not sure it improves anything, but at least it will be consistent with the expected behavior of "onUnknownTag" in the policy definition. It can be done for the next version. The other problems cannot be solved in AntiSamy code; in my opinion they are due to incorrect usage and limitations of the underlying libraries, as I stated in my previous analysis.
I am using AntiSamy 1.5.7.
I saw the issue when the input was
firstname,lastname<name@mail.com> or firstname,lastname<name@mail.com testing>
The result after the AntiSamy scan is the same for both cases above:
firstname,lastname<name>
I have the following directive in my policy file:
<directive name="onUnknownTag" value="encode"/>
Is there a setting in the policy file I can update so that @ is encoded when it appears within <>?
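The thread does not identify a policy setting that does this. One possible workaround, in line with the maintainer's suggestion to encode "the fragments that must be HTML-free", is to pre-process the input before scanning and entity-encode only the `<...>` segments that contain an `@`. The sketch below is hypothetical (it is not part of AntiSamy, and `AngleBracketEmailEscaper`/`preEncode` are invented names); it assumes the scanner then treats `&lt;...&gt;` as ordinary text, which has not been verified here against 1.5.7.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing step run before the AntiSamy scan (not an AntiSamy API).
// Entity-encodes "<...>" segments that contain '@' so they are not parsed as tags.
public final class AngleBracketEmailEscaper {

    // '<', then a run of non-bracket characters containing at least one '@', then '>'.
    private static final Pattern BRACKETED_AT = Pattern.compile("<([^<>]*@[^<>]*)>");

    private AngleBracketEmailEscaper() {}

    // The regex guarantees the captured text has no '<' or '>', so only '&' and '"' need escaping.
    private static String escapeInner(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;");
    }

    public static String preEncode(String input) {
        Matcher m = BRACKETED_AT.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String encoded = "&lt;" + escapeInner(m.group(1)) + "&gt;";
            // quoteReplacement keeps '$' or '\' in the text from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(preEncode("firstname,lastname<name@mail.com>"));
        System.out.println(preEncode("firstname,lastname<name@mail.com testing>"));
        // Segments without '@' are left untouched for the scanner to handle.
        System.out.println(preEncode("<b>bold</b> stays markup"));
    }
}
```

The regex is deliberately crude: a legitimate tag whose attributes contain `@` (for example a `mailto:` link) would also be encoded, so this fits plain-text fields that merely happen to contain bracketed addresses, not general HTML input.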