-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MINVOKER-351] Escape special xml character in junit report #242
Conversation
ac940df
to
4a9a174
Compare
I don't understand this because it looks logically wrong. The model does not care how it is marshaled, shouldn't the XML writer do this? |
4a9a174
to
1ca02fb
Compare
@michael-o there is a problem with special chars ... they are not escaped / removed by the plexus-xml |
- use StringEscapeUtils.escapeXml10 form commons-text
1ca02fb
to
a587156
Compare
Is there an upstream issue for this? |
Read the JIRA issue. I assume we are talking about chars outside of https://en.wikipedia.org/wiki/Valid_characters_in_XML#XML_1.0? |
Exactly similar implementation should be in plexus-xml |
Looking into this. Can we make sure that we do not double escape the five protected chars. |
when we will use MXSerializer we will have a exception ... so special characters should be somehow filtered |
I used:
so escaping is turned off in |
@elharo Should an XML writer escape invalid chars to entities or fail here? |
Maybe you should add a comment in the source code to depict this. |
Am I stupid or where am I supposed to find the output of |
There is:
Standard output of test is pass to log file and next to report |
I know, but I don't see that stdout anywhere. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the buildLog already XML? Looks like it might be. If so, this is the wrong place to fix this problem. The escaping needs to happen when that XML is created, and this shouldn't write XML at all, just copy bytes.
@@ -1640,7 +1642,7 @@ private void runBuild( | |||
} | |||
} catch (RunFailureException e) { | |||
buildJob.setResult(e.getType()); | |||
buildJob.setFailureMessage(e.getMessage()); | |||
buildJob.setFailureMessage(StringEscapeUtils.escapeXml10(e.getMessage())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't followed through the complete code yet, but the failure message should generally only be escaped once, and that should be left to the XML writer. That is, don't escape a string that appears in PCDATA or an attribute value. Let the writer do that.
This all assumes that the writer does correctly escape everything. Some libraries have borked this up in one way or another. Looking at it now Xpp3DomWriter is a mess and doesn't have a clean model of when and what to escape. It seems to confuse the unescaped model strings with the serialized escaped strings, and switches back and forth between escaped or not depending on switches.
I'm not sure exactly how to fix this issue. Really what needs to happen is either fix XPP3DomWriter and likely XPPDom, while breaking the API and all existing dependents, or switch to a better designed XML library here. Short of that maybe you can make this work with enough unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buildJob
is class generated by modello internally MXSerializer
is used
Also here exception is generated by post-build scrip and message contains output of it .... it also be everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably need to pop the code open to figure this out. What is "buildJob"? It's a org.apache.maven.plugins.invoker.model.BuildJob, right? If so, why does it need escaped XML test instead of just plain text? I'd expect it to supply unescaped strings to the writer which would then do any necessary escaping.
In line 1639 above, the message is not escaped. It should likely be escaped in both or neither.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other place message is produced in code and we sure that does not contains special chars.
Only exception.getMessage()
can return everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming buildJob is just an object that contains strings and other things, but has no particular XML internals (like 99.9% of classes that contain strings) nothing inside it should be escaped.
Ideally the XML writer should escape strings when it needs to. Or is the XML writer is broken and won't do that, then the escaping should be done after the string is extracted from the buildJob and before it's passed to the writer. Escaping strings inside the buildJob breaks its semantics and tightly couples all the classes together in one particularly brittle path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming buildJob is just an object that contains strings and other things, but has no particular XML internals (like 99.9% of classes that contain strings) nothing inside it should be escaped.
Ideally the XML writer should escape strings when it needs to. Or if the XML writer is broken and won't do that, then the escaping should be done after the string is extracted from the buildJob and before it's passed to the writer. Escaping strings inside the buildJob breaks its semantics and tightly couples all the classes together in one particularly brittle path.
Yes, the XML writer is broken and needs an issue.
@@ -1779,7 +1781,7 @@ private void writeJunitReport(BuildJob buildJob, String safeFileName) throws Moj | |||
if (buildLogFile != null && buildLogFile.exists()) { | |||
getLog().debug("fileLogger:" + buildLogFile); | |||
try { | |||
systemOut.setValue(FileUtils.fileRead(buildLogFile)); | |||
systemOut.setValue(StringEscapeUtils.escapeXml10(FileUtils.fileRead(buildLogFile))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I don't yet understand is what the "value" being set here is in Xpp3Dom. I think (could be wrong) it's supposed to be the raw text to be output, which means Xpp3Dom is in fact not a document object model at all. The confusion between object model and serialized form is a real problem with Xpp3Dom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value
is a String
, that's it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buildLogFile
contains raw output of tested Maven project so can contain everything.
As we see root cause is in plexus-xml So what is your proposition to solve issue in m-invoker-p? |
Regardless of the fix or the flaw in Plexus XML I completely fail to understand the supplied reprocuder in the JIRA issue. |
Ok.
|
Moving to draft because the IT is poorly designed. |
I hope you want to redesign proposed IT or you have any other hints how should be done. |
It is rather the conditions the IT comes into play. Please read my analysis on the JIRA issue. |
@slawekjaranowski Just ran the changed IT with 3.8.8 and 3.9.6. In 3.8.8 I don't see output from 0x00 to 0x0F at all in |
@slawekjaranowski Please ping compiler and surefire plugin in the embedded IT, that's the reason why there is no output. |
Please also prepend: https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#getName%28int%29 |
Naive upstream solution: codehaus-plexus/plexus-xml#28 |
I have now tested the IT with the patched Plexus XML. Output looks fine now:
but the reader chokes now:
|
And the value is indeed correct: Document doc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(w.toString().getBytes(StandardCharsets.UTF_8)));
NodeList childNodes = doc2.getDocumentElement().getChildNodes();
for (int i = 0; i < childNodes.getLength(); i++) {
System.out.println(childNodes.item(i).getTextContent());
} output:
While I don't understand that those values are serialized, but cannot be deserialized the best would be to drop the serialization of them silently in Plexus XML. |
. Otherwise you need some sort of application specific convention for how you'll encode the disallowed C0 controls. E.g. writing each one as an element like |
In our case the easiest way would be to drop them, no? |
I guess it depends on what it means when a NUL character or other C0 control appears here. It might be fine to drop them or replace each one with a space, especially if you don't care about round triipping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slawekjaranowski Please create an issue with Plexus XML about the incomplete XML escaping and add the link the the spots where you escape. As soon as it is fixed those spots should be removed otherwise we will double escape. Otherwise the change is now fine for me.
For working around the plexus bugs, is it possible to add an additional step in between the build job and the serializer that escapes the message rather than passing escaped values into the build job? |
@elharo |
Right, I don't see a a way to get in between layers without monkey patching. |
e45e5ef
to
dd409d1
Compare
@michael-o, @elharo I try to fix plexus-xml |
Hehe jenkins will fail because build use a previous version of invoker .... and jenkins can not parse it ...
so will be fixed after release |
Let me try something locally: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slawekjaranowski I have added some readibility improvement because the output was a bit confusing, but the test remains. I am fine with that and we need to fix Jenkins later.
835f6af
to
9cdec81
Compare
@slawekjaranowski Thanks for the great collaboration. We have achieved a awesome result here. |
https://issues.apache.org/jira/browse/MINVOKER-351