c++ parsing error #1750

Closed
tuarba opened this issue May 11, 2018 · 12 comments · Fixed by #1759
@tuarba
tuarba commented May 11, 2018

(If you will report an issue about the result of parsing,
fill the following items. If you get the input file somewhere,
write the URL or something.)


The name of the parser:
c++
The command line you used to run ctags:
ctags -o tags bad.h
The content of input file:

#define int0 int
template <int,
         class _Comp0=less<int>,
         class _Comp1=less<pair<int, int> > >
class Test : public set<int> {
  typedef int xxx;
};
#define int1 int

The tags output you are not satisfied with:

int0	bad.h	/^#define int0 /;"	d

The tags output you expect:

Test	bad.h	/^class Test : public set<int> {$/;"	c
int0	bad.h	1;"	d
int1	bad.h	8;"	d
xxx	bad.h	/^  typedef int xxx;$/;"	t	class:Test

This is what Exuberant Ctags generates.

@masatake
Member

I cannot reproduce it in my environment.

[yamato@master]/tmp% cat foo.h
#define int0 int
template <int,
class _Comp0=less,
class _Comp1=less<pair<int, int> > >
class Test : public set {
typedef int xxx;
};
#define int1 int
[yamato@master]/tmp% ~/var/ctags-github/ctags --options=NONE --sort=no -o - /tmp/foo.h
ctags: Notice: No options will be read from files or environment
int0	/tmp/foo.h	/^#define int0 /;"	d
Test	/tmp/foo.h	/^class Test : public set {$/;"	c
xxx	/tmp/foo.h	/^typedef int xxx;$/;"	t	class:Test	typeref:typename:int
int1	/tmp/foo.h	/^#define int1 /;"	d
[yamato@master]/tmp% ~/var/ctags-github/ctags --version
Universal Ctags 0.0.0(54afb0f9), Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: May  1 2018, 04:57:38
  URL: https://ctags.io/
  Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +aspell

@tuarba
Author

tuarba commented May 11, 2018

Not sure if this matters, but here is my ctags --version:
Universal Ctags 0.0.0(744610c), Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
Compiled: May 11 2018, 13:33:26
URL: https://ctags.io/
Optional compiled features: +wildcards, +regex, +iconv, +option-directory

@masatake
Member

masatake commented May 11, 2018

Could you run Universal Ctags with the following command line?

$ ctags  --options=NONE --fields='*' -o tags bad.h

@tuarba
Author

tuarba commented May 11, 2018

Here is what I got:
!_TAG_FILE_FORMAT 2 /extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED 1 /0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_MODE u-ctags /u-ctags or e-ctags/
!_TAG_PROGRAM_AUTHOR Universal Ctags Team //
!_TAG_PROGRAM_NAME Universal Ctags /Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL https://ctags.io/ /official site/
!_TAG_PROGRAM_VERSION 0.0.0 /744610c/
int0 bad.h /^#define int0 /;" kind:macro line:1 language:C++ roles:def end:1

@b4n
Member

b4n commented May 12, 2018

I can confirm the issue. @masatake, your input file doesn't have the same line 3, because GitHub ate the <int> on that line. If you add it back, you get @tuarba's behavior. I edited the first comment to use code segments properly, so that what looks like HTML tags is no longer hidden.

So this means that it's the <int> part of line 3 that confuses the C++ parser for some reason.

@masatake
Copy link
Member

@b4n, thank you. Reproduced here. I have to update our issue template to let reporters use triple backquotes for preformatting.

@masatake
Member

@pragmaware, could you look at this issue?

I've tried to minimize the reproducer. However, I cannot make it any smaller than this:

template <class _Comp0=less<float>,
          class _Comp1=less<pair<long, double> > >
class Test : set<int> {
  char c;
};

@pragmaware
Contributor

Template nastiness... hell of corner cases. This is one of the hard bits.
I've got a fix for this case but it breaks the parser-cxx.r/less-than-operator-between-anglebrackets.d unit test, which is harder to fix and has been handled with some "tricks".

Food for thought.

How to figure out that the second < in

template<char i, bool B = i < 10> void f1(void) { } ...

is actually an operator and not a template angle bracket?

@masatake
Member

template<char i, bool B = i < 10> void f1(void) { } ...

How does a real compiler handle this?
It looks like ambiguous syntax to me.

@pragmaware
Contributor

It is one of the test cases we have. The compiler knows that i is a variable and not a (template) type, because it's declared just before... but in theory it could be defined anywhere else.
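
A small illustration of the point above, assuming a C++11 compiler. Because i is declared as a non-type parameter, the < that follows it can only be a comparison, whereas the same character after a class template name opens an argument list. The names f2 and less (a local stand-in, not std::less) are hypothetical additions made here for contrast; only the f1 signature comes from the thread.

```cpp
#include <type_traits>

// From the thread: i is a non-type parameter, so `i < 10` is a comparison
// and the single '>' after 10 closes the template parameter list.
template<char i, bool B = i < 10>
constexpr bool f1() { return B; }

// Hypothetical contrast: `less` names a class template, so the '<' after it
// opens a template-argument list instead of being an operator.
template<class T> struct less {};

template<class C = less<int>>
constexpr bool f2() { return std::is_same<C, less<int>>::value; }

// The default of B really is the result of the comparison.
static_assert(f1<5>(), "5 < 10, so B defaults to true");
static_assert(!f1<20>(), "20 < 10 is false, so B defaults to false");
static_assert(f2(), "C defaults to the type less<int>");
```

A tag generator sees the same token after `i` and after `less` and, unlike the compiler, has no symbol table to tell the two apart.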

@codebrainz
Contributor

codebrainz commented May 12, 2018

How does a real compiler handle this?

One trick I've seen to make such things simpler to parse is to preprocess the tokens and insert special disambiguation tokens, like say the tokens were:

...
TEMPLATE 
LEFT_ANGLE 
IDENTIFIER 
IDENTIFIER 
COMMA
IDENTIFIER
IDENTIFIER
EQUAL
IDENTIFIER
LEFT_ANGLE
INTEGER
RIGHT_ANGLE
...

You scan the tokens looking for template< and then count < and > openings and closings and inject some tokens like:

...
TEMPLATE_START*
IDENTIFIER
IDENTIFIER
COMMA
IDENTIFIER
IDENTIFIER
EQUAL
IDENTIFIER
LEFT_ANGLE
INTEGER
TEMPLATE_END*
...

* = synthetic token, replacing original token(s)
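
The rewrite described above could be sketched roughly as follows. This is not how ctags or any particular compiler does it. The Tok enum and markTemplateLists are hypothetical names, and the rule that a LEFT_ANGLE directly followed by an INTEGER is an operator is an assumption added here, since plain bracket counting alone would not produce the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds, named after the listings above.
enum class Tok {
    Template, LeftAngle, RightAngle, Identifier, Comma, Equal, Integer,
    TemplateStart, TemplateEnd  // synthetic tokens
};

// Replaces `TEMPLATE LEFT_ANGLE` with TEMPLATE_START and the RIGHT_ANGLE that
// closes the parameter list with TEMPLATE_END. Nested brackets are counted.
// Assumed heuristic: a LEFT_ANGLE directly followed by an INTEGER is a
// less-than operator and does not open a nested list.
std::vector<Tok> markTemplateLists(const std::vector<Tok>& in) {
    std::vector<Tok> out;
    int depth = 0;  // 0 means "outside any template parameter list"
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Tok t = in[i];
        if (depth == 0) {
            if (t == Tok::Template && i + 1 < in.size() &&
                in[i + 1] == Tok::LeftAngle) {
                out.push_back(Tok::TemplateStart);
                depth = 1;
                ++i;  // the LEFT_ANGLE is folded into TEMPLATE_START
            } else {
                out.push_back(t);
            }
        } else if (t == Tok::LeftAngle) {
            const bool isOperator =
                i + 1 < in.size() && in[i + 1] == Tok::Integer;
            if (!isOperator) ++depth;
            out.push_back(t);
        } else if (t == Tok::RightAngle) {
            --depth;
            assert(depth >= 0);
            out.push_back(depth == 0 ? Tok::TemplateEnd : t);
        } else {
            out.push_back(t);
        }
    }
    return out;
}
```

The heuristic is deliberately naive: an argument list that starts with an integer, such as Foo<10>, would be misread as a comparison, which is the kind of corner case discussed earlier in this thread.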

Of course it's a dirty hack, and doesn't help with truly ambiguous stuff like this:

a < b > c;

Where it's literally impossible to tell whether it's an expression statement comparing a less-than b greater-than c or a variable declaration c of type a templated on b, without first resolving all of the type names. The joys of a crappy grammar 😄
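
To make the ambiguity concrete, here is a minimal sketch in which the same token sequence compiles under both readings, depending only on what a, b and c were declared as earlier. The two namespaces are hypothetical and exist only to keep the conflicting declarations apart.

```cpp
#include <type_traits>

namespace as_expression {
    // a, b and c are variables: `a < b > c` parses as (a < b) > c.
    // Compilers may warn about the chained comparison; it is still valid.
    constexpr int a = 1, b = 2, c = 0;
    constexpr bool result = a < b > c;  // (1 < 2) > 0, i.e. true > 0
}

namespace as_declaration {
    // a is a class template and b is a type: the same tokens declare
    // a variable named c of type a<b>.
    struct b {};
    template<class T> struct a {};
    a < b > c;
}

static_assert(as_expression::result, "parsed as a comparison");
static_assert(std::is_same<decltype(as_declaration::c),
                           as_declaration::a<as_declaration::b>>::value,
              "parsed as a declaration");
```

A tag generator normally processes one file without resolving names, so it has to guess which of the two readings applies.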

masatake added a commit to masatake/ctags that referenced this issue May 21, 2018
…nested triangle brackets in template)

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
masatake added a commit to masatake/ctags that referenced this issue May 22, 2018
…nested triangle brackets in template)

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Member

The input is recorded as a test case triggering the bug. See #1756.
