.. _ctags-optlib(7):
==============================================================
ctags-optlib
==============================================================
--------------------------------------------------------------
Universal Ctags parser definition language
--------------------------------------------------------------
:Version: @VERSION@
:Manual group: Universal Ctags
:Manual section: 7
SYNOPSIS
--------
| **@CTAGS_NAME_EXECUTABLE@** [options] [file(s)]
| **@ETAGS_NAME_EXECUTABLE@** [options] [file(s)]
DESCRIPTION
-----------
*Exuberant Ctags*, the ancestor of *Universal Ctags*, provides
a way to define a new parser from the command line. Universal Ctags
extends and refines this feature. An *optlib parser* is the name for such
a parser in Universal Ctags. "opt" indicates that the parser is defined with
a combination of command line options. "lib" indicates that an optlib parser
can be more than an ad-hoc personal configuration.
This man page is for people who want to define an optlib parser. The
readers should read ctags(1) of Universal Ctags first.
The following options define (or customize) a parser:
* ``--langdef=<name>``
* ``--map-<LANG>=[+|-]<extension>|<pattern>``
* ``--kinddef-<LANG>=<letter>,<name>,<description>``
* ``--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
* ``--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
The following options control how parser definitions are loaded:
* ``--options=<pathname>``
* ``--options-maybe=<pathname>``
* ``--optlib-dir=[+]<directory>``
The design of the options and notations for defining a parser in
Exuberant Ctags seems to focus on reducing the amount of typing required
of the user. Less typing matters to users who want to
define (or customize) a parser quickly.
On the other hand, the design in Universal Ctags focuses on
maintainability. The notation of Universal Ctags is more verbose than
that of Exuberant Ctags: a newly introduced kind should be declared
explicitly, (long) names are preferred over one-letter flags
for specifying kinds, and naming rules are stricter.
This man page explains only stable options and flags. Universal Ctags
also introduces experimental options and flags which have names starting
with ``_``. For documentation on these options and flags, visit
the Universal Ctags web site at https://ctags.io/.
Storing a parser definition to a file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Though it is possible to define a parser from the command line, you don't
want to type the same command line each time you need the parser.
You can store the options that define a parser in a file.
@CTAGS_NAME_EXECUTABLE@ loads the files (preload files) listed in the "FILES"
section of ctags(1) at program startup. You can put the parser
definitions you use regularly in those files.
``--options=<pathname>``, ``--options-maybe=<pathname>``, and
``--optlib-dir=[+]<directory>`` are for loading optlib files you need only
occasionally. See the "Option File Options" section of ctags(1) for
these options.
As explained in the "FILES" section of ctags(1), the options for defining a
parser are listed line by line in an optlib file. Leading white space is
ignored. A line starting with '#' is treated as a comment. Escaping
shell meta characters is not needed.
Use ``.ctags`` as the file extension for an optlib file. You can define
multiple parsers in an optlib file, but it is better to make a file for
each parser definition.
``--_echo=<msg>`` and ``--_force-quit=<num>`` options are for debugging
an optlib parser.
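As a rough sketch of the file layout described above, an optlib file might
look like the following. The parser name ``mylang``, the file name
``mylang.ctags``, and the pattern are hypothetical; they only illustrate
comment lines, leading white space, and the absence of shell quoting.

.. code-block:: ctags

    # mylang.ctags: a hypothetical optlib file (all names are illustrative)
    --langdef=mylang
    --map-mylang=+.mylang
    --kinddef-mylang=d,definition,definitions
    # Leading white space before an option is ignored.
        --regex-mylang=/^def[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/d/

On a command line, the same ``--regex-mylang`` option would typically need
quoting to protect ``[``, ``(``, and ``\`` from the shell; in the file it is
written as is. Such a file can be loaded with ``--options=<pathname>``.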
Overview for defining a parser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Design the parser
You need to know both the target language and the concepts of ctags
(definition, reference, kind, role, field, extra). For
the concepts, ctags(1) of Universal Ctags may help you.
2. Give a name to the parser
Use the ``--langdef=<name>`` option. *<name>* is referred to as *<LANG>* in
the later steps.
3. Give a file pattern or file extension for activating the parser
Use ``--map-<LANG>=[+|-]<extension>|<pattern>``.
4. Define kinds
Use the ``--kinddef-<LANG>=<letter>,<name>,<description>`` option.
Universal Ctags introduces this option; Exuberant Ctags doesn't
have it. In Exuberant Ctags, a kind is defined as a side effect of
specifying the ``--regex-<LANG>=`` option, so the user has no
chance to recognize how important the definition of a kind is.
5. Define patterns
Use ``--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
option for a single-line regular expression. You can also use
``--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
option for a multi-line regular expression.
As *<kind-spec>*, you can use the one-letter flag defined with
``--kinddef-<LANG>=<letter>,<name>,<description>`` option.
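Steps 2 through 5 can be sketched as follows for a hypothetical language
whose source files end in ``.todo`` and whose lines of interest start with
``TASK``. The language name, the extension, and the pattern are assumptions
made up for illustration, not part of Universal Ctags.

.. code-block:: ctags

    # Step 2: name the parser.
    --langdef=todo
    # Step 3: activate the parser for files with the .todo extension.
    --map-todo=+.todo
    # Step 4: declare a kind before any pattern refers to it.
    --kinddef-todo=t,task,tasks
    # Step 5: capture the word after "TASK" as a tag of kind "t".
    --regex-todo=/^TASK[ \t]+([a-zA-Z0-9_]+)/\1/t/

Declaring the kind in its own step keeps the letter, name, and description
in one place, so each pattern only needs the one-letter reference.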
OPTIONS
------------
``--langdef=<name>``
Defines a new user-defined language, *<name>*, to be parsed with regular
expressions. Once defined, *<name>* may be used in other options taking
language names.
*<name>* must consist of alphanumeric characters, '``#``', or '``+``'
('[a-zA-Z0-9#+]+'). Graphic characters other than '``#``' and
'``+``' are disallowed (or reserved). Some of them (``[-=:{.]``) are
disallowed because they can confuse the command line parser of
@CTAGS_NAME_EXECUTABLE@. The rest are simply
reserved for future extensions of @CTAGS_NAME_EXECUTABLE@.
``all`` is an exception. ``all`` as *<name>* is not acceptable. It is
a reserved word. See the description of
``--kinds-(<LANG>|all)=[+|-](<kinds>|*)`` option in ctags(1) about how the
reserved word is used.
The names of built-in parsers are capitalized. When
@CTAGS_NAME_EXECUTABLE@ evaluates an option on a command line and
chooses a parser, @CTAGS_NAME_EXECUTABLE@ compares the names of
parsers in a case-insensitive way. Therefore, giving a name
starting with a lowercase character doesn't help you avoid a
parser name conflict. However, in a tags file,
@CTAGS_NAME_EXECUTABLE@ prints parser names in a case-sensitive
way; it prints a parser name as specified in the ``--langdef=<name>``
option. Therefore, we recommend giving a name starting with a
lowercase character to your private optlib parser. With this
convention, people can tell whether a tag entry in a tags file comes
from a built-in parser or a private optlib parser.
``--kinddef-<LANG>=<letter>,<name>,<description>``
Define a kind for *<LANG>*.
Do not confuse this with ``--kinds-<LANG>``.
*<letter>* must be an alphabetical character ('[a-zA-EG-Z]')
other than "F". "F" has been reserved for representing a file
since Exuberant Ctags.
*<name>* must start with an alphabetic character, and the rest
must be alphanumeric ('[a-zA-Z][a-zA-Z0-9]*'). Do not use
"file" as *<name>*. It has been reserved for representing a file
since Exuberant Ctags.
Note that using a number character in a *<name>* violates the
version 2 of tags file format though @CTAGS_NAME_EXECUTABLE@
accepts it. For more detail, see tags(5).
*<description>* may consist of any printable ASCII characters. The
exceptions are ``{`` and ``\``. ``{`` is reserved for adding flags to
this option in the future, so put ``\`` before ``{`` to include
``{`` in a description. To include ``\`` itself in a description,
put ``\`` before ``\``.
*<letter>*, *<name>*, and their combination must each be unique in
a *<LANG>*.
This option is newly introduced in Universal Ctags. It
reduces the typing needed to define a regex pattern with
``--regex-<LANG>=``, and keeps the kind
definitions in a language consistent.
The *<letter>* can be used as an argument for ``--kinds-<LANG>``
option to enable or disable the kind. Unless ``K`` field is
enabled, the *<letter>* is used as value in the "kind" extension
field in tags output.
The *<name>* surrounded by braces can be used as an argument for the
``--kinds-<LANG>`` option. If the ``K`` field is enabled, the *<name>*
is used as the value of the "kind" extension field in tags output.
The *<description>* and *<letter>* are listed in ``--list-kinds``
output. All three elements of the kind-spec are listed in
``--list-kinds-full`` output. Don't use braces in the
*<description>*. They will be used as meta characters in the future.
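The relationship between a kind definition and the options that refer to it
can be sketched as below. The language ``mylang`` and its two kinds are
hypothetical; ``--kinds-<LANG>`` and ``--fields`` are the options described
in ctags(1).

.. code-block:: ctags

    --langdef=mylang
    # Letters and names must be unique within mylang.
    --kinddef-mylang=v,variable,variables
    --kinddef-mylang=c,constant,constants
    # Disable the "constant" kind by its letter ...
    --kinds-mylang=-c
    # ... or, equivalently, by its name in braces.
    --kinds-mylang=-{constant}
    # Emit kind names instead of letters in the "kind" field.
    --fields=+K

Only one of the two ``--kinds-mylang`` lines is needed in practice; both are
shown to contrast the letter form with the name form.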
``--regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
Define a single-line regular expression.
The */<line_pattern>/<name_pattern>/* pair defines a regular expression
replacement pattern, similar in style to ``sed`` substitution
commands, ``s/regexp/replacement/``, with which to generate tags from
source files mapped to the named language, *<LANG>* (case-insensitive;
either a built-in or user-defined language).
The regular expression, *<line_pattern>*, defines
an extended regular expression (roughly that used by egrep(1)),
which is used to locate a single source line containing a tag and
may specify tab characters using ``\t``.
When a matching line is
found, a tag will be generated for the name defined by
*<name_pattern>*, which generally will contain the special
back-references ``\1`` through ``\9`` to refer to matching sub-expression
groups within *<line_pattern>*.
The '``/``' separator characters shown in the
parameter to the option can actually be replaced by any
character. Note that whichever separator character is used will
have to be escaped with a backslash ('``\``') character wherever it is
used in the parameter as something other than a separator. The
regular expression defined by this option is added to the current
list of regular expressions for the specified language unless the
parameter is omitted, in which case the current list is cleared.
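The separator rule can be illustrated with a small sketch. The language
``mylang``, the kind ``i``, and the ``include`` syntax being matched are
hypothetical. The choice of ``%`` as an alternative separator is only an
example of "any character".

.. code-block:: ctags

    --langdef=mylang
    --kinddef-mylang=i,include,includes
    # "/" as the separator: a literal "/" in the pattern must be escaped.
    --regex-mylang=/^include[ \t]+([a-z]+\/[a-z]+)/\1/i/
    # "%" as the separator: "/" needs no escaping.
    --regex-mylang=%^include[ \t]+([a-z]+/[a-z]+)%\1%i%
    # An empty parameter clears the patterns defined so far for mylang.
    # (Placed last here only to show the syntax.)
    --regex-mylang=

Picking a separator that does not occur in the pattern avoids backslash
escapes and tends to keep path-like patterns readable.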
Unless modified by *<flags>*, *<line_pattern>* is interpreted as a POSIX
extended regular expression. The *<name_pattern>* should expand for all
matching lines to a non-empty string of characters, or a warning
message will be reported unless ``{placeholder}`` regex flag is
specified.
A kind specifier (*<kind-spec>*) for tags matching regexp may
follow *<name_pattern>*, which will determine what kind of tag is
reported in the ``kind`` extension field (see tags(5)).
*<kind-spec>* has two forms: *one-letter form* and *full form*.
The one-letter form is written as ``<letter>``. It simply refers to a kind
*<letter>* defined with ``--kinddef-<LANG>``. This form is recommended in
Universal Ctags.
The full form of *<kind-spec>* is written as
``<letter>,<name>,<description>``. The kind *<name>*, the
*<description>*, or both can be omitted. See the description of the
``--kinddef-<LANG>=<letter>,<name>,<description>`` option for the
elements.
The full form is supported only for compatibility with Exuberant
Ctags, which does not have the ``--kinddef-<LANG>`` option. Support for this
form will be removed from Universal Ctags in the future.
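The two forms of *<kind-spec>* can be compared side by side in the following
sketch. The language ``mylang``, its ``func`` and ``proc`` keywords, and the
kinds are hypothetical.

.. code-block:: ctags

    --langdef=mylang
    # Recommended: declare the kind once, then refer to it by letter.
    --kinddef-mylang=f,function,functions
    --regex-mylang=/^func[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/
    # Legacy full form, kept for Exuberant Ctags compatibility:
    # the kind is defined as a side effect of the pattern.
    --regex-mylang=/^proc[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/p,procedure,procedures/

With the one-letter form, the kind's name and description live in a single
``--kinddef-mylang`` line, so several patterns can share the kind without
repeating them.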
.. MEMO: the following line is commented out
If *<kind-spec>* is omitted, it defaults to ``r,regex``.
About *<flags>*, see "FLAGS FOR ``--regex-<LANG>`` OPTION".
For more information on the regular expressions used by
@CTAGS_NAME_EXECUTABLE@, see either the regex(5,7) man page, or
the GNU info documentation for regex (e.g. "``info regex``").
``--list-regex-flags``
Lists the flags that can be used in ``--regex-<LANG>`` option.
``--list-mline-regex-flags``
Lists the flags that can be used in ``--mline-regex-<LANG>`` option.
``--mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]``
Define a multi-line regular expression.
This option is similar to ``--regex-<LANG>`` option except the pattern is
applied to the whole file’s contents, not line by line.
``--_echo=<message>``
Print *<message>* to the standard error stream. This is helpful to
understand (and debug) the optlib loading feature of Universal Ctags.
``--_force-quit[=<num>]``
Exits immediately when this option is processed. If *<num>* is given, it is used
as the exit status. The default is 0. This is helpful to debug the optlib
loading feature of Universal Ctags.
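One way these two debugging options might be combined is sketched below. The
file name ``mylang.ctags``, the language, and the message texts are
hypothetical.

.. code-block:: ctags

    --_echo=mylang.ctags: start loading
    --langdef=mylang
    --map-mylang=+.mylang
    --_echo=mylang.ctags: language and map are defined
    # Stop here with exit status 0 once this line is processed.
    --_force-quit=0

If the second message never appears on the standard error stream, the
problem lies in one of the options between the two ``--_echo`` lines.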
FLAGS FOR ``--regex-<LANG>`` OPTION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can specify one or more flags, each written as ``<letter>|{<name>}``, at the end of ``--regex-<LANG>`` to
control how Universal Ctags uses the pattern.
Exuberant Ctags uses a *<letter>* to represent a flag. In
Universal Ctags, a *<name>* surrounded by braces (name form) can be used
in addition to *<letter>*. The name form makes an optlib
file easier to read.
Most of the flags newly added in Universal Ctags
have no one-letter representation; they have only the name
representation. ``--list-regex-flags`` lists all the flags.
``basic`` (one-letter form ``b``)
The pattern is interpreted as a POSIX basic regular expression.
``exclusive`` (one-letter form ``x``)
Skip testing the other patterns if a line matches this
pattern. This is useful to avoid spending CPU time on parsing line comments.
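A minimal sketch of the comment-skipping use case follows. It assumes a
hypothetical language ``mylang`` whose line comments start with ``#``, and it
assumes that patterns are tried in the order in which they are defined. The
``{placeholder}`` flag is added because the name pattern is empty.

.. code-block:: ctags

    --langdef=mylang
    --kinddef-mylang=d,definition,definitions
    # A comment line: emit nothing and stop testing further patterns.
    --regex-mylang=/^[ \t]*#///{exclusive}{placeholder}
    # Without the line above, "# def old" would be tagged as "old".
    --regex-mylang=/def[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/d/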
``extend`` (one-letter form ``e``)
The pattern is interpreted as a POSIX extended regular
expression (default).
``pcre2`` (one-letter form ``p``, experimental)
The pattern is interpreted as a PCRE2 regular expression, as explained
in pcre2syntax(3). This flag is available only if ctags is
built with the ``pcre2`` library. See the output of the
``--list-features`` option to find out whether your ctags is
built with ``pcre2`` or not.
``icase`` (one-letter form ``i``)
The regular expression is to be applied in a case-insensitive
manner.
``placeholder``
Don't emit a tag captured with a regex pattern. The replacement
can be an empty string. See the following description of
``scope=...`` flag about how this is useful.
``scope=(ref|push|pop|clear|set|replace)``
Specify what to do with the internal scope stack.
A parser programmed with ``--regex-<LANG>`` has a stack (scope
stack) internally. You can use it for tracking scope
information. The ``scope=...`` flag is for manipulating and
utilizing the scope stack.
If ``{scope=push}`` is specified, a tag captured with
``--regex-<LANG>`` is pushed to the stack. ``{scope=push}``
implies ``{scope=ref}``.
You can fill the scope field (``scope:``) of captured tag with
``{scope=ref}``. If ``{scope=ref}`` flag is given,
@CTAGS_NAME_EXECUTABLE@ attaches the tag at the top to the tag
captured with ``--regex-<LANG>`` as the value for the ``scope:``
field.
@CTAGS_NAME_EXECUTABLE@ pops the tag at the top of the stack when
``--regex-<LANG>`` with ``{scope=pop}`` is matched to the input
line.
Specifying ``{scope=clear}`` removes all the tags on the scope stack.
Specifying ``{scope=set}`` removes all the tags on the scope stack, and
then pushes the captured tag as ``{scope=push}`` does.
``{scope=replace}`` does three things in sequence. First it
does the same as ``{scope=pop}``, then it fills the ``scope:`` field
of the tag captured with ``--regex-<LANG>``, and finally it pushes the tag to
the scope stack as if ``{scope=push}`` were given.
You cannot specify another scope action together with
``{scope=replace}``.
Do not specify ``{scope=pop}{scope=push}`` as an
alternative to ``{scope=replace}``; ``{scope=pop}{scope=push}``
first fills the ``scope:`` field of the tag captured with ``--regex-<LANG>``,
then pops the tag at the top of the stack, and finally pushes
the captured tag to the scope stack. The point at which
the ``end:`` field is filled differs between ``{scope=replace}`` and
``{scope=pop}{scope=push}``.
In some cases, you may want to use ``--regex-<LANG>`` only for its
side effects: using it only to manipulate the stack but not for
capturing a tag. In such a case, make the *<name_pattern>* component of the
``--regex-<LANG>`` option empty while specifying ``{placeholder}``
as a regex flag. For example, an unnamed tag can be put on
the stack by giving the regex flags "``{scope=push}{placeholder}``".
You may wonder what happens if a regex pattern with
``{scope=ref}`` flag matches an input line but the stack is empty,
or a non-named tag is at the top. If the regex pattern contains a
``{scope=ref}`` flag and the stack is empty, the ``{scope=ref}``
flag is ignored and nothing is attached to the ``scope:`` field.
If the top of the stack contains an unnamed tag,
@CTAGS_NAME_EXECUTABLE@ searches deeper into the stack to find the
top-most named tag. If it reaches the bottom of the stack without
finding a named tag, the ``{scope=ref}`` flag is ignored and
nothing is attached to the ``scope:`` field.
When a named tag on the stack is popped or cleared as the side
effect of a pattern matching, @CTAGS_NAME_EXECUTABLE@ attaches the
line number of the match to the ``end:`` field of
the named tag.
@CTAGS_NAME_EXECUTABLE@ clears all of the tags on the stack when it
reaches the end of the input source file. The line number of the
end is attached to the ``end:`` field of the cleared tags.
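As a sketch of ``{scope=set}`` and ``{scope=ref}`` working together, consider
a hypothetical INI-like format in which ``[name]`` lines start a section and
``key = value`` lines belong to the most recent section. The language name
``myini``, the extension, and the kinds are assumptions made for
illustration.

.. code-block:: ctags

    --langdef=myini
    --map-myini=+.myini
    --kinddef-myini=s,section,sections
    --kinddef-myini=k,key,keys
    # A new section clears the stack, then becomes the only entry on it.
    --regex-myini=/^\[([^]]+)\]/\1/s/{scope=set}
    # A key refers to the section at the top of the stack, if any.
    --regex-myini=/^([a-zA-Z0-9_]+)[ \t]*=/\1/k/{scope=ref}

Because sections do not nest in this format, ``{scope=set}`` is used instead
of ``{scope=push}``: each section header removes the previous section from
the stack, which also gives that section its ``end:`` value. Keys that appear
before the first section find an empty stack, so their ``scope:`` field is
left unset.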
``warning=<message>``
Print the given *<message>* at WARNING level.
``fatal=<message>``
Print the given *<message>* and exit.
EXAMPLES
-------------
Perl Pod
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is the definition (``pod.ctags``) used in ctags for parsing Pod
(https://perldoc.perl.org/perlpod.html) files.

.. code-block:: ctags

    --langdef=pod
    --map-pod=+.pod
    --kinddef-pod=c,chapter,chapters
    --kinddef-pod=s,section,sections
    --kinddef-pod=S,subsection,subsections
    --kinddef-pod=t,subsubsection,subsubsections
    --regex-pod=/^=head1[ \t]+(.+)/\1/c/
    --regex-pod=/^=head2[ \t]+(.+)/\1/s/
    --regex-pod=/^=head3[ \t]+(.+)/\1/S/
    --regex-pod=/^=head4[ \t]+(.+)/\1/t/

Using scope regex flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let's think about writing a parser for a very small subset of the Ruby
language.
input source file (``input.srb``)::

    class Example
      def methodA
        puts "in class_method"
      end
      def methodB
        puts "in class_method"
      end
    end

The parser for the input should capture ``Example`` with ``class`` kind,
``methodA``, and ``methodB`` with ``method`` kind. ``methodA`` and ``methodB``
should have ``Example`` as their scope. ``end:`` fields of each tag
should have proper values.
optlib file (``sub-ruby.ctags``):

.. code-block:: ctags

    --langdef=subRuby
    --map-subRuby=.srb
    --kinddef-subRuby=c,class,classes
    --kinddef-subRuby=m,method,methods
    --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
    --regex-subRuby=/^end///{scope=pop}{placeholder}
    --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
    --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

command line and output::

    $ ctags --quiet --fields=+eK \
        --options=./sub-ruby.ctags -o - input.srb
    Example input.srb /^class Example$/;" class end:8
    methodA input.srb /^  def methodA$/;" method class:Example end:4
    methodB input.srb /^  def methodB$/;" method class:Example end:7

SEE ALSO
--------
The official Universal Ctags web site at:
https://ctags.io/
ctags(1), tags(5), regex(3), regex(7), egrep(1), pcre2syntax(3)
AUTHOR
------
Universal Ctags project
https://ctags.io/
(This man page is partially derived from ctags(1) of
Exuberant Ctags)
Darren Hiebert <dhiebert@users.sourceforge.net>
http://DarrenHiebert.com/