Defining parser for ENDF file format #2622

Closed
jlconlin opened this issue Aug 20, 2020 · 16 comments

Comments

@jlconlin

I have a file format that doesn't have a parser (yet). If I can, I'd like to write one so that I can use existing text-editor tools to move through the file naturally. I'm willing to do the work, but I'm not sure where to start. There are no keywords, as this is not a computer language. I've written a simple syntax and folding definition for the Vim editor; I'm not sure whether that helps.

The different sections of the file are determined by the content of the last ten columns of each line. (I didn't create the format. Sorry.) Here is a sample:

                                                                  MMMMFFTTT
                               33        856        176          17434 1451
                               34          2        155          17434 1451
                               34         51        115          17434 1451
 0.000000+0 0.000000+0          0          0          0          07434 1  0
 0.000000+0 0.000000+0          0          0          0          07434 0  0
 7.418300+4 1.813790+2          0          0          1          07434 2151
 7.418300+4 1.000000+0          0          0          2          07434 2151
 1.000000-5 5.000000+3          1          7          0          17434 2151
 0.000000+0 0.000000+0          0          3          5          07434 2151
 0.000000+0 0.000000+0          2          0         24          47434 2151
 7.418300+4 1.813790+2          0          0          0          07434 3 28
-7.222000+6-7.222000+6          0          0          1         397434 3 28
         39          2                                            7434 3 28
 7.261820+6 0.000000+0 9.300000+6 0.000000+0 9.600000+6 2.18585-137434 3 28
 1.000000+7 5.01372-13 1.050000+7 1.32071-11 1.100000+7 8.70475-107434 3 28
 0.000000+0 0.000000+0          0          0          0          07434 3  0
 7.418300+4 1.813790+2          0          0          0          07434 3 37
-2.093600+7-2.093600+7          0          0          1         207434 3 37
 2.105140+7 0.000000+0 2.200000+7 7.150990-5 2.400000+7 2.707920-27434 3 37
 1.300000+8 5.411910-2 1.500000+8 3.895580-2                      7434 3 37
 0.000000+0 0.000000+0          0          0          0          07434 3  0
 7.418300+4 1.813790+2          0          0          0          07434 3 41
-1.328500+7-1.328500+7          0          0          1         267434 3 41
         26          2                                            7434 3 41
 1.335820+7 0.000000+0 1.550000+7 0.000000+0 1.600000+7 2.56183-147434 3 41
 1.700000+7 9.60380-12 1.800000+7 3.02742-10 1.900000+7 1.474340-77434 3 41
 1.300000+8 1.582280-2 1.500000+8 1.154350-2                      7434 3 41

I've labeled the columns MMMM, FF, and TT. I need a "tag" (using the term loosely) wherever any of these values changes. Note that the structure is (kind of) nested: there are many TTs in each FF, and many FFs in each MMMM.
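
To make the "tag whenever a value changes" idea concrete, here is a minimal Python sketch. It is not the ctags parser; it assumes the standard ENDF layout in which 66 data columns are followed by MMMM in columns 67-70, FF in columns 71-72, and a three-character TT field in columns 73-75. The names `Control`, `control_fields`, and `section_starts` are made up for this illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Control(NamedTuple):
    mat: int  # MMMM, columns 67-70
    mf: int   # FF, columns 71-72
    mt: int   # TT, columns 73-75 (three characters wide)


def control_fields(line: str) -> Control:
    """Slice the fixed-width control columns that follow the 66 data columns."""
    return Control(int(line[66:70]), int(line[70:72]), int(line[72:75]))


def section_starts(lines: Iterable[str]) -> Iterator[Tuple[int, str, int]]:
    """Yield (line_number, level, value) each time MAT, MF, or MT changes."""
    previous = None
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if len(text) < 75:
            continue  # assumption: short lines carry no control fields
        current = control_fields(text)
        # A change at an outer level also restarts every inner level,
        # which mirrors the nesting MMMM > FF > TT.
        changed = previous is None
        for level, value in zip(Control._fields, current):
            if changed or value != getattr(previous, level):
                changed = True
                yield number, level, value
        previous = current
```

Feeding the lines of a file to `section_starts` gives one event per change, so a run of lines with identical MMMM, FF, and TT produces nothing after its first line.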

I've attached an example file that contains a full example.

n-000_n_001.endf.txt

@jlconlin changed the title from "Defining parser for custom file format" to "Defining parser for ENDF file format" on Aug 20, 2020
@masatake
Member

The format definition: https://www.nndc.bnl.gov/csewg/docs/endf-manual.pdf

@jlconlin
Author

The format definition: https://www.nndc.bnl.gov/csewg/docs/endf-manual.pdf

Yes, that is the format definition. The parser does not have to generate tags for all of it right now; we can add to it as needed. At first, it only needs to know where MMMM, FF, and TT change.

@masatake
Member

Do you know any popular programming language, like C?
Could you tell me one that you know? I would like to use it as an example to explain the tags output.

@masatake
Member

You may know TeX. I will use it.

@jlconlin
Author

I'm mostly familiar with C++, Python, and LaTeX. I can do C, but I don't like to.

@masatake
Member

input.tex:

\section{A}
...
\subsection{B}
...
\subsubsection{C}

For the above input, ctags can generate the following tags file:

$ u-ctags -o - --fields=+K-l /tmp/input.tex 
A	input.tex	/^\\section{A}$/;"	section
B	input.tex	/^\\subsection{B}$/;"	subsection	section:A
C	input.tex	/^\\subsubsection{C}$/;"	subsubsection	subsection:A""B

For input.endf:

                               33        856        176          17434 1451
                               34          2        155          17434 1451
                               34         51        115          17434 1451
 0.000000+0 0.000000+0          0          0          0          07434 1  0

What kind of tags output do you want?
My guess:

17434	input.endf	/^                               33        856        176          17434 1451$/;"	mmmm
12	input.endf	/^                               33        856        176          17434 1451$/;"	ff	mmmm:17343
51	input.endf	/^                               33        856        176          17434 1451$/;"	tt	ff:1734312
07434	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          07434 1  0$/;"	mmmm
1\ 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          07434 1  0$/;"	ff	mmmm:07434
\ 0	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          07434 1  0$/;"	tt ff:074341\ 
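
For readers new to tags files, each line above is a tab-separated record: tag name, input file, a search pattern for the line, the kind, and optionally the enclosing scope. The following Python sketch shows how one such line can be assembled. It is only an illustration, not what ctags does internally; `escape_pattern` and `format_tag` are hypothetical helpers, and the sketch assumes that only backslashes and slashes need escaping inside the pattern.

```python
from typing import Optional, Tuple


def escape_pattern(text: str) -> str:
    """Escape backslashes and slashes so the text can sit inside /^...$/."""
    return text.replace("\\", "\\\\").replace("/", "\\/")


def format_tag(name: str, path: str, line_text: str, kind: str,
               scope: Optional[Tuple[str, str]] = None) -> str:
    """Build one tags line: name, file, search pattern, kind, optional scope."""
    fields = [name, path, '/^' + escape_pattern(line_text) + '$/;"', kind]
    if scope is not None:
        scope_kind, scope_name = scope
        fields.append(f"{scope_kind}:{scope_name}")
    return "\t".join(fields)


# Rebuilds the second line of the TeX example (tag B inside section A).
print(format_tag("B", "input.tex", "\\subsection{B}", "subsection",
                 scope=("section", "A")))
```

The same helper would work for ENDF lines once the name, kind, and scope of each tag have been decided.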

@masatake
Member

12	input.endf	/^                               33        856        176          17434 1451$/;"	ff	mmmm:17343
51	input.endf	/^                               33        856        176          17434 1451$/;"	tt	ff:1734312

This includes a typo.
What I would like to write is:

14	input.endf	/^                               33        856        176          17434 1451$/;"	ff	mmmm:17343
51	input.endf	/^                               33        856        176          17434 1451$/;"	tt	ff:1734314

@masatake
Member

According to "Table 1: Key parameters defining the hierarchy of entries in an ENDF file", mmmm, ff, and tt may not be good names for the kinds.

mat (material) may be better than mmmm.
mf (material file) may be better than ff.
mt (material subdivision) may be better than tt.

@masatake
Member

I wonder how "1 " and " 0" should be tagged. Can we tag them as "1" and "0"?
In other words, should the leading and trailing whitespace characters be kept or not?

@masatake
Member

I found more typos. 17434 should be 7434. 07434 should be 7434.

@masatake
Member

masatake commented Aug 20, 2020

Based on that guess, I wrote a parser. \x20 at the beginning of a line stands for a space character.

input: n-000_n_001.endf.txt

[yamato@control]/tmp% u-ctags --fields=+K-l  --sort=no -o - input.endf 
\x20 1 	input.endf	/^ $Rev::          $  $Date::            $                             1 0  0$/;"	mat
0 	input.endf	/^ $Rev::          $  $Date::            $                             1 0  0$/;"	mf	mat:  1 
\x200	input.endf	/^ $Rev::          $  $Date::            $                             1 0  0$/;"	mt	mf:  1 0 
\x2025 	input.endf	/^ 1.000000+0 1.000000+0          0          0          0          2  25 1451$/;"	mat
14	input.endf	/^ 1.000000+0 1.000000+0          0          0          0          2  25 1451$/;"	mf	mat: 25 
51	input.endf	/^ 1.000000+0 1.000000+0          0          0          0          2  25 1451$/;"	mt	mf: 25 14
1 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 1  0$/;"	mf	mat: 25 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 1  0$/;"	mt	mf: 25 1 
0 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 0  0$/;"	mf	mat: 25 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 0  0$/;"	mt	mf: 25 0 
21	input.endf	/^ 1.000000+0 1.000000+0          0          0          1          0  25 2151$/;"	mf	mat: 25 
51	input.endf	/^ 1.000000+0 1.000000+0          0          0          1          0  25 2151$/;"	mt	mf: 25 21
2 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 2  0$/;"	mf	mat: 25 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 2  0$/;"	mt	mf: 25 2 
0 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 0  0$/;"	mf	mat: 25 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 0  0$/;"	mt	mf: 25 0 
3 	input.endf	/^ 1.000000+0 1.000000+0          0          0          0          0  25 3  1$/;"	mf	mat: 25 
\x201	input.endf	/^ 1.000000+0 1.000000+0          0          0          0          0  25 3  1$/;"	mt	mf: 25 3 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 3  0$/;"	mt	mf: 25 3 
\x202	input.endf	/^ 1.000000+0 1.000000+0          0          0          0          0  25 3  2$/;"	mt	mf: 25 3 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 3  0$/;"	mt	mf: 25 3 
0 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 0  0$/;"	mf	mat: 25 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 0  0$/;"	mt	mf: 25 0 
4 	input.endf	/^ 1.000000+0 1.000000+0          0          1          0          0  25 4  2$/;"	mf	mat: 25 
\x202	input.endf	/^ 1.000000+0 1.000000+0          0          1          0          0  25 4  2$/;"	mt	mf: 25 4 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 4  0$/;"	mt	mf: 25 4 
0 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 0  0$/;"	mf	mat: 25 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  25 0  0$/;"	mt	mf: 25 0 
\x20 0 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0   0 0  0$/;"	mat
0 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0   0 0  0$/;"	mf	mat:  0 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0   0 0  0$/;"	mt	mf:  0 0 
\x20-1 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  -1 0  0$/;"	mat
0 	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  -1 0  0$/;"	mf	mat: -1 
\x200	input.endf	/^ 0.000000+0 0.000000+0          0          0          0          0  -1 0  0$/;"	mt	mf: -1 0 

@jlconlin
Author

According to "Table 1: Key parameters defining the hierarchy of entries in an ENDF file", mmmm, ff, and tt may not be good names for the kinds.

mat (material) may be better than mmmm.
mf (material file) may be better than ff.
mt (material subdivision) may be better than tt.

Yes, you are right, mat, mf, and mt are the correct names for the hierarchy. I was trying not to get too much into the details. I'm impressed you were able to dig through that large document and find the important stuff. Thanks!

So when mat, mf, or mt turns to 0, that just means that it is the last line of the material/file/section. I don't know whether that means you need a new tag. I'm still new to all of this.

Also, sometimes there are additional numbers beyond mt that are optional. These can be up to 5 digits in length.
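
The end markers described above can be told apart with a few comparisons. The Python sketch below assumes the fixed ENDF columns (MAT in 67-70, MF in 71-72, MT in 73-75) and uses the ENDF manual's record names SEND, FEND, MEND, and TEND; `record_kind` and the label "DATA" are invented for this example. Because the slices are taken at fixed positions, an optional trailing number after column 75 is simply ignored.

```python
def record_kind(line: str) -> str:
    """Name the record type from the control columns 67-75 of one line."""
    mat = int(line[66:70])
    mf = int(line[70:72])
    mt = int(line[72:75])
    if mat == -1:
        return "TEND"  # end of the whole tape (file)
    if mat == 0:
        return "MEND"  # end of a material
    if mf == 0:
        return "FEND"  # end of a file (MF) within a material
    if mt == 0:
        return "SEND"  # end of a section (MT) within a file
    return "DATA"      # hypothetical label for every other line
```

One caveat: the very first line of an ENDF file (the tape identification line) also has mf and mt equal to 0, so a caller would probably want to skip line 1 before classifying. Whether an end marker deserves its own tag is still a design decision; this only shows that the markers are easy to recognize and filter out.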

@jlconlin
Author

Based on that guess, I wrote a parser. \x20 at the beginning of a line stands for a space character.


So, I'm too much of a novice to fully comprehend what those tags mean. Every line should carry three pieces of "metadata": mat, mf, and mt. I don't know whether a tag needs to be generated for every line or only when the section changes.

@masatake
Member

masatake commented Aug 20, 2020

About the output, see tags(5) man page (https://docs.ctags.io/en/latest/man/tags.5.html).

I don't know if a tag needs to be generated for every line, or only when the section changes.

I don't know that either. I can help you write a parser. However, I cannot help you decide what you want,
because I don't know the ENDF format well, neither its syntax nor its purpose.
It depends very much on how you use the tags output. I guess you want to navigate the files in Vim.
That requires knowledge of Vim, which I don't know well.

Here is the parser I wrote. There are several ways to write a parser in ctags. This one falls into the category "line-oriented parser written in C".

masatake@e8e0015?branch=e8e0015393ae7a3b447ee886bd0884f45d11ced2&diff=unified

You can edit the parser as you want.

@jlconlin
Author

jlconlin commented Aug 20, 2020

Here is the parser I wrote. There are several ways to write a parser in ctags. This one falls into the category "line-oriented parser written in C".

masatake@e8e0015?branch=e8e0015393ae7a3b447ee886bd0884f45d11ced2&diff=unified

You can edit the parser as you want.

That looks great! If I understand the C well enough, you are looking at the end of each line and updating the values of mat, mf, and mt. The only thing I would change is that mt is three digits in length; you only have it as two.

I'll clone it to my space, make the change, and see if I can't get it to work.
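
To illustrate why the width of mt matters, here is a small Python sketch (not the C parser itself). It assumes the fields are cut from the end of the line, which appears consistent with the tags output posted earlier in this thread; `split_tail` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def split_tail(line: str, widths: Sequence[int]) -> Tuple[str, ...]:
    """Cut the last sum(widths) characters of a line into consecutive fields."""
    tail = line[-sum(widths):]
    fields = []
    start = 0
    for width in widths:
        fields.append(tail[start:start + width])
        start += width
    return tuple(fields)


# A 75-column line from the attached file: 66 data columns, then "  25 1451".
line = " 1.000000+0 1.000000+0" + "0".rjust(11) * 3 + "2".rjust(11) + "  25 1451"

print(split_tail(line, (4, 2, 2)))  # mt as two characters: (' 25 ', '14', '51')
print(split_tail(line, (4, 2, 3)))  # mt as three characters: ('  25', ' 1', '451')
```

With a two-character mt, every field shifts by one column, which would explain tags such as mf "14" and mt "51". With three characters the line reads as mat 25, mf 1, mt 451. Note that counting from the end of the line only works when the optional trailing number is absent; slicing at fixed columns would avoid that.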

@masatake
Member

That looks great! If I understand the C well enough, you are looking at the end of each line and updating the values of mat, mf, and mt.
Yes.

The only thing I would change is that mt is three digits in length; you only have it as two.

Oh, sorry.

I'll clone it to my space, make the change, and see if I can't get it to work.

OK. Feel free to reopen this if you need to.
