**************************** GENERAL WORKFLOW **********************************
Our workflow for converting any trace to DataSeries format is the following:
1. Define a field table for the system calls in a special format and generate
XML files from it. The table is first used by the generate-xml.sh script to
generate XML files describing the extent types in the DataSeries file. Later
the table is also used by the csv2ds tool to identify field types correctly
during conversion. The field table has the following format:
<extent type> <field name> <is nullable?> <field type>
Examples of field tables are:
- tables/snia_syscall_fields.table
2. Convert the trace from a raw format (e.g., from a binary produced by
strace) to a CSV file with the fields corresponding to the fields of a
target DataSeries file. The first column of the CSV file should always be
the name of the extent. For example, the following two records:
pwrite,1,4096,8192
read,2,128
correspond to "pwrite" and "read" extents with 3 and 2 fields, respectively.
The scripts that perform this action are:
- strace2ds.sh
Note: these scripts also invoke the csv2ds tool, described next.
3. The CSV files obtained in the previous step carry no information about
the semantics of the fields. For example, in
pwrite,1,4096,8192
read,2,128
it is not clear what 1, 4096, 8192, 2, and 128 mean. One needs to define a
spec string that describes the format of the CSV file. For example, for the
records above it could look like this:
pwrite(descriptor,bytes_requested,offset);read(descriptor,bytes_requested);
"descriptor", "bytes_requested", and "offset" are field names defined in the
field table in step 1. Basically, the spec string performs the mapping
between the values in the CSV file and the fields in the DataSeries file.
Examples of the spec strings are:
- specstrings/strace.spec (used by strace2ds.sh)
The specfile can also define common fields (for all extents) using the
Common(f1, f2, f3) directive. More details on the Common field are in the
FILE FORMAT and EXTENDED EXAMPLE sections below.
4. Once the field table is defined, the XML files are generated, and the spec
strings are declared, one can use the csv2ds tool to create a DataSeries file
as follows:
$ csv2ds my.ds tables/example.table specstrings/example.spec xml/ traces/example.csv
The csv2ds tool uses "example.table" (and the corresponding XML files in the
xml/ subdirectory) to create DataSeries files with all required extents.
For each line in the "example.csv", csv2ds uses the corresponding
description in the "example.spec" specfile to set fields in the DataSeries
file. If some field is defined in the field table (and consequently in the
XML files) but is missing in the specstring, it will be set to NULL (if this
field is nullable) or set to the default value of the corresponding field
type (if the field is not nullable).
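The NULL-versus-default rule in step 4 can be sketched in Python as follows.
This is an illustration only, not the csv2ds implementation: the function
name fill_record, the in-memory table layout, and the concrete default value
per field type (0, False, empty string) are assumptions made for the example.

```python
# Hypothetical illustration of the rule in step 4; this is not csv2ds code.
# The defaults below are assumed; the README does not list the real ones.
ASSUMED_DEFAULTS = {"int32": 0, "int64": 0, "bool": False, "variable32": ""}


def fill_record(table_fields, spec_fields, values):
    """Build one record for a single extent.

    table_fields: {field name: (nullable, field type)} from the field table
    spec_fields:  field names in the order given by the spec string
    values:       CSV values, in the same order as spec_fields
    """
    given = dict(zip(spec_fields, values))
    record = {}
    for name, (nullable, ftype) in table_fields.items():
        if name in given:
            record[name] = given[name]              # value present in the CSV
        elif nullable:
            record[name] = None                     # nullable field: NULL
        else:
            record[name] = ASSUMED_DEFAULTS[ftype]  # non-nullable: type default
    return record


table = {
    "descriptor": (False, "int32"),
    "bytes_requested": (False, "int64"),
    "offset": (True, "int64"),
}
# The spec string names only "descriptor"; the other two fields are filled in.
rec = fill_record(table, ["descriptor"], ["2"])
print(rec)
```

Here "bytes_requested" is not nullable and so receives the assumed default,
while the nullable "offset" becomes NULL (None in the sketch).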
***************************** USUAL USE CASE **********************************
Often, one does not need to define tables or spec strings, because they are
already in this directory. In this case, only two commands are needed. For
example, for strace traces:
$ ./generate-xml.sh tables/snia_syscall_fields.table
$ ./strace2ds.sh <strace_file> out.ds
******************************** FILE FORMAT ***********************************
* INPUT TRACE IN CSV FORMAT
The format for each line in the CSV file should look like this:
<extent name>,<fields>,...
* SPECIFICATION STRING
The string defines the structure of the extents and fields that the CSV file
follows. The extents are delimited by semicolons; the fields specific to an
extent are enclosed in parentheses; and the fields are delimited by commas.
For example,
$ cat specstrings/example.spec
read(descriptor,bytes_requested);
specifies that the 'read' records in the CSV file look like
"read,<descriptor>,<bytes_requested>".
If the extent is called 'Common', the enclosed fields are prepended to the
field lists of all subsequent extents. For example,
Common(time_called,time_returned,time_recorded,executing_pid,executing_tid,executing_uid,return_value);read(descriptor,bytes_requested);write(descriptor,bytes_requested);
is equivalent to
read(time_called,time_returned,time_recorded,executing_pid,executing_tid,executing_uid,return_value,descriptor,bytes_requested);write(time_called,time_returned,time_recorded,executing_pid,executing_tid,executing_uid,return_value,descriptor,bytes_requested);
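The expansion of 'Common' described above can be sketched in Python. This is
a minimal illustration of the spec string syntax as documented here, not the
parser used by csv2ds; the function name parse_spec is hypothetical.

```python
def parse_spec(spec):
    """Return {extent name: [field names]} with Common fields prepended.

    Extents are separated by ';', fields are enclosed in '(...)' and
    separated by ','. A 'Common' entry applies to all subsequent extents.
    """
    common = []
    extents = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip the empty piece after the trailing ';'
        name, _, rest = entry.partition("(")
        body = rest.rstrip(")")
        fields = [f.strip() for f in body.split(",") if f.strip()]
        if name.strip() == "Common":
            common = fields
        else:
            extents[name.strip()] = common + fields
    return extents


spec = ("Common(time_called,return_value);"
        "read(descriptor,bytes_requested);"
        "write(descriptor,bytes_requested);")
expanded = parse_spec(spec)
for extent, fields in expanded.items():
    print(extent, fields)
```

Because 'Common' only affects the extents that follow it, an extent declared
before the 'Common' entry keeps just its own fields in this sketch.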
* FIELD TABLE
The field table lists the fields, and their types, that make up the target
DataSeries file. All fields in the spec string must exist in the field
table.
The table has four fields, delimited by '\t' (TAB):
1) name of the extent
2) name of the field
3) nullable (1) or not nullable (0)
4) type of the field
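As a rough illustration of this four-column layout, the following Python
sketch reads a field table into a dictionary. The function name
parse_field_table and the returned structure are assumptions for the
example; csv2ds and generate-xml.sh may process the table differently.

```python
def parse_field_table(lines):
    """Parse TAB-delimited field table lines.

    Returns {extent name: {field name: (nullable, field type)}}.
    """
    table = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 4:
            raise ValueError(f"line {lineno}: expected 4 TAB-separated columns")
        extent, field, nullable, ftype = parts
        if nullable not in ("0", "1"):
            raise ValueError(f"line {lineno}: nullable must be 0 or 1")
        table.setdefault(extent, {})[field] = (nullable == "1", ftype)
    return table


sample = [
    "Common\ttime_called\t1\tint64\n",
    "read\tdescriptor\t0\tint32\n",
    "read\tbytes_requested\t0\tint64\n",
]
fields = parse_field_table(sample)
print(fields["read"])
```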
******************************** EXTENDED EXAMPLE *****************************
Suppose one wants to trace open, read, and close system calls with the
following parameters:
<common fields> = time_called,time_returned,time_recorded,executing_pid,executing_tid,executing_uid,return_value
<open specific fields> = given_pathname,full_pathname,flag_read_only,flag_write_only,flag_read_and_write,flag_append,flag_create,flag_direct
<read specific fields> = descriptor,bytes_requested
<close specific fields> = descriptor
The field table should define at least all the fields above (but there may
be more):
$ cat example.table
Common time_called 1 int64
Common time_returned 1 int64
Common time_recorded 1 int64
Common executing_pid 1 int32
Common executing_tid 1 int32
Common executing_uid 1 int32
Common return_value 1 int64
open given_pathname 0 variable32
open full_pathname 1 variable32
open flag_read_only 0 bool
open flag_write_only 0 bool
open flag_read_and_write 0 bool
open flag_append 0 bool
open flag_create 0 bool
open flag_direct 0 bool
read descriptor 0 int32
read bytes_requested 0 int64
close descriptor 0 int32
The format for each line in the input CSV file should look like:
<system call name>,<common fields>,<fields specific to each system call>
Here's an example CSV file:
$ cat traces/example.csv
open,1313191933879624,1313191933879634,1313191933879634,11182,11205,0,11,/proc/11205/stat,,1,0,0,0,0,0
read,1313191933878698,1313191933879958,1313191933879958,11184,11202,0,245,11,1024
close,1313191933879977,1313191933879992,1313191933879992,11184,11202,0,0,11
From this we can see that:
1. At timestamp 1313191933879624, process 11182 with thread ID 11205 opens
the file "/proc/11205/stat" with the O_RDONLY flag. The operation completes
successfully at timestamp 1313191933879634, returning file descriptor 11.
Note that null fields (in this case, full_pathname) are represented by an
empty string.
2. At timestamp 1313191933878698, process 11184 with thread ID 11202
requests 1024 bytes from file descriptor 11; the call returns 245.
3. At timestamp 1313191933879977, the same process and thread closes the
file descriptor 11.
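The reading of the three records above can be reproduced mechanically by
pairing each CSV value with its field name. The Python sketch below does
this with the field lists given at the start of this section; the names
COMMON, SPECIFIC, and decode are hypothetical, and the code illustrates the
mapping only, not how csv2ds is implemented.

```python
import csv

COMMON = ["time_called", "time_returned", "time_recorded",
          "executing_pid", "executing_tid", "executing_uid", "return_value"]
SPECIFIC = {
    "open": ["given_pathname", "full_pathname", "flag_read_only",
             "flag_write_only", "flag_read_and_write", "flag_append",
             "flag_create", "flag_direct"],
    "read": ["descriptor", "bytes_requested"],
    "close": ["descriptor"],
}


def decode(line):
    """Split one CSV record and label its values with field names."""
    row = next(csv.reader([line]))
    extent, values = row[0], row[1:]
    names = COMMON + SPECIFIC[extent]
    if len(values) != len(names):
        raise ValueError(f"{extent}: expected {len(names)} values, got {len(values)}")
    return extent, dict(zip(names, values))


open_line = ("open,1313191933879624,1313191933879634,1313191933879634,"
             "11182,11205,0,11,/proc/11205/stat,,1,0,0,0,0,0")
read_line = ("read,1313191933878698,1313191933879958,1313191933879958,"
             "11184,11202,0,245,11,1024")

open_extent, open_fields = decode(open_line)
read_extent, read_fields = decode(read_line)
print(open_extent, open_fields["given_pathname"], open_fields["return_value"])
print(read_extent, read_fields["descriptor"], read_fields["bytes_requested"])
```

All values stay as strings in this sketch; converting them to the types
named in the field table is left out to keep the example short.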
The specification string should look like:
$ cat specstrings/example.spec
Common(time_called,time_returned,time_recorded,executing_pid,executing_tid,executing_uid,return_value);open(given_pathname,full_pathname,flag_read_only,flag_write_only,flag_read_and_write,flag_append,flag_create,flag_direct);read(descriptor,bytes_requested);close(descriptor);
Now, we run
$ ./csv2ds out.ds tables/example.table specstrings/example.spec xml/ traces/example.csv
to perform the conversion.