\input texinfo @c -*-texinfo-*-
@documentencoding ISO-8859-2
@c @documentlanguage pl

@c %**start of header
@setfilename utt.info
@settitle UAM Text Tools v0.90
@c %**end of header

@copying
This manual is for UAM Text Tools (version 0.90, November 2007).

Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled
``GNU Free Documentation License''.

@c @quotation
@c Permission is granted to ...
@c No permission is granted until the document is completed.
@c @end quotation
@end copying


@titlepage
@title UAM Text Tools 0.90 - User Manual
@subtitle edition 0.01, @today{}
@subtitle status: prescript
@author by Justyna Walkowska, Tomasz Obr@ogonek{e}bski and Micha@l{} Stolarski
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@c @paragraphindent none

@iftex
@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
@end iftex

@c @headings off
@c @everyheading LEM(1) @| @| LEM(1)
@everyfooting @today{} @c @| @thispage @|

@ifnottex

@node Top
@top UTT - UAM Text Tools

@insertcopying

@menu
* General information::
* UTT file format::
* Configuration files::
* UTT components::
* Auxiliary tools::
* Usage examples::
* PMDBF dictionary::
@c * Examples::
@c * Copyright::
* GNU Free Documentation License::
* Reporting bugs::
* Author::
@end menu
@end ifnottex


@c ----------------------------------------------------------------------

@node General information
@chapter General information

UAM Text Tools (UTT) is a package of language processing tools
developed at Adam Mickiewicz University. Its functionality includes:

@itemize @bullet

@item
tokenization
@item
dictionary-based morphological analysis
@item
heuristic morphological analysis of unknown words
@item
spelling correction
@item
pattern search
@item
sentence splitting
@item
generation of concordance tables
@end itemize

The toolkit is intended for processing raw (unannotated), unrestricted
text for any conceivable purpose.

The system is organized as a collection of command-line programs, each
performing one operation, e.g. tokenization, lemmatization, or spelling
correction. The components are independent of one another; the
unifying element is the common input/output file format.

The components may be combined in various ways to provide various text
processing services. New components supplied by the user can also be
easily incorporated into the system, provided that they respect the
input/output file format conventions.

UTT component programs do not depend on any specific tagset or
morphological description format.

UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The Polex/PMDBF dictionary is licensed under the Creative Commons BY-NC-SA license, which prohibits commercial use.


List of contributors:

@itemize
@item Pawel Konieczka
@item Tomasz Obrebski
@item Michal Stolarski
@item Marcin Walas
@item Justyna Walkowska
@end itemize

@c ----------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node UTT file format
@chapter UTT file format

A UTT file contains an annotation of a text. It consists of a sequence of
segments. Each segment explicitly refers to a contiguous piece of the
text and provides information about it.

@section Segment format

A segment occupies one line of a UTT file and consists of
space-separated fields:


@quotation
@sp 1
[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
@sp 1
@end quotation

@table @var

@item @var{start}
Non-negative integer value indicating the position in the source text where the
segment starts.

@item @var{length}
Non-negative integer value indicating the length of the segment.

@item @var{type}
A sequence of characters containing no spaces or digits (digits could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
@var{type} reflects the main classification of segments into words, numbers, punctuation marks, meta-text markers, etc.
@xref{tok output}, for a description of the type markers recognized automatically by @command{tok}.

@item @var{form}
This field contains the textual form of the segment or the special
symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and has length 0).

The characters or character sequences that have special meaning in the
@var{form} field are enumerated below.

Characters with special meaning:

@itemize
@item @code{_} - space character
@item @code{*} - undefined contents
@end itemize

Escape sequences:

@itemize
@item @code{\n} - new line
@item @code{\t} - tab character
@item @code{\r} - carriage return

@item @code{\_} - the @code{_} character
@item @code{\*} - the @code{*} character
@item @code{\\} - the @code{\} character

@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
@end itemize

@item @var{annotation1}
@itemx @var{annotation2}
@itemx ...
Annotation fields have the following format:

@var{longname} @code{:} @var{value}

or

@var{shortname} @var{value}

where @var{longname} is a string of alphanumeric characters
(as tested by @code{isalnum()}), @var{shortname} is a single punctuation character
(as tested by @code{ispunct()}), and @var{value} is an arbitrary string of non-blank characters.

@end table
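
The conventions for the @var{form} and annotation fields can be illustrated with a short Python sketch. It is not part of UTT. The function names @code{decode_form} and @code{parse_annotation} are hypothetical. Python's @code{str.isalnum()} is Unicode-aware and only approximates the C @code{isalnum()} test. As an assumption, an unescaped @code{*} is treated as undefined contents only when it forms the whole field.

@example
import string

# Escapes not listed here (\_, \*, \\) stand for the escaped character.
ESCAPES = dict(n="\n", t="\t", r="\r")


def decode_form(field):
    """Return the text a form field stands for, or None for '*'."""
    if field == "*":
        return None  # undefined contents (assumed: whole field only)
    out = []
    i = 0
    while i < len(field):
        c = field[i]
        if c == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
            continue
        out.append(" " if c == "_" else c)  # '_' stands for a space
        i += 1
    return "".join(out)


def parse_annotation(field):
    """Split an annotation field into a (name, value) pair."""
    if field[0] in string.punctuation:
        # short form: one punctuation character followed by the value
        return field[0], field[1:]
    # long form: an alphanumeric name, a colon, and the value
    name, sep, value = field.partition(":")
    if not sep or not name.isalnum():
        raise ValueError("malformed annotation field: " + field)
    return name, value
@end example

For instance, @code{parse_annotation("lem:program,N")} returns the pair @code{("lem", "program,N")}.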


Only two fields are mandatory: @var{type} and @var{form}. All other fields
may be absent. If only one number precedes the
@var{type} field, it is interpreted as the @var{start} position.

If the @var{length} field is omitted, the length of the segment is the
length of the text represented by the @var{form} field (an escape sequence
such as @code{\n} counts as one character), except when the value of the
@var{form} field is @code{*}; in this case, the length is assumed to
be 0.

If the @var{start} field is also absent, the segment is assumed to directly
follow the preceding one, i.e. to start at the position where the preceding segment ends.
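
The rules for absent @var{start} and @var{length} fields can be sketched in Python as follows. This is an illustration only, not UTT code. The names @code{form_length}, @code{parse_segment} and @code{read_utt} are hypothetical. A file whose first segment has no @var{start} field is assumed to start at position 0.

@example
import re


def form_length(form):
    """Number of text characters a form field stands for."""
    if form == "*":
        return 0  # undefined form: length 0 is assumed
    # an escape pair such as \n or \_ counts as a single character
    return len(re.findall(r"\\.|.", form))


def parse_segment(line, prev_end):
    """Parse one UTT line into (start, length, type, form, annotations).

    prev_end is the position where the preceding segment ends; it is
    used when the start field is absent.
    """
    fields = line.split()
    numbers = []
    # at most two leading numbers: start, then length
    while len(numbers) < 2 and fields and fields[0].isdigit():
        numbers.append(int(fields.pop(0)))
    seg_type, form, annotations = fields[0], fields[1], fields[2:]
    start = numbers[0] if numbers else prev_end
    length = numbers[1] if len(numbers) == 2 else form_length(form)
    return start, length, seg_type, form, annotations


def read_utt(lines):
    """Yield parsed segments, filling in absent start/length fields."""
    prev_end = 0  # assumption: an unpositioned first segment starts at 0
    for line in lines:
        if line.strip():
            seg = parse_segment(line, prev_end)
            prev_end = seg[0] + seg[1]
            yield seg
@end example

Applied to the lines @samp{0000 04 N *}, @samp{0000 N 12} and @samp{P .}, @code{read_utt} yields segments starting at 0, 0 and 2, with lengths 4, 2 and 1.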

@c Conventions:

@c Annotation fields with predefined meaning:

@c @itemize
@c @item @code{!} - UTT components are allowed to modify the contents of
@c the @var{form} field (e.g. spelling correction does this). If this happens the
@c original form of the segment have to be placed in the @code{!}-field.
@c @item @code{@@} - morphological description
@c @item @code{=} - node identifier assignment (used in graph encoding)
@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
@c @end itemize

Segments of length 0 may be used to mark file positions with some
information. See, e.g., the BOS and EOS (beginning/end of sentence) markers
in the example below.

Example:

text: @samp{Piszemy dobre progrumy. Warszawiacy też.}

@example
0000 00 BOS *
0000 07 W Piszemy lem:pisać,V
0007 01 S _
0008 05 W dobre lem:dobry,ADJ
0013 01 S _
0014 08 W progrumy cor:programy lem:program,N
0022 01 P .
0023 00 EOS *
0023 01 S _
0024 00 BOS *
0024 11 W Warszawiacy lem:Warszawiak,N
0035 01 S _
0036 03 W też
0039 01 P .
0040 00 EOS *

@end example

The @var{length} fields may be omitted (shown here for the first sentence only):

@example
0000 BOS *
0000 W Piszemy lem:pisać,V
0007 S _
0008 W dobre lem:dobry,ADJ
0013 S _
0014 W progrumy cor:programy lem:program,N
0022 P .
0023 EOS *
@end example

Position information may be provided for some segments only:

@example
0000 BOS *
W Piszemy lem:pisać,V
S _
W dobre lem:dobry,ADJ
S _
W progrumy cor:programy lem:program,N
P .
EOS *
S _
0024 BOS *
W Warszawiacy lem:Warszawiak,N
S _
W też
P .
EOS *
@end example

Position/length information may be provided only where it cannot be
inferred, as in this annotation of the text @samp{12.5 km}:

@example
0000 04 N *
0000 N 12
P .
N 5
S _
W km
@end example
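
Conversely, a writer can omit the position and length fields wherever a reader can infer them. The following Python sketch (hypothetical names, not part of UTT) writes @var{start} only for the first segment and where a segment does not directly follow its predecessor. It writes @var{length} only where it differs from the length of the @var{form} field. Because @var{length} may appear only after @var{start}, writing the length also forces the start to be written.

@example
import re


def form_length(form):
    """Number of text characters a form field stands for."""
    if form == "*":
        return 0
    return len(re.findall(r"\\.|.", form))  # \n, \_ etc. count as one


def compact(segments):
    """Yield UTT lines, omitting start/length wherever they can be inferred.

    segments: iterable of (start, length, type, form) tuples.
    """
    prev_end = None  # no predecessor: the first start is always written
    for start, length, seg_type, form in segments:
        length_needed = length != form_length(form)
        start_needed = length_needed or start != prev_end
        if length_needed:
            yield "%04d %02d %s %s" % (start, length, seg_type, form)
        elif start_needed:
            yield "%04d %s %s" % (start, seg_type, form)
        else:
            yield "%s %s" % (seg_type, form)
        prev_end = start + length
@end example

Given the six segments of @samp{12.5 km} as (start, length, type, form) tuples, @code{compact} reproduces the six lines of the example above.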

@section UTT file

A UTT file consists of a sequence of segments. The same text position
may be covered by multiple segments. Consequently, ambiguous text
segmentation and ambiguous annotation may be represented.

There are two structural requirements a valid UTT-formatted file
has to meet:

@itemize @bullet

@item
segments have to be sorted with respect to the @var{start} field;

@item
for each
segment ending at position @var{n}, either there must be a segment starting at
position @var{n+1}, or position @var{n+1} must not be covered by any segment; similarly,
for each segment starting at position @var{n}, either there must be a segment
ending at position @var{n-1}, or position @var{n-1} must not be covered
by any segment.

@end itemize

A valid annotation for the text fragment
@example
12.5 km
@end example

may be

@example
0000 02 N 12
0000 04 N 12.5
0002 01 P .
0003 01 N 5
0004 01 S _
0005 02 W km
@end example

but not

@example
0000 02 N 12
0000 04 N 12.5
0004 01 S _
0005 02 W km
@end example

because in the latter example the first segment (starting at position 0000, 2 characters long) ends at position @var{n}=0001. Position @var{n+1}=0002 is covered by the second segment (@samp{12.5}), yet no segment starts at position 0002.
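
Both requirements can be checked mechanically. The Python sketch below takes the (@var{start}, @var{length}) pairs of a file and reports violations; the function name @code{check_utt} is hypothetical. Zero-length segments such as BOS/EOS markers cover no position, so as an assumption this sketch leaves them out of the boundary check.

@example
def check_utt(segments):
    """Return a list of violations of the two structural requirements.

    segments: list of (start, length) pairs in file order.
    """
    problems = []
    starts = [start for start, _ in segments]
    if starts != sorted(starts):
        problems.append("segments are not sorted by start position")
    covered = set()  # positions covered by some segment
    first = set()    # first positions of non-empty segments
    last = set()     # last positions of non-empty segments
    for start, length in segments:
        covered.update(range(start, start + length))
        if length > 0:
            first.add(start)
            last.add(start + length - 1)
    for start, length in segments:
        if length == 0:
            continue  # markers of length 0 are not checked (assumption)
        n = start + length - 1  # last position of this segment
        if n + 1 in covered and n + 1 not in first:
            problems.append("no segment starts at %d" % (n + 1))
        if start - 1 in covered and start - 1 not in last:
            problems.append("no segment ends at %d" % (start - 1))
    return problems
@end example

For the valid annotation above, @code{check_utt([(0, 2), (0, 4), (2, 1), (3, 1), (4, 1), (5, 2)])} returns an empty list. For the invalid one, @code{check_utt([(0, 2), (0, 4), (4, 1), (5, 2)])} returns @code{["no segment starts at 2"]}.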

@section Character encoding

The UTT component programs accept only 1-byte character encodings, such
as the ISO, ANSI (Windows), and DOS code pages. UTF-8, a multi-byte
encoding, may also work, but this has not been tested yet.
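
In a 1-byte encoding every character occupies exactly one byte, so counting characters and counting bytes give the same positions and lengths. In UTF-8 this no longer holds. The following Python check uses the word @samp{też}, which has length 3 in the example above.

@example
# the word "tez" with a dotted z (U+017C), as in the example above
word = "te\u017c"

print(len(word))                       # number of characters
print(len(word.encode("iso-8859-2")))  # bytes in ISO-8859-2
print(len(word.encode("utf-8")))       # bytes in UTF-8
@end example

The three lines printed are 3, 3 and 4: ISO-8859-2 stores the dotted z in one byte, while UTF-8 needs two.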


@c @section Formats

@c @unnumberedsubsubsec Basic format

@c While processing large amounts of the overhead related with explicit
@c ... of the start position and segment length becomes ... . Therefore,
@c for efficiency reasons certain shortcuts are possible:

@c @unnumberedsubsubsec Relative start position

@c Start position may be given as relative distance from the last
@c absolut position.

@c @unnumberedsubsubsec Absent length

@c Segment length may by omitted. Normally it can be restored by counting
@c the length of the @emph{form field}. For segments with the special value
@c @code{*} in the @emph{form field} length 0 is assumed.

@c @unnumberedsubsubsec Absent length and start position

@c Both start position and segment length may be omitted. In this format
@c each segment is assumed to follow the previous one. This format is,
@c therefore, suitable only for unambiguously tagged text
@c (0-length markers can be still used.)


@c @table @code
@c @item AL
@c @code{1234 03 W kot}
@c @item RL
@c @code{+56 03 W kot}
@c @item A
@c @code{1234 W kot}
@c @item R
@c @code{+56 W kot}
@c @item 0
@c @code{W kot}
@c @end table


@c [HOW TO GET POLISH FONTS IN DVI???]

@macro parhelp
@item @b{@minus{}@minus{}help}, @b{@minus{}h}
Print help.
@end macro


@macro parversion
@item @b{@minus{}@minus{}version}, @b{@minus{}V}
Print version information.
@end macro

@macro parinteractive
@item @b{@minus{}@minus{}interactive, @minus{}i}
This option toggles interactive mode, which is off by default. In
interactive mode the program does not buffer its output.
@end macro


@c @macro parfile
@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
@c Input file name.
@c If this option is absent or equal to '@minus{}', the program
@c reads from the standard input.
@c @end macro


@c @macro paroutput
@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
@c Regular output file name. To regular output the program sends segments
@c which it successfully processed and copies those which were not
@c subject to processing. If this option is absent or equal to
@c '@minus{}', standard output is used.
@c @end macro

@c @macro parfail
@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
@c Fail output file name. To fail output the program copies the segments
@c it failed to process. If this option is absent or equal to
@c '@minus{}', standard output is used.
@c @end macro


@c @macro parcopy
@c @item @b{@minus{}@minus{}copy, @minus{}c}
@c Copy succesfully processed segments to regular output also in their
@c original input form.
@c @end macro


@macro parinputfield
@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
The field containing the input to the program. The default is the
@var{form} field. The fields @var{start}, @var{length}, @var{type},
and @var{form} are referred to as @code{1}, @code{2}, @code{3},
@code{4}, respectively.
@end macro


@macro paroutputfield
@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
The name of the field added by the program. The default is the name of the program.
@end macro


@macro pardictionary
@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
Dictionary file name.
@end macro


@macro parprocess
@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
Process segments with the specified value in the @var{type} field.
Multiple occurrences of this option are allowed and are interpreted as
disjunction. If this option is absent, all segments are processed.
@end macro


@macro parselect
@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
Select for processing only segments in which the field named
@var{fieldname} is present. Multiple occurrences of this option are
allowed and are interpreted as conjunction of conditions. If this
option is absent, all segments are processed.
@end macro


@macro parunselect
@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
Select for processing only segments in which the field @var{fieldname}
is absent. Multiple occurrences of this option are allowed and are
interpreted as conjunction of conditions. If this option is absent,
all segments are processed.
@end macro


@macro paroneline
@item @b{@minus{}@minus{}one-line}
This option makes the program print ambiguous annotation in one output
line by generating multiple annotation fields. By default, when
ambiguous annotation may be produced for a segment, the segment is
replicated and each of the annotations is added to a separate copy of
the segment.
@end macro


@macro paronefield
@item @b{@minus{}@minus{}one-field, @minus{}1}
This option makes the program print ambiguous annotation in one
annotation field. By default, when ambiguous annotation may be produced
for a segment, the segment is replicated and each of the
annotations is added to a separate copy of the segment.

This option is useful when working with @command{kot} or @command{con}.
@end macro


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c @node Common command line options
@c @chapter Common command line options

@c @table @code

@c @parhelp

@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
@c Print help.

@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
@c Print version information.

@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
@c Input file name.
@c If this option is absent or equal to '@minus{}', the program
@c reads from the standard input.

@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
@c Regular output file name. To regular output the program sends segments
@c which it successfully processed and copies those which were not
@c subject to processing. If this option is absent or equal to
@c '@minus{}', standard output is used.

@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
@c Fail output file name. To fail output the program copies the segments
@c it failed to process. If this option is absent or equal to
@c '@minus{}', standard output is used.

@c @item @b{@minus{}@minus{}only-fail}
@c Discard segments which would normally be sent to regular
@c output. Print only segments the program failed to process.

@c @item @b{@minus{}@minus{}no-fail}
@c Discard segments the program failed to process.
@c (This and the previous option are functionally equivalent to,
@c respectively, @option{-o /dev/null} and @option{-e /dev/null}, but
@c make the programs run faster.)

@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
@c The field containing the input to the program. The default is usually
@c the @var{form} field (unless otherwise stated in the program
@c description). The fields @var{position}, @var{length}, @var{tag}, and
@c @var{form} are referred to as @code{1}, @code{2}, @code{3}, @code{4},
@c respectively.

@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
@c The name of the field added by the program. The default is the name of
@c the program.

@c @c @item @b{@minus{}@minus{}copy, @minus{}c}
@c @c Copy processed segments to regular output.

@c @item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
@c Dictionary file name.
@c (This option is used by programs which use dictionary data.)

@c @item @b{@minus{}@minus{}process=@var{tag}, @minus{}p @var{tag}}
@c Process segments with the specified value in the @var{tag} field.
@c Multiple occurences of this option are allowed and are interpreted as
@c disjunction. If this option is absent, all segments are processed.

@c @item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
@c Select for processing only segments in which the field named
@c @var{fieldname} is present. Multiple occurences of this option are
@c allowed and are interpreted as conjunction of conditions. If this
@c option is absent, all segments are processed.

@c @item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
@c Select for processing only segments in which the field @var{fieldname}
@c is absent. Multiple occurences of this option are allowed and are
@c interpreted as conjunction of conditions. If this option is absent,
@c all segments are processed.

@c @item @b{@minus{}@minus{}interactive @minus{}i}
@c This option toggles interactive mode, which is by default off. In the
@c interactive mode the program does not buffer the output.

@c @item @b{@minus{}@minus{}config=@var{filename}}
@c Read configuration from file @file{@var{filename}}.

@c @item @b{@minus{}@minus{}one @minus{}1}
@c This option makes the program print ambiguous annotation in one output
@c segment. By default when
@c ambiguous new annotation is being produced for a segment, the segment
@c is multiplicated and each of the annotations is added to separate copy
@c of the segment.

@c @end table

@c ---------------------------------------------------------------------
@c CONFIGURATION FILES
@c ---------------------------------------------------------------------

@node Configuration files
@chapter Configuration files

Values for all command line options accepted by a component
may be set in configuration files. The default locations of the
configuration files for a component named @command{@var{program}} are

@example
@file{/usr/local/etc/utt/@var{program}.conf}
@end example

for the system-wide configuration file and

@example
@file{~/.utt/@var{program}.conf}
@end example

for the user configuration file.

@c The configuration file to load may be also specified with the
@c @option{--config} option. Configuration file need not be provided.

For each option, the value is set according to the following priority
(highest first):

@itemize
@item command line
@c @item configuration file indicated with @option{--config} option
@item user configuration file (or configuration file indicated with the @option{--config} option)
@item system-wide configuration file
@end itemize

Parameter values are specified in the following format:

@var{parametername}=@var{value}

where @var{parametername} is the short or long name of an option accepted by
the program, or

@var{parametername}

if the option takes no argument.

You can add comments to configuration files using the @samp{#} sign.

If a program accepts multiple occurrences of an option (e.g. the @option{select} option of @command{lem}), you can specify them on separate lines of the program's configuration file.

@c The equal sign may be omitted.
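
The rules above can be illustrated with a short Python sketch. It is not the code used by UTT, and the function names are hypothetical. The sketch makes several simplifying assumptions:

@itemize
@item a @samp{#} starts a comment anywhere on a line;
@item short and long option names are not unified;
@item options that may occur several times (such as @option{select}) get no special handling, so only their last value survives.
@end itemize

@example
import os


def config_paths(program):
    """Default system-wide and user configuration files of a component."""
    return ("/usr/local/etc/utt/" + program + ".conf",
            os.path.expanduser("~/.utt/" + program + ".conf"))


def parse_config(text):
    """Turn configuration file text into a list of (name, value) pairs.

    Options given without '=' get the value None.
    """
    options = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the comment, if any
        if not line:
            continue
        name, sep, value = line.partition("=")
        options.append((name.strip(), value.strip() if sep else None))
    return options


def effective_options(system, user, command_line):
    """Merge option lists; for each option the highest-priority source wins."""
    result = dict()
    # lowest priority first, so that later sources override earlier ones
    for source in (system, user, command_line):
        for name, value in source:
            result[name] = value
    return result
@end example

Take, for example, a system-wide file containing @samp{dictionary=sys.bin} and a user file containing @samp{dictionary=user.bin} (both file names are hypothetical). The merged value of @option{dictionary} is then @samp{user.bin}, unless the option is also given on the command line.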


@quotation Tip
If you have two (or more) frequently used sets of options for the same
program (e.g. @command{lem} with the PMDBF dictionary and @command{lem} with a user dictionary),
a good solution is to create two symbolic links to @command{lem}, called,
e.g., @command{lemg} and @command{lemu}, and specify their configuration in the files @file{lemg.conf}
and @file{lemu.conf}, respectively.
@end quotation

@c ---------------------------------------------------------------------
@c COMPONENTS
@c ---------------------------------------------------------------------

@node UTT components
@chapter UTT components

UTT components are of three types:

@menu
Sources: programs which read non-UTT data (e.g. raw text) and produce output
in UTT format
* tok:: a tokenizer

Filters: programs which read and produce UTT-formatted data
@c * sen - the sentencizer::
* lem:: a morphological analyzer
* gue:: a morphological guesser
* cor:: a spelling corrector
* sen:: a sentencizer
@c * gph - the graphizer::
* ser:: a pattern search tool (marks matches)
* grp:: a pattern search tool (selects sentences containing a match)

Sinks: programs which read UTT data and produce output in another format
* kot:: an untokenizer
* con:: a concordance table generator
@end menu

@c ---------------------------------------------------------------------
@c TOK
@c ---------------------------------------------------------------------

@page
@node tok
@section tok - a tokenizer

@c ----------------------------------------

@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
@item @strong{Component category:} @tab source
@end multitable


@menu
* tok description::
* tok input::
* tok output::
* tok command line options::
* tok example::
@end menu

@node tok description
@subsection Description

@code{tok} is a simple program which reads a text file and identifies
tokens on the basis of their orthographic form. The type of each token
is printed in the @var{type} field.

@node tok input
@subsection Input

Raw text.

@node tok output
@subsection Output

A UTT file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field, five types of tokens are distinguished:

@itemize

@item @code{W}
(word)
- contiguous sequence of letters

@item @code{N}
(number)
- contiguous sequence of digits

@item @code{S}
(space)
- contiguous sequence of white-space characters (including line breaks)

@item @code{P}
(punctuation mark)
- single printable character not belonging to any of the other classes

@item @code{B}
(unprintable character)
- single unprintable character

@end itemize
774 | |
---|
775 | |
---|
776 | |
---|
777 | @node tok command line options |
---|
778 | @subsection Command line options |
---|
779 | |
---|
780 | @table @code |
---|
781 | |
---|
782 | @item @b{@minus{}@minus{}help}, @b{@minus{}h} |
---|
783 | Print help. |
---|
784 | |
---|
785 | @item @b{@minus{}@minus{}version}, @b{@minus{}V} |
---|
786 | Print version information. |
---|
787 | |
---|
788 | @item @b{@minus{}@minus{}interactive}, @b{@minus{}i} |
---|
789 | Toggle interactive mode (off by default). In interactive mode the |
---|
790 | program does not buffer its output. |
---|
791 | |
---|
792 | @end table |
---|
793 | |
---|
794 | @node tok example |
---|
795 | @subsection Example |
---|
796 | |
---|
797 | Input: |
---|
798 | |
---|
799 | @example |
---|
800 | Piszemy dobre programy. |
---|
801 | @end example |
---|
802 | |
---|
803 | Output: |
---|
804 | |
---|
805 | @example |
---|
806 | 0000 07 W Piszemy |
---|
807 | 0007 01 S _ |
---|
808 | 0008 05 W dobre |
---|
809 | 0013 01 S _ |
---|
810 | 0014 08 W programy |
---|
811 | 0022 01 P . |
---|
812 | 0023 01 S \n |
---|
813 | @end example |
---|
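The classification above can be illustrated with a minimal Python sketch. It is not the source of @command{tok`. Note the following assumptions. The function names are hypothetical. Python's @code{str.isalpha}, @code{str.isdigit}, @code{str.isspace} and @code{str.isprintable} stand in for @command{tok}'s own character classes. Offsets are counted in characters. Rendering spaces as @samp{_} and newlines as @samp{\n} is inferred from the example above.

```python
def char_class(ch):
    """Return the (assumed) tok token class of a single character."""
    if ch.isalpha():
        return "W"  # word: letters
    if ch.isdigit():
        return "N"  # number: digits
    if ch.isspace():
        return "S"  # space: blanks, tabs, newlines
    if ch.isprintable():
        return "P"  # punctuation: any other printable character
    return "B"      # unprintable character


def render(form):
    """Render a form as in the example: space -> _, newline -> \\n, tab -> \\t."""
    return form.replace(" ", "_").replace("\n", "\\n").replace("\t", "\\t")


def tokenize(text):
    """Split text into UTT rows: start, length, type, form."""
    rows = []
    i = 0
    while i < len(text):
        cls = char_class(text[i])
        j = i + 1
        if cls in ("W", "N", "S"):
            # letters, digits and spaces form maximal runs of the same class
            while j < len(text) and char_class(text[j]) == cls:
                j += 1
        # P and B tokens always consist of a single character
        rows.append("%04d %02d %s %s" % (i, j - i, cls, render(text[i:j])))
        i = j
    return rows


for row in tokenize("Piszemy dobre programy.\n"):
    print(row)
```

Under these assumptions the script prints the seven rows of the example output.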
814 | |
---|
815 | |
---|
816 | @c --------------------------------------------------------------------- |
---|
817 | @c SEN |
---|
818 | @c --------------------------------------------------------------------- |
---|
819 | |
---|
820 | @c @node sen - sentencizer |
---|
821 | @c @chapter sen - sentencizer |
---|
822 | |
---|
823 | @c Authors: Tomasz Obrębski |
---|
824 | |
---|
825 | @c --------------------------------------------------------------------- |
---|
826 | @c LEM |
---|
827 | @c --------------------------------------------------------------------- |
---|
828 | |
---|
829 | @page |
---|
830 | @node lem |
---|
831 | @section lem - morphological analyzer |
---|
832 | |
---|
833 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
834 | @item @strong{Authors:} @tab Tomasz Obrêbski, Micha³ Stolarski |
---|
835 | @item @strong{Component category:} @tab filter |
---|
836 | @end multitable |
---|
837 | |
---|
838 | @menu |
---|
839 | * lem description:: |
---|
840 | * lem command line options:: |
---|
841 | * lem input:: |
---|
842 | * lem output:: |
---|
843 | * lem example:: |
---|
844 | * lem dictionaries:: |
---|
845 | * lem hints:: |
---|
846 | @end menu |
---|
847 | |
---|
848 | @node lem description |
---|
849 | @subsection Description |
---|
850 | |
---|
851 | @command{lem} performs morphological analysis of simple orthographic |
---|
852 | words: for each word form it returns all its possible morphological |
---|
853 | annotations, disregarding the context. |
---|
854 | |
---|
855 | @c ---------------------------------------- |
---|
856 | |
---|
857 | @node lem command line options |
---|
858 | @subsection Command line options |
---|
859 | |
---|
860 | @table @code |
---|
861 | @parhelp |
---|
862 | @parversion |
---|
863 | @parinteractive |
---|
864 | @c @parfile |
---|
865 | @c @paroutput |
---|
866 | @c @parfail |
---|
867 | @c @parcopy |
---|
868 | @parinputfield |
---|
869 | @paroutputfield |
---|
870 | @pardictionary |
---|
871 | @parprocess |
---|
872 | @parselect |
---|
873 | @parunselect |
---|
874 | @paroneline |
---|
875 | @paronefield |
---|
876 | @end table |
---|
877 | |
---|
878 | @c ---------------------------------------- |
---|
879 | |
---|
880 | @node lem input |
---|
881 | @subsection Input |
---|
882 | |
---|
883 | @command{lem} reads a UTT file and processes the value of the @var{form} field |
---|
884 | (a different input field may be selected with the @option{--input-field} option). |
---|
885 | |
---|
886 | @node lem output |
---|
887 | @subsection Output |
---|
888 | |
---|
889 | @command{lem} adds a new annotation field, whose default name is @code{lem}. In |
---|
890 | case of ambiguity, one of three things happens. By default, the segment is duplicated, once for each annotation. |
---|
891 | With @option{--one-line}, multiple @code{lem} fields are added to a single segment. |
---|
892 | With @option{--one-field} (@option{-1}), an ambiguous annotation is produced as the value |
---|
893 | of a single @code{lem} field: |
---|
894 | |
---|
895 | @itemize @bullet |
---|
896 | |
---|
897 | @item |
---|
898 | unambiguous value format: |
---|
899 | |
---|
900 | @example |
---|
901 | <lemma>,<descr> |
---|
902 | @end example |
---|
903 | |
---|
904 | @item |
---|
905 | ambiguous value format (@option{--one-field} option): |
---|
906 | |
---|
907 | |
---|
908 | @example |
---|
909 | <lemma>,<descr>[,<descr>...][;<lemma>,<descr>[,<descr>...]...] |
---|
910 | @end example |
---|
911 | |
---|
912 | (Alternative descriptions for the same lemma are separated by commas; |
---|
913 | alternative lemmata are separated by semicolons.) |
---|
914 | |
---|
915 | @end itemize |
---|
916 | |
---|
917 | @node lem example |
---|
918 | @subsection Example |
---|
919 | |
---|
920 | Input: |
---|
921 | |
---|
922 | @example |
---|
923 | 0000 07 W Piszemy |
---|
924 | 0007 01 S _ |
---|
925 | 0008 05 W dobre |
---|
926 | 0013 01 S _ |
---|
927 | 0014 08 W programy |
---|
928 | 0022 01 P . |
---|
929 | 0023 01 S \n |
---|
930 | @end example |
---|
931 | |
---|
932 | Output (default): |
---|
933 | |
---|
934 | @example |
---|
935 | 0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1 |
---|
936 | 0007 01 S _ |
---|
937 | 0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn |
---|
938 | 0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn |
---|
939 | 0013 01 S _ |
---|
940 | 0014 08 W programy lem:program,N/GiNpCa |
---|
941 | 0014 08 W programy lem:program,N/GiNpCn |
---|
942 | 0014 08 W programy lem:program,N/GiNpCv |
---|
943 | 0022 01 P . |
---|
944 | 0023 01 S \n |
---|
945 | @end example |
---|
946 | |
---|
947 | Output (@option{--one-line} option): |
---|
948 | |
---|
949 | @example |
---|
950 | 0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1 |
---|
951 | 0007 01 S _ |
---|
952 | 0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn |
---|
953 | 0013 01 S _ |
---|
954 | 0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv |
---|
955 | 0022 01 P . |
---|
956 | 0023 01 S \n |
---|
957 | @end example |
---|
958 | |
---|
959 | Output (@option{--one-field} option): |
---|
960 | |
---|
961 | @example |
---|
962 | 0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1 |
---|
963 | 0007 01 S _ |
---|
964 | 0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn |
---|
965 | 0013 01 S _ |
---|
966 | 0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv |
---|
967 | 0022 01 P . |
---|
968 | 0023 01 S \n |
---|
969 | @end example |
---|
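The three output layouts can be illustrated with a short Python sketch. It only reformats a list of (lemma, description) pairs that is assumed to be already known for a segment. The dictionary lookup itself is not shown, and the function names are hypothetical.

```python
def default_rows(row, analyses, field="lem"):
    """Default mode: one copy of the segment per analysis."""
    if not analyses:
        return [row]  # unannotated segments pass through unchanged
    return [row + " " + field + ":" + lemma + "," + descr
            for lemma, descr in analyses]


def one_line_rows(row, analyses, field="lem"):
    """--one-line: a single segment carrying several annotation fields."""
    fields = "".join(" " + field + ":" + lemma + "," + descr
                     for lemma, descr in analyses)
    return [row + fields]


def one_field_rows(row, analyses, field="lem"):
    """--one-field: descriptions grouped by lemma in a single field."""
    if not analyses:
        return [row]
    groups = dict()  # lemma -> list of descriptions, in first-seen order
    for lemma, descr in analyses:
        groups.setdefault(lemma, []).append(descr)
    value = ";".join(lemma + "," + ",".join(descrs)
                     for lemma, descrs in groups.items())
    return [row + " " + field + ":" + value]


analyses = [("program", "N/GiNpCa"), ("program", "N/GiNpCn"),
            ("program", "N/GiNpCv")]
for mode in (default_rows, one_line_rows, one_field_rows):
    print("\n".join(mode("0014 08 W programy", analyses)))
```

The three calls print, in turn, the rows for @samp{programy} shown in the default, @option{--one-line} and @option{--one-field} examples above.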
970 | |
---|
971 | @c ---------------------------------------- |
---|
972 | |
---|
973 | @node lem dictionaries |
---|
974 | @subsection Dictionaries |
---|
975 | |
---|
976 | @command{lem} requires a dictionary. The dictionary may be provided in |
---|
977 | one of two formats: in text (source) format or in binary (fsa) format. |
---|
978 | |
---|
979 | @subsubheading Text format |
---|
980 | |
---|
981 | Dictionary entries have the following structure: |
---|
982 | |
---|
983 | @example |
---|
984 | <form>;<lemma>,<descr>[;<lemma>,<descr>] |
---|
985 | @end example |
---|
986 | |
---|
987 | @var{lemma} may be given explicitly or in the cut-add format: |
---|
988 | |
---|
989 | @example |
---|
990 | @code{[<cut1><add1>-]<cut2><add2>} |
---|
991 | @end example |
---|
992 | |
---|
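993 | meaning: replace the prefix of length @code{<cut1>} with the |
---|
994 | string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string |
---|
995 | @code{<add2>}. For example, @code{3t} transforms @samp{kocie} into |
---|
996 | @samp{kot} (the suffix @samp{cie} is replaced by @samp{t}), and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały} (the prefix @samp{naj} is removed and the suffix @samp{elsi} is replaced by @samp{ały}). |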
---|
997 | |
---|
998 | Each dictionary entry must be written on a single line and must not contain whitespace characters. |
---|
999 | |
---|
1000 | Examples: |
---|
1001 | @example |
---|
1002 | kot;0,N/GaNsCn |
---|
1003 | kota;1,N/GaNsCg;1,N/GaNsCa |
---|
1004 | kotu;1,N/GaNsCd |
---|
1005 | kotem;2,N/GaNsCi |
---|
1006 | kocie;3t,N/GaNsCl;3t,N/GaNsCv |
---|
1007 | najbielsi;3-4ały,ADJ/DsNpCnGp |
---|
1008 | najbielsze;3-5ały,ADJ/DsNpCnGaifn |
---|
1009 | najlepsi;dobry,ADJ/DsNpCnGp |
---|
1010 | najlepsze;dobry,ADJ/DsNpCnGaifn |
---|
1011 | @end example |
---|
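The cut-add decoding described above can be checked against the example entries with a short Python sketch. The function names are hypothetical. The sketch makes two assumptions. First, a lemma field starting with a digit is in cut-add format, while any other value is an explicit lemma. Second, the added strings never contain @samp{-}.

```python
import re


def split_cut_add(part):
    """Split e.g. "4ały" into the cut length 4 and the added string "ały"."""
    m = re.match(r"(\d+)(.*)$", part)
    return int(m.group(1)), m.group(2)


def apply_cut_add(form, code):
    """Derive a lemma from a word form and a [<cut1><add1>-]<cut2><add2> code."""
    if not code[:1].isdigit():
        return code  # explicit lemma, e.g. "dobry"
    if "-" in code:
        prefix_part, suffix_part = code.split("-", 1)
        cut1, add1 = split_cut_add(prefix_part)
        form = add1 + form[cut1:]  # replace the prefix of length cut1
    else:
        suffix_part = code
    cut2, add2 = split_cut_add(suffix_part)
    return form[:len(form) - cut2] + add2  # replace the suffix of length cut2


def parse_entry(line):
    """Turn a text dictionary entry into a list of (lemma, description) pairs."""
    form, *items = line.strip().split(";")
    pairs = []
    for item in items:
        code, descr = item.split(",", 1)
        pairs.append((apply_cut_add(form, code), descr))
    return pairs


print(parse_entry("kocie;3t,N/GaNsCl;3t,N/GaNsCv"))
print(parse_entry("najbielsi;3-4ały,ADJ/DsNpCnGp"))
```

The first call yields two analyses with the lemma @samp{kot}. The second yields one analysis with the lemma @samp{biały}.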
1012 | |
---|
1013 | |
---|
1014 | The mandatory file name extension for a text dictionary is @code{dic}. For large |
---|
1015 | dictionaries it is preferable, however, to compile them into binary |
---|
1016 | (fsa) format. |
---|
1017 | |
---|
1018 | @subsubheading Binary format |
---|
1019 | |
---|
1020 | The mandatory file name extension for a binary dictionary is @code{bin}. To |
---|
1021 | compile a text dictionary into binary format, write: |
---|
1022 | |
---|
1023 | @example |
---|
1024 | compiledic <dictionaryname>.dic |
---|
1025 | @end example |
---|
1026 | |
---|
1027 | @subsubheading Polex/PMDBF dictionary |
---|
1028 | |
---|
1029 | A large-coverage morphological dictionary of Polish, Polex/PMDBF, is included in |
---|
1030 | the distribution as the default dictionary of @command{lem}. By default it is |
---|
1031 | located in: |
---|
1032 | |
---|
1033 | @file{$HOME/.utt/pl/lem.bin} |
---|
1034 | |
---|
1035 | @node lem hints |
---|
1036 | @subsection Hints |
---|
1037 | |
---|
1038 | @c @subsubheading Combining data from multiple dictionaries |
---|
1039 | |
---|
1040 | @c @itemize |
---|
1041 | |
---|
1042 | @c @item Apply <dict1>, then apply <dict2> to words which were not annotated. |
---|
1043 | |
---|
1044 | @c @example |
---|
1045 | @c lem -d <dict1> | lem -S lem -d <dict2> |
---|
1046 | @c @end example |
---|
1047 | |
---|
1048 | @c @item Add annotations from two dictionaries <dict1> and <dict2>. |
---|
1049 | |
---|
1050 | @c @example |
---|
1051 | @c lem -c -d <dict1> | lem -S lem -d <dict2> |
---|
1052 | @c @end example |
---|
1053 | |
---|
1054 | @c @end itemize |
---|
1055 | |
---|
1056 | |
---|
1057 | @c --------------------------------------------------------------------- |
---|
1058 | @c GUE |
---|
1059 | @c --------------------------------------------------------------------- |
---|
1060 | |
---|
1061 | @page |
---|
1062 | @node gue |
---|
1063 | @section gue - morphological guesser |
---|
1064 | |
---|
1065 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1066 | |
---|
1067 | @item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski |
---|
1068 | @item @strong{Component category:} @tab filter |
---|
1069 | |
---|
1070 | @end multitable |
---|
1071 | |
---|
1072 | @command{gue} guesses morphological descriptions of the word form contained |
---|
1073 | in the @var{form} field. |
---|
1074 | |
---|
1075 | @menu |
---|
1076 | * gue command line options:: |
---|
1077 | * gue example:: |
---|
1078 | * gue dictionaries:: |
---|
1079 | @end menu |
---|
1080 | |
---|
1081 | @node gue command line options |
---|
1082 | @subsection Command line options |
---|
1083 | |
---|
1084 | @table @code |
---|
1085 | |
---|
1086 | @parhelp |
---|
1087 | @parversion |
---|
1088 | @parinteractive |
---|
1089 | @c @parfile |
---|
1090 | @c @paroutput |
---|
1091 | @c @parfail |
---|
1092 | @c @parcopy |
---|
1093 | @parinputfield |
---|
1094 | @paroutputfield |
---|
1095 | @pardictionary |
---|
1096 | @parprocess |
---|
1097 | @parselect |
---|
1098 | @parunselect |
---|
1099 | @paroneline |
---|
1100 | @paronefield |
---|
1101 | |
---|
1102 | @item @b{@minus{}@minus{}delta=@var{n}} |
---|
1103 | Stop displaying answers after a drop in weight, that is, when the weight difference between two consecutive results exceeds the delta value (default=`0.2'). |
---|
1104 | |
---|
1105 | |
---|
1106 | @item @b{@minus{}@minus{}cut-off=@var{n}} |
---|
1107 | Do not display answers whose weight is lower than the cut-off value (default=`200'). |
---|
1108 | |
---|
1109 | |
---|
1110 | @item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}} |
---|
1111 | Guess up to @var{n} descriptions (default=`0', which means that all results are displayed). |
---|
1112 | |
---|
1113 | |
---|
1114 | |
---|
1115 | @end table |
---|
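The interplay of @option{--cut-off}, @option{--delta} and @option{--guess_count} might be sketched as follows. This is an illustration built on assumptions, not the code of @command{gue}. The function name is hypothetical. The order in which the three checks are applied is also assumed. The manual does not say how @option{--delta} is measured on weights between 1 and 999, so the sketch treats it as a relative drop: 0.2 means a fall of more than 20% between consecutive answers.

```python
def select_guesses(guesses, delta=0.2, cut_off=200, guess_count=0):
    """Filter (description, weight) guesses as the options describe.

    Assumption: delta is a relative drop between consecutive weights.
    """
    ranked = sorted(guesses, key=lambda g: g[1], reverse=True)
    selected = []
    previous = None
    for descr, weight in ranked:
        if weight < cut_off:
            break  # sorted by weight, so all remaining guesses weigh less
        if previous is not None and (previous - weight) / previous > delta:
            break  # a sharp fall in weight: stop displaying answers
        selected.append((descr, weight))
        if guess_count and len(selected) == guess_count:
            break  # --guess_count reached (0 means no limit)
        previous = weight
    return selected


# hypothetical weights, for illustration only
guesses = [("ADJ/CaDpGiNs", 900), ("ADJ/CnvDpGaipNs", 850), ("N/GiNsCn", 500)]
print(select_guesses(guesses))
print(select_guesses(guesses, guess_count=1))
```

With these hypothetical weights, the first call keeps the two adjective readings and stops at the drop from 850 to 500. The second call keeps only the first reading.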
1116 | |
---|
1117 | @node gue example |
---|
1118 | @subsection Example |
---|
1119 | |
---|
1120 | @example |
---|
1121 | command: gue -n 2 |
---|
1122 | |
---|
1123 | input: |
---|
1124 | 0000 07 W smerfny |
---|
1125 | |
---|
1126 | output: |
---|
1127 | 0000 07 W smerfny gue:,ADJ/CaDpGiNs |
---|
1128 | 0000 07 W smerfny gue:,ADJ/CnvDpGaipNs |
---|
1129 | @end example |
---|
1130 | |
---|
1131 | |
---|
1132 | @node gue dictionaries |
---|
1133 | @subsection Dictionaries |
---|
1134 | |
---|
1135 | @command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format. |
---|
1136 | The fsa format is created by compiling text-format dictionaries. |
---|
1137 | |
---|
1138 | |
---|
1139 | |
---|
1140 | @subsubheading Text format |
---|
1141 | |
---|
1142 | Dictionary entries have the following structure: |
---|
1143 | |
---|
1144 | @example |
---|
1145 | @var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight} |
---|
1146 | @end example |
---|
1147 | |
---|
1148 | @var{lemma} must be given in the cut-add format: |
---|
1149 | |
---|
1150 | @example |
---|
1151 | @code{[<cut1><add1>-]<cut2><add2>} |
---|
1152 | @end example |
---|
1153 | (no spaces in between): replace the prefix of length @var{cut1} with the |
---|
1154 | string @var{add1}, and replace the suffix of length @var{cut2} with the string |
---|
1155 | @var{add2}. |
---|
1156 | |
---|
1157 | |
---|
1158 | Example: @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}. |
---|
1159 | |
---|
1160 | |
---|
1161 | @var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}). |
---|
1162 | |
---|
1163 | @var{weight} is an integer value between 1 and 999 indicating the |
---|
1164 | likelihood of the guess. |
---|
1165 | |
---|
1166 | @example |
---|
1167 | *łkę;1a,N/GfNsCa |
---|
1168 | naj*elszy;3-5ały,ADJ/...:... |
---|
1169 | @end example |
---|
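Matching a word form against a guesser entry can be sketched in Python. The helper names and the weight @samp{300} in the usage line are hypothetical. The sketch assumes that an entry applies when the form starts with @var{prefix} and ends with @var{suffix}, and the two parts do not overlap. It repeats the cut-add decoding so that the block is self-contained.

```python
import re


def apply_cut_add(form, code):
    """Apply a [<cut1><add1>-]<cut2><add2> code to a word form."""
    prefix_part, _, suffix_part = code.rpartition("-")
    if prefix_part:
        cut1, add1 = re.match(r"(\d+)(.*)$", prefix_part).groups()
        form = add1 + form[int(cut1):]  # replace the prefix of length cut1
    cut2, add2 = re.match(r"(\d+)(.*)$", suffix_part).groups()
    return form[:len(form) - int(cut2)] + add2  # replace the suffix


def parse_guess_entry(line):
    """Split prefix*suffix;lemma,description:weight into its parts."""
    pattern, rest = line.split(";", 1)
    prefix, suffix = pattern.split("*", 1)
    lemma_code, rest = rest.split(",", 1)
    description, weight = rest.rsplit(":", 1)
    return prefix, suffix, lemma_code, description, int(weight)


def guess(form, line):
    """Return (lemma, description, weight) if the entry applies, else None."""
    prefix, suffix, code, description, weight = parse_guess_entry(line)
    if (form.startswith(prefix) and form.endswith(suffix)
            and len(form) >= len(prefix) + len(suffix)):
        return apply_cut_add(form, code), description, weight
    return None


# the weight 300 is hypothetical
print(guess("pałkę", "*łkę;1a,N/GfNsCa:300"))
```

For @samp{pałkę} the entry applies, and the code @code{1a} turns the form into the lemma @samp{pałka}.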
1170 | |
---|
1171 | |
---|
1172 | @c --------------------------------------------------------------------- |
---|
1173 | @c COR |
---|
1174 | @c --------------------------------------------------------------------- |
---|
1175 | |
---|
1176 | @page |
---|
1177 | @node cor |
---|
1178 | @section cor - spelling corrector |
---|
1179 | |
---|
1180 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1181 | @item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski |
---|
1182 | @item @strong{Component category:} @tab filter |
---|
1183 | @end multitable |
---|
1184 | |
---|
1185 | The spelling corrector applies Kemal Oflazer's dynamic programming |
---|
1186 | algorithm @cite{oflazer96} to the FSA representation of the set of |
---|
1187 | word forms of the Polex/PMDBF dictionary. Given an incorrect |
---|
1188 | word form, it returns all word forms present in the dictionary whose |
---|
1189 | edit distance from it does not exceed the maximum given with @option{--distance}. |
---|
1190 | |
---|
1191 | By default @command{cor} replaces the contents of the @var{form} field |
---|
1192 | with the corrected value and places the original contents in the @code{cor} |
---|
1193 | field. |
---|
1194 | |
---|
1195 | |
---|
1196 | @menu |
---|
1197 | * cor command line options:: |
---|
1198 | * cor dictionaries:: |
---|
1199 | @end menu |
---|
1200 | |
---|
1201 | |
---|
1202 | @node cor command line options |
---|
1203 | @subsection Command line options |
---|
1204 | |
---|
1205 | @table @code |
---|
1206 | |
---|
1207 | @parhelp |
---|
1208 | @parversion |
---|
1209 | @parinteractive |
---|
1210 | @c @parfile |
---|
1211 | @c @paroutput |
---|
1212 | @c @parfail |
---|
1213 | @c @parcopy |
---|
1214 | @parinputfield |
---|
1215 | @paroutputfield |
---|
1216 | @pardictionary |
---|
1217 | @parprocess |
---|
1218 | @parselect |
---|
1219 | @parunselect |
---|
1220 | @paroneline |
---|
1221 | @paronefield |
---|
1222 | |
---|
1223 | @item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}} |
---|
1224 | Maximum edit distance (default=`1'). |
---|
1225 | |
---|
1226 | |
---|
1227 | @end table |
---|
1228 | |
---|
1229 | @node cor dictionaries |
---|
1230 | @subsection Dictionaries |
---|
1231 | |
---|
1232 | @command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format. |
---|
1233 | The fsa format is created by compiling text-format dictionaries. |
---|
1234 | |
---|
1235 | @subsubheading Text format |
---|
1236 | |
---|
1237 | The @command{cor} dictionary is a list of words: |
---|
1238 | @example |
---|
1239 | odlot |
---|
1240 | odlotowy |
---|
1241 | odludek |
---|
1242 | @end example |
---|
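The idea of correction by edit distance can be illustrated with a brute-force Python sketch. This is @emph{not} Oflazer's algorithm. Instead of traversing the FSA with a cut-off edit distance, it compares the misspelled form with every word in a list. It uses the plain Levenshtein distance, in which insertions, deletions and substitutions each cost 1 and transpositions are not treated specially. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def correct(form, words, distance=1):
    """Return the words within the given maximum edit distance of form."""
    return [w for w in words if edit_distance(form, w) <= distance]


words = ["odlot", "odlotowy", "odludek"]
print(correct("odlod", words))
```

With the three-word dictionary above and the default maximum distance of 1, @samp{odlod} is corrected only to @samp{odlot}.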
1243 | |
---|
1244 | @page |
---|
1245 | @node sen |
---|
1246 | @section sen - a sentencizer |
---|
1247 | |
---|
1248 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1249 | |
---|
1250 | @item @strong{Authors:} @tab Tomasz Obrębski |
---|
1251 | @item @strong{Component category:} @tab filter |
---|
1252 | |
---|
1253 | @end multitable |
---|
1254 | |
---|
1255 | @command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments whose @var{type} field contains the BOS (beginning of sentence) or EOS (end of sentence) annotation. |
---|
1256 | |
---|
1257 | @menu |
---|
1258 | @c * sen input:: |
---|
1259 | @c * sen output:: |
---|
1260 | * sen example:: |
---|
1261 | @end menu |
---|
1262 | |
---|
1263 | @node sen example |
---|
1264 | @subsection Example |
---|
1265 | |
---|
1266 | @example |
---|
1267 | command: sen |
---|
1268 | |
---|
1269 | input: |
---|
1270 | 0000 05 W Cześć |
---|
1271 | 0005 01 P ! |
---|
1272 | 0006 01 S _ |
---|
1273 | 0007 02 W To |
---|
1274 | 0009 01 S _ |
---|
1275 | 0010 02 W ja |
---|
1276 | 0012 01 P . |
---|
1277 | 0013 01 S \n |
---|
1278 | |
---|
1279 | output: |
---|
1280 | 0000 00 BOS * |
---|
1281 | 0000 05 W Cześć |
---|
1282 | 0005 01 P ! |
---|
1283 | 0006 00 EOS * |
---|
1284 | 0006 00 BOS * |
---|
1285 | 0006 01 S _ |
---|
1286 | 0007 02 W To |
---|
1287 | 0009 01 S _ |
---|
1288 | 0010 02 W ja |
---|
1289 | 0012 01 P . |
---|
1290 | 0013 01 S \n |
---|
1291 | 0014 00 EOS * |
---|
1292 | @end example |
---|
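The effect of @command{sen} on the example can be reproduced with a deliberately naive Python heuristic. This is @emph{not} the rule set of @command{sen}, which this manual does not describe. The sketch assumes that a sentence ends after @samp{.}, @samp{!} or @samp{?} when the next word starts with an uppercase letter. It also assumes that the last sentence ends at the end of the input. The function names are hypothetical.

```python
END_PUNCT = (".", "!", "?")


def next_word(rows, k):
    """Return the first word (W) row at index k or later, or None."""
    for row in rows[k:]:
        if row[2] == "W":
            return row
    return None


def mark_sentences(rows):
    """Insert zero-length BOS/EOS rows into (start, length, type, form) rows."""
    if not rows:
        return []
    out = [(rows[0][0], 0, "BOS", "*")]
    for k, (start, length, typ, form) in enumerate(rows):
        out.append((start, length, typ, form))
        if typ == "P" and form in END_PUNCT:
            following = next_word(rows, k + 1)
            if following is not None and following[3][:1].isupper():
                end = start + length
                out.append((end, 0, "EOS", "*"))
                out.append((end, 0, "BOS", "*"))
    last_start, last_length = rows[-1][0], rows[-1][1]
    out.append((last_start + last_length, 0, "EOS", "*"))
    return out


rows = [(0, 5, "W", "Cześć"), (5, 1, "P", "!"), (6, 1, "S", "_"),
        (7, 2, "W", "To"), (9, 1, "S", "_"), (10, 2, "W", "ja"),
        (12, 1, "P", "."), (13, 1, "S", "\\n")]
for row in mark_sentences(rows):
    print("%04d %02d %s %s" % row)
```

On the example input this prints the same twelve rows as the example output, including the EOS/BOS pair at offset 6.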
1293 | |
---|
1294 | |
---|
1295 | @c --------------------------------------------------------------------- |
---|
1296 | @c GPH |
---|
1297 | @c --------------------------------------------------------------------- |
---|
1298 | |
---|
1299 | @c @node gph - graphizer |
---|
1300 | @c @chapter gph - graphizer |
---|
1301 | |
---|
1302 | @c Authors: Tomasz Obrębski |
---|
1303 | |
---|
1304 | |
---|
1305 | |
---|
1306 | @c SER |
---|
1307 | @c --------------------------------------------------------------------- |
---|
1308 | @c --------------------------------------------------------------------- |
---|
1309 | |
---|
1310 | @page |
---|
1311 | @node ser |
---|
1312 | @section ser - pattern search tool |
---|
1313 | |
---|
1314 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1315 | @item @strong{Authors:} @tab Tomasz Obrębski |
---|
1316 | @item @strong{Component category:} @tab filter |
---|
1317 | @end multitable |
---|
1318 | |
---|
1319 | @command{ser} looks for patterns in UTT-formatted texts. |
---|
1320 | |
---|
1321 | @menu |
---|
1322 | * ser command line options:: |
---|
1323 | * ser pattern:: |
---|
1324 | * ser how ser works:: |
---|
1325 | * ser customization:: |
---|
1326 | * ser limitations:: |
---|
1327 | * ser requirements:: |
---|
1328 | @end menu |
---|
1329 | |
---|
1330 | |
---|
1331 | @c --------------------------------------------------------------------- |
---|
1332 | @node ser command line options |
---|
1333 | @subsection Command line options |
---|
1334 | |
---|
1335 | @table @code |
---|
1336 | |
---|
1337 | @parhelp |
---|
1338 | @parversion |
---|
1339 | @c @parfile |
---|
1340 | @c @paroutput |
---|
1341 | @c @parinputfield |
---|
1342 | @c @paroutputfield |
---|
1343 | @parprocess |
---|
1344 | @parinteractive |
---|
1345 | |
---|
1346 | @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}} |
---|
1347 | The search pattern. |
---|
1348 | |
---|
1349 | @item @b{@minus{}@minus{}morph=@var{field}} |
---|
1350 | The name of the annotation field containing the morphological |
---|
1351 | description (default @code{lem}). |
---|
1352 | |
---|
1353 | @item @b{@minus{}@minus{}flex} |
---|
1354 | Only print the generated flex source code. |
---|
1355 | |
---|
1356 | @item @b{@minus{}@minus{}macro=@var{filename}} |
---|
1357 | Read macrodefinitions from file @var{filename} rather than from the |
---|
1358 | default location. This option makes it possible to redefine the set of terms. |
---|
1359 | |
---|
1360 | @item @b{@minus{}@minus{}define=@var{filename}} |
---|
1361 | Append macrodefinitions from file @var{filename}. This option |
---|
1362 | makes it possible to extend the set of terms. |
---|
1363 | |
---|
1364 | @end table |
---|
1365 | |
---|
1366 | |
---|
1367 | @c --------------------------------------------------------------------- |
---|
1368 | @node ser pattern |
---|
1369 | @subsection Pattern |
---|
1370 | |
---|
1371 | The @command{ser} pattern is a regular expression over terms corresponding |
---|
1372 | to text segments or segment sequences. Predefined terms are: |
---|
1373 | |
---|
1374 | @table @code |
---|
1375 | |
---|
1376 | @item seg(@var{t},@var{f},@var{a}) |
---|
1377 | a segment of type @var{t}, containing form @var{f} and annotation |
---|
1378 | @var{a} |
---|
1379 | |
---|
1380 | @item form(@var{f}) |
---|
1381 | a segment containing form @var{f} |
---|
1382 | |
---|
1383 | @item field(@var{f}) |
---|
1384 | a segment containing annotation field @var{f} |
---|
1385 | |
---|
1386 | @item space(@var{f}) |
---|
1387 | a space segment of form @var{f} |
---|
1388 | |
---|
1389 | @item word(@var{f}) |
---|
1390 | a word segment of form @var{f} |
---|
1391 | |
---|
1392 | @item punct(@var{f}) |
---|
1393 | a punctuation segment of form @var{f} |
---|
1394 | |
---|
1395 | @item number(@var{f}) |
---|
1396 | a number segment of form @var{f} |
---|
1397 | |
---|
1398 | @item lexeme(@var{f}) |
---|
1399 | a word segment with lemma @var{f} |
---|
1400 | |
---|
1401 | @item cat(@var{c}) |
---|
1402 | a word segment of category @var{c} |
---|
1403 | |
---|
1404 | @end table |
---|
1405 | |
---|
1406 | All arguments are optional. If an argument is omitted, an arbitrary |
---|
1407 | string of non-blank characters is assumed as the argument value. Term |
---|
1408 | arguments may be arbitrary character-level regular expressions. The |
---|
1409 | following special symbols can be used: |
---|
1410 | |
---|
1411 | @multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1412 | @item @code{[@dots{}]} @tab a character class |
---|
1413 | @item @code{[^@dots{}]} @tab a negated character class |
---|
1414 | @item @code{|} @tab alternative |
---|
1415 | @item @code{*} @tab repetition, including zero times |
---|
1416 | @item @code{+} @tab repetition, at least one time |
---|
1417 | @item @code{?} @tab optionality |
---|
1418 | @item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times |
---|
1419 | @item @code{@{@var{m},@}} @tab repetition @var{m} or more times |
---|
1420 | @item @code{@{@var{m}@}} @tab repetition @var{m} times |
---|
1421 | @item @code{\@var{ddd}} @tab the character with octal value @var{ddd} |
---|
1422 | @item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh} |
---|
1423 | @item @code{( )} @tab parentheses, used to override precedence |
---|
1424 | @c @end multitable |
---|
1425 | |
---|
1426 | @c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1427 | @item @code{.} @tab a non-blank character |
---|
1428 | @item @code{\w} @tab a letter |
---|
1429 | @item @code{\W} @tab a non-blank character other than a letter |
---|
1430 | @item @code{\d} @tab a digit |
---|
1431 | @item @code{\D} @tab a non-blank character other than a digit |
---|
1432 | @item @code{\s} @tab a space or tab character |
---|
1433 | @item @code{\S} @tab a non-blank character (the same as @code{.}) |
---|
1434 | @item @code{\l} @tab a lowercase letter |
---|
1435 | @item @code{\L} @tab an uppercase letter |
---|
1436 | @end multitable |
---|
1437 | |
---|
1438 | |
---|
1439 | @noindent The following characters: |
---|
1440 | @example |
---|
1441 | @verb{% [ ] ^ | * + ? { } , . < > \ %} |
---|
1442 | @end example |
---|
1443 | must be escaped with a backslash, i.e. written as: |
---|
1444 | @example |
---|
1445 | @verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %} |
---|
1446 | @end example |
---|
1447 | |
---|
1448 | @quotation Note |
---|
1449 | The special symbols are borrowed from Perl, with minor modifications |
---|
1450 | made for convenience: the meaning of certain special characters and |
---|
1451 | sequences differs slightly from their usual meaning. In particular, |
---|
1452 | @code{.} matches only a non-blank character, because spaces have a |
---|
1453 | special function in UTT files (they are field separators). |
---|
1454 | Use @code{\s} to match a space or tab character explicitly. |
---|
1456 | @end quotation |
---|
1457 | |
---|
1458 | In the argument of the @code{cat} term a special operator <...> may be |
---|
1459 | used. A category specification enclosed in angle brackets matches all |
---|
1460 | category descriptions which are consistent (non-contradictory) with the |
---|
1461 | specification. For example, @code{<N>} matches all noun descriptions, and |
---|
1462 | @code{<ADJ/Can>} matches all adjectives in the accusative or nominative case. |
---|
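The consistency check performed by the @code{<...>} operator can be approximated in Python. This sketch is an assumption, not the implementation in @command{ser}, which generates @command{flex} code. It relies on the convention suggested by the examples in this manual: a description is a part of speech, a slash, and attributes, each written as one uppercase letter followed by lowercase value letters (e.g.@: @code{N/GiNpCa}). A description counts as consistent with a specification when the parts of speech are equal and every attribute present in both shares at least one value. The function names are hypothetical.

```python
import re


def parse_descr(descr):
    """Split "ADJ/DpNpCnavGaifn" into "ADJ" and a mapping attribute -> values."""
    pos, _, attrs = descr.partition("/")
    return pos, dict(re.findall(r"([A-Z])([a-z]+)", attrs))


def consistent(spec, descr):
    """True if descr does not contradict spec (the <...> operator)."""
    spec_pos, spec_attrs = parse_descr(spec)
    pos, attrs = parse_descr(descr)
    if spec_pos != pos:
        return False
    for name, values in spec_attrs.items():
        # an attribute contradicts only if both sides specify it
        # and they share no value
        if name in attrs and not (set(values) & set(attrs[name])):
            return False
    return True


print(consistent("ADJ/Can", "ADJ/DpNpCnavGaifn"))  # -> True
print(consistent("N/Ca", "N/GiNpCn"))               # -> False
```

Under this reading, @code{<N>} carries no attributes and therefore accepts every description whose part of speech is @code{N}.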
1463 | |
---|
1464 | |
---|
1465 | @* |
---|
1466 | @noindent @b{Examples of one-segment patterns:} |
---|
1467 | |
---|
1468 | @multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1469 | @item @code{seg} @tab any segment |
---|
1470 | @item @code{word} @tab any word-form |
---|
1471 | @item @code{word(pomocy)} @tab the word-form @samp{pomocy} |
---|
1472 | @item @code{word(naj.+)} @tab a word-form beginning with @samp{naj} |
---|
1473 | @item @code{word(\L\l+)} @tab a capitalized word-form |
---|
1474 | @item @code{punct} @tab a punctuation character |
---|
1475 | @item @code{space(.*\\n.*)} @tab a space segment containing a newline character |
---|
1476 | @item @code{lexeme(pomoc)} @tab any form of the lexeme @samp{pomoc} |
---|
1477 | @item @code{cat(N/.*)} @tab a word whose category starts with @code{N/} |
---|
1478 | @item @code{cat(<N/Ca>)} @tab a word whose category is consistent with @code{N/Ca} |
---|
1479 | @end multitable |
---|
1480 | |
---|
1481 | @* |
---|
1482 | @noindent @b{Examples of multi-segment patterns:} |
---|
1483 | |
---|
1484 | @table @code |
---|
1485 | |
---|
1486 | @item (word(\L) punct(\.) space?)+ word(\L\l+) |
---|
1487 | a sequence of initials followed by a surname |
---|
1488 | |
---|
1489 | @item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct |
---|
1490 | a text fragment between two punctuation characters, containing an |
---|
1491 | occurrence of a relative pronoun |
---|
1492 | |
---|
1493 | @end table |
---|
1494 | |
---|
1495 | |
---|
1496 | @node ser how ser works |
---|
1497 | @subsection How ser works |
---|
1498 | |
---|
1499 | @node ser customization |
---|
1500 | @subsection Customization |
---|
1501 | |
---|
1502 | @c All predefined terms correspond to single segments, |
---|
1503 | |
---|
1504 | @example |
---|
1505 | define(`verbseq', `(cat(V) (space cat(V)))') |
---|
1506 | @end example |
---|
1507 | |
---|
1508 | |
---|
1509 | the term @code{cat()} may not be used as a ... of |
---|
1510 | |
---|
1511 | @c See @command{m4} manual for further details on macro definition format. |
---|
1512 | |
---|
1513 | @node ser limitations |
---|
1514 | @subsection Limitations |
---|
1515 | |
---|
1516 | Category specifications in angle brackets (@code{<...>}) may not contain more than three attributes. |
---|
1517 | |
---|
1518 | @node ser requirements |
---|
1519 | @subsection Requirements |
---|
1520 | |
---|
1521 | In order to run @command{ser}, the following programs must be |
---|
1522 | installed in the system: |
---|
1523 | |
---|
1524 | @itemize |
---|
1525 | |
---|
1526 | @item @command{m4} |
---|
1527 | @item @command{grep} |
---|
1528 | @item @command{flex} |
---|
1529 | @item @command{gcc} |
---|
1530 | |
---|
1531 | @end itemize |
---|
1532 | |
---|
1533 | |
---|
1534 | @c GRP |
---|
1535 | @c --------------------------------------------------------------------- |
---|
1536 | @c --------------------------------------------------------------------- |
---|
1537 | |
---|
1538 | @page |
---|
1539 | @node grp |
---|
1540 | @section grp - pattern search tool |
---|
1541 | |
---|
1542 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1543 | @item @strong{Authors:} @tab Tomasz Obrębski |
---|
1544 | @item @strong{Component category:} @tab filter |
---|
1545 | @end multitable |
---|
1546 | |
---|
1547 | |
---|
1548 | @command{grp} selects sentences containing an expression matching a |
---|
1549 | pattern. The pattern format is exactly the same as that accepted by |
---|
1550 | @command{ser}. |
---|
1551 | |
---|
1552 | @command{grp} is intended mainly for speeding up the corpus search process. |
---|
1553 | It is extremely fast: its processing speed is usually higher than the speed |
---|
1554 | of reading the corpus file from disk. |
---|
1555 | |
---|
1556 | |
---|
1557 | |
---|
1558 | @c @menu |
---|
1559 | @c * ser command line options:: |
---|
1560 | @c * ser pattern:: |
---|
1561 | @c * ser how ser works:: |
---|
1562 | @c * ser customization:: |
---|
1563 | @c * ser limitations:: |
---|
1564 | @c * ser requirements:: |
---|
1565 | @c @end menu |
---|
1566 | @menu |
---|
1567 | * grp command line options:: |
---|
1568 | * grp pattern:: |
---|
1569 | * grp hints:: |
---|
1570 | @end menu |
---|
1571 | |
---|
1572 | @node grp command line options |
---|
1573 | @subsection Command line options |
---|
1574 | |
---|
1575 | @table @code |
---|
1576 | |
---|
1577 | @parhelp |
---|
1578 | @parversion |
---|
1579 | @c @parfile |
---|
1580 | @c @paroutput |
---|
1581 | @c @parinputfield |
---|
1582 | @c @paroutputfield |
---|
1583 | @parprocess |
---|
1584 | @parinteractive |
---|
1585 | |
---|
1586 | @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}} |
---|
1587 | The search pattern. |
---|
1588 | |
---|
1589 | @item @b{@minus{}@minus{}morph=@var{field}} |
---|
1590 | The name of the annotation field containing the morphological |
---|
1591 | description (default @code{lem}). |
---|
1592 | |
---|
1593 | @item @b{@minus{}@minus{}command} |
---|
1594 | Only print the generated flex source code. |
---|
1595 | |
---|
1596 | @item @b{@minus{}@minus{}macro=@var{filename}} |
---|
1597 | Read macrodefinitions from file @var{filename} rather than from the |
---|
1598 | default location. This option makes it possible to redefine the set of terms. |
---|
1599 | |
---|
1600 | @item @b{@minus{}@minus{}define=@var{filename}} |
---|
1601 | Append macrodefinitions from file @var{filename}. This option |
---|
1602 | makes it possible to extend the set of terms. |
---|
1603 | |
---|
1604 | @end table |
---|
1605 | |
---|
1606 | |
---|
1607 | @node grp pattern |
---|
1608 | @subsection Pattern |
---|
1609 | |
---|
1610 | The pattern format is the same as for @command{ser} (@pxref{ser pattern}). |
---|
1611 | |
---|
1612 | @node grp hints |
---|
1613 | @subsection Hints |
---|
1614 | |
---|
1615 | Corpus search may be sped up by combining @command{grp} with the @command{lzop} |
---|
1616 | compression tool. @command{grp} usually processes data faster than it can be read from |
---|
1617 | disk (especially from slow laptop drives), so reading a smaller, compressed file saves time. |
---|
1618 | |
---|
1619 | @example |
---|
1620 | cat corpus | tok | sen | lem | grp -a p | lzop -7 > corpus.grp.lzo |
---|
1621 | @end example |
---|
1622 | |
---|
1623 | @example |
---|
1624 | lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR} |
---|
1625 | @end example |
---|
1626 | |
---|
1627 | |
---|
1628 | @c --------------------------------------------------------------------- |
---|
1629 | @c kot |
---|
1630 | @c --------------------------------------------------------------------- |
---|
1631 | @c --------------------------------------------------------------------- |
---|
1632 | |
---|
1633 | @page |
---|
1634 | @node kot |
---|
1635 | @section kot - untokenizer |
---|
1636 | |
---|
Authors: Tomasz Obrębski
---|
1638 | |
---|
@command{kot} is the inverse of @command{tok}: it converts UTT-formatted text back into plain text.
---|
1640 | |
---|
1641 | @menu |
---|
1642 | * kot command line options:: |
---|
1643 | * kot usage examples:: |
---|
1644 | @end menu |
---|
1645 | |
---|
1646 | @node kot command line options |
---|
1647 | @subsection Command line options |
---|
1648 | |
---|
1649 | @table @code |
---|
1650 | |
---|
1651 | @parhelp |
---|
1652 | |
---|
1653 | @c @item @b{@minus{}@minus{}version}, @b{@minus{}v} |
---|
1654 | |
---|
1655 | @c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}} |
---|
1656 | |
---|
1657 | @c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}} |
---|
1658 | |
---|
1659 | @c @item @b{@minus{}@minus{}interactive @minus{}i} |
---|
1660 | |
---|
1661 | @c @item @b{@minus{}@minus{}config=@var{filename}} |
---|
1662 | |
---|
1664 | |
---|
1665 | @item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}} |
---|
Print @var{string} between non-adjacent segments of the input file.
---|
1667 | |
---|
1668 | @item @b{@minus{}@minus{}spaces, @minus{}r} |
---|
Retain the special characters @code{_}, @code{\t}, @code{\n}, @code{\r}
and @code{\f} unexpanded in the output (by default, @command{kot}
expands them into the whitespace characters they represent).
---|
1671 | |
---|
1672 | @end table |
---|
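The behaviour of @command{kot} and its two options can be pictured with the following Python sketch. It is a hypothetical model, not the actual implementation: it assumes that each input line has the form @samp{start length type form ...}, that zero-length markers such as @code{BOS} and @code{EOS} produce no text, and that @code{_} stands for a space. The function name @code{untokenize} and the sample lines are made up for illustration.

```python
# Hypothetical model of kot (not the real implementation).
# Assumed input: UTT segment lines "start length type form [annotations...]".

# Assumed escape codes and the whitespace characters they stand for.
ESCAPES = [("_", " "), ("\\t", "\t"), ("\\n", "\n"), ("\\r", "\r"), ("\\f", "\f")]


def untokenize(lines, gap_fill="", keep_specials=False):
    """Concatenate segment forms back into plain text.

    gap_fill: string printed between non-adjacent segments (like -g).
    keep_specials: leave _ and the backslash escapes unexpanded (like -r).
    """
    out = []
    prev_end = None
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a segment line
        start, length, form = int(fields[0]), int(fields[1]), fields[3]
        if length == 0:
            continue  # assumed: zero-length markers (BOS, EOS, ...) print nothing
        if prev_end is not None and start != prev_end:
            out.append(gap_fill)  # a gap in the input offsets
        if not keep_specials:
            for code, char in ESCAPES:
                form = form.replace(code, char)
        out.append(form)
        prev_end = start + length
    return "".join(out)


segments = [
    "0000 00 BOS *",
    "0000 03 W Ala",
    "0003 01 S _",
    "0004 02 W ma",
    "0006 01 P .",
    "0007 00 EOS *",
]
print(untokenize(segments))                      # Ala ma.
print(untokenize(segments, keep_specials=True))  # Ala_ma.
```

Adjacency is decided purely from the offsets, which is why a segment filter such as @command{ser} that drops segments leaves gaps that @option{-g} can make visible.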
1673 | |
---|
1674 | @node kot usage examples |
---|
1675 | @subsection Usage examples |
---|
1676 | |
---|
1677 | @example |
---|
1678 | cat legia.txt | tok | kot |
---|
1679 | @end example |
---|
1680 | |
---|
1681 | @example |
---|
1682 | cat legia.txt | tok | lem -1 | kot |
---|
1683 | @end example |
---|
1684 | |
---|
1685 | @c CON............................................................ |
---|
1686 | @c ............................................................... |
---|
1687 | @c ............................................................... |
---|
1688 | |
---|
1689 | @page |
---|
1690 | @node con |
---|
1691 | @section con - concordance table generator |
---|
1692 | |
---|
@command{con} generates a concordance table from the output of @command{ser}, i.e.@: from the matches of the pattern given to @command{ser}.
---|
1694 | |
---|
1695 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1696 | @item @strong{Authors:} @tab Justyna Walkowska |
---|
1697 | @item @strong{Component category:} @tab sink |
---|
1698 | @end multitable |
---|
1699 | @c |
---|
1700 | |
---|
1701 | @menu |
---|
1702 | * con command line options:: |
---|
1703 | * con usage example:: |
---|
1704 | * con hints:: |
---|
1705 | @end menu |
---|
1706 | |
---|
1707 | @node con command line options |
---|
1708 | @subsection Command line options |
---|
1709 | |
---|
1710 | @table @code |
---|
1711 | |
---|
1712 | @parhelp |
---|
1713 | |
---|
1714 | @c @item @b{@minus{}@minus{}help}, @b{@minus{}h} |
---|
1715 | @c @item @b{@minus{}@minus{}version}, @b{@minus{}v} |
---|
1716 | @c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}} |
---|
1717 | @c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}} |
---|
1718 | @c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???] |
---|
1719 | @c @item @b{@minus{}@minus{}copy, @minus{}c} [???] |
---|
1720 | @c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}} |
---|
1721 | @c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}} |
---|
1722 | @c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}} |
---|
1723 | @c @item @b{@minus{}@minus{}interactive @minus{}i} |
---|
1724 | @c @item @b{@minus{}@minus{}config=@var{filename}} |
---|
1725 | @c @item |
---|
1726 | @c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}} |
---|
1727 | @c search pattern |
---|
1728 | @c |
---|
1729 | @c @item @b{@minus{}@minus{}flex} |
---|
1730 | @c only print the generated flex source code |
---|
1731 | @c |
---|
1732 | @c @item @b{@minus{}@minus{}macro=@var{filename}} |
---|
1733 | @c read macrodefinitions from file @var{filename} rather than from |
---|
1734 | @c default location. This option allows to redefine the set of terms. |
---|
1735 | @c |
---|
1736 | @c @item @b{@minus{}@minus{}define=@var{filename}} |
---|
1737 | @c append macrodefinitions from file @var{filename}. This option |
---|
1738 | @c allows to extend the set of terms. |
---|
1739 | |
---|
@item @b{@minus{}@minus{}left, @minus{}l}
Size of the left context (default @samp{30c}). Examples:
---|
1742 | @example |
---|
1743 | -l=5c: left context is 5 characters |
---|
1744 | -l=5w: left context is 5 words |
---|
1745 | -l=5s: left context is 5 non-empty input lines |
---|
-l='\s*\S+\s\S+\sBOS': left context starts with the given regex
---|
1747 | @end example |
---|
1748 | |
---|
@item @b{@minus{}@minus{}right, @minus{}r}
Size of the right context (default @samp{30c}); the syntax is the same as for @option{--left}.
@item @b{@minus{}@minus{}trim, @minus{}t}
Remove incomplete words from the output.
@item @b{@minus{}@minus{}white, @minus{}w}
Do not change whitespace characters into spaces (by default, every whitespace character is printed as a space).
@item @b{@minus{}@minus{}column, @minus{}c}
Minimum width of the left column, in characters (default 0).
@item @b{@minus{}@minus{}ignore, @minus{}i}
Ignore segment inconsistencies in the input.
---|
@item @b{@minus{}@minus{}bom}
Regular expression marking the beginning of the selected segment (default @samp{[0-9]+ [0-9]+ BOM .*}).
@item @b{@minus{}@minus{}eom}
Regular expression marking the end of the selected segment (default @samp{[0-9]+ [0-9]+ EOM .*}).
---|
@item @b{@minus{}@minus{}bod}
String displayed at the beginning of the selected segment (default @samp{[}).
@item @b{@minus{}@minus{}eod}
String displayed at the end of the selected segment (default @samp{]}).
---|
1767 | |
---|
1768 | |
---|
1769 | |
---|
1770 | @end table |
---|
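The following Python sketch shows, under simplifying assumptions, how one concordance line could be assembled from character-based contexts (as with @samp{-l=30c}), the left-column width (@option{-c}), the display strings (@option{--bod}, @option{--eod}) and whitespace normalization (@option{-w}). The function @code{kwic_line} and its parameters are hypothetical; the match offsets are assumed to be already known (in the real pipeline they come from @command{ser}), right-alignment of the left column is an assumption, and word- and line-based contexts as well as @option{-t} are not modelled.

```python
import re


def kwic_line(text, start, end, left=30, right=30, column=0,
              bod="[", eod="]", keep_white=False):
    """Hypothetical model of one con output line (character contexts only).

    text[start:end] is the match; left and right are context sizes in
    characters; column is the minimum width of the left column.
    """
    if not keep_white:
        # Replace each whitespace character by one space, so offsets stay valid.
        text = re.sub(r"\s", " ", text)
    left_ctx = text[max(0, start - left):start]
    right_ctx = text[end:end + right]
    # Assumed: right-align the left context so that matches line up in a column.
    return left_ctx.rjust(column) + bod + text[start:end] + eod + right_ctx


text = "Ala ma kota\ni psa."
print(kwic_line(text, 7, 11, left=3, right=4, column=6))
```

Replacing whitespace one character at a time, rather than collapsing runs of spaces, keeps the match offsets valid after normalization.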
1771 | |
---|
1772 | @node con usage example |
---|
1773 | @subsection Usage example |
---|
1774 | @example |
---|
cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
---|
1776 | @end example |
---|
1777 | |
---|
1778 | |
---|
1779 | @node con hints |
---|
1780 | @subsection Hints |
---|
1781 | |
---|
@command{con} is relatively slow, so avoid passing large amounts of
irrelevant text through it. It works well when its input is first
filtered with @command{grp}, as in the following pipeline:
---|
1785 | |
---|
1786 | @example |
---|
... | grp -e @var{EXPR} | ser -e @var{EXPR} | con
---|
1788 | @end example |
---|
1789 | |
---|
1790 | |
---|
1791 | |
---|
1792 | @c --------------------------------------------------------------------- |
---|
1793 | @c --------------------------------------------------------------------- |
---|
1794 | |
---|
1795 | @page |
---|
1796 | @node Auxiliary tools |
---|
1797 | @chapter Auxiliary tools |
---|
1798 | |
---|
1799 | @menu |
---|
1800 | * compiledic:: dictionary compiler |
---|
1801 | * fla:: UTT file flattener |
---|
1802 | * unfla:: UTT file unflattener |
---|
1803 | @end menu |
---|
1804 | |
---|
1805 | |
---|
1806 | @page |
---|
1807 | @node compiledic |
---|
1808 | @section compiledic - the dictionary compiler |
---|
1809 | |
---|
1810 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
1811 | @item @strong{Authors:} @tab Michal Stolarski, Tomasz Obrebski |
---|
1812 | @item @strong{Component category:} @tab additional tool |
---|
1813 | @end multitable |
---|
1814 | @c |
---|
1815 | |
---|
@command{compiledic} compiles dictionaries in text format (@file{.dic}
extension) into the binary finite-state automaton (FSA) format
(@file{.bin} extension).
---|
1818 | |
---|
The automaton representation of a dictionary is built using the following AT&T tools:
---|
1820 | @itemize |
---|
1821 | @item AT&T FSM Library, |
---|
1822 | @item AT&T Lextools. |
---|
1823 | @end itemize |
---|
1824 | |
---|
For @command{compiledic} to work, both packages must be installed on
your system. They are freely available for non-commercial use.
---|
1828 | |
---|
1829 | Usage: |
---|
1830 | @example |
---|
compiledic @var{dictionaryname}.dic
@end example

The file @file{@var{dictionaryname}.bin} is generated.
---|
1835 | |
---|
Note: the program creates many temporary files in the current
directory. They are deleted when the program terminates successfully.
---|
1839 | |
---|
1845 | |
---|
1846 | |
---|
1847 | @page |
---|
1848 | @node fla |
---|
1849 | @section fla - the UTT file flattener |
---|
1850 | |
---|
1851 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
@item @strong{Authors:} @tab Tomasz Obrębski
---|
1853 | @item @strong{Component category:} @tab filter |
---|
1854 | @end multitable |
---|
1855 | @c |
---|
1856 | |
---|
@command{fla} ``flattens'' a UTT file by merging all segments belonging
to one sentence into a single line. Technically, newline characters
(@code{\n}, ASCII code 10) are replaced with form-feed characters
(@code{\f}, ASCII code 12). Flattening makes it possible to process UTT
files sentence by sentence with line-oriented tools such as
@command{grep} or @command{sed} (this technique is used in
@command{grp} and @command{mar}).
---|
1863 | |
---|
Flattened files should have the suffix @file{.fla}, e.g.@: @file{thetext.utt.fla}.
---|
1865 | |
---|
Flattened files are still human-readable.
---|
1867 | |
---|
1868 | Usage: |
---|
1869 | |
---|
1870 | @example |
---|
fla [@var{bosregex}]
---|
1872 | @end example |
---|
1873 | |
---|
The optional argument @var{bosregex} is a regular expression describing
the segments that should be treated as sentence beginnings: a segment
starts a new sentence if it contains a fragment matching
@var{bosregex}. By default, @command{fla} looks for segments containing
the field @code{BOS}.
---|
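The flattening rule can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the code of @command{fla}: a newline is kept only directly before a segment matching @var{bosregex} (approximated here by the word @code{BOS}), every other newline becomes a form feed, and the sample segment lines are made up for illustration. The function name @code{flatten} is hypothetical.

```python
import re


def flatten(lines, bos_regex=r"\bBOS\b"):
    """Model of fla: join the segment lines of each sentence into one line.

    Assumption: a newline is kept only before a segment matching bos_regex;
    all other line breaks become form feeds (ASCII 12).
    """
    bos = re.compile(bos_regex)
    out = []
    for line in lines:
        if out:
            # Sentence beginning: keep the line break; otherwise use a form feed.
            out.append("\n" if bos.search(line) else "\f")
        out.append(line)
    out.append("\n")
    return "".join(out)


segments = [
    "0000 00 BOS *", "0000 03 W Ala", "0003 01 P .", "0004 00 EOS *",
    "0004 01 S _",
    "0005 00 BOS *", "0005 03 W Tak", "0008 01 P .", "0009 00 EOS *",
]
flat = flatten(segments)
print(flat.count("\n"))  # 2: one line per sentence
```

Because every sentence now occupies exactly one line, a @command{grep} match selects whole sentences, which is what @command{grp} relies on.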
1883 | |
---|
1884 | |
---|
1885 | |
---|
1886 | @page |
---|
1887 | @node unfla |
---|
1888 | @section unfla - the UTT file unflattener |
---|
1889 | |
---|
1890 | @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
@item @strong{Authors:} @tab Tomasz Obrębski
---|
1892 | @item @strong{Component category:} @tab filter |
---|
1893 | @end multitable |
---|
1894 | |
---|
1895 | @command{unfla} transforms a flattened UTT file, produced by |
---|
1896 | @command{fla}, into the regular format by restoring end-of-line |
---|
1897 | characters. |
---|
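If, as described for @command{fla}, every form feed in a flattened file stands for an original newline, unflattening is a single character substitution. A minimal Python sketch, with the hypothetical function name @code{unflatten} and a made-up sample line:

```python
def unflatten(text):
    """Model of unfla: turn every form feed (ASCII 12) back into a newline."""
    return text.replace("\f", "\n")


flat = "0000 00 BOS *\f0000 03 W Ala\f0003 00 EOS *\n"
print(unflatten(flat), end="")
```

The round trip is lossless only if raw form-feed characters do not otherwise occur in the data. This appears to hold because UTT encodes whitespace inside segment forms with escapes such as @code{\f} (an inference from the @command{kot} options, not a documented guarantee).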
1898 | |
---|
1899 | |
---|
1900 | |
---|
1901 | |
---|
1902 | @c --------------------------------------------------------------------- |
---|
1903 | @c USAGE EXAMPLES |
---|
1904 | @c --------------------------------------------------------------------- |
---|
1905 | |
---|
1906 | @node Usage examples |
---|
1907 | @chapter Usage examples |
---|
1908 | |
---|
1909 | @subsubheading Simple pipelines |
---|
1910 | |
---|
1911 | @enumerate |
---|
1912 | |
---|
@item Tokenization

@example
cat text | tok > output1
@end example
---|
1916 | |
---|
@item Morphological annotation (1)

Simple dictionary-based lemmatization:

@example
cat text | tok | lem > output1
@end example
---|
1922 | |
---|
@item Morphological annotation (2)

@enumerate
@item perform dictionary-based lemmatization,
@item guess descriptions for words which have no annotation.
@end enumerate
---|
1927 | |
---|
1928 | @example |
---|
1929 | cat text | tok | lem | gue -S lem > output2 |
---|
1930 | @end example |
---|
1931 | |
---|
@item Morphological annotation (3)

@enumerate
@item perform dictionary-based lemmatization,
@item try to correct words with no annotation,
@item perform dictionary-based lemmatization of the corrected words,
@item guess descriptions for words which still have no annotation.
@end enumerate
---|
1938 | |
---|
1939 | @example |
---|
1940 | cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem |
---|
1941 | @end example |
---|
@item Spelling correction

---|
1946 | @example |
---|
1947 | cat text | tok | lem --only-fail | cor -1 > output3 |
---|
1948 | @end example |
---|
1949 | |
---|
@item Expression extraction

Extract all occurrences of a verb followed by a form of the noun
@samp{rozmowa} (``conversation'').
---|
1953 | |
---|
1954 | @example |
---|
1955 | cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4 |
---|
1956 | @end example |
---|
1957 | |
---|
@item A word in context

Extract text fragments containing a form of the lexeme @samp{rozmowa}
together with a context of 5 preceding and 5 following corpus segments.
---|
1962 | |
---|
1963 | @example |
---|
1964 | cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output |
---|
1965 | @end example |
---|
1966 | |
---|
@item Generation of a concordance table (1)
---|
1968 | |
---|
1969 | @example |
---|
1970 | cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con |
---|
1971 | @end example |
---|
1972 | |
---|
Running time: 10 seconds.
---|
1974 | |
---|
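@item Generation of a concordance table (2)

The same as above, but much faster: @command{grp} first selects the
sentences that may contain a match, so @command{ser} and @command{con}
have much less text to process.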
---|
1978 | |
---|
1979 | @example |
---|
1980 | cat text | tok | lem -1 | \ |
---|
1981 | grp -e 'cat(<V>) space lexeme(rozmowa)' | \ |
---|
1982 | ser -e 'cat(<V>) space lexeme(rozmowa)' | \ |
---|
1983 | con |
---|
1984 | @end example |
---|
1985 | |
---|
Running time: 2 seconds.
---|
1987 | |
---|
@item Generation of a concordance table (3)

Usually, one searches the same corpus repeatedly. In such a case it is
advisable to first convert the corpus data into the format required by
@command{grp}, and then search the preprocessed data.

Since @command{grp} (like @command{grep}) processes data faster than it
can be read from disk, the search time can be shortened further by
compressing the data. We suggest using @command{lzop}.
---|
1997 | |
---|
@item The fastest way to search a large corpus

Step 1: preprocessing
---|
2001 | |
---|
2002 | @example |
---|
2003 | cat corpus | tok | sen | lem -1 \ |
---|
2004 | | grp -a p | lzop -7 > corpus.grp.lzo |
---|
2005 | @end example |
---|
2006 | |
---|
Step 2: search
---|
2008 | |
---|
2009 | @example |
---|
lzop -cd corpus.grp.lzo | \
grp -a gP -e 'cat(<V>) space lexeme(rozmowa)' | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | \
con
---|
2012 | @end example |
---|
2013 | |
---|
2014 | @end enumerate |
---|
2015 | |
---|
2016 | @subsubheading More complicated configurations |
---|
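In the following configuration, @command{lem} annotates the words it
finds in the dictionary, while the segments it fails to annotate are
written to the named pipe @file{fifo1} (option @option{-e}) and passed
to the spelling corrector @command{cor}. The corrected words are
lemmatized again, and the words @command{cor} cannot correct go through
@file{fifo3} to the guesser @command{gue}. Finally, @command{sort -m}
merges the three partial outputs back into a single stream.
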
---|
2019 | @example |
---|
2020 | mknod fifo1 p |
---|
2021 | mknod fifo2 p |
---|
2022 | mknod fifo3 p |
---|
2023 | mknod fifo4 p |
---|
2024 | mknod fifo5 p |
---|
2025 | |
---|
2026 | tok | lem -p W -e fifo1 > fifo2 & |
---|
2027 | cor -e fifo3 < fifo1 | lem > fifo4 & |
---|
2028 | gue < fifo3 > fifo5 & |
---|
2029 | sort -m fifo2 fifo4 fifo5 |
---|
2030 | |
---|
2031 | rm fifo? |
---|
2032 | @end example |
---|
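The last step of this configuration relies on each partial output already being ordered by segment position, so that @command{sort -m} only has to merge, not re-sort. The same merge can be sketched in Python with @code{heapq.merge}. The sample lines are made up, and plain string comparison stands in for the line comparison of @command{sort}. It matches positional order only because the offsets are assumed to be zero-padded; in a non-C locale @command{sort} may collate differently.

```python
import heapq

# Made-up partial outputs of the three branches (lem, cor | lem, gue);
# each list is already sorted by its zero-padded start offset.
from_lem = ["0000 03 W Ala", "0003 01 S _", "0006 01 S _", "0011 01 P ."]
from_cor = ["0004 02 W ma"]
from_gue = ["0007 04 W kota"]

# Python counterpart of `sort -m fifo2 fifo4 fifo5`: a linear merge of sorted inputs.
merged = list(heapq.merge(from_lem, from_cor, from_gue))
for line in merged:
    print(line)
```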
2033 | |
---|
2034 | |
---|
2035 | @c --------------------------------------------------------------------- |
---|
2036 | @c --------------------------------------------------------------------- |
---|
2037 | |
---|
2038 | @c --------------------------------------------------------------------- |
---|
2039 | @c PMDBF DICTIONARY |
---|
2040 | @c --------------------------------------------------------------------- |
---|
2041 | |
---|
2042 | @node PMDBF dictionary |
---|
2043 | @chapter PMDBF dictionary |
---|
2044 | |
---|
UTT components come with lexical data derived from the Polish
Morphological Database (PMDB).
---|
2047 | |
---|
2048 | @menu |
---|
2049 | * PMDBF files:: |
---|
2050 | * PMDBF tag structure:: |
---|
2051 | * PMDBF parts of speech:: |
---|
2052 | * PMDBF morphosyntactic attributes:: |
---|
2053 | @end menu |
---|
2054 | |
---|
2055 | @node PMDBF files |
---|
2056 | @section Files |
---|
2057 | |
---|
2058 | @node PMDBF tag structure |
---|
2059 | @section Tag structure |
---|
2060 | |
---|
@example
pos   = [[:upper:]]+
attr  = [[:upper:]]+
val   = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
descr = pos ( / ( attr val + ) + ) ?
@end example
---|
2068 | |
---|
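The tag grammar maps directly onto regular expressions. The Python sketch below is a hypothetical parser written for illustration. It spells the POSIX classes out in ASCII (so Polish letters are not covered) and assumes that attribute-value groups follow each other without separators, as the grammar suggests. The example tags are constructed from the attribute tables rather than taken from the dictionary.

```python
import re

# ASCII transcription of the tag grammar (POSIX classes spelled out).
POS = r"[A-Z]+"
ATTR = r"[A-Z]+"
VAL = r"(?:[a-z0-9?!*+-]|<[^>\n]+>)"
DESCR = re.compile("(" + POS + ")(?:/((?:" + ATTR + VAL + "+)+))?")
GROUP = re.compile("(" + ATTR + ")(" + VAL + "+)")


def parse_descr(tag):
    """Split a description into its part of speech and (attribute, values) pairs."""
    m = DESCR.fullmatch(tag)
    if m is None:
        raise ValueError("not a valid description: " + tag)
    pos, groups = m.group(1), m.group(2) or ""
    features = [(attr, re.findall(VAL, vals)) for attr, vals in GROUP.findall(groups)]
    return pos, features


print(parse_descr("N/GaNsCn"))  # ('N', [('G', ['a']), ('N', ['s']), ('C', ['n'])])
```

Since attributes are upper-case and values lower-case (or bracketed), the boundary between an attribute and its values needs no separator; several values after one attribute, as in @code{Cng}, come out as a list.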
2069 | @node PMDBF parts of speech |
---|
2070 | @section Parts of speech |
---|
2071 | |
---|
2072 | @multitable {ADJPRP} { adjectival-passive-participle } |
---|
2073 | @item @code{N} @tab noun |
---|
2074 | @item @code{NPRO} @tab nominal-pronoun |
---|
2075 | @item @code{NV} @tab deverbal-noun |
---|
2076 | @item @code{V} @tab verb |
---|
@item @code{BYC} @tab the verb @emph{być} (``to be'')
---|
2078 | @item @code{VNI} @tab non-inflected-verb |
---|
2079 | @item @code{ADJ} @tab adjective |
---|
2080 | @item @code{ADJPAP} @tab adjectival-passive-participle |
---|
2081 | @item @code{ADJPRP} @tab adjectival-present-participle |
---|
2082 | @item @code{ADJPP} @tab adjectival-past-participle |
---|
2083 | @item @code{ADJPRO} @tab adjectival-pronoun |
---|
2084 | @item @code{ADJNUM} @tab adjectival-numeral |
---|
2085 | @item @code{ADV} @tab adverb |
---|
2086 | @item @code{ADVANP} @tab adverbial-anterior-participle |
---|
2087 | @item @code{ADVPRP} @tab adverbial-present-participle |
---|
2088 | @item @code{ADVPRO} @tab adverbial-pronoun |
---|
2089 | @item @code{ADVNUM} @tab adverbial-numeral |
---|
2090 | @item @code{P} @tab preposition |
---|
2091 | @item @code{PPRO} @tab prep-noun-pronoun |
---|
2092 | @item @code{CONJ} @tab conjunction |
---|
2093 | @item @code{EXCL} @tab exclamation |
---|
2094 | @item @code{APP} @tab call |
---|
2095 | @item @code{ONO} @tab onomatopoeia |
---|
2096 | @item @code{PART} @tab particle |
---|
2097 | @item @code{NUMCRD} @tab cardinal-numeral |
---|
2098 | @item @code{NUMCOL} @tab collective-numeral |
---|
2099 | @item @code{NUMPAR} @tab partitive-numeral |
---|
2100 | @item @code{NUMORD} @tab ordinal-numeral |
---|
2101 | @end multitable |
---|
2102 | |
---|
2103 | @node PMDBF morphosyntactic attributes |
---|
2104 | @section Morphosyntactic attributes |
---|
2105 | |
---|
2106 | @multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} |
---|
2107 | @c @headitem Attr @tab Val @tab Description |
---|
2108 | @item |
---|
2109 | @code{A} @tab @tab Aspect |
---|
2110 | @item |
---|
@tab @code{p} @tab perfective,
@item
@tab @code{i} @tab imperfective.
---|
2114 | @item |
---|
2115 | @item |
---|
2116 | @code{V} @tab @tab Verb-Form |
---|
2117 | @item |
---|
2118 | @tab @code{b} @tab infinitive, |
---|
2119 | @item |
---|
2120 | @tab @code{p} @tab personal, |
---|
2121 | @item |
---|
2122 | @tab @code{i} @tab impersonal. |
---|
2123 | @item |
---|
2124 | @item |
---|
2125 | @code{M} @tab @tab Mood |
---|
2126 | @item |
---|
2127 | @tab @code{d} @tab declarative, |
---|
2128 | @item |
---|
2129 | @tab @code{c} @tab conditional, |
---|
2130 | @item |
---|
2131 | @tab @code{i} @tab imperative. |
---|
2132 | @item |
---|
2133 | @item |
---|
2134 | @code{T} @tab @tab Tense |
---|
2135 | @item |
---|
2136 | @tab @code{a} @tab past, |
---|
2137 | @item |
---|
2138 | @tab @code{r} @tab present, |
---|
2139 | @item |
---|
2140 | @tab @code{f} @tab future. |
---|
2141 | @item |
---|
2142 | @item |
---|
2143 | @code{P} @tab @tab Person |
---|
2144 | @item |
---|
2145 | @tab @code{1} @tab 1, |
---|
2146 | @item |
---|
2147 | @tab @code{2} @tab 2, |
---|
2148 | @item |
---|
2149 | @tab @code{3} @tab 3. |
---|
2150 | @item |
---|
2151 | @item |
---|
2152 | @code{D} @tab @tab Degree |
---|
2153 | @item |
---|
2154 | @tab @code{p} @tab positive, |
---|
2155 | @item |
---|
2156 | @tab @code{c} @tab comparative, |
---|
2157 | @item |
---|
2158 | @tab @code{s} @tab superlative. |
---|
2159 | @item |
---|
2160 | @item |
---|
2161 | @code{N} @tab @tab Number |
---|
2162 | @item |
---|
2163 | @tab @code{s} @tab singular, |
---|
2164 | @item |
---|
2165 | @tab @code{p} @tab plural. |
---|
2166 | @item |
---|
2167 | @item |
---|
2168 | @code{C} @tab @tab Case |
---|
2169 | @item |
---|
2170 | @tab @code{n} @tab nominative, |
---|
2171 | @item |
---|
2172 | @tab @code{g} @tab genitive, |
---|
2173 | @item |
---|
2174 | @tab @code{d} @tab dative, |
---|
2175 | @item |
---|
2176 | @tab @code{a} @tab accusative, |
---|
2177 | @item |
---|
@tab @code{i} @tab instrumental,
---|
2179 | @item |
---|
2180 | @tab @code{l} @tab locative, |
---|
2181 | @item |
---|
2182 | @tab @code{v} @tab vocative. |
---|
2183 | @item |
---|
2184 | @item |
---|
2185 | @code{G} @tab @tab Gender |
---|
2186 | @item |
---|
2187 | @tab @code{p} @tab masculine-personal, |
---|
2188 | @item |
---|
2189 | @tab @code{a} @tab masculine-animal, |
---|
2190 | @item |
---|
2191 | @tab @code{i} @tab masculine-inanimate, |
---|
2192 | @item |
---|
2193 | @tab @code{f} @tab feminine, |
---|
2194 | @item |
---|
2195 | @tab @code{n} @tab neuter. |
---|
2196 | @end multitable |
---|
2197 | |
---|
2198 | |
---|
2199 | @c --------------------------------------------------------------------- |
---|
2200 | @c --------------------------------------------------------------------- |
---|
2201 | @c |
---|
2202 | @c @node Examples |
---|
2203 | @c @chapter Examples |
---|
2204 | |
---|
2205 | @c ---------------------------------------------------------------------- |
---|
2206 | @c ---------------------------------------------------------------------- |
---|
2207 | |
---|
2208 | @node GNU Free Documentation License |
---|
2209 | @chapter GNU Free Documentation License |
---|
2210 | |
---|
2211 | @c The GNU Free Documentation License. |
---|
2212 | @center Version 1.2, November 2002 |
---|
2213 | |
---|
2214 | @c This file is intended to be included within another document, |
---|
2215 | @c hence no sectioning command or @node. |
---|
2216 | |
---|
2217 | @display |
---|
2218 | Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc. |
---|
2219 | 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA |
---|
2220 | |
---|
2221 | Everyone is permitted to copy and distribute verbatim copies |
---|
2222 | of this license document, but changing it is not allowed. |
---|
2223 | @end display |
---|
2224 | |
---|
2225 | @enumerate 0 |
---|
2226 | @item |
---|
2227 | PREAMBLE |
---|
2228 | |
---|
2229 | The purpose of this License is to make a manual, textbook, or other |
---|
2230 | functional and useful document @dfn{free} in the sense of freedom: to |
---|
2231 | assure everyone the effective freedom to copy and redistribute it, |
---|
2232 | with or without modifying it, either commercially or noncommercially. |
---|
2233 | Secondarily, this License preserves for the author and publisher a way |
---|
2234 | to get credit for their work, while not being considered responsible |
---|
2235 | for modifications made by others. |
---|
2236 | |
---|
2237 | This License is a kind of ``copyleft'', which means that derivative |
---|
2238 | works of the document must themselves be free in the same sense. It |
---|
2239 | complements the GNU General Public License, which is a copyleft |
---|
2240 | license designed for free software. |
---|
2241 | |
---|
2242 | We have designed this License in order to use it for manuals for free |
---|
2243 | software, because free software needs free documentation: a free |
---|
2244 | program should come with manuals providing the same freedoms that the |
---|
2245 | software does. But this License is not limited to software manuals; |
---|
2246 | it can be used for any textual work, regardless of subject matter or |
---|
2247 | whether it is published as a printed book. We recommend this License |
---|
2248 | principally for works whose purpose is instruction or reference. |
---|
2249 | |
---|
2250 | @item |
---|
2251 | APPLICABILITY AND DEFINITIONS |
---|
2252 | |
---|
2253 | This License applies to any manual or other work, in any medium, that |
---|
2254 | contains a notice placed by the copyright holder saying it can be |
---|
2255 | distributed under the terms of this License. Such a notice grants a |
---|
2256 | world-wide, royalty-free license, unlimited in duration, to use that |
---|
2257 | work under the conditions stated herein. The ``Document'', below, |
---|
2258 | refers to any such manual or work. Any member of the public is a |
---|
2259 | licensee, and is addressed as ``you''. You accept the license if you |
---|
2260 | copy, modify or distribute the work in a way requiring permission |
---|
2261 | under copyright law. |
---|
2262 | |
---|
2263 | A ``Modified Version'' of the Document means any work containing the |
---|
2264 | Document or a portion of it, either copied verbatim, or with |
---|
2265 | modifications and/or translated into another language. |
---|
2266 | |
---|
2267 | A ``Secondary Section'' is a named appendix or a front-matter section |
---|
2268 | of the Document that deals exclusively with the relationship of the |
---|
2269 | publishers or authors of the Document to the Document's overall |
---|
2270 | subject (or to related matters) and contains nothing that could fall |
---|
2271 | directly within that overall subject. (Thus, if the Document is in |
---|
2272 | part a textbook of mathematics, a Secondary Section may not explain |
---|
2273 | any mathematics.) The relationship could be a matter of historical |
---|
2274 | connection with the subject or with related matters, or of legal, |
---|
2275 | commercial, philosophical, ethical or political position regarding |
---|
2276 | them. |
---|
2277 | |
---|
2278 | The ``Invariant Sections'' are certain Secondary Sections whose titles |
---|
2279 | are designated, as being those of Invariant Sections, in the notice |
---|
2280 | that says that the Document is released under this License. If a |
---|
2281 | section does not fit the above definition of Secondary then it is not |
---|
2282 | allowed to be designated as Invariant. The Document may contain zero |
---|
2283 | Invariant Sections. If the Document does not identify any Invariant |
---|
2284 | Sections then there are none. |
---|
2285 | |
---|
2286 | The ``Cover Texts'' are certain short passages of text that are listed, |
---|
2287 | as Front-Cover Texts or Back-Cover Texts, in the notice that says that |
---|
2288 | the Document is released under this License. A Front-Cover Text may |
---|
2289 | be at most 5 words, and a Back-Cover Text may be at most 25 words. |
---|
2290 | |
---|
2291 | A ``Transparent'' copy of the Document means a machine-readable copy, |
---|
2292 | represented in a format whose specification is available to the |
---|
2293 | general public, that is suitable for revising the document |
---|
2294 | straightforwardly with generic text editors or (for images composed of |
---|
2295 | pixels) generic paint programs or (for drawings) some widely available |
---|
2296 | drawing editor, and that is suitable for input to text formatters or |
---|
2297 | for automatic translation to a variety of formats suitable for input |
---|
2298 | to text formatters. A copy made in an otherwise Transparent file |
---|
2299 | format whose markup, or absence of markup, has been arranged to thwart |
---|
2300 | or discourage subsequent modification by readers is not Transparent. |
---|
2301 | An image format is not Transparent if used for any substantial amount |
---|
2302 | of text. A copy that is not ``Transparent'' is called ``Opaque''. |
---|
2303 | |
---|
2304 | Examples of suitable formats for Transparent copies include plain |
---|
2305 | @sc{ascii} without markup, Texinfo input format, La@TeX{} input |
---|
2306 | format, @acronym{SGML} or @acronym{XML} using a publicly available |
---|
2307 | @acronym{DTD}, and standard-conforming simple @acronym{HTML}, |
---|
2308 | PostScript or @acronym{PDF} designed for human modification. Examples |
---|
2309 | of transparent image formats include @acronym{PNG}, @acronym{XCF} and |
---|
2310 | @acronym{JPG}. Opaque formats include proprietary formats that can be |
---|
2311 | read and edited only by proprietary word processors, @acronym{SGML} or |
---|
2312 | @acronym{XML} for which the @acronym{DTD} and/or processing tools are |
---|
2313 | not generally available, and the machine-generated @acronym{HTML}, |
---|
2314 | PostScript or @acronym{PDF} produced by some word processors for |
---|
2315 | output purposes only. |
---|
2316 | |
---|
2317 | The ``Title Page'' means, for a printed book, the title page itself, |
---|
2318 | plus such following pages as are needed to hold, legibly, the material |
---|
2319 | this License requires to appear in the title page. For works in |
---|
2320 | formats which do not have any title page as such, ``Title Page'' means |
---|
2321 | the text near the most prominent appearance of the work's title, |
---|
2322 | preceding the beginning of the body of the text. |
---|
2323 | |
---|
2324 | A section ``Entitled XYZ'' means a named subunit of the Document whose |
---|
2325 | title either is precisely XYZ or contains XYZ in parentheses following |
---|
2326 | text that translates XYZ in another language. (Here XYZ stands for a |
---|
2327 | specific section name mentioned below, such as ``Acknowledgements'', |
---|
2328 | ``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title'' |
---|
2329 | of such a section when you modify the Document means that it remains a |
---|
2330 | section ``Entitled XYZ'' according to this definition. |
---|
2331 | |
---|
2332 | The Document may include Warranty Disclaimers next to the notice which |
---|
2333 | states that this License applies to the Document. These Warranty |
---|
2334 | Disclaimers are considered to be included by reference in this |
---|
2335 | License, but only as regards disclaiming warranties: any other |
---|
2336 | implication that these Warranty Disclaimers may have is void and has |
---|
2337 | no effect on the meaning of this License. |
---|
2338 | |
---|
@item
VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.

@item
COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover. Both covers must also clearly and legibly identify
you as the publisher of these copies. The front cover must present
the full title with all words of the title equally prominent and
visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.

@item
MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it. In addition, you must do these things in the Modified Version:

@enumerate A
@item
Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document). You may use the same title as a previous version
if the original publisher of that version gives permission.

@item
List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.

@item
State on the Title page the name of the publisher of the
Modified Version, as the publisher.

@item
Preserve all the copyright notices of the Document.

@item
Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

@item
Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.

@item
Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.

@item
Include an unaltered copy of this License.

@item
Preserve the section Entitled ``History'', Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page. If
there is no section Entitled ``History'' in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.

@item
Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on. These may be placed in the ``History'' section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.

@item
For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.

@item
Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section titles.

@item
Delete any section Entitled ``Endorsements''. Such a section
may not be included in the Modified Version.

@item
Do not retitle any existing section to be Entitled ``Endorsements'' or
to conflict in title with any Invariant Section.

@item
Preserve any Warranty Disclaimers.
@end enumerate

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant. To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled ``Endorsements'', provided it contains
nothing but endorsements of your Modified Version by various
parties---for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version. Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.

@item
COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled ``History''
in the various original documents, forming one section Entitled
``History''; likewise combine any sections Entitled ``Acknowledgements'',
and any sections Entitled ``Dedications''. You must delete all
sections Entitled ``Endorsements.''

@item
COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.

@item
AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

@item
TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers. In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

@item
TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License. Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
@uref{http://www.gnu.org/copyleft/}.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation. If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.
@end enumerate

@page
@heading ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
  Copyright (C)  @var{year}  @var{your name}.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
  Texts.  A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with@dots{}Texts.'' line with this:

@smallexample
@group
    with the Invariant Sections being @var{list their titles}, with
    the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
    being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node Reporting bugs
@chapter Reporting bugs

Report bugs to @email{obrebski@@amu.edu.pl}.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c @node Copyright
@c @chapter Copyright
@c
@c Copyright 2004 by Tomasz Obrebski
@c This software is free for research and educational use.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node Author
@chapter Author


@bye