source: app/doc/utt.texinfo @ 25ae32e

Last change on this file since 25ae32e was 25ae32e, checked in by obrebski <obrebski@…>, 16 years ago

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@4 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

1\input texinfo   @c -*-texinfo-*-
2@documentencoding ISO-8859-2
3@c @documentlanguage pl
4
5@c %**start of header
6@setfilename utt.info
7@settitle UAM Text Tools v0.90
8@c %**end of header
9
10@copying
11This manual is for UAM Text Tools (version 0.90, November, 2007)
12
13Copyright @copyright{}  2005, 2007  Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
14
15Permission is granted to copy, distribute and/or modify this document
16under the terms of the GNU Free Documentation License, Version 1.2
17or any later version published by the Free Software Foundation;
18with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
19Texts.  A copy of the license is included in the section entitled ``GNU Free Documentation License''.
20
21@c @quotation
22@c Permission is granted to ...
23@c No permission is granted until the document is completed.
24@c @end quotation
25@end copying
26
27
28@titlepage
29@title UAM Text Tools 0.90 - User Manual
30@subtitle edition 0.01, @today
31@subtitle status: prescript
32@author by Justyna Walkowska, Tomasz Obr@,{}ebski and Micha@l{} Stolarski
33@page
34@vskip 0pt plus 1filll
35@insertcopying
36@end titlepage
37
38@contents
39
40@c @paragraphindent none
41
42@iftex
43@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
44@end iftex
45
46@c @headings off
47@c @everyheading LEM(1) @| @| LEM(1)
48@everyfooting @today @c @| @thispage @|
49
50@ifnottex
51
52@node Top
53@top UTT - UAM Text Tools
54
55@insertcopying
56
57@menu
58* General information::                       
59* UTT file format::             
60* Configuration files::         
61* UTT components::
62* Auxiliary tools::
63* Usage examples::             
64* PMDBF dictionary::           
65@c * Examples::                   
66@c * Copyright::
67* GNU Free Documentation License::
68* Reporting bugs::                                   
69* Author::                     
70@end menu
71@end ifnottex
72
73
74@c ----------------------------------------------------------------------
75
76@node General information
77@chapter General information
78
79UAM Text Tools (UTT) is a package of language processing tools
80developed at Adam Mickiewicz University. Its functionality includes:
81
82@itemize @bullet
83
84@item
85tokenization
86@item
87dictionary-based morphological analysis
88@item
89heuristic morphological analysis of unknown words
90@item
91spelling correction
92@item
93pattern search
94@item
95sentence splitting
96@item
97generation of concordance tables
98@end itemize
99
100The toolkit is intended for processing raw (unannotated),
101unrestricted text for any conceivable purpose.
102
103The system is organized as a collection of command-line programs, each
104performing one operation, e.g. tokenization, lemmatization, spelling
105correction. The components are independent of one another; the
106unifying element is the uniform i/o file format.
107
108The components may be combined in various ways to provide various text
109processing services. New components supplied by the user may also be
110easily incorporated into the system, provided that they respect the i/o
111file format conventions.
112
113UTT component programs do not depend on any specific tagset or
114morphological description format.
115
116UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
117the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
118
119The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use. 
120
121
122List of contributors:
123
124@itemize
125@item Pawel Konieczka
126@item Tomasz Obrebski
127@item Michal Stolarski
128@item Marcin Walas
129@item Justyna Walkowska
130@end itemize
131
132@c ----------------------------------------------------------------------
133@c ---------------------------------------------------------------------
134
135@node    UTT file format
136@chapter UTT file format
137
138A UTT file contains annotation of a text. It consists of a sequence of
139segments. Each segment explicitly refers to a continuous piece of the
140text and provides some information on it.
141
142@section Segment format
143
144A segment occupies one line of a UTT file and consists of
145space-separated fields:
146
147
148@quotation
149@sp 1
150[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
151@sp 1
152@end quotation
153
154@table @var
155
156@item @var{start}
157Non-negative integer value indicating the position in the source text where the
158segment starts.
159
160@item @var{length}
161Non-negative integer value indicating the length of the segment.
162
163@item @var{type}
164A sequence of non-blank characters containing no digits (digits could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
165@var{type} reflects the main classification of segments -
166into words, numbers, punctuation marks, meta-text markers.
167@xref{tok output,,tok output}, for description of automatically recognized type markers.
168
169@item @var{form}
170This field contains the textual form of the segment or the special
171symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
172
173The characters or character sequences that have special meaning in the
174@var{form} field are enumerated below.
175
176Characters with special meaning:
177
178@itemize
179@item @code{_} - space character
180@item @code{*} - undefined contents
181@end itemize
182
183Escape sequences:
184
185@itemize
186@item @code{\n} - new line
187@item @code{\t} - tabulation
188@item @code{\r} - carriage return 
189
190@item @code{\_} - the @code{_} character
191@item @code{\*} - the @code{*} character
192@item @code{\\} - the @code{\} character
193
194@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
195@end itemize
196
197@item @var{annotation1}
198@item @var{annotation2}
199@item ...
200Annotation fields have the following format:
201
202@var{longname} @code{:} @var{value}
203
204or
205
206@var{shortname} @var{value}
207
208where @var{longname} is a string of alphanumeric characters
209(isalnum() test), @var{shortname} - a single non-alphanumeric character
210(ispunct() test), and @var{value} is an arbitrary string of non-blank characters.
211
212@end table
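
The two annotation field shapes described above can be told apart by looking at the first character. The following is a minimal sketch, not code from UTT itself. It assumes that name, separator and value are written without intervening blanks (as in @code{lem:program,N}), and it uses Python's @code{string.punctuation} and @code{str.isalnum} as stand-ins for the C @code{ispunct()} and @code{isalnum()} tests. The function name @code{parse_annotation} is hypothetical.

```python
import string


def parse_annotation(field):
    """Split one annotation field into a (name, value) pair.

    Short form: a single punctuation character followed by the value.
    Long form:  an alphanumeric name, a colon, then the value.
    """
    if field and field[0] in string.punctuation:
        # Short name: exactly one non-alphanumeric character.
        return field[0], field[1:]
    name, sep, value = field.partition(":")
    if sep and name.isalnum():
        return name, value
    raise ValueError("not a valid annotation field: %r" % field)
```

For instance, @code{parse_annotation("cor:programy")} yields the pair @code{("cor", "programy")}.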
213
214
215Only two fields are mandatory: @var{type} and @var{form}. All other fields
216may be absent. In the case when only one number precedes the
217@var{type} field, it is interpreted as the @var{START} position.
218
219If the @var{length} field is omitted, the length of the segment is the
220length of the @var{form} field, except when the value of the
221@var{form} field is @code{*} -- in this case, the length is assumed to
222be 0.
223
224If the @var{start} field is also absent, the segment is assumed to directly
225follow the preceding one.
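
The rules for optional fields can be summarised in a short parser. This is an illustrative sketch under stated assumptions, not the parser used by the UTT programs: the names @code{Segment}, @code{decode_form} and @code{parse_segment} are hypothetical, the default length is taken to be the number of characters of the decoded form (so that @code{_} and @code{\n} each count as one), and annotation fields are kept as raw strings.

```python
from typing import NamedTuple, Optional

# Escape sequences listed for the form field (character after the backslash).
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "_": "_", "*": "*", "\\": "\\"}


class Segment(NamedTuple):
    start: Optional[int]   # None: the segment directly follows the previous one
    length: int
    type: str
    form: Optional[str]    # None: the form is undefined ("*")
    annotations: list


def decode_form(raw):
    """Turn an escaped form field into plain text (None for a bare '*')."""
    if raw == "*":
        return None
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw) and raw[i + 1] in _ESCAPES:
            out.append(_ESCAPES[raw[i + 1]])
            i += 2
            continue
        out.append(" " if ch == "_" else ch)   # unescaped '_' stands for a space
        i += 1
    return "".join(out)


def parse_segment(line):
    """Parse one segment line: [start [length]] type form [annotation ...]."""
    fields = line.split()
    numbers = []
    # At most two leading numeric fields: start, then length.
    while len(numbers) < 2 and fields and fields[0].isascii() and fields[0].isdigit():
        numbers.append(int(fields.pop(0)))
    if len(fields) < 2:
        raise ValueError("type and form are mandatory: %r" % line)
    seg_type, raw_form, annotations = fields[0], fields[1], fields[2:]
    form = decode_form(raw_form)
    start = numbers[0] if numbers else None
    if len(numbers) == 2:
        length = numbers[1]
    else:
        # Omitted length: length of the form, or 0 when the form is '*'.
        length = 0 if form is None else len(form)
    return Segment(start, length, seg_type, form, annotations)
```

A caller that needs absolute positions for segments with @code{start} equal to @code{None} would add the previous segment's start and length.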
226
227@c Conventions:
228
229@c Annotation fields with predefined meaning:
230
231@c @itemize
232@c @item @code{!} - UTT components are allowed to modify the contents of
233@c the @var{form} field (e.g. spelling correction does this). If this happens the
234@c original form of the segment have to be placed in the @code{!}-field.
235@c @item @code{@@} - morphological description
236@c @item @code{=} - node identifier assignment (used in graph encoding)
237@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
238@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
239@c @end itemize
240
241Segments of length 0 may be used to mark file positions with some
242information. See e.g. BOS and EOS (beginning/end of sentence) markers
243in the example below.
244
245Example:
246
247sentence: @samp{Piszemy dobre progrumy.}
248
249@example
2500000 00 BOS *
2510000 07 W Piszemy lem:pisać,V
2520007 01 S _
2530008 05 W dobre lem:dobry,ADJ
2540013 01 S _
2550014 08 W progrumy cor:programy lem:program,N
2560022 01 P .
2570023 00 EOS *
2580023 01 S _
2590024 00 BOS *
2600024 11 W Warszawiacy lem:Warszawiak,N
2610035 01 S _
2620036 03 W też
2630039 01 P .
2640040 00 EOS *
265
266@end example
267
268@example
2690000 BOS *
2700000 W Piszemy lem:pisać,V
2710007 S _
2720008 W dobre lem:dobry,ADJ
2730013 S _
2740014 W progrumy cor:programy lem:program,N
2750022 P .
2760023 EOS *
277@end example
278
279Position information may be provided only for some types of segments:
280
281@example
2820000 BOS *
283W Piszemy lem:pisać,V
284S _
285W dobre lem:dobry,ADJ
286S _
287W progrumy cor:programy lem:program,N
288P .
289EOS *
290S _
2910024 BOS *
292W Warszawiacy lem:Warszawiak,N
293S _
294W też
295P .
296EOS *
297@end example
298
299Position/length information may be provided only when necessary:
300
301@example
3020000 04 N *
3030000 N 12
304P .
305N 5
306S _
307W km
308@end example
309
310@section UTT File
311
312A UTT file consists of a sequence of segments.  The same text position
313may be covered by multiple segments. In consequence, ambiguous text
314segmentation and ambiguous annotation may be represented.
315
316There are two structural requirements a valid UTT-formatted file
317has to meet:
318
319@itemize @bullet
320
321@item
322segments have to be sorted with respect to the @var{start} field,
323
324@item
325for each
326segment ending at position @var{n}, either there must be a segment starting at
327position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
328for each segment starting at position @var{n}, either there must be a segment
329ending at position @var{n-1}, or the position @var{n-1} must not be covered
330by any segment.
331
332@end itemize
333
334A valid annotation for the text fragment
335@example
33612.5 km
337@end example
338
339may be
340
341@example
3420000 02 N 12
3430000 04 N 12.5
3440002 01 P .
3450003 01 N 5
3460004 01 S _
3470005 02 W km
348@end example
349
350but not
351
352@example
3530000 02 N 12
3540000 04 N 12.5
3550004 01 S _
3560005 02 W km
357@end example
358
359because in the latter example the first segment (starting at position 0000, 2 characters long) ends at position @var{n}=0001, and position @var{n+1}=0002 is covered by the second segment although no segment starts there.
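
The two structural requirements can be checked mechanically. The sketch below is one possible reading of them, not a tool shipped with UTT: the function name @code{check_utt_structure} is hypothetical, segments are given as @code{(start, length)} pairs, and zero-length segments are assumed to be pure markers that neither cover a position nor count as starting or ending there.

```python
def check_utt_structure(segments):
    """Return a list of problems found in a list of (start, length) pairs."""
    problems = []

    # Requirement 1: segments are sorted by start position.
    starts = [start for start, _ in segments]
    if starts != sorted(starts):
        problems.append("segments are not sorted by start position")

    # Inclusive (first, last) spans of all segments that cover some text.
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    covered = set()
    for first, last in spans:
        covered.update(range(first, last + 1))
    start_points = {first for first, _ in spans}
    end_points = {last for _, last in spans}

    # Requirement 2: boundaries must line up wherever text is covered.
    for first, last in spans:
        if last + 1 in covered and last + 1 not in start_points:
            problems.append(
                "segment ending at %d: position %d is covered "
                "but no segment starts there" % (last, last + 1))
        if first - 1 in covered and first - 1 not in end_points:
            problems.append(
                "segment starting at %d: position %d is covered "
                "but no segment ends there" % (first, first - 1))
    return problems
```

Applied to the two annotations of @samp{12.5 km} shown above, the first yields no problems and the second yields one, for position 0002.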
360
361@section Character encoding
362
363The UTT component programs accept only 1-byte character encodings, such
364as the ISO, ANSI and DOS code pages. UTF-8 input will probably work as well, but this has not been tested yet.
365
366
367@c @section Formats
368
369@c @unnumberedsubsubsec Basic format
370
371@c While processing large amounts of the overhead related with explicit
372@c ... of the start position and segment length becomes ... . Therefore,
373@c for efficiency reasons certain shortcuts are possible:
374
375@c @unnumberedsubsubsec Relative start position
376
377@c Start position may be given as relative distance from the last
378@c absolut position.
379
380@c @unnumberedsubsubsec Absent length
381
382@c Segment length may by omitted. Normally it can be restored by counting
383@c the length of the @emph{form field}. For segments with the special value
384@c @code{*} in the @emph{form field} length 0 is assumed.
385
386@c @unnumberedsubsubsec Absent length and start position
387
388@c Both start position and segment length may be omitted. In this format
389@c each segment is assumed to follow the previous one. This format is,
390@c therefore, suitable only for unambiguously tagged text
391@c (0-length markers can be still used.)
392
393
394@c @table @code
395@c @item AL
396@c @code{1234 03 W kot}
397@c @item RL
398@c @code{+56 03 W kot}
399@c @item A
400@c @code{1234 W kot}
401@c @item R
402@c @code{+56 W kot}
403@c @item 0
404@c @code{W kot}
405@c @end table
406
407
408@c [HOW TO GET POLISH FONTS IN DVI???]
409
410@macro parhelp
411@item @b{@minus{}@minus{}help}, @b{@minus{}h}
412Print help.
413@end macro
414
415
416@macro parversion
417@item @b{@minus{}@minus{}version}, @b{@minus{}V}
418Print version information.
419@end macro
420
421@macro parinteractive
422@item @b{@minus{}@minus{}interactive, @minus{}i}
423This option toggles interactive mode, which is by default off. In the
424interactive mode the program does not buffer the output.
425@end macro
426
427
428@c @macro parfile
429@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
430@c Input file name.
431@c If this option is absent or equal to '@minus{}', the program
432@c reads from the standard input.
433@c @end macro
434
435
436@c @macro paroutput
437@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
438@c Regular output file name. To regular output the program sends segments
439@c which it successfully processed and copies those which were not
440@c subject to processing. If this option is absent or equal to
441@c '@minus{}', standard output is used.
442@c @end macro
443
444@c @macro parfail
445@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
446@c Fail output file name. To fail output the program copies the segments
447@c it failed to process.  If this option is absent or equal to
448@c '@minus{}', standard output is used.
449@c @end macro
450
451
452@c @macro parcopy
453@c @item @b{@minus{}@minus{}copy, @minus{}c}
454@c Copy succesfully processed segments to regular output also in their
455@c original input form.
456@c @end macro
457
458
459@macro parinputfield
460@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
461The field containing the input to the program. The default is the
462@var{form} field. The fields @var{start}, @var{length}, @var{type},
463and @var{form} are referred to as @code{1}, @code{2}, @code{3},
464@code{4}, respectively.
465@end macro
466
467
468@macro paroutputfield
469@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
470The name of the field added by the program. The default is the name of the program.
471@end macro
472
473
474@macro pardictionary
475@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
476Dictionary file name.
477@end macro
478
479
480@macro parprocess
481@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
482Process segments with the specified value in the @var{type} field.
483Multiple occurrences of this option are allowed and are interpreted as
484disjunction. If this option is absent, all segments are processed.
485@end macro
486
487
488@macro parselect
489@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
490Select for processing only segments in which the field named
491@var{fieldname} is present. Multiple occurrences of this option are
492allowed and are interpreted as conjunction of conditions. If this
493option is absent, all segments are processed.
494@end macro
495
496
497@macro parunselect
498@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
499Select for processing only segments in which the field @var{fieldname}
500is absent.  Multiple occurrences of this option are allowed and are
501interpreted as conjunction of conditions. If this option is absent,
502all segments are processed.
503@end macro
504
505
506@macro paroneline
507@item @b{@minus{}@minus{}one-line}
508This option makes the program print ambiguous annotation in one output
509line by generating multiple annotation fields. By default, when
510ambiguous annotation is produced for a segment, the segment is
511replicated and each of the annotations is added to a separate copy of
512the segment.
513@end macro
514
515
516@macro paronefield
517@item @b{@minus{}@minus{}one-field, @minus{}1}
518This option makes the program print ambiguous annotation in one
519annotation field. By default, when ambiguous annotation is produced
520for a segment, the segment is replicated and each of the
521annotations is added to a separate copy of the segment.
522
523This option is useful when working with @command{kot} or @command{con}.
524@end macro
525
526
527@c ---------------------------------------------------------------------
528@c ---------------------------------------------------------------------
529
530@c @node Common command line options
531@c @chapter Common command line options
532
533@c @table @code
534
535@c @parhelp
536
537@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
538@c Print help.
539
540@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
541@c Print version information.
542
543@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
544@c Input file name.
545@c If this option is absent or equal to '@minus{}', the program
546@c reads from the standard input.
547
548@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
549@c Regular output file name. To regular output the program sends segments
550@c which it successfully processed and copies those which were not
551@c subject to processing. If this option is absent or equal to
552@c '@minus{}', standard output is used.
553
554@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
555@c Fail output file name. To fail output the program copies the segments
556@c it failed to process.  If this option is absent or equal to
557@c '@minus{}', standard output is used.
558
559@c @item @b{@minus{}@minus{}only-fail}
560@c Discard segments which would normally be sent to regular
561@c output. Print only segments the program failed to process.
562
563@c @item @b{@minus{}@minus{}no-fail}
564@c Discard segments the program failed to process.
565@c (This and the previous option are functionally equivalent to,
566@c respectively, @option{-o /dev/null} and @option{-e /dev/null}, but
567@c make the programs run faster.)
568
569@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
570@c The field containing the input to the program. The default is usually
571@c the @var{form} field (unless otherwise stated in the program
572@c description). The fields @var{position}, @var{length}, @var{tag}, and
573@c @var{form} are referred to as @code{1}, @code{2}, @code{3}, @code{4},
574@c respectively.
575
576@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
577@c The name of the field added by the program. The default is the name of
578@c the program.
579
580@c @c @item @b{@minus{}@minus{}copy, @minus{}c}
581@c @c Copy processed segments to regular output.
582
583@c @item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
584@c Dictionary file name.
585@c (This option is used by programs which use dictionary data.)
586
587@c @item @b{@minus{}@minus{}process=@var{tag}, @minus{}p @var{tag}}
588@c Process segments with the specified value in the @var{tag} field.
589@c Multiple occurences of this option are allowed and are interpreted as
590@c disjunction. If this option is absent, all segments are processed.
591
592@c @item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
593@c Select for processing only segments in which the field named
594@c @var{fieldname} is present. Multiple occurences of this option are
595@c allowed and are interpreted as conjunction of conditions. If this
596@c option is absent, all segments are processed.
597
598@c @item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
599@c Select for processing only segments in which the field @var{fieldname}
600@c is absent.  Multiple occurences of this option are allowed and are
601@c interpreted as conjunction of conditions. If this option is absent,
602@c all segments are processed.
603
604@c @item @b{@minus{}@minus{}interactive @minus{}i}
605@c This option toggles interactive mode, which is by default off. In the
606@c interactive mode the program does not buffer the output.
607
608@c @item @b{@minus{}@minus{}config=@var{filename}}
609@c Read configuration from file @file{@var{filename}}.
610
611@c @item @b{@minus{}@minus{}one @minus{}1}
612@c This option makes the program print ambiguous annotation in one output
613@c segment. By default when
614@c ambiguous new annotation is being produced for a segment, the segment
615@c is multiplicated and each of the annotations is added to separate copy
616@c of the segment.
617
618@c @end table
619
620@c ---------------------------------------------------------------------
621@c CONFIGURATION FILES
622@c ---------------------------------------------------------------------
623
624@node    Configuration files
625@chapter Configuration files
626
627Values for all command line options accepted by a component
628may be set in configuration files. The default locations of the
629configuration files for a component named @command{@var{program}} are
630
631@example
632        @file{/etc/utt/conf/@var{program}.conf}
633@end example
634
635for the system-wide configuration file and
636
637@example
638        @file{~/.utt/conf/@var{program}.conf}
639@end example
640
641for the user configuration file.
642
643@c The configuration file to load may be also specified with the
644@c @option{--config} option. Configuration file need not be provided.
645
646For each option, the value is set according to the following priority:
647
648@itemize
649@item command line
650@c @item configuration file indicated with @option{--config} option
651@item user configuration file (or configuration file indicated with the @option{--config} option)
652@item system-wide configuration file
653@end itemize
654
655Parameter values are specified in the following format:
656
657@var{parametername}=@var{value}
658
659where @var{parametername} is the short or long name of an option accepted by
660the program, or
661
662@var{parametername}
663
664if the option does not need arguments.
665
666You can introduce comments to configuration files using the # sign.
667
668If a program accepts multiple occurrences of an option (e.g. the select option of @command{lem}), you can specify them on separate lines of the program's configuration file.
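
The file format and the priority order can be illustrated with a short sketch. It is not the configuration reader used by the UTT programs; the names @code{parse_config} and @code{resolve} are hypothetical. Two points are assumptions: a @code{#} sign is taken to start a comment that runs to the end of the line, and a source of higher priority is taken to replace, rather than extend, the values given for the same option by a source of lower priority.

```python
def parse_config(text):
    """Parse 'name=value' and bare 'name' lines into {name: [values]}."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # assumed: '#' starts a comment
        if not line:
            continue
        name, sep, value = line.partition("=")
        # An option without arguments is recorded as True.
        options.setdefault(name.strip(), []).append(value.strip() if sep else True)
    return options


def resolve(command_line, user_conf, system_conf):
    """Combine three option dicts; later updates have higher priority."""
    merged = dict(system_conf)
    merged.update(user_conf)
    merged.update(command_line)
    return merged
```

Repeated options, such as two @code{select} lines, end up as a list with two values under the same name.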
669
670@c The equal sign may be omitted.
671
672
673@quotation Tip
674If you have two (or more) frequently used sets of options for the same
675program (e.g. lem with the PMDBF dictionary and lem with a user dictionary),
676a good solution is to create two soft links to lem, called
677e.g. lemg and lemu, and specify their configuration in the files lemg.conf
678and lemu.conf respectively.
679@end quotation
680
681@c ---------------------------------------------------------------------
682@c COMPONENTS
683@c ---------------------------------------------------------------------
684
685@node UTT components
686@chapter UTT components
687
688UTT components are of three types:
689
690@menu
691Sources: programs which read non-UTT data (e.g. raw text) and produce output
692in UTT format
693* tok::         a tokenizer
694
695Filters: programs which read and produce UTT-formatted data
696@c * sen - the sentencizer::
697* lem::         a morphological analyzer
698* gue::         a morphological guesser
699* cor::         a spelling corrector
700* sen::         a sentencizer
701@c * gph - the graphizer::
702* ser::         a pattern search tool (marks matches)
703* grp::         a pattern search tool (selects sentences containing a match)
704
705Sinks: programs which read UTT data and produce output in another format
706* kot::         an untokenizer
707* con::         a concordance table generator
708@end menu
709
710@c ---------------------------------------------------------------------
711@c TOK
712@c ---------------------------------------------------------------------
713
714@page
715@node tok
716@section tok - a tokenizer
717
718@c ----------------------------------------
719
720@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
721@item @strong{Authors:}                 @tab Tomasz Obrębski
722@item @strong{Component category:}      @tab source
723@end multitable
724
725
726@menu
727* tok description::
728* tok input::
729* tok output::
730* tok command line options::
731* tok example::
732@end menu
733
734@node tok description
735@subsection Description
736
737@code{tok} is a simple program which reads a text file and identifies
738tokens on the basis of their orthographic form.  The type of the token
739is printed as the @var{type} field.
740
741@node tok input
742@subsection Input
743
744Raw text.
745
746@node tok output
747@subsection Output
748
749UTT-file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
750
751@itemize
752
753@item @code{W}
754(word)
755- continuous sequence of letters
756
757@item @code{N}
758(number)
759- continuous sequence of digits
760
761@item @code{S}
762(space)
763- continuous sequence of space characters
764
765@item @code{P}
766(punctuation mark)
767- single printable character not belonging to any of the other classes
768
769@item @code{B}
770(unprintable character)
771- single unprintable character
772
773@end itemize
774
775
776
777@node tok command line options
778@subsection Command line options
779
780@table @code
781
782@item @b{@minus{}@minus{}help}, @b{@minus{}h}
783Print help.
784
785@item @b{@minus{}@minus{}version}, @b{@minus{}V}
786Print version information.
787
788@item @b{@minus{}@minus{}interactive, @minus{}i}
789This option toggles interactive mode, which is by default off. In the
790interactive mode the program does not buffer the output.
791
792@end table
793
794@node tok example
795@subsection Example
796
797Input:
798
799@example
800Piszemy dobre programy.
801@end example
802
803Output:
804
805@example
8060000 07 W Piszemy
8070007 01 S _
8080008 05 W dobre
8090013 01 S _
8100014 08 W programy
8110022 01 P .
8120023 01 S \n
813@end example
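
The classification above can be mimicked in a few lines. This is a rough sketch and not the implementation of @command{tok}: the names @code{token_type}, @code{encode_form} and @code{tokenize} are hypothetical, Python's Unicode character tests stand in for whatever tests @command{tok} applies, and whitespace (including the newline) is tested before printability so that the newline comes out as @code{S}, as in the example.

```python
import itertools

# Escaping of characters that have a special meaning in the form field.
_ESCAPE = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
           "_": "\\_", "*": "\\*", "\\": "\\\\"}


def token_type(ch):
    """Classify a single character into one of the five token types."""
    if ch.isalpha():
        return "W"
    if ch.isdigit():
        return "N"
    if ch.isspace():
        return "S"
    if ch.isprintable():
        return "P"
    return "B"


def encode_form(text):
    """Escape a piece of text for use as the form field."""
    return "".join(_ESCAPE.get(ch, ch) for ch in text)


def tokenize(text):
    """Return UTT lines with start, length, type and form fields."""
    lines = []
    pos = 0
    for kind, group in itertools.groupby(text, key=token_type):
        chunk = "".join(group)
        # W, N and S tokens are runs; P and B tokens are single characters.
        pieces = [chunk] if kind in "WNS" else list(chunk)
        for piece in pieces:
            lines.append("%04d %02d %s %s" % (pos, len(piece), kind, encode_form(piece)))
            pos += len(piece)
    return lines
```

Positions are counted in characters, which coincides with byte offsets for the 1-byte encodings the components expect.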
814
815
816@c ---------------------------------------------------------------------
817@c SEN
818@c ---------------------------------------------------------------------
819
820@c @node sen - sentencizer
821@c @chapter sen - sentencizer
822
823@c Authors: Tomasz Obrębski
824
825@c ---------------------------------------------------------------------
826@c LEM
827@c ---------------------------------------------------------------------
828
829@page
830@node lem
831@section lem - morphological analyzer
832
833@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
834@item @strong{Authors:}                 @tab Tomasz Obrębski, Michał Stolarski
835@item @strong{Component category:}      @tab filter
836@end multitable
837
838@menu
839* lem description::             
840* lem command line options::   
841* lem input::
842* lem output::
843* lem example::                 
844* lem dictionaries::           
845* lem hints::           
846@end menu
847
848@node lem description
849@subsection Description
850
851@command{lem} performs morphological analysis of a simple orthographic
852word, returning all its possible morphological annotations,
853disregarding the context.
854
855@c ----------------------------------------
856
857@node lem command line options
858@subsection Command line options
859
860@table @code
861@parhelp
862@parversion
863@parinteractive
864@c @parfile
865@c @paroutput
866@c @parfail
867@c @parcopy
868@parinputfield
869@paroutputfield
870@pardictionary
871@parprocess
872@parselect
873@parunselect
874@paroneline
875@paronefield
876@end table
877
878@c ----------------------------------------
879
880@node lem input
881@subsection Input
882
883@command{lem} reads a UTT file and processes the value of the @var{form} field
884(the input field may be changed with @option{--input-field} option).
885
886@node lem output
887@subsection Output
888
889@command{lem} adds a new annotation field, whose default name is @code{lem}.  In
890case of ambiguity, either the segment is replicated (default),
891multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
892annotation is produced as the value of a single @code{lem} field
893(@option{--one-field}, @option{-1}):
894
895@itemize @bullet
896
897@item
898unambiguous value format:
899
900@example
901   <lemma>,<descr>
902@end example
903
904@item
905ambiguous value format (@option{--one-field} option)
906
907
908@example
909   <lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
910@end example
911
912(alternative descriptions for the same lemma are separated by commas,
913alternative lemmata are separated by semicolons.)
914
915@end itemize
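
A consumer of the @option{--one-field} format has to split the value back into individual readings. The following sketch shows one way to do that and to build the value again; the names @code{parse_lem_value} and @code{format_lem_value} are hypothetical, and the code assumes that neither lemmata nor descriptions contain a comma or a semicolon.

```python
def parse_lem_value(value):
    """Split a lem field value into a list of (lemma, description) pairs."""
    readings = []
    for alternative in value.split(";"):        # alternative lemmata
        lemma, *descriptions = alternative.split(",")
        for descr in descriptions:              # alternative descriptions
            readings.append((lemma, descr))
    return readings


def format_lem_value(readings):
    """Build a one-field value, grouping descriptions by lemma in input order."""
    grouped = {}
    for lemma, descr in readings:
        grouped.setdefault(lemma, []).append(descr)
    return ";".join(",".join([lemma] + descrs) for lemma, descrs in grouped.items())
```

The unambiguous format is simply the special case with one lemma and one description.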
916
917@node lem example
918@subsection Example
919
920Input:
921
922@example
9230000 07 W Piszemy
9240007 01 S _
9250008 05 W dobre
9260013 01 S _
9270014 08 W programy
9280022 01 P .
9290023 01 S \n
930@end example
931
932Output (default):
933
934@example
9350000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
9360007 01 S _
9370008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
9380008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
9390013 01 S _
9400014 08 W programy lem:program,N/GiNpCa
9410014 08 W programy lem:program,N/GiNpCn
9420014 08 W programy lem:program,N/GiNpCv
9430022 01 P .
9440023 01 S \n
945@end example
946
947Output (@option{--one-line} option):
948
949@example
9500000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
9510007 01 S _
9520008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
9530013 01 S _
9540014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
9550022 01 P .
9560023 01 S \n
957@end example
958
959Output (@option{--one-field} option):
960
961@example
9620000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
9630007 01 S _
9640008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
9650013 01 S _
9660014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
9670022 01 P .
9680023 01 S \n
969@end example
970
971@c ----------------------------------------
972
973@node lem dictionaries
974@subsection Dictionaries
975
976@command{lem} requires a dictionary. The dictionary may be provided in
977one of two formats: in text (source) format or in binary (fsa) format.
978
979@subsubheading Text format
980
981Dictionary entries have the following structure:
982
983@example
984<form>;<lemma>,<descr>[;<lemma>,<descr>]
985@end example
986
987@var{lemma} may be given explicitly or in the cut-add format:
988
989@example
990@code{[<cut1><add1>-]<cut2><add2>}
991@end example
992
993meaning: replace prefix of length @code{<cut1>} with
994string @code{<add1>}, replace suffix of length @code{<cut2>} with string
995@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
996@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
997
998Each dictionary entry must be written in one line and must not contain blank characters.
999
1000Examples:
1001@example
1002kot;0,N/GaNsCn
1003kota;1,N/GaNsCg;1,N/GaNsCa
1004kotu;1,N/GaNsCd
1005kotem;2,N/GaNsCi
1006kocie;3t,N/GaNsCl;3t,N/GaNsCv
1007najbielsi;3-4ały,ADJ/DsNpCnGp
1008najbielsze;3-5ały,ADJ/DsNpCnGaifn
1009najlepsi;dobry,ADJ/DsNpCnGp
1010najlepsze;dobry,ADJ/DsNpCnGaifn
1011@end example
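
The cut-add notation can be expanded with a small function. This is an illustrative sketch, not the code used by @command{lem} or @command{compiledic}; the name @code{resolve_lemma} is hypothetical. It assumes that a lemma field which does not match the cut-add pattern is an explicit lemma, and that @code{<add1>} never contains a hyphen.

```python
import re

# [<cut1><add1>-]<cut2><add2>, where the cuts are decimal numbers.
_CUT_ADD = re.compile(r"(?:([0-9]+)([^-]*)-)?([0-9]+)(.*)")


def resolve_lemma(form, lemma_spec):
    """Return the lemma for a form, expanding the cut-add notation if used."""
    match = _CUT_ADD.fullmatch(lemma_spec)
    if match is None:
        return lemma_spec                      # explicit lemma
    cut1, add1, cut2, add2 = match.groups()
    result = form
    if cut1 is not None:
        # Replace a prefix of length cut1 with add1.
        result = add1 + result[int(cut1):]
    # Replace a suffix of length cut2 with add2.
    keep = max(0, len(result) - int(cut2))
    return result[:keep] + add2
```

With the entries listed above, @code{resolve_lemma("kocie", "3t")} gives @samp{kot} and @code{resolve_lemma("najbielsi", "3-4ały")} gives @samp{biały}.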
1012
1013
1014The mandatory file name extension for a text dictionary is @code{dic}. For large
1015dictionaries it is preferable, however, to compile them into binary
1016(fsa) format.
1017
1018@subsubheading Binary format
1019
1020The mandatory file name extension for a binary dictionary is @code{bin}. To
1021compile a text dictionary into binary format, write:
1022
1023@example
1024compiledic <dictionaryname>.dic
1025@end example
1026
1027@subsubheading Polex/PMDBF dictionary
1028
1029A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
1030the distribution as the default dictionary of @command{lem}. It is
1031located by default in:
1032
1033@file{$HOME/.utt/pl/lem.bin}
1034
1035@node lem hints
1036@subsection Hints
1037
1038@c @subsubheading Combining data from multiple dictionaries
1039
1040@c @itemize
1041
1042@c @item Apply <dict1>, then apply <dict2> to words which were not annotatated.
1043
1044@c @example
1045@c lem -d <dict1> | lem -S lem -d <dict2>
1046@c @end example
1047
1048@c @item Add annotations from two dictionaries <dict1> and <dict2>.
1049
1050@c @example
1051@c lem -c -d <dict1> | lem -S lem -d <dict2>
1052@c @end example
1053
1054@c @end itemize
1055
1056
1057@c ---------------------------------------------------------------------
1058@c GUE
1059@c ---------------------------------------------------------------------
1060
1061@page
1062@node gue
1063@section gue - morphological guesser
1064
1065@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1066
@item @strong{Authors:}                 @tab Michał Stolarski, Tomasz Obrębski
1068@item @strong{Component category:}      @tab filter
1069
1070@end multitable
1071
@command{gue} guesses morphological descriptions of the form contained
in the @var{form} field.
1074
1075@menu
1076* gue command line options::   
1077* gue example::                 
1078* gue dictionaries::           
1079@end menu
1080
1081@node gue command line options
1082@subsection Command line options
1083
1084@table @code
1085
1086@parhelp
1087@parversion
1088@parinteractive
1089@c @parfile
1090@c @paroutput
1091@c @parfail
1092@c @parcopy
1093@parinputfield
1094@paroutputfield
1095@pardictionary
1096@parprocess
1097@parselect
1098@parunselect
1099@paroneline
1100@paronefield
1101
1102@item @b{@minus{}@minus{}delta=@var{n}}
Stop displaying answers after a drop in weight, that is, when the weight difference between two consecutive results exceeds the delta value (default=`0.2').
1104
1105
1106@item @b{@minus{}@minus{}cut-off=@var{n}}
Do not display answers whose weight is lower than the cut-off value (default=`200').
1108
1109
1110@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
1111Guess up to n descriptions  (default=`0', which means 'display all results').
1112
1113
1114
1115@end table
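
The last three options act as successive filters on the list of answers. The Python sketch below reads their descriptions literally and is not the implementation of @command{gue}. It assumes that answers arrive sorted by decreasing weight and that the value compared with delta is the plain difference of two weights; the manual does not state the scale on which delta is measured, so the delta values in the example calls, like the weights themselves, are made up. The name @code{select_guesses} is hypothetical.

```python
def select_guesses(guesses, delta=0.2, cut_off=200, guess_count=0):
    """Filter (description, weight) pairs that are sorted by decreasing weight."""
    selected = []
    previous = None
    for descr, weight in guesses:
        if guess_count and len(selected) >= guess_count:
            break      # guess_count: enough answers (0 means no limit)
        if weight < cut_off:
            break      # cut-off: this answer and all later ones are too light
        if previous is not None and previous - weight > delta:
            break      # delta: the weight dropped too sharply
        selected.append((descr, weight))
        previous = weight
    return selected


# Invented weights, for illustration only.
answers = [("ADJ/CaDpGiNs", 900), ("ADJ/CnvDpGaipNs", 850), ("N/GaNsCn", 400)]
print(select_guesses(answers, delta=100))
print(select_guesses(answers, delta=1000, guess_count=1))
```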
1116
1117@node gue example
1118@subsection Example
1119
1120@example
1121command: gue -n 2
1122
1123input:
11240000 07 W smerfny
1125
1126output:
11270000 07 W smerfny gue:,ADJ/CaDpGiNs
11280000 07 W smerfny gue:,ADJ/CnvDpGaipNs
1129@end example
1130                                 
1131
1132@node gue dictionaries
1133@subsection Dictionaries
1134
1135@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
1136The fsa format is created by compiling text-format dictionaries.
1137
1138
1139
1140@subsubheading Text format
1141
1142Dictionary entries have the following structure:
1143
1144@example
1145@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
1146@end example
1147
1148@var{lemma} must be given in the cut-add format:
1149
1150@example
1151@code{[<cut1><add1>-]<cut2><add2>}
1152@end example
(no spaces in between): replace the prefix of length @var{cut1} with the
string @var{add1}, replace the suffix of length @var{cut2} with the string
@var{add2}.
1156
1157
Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}.
1159
1160
@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
1162
1163@var{weight} is an integer value between 1 and 999 indicating the
1164likelihood of the guess.
1165
1166@example
*łkę;1a,N/GfNsCa
naj*elszy;3-4ały,ADJ/...:...
1169@end example
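
A minimal Python sketch of how such an entry could be taken apart and tested against a word form is given below. It is not the code of @command{gue}: the names @code{Rule}, @code{parse_rule} and @code{matches} are hypothetical, the @code{*} is assumed to stand for an arbitrary (possibly empty) string, and the weight is treated as optional because the first example entry above carries none.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    prefix: str
    suffix: str
    lemma: str              # cut-add code
    descr: str
    weight: Optional[int]   # None when the entry has no ':<weight>' part


def parse_rule(line):
    """Split '<prefix>*<suffix>;<lemma>,<description>[:<weight>]'."""
    pattern, rest = line.strip().split(";", 1)
    prefix, suffix = pattern.split("*", 1)
    lemma, tail = rest.split(",", 1)
    descr, colon, weight = tail.rpartition(":")
    if not colon:
        return Rule(prefix, suffix, lemma, tail, None)
    return Rule(prefix, suffix, lemma, descr, int(weight))


def matches(rule, form):
    """True if `form` starts with the rule's prefix and ends with its suffix."""
    return (len(form) >= len(rule.prefix) + len(rule.suffix)
            and form.startswith(rule.prefix)
            and form.endswith(rule.suffix))


rule = parse_rule("*łkę;1a,N/GfNsCa")
print(rule)
print(matches(rule, "całkę"), matches(rule, "kot"))
```

The word @samp{całkę} and any weight passed to @code{parse_rule} in further experiments are illustrative choices, not data taken from the distributed dictionary.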
1170
1171
1172@c ---------------------------------------------------------------------
1173@c COR
1174@c ---------------------------------------------------------------------
1175
1176@page
1177@node cor
1178@section cor - spelling corrector
1179
1180@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski, Michał Stolarski
1182@item @strong{Component category:}      @tab filter
1183@end multitable
1184
The spelling corrector applies Kemal Oflazer's dynamic programming
algorithm @cite{oflazer96} to the FSA representation of the set of
word forms of the Polex/PMDBF dictionary. Given an incorrect
word form, it returns all word forms present in the dictionary whose
edit distance from that form does not exceed the threshold given as a parameter.
1190
By default, @code{cor} replaces the contents of the @var{form} field
with the corrected value, placing the old contents in the @code{cor}
field.
1194
1195
1196@menu
1197* cor command line options::   
1198* cor dictionaries::           
1199@end menu
1200
1201
1202@node cor command line options
1203@subsection Command line options
1204
1205@table @code
1206
1207@parhelp
1208@parversion
1209@parinteractive
1210@c @parfile
1211@c @paroutput
1212@c @parfail
1213@c @parcopy
1214@parinputfield
1215@paroutputfield
1216@pardictionary
1217@parprocess
1218@parselect
1219@parunselect
1220@paroneline
1221@paronefield
1222
1223@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1224Maximum edit distance (default='1').
1225
1226
1227@end table
1228
1229@node cor dictionaries
1230@subsection Dictionaries
1231
1232@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
1233The fsa format is created by compiling text-format dictionaries.
1234
1235@subsubheading Text format
1236
1237The @command{cor} dictionary is a list of words:
1238@example
1239odlot
1240odlotowy
1241odludek
1242@end example
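
What a lookup in such a word list amounts to can be imitated naively in Python. The sketch below is not Oflazer's algorithm and does not use an FSA: it simply compares the input with every word, using the plain Levenshtein distance, which may differ in detail from the measure implemented in @command{cor}. The names @code{edit_distance} and @code{corrections} are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b), # substitution or match
            ))
        previous = current
    return previous[-1]


def corrections(form, words, distance=1):
    """Return the words whose distance from `form` is at most `distance`."""
    return [word for word in words if edit_distance(form, word) <= distance]


words = ["odlot", "odlotowy", "odludek"]
print(corrections("odlut", words))
```

A linear scan like this one is far slower than a search over an automaton; it only shows which candidates a given maximum distance admits.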
1243
1244@page
1245@node sen
1246@section sen - a sentensizer
1247
1248@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1249
@item @strong{Authors:}                 @tab Tomasz Obrębski
1251@item @strong{Component category:}      @tab filter
1252
1253@end multitable
1254
1255@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
1256
1257@menu
1258@c * sen input::
1259@c * sen output::
1260* sen example::                 
1261@end menu
1262
1263@node sen example
1264@subsection Example
1265
1266@example
1267command: sen
1268
1269input:
0000 05 W Cześć
12710005 01 P !
12720006 01 S _
12730007 02 W To
12740009 01 S _
12750010 02 W ja
12760012 01 P .
12770013 01 S \n
1278
1279output:
12800000 00 BOS *
0000 05 W Cześć
12820005 01 P !
12830006 00 EOS *
12840006 00 BOS *
12850006 01 S _
12860007 02 W To
12870009 01 S _
12880010 02 W ja
12890012 01 P .
12900013 01 S \n
12910014 00 EOS *
1292@end example
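
The shape of the output can be reproduced with a deliberately naive Python sketch. The rule it uses (a sentence ends after @samp{.}, @samp{!} or @samp{?} whenever another word follows) is an assumption chosen so that the example above comes out right; it is not the algorithm of @command{sen}, and the name @code{mark_sentences} is hypothetical.

```python
def mark_sentences(lines, terminators=(".", "!", "?")):
    """Add zero-length BOS/EOS segments to a list of UTT segment lines."""
    if not lines:
        return []
    segments = [line.split(None, 3) for line in lines]
    # Index of the last word segment; no new sentence is opened after it.
    last_word = max((i for i, seg in enumerate(segments) if seg[2] == "W"),
                    default=-1)
    out = [f"{int(segments[0][0]):04d} 00 BOS *"]
    for i, (line, seg) in enumerate(zip(lines, segments)):
        start, length, kind, form = seg
        out.append(line)
        if kind == "P" and form in terminators and i < last_word:
            end = int(start) + int(length)
            out.append(f"{end:04d} 00 EOS *")   # close the current sentence
            out.append(f"{end:04d} 00 BOS *")   # open the next one
    start, length = segments[-1][:2]
    out.append(f"{int(start) + int(length):04d} 00 EOS *")
    return out


text = ["0000 05 W Cześć", "0005 01 P !", "0006 01 S _", "0007 02 W To",
        "0009 01 S _", "0010 02 W ja", "0012 01 P .", "0013 01 S \\n"]
print("\n".join(mark_sentences(text)))
```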
1293
1294
1295@c ---------------------------------------------------------------------
1296@c GPH
1297@c ---------------------------------------------------------------------
1298
1299@c @node gph - graphizer
1300@c @chapter gph - graphizer
1301
1302@c Authors: Tomasz Obrêbski
1303
1304
1305
1306@c SER
1307@c ---------------------------------------------------------------------
1308@c ---------------------------------------------------------------------
1309
1310@page
1311@node ser
1312@section ser - pattern search tool
1313
1314@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski
1316@item @strong{Component category:}      @tab filter
1317@end multitable
1318
1319@command{ser} looks for patterns in UTT-formatted texts.
1320
1321@menu
1322* ser command line options::   
1323* ser pattern::                 
1324* ser how ser works::           
1325* ser customization::           
1326* ser limitations::             
1327* ser requirements::           
1328@end menu
1329
1330
1331@c ---------------------------------------------------------------------
1332@node ser command line options
1333@subsection Command line options
1334
1335@table @code
1336
1337@parhelp
1338@parversion
1339@c @parfile
1340@c @paroutput
1341@c @parinputfield
1342@c @paroutputfield
1343@parprocess
1344@parinteractive
1345
1346@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1347The search pattern.
1348
1349@item @b{@minus{}@minus{}morph=@var{field}}
1350The name of the annotation field containing the morphological
1351description (default @code{lem}).
1352
1353@item @b{@minus{}@minus{}flex}
1354Only print the generated flex source code.
1355
1356@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.
1359
1360@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from file @var{filename}. This option
makes it possible to extend the set of terms.
1363
1364@end table
1365
1366
1367@c ---------------------------------------------------------------------
1368@node ser pattern
1369@subsection Pattern
1370
1371The @command{ser} pattern is a regular expression over terms corresponding
1372to text segments or segment sequences. Predefined terms are:
1373
1374@table @code
1375
1376@item seg(@var{t},@var{f},@var{a})
1377a segment of type @var{t}, containing form @var{f} and annotation
1378@var{a}
1379
1380@item form(@var{f})
1381a segment containing form @var{f}
1382
1383@item field(@var{f})
1384a segment containing annotation field @var{f}
1385
1386@item space(@var{f})
1387a space segment of form @var{f}
1388
1389@item word(@var{f})
1390a word segment of form @var{f}
1391
1392@item punct(@var{f})
1393a punct segment of form @var{f}
1394
1395@item number(@var{f})
1396a number segment of form @var{f}
1397
1398@item lexeme(@var{f})
1399a word segment with lemma @var{f}
1400
1401@item cat(@var{c})
1402a word segment of category @var{c}
1403
1404@end table
1405
1406All arguments are optional. If an argument is omitted, an arbitrary
1407string of non-blank characters is assumed as the argument value. Term
1408arguments may be arbitrary character-level regular expressions. The
following special symbols can be used:
1410
1411@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1412@item @code{[@dots{}]}            @tab a character class
1413@item @code{[^@dots{}]}           @tab a negated character class
1414@item @code{|}                    @tab alternative
1415@item @code{*}                    @tab repetition, including zero times
1416@item @code{+}                    @tab repetition, at least one time
1417@item @code{?}                    @tab optionality
1418@item @code{@{@var{m},@var{n}@}}  @tab repetition from @var{m} to @var{n} times
1419@item @code{@{@var{m},@}}         @tab repetition @var{m} or more times
1420@item @code{@{@var{m}@}}          @tab repetition @var{m} times
@item @code{\@var{ddd}}           @tab the character with octal value @var{ddd}
1422@item @code{\x@var{hh}}           @tab the character with hexadecimal value @var{hh}
1423@item @code{( )}                  @tab parentheses, used to override precedence
1424@c @end multitable
1425
1426@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1427@item @code{.}    @tab a non-blank character
1428@item @code{\w}   @tab a letter
1429@item @code{\W}   @tab a non-blank character other than a letter
1430@item @code{\d}   @tab a digit
1431@item @code{\D}   @tab a non-blank character other than a digit
1432@item @code{\s}   @tab a space or tab character
1433@item @code{\S}   @tab a non-blank character (the same as @code{.})
1434@item @code{\l}   @tab a lowercase letter
1435@item @code{\L}   @tab an uppercase letter
1436@end multitable
1437
1438
1439@noindent The following characters:
1440@example
1441@verb{%  [   ]   ^   |   *   +   ?   {   }   ,   .   <   >   \ %}
1442@end example
1443must be escaped with a backslash, i.e. written as:
1444@example
1445@verb{% \[  \]  \^  \|  \*  \+  \?  \{  \}  \,  \.  \<  \>  \\ %}
1446@end example
1447
1448@quotation Note
The special symbols are borrowed from Perl with minor modifications
made for convenience: the meaning of certain special characters and
sequences differs slightly from their common meaning.
The meaning of the @code{.} special character is modified due to
the special function of spaces in UTT files (they are field
separators). Use @code{\s} to explicitly match a space or tab character.
1456@end quotation
1457
In the argument of the @code{cat} term a special operator @code{<...>} may be
used. A category specification enclosed in angle brackets matches all
category descriptions which are consistent (non-contradictory) with the
specification. For example, @code{<N>} matches all noun descriptions, and
@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
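
One possible reading of ``consistent'' is sketched below in Python. It is an interpretation, not the code generated by @command{ser}: the part of speech is assumed to have to be identical, every attribute named in the specification is assumed to have to share at least one value with the description, and an attribute absent from the description is assumed not to contradict anything. The names @code{parse_descr} and @code{consistent} are hypothetical.

```python
import re

# One value: a lowercase letter, a digit, one of ?!*+- or a <...> group.
VALUE = re.compile(r"[a-z0-9?!*+-]|<[^>\n]+>")
# One attribute: uppercase name followed by one or more values.
ATTRIBUTE = re.compile(r"([A-Z]+)((?:[a-z0-9?!*+-]|<[^>\n]+>)+)")


def parse_descr(descr):
    """Split 'ADJ/CaDpGiNs' into ('ADJ', {'C': {'a'}, 'D': {'p'}, ...})."""
    pos, _, attrs = descr.partition("/")
    return pos, {name: set(VALUE.findall(values))
                 for name, values in ATTRIBUTE.findall(attrs)}


def consistent(spec, descr):
    """True if `descr` does not contradict the specification `spec`."""
    if spec.startswith("<") and spec.endswith(">"):
        spec = spec[1:-1]                    # drop the angle brackets
    spec_pos, spec_attrs = parse_descr(spec)
    pos, attrs = parse_descr(descr)
    if spec_pos != pos:
        return False
    for name, wanted in spec_attrs.items():
        if name in attrs and not (wanted & attrs[name]):
            return False                     # no common value: contradiction
    return True


print(consistent("<N>", "N/GaNsCn"))
print(consistent("<ADJ/Can>", "ADJ/CaDpGiNs"))
print(consistent("<ADJ/Can>", "N/GaNsCa"))
```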
1463
1464
1465@*
1466@noindent @b{Examples of one-segment patterns:}
1467
1468@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1469@item @code{seg}            @tab any segment
1470@item @code{word}           @tab any word-form
1471@item @code{word(pomocy)}   @tab the word-form @samp{pomocy}
1472@item @code{word(naj.+)}    @tab a word-form beginning with @samp{naj}
1473@item @code{word(\L\l+)}    @tab a capitalized word-form
1474@item @code{punct}          @tab a punctuation character
1475@item @code{space(.*\\n.*)} @tab a space segment containing a newline character
1476@item @code{lexeme(pomoc)}  @tab any form of the lexeme 'pomoc'
@item @code{cat(N/.*)}      @tab a word whose category starts with @code{N/}
@item @code{cat(<N/Ca>)}    @tab a word whose category matches @code{N/Ca}
1479@end multitable
1480
1481@*
1482@noindent @b{Examples of multi-segment patterns:}
1483
1484@table @code
1485
1486@item (word(\L) punct(\.) space?)+ word(\L\l+)
1487a sequence of initials followed by a surname
1488
1489@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
a text fragment between two punctuation characters, containing an
occurrence of a relative pronoun
1492
1493@end table
1494
1495
1496@node ser how ser works
1497@subsection How ser works
1498
1499@node ser customization
1500@subsection Customization
1501
1502@c All predefined terms correspond to single segments,
1503
1504@example
1505define(`verbseq', `(cat(V) (space cat(V)))')
1506@end example
1507
1508
1509the term @code{cat()} may not be used as a ... of
1510
1511@c See @command{m4} manual for further details on macro definition format.
1512
1513@node ser limitations
1514@subsection Limitations
1515
1516more than 3 attributes in <>.
1517
1518@node ser requirements
1519@subsection Requirements
1520
1521In order to run @command{ser}, the following programs must be
1522installed in the system:
1523
1524@itemize
1525
1526@item @command{m4}
1527@item @command{grep}
1528@item @command{flex}
1529@item @command{gcc}
1530
1531@end itemize
1532
1533
1534@c GRP
1535@c ---------------------------------------------------------------------
1536@c ---------------------------------------------------------------------
1537
1538@page
1539@node grp
1540@section grp - pattern search tool
1541
1542@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski
1544@item @strong{Component category:}      @tab filter
1545@end multitable
1546
1547
@code{grp} selects sentences containing an expression matching a
pattern. The pattern format is exactly the same as that accepted by
@code{ser}.

@code{grp} is intended mainly for speeding up the corpus search process.
It is extremely fast (processing speed is usually higher than the speed
of reading the corpus file from disk).
1555
1556
1557
1558@c @menu
1559@c * ser command line options::   
1560@c * ser pattern::                 
1561@c * ser how ser works::           
1562@c * ser customization::           
1563@c * ser limitations::             
1564@c * ser requirements::           
1565@c @end menu
1566@menu
1567* grp command line options::   
1568* grp pattern::                 
1569* grp hints::   
1570@end menu
1571
1572@node grp command line options
1573@subsection Command line options
1574
1575@table @code
1576
1577@parhelp
1578@parversion
1579@c @parfile
1580@c @paroutput
1581@c @parinputfield
1582@c @paroutputfield
1583@parprocess
1584@parinteractive
1585
1586@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1587The search pattern.
1588
1589@item @b{@minus{}@minus{}morph=@var{field}}
1590The name of the annotation field containing the morphological
1591description (default @code{lem}).
1592
1593@item @b{@minus{}@minus{}command}
1594Only print the generated flex source code.
1595
1596@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.
1599
1600@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from file @var{filename}. This option
makes it possible to extend the set of terms.
1603
1604@end table
1605
1606
1607@node grp pattern
1608@subsection Pattern
1609
The pattern format is the same as for @command{ser} (@pxref{ser pattern}).
1611
1612@node grp hints
1613@subsection Hints
1614
The corpus search speed may be increased by combining @command{grp} with the @command{lzop}
compression tool (@command{grp} usually processes data faster than it is read from a
disk, especially for slow laptop drives).
1618
1619@example
1620cat corpus | tok | sen | lem | grp -a p | lzop -7 > corpus.grp.lzo
1621@end example
1622
1623@example
1624lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR}
1625@end example
1626
1627
1628@c ---------------------------------------------------------------------
1629@c kot
1630@c ---------------------------------------------------------------------
1631@c ---------------------------------------------------------------------
1632
1633@page
1634@node kot
1635@section kot - untokenizer
1636
Authors: Tomasz Obrębski
1638
1639@command{kot} is the opposite of @command{tok}. It changes UTT-formatted text into plain text.
1640
1641@menu
1642* kot command line options::   
1643* kot usage examples::   
1644@end menu
1645
1646@node kot command line options
1647@subsection Command line options
1648
1649@table @code
1650
1651@parhelp
1652
1653@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1654
1655@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1656
1657@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1658
1659@c @item @b{@minus{}@minus{}interactive @minus{}i}
1660
1661@c @item @b{@minus{}@minus{}config=@var{filename}}
1662
1663@item
1664
1665@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
1666print @var{string} between nonadjacent segments of the input file
1667
1668@item @b{@minus{}@minus{}spaces, @minus{}r}
1669retain the special characters @code{_}, @code{\t},
1670@code{\n}, @code{\r}, @code{\f} unexpanded in the output
1671
1672@end table
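
The effect of the two options can be pictured with a small Python sketch. It is a simplified model, not the code of @command{kot}: it assumes that the special sequences are expanded in every form field, that zero-length segments (such as BOS and EOS markers) contribute no text, and that two segments are nonadjacent when the second does not start where the first ends. The names @code{expand} and @code{untokenize} are hypothetical.

```python
# Two-character escape sequences used in form fields and their expansions.
ESCAPES = {"\\t": "\t", "\\n": "\n", "\\r": "\r", "\\f": "\f"}


def expand(form):
    """Expand the special sequences of a form field into real characters."""
    for escape, char in ESCAPES.items():
        form = form.replace(escape, char)
    return form.replace("_", " ")


def untokenize(lines, gap_fill="", keep_escapes=False):
    """Concatenate the forms of UTT segment lines into plain text."""
    text = []
    position = None                      # end offset of the previous segment
    for line in lines:
        start, length, kind, form = line.split(None, 3)
        start, length = int(start), int(length)
        if length == 0:
            continue                     # zero-length markers carry no text
        if position is not None and start != position:
            text.append(gap_fill)        # gap-fill between nonadjacent segments
        text.append(form if keep_escapes else expand(form))
        position = start + length
    return "".join(text)


segments = ["0000 05 W Cześć", "0005 01 P !", "0006 01 S _", "0007 02 W To",
            "0009 01 S _", "0010 02 W ja", "0012 01 P .", "0013 01 S \\n"]
print(untokenize(segments), end="")
```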
1673
1674@node kot usage examples
1675@subsection Usage examples
1676
1677@example
1678cat legia.txt | tok | kot       
1679@end example
1680
1681@example
1682cat legia.txt | tok | lem -1 | kot
1683@end example
1684
1685@c CON............................................................
1686@c ...............................................................
1687@c ...............................................................
1688
1689@page
1690@node con
1691@section con - concordance table generator
1692
1693@command{con} generates a concordance table based on a pattern given to @command{ser}.
1694
1695@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1696@item @strong{Authors:}                 @tab Justyna Walkowska
1697@item @strong{Component category:}      @tab sink
1698@end multitable
1699@c
1700
1701@menu
1702* con command line options::
1703* con usage example::
1704* con hints::   
1705@end menu
1706
1707@node con command line options
1708@subsection Command line options
1709
1710@table @code
1711
1712@parhelp
1713
1714@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
1715@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1716@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1717@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1718@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
1719@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
1720@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
1721@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
1722@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
1723@c @item @b{@minus{}@minus{}interactive @minus{}i}
1724@c @item @b{@minus{}@minus{}config=@var{filename}}
1725@c @item
1726@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1727@c search pattern
1728@c
1729@c @item @b{@minus{}@minus{}flex}
1730@c only print the generated flex source code
1731@c
1732@c @item @b{@minus{}@minus{}macro=@var{filename}}
1733@c read macrodefinitions from file @var{filename} rather than from
1734@c default location. This option allows to redefine the set of terms.
1735@c
1736@c @item @b{@minus{}@minus{}define=@var{filename}}
1737@c append macrodefinitions from file @var{filename}. This option
1738@c allows to extend the set of terms.
1739
1740@item @b{@minus{}@minus{}left @minus{}l}           
1741        Left context info (default='30c'). Example:
1742@example                         
1743                                 -l=5c: left context is 5 characters
1744                                 -l=5w: left context is 5 words
1745                                 -l=5s: left context is 5 non-empty input lines
1746                                 -l='\s*\S+\sr\S+BOS': left context starts with the given regex
1747@end example
1748
1749@item @b{@minus{}@minus{}right @minus{}r}           
1750        Right context info (default='30c').
1751@item @b{@minus{}@minus{}trim @minus{}t}           
1752        Clear incomplete words from output.
1753@item @b{@minus{}@minus{}white @minus{}w}           
        Do not change whitespace characters into spaces.
1755@item @b{@minus{}@minus{}column @minus{}c}           
1756        Left column minimal width in characters (default = 0).
1757@item @b{@minus{}@minus{}ignore @minus{}i}           
1758        Ignore segment inconsistency in the input.
1759@item @b{@minus{}@minus{}bon}           
1760        Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
1761@item @b{@minus{}@minus{}eob}           
1762        End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
1763@item @b{@minus{}@minus{}bod}           
1764        Selected segment beginning display string (default='[').
1765@item @b{@minus{}@minus{}eod}           
1766        Selected segment end display string (default=']').
1767
1768
1769
1770@end table
1771
1772@node con usage example
1773@subsection Usage example
1774@example
cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
1776@end example
1777
1778
1779@node con hints
1780@subsection Hints
1781
1782@command{con} is a rather slow program. Do not pass large amounts of
1783redundant text through this program. @command{con} works fine in the following
1784sequence:
1785
1786@example
1787... | grp -e EXPR | ser -e EXPR | con
1788@end example
1789
1790
1791
1792@c ---------------------------------------------------------------------
1793@c ---------------------------------------------------------------------
1794
1795@page
1796@node Auxiliary tools
1797@chapter Auxiliary tools
1798
1799@menu
1800* compiledic::         dictionary compiler
1801* fla::                UTT file flattener
1802* unfla::              UTT file unflattener
1803@end menu
1804
1805
1806@page
1807@node compiledic
1808@section compiledic - the dictionary compiler
1809
1810@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Michał Stolarski, Tomasz Obrębski
1812@item @strong{Component category:}      @tab additional tool
1813@end multitable
1814@c
1815
1816@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
1817(FSA) format (@code{.bin} extension).
1818
1819Automaton representation of a dictionary is built using the AT&T tools:
1820@itemize
1821@item AT&T FSM Library,
1822@item AT&T Lextools.
1823@end itemize
1824
For @command{compiledic} to work, the packages mentioned above
must be installed in your system.  They are freely available
for non-commercial use.
1828
1829Usage:
1830@example
1831        compiledic <dictionaryname>.dic
1832@end example
1833
1834The file <dictionaryname>.bin will be generated.
1835
Note: The program produces a lot of temporary files, which are
stored in the current directory. They are deleted after successful
termination of the program.
1839
1840@c @menu
1841@c * con command line options::
1842@c * con usage example::
1843@c * con hints::   
1844@c @end menu
1845
1846
1847@page
1848@node fla
1849@section fla - the UTT file flattener
1850
1851@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski
1853@item @strong{Component category:}      @tab filter
1854@end multitable
1855@c
1856
@command{fla} ``flattens'' a UTT file by merging the segments belonging
to one sentence into one line. Technically, end-of-line characters
('\n', ASCII code 10) are replaced with form-feed characters ('\f',
ASCII code 12).  The flattening makes it possible to process UTT files
with such tools as @command{grep} or @command{sed} sentence by
sentence (used in @command{grp} and @command{mar}).
1863
Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.

Flattened files are still human-readable.
1867
1868Usage:
1869
1870@example
1871        fla [<bosregex>]
1872@end example
1873
The optional argument is a regular expression describing segments
which should be treated as sentence beginnings (the test is: the
segment contains a fragment matching the @code{<bosregex>}). By
default, segments containing a field @code{BOS} are sought.
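
The transformation can be sketched in a few lines of Python. This is an illustration of the description above rather than the source of @command{fla}: the function name @code{flatten} is hypothetical, and the default expression @code{BOS} is a simplification of ``segments containing a field BOS''.

```python
import re


def flatten(text, bos_regex="BOS"):
    """Put every sentence on one line, joining its segments with form feeds."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()                       # the text ended with a newline
    sentences = []
    for line in lines:
        if not sentences or re.search(bos_regex, line):
            sentences.append([line])      # this segment opens a new sentence
        else:
            sentences[-1].append(line)    # this segment continues the sentence
    return "".join("\f".join(sentence) + "\n" for sentence in sentences)


utt = ("0000 00 BOS *\n0000 05 W Cześć\n0005 01 P !\n0006 00 EOS *\n"
       "0006 00 BOS *\n0006 01 S _\n0007 02 W To\n")
print(repr(flatten(utt)))
```

After flattening, each line of the result holds exactly one sentence, which is what makes line-oriented tools usable on it.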
1878@c @menu
1879@c * con command line options::
1880@c * con usage example::
1881@c * con hints::   
1882@c @end menu
1883
1884
1885
1886@page
1887@node unfla
1888@section unfla - the UTT file unflattener
1889
1890@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski
1892@item @strong{Component category:}      @tab filter
1893@end multitable
1894
1895@command{unfla} transforms a flattened UTT file, produced by
1896@command{fla}, into the regular format by restoring end-of-line
1897characters.
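
A minimal Python sketch of the reverse step, together with a @command{grep}-like selection of whole sentences from a flattened text, is shown below. The function names are hypothetical and the flattened input is assumed to use form-feed characters inside sentences and end-of-line characters between them, as described for @command{fla}.

```python
import re


def grep_sentences(flat_text, pattern):
    """Keep the flattened lines (one sentence each) that match `pattern`."""
    lines = flat_text.split("\n")
    return "".join(line + "\n" for line in lines
                   if line and re.search(pattern, line))


def unflatten(flat_text):
    """Restore the regular format: one segment per line."""
    return flat_text.replace("\f", "\n")


flat = "0000 00 BOS *\f0000 02 W To\n0003 00 BOS *\f0003 02 W ja\n"
print(unflatten(grep_sentences(flat, r" W ja")), end="")
```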
1898
1899
1900
1901
1902@c ---------------------------------------------------------------------
1903@c USAGE EXAMPLES
1904@c ---------------------------------------------------------------------
1905
1906@node Usage examples
1907@chapter Usage examples
1908
1909@subsubheading Simple pipelines
1910
1911@enumerate
1912
@item tokenization

@example
cat text | tok > output1
@end example

@item morphological annotation (1)

simple dictionary-based lemmatization

@example
cat text | tok | lem > output1
@end example
1922
1923@item morphological annotation (2)
1924
1) perform dictionary-based lemmatization
2) guess descriptions for words which have no annotation
1927
1928@example
1929cat text | tok | lem | gue -S lem > output2
1930@end example
1931
1932@item morphological annotation (3)
1933
19341) perform dictionary-based lemmatization
19352) try to correct words with no annotation
19363) perform dictionary-based lemmatization of corrected words
19374) guess descriptions for words which still have no annotation
1938
1939@example
1940cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
1941@end example
1942@item spelling correction
1943
1944
1945
1946@example
1947cat text | tok | lem --only-fail | cor -1 > output3
1948@end example
1949
1950@item Expression extraction
1951
1952Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
1953
1954@example
1955cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
1956@end example
1957
1958@item A word in context
1959
Extraction of text fragments containing a form of the lexeme 'rozmowa' in
the context of the 5 preceding and 5 following corpus segments.
1962
1963@example
1964cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
1965@end example
1966
1967@item generation of concordance table (1)
1968
1969@example
1970cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
1971@end example
1972
Running time: 10 seconds.
1974
1975@item generation of concordance table (2)
1976
1977The same as above but much faster
1978
1979@example
1980cat text | tok | lem -1 | \
1981grp -e 'cat(<V>) space lexeme(rozmowa)' | \
1982ser -e 'cat(<V>) space lexeme(rozmowa)' | \
1983con
1984@end example
1985
Running time: 2 seconds.
1987
1988@item generation of concordance table (3)
1989
Usually, one searches the same corpus repeatedly. In
such a case it is advisable to transform the corpus data into the format
required by @command{grp} first, and then use the preprocessed data.

As @command{grp} (@command{grep}) processes data faster than it is
read from the disk drive, the search time may be shortened further by
using file compression techniques.  We suggest using @command{lzop}.
1997
1998@item the fastest way to search a large corpus
1999
2000step 1: preprocessing
2001
2002@example
2003cat corpus | tok | sen | lem -1 \
2004| grp -a p | lzop -7 > corpus.grp.lzo
2005@end example
2006
2007step 2: search
2008
2009@example
lzop -cd corpus.grp.lzo \
| grp -a gP -e 'cat(<V>) space lexeme(rozmowa)' \
| ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2012@end example
2013
2014@end enumerate
2015
2016@subsubheading More complicated configurations
2017
2018
2019@example
2020mknod fifo1 p
2021mknod fifo2 p
2022mknod fifo3 p
2023mknod fifo4 p
2024mknod fifo5 p
2025
2026tok | lem -p W -e fifo1 > fifo2 &
2027cor -e fifo3 < fifo1 | lem > fifo4 &
2028gue < fifo3 > fifo5 &
2029sort -m fifo2 fifo4 fifo5
2030
2031rm fifo?
2032@end example
2033
2034
2035@c ---------------------------------------------------------------------
2036@c ---------------------------------------------------------------------
2037
2038@c ---------------------------------------------------------------------
2039@c PMDBF DICTIONARY
2040@c ---------------------------------------------------------------------
2041
2042@node PMDBF dictionary
2043@chapter PMDBF dictionary
2044
UTT components come with lexical data derived from the Polish
Morphological Database (PMDB).
2047
2048@menu
2049* PMDBF files::   
2050* PMDBF tag structure::                 
2051* PMDBF parts of speech::           
2052* PMDBF morphosyntactic attributes::           
2053@end menu
2054
2055@node PMDBF files
2056@section Files
2057
2058@node PMDBF tag structure
2059@section Tag structure
2060
@example
pos   = [[:upper:]]+
attr  = [[:upper:]]+
val   = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
descr = pos ( / ( attr val + ) + ) ?
@end example
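
The grammar above translates almost literally into a regular expression. The Python sketch below is one such translation, offered as an illustration: the POSIX classes @code{[:upper:]}, @code{[:lower:]} and @code{[:digit:]} are approximated by the ASCII ranges @code{A-Z}, @code{a-z} and @code{0-9}, and the name @code{is_valid_descr} is hypothetical.

```python
import re

POS = r"[A-Z]+"                            # pos  = [[:upper:]]+
ATTR = r"[A-Z]+"                           # attr = [[:upper:]]+
VAL = r"(?:[a-z0-9?!*+-]|<[^>\n]+>)"       # val  = one character or <...>
# descr = pos ( / ( attr val + ) + ) ?
DESCR = re.compile(rf"{POS}(?:/(?:{ATTR}{VAL}+)+)?")


def is_valid_descr(descr):
    """True if the whole string is a well-formed description."""
    return DESCR.fullmatch(descr) is not None


for tag in ("N/GaNsCn", "ADJ/DsNpCnGaifn", "N", "N/", "n/GaNsCn"):
    print(tag, is_valid_descr(tag))
```

The check is purely syntactic: it accepts any uppercase string as a part of speech and does not verify that attributes and values belong to the inventories listed in the following sections.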
2068
@node PMDBF parts of speech
@section Parts of speech

@multitable {ADJPRP} { adjectival-passive-participle }
@item @code{N} @tab noun
@item @code{NPRO} @tab nominal-pronoun
@item @code{NV} @tab deverbal-noun
@item @code{V} @tab verb
@item @code{BYC} @tab byc (verb ``to be'')
@item @code{VNI} @tab non-inflected-verb
@item @code{ADJ} @tab adjective
@item @code{ADJPAP} @tab adjectival-passive-participle
@item @code{ADJPRP} @tab adjectival-present-participle
@item @code{ADJPP} @tab adjectival-past-participle
@item @code{ADJPRO} @tab adjectival-pronoun
@item @code{ADJNUM} @tab adjectival-numeral
@item @code{ADV} @tab adverb
@item @code{ADVANP} @tab adverbial-anterior-participle
@item @code{ADVPRP} @tab adverbial-present-participle
@item @code{ADVPRO} @tab adverbial-pronoun
@item @code{ADVNUM} @tab adverbial-numeral
@item @code{P} @tab preposition
@item @code{PPRO} @tab prep-noun-pronoun
@item @code{CONJ} @tab conjunction
@item @code{EXCL} @tab exclamation
@item @code{APP} @tab call
@item @code{ONO} @tab onomatopoeia
@item @code{PART} @tab particle
@item @code{NUMCRD} @tab cardinal-numeral
@item @code{NUMCOL} @tab collective-numeral
@item @code{NUMPAR} @tab partitive-numeral
@item @code{NUMORD} @tab ordinal-numeral
@end multitable

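Because the part-of-speech symbol is everything before the slash, it
can be extracted with standard text tools.  The following shell sketch
tallies the parts of speech in a list of descriptions.  It assumes one
description per line on standard input; the function name
@code{count_pos} and the sample descriptions are made up for this
illustration and are not part of UTT.

```shell
# count_pos: read one description per line on standard input and print
# "POS count" pairs, most frequent first (ties in alphabetical order).
count_pos() {
    cut -d/ -f1 |                 # keep only the part before the slash
        sort | uniq -c |          # count identical symbols
        sort -k1,1nr -k2,2 |      # highest count first
        awk '{ print $2, $1 }'    # drop the padding added by uniq -c
}

# Hypothetical input: two nouns, one verb, one preposition.
printf '%s\n' 'N/GfNsCn' 'V/AiVpMdTrP3Ns' 'N/GpNpCa' 'P' | count_pos
```

A description without a slash, such as @code{P} in the sample input,
is passed through unchanged by @code{cut}, so bare part-of-speech
symbols are counted as well.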
@node PMDBF morphosyntactic attributes
@section Morphosyntactic attributes

@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@c @headitem Attr @tab Val @tab Description
@item
@code{A} @tab @tab Aspect
@item
@tab @code{p} @tab perfective,
@item
@tab @code{i} @tab imperfective.
@item
@item
@code{V} @tab @tab Verb-Form
@item
@tab @code{b} @tab infinitive,
@item
@tab @code{p} @tab personal,
@item
@tab @code{i} @tab impersonal.
@item
@item
@code{M} @tab @tab Mood
@item
@tab @code{d} @tab declarative,
@item
@tab @code{c} @tab conditional,
@item
@tab @code{i} @tab imperative.
@item
@item
@code{T} @tab @tab Tense
@item
@tab @code{a} @tab past,
@item
@tab @code{r} @tab present,
@item
@tab @code{f} @tab future.
@item
@item
@code{P} @tab @tab Person
@item
@tab @code{1} @tab 1,
@item
@tab @code{2} @tab 2,
@item
@tab @code{3} @tab 3.
@item
@item
@code{D} @tab @tab Degree
@item
@tab @code{p} @tab positive,
@item
@tab @code{c} @tab comparative,
@item
@tab @code{s} @tab superlative.
@item
@item
@code{N} @tab @tab Number
@item
@tab @code{s} @tab singular,
@item
@tab @code{p} @tab plural.
@item
@item
@code{C} @tab @tab Case
@item
@tab @code{n} @tab nominative,
@item
@tab @code{g} @tab genitive,
@item
@tab @code{d} @tab dative,
@item
@tab @code{a} @tab accusative,
@item
@tab @code{i} @tab instrumental,
@item
@tab @code{l} @tab locative,
@item
@tab @code{v} @tab vocative.
@item
@item
@code{G} @tab @tab Gender
@item
@tab @code{p} @tab masculine-personal,
@item
@tab @code{a} @tab masculine-animate,
@item
@tab @code{i} @tab masculine-inanimate,
@item
@tab @code{f} @tab feminine,
@item
@tab @code{n} @tab neuter.
@end multitable

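As an illustration of how the attribute symbols in the table above
combine with the tag grammar, the following shell sketch splits a
description into its part of speech and its attribute groups and prints
the full attribute names.  The function names @code{attr_name} and
@code{explain_descr} and the sample description are made up for this
illustration and are not part of UTT.  Values are printed as they
appear in the description.

```shell
# attr_name CODE: full name of a one-letter attribute symbol,
# as listed in the table of morphosyntactic attributes.
attr_name() {
    case "$1" in
        A) echo Aspect ;;
        V) echo Verb-Form ;;
        M) echo Mood ;;
        T) echo Tense ;;
        P) echo Person ;;
        D) echo Degree ;;
        N) echo Number ;;
        C) echo Case ;;
        G) echo Gender ;;
        *) echo unknown ;;
    esac
}

# explain_descr DESCR: print "pos=..." followed by one
# "Attribute=values" line per attribute group after the slash.
explain_descr() {
    echo "pos=${1%%/*}"
    case "$1" in
        */*) ;;             # there is an attribute part: continue
        *) return 0 ;;      # bare part of speech: nothing more to print
    esac
    printf '%s\n' "${1#*/}" |
        grep -oE '[[:upper:]]+([[:lower:][:digit:]?!*+-]|<[^>]+>)+' |
        while IFS= read -r group; do
            # leading upper-case letters are the attribute symbol
            code=$(printf '%s\n' "$group" | sed -E 's/^([[:upper:]]+).*/\1/')
            vals=${group#"$code"}
            echo "$(attr_name "$code")=$vals"
        done
}

# Hypothetical description of a feminine singular nominative noun.
explain_descr 'N/GfNsCn'
```

For the sample description the sketch prints @code{pos=N},
@code{Gender=f}, @code{Number=s} and @code{Case=n}, one per line.  The
value letters could be expanded in the same way with a second
@code{case} statement keyed on the attribute and the value.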

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------
@c
@c @node Examples
@c @chapter Examples

@c ----------------------------------------------------------------------
@c ----------------------------------------------------------------------

@node    GNU Free Documentation License
@chapter GNU Free Documentation License

@c The GNU Free Documentation License.
@center Version 1.2, November 2002

@c This file is intended to be included within another document,
@c hence no sectioning command or @node.

@display
Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@enumerate 0
@item
PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document @dfn{free} in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of ``copyleft'', which means that derivative
works of the document must themselves be free in the same sense.  It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does.  But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.  We recommend this License
principally for works whose purpose is instruction or reference.

@item
APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License.  Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein.  The ``Document'', below,
refers to any such manual or work.  Any member of the public is a
licensee, and is addressed as ``you''.  You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.

A ``Modified Version'' of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A ``Secondary Section'' is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject.  (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.)  The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The ``Invariant Sections'' are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.  If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant.  The Document may contain zero
Invariant Sections.  If the Document does not identify any Invariant
Sections then there are none.

The ``Cover Texts'' are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.  A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.

A ``Transparent'' copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters.  A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text.  A copy that is not ``Transparent'' is called ``Opaque''.

Examples of suitable formats for Transparent copies include plain
@sc{ascii} without markup, Texinfo input format, La@TeX{} input
format, @acronym{SGML} or @acronym{XML} using a publicly available
@acronym{DTD}, and standard-conforming simple @acronym{HTML},
PostScript or @acronym{PDF} designed for human modification.  Examples
of transparent image formats include @acronym{PNG}, @acronym{XCF} and
@acronym{JPG}.  Opaque formats include proprietary formats that can be
read and edited only by proprietary word processors, @acronym{SGML} or
@acronym{XML} for which the @acronym{DTD} and/or processing tools are
not generally available, and the machine-generated @acronym{HTML},
PostScript or @acronym{PDF} produced by some word processors for
output purposes only.

The ``Title Page'' means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, ``Title Page'' means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section ``Entitled XYZ'' means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language.  (Here XYZ stands for a
specific section name mentioned below, such as ``Acknowledgements'',
``Dedications'', ``Endorsements'', or ``History''.)  To ``Preserve the Title''
of such a section when you modify the Document means that it remains a
section ``Entitled XYZ'' according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document.  These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.

@item
VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License.  You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.

@item
COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.

@item
MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it.  In addition, you must do these things in the Modified Version:

@enumerate A
@item
Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document).  You may use the same title as a previous version
if the original publisher of that version gives permission.

@item
List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.

@item
State on the Title page the name of the publisher of the
Modified Version, as the publisher.

@item
Preserve all the copyright notices of the Document.

@item
Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

@item
Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.

@item
Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.

@item
Include an unaltered copy of this License.

@item
Preserve the section Entitled ``History'', Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page.  If
there is no section Entitled ``History'' in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.

@item
Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on.  These may be placed in the ``History'' section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.

@item
For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.

@item
Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles.  Section numbers
or the equivalent are not considered part of the section titles.

@item
Delete any section Entitled ``Endorsements''.  Such a section
may not be included in the Modified Version.

@item
Do not retitle any existing section to be Entitled ``Endorsements'' or
to conflict in title with any Invariant Section.

@item
Preserve any Warranty Disclaimers.
@end enumerate

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.  To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled ``Endorsements'', provided it contains
nothing but endorsements of your Modified Version by various
parties---for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.  Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity.  If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.

@item
COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy.  If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled ``History''
in the various original documents, forming one section Entitled
``History''; likewise combine any sections Entitled ``Acknowledgements'',
and any sections Entitled ``Dedications''.  You must delete all
sections Entitled ``Endorsements.''

@item
COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.

@item
AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

@item
TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections.  You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers.  In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

@item
TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License.  Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License.  However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See
@uref{http://www.gnu.org/copyleft/}.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation.  If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.
@end enumerate

@page
@heading ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
  Copyright (C)  @var{year}  @var{your name}.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
  Texts.  A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with@dots{}Texts.'' line with this:

@smallexample
@group
    with the Invariant Sections being @var{list their titles}, with
    the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
    being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node    Reporting bugs
@chapter Reporting bugs

Report bugs to <obrebski@@amu.edu.pl>.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c @node    Copyright
@c @chapter Copyright
@c
@c Copyright 2004 by Tomasz Obrebski
@c This software is free for research and educational use.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node    Author
@chapter Author


@bye