2\input texinfo   @c -*-texinfo-*-
3@c @documentencoding ISO-8859-2
4@documentencoding UTF-8
5@c @documentlanguage pl
6
7@c %**start of header
8@setfilename utt.info
9@settitle UAM Text Tools v0.90
10@c %**end of header
11
12@copying
13This manual is for UAM Text Tools (version 0.90, October, 2008)
14
15Copyright @copyright{}  2005, 2007  Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
16
17Permission is granted to copy, distribute and/or modify this document
18under the terms of the GNU Free Documentation License, Version 1.2 or
19any later version published by the Free Software Foundation; with no
20Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.  A
copy of the license is included in the section entitled ``GNU Free
Documentation License''.
23
24@c @quotation
25@c Permission is granted to ...
26@c No permission is granted until the document is completed.
27@c @end quotation
28@end copying
29
30
31@titlepage
32@title UAM Text Tools 0.90 - User Manual
33@subtitle edition 0.01, @today
34@subtitle status: prescript
35@author by Justyna Walkowska, Tomasz Obrębski and Michał Stolarski
36@page
37@vskip 0pt plus 1filll
38@insertcopying
39@end titlepage
40
41@contents
42
43@c @paragraphindent none
44
45@iftex
46@tex
47% \usepackage[T1]{fontenc}
48% \usepackage[utf8]{inputenc}
49% \usepackage{times}
50@end tex
51
52@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
53@end iftex
54@c @headings off
55@c @everyheading LEM(1) @| @| LEM(1)
56@everyfooting @today @c @| @thispage @|
57
58@ifnottex
59
60@node Top
61@top UTT - UAM Text Tools
62
63@insertcopying
64
65@menu
66* General information::                       
67* UTT file format::             
68* Configuration files::         
69* UTT components::
70* Auxiliary tools::
71* Usage examples::             
72* PMDBF dictionary::           
73@c * Examples::                   
74@c * Copyright::
75* GNU Free Documentation License::
76* Reporting bugs::                                   
77* Author::                     
78@end menu
79@end ifnottex
80
81
82@c ----------------------------------------------------------------------
83
84@node General information
85@chapter General information
86
87UAM Text Tools (UTT) is a package of language processing tools
88developed at Adam Mickiewicz University. Its functionality includes:
89
90@itemize @bullet
91
92@item
tokenization
94@item
95dictionary-based morphological analysis
96@item
97heuristic morphological analysis of unknown words
98@item
spelling correction
100@item
101pattern search
102@item
103sentence splitting
104@item
105generation of concordance tables
106@end itemize
107
The toolkit is intended for processing raw (unannotated),
unrestricted text for any purpose.

The system is organized as a collection of command-line programs, each
performing one operation, e.g. tokenization, lemmatization, or spelling
correction. The components are independent of one another; the
unifying element is the uniform input/output file format.

The components may be combined in various ways to provide various text
processing services. New components supplied by the user may also be
easily incorporated into the system, provided that they respect the
input/output file format conventions.

UTT component programs do not depend on any specific tagset or
morphological description format.
123
124UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
125the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
126
127The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use. 
128
129
130List of contributors:
131
132@itemize
@item Paweł Konieczka
134@item Tomasz Obrębski
135@item Michał Stolarski
136@item Marcin Walas
137@item Justyna Walkowska
138@item Paweł Wereński
139@end itemize
140
141@c ----------------------------------------------------------------------
142@c ---------------------------------------------------------------------
143
144@node    UTT file format
145@chapter UTT file format
146
147A UTT file contains annotation of a text. It consists of a sequence of
148segments. Each segment explicitly refers to a continuous piece of the
149text and provides some information on it.
150
151@section Segment format
152
153A segment occupies one line of a UTT file and consists of
154space-separated fields:
155
156
157@quotation
158@sp 1
159[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
160@sp 1
161@end quotation
162
163@table @var
164
165@item @var{start}
166Non-negative integer value indicating the position in the source text where the
167segment starts.
168
169@item @var{length}
170Non-negative integer value indicating the length of the segment.
171
172@item @var{type}
A sequence of non-blank characters that does not consist solely of digits (otherwise @var{type} could be misinterpreted as a @var{start} or @var{length} field).
@var{type} reflects the main classification of segments
into words, numbers, punctuation marks, and meta-text markers.
@xref{tok output,,tok output}, for a description of the automatically recognized type markers.
177
178@item @var{form}
This field contains the textual form of the segment or the special
symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
181
182The characters or character sequences that have special meaning in the
183@var{form} field are enumerated below.
184
185Characters with special meaning:
186
187@itemize
188@item @code{_} - space character
189@item @code{*} - undefined contents
190@end itemize
191
192Escape sequences:
193
194@itemize
195@item @code{\n} - new line
196@item @code{\t} - tabulation
197@item @code{\r} - carriage return 
198
199@item @code{\_} - the @code{_} character
200@item @code{\*} - the @code{*} character
201@item @code{\\} - the @code{\} character
202
203@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
204@end itemize
205
206@item @var{annotation1}
207@item @var{annotation2}
208@item ...
209Annotation fields have the following format:
210
211@var{longname} @code{:} @var{value}
212
213or
214
215@var{shortname} @var{value}
216
217where @var{longname} is a string of alphanumeric characters
218(isalnum() test), @var{shortname} - a single non-alphanumeric character
219(ispunct() test), and @var{value} is an arbitrary string of non-blank characters.
220
221@end table
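
The special characters and escape sequences listed for the @var{form}
field can be decoded mechanically. The following Python sketch is an
illustration only and is not part of UTT: the function name
@code{decode_form} is hypothetical, and both returning @code{None} for an
undefined form and keeping unknown escapes verbatim are assumptions.

```python
import re

# Escape sequences from the list above, mapped to the characters they denote.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "_": "_", "*": "*", "\\": "\\"}


def decode_form(field):
    """Return the text denoted by a form field, or None for the bare '*'."""
    if field == "*":
        return None  # undefined contents

    def replace(match):
        token = match.group(0)
        if token == "_":
            return " "  # an unescaped underscore stands for a space
        # Unknown escapes are kept verbatim (an assumption of this sketch).
        return _ESCAPES.get(token[1], token)

    # Match either a backslash followed by one character, or a bare underscore.
    return re.sub(r"\\.|_", replace, field)
```

Scanning left to right with a single regular expression keeps an escaped
backslash from being read as the start of the next escape sequence.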
222
223
Only two fields are mandatory: @var{type} and @var{form}. All other fields
may be absent. When only one number precedes the
@var{type} field, it is interpreted as the @var{start} position.

If the @var{length} field is omitted, the length of the segment is the
length of the @var{form} field, except when the value of the
@var{form} field is @code{*}, in which case the length is assumed to
be 0.
232
233If the @var{start} field is also absent, the segment is assumed to directly
234follow the preceding one.
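
The rules above are enough to split a segment line into its fields. The
Python sketch below is a minimal illustration, not the parser used by the
UTT programs. The name @code{parse_segment} and the returned dictionary
layout are hypothetical, and counting each escape sequence as one
character when the length is omitted is an assumption.

```python
import re


def parse_segment(line):
    """Split one UTT segment line into its fields.

    Returns a dict with keys start (int or None when omitted), length,
    type, form and annotations (a list of raw annotation strings).
    """
    fields = line.split()
    numbers = []
    # At most two leading all-digit fields are start and length.
    while fields and len(numbers) < 2 and fields[0].isdigit():
        numbers.append(int(fields.pop(0)))
    if len(fields) < 2:
        raise ValueError("type and form are mandatory: %r" % line)
    start = numbers[0] if numbers else None
    length = numbers[1] if len(numbers) == 2 else None
    seg_type, form = fields[0], fields[1]
    if length is None:
        # Omitted length: 0 for the undefined form '*', otherwise the
        # length of the form with each escape sequence counted once.
        length = 0 if form == "*" else len(re.sub(r"\\.", "x", form))
    return {"start": start, "length": length, "type": seg_type,
            "form": form, "annotations": fields[2:]}
```

A single leading number is treated as @var{start}, as stated above, so
@code{0000 N 12} is read as a number token @code{12} at position 0.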
235
236@c Conventions:
237
238@c Annotation fields with predefined meaning:
239
240@c @itemize
241@c @item @code{!} - UTT components are allowed to modify the contents of
242@c the @var{form} field (e.g. spelling correction does this). If this happens the
243@c original form of the segment have to be placed in the @code{!}-field.
244@c @item @code{@@} - morphological description
245@c @item @code{=} - node identifier assignment (used in graph encoding)
246@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
247@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
248@c @end itemize
249
250Segments of length 0 may be used to mark file positions with some
251information. See e.g. BOS and EOS (beginning/end of sentence) markers
252in the example below.
253
254Example:
255
Text: @samp{Piszemy dobre progrumy. Warszawiacy też.}
257
@example
0000 00 BOS *
0000 07 W Piszemy lem:pisać,V
0007 01 S _
0008 05 W dobre lem:dobry,ADJ
0013 01 S _
0014 08 W progrumy cor:programy lem:program,N
0022 01 P .
0023 00 EOS *
0023 01 S _
0024 00 BOS *
0024 11 W Warszawiacy lem:Warszawiak,N
0035 01 S _
0036 03 W też
0039 01 P .
0040 00 EOS *
@end example
276
@example
0000 BOS *
0000 W Piszemy lem:pisać,V
0007 S _
0008 W dobre lem:dobry,ADJ
0013 S _
0014 W progrumy cor:programy lem:program,N
0022 P .
0023 EOS *
@end example
287
Position information may be provided for only some of the segments:

@example
0000 BOS *
W Piszemy lem:pisać,V
S _
W dobre lem:dobry,ADJ
S _
W progrumy cor:programy lem:program,N
P .
EOS *
S _
0024 BOS *
W Warszawiacy lem:Warszawiak,N
S _
W też
P .
EOS *
@end example
307
308Position/length information may be provided only when necessary:
309
@example
0000 04 N *
0000 N 12
P .
N 5
S _
W km
@end example
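
Omitted start positions can be restored in a single pass over the file.
The Python sketch below illustrates this under two assumptions that the
text does not state explicitly: a segment that ``directly follows'' the
preceding one starts at that segment's start plus its length, and a first
segment without a @var{start} field starts at position 0. The name
@code{resolve_positions} is hypothetical.

```python
def resolve_positions(segments):
    """Fill in omitted start positions.

    segments is a list of (start, length) pairs in file order, where start
    is None when the field is omitted and length is already known.
    """
    resolved = []
    next_start = 0  # assumed position of a first segment that omits start
    for start, length in segments:
        if start is None:
            start = next_start  # directly follows the preceding segment
        resolved.append((start, length))
        next_start = start + length
    return resolved
```

Applied to the last example, the segments @code{P .}, @code{N 5},
@code{S _} and @code{W km} receive the start positions 2, 3, 4 and 5.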
318
319@section UTT File
320
A UTT file consists of a sequence of segments.  The same text position
may be covered by multiple segments. In consequence, ambiguous text
segmentation and ambiguous annotation may be represented.
324
325There are two structural requirements a valid UTT-formatted file
326has to meet:
327
328@itemize @bullet
329
330@item
segments have to be sorted with respect to the @var{start} field,
332
333@item
334for each
335segment ending at position @var{n}, either there must be a segment starting at
336position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
337for each segment starting at position @var{n}, either there must be a segment
338ending at position @var{n-1}, or the position @var{n-1} must not be covered
339by any segment.
340
341@end itemize
342
A valid annotation for the text fragment
@example
12.5 km
@end example

may be

@example
0000 02 N 12
0000 04 N 12.5
0002 01 P .
0003 01 N 5
0004 01 S _
0005 02 W km
@end example

but not

@example
0000 02 N 12
0000 04 N 12.5
0004 01 S _
0005 02 W km
@end example

because in the latter example the first segment (starting at position
0000, 2 characters long) ends at position @var{n}=0001, position
@var{n+1}=0002 is covered by the second segment, and yet no segment
starts at position 0002.
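
The two structural requirements can be checked mechanically. The Python
sketch below is an illustration, not a tool shipped with UTT. The name
@code{is_valid_utt} is hypothetical, and ignoring zero-length marker
segments when deciding which positions are covered is an assumption.

```python
def is_valid_utt(segments):
    """Check the two structural requirements on (start, length) pairs."""
    starts = [start for start, _ in segments]
    if starts != sorted(starts):
        return False  # segments must be sorted by start position

    # Only segments of non-zero length cover text positions.
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    begin_at = {first for first, _ in spans}
    end_at = {last for _, last in spans}

    def covered(pos):
        return any(first <= pos <= last for first, last in spans)

    for first, last in spans:
        # After a segment ends, a covered position must begin a segment.
        if covered(last + 1) and last + 1 not in begin_at:
            return False
        # Before a segment starts, a covered position must end a segment.
        if covered(first - 1) and first - 1 not in end_at:
            return False
    return True
```

For the two annotations of @samp{12.5 km} shown above, the function
accepts the first one and rejects the second.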
372
373
374@section Flattened UTT file
375
The UTT file format has two variants: regular and flattened. The regular
format was described above.  In the flattened format some of the
end-of-line characters are replaced with line-feed characters.

The flattened format is mainly used to represent whole sentences as
single lines of the input file (all intrasentential end-of-line
characters are replaced with line-feed characters).

This technical trick makes it possible to perform certain text
processing operations on entire sentences with tools such as
@command{grep} (see the @command{grp} component) or @command{sed} (see the @command{mar} component).
387
388The conversion between the two formats is performed by the tools:
389@command{fla} and @command{unfla}.
390
391@section Character encoding
392
The UTT component programs accept only single-byte character encodings, such
as the ISO, ANSI, and DOS character sets.
395
396
397@c @section Formats
398
399@c @unnumberedsubsubsec Basic format
400
401@c While processing large amounts of the overhead related with explicit
402@c ... of the start position and segment length becomes ... . Therefore,
403@c for efficiency reasons certain shortcuts are possible:
404
405@c @unnumberedsubsubsec Relative start position
406
407@c Start position may be given as relative distance from the last
408@c absolut position.
409
410@c @unnumberedsubsubsec Absent length
411
412@c Segment length may by omitted. Normally it can be restored by counting
413@c the length of the @emph{form field}. For segments with the special value
414@c @code{*} in the @emph{form field} length 0 is assumed.
415
416@c @unnumberedsubsubsec Absent length and start position
417
418@c Both start position and segment length may be omitted. In this format
419@c each segment is assumed to follow the previous one. This format is,
420@c therefore, suitable only for unambiguously tagged text
421@c (0-length markers can be still used.)
422
423
424@c @table @code
425@c @item AL
426@c @code{1234 03 W kot}
427@c @item RL
428@c @code{+56 03 W kot}
429@c @item A
430@c @code{1234 W kot}
431@c @item R
432@c @code{+56 W kot}
433@c @item 0
434@c @code{W kot}
435@c @end table
436
437
@c [HOW TO GET POLISH FONTS IN DVI???]
439
440@macro parhelp
441@item @b{@minus{}@minus{}help}, @b{@minus{}h}
442Print help.
443@end macro
444
445
446@macro parversion
447@item @b{@minus{}@minus{}version}, @b{@minus{}V}
448Print version information.
449@end macro
450
451@macro parinteractive
452@item @b{@minus{}@minus{}interactive, @minus{}i}
453This option toggles interactive mode, which is by default off. In the
454interactive mode the program does not buffer the output.
455@end macro
456
457
458@c @macro parfile
459@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
460@c Input file name.
461@c If this option is absent or equal to '@minus{}', the program
462@c reads from the standard input.
463@c @end macro
464
465
466@c @macro paroutput
467@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
468@c Regular output file name. To regular output the program sends segments
469@c which it successfully processed and copies those which were not
470@c subject to processing. If this option is absent or equal to
471@c '@minus{}', standard output is used.
472@c @end macro
473
474@c @macro parfail
475@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
476@c Fail output file name. To fail output the program copies the segments
477@c it failed to process.  If this option is absent or equal to
478@c '@minus{}', standard output is used.
479@c @end macro
480
481
482@c @macro parcopy
483@c @item @b{@minus{}@minus{}copy, @minus{}c}
484@c Copy succesfully processed segments to regular output also in their
485@c original input form.
486@c @end macro
487
488
489@macro parinputfield
490@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
491The field containing the input to the program. The default is the
@var{form} field. The fields @var{start}, @var{length}, @var{type},
493and @var{form} are referred to as @code{1}, @code{2}, @code{3},
494@code{4}, respectively.
495@end macro
496
497
498@macro paroutputfield
499@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
500The name of the field added by the program. The default is the name of the program.
501@end macro
502
503
504@macro pardictionary
505@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
506Dictionary file name.
507@end macro
508
509
510@macro parprocess
511@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
512Process segments with the specified value in the @var{type} field.
Multiple occurrences of this option are allowed and are interpreted as
514disjunction. If this option is absent, all segments are processed.
515@end macro
516
517
518@macro parselect
519@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
520Select for processing only segments in which the field named
@var{fieldname} is present. Multiple occurrences of this option are
522allowed and are interpreted as conjunction of conditions. If this
523option is absent, all segments are processed.
524@end macro
525
526
527@macro parunselect
528@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
529Select for processing only segments in which the field @var{fieldname}
is absent.  Multiple occurrences of this option are allowed and are
531interpreted as conjunction of conditions. If this option is absent,
532all segments are processed.
533@end macro
534
535
536@macro paroneline
537@item @b{@minus{}@minus{}one-line}
This option makes the program print ambiguous annotation in one output
line by generating multiple annotation fields. By default, when
ambiguous annotation is produced for a segment, the segment is
duplicated and each of the annotations is added to a separate copy of
the segment.
543@end macro
544
545
546@macro paronefield
547@item @b{@minus{}@minus{}one-field, @minus{}1}
This option makes the program print ambiguous annotation in one
annotation field. By default, when ambiguous annotation is produced
for a segment, the segment is duplicated and each of the
annotations is added to a separate copy of the segment.
552
553This option is useful when working with @command{kot} or @command{con}.
554@end macro
555
556
557@c ---------------------------------------------------------------------
558@c CONFIGURATION FILES
559@c ---------------------------------------------------------------------
560
561@node    Configuration files
562@chapter Configuration files
563
Values for all command line options accepted by a component
may be set in configuration files. The default locations of the
configuration files for a component named @command{@var{program}} are
567
568@example
569        @file{/usr/local/etc/utt/@var{program}.conf}
570@end example
571
for the system-wide configuration file and
573
574@example
575        @file{~/.utt/@var{program}.conf}
576@end example
577
for the user configuration file.
579
580@c The configuration file to load may be also specified with the
581@c @option{--config} option. Configuration file need not be provided.
582
583For each option, the value is set according to the following priority:
584
585@itemize
586@item command line
587@c @item configuration file indicated with @option{--config} option
588@item user configuration file (or configuration file indicated with the @option{--config} option)
589@item system-wide configuration file
590@end itemize
591
592Parameter values are specified in the following format:
593
594@var{parametername}=@var{value}
595
596where @var{parametername} is the short or long name of an option accepted by
597the program, or
598
599@var{parametername}
600
601if the option does not need arguments.
602
603You can introduce comments to configuration files using the # sign.
604
If a program accepts multiple occurrences of an option (e.g. the @option{--select} option of @command{lem}), you can specify them on separate lines of the program's configuration file.
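
The file format and the priority order described above can be
illustrated with a short Python sketch. It is not the configuration
reader used by the UTT programs. The names @code{parse_conf} and
@code{effective_options} are hypothetical, and two details are
assumptions: a @code{#} starts a comment anywhere on a line, and a
higher-priority source replaces all values of an option set by a
lower-priority one.

```python
def parse_conf(text):
    """Parse configuration text into a dict: option name -> list of values."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        name, sep, value = line.partition("=")
        # An option written without '=' takes no argument; record it as True.
        options.setdefault(name.strip(), []).append(value.strip() if sep else True)
    return options


def effective_options(system_conf, user_conf, command_line):
    """Merge option dicts; sources later in the list take priority."""
    merged = {}
    for source in (system_conf, user_conf, command_line):
        merged.update(source)
    return merged
```

Storing every option as a list lets an option that occurs on several
lines keep all of its values.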
606
607@c The equal sign may be omitted.
608
609
610@quotation Tip
If you have two (or more) frequently used sets of options for the same
program (e.g. lem with the PMDBF dictionary and lem with a user dictionary),
a good solution is to create two soft links to lem, called
e.g. lemg and lemu, and specify their configuration in the files lemg.conf
and lemu.conf respectively.
616@end quotation
617
618@c ---------------------------------------------------------------------
619@c COMPONENTS
620@c ---------------------------------------------------------------------
621
622@node UTT components
623@chapter UTT components
624
625UTT components are of three types:
626
627@menu
628Sources: programs which read non-UTT data (e.g. raw text) and produce output
629in UTT format
630* tok::         a tokenizer
631
632Filters: programs which read and produce UTT-formatted data
633* lem::         a morphological analyzer
634* gue::         a morphological guesser
635* cor::         a simple spelling corrector
* kor::         a more elaborate spelling corrector
* sen::         a sentencizer
638* ser::         a pattern search tool (marks matches)
639* mar::         a pattern search tool (introduces arbitrary markers into the text)
640* grp::         a pattern search tool (selects sentences containing a match)
641@c * gph::         a word-graph annotation tool::
642@c * dgp::         a dependency parser
643
644Sinks: programs which read UTT data and produce output in another format
645* kot::         an untokenizer
646* con::         a concordance table generator
647@end menu
648
649@c ---------------------------------------------------------------------
650@c TOK
651@c ---------------------------------------------------------------------
652
653@page
654@node tok
655@section tok - a tokenizer
656
657@c ----------------------------------------
658
659@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
660@item @strong{Authors:}                 @tab Tomasz Obrębski
661@item @strong{Component category:}      @tab source
662@item @strong{Input format:}            @tab raw text file
663@item @strong{Output format:}           @tab UTT regular
664@item @strong{Required annotation:}     @tab -
665@end multitable
666
667
668@menu
669* tok description::
670* tok input::
671* tok output::
672* tok command line options::
673* tok example::
674@end menu
675
676@node tok description
677@subsection Description
678
679@code{tok} is a simple program which reads a text file and identifies
680tokens on the basis of their orthographic form.  The type of the token
681is printed as the @var{type} field.
682
683@node tok input
684@subsection Input
685
686Raw text.
687
688@node tok output
689@subsection Output
690
691UTT-file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
692
693@itemize
694
695@item @code{W}
696(word)
697- continuous sequence of letters
698
699@item @code{N}
700(number)
701- continuous sequence of digits
702
703@item @code{S}
704(space)
705- continuous sequence of space characters
706
707@item @code{P}
708(punctuation mark)
- single printable character not belonging to any of the other classes
710
711@item @code{B}
712(unprintable character)
713- single unprintable character
714
715@end itemize
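
The five token classes can be illustrated with a small Python sketch. It
only mimics the classification described above and is not the
implementation of @command{tok}. The names @code{token_type} and
@code{tokenize} are hypothetical, character classes follow Python's
Unicode predicates rather than a single-byte locale, and writing
unprintable characters to the @var{form} field unchanged is an assumption.

```python
from itertools import groupby


def token_type(ch):
    """Classify one character into the type of the token it belongs to."""
    if ch.isalpha():
        return "W"
    if ch.isdigit():
        return "N"
    if ch.isspace():
        return "S"
    return "P" if ch.isprintable() else "B"


# Characters that need a special representation in the form field.
_FORM = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
         "_": "\\_", "*": "\\*", "\\": "\\\\"}


def tokenize(text):
    """Yield UTT segment lines (start, length, type, form) for raw text."""
    pos = 0
    for kind, run in groupby(text, key=token_type):
        chars = list(run)
        # W, N and S tokens are maximal runs; P and B tokens are single characters.
        pieces = ["".join(chars)] if kind in "WNS" else chars
        for piece in pieces:
            form = "".join(_FORM.get(c, c) for c in piece)
            yield "%04d %02d %s %s" % (pos, len(piece), kind, form)
            pos += len(piece)
```

For example, @code{list(tokenize("12.5 km"))} yields the five segments
@code{N 12}, @code{P .}, @code{N 5}, @code{S _} and @code{W km} with
their positions and lengths.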
716
717
718
719@node tok command line options
720@subsection Command line options
721
722@table @code
723
724@item @b{@minus{}@minus{}help}, @b{@minus{}h}
725Print help.
726
727@item @b{@minus{}@minus{}version}, @b{@minus{}V}
728Print version information.
729
730@item @b{@minus{}@minus{}interactive, @minus{}i}
731This option toggles interactive mode, which is by default off. In the
732interactive mode the program does not buffer the output.
733
734@end table
735
736@node tok example
737@subsection Example
738
Input:

@example
Piszemy dobre programy.
@end example

Output:

@example
0000 07 W Piszemy
0007 01 S _
0008 05 W dobre
0013 01 S _
0014 08 W programy
0022 01 P .
0023 01 S \n
@end example
756
757
758@c ---------------------------------------------------------------------
759@c SEN
760@c ---------------------------------------------------------------------
761
762@c @node sen - sentencizer
763@c @chapter sen - sentencizer
764
765@c Authors: Tomasz Obrębski
766
767@c ---------------------------------------------------------------------
768@c LEM
769@c ---------------------------------------------------------------------
770
771@page
772@node lem
773@section lem - morphological analyzer
774
775@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
776@item @strong{Authors:}                 @tab Tomasz Obrębski, Michał Stolarski
777@item @strong{Component category:}      @tab filter
778@item @strong{Input format:}            @tab UTT regular
779@item @strong{Output format:}           @tab UTT regular
780@item @strong{Required annotation:}     @tab tok
781@end multitable
782
783@menu
784* lem description::             
785* lem command line options::   
786* lem input::
787* lem output::
788* lem example::                 
789* lem dictionaries::           
790* lem hints::           
791@end menu
792
793@node lem description
794@subsection Description
795
796@command{lem} performs morphological analysis of a simple orthographic
797word, returning all its possible morphological annotations,
798disregarding the context.
799
800@c ----------------------------------------
801
802@node lem command line options
803@subsection Command line options
804
805@table @code
806@parhelp
807@parversion
808@parinteractive
809@c @parfile
810@c @paroutput
811@c @parfail
812@c @parcopy
813@parinputfield
814@paroutputfield
815@pardictionary
816@parprocess
817@parselect
818@parunselect
819@paroneline
820@paronefield
821@end table
822
823@c ----------------------------------------
824
825@node lem input
826@subsection Input
827
@command{lem} reads a UTT file and processes the value of the @var{form} field
(the input field may be changed with the @option{--input-field} option).
830
831@node lem output
832@subsection Output
833
@command{lem} adds a new annotation field, whose default name is @code{lem}.  In
case of ambiguity, either the segment is duplicated (default),
multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
annotation is produced as the value of a single @code{lem} field
(@option{--one-field}, @option{-1}):
839
840@itemize @bullet
841
842@item
843unambiguous value format:
844
845@example
846   <lemma>,<descr>
847@end example
848
849@item
850ambiguous value format (@option{--one-field} option)
851
852
853@example
854   <lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
855@end example
856
857(alternative descriptions for the same lemma are separated by commas,
858alternative lemmata are separated by semicolons.)
859
860@end itemize
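
A value in the ambiguous format can be split back into its readings as
follows. This Python sketch is only an illustration. The name
@code{parse_lem_value} is hypothetical, and the sketch assumes that
lemmata and descriptions never contain a comma or a semicolon themselves.

```python
def parse_lem_value(value):
    """Return a list of (lemma, [descr, ...]) pairs from a lem field value."""
    readings = []
    for alternative in value.split(";"):       # alternative lemmata
        lemma, *descriptions = alternative.split(",")  # alternative descriptions
        readings.append((lemma, descriptions))
    return readings
```

The unambiguous format is the special case of one lemma with one
description, so the same function handles both.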
861
862@node lem example
863@subsection Example
864
865Input:
866
@example
0000 07 W Piszemy
0007 01 S _
0008 05 W dobre
0013 01 S _
0014 08 W programy
0022 01 P .
0023 01 S \n
@end example
876
877Output (default):
878
@example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
0013 01 S _
0014 08 W programy lem:program,N/GiNpCa
0014 08 W programy lem:program,N/GiNpCn
0014 08 W programy lem:program,N/GiNpCv
0022 01 P .
0023 01 S \n
@end example
891
892Output (@option{--one-line} option):
893
@example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
0013 01 S _
0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
0022 01 P .
0023 01 S \n
@end example
903
904Output (@option{--one-field} option):
905
@example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
0013 01 S _
0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
0022 01 P .
0023 01 S \n
@end example
915
916@c ----------------------------------------
917
918@node lem dictionaries
919@subsection Dictionaries
920
921@command{lem} requires a dictionary. The dictionary may be provided in
922one of two formats: in text (source) format or in binary (fsa) format.
923
924@subsubheading Text format
925
926Dictionary entries have the following structure:
927
928@example
929<form>;<lemma>,<descr>[;<lemma>,<descr>]
930@end example
931
932@var{lemma} may be given explicitly or in the cut-add format:
933
934@example
935@code{[<cut1><add1>-]<cut2><add2>}
936@end example
937
meaning: replace the prefix of length @code{<cut1>} with the
string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string
@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
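
The cut-add notation can be made concrete with a short Python sketch.
This is an illustration rather than the code used by @command{lem}. The
name @code{apply_cut_add} is hypothetical, and treating every code that
does not match the cut-add pattern as an explicitly given lemma is an
assumption, so an explicit lemma beginning with a digit would be misread.

```python
import re

# Optional "<cut1><add1>-" prefix part, then the "<cut2><add2>" suffix part.
_CUT_ADD = re.compile(r"(?:(\d+)([^-]*)-)?(\d+)(.*)")


def apply_cut_add(form, code):
    """Derive a lemma from form; a code not in cut-add format is an explicit lemma."""
    match = _CUT_ADD.fullmatch(code)
    if match is None:
        return code  # lemma given explicitly
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        form = add1 + form[int(cut1):]          # replace the prefix
    return form[:len(form) - int(cut2)] + add2  # replace the suffix
```

The suffix is cut with an explicit length computation because a cut of 0
must leave the form unchanged.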
942
Each dictionary entry must be written on one line and must not contain blank characters.
944
945Examples:
@example
kot;0,N/GaNsCn
kota;1,N/GaNsCg;1,N/GaNsCa
kotu;1,N/GaNsCd
kotem;2,N/GaNsCi
kocie;3t,N/GaNsCl;3t,N/GaNsCv
najbielsi;3-4ały,ADJ/DsNpCnGp
najbielsze;3-5ały,ADJ/DsNpCnGaifn
najlepsi;dobry,ADJ/DsNpCnGp
najlepsze;dobry,ADJ/DsNpCnGaifn
@end example
957
958
959The mandatory file name extension for a text dictionary is @code{dic}. For large
960dictionaries it is preferable, however, to compile them into binary
961(fsa) format.
962
963@subsubheading Binary format
964
965The mandatory file name extension for a binary dictionary is @code{bin}. To
966compile a text dictionary into binary format, write:
967
968@example
969compiledic <dictionaryname>.dic
970@end example
971
972@subsubheading Polex/PMDBF dictionary
973
A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
the distribution as the default dictionary of @command{lem}. By default it is
located in:

@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}

in a local installation, or in

@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}

in a system-wide installation.
985
986@node lem hints
987@subsection Hints
988
989@subsubheading Combining data from multiple dictionaries
990
991@itemize
992
@item Apply <dict1>, then apply <dict2> to words which were not annotated.
994
995@example
996lem -d <dict1> | lem -S lem -d <dict2>
997@end example
998
999@item Add annotations from two dictionaries <dict1> and <dict2>.
1000
1001@example
1002lem -c -d <dict1> | lem -S lem -d <dict2>
1003@end example
1004
1005@end itemize
1006
1007
1008@c ---------------------------------------------------------------------
1009@c GUE
1010@c ---------------------------------------------------------------------
1011
1012@page
1013@node gue
1014@section gue - morphological guesser
1015
1016@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1017
1018@item @strong{Authors:}                 @tab Michał Stolarski, Tomasz Obrębski
1019@item @strong{Component category:}      @tab filter
1020
1021@end multitable
1022
1023@menu
1024* gue description::   
1025* gue command line options::   
1026* gue example::                 
1027* gue dictionaries::           
1028@end menu
1029
1030
1031@node gue description
1032@subsection Description
1033
@command{gue} guesses morphological descriptions of the form contained
in the @var{form} field.
1036
1037
1038@node gue command line options
1039@subsection Command line options
1040
1041@table @code
1042
1043@parhelp
1044@parversion
1045@parinteractive
1046@c @parfile
1047@c @paroutput
1048@c @parfail
1049@c @parcopy
1050@parinputfield
1051@paroutputfield
1052@pardictionary
1053@parprocess
1054@parselect
1055@parunselect
1056@paroneline
1057@paronefield
1058
1059@item @b{@minus{}@minus{}delta=@var{n}}
Stop displaying answers after a drop in weight, that is, when the weight difference between two subsequent results exceeds the delta value (default=`0.2').
1061
1062
1063@item @b{@minus{}@minus{}cut-off=@var{n}}
Do not display answers whose weight is lower than the cut-off value (default=`200').
1065
1066
1067@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
1068Guess up to n descriptions  (default=`0', which means 'display all results').
1069
1070
1071
1072@end table
1073
1074@node gue example
1075@subsection Example
1076
@example
command: gue -n 2

input:
0000 07 W smerfny

output:
0000 07 W smerfny gue:,ADJ/CaDpGiNs
0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
@end example
1087                                 
1088
1089@node gue dictionaries
1090@subsection Dictionaries
1091
1092@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
1093The fsa format is created by compiling text-format dictionaries.
1094
1095
1096
1097@subsubheading Text format
1098
1099Dictionary entries have the following structure:
1100
1101@example
1102@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
1103@end example
1104
1105@var{lemma} must be given in the cut-add format:
1106
1107@example
1108@code{[<cut1><add1>-]<cut2><add2>}
1109@end example
1110(no spaces in between): replace the prefix of length @var{cut1} with the
1111string @var{add1}, and replace the suffix of length @var{cut2} with the string
1112@var{add2}.
1113
1114
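1115Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały} (the 3-character prefix @i{naj} is removed and the 4-character suffix @i{elsi} is replaced with @i{ały}).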
1116
1117
1118@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
1119
1120@var{weight} is an integer value between 1 and 999 indicating the
1121likelihood of the guess.
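
@noindent The entry structure and the cut-add format can be sketched in
Python as follows. This is a minimal illustration under the format
described above, not the code used by @command{gue}; the function names
@code{parse_entry} and @code{apply_cut_add} are hypothetical, and the
sketch assumes that neither @var{add1} nor @var{add2} contains a hyphen.

@example
```python
import re

def parse_entry(line):
    """Split 'prefix*suffix;lemma,description:weight' into its five parts."""
    pattern, rest = line.split(";", 1)
    prefix, suffix = pattern.split("*", 1)
    rest, weight = rest.rsplit(":", 1)
    lemma, description = rest.split(",", 1)
    return prefix, suffix, lemma, description, int(weight)

def apply_cut_add(form, rule):
    """Turn a word form into its lemma using a cut-add rule such as '3-4ały'."""
    prefix_rule, separator, suffix_rule = rule.rpartition("-")
    if separator:
        # optional prefix part: cut the first cut1 characters, prepend add1
        cut, add = re.match(r"(\d+)(.*)", prefix_rule, re.S).groups()
        form = add + form[int(cut):]
    # suffix part: cut the last cut2 characters, append add2
    cut, add = re.match(r"(\d+)(.*)", suffix_rule, re.S).groups()
    return form[:len(form) - int(cut)] + add
```
@end example

@noindent With these definitions,
@code{apply_cut_add("najbielsi", "3-4ały")} returns @code{"biały"}.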
1122
1123@c @example
1124@c *łkę;1a,N/GfNsCa
1125@c naj*elszy;3-4ały,ADJ/...:...
1126@c @end example
1127
1128
1129@c ---------------------------------------------------------------------
1130@c COR
1131@c ---------------------------------------------------------------------
1132
1133@page
1134@node cor
1135@section cor - spelling corrector
1136
1137@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1138@item @strong{Authors:}                 @tab Tomasz Obrębski, Michał Stolarski
1139@item @strong{Component category:}      @tab filter
1140@item @strong{Input format:}            @tab UTT regular
1141@item @strong{Output format:}           @tab UTT regular
1142@item @strong{Required annotation:}     @tab tok
1143@end multitable
1144
1145@menu
1146* cor description::
1147* cor command line options::   
1148* cor dictionaries::           
1149@end menu
1150
1151
1152@node cor description
1153@subsection Description
1154
1155The spelling corrector applies Kemal Oflazer's dynamic programming
1156algorithm @cite{oflazer96} to the FSA representation of the set of
1157word forms of the Polex/PMDBF dictionary. Given an incorrect
1158word form, it returns all word forms present in the dictionary whose
1159edit distance is smaller than the threshold given as the parameter.
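
@noindent The effect of the distance threshold can be illustrated with
the brute-force Python sketch below. It is @emph{not} Oflazer's
algorithm, which walks the FSA instead of comparing the input with every
entry; it only shows which candidates fall within a given distance. The
function names are hypothetical, every edit operation (insertion,
deletion, substitution, exchange of adjacent characters) is assumed to
cost 1, and the threshold is assumed to be inclusive, following the
wording of the @option{--distance} option.

@example
```python
def edit_distance(a, b):
    """Edit distance with insertion, deletion, substitution and
    exchange of two adjacent characters, each costing 1."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # match or substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # exchange
        prev2, prev = prev, cur
    return prev[len(b)]

def correct(word, dictionary, max_distance=1):
    """Return the dictionary words within max_distance of word."""
    return [w for w in dictionary if edit_distance(word, w) <= max_distance]
```
@end example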
1160
1161
1162@node cor command line options
1163@subsection Command line options
1164
1165@table @code
1166
1167@parhelp
1168@parversion
1169@parinteractive
1170@c @parfile
1171@c @paroutput
1172@c @parfail
1173@c @parcopy
1174@parinputfield
1175@paroutputfield
1176@pardictionary
1177@parprocess
1178@parselect
1179@parunselect
1180@paroneline
1181@paronefield
1182
1183@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1184Maximum edit distance (default='1').
1185
1186@c @item @b{@minus{}@minus{}replace, @minus{}r}
1187@c Replace original form with corrected form, place original form in the
1188@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1189
1190
1191@end table
1192
1193@node cor dictionaries
1194@subsection Dictionaries
1195
1196@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
1197The fsa format is created by compiling text-format dictionaries.
1198
1199@subsubheading Text format
1200
1201The @command{cor} dictionary is a list of words:
1202@example
1203odlot
1204odlotowy
1205odludek
1206@end example
1207
1208@subsubheading Binary format
1209
1210The mandatory file name extension for a binary dictionary is @code{bin}. To
1211compile a text dictionary into binary format, write:
1212
1213@example
1214compiledic <dictionaryname>.dic
1215@end example
1216
1217@c ---------------------------------------------------------------------
1218@c KOR
1219@c ---------------------------------------------------------------------
1220
1221@page
1222@node kor
1223@section kor - configurable spelling corrector
1224
1225@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1226@item @strong{Authors:}                 @tab Paweł Werenski, Tomasz Obrębski, Michał Stolarski
1227@item @strong{Component category:}      @tab filter
1228@item @strong{Input format:}            @tab UTT regular
1229@item @strong{Output format:}           @tab UTT regular
1230@item @strong{Required annotation:}     @tab tok
1231@end multitable
1232
1233@menu
1234* kor description::
1235* kor command line options::
1236* kor weights definition file::   
1237* kor dictionaries::           
1238@end menu
1239
1240
1241@node kor description
1242@subsection Description
1243
1244The spelling corrector applies Paweł Werenski's dynamic programming
1245algorithm to the FSA representation of the set of word forms of the
1246Polex/PMDBF dictionary. The algorithm is an extension of K. Oflazer's
1247algorithm used by @command{cor}. In the extended version it is
1248possible to assign weights to individual edit operations.
1249
1250Given an incorrect word form, it returns all word forms
1251present in the dictionary whose edit distance is smaller than the
1252threshold given as the parameter.
1253
1254
1255@node kor command line options
1256@subsection Command line options
1257
1258@table @code
1259
1260@parhelp
1261@parversion
1262@parinteractive
1263@c @parfile
1264@c @paroutput
1265@c @parfail
1266@c @parcopy
1267@parinputfield
1268@paroutputfield
1269@pardictionary
1270@parprocess
1271@parselect
1272@parunselect
1273@paroneline
1274@paronefield
1275
1276@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1277Maximum edit distance (default='1').
1278
1279@item @b{@minus{}@minus{}weights=@var{filename}, @minus{}w @var{filename}}
1280File containing the weights of edit operations.
1281
1282@c @item @b{@minus{}@minus{}replace, @minus{}r}
1283@c Replace original form with corrected form, place original form in the
1284@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1285
1286
1287@end table
1288
1289
1290@node kor weights definition file
1291@subsection Weights definition file
1292
1293Example:
1294
1295@example
1296
1297%stdcor 1
1298%xchg   1
1299ż  rz 0.5
1300ch h  0.5
1301u  ó  0.5
1302
1303@end example
1304
1305
1306The default weight is set to 1 (@code{%stdcor 1}), the weight of the exchange
1307operation is set to 1 (@code{%xchg 1}), and the three principal orthographic
1308errors are assigned the weight 0.5.
1309
1310The edit operation weight declaration, such as
1311
1312@example
1313ż  rz 0.5
1314@end example
1315
1316works in both directions, i.e. ż->rz and rz->ż.
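
@noindent How such declarations can enter an edit-distance computation
is sketched below in Python. This is an illustration under stated
assumptions, not the algorithm implemented in @command{kor}: the
function names are hypothetical, @code{%stdcor} is taken to be the cost
of a single-character insertion, deletion or substitution, and
@code{%xchg} the cost of exchanging two adjacent characters.

@example
```python
def parse_weights(text):
    """Read a weights definition into (settings, pairs)."""
    settings = dict(stdcor=1.0, xchg=1.0)
    pairs = []
    for line in text.split("\n"):
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("%"):
            settings[fields[0][1:]] = float(fields[1])
        else:
            left, right, weight = fields[0], fields[1], float(fields[2])
            # a declaration works in both directions
            pairs.append((left, right, weight))
            pairs.append((right, left, weight))
    return settings, pairs

def weighted_distance(source, target, settings, pairs):
    """Weighted edit distance between source and target."""
    std, xchg = settings["stdcor"], settings["xchg"]
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best = float("inf")
            if i > 0:
                best = min(best, d[i - 1][j] + std)   # deletion
            if j > 0:
                best = min(best, d[i][j - 1] + std)   # insertion
            if i > 0 and j > 0:
                same = source[i - 1] == target[j - 1]
                best = min(best, d[i - 1][j - 1] + (0.0 if same else std))
            if (i > 1 and j > 1 and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                best = min(best, d[i - 2][j - 2] + xchg)  # exchange
            # declared string pairs, e.g. "rz" written instead of "ż"
            for left, right, weight in pairs:
                if source[:i].endswith(left) and target[:j].endswith(right):
                    best = min(best, d[i - len(left)][j - len(right)] + weight)
            d[i][j] = best
    return d[-1][-1]
```
@end example

@noindent With the example weights, the misspelling @i{morze} for
@i{może} is at distance 0.5, while a single arbitrary substitution such
as @i{dom} for @i{dym} stays at distance 1.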
1317
1318The default weights definition file for @code{kor} is:
1319
1320@example
1321$HOME/.local/share/utt/weights.kor
1322@end example
1323
1324or, if the above mentioned file is absent:
1325
1326@example
1327/usr/local/share/utt/weights.kor
1328@end example
1329
1330
1331@node kor dictionaries
1332@subsection Dictionaries
1333
1334See @command{cor} (@pxref{cor dictionaries}).
1335
1336@c ---------------------------------------------------------------------
1337@c SEN
1338@c ---------------------------------------------------------------------
1339
1340@page
1341@node sen
1342@section sen - a sentencizer
1343
1344@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1345
1346@item @strong{Authors:}                 @tab Tomasz Obrębski
1347@item @strong{Component category:}      @tab filter
1348@item @strong{Input format:}            @tab UTT regular
1349@item @strong{Output format:}           @tab UTT regular
1350@item @strong{Required annotation:}     @tab tok
1351
1352@end multitable
1353
1354
1355@menu
1356* sen description::
1357@c * sen input::
1358@c * sen output::
1359* sen example::                 
1360@end menu
1361
1362@node sen description
1363@subsection Description
1364
1365@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
1366
1367@node sen example
1368@subsection Example
1369
1370@example
1371command: sen
1372
1373input:
13740000 05 W Cześć
13750005 01 P !
13760006 01 S _
13770007 02 W To
13780009 01 S _
13790010 02 W ja
13800012 01 P .
13810013 01 S \n
1382
1383output:
13840000 00 BOS *
13850000 05 W Cześć
13860005 01 P !
13870006 00 EOS *
13880006 00 BOS *
13890006 01 S _
13900007 02 W To
13910009 01 S _
13920010 02 W ja
13930012 01 P .
13940013 01 S \n
13950014 00 EOS *
1396@end example
1397
1398
1399@c ---------------------------------------------------------------------
1400@c GPH
1401@c ---------------------------------------------------------------------
1402
1403@c @node gph - graphizer
1404@c @chapter gph - graphizer
1405
1406@c Authors: Tomasz Obrębski
1407
1408
1409
1410@c ---------------------------------------------------------------------
1411@c SER
1412@c ---------------------------------------------------------------------
1413
1414@page
1415@node ser
1416@section ser - pattern search tool
1417
1418@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1419@item @strong{Authors:}                 @tab Tomasz Obrębski
1420@item @strong{Component category:}      @tab filter
1421@item @strong{Input format:}            @tab UTT regular
1422@item @strong{Output format:}           @tab UTT regular
1423@item @strong{Required annotation:}     @tab tok,  lem --one-field
1424@end multitable
1425
1426@menu
1427* ser description::
1428* ser command line options::   
1429* ser pattern::                 
1430* ser how ser works::           
1431* ser customization::           
1432* ser limitations::             
1433* ser requirements::           
1434@end menu
1435
1436
1437@node ser description
1438@subsection Description
1439
1440@command{ser} looks for patterns in UTT-formatted texts.
1441
1442
1443@c ---------------------------------------------------------------------
1444@node ser command line options
1445@subsection Command line options
1446
1447@table @code
1448
1449@parhelp
1450@parversion
1451@c @parfile
1452@c @paroutput
1453@c @parinputfield
1454@c @paroutputfield
1455@parprocess
1456@parinteractive
1457
1458@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1459The search pattern.
1460
1461@item @b{@minus{}@minus{}morph=@var{field}}
1462The name of the annotation field containing the morphological
1463description (default @code{lem}).
1464
1465@item @b{@minus{}@minus{}flex}
1466Only print the generated flex source code.
1467
1468@item @b{@minus{}@minus{}macro=@var{filename}}
1469Read macrodefinitions from file @var{filename} rather than from
1470the default location. This option allows the set of terms to be redefined.
1471
1472@item @b{@minus{}@minus{}define=@var{filename}}
1473Append macrodefinitions from file @var{filename}. This option
1474allows the set of terms to be extended.
1475
1476@end table
1477
1478
1479@c ---------------------------------------------------------------------
1480@node ser pattern
1481@subsection Pattern
1482
1483The @command{ser} pattern is a regular expression over terms corresponding
1484to text segments or segment sequences. Predefined terms are:
1485
1486@table @code
1487
1488@item seg(@var{t},@var{f},@var{a})
1489a segment of type @var{t}, containing form @var{f} and annotation
1490@var{a}
1491
1492@item form(@var{f})
1493a segment containing form @var{f}
1494
1495@item field(@var{f})
1496a segment containing annotation field @var{f}
1497
1498@item space(@var{f})
1499a space segment of form @var{f}
1500
1501@item word(@var{f})
1502a word segment of form @var{f}
1503
1504@item punct(@var{f})
1505a punct segment of form @var{f}
1506
1507@item number(@var{f})
1508a number segment of form @var{f}
1509
1510@item lexeme(@var{f})
1511a word segment with lemma @var{f}
1512
1513@item cat(@var{c})
1514a word segment of category @var{c}
1515
1516@end table
1517
1518All arguments are optional. If an argument is omitted, an arbitrary
1519string of non-blank characters is assumed as the argument value. Term
1520arguments may be arbitrary character-level regular expressions. The
1521following special symbols can be used:
1522
1523@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1524@item @code{[@dots{}]}            @tab a character class
1525@item @code{[^@dots{}]}           @tab a negated character class
1526@item @code{|}                    @tab alternative
1527@item @code{*}                    @tab repetition, including zero times
1528@item @code{+}                    @tab repetition, at least one time
1529@item @code{?}                    @tab optionality
1530@item @code{@{@var{m},@var{n}@}}  @tab repetition from @var{m} to @var{n} times
1531@item @code{@{@var{m},@}}         @tab repetition @var{m} or more times
1532@item @code{@{@var{m}@}}          @tab repetition @var{m} times
1533@item @code{\@var{ddd}}             @tab the character with octal value @var{ddd}
1534@item @code{\x@var{hh}}           @tab the character with hexadecimal value @var{hh}
1535@item @code{( )}                  @tab parentheses, used to override precedence
1536@c @end multitable
1537
1538@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1539@item @code{.}    @tab a non-blank character
1540@item @code{\w}   @tab a letter
1541@item @code{\W}   @tab a non-blank character other than a letter
1542@item @code{\d}   @tab a digit
1543@item @code{\D}   @tab a non-blank character other than a digit
1544@item @code{\s}   @tab a space or tab character
1545@item @code{\S}   @tab a non-blank character (the same as @code{.})
1546@item @code{\l}   @tab a lowercase letter
1547@item @code{\L}   @tab an uppercase letter
1548@end multitable
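
@noindent The meaning of the character-class symbols in the table above
can be illustrated by translating them into the syntax of Python's
@code{re} module. This is a simplified sketch, not what @command{ser}
does internally (it generates flex code): the function name
@code{to_python_regex} is hypothetical, letters are assumed to be ASCII
plus the Polish diacritics, and bracket expressions are not treated
specially.

@example
```python
import re

# Assumption: letters are ASCII plus the Polish diacritics.
LOWER = "a-ząćęłńóśźż"
UPPER = "A-ZĄĆĘŁŃÓŚŹŻ"

CLASSES = dict([
    (".", r"\S"),                            # a non-blank character
    (r"\S", r"\S"),                          # the same as "."
    (r"\s", r"[ \t]"),                       # a space or tab character
    (r"\w", "[" + LOWER + UPPER + "]"),      # a letter
    (r"\W", "[^" + LOWER + UPPER + r"\s]"),  # non-blank, not a letter
    (r"\d", r"[0-9]"),                       # a digit
    (r"\D", r"[^0-9\s]"),                    # non-blank, not a digit
    (r"\l", "[" + LOWER + "]"),              # a lowercase letter
    (r"\L", "[" + UPPER + "]"),              # an uppercase letter
])

def to_python_regex(expr):
    """Translate a character-level ser expression into Python re syntax."""
    out = []
    i = 0
    while i < len(expr):
        if expr[i] == "\\" and i + 1 < len(expr):
            token = expr[i:i + 2]    # an escape sequence such as \L or \.
            out.append(CLASSES.get(token, token))
            i += 2
        else:
            out.append(CLASSES.get(expr[i], expr[i]))
            i += 1
    return "".join(out)
```
@end example

@noindent For instance, the translation of @code{\L\l+} matches
@i{Cześć} as a whole but not @i{cześć}.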
1549
1550
1551@noindent The following characters:
1552@example
1553@verb{%  [   ]   ^   |   *   +   ?   {   }   ,   .   <   >   \ %}
1554@end example
1555must be escaped with a backslash, i.e. written as:
1556@example
1557@verb{% \[  \]  \^  \|  \*  \+  \?  \{  \}  \,  \.  \<  \>  \\ %}
1558@end example
1559
1560@quotation Note
1561The special symbols are borrowed from Perl, with minor modifications
1562made for convenience.
1563The meaning of certain special characters and sequences differs slightly
1564from their common meaning. This is motivated by convenience.
1565The meaning of the @code{.} special character is modified due to
1566the special function of spaces in UTT files (they are field
1567separators). Use @code{\s} to explicitly match a space or tab character.
1568@end quotation
1569
1570In the argument of the @code{cat} term a special operator <...> may be
1571used. A category specification enclosed in angle brackets matches all
1572category descriptions which are consistent (non-contradictory) with the
1573specification. For example, @code{<N>} matches all noun descriptions, and
1574@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
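
@noindent The notion of consistency can be illustrated with the Python
sketch below. It is not the @command{ser} implementation; the function
names are hypothetical, and the sketch assumes, judging from
descriptions such as @code{ADJ/CaDpGiNs}, that each attribute is written
as one uppercase letter followed by its admissible values.

@example
```python
import re

def parse_category(description):
    """Split 'ADJ/CaDpGiNs' into the part of speech and a mapping
    from attribute letters to sets of value letters."""
    pos, _, attributes = description.partition("/")
    values = dict()
    for name, letters in re.findall(r"([A-Z])([a-z0-9]*)", attributes):
        values[name] = set(letters)
    return pos, values

def consistent(specification, description):
    """True if description does not contradict specification."""
    spec_pos, spec_values = parse_category(specification)
    pos, values = parse_category(description)
    if spec_pos != pos:
        return False
    for name, wanted in spec_values.items():
        # an attribute absent from the description contradicts nothing
        if name in values and not (wanted & values[name]):
            return False
    return True
```
@end example

@noindent Under these assumptions the specification @code{ADJ/Can} is
consistent with both @code{ADJ/CaDpGiNs} and @code{ADJ/CnvDpGaipNs},
because the case values overlap in each pair.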
1575
1576
1577@*
1578@noindent @b{Examples of one-segment patterns:}
1579
1580@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1581@item @code{seg}            @tab any segment
1582@item @code{word}           @tab any word-form
1583@item @code{word(pomocy)}   @tab the word-form @samp{pomocy}
1584@item @code{word(naj.+)}    @tab a word-form beginning with @samp{naj}
1585@item @code{word(\L\l+)}    @tab a capitalized word-form
1586@item @code{punct}          @tab a punctuation character
1587@item @code{space(.*\\n.*)} @tab a space segment containing a newline character
1588@item @code{lexeme(pomoc)}  @tab any form of the lexeme 'pomoc'
1589@item @code{cat(N/.*)}      @tab a word whose category starts with @code{N/}
1590@item @code{cat(<N/Ca>)}    @tab a word whose category matches @code{N/Ca}
1591@end multitable
1592
1593@*
1594@noindent @b{Examples of multi-segment patterns:}
1595
1596@table @code
1597
1598@item (word(\L) punct(\.) space?)+ word(\L\l+)
1599a sequence of initials followed by a surname
1600
1601@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
1602a text fragment between two punctuation characters, containing an
1603occurrence of a relative pronoun
1604
1605@end table
1606
1607
1608@node ser how ser works
1609@subsection How ser works
1610
1611@node ser customization
1612@subsection Customization
1613
1614@c All predefined terms correspond to single segments,
1615
1616@example
1617define(`verbseq', `(cat(<V>) (space cat(<V>)))')
1618@end example
1619
1620
1621the term @code{cat()} may not be used as a ... of
1622
1623@c See @command{m4} manual for further details on macro definition format.
1624
1625@node ser limitations
1626@subsection Limitations
1627
1628Do not use more than 3 attributes in <>.
1629
1630@node ser requirements
1631@subsection Requirements
1632
1633In order to run @command{ser}, the following programs must be
1634installed in the system:
1635
1636@itemize
1637
1638@item @command{m4}
1639@item @command{grep}
1640@item @command{flex}
1641@item @command{gcc}
1642
1643@end itemize
1644
1645
1646@c ---------------------------------------------------------------------
1647@c GRP
1648@c ---------------------------------------------------------------------
1649
1650@page
1651@node grp
1652@section grp - pattern search tool
1653
1654@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1655@item @strong{Authors:}                 @tab Tomasz Obrębski
1656@item @strong{Component category:}      @tab filter
1657@item @strong{Input format:}            @tab UTT flattened
1658@item @strong{Output format:}           @tab UTT flattened
1659@item @strong{Required annotation:}     @tab tok, sen, lem --one-field
1660@end multitable
1661
1662
1663@menu
1664* grp description::
1665* grp command line options::   
1666* grp pattern::                 
1667* grp hints::   
1668@end menu
1669
1670
1671@node grp description
1672@subsection Description
1673
1674@command{grp} selects sentences containing an expression matching a
1675pattern. The pattern format is exactly the same as that accepted by
1676@command{ser}.
1677
1678@command{grp} is intended mainly for speeding up the corpus search process.
1679It is extremely fast (processing speed is usually higher than the speed
1680of reading the corpus file from disk).
1681
1682@node grp command line options
1683@subsection Command line options
1684
1685@table @code
1686
1687@parhelp
1688@parversion
1689@parprocess
1690@parinteractive
1691
1692@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1693The search pattern.
1694
1695@item @b{@minus{}@minus{}morph=@var{field}}
1696The name of the annotation field containing the morphological
1697description (default @code{lem}).
1698
1699@item @b{@minus{}@minus{}command}
1700Only print the generated flex source code.
1701
1702@item @b{@minus{}@minus{}macro=@var{filename}}
1703Read macrodefinitions from file @var{filename} rather than from
1704the default location. This option allows the set of terms to be redefined.
1705
1706@item @b{@minus{}@minus{}define=@var{filename}}
1707Append macrodefinitions from file @var{filename}. This option
1708allows the set of terms to be extended.
1709
1710@end table
1711
1712
1713@node grp pattern
1714@subsection Pattern
1715
1716(see @code{ser})
1717
1718@node grp hints
1719@subsection Hints
1720
1721The corpus search speed may be increased by combining @command{grp} with the
1722@command{lzop} compression tool (@command{grp} usually processes data faster than it is read from
1723disk, especially on slow laptop drives).
1724
1725@example
1726cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
1727@end example
1728
1729@example
1730lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
1731@end example
1732
1733
1734
1735@c ---------------------------------------------------------------------
1736@c MAR
1737@c ---------------------------------------------------------------------
1738
1739@page
1740@node mar
1741@section mar
1742
1743@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1744@item @strong{Authors:}                 @tab Marcin Walas, Tomasz Obrębski
1745@item @strong{Input format:}            @tab UTT flattened
1746@item @strong{Output format:}           @tab UTT flattened
1747@item @strong{Required annotation:}     @tab tok, sen, lem -1
1748@end multitable
1749
1750[TODO]
1751
1752(see mar's help 'mar -h' for some information)
1753
1754@c ---------------------------------------------------------------------
1755@c KOT
1756@c ---------------------------------------------------------------------
1757
1758
1759@page
1760@node kot
1761@section kot - untokenizer
1762
1763@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1764@item @strong{Authors:}                 @tab Tomasz Obrębski
1765@item @strong{Component category:}      @tab filter
1766@item @strong{Input format:}            @tab UTT regular
1767@item @strong{Output format:}           @tab text
1768@item @strong{Required annotation:}     @tab tok
1769@end multitable
1770
1771
1772@menu
1773* kot description::
1774* kot command line options::   
1775* kot usage examples::   
1776@end menu
1777
1778@node kot description
1779@subsection Description
1780
1781@command{kot} transforms a UTT-formatted file back into raw text format.
1782
1783@node kot command line options
1784@subsection Command line options
1785
1786@table @code
1787
1788@parhelp
1789
1790@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1791
1792@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1793
1794@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1795
1796@c @item @b{@minus{}@minus{}interactive @minus{}i}
1797
1798@c @item @b{@minus{}@minus{}config=@var{filename}}
1799
1800@item
1801
1802@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
1803print @var{string} between nonadjacent segments of the input file
1804
1805@item @b{@minus{}@minus{}spaces, @minus{}r}
1806retain the special characters @code{_}, @code{\t},
1807@code{\n}, @code{\r}, @code{\f} unexpanded in the output
1808
1809@end table
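
@noindent What the two options do can be pictured with the following
Python sketch. It is an illustration rather than the @command{kot}
source: the function name @code{untokenize} is hypothetical, input lines
are assumed to consist of start position, length, type and form
(followed by optional annotations), special characters are assumed to
be expanded only in space segments (type @code{S}), and zero-length
segments are assumed to contribute no text.

@example
```python
SPECIAL = dict([("_", " "), ("\\t", "\t"), ("\\n", "\n"),
                ("\\r", "\r"), ("\\f", "\f")])

def untokenize(lines, gap_fill="", spaces=False):
    """Rebuild raw text from UTT segment lines."""
    out = []
    expected = None          # position where the next segment should start
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        start, length = int(fields[0]), int(fields[1])
        seg_type, form = fields[2], fields[3]
        if length == 0:
            continue         # markers such as BOS/EOS carry no text
        if expected is not None and start != expected:
            out.append(gap_fill)     # --gap-fill: nonadjacent segments
        if seg_type == "S" and not spaces:
            # expand the special characters unless --spaces is given
            for code, char in SPECIAL.items():
                form = form.replace(code, char)
        out.append(form)
        expected = start + length
    return "".join(out)
```
@end example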
1810
1811@node kot usage examples
1812@subsection Usage examples
1813
1814@example
1815cat legia.txt | tok | kot       
1816@end example
1817
1818@example
1819cat legia.txt | tok | lem -1 | kot
1820@end example
1821
1822@c ---------------------------------------------------------------
1823@c CON
1824@c ---------------------------------------------------------------
1825
1826
1827@page
1828@node con
1829@section con - concordance table generator
1830
1831@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1832@item @strong{Authors:}                 @tab Justyna Walkowska
1833@item @strong{Component category:}      @tab sink
1834@item @strong{Input format:}            @tab UTT regular
1835@item @strong{Output format:}           @tab text
1836@item @strong{Required annotation:}     @tab ser or mar
1837@end multitable
1838@c
1839
1840@menu
1841* con description::
1842* con command line options::
1843* con usage example::
1844* con hints::   
1845@end menu
1846
1847
1848@node con description
1849@subsection Description
1850
1851@command{con} generates a concordance table based on a pattern given to @command{ser}.
1852
1853
1854@node con command line options
1855@subsection Command line options
1856
1857@table @code
1858
1859@parhelp
1860
1861@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
1862@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1863@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1864@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1865@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
1866@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
1867@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
1868@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
1869@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
1870@c @item @b{@minus{}@minus{}interactive @minus{}i}
1871@c @item @b{@minus{}@minus{}config=@var{filename}}
1872@c @item
1873@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1874@c search pattern
1875@c
1876@c @item @b{@minus{}@minus{}flex}
1877@c only print the generated flex source code
1878@c
1879@c @item @b{@minus{}@minus{}macro=@var{filename}}
1880@c read macrodefinitions from file @var{filename} rather than from
1881@c default location. This option allows to redefine the set of terms.
1882@c
1883@c @item @b{@minus{}@minus{}define=@var{filename}}
1884@c append macrodefinitions from file @var{filename}. This option
1885@c allows to extend the set of terms.
1886
1887@item @b{@minus{}@minus{}left @minus{}l}           
1888        Left context info (default='30c'). Example:
1889@example                         
1890                                 -l=5c: left context is 5 characters
1891                                 -l=5w: left context is 5 words
1892                                 -l=5s: left context is 5 non-empty input lines
1893                                 -l='\s*\S+\sr\S+BOS': left context starts with the given regex
1894@end example
1895
1896@item @b{@minus{}@minus{}right @minus{}r}           
1897        Right context info (default='30c').
1898@item @b{@minus{}@minus{}trim @minus{}t}           
1899        Clear incomplete words from output.
1900@item @b{@minus{}@minus{}white @minus{}w}           
1901        Do not convert whitespace characters into spaces.
1902@item @b{@minus{}@minus{}column @minus{}c}           
1903        Left column minimal width in characters (default = 0).
1904@item @b{@minus{}@minus{}ignore @minus{}i}           
1905        Ignore segment inconsistency in the input.
1906@item @b{@minus{}@minus{}bom}           
1907        Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
1908@item @b{@minus{}@minus{}eom}           
1909        End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
1910@item @b{@minus{}@minus{}bod}           
1911        Selected segment beginning display string (default='[').
1912@item @b{@minus{}@minus{}eod}           
1913        Selected segment end display string (default=']').
1914
1915
1916
1917@end table
1918
1919@node con usage example
1920@subsection Usage example
1921@example
1922cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con 
1923@end example
1924
1925
1926@node con hints
1927@subsection Hints
1928
1929@command{con} is a rather slow program. Do not pass large amounts of
1930redundant text through this program. @command{con} works fine in the following
1931sequence:
1932
1933@example
1934... | grp -e EXPR | ser -e EXPR | con
1935@end example
1936
1937
1938@c ---------------------------------------------------------------------
1939@c ---------------------------------------------------------------------
1940
1941@page
1942@node Auxiliary tools
1943@chapter Auxiliary tools
1944
1945@menu
1946* compiledic::         dictionary compiler
1947* fla::                UTT file flattener
1948* unfla::              UTT file unflattener
1949@end menu
1950
1951
1952@page
1953@node compiledic
1954@section compiledic - the dictionary compiler
1955
1956@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1957@item @strong{Authors:}                 @tab Michał Stolarski, Tomasz Obrębski
1958@item @strong{Component category:}      @tab additional tool
1959@end multitable
1960@c
1961
1962@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
1963(FSA) format (@code{.bin} extension).
1964
1965The automaton representation of a dictionary is built using the following AT&T tools:
1966@itemize
1967@item AT&T FSM Library,
1968@item AT&T Lextools.
1969@end itemize
1970
1971For the @command{compiledic} program to work, the packages listed
1972above must be installed on your system.  They are freely available
1973for non-commercial use.
1974
1975Usage:
1976@example
1977        compiledic <dictionaryname>.dic
1978@end example
1979
1980The file <dictionaryname>.bin will be generated.
1981
1982Note: the program produces many temporary files, which are
1983stored in the current directory. They are deleted after successful
1984termination of the program.
1985
1986@c @menu
1987@c * con command line options::
1988@c * con usage example::
1989@c * con hints::   
1990@c @end menu
1991
1992
1993@c -------------------------------------------------------------------------------
1994@c FLA
1995@c -------------------------------------------------------------------------------
1996
1997@page
1998@node fla
1999@section fla - the UTT file flattener
2000
2001@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2002@item @strong{Authors:}                 @tab Tomasz Obrębski
2003@item @strong{Input format:}            @tab UTT regular
2004@item @strong{Output format:}           @tab UTT flattened
2005@item @strong{Required annotation:}     @tab sen
2006@end multitable
2007@c
2008
2009@menu
2010* fla description::
2011@c * fla command line options::
2012@c * fla usage example::
2013@end menu
2014
2015
2016@node fla description
2017@subsection Description
2018
2019@command{fla} ``flattens'' a UTT file by merging the segments belonging
2020to one sentence into one line. Technically, the end-of-line characters
2021('\n', ASCII code 10) within a sentence are replaced with form-feed characters ('\f',
2022ASCII code 12).  Flattening makes it possible to process UTT files
2023sentence by sentence with tools such as @command{grep} or @command{sed}
2024(this is used in @command{grp} and @command{mar}).
2025
2026Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.
2027
2028Flattened files are still human-readable.
2029
2030Usage:
2031
2032@example
2033        fla [<bosregex>]
2034@end example
2035
2036The optional argument is a regular expression describing segments
2037which should be treated as sentence beginnings (the test is: the
2038segment contains a fragment matching the @code{<bosregex>}). By
2039default, segments containing a field @code{BOS} are sought.
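
@noindent The transformation and its inverse can be sketched in Python
as follows. This is a minimal illustration, not the @command{fla} and
@command{unfla} programs themselves: the function names and the default
expression @code{\bBOS\b} are assumptions, and segments preceding the
first sentence beginning are simply collected into one line.

@example
```python
import re

def flatten(text, bos_regex=r"\bBOS\b"):
    """Put the segments of each sentence on one line, joined by form feeds."""
    sentences = []
    for line in text.split("\n"):
        if not line.strip():
            continue
        # a segment matching bos_regex starts a new sentence
        if re.search(bos_regex, line) or not sentences:
            sentences.append([line])
        else:
            sentences[-1].append(line)
    return "".join("\f".join(s) + "\n" for s in sentences)

def unflatten(text):
    """Restore the regular format by turning form feeds back into newlines."""
    return text.replace("\f", "\n")
```
@end example

@noindent Since only the separator between segments changes, applying
@code{unflatten} to the result of @code{flatten} gives back the
original segment lines.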
2040
2041@c -------------------------------------------------------------------------------
2042@c UNFLA
2043@c -------------------------------------------------------------------------------
2044
2045@page
2046@node unfla
2047@section unfla - the UTT file unflattener
2048
2049@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2050@item @strong{Authors:}                 @tab Tomasz Obrębski
2051@item @strong{Input format:}            @tab UTT flattened
2052@item @strong{Output format:}           @tab UTT regular
2053@item @strong{Required annotation:}     @tab -
2054@end multitable
2055
2056@menu
2057* unfla description::
2058@c * fla command line options::
2059@c * fla usage example::
2060@end menu
2061
2062@node unfla description
2063@subsection Description
2064@command{unfla} transforms a flattened UTT file, produced by
2065@command{fla}, into the regular format by restoring end-of-line
2066characters.
2067
2068
2069
2070
2071@c ---------------------------------------------------------------------
2072@c USAGE EXAMPLES
2073@c ---------------------------------------------------------------------
2074
2075@node Usage examples
2076@chapter Usage examples
2077
2078@subsubheading Simple pipelines
2079
2080@enumerate
2081
2082@item tokenization
2083
2084cat text | tok > output1
2085
2086@item morphological annotation (1)
2087
2088simple dictionary based lemmatization
2089
2090cat text | tok | lem > output1
2091
2092@item morphological annotation (2)
2093
20941) perform dictionary-based lemmatization
20952) guess descriptions for words which have no annotation
2096
2097@example
2098cat text | tok | lem | gue -S lem > output2
2099@end example
2100
@item Morphological annotation (3)

@enumerate
@item perform dictionary-based lemmatization
@item try to correct words with no annotation
@item perform dictionary-based lemmatization of corrected words
@item guess descriptions for words which still have no annotation
@end enumerate

@example
cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
@end example

@item Spelling correction

@example
cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
@end example

@item Expression extraction

Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.

@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
@end example

@item A word in context

Extraction of text fragments containing a form of the lexeme 'rozmowa' in
the context of the 5 preceding and 5 following corpus segments.

@example
cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
@end example

@item Generation of a concordance table (1)

@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
@end example

Execution time: 10 s.

@item Generation of a concordance table (2)

The same as above, but much faster:

@example
cat text | tok | lem -1 | \
grp -e 'cat(<V>) space lexeme(rozmowa)' | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | \
con
@end example

Execution time: 2 s.

@item Generation of a concordance table (3)

Usually, one searches the same corpus repeatedly. In such a case it is
advisable to first transform the corpus data into the format required
by @command{grp}, and then to search the preprocessed data.

As @command{grp} (@command{grep}) processes data faster than it is
read from the disk drive, the search time can be shortened further by
using file compression techniques.  We suggest using the
@command{lzop} compressor/decompressor.

@item The fastest way to search a large corpus

Step 1: corpus preprocessing

@example
cat corpus | tok | sen | lem -1 \
| fla | lzop -7 > corpus.grp.lzo
@end example

Step 2: search

@example
lzop -cd corpus.grp.lzo | unfla | \
grp -e 'cat(<V>) space lexeme(rozmowa)' | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | \
con
@end example

@end enumerate
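
Since the search step of the last example is typically repeated with
different patterns, it can be wrapped in a small shell function.  This
is only a sketch: the function name @code{search_corpus} and its
argument order are assumptions made for illustration; the commands and
options are exactly those used in the search step above.

@example
# Usage: search_corpus <compressed-corpus> <pattern>
# Runs the search pipeline on a corpus prepared with fla and lzop.
# The pattern is given once and passed to both grp and ser.
# The function body is a subshell, so nothing leaks into the caller.
search_corpus() (
  lzop -cd "$1" | unfla | grp -e "$2" | ser -e "$2" | con
)
@end example

With this definition,
@code{search_corpus corpus.grp.lzo 'cat(<V>) space lexeme(rozmowa)'}
runs the same search as the search step above.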

@c @subsubheading More complicated configurations


@c @example
@c mknod fifo1 p
@c mknod fifo2 p
@c mknod fifo3 p
@c mknod fifo4 p
@c mknod fifo5 p

@c tok | lem -p W -e fifo1 > fifo2 &
@c cor -e fifo3 < fifo1 | lem > fifo4 &
@c gue < fifo3 > fifo5 &
@c sort -m fifo2 fifo4 fifo5

@c rm fifo?
@c @end example


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c ---------------------------------------------------------------------
@c PMDBF DICTIONARY
@c ---------------------------------------------------------------------

@node PMDBF dictionary
@chapter PMDBF dictionary

UTT components come with lexical data derived from the Polish
Morphological Database (PMDB).

@menu
* PMDBF files::   
* PMDBF tag structure::                 
* PMDBF parts of speech::           
* PMDBF morphosyntactic attributes::           
@end menu

@node PMDBF files
@section Files

@node PMDBF tag structure
@section Tag structure

@example
pos   = [[:upper:]]+
attr  = [[:upper:]]+
val   = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
descr = pos ( / ( attr val + ) + ) ?
@end example
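
The grammar above can be transcribed into a POSIX extended regular
expression and checked with @command{grep}.  The following is only a
sketch: the names @code{descr_re} and @code{is_descr} and the sample
tags are made up for illustration and are not part of UTT or of the
dictionary data.

@example
# ERE transcription of the descr rule.  The trailing hyphen inside the
# bracket expression is a literal hyphen; the newline exclusion in val
# is implicit, because grep matches one line at a time.
descr_re='^[[:upper:]]+(/([[:upper:]]+([[:lower:][:digit:]?!*+-]|<[^>]+>)+)+)?$'

# Succeeds if the first argument is a well-formed tag description.
# The function body is a subshell, so nothing leaks into the caller.
is_descr() (
  printf '%s\n' "$1" | grep -Eq "$descr_re"
)

# Classify a few made-up candidates.
for d in 'N' 'N/GfNsCn' 'V/AiVpMdTrP3Ns' 'n/Cn' 'N/' 'N/C'; do
  if is_descr "$d"; then echo "valid   $d"; else echo "invalid $d"; fi
done
@end example

The first three candidates are accepted.  The last three are rejected:
the part of speech is not in upper case, the slash is not followed by
any attribute, and the attribute has no value, respectively.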

@node PMDBF parts of speech
@section Parts of speech

@multitable {ADJPRP} { adjectival-passive-participle }
@item @code{N} @tab noun
@item @code{NPRO} @tab nominal-pronoun
@item @code{NV} @tab deverbal-noun
@item @code{V} @tab verb
@item @code{BYC} @tab byc (the verb @emph{być}, ``to be'')
@item @code{VNI} @tab non-inflected-verb
@item @code{ADJ} @tab adjective
@item @code{ADJPAP} @tab adjectival-passive-participle
@item @code{ADJPRP} @tab adjectival-present-participle
@item @code{ADJPP} @tab adjectival-past-participle
@item @code{ADJPRO} @tab adjectival-pronoun
@item @code{ADJNUM} @tab adjectival-numeral
@item @code{ADV} @tab adverb
@item @code{ADVANP} @tab adverbial-anterior-participle
@item @code{ADVPRP} @tab adverbial-present-participle
@item @code{ADVPRO} @tab adverbial-pronoun
@item @code{ADVNUM} @tab adverbial-numeral
@item @code{P} @tab preposition
@item @code{PPRO} @tab prep-noun-pronoun
@item @code{CONJ} @tab conjunction
@item @code{EXCL} @tab exclamation
@item @code{APP} @tab call
@item @code{ONO} @tab onomatopoeia
@item @code{PART} @tab particle
@item @code{NUMCRD} @tab cardinal-numeral
@item @code{NUMCOL} @tab collective-numeral
@item @code{NUMPAR} @tab partitive-numeral
@item @code{NUMORD} @tab ordinal-numeral
@end multitable

@node PMDBF morphosyntactic attributes
@section Morphosyntactic attributes

@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@c @headitem Attr @tab Val @tab Description
@item @code{A} @tab @tab Aspect
@item @tab @code{p} @tab perfect,
@item @tab @code{i} @tab imperfect.
@item
@item @code{V} @tab @tab Verb-Form
@item @tab @code{b} @tab infinitive,
@item @tab @code{p} @tab personal,
@item @tab @code{i} @tab impersonal.
@item
@item @code{M} @tab @tab Mood
@item @tab @code{d} @tab declarative,
@item @tab @code{c} @tab conditional,
@item @tab @code{i} @tab imperative.
@item
@item @code{T} @tab @tab Tense
@item @tab @code{a} @tab past,
@item @tab @code{r} @tab present,
@item @tab @code{f} @tab future.
@item
@item @code{P} @tab @tab Person
@item @tab @code{1} @tab 1,
@item @tab @code{2} @tab 2,
@item @tab @code{3} @tab 3.
@item
@item @code{D} @tab @tab Degree
@item @tab @code{p} @tab positive,
@item @tab @code{c} @tab comparative,
@item @tab @code{s} @tab superlative.
@item
@item @code{N} @tab @tab Number
@item @tab @code{s} @tab singular,
@item @tab @code{p} @tab plural.
@item
@item @code{C} @tab @tab Case
@item @tab @code{n} @tab nominative,
@item @tab @code{g} @tab genitive,
@item @tab @code{d} @tab dative,
@item @tab @code{a} @tab accusative,
@item @tab @code{i} @tab instrumental,
@item @tab @code{l} @tab locative,
@item @tab @code{v} @tab vocative.
@item
@item @code{G} @tab @tab Gender
@item @tab @code{p} @tab masculine-personal,
@item @tab @code{a} @tab masculine-animal,
@item @tab @code{i} @tab masculine-inanimate,
@item @tab @code{f} @tab feminine,
@item @tab @code{n} @tab neuter.
@end multitable
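
Read together with the parts-of-speech table, the attribute table
allows a tag description to be decoded by splitting it into its part
of speech and its attribute/value pairs.  The following is a minimal
sketch under stated assumptions: the function name @code{split_descr}
and the sample tag are made up for illustration, and values written
in the @code{<...>} form are not handled.

@example
# Print the part of speech and the attribute=value pairs of a tag
# description.  Every run of upper-case letters after the slash is
# taken to be an attribute name; what follows it is its value.
# The function body is a subshell, so nothing leaks into the caller.
split_descr() (
  pos=$(printf '%s\n' "$1" | cut -d/ -f1)
  attrs=$(printf '%s\n' "$1" | cut -s -d/ -f2 |
          sed -E 's/([[:upper:]]+)/ \1=/g')
  printf 'pos=%s%s\n' "$pos" "$attrs"
)

split_descr 'V/AiVpMdTrP3Ns'
@end example

For this made-up tag the function prints
@code{pos=V A=i V=p M=d T=r P=3 N=s}, which the tables above decode
as: verb, imperfect aspect, personal form, declarative mood, present
tense, third person, singular.  Note that @code{V} and @code{N} name
attributes after the slash and parts of speech before it.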


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------
@c
@c @node Examples
@c @chapter Examples

@c ----------------------------------------------------------------------
@c ----------------------------------------------------------------------

@node    GNU Free Documentation License
@chapter GNU Free Documentation License

@c The GNU Free Documentation License.
@center Version 1.2, November 2002

@c This file is intended to be included within another document,
@c hence no sectioning command or @node.

@display
Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@enumerate 0
@item
PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document @dfn{free} in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of ``copyleft'', which means that derivative
works of the document must themselves be free in the same sense.  It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does.  But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.  We recommend this License
principally for works whose purpose is instruction or reference.

@item
APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License.  Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein.  The ``Document'', below,
refers to any such manual or work.  Any member of the public is a
licensee, and is addressed as ``you''.  You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.

A ``Modified Version'' of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A ``Secondary Section'' is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject.  (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.)  The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The ``Invariant Sections'' are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.  If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant.  The Document may contain zero
Invariant Sections.  If the Document does not identify any Invariant
Sections then there are none.

The ``Cover Texts'' are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.  A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.

A ``Transparent'' copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters.  A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text.  A copy that is not ``Transparent'' is called ``Opaque''.

Examples of suitable formats for Transparent copies include plain
@sc{ascii} without markup, Texinfo input format, La@TeX{} input
format, @acronym{SGML} or @acronym{XML} using a publicly available
@acronym{DTD}, and standard-conforming simple @acronym{HTML},
PostScript or @acronym{PDF} designed for human modification.  Examples
of transparent image formats include @acronym{PNG}, @acronym{XCF} and
@acronym{JPG}.  Opaque formats include proprietary formats that can be
read and edited only by proprietary word processors, @acronym{SGML} or
@acronym{XML} for which the @acronym{DTD} and/or processing tools are
not generally available, and the machine-generated @acronym{HTML},
PostScript or @acronym{PDF} produced by some word processors for
output purposes only.

The ``Title Page'' means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, ``Title Page'' means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section ``Entitled XYZ'' means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language.  (Here XYZ stands for a
specific section name mentioned below, such as ``Acknowledgements'',
``Dedications'', ``Endorsements'', or ``History''.)  To ``Preserve the Title''
of such a section when you modify the Document means that it remains a
section ``Entitled XYZ'' according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document.  These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.

@item
VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License.  You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.

@item
COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.

@item
MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it.  In addition, you must do these things in the Modified Version:

@enumerate A
@item
Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document).  You may use the same title as a previous version
if the original publisher of that version gives permission.

@item
List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.

@item
State on the Title page the name of the publisher of the
Modified Version, as the publisher.

@item
Preserve all the copyright notices of the Document.

@item
Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

@item
Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.

@item
Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.

@item
Include an unaltered copy of this License.

@item
Preserve the section Entitled ``History'', Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page.  If
there is no section Entitled ``History'' in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.

@item
Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on.  These may be placed in the ``History'' section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.

@item
For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.

@item
Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles.  Section numbers
or the equivalent are not considered part of the section titles.

@item
Delete any section Entitled ``Endorsements''.  Such a section
may not be included in the Modified Version.

@item
Do not retitle any existing section to be Entitled ``Endorsements'' or
to conflict in title with any Invariant Section.

@item
Preserve any Warranty Disclaimers.
@end enumerate

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.  To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled ``Endorsements'', provided it contains
nothing but endorsements of your Modified Version by various
parties---for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.  Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity.  If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.

@item
COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy.  If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled ``History''
in the various original documents, forming one section Entitled
``History''; likewise combine any sections Entitled ``Acknowledgements'',
and any sections Entitled ``Dedications''.  You must delete all
sections Entitled ``Endorsements.''

@item
COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.

@item
AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

@item
TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections.  You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers.  In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

@item
TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License.  Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License.  However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See
@uref{http://www.gnu.org/copyleft/}.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation.  If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.
@end enumerate

@page
@heading ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
  Copyright (C)  @var{year}  @var{your name}.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
  Texts.  A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with@dots{}Texts.'' line with this:

@smallexample
@group
    with the Invariant Sections being @var{list their titles}, with
    the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
    being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node    Reporting bugs
@chapter Reporting bugs

Report bugs to <obrebski@@amu.edu.pl>.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c @node    Copyright
@c @chapter Copyright
@c
@c Copyright 2004 by Tomasz Obrębski
@c This software is free for research and educational use.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node    Author
@chapter Author


@bye