source: app/doc/utt.texinfo @ 91ed676

Last change on this file since 91ed676 was e28a625, checked in by obrebski <obrebski@…>, 16 years ago

This line and the following ones will be ignored--

M app/dist/files/README

updated

M app/doc/utt.texinfo

additions

M app/src/gue/Makefile

static libraries

M app/src/cor/cmdline_cor.ggo

removed non-working parameters

M app/src/cor/Makefile

static libraries

M app/src/common/cmdline_common.ggo

?

M app/src/kor/Makefile

static libraries

M app/src/lem/Makefile

static libraries

M lang/dist/tarball/Makefile

packaging language modules one at a time

M lang/Makefile

-"-

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@61 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

1\input texinfo   @c -*-texinfo-*-
2@documentencoding ISO-8859-2
3@c @documentlanguage pl
4
5@c %**start of header
6@setfilename utt.info
7@settitle UAM Text Tools v0.90
8@c %**end of header
9
10@copying
11This manual is for UAM Text Tools (version 0.90, October, 2008)
12
13Copyright @copyright{}  2005, 2007  Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
14
15Permission is granted to copy, distribute and/or modify this document
16under the terms of the GNU Free Documentation License, Version 1.2 or
17any later version published by the Free Software Foundation; with no
18Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.  A
19copy of the license is included in the section entitled ``GNU Free
20Documentation License''.
21
22@c @quotation
23@c Permission is granted to ...
24@c No permission is granted until the document is completed.
25@c @end quotation
26@end copying
27
28
29@titlepage
30@title UAM Text Tools 0.90 - User Manual
31@subtitle edition 0.01, @today
32@subtitle status: prescript
33@author by Justyna Walkowska, Tomasz Obr@,{}ebski and Micha@l{} Stolarski
34@page
35@vskip 0pt plus 1filll
36@insertcopying
37@end titlepage
38
39@contents
40
41@c @paragraphindent none
42
43@iftex
44@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
45@end iftex
46
47@c @headings off
48@c @everyheading LEM(1) @| @| LEM(1)
49@everyfooting @today @c @| @thispage @|
50
51@ifnottex
52
53@node Top
54@top UTT - UAM Text Tools
55
56@insertcopying
57
58@menu
59* General information::                       
60* UTT file format::             
61* Configuration files::         
62* UTT components::
63* Auxiliary tools::
64* Usage examples::             
65* PMDBF dictionary::           
66@c * Examples::                   
67@c * Copyright::
68* GNU Free Documentation License::
69* Reporting bugs::                                   
70* Author::                     
71@end menu
72@end ifnottex
73
74
75@c ----------------------------------------------------------------------
76
77@node General information
78@chapter General information
79
80UAM Text Tools (UTT) is a package of language processing tools
81developed at Adam Mickiewicz University. Its functionality includes:
82
83@itemize @bullet
84
85@item
86tokenization
87@item
88dictionary-based morphological analysis
89@item
90heuristic morphological analysis of unknown words
91@item
92spelling correction
93@item
94pattern search
95@item
96sentence splitting
97@item
98generation of concordance tables
99@end itemize
100
101The toolkit is intended for processing raw (unannotated),
102unrestricted text for any conceivable purpose.
103
104The system is organized as a collection of command-line programs, each
105performing one operation, e.g. tokenization, lemmatization, or spelling
106correction. The components are independent of one another; the
107unifying element is the uniform i/o file format.
108
109The components may be combined in various ways to provide various text
110processing services. New components supplied by the user can also be
111easily incorporated into the system, provided that they respect the i/o
112file format conventions.
113
114UTT component programs do not depend on any specific tagset or
115morphological description format.
116
117UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
118the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
119
120The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use. 
121
122
123List of contributors:
124
125@itemize
126@item Pawel Konieczka
127@item Tomasz Obrebski
128@item Michal Stolarski
129@item Marcin Walas
130@item Justyna Walkowska
131@item Pawel Werenski
132@end itemize
133
134@c ----------------------------------------------------------------------
135@c ---------------------------------------------------------------------
136
137@node    UTT file format
138@chapter UTT file format
139
140A UTT file contains annotation of a text. It consists of a sequence of
141segments. Each segment explicitly refers to a continuous piece of the
142text and provides some information on it.
143
144@section Segment format
145
146A segment occupies one line of a UTT file and consists of
147space-separated fields:
148
149
150@quotation
151@sp 1
152[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
153@sp 1
154@end quotation
155
156@table @var
157
158@item @var{start}
159Non-negative integer value indicating the position in the source text where the
160segment starts.
161
162@item @var{length}
163Non-negative integer value indicating the length of the segment.
164
165@item @var{type}
166A sequence of characters containing no spaces or digits (digits could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
167@var{type} reflects the main classification of segments
168into words, numbers, punctuation marks, and meta-text markers.
169@xref{tok output,,tok output}, for description of automatically recognized type markers.
170
171@item @var{form}
172This field contains the textual form of the segment or the special
173symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
174
175The characters or character sequences that have special meaning in the
176@var{form} field are enumerated below.
177
178Characters with special meaning:
179
180@itemize
181@item @code{_} - space character
182@item @code{*} - undefined contents
183@end itemize
184
185Escape sequences:
186
187@itemize
188@item @code{\n} - new line
189@item @code{\t} - tabulation
190@item @code{\r} - carriage return 
191
192@item @code{\_} - the @code{_} character
193@item @code{\*} - the @code{*} character
194@item @code{\\} - the @code{\} character
195
196@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
197@end itemize
198
199@item @var{annotation1}
200@item @var{annotation2}
201@item ...
202Annotation fields have the following format:
203
204@var{longname} @code{:} @var{value}
205
206or
207
208@var{shortname} @var{value}
209
210where @var{longname} is a string of alphanumeric characters
211(isalnum() test), @var{shortname} is a single non-alphanumeric character
212(ispunct() test), and @var{value} is an arbitrary string of non-blank characters.
213
214@end table
215
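
The special characters and escape sequences of the @var{form} field can be decoded mechanically. The following Python sketch is only an illustration under stated assumptions: the function name @code{decode_form} and the convention of returning @code{None} for a bare @code{*} are hypothetical and are not part of UTT.

```python
# Illustrative sketch only: decode the special characters of a UTT form field.
# The name decode_form and the None-for-"*" convention are assumptions.

_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "_": "_", "*": "*", "\\": "\\"}


def decode_form(field):
    """Return the text encoded by a form field, or None for a bare '*'."""
    if field == "*":
        return None  # the form is not given
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] in _ESCAPES:
            out.append(_ESCAPES[field[i + 1]])  # two-character escape sequence
            i += 2
        elif ch == "_":
            out.append(" ")  # an unescaped underscore stands for a space
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(decode_form("dobre_programy")))
print(repr(decode_form(r"a\_b\n")))
```

Unknown backslash sequences are passed through unchanged here; a real component may treat them differently.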
216
217Only two fields are mandatory: @var{type} and @var{form}. All other fields
218may be absent. When only one number precedes the
219@var{type} field, it is interpreted as the @var{start} position.
220
221If the @var{length} field is omitted, the length of the segment is the
222length of the @var{form} field, except when the value of the
223@var{form} field is @code{*} -- in this case, the length is assumed to
224be 0.
225
226If the @var{start} field is also absent, the segment is assumed to directly
227follow the preceding one.
228
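
The defaulting rules for @var{start} and @var{length} can be sketched as a small Python parser. This is a hedged illustration, not the code used by the UTT components: the names @code{parse_segment} and @code{split_annotation}, the dictionary layout of the result, and the choice to count an escape sequence as one character are all assumptions.

```python
import re

# Illustrative sketch only; names and result layout are assumptions.


def split_annotation(field):
    """Split 'longname:value' or 'shortname value' into (name, value)."""
    m = re.fullmatch(r"([^\W_]+):(.*)", field)
    if m:
        return m.group(1), m.group(2)
    return field[0], field[1:]  # single non-alphanumeric short name


def parse_segment(line, prev_end=0):
    """Parse one segment line; prev_end is where the preceding segment ended."""
    fields = line.split()
    numbers = []
    # Up to two leading numeric fields are start and length.
    while fields and len(numbers) < 2 and re.fullmatch(r"[0-9]+", fields[0]):
        numbers.append(int(fields.pop(0)))
    if len(fields) < 2:
        raise ValueError("type and form are mandatory: %r" % line)
    seg_type, form = fields[0], fields[1]
    start = numbers[0] if numbers else prev_end
    if len(numbers) == 2:
        length = numbers[1]
    elif form == "*":
        length = 0
    else:
        # Assumption: an escape sequence such as \n counts as one character.
        length = len(re.sub(r"\\.", "x", form))
    return {
        "start": start,
        "length": length,
        "type": seg_type,
        "form": form,
        "annotations": [split_annotation(f) for f in fields[2:]],
    }


seg = parse_segment("0014 08 W progrumy cor:programy lem:program,N")
print(seg["start"], seg["length"], seg["type"], seg["annotations"])
```

When reading a file without position fields, the caller would pass the previous segment's @code{start + length} as @code{prev_end}.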
229@c Conventions:
230
231@c Annotation fields with predefined meaning:
232
233@c @itemize
234@c @item @code{!} - UTT components are allowed to modify the contents of
235@c the @var{form} field (e.g. spelling correction does this). If this happens the
236@c original form of the segment have to be placed in the @code{!}-field.
237@c @item @code{@@} - morphological description
238@c @item @code{=} - node identifier assignment (used in graph encoding)
239@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
240@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
241@c @end itemize
242
243Segments of length 0 may be used to mark file positions with some
244information. See e.g. BOS and EOS (beginning/end of sentence) markers
245in the example below.
246
247Example:
248
249sentence: @samp{Piszemy dobre progrumy.}
250
251@example
2520000 00 BOS *
2530000 07 W Piszemy lem:pisać,V
2540007 01 S _
2550008 05 W dobre lem:dobry,ADJ
2560013 01 S _
2570014 08 W progrumy cor:programy lem:program,N
2580022 01 P .
2590023 00 EOS *
2600023 01 S _
2610024 00 BOS *
2620024 11 W Warszawiacy lem:Warszawiak,N
2630035 01 S _
2640036 03 W też
2650039 01 P .
2660040 00 EOS *
267
268@end example
269
270@example
2710000 BOS *
2720000 W Piszemy lem:pisać,V
2730007 S _
2740008 W dobre lem:dobry,ADJ
2750013 S _
2760014 W progrumy cor:programy lem:program,N
2770022 P .
2780023 EOS *
279@end example
280
281Position information may be provided only for some types of segments:
282
283@example
2840000 BOS *
285W Piszemy lem:pisać,V
286S _
287W dobre lem:dobry,ADJ
288S _
289W progrumy cor:programy lem:program,N
290P .
291EOS *
292S _
2930024 BOS *
294W Warszawiacy lem:Warszawiak,N
295S _
296W też
297P .
298EOS *
299@end example
300
301Position/length information may be provided only when necessary:
302
303@example
3040000 04 N *
3050000 N 12
306P .
307N 5
308S _
309W km
310@end example
311
312@section UTT File
313
314A UTT file consists of a sequence of segments.  The same text position
315may be covered by multiple segments. In consequence, ambiguous text
316segmentation and ambiguous annotation may be represented.
317
318There are two structural requirements a valid UTT-formatted file
319has to meet:
320
321@itemize @bullet
322
323@item
324segments have to be sorted with respect to the @var{start} field,
325
326@item
327for each
328segment ending at position @var{n}, either there must be a segment starting at
329position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
330for each segment starting at position @var{n}, either there must be a segment
331ending at position @var{n-1}, or the position @var{n-1} must not be covered
332by any segment.
333
334@end itemize
335
336A valid annotation for the text fragment
337@example
33812.5 km
339@end example
340
341may be
342
343@example
3440000 02 N 12
3450000 04 N 12.5
3460002 01 P .
3470003 01 N 5
3480004 01 S _
3490005 02 W km
350@end example
351
352but not
353
354@example
3550000 02 N 12
3560000 04 N 12.5
3570004 01 S _
3580005 02 W km
359@end example
360
361because in the latter example the first segment (starting at position
3620000, 2 characters long) ends at position @var{n}=0001, while position
363@var{n+1}=0002 is covered by the second segment and no segment starts at
364position @var{n+1}=0002.
365
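
The two structural requirements can be checked mechanically. The Python sketch below is an illustration under assumptions, not part of UTT: the function name @code{check_utt_structure} is hypothetical, segments are given as (start, length) pairs in file order, and zero-length marker segments are ignored in the boundary check.

```python
# Illustrative sketch only; the function name and input layout are assumptions.


def check_utt_structure(segments):
    """Return a list of problems for (start, length) pairs given in file order."""
    problems = []
    # Requirement 1: segments are sorted by start position.
    for (a, _), (b, _) in zip(segments, segments[1:]):
        if b < a:
            problems.append("not sorted: start %d follows start %d" % (b, a))
    # Work with half-open spans [start, end); zero-length markers are skipped
    # (assumption: they cover no text and impose no boundary constraint).
    spans = [(s, s + n) for s, n in segments if n > 0]
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}

    def covered(pos):
        return any(s <= pos < e for s, e in spans)

    # Requirement 2: segment boundaries must not fall inside the text covered
    # by other segments unless some segment starts (or ends) there as well.
    for s, e in spans:
        if e not in starts and covered(e):
            problems.append("segment ending at %d: nothing starts at %d" % (e - 1, e))
        if s not in ends and covered(s - 1):
            problems.append("segment starting at %d: nothing ends at %d" % (s, s - 1))
    return problems


valid = [(0, 2), (0, 4), (2, 1), (3, 1), (4, 1), (5, 2)]
invalid = [(0, 2), (0, 4), (4, 1), (5, 2)]
print(check_utt_structure(valid))
print(check_utt_structure(invalid))
```

The two lists correspond to the valid and invalid annotations of @samp{12.5 km} given in this section.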
366
367@section Flattened UTT file
368
369The UTT file format has two variants: regular and flattened. The regular
370format was described above.  In the flattened format some of the
371end-of-line characters are replaced with line-feed characters.
372
373The flattened format is mainly used to represent whole sentences as
374single lines of the input file (all intrasentential end-of-line
375characters are replaced with line-feed characters).
376
377This technical trick makes it possible to perform certain text
378processing operations on entire sentences with tools such as
379@command{grep} (see the @command{grp} component) or @command{sed} (see the @command{mar} component).
380
381The conversion between the two formats is performed by the tools:
382@command{fla} and @command{unfla}.
383
384@section Character encoding
385
386The UTT component programs accept only 1-byte character encodings, such
387as ISO, ANSI, DOS.
388
389
390@c @section Formats
391
392@c @unnumberedsubsubsec Basic format
393
394@c While processing large amounts of the overhead related with explicit
395@c ... of the start position and segment length becomes ... . Therefore,
396@c for efficiency reasons certain shortcuts are possible:
397
398@c @unnumberedsubsubsec Relative start position
399
400@c Start position may be given as relative distance from the last
401@c absolut position.
402
403@c @unnumberedsubsubsec Absent length
404
405@c Segment length may by omitted. Normally it can be restored by counting
406@c the length of the @emph{form field}. For segments with the special value
407@c @code{*} in the @emph{form field} length 0 is assumed.
408
409@c @unnumberedsubsubsec Absent length and start position
410
411@c Both start position and segment length may be omitted. In this format
412@c each segment is assumed to follow the previous one. This format is,
413@c therefore, suitable only for unambiguously tagged text
414@c (0-length markers can be still used.)
415
416
417@c @table @code
418@c @item AL
419@c @code{1234 03 W kot}
420@c @item RL
421@c @code{+56 03 W kot}
422@c @item A
423@c @code{1234 W kot}
424@c @item R
425@c @code{+56 W kot}
426@c @item 0
427@c @code{W kot}
428@c @end table
429
430
431@c [HOW TO GET POLISH FONTS IN DVI???]
432
433@macro parhelp
434@item @b{@minus{}@minus{}help}, @b{@minus{}h}
435Print help.
436@end macro
437
438
439@macro parversion
440@item @b{@minus{}@minus{}version}, @b{@minus{}V}
441Print version information.
442@end macro
443
444@macro parinteractive
445@item @b{@minus{}@minus{}interactive, @minus{}i}
446This option toggles interactive mode, which is by default off. In the
447interactive mode the program does not buffer the output.
448@end macro
449
450
451@c @macro parfile
452@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
453@c Input file name.
454@c If this option is absent or equal to '@minus{}', the program
455@c reads from the standard input.
456@c @end macro
457
458
459@c @macro paroutput
460@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
461@c Regular output file name. To regular output the program sends segments
462@c which it successfully processed and copies those which were not
463@c subject to processing. If this option is absent or equal to
464@c '@minus{}', standard output is used.
465@c @end macro
466
467@c @macro parfail
468@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
469@c Fail output file name. To fail output the program copies the segments
470@c it failed to process.  If this option is absent or equal to
471@c '@minus{}', standard output is used.
472@c @end macro
473
474
475@c @macro parcopy
476@c @item @b{@minus{}@minus{}copy, @minus{}c}
477@c Copy succesfully processed segments to regular output also in their
478@c original input form.
479@c @end macro
480
481
482@macro parinputfield
483@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
484The field containing the input to the program. The default is the
485@var{form} field. The fields @var{position}, @var{length}, @var{type},
486and @var{form} are referred to as @code{1}, @code{2}, @code{3},
487@code{4}, respectively.
488@end macro
489
490
491@macro paroutputfield
492@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
493The name of the field added by the program. The default is the name of the program.
494@end macro
495
496
497@macro pardictionary
498@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
499Dictionary file name.
500@end macro
501
502
503@macro parprocess
504@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
505Process segments with the specified value in the @var{type} field.
506Multiple occurrences of this option are allowed and are interpreted as
507disjunction. If this option is absent, all segments are processed.
508@end macro
509
510
511@macro parselect
512@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
513Select for processing only segments in which the field named
514@var{fieldname} is present. Multiple occurrences of this option are
515allowed and are interpreted as conjunction of conditions. If this
516option is absent, all segments are processed.
517@end macro
518
519
520@macro parunselect
521@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
522Select for processing only segments in which the field @var{fieldname}
523is absent.  Multiple occurrences of this option are allowed and are
524interpreted as conjunction of conditions. If this option is absent,
525all segments are processed.
526@end macro
527
528
529@macro paroneline
530@item @b{@minus{}@minus{}one-line}
531This option makes the program print ambiguous annotation in one output
532line by generating multiple annotation fields. By default, when
533ambiguous annotation may be produced for a segment, the segment is
534repeated and each of the annotations is added to a separate copy of
535the segment.
536@end macro
537
538
539@macro paronefield
540@item @b{@minus{}@minus{}one-field, @minus{}1}
541This option makes the program print ambiguous annotation in one
542annotation field. By default, when ambiguous annotation may be produced
543for a segment, the segment is repeated and each of the
544annotations is added to a separate copy of the segment.
545
546This option is useful when working with @command{kot} or @command{con}.
547@end macro
548
549
550@c ---------------------------------------------------------------------
551@c CONFIGURATION FILES
552@c ---------------------------------------------------------------------
553
554@node    Configuration files
555@chapter Configuration files
556
557Values for all command line options accepted by a component
558may be set in configuration files. The default locations of the
559configuration files for a component named @command{@var{program}} are
560
561@example
562        @file{/usr/local/etc/utt/@var{program}.conf}
563@end example
564
565for system-wide configuration file and
566
567@example
568        @file{~/.utt/@var{program}.conf}
569@end example
570
571for user configuration file.
572
573@c The configuration file to load may be also specified with the
574@c @option{--config} option. Configuration file need not be provided.
575
576For each option, the value is set according to the following priority:
577
578@itemize
579@item command line
580@c @item configuration file indicated with @option{--config} option
581@item user configuration file (or configuration file indicated with the @option{--config} option)
582@item system-wide configuration file
583@end itemize
584
585Parameter values are specified in the following format:
586
587@var{parametername}=@var{value}
588
589where @var{parametername} is the short or long name of an option accepted by
590the program, or
591
592@var{parametername}
593
594if the option does not need arguments.
595
596You can introduce comments to configuration files using the # sign.
597
598If a program accepts multiple occurrences of an option (e.g. @command{lem}'s @option{--select} option), you can specify them on separate lines of the program's configuration file.
599
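
The file syntax and the priority order can be illustrated with a short Python sketch. It is not the parser used by the UTT programs. The function names are hypothetical, and two behaviours are assumptions: everything after a @code{#} is treated as a comment, and an option set in a higher-priority source replaces all values of that option from lower-priority sources.

```python
# Illustrative sketch only; names and merge behaviour are assumptions.


def parse_conf(text):
    """Parse configuration text into {option name: [values]}."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        name, sep, value = line.partition("=")
        # Options without an argument are recorded as True.
        options.setdefault(name.strip(), []).append(value.strip() if sep else True)
    return options


def effective_options(command_line, user_conf, system_conf):
    """Merge three sources; later updates win, so the command line has priority."""
    merged = {}
    for source in (system_conf, user_conf, command_line):
        merged.update(source)
    return merged


user = parse_conf("# user settings\nselect=lem\nselect=cor\none-line\n")
system = parse_conf("dictionary=/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin\n")
print(effective_options({"dictionary": ["my.bin"]}, user, system))
```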
600@c The equal sign may be omitted.
601
602
603@quotation Tip
604If you have two (or more) frequently used sets of options for the same
605program (e.g. lem with the PMDBF dictionary and lem with a user dictionary),
606a good solution is to create two soft links to lem, called
607e.g. lemg and lemu, and specify their configuration in the files lemg.conf
608and lemu.conf respectively.
609@end quotation
610
611@c ---------------------------------------------------------------------
612@c COMPONENTS
613@c ---------------------------------------------------------------------
614
615@node UTT components
616@chapter UTT components
617
618UTT components are of three types:
619
620@menu
621Sources: programs which read non-UTT data (e.g. raw text) and produce output
622in UTT format
623* tok::         a tokenizer
624
625Filters: programs which read and produce UTT-formatted data
626* lem::         a morphological analyzer
627* gue::         a morphological guesser
628* cor::         a simple spelling corrector
629* kor::         a more elaborate spelling corrector
630* sen::         a sentence splitter
631* ser::         a pattern search tool (marks matches)
632* mar::         a pattern search tool (introduces arbitrary markers into the text)
633* grp::         a pattern search tool (selects sentences containing a match)
634@c * gph::         a word-graph annotation tool::
635@c * dgp::         a dependency parser
636
637Sinks: programs which read UTT data and produce output in another format
638* kot::         an untokenizer
639* con::         a concordance table generator
640@end menu
641
642@c ---------------------------------------------------------------------
643@c TOK
644@c ---------------------------------------------------------------------
645
646@page
647@node tok
648@section tok - a tokenizer
649
650@c ----------------------------------------
651
652@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
653@item @strong{Authors:}                 @tab Tomasz Obrębski
654@item @strong{Component category:}      @tab source
655@item @strong{Input format:}            @tab raw text file
656@item @strong{Output format:}           @tab UTT regular
657@item @strong{Required annotation:}     @tab -
658@end multitable
659
660
661@menu
662* tok description::
663* tok input::
664* tok output::
665* tok command line options::
666* tok example::
667@end menu
668
669@node tok description
670@subsection Description
671
672@code{tok} is a simple program which reads a text file and identifies
673tokens on the basis of their orthographic form.  The type of the token
674is printed as the @var{type} field.
675
676@node tok input
677@subsection Input
678
679Raw text.
680
681@node tok output
682@subsection Output
683
684UTT-file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
685
686@itemize
687
688@item @code{W}
689(word)
690- continuous sequence of letters
691
692@item @code{N}
693(number)
694- continuous sequence of digits
695
696@item @code{S}
697(space)
698- continuous sequence of space characters
699
700@item @code{P}
701(punctuation mark)
702- single printable characters not belonging to any of the other classes
703
704@item @code{B}
705(unprintable character)
706- single unprintable character
707
708@end itemize
709
710
711
712@node tok command line options
713@subsection Command line options
714
715@table @code
716
717@item @b{@minus{}@minus{}help}, @b{@minus{}h}
718Print help.
719
720@item @b{@minus{}@minus{}version}, @b{@minus{}V}
721Print version information.
722
723@item @b{@minus{}@minus{}interactive, @minus{}i}
724This option toggles interactive mode, which is by default off. In the
725interactive mode the program does not buffer the output.
726
727@end table
728
729@node tok example
730@subsection Example
731
732Input:
733
734@example
735Piszemy dobre programy.
736@end example
737
738Output:
739
740@example
7410000 07 W Piszemy
7420007 01 S _
7430008 05 W dobre
7440013 01 S _
7450014 08 W programy
7460022 01 P .
7470023 01 S \n
748@end example
749
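
The five token classes can be approximated in a few lines of Python. This is a rough sketch, not the @command{tok} implementation: it classifies characters with Python's Unicode predicates instead of a 1-byte locale, and the names @code{char_class} and @code{tokenize} are hypothetical. For the input of the example above it produces the same seven lines.

```python
from itertools import groupby

# Illustrative sketch only; tok itself works on 1-byte encodings.

_ENCODE = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
           "_": "\\_", "*": "\\*", "\\": "\\\\"}


def char_class(ch):
    """Map a character to one of the token types W, N, S, P, B."""
    if ch.isalpha():
        return "W"
    if ch in "0123456789":
        return "N"
    if ch.isspace():
        return "S"
    if ch.isprintable():
        return "P"
    return "B"


def tokenize(text):
    """Return UTT lines with start, length, type and form fields."""
    lines = []
    pos = 0
    for cls, group in groupby(text, key=char_class):
        run = "".join(group)
        # W, N and S tokens are runs; P and B tokens are single characters.
        pieces = [run] if cls in ("W", "N", "S") else list(run)
        for piece in pieces:
            form = "".join(_ENCODE.get(c, c) for c in piece)
            lines.append("%04d %02d %s %s" % (pos, len(piece), cls, form))
            pos += len(piece)
    return lines


for line in tokenize("Piszemy dobre programy.\n"):
    print(line)
```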
750
751@c ---------------------------------------------------------------------
752@c SEN
753@c ---------------------------------------------------------------------
754
755@c @node sen - sentencizer
756@c @chapter sen - sentencizer
757
758@c Authors: Tomasz Obrębski
759
760@c ---------------------------------------------------------------------
761@c LEM
762@c ---------------------------------------------------------------------
763
764@page
765@node lem
766@section lem - morphological analyzer
767
768@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
769@item @strong{Authors:}                 @tab Tomasz Obrębski, Michał Stolarski
770@item @strong{Component category:}      @tab filter
771@item @strong{Input format:}            @tab UTT regular
772@item @strong{Output format:}           @tab UTT regular
773@item @strong{Required annotation:}     @tab tok
774@end multitable
775
776@menu
777* lem description::             
778* lem command line options::   
779* lem input::
780* lem output::
781* lem example::                 
782* lem dictionaries::           
783* lem hints::           
784@end menu
785
786@node lem description
787@subsection Description
788
789@command{lem} performs morphological analysis of a simple orthographic
790word, returning all its possible morphological annotations,
791disregarding the context.
792
793@c ----------------------------------------
794
795@node lem command line options
796@subsection Command line options
797
798@table @code
799@parhelp
800@parversion
801@parinteractive
802@c @parfile
803@c @paroutput
804@c @parfail
805@c @parcopy
806@parinputfield
807@paroutputfield
808@pardictionary
809@parprocess
810@parselect
811@parunselect
812@paroneline
813@paronefield
814@end table
815
816@c ----------------------------------------
817
818@node lem input
819@subsection Input
820
821@command{lem} reads a UTT file and processes the value of the @var{form} field
822(the input field may be changed with the @option{--input-field} option).
823
824@node lem output
825@subsection Output
826
827@command{lem} adds a new annotation field, whose default name is @code{lem}.  In
828case of ambiguity, either the segment is repeated (default),
829multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
830annotation is produced as the value of a single @code{lem} field
831(@option{--one-field}, @option{-1}):
832
833@itemize @bullet
834
835@item
836unambiguous value format:
837
838@example
839   <lemma>,<descr>
840@end example
841
842@item
843ambiguous value format (@option{--one-field} option)
844
845
846@example
847   <lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
848@end example
849
850(alternative descriptions for the same lemma are separated by commas,
851alternative lemmata are separated by semicolons.)
852
853@end itemize
854
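
A value in the @option{--one-field} format can be taken apart with two splits. The Python sketch below is only an illustration; the function name @code{parse_lem_value} is hypothetical, and it assumes that lemmata and descriptions contain neither commas nor semicolons.

```python
# Illustrative sketch only; assumes no ',' or ';' inside lemmata or descriptions.


def parse_lem_value(value):
    """Split a lem field value into a list of (lemma, [descriptions])."""
    readings = []
    for alternative in value.split(";"):  # alternative lemmata
        lemma, *descriptions = alternative.split(",")  # alternative descriptions
        readings.append((lemma, descriptions))
    return readings


print(parse_lem_value("program,N/GiNpCa,N/GiNpCn,N/GiNpCv"))
```

The unambiguous format is the special case with one lemma and one description.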
855@node lem example
856@subsection Example
857
858Input:
859
860@example
8610000 07 W Piszemy
8620007 01 S _
8630008 05 W dobre
8640013 01 S _
8650014 08 W programy
8660022 01 P .
8670023 01 S \n
868@end example
869
870Output (default):
871
872@example
8730000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
8740007 01 S _
8750008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
8760008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
8770013 01 S _
8780014 08 W programy lem:program,N/GiNpCa
8790014 08 W programy lem:program,N/GiNpCn
8800014 08 W programy lem:program,N/GiNpCv
8810022 01 P .
8820023 01 S \n
883@end example
884
885Output (@option{--one-line} option):
886
887@example
8880000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
8890007 01 S _
8900008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
8910013 01 S _
8920014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
8930022 01 P .
8940023 01 S \n
895@end example
896
897Output (@option{--one-field} option):
898
899@example
9000000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
9010007 01 S _
9020008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
9030013 01 S _
9040014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
9050022 01 P .
9060023 01 S \n
907@end example
908
909@c ----------------------------------------
910
911@node lem dictionaries
912@subsection Dictionaries
913
914@command{lem} requires a dictionary. The dictionary may be provided in
915one of two formats: in text (source) format or in binary (fsa) format.
916
917@subsubheading Text format
918
919Dictionary entries have the following structure:
920
921@example
922<form>;<lemma>,<descr>[;<lemma>,<descr>]
923@end example
924
925@var{lemma} may be given explicitly or in the cut-add format:
926
927@example
928@code{[<cut1><add1>-]<cut2><add2>}
929@end example
930
931meaning: replace prefix of length @code{<cut1>} with
932string @code{<add1>}, replace suffix of length @code{<cut2>} with string
933@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
934@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
935
936Each dictionary entry must be written in one line and must not contain blank characters.
937
938Examples:
939@example
940kot;0,N/GaNsCn
941kota;1,N/GaNsCg;1,N/GaNsCa
942kotu;1,N/GaNsCd
943kotem;2,N/GaNsCi
944kocie;3t,N/GaNsCl;3t,N/GaNsCv
945najbielsi;3-4ały,ADJ/DsNpCnGp
946najbielsze;3-5ały,ADJ/DsNpCnGaifn
947najlepsi;dobry,ADJ/DsNpCnGp
948najlepsze;dobry,ADJ/DsNpCnGaifn
949@end example
950
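
The cut-add notation can be applied to a word form as in the following Python sketch. It is an illustration, not the code used by @command{lem}: the function name @code{apply_cut_add} is hypothetical, and the sketch assumes that the added strings contain no digits and that @code{<add1>} contains no hyphen. A lemma that does not match the pattern is treated as given explicitly.

```python
import re

# Illustrative sketch only; assumes add-strings without digits.
_CUT_ADD = re.compile(r"(?:([0-9]+)([^0-9-]*)-)?([0-9]+)([^0-9]*)")


def apply_cut_add(form, lemma_spec):
    """Return the lemma of form for an explicit or cut-add lemma_spec."""
    m = _CUT_ADD.fullmatch(lemma_spec)
    if m is None:
        return lemma_spec  # lemma given explicitly, e.g. "dobry"
    cut1, add1, cut2, add2 = m.groups()
    if cut1 is not None:
        # Replace the prefix of length cut1 with add1.
        form = add1 + form[int(cut1):]
    # Replace the suffix of length cut2 with add2.
    keep = max(0, len(form) - int(cut2))
    return form[:keep] + add2


print(apply_cut_add("kocie", "3t"))
print(apply_cut_add("najbielsi", "3-4ały"))
print(apply_cut_add("najlepsi", "dobry"))
```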
951
952The mandatory file name extension for a text dictionary is @code{dic}. For large
953dictionaries it is preferable, however, to compile them into binary
954(fsa) format.
955
956@subsubheading Binary format
957
958The mandatory file name extension for a binary dictionary is @code{bin}. To
959compile a text dictionary into binary format, write:
960
961@example
962compiledic <dictionaryname>.dic
963@end example
964
965@subsubheading Polex/PMDBF dictionary
966
967A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
968the distribution as the default dictionary of @command{lem}. It is
969located by default in:
970
971@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}
972
973in local installation or in
974
975@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}
976
977in system installation.
978
979@node lem hints
980@subsection Hints
981
982@subsubheading Combining data from multiple dictionaries
983
984@itemize
985
986@item Apply <dict1>, then apply <dict2> to words which were not annotated.
987
988@example
989lem -d <dict1> | lem -S lem -d <dict2>
990@end example
991
992@item Add annotations from two dictionaries <dict1> and <dict2>.
993
994@example
995lem -c -d <dict1> | lem -S lem -d <dict2>
996@end example
997
998@end itemize
999
1000
1001@c ---------------------------------------------------------------------
1002@c GUE
1003@c ---------------------------------------------------------------------
1004
1005@page
1006@node gue
1007@section gue - morphological guesser
1008
1009@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1010
1011@item @strong{Authors:}                 @tab Michał Stolarski, Tomasz Obrębski
1012@item @strong{Component category:}      @tab filter
1013
1014@end multitable
1015
1016@menu
1017* gue description::   
1018* gue command line options::   
1019* gue example::                 
1020* gue dictionaries::           
1021@end menu
1022
1023
1024@node gue description
1025@subsection Description
1026
1027@command{gue} guesses morphological descriptions of the form contained
1028in the @var{form} field.
1029
1030
1031@node gue command line options
1032@subsection Command line options
1033
1034@table @code
1035
1036@parhelp
1037@parversion
1038@parinteractive
1039@c @parfile
1040@c @paroutput
1041@c @parfail
1042@c @parcopy
1043@parinputfield
1044@paroutputfield
1045@pardictionary
1046@parprocess
1047@parselect
1048@parunselect
1049@paroneline
1050@paronefield
1051
1052@item @b{@minus{}@minus{}delta=@var{n}}
1053Stop displaying answers after a drop in weight, that is, when the weight difference between two subsequent results is more than the delta value (default=`0.2').
1054
1055
1056@item @b{@minus{}@minus{}cut-off=@var{n}}
1057Do not display answers whose weight is lower than the cut-off value (default=`200').
1058
1059
1060@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
1061Guess up to @var{n} descriptions (default=`0', which means `display all results').
1062
1063
1064
1065@end table
1066
1067@node gue example
1068@subsection Example
1069
1070@example
1071command: gue -n 2
1072
1073input:
10740000 07 W smerfny
1075
1076output:
10770000 07 W smerfny gue:,ADJ/CaDpGiNs
10780000 07 W smerfny gue:,ADJ/CnvDpGaipNs
1079@end example
1080                                 
1081
1082@node gue dictionaries
1083@subsection Dictionaries
1084
1085@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
1086The fsa format is created by compiling text-format dictionaries.
1087
1088
1089
1090@subsubheading Text format
1091
1092Dictionary entries have the following structure:
1093
1094@example
1095@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
1096@end example
1097
1098@var{lemma} must be given in the cut-add format:
1099
1100@example
1101@code{[<cut1><add1>-]<cut2><add2>}
1102@end example
(no spaces in between): replace the prefix of length @var{cut1} with the
string @var{add1}, replace the suffix of length @var{cut2} with the string
@var{add2}.
1106
1107
Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}.
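
The cut-add transformation can be sketched in Python as follows. This is
a minimal illustration, not code taken from @command{gue}: the names
@code{CUT_ADD} and @code{apply_cut_add} are hypothetical, lengths are
counted in characters, and the sketch assumes that @var{add1} never
contains a hyphen.

@example
import re

# Optional "<cut1><add1>-" part, then the mandatory "<cut2><add2>" part.
CUT_ADD = re.compile(r"^(?:(\d+)([^-]*)-)?(\d+)(.*)$")

def apply_cut_add(form, code):
    """Turn a word form into its lemma using a cut-add code."""
    match = CUT_ADD.match(code)
    if match is None:
        raise ValueError("not a cut-add code: " + code)
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        # Replace the first cut1 characters with add1.
        form = add1 + form[int(cut1):]
    # Replace the last cut2 characters with add2.
    return form[:len(form) - int(cut2)] + add2

print(apply_cut_add("najbielsi", "3-4ały"))  # prints: biały
@end example

A code without the optional prefix part, such as @code{1a}, only
rewrites the end of the form.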
1109
1110
@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
1112
1113@var{weight} is an integer value between 1 and 999 indicating the
1114likelihood of the guess.
1115
1116@example
*łkę;1a,N/GfNsCa
naj*elszy;3-4ały,ADJ/...:...
1119@end example
1120
1121
1122@c ---------------------------------------------------------------------
1123@c COR
1124@c ---------------------------------------------------------------------
1125
1126@page
1127@node cor
1128@section cor - spelling corrector
1129
1130@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski, Michał Stolarski
1132@item @strong{Component category:}      @tab filter
1133@item @strong{Input format:}            @tab UTT regular
1134@item @strong{Output format:}           @tab UTT regular
1135@item @strong{Required annotation:}     @tab tok
1136@end multitable
1137
1138@menu
1139* cor description::
1140* cor command line options::   
1141* cor dictionaries::           
1142@end menu
1143
1144
1145@node cor description
1146@subsection Description
1147
The spelling corrector applies Kemal Oflazer's dynamic programming
algorithm @cite{oflazer96} to the FSA representation of the set of
word forms of the Polex/PMDBF dictionary. Given an incorrect
word form, it returns all word forms present in the dictionary whose
edit distance from that form does not exceed the threshold given as a parameter.
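
The notion of edit distance used above can be illustrated with a short
Python sketch. It deliberately replaces Oflazer's traversal of the FSA
with a plain Levenshtein computation over a word list, so it shows the
result of the search rather than the method used by @command{cor}. The
names @code{edit_distance} and @code{corrections} are hypothetical, and
the sketch assumes that the edit operations are insertion, deletion and
substitution only.

@example
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def corrections(word, dictionary, distance=1):
    """Return dictionary words within the given edit distance of word."""
    return [w for w in dictionary if edit_distance(word, w) <= distance]

words = ["odlot", "odlotowy", "odludek"]
print(corrections("odlod", words))  # prints: ['odlot']
@end example

A linear scan like this one is far slower than an automaton-based search
on a full dictionary; it is shown only because it is easy to follow.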
1153
1154
1155@node cor command line options
1156@subsection Command line options
1157
1158@table @code
1159
1160@parhelp
1161@parversion
1162@parinteractive
1163@c @parfile
1164@c @paroutput
1165@c @parfail
1166@c @parcopy
1167@parinputfield
1168@paroutputfield
1169@pardictionary
1170@parprocess
1171@parselect
1172@parunselect
1173@paroneline
1174@paronefield
1175
1176@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1177Maximum edit distance (default='1').
1178
1179@c @item @b{@minus{}@minus{}replace, @minus{}r}
1180@c Replace original form with corrected form, place original form in the
1181@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1182
1183
1184@end table
1185
1186@node cor dictionaries
1187@subsection Dictionaries
1188
1189@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
1190The fsa format is created by compiling text-format dictionaries.
1191
1192@subsubheading Text format
1193
1194The @command{cor} dictionary is a list of words:
1195@example
1196odlot
1197odlotowy
1198odludek
1199@end example
1200
1201@subsubheading Binary format
1202
1203The mandatory file name extension for a binary dictionary is @code{bin}. To
1204compile a text dictionary into binary format, write:
1205
1206@example
1207compiledic <dictionaryname>.dic
1208@end example
1209
1210@c ---------------------------------------------------------------------
1211@c KOR
1212@c ---------------------------------------------------------------------
1213
1214@page
1215@node kor
1216@section kor - configurable spelling corrector
1217
1218[TODO]
1219
1220@c ---------------------------------------------------------------------
1221@c SEN
1222@c ---------------------------------------------------------------------
1223
1224@page
1225@node sen
@section sen - a sentencizer
1227
1228@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1229
@item @strong{Authors:}                 @tab Tomasz Obrębski
1231@item @strong{Component category:}      @tab filter
1232@item @strong{Input format:}            @tab UTT regular
1233@item @strong{Output format:}           @tab UTT regular
1234@item @strong{Required annotation:}     @tab tok
1235
1236@end multitable
1237
1238
1239@menu
1240* sen description::
1241@c * sen input::
1242@c * sen output::
1243* sen example::                 
1244@end menu
1245
1246@node sen description
1247@subsection Description
1248
1249@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
1250
1251@node sen example
1252@subsection Example
1253
1254@example
1255command: sen
1256
1257input:
0000 05 W Cześć
12590005 01 P !
12600006 01 S _
12610007 02 W To
12620009 01 S _
12630010 02 W ja
12640012 01 P .
12650013 01 S \n
1266
1267output:
12680000 00 BOS *
0000 05 W Cześć
12700005 01 P !
12710006 00 EOS *
12720006 00 BOS *
12730006 01 S _
12740007 02 W To
12750009 01 S _
12760010 02 W ja
12770012 01 P .
12780013 01 S \n
12790014 00 EOS *
1280@end example
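
A program that consumes the output of @command{sen} can use the BOS and
EOS segments to split the text into sentences. The Python sketch below
shows one possible way to do this. The function name @code{sentences} is
hypothetical, and the sketch assumes that each line consists of
whitespace-separated fields in the order shown in the example above
(two numeric fields, the type, the form).

@example
def sentences(lines):
    """Group segment forms between BOS and EOS marker segments."""
    result, current = [], None
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete segment line
        seg_type, form = fields[2], fields[3]
        if seg_type == "BOS":
            current = []
        elif seg_type == "EOS":
            if current is not None:
                result.append(current)
            current = None
        elif current is not None and seg_type != "S":
            current.append(form)  # keep words and punctuation, skip spaces
    return result

output = """\
0000 00 BOS *
0000 05 W Cześć
0005 01 P !
0006 00 EOS *
0006 00 BOS *
0006 01 S _
0007 02 W To
0009 01 S _
0010 02 W ja
0012 01 P .
0013 01 S \\n
0014 00 EOS *
"""
print(sentences(output.split("\n")))
@end example

For the sample output, the function returns two lists of forms, one per
sentence.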
1281
1282
1283@c ---------------------------------------------------------------------
1284@c GPH
1285@c ---------------------------------------------------------------------
1286
1287@c @node gph - graphizer
1288@c @chapter gph - graphizer
1289
@c Authors: Tomasz Obrębski
1291
1292
1293
1294@c ---------------------------------------------------------------------
1295@c SER
1296@c ---------------------------------------------------------------------
1297
1298@page
1299@node ser
1300@section ser - pattern search tool
1301
1302@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski
1304@item @strong{Component category:}      @tab filter
1305@item @strong{Input format:}            @tab UTT regular
1306@item @strong{Output format:}           @tab UTT regular
1307@item @strong{Required annotation:}     @tab tok,  lem --one-field
1308@end multitable
1309
1310@menu
1311* ser description::
1312* ser command line options::   
1313* ser pattern::                 
1314* ser how ser works::           
1315* ser customization::           
1316* ser limitations::             
1317* ser requirements::           
1318@end menu
1319
1320
1321@node ser description
1322@subsection Description
1323
1324@command{ser} looks for patterns in UTT-formatted texts.
1325
1326
1327@c ---------------------------------------------------------------------
1328@node ser command line options
1329@subsection Command line options
1330
1331@table @code
1332
1333@parhelp
1334@parversion
1335@c @parfile
1336@c @paroutput
1337@c @parinputfield
1338@c @paroutputfield
1339@parprocess
1340@parinteractive
1341
1342@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1343The search pattern.
1344
1345@item @b{@minus{}@minus{}morph=@var{field}}
1346The name of the annotation field containing the morphological
1347description (default @code{lem}).
1348
1349@item @b{@minus{}@minus{}flex}
1350Only print the generated flex source code.
1351
1352@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.
1355
1356@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from file @var{filename}. This option
makes it possible to extend the set of terms.
1359
1360@end table
1361
1362
1363@c ---------------------------------------------------------------------
1364@node ser pattern
1365@subsection Pattern
1366
1367The @command{ser} pattern is a regular expression over terms corresponding
1368to text segments or segment sequences. Predefined terms are:
1369
1370@table @code
1371
1372@item seg(@var{t},@var{f},@var{a})
1373a segment of type @var{t}, containing form @var{f} and annotation
1374@var{a}
1375
1376@item form(@var{f})
1377a segment containing form @var{f}
1378
1379@item field(@var{f})
1380a segment containing annotation field @var{f}
1381
1382@item space(@var{f})
1383a space segment of form @var{f}
1384
1385@item word(@var{f})
1386a word segment of form @var{f}
1387
1388@item punct(@var{f})
1389a punct segment of form @var{f}
1390
1391@item number(@var{f})
1392a number segment of form @var{f}
1393
1394@item lexeme(@var{f})
1395a word segment with lemma @var{f}
1396
1397@item cat(@var{c})
1398a word segment of category @var{c}
1399
1400@end table
1401
1402All arguments are optional. If an argument is omitted, an arbitrary
1403string of non-blank characters is assumed as the argument value. Term
arguments may be arbitrary character-level regular expressions. The
following special symbols can be used:
1406
1407@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1408@item @code{[@dots{}]}            @tab a character class
1409@item @code{[^@dots{}]}           @tab a negated character class
1410@item @code{|}                    @tab alternative
1411@item @code{*}                    @tab repetition, including zero times
1412@item @code{+}                    @tab repetition, at least one time
1413@item @code{?}                    @tab optionality
1414@item @code{@{@var{m},@var{n}@}}  @tab repetition from @var{m} to @var{n} times
1415@item @code{@{@var{m},@}}         @tab repetition @var{m} or more times
1416@item @code{@{@var{m}@}}          @tab repetition @var{m} times
@item @code{\@var{ddd}}           @tab the character with octal value @var{ddd}
1418@item @code{\x@var{hh}}           @tab the character with hexadecimal value @var{hh}
1419@item @code{( )}                  @tab parentheses, used to override precedence
1420@c @end multitable
1421
1422@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1423@item @code{.}    @tab a non-blank character
1424@item @code{\w}   @tab a letter
1425@item @code{\W}   @tab a non-blank character other than a letter
1426@item @code{\d}   @tab a digit
1427@item @code{\D}   @tab a non-blank character other than a digit
1428@item @code{\s}   @tab a space or tab character
1429@item @code{\S}   @tab a non-blank character (the same as @code{.})
1430@item @code{\l}   @tab a lowercase letter
1431@item @code{\L}   @tab an uppercase letter
1432@end multitable
1433
1434
1435@noindent The following characters:
1436@example
1437@verb{%  [   ]   ^   |   *   +   ?   {   }   ,   .   <   >   \ %}
1438@end example
1439must be escaped with a backslash, i.e. written as:
1440@example
1441@verb{% \[  \]  \^  \|  \*  \+  \?  \{  \}  \,  \.  \<  \>  \\ %}
1442@end example
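
When a pattern is built by a script, a literal form can be escaped
automatically. The following Python sketch is one way to do it. The
helper name @code{escape_literal} is hypothetical and is not part of
UTT, and the two curly braces are produced with @code{chr()} only to
keep the example free of characters that are special in Texinfo.

@example
# chr(123) and chr(125) are the opening and closing curly braces.
SPECIAL = "[]^|*+?,.<>\\" + chr(123) + chr(125)

def escape_literal(text):
    """Backslash-escape the characters listed above."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

print(escape_literal("a.b"))     # prints: a\.b
print(escape_literal("<N/Ca>"))  # prints: \<N/Ca\>
@end example

The escaped string can then be placed inside a term, for example as the
argument of @code{word()}.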
1443
@quotation Note
The special symbols are borrowed from Perl with minor modifications.
The meaning of certain special characters and sequences differs slightly
from their usual meaning; this is motivated by convenience.
The meaning of the @code{.} special character is modified because of
the special function of spaces in UTT files (they are field
separators). Use @code{\s} to explicitly match a space or tab character.
@end quotation
1453
In the argument of the @code{cat} term, the special operator @code{<@dots{}>} may be
used. A category specification enclosed in angle brackets matches all
category descriptions which are consistent (non-contradictory) with the
specification. For example, @code{<N>} matches all noun descriptions and
@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
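
The consistency test behind the angle-bracket operator can be pictured
with the following Python sketch. It is an interpretation of the
paragraph above, not the code generated by @command{ser}. The function
names are hypothetical, and the sketch assumes that the part of speech
must be identical, that every attribute value is a single character, and
that an attribute absent from the description cannot contradict the
specification.

@example
import re

def parse_description(descr):
    """Split 'ADJ/CaDpGiNs' into ('ADJ', attribute -> set of values)."""
    pos, _, rest = descr.partition("/")
    attrs = dict()
    for attr, values in re.findall(r"([A-Z]+)([a-z0-9]+)", rest):
        attrs[attr] = set(values)
    return pos, attrs

def consistent(spec, descr):
    """True if descr does not contradict the specification spec."""
    spec_pos, spec_attrs = parse_description(spec)
    pos, attrs = parse_description(descr)
    if spec_pos != pos:
        return False
    for attr, wanted in spec_attrs.items():
        # An attribute missing from descr cannot contradict the spec.
        if attr in attrs and not (wanted & attrs[attr]):
            return False
    return True

print(consistent("ADJ/Can", "ADJ/CaDpGiNs"))     # prints: True
print(consistent("ADJ/Can", "ADJ/CnvDpGaipNs"))  # prints: True
print(consistent("ADJ/Can", "ADJ/CgDpGiNs"))     # prints: False
print(consistent("N", "N/GfNsCa"))               # prints: True
@end example

Two descriptions are treated as consistent when, for every attribute
they share, their value sets have at least one value in common.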
1459
1460
1461@*
1462@noindent @b{Examples of one-segment patterns:}
1463
1464@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1465@item @code{seg}            @tab any segment
1466@item @code{word}           @tab any word-form
1467@item @code{word(pomocy)}   @tab the word-form @samp{pomocy}
1468@item @code{word(naj.+)}    @tab a word-form beginning with @samp{naj}
1469@item @code{word(\L\l+)}    @tab a capitalized word-form
1470@item @code{punct}          @tab a punctuation character
1471@item @code{space(.*\\n.*)} @tab a space segment containing a newline character
@item @code{lexeme(pomoc)}  @tab any form of the lexeme @samp{pomoc}
@item @code{cat(N/.*)}      @tab a word whose category starts with @code{N/}
@item @code{cat(<N/Ca>)}    @tab a word whose category matches @code{N/Ca}
1475@end multitable
1476
1477@*
1478@noindent @b{Examples of multi-segment patterns:}
1479
1480@table @code
1481
1482@item (word(\L) punct(\.) space?)+ word(\L\l+)
1483a sequence of initials followed by a surname
1484
1485@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
a text fragment between two punctuation characters, containing an
occurrence of a relative pronoun
1488
1489@end table
1490
1491
1492@node ser how ser works
1493@subsection How ser works
1494
1495@node ser customization
1496@subsection Customization
1497
1498@c All predefined terms correspond to single segments,
1499
1500@example
1501define(`verbseq', `(cat(<V>) (space cat(<V>)))')
1502@end example
1503
1504
1505the term @code{cat()} may not be used as a ... of
1506
1507@c See @command{m4} manual for further details on macro definition format.
1508
1509@node ser limitations
1510@subsection Limitations
1511
Do not use more than three attributes inside @code{<@dots{}>}.
1513
1514@node ser requirements
1515@subsection Requirements
1516
1517In order to run @command{ser}, the following programs must be
1518installed in the system:
1519
1520@itemize
1521
1522@item @command{m4}
1523@item @command{grep}
1524@item @command{flex}
1525@item @command{gcc}
1526
1527@end itemize
1528
1529
1530@c ---------------------------------------------------------------------
1531@c GRP
1532@c ---------------------------------------------------------------------
1533
1534@page
1535@node grp
1536@section grp - pattern search tool
1537
1538@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski
1540@item @strong{Component category:}      @tab filter
1541@item @strong{Input format:}            @tab UTT flattened
1542@item @strong{Output format:}           @tab UTT flattened
1543@item @strong{Required annotation:}     @tab tok, sen, lem --one-field
1544@end multitable
1545
1546
1547@menu
1548* grp description::
1549* grp command line options::   
1550* grp pattern::                 
1551* grp hints::   
1552@end menu
1553
1554
1555@node grp description
1556@subsection Description
1557
@command{grp} selects sentences containing an expression matching a
pattern. The pattern format is exactly the same as that accepted by
@command{ser}.

@command{grp} is intended mainly for speeding up corpus searches.
It is extremely fast (its processing speed is usually higher than the speed
of reading the corpus file from disk).
1565
1566@node grp command line options
1567@subsection Command line options
1568
1569@table @code
1570
1571@parhelp
1572@parversion
1573@parprocess
1574@parinteractive
1575
1576@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1577The search pattern.
1578
1579@item @b{@minus{}@minus{}morph=@var{field}}
1580The name of the annotation field containing the morphological
1581description (default @code{lem}).
1582
1583@item @b{@minus{}@minus{}command}
1584Only print the generated flex source code.
1585
1586@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.
1589
1590@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from file @var{filename}. This option
makes it possible to extend the set of terms.
1593
1594@end table
1595
1596
1597@node grp pattern
1598@subsection Pattern
1599
The pattern format is the same as for @command{ser} (@pxref{ser pattern}).
1601
1602@node grp hints
1603@subsection Hints
1604
The corpus search speed may be increased by combining @command{grp} with the
@command{lzop} compression tool (@command{grp} usually processes data faster
than they can be read from disk, especially on slow laptop drives).
1608
1609@example
1610cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
1611@end example
1612
1613@example
1614lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
1615@end example
1616
1617
1618
1619@c ---------------------------------------------------------------------
1620@c MAR
1621@c ---------------------------------------------------------------------
1622
1623@page
1624@node mar
1625@section mar
1626
1627@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Marcin Walas, Tomasz Obrębski
1629@item @strong{Input format:}            @tab UTT flattened
1630@item @strong{Output format:}           @tab UTT flattened
1631@item @strong{Required annotation:}     @tab tok, sen, lem -1
1632@end multitable
1633
1634[TODO]
1635
1636(see mar's help 'mar -h' for some information)
1637
1638@c ---------------------------------------------------------------------
1639@c KOT
1640@c ---------------------------------------------------------------------
1641
1642
1643@page
1644@node kot
1645@section kot - untokenizer
1646
1647@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski
1649@item @strong{Component category:}      @tab filter
1650@item @strong{Input format:}            @tab UTT regular
1651@item @strong{Output format:}           @tab text
1652@item @strong{Required annotation:}     @tab tok
1653@end multitable
1654
1655
1656@menu
1657* kot description::
1658* kot command line options::   
1659* kot usage examples::   
1660@end menu
1661
1662@node kot description
1663@subsection Description
1664
@command{kot} transforms a UTT-formatted file back into raw text format.
1666
1667@node kot command line options
1668@subsection Command line options
1669
1670@table @code
1671
1672@parhelp
1673
1674@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1675
1676@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1677
1678@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1679
1680@c @item @b{@minus{}@minus{}interactive @minus{}i}
1681
1682@c @item @b{@minus{}@minus{}config=@var{filename}}
1683
1686@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
1687print @var{string} between nonadjacent segments of the input file
1688
1689@item @b{@minus{}@minus{}spaces, @minus{}r}
1690retain the special characters @code{_}, @code{\t},
1691@code{\n}, @code{\r}, @code{\f} unexpanded in the output
1692
1693@end table
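
The effect of the two options can be pictured with the following Python
sketch of an untokenizer. It is an illustration only, not the
@command{kot} source: the function name @code{untokenize} is
hypothetical, and the sketch assumes that the first two fields of a
segment are its start position and length and that the special
characters are expanded in space segments (type @code{S}) only.

@example
EXPAND = (("\\n", "\n"), ("\\t", "\t"), ("\\r", "\r"),
          ("\\f", "\f"), ("_", " "))

def untokenize(lines, gap_fill="", spaces=False):
    """Rebuild raw text from 'start length type form ...' lines."""
    text, position = [], None
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip lines that are not complete segments
        start, length = int(fields[0]), int(fields[1])
        seg_type, form = fields[2], fields[3]
        if length == 0:
            continue  # zero-length markers such as BOS carry no text
        if position is not None and start != position:
            text.append(gap_fill)  # gap between nonadjacent segments
        if seg_type == "S" and not spaces:
            for escaped, char in EXPAND:
                form = form.replace(escaped, char)
        text.append(form)
        position = start + length
    return "".join(text)

segments = [
    "0000 05 W Cześć", "0005 01 P !", "0006 01 S _",
    "0007 02 W To", "0009 01 S _", "0010 02 W ja",
    "0012 01 P .", "0013 01 S \\n",
]
print(repr(untokenize(segments)))  # prints: 'Cześć! To ja.\n'
@end example

Calling the function with @code{spaces=True} leaves @code{_} and
@code{\n} unexpanded, which mirrors the description of the
@option{--spaces} option.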
1694
1695@node kot usage examples
1696@subsection Usage examples
1697
1698@example
1699cat legia.txt | tok | kot       
1700@end example
1701
1702@example
1703cat legia.txt | tok | lem -1 | kot
1704@end example
1705
1706@c ---------------------------------------------------------------
1707@c CON
1708@c ---------------------------------------------------------------
1709
1710
1711@page
1712@node con
1713@section con - concordance table generator
1714
1715@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1716@item @strong{Authors:}                 @tab Justyna Walkowska
1717@item @strong{Component category:}      @tab sink
1718@item @strong{Input format:}            @tab UTT regular
1719@item @strong{Output format:}           @tab text
1720@item @strong{Required annotation:}     @tab ser or mar
1721@end multitable
1722@c
1723
1724@menu
1725* con description::
1726* con command line options::
1727* con usage example::
1728* con hints::   
1729@end menu
1730
1731
1732@node con description
1733@subsection Description
1734
1735@command{con} generates a concordance table based on a pattern given to @command{ser}.
1736
1737
1738@node con command line options
1739@subsection Command line options
1740
1741@table @code
1742
1743@parhelp
1744
1745@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
1746@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1747@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1748@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1749@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
1750@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
1751@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
1752@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
1753@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
1754@c @item @b{@minus{}@minus{}interactive @minus{}i}
1755@c @item @b{@minus{}@minus{}config=@var{filename}}
1756@c @item
1757@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1758@c search pattern
1759@c
1760@c @item @b{@minus{}@minus{}flex}
1761@c only print the generated flex source code
1762@c
1763@c @item @b{@minus{}@minus{}macro=@var{filename}}
1764@c read macrodefinitions from file @var{filename} rather than from
1765@c default location. This option allows to redefine the set of terms.
1766@c
1767@c @item @b{@minus{}@minus{}define=@var{filename}}
1768@c append macrodefinitions from file @var{filename}. This option
1769@c allows to extend the set of terms.
1770
@item @b{@minus{}@minus{}left, @minus{}l}
Left context info (default='30c'). Examples:
@example
-l=5c: left context is 5 characters
-l=5w: left context is 5 words
-l=5s: left context is 5 non-empty input lines
-l='\s*\S+\sr\S+BOS': left context starts with the given regex
@end example

@item @b{@minus{}@minus{}right, @minus{}r}
Right context info (default='30c').

@item @b{@minus{}@minus{}trim, @minus{}t}
Clear incomplete words from output.

@item @b{@minus{}@minus{}white, @minus{}w}
Do not change whitespace characters into spaces.

@item @b{@minus{}@minus{}column, @minus{}c}
Left column minimal width in characters (default = 0).

@item @b{@minus{}@minus{}ignore, @minus{}i}
Ignore segment inconsistency in the input.

@item @b{@minus{}@minus{}bom}
Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').

@item @b{@minus{}@minus{}eom}
End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').

@item @b{@minus{}@minus{}bod}
Selected segment beginning display string (default='[').

@item @b{@minus{}@minus{}eod}
Selected segment end display string (default=']').
1798
1799
1800
1801@end table
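
The way the context and display options shape one line of the table can
be pictured with the Python sketch below. It is a simplified
illustration, not the @command{con} source: the function name
@code{concordance} is hypothetical, contexts are measured in characters
only, a single match is handled, and the sketch assumes that the match
is delimited by zero-length BOM and EOM segments, as the default values
of @option{--bom} and @option{--eom} suggest.

@example
def concordance(lines, left=30, right=30, bod="[", eod="]"):
    """Format 'left [match] right' from BOM/EOM-marked segments."""
    before, inside, after = [], [], []
    target = before
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete segment line
        seg_type, form = fields[2], fields[3]
        if seg_type == "BOM":
            target = inside
        elif seg_type == "EOM":
            target = after
        elif seg_type == "S":
            target.append(" ")  # white characters become spaces
        elif int(fields[1]) > 0:
            target.append(form)  # skip other zero-length markers
    before, inside = "".join(before), "".join(inside)
    after = "".join(after)
    left_context = before[-left:] if left > 0 else ""
    return left_context + bod + inside + eod + after[:right]

segments = [
    "0000 02 W To",
    "0002 01 S _",
    "0003 00 BOM *",
    "0003 03 W dom",
    "0006 00 EOM *",
    "0006 01 S _",
    "0007 04 W Jana",
]
print(concordance(segments, left=5, right=5))  # prints: To [dom] Jana
@end example

The sample segments are invented for the illustration.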
1802
1803@node con usage example
1804@subsection Usage example
1805@example
1806cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con 
1807@end example
1808
1809
1810@node con hints
1811@subsection Hints
1812
1813@command{con} is a rather slow program. Do not pass large amounts of
1814redundant text through this program. @command{con} works fine in the following
1815sequence:
1816
1817@example
1818... | grp -e EXPR | ser -e EXPR | con
1819@end example
1820
1821
1822@c ---------------------------------------------------------------------
1823@c ---------------------------------------------------------------------
1824
1825@page
1826@node Auxiliary tools
1827@chapter Auxiliary tools
1828
1829@menu
1830* compiledic::         dictionary compiler
1831* fla::                UTT file flattener
1832* unfla::              UTT file unflattener
1833@end menu
1834
1835
1836@page
1837@node compiledic
1838@section compiledic - the dictionary compiler
1839
1840@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1841@item @strong{Authors:}                 @tab Michal Stolarski, Tomasz Obrebski
1842@item @strong{Component category:}      @tab additional tool
1843@end multitable
1844@c
1845
1846@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
1847(FSA) format (@code{.bin} extension).
1848
1849Automaton representation of a dictionary is built using the AT&T tools:
1850@itemize
1851@item AT&T FSM Library,
1852@item AT&T Lextools.
1853@end itemize
1854
In order for @command{compiledic} to work, the above-mentioned packages
must be installed on your system.  They are freely available
for non-commercial use.
1858
1859Usage:
1860@example
1861        compiledic <dictionaryname>.dic
1862@end example
1863
1864The file <dictionaryname>.bin will be generated.
1865
Note: The program produces many temporary files, which are
stored in the current directory. They are deleted after successful
termination of the program.
1869
1870@c @menu
1871@c * con command line options::
1872@c * con usage example::
1873@c * con hints::   
1874@c @end menu
1875
1876
1877@c -------------------------------------------------------------------------------
1878@c FLA
1879@c -------------------------------------------------------------------------------
1880
1881@page
1882@node fla
1883@section fla - the UTT file flattener
1884
1885@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski
1887@item @strong{Input format:}            @tab UTT regular
1888@item @strong{Output format:}           @tab UTT flattened
1889@item @strong{Required annotation:}     @tab sen
1890@end multitable
1891@c
1892
1893@menu
1894* fla description::
1895@c * fla command line options::
1896@c * fla usage example::
1897@end menu
1898
1899
1900@node fla description
1901@subsection Description
1902
@command{fla} ``flattens'' a UTT file by merging the segments belonging
to one sentence into one line. Technically, end-of-line characters
('\n', ASCII code 10) are replaced with form-feed characters ('\f',
ASCII code 12).  Flattening makes it possible to process UTT files
sentence by sentence with tools such as @command{grep} or @command{sed}
(this is used in @command{grp} and @command{mar}).

Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.

Flattened files are still human-readable.
1913
1914Usage:
1915
1916@example
1917        fla [<bosregex>]
1918@end example
1919
The optional argument is a regular expression describing segments
which should be treated as sentence beginnings (the test is whether the
segment contains a fragment matching @code{<bosregex>}). By
default, segments containing a @code{BOS} field are looked for.
1924
1925@c -------------------------------------------------------------------------------
1926@c UNFLA
1927@c -------------------------------------------------------------------------------
1928
1929@page
1930@node unfla
1931@section unfla - the UTT file unflattener
1932
1933@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:}                 @tab Tomasz Obrębski
1935@item @strong{Input format:}            @tab UTT flattened
1936@item @strong{Output format:}           @tab UTT regular
1937@item @strong{Required annotation:}     @tab -
1938@end multitable
1939
1940@menu
1941* unfla description::
1942@c * fla command line options::
1943@c * fla usage example::
1944@end menu
1945
1946@node unfla description
1947@subsection Description
1948@command{unfla} transforms a flattened UTT file, produced by
1949@command{fla}, into the regular format by restoring end-of-line
1950characters.
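
The round trip between the regular and the flattened format can be
sketched in Python as follows. This is an illustration of the idea, not
the code of @command{fla} or @command{unfla}: the function names are
hypothetical, and the sketch assumes that the segments of one sentence
are joined with form-feed characters ('\f', ASCII code 12) and that a
sentence starts at every line matching the BOS regular expression.

@example
import re

def flatten(text, bos_regex="BOS"):
    """Put the segment lines of each sentence on one line."""
    sentences, current = [], []
    for line in text.split("\n"):
        if not line:
            continue  # ignore empty lines
        if re.search(bos_regex, line) and current:
            sentences.append(current)  # a new sentence starts here
            current = []
        current.append(line)
    if current:
        sentences.append(current)
    return "".join("\f".join(s) + "\n" for s in sentences)

def unflatten(text):
    """Restore one segment per line."""
    return text.replace("\f", "\n")

regular = ("0000 00 BOS *\n0000 02 W To\n0002 00 EOS *\n"
           "0002 00 BOS *\n0002 02 W ja\n0004 00 EOS *\n")
flat = flatten(regular)
print(flat.count("\n"))           # prints: 2
print(unflatten(flat) == regular)  # prints: True
@end example

Because each sentence occupies exactly one line of the flattened text,
line-oriented tools can select whole sentences.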
1951
1952
1953
1954
1955@c ---------------------------------------------------------------------
1956@c USAGE EXAMPLES
1957@c ---------------------------------------------------------------------
1958
1959@node Usage examples
1960@chapter Usage examples
1961
1962@subsubheading Simple pipelines
1963
1964@enumerate
1965
1966@item tokenization
1967
@example
cat text | tok > output1
@end example

@item morphological annotation (1)

Simple dictionary-based lemmatization:

@example
cat text | tok | lem > output1
@end example

@item morphological annotation (2)

@enumerate
@item perform dictionary-based lemmatization
@item guess descriptions for words which have no annotation
@end enumerate
1980
1981@example
1982cat text | tok | lem | gue -S lem > output2
1983@end example
1984
1985@item morphological annotation (3)
1986
@enumerate
@item perform dictionary-based lemmatization
@item try to correct words with no annotation
@item perform dictionary-based lemmatization of corrected words
@item guess descriptions for words which still have no annotation
@end enumerate
1991
1992@example
1993cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
1994@end example
1995@item spelling correction
1996
1997
1998
1999@example
2000cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
2001@end example
2002
2003@item Expression extraction
2004
2005Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
2006
2007@example
2008cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
2009@end example
2010
2011@item A word in context
2012
Extraction of text fragments containing a form of the lexeme 'rozmowa' in
the context of 5 preceding and 5 following corpus segments.
2015
2016@example
2017cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
2018@end example
2019
2020@item generation of concordance table (1)
2021
2022@example
2023cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2024@end example
2025
202610"
2027
2028@item generation of concordance table (2)
2029
2030The same as above but much faster
2031
2032@example
2033cat text | tok | lem -1 | \
2034grp -e 'cat(<V>) space lexeme(rozmowa)' | \
2035ser -e 'cat(<V>) space lexeme(rozmowa)' | \
2036con
2037@end example
2038
20392"
2040
2041@item generation of concordance table (3)
2042
Usually, one searches the same corpus repeatedly. In
such a case it is advisable to transform the corpus data into the format
required by @command{grp} first, and then use the preprocessed data.

As @command{grp} (@command{grep}) processes data faster than they are
read from the disk drive, the search time can be shortened further by
using file compression.  We suggest using the
@command{lzop} compressor/decompressor.
2051
2052@item the fastest way to search a large corpus
2053
2054step 1: corpus preprocessing
2055
2056@example
2057cat corpus | tok | sen | lem -1 \
2058| fla | lzop -7 > corpus.grp.lzo
2059@end example
2060
2061step 2: search
2062
2063@example
lzop -cd corpus.grp.lzo | \
grp -e 'cat(<V>) space lexeme(rozmowa)' | unfla | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2066@end example
2067
2068@end enumerate

@c @subsubheading More complicated configurations


@c @example
@c mknod fifo1 p
@c mknod fifo2 p
@c mknod fifo3 p
@c mknod fifo4 p
@c mknod fifo5 p

@c tok | lem -p W -e fifo1 > fifo2 &
@c cor -e fifo3 < fifo1 | lem > fifo4 &
@c gue < fifo3 > fifo5 &
@c sort -m fifo2 fifo4 fifo5

@c rm fifo?
@c @end example


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c ---------------------------------------------------------------------
@c PMDBF DICTIONARY
@c ---------------------------------------------------------------------

@node PMDBF dictionary
@chapter PMDBF dictionary

UTT components come with lexical data derived from the Polish
Morphological Database (PMDB).

@menu
* PMDBF files::
* PMDBF tag structure::
* PMDBF parts of speech::
* PMDBF morphosyntactic attributes::
@end menu

@node PMDBF files
@section Files

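@node PMDBF tag structure
@section Tag structure

A tag (@code{descr}) consists of a part-of-speech symbol, optionally
followed by a slash and a sequence of attributes, each with one or
more values:

@example
pos   = [[:upper:]]+
attr  = [[:upper:]]+
val   = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
descr = pos ( / ( attr val + ) + ) ?
@end example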
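
The grammar above can be turned into a small validating parser.  The
Python sketch below is one possible reading of it, not code taken from
UTT: the function name @code{parse_tag} is hypothetical, the POSIX
classes are approximated by ASCII ranges, and the sample tags are made
up from the tables in this chapter.

```python
import re

# [A-Z], [a-z] and [0-9] stand in for the POSIX classes [:upper:],
# [:lower:] and [:digit:] used in the grammar (ASCII only).
POS = r"[A-Z]+"
ATTR = r"[A-Z]+"
VAL = r"(?:[a-z0-9?!*+-]|<[^>\n]+>)"

# descr = pos ( / ( attr val + ) + ) ?
DESCR = re.compile(r"(%s)(?:/((?:%s%s+)+))?" % (POS, ATTR, VAL))
PAIR = re.compile(r"(%s)(%s+)" % (ATTR, VAL))
VALUE = re.compile(VAL)


def parse_tag(tag):
    """Split a tag into (pos, [(attr, [values]), ...]); raise ValueError if malformed."""
    match = DESCR.fullmatch(tag)
    if match is None:
        raise ValueError("not a well-formed tag: %r" % tag)
    pos, rest = match.group(1), match.group(2)
    attrs = []
    if rest:
        # Each attribute name is followed by one or more values.
        for attr, values in PAIR.findall(rest):
            attrs.append((attr, VALUE.findall(values)))
    return pos, attrs


if __name__ == "__main__":
    print(parse_tag("N/NsCnGf"))
    print(parse_tag("ADJ/CnaGf"))
```

An attribute may carry several values, so @code{CnaGf} is read as case
@code{n} or @code{a} and gender @code{f}.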

@node PMDBF parts of speech
@section Parts of speech

@multitable {ADJPRP} { adjectival-passive-participle }
@item @code{N} @tab noun
@item @code{NPRO} @tab nominal-pronoun
@item @code{NV} @tab deverbal-noun
@item @code{V} @tab verb
@item @code{BYC} @tab the verb @emph{by@'c} (`to be')
@item @code{VNI} @tab non-inflected-verb
@item @code{ADJ} @tab adjective
@item @code{ADJPAP} @tab adjectival-passive-participle
@item @code{ADJPRP} @tab adjectival-present-participle
@item @code{ADJPP} @tab adjectival-past-participle
@item @code{ADJPRO} @tab adjectival-pronoun
@item @code{ADJNUM} @tab adjectival-numeral
@item @code{ADV} @tab adverb
@item @code{ADVANP} @tab adverbial-anterior-participle
@item @code{ADVPRP} @tab adverbial-present-participle
@item @code{ADVPRO} @tab adverbial-pronoun
@item @code{ADVNUM} @tab adverbial-numeral
@item @code{P} @tab preposition
@item @code{PPRO} @tab prep-noun-pronoun
@item @code{CONJ} @tab conjunction
@item @code{EXCL} @tab exclamation
@item @code{APP} @tab call
@item @code{ONO} @tab onomatopoeia
@item @code{PART} @tab particle
@item @code{NUMCRD} @tab cardinal-numeral
@item @code{NUMCOL} @tab collective-numeral
@item @code{NUMPAR} @tab partitive-numeral
@item @code{NUMORD} @tab ordinal-numeral
@end multitable

@node PMDBF morphosyntactic attributes
@section Morphosyntactic attributes

@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@c @headitem Attr @tab Val @tab Description
@item
@code{A} @tab @tab Aspect
@item
@tab @code{p} @tab perfect
@item
@tab @code{i} @tab imperfect
@item
@item
@code{V} @tab @tab Verb-Form
@item
@tab @code{b} @tab infinitive
@item
@tab @code{p} @tab personal
@item
@tab @code{i} @tab impersonal
@item
@item
@code{M} @tab @tab Mood
@item
@tab @code{d} @tab declarative
@item
@tab @code{c} @tab conditional
@item
@tab @code{i} @tab imperative
@item
@item
@code{T} @tab @tab Tense
@item
@tab @code{a} @tab past
@item
@tab @code{r} @tab present
@item
@tab @code{f} @tab future
@item
@item
@code{P} @tab @tab Person
@item
@tab @code{1} @tab 1
@item
@tab @code{2} @tab 2
@item
@tab @code{3} @tab 3
@item
@item
@code{D} @tab @tab Degree
@item
@tab @code{p} @tab positive
@item
@tab @code{c} @tab comparative
@item
@tab @code{s} @tab superlative
@item
@item
@code{N} @tab @tab Number
@item
@tab @code{s} @tab singular
@item
@tab @code{p} @tab plural
@item
@item
@code{C} @tab @tab Case
@item
@tab @code{n} @tab nominative
@item
@tab @code{g} @tab genitive
@item
@tab @code{d} @tab dative
@item
@tab @code{a} @tab accusative
@item
@tab @code{i} @tab instrumental
@item
@tab @code{l} @tab locative
@item
@tab @code{v} @tab vocative
@item
@item
@code{G} @tab @tab Gender
@item
@tab @code{p} @tab masculine-personal
@item
@tab @code{a} @tab masculine-animal
@item
@tab @code{i} @tab masculine-inanimate
@item
@tab @code{f} @tab feminine
@item
@tab @code{n} @tab neuter
@end multitable
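
The attribute table above can be used to gloss a tag in readable form.
The Python sketch below shows one way to do this.  The function name
@code{describe}, the output format and the sample tags are assumptions
made for illustration, and the sketch assumes single-letter attribute
names, as in the table.

```python
import re

# Attribute letter -> (name, value letter -> meaning), following the table above.
ATTRIBUTES = dict(
    A=("Aspect", dict(p="perfect", i="imperfect")),
    V=("Verb-Form", dict(b="infinitive", p="personal", i="impersonal")),
    M=("Mood", dict(d="declarative", c="conditional", i="imperative")),
    T=("Tense", dict(a="past", r="present", f="future")),
    P=("Person", dict(zip("123", "123"))),
    D=("Degree", dict(p="positive", c="comparative", s="superlative")),
    N=("Number", dict(s="singular", p="plural")),
    C=("Case", dict(n="nominative", g="genitive", d="dative", a="accusative",
                    i="instrumental", l="locative", v="vocative")),
    G=("Gender", dict(p="masculine-personal", a="masculine-animal",
                      i="masculine-inanimate", f="feminine", n="neuter")),
)


def describe(tag):
    """Return a readable gloss of a tag; unknown symbols are kept unchanged."""
    pos, _, rest = tag.partition("/")
    parts = [pos]
    # Each attribute is one upper-case letter followed by its value characters.
    for attr, values in re.findall(r"([A-Z])([a-z0-9]+)", rest):
        name, meanings = ATTRIBUTES.get(attr, (attr, dict()))
        glossed = [meanings.get(value, value) for value in values]
        parts.append("%s=%s" % (name, "|".join(glossed)))
    return "; ".join(parts)


if __name__ == "__main__":
    print(describe("N/NsCnGf"))
    print(describe("V/AiVpMdTrP3Ns"))
```

Alternative values of one attribute are joined with a vertical bar, so
a tag with case values @code{na} is glossed as
@code{Case=nominative|accusative}.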


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------
@c
@c @node Examples
@c @chapter Examples

@c ----------------------------------------------------------------------
@c ----------------------------------------------------------------------

@node    GNU Free Documentation License
@chapter GNU Free Documentation License

@c The GNU Free Documentation License.
@center Version 1.2, November 2002

@c This file is intended to be included within another document,
@c hence no sectioning command or @node.

@display
Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@enumerate 0
@item
PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document @dfn{free} in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of ``copyleft'', which means that derivative
works of the document must themselves be free in the same sense.  It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does.  But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.  We recommend this License
principally for works whose purpose is instruction or reference.

@item
APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License.  Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein.  The ``Document'', below,
refers to any such manual or work.  Any member of the public is a
licensee, and is addressed as ``you''.  You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.

A ``Modified Version'' of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A ``Secondary Section'' is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject.  (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.)  The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The ``Invariant Sections'' are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.  If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant.  The Document may contain zero
Invariant Sections.  If the Document does not identify any Invariant
Sections then there are none.

The ``Cover Texts'' are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.  A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.

A ``Transparent'' copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters.  A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text.  A copy that is not ``Transparent'' is called ``Opaque''.

Examples of suitable formats for Transparent copies include plain
@sc{ascii} without markup, Texinfo input format, La@TeX{} input
format, @acronym{SGML} or @acronym{XML} using a publicly available
@acronym{DTD}, and standard-conforming simple @acronym{HTML},
PostScript or @acronym{PDF} designed for human modification.  Examples
of transparent image formats include @acronym{PNG}, @acronym{XCF} and
@acronym{JPG}.  Opaque formats include proprietary formats that can be
read and edited only by proprietary word processors, @acronym{SGML} or
@acronym{XML} for which the @acronym{DTD} and/or processing tools are
not generally available, and the machine-generated @acronym{HTML},
PostScript or @acronym{PDF} produced by some word processors for
output purposes only.

The ``Title Page'' means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, ``Title Page'' means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section ``Entitled XYZ'' means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language.  (Here XYZ stands for a
specific section name mentioned below, such as ``Acknowledgements'',
``Dedications'', ``Endorsements'', or ``History''.)  To ``Preserve the Title''
of such a section when you modify the Document means that it remains a
section ``Entitled XYZ'' according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document.  These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.

@item
VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License.  You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.

@item
COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.

@item
MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it.  In addition, you must do these things in the Modified Version:

@enumerate A
@item
Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document).  You may use the same title as a previous version
if the original publisher of that version gives permission.

@item
List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.

@item
State on the Title page the name of the publisher of the
Modified Version, as the publisher.

@item
Preserve all the copyright notices of the Document.

@item
Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

@item
Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.

@item
Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.

@item
Include an unaltered copy of this License.

@item
Preserve the section Entitled ``History'', Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page.  If
there is no section Entitled ``History'' in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.

@item
Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on.  These may be placed in the ``History'' section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.

@item
For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.

@item
Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles.  Section numbers
or the equivalent are not considered part of the section titles.

@item
Delete any section Entitled ``Endorsements''.  Such a section
may not be included in the Modified Version.

@item
Do not retitle any existing section to be Entitled ``Endorsements'' or
to conflict in title with any Invariant Section.

@item
Preserve any Warranty Disclaimers.
@end enumerate

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.  To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled ``Endorsements'', provided it contains
nothing but endorsements of your Modified Version by various
parties---for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.  Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity.  If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.

@item
COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy.  If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled ``History''
in the various original documents, forming one section Entitled
``History''; likewise combine any sections Entitled ``Acknowledgements'',
and any sections Entitled ``Dedications''.  You must delete all
sections Entitled ``Endorsements.''

@item
COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.

@item
AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

@item
TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections.  You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers.  In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

@item
TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License.  Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License.  However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See
@uref{http://www.gnu.org/copyleft/}.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation.  If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.
@end enumerate

@page
@heading ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
  Copyright (C)  @var{year}  @var{your name}.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
  Texts.  A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with@dots{}Texts.'' line with this:

@smallexample
@group
    with the Invariant Sections being @var{list their titles}, with
    the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
    being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node    Reporting bugs
@chapter Reporting bugs

Report bugs to <obrebski@@amu.edu.pl>.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c @node    Copyright
@c @chapter Copyright
@c
@c Copyright 2004 by Tomasz Obrebski
@c This software is free for research and educational use.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node    Author
@chapter Author


@bye