utt.texinfo @ 91ed676

Last change on this file since 91ed676 was e28a625, checked in by obrebski <obrebski@…>, 18 years ago

This line and those below will be ignored--

M app/dist/files/README

updated

M app/doc/utt.texinfo

additions

M app/src/gue/Makefile

static libraries

M app/src/cor/cmdline_cor.ggo

removal of non-working parameters

M app/src/cor/Makefile

static libraries

M app/src/common/cmdline_common.ggo

?

M app/src/kor/Makefile

static libraries

M app/src/lem/Makefile

static libraries

M lang/dist/tarball/Makefile

packaging language modules one at a time

M lang/Makefile

-"-

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@61 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

1	\input texinfo @c --texinfo--
2	@documentencoding ISO-8859-2
3	@c @documentlanguage pl
4
5	@c %**start of header
6	@setfilename utt.info
7	@settitle UAM Text Tools v0.90
8	@c %**end of header
9
10	@copying
11	This manual is for UAM Text Tools (version 0.90, October, 2008)
12
13	Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
14
15	Permission is granted to copy, distribute and/or modify this document
16	under the terms of the GNU Free Documentation License, Version 1.2 or
17	any later version published by the Free Software Foundation; with no
18	Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
19	copy of the license is included in the section entitled ``GNU Free
20	Documentation License''.
21
22	@c @quotation
23	@c Permission is granted to ...
24	@c No permission is granted until the document is completed.
25	@c @end quotation
26	@end copying
27
28
29	@titlepage
30	@title UAM Text Tools 0.90 - User Manual
31	@subtitle edition 0.01, @today
32	@subtitle status: prescript
33	@author by Justyna Walkowska, Tomasz Obr@ogonek{e}bski and Micha@l{} Stolarski
34	@page
35	@vskip 0pt plus 1filll
36	@insertcopying
37	@end titlepage
38
39	@contents
40
41	@c @paragraphindent none
42
43	@iftex
44	@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
45	@end iftex
46
47	@c @headings off
48	@c @everyheading LEM(1) @| @| LEM(1)
49	@everyfooting @today @c @| @thispage @|
50
51	@ifnottex
52
53	@node Top
54	@top UTT - UAM Text Tools
55
56	@insertcopying
57
58	@menu
59	* General information::
60	* UTT file format::
61	* Configuration files::
62	* UTT components::
63	* Auxiliary tools::
64	* Usage examples::
65	* PMDBF dictionary::
66	@c * Examples::
67	@c * Copyright::
68	* GNU Free Documentation License::
69	* Reporting bugs::
70	* Author::
71	@end menu
72	@end ifnottex
73
74
75	@c ----------------------------------------------------------------------
76
77	@node General information
78	@chapter General information
79
80	UAM Text Tools (UTT) is a package of language processing tools
81	developed at Adam Mickiewicz University. Its functionality includes:
82
83	@itemize @bullet
84
85	@item
86	tokenization
87	@item
88	dictionary-based morphological analysis
89	@item
90	heuristic morphological analysis of unknown words
91	@item
92	spelling correction
93	@item
94	pattern search
95	@item
96	sentence splitting
97	@item
98	generation of concordance tables
99	@end itemize
100
101	The toolkit is intended for processing raw (unannotated),
102	unrestricted text for any purpose.
103
104	The system is organized as a collection of command-line programs, each
105	performing one operation, e.g. tokenization, lemmatization, or spelling
106	correction. The components are independent of one another; what unifies
107	them is a common input/output file format.
108
109	The components may be combined in various ways to provide various text
110	processing services. New components supplied by the user may also be
111	easily incorporated into the system, provided that they respect the i/o
112	file format conventions.
113
114	UTT component programs do not depend on any specific tagset or
115	morphological description format.
116
117	UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
118	the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
119
120	The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License, which prohibits commercial use.
121
122
123	List of contributors:
124
125	@itemize
126	@item Pawel Konieczka
127	@item Tomasz Obrebski
128	@item Michal Stolarski
129	@item Marcin Walas
130	@item Justyna Walkowska
131	@item Pawel Werenski
132	@end itemize
133
134	@c ----------------------------------------------------------------------
135	@c ---------------------------------------------------------------------
136
137	@node UTT file format
138	@chapter UTT file format
139
140	A UTT file contains annotation of a text. It consists of a sequence of
141	segments. Each segment explicitly refers to a continuous piece of the
142	text and provides some information on it.
143
144	@section Segment format
145
146	A segment occupies one line of a UTT file and consists of
147	space-separated fields:
148
149
150	@quotation
151	@sp 1
152	[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
153	@sp 1
154	@end quotation
155
156	@table @var
157
158	@item @var{start}
159	Non-negative integer value indicating the position in the source text where the
160	segment starts.
161
162	@item @var{length}
163	Non-negative integer value indicating the length of the segment.
164
165	@item @var{type}
166	A sequence of non-blank characters other than digits (digits could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
167	@var{type} reflects the main classification of segments
168	into words, numbers, punctuation marks, and meta-text markers.
169	@xref{tok output,,tok output}, for a description of the automatically recognized type markers.
170
171	@item @var{form}
172	This field contains the textual form of the segment or the special
173	symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
174
175	The characters or character sequences that have special meaning in the
176	@var{form} field are enumerated below.
177
178	Characters with special meaning:
179
180	@itemize
181	@item @code{_} - space character
182	@item @code{*} - undefined contents
183	@end itemize
184
185	Escape sequences:
186
187	@itemize
188	@item @code{\n} - new line
189	@item @code{\t} - tabulation
190	@item @code{\r} - carriage return
191
192	@item @code{\_} - the @code{_} character
193	@item @code{\*} - the @code{*} character
194	@item @code{\\} - the @code{\} character
195
196	@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
197	@end itemize
198
199	@item @var{annotation1}
200	@item @var{annotation2}
201	@item ...
202	Annotation fields have the following format:
203
204	@var{longname} @code{:} @var{value}
205
206	or
207
208	@var{shortname} @var{value}
209
210	where @var{longname} is a string of alphanumeric characters
211	(isalnum() test), @var{shortname} - a single non-alphanumeric character
212	(ispunct() test), and @var{value} is an arbitrary string of non-blank characters.
213
214	@end table
215
216
217	Only two fields are mandatory: @var{type} and @var{form}. All other fields
218	may be absent. When only one number precedes the
219	@var{type} field, it is interpreted as the @var{start} position.
220
221	If the @var{length} field is omitted, the length of the segment is the
222	length of the @var{form} field, except when the value of the
223	@var{form} field is @code{*} -- in this case, the length is assumed to
224	be 0.
225
226	If the @var{start} field is also absent, the segment is assumed to directly
227	follow the preceding one.
228
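The field rules above can be sketched in code. The following Python fragment is only an illustration and is not part of UTT: the names @code{Segment}, @code{decode_form}, @code{parse_annotation} and @code{parse_segment} are hypothetical, and the treatment of unknown escape sequences (the escaped character is taken literally) is an assumption of the sketch.

```python
from dataclasses import dataclass, field

# Escape sequences listed for the form field; any other escaped
# character is taken literally (an assumption of this sketch).
ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}


@dataclass
class Segment:
    start: int
    length: int
    type: str
    form: str                      # decoded text, "" when the form is "*"
    annotations: list = field(default_factory=list)


def decode_form(form):
    """Turn the encoded form field back into plain text."""
    if form == "*":                # undefined contents
        return ""
    out, i = [], 0
    while i < len(form):
        ch = form[i]
        if ch == "\\" and i + 1 < len(form):
            nxt = form[i + 1]
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(" " if ch == "_" else ch)
            i += 1
    return "".join(out)


def parse_annotation(text):
    """Split 'longname:value' or '<punct>value' into (name, value)."""
    if text[0].isalnum():
        name, _, value = text.partition(":")
        return name, value
    return text[0], text[1:]


def parse_segment(line, prev_end=0):
    """Parse one segment line; prev_end is where the previous segment ended."""
    fields = line.split()
    numbers = []
    # Up to two leading numeric fields: start first, then length.
    while len(numbers) < 2 and fields and fields[0].isdecimal():
        numbers.append(int(fields.pop(0)))
    if len(fields) < 2:
        raise ValueError("type and form are mandatory: %r" % line)
    seg_type, form, *rest = fields
    text = decode_form(form)
    start = numbers[0] if numbers else prev_end
    length = numbers[1] if len(numbers) == 2 else len(text)
    return Segment(start, length, seg_type, text,
                   [parse_annotation(a) for a in rest])
```

For instance, @code{parse_segment("S _", prev_end=7)} yields a segment with start 7 and length 1, because both numeric fields are absent and the form decodes to a single space.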
229	@c Conventions:
230
231	@c Annotation fields with predefined meaning:
232
233	@c @itemize
234	@c @item @code{!} - UTT components are allowed to modify the contents of
235	@c the @var{form} field (e.g. spelling correction does this). If this happens the
236	@c original form of the segment have to be placed in the @code{!}-field.
237	@c @item @code{@@} - morphological description
238	@c @item @code{=} - node identifier assignment (used in graph encoding)
239	@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
240	@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
241	@c @end itemize
242
243	Segments of length 0 may be used to mark file positions with some
244	information. See e.g. BOS and EOS (beginning/end of sentence) markers
245	in the example below.
246
247	Example:
248
249	sentence: @samp{Piszemy dobre progrumy.}
250
251	@example
252	0000 00 BOS *
253	0000 07 W Piszemy lem:pisać,V
254	0007 01 S _
255	0008 05 W dobre lem:dobry,ADJ
256	0013 01 S _
257	0014 08 W progrumy cor:programy lem:program,N
258	0022 01 P .
259	0023 00 EOS *
260	0023 01 S _
261	0024 00 BOS *
262	0024 11 W Warszawiacy lem:Warszawiak,N
263	0035 01 S _
264	0036 03 W też
265	0039 01 P .
266	0040 00 EOS *
267
268	@end example
269
270	@example
271	0000 BOS *
272	0000 W Piszemy lem:pisać,V
273	0007 S _
274	0008 W dobre lem:dobry,ADJ
275	0013 S _
276	0014 W progrumy cor:programy lem:program,N
277	0022 P .
278	0023 EOS *
279	@end example
280
281	Position information may be provided only for some types of segments:
282
283	@example
284	0000 BOS *
285	W Piszemy lem:pisać,V
286	S _
287	W dobre lem:dobry,ADJ
288	S _
289	W progrumy cor:programy lem:program,N
290	P .
291	EOS *
292	S _
293	0024 BOS *
294	W Warszawiacy lem:Warszawiak,N
295	S _
296	W też
297	P .
298	EOS *
299	@end example
300
301	Position/length information may be provided only when necessary:
302
303	@example
304	0000 04 N *
305	0000 N 12
306	P .
307	N 5
308	S _
309	W km
310	@end example
311
312	@section UTT File
313
314	A UTT file consists of a sequence of segments. The same text position
315	may be covered by multiple segments. In consequence, ambiguous text
316	segmentation and ambiguous annotation may be represented.
317
318	There are two structural requirements a valid UTT-formatted file
319	has to meet:
320
321	@itemize @bullet
322
323	@item
324	segments have to be sorted with respect to the @var{start} field,
325
326	@item
327	for each
328	segment ending at position @var{n}, either there must be a segment starting at
329	position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
330	for each segment starting at position @var{n}, either there must be a segment
331	ending at position @var{n-1}, or the position @var{n-1} must not be covered
332	by any segment.
333
334	@end itemize
335
336	A valid annotation for the text fragment
337	@example
338	12.5 km
339	@end example
340
341	may be
342
343	@example
344	0000 02 N 12
345	0000 04 N 12.5
346	0002 01 P .
347	0003 01 N 5
348	0004 01 S _
349	0005 02 W km
350	@end example
351
352	but not
353
354	@example
355	0000 02 N 12
356	0000 04 N 12.5
357	0004 01 S _
358	0005 02 W km
359	@end example
360
361	because in the latter example the first segment (starting at position
362	0000, 2 characters long) ends at position @var{n}=0001, while position
363	@var{n+1}=0002 is covered by the second segment although no segment
364	starts there.
365
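The two structural requirements can be checked mechanically. Below is a minimal Python sketch (the function name @code{is_valid_utt} is hypothetical and not part of UTT). It represents each segment as a (@var{start}, @var{length}) pair and, as an assumption of the sketch, ignores zero-length marker segments when deciding which positions are covered.

```python
def is_valid_utt(segments):
    """segments: list of (start, length) pairs in file order."""
    # Requirement 1: segments are sorted by start position.
    starts = [start for start, _ in segments]
    if starts != sorted(starts):
        return False

    # Inclusive (first, last) positions of all non-empty segments.
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    start_positions = {first for first, _ in spans}
    end_positions = {last for _, last in spans}

    def covered(pos):
        return any(first <= pos <= last for first, last in spans)

    # Requirement 2: segment boundaries must line up.
    for first, last in spans:
        # After a segment ends at n, a segment starts at n+1
        # or n+1 is not covered at all.
        if last + 1 not in start_positions and covered(last + 1):
            return False
        # Before a segment starts at n, a segment ends at n-1
        # or n-1 is not covered at all.
        if first - 1 not in end_positions and covered(first - 1):
            return False
    return True
```

Applied to the two annotations of @samp{12.5 km}, the check accepts the first one and rejects the second, since position 0002 is covered but no segment starts there.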
366
367	@section Flattened UTT file
368
369	The UTT file format has two variants: regular and flattened. The regular
370	format was described above. In the flattened format some of the
371	end-of-line characters are replaced with line-feed characters.
372
373	The flattened format is basically used to represent whole sentences as
374	single lines of the input file (all intrasentential end-of-line
375	characters are replaced with line-feed characters).
376
377	This technical trick makes it possible to perform certain text
378	processing operations on entire sentences with tools such as
379	@command{grep} (see the @command{grp} component) or @command{sed} (see the @command{mar} component).
380
381	The conversion between the two formats is performed by the tools:
382	@command{fla} and @command{unfla}.
383
384	@section Character encoding
385
386	The UTT component programs accept only single-byte character encodings,
387	such as the ISO, ANSI, and DOS code pages.
388
389
390	@c @section Formats
391
392	@c @unnumberedsubsubsec Basic format
393
394	@c While processing large amounts of the overhead related with explicit
395	@c ... of the start position and segment length becomes ... . Therefore,
396	@c for efficiency reasons certain shortcuts are possible:
397
398	@c @unnumberedsubsubsec Relative start position
399
400	@c Start position may be given as relative distance from the last
401	@c absolute position.
402
403	@c @unnumberedsubsubsec Absent length
404
405	@c Segment length may be omitted. Normally it can be restored by counting
406	@c the length of the @emph{form field}. For segments with the special value
407	@c @code{*} in the @emph{form field} length 0 is assumed.
408
409	@c @unnumberedsubsubsec Absent length and start position
410
411	@c Both start position and segment length may be omitted. In this format
412	@c each segment is assumed to follow the previous one. This format is,
413	@c therefore, suitable only for unambiguously tagged text
414	@c (0-length markers can be still used.)
415
416
417	@c @table @code
418	@c @item AL
419	@c @code{1234 03 W kot}
420	@c @item RL
421	@c @code{+56 03 W kot}
422	@c @item A
423	@c @code{1234 W kot}
424	@c @item R
425	@c @code{+56 W kot}
426	@c @item 0
427	@c @code{W kot}
428	@c @end table
429
430
431	@c [HOW TO GET POLISH FONTS IN DVI???]
432
433	@macro parhelp
434	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
435	Print help.
436	@end macro
437
438
439	@macro parversion
440	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
441	Print version information.
442	@end macro
443
444	@macro parinteractive
445	@item @b{@minus{}@minus{}interactive, @minus{}i}
446	This option toggles interactive mode, which is by default off. In the
447	interactive mode the program does not buffer the output.
448	@end macro
449
450
451	@c @macro parfile
452	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
453	@c Input file name.
454	@c If this option is absent or equal to '@minus{}', the program
455	@c reads from the standard input.
456	@c @end macro
457
458
459	@c @macro paroutput
460	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
461	@c Regular output file name. To regular output the program sends segments
462	@c which it successfully processed and copies those which were not
463	@c subject to processing. If this option is absent or equal to
464	@c '@minus{}', standard output is used.
465	@c @end macro
466
467	@c @macro parfail
468	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
469	@c Fail output file name. To fail output the program copies the segments
470	@c it failed to process. If this option is absent or equal to
471	@c '@minus{}', standard output is used.
472	@c @end macro
473
474
475	@c @macro parcopy
476	@c @item @b{@minus{}@minus{}copy, @minus{}c}
477	@c Copy successfully processed segments to regular output also in their
478	@c original input form.
479	@c @end macro
480
481
482	@macro parinputfield
483	@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
484	The field containing the input to the program. The default is the
485	@var{form} field. The fields @var{start}, @var{length}, @var{type},
486	and @var{form} are referred to as @code{1}, @code{2}, @code{3},
487	@code{4}, respectively.
488	@end macro
489
490
491	@macro paroutputfield
492	@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
493	The name of the field added by the program. The default is the name of the program.
494	@end macro
495
496
497	@macro pardictionary
498	@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
499	Dictionary file name.
500	@end macro
501
502
503	@macro parprocess
504	@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
505	Process segments with the specified value in the @var{type} field.
506	Multiple occurrences of this option are allowed and are interpreted as
507	disjunction. If this option is absent, all segments are processed.
508	@end macro
509
510
511	@macro parselect
512	@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
513	Select for processing only segments in which the field named
514	@var{fieldname} is present. Multiple occurrences of this option are
515	allowed and are interpreted as conjunction of conditions. If this
516	option is absent, all segments are processed.
517	@end macro
518
519
520	@macro parunselect
521	@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
522	Select for processing only segments in which the field @var{fieldname}
523	is absent. Multiple occurrences of this option are allowed and are
524	interpreted as conjunction of conditions. If this option is absent,
525	all segments are processed.
526	@end macro
527
528
529	@macro paroneline
530	@item @b{@minus{}@minus{}one-line}
531	This option makes the program print ambiguous annotation in one output
532	line by generating multiple annotation fields. By default, when
533	ambiguous annotation may be produced for a segment, the segment is
534	duplicated and each of the annotations is added to a separate copy of
535	the segment.
536	@end macro
537
538
539	@macro paronefield
540	@item @b{@minus{}@minus{}one-field, @minus{}1}
541	This option makes the program print ambiguous annotation in one
542	annotation field. By default, when ambiguous annotation may be produced
543	for a segment, the segment is duplicated and each of the
544	annotations is added to a separate copy of the segment.
545
546	This option is useful when working with @command{kot} or @command{con}.
547	@end macro
548
549
550	@c ---------------------------------------------------------------------
551	@c CONFIGURATION FILES
552	@c ---------------------------------------------------------------------
553
554	@node Configuration files
555	@chapter Configuration files
556
557	Values for all command line options accepted by a component
558	may be set in configuration files. The default locations of the
559	configuration files for a component named @command{@var{program}} are
560
561	@example
562	@file{/usr/local/etc/utt/@var{program}.conf}
563	@end example
564
565	for the system-wide configuration file and
566
567	@example
568	@file{~/.utt/@var{program}.conf}
569	@end example
570
571	for the user configuration file.
572
573	@c The configuration file to load may be also specified with the
574	@c @option{--config} option. Configuration file need not be provided.
575
576	For each option, the value is set according to the following priority:
577
578	@itemize
579	@item command line
580	@c @item configuration file indicated with @option{--config} option
581	@item user configuration file (or configuration file indicated with the @option{--config} option)
582	@item system-wide configuration file
583	@end itemize
584
585	Parameter values are specified in the following format:
586
587	@var{parametername}=@var{value}
588
589	where @var{parametername} is the short or long name of an option accepted by
590	the program, or
591
592	@var{parametername}
593
594	if the option does not need arguments.
595
596	Comments in configuration files are introduced with the # sign.
597
598	If a program accepts multiple occurrences of an option (e.g. the @option{--select} option of @command{lem}), each occurrence can be given on a separate line of the program's configuration file.
599
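The file format and the priority order can be illustrated with a short Python sketch. It is not the parser used by the UTT programs: the function names are hypothetical, and two details are assumptions of the sketch, namely that a # sign starts a comment anywhere on a line, and that a higher-priority source replaces all values of an option given by a lower-priority one.

```python
def parse_conf(text):
    """Parse configuration text into {option: [value, ...]}.

    Options without an argument are stored with the value None.
    Repeated options accumulate in order of appearance.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments (assumed syntax)
        if not line:
            continue
        name, sep, value = line.partition("=")
        options.setdefault(name.strip(), []).append(
            value.strip() if sep else None)
    return options


def effective_options(system, user, cmdline):
    """Merge option dictionaries; later sources take priority."""
    merged = {}
    for source in (system, user, cmdline):
        merged.update(source)
    return merged
```

With a system-wide file that sets @code{dictionary} and @code{one-line}, a user file that sets @code{select} twice and overrides @code{dictionary}, and a command line that sets @code{dictionary} again, the merged result keeps the command-line dictionary, both @code{select} values, and @code{one-line}.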
600	@c The equal sign may be omitted.
601
602
603	@quotation Tip
604	If you have two (or more) frequently used sets of options for the same
605	program (e.g. lem with the PMDBF dictionary and lem with a user dictionary),
606	a good solution is to create two symbolic links to lem, called
607	e.g. lemg and lemu, and to specify their configuration in the files lemg.conf
608	and lemu.conf respectively.
609	@end quotation
610
611	@c ---------------------------------------------------------------------
612	@c COMPONENTS
613	@c ---------------------------------------------------------------------
614
615	@node UTT components
616	@chapter UTT components
617
618	UTT components are of three types:
619
620	@menu
621	Sources: programs which read non-UTT data (e.g. raw text) and produce output
622	in UTT format
623	* tok:: a tokenizer
624
625	Filters: programs which read and produce UTT-formatted data
626	* lem:: a morphological analyzer
627	* gue:: a morphological guesser
628	* cor:: a simple spelling corrector
629	* kor:: a more elaborate spelling corrector
630	* sen:: a sentence splitter
631	* ser:: a pattern search tool (marks matches)
632	* mar:: a pattern search tool (introduces arbitrary markers into the text)
633	* grp:: a pattern search tool (selects sentences containing a match)
634	@c * gph:: a word-graph annotation tool::
635	@c * dgp:: a dependency parser
636
637	Sinks: programs which read UTT data and produce output in another format
638	* kot:: an untokenizer
639	* con:: a concordance table generator
640	@end menu
641
642	@c ---------------------------------------------------------------------
643	@c TOK
644	@c ---------------------------------------------------------------------
645
646	@page
647	@node tok
648	@section tok - a tokenizer
649
650	@c ----------------------------------------
651
652	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
653	@item @strong{Authors:} @tab Tomasz Obrębski
654	@item @strong{Component category:} @tab source
655	@item @strong{Input format:} @tab raw text file
656	@item @strong{Output format:} @tab UTT regular
657	@item @strong{Required annotation:} @tab -
658	@end multitable
659
660
661	@menu
662	* tok description::
663	* tok input::
664	* tok output::
665	* tok command line options::
666	* tok example::
667	@end menu
668
669	@node tok description
670	@subsection Description
671
672	@code{tok} is a simple program which reads a text file and identifies
673	tokens on the basis of their orthographic form. The type of the token
674	is printed as the @var{type} field.
675
676	@node tok input
677	@subsection Input
678
679	Raw text.
680
681	@node tok output
682	@subsection Output
683
684	UTT-file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
685
686	@itemize
687
688	@item @code{W}
689	(word)
690	- continuous sequence of letters
691
692	@item @code{N}
693	(number)
694	- continuous sequence of digits
695
696	@item @code{S}
697	(space)
698	- continuous sequence of space characters
699
700	@item @code{P}
701	(punctuation mark)
702	- single printable character not belonging to any of the other classes
703
704	@item @code{B}
705	(unprintable character)
706	- single unprintable character
707
708	@end itemize
709
710
711
712	@node tok command line options
713	@subsection Command line options
714
715	@table @code
716
717	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
718	Print help.
719
720	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
721	Print version information.
722
723	@item @b{@minus{}@minus{}interactive, @minus{}i}
724	This option toggles interactive mode, which is by default off. In the
725	interactive mode the program does not buffer the output.
726
727	@end table
728
729	@node tok example
730	@subsection Example
731
732	Input:
733
734	@example
735	Piszemy dobre programy.
736	@end example
737
738	Output:
739
740	@example
741	0000 07 W Piszemy
742	0007 01 S _
743	0008 05 W dobre
744	0013 01 S _
745	0014 08 W programy
746	0022 01 P .
747	0023 01 S \n
748	@end example
749
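The classification described in the Output subsection can be imitated in a few lines of Python. This is only a sketch of the idea, not the implementation of @command{tok}: the function names are hypothetical, Python's Unicode character classes stand in for the single-byte classes used by the real program, and only the escapes listed in the file format chapter are encoded.

```python
def char_class(ch):
    """Return the tok-style type marker for a single character."""
    if ch.isalpha():
        return "W"
    if ch.isdigit():
        return "N"
    if ch.isspace():
        return "S"
    if ch.isprintable():
        return "P"
    return "B"


def encode_form(text):
    """Encode text for the form field (space -> _, newline -> \\n, ...)."""
    table = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
             "_": "\\_", "\\": "\\\\"}
    return "".join(table.get(ch, ch) for ch in text)


def tokenize(text):
    """Split text into segments and return them as UTT lines."""
    lines, i = [], 0
    while i < len(text):
        kind = char_class(text[i])
        j = i + 1
        # Words, numbers and spaces are runs; P and B are single characters.
        if kind in "WNS":
            while j < len(text) and char_class(text[j]) == kind:
                j += 1
        lines.append(f"{i:04d} {j - i:02d} {kind} {encode_form(text[i:j])}")
        i = j
    return lines
```

Calling @code{tokenize} on the example input followed by a newline character reproduces the seven output lines shown above.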
750
751	@c ---------------------------------------------------------------------
752	@c SEN
753	@c ---------------------------------------------------------------------
754
755	@c @node sen - sentencizer
756	@c @chapter sen - sentencizer
757
758	@c Authors: Tomasz Obrębski
759
760	@c ---------------------------------------------------------------------
761	@c LEM
762	@c ---------------------------------------------------------------------
763
764	@page
765	@node lem
766	@section lem - morphological analyzer
767
768	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
769	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
770	@item @strong{Component category:} @tab filter
771	@item @strong{Input format:} @tab UTT regular
772	@item @strong{Output format:} @tab UTT regular
773	@item @strong{Required annotation:} @tab tok
774	@end multitable
775
776	@menu
777	* lem description::
778	* lem command line options::
779	* lem input::
780	* lem output::
781	* lem example::
782	* lem dictionaries::
783	* lem hints::
784	@end menu
785
786	@node lem description
787	@subsection Description
788
789	@command{lem} performs morphological analysis of a simple orthographic
790	word, returning all its possible morphological annotations,
791	disregarding the context.
792
793	@c ----------------------------------------
794
795	@node lem command line options
796	@subsection Command line options
797
798	@table @code
799	@parhelp
800	@parversion
801	@parinteractive
802	@c @parfile
803	@c @paroutput
804	@c @parfail
805	@c @parcopy
806	@parinputfield
807	@paroutputfield
808	@pardictionary
809	@parprocess
810	@parselect
811	@parunselect
812	@paroneline
813	@paronefield
814	@end table
815
816	@c ----------------------------------------
817
818	@node lem input
819	@subsection Input
820
821	@command{lem} reads a UTT file and processes the value of the @var{form} field
822	(the input field may be changed with the @option{--input-field} option).
823
824	@node lem output
825	@subsection Output
826
827	@command{lem} adds a new annotation field, whose default name is @code{lem}. In
828	case of ambiguity, either the segment is duplicated (default),
829	multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
830	annotation is produced as the value of a single @code{lem} field
831	(@option{--one-field}, @option{-1}):
832
833	@itemize @bullet
834
835	@item
836	unambiguous value format:
837
838	@example
839	<lemma>,<descr>
840	@end example
841
842	@item
843	ambiguous value format (@option{--one-field} option)
844
845
846	@example
847	<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
848	@end example
849
850	(alternative descriptions for the same lemma are separated by commas,
851	alternative lemmata are separated by semicolons.)
852
853	@end itemize
854
855	@node lem example
856	@subsection Example
857
858	Input:
859
860	@example
861	0000 07 W Piszemy
862	0007 01 S _
863	0008 05 W dobre
864	0013 01 S _
865	0014 08 W programy
866	0022 01 P .
867	0023 01 S \n
868	@end example
869
870	Output (default):
871
872	@example
873	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
874	0007 01 S _
875	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
876	0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
877	0013 01 S _
878	0014 08 W programy lem:program,N/GiNpCa
879	0014 08 W programy lem:program,N/GiNpCn
880	0014 08 W programy lem:program,N/GiNpCv
881	0022 01 P .
882	0023 01 S \n
883	@end example
884
885	Output (@option{--one-line} option):
886
887	@example
888	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
889	0007 01 S _
890	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
891	0013 01 S _
892	0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
893	0022 01 P .
894	0023 01 S \n
895	@end example
896
897	Output (@option{--one-field} option):
898
899	@example
900	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
901	0007 01 S _
902	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
903	0013 01 S _
904	0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
905	0022 01 P .
906	0023 01 S \n
907	@end example
908
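The relation between the @option{--one-field} value format and the default output can be sketched in Python. The helper names below are hypothetical and not part of UTT, and the sketch assumes that lemmata and descriptions themselves contain no commas or semicolons.

```python
def split_one_field(value):
    """'lemma,descr[,descr][;lemma,descr...]' -> list of (lemma, descr)."""
    pairs = []
    for alternative in value.split(";"):      # alternative lemmata
        lemma, *descrs = alternative.split(",")
        pairs.extend((lemma, descr) for descr in descrs)
    return pairs


def expand_one_field(line, name="lem"):
    """Rewrite a --one-field segment as one line per (lemma, descr) pair."""
    fields = line.split()
    prefix = name + ":"
    for i, fld in enumerate(fields):
        if fld.startswith(prefix):
            return [
                " ".join(fields[:i] + [f"{prefix}{lemma},{descr}"] + fields[i + 1:])
                for lemma, descr in split_one_field(fld[len(prefix):])
            ]
    return [line]   # no such field: nothing to expand
```

Expanding the @samp{dobre} segment from the @option{--one-field} output gives the two @samp{dobre} lines of the default output.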
909	@c ----------------------------------------
910
911	@node lem dictionaries
912	@subsection Dictionaries
913
914	@command{lem} requires a dictionary. The dictionary may be provided in
915	one of two formats: in text (source) format or in binary (fsa) format.
916
917	@subsubheading Text format
918
919	Dictionary entries have the following structure:
920
921	@example
922	<form>;<lemma>,<descr>[;<lemma>,<descr>]
923	@end example
924
925	@var{lemma} may be given explicitly or in the cut-add format:
926
927	@example
928	@code{[<cut1><add1>-]<cut2><add2>}
929	@end example
930
931	meaning: replace the prefix of length @code{<cut1>} with the
932	string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string
933	@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
934	@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
935
936	Each dictionary entry must be written on one line and must not contain blank characters.
937
938	Examples:
939	@example
940	kot;0,N/GaNsCn
941	kota;1,N/GaNsCg;1,N/GaNsCa
942	kotu;1,N/GaNsCd
943	kotem;2,N/GaNsCi
944	kocie;3t,N/GaNsCl;3t,N/GaNsCv
945	najbielsi;3-4ały,ADJ/DsNpCnGp
946	najbielsze;3-5ały,ADJ/DsNpCnGaifn
947	najlepsi;dobry,ADJ/DsNpCnGp
948	najlepsze;dobry,ADJ/DsNpCnGaifn
949	@end example
950
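The cut-add notation is easy to misread, so here is a small Python sketch of how a rule can be applied to a word form. The function names and the regular expression are illustrative only and are not taken from the @command{lem} sources. A lemma field that does not start with a digit is treated as an explicit lemma.

```python
import re

# [<cut1><add1>-]<cut2><add2>
CUT_ADD = re.compile(r"(?:(\d+)([^-]*)-)?(\d+)(.*)")


def apply_cut_add(form, rule):
    """Return the lemma obtained by applying a cut-add rule to form."""
    match = CUT_ADD.fullmatch(rule)
    if match is None:
        return rule                      # explicit lemma, e.g. "dobry"
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:                 # optional prefix part
        form = add1 + form[int(cut1):]
    keep = max(0, len(form) - int(cut2))
    return form[:keep] + add2            # suffix part


def parse_entry(entry):
    """'<form>;<lemma>,<descr>[;...]' -> (form, [(lemma, descr), ...])."""
    form, *analyses = entry.split(";")
    result = []
    for analysis in analyses:
        lemma, descr = analysis.split(",", 1)
        result.append((apply_cut_add(form, lemma), descr))
    return form, result
```

For example, @code{parse_entry("kocie;3t,N/GaNsCl;3t,N/GaNsCv")} resolves both analyses to the lemma @samp{kot}.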
951
952	The mandatory file name extension for a text dictionary is @code{dic}. For large
953	dictionaries it is preferable, however, to compile them into binary
954	(fsa) format.
955
956	@subsubheading Binary format
957
958	The mandatory file name extension for a binary dictionary is @code{bin}. To
959	compile a text dictionary into binary format, write:
960
961	@example
962	compiledic <dictionaryname>.dic
963	@end example
964
965	@subsubheading Polex/PMDBF dictionary
966
967	A large-coverage morphological dictionary for Polish, Polex/PMDBF, is included in
968	the distribution as the default dictionary of @command{lem}. It is
969	located by default in:
970
971	@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}
972
973	in a local installation or in
974
975	@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}
976
977	in a system installation.
978
979	@node lem hints
980	@subsection Hints
981
982	@subsubheading Combining data from multiple dictionaries
983
984	@itemize
985
986	@item Apply <dict1>, then apply <dict2> to words which were not annotated.
987
988	@example
989	lem -d <dict1> | lem -S lem -d <dict2>
990	@end example
991
992	@item Add annotations from two dictionaries <dict1> and <dict2>.
993
994	@example
995	lem -c -d <dict1> | lem -S lem -d <dict2>
996	@end example
997
998	@end itemize
999
1000
1001	@c ---------------------------------------------------------------------
1002	@c GUE
1003	@c ---------------------------------------------------------------------
1004
1005	@page
1006	@node gue
1007	@section gue - morphological guesser
1008
1009	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1010
1011	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
1012	@item @strong{Component category:} @tab filter
1013
1014	@end multitable
1015
1016	@menu
1017	* gue description::
1018	* gue command line options::
1019	* gue example::
1020	* gue dictionaries::
1021	@end menu
1022
1023
1024	@node gue description
1025	@subsection Description
1026
1027	@command{gue} guesses morphological descriptions of the form contained
1028	in the @var{form} field.
1029
1030
1031	@node gue command line options
1032	@subsection Command line options
1033
1034	@table @code
1035
1036	@parhelp
1037	@parversion
1038	@parinteractive
1039	@c @parfile
1040	@c @paroutput
1041	@c @parfail
1042	@c @parcopy
1043	@parinputfield
1044	@paroutputfield
1045	@pardictionary
1046	@parprocess
1047	@parselect
1048	@parunselect
1049	@paroneline
1050	@paronefield
1051
1052	@item @b{@minus{}@minus{}delta=@var{n}}
1053	Stop displaying answers after a drop in weight, that is, when the weight difference between two consecutive results exceeds the delta value (default=`0.2').
1054
1055
1056	@item @b{@minus{}@minus{}cut-off=@var{n}}
1057	Do not display answers whose weight is lower than the cut-off value (default=`200').
1058
1059
1060	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
1061	Guess up to n descriptions (default=`0', which means 'display all results').
1062
1063
1064
1065	@end table
1066
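The three options above interact, and the following Python sketch shows one plausible reading of them. It is an illustration only, not the @command{gue} implementation: the function name @code{filter_guesses} and the sample weights are made up, candidates are assumed to be (description, weight) pairs, and @option{--delta} is assumed to be a relative drop between two consecutive weights, since this manual does not say whether the difference is absolute or relative.

```python
def filter_guesses(candidates, delta=0.2, cut_off=200, guess_count=0):
    """Filter (description, weight) pairs as the options above suggest.

    Assumptions (not taken from the gue sources): candidates are ranked by
    descending weight, and delta is a relative drop between neighbours.
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    kept = []
    previous = None
    for description, weight in ranked:
        if weight < cut_off:
            break  # --cut-off: everything below this weight is dropped
        if previous is not None and (previous - weight) / previous > delta:
            break  # --delta: stop after a sharp fall of weight
        if guess_count and len(kept) >= guess_count:
            break  # --guess_count: 0 means "no limit"
        kept.append((description, weight))
        previous = weight
    return kept


# Hypothetical weights, chosen only to show the delta rule at work.
guesses = [("ADJ/CaDpGiNs", 900), ("ADJ/CnvDpGaipNs", 850), ("N/GfNsCa", 400)]
print(filter_guesses(guesses))
```

With the default settings the third candidate is dropped, because its weight falls by more than 20% relative to the second one.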
1067	@node gue example
1068	@subsection Example
1069
1070	@example
1071	command: gue -n 2
1072
1073	input:
1074	0000 07 W smerfny
1075
1076	output:
1077	0000 07 W smerfny gue:,ADJ/CaDpGiNs
1078	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
1079	@end example
1080
1081
1082	@node gue dictionaries
1083	@subsection Dictionaries
1084
1085	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
1086	The fsa format is created by compiling text-format dictionaries.
1087
1088
1089
1090	@subsubheading Text format
1091
1092	Dictionary entries have the following structure:
1093
1094	@example
1095	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
1096	@end example
1097
1098	@var{lemma} must be given in the cut-add format:
1099
1100	@example
1101	@code{[<cut1><add1>-]<cut2><add2>}
1102	@end example
1103	(no spaces in between): replace the prefix of length @var{cut1} with
1104	string @var{add1}, replace the suffix of length @var{cut2} with string
1105	@var{add2}.
1106
1107
1108	Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}
1109
1110
1111	@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
1112
1113	@var{weight} is an integer value between 1 and 999 indicating the
1114	likelihood of the guess.
1115
1116	@example
1117	*łkę;1a,N/GfNsCa
1118	naj*elszy;3-4ały,ADJ/...:...
1119	@end example
1120
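The cut-add notation can be made concrete with a small Python sketch. The function name @code{apply_cut_add} is hypothetical, and the form @i{pałkę} is merely an illustrative word matching the first entry above; the code follows the description in this section rather than the @command{gue} sources.

```python
import re

# Optional "<cut1><add1>-" prefix part, then the obligatory "<cut2><add2>".
CUT_ADD = re.compile(r"^(?:(\d+)([^-]*)-)?(\d+)(.*)$")


def apply_cut_add(form, rule):
    """Rebuild a lemma from a word form and a cut-add rule such as '3-4ały'."""
    match = CUT_ADD.match(rule)
    if match is None:
        raise ValueError("not a cut-add rule: " + rule)
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        form = add1 + form[int(cut1):]      # replace a prefix of length cut1
    stem = form[:len(form) - int(cut2)]     # drop a suffix of length cut2
    return stem + add2


print(apply_cut_add("najbielsi", "3-4ały"))  # biały
print(apply_cut_add("pałkę", "1a"))          # pałka
```

The suffix is cut with an explicit length computation so that a rule with @var{cut2} equal to 0 leaves the form intact.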
1121
1122	@c ---------------------------------------------------------------------
1123	@c COR
1124	@c ---------------------------------------------------------------------
1125
1126	@page
1127	@node cor
1128	@section cor - spelling corrector
1129
1130	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1131	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
1132	@item @strong{Component category:} @tab filter
1133	@item @strong{Input format:} @tab UTT regular
1134	@item @strong{Output format:} @tab UTT regular
1135	@item @strong{Required annotation:} @tab tok
1136	@end multitable
1137
1138	@menu
1139	* cor description::
1140	* cor command line options::
1141	* cor dictionaries::
1142	@end menu
1143
1144
1145	@node cor description
1146	@subsection Description
1147
1148	The spelling corrector applies Kemal Oflazer's dynamic programming
1149	algorithm @cite{oflazer96} to the FSA representation of the set of
1150	word forms of the Polex/PMDBF dictionary. Given an incorrect
1151	word form, it returns all word forms present in the dictionary whose
1152	edit distance from that form does not exceed the threshold given as a parameter.
1153
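What the corrector returns can be pictured with a brute-force Python sketch. It deliberately swaps Oflazer's FSA traversal for a plain scan over a word list with the textbook Levenshtein distance, so it shows the result rather than the method, and the distance measure used by @command{cor} may differ in detail. The function names are hypothetical; the threshold is treated as an inclusive maximum, as the @option{--distance} option describes it, and the three words come from the dictionary example in this section.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def corrections(word, dictionary, distance=1):
    """Return the dictionary words within the given edit distance of word."""
    return [entry for entry in dictionary
            if edit_distance(word, entry) <= distance]


words = ["odlot", "odlotowy", "odludek"]
print(corrections("odlod", words))   # ['odlot']
print(corrections("odludk", words))  # ['odludek']
```

A linear scan like this is far slower than searching an automaton, which is why a compiled dictionary is required.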
1154
1155	@node cor command line options
1156	@subsection Command line options
1157
1158	@table @code
1159
1160	@parhelp
1161	@parversion
1162	@parinteractive
1163	@c @parfile
1164	@c @paroutput
1165	@c @parfail
1166	@c @parcopy
1167	@parinputfield
1168	@paroutputfield
1169	@pardictionary
1170	@parprocess
1171	@parselect
1172	@parunselect
1173	@paroneline
1174	@paronefield
1175
1176	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1177	Maximum edit distance (default='1').
1178
1179	@c @item @b{@minus{}@minus{}replace, @minus{}r}
1180	@c Replace original form with corrected form, place original form in the
1181	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1182
1183
1184	@end table
1185
1186	@node cor dictionaries
1187	@subsection Dictionaries
1188
1189	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
1190	The fsa format is created by compiling text-format dictionaries.
1191
1192	@subsubheading Text format
1193
1194	The @command{cor} dictionary is a list of words:
1195	@example
1196	odlot
1197	odlotowy
1198	odludek
1199	@end example
1200
1201	@subsubheading Binary format
1202
1203	The mandatory file name extension for a binary dictionary is @code{bin}. To
1204	compile a text dictionary into binary format, write:
1205
1206	@example
1207	compiledic <dictionaryname>.dic
1208	@end example
1209
1210	@c ---------------------------------------------------------------------
1211	@c KOR
1212	@c ---------------------------------------------------------------------
1213
1214	@page
1215	@node kor
1216	@section kor - configurable spelling corrector
1217
1218	[TODO]
1219
1220	@c ---------------------------------------------------------------------
1221	@c SEN
1222	@c ---------------------------------------------------------------------
1223
1224	@page
1225	@node sen
1226	@section sen - a sentensizer
1227
1228	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1229
1230	@item @strong{Authors:} @tab Tomasz Obrębski
1231	@item @strong{Component category:} @tab filter
1232	@item @strong{Input format:} @tab UTT regular
1233	@item @strong{Output format:} @tab UTT regular
1234	@item @strong{Required annotation:} @tab tok
1235
1236	@end multitable
1237
1238
1239	@menu
1240	* sen description::
1241	@c * sen input::
1242	@c * sen output::
1243	* sen example::
1244	@end menu
1245
1246	@node sen description
1247	@subsection Description
1248
1249	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
1250
1251	@node sen example
1252	@subsection Example
1253
1254	@example
1255	command: sen
1256
1257	input:
1258	0000 05 W Cześć
1259	0005 01 P !
1260	0006 01 S _
1261	0007 02 W To
1262	0009 01 S _
1263	0010 02 W ja
1264	0012 01 P .
1265	0013 01 S \n
1266
1267	output:
1268	0000 00 BOS *
1269	0000 05 W Cześć
1270	0005 01 P !
1271	0006 00 EOS *
1272	0006 00 BOS *
1273	0006 01 S _
1274	0007 02 W To
1275	0009 01 S _
1276	0010 02 W ja
1277	0012 01 P .
1278	0013 01 S \n
1279	0014 00 EOS *
1280	@end example
1281
1282
1283	@c ---------------------------------------------------------------------
1284	@c GPH
1285	@c ---------------------------------------------------------------------
1286
1287	@c @node gph - graphizer
1288	@c @chapter gph - graphizer
1289
1290	@c Authors: Tomasz Obrębski
1291
1292
1293
1294	@c ---------------------------------------------------------------------
1295	@c SER
1296	@c ---------------------------------------------------------------------
1297
1298	@page
1299	@node ser
1300	@section ser - pattern search tool
1301
1302	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1303	@item @strong{Authors:} @tab Tomasz Obrębski
1304	@item @strong{Component category:} @tab filter
1305	@item @strong{Input format:} @tab UTT regular
1306	@item @strong{Output format:} @tab UTT regular
1307	@item @strong{Required annotation:} @tab tok, lem --one-field
1308	@end multitable
1309
1310	@menu
1311	* ser description::
1312	* ser command line options::
1313	* ser pattern::
1314	* ser how ser works::
1315	* ser customization::
1316	* ser limitations::
1317	* ser requirements::
1318	@end menu
1319
1320
1321	@node ser description
1322	@subsection Description
1323
1324	@command{ser} looks for patterns in UTT-formatted texts.
1325
1326
1327	@c ---------------------------------------------------------------------
1328	@node ser command line options
1329	@subsection Command line options
1330
1331	@table @code
1332
1333	@parhelp
1334	@parversion
1335	@c @parfile
1336	@c @paroutput
1337	@c @parinputfield
1338	@c @paroutputfield
1339	@parprocess
1340	@parinteractive
1341
1342	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1343	The search pattern.
1344
1345	@item @b{@minus{}@minus{}morph=@var{field}}
1346	The name of the annotation field containing the morphological
1347	description (default @code{lem}).
1348
1349	@item @b{@minus{}@minus{}flex}
1350	Only print the generated flex source code.
1351
1352	@item @b{@minus{}@minus{}macro=@var{filename}}
1353	Read macro definitions from file @var{filename} rather than from the
1354	default location. This option makes it possible to redefine the set of terms.
1355
1356	@item @b{@minus{}@minus{}define=@var{filename}}
1357	Append macro definitions from file @var{filename}. This option
1358	makes it possible to extend the set of terms.
1359
1360	@end table
1361
1362
1363	@c ---------------------------------------------------------------------
1364	@node ser pattern
1365	@subsection Pattern
1366
1367	The @command{ser} pattern is a regular expression over terms corresponding
1368	to text segments or segment sequences. Predefined terms are:
1369
1370	@table @code
1371
1372	@item seg(@var{t},@var{f},@var{a})
1373	a segment of type @var{t}, containing form @var{f} and annotation
1374	@var{a}
1375
1376	@item form(@var{f})
1377	a segment containing form @var{f}
1378
1379	@item field(@var{f})
1380	a segment containing annotation field @var{f}
1381
1382	@item space(@var{f})
1383	a space segment of form @var{f}
1384
1385	@item word(@var{f})
1386	a word segment of form @var{f}
1387
1388	@item punct(@var{f})
1389	a punct segment of form @var{f}
1390
1391	@item number(@var{f})
1392	a number segment of form @var{f}
1393
1394	@item lexeme(@var{f})
1395	a word segment with lemma @var{f}
1396
1397	@item cat(@var{c})
1398	a word segment of category @var{c}
1399
1400	@end table
1401
1402	All arguments are optional. If an argument is omitted, an arbitrary
1403	string of non-blank characters is assumed as the argument value. Term
1404	arguments may be arbitrary character-level regular expressions. The
1405	following special symbols can be used:
1406
1407	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1408	@item @code{[@dots{}]} @tab a character class
1409	@item @code{[^@dots{}]} @tab a negated character class
1410	@item @code{|} @tab alternative
1411	@item @code{*} @tab repetition, including zero times
1412	@item @code{+} @tab repetition, at least one time
1413	@item @code{?} @tab optionality
1414	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
1415	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
1416	@item @code{@{@var{m}@}} @tab repetition @var{m} times
1417	@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
1418	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
1419	@item @code{( )} @tab parentheses, used to override precedence
1420	@c @end multitable
1421
1422	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1423	@item @code{.} @tab a non-blank character
1424	@item @code{\w} @tab a letter
1425	@item @code{\W} @tab a non-blank character other than a letter
1426	@item @code{\d} @tab a digit
1427	@item @code{\D} @tab a non-blank character other than a digit
1428	@item @code{\s} @tab a space or tab character
1429	@item @code{\S} @tab a non-blank character (the same as @code{.})
1430	@item @code{\l} @tab a lowercase letter
1431	@item @code{\L} @tab an uppercase letter
1432	@end multitable
1433
1434
1435	@noindent The following characters:
1436	@example
1437	@verb{% [ ] ^ | * + ? { } , . < > \ %}
1438	@end example
1439	must be escaped with a backslash, i.e. written as:
1440	@example
1441	@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
1442	@end example
1443
1444	@quotation Note
1445	The special symbols are borrowed from Perl with minor modifications,
1446	made for convenience. As a result, the meaning of certain special
1447	characters and sequences differs slightly from their common meaning.
1448	In particular, the meaning of the @code{.} special character is
1449	modified because of the special function of spaces in UTT files
1450	(they are field separators). Use @code{\s} to match a space or tab
1451	character explicitly.
1452	@end quotation
1453
1454	In the argument of the @code{cat} term, a special operator <...> may be
1455	used. A category specification enclosed in angle brackets matches all
1456	category descriptions which are consistent (non-contradictory) with the
1457	specification. For example, @code{<N>} matches all noun descriptions, and
1458	@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
1459
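The notion of consistency used by the <...> operator can be sketched in Python. This is one reading of ``non-contradictory'', not the code generated by @command{ser}: the function names are hypothetical, only single-character attribute values are handled, an attribute absent from the description is assumed never to contradict the specification, and the tag @code{ADJ/CgDpGiNs} is constructed for the example.

```python
import re

# One attribute: an uppercase name followed by its one-character values.
ATTRIBUTE = re.compile(r"([A-Z]+)([a-z0-9?!*+-]+)")


def parse_tag(tag):
    """Split a tag such as 'ADJ/CaDpGiNs' into its POS and attribute sets."""
    pos, _, rest = tag.partition("/")
    attributes = dict()
    for name, values in ATTRIBUTE.findall(rest):
        attributes[name] = set(values)
    return pos, attributes


def consistent(specification, description):
    """True if the description does not contradict the specification."""
    spec_pos, spec_attrs = parse_tag(specification)
    pos, attrs = parse_tag(description)
    if spec_pos != pos:
        return False
    for name, wanted in spec_attrs.items():
        # Contradiction: both sides name the attribute but share no value.
        if name in attrs and not (wanted & attrs[name]):
            return False
    return True


print(consistent("ADJ/Can", "ADJ/CaDpGiNs"))  # True
print(consistent("ADJ/Can", "ADJ/CgDpGiNs"))  # False
print(consistent("N", "N/GfNsCa"))            # True
```

The specification is given here without the angle brackets, which only mark the operator in the pattern syntax.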
1460
1461	@*
1462	@noindent @b{Examples of one-segment patterns:}
1463
1464	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1465	@item @code{seg} @tab any segment
1466	@item @code{word} @tab any word-form
1467	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
1468	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
1469	@item @code{word(\L\l+)} @tab a capitalized word-form
1470	@item @code{punct} @tab a punctuation character
1471	@item @code{space(.\\n.)} @tab a space segment containing a newline character
1472	@item @code{lexeme(pomoc)} @tab any form of the lexeme @samp{pomoc}
1473	@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
1474	@item @code{cat(<N/Ca>)} @tab a word whose category matches @code{N/Ca}
1475	@end multitable
1476
1477	@*
1478	@noindent @b{Examples of multi-segment patterns:}
1479
1480	@table @code
1481
1482	@item (word(\L) punct(\.) space?)+ word(\L\l+)
1483	a sequence of initials followed by a surname
1484
1485	@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
1486	a text fragment between two punctuation characters, containing an
1487	occurrence of a relative pronoun
1488
1489	@end table
1490
1491
1492	@node ser how ser works
1493	@subsection How ser works
1494
1495	@node ser customization
1496	@subsection Customization
1497
1498	@c All predefined terms correspond to single segments,
1499
1500	@example
1501	define(`verbseq', `(cat(<V>) (space cat(<V>)))')
1502	@end example
1503
1504
1505	the term @code{cat()} may not be used as a ... of
1506
1507	@c See @command{m4} manual for further details on macro definition format.
1508
1509	@node ser limitations
1510	@subsection Limitations
1511
1512	Do not use more than 3 attributes in <>.
1513
1514	@node ser requirements
1515	@subsection Requirements
1516
1517	In order to run @command{ser}, the following programs must be
1518	installed in the system:
1519
1520	@itemize
1521
1522	@item @command{m4}
1523	@item @command{grep}
1524	@item @command{flex}
1525	@item @command{gcc}
1526
1527	@end itemize
1528
1529
1530	@c ---------------------------------------------------------------------
1531	@c GRP
1532	@c ---------------------------------------------------------------------
1533
1534	@page
1535	@node grp
1536	@section grp - pattern search tool
1537
1538	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1539	@item @strong{Authors:} @tab Tomasz Obrębski
1540	@item @strong{Component category:} @tab filter
1541	@item @strong{Input format:} @tab UTT flattened
1542	@item @strong{Output format:} @tab UTT flattened
1543	@item @strong{Required annotation:} @tab tok, sen, lem --one-field
1544	@end multitable
1545
1546
1547	@menu
1548	* grp description::
1549	* grp command line options::
1550	* grp pattern::
1551	* grp hints::
1552	@end menu
1553
1554
1555	@node grp description
1556	@subsection Description
1557
1558	@command{grp} selects sentences containing an expression matching a
1559	pattern. The pattern format is exactly the same as that accepted by
1560	@command{ser}.
1561
1562	@command{grp} is intended mainly for speeding up the corpus search process.
1563	It is extremely fast (processing speed is usually higher than the speed
1564	of reading the corpus file from disk).
1565
1566	@node grp command line options
1567	@subsection Command line options
1568
1569	@table @code
1570
1571	@parhelp
1572	@parversion
1573	@parprocess
1574	@parinteractive
1575
1576	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1577	The search pattern.
1578
1579	@item @b{@minus{}@minus{}morph=@var{field}}
1580	The name of the annotation field containing the morphological
1581	description (default @code{lem}).
1582
1583	@item @b{@minus{}@minus{}command}
1584	Only print the generated flex source code.
1585
1586	@item @b{@minus{}@minus{}macro=@var{filename}}
1587	Read macro definitions from file @var{filename} rather than from the
1588	default location. This option makes it possible to redefine the set of terms.
1589
1590	@item @b{@minus{}@minus{}define=@var{filename}}
1591	Append macro definitions from file @var{filename}. This option
1592	makes it possible to extend the set of terms.
1593
1594	@end table
1595
1596
1597	@node grp pattern
1598	@subsection Pattern
1599
1600	The pattern format is the same as for @command{ser} (@pxref{ser pattern}).
1601
1602	@node grp hints
1603	@subsection Hints
1604
1605	The corpus search speed may be increased by combining @command{grp} with the @command{lzop}
1606	compression tool (@command{grp} usually processes data faster than it is read from
1607	disk, especially for slow laptop drives).
1608
1609	@example
1610	cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
1611	@end example
1612
1613	@example
1614	lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
1615	@end example
1616
1617
1618
1619	@c ---------------------------------------------------------------------
1620	@c MAR
1621	@c ---------------------------------------------------------------------
1622
1623	@page
1624	@node mar
1625	@section mar
1626
1627	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1628	@item @strong{Authors:} @tab Marcin Walas, Tomasz Obrębski
1629	@item @strong{Input format:} @tab UTT flattened
1630	@item @strong{Output format:} @tab UTT flattened
1631	@item @strong{Required annotation:} @tab tok, sen, lem -1
1632	@end multitable
1633
1634	[TODO]
1635
1636	(see mar's help 'mar -h' for some information)
1637
1638	@c ---------------------------------------------------------------------
1639	@c KOT
1640	@c ---------------------------------------------------------------------
1641
1642
1643	@page
1644	@node kot
1645	@section kot - untokenizer
1646
1647	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1648	@item @strong{Authors:} @tab Tomasz Obrębski
1649	@item @strong{Component category:} @tab filter
1650	@item @strong{Input format:} @tab UTT regular
1651	@item @strong{Output format:} @tab text
1652	@item @strong{Required annotation:} @tab tok
1653	@end multitable
1654
1655
1656	@menu
1657	* kot description::
1658	* kot command line options::
1659	* kot usage examples::
1660	@end menu
1661
1662	@node kot description
1663	@subsection Description
1664
1665	@command{kot} transforms a UTT-formatted file back into raw text format.
1666
1667	@node kot command line options
1668	@subsection Command line options
1669
1670	@table @code
1671
1672	@parhelp
1673
1674	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1675
1676	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1677
1678	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1679
1680	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1681
1682	@c @item @b{@minus{}@minus{}config=@var{filename}}
1683
1684	@c @item
1685
1686	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
1687	print @var{string} between nonadjacent segments of the input file
1688
1689	@item @b{@minus{}@minus{}spaces, @minus{}r}
1690	retain the special characters @code{_}, @code{\t},
1691	@code{\n}, @code{\r}, @code{\f} unexpanded in the output
1692
1693	@end table
1694
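The effect of @command{kot} and of the two options above can be approximated with the following Python sketch. It is not the @command{kot} source: the function names are hypothetical, the field layout (start offset, length, type, form) is inferred from the examples in this manual, and expanding the special characters only in space (@code{S}) segments is an assumption.

```python
# Escaped notation used in space segments and the characters it stands for.
SPECIAL = (("\\n", "\n"), ("\\t", "\t"), ("\\r", "\r"), ("\\f", "\f"),
           ("_", " "))


def expand(form):
    """Turn the escaped form of a space segment back into real characters."""
    for escaped, real in SPECIAL:
        form = form.replace(escaped, real)
    return form


def untokenize(lines, gap_fill="", spaces=False):
    """Concatenate segment forms into raw text."""
    pieces = []
    expected = None  # offset at which the next segment should begin
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete segment line
        start, length = int(fields[0]), int(fields[1])
        kind, form = fields[2], fields[3]
        if length == 0:
            continue  # zero-length markers such as BOS/EOS carry no text
        if expected is not None and start > expected:
            pieces.append(gap_fill)  # nonadjacent segments (--gap-fill)
        if kind == "S" and not spaces:
            form = expand(form)      # --spaces keeps the escaped notation
        pieces.append(form)
        expected = start + length
    return "".join(pieces)


segments = ["0000 05 W Cześć", "0005 01 P !", "0006 01 S _", "0007 02 W To",
            "0009 01 S _", "0010 02 W ja", "0012 01 P .", "0013 01 S \\n"]
print(repr(untokenize(segments)))
```

Annotation fields after the form are simply ignored, which matches the fact that @command{kot} also accepts lemmatized input.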
1695	@node kot usage examples
1696	@subsection Usage examples
1697
1698	@example
1699	cat legia.txt | tok | kot
1700	@end example
1701
1702	@example
1703	cat legia.txt | tok | lem -1 | kot
1704	@end example
1705
1706	@c ---------------------------------------------------------------
1707	@c CON
1708	@c ---------------------------------------------------------------
1709
1710
1711	@page
1712	@node con
1713	@section con - concordance table generator
1714
1715	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1716	@item @strong{Authors:} @tab Justyna Walkowska
1717	@item @strong{Component category:} @tab sink
1718	@item @strong{Input format:} @tab UTT regular
1719	@item @strong{Output format:} @tab text
1720	@item @strong{Required annotation:} @tab ser or mar
1721	@end multitable
1722	@c
1723
1724	@menu
1725	* con description::
1726	* con command line options::
1727	* con usage example::
1728	* con hints::
1729	@end menu
1730
1731
1732	@node con description
1733	@subsection Description
1734
1735	@command{con} generates a concordance table based on a pattern given to @command{ser}.
1736
1737
1738	@node con command line options
1739	@subsection Command line options
1740
1741	@table @code
1742
1743	@parhelp
1744
1745	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
1746	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1747	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1748	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1749	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
1750	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
1751	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
1752	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
1753	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
1754	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1755	@c @item @b{@minus{}@minus{}config=@var{filename}}
1756	@c @item
1757	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1758	@c search pattern
1759	@c
1760	@c @item @b{@minus{}@minus{}flex}
1761	@c only print the generated flex source code
1762	@c
1763	@c @item @b{@minus{}@minus{}macro=@var{filename}}
1764	@c read macrodefinitions from file @var{filename} rather than from
1765	@c default location. This option allows to redefine the set of terms.
1766	@c
1767	@c @item @b{@minus{}@minus{}define=@var{filename}}
1768	@c append macrodefinitions from file @var{filename}. This option
1769	@c allows to extend the set of terms.
1770
1771	@item @b{@minus{}@minus{}left @minus{}l}
1772	Left context info (default='30c'). Example:
1773	@example
1774	-l=5c: left context is 5 characters
1775	-l=5w: left context is 5 words
1776	-l=5s: left context is 5 non-empty input lines
1777	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
1778	@end example
1779
1780	@item @b{@minus{}@minus{}right @minus{}r}
1781	Right context info (default='30c').
1782	@item @b{@minus{}@minus{}trim @minus{}t}
1783	Clear incomplete words from output.
1784	@item @b{@minus{}@minus{}white @minus{}w}
1785	Do not convert whitespace characters into spaces.
1786	@item @b{@minus{}@minus{}column @minus{}c}
1787	Left column minimal width in characters (default = 0).
1788	@item @b{@minus{}@minus{}ignore @minus{}i}
1789	Ignore segment inconsistency in the input.
1790	@item @b{@minus{}@minus{}bom}
1791	Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
1792	@item @b{@minus{}@minus{}eom}
1793	End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
1794	@item @b{@minus{}@minus{}bod}
1795	Selected segment beginning display string (default='[').
1796	@item @b{@minus{}@minus{}eod}
1797	Selected segment end display string (default=']').
1798
1799
1800
1801	@end table
1802
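How the context and display options shape one line of the concordance table can be illustrated with a short Python sketch. It covers only contexts measured in characters (the @code{c} unit) and makes several assumptions that are not confirmed by this manual: the function names are hypothetical, the match is given by character offsets into plain text, and the minimal column width is applied by right-aligning the left context.

```python
def flatten(fragment):
    """Replace newlines and tabs with spaces (what --white would disable)."""
    return fragment.replace("\n", " ").replace("\t", " ")


def concordance_line(text, start, end, left=30, right=30,
                     bod="[", eod="]", column=0):
    """Format one concordance line for the match text[start:end]."""
    left_context = flatten(text[max(0, start - left):start])
    right_context = flatten(text[end:end + right])
    # Pad the left context so that all matches line up in one column.
    return left_context.rjust(column) + bod + text[start:end] + eod + right_context


text = "To jest nasz dom.\nTen dom jest stary."
start = text.index("dom")
print(concordance_line(text, start, start + 3, left=5, right=5))
# nasz [dom]. Ten
```

The real program works on UTT segments delimited by the @code{BOM} and @code{EOM} markers rather than on character offsets.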
1803	@node con usage example
1804	@subsection Usage example
1805	@example
1806	cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
1807	@end example
1808
1809
1810	@node con hints
1811	@subsection Hints
1812
1813	@command{con} is a rather slow program. Do not pass large amounts of
1814	redundant text through this program. @command{con} works fine in the following
1815	sequence:
1816
1817	@example
1818	... | grp -e EXPR | ser -e EXPR | con
1819	@end example
1820
1821
1822	@c ---------------------------------------------------------------------
1823	@c ---------------------------------------------------------------------
1824
1825	@page
1826	@node Auxiliary tools
1827	@chapter Auxiliary tools
1828
1829	@menu
1830	* compiledic:: dictionary compiler
1831	* fla:: UTT file flattener
1832	* unfla:: UTT file unflattener
1833	@end menu
1834
1835
1836	@page
1837	@node compiledic
1838	@section compiledic - the dictionary compiler
1839
1840	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1841	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
1842	@item @strong{Component category:} @tab additional tool
1843	@end multitable
1844	@c
1845
1846	@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
1847	(FSA) format (@code{.bin} extension).
1848
1849	Automaton representation of a dictionary is built using the AT&T tools:
1850	@itemize
1851	@item AT&T FSM Library,
1852	@item AT&T Lextools.
1853	@end itemize
1854
1855	For @command{compiledic} to work, the above-mentioned packages must
1856	be installed on your system. They are freely available
1857	for non-commercial use.
1858
1859	Usage:
1860	@example
1861	compiledic <dictionaryname>.dic
1862	@end example
1863
1864	The file <dictionaryname>.bin will be generated.
1865
1866	Note: The program produces a lot of temporary files, which are
1867	stored in the current directory. They are deleted after successful
1868	termination of the program.
1869
1870	@c @menu
1871	@c * con command line options::
1872	@c * con usage example::
1873	@c * con hints::
1874	@c @end menu
1875
1876
1877	@c -------------------------------------------------------------------------------
1878	@c FLA
1879	@c -------------------------------------------------------------------------------
1880
1881	@page
1882	@node fla
1883	@section fla - the UTT file flattener
1884
1885	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1886	@item @strong{Authors:} @tab Tomasz Obrębski
1887	@item @strong{Input format:} @tab UTT regular
1888	@item @strong{Output format:} @tab UTT flattened
1889	@item @strong{Required annotation:} @tab sen
1890	@end multitable
1891	@c
1892
1893	@menu
1894	* fla description::
1895	@c * fla command line options::
1896	@c * fla usage example::
1897	@end menu
1898
1899
1900	@node fla description
1901	@subsection Description
1902
1903	@command{fla} ``flattens'' a UTT file by merging the segments belonging
1904	to one sentence into one line. Technically, end-of-line characters
1905	('\n', ASCII code 10) within a sentence are replaced with form-feed characters ('\f',
1906	ASCII code 12). The flattening makes it possible to process UTT files
1907	with such tools as @command{grep} or @command{sed} sentence by
1908	sentence (used in @command{grp} and @command{mar}).
1909
1910	Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.
1911
1912	Flattened files are still human-readable.
1913
1914	Usage:
1915
1916	@example
1917	fla [<bosregex>]
1918	@end example
1919
1920	The optional argument is a regular expression describing segments
1921	which should be treated as sentence beginnings (the test is: the
1922	segment contains a fragment matching the @code{<bosregex>}). By
1923	default, segments containing a @code{BOS} field are searched for.
1924
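The flattening described above, together with its inverse performed by @command{unfla}, can be sketched in a few lines of Python. The sketch works on lists of lines instead of streams, the function names merely echo the tool names, and using the plain string @code{BOS} as the default regular expression is an assumption.

```python
import re


def fla(lines, bos_regex="BOS"):
    """Join the segments of each sentence into one form-feed separated line."""
    sentences = []
    for line in lines:
        if not sentences or re.search(bos_regex, line):
            sentences.append([line])    # a sentence beginning opens a new line
        else:
            sentences[-1].append(line)  # other segments join the current one
    return ["\f".join(sentence) for sentence in sentences]


def unfla(flattened):
    """Restore one segment per line by splitting at the form feeds."""
    return [segment for line in flattened for segment in line.split("\f")]


segments = ["0000 00 BOS *", "0000 05 W Cześć", "0005 01 P !", "0006 00 EOS *",
            "0006 00 BOS *", "0006 01 S _", "0007 02 W To"]
flat = fla(segments)
print(len(flat))                # 2
print(unfla(flat) == segments)  # True
```

Because every sentence now occupies exactly one line, a line-oriented tool such as @command{grep} selects or rejects whole sentences.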
1925	@c -------------------------------------------------------------------------------
1926	@c UNFLA
1927	@c -------------------------------------------------------------------------------
1928
1929	@page
1930	@node unfla
1931	@section unfla - the UTT file unflattener
1932
1933	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1934	@item @strong{Authors:} @tab Tomasz Obrębski
1935	@item @strong{Input format:} @tab UTT flattened
1936	@item @strong{Output format:} @tab UTT regular
1937	@item @strong{Required annotation:} @tab -
1938	@end multitable
1939
1940	@menu
1941	* unfla description::
1942	@c * fla command line options::
1943	@c * fla usage example::
1944	@end menu
1945
1946	@node unfla description
1947	@subsection Description
1948	@command{unfla} transforms a flattened UTT file, produced by
1949	@command{fla}, into the regular format by restoring end-of-line
1950	characters.
1951
1952
1953
1954
1955	@c ---------------------------------------------------------------------
1956	@c USAGE EXAMPLES
1957	@c ---------------------------------------------------------------------
1958
1959	@node Usage examples
1960	@chapter Usage examples
1961
1962	@subsubheading Simple pipelines
1963
1964	@enumerate
1965
1966	@item tokenization
1967
1968	cat text | tok > output1
1969
1970	@item morphological annotation (1)
1971
1972	simple dictionary-based lemmatization
1973
1974	cat text | tok | lem > output1
1975
1976	@item morphological annotation (2)
1977
1978	1) perform dictionary-based lemmatization
1979	2) guess descriptions for words which have no annotation
1980
1981	@example
1982	cat text | tok | lem | gue -S lem > output2
1983	@end example
1984
1985	@item morphological annotation (3)
1986
1987	1) perform dictionary-based lemmatization
1988	2) try to correct words with no annotation
1989	3) perform dictionary-based lemmatization of corrected words
1990	4) guess descriptions for words which still have no annotation
1991
1992	@example
1993	cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
1994	@end example
1995	@item spelling correction
1996
1997
1998
1999	@example
2000	cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
2001	@end example
2002
2003	@item Expression extraction
2004
2005	Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
2006
2007	@example
2008	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
2009	@end example
2010
2011	@item A word in context
2012
2013	Extraction of text fragments containing a form of the lexeme 'rozmowa' in
2014	the context of 5 preceding and 5 following corpus segments.
2015
2016	@example
2017	cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
2018	@end example
2019
2020	@item generation of concordance table (1)
2021
2022	@example
2023	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2024	@end example
2025
2026	Running time: 10 seconds.
2027
2028	@item generation of concordance table (2)
2029
2030	The same as above, but much faster.
2031
2032	@example
2033	cat text | tok | lem -1 | \
2034	grp -e 'cat(<V>) space lexeme(rozmowa)' | \
2035	ser -e 'cat(<V>) space lexeme(rozmowa)' | \
2036	con
2037	@end example
2038
2039	Running time: 2 seconds.
2040
2041	@item generation of concordance table (3)
2042
2043	Usually, one searches the same corpus repeatedly. In such a case
2044	it is advisable to transform the corpus data into the format
2045	required by @command{grp} first, and then use the preprocessed data.
2046
2047	As @command{grp} (@command{grep}) processes data faster than it is
2048	read from the disk drive, the search time may be shortened further by
2049	using file compression techniques. We suggest using the
2050	@command{lzop} compressor/decompressor.
2051
2052	@item the fastest way to search a large corpus
2053
2054	step 1: corpus preprocessing
2055
2056	@example
2057	cat corpus | tok | sen | lem -1 \
2058	| fla | lzop -7 > corpus.grp.lzo
2059	@end example
2060
2061	step 2: search
2062
2063	@example
2064	lzop -cd corpus.grp.lzo | grp -e 'cat(<V>) space lexeme(rozmowa)' \
2065	| unfla | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2066	@end example
2067
2068	@end enumerate
2069
2070	@c @subsubheading More complicated configurations
2071
2072
2073	@c @example
2074	@c mknod fifo1 p
2075	@c mknod fifo2 p
2076	@c mknod fifo3 p
2077	@c mknod fifo4 p
2078	@c mknod fifo5 p
2079
2080	@c tok \| lem -p W -e fifo1 > fifo2 &
2081	@c cor -e fifo3 < fifo1 \| lem > fifo4 &
2082	@c gue < fifo3 > fifo5 &
2083	@c sort -m fifo2 fifo4 fifo5
2084
2085	@c rm fifo?
2086	@c @end example
2087
2088
2089	@c ---------------------------------------------------------------------
2090	@c ---------------------------------------------------------------------
2091
2092	@c ---------------------------------------------------------------------
2093	@c PMDBF DICTIONARY
2094	@c ---------------------------------------------------------------------
2095
2096	@node PMDBF dictionary
2097	@chapter PMDBF dictionary
2098
2099	UTT components come with lexical data derived from the Polish
2100	Morphological Database (PMDB).
2101
2102	@menu
2103	* PMDBF files::
2104	* PMDBF tag structure::
2105	* PMDBF parts of speech::
2106	* PMDBF morphosyntactic attributes::
2107	@end menu
2108
2109	@node PMDBF files
2110	@section Files
2111
2112	@node PMDBF tag structure
2113	@section Tag structure
2114
2115	pos = [[:upper:]]+
2116
2117	attr = [[:upper:]]+
2118
2119	val = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
2120
2121	descr = pos ( / ( attr val + ) + ) ?
2122
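The rules above can be checked mechanically. The following is a
minimal sketch, not part of UTT: it transcribes the @code{descr} rule
into a POSIX extended regular expression and tests a few tags with
@command{grep}. The variable name @code{PMDBF_DESCR} and the sample
tags are assumptions made for illustration (the tags are composed from
the parts of speech and attributes listed in this chapter); the
@code{\n} exclusion of the @code{val} rule is dropped because
@command{grep} matches line by line.

@example
# descr = pos ( / ( attr val + ) + ) ?  as an extended regular expression.
# The hyphen comes last in the bracket expression, so it is literal.
PMDBF_DESCR='^[[:upper:]]+(/([[:upper:]]+([[:lower:][:digit:]?!*+-]|<[^>]+>)+)+)?$'

# Hypothetical sample tags; the last one has a lower-case part of speech.
for tag in N V/AiVb ADJ/DpNsCnGf n/Cn
do
    if printf '%s\n' "$tag" | LC_ALL=C grep -Eq "$PMDBF_DESCR"
    then echo "$tag: well-formed"
    else echo "$tag: malformed"
    fi
done
@end example

Under these assumptions only the last tag, whose part of speech is
written in lower case, is reported as malformed.
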
2123	@node PMDBF parts of speech
2124	@section Parts of speech
2125
2126	@multitable {ADJPRP} { adjectival-passive-participle }
2127	@item @code{N} @tab noun
2128	@item @code{NPRO} @tab nominal-pronoun
2129	@item @code{NV} @tab deverbal-noun
2130	@item @code{V} @tab verb
2131	@item @code{BYC} @tab byc
2132	@item @code{VNI} @tab non-inflected-verb
2133	@item @code{ADJ} @tab adjective
2134	@item @code{ADJPAP} @tab adjectival-passive-participle
2135	@item @code{ADJPRP} @tab adjectival-present-participle
2136	@item @code{ADJPP} @tab adjectival-past-participle
2137	@item @code{ADJPRO} @tab adjectival-pronoun
2138	@item @code{ADJNUM} @tab adjectival-numeral
2139	@item @code{ADV} @tab adverb
2140	@item @code{ADVANP} @tab adverbial-anterior-participle
2141	@item @code{ADVPRP} @tab adverbial-present-participle
2142	@item @code{ADVPRO} @tab adverbial-pronoun
2143	@item @code{ADVNUM} @tab adverbial-numeral
2144	@item @code{P} @tab preposition
2145	@item @code{PPRO} @tab prep-noun-pronoun
2146	@item @code{CONJ} @tab conjunction
2147	@item @code{EXCL} @tab exclamation
2148	@item @code{APP} @tab call
2149	@item @code{ONO} @tab onomatopoeia
2150	@item @code{PART} @tab particle
2151	@item @code{NUMCRD} @tab cardinal-numeral
2152	@item @code{NUMCOL} @tab collective-numeral
2153	@item @code{NUMPAR} @tab partitive-numeral
2154	@item @code{NUMORD} @tab ordinal-numeral
2155	@end multitable
2156
2157	@node PMDBF morphosyntactic attributes
2158	@section Morphosyntactic attributes
2159
2160	@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2161	@c @headitem Attr @tab Val @tab Description
2162	@item
2163	@code{A} @tab @tab Aspect
2164	@item
2165	@tab @code{p} @tab perfect
2166	@item
2167	@tab @code{i} @tab imperfect.
2168	@item
2169	@item
2170	@code{V} @tab @tab Verb-Form
2171	@item
2172	@tab @code{b} @tab infinitive,
2173	@item
2174	@tab @code{p} @tab personal,
2175	@item
2176	@tab @code{i} @tab impersonal.
2177	@item
2178	@item
2179	@code{M} @tab @tab Mood
2180	@item
2181	@tab @code{d} @tab declarative,
2182	@item
2183	@tab @code{c} @tab conditional,
2184	@item
2185	@tab @code{i} @tab imperative.
2186	@item
2187	@item
2188	@code{T} @tab @tab Tense
2189	@item
2190	@tab @code{a} @tab past,
2191	@item
2192	@tab @code{r} @tab present,
2193	@item
2194	@tab @code{f} @tab future.
2195	@item
2196	@item
2197	@code{P} @tab @tab Person
2198	@item
2199	@tab @code{1} @tab 1,
2200	@item
2201	@tab @code{2} @tab 2,
2202	@item
2203	@tab @code{3} @tab 3.
2204	@item
2205	@item
2206	@code{D} @tab @tab Degree
2207	@item
2208	@tab @code{p} @tab positive,
2209	@item
2210	@tab @code{c} @tab comparative,
2211	@item
2212	@tab @code{s} @tab superlative.
2213	@item
2214	@item
2215	@code{N} @tab @tab Number
2216	@item
2217	@tab @code{s} @tab singular,
2218	@item
2219	@tab @code{p} @tab plural.
2220	@item
2221	@item
2222	@code{C} @tab @tab Case
2223	@item
2224	@tab @code{n} @tab nominative,
2225	@item
2226	@tab @code{g} @tab genitive,
2227	@item
2228	@tab @code{d} @tab dative,
2229	@item
2230	@tab @code{a} @tab accusative,
2231	@item
2232	@tab @code{i} @tab instrumental,
2233	@item
2234	@tab @code{l} @tab locative,
2235	@item
2236	@tab @code{v} @tab vocative.
2237	@item
2238	@item
2239	@code{G} @tab @tab Gender
2240	@item
2241	@tab @code{p} @tab masculine-personal,
2242	@item
2243	@tab @code{a} @tab masculine-animal,
2244	@item
2245	@tab @code{i} @tab masculine-inanimate,
2246	@item
2247	@tab @code{f} @tab feminine,
2248	@item
2249	@tab @code{n} @tab neuter.
2250	@end multitable
2251
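Reading a tag against the table above amounts to splitting its
attribute part into attribute/value pairs and looking each pair up.
The following is a minimal sketch, not part of UTT. It assumes
single-character values, so values written in the @code{<...>} form
are not handled; the sample tag @code{N/NsCgGf} is hypothetical, only
a few pairs are covered, and the @option{-o} option of @command{grep}
is a common extension rather than part of POSIX.

@example
# Hypothetical tag: noun, singular, genitive, feminine.
descr='N/NsCgGf'

# Drop the part of speech, then print one attribute/value pair per line.
printf '%s\n' "$descr" | cut -d/ -f2- |
LC_ALL=C grep -Eo '[[:upper:]]+[^[:upper:]]+' |
while read -r pair
do
    case "$pair" in
        Ns) echo "Number: singular" ;;
        Np) echo "Number: plural" ;;
        Cn) echo "Case: nominative" ;;
        Cg) echo "Case: genitive" ;;
        Gf) echo "Gender: feminine" ;;
        *)  echo "$pair: not covered by this sketch" ;;
    esac
done
@end example

For the sample tag the loop prints the number, case and gender
readings, in that order.
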
2252
2253	@c ---------------------------------------------------------------------
2254	@c ---------------------------------------------------------------------
2255	@c
2256	@c @node Examples
2257	@c @chapter Examples
2258
2259	@c ----------------------------------------------------------------------
2260	@c ----------------------------------------------------------------------
2261
2262	@node GNU Free Documentation License
2263	@chapter GNU Free Documentation License
2264
2265	@c The GNU Free Documentation License.
2266	@center Version 1.2, November 2002
2267
2268	@c This file is intended to be included within another document,
2269	@c hence no sectioning command or @node.
2270
2271	@display
2272	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
2273	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
2274
2275	Everyone is permitted to copy and distribute verbatim copies
2276	of this license document, but changing it is not allowed.
2277	@end display
2278
2279	@enumerate 0
2280	@item
2281	PREAMBLE
2282
2283	The purpose of this License is to make a manual, textbook, or other
2284	functional and useful document @dfn{free} in the sense of freedom: to
2285	assure everyone the effective freedom to copy and redistribute it,
2286	with or without modifying it, either commercially or noncommercially.
2287	Secondarily, this License preserves for the author and publisher a way
2288	to get credit for their work, while not being considered responsible
2289	for modifications made by others.
2290
2291	This License is a kind of ``copyleft'', which means that derivative
2292	works of the document must themselves be free in the same sense. It
2293	complements the GNU General Public License, which is a copyleft
2294	license designed for free software.
2295
2296	We have designed this License in order to use it for manuals for free
2297	software, because free software needs free documentation: a free
2298	program should come with manuals providing the same freedoms that the
2299	software does. But this License is not limited to software manuals;
2300	it can be used for any textual work, regardless of subject matter or
2301	whether it is published as a printed book. We recommend this License
2302	principally for works whose purpose is instruction or reference.
2303
2304	@item
2305	APPLICABILITY AND DEFINITIONS
2306
2307	This License applies to any manual or other work, in any medium, that
2308	contains a notice placed by the copyright holder saying it can be
2309	distributed under the terms of this License. Such a notice grants a
2310	world-wide, royalty-free license, unlimited in duration, to use that
2311	work under the conditions stated herein. The ``Document'', below,
2312	refers to any such manual or work. Any member of the public is a
2313	licensee, and is addressed as ``you''. You accept the license if you
2314	copy, modify or distribute the work in a way requiring permission
2315	under copyright law.
2316
2317	A ``Modified Version'' of the Document means any work containing the
2318	Document or a portion of it, either copied verbatim, or with
2319	modifications and/or translated into another language.
2320
2321	A ``Secondary Section'' is a named appendix or a front-matter section
2322	of the Document that deals exclusively with the relationship of the
2323	publishers or authors of the Document to the Document's overall
2324	subject (or to related matters) and contains nothing that could fall
2325	directly within that overall subject. (Thus, if the Document is in
2326	part a textbook of mathematics, a Secondary Section may not explain
2327	any mathematics.) The relationship could be a matter of historical
2328	connection with the subject or with related matters, or of legal,
2329	commercial, philosophical, ethical or political position regarding
2330	them.
2331
2332	The ``Invariant Sections'' are certain Secondary Sections whose titles
2333	are designated, as being those of Invariant Sections, in the notice
2334	that says that the Document is released under this License. If a
2335	section does not fit the above definition of Secondary then it is not
2336	allowed to be designated as Invariant. The Document may contain zero
2337	Invariant Sections. If the Document does not identify any Invariant
2338	Sections then there are none.
2339
2340	The ``Cover Texts'' are certain short passages of text that are listed,
2341	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
2342	the Document is released under this License. A Front-Cover Text may
2343	be at most 5 words, and a Back-Cover Text may be at most 25 words.
2344
2345	A ``Transparent'' copy of the Document means a machine-readable copy,
2346	represented in a format whose specification is available to the
2347	general public, that is suitable for revising the document
2348	straightforwardly with generic text editors or (for images composed of
2349	pixels) generic paint programs or (for drawings) some widely available
2350	drawing editor, and that is suitable for input to text formatters or
2351	for automatic translation to a variety of formats suitable for input
2352	to text formatters. A copy made in an otherwise Transparent file
2353	format whose markup, or absence of markup, has been arranged to thwart
2354	or discourage subsequent modification by readers is not Transparent.
2355	An image format is not Transparent if used for any substantial amount
2356	of text. A copy that is not ``Transparent'' is called ``Opaque''.
2357
2358	Examples of suitable formats for Transparent copies include plain
2359	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
2360	format, @acronym{SGML} or @acronym{XML} using a publicly available
2361	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
2362	PostScript or @acronym{PDF} designed for human modification. Examples
2363	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
2364	@acronym{JPG}. Opaque formats include proprietary formats that can be
2365	read and edited only by proprietary word processors, @acronym{SGML} or
2366	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
2367	not generally available, and the machine-generated @acronym{HTML},
2368	PostScript or @acronym{PDF} produced by some word processors for
2369	output purposes only.
2370
2371	The ``Title Page'' means, for a printed book, the title page itself,
2372	plus such following pages as are needed to hold, legibly, the material
2373	this License requires to appear in the title page. For works in
2374	formats which do not have any title page as such, ``Title Page'' means
2375	the text near the most prominent appearance of the work's title,
2376	preceding the beginning of the body of the text.
2377
2378	A section ``Entitled XYZ'' means a named subunit of the Document whose
2379	title either is precisely XYZ or contains XYZ in parentheses following
2380	text that translates XYZ in another language. (Here XYZ stands for a
2381	specific section name mentioned below, such as ``Acknowledgements'',
2382	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
2383	of such a section when you modify the Document means that it remains a
2384	section ``Entitled XYZ'' according to this definition.
2385
2386	The Document may include Warranty Disclaimers next to the notice which
2387	states that this License applies to the Document. These Warranty
2388	Disclaimers are considered to be included by reference in this
2389	License, but only as regards disclaiming warranties: any other
2390	implication that these Warranty Disclaimers may have is void and has
2391	no effect on the meaning of this License.
2392
2393	@item
2394	VERBATIM COPYING
2395
2396	You may copy and distribute the Document in any medium, either
2397	commercially or noncommercially, provided that this License, the
2398	copyright notices, and the license notice saying this License applies
2399	to the Document are reproduced in all copies, and that you add no other
2400	conditions whatsoever to those of this License. You may not use
2401	technical measures to obstruct or control the reading or further
2402	copying of the copies you make or distribute. However, you may accept
2403	compensation in exchange for copies. If you distribute a large enough
2404	number of copies you must also follow the conditions in section 3.
2405
2406	You may also lend copies, under the same conditions stated above, and
2407	you may publicly display copies.
2408
2409	@item
2410	COPYING IN QUANTITY
2411
2412	If you publish printed copies (or copies in media that commonly have
2413	printed covers) of the Document, numbering more than 100, and the
2414	Document's license notice requires Cover Texts, you must enclose the
2415	copies in covers that carry, clearly and legibly, all these Cover
2416	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
2417	the back cover. Both covers must also clearly and legibly identify
2418	you as the publisher of these copies. The front cover must present
2419	the full title with all words of the title equally prominent and
2420	visible. You may add other material on the covers in addition.
2421	Copying with changes limited to the covers, as long as they preserve
2422	the title of the Document and satisfy these conditions, can be treated
2423	as verbatim copying in other respects.
2424
2425	If the required texts for either cover are too voluminous to fit
2426	legibly, you should put the first ones listed (as many as fit
2427	reasonably) on the actual cover, and continue the rest onto adjacent
2428	pages.
2429
2430	If you publish or distribute Opaque copies of the Document numbering
2431	more than 100, you must either include a machine-readable Transparent
2432	copy along with each Opaque copy, or state in or with each Opaque copy
2433	a computer-network location from which the general network-using
2434	public has access to download using public-standard network protocols
2435	a complete Transparent copy of the Document, free of added material.
2436	If you use the latter option, you must take reasonably prudent steps,
2437	when you begin distribution of Opaque copies in quantity, to ensure
2438	that this Transparent copy will remain thus accessible at the stated
2439	location until at least one year after the last time you distribute an
2440	Opaque copy (directly or through your agents or retailers) of that
2441	edition to the public.
2442
2443	It is requested, but not required, that you contact the authors of the
2444	Document well before redistributing any large number of copies, to give
2445	them a chance to provide you with an updated version of the Document.
2446
2447	@item
2448	MODIFICATIONS
2449
2450	You may copy and distribute a Modified Version of the Document under
2451	the conditions of sections 2 and 3 above, provided that you release
2452	the Modified Version under precisely this License, with the Modified
2453	Version filling the role of the Document, thus licensing distribution
2454	and modification of the Modified Version to whoever possesses a copy
2455	of it. In addition, you must do these things in the Modified Version:
2456
2457	@enumerate A
2458	@item
2459	Use in the Title Page (and on the covers, if any) a title distinct
2460	from that of the Document, and from those of previous versions
2461	(which should, if there were any, be listed in the History section
2462	of the Document). You may use the same title as a previous version
2463	if the original publisher of that version gives permission.
2464
2465	@item
2466	List on the Title Page, as authors, one or more persons or entities
2467	responsible for authorship of the modifications in the Modified
2468	Version, together with at least five of the principal authors of the
2469	Document (all of its principal authors, if it has fewer than five),
2470	unless they release you from this requirement.
2471
2472	@item
2473	State on the Title page the name of the publisher of the
2474	Modified Version, as the publisher.
2475
2476	@item
2477	Preserve all the copyright notices of the Document.
2478
2479	@item
2480	Add an appropriate copyright notice for your modifications
2481	adjacent to the other copyright notices.
2482
2483	@item
2484	Include, immediately after the copyright notices, a license notice
2485	giving the public permission to use the Modified Version under the
2486	terms of this License, in the form shown in the Addendum below.
2487
2488	@item
2489	Preserve in that license notice the full lists of Invariant Sections
2490	and required Cover Texts given in the Document's license notice.
2491
2492	@item
2493	Include an unaltered copy of this License.
2494
2495	@item
2496	Preserve the section Entitled ``History'', Preserve its Title, and add
2497	to it an item stating at least the title, year, new authors, and
2498	publisher of the Modified Version as given on the Title Page. If
2499	there is no section Entitled ``History'' in the Document, create one
2500	stating the title, year, authors, and publisher of the Document as
2501	given on its Title Page, then add an item describing the Modified
2502	Version as stated in the previous sentence.
2503
2504	@item
2505	Preserve the network location, if any, given in the Document for
2506	public access to a Transparent copy of the Document, and likewise
2507	the network locations given in the Document for previous versions
2508	it was based on. These may be placed in the ``History'' section.
2509	You may omit a network location for a work that was published at
2510	least four years before the Document itself, or if the original
2511	publisher of the version it refers to gives permission.
2512
2513	@item
2514	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
2515	the Title of the section, and preserve in the section all the
2516	substance and tone of each of the contributor acknowledgements and/or
2517	dedications given therein.
2518
2519	@item
2520	Preserve all the Invariant Sections of the Document,
2521	unaltered in their text and in their titles. Section numbers
2522	or the equivalent are not considered part of the section titles.
2523
2524	@item
2525	Delete any section Entitled ``Endorsements''. Such a section
2526	may not be included in the Modified Version.
2527
2528	@item
2529	Do not retitle any existing section to be Entitled ``Endorsements'' or
2530	to conflict in title with any Invariant Section.
2531
2532	@item
2533	Preserve any Warranty Disclaimers.
2534	@end enumerate
2535
2536	If the Modified Version includes new front-matter sections or
2537	appendices that qualify as Secondary Sections and contain no material
2538	copied from the Document, you may at your option designate some or all
2539	of these sections as invariant. To do this, add their titles to the
2540	list of Invariant Sections in the Modified Version's license notice.
2541	These titles must be distinct from any other section titles.
2542
2543	You may add a section Entitled ``Endorsements'', provided it contains
2544	nothing but endorsements of your Modified Version by various
2545	parties---for example, statements of peer review or that the text has
2546	been approved by an organization as the authoritative definition of a
2547	standard.
2548
2549	You may add a passage of up to five words as a Front-Cover Text, and a
2550	passage of up to 25 words as a Back-Cover Text, to the end of the list
2551	of Cover Texts in the Modified Version. Only one passage of
2552	Front-Cover Text and one of Back-Cover Text may be added by (or
2553	through arrangements made by) any one entity. If the Document already
2554	includes a cover text for the same cover, previously added by you or
2555	by arrangement made by the same entity you are acting on behalf of,
2556	you may not add another; but you may replace the old one, on explicit
2557	permission from the previous publisher that added the old one.
2558
2559	The author(s) and publisher(s) of the Document do not by this License
2560	give permission to use their names for publicity for or to assert or
2561	imply endorsement of any Modified Version.
2562
2563	@item
2564	COMBINING DOCUMENTS
2565
2566	You may combine the Document with other documents released under this
2567	License, under the terms defined in section 4 above for modified
2568	versions, provided that you include in the combination all of the
2569	Invariant Sections of all of the original documents, unmodified, and
2570	list them all as Invariant Sections of your combined work in its
2571	license notice, and that you preserve all their Warranty Disclaimers.
2572
2573	The combined work need only contain one copy of this License, and
2574	multiple identical Invariant Sections may be replaced with a single
2575	copy. If there are multiple Invariant Sections with the same name but
2576	different contents, make the title of each such section unique by
2577	adding at the end of it, in parentheses, the name of the original
2578	author or publisher of that section if known, or else a unique number.
2579	Make the same adjustment to the section titles in the list of
2580	Invariant Sections in the license notice of the combined work.
2581
2582	In the combination, you must combine any sections Entitled ``History''
2583	in the various original documents, forming one section Entitled
2584	``History''; likewise combine any sections Entitled ``Acknowledgements'',
2585	and any sections Entitled ``Dedications''. You must delete all
2586	sections Entitled ``Endorsements.''
2587
2588	@item
2589	COLLECTIONS OF DOCUMENTS
2590
2591	You may make a collection consisting of the Document and other documents
2592	released under this License, and replace the individual copies of this
2593	License in the various documents with a single copy that is included in
2594	the collection, provided that you follow the rules of this License for
2595	verbatim copying of each of the documents in all other respects.
2596
2597	You may extract a single document from such a collection, and distribute
2598	it individually under this License, provided you insert a copy of this
2599	License into the extracted document, and follow this License in all
2600	other respects regarding verbatim copying of that document.
2601
2602	@item
2603	AGGREGATION WITH INDEPENDENT WORKS
2604
2605	A compilation of the Document or its derivatives with other separate
2606	and independent documents or works, in or on a volume of a storage or
2607	distribution medium, is called an ``aggregate'' if the copyright
2608	resulting from the compilation is not used to limit the legal rights
2609	of the compilation's users beyond what the individual works permit.
2610	When the Document is included in an aggregate, this License does not
2611	apply to the other works in the aggregate which are not themselves
2612	derivative works of the Document.
2613
2614	If the Cover Text requirement of section 3 is applicable to these
2615	copies of the Document, then if the Document is less than one half of
2616	the entire aggregate, the Document's Cover Texts may be placed on
2617	covers that bracket the Document within the aggregate, or the
2618	electronic equivalent of covers if the Document is in electronic form.
2619	Otherwise they must appear on printed covers that bracket the whole
2620	aggregate.
2621
2622	@item
2623	TRANSLATION
2624
2625	Translation is considered a kind of modification, so you may
2626	distribute translations of the Document under the terms of section 4.
2627	Replacing Invariant Sections with translations requires special
2628	permission from their copyright holders, but you may include
2629	translations of some or all Invariant Sections in addition to the
2630	original versions of these Invariant Sections. You may include a
2631	translation of this License, and all the license notices in the
2632	Document, and any Warranty Disclaimers, provided that you also include
2633	the original English version of this License and the original versions
2634	of those notices and disclaimers. In case of a disagreement between
2635	the translation and the original version of this License or a notice
2636	or disclaimer, the original version will prevail.
2637
2638	If a section in the Document is Entitled ``Acknowledgements'',
2639	``Dedications'', or ``History'', the requirement (section 4) to Preserve
2640	its Title (section 1) will typically require changing the actual
2641	title.
2642
2643	@item
2644	TERMINATION
2645
2646	You may not copy, modify, sublicense, or distribute the Document except
2647	as expressly provided for under this License. Any other attempt to
2648	copy, modify, sublicense or distribute the Document is void, and will
2649	automatically terminate your rights under this License. However,
2650	parties who have received copies, or rights, from you under this
2651	License will not have their licenses terminated so long as such
2652	parties remain in full compliance.
2653
2654	@item
2655	FUTURE REVISIONS OF THIS LICENSE
2656
2657	The Free Software Foundation may publish new, revised versions
2658	of the GNU Free Documentation License from time to time. Such new
2659	versions will be similar in spirit to the present version, but may
2660	differ in detail to address new problems or concerns. See
2661	@uref{http://www.gnu.org/copyleft/}.
2662
2663	Each version of the License is given a distinguishing version number.
2664	If the Document specifies that a particular numbered version of this
2665	License ``or any later version'' applies to it, you have the option of
2666	following the terms and conditions either of that specified version or
2667	of any later version that has been published (not as a draft) by the
2668	Free Software Foundation. If the Document does not specify a version
2669	number of this License, you may choose any version ever published (not
2670	as a draft) by the Free Software Foundation.
2671	@end enumerate
2672
2673	@page
2674	@heading ADDENDUM: How to use this License for your documents
2675
2676	To use this License in a document you have written, include a copy of
2677	the License in the document and put the following copyright and
2678	license notices just after the title page:
2679
2680	@smallexample
2681	@group
2682	Copyright (C) @var{year} @var{your name}.
2683	Permission is granted to copy, distribute and/or modify this document
2684	under the terms of the GNU Free Documentation License, Version 1.2
2685	or any later version published by the Free Software Foundation;
2686	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
2687	Texts. A copy of the license is included in the section entitled ``GNU
2688	Free Documentation License''.
2689	@end group
2690	@end smallexample
2691
2692	If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
2693	replace the ``with@dots{}Texts.'' line with this:
2694
2695	@smallexample
2696	@group
2697	with the Invariant Sections being @var{list their titles}, with
2698	the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
2699	being @var{list}.
2700	@end group
2701	@end smallexample
2702
2703	If you have Invariant Sections without Cover Texts, or some other
2704	combination of the three, merge those two alternatives to suit the
2705	situation.
2706
2707	If your document contains nontrivial examples of program code, we
2708	recommend releasing these examples in parallel under your choice of
2709	free software license, such as the GNU General Public License,
2710	to permit their use in free software.
2711
2712	@c Local Variables:
2713	@c ispell-local-pdict: "ispell-dict"
2714	@c End:
2715
2716
2717	@c ---------------------------------------------------------------------
2718	@c ---------------------------------------------------------------------
2719
2720	@node Reporting bugs
2721	@chapter Reporting bugs
2722
2723	Report bugs to <obrebski@@amu.edu.pl>.
2724
2725	@c ---------------------------------------------------------------------
2726	@c ---------------------------------------------------------------------
2727
2728	@c @node Copyright
2729	@c @chapter Copyright
2730	@c
2731	@c Copyright 2004 by Tomasz Obrebski
2732	@c This software is free for research and educational use.
2733
2734	@c ---------------------------------------------------------------------
2735	@c ---------------------------------------------------------------------
2736
2737	@node Author
2738	@chapter Author
2739
2740
2741	@bye
