utt.texinfo @ 04ae414

Last change on this file since 04ae414 was 04ae414, checked in by obrebski <obrebski@…>, 17 years ago

Polish characters removed from one place in utt.texinfo (it did not
compile). THE PROBLEM WITH POLISH CHARACTERS IN TEXINFO STILL REMAINS!

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@58 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

Property mode set to 100644

File size: 79.3 KB

1	\input texinfo @c --texinfo--
2	@documentencoding ISO-8859-2
3	@c @documentlanguage pl
4
5	@c %**start of header
6	@setfilename utt.info
7	@settitle UAM Text Tools v0.90
8	@c %**end of header
9
10	@copying
11	This manual is for UAM Text Tools (version 0.90, November, 2007)
12
13	Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
14
15	Permission is granted to copy, distribute and/or modify this document
16	under the terms of the GNU Free Documentation License, Version 1.2
17	or any later version published by the Free Software Foundation;
18	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
19	Texts. A copy of the license is included in the section entitled ``GNU Free Documentation License''.
20
21	@c @quotation
22	@c Permission is granted to ...
23	@c No permission is granted until the document is completed.
24	@c @end quotation
25	@end copying
26
27
28	@titlepage
29	@title UAM Text Tools 0.90 - User Manual
30	@subtitle edition 0.01, @today
31	@subtitle status: prescript
32	@author by Justyna Walkowska, Tomasz Obr@,{}ebski and Micha@l{} Stolarski
33	@page
34	@vskip 0pt plus 1filll
35	@insertcopying
36	@end titlepage
37
38	@contents
39
40	@c @paragraphindent none
41
42	@iftex
43	@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
44	@end iftex
45
46	@c @headings off
47	@c @everyheading LEM(1) @\| @\| LEM(1)
48	@everyfooting @today @c @\| @thispage @\|
49
50	@ifnottex
51
52	@node Top
53	@top UTT - UAM Text Tools
54
55	@insertcopying
56
57	@menu
58	* General information::
59	* UTT file format::
60	* Configuration files::
61	* UTT components::
62	* Auxiliary tools::
63	* Usage examples::
64	* PMDBF dictionary::
65	@c * Examples::
66	@c * Copyright::
67	* GNU Free Documentation License::
68	* Reporting bugs::
69	* Author::
70	@end menu
71	@end ifnottex
72
73
74	@c ----------------------------------------------------------------------
75
76	@node General information
77	@chapter General information
78
79	UAM Text Tools (UTT) is a package of language processing tools
80	developed at Adam Mickiewicz University. Its functionality includes:
81
82	@itemize @bullet
83
84	@item
85	tokenization
86	@item
87	dictionary-based morphological analysis
88	@item
89	heuristic morphological analysis of unknown words
90	@item
91	spelling correction
92	@item
93	pattern search
94	@item
95	sentence splitting
96	@item
97	generation of concordance tables
98	@end itemize
99
100	The toolkit is intended for processing raw (unannotated),
101	unrestricted text for any conceivable purpose.
102
103	The system is organized as a collection of command-line programs, each
104	performing one operation, e.g. tokenization, lemmatization, spelling
105	correction. The components are independent of one another; the
106	unifying element is the uniform i/o file format.
107
108	The components may be combined in various ways to provide different text
109	processing services. New components supplied by the user can also be
110	easily incorporated into the system, provided that they respect the i/o
111	file format conventions.
112
113	UTT component programs do not depend on any specific tagset or
114	morphological description format.
115
116	UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
117	the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
118
119	The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use.
120
121
122	List of contributors:
123
124	@itemize
125	@item Pawel Konieczka
126	@item Tomasz Obrebski
127	@item Michal Stolarski
128	@item Marcin Walas
129	@item Justyna Walkowska
130	@item Pawel Werenski
131	@end itemize
132
133	@c ----------------------------------------------------------------------
134	@c ---------------------------------------------------------------------
135
136	@node UTT file format
137	@chapter UTT file format
138
139	A UTT file contains annotation of a text. It consists of a sequence of
140	segments. Each segment explicitly refers to a continuous piece of the
141	text and provides some information on it.
142
143	@section Segment format
144
145	A segment occupies one line of a UTT file and consists of
146	space-separated fields:
147
148
149	@quotation
150	@sp 1
151	[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
152	@sp 1
153	@end quotation
154
155	@table @var
156
157	@item @var{start}
158	Non-negative integer value indicating the position in the source text where the
159	segment starts.
160
161	@item @var{length}
162	Non-negative integer value indicating the length of the segment.
163
164	@item @var{type}
165	A sequence of characters other than spaces and digits (digits could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
166	@var{type} reflects the main classification of segments
167	into words, numbers, punctuation marks, and meta-text markers.
168	@xref{tok output,,tok output}, for a description of the automatically recognized type markers.
169
170	@item @var{form}
171	This field contains the textual form of the segment or the special
172	symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
173
174	The characters or character sequences that have special meaning in the
175	@var{form} field are enumerated below.
176
177	Characters with special meaning:
178
179	@itemize
180	@item @code{_} - space character
181	@item @code{*} - undefined contents
182	@end itemize
183
184	Escape sequences:
185
186	@itemize
187	@item @code{\n} - new line
188	@item @code{\t} - tabulation
189	@item @code{\r} - carriage return
190
191	@item @code{\_} - the @code{_} character
192	@item @code{\*} - the @code{*} character
193	@item @code{\\} - the @code{\} character
194
195	@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
196	@end itemize
197
198	@item @var{annotation1}
199	@item @var{annotation2}
200	@item ...
201	Annotation fields have the following format:
202
203	@var{longname} @code{:} @var{value}
204
205	or
206
207	@var{shortname} @var{value}
208
209	where @var{longname} is a string of alphanumeric characters
210	(isalnum() test), @var{shortname} - a single non-alphanumeric character
211	(ispunct() test), and @var{value} is an arbitrary string of non-blank characters.
212
213	@end table
214
215
216	Only two fields are mandatory: @var{type} and @var{form}. All other fields
217	may be absent. When only one number precedes the
218	@var{type} field, it is interpreted as the @var{start} position.
219
220	If the @var{length} field is omitted, the length of the segment is the
221	length of the @var{form} field, except when the value of the
222	@var{form} field is @code{*} -- in this case, the length is assumed to
223	be 0.
224
225	If the @var{start} field is also absent, the segment is assumed to directly
226	follow the preceding one.
227
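The field rules above can be illustrated with a short Python sketch. It is
not part of UTT: the names @code{decode_form} and @code{parse_segment} are
hypothetical, and the sketch assumes that an omitted length is counted on
the decoded form (so that an escape sequence counts as one character) and
that any escaped character other than n, t and r stands for itself.

```python
# Hypothetical helpers illustrating the segment format; not UTT code.
_ESCAPES = dict(n="\n", t="\t", r="\r")


def decode_form(form):
    """Return the text a form field stands for, or None for '*'."""
    if form == "*":
        return None
    out = []
    i = 0
    while i < len(form):
        ch = form[i]
        if ch == "\\" and i + 1 < len(form):
            nxt = form[i + 1]
            # \n, \t, \r are control characters; assumed: any other
            # escaped character (e.g. \_ or \\) stands for itself.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        elif ch == "_":
            out.append(" ")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_segment(line, prev_end=0):
    """Parse one line into (start, length, type, form, annotations).

    prev_end is the end position of the preceding segment; it is used
    when the start field is absent.
    """
    fields = line.split()
    numbers = []
    # At most two leading numeric fields: start, then length.
    while fields and fields[0].isdigit() and len(numbers) < 2:
        numbers.append(int(fields.pop(0)))
    if len(fields) < 2:
        raise ValueError("type and form are mandatory: %r" % line)
    seg_type, form = fields[0], fields[1]
    annotations = fields[2:]
    start = numbers[0] if numbers else prev_end
    if len(numbers) == 2:
        length = numbers[1]
    else:
        text = decode_form(form)
        length = 0 if text is None else len(text)
    return start, length, seg_type, form, annotations
```

When reading a file in which positions are omitted, a caller would pass
the sum of the previous segment's start and length as @code{prev_end}.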
228	@c Conventions:
229
230	@c Annotation fields with predefined meaning:
231
232	@c @itemize
233	@c @item @code{!} - UTT components are allowed to modify the contents of
234	@c the @var{form} field (e.g. spelling correction does this). If this happens the
235	@c original form of the segment have to be placed in the @code{!}-field.
236	@c @item @code{@@} - morphological description
237	@c @item @code{=} - node identifier assignment (used in graph encoding)
238	@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
239	@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
240	@c @end itemize
241
242	Segments of length 0 may be used to mark file positions with some
243	information. See e.g. BOS and EOS (beginning/end of sentence) markers
244	in the example below.
245
246	Example:
247
248	sentence: @samp{Piszemy dobre progrumy.}
249
250	@example
251	0000 00 BOS *
252	0000 07 W Piszemy lem:pisać,V
253	0007 01 S _
254	0008 05 W dobre lem:dobry,ADJ
255	0013 01 S _
256	0014 08 W progrumy cor:programy lem:program,N
257	0022 01 P .
258	0023 00 EOS *
259	0023 01 S _
260	0024 00 BOS *
261	0024 11 W Warszawiacy lem:Warszawiak,N
262	0035 01 S _
263	0036 03 W też
264	0039 01 P .
265	0040 00 EOS *
266
267	@end example
268
269	@example
270	0000 BOS *
271	0000 W Piszemy lem:pisać,V
272	0007 S _
273	0008 W dobre lem:dobry,ADJ
274	0013 S _
275	0014 W progrumy cor:programy lem:program,N
276	0022 P .
277	0023 EOS *
278	@end example
279
280	Position information may be provided only for some types of segments:
281
282	@example
283	0000 BOS *
284	W Piszemy lem:pisać,V
285	S _
286	W dobre lem:dobry,ADJ
287	S _
288	W progrumy cor:programy lem:program,N
289	P .
290	EOS *
291	S _
292	0024 BOS *
293	W Warszawiacy lem:Warszawiak,N
294	S _
295	W też
296	P .
297	EOS *
298	@end example
299
300	Position/length information may be provided only when necessary:
301
302	@example
303	0000 04 N *
304	0000 N 12
305	P .
306	N 5
307	S _
308	W km
309	@end example
310
311	@section UTT File
312
313	A UTT file consists of a sequence of segments. The same text position
314	may be covered by multiple segments. As a consequence, ambiguous text
315	segmentation and ambiguous annotation may be represented.
316
317	There are two structural requirements a valid UTT-formatted file
318	has to meet:
319
320	@itemize @bullet
321
322	@item
323	segments have to be sorted with respect to the @var{start} field,
324
325	@item
326	for each
327	segment ending at position @var{n}, either there must be a segment starting at
328	position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
329	for each segment starting at position @var{n}, either there must be a segment
330	ending at position @var{n-1}, or the position @var{n-1} must not be covered
331	by any segment.
332
333	@end itemize
334
335	A valid annotation for the text fragment
336	@example
337	12.5 km
338	@end example
339
340	may be
341
342	@example
343	0000 02 N 12
344	0000 04 N 12.5
345	0002 01 P .
346	0003 01 N 5
347	0004 01 S _
348	0005 02 W km
349	@end example
350
351	but not
352
353	@example
354	0000 02 N 12
355	0000 04 N 12.5
356	0004 01 S _
357	0005 02 W km
358	@end example
359
360	because in the latter example the first segment (starting at position 0000, 2 characters long) ends at position @var{n}=0001, while position @var{n+1}=0002 is covered by the second segment and no segment starts at it.
361
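The two structural requirements can be checked mechanically. The Python
sketch below is only an illustration, not a UTT tool: the name
@code{is_valid_utt} is hypothetical, segments are given as (start, length)
pairs, and zero-length marker segments are assumed to be exempt from the
boundary rule because they cover no text.

```python
def is_valid_utt(segments):
    """Check the two structural requirements on (start, length) pairs."""
    # Requirement 1: segments are sorted by start position.
    starts = [s for s, _ in segments]
    if starts != sorted(starts):
        return False

    # Half-open spans [begin, end); zero-length markers are skipped
    # (an assumption of this sketch).
    spans = [(s, s + n) for s, n in segments if n > 0]
    begin_points = set(s for s, _ in spans)
    end_points = set(e for _, e in spans)

    def covered(pos):
        return any(s <= pos < e for s, e in spans)

    # Requirement 2: segment boundaries must agree.
    for s, e in spans:
        # e is the first position after the segment (n+1 in the text).
        if e not in begin_points and covered(e):
            return False
        # s - 1 is the last position before the segment (n-1 in the text).
        if s not in end_points and covered(s - 1):
            return False
    return True
```

Applied to the two annotations of @samp{12.5 km} above, the function
accepts the first and rejects the second.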
362	@section Character encoding
363
364	The UTT component programs accept only single-byte character encodings, such
365	as ISO, ANSI or DOS code pages. UTF-8 probably works too, but has not been tested yet.
366
367
368	@c @section Formats
369
370	@c @unnumberedsubsubsec Basic format
371
372	@c While processing large amounts of the overhead related with explicit
373	@c ... of the start position and segment length becomes ... . Therefore,
374	@c for efficiency reasons certain shortcuts are possible:
375
376	@c @unnumberedsubsubsec Relative start position
377
378	@c Start position may be given as relative distance from the last
379	@c absolut position.
380
381	@c @unnumberedsubsubsec Absent length
382
383	@c Segment length may by omitted. Normally it can be restored by counting
384	@c the length of the @emph{form field}. For segments with the special value
385	@c @code{*} in the @emph{form field} length 0 is assumed.
386
387	@c @unnumberedsubsubsec Absent length and start position
388
389	@c Both start position and segment length may be omitted. In this format
390	@c each segment is assumed to follow the previous one. This format is,
391	@c therefore, suitable only for unambiguously tagged text
392	@c (0-length markers can be still used.)
393
394
395	@c @table @code
396	@c @item AL
397	@c @code{1234 03 W kot}
398	@c @item RL
399	@c @code{+56 03 W kot}
400	@c @item A
401	@c @code{1234 W kot}
402	@c @item R
403	@c @code{+56 W kot}
404	@c @item 0
405	@c @code{W kot}
406	@c @end table
407
408
409	@c [HOW TO GET POLISH FONTS IN DVI???]
410
411	@macro parhelp
412	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
413	Print help.
414	@end macro
415
416
417	@macro parversion
418	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
419	Print version information.
420	@end macro
421
422	@macro parinteractive
423	@item @b{@minus{}@minus{}interactive, @minus{}i}
424	This option toggles interactive mode, which is by default off. In the
425	interactive mode the program does not buffer the output.
426	@end macro
427
428
429	@c @macro parfile
430	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
431	@c Input file name.
432	@c If this option is absent or equal to '@minus{}', the program
433	@c reads from the standard input.
434	@c @end macro
435
436
437	@c @macro paroutput
438	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
439	@c Regular output file name. To regular output the program sends segments
440	@c which it successfully processed and copies those which were not
441	@c subject to processing. If this option is absent or equal to
442	@c '@minus{}', standard output is used.
443	@c @end macro
444
445	@c @macro parfail
446	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
447	@c Fail output file name. To fail output the program copies the segments
448	@c it failed to process. If this option is absent or equal to
449	@c '@minus{}', standard output is used.
450	@c @end macro
451
452
453	@c @macro parcopy
454	@c @item @b{@minus{}@minus{}copy, @minus{}c}
455	@c Copy succesfully processed segments to regular output also in their
456	@c original input form.
457	@c @end macro
458
459
460	@macro parinputfield
461	@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
462	The field containing the input to the program. The default is the
463	@var{form} field. The fields @var{start}, @var{length}, @var{type},
464	and @var{form} are referred to as @code{1}, @code{2}, @code{3},
465	@code{4}, respectively.
466	@end macro
467
468
469	@macro paroutputfield
470	@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
471	The name of the field added by the program. The default is the name of the program.
472	@end macro
473
474
475	@macro pardictionary
476	@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
477	Dictionary file name.
478	@end macro
479
480
481	@macro parprocess
482	@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
483	Process segments with the specified value in the @var{type} field.
484	Multiple occurrences of this option are allowed and are interpreted as
485	disjunction. If this option is absent, all segments are processed.
486	@end macro
487
488
489	@macro parselect
490	@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
491	Select for processing only segments in which the field named
492	@var{fieldname} is present. Multiple occurrences of this option are
493	allowed and are interpreted as conjunction of conditions. If this
494	option is absent, all segments are processed.
495	@end macro
496
497
498	@macro parunselect
499	@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
500	Select for processing only segments in which the field @var{fieldname}
501	is absent. Multiple occurrences of this option are allowed and are
502	interpreted as conjunction of conditions. If this option is absent,
503	all segments are processed.
504	@end macro
505
506
507	@macro paroneline
508	@item @b{@minus{}@minus{}one-line}
509	This option makes the program print ambiguous annotation in one output
510	line by generating multiple annotation fields. By default, when
511	ambiguous annotation may be produced for a segment, the segment is
512	duplicated and each of the annotations is added to a separate copy of
513	the segment.
514	@end macro
515
516
517	@macro paronefield
518	@item @b{@minus{}@minus{}one-field, @minus{}1}
519	This option makes the program print ambiguous annotation in one
520	annotation field. By default, when ambiguous annotation may be produced
521	for a segment, the segment is duplicated and each of the
522	annotations is added to a separate copy of the segment.
523
524	This option is useful when working with @command{kot} or @command{con}.
525	@end macro
526
527
528	@c ---------------------------------------------------------------------
529	@c ---------------------------------------------------------------------
530
531	@c @node Common command line options
532	@c @chapter Common command line options
533
534	@c @table @code
535
536	@c @parhelp
537
538	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
539	@c Print help.
540
541	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
542	@c Print version information.
543
544	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
545	@c Input file name.
546	@c If this option is absent or equal to '@minus{}', the program
547	@c reads from the standard input.
548
549	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
550	@c Regular output file name. To regular output the program sends segments
551	@c which it successfully processed and copies those which were not
552	@c subject to processing. If this option is absent or equal to
553	@c '@minus{}', standard output is used.
554
555	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
556	@c Fail output file name. To fail output the program copies the segments
557	@c it failed to process. If this option is absent or equal to
558	@c '@minus{}', standard output is used.
559
560	@c @item @b{@minus{}@minus{}only-fail}
561	@c Discard segments which would normally be sent to regular
562	@c output. Print only segments the program failed to process.
563
564	@c @item @b{@minus{}@minus{}no-fail}
565	@c Discard segments the program failed to process.
566	@c (This and the previous option are functionally equivalent to,
567	@c respectively, @option{-o /dev/null} and @option{-e /dev/null}, but
568	@c make the programs run faster.)
569
570	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
571	@c The field containing the input to the program. The default is usually
572	@c the @var{form} field (unless otherwise stated in the program
573	@c description). The fields @var{position}, @var{length}, @var{tag}, and
574	@c @var{form} are referred to as @code{1}, @code{2}, @code{3}, @code{4},
575	@c respectively.
576
577	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
578	@c The name of the field added by the program. The default is the name of
579	@c the program.
580
581	@c @c @item @b{@minus{}@minus{}copy, @minus{}c}
582	@c @c Copy processed segments to regular output.
583
584	@c @item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
585	@c Dictionary file name.
586	@c (This option is used by programs which use dictionary data.)
587
588	@c @item @b{@minus{}@minus{}process=@var{tag}, @minus{}p @var{tag}}
589	@c Process segments with the specified value in the @var{tag} field.
590	@c Multiple occurences of this option are allowed and are interpreted as
591	@c disjunction. If this option is absent, all segments are processed.
592
593	@c @item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
594	@c Select for processing only segments in which the field named
595	@c @var{fieldname} is present. Multiple occurences of this option are
596	@c allowed and are interpreted as conjunction of conditions. If this
597	@c option is absent, all segments are processed.
598
599	@c @item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
600	@c Select for processing only segments in which the field @var{fieldname}
601	@c is absent. Multiple occurences of this option are allowed and are
602	@c interpreted as conjunction of conditions. If this option is absent,
603	@c all segments are processed.
604
605	@c @item @b{@minus{}@minus{}interactive @minus{}i}
606	@c This option toggles interactive mode, which is by default off. In the
607	@c interactive mode the program does not buffer the output.
608
609	@c @item @b{@minus{}@minus{}config=@var{filename}}
610	@c Read configuration from file @file{@var{filename}}.
611
612	@c @item @b{@minus{}@minus{}one @minus{}1}
613	@c This option makes the program print ambiguous annotation in one output
614	@c segment. By default when
615	@c ambiguous new annotation is being produced for a segment, the segment
616	@c is multiplicated and each of the annotations is added to separate copy
617	@c of the segment.
618
619	@c @end table
620
621	@c ---------------------------------------------------------------------
622	@c CONFIGURATION FILES
623	@c ---------------------------------------------------------------------
624
625	@node Configuration files
626	@chapter Configuration files
627
628	Values for all command line options accepted by a component
629	may be set in configuration files. The default locations of the
630	configuration files for a component named @command{@var{program}} are
631
632	@example
633	@file{/usr/local/etc/utt/@var{program}.conf}
634	@end example
635
636	for the system-wide configuration file and
637
638	@example
639	@file{~/.utt/@var{program}.conf}
640	@end example
641
642	for the user configuration file.
643
644	@c The configuration file to load may be also specified with the
645	@c @option{--config} option. Configuration file need not be provided.
646
647	For each option, the value is set according to the following priority:
648
649	@itemize
650	@item command line
651	@c @item configuration file indicated with @option{--config} option
652	@item user configuration file (or configuration file indicated with the @option{--config} option)
653	@item system-wide configuration file
654	@end itemize
655
656	Parameter values are specified in the following format:
657
658	@var{parametername}=@var{value}
659
660	where @var{parametername} is the short or long name of an option accepted by
661	the program, or
662
663	@var{parametername}
664
665	if the option does not need arguments.
666
667	Comments in configuration files are introduced with the @code{#} sign.
668
669	If a program accepts multiple occurrences of an option (e.g. the @option{--select} option of @command{lem}), you can specify them on separate lines of the program's configuration file.
670
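As an illustration of the file format and the priority order described
above, here is a minimal Python sketch. It is not the UTT implementation:
the names @code{parse_config} and @code{resolve} are hypothetical, the
sample option values are made up, comments are assumed to occupy whole
lines, and the highest-priority source that mentions an option is assumed
to supply all of its values.

```python
def parse_config(text):
    """Return a list of (name, value) pairs; value is None for flags.

    A list (not a mapping) is used because an option may occur
    several times.
    """
    settings = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumed: a comment occupies a whole line starting with '#'.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        settings.append((name.strip(), value.strip() if sep else None))
    return settings


def resolve(name, command_line, user_conf, system_conf):
    """Return the values of one option from the highest-priority source."""
    for source in (command_line, user_conf, system_conf):
        values = [v for n, v in source if n == name]
        if values:
            return values
    return []


sample = """# hypothetical user configuration for lem
dictionary=my.dic
select=lem
select=cor
one-line
"""
user = parse_config(sample)
print(resolve("select", [], user, [("select", "gue")]))
```

Here the two @code{select} lines from the user file take precedence over
the system-wide value.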
671	@c The equal sign may be omitted.
672
673
674	@quotation Tip
675	If you have two (or more) frequently used sets of options for the same
676	program (e.g. @command{lem} with the PMDBF dictionary and @command{lem} with a user dictionary),
677	a good solution is to create two symbolic links to @command{lem}, called
678	e.g. @file{lemg} and @file{lemu}, and to specify their configuration in the files @file{lemg.conf}
679	and @file{lemu.conf} respectively.
680	@end quotation
681
682	@c ---------------------------------------------------------------------
683	@c COMPONENTS
684	@c ---------------------------------------------------------------------
685
686	@node UTT components
687	@chapter UTT components
688
689	UTT components are of three types:
690
691	@menu
692	Sources: programs which read non-UTT data (e.g. raw text) and produce output
693	in UTT format
694	* tok:: a tokenizer
695
696	Filters: programs which read and produce UTT-formatted data
697	@c * sen - the sentencizer::
698	* lem:: a morphological analyzer
699	* gue:: a morphological guesser
700	* cor:: a spelling corrector
701	* sen:: a sentencizer
702	@c * gph - the graphizer::
703	* ser:: a pattern search tool (marks matches)
704	* grp:: a pattern search tool (selects sentences containing a match)
705
706	Sinks: programs which read UTT data and produce output in another format
707	* kot:: an untokenizer
708	* con:: a concordance table generator
709	@end menu
710
711	@c ---------------------------------------------------------------------
712	@c TOK
713	@c ---------------------------------------------------------------------
714
715	@page
716	@node tok
717	@section tok - a tokenizer
718
719	@c ----------------------------------------
720
721	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
722	@item @strong{Authors:} @tab Tomasz Obrębski
723	@item @strong{Component category:} @tab source
724	@end multitable
725
726
727	@menu
728	* tok description::
729	* tok input::
730	* tok output::
731	* tok command line options::
732	* tok example::
733	@end menu
734
735	@node tok description
736	@subsection Description
737
738	@code{tok} is a simple program which reads a text file and identifies
739	tokens on the basis of their orthographic form. The type of the token
740	is printed as the @var{type} field.
741
742	@node tok input
743	@subsection Input
744
745	Raw text.
746
747	@node tok output
748	@subsection Output
749
750	UTT-file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
751
752	@itemize
753
754	@item @code{W}
755	(word)
756	- continuous sequence of letters
757
758	@item @code{N}
759	(number)
760	- continuous sequence of digits
761
762	@item @code{S}
763	(space)
764	- continuous sequence of space characters
765
766	@item @code{P}
767	(punctuation mark)
768	- single printable character not belonging to any of the other classes
769
770	@item @code{B}
771	(unprintable character)
772	- single unprintable character
773
774	@end itemize
775
776
777
778	@node tok command line options
779	@subsection Command line options
780
781	@table @code
782
783	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
784	Print help.
785
786	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
787	Print version information.
788
789	@item @b{@minus{}@minus{}interactive, @minus{}i}
790	This option toggles interactive mode, which is by default off. In the
791	interactive mode the program does not buffer the output.
792
793	@end table
794
795	@node tok example
796	@subsection Example
797
798	Input:
799
800	@example
801	Piszemy dobre programy.
802	@end example
803
804	Output:
805
806	@example
807	0000 07 W Piszemy
808	0007 01 S _
809	0008 05 W dobre
810	0013 01 S _
811	0014 08 W programy
812	0022 01 P .
813	0023 01 S \n
814	@end example
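The classification described in the Output section can be sketched in a
few lines of Python. This is not the source of @command{tok}: the function
names are hypothetical, Python's own character classes stand in for
whatever tests the real program uses, positions are counted in characters
rather than bytes, and the escaping of a literal asterisk is an assumption.

```python
# Form-field escaping; the entry for "*" is an assumption of this sketch.
_FORM_ESCAPES = dict((
    ("\\", "\\\\"), ("_", "\\_"), ("*", "\\*"),
    (" ", "_"), ("\n", "\\n"), ("\t", "\\t"), ("\r", "\\r"),
))


def classify(ch):
    """Map one character to a token type marker."""
    if ch.isalpha():
        return "W"
    if ch.isdigit():
        return "N"
    if ch.isspace():
        return "S"
    if ch.isprintable():
        return "P"
    return "B"


def encode_form(text):
    """Encode a piece of text as a form field."""
    return "".join(_FORM_ESCAPES.get(ch, ch) for ch in text)


def tokenize(text):
    """Return UTT lines with start, length, type and form fields."""
    lines = []
    pos = 0
    while pos < len(text):
        kind = classify(text[pos])
        end = pos + 1
        # W, N and S tokens are runs; P and B tokens are single characters.
        if kind in ("W", "N", "S"):
            while end < len(text) and classify(text[end]) == kind:
                end += 1
        lines.append("%04d %02d %s %s"
                     % (pos, end - pos, kind, encode_form(text[pos:end])))
        pos = end
    return lines


for line in tokenize("Piszemy dobre programy.\n"):
    print(line)
```

For the input used in the example above, the sketch prints the same seven
segments.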
815
816
817	@c ---------------------------------------------------------------------
818	@c SEN
819	@c ---------------------------------------------------------------------
820
821	@c @node sen - sentencizer
822	@c @chapter sen - sentencizer
823
824	@c Authors: Tomasz Obrębski
825
826	@c ---------------------------------------------------------------------
827	@c LEM
828	@c ---------------------------------------------------------------------
829
830	@page
831	@node lem
832	@section lem - morphological analyzer
833
834	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
835	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
836	@item @strong{Component category:} @tab filter
837	@end multitable
838
839	@menu
840	* lem description::
841	* lem command line options::
842	* lem input::
843	* lem output::
844	* lem example::
845	* lem dictionaries::
846	* lem hints::
847	@end menu
848
849	@node lem description
850	@subsection Description
851
852	@command{lem} performs morphological analysis of a simple orthographic
853	word, returning all its possible morphological annotations,
854	disregarding the context.
855
856	@c ----------------------------------------
857
858	@node lem command line options
859	@subsection Command line options
860
861	@table @code
862	@parhelp
863	@parversion
864	@parinteractive
865	@c @parfile
866	@c @paroutput
867	@c @parfail
868	@c @parcopy
869	@parinputfield
870	@paroutputfield
871	@pardictionary
872	@parprocess
873	@parselect
874	@parunselect
875	@paroneline
876	@paronefield
877	@end table
878
879	@c ----------------------------------------
880
881	@node lem input
882	@subsection Input
883
884	@command{lem} reads a UTT file and processes the value of the @var{form} field
885	(the input field may be changed with the @option{--input-field} option).
886
887	@node lem output
888	@subsection Output
889
890	@command{lem} adds a new annotation field, whose default name is @code{lem}. In
891	case of ambiguity, one of three things happens: the segment is duplicated (default),
892	multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
893	annotation is produced as the value of a single @code{lem} field
894	(@option{--one-field}, @option{-1}):
895
896	@itemize @bullet
897
898	@item
899	unambiguous value format:
900
901	@example
902	<lemma>,<descr>
903	@end example
904
905	@item
906	ambiguous value format (@option{--one-field} option)
907
908
909	@example
910	<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
911	@end example
912
913	(alternative descriptions for the same lemma are separated by commas,
914	alternative lemmata are separated by semicolons.)
915
916	@end itemize
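A consumer of @command{lem} output could split such a value as sketched
below in Python. The function name @code{parse_lem_value} is hypothetical,
and the sketch assumes that lemmata and descriptions themselves contain
neither commas nor semicolons.

```python
def parse_lem_value(value):
    """Split a lem value into a list of (lemma, [descr, ...]) pairs."""
    result = []
    # Alternative lemmata are separated by semicolons.
    for alternative in value.split(";"):
        # The first comma-separated item is the lemma, the rest are
        # alternative descriptions for that lemma.
        lemma, *descriptions = alternative.split(",")
        result.append((lemma, descriptions))
    return result


print(parse_lem_value("dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn"))
```

The unambiguous format is handled by the same function, since it is the
special case of one lemma with one description.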
917
918	@node lem example
919	@subsection Example
920
921	Input:
922
923	@example
924	0000 07 W Piszemy
925	0007 01 S _
926	0008 05 W dobre
927	0013 01 S _
928	0014 08 W programy
929	0022 01 P .
930	0023 01 S \n
931	@end example
932
933	Output (default):
934
935	@example
936	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
937	0007 01 S _
938	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
939	0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
940	0013 01 S _
941	0014 08 W programy lem:program,N/GiNpCa
942	0014 08 W programy lem:program,N/GiNpCn
943	0014 08 W programy lem:program,N/GiNpCv
944	0022 01 P .
945	0023 01 S \n
946	@end example
947
948	Output (@option{--one-line} option):
949
950	@example
951	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
952	0007 01 S _
953	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
954	0013 01 S _
955	0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
956	0022 01 P .
957	0023 01 S \n
958	@end example
959
960	Output (@option{--one-field} option):
961
962	@example
963	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
964	0007 01 S _
965	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
966	0013 01 S _
967	0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
968	0022 01 P .
969	0023 01 S \n
970	@end example
971
972	@c ----------------------------------------
973
974	@node lem dictionaries
975	@subsection Dictionaries
976
977	@command{lem} requires a dictionary. The dictionary may be provided in
978	one of two formats: in text (source) format or in binary (fsa) format.
979
980	@subsubheading Text format
981
982	Dictionary entries have the following structure:
983
984	@example
985	<form>;<lemma>,<descr>[;<lemma>,<descr>]
986	@end example
987
988	@var{lemma} may be given explicitly or in the cut-add format:
989
990	@example
991	@code{[<cut1><add1>-]<cut2><add2>}
992	@end example
993
994	meaning: replace the prefix of length @code{<cut1>} with the
995	string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string
996	@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
997	@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
998
999	Each dictionary entry must be written on one line and must not contain blank characters.
1000
1001	Examples:
1002	@example
1003	kot;0,N/GaNsCn
1004	kota;1,N/GaNsCg;1,N/GaNsCa
1005	kotu;1,N/GaNsCd
1006	kotem;2,N/GaNsCi
1007	kocie;3t,N/GaNsCl;3t,N/GaNsCv
1008	najbielsi;3-4ały,ADJ/DsNpCnGp
1009	najbielsze;3-5ały,ADJ/DsNpCnGaifn
1010	najlepsi;dobry,ADJ/DsNpCnGp
1011	najlepsze;dobry,ADJ/DsNpCnGaifn
1012	@end example
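The cut-add notation can be made concrete with a small Python sketch. It
is an illustration only, not the code used by @command{lem} or
@command{compiledic}: the names @code{apply_lemma_spec} and
@code{parse_entry} are hypothetical, and a lemma is assumed to be explicit
whenever it does not begin with a digit.

```python
import re

# [<cut1><add1>-]<cut2><add2>
_CUT_ADD = re.compile(r"^(?:(\d+)([^-]*)-)?(\d+)(.*)$")


def apply_lemma_spec(form, spec):
    """Return the lemma that spec describes for the given form."""
    match = _CUT_ADD.match(spec)
    if match is None:
        # Assumed: anything not starting with a digit is an explicit lemma.
        return spec
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        # Replace the prefix of length cut1 with add1.
        form = add1 + form[int(cut1):]
    # Replace the suffix of length cut2 with add2.
    return form[:len(form) - int(cut2)] + add2


def parse_entry(line):
    """Parse a text dictionary entry into (form, [(lemma, descr), ...])."""
    form, *analyses = line.strip().split(";")
    result = []
    for analysis in analyses:
        spec, descr = analysis.split(",", 1)
        result.append((apply_lemma_spec(form, spec), descr))
    return form, result


print(parse_entry("kocie;3t,N/GaNsCl;3t,N/GaNsCv"))
print(apply_lemma_spec("najbielsi", "3-4ały"))
```

The two calls correspond to entries from the example dictionary above.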
1013
1014
1015	The mandatory file name extension for a text dictionary is @code{dic}. For large
1016	dictionaries it is preferable, however, to compile them into binary
1017	(fsa) format.
1018
1019	@subsubheading Binary format
1020
1021	The mandatory file name extension for a binary dictionary is @code{bin}. To
1022	compile a text dictionary into binary format, write:
1023
1024	@example
1025	compiledic <dictionaryname>.dic
1026	@end example
1027
1028	@subsubheading Polex/PMDBF dictionary
1029
1030	A large-coverage morphological dictionary of the Polish language, Polex/PMDBF, is included in
1031	the distribution as the default dictionary for @command{lem}. By default it is
1032	located in:
1033
1034	@file{$HOME/.utt/pl/lem.bin}
1035
1036	@node lem hints
1037	@subsection Hints
1038
1039	@c @subsubheading Combining data from multiple dictionaries
1040
1041	@c @itemize
1042
1043	@c @item Apply <dict1>, then apply <dict2> to words which were not annotatated.
1044
1045	@c @example
1046	@c lem -d <dict1> \| lem -S lem -d <dict2>
1047	@c @end example
1048
1049	@c @item Add annotations from two dictionaries <dict1> and <dict2>.
1050
1051	@c @example
1052	@c lem -c -d <dict1> \| lem -S lem -d <dict2>
1053	@c @end example
1054
1055	@c @end itemize
1056
1057
1058	@c ---------------------------------------------------------------------
1059	@c GUE
1060	@c ---------------------------------------------------------------------
1061
1062	@page
1063	@node gue
1064	@section gue - morphological guesser
1065
1066	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1067
1068	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
1069	@item @strong{Component category:} @tab filter
1070
1071	@end multitable
1072
1073	@command{gue} guesses morphological descriptions of the form contained
1074	in the @var{form} field.
1075
1076	@menu
1077	* gue command line options::
1078	* gue example::
1079	* gue dictionaries::
1080	@end menu
1081
1082	@node gue command line options
1083	@subsection Command line options
1084
1085	@table @code
1086
1087	@parhelp
1088	@parversion
1089	@parinteractive
1090	@c @parfile
1091	@c @paroutput
1092	@c @parfail
1093	@c @parcopy
1094	@parinputfield
1095	@paroutputfield
1096	@pardictionary
1097	@parprocess
1098	@parselect
1099	@parunselect
1100	@paroneline
1101	@paronefield
1102
1103	@item @b{@minus{}@minus{}delta=@var{n}}
1104	Stop displaying answers after a drop in weight, that is, when the weight difference between two consecutive results exceeds the delta value (default=`0.2').
1105
1106
1107	@item @b{@minus{}@minus{}cut-off=@var{n}}
1108	Do not display answers whose weight is lower than the cut-off value (default=`200').
1109
1110
1111	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
1112	Guess up to n descriptions (default=`0', which means 'display all results').
1113
1114
1115
1116	@end table
1117
1118	@node gue example
1119	@subsection Example
1120
1121	@example
1122	command: gue -n 2
1123
1124	input:
1125	0000 07 W smerfny
1126
1127	output:
1128	0000 07 W smerfny gue:,ADJ/CaDpGiNs
1129	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
1130	@end example
1131
1132
1133	@node gue dictionaries
1134	@subsection Dictionaries
1135
1136	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
1137	The fsa format is created by compiling text-format dictionaries.
1138
1139
1140
1141	@subsubheading Text format
1142
1143	Dictionary entries have the following structure:
1144
1145	@example
1146	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
1147	@end example
1148
1149	@var{lemma} must be given in the cut-add format:
1150
1151	@example
1152	@code{[<cut1><add1>-]<cut2><add2>}
1153	@end example
1154	(no spaces in between): replace the prefix of length @var{cut1} with the
1155	string @var{add1}, and replace the suffix of length @var{cut2} with the string
1156	@var{add2}.
1157
1158
1159	Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}.
1160
1161
1162	@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
1163
1164	@var{weight} is an integer value between 1 and 999 indicating the
1165	likelihood of the guess.
1166
1167	@example
1168	*łkę;1a,N/GfNsCa
1169	naj*elszy;3-5ały,ADJ/...:...
1170	@end example
1171
1172
1173	@c ---------------------------------------------------------------------
1174	@c COR
1175	@c ---------------------------------------------------------------------
1176
1177	@page
1178	@node cor
1179	@section cor - spelling corrector
1180
1181	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1182	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
1183	@item @strong{Component category:} @tab filter
1184	@end multitable
1185
1186	The spelling corrector applies Kemal Oflazer's dynamic programming
1187	algorithm @cite{oflazer96} to the FSA representation of the set of
1188	word forms of the Polex/PMDBF dictionary. Given an incorrect
1189	word form, it returns all word forms present in the dictionary whose
1190	edit distance from that form does not exceed the maximum given as a parameter.
1191
1192	By default, @command{cor} replaces the contents of the @var{form} field
1193	with the corrected value and places the old contents in the @code{cor}
1194	field.
1195
1196
1197	@menu
1198	* cor command line options::
1199	* cor dictionaries::
1200	@end menu
1201
1202
1203	@node cor command line options
1204	@subsection Command line options
1205
1206	@table @code
1207
1208	@parhelp
1209	@parversion
1210	@parinteractive
1211	@c @parfile
1212	@c @paroutput
1213	@c @parfail
1214	@c @parcopy
1215	@parinputfield
1216	@paroutputfield
1217	@pardictionary
1218	@parprocess
1219	@parselect
1220	@parunselect
1221	@paroneline
1222	@paronefield
1223
1224	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1225	Maximum edit distance (default='1').
1226
1227
1228	@end table
1229
1230	@node cor dictionaries
1231	@subsection Dictionaries
1232
1233	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
1234	The fsa format is created by compiling text-format dictionaries.
1235
1236	@subsubheading Text format
1237
1238	The @command{cor} dictionary is a list of words:
1239	@example
1240	odlot
1241	odlotowy
1242	odludek
1243	@end example
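
The kind of lookup performed on such a word list can be sketched in Python. Note that this is a deliberately simplified stand-in: it scans the list linearly and computes the plain Levenshtein distance, whereas @command{cor} applies Oflazer's algorithm to an FSA. The function names @code{edit_distance} and @code{corrections} are hypothetical, and treating the distance parameter as an inclusive maximum is an assumption.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance computed by dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def corrections(form, words, max_distance=1):
    """Return the dictionary words within max_distance edits of form."""
    return [word for word in words if edit_distance(form, word) <= max_distance]
```

For the three words listed above and the misspelled form @samp{odlut}, only @samp{odlot} lies within distance 1.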
1244
1245	@page
1246	@node sen
1247	@section sen - a sentensizer
1248
1249	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1250
1251	@item @strong{Authors:} @tab Tomasz Obrębski
1252	@item @strong{Component category:} @tab filter
1253
1254	@end multitable
1255
1256	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
1257
1258	@menu
1259	@c * sen input::
1260	@c * sen output::
1261	* sen example::
1262	@end menu
1263
1264	@node sen example
1265	@subsection Example
1266
1267	@example
1268	command: sen
1269
1270	input:
1271	0000 05 W Cześć
1272	0005 01 P !
1273	0006 01 S _
1274	0007 02 W To
1275	0009 01 S _
1276	0010 02 W ja
1277	0012 01 P .
1278	0013 01 S \n
1279
1280	output:
1281	0000 00 BOS *
1282	0000 05 W Cześć
1283	0005 01 P !
1284	0006 00 EOS *
1285	0006 00 BOS *
1286	0006 01 S _
1287	0007 02 W To
1288	0009 01 S _
1289	0010 02 W ja
1290	0012 01 P .
1291	0013 01 S \n
1292	0014 00 EOS *
1293	@end example
1294
1295
1296	@c ---------------------------------------------------------------------
1297	@c GPH
1298	@c ---------------------------------------------------------------------
1299
1300	@c @node gph - graphizer
1301	@c @chapter gph - graphizer
1302
1303	@c Authors: Tomasz Obrębski
1304
1305
1306
1307	@c SER
1308	@c ---------------------------------------------------------------------
1309	@c ---------------------------------------------------------------------
1310
1311	@page
1312	@node ser
1313	@section ser - pattern search tool
1314
1315	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1316	@item @strong{Authors:} @tab Tomasz Obrębski
1317	@item @strong{Component category:} @tab filter
1318	@end multitable
1319
1320	@command{ser} looks for patterns in UTT-formatted texts.
1321
1322	@menu
1323	* ser command line options::
1324	* ser pattern::
1325	* ser how ser works::
1326	* ser customization::
1327	* ser limitations::
1328	* ser requirements::
1329	@end menu
1330
1331
1332	@c ---------------------------------------------------------------------
1333	@node ser command line options
1334	@subsection Command line options
1335
1336	@table @code
1337
1338	@parhelp
1339	@parversion
1340	@c @parfile
1341	@c @paroutput
1342	@c @parinputfield
1343	@c @paroutputfield
1344	@parprocess
1345	@parinteractive
1346
1347	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1348	The search pattern.
1349
1350	@item @b{@minus{}@minus{}morph=@var{field}}
1351	The name of the annotation field containing the morphological
1352	description (default @code{lem}).
1353
1354	@item @b{@minus{}@minus{}flex}
1355	Only print the generated flex source code.
1356
1357	@item @b{@minus{}@minus{}macro=@var{filename}}
1358	Read macro definitions from file @var{filename} rather than from
1359	the default location. This option makes it possible to redefine the set of terms.
1360
1361	@item @b{@minus{}@minus{}define=@var{filename}}
1362	Append macro definitions from file @var{filename}. This option
1363	makes it possible to extend the set of terms.
1364
1365	@end table
1366
1367
1368	@c ---------------------------------------------------------------------
1369	@node ser pattern
1370	@subsection Pattern
1371
1372	The @command{ser} pattern is a regular expression over terms corresponding
1373	to text segments or segment sequences. Predefined terms are:
1374
1375	@table @code
1376
1377	@item seg(@var{t},@var{f},@var{a})
1378	a segment of type @var{t}, containing form @var{f} and annotation
1379	@var{a}
1380
1381	@item form(@var{f})
1382	a segment containing form @var{f}
1383
1384	@item field(@var{f})
1385	a segment containing annotation field @var{f}
1386
1387	@item space(@var{f})
1388	a space segment of form @var{f}
1389
1390	@item word(@var{f})
1391	a word segment of form @var{f}
1392
1393	@item punct(@var{f})
1394	a punct segment of form @var{f}
1395
1396	@item number(@var{f})
1397	a number segment of form @var{f}
1398
1399	@item lexeme(@var{f})
1400	a word segment with lemma @var{f}
1401
1402	@item cat(@var{c})
1403	a word segment of category @var{c}
1404
1405	@end table
1406
1407	All arguments are optional. If an argument is omitted, an arbitrary
1408	string of non-blank characters is assumed as the argument value. Term
1409	arguments may be arbitrary character-level regular expressions. The
1410	following special symbols can be used:
1411
1412	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1413	@item @code{[@dots{}]} @tab a character class
1414	@item @code{[^@dots{}]} @tab a negated character class
1415	@item @code{|} @tab alternative
1416	@item @code{*} @tab repetition, including zero times
1417	@item @code{+} @tab repetition, at least one time
1418	@item @code{?} @tab optionality
1419	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
1420	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
1421	@item @code{@{@var{m}@}} @tab repetition @var{m} times
1422	@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
1423	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
1424	@item @code{( )} @tab parentheses, used to override precedence
1425	@c @end multitable
1426
1427	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1428	@item @code{.} @tab a non-blank character
1429	@item @code{\w} @tab a letter
1430	@item @code{\W} @tab a non-blank character other than a letter
1431	@item @code{\d} @tab a digit
1432	@item @code{\D} @tab a non-blank character other than a digit
1433	@item @code{\s} @tab a space or tab character
1434	@item @code{\S} @tab a non-blank character (the same as @code{.})
1435	@item @code{\l} @tab a lowercase letter
1436	@item @code{\L} @tab an uppercase letter
1437	@end multitable
1438
1439
1440	@noindent The following characters:
1441	@example
1442	@verb{% [ ] ^ | * + ? { } , . < > \ %}
1443	@end example
1444	must be escaped with a backslash, i.e. written as:
1445	@example
1446	@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
1447	@end example
1448
1449	@quotation Note
1450	The special symbols are borrowed from Perl with minor
1451	modifications, made for convenience: the meaning of certain
1452	special characters and sequences differs slightly from their common
1453	meaning. In particular,
1454	the meaning of the @code{.} special character is modified because of
1455	the special function of spaces in UTT files (they are field
1456	separators). Use @code{\s} to explicitly match a space or tab character.
1457	@end quotation
1458
1459	In the argument of the @code{cat} term, a special operator @code{<...>} may be
1460	used. A category specification enclosed in angle brackets matches all
1461	category descriptions which are consistent (non-contradictory) with the
1462	specification. For example, @code{<N>} matches all noun descriptions, and
1463	@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
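
The notion of consistency can be made concrete with a small Python sketch. It is an interpretation, not the implementation used by @command{ser}: the names @code{parse_descr} and @code{consistent} are hypothetical, the part of speech is assumed to require an exact match, an attribute missing from the description is assumed not to contradict the specification, and values written in the @code{<...>} form are not handled.

```python
import re

# One attribute: an uppercase name followed by one or more one-character values.
ATTR = re.compile(r"([A-Z]+)([a-z0-9?!*+-]+)")


def parse_descr(descr):
    """Split e.g. 'ADJ/DpNpCnav' into the part of speech and attribute value sets."""
    pos, _, rest = descr.partition("/")
    return pos, dict((name, set(values)) for name, values in ATTR.findall(rest))


def consistent(spec, descr):
    """True if descr does not contradict the specification spec."""
    spec_pos, spec_attrs = parse_descr(spec)
    pos, attrs = parse_descr(descr)
    if spec_pos != pos:
        return False
    for name, wanted in spec_attrs.items():
        # A shared attribute must have at least one value in common.
        if name in attrs and not (wanted & attrs[name]):
            return False
    return True
```

Under these assumptions @code{consistent("ADJ/Can", "ADJ/DpNsCnavGn")} holds, while @code{consistent("N/Ca", "N/GiNpCn")} does not.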
1464
1465
1466	@*
1467	@noindent @b{Examples of one-segment patterns:}
1468
1469	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1470	@item @code{seg} @tab any segment
1471	@item @code{word} @tab any word-form
1472	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
1473	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
1474	@item @code{word(\L\l+)} @tab a capitalized word-form
1475	@item @code{punct} @tab a punctuation character
1476	@item @code{space(.*\\n.*)} @tab a space segment containing a newline character
1477	@item @code{lexeme(pomoc)} @tab any form of the lexeme 'pomoc'
1478	@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
1479	@item @code{cat(<N/Ca>)} @tab a word whose category matches @code{N/Ca}
1480	@end multitable
1481
1482	@*
1483	@noindent @b{Examples of multi-segment patterns:}
1484
1485	@table @code
1486
1487	@item (word(\L) punct(\.) space?)+ word(\L\l+)
1488	a sequence of initials followed by a surname
1489
1490	@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
1491	a text fragment between two punctuation characters, containing an
1492	occurrence of a relative pronoun
1493
1494	@end table
1495
1496
1497	@node ser how ser works
1498	@subsection How ser works
1499
1500	@node ser customization
1501	@subsection Customization
1502
1503	@c All predefined terms correspond to single segments,
1504
1505	@example
1506	define(`verbseq', `(cat(V) (space cat(V)))')
1507	@end example
1508
1509
1510	the term @code{cat()} may not be used as a ... of
1511
1512	@c See @command{m4} manual for further details on macro definition format.
1513
1514	@node ser limitations
1515	@subsection Limitations
1516
1517	The @code{<...>} operator does not accept more than 3 attributes.
1518
1519	@node ser requirements
1520	@subsection Requirements
1521
1522	In order to run @command{ser}, the following programs must be
1523	installed in the system:
1524
1525	@itemize
1526
1527	@item @command{m4}
1528	@item @command{grep}
1529	@item @command{flex}
1530	@item @command{gcc}
1531
1532	@end itemize
1533
1534
1535	@c GRP
1536	@c ---------------------------------------------------------------------
1537	@c ---------------------------------------------------------------------
1538
1539	@page
1540	@node grp
1541	@section grp - pattern search tool
1542
1543	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1544	@item @strong{Authors:} @tab Tomasz Obrębski
1545	@item @strong{Component category:} @tab filter
1546	@end multitable
1547
1548
1549	@command{grp} selects sentences containing an expression matching a
1550	pattern. The pattern format is exactly the same as that accepted by
1551	@command{ser}.
1552
1553	@command{grp} is intended mainly for speeding up the corpus search process.
1554	It is extremely fast (its processing speed is usually higher than the speed
1555	of reading the corpus file from disk).
1556
1557
1558
1559	@c @menu
1560	@c * ser command line options::
1561	@c * ser pattern::
1562	@c * ser how ser works::
1563	@c * ser customization::
1564	@c * ser limitations::
1565	@c * ser requirements::
1566	@c @end menu
1567	@menu
1568	* grp command line options::
1569	* grp pattern::
1570	* grp hints::
1571	@end menu
1572
1573	@node grp command line options
1574	@subsection Command line options
1575
1576	@table @code
1577
1578	@parhelp
1579	@parversion
1580	@c @parfile
1581	@c @paroutput
1582	@c @parinputfield
1583	@c @paroutputfield
1584	@parprocess
1585	@parinteractive
1586
1587	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1588	The search pattern.
1589
1590	@item @b{@minus{}@minus{}morph=@var{field}}
1591	The name of the annotation field containing the morphological
1592	description (default @code{lem}).
1593
1594	@item @b{@minus{}@minus{}command}
1595	Only print the generated flex source code.
1596
1597	@item @b{@minus{}@minus{}macro=@var{filename}}
1598	Read macro definitions from file @var{filename} rather than from
1599	the default location. This option makes it possible to redefine the set of terms.
1600
1601	@item @b{@minus{}@minus{}define=@var{filename}}
1602	Append macro definitions from file @var{filename}. This option
1603	makes it possible to extend the set of terms.
1604
1605	@end table
1606
1607
1608	@node grp pattern
1609	@subsection Pattern
1610
1611	The pattern format is the same as for @command{ser} (@pxref{ser pattern}).
1612
1613	@node grp hints
1614	@subsection Hints
1615
1616	The corpus search speed may be increased by combining @command{grp} with the @command{lzop}
1617	compression tool (@command{grp} usually processes data faster than it is read from
1618	disk, especially on slow laptop drives).
1619
1620	@example
1621	cat corpus | tok | sen | lem | grp -a p | lzop -7 > corpus.grp.lzo
1622	@end example
1623
1624	@example
1625	lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR}
1626	@end example
1627
1628
1629	@c ---------------------------------------------------------------------
1630	@c kot
1631	@c ---------------------------------------------------------------------
1632	@c ---------------------------------------------------------------------
1633
1634	@page
1635	@node kot
1636	@section kot - untokenizer
1637
1638	Authors: Tomasz Obrębski
1639
1640	@command{kot} is the opposite of @command{tok}. It changes UTT-formatted text into plain text.
1641
1642	@menu
1643	* kot command line options::
1644	* kot usage examples::
1645	@end menu
1646
1647	@node kot command line options
1648	@subsection Command line options
1649
1650	@table @code
1651
1652	@parhelp
1653
1654	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1655
1656	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1657
1658	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1659
1660	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1661
1662	@c @item @b{@minus{}@minus{}config=@var{filename}}
1663
1664	@item
1665
1666	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
1667	print @var{string} between nonadjacent segments of the input file
1668
1669	@item @b{@minus{}@minus{}spaces, @minus{}r}
1670	retain the special characters @code{_}, @code{\t},
1671	@code{\n}, @code{\r}, @code{\f} unexpanded in the output
1672
1673	@end table
1674
1675	@node kot usage examples
1676	@subsection Usage examples
1677
1678	@example
1679	cat legia.txt | tok | kot
1680	@end example
1681
1682	@example
1683	cat legia.txt | tok | lem -1 | kot
1684	@end example
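
What @command{kot} does to a stream of segments can be approximated by the following Python sketch. It is an illustration only, not the @command{kot} source: the function name @code{untokenize} is hypothetical, the field layout (start, length, type, form) is read off the examples in this manual, and the sketch assumes that zero-length segments carry no text and that the special characters are expanded only in space (@code{S}) segments.

```python
# Special notations and the characters they stand for (assumed mapping).
SPECIALS = (("_", " "), ("\\t", "\t"), ("\\n", "\n"), ("\\r", "\r"), ("\\f", "\f"))


def untokenize(lines, gap_fill=""):
    """Rebuild plain text from UTT segment lines."""
    pieces = []
    expected = None                  # offset at which the next segment should start
    for line in lines:
        fields = line.split(" ")
        if len(fields) < 4:
            continue                 # not a segment line
        start, length = int(fields[0]), int(fields[1])
        seg_type, form = fields[2], fields[3]
        if length == 0:
            continue                 # zero-length markers (e.g. BOS, EOS)
        if expected is not None and start != expected:
            pieces.append(gap_fill)  # nonadjacent segments
        if seg_type == "S":
            for code, char in SPECIALS:
                form = form.replace(code, char)
        pieces.append(form)
        expected = start + length
    return "".join(pieces)
```

The @code{gap_fill} argument plays the role of the @option{--gap-fill} option: it is emitted wherever a segment does not start at the offset where the previous one ended.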
1685
1686	@c CON............................................................
1687	@c ...............................................................
1688	@c ...............................................................
1689
1690	@page
1691	@node con
1692	@section con - concordance table generator
1693
1694	@command{con} generates a concordance table based on a pattern given to @command{ser}.
1695
1696	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1697	@item @strong{Authors:} @tab Justyna Walkowska
1698	@item @strong{Component category:} @tab sink
1699	@end multitable
1700	@c
1701
1702	@menu
1703	* con command line options::
1704	* con usage example::
1705	* con hints::
1706	@end menu
1707
1708	@node con command line options
1709	@subsection Command line options
1710
1711	@table @code
1712
1713	@parhelp
1714
1715	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
1716	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1717	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1718	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1719	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
1720	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
1721	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
1722	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
1723	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
1724	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1725	@c @item @b{@minus{}@minus{}config=@var{filename}}
1726	@c @item
1727	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1728	@c search pattern
1729	@c
1730	@c @item @b{@minus{}@minus{}flex}
1731	@c only print the generated flex source code
1732	@c
1733	@c @item @b{@minus{}@minus{}macro=@var{filename}}
1734	@c read macrodefinitions from file @var{filename} rather than from
1735	@c default location. This option allows to redefine the set of terms.
1736	@c
1737	@c @item @b{@minus{}@minus{}define=@var{filename}}
1738	@c append macrodefinitions from file @var{filename}. This option
1739	@c allows to extend the set of terms.
1740
1741	@item @b{@minus{}@minus{}left @minus{}l}
1742	Left context info (default='30c'). Example:
1743	@example
1744	-l=5c: left context is 5 characters
1745	-l=5w: left context is 5 words
1746	-l=5s: left context is 5 non-empty input lines
1747	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
1748	@end example
1749
1750	@item @b{@minus{}@minus{}right @minus{}r}
1751	Right context info (default='30c').
1752	@item @b{@minus{}@minus{}trim @minus{}t}
1753	Clear incomplete words from output.
1754	@item @b{@minus{}@minus{}white @minus{}w}
1755	Do not convert whitespace characters into spaces.
1756	@item @b{@minus{}@minus{}column @minus{}c}
1757	Left column minimal width in characters (default = 0).
1758	@item @b{@minus{}@minus{}ignore @minus{}i}
1759	Ignore segment inconsistency in the input.
1760	@item @b{@minus{}@minus{}bon}
1761	Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
1762	@item @b{@minus{}@minus{}eob}
1763	End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
1764	@item @b{@minus{}@minus{}bod}
1765	Selected segment beginning display string (default='[').
1766	@item @b{@minus{}@minus{}eod}
1767	Selected segment end display string (default=']').
1768
1769
1770
1771	@end table
1772
1773	@node con usage example
1774	@subsection Usage example
1775	@example
1776	cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
1777	@end example
1778
1779
1780	@node con hints
1781	@subsection Hints
1782
1783	@command{con} is a rather slow program, so avoid passing large amounts of
1784	redundant text through it. @command{con} works well in the following
1785	sequence:
1786
1787	@example
1788	... | grp -e EXPR | ser -e EXPR | con
1789	@end example
1790
1791
1792
1793	@c ---------------------------------------------------------------------
1794	@c ---------------------------------------------------------------------
1795
1796	@page
1797	@node Auxiliary tools
1798	@chapter Auxiliary tools
1799
1800	@menu
1801	* compiledic:: dictionary compiler
1802	* fla:: UTT file flattener
1803	* unfla:: UTT file unflattener
1804	@end menu
1805
1806
1807	@page
1808	@node compiledic
1809	@section compiledic - the dictionary compiler
1810
1811	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1812	@item @strong{Authors:} @tab Michal Stolarski, Tomasz Obrebski
1813	@item @strong{Component category:} @tab additional tool
1814	@end multitable
1815	@c
1816
1817	@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
1818	(FSA) format (@code{.bin} extension).
1819
1820	Automaton representation of a dictionary is built using the AT&T tools:
1821	@itemize
1822	@item AT&T FSM Library,
1823	@item AT&T Lextools.
1824	@end itemize
1825
1826	For @command{compiledic} to work, the above-mentioned packages must be
1827	installed on your system. They are freely available
1828	for non-commercial use.
1829
1830	Usage:
1831	@example
1832	compiledic <dictionaryname>.dic
1833	@end example
1834
1835	The file @file{<dictionaryname>.bin} will be generated.
1836
1837	Note: The program produces many temporary files, which are
1838	stored in the current directory. They are deleted after successful
1839	termination of the program.
1840
1841	@c @menu
1842	@c * con command line options::
1843	@c * con usage example::
1844	@c * con hints::
1845	@c @end menu
1846
1847
1848	@page
1849	@node fla
1850	@section fla - the UTT file flattener
1851
1852	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1853	@item @strong{Authors:} @tab Tomasz Obrębski
1854	@item @strong{Component category:} @tab filter
1855	@end multitable
1856	@c
1857
1858	@command{fla} ``flattens'' a UTT file by merging the segments belonging
1859	to one sentence into one line. Technically, end-of-line characters
1860	('\n', ASCII code 10) are replaced with form-feed characters ('\f',
1861	ASCII code 12). Flattening makes it possible to process UTT files
1862	sentence by sentence with such tools as @command{grep} or @command{sed}
1863	(this is used in @command{grp} and @command{mar}).
1864
1865	Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.
1866
1867	Flattened files are still human-readable.
1868
1869	Usage:
1870
1871	@example
1872	fla [<bosregex>]
1873	@end example
1874
1875	The optional argument is a regular expression describing segments
1876	which should be treated as sentence beginnings (the test is whether the
1877	segment contains a fragment matching @code{<bosregex>}). By
1878	default, segments containing a @code{BOS} field are sought.
1879	@c @menu
1880	@c * con command line options::
1881	@c * con usage example::
1882	@c * con hints::
1883	@c @end menu
1884
1885
1886
1887	@page
1888	@node unfla
1889	@section unfla - the UTT file unflattener
1890
1891	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1892	@item @strong{Authors:} @tab Tomasz Obrębski
1893	@item @strong{Component category:} @tab filter
1894	@end multitable
1895
1896	@command{unfla} transforms a flattened UTT file, produced by
1897	@command{fla}, into the regular format by restoring end-of-line
1898	characters.
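
The flattening and unflattening steps can be sketched in Python as follows. This is a minimal illustration, not the code of @command{fla} or @command{unfla}: the function names are hypothetical, the default expression @code{BOS} is an assumption (the actual default regular expression is not given here), and segments preceding the first sentence beginning are simply kept on the first line.

```python
import re


def fla(text, bos_regex="BOS"):
    """Put every sentence on one line, joining its segments with form feeds."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()                       # empty piece after the final newline
    sentences = []
    for line in lines:
        if sentences and not re.search(bos_regex, line):
            sentences[-1] += "\f" + line  # same sentence: join with a form feed
        else:
            sentences.append(line)        # a sentence beginning starts a new line
    return "".join(sentence + "\n" for sentence in sentences)


def unfla(text):
    """Restore the regular format: one segment per line."""
    return text.replace("\f", "\n")
```

Because flattening only swaps one separator for another, applying @code{unfla} to the result of @code{fla} returns the original text.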
1899
1900
1901
1902
1903	@c ---------------------------------------------------------------------
1904	@c USAGE EXAMPLES
1905	@c ---------------------------------------------------------------------
1906
1907	@node Usage examples
1908	@chapter Usage examples
1909
1910	@subsubheading Simple pipelines
1911
1912	@enumerate
1913
1914	@item tokenization
1915	@example
1916	cat text | tok > output1
1917	@end example
1918	@item morphological annotation (1)
1919
1920	Simple dictionary-based lemmatization:
1921	@example
1922	cat text | tok | lem > output1
1923	@end example
1924	@item morphological annotation (2)
1925
1926	1) perform dictionary-based lemmatization@*
1927	2) guess descriptions for words which have no annotation
1928
1929	@example
1930	cat text | tok | lem | gue -S lem > output2
1931	@end example
1932
1933	@item morphological annotation (3)
1934
1935	1) perform dictionary-based lemmatization@*
1936	2) try to correct words with no annotation@*
1937	3) perform dictionary-based lemmatization of corrected words@*
1938	4) guess descriptions for words which still have no annotation
1939
1940	@example
1941	cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
1942	@end example
1943	@item spelling correction
1944
1945
1946
1947	@example
1948	cat text | tok | lem --only-fail | cor -1 > output3
1949	@end example
1950
1951	@item Expression extraction
1952
1953	Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
1954
1955	@example
1956	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
1957	@end example
1958
1959	@item A word in context
1960
1961	Extraction of text fragments containing a form of the lexeme 'rozmowa' in
1962	the context of the 5 preceding and 5 following corpus segments.
1963
1964	@example
1965	cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
1966	@end example
1967
1968	@item generation of concordance table (1)
1969
1970	@example
1971	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
1972	@end example
1973
1974	10"
1975
1976	@item generation of concordance table (2)
1977
1978	The same as above but much faster
1979
1980	@example
1981	cat text | tok | lem -1 | \
1982	grp -e 'cat(<V>) space lexeme(rozmowa)' | \
1983	ser -e 'cat(<V>) space lexeme(rozmowa)' | \
1984	con
1985	@end example
1986
1987	2"
1988
1989	@item generation of concordance table (3)
1990
1991	One usually searches the same corpus repeatedly. In
1992	that case it is advisable to first transform the corpus data into the format
1993	required by @command{grp}, and then to search the preprocessed data.
1994
1995	As @command{grp} (@command{grep}) processes data faster than it is
1996	read from the disk drive, the search time can be shortened further by
1997	using file compression. We suggest using @command{lzop}.
1998
1999	@item the fastest way to search a large corpus
2000
2001	step 1: preprocessing
2002
2003	@example
2004	cat corpus | tok | sen | lem -1 \
2005	| grp -a p | lzop -7 > corpus.grp.lzo
2006	@end example
2007
2008	step 2: search
2009
2010	@example
2011	lzop -cd corpus.grp.lzo | grp -a gP -e 'cat(<V>) space lexeme(rozmowa)' \
2012	  | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2013	@end example
2014
2015	@end enumerate
2016
2017	@subsubheading More complicated configurations
2018
2019
2020	@example
2021	mknod fifo1 p
2022	mknod fifo2 p
2023	mknod fifo3 p
2024	mknod fifo4 p
2025	mknod fifo5 p
2026
2027	tok | lem -p W -e fifo1 > fifo2 &
2028	cor -e fifo3 < fifo1 | lem > fifo4 &
2029	gue < fifo3 > fifo5 &
2030	sort -m fifo2 fifo4 fifo5
2031
2032	rm fifo?
2033	@end example
2034
2035
2036	@c ---------------------------------------------------------------------
2037	@c ---------------------------------------------------------------------
2038
2039	@c ---------------------------------------------------------------------
2040	@c PMDBF DICTIONARY
2041	@c ---------------------------------------------------------------------
2042
2043	@node PMDBF dictionary
2044	@chapter PMDBF dictionary
2045
2046	UTT components come with lexical data derived from the Polish
2047	Morphological Database (PMDB).
2048
2049	@menu
2050	* PMDBF files::
2051	* PMDBF tag structure::
2052	* PMDBF parts of speech::
2053	* PMDBF morphosyntactic attributes::
2054	@end menu
2055
2056	@node PMDBF files
2057	@section Files
2058
2059	@node PMDBF tag structure
2060	@section Tag structure
2061
2062	@example
2063	pos = [[:upper:]]+
2064	attr = [[:upper:]]+
2065	val = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
2066	descr = pos ( / ( attr val + ) + ) ?
2067	@end example
2068
2069
2070	@node PMDBF parts of speech
2071	@section Parts of speech
2072
2073	@multitable {ADJPRP} { adjectival-passive-participle }
2074	@item @code{N} @tab noun
2075	@item @code{NPRO} @tab nominal-pronoun
2076	@item @code{NV} @tab deverbal-noun
2077	@item @code{V} @tab verb
2078	@item @code{BYC} @tab by@'c (to be)
2079	@item @code{VNI} @tab non-inflected-verb
2080	@item @code{ADJ} @tab adjective
2081	@item @code{ADJPAP} @tab adjectival-passive-participle
2082	@item @code{ADJPRP} @tab adjectival-present-participle
2083	@item @code{ADJPP} @tab adjectival-past-participle
2084	@item @code{ADJPRO} @tab adjectival-pronoun
2085	@item @code{ADJNUM} @tab adjectival-numeral
2086	@item @code{ADV} @tab adverb
2087	@item @code{ADVANP} @tab adverbial-anterior-participle
2088	@item @code{ADVPRP} @tab adverbial-present-participle
2089	@item @code{ADVPRO} @tab adverbial-pronoun
2090	@item @code{ADVNUM} @tab adverbial-numeral
2091	@item @code{P} @tab preposition
2092	@item @code{PPRO} @tab prep-noun-pronoun
2093	@item @code{CONJ} @tab conjunction
2094	@item @code{EXCL} @tab exclamation
2095	@item @code{APP} @tab call
2096	@item @code{ONO} @tab onomatopoeia
2097	@item @code{PART} @tab particle
2098	@item @code{NUMCRD} @tab cardinal-numeral
2099	@item @code{NUMCOL} @tab collective-numeral
2100	@item @code{NUMPAR} @tab partitive-numeral
2101	@item @code{NUMORD} @tab ordinal-numeral
2102	@end multitable
2103
2104	@node PMDBF morphosyntactic attributes
2105	@section Morphosyntactic attributes
2106
2107	@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2108	@c @headitem Attr @tab Val @tab Description
2109	@item
2110	@code{A} @tab @tab Aspect
2111	@item
2112	@tab @code{p} @tab perfective,
2113	@item
2114	@tab @code{i} @tab imperfective.
2115	@item
2116	@item
2117	@code{V} @tab @tab Verb-Form
2118	@item
2119	@tab @code{b} @tab infinitive,
2120	@item
2121	@tab @code{p} @tab personal,
2122	@item
2123	@tab @code{i} @tab impersonal.
2124	@item
2125	@item
2126	@code{M} @tab @tab Mood
2127	@item
2128	@tab @code{d} @tab declarative,
2129	@item
2130	@tab @code{c} @tab conditional,
2131	@item
2132	@tab @code{i} @tab imperative.
2133	@item
2134	@item
2135	@code{T} @tab @tab Tense
2136	@item
2137	@tab @code{a} @tab past,
2138	@item
2139	@tab @code{r} @tab present,
2140	@item
2141	@tab @code{f} @tab future.
2142	@item
2143	@item
2144	@code{P} @tab @tab Person
2145	@item
2146	@tab @code{1} @tab 1,
2147	@item
2148	@tab @code{2} @tab 2,
2149	@item
2150	@tab @code{3} @tab 3.
2151	@item
2152	@item
2153	@code{D} @tab @tab Degree
2154	@item
2155	@tab @code{p} @tab positive,
2156	@item
2157	@tab @code{c} @tab comparative,
2158	@item
2159	@tab @code{s} @tab superlative.
2160	@item
2161	@item
2162	@code{N} @tab @tab Number
2163	@item
2164	@tab @code{s} @tab singular,
2165	@item
2166	@tab @code{p} @tab plural.
2167	@item
2168	@item
2169	@code{C} @tab @tab Case
2170	@item
2171	@tab @code{n} @tab nominative,
2172	@item
2173	@tab @code{g} @tab genitive,
2174	@item
2175	@tab @code{d} @tab dative,
2176	@item
2177	@tab @code{a} @tab accusative,
2178	@item
2179	@tab @code{i} @tab instrumental,
2180	@item
2181	@tab @code{l} @tab locative,
2182	@item
2183	@tab @code{v} @tab vocative.
2184	@item
2185	@item
2186	@code{G} @tab @tab Gender
2187	@item
2188	@tab @code{p} @tab masculine-personal,
2189	@item
2190	@tab @code{a} @tab masculine-animal,
2191	@item
2192	@tab @code{i} @tab masculine-inanimate,
2193	@item
2194	@tab @code{f} @tab feminine,
2195	@item
2196	@tab @code{n} @tab neuter.
2197	@end multitable
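
Note that a value letter is meaningful only together with its
attribute: @code{p} stands for plural under @code{N} but for
masculine-personal under @code{G}. A lookup therefore has to be keyed by
the attribute first. The Python sketch below is an illustration only,
not part of UTT; it copies just the Number, Case and Gender rows of the
table, and the names @code{ATTRIBUTES} and @code{describe} are
hypothetical.

```python
# Partial lookup tables copied from the attribute table above
# (only Number, Case and Gender; other attributes follow the same shape).
ATTRIBUTES = {
    "N": ("Number", {"s": "singular", "p": "plural"}),
    "C": ("Case", {
        "n": "nominative", "g": "genitive", "d": "dative",
        "a": "accusative", "i": "instrumental", "l": "locative",
        "v": "vocative",
    }),
    "G": ("Gender", {
        "p": "masculine-personal", "a": "masculine-animal",
        "i": "masculine-inanimate", "f": "feminine", "n": "neuter",
    }),
}


def describe(attr, val):
    """Return a readable 'Name=value' string, falling back to the raw codes."""
    name, values = ATTRIBUTES.get(attr, (attr, {}))
    return "%s=%s" % (name, values.get(val, val))
```

With these tables, @code{describe("N", "p")} gives @code{Number=plural}
while @code{describe("G", "p")} gives @code{Gender=masculine-personal}.
Codes missing from the partial table are returned unchanged.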
2198
2199
2200	@c ---------------------------------------------------------------------
2201	@c ---------------------------------------------------------------------
2202	@c
2203	@c @node Examples
2204	@c @chapter Examples
2205
2206	@c ----------------------------------------------------------------------
2207	@c ----------------------------------------------------------------------
2208
2209	@node GNU Free Documentation License
2210	@chapter GNU Free Documentation License
2211
2212	@c The GNU Free Documentation License.
2213	@center Version 1.2, November 2002
2214
2215	@c This file is intended to be included within another document,
2216	@c hence no sectioning command or @node.
2217
2218	@display
2219	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
2220	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
2221
2222	Everyone is permitted to copy and distribute verbatim copies
2223	of this license document, but changing it is not allowed.
2224	@end display
2225
2226	@enumerate 0
2227	@item
2228	PREAMBLE
2229
2230	The purpose of this License is to make a manual, textbook, or other
2231	functional and useful document @dfn{free} in the sense of freedom: to
2232	assure everyone the effective freedom to copy and redistribute it,
2233	with or without modifying it, either commercially or noncommercially.
2234	Secondarily, this License preserves for the author and publisher a way
2235	to get credit for their work, while not being considered responsible
2236	for modifications made by others.
2237
2238	This License is a kind of ``copyleft'', which means that derivative
2239	works of the document must themselves be free in the same sense. It
2240	complements the GNU General Public License, which is a copyleft
2241	license designed for free software.
2242
2243	We have designed this License in order to use it for manuals for free
2244	software, because free software needs free documentation: a free
2245	program should come with manuals providing the same freedoms that the
2246	software does. But this License is not limited to software manuals;
2247	it can be used for any textual work, regardless of subject matter or
2248	whether it is published as a printed book. We recommend this License
2249	principally for works whose purpose is instruction or reference.
2250
2251	@item
2252	APPLICABILITY AND DEFINITIONS
2253
2254	This License applies to any manual or other work, in any medium, that
2255	contains a notice placed by the copyright holder saying it can be
2256	distributed under the terms of this License. Such a notice grants a
2257	world-wide, royalty-free license, unlimited in duration, to use that
2258	work under the conditions stated herein. The ``Document'', below,
2259	refers to any such manual or work. Any member of the public is a
2260	licensee, and is addressed as ``you''. You accept the license if you
2261	copy, modify or distribute the work in a way requiring permission
2262	under copyright law.
2263
2264	A ``Modified Version'' of the Document means any work containing the
2265	Document or a portion of it, either copied verbatim, or with
2266	modifications and/or translated into another language.
2267
2268	A ``Secondary Section'' is a named appendix or a front-matter section
2269	of the Document that deals exclusively with the relationship of the
2270	publishers or authors of the Document to the Document's overall
2271	subject (or to related matters) and contains nothing that could fall
2272	directly within that overall subject. (Thus, if the Document is in
2273	part a textbook of mathematics, a Secondary Section may not explain
2274	any mathematics.) The relationship could be a matter of historical
2275	connection with the subject or with related matters, or of legal,
2276	commercial, philosophical, ethical or political position regarding
2277	them.
2278
2279	The ``Invariant Sections'' are certain Secondary Sections whose titles
2280	are designated, as being those of Invariant Sections, in the notice
2281	that says that the Document is released under this License. If a
2282	section does not fit the above definition of Secondary then it is not
2283	allowed to be designated as Invariant. The Document may contain zero
2284	Invariant Sections. If the Document does not identify any Invariant
2285	Sections then there are none.
2286
2287	The ``Cover Texts'' are certain short passages of text that are listed,
2288	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
2289	the Document is released under this License. A Front-Cover Text may
2290	be at most 5 words, and a Back-Cover Text may be at most 25 words.
2291
2292	A ``Transparent'' copy of the Document means a machine-readable copy,
2293	represented in a format whose specification is available to the
2294	general public, that is suitable for revising the document
2295	straightforwardly with generic text editors or (for images composed of
2296	pixels) generic paint programs or (for drawings) some widely available
2297	drawing editor, and that is suitable for input to text formatters or
2298	for automatic translation to a variety of formats suitable for input
2299	to text formatters. A copy made in an otherwise Transparent file
2300	format whose markup, or absence of markup, has been arranged to thwart
2301	or discourage subsequent modification by readers is not Transparent.
2302	An image format is not Transparent if used for any substantial amount
2303	of text. A copy that is not ``Transparent'' is called ``Opaque''.
2304
2305	Examples of suitable formats for Transparent copies include plain
2306	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
2307	format, @acronym{SGML} or @acronym{XML} using a publicly available
2308	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
2309	PostScript or @acronym{PDF} designed for human modification. Examples
2310	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
2311	@acronym{JPG}. Opaque formats include proprietary formats that can be
2312	read and edited only by proprietary word processors, @acronym{SGML} or
2313	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
2314	not generally available, and the machine-generated @acronym{HTML},
2315	PostScript or @acronym{PDF} produced by some word processors for
2316	output purposes only.
2317
2318	The ``Title Page'' means, for a printed book, the title page itself,
2319	plus such following pages as are needed to hold, legibly, the material
2320	this License requires to appear in the title page. For works in
2321	formats which do not have any title page as such, ``Title Page'' means
2322	the text near the most prominent appearance of the work's title,
2323	preceding the beginning of the body of the text.
2324
2325	A section ``Entitled XYZ'' means a named subunit of the Document whose
2326	title either is precisely XYZ or contains XYZ in parentheses following
2327	text that translates XYZ in another language. (Here XYZ stands for a
2328	specific section name mentioned below, such as ``Acknowledgements'',
2329	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
2330	of such a section when you modify the Document means that it remains a
2331	section ``Entitled XYZ'' according to this definition.
2332
2333	The Document may include Warranty Disclaimers next to the notice which
2334	states that this License applies to the Document. These Warranty
2335	Disclaimers are considered to be included by reference in this
2336	License, but only as regards disclaiming warranties: any other
2337	implication that these Warranty Disclaimers may have is void and has
2338	no effect on the meaning of this License.
2339
2340	@item
2341	VERBATIM COPYING
2342
2343	You may copy and distribute the Document in any medium, either
2344	commercially or noncommercially, provided that this License, the
2345	copyright notices, and the license notice saying this License applies
2346	to the Document are reproduced in all copies, and that you add no other
2347	conditions whatsoever to those of this License. You may not use
2348	technical measures to obstruct or control the reading or further
2349	copying of the copies you make or distribute. However, you may accept
2350	compensation in exchange for copies. If you distribute a large enough
2351	number of copies you must also follow the conditions in section 3.
2352
2353	You may also lend copies, under the same conditions stated above, and
2354	you may publicly display copies.
2355
2356	@item
2357	COPYING IN QUANTITY
2358
2359	If you publish printed copies (or copies in media that commonly have
2360	printed covers) of the Document, numbering more than 100, and the
2361	Document's license notice requires Cover Texts, you must enclose the
2362	copies in covers that carry, clearly and legibly, all these Cover
2363	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
2364	the back cover. Both covers must also clearly and legibly identify
2365	you as the publisher of these copies. The front cover must present
2366	the full title with all words of the title equally prominent and
2367	visible. You may add other material on the covers in addition.
2368	Copying with changes limited to the covers, as long as they preserve
2369	the title of the Document and satisfy these conditions, can be treated
2370	as verbatim copying in other respects.
2371
2372	If the required texts for either cover are too voluminous to fit
2373	legibly, you should put the first ones listed (as many as fit
2374	reasonably) on the actual cover, and continue the rest onto adjacent
2375	pages.
2376
2377	If you publish or distribute Opaque copies of the Document numbering
2378	more than 100, you must either include a machine-readable Transparent
2379	copy along with each Opaque copy, or state in or with each Opaque copy
2380	a computer-network location from which the general network-using
2381	public has access to download using public-standard network protocols
2382	a complete Transparent copy of the Document, free of added material.
2383	If you use the latter option, you must take reasonably prudent steps,
2384	when you begin distribution of Opaque copies in quantity, to ensure
2385	that this Transparent copy will remain thus accessible at the stated
2386	location until at least one year after the last time you distribute an
2387	Opaque copy (directly or through your agents or retailers) of that
2388	edition to the public.
2389
2390	It is requested, but not required, that you contact the authors of the
2391	Document well before redistributing any large number of copies, to give
2392	them a chance to provide you with an updated version of the Document.
2393
2394	@item
2395	MODIFICATIONS
2396
2397	You may copy and distribute a Modified Version of the Document under
2398	the conditions of sections 2 and 3 above, provided that you release
2399	the Modified Version under precisely this License, with the Modified
2400	Version filling the role of the Document, thus licensing distribution
2401	and modification of the Modified Version to whoever possesses a copy
2402	of it. In addition, you must do these things in the Modified Version:
2403
2404	@enumerate A
2405	@item
2406	Use in the Title Page (and on the covers, if any) a title distinct
2407	from that of the Document, and from those of previous versions
2408	(which should, if there were any, be listed in the History section
2409	of the Document). You may use the same title as a previous version
2410	if the original publisher of that version gives permission.
2411
2412	@item
2413	List on the Title Page, as authors, one or more persons or entities
2414	responsible for authorship of the modifications in the Modified
2415	Version, together with at least five of the principal authors of the
2416	Document (all of its principal authors, if it has fewer than five),
2417	unless they release you from this requirement.
2418
2419	@item
2420	State on the Title page the name of the publisher of the
2421	Modified Version, as the publisher.
2422
2423	@item
2424	Preserve all the copyright notices of the Document.
2425
2426	@item
2427	Add an appropriate copyright notice for your modifications
2428	adjacent to the other copyright notices.
2429
2430	@item
2431	Include, immediately after the copyright notices, a license notice
2432	giving the public permission to use the Modified Version under the
2433	terms of this License, in the form shown in the Addendum below.
2434
2435	@item
2436	Preserve in that license notice the full lists of Invariant Sections
2437	and required Cover Texts given in the Document's license notice.
2438
2439	@item
2440	Include an unaltered copy of this License.
2441
2442	@item
2443	Preserve the section Entitled ``History'', Preserve its Title, and add
2444	to it an item stating at least the title, year, new authors, and
2445	publisher of the Modified Version as given on the Title Page. If
2446	there is no section Entitled ``History'' in the Document, create one
2447	stating the title, year, authors, and publisher of the Document as
2448	given on its Title Page, then add an item describing the Modified
2449	Version as stated in the previous sentence.
2450
2451	@item
2452	Preserve the network location, if any, given in the Document for
2453	public access to a Transparent copy of the Document, and likewise
2454	the network locations given in the Document for previous versions
2455	it was based on. These may be placed in the ``History'' section.
2456	You may omit a network location for a work that was published at
2457	least four years before the Document itself, or if the original
2458	publisher of the version it refers to gives permission.
2459
2460	@item
2461	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
2462	the Title of the section, and preserve in the section all the
2463	substance and tone of each of the contributor acknowledgements and/or
2464	dedications given therein.
2465
2466	@item
2467	Preserve all the Invariant Sections of the Document,
2468	unaltered in their text and in their titles. Section numbers
2469	or the equivalent are not considered part of the section titles.
2470
2471	@item
2472	Delete any section Entitled ``Endorsements''. Such a section
2473	may not be included in the Modified Version.
2474
2475	@item
2476	Do not retitle any existing section to be Entitled ``Endorsements'' or
2477	to conflict in title with any Invariant Section.
2478
2479	@item
2480	Preserve any Warranty Disclaimers.
2481	@end enumerate
2482
2483	If the Modified Version includes new front-matter sections or
2484	appendices that qualify as Secondary Sections and contain no material
2485	copied from the Document, you may at your option designate some or all
2486	of these sections as invariant. To do this, add their titles to the
2487	list of Invariant Sections in the Modified Version's license notice.
2488	These titles must be distinct from any other section titles.
2489
2490	You may add a section Entitled ``Endorsements'', provided it contains
2491	nothing but endorsements of your Modified Version by various
2492	parties---for example, statements of peer review or that the text has
2493	been approved by an organization as the authoritative definition of a
2494	standard.
2495
2496	You may add a passage of up to five words as a Front-Cover Text, and a
2497	passage of up to 25 words as a Back-Cover Text, to the end of the list
2498	of Cover Texts in the Modified Version. Only one passage of
2499	Front-Cover Text and one of Back-Cover Text may be added by (or
2500	through arrangements made by) any one entity. If the Document already
2501	includes a cover text for the same cover, previously added by you or
2502	by arrangement made by the same entity you are acting on behalf of,
2503	you may not add another; but you may replace the old one, on explicit
2504	permission from the previous publisher that added the old one.
2505
2506	The author(s) and publisher(s) of the Document do not by this License
2507	give permission to use their names for publicity for or to assert or
2508	imply endorsement of any Modified Version.
2509
2510	@item
2511	COMBINING DOCUMENTS
2512
2513	You may combine the Document with other documents released under this
2514	License, under the terms defined in section 4 above for modified
2515	versions, provided that you include in the combination all of the
2516	Invariant Sections of all of the original documents, unmodified, and
2517	list them all as Invariant Sections of your combined work in its
2518	license notice, and that you preserve all their Warranty Disclaimers.
2519
2520	The combined work need only contain one copy of this License, and
2521	multiple identical Invariant Sections may be replaced with a single
2522	copy. If there are multiple Invariant Sections with the same name but
2523	different contents, make the title of each such section unique by
2524	adding at the end of it, in parentheses, the name of the original
2525	author or publisher of that section if known, or else a unique number.
2526	Make the same adjustment to the section titles in the list of
2527	Invariant Sections in the license notice of the combined work.
2528
2529	In the combination, you must combine any sections Entitled ``History''
2530	in the various original documents, forming one section Entitled
2531	``History''; likewise combine any sections Entitled ``Acknowledgements'',
2532	and any sections Entitled ``Dedications''. You must delete all
2533	sections Entitled ``Endorsements.''
2534
2535	@item
2536	COLLECTIONS OF DOCUMENTS
2537
2538	You may make a collection consisting of the Document and other documents
2539	released under this License, and replace the individual copies of this
2540	License in the various documents with a single copy that is included in
2541	the collection, provided that you follow the rules of this License for
2542	verbatim copying of each of the documents in all other respects.
2543
2544	You may extract a single document from such a collection, and distribute
2545	it individually under this License, provided you insert a copy of this
2546	License into the extracted document, and follow this License in all
2547	other respects regarding verbatim copying of that document.
2548
2549	@item
2550	AGGREGATION WITH INDEPENDENT WORKS
2551
2552	A compilation of the Document or its derivatives with other separate
2553	and independent documents or works, in or on a volume of a storage or
2554	distribution medium, is called an ``aggregate'' if the copyright
2555	resulting from the compilation is not used to limit the legal rights
2556	of the compilation's users beyond what the individual works permit.
2557	When the Document is included in an aggregate, this License does not
2558	apply to the other works in the aggregate which are not themselves
2559	derivative works of the Document.
2560
2561	If the Cover Text requirement of section 3 is applicable to these
2562	copies of the Document, then if the Document is less than one half of
2563	the entire aggregate, the Document's Cover Texts may be placed on
2564	covers that bracket the Document within the aggregate, or the
2565	electronic equivalent of covers if the Document is in electronic form.
2566	Otherwise they must appear on printed covers that bracket the whole
2567	aggregate.
2568
2569	@item
2570	TRANSLATION
2571
2572	Translation is considered a kind of modification, so you may
2573	distribute translations of the Document under the terms of section 4.
2574	Replacing Invariant Sections with translations requires special
2575	permission from their copyright holders, but you may include
2576	translations of some or all Invariant Sections in addition to the
2577	original versions of these Invariant Sections. You may include a
2578	translation of this License, and all the license notices in the
2579	Document, and any Warranty Disclaimers, provided that you also include
2580	the original English version of this License and the original versions
2581	of those notices and disclaimers. In case of a disagreement between
2582	the translation and the original version of this License or a notice
2583	or disclaimer, the original version will prevail.
2584
2585	If a section in the Document is Entitled ``Acknowledgements'',
2586	``Dedications'', or ``History'', the requirement (section 4) to Preserve
2587	its Title (section 1) will typically require changing the actual
2588	title.
2589
2590	@item
2591	TERMINATION
2592
2593	You may not copy, modify, sublicense, or distribute the Document except
2594	as expressly provided for under this License. Any other attempt to
2595	copy, modify, sublicense or distribute the Document is void, and will
2596	automatically terminate your rights under this License. However,
2597	parties who have received copies, or rights, from you under this
2598	License will not have their licenses terminated so long as such
2599	parties remain in full compliance.
2600
2601	@item
2602	FUTURE REVISIONS OF THIS LICENSE
2603
2604	The Free Software Foundation may publish new, revised versions
2605	of the GNU Free Documentation License from time to time. Such new
2606	versions will be similar in spirit to the present version, but may
2607	differ in detail to address new problems or concerns. See
2608	@uref{http://www.gnu.org/copyleft/}.
2609
2610	Each version of the License is given a distinguishing version number.
2611	If the Document specifies that a particular numbered version of this
2612	License ``or any later version'' applies to it, you have the option of
2613	following the terms and conditions either of that specified version or
2614	of any later version that has been published (not as a draft) by the
2615	Free Software Foundation. If the Document does not specify a version
2616	number of this License, you may choose any version ever published (not
2617	as a draft) by the Free Software Foundation.
2618	@end enumerate
2619
2620	@page
2621	@heading ADDENDUM: How to use this License for your documents
2622
2623	To use this License in a document you have written, include a copy of
2624	the License in the document and put the following copyright and
2625	license notices just after the title page:
2626
2627	@smallexample
2628	@group
2629	Copyright (C) @var{year} @var{your name}.
2630	Permission is granted to copy, distribute and/or modify this document
2631	under the terms of the GNU Free Documentation License, Version 1.2
2632	or any later version published by the Free Software Foundation;
2633	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
2634	Texts. A copy of the license is included in the section entitled ``GNU
2635	Free Documentation License''.
2636	@end group
2637	@end smallexample
2638
2639	If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
2640	replace the ``with@dots{}Texts.'' line with this:
2641
2642	@smallexample
2643	@group
2644	with the Invariant Sections being @var{list their titles}, with
2645	the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
2646	being @var{list}.
2647	@end group
2648	@end smallexample
2649
2650	If you have Invariant Sections without Cover Texts, or some other
2651	combination of the three, merge those two alternatives to suit the
2652	situation.
2653
2654	If your document contains nontrivial examples of program code, we
2655	recommend releasing these examples in parallel under your choice of
2656	free software license, such as the GNU General Public License,
2657	to permit their use in free software.
2658
2659	@c Local Variables:
2660	@c ispell-local-pdict: "ispell-dict"
2661	@c End:
2662
2663
2664	@c ---------------------------------------------------------------------
2665	@c ---------------------------------------------------------------------
2666
2667	@node Reporting bugs
2668	@chapter Reporting bugs
2669
2670	Report bugs to <obrebski@@amu.edu.pl>.
2671
2672	@c ---------------------------------------------------------------------
2673	@c ---------------------------------------------------------------------
2674
2675	@c @node Copyright
2676	@c @chapter Copyright
2677	@c
2678	@c Copyright 2004 by Tomasz Obrebski
2679	@c This software is free for research and educational use.
2680
2681	@c ---------------------------------------------------------------------
2682	@c ---------------------------------------------------------------------
2683
2684	@node Author
2685	@chapter Author
2686
2687
2688	@bye
