utt.texinfo @ a5fdde9

Last change on this file since a5fdde9 was 246900a, checked in by pawelk <pawelk@…>, 18 years ago

I went through the programs' source code with regard to their use of configuration files.

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@10 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

\input texinfo   @c -*-texinfo-*-
@documentencoding ISO-8859-2
@c @documentlanguage pl

@c %**start of header
@setfilename utt.info
@settitle UAM Text Tools v0.90
@c %**end of header

@copying
This manual is for UAM Text Tools (version 0.90, November 2007).

Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU Free Documentation License''.

@c @quotation
@c Permission is granted to ...
@c No permission is granted until the document is completed.
@c @end quotation
@end copying


@titlepage
@title UAM Text Tools 0.90 - User Manual
@subtitle edition 0.01, @today{}
@subtitle status: prescript
@author by Justyna Walkowska, Tomasz Obr@ogonek{e}bski and Micha@l{} Stolarski
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@c @paragraphindent none

@iftex
@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
@end iftex

@c @headings off
@c @everyheading LEM(1) @| @| LEM(1)
@everyfooting @today{} @c @| @thispage @|

@ifnottex

@node Top
@top UTT - UAM Text Tools

@insertcopying

@menu
* General information::
* UTT file format::
* Configuration files::
* UTT components::
* Auxiliary tools::
* Usage examples::
* PMDBF dictionary::
@c * Examples::
@c * Copyright::
* GNU Free Documentation License::
* Reporting bugs::
* Author::
@end menu
@end ifnottex


@c ----------------------------------------------------------------------

@node General information
@chapter General information

UAM Text Tools (UTT) is a package of language processing tools
developed at Adam Mickiewicz University. Its functionality includes:

@itemize @bullet

@item
tokenization
@item
dictionary-based morphological analysis
@item
heuristic morphological analysis of unknown words
@item
spelling correction
@item
pattern search
@item
sentence splitting
@item
generation of concordance tables
@end itemize

The toolkit is intended for processing raw (unannotated),
unrestricted text for any purpose.

The system is organized as a collection of command-line programs, each
performing one operation, e.g. tokenization, lemmatization or spelling
correction. The components are independent of one another; the
unifying element is the common I/O file format.

The components may be combined in various ways to provide various text
processing services. New components supplied by the user may also be
easily incorporated into the system, provided that they respect the I/O
file format conventions.

UTT component programs do not depend on any specific tagset or
morphological description format.

UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License, which prohibits commercial use.


List of contributors:

@itemize
@item Pawel Konieczka
@item Tomasz Obrebski
@item Michal Stolarski
@item Marcin Walas
@item Justyna Walkowska
@end itemize

@c ----------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node UTT file format
@chapter UTT file format

A UTT file contains annotation of a text. It consists of a sequence of
segments. Each segment explicitly refers to a continuous piece of the
text and provides some information on it.

@section Segment format

A segment occupies one line of a UTT file and consists of
space-separated fields:


@quotation
@sp 1
[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
@sp 1
@end quotation

@table @var

@item @var{start}
Non-negative integer value indicating the position in the source text where the
segment starts.

@item @var{length}
Non-negative integer value indicating the length of the segment.

@item @var{type}
A sequence of non-blank characters without digits (digits could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
@var{type} reflects the main classification of segments
into words, numbers, punctuation marks and meta-text markers.
@xref{tok output,,tok output}, for a description of automatically recognized type markers.

@item @var{form}
This field contains the textual form of the segment or the special
symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).

The characters or character sequences that have special meaning in the
@var{form} field are enumerated below.

Characters with special meaning:

@itemize
@item @code{_} - space character
@item @code{*} - undefined contents
@end itemize

Escape sequences:

@itemize
@item @code{\n} - new line
@item @code{\t} - tabulation
@item @code{\r} - carriage return

@item @code{\_} - the @code{_} character
@item @code{\*} - the @code{*} character
@item @code{\\} - the @code{\} character

@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
@end itemize

@item @var{annotation1}
@itemx @var{annotation2}
@itemx ...
Annotation fields have the following format:

@var{longname} @code{:} @var{value}

or

@var{shortname} @var{value}

where @var{longname} is a string of alphanumeric characters
(@code{isalnum()} test), @var{shortname} is a single non-alphanumeric character
(@code{ispunct()} test), and @var{value} is an arbitrary string of non-blank characters.

@end table


Only two fields are mandatory: @var{type} and @var{form}. All other fields
may be absent. When only one number precedes the
@var{type} field, it is interpreted as the @var{start} position.

If the @var{length} field is omitted, the length of the segment is the
length of the @var{form} field, except when the value of the
@var{form} field is @code{*} -- in this case, the length is assumed to
be 0.

If the @var{start} field is also absent, the segment is assumed to directly
follow the preceding one.
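
The rules for omitted fields can be illustrated with a short Python sketch. It is not part of UTT. The names @code{Segment}, @code{decode_form}, @code{parse_segment} and @code{parse_file} are hypothetical. The sketch assumes that the @var{type} field never consists of digits. It also assumes that an implicit length counts decoded characters, so @code{_} and @code{\n} each count as one character.

@example
from collections import namedtuple

# Illustrative sketch, not UTT code; all names here are hypothetical.
Segment = namedtuple("Segment", "start length type form annotations")

# \n, \t and \r stand for control characters; any other escaped
# character (\_, \*, \\) stands for itself.
CONTROL = dict(n="\n", t="\t", r="\r")


def decode_form(form):
    """Return the text that a form field stands for (_ is a space)."""
    out, i = [], 0
    while i < len(form):
        c = form[i]
        if c == "\\" and i + 1 < len(form):
            out.append(CONTROL.get(form[i + 1], form[i + 1]))
            i += 2
        else:
            out.append(" " if c == "_" else c)
            i += 1
    return "".join(out)


def parse_segment(line, prev_end=0):
    """Parse one segment line; prev_end is where the preceding segment ended."""
    fields = line.split()
    start = length = None
    if fields[0].isdigit():              # one leading number: the start
        start = int(fields.pop(0))
        if fields[0].isdigit():          # a second number: the length
            length = int(fields.pop(0))
    seg_type, form, annotations = fields[0], fields[1], fields[2:]
    if start is None:                    # no start: follow the preceding segment
        start = prev_end
    if length is None:                   # no length: derive it from the form
        length = 0 if form == "*" else len(decode_form(form))
    return Segment(start, length, seg_type, form, annotations)


def parse_file(lines):
    """Parse segment lines in order, filling in omitted start positions."""
    segments, prev_end = [], 0
    for line in lines:
        if line.strip():
            seg = parse_segment(line, prev_end)
            segments.append(seg)
            prev_end = seg.start + seg.length
    return segments
@end example

Applied to the abbreviated examples below, @code{parse_file} recovers the start positions given in the fully specified example.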

@c Conventions:

@c Annotation fields with predefined meaning:

@c @itemize
@c @item @code{!} - UTT components are allowed to modify the contents of
@c the @var{form} field (e.g. spelling correction does this). If this happens the
@c original form of the segment has to be placed in the @code{!}-field.
@c @item @code{@@} - morphological description
@c @item @code{=} - node identifier assignment (used in graph encoding)
@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
@c @end itemize

Segments of length 0 may be used to mark file positions with some
information. See, e.g., the BOS and EOS (beginning/end of sentence) markers
in the example below.

Example:

Text: @samp{Piszemy dobre progrumy. Warszawiacy też.}

@example
0000 00 BOS *
0000 07 W Piszemy lem:pisać,V
0007 01 S _
0008 05 W dobre lem:dobry,ADJ
0013 01 S _
0014 08 W progrumy cor:programy lem:program,N
0022 01 P .
0023 00 EOS *
0023 01 S _
0024 00 BOS *
0024 11 W Warszawiacy lem:Warszawiak,N
0035 01 S _
0036 03 W też
0039 01 P .
0040 00 EOS *
@end example

The @var{length} field may be omitted:

@example
0000 BOS *
0000 W Piszemy lem:pisać,V
0007 S _
0008 W dobre lem:dobry,ADJ
0013 S _
0014 W progrumy cor:programy lem:program,N
0022 P .
0023 EOS *
@end example

Position information may be provided only for some types of segments:

@example
0000 BOS *
W Piszemy lem:pisać,V
S _
W dobre lem:dobry,ADJ
S _
W progrumy cor:programy lem:program,N
P .
EOS *
S _
0024 BOS *
W Warszawiacy lem:Warszawiak,N
S _
W też
P .
EOS *
@end example

Position/length information may be provided only when necessary:

@example
0000 04 N *
0000 N 12
P .
N 5
S _
W km
@end example

@section UTT File

A UTT file consists of a sequence of segments. The same text position
may be covered by multiple segments. Consequently, ambiguous text
segmentation and ambiguous annotation may be represented.

There are two structural requirements a valid UTT-formatted file
has to meet:

@itemize @bullet

@item
segments have to be sorted with respect to the @var{start} field,

@item
for each
segment ending at position @var{n}, either there must be a segment starting at
position @var{n+1}, or position @var{n+1} must not be covered by any segment; similarly,
for each segment starting at position @var{n}, either there must be a segment
ending at position @var{n-1}, or position @var{n-1} must not be covered
by any segment.

@end itemize

A valid annotation for the text fragment
@example
12.5 km
@end example

may be

@example
0000 02 N 12
0000 04 N 12.5
0002 01 P .
0003 01 N 5
0004 01 S _
0005 02 W km
@end example

but not

@example
0000 02 N 12
0000 04 N 12.5
0004 01 S _
0005 02 W km
@end example

because in the latter example the first segment (starting at position 0000, 2 characters long) ends at position @var{n}=0001; position @var{n+1}=0002 is covered by the second segment, yet no segment starts at position 0002.
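
The two requirements can be checked mechanically. The Python sketch below is not a UTT program, and the function name @code{is_valid_utt} is hypothetical. It takes @code{(start, length)} pairs in file order. It assumes that a segment covers positions @var{start} through @var{start}+@var{length}-1, and it ignores 0-length markers, which cover no position.

@example
def is_valid_utt(segments):
    """Check both structural requirements (hypothetical helper, not UTT code).

    segments: (start, length) pairs in file order.
    """
    starts = [start for start, _ in segments]
    if starts != sorted(starts):            # requirement 1: sorted by start
        return False
    # Covered intervals (first, last); 0-length markers cover nothing.
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    firsts = set(a for a, _ in spans)
    lasts = set(b for _, b in spans)

    def covered(pos):
        return any(a <= pos <= b for a, b in spans)

    for a, b in spans:                      # requirement 2
        if b + 1 not in firsts and covered(b + 1):
            return False
        if a - 1 not in lasts and covered(a - 1):
            return False
    return True
@end example

With the two annotations of @samp{12.5 km} above, it accepts the first and rejects the second.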

@section Character encoding

The UTT component programs accept only 1-byte character encodings, such
as ISO, ANSI or DOS code pages. UTF-8 will probably work as well, but it has not been tested yet.


@c @section Formats

@c @unnumberedsubsubsec Basic format

@c While processing large amounts of the overhead related with explicit
@c ... of the start position and segment length becomes ... . Therefore,
@c for efficiency reasons certain shortcuts are possible:

@c @unnumberedsubsubsec Relative start position

@c Start position may be given as relative distance from the last
@c absolute position.

@c @unnumberedsubsubsec Absent length

@c Segment length may be omitted. Normally it can be restored by counting
@c the length of the @emph{form field}. For segments with the special value
@c @code{*} in the @emph{form field} length 0 is assumed.

@c @unnumberedsubsubsec Absent length and start position

@c Both start position and segment length may be omitted. In this format
@c each segment is assumed to follow the previous one. This format is,
@c therefore, suitable only for unambiguously tagged text
@c (0-length markers can be still used.)


@c @table @code
@c @item AL
@c @code{1234 03 W kot}
@c @item RL
@c @code{+56 03 W kot}
@c @item A
@c @code{1234 W kot}
@c @item R
@c @code{+56 W kot}
@c @item 0
@c @code{W kot}
@c @end table


@c [HOW TO GET POLISH FONTS IN DVI???]

@macro parhelp
@item @b{@minus{}@minus{}help}, @b{@minus{}h}
Print help.
@end macro


@macro parversion
@item @b{@minus{}@minus{}version}, @b{@minus{}V}
Print version information.
@end macro

@macro parinteractive
@item @b{@minus{}@minus{}interactive, @minus{}i}
This option toggles interactive mode, which is by default off. In the
interactive mode the program does not buffer the output.
@end macro


@c @macro parfile
@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
@c Input file name.
@c If this option is absent or equal to '@minus{}', the program
@c reads from the standard input.
@c @end macro


@c @macro paroutput
@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
@c Regular output file name. To regular output the program sends segments
@c which it successfully processed and copies those which were not
@c subject to processing. If this option is absent or equal to
@c '@minus{}', standard output is used.
@c @end macro

@c @macro parfail
@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
@c Fail output file name. To fail output the program copies the segments
@c it failed to process. If this option is absent or equal to
@c '@minus{}', standard output is used.
@c @end macro


@c @macro parcopy
@c @item @b{@minus{}@minus{}copy, @minus{}c}
@c Copy successfully processed segments to regular output also in their
@c original input form.
@c @end macro


@macro parinputfield
@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
The field containing the input to the program. The default is the
@var{form} field. The fields @var{start}, @var{length}, @var{type},
and @var{form} are referred to as @code{1}, @code{2}, @code{3},
@code{4}, respectively.
@end macro


@macro paroutputfield
@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
The name of the field added by the program. The default is the name of the program.
@end macro


@macro pardictionary
@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
Dictionary file name.
@end macro


@macro parprocess
@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
Process segments with the specified value in the @var{type} field.
Multiple occurrences of this option are allowed and are interpreted as
disjunction. If this option is absent, all segments are processed.
@end macro


@macro parselect
@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
Select for processing only segments in which the field named
@var{fieldname} is present. Multiple occurrences of this option are
allowed and are interpreted as conjunction of conditions. If this
option is absent, all segments are processed.
@end macro


@macro parunselect
@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
Select for processing only segments in which the field @var{fieldname}
is absent. Multiple occurrences of this option are allowed and are
interpreted as conjunction of conditions. If this option is absent,
all segments are processed.
@end macro


@macro paroneline
@item @b{@minus{}@minus{}one-line}
This option makes the program print ambiguous annotation in one output
line by generating multiple annotation fields. By default, when
ambiguous annotation is produced for a segment, the segment is
duplicated and each of the annotations is added to a separate copy of
the segment.
@end macro


@macro paronefield
@item @b{@minus{}@minus{}one-field, @minus{}1}
This option makes the program print ambiguous annotation in one
annotation field. By default, when ambiguous annotation is produced
for a segment, the segment is duplicated and each of the
annotations is added to a separate copy of the segment.

This option is useful when working with @command{kot} or @command{con}.
@end macro


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c @node Common command line options
@c @chapter Common command line options

@c @table @code

@c @parhelp

@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
@c Print help.

@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
@c Print version information.

@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
@c Input file name.
@c If this option is absent or equal to '@minus{}', the program
@c reads from the standard input.

@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
@c Regular output file name. To regular output the program sends segments
@c which it successfully processed and copies those which were not
@c subject to processing. If this option is absent or equal to
@c '@minus{}', standard output is used.

@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
@c Fail output file name. To fail output the program copies the segments
@c it failed to process. If this option is absent or equal to
@c '@minus{}', standard output is used.

@c @item @b{@minus{}@minus{}only-fail}
@c Discard segments which would normally be sent to regular
@c output. Print only segments the program failed to process.

@c @item @b{@minus{}@minus{}no-fail}
@c Discard segments the program failed to process.
@c (This and the previous option are functionally equivalent to,
@c respectively, @option{-o /dev/null} and @option{-e /dev/null}, but
@c make the programs run faster.)

@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
@c The field containing the input to the program. The default is usually
@c the @var{form} field (unless otherwise stated in the program
@c description). The fields @var{position}, @var{length}, @var{tag}, and
@c @var{form} are referred to as @code{1}, @code{2}, @code{3}, @code{4},
@c respectively.

@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
@c The name of the field added by the program. The default is the name of
@c the program.

@c @c @item @b{@minus{}@minus{}copy, @minus{}c}
@c @c Copy processed segments to regular output.

@c @item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
@c Dictionary file name.
@c (This option is used by programs which use dictionary data.)

@c @item @b{@minus{}@minus{}process=@var{tag}, @minus{}p @var{tag}}
@c Process segments with the specified value in the @var{tag} field.
@c Multiple occurrences of this option are allowed and are interpreted as
@c disjunction. If this option is absent, all segments are processed.

@c @item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
@c Select for processing only segments in which the field named
@c @var{fieldname} is present. Multiple occurrences of this option are
@c allowed and are interpreted as conjunction of conditions. If this
@c option is absent, all segments are processed.

@c @item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
@c Select for processing only segments in which the field @var{fieldname}
@c is absent. Multiple occurrences of this option are allowed and are
@c interpreted as conjunction of conditions. If this option is absent,
@c all segments are processed.

@c @item @b{@minus{}@minus{}interactive @minus{}i}
@c This option toggles interactive mode, which is by default off. In the
@c interactive mode the program does not buffer the output.

@c @item @b{@minus{}@minus{}config=@var{filename}}
@c Read configuration from file @file{@var{filename}}.

@c @item @b{@minus{}@minus{}one @minus{}1}
@c This option makes the program print ambiguous annotation in one output
@c segment. By default when
@c ambiguous new annotation is being produced for a segment, the segment
@c is duplicated and each of the annotations is added to a separate copy
@c of the segment.

@c @end table

@c ---------------------------------------------------------------------
@c CONFIGURATION FILES
@c ---------------------------------------------------------------------

@node Configuration files
@chapter Configuration files

Values for all command line options accepted by a component
may be set in configuration files. The default locations of the
configuration files for a component named @command{@var{program}} are

@example
@file{/usr/local/etc/utt/@var{program}.conf}
@end example

for the system-wide configuration file and

@example
@file{~/.utt/@var{program}.conf}
@end example

for the user configuration file.

@c The configuration file to load may be also specified with the
@c @option{--config} option. Configuration file need not be provided.

For each option, the value is set according to the following priority, from highest to lowest:

@itemize
@item command line
@c @item configuration file indicated with @option{--config} option
@item user configuration file (or configuration file indicated with the @option{--config} option)
@item system-wide configuration file
@end itemize

Parameter values are specified in the following format:

@var{parametername}=@var{value}

where @var{parametername} is the short or long name of an option accepted by
the program, or

@var{parametername}

if the option does not take an argument.

Comments may be introduced into configuration files with the @code{#} sign.

If a program accepts multiple occurrences of an option (e.g. the @option{--select} option of @command{lem}), you can specify each of them on a separate line of the program's configuration file.
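
The priority order and the line format described above can be sketched in Python. This is an illustration, not the UTT implementation, and @code{parse_conf} and @code{effective_options} are hypothetical names. The sketch makes three assumptions:

@itemize
@item @code{#} starts a comment anywhere on a line;
@item short and long option names are not unified;
@item an option set at a higher-priority level replaces all of its values from lower levels.
@end itemize

@example
# Hypothetical helpers illustrating the rules; not UTT code.

def parse_conf(text):
    """Return (name, value) pairs; value is None for options without an argument."""
    options = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()     # drop comments (assumption) and blanks
        if not line:
            continue
        name, sep, value = line.partition("=")
        options.append((name.strip(), value.strip() if sep else None))
    return options


def effective_options(system, user, cmdline):
    """Merge option lists given lowest priority first: system, user, command line."""
    merged = dict()
    for level in (system, user, cmdline):
        seen = set()
        for name, value in level:
            if name not in seen:                # first occurrence at this level:
                merged[name] = []               # discard lower-priority values
                seen.add(name)
            merged[name].append(value)          # repeated lines accumulate
    return merged
@end example

Under this reading, a @option{--dictionary} value given on the command line overrides both configuration files. Two @samp{select=} lines in one file yield two values.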

@c The equal sign may be omitted.


@quotation Tip
If you have two (or more) frequently used sets of options for the same
program (e.g. @command{lem} with the PMDBF dictionary and @command{lem} with a user dictionary),
a good solution is to create two symbolic links to @command{lem}, called
e.g. @command{lemg} and @command{lemu}, and to specify their configuration in the files
@file{lemg.conf} and @file{lemu.conf}, respectively.
@end quotation

@c ---------------------------------------------------------------------
@c COMPONENTS
@c ---------------------------------------------------------------------

@node UTT components
@chapter UTT components

UTT components are of three types:

@menu
Sources: programs which read non-UTT data (e.g. raw text) and produce output
in UTT format
* tok:: a tokenizer

Filters: programs which read and produce UTT-formatted data
@c * sen - the sentencizer::
* lem:: a morphological analyzer
* gue:: a morphological guesser
* cor:: a spelling corrector
* sen:: a sentencizer
@c * gph - the graphizer::
* ser:: a pattern search tool (marks matches)
* grp:: a pattern search tool (selects sentences containing a match)

Sinks: programs which read UTT data and produce output in another format
* kot:: an untokenizer
* con:: a concordance table generator
@end menu

@c ---------------------------------------------------------------------
@c TOK
@c ---------------------------------------------------------------------

@page
@node tok
@section tok - a tokenizer

@c ----------------------------------------

@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
@item @strong{Component category:} @tab source
@end multitable


@menu
* tok description::
* tok input::
* tok output::
* tok command line options::
* tok example::
@end menu

@node tok description
@subsection Description

@command{tok} is a simple program which reads a text file and identifies
tokens on the basis of their orthographic form. The type of each token
is printed in the @var{type} field.

@node tok input
@subsection Input

Raw text.

@node tok output
@subsection Output

A UTT file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. Five types of tokens are distinguished in the @var{type} field:

@itemize

@item @code{W}
(word)
- continuous sequence of letters

@item @code{N}
(number)
- continuous sequence of digits

@item @code{S}
(space)
- continuous sequence of whitespace characters (e.g. spaces or newlines)

@item @code{P}
(punctuation mark)
- a single printable character not belonging to any of the other classes

@item @code{B}
(unprintable character)
- a single unprintable character

@end itemize
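
The classification can be approximated in a few lines of Python. This is a hypothetical sketch, not the source of @command{tok}. It uses Python's @code{str.isalpha}, @code{str.isdigit}, @code{str.isspace} and @code{str.isprintable} as stand-ins for the character classes of @command{tok}. It escapes the @var{form} field following the segment format rules.

@example
import itertools

# Hypothetical re-implementation sketch; not tok's actual code.


def char_class(c):
    """Token type of a single character (approximation of tok's classes)."""
    if c.isalpha():
        return "W"
    if c.isdigit():
        return "N"
    if c.isspace():
        return "S"
    if c.isprintable():
        return "P"
    return "B"


# Escaping for the form field; the backslash must be escaped first.
ESCAPES = [("\\", "\\\\"), ("_", "\\_"), ("*", "\\*"),
           (" ", "_"), ("\n", "\\n"), ("\t", "\\t"), ("\r", "\\r")]


def encode_form(text):
    for old, new in ESCAPES:
        text = text.replace(old, new)
    return text


def tokenize(text):
    """Return UTT lines with start, length, type and form fields."""
    lines, pos = [], 0
    for cls, run in itertools.groupby(text, key=char_class):
        run = "".join(run)
        # W, N and S tokens are maximal runs; P and B tokens are single characters.
        for token in (list(run) if cls in ("P", "B") else [run]):
            lines.append("%04d %02d %s %s" % (pos, len(token), cls, encode_form(token)))
            pos += len(token)
    return lines
@end example

For the input @samp{Piszemy dobre programy.} followed by a newline, it yields the same seven lines as the example output of @command{tok}.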



@node tok command line options
@subsection Command line options

@table @code

@item @b{@minus{}@minus{}help}, @b{@minus{}h}
Print help.

@item @b{@minus{}@minus{}version}, @b{@minus{}V}
Print version information.

@item @b{@minus{}@minus{}interactive, @minus{}i}
This option toggles interactive mode, which is by default off. In the
interactive mode the program does not buffer the output.

@end table

@node tok example
@subsection Example

Input:

@example
Piszemy dobre programy.
@end example

Output:

@example
0000 07 W Piszemy
0007 01 S _
0008 05 W dobre
0013 01 S _
0014 08 W programy
0022 01 P .
0023 01 S \n
@end example


@c ---------------------------------------------------------------------
@c SEN
@c ---------------------------------------------------------------------

@c @node sen - sentencizer
@c @chapter sen - sentencizer

@c Authors: Tomasz Obrębski

@c ---------------------------------------------------------------------
@c LEM
@c ---------------------------------------------------------------------

@page
@node lem
@section lem - morphological analyzer

@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
@item @strong{Component category:} @tab filter
@end multitable

@menu
* lem description::
* lem command line options::
* lem input::
* lem output::
* lem example::
* lem dictionaries::
* lem hints::
@end menu

@node lem description
@subsection Description

@command{lem} performs morphological analysis of a simple orthographic
word, returning all its possible morphological annotations,
disregarding the context.

@c ----------------------------------------

@node lem command line options
@subsection Command line options

@table @code
@parhelp
@parversion
@parinteractive
@c @parfile
@c @paroutput
@c @parfail
@c @parcopy
@parinputfield
@paroutputfield
@pardictionary
@parprocess
@parselect
@parunselect
@paroneline
@paronefield
@end table
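
The way @option{--process}, @option{--select} and @option{--unselect} combine can be sketched in Python. @code{is_processed} is a hypothetical helper, not part of @command{lem}. It only recognizes long-form annotation names, i.e. the part before @code{:}.

@example
def is_processed(line, process=(), select=(), unselect=()):
    """Hypothetical helper: would lem-style options select this segment?"""
    fields = line.split()
    for _ in range(2):                          # skip optional start and length
        if fields[0].isdigit():
            fields.pop(0)
    seg_type, annotations = fields[0], fields[2:]
    names = set(a.split(":", 1)[0] for a in annotations if ":" in a)
    if process and seg_type not in process:     # -p: any listed type (disjunction)
        return False
    if any(name not in names for name in select):   # -s: all fields present
        return False
    if any(name in names for name in unselect):     # -S: all fields absent
        return False
    return True
@end example

For example, @code{is_processed(line, unselect=["lem"])} corresponds to @option{-S lem}. That option restricts processing to segments that do not yet have a @code{lem} field.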

@c ----------------------------------------

@node lem input
@subsection Input

@command{lem} reads a UTT file and processes the value of the @var{form} field
(the input field may be changed with the @option{--input-field} option).

@node lem output
@subsection Output

@command{lem} adds a new annotation field, whose default name is @code{lem}. In
case of ambiguity, either the segment is duplicated (default),
multiple @code{lem} fields are added (@option{--one-line} option), or the ambiguous
annotation is produced as the value of a single @code{lem} field (@option{--one-field}
or @option{-1} option):

@itemize @bullet

@item
unambiguous value format:

@example
<lemma>,<descr>
@end example

@item
ambiguous value format (@option{--one-field} option):


@example
<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
@end example

(alternative descriptions of the same lemma are separated by commas;
alternative lemmata are separated by semicolons.)

@end itemize

@node lem example
@subsection Example

Input:

@example
0000 07 W Piszemy
0007 01 S _
0008 05 W dobre
0013 01 S _
0014 08 W programy
0022 01 P .
0023 01 S \n
@end example

Output (default):

@example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
0013 01 S _
0014 08 W programy lem:program,N/GiNpCa
0014 08 W programy lem:program,N/GiNpCn
0014 08 W programy lem:program,N/GiNpCv
0022 01 P .
0023 01 S \n
@end example

Output (@option{--one-line} option):

@example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
0013 01 S _
0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
0022 01 P .
0023 01 S \n
@end example

Output (@option{--one-field} option):

@example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
0013 01 S _
0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
0022 01 P .
0023 01 S \n
@end example
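
The three output modes differ only in how the analyses of a segment are laid out. The Python sketch below is illustrative: @code{format_lem} is a hypothetical helper, not part of @command{lem}. It takes the input line and a list of @code{(lemma, descr)} pairs. In @option{--one-field} mode, it groups descriptions by lemma in order of first appearance.

@example
def format_lem(line, analyses, mode="default", field="lem"):
    """Hypothetical helper: lay out (lemma, descr) pairs in lem's three modes."""
    if mode == "one-field":
        by_lemma = dict()                      # keeps first-appearance order
        for lemma, descr in analyses:
            by_lemma.setdefault(lemma, []).append(descr)
        value = ";".join(",".join([lemma] + descrs) for lemma, descrs in by_lemma.items())
        return [line + " " + field + ":" + value]
    values = [field + ":" + lemma + "," + descr for lemma, descr in analyses]
    if mode == "one-line":
        return [line + " " + " ".join(values)]
    return [line + " " + v for v in values]    # default: one copy per analysis
@end example

Given the three analyses of @samp{programy}, it produces the corresponding lines of each example above.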

@c ----------------------------------------

@node lem dictionaries
@subsection Dictionaries

@command{lem} requires a dictionary. The dictionary may be provided in
one of two formats: text (source) format or binary (fsa) format.

@subsubheading Text format

Dictionary entries have the following structure:

@example
<form>;<lemma>,<descr>[;<lemma>,<descr>]
@end example

@var{lemma} may be given explicitly or in the cut-add format:

@example
@code{[<cut1><add1>-]<cut2><add2>}
@end example

meaning: replace the prefix of length @code{<cut1>} with the
string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string
@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.

Each dictionary entry must be written on one line and must not contain blank characters.

Examples:
@example
kot;0,N/GaNsCn
kota;1,N/GaNsCg;1,N/GaNsCa
kotu;1,N/GaNsCd
kotem;2,N/GaNsCi
kocie;3t,N/GaNsCl;3t,N/GaNsCv
najbielsi;3-4ały,ADJ/DsNpCnGp
najbielsze;3-5ały,ADJ/DsNpCnGaifn
najlepsi;dobry,ADJ/DsNpCnGp
najlepsze;dobry,ADJ/DsNpCnGaifn
@end example
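
The cut-add notation can be decoded mechanically. In the Python sketch below, @code{apply_lemma} and @code{parse_entry} are hypothetical helpers, not part of UTT. The sketch assumes that a lemma field starting with a digit is in cut-add format and that any other lemma field is an explicit lemma.

@example
import re

# Hypothetical helpers, not UTT code.
# Cut-add format: [<cut1><add1>-]<cut2><add2>
CUT_ADD = re.compile(r"(?:(\d+)([^\d-]*)-)?(\d+)(.*)")


def apply_lemma(form, code):
    """Return the lemma encoded by code for the given form."""
    m = CUT_ADD.fullmatch(code)
    if m is None:
        return code                              # explicit lemma, e.g. "dobry"
    cut1, add1, cut2, add2 = m.groups()
    if cut1 is not None:                         # replace the prefix
        form = add1 + form[int(cut1):]
    return form[:len(form) - int(cut2)] + add2   # replace the suffix


def parse_entry(entry):
    """Split '<form>;<lemma>,<descr>;...' into the form and (lemma, descr) pairs."""
    form, *rest = entry.split(";")
    pairs = []
    for item in rest:
        code, descr = item.split(",", 1)
        pairs.append((apply_lemma(form, code), descr))
    return form, pairs
@end example

It reproduces the lemmata of the example entries above, e.g. @samp{kot} for @samp{kocie;3t} and @samp{biały} for @samp{najbielsze;3-5ały}.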
1012
1013
1014	The mandatory file name extension for a text dictionary is @code{dic}. For large
1015	dictionaries it is preferable, however, to compile them into binary
1016	(fsa) format.
1017
1018	@subsubheading Binary format
1019
1020	The mandatory file name extension for a binary dictionary is @code{bin}. To
1021	compile a text dictionary into binary format, write:
1022
1023	@example
1024	compiledic <dictionaryname>.dic
1025	@end example
1026
1027	@subsubheading Polex/PMDBF dictionary
1028
A large-coverage morphological dictionary for Polish, Polex/PMDBF, is
included in the distribution as the default dictionary for
@command{lem}. By default, it is located in:
1032
1033	@file{$HOME/.utt/pl/lem.bin}
1034
1035	@node lem hints
1036	@subsection Hints
1037
1038	@c @subsubheading Combining data from multiple dictionaries
1039
1040	@c @itemize
1041
@c @item Apply <dict1>, then apply <dict2> to words which were not annotated.

@c @example
@c lem -d <dict1> | lem -S lem -d <dict2>
@c @end example

@c @item Add annotations from two dictionaries <dict1> and <dict2>.

@c @example
@c lem -c -d <dict1> | lem -S lem -d <dict2>
1052	@c @end example
1053
1054	@c @end itemize
1055
1056
1057	@c ---------------------------------------------------------------------
1058	@c GUE
1059	@c ---------------------------------------------------------------------
1060
1061	@page
1062	@node gue
1063	@section gue - morphological guesser
1064
1065	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1066
1067	@item @strong{Authors:} @tab Micha³ Stolarski, Tomasz Obrêbski
1068	@item @strong{Component category:} @tab filter
1069
1070	@end multitable
1071
@command{gue} guesses morphological descriptions for the word form
contained in the @var{form} field.
1074
1075	@menu
1076	* gue command line options::
1077	* gue example::
1078	* gue dictionaries::
1079	@end menu
1080
1081	@node gue command line options
1082	@subsection Command line options
1083
1084	@table @code
1085
1086	@parhelp
1087	@parversion
1088	@parinteractive
1089	@c @parfile
1090	@c @paroutput
1091	@c @parfail
1092	@c @parcopy
1093	@parinputfield
1094	@paroutputfield
1095	@pardictionary
1096	@parprocess
1097	@parselect
1098	@parunselect
1099	@paroneline
1100	@paronefield
1101
1102	@item @b{@minus{}@minus{}delta=@var{n}}
Stop displaying answers after a fall in weight, that is, when the weight difference between two subsequent results exceeds the delta value (default=`0.2').
1104
1105
1106	@item @b{@minus{}@minus{}cut-off=@var{n}}
Do not display answers whose weight is lower than the cut-off value (default=`200').
1108
1109
1110	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
Guess up to @var{n} descriptions (default=`0', which means `display all results').
1112
1113
1114
1115	@end table
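The interplay of @option{--cut-off}, @option{--delta} and @option{--guess_count} can be sketched as a filter over candidate descriptions sorted by decreasing weight. This is an illustration, not @command{gue}'s code. It makes one assumption that the default value of 0.2 suggests but this manual does not state: @option{--delta} is a relative drop, that is, the difference between two consecutive weights divided by the larger one. The function name is hypothetical.

```python
def select_guesses(guesses, delta=0.2, cut_off=200, count=0):
    """Filter (description, weight) pairs the way the gue options describe.

    Assumption: delta is a relative drop between consecutive weights.
    count == 0 means no limit, mirroring the default of --guess_count.
    """
    ranked = sorted(guesses, key=lambda guess: guess[1], reverse=True)
    selected = []
    for descr, weight in ranked:
        if weight < cut_off:                            # --cut-off
            break
        if selected:
            previous = selected[-1][1]
            if (previous - weight) / previous > delta:  # --delta
                break
        selected.append((descr, weight))
        if count and len(selected) == count:            # --guess_count, -n
            break
    return selected
```

Because weights are between 1 and 999, the division by the previous weight is always defined.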
1116
1117	@node gue example
1118	@subsection Example
1119
1120	@example
1121	command: gue -n 2
1122
1123	input:
1124	0000 07 W smerfny
1125
1126	output:
1127	0000 07 W smerfny gue:,ADJ/CaDpGiNs
1128	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
1129	@end example
1130
1131
1132	@node gue dictionaries
1133	@subsection Dictionaries
1134
1135	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
1136	The fsa format is created by compiling text-format dictionaries.
1137
1138
1139
1140	@subsubheading Text format
1141
1142	Dictionary entries have the following structure:
1143
1144	@example
1145	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
1146	@end example
1147
1148	@var{lemma} must be given in the cut-add format:
1149
1150	@example
1151	@code{[<cut1><add1>-]<cut2><add2>}
1152	@end example
(no spaces in between): replace the prefix of length @var{cut1} with
the string @var{add1} and the suffix of length @var{cut2} with the
string @var{add2}.
1156
1157
For example, @code{3-4a³y} transforms @samp{najbielsi} into @samp{bia³y}.
1159
1160
@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
1162
1163	@var{weight} is an integer value between 1 and 999 indicating the
1164	likelihood of the guess.
1165
1166	@example
1167	*³kê;1a,N/GfNsCa
naj*elszy;3-5a³y,ADJ/...:...
1169	@end example
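A single guesser entry can be read as a rule. A form that starts with @var{prefix} and ends with @var{suffix} receives @var{description}, and its lemma is derived from the form by the cut-add code. The Python sketch below applies one entry to one form. It is a hypothetical illustration, not the FSA-based lookup that @command{gue} performs. It treats the @code{:}@var{weight} part as optional because the first example above omits it.

```python
import re

# prefix*suffix;lemma,description[:weight]  (weight optional in this sketch)
ENTRY = re.compile(r"^([^*]*)\*([^;]*);([^,]*),([^:]*)(?::(\d+))?$")
CUT_ADD = re.compile(r"^(?:(\d+)(\D*)-)?(\d+)(\D*)$")


def apply_cut_add(form, code):
    """Replace a prefix of length cut1 with add1 and a suffix of length cut2 with add2."""
    cut1, add1, cut2, add2 = CUT_ADD.match(code).groups()
    if cut1 is not None:
        form = add1 + form[int(cut1):]
    return form[:max(0, len(form) - int(cut2))] + add2


def guess(form, entry):
    """Return (lemma, description, weight) if the entry's pattern fits the form, else None."""
    prefix, suffix, code, descr, weight = ENTRY.match(entry).groups()
    fits = (form.startswith(prefix) and form.endswith(suffix)
            and len(form) >= len(prefix) + len(suffix))
    if not fits:
        return None
    return apply_cut_add(form, code), descr, int(weight) if weight else None
```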
1170
1171
1172	@c ---------------------------------------------------------------------
1173	@c COR
1174	@c ---------------------------------------------------------------------
1175
1176	@page
1177	@node cor
1178	@section cor - spelling corrector
1179
1180	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1181	@item @strong{Authors:} @tab Tomasz Obrêbski, Micha³ Stolarski
1182	@item @strong{Component category:} @tab filter
1183	@end multitable
1184
1185	The spelling corrector applies Kemal Oflazer's dynamic programming
1186	algorithm @cite{oflazer96} to the FSA representation of the set of
word forms of the Polex/PMDBF dictionary. Given an incorrect word
form, it returns all word forms present in the dictionary whose edit
distance from it does not exceed the threshold given as a parameter
(@option{--distance}).
1190
By default, @command{cor} replaces the contents of the @var{form} field
with the corrected value and places the original contents in the
@code{cor} field.
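The notion of edit distance behind @option{--distance} can be shown with a deliberately simpler method than the one described above. Oflazer's algorithm searches the FSA and prunes hopeless paths early. The sketch below instead computes the plain Levenshtein distance (insertions, deletions and substitutions) against every word of a small list, so it is only suitable for illustration. The names @code{edit_distance} and @code{corrections} are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))        # distances from "" to prefixes of b
    for i, char_a in enumerate(a, 1):
        current = [i]                          # distance from a[:i] to ""
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                        # delete char_a
                               current[j - 1] + 1,                     # insert char_b
                               previous[j - 1] + (char_a != char_b)))  # substitute
        previous = current
    return previous[-1]


def corrections(word, lexicon, max_distance=1):
    """All lexicon words within max_distance edits of word (cf. --distance)."""
    return [entry for entry in lexicon if edit_distance(word, entry) <= max_distance]
```

For example, with the word list from the dictionary example below, @code{corrections("odlt", ["odlot", "odlotowy", "odludek"])} returns only @samp{odlot}.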
1194
1195
1196	@menu
1197	* cor command line options::
1198	* cor dictionaries::
1199	@end menu
1200
1201
1202	@node cor command line options
1203	@subsection Command line options
1204
1205	@table @code
1206
1207	@parhelp
1208	@parversion
1209	@parinteractive
1210	@c @parfile
1211	@c @paroutput
1212	@c @parfail
1213	@c @parcopy
1214	@parinputfield
1215	@paroutputfield
1216	@pardictionary
1217	@parprocess
1218	@parselect
1219	@parunselect
1220	@paroneline
1221	@paronefield
1222
1223	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
Maximum edit distance (default=`1').
1225
1226
1227	@end table
1228
1229	@node cor dictionaries
1230	@subsection Dictionaries
1231
1232	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
1233	The fsa format is created by compiling text-format dictionaries.
1234
1235	@subsubheading Text format
1236
The @command{cor} dictionary is a list of words, one per line:
1238	@example
1239	odlot
1240	odlotowy
1241	odludek
1242	@end example
1243
1244	@page
1245	@node sen
@section sen - a sentencizer
1247
1248	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1249
1250	@item @strong{Authors:} @tab Tomasz Obrêbski
1251	@item @strong{Component category:} @tab filter
1252
1253	@end multitable
1254
1255	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
1256
1257	@menu
1258	@c * sen input::
1259	@c * sen output::
1260	* sen example::
1261	@end menu
1262
1263	@node sen example
1264	@subsection Example
1265
1266	@example
1267	command: sen
1268
1269	input:
1270	0000 05 W Cze¶æ
1271	0005 01 P !
1272	0006 01 S _
1273	0007 02 W To
1274	0009 01 S _
1275	0010 02 W ja
1276	0012 01 P .
1277	0013 01 S \n
1278
1279	output:
1280	0000 00 BOS *
1281	0000 05 W Cze¶æ
1282	0005 01 P !
1283	0006 00 EOS *
1284	0006 00 BOS *
1285	0006 01 S _
1286	0007 02 W To
1287	0009 01 S _
1288	0010 02 W ja
1289	0012 01 P .
1290	0013 01 S \n
1291	0014 00 EOS *
1292	@end example
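The BOS/EOS placement in the example can be reproduced by a deliberately naive rule, sketched below in Python:
@itemize
@item a sentence ends after a punctuation segment @samp{.}, @samp{!} or @samp{?};
@item if more words follow, an EOS/BOS pair is inserted right after that segment;
@item otherwise, the final EOS is placed at the end of the input.
@end itemize
This rule is an assumption made for illustration only. The rules actually used by @command{sen} are not described here, and abbreviations ending with a period would defeat this one.

```python
SENTENCE_FINAL = (".", "!", "?")   # assumption: the only sentence-final marks


def mark_sentences(lines):
    """Insert zero-length BOS/EOS segments into a list of UTT lines."""
    segments = [line.split()[:4] for line in lines]     # start, length, type, form
    if not segments:
        return []
    word_positions = [i for i, seg in enumerate(segments) if seg[2] == "W"]
    last_word = word_positions[-1] if word_positions else -1
    output = ["%04d 00 BOS *" % int(segments[0][0])]
    for i, (start, length, seg_type, form) in enumerate(segments):
        output.append(lines[i])
        if seg_type == "P" and form in SENTENCE_FINAL and i < last_word:
            end = int(start) + int(length)            # offset just after the mark
            output.append("%04d 00 EOS *" % end)
            output.append("%04d 00 BOS *" % end)
    last_start, last_length = segments[-1][0], segments[-1][1]
    output.append("%04d 00 EOS *" % (int(last_start) + int(last_length)))
    return output
```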
1293
1294
1295	@c ---------------------------------------------------------------------
1296	@c GPH
1297	@c ---------------------------------------------------------------------
1298
1299	@c @node gph - graphizer
1300	@c @chapter gph - graphizer
1301
1302	@c Authors: Tomasz Obrêbski
1303
1304
1305
1306	@c SER
1307	@c ---------------------------------------------------------------------
1308	@c ---------------------------------------------------------------------
1309
1310	@page
1311	@node ser
1312	@section ser - pattern search tool
1313
1314	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1315	@item @strong{Authors:} @tab Tomasz Obrêbski
1316	@item @strong{Component category:} @tab filter
1317	@end multitable
1318
1319	@command{ser} looks for patterns in UTT-formatted texts.
1320
1321	@menu
1322	* ser command line options::
1323	* ser pattern::
1324	* ser how ser works::
1325	* ser customization::
1326	* ser limitations::
1327	* ser requirements::
1328	@end menu
1329
1330
1331	@c ---------------------------------------------------------------------
1332	@node ser command line options
1333	@subsection Command line options
1334
1335	@table @code
1336
1337	@parhelp
1338	@parversion
1339	@c @parfile
1340	@c @paroutput
1341	@c @parinputfield
1342	@c @paroutputfield
1343	@parprocess
1344	@parinteractive
1345
1346	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1347	The search pattern.
1348
1349	@item @b{@minus{}@minus{}morph=@var{field}}
1350	The name of the annotation field containing the morphological
1351	description (default @code{lem}).
1352
1353	@item @b{@minus{}@minus{}flex}
1354	Only print the generated flex source code.
1355
1356	@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from the file @var{filename} instead of the
default location. This option makes it possible to redefine the set of terms.
1359
1360	@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from the file @var{filename}. This option
makes it possible to extend the set of terms.
1363
1364	@end table
1365
1366
1367	@c ---------------------------------------------------------------------
1368	@node ser pattern
1369	@subsection Pattern
1370
1371	The @command{ser} pattern is a regular expression over terms corresponding
1372	to text segments or segment sequences. Predefined terms are:
1373
1374	@table @code
1375
1376	@item seg(@var{t},@var{f},@var{a})
1377	a segment of type @var{t}, containing form @var{f} and annotation
1378	@var{a}
1379
1380	@item form(@var{f})
1381	a segment containing form @var{f}
1382
1383	@item field(@var{f})
1384	a segment containing annotation field @var{f}
1385
1386	@item space(@var{f})
1387	a space segment of form @var{f}
1388
1389	@item word(@var{f})
1390	a word segment of form @var{f}
1391
1392	@item punct(@var{f})
1393	a punct segment of form @var{f}
1394
1395	@item number(@var{f})
1396	a number segment of form @var{f}
1397
1398	@item lexeme(@var{f})
1399	a word segment with lemma @var{f}
1400
1401	@item cat(@var{c})
1402	a word segment of category @var{c}
1403
1404	@end table
1405
1406	All arguments are optional. If an argument is omitted, an arbitrary
1407	string of non-blank characters is assumed as the argument value. Term
1408	arguments may be arbitrary character-level regular expressions. The
following special symbols can be used:
1410
1411	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1412	@item @code{[@dots{}]} @tab a character class
1413	@item @code{[^@dots{}]} @tab a negated character class
@item @code{|} @tab alternative
1415	@item @code{*} @tab repetition, including zero times
1416	@item @code{+} @tab repetition, at least one time
1417	@item @code{?} @tab optionality
1418	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
1419	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
1420	@item @code{@{@var{m}@}} @tab repetition @var{m} times
@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
1422	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
1423	@item @code{( )} @tab parentheses, used to override precedence
1424	@c @end multitable
1425
1426	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1427	@item @code{.} @tab a non-blank character
1428	@item @code{\w} @tab a letter
1429	@item @code{\W} @tab a non-blank character other than a letter
1430	@item @code{\d} @tab a digit
1431	@item @code{\D} @tab a non-blank character other than a digit
1432	@item @code{\s} @tab a space or tab character
1433	@item @code{\S} @tab a non-blank character (the same as @code{.})
1434	@item @code{\l} @tab a lowercase letter
1435	@item @code{\L} @tab an uppercase letter
1436	@end multitable
1437
1438
1439	@noindent The following characters:
1440	@example
@verb{% [ ] ^ | * + ? { } , . < > \ %}
@end example
must be escaped with a backslash, i.e. written as:
@example
@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
1446	@end example
1447
1448	@quotation Note
The special symbols are borrowed from Perl, with minor modifications
made for convenience: the meaning of some special characters and
sequences differs slightly from the usual one. In particular, @code{.}
matches only non-blank characters, because spaces have a special
function in UTT files (they are field separators). Use @code{\s} to
match a space or tab character explicitly.
1456	@end quotation
1457
In the argument of the @code{cat} term, the special operator
@code{<@dots{}>} may be used. A category specification enclosed in angle
brackets matches all category descriptions which are consistent
(non-contradictory) with the specification. For example, @code{<N>}
matches all noun descriptions, and @code{<ADJ/Can>} matches all
adjectives in the accusative or nominative case.
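One way to read ``consistent (non-contradictory)'' is sketched below in Python. The parts of speech must be equal. For every attribute present in both tags, the two sets of values must overlap. An attribute missing from the description is treated as unspecified. This interpretation is an assumption, not a description of @command{ser}'s implementation. So is the restriction to single-character values, which excludes @code{<@dots{}>} values. The function names are hypothetical.

```python
import re

# attribute (upper-case letters) followed by its values (lower-case letters, digits, ...)
ATTRIBUTE = re.compile(r"([A-Z]+)([a-z0-9?!*+-]+)")


def parse_tag(tag):
    """Split 'POS/AvalBval...' into (pos, [(attribute, set_of_values), ...])."""
    pos, _, rest = tag.partition("/")
    return pos, [(attr, set(values)) for attr, values in ATTRIBUTE.findall(rest)]


def consistent(description, spec):
    """True if description does not contradict spec, e.g. spec 'ADJ/Can' for <ADJ/Can>."""
    pos, attributes = parse_tag(description)
    spec_pos, spec_attributes = parse_tag(spec)
    if pos != spec_pos:
        return False
    values = dict(attributes)
    for attr, wanted in spec_attributes:
        if attr in values and not (values[attr] & wanted):
            return False                 # no common value: a contradiction
    return True
```

Under this reading, @samp{ADJ/DpNsCnavGn} is consistent with @code{ADJ/Can}, because the case values @samp{nav} and @samp{an} overlap. @samp{ADJ/DpNsCgGn} is not.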
1463
1464
1465	@*
1466	@noindent @b{Examples of one-segment patterns:}
1467
1468	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1469	@item @code{seg} @tab any segment
1470	@item @code{word} @tab any word-form
1471	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
1472	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
1473	@item @code{word(\L\l+)} @tab a capitalized word-form
1474	@item @code{punct} @tab a punctuation character
1475	@item @code{space(.\\n.)} @tab a space segment containing a newline character
@item @code{lexeme(pomoc)} @tab any form of the lexeme @samp{pomoc}
@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
@item @code{cat(<N/Ca>)} @tab a word whose category is consistent with @code{N/Ca}
1479	@end multitable
1480
1481	@*
1482	@noindent @b{Examples of multi-segment patterns:}
1483
1484	@table @code
1485
1486	@item (word(\L) punct(\.) space?)+ word(\L\l+)
1487	a sequence of initials followed by a surname
1488
@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
a text fragment between two punctuation characters, containing an
occurrence of a relative pronoun
1492
1493	@end table
1494
1495
1496	@node ser how ser works
1497	@subsection How ser works
1498
1499	@node ser customization
1500	@subsection Customization
1501
1502	@c All predefined terms correspond to single segments,
1503
1504	@example
1505	define(`verbseq', `(cat(V) (space cat(V)))')
1506	@end example
1507
1508
1509	the term @code{cat()} may not be used as a ... of
1510
1511	@c See @command{m4} manual for further details on macro definition format.
1512
1513	@node ser limitations
1514	@subsection Limitations
1515
The @code{<@dots{}>} operator does not support specifications with more than 3 attributes.
1517
1518	@node ser requirements
1519	@subsection Requirements
1520
1521	In order to run @command{ser}, the following programs must be
1522	installed in the system:
1523
1524	@itemize
1525
1526	@item @command{m4}
1527	@item @command{grep}
1528	@item @command{flex}
1529	@item @command{gcc}
1530
1531	@end itemize
1532
1533
1534	@c GRP
1535	@c ---------------------------------------------------------------------
1536	@c ---------------------------------------------------------------------
1537
1538	@page
1539	@node grp
1540	@section grp - pattern search tool
1541
1542	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1543	@item @strong{Authors:} @tab Tomasz Obrêbski
1544	@item @strong{Component category:} @tab filter
1545	@end multitable
1546
1547
@command{grp} selects sentences containing an expression that matches
a pattern. The pattern format is exactly the same as the one accepted
by @command{ser}.

@command{grp} is intended mainly for speeding up corpus searches. It is
extremely fast: its processing speed is usually higher than the speed
of reading the corpus file from disk.
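Flattened input (@pxref{fla}) stores each sentence on a single line, so selecting sentences reduces to filtering lines. @command{grep}-like tools do this very quickly. The Python sketch below shows only that filtering step. A plain regular expression stands in for a compiled @command{ser} pattern, and the helper name @code{select_sentences} is hypothetical.

```python
import re


def select_sentences(flat_text, regex):
    """Return the flattened sentence lines in which regex finds a match."""
    pattern = re.compile(regex)
    return [line for line in flat_text.split("\n") if line and pattern.search(line)]
```

Segments inside a sentence line are separated by form feeds, so a pattern spanning several segments has to match the @samp{\f} characters between them.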
1555
1556
1557
1558	@c @menu
1559	@c * ser command line options::
1560	@c * ser pattern::
1561	@c * ser how ser works::
1562	@c * ser customization::
1563	@c * ser limitations::
1564	@c * ser requirements::
1565	@c @end menu
1566	@menu
1567	* grp command line options::
1568	* grp pattern::
1569	* grp hints::
1570	@end menu
1571
1572	@node grp command line options
1573	@subsection Command line options
1574
1575	@table @code
1576
1577	@parhelp
1578	@parversion
1579	@c @parfile
1580	@c @paroutput
1581	@c @parinputfield
1582	@c @paroutputfield
1583	@parprocess
1584	@parinteractive
1585
1586	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1587	The search pattern.
1588
1589	@item @b{@minus{}@minus{}morph=@var{field}}
1590	The name of the annotation field containing the morphological
1591	description (default @code{lem}).
1592
1593	@item @b{@minus{}@minus{}command}
1594	Only print the generated flex source code.
1595
1596	@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from the file @var{filename} instead of the
default location. This option makes it possible to redefine the set of terms.
1599
1600	@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from the file @var{filename}. This option
makes it possible to extend the set of terms.
1603
1604	@end table
1605
1606
1607	@node grp pattern
1608	@subsection Pattern
1609
The pattern format is the same as for @command{ser} (@pxref{ser pattern}).
1611
1612	@node grp hints
1613	@subsection Hints
1614
Corpus searches can be made faster by combining @command{grp} with the
@command{lzop} compression tool. @command{grp} usually processes data
faster than it can be read from disk (especially from slow laptop
drives), so reading a smaller, compressed file saves time.
1618
1619	@example
cat corpus | tok | sen | lem | grp -a p | lzop -7 > corpus.grp.lzo
1621	@end example
1622
1623	@example
lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR}
1625	@end example
1626
1627
1628	@c ---------------------------------------------------------------------
1629	@c kot
1630	@c ---------------------------------------------------------------------
1631	@c ---------------------------------------------------------------------
1632
1633	@page
1634	@node kot
1635	@section kot - untokenizer
1636
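@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrêbski
@end multitable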
1638
1639	@command{kot} is the opposite of @command{tok}. It changes UTT-formatted text into plain text.
1640
1641	@menu
1642	* kot command line options::
1643	* kot usage examples::
1644	@end menu
1645
1646	@node kot command line options
1647	@subsection Command line options
1648
1649	@table @code
1650
1651	@parhelp
1652
1653	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1654
1655	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1656
1657	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1658
1659	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1660
1661	@c @item @b{@minus{}@minus{}config=@var{filename}}
1662
1664
1665	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
Print @var{string} between nonadjacent segments of the input file.

@item @b{@minus{}@minus{}spaces, @minus{}r}
Retain the special characters @code{_}, @code{\t},
@code{\n}, @code{\r}, @code{\f} unexpanded in the output.
1671
1672	@end table
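The core of what @command{kot} does, together with @option{--gap-fill} and @option{--spaces}, can be sketched in Python. This is an illustration only, and the function name is hypothetical. It rests on two assumptions:
@itemize
@item escape sequences are expanded only in space (@code{S}) segments;
@item zero-length segments such as BOS/EOS markers contribute no text.
@end itemize

```python
# escape sequences used in the form field of space segments; "_" is a space
ESCAPES = (("\\t", "\t"), ("\\n", "\n"), ("\\r", "\r"), ("\\f", "\f"), ("_", " "))


def untokenize(lines, gap_fill="", retain_spaces=False):
    """Rebuild plain text from UTT lines (start, length, type, form, ...)."""
    out = []
    end = None                        # offset just after the previous segment
    for line in lines:
        start, length, seg_type, form = line.split()[:4]
        start, length = int(start), int(length)
        if length == 0:
            continue                  # zero-length markers carry no text
        if end is not None and start > end:
            out.append(gap_fill)      # --gap-fill: segments are not adjacent
        if seg_type == "S" and not retain_spaces:   # --spaces keeps the escapes
            for code, char in ESCAPES:
                form = form.replace(code, char)
        out.append(form)
        end = start + length
    return "".join(out)
```

Applied to the input of the @command{sen} example, the function returns the original sentence pair followed by a newline.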
1673
1674	@node kot usage examples
1675	@subsection Usage examples
1676
1677	@example
cat legia.txt | tok | kot
@end example

@example
cat legia.txt | tok | lem -1 | kot
1683	@end example
1684
1685	@c CON............................................................
1686	@c ...............................................................
1687	@c ...............................................................
1688
1689	@page
1690	@node con
1691	@section con - concordance table generator
1692
1693	@command{con} generates a concordance table based on a pattern given to @command{ser}.
1694
1695	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1696	@item @strong{Authors:} @tab Justyna Walkowska
1697	@item @strong{Component category:} @tab sink
1698	@end multitable
1699	@c
1700
1701	@menu
1702	* con command line options::
1703	* con usage example::
1704	* con hints::
1705	@end menu
1706
1707	@node con command line options
1708	@subsection Command line options
1709
1710	@table @code
1711
1712	@parhelp
1713
1714	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
1715	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1716	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1717	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1718	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
1719	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
1720	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
1721	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
1722	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
1723	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1724	@c @item @b{@minus{}@minus{}config=@var{filename}}
1725	@c @item
1726	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1727	@c search pattern
1728	@c
1729	@c @item @b{@minus{}@minus{}flex}
1730	@c only print the generated flex source code
1731	@c
1732	@c @item @b{@minus{}@minus{}macro=@var{filename}}
1733	@c read macrodefinitions from file @var{filename} rather than from
1734	@c default location. This option allows to redefine the set of terms.
1735	@c
1736	@c @item @b{@minus{}@minus{}define=@var{filename}}
1737	@c append macrodefinitions from file @var{filename}. This option
1738	@c allows to extend the set of terms.
1739
@item @b{@minus{}@minus{}left, @minus{}l}
Left context specification (default=`30c'). Examples:
@example
-l=5c: left context is 5 characters
-l=5w: left context is 5 words
-l=5s: left context is 5 non-empty input lines
-l='\s*\S+\sr\S+BOS': left context starts with the given regex
@end example

@item @b{@minus{}@minus{}right, @minus{}r}
Right context specification (default=`30c').
@item @b{@minus{}@minus{}trim, @minus{}t}
Remove incomplete words from the output.
@item @b{@minus{}@minus{}white, @minus{}w}
Do not convert white-space characters into spaces (by default they are converted).
@item @b{@minus{}@minus{}column, @minus{}c}
Minimal width of the left column in characters (default=`0').
@item @b{@minus{}@minus{}ignore, @minus{}i}
Ignore segment inconsistencies in the input.
@item @b{@minus{}@minus{}bon}
Beginning of the selected segment (regex, default=`[0-9]+ [0-9]+ BOM .*').
@item @b{@minus{}@minus{}eob}
End of the selected segment (regex, default=`[0-9]+ [0-9]+ EOM .*').
@item @b{@minus{}@minus{}bod}
Display string marking the beginning of the selected segment (default=`[').
@item @b{@minus{}@minus{}eod}
Display string marking the end of the selected segment (default=`]').
1767
1768
1769
1770	@end table
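The character- and word-based forms of @option{--left}, together with @option{--trim}, can be sketched as a function that cuts the context out of the text preceding a match. This Python sketch is hypothetical. It covers only the @code{c} and @code{w} units, not @code{s} or regular expressions. It assumes that words are separated by single spaces, as they are once white space has been converted into spaces.

```python
def left_context(before, spec, trim=False):
    """Cut the left context described by spec ('30c', '5w', ...) from the text before a match."""
    count, unit = int(spec[:-1]), spec[-1]
    if unit == "w":
        words = before.split()
        return " ".join(words[max(0, len(words) - count):])
    if unit != "c":
        raise ValueError("only the c and w units are sketched here")
    cut = max(0, len(before) - count)          # index where the context starts
    context = before[cut:]
    # --trim: drop a leading word that was cut in the middle
    if trim and cut > 0 and not before[cut - 1].isspace() and context[:1].strip():
        context = context.split(" ", 1)[1] if " " in context else ""
    return context
```

For the text @samp{ala ma kota i psa}, a specification of @code{7c} yields @samp{a i psa}. With trimming, it yields @samp{i psa}.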
1771
1772	@node con usage example
1773	@subsection Usage example
1774	@example
cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
1776	@end example
1777
1778
1779	@node con hints
1780	@subsection Hints
1781
@command{con} is relatively slow, so avoid passing large amounts of
redundant text through it. It works well at the end of a pipeline in
which @command{grp} and @command{ser} have already narrowed down the input:
1785
1786	@example
... | grp -e EXPR | ser -e EXPR | con
1788	@end example
1789
1790
1791
1792	@c ---------------------------------------------------------------------
1793	@c ---------------------------------------------------------------------
1794
1795	@page
1796	@node Auxiliary tools
1797	@chapter Auxiliary tools
1798
1799	@menu
1800	* compiledic:: dictionary compiler
1801	* fla:: UTT file flattener
1802	* unfla:: UTT file unflattener
1803	@end menu
1804
1805
1806	@page
1807	@node compiledic
1808	@section compiledic - the dictionary compiler
1809
1810	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Micha³ Stolarski, Tomasz Obrêbski
1812	@item @strong{Component category:} @tab additional tool
1813	@end multitable
1814	@c
1815
1816	@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
1817	(FSA) format (@code{.bin} extension).
1818
The automaton representation of a dictionary is built using the AT&T tools:
1820	@itemize
1821	@item AT&T FSM Library,
1822	@item AT&T Lextools.
1823	@end itemize
1824
For @command{compiledic} to work, the packages listed above must be
installed on your system. They are freely available for non-commercial
use.
1828
1829	Usage:
1830	@example
1831	compiledic <dictionaryname>.dic
1832	@end example
1833
The file @file{<dictionaryname>.bin} will be generated.
1835
Note: the program creates many temporary files in the current
directory. They are deleted when the program terminates successfully.
1839
1840	@c @menu
1841	@c * con command line options::
1842	@c * con usage example::
1843	@c * con hints::
1844	@c @end menu
1845
1846
1847	@page
1848	@node fla
1849	@section fla - the UTT file flattener
1850
1851	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1852	@item @strong{Authors:} @tab Tomasz Obrêbski
1853	@item @strong{Component category:} @tab filter
1854	@end multitable
1855	@c
1856
@command{fla} ``flattens'' a UTT file by merging the segments belonging
to one sentence into a single line. Technically, end-of-line characters
within a sentence (@samp{\n}, ASCII code 10) are replaced with form-feed
characters (@samp{\f}, ASCII code 12). Flattening makes it possible to
process UTT files sentence by sentence with tools such as @command{grep}
or @command{sed} (this is used in @command{grp} and @command{mar}).
1863
Flattened files should have the suffix @code{.fla}, e.g.@: @file{thetext.utt.fla}.

Flattened files are still human-readable.
1867
1868	Usage:
1869
1870	@example
1871	fla [<bosregex>]
1872	@end example
1873
The optional argument is a regular expression describing the segments
that should be treated as sentence beginnings: a segment is treated as
a sentence beginning if it contains a fragment matching
@code{<bosregex>}. By default, segments containing the field
@code{BOS} are looked for.
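The transformation can be sketched in a few lines of Python, together with the inverse performed by @command{unfla}. The sketch assumes that each sentence line ends with a single @samp{\n} and that the segments inside it are separated by @samp{\f}. The exact byte layout produced by @command{fla} may differ in details such as a trailing form feed. The function names are hypothetical.

```python
import re


def flatten(lines, bos_regex="BOS"):
    """Join UTT lines so that each sentence occupies one line (segments joined by \\f)."""
    bos = re.compile(bos_regex)
    pieces = []
    for i, line in enumerate(lines):
        if i > 0:
            # a sentence beginning starts a new line; other segments are joined with \f
            pieces.append("\n" if bos.search(line) else "\f")
        pieces.append(line)
    pieces.append("\n")
    return "".join(pieces)


def unflatten(text):
    """Inverse of flatten, as done by unfla: restore one segment per line."""
    return text.replace("\f", "\n")
```

Under these assumptions, @code{unflatten(flatten(lines))} gives back the original lines, each terminated by a newline.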
1878	@c @menu
1879	@c * con command line options::
1880	@c * con usage example::
1881	@c * con hints::
1882	@c @end menu
1883
1884
1885
1886	@page
1887	@node unfla
1888	@section unfla - the UTT file unflattener
1889
1890	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1891	@item @strong{Authors:} @tab Tomasz Obrêbski
1892	@item @strong{Component category:} @tab filter
1893	@end multitable
1894
1895	@command{unfla} transforms a flattened UTT file, produced by
1896	@command{fla}, into the regular format by restoring end-of-line
1897	characters.
1898
1899
1900
1901
1902	@c ---------------------------------------------------------------------
1903	@c USAGE EXAMPLES
1904	@c ---------------------------------------------------------------------
1905
1906	@node Usage examples
1907	@chapter Usage examples
1908
1909	@subsubheading Simple pipelines
1910
1911	@enumerate
1912
1913	@item tokenization
1914
@example
cat text | tok > output1
@end example
1916
1917	@item morphological annotation (1)
1918
Simple dictionary-based lemmatization:

@example
cat text | tok | lem > output1
@end example
1922
1923	@item morphological annotation (2)
1924
@enumerate
@item perform dictionary-based lemmatization,
@item guess descriptions for words which have no annotation.
@end enumerate
1927
1928	@example
cat text | tok | lem | gue -S lem > output2
1930	@end example
1931
1932	@item morphological annotation (3)
1933
@enumerate
@item perform dictionary-based lemmatization,
@item try to correct words with no annotation,
@item perform dictionary-based lemmatization of corrected words,
@item guess descriptions for words which still have no annotation.
@end enumerate
1938
1939	@example
cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
1941	@end example
1942	@item spelling correction
1943
1944
1945
1946	@example
cat text | tok | lem --only-fail | cor -1 > output3
1948	@end example
1949
@item expression extraction

Extraction of all occurrences of a verb followed by a form of the noun @samp{rozmowa}.

@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
1956	@end example
1957
@item a word in context

Extraction of text fragments containing a form of the lexeme @samp{rozmowa} in
the context of 5 preceding and 5 following corpus segments.

@example
cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
1965	@end example
1966
1967	@item generation of concordance table (1)
1968
1969	@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
1971	@end example
1972
Running time: 10 seconds.
1974
1975	@item generation of concordance table (2)
1976
The same as above, but much faster:

@example
cat text | tok | lem -1 | \
grp -e 'cat(<V>) space lexeme(rozmowa)' | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | \
1983	con
1984	@end example
1985
Running time: 2 seconds.
1987
1988	@item generation of concordance table (3)
1989
Usually, one searches the same corpus repeatedly. In such a case it is
advisable to transform the corpus data into the format required by
@command{grp} first, and then use the preprocessed data.

As @command{grp} (@command{grep}) processes data faster than it can be
read from the disk drive, the search time can be shortened further by
compressing the data. We suggest using @command{lzop}.
1997
1998	@item the fastest way to search a large corpus
1999
2000	step 1: preprocessing
2001
2002	@example
cat corpus | tok | sen | lem -1 \
| grp -a p | lzop -7 > corpus.grp.lzo
2005	@end example
2006
2007	step 2: search
2008
2009	@example
lzop -cd corpus.grp.lzo | \
grp -a gP -e 'cat(<V>) space lexeme(rozmowa)' | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2012	@end example
2013
2014	@end enumerate
2015
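The final @samp{sort -m} step merges three result streams, each already ordered by offset, into one. The same merge can be sketched in Python with @code{heapq.merge}. The sketch assumes that each stream is a list of UTT lines whose first field is the segment offset, and the helper name @code{merge_streams} is hypothetical. Unlike @samp{sort -m}, which compares whole lines as text, it compares offsets numerically.

```python
import heapq


def merge_streams(*streams):
    """Merge UTT line lists, each sorted by offset, into one offset-ordered list."""
    # heapq.merge is lazy and keeps lines with equal offsets in stream order
    return list(heapq.merge(*streams, key=lambda line: int(line.split()[0])))
```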
2016	@subsubheading More complicated configurations
2017
2018
2019	@example
2020	mknod fifo1 p
2021	mknod fifo2 p
2022	mknod fifo3 p
2023	mknod fifo4 p
2024	mknod fifo5 p
2025
tok | lem -p W -e fifo1 > fifo2 &
cor -e fifo3 < fifo1 | lem > fifo4 &
2028	gue < fifo3 > fifo5 &
2029	sort -m fifo2 fifo4 fifo5
2030
2031	rm fifo?
2032	@end example
2033
2034
2035	@c ---------------------------------------------------------------------
2036	@c ---------------------------------------------------------------------
2037
2038	@c ---------------------------------------------------------------------
2039	@c PMDBF DICTIONARY
2040	@c ---------------------------------------------------------------------
2041
2042	@node PMDBF dictionary
2043	@chapter PMDBF dictionary
2044
UTT components come with lexical data derived from the Polish
Morphological Database (PMDB).
2047
2048	@menu
2049	* PMDBF files::
2050	* PMDBF tag structure::
2051	* PMDBF parts of speech::
2052	* PMDBF morphosyntactic attributes::
2053	@end menu
2054
2055	@node PMDBF files
2056	@section Files
2057
2058	@node PMDBF tag structure
2059	@section Tag structure
2060
@example
pos   = [[:upper:]]+
attr  = [[:upper:]]+
val   = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
descr = pos ( / ( attr val + ) + ) ?
@end example
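The grammar above translates almost literally into a Python regular expression, which can be used to check tags for well-formedness. In this sketch the POSIX classes @code{[:upper:]}, @code{[:lower:]} and @code{[:digit:]} are approximated by ASCII ranges. This is an assumption that ignores locale-specific letters. The names are hypothetical.

```python
import re

POS = r"[A-Z]+"
ATTR = r"[A-Z]+"
VAL = r"(?:[a-z0-9?!*+-]|<[^>\n]+>)"
# descr = pos ( / ( attr val + ) + ) ?
DESCR = re.compile(POS + r"(?:/(?:" + ATTR + VAL + r"+)+)?")


def is_valid_tag(tag):
    """True if tag is a well-formed PMDBF description."""
    return DESCR.fullmatch(tag) is not None
```

For instance, @samp{N/GaNsCn} and a bare part of speech such as @samp{ADJ} are accepted. @samp{ADJ/}, with an empty attribute list, is rejected.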

@node PMDBF parts of speech
@section Parts of speech

@multitable {ADJPRP} { adjectival-passive-participle }
@item @code{N} @tab noun
@item @code{NPRO} @tab nominal-pronoun
@item @code{NV} @tab deverbal-noun
@item @code{V} @tab verb
@item @code{BYC} @tab the verb @emph{by@'c} (to be)
@item @code{VNI} @tab non-inflected-verb
@item @code{ADJ} @tab adjective
@item @code{ADJPAP} @tab adjectival-passive-participle
@item @code{ADJPRP} @tab adjectival-present-participle
@item @code{ADJPP} @tab adjectival-past-participle
@item @code{ADJPRO} @tab adjectival-pronoun
@item @code{ADJNUM} @tab adjectival-numeral
@item @code{ADV} @tab adverb
@item @code{ADVANP} @tab adverbial-anterior-participle
@item @code{ADVPRP} @tab adverbial-present-participle
@item @code{ADVPRO} @tab adverbial-pronoun
@item @code{ADVNUM} @tab adverbial-numeral
@item @code{P} @tab preposition
@item @code{PPRO} @tab prep-noun-pronoun
@item @code{CONJ} @tab conjunction
@item @code{EXCL} @tab exclamation
@item @code{APP} @tab call
@item @code{ONO} @tab onomatopoeia
@item @code{PART} @tab particle
@item @code{NUMCRD} @tab cardinal-numeral
@item @code{NUMCOL} @tab collective-numeral
@item @code{NUMPAR} @tab partitive-numeral
@item @code{NUMORD} @tab ordinal-numeral
@end multitable
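When reading PMDBF tags in a script it can be handy to spell out the
part-of-speech code. The following is a minimal sketch, assuming
well-formed tags; the helper name @code{pmdbf_pos} is hypothetical, the
names it prints are copied from the table above (in ASCII), and the
sample tag is constructed for illustration.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not part of UTT): print the name
# of the part of speech that opens a PMDBF tag, i.e. the text before the
# first "/". Names follow the "Parts of speech" table of this manual.
pmdbf_pos() (
  pos=$(printf '%s\n' "$1" | cut -d/ -f1)
  case "$pos" in
    N)      echo noun ;;
    NPRO)   echo nominal-pronoun ;;
    NV)     echo deverbal-noun ;;
    V)      echo verb ;;
    BYC)    echo 'byc (to be)' ;;
    VNI)    echo non-inflected-verb ;;
    ADJ)    echo adjective ;;
    ADJPAP) echo adjectival-passive-participle ;;
    ADJPRP) echo adjectival-present-participle ;;
    ADJPP)  echo adjectival-past-participle ;;
    ADJPRO) echo adjectival-pronoun ;;
    ADJNUM) echo adjectival-numeral ;;
    ADV)    echo adverb ;;
    ADVANP) echo adverbial-anterior-participle ;;
    ADVPRP) echo adverbial-present-participle ;;
    ADVPRO) echo adverbial-pronoun ;;
    ADVNUM) echo adverbial-numeral ;;
    P)      echo preposition ;;
    PPRO)   echo prep-noun-pronoun ;;
    CONJ)   echo conjunction ;;
    EXCL)   echo exclamation ;;
    APP)    echo call ;;
    ONO)    echo onomatopoeia ;;
    PART)   echo particle ;;
    NUMCRD) echo cardinal-numeral ;;
    NUMCOL) echo collective-numeral ;;
    NUMPAR) echo partitive-numeral ;;
    NUMORD) echo ordinal-numeral ;;
    # Unknown code: report on stderr; exit leaves only the subshell body.
    *)      echo "unknown part of speech: $pos" >&2; exit 1 ;;
  esac
)

# Example (tag constructed for illustration): prints "verb".
pmdbf_pos 'V/AiVpMdTrNsP3'
```

Unknown codes are reported on standard error with a non-zero exit
status, so the helper can also be used in shell conditionals.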

@node PMDBF morphosyntactic attributes
@section Morphosyntactic attributes

@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@c @headitem Attr @tab Val @tab Description
@item
@code{A} @tab @tab Aspect
@item
@tab @code{p} @tab perfective
@item
@tab @code{i} @tab imperfective
@item
@item
@code{V} @tab @tab Verb form
@item
@tab @code{b} @tab infinitive
@item
@tab @code{p} @tab personal
@item
@tab @code{i} @tab impersonal
@item
@item
@code{M} @tab @tab Mood
@item
@tab @code{d} @tab declarative
@item
@tab @code{c} @tab conditional
@item
@tab @code{i} @tab imperative
@item
@item
@code{T} @tab @tab Tense
@item
@tab @code{a} @tab past
@item
@tab @code{r} @tab present
@item
@tab @code{f} @tab future
@item
@item
@code{P} @tab @tab Person
@item
@tab @code{1} @tab first
@item
@tab @code{2} @tab second
@item
@tab @code{3} @tab third
@item
@item
@code{D} @tab @tab Degree
@item
@tab @code{p} @tab positive
@item
@tab @code{c} @tab comparative
@item
@tab @code{s} @tab superlative
@item
@item
@code{N} @tab @tab Number
@item
@tab @code{s} @tab singular
@item
@tab @code{p} @tab plural
@item
@item
@code{C} @tab @tab Case
@item
@tab @code{n} @tab nominative
@item
@tab @code{g} @tab genitive
@item
@tab @code{d} @tab dative
@item
@tab @code{a} @tab accusative
@item
@tab @code{i} @tab instrumental
@item
@tab @code{l} @tab locative
@item
@tab @code{v} @tab vocative
@item
@item
@code{G} @tab @tab Gender
@item
@tab @code{p} @tab masculine-personal
@item
@tab @code{a} @tab masculine-animal
@item
@tab @code{i} @tab masculine-inanimate
@item
@tab @code{f} @tab feminine
@item
@tab @code{n} @tab neuter
@end multitable
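The attribute/value part of a tag can be decoded in the same spirit.
The following sketch (hypothetical helper @code{pmdbf_attrs}, not a
UTT tool) splits the text after the slash into attribute groups and
looks each value up in the table above, printing one attribute per
line. It assumes a well-formed tag and a @command{grep} that supports
@option{-o} (GNU and BSD @command{grep} do); values written in angle
brackets are printed verbatim rather than decoded, and values not
covered by the table are printed unchanged. The sample tag is
constructed for illustration.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not part of UTT): spell out the
# attribute/value part of a PMDBF tag (everything after the "/"), one
# attribute per line, using the "Morphosyntactic attributes" table.
pmdbf_attrs() (
  set -f  # values may contain * or ?, so disable globbing in this subshell
  printf '%s\n' "$1" | cut -s -d/ -f2- |
    grep -oE '[[:upper:]]+([[:lower:][:digit:]?!*+-]|<[^>]+>)+' |
    while IFS= read -r group; do
      attr=$(printf '%s\n' "$group" | sed 's/^\([[:upper:]]*\).*/\1/')
      vals=$(printf '%s\n' "$group" | sed 's/^[[:upper:]]*//')
      case "$attr" in
        A) name=aspect ;;   V) name=verb-form ;;  M) name=mood ;;
        T) name=tense ;;    P) name=person ;;     D) name=degree ;;
        N) name=number ;;   C) name='case' ;;     G) name=gender ;;
        *) name=$attr ;;
      esac
      case "$vals" in
        *'<'*) echo "$name: $vals"; continue ;;  # <...> value: no decoding
      esac
      out=
      # One value per character, e.g. "pa" -> "p" and "a".
      for v in $(printf '%s' "$vals" | fold -w1); do
        case "$attr$v" in
          Ap) v=perfective ;;   Ai) v=imperfective ;;
          Vb) v=infinitive ;;   Vp) v=personal ;;     Vi) v=impersonal ;;
          Md) v=declarative ;;  Mc) v=conditional ;;  Mi) v=imperative ;;
          Ta) v=past ;;         Tr) v=present ;;      Tf) v=future ;;
          P1) v=first ;;        P2) v=second ;;       P3) v=third ;;
          Dp) v=positive ;;     Dc) v=comparative ;;  Ds) v=superlative ;;
          Ns) v=singular ;;     Np) v=plural ;;
          Cn) v=nominative ;;   Cg) v=genitive ;;     Cd) v=dative ;;
          Ca) v=accusative ;;   Ci) v=instrumental ;; Cl) v=locative ;;
          Cv) v=vocative ;;
          Gp) v=masculine-personal ;;  Ga) v=masculine-animal ;;
          Gi) v=masculine-inanimate ;; Gf) v=feminine ;; Gn) v=neuter ;;
        esac
        out="$out $v"
      done
      echo "$name:$out"
    done
)

# Example (tag constructed for illustration).
pmdbf_attrs 'V/AiVpMdTrNsP3'
```

For the sample tag the sketch prints six lines, from
@code{aspect: imperfective} to @code{person: third}; a tag without a
slash produces no output.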


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------
@c
@c @node Examples
@c @chapter Examples

@c ----------------------------------------------------------------------
@c ----------------------------------------------------------------------

@node GNU Free Documentation License
@chapter GNU Free Documentation License

@c The GNU Free Documentation License.
@center Version 1.2, November 2002

@c This file is intended to be included within another document,
@c hence no sectioning command or @node.

@display
Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@enumerate 0
@item
PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document @dfn{free} in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of ``copyleft'', which means that derivative
works of the document must themselves be free in the same sense. It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does. But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book. We recommend this License
principally for works whose purpose is instruction or reference.

@item
APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License. Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein. The ``Document'', below,
refers to any such manual or work. Any member of the public is a
licensee, and is addressed as ``you''. You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.

A ``Modified Version'' of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A ``Secondary Section'' is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject. (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.) The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The ``Invariant Sections'' are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License. If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant. The Document may contain zero
Invariant Sections. If the Document does not identify any Invariant
Sections then there are none.

The ``Cover Texts'' are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License. A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.

A ``Transparent'' copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters. A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text. A copy that is not ``Transparent'' is called ``Opaque''.

Examples of suitable formats for Transparent copies include plain
@sc{ascii} without markup, Texinfo input format, La@TeX{} input
format, @acronym{SGML} or @acronym{XML} using a publicly available
@acronym{DTD}, and standard-conforming simple @acronym{HTML},
PostScript or @acronym{PDF} designed for human modification. Examples
of transparent image formats include @acronym{PNG}, @acronym{XCF} and
@acronym{JPG}. Opaque formats include proprietary formats that can be
read and edited only by proprietary word processors, @acronym{SGML} or
@acronym{XML} for which the @acronym{DTD} and/or processing tools are
not generally available, and the machine-generated @acronym{HTML},
PostScript or @acronym{PDF} produced by some word processors for
output purposes only.

The ``Title Page'' means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page. For works in
formats which do not have any title page as such, ``Title Page'' means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section ``Entitled XYZ'' means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language. (Here XYZ stands for a
specific section name mentioned below, such as ``Acknowledgements'',
``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
of such a section when you modify the Document means that it remains a
section ``Entitled XYZ'' according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document. These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.

@item
VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.

@item
COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover. Both covers must also clearly and legibly identify
you as the publisher of these copies. The front cover must present
the full title with all words of the title equally prominent and
visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.

@item
MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it. In addition, you must do these things in the Modified Version:

@enumerate A
@item
Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document). You may use the same title as a previous version
if the original publisher of that version gives permission.

@item
List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.

@item
State on the Title page the name of the publisher of the
Modified Version, as the publisher.

@item
Preserve all the copyright notices of the Document.

@item
Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

@item
Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.

@item
Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.

@item
Include an unaltered copy of this License.

@item
Preserve the section Entitled ``History'', Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page. If
there is no section Entitled ``History'' in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.

@item
Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on. These may be placed in the ``History'' section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.

@item
For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.

@item
Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section titles.

@item
Delete any section Entitled ``Endorsements''. Such a section
may not be included in the Modified Version.

@item
Do not retitle any existing section to be Entitled ``Endorsements'' or
to conflict in title with any Invariant Section.

@item
Preserve any Warranty Disclaimers.
@end enumerate

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant. To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled ``Endorsements'', provided it contains
nothing but endorsements of your Modified Version by various
parties---for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version. Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.

@item
COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled ``History''
in the various original documents, forming one section Entitled
``History''; likewise combine any sections Entitled ``Acknowledgements'',
and any sections Entitled ``Dedications''. You must delete all
sections Entitled ``Endorsements.''

@item
COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.

@item
AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

@item
TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers. In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

@item
TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License. Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.

@item
FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
@uref{http://www.gnu.org/copyleft/}.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation. If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.
@end enumerate

@page
@heading ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
Copyright (C) @var{year} @var{your name}.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with@dots{}Texts.'' line with this:

@smallexample
@group
with the Invariant Sections being @var{list their titles}, with
the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node Reporting bugs
@chapter Reporting bugs

Report bugs to @email{obrebski@@amu.edu.pl}.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c @node Copyright
@c @chapter Copyright
@c
@c Copyright 2004 by Tomasz Obrebski
@c This software is free for research and educational use.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node Author
@chapter Author


@bye