utt.texinfo @ 9ace5d2

Last change on this file since 9ace5d2 was 9ace5d2, checked in by obrebski <obrebski@…>, 17 years ago

a few changes

M app/doc/utt.texinfo
M app/src/dgp/sgraph.hh
M app/src/dgp/const.hh
M app/src/dgp/grammar.hh
M app/src/dgp/thesymbols.hh
M app/src/dgp/dgc
M app/src/dgp/sgraph.cc
M app/src/dgp/grammar.cc

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@63 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

Property mode set to 100644

File size: 82.6 KB

1
2	\input texinfo @c --texinfo--
3	@c @documentencoding ISO-8859-2
4	@documentencoding UTF-8
5	@c @documentlanguage pl
6
7	@c %**start of header
8	@setfilename utt.info
9	@settitle UAM Text Tools v0.90
10	@c %**end of header
11
12	@copying
13	This manual is for UAM Text Tools (version 0.90, October, 2008)
14
15	Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
16
17	Permission is granted to copy, distribute and/or modify this document
18	under the terms of the GNU Free Documentation License, Version 1.2 or
19	any later version published by the Free Software Foundation; with no
20	Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
21	copy of the license is included in the section entitled ``GNU Free
22	Documentation License''.
23
24	@c @quotation
25	@c Permission is granted to ...
26	@c No permission is granted until the document is completed.
27	@c @end quotation
28	@end copying
29
30
31	@titlepage
32	@title UAM Text Tools 0.90 - User Manual
33	@subtitle edition 0.01, @today
34	@subtitle status: prescript
35	@author by Justyna Walkowska, Tomasz Obrębski and Michał Stolarski
36	@page
37	@vskip 0pt plus 1filll
38	@insertcopying
39	@end titlepage
40
41	@contents
42
43	@c @paragraphindent none
44
45	@iftex
46	@tex
47	% \usepackage[T1]{fontenc}
48	% \usepackage[utf8]{inputenc}
49	% \usepackage{times}
50	@end tex
51
52	@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
53	@end iftex
54	@c @headings off
55	@c @everyheading LEM(1) @| @| LEM(1)
56	@everyfooting @today @c @| @thispage @|
57
58	@ifnottex
59
60	@node Top
61	@top UTT - UAM Text Tools
62
63	@insertcopying
64
65	@menu
66	* General information::
67	* UTT file format::
68	* Configuration files::
69	* UTT components::
70	* Auxiliary tools::
71	* Usage examples::
72	* PMDBF dictionary::
73	@c * Examples::
74	@c * Copyright::
75	* GNU Free Documentation License::
76	* Reporting bugs::
77	* Author::
78	@end menu
79	@end ifnottex
80
81
82	@c ----------------------------------------------------------------------
83
84	@node General information
85	@chapter General information
86
87	UAM Text Tools (UTT) is a package of language processing tools
88	developed at Adam Mickiewicz University. Its functionality includes:
89
90	@itemize @bullet
91
92	@item
93	tokenization
94	@item
95	dictionary-based morphological analysis
96	@item
97	heuristic morphological analysis of unknown words
98	@item
99	spelling correction
100	@item
101	pattern search
102	@item
103	sentence splitting
104	@item
105	generation of concordance tables
106	@end itemize
107
108	The toolkit is intended for processing raw (unannotated),
109	unrestricted text for any conceivable purpose.
110
111	The system is organized as a collection of command-line programs, each
112	performing one operation, e.g. tokenization, lemmatization, spelling
113	correction. The components are independent of one another; the
114	unifying element is the uniform i/o file format.
115
116	The components may be combined in different ways to provide various text
117	processing services. New components supplied by the user may also be
118	easily incorporated into the system, provided that they respect the i/o
119	file format conventions.
120
121	UTT component programs do not depend on any specific tagset or
122	morphological description format.
123
124	UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
125	the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
126
127	The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use.
128
129
130	List of contributors:
131
132	@itemize
133	@item Paweł Konieczka
134	@item Tomasz Obrębski
135	@item Michał Stolarski
136	@item Marcin Walas
137	@item Justyna Walkowska
138	@item Paweł Wereński
139	@end itemize
140
141	@c ----------------------------------------------------------------------
142	@c ---------------------------------------------------------------------
143
144	@node UTT file format
145	@chapter UTT file format
146
147	A UTT file contains annotation of a text. It consists of a sequence of
148	segments. Each segment explicitly refers to a continuous piece of the
149	text and provides some information on it.
150
151	@section Segment format
152
153	A segment occupies one line of a UTT file and consists of
154	space-separated fields:
155
156
157	@quotation
158	@sp 1
159	[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
160	@sp 1
161	@end quotation
162
163	@table @var
164
165	@item @var{start}
166	Non-negative integer value indicating the position in the source text where the
167	segment starts.
168
169	@item @var{length}
170	Non-negative integer value indicating the length of the segment.
171
172	@item @var{type}
173	A sequence of non-blank characters (without spaces or digits, which could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
174	@var{type} reflects the main classification of segments -
175	into words, numbers, punctuation marks, meta-text markers.
176	@xref{tok output,,tok output}, for a description of the automatically recognized type markers.
177
178	@item @var{form}
179	This field contains the textual form of the segment or the special
180	symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
181
182	The characters or character sequences that have special meaning in the
183	@var{form} field are enumerated below.
184
185	Characters with special meaning:
186
187	@itemize
188	@item @code{_} - space character
189	@item @code{*} - undefined contents
190	@end itemize
191
192	Escape sequences:
193
194	@itemize
195	@item @code{\n} - new line
196	@item @code{\t} - tabulation
197	@item @code{\r} - carriage return
198
199	@item @code{\_} - the @code{_} character
200	@item @code{\*} - the @code{*} character
201	@item @code{\\} - the @code{\} character
202
203	@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
204	@end itemize
205
206	@item @var{annotation1}
207	@item @var{annotation2}
208	@item ...
209	Annotation fields have the following format:
210
211	@var{longname} @code{:} @var{value}
212
213	or
214
215	@var{shortname} @var{value}
216
217	where @var{longname} is a string of alphanumeric characters
218	(isalnum() test), @var{shortname} - a single non-alphanumeric character
219	(ispunct() test), and @var{value} is an arbitrary string of non-blank characters.
220
221	@end table
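
The special characters and escape sequences of the @var{form} field can be decoded mechanically. The following Python sketch is an illustration only and not part of UTT: the function name @code{decode_form} is hypothetical, and the sketch assumes that a backslash followed by any character other than @code{n}, @code{t} or @code{r} simply stands for that character.

```python
def decode_form(field):
    """Decode a UTT form field into the text it stands for."""
    if field == "*":
        return None  # the form is not given
    named = dict(n="\n", t="\t", r="\r")
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            # \n, \t and \r are named escapes; any other escaped
            # character (for example \_ or \\) stands for itself.
            out.append(named.get(nxt, nxt))
            i += 2
        elif ch == "_":
            out.append(" ")  # an unescaped underscore encodes a space
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, the form @code{_} decodes to a single space, while @code{\_} decodes to a literal underscore.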
222
223
224	Only two fields are mandatory: @var{type} and @var{form}. All other fields
225	may be absent. In the case when only one number precedes the
226	@var{type} field, it is interpreted as the @var{start} position.
227
228	If the @var{length} field is omitted, the length of the segment is the
229	length of the @var{form} field, except when the value of the
230	@var{form} field is @code{*} -- in this case, the length is assumed to
231	be 0.
232
233	If the @var{start} field is also absent, the segment is assumed to directly
234	follow the preceding one.
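
The defaulting rules above can be sketched in Python as follows. This is a hedged illustration, not UTT code: the names @code{text_length} and @code{resolve} are hypothetical, and the sketch assumes that an escape sequence in the @var{form} field counts as one character of text.

```python
def text_length(form):
    """Number of text characters a form field stands for ("*" covers none)."""
    if form == "*":
        return 0
    count = 0
    i = 0
    while i < len(form):
        # Assumption: an escape sequence such as \n counts as one character.
        i += 2 if form[i] == "\\" else 1
        count += 1
    return count


def resolve(lines):
    """Return (start, length, type, form, annotations) with defaults filled in."""
    segments = []
    next_start = 0
    for line in lines:
        fields = line.split()
        numbers = []
        # At most two leading integers: start, then length.
        while len(numbers) < 2 and fields and fields[0].isdigit():
            numbers.append(int(fields.pop(0)))
        seg_type, form = fields[0], fields[1]
        # A missing start means the segment directly follows the previous one.
        start = numbers[0] if numbers else next_start
        length = numbers[1] if len(numbers) == 2 else text_length(form)
        segments.append((start, length, seg_type, form, fields[2:]))
        next_start = start + length
    return segments
```

Calling @code{resolve} on the lines @code{0000 BOS *}, @code{W Piszemy}, @code{S _} gives start positions 0, 0 and 7 with lengths 0, 7 and 1.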
235
236	@c Conventions:
237
238	@c Annotation fields with predefined meaning:
239
240	@c @itemize
241	@c @item @code{!} - UTT components are allowed to modify the contents of
242	@c the @var{form} field (e.g. spelling correction does this). If this happens the
243	@c original form of the segment have to be placed in the @code{!}-field.
244	@c @item @code{@@} - morphological description
245	@c @item @code{=} - node identifier assignment (used in graph encoding)
246	@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
247	@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
248	@c @end itemize
249
250	Segments of length 0 may be used to mark file positions with some
251	information. See e.g. BOS and EOS (beginning/end of sentence) markers
252	in the example below.
253
254	Example:
255
256	sentence: @samp{Piszemy dobre progrumy.}
257
258	@example
259	0000 00 BOS *
260	0000 07 W Piszemy lem:pisać,V
261	0007 01 S _
262	0008 05 W dobre lem:dobry,ADJ
263	0013 01 S _
264	0014 08 W progrumy cor:programy lem:program,N
265	0022 01 P .
266	0023 00 EOS *
267	0023 01 S _
268	0024 00 BOS *
269	0024 11 W Warszawiacy lem:Warszawiak,N
270	0035 01 S _
271	0036 03 W też
272	0039 01 P .
273	0040 00 EOS *
274
275	@end example
276
277	@example
278	0000 BOS *
279	0000 W Piszemy lem:pisać,V
280	0007 S _
281	0008 W dobre lem:dobry,ADJ
282	0013 S _
283	0014 W progrumy cor:programy lem:program,N
284	0022 P .
285	0023 EOS *
286	@end example
287
288	Position information may be provided only for some types of segments:
289
290	@example
291	0000 BOS *
292	W Piszemy lem:pisać,V
293	S _
294	W dobre lem:dobry,ADJ
295	S _
296	W progrumy cor:programy lem:program,N
297	P .
298	EOS *
299	S _
300	0024 BOS *
301	W Warszawiacy lem:Warszawiak,N
302	S _
303	W też
304	P .
305	EOS *
306	@end example
307
308	Position/length information may be provided only when necessary:
309
310	@example
311	0000 04 N *
312	0000 N 12
313	P .
314	N 5
315	S _
316	W km
317	@end example
318
319	@section UTT File
320
321	A UTT file consists of a sequence of segments. The same text position
322	may be covered by multiple segments. In consequence, ambiguous text
323	segmentation and ambiguous annotation may be represented.
324
325	There are two structural requirements a valid UTT-formatted file
326	has to meet:
327
328	@itemize @bullet
329
330	@item
331	segments have to be sorted with respect to the @var{start} field,
332
333	@item
334	for each
335	segment ending at position @var{n}, either there must be a segment starting at
336	position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
337	for each segment starting at position @var{n}, either there must be a segment
338	ending at position @var{n-1}, or the position @var{n-1} must not be covered
339	by any segment.
340
341	@end itemize
342
343	A valid annotation for the text fragment
344	@example
345	12.5 km
346	@end example
347
348	may be
349
350	@example
351	0000 02 N 12
352	0000 04 N 12.5
353	0002 01 P .
354	0003 01 N 5
355	0004 01 S _
356	0005 02 W km
357	@end example
358
359	but not
360
361	@example
362	0000 02 N 12
363	0000 04 N 12.5
364	0004 01 S _
365	0005 02 W km
366	@end example
367
368	because in the latter example the first segment (starting at position
369	0000, 2 characters long) ends at position @var{n}=0001, while position
370	@var{n+1}=0002 is covered by the second segment and no segment starts at
371	position 0002.
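
The two structural requirements can be checked with a short Python sketch. It is an illustration under stated assumptions, not part of UTT: the name @code{is_valid} is hypothetical, segments are given as (start, length) pairs, and zero-length segments are assumed to cover no position and are ignored by the second check.

```python
def is_valid(segments):
    """Check the structural requirements on (start, length) pairs in file order."""
    starts = [start for start, _ in segments]
    if starts != sorted(starts):
        return False  # segments must be sorted by start position
    # First and last covered position of every non-empty segment.
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    begin = set(first for first, _ in spans)
    end = set(last for _, last in spans)

    def covered(pos):
        return any(first <= pos <= last for first, last in spans)

    for first, last in spans:
        # After a segment ends at n, n+1 must start a segment or be uncovered.
        if last + 1 not in begin and covered(last + 1):
            return False
        # Before a segment starts at n, n-1 must end a segment or be uncovered.
        if first - 1 not in end and covered(first - 1):
            return False
    return True
```

Applied to the two annotations of @samp{12.5 km} above, the sketch accepts the first and rejects the second.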
372
373
374	@section Flattened UTT file
375
376	The UTT file format has two variants: regular and flattened. The regular
377	format was described above. In the flattened format some of the
378	end-of-line characters are replaced with line-feed characters.
379
380	The flattened format is mainly used to represent whole sentences as
381	single lines of the input file (all intrasentential end-of-line
382	characters are replaced with line-feed characters).
383
384	This technical trick makes it possible to perform certain text
385	processing operations on entire sentences with the use of such tools as
386	@command{grep} (see @command{grp} component) or @command{sed} (see @command{mar} component).
387
388	The conversion between the two formats is performed by the tools:
389	@command{fla} and @command{unfla}.
390
391	@section Character encoding
392
393	The UTT component programs accept only 1-byte character encodings, such
394	as ISO 8859, ANSI or DOS code pages.
395
396
397	@c @section Formats
398
399	@c @unnumberedsubsubsec Basic format
400
401	@c While processing large amounts of the overhead related with explicit
402	@c ... of the start position and segment length becomes ... . Therefore,
403	@c for efficiency reasons certain shortcuts are possible:
404
405	@c @unnumberedsubsubsec Relative start position
406
407	@c Start position may be given as relative distance from the last
408	@c absolut position.
409
410	@c @unnumberedsubsubsec Absent length
411
412	@c Segment length may by omitted. Normally it can be restored by counting
413	@c the length of the @emph{form field}. For segments with the special value
414	@c @code{*} in the @emph{form field} length 0 is assumed.
415
416	@c @unnumberedsubsubsec Absent length and start position
417
418	@c Both start position and segment length may be omitted. In this format
419	@c each segment is assumed to follow the previous one. This format is,
420	@c therefore, suitable only for unambiguously tagged text
421	@c (0-length markers can be still used.)
422
423
424	@c @table @code
425	@c @item AL
426	@c @code{1234 03 W kot}
427	@c @item RL
428	@c @code{+56 03 W kot}
429	@c @item A
430	@c @code{1234 W kot}
431	@c @item R
432	@c @code{+56 W kot}
433	@c @item 0
434	@c @code{W kot}
435	@c @end table
436
437
438	@c [HOW TO GET POLISH FONTS IN DVI???]
439
440	@macro parhelp
441	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
442	Print help.
443	@end macro
444
445
446	@macro parversion
447	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
448	Print version information.
449	@end macro
450
451	@macro parinteractive
452	@item @b{@minus{}@minus{}interactive, @minus{}i}
453	This option toggles interactive mode, which is by default off. In the
454	interactive mode the program does not buffer the output.
455	@end macro
456
457
458	@c @macro parfile
459	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
460	@c Input file name.
461	@c If this option is absent or equal to '@minus{}', the program
462	@c reads from the standard input.
463	@c @end macro
464
465
466	@c @macro paroutput
467	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
468	@c Regular output file name. To regular output the program sends segments
469	@c which it successfully processed and copies those which were not
470	@c subject to processing. If this option is absent or equal to
471	@c '@minus{}', standard output is used.
472	@c @end macro
473
474	@c @macro parfail
475	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
476	@c Fail output file name. To fail output the program copies the segments
477	@c it failed to process. If this option is absent or equal to
478	@c '@minus{}', standard output is used.
479	@c @end macro
480
481
482	@c @macro parcopy
483	@c @item @b{@minus{}@minus{}copy, @minus{}c}
484	@c Copy successfully processed segments to regular output also in their
485	@c original input form.
486	@c @end macro
487
488
489	@macro parinputfield
490	@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
491	The field containing the input to the program. The default is the
492	@var{form} field. The fields @var{position}, @var{length}, @var{type},
493	and @var{form} are referred to as @code{1}, @code{2}, @code{3},
494	@code{4}, respectively.
495	@end macro
496
497
498	@macro paroutputfield
499	@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
500	The name of the field added by the program. The default is the name of the program.
501	@end macro
502
503
504	@macro pardictionary
505	@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
506	Dictionary file name.
507	@end macro
508
509
510	@macro parprocess
511	@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
512	Process segments with the specified value in the @var{type} field.
513	Multiple occurrences of this option are allowed and are interpreted as
514	disjunction. If this option is absent, all segments are processed.
515	@end macro
516
517
518	@macro parselect
519	@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
520	Select for processing only segments in which the field named
521	@var{fieldname} is present. Multiple occurrences of this option are
522	allowed and are interpreted as conjunction of conditions. If this
523	option is absent, all segments are processed.
524	@end macro
525
526
527	@macro parunselect
528	@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
529	Select for processing only segments in which the field @var{fieldname}
530	is absent. Multiple occurrences of this option are allowed and are
531	interpreted as conjunction of conditions. If this option is absent,
532	all segments are processed.
533	@end macro
534
535
536	@macro paroneline
537	@item @b{@minus{}@minus{}one-line}
538	This option makes the program print ambiguous annotation in one output
539	line by generating multiple annotation fields. By default, when
540	ambiguous annotation may be produced for a segment, the segment is
541	repeated and each of the annotations is added to a separate copy of
542	the segment.
543	@end macro
544
545
546	@macro paronefield
547	@item @b{@minus{}@minus{}one-field, @minus{}1}
548	This option makes the program print ambiguous annotation in one
549	annotation field. By default, when ambiguous annotation may be produced
550	for a segment, the segment is repeated and each of the
551	annotations is added to a separate copy of the segment.
552
553	This option is useful when working with @command{kot} or @command{con}.
554	@end macro
555
556
557	@c ---------------------------------------------------------------------
558	@c CONFIGURATION FILES
559	@c ---------------------------------------------------------------------
560
561	@node Configuration files
562	@chapter Configuration files
563
564	Values for all command line options accepted by a component
565	may be set in configuration files. The default locations of the
566	configuration files for a component named @command{@var{program}} are
567
568	@example
569	@file{/usr/local/etc/utt/@var{program}.conf}
570	@end example
571
572	for the system-wide configuration file and
573
574	@example
575	@file{~/.utt/@var{program}.conf}
576	@end example
577
578	for the user configuration file.
579
580	@c The configuration file to load may be also specified with the
581	@c @option{--config} option. Configuration file need not be provided.
582
583	For each option, the value is set according to the following priority (highest first):
584
585	@itemize
586	@item command line
587	@c @item configuration file indicated with @option{--config} option
588	@item user configuration file (or configuration file indicated with the @option{--config} option)
589	@item system-wide configuration file
590	@end itemize
591
592	Parameter values are specified in the following format:
593
594	@var{parametername}=@var{value}
595
596	where @var{parametername} is the short or long name of an option accepted by
597	the program, or
598
599	@var{parametername}
600
601	if the option does not need arguments.
602
603	You can introduce comments to configuration files using the # sign.
604
605	If a program accepts multiple occurrences of an option (e.g. the @option{--select} option of @command{lem}), each occurrence can be given on a separate line of the program's configuration file.
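
The configuration format and the priority order described above can be sketched in Python. This is a hypothetical illustration, not the UTT implementation: the names @code{parse_conf} and @code{effective} are invented, a @code{#} is assumed to start a comment that runs to the end of the line, and the highest-priority source that sets an option is assumed to supply all of its values. The sketch does not map short option names to long ones.

```python
def parse_conf(text):
    """Parse configuration text into a list of (name, value) pairs."""
    pairs = []
    for line in text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        # An option written without '=' takes no argument: value is None.
        pairs.append((name.strip(), value.strip() if sep else None))
    return pairs


def effective(option, command_line, user_conf, system_conf):
    """Values of an option from the highest-priority source that sets it."""
    for source in (command_line, user_conf, system_conf):
        values = [value for name, value in source if name == option]
        if values:
            return values
    return []
```

With a user file containing two @code{select=} lines, @code{effective("select", ...)} returns both values unless the option is also given on the command line.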
606
607	@c The equal sign may be omitted.
608
609
610	@quotation Tip
611	If you have two (or more) frequently used sets of options for the same
612	program (e.g. lem with the PMDBF dictionary and lem with a user dictionary),
613	a good solution is to create two soft links to lem, called
614	e.g. lemg and lemu, and specify their configuration in the files lemg.conf
615	and lemu.conf respectively.
616	@end quotation
617
618	@c ---------------------------------------------------------------------
619	@c COMPONENTS
620	@c ---------------------------------------------------------------------
621
622	@node UTT components
623	@chapter UTT components
624
625	UTT components are of three types:
626
627	@menu
628	Sources: programs which read non-UTT data (e.g. raw text) and produce output
629	in UTT format
630	* tok:: a tokenizer
631
632	Filters: programs which read and produce UTT-formatted data
633	* lem:: a morphological analyzer
634	* gue:: a morphological guesser
635	* cor:: a simple spelling corrector
636	* kor:: a more elaborate spelling corrector
637	* sen:: a sentencizer
638	* ser:: a pattern search tool (marks matches)
639	* mar:: a pattern search tool (introduces arbitrary markers into the text)
640	* grp:: a pattern search tool (selects sentences containing a match)
641	@c * gph:: a word-graph annotation tool::
642	@c * dgp:: a dependency parser
643
644	Sinks: programs which read UTT data and produce output in another format
645	* kot:: an untokenizer
646	* con:: a concordance table generator
647	@end menu
648
649	@c ---------------------------------------------------------------------
650	@c TOK
651	@c ---------------------------------------------------------------------
652
653	@page
654	@node tok
655	@section tok - a tokenizer
656
657	@c ----------------------------------------
658
659	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
660	@item @strong{Authors:} @tab Tomasz Obrębski
661	@item @strong{Component category:} @tab source
662	@item @strong{Input format:} @tab raw text file
663	@item @strong{Output format:} @tab UTT regular
664	@item @strong{Required annotation:} @tab -
665	@end multitable
666
667
668	@menu
669	* tok description::
670	* tok input::
671	* tok output::
672	* tok command line options::
673	* tok example::
674	@end menu
675
676	@node tok description
677	@subsection Description
678
679	@code{tok} is a simple program which reads a text file and identifies
680	tokens on the basis of their orthographic form. The type of the token
681	is printed as the @var{type} field.
682
683	@node tok input
684	@subsection Input
685
686	Raw text.
687
688	@node tok output
689	@subsection Output
690
691	UTT-file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
692
693	@itemize
694
695	@item @code{W}
696	(word)
697	- continuous sequence of letters
698
699	@item @code{N}
700	(number)
701	- continuous sequence of digits
702
703	@item @code{S}
704	(space)
705	- continuous sequence of space characters
706
707	@item @code{P}
708	(punctuation mark)
709	- single printable characters not belonging to any of the other classes
710
711	@item @code{B}
712	(unprintable character)
713	- single unprintable character
714
715	@end itemize
716
717
718
719	@node tok command line options
720	@subsection Command line options
721
722	@table @code
723
724	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
725	Print help.
726
727	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
728	Print version information.
729
730	@item @b{@minus{}@minus{}interactive, @minus{}i}
731	This option toggles interactive mode, which is by default off. In the
732	interactive mode the program does not buffer the output.
733
734	@end table
735
736	@node tok example
737	@subsection Example
738
739	Input:
740
741	@example
742	Piszemy dobre programy.
743	@end example
744
745	Output:
746
747	@example
748	0000 07 W Piszemy
749	0007 01 S _
750	0008 05 W dobre
751	0013 01 S _
752	0014 08 W programy
753	0022 01 P .
754	0023 01 S \n
755	@end example
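
The token classes can be approximated with a few lines of Python. This is only a sketch of the classification described above, not the @command{tok} implementation: the names @code{TOKEN}, @code{ESCAPES}, @code{classify} and @code{tokenize} are hypothetical, positions are character offsets, and a run of mixed whitespace is assumed to form a single @code{S} token.

```python
import re

# Letters, digits, whitespace runs, or any other single character.
TOKEN = re.compile(r"[^\W\d_]+|\d+|\s+|.", re.DOTALL)
# Encoding of special characters in the form field.
ESCAPES = dict([("\\", "\\\\"), ("_", "\\_"), (" ", "_"),
                ("\n", "\\n"), ("\t", "\\t"), ("\r", "\\r")])


def classify(token):
    """Return the type marker for a token."""
    if token.isalpha():
        return "W"
    if token.isdigit():
        return "N"
    if token.isspace():
        return "S"
    return "P" if token.isprintable() else "B"


def tokenize(text):
    """Yield UTT segment lines with start, length, type and form fields."""
    for match in TOKEN.finditer(text):
        token = match.group()
        form = "".join(ESCAPES.get(ch, ch) for ch in token)
        yield "%04d %02d %s %s" % (match.start(), len(token),
                                   classify(token), form)
```

Running @code{tokenize} on the input of the example above, followed by a newline character, yields the seven segments shown in its output.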
756
757
758	@c ---------------------------------------------------------------------
759	@c SEN
760	@c ---------------------------------------------------------------------
761
762	@c @node sen - sentencizer
763	@c @chapter sen - sentencizer
764
765	@c Authors: Tomasz Obrębski
766
767	@c ---------------------------------------------------------------------
768	@c LEM
769	@c ---------------------------------------------------------------------
770
771	@page
772	@node lem
773	@section lem - morphological analyzer
774
775	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
776	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
777	@item @strong{Component category:} @tab filter
778	@item @strong{Input format:} @tab UTT regular
779	@item @strong{Output format:} @tab UTT regular
780	@item @strong{Required annotation:} @tab tok
781	@end multitable
782
783	@menu
784	* lem description::
785	* lem command line options::
786	* lem input::
787	* lem output::
788	* lem example::
789	* lem dictionaries::
790	* lem hints::
791	@end menu
792
793	@node lem description
794	@subsection Description
795
796	@command{lem} performs morphological analysis of a simple orthographic
797	word, returning all its possible morphological annotations,
798	disregarding the context.
799
800	@c ----------------------------------------
801
802	@node lem command line options
803	@subsection Command line options
804
805	@table @code
806	@parhelp
807	@parversion
808	@parinteractive
809	@c @parfile
810	@c @paroutput
811	@c @parfail
812	@c @parcopy
813	@parinputfield
814	@paroutputfield
815	@pardictionary
816	@parprocess
817	@parselect
818	@parunselect
819	@paroneline
820	@paronefield
821	@end table
822
823	@c ----------------------------------------
824
825	@node lem input
826	@subsection Input
827
828	@command{lem} reads a UTT file and processes the value of the @var{form} field
829	(the input field may be changed with the @option{--input-field} option).
830
831	@node lem output
832	@subsection Output
833
834	@command{lem} adds a new annotation field, whose default name is @code{lem}. In
835	case of ambiguity, either the segment is repeated once per annotation (default),
836	multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
837	annotation is produced as the value of a single @code{lem} field (option
838	@option{--one-field,-1}):
839
840	@itemize @bullet
841
842	@item
843	unambiguous value format:
844
845	@example
846	<lemma>,<descr>
847	@end example
848
849	@item
850	ambiguous value format (@option{--one-field} option)
851
852
853	@example
854	<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
855	@end example
856
857	(alternative descriptions for the same lemma are separated by commas,
858	alternative lemmata are separated by semicolons.)
859
860	@end itemize
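
A value in the ambiguous format can be split back into individual readings. The Python sketch below is illustrative only: the name @code{parse_lem_value} is hypothetical, and it assumes that lemmas and descriptions contain neither commas nor semicolons.

```python
def parse_lem_value(value):
    """Split a lem annotation value into (lemma, description) pairs."""
    readings = []
    # Alternative lemmata are separated by semicolons.
    for alternative in value.split(";"):
        # The first comma-separated item is the lemma, the rest are
        # alternative descriptions for that lemma.
        lemma, *descriptions = alternative.split(",")
        for descr in descriptions:
            readings.append((lemma, descr))
    return readings
```

For the value @code{dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn} the sketch returns two readings, both with the lemma @samp{dobry}.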
861
862	@node lem example
863	@subsection Example
864
865	Input:
866
867	@example
868	0000 07 W Piszemy
869	0007 01 S _
870	0008 05 W dobre
871	0013 01 S _
872	0014 08 W programy
873	0022 01 P .
874	0023 01 S \n
875	@end example
876
877	Output (default):
878
879	@example
880	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
881	0007 01 S _
882	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
883	0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
884	0013 01 S _
885	0014 08 W programy lem:program,N/GiNpCa
886	0014 08 W programy lem:program,N/GiNpCn
887	0014 08 W programy lem:program,N/GiNpCv
888	0022 01 P .
889	0023 01 S \n
890	@end example
891
892	Output (@option{--one-line} option):
893
894	@example
895	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
896	0007 01 S _
897	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
898	0013 01 S _
899	0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
900	0022 01 P .
901	0023 01 S \n
902	@end example
903
904	Output (@option{--one-field} option):
905
906	@example
907	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
908	0007 01 S _
909	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
910	0013 01 S _
911	0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
912	0022 01 P .
913	0023 01 S \n
914	@end example
915
916	@c ----------------------------------------
917
918	@node lem dictionaries
919	@subsection Dictionaries
920
921	@command{lem} requires a dictionary. The dictionary may be provided in
922	one of two formats: in text (source) format or in binary (fsa) format.
923
924	@subsubheading Text format
925
926	Dictionary entries have the following structure:
927
928	@example
929	<form>;<lemma>,<descr>[;<lemma>,<descr>]
930	@end example
931
932	@var{lemma} may be given explicitly or in the cut-add format:
933
934	@example
935	@code{[<cut1><add1>-]<cut2><add2>}
936	@end example
937
938	meaning: replace the prefix of length @code{<cut1>} with the
939	string @code{<add1>}, then replace the suffix of length @code{<cut2>} with the string
940	@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
941	@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
942
943	Each dictionary entry must be written in one line and must not contain blank characters.
944
945	Examples:
946	@example
947	kot;0,N/GaNsCn
948	kota;1,N/GaNsCg;1,N/GaNsCa
949	kotu;1,N/GaNsCd
950	kotem;2,N/GaNsCi
951	kocie;3t,N/GaNsCl;3t,N/GaNsCv
952	najbielsi;3-4ały,ADJ/DsNpCnGp
953	najbielsze;3-5ały,ADJ/DsNpCnGaifn
954	najlepsi;dobry,ADJ/DsNpCnGp
955	najlepsze;dobry,ADJ/DsNpCnGaifn
956	@end example
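
The cut-add notation can be applied with a few lines of Python. This is a minimal sketch, not the code used by @command{lem}: the names @code{CUT_ADD} and @code{lemma_of} are hypothetical, and the sketch assumes that an explicitly given lemma never starts with a digit and that @code{<add1>} contains no hyphen.

```python
import re

# Optional "<cut1><add1>-" prefix rule, then the "<cut2><add2>" suffix rule.
CUT_ADD = re.compile(r"(?:(\d+)([^-]*)-)?(\d+)(.*)")


def lemma_of(form, spec):
    """Apply a lemma specification from a dictionary entry to a word form."""
    match = CUT_ADD.fullmatch(spec)
    if match is None:
        return spec  # the lemma is given explicitly
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        form = add1 + form[int(cut1):]  # replace the prefix
    # Slice by computed length so that a cut of 0 keeps the whole form.
    return form[:len(form) - int(cut2)] + add2
```

With the entries above, @code{lemma_of("kocie", "3t")} gives @samp{kot} and @code{lemma_of("najbielsze", "3-5ały")} gives @samp{biały}.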
957
958
959	The mandatory file name extension for a text dictionary is @code{dic}. For large
960	dictionaries it is preferable, however, to compile them into binary
961	(fsa) format.
962
963	@subsubheading Binary format
964
965	The mandatory file name extension for a binary dictionary is @code{bin}. To
966	compile a text dictionary into binary format, write:
967
968	@example
969	compiledic <dictionaryname>.dic
970	@end example
971
972	@subsubheading Polex/PMDBF dictionary
973
974	A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
975	the distribution as the default dictionary for @command{lem}. It is
976	located by default in:
977
978	@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}
979
980	in local installation or in
981
982	@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}
983
984	in system installation.
985
986	@node lem hints
987	@subsection Hints
988
989	@subsubheading Combining data from multiple dictionaries
990
991	@itemize
992
993	@item Apply <dict1>, then apply <dict2> to words which were not annotated.
994
995	@example
996	lem -d <dict1> | lem -S lem -d <dict2>
997	@end example
998
999	@item Add annotations from two dictionaries <dict1> and <dict2>.
1000
1001	@example
1002	lem -c -d <dict1> | lem -S lem -d <dict2>
1003	@end example
1004
1005	@end itemize
1006
1007
1008	@c ---------------------------------------------------------------------
1009	@c GUE
1010	@c ---------------------------------------------------------------------
1011
1012	@page
1013	@node gue
1014	@section gue - morphological guesser
1015
1016	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1017
1018	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
1019	@item @strong{Component category:} @tab filter
1020
1021	@end multitable
1022
1023	@menu
1024	* gue description::
1025	* gue command line options::
1026	* gue example::
1027	* gue dictionaries::
1028	@end menu
1029
1030
1031	@node gue description
1032	@subsection Description
1033
1034	@command{gue} guesses morphological descriptions of the form contained
1035	in the @var{form} field.
1036
1037
1038	@node gue command line options
1039	@subsection Command line options
1040
1041	@table @code
1042
1043	@parhelp
1044	@parversion
1045	@parinteractive
1046	@c @parfile
1047	@c @paroutput
1048	@c @parfail
1049	@c @parcopy
1050	@parinputfield
1051	@paroutputfield
1052	@pardictionary
1053	@parprocess
1054	@parselect
1055	@parunselect
1056	@paroneline
1057	@paronefield
1058
1059	@item @b{@minus{}@minus{}delta=@var{n}}
1060	Stop displaying answers after a fall in weight, that is, when the weight difference between two subsequent results is greater than the delta value (default=`0.2').
1061
1062
1063	@item @b{@minus{}@minus{}cut-off=@var{n}}
1064	Do not display answers with a weight lower than the cut-off value (default=`200').
1065
1066
1067	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
1068	Guess up to @var{n} descriptions (default=`0', which means `display all results').
1069
1070
1071
1072	@end table
1073
1074	@node gue example
1075	@subsection Example
1076
1077	@example
1078	command: gue -n 2
1079
1080	input:
1081	0000 07 W smerfny
1082
1083	output:
1084	0000 07 W smerfny gue:,ADJ/CaDpGiNs
1085	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
1086	@end example
1087
1088
1089	@node gue dictionaries
1090	@subsection Dictionaries
1091
1092	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
1093	The fsa format is created by compiling text-format dictionaries.
1094
1095
1096
1097	@subsubheading Text format
1098
1099	Dictionary entries have the following structure:
1100
1101	@example
1102	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
1103	@end example
1104
1105	@var{lemma} must be given in the cut-add format:
1106
1107	@example
1108	@code{[<cut1><add1>-]<cut2><add2>}
1109	@end example
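1110	(no spaces in between): replace the prefix of length @var{cut1} with the
1111	string @var{add1}, and replace the suffix of length @var{cut2} with the string
1112	@var{add2}.
1113	
1114	
1115	Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały} (the 3-character prefix @i{naj} is removed and the 4-character suffix @i{elsi} is replaced with @i{ały}).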
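
The cut-add transformation can be sketched in Python as follows. This is
only an illustration of the format described above, not the code used by
@command{gue}: the name @code{apply_cut_add} is hypothetical, and the sketch
assumes that @var{add1} contains neither digits nor a hyphen.

@example
import re

# Hypothetical helper illustrating the code [<cut1><add1>-]<cut2><add2>.
# Assumption: add1 contains neither digits nor '-'.
CUT_ADD = re.compile(r"^(?:(\d+)([^\d-]*)-)?(\d+)(.*)$")

def apply_cut_add(form, code):
    match = CUT_ADD.match(code)
    if match is None:
        raise ValueError("not a cut-add code: " + code)
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        # replace the prefix of length cut1 with add1
        form = add1 + form[int(cut1):]
    # replace the suffix of length cut2 with add2
    return form[:len(form) - int(cut2)] + add2

print(apply_cut_add("najbielsi", "3-4ały"))
@end example

The call prints @code{biały}. A code without the prefix part, such as
@code{1a}, only rewrites the end of the form.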
1116
1117
1118	@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
1119
1120	@var{weight} is an integer value between 1 and 999 indicating the
1121	likelihood of the guess.
1122
1123	@c @example
1124	@c *ÅkÄ;1a,N/GfNsCa
1125	@c naj*elszy;3-4aÅy,ADJ/...:...
1126	@c @end example
1127
1128
1129	@c ---------------------------------------------------------------------
1130	@c COR
1131	@c ---------------------------------------------------------------------
1132
1133	@page
1134	@node cor
1135	@section cor - spelling corrector
1136
1137	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1138	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
1139	@item @strong{Component category:} @tab filter
1140	@item @strong{Input format:} @tab UTT regular
1141	@item @strong{Output format:} @tab UTT regular
1142	@item @strong{Required annotation:} @tab tok
1143	@end multitable
1144
1145	@menu
1146	* cor description::
1147	* cor command line options::
1148	* cor dictionaries::
1149	@end menu
1150
1151
1152	@node cor description
1153	@subsection Description
1154
1155	The spelling corrector applies Kemal Oflazer's dynamic programming
1156	algorithm @cite{oflazer96} to the FSA representation of the set of
1157	word forms of the Polex/PMDBF dictionary. Given an incorrect
1158	word form it returns all word forms present in the dictionary whose
1159	edit distance is smaller than the threshold given as the parameter.
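
The effect of the corrector can be imitated with a brute-force sketch that
computes the plain Levenshtein distance between the input form and every
dictionary word. This is neither Oflazer's algorithm (which works on the
FSA) nor the code of @command{cor}. The function names are hypothetical, and
the threshold is treated as an inclusive maximum, following the wording of
the @option{--distance} option; the exact boundary is an assumption.

@example
def edit_distance(a, b):
    # Classic Levenshtein distance, computed row by row.
    previous = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        current = [i]
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[len(b)]

def corrections(form, words, max_distance=1):
    # Keep every dictionary word within max_distance edits of the form.
    return [w for w in words if edit_distance(form, w) <= max_distance]

words = ["odlot", "odlotowy", "odludek"]
print(corrections("odlut", words))
@end example

With the three-word list above, the misspelled form @i{odlut} yields the
single candidate @i{odlot}.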
1160
1161
1162	@node cor command line options
1163	@subsection Command line options
1164
1165	@table @code
1166
1167	@parhelp
1168	@parversion
1169	@parinteractive
1170	@c @parfile
1171	@c @paroutput
1172	@c @parfail
1173	@c @parcopy
1174	@parinputfield
1175	@paroutputfield
1176	@pardictionary
1177	@parprocess
1178	@parselect
1179	@parunselect
1180	@paroneline
1181	@paronefield
1182
1183	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1184	Maximum edit distance (default='1').
1185
1186	@c @item @b{@minus{}@minus{}replace, @minus{}r}
1187	@c Replace original form with corrected form, place original form in the
1188	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1189
1190
1191	@end table
1192
1193	@node cor dictionaries
1194	@subsection Dictionaries
1195
1196	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
1197	The fsa format is created by compiling text-format dictionaries.
1198
1199	@subsubheading Text format
1200
1201	The @command{cor} dictionary is a list of words:
1202	@example
1203	odlot
1204	odlotowy
1205	odludek
1206	@end example
1207
1208	@subsubheading Binary format
1209
1210	The mandatory file name extension for a binary dictionary is @code{bin}. To
1211	compile a text dictionary into binary format, write:
1212
1213	@example
1214	compiledic <dictionaryname>.dic
1215	@end example
1216
1217	@c ---------------------------------------------------------------------
1218	@c KOR
1219	@c ---------------------------------------------------------------------
1220
1221	@page
1222	@node kor
1223	@section kor - configurable spelling corrector
1224
1225	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1226	@item @strong{Authors:} @tab Paweł Werenski, Tomasz Obrębski, Michał Stolarski
1227	@item @strong{Component category:} @tab filter
1228	@item @strong{Input format:} @tab UTT regular
1229	@item @strong{Output format:} @tab UTT regular
1230	@item @strong{Required annotation:} @tab tok
1231	@end multitable
1232
1233	@menu
1234	* kor description::
1235	* kor command line options::
1236	* kor weights definition file::
1237	* kor dictionaries::
1238	@end menu
1239
1240
1241	@node kor description
1242	@subsection Description
1243
1244	The spelling corrector applies Paweł Werenski's dynamic programming
1245	algorithm to the FSA representation of the set of word forms of the
1246	Polex/PMDBF dictionary. The algorithm is an extension of K. Oflazer's
1247	algorithm used by @command{cor}. In the extended version it is
1248	possible to assign weights to individual edit operations.
1249
1250	Given an incorrect word form it returns all word forms
1251	present in the dictionary whose edit distance is smaller than the
1252	threshold given as the parameter.
1253
1254
1255	@node kor command line options
1256	@subsection Command line options
1257
1258	@table @code
1259
1260	@parhelp
1261	@parversion
1262	@parinteractive
1263	@c @parfile
1264	@c @paroutput
1265	@c @parfail
1266	@c @parcopy
1267	@parinputfield
1268	@paroutputfield
1269	@pardictionary
1270	@parprocess
1271	@parselect
1272	@parunselect
1273	@paroneline
1274	@paronefield
1275
1276	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1277	Maximum edit distance (default='1').
1278
1279	@item @b{@minus{}@minus{}weights=@var{filename}, @minus{}w @var{filename}}
1280	Edit operations' weights file.
1281
1282	@c @item @b{@minus{}@minus{}replace, @minus{}r}
1283	@c Replace original form with corrected form, place original form in the
1284	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1285
1286
1287	@end table
1288
1289
1290	@node kor weights definition file
1291	@subsection Weights definition file
1292
1293	Example:
1294
1295	@example
1296
1297	%stdcor 1
1298	%xchg 1
1299	ż rz 0.5
1300	ch h 0.5
1301	u ó 0.5
1302
1303	@end example
1304
1305
1306	In this example the default weight is set to 1 (@code{%stdcor 1}), the weight of the exchange
1307	operation is set to 1 (@code{%xchg 1}), and the three principal Polish orthographic
1308	errors are assigned the weight 0.5.
1309
1310	The edit operation weight declaration, such as
1311
1312	@example
1313	ż rz 0.5
1314	@end example
1315	
1316	works in both directions, i.e. ż->rz and rz->ż.
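
How such a file could drive a weighted edit distance is sketched below in
Python. This is a plain dynamic-programming illustration over two strings,
not Werenski's FSA-based algorithm and not the code of @command{kor}. The
function names are hypothetical, and the sketch assumes that the exchange
operation means swapping two adjacent characters.

@example
WEIGHTS = """
%stdcor 1
%xchg 1
ż rz 0.5
ch h 0.5
u ó 0.5
"""

def parse_weights(text):
    # Returns (stdcor, xchg, rules); rules are (left, right, weight).
    stdcor, xchg, rules = 1.0, 1.0, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "%stdcor":
            stdcor = float(fields[1])
        elif fields[0] == "%xchg":
            xchg = float(fields[1])
        else:
            left, right, weight = fields[0], fields[1], float(fields[2])
            rules.append((left, right, weight))   # a declaration works
            rules.append((right, left, weight))   # in both directions
    return stdcor, xchg, rules

def weighted_distance(src, dst, stdcor, xchg, rules):
    inf = float("inf")
    d = [[inf] * (len(dst) + 1) for _ in range(len(src) + 1)]
    d[0][0] = 0.0
    for i in range(len(src) + 1):
        for j in range(len(dst) + 1):
            if i > 0:                               # deletion
                d[i][j] = min(d[i][j], d[i - 1][j] + stdcor)
            if j > 0:                               # insertion
                d[i][j] = min(d[i][j], d[i][j - 1] + stdcor)
            if i > 0 and j > 0:                     # match or substitution
                cost = 0.0 if src[i - 1] == dst[j - 1] else stdcor
                d[i][j] = min(d[i][j], d[i - 1][j - 1] + cost)
            if (i > 1 and j > 1 and src[i - 1] == dst[j - 2]
                    and src[i - 2] == dst[j - 1]):  # exchange (assumed)
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + xchg)
            for left, right, weight in rules:       # declared operations
                if src[:i].endswith(left) and dst[:j].endswith(right):
                    before = d[i - len(left)][j - len(right)]
                    d[i][j] = min(d[i][j], before + weight)
    return d[len(src)][len(dst)]

stdcor, xchg, rules = parse_weights(WEIGHTS)
for wrong, right in [("morze", "może"), ("hleb", "chleb"), ("krul", "król")]:
    print(wrong, right, weighted_distance(wrong, right, stdcor, xchg, rules))
@end example

Each of the three pairs differs by exactly one declared operation, so each
distance is 0.5; without the declarations the same pairs would cost 1 or 2.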
1317
1318	The default weights definition file for @code{kor} is:
1319
1320	@example
1321	$HOME/.local/share/utt/weights.kor
1322	@end example
1323
1324	or, if the above mentioned file is absent:
1325
1326	@example
1327	/usr/local/share/utt/weights.kor
1328	@end example
1329
1330
1331	@node kor dictionaries
1332	@subsection Dictionaries
1333
1334	@xref{cor dictionaries}.
1335
1336	@c ---------------------------------------------------------------------
1337	@c SEN
1338	@c ---------------------------------------------------------------------
1339
1340	@page
1341	@node sen
1342	@section sen - a sentensizer
1343
1344	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1345
1346	@item @strong{Authors:} @tab Tomasz Obrębski
1347	@item @strong{Component category:} @tab filter
1348	@item @strong{Input format:} @tab UTT regular
1349	@item @strong{Output format:} @tab UTT regular
1350	@item @strong{Required annotation:} @tab tok
1351
1352	@end multitable
1353
1354
1355	@menu
1356	* sen description::
1357	@c * sen input::
1358	@c * sen output::
1359	* sen example::
1360	@end menu
1361
1362	@node sen description
1363	@subsection Description
1364
1365	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
1366
1367	@node sen example
1368	@subsection Example
1369
1370	@example
1371	command: sen
1372
1373	input:
1374	0000 05 W Cześć
1375	0005 01 P !
1376	0006 01 S _
1377	0007 02 W To
1378	0009 01 S _
1379	0010 02 W ja
1380	0012 01 P .
1381	0013 01 S \n
1382
1383	output:
1384	0000 00 BOS *
1385	0000 05 W Cześć
1386	0005 01 P !
1387	0006 00 EOS *
1388	0006 00 BOS *
1389	0006 01 S _
1390	0007 02 W To
1391	0009 01 S _
1392	0010 02 W ja
1393	0012 01 P .
1394	0013 01 S \n
1395	0014 00 EOS *
1396	@end example
1397
1398
1399	@c ---------------------------------------------------------------------
1400	@c GPH
1401	@c ---------------------------------------------------------------------
1402
1403	@c @node gph - graphizer
1404	@c @chapter gph - graphizer
1405
1406	@c Authors: Tomasz Obrębski
1407
1408
1409
1410	@c ---------------------------------------------------------------------
1411	@c SER
1412	@c ---------------------------------------------------------------------
1413
1414	@page
1415	@node ser
1416	@section ser - pattern search tool
1417
1418	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1419	@item @strong{Authors:} @tab Tomasz Obrębski
1420	@item @strong{Component category:} @tab filter
1421	@item @strong{Input format:} @tab UTT regular
1422	@item @strong{Output format:} @tab UTT regular
1423	@item @strong{Required annotation:} @tab tok, lem --one-field
1424	@end multitable
1425
1426	@menu
1427	* ser description::
1428	* ser command line options::
1429	* ser pattern::
1430	* ser how ser works::
1431	* ser customization::
1432	* ser limitations::
1433	* ser requirements::
1434	@end menu
1435
1436
1437	@node ser description
1438	@subsection Description
1439
1440	@command{ser} looks for patterns in UTT-formatted texts.
1441
1442
1443	@c ---------------------------------------------------------------------
1444	@node ser command line options
1445	@subsection Command line options
1446
1447	@table @code
1448
1449	@parhelp
1450	@parversion
1451	@c @parfile
1452	@c @paroutput
1453	@c @parinputfield
1454	@c @paroutputfield
1455	@parprocess
1456	@parinteractive
1457
1458	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1459	The search pattern.
1460
1461	@item @b{@minus{}@minus{}morph=@var{field}}
1462	The name of the annotation field containing the morphological
1463	description (default @code{lem}).
1464
1465	@item @b{@minus{}@minus{}flex}
1466	Only print the generated flex source code.
1467
1468	@item @b{@minus{}@minus{}macro=@var{filename}}
1469	Read macro definitions from file @var{filename} rather than from the
1470	default location. This option makes it possible to redefine the set of terms.
1471	
1472	@item @b{@minus{}@minus{}define=@var{filename}}
1473	Append macro definitions from file @var{filename}. This option
1474	makes it possible to extend the set of terms.
1475
1476	@end table
1477
1478
1479	@c ---------------------------------------------------------------------
1480	@node ser pattern
1481	@subsection Pattern
1482
1483	The @command{ser} pattern is a regular expression over terms corresponding
1484	to text segments or segment sequences. Predefined terms are:
1485
1486	@table @code
1487
1488	@item seg(@var{t},@var{f},@var{a})
1489	a segment of type @var{t}, containing form @var{f} and annotation
1490	@var{a}
1491
1492	@item form(@var{f})
1493	a segment containing form @var{f}
1494
1495	@item field(@var{f})
1496	a segment containing annotation field @var{f}
1497
1498	@item space(@var{f})
1499	a space segment of form @var{f}
1500
1501	@item word(@var{f})
1502	a word segment of form @var{f}
1503
1504	@item punct(@var{f})
1505	a punct segment of form @var{f}
1506
1507	@item number(@var{f})
1508	a number segment of form @var{f}
1509
1510	@item lexeme(@var{f})
1511	a word segment with lemma @var{f}
1512
1513	@item cat(@var{c})
1514	a word segment of category @var{c}
1515
1516	@end table
1517
1518	All arguments are optional. If an argument is omitted, an arbitrary
1519	string of non-blank characters is assumed as the argument value. Term
1520	arguments may be arbitrary character-level regular expressions. The
1521	following special symbols can be used:
1522
1523	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1524	@item @code{[@dots{}]} @tab a character class
1525	@item @code{[^@dots{}]} @tab a negated character class
1526	@item @code{|} @tab alternative
1527	@item @code{*} @tab repetition, including zero times
1528	@item @code{+} @tab repetition, at least one time
1529	@item @code{?} @tab optionality
1530	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
1531	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
1532	@item @code{@{@var{m}@}} @tab repetition @var{m} times
1533	@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
1534	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
1535	@item @code{( )} @tab parentheses, used to override precedence
1536	@c @end multitable
1537
1538	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1539	@item @code{.} @tab a non-blank character
1540	@item @code{\w} @tab a letter
1541	@item @code{\W} @tab a non-blank character other than a letter
1542	@item @code{\d} @tab a digit
1543	@item @code{\D} @tab a non-blank character other than a digit
1544	@item @code{\s} @tab a space or tab character
1545	@item @code{\S} @tab a non-blank character (the same as @code{.})
1546	@item @code{\l} @tab a lowercase letter
1547	@item @code{\L} @tab an uppercase letter
1548	@end multitable
1549
1550
1551	@noindent The following characters:
1552	@example
1553	@verb{% [ ] ^ | * + ? { } , . < > \ %}
1554	@end example
1555	must be escaped with a backslash, i.e. written as:
1556	@example
1557	@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
1558	@end example
1559
1560	@quotation Note
1561	The special symbols are borrowed from Perl with minor modifications.
1562	The meaning of certain special characters and sequences differs
1563	slightly from the usual one; this is motivated by convenience.
1564	In particular, the meaning of the @code{.} special character is
1565	modified due to the special function of spaces in UTT files (they are
1566	field separators): it matches only non-blank characters. Use
1567	@code{\s} to explicitly match a space or tab character.
1568	@end quotation
1569
1570	In the argument of the @code{cat} term a special operator @code{<...>} may be
1571	used. A category specification enclosed in angle brackets matches all
1572	category descriptions which are consistent (non-contradictory) with the
1573	specification. For example, @code{<N>} matches all noun descriptions, and
1574	@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
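
The consistency test can be sketched in Python as follows. The sketch is not
taken from the @command{ser} sources. It assumes, judging only from the
descriptions shown in this manual (e.g. @code{ADJ/CnvDpGaipNs}), that a
description consists of a part of speech, a slash, and a sequence of
attributes, each written as one uppercase letter followed by its possible
values as lowercase letters. It also assumes that the part of speech must be
identical. The function names are hypothetical.

@example
import re

def parse_category(description):
    # "ADJ/CnvDpGaipNs" -> part of speech "ADJ" and a mapping from
    # attribute letters (C, D, G, N) to sets of value letters.
    pos, _, attributes = description.strip("<>").partition("/")
    pairs = re.findall(r"([A-Z])([a-z]*)", attributes)
    return pos, dict((name, set(values)) for name, values in pairs)

def consistent(specification, description):
    spec_pos, spec_attrs = parse_category(specification)
    desc_pos, desc_attrs = parse_category(description)
    if spec_pos != desc_pos:
        return False
    for name, values in spec_attrs.items():
        # An attribute absent from the description cannot contradict it;
        # a shared attribute must have at least one value in common.
        if name in desc_attrs and not (values & desc_attrs[name]):
            return False
    return True

print(consistent("<ADJ/Can>", "ADJ/CnvDpGaipNs"))
print(consistent("<ADJ/Can>", "ADJ/CdDpGiNs"))
@end example

The first call prints @code{True} (the value @code{n} is shared), the second
prints @code{False} (case @code{d} is neither @code{a} nor @code{n}).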
1575
1576
1577	@*
1578	@noindent @b{Examples of one-segment patterns:}
1579
1580	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1581	@item @code{seg} @tab any segment
1582	@item @code{word} @tab any word-form
1583	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
1584	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
1585	@item @code{word(\L\l+)} @tab a capitalized word-form
1586	@item @code{punct} @tab a punctuation character
1587	@item @code{space(.*\\n.*)} @tab a space segment containing a newline character
1588	@item @code{lexeme(pomoc)} @tab any form of the lexeme 'pomoc'
1589	@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
1590	@item @code{cat(<N/Ca>)} @tab a word whose category matches @code{N/Ca}
1591	@end multitable
1592
1593	@*
1594	@noindent @b{Examples of multi-segment patterns:}
1595
1596	@table @code
1597
1598	@item (word(\L) punct(\.) space?)+ word(\L\l+)
1599	a sequence of initials followed by a surname
1600
1601	@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
1602	a text fragment between two punctuation characters, containing an
1603	occurrence of a relative pronoun
1604
1605	@end table
1606
1607
1608	@node ser how ser works
1609	@subsection How ser works
1610
1611	@node ser customization
1612	@subsection Customization
1613
1614	@c All predefined terms correspond to single segments,
1615
1616	@example
1617	define(`verbseq', `(cat(<V>) (space cat(<V>)))')
1618	@end example
1619
1620
1621	the term @code{cat()} may not be used as a ... of
1622
1623	@c See @command{m4} manual for further details on macro definition format.
1624
1625	@node ser limitations
1626	@subsection Limitations
1627
1628	Do not use more than 3 attributes in <>.
1629
1630	@node ser requirements
1631	@subsection Requirements
1632
1633	In order to run @command{ser}, the following programs must be
1634	installed in the system:
1635
1636	@itemize
1637
1638	@item @command{m4}
1639	@item @command{grep}
1640	@item @command{flex}
1641	@item @command{gcc}
1642
1643	@end itemize
1644
1645
1646	@c ---------------------------------------------------------------------
1647	@c GRP
1648	@c ---------------------------------------------------------------------
1649
1650	@page
1651	@node grp
1652	@section grp - pattern search tool
1653
1654	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1655	@item @strong{Authors:} @tab Tomasz Obrębski
1656	@item @strong{Component category:} @tab filter
1657	@item @strong{Input format:} @tab UTT flattened
1658	@item @strong{Output format:} @tab UTT flattened
1659	@item @strong{Required annotation:} @tab tok, sen, lem --one-field
1660	@end multitable
1661
1662
1663	@menu
1664	* grp description::
1665	* grp command line options::
1666	* grp pattern::
1667	* grp hints::
1668	@end menu
1669
1670
1671	@node grp description
1672	@subsection Description
1673
1674	@command{grp} selects sentences containing an expression matching a
1675	pattern. The pattern format is exactly the same as that accepted by
1676	@command{ser}.
1677	
1678	@command{grp} is intended mainly for speeding up the corpus search process.
1679	It is extremely fast (processing speed is usually higher than the speed
1680	of reading the corpus file from disk).
1681
1682	@node grp command line options
1683	@subsection Command line options
1684
1685	@table @code
1686
1687	@parhelp
1688	@parversion
1689	@parprocess
1690	@parinteractive
1691
1692	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1693	The search pattern.
1694
1695	@item @b{@minus{}@minus{}morph=@var{field}}
1696	The name of the annotation field containing the morphological
1697	description (default @code{lem}).
1698
1699	@item @b{@minus{}@minus{}command}
1700	Only print the generated flex source code.
1701
1702	@item @b{@minus{}@minus{}macro=@var{filename}}
1703	Read macro definitions from file @var{filename} rather than from the
1704	default location. This option makes it possible to redefine the set of terms.
1705	
1706	@item @b{@minus{}@minus{}define=@var{filename}}
1707	Append macro definitions from file @var{filename}. This option
1708	makes it possible to extend the set of terms.
1709
1710	@end table
1711
1712
1713	@node grp pattern
1714	@subsection Pattern
1715
1716	@xref{ser pattern}.
1717
1718	@node grp hints
1719	@subsection Hints
1720
1721	The corpus search speed may be increased by combining @command{grp} with the
1722	@command{lzop} compression tool (@command{grp} usually processes data faster than it
1723	is read from disk, especially on slow laptop drives). First, preprocess and compress the corpus:
1724	
1725	@example
1726	cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
1727	@end example
1728	Then search the compressed corpus:
1729	@example
1730	lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
1731	@end example
1732
1733
1734
1735	@c ---------------------------------------------------------------------
1736	@c MAR
1737	@c ---------------------------------------------------------------------
1738
1739	@page
1740	@node mar
1741	@section mar
1742
1743	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1744	@item @strong{Authors:} @tab Marcin Walas, Tomasz Obrębski
1745	@item @strong{Input format:} @tab UTT flattened
1746	@item @strong{Output format:} @tab UTT flattened
1747	@item @strong{Required annotation:} @tab tok, sen, lem -1
1748	@end multitable
1749
1750	[TODO]
1751
1752	(see mar's help 'mar -h' for some information)
1753
1754	@c ---------------------------------------------------------------------
1755	@c KOT
1756	@c ---------------------------------------------------------------------
1757
1758
1759	@page
1760	@node kot
1761	@section kot - untokenizer
1762
1763	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1764	@item @strong{Authors:} @tab Tomasz Obrębski
1765	@item @strong{Component category:} @tab filter
1766	@item @strong{Input format:} @tab UTT regular
1767	@item @strong{Output format:} @tab text
1768	@item @strong{Required annotation:} @tab tok
1769	@end multitable
1770
1771
1772	@menu
1773	* kot description::
1774	* kot command line options::
1775	* kot usage examples::
1776	@end menu
1777
1778	@node kot description
1779	@subsection Description
1780
1781	@command{kot} transforms a UTT formatted file back into raw text format.
1782
1783	@node kot command line options
1784	@subsection Command line options
1785
1786	@table @code
1787
1788	@parhelp
1789
1790	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1791
1792	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1793
1794	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1795
1796	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1797
1798	@c @item @b{@minus{}@minus{}config=@var{filename}}
1799
1800	@c @item
1801
1802	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
1803	print @var{string} between nonadjacent segments of the input file
1804
1805	@item @b{@minus{}@minus{}spaces, @minus{}r}
1806	retain the special characters @code{_}, @code{\t},
1807	@code{\n}, @code{\r}, @code{\f} unexpanded in the output
1808
1809	@end table
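
The transformation performed by @command{kot} can be pictured with the
following Python sketch. It is not the program's source code. The function
name @code{untokenize} and its parameters are hypothetical stand-ins for the
@option{--gap-fill} and @option{--spaces} options, and the sketch assumes
that zero-length segments (such as BOS and EOS markers) contribute no text.

@example
SPECIAL = [("_", " "), ("\\t", "\t"), ("\\n", "\n"),
           ("\\r", "\r"), ("\\f", "\f")]

def untokenize(lines, gap_fill="", keep_special=False):
    text, position = [], None
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        start, length, form = int(fields[0]), int(fields[1]), fields[3]
        if length == 0:
            continue      # assumption: BOS/EOS markers carry no text
        if position is not None and start != position:
            text.append(gap_fill)    # nonadjacent segments
        if not keep_special:
            for code, char in SPECIAL:
                form = form.replace(code, char)
        text.append(form)
        position = start + length
    return "".join(text)

segments = ["0000 00 BOS *", "0000 05 W Cześć", "0005 01 P !",
            "0006 00 EOS *", "0006 00 BOS *", "0006 01 S _",
            "0007 02 W To", "0009 01 S _", "0010 02 W ja",
            "0012 01 P .", "0013 01 S \\n", "0014 00 EOS *"]
print(repr(untokenize(segments)))
@end example

For the segments of the @command{sen} example, the sketch reconstructs the
original text, a greeting followed by a short sentence and a newline.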
1810
1811	@node kot usage examples
1812	@subsection Usage examples
1813
1814	@example
1815	cat legia.txt | tok | kot
1816	@end example
1817
1818	@example
1819	cat legia.txt | tok | lem -1 | kot
1820	@end example
1821
1822	@c ---------------------------------------------------------------
1823	@c CON
1824	@c ---------------------------------------------------------------
1825
1826
1827	@page
1828	@node con
1829	@section con - concordance table generator
1830
1831	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1832	@item @strong{Authors:} @tab Justyna Walkowska
1833	@item @strong{Component category:} @tab sink
1834	@item @strong{Input format:} @tab UTT regular
1835	@item @strong{Output format:} @tab text
1836	@item @strong{Required annotation:} @tab ser or mar
1837	@end multitable
1838	@c
1839
1840	@menu
1841	* con description::
1842	* con command line options::
1843	* con usage example::
1844	* con hints::
1845	@end menu
1846
1847
1848	@node con description
1849	@subsection Description
1850
1851	@command{con} generates a concordance table based on a pattern given to @command{ser}.
1852
1853
1854	@node con command line options
1855	@subsection Command line options
1856
1857	@table @code
1858
1859	@parhelp
1860
1861	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
1862	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1863	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1864	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1865	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
1866	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
1867	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
1868	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
1869	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
1870	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1871	@c @item @b{@minus{}@minus{}config=@var{filename}}
1872	@c @item
1873	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1874	@c search pattern
1875	@c
1876	@c @item @b{@minus{}@minus{}flex}
1877	@c only print the generated flex source code
1878	@c
1879	@c @item @b{@minus{}@minus{}macro=@var{filename}}
1880	@c read macrodefinitions from file @var{filename} rather than from
1881	@c default location. This option allows to redefine the set of terms.
1882	@c
1883	@c @item @b{@minus{}@minus{}define=@var{filename}}
1884	@c append macrodefinitions from file @var{filename}. This option
1885	@c allows to extend the set of terms.
1886
1887	@item @b{@minus{}@minus{}left @minus{}l}
1888	Left context info (default='30c'). Example:
1889	@example
1890	-l=5c: left context is 5 characters
1891	-l=5w: left context is 5 words
1892	-l=5s: left context is 5 non-empty input lines
1893	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
1894	@end example
1895
1896	@item @b{@minus{}@minus{}right @minus{}r}
1897	Right context info (default='30c').
1898	@item @b{@minus{}@minus{}trim @minus{}t}
1899	Clear incomplete words from output.
1900	@item @b{@minus{}@minus{}white @minus{}w}
1901	Do not convert whitespace characters into spaces.
1902	@item @b{@minus{}@minus{}column @minus{}c}
1903	Left column minimal width in characters (default = 0).
1904	@item @b{@minus{}@minus{}ignore @minus{}i}
1905	Ignore segment inconsistency in the input.
1906	@item @b{@minus{}@minus{}bom}
1907	Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
1908	@item @b{@minus{}@minus{}eom}
1909	End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
1910	@item @b{@minus{}@minus{}bod}
1911	Selected segment beginning display string (default='[').
1912	@item @b{@minus{}@minus{}eod}
1913	Selected segment end display string (default=']').
1914
1915
1916
1917	@end table
1918
1919	@node con usage example
1920	@subsection Usage example
1921	@example
1922	cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
1923	@end example
1924
1925
1926	@node con hints
1927	@subsection Hints
1928
1929	@command{con} is a rather slow program, so do not pass large amounts of
1930	redundant text through it. It is best used at the end of a pipeline in which
1931	@command{grp} and @command{ser} have already narrowed down the input:
1932	
1933	@example
1934	... | grp -e EXPR | ser -e EXPR | con
1935	@end example
1936
1937
1938	@c ---------------------------------------------------------------------
1939	@c ---------------------------------------------------------------------
1940
1941	@page
1942	@node Auxiliary tools
1943	@chapter Auxiliary tools
1944
1945	@menu
1946	* compiledic:: dictionary compiler
1947	* fla:: UTT file flattener
1948	* unfla:: UTT file unflattener
1949	@end menu
1950
1951
1952	@page
1953	@node compiledic
1954	@section compiledic - the dictionary compiler
1955
1956	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1957	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
1958	@item @strong{Component category:} @tab additional tool
1959	@end multitable
1960	@c
1961
1962	@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
1963	(FSA) format (@code{.bin} extension).
1964
1965	Automaton representation of a dictionary is built using the AT&T tools:
1966	@itemize
1967	@item AT&T FSM Library,
1968	@item AT&T Lextools.
1969	@end itemize
1970
1971	For @command{compiledic} to work, the above-mentioned packages
1972	must be installed on your system. They are freely available
1973	for non-commercial use.
1974
1975	Usage:
1976	@example
1977	compiledic <dictionaryname>.dic
1978	@end example
1979
1980	The file <dictionaryname>.bin will be generated.
1981
1982	Note: The program produces many temporary files, which are
1983	stored in the current directory. They are deleted after successful
1984	termination of the program.
1985
1986	@c @menu
1987	@c * con command line options::
1988	@c * con usage example::
1989	@c * con hints::
1990	@c @end menu
1991
1992
1993	@c -------------------------------------------------------------------------------
1994	@c FLA
1995	@c -------------------------------------------------------------------------------
1996
1997	@page
1998	@node fla
1999	@section fla - the UTT file flattener
2000
2001	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2002	@item @strong{Authors:} @tab Tomasz Obrębski
2003	@item @strong{Input format:} @tab UTT regular
2004	@item @strong{Output format:} @tab UTT flattened
2005	@item @strong{Required annotation:} @tab sen
2006	@end multitable
2007	@c
2008
2009	@menu
2010	* fla description::
2011	@c * fla command line options::
2012	@c * fla usage example::
2013	@end menu
2014
2015
2016	@node fla description
2017	@subsection Description
2018
2019	@command{fla} ``flattens'' a UTT file by merging the segments belonging
2020	to one sentence into one line. Technically, the end-of-line characters
2021	('\n', ASCII code 10) separating the segments of a sentence are replaced
2022	with form-feed characters ('\f', ASCII code 12). The flattening makes it
2023	possible to process UTT files sentence by sentence with such tools as
2024	@command{grep} or @command{sed} (this is used in @command{grp} and @command{mar}).
2025	
2026	Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.
2027	
2028	Flattened files are still human-readable.
2029
2030	Usage:
2031
2032	@example
2033	fla [<bosregex>]
2034	@end example
2035
2036	The optional argument is a regular expression describing segments
2037	which should be treated as sentence beginnings (the test is: the
2038	segment contains a fragment matching the @code{<bosregex>}). By
2039	default, segments containing a @code{BOS} field are looked for.
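
The flattening and its inverse can be sketched in Python as follows. This is
an illustration of the description above, not the implementation of
@command{fla} or @command{unfla}. The function names are hypothetical, the
default pattern @code{"BOS"} is a simplification of the default test, and
the sketch assumes that any segments preceding the first sentence beginning
are kept together on a line of their own.

@example
import re

def flatten(text, bos_regex="BOS"):
    # One output line per sentence: the segments of a sentence are
    # joined with form feeds; a new line starts at each segment that
    # matches bos_regex.
    sentences = []
    for segment in [s for s in text.split("\n") if s]:
        if re.search(bos_regex, segment) or not sentences:
            sentences.append([segment])
        else:
            sentences[-1].append(segment)
    return "".join("\f".join(s) + "\n" for s in sentences)

def unflatten(text):
    # Restore the regular format: form feeds become newlines again.
    return text.replace("\f", "\n")

regular = ("0000 00 BOS *\n0000 05 W Cześć\n0005 01 P !\n"
           "0006 00 EOS *\n0006 00 BOS *\n0006 01 S _\n")
flat = flatten(regular)
print(flat.count("\n"), unflatten(flat) == regular)
@end example

The sample contains two sentence beginnings, so the flattened text has two
lines, and unflattening it gives back the regular file unchanged.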
2040
2041	@c -------------------------------------------------------------------------------
2042	@c UNFLA
2043	@c -------------------------------------------------------------------------------
2044
2045	@page
2046	@node unfla
2047	@section unfla - the UTT file unflattener
2048
2049	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2050	@item @strong{Authors:} @tab Tomasz Obrębski
2051	@item @strong{Input format:} @tab UTT flattened
2052	@item @strong{Output format:} @tab UTT regular
2053	@item @strong{Required annotation:} @tab -
2054	@end multitable
2055
2056	@menu
2057	* unfla description::
2058	@c * fla command line options::
2059	@c * fla usage example::
2060	@end menu
2061
2062	@node unfla description
2063	@subsection Description
2064	@command{unfla} transforms a flattened UTT file, produced by
2065	@command{fla}, into the regular format by restoring end-of-line
2066	characters.
2067
2068
2069
2070
2071	@c ---------------------------------------------------------------------
2072	@c USAGE EXAMPLES
2073	@c ---------------------------------------------------------------------
2074
2075	@node Usage examples
2076	@chapter Usage examples
2077
2078	@subsubheading Simple pipelines
2079
2080	@enumerate
2081
2082	@item tokenization
2083
2084	cat text | tok > output1
2085
2086	@item morphological annotation (1)
2087
2088	simple dictionary-based lemmatization
2089	
2090	cat text | tok | lem > output1
2091
2092	@item morphological annotation (2)
2093
2094	1) perform dictionary-based lemmatization
2095	2) guess descriptions for words which have no annotation
2096	
2097	@example
2098	cat text | tok | lem | gue -S lem > output2
2099	@end example
2100
2101	@item morphological annotation (3)
2102
2103	1) perform dictionary-based lemmatization
2104	2) try to correct words with no annotation
2105	3) perform dictionary-based lemmatization of corrected words
2106	4) guess descriptions for words which still have no annotation
2107
2108	@example
2109	cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
2110	@end example
2111	@item spelling correction
2112
2113
2114
2115	@example
2116	cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
2117	@end example
2118
2119	@item Expression extraction
2120
2121	Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
2122
2123	@example
2124	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
2125	@end example
2126
2127	@item A word in context
2128
2129	Extraction of text fragments containing a form of the lexeme 'rozmowa' in
2130	the context of 5 preceding and 5 following corpus segments.
2131
2132	@example
2133	cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
2134	@end example
2135
2136	@item generation of concordance table (1)
2137
2138	@example
2139	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2140	@end example
2141
2142	(running time: approx. 10 seconds)
2143
2144	@item generation of concordance table (2)
2145
2146	The same as above, but much faster.
2147	
2148	@example
2149	cat text | tok | lem -1 | \
2150	grp -e 'cat(<V>) space lexeme(rozmowa)' | \
2151	ser -e 'cat(<V>) space lexeme(rozmowa)' | \
2152	con
2153	@end example
2154
2155	(running time: approx. 2 seconds)
2156
2157	@item generation of concordance table (3)
2158
2159	Usually, the same corpus is searched repeatedly. In such a case it
2160	is advisable to first transform the corpus data into the format
2161	required by @command{grp}, and then use the preprocessed data.
2162	
2163	As @command{grp} (@command{grep}) processes data faster than it is
2164	read from the disk drive, the search time may be shortened further by
2165	storing the preprocessed data in compressed form. We suggest using the
2166	@command{lzop} compressor/decompressor.
2167
2168	@item the fastest way to search a large corpus
2169
2170	step 1: corpus preprocessing
2171
2172	@example
2173	cat corpus | tok | sen | lem -1 \
2174	| fla | lzop -7 > corpus.grp.lzo
2175	@end example
2176
2177	step 2: search
2178
2179	@example
2180	lzop -cd corpus.grp.lzo | unfla | grp -e 'cat(<V>) space lexeme(rozmowa)' \
2181	| ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2182	@end example
2183
2184	@end enumerate
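
The two-step scheme from the last item (preprocess and compress once,
then decompress and pre-filter on every search) can be sketched in
Python. This is only an illustration of the idea, not part of UTT:
@code{gzip} from the Python standard library stands in for
@command{lzop}, a plain substring test stands in for @command{grp}, a
regular expression stands in for @command{ser}, and the corpus records
and function names are made up.

@example
import gzip
import os
import re
import tempfile

def preprocess(records, path):
    """Step 1: store the records once, one per line, compressed."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for record in records:
            out.write(record + "\n")

def search(path, keyword, pattern):
    """Step 2: stream the compressed file, filter cheaply, then match."""
    regex = re.compile(pattern)
    hits = []
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            record = line.rstrip("\n")
            if keyword not in record:   # cheap pre-filter (role of grp)
                continue
            if regex.search(record):    # precise match (role of ser)
                hits.append(record)
    return hits

# Made-up records, one per line.
records = ["prowadzi rozmowa", "rozmowa trwa", "inny tekst"]
path = os.path.join(tempfile.mkdtemp(), "corpus.gz")
preprocess(records, path)
hits = search(path, "rozmowa", r"\w+ rozmowa")
print(hits)
@end example

The pre-filter discards most records with a cheap test, so the more
expensive matcher runs only on records that can possibly match.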
2185
2186	@c @subsubheading More complicated configurations
2187
2188
2189	@c @example
2190	@c mknod fifo1 p
2191	@c mknod fifo2 p
2192	@c mknod fifo3 p
2193	@c mknod fifo4 p
2194	@c mknod fifo5 p
2195
2196	@c tok | lem -p W -e fifo1 > fifo2 &
2197	@c cor -e fifo3 < fifo1 | lem > fifo4 &
2198	@c gue < fifo3 > fifo5 &
2199	@c sort -m fifo2 fifo4 fifo5
2200
2201	@c rm fifo?
2202	@c @end example
2203
2204
2205	@c ---------------------------------------------------------------------
2206	@c ---------------------------------------------------------------------
2207
2208	@c ---------------------------------------------------------------------
2209	@c PMDBF DICTIONARY
2210	@c ---------------------------------------------------------------------
2211
2212	@node PMDBF dictionary
2213	@chapter PMDBF dictionary
2214
2215	UTT components come with lexical data derived from the Polish
2216	Morphological Database (PMDB).
2217
2218	@menu
2219	* PMDBF files::
2220	* PMDBF tag structure::
2221	* PMDBF parts of speech::
2222	* PMDBF morphosyntactic attributes::
2223	@end menu
2224
2225	@node PMDBF files
2226	@section Files
2227
2228	@node PMDBF tag structure
2229	@section Tag structure
2230
2231	pos = [[:upper:]]+
2232
2233	attr = [[:upper:]]+
2234
2235	val = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
2236
2237	descr = pos ( / ( attr val + ) + ) ?
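
The grammar above can be turned into a small parser. The following
Python sketch is one possible reading of it, not code taken from UTT:
the POSIX classes @code{[:upper:]}, @code{[:lower:]} and
@code{[:digit:]} are approximated by ASCII ranges, the function name
@code{parse_descr} is hypothetical, and the description
@code{N/GfNsCn} is an illustrative input.

@example
import re

POS = "[A-Z]+"                       # pos  = [[:upper:]]+
ATTR = "[A-Z]+"                      # attr = [[:upper:]]+
VAL = r"(?:[a-z0-9?!*+-]|<[^>\n]+>)"  # one value: a character or <...>

PAIR = ATTR + "(?:" + VAL + ")+"     # attr val +
DESCR_RE = re.compile(
    "(?P<pos>" + POS + ")(?:/(?P<attrs>(?:" + PAIR + ")+))?")
PAIR_RE = re.compile("(" + ATTR + ")((?:" + VAL + ")+)")
VAL_RE = re.compile(VAL)

def parse_descr(descr):
    """Split a description into its part of speech and attribute values."""
    match = DESCR_RE.fullmatch(descr)
    if match is None:
        raise ValueError("not a well-formed description: " + descr)
    attrs = dict()
    for name, values in PAIR_RE.findall(match.group("attrs") or ""):
        attrs[name] = VAL_RE.findall(values)
    return match.group("pos"), attrs

pos, attrs = parse_descr("N/GfNsCn")
print(pos, attrs["G"], attrs["N"], attrs["C"])
@end example

Each attribute maps to a list because the grammar allows several
values after one attribute name.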
2238
2239	@node PMDBF parts of speech
2240	@section Parts of speech
2241
2242	@multitable {ADJPRP} { adjectival-passive-participle }
2243	@item @code{N} @tab noun
2244	@item @code{NPRO} @tab nominal-pronoun
2245	@item @code{NV} @tab deverbal-noun
2246	@item @code{V} @tab verb
2247	@item @code{BYC} @tab byc
2248	@item @code{VNI} @tab non-inflected-verb
2249	@item @code{ADJ} @tab adjective
2250	@item @code{ADJPAP} @tab adjectival-passive-participle
2251	@item @code{ADJPRP} @tab adjectival-present-participle
2252	@item @code{ADJPP} @tab adjectival-past-participle
2253	@item @code{ADJPRO} @tab adjectival-pronoun
2254	@item @code{ADJNUM} @tab adjectival-numeral
2255	@item @code{ADV} @tab adverb
2256	@item @code{ADVANP} @tab adverbial-anterior-participle
2257	@item @code{ADVPRP} @tab adverbial-present-participle
2258	@item @code{ADVPRO} @tab adverbial-pronoun
2259	@item @code{ADVNUM} @tab adverbial-numeral
2260	@item @code{P} @tab preposition
2261	@item @code{PPRO} @tab prep-noun-pronoun
2262	@item @code{CONJ} @tab conjunction
2263	@item @code{EXCL} @tab exclamation
2264	@item @code{APP} @tab call
2265	@item @code{ONO} @tab onomatopoeia
2266	@item @code{PART} @tab particle
2267	@item @code{NUMCRD} @tab cardinal-numeral
2268	@item @code{NUMCOL} @tab collective-numeral
2269	@item @code{NUMPAR} @tab partitive-numeral
2270	@item @code{NUMORD} @tab ordinal-numeral
2271	@end multitable
2272
2273	@node PMDBF morphosyntactic attributes
2274	@section Morphosyntactic attributes
2275
2276	@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2277	@c @headitem Attr @tab Val @tab Description
2278	@item
2279	@code{A} @tab @tab Aspect
2280	@item
2281	@tab @code{p} @tab perfect
2282	@item
2283	@tab @code{i} @tab imperfect.
2284	@item
2285	@item
2286	@code{V} @tab @tab Verb-Form
2287	@item
2288	@tab @code{b} @tab infinitive,
2289	@item
2290	@tab @code{p} @tab personal,
2291	@item
2292	@tab @code{i} @tab impersonal.
2293	@item
2294	@item
2295	@code{M} @tab @tab Mood
2296	@item
2297	@tab @code{d} @tab declarative,
2298	@item
2299	@tab @code{c} @tab conditional,
2300	@item
2301	@tab @code{i} @tab imperative.
2302	@item
2303	@item
2304	@code{T} @tab @tab Tense
2305	@item
2306	@tab @code{a} @tab past,
2307	@item
2308	@tab @code{r} @tab present,
2309	@item
2310	@tab @code{f} @tab future.
2311	@item
2312	@item
2313	@code{P} @tab @tab Person
2314	@item
2315	@tab @code{1} @tab 1,
2316	@item
2317	@tab @code{2} @tab 2,
2318	@item
2319	@tab @code{3} @tab 3.
2320	@item
2321	@item
2322	@code{D} @tab @tab Degree
2323	@item
2324	@tab @code{p} @tab positive,
2325	@item
2326	@tab @code{c} @tab comparative,
2327	@item
2328	@tab @code{s} @tab superlative.
2329	@item
2330	@item
2331	@code{N} @tab @tab Number
2332	@item
2333	@tab @code{s} @tab singular,
2334	@item
2335	@tab @code{p} @tab plural.
2336	@item
2337	@item
2338	@code{C} @tab @tab Case
2339	@item
2340	@tab @code{n} @tab nominative,
2341	@item
2342	@tab @code{g} @tab genitive,
2343	@item
2344	@tab @code{d} @tab dative,
2345	@item
2346	@tab @code{a} @tab accusative,
2347	@item
2348	@tab @code{i} @tab instrumental,
2349	@item
2350	@tab @code{l} @tab locative,
2351	@item
2352	@tab @code{v} @tab vocative.
2353	@item
2354	@code{G} @tab @tab Gender
2355	@item
2356	@tab @code{p} @tab masculine-personal,
2357	@item
2358	@tab @code{a} @tab masculine-animal,
2359	@item
2360	@tab @code{i} @tab masculine-inanimate,
2361	@item
2362	@tab @code{f} @tab feminine,
2363	@item
2364	@tab @code{n} @tab neuter.
2365	@end multitable
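
As an illustration of how the two tables combine, the Python sketch
below spells out a description in words. It is a simplified example,
not part of UTT: only a subset of the parts of speech and attributes
is included, values are assumed to be single characters, and the
function name @code{describe} and the input @code{N/GfNsCn} are
hypothetical.

@example
import re

# Subset of the parts-of-speech table.
POS_NAMES = dict(N="noun", V="verb", ADJ="adjective", ADV="adverb")

# Subset of the attribute table: attribute -> (name, value table).
ATTRS = dict(
    A=("Aspect", dict(p="perfect", i="imperfect")),
    N=("Number", dict(s="singular", p="plural")),
    C=("Case", dict(n="nominative", g="genitive", d="dative",
                    a="accusative", i="instrumental", l="locative",
                    v="vocative")),
    G=("Gender", dict(p="masculine-personal", a="masculine-animal",
                      i="masculine-inanimate", f="feminine",
                      n="neuter")),
)

def describe(descr):
    """Spell out a description such as N/GfNsCn using the tables above."""
    pos, _, rest = descr.partition("/")
    parts = []
    for attr, values in re.findall("([A-Z]+)([^A-Z]+)", rest):
        name, table = ATTRS[attr]
        parts.append(name + "=" + "|".join(table[v] for v in values))
    label = POS_NAMES.get(pos, pos)
    if not parts:
        return label
    return label + ": " + ", ".join(parts)

print(describe("N/GfNsCn"))
# prints: noun: Gender=feminine, Number=singular, Case=nominative
@end example

Note that the same value letter means different things under
different attributes (for example @code{p} is plural under @code{N}
but masculine-personal under @code{G}), so values must always be
looked up per attribute.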
2366
2367
2368	@c ---------------------------------------------------------------------
2369	@c ---------------------------------------------------------------------
2370	@c
2371	@c @node Examples
2372	@c @chapter Examples
2373
2374	@c ----------------------------------------------------------------------
2375	@c ----------------------------------------------------------------------
2376
2377	@node GNU Free Documentation License
2378	@chapter GNU Free Documentation License
2379
2380	@c The GNU Free Documentation License.
2381	@center Version 1.2, November 2002
2382
2383	@c This file is intended to be included within another document,
2384	@c hence no sectioning command or @node.
2385
2386	@display
2387	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
2388	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
2389
2390	Everyone is permitted to copy and distribute verbatim copies
2391	of this license document, but changing it is not allowed.
2392	@end display
2393
2394	@enumerate 0
2395	@item
2396	PREAMBLE
2397
2398	The purpose of this License is to make a manual, textbook, or other
2399	functional and useful document @dfn{free} in the sense of freedom: to
2400	assure everyone the effective freedom to copy and redistribute it,
2401	with or without modifying it, either commercially or noncommercially.
2402	Secondarily, this License preserves for the author and publisher a way
2403	to get credit for their work, while not being considered responsible
2404	for modifications made by others.
2405
2406	This License is a kind of ``copyleft'', which means that derivative
2407	works of the document must themselves be free in the same sense. It
2408	complements the GNU General Public License, which is a copyleft
2409	license designed for free software.
2410
2411	We have designed this License in order to use it for manuals for free
2412	software, because free software needs free documentation: a free
2413	program should come with manuals providing the same freedoms that the
2414	software does. But this License is not limited to software manuals;
2415	it can be used for any textual work, regardless of subject matter or
2416	whether it is published as a printed book. We recommend this License
2417	principally for works whose purpose is instruction or reference.
2418
2419	@item
2420	APPLICABILITY AND DEFINITIONS
2421
2422	This License applies to any manual or other work, in any medium, that
2423	contains a notice placed by the copyright holder saying it can be
2424	distributed under the terms of this License. Such a notice grants a
2425	world-wide, royalty-free license, unlimited in duration, to use that
2426	work under the conditions stated herein. The ``Document'', below,
2427	refers to any such manual or work. Any member of the public is a
2428	licensee, and is addressed as ``you''. You accept the license if you
2429	copy, modify or distribute the work in a way requiring permission
2430	under copyright law.
2431
2432	A ``Modified Version'' of the Document means any work containing the
2433	Document or a portion of it, either copied verbatim, or with
2434	modifications and/or translated into another language.
2435
2436	A ``Secondary Section'' is a named appendix or a front-matter section
2437	of the Document that deals exclusively with the relationship of the
2438	publishers or authors of the Document to the Document's overall
2439	subject (or to related matters) and contains nothing that could fall
2440	directly within that overall subject. (Thus, if the Document is in
2441	part a textbook of mathematics, a Secondary Section may not explain
2442	any mathematics.) The relationship could be a matter of historical
2443	connection with the subject or with related matters, or of legal,
2444	commercial, philosophical, ethical or political position regarding
2445	them.
2446
2447	The ``Invariant Sections'' are certain Secondary Sections whose titles
2448	are designated, as being those of Invariant Sections, in the notice
2449	that says that the Document is released under this License. If a
2450	section does not fit the above definition of Secondary then it is not
2451	allowed to be designated as Invariant. The Document may contain zero
2452	Invariant Sections. If the Document does not identify any Invariant
2453	Sections then there are none.
2454
2455	The ``Cover Texts'' are certain short passages of text that are listed,
2456	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
2457	the Document is released under this License. A Front-Cover Text may
2458	be at most 5 words, and a Back-Cover Text may be at most 25 words.
2459
2460	A ``Transparent'' copy of the Document means a machine-readable copy,
2461	represented in a format whose specification is available to the
2462	general public, that is suitable for revising the document
2463	straightforwardly with generic text editors or (for images composed of
2464	pixels) generic paint programs or (for drawings) some widely available
2465	drawing editor, and that is suitable for input to text formatters or
2466	for automatic translation to a variety of formats suitable for input
2467	to text formatters. A copy made in an otherwise Transparent file
2468	format whose markup, or absence of markup, has been arranged to thwart
2469	or discourage subsequent modification by readers is not Transparent.
2470	An image format is not Transparent if used for any substantial amount
2471	of text. A copy that is not ``Transparent'' is called ``Opaque''.
2472
2473	Examples of suitable formats for Transparent copies include plain
2474	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
2475	format, @acronym{SGML} or @acronym{XML} using a publicly available
2476	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
2477	PostScript or @acronym{PDF} designed for human modification. Examples
2478	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
2479	@acronym{JPG}. Opaque formats include proprietary formats that can be
2480	read and edited only by proprietary word processors, @acronym{SGML} or
2481	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
2482	not generally available, and the machine-generated @acronym{HTML},
2483	PostScript or @acronym{PDF} produced by some word processors for
2484	output purposes only.
2485
2486	The ``Title Page'' means, for a printed book, the title page itself,
2487	plus such following pages as are needed to hold, legibly, the material
2488	this License requires to appear in the title page. For works in
2489	formats which do not have any title page as such, ``Title Page'' means
2490	the text near the most prominent appearance of the work's title,
2491	preceding the beginning of the body of the text.
2492
2493	A section ``Entitled XYZ'' means a named subunit of the Document whose
2494	title either is precisely XYZ or contains XYZ in parentheses following
2495	text that translates XYZ in another language. (Here XYZ stands for a
2496	specific section name mentioned below, such as ``Acknowledgements'',
2497	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
2498	of such a section when you modify the Document means that it remains a
2499	section ``Entitled XYZ'' according to this definition.
2500
2501	The Document may include Warranty Disclaimers next to the notice which
2502	states that this License applies to the Document. These Warranty
2503	Disclaimers are considered to be included by reference in this
2504	License, but only as regards disclaiming warranties: any other
2505	implication that these Warranty Disclaimers may have is void and has
2506	no effect on the meaning of this License.
2507
2508	@item
2509	VERBATIM COPYING
2510
2511	You may copy and distribute the Document in any medium, either
2512	commercially or noncommercially, provided that this License, the
2513	copyright notices, and the license notice saying this License applies
2514	to the Document are reproduced in all copies, and that you add no other
2515	conditions whatsoever to those of this License. You may not use
2516	technical measures to obstruct or control the reading or further
2517	copying of the copies you make or distribute. However, you may accept
2518	compensation in exchange for copies. If you distribute a large enough
2519	number of copies you must also follow the conditions in section 3.
2520
2521	You may also lend copies, under the same conditions stated above, and
2522	you may publicly display copies.
2523
2524	@item
2525	COPYING IN QUANTITY
2526
2527	If you publish printed copies (or copies in media that commonly have
2528	printed covers) of the Document, numbering more than 100, and the
2529	Document's license notice requires Cover Texts, you must enclose the
2530	copies in covers that carry, clearly and legibly, all these Cover
2531	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
2532	the back cover. Both covers must also clearly and legibly identify
2533	you as the publisher of these copies. The front cover must present
2534	the full title with all words of the title equally prominent and
2535	visible. You may add other material on the covers in addition.
2536	Copying with changes limited to the covers, as long as they preserve
2537	the title of the Document and satisfy these conditions, can be treated
2538	as verbatim copying in other respects.
2539
2540	If the required texts for either cover are too voluminous to fit
2541	legibly, you should put the first ones listed (as many as fit
2542	reasonably) on the actual cover, and continue the rest onto adjacent
2543	pages.
2544
2545	If you publish or distribute Opaque copies of the Document numbering
2546	more than 100, you must either include a machine-readable Transparent
2547	copy along with each Opaque copy, or state in or with each Opaque copy
2548	a computer-network location from which the general network-using
2549	public has access to download using public-standard network protocols
2550	a complete Transparent copy of the Document, free of added material.
2551	If you use the latter option, you must take reasonably prudent steps,
2552	when you begin distribution of Opaque copies in quantity, to ensure
2553	that this Transparent copy will remain thus accessible at the stated
2554	location until at least one year after the last time you distribute an
2555	Opaque copy (directly or through your agents or retailers) of that
2556	edition to the public.
2557
2558	It is requested, but not required, that you contact the authors of the
2559	Document well before redistributing any large number of copies, to give
2560	them a chance to provide you with an updated version of the Document.
2561
2562	@item
2563	MODIFICATIONS
2564
2565	You may copy and distribute a Modified Version of the Document under
2566	the conditions of sections 2 and 3 above, provided that you release
2567	the Modified Version under precisely this License, with the Modified
2568	Version filling the role of the Document, thus licensing distribution
2569	and modification of the Modified Version to whoever possesses a copy
2570	of it. In addition, you must do these things in the Modified Version:
2571
2572	@enumerate A
2573	@item
2574	Use in the Title Page (and on the covers, if any) a title distinct
2575	from that of the Document, and from those of previous versions
2576	(which should, if there were any, be listed in the History section
2577	of the Document). You may use the same title as a previous version
2578	if the original publisher of that version gives permission.
2579
2580	@item
2581	List on the Title Page, as authors, one or more persons or entities
2582	responsible for authorship of the modifications in the Modified
2583	Version, together with at least five of the principal authors of the
2584	Document (all of its principal authors, if it has fewer than five),
2585	unless they release you from this requirement.
2586
2587	@item
2588	State on the Title page the name of the publisher of the
2589	Modified Version, as the publisher.
2590
2591	@item
2592	Preserve all the copyright notices of the Document.
2593
2594	@item
2595	Add an appropriate copyright notice for your modifications
2596	adjacent to the other copyright notices.
2597
2598	@item
2599	Include, immediately after the copyright notices, a license notice
2600	giving the public permission to use the Modified Version under the
2601	terms of this License, in the form shown in the Addendum below.
2602
2603	@item
2604	Preserve in that license notice the full lists of Invariant Sections
2605	and required Cover Texts given in the Document's license notice.
2606
2607	@item
2608	Include an unaltered copy of this License.
2609
2610	@item
2611	Preserve the section Entitled ``History'', Preserve its Title, and add
2612	to it an item stating at least the title, year, new authors, and
2613	publisher of the Modified Version as given on the Title Page. If
2614	there is no section Entitled ``History'' in the Document, create one
2615	stating the title, year, authors, and publisher of the Document as
2616	given on its Title Page, then add an item describing the Modified
2617	Version as stated in the previous sentence.
2618
2619	@item
2620	Preserve the network location, if any, given in the Document for
2621	public access to a Transparent copy of the Document, and likewise
2622	the network locations given in the Document for previous versions
2623	it was based on. These may be placed in the ``History'' section.
2624	You may omit a network location for a work that was published at
2625	least four years before the Document itself, or if the original
2626	publisher of the version it refers to gives permission.
2627
2628	@item
2629	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
2630	the Title of the section, and preserve in the section all the
2631	substance and tone of each of the contributor acknowledgements and/or
2632	dedications given therein.
2633
2634	@item
2635	Preserve all the Invariant Sections of the Document,
2636	unaltered in their text and in their titles. Section numbers
2637	or the equivalent are not considered part of the section titles.
2638
2639	@item
2640	Delete any section Entitled ``Endorsements''. Such a section
2641	may not be included in the Modified Version.
2642
2643	@item
2644	Do not retitle any existing section to be Entitled ``Endorsements'' or
2645	to conflict in title with any Invariant Section.
2646
2647	@item
2648	Preserve any Warranty Disclaimers.
2649	@end enumerate
2650
2651	If the Modified Version includes new front-matter sections or
2652	appendices that qualify as Secondary Sections and contain no material
2653	copied from the Document, you may at your option designate some or all
2654	of these sections as invariant. To do this, add their titles to the
2655	list of Invariant Sections in the Modified Version's license notice.
2656	These titles must be distinct from any other section titles.
2657
2658	You may add a section Entitled ``Endorsements'', provided it contains
2659	nothing but endorsements of your Modified Version by various
2660	parties---for example, statements of peer review or that the text has
2661	been approved by an organization as the authoritative definition of a
2662	standard.
2663
2664	You may add a passage of up to five words as a Front-Cover Text, and a
2665	passage of up to 25 words as a Back-Cover Text, to the end of the list
2666	of Cover Texts in the Modified Version. Only one passage of
2667	Front-Cover Text and one of Back-Cover Text may be added by (or
2668	through arrangements made by) any one entity. If the Document already
2669	includes a cover text for the same cover, previously added by you or
2670	by arrangement made by the same entity you are acting on behalf of,
2671	you may not add another; but you may replace the old one, on explicit
2672	permission from the previous publisher that added the old one.
2673
2674	The author(s) and publisher(s) of the Document do not by this License
2675	give permission to use their names for publicity for or to assert or
2676	imply endorsement of any Modified Version.
2677
2678	@item
2679	COMBINING DOCUMENTS
2680
2681	You may combine the Document with other documents released under this
2682	License, under the terms defined in section 4 above for modified
2683	versions, provided that you include in the combination all of the
2684	Invariant Sections of all of the original documents, unmodified, and
2685	list them all as Invariant Sections of your combined work in its
2686	license notice, and that you preserve all their Warranty Disclaimers.
2687
2688	The combined work need only contain one copy of this License, and
2689	multiple identical Invariant Sections may be replaced with a single
2690	copy. If there are multiple Invariant Sections with the same name but
2691	different contents, make the title of each such section unique by
2692	adding at the end of it, in parentheses, the name of the original
2693	author or publisher of that section if known, or else a unique number.
2694	Make the same adjustment to the section titles in the list of
2695	Invariant Sections in the license notice of the combined work.
2696
2697	In the combination, you must combine any sections Entitled ``History''
2698	in the various original documents, forming one section Entitled
2699	``History''; likewise combine any sections Entitled ``Acknowledgements'',
2700	and any sections Entitled ``Dedications''. You must delete all
2701	sections Entitled ``Endorsements.''
2702
2703	@item
2704	COLLECTIONS OF DOCUMENTS
2705
2706	You may make a collection consisting of the Document and other documents
2707	released under this License, and replace the individual copies of this
2708	License in the various documents with a single copy that is included in
2709	the collection, provided that you follow the rules of this License for
2710	verbatim copying of each of the documents in all other respects.
2711
2712	You may extract a single document from such a collection, and distribute
2713	it individually under this License, provided you insert a copy of this
2714	License into the extracted document, and follow this License in all
2715	other respects regarding verbatim copying of that document.
2716
2717	@item
2718	AGGREGATION WITH INDEPENDENT WORKS
2719
2720	A compilation of the Document or its derivatives with other separate
2721	and independent documents or works, in or on a volume of a storage or
2722	distribution medium, is called an ``aggregate'' if the copyright
2723	resulting from the compilation is not used to limit the legal rights
2724	of the compilation's users beyond what the individual works permit.
2725	When the Document is included in an aggregate, this License does not
2726	apply to the other works in the aggregate which are not themselves
2727	derivative works of the Document.
2728
2729	If the Cover Text requirement of section 3 is applicable to these
2730	copies of the Document, then if the Document is less than one half of
2731	the entire aggregate, the Document's Cover Texts may be placed on
2732	covers that bracket the Document within the aggregate, or the
2733	electronic equivalent of covers if the Document is in electronic form.
2734	Otherwise they must appear on printed covers that bracket the whole
2735	aggregate.
2736
2737	@item
2738	TRANSLATION
2739
2740	Translation is considered a kind of modification, so you may
2741	distribute translations of the Document under the terms of section 4.
2742	Replacing Invariant Sections with translations requires special
2743	permission from their copyright holders, but you may include
2744	translations of some or all Invariant Sections in addition to the
2745	original versions of these Invariant Sections. You may include a
2746	translation of this License, and all the license notices in the
2747	Document, and any Warranty Disclaimers, provided that you also include
2748	the original English version of this License and the original versions
2749	of those notices and disclaimers. In case of a disagreement between
2750	the translation and the original version of this License or a notice
2751	or disclaimer, the original version will prevail.
2752
2753	If a section in the Document is Entitled ``Acknowledgements'',
2754	``Dedications'', or ``History'', the requirement (section 4) to Preserve
2755	its Title (section 1) will typically require changing the actual
2756	title.
2757
2758	@item
2759	TERMINATION
2760
2761	You may not copy, modify, sublicense, or distribute the Document except
2762	as expressly provided for under this License. Any other attempt to
2763	copy, modify, sublicense or distribute the Document is void, and will
2764	automatically terminate your rights under this License. However,
2765	parties who have received copies, or rights, from you under this
2766	License will not have their licenses terminated so long as such
2767	parties remain in full compliance.
2768
2769	@item
2770	FUTURE REVISIONS OF THIS LICENSE
2771
2772	The Free Software Foundation may publish new, revised versions
2773	of the GNU Free Documentation License from time to time. Such new
2774	versions will be similar in spirit to the present version, but may
2775	differ in detail to address new problems or concerns. See
2776	@uref{http://www.gnu.org/copyleft/}.
2777
2778	Each version of the License is given a distinguishing version number.
2779	If the Document specifies that a particular numbered version of this
2780	License ``or any later version'' applies to it, you have the option of
2781	following the terms and conditions either of that specified version or
2782	of any later version that has been published (not as a draft) by the
2783	Free Software Foundation. If the Document does not specify a version
2784	number of this License, you may choose any version ever published (not
2785	as a draft) by the Free Software Foundation.
2786	@end enumerate
2787
2788	@page
2789	@heading ADDENDUM: How to use this License for your documents
2790
2791	To use this License in a document you have written, include a copy of
2792	the License in the document and put the following copyright and
2793	license notices just after the title page:
2794
2795	@smallexample
2796	@group
2797	Copyright (C) @var{year} @var{your name}.
2798	Permission is granted to copy, distribute and/or modify this document
2799	under the terms of the GNU Free Documentation License, Version 1.2
2800	or any later version published by the Free Software Foundation;
2801	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
2802	Texts. A copy of the license is included in the section entitled ``GNU
2803	Free Documentation License''.
2804	@end group
2805	@end smallexample
2806
2807	If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
2808	replace the ``with@dots{}Texts.'' line with this:
2809
2810	@smallexample
2811	@group
2812	with the Invariant Sections being @var{list their titles}, with
2813	the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
2814	being @var{list}.
2815	@end group
2816	@end smallexample
2817
2818	If you have Invariant Sections without Cover Texts, or some other
2819	combination of the three, merge those two alternatives to suit the
2820	situation.
2821
2822	If your document contains nontrivial examples of program code, we
2823	recommend releasing these examples in parallel under your choice of
2824	free software license, such as the GNU General Public License,
2825	to permit their use in free software.
2826
2827	@c Local Variables:
2828	@c ispell-local-pdict: "ispell-dict"
2829	@c End:
2830
2831
2832	@c ---------------------------------------------------------------------
2833	@c ---------------------------------------------------------------------
2834
2835	@node Reporting bugs
2836	@chapter Reporting bugs
2837
2838	Report bugs to <obrebski@@amu.edu.pl>.
2839
2840	@c ---------------------------------------------------------------------
2841	@c ---------------------------------------------------------------------
2842
2843	@c @node Copyright
2844	@c @chapter Copyright
2845	@c
2846	@c Copyright 2004 by Tomasz Obrębski
2847	@c This software is free for research and educational use.
2848
2849	@c ---------------------------------------------------------------------
2850	@c ---------------------------------------------------------------------
2851
2852	@node Author
2853	@chapter Author
2854
2855
2856	@bye
