
source: doc/utt.texinfo @ 519eaf5

Last change on this file since 519eaf5 was d6a59ca, checked in by Tomasz Obrebski <obrebski@…>, 13 years ago
Commit message (Polish, truncated): "Poprawki w dokumentacji (utf8 działa), poprawka w tre" ("Documentation fixes (utf8 works), a fix in ...")
Property mode set to `100644`
File size: 85.2 KB

\input texinfo   @c -*-texinfo-*-
@c @documentencoding ISO-8859-2
@c @documentlanguage pl

@c %**start of header
@setfilename utt.info
@settitle UAM Text Tools v0.90
@documentencoding utf-8
@c %**end of header

@copying
This manual is for UAM Text Tools (version 0.90, October 2008).

Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
copy of the license is included in the section entitled
@ref{GNU Free Documentation License,,GNU Free Documentation License}.

@c @quotation
@c Permission is granted to ...
@c No permission is granted until the document is completed.
@c @end quotation
@end copying

@titlepage
@title UAM Text Tools 0.90 - User Manual
@subtitle edition 0.01, @today{}
@subtitle status: prescript
@author by Justyna Walkowska, Tomasz Obrębski and Michał Stolarski
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents

@c @paragraphindent none

@iftex
@tex
% \usepackage[T1]{fontenc}
% \usepackage[utf8]{inputenc}
% \usepackage{times}
@end tex

@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
@end iftex
@c @headings off
@c @everyheading LEM(1) @\| @\| LEM(1)
@everyfooting @today{} @c @\| @thispage @\|

@ifnottex

@node Top
@top UTT - UAM Text Tools

@insertcopying

@menu
* General information::
* UTT file format::
* Configuration files::
* UTT components::
* Auxiliary tools::
* Usage examples::
* PMDBF dictionary::
@c * Examples::
@c * Copyright::
* GNU Free Documentation License::
* Reporting bugs::
* Author::
@end menu
@end ifnottex


@c ----------------------------------------------------------------------

@node General information
@chapter General information

UAM Text Tools (UTT) is a package of language processing tools
developed at Adam Mickiewicz University. Its functionality includes:

@itemize @bullet

@item
tokenization
@item
dictionary-based morphological analysis
@item
heuristic morphological analysis of unknown words
@item
spelling correction
@item
pattern search
@item
sentence splitting
@item
generation of concordance tables
@end itemize

The toolkit is intended for processing raw (unannotated),
unrestricted text for any purpose.

The system is organized as a collection of command-line programs, each
performing one operation, e.g. tokenization, lemmatization, spelling
correction. The components are independent of one another; the
unifying element is the uniform i/o file format.

The components may be combined in various ways to provide various text
processing services. New components supplied by the user may also be
easily incorporated into the system, provided that they respect the i/o
file format conventions.

UTT component programs do not depend on any specific tagset or
morphological description format.

UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License, which prohibits commercial use.


List of contributors:

@itemize
@item Paweł Konieczka
@item Tomasz Obrębski
@item Michał Stolarski
@item Marcin Walas
@item Justyna Walkowska
@item Paweł Wereński
@end itemize

@c ----------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node UTT file format
@chapter UTT file format

A UTT file contains an annotation of a text. It consists of a sequence of
segments. Each segment explicitly refers to a continuous piece of the
text and provides some information about it.

@section Segment format

A segment occupies one line of a UTT file and consists of
space-separated fields:


@quotation
@sp 1
[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
@sp 1
@end quotation

@table @var

@item @var{start}
A non-negative integer indicating the position in the source text where the
segment starts.

@item @var{length}
A non-negative integer indicating the length of the segment.

@item @var{type}
A sequence of non-blank characters containing no digits (a digit could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
@var{type} reflects the main classification of segments into
words, numbers, punctuation marks, and meta-text markers.
@xref{tok output,,tok output}, for a description of automatically recognized type markers.

@item @var{form}
This field contains the textual form of the segment or the special
symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and has length 0).

The characters and character sequences that have special meaning in the
@var{form} field are listed below.

Characters with special meaning:

@itemize
@item @code{_} - space character
@item @code{*} - undefined contents
@end itemize

Escape sequences:

@itemize
@item @code{\n} - new line
@item @code{\t} - tab
@item @code{\r} - carriage return

@item @code{\_} - the @code{_} character
@item @code{\*} - the @code{*} character
@item @code{\\} - the @code{\} character

@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
@end itemize

@item @var{annotation1}
@itemx @var{annotation2}
@itemx ...
Annotation fields have the following format:

@var{longname}@code{:}@var{value}

or

@var{shortname}@var{value}

where @var{longname} is a string of alphanumeric characters
(@code{isalnum()} test), @var{shortname} is a single non-alphanumeric character
(@code{ispunct()} test), and @var{value} is an arbitrary string of non-blank characters.

@end table
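The decoding rules above can be illustrated with a short Python sketch. It is not part of UTT: the names @code{unescape_form} and @code{parse_annotation} are hypothetical, and treating an unknown escape such as @code{\x} as the bare character @code{x} is an assumption.

```python
# Sketch (not UTT code): decode a form field and split an annotation
# field according to the rules listed above.

ESCAPES = dict(n="\n", t="\t", r="\r")  # \n, \t and \r escape sequences


def unescape_form(field: str) -> str:
    """Return the text a form field stands for ("*" = undefined -> "")."""
    if field == "*":
        return ""
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            # \n, \t, \r map to control characters; \_, \*, \\ (and, by
            # assumption, any other escape) map to the character itself
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        elif ch == "_":
            out.append(" ")  # an unescaped underscore stands for a space
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_annotation(field: str) -> tuple:
    """Split 'longname:value' or 'shortname' + 'value' into (name, value)."""
    name_end = 0
    while name_end < len(field) and field[name_end].isalnum():
        name_end += 1
    if 0 < name_end < len(field) and field[name_end] == ":":
        return field[:name_end], field[name_end + 1:]
    if field and not field[0].isalnum():
        return field[0], field[1:]  # short name: one non-alphanumeric char
    raise ValueError("not an annotation field: " + field)
```

Python's @code{str.isalnum()} is Unicode-aware, whereas the C @code{isalnum()} test mentioned above depends on the locale, so results may differ for non-ASCII names.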


Only two fields are mandatory: @var{type} and @var{form}. All other fields
may be absent. When only one number precedes the
@var{type} field, it is interpreted as the @var{start} position.

If the @var{length} field is omitted, the length of the segment is the
length of the text represented by the @var{form} field, except when the value of the
@var{form} field is @code{*}; in this case, the length is assumed to
be 0.

If the @var{start} field is also absent, the segment is assumed to directly
follow the preceding one.
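The omission rules above can be summarized in a Python sketch that reconstructs absolute positions and lengths. It is illustrative only: @code{Segment}, @code{form_length} and @code{resolve} are hypothetical names, and computing an omitted length from the decoded form (so that @code{\n} counts as one character) is an assumption.

```python
# Sketch (not UTT code): fill in omitted start/length fields of UTT lines.
from typing import NamedTuple


class Segment(NamedTuple):
    start: int
    length: int
    type: str
    form: str
    annotations: list


def form_length(field: str) -> int:
    """Length of the source text a form field stands for."""
    if field == "*":
        return 0  # undefined contents: zero-length segment
    length, i = 0, 0
    while i < len(field):
        # an escape sequence such as \n or \_ encodes a single character
        i += 2 if field[i] == "\\" and i + 1 < len(field) else 1
        length += 1
    return length


def resolve(lines):
    """Yield Segments with an absolute start and an explicit length."""
    next_start = 0  # position right after the preceding segment
    for line in lines:
        fields = line.split()
        numbers = []
        # leading numeric fields are start and (optionally) length
        while fields and fields[0].isdigit() and len(numbers) < 2:
            numbers.append(int(fields.pop(0)))
        seg_type, form, annotations = fields[0], fields[1], fields[2:]
        start = numbers[0] if numbers else next_start
        length = numbers[1] if len(numbers) == 2 else form_length(form)
        next_start = start + length
        yield Segment(start, length, seg_type, form, annotations)
```

Applied to the examples that follow, the abbreviated variants resolve to the same start and length values as the fully specified first example.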

@c Conventions:

@c Annotation fields with predefined meaning:

@c @itemize
@c @item @code{!} - UTT components are allowed to modify the contents of
@c the @var{form} field (e.g. spelling correction does this). If this happens the
@c original form of the segment has to be placed in the @code{!}-field.
@c @item @code{@@} - morphological description
@c @item @code{=} - node identifier assignment (used in graph encoding)
@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
@c @end itemize

Segments of length 0 may be used to mark file positions with some
information. See, e.g., the BOS and EOS (beginning/end of sentence) markers
in the examples below.

Example:

Text: @samp{Piszemy dobre progrumy. Warszawiacy też.}

@example
0000 00 BOS *
0000 07 W Piszemy lem:pisać,V
0007 01 S _
0008 05 W dobre lem:dobry,ADJ
0013 01 S _
0014 08 W progrumy cor:programy lem:program,N
0022 01 P .
0023 00 EOS *
0023 01 S _
0024 00 BOS *
0024 11 W Warszawiacy lem:Warszawiak,N
0035 01 S _
0036 03 W też
0039 01 P .
0040 00 EOS *
@end example

The same annotation of the first sentence, with the @var{length} fields omitted:

@example
0000 BOS *
0000 W Piszemy lem:pisać,V
0007 S _
0008 W dobre lem:dobry,ADJ
0013 S _
0014 W progrumy cor:programy lem:program,N
0022 P .
0023 EOS *
@end example

Position information may be provided for only some of the segments:

@example
0000 BOS *
W Piszemy lem:pisać,V
S _
W dobre lem:dobry,ADJ
S _
W progrumy cor:programy lem:program,N
P .
EOS *
S _
0024 BOS *
W Warszawiacy lem:Warszawiak,N
S _
W też
P .
EOS *
@end example

Position and length information may be provided only where necessary:

@example
0000 04 N *
0000 N 12
P .
N 5
S _
W km
@end example

@section UTT file

A UTT file consists of a sequence of segments. The same text position
may be covered by multiple segments. Consequently, ambiguous text
segmentation and ambiguous annotation can be represented.

There are two structural requirements that a valid UTT-formatted file
must meet:

@itemize @bullet

@item
segments must be sorted with respect to the @var{start} field;

@item
for each
segment ending at position @var{n}, either there must be a segment starting at
position @var{n+1}, or position @var{n+1} must not be covered by any segment; similarly,
for each segment starting at position @var{n}, either there must be a segment
ending at position @var{n-1}, or position @var{n-1} must not be covered
by any segment.

@end itemize

A valid annotation for the text fragment
@example
12.5 km
@end example

may be

@example
0000 02 N 12
0000 04 N 12.5
0002 01 P .
0003 01 N 5
0004 01 S _
0005 02 W km
@end example

but not

@example
0000 02 N 12
0000 04 N 12.5
0004 01 S _
0005 02 W km
@end example

because in the latter example the first segment (starting at position
0000, 2 characters long) ends at position @var{n}=0001, which is
covered by the second segment, while no segment starts at position
@var{n+1}=0002.
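The two structural requirements can be checked mechanically. The following Python sketch works on (@var{start}, @var{length}) pairs; it is not a UTT tool, @code{covered} and @code{check} are hypothetical names, and treating zero-length marker segments (such as BOS and EOS) as covering no position is an assumption.

```python
# Sketch (not UTT code): check the two structural requirements on
# (start, length) pairs. Zero-length markers are assumed to cover nothing.


def covered(pos: int, segments: list) -> bool:
    """True if some segment of non-zero length covers position pos."""
    return any(s <= pos < s + l for s, l in segments if l > 0)


def check(segments: list) -> list:
    """Return a list of violations (empty if the annotation is valid)."""
    problems = []
    starts = [s for s, _ in segments]
    if starts != sorted(starts):
        problems.append("segments are not sorted by start")
    real = [(s, l) for s, l in segments if l > 0]
    begin_at = set(s for s, _ in real)
    end_at = set(s + l - 1 for s, l in real)
    for s, l in real:
        n = s + l - 1  # last position covered by this segment
        if n + 1 not in begin_at and covered(n + 1, real):
            problems.append("no segment starts at %d" % (n + 1))
        if s > 0 and s - 1 not in end_at and covered(s - 1, real):
            problems.append("no segment ends at %d" % (s - 1))
    return problems
```

For the valid annotation of @samp{12.5 km} above, @code{check} returns an empty list; for the invalid one it reports that no segment starts at position 2.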


@section Flattened UTT file

The UTT file format has two variants: regular and flattened. The regular
format was described above. In the flattened format, some of the
end-of-line characters are replaced with form-feed characters.

The flattened format is used mainly to represent whole sentences as
single lines of the input file (all intrasentential end-of-line
characters are replaced with form-feed characters).

This technical trick makes it possible to perform certain text
processing operations on entire sentences with tools such as
@command{grep} (see the @command{grp} component) or @command{sed} (see the @command{mar} component).

The conversion between the two formats is performed by the tools
@command{fla} and @command{unfla}.
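The idea behind flattening can be sketched in Python as follows. This is not the code of @command{fla} or @command{unfla}: the function names are hypothetical, and both the choice of the form feed (@samp{\f}) as the replacement character and the use of BOS and EOS segments as sentence boundaries are assumptions of the sketch.

```python
# Sketch in the spirit of fla/unfla (not their code). Assumptions: the
# lines from a BOS segment to the next EOS segment form one sentence,
# and "\f" (form feed) replaces the intrasentential line ends.

SEP = "\f"


def segment_type(line: str) -> str:
    """The type field: the first field that is not a number."""
    for field in line.split():
        if not field.isdigit():
            return field
    return ""


def flatten(lines: list) -> list:
    out, sentence = [], []
    for line in lines:
        kind = segment_type(line)
        if kind == "BOS" or sentence:
            sentence.append(line)          # inside a sentence
            if kind == "EOS":
                out.append(SEP.join(sentence))
                sentence = []
        else:
            out.append(line)               # between sentences
    if sentence:                           # unterminated sentence
        out.append(SEP.join(sentence))
    return out


def unflatten(lines: list) -> list:
    return [part for line in lines for part in line.split(SEP)]
```

After flattening, a line-oriented tool sees each sentence as one line, so a pattern matched by @command{grep} selects whole sentences; @code{unflatten} restores the original segment lines.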

@section Character encoding

The UTT component programs accept only single-byte character encodings, such
as the ISO 8859 encodings, Windows (ANSI) code pages, or DOS code pages.
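Text stored in UTF-8 therefore has to be converted before it is processed. A minimal Python sketch follows; the function name @code{to_single_byte} is hypothetical, and choosing ISO-8859-2 is an assumption based on the @file{pl_PL.ISO-8859-2} directory used by the Polish dictionary.

```python
# Sketch: re-encode text in a single-byte encoding before UTT processing.
# ISO-8859-2 is an assumption; use the encoding your dictionaries expect.


def to_single_byte(text: str, encoding: str = "iso-8859-2") -> bytes:
    """Encode text with a 1-byte encoding; raises UnicodeEncodeError for
    characters the encoding cannot represent."""
    return text.encode(encoding)


sample = "Piszemy dobre programy. Warszawiacy też."
utf8_bytes = sample.encode("utf-8")
latin2_bytes = to_single_byte(sample)

# "ż" takes two bytes in UTF-8 but one byte in ISO-8859-2, so only the
# single-byte version keeps byte offsets equal to character offsets.
assert len(latin2_bytes) == len(sample)
assert len(utf8_bytes) == len(sample) + 1
```

From the shell, the same conversion is commonly done with @code{iconv -f UTF-8 -t ISO-8859-2}.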


@c @section Formats

@c @unnumberedsubsubsec Basic format

@c While processing large amounts of the overhead related with explicit
@c ... of the start position and segment length becomes ... . Therefore,
@c for efficiency reasons certain shortcuts are possible:

@c @unnumberedsubsubsec Relative start position

@c Start position may be given as relative distance from the last
@c absolute position.

@c @unnumberedsubsubsec Absent length

@c Segment length may be omitted. Normally it can be restored by counting
@c the length of the @emph{form field}. For segments with the special value
@c @code{*} in the @emph{form field} length 0 is assumed.

@c @unnumberedsubsubsec Absent length and start position

@c Both start position and segment length may be omitted. In this format
@c each segment is assumed to follow the previous one. This format is,
@c therefore, suitable only for unambiguously tagged text
@c (0-length markers can be still used.)


@c @table @code
@c @item AL
@c @code{1234 03 W kot}
@c @item RL
@c @code{+56 03 W kot}
@c @item A
@c @code{1234 W kot}
@c @item R
@c @code{+56 W kot}
@c @item 0
@c @code{W kot}
@c @end table


@c [HOW TO GET POLISH FONTS IN DVI???]

@macro parhelp
@item @b{@minus{}@minus{}help}, @b{@minus{}h}
Print help.
@end macro


@macro parversion
@item @b{@minus{}@minus{}version}, @b{@minus{}V}
Print version information.
@end macro

@macro parinteractive
@item @b{@minus{}@minus{}interactive, @minus{}i}
This option toggles interactive mode, which is off by default. In
interactive mode the program does not buffer its output.
@end macro


@c @macro parfile
@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
@c Input file name.
@c If this option is absent or equal to '@minus{}', the program
@c reads from the standard input.
@c @end macro


@c @macro paroutput
@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
@c Regular output file name. To regular output the program sends segments
@c which it successfully processed and copies those which were not
@c subject to processing. If this option is absent or equal to
@c '@minus{}', standard output is used.
@c @end macro

@c @macro parfail
@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
@c Fail output file name. To fail output the program copies the segments
@c it failed to process. If this option is absent or equal to
@c '@minus{}', standard output is used.
@c @end macro


@c @macro parcopy
@c @item @b{@minus{}@minus{}copy, @minus{}c}
@c Copy successfully processed segments to regular output also in their
@c original input form.
@c @end macro


@macro parinputfield
@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
The field containing the input to the program. The default is the
@var{form} field. The fields @var{start}, @var{length}, @var{type},
and @var{form} are referred to as @code{1}, @code{2}, @code{3},
@code{4}, respectively.
@end macro


@macro paroutputfield
@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
The name of the field added by the program. The default is the name of the program.
@end macro


@macro pardictionary
@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
Dictionary file name.
@end macro


@macro parprocess
@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
Process segments with the specified value in the @var{type} field.
Multiple occurrences of this option are allowed and are interpreted as
a disjunction. If this option is absent, all segments are processed.
@end macro


@macro parselect
@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
Select for processing only segments in which the field named
@var{fieldname} is present. Multiple occurrences of this option are
allowed and are interpreted as a conjunction of conditions. If this
option is absent, all segments are processed.
@end macro


@macro parunselect
@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
Select for processing only segments in which the field @var{fieldname}
is absent. Multiple occurrences of this option are allowed and are
interpreted as a conjunction of conditions. If this option is absent,
all segments are processed.
@end macro


@macro paroneline
@item @b{@minus{}@minus{}one-line}
This option makes the program print ambiguous annotation in one output
line by generating multiple annotation fields. By default, when
ambiguous annotation is produced for a segment, the segment is
replicated and each of the annotations is added to a separate copy of
the segment.
@end macro


@macro paronefield
@item @b{@minus{}@minus{}one-field, @minus{}1}
This option makes the program print ambiguous annotation in one
annotation field. By default, when ambiguous annotation is produced
for a segment, the segment is replicated and each of the
annotations is added to a separate copy of the segment.

This option is useful when working with @command{kot} or @command{con}.
@end macro


@c ---------------------------------------------------------------------
@c CONFIGURATION FILES
@c ---------------------------------------------------------------------

@node Configuration files
@chapter Configuration files

Values for all command line options accepted by a component
may be set in configuration files. The default locations of the
configuration files for a component named @command{@var{program}} are

@example
@file{/usr/local/etc/utt/@var{program}.conf}
@end example

for the system-wide configuration file and

@example
@file{~/.utt/@var{program}.conf}
@end example

for the user configuration file.

@c The configuration file to load may be also specified with the
@c @option{--config} option. Configuration file need not be provided.

For each option, the value is set according to the following priority (highest first):

@itemize
@item command line
@c @item configuration file indicated with @option{--config} option
@item user configuration file (or the configuration file indicated with the @option{--config} option)
@item system-wide configuration file
@end itemize

Parameter values are specified in the following format:

@var{parametername}=@var{value}

where @var{parametername} is the short or long name of an option accepted by
the program, or

@var{parametername}

if the option takes no argument.

Comments can be added to configuration files with the @code{#} sign.

If a program accepts multiple occurrences of an option (e.g. the @option{select} option of @command{lem}), you can specify them on separate lines of the program's configuration file.
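The configuration rules above can be sketched in Python. This is not how UTT reads its files: @code{parse_config} and @code{merge} are hypothetical names, and two behaviours are assumptions, namely that @code{#} starts a comment anywhere on a line and that a higher-priority source replaces all values of an option given in a lower-priority one.

```python
# Sketch (not UTT code) of reading "name=value" / "name" lines and merging
# sources by priority. Assumptions: "#" starts a comment anywhere on a
# line; a higher-priority source replaces all values of an option.


def parse_config(text: str) -> list:
    """Return (name, value) pairs; value is None for argument-less options."""
    options = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        name, sep, value = line.partition("=")
        options.append((name.strip(), value.strip() if sep else None))
    return options


def merge(*sources) -> dict:
    """Merge option lists given from highest to lowest priority."""
    merged = dict()
    for source in sources:
        values_here = dict()
        for name, value in source:
            values_here.setdefault(name, []).append(value)  # repeated options
        for name, values in values_here.items():
            if name not in merged:  # a higher-priority source already won
                merged[name] = values
    return merged
```

A call such as @code{merge(command_line, user_file, system_file)} reproduces the priority order listed above; repeated lines such as two @code{select=} entries are kept as a list of values.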

@c The equal sign may be omitted.


@quotation Tip
If you have two (or more) frequently used sets of options for the same
program (e.g. @command{lem} with the PMDBF dictionary and @command{lem} with a user dictionary),
a good solution is to create two symbolic links to @command{lem}, called,
e.g., @command{lemg} and @command{lemu}, and specify their configuration in the files @file{lemg.conf}
and @file{lemu.conf}, respectively.
@end quotation

@c ---------------------------------------------------------------------
@c COMPONENTS
@c ---------------------------------------------------------------------

@node UTT components
@chapter UTT components

UTT components are of three types:

@menu
Sources: programs which read non-UTT data (e.g. raw text) and produce output
in UTT format
* tok:: a tokenizer

Filters: programs which read and produce UTT-formatted data
* lem:: a morphological analyzer
* gue:: a morphological guesser
* cor:: a simple spelling corrector
* kor:: a more elaborate spelling corrector
* sen:: a sentencizer
* ser:: a pattern search tool (marks matches)
* mar:: a pattern search tool (introduces arbitrary markers into the text)
* grp:: a pattern search tool (selects sentences containing a match)
@c * gph:: a word-graph annotation tool::
@c * dgp:: a dependency parser

Sinks: programs which read UTT data and produce output in another format
* kot:: an untokenizer
* con:: a concordance table generator
@end menu

@c ---------------------------------------------------------------------
@c TOK
@c ---------------------------------------------------------------------

@page
@node tok
@section tok - a tokenizer

@c ----------------------------------------

@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
@item @strong{Component category:} @tab source
@item @strong{Input format:} @tab raw text file
@item @strong{Output format:} @tab UTT regular
@item @strong{Required annotation:} @tab -
@end multitable


@menu
* tok description::
* tok input::
* tok output::
* tok command line options::
* tok example::
@end menu

@node tok description
@subsection Description

@command{tok} is a simple program which reads a text file and identifies
tokens on the basis of their orthographic form. The type of each token
is written to the @var{type} field.

@node tok input
@subsection Input

Raw text.

@node tok output
@subsection Output

A UTT file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. Five token types are distinguished in the @var{type} field:

@itemize

@item @code{W}
(word)
- continuous sequence of letters

@item @code{N}
(number)
- continuous sequence of digits

@item @code{S}
(space)
- continuous sequence of whitespace characters

@item @code{P}
(punctuation mark)
- single printable character not belonging to any of the other classes

@item @code{B}
(unprintable character)
- single unprintable character

@end itemize
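The classification above can be illustrated with a small Python tokenizer that writes the same four-field format. It is a sketch in the spirit of @command{tok}, not its source code: the function names are hypothetical, and using Python's Unicode character classes (@code{isalpha}, @code{isdigit}, @code{isspace}, @code{isprintable}) in place of @command{tok}'s own classification is an assumption.

```python
# Tokenizer sketch in the spirit of tok (not its code). Assumption:
# Python's Unicode character classes stand in for tok's classification.

FORM_ESCAPES = dict([(" ", "_"), ("\n", "\\n"), ("\t", "\\t"), ("\r", "\\r"),
                     ("_", "\\_"), ("*", "\\*"), ("\\", "\\\\")])


def char_class(ch: str) -> str:
    if ch.isalpha():
        return "W"
    if ch.isdigit():
        return "N"
    if ch.isspace():
        return "S"
    return "P" if ch.isprintable() else "B"


def escape_form(text: str) -> str:
    """Encode text for the form field (spaces as _, escapes for \\n etc.)."""
    return "".join(FORM_ESCAPES.get(ch, ch) for ch in text)


def tokenize(text: str) -> list:
    """Return UTT lines with the fields start, length, type and form."""
    lines, start = [], 0
    while start < len(text):
        kind = char_class(text[start])
        end = start + 1
        if kind in ("W", "N", "S"):  # only these classes form runs
            while end < len(text) and char_class(text[end]) == kind:
                end += 1
        lines.append("%04d %02d %s %s" % (start, end - start, kind,
                                          escape_form(text[start:end])))
        start = end
    return lines
```

For the input @samp{Piszemy dobre programy.} followed by a newline, the sketch produces the same seven lines as the example output of @command{tok} below.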



@node tok command line options
@subsection Command line options

@table @code

@item @b{@minus{}@minus{}help}, @b{@minus{}h}
Print help.

@item @b{@minus{}@minus{}version}, @b{@minus{}V}
Print version information.

@item @b{@minus{}@minus{}interactive, @minus{}i}
This option toggles interactive mode, which is off by default. In
interactive mode the program does not buffer its output.

@end table

@node tok example
@subsection Example

Input:

@example
Piszemy dobre programy.
@end example

Output:

@example
0000 07 W Piszemy
0007 01 S _
0008 05 W dobre
0013 01 S _
0014 08 W programy
0022 01 P .
0023 01 S \n
@end example


@c ---------------------------------------------------------------------
@c SEN
@c ---------------------------------------------------------------------

@c @node sen - sentencizer
@c @chapter sen - sentencizer

@c Authors: Tomasz Obrębski

@c ---------------------------------------------------------------------
@c LEM
@c ---------------------------------------------------------------------

@page
@node lem
@section lem - morphological analyzer

@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
@item @strong{Component category:} @tab filter
@item @strong{Input format:} @tab UTT regular
@item @strong{Output format:} @tab UTT regular
@item @strong{Required annotation:} @tab tok
@end multitable

@menu
* lem description::
* lem command line options::
* lem input::
* lem output::
* lem example::
* lem dictionaries::
* lem hints::
@end menu

@node lem description
@subsection Description

@command{lem} performs morphological analysis of a single orthographic
word, returning all its possible morphological annotations
regardless of the context.

@c ----------------------------------------

@node lem command line options
@subsection Command line options

@table @code
@parhelp
@parversion
@parinteractive
@c @parfile
@c @paroutput
@c @parfail
@c @parcopy
@parinputfield
@paroutputfield
@pardictionary
@parprocess
@parselect
@parunselect
@paroneline
@paronefield
@end table
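The selection options @option{-p}, @option{-s} and @option{-S} can be read as a single predicate on a segment. The Python sketch below is not @command{lem}'s source; @code{field_name} and @code{selected} are hypothetical names, and combining the three option groups with a conjunction is an assumption.

```python
# Sketch of the -p/-s/-S selection rules (not lem's code). Assumptions:
# the three option groups are combined with AND, and a field counts as
# present when an annotation field with that name exists.


def field_name(field: str) -> str:
    head, sep, _ = field.partition(":")
    if sep and head.isalnum():
        return head        # long name, e.g. "lem" in "lem:kot,N"
    return field[:1]       # short name: one non-alphanumeric character


def selected(seg_type: str, annotations: list,
             process=(), select=(), unselect=()) -> bool:
    """Decide whether a segment would be processed."""
    names = set(field_name(f) for f in annotations)
    if process and seg_type not in process:        # -p values: disjunction
        return False
    if not all(name in names for name in select):  # -s: all fields present
        return False
    if any(name in names for name in unselect):    # -S: all fields absent
        return False
    return True
```

With @code{unselect=("lem",)} only segments without a @code{lem} field pass, which is what lets the second @command{lem} in the pipelines of @ref{lem hints} handle words left unannotated by the first one.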

@c ----------------------------------------

@node lem input
@subsection Input

@command{lem} reads a UTT file and processes the value of the @var{form} field
(the input field may be changed with the @option{--input-field} option).

@node lem output
@subsection Output

@command{lem} adds a new annotation field, whose default name is @code{lem}. In
case of ambiguity, either the segment is replicated (default),
multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
annotation is produced as the value of a single @code{lem} field (option
@option{--one-field}, @option{-1}):

@itemize @bullet

@item
unambiguous value format:

@example
<lemma>,<descr>
@end example

@item
ambiguous value format (@option{--one-field} option):


@example
<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
@end example

(alternative descriptions for the same lemma are separated by commas;
alternative lemmata are separated by semicolons.)

@end itemize
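A value in the @option{--one-field} format can be expanded back into (lemma, description) pairs, for instance to turn @option{--one-field} output into @option{--one-line} output. The Python sketch below is illustrative only; @code{expand_one_field} and @code{to_one_line} are hypothetical names, and it assumes that lemmata and descriptions contain no commas or semicolons themselves.

```python
# Sketch (not part of UTT): expand a --one-field lem value into
# (lemma, description) pairs. Assumes lemmata and descriptions contain
# no "," or ";" themselves.


def expand_one_field(value: str) -> list:
    pairs = []
    for alternative in value.split(";"):       # alternative lemmata
        lemma, *descriptions = alternative.split(",")
        for descr in descriptions:              # alternative descriptions
            pairs.append((lemma, descr))
    return pairs


def to_one_line(segment: str, name: str = "lem") -> str:
    """Rewrite a --one-field segment line into the --one-line form."""
    out = []
    for field in segment.split():
        if field.startswith(name + ":"):
            value = field[len(name) + 1:]
            out.extend(name + ":" + lemma + "," + descr
                       for lemma, descr in expand_one_field(value))
        else:
            out.append(field)
    return " ".join(out)
```

Applied to the @samp{programy} line of the @option{--one-field} example below, @code{to_one_line} yields the corresponding line of the @option{--one-line} example.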

@node lem example
@subsection Example

Input:

@example
0000 07 W Piszemy
0007 01 S _
0008 05 W dobre
0013 01 S _
0014 08 W programy
0022 01 P .
0023 01 S \n
@end example

Output (default):

@example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
0013 01 S _
0014 08 W programy lem:program,N/GiNpCa
0014 08 W programy lem:program,N/GiNpCn
0014 08 W programy lem:program,N/GiNpCv
0022 01 P .
0023 01 S \n
@end example

Output (@option{--one-line} option):

@example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
0013 01 S _
0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
0022 01 P .
0023 01 S \n
@end example

Output (@option{--one-field} option):

@example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
0013 01 S _
0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
0022 01 P .
0023 01 S \n
@end example

@c ----------------------------------------

@node lem dictionaries
@subsection Dictionaries

@command{lem} requires a dictionary. The dictionary may be provided in
one of two formats: in text (source) format or in binary (fsa) format.

@subsubheading Text format

Dictionary entries have the following structure:

@example
<form>;<lemma>,<descr>[;<lemma>,<descr>]
@end example

@var{lemma} may be given explicitly or in the cut-add format:

@example
@code{[<cut1><add1>-]<cut2><add2>}
@end example

meaning: replace the prefix of length @code{<cut1>} with
the string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string
@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.

Each dictionary entry must be written on one line and must not contain blank characters.

Examples:
@example
kot;0,N/GaNsCn
kota;1,N/GaNsCg;1,N/GaNsCa
kotu;1,N/GaNsCd
kotem;2,N/GaNsCi
kocie;3t,N/GaNsCl;3t,N/GaNsCv
najbielsi;3-4ały,ADJ/DsNpCnGp
najbielsze;3-5ały,ADJ/DsNpCnGaifn
najlepsi;dobry,ADJ/DsNpCnGp
najlepsze;dobry,ADJ/DsNpCnGaifn
@end example
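The entry format and the cut-add encoding can be illustrated with a Python sketch that expands an entry into (form, lemma, description) triples. It is not @command{lem}'s code: @code{apply_cut_add} and @code{expand_entry} are hypothetical names, and treating a lemma that starts with a digit as a cut-add code (and anything else as an explicit lemma) is an assumption based on the examples above.

```python
# Sketch (not lem's code): expand a text-format dictionary entry and
# decode cut-add lemmata. Assumption: a lemma starting with a digit is a
# cut-add code; anything else is an explicit lemma.
import re

CUT_ADD = re.compile(r"(?:(\d+)([^-\d]*)-)?(\d+)(.*)")


def apply_cut_add(form: str, code: str) -> str:
    """[<cut1><add1>-]<cut2><add2>: replace a prefix, then a suffix."""
    match = CUT_ADD.fullmatch(code)
    if match is None:
        raise ValueError("not a cut-add code: " + code)
    cut1, add1, cut2, add2 = match.groups()
    word = form
    if cut1 is not None:
        word = add1 + word[int(cut1):]          # prefix replacement
    word = word[:len(word) - int(cut2)] + add2  # suffix replacement
    return word


def expand_entry(entry: str) -> list:
    """'<form>;<lemma>,<descr>[;...]' -> [(form, lemma, descr), ...]"""
    form, *readings = entry.split(";")
    triples = []
    for reading in readings:
        lemma, descr = reading.split(",", 1)
        if lemma[:1].isdigit():
            lemma = apply_cut_add(form, lemma)
        triples.append((form, lemma, descr))
    return triples
```

For example, @code{expand_entry("kocie;3t,N/GaNsCl;3t,N/GaNsCv")} gives two readings, both with the lemma @samp{kot}.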


The mandatory file name extension for a text dictionary is @code{dic}. For large
dictionaries, however, it is preferable to compile them into binary
(fsa) format.

@subsubheading Binary format

The mandatory file name extension for a binary dictionary is @code{bin}. To
compile a text dictionary into binary format, run:

@example
compdic <dictionaryname>.dic <dictionaryname>.bin
@end example

@subsubheading Polex/PMDBF dictionary

A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
the distribution as the default dictionary of @command{lem}. By default, it is
located in:

@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}

in a local installation, or in

@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}

in a system-wide installation.

@node lem hints
@subsection Hints

@subsubheading Combining data from multiple dictionaries

@itemize

@item Apply <dict1>, then apply <dict2> to words which were not annotated.

@example
lem -d <dict1> | lem -S lem -d <dict2>
@end example

@item Add annotations from two dictionaries <dict1> and <dict2>.

@example
lem -c -d <dict1> | lem -S lem -d <dict2>
@end example

@end itemize


@c ---------------------------------------------------------------------
@c GUE
@c ---------------------------------------------------------------------

1001	lem -c -d <dict1> \| lem -S lem -d <dict2>
1002	@end example
1003
1004	@end itemize
1005
1006
1007	@c ---------------------------------------------------------------------
1008	@c GUE
1009	@c ---------------------------------------------------------------------
1010
1011	@page
1012	@node gue
1013	@section gue - morphological guesser
1014
1015	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1016
1017	@item @strong{Authors:} @tab MichaÅ Stolarski, Tomasz ObrÄbski
1018	@item @strong{Component category:} @tab filter
1019
1020	@end multitable
1021
1022	@menu
1023	* gue description::
1024	* gue command line options::
1025	* gue example::
1026	* gue dictionaries::
1027	@end menu
1028
1029
1030	@node gue description
1031	@subsection Description
1032
1033	@command{gue} guesess morphological descriptions of the form contained
1034	in the @var{form} field.
1035
1036
1037	@node gue command line options
1038	@subsection Command line options
1039
1040	@table @code
1041
1042	@parhelp
1043	@parversion
1044	@parinteractive
1045	@c @parfile
1046	@c @paroutput
1047	@c @parfail
1048	@c @parcopy
1049	@parinputfield
1050	@paroutputfield
1051	@pardictionary
1052	@parprocess
1053	@parselect
1054	@parunselect
1055	@paroneline
1056	@paronefield
1057
1058	@item @b{@minus{}@minus{}delta=@var{n}}
1059	Stop displaying answers after fall of weight, that is, when weight difference between 2 subsequent results is more than delta value (default=`0.2').
1060
1061
1062	@item @b{@minus{}@minus{}cut-off=@var{n}}
1063	Do not display answers with less weight than cut-off value (default=`200').
1064
1065
1066	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
1067	Guess up to n descriptions (default=`0', which means 'display all results').
1068
1069
1070
1071	@end table
1072
1073	@node gue example
1074	@subsection Example
1075
1076	@example
1077	command: gue -n 2
1078
1079	input:
1080	0000 07 W smerfny
1081
1082	output:
1083	0000 07 W smerfny gue:,ADJ/CaDpGiNs
1084	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
1085	@end example
1086
1087
1088	@node gue dictionaries
1089	@subsection Dictionaries
1090
1091	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
1092	The fsa format is created by compiling text-format dictionaries.
1093
1094
1095
1096	@subsubheading Text format
1097
1098	Dictionary entries have the following structure:
1099
1100	@example
1101	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
1102	@end example
1103
1104	@var{lemma} must be given in the cut-add format:
1105
1106	@example
1107	@code{[<cut1><add1>-]<cut2><add2>}
1108	@end example
(with no spaces in between): replace the prefix of length @var{cut1}
with the string @var{add1}, and replace the suffix of length @var{cut2}
with the string @var{add2}. The bracketed part is optional.
1112
1113
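Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}: the 3-character prefix @i{naj} is removed, and the 4-character suffix @i{elsi} is replaced with @i{ały}.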
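The cut-add transformation can be sketched in a few lines of Python. The sketch rests on three assumptions: cut lengths are decimal numbers, the added strings contain neither digits nor @samp{-}, and lengths are counted in characters. @code{apply_cut_add} is a hypothetical name, not part of UTT.

```python
import re

# [<cut1><add1>-]<cut2><add2>: the bracketed prefix part is optional.
CUT_ADD = re.compile(r"(?:(\d+)([^\d-]*)-)?(\d+)(.*)")


def apply_cut_add(form, code):
    """Derive a lemma from a word form using a cut-add code."""
    m = CUT_ADD.fullmatch(code)
    if m is None:
        raise ValueError("not a cut-add code: " + code)
    cut1, add1, cut2, add2 = m.groups()
    if cut1 is not None:
        form = (add1 or "") + form[int(cut1):]        # replace the prefix
    keep = max(0, len(form) - int(cut2))
    return form[:keep] + add2                         # replace the suffix
```

@code{apply_cut_add("najbielsi", "3-4ały")} returns @code{"biały"}, which reproduces the example above.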
1115
1116
@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
1118
1119	@var{weight} is an integer value between 1 and 999 indicating the
1120	likelihood of the guess.
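A dictionary line can be split into its five fields with a regular expression, as in the Python sketch below. The sketch makes these assumptions:

@itemize
@item @var{prefix} contains no @samp{*}, @var{suffix} no @samp{;}, @var{lemma} no @samp{,}, and @var{description} no @samp{:}.
@item @samp{*} means ``any (possibly empty) middle part of the form''. This is an interpretation of the format, not a documented rule.
@end itemize

@code{Entry}, @code{parse_entry} and @code{matches} are hypothetical names.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    prefix: str
    suffix: str
    lemma: str         # cut-add code, e.g. 3-4ały
    description: str   # e.g. ADJ/CaDpGiNs
    weight: int        # 1..999


# prefix*suffix;lemma,description:weight
ENTRY = re.compile(r"([^*]*)\*([^;]*);([^,]*),([^:]*):(\d+)")


def parse_entry(line):
    m = ENTRY.fullmatch(line.strip())
    if m is None:
        raise ValueError("malformed dictionary entry: " + line)
    prefix, suffix, lemma, description, weight = m.groups()
    return Entry(prefix, suffix, lemma, description, int(weight))


def matches(entry, form):
    """Assumed reading of '*': form = prefix + anything + suffix."""
    return (len(form) >= len(entry.prefix) + len(entry.suffix)
            and form.startswith(entry.prefix)
            and form.endswith(entry.suffix))
```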
1121
1122	@c @example
1123	@c *ÅkÄ;1a,N/GfNsCa
1124	@c naj*elszy;3-4aÅy,ADJ/...:...
1125	@c @end example
1126
1127
1128	@c ---------------------------------------------------------------------
1129	@c COR
1130	@c ---------------------------------------------------------------------
1131
1132	@page
1133	@node cor
1134	@section cor - spelling corrector
1135
1136	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
1138	@item @strong{Component category:} @tab filter
1139	@item @strong{Input format:} @tab UTT regular
1140	@item @strong{Output format:} @tab UTT regular
1141	@item @strong{Required annotation:} @tab tok
1142	@end multitable
1143
1144	@menu
1145	* cor description::
1146	* cor command line options::
1147	* cor dictionaries::
1148	@end menu
1149
1150
1151	@node cor description
1152	@subsection Description
1153
1154	The spelling corrector applies Kemal Oflazer's dynamic programming
1155	algorithm @cite{oflazer96} to the FSA representation of the set of
word forms of the Polex/PMDBF dictionary. Given an incorrect word form,
it returns all word forms present in the dictionary whose edit distance
from it does not exceed the threshold given as a parameter (@option{--distance}).
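The idea can be sketched in Python as follows. Instead of the minimal FSA used by @command{cor}, the sketch stores the word list in a trie, a tree-shaped automaton.

While the trie is traversed depth-first, one row of the edit-distance table is computed per visited node. A branch is abandoned as soon as every entry of the current row exceeds the threshold, because the values can only grow further down. Adjacent transpositions count as single edit operations, as in Oflazer's formulation.

@code{Node}, @code{build_trie} and @code{correct} are hypothetical names.

```python
class Node:
    """A trie node: outgoing transitions plus an end-of-word flag."""

    def __init__(self):
        self.children = dict()
        self.is_word = False


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    return root


def correct(word, lexicon, max_distance=1):
    """Return sorted (distance, candidate) pairs with distance <= max_distance."""
    n = len(word)
    results = []

    def visit(node, prefix, prev, prevprev):
        # row[i] = edit distance between prefix and word[:i]
        j, ch = len(prefix), prefix[-1]
        row = [j] + [0] * n
        for i in range(1, n + 1):
            cost = 0 if word[i - 1] == ch else 1
            row[i] = min(prev[i] + 1,          # extra character in the candidate
                         row[i - 1] + 1,       # character missing from the candidate
                         prev[i - 1] + cost)   # match or substitution
            if (i > 1 and j > 1 and word[i - 1] == prefix[-2]
                    and word[i - 2] == ch):
                row[i] = min(row[i], prevprev[i - 2] + 1)  # transposition
        if node.is_word and row[n] <= max_distance:
            results.append((row[n], prefix))
        if min(row) <= max_distance:           # otherwise prune the whole subtree
            for next_ch, child in node.children.items():
                visit(child, prefix + next_ch, row, prev)

    root = build_trie(lexicon)
    first_row = list(range(n + 1))
    for ch, child in root.children.items():
        visit(child, ch, first_row, None)
    return sorted(results)
```

With the word list @code{["odlot", "odlotowy", "odludek"]}, @code{correct("dolot", words)} returns @code{[(1, "odlot")]}. Swapping @samp{do} into @samp{od} costs a single operation.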
1159
1160
1161	@node cor command line options
1162	@subsection Command line options
1163
1164	@table @code
1165
1166	@parhelp
1167	@parversion
1168	@parinteractive
1169	@c @parfile
1170	@c @paroutput
1171	@c @parfail
1172	@c @parcopy
1173	@parinputfield
1174	@paroutputfield
1175	@pardictionary
1176	@parprocess
1177	@parselect
1178	@parunselect
1179	@paroneline
1180	@paronefield
1181
1182	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1183	Maximum edit distance (default='1').
1184
1185	@c @item @b{@minus{}@minus{}replace, @minus{}r}
1186	@c Replace original form with corrected form, place original form in the
1187	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1188
1189
1190	@end table
1191
1192	@node cor dictionaries
1193	@subsection Dictionaries
1194
1195	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
1196	The fsa format is created by compiling text-format dictionaries.
1197
1198	@subsubheading Text format
1199
The @command{cor} dictionary is a list of words, one per line:
1201	@example
1202	odlot
1203	odlotowy
1204	odludek
1205	@end example
1206
1207	@subsubheading Binary format
1208
1209	The mandatory file name extension for a binary dictionary is @code{bin}. To
1210	compile a text dictionary into binary format, write:
1211
1212	@example
1213	compdic <dictionaryname>.dic <dictionaryname>.bin
1214	@end example
1215
1216	@c ---------------------------------------------------------------------
1217	@c KOR
1218	@c ---------------------------------------------------------------------
1219
1220	@page
1221	@node kor
1222	@section kor - configurable spelling corrector
1223
1224	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Paweł Werenski, Tomasz Obrębski, Michał Stolarski
1226	@item @strong{Component category:} @tab filter
1227	@item @strong{Input format:} @tab UTT regular
1228	@item @strong{Output format:} @tab UTT regular
1229	@item @strong{Required annotation:} @tab tok
1230	@end multitable
1231
1232	@menu
1233	* kor description::
1234	* kor command line options::
1235	* kor weights definition file::
1236	* kor dictionaries::
1237	@end menu
1238
1239
1240	@node kor description
1241	@subsection Description
1242
The spelling corrector applies Paweł Werenski's dynamic programming
algorithm to the FSA representation of the set of word forms of the
Polex/PMDBF dictionary. The algorithm is an extension of K. Oflazer's
algorithm used by @command{cor}: in the extended version it is
possible to assign weights to individual edit operations.

Given an incorrect word form, it returns all word forms
present in the dictionary whose edit distance from it does not exceed
the threshold given as a parameter (@option{--distance}).
1252
1253
1254	@node kor command line options
1255	@subsection Command line options
1256
1257	@table @code
1258
1259	@parhelp
1260	@parversion
1261	@parinteractive
1262	@c @parfile
1263	@c @paroutput
1264	@c @parfail
1265	@c @parcopy
1266	@parinputfield
1267	@paroutputfield
1268	@pardictionary
1269	@parprocess
1270	@parselect
1271	@parunselect
1272	@paroneline
1273	@paronefield
1274
1275	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1276	Maximum edit distance (default='1').
1277
1278	@item @b{@minus{}@minus{}weights=@var{filename}, @minus{}w @var{filename}}
File containing the weights of the edit operations (@pxref{kor weights definition file}).
1280
1281	@c @item @b{@minus{}@minus{}replace, @minus{}r}
1282	@c Replace original form with corrected form, place original form in the
1283	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1284
1285
1286	@end table
1287
1288
1289	@node kor weights definition file
1290	@subsection Weights definition file
1291
1292	Example:
1293
1294	@example
1295
1296	%stdcor 1
1297	%xchg 1
ż rz 0.5
ch h 0.5
u ó 0.5
1301
1302	@end example
1303
1304
In this example, the default weight of an edit operation is set to 1 (@code{%stdcor 1}),
the weight of the exchange operation is set to 1 (@code{%xchg 1}), and the three
principal Polish orthographic errors (ż/rz, ch/h and u/ó) are assigned the weight 0.5.
1308
A weight declaration for an edit operation, such as

@example
ż rz 0.5
@end example

applies in both directions, i.e. to ż->rz and to rz->ż.
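The Python sketch below shows how such a weights file could drive a weighted edit distance. It compares two strings directly instead of searching an automaton, and it reads the file as follows:

@itemize
@item @code{%stdcor} is the cost of every insertion, deletion and substitution.
@item @code{%xchg} is the cost of swapping two adjacent characters.
@item Every other line is a two-way, possibly multi-character substitution.
@end itemize

These readings are assumptions based on the description above, not a transcription of the @command{kor} code. @code{parse_weights} and @code{weighted_distance} are hypothetical names.

```python
def parse_weights(text):
    """Return (stdcor, xchg, rules) read from a kor-style weights file."""
    stdcor, xchg = 1.0, 1.0
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "%stdcor":
            stdcor = float(parts[1])
        elif parts[0] == "%xchg":
            xchg = float(parts[1])
        elif len(parts) == 3:
            a, b, weight = parts[0], parts[1], float(parts[2])
            rules.append((a, b, weight))   # declarations work both ways
            rules.append((b, a, weight))
    return stdcor, xchg, rules


def weighted_distance(source, target, stdcor=1.0, xchg=1.0, rules=()):
    """Cheapest total weight of edit operations turning source into target."""
    inf = float("inf")
    n, m = len(source), len(target)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:                                   # deletion
                best = min(best, d[i - 1][j] + stdcor)
            if j > 0:                                   # insertion
                best = min(best, d[i][j - 1] + stdcor)
            if i > 0 and j > 0:                         # match / substitution
                same = source[i - 1] == target[j - 1]
                best = min(best, d[i - 1][j - 1] + (0.0 if same else stdcor))
            if (i > 1 and j > 1 and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):   # exchange
                best = min(best, d[i - 2][j - 2] + xchg)
            for a, b, weight in rules:                  # e.g. rz <-> ż
                if source.endswith(a, 0, i) and target.endswith(b, 0, j):
                    best = min(best, d[i - len(a)][j - len(b)] + weight)
            d[i][j] = best
    return d[n][m]
```

With the example file above, turning @i{rzeka} into @i{żeka} costs 0.5. Without the file, the same change costs 2: one substitution and one deletion.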
1316
1317	The default weights definition file for @code{kor} is:
1318
1319	@example
1320	$HOME/.local/share/utt/weights.kor
1321	@end example
1322
1323	or, if the above mentioned file is absent:
1324
1325	@example
1326	/usr/local/share/utt/weights.kor
1327	@end example
1328
1329
1330	@node kor dictionaries
1331	@subsection Dictionaries
1332
The dictionary format is the same as for @command{cor} (@pxref{cor dictionaries}).
1334
1335	@c ---------------------------------------------------------------------
1336	@c SEN
1337	@c ---------------------------------------------------------------------
1338
1339	@page
1340	@node sen
@section sen - a sentencizer
1342
1343	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1344
@item @strong{Authors:} @tab Tomasz Obrębski
1346	@item @strong{Component category:} @tab filter
1347	@item @strong{Input format:} @tab UTT regular
1348	@item @strong{Output format:} @tab UTT regular
1349	@item @strong{Required annotation:} @tab tok
1350
1351	@end multitable
1352
1353
1354	@menu
1355	* sen description::
1356	@c * sen input::
1357	@c * sen output::
1358	* sen example::
1359	@end menu
1360
1361	@node sen description
1362	@subsection Description
1363
@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments whose @var{type} field contains the BOS (beginning of sentence) or EOS (end of sentence) annotation.
1365
1366	@node sen example
1367	@subsection Example
1368
1369	@example
1370	command: sen
1371
1372	input:
0000 05 W Cześć
1374	0005 01 P !
1375	0006 01 S _
1376	0007 02 W To
1377	0009 01 S _
1378	0010 02 W ja
1379	0012 01 P .
1380	0013 01 S \n
1381
1382	output:
1383	0000 00 BOS *
0000 05 W Cześć
1385	0005 01 P !
1386	0006 00 EOS *
1387	0006 00 BOS *
1388	0006 01 S _
1389	0007 02 W To
1390	0009 01 S _
1391	0010 02 W ja
1392	0012 01 P .
1393	0013 01 S \n
1394	0014 00 EOS *
1395	@end example
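The exact boundary-detection rules of @command{sen} are not specified here. The Python sketch below is a hypothetical approximation that reproduces the example above, under these assumptions:

@itemize
@item A sentence ends after a @samp{.}, @samp{!} or @samp{?} punctuation segment.
@item The @code{EOS} marker and the next @code{BOS} marker are emitted at the end offset of that segment, once a following word segment is seen.
@item The last sentence is closed at the end of the input.
@end itemize

Segments are represented as @code{(start, length, type, form)} tuples. @code{marker}, @code{mark_sentences} and @code{format_segment} are hypothetical names.

```python
SENTENCE_FINAL = (".", "!", "?")


def marker(offset, kind):
    """A zero-length BOS/EOS segment."""
    return (offset, 0, kind, "*")


def mark_sentences(segments):
    out = []
    in_sentence = False
    boundary = None   # offset just after sentence-final punctuation
    pending = []      # segments seen after that punctuation
    for seg in segments:
        start, length, seg_type, form = seg
        if boundary is not None:
            if seg_type != "W":
                pending.append(seg)   # spaces etc. wait until a word appears
                continue
            out += [marker(boundary, "EOS"), marker(boundary, "BOS")]
            out += pending
            pending, boundary = [], None
        if not in_sentence:
            out.append(marker(start, "BOS"))
            in_sentence = True
        out.append(seg)
        if seg_type == "P" and form in SENTENCE_FINAL:
            boundary = start + length
    out += pending
    if in_sentence:
        last_start, last_length = segments[-1][0], segments[-1][1]
        out.append(marker(last_start + last_length, "EOS"))
    return out


def format_segment(seg):
    return "%04d %02d %s %s" % seg
```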
1396
1397
1398	@c ---------------------------------------------------------------------
1399	@c GPH
1400	@c ---------------------------------------------------------------------
1401
1402	@c @node gph - graphizer
1403	@c @chapter gph - graphizer
1404
1405	@c Authors: Tomasz ObrÄbski
1406
1407
1408
1409	@c ---------------------------------------------------------------------
1410	@c SER
1411	@c ---------------------------------------------------------------------
1412
1413	@page
1414	@node ser
1415	@section ser - pattern search tool
1416
1417	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
1419	@item @strong{Component category:} @tab filter
1420	@item @strong{Input format:} @tab UTT regular
1421	@item @strong{Output format:} @tab UTT regular
1422	@item @strong{Required annotation:} @tab tok, lem --one-field
1423	@end multitable
1424
1425	@menu
1426	* ser description::
1427	* ser command line options::
1428	* ser pattern::
1429	* ser how ser works::
1430	* ser customization::
1431	* ser limitations::
1432	* ser requirements::
1433	@end menu
1434
1435
1436	@node ser description
1437	@subsection Description
1438
1439	@command{ser} looks for patterns in UTT-formatted texts.
1440
1441
1442	@c ---------------------------------------------------------------------
1443	@node ser command line options
1444	@subsection Command line options
1445
1446	@table @code
1447
1448	@parhelp
1449	@parversion
1450	@c @parfile
1451	@c @paroutput
1452	@c @parinputfield
1453	@c @paroutputfield
1454	@parprocess
1455	@parinteractive
1456
1457	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1458	The search pattern.
1459
1460	@item @b{@minus{}@minus{}morph=@var{field}}
1461	The name of the annotation field containing the morphological
1462	description (default @code{lem}).
1463
1464	@item @b{@minus{}@minus{}flex}
1465	Only print the generated flex source code.
1466
1467	@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.

@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from file @var{filename}. This option
makes it possible to extend the set of terms.
1474
1475	@end table
1476
1477
1478	@c ---------------------------------------------------------------------
1479	@node ser pattern
1480	@subsection Pattern
1481
1482	The @command{ser} pattern is a regular expression over terms corresponding
1483	to text segments or segment sequences. Predefined terms are:
1484
1485	@table @code
1486
1487	@item seg(@var{t},@var{f},@var{a})
1488	a segment of type @var{t}, containing form @var{f} and annotation
1489	@var{a}
1490
1491	@item form(@var{f})
1492	a segment containing form @var{f}
1493
1494	@item field(@var{f})
1495	a segment containing annotation field @var{f}
1496
1497	@item space(@var{f})
1498	a space segment of form @var{f}
1499
1500	@item word(@var{f})
1501	a word segment of form @var{f}
1502
1503	@item punct(@var{f})
1504	a punct segment of form @var{f}
1505
1506	@item number(@var{f})
1507	a number segment of form @var{f}
1508
1509	@item lexeme(@var{f})
1510	a word segment with lemma @var{f}
1511
1512	@item cat(@var{c})
1513	a word segment of category @var{c}
1514
1515	@end table
1516
1517	All arguments are optional. If an argument is omitted, an arbitrary
1518	string of non-blank characters is assumed as the argument value. Term
1519	arguments may be arbitrary character-level regular expressions. The
following special symbols can be used:
1521
1522	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1523	@item @code{[@dots{}]} @tab a character class
1524	@item @code{[^@dots{}]} @tab a negated character class
1525	@item @code{\|} @tab alternative
1526	@item @code{*} @tab repetition, including zero times
1527	@item @code{+} @tab repetition, at least one time
1528	@item @code{?} @tab optionality
1529	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
1530	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
1531	@item @code{@{@var{m}@}} @tab repetition @var{m} times
@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
1533	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
1534	@item @code{( )} @tab parentheses, used to override precedence
1535	@c @end multitable
1536
1537	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1538	@item @code{.} @tab a non-blank character
1539	@item @code{\w} @tab a letter
1540	@item @code{\W} @tab a non-blank character other than a letter
1541	@item @code{\d} @tab a digit
1542	@item @code{\D} @tab a non-blank character other than a digit
1543	@item @code{\s} @tab a space or tab character
1544	@item @code{\S} @tab a non-blank character (the same as @code{.})
1545	@item @code{\l} @tab a lowercase letter
1546	@item @code{\L} @tab an uppercase letter
1547	@end multitable
1548
1549
1550	@noindent The following characters:
1551	@example
1552	@verb{% [ ] ^ \| * + ? { } , . < > \ %}
1553	@end example
1554	must be escaped with a backslash, i.e. written as:
1555	@example
1556	@verb{% \[ \] \^ \\| \* \+ \? \{ \} \, \. \< \> \\ %}
1557	@end example
1558
1559	@quotation Note
The special symbols are borrowed from Perl, with minor modifications
introduced for convenience, so the meaning of certain special characters
and sequences differs slightly from their common usage. In particular,
the meaning of the @code{.} special character is modified because of
the special function of spaces in UTT files (they are field
separators): @code{.} matches only non-blank characters. Use @code{\s}
to explicitly match a space or tab character.
1567	@end quotation
1568
In the argument of the @code{cat} term, the special operator @code{<@dots{}>} may be
used. A category specification enclosed in angle brackets matches all
category descriptions which are consistent (non-contradictory) with the
specification. For example, @code{<N>} matches all noun descriptions, and
@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
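The consistency test performed by @code{<@dots{}>} can be sketched in Python. The sketch assumes the description format used throughout this manual: a part of speech, optionally followed by @samp{/} and by attributes. Each attribute is one uppercase letter followed by the lowercase letters of its admissible values, e.g. @code{ADJ/CaDpGiNs}.

The sketch also assumes two things. The parts of speech must be identical, and an attribute absent from a description does not contradict the specification. It is not the actual @command{ser} implementation. @code{parse_category} and @code{consistent} are hypothetical names.

```python
import re


def parse_category(text):
    """Split e.g. ADJ/CaDpGiNs into ("ADJ", attribute -> set of values)."""
    pos, _, attributes = text.partition("/")
    values = dict()
    for name, letters in re.findall(r"([A-Z])([a-z]*)", attributes):
        values[name] = set(letters)
    return pos, values


def consistent(spec, description):
    """True if description does not contradict spec, as in cat(<spec>)."""
    spec_pos, spec_values = parse_category(spec)
    pos, values = parse_category(description)
    if spec_pos != pos:
        return False
    for name, admissible in spec_values.items():
        # Assumption: an attribute missing from the description is no contradiction.
        if name in values and not (admissible & values[name]):
            return False
    return True
```

Under these assumptions, @code{consistent("ADJ/Can", "ADJ/CaDpGiNs")} is true because the case value @samp{a} is admissible. A genitive-only description such as @code{ADJ/CgDpGiNs} is rejected.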
1574
1575
1576	@*
1577	@noindent @b{Examples of one-segment patterns:}
1578
1579	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1580	@item @code{seg} @tab any segment
1581	@item @code{word} @tab any word-form
1582	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
1583	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
1584	@item @code{word(\L\l+)} @tab a capitalized word-form
1585	@item @code{punct} @tab a punctuation character
1586	@item @code{space(.\\n.)} @tab a space segment containing a newline character
1587	@item @code{lexeme(pomoc)} @tab any form of the lexeme 'pomoc'
@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
@item @code{cat(<N/Ca>)} @tab a word whose category is consistent with @code{N/Ca}
1590	@end multitable
1591
1592	@*
1593	@noindent @b{Examples of multi-segment patterns:}
1594
1595	@table @code
1596
1597	@item (word(\L) punct(\.) space?)+ word(\L\l+)
1598	a sequence of initials followed by a surname
1599
1600	@item punct seg(W\|S\|N)* cat(<NPRO/Sr>) seg(W\|S\|N)* punct
a text fragment between two punctuation characters, containing an
occurrence of a relative pronoun
1603
1604	@end table
1605
1606
1607	@node ser how ser works
1608	@subsection How ser works
1609
1610	@node ser customization
1611	@subsection Customization
1612
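@c All predefined terms correspond to single segments,

New terms can be defined as @command{m4} macros stored in a file passed with the @option{--define} (or @option{--macro}) option. For example: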
1614
1615	@example
1616	define(`verbseq', `(cat(<V>) (space cat(<V>)))')
1617	@end example
1618
1619
1620	the term @code{cat()} may not be used as a ... of
1621
1622	@c See @command{m4} manual for further details on macro definition format.
1623
1624	@node ser limitations
1625	@subsection Limitations
1626
Do not use more than 3 attributes inside the @code{<@dots{}>} operator.
1628
1629	@node ser requirements
1630	@subsection Requirements
1631
1632	In order to run @command{ser}, the following programs must be
1633	installed in the system:
1634
1635	@itemize
1636
1637	@item @command{m4}
1638	@item @command{grep}
1639	@item @command{flex}
1640	@item @command{gcc}
1641
1642	@end itemize
1643
1644
1645	@c ---------------------------------------------------------------------
1646	@c GRP
1647	@c ---------------------------------------------------------------------
1648
1649	@page
1650	@node grp
1651	@section grp - pattern search tool
1652
1653	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
1655	@item @strong{Component category:} @tab filter
1656	@item @strong{Input format:} @tab UTT flattened
1657	@item @strong{Output format:} @tab UTT flattened
1658	@item @strong{Required annotation:} @tab tok, sen, lem --one-field
1659	@end multitable
1660
1661
1662	@menu
1663	* grp description::
1664	* grp command line options::
1665	* grp pattern::
1666	* grp hints::
1667	@end menu
1668
1669
1670	@node grp description
1671	@subsection Description
1672
@command{grp} selects sentences containing an expression matching a
pattern. The pattern format is exactly the same as that accepted by
@command{ser}.

@command{grp} is intended mainly for speeding up the corpus search process.
It is extremely fast: its processing speed is usually higher than the speed
of reading the corpus file from disk.
1680
1681	@node grp command line options
1682	@subsection Command line options
1683
1684	@table @code
1685
1686	@parhelp
1687	@parversion
1688	@parprocess
1689	@parinteractive
1690
1691	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1692	The search pattern.
1693
1694	@item @b{@minus{}@minus{}morph=@var{field}}
1695	The name of the annotation field containing the morphological
1696	description (default @code{lem}).
1697
1698	@item @b{@minus{}@minus{}command}
1699	Only print the generated flex source code.
1700
1701	@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.

@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from file @var{filename}. This option
makes it possible to extend the set of terms.
1708
1709	@end table
1710
1711
1712	@node grp pattern
1713	@subsection Pattern
1714
@xref{ser pattern}.
1716
1717	@node grp hints
1718	@subsection Hints
1719
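The corpus search speed may be increased by storing the preprocessed,
flattened corpus compressed with the @command{lzop} tool. @command{grp} usually
processes data faster than it can be read from a disk (especially a slow
laptop drive), so reading less data speeds up the whole search. First,
prepare the corpus once: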
1723
1724	@example
1725	cat corpus \| tok \| sen \| lem -1 \| fla \| lzop -7 > corpus.grp.lzo
@end example

Then search it, running @command{ser} only on the sentences preselected by @command{grp}:

@example
1729	lzop -cd corpus.grp.lzo \| grp -e @var{EXPR} \| unfla \| ser -e @var{EXPR}
1730	@end example
1731
1732
1733
1734	@c ---------------------------------------------------------------------
1735	@c MAR
1736	@c ---------------------------------------------------------------------
1737
1738	@page
1739	@node mar
1740	@section mar
1741
1742	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Marcin Walas, Tomasz Obrębski
1744	@item @strong{Input format:} @tab UTT flattened
1745	@item @strong{Output format:} @tab UTT flattened
1746	@item @strong{Required annotation:} @tab tok, sen, lem -1
1747	@end multitable
1748
1749	@subsection Description
@command{mar} is a Perl script which matches a given pattern against UTT-formatted text
and marks the matching parts with any number of user-defined tags.
1752
1753	@subsection Command line options
1754	@table @code
1755	@parhelp
1756	@parversion
1757
1758	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1759	The search pattern.
1760	@item @b{@minus{}@minus{}action=@var{action}, @minus{}a @var{action} [p] [s] [P]}
Perform only the indicated actions, where:
1762	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1763	@item @code{p} @tab preprocess
1764	@item @code{s} @tab search
1765	@item @code{P} @tab postprocess
1766	@end multitable
(default: @code{psP})
1768
1769	@item @b{@minus{}@minus{}command}
Print the generated @command{sed} command, then exit.

@item @b{@minus{}@minus{}help, @minus{}h}
Print help, then exit.

@item @b{@minus{}@minus{}version, @minus{}v}
Print version, then exit.
1777	@end table
1778	@subsection Tokens in pattern
The @command{mar} pattern is based on @command{ser} patterns (@pxref{ser pattern}): it is a @command{ser} pattern
in which any number of matching tokens (tags) may be inserted. Each token is printed in the output exactly
at the position where it was placed in the pattern. A valid token starts with @@ followed by any number of
alphanumeric characters. For example, @@STARTMATCH and @@ENDMATCH are valid matching tokens.
1783
Matching tokens can be placed before, after, or between any @command{ser} pattern terms. They do not have
to be paired or unique, any number of them (including zero) may occur in a pattern, and several tokens
may be placed one after another. For example:
1787
1788	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaa}
1789	@item @code{@@BOM lexeme(pomoc)} @tab place tag @b{BOM} before any form of the lexeme 'pomoc'
1790	@item @code{@@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc'
@item @code{cat(<ADJ>) @@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc' which is preceded by an adjective
@item @code{cat(<ADJ>) @@TAG @@BOM lexeme(pomoc) @@EOM} @tab place tags @b{TAG} and @b{BOM} before any form of the lexeme 'pomoc' which is preceded by an adjective, and tag @b{EOM} after it
1793	@end multitable
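Tokens only mark positions, so removing them from a @command{mar} pattern leaves an ordinary @command{ser} pattern. The Python sketch below separates the two parts, following the token syntax described above.

The sketch makes two assumptions. A token has at least one alphanumeric character after the @samp{@@}, and whitespace between terms is not significant. The regular expression spells the at sign as @code{\x40}. @code{split_mar_pattern} is a hypothetical name.

```python
import re

# \x40 in the regular expression is the at sign that introduces a token.
TOKEN = re.compile(r"\x40([A-Za-z0-9]+)")


def split_mar_pattern(pattern):
    """Return the underlying ser pattern and the token names, in order."""
    tokens = TOKEN.findall(pattern)
    ser_pattern = " ".join(TOKEN.sub(" ", pattern).split())
    return ser_pattern, tokens
```

For the last pattern in the table, the sketch yields the @command{ser} pattern @code{cat(<ADJ>) lexeme(pomoc)} and the tokens @b{TAG}, @b{BOM} and @b{EOM}.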
1794
(See @samp{mar -h} for more information.)
1796
1797	@subsection How mar works
@command{mar} translates the given pattern into a regular expression with the @command{m4} macro processor, then turns the regular expression into a @command{sed} script, which is executed.
1799
The generated @command{sed} script can be displayed with the @code{@minus{}@minus{}command} option.
1801	@subsection Limitations
The cost of the computations performed by @command{mar} grows linearly with the number of tokens in the pattern, so it is recommended to keep the number of tokens small.
1803	@subsection Requirements
1804	In order to run @code{mar}, the following programs must be installed in the system:
1805
1806	@itemize
1807
1808	@item @command{m4}
1809	@item @command{grep}
1810	@item @command{sed}
1811
1812	@end itemize
1813
1814
1815
1816	@c ---------------------------------------------------------------------
1817	@c KOT
1818	@c ---------------------------------------------------------------------
1819
1820	@page
1821	@node kot
1822	@section kot - untokenizer
1823
1824	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
1826	@item @strong{Component category:} @tab filter
1827	@item @strong{Input format:} @tab UTT regular
1828	@item @strong{Output format:} @tab text
1829	@item @strong{Required annotation:} @tab tok
1830	@end multitable
1831
1832
1833	@menu
1834	* kot description::
1835	* kot command line options::
1836	* kot usage examples::
1837	@end menu
1838
1839	@node kot description
1840	@subsection Description
1841
1842	@command{kot} transforms a UTT formatted file back into raw text format.
1843
1844	@node kot command line options
1845	@subsection Command line options
1846
1847	@table @code
1848
1849	@parhelp
1850
1851	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1852
1853	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1854
1855	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1856
1857	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1858
1859	@c @item @b{@minus{}@minus{}config=@var{filename}}
1860
1862
1863	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
Print @var{string} between non-adjacent segments of the input file.
1865
1866	@item @b{@minus{}@minus{}spaces, @minus{}r}
Retain the special characters @code{_}, @code{\t},
@code{\n}, @code{\r}, @code{\f} unexpanded in the output (by default they are
replaced with the characters they stand for, e.g. @code{_} with a space).
1869
1870	@end table
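The following Python sketch rebuilds text from UTT lines in the spirit of the two options above. It makes these assumptions:

@itemize
@item The form is the fourth space-separated field.
@item Zero-length segments, such as the @code{BOS}/@code{EOS} markers added by @command{sen}, carry no text.
@item Two segments are non-adjacent when the second does not start at the offset where the first ends.
@end itemize

@code{untokenize} is a hypothetical name, and this is not the @command{kot} implementation.

```python
# Special forms used in UTT files and the characters they stand for.
SPECIALS = [("_", " "), ("\\t", "\t"), ("\\n", "\n"), ("\\r", "\r"), ("\\f", "\f")]


def untokenize(lines, gap_fill="", keep_specials=False):
    out = []
    previous_end = None
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) < 4:
            continue                      # not a segment line
        start, length, form = int(fields[0]), int(fields[1]), fields[3]
        if length == 0:
            continue                      # BOS/EOS-like markers carry no text
        if previous_end is not None and start != previous_end:
            out.append(gap_fill)          # cf. --gap-fill
        if not keep_specials:             # cf. --spaces
            for code, char in SPECIALS:
                form = form.replace(code, char)
        out.append(form)
        previous_end = start + length
    return "".join(out)
```

Applied to the output of the @command{sen} example, the sketch returns the original text @samp{Cześć! To ja.} followed by a newline.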
1871
1872	@node kot usage examples
1873	@subsection Usage examples
1874
1875	@example
1876	cat legia.txt \| tok \| kot
1877	@end example
1878
1879	@example
1880	cat legia.txt \| tok \| lem -1 \| kot
1881	@end example
1882
1883	@c ---------------------------------------------------------------
1884	@c CON
1885	@c ---------------------------------------------------------------
1886
1887
1888	@page
1889	@node con
1890	@section con - concordance table generator
1891
1892	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1893	@item @strong{Authors:} @tab Justyna Walkowska
1894	@item @strong{Component category:} @tab sink
1895	@item @strong{Input format:} @tab UTT regular
1896	@item @strong{Output format:} @tab text
1897	@item @strong{Required annotation:} @tab ser or mar
1898	@end multitable
1899	@c
1900
1901	@menu
1902	* con description::
1903	* con command line options::
1904	* con usage example::
1905	* con hints::
1906	@end menu
1907
1908
1909	@node con description
1910	@subsection Description
1911
@command{con} generates a concordance table from text in which the matches of a pattern have been marked by @command{ser} or @command{mar}.
1913
1914
1915	@node con command line options
1916	@subsection Command line options
1917
1918	@table @code
1919
1920	@parhelp
1921
1922	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
1923	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1924	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1925	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1926	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
1927	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
1928	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
1929	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
1930	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
1931	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1932	@c @item @b{@minus{}@minus{}config=@var{filename}}
1933	@c @item
1934	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1935	@c search pattern
1936	@c
1937	@c @item @b{@minus{}@minus{}flex}
1938	@c only print the generated flex source code
1939	@c
1940	@c @item @b{@minus{}@minus{}macro=@var{filename}}
1941	@c read macrodefinitions from file @var{filename} rather than from
1942	@c default location. This option allows to redefine the set of terms.
1943	@c
1944	@c @item @b{@minus{}@minus{}define=@var{filename}}
1945	@c append macrodefinitions from file @var{filename}. This option
1946	@c allows to extend the set of terms.
1947
1948	@item @b{@minus{}@minus{}left @minus{}l}
Left context size (default='30c'). Examples:
1950	@example
1951	-l=5c: left context is 5 characters
1952	-l=5w: left context is 5 words
1953	-l=5s: left context is 5 non-empty input lines
1954	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
1955	@end example
1956
1957	@item @b{@minus{}@minus{}right @minus{}r}
Right context size (default='30c').
@item @b{@minus{}@minus{}trim @minus{}t}
Remove incomplete words from the output.
@item @b{@minus{}@minus{}white @minus{}w}
Do not change whitespace characters into spaces.
@item @b{@minus{}@minus{}column @minus{}c}
Minimum width of the left column, in characters (default=0).
@item @b{@minus{}@minus{}ignore @minus{}i}
Ignore segment inconsistency in the input.
@item @b{@minus{}@minus{}bom}
Regular expression matching the segment that marks the beginning of a selected fragment (default='[0-9]+ [0-9]+ BOM .*').
@item @b{@minus{}@minus{}eom}
Regular expression matching the segment that marks the end of a selected fragment (default='[0-9]+ [0-9]+ EOM .*').
@item @b{@minus{}@minus{}bod}
String displayed at the beginning of a selected fragment (default='[').
@item @b{@minus{}@minus{}eod}
String displayed at the end of a selected fragment (default=']').
1975
1976
1977
1978	@end table
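The Python sketch below illustrates the character-based and word-based context sizes (@samp{c} and @samp{w}) by building one concordance line from a selected fragment and the text around it. It makes these simplifications and assumptions:

@itemize
@item The line-based (@samp{s}) and regular-expression forms are omitted.
@item The left column is right-aligned when padded to the @option{--column} width.
@item The real @command{con} reads UTT input and locates fragments with the @option{--bom} and @option{--eom} expressions; the sketch takes the three text parts directly.
@end itemize

@code{trim} and @code{concordance_line} are hypothetical names.

```python
import re


def trim(text, spec, side):
    """Keep spec ("30c" or "5w") worth of context; side is "left" or "right"."""
    m = re.fullmatch(r"(\d+)([cw])", spec)
    if m is None:
        raise ValueError("unsupported context size: " + spec)
    n, unit = int(m.group(1)), m.group(2)
    if n == 0:
        return ""
    if unit == "c":
        return text[max(0, len(text) - n):] if side == "left" else text[:n]
    words = list(re.finditer(r"\S+", text))
    if len(words) <= n:
        return text
    if side == "left":
        return text[words[-n].start():]   # from the n-th word from the end
    return text[:words[n - 1].end()]      # up to the end of the n-th word


def concordance_line(left, selected, right, left_size="30c", right_size="30c",
                     column=0, bod="[", eod="]", keep_white=False):
    if not keep_white:                     # default: whitespace becomes spaces
        left, selected, right = (re.sub(r"\s", " ", s) for s in (left, selected, right))
    left = trim(left, left_size, "left").rjust(column)
    right = trim(right, right_size, "right")
    return left + bod + selected + eod + right
```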
1979
1980	@node con usage example
1981	@subsection Usage example
1982	@example
1983	cat file.txt \| tok \| lem -1 \| ser -e 'lexeme(dom)' \| con
1984	@end example
1985
1986
1987	@node con hints
1988	@subsection Hints
1989
@command{con} is rather slow, so avoid passing large amounts of
irrelevant text through it. @command{con} works well in the following
sequence, in which @command{grp} preselects the relevant sentences:
1993
1994	@example
... \| grp -e EXPR \| unfla \| ser -e EXPR \| con
1996	@end example
1997
1998
1999	@c ---------------------------------------------------------------------
2000	@c ---------------------------------------------------------------------
2001
2002	@page
2003	@node Auxiliary tools
2004	@chapter Auxiliary tools
2005
2006	@menu
2007	* compdic:: dictionary compiler
2008	* fla:: UTT file flattener
2009	* unfla:: UTT file unflattener
2010	@end menu
2011
2012
2013	@page
2014	@node compdic
2015	@section compdic - the dictionary compiler
2016
2017	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
2019	@item @strong{Component category:} @tab additional tool
2020	@end multitable
2021	@c
2022
2023	@command{compdic} compiles dictionaries in text format (@code{.dic} extension) into binary
2024	(FST) format (@code{.bin} extension).
2025
The automaton representation of a dictionary is built with the OpenFst toolkit, so OpenFst must be installed in your system for @command{compdic} to work.
2029
2030	Usage:
2031	@example
2032	compdic <dictionaryname>.dic <dictionaryname>.bin
2033	@end example
2034
2035	The file <dictionaryname>.bin will be generated.
2036
2037	@c @menu
2038	@c * con command line options::
2039	@c * con usage example::
2040	@c * con hints::
2041	@c @end menu
2042
2043
2044	@c -------------------------------------------------------------------------------
2045	@c FLA
2046	@c -------------------------------------------------------------------------------
2047
2048	@page
2049	@node fla
2050	@section fla - the UTT file flattener
2051
2052	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
2054	@item @strong{Input format:} @tab UTT regular
2055	@item @strong{Output format:} @tab UTT flattened
2056	@item @strong{Required annotation:} @tab sen
2057	@end multitable
2058	@c
2059
2060	@menu
2061	* fla description::
2062	@c * fla command line options::
2063	@c * fla usage example::
2064	@end menu
2065
2066
2067	@node fla description
2068	@subsection Description
2069
@command{fla} ``flattens'' a UTT file by merging the segments belonging
to one sentence into one line. Technically, end-of-line characters
('\n', ASCII code 10) are replaced with form-feed characters ('\f',
ASCII code 12). The flattening makes it possible to process UTT files
sentence by sentence with such tools as @command{grep} or @command{sed}
(this is what @command{grp} and @command{mar} do).

Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.

Flattened files are still human-readable.
2080
2081	Usage:
2082
2083	@example
2084	fla [<bosregex>]
2085	@end example
2086
2087	The facultative argument is a regular expression describing segments
2088	which should be treated as sentence beginnings (the test is: the
2089	segment contains a fragment matching the @code{<bosregex>}). By
2090	default, segments containing a field @code{BOS} are seeked.
2091
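The transformation described above can be sketched in a few lines of
Python. The sketch rests on two assumptions. First, each input line is
one UTT segment. Second, a segment opens a sentence when it contains a
match of the Python @code{re} pattern @var{bos_regex}, which defaults
to @code{BOS}. The function name @code{flatten} is hypothetical. This
is not the source code of @command{fla}.

@example
import re


def flatten(lines, bos_regex="BOS"):
    """Hypothetical sketch of fla: one sentence per output line.

    lines: UTT segments, one per element (trailing newlines allowed)
    bos_regex: a segment containing a match starts a new sentence
    """
    bos = re.compile(bos_regex)
    out = []
    for i, line in enumerate(lines):
        segment = line.rstrip("\n")
        if i > 0:
            # Keep the line break before a sentence beginning; inside a
            # sentence, join segments with a form feed (ASCII 12).
            out.append("\n" if bos.search(segment) else "\f")
        out.append(segment)
    if lines:
        out.append("\n")  # terminate the last sentence
    return "".join(out)
@end example

For example, @code{sys.stdout.write(flatten(sys.stdin.readlines()))}
would flatten standard input. In this sketch, segments that precede the
first sentence beginning end up on a line of their own.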
@c -------------------------------------------------------------------------------
@c UNFLA
@c -------------------------------------------------------------------------------

@page
@node unfla
@section unfla - the UTT file unflattener

@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
@item @strong{Input format:} @tab UTT flattened
@item @strong{Output format:} @tab UTT regular
@item @strong{Required annotation:} @tab -
@end multitable

@menu
* unfla description::
@c * unfla command line options::
@c * unfla usage example::
@end menu

@node unfla description
@subsection Description

@command{unfla} transforms a flattened UTT file, produced by
@command{fla}, back into the regular format by restoring the
end-of-line characters (form feeds are turned back into newlines).
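Because @command{fla} only replaces end-of-line characters with form
feeds, the reverse transformation is a single substitution. The
following is a minimal Python sketch. The name @code{unflatten} is
hypothetical. The sketch assumes that the input contains no form feeds
other than those inserted by @command{fla}.

@example
def unflatten(text):
    """Hypothetical sketch of unfla: back to one segment per line.

    Every form feed (ASCII 12) that joined two segments of a sentence
    becomes an end-of-line character (ASCII 10) again.
    """
    return text.replace("\f", "\n")
@end example

Under that assumption, applying it to flattened data restores the
one-segment-per-line layout. This is why line-oriented filters can be
placed between @command{fla} and @command{unfla}.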

@c ---------------------------------------------------------------------
@c USAGE EXAMPLES
@c ---------------------------------------------------------------------

@node Usage examples
@chapter Usage examples

@subsubheading Simple pipelines

@enumerate

@item tokenization

@example
cat text | tok > output1
@end example

@item morphological annotation (1)

Simple dictionary-based lemmatization:

@example
cat text | tok | lem > output1
@end example

@item morphological annotation (2)

@enumerate
@item perform dictionary-based lemmatization,
@item guess descriptions for words which have no annotation.
@end enumerate

@example
cat text | tok | lem | gue -S lem > output2
@end example

@item morphological annotation (3)

@enumerate
@item perform dictionary-based lemmatization,
@item try to correct words with no annotation,
@item perform dictionary-based lemmatization of the corrected words,
@item guess descriptions for words which still have no annotation.
@end enumerate

@example
cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
@end example

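@item spelling correction

Propose corrections for words that the dictionary-based lemmatizer does
not recognize. Here @code{egrep ' W '} keeps only word segments, and
@code{egrep -v 'lem:'} drops those that received a lemma:

@example
cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
@end example
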
@item expression extraction

Extraction of all occurrences of a verb followed by a form of the noun
@samp{rozmowa}:

@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
@end example

@item a word in context

Extraction of text fragments containing a form of the lexeme
@samp{rozmowa} together with the five preceding and five following
corpus segments:

@example
cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
@end example

@item generation of concordance table (1)

@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
@end example

Processing time: 10 seconds.

@item generation of concordance table (2)

The same as above, but much faster. @command{grp} quickly passes on
only the sentences that contain a match, so @command{ser} has far less
data to process.

@example
cat text | tok | lem -1 | \
grp -e 'cat(<V>) space lexeme(rozmowa)' | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | \
con
@end example

Processing time: 2 seconds.

@item generation of concordance table (3)

Usually, the same corpus is searched repeatedly. In such a case, it is
advisable to transform the corpus data into the format required by
@command{grp} first, and then search the preprocessed data.

@command{grp} (@command{grep}) processes data faster than it can be
read from the disk drive. The search time can therefore be shortened
further by compressing the data. We suggest using the @command{lzop}
compressor/decompressor.

@item the fastest way to search a large corpus

Step 1: corpus preprocessing

@example
cat corpus | tok | sen | lem -1 \
| fla | lzop -7 > corpus.grp.lzo
@end example

Step 2: search

@example
lzop -cd corpus.grp.lzo | unfla \
| grp -e 'cat(<V>) space lexeme(rozmowa)' \
| ser -e 'cat(<V>) space lexeme(rozmowa)' | con
@end example

@end enumerate

@c @subsubheading More complicated configurations


@c @example
@c mknod fifo1 p
@c mknod fifo2 p
@c mknod fifo3 p
@c mknod fifo4 p
@c mknod fifo5 p

@c tok | lem -p W -e fifo1 > fifo2 &
@c cor -e fifo3 < fifo1 | lem > fifo4 &
@c gue < fifo3 > fifo5 &
@c sort -m fifo2 fifo4 fifo5

@c rm fifo?
@c @end example


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c ---------------------------------------------------------------------
@c PMDBF DICTIONARY
@c ---------------------------------------------------------------------

@node PMDBF dictionary
@chapter PMDBF dictionary

UTT components come with lexical data derived from the Polish
Morphological Database (PMDB).

@menu
* PMDBF files::
* PMDBF tag structure::
* PMDBF parts of speech::
* PMDBF morphosyntactic attributes::
@end menu

@node PMDBF files
@section Files

@node PMDBF tag structure
@section Tag structure

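A morphosyntactic description (tag) consists of a part-of-speech
symbol. The symbol may be followed by a slash and a sequence of
attributes, each followed by one or more values. In regular expression
notation:

@example
pos   = [[:upper:]]+
attr  = [[:upper:]]+
val   = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
descr = pos ( / ( attr val + ) + ) ?
@end example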
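The grammar above translates directly into Python regular expressions.
The following sketch splits a description into its part of speech and
a list of (attribute, values) pairs. The helper @code{parse_descr} is
hypothetical. Python's @code{re} module does not support POSIX classes
such as @code{[[:upper:]]}. The sketch therefore assumes ASCII tags and
uses @code{[A-Z]} and @code{[a-z0-9]} instead.

@example
import re

# ASCII approximations of the POSIX classes used in the grammar
# (assumption: [[:upper:]] -> [A-Z], [[:lower:][:digit:]] -> [a-z0-9]).
POS = r"[A-Z]+"
ATTR = r"[A-Z]+"
VAL = r"(?:[a-z0-9?!*+-]|<[^>\n]+>)"

# descr = pos ( / ( attr val + ) + ) ?
DESCR = re.compile("(" + POS + ")(?:/((?:" + ATTR + VAL + "+)+))?$")
GROUP = re.compile("(" + ATTR + ")(" + VAL + "+)")
VALUE = re.compile(VAL)


def parse_descr(descr):
    """Split a description into (pos, [(attr, [val, ...]), ...])."""
    m = DESCR.match(descr)
    if m is None:
        raise ValueError("not a valid description: " + descr)
    attrs = []
    if m.group(2):
        # Each attribute is followed by one or more values.
        for g in GROUP.finditer(m.group(2)):
            attrs.append((g.group(1), VALUE.findall(g.group(2))))
    return m.group(1), attrs
@end example

For example, @code{parse_descr("ADJ/DpNpCna")} returns
@code{("ADJ", [("D", ["p"]), ("N", ["p"]), ("C", ["n", "a"])])}.
Several values following one attribute (here @code{na}) are returned as
a list.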

@node PMDBF parts of speech
@section Parts of speech

@multitable {ADJPRP} { adjectival-passive-participle }
@item @code{N} @tab noun
@item @code{NPRO} @tab nominal-pronoun
@item @code{NV} @tab deverbal-noun
@item @code{V} @tab verb
@item @code{BYC} @tab the verb @emph{być} (to be)
@item @code{VNI} @tab non-inflected-verb
@item @code{ADJ} @tab adjective
@item @code{ADJPAP} @tab adjectival-passive-participle
@item @code{ADJPRP} @tab adjectival-present-participle
@item @code{ADJPP} @tab adjectival-past-participle
@item @code{ADJPRO} @tab adjectival-pronoun
@item @code{ADJNUM} @tab adjectival-numeral
@item @code{ADV} @tab adverb
@item @code{ADVANP} @tab adverbial-anterior-participle
@item @code{ADVPRP} @tab adverbial-present-participle
@item @code{ADVPRO} @tab adverbial-pronoun
@item @code{ADVNUM} @tab adverbial-numeral
@item @code{P} @tab preposition
@item @code{PPRO} @tab prep-noun-pronoun
@item @code{CONJ} @tab conjunction
@item @code{EXCL} @tab exclamation
@item @code{APP} @tab call
@item @code{ONO} @tab onomatopoeia
@item @code{PART} @tab particle
@item @code{NUMCRD} @tab cardinal-numeral
@item @code{NUMCOL} @tab collective-numeral
@item @code{NUMPAR} @tab partitive-numeral
@item @code{NUMORD} @tab ordinal-numeral
@end multitable

@node PMDBF morphosyntactic attributes
@section Morphosyntactic attributes

@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@headitem Attr @tab Val @tab Description
@item @code{A} @tab @tab Aspect
@item @tab @code{p} @tab perfective
@item @tab @code{i} @tab imperfective
@item
@item @code{V} @tab @tab Verb-Form
@item @tab @code{b} @tab infinitive
@item @tab @code{p} @tab personal
@item @tab @code{i} @tab impersonal
@item
@item @code{M} @tab @tab Mood
@item @tab @code{d} @tab declarative
@item @tab @code{c} @tab conditional
@item @tab @code{i} @tab imperative
@item
@item @code{T} @tab @tab Tense
@item @tab @code{a} @tab past
@item @tab @code{r} @tab present
@item @tab @code{f} @tab future
@item
@item @code{P} @tab @tab Person
@item @tab @code{1} @tab first
@item @tab @code{2} @tab second
@item @tab @code{3} @tab third
@item
@item @code{D} @tab @tab Degree
@item @tab @code{p} @tab positive
@item @tab @code{c} @tab comparative
@item @tab @code{s} @tab superlative
@item
@item @code{N} @tab @tab Number
@item @tab @code{s} @tab singular
@item @tab @code{p} @tab plural
@item
@item @code{C} @tab @tab Case
@item @tab @code{n} @tab nominative
@item @tab @code{g} @tab genitive
@item @tab @code{d} @tab dative
@item @tab @code{a} @tab accusative
@item @tab @code{i} @tab instrumental
@item @tab @code{l} @tab locative
@item @tab @code{v} @tab vocative
@item
@item @code{G} @tab @tab Gender
@item @tab @code{p} @tab masculine-personal
@item @tab @code{a} @tab masculine-animal
@item @tab @code{i} @tab masculine-inanimate
@item @tab @code{f} @tab feminine
@item @tab @code{n} @tab neuter
@end multitable
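To make the table machine-usable, a minimal Python sketch can store it
as a mapping and decode the attribute part of a tag (the text after the
slash). The names @code{ATTRIBUTES} and @code{decode} are hypothetical.
The sketch assumes single-letter attribute names and single-character
values, so values written as @code{<...>} are not handled. The codes
follow the table above; for aspect, the sketch uses the usual
linguistic terms perfective/imperfective.

@example
# Hypothetical mapping: attribute code -> (name, value code -> value name)
ATTRIBUTES = dict(
    A=("Aspect", dict(p="perfective", i="imperfective")),
    V=("Verb-Form", dict(b="infinitive", p="personal", i="impersonal")),
    M=("Mood", dict(d="declarative", c="conditional", i="imperative")),
    T=("Tense", dict(a="past", r="present", f="future")),
    P=("Person", dict([("1", "first"), ("2", "second"), ("3", "third")])),
    D=("Degree", dict(p="positive", c="comparative", s="superlative")),
    N=("Number", dict(s="singular", p="plural")),
    C=("Case", dict(n="nominative", g="genitive", d="dative",
                    a="accusative", i="instrumental", l="locative",
                    v="vocative")),
    G=("Gender", dict(p="masculine-personal", a="masculine-animal",
                      i="masculine-inanimate", f="feminine", n="neuter")),
)


def decode(attrs):
    """Decode the part of a tag after the slash, e.g. "GaNsCna"."""
    result = []
    for ch in attrs:
        if ch in ATTRIBUTES:
            # An upper-case letter starts a new attribute.
            name, values = ATTRIBUTES[ch]
            result.append((name, []))
            current = values
        else:
            # Any other character is a value of the current attribute.
            result[-1][1].append(current[ch])
    return result
@end example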
2417
2418
2419	@c ---------------------------------------------------------------------
2420	@c ---------------------------------------------------------------------
2421	@c
2422	@c @node Examples
2423	@c @chapter Examples
2424
2425	@c ----------------------------------------------------------------------
2426	@c ----------------------------------------------------------------------
2427
2428	@node GNU Free Documentation License
2429	@chapter GNU Free Documentation License
2430
2431	@c The GNU Free Documentation License.
2432	@center Version 1.2, November 2002
2433
2434	@c This file is intended to be included within another document,
2435	@c hence no sectioning command or @node.
2436
2437	@display
2438	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
2439	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
2440
2441	Everyone is permitted to copy and distribute verbatim copies
2442	of this license document, but changing it is not allowed.
2443	@end display
2444
2445	@enumerate 0
2446	@item
2447	PREAMBLE
2448
2449	The purpose of this License is to make a manual, textbook, or other
2450	functional and useful document @dfn{free} in the sense of freedom: to
2451	assure everyone the effective freedom to copy and redistribute it,
2452	with or without modifying it, either commercially or noncommercially.
2453	Secondarily, this License preserves for the author and publisher a way
2454	to get credit for their work, while not being considered responsible
2455	for modifications made by others.
2456
2457	This License is a kind of ``copyleft'', which means that derivative
2458	works of the document must themselves be free in the same sense. It
2459	complements the GNU General Public License, which is a copyleft
2460	license designed for free software.
2461
2462	We have designed this License in order to use it for manuals for free
2463	software, because free software needs free documentation: a free
2464	program should come with manuals providing the same freedoms that the
2465	software does. But this License is not limited to software manuals;
2466	it can be used for any textual work, regardless of subject matter or
2467	whether it is published as a printed book. We recommend this License
2468	principally for works whose purpose is instruction or reference.
2469
2470	@item
2471	APPLICABILITY AND DEFINITIONS
2472
2473	This License applies to any manual or other work, in any medium, that
2474	contains a notice placed by the copyright holder saying it can be
2475	distributed under the terms of this License. Such a notice grants a
2476	world-wide, royalty-free license, unlimited in duration, to use that
2477	work under the conditions stated herein. The ``Document'', below,
2478	refers to any such manual or work. Any member of the public is a
2479	licensee, and is addressed as ``you''. You accept the license if you
2480	copy, modify or distribute the work in a way requiring permission
2481	under copyright law.
2482
2483	A ``Modified Version'' of the Document means any work containing the
2484	Document or a portion of it, either copied verbatim, or with
2485	modifications and/or translated into another language.
2486
2487	A ``Secondary Section'' is a named appendix or a front-matter section
2488	of the Document that deals exclusively with the relationship of the
2489	publishers or authors of the Document to the Document's overall
2490	subject (or to related matters) and contains nothing that could fall
2491	directly within that overall subject. (Thus, if the Document is in
2492	part a textbook of mathematics, a Secondary Section may not explain
2493	any mathematics.) The relationship could be a matter of historical
2494	connection with the subject or with related matters, or of legal,
2495	commercial, philosophical, ethical or political position regarding
2496	them.
2497
2498	The ``Invariant Sections'' are certain Secondary Sections whose titles
2499	are designated, as being those of Invariant Sections, in the notice
2500	that says that the Document is released under this License. If a
2501	section does not fit the above definition of Secondary then it is not
2502	allowed to be designated as Invariant. The Document may contain zero
2503	Invariant Sections. If the Document does not identify any Invariant
2504	Sections then there are none.
2505
2506	The ``Cover Texts'' are certain short passages of text that are listed,
2507	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
2508	the Document is released under this License. A Front-Cover Text may
2509	be at most 5 words, and a Back-Cover Text may be at most 25 words.
2510
2511	A ``Transparent'' copy of the Document means a machine-readable copy,
2512	represented in a format whose specification is available to the
2513	general public, that is suitable for revising the document
2514	straightforwardly with generic text editors or (for images composed of
2515	pixels) generic paint programs or (for drawings) some widely available
2516	drawing editor, and that is suitable for input to text formatters or
2517	for automatic translation to a variety of formats suitable for input
2518	to text formatters. A copy made in an otherwise Transparent file
2519	format whose markup, or absence of markup, has been arranged to thwart
2520	or discourage subsequent modification by readers is not Transparent.
2521	An image format is not Transparent if used for any substantial amount
2522	of text. A copy that is not ``Transparent'' is called ``Opaque''.
2523
2524	Examples of suitable formats for Transparent copies include plain
2525	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
2526	format, @acronym{SGML} or @acronym{XML} using a publicly available
2527	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
2528	PostScript or @acronym{PDF} designed for human modification. Examples
2529	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
2530	@acronym{JPG}. Opaque formats include proprietary formats that can be
2531	read and edited only by proprietary word processors, @acronym{SGML} or
2532	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
2533	not generally available, and the machine-generated @acronym{HTML},
2534	PostScript or @acronym{PDF} produced by some word processors for
2535	output purposes only.
2536
2537	The ``Title Page'' means, for a printed book, the title page itself,
2538	plus such following pages as are needed to hold, legibly, the material
2539	this License requires to appear in the title page. For works in
2540	formats which do not have any title page as such, ``Title Page'' means
2541	the text near the most prominent appearance of the work's title,
2542	preceding the beginning of the body of the text.
2543
2544	A section ``Entitled XYZ'' means a named subunit of the Document whose
2545	title either is precisely XYZ or contains XYZ in parentheses following
2546	text that translates XYZ in another language. (Here XYZ stands for a
2547	specific section name mentioned below, such as ``Acknowledgements'',
2548	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
2549	of such a section when you modify the Document means that it remains a
2550	section ``Entitled XYZ'' according to this definition.
2551
2552	The Document may include Warranty Disclaimers next to the notice which
2553	states that this License applies to the Document. These Warranty
2554	Disclaimers are considered to be included by reference in this
2555	License, but only as regards disclaiming warranties: any other
2556	implication that these Warranty Disclaimers may have is void and has
2557	no effect on the meaning of this License.
2558
2559	@item
2560	VERBATIM COPYING
2561
2562	You may copy and distribute the Document in any medium, either
2563	commercially or noncommercially, provided that this License, the
2564	copyright notices, and the license notice saying this License applies
2565	to the Document are reproduced in all copies, and that you add no other
2566	conditions whatsoever to those of this License. You may not use
2567	technical measures to obstruct or control the reading or further
2568	copying of the copies you make or distribute. However, you may accept
2569	compensation in exchange for copies. If you distribute a large enough
2570	number of copies you must also follow the conditions in section 3.
2571
2572	You may also lend copies, under the same conditions stated above, and
2573	you may publicly display copies.
2574
2575	@item
2576	COPYING IN QUANTITY
2577
2578	If you publish printed copies (or copies in media that commonly have
2579	printed covers) of the Document, numbering more than 100, and the
2580	Document's license notice requires Cover Texts, you must enclose the
2581	copies in covers that carry, clearly and legibly, all these Cover
2582	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
2583	the back cover. Both covers must also clearly and legibly identify
2584	you as the publisher of these copies. The front cover must present
2585	the full title with all words of the title equally prominent and
2586	visible. You may add other material on the covers in addition.
2587	Copying with changes limited to the covers, as long as they preserve
2588	the title of the Document and satisfy these conditions, can be treated
2589	as verbatim copying in other respects.
2590
2591	If the required texts for either cover are too voluminous to fit
2592	legibly, you should put the first ones listed (as many as fit
2593	reasonably) on the actual cover, and continue the rest onto adjacent
2594	pages.
2595
2596	If you publish or distribute Opaque copies of the Document numbering
2597	more than 100, you must either include a machine-readable Transparent
2598	copy along with each Opaque copy, or state in or with each Opaque copy
2599	a computer-network location from which the general network-using
2600	public has access to download using public-standard network protocols
2601	a complete Transparent copy of the Document, free of added material.
2602	If you use the latter option, you must take reasonably prudent steps,
2603	when you begin distribution of Opaque copies in quantity, to ensure
2604	that this Transparent copy will remain thus accessible at the stated
2605	location until at least one year after the last time you distribute an
2606	Opaque copy (directly or through your agents or retailers) of that
2607	edition to the public.
2608
2609	It is requested, but not required, that you contact the authors of the
2610	Document well before redistributing any large number of copies, to give
2611	them a chance to provide you with an updated version of the Document.
2612
2613	@item
2614	MODIFICATIONS
2615
2616	You may copy and distribute a Modified Version of the Document under
2617	the conditions of sections 2 and 3 above, provided that you release
2618	the Modified Version under precisely this License, with the Modified
2619	Version filling the role of the Document, thus licensing distribution
2620	and modification of the Modified Version to whoever possesses a copy
2621	of it. In addition, you must do these things in the Modified Version:
2622
2623	@enumerate A
2624	@item
2625	Use in the Title Page (and on the covers, if any) a title distinct
2626	from that of the Document, and from those of previous versions
2627	(which should, if there were any, be listed in the History section
2628	of the Document). You may use the same title as a previous version
2629	if the original publisher of that version gives permission.
2630
2631	@item
2632	List on the Title Page, as authors, one or more persons or entities
2633	responsible for authorship of the modifications in the Modified
2634	Version, together with at least five of the principal authors of the
2635	Document (all of its principal authors, if it has fewer than five),
2636	unless they release you from this requirement.
2637
2638	@item
2639	State on the Title page the name of the publisher of the
2640	Modified Version, as the publisher.
2641
2642	@item
2643	Preserve all the copyright notices of the Document.
2644
2645	@item
2646	Add an appropriate copyright notice for your modifications
2647	adjacent to the other copyright notices.
2648
2649	@item
2650	Include, immediately after the copyright notices, a license notice
2651	giving the public permission to use the Modified Version under the
2652	terms of this License, in the form shown in the Addendum below.
2653
2654	@item
2655	Preserve in that license notice the full lists of Invariant Sections
2656	and required Cover Texts given in the Document's license notice.
2657
2658	@item
2659	Include an unaltered copy of this License.
2660
2661	@item
2662	Preserve the section Entitled ``History'', Preserve its Title, and add
2663	to it an item stating at least the title, year, new authors, and
2664	publisher of the Modified Version as given on the Title Page. If
2665	there is no section Entitled ``History'' in the Document, create one
2666	stating the title, year, authors, and publisher of the Document as
2667	given on its Title Page, then add an item describing the Modified
2668	Version as stated in the previous sentence.
2669
2670	@item
2671	Preserve the network location, if any, given in the Document for
2672	public access to a Transparent copy of the Document, and likewise
2673	the network locations given in the Document for previous versions
2674	it was based on. These may be placed in the ``History'' section.
2675	You may omit a network location for a work that was published at
2676	least four years before the Document itself, or if the original
2677	publisher of the version it refers to gives permission.
2678
2679	@item
2680	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
2681	the Title of the section, and preserve in the section all the
2682	substance and tone of each of the contributor acknowledgements and/or
2683	dedications given therein.
2684
2685	@item
2686	Preserve all the Invariant Sections of the Document,
2687	unaltered in their text and in their titles. Section numbers
2688	or the equivalent are not considered part of the section titles.
2689
2690	@item
2691	Delete any section Entitled ``Endorsements''. Such a section
2692	may not be included in the Modified Version.
2693
2694	@item
2695	Do not retitle any existing section to be Entitled ``Endorsements'' or
2696	to conflict in title with any Invariant Section.
2697
2698	@item
2699	Preserve any Warranty Disclaimers.
2700	@end enumerate
2701
2702	If the Modified Version includes new front-matter sections or
2703	appendices that qualify as Secondary Sections and contain no material
2704	copied from the Document, you may at your option designate some or all
2705	of these sections as invariant. To do this, add their titles to the
2706	list of Invariant Sections in the Modified Version's license notice.
2707	These titles must be distinct from any other section titles.
2708
2709	You may add a section Entitled ``Endorsements'', provided it contains
2710	nothing but endorsements of your Modified Version by various
2711	parties---for example, statements of peer review or that the text has
2712	been approved by an organization as the authoritative definition of a
2713	standard.
2714
2715	You may add a passage of up to five words as a Front-Cover Text, and a
2716	passage of up to 25 words as a Back-Cover Text, to the end of the list
2717	of Cover Texts in the Modified Version. Only one passage of
2718	Front-Cover Text and one of Back-Cover Text may be added by (or
2719	through arrangements made by) any one entity. If the Document already
2720	includes a cover text for the same cover, previously added by you or
2721	by arrangement made by the same entity you are acting on behalf of,
2722	you may not add another; but you may replace the old one, on explicit
2723	permission from the previous publisher that added the old one.
2724
2725	The author(s) and publisher(s) of the Document do not by this License
2726	give permission to use their names for publicity for or to assert or
2727	imply endorsement of any Modified Version.
2728
2729	@item
2730	COMBINING DOCUMENTS
2731
2732	You may combine the Document with other documents released under this
2733	License, under the terms defined in section 4 above for modified
2734	versions, provided that you include in the combination all of the
2735	Invariant Sections of all of the original documents, unmodified, and
2736	list them all as Invariant Sections of your combined work in its
2737	license notice, and that you preserve all their Warranty Disclaimers.
2738
2739	The combined work need only contain one copy of this License, and
2740	multiple identical Invariant Sections may be replaced with a single
2741	copy. If there are multiple Invariant Sections with the same name but
2742	different contents, make the title of each such section unique by
2743	adding at the end of it, in parentheses, the name of the original
2744	author or publisher of that section if known, or else a unique number.
2745	Make the same adjustment to the section titles in the list of
2746	Invariant Sections in the license notice of the combined work.
2747
2748	In the combination, you must combine any sections Entitled ``History''
2749	in the various original documents, forming one section Entitled
2750	``History''; likewise combine any sections Entitled ``Acknowledgements'',
2751	and any sections Entitled ``Dedications''. You must delete all
2752	sections Entitled ``Endorsements.''
2753
2754	@item
2755	COLLECTIONS OF DOCUMENTS
2756
2757	You may make a collection consisting of the Document and other documents
2758	released under this License, and replace the individual copies of this
2759	License in the various documents with a single copy that is included in
2760	the collection, provided that you follow the rules of this License for
2761	verbatim copying of each of the documents in all other respects.
2762
2763	You may extract a single document from such a collection, and distribute
2764	it individually under this License, provided you insert a copy of this
2765	License into the extracted document, and follow this License in all
2766	other respects regarding verbatim copying of that document.
2767
2768	@item
2769	AGGREGATION WITH INDEPENDENT WORKS
2770
2771	A compilation of the Document or its derivatives with other separate
2772	and independent documents or works, in or on a volume of a storage or
2773	distribution medium, is called an ``aggregate'' if the copyright
2774	resulting from the compilation is not used to limit the legal rights
2775	of the compilation's users beyond what the individual works permit.
2776	When the Document is included in an aggregate, this License does not
2777	apply to the other works in the aggregate which are not themselves
2778	derivative works of the Document.
2779
2780	If the Cover Text requirement of section 3 is applicable to these
2781	copies of the Document, then if the Document is less than one half of
2782	the entire aggregate, the Document's Cover Texts may be placed on
2783	covers that bracket the Document within the aggregate, or the
2784	electronic equivalent of covers if the Document is in electronic form.
2785	Otherwise they must appear on printed covers that bracket the whole
2786	aggregate.
2787
2788	@item
2789	TRANSLATION
2790
2791	Translation is considered a kind of modification, so you may
2792	distribute translations of the Document under the terms of section 4.
2793	Replacing Invariant Sections with translations requires special
2794	permission from their copyright holders, but you may include
2795	translations of some or all Invariant Sections in addition to the
2796	original versions of these Invariant Sections. You may include a
2797	translation of this License, and all the license notices in the
2798	Document, and any Warranty Disclaimers, provided that you also include
2799	the original English version of this License and the original versions
2800	of those notices and disclaimers. In case of a disagreement between
2801	the translation and the original version of this License or a notice
2802	or disclaimer, the original version will prevail.
2803
2804	If a section in the Document is Entitled ``Acknowledgements'',
2805	``Dedications'', or ``History'', the requirement (section 4) to Preserve
2806	its Title (section 1) will typically require changing the actual
2807	title.
2808
2809	@item
2810	TERMINATION
2811
2812	You may not copy, modify, sublicense, or distribute the Document except
2813	as expressly provided for under this License. Any other attempt to
2814	copy, modify, sublicense or distribute the Document is void, and will
2815	automatically terminate your rights under this License. However,
2816	parties who have received copies, or rights, from you under this
2817	License will not have their licenses terminated so long as such
2818	parties remain in full compliance.
2819
2820	@item
2821	FUTURE REVISIONS OF THIS LICENSE
2822
2823	The Free Software Foundation may publish new, revised versions
2824	of the GNU Free Documentation License from time to time. Such new
2825	versions will be similar in spirit to the present version, but may
2826	differ in detail to address new problems or concerns. See
2827	@uref{http://www.gnu.org/copyleft/}.
2828
2829	Each version of the License is given a distinguishing version number.
2830	If the Document specifies that a particular numbered version of this
2831	License ``or any later version'' applies to it, you have the option of
2832	following the terms and conditions either of that specified version or
2833	of any later version that has been published (not as a draft) by the
2834	Free Software Foundation. If the Document does not specify a version
2835	number of this License, you may choose any version ever published (not
2836	as a draft) by the Free Software Foundation.
2837	@end enumerate
2838
@page
@heading ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
Copyright (C) @var{year} @var{your name}.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with@dots{}Texts.'' line with this:

@smallexample
@group
with the Invariant Sections being @var{list their titles}, with
the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node Reporting bugs
@chapter Reporting bugs

Report bugs to @email{obrebski@@amu.edu.pl}.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c @node Copyright
@c @chapter Copyright
@c
@c Copyright 2004 by Tomasz Obrębski
@c This software is free for research and educational use.

@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@node Author
@chapter Author


@bye
