source: app/doc/utt.texinfo @ 8c8a252

Last change on this file since 8c8a252 was 2d89d4b, checked in by walas <walas@…>, 18 years ago

mar documentation + minor mar fixes (version)

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@64 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

1
2	\input texinfo @c -*-texinfo-*-
3	@c @documentencoding ISO-8859-2
4	@documentencoding UTF-8
5	@c @documentlanguage pl
6
7	@c %**start of header
8	@setfilename utt.info
9	@settitle UAM Text Tools v0.90
10	@c %**end of header
11
12	@copying
13	This manual is for UAM Text Tools (version 0.90, October, 2008)
14
15	Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
16
17	Permission is granted to copy, distribute and/or modify this document
18	under the terms of the GNU Free Documentation License, Version 1.2 or
19	any later version published by the Free Software Foundation; with no
20	Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
21	copy of the license is included in the section entitled ``GNU Free
22	Documentation License''.
23
24	@c @quotation
25	@c Permission is granted to ...
26	@c No permission is granted until the document is completed.
27	@c @end quotation
28	@end copying
29
30
31	@titlepage
32	@title UAM Text Tools 0.90 - User Manual
33	@subtitle edition 0.01, @today
34	@subtitle status: prescript
35	@author by Justyna Walkowska, Tomasz Obrębski and Michał Stolarski
36	@page
37	@vskip 0pt plus 1filll
38	@insertcopying
39	@end titlepage
40
41	@contents
42
43	@c @paragraphindent none
44
45	@iftex
46	@tex
47	% \usepackage[T1]{fontenc}
48	% \usepackage[utf8]{inputenc}
49	% \usepackage{times}
50	@end tex
51
52	@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
53	@end iftex
54	@c @headings off
55	@c @everyheading LEM(1) @| @| LEM(1)
56	@everyfooting @today @c @| @thispage @|
57
58	@ifnottex
59
60	@node Top
61	@top UTT - UAM Text Tools
62
63	@insertcopying
64
65	@menu
66	* General information::
67	* UTT file format::
68	* Configuration files::
69	* UTT components::
70	* Auxiliary tools::
71	* Usage examples::
72	* PMDBF dictionary::
73	@c * Examples::
74	@c * Copyright::
75	* GNU Free Documentation License::
76	* Reporting bugs::
77	* Author::
78	@end menu
79	@end ifnottex
80
81
82	@c ----------------------------------------------------------------------
83
84	@node General information
85	@chapter General information
86
87	UAM Text Tools (UTT) is a package of language processing tools
88	developed at Adam Mickiewicz University. Its functionality includes:
89
90	@itemize @bullet
91
92	@item
93	tokenization
94	@item
95	dictionary-based morphological analysis
96	@item
97	heuristic morphological analysis of unknown words
98	@item
99	spelling correction
100	@item
101	pattern search
102	@item
103	sentence splitting
104	@item
105	generation of concordance tables
106	@end itemize
107
108	The toolkit is intended for processing raw (unannotated),
109	unrestricted text for any purpose.
110
111	The system is organized as a collection of command-line programs, each
112	performing one operation, e.g. tokenization, lemmatization or spelling
113	correction. The components are independent of one another; the
114	unifying element is the uniform i/o file format.
115
116	The components may be combined in various ways to provide various text
117	processing services. New components supplied by the user may also be
118	easily incorporated into the system, provided that they respect the i/o
119	file format conventions.
120
121	UTT component programs do not depend on any specific tagset or
122	morphological description format.
123
124	UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
125	the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
126
127	The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use.
128
129
130	List of contributors:
131
132	@itemize
133	@item Paweł Konieczka
134	@item Tomasz Obrębski
135	@item Michał Stolarski
136	@item Marcin Walas
137	@item Justyna Walkowska
138	@item Paweł Wereński
139	@end itemize
140
141	@c ----------------------------------------------------------------------
142	@c ---------------------------------------------------------------------
143
144	@node UTT file format
145	@chapter UTT file format
146
147	A UTT file contains annotation of a text. It consists of a sequence of
148	segments. Each segment explicitly refers to a continuous piece of the
149	text and provides some information on it.
150
151	@section Segment format
152
153	A segment occupies one line of a UTT file and consists of
154	space-separated fields:
155
156
157	@quotation
158	@sp 1
159	[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
160	@sp 1
161	@end quotation
162
163	@table @var
164
165	@item @var{start}
166	Non-negative integer value indicating the position in the source text where the
167	segment starts.
168
169	@item @var{length}
170	Non-negative integer value indicating the length of the segment.
171
172	@item @var{type}
173	A sequence of non-blank characters containing no digits (digits could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
174	@var{type} reflects the main classification of segments
175	into words, numbers, punctuation marks and meta-text markers.
176	@xref{tok output,,tok output}, for a description of the automatically recognized type markers.
177
178	@item @var{form}
179	This field contains the textual form of the segment or the special
180	symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
181
182	The characters or character sequences that have special meaning in the
183	@var{form} field are enumerated below.
184
185	Characters with special meaning:
186
187	@itemize
188	@item @code{_} - space character
189	@item @code{*} - undefined contents
190	@end itemize
191
192	Escape sequences:
193
194	@itemize
195	@item @code{\n} - new line
196	@item @code{\t} - tabulation
197	@item @code{\r} - carriage return
198
199	@item @code{\_} - the @code{_} character
200	@item @code{\*} - the @code{*} character
201	@item @code{\\} - the @code{\} character
202
203	@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
204	@end itemize
205
206	@item @var{annotation1}
207	@item @var{annotation2}
208	@item ...
209	Annotation fields have the following format:
210
211	@var{longname} @code{:} @var{value}
212
213	or
214
215	@var{shortname} @var{value}
216
217	where @var{longname} is a string of alphanumeric characters
218	(@code{isalnum()} test), @var{shortname} is a single punctuation character
219	(@code{ispunct()} test), and @var{value} is an arbitrary string of non-blank characters.
220
221	@end table
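
The special characters and escape sequences of the @var{form} field can be illustrated with a small Python sketch. It is not part of UTT; the function names are hypothetical, and treating @code{\*} as an escaped asterisk and keeping unknown escapes verbatim are assumptions of the sketch.

```python
# Escape letter -> character it stands for in the form field.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "_": "_", "*": "*", "\\": "\\"}

# Character -> its spelling in the form field (inverse mapping).
ENCODED = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
           "_": "\\_", "*": "\\*", "\\": "\\\\"}


def decode_form(form):
    """Return the text denoted by a form field, or None for the undefined form '*'."""
    if form == "*":
        return None
    out = []
    i = 0
    while i < len(form):
        ch = form[i]
        if ch == "\\" and i + 1 < len(form):
            following = form[i + 1]
            # Unknown escapes are kept verbatim (a choice made for this sketch).
            out.append(ESCAPES.get(following, "\\" + following))
            i += 2
        elif ch == "_":
            out.append(" ")  # a bare underscore stands for a space
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def encode_form(text):
    """Spell ordinary text as a form field."""
    return "".join(ENCODED.get(ch, ch) for ch in text)


print(repr(decode_form("a\\_b_c")))  # 'a_b c'
print(encode_form("12 km\n"))        # 12_km\n
```

Encoding followed by decoding returns the original text, which is a convenient sanity check when writing a new component.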
222
223
224	Only two fields are mandatory: @var{type} and @var{form}. All other fields
225	may be absent. When only one number precedes the
226	@var{type} field, it is interpreted as the @var{start} position.
227
228	If the @var{length} field is omitted, the length of the segment is the
229	length of the @var{form} field, except when the value of the
230	@var{form} field is @code{*} -- in this case, the length is assumed to
231	be 0.
232
233	If the @var{start} field is also absent, the segment is assumed to directly
234	follow the preceding one.
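
The rules for omitted fields can be sketched as a small Python parser. This is an illustration only, not the UTT implementation: the @code{Segment} class and the function names are hypothetical, and counting an escape sequence such as @code{\n} as one character is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: int
    length: int
    type: str
    form: str
    annotations: list = field(default_factory=list)


def form_length(form):
    """Length of the text denoted by a form field ('*' means length 0)."""
    if form == "*":
        return 0
    length, i = 0, 0
    while i < len(form):
        # Assumption: a backslash escape (\n, \_ ...) stands for one character.
        i += 2 if form[i] == "\\" and i + 1 < len(form) else 1
        length += 1
    return length


def parse_segments(lines):
    """Parse UTT lines into Segment objects, restoring omitted fields."""
    segments = []
    next_start = 0  # start assumed for a segment that gives no position
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        numbers = []
        # Up to two leading all-digit fields are start and length;
        # the last two fields are always left for type and form.
        while len(numbers) < 2 and len(fields) > 2 and fields[0].isdigit():
            numbers.append(int(fields.pop(0)))
        if len(fields) < 2:
            raise ValueError(f"type and form are mandatory: {line!r}")
        seg_type, form, *annotations = fields
        start = numbers[0] if numbers else next_start
        length = numbers[1] if len(numbers) == 2 else form_length(form)
        segments.append(Segment(start, length, seg_type, form, annotations))
        next_start = start + length
    return segments


for seg in parse_segments(["0000 BOS *", "W Piszemy", "S _", "W dobre"]):
    print(seg.start, seg.length, seg.type, seg.form)
```

A segment without a @var{start} field is placed at the end of the segment on the preceding line, which is how the sketch reads ``directly follow the preceding one''.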
235
236	@c Conventions:
237
238	@c Annotation fields with predefined meaning:
239
240	@c @itemize
241	@c @item @code{!} - UTT components are allowed to modify the contents of
242	@c the @var{form} field (e.g. spelling correction does this). If this happens the
243	@c original form of the segment have to be placed in the @code{!}-field.
244	@c @item @code{@@} - morphological description
245	@c @item @code{=} - node identifier assignment (used in graph encoding)
246	@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
247	@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
248	@c @end itemize
249
250	Segments of length 0 may be used to mark file positions with some
251	information. See e.g. BOS and EOS (beginning/end of sentence) markers
252	in the example below.
253
254	Example:
255
256	text: @samp{Piszemy dobre progrumy. Warszawiacy też.} (@samp{progrumy} is a deliberate misspelling, corrected in the @code{cor} annotation)
257
258	@example
259	0000 00 BOS *
260	0000 07 W Piszemy lem:pisać,V
261	0007 01 S _
262	0008 05 W dobre lem:dobry,ADJ
263	0013 01 S _
264	0014 08 W progrumy cor:programy lem:program,N
265	0022 01 P .
266	0023 00 EOS *
267	0023 01 S _
268	0024 00 BOS *
269	0024 11 W Warszawiacy lem:Warszawiak,N
270	0035 01 S _
271	0036 03 W też
272	0039 01 P .
273	0040 00 EOS *
274
275	@end example
276
277	@example
278	0000 BOS *
279	0000 W Piszemy lem:pisać,V
280	0007 S _
281	0008 W dobre lem:dobry,ADJ
282	0013 S _
283	0014 W progrumy cor:programy lem:program,N
284	0022 P .
285	0023 EOS *
286	@end example
287
288	Position information may be provided only for some types of segments:
289
290	@example
291	0000 BOS *
292	W Piszemy lem:pisać,V
293	S _
294	W dobre lem:dobry,ADJ
295	S _
296	W progrumy cor:programy lem:program,N
297	P .
298	EOS *
299	S _
300	0024 BOS *
301	W Warszawiacy lem:Warszawiak,N
302	S _
303	W też
304	P .
305	EOS *
306	@end example
307
308	Position/length information may be provided only when necessary:
309
310	@example
311	0000 04 N *
312	0000 N 12
313	P .
314	N 5
315	S _
316	W km
317	@end example
318
319	@section UTT File
320
321	A UTT file consists of a sequence of segments. The same text position
322	may be covered by multiple segments. As a consequence, ambiguous text
323	segmentation and ambiguous annotation may be represented.
324
325	There are two structural requirements a valid UTT-formatted file
326	has to meet:
327
328	@itemize @bullet
329
330	@item
331	segments have to be sorted with respect to the @var{start} field,
332
333	@item
334	for each
335	segment ending at position @var{n}, either there must be a segment starting at
336	position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
337	for each segment starting at position @var{n}, either there must be a segment
338	ending at position @var{n-1}, or the position @var{n-1} must not be covered
339	by any segment.
340
341	@end itemize
342
343	A valid annotation for the text fragment
344	@example
345	12.5 km
346	@end example
347
348	may be
349
350	@example
351	0000 02 N 12
352	0000 04 N 12.5
353	0002 01 P .
354	0003 01 N 5
355	0004 01 S _
356	0005 02 W km
357	@end example
358
359	but not
360
361	@example
362	0000 02 N 12
363	0000 04 N 12.5
364	0004 01 S _
365	0005 02 W km
366	@end example
367
368	because in the latter example the first segment (starting at position
369	0000, 2 characters long) ends at position @var{n}=0001, while position
370	@var{n+1}=0002 is covered by the second segment and no segment starts
371	there.
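
The two structural requirements can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not a UTT tool: the function name is hypothetical, segments are given as (start, length) pairs in file order, and zero-length markers are left out of the boundary check.

```python
def check_structure(segments):
    """Check (start, length) pairs, given in file order, against both rules.

    Returns a list of human-readable problems; an empty list means valid.
    """
    problems = []
    starts_in_order = [start for start, _ in segments]
    if starts_in_order != sorted(starts_in_order):
        problems.append("segments are not sorted by start position")

    # Assumption: zero-length markers (e.g. BOS/EOS) cover no text and are
    # therefore ignored in the boundary check.
    spans = [(start, start + length - 1) for start, length in segments if length > 0]
    starts = {first for first, _ in spans}
    ends = {last for _, last in spans}

    def covered(position):
        return any(first <= position <= last for first, last in spans)

    for n in sorted(ends):
        if n + 1 not in starts and covered(n + 1):
            problems.append(f"segment ends at {n}, but none starts at {n + 1}")
    for n in sorted(starts):
        if n - 1 not in ends and covered(n - 1):
            problems.append(f"segment starts at {n}, but none ends at {n - 1}")
    return problems


valid = [(0, 2), (0, 4), (2, 1), (3, 1), (4, 1), (5, 2)]
invalid = [(0, 2), (0, 4), (4, 1), (5, 2)]
print(check_structure(valid))    # []
print(check_structure(invalid))  # ['segment ends at 1, but none starts at 2']
```

The two lists correspond to the valid and invalid annotations of @samp{12.5 km} shown above.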
372
373
374	@section Flattened UTT file
375
376	The UTT file format has two variants: regular and flattened. The regular
377	format was described above. In the flattened format some of the
378	end-of-line characters are replaced with line-feed characters.
379
380	The flattened format is basically used to represent whole sentences as
381	single lines of the input file (all intrasentential end-of-line
382	characters are replaced with line-feed characters).
383
384	This technical trick makes it possible to perform certain text
385	processing operations on entire sentences with the use of such tools as
386	@command{grep} (see @command{grp} component) or @command{sed} (see @command{mar} component).
387
388	The conversion between the two formats is performed by the tools:
389	@command{fla} and @command{unfla}.
390
391	@section Character encoding
392
393	The UTT component programs accept only 1-byte character encodings, such
394	as the ISO, ANSI or DOS code pages (e.g. ISO-8859-2).
395
396
397	@c @section Formats
398
399	@c @unnumberedsubsubsec Basic format
400
401	@c While processing large amounts of the overhead related with explicit
402	@c ... of the start position and segment length becomes ... . Therefore,
403	@c for efficiency reasons certain shortcuts are possible:
404
405	@c @unnumberedsubsubsec Relative start position
406
407	@c Start position may be given as relative distance from the last
408	@c absolute position.
409
410	@c @unnumberedsubsubsec Absent length
411
412	@c Segment length may be omitted. Normally it can be restored by counting
413	@c the length of the @emph{form field}. For segments with the special value
414	@c @code{*} in the @emph{form field} length 0 is assumed.
415
416	@c @unnumberedsubsubsec Absent length and start position
417
418	@c Both start position and segment length may be omitted. In this format
419	@c each segment is assumed to follow the previous one. This format is,
420	@c therefore, suitable only for unambiguously tagged text
421	@c (0-length markers can be still used.)
422
423
424	@c @table @code
425	@c @item AL
426	@c @code{1234 03 W kot}
427	@c @item RL
428	@c @code{+56 03 W kot}
429	@c @item A
430	@c @code{1234 W kot}
431	@c @item R
432	@c @code{+56 W kot}
433	@c @item 0
434	@c @code{W kot}
435	@c @end table
436
437
438	@c [HOW TO GET POLISH FONTS IN DVI???]
439
440	@macro parhelp
441	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
442	Print help.
443	@end macro
444
445
446	@macro parversion
447	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
448	Print version information.
449	@end macro
450
451	@macro parinteractive
452	@item @b{@minus{}@minus{}interactive, @minus{}i}
453	This option toggles interactive mode, which is by default off. In the
454	interactive mode the program does not buffer the output.
455	@end macro
456
457
458	@c @macro parfile
459	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
460	@c Input file name.
461	@c If this option is absent or equal to '@minus{}', the program
462	@c reads from the standard input.
463	@c @end macro
464
465
466	@c @macro paroutput
467	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
468	@c Regular output file name. To regular output the program sends segments
469	@c which it successfully processed and copies those which were not
470	@c subject to processing. If this option is absent or equal to
471	@c '@minus{}', standard output is used.
472	@c @end macro
473
474	@c @macro parfail
475	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
476	@c Fail output file name. To fail output the program copies the segments
477	@c it failed to process. If this option is absent or equal to
478	@c '@minus{}', standard output is used.
479	@c @end macro
480
481
482	@c @macro parcopy
483	@c @item @b{@minus{}@minus{}copy, @minus{}c}
484	@c Copy successfully processed segments to regular output also in their
485	@c original input form.
486	@c @end macro
487
488
489	@macro parinputfield
490	@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
491	The field containing the input to the program. The default is the
492	@var{form} field. The fields @var{start}, @var{length}, @var{type},
493	and @var{form} are referred to as @code{1}, @code{2}, @code{3},
494	@code{4}, respectively.
495	@end macro
496
497
498	@macro paroutputfield
499	@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
500	The name of the field added by the program. The default is the name of the program.
501	@end macro
502
503
504	@macro pardictionary
505	@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
506	Dictionary file name.
507	@end macro
508
509
510	@macro parprocess
511	@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
512	Process segments with the specified value in the @var{type} field.
513	Multiple occurrences of this option are allowed and are interpreted as
514	a disjunction. If this option is absent, all segments are processed.
515	@end macro
516
517
518	@macro parselect
519	@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
520	Select for processing only segments in which the field named
521	@var{fieldname} is present. Multiple occurrences of this option are
522	allowed and are interpreted as a conjunction of conditions. If this
523	option is absent, all segments are processed.
524	@end macro
525
526
527	@macro parunselect
528	@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
529	Select for processing only segments in which the field @var{fieldname}
530	is absent. Multiple occurrences of this option are allowed and are
531	interpreted as a conjunction of conditions. If this option is absent,
532	all segments are processed.
533	@end macro
534
535
536	@macro paroneline
537	@item @b{@minus{}@minus{}one-line}
538	This option makes the program print ambiguous annotation in one output
539	line by generating multiple annotation fields. By default, when
540	ambiguous annotation may be produced for a segment, the segment is
541	duplicated and each of the annotations is added to a separate copy of
542	the segment.
543	@end macro
544
545
546	@macro paronefield
547	@item @b{@minus{}@minus{}one-field, @minus{}1}
548	This option makes the program print ambiguous annotation in one
549	annotation field. By default, when ambiguous annotation may be produced
550	for a segment, the segment is duplicated and each of the
551	annotations is added to a separate copy of the segment.
552
553	This option is useful when working with @command{kot} or @command{con}.
554	@end macro
555
556
557	@c ---------------------------------------------------------------------
558	@c CONFIGURATION FILES
559	@c ---------------------------------------------------------------------
560
561	@node Configuration files
562	@chapter Configuration files
563
564	Values for all command line options accepted by a component
565	may be set in configuration files. The default locations of the
566	configuration files for a component named @command{@var{program}} are
567
568	@example
569	@file{/usr/local/etc/utt/@var{program}.conf}
570	@end example
571
572	for the system-wide configuration file and
573
574	@example
575	@file{~/.utt/@var{program}.conf}
576	@end example
577
578	for the user configuration file.
579
580	@c The configuration file to load may be also specified with the
581	@c @option{--config} option. Configuration file need not be provided.
582
583	For each option, the value is set according to the following priority:
584
585	@itemize
586	@item command line
587	@c @item configuration file indicated with @option{--config} option
588	@item user configuration file (or configuration file indicated with the @option{--config} option)
589	@item system-wide configuration file
590	@end itemize
591
592	Parameter values are specified in the following format:
593
594	@var{parametername}=@var{value}
595
596	where @var{parametername} is the short or long name of an option accepted by
597	the program, or
598
599	@var{parametername}
600
601	if the option does not need arguments.
602
603	You can add comments to configuration files using the @code{#} sign.
604
605	If a program accepts multiple occurrences of an option (e.g. the @option{--select} option of @command{lem}), you can specify them on separate lines of the program's configuration file.
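
The file format and the priority order described above can be sketched in Python. This is only an illustration: the function names and the sample file contents are hypothetical, and two points that the text leaves open are assumptions of the sketch, namely that @code{#} starts a comment anywhere on a line and that a higher-priority source replaces, rather than extends, the values from a lower-priority one.

```python
def parse_config(text):
    """Parse configuration text into {option name: [values]}.

    Options written without '=' are stored with the value True.
    """
    options = {}
    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment anywhere on the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, equals, value = line.partition("=")
        options.setdefault(name.strip(), []).append(value.strip() if equals else True)
    return options


def effective_options(command_line, user_conf, system_conf):
    """Combine the three sources; later updates have higher priority."""
    merged = dict(system_conf)
    merged.update(user_conf)
    merged.update(command_line)
    return merged


# Hypothetical file contents, used only to exercise the functions.
system_conf = parse_config("dictionary=system.bin\none-line\n")
user_conf = parse_config("# personal settings\nselect=lem\nselect=cor\ndictionary=my.dic\n")
print(effective_options({"one-field": [True]}, user_conf, system_conf))
```

Repeated options such as @code{select} are collected into a list, one entry per line of the configuration file.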
606
607	@c The equal sign may be omitted.
608
609
610	@quotation Tip
611	If you have two (or more) frequently used sets of options for the same
612	program (e.g. @command{lem} with the PMDBF dictionary and @command{lem} with a user dictionary),
613	a good solution is to create two symbolic links to @command{lem}, called
614	e.g. @file{lemg} and @file{lemu}, and to specify their configuration in the files @file{lemg.conf}
615	and @file{lemu.conf}, respectively.
616	@end quotation
617
618	@c ---------------------------------------------------------------------
619	@c COMPONENTS
620	@c ---------------------------------------------------------------------
621
622	@node UTT components
623	@chapter UTT components
624
625	UTT components are of three types:
626
627	@menu
628	Sources: programs which read non-UTT data (e.g. raw text) and produce output
629	in UTT format
630	* tok:: a tokenizer
631
632	Filters: programs which read and produce UTT-formatted data
633	* lem:: a morphological analyzer
634	* gue:: a morphological guesser
635	* cor:: a simple spelling corrector
636	* kor:: a more elaborate spelling corrector
637	* sen:: a sentence splitter
638	* ser:: a pattern search tool (marks matches)
639	* mar:: a pattern search tool (introduces arbitrary markers into the text)
640	* grp:: a pattern search tool (selects sentences containing a match)
641	@c * gph:: a word-graph annotation tool::
642	@c * dgp:: a dependency parser
643
644	Sinks: programs which read UTT data and produce output in another format
645	* kot:: an untokenizer
646	* con:: a concordance table generator
647	@end menu
648
649	@c ---------------------------------------------------------------------
650	@c TOK
651	@c ---------------------------------------------------------------------
652
653	@page
654	@node tok
655	@section tok - a tokenizer
656
657	@c ----------------------------------------
658
659	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
660	@item @strong{Authors:} @tab Tomasz Obrębski
661	@item @strong{Component category:} @tab source
662	@item @strong{Input format:} @tab raw text file
663	@item @strong{Output format:} @tab UTT regular
664	@item @strong{Required annotation:} @tab -
665	@end multitable
666
667
668	@menu
669	* tok description::
670	* tok input::
671	* tok output::
672	* tok command line options::
673	* tok example::
674	@end menu
675
676	@node tok description
677	@subsection Description
678
679	@code{tok} is a simple program which reads a text file and identifies
680	tokens on the basis of their orthographic form. The type of the token
681	is printed as the @var{type} field.
682
683	@node tok input
684	@subsection Input
685
686	Raw text.
687
688	@node tok output
689	@subsection Output
690
691	A UTT file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
692
693	@itemize
694
695	@item @code{W}
696	(word)
697	- continuous sequence of letters
698
699	@item @code{N}
700	(number)
701	- continuous sequence of digits
702
703	@item @code{S}
704	(space)
705	- continuous sequence of space characters
706
707	@item @code{P}
708	(punctuation mark)
709	- single printable character not belonging to any of the other classes
710
711	@item @code{B}
712	(unprintable character)
713	- single unprintable character
714
715	@end itemize
716
717
718
719	@node tok command line options
720	@subsection Command line options
721
722	@table @code
723
724	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
725	Print help.
726
727	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
728	Print version information.
729
730	@item @b{@minus{}@minus{}interactive, @minus{}i}
731	This option toggles interactive mode, which is by default off. In the
732	interactive mode the program does not buffer the output.
733
734	@end table
735
736	@node tok example
737	@subsection Example
738
739	Input:
740
741	@example
742	Piszemy dobre programy.
743	@end example
744
745	Output:
746
747	@example
748	0000 07 W Piszemy
749	0007 01 S _
750	0008 05 W dobre
751	0013 01 S _
752	0014 08 W programy
753	0022 01 P .
754	0023 01 S \n
755	@end example
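
The token classes can be approximated with a regular expression. The Python sketch below is not the @command{tok} implementation; the names are hypothetical, the character classes are Python's Unicode-aware ones rather than those of a 1-byte locale, and positions are counted in characters (which equals bytes only for single-byte encodings).

```python
import re

# One alternative per token class; X is any single leftover character.
TOKEN = re.compile(r"(?P<W>[^\W\d_]+)|(?P<N>\d+)|(?P<S>\s+)|(?P<X>.)", re.DOTALL)

# Spelling of special characters in the form field.
FORM_ESCAPES = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
                "_": "\\_", "*": "\\*", "\\": "\\\\"}


def tokenize(text):
    """Yield UTT lines (start, length, type, form) for raw text."""
    for match in TOKEN.finditer(text):
        kind = match.lastgroup
        token = match.group()
        if kind == "X":
            # Leftover characters: printable ones are P, the rest are B.
            kind = "P" if token.isprintable() else "B"
        form = "".join(FORM_ESCAPES.get(ch, ch) for ch in token)
        yield f"{match.start():04d} {len(token):02d} {kind} {form}"


for line in tokenize("Piszemy dobre programy.\n"):
    print(line)
```

For the input of the example above, the sketch prints the same seven segments.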
756
757
758	@c ---------------------------------------------------------------------
759	@c SEN
760	@c ---------------------------------------------------------------------
761
762	@c @node sen - sentencizer
763	@c @chapter sen - sentencizer
764
765	@c Authors: Tomasz Obrębski
766
767	@c ---------------------------------------------------------------------
768	@c LEM
769	@c ---------------------------------------------------------------------
770
771	@page
772	@node lem
773	@section lem - morphological analyzer
774
775	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
776	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
777	@item @strong{Component category:} @tab filter
778	@item @strong{Input format:} @tab UTT regular
779	@item @strong{Output format:} @tab UTT regular
780	@item @strong{Required annotation:} @tab tok
781	@end multitable
782
783	@menu
784	* lem description::
785	* lem command line options::
786	* lem input::
787	* lem output::
788	* lem example::
789	* lem dictionaries::
790	* lem hints::
791	@end menu
792
793	@node lem description
794	@subsection Description
795
796	@command{lem} performs morphological analysis of a simple orthographic
797	word, returning all its possible morphological annotations,
798	disregarding the context.
799
800	@c ----------------------------------------
801
802	@node lem command line options
803	@subsection Command line options
804
805	@table @code
806	@parhelp
807	@parversion
808	@parinteractive
809	@c @parfile
810	@c @paroutput
811	@c @parfail
812	@c @parcopy
813	@parinputfield
814	@paroutputfield
815	@pardictionary
816	@parprocess
817	@parselect
818	@parunselect
819	@paroneline
820	@paronefield
821	@end table
822
823	@c ----------------------------------------
824
825	@node lem input
826	@subsection Input
827
828	@command{lem} reads a UTT file and processes the value of the @var{form} field
829	(the input field may be changed with the @option{--input-field} option).
830
831	@node lem output
832	@subsection Output
833
834	@command{lem} adds a new annotation field, whose default name is @code{lem}. In
835	case of ambiguity, either the segment is duplicated (default),
836	multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
837	annotation is produced as the value of a single @code{lem} field
838	(@option{--one-field}, @option{-1}):
839
840	@itemize @bullet
841
842	@item
843	unambiguous value format:
844
845	@example
846	<lemma>,<descr>
847	@end example
848
849	@item
850	ambiguous value format (@option{--one-field} option)
851
852
853	@example
854	<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
855	@end example
856
857	(alternative descriptions for the same lemma are separated by commas,
858	alternative lemmata are separated by semicolons.)
859
860	@end itemize
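
A program that consumes @option{--one-field} output has to split the value again. The Python sketch below shows one way to do it; the function names are hypothetical, and the sketch assumes that lemmata and descriptions never contain a comma or a semicolon.

```python
def parse_lem_value(value):
    """Split a lem value into a list of (lemma, [description, ...]) pairs."""
    readings = []
    for alternative in value.split(";"):      # alternative lemmata
        lemma, *descriptions = alternative.split(",")
        readings.append((lemma, descriptions))
    return readings


def expand(readings):
    """Return one unambiguous '<lemma>,<descr>' value per reading."""
    return [f"{lemma},{descr}" for lemma, descriptions in readings
            for descr in descriptions]


value = "dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn"
print(parse_lem_value(value))
print(expand(parse_lem_value(value)))
```

Expanding a parsed value yields the unambiguous values that the default mode would place on separate copies of the segment.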
861
862	@node lem example
863	@subsection Example
864
865	Input:
866
867	@example
868	0000 07 W Piszemy
869	0007 01 S _
870	0008 05 W dobre
871	0013 01 S _
872	0014 08 W programy
873	0022 01 P .
874	0023 01 S \n
875	@end example
876
877	Output (default):
878
879	@example
880	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
881	0007 01 S _
882	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
883	0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
884	0013 01 S _
885	0014 08 W programy lem:program,N/GiNpCa
886	0014 08 W programy lem:program,N/GiNpCn
887	0014 08 W programy lem:program,N/GiNpCv
888	0022 01 P .
889	0023 01 S \n
890	@end example
891
892	Output (@option{--one-line} option):
893
894	@example
895	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
896	0007 01 S _
897	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
898	0013 01 S _
899	0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
900	0022 01 P .
901	0023 01 S \n
902	@end example
903
904	Output (@option{--one-field} option):
905
906	@example
907	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
908	0007 01 S _
909	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
910	0013 01 S _
911	0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
912	0022 01 P .
913	0023 01 S \n
914	@end example
915
916	@c ----------------------------------------
917
918	@node lem dictionaries
919	@subsection Dictionaries
920
921	@command{lem} requires a dictionary. The dictionary may be provided in
922	one of two formats: in text (source) format or in binary (fsa) format.
923
924	@subsubheading Text format
925
926	Dictionary entries have the following structure:
927
928	@example
929	<form>;<lemma>,<descr>[;<lemma>,<descr>]
930	@end example
931
932	@var{lemma} may be given explicitly or in the cut-add format:
933
934	@example
935	@code{[<cut1><add1>-]<cut2><add2>}
936	@end example
937
938	meaning: replace the prefix of length @code{<cut1>} with the
939	string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string
940	@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
941	@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
942
943	Each dictionary entry must be written on one line and must not contain blank characters.
944
945	Examples:
946	@example
947	kot;0,N/GaNsCn
948	kota;1,N/GaNsCg;1,N/GaNsCa
949	kotu;1,N/GaNsCd
950	kotem;2,N/GaNsCi
951	kocie;3t,N/GaNsCl;3t,N/GaNsCv
952	najbielsi;3-4ały,ADJ/DsNpCnGp
953	najbielsze;3-5ały,ADJ/DsNpCnGaifn
954	najlepsi;dobry,ADJ/DsNpCnGp
955	najlepsze;dobry,ADJ/DsNpCnGaifn
956	@end example
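
The cut-add notation can be made concrete with a short Python sketch. It is an illustration, not the code used by @command{lem}: the function names are hypothetical, and the sketch assumes that a lemma field beginning with a digit is in cut-add format while anything else is an explicit lemma.

```python
import re

# [<cut1><add1>-]<cut2><add2>: the prefix part is optional.
CUT_ADD = re.compile(r"^(?:(\d+)([^-\d]*)-)?(\d+)(.*)$")


def apply_cut_add(form, code):
    """Return the lemma of `form` described by `code`."""
    match = CUT_ADD.match(code)
    if match is None:
        return code  # assumption: not cut-add, so an explicit lemma
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        # Replace the first cut1 characters with add1.
        form = add1 + form[int(cut1):]
    # Replace the last cut2 characters with add2.
    stem = form[:max(0, len(form) - int(cut2))]
    return stem + add2


def parse_entry(line):
    """Split '<form>;<lemma>,<descr>[;...]' and resolve each lemma."""
    form, *readings = line.split(";")
    resolved = []
    for reading in readings:
        lemma_code, descr = reading.split(",", 1)
        resolved.append((apply_cut_add(form, lemma_code), descr))
    return form, resolved


print(apply_cut_add("kocie", "3t"))            # kot
print(parse_entry("kota;1,N/GaNsCg;1,N/GaNsCa"))
```

Applied to the example entries above, every form of @samp{kot} resolves to the lemma @samp{kot}, and the explicit lemma @samp{dobry} is returned unchanged.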
957
958
959	The mandatory file name extension for a text dictionary is @code{dic}. For large
960	dictionaries it is preferable, however, to compile them into binary
961	(fsa) format.
962
963	@subsubheading Binary format
964
965	The mandatory file name extension for a binary dictionary is @code{bin}. To
966	compile a text dictionary into binary format, write:
967
968	@example
969	compiledic <dictionaryname>.dic
970	@end example
971
972	@subsubheading Polex/PMDBF dictionary
973
974	A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
975	the distribution as the default dictionary of @command{lem}. It is
976	located by default in:
977
978	@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}
979
980	in local installation or in
981
982	@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}
983
984	in system installation.
985
986	@node lem hints
987	@subsection Hints
988
989	@subsubheading Combining data from multiple dictionaries
990
991	@itemize
992
993	@item Apply <dict1>, then apply <dict2> to words which were not annotated.
994
995	@example
996	lem -d <dict1> | lem -S lem -d <dict2>
997	@end example
998
999	@item Add annotations from two dictionaries <dict1> and <dict2>.
1000
1001	@example
1002	lem -c -d <dict1> | lem -S lem -d <dict2>
1003	@end example
1004
1005	@end itemize
1006
1007
1008	@c ---------------------------------------------------------------------
1009	@c GUE
1010	@c ---------------------------------------------------------------------
1011
1012	@page
1013	@node gue
1014	@section gue - morphological guesser
1015
1016	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1017
1018	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
1019	@item @strong{Component category:} @tab filter
1020
1021	@end multitable
1022
1023	@menu
1024	* gue description::
1025	* gue command line options::
1026	* gue example::
1027	* gue dictionaries::
1028	@end menu
1029
1030
1031	@node gue description
1032	@subsection Description
1033
1034	@command{gue} guesses morphological descriptions of the form contained
1035	in the @var{form} field.
1036
1037
1038	@node gue command line options
1039	@subsection Command line options
1040
1041	@table @code
1042
1043	@parhelp
1044	@parversion
1045	@parinteractive
1046	@c @parfile
1047	@c @paroutput
1048	@c @parfail
1049	@c @parcopy
1050	@parinputfield
1051	@paroutputfield
1052	@pardictionary
1053	@parprocess
1054	@parselect
1055	@parunselect
1056	@paroneline
1057	@paronefield
1058
1059	@item @b{@minus{}@minus{}delta=@var{n}}
1060	Stop displaying answers after a drop in weight, that is, when the weight difference between two subsequent results is greater than the delta value (default=`0.2').
1061
1062
1063	@item @b{@minus{}@minus{}cut-off=@var{n}}
1064	Do not display answers with a weight lower than the cut-off value (default=`200').
1065
1066
1067	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
1068	Guess up to n descriptions (default=`0', which means 'display all results').
1069
1070
1071
1072	@end table
1073
1074	@node gue example
1075	@subsection Example
1076
1077	@example
1078	command: gue -n 2
1079
1080	input:
1081	0000 07 W smerfny
1082
1083	output:
1084	0000 07 W smerfny gue:,ADJ/CaDpGiNs
1085	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
1086	@end example
1087
1088
1089	@node gue dictionaries
1090	@subsection Dictionaries
1091
1092	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
1093	The fsa format is created by compiling text-format dictionaries.
1094
1095
1096
1097	@subsubheading Text format
1098
1099	Dictionary entries have the following structure:
1100
1101	@example
1102	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
1103	@end example
1104
1105	@var{lemma} must be given in the cut-add format:
1106
1107	@example
1108	@code{[<cut1><add1>-]<cut2><add2>}
1109	@end example
1110	(no spaces in between): replace the prefix of length @var{cut1} with the
1111	string @var{add1}, and replace the suffix of length @var{cut2} with the string
1112	@var{add2}.
1113
1114
1115	Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}.
1116
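The transformation can be checked by hand. The lines below are only an
illustrative sketch using standard @command{sed}, not the code used by
@command{gue}; they apply the cut-add string @code{3-4ały} to
@i{najbielsi}. The variable name @code{lemma} is chosen freely.

@example
# cut1=3, add1 empty: remove the first three characters ("naj")
# cut2=4, add2="ały": replace the last four characters with "ały"
lemma=$(printf '%s\n' najbielsi | sed -e 's/^...//' -e 's/....$/ały/')
printf '%s\n' "$lemma"
@end example

The last command prints @i{biały}.
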
1117
1118	@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
1119
1120	@var{weight} is an integer value between 1 and 999 indicating the
1121	likelihood of the guess.
1122
1123	@c @example
1124	@c *ÅkÄ;1a,N/GfNsCa
1125	@c naj*elszy;3-4ały,ADJ/...:...
1126	@c @end example
1127
1128
1129	@c ---------------------------------------------------------------------
1130	@c COR
1131	@c ---------------------------------------------------------------------
1132
1133	@page
1134	@node cor
1135	@section cor - spelling corrector
1136
1137	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1138	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
1139	@item @strong{Component category:} @tab filter
1140	@item @strong{Input format:} @tab UTT regular
1141	@item @strong{Output format:} @tab UTT regular
1142	@item @strong{Required annotation:} @tab tok
1143	@end multitable
1144
1145	@menu
1146	* cor description::
1147	* cor command line options::
1148	* cor dictionaries::
1149	@end menu
1150
1151
1152	@node cor description
1153	@subsection Description
1154
1155	The spelling corrector applies Kemal Oflazer's dynamic programming
1156	algorithm @cite{oflazer96} to the FSA representation of the set of
1157	word forms of the Polex/PMDBF dictionary. Given an incorrect
1158	word form, it returns all dictionary word forms whose edit distance
1159	from it does not exceed the maximum set with @option{--distance}.
1160
1161
1162	@node cor command line options
1163	@subsection Command line options
1164
1165	@table @code
1166
1167	@parhelp
1168	@parversion
1169	@parinteractive
1170	@c @parfile
1171	@c @paroutput
1172	@c @parfail
1173	@c @parcopy
1174	@parinputfield
1175	@paroutputfield
1176	@pardictionary
1177	@parprocess
1178	@parselect
1179	@parunselect
1180	@paroneline
1181	@paronefield
1182
1183	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1184	Maximum edit distance (default='1').
1185
1186	@c @item @b{@minus{}@minus{}replace, @minus{}r}
1187	@c Replace original form with corrected form, place original form in the
1188	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1189
1190
1191	@end table
1192
1193	@node cor dictionaries
1194	@subsection Dictionaries
1195
1196	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
1197	The fsa format is created by compiling text-format dictionaries.
1198
1199	@subsubheading Text format
1200
1201	The @command{cor} dictionary is a list of words:
1202	@example
1203	odlot
1204	odlotowy
1205	odludek
1206	@end example
1207
1208	@subsubheading Binary format
1209
1210	The mandatory file name extension for a binary dictionary is @code{bin}. To
1211	compile a text dictionary into binary format, write:
1212
1213	@example
1214	compiledic <dictionaryname>.dic
1215	@end example
1216
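Putting the two formats together, the following sketch builds a small
dictionary and uses it. The file name @file{mywords.dic} and the
misspelled input form @i{odlut} are made up for illustration, and the
@option{-d} option is assumed to select the dictionary in the same way
as it does for @command{lem}. The commands that need UTT are skipped
when @command{compiledic} is not installed.

@example
# text format: one word per line
printf '%s\n' odlot odlotowy odludek > mywords.dic

if command -v compiledic >/dev/null 2>&1; then
    compiledic mywords.dic     # writes mywords.bin
    printf '0000 05 W odlut\n' | cor -d mywords.bin -n 1
fi
@end example
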
1217	@c ---------------------------------------------------------------------
1218	@c KOR
1219	@c ---------------------------------------------------------------------
1220
1221	@page
1222	@node kor
1223	@section kor - configurable spelling corrector
1224
1225	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1226	@item @strong{Authors:} @tab Paweł Werenski, Tomasz Obrębski, Michał Stolarski
1227	@item @strong{Component category:} @tab filter
1228	@item @strong{Input format:} @tab UTT regular
1229	@item @strong{Output format:} @tab UTT regular
1230	@item @strong{Required annotation:} @tab tok
1231	@end multitable
1232
1233	@menu
1234	* kor description::
1235	* kor command line options::
1236	* kor weights definition file::
1237	* kor dictionaries::
1238	@end menu
1239
1240
1241	@node kor description
1242	@subsection Description
1243
1244	The spelling corrector applies Paweł Werenski's dynamic programming
1245	algorithm to the FSA representation of the set of word forms of the
1246	Polex/PMDBF dictionary. The algorithm is an extension of K. Oflazer's
1247	algorithm used by @command{cor}. In the extended version, weights
1248	can be assigned to individual edit operations.
1249
1250	Given an incorrect word form, it returns all word forms present in the
1251	dictionary whose edit distance from it does not exceed the
1252	maximum set with the @option{--distance} option.
1253
1254
1255	@node kor command line options
1256	@subsection Command line options
1257
1258	@table @code
1259
1260	@parhelp
1261	@parversion
1262	@parinteractive
1263	@c @parfile
1264	@c @paroutput
1265	@c @parfail
1266	@c @parcopy
1267	@parinputfield
1268	@paroutputfield
1269	@pardictionary
1270	@parprocess
1271	@parselect
1272	@parunselect
1273	@paroneline
1274	@paronefield
1275
1276	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
1277	Maximum edit distance (default='1').
1278
1279	@item @b{@minus{}@minus{}weights=@var{filename}, @minus{}w @var{filename}}
1280	Edit operations' weights file.
1281
1282	@c @item @b{@minus{}@minus{}replace, @minus{}r}
1283	@c Replace original form with corrected form, place original form in the
1284	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
1285
1286
1287	@end table
1288
1289
1290	@node kor weights definition file
1291	@subsection Weights definition file
1292
1293	Example:
1294
1295	@example
1296
1297	%stdcor 1
1298	%xchg 1
1299	ż rz 0.5
1300	ch h 0.5
1301	u ó 0.5
1302
1303	@end example
1304
1305
1306	In this example the default weight is set to 1 (@code{%stdcor 1}), the weight of the exchange
1307	operation is set to 1 (@code{%xchg 1}), and the three principal orthographic
1308	errors are assigned the weight 0.5.
1309
1310	The edit operation weight declaration, such as
1311
1312	@example
1313	ż rz 0.5
1314	@end example
1315
1316	works in both directions, i.e. ż->rz and rz->ż.
1317
1318	The default weights definition file for @code{kor} is:
1319
1320	@example
1321	$HOME/.local/share/utt/weights.kor
1322	@end example
1323
1324	or, if the above mentioned file is absent:
1325
1326	@example
1327	/usr/local/share/utt/weights.kor
1328	@end example
1329
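A different weights file can be selected with the @option{--weights}
option. In the sketch below the file name @file{my-weights.kor} and the
input form @i{żeka} (a misspelling of @i{rzeka}) are made up for
illustration. The @command{kor} call is skipped when the program is not
installed.

@example
# same content as the example weights file shown earlier in this section
printf '%s\n' '%stdcor 1' '%xchg 1' 'ż rz 0.5' 'ch h 0.5' 'u ó 0.5' \
    > my-weights.kor

if command -v kor >/dev/null 2>&1; then
    printf '0000 04 W żeka\n' | kor --weights=my-weights.kor -n 1
fi
@end example
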
1330
1331	@node kor dictionaries
1332	@subsection Dictionaries
1333
1334	The same as for @command{cor} (@pxref{cor dictionaries}).
1335
1336	@c ---------------------------------------------------------------------
1337	@c SEN
1338	@c ---------------------------------------------------------------------
1339
1340	@page
1341	@node sen
1342	@section sen - a sentencizer
1343
1344	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1345
1346	@item @strong{Authors:} @tab Tomasz Obrębski
1347	@item @strong{Component category:} @tab filter
1348	@item @strong{Input format:} @tab UTT regular
1349	@item @strong{Output format:} @tab UTT regular
1350	@item @strong{Required annotation:} @tab tok
1351
1352	@end multitable
1353
1354
1355	@menu
1356	* sen description::
1357	@c * sen input::
1358	@c * sen output::
1359	* sen example::
1360	@end menu
1361
1362	@node sen description
1363	@subsection Description
1364
1365	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
1366
1367	@node sen example
1368	@subsection Example
1369
1370	@example
1371	command: sen
1372
1373	input:
1374	0000 05 W Cześć
1375	0005 01 P !
1376	0006 01 S _
1377	0007 02 W To
1378	0009 01 S _
1379	0010 02 W ja
1380	0012 01 P .
1381	0013 01 S \n
1382
1383	output:
1384	0000 00 BOS *
1385	0000 05 W Cześć
1386	0005 01 P !
1387	0006 00 EOS *
1388	0006 00 BOS *
1389	0006 01 S _
1390	0007 02 W To
1391	0009 01 S _
1392	0010 02 W ja
1393	0012 01 P .
1394	0013 01 S \n
1395	0014 00 EOS *
1396	@end example
1397
1398
1399	@c ---------------------------------------------------------------------
1400	@c GPH
1401	@c ---------------------------------------------------------------------
1402
1403	@c @node gph - graphizer
1404	@c @chapter gph - graphizer
1405
1406	@c Authors: Tomasz Obrębski
1407
1408
1409
1410	@c ---------------------------------------------------------------------
1411	@c SER
1412	@c ---------------------------------------------------------------------
1413
1414	@page
1415	@node ser
1416	@section ser - pattern search tool
1417
1418	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1419	@item @strong{Authors:} @tab Tomasz Obrębski
1420	@item @strong{Component category:} @tab filter
1421	@item @strong{Input format:} @tab UTT regular
1422	@item @strong{Output format:} @tab UTT regular
1423	@item @strong{Required annotation:} @tab tok, lem --one-field
1424	@end multitable
1425
1426	@menu
1427	* ser description::
1428	* ser command line options::
1429	* ser pattern::
1430	* ser how ser works::
1431	* ser customization::
1432	* ser limitations::
1433	* ser requirements::
1434	@end menu
1435
1436
1437	@node ser description
1438	@subsection Description
1439
1440	@command{ser} looks for patterns in UTT-formatted texts.
1441
1442
1443	@c ---------------------------------------------------------------------
1444	@node ser command line options
1445	@subsection Command line options
1446
1447	@table @code
1448
1449	@parhelp
1450	@parversion
1451	@c @parfile
1452	@c @paroutput
1453	@c @parinputfield
1454	@c @paroutputfield
1455	@parprocess
1456	@parinteractive
1457
1458	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1459	The search pattern.
1460
1461	@item @b{@minus{}@minus{}morph=@var{field}}
1462	The name of the annotation field containing the morphological
1463	description (default @code{lem}).
1464
1465	@item @b{@minus{}@minus{}flex}
1466	Only print the generated flex source code.
1467
1468	@item @b{@minus{}@minus{}macro=@var{filename}}
1469	Read macro definitions from file @var{filename} rather than from the
1470	default location. This option makes it possible to redefine the set of terms.
1471
1472	@item @b{@minus{}@minus{}define=@var{filename}}
1473	Append macro definitions from file @var{filename}. This option
1474	makes it possible to extend the set of terms.
1475
1476	@end table
1477
1478
1479	@c ---------------------------------------------------------------------
1480	@node ser pattern
1481	@subsection Pattern
1482
1483	The @command{ser} pattern is a regular expression over terms corresponding
1484	to text segments or segment sequences. Predefined terms are:
1485
1486	@table @code
1487
1488	@item seg(@var{t},@var{f},@var{a})
1489	a segment of type @var{t}, containing form @var{f} and annotation
1490	@var{a}
1491
1492	@item form(@var{f})
1493	a segment containing form @var{f}
1494
1495	@item field(@var{f})
1496	a segment containing annotation field @var{f}
1497
1498	@item space(@var{f})
1499	a space segment of form @var{f}
1500
1501	@item word(@var{f})
1502	a word segment of form @var{f}
1503
1504	@item punct(@var{f})
1505	a punct segment of form @var{f}
1506
1507	@item number(@var{f})
1508	a number segment of form @var{f}
1509
1510	@item lexeme(@var{f})
1511	a word segment with lemma @var{f}
1512
1513	@item cat(@var{c})
1514	a word segment of category @var{c}
1515
1516	@end table
1517
1518	All arguments are optional. If an argument is omitted, an arbitrary
1519	string of non-blank characters is assumed as the argument value. Term
1520	arguments may be arbitrary character-level regular expressions. The
1521	following special symbols can be used:
1522
1523	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1524	@item @code{[@dots{}]} @tab a character class
1525	@item @code{[^@dots{}]} @tab a negated character class
1526	@item @code{|} @tab alternative
1527	@item @code{*} @tab repetition, including zero times
1528	@item @code{+} @tab repetition, at least one time
1529	@item @code{?} @tab optionality
1530	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
1531	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
1532	@item @code{@{@var{m}@}} @tab repetition @var{m} times
1533	@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
1534	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
1535	@item @code{( )} @tab parentheses, used to override precedence
1536	@c @end multitable
1537
1538	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1539	@item @code{.} @tab a non-blank character
1540	@item @code{\w} @tab a letter
1541	@item @code{\W} @tab a non-blank character other than a letter
1542	@item @code{\d} @tab a digit
1543	@item @code{\D} @tab a non-blank character other than a digit
1544	@item @code{\s} @tab a space or tab character
1545	@item @code{\S} @tab a non-blank character (the same as @code{.})
1546	@item @code{\l} @tab a lowercase letter
1547	@item @code{\L} @tab an uppercase letter
1548	@end multitable
1549
1550
1551	@noindent The following characters:
1552	@example
1553	@verb{% [ ] ^ | * + ? { } , . < > \ %}
1554	@end example
1555	must be escaped with a backslash, i.e. written as:
1556	@example
1557	@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
1558	@end example
1559
1560	@quotation Note
1561	The special symbols are borrowed from Perl with minor modifications
1562	made for convenience, so the meaning of certain special characters
1563	and sequences differs slightly from their common meaning.
1564	In particular, the @code{.} special character matches only non-blank
1565	characters, because spaces have a special function in UTT files
1566	(they are field separators). Use @code{\s} to explicitly match a
1567	space or tab character.
1568	@end quotation
1569
1570	In the argument of the @code{cat} term a special operator @code{<...>} may be
1571	used. A category specification enclosed in angle brackets matches all
1572	category descriptions which are consistent (non-contradictory) with the
1573	specification. For example, @code{<N>} matches all noun descriptions, and
1574	@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
1575
1576
1577	@*
1578	@noindent @b{Examples of one-segment patterns:}
1579
1580	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1581	@item @code{seg} @tab any segment
1582	@item @code{word} @tab any word-form
1583	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
1584	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
1585	@item @code{word(\L\l+)} @tab a capitalized word-form
1586	@item @code{punct} @tab a punctuation character
1587	@item @code{space(.\\n.)} @tab a space segment containing a newline character
1588	@item @code{lexeme(pomoc)} @tab any form of the lexeme 'pomoc'
1589	@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
1590	@item @code{cat(<N/Ca>)} @tab a word whose category matches @code{N/Ca}
1591	@end multitable
1592
1593	@*
1594	@noindent @b{Examples of multi-segment patterns:}
1595
1596	@table @code
1597
1598	@item (word(\L) punct(\.) space?)+ word(\L\l+)
1599	a sequence of initials followed by a surname
1600
1601	@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
1602	a text fragment between two punctuation characters, containing an
1603	occurrence of a relative pronoun
1604
1605	@end table
1606
1607
1608	@node ser how ser works
1609	@subsection How ser works
1610
1611	@node ser customization
1612	@subsection Customization
1613
1614	@c All predefined terms correspond to single segments,
1615
1616	@example
1617	define(`verbseq', `(cat(<V>) (space cat(<V>)))')
1618	@end example
1619
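Such a definition can be kept in a separate file and loaded with the
@option{--define} option. The sketch below assumes that a macro defined
in this way can be used in a pattern like a predefined term. The file
names @file{verbseq.m4} and @file{corpus.txt} are placeholders, and the
search is skipped when @command{ser} or the corpus file is missing.

@example
cat > verbseq.m4 <<'EOF'
define(`verbseq', `(cat(<V>) (space cat(<V>)))')
EOF

if command -v ser >/dev/null 2>&1 && [ -f corpus.txt ]; then
    cat corpus.txt | tok | lem -1 | ser --define=verbseq.m4 -e 'verbseq'
fi
@end example
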
1620
1621	the term @code{cat()} may not be used as a ... of
1622
1623	@c See @command{m4} manual for further details on macro definition format.
1624
1625	@node ser limitations
1626	@subsection Limitations
1627
1628	Do not use more than three attributes inside the angle brackets (@code{<...>}) of a category specification.
1629
1630	@node ser requirements
1631	@subsection Requirements
1632
1633	In order to run @command{ser}, the following programs must be
1634	installed in the system:
1635
1636	@itemize
1637
1638	@item @command{m4}
1639	@item @command{grep}
1640	@item @command{flex}
1641	@item @command{gcc}
1642
1643	@end itemize
1644
1645
1646	@c ---------------------------------------------------------------------
1647	@c GRP
1648	@c ---------------------------------------------------------------------
1649
1650	@page
1651	@node grp
1652	@section grp - pattern search tool
1653
1654	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1655	@item @strong{Authors:} @tab Tomasz Obrębski
1656	@item @strong{Component category:} @tab filter
1657	@item @strong{Input format:} @tab UTT flattened
1658	@item @strong{Output format:} @tab UTT flattened
1659	@item @strong{Required annotation:} @tab tok, sen, lem --one-field
1660	@end multitable
1661
1662
1663	@menu
1664	* grp description::
1665	* grp command line options::
1666	* grp pattern::
1667	* grp hints::
1668	@end menu
1669
1670
1671	@node grp description
1672	@subsection Description
1673
1674	@code{grp} selects sentences containing an expression matching a
1675	pattern. The pattern format is exactly the same as that accepted by
1676	@code{ser}.
1677
1678	@code{grp} is intended mainly for speeding up the corpus search process.
1679	It is extremely fast (processing speed is usually higher than the speed
1680	of reading the corpus file from disk).
1681
1682	@node grp command line options
1683	@subsection Command line options
1684
1685	@table @code
1686
1687	@parhelp
1688	@parversion
1689	@parprocess
1690	@parinteractive
1691
1692	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1693	The search pattern.
1694
1695	@item @b{@minus{}@minus{}morph=@var{field}}
1696	The name of the annotation field containing the morphological
1697	description (default @code{lem}).
1698
1699	@item @b{@minus{}@minus{}command}
1700	Only print the generated flex source code.
1701
1702	@item @b{@minus{}@minus{}macro=@var{filename}}
1703	Read macro definitions from file @var{filename} rather than from the
1704	default location. This option makes it possible to redefine the set of terms.
1705
1706	@item @b{@minus{}@minus{}define=@var{filename}}
1707	Append macro definitions from file @var{filename}. This option
1708	makes it possible to extend the set of terms.
1709
1710	@end table
1711
1712
1713	@node grp pattern
1714	@subsection Pattern
1715
1716	The pattern format is the same as for @command{ser} (@pxref{ser pattern}).
1717
1718	@node grp hints
1719	@subsection Hints
1720
1721	The corpus search speed may be increased by combining @command{grp} with the @command{lzop}
1722	compression tool (@command{grp} usually processes data faster than it can be read from
1723	disk, especially on slow laptop drives).
1724
1725	@example
1726	cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
1727	@end example
1728
1729	@example
1730	lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
1731	@end example
1732
1733
1734
1735	@c ---------------------------------------------------------------------
1736	@c MAR
1737	@c ---------------------------------------------------------------------
1738
1739	@page
1740	@node mar
1741	@section mar
1742
1743	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1744	@item @strong{Authors:} @tab Marcin Walas, Tomasz Obrębski
1745	@item @strong{Input format:} @tab UTT flattened
1746	@item @strong{Output format:} @tab UTT flattened
1747	@item @strong{Required annotation:} @tab tok, sen, lem -1
1748	@end multitable
1749
1750	@subsection Description
1751	@code{mar} is a Perl script that matches a given pattern against UTT-formatted text
1752	and marks the matching parts with any number of user-defined tags.
1753
1754	@subsection Command line options
1755	@table @code
1756	@parhelp
1757	@parversion
1758
1759	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1760	The search pattern.
1761	@item @b{@minus{}@minus{}action=@var{action}, @minus{}a @var{action} [p] [s] [P]}
1762	Perform only the indicated actions, where:
1763	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1764	@item @code{p} @tab preprocess
1765	@item @code{s} @tab search
1766	@item @code{P} @tab postprocess
1767	@end multitable
1768	default: psP
1769
1770	@item @b{@minus{}@minus{}command}
1771	print generated sed command, then exit
1772
1773	@item @b{@minus{}@minus{}help, @minus{}h}
1774	print help, then exit
1775
1776	@item @b{@minus{}@minus{}version, @minus{}v}
1777	print version, then exit
1778	@end table
1779	@subsection Tokens in pattern
1780	A @code{mar} pattern is a @code{ser} pattern (@pxref{ser pattern})
1781	to which any number of matching tags can be added; each tag is printed in exactly the place where
1782	it was put in the pattern. A valid tag token starts with @@ followed by any number of alphanumeric
1783	characters. For example, valid match tokens are: @@STARTMATCH @@ENDMATCH
1784
1785	Matching tokens can be placed between, before or after any of the @code{ser} pattern terms. They do not have
1786	to be paired. There can be any number of them in the pattern (zero or more). They do not have to be unique.
1787	They can be placed one after another. For example:
1788
1789	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaa}
1790	@item @code{@@BOM lexeme(pomoc)} @tab place tag @b{BOM} before any form of the lexeme 'pomoc'
1791	@item @code{@@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc'
1792	@item @code{cat(<ADJ>) @@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc' which is preceded by an adjective
1793	@item @code{cat(<ADJ>) @@TAG @@BOM lexeme(pomoc) @@EOM} @tab place tags @b{TAG} and @b{BOM} before any form of the lexeme 'pomoc' which is preceded by an adjective, and tag @b{EOM} after it
1794	@end multitable
1795
1796	(Run @code{mar -h} for more information.)
1797
1798	@subsection How mar works
1799	@code{mar} uses the @code{m4} macro processor to translate the given @code{ser} pattern into a regular expression, turns that expression into a @code{sed} command script, and then executes the script.
1800
1801	You can see the translated @code{sed} script by using the @code{@minus{}@minus{}command} option.
1802	@subsection Limitations
1803	The complexity of computations performed by @code{mar} increases linearly with the number of placed tokens, so it is highly recommended not to place too many tokens.
1804	@subsection Requirements
1805	In order to run @code{mar}, the following programs must be installed in the system:
1806
1807	@itemize
1808
1809	@item @command{m4}
1810	@item @command{grep}
1811	@item @command{sed}
1812
1813	@end itemize
1814
1815
1816
1817	@c ---------------------------------------------------------------------
1818	@c KOT
1819	@c ---------------------------------------------------------------------
1820
1821	@page
1822	@node kot
1823	@section kot - untokenizer
1824
1825	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1826	@item @strong{Authors:} @tab Tomasz Obrębski
1827	@item @strong{Component category:} @tab filter
1828	@item @strong{Input format:} @tab UTT regular
1829	@item @strong{Output format:} @tab text
1830	@item @strong{Required annotation:} @tab tok
1831	@end multitable
1832
1833
1834	@menu
1835	* kot description::
1836	* kot command line options::
1837	* kot usage examples::
1838	@end menu
1839
1840	@node kot description
1841	@subsection Description
1842
1843	@command{kot} transforms a UTT formatted file back into raw text format.
1844
1845	@node kot command line options
1846	@subsection Command line options
1847
1848	@table @code
1849
1850	@parhelp
1851
1852	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1853
1854	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1855
1856	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1857
1858	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1859
1860	@c @item @b{@minus{}@minus{}config=@var{filename}}
1861
1862	@item
1863
1864	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
1865	print @var{string} between nonadjacent segments of the input file
1866
1867	@item @b{@minus{}@minus{}spaces, @minus{}r}
1868	retain the special characters @code{_}, @code{\t},
1869	@code{\n}, @code{\r}, @code{\f} unexpanded in the output
1870
1871	@end table
1872
1873	@node kot usage examples
1874	@subsection Usage examples
1875
1876	@example
1877	cat legia.txt | tok | kot
1878	@end example
1879
1880	@example
1881	cat legia.txt | tok | lem -1 | kot
1882	@end example
1883
1884	@c ---------------------------------------------------------------
1885	@c CON
1886	@c ---------------------------------------------------------------
1887
1888
1889	@page
1890	@node con
1891	@section con - concordance table generator
1892
1893	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
1894	@item @strong{Authors:} @tab Justyna Walkowska
1895	@item @strong{Component category:} @tab sink
1896	@item @strong{Input format:} @tab UTT regular
1897	@item @strong{Output format:} @tab text
1898	@item @strong{Required annotation:} @tab ser or mar
1899	@end multitable
1900	@c
1901
1902	@menu
1903	* con description::
1904	* con command line options::
1905	* con usage example::
1906	* con hints::
1907	@end menu
1908
1909
1910	@node con description
1911	@subsection Description
1912
1913	@command{con} generates a concordance table based on a pattern given to @command{ser}.
1914
1915
1916	@node con command line options
1917	@subsection Command line options
1918
1919	@table @code
1920
1921	@parhelp
1922
1923	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
1924	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
1925	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
1926	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
1927	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
1928	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
1929	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
1930	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
1931	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
1932	@c @item @b{@minus{}@minus{}interactive @minus{}i}
1933	@c @item @b{@minus{}@minus{}config=@var{filename}}
1934	@c @item
1935	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
1936	@c search pattern
1937	@c
1938	@c @item @b{@minus{}@minus{}flex}
1939	@c only print the generated flex source code
1940	@c
1941	@c @item @b{@minus{}@minus{}macro=@var{filename}}
1942	@c read macrodefinitions from file @var{filename} rather than from
1943	@c default location. This option allows to redefine the set of terms.
1944	@c
1945	@c @item @b{@minus{}@minus{}define=@var{filename}}
1946	@c append macrodefinitions from file @var{filename}. This option
1947	@c allows to extend the set of terms.
1948
1949	@item @b{@minus{}@minus{}left @minus{}l}
1950	Left context info (default='30c'). Example:
1951	@example
1952	-l=5c: left context is 5 characters
1953	-l=5w: left context is 5 words
1954	-l=5s: left context is 5 non-empty input lines
1955	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
1956	@end example
1957
1958	@item @b{@minus{}@minus{}right @minus{}r}
1959	Right context info (default='30c').
1960	@item @b{@minus{}@minus{}trim @minus{}t}
1961	Clear incomplete words from output.
1962	@item @b{@minus{}@minus{}white @minus{}w}
1963	Do not convert whitespace characters into spaces.
1964	@item @b{@minus{}@minus{}column @minus{}c}
1965	Left column minimal width in characters (default = 0).
1966	@item @b{@minus{}@minus{}ignore @minus{}i}
1967	Ignore segment inconsistency in the input.
1968	@item @b{@minus{}@minus{}bom}
1969	Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
1970	@item @b{@minus{}@minus{}eom}
1971	End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
1972	@item @b{@minus{}@minus{}bod}
1973	Selected segment beginning display string (default='[').
1974	@item @b{@minus{}@minus{}eod}
1975	Selected segment end display string (default=']').
1976
1977
1978
1979	@end table
1980
1981	@node con usage example
1982	@subsection Usage example
1983	@example
1984	cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
1985	@end example
1986
1987
1988	@node con hints
1989	@subsection Hints
1990
1991	@command{con} is a rather slow program. Do not pass large amounts of
1992	redundant text through this program. @command{con} works fine in the following
1993	sequence:
1994
1995	@example
1996	... | grp -e EXPR | ser -e EXPR | con
1997	@end example
1998
1999
2000	@c ---------------------------------------------------------------------
2001	@c ---------------------------------------------------------------------
2002
2003	@page
2004	@node Auxiliary tools
2005	@chapter Auxiliary tools
2006
2007	@menu
2008	* compiledic:: dictionary compiler
2009	* fla:: UTT file flattener
2010	* unfla:: UTT file unflattener
2011	@end menu
2012
2013
2014	@page
2015	@node compiledic
2016	@section compiledic - the dictionary compiler
2017
2018	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2019	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
2020	@item @strong{Component category:} @tab additional tool
2021	@end multitable
2022	@c
2023
2024	@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
2025	(FSA) format (@code{.bin} extension).
2026
2027	Automaton representation of a dictionary is built using the AT&T tools:
2028	@itemize
2029	@item AT&T FSM Library,
2030	@item AT&T Lextools.
2031	@end itemize
2032
2033	In order for the compiledic program to work you have to install the
2034	above mentioned packages into your system. They are freely available
2035	for non-commercial use.
2036
2037	Usage:
2038	@example
2039	compiledic <dictionaryname>.dic
2040	@end example
2041
2042	The file <dictionaryname>.bin will be generated.
2043
2044	Note: The program produces a lot of temporary files which are
2045	stored in the current directory. They are deleted after successful
2046	termination of the program.
2047
2048	@c @menu
2049	@c * con command line options::
2050	@c * con usage example::
2051	@c * con hints::
2052	@c @end menu
2053
2054
2055	@c -------------------------------------------------------------------------------
2056	@c FLA
2057	@c -------------------------------------------------------------------------------
2058
2059	@page
2060	@node fla
2061	@section fla - the UTT file flattener
2062
2063	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2064	@item @strong{Authors:} @tab Tomasz Obrębski
2065	@item @strong{Input format:} @tab UTT regular
2066	@item @strong{Output format:} @tab UTT flattened
2067	@item @strong{Required annotation:} @tab sen
2068	@end multitable
2069	@c
2070
2071	@menu
2072	* fla description::
2073	@c * fla command line options::
2074	@c * fla usage example::
2075	@end menu
2076
2077
2078	@node fla description
2079	@subsection Description
2080
2081	@command{fla} ``flattens'' a UTT file by merging the segments belonging
2082	to one sentence into one line. Technically, the end-of-line characters
2083	('\n', ASCII code 10) that separate segments of one sentence are replaced
2084	with form-feed characters ('\f', ASCII code 12). Flattening makes it
2085	possible to process UTT files sentence by sentence with tools such as
2086	@command{grep} or @command{sed} (flattening is used in @command{grp} and @command{mar}).
2087
2088	Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.
2089
2090	Flattened files are still human-readable.
2091
2092	Usage:
2093
2094	@example
2095	fla [<bosregex>]
2096	@end example
2097
2098	The optional argument is a regular expression describing segments
2099	which should be treated as sentence beginnings (a segment qualifies
2100	if it contains a fragment matching @code{<bosregex>}). By
2101	default, segments containing a @code{BOS} field are treated as sentence beginnings.
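
The flattening step can be illustrated with a short Python sketch. It is not the actual @command{fla} implementation: the function name @code{flatten}, the sample segments and their simplified field layout are assumptions made for illustration only. The sketch starts a new output line whenever a segment matches the sentence-beginning expression and joins the remaining segments with form-feed characters.

```python
import re

def flatten(lines, bos_regex="BOS"):
    """Join the segments of each sentence into one line using form feeds.

    Assumption: `lines` holds one segment per element, without the trailing
    newline, and a segment opens a new sentence when it contains a fragment
    matching `bos_regex`.
    """
    bos = re.compile(bos_regex)
    sentences = []
    for segment in lines:
        if bos.search(segment) or not sentences:
            sentences.append([segment])      # start a new sentence
        else:
            sentences[-1].append(segment)    # continue the current sentence
    return ["\f".join(sentence) for sentence in sentences]

# Hypothetical segments; the field layout is simplified for the example.
segments = ["0 0 BOS *", "0 3 W Ala", "3 1 P .", "4 0 BOS *", "4 3 W Ola"]
flat = flatten(segments)
```

Each element of @code{flat} then holds one whole sentence, so a line-oriented tool sees a sentence where it previously saw a single segment.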
2102
2103	@c -------------------------------------------------------------------------------
2104	@c UNFLA
2105	@c -------------------------------------------------------------------------------
2106
2107	@page
2108	@node unfla
2109	@section unfla - the UTT file unflattener
2110
2111	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2112	@item @strong{Authors:} @tab Tomasz Obrębski
2113	@item @strong{Input format:} @tab UTT flattened
2114	@item @strong{Output format:} @tab UTT regular
2115	@item @strong{Required annotation:} @tab -
2116	@end multitable
2117
2118	@menu
2119	* unfla description::
2120	@c * fla command line options::
2121	@c * fla usage example::
2122	@end menu
2123
2124	@node unfla description
2125	@subsection Description
2126	@command{unfla} transforms a flattened UTT file, produced by
2127	@command{fla}, into the regular format by restoring end-of-line
2128	characters.
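
As a rough illustration, the inverse operation can be sketched in Python as follows. This is not the actual @command{unfla} code: the function name @code{unflatten} and the sample data are hypothetical, and the segment layout is simplified.

```python
def unflatten(flat_lines):
    """Restore one segment per line by splitting each line on form feeds.

    Assumption: `flat_lines` holds flattened lines without trailing newlines.
    """
    segments = []
    for line in flat_lines:
        segments.extend(line.split("\f"))
    return segments

# Hypothetical flattened input: two sentences of two segments each.
flat = ["0 0 BOS *\f0 3 W Ala", "3 0 BOS *\f3 3 W Ola"]
restored = unflatten(flat)
```

Splitting on the form-feed character is enough here because that character is used only as the in-sentence segment separator.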
2129
2130
2131
2132
2133	@c ---------------------------------------------------------------------
2134	@c USAGE EXAMPLES
2135	@c ---------------------------------------------------------------------
2136
2137	@node Usage examples
2138	@chapter Usage examples
2139
2140	@subsubheading Simple pipelines
2141
2142	@enumerate
2143
2144	@item tokenization
2145
2146	cat text | tok > output1
2147
2148	@item morphological annotation (1)
2149
2150	simple dictionary-based lemmatization
2151
2152	cat text | tok | lem > output1
2153
2154	@item morphological annotation (2)
2155
2156	1) perform dictionary-based lemmatization
2157	2) guess descriptions for words which have no annotation
2158
2159	@example
2160	cat text | tok | lem | gue -S lem > output2
2161	@end example
2162
2163	@item morphological annotation (3)
2164
2165	1) perform dictionary-based lemmatization
2166	2) try to correct words with no annotation
2167	3) perform dictionary-based lemmatization of corrected words
2168	4) guess descriptions for words which still have no annotation
2169
2170	@example
2171	cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
2172	@end example
2173	@item spelling correction
2174
2175
2176
2177	@example
2178	cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
2179	@end example
2180
2181	@item Expression extraction
2182
2183	Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
2184
2185	@example
2186	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
2187	@end example
2188
2189	@item A word in context
2190
2191	Extraction of text fragments containing a form of the lexeme 'rozmowa' in
2192	the context of the 5 preceding and 5 following corpus segments.
2193
2194	@example
2195	cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
2196	@end example
2197
2198	@item generation of concordance table (1)
2199
2200	@example
2201	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2202	@end example
2203
2204	Running time: 10 seconds.
2205
2206	@item generation of concordance table (2)
2207
2208	The same as above, but much faster.
2209
2210	@example
2211	cat text | tok | lem -1 | \
2212	grp -e 'cat(<V>) space lexeme(rozmowa)' | \
2213	ser -e 'cat(<V>) space lexeme(rozmowa)' | \
2214	con
2215	@end example
2216
2217	Running time: 2 seconds.
2218
2219	@item generation of concordance table (3)
2220
2221	Usually, one searches the same corpus repeatedly. In
2222	such a case it is advisable to transform the corpus data into the format
2223	required by @command{grp} first, and then use the preprocessed data.
2224
2225	As @command{grp} (@command{grep}) processes data faster than it is
2226	read from the disk drive, the search time may be shortened further by
2227	using file compression techniques. We suggest using the
2228	@command{lzop} compressor/decompressor.
2229
2230	@item the fastest way to search a large corpus
2231
2232	step 1: corpus preprocessing
2233
2234	@example
2235	cat corpus | tok | sen | lem -1 \
2236	| fla | lzop -7 > corpus.grp.lzo
2237	@end example
2238
2239	step 2: search
2240
2241	@example
2242	lzop -cd corpus.grp.lzo | unfla | grp -e 'cat(<V>) space lexeme(rozmowa)' \
2243	| ser -e 'cat(<V>) space lexeme(rozmowa)' | con
2244	@end example
2245
2246	@end enumerate
2247
2248	@c @subsubheading More complicated configurations
2249
2250
2251	@c @example
2252	@c mknod fifo1 p
2253	@c mknod fifo2 p
2254	@c mknod fifo3 p
2255	@c mknod fifo4 p
2256	@c mknod fifo5 p
2257
2258	@c tok | lem -p W -e fifo1 > fifo2 &
2259	@c cor -e fifo3 < fifo1 | lem > fifo4 &
2260	@c gue < fifo3 > fifo5 &
2261	@c sort -m fifo2 fifo4 fifo5
2262
2263	@c rm fifo?
2264	@c @end example
2265
2266
2267	@c ---------------------------------------------------------------------
2268	@c ---------------------------------------------------------------------
2269
2270	@c ---------------------------------------------------------------------
2271	@c PMDBF DICTIONARY
2272	@c ---------------------------------------------------------------------
2273
2274	@node PMDBF dictionary
2275	@chapter PMDBF dictionary
2276
2277	UTT components come with lexical data derived from the Polish
2278	Morphological Database (PMDB).
2279
2280	@menu
2281	* PMDBF files::
2282	* PMDBF tag structure::
2283	* PMDBF parts of speech::
2284	* PMDBF morphosyntactic attributes::
2285	@end menu
2286
2287	@node PMDBF files
2288	@section Files
2289
2290	@node PMDBF tag structure
2291	@section Tag structure
2292
2293	pos = [[:upper:]]+
2294
2295	attr = [[:upper:]]+
2296
2297	val = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
2298
2299	descr = pos ( / ( attr val + ) + ) ?
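
The grammar above can be turned into a small parser. The following Python sketch is illustrative only: the function name @code{parse_descr} and the sample tags are hypothetical, and the POSIX classes @code{[:upper:]}, @code{[:lower:]} and @code{[:digit:]} are approximated with ASCII ranges, which is an assumption.

```python
import re

# val: a single lower-case letter, digit or one of ?!*+-, or a <...> value.
VAL = r"(?:[a-z0-9?!*+-]|<[^>\n]+>)"
# descr = pos ( / ( attr val+ )+ )?
DESCR = re.compile("([A-Z]+)(?:/((?:[A-Z]+" + VAL + "+)+))?")
# One attribute followed by its values.
PAIR = re.compile("([A-Z]+)(" + VAL + "+)")

def parse_descr(descr):
    """Split a description into its part of speech and (attr, values) pairs."""
    match = DESCR.fullmatch(descr)
    if match is None:
        raise ValueError("not a valid description: " + repr(descr))
    pos, attrs = match.group(1), match.group(2) or ""
    pairs = [(attr, re.findall(VAL, vals)) for attr, vals in PAIR.findall(attrs)]
    return pos, pairs

# Hypothetical tag: noun, feminine, singular, nominative.
parsed = parse_descr("N/GfNsCn")
```

Because an attribute may be followed by several values, each attribute is paired with a list of values rather than a single one.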
2300
2301	@node PMDBF parts of speech
2302	@section Parts of speech
2303
2304	@multitable {ADJPRP} { adjectival-passive-participle }
2305	@item @code{N} @tab noun
2306	@item @code{NPRO} @tab nominal-pronoun
2307	@item @code{NV} @tab deverbal-noun
2308	@item @code{V} @tab verb
2309	@item @code{BYC} @tab the verb @emph{być} (to be)
2310	@item @code{VNI} @tab non-inflected-verb
2311	@item @code{ADJ} @tab adjective
2312	@item @code{ADJPAP} @tab adjectival-passive-participle
2313	@item @code{ADJPRP} @tab adjectival-present-participle
2314	@item @code{ADJPP} @tab adjectival-past-participle
2315	@item @code{ADJPRO} @tab adjectival-pronoun
2316	@item @code{ADJNUM} @tab adjectival-numeral
2317	@item @code{ADV} @tab adverb
2318	@item @code{ADVANP} @tab adverbial-anterior-participle
2319	@item @code{ADVPRP} @tab adverbial-present-participle
2320	@item @code{ADVPRO} @tab adverbial-pronoun
2321	@item @code{ADVNUM} @tab adverbial-numeral
2322	@item @code{P} @tab preposition
2323	@item @code{PPRO} @tab prep-noun-pronoun
2324	@item @code{CONJ} @tab conjunction
2325	@item @code{EXCL} @tab exclamation
2326	@item @code{APP} @tab call
2327	@item @code{ONO} @tab onomatopoeia
2328	@item @code{PART} @tab particle
2329	@item @code{NUMCRD} @tab cardinal-numeral
2330	@item @code{NUMCOL} @tab collective-numeral
2331	@item @code{NUMPAR} @tab partitive-numeral
2332	@item @code{NUMORD} @tab ordinal-numeral
2333	@end multitable
2334
2335	@node PMDBF morphosyntactic attributes
2336	@section Morphosyntactic attributes
2337
2338	@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
2339	@c @headitem Attr @tab Val @tab Description
2340	@item
2341	@code{A} @tab @tab Aspect
2342	@item
2343	@tab @code{p} @tab perfect,
2344	@item
2345	@tab @code{i} @tab imperfect.
2346	@item
2347	@item
2348	@code{V} @tab @tab Verb-Form
2349	@item
2350	@tab @code{b} @tab infinitive,
2351	@item
2352	@tab @code{p} @tab personal,
2353	@item
2354	@tab @code{i} @tab impersonal.
2355	@item
2356	@item
2357	@code{M} @tab @tab Mood
2358	@item
2359	@tab @code{d} @tab declarative,
2360	@item
2361	@tab @code{c} @tab conditional,
2362	@item
2363	@tab @code{i} @tab imperative.
2364	@item
2365	@item
2366	@code{T} @tab @tab Tense
2367	@item
2368	@tab @code{a} @tab past,
2369	@item
2370	@tab @code{r} @tab present,
2371	@item
2372	@tab @code{f} @tab future.
2373	@item
2374	@item
2375	@code{P} @tab @tab Person
2376	@item
2377	@tab @code{1} @tab 1,
2378	@item
2379	@tab @code{2} @tab 2,
2380	@item
2381	@tab @code{3} @tab 3.
2382	@item
2383	@item
2384	@code{D} @tab @tab Degree
2385	@item
2386	@tab @code{p} @tab positive,
2387	@item
2388	@tab @code{c} @tab comparative,
2389	@item
2390	@tab @code{s} @tab superlative.
2391	@item
2392	@item
2393	@code{N} @tab @tab Number
2394	@item
2395	@tab @code{s} @tab singular,
2396	@item
2397	@tab @code{p} @tab plural.
2398	@item
2399	@item
2400	@code{C} @tab @tab Case
2401	@item
2402	@tab @code{n} @tab nominative,
2403	@item
2404	@tab @code{g} @tab genitive,
2405	@item
2406	@tab @code{d} @tab dative,
2407	@item
2408	@tab @code{a} @tab accusative,
2409	@item
2410	@tab @code{i} @tab instrumental,
2411	@item
2412	@tab @code{l} @tab locative,
2413	@item
2414	@tab @code{v} @tab vocative.
2415	@item
2416	@code{G} @tab @tab Gender
2417	@item
2418	@tab @code{p} @tab masculine-personal,
2419	@item
2420	@tab @code{a} @tab masculine-animal,
2421	@item
2422	@tab @code{i} @tab masculine-inanimate,
2423	@item
2424	@tab @code{f} @tab feminine,
2425	@item
2426	@tab @code{n} @tab neuter.
2427	@end multitable
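
For programmatic use, the table can be kept as a lookup structure. The Python sketch below is only one possible encoding: the names @code{ATTRIBUTES} and @code{describe} are hypothetical, and the attribute and value names are copied from the table.

```python
# Attribute codes, attribute names and value names copied from the table.
ATTRIBUTES = {
    "A": ("Aspect", {"p": "perfect", "i": "imperfect"}),
    "V": ("Verb-Form", {"b": "infinitive", "p": "personal", "i": "impersonal"}),
    "M": ("Mood", {"d": "declarative", "c": "conditional", "i": "imperative"}),
    "T": ("Tense", {"a": "past", "r": "present", "f": "future"}),
    "P": ("Person", {"1": "1", "2": "2", "3": "3"}),
    "D": ("Degree", {"p": "positive", "c": "comparative", "s": "superlative"}),
    "N": ("Number", {"s": "singular", "p": "plural"}),
    "C": ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
                   "a": "accusative", "i": "instrumental", "l": "locative",
                   "v": "vocative"}),
    "G": ("Gender", {"p": "masculine-personal", "a": "masculine-animal",
                     "i": "masculine-inanimate", "f": "feminine",
                     "n": "neuter"}),
}

def describe(attr, val):
    """Return a readable 'Name=value' string for one attribute/value pair.

    Values missing from the table are passed through unchanged.
    """
    name, values = ATTRIBUTES[attr]
    return name + "=" + values.get(val, val)

example = describe("C", "i")
```

The same one-letter value code means different things under different attributes (for instance @code{p} under @code{A}, @code{V}, @code{D}, @code{N} and @code{G}), which is why the lookup is keyed by attribute first.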
2428
2429
2430	@c ---------------------------------------------------------------------
2431	@c ---------------------------------------------------------------------
2432	@c
2433	@c @node Examples
2434	@c @chapter Examples
2435
2436	@c ----------------------------------------------------------------------
2437	@c ----------------------------------------------------------------------
2438
2439	@node GNU Free Documentation License
2440	@chapter GNU Free Documentation License
2441
2442	@c The GNU Free Documentation License.
2443	@center Version 1.2, November 2002
2444
2445	@c This file is intended to be included within another document,
2446	@c hence no sectioning command or @node.
2447
2448	@display
2449	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
2450	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
2451
2452	Everyone is permitted to copy and distribute verbatim copies
2453	of this license document, but changing it is not allowed.
2454	@end display
2455
2456	@enumerate 0
2457	@item
2458	PREAMBLE
2459
2460	The purpose of this License is to make a manual, textbook, or other
2461	functional and useful document @dfn{free} in the sense of freedom: to
2462	assure everyone the effective freedom to copy and redistribute it,
2463	with or without modifying it, either commercially or noncommercially.
2464	Secondarily, this License preserves for the author and publisher a way
2465	to get credit for their work, while not being considered responsible
2466	for modifications made by others.
2467
2468	This License is a kind of ``copyleft'', which means that derivative
2469	works of the document must themselves be free in the same sense. It
2470	complements the GNU General Public License, which is a copyleft
2471	license designed for free software.
2472
2473	We have designed this License in order to use it for manuals for free
2474	software, because free software needs free documentation: a free
2475	program should come with manuals providing the same freedoms that the
2476	software does. But this License is not limited to software manuals;
2477	it can be used for any textual work, regardless of subject matter or
2478	whether it is published as a printed book. We recommend this License
2479	principally for works whose purpose is instruction or reference.
2480
2481	@item
2482	APPLICABILITY AND DEFINITIONS
2483
2484	This License applies to any manual or other work, in any medium, that
2485	contains a notice placed by the copyright holder saying it can be
2486	distributed under the terms of this License. Such a notice grants a
2487	world-wide, royalty-free license, unlimited in duration, to use that
2488	work under the conditions stated herein. The ``Document'', below,
2489	refers to any such manual or work. Any member of the public is a
2490	licensee, and is addressed as ``you''. You accept the license if you
2491	copy, modify or distribute the work in a way requiring permission
2492	under copyright law.
2493
2494	A ``Modified Version'' of the Document means any work containing the
2495	Document or a portion of it, either copied verbatim, or with
2496	modifications and/or translated into another language.
2497
2498	A ``Secondary Section'' is a named appendix or a front-matter section
2499	of the Document that deals exclusively with the relationship of the
2500	publishers or authors of the Document to the Document's overall
2501	subject (or to related matters) and contains nothing that could fall
2502	directly within that overall subject. (Thus, if the Document is in
2503	part a textbook of mathematics, a Secondary Section may not explain
2504	any mathematics.) The relationship could be a matter of historical
2505	connection with the subject or with related matters, or of legal,
2506	commercial, philosophical, ethical or political position regarding
2507	them.
2508
2509	The ``Invariant Sections'' are certain Secondary Sections whose titles
2510	are designated, as being those of Invariant Sections, in the notice
2511	that says that the Document is released under this License. If a
2512	section does not fit the above definition of Secondary then it is not
2513	allowed to be designated as Invariant. The Document may contain zero
2514	Invariant Sections. If the Document does not identify any Invariant
2515	Sections then there are none.
2516
2517	The ``Cover Texts'' are certain short passages of text that are listed,
2518	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
2519	the Document is released under this License. A Front-Cover Text may
2520	be at most 5 words, and a Back-Cover Text may be at most 25 words.
2521
2522	A ``Transparent'' copy of the Document means a machine-readable copy,
2523	represented in a format whose specification is available to the
2524	general public, that is suitable for revising the document
2525	straightforwardly with generic text editors or (for images composed of
2526	pixels) generic paint programs or (for drawings) some widely available
2527	drawing editor, and that is suitable for input to text formatters or
2528	for automatic translation to a variety of formats suitable for input
2529	to text formatters. A copy made in an otherwise Transparent file
2530	format whose markup, or absence of markup, has been arranged to thwart
2531	or discourage subsequent modification by readers is not Transparent.
2532	An image format is not Transparent if used for any substantial amount
2533	of text. A copy that is not ``Transparent'' is called ``Opaque''.
2534
2535	Examples of suitable formats for Transparent copies include plain
2536	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
2537	format, @acronym{SGML} or @acronym{XML} using a publicly available
2538	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
2539	PostScript or @acronym{PDF} designed for human modification. Examples
2540	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
2541	@acronym{JPG}. Opaque formats include proprietary formats that can be
2542	read and edited only by proprietary word processors, @acronym{SGML} or
2543	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
2544	not generally available, and the machine-generated @acronym{HTML},
2545	PostScript or @acronym{PDF} produced by some word processors for
2546	output purposes only.
2547
2548	The ``Title Page'' means, for a printed book, the title page itself,
2549	plus such following pages as are needed to hold, legibly, the material
2550	this License requires to appear in the title page. For works in
2551	formats which do not have any title page as such, ``Title Page'' means
2552	the text near the most prominent appearance of the work's title,
2553	preceding the beginning of the body of the text.
2554
2555	A section ``Entitled XYZ'' means a named subunit of the Document whose
2556	title either is precisely XYZ or contains XYZ in parentheses following
2557	text that translates XYZ in another language. (Here XYZ stands for a
2558	specific section name mentioned below, such as ``Acknowledgements'',
2559	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
2560	of such a section when you modify the Document means that it remains a
2561	section ``Entitled XYZ'' according to this definition.
2562
2563	The Document may include Warranty Disclaimers next to the notice which
2564	states that this License applies to the Document. These Warranty
2565	Disclaimers are considered to be included by reference in this
2566	License, but only as regards disclaiming warranties: any other
2567	implication that these Warranty Disclaimers may have is void and has
2568	no effect on the meaning of this License.
2569
2570	@item
2571	VERBATIM COPYING
2572
2573	You may copy and distribute the Document in any medium, either
2574	commercially or noncommercially, provided that this License, the
2575	copyright notices, and the license notice saying this License applies
2576	to the Document are reproduced in all copies, and that you add no other
2577	conditions whatsoever to those of this License. You may not use
2578	technical measures to obstruct or control the reading or further
2579	copying of the copies you make or distribute. However, you may accept
2580	compensation in exchange for copies. If you distribute a large enough
2581	number of copies you must also follow the conditions in section 3.
2582
2583	You may also lend copies, under the same conditions stated above, and
2584	you may publicly display copies.
2585
2586	@item
2587	COPYING IN QUANTITY
2588
2589	If you publish printed copies (or copies in media that commonly have
2590	printed covers) of the Document, numbering more than 100, and the
2591	Document's license notice requires Cover Texts, you must enclose the
2592	copies in covers that carry, clearly and legibly, all these Cover
2593	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
2594	the back cover. Both covers must also clearly and legibly identify
2595	you as the publisher of these copies. The front cover must present
2596	the full title with all words of the title equally prominent and
2597	visible. You may add other material on the covers in addition.
2598	Copying with changes limited to the covers, as long as they preserve
2599	the title of the Document and satisfy these conditions, can be treated
2600	as verbatim copying in other respects.
2601
2602	If the required texts for either cover are too voluminous to fit
2603	legibly, you should put the first ones listed (as many as fit
2604	reasonably) on the actual cover, and continue the rest onto adjacent
2605	pages.
2606
2607	If you publish or distribute Opaque copies of the Document numbering
2608	more than 100, you must either include a machine-readable Transparent
2609	copy along with each Opaque copy, or state in or with each Opaque copy
2610	a computer-network location from which the general network-using
2611	public has access to download using public-standard network protocols
2612	a complete Transparent copy of the Document, free of added material.
2613	If you use the latter option, you must take reasonably prudent steps,
2614	when you begin distribution of Opaque copies in quantity, to ensure
2615	that this Transparent copy will remain thus accessible at the stated
2616	location until at least one year after the last time you distribute an
2617	Opaque copy (directly or through your agents or retailers) of that
2618	edition to the public.
2619
2620	It is requested, but not required, that you contact the authors of the
2621	Document well before redistributing any large number of copies, to give
2622	them a chance to provide you with an updated version of the Document.
2623
2624	@item
2625	MODIFICATIONS
2626
2627	You may copy and distribute a Modified Version of the Document under
2628	the conditions of sections 2 and 3 above, provided that you release
2629	the Modified Version under precisely this License, with the Modified
2630	Version filling the role of the Document, thus licensing distribution
2631	and modification of the Modified Version to whoever possesses a copy
2632	of it. In addition, you must do these things in the Modified Version:
2633
2634	@enumerate A
2635	@item
2636	Use in the Title Page (and on the covers, if any) a title distinct
2637	from that of the Document, and from those of previous versions
2638	(which should, if there were any, be listed in the History section
2639	of the Document). You may use the same title as a previous version
2640	if the original publisher of that version gives permission.
2641
2642	@item
2643	List on the Title Page, as authors, one or more persons or entities
2644	responsible for authorship of the modifications in the Modified
2645	Version, together with at least five of the principal authors of the
2646	Document (all of its principal authors, if it has fewer than five),
2647	unless they release you from this requirement.
2648
2649	@item
2650	State on the Title page the name of the publisher of the
2651	Modified Version, as the publisher.
2652
2653	@item
2654	Preserve all the copyright notices of the Document.
2655
2656	@item
2657	Add an appropriate copyright notice for your modifications
2658	adjacent to the other copyright notices.
2659
2660	@item
2661	Include, immediately after the copyright notices, a license notice
2662	giving the public permission to use the Modified Version under the
2663	terms of this License, in the form shown in the Addendum below.
2664
2665	@item
2666	Preserve in that license notice the full lists of Invariant Sections
2667	and required Cover Texts given in the Document's license notice.
2668
2669	@item
2670	Include an unaltered copy of this License.
2671
2672	@item
2673	Preserve the section Entitled ``History'', Preserve its Title, and add
2674	to it an item stating at least the title, year, new authors, and
2675	publisher of the Modified Version as given on the Title Page. If
2676	there is no section Entitled ``History'' in the Document, create one
2677	stating the title, year, authors, and publisher of the Document as
2678	given on its Title Page, then add an item describing the Modified
2679	Version as stated in the previous sentence.
2680
2681	@item
2682	Preserve the network location, if any, given in the Document for
2683	public access to a Transparent copy of the Document, and likewise
2684	the network locations given in the Document for previous versions
2685	it was based on. These may be placed in the ``History'' section.
2686	You may omit a network location for a work that was published at
2687	least four years before the Document itself, or if the original
2688	publisher of the version it refers to gives permission.
2689
2690	@item
2691	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
2692	the Title of the section, and preserve in the section all the
2693	substance and tone of each of the contributor acknowledgements and/or
2694	dedications given therein.
2695
2696	@item
2697	Preserve all the Invariant Sections of the Document,
2698	unaltered in their text and in their titles. Section numbers
2699	or the equivalent are not considered part of the section titles.
2700
2701	@item
2702	Delete any section Entitled ``Endorsements''. Such a section
2703	may not be included in the Modified Version.
2704
2705	@item
2706	Do not retitle any existing section to be Entitled ``Endorsements'' or
2707	to conflict in title with any Invariant Section.
2708
2709	@item
2710	Preserve any Warranty Disclaimers.
2711	@end enumerate
2712
2713	If the Modified Version includes new front-matter sections or
2714	appendices that qualify as Secondary Sections and contain no material
2715	copied from the Document, you may at your option designate some or all
2716	of these sections as invariant. To do this, add their titles to the
2717	list of Invariant Sections in the Modified Version's license notice.
2718	These titles must be distinct from any other section titles.
2719
2720	You may add a section Entitled ``Endorsements'', provided it contains
2721	nothing but endorsements of your Modified Version by various
2722	parties---for example, statements of peer review or that the text has
2723	been approved by an organization as the authoritative definition of a
2724	standard.
2725
2726	You may add a passage of up to five words as a Front-Cover Text, and a
2727	passage of up to 25 words as a Back-Cover Text, to the end of the list
2728	of Cover Texts in the Modified Version. Only one passage of
2729	Front-Cover Text and one of Back-Cover Text may be added by (or
2730	through arrangements made by) any one entity. If the Document already
2731	includes a cover text for the same cover, previously added by you or
2732	by arrangement made by the same entity you are acting on behalf of,
2733	you may not add another; but you may replace the old one, on explicit
2734	permission from the previous publisher that added the old one.
2735
2736	The author(s) and publisher(s) of the Document do not by this License
2737	give permission to use their names for publicity for or to assert or
2738	imply endorsement of any Modified Version.
2739
2740	@item
2741	COMBINING DOCUMENTS
2742
2743	You may combine the Document with other documents released under this
2744	License, under the terms defined in section 4 above for modified
2745	versions, provided that you include in the combination all of the
2746	Invariant Sections of all of the original documents, unmodified, and
2747	list them all as Invariant Sections of your combined work in its
2748	license notice, and that you preserve all their Warranty Disclaimers.
2749
2750	The combined work need only contain one copy of this License, and
2751	multiple identical Invariant Sections may be replaced with a single
2752	copy. If there are multiple Invariant Sections with the same name but
2753	different contents, make the title of each such section unique by
2754	adding at the end of it, in parentheses, the name of the original
2755	author or publisher of that section if known, or else a unique number.
2756	Make the same adjustment to the section titles in the list of
2757	Invariant Sections in the license notice of the combined work.
2758
2759	In the combination, you must combine any sections Entitled ``History''
2760	in the various original documents, forming one section Entitled
2761	``History''; likewise combine any sections Entitled ``Acknowledgements'',
2762	and any sections Entitled ``Dedications''. You must delete all
2763	sections Entitled ``Endorsements.''
2764
2765	@item
2766	COLLECTIONS OF DOCUMENTS
2767
2768	You may make a collection consisting of the Document and other documents
2769	released under this License, and replace the individual copies of this
2770	License in the various documents with a single copy that is included in
2771	the collection, provided that you follow the rules of this License for
2772	verbatim copying of each of the documents in all other respects.
2773
2774	You may extract a single document from such a collection, and distribute
2775	it individually under this License, provided you insert a copy of this
2776	License into the extracted document, and follow this License in all
2777	other respects regarding verbatim copying of that document.
2778
2779	@item
2780	AGGREGATION WITH INDEPENDENT WORKS
2781
2782	A compilation of the Document or its derivatives with other separate
2783	and independent documents or works, in or on a volume of a storage or
2784	distribution medium, is called an ``aggregate'' if the copyright
2785	resulting from the compilation is not used to limit the legal rights
2786	of the compilation's users beyond what the individual works permit.
2787	When the Document is included in an aggregate, this License does not
2788	apply to the other works in the aggregate which are not themselves
2789	derivative works of the Document.
2790
2791	If the Cover Text requirement of section 3 is applicable to these
2792	copies of the Document, then if the Document is less than one half of
2793	the entire aggregate, the Document's Cover Texts may be placed on
2794	covers that bracket the Document within the aggregate, or the
2795	electronic equivalent of covers if the Document is in electronic form.
2796	Otherwise they must appear on printed covers that bracket the whole
2797	aggregate.
2798
2799	@item
2800	TRANSLATION
2801
2802	Translation is considered a kind of modification, so you may
2803	distribute translations of the Document under the terms of section 4.
2804	Replacing Invariant Sections with translations requires special
2805	permission from their copyright holders, but you may include
2806	translations of some or all Invariant Sections in addition to the
2807	original versions of these Invariant Sections. You may include a
2808	translation of this License, and all the license notices in the
2809	Document, and any Warranty Disclaimers, provided that you also include
2810	the original English version of this License and the original versions
2811	of those notices and disclaimers. In case of a disagreement between
2812	the translation and the original version of this License or a notice
2813	or disclaimer, the original version will prevail.
2814
2815	If a section in the Document is Entitled ``Acknowledgements'',
2816	``Dedications'', or ``History'', the requirement (section 4) to Preserve
2817	its Title (section 1) will typically require changing the actual
2818	title.
2819
2820	@item
2821	TERMINATION
2822
2823	You may not copy, modify, sublicense, or distribute the Document except
2824	as expressly provided for under this License. Any other attempt to
2825	copy, modify, sublicense or distribute the Document is void, and will
2826	automatically terminate your rights under this License. However,
2827	parties who have received copies, or rights, from you under this
2828	License will not have their licenses terminated so long as such
2829	parties remain in full compliance.
2830
2831	@item
2832	FUTURE REVISIONS OF THIS LICENSE
2833
2834	The Free Software Foundation may publish new, revised versions
2835	of the GNU Free Documentation License from time to time. Such new
2836	versions will be similar in spirit to the present version, but may
2837	differ in detail to address new problems or concerns. See
2838	@uref{http://www.gnu.org/copyleft/}.
2839
2840	Each version of the License is given a distinguishing version number.
2841	If the Document specifies that a particular numbered version of this
2842	License ``or any later version'' applies to it, you have the option of
2843	following the terms and conditions either of that specified version or
2844	of any later version that has been published (not as a draft) by the
2845	Free Software Foundation. If the Document does not specify a version
2846	number of this License, you may choose any version ever published (not
2847	as a draft) by the Free Software Foundation.
2848	@end enumerate
2849
2850	@page
2851	@heading ADDENDUM: How to use this License for your documents
2852
2853	To use this License in a document you have written, include a copy of
2854	the License in the document and put the following copyright and
2855	license notices just after the title page:
2856
2857	@smallexample
2858	@group
2859	Copyright (C) @var{year} @var{your name}.
2860	Permission is granted to copy, distribute and/or modify this document
2861	under the terms of the GNU Free Documentation License, Version 1.2
2862	or any later version published by the Free Software Foundation;
2863	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
2864	Texts. A copy of the license is included in the section entitled ``GNU
2865	Free Documentation License''.
2866	@end group
2867	@end smallexample
2868
2869	If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
2870	replace the ``with@dots{}Texts.'' line with this:
2871
2872	@smallexample
2873	@group
2874	with the Invariant Sections being @var{list their titles}, with
2875	the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
2876	being @var{list}.
2877	@end group
2878	@end smallexample
2879
2880	If you have Invariant Sections without Cover Texts, or some other
2881	combination of the three, merge those two alternatives to suit the
2882	situation.
2883
2884	If your document contains nontrivial examples of program code, we
2885	recommend releasing these examples in parallel under your choice of
2886	free software license, such as the GNU General Public License,
2887	to permit their use in free software.
2888
2889	@c Local Variables:
2890	@c ispell-local-pdict: "ispell-dict"
2891	@c End:
2892
2893
2894	@c ---------------------------------------------------------------------
2895	@c ---------------------------------------------------------------------
2896
2897	@node Reporting bugs
2898	@chapter Reporting bugs
2899
2900	Report bugs to <obrebski@@amu.edu.pl>.
2901
2902	@c ---------------------------------------------------------------------
2903	@c ---------------------------------------------------------------------
2904
2905	@c @node Copyright
2906	@c @chapter Copyright
2907	@c
2908	@c Copyright 2004 by Tomasz Obrębski
2909	@c This software is free for research and educational use.
2910
2911	@c ---------------------------------------------------------------------
2912	@c ---------------------------------------------------------------------
2913
2914	@node Author
2915	@chapter Author
2916
2917
2918	@bye
