source: doc/utt.texinfo @ d2f119e

Last change on this file since d2f119e was 9a36761, checked in by Mateusz Hromada <ruanda@…>, 17 years ago

Migration to new build system.

documentation moved and checked

Rev	Line
[9ace5d2]	1
[25ae32e]	2	\input texinfo @c --texinfo--
[9ace5d2]	3	@c @documentencoding ISO-8859-2
	4	@documentencoding UTF-8
[25ae32e]	5	@c @documentlanguage pl
	6
	7	@c %**start of header
	8	@setfilename utt.info
	9	@settitle UAM Text Tools v0.90
	10	@c %**end of header
	11
	12	@copying
[261bf62]	13	This manual is for UAM Text Tools (version 0.90, October, 2008)
[25ae32e]	14
[9ace5d2]	15	Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
[25ae32e]	16
	17	Permission is granted to copy, distribute and/or modify this document
[261bf62]	18	under the terms of the GNU Free Documentation License, Version 1.2 or
	19	any later version published by the Free Software Foundation; with no
	20	Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
	21	copy of the license is included in the section entitled ``GNU Free
	22	Documentation License''.
[25ae32e]	23
	24	@c @quotation
	25	@c Permission is granted to ...
	26	@c No permission is granted until the document is completed.
	27	@c @end quotation
	28	@end copying
	29
	30
	31	@titlepage
	32	@title UAM Text Tools 0.90 - User Manual
	33	@subtitle edition 0.01, @today
	34	@subtitle status: prescript
[9ace5d2]	35	@author by Justyna Walkowska, Tomasz Obrębski and Michał Stolarski
[25ae32e]	36	@page
	37	@vskip 0pt plus 1filll
	38	@insertcopying
	39	@end titlepage
	40
	41	@contents
	42
	43	@c @paragraphindent none
	44
	45	@iftex
[9ace5d2]	46	@tex
	47	% \usepackage[T1]{fontenc}
	48	% \usepackage[utf8]{inputenc}
	49	% \usepackage{times}
	50	@end tex
	51
[25ae32e]	52	@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
	53	@end iftex
	54	@c @headings off
	55	@c @everyheading LEM(1) @| @| LEM(1)
	56	@everyfooting @today @c @| @thispage @|
	57
	58	@ifnottex
	59
	60	@node Top
	61	@top UTT - UAM Text Tools
	62
	63	@insertcopying
	64
	65	@menu
	66	* General information::
	67	* UTT file format::
	68	* Configuration files::
	69	* UTT components::
	70	* Auxiliary tools::
	71	* Usage examples::
	72	* PMDBF dictionary::
	73	@c * Examples::
	74	@c * Copyright::
	75	* GNU Free Documentation License::
	76	* Reporting bugs::
	77	* Author::
	78	@end menu
	79	@end ifnottex
	80
	81
	82	@c ----------------------------------------------------------------------
	83
	84	@node General information
	85	@chapter General information
	86
	87	UAM Text Tools (UTT) is a package of language processing tools
	88	developed at Adam Mickiewicz University. Its functionality includes:
	89
	90	@itemize @bullet
	91
	92	@item
[9ace5d2]	93	tokenization
[25ae32e]	94	@item
	95	dictionary-based morphological analysis
	96	@item
	97	heuristic morphological analysis of unknown words
	98	@item
[9ace5d2]	99	spelling correction
[25ae32e]	100	@item
	101	pattern search
	102	@item
	103	sentence splitting
	104	@item
	105	generation of concordance tables
	106	@end itemize
	107
	108	The toolkit is intended for processing raw (unannotated),
	109	unrestricted text for any conceivable purpose.
	110
	111	The system is organized as a collection of command-line programs, each
	112	performing one operation, e.g. tokenization, lemmatization, or spelling
	113	correction. The components are independent of one another; the
	114	unifying element is the uniform I/O file format.
	115
	116	The components may be combined in various ways to provide various text
	117	processing services. New components supplied by the user may also be
	118	easily incorporated into the system, provided that they respect the I/O
	119	file format conventions.
	120
	121	UTT component programs do not depend on any specific tagset or
	122	morphological description format.
	123
	124	UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
	125	the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
	126
	127	The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use.
	128
	129
	130	List of contributors:
	131
	132	@itemize
	133	@item Paweł Konieczka
[9ace5d2]	134	@item Tomasz Obrębski
	135	@item Michał Stolarski
[25ae32e]	136	@item Marcin Walas
	137	@item Justyna Walkowska
[9ace5d2]	138	@item Paweł Wereński
[25ae32e]	139	@end itemize
	140
	141	@c ----------------------------------------------------------------------
	142	@c ---------------------------------------------------------------------
	143
	144	@node UTT file format
	145	@chapter UTT file format
	146
	147	A UTT file contains annotation of a text. It consists of a sequence of
	148	segments. Each segment explicitly refers to a continuous piece of the
	149	text and provides some information on it.
	150
	151	@section Segment format
	152
	153	A segment occupies one line of a UTT file and consists of
	154	space-separated fields:
	155
	156
	157	@quotation
	158	@sp 1
	159	[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
	160	@sp 1
	161	@end quotation
	162
	163	@table @var
	164
	165	@item @var{start}
	166	Non-negative integer value indicating the position in the source text where the
	167	segment starts.
	168
	169	@item @var{length}
	170	Non-negative integer value indicating the length of the segment.
	171
	172	@item @var{type}
	173	A sequence of non-blank characters containing no digits (digits could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
	174	@var{type} reflects the main classification of segments -
	175	into words, numbers, punctuation marks, meta-text markers.
	176	@xref{tok output,,tok output}, for description of automatically recognized type markers.
	177
	178	@item @var{form}
	179	This field contains the textual form of the segment or the special
	180	symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
	181
	182	The characters or character sequences that have special meaning in the
	183	@var{form} field are enumerated below.
	184
	185	Characters with special meaning:
	186
	187	@itemize
	188	@item @code{_} - space character
	189	@item @code{*} - undefined contents
	190	@end itemize
	191
	192	Escape sequences:
	193
	194	@itemize
	195	@item @code{\n} - new line
	196	@item @code{\t} - tabulation
	197	@item @code{\r} - carriage return
	198
	199	@item @code{\_} - the @code{_} character
	200	@item @code{\*} - the @code{*} character
	201	@item @code{\\} - the @code{\} character
	202
	203	@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
	204	@end itemize
	205
	206	@item @var{annotation1}
	207	@item @var{annotation2}
	208	@item ...
	209	Annotation fields have the following format:
	210
	211	@var{longname} @code{:} @var{value}
	212
	213	or
	214
	215	@var{shortname} @var{value}
	216
	217	where @var{longname} is a string of alphanumeric characters
	218	(@code{isalnum()} test), @var{shortname} is a single non-alphanumeric character
	219	(@code{ispunct()} test), and @var{value} is an arbitrary string of non-blank characters.
	220
	221	@end table
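
The conventions for the @var{form} field can be illustrated with a short Python sketch. This is not code from UTT: the function name @code{decode_form} is hypothetical, and letting every escaped character other than @code{n}, @code{t} and @code{r} stand for itself is an assumption made for the illustration.

```python
# Sketch only: decode the form field of a UTT segment into plain text.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}


def decode_form(form):
    """Return the text denoted by a form field, or None for the bare '*'."""
    if form == "*":
        return None  # the form is not given
    out = []
    i = 0
    while i < len(form):
        ch = form[i]
        if ch == "\\" and i + 1 < len(form):
            nxt = form[i + 1]
            # \n, \t and \r denote control characters; any other escaped
            # character (for example \_ or \\) stands for itself (assumption).
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        elif ch == "_":
            out.append(" ")  # an unescaped underscore encodes a space
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(decode_form("_")))
print(repr(decode_form(r"a\_b\n")))
```

Here @code{decode_form("_")} yields a single space, and @code{decode_form("*")} yields @code{None} because the form is undefined.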
	222
	223
	224	Only two fields are mandatory: @var{type} and @var{form}. All other fields
	225	may be absent. When only one number precedes the
	226	@var{type} field, it is interpreted as the @var{start} position.
	227
	228	If the @var{length} field is omitted, the length of the segment is the
	229	length of the @var{form} field, except when the value of the
	230	@var{form} field is @code{*} -- in this case, the length is assumed to
	231	be 0.
	232
	233	If the @var{start} field is also absent, the segment is assumed to directly
	234	follow the preceding one.
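
The rules for omitted @var{start} and @var{length} fields can be sketched in Python. This is a minimal illustration, not the parser used by the UTT programs: the names @code{parse_segments} and @code{form_length} are hypothetical, and counting an escape sequence such as @code{\n} as one character of the form is an assumption.

```python
import re


def form_length(form):
    # Assumption: an escape sequence (backslash + character) counts as one character.
    return len(re.sub(r"\\.", "x", form))


def parse_segments(lines):
    """Yield (start, length, type, form, annotations) with omitted fields filled in."""
    next_start = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        numbers = []
        # Up to two leading numeric fields are start and length.
        while (len(numbers) < 2 and fields
               and fields[0].isascii() and fields[0].isdigit()):
            numbers.append(int(fields.pop(0)))
        seg_type, form, *annotations = fields
        # Without a start field the segment follows the preceding one.
        start = numbers[0] if numbers else next_start
        if len(numbers) == 2:
            length = numbers[1]
        elif form == "*":
            length = 0  # undefined form: length 0
        else:
            length = form_length(form)
        yield start, length, seg_type, form, annotations
        next_start = start + length


example = ["0000 04 N *", "0000 N 12", "P .", "N 5", "S _", "W km"]
for segment in parse_segments(example):
    print(segment)
```

For the six lines in @code{example} the resolved start positions are 0, 0, 2, 3, 4 and 5, and the resolved lengths are 4, 2, 1, 1, 1 and 2.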
	235
	236	@c Conventions:
	237
	238	@c Annotation fields with predefined meaning:
	239
	240	@c @itemize
	241	@c @item @code{!} - UTT components are allowed to modify the contents of
	242	@c the @var{form} field (e.g. spelling correction does this). If this happens the
	243	@c original form of the segment have to be placed in the @code{!}-field.
	244	@c @item @code{@@} - morphological description
	245	@c @item @code{=} - node identifier assignment (used in graph encoding)
	246	@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
	247	@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
	248	@c @end itemize
	249
	250	Segments of length 0 may be used to mark file positions with some
	251	information. See e.g. BOS and EOS (beginning/end of sentence) markers
	252	in the example below.
	253
	254	Example:
	255
	256	sentences: @samp{Piszemy dobre progrumy. Warszawiacy też.}
	257
	258	@example
	259	0000 00 BOS *
[9ace5d2]	260	0000 07 W Piszemy lem:pisać,V
[25ae32e]	261	0007 01 S _
	262	0008 05 W dobre lem:dobry,ADJ
	263	0013 01 S _
	264	0014 08 W progrumy cor:programy lem:program,N
	265	0022 01 P .
	266	0023 00 EOS *
	267	0023 01 S _
	268	0024 00 BOS *
	269	0024 11 W Warszawiacy lem:Warszawiak,N
	270	0035 01 S _
[9ace5d2]	271	0036 03 W też
[25ae32e]	272	0039 01 P .
	273	0040 00 EOS *
	274
	275	@end example
	276
	277	@example
	278	0000 BOS *
[9ace5d2]	279	0000 W Piszemy lem:pisać,V
[25ae32e]	280	0007 S _
	281	0008 W dobre lem:dobry,ADJ
	282	0013 S _
	283	0014 W progrumy cor:programy lem:program,N
	284	0022 P .
	285	0023 EOS *
	286	@end example
	287
	288	Position information may be provided only for some types of segments:
	289
	290	@example
	291	0000 BOS *
[9ace5d2]	292	W Piszemy lem:pisać,V
[25ae32e]	293	S _
	294	W dobre lem:dobry,ADJ
	295	S _
	296	W progrumy cor:programy lem:program,N
	297	P .
	298	EOS *
	299	S _
	300	0024 BOS *
	301	W Warszawiacy lem:Warszawiak,N
	302	S _
[9ace5d2]	303	W też
[25ae32e]	304	P .
	305	EOS *
	306	@end example
	307
	308	Position/length information may be provided only when necessary:
	309
	310	@example
	311	0000 04 N *
	312	0000 N 12
	313	P .
	314	N 5
	315	S _
	316	W km
	317	@end example
	318
	319	@section UTT File
	320
	321	A UTT file consists of a sequence of segments. The same text position
	322	may be covered by multiple segments. In consequence, ambiguous text
	323	segmentation and ambiguous annotation may be represented.
	324
	325	There are two structural requirements a valid UTT-formatted file
	326	has to meet:
	327
	328	@itemize @bullet
	329
	330	@item
	331	segments have to be sorted with respect to the @var{start} field,
	332
	333	@item
	334	for each
	335	segment ending at position @var{n}, either there must be a segment starting at
	336	position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
	337	for each segment starting at position @var{n}, either there must be a segment
	338	ending at position @var{n-1}, or the position @var{n-1} must not be covered
	339	by any segment.
	340
	341	@end itemize
	342
	343	A valid annotation for the text fragment
	344	@example
	345	12.5 km
	346	@end example
	347
	348	may be
	349
	350	@example
	351	0000 02 N 12
	352	0000 04 N 12.5
	353	0002 01 P .
	354	0003 01 N 5
	355	0004 01 S _
	356	0005 02 W km
	357	@end example
	358
	359	but not
	360
	361	@example
	362	0000 02 N 12
	363	0000 04 N 12.5
	364	0004 01 S _
	365	0005 02 W km
	366	@end example
	367
[261bf62]	368	because in the latter example the first segment (starting at position
	369	0000, 2 characters long) ends at position @var{n}=0001, while position
	370	@var{n+1}=0002 is covered by the second segment and no segment starts
	371	there.
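
The two structural requirements can be checked mechanically. The following Python sketch is one possible reading of them, not the check performed by any UTT program: the function name @code{is_valid} is hypothetical, segments are given as (start, length) pairs, and ignoring zero-length segments when computing coverage is an assumption.

```python
def is_valid(segments):
    """Check the two structural requirements for a list of (start, length) pairs."""
    starts = [start for start, _ in segments]
    if starts != sorted(starts):
        return False  # requirement 1: segments sorted by start position

    # Inclusive (first, last) positions; zero-length markers are ignored (assumption).
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    covered = {p for first, last in spans for p in range(first, last + 1)}
    start_set = {first for first, _ in spans}
    end_set = {last for _, last in spans}

    for first, last in spans:
        # A segment ends at `last`: position last+1 is either the start
        # of some segment or not covered at all.
        if last + 1 in covered and last + 1 not in start_set:
            return False
        # A segment starts at `first`: position first-1 is either the end
        # of some segment or not covered at all.
        if first - 1 in covered and first - 1 not in end_set:
            return False
    return True


good = [(0, 2), (0, 4), (2, 1), (3, 1), (4, 1), (5, 2)]
bad = [(0, 2), (0, 4), (4, 1), (5, 2)]
print(is_valid(good), is_valid(bad))
```

The lists @code{good} and @code{bad} correspond to the two annotations of @samp{12.5 km} given above; the sketch accepts the first and rejects the second.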
	372
	373
	374	@section Flattened UTT file
	375
[e28a625]	376	The UTT file format has two variants: regular and flattened. The regular
[261bf62]	377	format was described above. In the flattened format some of the
	378	end-of-line characters are replaced with line-feed characters.
	379
	380	The flattened format is mainly used to represent whole sentences as
	381	single lines of the input file (all intrasentential end-of-line
	382	characters are replaced with line-feed characters).
	383
	384	This technical trick makes it possible to perform certain text
	385	processing operations on entire sentences with tools such as
	386	@command{grep} (see the @command{grp} component) or @command{sed} (see the @command{mar} component).
	387
	388	The conversion between the two formats is performed by the tools:
	389	@command{fla} and @command{unfla}.
[25ae32e]	390
	391	@section Character encoding
	392
	393	The UTT component programs accept only single-byte character encodings, such
[261bf62]	394	as the ISO, ANSI, and DOS code pages.
[25ae32e]	395
	396
	397	@c @section Formats
	398
	399	@c @unnumberedsubsubsec Basic format
	400
	401	@c While processing large amounts of the overhead related with explicit
	402	@c ... of the start position and segment length becomes ... . Therefore,
	403	@c for efficiency reasons certain shortcuts are possible:
	404
	405	@c @unnumberedsubsubsec Relative start position
	406
	407	@c Start position may be given as relative distance from the last
	408	@c absolute position.
	409
	410	@c @unnumberedsubsubsec Absent length
	411
	412	@c Segment length may be omitted. Normally it can be restored by counting
	413	@c the length of the @emph{form field}. For segments with the special value
	414	@c @code{*} in the @emph{form field} length 0 is assumed.
	415
	416	@c @unnumberedsubsubsec Absent length and start position
	417
	418	@c Both start position and segment length may be omitted. In this format
	419	@c each segment is assumed to follow the previous one. This format is,
	420	@c therefore, suitable only for unambiguously tagged text
	421	@c (0-length markers can be still used.)
	422
	423
	424	@c @table @code
	425	@c @item AL
	426	@c @code{1234 03 W kot}
	427	@c @item RL
	428	@c @code{+56 03 W kot}
	429	@c @item A
	430	@c @code{1234 W kot}
	431	@c @item R
	432	@c @code{+56 W kot}
	433	@c @item 0
	434	@c @code{W kot}
	435	@c @end table
	436
	437
[9ace5d2]	438	@c [HOW TO GET POLISH FONTS IN DVI???]
[25ae32e]	439
	440	@macro parhelp
	441	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	442	Print help.
	443	@end macro
	444
	445
	446	@macro parversion
	447	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	448	Print version information.
	449	@end macro
	450
	451	@macro parinteractive
	452	@item @b{@minus{}@minus{}interactive, @minus{}i}
	453	This option toggles interactive mode, which is by default off. In the
	454	interactive mode the program does not buffer the output.
	455	@end macro
	456
	457
	458	@c @macro parfile
	459	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	460	@c Input file name.
	461	@c If this option is absent or equal to '@minus{}', the program
	462	@c reads from the standard input.
	463	@c @end macro
	464
	465
	466	@c @macro paroutput
	467	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	468	@c Regular output file name. To regular output the program sends segments
	469	@c which it successfully processed and copies those which were not
	470	@c subject to processing. If this option is absent or equal to
	471	@c '@minus{}', standard output is used.
	472	@c @end macro
	473
	474	@c @macro parfail
	475	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
	476	@c Fail output file name. To fail output the program copies the segments
	477	@c it failed to process. If this option is absent or equal to
	478	@c '@minus{}', standard output is used.
	479	@c @end macro
	480
	481
	482	@c @macro parcopy
	483	@c @item @b{@minus{}@minus{}copy, @minus{}c}
	484	@c Copy successfully processed segments to regular output also in their
	485	@c original input form.
	486	@c @end macro
	487
	488
	489	@macro parinputfield
	490	@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	491	The field containing the input to the program. The default is the
	492	@var{form} field. The fields @var{start}, @var{length}, @var{type},
	493	and @var{form} are referred to as @code{1}, @code{2}, @code{3},
	494	@code{4}, respectively.
	495	@end macro
	496
	497
	498	@macro paroutputfield
	499	@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	500	The name of the field added by the program. The default is the name of the program.
	501	@end macro
	502
	503
	504	@macro pardictionary
	505	@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
	506	Dictionary file name.
	507	@end macro
	508
	509
	510	@macro parprocess
	511	@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
	512	Process segments with the specified value in the @var{type} field.
	513	Multiple occurrences of this option are allowed and are interpreted as
	514	disjunction. If this option is absent, all segments are processed.
	515	@end macro
	516
	517
	518	@macro parselect
	519	@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
	520	Select for processing only segments in which the field named
	521	@var{fieldname} is present. Multiple occurrences of this option are
	522	allowed and are interpreted as conjunction of conditions. If this
	523	option is absent, all segments are processed.
	524	@end macro
	525
	526
	527	@macro parunselect
	528	@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
	529	Select for processing only segments in which the field @var{fieldname}
	530	is absent. Multiple occurrences of this option are allowed and are
	531	interpreted as conjunction of conditions. If this option is absent,
	532	all segments are processed.
	533	@end macro
	534
	535
	536	@macro paroneline
	537	@item @b{@minus{}@minus{}one-line}
	538	This option makes the program print ambiguous annotation in one output
	539	line by generating multiple annotation fields. By default, when
	540	ambiguous annotation may be produced for a segment, the segment is
	541	duplicated and each of the annotations is added to a separate copy of
	542	the segment.
	543	@end macro
	544
	545
	546	@macro paronefield
	547	@item @b{@minus{}@minus{}one-field, @minus{}1}
	548	This option makes the program print ambiguous annotation in one
	549	annotation field. By default, when ambiguous annotation may be produced
	550	for a segment, the segment is duplicated and each of the
	551	annotations is added to a separate copy of the segment.
	552
	553	This option is useful when working with @command{kot} or @command{con}.
	554	@end macro
	555
	556
	557	@c ---------------------------------------------------------------------
	558	@c CONFIGURATION FILES
	559	@c ---------------------------------------------------------------------
	560
	561	@node Configuration files
	562	@chapter Configuration files
	563
	564	Values for all command line options accepted by a component
	565	may be set in configuration files. The default locations of the
	566	configuration files for a component named @command{@var{program}} are
	567
	568	@example
[246900a]	569	@file{/usr/local/etc/utt/@var{program}.conf}
[25ae32e]	570	@end example
	571
	572	for the system-wide configuration file and
	573
	574	@example
[246900a]	575	@file{~/.utt/@var{program}.conf}
[25ae32e]	576	@end example
	577
	578	for the user configuration file.
	579
	580	@c The configuration file to load may be also specified with the
	581	@c @option{--config} option. Configuration file need not be provided.
	582
	583	For each option, the value is set according to the following priority (highest first):
	584
	585	@itemize
	586	@item command line
	587	@c @item configuration file indicated with @option{--config} option
	588	@item user configuration file (or configuration file indicated with the @option{--config} option)
	589	@item system-wide configuration file
	590	@end itemize
	591
	592	Parameter values are specified in the following format:
	593
	594	@var{parametername}=@var{value}
	595
	596	where @var{parametername} is the short or long name of an option accepted by
	597	the program, or
	598
	599	@var{parametername}
	600
	601	if the option does not need arguments.
	602
	603	You can introduce comments to configuration files using the # sign.
	604
	605	If a program accepts multiple occurrences of an option (e.g. the @option{--select} option of @command{lem}), you can specify them on separate lines of the program's configuration file.
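
The configuration format and the priority order can be sketched in Python as follows. This is an illustration under stated assumptions, not the parser used by the UTT programs: the function names are hypothetical, a @code{#} is assumed to start a comment anywhere on a line, options without an argument are stored as @code{True}, and a higher-priority source is assumed to replace all values of an option from a lower-priority one.

```python
def parse_config(text):
    """Parse 'name=value' and bare 'name' lines into {name: [values]}."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumption: anywhere)
        if not line:
            continue
        name, sep, value = line.partition("=")
        # Repeated options accumulate; argument-less options store True.
        options.setdefault(name.strip(), []).append(value.strip() if sep else True)
    return options


def effective_options(command_line, user, system):
    """Command line overrides the user file, which overrides the system-wide file."""
    merged = dict(system)
    merged.update(user)
    merged.update(command_line)
    return merged


# Hypothetical file contents, for illustration only.
user_conf = parse_config("""
# settings for lem
dictionary=my.dic
select=a
select=b
one-line
""")
print(user_conf)
print(effective_options({"dictionary": ["cli.bin"]}, user_conf,
                        {"dictionary": ["sys.bin"], "interactive": [True]}))
```

Storing every option as a list keeps repeated options such as @option{--select} together without special cases.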
	606
	607	@c The equal sign may be omitted.
	608
	609
	610	@quotation Tip
	611	If you have two (or more) frequently used sets of options for the same
	612	program (e.g. @command{lem} with the PMDBF dictionary and @command{lem} with a user dictionary),
	613	a good solution is to create two soft links to @command{lem}, called
	614	e.g. @command{lemg} and @command{lemu}, and specify their configuration in the files @file{lemg.conf}
	615	and @file{lemu.conf} respectively.
	616	@end quotation
	617
	618	@c ---------------------------------------------------------------------
	619	@c COMPONENTS
	620	@c ---------------------------------------------------------------------
	621
	622	@node UTT components
	623	@chapter UTT components
	624
	625	UTT components are of three types:
	626
	627	@menu
	628	Sources: programs which read non-UTT data (e.g. raw text) and produce output
	629	in UTT format
	630	* tok:: a tokenizer
	631
	632	Filters: programs which read and produce UTT-formatted data
	633	* lem:: a morphological analyzer
	634	* gue:: a morphological guesser
[261bf62]	635	* cor:: a simple spelling corrector
	636	* kor:: a more elaborate spelling corrector
[25ae32e]	637	* sen:: a sentence splitter
	638	* ser:: a pattern search tool (marks matches)
[261bf62]	639	* mar:: a pattern search tool (introduces arbitrary markers into the text)
[25ae32e]	640	* grp:: a pattern search tool (selects sentences containing a match)
[261bf62]	641	@c * gph:: a word-graph annotation tool::
	642	@c * dgp:: a dependency parser
[25ae32e]	643
	644	Sinks: programs which read UTT data and produce output in another format
	645	* kot:: an untokenizer
	646	* con:: a concordance table generator
	647	@end menu
	648
	649	@c ---------------------------------------------------------------------
	650	@c TOK
	651	@c ---------------------------------------------------------------------
	652
	653	@page
	654	@node tok
	655	@section tok - a tokenizer
	656
	657	@c ----------------------------------------
	658
	659	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	660	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	661	@item @strong{Component category:} @tab source
[261bf62]	662	@item @strong{Input format:} @tab raw text file
	663	@item @strong{Output format:} @tab UTT regular
	664	@item @strong{Required annotation:} @tab -
[25ae32e]	665	@end multitable
	666
	667
	668	@menu
	669	* tok description::
	670	* tok input::
	671	* tok output::
	672	* tok command line options::
	673	* tok example::
	674	@end menu
	675
	676	@node tok description
	677	@subsection Description
	678
	679	@code{tok} is a simple program which reads a text file and identifies
	680	tokens on the basis of their orthographic form. The type of the token
	681	is printed as the @var{type} field.
	682
	683	@node tok input
	684	@subsection Input
	685
	686	Raw text.
	687
	688	@node tok output
	689	@subsection Output
	690
	691	A UTT file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
	692
	693	@itemize
	694
	695	@item @code{W}
	696	(word)
	697	- continuous sequence of letters
	698
	699	@item @code{N}
	700	(number)
	701	- continuous sequence of digits
	702
	703	@item @code{S}
	704	(space)
	705	- continuous sequence of space characters
	706
	707	@item @code{P}
	708	(punctuation mark)
	709	- single printable character not belonging to any of the other classes
	710
	711	@item @code{B}
	712	(unprintable character)
	713	- single unprintable character
	714
	715	@end itemize
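
The five token classes can be approximated in a few lines of Python. This is a simplified sketch and not the implementation of @command{tok}: the names @code{tokenize} and @code{encode_form} are hypothetical, positions are counted in Python characters rather than bytes of a single-byte encoding, the character classes come from Python's @code{str} methods, and the @code{\*} escape is an assumption.

```python
# Escapes for the form field; the entry for '*' is an assumption.
FORM_ESCAPES = {"\\": "\\\\", "_": "\\_", "*": "\\*", " ": "_",
                "\n": "\\n", "\t": "\\t", "\r": "\\r"}


def encode_form(text):
    return "".join(FORM_ESCAPES.get(ch, ch) for ch in text)


def kind(ch):
    """Classify one character as W, N, S, P or B."""
    if ch.isalpha():
        return "W"
    if ch.isdigit():
        return "N"
    if ch.isspace():
        return "S"
    if ch.isprintable():
        return "P"
    return "B"


def tokenize(text):
    """Yield UTT lines with start, length, type and form fields."""
    i = 0
    while i < len(text):
        token_type = kind(text[i])
        j = i + 1
        # W, N and S tokens are maximal runs; P and B are single characters.
        if token_type in ("W", "N", "S"):
            while j < len(text) and kind(text[j]) == token_type:
                j += 1
        yield "%04d %02d %s %s" % (i, j - i, token_type, encode_form(text[i:j]))
        i = j


for line in tokenize("Piszemy dobre programy.\n"):
    print(line)
```

Because the newline is tested with @code{isspace()} before @code{isprintable()}, the sketch labels it @code{S}; a different ordering of the tests would label it @code{B}.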
	716
	717
	718
	719	@node tok command line options
	720	@subsection Command line options
	721
	722	@table @code
	723
	724	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	725	Print help.
	726
	727	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	728	Print version information.
	729
	730	@item @b{@minus{}@minus{}interactive, @minus{}i}
	731	This option toggles interactive mode, which is by default off. In the
	732	interactive mode the program does not buffer the output.
	733
	734	@end table
	735
	736	@node tok example
	737	@subsection Example
	738
	739	Input:
	740
	741	@example
	742	Piszemy dobre programy.
	743	@end example
	744
	745	Output:
	746
	747	@example
	748	0000 07 W Piszemy
	749	0007 01 S _
	750	0008 05 W dobre
	751	0013 01 S _
	752	0014 08 W programy
	753	0022 01 P .
	754	0023 01 S \n
	755	@end example
	756
	757
	758	@c ---------------------------------------------------------------------
	759	@c SEN
	760	@c ---------------------------------------------------------------------
	761
	762	@c @node sen - sentencizer
	763	@c @chapter sen - sentencizer
	764
[9ace5d2]	765	@c Authors: Tomasz Obrębski
[25ae32e]	766
	767	@c ---------------------------------------------------------------------
	768	@c LEM
	769	@c ---------------------------------------------------------------------
	770
	771	@page
	772	@node lem
	773	@section lem - morphological analyzer
	774
	775	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	776	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
[25ae32e]	777	@item @strong{Component category:} @tab filter
[261bf62]	778	@item @strong{Input format:} @tab UTT regular
	779	@item @strong{Output format:} @tab UTT regular
	780	@item @strong{Required annotation:} @tab tok
[25ae32e]	781	@end multitable
	782
	783	@menu
	784	* lem description::
	785	* lem command line options::
	786	* lem input::
	787	* lem output::
	788	* lem example::
	789	* lem dictionaries::
	790	* lem hints::
	791	@end menu
	792
	793	@node lem description
	794	@subsection Description
	795
	796	@command{lem} performs morphological analysis of a simple orthographic
	797	word, returning all its possible morphological annotations,
	798	disregarding the context.
	799
	800	@c ----------------------------------------
	801
	802	@node lem command line options
	803	@subsection Command line options
	804
	805	@table @code
	806	@parhelp
	807	@parversion
	808	@parinteractive
	809	@c @parfile
	810	@c @paroutput
	811	@c @parfail
	812	@c @parcopy
	813	@parinputfield
	814	@paroutputfield
	815	@pardictionary
	816	@parprocess
	817	@parselect
	818	@parunselect
	819	@paroneline
	820	@paronefield
	821	@end table
	822
	823	@c ----------------------------------------
	824
	825	@node lem input
	826	@subsection Input
	827
	828	@command{lem} reads a UTT file and processes the value of the @var{form} field
	829	(the input field may be changed with the @option{--input-field} option).
	830
	831	@node lem output
	832	@subsection Output
	833
	834	@command{lem} adds a new annotation field, whose default name is @code{lem}. In
	835	case of ambiguity, either the segment is duplicated (default),
	836	multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
	837	annotation is produced as the value of a single @code{lem} field (option
	838	@option{--one-field}, @option{-1}):
	839
	840	@itemize @bullet
	841
	842	@item
	843	unambiguous value format:
	844
	845	@example
	846	<lemma>,<descr>
	847	@end example
	848
	849	@item
	850	ambiguous value format (@option{--one-field} option)
	851
	852
	853	@example
	854	<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
	855	@end example
	856
	857	(alternative descriptions for the same lemma are separated by commas,
	858	alternative lemmata are separated by semicolons.)
	859
	860	@end itemize
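
A value in the ambiguous (@option{--one-field}) format can be taken apart with two splits. The Python sketch below is only an illustration: the function name @code{parse_one_field} is hypothetical, and it assumes that lemmata and descriptions themselves contain no commas or semicolons.

```python
def parse_one_field(value):
    """Split '<lemma>,<descr>[,<descr>][;<lemma>,...]' into (lemma, [descr, ...]) pairs."""
    result = []
    for alternative in value.split(";"):      # alternative lemmata
        lemma, *descriptions = alternative.split(",")  # descriptions of one lemma
        result.append((lemma, descriptions))
    return result


print(parse_one_field("dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn"))
```

For the value shown, the result is a single pair with the lemma @code{dobry} and its two descriptions.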
	861
	862	@node lem example
	863	@subsection Example
	864
	865	Input:
	866
	867	@example
	868	0000 07 W Piszemy
	869	0007 01 S _
	870	0008 05 W dobre
	871	0013 01 S _
	872	0014 08 W programy
	873	0022 01 P .
	874	0023 01 S \n
	875	@end example
	876
	877	Output (default):
	878
	879	@example
[9ace5d2]	880	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	881	0007 01 S _
	882	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
	883	0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
	884	0013 01 S _
	885	0014 08 W programy lem:program,N/GiNpCa
	886	0014 08 W programy lem:program,N/GiNpCn
	887	0014 08 W programy lem:program,N/GiNpCv
	888	0022 01 P .
	889	0023 01 S \n
	890	@end example
	891
	892	Output (@option{--one-line} option):
	893
	894	@example
[9ace5d2]	895	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	896	0007 01 S _
	897	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
	898	0013 01 S _
	899	0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
	900	0022 01 P .
	901	0023 01 S \n
	902	@end example
	903
	904	Output (@option{--one-field} option):
	905
	906	@example
[9ace5d2]	907	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	908	0007 01 S _
	909	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
	910	0013 01 S _
	911	0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
	912	0022 01 P .
	913	0023 01 S \n
	914	@end example
	915
	916	@c ----------------------------------------
	917
	918	@node lem dictionaries
	919	@subsection Dictionaries
	920
	921	@command{lem} requires a dictionary. The dictionary may be provided in
	922	one of two formats: in text (source) format or in binary (fsa) format.
	923
	924	@subsubheading Text format
	925
	926	Dictionary entries have the following structure:
	927
	928	@example
	929	<form>;<lemma>,<descr>[;<lemma>,<descr>]
	930	@end example
	931
	932	@var{lemma} may be given explicitly or in the cut-add format:
	933
	934	@example
	935	@code{[<cut1><add1>-]<cut2><add2>}
	936	@end example
	937
	938	meaning: replace the prefix of length @code{<cut1>} with the
	939	string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string
	940	@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
[9ace5d2]	941	@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
[25ae32e]	942
	943	Each dictionary entry must be written in one line and must not contain blank characters.
	944
	945	Examples:
	946	@example
	947	kot;0,N/GaNsCn
	948	kota;1,N/GaNsCg;1,N/GaNsCa
	949	kotu;1,N/GaNsCd
	950	kotem;2,N/GaNsCi
	951	kocie;3t,N/GaNsCl;3t,N/GaNsCv
[9ace5d2]	952	najbielsi;3-4ały,ADJ/DsNpCnGp
	953	najbielsze;3-5ały,ADJ/DsNpCnGaifn
[25ae32e]	954	najlepsi;dobry,ADJ/DsNpCnGp
	955	najlepsze;dobry,ADJ/DsNpCnGaifn
	956	@end example
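
The cut-add encoding can be applied with a few lines of Python. This sketch is not taken from @command{lem}: the function name @code{apply_cut_add} is hypothetical, and it assumes that any lemma field beginning with a digit is in cut-add format, so an explicit lemma that starts with a digit would be misread.

```python
import re

# [<cut1><add1>-]<cut2><add2>
CUT_ADD = re.compile(r"(?:(\d+)([^-\d]*)-)?(\d+)(.*)")


def apply_cut_add(form, code):
    """Return the lemma of `form`; `code` is an explicit lemma or a cut-add code."""
    match = CUT_ADD.fullmatch(code)
    if match is None:
        return code  # explicit lemma
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        form = add1 + form[int(cut1):]  # replace the prefix
    return form[:len(form) - int(cut2)] + add2  # replace the suffix


print(apply_cut_add("kocie", "3t"))
print(apply_cut_add("najbielsi", "3-4ały"))
print(apply_cut_add("najlepsi", "dobry"))
```

The three calls print @samp{kot}, @samp{biały} and @samp{dobry}. The suffix is removed with an explicit end index because a slice such as @code{form[:-0]} would return an empty string when @code{<cut2>} is 0.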
	957
	958
	959	The mandatory file name extension for a text dictionary is @code{dic}. For large
	960	dictionaries it is preferable, however, to compile them into binary
	961	(fsa) format.
	962
	963	@subsubheading Binary format
	964
	965	The mandatory file name extension for a binary dictionary is @code{bin}. To
	966	compile a text dictionary into binary format, write:
	967
	968	@example
	969	compiledic <dictionaryname>.dic
	970	@end example
	971
	972	@subsubheading Polex/PMDBF dictionary
	973
	974	A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
	975	the distribution as the default dictionary of @command{lem}. It is
	976	located by default in:
	977
[261bf62]	978	@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}
	979
	980	in a local installation or in
	981
	982	@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}
	983
	984	in a system-wide installation.
[25ae32e]	985
	986	@node lem hints
	987	@subsection Hints
	988
[261bf62]	989	@subsubheading Combining data from multiple dictionaries
[25ae32e]	990
[261bf62]	991	@itemize
[25ae32e]	992
[261bf62]	993	@item Apply <dict1>, then apply <dict2> to words which were not annotated.
[25ae32e]	994
[261bf62]	995	@example
	996	lem -d <dict1> | lem -S lem -d <dict2>
	997	@end example
[25ae32e]	998
[261bf62]	999	@item Add annotations from two dictionaries <dict1> and <dict2>.
[25ae32e]	1000
[261bf62]	1001	@example
	1002	lem -c -d <dict1> | lem -S lem -d <dict2>
	1003	@end example
[25ae32e]	1004
[261bf62]	1005	@end itemize
[25ae32e]	1006
	1007
	1008	@c ---------------------------------------------------------------------
	1009	@c GUE
	1010	@c ---------------------------------------------------------------------
	1011
	1012	@page
	1013	@node gue
	1014	@section gue - morphological guesser
	1015
	1016	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1017
[9ace5d2]	1018	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
[25ae32e]	1019	@item @strong{Component category:} @tab filter
	1020
	1021	@end multitable
	1022
	1023	@menu
[261bf62]	1024	* gue description::
[25ae32e]	1025	* gue command line options::
	1026	* gue example::
	1027	* gue dictionaries::
	1028	@end menu
	1029
[261bf62]	1030
	1031	@node gue description
	1032	@subsection Description
	1033
	1034	@command{gue} guesses morphological descriptions of the form contained
	1035	in the @var{form} field.
	1036
	1037
[25ae32e]	1038	@node gue command line options
	1039	@subsection Command line options
	1040
	1041	@table @code
	1042
	1043	@parhelp
	1044	@parversion
	1045	@parinteractive
	1046	@c @parfile
	1047	@c @paroutput
	1048	@c @parfail
	1049	@c @parcopy
	1050	@parinputfield
	1051	@paroutputfield
	1052	@pardictionary
	1053	@parprocess
	1054	@parselect
	1055	@parunselect
	1056	@paroneline
	1057	@paronefield
	1058
	1059	@item @b{@minus{}@minus{}delta=@var{n}}
	1060	Stop displaying answers after a drop in weight, that is, when the weight difference between two consecutive results exceeds the delta value (default=`0.2').
	1061
	1062
	1063	@item @b{@minus{}@minus{}cut-off=@var{n}}
	1064	Do not display answers whose weight is lower than the cut-off value (default=`200').
	1065
	1066
	1067	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
	1068	Guess up to n descriptions (default=`0', which means 'display all results').
	1069
	1070
	1071
	1072	@end table
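The interplay of the three options above can be sketched in Python. This is only an illustration, not the code of @command{gue}. It assumes that the answers are ranked by descending weight and that delta is compared with the plain difference of two consecutive weights; how @command{gue} scales delta is not stated here. The descriptions and weights in the usage lines are made up.

```python
def filter_guesses(guesses, delta, cut_off, guess_count=0):
    """Keep guesses according to the cut-off, delta and guess-count rules.

    guesses is a list of (description, weight) pairs.
    guess_count == 0 means 'no limit'.
    """
    ranked = sorted(guesses, key=lambda guess: guess[1], reverse=True)
    kept = []
    for description, weight in ranked:
        if weight < cut_off:
            break  # everything that follows is lighter still
        if kept and kept[-1][1] - weight > delta:
            break  # the weight fell by more than delta
        kept.append((description, weight))
        if guess_count and len(kept) == guess_count:
            break
    return kept


if __name__ == "__main__":
    # Hypothetical descriptions and weights, for illustration only.
    sample = [("N/d", 100), ("ADJ/a", 900), ("N/c", 400), ("ADJ/b", 850)]
    print(filter_guesses(sample, delta=100, cut_off=200))
```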
	1073
	1074	@node gue example
	1075	@subsection Example
	1076
	1077	@example
	1078	command: gue -n 2
	1079
	1080	input:
	1081	0000 07 W smerfny
	1082
	1083	output:
	1084	0000 07 W smerfny gue:,ADJ/CaDpGiNs
	1085	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
	1086	@end example
	1087
	1088
	1089	@node gue dictionaries
	1090	@subsection Dictionaries
	1091
	1092	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
	1093	The fsa format is created by compiling text-format dictionaries.
	1094
	1095
	1096
	1097	@subsubheading Text format
	1098
	1099	Dictionary entries have the following structure:
	1100
	1101	@example
	1102	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
	1103	@end example
	1104
	1105	@var{lemma} must be given in the cut-add format:
	1106
	1107	@example
	1108	@code{[<cut1><add1>-]<cut2><add2>}
	1109	@end example
	1110	(with no spaces in between): replace the prefix of length @var{cut1} with the
	1111	string @var{add1}, and replace the suffix of length @var{cut2} with the string
	1112	@var{add2}.
	1113
	1114
[9ace5d2]	1115	Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}.
[25ae32e]	1116
	1117
	1118	@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
	1119
	1120	@var{weight} is an integer value between 1 and 999 indicating the
	1121	likelihood of the guess.
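As a rough illustration of the entry structure, the Python sketch below splits an entry into its five parts and checks whether it applies to a given word form. The names @code{GuessRule}, @code{parse_rule} and @code{rule_applies} are hypothetical, and the sample entry, including its weight, is invented for the example.

```python
from typing import NamedTuple


class GuessRule(NamedTuple):
    prefix: str
    suffix: str
    lemma: str        # cut-add code
    description: str
    weight: int


def parse_rule(line):
    """Parse 'prefix*suffix;lemma,description:weight'."""
    pattern, rest = line.strip().split(";", 1)
    prefix, suffix = pattern.split("*", 1)
    lemma, tail = rest.split(",", 1)
    description, weight = tail.rsplit(":", 1)
    return GuessRule(prefix, suffix, lemma, description, int(weight))


def rule_applies(rule, form):
    """True if the form starts with the prefix and ends with the suffix."""
    return (len(form) >= len(rule.prefix) + len(rule.suffix)
            and form.startswith(rule.prefix)
            and form.endswith(rule.suffix))


if __name__ == "__main__":
    # Invented entry; the weight 500 is an assumption for illustration.
    rule = parse_rule("naj*elsi;3-4ały,ADJ/DsNpCnGp:500")
    print(rule, rule_applies(rule, "najbielsi"))
```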
	1122
[9ace5d2]	1123	@c @example
	1124	@c *łkę;1a,N/GfNsCa
	1125	@c naj*elszy;3-4ały,ADJ/...:...
	1126	@c @end example
[25ae32e]	1127
	1128
	1129	@c ---------------------------------------------------------------------
	1130	@c COR
	1131	@c ---------------------------------------------------------------------
	1132
	1133	@page
	1134	@node cor
	1135	@section cor - spelling corrector
	1136
	1137	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	1138	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
[25ae32e]	1139	@item @strong{Component category:} @tab filter
[261bf62]	1140	@item @strong{Input format:} @tab UTT regular
	1141	@item @strong{Output format:} @tab UTT regular
	1142	@item @strong{Required annotation:} @tab tok
[25ae32e]	1143	@end multitable
	1144
[261bf62]	1145	@menu
	1146	* cor description::
	1147	* cor command line options::
	1148	* cor dictionaries::
	1149	@end menu
	1150
	1151
	1152	@node cor description
	1153	@subsection Description
	1154
[25ae32e]	1155	The spelling corrector applies Kemal Oflazer's dynamic programming
	1156	algorithm @cite{oflazer96} to the FSA representation of the set of
	1157	word forms of the Polex/PMDBF dictionary. Given an incorrect
	1158	word form it returns all word forms present in the dictionary whose
	1159	edit distance is smaller than the threshold given as the parameter.
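The effect of the distance threshold can be illustrated with a brute-force Python sketch. It is not Oflazer's algorithm: instead of traversing an FSA it computes the plain Levenshtein distance to every word in a list, which gives the same kind of answer on a small dictionary but is much slower. It treats the threshold as an inclusive maximum, following the wording of the @option{--distance} option; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # match or substitute
            ))
        previous = current
    return previous[-1]


def corrections(form, words, max_distance=1):
    """Return the dictionary words within max_distance of form."""
    return [word for word in words if edit_distance(form, word) <= max_distance]


if __name__ == "__main__":
    print(corrections("odlut", ["odlot", "odlotowy", "odludek"]))
```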
	1160
	1161
	1162	@node cor command line options
	1163	@subsection Command line options
	1164
	1165	@table @code
	1166
	1167	@parhelp
	1168	@parversion
	1169	@parinteractive
	1170	@c @parfile
	1171	@c @paroutput
	1172	@c @parfail
	1173	@c @parcopy
	1174	@parinputfield
	1175	@paroutputfield
	1176	@pardictionary
	1177	@parprocess
	1178	@parselect
	1179	@parunselect
	1180	@paroneline
	1181	@paronefield
	1182
	1183	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
	1184	Maximum edit distance (default='1').
	1185
[261bf62]	1186	@c @item @b{@minus{}@minus{}replace, @minus{}r}
	1187	@c Replace original form with corrected form, place original form in the
	1188	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
	1189
[25ae32e]	1190
	1191	@end table
	1192
	1193	@node cor dictionaries
	1194	@subsection Dictionaries
	1195
	1196	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
	1197	The fsa format is created by compiling text-format dictionaries.
	1198
	1199	@subsubheading Text format
	1200
	1201	The @command{cor} dictionary is a list of words:
	1202	@example
	1203	odlot
	1204	odlotowy
	1205	odludek
	1206	@end example
	1207
[261bf62]	1208	@subsubheading Binary format
	1209
	1210	The mandatory file name extension for a binary dictionary is @code{bin}. To
	1211	compile a text dictionary into binary format, write:
	1212
	1213	@example
	1214	compiledic <dictionaryname>.dic
	1215	@end example
	1216
	1217	@c ---------------------------------------------------------------------
	1218	@c KOR
	1219	@c ---------------------------------------------------------------------
	1220
	1221	@page
	1222	@node kor
	1223	@section kor - configurable spelling corrector
	1224
[9ace5d2]	1225	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1226	@item @strong{Authors:} @tab Paweł Werenski, Tomasz Obrębski, Michał Stolarski
	1227	@item @strong{Component category:} @tab filter
	1228	@item @strong{Input format:} @tab UTT regular
	1229	@item @strong{Output format:} @tab UTT regular
	1230	@item @strong{Required annotation:} @tab tok
	1231	@end multitable
	1232
	1233	@menu
	1234	* kor description::
	1235	* kor command line options::
	1236	* kor weights definition file::
	1237	* kor dictionaries::
	1238	@end menu
	1239
	1240
	1241	@node kor description
	1242	@subsection Description
	1243
	1244	The spelling corrector applies Paweł Werenski's dynamic programming
	1245	algorithm to the FSA representation of the set of word forms of the
	1246	Polex/PMDBF dictionary. The algorithm is an extension of K. Oflazer's
	1247	algorithm used by @command{cor}. In the extended version it is
	1248	possible to assign weights to individual edit operations.
	1249
	1250	Given an incorrect word form it returns all word forms
	1251	present in the dictionary whose edit distance is smaller than the
	1252	threshold given as the parameter.
	1253
	1254
	1255	@node kor command line options
	1256	@subsection Command line options
	1257
	1258	@table @code
	1259
	1260	@parhelp
	1261	@parversion
	1262	@parinteractive
	1263	@c @parfile
	1264	@c @paroutput
	1265	@c @parfail
	1266	@c @parcopy
	1267	@parinputfield
	1268	@paroutputfield
	1269	@pardictionary
	1270	@parprocess
	1271	@parselect
	1272	@parunselect
	1273	@paroneline
	1274	@paronefield
	1275
	1276	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
	1277	Maximum edit distance (default='1').
	1278
	1279	@item @b{@minus{}@minus{}weights=@var{filename}, @minus{}w @var{filename}}
	1280	Edit operations' weights file.
	1281
	1282	@c @item @b{@minus{}@minus{}replace, @minus{}r}
	1283	@c Replace original form with corrected form, place original form in the
	1284	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
	1285
	1286
	1287	@end table
	1288
	1289
	1290	@node kor weights definition file
	1291	@subsection Weights definition file
	1292
	1293	Example:
	1294
	1295	@example
	1296
	1297	%stdcor 1
	1298	%xchg 1
	1299	ż rz 0.5
	1300	ch h 0.5
	1301	u ó 0.5
	1302
	1303	@end example
	1304
	1305
	1306	The default weight is set to 1 (@code{%stdcor 1}), the weight of the exchange
	1307	operation is set to 1 (@code{%xchg 1}), and the three principal orthographic
	1308	errors are assigned the weight 0.5.
	1309
	1310	The edit operation weight declaration, such as
	1311
	1312	@example
	1313	ż rz 0.5
	1314	@end example
	1315
	1316	works in both directions, i.e. ż->rz and rz->ż.
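The Python sketch below shows how such a file could drive a weighted edit distance. It is not the implementation used by @command{kor}, which works on the FSA; it compares two strings directly. It assumes that @code{%xchg} is the cost of swapping two adjacent characters and that every other line declares a symmetric replacement of one string by another. The function names are hypothetical.

```python
def parse_weights(text):
    """Return (default_weight, exchange_weight, {(a, b): weight})."""
    default, exchange, pairs = 1.0, 1.0, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "%stdcor":
            default = float(fields[1])
        elif fields[0] == "%xchg":
            exchange = float(fields[1])
        else:
            a, b, weight = fields
            # A declaration works in both directions.
            pairs[(a, b)] = pairs[(b, a)] = float(weight)
    return default, exchange, pairs


def weighted_distance(source, target, default, exchange, pairs):
    """Cheapest sequence of weighted edit operations turning source into target."""
    n, m = len(source), len(target)
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    def relax(i, j, value):
        if i <= n and j <= m and value < cost[i][j]:
            cost[i][j] = value

    for i in range(n + 1):
        for j in range(m + 1):
            here = cost[i][j]
            relax(i + 1, j, here + default)  # delete source[i]
            relax(i, j + 1, here + default)  # insert target[j]
            if i < n and j < m:
                same = source[i] == target[j]
                relax(i + 1, j + 1, here + (0.0 if same else default))
            # Assumed meaning of %xchg: swap two adjacent characters.
            if (i + 2 <= n and j + 2 <= m
                    and source[i] != source[i + 1]
                    and source[i] == target[j + 1]
                    and source[i + 1] == target[j]):
                relax(i + 2, j + 2, here + exchange)
            # Declared replacements, possibly longer than one character.
            for (a, b), weight in pairs.items():
                if source.startswith(a, i) and target.startswith(b, j):
                    relax(i + len(a), j + len(b), here + weight)
    return cost[n][m]


if __name__ == "__main__":
    weights = parse_weights("%stdcor 1\n%xchg 1\nż rz 0.5\nch h 0.5\nu ó 0.5\n")
    for wrong, right in [("morze", "może"), ("hleb", "chleb"), ("gura", "góra")]:
        print(wrong, right, weighted_distance(wrong, right, *weights))
```

With these weights a typical orthographic error costs 0.5, so it still fits under a distance threshold of 1 together with one further cheap error.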
	1317
	1318	The default weights definition file for @code{kor} is:
	1319
	1320	@example
	1321	$HOME/.local/share/utt/weights.kor
	1322	@end example
	1323
	1324	or, if the above mentioned file is absent:
	1325
	1326	@example
	1327	/usr/local/share/utt/weights.kor
	1328	@end example
	1329
	1330
	1331	@node kor dictionaries
	1332	@subsection Dictionaries
	1333
	1334	@xref{cor dictionaries}.
[261bf62]	1335
	1336	@c ---------------------------------------------------------------------
	1337	@c SEN
	1338	@c ---------------------------------------------------------------------
	1339
[25ae32e]	1340	@page
	1341	@node sen
	1342	@section sen - a sentensizer
	1343
	1344	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1345
[9ace5d2]	1346	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1347	@item @strong{Component category:} @tab filter
[261bf62]	1348	@item @strong{Input format:} @tab UTT regular
	1349	@item @strong{Output format:} @tab UTT regular
	1350	@item @strong{Required annotation:} @tab tok
[25ae32e]	1351
	1352	@end multitable
	1353
	1354
	1355	@menu
[261bf62]	1356	* sen description::
[25ae32e]	1357	@c * sen input::
	1358	@c * sen output::
	1359	* sen example::
	1360	@end menu
	1361
[261bf62]	1362	@node sen description
	1363	@subsection Description
	1364
	1365	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
	1366
[25ae32e]	1367	@node sen example
	1368	@subsection Example
	1369
	1370	@example
	1371	command: sen
	1372
	1373	input:
[9ace5d2]	1374	0000 05 W Cześć
[25ae32e]	1375	0005 01 P !
	1376	0006 01 S _
	1377	0007 02 W To
	1378	0009 01 S _
	1379	0010 02 W ja
	1380	0012 01 P .
	1381	0013 01 S \n
	1382
	1383	output:
	1384	0000 00 BOS *
[9ace5d2]	1385	0000 05 W Cześć
[25ae32e]	1386	0005 01 P !
	1387	0006 00 EOS *
	1388	0006 00 BOS *
	1389	0006 01 S _
	1390	0007 02 W To
	1391	0009 01 S _
	1392	0010 02 W ja
	1393	0012 01 P .
	1394	0013 01 S \n
	1395	0014 00 EOS *
	1396	@end example
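A downstream script can use the BOS and EOS markers to group segments into sentences. The Python sketch below does this for output like the one above. It assumes whitespace-separated fields in the order start, length, type, form, annotations; the function names are hypothetical and the sketch does not reproduce how @command{sen} finds the boundaries.

```python
def parse_segment(line):
    """Split a UTT line into (start, length, type, form, annotations)."""
    start, length, seg_type, form, *annotations = line.split()
    return int(start), int(length), seg_type, form, annotations


def sentences(lines):
    """Group segment forms into sentences delimited by BOS and EOS markers."""
    result, current = [], None
    for line in lines:
        _, _, seg_type, form, _ = parse_segment(line)
        if seg_type == "BOS":
            current = []
        elif seg_type == "EOS":
            if current is not None:
                result.append(current)
            current = None
        elif current is not None:
            current.append(form)
    return result


if __name__ == "__main__":
    sample = [
        "0000 00 BOS *", "0000 05 W Cześć", "0005 01 P !", "0006 00 EOS *",
        "0006 00 BOS *", "0006 01 S _", "0007 02 W To", "0009 01 S _",
        "0010 02 W ja", "0012 01 P .", r"0013 01 S \n", "0014 00 EOS *",
    ]
    for sentence in sentences(sample):
        print(sentence)
```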
	1397
	1398
	1399	@c ---------------------------------------------------------------------
	1400	@c GPH
	1401	@c ---------------------------------------------------------------------
	1402
	1403	@c @node gph - graphizer
	1404	@c @chapter gph - graphizer
	1405
[9ace5d2]	1406	@c Authors: Tomasz Obrębski
[25ae32e]	1407
	1408
	1409
	1410	@c ---------------------------------------------------------------------
[261bf62]	1411	@c SER
[25ae32e]	1412	@c ---------------------------------------------------------------------
	1413
	1414	@page
	1415	@node ser
	1416	@section ser - pattern search tool
	1417
	1418	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	1419	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1420	@item @strong{Component category:} @tab filter
[261bf62]	1421	@item @strong{Input format:} @tab UTT regular
	1422	@item @strong{Output format:} @tab UTT regular
	1423	@item @strong{Required annotation:} @tab tok, lem --one-field
[25ae32e]	1424	@end multitable
	1425
	1426	@menu
[261bf62]	1427	* ser description::
[25ae32e]	1428	* ser command line options::
	1429	* ser pattern::
	1430	* ser how ser works::
	1431	* ser customization::
	1432	* ser limitations::
	1433	* ser requirements::
	1434	@end menu
	1435
	1436
[261bf62]	1437	@node ser description
	1438	@subsection Description
	1439
	1440	@command{ser} looks for patterns in UTT-formatted texts.
	1441
	1442
[25ae32e]	1443	@c ---------------------------------------------------------------------
	1444	@node ser command line options
	1445	@subsection Command line options
	1446
	1447	@table @code
	1448
	1449	@parhelp
	1450	@parversion
	1451	@c @parfile
	1452	@c @paroutput
	1453	@c @parinputfield
	1454	@c @paroutputfield
	1455	@parprocess
	1456	@parinteractive
	1457
	1458	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1459	The search pattern.
	1460
	1461	@item @b{@minus{}@minus{}morph=@var{field}}
	1462	The name of the annotation field containing the morphological
	1463	description (default @code{lem}).
	1464
	1465	@item @b{@minus{}@minus{}flex}
	1466	Only print the generated flex source code.
	1467
	1468	@item @b{@minus{}@minus{}macro=@var{filename}}
	1469	Read macrodefinitions from file @var{filename} rather than from the
	1470	default location. This option allows the set of terms to be redefined.
	1471
	1472	@item @b{@minus{}@minus{}define=@var{filename}}
	1473	Append macrodefinitions from file @var{filename}. This option
	1474	allows the set of terms to be extended.
	1475
	1476	@end table
	1477
	1478
	1479	@c ---------------------------------------------------------------------
	1480	@node ser pattern
	1481	@subsection Pattern
	1482
	1483	The @command{ser} pattern is a regular expression over terms corresponding
	1484	to text segments or segment sequences. Predefined terms are:
	1485
	1486	@table @code
	1487
	1488	@item seg(@var{t},@var{f},@var{a})
	1489	a segment of type @var{t}, containing form @var{f} and annotation
	1490	@var{a}
	1491
	1492	@item form(@var{f})
	1493	a segment containing form @var{f}
	1494
	1495	@item field(@var{f})
	1496	a segment containing annotation field @var{f}
	1497
	1498	@item space(@var{f})
	1499	a space segment of form @var{f}
	1500
	1501	@item word(@var{f})
	1502	a word segment of form @var{f}
	1503
	1504	@item punct(@var{f})
	1505	a punct segment of form @var{f}
	1506
	1507	@item number(@var{f})
	1508	a number segment of form @var{f}
	1509
	1510	@item lexeme(@var{f})
	1511	a word segment with lemma @var{f}
	1512
	1513	@item cat(@var{c})
	1514	a word segment of category @var{c}
	1515
	1516	@end table
	1517
	1518	All arguments are optional. If an argument is omitted, an arbitrary
	1519	string of non-blank characters is assumed as the argument value. Term
	1520	arguments may be arbitrary character-level regular expressions. The
	1521	following special symbols can be used:
	1522
	1523	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1524	@item @code{[@dots{}]} @tab a character class
	1525	@item @code{[^@dots{}]} @tab a negated character class
	1526	@item @code{|} @tab alternative
	1527	@item @code{*} @tab repetition, including zero times
	1528	@item @code{+} @tab repetition, at least one time
	1529	@item @code{?} @tab optionality
	1530	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
	1531	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
	1532	@item @code{@{@var{m}@}} @tab repetition @var{m} times
	1533	@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
	1534	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
	1535	@item @code{( )} @tab parentheses, used to override precedence
	1536	@c @end multitable
	1537
	1538	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1539	@item @code{.} @tab a non-blank character
	1540	@item @code{\w} @tab a letter
	1541	@item @code{\W} @tab a non-blank character other than a letter
	1542	@item @code{\d} @tab a digit
	1543	@item @code{\D} @tab a non-blank character other than a digit
	1544	@item @code{\s} @tab a space or tab character
	1545	@item @code{\S} @tab a non-blank character (the same as @code{.})
	1546	@item @code{\l} @tab a lowercase letter
	1547	@item @code{\L} @tab an uppercase letter
	1548	@end multitable
	1549
	1550
	1551	@noindent The following characters:
	1552	@example
	1553	@verb{% [ ] ^ | * + ? { } , . < > \ %}
	1554	@end example
	1555	must be escaped with a backslash, i.e. written as:
	1556	@example
	1557	@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
	1558	@end example
	1559
	1560	@quotation Note
	1561	The special symbols are borrowed from Perl, with minor
	1562	modifications made for convenience.
	1563	The meaning of certain special characters and sequences therefore differs
	1564	slightly from their common meaning.
	1565	In particular, the meaning of the @code{.} special character is modified because of
	1566	the special function of spaces in UTT files (they are field
	1567	separators). Use @code{\s} to explicitly match a space or tab character.
	1568	@end quotation
	1569
	1570	In the argument of the @code{cat} term a special operator <...> may be
	1571	used. A category specification enclosed in angle brackets matches all
	1572	category descriptions which are consistent (non-contradictory) with the
	1573	specification. For example @code{<N>} matches all noun descriptions,
	1574	@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
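The notion of consistency can be sketched in Python as follows. This is an illustration only, not the code generated by @command{ser}. It assumes that a description has the layout @code{POS/} followed by attributes, each an uppercase letter followed by its lowercase values (as in @code{ADJ/DsNpCnGp}), and that an attribute absent from the description never contradicts the specification. The function names are hypothetical.

```python
import re


def parse_category(text):
    """Split 'ADJ/DsNpCnGp' into ('ADJ', {'D': {'s'}, 'N': {'p'}, ...})."""
    pos, _, attributes = text.partition("/")
    values = {name: set(vals)
              for name, vals in re.findall(r"([A-Z])([a-z0-9]*)", attributes)}
    return pos, values


def consistent(specification, description):
    """True if the description does not contradict the <...> specification."""
    spec_pos, spec_attrs = parse_category(specification.strip("<>"))
    desc_pos, desc_attrs = parse_category(description)
    if spec_pos != desc_pos:
        return False
    for name, wanted in spec_attrs.items():
        present = desc_attrs.get(name)
        # A contradiction needs the attribute on both sides with no shared value.
        if present and wanted and not (wanted & present):
            return False
    return True


if __name__ == "__main__":
    print(consistent("<ADJ/Can>", "ADJ/DsNpCnGp"))
    print(consistent("<ADJ/Can>", "ADJ/CgDpGaNs"))
```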
	1575
	1576
	1577	@*
	1578	@noindent @b{Examples of one-segment patterns:}
	1579
	1580	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1581	@item @code{seg} @tab any segment
	1582	@item @code{word} @tab any word-form
	1583	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
	1584	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
	1585	@item @code{word(\L\l+)} @tab a capitalized word-form
	1586	@item @code{punct} @tab a punctuation character
	1587	@item @code{space(.*\\n.*)} @tab a space segment containing a newline character
	1588	@item @code{lexeme(pomoc)} @tab any form of the lexeme 'pomoc'
	1589	@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
	1590	@item @code{cat(<N/Ca>)} @tab a word whose category matches @code{N/Ca}
	1591	@end multitable
	1592
	1593	@*
	1594	@noindent @b{Examples of multi-segment patterns:}
	1595
	1596	@table @code
	1597
	1598	@item (word(\L) punct(\.) space?)+ word(\L\l+)
	1599	a sequence of initials followed by a surname
	1600
	1601	@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
	1602	a text fragment between two punctuation characters, containing an
	1603	occurrence of a relative pronoun
	1604
	1605	@end table
	1606
	1607
	1608	@node ser how ser works
	1609	@subsection How ser works
	1610
	1611	@node ser customization
	1612	@subsection Customization
	1613
	1614	@c All predefined terms correspond to single segments,
	1615
	1616	@example
[261bf62]	1617	define(`verbseq', `(cat(<V>) (space cat(<V>)))')
[25ae32e]	1618	@end example
	1619
	1620
	1621	the term @code{cat()} may not be used as a ... of
	1622
	1623	@c See @command{m4} manual for further details on macro definition format.
	1624
	1625	@node ser limitations
	1626	@subsection Limitations
	1627
[261bf62]	1628	Do not use more than 3 attributes in <>.
[25ae32e]	1629
	1630	@node ser requirements
	1631	@subsection Requirements
	1632
	1633	In order to run @command{ser}, the following programs must be
	1634	installed in the system:
	1635
	1636	@itemize
	1637
	1638	@item @command{m4}
	1639	@item @command{grep}
	1640	@item @command{flex}
	1641	@item @command{gcc}
	1642
	1643	@end itemize
	1644
	1645
	1646	@c ---------------------------------------------------------------------
[261bf62]	1647	@c GRP
[25ae32e]	1648	@c ---------------------------------------------------------------------
	1649
	1650	@page
	1651	@node grp
	1652	@section grp - pattern search tool
	1653
	1654	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	1655	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1656	@item @strong{Component category:} @tab filter
[261bf62]	1657	@item @strong{Input format:} @tab UTT flattened
	1658	@item @strong{Output format:} @tab UTT flattened
	1659	@item @strong{Required annotation:} @tab tok, sen, lem --one-field
[25ae32e]	1660	@end multitable
	1661
	1662
[261bf62]	1663	@menu
	1664	* grp description::
	1665	* grp command line options::
	1666	* grp pattern::
	1667	* grp hints::
	1668	@end menu
	1669
	1670
	1671	@node grp description
	1672	@subsection Description
	1673
[25ae32e]	1674	@code{grp} selects sentences containing an expression matching a
	1675	pattern. The pattern format is exactly the same as that accepted by
	1676	@code{ser}.
	1677
	1678	@code{grp} is intended mainly for speeding up the corpus search process.
	1679	It is extremely fast (processing speed is usually higher than the speed
	1680	of reading the corpus file from disk).
	1681
	1682	@node grp command line options
	1683	@subsection Command line options
	1684
	1685	@table @code
	1686
	1687	@parhelp
	1688	@parversion
	1689	@parprocess
	1690	@parinteractive
	1691
	1692	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1693	The search pattern.
	1694
	1695	@item @b{@minus{}@minus{}morph=@var{field}}
	1696	The name of the annotation field containing the morphological
	1697	description (default @code{lem}).
	1698
	1699	@item @b{@minus{}@minus{}command}
	1700	Only print the generated flex source code.
	1701
	1702	@item @b{@minus{}@minus{}macro=@var{filename}}
	1703	Read macrodefinitions from file @var{filename} rather than from the
	1704	default location. This option allows the set of terms to be redefined.
	1705
	1706	@item @b{@minus{}@minus{}define=@var{filename}}
	1707	Append macrodefinitions from file @var{filename}. This option
	1708	allows the set of terms to be extended.
	1709
	1710	@end table
	1711
	1712
	1713	@node grp pattern
	1714	@subsection Pattern
	1715
	1716	(see @code{ser})
	1717
	1718	@node grp hints
	1719	@subsection Hints
	1720
	1721	The corpus search speed may be increased by combining @command{grp} with the @command{lzop}
	1722	compression tool (@command{grp} usually processes data faster than it can be read from
	1723	disk, especially on slow laptop drives).
	1724
	1725	@example
[e28a625]	1726	cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
[25ae32e]	1727	@end example
	1728
	1729	@example
[e28a625]	1730	lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
[25ae32e]	1731	@end example
	1732
	1733
[261bf62]	1734
[25ae32e]	1735	@c ---------------------------------------------------------------------
[261bf62]	1736	@c MAR
[25ae32e]	1737	@c ---------------------------------------------------------------------
[261bf62]	1738
	1739	@page
	1740	@node mar
	1741	@section mar
	1742
	1743	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	1744	@item @strong{Authors:} @tab Marcin Walas, Tomasz Obrębski
[e28a625]	1745	@item @strong{Input format:} @tab UTT flattened
	1746	@item @strong{Output format:} @tab UTT flattened
	1747	@item @strong{Required annotation:} @tab tok, sen, lem -1
[261bf62]	1748	@end multitable
	1749
[2d89d4b]	1750	@subsection Description
	1751	@code{mar} is a Perl script that matches a given pattern against UTT-formatted text
	1752	and marks the matching parts with any number of user-defined tags.
	1753
	1754	@subsection Command line options
	1755	@table @code
	1756	@parhelp
	1757	@parversion
	1758
	1759	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1760	The search pattern.
	1761	@item @b{@minus{}@minus{}action=@var{action}, @minus{}a @var{action} [p] [s] [P]}
	1762	Perform only indicated actions. Where:
	1763	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1764	@item @code{p} @tab preprocess
	1765	@item @code{s} @tab search
	1766	@item @code{P} @tab postprocess
	1767	@end multitable
	1768	default: psP
	1769
	1770	@item @b{@minus{}@minus{}command}
	1771	print generated sed command, then exit
	1772
	1773	@item @b{@minus{}@minus{}help, @minus{}h}
	1774	print help, then exit
	1775
	1776	@item @b{@minus{}@minus{}version, @minus{}v}
	1777	print version, then exit
	1778	@end table
	1779	@subsection Tokens in pattern
	1780	A @code{mar} pattern is based on @code{ser} patterns (@pxref{ser pattern}). It is a @code{ser} pattern
	1781	to which you can add any number of matching tags, which will be printed in exactly the place where
	1782	they appear in the pattern. A valid token starts with @@ followed by any number of alphanumeric
	1783	characters. For example, valid match tokens are: @@STARTMATCH @@ENDMATCH
	1784
	1785	Matching tokens can be placed between, before or after any @code{ser} pattern terms. They don't have
	1786	to be paired. There can be any number of them in the pattern (zero or more). They don't have to be unique.
	1787	They can be placed one after another. For example:
	1788
	1789	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaa}
	1790	@item @code{@@BOM lexeme(pomoc)} @tab place tag @b{BOM} before any form of the lexeme 'pomoc'
	1791	@item @code{@@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc'
	1792	@item @code{cat(<ADJ>) @@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc' that is preceded by an adjective
	1793	@item @code{cat(<ADJ>) @@TAG @@BOM lexeme(pomoc) @@EOM} @tab place tags @b{TAG} and @b{BOM} before, and tag @b{EOM} after, any form of the lexeme 'pomoc' that is preceded by an adjective
	1794	@end multitable
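The token syntax described above is easy to recognize mechanically. The Python sketch below separates the tags from the underlying @code{ser} pattern. It only illustrates the syntax (an @@ sign followed by alphanumeric characters) and is not how @code{mar} itself processes patterns; the function name is hypothetical, and collapsing the leftover whitespace is a simplification.

```python
import re

# A tag is an at-sign followed by one or more alphanumeric characters.
TAG = re.compile(r"@([A-Za-z0-9]+)")


def split_tags(pattern):
    """Return (list of tag names in order, pattern with the tags removed)."""
    tags = TAG.findall(pattern)
    ser_pattern = " ".join(TAG.sub(" ", pattern).split())
    return tags, ser_pattern


if __name__ == "__main__":
    print(split_tags("cat(<ADJ>) @TAG @BOM lexeme(pomoc) @EOM"))
```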
	1795
	1796	(see mar's help 'mar -h' for some more information)
	1797
	1798	@subsection How mar works
	1799	@code{mar} uses the @code{m4} macro processor to translate the given @code{ser} pattern into a regular expression. It then converts the expression into a @code{sed} command script and executes that script.
	1800
	1801	You can see the generated sed script by using the @code{@minus{}@minus{}command} option.
	1802	@subsection Limitations
	1803	The complexity of the computations performed by @code{mar} increases linearly with the number of tokens placed in the pattern, so it is highly recommended not to use too many tokens.
	1804	@subsection Requirements
	1805	In order to run @code{mar}, the following programs must be installed in the system:
	1806
	1807	@itemize
	1808
	1809	@item @command{m4}
	1810	@item @command{grep}
	1811	@item @command{sed}
	1812
	1813	@end itemize
	1814
[261bf62]	1815
[e28a625]	1816
[261bf62]	1817	@c ---------------------------------------------------------------------
	1818	@c KOT
[25ae32e]	1819	@c ---------------------------------------------------------------------
	1820
	1821	@page
	1822	@node kot
	1823	@section kot - untokenizer
	1824
[261bf62]	1825	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	1826	@item @strong{Authors:} @tab Tomasz Obrębski
[261bf62]	1827	@item @strong{Component category:} @tab filter
	1828	@item @strong{Input format:} @tab UTT regular
	1829	@item @strong{Output format:} @tab text
	1830	@item @strong{Required annotation:} @tab tok
	1831	@end multitable
[25ae32e]	1832
	1833
	1834	@menu
[261bf62]	1835	* kot description::
[25ae32e]	1836	* kot command line options::
	1837	* kot usage examples::
	1838	@end menu
	1839
[261bf62]	1840	@node kot description
	1841	@subsection Description
	1842
	1843	@command{kot} transforms a UTT formatted file back into raw text format.
	1844
[25ae32e]	1845	@node kot command line options
	1846	@subsection Command line options
	1847
	1848	@table @code
	1849
	1850	@parhelp
	1851
	1852	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1853
	1854	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1855
	1856	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1857
	1858	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1859
	1860	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1861
	1862	@item
	1863
	1864	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
	1865	print @var{string} between nonadjacent segments of the input file
	1866
	1867	@item @b{@minus{}@minus{}spaces, @minus{}r}
	1868	retain the special characters @code{_}, @code{\t},
	1869	@code{\n}, @code{\r}, @code{\f} unexpanded in the output
	1870
	1871	@end table
	1872
	1873	@node kot usage examples
	1874	@subsection Usage examples
	1875
	1876	@example
	1877	cat legia.txt | tok | kot
	1878	@end example
	1879
	1880	@example
	1881	cat legia.txt | tok | lem -1 | kot
	1882	@end example
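What @command{kot} does to its input can be approximated by the Python sketch below. It is a simplified illustration, not the actual implementation. It assumes that the special characters are expanded only in space segments (type @code{S}), that zero-length segments such as BOS and EOS markers are skipped, and that two segments are adjacent when the start of the second equals the start plus the length of the first. The parameter names @code{gap_fill} and @code{keep_spaces} mirror the two options above but are otherwise hypothetical.

```python
import re

# Special characters used in space segments and their expansions.
ESCAPES = {"_": " ", "\\t": "\t", "\\n": "\n", "\\r": "\r", "\\f": "\f"}
ESCAPE_RE = re.compile(r"_|\\[tnrf]")


def untokenize(lines, gap_fill="", keep_spaces=False):
    """Concatenate segment forms back into raw text."""
    pieces = []
    expected = None  # position where the next adjacent segment would start
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        start, length, seg_type, form = int(fields[0]), int(fields[1]), fields[2], fields[3]
        if length == 0:
            continue  # zero-length marker, carries no text
        if expected is not None and start != expected:
            pieces.append(gap_fill)
        if seg_type == "S" and not keep_spaces:
            form = ESCAPE_RE.sub(lambda m: ESCAPES[m.group()], form)
        pieces.append(form)
        expected = start + length
    return "".join(pieces)


if __name__ == "__main__":
    sample = ["0000 02 W To", "0002 01 S _", "0003 02 W ja", "0005 01 P ."]
    print(untokenize(sample))
```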
	1883
[261bf62]	1884	@c ---------------------------------------------------------------
	1885	@c CON
	1886	@c ---------------------------------------------------------------
	1887
[25ae32e]	1888
	1889	@page
	1890	@node con
	1891	@section con - concordance table generator
	1892
	1893	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1894	@item @strong{Authors:} @tab Justyna Walkowska
	1895	@item @strong{Component category:} @tab sink
[261bf62]	1896	@item @strong{Input format:} @tab UTT regular
	1897	@item @strong{Output format:} @tab text
	1898	@item @strong{Required annotation:} @tab ser or mar
[25ae32e]	1899	@end multitable
	1900	@c
	1901
	1902	@menu
[261bf62]	1903	* con description::
[25ae32e]	1904	* con command line options::
	1905	* con usage example::
	1906	* con hints::
	1907	@end menu
	1908
[261bf62]	1909
	1910	@node con description
	1911	@subsection Description
	1912
	1913	@command{con} generates a concordance table based on a pattern given to @command{ser}.
	1914
	1915
[25ae32e]	1916	@node con command line options
	1917	@subsection Command line options
	1918
	1919	@table @code
	1920
	1921	@parhelp
	1922
	1923	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
	1924	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1925	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1926	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1927	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
	1928	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
	1929	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	1930	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	1931	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
	1932	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1933	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1934	@c @item
	1935	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1936	@c search pattern
	1937	@c
	1938	@c @item @b{@minus{}@minus{}flex}
	1939	@c only print the generated flex source code
	1940	@c
	1941	@c @item @b{@minus{}@minus{}macro=@var{filename}}
	1942	@c read macrodefinitions from file @var{filename} rather than from
	1943	@c default location. This option allows to redefine the set of terms.
	1944	@c
	1945	@c @item @b{@minus{}@minus{}define=@var{filename}}
	1946	@c append macrodefinitions from file @var{filename}. This option
	1947	@c allows to extend the set of terms.
	1948
	1949	@item @b{@minus{}@minus{}left @minus{}l}
	1950	Left context info (default='30c'). Example:
	1951	@example
	1952	-l=5c: left context is 5 characters
	1953	-l=5w: left context is 5 words
	1954	-l=5s: left context is 5 non-empty input lines
	1955	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
	1956	@end example
	1957
	1958	@item @b{@minus{}@minus{}right @minus{}r}
	1959	Right context info (default='30c').
	1960	@item @b{@minus{}@minus{}trim @minus{}t}
	1961	Clear incomplete words from output.
	1962	@item @b{@minus{}@minus{}white @minus{}w}
	1963	Do not convert whitespace characters into spaces.
	1964	@item @b{@minus{}@minus{}column @minus{}c}
	1965	Left column minimal width in characters (default = 0).
	1966	@item @b{@minus{}@minus{}ignore @minus{}i}
	1967	Ignore segment inconsistency in the input.
[261bf62]	1968	@item @b{@minus{}@minus{}bom}
[25ae32e]	1969	Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
[261bf62]	1970	@item @b{@minus{}@minus{}eom}
[25ae32e]	1971	End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
	1972	@item @b{@minus{}@minus{}bod}
	1973	Selected segment beginning display string (default='[').
	1974	@item @b{@minus{}@minus{}eod}
	1975	Selected segment end display string (default=']').
	1976
	1977
	1978
	1979	@end table
	1980
	1981	@node con usage example
	1982	@subsection Usage example
	1983	@example
[261bf62]	1984	cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
[25ae32e]	1985	@end example
	1986
	1987
	1988	@node con hints
	1989	@subsection Hints
	1990
	1991	@command{con} is a rather slow program, so avoid passing large amounts of
	1992	redundant text through it. @command{con} works well in the following
	1993	sequence:
	1994
	1995	@example
	1996	... | grp -e EXPR | ser -e EXPR | con
	1997	@end example
	1998
	1999
	2000	@c ---------------------------------------------------------------------
	2001	@c ---------------------------------------------------------------------
	2002
	2003	@page
	2004	@node Auxiliary tools
	2005	@chapter Auxiliary tools
	2006
	2007	@menu
	2008	* compiledic:: dictionary compiler
	2009	* fla:: UTT file flattener
	2010	* unfla:: UTT file unflattener
	2011	@end menu
	2012
	2013
	2014	@page
	2015	@node compiledic
	2016	@section compiledic - the dictionary compiler
	2017
	2018	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	2019	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
[25ae32e]	2020	@item @strong{Component category:} @tab additional tool
	2021	@end multitable
	2022	@c
	2023
	2024	@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
	2025	(FSA) format (@code{.bin} extension).
	2026
	2027	The automaton representation of a dictionary is built using the following AT&T tools:
	2028	@itemize
	2029	@item AT&T FSM Library,
	2030	@item AT&T Lextools.
	2031	@end itemize
	2032
	2033	For @command{compiledic} to work, both of the
	2034	above-mentioned packages must be installed on your system. They are freely available
	2035	for non-commercial use.
	2036
	2037	Usage:
	2038	@example
	2039	compiledic <dictionaryname>.dic
	2040	@end example
	2041
	2042	The file <dictionaryname>.bin will be generated.
	2043
	2044	Note: The program produces many temporary files, which are
	2045	stored in the current directory. They are deleted after successful
	2046	termination of the program.
	2047
	2048	@c @menu
	2049	@c * con command line options::
	2050	@c * con usage example::
	2051	@c * con hints::
	2052	@c @end menu
	2053
	2054
[e28a625]	2055	@c -------------------------------------------------------------------------------
	2056	@c FLA
	2057	@c -------------------------------------------------------------------------------
	2058
[25ae32e]	2059	@page
	2060	@node fla
	2061	@section fla - the UTT file flattener
	2062
	2063	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	2064	@item @strong{Authors:} @tab Tomasz Obrębski
[e28a625]	2065	@item @strong{Input format:} @tab UTT regular
	2066	@item @strong{Output format:} @tab UTT flattened
	2067	@item @strong{Required annotation:} @tab sen
[25ae32e]	2068	@end multitable
	2069	@c
	2070
[e28a625]	2071	@menu
	2072	* fla description::
	2073	@c * fla command line options::
	2074	@c * fla usage example::
	2075	@end menu
	2076
	2077
	2078	@node fla description
	2079	@subsection Description
	2080
[25ae32e]	2081	@command{fla} ``flattens'' a UTT file by merging the segments belonging
	2082	to one sentence into one line. Technically, end-of-line characters
	2083	('\n', ASCII code 10) are replaced with form-feed characters ('\f',
	2084	ASCII code 12). Flattening makes it possible to process UTT files
	2085	sentence by sentence with tools such as @command{grep} or @command{sed}
	2086	(this technique is used in @command{grp} and @command{mar}).
	2087
	2088	Flattened files should have the suffix @code{.fla}, for example @file{thetext.utt.fla}.
	2089
	2090	Flattened files are still human-readable.
	2091
	2092	Usage:
	2093
	2094	@example
	2095	fla [<bosregex>]
	2096	@end example
	2097
	2098	The optional argument is a regular expression describing segments
	2099	which should be treated as sentence beginnings (a segment qualifies if
	2100	it contains a fragment matching @code{<bosregex>}). By
	2101	default, segments containing the field @code{BOS} are matched.
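
The replacement of end-of-line characters can be imitated with standard tools. The following is only a sketch of the idea, not the actual implementation of @command{fla}: the function names @code{flatten} and @code{unflatten}, the sample segments, and the use of @command{awk} and @command{tr} are assumptions made for illustration.

```shell
# Hypothetical sample of UTT segments; the layout of the fields is assumed.
# Segments containing BOS mark sentence beginnings.
sample='0000 00 BOS *
0000 03 W Ala
0003 01 S _
0004 02 W ma
0006 00 BOS *
0006 03 W Kot'

# Flatten: start a new output line at every segment matching the regex
# (default: BOS) and join every other segment to the previous one with
# a form feed (ASCII code 12).
flatten() {
  awk -v bos="${1:-BOS}" '
    NR > 1 { printf "%s", (($0 ~ bos) ? "\n" : "\f") }
    { printf "%s", $0 }
    END { if (NR > 0) printf "\n" }'
}

# Unflatten: turn the form feeds back into end-of-line characters.
unflatten() { tr '\f' '\n'; }

flat=$(printf '%s\n' "$sample" | flatten)
restored=$(printf '%s\n' "$flat" | unflatten)
printf '%s\n' "$restored"
```

In this sketch each sentence occupies exactly one line of the flattened text, so a line-oriented tool such as @command{grep} selects or rejects whole sentences, and unflattening gives back the original segments.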
	2102
[e28a625]	2103	@c -------------------------------------------------------------------------------
	2104	@c UNFLA
	2105	@c -------------------------------------------------------------------------------
[25ae32e]	2106
	2107	@page
	2108	@node unfla
	2109	@section unfla - the UTT file unflattener
	2110
	2111	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	2112	@item @strong{Authors:} @tab Tomasz Obrębski
[e28a625]	2113	@item @strong{Input format:} @tab UTT flattened
	2114	@item @strong{Output format:} @tab UTT regular
	2115	@item @strong{Required annotation:} @tab -
[25ae32e]	2116	@end multitable
	2117
[e28a625]	2118	@menu
	2119	* unfla description::
	2120	@c * fla command line options::
	2121	@c * fla usage example::
	2122	@end menu
	2123
	2124	@node unfla description
	2125	@subsection Description
[25ae32e]	2126	@command{unfla} transforms a flattened UTT file, produced by
	2127	@command{fla}, into the regular format by restoring end-of-line
	2128	characters.
	2129
	2130
	2131
	2132
	2133	@c ---------------------------------------------------------------------
	2134	@c USAGE EXAMPLES
	2135	@c ---------------------------------------------------------------------
	2136
	2137	@node Usage examples
	2138	@chapter Usage examples
	2139
	2140	@subsubheading Simple pipelines
	2141
	2142	@enumerate
	2143
	2144	@item tokenization
	2145
	2146	cat text | tok > output1
	2147
	2148	@item morphological annotation (1)
	2149
	2150	simple dictionary-based lemmatization
	2151
	2152	cat text | tok | lem > output1
	2153
	2154	@item morphological annotation (2)
	2155
	2156	1) perform dictionary-based lemmatization
	2157	2) guess descriptions for words which have no annotation
	2158
	2159	@example
	2160	cat text | tok | lem | gue -S lem > output2
	2161	@end example
	2162
	2163	@item morphological annotation (3)
	2164
	2165	1) perform dictionary-based lemmatization
	2166	2) try to correct words with no annotation
	2167	3) perform dictionary-based lemmatization of corrected words
	2168	4) guess descriptions for words which still have no annotation
	2169
	2170	@example
	2171	cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
	2172	@end example
	2173	@item spelling correction
	2174
	2175
	2176
	2177	@example
[e28a625]	2178	cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
[25ae32e]	2179	@end example
	2180
	2181	@item Expression extraction
	2182
	2183	Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
	2184
	2185	@example
	2186	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
	2187	@end example
	2188
	2189	@item A word in context
	2190
	2191	Extraction of text fragments containing a form of the lexeme 'rozmowa' in
	2192	the context of 5 preceding and 5 following corpus segments.
	2193
	2194	@example
	2195	cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
	2196	@end example
	2197
	2198	@item generation of concordance table (1)
	2199
	2200	@example
	2201	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
	2202	@end example
	2203
	2204	Running time: 10 seconds.
	2205
	2206	@item generation of concordance table (2)
	2207
	2208	The same as above but much faster
	2209
	2210	@example
	2211	cat text | tok | lem -1 | \
	2212	grp -e 'cat(<V>) space lexeme(rozmowa)' | \
	2213	ser -e 'cat(<V>) space lexeme(rozmowa)' | \
	2214	con
	2215	@end example
	2216
	2217	Running time: 2 seconds.
	2218
	2219	@item generation of concordance table (3)
	2220
	2221	Usually, one searches the same corpus repeatedly. In
	2222	such a case it is advisable to transform the corpus data into the format
	2223	required by @command{grp} first, and then use the preprocessed data.
	2224
	2225	As @command{grp} (@command{grep}) processes data faster than it can be
	2226	read from the disk drive, the search time may be shortened further by
[e28a625]	2227	using file compression techniques. We suggest using the
	2228	@command{lzop} compressor/decompressor.
[25ae32e]	2229
	2230	@item the fastest way to search a large corpus
	2231
[e28a625]	2232	step 1: corpus preprocessing
[25ae32e]	2233
	2234	@example
	2235	cat corpus | tok | sen | lem -1 \
[e28a625]	2236	| fla | lzop -7 > corpus.grp.lzo
[25ae32e]	2237	@end example
	2238
	2239	step 2: search
	2240
	2241	@example
[e28a625]	2242	lzop -cd corpus.grp.lzo | unfla | grp -e 'cat(<V>) space lexeme(rozmowa)' \
[25ae32e]	2243	| ser -e 'cat(<V>) space lexeme(rozmowa)' | con
	2244	@end example
	2245
	2246	@end enumerate
	2247
[e28a625]	2248	@c @subsubheading More complicated configurations
[25ae32e]	2249
	2250
[e28a625]	2251	@c @example
	2252	@c mknod fifo1 p
	2253	@c mknod fifo2 p
	2254	@c mknod fifo3 p
	2255	@c mknod fifo4 p
	2256	@c mknod fifo5 p
	2257
	2258	@c tok \| lem -p W -e fifo1 > fifo2 &
	2259	@c cor -e fifo3 < fifo1 \| lem > fifo4 &
	2260	@c gue < fifo3 > fifo5 &
	2261	@c sort -m fifo2 fifo4 fifo5
	2262
	2263	@c rm fifo?
	2264	@c @end example
[25ae32e]	2265
	2266
	2267	@c ---------------------------------------------------------------------
	2268	@c ---------------------------------------------------------------------
	2269
	2270	@c ---------------------------------------------------------------------
	2271	@c PMDBF DICTIONARY
	2272	@c ---------------------------------------------------------------------
	2273
	2274	@node PMDBF dictionary
	2275	@chapter PMDBF dictionary
	2276
	2277	UTT components come with lexical data derived from the Polish
	2278	Morphological Database (PMDB).
	2279
	2280	@menu
	2281	* PMDBF files::
	2282	* PMDBF tag structure::
	2283	* PMDBF parts of speech::
	2284	* PMDBF morphosyntactic attributes::
	2285	@end menu
	2286
	2287	@node PMDBF files
	2288	@section Files
	2289
	2290	@node PMDBF tag structure
	2291	@section Tag structure
	2292
	2293	pos = [[:upper:]]+
	2294
	2295	attr = [[:upper:]]+
	2296
	2297	val = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
	2298
	2299	descr = pos ( / ( attr val + ) + ) ?
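
The grammar above can be transcribed into a single extended regular expression and used to check whether a string is a well-formed description. This is a sketch only: the pattern, the function name @code{is_descr} and the sample tags are assumptions made for illustration (the tags are composed from the attribute tables in this chapter, not taken from the dictionary).

```shell
# ERE transcription of: descr = pos ( / ( attr val+ )+ )?
# A newline cannot occur inside a value here, because grep reads line by line.
descr_re='^[[:upper:]]+(/([[:upper:]]+([[:lower:][:digit:]?!*+-]|<[^>]+>)+)+)?$'

# Succeeds if the argument is a well-formed description.
is_descr() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq "$descr_re"
}

# Two well-formed and two malformed sample strings.
for tag in 'N/GfNsCn' 'ADJ' 'n/Cn' 'N/C'; do
  if is_descr "$tag"; then
    echo "$tag: well-formed"
  else
    echo "$tag: malformed"
  fi
done
```

The third sample is rejected because the part of speech must be written in upper case, and the fourth because an attribute must be followed by at least one value.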
	2300
	2301	@node PMDBF parts of speech
	2302	@section Parts of speech
	2303
	2304	@multitable {ADJPRP} { adjectival-passive-participle }
	2305	@item @code{N} @tab noun
	2306	@item @code{NPRO} @tab nominal-pronoun
	2307	@item @code{NV} @tab deverbal-noun
	2308	@item @code{V} @tab verb
	2309	@item @code{BYC} @tab byc
	2310	@item @code{VNI} @tab non-inflected-verb
	2311	@item @code{ADJ} @tab adjective
	2312	@item @code{ADJPAP} @tab adjectival-passive-participle
	2313	@item @code{ADJPRP} @tab adjectival-present-participle
	2314	@item @code{ADJPP} @tab adjectival-past-participle
	2315	@item @code{ADJPRO} @tab adjectival-pronoun
	2316	@item @code{ADJNUM} @tab adjectival-numeral
	2317	@item @code{ADV} @tab adverb
	2318	@item @code{ADVANP} @tab adverbial-anterior-participle
	2319	@item @code{ADVPRP} @tab adverbial-present-participle
	2320	@item @code{ADVPRO} @tab adverbial-pronoun
	2321	@item @code{ADVNUM} @tab adverbial-numeral
	2322	@item @code{P} @tab preposition
	2323	@item @code{PPRO} @tab prep-noun-pronoun
	2324	@item @code{CONJ} @tab conjunction
	2325	@item @code{EXCL} @tab exclamation
	2326	@item @code{APP} @tab call
	2327	@item @code{ONO} @tab onomatopoeia
	2328	@item @code{PART} @tab particle
	2329	@item @code{NUMCRD} @tab cardinal-numeral
	2330	@item @code{NUMCOL} @tab collective-numeral
	2331	@item @code{NUMPAR} @tab partitive-numeral
	2332	@item @code{NUMORD} @tab ordinal-numeral
	2333	@end multitable
	2334
	2335	@node PMDBF morphosyntactic attributes
	2336	@section Morphosyntactic attributes
	2337
	2338	@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	2339	@c @headitem Attr @tab Val @tab Description
	2340	@item
	2341	@code{A} @tab @tab Aspect
	2342	@item
	2343	@tab @code{p} @tab perfective,
	2344	@item
	2345	@tab @code{i} @tab imperfective.
	2346	@item
	2347	@item
	2348	@code{V} @tab @tab Verb-Form
	2349	@item
	2350	@tab @code{b} @tab infinitive,
	2351	@item
	2352	@tab @code{p} @tab personal,
	2353	@item
	2354	@tab @code{i} @tab impersonal.
	2355	@item
	2356	@item
	2357	@code{M} @tab @tab Mood
	2358	@item
	2359	@tab @code{d} @tab declarative,
	2360	@item
	2361	@tab @code{c} @tab conditional,
	2362	@item
	2363	@tab @code{i} @tab imperative.
	2364	@item
	2365	@item
	2366	@code{T} @tab @tab Tense
	2367	@item
	2368	@tab @code{a} @tab past,
	2369	@item
	2370	@tab @code{r} @tab present,
	2371	@item
	2372	@tab @code{f} @tab future.
	2373	@item
	2374	@item
	2375	@code{P} @tab @tab Person
	2376	@item
	2377	@tab @code{1} @tab 1,
	2378	@item
	2379	@tab @code{2} @tab 2,
	2380	@item
	2381	@tab @code{3} @tab 3.
	2382	@item
	2383	@item
	2384	@code{D} @tab @tab Degree
	2385	@item
	2386	@tab @code{p} @tab positive,
	2387	@item
	2388	@tab @code{c} @tab comparative,
	2389	@item
	2390	@tab @code{s} @tab superlative.
	2391	@item
	2392	@item
	2393	@code{N} @tab @tab Number
	2394	@item
	2395	@tab @code{s} @tab singular,
	2396	@item
	2397	@tab @code{p} @tab plural.
	2398	@item
	2399	@item
	2400	@code{C} @tab @tab Case
	2401	@item
	2402	@tab @code{n} @tab nominative,
	2403	@item
	2404	@tab @code{g} @tab genitive,
	2405	@item
	2406	@tab @code{d} @tab dative,
	2407	@item
	2408	@tab @code{a} @tab accusative,
	2409	@item
	2410	@tab @code{i} @tab instrumental,
	2411	@item
	2412	@tab @code{l} @tab locative,
	2413	@item
	2414	@tab @code{v} @tab vocative.
	2415	@item
	2416	@code{G} @tab @tab Gender
	2417	@item
	2418	@tab @code{p} @tab masculine-personal,
	2419	@item
	2420	@tab @code{a} @tab masculine-animal,
	2421	@item
	2422	@tab @code{i} @tab masculine-inanimate,
	2423	@item
	2424	@tab @code{f} @tab feminine,
	2425	@item
	2426	@tab @code{n} @tab neuter.
	2427	@end multitable
	2428
	2429
	2430	@c ---------------------------------------------------------------------
	2431	@c ---------------------------------------------------------------------
	2432	@c
	2433	@c @node Examples
	2434	@c @chapter Examples
	2435
	2436	@c ----------------------------------------------------------------------
	2437	@c ----------------------------------------------------------------------
	2438
	2439	@node GNU Free Documentation License
	2440	@chapter GNU Free Documentation License
	2441
	2442	@c The GNU Free Documentation License.
	2443	@center Version 1.2, November 2002
	2444
	2445	@c This file is intended to be included within another document,
	2446	@c hence no sectioning command or @node.
	2447
	2448	@display
	2449	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
	2450	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
	2451
	2452	Everyone is permitted to copy and distribute verbatim copies
	2453	of this license document, but changing it is not allowed.
	2454	@end display
	2455
	2456	@enumerate 0
	2457	@item
	2458	PREAMBLE
	2459
	2460	The purpose of this License is to make a manual, textbook, or other
	2461	functional and useful document @dfn{free} in the sense of freedom: to
	2462	assure everyone the effective freedom to copy and redistribute it,
	2463	with or without modifying it, either commercially or noncommercially.
	2464	Secondarily, this License preserves for the author and publisher a way
	2465	to get credit for their work, while not being considered responsible
	2466	for modifications made by others.
	2467
	2468	This License is a kind of ``copyleft'', which means that derivative
	2469	works of the document must themselves be free in the same sense. It
	2470	complements the GNU General Public License, which is a copyleft
	2471	license designed for free software.
	2472
	2473	We have designed this License in order to use it for manuals for free
	2474	software, because free software needs free documentation: a free
	2475	program should come with manuals providing the same freedoms that the
	2476	software does. But this License is not limited to software manuals;
	2477	it can be used for any textual work, regardless of subject matter or
	2478	whether it is published as a printed book. We recommend this License
	2479	principally for works whose purpose is instruction or reference.
	2480
	2481	@item
	2482	APPLICABILITY AND DEFINITIONS
	2483
	2484	This License applies to any manual or other work, in any medium, that
	2485	contains a notice placed by the copyright holder saying it can be
	2486	distributed under the terms of this License. Such a notice grants a
	2487	world-wide, royalty-free license, unlimited in duration, to use that
	2488	work under the conditions stated herein. The ``Document'', below,
	2489	refers to any such manual or work. Any member of the public is a
	2490	licensee, and is addressed as ``you''. You accept the license if you
	2491	copy, modify or distribute the work in a way requiring permission
	2492	under copyright law.
	2493
	2494	A ``Modified Version'' of the Document means any work containing the
	2495	Document or a portion of it, either copied verbatim, or with
	2496	modifications and/or translated into another language.
	2497
	2498	A ``Secondary Section'' is a named appendix or a front-matter section
	2499	of the Document that deals exclusively with the relationship of the
	2500	publishers or authors of the Document to the Document's overall
	2501	subject (or to related matters) and contains nothing that could fall
	2502	directly within that overall subject. (Thus, if the Document is in
	2503	part a textbook of mathematics, a Secondary Section may not explain
	2504	any mathematics.) The relationship could be a matter of historical
	2505	connection with the subject or with related matters, or of legal,
	2506	commercial, philosophical, ethical or political position regarding
	2507	them.
	2508
	2509	The ``Invariant Sections'' are certain Secondary Sections whose titles
	2510	are designated, as being those of Invariant Sections, in the notice
	2511	that says that the Document is released under this License. If a
	2512	section does not fit the above definition of Secondary then it is not
	2513	allowed to be designated as Invariant. The Document may contain zero
	2514	Invariant Sections. If the Document does not identify any Invariant
	2515	Sections then there are none.
	2516
	2517	The ``Cover Texts'' are certain short passages of text that are listed,
	2518	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
	2519	the Document is released under this License. A Front-Cover Text may
	2520	be at most 5 words, and a Back-Cover Text may be at most 25 words.
	2521
	2522	A ``Transparent'' copy of the Document means a machine-readable copy,
	2523	represented in a format whose specification is available to the
	2524	general public, that is suitable for revising the document
	2525	straightforwardly with generic text editors or (for images composed of
	2526	pixels) generic paint programs or (for drawings) some widely available
	2527	drawing editor, and that is suitable for input to text formatters or
	2528	for automatic translation to a variety of formats suitable for input
	2529	to text formatters. A copy made in an otherwise Transparent file
	2530	format whose markup, or absence of markup, has been arranged to thwart
	2531	or discourage subsequent modification by readers is not Transparent.
	2532	An image format is not Transparent if used for any substantial amount
	2533	of text. A copy that is not ``Transparent'' is called ``Opaque''.
	2534
	2535	Examples of suitable formats for Transparent copies include plain
	2536	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
	2537	format, @acronym{SGML} or @acronym{XML} using a publicly available
	2538	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
	2539	PostScript or @acronym{PDF} designed for human modification. Examples
	2540	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
	2541	@acronym{JPG}. Opaque formats include proprietary formats that can be
	2542	read and edited only by proprietary word processors, @acronym{SGML} or
	2543	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
	2544	not generally available, and the machine-generated @acronym{HTML},
	2545	PostScript or @acronym{PDF} produced by some word processors for
	2546	output purposes only.
	2547
	2548	The ``Title Page'' means, for a printed book, the title page itself,
	2549	plus such following pages as are needed to hold, legibly, the material
	2550	this License requires to appear in the title page. For works in
	2551	formats which do not have any title page as such, ``Title Page'' means
	2552	the text near the most prominent appearance of the work's title,
	2553	preceding the beginning of the body of the text.
	2554
	2555	A section ``Entitled XYZ'' means a named subunit of the Document whose
	2556	title either is precisely XYZ or contains XYZ in parentheses following
	2557	text that translates XYZ in another language. (Here XYZ stands for a
	2558	specific section name mentioned below, such as ``Acknowledgements'',
	2559	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
	2560	of such a section when you modify the Document means that it remains a
	2561	section ``Entitled XYZ'' according to this definition.
	2562
	2563	The Document may include Warranty Disclaimers next to the notice which
	2564	states that this License applies to the Document. These Warranty
	2565	Disclaimers are considered to be included by reference in this
	2566	License, but only as regards disclaiming warranties: any other
	2567	implication that these Warranty Disclaimers may have is void and has
	2568	no effect on the meaning of this License.
	2569
	2570	@item
	2571	VERBATIM COPYING
	2572
	2573	You may copy and distribute the Document in any medium, either
	2574	commercially or noncommercially, provided that this License, the
	2575	copyright notices, and the license notice saying this License applies
	2576	to the Document are reproduced in all copies, and that you add no other
	2577	conditions whatsoever to those of this License. You may not use
	2578	technical measures to obstruct or control the reading or further
	2579	copying of the copies you make or distribute. However, you may accept
	2580	compensation in exchange for copies. If you distribute a large enough
	2581	number of copies you must also follow the conditions in section 3.
	2582
	2583	You may also lend copies, under the same conditions stated above, and
	2584	you may publicly display copies.
	2585
	2586	@item
	2587	COPYING IN QUANTITY
	2588
	2589	If you publish printed copies (or copies in media that commonly have
	2590	printed covers) of the Document, numbering more than 100, and the
	2591	Document's license notice requires Cover Texts, you must enclose the
	2592	copies in covers that carry, clearly and legibly, all these Cover
	2593	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
	2594	the back cover. Both covers must also clearly and legibly identify
	2595	you as the publisher of these copies. The front cover must present
	2596	the full title with all words of the title equally prominent and
	2597	visible. You may add other material on the covers in addition.
	2598	Copying with changes limited to the covers, as long as they preserve
	2599	the title of the Document and satisfy these conditions, can be treated
	2600	as verbatim copying in other respects.
	2601
	2602	If the required texts for either cover are too voluminous to fit
	2603	legibly, you should put the first ones listed (as many as fit
	2604	reasonably) on the actual cover, and continue the rest onto adjacent
	2605	pages.
	2606
	2607	If you publish or distribute Opaque copies of the Document numbering
	2608	more than 100, you must either include a machine-readable Transparent
	2609	copy along with each Opaque copy, or state in or with each Opaque copy
	2610	a computer-network location from which the general network-using
	2611	public has access to download using public-standard network protocols
	2612	a complete Transparent copy of the Document, free of added material.
	2613	If you use the latter option, you must take reasonably prudent steps,
	2614	when you begin distribution of Opaque copies in quantity, to ensure
	2615	that this Transparent copy will remain thus accessible at the stated
	2616	location until at least one year after the last time you distribute an
	2617	Opaque copy (directly or through your agents or retailers) of that
	2618	edition to the public.
	2619
	2620	It is requested, but not required, that you contact the authors of the
	2621	Document well before redistributing any large number of copies, to give
	2622	them a chance to provide you with an updated version of the Document.
	2623
	2624	@item
	2625	MODIFICATIONS
	2626
	2627	You may copy and distribute a Modified Version of the Document under
	2628	the conditions of sections 2 and 3 above, provided that you release
	2629	the Modified Version under precisely this License, with the Modified
	2630	Version filling the role of the Document, thus licensing distribution
	2631	and modification of the Modified Version to whoever possesses a copy
	2632	of it. In addition, you must do these things in the Modified Version:
	2633
	2634	@enumerate A
	2635	@item
	2636	Use in the Title Page (and on the covers, if any) a title distinct
	2637	from that of the Document, and from those of previous versions
	2638	(which should, if there were any, be listed in the History section
	2639	of the Document). You may use the same title as a previous version
	2640	if the original publisher of that version gives permission.
	2641
	2642	@item
	2643	List on the Title Page, as authors, one or more persons or entities
	2644	responsible for authorship of the modifications in the Modified
	2645	Version, together with at least five of the principal authors of the
	2646	Document (all of its principal authors, if it has fewer than five),
	2647	unless they release you from this requirement.
	2648
	2649	@item
	2650	State on the Title page the name of the publisher of the
	2651	Modified Version, as the publisher.
	2652
	2653	@item
	2654	Preserve all the copyright notices of the Document.
	2655
	2656	@item
	2657	Add an appropriate copyright notice for your modifications
	2658	adjacent to the other copyright notices.
	2659
	2660	@item
	2661	Include, immediately after the copyright notices, a license notice
	2662	giving the public permission to use the Modified Version under the
	2663	terms of this License, in the form shown in the Addendum below.
	2664
	2665	@item
	2666	Preserve in that license notice the full lists of Invariant Sections
	2667	and required Cover Texts given in the Document's license notice.
	2668
	2669	@item
	2670	Include an unaltered copy of this License.
	2671
	2672	@item
	2673	Preserve the section Entitled ``History'', Preserve its Title, and add
	2674	to it an item stating at least the title, year, new authors, and
	2675	publisher of the Modified Version as given on the Title Page. If
	2676	there is no section Entitled ``History'' in the Document, create one
	2677	stating the title, year, authors, and publisher of the Document as
	2678	given on its Title Page, then add an item describing the Modified
	2679	Version as stated in the previous sentence.
	2680
	2681	@item
	2682	Preserve the network location, if any, given in the Document for
	2683	public access to a Transparent copy of the Document, and likewise
	2684	the network locations given in the Document for previous versions
	2685	it was based on. These may be placed in the ``History'' section.
	2686	You may omit a network location for a work that was published at
	2687	least four years before the Document itself, or if the original
	2688	publisher of the version it refers to gives permission.
	2689
	2690	@item
	2691	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
	2692	the Title of the section, and preserve in the section all the
	2693	substance and tone of each of the contributor acknowledgements and/or
	2694	dedications given therein.
	2695
	2696	@item
	2697	Preserve all the Invariant Sections of the Document,
	2698	unaltered in their text and in their titles. Section numbers
	2699	or the equivalent are not considered part of the section titles.
	2700
	2701	@item
	2702	Delete any section Entitled ``Endorsements''. Such a section
	2703	may not be included in the Modified Version.
	2704
	2705	@item
	2706	Do not retitle any existing section to be Entitled ``Endorsements'' or
	2707	to conflict in title with any Invariant Section.
	2708
	2709	@item
	2710	Preserve any Warranty Disclaimers.
	2711	@end enumerate
	2712
	2713	If the Modified Version includes new front-matter sections or
	2714	appendices that qualify as Secondary Sections and contain no material
	2715	copied from the Document, you may at your option designate some or all
	2716	of these sections as invariant. To do this, add their titles to the
	2717	list of Invariant Sections in the Modified Version's license notice.
	2718	These titles must be distinct from any other section titles.
	2719
	2720	You may add a section Entitled ``Endorsements'', provided it contains
	2721	nothing but endorsements of your Modified Version by various
	2722	parties---for example, statements of peer review or that the text has
	2723	been approved by an organization as the authoritative definition of a
	2724	standard.
	2725
	2726	You may add a passage of up to five words as a Front-Cover Text, and a
	2727	passage of up to 25 words as a Back-Cover Text, to the end of the list
	2728	of Cover Texts in the Modified Version. Only one passage of
	2729	Front-Cover Text and one of Back-Cover Text may be added by (or
	2730	through arrangements made by) any one entity. If the Document already
	2731	includes a cover text for the same cover, previously added by you or
	2732	by arrangement made by the same entity you are acting on behalf of,
	2733	you may not add another; but you may replace the old one, on explicit
	2734	permission from the previous publisher that added the old one.
	2735
	2736	The author(s) and publisher(s) of the Document do not by this License
	2737	give permission to use their names for publicity for or to assert or
	2738	imply endorsement of any Modified Version.
	2739
	2740	@item
	2741	COMBINING DOCUMENTS
	2742
	2743	You may combine the Document with other documents released under this
	2744	License, under the terms defined in section 4 above for modified
	2745	versions, provided that you include in the combination all of the
	2746	Invariant Sections of all of the original documents, unmodified, and
	2747	list them all as Invariant Sections of your combined work in its
	2748	license notice, and that you preserve all their Warranty Disclaimers.
	2749
	2750	The combined work need only contain one copy of this License, and
	2751	multiple identical Invariant Sections may be replaced with a single
	2752	copy. If there are multiple Invariant Sections with the same name but
	2753	different contents, make the title of each such section unique by
	2754	adding at the end of it, in parentheses, the name of the original
	2755	author or publisher of that section if known, or else a unique number.
	2756	Make the same adjustment to the section titles in the list of
	2757	Invariant Sections in the license notice of the combined work.
	2758
	2759	In the combination, you must combine any sections Entitled ``History''
	2760	in the various original documents, forming one section Entitled
	2761	``History''; likewise combine any sections Entitled ``Acknowledgements'',
	2762	and any sections Entitled ``Dedications''. You must delete all
	2763	sections Entitled ``Endorsements.''
	2764
	2765	@item
	2766	COLLECTIONS OF DOCUMENTS
	2767
	2768	You may make a collection consisting of the Document and other documents
	2769	released under this License, and replace the individual copies of this
	2770	License in the various documents with a single copy that is included in
	2771	the collection, provided that you follow the rules of this License for
	2772	verbatim copying of each of the documents in all other respects.
	2773
	2774	You may extract a single document from such a collection, and distribute
	2775	it individually under this License, provided you insert a copy of this
	2776	License into the extracted document, and follow this License in all
	2777	other respects regarding verbatim copying of that document.
	2778
	2779	@item
	2780	AGGREGATION WITH INDEPENDENT WORKS
	2781
	2782	A compilation of the Document or its derivatives with other separate
	2783	and independent documents or works, in or on a volume of a storage or
	2784	distribution medium, is called an ``aggregate'' if the copyright
	2785	resulting from the compilation is not used to limit the legal rights
	2786	of the compilation's users beyond what the individual works permit.
	2787	When the Document is included in an aggregate, this License does not
	2788	apply to the other works in the aggregate which are not themselves
	2789	derivative works of the Document.
	2790
	2791	If the Cover Text requirement of section 3 is applicable to these
	2792	copies of the Document, then if the Document is less than one half of
	2793	the entire aggregate, the Document's Cover Texts may be placed on
	2794	covers that bracket the Document within the aggregate, or the
	2795	electronic equivalent of covers if the Document is in electronic form.
	2796	Otherwise they must appear on printed covers that bracket the whole
	2797	aggregate.
	2798
	2799	@item
	2800	TRANSLATION
	2801
	2802	Translation is considered a kind of modification, so you may
	2803	distribute translations of the Document under the terms of section 4.
	2804	Replacing Invariant Sections with translations requires special
	2805	permission from their copyright holders, but you may include
	2806	translations of some or all Invariant Sections in addition to the
	2807	original versions of these Invariant Sections. You may include a
	2808	translation of this License, and all the license notices in the
	2809	Document, and any Warranty Disclaimers, provided that you also include
	2810	the original English version of this License and the original versions
	2811	of those notices and disclaimers. In case of a disagreement between
	2812	the translation and the original version of this License or a notice
	2813	or disclaimer, the original version will prevail.
	2814
	2815	If a section in the Document is Entitled ``Acknowledgements'',
	2816	``Dedications'', or ``History'', the requirement (section 4) to Preserve
	2817	its Title (section 1) will typically require changing the actual
	2818	title.
	2819
	2820	@item
	2821	TERMINATION
	2822
	2823	You may not copy, modify, sublicense, or distribute the Document except
	2824	as expressly provided for under this License. Any other attempt to
	2825	copy, modify, sublicense or distribute the Document is void, and will
	2826	automatically terminate your rights under this License. However,
	2827	parties who have received copies, or rights, from you under this
	2828	License will not have their licenses terminated so long as such
	2829	parties remain in full compliance.
	2830
	2831	@item
	2832	FUTURE REVISIONS OF THIS LICENSE
	2833
	2834	The Free Software Foundation may publish new, revised versions
	2835	of the GNU Free Documentation License from time to time. Such new
	2836	versions will be similar in spirit to the present version, but may
	2837	differ in detail to address new problems or concerns. See
	2838	@uref{http://www.gnu.org/copyleft/}.
	2839
	2840	Each version of the License is given a distinguishing version number.
	2841	If the Document specifies that a particular numbered version of this
	2842	License ``or any later version'' applies to it, you have the option of
	2843	following the terms and conditions either of that specified version or
	2844	of any later version that has been published (not as a draft) by the
	2845	Free Software Foundation. If the Document does not specify a version
	2846	number of this License, you may choose any version ever published (not
	2847	as a draft) by the Free Software Foundation.
	2848	@end enumerate
	2849
	2850	@page
	2851	@heading ADDENDUM: How to use this License for your documents
	2852
	2853	To use this License in a document you have written, include a copy of
	2854	the License in the document and put the following copyright and
	2855	license notices just after the title page:
	2856
	2857	@smallexample
	2858	@group
	2859	Copyright (C) @var{year} @var{your name}.
	2860	Permission is granted to copy, distribute and/or modify this document
	2861	under the terms of the GNU Free Documentation License, Version 1.2
	2862	or any later version published by the Free Software Foundation;
	2863	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
	2864	Texts. A copy of the license is included in the section entitled ``GNU
	2865	Free Documentation License''.
	2866	@end group
	2867	@end smallexample
	2868
	2869	If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
	2870	replace the ``with@dots{}Texts.'' line with this:
	2871
	2872	@smallexample
	2873	@group
	2874	with the Invariant Sections being @var{list their titles}, with
	2875	the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
	2876	being @var{list}.
	2877	@end group
	2878	@end smallexample
	2879
	2880	If you have Invariant Sections without Cover Texts, or some other
	2881	combination of the three, merge those two alternatives to suit the
	2882	situation.
	2883
	2884	If your document contains nontrivial examples of program code, we
	2885	recommend releasing these examples in parallel under your choice of
	2886	free software license, such as the GNU General Public License,
	2887	to permit their use in free software.
	2888
	2889	@c Local Variables:
	2890	@c ispell-local-pdict: "ispell-dict"
	2891	@c End:
	2892
	2893
	2894	@c ---------------------------------------------------------------------
	2895	@c ---------------------------------------------------------------------
	2896
	2897	@node Reporting bugs
	2898	@chapter Reporting bugs
	2899
	2900	Report bugs to @email{obrebski@@amu.edu.pl}.
	2901
	2902	@c ---------------------------------------------------------------------
	2903	@c ---------------------------------------------------------------------
	2904
	2905	@c @node Copyright
	2906	@c @chapter Copyright
	2907	@c
[9ace5d2]	2908	@c Copyright 2004 by Tomasz Obrębski
[25ae32e]	2909	@c This software is free for research and educational use.
	2910
	2911	@c ---------------------------------------------------------------------
	2912	@c ---------------------------------------------------------------------
	2913
	2914	@node Author
	2915	@chapter Author
	2916
	2917
	2918	@bye
