source: doc/utt.texinfo

Last change on this file was d6a59ca, checked in by Tomasz Obrebski <obrebski@…>, 12 years ago
Documentation fixes (utf8 works), fix in tre

Rev	Line
[9ace5d2]	1
[25ae32e]	2	\input texinfo @c --texinfo--
[9ace5d2]	3	@c @documentencoding ISO-8859-2
[25ae32e]	4	@c @documentlanguage pl
	5
	6	@c %**start of header
	7	@setfilename utt.info
	8	@settitle UAM Text Tools v0.90
[d6a59ca]	9	@documentencoding utf-8
[25ae32e]	10	@c %**end of header
	11
	12	@copying
[261bf62]	13	This manual is for UAM Text Tools (version 0.90, October, 2008)
[25ae32e]	14
[9ace5d2]	15	Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
[25ae32e]	16
	17	Permission is granted to copy, distribute and/or modify this document
[261bf62]	18	under the terms of the GNU Free Documentation License, Version 1.2 or
	19	any later version published by the Free Software Foundation; with no
	20	Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
	21	copy of the license is included in the section entitled ``GNU Free
	22	Documentation License''.
[25ae32e]	23
	24	@c @quotation
	25	@c Permission is granted to ...
	26	@c No permission is granted until the document is completed.
	27	@c @end quotation
	28	@end copying
	29
	30	@titlepage
	31	@title UAM Text Tools 0.90 - User Manual
	32	@subtitle edition 0.01, @today
	33	@subtitle status: prescript
[9ace5d2]	34	@author by Justyna Walkowska, Tomasz Obrębski and Michał Stolarski
[25ae32e]	35	@page
	36	@vskip 0pt plus 1filll
	37	@insertcopying
	38	@end titlepage
	39
	40	@contents
	41
	42	@c @paragraphindent none
	43
	44	@iftex
[9ace5d2]	45	@tex
	46	% \usepackage[T1]{fontenc}
	47	% \usepackage[utf8]{inputenc}
	48	% \usepackage{times}
	49	@end tex
	50
[25ae32e]	51	@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
	52	@end iftex
	53	@c @headings off
	54	@c @everyheading LEM(1) @| @| LEM(1)
	55	@everyfooting @today @c @| @thispage @|
	56
	57	@ifnottex
	58
	59	@node Top
	60	@top UTT - UAM Text Tools
	61
	62	@insertcopying
	63
	64	@menu
	65	* General information::
	66	* UTT file format::
	67	* Configuration files::
	68	* UTT components::
	69	* Auxiliary tools::
	70	* Usage examples::
	71	* PMDBF dictionary::
	72	@c * Examples::
	73	@c * Copyright::
	74	* GNU Free Documentation License::
	75	* Reporting bugs::
	76	* Author::
	77	@end menu
	78	@end ifnottex
	79
	80
	81	@c ----------------------------------------------------------------------
	82
	83	@node General information
	84	@chapter General information
	85
	86	UAM Text Tools (UTT) is a package of language processing tools
	87	developed at Adam Mickiewicz University. Its functionality includes:
	88
	89	@itemize @bullet
	90
	91	@item
[9ace5d2]	92	tokenization
[25ae32e]	93	@item
	94	dictionary-based morphological analysis
	95	@item
	96	heuristic morphological analysis of unknown words
	97	@item
[9ace5d2]	98	spelling correction
[25ae32e]	99	@item
	100	pattern search
	101	@item
	102	sentence splitting
	103	@item
	104	generation of concordance tables
	105	@end itemize
	106
	107	The toolkit is intended for processing raw (unannotated),
	108	unrestricted text for any conceivable purpose.
	109
	110	The system is organized as a collection of command-line programs, each
	111	performing one operation, e.g. tokenization, lemmatization, spelling
	112	correction. The components are independent of one another, the
	113	unifying element being the uniform i/o file format.
	114
	115	The components may be combined in various ways to provide various text
	116	processing services. New components supplied by the user may also be
	117	easily incorporated into the system provided that they respect the i/o
	118	file format conventions.
	119
	120	UTT component programs do not depend on any specific tagset or
	121	morphological description format.
	122
	123	UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
	124	the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
	125
	126	The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use.
	127
	128
	129	List of contributors:
	130
	131	@itemize
	132	@item Paweł Konieczka
[9ace5d2]	133	@item Tomasz Obrębski
	134	@item Michał Stolarski
[25ae32e]	135	@item Marcin Walas
	136	@item Justyna Walkowska
[9ace5d2]	137	@item Paweł Wereński
[25ae32e]	138	@end itemize
	139
	140	@c ----------------------------------------------------------------------
	141	@c ---------------------------------------------------------------------
	142
	143	@node UTT file format
	144	@chapter UTT file format
	145
	146	A UTT file contains annotation of a text. It consists of a sequence of
	147	segments. Each segment explicitly refers to a continuous piece of the
	148	text and provides some information on it.
	149
	150	@section Segment format
	151
	152	A segment occupies one line of a UTT file and consists of
	153	space-separated fields:
	154
	155
	156	@quotation
	157	@sp 1
	158	[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
	159	@sp 1
	160	@end quotation
	161
	162	@table @var
	163
	164	@item @var{start}
	165	Non-negative integer value indicating the position in the source text where the
	166	segment starts.
	167
	168	@item @var{length}
	169	Non-negative integer value indicating the length of the segment.
	170
	171	@item @var{type}
	172	A sequence of non-blank characters that does not consist of digits only (a purely numeric value could be misinterpreted as a @var{start} or @var{length} field).
	173	@var{type} reflects the main classification of segments -
	174	into words, numbers, punctuation marks, meta-text markers.
	175	@xref{tok output,,tok output}, for description of automatically recognized type markers.
	176
	177	@item @var{form}
	178	This field contains the textual form of the segment or the special
	179	symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
	180
	181	The characters or character sequences that have special meaning in the
	182	@var{form} field are enumerated below.
	183
	184	Characters with special meaning:
	185
	186	@itemize
	187	@item @code{_} - space character
	188	@item @code{*} - undefined contents
	189	@end itemize
	190
	191	Escape sequences:
	192
	193	@itemize
	194	@item @code{\n} - new line
	195	@item @code{\t} - tabulation
	196	@item @code{\r} - carriage return
	197
	198	@item @code{\_} - the @code{_} character
	199	@item @code{\*} - the @code{*} character
	200	@item @code{\\} - the @code{\} character
	201
	202	@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
	203	@end itemize
	204
	205	@item @var{annotation1}
	206	@item @var{annotation2}
	207	@item ...
	208	Annotation fields have the following format:
	209
	210	@var{longname} @code{:} @var{value}
	211
	212	or
	213
	214	@var{shortname} @var{value}
	215
	216	where @var{longname} is a string of alphanumeric characters
	217	(isalnum() test), @var{shortname} - a single non-alphanumeric character
	218	(ispunct() test), and @var{value} is an arbitrary string of non-blank characters.
	219
	220	@end table
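
The special characters and escape sequences of the @var{form} field can be decoded mechanically. The following is a minimal Python sketch, not part of UTT: the function name @code{decode_form} is hypothetical, and treating every backslash-escaped character other than @code{n}, @code{t} and @code{r} as the character itself is an assumption.

```python
def decode_form(form):
    """Decode a UTT form field into the text it stands for.

    Returns None for the special value '*' (form not given).
    """
    if form == "*":
        return None
    control = {"n": "\n", "t": "\t", "r": "\r"}
    out = []
    i = 0
    while i < len(form):
        ch = form[i]
        if ch == "\\" and i + 1 < len(form):
            nxt = form[i + 1]
            # \n, \t and \r map to control characters; any other escaped
            # character (e.g. \_ or \\) stands for itself (assumption).
            out.append(control.get(nxt, nxt))
            i += 2
        elif ch == "_":
            out.append(" ")  # an unescaped underscore encodes a space
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, @code{decode_form("_")} yields a single space, while @code{decode_form("*")} yields @code{None}.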
	221
	222
	223	Only two fields are mandatory: @var{type} and @var{form}. All other fields
	224	may be absent. If only one number precedes the
	225	@var{type} field, it is interpreted as the @var{start} position.
	226
	227	If the @var{length} field is omitted, the length of the segment is the
	228	length of the @var{form} field, except when the value of the
	229	@var{form} field is @code{*} -- in this case, the length is assumed to
	230	be 0.
	231
	232	If the @var{start} field is also absent, the segment is assumed to directly
	233	follow the preceding one.
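
The rules for optional fields can be sketched in Python as follows. This is an illustration under stated assumptions, not the UTT implementation: the names @code{Segment}, @code{form_length} and @code{parse_segment} are hypothetical, an escape sequence in the @var{form} field is counted as one character, and a file whose first segment has no @var{start} field is assumed to begin at position 0.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    start: int
    length: int
    type: str
    form: str
    annotations: List[str] = field(default_factory=list)


def form_length(form: str) -> int:
    """Length of the text a form field stands for ('*' counts as 0)."""
    if form == "*":
        return 0
    count = 0
    i = 0
    while i < len(form):
        # A backslash escape such as \n or \_ encodes one character.
        i += 2 if form[i] == "\\" and i + 1 < len(form) else 1
        count += 1
    return count


def parse_segment(line: str, prev: Optional[Segment] = None) -> Segment:
    fields = line.split()
    numbers = []
    # Up to two leading numeric fields: start first, then length.
    while (len(numbers) < 2 and fields
           and fields[0].isascii() and fields[0].isdigit()):
        numbers.append(int(fields.pop(0)))
    if len(fields) < 2:
        raise ValueError("type and form are mandatory: %r" % line)
    seg_type, form, *annotations = fields
    if numbers:
        start = numbers[0]
    elif prev is not None:
        start = prev.start + prev.length  # directly follows the previous one
    else:
        start = 0  # assumption: positions start at 0
    length = numbers[1] if len(numbers) == 2 else form_length(form)
    return Segment(start, length, seg_type, form, annotations)
```

Passing the previously parsed segment as @code{prev} lets a reader restore positions that were left out of the file.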
	234
	235	@c Conventions:
	236
	237	@c Annotation fields with predefined meaning:
	238
	239	@c @itemize
	240	@c @item @code{!} - UTT components are allowed to modify the contents of
	241	@c the @var{form} field (e.g. spelling correction does this). If this happens the
	242	@c original form of the segment have to be placed in the @code{!}-field.
	243	@c @item @code{@@} - morphological description
	244	@c @item @code{=} - node identifier assignment (used in graph encoding)
	245	@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
	246	@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
	247	@c @end itemize
	248
	249	Segments of length 0 may be used to mark file positions with some
	250	information. See e.g. BOS and EOS (beginning/end of sentence) markers
	251	in the example below.
	252
	253	Example:
	254
	255	text: @samp{Piszemy dobre progrumy. Warszawiacy też.}
	256
	257	@example
	258	0000 00 BOS *
[9ace5d2]	259	0000 07 W Piszemy lem:pisać,V
[25ae32e]	260	0007 01 S _
	261	0008 05 W dobre lem:dobry,ADJ
	262	0013 01 S _
	263	0014 08 W progrumy cor:programy lem:program,N
	264	0022 01 P .
	265	0023 00 EOS *
	266	0023 01 S _
	267	0024 00 BOS *
	268	0024 11 W Warszawiacy lem:Warszawiak,N
	269	0035 01 S _
[9ace5d2]	270	0036 03 W też
[25ae32e]	271	0039 01 P .
	272	0040 00 EOS *
	273
	274	@end example
	275
	276	@example
	277	0000 BOS *
[9ace5d2]	278	0000 W Piszemy lem:pisać,V
[25ae32e]	279	0007 S _
	280	0008 W dobre lem:dobry,ADJ
	281	0013 S _
	282	0014 W progrumy cor:programy lem:program,N
	283	0022 P .
	284	0023 EOS *
	285	@end example
	286
	287	Position information may be provided only for some types of segments:
	288
	289	@example
	290	0000 BOS *
[9ace5d2]	291	W Piszemy lem:pisać,V
[25ae32e]	292	S _
	293	W dobre lem:dobry,ADJ
	294	S _
	295	W progrumy cor:programy lem:program,N
	296	P .
	297	EOS *
	298	S _
	299	0024 BOS *
	300	W Warszawiacy lem:Warszawiak,N
	301	S _
[9ace5d2]	302	W też
[25ae32e]	303	P .
	304	EOS *
	305	@end example
	306
	307	Position/length information may be provided only when necessary:
	308
	309	@example
	310	0000 04 N *
	311	0000 N 12
	312	P .
	313	N 5
	314	S _
	315	W km
	316	@end example
	317
	318	@section UTT File
	319
	320	A UTT file consists of a sequence of segments. The same text position
	321	may be covered by multiple segments. In consequence, ambiguous text
	322	segmentation and ambiguous annotation may be represented.
	323
	324	There are two structural requirements a valid UTT-formatted file
	325	has to meet:
	326
	327	@itemize @bullet
	328
	329	@item
	330	segments have to be sorted with respect to the @var{start} field,
	331
	332	@item
	333	for each
	334	segment ending at position @var{n}, either there must be a segment starting at
	335	position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
	336	for each segment starting at position @var{n}, either there must be a segment
	337	ending at position @var{n-1}, or the position @var{n-1} must not be covered
	338	by any segment.
	339
	340	@end itemize
	341
	342	A valid annotation for the text fragment
	343	@example
	344	12.5 km
	345	@end example
	346
	347	may be
	348
	349	@example
	350	0000 02 N 12
	351	0000 04 N 12.5
	352	0002 01 P .
	353	0003 01 N 5
	354	0004 01 S _
	355	0005 02 W km
	356	@end example
	357
	358	but not
	359
	360	@example
	361	0000 02 N 12
	362	0000 04 N 12.5
	363	0004 01 S _
	364	0005 02 W km
	365	@end example
	366
[261bf62]	367	because in the latter example the first segment (starting at position
	368	0000, 2 characters long) ends at position @var{n}=0001 which is
	369	covered by the second segment and no segment starts at position
	370	@var{n+1}=0002.
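
The two structural requirements can be checked mechanically. The following Python sketch is only an illustration: the function name @code{is_valid_utt} is hypothetical, segments are given as (start, length) pairs, and zero-length segments are assumed to cover no position and are therefore ignored by the second check.

```python
def is_valid_utt(segments):
    """Check the two structural requirements on (start, length) pairs."""
    starts = [start for start, _ in segments]
    if starts != sorted(starts):
        return False  # requirement 1: sorted by start position

    # Inclusive (first, last) positions of every non-empty segment.
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    first_positions = {a for a, _ in spans}
    last_positions = {b for _, b in spans}

    def covered(pos):
        return any(a <= pos <= b for a, b in spans)

    for a, b in spans:
        # If the position after the segment is covered, a segment must start there.
        if covered(b + 1) and (b + 1) not in first_positions:
            return False
        # If the position before the segment is covered, a segment must end there.
        if covered(a - 1) and (a - 1) not in last_positions:
            return False
    return True
```

Applied to the two annotations of @samp{12.5 km} shown above, the first one is accepted and the second one is rejected. The quadratic scan keeps the sketch short; a real tool would more likely process the sorted segments in a single pass.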
	371
	372
	373	@section Flattened UTT file
	374
[e28a625]	375	The UTT file format has two variants: regular and flattened. The regular
[261bf62]	376	format was described above. In the flattened format some of the
	377	end-of-line characters are replaced with line-feed characters.
	378
	379	The flattened format is mainly used to represent whole sentences as
	380	single lines of the input file (all intrasentential end-of-line
	381	characters are replaced with line-feed characters).
	382
	383	This technical trick makes it possible to perform certain text
	384	processing operations on entire sentences with the use of such tools as
	385	@command{grep} (see @command{grp} component) or @command{sed} (see @command{mar} component).
	386
	387	The conversion between the two formats is performed by the tools:
	388	@command{fla} and @command{unfla}.
[25ae32e]	389
	390	@section Character encoding
	391
	392	The UTT component programs accept only 1-byte character encoding, such
[261bf62]	393	as ISO, ANSI, DOS.
[25ae32e]	394
	395
	396	@c @section Formats
	397
	398	@c @unnumberedsubsubsec Basic format
	399
	400	@c While processing large amounts of the overhead related with explicit
	401	@c ... of the start position and segment length becomes ... . Therefore,
	402	@c for efficiency reasons certain shortcuts are possible:
	403
	404	@c @unnumberedsubsubsec Relative start position
	405
	406	@c Start position may be given as relative distance from the last
	407	@c absolut position.
	408
	409	@c @unnumberedsubsubsec Absent length
	410
	411	@c Segment length may by omitted. Normally it can be restored by counting
	412	@c the length of the @emph{form field}. For segments with the special value
	413	@c @code{*} in the @emph{form field} length 0 is assumed.
	414
	415	@c @unnumberedsubsubsec Absent length and start position
	416
	417	@c Both start position and segment length may be omitted. In this format
	418	@c each segment is assumed to follow the previous one. This format is,
	419	@c therefore, suitable only for unambiguously tagged text
	420	@c (0-length markers can be still used.)
	421
	422
	423	@c @table @code
	424	@c @item AL
	425	@c @code{1234 03 W kot}
	426	@c @item RL
	427	@c @code{+56 03 W kot}
	428	@c @item A
	429	@c @code{1234 W kot}
	430	@c @item R
	431	@c @code{+56 W kot}
	432	@c @item 0
	433	@c @code{W kot}
	434	@c @end table
	435
	436
[9ace5d2]	437	@c [HOW TO GET POLISH FONTS IN DVI???]
[25ae32e]	438
	439	@macro parhelp
	440	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	441	Print help.
	442	@end macro
	443
	444
	445	@macro parversion
	446	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	447	Print version information.
	448	@end macro
	449
	450	@macro parinteractive
	451	@item @b{@minus{}@minus{}interactive, @minus{}i}
	452	This option toggles interactive mode, which is by default off. In the
	453	interactive mode the program does not buffer the output.
	454	@end macro
	455
	456
	457	@c @macro parfile
	458	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	459	@c Input file name.
	460	@c If this option is absent or equal to '@minus{}', the program
	461	@c reads from the standard input.
	462	@c @end macro
	463
	464
	465	@c @macro paroutput
	466	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	467	@c Regular output file name. To regular output the program sends segments
	468	@c which it successfully processed and copies those which were not
	469	@c subject to processing. If this option is absent or equal to
	470	@c '@minus{}', standard output is used.
	471	@c @end macro
	472
	473	@c @macro parfail
	474	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
	475	@c Fail output file name. To fail output the program copies the segments
	476	@c it failed to process. If this option is absent or equal to
	477	@c '@minus{}', standard output is used.
	478	@c @end macro
	479
	480
	481	@c @macro parcopy
	482	@c @item @b{@minus{}@minus{}copy, @minus{}c}
	483	@c Copy succesfully processed segments to regular output also in their
	484	@c original input form.
	485	@c @end macro
	486
	487
	488	@macro parinputfield
	489	@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	490	The field containing the input to the program. The default is the
	491	@var{form} field. The fields @var{start}, @var{length}, @var{type},
	492	and @var{form} are referred to as @code{1}, @code{2}, @code{3},
	493	@code{4}, respectively.
	494	@end macro
	495
	496
	497	@macro paroutputfield
	498	@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	499	The name of the field added by the program. The default is the name of the program.
	500	@end macro
	501
	502
	503	@macro pardictionary
	504	@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
	505	Dictionary file name.
	506	@end macro
	507
	508
	509	@macro parprocess
	510	@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
	511	Process segments with the specified value in the @var{type} field.
	512	Multiple occurrences of this option are allowed and are interpreted as
	513	disjunction. If this option is absent, all segments are processed.
	514	@end macro
	515
	516
	517	@macro parselect
	518	@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
	519	Select for processing only segments in which the field named
	520	@var{fieldname} is present. Multiple occurrences of this option are
	521	allowed and are interpreted as conjunction of conditions. If this
	522	option is absent, all segments are processed.
	523	@end macro
	524
	525
	526	@macro parunselect
	527	@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
	528	Select for processing only segments in which the field @var{fieldname}
	529	is absent. Multiple occurrences of this option are allowed and are
	530	interpreted as conjunction of conditions. If this option is absent,
	531	all segments are processed.
	532	@end macro
	533
	534
	535	@macro paroneline
	536	@item @b{@minus{}@minus{}one-line}
	537	This option makes the program print ambiguous annotation in one output
	538	line by generating multiple annotation fields. By default, when
	539	ambiguous annotation may be produced for a segment, the segment is
	540	duplicated and each of the annotations is added to a separate copy of
	541	the segment.
	542	@end macro
	543
	544
	545	@macro paronefield
	546	@item @b{@minus{}@minus{}one-field, @minus{}1}
	547	This option makes the program print ambiguous annotation in one
	548	annotation field. By default, when ambiguous annotation may be produced
	549	for a segment, the segment is duplicated and each of the
	550	annotations is added to a separate copy of the segment.
	551
	552	This option is useful when working with @command{kot} or @command{con}.
	553	@end macro
	554
	555
	556	@c ---------------------------------------------------------------------
	557	@c CONFIGURATION FILES
	558	@c ---------------------------------------------------------------------
	559
	560	@node Configuration files
	561	@chapter Configuration files
	562
	563	Values for all command line options accepted by a component
	564	may be set in configuration files. The default locations of the
	565	configuration files for a component named @command{@var{program}} are
	566
	567	@example
[246900a]	568	@file{/usr/local/etc/utt/@var{program}.conf}
[25ae32e]	569	@end example
	570
	571	for the system-wide configuration file and
	572
	573	@example
[246900a]	574	@file{~/.utt/@var{program}.conf}
[25ae32e]	575	@end example
	576
	577	for the user configuration file.
	578
	579	@c The configuration file to load may be also specified with the
	580	@c @option{--config} option. Configuration file need not be provided.
	581
	582	For each option, the value is set according to the following priority:
	583
	584	@itemize
	585	@item command line
	586	@c @item configuration file indicated with @option{--config} option
	587	@item user configuration file (or configuration file indicated with the @option{--config} option)
	588	@item system-wide configuration file
	589	@end itemize
	590
	591	Parameter values are specified in the following format:
	592
	593	@var{parametername}=@var{value}
	594
	595	where @var{parametername} is the short or long name of an option accepted by
	596	the program, or
	597
	598	@var{parametername}
	599
	600	if the option does not need arguments.
	601
	602	You can introduce comments to configuration files using the # sign.
	603
	604	If a program accepts multiple occurrences of an option (e.g. the @option{--select} option of @command{lem}), you can specify each occurrence on a separate line of the program's configuration file.
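
The file format and the priority order described above can be sketched in Python as follows. The sketch does not reproduce the UTT option parser. The names @code{parse_config} and @code{resolve} are hypothetical. Two details are assumptions: a comment runs from the # sign to the end of the line, and an option set at a higher-priority level replaces all of its occurrences at lower levels.

```python
def parse_config(text):
    """Parse configuration text into {option name: [values]}."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumption)
        if not line:
            continue
        name, sep, value = line.partition("=")
        # An option written without '=' takes no argument; store True.
        entry = value.strip() if sep else True
        # Repeated options accumulate, one value per line.
        options.setdefault(name.strip(), []).append(entry)
    return options


def resolve(command_line, user_conf, system_conf):
    """Merge option dictionaries, highest priority last."""
    merged = dict(system_conf)
    merged.update(user_conf)      # user file overrides system-wide file
    merged.update(command_line)   # command line overrides both
    return merged
```

With this sketch, a @code{dictionary} value in the user file hides the system-wide one, and a @code{select} option given on the command line hides every @code{select} line in both files.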
	605
	606	@c The equal sign may be omitted.
	607
	608
	609	@quotation Tip
	610	If you have two (or more) frequently used sets of options for the same
	611	program (e.g. @command{lem} with the PMDBF dictionary and @command{lem} with a user dictionary),
	612	a good solution is to create two soft links to @command{lem}, called
	613	e.g. @command{lemg} and @command{lemu}, and specify their configuration in the files @file{lemg.conf}
	614	and @file{lemu.conf} respectively.
	615	@end quotation
	616
	617	@c ---------------------------------------------------------------------
	618	@c COMPONENTS
	619	@c ---------------------------------------------------------------------
	620
	621	@node UTT components
	622	@chapter UTT components
	623
	624	UTT components are of three types:
	625
	626	@menu
	627	Sources: programs which read non-UTT data (e.g. raw text) and produce output
	628	in UTT format
	629	* tok:: a tokenizer
	630
	631	Filters: programs which read and produce UTT-formatted data
	632	* lem:: a morphological analyzer
	633	* gue:: a morphological guesser
[261bf62]	634	* cor:: a simple spelling corrector
	635	* kor:: a more elaborate spelling corrector
[25ae32e]	636	* sen:: a sentencizer
	637	* ser:: a pattern search tool (marks matches)
[261bf62]	638	* mar:: a pattern search tool (introduces arbitrary markers into the text)
[25ae32e]	639	* grp:: a pattern search tool (selects sentences containing a match)
[261bf62]	640	@c * gph:: a word-graph annotation tool::
	641	@c * dgp:: a dependency parser
[25ae32e]	642
	643	Sinks: programs which read UTT data and produce output in another format
	644	* kot:: an untokenizer
	645	* con:: a concordance table generator
	646	@end menu
	647
	648	@c ---------------------------------------------------------------------
	649	@c TOK
	650	@c ---------------------------------------------------------------------
	651
	652	@page
	653	@node tok
	654	@section tok - a tokenizer
	655
	656	@c ----------------------------------------
	657
	658	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	659	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	660	@item @strong{Component category:} @tab source
[261bf62]	661	@item @strong{Input format:} @tab raw text file
	662	@item @strong{Output format:} @tab UTT regular
	663	@item @strong{Required annotation:} @tab -
[25ae32e]	664	@end multitable
	665
	666
	667	@menu
	668	* tok description::
	669	* tok input::
	670	* tok output::
	671	* tok command line options::
	672	* tok example::
	673	@end menu
	674
	675	@node tok description
	676	@subsection Description
	677
	678	@code{tok} is a simple program which reads a text file and identifies
	679	tokens on the basis of their orthographic form. The type of the token
	680	is printed as the @var{type} field.
	681
	682	@node tok input
	683	@subsection Input
	684
	685	Raw text.
	686
	687	@node tok output
	688	@subsection Output
	689
	690	UTT-file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
	691
	692	@itemize
	693
	694	@item @code{W}
	695	(word)
	696	- continuous sequence of letters
	697
	698	@item @code{N}
	699	(number)
	700	- continuous sequence of digits
	701
	702	@item @code{S}
	703	(space)
	704	- continuous sequence of space characters
	705
	706	@item @code{P}
	707	(punctuation mark)
	708	- single printable character not belonging to any of the other classes
	709
	710	@item @code{B}
	711	(unprintable character)
	712	- single unprintable character
	713
	714	@end itemize
	715
	716
	717
	718	@node tok command line options
	719	@subsection Command line options
	720
	721	@table @code
	722
	723	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	724	Print help.
	725
	726	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	727	Print version information.
	728
	729	@item @b{@minus{}@minus{}interactive, @minus{}i}
	730	This option toggles interactive mode, which is by default off. In the
	731	interactive mode the program does not buffer the output.
	732
	733	@end table
	734
	735	@node tok example
	736	@subsection Example
	737
	738	Input:
	739
	740	@example
	741	Piszemy dobre programy.
	742	@end example
	743
	744	Output:
	745
	746	@example
	747	0000 07 W Piszemy
	748	0007 01 S _
	749	0008 05 W dobre
	750	0013 01 S _
	751	0014 08 W programy
	752	0022 01 P .
	753	0023 01 S \n
	754	@end example
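
The classification of tokens described above can be imitated with a few lines of Python. This is a rough sketch and not the actual @command{tok} implementation: the function names are hypothetical, character classes are taken from Python's string predicates, and positions are counted in characters, which coincides with bytes only for 1-byte encodings.

```python
def escape_form(text):
    """Write a piece of text as a UTT form field."""
    table = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
             "_": "\\_", "\\": "\\\\"}
    return "".join(table.get(ch, ch) for ch in text)


def token_type(ch):
    if ch.isalpha():
        return "W"  # word: letters
    if ch.isdigit():
        return "N"  # number: digits
    if ch.isspace():
        return "S"  # space characters
    if ch.isprintable():
        return "P"  # any other printable character
    return "B"      # unprintable character


def tokenize(text):
    """Return UTT segment lines with start, length, type and form."""
    lines = []
    pos = 0
    while pos < len(text):
        kind = token_type(text[pos])
        end = pos + 1
        # W, N and S tokens are continuous sequences; P and B are single characters.
        if kind in ("W", "N", "S"):
            while end < len(text) and token_type(text[end]) == kind:
                end += 1
        lines.append("%04d %02d %s %s"
                     % (pos, end - pos, kind, escape_form(text[pos:end])))
        pos = end
    return lines
```

Calling @code{tokenize} on the input line of the example above, including its final newline, gives the seven segments listed in the output.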
	755
	756
	757	@c ---------------------------------------------------------------------
	758	@c SEN
	759	@c ---------------------------------------------------------------------
	760
	761	@c @node sen - sentencizer
	762	@c @chapter sen - sentencizer
	763
[9ace5d2]	764	@c Authors: Tomasz Obrębski
[25ae32e]	765
	766	@c ---------------------------------------------------------------------
	767	@c LEM
	768	@c ---------------------------------------------------------------------
	769
	770	@page
	771	@node lem
	772	@section lem - morphological analyzer
	773
	774	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	775	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
[25ae32e]	776	@item @strong{Component category:} @tab filter
[261bf62]	777	@item @strong{Input format:} @tab UTT regular
	778	@item @strong{Output format:} @tab UTT regular
	779	@item @strong{Required annotation:} @tab tok
[25ae32e]	780	@end multitable
	781
	782	@menu
	783	* lem description::
	784	* lem command line options::
	785	* lem input::
	786	* lem output::
	787	* lem example::
	788	* lem dictionaries::
	789	* lem hints::
	790	@end menu
	791
	792	@node lem description
	793	@subsection Description
	794
	795	@command{lem} performs morphological analysis of a simple orthographic
	796	word, returning all its possible morphological annotations,
	797	disregarding the context.
	798
	799	@c ----------------------------------------
	800
	801	@node lem command line options
	802	@subsection Command line options
	803
	804	@table @code
	805	@parhelp
	806	@parversion
	807	@parinteractive
	808	@c @parfile
	809	@c @paroutput
	810	@c @parfail
	811	@c @parcopy
	812	@parinputfield
	813	@paroutputfield
	814	@pardictionary
	815	@parprocess
	816	@parselect
	817	@parunselect
	818	@paroneline
	819	@paronefield
	820	@end table
	821
	822	@c ----------------------------------------
	823
	824	@node lem input
	825	@subsection Input
	826
	827	@command{lem} reads a UTT file and processes the value of the @var{form} field
	828	(the input field may be changed with the @option{--input-field} option).
	829
	830	@node lem output
	831	@subsection Output
	832
	833	@command{lem} adds a new annotation field, whose default name is @code{lem}. In
	834	case of ambiguity, either the segment is duplicated (default),
	835	multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
	836	annotation is produced as the value of a single @code{lem} field
	837	(@option{--one-field}, @option{-1}):
	838
	839	@itemize @bullet
	840
	841	@item
	842	unambiguous value format:
	843
	844	@example
	845	<lemma>,<descr>
	846	@end example
	847
	848	@item
	849	ambiguous value format (@option{--one-field} option)
	850
	851
	852	@example
	853	<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
	854	@end example
	855
	856	(alternative descriptions for the same lemma are separated by commas,
	857	alternative lemmata are separated by semicolons.)
	858
	859	@end itemize
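
A program that consumes @option{--one-field} output has to split the value again. A minimal Python sketch, assuming that neither lemmata nor descriptions contain commas or semicolons (the function name @code{parse_lem_value} is hypothetical):

```python
def parse_lem_value(value):
    """Split a lem field value into (lemma, [descriptions]) pairs."""
    readings = []
    for alternative in value.split(";"):      # alternative lemmata
        lemma, *descriptions = alternative.split(",")  # alternative descriptions
        readings.append((lemma, descriptions))
    return readings
```

The same function also handles the unambiguous format, which is simply the case of one lemma with one description.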
	860
	861	@node lem example
	862	@subsection Example
	863
	864	Input:
	865
	866	@example
	867	0000 07 W Piszemy
	868	0007 01 S _
	869	0008 05 W dobre
	870	0013 01 S _
	871	0014 08 W programy
	872	0022 01 P .
	873	0023 01 S \n
	874	@end example
	875
	876	Output (default):
	877
	878	@example
[9ace5d2]	879	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	880	0007 01 S _
	881	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
	882	0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
	883	0013 01 S _
	884	0014 08 W programy lem:program,N/GiNpCa
	885	0014 08 W programy lem:program,N/GiNpCn
	886	0014 08 W programy lem:program,N/GiNpCv
	887	0022 01 P .
	888	0023 01 S \n
	889	@end example
	890
	891	Output (@option{--one-line} option):
	892
	893	@example
[9ace5d2]	894	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	895	0007 01 S _
	896	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
	897	0013 01 S _
	898	0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
	899	0022 01 P .
	900	0023 01 S \n
	901	@end example
	902
	903	Output (@option{--one-field} option):
	904
	905	@example
[9ace5d2]	906	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	907	0007 01 S _
	908	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
	909	0013 01 S _
	910	0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
	911	0022 01 P .
	912	0023 01 S \n
	913	@end example
	914
	915	@c ----------------------------------------
	916
	917	@node lem dictionaries
	918	@subsection Dictionaries
	919
	920	@command{lem} requires a dictionary. The dictionary may be provided in
	921	one of two formats: in text (source) format or in binary (fsa) format.
	922
	923	@subsubheading Text format
	924
	925	Dictionary entries have the following structure:
	926
	927	@example
	928	<form>;<lemma>,<descr>[;<lemma>,<descr>]
	929	@end example
	930
	931	@var{lemma} may be given explicitly or in the cut-add format:
	932
	933	@example
	934	@code{[<cut1><add1>-]<cut2><add2>}
	935	@end example
	936
	937	meaning: replace prefix of length @code{<cut1>} with
	938	string @code{<add1>}, replace suffix of length @code{<cut2>} with string
	939	@code{<add2>}. For example @code{3t} transforms @samp{kocie} into
[9ace5d2]	940	@samp{kot}, @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
[25ae32e]	941
	942	Each dictionary entry must be written in one line and must not contain blank characters.
	943
	944	Examples:
	945	@example
	946	kot;0,N/GaNsCn
	947	kota;1,N/GaNsCg;1,N/GaNsCa
	948	kotu;1,N/GaNsCd
	949	kotem;2,N/GaNsCi
	950	kocie;3t,N/GaNsCl;3t,N/GaNsCv
[9ace5d2]	951	najbielsi;3-4ały,ADJ/DsNpCnGp
	952	najbielsze;3-5ały,ADJ/DsNpCnGaifn
[25ae32e]	953	najlepsi;dobry,ADJ/DsNpCnGp
	954	najlepsze;dobry,ADJ/DsNpCnGaifn
	955	@end example
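
The cut-add notation can be expanded with a short routine. The following Python sketch is an illustration, not the code used by @command{lem}: the names @code{CUT_ADD} and @code{resolve_lemma} are hypothetical, and any lemma field that does not match the cut-add pattern is assumed to be an explicit lemma.

```python
import re

# [<cut1><add1>-]<cut2><add2>: optional prefix part, mandatory suffix part.
CUT_ADD = re.compile(r"(?:(\d+)([^-]*)-)?(\d+)(.*)")


def resolve_lemma(form, lemma_spec):
    """Return the lemma of `form` given a dictionary lemma field."""
    match = CUT_ADD.fullmatch(lemma_spec)
    if match is None:
        return lemma_spec  # explicit lemma, e.g. 'dobry'
    cut1, add1, cut2, add2 = match.groups()
    stem = form
    if cut1 is not None:
        # Replace a prefix of length cut1 with add1.
        stem = add1 + stem[int(cut1):]
    # Replace a suffix of length cut2 with add2.
    return stem[:len(stem) - int(cut2)] + add2
```

For the entries above, @code{resolve_lemma("kocie", "3t")} gives @samp{kot} and @code{resolve_lemma("najbielsi", "3-4ały")} gives @samp{biały}. An explicit lemma that begins with a digit would be misread as a cut-add expression by this sketch.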
	956
	957
	958	The mandatory file name extension for a text dictionary is @code{dic}. For large
	959	dictionaries it is preferable, however, to compile them into binary
	960	(fsa) format.
	961
	962	@subsubheading Binary format
	963
	964	The mandatory file name extension for a binary dictionary is @code{bin}. To
	965	compile a text dictionary into binary format, write:
	966
	967	@example
[d6a59ca]	968	compdic <dictionaryname>.dic <dictionaryname>.bin
[25ae32e]	969	@end example
	970
	971	@subsubheading Polex/PMDBF dictionary
	972
	973	A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
	974	the distribution as the default dictionary of @command{lem}. It is
	975	located by default in:
	976
[261bf62]	977	@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}
	978
	979	in local installation or in
	980
	981	@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}
	982
	983	in system installation.
[25ae32e]	984
	985	@node lem hints
	986	@subsection Hints
	987
[261bf62]	988	@subsubheading Combining data from multiple dictionaries
[25ae32e]	989
[261bf62]	990	@itemize
[25ae32e]	991
[261bf62]	992	@item Apply <dict1>, then apply <dict2> to words which were not annotated.
[25ae32e]	993
[261bf62]	994	@example
	995	lem -d <dict1> | lem -S lem -d <dict2>
	996	@end example
[25ae32e]	997
[261bf62]	998	@item Add annotations from two dictionaries <dict1> and <dict2>.
[25ae32e]	999
[261bf62]	1000	@example
	1001	lem -c -d <dict1> | lem -S lem -d <dict2>
	1002	@end example
[25ae32e]	1003
[261bf62]	1004	@end itemize
[25ae32e]	1005
	1006
	1007	@c ---------------------------------------------------------------------
	1008	@c GUE
	1009	@c ---------------------------------------------------------------------
	1010
	1011	@page
	1012	@node gue
	1013	@section gue - morphological guesser
	1014
	1015	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1016
@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
[25ae32e]	1018	@item @strong{Component category:} @tab filter
	1019
	1020	@end multitable
	1021
	1022	@menu
[261bf62]	1023	* gue description::
[25ae32e]	1024	* gue command line options::
	1025	* gue example::
	1026	* gue dictionaries::
	1027	@end menu
	1028
[261bf62]	1029
	1030	@node gue description
	1031	@subsection Description
	1032
@command{gue} guesses morphological descriptions of the form contained
in the @var{form} field.
	1035
	1036
[25ae32e]	1037	@node gue command line options
	1038	@subsection Command line options
	1039
	1040	@table @code
	1041
	1042	@parhelp
	1043	@parversion
	1044	@parinteractive
	1045	@c @parfile
	1046	@c @paroutput
	1047	@c @parfail
	1048	@c @parcopy
	1049	@parinputfield
	1050	@paroutputfield
	1051	@pardictionary
	1052	@parprocess
	1053	@parselect
	1054	@parunselect
	1055	@paroneline
	1056	@paronefield
	1057
@item @b{@minus{}@minus{}delta=@var{n}}
Stop displaying answers after a drop in weight, that is, when the weight difference between two consecutive results exceeds the delta value (default=`0.2').


@item @b{@minus{}@minus{}cut-off=@var{n}}
Do not display answers whose weight is lower than the cut-off value (default=`200').


@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
Guess up to @var{n} descriptions (default=`0', which means `display all results').
	1068
	1069
	1070
	1071	@end table
	1072
	1073	@node gue example
	1074	@subsection Example
	1075
	1076	@example
	1077	command: gue -n 2
	1078
	1079	input:
	1080	0000 07 W smerfny
	1081
	1082	output:
	1083	0000 07 W smerfny gue:,ADJ/CaDpGiNs
	1084	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
	1085	@end example
	1086
	1087
	1088	@node gue dictionaries
	1089	@subsection Dictionaries
	1090
	1091	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
	1092	The fsa format is created by compiling text-format dictionaries.
	1093
	1094
	1095
	1096	@subsubheading Text format
	1097
	1098	Dictionary entries have the following structure:
	1099
	1100	@example
	1101	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
	1102	@end example
	1103
	1104	@var{lemma} must be given in the cut-add format:
	1105
	1106	@example
	1107	@code{[<cut1><add1>-]<cut2><add2>}
	1108	@end example
(with no spaces in between): replace the prefix of length @var{cut1} with the
string @var{add1}, and replace the suffix of length @var{cut2} with the string
@var{add2}.


Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}.
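
The cut-add notation can be illustrated with a short Python sketch. It is not
code taken from UTT: the function name @code{apply_cut_add} is hypothetical,
and treating a lemma field that does not start with a number as a literal
lemma is an assumption.

```python
import re

# Pattern for the cut-add notation [<cut1><add1>-]<cut2><add2>.
# Assumption: <add1> contains neither digits nor a hyphen.
CUT_ADD = re.compile(r"^(?:(\d+)([^\d-]*)-)?(\d+)(.*)$")


def apply_cut_add(form, rule):
    """Return the lemma obtained by applying a cut-add rule to a word form."""
    match = CUT_ADD.match(rule)
    if match is None:
        # Assumption: a field without a leading number is a literal lemma.
        return rule
    cut1, add1, cut2, add2 = match.groups()
    stem = form
    if cut1 is not None:
        # Replace the prefix of length cut1 with add1.
        stem = add1 + stem[int(cut1):]
    # Replace the suffix of length cut2 with add2.
    return stem[:len(stem) - int(cut2)] + add2


print(apply_cut_add("najbielsi", "3-4ały"))  # prints: biały
print(apply_cut_add("kocie", "3t"))          # prints: kot
```

The suffix is cut with @code{len(stem) - cut2} rather than a negative index, so
that a rule with a cut of zero leaves the form unchanged.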
[25ae32e]	1115
	1116
@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
	1118
	1119	@var{weight} is an integer value between 1 and 999 indicating the
	1120	likelihood of the guess.
	1121
[9ace5d2]	1122	@c @example
	1123	@c *ÅkÄ;1a,N/GfNsCa
	1124	@c naj*elszy;3-4aÅy,ADJ/...:...
	1125	@c @end example
[25ae32e]	1126
	1127
	1128	@c ---------------------------------------------------------------------
	1129	@c COR
	1130	@c ---------------------------------------------------------------------
	1131
	1132	@page
	1133	@node cor
	1134	@section cor - spelling corrector
	1135
	1136	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
[25ae32e]	1138	@item @strong{Component category:} @tab filter
[261bf62]	1139	@item @strong{Input format:} @tab UTT regular
	1140	@item @strong{Output format:} @tab UTT regular
	1141	@item @strong{Required annotation:} @tab tok
[25ae32e]	1142	@end multitable
	1143
[261bf62]	1144	@menu
	1145	* cor description::
	1146	* cor command line options::
	1147	* cor dictionaries::
	1148	@end menu
	1149
	1150
	1151	@node cor description
	1152	@subsection Description
	1153
The spelling corrector applies Kemal Oflazer's dynamic programming
algorithm @cite{oflazer96} to the FSA representation of the set of
word forms of the Polex/PMDBF dictionary. Given an incorrect
word form, it returns all word forms present in the dictionary whose
edit distance from that form is within the threshold given as a parameter.
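
The effect of the threshold can be illustrated with a small Python sketch. It
is not the algorithm used by @command{cor}: instead of Oflazer's search over an
FSA, it computes the plain Levenshtein distance (insertions, deletions and
substitutions) against every word in a list. The function names are
hypothetical, and the sketch assumes that a word lying exactly at the threshold
distance is accepted.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def corrections(form, words, max_distance=1):
    """Return the words whose distance from form is at most max_distance."""
    return [w for w in words if levenshtein(form, w) <= max_distance]


words = ["odlot", "odlotowy", "odludek"]
print(corrections("odlott", words))  # prints: ['odlot']
print(corrections("odludk", words))  # prints: ['odludek']
```

A linear scan like this is only practical for small word lists; searching an
automaton avoids comparing the input with every dictionary entry.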
	1159
	1160
	1161	@node cor command line options
	1162	@subsection Command line options
	1163
	1164	@table @code
	1165
	1166	@parhelp
	1167	@parversion
	1168	@parinteractive
	1169	@c @parfile
	1170	@c @paroutput
	1171	@c @parfail
	1172	@c @parcopy
	1173	@parinputfield
	1174	@paroutputfield
	1175	@pardictionary
	1176	@parprocess
	1177	@parselect
	1178	@parunselect
	1179	@paroneline
	1180	@paronefield
	1181
	1182	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
	1183	Maximum edit distance (default='1').
	1184
[261bf62]	1185	@c @item @b{@minus{}@minus{}replace, @minus{}r}
	1186	@c Replace original form with corrected form, place original form in the
	1187	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
	1188
[25ae32e]	1189
	1190	@end table
	1191
	1192	@node cor dictionaries
	1193	@subsection Dictionaries
	1194
	1195	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
	1196	The fsa format is created by compiling text-format dictionaries.
	1197
	1198	@subsubheading Text format
	1199
	1200	The @command{cor} dictionary is a list of words:
	1201	@example
	1202	odlot
	1203	odlotowy
	1204	odludek
	1205	@end example
	1206
[261bf62]	1207	@subsubheading Binary format
	1208
	1209	The mandatory file name extension for a binary dictionary is @code{bin}. To
	1210	compile a text dictionary into binary format, write:
	1211
	1212	@example
[d6a59ca]	1213	compdic <dictionaryname>.dic <dictionaryname>.bin
[261bf62]	1214	@end example
	1215
	1216	@c ---------------------------------------------------------------------
	1217	@c KOR
	1218	@c ---------------------------------------------------------------------
	1219
	1220	@page
	1221	@node kor
	1222	@section kor - configurable spelling corrector
	1223
[9ace5d2]	1224	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Paweł Werenski, Tomasz Obrębski, Michał Stolarski
	1226	@item @strong{Component category:} @tab filter
	1227	@item @strong{Input format:} @tab UTT regular
	1228	@item @strong{Output format:} @tab UTT regular
	1229	@item @strong{Required annotation:} @tab tok
	1230	@end multitable
	1231
	1232	@menu
	1233	* kor description::
	1234	* kor command line options::
	1235	* kor weights definition file::
	1236	* kor dictionaries::
	1237	@end menu
	1238
	1239
	1240	@node kor description
	1241	@subsection Description
	1242
The spelling corrector applies Paweł Werenski's dynamic programming
algorithm to the FSA representation of the set of word forms of the
Polex/PMDBF dictionary. The algorithm is an extension of K. Oflazer's
algorithm used by @command{cor}. In the extended version it is
possible to assign weights to individual edit operations.

Given an incorrect word form, it returns all word forms
present in the dictionary whose edit distance is within the
threshold given as a parameter.
	1252
	1253
	1254	@node kor command line options
	1255	@subsection Command line options
	1256
	1257	@table @code
	1258
	1259	@parhelp
	1260	@parversion
	1261	@parinteractive
	1262	@c @parfile
	1263	@c @paroutput
	1264	@c @parfail
	1265	@c @parcopy
	1266	@parinputfield
	1267	@paroutputfield
	1268	@pardictionary
	1269	@parprocess
	1270	@parselect
	1271	@parunselect
	1272	@paroneline
	1273	@paronefield
	1274
	1275	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
	1276	Maximum edit distance (default='1').
	1277
	1278	@item @b{@minus{}@minus{}weights=@var{filename}, @minus{}w @var{filename}}
	1279	Edit operations' weights file.
	1280
	1281	@c @item @b{@minus{}@minus{}replace, @minus{}r}
	1282	@c Replace original form with corrected form, place original form in the
	1283	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
	1284
	1285
	1286	@end table
	1287
	1288
	1289	@node kor weights definition file
	1290	@subsection Weights definition file
	1291
	1292	Example:
	1293
@example

%stdcor 1
%xchg 1
ż rz 0.5
ch h 0.5
u ó 0.5

@end example


The default weight is set to 1 (@code{%stdcor 1}), the weight of the exchange
operation is set to 1 (@code{%xchg 1}), and the three principal orthographic
errors are assigned the weight 0.5.

An edit operation weight declaration, such as

@example
ż rz 0.5
@end example

works in both directions, i.e. ż->rz and rz->ż.
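
How such a file might be read can be sketched in Python. The parsing rules
below are inferred from the example alone (directives start with @code{%},
other lines hold two strings and a weight); the actual @command{kor} parser may
differ, and the names @code{parse_weights} and @code{operation_weight} are
hypothetical.

```python
def parse_weights(text):
    """Parse a weights definition into (settings, pair_weights)."""
    settings = {}
    pairs = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("%"):
            # Directive such as "%stdcor 1" or "%xchg 1".
            settings[fields[0][1:]] = float(fields[1])
        else:
            left, right, weight = fields
            # A declaration works in both directions.
            pairs[(left, right)] = float(weight)
            pairs[(right, left)] = float(weight)
    return settings, pairs


def operation_weight(settings, pairs, old, new):
    """Weight of replacing old with new, falling back to the default weight."""
    return pairs.get((old, new), settings.get("stdcor", 1.0))


example = """
%stdcor 1
%xchg 1
ż rz 0.5
ch h 0.5
u ó 0.5
"""
settings, pairs = parse_weights(example)
print(operation_weight(settings, pairs, "rz", "ż"))  # prints: 0.5
print(operation_weight(settings, pairs, "a", "o"))   # prints: 1.0
```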
	1316
	1317	The default weights definition file for @code{kor} is:
	1318
	1319	@example
	1320	$HOME/.local/share/utt/weights.kor
	1321	@end example
	1322
	1323	or, if the above mentioned file is absent:
	1324
	1325	@example
	1326	/usr/local/share/utt/weights.kor
	1327	@end example
	1328
	1329
	1330	@node kor dictionaries
	1331	@subsection Dictionaries
	1332
See the dictionaries of @command{cor} (@pxref{cor dictionaries}).
[261bf62]	1334
	1335	@c ---------------------------------------------------------------------
	1336	@c SEN
	1337	@c ---------------------------------------------------------------------
	1338
[25ae32e]	1339	@page
	1340	@node sen
	1341	@section sen - a sentensizer
	1342
	1343	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1344
@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1346	@item @strong{Component category:} @tab filter
[261bf62]	1347	@item @strong{Input format:} @tab UTT regular
	1348	@item @strong{Output format:} @tab UTT regular
	1349	@item @strong{Required annotation:} @tab tok
[25ae32e]	1350
	1351	@end multitable
	1352
	1353
	1354	@menu
[261bf62]	1355	* sen description::
[25ae32e]	1356	@c * sen input::
	1357	@c * sen output::
	1358	* sen example::
	1359	@end menu
	1360
[261bf62]	1361	@node sen description
	1362	@subsection Description
	1363
	1364	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
	1365
[25ae32e]	1366	@node sen example
	1367	@subsection Example
	1368
@example
command: sen

input:
0000 05 W Cześć
0005 01 P !
0006 01 S _
0007 02 W To
0009 01 S _
0010 02 W ja
0012 01 P .
0013 01 S \n

output:
0000 00 BOS *
0000 05 W Cześć
0005 01 P !
0006 00 EOS *
0006 00 BOS *
0006 01 S _
0007 02 W To
0009 01 S _
0010 02 W ja
0012 01 P .
0013 01 S \n
0014 00 EOS *
@end example
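
A program consuming @command{sen} output could recover the sentences from the
markers, as in the following Python sketch. It assumes only what the example
shows: each line holds a start position, a length, a type and a form, and
sentences are delimited by @code{BOS} and @code{EOS} segments. The function
name @code{split_sentences} is hypothetical.

```python
def split_sentences(utt_lines):
    """Group segment forms into sentences using the BOS/EOS markers."""
    sentences = []
    current = None
    for line in utt_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete segment line
        seg_type, form = fields[2], fields[3]
        if seg_type == "BOS":
            current = []
        elif seg_type == "EOS":
            if current is not None:
                sentences.append(current)
            current = None
        elif current is not None:
            current.append(form)
    return sentences


output = r"""0000 00 BOS *
0000 05 W Cześć
0005 01 P !
0006 00 EOS *
0006 00 BOS *
0006 01 S _
0007 02 W To
0009 01 S _
0010 02 W ja
0012 01 P .
0013 01 S \n
0014 00 EOS *""".splitlines()

for sentence in split_sentences(output):
    print(sentence)
```

The forms are kept exactly as they appear in the file, so the space and newline
segments show up as @code{_} and @code{\n}.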
	1396
	1397
	1398	@c ---------------------------------------------------------------------
	1399	@c GPH
	1400	@c ---------------------------------------------------------------------
	1401
	1402	@c @node gph - graphizer
	1403	@c @chapter gph - graphizer
	1404
@c Authors: Tomasz Obrębski
[25ae32e]	1406
	1407
	1408
	1409	@c ---------------------------------------------------------------------
[261bf62]	1410	@c SER
[25ae32e]	1411	@c ---------------------------------------------------------------------
	1412
	1413	@page
	1414	@node ser
	1415	@section ser - pattern search tool
	1416
	1417	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1419	@item @strong{Component category:} @tab filter
[261bf62]	1420	@item @strong{Input format:} @tab UTT regular
	1421	@item @strong{Output format:} @tab UTT regular
	1422	@item @strong{Required annotation:} @tab tok, lem --one-field
[25ae32e]	1423	@end multitable
	1424
	1425	@menu
[261bf62]	1426	* ser description::
[25ae32e]	1427	* ser command line options::
	1428	* ser pattern::
	1429	* ser how ser works::
	1430	* ser customization::
	1431	* ser limitations::
	1432	* ser requirements::
	1433	@end menu
	1434
	1435
[261bf62]	1436	@node ser description
	1437	@subsection Description
	1438
	1439	@command{ser} looks for patterns in UTT-formatted texts.
	1440
	1441
[25ae32e]	1442	@c ---------------------------------------------------------------------
	1443	@node ser command line options
	1444	@subsection Command line options
	1445
	1446	@table @code
	1447
	1448	@parhelp
	1449	@parversion
	1450	@c @parfile
	1451	@c @paroutput
	1452	@c @parinputfield
	1453	@c @paroutputfield
	1454	@parprocess
	1455	@parinteractive
	1456
	1457	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1458	The search pattern.
	1459
	1460	@item @b{@minus{}@minus{}morph=@var{field}}
	1461	The name of the annotation field containing the morphological
	1462	description (default @code{lem}).
	1463
	1464	@item @b{@minus{}@minus{}flex}
	1465	Only print the generated flex source code.
	1466
@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.

@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from file @var{filename}. This option
makes it possible to extend the set of terms.
	1474
	1475	@end table
	1476
	1477
	1478	@c ---------------------------------------------------------------------
	1479	@node ser pattern
	1480	@subsection Pattern
	1481
	1482	The @command{ser} pattern is a regular expression over terms corresponding
	1483	to text segments or segment sequences. Predefined terms are:
	1484
	1485	@table @code
	1486
	1487	@item seg(@var{t},@var{f},@var{a})
	1488	a segment of type @var{t}, containing form @var{f} and annotation
	1489	@var{a}
	1490
	1491	@item form(@var{f})
	1492	a segment containing form @var{f}
	1493
	1494	@item field(@var{f})
	1495	a segment containing annotation field @var{f}
	1496
	1497	@item space(@var{f})
	1498	a space segment of form @var{f}
	1499
	1500	@item word(@var{f})
	1501	a word segment of form @var{f}
	1502
	1503	@item punct(@var{f})
	1504	a punct segment of form @var{f}
	1505
	1506	@item number(@var{f})
	1507	a number segment of form @var{f}
	1508
	1509	@item lexeme(@var{f})
	1510	a word segment with lemma @var{f}
	1511
	1512	@item cat(@var{c})
	1513	a word segment of category @var{c}
	1514
	1515	@end table
	1516
All arguments are optional. If an argument is omitted, an arbitrary
string of non-blank characters is assumed as the argument value. Term
arguments may be arbitrary character-level regular expressions. The
following special symbols can be used:
	1521
	1522	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @code{[@dots{}]} @tab a character class
@item @code{[^@dots{}]} @tab a negated character class
@item @code{|} @tab alternative
@item @code{*} @tab repetition, including zero times
@item @code{+} @tab repetition, at least one time
@item @code{?} @tab optionality
@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
@item @code{@{@var{m}@}} @tab repetition @var{m} times
@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
@item @code{( )} @tab parentheses, used to override precedence
	1535	@c @end multitable
	1536
	1537	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1538	@item @code{.} @tab a non-blank character
	1539	@item @code{\w} @tab a letter
	1540	@item @code{\W} @tab a non-blank character other than a letter
	1541	@item @code{\d} @tab a digit
	1542	@item @code{\D} @tab a non-blank character other than a digit
	1543	@item @code{\s} @tab a space or tab character
	1544	@item @code{\S} @tab a non-blank character (the same as @code{.})
	1545	@item @code{\l} @tab a lowercase letter
	1546	@item @code{\L} @tab an uppercase letter
	1547	@end multitable
	1548
	1549
@noindent The following characters:
@example
@verb{% [ ] ^ | * + ? { } , . < > \ %}
@end example
must be escaped with a backslash, i.e. written as:
@example
@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
@end example
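
When a term argument is built from literal text, the escaping can be
automated. The helper below is a hypothetical Python sketch and not part of
UTT; it simply prefixes each of the characters listed above with a backslash.

```python
# Characters that must be escaped in ser term arguments.
SPECIAL = set("[]^|*+?{},.<>\\")


def escape_literal(text):
    """Backslash-escape every special character in text."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


print(escape_literal("np."))  # prints: np\.
print(escape_literal("a|b"))  # prints: a\|b
```

The result can then be placed inside a term, for example @code{word(np\.)}.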
	1558
@quotation Note
The special symbols are borrowed from Perl with minor modifications made for
convenience, so the meaning of certain special characters and sequences
differs slightly from their common meaning. In particular, the meaning of the
@code{.} special character is modified due to the special function of spaces
in UTT files (they are field separators). Use @code{\s} to explicitly match a
space or tab character.
@end quotation
	1568
In the argument of the @code{cat} term a special operator @code{<...>} may be
used. A category specification enclosed in angle brackets matches all
category descriptions which are consistent (non-contradictory) with the
specification. For example, @code{<N>} matches all noun descriptions, and
@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
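
The notion of consistency can be sketched in Python. This only illustrates the
idea and is not how @command{ser} implements the operator. It assumes, based on
the descriptions shown in this manual, that a description consists of a part of
speech, a slash, and attributes written as an uppercase letter followed by one
or more lowercase value letters; it also assumes that an attribute missing from
the description cannot contradict the specification. The function names are
hypothetical.

```python
import re


def parse_category(category):
    """Split e.g. 'ADJ/CaDpGiNs' into its part of speech and attribute values."""
    pos, _, attributes = category.partition("/")
    values = {}
    for name, vals in re.findall(r"([A-Z])([a-z0-9]+)", attributes):
        values[name] = set(vals)
    return pos, values


def consistent(spec, description):
    """True if description does not contradict the specification spec."""
    spec_pos, spec_attrs = parse_category(spec)
    desc_pos, desc_attrs = parse_category(description)
    if spec_pos != desc_pos:
        return False
    for name, wanted in spec_attrs.items():
        present = desc_attrs.get(name)
        # An attribute missing from the description cannot contradict.
        if present is not None and not (wanted & present):
            return False
    return True


print(consistent("N", "N/GaNsCn"))                # prints: True
print(consistent("ADJ/Can", "ADJ/CnvDpGaipNs"))   # prints: True
print(consistent("N/Ca", "N/GaNsCg"))             # prints: False
```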
	1574
	1575
	1576	@*
	1577	@noindent @b{Examples of one-segment patterns:}
	1578
	1579	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1580	@item @code{seg} @tab any segment
	1581	@item @code{word} @tab any word-form
	1582	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
	1583	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
	1584	@item @code{word(\L\l+)} @tab a capitalized word-form
	1585	@item @code{punct} @tab a punctuation character
	1586	@item @code{space(.\\n.)} @tab a space segment containing a newline character
@item @code{lexeme(pomoc)} @tab any form of the lexeme @samp{pomoc}
@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
@item @code{cat(<N/Ca>)} @tab a word whose category matches @code{N/Ca}
	1590	@end multitable
	1591
	1592	@*
	1593	@noindent @b{Examples of multi-segment patterns:}
	1594
	1595	@table @code
	1596
	1597	@item (word(\L) punct(\.) space?)+ word(\L\l+)
	1598	a sequence of initials followed by a surname
	1599
@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
a text fragment between two punctuation characters, containing an
occurrence of a relative pronoun
	1603
	1604	@end table
	1605
	1606
	1607	@node ser how ser works
	1608	@subsection How ser works
	1609
	1610	@node ser customization
	1611	@subsection Customization
	1612
	1613	@c All predefined terms correspond to single segments,
	1614
	1615	@example
[261bf62]	1616	define(`verbseq', `(cat(<V>) (space cat(<V>)))')
[25ae32e]	1617	@end example
	1618
	1619
	1620	the term @code{cat()} may not be used as a ... of
	1621
	1622	@c See @command{m4} manual for further details on macro definition format.
	1623
	1624	@node ser limitations
	1625	@subsection Limitations
	1626
[261bf62]	1627	Do not use more than 3 attributes in <>.
[25ae32e]	1628
	1629	@node ser requirements
	1630	@subsection Requirements
	1631
	1632	In order to run @command{ser}, the following programs must be
	1633	installed in the system:
	1634
	1635	@itemize
	1636
	1637	@item @command{m4}
	1638	@item @command{grep}
	1639	@item @command{flex}
	1640	@item @command{gcc}
	1641
	1642	@end itemize
	1643
	1644
	1645	@c ---------------------------------------------------------------------
[261bf62]	1646	@c GRP
[25ae32e]	1647	@c ---------------------------------------------------------------------
	1648
	1649	@page
	1650	@node grp
	1651	@section grp - pattern search tool
	1652
	1653	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1655	@item @strong{Component category:} @tab filter
[261bf62]	1656	@item @strong{Input format:} @tab UTT flattened
	1657	@item @strong{Output format:} @tab UTT flattened
	1658	@item @strong{Required annotation:} @tab tok, sen, lem --one-field
[25ae32e]	1659	@end multitable
	1660
	1661
[261bf62]	1662	@menu
	1663	* grp description::
	1664	* grp command line options::
	1665	* grp pattern::
	1666	* grp hints::
	1667	@end menu
	1668
	1669
	1670	@node grp description
	1671	@subsection Description
	1672
@command{grp} selects sentences containing an expression matching a
pattern. The pattern format is exactly the same as that accepted by
@command{ser}.

@command{grp} is intended mainly for speeding up the corpus search process.
It is extremely fast (processing speed is usually higher than the speed
of reading the corpus file from disk).
	1680
	1681	@node grp command line options
	1682	@subsection Command line options
	1683
	1684	@table @code
	1685
	1686	@parhelp
	1687	@parversion
	1688	@parprocess
	1689	@parinteractive
	1690
	1691	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1692	The search pattern.
	1693
	1694	@item @b{@minus{}@minus{}morph=@var{field}}
	1695	The name of the annotation field containing the morphological
	1696	description (default @code{lem}).
	1697
	1698	@item @b{@minus{}@minus{}command}
	1699	Only print the generated flex source code.
	1700
@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.

@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from file @var{filename}. This option
makes it possible to extend the set of terms.
	1708
	1709	@end table
	1710
	1711
	1712	@node grp pattern
	1713	@subsection Pattern
	1714
The pattern format is the same as for @command{ser} (@pxref{ser pattern}).
	1716
	1717	@node grp hints
	1718	@subsection Hints
	1719
The corpus search speed may be increased by combining @command{grp} with the
@command{lzop} compression tool (@command{grp} usually processes data faster than it can be
read from disk, especially on slow laptop drives).

@example
cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
@end example

@example
lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
@end example
	1731
	1732
[261bf62]	1733
[25ae32e]	1734	@c ---------------------------------------------------------------------
[261bf62]	1735	@c MAR
[25ae32e]	1736	@c ---------------------------------------------------------------------
[261bf62]	1737
	1738	@page
	1739	@node mar
	1740	@section mar
	1741
	1742	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Marcin Walas, Tomasz Obrębski
[e28a625]	1744	@item @strong{Input format:} @tab UTT flattened
	1745	@item @strong{Output format:} @tab UTT flattened
	1746	@item @strong{Required annotation:} @tab tok, sen, lem -1
[261bf62]	1747	@end multitable
	1748
[2d89d4b]	1749	@subsection Description
@code{mar} is a Perl script which matches a given pattern against UTT-formatted text
and marks the matching parts with any number of user-defined tags.
	1752
	1753	@subsection Command line options
	1754	@table @code
	1755	@parhelp
	1756	@parversion
	1757
	1758	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1759	The search pattern.
	1760	@item @b{@minus{}@minus{}action=@var{action}, @minus{}a @var{action} [p] [s] [P]}
Perform only the indicated actions, where:
@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @code{p} @tab preprocess
@item @code{s} @tab search
@item @code{P} @tab postprocess
@end multitable
Default: @code{psP}.
	1768
	1769	@item @b{@minus{}@minus{}command}
	1770	print generated sed command, then exit
	1771
	1772	@item @b{@minus{}@minus{}help, @minus{}h}
	1773	print help, then exit
	1774
	1775	@item @b{@minus{}@minus{}version, @minus{}v}
	1776	print version, then exit
	1777	@end table
	1778	@subsection Tokens in pattern
A @code{mar} pattern is a @code{ser} pattern (@pxref{ser pattern}) to which
any number of match tokens may be added. Each token is printed in exactly the
place where it was put in the pattern. A valid token starts with @@, followed by
any number of alphanumeric characters. For example, valid match tokens are:
@@STARTMATCH @@ENDMATCH

Match tokens can be placed between, before or after any of the @code{ser}
pattern terms. They do not have to be paired. There can be any number of them
in the pattern (zero or more). They do not have to be unique. They can be
placed one after another. For example:
	1787
	1788	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @code{@@BOM lexeme(pomoc)} @tab place tag @b{BOM} before any form of the lexeme 'pomoc'
@item @code{@@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc'
@item @code{cat(<ADJ>) @@MATCH lexeme(pomoc) @@MATCH} @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc' which is preceded by an adjective
@item @code{cat(<ADJ>) @@TAG @@BOM lexeme(pomoc) @@EOM} @tab place tags @b{TAG} and @b{BOM} before any form of the lexeme 'pomoc' which is preceded by an adjective, and tag @b{EOM} after it
	1793	@end multitable
	1794
(See @code{mar}'s help, @code{mar -h}, for more information.)
	1796
	1797	@subsection How mar works
@code{mar} uses the @code{m4} macro processor to translate the given pattern into a regular expression. The regular expression is then turned into a @code{sed} command script, which is executed.

To see the generated @code{sed} script, use the @code{@minus{}@minus{}command} option.
@subsection Limitations
The complexity of the computations performed by @code{mar} increases linearly with the number of tokens placed in the pattern, so it is recommended not to use too many of them.
	1803	@subsection Requirements
	1804	In order to run @code{mar}, the following programs must be installed in the system:
	1805
	1806	@itemize
	1807
	1808	@item @command{m4}
	1809	@item @command{grep}
	1810	@item @command{sed}
	1811
	1812	@end itemize
	1813
[261bf62]	1814
[e28a625]	1815
[261bf62]	1816	@c ---------------------------------------------------------------------
	1817	@c KOT
[25ae32e]	1818	@c ---------------------------------------------------------------------
	1819
	1820	@page
	1821	@node kot
	1822	@section kot - untokenizer
	1823
[261bf62]	1824	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
[261bf62]	1826	@item @strong{Component category:} @tab filter
	1827	@item @strong{Input format:} @tab UTT regular
	1828	@item @strong{Output format:} @tab text
	1829	@item @strong{Required annotation:} @tab tok
	1830	@end multitable
[25ae32e]	1831
	1832
	1833	@menu
[261bf62]	1834	* kot description::
[25ae32e]	1835	* kot command line options::
	1836	* kot usage examples::
	1837	@end menu
	1838
[261bf62]	1839	@node kot description
	1840	@subsection Description
	1841
@command{kot} transforms a UTT-formatted file back into raw text.
	1843
[25ae32e]	1844	@node kot command line options
	1845	@subsection Command line options
	1846
	1847	@table @code
	1848
	1849	@parhelp
	1850
	1851	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1852
	1853	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1854
	1855	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1856
	1857	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1858
	1859	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1860
	1861	@item
	1862
	1863	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
	1864	print @var{string} between nonadjacent segments of the input file
	1865
	1866	@item @b{@minus{}@minus{}spaces, @minus{}r}
	1867	retain the special characters @code{_}, @code{\t},
	1868	@code{\n}, @code{\r}, @code{\f} unexpanded in the output
	1869
	1870	@end table
	1871
	1872	@node kot usage examples
	1873	@subsection Usage examples
	1874
@example
cat legia.txt | tok | kot
@end example

@example
cat legia.txt | tok | lem -1 | kot
@end example
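
What @command{kot} does to the segment stream can be approximated by the
following Python sketch. It is an illustration rather than the actual
implementation: the function name, the empty default for the gap filler, the
skipping of zero-length segments and the restriction of the expansion to
@code{S} segments are all assumptions.

```python
import re

# Special forms and the characters they stand for (see the --spaces option).
EXPANSIONS = {"_": " ", "\\t": "\t", "\\n": "\n", "\\r": "\r", "\\f": "\f"}
SPECIAL = re.compile(r"_|\\[tnrf]")


def untokenize(utt_lines, gap_fill="", keep_special=False):
    """Rebuild raw text from UTT segment lines (start, length, type, form)."""
    pieces = []
    expected_start = None
    for line in utt_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete segment line
        start, length = int(fields[0]), int(fields[1])
        seg_type, form = fields[2], fields[3]
        if length == 0:
            continue  # assumption: zero-length segments carry no text
        if expected_start is not None and start != expected_start:
            pieces.append(gap_fill)  # the segments are not adjacent
        if seg_type == "S" and not keep_special:
            form = SPECIAL.sub(lambda m: EXPANSIONS[m.group(0)], form)
        pieces.append(form)
        expected_start = start + length
    return "".join(pieces)


segments = r"""0000 05 W Cześć
0005 01 P !
0006 01 S _
0007 02 W To
0009 01 S _
0010 02 W ja
0012 01 P .
0013 01 S \n""".splitlines()

print(repr(untokenize(segments)))  # prints: 'Cześć! To ja.\n'
```

Annotation fields added by later components, such as @command{lem}, follow the
form and are ignored by the sketch, which is why both pipelines above can end
with @command{kot}.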
	1882
[261bf62]	1883	@c ---------------------------------------------------------------
	1884	@c CON
	1885	@c ---------------------------------------------------------------
	1886
[25ae32e]	1887
	1888	@page
	1889	@node con
	1890	@section con - concordance table generator
	1891
	1892	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1893	@item @strong{Authors:} @tab Justyna Walkowska
	1894	@item @strong{Component category:} @tab sink
[261bf62]	1895	@item @strong{Input format:} @tab UTT regular
	1896	@item @strong{Output format:} @tab text
	1897	@item @strong{Required annotation:} @tab ser or mar
[25ae32e]	1898	@end multitable
	1899	@c
	1900
	1901	@menu
[261bf62]	1902	* con description::
[25ae32e]	1903	* con command line options::
	1904	* con usage example::
	1905	* con hints::
	1906	@end menu
	1907
[261bf62]	1908
	1909	@node con description
	1910	@subsection Description
	1911
	1912	@command{con} generates a concordance table based on a pattern given to @command{ser}.
	1913
	1914
[25ae32e]	1915	@node con command line options
	1916	@subsection Command line options
	1917
	1918	@table @code
	1919
	1920	@parhelp
	1921
	1922	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
	1923	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1924	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1925	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1926	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
	1927	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
	1928	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	1929	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	1930	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
	1931	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1932	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1933	@c @item
	1934	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1935	@c search pattern
	1936	@c
	1937	@c @item @b{@minus{}@minus{}flex}
	1938	@c only print the generated flex source code
	1939	@c
	1940	@c @item @b{@minus{}@minus{}macro=@var{filename}}
	1941	@c read macrodefinitions from file @var{filename} rather than from
	1942	@c default location. This option allows to redefine the set of terms.
	1943	@c
	1944	@c @item @b{@minus{}@minus{}define=@var{filename}}
	1945	@c append macrodefinitions from file @var{filename}. This option
	1946	@c allows to extend the set of terms.
	1947
	1948	@item @b{@minus{}@minus{}left @minus{}l}
	1949	Left context info (default='30c'). Example:
	1950	@example
	1951	-l=5c: left context is 5 characters
	1952	-l=5w: left context is 5 words
	1953	-l=5s: left context is 5 non-empty input lines
	1954	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
	1955	@end example
	1956
	1957	@item @b{@minus{}@minus{}right @minus{}r}
	1958	Right context info (default='30c').
	1959	@item @b{@minus{}@minus{}trim @minus{}t}
	1960	Clear incomplete words from output.
	1961	@item @b{@minus{}@minus{}white @minus{}w}
Preserve whitespace characters instead of converting them all into spaces.
	1963	@item @b{@minus{}@minus{}column @minus{}c}
	1964	Left column minimal width in characters (default = 0).
	1965	@item @b{@minus{}@minus{}ignore @minus{}i}
	1966	Ignore segment inconsistency in the input.
[261bf62]	1967	@item @b{@minus{}@minus{}bom}
[25ae32e]	1968	Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
[261bf62]	1969	@item @b{@minus{}@minus{}eom}
[25ae32e]	1970	End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
	1971	@item @b{@minus{}@minus{}bod}
	1972	Selected segment beginning display string (default='[').
	1973	@item @b{@minus{}@minus{}eod}
	1974	Selected segment end display string (default=']').
	1975
	1976
	1977
	1978	@end table
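
The context specifications accepted by @option{--left} and
@option{--right} can be pictured with a short Python sketch. It is not
taken from the @command{con} sources: the function name
@code{left_context}, the treatment of words as whitespace-separated
tokens, and the restriction to the @code{c} and @code{w} units are
assumptions made for illustration only.

@example
import re

def left_context(text, spec="30c"):
    # Return the part of text kept as left context for a spec like 5c or 5w.
    match = re.fullmatch(r"([0-9]+)([cw])", spec)
    if match is None:
        raise ValueError("unsupported context specification: " + spec)
    count, unit = int(match.group(1)), match.group(2)
    if count == 0:
        return ""
    if unit == "c":
        # Keep the last count characters.
        return text[-count:]
    # Keep the last count whitespace-separated words.
    return " ".join(text.split()[-count:])

print(left_context("ala ma kota i psa", "5c"))
print(left_context("ala ma kota i psa", "3w"))
@end example

The right context would be handled symmetrically, taking characters or
words from the beginning of the text instead of the end.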
	1979
	1980	@node con usage example
	1981	@subsection Usage example
	1982	@example
cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
[25ae32e]	1984	@end example
	1985
	1986
	1987	@node con hints
	1988	@subsection Hints
	1989
@command{con} is a rather slow program, so avoid passing large amounts
of irrelevant text through it. It works well at the end of a sequence
such as:

@example
... | grp -e EXPR | ser -e EXPR | con
@end example
	1997
	1998
	1999	@c ---------------------------------------------------------------------
	2000	@c ---------------------------------------------------------------------
	2001
	2002	@page
	2003	@node Auxiliary tools
	2004	@chapter Auxiliary tools
	2005
	2006	@menu
[d6a59ca]	2007	* compdic:: dictionary compiler
[25ae32e]	2008	* fla:: UTT file flattener
	2009	* unfla:: UTT file unflattener
	2010	@end menu
	2011
	2012
	2013	@page
[d6a59ca]	2014	@node compdic
	2015	@section compdic - the dictionary compiler
[25ae32e]	2016
	2017	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
[25ae32e]	2019	@item @strong{Component category:} @tab additional tool
	2020	@end multitable
	2021	@c
	2022
@command{compdic} compiles dictionaries in text format (@code{.dic} extension) into binary
(FST) format (@code{.bin} extension).

The automaton representation of a dictionary is built using the OpenFst toolkit,
so OpenFst must be installed on your system for @command{compdic} to work.
[25ae32e]	2029
	2030	Usage:
	2031	@example
[d6a59ca]	2032	compdic <dictionaryname>.dic <dictionaryname>.bin
[25ae32e]	2033	@end example
	2034
	2035	The file <dictionaryname>.bin will be generated.
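
When several dictionaries have to be compiled, the command lines can be
generated mechanically. The following Python sketch only builds the
argument lists according to the usage shown above; the function name
@code{compdic_commands} and the file names @file{cor.dic} and
@file{lem.dic} are hypothetical.

@example
from pathlib import Path

def compdic_commands(dic_files):
    # Build one compdic argument list per .dic file.
    commands = []
    for dic in dic_files:
        dic = Path(dic)
        commands.append(["compdic", str(dic), str(dic.with_suffix(".bin"))])
    return commands

for command in compdic_commands(["cor.dic", "lem.dic"]):
    print(" ".join(command))
    # To run the compiler itself (compdic and OpenFst must be installed),
    # import subprocess and call: subprocess.run(command, check=True)
@end example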
	2036
	2037	@c @menu
	2038	@c * con command line options::
	2039	@c * con usage example::
	2040	@c * con hints::
	2041	@c @end menu
	2042
	2043
[e28a625]	2044	@c -------------------------------------------------------------------------------
	2045	@c FLA
	2046	@c -------------------------------------------------------------------------------
	2047
[25ae32e]	2048	@page
	2049	@node fla
	2050	@section fla - the UTT file flattener
	2051
	2052	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
[e28a625]	2054	@item @strong{Input format:} @tab UTT regular
	2055	@item @strong{Output format:} @tab UTT flattened
	2056	@item @strong{Required annotation:} @tab sen
[25ae32e]	2057	@end multitable
	2058	@c
	2059
[e28a625]	2060	@menu
	2061	* fla description::
	2062	@c * fla command line options::
	2063	@c * fla usage example::
	2064	@end menu
	2065
	2066
	2067	@node fla description
	2068	@subsection Description
	2069
@command{fla} ``flattens'' a UTT file by merging the segments belonging
to one sentence into one line. Technically, the end-of-line characters
('\n', ASCII code 10) inside a sentence are replaced with form-feed
characters ('\f', ASCII code 12). Flattening makes it possible to
process UTT files sentence by sentence with tools such as
@command{grep} or @command{sed} (this is used in @command{grp} and
@command{mar}).

Flattened files should have the suffix @code{.fla}, e.g.@: @file{thetext.utt.fla}.

Flattened files are still human-readable.
	2080
	2081	Usage:
	2082
	2083	@example
	2084	fla [<bosregex>]
	2085	@end example
	2086
The optional argument is a regular expression describing the segments
that should be treated as sentence beginnings (a segment qualifies if
it contains a fragment matching @code{<bosregex>}). By default,
segments containing a @code{BOS} field are searched for.
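
The effect of flattening can be illustrated with a minimal Python
sketch. It is not the @command{fla} source code: the function name
@code{flatten} is hypothetical, the sample segments are invented for the
example, and the default pattern is approximated by a plain search for
the string @code{BOS}.

@example
import re

def flatten(lines, bos_regex="BOS"):
    # Join the segments of each sentence with form feeds,
    # producing one output line per sentence.
    sentences = []
    for line in lines:
        line = line.rstrip("\n")
        if not sentences or re.search(bos_regex, line):
            # A segment matching bos_regex opens a new sentence.
            sentences.append([line])
        else:
            sentences[-1].append(line)
    return ["\f".join(segments) for segments in sentences]

sample = [
    "0000 00 BOS *",
    "0000 03 W Ala",
    "0003 01 S _",
    "0004 02 W ma",
    "0006 00 BOS *",
    "0006 04 W kota",
]
for flat_line in flatten(sample):
    print(repr(flat_line))
@end example

Each printed line holds one whole sentence, so line-oriented tools see
a sentence at a time.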
	2091
[e28a625]	2092	@c -------------------------------------------------------------------------------
	2093	@c UNFLA
	2094	@c -------------------------------------------------------------------------------
[25ae32e]	2095
	2096	@page
	2097	@node unfla
	2098	@section unfla - the UTT file unflattener
	2099
	2100	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
[e28a625]	2102	@item @strong{Input format:} @tab UTT flattened
	2103	@item @strong{Output format:} @tab UTT regular
	2104	@item @strong{Required annotation:} @tab -
[25ae32e]	2105	@end multitable
	2106
[e28a625]	2107	@menu
	2108	* unfla description::
	2109	@c * fla command line options::
	2110	@c * fla usage example::
	2111	@end menu
	2112
	2113	@node unfla description
	2114	@subsection Description
[25ae32e]	2115	@command{unfla} transforms a flattened UTT file, produced by
	2116	@command{fla}, into the regular format by restoring end-of-line
	2117	characters.
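
As an illustration only (this is not the @command{unfla} source code),
restoring the regular format amounts to splitting each flattened line
at the '\f' characters (ASCII code 12). The function name
@code{unflatten} and the sample segments are hypothetical.

@example
def unflatten(flat_lines):
    # Split every flattened line back into its segments, one per line.
    segments = []
    for flat_line in flat_lines:
        segments.extend(flat_line.rstrip("\n").split("\f"))
    return segments

for segment in unflatten(["0000 00 BOS *\f0000 03 W Ala", "0003 00 BOS *"]):
    print(segment)
@end example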
	2118
	2119
	2120
	2121
	2122	@c ---------------------------------------------------------------------
	2123	@c USAGE EXAMPLES
	2124	@c ---------------------------------------------------------------------
	2125
	2126	@node Usage examples
	2127	@chapter Usage examples
	2128
	2129	@subsubheading Simple pipelines
	2130
	2131	@enumerate
	2132
@item tokenization

@example
cat text | tok > output1
@end example

@item morphological annotation (1)

Simple dictionary-based lemmatization.

@example
cat text | tok | lem > output1
@end example
	2142
	2143	@item morphological annotation (2)
	2144
1) perform dictionary-based lemmatization@*
2) guess descriptions for words which have no annotation

@example
cat text | tok | lem | gue -S lem > output2
@end example
	2151
	2152	@item morphological annotation (3)
	2153
1) perform dictionary-based lemmatization@*
2) try to correct words with no annotation@*
3) perform dictionary-based lemmatization of corrected words@*
4) guess descriptions for words which still have no annotation

@example
cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
@end example

	2162	@item spelling correction
	2163
@example
cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
@end example
	2169
	2170	@item Expression extraction
	2171
	2172	Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
	2173
	2174	@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
	2176	@end example
	2177
	2178	@item A word in context
	2179
Extraction of text fragments containing a form of the lexeme 'rozmowa' in
the context of 5 preceding and 5 following corpus segments.

@example
cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
@end example
	2186
	2187	@item generation of concordance table (1)
	2188
	2189	@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
	2191	@end example
	2192
Running time: 10 seconds.
	2194
	2195	@item generation of concordance table (2)
	2196
	2197	The same as above but much faster
	2198
	2199	@example
cat text | tok | lem -1 | \
grp -e 'cat(<V>) space lexeme(rozmowa)' | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | \
con
	2204	@end example
	2205
Running time: 2 seconds.
	2207
	2208	@item generation of concordance table (3)
	2209
Usually, one searches the same corpus repeatedly. In
such a case it is advisable to transform the corpus data into the format
required by @command{grp} first, and then use the preprocessed data.

As @command{grp} (@command{grep}) processes data faster than it can be
read from the disk drive, the search time may be shortened further by
using file compression. We suggest using the
@command{lzop} compressor/decompressor.
[25ae32e]	2218
	2219	@item the fastest way to search a large corpus
	2220
[e28a625]	2221	step 1: corpus preprocessing
[25ae32e]	2222
	2223	@example
cat corpus | tok | sen | lem -1 \
| fla | lzop -7 > corpus.grp.lzo
[25ae32e]	2226	@end example
	2227
	2228	step 2: search
	2229
	2230	@example
lzop -cd corpus.grp.lzo | unfla | \
grp -e 'cat(<V>) space lexeme(rozmowa)' | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | \
con
	2233	@end example
	2234
	2235	@end enumerate
	2236
[e28a625]	2237	@c @subsubheading More complicated configurations
[25ae32e]	2238
	2239
[e28a625]	2240	@c @example
	2241	@c mknod fifo1 p
	2242	@c mknod fifo2 p
	2243	@c mknod fifo3 p
	2244	@c mknod fifo4 p
	2245	@c mknod fifo5 p
	2246
	2247	@c tok \| lem -p W -e fifo1 > fifo2 &
	2248	@c cor -e fifo3 < fifo1 \| lem > fifo4 &
	2249	@c gue < fifo3 > fifo5 &
	2250	@c sort -m fifo2 fifo4 fifo5
	2251
	2252	@c rm fifo?
	2253	@c @end example
[25ae32e]	2254
	2255
	2256	@c ---------------------------------------------------------------------
	2257	@c ---------------------------------------------------------------------
	2258
	2259	@c ---------------------------------------------------------------------
	2260	@c PMDBF DICTIONARY
	2261	@c ---------------------------------------------------------------------
	2262
	2263	@node PMDBF dictionary
	2264	@chapter PMDBF dictionary
	2265
UTT components come with lexical data derived from the Polish
Morphological Database (PMDB).
	2268
	2269	@menu
	2270	* PMDBF files::
	2271	* PMDBF tag structure::
	2272	* PMDBF parts of speech::
	2273	* PMDBF morphosyntactic attributes::
	2274	@end menu
	2275
	2276	@node PMDBF files
	2277	@section Files
	2278
	2279	@node PMDBF tag structure
	2280	@section Tag structure
	2281
@example
pos   = [[:upper:]]+
attr  = [[:upper:]]+
val   = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
descr = pos ( / ( attr val + ) + ) ?
@end example
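
The grammar above can be turned into a small validating parser. The
following Python sketch makes several assumptions: the POSIX classes are
approximated by the ASCII ranges @code{A-Z}, @code{a-z} and @code{0-9},
the function name @code{parse_descr} is hypothetical, and the
description @code{N/GfNsCn} is a constructed example rather than a tag
quoted from the dictionary.

@example
import re

# val = a single lower-case letter, digit or one of ?!*+- , or <...>
VAL = r"(?:[a-z0-9?!*+-]|<[^>\n]+>)"
# attr val+ : an upper-case attribute name followed by its values
ATTR_VALS = r"([A-Z]+)((?:" + VAL + r")+)"
# descr = pos ( / ( attr val+ )+ )?
DESCR = re.compile(r"([A-Z]+)(?:/((?:[A-Z]+(?:" + VAL + r")+)+))?")

def parse_descr(descr):
    # Return (pos, [(attr, [val, ...]), ...]) or raise ValueError.
    match = DESCR.fullmatch(descr)
    if match is None:
        raise ValueError("not a valid description: " + descr)
    pos, rest = match.group(1), match.group(2) or ""
    attrs = []
    for attr, vals in re.findall(ATTR_VALS, rest):
        attrs.append((attr, re.findall(VAL, vals)))
    return pos, attrs

print(parse_descr("N/GfNsCn"))
print(parse_descr("V"))
@end example

An attribute may carry several values, so each attribute is returned
with a list of values rather than a single one.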
	2289
	2290	@node PMDBF parts of speech
	2291	@section Parts of speech
	2292
	2293	@multitable {ADJPRP} { adjectival-passive-participle }
	2294	@item @code{N} @tab noun
	2295	@item @code{NPRO} @tab nominal-pronoun
	2296	@item @code{NV} @tab deverbal-noun
	2297	@item @code{V} @tab verb
	2298	@item @code{BYC} @tab byc
	2299	@item @code{VNI} @tab non-inflected-verb
	2300	@item @code{ADJ} @tab adjective
	2301	@item @code{ADJPAP} @tab adjectival-passive-participle
	2302	@item @code{ADJPRP} @tab adjectival-present-participle
	2303	@item @code{ADJPP} @tab adjectival-past-participle
	2304	@item @code{ADJPRO} @tab adjectival-pronoun
	2305	@item @code{ADJNUM} @tab adjectival-numeral
	2306	@item @code{ADV} @tab adverb
	2307	@item @code{ADVANP} @tab adverbial-anterior-participle
	2308	@item @code{ADVPRP} @tab adverbial-present-participle
	2309	@item @code{ADVPRO} @tab adverbial-pronoun
	2310	@item @code{ADVNUM} @tab adverbial-numeral
	2311	@item @code{P} @tab preposition
	2312	@item @code{PPRO} @tab prep-noun-pronoun
	2313	@item @code{CONJ} @tab conjunction
	2314	@item @code{EXCL} @tab exclamation
	2315	@item @code{APP} @tab call
	2316	@item @code{ONO} @tab onomatopoeia
	2317	@item @code{PART} @tab particle
	2318	@item @code{NUMCRD} @tab cardinal-numeral
	2319	@item @code{NUMCOL} @tab collective-numeral
	2320	@item @code{NUMPAR} @tab partitive-numeral
	2321	@item @code{NUMORD} @tab ordinal-numeral
	2322	@end multitable
	2323
	2324	@node PMDBF morphosyntactic attributes
	2325	@section Morphosyntactic attributes
	2326
	2327	@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	2328	@c @headitem Attr @tab Val @tab Description
	2329	@item
	2330	@code{A} @tab @tab Aspect
	2331	@item
@tab @code{p} @tab perfective,
@item
@tab @code{i} @tab imperfective.
	2335	@item
	2336	@item
	2337	@code{V} @tab @tab Verb-Form
	2338	@item
	2339	@tab @code{b} @tab infinitive,
	2340	@item
	2341	@tab @code{p} @tab personal,
	2342	@item
	2343	@tab @code{i} @tab impersonal.
	2344	@item
	2345	@item
	2346	@code{M} @tab @tab Mood
	2347	@item
	2348	@tab @code{d} @tab declarative,
	2349	@item
	2350	@tab @code{c} @tab conditional,
	2351	@item
	2352	@tab @code{i} @tab imperative.
	2353	@item
	2354	@item
	2355	@code{T} @tab @tab Tense
	2356	@item
	2357	@tab @code{a} @tab past,
	2358	@item
	2359	@tab @code{r} @tab present,
	2360	@item
	2361	@tab @code{f} @tab future.
	2362	@item
	2363	@item
	2364	@code{P} @tab @tab Person
	2365	@item
	2366	@tab @code{1} @tab 1,
	2367	@item
	2368	@tab @code{2} @tab 2,
	2369	@item
	2370	@tab @code{3} @tab 3.
	2371	@item
	2372	@item
	2373	@code{D} @tab @tab Degree
	2374	@item
	2375	@tab @code{p} @tab positive,
	2376	@item
	2377	@tab @code{c} @tab comparative,
	2378	@item
	2379	@tab @code{s} @tab superlative.
	2380	@item
	2381	@item
	2382	@code{N} @tab @tab Number
	2383	@item
	2384	@tab @code{s} @tab singular,
	2385	@item
	2386	@tab @code{p} @tab plural.
	2387	@item
	2388	@item
	2389	@code{C} @tab @tab Case
	2390	@item
	2391	@tab @code{n} @tab nominative,
	2392	@item
	2393	@tab @code{g} @tab genitive,
	2394	@item
	2395	@tab @code{d} @tab dative,
	2396	@item
	2397	@tab @code{a} @tab accusative,
	2398	@item
@tab @code{i} @tab instrumental,
	2400	@item
	2401	@tab @code{l} @tab locative,
	2402	@item
	2403	@tab @code{v} @tab vocative.
	2404	@item
	2405	@code{G} @tab @tab Gender
	2406	@item
	2407	@tab @code{p} @tab masculine-personal,
	2408	@item
	2409	@tab @code{a} @tab masculine-animal,
	2410	@item
	2411	@tab @code{i} @tab masculine-inanimate,
	2412	@item
	2413	@tab @code{f} @tab feminine,
	2414	@item
	2415	@tab @code{n} @tab neuter.
	2416	@end multitable
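
Because the same value letter is reused under different attributes (for
example @code{p} under @code{N} and under @code{G}), a value can only be
interpreted together with its attribute. The following Python sketch
shows such a lookup for three of the attributes listed above; the names
@code{ATTRIBUTES} and @code{describe} are hypothetical and the table is
deliberately incomplete.

@example
ATTRIBUTES = [
    ("N", "Number", [("s", "singular"), ("p", "plural")]),
    ("C", "Case", [("n", "nominative"), ("g", "genitive"),
                   ("d", "dative"), ("a", "accusative"),
                   ("i", "instrumental"), ("l", "locative"),
                   ("v", "vocative")]),
    ("G", "Gender", [("p", "masculine-personal"),
                     ("a", "masculine-animal"),
                     ("i", "masculine-inanimate"),
                     ("f", "feminine"), ("n", "neuter")]),
]

def describe(attr, val):
    # Look a value up under its attribute.
    for code, name, values in ATTRIBUTES:
        if code == attr:
            for value_code, meaning in values:
                if value_code == val:
                    return name + ": " + meaning
    return attr + ": " + val + " (not in this excerpt)"

print(describe("N", "p"))
print(describe("G", "p"))
@end example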
	2417
	2418
	2419	@c ---------------------------------------------------------------------
	2420	@c ---------------------------------------------------------------------
	2421	@c
	2422	@c @node Examples
	2423	@c @chapter Examples
	2424
	2425	@c ----------------------------------------------------------------------
	2426	@c ----------------------------------------------------------------------
	2427
	2428	@node GNU Free Documentation License
	2429	@chapter GNU Free Documentation License
	2430
	2431	@c The GNU Free Documentation License.
	2432	@center Version 1.2, November 2002
	2433
	2434	@c This file is intended to be included within another document,
	2435	@c hence no sectioning command or @node.
	2436
	2437	@display
	2438	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
	2439	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
	2440
	2441	Everyone is permitted to copy and distribute verbatim copies
	2442	of this license document, but changing it is not allowed.
	2443	@end display
	2444
	2445	@enumerate 0
	2446	@item
	2447	PREAMBLE
	2448
	2449	The purpose of this License is to make a manual, textbook, or other
	2450	functional and useful document @dfn{free} in the sense of freedom: to
	2451	assure everyone the effective freedom to copy and redistribute it,
	2452	with or without modifying it, either commercially or noncommercially.
	2453	Secondarily, this License preserves for the author and publisher a way
	2454	to get credit for their work, while not being considered responsible
	2455	for modifications made by others.
	2456
	2457	This License is a kind of ``copyleft'', which means that derivative
	2458	works of the document must themselves be free in the same sense. It
	2459	complements the GNU General Public License, which is a copyleft
	2460	license designed for free software.
	2461
	2462	We have designed this License in order to use it for manuals for free
	2463	software, because free software needs free documentation: a free
	2464	program should come with manuals providing the same freedoms that the
	2465	software does. But this License is not limited to software manuals;
	2466	it can be used for any textual work, regardless of subject matter or
	2467	whether it is published as a printed book. We recommend this License
	2468	principally for works whose purpose is instruction or reference.
	2469
	2470	@item
	2471	APPLICABILITY AND DEFINITIONS
	2472
	2473	This License applies to any manual or other work, in any medium, that
	2474	contains a notice placed by the copyright holder saying it can be
	2475	distributed under the terms of this License. Such a notice grants a
	2476	world-wide, royalty-free license, unlimited in duration, to use that
	2477	work under the conditions stated herein. The ``Document'', below,
	2478	refers to any such manual or work. Any member of the public is a
	2479	licensee, and is addressed as ``you''. You accept the license if you
	2480	copy, modify or distribute the work in a way requiring permission
	2481	under copyright law.
	2482
	2483	A ``Modified Version'' of the Document means any work containing the
	2484	Document or a portion of it, either copied verbatim, or with
	2485	modifications and/or translated into another language.
	2486
	2487	A ``Secondary Section'' is a named appendix or a front-matter section
	2488	of the Document that deals exclusively with the relationship of the
	2489	publishers or authors of the Document to the Document's overall
	2490	subject (or to related matters) and contains nothing that could fall
	2491	directly within that overall subject. (Thus, if the Document is in
	2492	part a textbook of mathematics, a Secondary Section may not explain
	2493	any mathematics.) The relationship could be a matter of historical
	2494	connection with the subject or with related matters, or of legal,
	2495	commercial, philosophical, ethical or political position regarding
	2496	them.
	2497
	2498	The ``Invariant Sections'' are certain Secondary Sections whose titles
	2499	are designated, as being those of Invariant Sections, in the notice
	2500	that says that the Document is released under this License. If a
	2501	section does not fit the above definition of Secondary then it is not
	2502	allowed to be designated as Invariant. The Document may contain zero
	2503	Invariant Sections. If the Document does not identify any Invariant
	2504	Sections then there are none.
	2505
	2506	The ``Cover Texts'' are certain short passages of text that are listed,
	2507	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
	2508	the Document is released under this License. A Front-Cover Text may
	2509	be at most 5 words, and a Back-Cover Text may be at most 25 words.
	2510
	2511	A ``Transparent'' copy of the Document means a machine-readable copy,
	2512	represented in a format whose specification is available to the
	2513	general public, that is suitable for revising the document
	2514	straightforwardly with generic text editors or (for images composed of
	2515	pixels) generic paint programs or (for drawings) some widely available
	2516	drawing editor, and that is suitable for input to text formatters or
	2517	for automatic translation to a variety of formats suitable for input
	2518	to text formatters. A copy made in an otherwise Transparent file
	2519	format whose markup, or absence of markup, has been arranged to thwart
	2520	or discourage subsequent modification by readers is not Transparent.
	2521	An image format is not Transparent if used for any substantial amount
	2522	of text. A copy that is not ``Transparent'' is called ``Opaque''.
	2523
	2524	Examples of suitable formats for Transparent copies include plain
	2525	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
	2526	format, @acronym{SGML} or @acronym{XML} using a publicly available
	2527	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
	2528	PostScript or @acronym{PDF} designed for human modification. Examples
	2529	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
	2530	@acronym{JPG}. Opaque formats include proprietary formats that can be
	2531	read and edited only by proprietary word processors, @acronym{SGML} or
	2532	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
	2533	not generally available, and the machine-generated @acronym{HTML},
	2534	PostScript or @acronym{PDF} produced by some word processors for
	2535	output purposes only.
	2536
	2537	The ``Title Page'' means, for a printed book, the title page itself,
	2538	plus such following pages as are needed to hold, legibly, the material
	2539	this License requires to appear in the title page. For works in
	2540	formats which do not have any title page as such, ``Title Page'' means
	2541	the text near the most prominent appearance of the work's title,
	2542	preceding the beginning of the body of the text.
	2543
	2544	A section ``Entitled XYZ'' means a named subunit of the Document whose
	2545	title either is precisely XYZ or contains XYZ in parentheses following
	2546	text that translates XYZ in another language. (Here XYZ stands for a
	2547	specific section name mentioned below, such as ``Acknowledgements'',
	2548	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
	2549	of such a section when you modify the Document means that it remains a
	2550	section ``Entitled XYZ'' according to this definition.
	2551
	2552	The Document may include Warranty Disclaimers next to the notice which
	2553	states that this License applies to the Document. These Warranty
	2554	Disclaimers are considered to be included by reference in this
	2555	License, but only as regards disclaiming warranties: any other
	2556	implication that these Warranty Disclaimers may have is void and has
	2557	no effect on the meaning of this License.
	2558
	2559	@item
	2560	VERBATIM COPYING
	2561
	2562	You may copy and distribute the Document in any medium, either
	2563	commercially or noncommercially, provided that this License, the
	2564	copyright notices, and the license notice saying this License applies
	2565	to the Document are reproduced in all copies, and that you add no other
	2566	conditions whatsoever to those of this License. You may not use
	2567	technical measures to obstruct or control the reading or further
	2568	copying of the copies you make or distribute. However, you may accept
	2569	compensation in exchange for copies. If you distribute a large enough
	2570	number of copies you must also follow the conditions in section 3.
	2571
	2572	You may also lend copies, under the same conditions stated above, and
	2573	you may publicly display copies.
	2574
	2575	@item
	2576	COPYING IN QUANTITY
	2577
	2578	If you publish printed copies (or copies in media that commonly have
	2579	printed covers) of the Document, numbering more than 100, and the
	2580	Document's license notice requires Cover Texts, you must enclose the
	2581	copies in covers that carry, clearly and legibly, all these Cover
	2582	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
	2583	the back cover. Both covers must also clearly and legibly identify
	2584	you as the publisher of these copies. The front cover must present
	2585	the full title with all words of the title equally prominent and
	2586	visible. You may add other material on the covers in addition.
	2587	Copying with changes limited to the covers, as long as they preserve
	2588	the title of the Document and satisfy these conditions, can be treated
	2589	as verbatim copying in other respects.
	2590
	2591	If the required texts for either cover are too voluminous to fit
	2592	legibly, you should put the first ones listed (as many as fit
	2593	reasonably) on the actual cover, and continue the rest onto adjacent
	2594	pages.
	2595
	2596	If you publish or distribute Opaque copies of the Document numbering
	2597	more than 100, you must either include a machine-readable Transparent
	2598	copy along with each Opaque copy, or state in or with each Opaque copy
	2599	a computer-network location from which the general network-using
	2600	public has access to download using public-standard network protocols
	2601	a complete Transparent copy of the Document, free of added material.
	2602	If you use the latter option, you must take reasonably prudent steps,
	2603	when you begin distribution of Opaque copies in quantity, to ensure
	2604	that this Transparent copy will remain thus accessible at the stated
	2605	location until at least one year after the last time you distribute an
	2606	Opaque copy (directly or through your agents or retailers) of that
	2607	edition to the public.
	2608
	2609	It is requested, but not required, that you contact the authors of the
	2610	Document well before redistributing any large number of copies, to give
	2611	them a chance to provide you with an updated version of the Document.
	2612
	2613	@item
	2614	MODIFICATIONS
	2615
	2616	You may copy and distribute a Modified Version of the Document under
	2617	the conditions of sections 2 and 3 above, provided that you release
	2618	the Modified Version under precisely this License, with the Modified
	2619	Version filling the role of the Document, thus licensing distribution
	2620	and modification of the Modified Version to whoever possesses a copy
	2621	of it. In addition, you must do these things in the Modified Version:
	2622
	2623	@enumerate A
	2624	@item
	2625	Use in the Title Page (and on the covers, if any) a title distinct
	2626	from that of the Document, and from those of previous versions
	2627	(which should, if there were any, be listed in the History section
	2628	of the Document). You may use the same title as a previous version
	2629	if the original publisher of that version gives permission.
	2630
	2631	@item
	2632	List on the Title Page, as authors, one or more persons or entities
	2633	responsible for authorship of the modifications in the Modified
	2634	Version, together with at least five of the principal authors of the
	2635	Document (all of its principal authors, if it has fewer than five),
	2636	unless they release you from this requirement.
	2637
	2638	@item
	2639	State on the Title page the name of the publisher of the
	2640	Modified Version, as the publisher.
	2641
	2642	@item
	2643	Preserve all the copyright notices of the Document.
	2644
	2645	@item
	2646	Add an appropriate copyright notice for your modifications
	2647	adjacent to the other copyright notices.
	2648
	2649	@item
	2650	Include, immediately after the copyright notices, a license notice
	2651	giving the public permission to use the Modified Version under the
	2652	terms of this License, in the form shown in the Addendum below.
	2653
	2654	@item
	2655	Preserve in that license notice the full lists of Invariant Sections
	2656	and required Cover Texts given in the Document's license notice.
	2657
	2658	@item
	2659	Include an unaltered copy of this License.
	2660
	2661	@item
	2662	Preserve the section Entitled ``History'', Preserve its Title, and add
	2663	to it an item stating at least the title, year, new authors, and
	2664	publisher of the Modified Version as given on the Title Page. If
	2665	there is no section Entitled ``History'' in the Document, create one
	2666	stating the title, year, authors, and publisher of the Document as
	2667	given on its Title Page, then add an item describing the Modified
	2668	Version as stated in the previous sentence.
	2669
	2670	@item
	2671	Preserve the network location, if any, given in the Document for
	2672	public access to a Transparent copy of the Document, and likewise
	2673	the network locations given in the Document for previous versions
	2674	it was based on. These may be placed in the ``History'' section.
	2675	You may omit a network location for a work that was published at
	2676	least four years before the Document itself, or if the original
	2677	publisher of the version it refers to gives permission.
	2678
	2679	@item
	2680	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
	2681	the Title of the section, and preserve in the section all the
	2682	substance and tone of each of the contributor acknowledgements and/or
	2683	dedications given therein.
	2684
	2685	@item
	2686	Preserve all the Invariant Sections of the Document,
	2687	unaltered in their text and in their titles. Section numbers
	2688	or the equivalent are not considered part of the section titles.
	2689
	2690	@item
	2691	Delete any section Entitled ``Endorsements''. Such a section
	2692	may not be included in the Modified Version.
	2693
	2694	@item
	2695	Do not retitle any existing section to be Entitled ``Endorsements'' or
	2696	to conflict in title with any Invariant Section.
	2697
	2698	@item
	2699	Preserve any Warranty Disclaimers.
	2700	@end enumerate
	2701
	2702	If the Modified Version includes new front-matter sections or
	2703	appendices that qualify as Secondary Sections and contain no material
	2704	copied from the Document, you may at your option designate some or all
	2705	of these sections as invariant. To do this, add their titles to the
	2706	list of Invariant Sections in the Modified Version's license notice.
	2707	These titles must be distinct from any other section titles.
	2708
	2709	You may add a section Entitled ``Endorsements'', provided it contains
	2710	nothing but endorsements of your Modified Version by various
	2711	parties---for example, statements of peer review or that the text has
	2712	been approved by an organization as the authoritative definition of a
	2713	standard.
	2714
	2715	You may add a passage of up to five words as a Front-Cover Text, and a
	2716	passage of up to 25 words as a Back-Cover Text, to the end of the list
	2717	of Cover Texts in the Modified Version. Only one passage of
	2718	Front-Cover Text and one of Back-Cover Text may be added by (or
	2719	through arrangements made by) any one entity. If the Document already
	2720	includes a cover text for the same cover, previously added by you or
	2721	by arrangement made by the same entity you are acting on behalf of,
	2722	you may not add another; but you may replace the old one, on explicit
	2723	permission from the previous publisher that added the old one.
	2724
	2725	The author(s) and publisher(s) of the Document do not by this License
	2726	give permission to use their names for publicity for or to assert or
	2727	imply endorsement of any Modified Version.
	2728
	2729	@item
	2730	COMBINING DOCUMENTS
	2731
	2732	You may combine the Document with other documents released under this
	2733	License, under the terms defined in section 4 above for modified
	2734	versions, provided that you include in the combination all of the
	2735	Invariant Sections of all of the original documents, unmodified, and
	2736	list them all as Invariant Sections of your combined work in its
	2737	license notice, and that you preserve all their Warranty Disclaimers.
	2738
	2739	The combined work need only contain one copy of this License, and
	2740	multiple identical Invariant Sections may be replaced with a single
	2741	copy. If there are multiple Invariant Sections with the same name but
	2742	different contents, make the title of each such section unique by
	2743	adding at the end of it, in parentheses, the name of the original
	2744	author or publisher of that section if known, or else a unique number.
	2745	Make the same adjustment to the section titles in the list of
	2746	Invariant Sections in the license notice of the combined work.
	2747
	2748	In the combination, you must combine any sections Entitled ``History''
	2749	in the various original documents, forming one section Entitled
	2750	``History''; likewise combine any sections Entitled ``Acknowledgements'',
	2751	and any sections Entitled ``Dedications''. You must delete all
	2752	sections Entitled ``Endorsements.''
	2753
	2754	@item
	2755	COLLECTIONS OF DOCUMENTS
	2756
	2757	You may make a collection consisting of the Document and other documents
	2758	released under this License, and replace the individual copies of this
	2759	License in the various documents with a single copy that is included in
	2760	the collection, provided that you follow the rules of this License for
	2761	verbatim copying of each of the documents in all other respects.
	2762
	2763	You may extract a single document from such a collection, and distribute
	2764	it individually under this License, provided you insert a copy of this
	2765	License into the extracted document, and follow this License in all
	2766	other respects regarding verbatim copying of that document.
	2767
	2768	@item
	2769	AGGREGATION WITH INDEPENDENT WORKS
	2770
	2771	A compilation of the Document or its derivatives with other separate
	2772	and independent documents or works, in or on a volume of a storage or
	2773	distribution medium, is called an ``aggregate'' if the copyright
	2774	resulting from the compilation is not used to limit the legal rights
	2775	of the compilation's users beyond what the individual works permit.
	2776	When the Document is included in an aggregate, this License does not
	2777	apply to the other works in the aggregate which are not themselves
	2778	derivative works of the Document.
	2779
	2780	If the Cover Text requirement of section 3 is applicable to these
	2781	copies of the Document, then if the Document is less than one half of
	2782	the entire aggregate, the Document's Cover Texts may be placed on
	2783	covers that bracket the Document within the aggregate, or the
	2784	electronic equivalent of covers if the Document is in electronic form.
	2785	Otherwise they must appear on printed covers that bracket the whole
	2786	aggregate.
	2787
	2788	@item
	2789	TRANSLATION
	2790
	2791	Translation is considered a kind of modification, so you may
	2792	distribute translations of the Document under the terms of section 4.
	2793	Replacing Invariant Sections with translations requires special
	2794	permission from their copyright holders, but you may include
	2795	translations of some or all Invariant Sections in addition to the
	2796	original versions of these Invariant Sections. You may include a
	2797	translation of this License, and all the license notices in the
	2798	Document, and any Warranty Disclaimers, provided that you also include
	2799	the original English version of this License and the original versions
	2800	of those notices and disclaimers. In case of a disagreement between
	2801	the translation and the original version of this License or a notice
	2802	or disclaimer, the original version will prevail.
	2803
	2804	If a section in the Document is Entitled ``Acknowledgements'',
	2805	``Dedications'', or ``History'', the requirement (section 4) to Preserve
	2806	its Title (section 1) will typically require changing the actual
	2807	title.
	2808
	2809	@item
	2810	TERMINATION
	2811
	2812	You may not copy, modify, sublicense, or distribute the Document except
	2813	as expressly provided for under this License. Any other attempt to
	2814	copy, modify, sublicense or distribute the Document is void, and will
	2815	automatically terminate your rights under this License. However,
	2816	parties who have received copies, or rights, from you under this
	2817	License will not have their licenses terminated so long as such
	2818	parties remain in full compliance.
	2819
	2820	@item
	2821	FUTURE REVISIONS OF THIS LICENSE
	2822
	2823	The Free Software Foundation may publish new, revised versions
	2824	of the GNU Free Documentation License from time to time. Such new
	2825	versions will be similar in spirit to the present version, but may
	2826	differ in detail to address new problems or concerns. See
	2827	@uref{http://www.gnu.org/copyleft/}.
	2828
	2829	Each version of the License is given a distinguishing version number.
	2830	If the Document specifies that a particular numbered version of this
	2831	License ``or any later version'' applies to it, you have the option of
	2832	following the terms and conditions either of that specified version or
	2833	of any later version that has been published (not as a draft) by the
	2834	Free Software Foundation. If the Document does not specify a version
	2835	number of this License, you may choose any version ever published (not
	2836	as a draft) by the Free Software Foundation.
	2837	@end enumerate
	2838
	2839	@page
	2840	@heading ADDENDUM: How to use this License for your documents
	2841
	2842	To use this License in a document you have written, include a copy of
	2843	the License in the document and put the following copyright and
	2844	license notices just after the title page:
	2845
	2846	@smallexample
	2847	@group
	2848	Copyright (C) @var{year} @var{your name}.
	2849	Permission is granted to copy, distribute and/or modify this document
	2850	under the terms of the GNU Free Documentation License, Version 1.2
	2851	or any later version published by the Free Software Foundation;
	2852	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
	2853	Texts. A copy of the license is included in the section entitled ``GNU
	2854	Free Documentation License''.
	2855	@end group
	2856	@end smallexample
	2857
	2858	If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
	2859	replace the ``with@dots{}Texts.'' line with this:
	2860
	2861	@smallexample
	2862	@group
	2863	with the Invariant Sections being @var{list their titles}, with
	2864	the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
	2865	being @var{list}.
	2866	@end group
	2867	@end smallexample
	2868
	2869	If you have Invariant Sections without Cover Texts, or some other
	2870	combination of the three, merge those two alternatives to suit the
	2871	situation.
	2872
	2873	If your document contains nontrivial examples of program code, we
	2874	recommend releasing these examples in parallel under your choice of
	2875	free software license, such as the GNU General Public License,
	2876	to permit their use in free software.
	2877
	2878	@c Local Variables:
	2879	@c ispell-local-pdict: "ispell-dict"
	2880	@c End:
	2881
	2882
	2883	@c ---------------------------------------------------------------------
	2884	@c ---------------------------------------------------------------------
	2885
	2886	@node Reporting bugs
	2887	@chapter Reporting bugs
	2888
	2889	Report bugs to <obrebski@@amu.edu.pl>.
	2890
	2891	@c ---------------------------------------------------------------------
	2892	@c ---------------------------------------------------------------------
	2893
	2894	@c @node Copyright
	2895	@c @chapter Copyright
	2896	@c
[9ace5d2]	2897	@c Copyright 2004 by Tomasz Obrębski
[25ae32e]	2898	@c This software is free for research and educational use.
	2899
	2900	@c ---------------------------------------------------------------------
	2901	@c ---------------------------------------------------------------------
	2902
	2903	@node Author
	2904	@chapter Author
	2905
	2906
	2907	@bye
