Rev	Line
[25ae32e]	1	\input texinfo @c --texinfo--
	2	@documentencoding ISO-8859-2
	3	@c @documentlanguage pl
	4
	5	@c %**start of header
	6	@setfilename utt.info
	7	@settitle UAM Text Tools v0.90
	8	@c %**end of header
	9
	10	@copying
[261bf62]	11	This manual is for UAM Text Tools (version 0.90, October, 2008)
[25ae32e]	12
[19760ef]	13	Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
[25ae32e]	14
	15	Permission is granted to copy, distribute and/or modify this document
[261bf62]	16	under the terms of the GNU Free Documentation License, Version 1.2 or
	17	any later version published by the Free Software Foundation; with no
	18	Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
	19	copy of the license is included in the section entitled ``GNU Free
	20	Documentation License''.
[25ae32e]	21
	22	@c @quotation
	23	@c Permission is granted to ...
	24	@c No permission is granted until the document is completed.
	25	@c @end quotation
	26	@end copying
	27
	28
	29	@titlepage
	30	@title UAM Text Tools 0.90 - User Manual
	31	@subtitle edition 0.01, @today
	32	@subtitle status: prescript
	33	@author by Justyna Walkowska, Tomasz Obr@,{}ebski and Micha@l{} Stolarski
	34	@page
	35	@vskip 0pt plus 1filll
	36	@insertcopying
	37	@end titlepage
	38
	39	@contents
	40
	41	@c @paragraphindent none
	42
	43	@iftex
	44	@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
	45	@end iftex
	46
	47	@c @headings off
	48	@c @everyheading LEM(1) @| @| LEM(1)
	49	@everyfooting @today @c @| @thispage @|
	50
	51	@ifnottex
	52
	53	@node Top
	54	@top UTT - UAM Text Tools
	55
	56	@insertcopying
	57
	58	@menu
	59	* General information::
	60	* UTT file format::
	61	* Configuration files::
	62	* UTT components::
	63	* Auxiliary tools::
	64	* Usage examples::
	65	* PMDBF dictionary::
	66	@c * Examples::
	67	@c * Copyright::
	68	* GNU Free Documentation License::
	69	* Reporting bugs::
	70	* Author::
	71	@end menu
	72	@end ifnottex
	73
	74
	75	@c ----------------------------------------------------------------------
	76
	77	@node General information
	78	@chapter General information
	79
	80	UAM Text Tools (UTT) is a package of language processing tools
	81	developed at Adam Mickiewicz University. Its functionality includes:
	82
	83	@itemize @bullet
	84
	85	@item
	86	tokenization
	87	@item
	88	dictionary-based morphological analysis
	89	@item
	90	heuristic morphological analysis of unknown words
	91	@item
	92	spelling correction
	93	@item
	94	pattern search
	95	@item
	96	sentence splitting
	97	@item
	98	generation of concordance tables
	99	@end itemize
	100
	101	The toolkit is intended for processing raw (unannotated),
	102	unrestricted text for any conceivable purpose.
	103
	104	The system is organized as a collection of command-line programs, each
	105	performing one operation, e.g. tokenization, lemmatization, spelling
	106	correction. The components are independent of one another; the
	107	unifying element is the uniform I/O file format.
	108
	109	The components may be combined in various ways to provide different text
	110	processing services. New components supplied by the user may also be
	111	easily incorporated into the system, provided that they respect the I/O
	112	file format conventions.
	113
	114	UTT component programs do not depend on any specific tagset or
	115	morphological description format.
	116
	117	UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
	118	the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
	119
	120	The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License, which prohibits commercial use.
	121
	122
	123	List of contributors:
	124
	125	@itemize
	126	@item Pawel Konieczka
	127	@item Tomasz Obrebski
	128	@item Michal Stolarski
	129	@item Marcin Walas
	130	@item Justyna Walkowska
[04ae414]	131	@item Pawel Werenski
[25ae32e]	132	@end itemize
	133
	134	@c ----------------------------------------------------------------------
	135	@c ---------------------------------------------------------------------
	136
	137	@node UTT file format
	138	@chapter UTT file format
	139
	140	A UTT file contains annotation of a text. It consists of a sequence of
	141	segments. Each segment explicitly refers to a continuous piece of the
	142	text and provides some information on it.
	143
	144	@section Segment format
	145
	146	A segment occupies one line of a UTT file and consists of
	147	space-separated fields:
	148
	149
	150	@quotation
	151	@sp 1
	152	[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
	153	@sp 1
	154	@end quotation
	155
	156	@table @var
	157
	158	@item @var{start}
	159	Non-negative integer value indicating the position in the source text where the
	160	segment starts.
	161
	162	@item @var{length}
	163	Non-negative integer value indicating the length of the segment.
	164
	165	@item @var{type}
	166	A sequence of non-blank characters that is not a number (a purely numeric value could be misinterpreted as a @var{start} or @var{length} field).
	167	@var{type} reflects the main classification of segments
	168	into words, numbers, punctuation marks, and meta-text markers.
	169	@xref{tok output,,tok output}, for a description of the automatically recognized type markers.
	170
	171	@item @var{form}
	172	This field contains the textual form of the segment or the special
	173	symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
	174
	175	The characters or character sequences that have special meaning in the
	176	@var{form} field are enumerated below.
	177
	178	Characters with special meaning:
	179
	180	@itemize
	181	@item @code{_} - space character
	182	@item @code{*} - undefined contents
	183	@end itemize
	184
	185	Escape sequences:
	186
	187	@itemize
	188	@item @code{\n} - new line
	189	@item @code{\t} - tabulation
	190	@item @code{\r} - carriage return
	191
	192	@item @code{\_} - the @code{_} character
	193	@item @code{\*} - the @code{*} character
	194	@item @code{\\} - the @code{\} character
	195
	196	@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
	197	@end itemize
	198
	199	@item @var{annotation1}
	200	@item @var{annotation2}
	201	@item ...
	202	Annotation fields have the following format:
	203
	204	@var{longname}@code{:}@var{value}
	205
	206	or
	207
	208	@var{shortname}@var{value}
	209
	210	where @var{longname} is a string of alphanumeric characters
	211	(@code{isalnum()} test), @var{shortname} is a single punctuation character
	212	(@code{ispunct()} test), and @var{value} is an arbitrary string of non-blank characters.
	213
	214	@end table
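
The special characters and escape sequences of the @var{form} field can be
illustrated with a small decoder. This is a minimal Python sketch, not code
taken from UTT: the function name @code{decode_form} is hypothetical,
treating @code{\*} as an escaped asterisk is an assumption, and unknown
escape sequences are simply passed through unchanged.

```python
# Sketch only: decode the form field of a UTT segment into the text it denotes.
# Assumption: "\*" stands for a literal asterisk; unknown escapes are kept as-is.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "_": "_", "*": "*", "\\": "\\"}


def decode_form(form):
    """Return the decoded text, or None when the form is the bare '*' marker."""
    if form == "*":
        return None  # undefined contents
    out = []
    i = 0
    while i < len(form):
        ch = form[i]
        if ch == "\\" and i + 1 < len(form) and form[i + 1] in ESCAPES:
            out.append(ESCAPES[form[i + 1]])  # two-character escape sequence
            i += 2
        elif ch == "_":
            out.append(" ")  # unescaped underscore stands for a space
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this sketch, the form @code{_} decodes to a single space and the bare
form @code{*} is reported as undefined.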
	215
	216
	217	Only two fields are mandatory: @var{type} and @var{form}. All other fields
	218	may be absent. In the case when only one number precedes the
	219	@var{type} field, it is interpreted as the @var{start} position.
	220
	221	If the @var{length} field is omitted, the length of the segment is the
	222	length of the @var{form} field, except when the value of the
	223	@var{form} field is @code{*} -- in this case, the length is assumed to
	224	be 0.
	225
	226	If the @var{start} field is also absent, the segment is assumed to directly
	227	follow the preceding one.
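
The rules for optional fields can be sketched in Python as follows. This is
an illustration under stated assumptions, not the parser used by the UTT
programs: the names @code{Segment}, @code{parse_segment}, @code{form_length}
and @code{resolve} are hypothetical, an escape sequence is counted as one
character, and a file whose first segment has no @var{start} field is
assumed to begin at position 0.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    start: Optional[int]    # None when the field is omitted
    length: Optional[int]   # None when the field is omitted
    type: str
    form: str
    annotations: List[str]


def parse_segment(line):
    """Split one line into fields; up to two leading numbers are start and length."""
    fields = line.split()
    numbers = []
    while fields and len(numbers) < 2 and fields[0].isdigit():
        numbers.append(int(fields.pop(0)))
    if len(fields) < 2:
        raise ValueError("type and form are mandatory: %r" % line)
    start = numbers[0] if numbers else None          # a single number is the start
    length = numbers[1] if len(numbers) == 2 else None
    return Segment(start, length, fields[0], fields[1], fields[2:])


def form_length(form):
    """Length implied by the form field: 0 for '*', escapes count as one character."""
    if form == "*":
        return 0
    count, i = 0, 0
    while i < len(form):
        i += 2 if form[i] == "\\" and i + 1 < len(form) else 1
        count += 1
    return count


def resolve(segments):
    """Fill in omitted start and length values (assumes the text starts at 0)."""
    resolved, next_pos = [], 0
    for seg in segments:
        start = seg.start if seg.start is not None else next_pos
        length = seg.length if seg.length is not None else form_length(seg.form)
        resolved.append(seg._replace(start=start, length=length))
        next_pos = start + length  # an unpositioned segment follows the previous one
    return resolved
```

Applied to a list of lines in which only the first segment carries a
position, @code{resolve} reconstructs the explicit @var{start} and
@var{length} values of every segment.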
	228
	229	@c Conventions:
	230
	231	@c Annotation fields with predefined meaning:
	232
	233	@c @itemize
	234	@c @item @code{!} - UTT components are allowed to modify the contents of
	235	@c the @var{form} field (e.g. spelling correction does this). If this happens the
	236	@c original form of the segment have to be placed in the @code{!}-field.
	237	@c @item @code{@@} - morphological description
	238	@c @item @code{=} - node identifier assignment (used in graph encoding)
	239	@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
	240	@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
	241	@c @end itemize
	242
	243	Segments of length 0 may be used to mark file positions with some
	244	information. See e.g. BOS and EOS (beginning/end of sentence) markers
	245	in the example below.
	246
	247	Example:
	248
	249	sentence: @samp{Piszemy dobre progrumy.}
	250
	251	@example
	252	0000 00 BOS *
[19760ef]	253	0000 07 W Piszemy lem:pisać,V
[25ae32e]	254	0007 01 S _
	255	0008 05 W dobre lem:dobry,ADJ
	256	0013 01 S _
	257	0014 08 W progrumy cor:programy lem:program,N
	258	0022 01 P .
	259	0023 00 EOS *
	260	0023 01 S _
	261	0024 00 BOS *
	262	0024 11 W Warszawiacy lem:Warszawiak,N
	263	0035 01 S _
[19760ef]	264	0036 03 W też
[25ae32e]	265	0039 01 P .
	266	0040 00 EOS *
	267
	268	@end example
	269
	270	@example
	271	0000 BOS *
[19760ef]	272	0000 W Piszemy lem:pisać,V
[25ae32e]	273	0007 S _
	274	0008 W dobre lem:dobry,ADJ
	275	0013 S _
	276	0014 W progrumy cor:programy lem:program,N
	277	0022 P .
	278	0023 EOS *
	279	@end example
	280
	281	Position information may be provided only for some types of segments:
	282
	283	@example
	284	0000 BOS *
[19760ef]	285	W Piszemy lem:pisać,V
[25ae32e]	286	S _
	287	W dobre lem:dobry,ADJ
	288	S _
	289	W progrumy cor:programy lem:program,N
	290	P .
	291	EOS *
	292	S _
	293	0024 BOS *
	294	W Warszawiacy lem:Warszawiak,N
	295	S _
[19760ef]	296	W też
[25ae32e]	297	P .
	298	EOS *
	299	@end example
	300
	301	Position/length information may be provided only when necessary:
	302
	303	@example
	304	0000 04 N *
	305	0000 N 12
	306	P .
	307	N 5
	308	S _
	309	W km
	310	@end example
	311
	312	@section UTT File
	313
	314	A UTT file consists of a sequence of segments. The same text position
	315	may be covered by multiple segments. As a consequence, ambiguous text
	316	segmentation and ambiguous annotation may be represented.
	317
	318	There are two structural requirements a valid UTT-formatted file
	319	has to meet:
	320
	321	@itemize @bullet
	322
	323	@item
	324	segments have to be sorted with respect to the @var{start} field,
	325
	326	@item
	327	for each
	328	segment ending at position @var{n}, either there must be a segment starting at
	329	position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
	330	for each segment starting at position @var{n}, either there must be a segment
	331	ending at position @var{n-1}, or the position @var{n-1} must not be covered
	332	by any segment.
	333
	334	@end itemize
	335
	336	A valid annotation for the text fragment
	337	@example
	338	12.5 km
	339	@end example
	340
	341	may be
	342
	343	@example
	344	0000 02 N 12
	345	0000 04 N 12.5
	346	0002 01 P .
	347	0003 01 N 5
	348	0004 01 S _
	349	0005 02 W km
	350	@end example
	351
	352	but not
	353
	354	@example
	355	0000 02 N 12
	356	0000 04 N 12.5
	357	0004 01 S _
	358	0005 02 W km
	359	@end example
	360
[261bf62]	361	because in the latter example the first segment (starting at position
	362	0000, 2 characters long) ends at position @var{n}=0001, while position
	363	@var{n+1}=0002 is covered by the second segment and no segment starts
	364	at that position.
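
The two structural requirements can be checked mechanically. The following
Python sketch is only an illustration: the function name @code{validate} is
hypothetical, segments are given as (@var{start}, @var{length}) pairs with
explicit positions, and segments of length 0 are assumed to be pure markers
that take no part in the coverage check.

```python
def validate(segments):
    """Check a list of (start, length) pairs against the two structural rules.

    Returns a list of problem descriptions; the list is empty for a valid file.
    Assumption: zero-length segments are markers and are ignored by rule 2.
    """
    problems = []
    start_order = [start for start, _ in segments]
    if start_order != sorted(start_order):
        problems.append("segments are not sorted by start position")

    # Inclusive (first, last) positions of every non-empty segment.
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    starts = {first for first, _ in spans}
    ends = {last for _, last in spans}

    def covered(pos):
        return any(first <= pos <= last for first, last in spans)

    for first, last in spans:
        if last + 1 not in starts and covered(last + 1):
            problems.append(
                "segment ending at %d: position %d is covered, "
                "but no segment starts there" % (last, last + 1))
        if first - 1 not in ends and covered(first - 1):
            problems.append(
                "segment starting at %d: position %d is covered, "
                "but no segment ends there" % (first, first - 1))
    return problems
```

For the valid annotation of @samp{12.5 km} shown above the function returns
an empty list, while for the invalid one it reports the segment ending at
position 1.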
	365
	366
	367	@section Flattened UTT file
	368
[e28a625]	369	The UTT file format has two variants: regular and flattened. The regular
[261bf62]	370	format was described above. In the flattened format some of the
	371	end-of-line characters are replaced with line-feed characters.
	372
	373	The flattened format is mainly used to represent whole sentences as
	374	single lines of the input file (all intrasentential end-of-line
	375	characters are replaced with line-feed characters).
	376
	377	This technical trick makes it possible to perform certain text
	378	processing operations on entire sentences using tools such as
	379	@command{grep} (see the @command{grp} component) or @command{sed} (see the @command{mar} component).
	380
	381	The conversion between the two formats is performed by the tools:
	382	@command{fla} and @command{unfla}.
[25ae32e]	383
	384	@section Character encoding
	385
	386	The UTT component programs accept only single-byte character encodings, such
[261bf62]	387	as the ISO-8859 family or the ANSI (Windows) and DOS code pages.
[25ae32e]	388
	389
	390	@c @section Formats
	391
	392	@c @unnumberedsubsubsec Basic format
	393
	394	@c While processing large amounts of the overhead related with explicit
	395	@c ... of the start position and segment length becomes ... . Therefore,
	396	@c for efficiency reasons certain shortcuts are possible:
	397
	398	@c @unnumberedsubsubsec Relative start position
	399
	400	@c Start position may be given as relative distance from the last
	401	@c absolut position.
	402
	403	@c @unnumberedsubsubsec Absent length
	404
	405	@c Segment length may by omitted. Normally it can be restored by counting
	406	@c the length of the @emph{form field}. For segments with the special value
	407	@c @code{*} in the @emph{form field} length 0 is assumed.
	408
	409	@c @unnumberedsubsubsec Absent length and start position
	410
	411	@c Both start position and segment length may be omitted. In this format
	412	@c each segment is assumed to follow the previous one. This format is,
	413	@c therefore, suitable only for unambiguously tagged text
	414	@c (0-length markers can be still used.)
	415
	416
	417	@c @table @code
	418	@c @item AL
	419	@c @code{1234 03 W kot}
	420	@c @item RL
	421	@c @code{+56 03 W kot}
	422	@c @item A
	423	@c @code{1234 W kot}
	424	@c @item R
	425	@c @code{+56 W kot}
	426	@c @item 0
	427	@c @code{W kot}
	428	@c @end table
	429
	430
[19760ef]	431	@c [HOW TO GET POLISH FONTS IN DVI???]
[25ae32e]	432
	433	@macro parhelp
	434	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	435	Print help.
	436	@end macro
	437
	438
	439	@macro parversion
	440	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	441	Print version information.
	442	@end macro
	443
	444	@macro parinteractive
	445	@item @b{@minus{}@minus{}interactive, @minus{}i}
	446	This option toggles interactive mode, which is off by default. In
	447	interactive mode the program does not buffer its output.
	448	@end macro
	449
	450
	451	@c @macro parfile
	452	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	453	@c Input file name.
	454	@c If this option is absent or equal to '@minus{}', the program
	455	@c reads from the standard input.
	456	@c @end macro
	457
	458
	459	@c @macro paroutput
	460	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	461	@c Regular output file name. To regular output the program sends segments
	462	@c which it successfully processed and copies those which were not
	463	@c subject to processing. If this option is absent or equal to
	464	@c '@minus{}', standard output is used.
	465	@c @end macro
	466
	467	@c @macro parfail
	468	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
	469	@c Fail output file name. To fail output the program copies the segments
	470	@c it failed to process. If this option is absent or equal to
	471	@c '@minus{}', standard output is used.
	472	@c @end macro
	473
	474
	475	@c @macro parcopy
	476	@c @item @b{@minus{}@minus{}copy, @minus{}c}
	477	@c Copy succesfully processed segments to regular output also in their
	478	@c original input form.
	479	@c @end macro
	480
	481
	482	@macro parinputfield
	483	@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	484	The field containing the input to the program. The default is the
	485	@var{form} field. The fields @var{start}, @var{length}, @var{type},
	486	and @var{form} are referred to as @code{1}, @code{2}, @code{3}, and
	487	@code{4}, respectively.
	488	@end macro
	489
	490
	491	@macro paroutputfield
	492	@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	493	The name of the field added by the program. The default is the name of the program.
	494	@end macro
	495
	496
	497	@macro pardictionary
	498	@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
	499	Dictionary file name.
	500	@end macro
	501
	502
	503	@macro parprocess
	504	@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
	505	Process segments with the specified value in the @var{type} field.
	506	Multiple occurrences of this option are allowed and are interpreted as
	507	a disjunction. If this option is absent, all segments are processed.
	508	@end macro
	509
	510
	511	@macro parselect
	512	@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
	513	Select for processing only segments in which the field named
	514	@var{fieldname} is present. Multiple occurrences of this option are
	515	allowed and are interpreted as a conjunction of conditions. If this
	516	option is absent, all segments are processed.
	517	@end macro
	518
	519
	520	@macro parunselect
	521	@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
	522	Select for processing only segments in which the field @var{fieldname}
	523	is absent. Multiple occurrences of this option are allowed and are
	524	interpreted as a conjunction of conditions. If this option is absent,
	525	all segments are processed.
	526	@end macro
	527
	528
	529	@macro paroneline
	530	@item @b{@minus{}@minus{}one-line}
	531	This option makes the program print ambiguous annotations on one output
	532	line by generating multiple annotation fields. By default, when
	533	ambiguous annotations are produced for a segment, the segment is
	534	duplicated and each of the annotations is added to a separate copy of
	535	the segment.
	536	@end macro
	537
	538
	539	@macro paronefield
	540	@item @b{@minus{}@minus{}one-field, @minus{}1}
	541	This option makes the program print ambiguous annotations in one
	542	annotation field. By default, when ambiguous annotations are produced
	543	for a segment, the segment is duplicated and each of the
	544	annotations is added to a separate copy of the segment.
	545
	546	This option is useful when working with @command{kot} or @command{con}.
	547	@end macro
	548
	549
	550	@c ---------------------------------------------------------------------
	551	@c CONFIGURATION FILES
	552	@c ---------------------------------------------------------------------
	553
	554	@node Configuration files
	555	@chapter Configuration files
	556
	557	Values for all command line options accepted by a component
	558	may be set in configuration files. The default locations of the
	559	configuration files for a component named @command{@var{program}} are
	560
	561	@example
[246900a]	562	@file{/usr/local/etc/utt/@var{program}.conf}
[25ae32e]	563	@end example
	564
	565	for the system-wide configuration file and
	566
	567	@example
[246900a]	568	@file{~/.utt/@var{program}.conf}
[25ae32e]	569	@end example
	570
	571	for the user configuration file.
	572
	573	@c The configuration file to load may be also specified with the
	574	@c @option{--config} option. Configuration file need not be provided.
	575
	576	For each option, the value is set according to the following priority:
	577
	578	@itemize
	579	@item command line
	580	@c @item configuration file indicated with @option{--config} option
	581	@item user configuration file (or configuration file indicated with the @option{--config} option)
	582	@item system-wide configuration file
	583	@end itemize
	584
	585	Parameter values are specified in the following format:
	586
	587	@var{parametername}=@var{value}
	588
	589	where @var{parametername} is the short or long name of an option accepted by
	590	the program, or
	591
	592	@var{parametername}
	593
	594	if the option does not need arguments.
	595
	596	Comments may be added to configuration files using the @code{#} sign.
	597
	598	If a program accepts multiple occurrences of an option (e.g. @command{lem}'s @option{--select} option), you can specify each occurrence on a separate line of the program's configuration file.
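
The configuration file syntax and the priority order described in this
chapter can be sketched in Python. This is a hedged illustration, not the
option handling used by the UTT programs: the function names
@code{parse_config} and @code{effective_options} are hypothetical, a
@code{#} sign is assumed to start a comment anywhere on a line, and a
higher-priority source is assumed to replace (rather than extend) the values
given for the same option by a lower-priority source.

```python
def parse_config(text):
    """Parse configuration text into a dict: option name -> list of values.

    An option given without an argument is recorded with the value None.
    Repeated options accumulate, one value per line.
    """
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        name, sep, value = line.partition("=")
        options.setdefault(name.strip(), []).append(value.strip() if sep else None)
    return options


def effective_options(command_line, user, system):
    """Merge three option dicts: command line beats user file beats system file."""
    merged = {}
    for source in (system, user, command_line):  # lowest priority first
        merged.update(source)
    return merged
```

For example, an option set both in the system-wide file and on the command
line takes its value from the command line, while an option that appears
only in the system-wide file is still in effect.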
	599
	600	@c The equal sign may be omitted.
	601
	602
	603	@quotation Tip
	604	If you have two (or more) frequently used sets of options for the same
	605	program (e.g. @command{lem} with the PMDBF dictionary and @command{lem} with a user dictionary),
	606	a good solution is to create two symbolic links to @command{lem}, called
	607	e.g. @command{lemg} and @command{lemu}, and to specify their configuration in the files @file{lemg.conf}
	608	and @file{lemu.conf}, respectively.
	609	@end quotation
	610
	611	@c ---------------------------------------------------------------------
	612	@c COMPONENTS
	613	@c ---------------------------------------------------------------------
	614
	615	@node UTT components
	616	@chapter UTT components
	617
	618	UTT components are of three types:
	619
	620	@menu
	621	Sources: programs which read non-UTT data (e.g. raw text) and produce output
	622	in UTT format
	623	* tok:: a tokenizer
	624
	625	Filters: programs which read and produce UTT-formatted data
	626	* lem:: a morphological analyzer
	627	* gue:: a morphological guesser
[261bf62]	628	* cor:: a simple spelling corrector
	629	* kor:: a more elaborate spelling corrector
[25ae32e]	630	* sen:: a sentence splitter
	631	* ser:: a pattern search tool (marks matches)
[261bf62]	632	* mar:: a pattern search tool (introduces arbitrary markers into the text)
[25ae32e]	633	* grp:: a pattern search tool (selects sentences containing a match)
[261bf62]	634	@c * gph:: a word-graph annotation tool::
	635	@c * dgp:: a dependency parser
[25ae32e]	636
	637	Sinks: programs which read UTT data and produce output in another format
	638	* kot:: an untokenizer
	639	* con:: a concordance table generator
	640	@end menu
	641
	642	@c ---------------------------------------------------------------------
	643	@c TOK
	644	@c ---------------------------------------------------------------------
	645
	646	@page
	647	@node tok
	648	@section tok - a tokenizer
	649
	650	@c ----------------------------------------
	651
	652	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[19760ef]	653	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	654	@item @strong{Component category:} @tab source
[261bf62]	655	@item @strong{Input format:} @tab raw text file
	656	@item @strong{Output format:} @tab UTT regular
	657	@item @strong{Required annotation:} @tab -
[25ae32e]	658	@end multitable
	659
	660
	661	@menu
	662	* tok description::
	663	* tok input::
	664	* tok output::
	665	* tok command line options::
	666	* tok example::
	667	@end menu
	668
	669	@node tok description
	670	@subsection Description
	671
	672	@code{tok} is a simple program which reads a text file and identifies
	673	tokens on the basis of their orthographic form. The type of the token
	674	is printed as the @var{type} field.
	675
	676	@node tok input
	677	@subsection Input
	678
	679	Raw text.
	680
	681	@node tok output
	682	@subsection Output
	683
	684	A UTT file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. The @var{type} field distinguishes five types of tokens:
	685
	686	@itemize
	687
	688	@item @code{W}
	689	(word)
	690	- continuous sequence of letters
	691
	692	@item @code{N}
	693	(number)
	694	- continuous sequence of digits
	695
	696	@item @code{S}
	697	(space)
	698	- continuous sequence of space characters
	699
	700	@item @code{P}
	701	(punctuation mark)
	702	- single printable character not belonging to any of the other classes
	703
	704	@item @code{B}
	705	(unprintable character)
	706	- single unprintable character
	707
	708	@end itemize
	709
	710
	711
	712	@node tok command line options
	713	@subsection Command line options
	714
	715	@table @code
	716
	717	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	718	Print help.
	719
	720	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	721	Print version information.
	722
	723	@item @b{@minus{}@minus{}interactive, @minus{}i}
	724	This option toggles interactive mode, which is off by default. In
	725	interactive mode the program does not buffer its output.
	726
	727	@end table
	728
	729	@node tok example
	730	@subsection Example
	731
	732	Input:
	733
	734	@example
	735	Piszemy dobre programy.
	736	@end example
	737
	738	Output:
	739
	740	@example
	741	0000 07 W Piszemy
	742	0007 01 S _
	743	0008 05 W dobre
	744	0013 01 S _
	745	0014 08 W programy
	746	0022 01 P .
	747	0023 01 S \n
	748	@end example
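
The token classes can be approximated with a few lines of Python. This
sketch is not the implementation of @command{tok}: the function names are
hypothetical, character classes are decided with Python's own
@code{str.isalpha}, @code{str.isdigit}, @code{str.isspace} and
@code{str.isprintable} tests (which may differ from the tests used by
@command{tok}), and the encoding of @code{*} as @code{\*} is an assumption.
Positions are counted in characters, which equals bytes for the single-byte
encodings UTT expects.

```python
from itertools import groupby

# Encoding of characters that have a special meaning in the form field.
FORM_ENCODING = {
    "\\": "\\\\", "_": "\\_", "*": "\\*",
    " ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
}


def char_class(ch):
    """Map one character to a token type: W, N, S, P or B."""
    if ch.isalpha():
        return "W"
    if ch.isdigit():
        return "N"
    if ch.isspace():
        return "S"
    return "P" if ch.isprintable() else "B"


def encode_form(text):
    return "".join(FORM_ENCODING.get(ch, ch) for ch in text)


def tokenize(text):
    """Return UTT segment lines with start, length, type and form fields."""
    lines, pos = [], 0
    for cls, group in groupby(text, key=char_class):
        run = "".join(group)
        # P and B tokens are single characters; W, N and S are maximal runs.
        pieces = list(run) if cls in ("P", "B") else [run]
        for piece in pieces:
            lines.append("%04d %02d %s %s" % (pos, len(piece), cls, encode_form(piece)))
            pos += len(piece)
    return lines
```

Calling @code{tokenize} on the input line of the example above (including
its final newline) yields the seven segments listed in the output.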
	749
	750
	751	@c ---------------------------------------------------------------------
	752	@c SEN
	753	@c ---------------------------------------------------------------------
	754
	755	@c @node sen - sentencizer
	756	@c @chapter sen - sentencizer
	757
[19760ef]	758	@c Authors: Tomasz Obrębski
[25ae32e]	759
	760	@c ---------------------------------------------------------------------
	761	@c LEM
	762	@c ---------------------------------------------------------------------
	763
	764	@page
	765	@node lem
	766	@section lem - morphological analyzer
	767
	768	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[19760ef]	769	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
[25ae32e]	770	@item @strong{Component category:} @tab filter
[261bf62]	771	@item @strong{Input format:} @tab UTT regular
	772	@item @strong{Output format:} @tab UTT regular
	773	@item @strong{Required annotation:} @tab tok
[25ae32e]	774	@end multitable
	775
	776	@menu
	777	* lem description::
	778	* lem command line options::
	779	* lem input::
	780	* lem output::
	781	* lem example::
	782	* lem dictionaries::
	783	* lem hints::
	784	@end menu
	785
	786	@node lem description
	787	@subsection Description
	788
	789	@command{lem} performs morphological analysis of a simple orthographic
	790	word, returning all its possible morphological annotations,
	791	disregarding the context.
	792
	793	@c ----------------------------------------
	794
	795	@node lem command line options
	796	@subsection Command line options
	797
	798	@table @code
	799	@parhelp
	800	@parversion
	801	@parinteractive
	802	@c @parfile
	803	@c @paroutput
	804	@c @parfail
	805	@c @parcopy
	806	@parinputfield
	807	@paroutputfield
	808	@pardictionary
	809	@parprocess
	810	@parselect
	811	@parunselect
	812	@paroneline
	813	@paronefield
	814	@end table
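
The way the @option{--process}, @option{--select} and @option{--unselect}
options narrow down the set of processed segments can be summarized as a
predicate. The Python sketch below is an illustration only: the function
name @code{should_process} is hypothetical, and combining the three kinds of
conditions with a logical AND is an assumption that the manual does not
state explicitly.

```python
def should_process(seg_type, field_names, process=(), select=(), unselect=()):
    """Decide whether a segment is subject to processing.

    seg_type    -- value of the type field, e.g. "W"
    field_names -- names of the annotation fields present in the segment
    process     -- allowed types (disjunction); empty means all types
    select      -- fields that must all be present (conjunction)
    unselect    -- fields that must all be absent (conjunction)
    """
    if process and seg_type not in process:
        return False
    if any(name not in field_names for name in select):
        return False
    if any(name in field_names for name in unselect):
        return False
    return True
```

Under these assumptions, @code{should_process("W", @{"lem"@}, unselect=["lem"])}
is false, which is the behaviour relied on when a second dictionary is
applied only to words that have not been annotated yet.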
	815
	816	@c ----------------------------------------
	817
	818	@node lem input
	819	@subsection Input
	820
	821	@command{lem} reads a UTT file and processes the value of the @var{form} field
	822	(the input field may be changed with the @option{--input-field} option).
	823
	824	@node lem output
	825	@subsection Output
	826
	827	@command{lem} adds a new annotation field, whose default name is @code{lem}. In
	828	case of ambiguity, either the segment is duplicated (default),
	829	multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
	830	annotation is produced as the value of a single @code{lem} field
	831	(@option{--one-field}, @option{-1}):
	832
	833	@itemize @bullet
	834
	835	@item
	836	unambiguous value format:
	837
	838	@example
	839	<lemma>,<descr>
	840	@end example
	841
	842	@item
	843	ambiguous value format (@option{--one-field} option)
	844
	845
	846	@example
	847	<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
	848	@end example
	849
	850	(alternative descriptions for the same lemma are separated by commas,
	851	alternative lemmata are separated by semicolons.)
	852
	853	@end itemize
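
The ambiguous value format can be taken apart with two nested splits. This
is a minimal Python sketch with a hypothetical function name,
@code{parse_lem_value}; it assumes that lemmata and descriptions themselves
contain neither commas nor semicolons.

```python
def parse_lem_value(value):
    """Split a lem field value into a list of (lemma, [descr, ...]) pairs.

    Alternative lemmata are separated by semicolons; the descriptions of one
    lemma are separated by commas and follow the lemma itself.
    """
    result = []
    for alternative in value.split(";"):
        lemma, *descriptions = alternative.split(",")
        result.append((lemma, descriptions))
    return result
```

An unambiguous value is simply the special case with one lemma and one
description.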
	854
	855	@node lem example
	856	@subsection Example
	857
	858	Input:
	859
	860	@example
	861	0000 07 W Piszemy
	862	0007 01 S _
	863	0008 05 W dobre
	864	0013 01 S _
	865	0014 08 W programy
	866	0022 01 P .
	867	0023 01 S \n
	868	@end example
	869
	870	Output (default):
	871
	872	@example
[19760ef]	873	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	874	0007 01 S _
	875	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
	876	0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
	877	0013 01 S _
	878	0014 08 W programy lem:program,N/GiNpCa
	879	0014 08 W programy lem:program,N/GiNpCn
	880	0014 08 W programy lem:program,N/GiNpCv
	881	0022 01 P .
	882	0023 01 S \n
	883	@end example
	884
	885	Output (@option{--one-line} option):
	886
	887	@example
[19760ef]	888	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	889	0007 01 S _
	890	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
	891	0013 01 S _
	892	0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
	893	0022 01 P .
	894	0023 01 S \n
	895	@end example
	896
	897	Output (@option{--one-field} option):
	898
	899	@example
[19760ef]	900	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	901	0007 01 S _
	902	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
	903	0013 01 S _
	904	0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
	905	0022 01 P .
	906	0023 01 S \n
	907	@end example
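
The relation between the default output and the @option{--one-field} output
can be made concrete with a short Python sketch that folds the former into
the latter. It is an illustration, not part of UTT: the function name
@code{to_one_field} is hypothetical, every input line is assumed to carry
explicit @var{start}, @var{length}, @var{type} and @var{form} fields, and
copies of the same segment are assumed to be adjacent.

```python
def to_one_field(lines, field="lem"):
    """Merge adjacent copies of a segment into one line with a single lem field."""
    prefix = field + ":"
    merged = []  # entries: [key, other annotations, {lemma: [descr, ...]}]
    for line in lines:
        parts = line.split()
        key, annotations = tuple(parts[:4]), parts[4:]
        if not merged or merged[-1][0] != key:
            merged.append([key, [], {}])
        _, others, lemmas = merged[-1]
        for ann in annotations:
            if ann.startswith(prefix):
                lemma, _, descr = ann[len(prefix):].partition(",")
                descrs = lemmas.setdefault(lemma, [])
                if descr:
                    descrs.append(descr)
            elif ann not in others:
                others.append(ann)  # keep unrelated annotations once
    result = []
    for key, others, lemmas in merged:
        fields = list(key) + others
        if lemmas:
            # Descriptions of one lemma joined by commas, lemmata by semicolons.
            fields.append(prefix + ";".join(
                ",".join([lemma] + descrs) for lemma, descrs in lemmas.items()))
        result.append(" ".join(fields))
    return result
```

Applied to the default output of the example, the function reproduces the
lines shown for the @option{--one-field} option.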
	908
	909	@c ----------------------------------------
	910
	911	@node lem dictionaries
	912	@subsection Dictionaries
	913
	914	@command{lem} requires a dictionary. The dictionary may be provided in
	915	one of two formats: in text (source) format or in binary (fsa) format.
	916
	917	@subsubheading Text format
	918
	919	Dictionary entries have the following structure:
	920
	921	@example
	922	<form>;<lemma>,<descr>[;<lemma>,<descr>]
	923	@end example
	924
	925	@var{lemma} may be given explicitly or in the cut-add format:
	926
	927	@example
	928	@code{[<cut1><add1>-]<cut2><add2>}
	929	@end example
	930
	931	meaning: replace the prefix of length @code{<cut1>} with the
	932	string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string
	933	@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
[19760ef]	934	@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
[25ae32e]	935
	936	Each dictionary entry must be written on one line and must not contain blank characters.
	937
	938	Examples:
	939	@example
	940	kot;0,N/GaNsCn
	941	kota;1,N/GaNsCg;1,N/GaNsCa
	942	kotu;1,N/GaNsCd
	943	kotem;2,N/GaNsCi
	944	kocie;3t,N/GaNsCl;3t,N/GaNsCv
[19760ef]	945	najbielsi;3-4ały,ADJ/DsNpCnGp
	946	najbielsze;3-5ały,ADJ/DsNpCnGaifn
[25ae32e]	947	najlepsi;dobry,ADJ/DsNpCnGp
	948	najlepsze;dobry,ADJ/DsNpCnGaifn
	949	@end example
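
The cut-add notation can be expanded with a few lines of Python. The sketch
below is an illustration rather than the code used by @command{lem} or
@command{compiledic}: the names @code{apply_lemma_spec} and
@code{expand_entry} are hypothetical, and any lemma specification that does
not match the cut-add pattern is assumed to be an explicit lemma.

```python
import re

# [<cut1><add1>-]<cut2><add2>: cut values are digits, add strings contain no digits.
CUT_ADD = re.compile(r"(?:(\d+)(\D*?)-)?(\d+)(\D*)")


def apply_lemma_spec(form, spec):
    """Return the lemma of form for an explicit or cut-add lemma specification."""
    match = CUT_ADD.fullmatch(spec)
    if match is None:
        return spec  # assumption: anything else is an explicit lemma
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        form = add1 + form[int(cut1):]          # replace the prefix
    return form[:len(form) - int(cut2)] + add2  # replace the suffix


def expand_entry(entry):
    """Expand a text dictionary entry into (form, lemma, descr) triples."""
    form, *analyses = entry.split(";")
    triples = []
    for analysis in analyses:
        spec, _, descr = analysis.partition(",")
        triples.append((form, apply_lemma_spec(form, spec), descr))
    return triples
```

With this sketch, the entry @code{kocie;3t,N/GaNsCl;3t,N/GaNsCv} expands to
two analyses that both have the lemma @samp{kot}.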
	950
	951
	952	The mandatory file name extension for a text dictionary is @code{dic}. For large
	953	dictionaries it is preferable, however, to compile them into binary
	954	(fsa) format.
	955
	956	@subsubheading Binary format
	957
	958	The mandatory file name extension for a binary dictionary is @code{bin}. To
	959	compile a text dictionary into binary format, write:
	960
	961	@example
	962	compiledic <dictionaryname>.dic
	963	@end example
	964
	965	@subsubheading Polex/PMDBF dictionary
	966
	967	A large-coverage morphological dictionary of Polish, Polex/PMDBF, is included in
	968	the distribution as the default dictionary for @command{lem}. It is
	969	located by default in:
	970
[261bf62]	971	@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}
	972
	973	in a local installation, or in
	974
	975	@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}
	976
	977	in a system-wide installation.
[25ae32e]	978
	979	@node lem hints
	980	@subsection Hints
	981
[261bf62]	982	@subsubheading Combining data from multiple dictionaries
[25ae32e]	983
[261bf62]	984	@itemize
[25ae32e]	985
[261bf62]	986	@item Apply <dict1>, then apply <dict2> to words which were not annotated.
[25ae32e]	987
[261bf62]	988	@example
	989	lem -d <dict1> | lem -S lem -d <dict2>
	990	@end example
[25ae32e]	991
[261bf62]	992	@item Add annotations from two dictionaries <dict1> and <dict2>.
[25ae32e]	993
[261bf62]	994	@example
	995	lem -c -d <dict1> | lem -S lem -d <dict2>
	996	@end example
[25ae32e]	997
[261bf62]	998	@end itemize
[25ae32e]	999
	1000
	1001	@c ---------------------------------------------------------------------
	1002	@c GUE
	1003	@c ---------------------------------------------------------------------
	1004
	1005	@page
	1006	@node gue
	1007	@section gue - morphological guesser
	1008
	1009	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1010
[19760ef]	1011	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
[25ae32e]	1012	@item @strong{Component category:} @tab filter
	1013
	1014	@end multitable
	1015
	1016	@menu
[261bf62]	1017	* gue description::
[25ae32e]	1018	* gue command line options::
	1019	* gue example::
	1020	* gue dictionaries::
	1021	@end menu
	1022
[261bf62]	1023
	1024	@node gue description
	1025	@subsection Description
	1026
	1027	@command{gue} guesses morphological descriptions of the form contained
	1028	in the @var{form} field.
	1029
	1030
[25ae32e]	1031	@node gue command line options
	1032	@subsection Command line options
	1033
	1034	@table @code
	1035
	1036	@parhelp
	1037	@parversion
	1038	@parinteractive
	1039	@c @parfile
	1040	@c @paroutput
	1041	@c @parfail
	1042	@c @parcopy
	1043	@parinputfield
	1044	@paroutputfield
	1045	@pardictionary
	1046	@parprocess
	1047	@parselect
	1048	@parunselect
	1049	@paroneline
	1050	@paronefield
	1051
	1052	@item @b{@minus{}@minus{}delta=@var{n}}
	1053	Stop displaying answers after a drop in weight, that is, when the weight difference between two consecutive results exceeds the delta value (default=`0.2').
	1054
	1055
	1056	@item @b{@minus{}@minus{}cut-off=@var{n}}
	1057	Do not display answers whose weight is lower than the cut-off value (default=`200').
	1058
	1059
	1060	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
	1061	Guess up to n descriptions (default=`0', which means 'display all results').
	1062
	1063
	1064
	1065	@end table
	1066
	1067	@node gue example
	1068	@subsection Example
	1069
	1070	@example
	1071	command: gue -n 2
	1072
	1073	input:
	1074	0000 07 W smerfny
	1075
	1076	output:
	1077	0000 07 W smerfny gue:,ADJ/CaDpGiNs
	1078	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
	1079	@end example
	1080
	1081
	1082	@node gue dictionaries
	1083	@subsection Dictionaries
	1084
	1085	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
	1086	The fsa format is created by compiling text-format dictionaries.
	1087
	1088
	1089
	1090	@subsubheading Text format
	1091
	1092	Dictionary entries have the following structure:
	1093
	1094	@example
	1095	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
	1096	@end example
	1097
	1098	@var{lemma} must be given in the cut-add format:
	1099
	1100	@example
	1101	@code{[<cut1><add1>-]<cut2><add2>}
	1102	@end example
	1103	(no spaces in between): replace the prefix of length @var{cut1} with the
	1104	string @var{add1}, and replace the suffix of length @var{cut2} with the string
	1105	@var{add2}.
	1106
	1107
[19760ef]	1108	Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}.
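The cut-add transformation described above can be sketched in a few lines of Python. This is only an illustration of the notation, not the code used by the tools; the function name apply_cut_add is hypothetical, and the sketch assumes that the add strings contain no hyphen.

```python
import re

# "[<cut1><add1>-]<cut2><add2>": the prefix part before the hyphen is optional.
_CUT_ADD = re.compile(r"^(?:(\d+)([^-]*)-)?(\d+)(.*)$")


def apply_cut_add(form: str, spec: str) -> str:
    """Derive a lemma from a word form using a cut-add specification."""
    match = _CUT_ADD.match(spec)
    if match is None:
        raise ValueError(f"not a cut-add specification: {spec!r}")
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        # Replace the first cut1 characters with add1.
        form = add1 + form[int(cut1):]
    # Replace the last cut2 characters with add2 (cut2 may be 0).
    kept = form[:max(len(form) - int(cut2), 0)]
    return kept + add2


print(apply_cut_add("najbielsi", "3-4ały"))
```

The slice is computed from the length rather than with a negative index, so that a cut of 0 leaves the form unchanged.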
[25ae32e]	1109
	1110
	1111	@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
	1112
	1113	@var{weight} is an integer value between 1 and 999 indicating the
	1114	likelihood of the guess.
	1115
	1116	@example
[19760ef]	1117	*łkę;1a,N/GfNsCa
	1118	naj*elszy;3-5ały,ADJ/...:...
[25ae32e]	1119	@end example
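As a rough illustration of the entry structure, the following Python sketch splits one text-format entry into its parts. The names GueEntry and parse_gue_entry are hypothetical, and the weight 500 in the sample entry is made up for the example; the sketch assumes that the description contains no colon.

```python
from typing import NamedTuple


class GueEntry(NamedTuple):
    prefix: str
    suffix: str
    lemma: str        # cut-add specification
    description: str
    weight: int


def parse_gue_entry(line: str) -> GueEntry:
    """Split 'prefix*suffix;lemma,description:weight' into its fields."""
    pattern, semicolon, rest = line.strip().partition(";")
    prefix, star, suffix = pattern.partition("*")
    annotation, colon, weight_text = rest.rpartition(":")
    lemma, comma, description = annotation.partition(",")
    if not (semicolon and star and colon and comma):
        raise ValueError(f"malformed entry: {line!r}")
    weight = int(weight_text)
    if not 1 <= weight <= 999:
        raise ValueError(f"weight out of range: {weight}")
    return GueEntry(prefix, suffix, lemma, description, weight)


print(parse_gue_entry("*łkę;1a,N/GfNsCa:500"))
```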
	1120
	1121
	1122	@c ---------------------------------------------------------------------
	1123	@c COR
	1124	@c ---------------------------------------------------------------------
	1125
	1126	@page
	1127	@node cor
	1128	@section cor - spelling corrector
	1129
	1130	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[19760ef]	1131	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
[25ae32e]	1132	@item @strong{Component category:} @tab filter
[261bf62]	1133	@item @strong{Input format:} @tab UTT regular
	1134	@item @strong{Output format:} @tab UTT regular
	1135	@item @strong{Required annotation:} @tab tok
[25ae32e]	1136	@end multitable
	1137
[261bf62]	1138	@menu
	1139	* cor description::
	1140	* cor command line options::
	1141	* cor dictionaries::
	1142	@end menu
	1143
	1144
	1145	@node cor description
	1146	@subsection Description
	1147
[25ae32e]	1148	The spelling corrector applies Kemal Oflazer's dynamic programming
	1149	algorithm @cite{oflazer96} to the FSA representation of the set of
	1150	word forms of the Polex/PMDBF dictionary. Given an incorrect
	1151	word form, it returns all word forms present in the dictionary whose
	1152	edit distance from that form is within the threshold given by the @option{--distance} option.
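The effect of the distance threshold can be illustrated with a brute-force Python sketch. It is not Oflazer's algorithm: it uses the plain Levenshtein distance and scans a word list instead of traversing an automaton, and it assumes that the threshold is inclusive. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(form, words, max_distance=1):
    """Return the dictionary words within max_distance edits of form."""
    return [word for word in words if edit_distance(form, word) <= max_distance]


print(suggest("odlut", ["odlot", "odlotowy", "odludek"]))
```

A linear scan like this is far too slow for a large dictionary, which is presumably why the corrector works on the FSA representation instead.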
	1153
	1154
	1155	@node cor command line options
	1156	@subsection Command line options
	1157
	1158	@table @code
	1159
	1160	@parhelp
	1161	@parversion
	1162	@parinteractive
	1163	@c @parfile
	1164	@c @paroutput
	1165	@c @parfail
	1166	@c @parcopy
	1167	@parinputfield
	1168	@paroutputfield
	1169	@pardictionary
	1170	@parprocess
	1171	@parselect
	1172	@parunselect
	1173	@paroneline
	1174	@paronefield
	1175
	1176	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
	1177	Maximum edit distance (default='1').
	1178
[261bf62]	1179	@c @item @b{@minus{}@minus{}replace, @minus{}r}
	1180	@c Replace original form with corrected form, place original form in the
	1181	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
	1182
[25ae32e]	1183
	1184	@end table
	1185
	1186	@node cor dictionaries
	1187	@subsection Dictionaries
	1188
	1189	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
	1190	The fsa format is created by compiling text-format dictionaries.
	1191
	1192	@subsubheading Text format
	1193
	1194	The @command{cor} dictionary is a list of words:
	1195	@example
	1196	odlot
	1197	odlotowy
	1198	odludek
	1199	@end example
	1200
[261bf62]	1201	@subsubheading Binary format
	1202
	1203	The mandatory file name extension for a binary dictionary is @code{bin}. To
	1204	compile a text dictionary into binary format, write:
	1205
	1206	@example
	1207	compiledic <dictionaryname>.dic
	1208	@end example
	1209
	1210	@c ---------------------------------------------------------------------
	1211	@c KOR
	1212	@c ---------------------------------------------------------------------
	1213
	1214	@page
	1215	@node kor
	1216	@section kor - configurable spelling corrector
	1217
	1218	[TODO]
	1219
	1220	@c ---------------------------------------------------------------------
	1221	@c SEN
	1222	@c ---------------------------------------------------------------------
	1223
[25ae32e]	1224	@page
	1225	@node sen
	1226	@section sen - a sentensizer
	1227
	1228	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1229
[19760ef]	1230	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1231	@item @strong{Component category:} @tab filter
[261bf62]	1232	@item @strong{Input format:} @tab UTT regular
	1233	@item @strong{Output format:} @tab UTT regular
	1234	@item @strong{Required annotation:} @tab tok
[25ae32e]	1235
	1236	@end multitable
	1237
	1238
	1239	@menu
[261bf62]	1240	* sen description::
[25ae32e]	1241	@c * sen input::
	1242	@c * sen output::
	1243	* sen example::
	1244	@end menu
	1245
[261bf62]	1246	@node sen description
	1247	@subsection Description
	1248
	1249	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
	1250
[25ae32e]	1251	@node sen example
	1252	@subsection Example
	1253
	1254	@example
	1255	command: sen
	1256
	1257	input:
[19760ef]	1258	0000 05 W Cześć
[25ae32e]	1259	0005 01 P !
	1260	0006 01 S _
	1261	0007 02 W To
	1262	0009 01 S _
	1263	0010 02 W ja
	1264	0012 01 P .
	1265	0013 01 S \n
	1266
	1267	output:
	1268	0000 00 BOS *
[19760ef]	1269	0000 05 W Cześć
[25ae32e]	1270	0005 01 P !
	1271	0006 00 EOS *
	1272	0006 00 BOS *
	1273	0006 01 S _
	1274	0007 02 W To
	1275	0009 01 S _
	1276	0010 02 W ja
	1277	0012 01 P .
	1278	0013 01 S \n
	1279	0014 00 EOS *
	1280	@end example
	1281
	1282
	1283	@c ---------------------------------------------------------------------
	1284	@c GPH
	1285	@c ---------------------------------------------------------------------
	1286
	1287	@c @node gph - graphizer
	1288	@c @chapter gph - graphizer
	1289
[19760ef]	1290	@c Authors: Tomasz Obrębski
[25ae32e]	1291
	1292
	1293
	1294	@c ---------------------------------------------------------------------
[261bf62]	1295	@c SER
[25ae32e]	1296	@c ---------------------------------------------------------------------
	1297
	1298	@page
	1299	@node ser
	1300	@section ser - pattern search tool
	1301
	1302	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[19760ef]	1303	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1304	@item @strong{Component category:} @tab filter
[261bf62]	1305	@item @strong{Input format:} @tab UTT regular
	1306	@item @strong{Output format:} @tab UTT regular
	1307	@item @strong{Required annotation:} @tab tok, lem --one-field
[25ae32e]	1308	@end multitable
	1309
	1310	@menu
[261bf62]	1311	* ser description::
[25ae32e]	1312	* ser command line options::
	1313	* ser pattern::
	1314	* ser how ser works::
	1315	* ser customization::
	1316	* ser limitations::
	1317	* ser requirements::
	1318	@end menu
	1319
	1320
[261bf62]	1321	@node ser description
	1322	@subsection Description
	1323
	1324	@command{ser} looks for patterns in UTT-formatted texts.
	1325
	1326
[25ae32e]	1327	@c ---------------------------------------------------------------------
	1328	@node ser command line options
	1329	@subsection Command line options
	1330
	1331	@table @code
	1332
	1333	@parhelp
	1334	@parversion
	1335	@c @parfile
	1336	@c @paroutput
	1337	@c @parinputfield
	1338	@c @paroutputfield
	1339	@parprocess
	1340	@parinteractive
	1341
	1342	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1343	The search pattern.
	1344
	1345	@item @b{@minus{}@minus{}morph=@var{field}}
	1346	The name of the annotation field containing the morphological
	1347	description (default @code{lem}).
	1348
	1349	@item @b{@minus{}@minus{}flex}
	1350	Only print the generated flex source code.
	1351
	1352	@item @b{@minus{}@minus{}macro=@var{filename}}
	1353	Read macro definitions from file @var{filename} rather than from the
	1354	default location. This option makes it possible to redefine the set of terms.
	1355
	1356	@item @b{@minus{}@minus{}define=@var{filename}}
	1357	Append macro definitions from file @var{filename}. This option
	1358	makes it possible to extend the set of terms.
	1359
	1360	@end table
	1361
	1362
	1363	@c ---------------------------------------------------------------------
	1364	@node ser pattern
	1365	@subsection Pattern
	1366
	1367	The @command{ser} pattern is a regular expression over terms corresponding
	1368	to text segments or segment sequences. Predefined terms are:
	1369
	1370	@table @code
	1371
	1372	@item seg(@var{t},@var{f},@var{a})
	1373	a segment of type @var{t}, containing form @var{f} and annotation
	1374	@var{a}
	1375
	1376	@item form(@var{f})
	1377	a segment containing form @var{f}
	1378
	1379	@item field(@var{f})
	1380	a segment containing annotation field @var{f}
	1381
	1382	@item space(@var{f})
	1383	a space segment of form @var{f}
	1384
	1385	@item word(@var{f})
	1386	a word segment of form @var{f}
	1387
	1388	@item punct(@var{f})
	1389	a punct segment of form @var{f}
	1390
	1391	@item number(@var{f})
	1392	a number segment of form @var{f}
	1393
	1394	@item lexeme(@var{f})
	1395	a word segment with lemma @var{f}
	1396
	1397	@item cat(@var{c})
	1398	a word segment of category @var{c}
	1399
	1400	@end table
	1401
	1402	All arguments are optional. If an argument is omitted, an arbitrary
	1403	string of non-blank characters is assumed as the argument value. Term
	1404	arguments may be arbitrary character-level regular expressions. The
	1405	following special symbols can be used:
	1406
	1407	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1408	@item @code{[@dots{}]} @tab a character class
	1409	@item @code{[^@dots{}]} @tab a negated character class
	1410	@item @code{|} @tab alternative
	1411	@item @code{*} @tab repetition, including zero times
	1412	@item @code{+} @tab repetition, at least one time
	1413	@item @code{?} @tab optionality
	1414	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
	1415	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
	1416	@item @code{@{@var{m}@}} @tab repetition @var{m} times
	1417	@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
	1418	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
	1419	@item @code{( )} @tab parentheses, used to override precedence
	1420	@c @end multitable
	1421
	1422	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1423	@item @code{.} @tab a non-blank character
	1424	@item @code{\w} @tab a letter
	1425	@item @code{\W} @tab a non-blank character other than a letter
	1426	@item @code{\d} @tab a digit
	1427	@item @code{\D} @tab a non-blank character other than a digit
	1428	@item @code{\s} @tab a space or tab character
	1429	@item @code{\S} @tab a non-blank character (the same as @code{.})
	1430	@item @code{\l} @tab a lowercase letter
	1431	@item @code{\L} @tab an uppercase letter
	1432	@end multitable
	1433
	1434
	1435	@noindent The following characters:
	1436	@example
	1437	@verb{% [ ] ^ | * + ? { } , . < > \ %}
	1438	@end example
	1439	must be escaped with a backslash, i.e. written as:
	1440	@example
	1441	@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
	1442	@end example
	1443
	1444	@quotation Note
	1445	The special symbols are borrowed from Perl with minor
	1446	modifications made for convenience, so the meaning of certain
	1447	special characters and sequences differs slightly from their
	1448	usual meaning. In particular,
	1449	the @code{.} special character does not match blank characters, because
	1450	spaces have a special function in UTT files (they are field
	1451	separators). Use @code{\s} to explicitly match a space or tab character.
	1452	@end quotation
	1453
	1454	In the argument of the @code{cat} term, a special operator @code{<...>} may be
	1455	used. A category specification enclosed in angle brackets matches all
	1456	category descriptions which are consistent (non-contradictory) with the
	1457	specification. For example, @code{<N>} matches all noun descriptions, and
	1458	@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
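The notion of a consistent (non-contradictory) description can be made concrete with a small Python sketch. It is an approximation for illustration only and says nothing about how ser implements the operator. It assumes that a description consists of a part of speech, a slash, and attributes written as one uppercase letter followed by lowercase value letters, and that an attribute missing from the description never contradicts the specification. The function names are hypothetical.

```python
import re

# One attribute: an uppercase name followed by one or more lowercase values.
_ATTRIBUTE = re.compile(r"([A-Z])([a-z]+)")


def parse_category(text):
    """Split 'ADJ/DsNpCn' into ('ADJ', {'D': {'s'}, 'N': {'p'}, 'C': {'n'}})."""
    part_of_speech, _, attributes = text.partition("/")
    values = {name: set(letters) for name, letters in _ATTRIBUTE.findall(attributes)}
    return part_of_speech, values


def is_consistent(specification, description):
    """Check whether a description does not contradict a <...> specification."""
    spec_pos, spec_attrs = parse_category(specification.strip("<>"))
    desc_pos, desc_attrs = parse_category(description)
    if spec_pos != desc_pos:
        return False
    for name, wanted in spec_attrs.items():
        present = desc_attrs.get(name)
        # A shared attribute must have at least one value in common.
        if present is not None and not (wanted & present):
            return False
    return True


print(is_consistent("<ADJ/Can>", "ADJ/DsNpCnGp"))
```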
	1459
	1460
	1461	@*
	1462	@noindent @b{Examples of one-segment patterns:}
	1463
	1464	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1465	@item @code{seg} @tab any segment
	1466	@item @code{word} @tab any word-form
	1467	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
	1468	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
	1469	@item @code{word(\L\l+)} @tab a capitalized word-form
	1470	@item @code{punct} @tab a punctuation character
	1471	@item @code{space(.*\\n.*)} @tab a space segment containing a newline character
	1472	@item @code{lexeme(pomoc)} @tab any form of the lexeme 'pomoc'
	1473	@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
	1474	@item @code{cat(<N/Ca>)} @tab a word whose category matches @code{N/Ca}
	1475	@end multitable
	1476
	1477	@*
	1478	@noindent @b{Examples of multi-segment patterns:}
	1479
	1480	@table @code
	1481
	1482	@item (word(\L) punct(\.) space?)+ word(\L\l+)
	1483	a sequence of initials followed by a surname
	1484
	1485	@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
	1486	a text fragment between two punctuation characters, containing an
	1487	occurrence of a relative pronoun
	1488
	1489	@end table
	1490
	1491
	1492	@node ser how ser works
	1493	@subsection How ser works
	1494
	1495	@node ser customization
	1496	@subsection Customization
	1497
	1498	@c All predefined terms correspond to single segments,
	1499
	1500	@example
[261bf62]	1501	define(`verbseq', `(cat(<V>) (space cat(<V>)))')
[25ae32e]	1502	@end example
	1503
	1504
	1505	the term @code{cat()} may not be used as a ... of
	1506
	1507	@c See @command{m4} manual for further details on macro definition format.
	1508
	1509	@node ser limitations
	1510	@subsection Limitations
	1511
[261bf62]	1512	Do not use more than 3 attributes in <>.
[25ae32e]	1513
	1514	@node ser requirements
	1515	@subsection Requirements
	1516
	1517	In order to run @command{ser}, the following programs must be
	1518	installed in the system:
	1519
	1520	@itemize
	1521
	1522	@item @command{m4}
	1523	@item @command{grep}
	1524	@item @command{flex}
	1525	@item @command{gcc}
	1526
	1527	@end itemize
	1528
	1529
	1530	@c ---------------------------------------------------------------------
[261bf62]	1531	@c GRP
[25ae32e]	1532	@c ---------------------------------------------------------------------
	1533
	1534	@page
	1535	@node grp
	1536	@section grp - pattern search tool
	1537
	1538	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[19760ef]	1539	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1540	@item @strong{Component category:} @tab filter
[261bf62]	1541	@item @strong{Input format:} @tab UTT flattened
	1542	@item @strong{Output format:} @tab UTT flattened
	1543	@item @strong{Required annotation:} @tab tok, sen, lem --one-field
[25ae32e]	1544	@end multitable
	1545
	1546
[261bf62]	1547	@menu
	1548	* grp description::
	1549	* grp command line options::
	1550	* grp pattern::
	1551	* grp hints::
	1552	@end menu
	1553
	1554
	1555	@node grp description
	1556	@subsection Description
	1557
[25ae32e]	1558	@command{grp} selects sentences containing an expression matching a
	1559	pattern. The pattern format is exactly the same as that accepted by
	1560	@command{ser}.
	1561
	1562	@command{grp} is intended mainly for speeding up the corpus search process.
	1563	It is extremely fast (processing speed is usually higher than the speed
	1564	of reading the corpus file from disk).
	1565
	1566	@node grp command line options
	1567	@subsection Command line options
	1568
	1569	@table @code
	1570
	1571	@parhelp
	1572	@parversion
	1573	@parprocess
	1574	@parinteractive
	1575
	1576	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1577	The search pattern.
	1578
	1579	@item @b{@minus{}@minus{}morph=@var{field}}
	1580	The name of the annotation field containing the morphological
	1581	description (default @code{lem}).
	1582
	1583	@item @b{@minus{}@minus{}command}
	1584	Only print the generated flex source code.
	1585
	1586	@item @b{@minus{}@minus{}macro=@var{filename}}
	1587	Read macro definitions from file @var{filename} rather than from the
	1588	default location. This option makes it possible to redefine the set of terms.
	1589
	1590	@item @b{@minus{}@minus{}define=@var{filename}}
	1591	Append macro definitions from file @var{filename}. This option
	1592	makes it possible to extend the set of terms.
	1593
	1594	@end table
	1595
	1596
	1597	@node grp pattern
	1598	@subsection Pattern
	1599
	1600	The pattern format is the same as for @command{ser} (@pxref{ser pattern}).
	1601
	1602	@node grp hints
	1603	@subsection Hints
	1604
	1605	The corpus search speed may be increased by combining @command{grp} with the @command{lzop}
	1606	compression tool (@command{grp} usually processes data faster than it can be read from
	1607	disk, especially on slow laptop drives).
	1608
	1609	@example
[e28a625]	1610	cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
[25ae32e]	1611	@end example
	1612
	1613	@example
[e28a625]	1614	lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
[25ae32e]	1615	@end example
	1616
	1617
[261bf62]	1618
[25ae32e]	1619	@c ---------------------------------------------------------------------
[261bf62]	1620	@c MAR
[25ae32e]	1621	@c ---------------------------------------------------------------------
[261bf62]	1622
	1623	@page
	1624	@node mar
	1625	@section mar
	1626
	1627	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1628	@item @strong{Authors:} @tab Marcin Walas, Tomasz Obrębski
[e28a625]	1629	@item @strong{Input format:} @tab UTT flattened
	1630	@item @strong{Output format:} @tab UTT flattened
	1631	@item @strong{Required annotation:} @tab tok, sen, lem -1
[261bf62]	1632	@end multitable
	1633
	1634	[TODO]
	1635
[e28a625]	1636	(see mar's help 'mar -h' for some information)
	1637
[261bf62]	1638	@c ---------------------------------------------------------------------
	1639	@c KOT
[25ae32e]	1640	@c ---------------------------------------------------------------------
	1641
[261bf62]	1642
[25ae32e]	1643	@page
	1644	@node kot
	1645	@section kot - untokenizer
	1646
[261bf62]	1647	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1648	@item @strong{Authors:} @tab Tomasz Obrębski
	1649	@item @strong{Component category:} @tab filter
	1650	@item @strong{Input format:} @tab UTT regular
	1651	@item @strong{Output format:} @tab text
	1652	@item @strong{Required annotation:} @tab tok
	1653	@end multitable
[25ae32e]	1654
	1655
	1656	@menu
[261bf62]	1657	* kot description::
[25ae32e]	1658	* kot command line options::
	1659	* kot usage examples::
	1660	@end menu
	1661
[261bf62]	1662	@node kot description
	1663	@subsection Description
	1664
	1665	@command{kot} transforms a UTT-formatted file back into raw text format.
	1666
[25ae32e]	1667	@node kot command line options
	1668	@subsection Command line options
	1669
	1670	@table @code
	1671
	1672	@parhelp
	1673
	1674	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1675
	1676	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1677
	1678	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1679
	1680	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1681
	1682	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1683
	1684	@c @item
	1685
	1686	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
	1687	print @var{string} between nonadjacent segments of the input file
	1688
	1689	@item @b{@minus{}@minus{}spaces, @minus{}r}
	1690	retain the special characters @code{_}, @code{\t},
	1691	@code{\n}, @code{\r}, @code{\f} unexpanded in the output
	1692
	1693	@end table
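The behaviour of the two options can be sketched in Python as follows. This is a simplified model, not the actual kot implementation. It assumes that each segment line has the fields start, length, type and form, that only space segments (type S) use the special characters, and that zero-length segments such as BOS and EOS contribute no text. The function name untokenize is hypothetical.

```python
import re

# Special characters listed for the --spaces option and their expansions.
_ESCAPES = {"_": " ", "\\t": "\t", "\\n": "\n", "\\r": "\r", "\\f": "\f"}
_ESCAPE_RE = re.compile(r"\\[tnrf]|_")


def untokenize(lines, gap_fill="", keep_spaces=False):
    """Rebuild raw text from UTT segment lines."""
    pieces = []
    expected_start = None
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a segment line
        start, length = int(fields[0]), int(fields[1])
        segment_type, form = fields[2], fields[3]
        if length == 0:
            continue  # zero-length markers carry no text
        if expected_start is not None and start > expected_start:
            pieces.append(gap_fill)  # nonadjacent segments (--gap-fill)
        expected_start = start + length
        if segment_type == "S" and not keep_spaces:
            form = _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group()], form)
        pieces.append(form)
    return "".join(pieces)


segments = ["0000 02 W To", "0002 01 S _", "0003 02 W ja", "0005 01 P ."]
print(untokenize(segments))
```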
	1694
	1695	@node kot usage examples
	1696	@subsection Usage examples
	1697
	1698	@example
	1699	cat legia.txt | tok | kot
	1700	@end example
	1701
	1702	@example
	1703	cat legia.txt | tok | lem -1 | kot
	1704	@end example
	1705
[261bf62]	1706	@c ---------------------------------------------------------------
	1707	@c CON
	1708	@c ---------------------------------------------------------------
	1709
[25ae32e]	1710
	1711	@page
	1712	@node con
	1713	@section con - concordance table generator
	1714
	1715	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1716	@item @strong{Authors:} @tab Justyna Walkowska
	1717	@item @strong{Component category:} @tab sink
[261bf62]	1718	@item @strong{Input format:} @tab UTT regular
	1719	@item @strong{Output format:} @tab text
	1720	@item @strong{Required annotation:} @tab ser or mar
[25ae32e]	1721	@end multitable
	1722	@c
	1723
	1724	@menu
[261bf62]	1725	* con description::
[25ae32e]	1726	* con command line options::
	1727	* con usage example::
	1728	* con hints::
	1729	@end menu
	1730
[261bf62]	1731
	1732	@node con description
	1733	@subsection Description
	1734
	1735	@command{con} generates a concordance table based on a pattern given to @command{ser}.
	1736
	1737
[25ae32e]	1738	@node con command line options
	1739	@subsection Command line options
	1740
	1741	@table @code
	1742
	1743	@parhelp
	1744
	1745	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
	1746	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1747	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1748	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1749	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
	1750	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
	1751	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	1752	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	1753	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
	1754	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1755	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1756	@c @item
	1757	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1758	@c search pattern
	1759	@c
	1760	@c @item @b{@minus{}@minus{}flex}
	1761	@c only print the generated flex source code
	1762	@c
	1763	@c @item @b{@minus{}@minus{}macro=@var{filename}}
	1764	@c read macrodefinitions from file @var{filename} rather than from
	1765	@c default location. This option allows to redefine the set of terms.
	1766	@c
	1767	@c @item @b{@minus{}@minus{}define=@var{filename}}
	1768	@c append macrodefinitions from file @var{filename}. This option
	1769	@c allows to extend the set of terms.
	1770
	1771	@item @b{@minus{}@minus{}left @minus{}l}
	1772	Left context info (default='30c'). Example:
	1773	@example
	1774	-l=5c: left context is 5 characters
	1775	-l=5w: left context is 5 words
	1776	-l=5s: left context is 5 non-empty input lines
	1777	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
	1778	@end example
	1779
	1780	@item @b{@minus{}@minus{}right @minus{}r}
	1781	Right context info (default='30c').
	1782	@item @b{@minus{}@minus{}trim @minus{}t}
	1783	Clear incomplete words from output.
	1784	@item @b{@minus{}@minus{}white @minus{}w}
	1785	Do not convert whitespace characters into spaces.
	1786	@item @b{@minus{}@minus{}column @minus{}c}
	1787	Left column minimal width in characters (default = 0).
	1788	@item @b{@minus{}@minus{}ignore @minus{}i}
	1789	Ignore segment inconsistency in the input.
[261bf62]	1790	@item @b{@minus{}@minus{}bom}
[25ae32e]	1791	Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
[261bf62]	1792	@item @b{@minus{}@minus{}eom}
[25ae32e]	1793	End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
	1794	@item @b{@minus{}@minus{}bod}
	1795	Selected segment beginning display string (default='[').
	1796	@item @b{@minus{}@minus{}eod}
	1797	Selected segment end display string (default=']').
	1798
	1799
	1800
	1801	@end table
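To give an idea of how the context and display options interact, here is a minimal keyword-in-context sketch in Python. It works on raw text and character offsets rather than on UTT segments, supports only character-based context sizes, and its output layout is an assumption rather than the exact format printed by con. The function name concordance_line is hypothetical.

```python
def concordance_line(text, start, end, left=30, right=30,
                     bod="[", eod="]", column=0):
    """Format one match text[start:end] with its left and right context."""
    left_context = text[max(start - left, 0):start]
    right_context = text[end:end + right]
    # The left context is padded to the minimal column width.
    line = left_context.rjust(column) + bod + text[start:end] + eod + right_context
    # Whitespace characters are shown as plain spaces.
    return "".join(" " if char.isspace() else char for char in line)


sentence = "W domu nie ma nikogo."
position = sentence.index("domu")
print(concordance_line(sentence, position, position + 4, left=5, right=5))
```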
	1802
	1803	@node con usage example
	1804	@subsection Usage example
	1805	@example
[261bf62]	1806	cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
[25ae32e]	1807	@end example
	1808
	1809
	1810	@node con hints
	1811	@subsection Hints
	1812
	1813	@command{con} is a rather slow program. Do not pass large amounts of
	1814	redundant text through this program. @command{con} works fine in the following
	1815	sequence:
	1816
	1817	@example
	1818	... | grp -e EXPR | ser -e EXPR | con
	1819	@end example
	1820
	1821
	1822	@c ---------------------------------------------------------------------
	1823	@c ---------------------------------------------------------------------
	1824
	1825	@page
	1826	@node Auxiliary tools
	1827	@chapter Auxiliary tools
	1828
	1829	@menu
	1830	* compiledic:: dictionary compiler
	1831	* fla:: UTT file flattener
	1832	* unfla:: UTT file unflattener
	1833	@end menu
	1834
	1835
	1836	@page
	1837	@node compiledic
	1838	@section compiledic - the dictionary compiler
	1839
	1840	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1841	@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
	1842	@item @strong{Component category:} @tab additional tool
	1843	@end multitable
	1844	@c
	1845
	1846	@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
	1847	(FSA) format (@code{.bin} extension).
	1848
	1849	Automaton representation of a dictionary is built using the AT&T tools:
	1850	@itemize
	1851	@item AT&T FSM Library,
	1852	@item AT&T Lextools.
	1853	@end itemize
	1854
	1855	For @command{compiledic} to work, the packages listed above must be
	1856	installed on your system. They are freely available
	1857	for non-commercial use.
	1858
	1859	Usage:
	1860	@example
	1861	compiledic <dictionaryname>.dic
	1862	@end example
	1863
	1864	The file <dictionaryname>.bin will be generated.
	1865
	1866	Note: The program produces many temporary files, which are
	1867	stored in the current directory. They are deleted after successful
	1868	termination of the program.
	1869
	1870	@c @menu
	1871	@c * con command line options::
	1872	@c * con usage example::
	1873	@c * con hints::
	1874	@c @end menu
	1875
	1876
[e28a625]	1877	@c -------------------------------------------------------------------------------
	1878	@c FLA
	1879	@c -------------------------------------------------------------------------------
	1880
[25ae32e]	1881	@page
	1882	@node fla
	1883	@section fla - the UTT file flattener
	1884
	1885	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[19760ef]	1886	@item @strong{Authors:} @tab Tomasz Obrębski
[e28a625]	1887	@item @strong{Input format:} @tab UTT regular
	1888	@item @strong{Output format:} @tab UTT flattened
	1889	@item @strong{Required annotation:} @tab sen
[25ae32e]	1890	@end multitable
	1891	@c
	1892
[e28a625]	1893	@menu
	1894	* fla description::
	1895	@c * fla command line options::
	1896	@c * fla usage example::
	1897	@end menu
	1898
	1899
	1900	@node fla description
	1901	@subsection Description
	1902
[25ae32e]	1903	@command{fla} ``flattens'' a UTT file by merging segments belonging
	1904	to one sentence into one line. Technically, end-of-line characters
	1905	('\n', ASCII code 10) are replaced with form-feed characters ('\f',
	1906	ASCII code 12). The flattening makes it possible to process UTT files
	1907	with such tools as @command{grep} or @command{sed} sentence by
	1908	sentence (used in @command{grp} and @command{mar}).
	1909
	1910	Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.
	1911
	1912	Flattened files are still human-readable.
	1913
	1914	Usage:
	1915
	1916	@example
	1917	fla [<bosregex>]
	1918	@end example
	1919
	1920	The optional argument is a regular expression describing segments
	1921	which should be treated as sentence beginnings (the test is whether the
	1922	segment contains a fragment matching the @code{<bosregex>}). By
	1923	default, segments containing a @code{BOS} field are matched.
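The flattening step and its inverse can be sketched in Python as follows. This is a simplified model of the description above, not the actual implementation. The default regular expression below is an assumed stand-in for the test on the BOS field, and segments that appear before the first sentence beginning are simply collected into a first line of their own. The function names are hypothetical.

```python
import re


def flatten(text, bos_regex=r"\bBOS\b"):
    """Put all segments of a sentence on one line, separated by form feeds."""
    bos = re.compile(bos_regex)
    sentences = []
    # Split on "\n" only: str.splitlines() would also split on form feeds.
    for line in text.split("\n"):
        if not line:
            continue
        if bos.search(line) or not sentences:
            sentences.append([line])    # a new sentence starts here
        else:
            sentences[-1].append(line)  # continue the current sentence
    return "".join("\f".join(segments) + "\n" for segments in sentences)


def unflatten(text):
    """Restore the regular format by turning form feeds back into newlines."""
    return text.replace("\f", "\n")


regular = "0000 00 BOS *\n0000 02 W To\n0002 00 EOS *\n0002 00 BOS *\n0002 02 W ja\n"
print(flatten(regular).count("\n"))
```

With one sentence per line, line-oriented tools such as grep select or reject whole sentences, which is what grp relies on.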
	1924
[e28a625]	1925	@c -------------------------------------------------------------------------------
	1926	@c UNFLA
	1927	@c -------------------------------------------------------------------------------
[25ae32e]	1928
	1929	@page
	1930	@node unfla
	1931	@section unfla - the UTT file unflattener
	1932
	1933	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[19760ef]	1934	@item @strong{Authors:} @tab Tomasz Obrębski
[e28a625]	1935	@item @strong{Input format:} @tab UTT flattened
	1936	@item @strong{Output format:} @tab UTT regular
	1937	@item @strong{Required annotation:} @tab -
[25ae32e]	1938	@end multitable
	1939
[e28a625]	1940	@menu
	1941	* unfla description::
	1942	@c * fla command line options::
	1943	@c * fla usage example::
	1944	@end menu
	1945
	1946	@node unfla description
	1947	@subsection Description
[25ae32e]	1948	@command{unfla} transforms a flattened UTT file, produced by
	1949	@command{fla}, into the regular format by restoring end-of-line
	1950	characters.
	1951
	1952
	1953
	1954
	1955	@c ---------------------------------------------------------------------
	1956	@c USAGE EXAMPLES
	1957	@c ---------------------------------------------------------------------
	1958
	1959	@node Usage examples
	1960	@chapter Usage examples
	1961
	1962	@subsubheading Simple pipelines
	1963
	1964	@enumerate
	1965
	1966	@item tokenization
	1967
	1968	cat text | tok > output1
	1969
	1970	@item morphological annotation (1)
	1971
	1972	simple dictionary based lemmatization
	1973
	1974	cat text | tok | lem > output1
	1975
	1976	@item morphological annotation (2)
	1977
	1978	1) perform dictionary-based lemmatization
	1979	2) guess descriptions for words which have no annotation
	1980
	1981	@example
	1982	cat text | tok | lem | gue -S lem > output2
	1983	@end example
	1984
	1985	@item morphological annotation (3)
	1986
	1987	1) perform dictionary-based lemmatization
	1988	2) try to correct words with no annotation
	1989	3) perform dictionary-based lemmatization of corrected words
	1990	4) guess descriptions for words which still have no annotation
	1991
	1992	@example
	1993	cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
	1994	@end example
	1995	@item spelling correction
	1996
	1997
	1998
	1999	@example
[e28a625]	2000	cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
[25ae32e]	2001	@end example
	2002
	2003	@item Expression extraction
	2004
	2005	Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
	2006
	2007	@example
	2008	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
	2009	@end example
	2010
	2011	@item A word in context
	2012
	2013	Extraction of text fragments containing a form of the lexeme 'rozmowa' in
	2014	the context of the 5 preceding and 5 following corpus segments.
	2015
	2016	@example
	2017	cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
	2018	@end example
	2019
	2020	@item generation of concordance table (1)
	2021
	2022	@example
	2023	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
	2024	@end example
	2025
	2026	(running time: 10 seconds)
	2027
	2028	@item generation of concordance table (2)
	2029
	2030	The same as above, but much faster:
	2031
	2032	@example
	2033	cat text | tok | lem -1 | \
	2034	grp -e 'cat(<V>) space lexeme(rozmowa)' | \
	2035	ser -e 'cat(<V>) space lexeme(rozmowa)' | \
	2036	con
	2037	@end example
	2038
	2039	(running time: 2 seconds)
	2040
	2041	@item generation of concordance table (3)
	2042
	2043	Usually, one searches the same corpus repeatedly. In such a case
	2044	it is advisable to first transform the corpus data into the format
	2045	required by @command{grp}, and then use the preprocessed data.
	2046
	2047	As @command{grp} (@command{grep}) processes data faster than it can
	2048	be read from the disk drive, the search time can be shortened further
[e28a625]	2049	by compressing the corpus files. We suggest using the
	2050	@command{lzop} compressor/decompressor.
[25ae32e]	2051
	2052	@item the fastest way to search a large corpus
	2053
[e28a625]	2054	step 1: corpus preprocessing
[25ae32e]	2055
	2056	@example
	2057	cat corpus | tok | sen | lem -1 \
[e28a625]	2058	| fla | lzop -7 > corpus.grp.lzo
[25ae32e]	2059	@end example
	2060
	2061	step 2: search
	2062
	2063	@example
[e28a625]	2064	lzop -cd corpus.grp.lzo | unfla | grp -e 'cat(<V>) space lexeme(rozmowa)' | \
[25ae32e]	2065	ser -e 'cat(<V>) space lexeme(rozmowa)' | con
	2066	@end example
	2067
	2068	@end enumerate
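
When the same search is repeated with different expressions, the
step 2 pipeline above can be assembled by a small script. The Python
sketch below only builds the command string from the commands and
options shown in step 2; the function name is hypothetical and nothing
is executed.

```python
import shlex

def search_command(archive, expression):
    """Build the step 2 search pipeline as a shell command string."""
    # Quote the expression once so the shell passes it unchanged to grp and ser.
    quoted = shlex.quote(expression)
    stages = [
        "lzop -cd " + shlex.quote(archive),
        "unfla",
        "grp -e " + quoted,
        "ser -e " + quoted,
        "con",
    ]
    return " | ".join(stages)

command = search_command("corpus.grp.lzo", "cat(<V>) space lexeme(rozmowa)")
```

The resulting string can be passed to a shell. Quoting with
shlex.quote keeps characters such as parentheses and angle brackets
from being interpreted by the shell.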
	2069
[e28a625]	2070	@c @subsubheading More complicated configurations
[25ae32e]	2071
	2072
[e28a625]	2073	@c @example
	2074	@c mknod fifo1 p
	2075	@c mknod fifo2 p
	2076	@c mknod fifo3 p
	2077	@c mknod fifo4 p
	2078	@c mknod fifo5 p
	2079
	2080	@c tok | lem -p W -e fifo1 > fifo2 &
	2081	@c cor -e fifo3 < fifo1 | lem > fifo4 &
	2082	@c gue < fifo3 > fifo5 &
	2083	@c sort -m fifo2 fifo4 fifo5
	2084
	2085	@c rm fifo?
	2086	@c @end example
[25ae32e]	2087
	2088
	2089	@c ---------------------------------------------------------------------
	2090	@c ---------------------------------------------------------------------
	2091
	2092	@c ---------------------------------------------------------------------
	2093	@c PMDBF DICTIONARY
	2094	@c ---------------------------------------------------------------------
	2095
	2096	@node PMDBF dictionary
	2097	@chapter PMDBF dictionary
	2098
	2099	UTT components come with lexical data derived from the Polish
	2100	Morphological Database (PMDB).
	2101
	2102	@menu
	2103	* PMDBF files::
	2104	* PMDBF tag structure::
	2105	* PMDBF parts of speech::
	2106	* PMDBF morphosyntactic attributes::
	2107	@end menu
	2108
	2109	@node PMDBF files
	2110	@section Files
	2111
	2112	@node PMDBF tag structure
	2113	@section Tag structure
	2114
	2115	pos = [[:upper:]]+
	2116
	2117	attr = [[:upper:]]+
	2118
	2119	val = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
	2120
	2121	descr = pos ( / ( attr val + ) + ) ?
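
The grammar above can be checked mechanically. The following Python
sketch is an assumption-laden illustration, not part of UTT: it
approximates the POSIX classes with ASCII ranges, and the function
name and the sample tag in the last line are made up for the example.

```python
import re

# val: a single lower-case letter, digit or one of ?!*+-, or a <...> value
VAL = r"(?:[a-z0-9?!*+-]|<[^>\n]+>)"
VALUE = re.compile(VAL)
# attr followed by one or more values
PAIR = re.compile(r"([A-Z]+)(" + VAL + r"+)")
# descr = pos ( / ( attr val + ) + ) ?
DESCR = re.compile(r"([A-Z]+)(?:/((?:[A-Z]+" + VAL + r"+)+))?")

def parse_descr(descr):
    """Split a description into its part of speech and (attr, values) pairs."""
    match = DESCR.fullmatch(descr)
    if match is None:
        raise ValueError("not a valid description: " + descr)
    pos, attrs = match.group(1), match.group(2)
    pairs = []
    if attrs:
        for attr, vals in PAIR.findall(attrs):
            pairs.append((attr, VALUE.findall(vals)))
    return pos, pairs

parsed = parse_descr("N/CnNsGf")  # hypothetical tag
```

A description without the optional part, such as a bare part-of-speech
code, yields an empty list of pairs.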
	2122
	2123	@node PMDBF parts of speech
	2124	@section Parts of speech
	2125
	2126	@multitable {ADJPRP} { adjectival-passive-participle }
	2127	@item @code{N} @tab noun
	2128	@item @code{NPRO} @tab nominal-pronoun
	2129	@item @code{NV} @tab deverbal-noun
	2130	@item @code{V} @tab verb
	2131	@item @code{BYC} @tab byc
	2132	@item @code{VNI} @tab non-inflected-verb
	2133	@item @code{ADJ} @tab adjective
	2134	@item @code{ADJPAP} @tab adjectival-passive-participle
	2135	@item @code{ADJPRP} @tab adjectival-present-participle
	2136	@item @code{ADJPP} @tab adjectival-past-participle
	2137	@item @code{ADJPRO} @tab adjectival-pronoun
	2138	@item @code{ADJNUM} @tab adjectival-numeral
	2139	@item @code{ADV} @tab adverb
	2140	@item @code{ADVANP} @tab adverbial-anterior-participle
	2141	@item @code{ADVPRP} @tab adverbial-present-participle
	2142	@item @code{ADVPRO} @tab adverbial-pronoun
	2143	@item @code{ADVNUM} @tab adverbial-numeral
	2144	@item @code{P} @tab preposition
	2145	@item @code{PPRO} @tab prep-noun-pronoun
	2146	@item @code{CONJ} @tab conjunction
	2147	@item @code{EXCL} @tab exclamation
	2148	@item @code{APP} @tab call
	2149	@item @code{ONO} @tab onomatopoeia
	2150	@item @code{PART} @tab particle
	2151	@item @code{NUMCRD} @tab cardinal-numeral
	2152	@item @code{NUMCOL} @tab collective-numeral
	2153	@item @code{NUMPAR} @tab partitive-numeral
	2154	@item @code{NUMORD} @tab ordinal-numeral
	2155	@end multitable
	2156
	2157	@node PMDBF morphosyntactic attributes
	2158	@section Morphosyntactic attributes
	2159
	2160	@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	2161	@c @headitem Attr @tab Val @tab Description
	2162	@item
	2163	@code{A} @tab @tab Aspect
	2164	@item
	2165	@tab @code{p} @tab perfect,
	2166	@item
	2167	@tab @code{i} @tab imperfect.
	2168	@item
	2169	@item
	2170	@code{V} @tab @tab Verb-Form
	2171	@item
	2172	@tab @code{b} @tab infinitive,
	2173	@item
	2174	@tab @code{p} @tab personal,
	2175	@item
	2176	@tab @code{i} @tab impersonal.
	2177	@item
	2178	@item
	2179	@code{M} @tab @tab Mood
	2180	@item
	2181	@tab @code{d} @tab declarative,
	2182	@item
	2183	@tab @code{c} @tab conditional,
	2184	@item
	2185	@tab @code{i} @tab imperative.
	2186	@item
	2187	@item
	2188	@code{T} @tab @tab Tense
	2189	@item
	2190	@tab @code{a} @tab past,
	2191	@item
	2192	@tab @code{r} @tab present,
	2193	@item
	2194	@tab @code{f} @tab future.
	2195	@item
	2196	@item
	2197	@code{P} @tab @tab Person
	2198	@item
	2199	@tab @code{1} @tab 1,
	2200	@item
	2201	@tab @code{2} @tab 2,
	2202	@item
	2203	@tab @code{3} @tab 3.
	2204	@item
	2205	@item
	2206	@code{D} @tab @tab Degree
	2207	@item
	2208	@tab @code{p} @tab positive,
	2209	@item
	2210	@tab @code{c} @tab comparative,
	2211	@item
	2212	@tab @code{s} @tab superlative.
	2213	@item
	2214	@item
	2215	@code{N} @tab @tab Number
	2216	@item
	2217	@tab @code{s} @tab singular,
	2218	@item
	2219	@tab @code{p} @tab plural.
	2220	@item
	2221	@item
	2222	@code{C} @tab @tab Case
	2223	@item
	2224	@tab @code{n} @tab nominative,
	2225	@item
	2226	@tab @code{g} @tab genitive,
	2227	@item
	2228	@tab @code{d} @tab dative,
	2229	@item
	2230	@tab @code{a} @tab accusative,
	2231	@item
	2232	@tab @code{i} @tab instrumental,
	2233	@item
	2234	@tab @code{l} @tab locative,
	2235	@item
	2236	@tab @code{v} @tab vocative.
	2237	@item
	2238	@item
	2239	@code{G} @tab @tab Gender
	2240	@item
	2241	@tab @code{p} @tab masculine-personal,
	2242	@item
	2243	@tab @code{a} @tab masculine-animal,
	2244	@item
	2245	@tab @code{i} @tab masculine-inanimate,
	2246	@item
	2247	@tab @code{f} @tab feminine,
	2248	@item
	2249	@tab @code{n} @tab neuter.
	2250	@end multitable
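
For scripts that post-process annotations, the table above can be kept
as a lookup structure. The Python sketch below copies the attribute
and value names from the table; the names ATTRIBUTES and describe are
hypothetical and not part of UTT.

```python
# Attribute and value names copied from the table above.
ATTRIBUTES = dict(
    A=("Aspect", dict(p="perfect", i="imperfect")),
    V=("Verb-Form", dict(b="infinitive", p="personal", i="impersonal")),
    M=("Mood", dict(d="declarative", c="conditional", i="imperative")),
    T=("Tense", dict(a="past", r="present", f="future")),
    P=("Person", dict(zip("123", "123"))),
    D=("Degree", dict(p="positive", c="comparative", s="superlative")),
    N=("Number", dict(s="singular", p="plural")),
    C=("Case", dict(n="nominative", g="genitive", d="dative", a="accusative",
                    i="instrumental", l="locative", v="vocative")),
    G=("Gender", dict(p="masculine-personal", a="masculine-animal",
                      i="masculine-inanimate", f="feminine", n="neuter")),
)

def describe(attr, val):
    """Return a readable name for one attribute code and one value code."""
    name, values = ATTRIBUTES[attr]
    return name + "=" + values[val]

label = describe("C", "g")
```

Note that the same value letter means different things under different
attributes (for example @code{p} under @code{A}, @code{V}, @code{D},
@code{N} and @code{G}), so a value can only be decoded together with
its attribute.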
	2251
	2252
	2253	@c ---------------------------------------------------------------------
	2254	@c ---------------------------------------------------------------------
	2255	@c
	2256	@c @node Examples
	2257	@c @chapter Examples
	2258
	2259	@c ----------------------------------------------------------------------
	2260	@c ----------------------------------------------------------------------
	2261
	2262	@node GNU Free Documentation License
	2263	@chapter GNU Free Documentation License
	2264
	2265	@c The GNU Free Documentation License.
	2266	@center Version 1.2, November 2002
	2267
	2268	@c This file is intended to be included within another document,
	2269	@c hence no sectioning command or @node.
	2270
	2271	@display
	2272	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
	2273	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
	2274
	2275	Everyone is permitted to copy and distribute verbatim copies
	2276	of this license document, but changing it is not allowed.
	2277	@end display
	2278
	2279	@enumerate 0
	2280	@item
	2281	PREAMBLE
	2282
	2283	The purpose of this License is to make a manual, textbook, or other
	2284	functional and useful document @dfn{free} in the sense of freedom: to
	2285	assure everyone the effective freedom to copy and redistribute it,
	2286	with or without modifying it, either commercially or noncommercially.
	2287	Secondarily, this License preserves for the author and publisher a way
	2288	to get credit for their work, while not being considered responsible
	2289	for modifications made by others.
	2290
	2291	This License is a kind of ``copyleft'', which means that derivative
	2292	works of the document must themselves be free in the same sense. It
	2293	complements the GNU General Public License, which is a copyleft
	2294	license designed for free software.
	2295
	2296	We have designed this License in order to use it for manuals for free
	2297	software, because free software needs free documentation: a free
	2298	program should come with manuals providing the same freedoms that the
	2299	software does. But this License is not limited to software manuals;
	2300	it can be used for any textual work, regardless of subject matter or
	2301	whether it is published as a printed book. We recommend this License
	2302	principally for works whose purpose is instruction or reference.
	2303
	2304	@item
	2305	APPLICABILITY AND DEFINITIONS
	2306
	2307	This License applies to any manual or other work, in any medium, that
	2308	contains a notice placed by the copyright holder saying it can be
	2309	distributed under the terms of this License. Such a notice grants a
	2310	world-wide, royalty-free license, unlimited in duration, to use that
	2311	work under the conditions stated herein. The ``Document'', below,
	2312	refers to any such manual or work. Any member of the public is a
	2313	licensee, and is addressed as ``you''. You accept the license if you
	2314	copy, modify or distribute the work in a way requiring permission
	2315	under copyright law.
	2316
	2317	A ``Modified Version'' of the Document means any work containing the
	2318	Document or a portion of it, either copied verbatim, or with
	2319	modifications and/or translated into another language.
	2320
	2321	A ``Secondary Section'' is a named appendix or a front-matter section
	2322	of the Document that deals exclusively with the relationship of the
	2323	publishers or authors of the Document to the Document's overall
	2324	subject (or to related matters) and contains nothing that could fall
	2325	directly within that overall subject. (Thus, if the Document is in
	2326	part a textbook of mathematics, a Secondary Section may not explain
	2327	any mathematics.) The relationship could be a matter of historical
	2328	connection with the subject or with related matters, or of legal,
	2329	commercial, philosophical, ethical or political position regarding
	2330	them.
	2331
	2332	The ``Invariant Sections'' are certain Secondary Sections whose titles
	2333	are designated, as being those of Invariant Sections, in the notice
	2334	that says that the Document is released under this License. If a
	2335	section does not fit the above definition of Secondary then it is not
	2336	allowed to be designated as Invariant. The Document may contain zero
	2337	Invariant Sections. If the Document does not identify any Invariant
	2338	Sections then there are none.
	2339
	2340	The ``Cover Texts'' are certain short passages of text that are listed,
	2341	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
	2342	the Document is released under this License. A Front-Cover Text may
	2343	be at most 5 words, and a Back-Cover Text may be at most 25 words.
	2344
	2345	A ``Transparent'' copy of the Document means a machine-readable copy,
	2346	represented in a format whose specification is available to the
	2347	general public, that is suitable for revising the document
	2348	straightforwardly with generic text editors or (for images composed of
	2349	pixels) generic paint programs or (for drawings) some widely available
	2350	drawing editor, and that is suitable for input to text formatters or
	2351	for automatic translation to a variety of formats suitable for input
	2352	to text formatters. A copy made in an otherwise Transparent file
	2353	format whose markup, or absence of markup, has been arranged to thwart
	2354	or discourage subsequent modification by readers is not Transparent.
	2355	An image format is not Transparent if used for any substantial amount
	2356	of text. A copy that is not ``Transparent'' is called ``Opaque''.
	2357
	2358	Examples of suitable formats for Transparent copies include plain
	2359	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
	2360	format, @acronym{SGML} or @acronym{XML} using a publicly available
	2361	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
	2362	PostScript or @acronym{PDF} designed for human modification. Examples
	2363	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
	2364	@acronym{JPG}. Opaque formats include proprietary formats that can be
	2365	read and edited only by proprietary word processors, @acronym{SGML} or
	2366	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
	2367	not generally available, and the machine-generated @acronym{HTML},
	2368	PostScript or @acronym{PDF} produced by some word processors for
	2369	output purposes only.
	2370
	2371	The ``Title Page'' means, for a printed book, the title page itself,
	2372	plus such following pages as are needed to hold, legibly, the material
	2373	this License requires to appear in the title page. For works in
	2374	formats which do not have any title page as such, ``Title Page'' means
	2375	the text near the most prominent appearance of the work's title,
	2376	preceding the beginning of the body of the text.
	2377
	2378	A section ``Entitled XYZ'' means a named subunit of the Document whose
	2379	title either is precisely XYZ or contains XYZ in parentheses following
	2380	text that translates XYZ in another language. (Here XYZ stands for a
	2381	specific section name mentioned below, such as ``Acknowledgements'',
	2382	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
	2383	of such a section when you modify the Document means that it remains a
	2384	section ``Entitled XYZ'' according to this definition.
	2385
	2386	The Document may include Warranty Disclaimers next to the notice which
	2387	states that this License applies to the Document. These Warranty
	2388	Disclaimers are considered to be included by reference in this
	2389	License, but only as regards disclaiming warranties: any other
	2390	implication that these Warranty Disclaimers may have is void and has
	2391	no effect on the meaning of this License.
	2392
	2393	@item
	2394	VERBATIM COPYING
	2395
	2396	You may copy and distribute the Document in any medium, either
	2397	commercially or noncommercially, provided that this License, the
	2398	copyright notices, and the license notice saying this License applies
	2399	to the Document are reproduced in all copies, and that you add no other
	2400	conditions whatsoever to those of this License. You may not use
	2401	technical measures to obstruct or control the reading or further
	2402	copying of the copies you make or distribute. However, you may accept
	2403	compensation in exchange for copies. If you distribute a large enough
	2404	number of copies you must also follow the conditions in section 3.
	2405
	2406	You may also lend copies, under the same conditions stated above, and
	2407	you may publicly display copies.
	2408
	2409	@item
	2410	COPYING IN QUANTITY
	2411
	2412	If you publish printed copies (or copies in media that commonly have
	2413	printed covers) of the Document, numbering more than 100, and the
	2414	Document's license notice requires Cover Texts, you must enclose the
	2415	copies in covers that carry, clearly and legibly, all these Cover
	2416	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
	2417	the back cover. Both covers must also clearly and legibly identify
	2418	you as the publisher of these copies. The front cover must present
	2419	the full title with all words of the title equally prominent and
	2420	visible. You may add other material on the covers in addition.
	2421	Copying with changes limited to the covers, as long as they preserve
	2422	the title of the Document and satisfy these conditions, can be treated
	2423	as verbatim copying in other respects.
	2424
	2425	If the required texts for either cover are too voluminous to fit
	2426	legibly, you should put the first ones listed (as many as fit
	2427	reasonably) on the actual cover, and continue the rest onto adjacent
	2428	pages.
	2429
	2430	If you publish or distribute Opaque copies of the Document numbering
	2431	more than 100, you must either include a machine-readable Transparent
	2432	copy along with each Opaque copy, or state in or with each Opaque copy
	2433	a computer-network location from which the general network-using
	2434	public has access to download using public-standard network protocols
	2435	a complete Transparent copy of the Document, free of added material.
	2436	If you use the latter option, you must take reasonably prudent steps,
	2437	when you begin distribution of Opaque copies in quantity, to ensure
	2438	that this Transparent copy will remain thus accessible at the stated
	2439	location until at least one year after the last time you distribute an
	2440	Opaque copy (directly or through your agents or retailers) of that
	2441	edition to the public.
	2442
	2443	It is requested, but not required, that you contact the authors of the
	2444	Document well before redistributing any large number of copies, to give
	2445	them a chance to provide you with an updated version of the Document.
	2446
	2447	@item
	2448	MODIFICATIONS
	2449
	2450	You may copy and distribute a Modified Version of the Document under
	2451	the conditions of sections 2 and 3 above, provided that you release
	2452	the Modified Version under precisely this License, with the Modified
	2453	Version filling the role of the Document, thus licensing distribution
	2454	and modification of the Modified Version to whoever possesses a copy
	2455	of it. In addition, you must do these things in the Modified Version:
	2456
	2457	@enumerate A
	2458	@item
	2459	Use in the Title Page (and on the covers, if any) a title distinct
	2460	from that of the Document, and from those of previous versions
	2461	(which should, if there were any, be listed in the History section
	2462	of the Document). You may use the same title as a previous version
	2463	if the original publisher of that version gives permission.
	2464
	2465	@item
	2466	List on the Title Page, as authors, one or more persons or entities
	2467	responsible for authorship of the modifications in the Modified
	2468	Version, together with at least five of the principal authors of the
	2469	Document (all of its principal authors, if it has fewer than five),
	2470	unless they release you from this requirement.
	2471
	2472	@item
	2473	State on the Title page the name of the publisher of the
	2474	Modified Version, as the publisher.
	2475
	2476	@item
	2477	Preserve all the copyright notices of the Document.
	2478
	2479	@item
	2480	Add an appropriate copyright notice for your modifications
	2481	adjacent to the other copyright notices.
	2482
	2483	@item
	2484	Include, immediately after the copyright notices, a license notice
	2485	giving the public permission to use the Modified Version under the
	2486	terms of this License, in the form shown in the Addendum below.
	2487
	2488	@item
	2489	Preserve in that license notice the full lists of Invariant Sections
	2490	and required Cover Texts given in the Document's license notice.
	2491
	2492	@item
	2493	Include an unaltered copy of this License.
	2494
	2495	@item
	2496	Preserve the section Entitled ``History'', Preserve its Title, and add
	2497	to it an item stating at least the title, year, new authors, and
	2498	publisher of the Modified Version as given on the Title Page. If
	2499	there is no section Entitled ``History'' in the Document, create one
	2500	stating the title, year, authors, and publisher of the Document as
	2501	given on its Title Page, then add an item describing the Modified
	2502	Version as stated in the previous sentence.
	2503
	2504	@item
	2505	Preserve the network location, if any, given in the Document for
	2506	public access to a Transparent copy of the Document, and likewise
	2507	the network locations given in the Document for previous versions
	2508	it was based on. These may be placed in the ``History'' section.
	2509	You may omit a network location for a work that was published at
	2510	least four years before the Document itself, or if the original
	2511	publisher of the version it refers to gives permission.
	2512
	2513	@item
	2514	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
	2515	the Title of the section, and preserve in the section all the
	2516	substance and tone of each of the contributor acknowledgements and/or
	2517	dedications given therein.
	2518
	2519	@item
	2520	Preserve all the Invariant Sections of the Document,
	2521	unaltered in their text and in their titles. Section numbers
	2522	or the equivalent are not considered part of the section titles.
	2523
	2524	@item
	2525	Delete any section Entitled ``Endorsements''. Such a section
	2526	may not be included in the Modified Version.
	2527
	2528	@item
	2529	Do not retitle any existing section to be Entitled ``Endorsements'' or
	2530	to conflict in title with any Invariant Section.
	2531
	2532	@item
	2533	Preserve any Warranty Disclaimers.
	2534	@end enumerate
	2535
	2536	If the Modified Version includes new front-matter sections or
	2537	appendices that qualify as Secondary Sections and contain no material
	2538	copied from the Document, you may at your option designate some or all
	2539	of these sections as invariant. To do this, add their titles to the
	2540	list of Invariant Sections in the Modified Version's license notice.
	2541	These titles must be distinct from any other section titles.
	2542
	2543	You may add a section Entitled ``Endorsements'', provided it contains
	2544	nothing but endorsements of your Modified Version by various
	2545	parties---for example, statements of peer review or that the text has
	2546	been approved by an organization as the authoritative definition of a
	2547	standard.
	2548
	2549	You may add a passage of up to five words as a Front-Cover Text, and a
	2550	passage of up to 25 words as a Back-Cover Text, to the end of the list
	2551	of Cover Texts in the Modified Version. Only one passage of
	2552	Front-Cover Text and one of Back-Cover Text may be added by (or
	2553	through arrangements made by) any one entity. If the Document already
	2554	includes a cover text for the same cover, previously added by you or
	2555	by arrangement made by the same entity you are acting on behalf of,
	2556	you may not add another; but you may replace the old one, on explicit
	2557	permission from the previous publisher that added the old one.
	2558
	2559	The author(s) and publisher(s) of the Document do not by this License
	2560	give permission to use their names for publicity for or to assert or
	2561	imply endorsement of any Modified Version.
	2562
	2563	@item
	2564	COMBINING DOCUMENTS
	2565
	2566	You may combine the Document with other documents released under this
	2567	License, under the terms defined in section 4 above for modified
	2568	versions, provided that you include in the combination all of the
	2569	Invariant Sections of all of the original documents, unmodified, and
	2570	list them all as Invariant Sections of your combined work in its
	2571	license notice, and that you preserve all their Warranty Disclaimers.
	2572
	2573	The combined work need only contain one copy of this License, and
	2574	multiple identical Invariant Sections may be replaced with a single
	2575	copy. If there are multiple Invariant Sections with the same name but
	2576	different contents, make the title of each such section unique by
	2577	adding at the end of it, in parentheses, the name of the original
	2578	author or publisher of that section if known, or else a unique number.
	2579	Make the same adjustment to the section titles in the list of
	2580	Invariant Sections in the license notice of the combined work.
	2581
	2582	In the combination, you must combine any sections Entitled ``History''
	2583	in the various original documents, forming one section Entitled
	2584	``History''; likewise combine any sections Entitled ``Acknowledgements'',
	2585	and any sections Entitled ``Dedications''. You must delete all
	2586	sections Entitled ``Endorsements.''
	2587
	2588	@item
	2589	COLLECTIONS OF DOCUMENTS
	2590
	2591	You may make a collection consisting of the Document and other documents
	2592	released under this License, and replace the individual copies of this
	2593	License in the various documents with a single copy that is included in
	2594	the collection, provided that you follow the rules of this License for
	2595	verbatim copying of each of the documents in all other respects.
	2596
	2597	You may extract a single document from such a collection, and distribute
	2598	it individually under this License, provided you insert a copy of this
	2599	License into the extracted document, and follow this License in all
	2600	other respects regarding verbatim copying of that document.
	2601
	2602	@item
	2603	AGGREGATION WITH INDEPENDENT WORKS
	2604
	2605	A compilation of the Document or its derivatives with other separate
	2606	and independent documents or works, in or on a volume of a storage or
	2607	distribution medium, is called an ``aggregate'' if the copyright
	2608	resulting from the compilation is not used to limit the legal rights
	2609	of the compilation's users beyond what the individual works permit.
	2610	When the Document is included in an aggregate, this License does not
	2611	apply to the other works in the aggregate which are not themselves
	2612	derivative works of the Document.
	2613
	2614	If the Cover Text requirement of section 3 is applicable to these
	2615	copies of the Document, then if the Document is less than one half of
	2616	the entire aggregate, the Document's Cover Texts may be placed on
	2617	covers that bracket the Document within the aggregate, or the
	2618	electronic equivalent of covers if the Document is in electronic form.
	2619	Otherwise they must appear on printed covers that bracket the whole
	2620	aggregate.
	2621
	2622	@item
	2623	TRANSLATION
	2624
	2625	Translation is considered a kind of modification, so you may
	2626	distribute translations of the Document under the terms of section 4.
	2627	Replacing Invariant Sections with translations requires special
	2628	permission from their copyright holders, but you may include
	2629	translations of some or all Invariant Sections in addition to the
	2630	original versions of these Invariant Sections. You may include a
	2631	translation of this License, and all the license notices in the
	2632	Document, and any Warranty Disclaimers, provided that you also include
	2633	the original English version of this License and the original versions
	2634	of those notices and disclaimers. In case of a disagreement between
	2635	the translation and the original version of this License or a notice
	2636	or disclaimer, the original version will prevail.
	2637
	2638	If a section in the Document is Entitled ``Acknowledgements'',
	2639	``Dedications'', or ``History'', the requirement (section 4) to Preserve
	2640	its Title (section 1) will typically require changing the actual
	2641	title.
	2642
	2643	@item
	2644	TERMINATION
	2645
	2646	You may not copy, modify, sublicense, or distribute the Document except
	2647	as expressly provided for under this License. Any other attempt to
	2648	copy, modify, sublicense or distribute the Document is void, and will
	2649	automatically terminate your rights under this License. However,
	2650	parties who have received copies, or rights, from you under this
	2651	License will not have their licenses terminated so long as such
	2652	parties remain in full compliance.
	2653
	2654	@item
	2655	FUTURE REVISIONS OF THIS LICENSE
	2656
	2657	The Free Software Foundation may publish new, revised versions
	2658	of the GNU Free Documentation License from time to time. Such new
	2659	versions will be similar in spirit to the present version, but may
	2660	differ in detail to address new problems or concerns. See
	2661	@uref{http://www.gnu.org/copyleft/}.
	2662
	2663	Each version of the License is given a distinguishing version number.
	2664	If the Document specifies that a particular numbered version of this
	2665	License ``or any later version'' applies to it, you have the option of
	2666	following the terms and conditions either of that specified version or
	2667	of any later version that has been published (not as a draft) by the
	2668	Free Software Foundation. If the Document does not specify a version
	2669	number of this License, you may choose any version ever published (not
	2670	as a draft) by the Free Software Foundation.
	2671	@end enumerate
	2672
	2673	@page
	2674	@heading ADDENDUM: How to use this License for your documents
	2675
	2676	To use this License in a document you have written, include a copy of
	2677	the License in the document and put the following copyright and
	2678	license notices just after the title page:
	2679
	2680	@smallexample
	2681	@group
	2682	Copyright (C) @var{year} @var{your name}.
	2683	Permission is granted to copy, distribute and/or modify this document
	2684	under the terms of the GNU Free Documentation License, Version 1.2
	2685	or any later version published by the Free Software Foundation;
	2686	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
	2687	Texts. A copy of the license is included in the section entitled ``GNU
	2688	Free Documentation License''.
	2689	@end group
	2690	@end smallexample
	2691
	2692	If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
	2693	replace the ``with@dots{}Texts.'' line with this:
	2694
	2695	@smallexample
	2696	@group
	2697	with the Invariant Sections being @var{list their titles}, with
	2698	the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
	2699	being @var{list}.
	2700	@end group
	2701	@end smallexample
	2702
	2703	If you have Invariant Sections without Cover Texts, or some other
	2704	combination of the three, merge those two alternatives to suit the
	2705	situation.
	2706
	2707	If your document contains nontrivial examples of program code, we
	2708	recommend releasing these examples in parallel under your choice of
	2709	free software license, such as the GNU General Public License,
	2710	to permit their use in free software.
	2711
	2712	@c Local Variables:
	2713	@c ispell-local-pdict: "ispell-dict"
	2714	@c End:
	2715
	2716
	2717	@c ---------------------------------------------------------------------
	2718	@c ---------------------------------------------------------------------
	2719
	2720	@node Reporting bugs
	2721	@chapter Reporting bugs
	2722
	2723	Report bugs to <obrebski@@amu.edu.pl>.
	2724
	2725	@c ---------------------------------------------------------------------
	2726	@c ---------------------------------------------------------------------
	2727
	2728	@c @node Copyright
	2729	@c @chapter Copyright
	2730	@c
	2731	@c Copyright 2004 by Tomasz Obrebski
	2732	@c This software is free for research and educational use.
	2733
	2734	@c ---------------------------------------------------------------------
	2735	@c ---------------------------------------------------------------------
	2736
	2737	@node Author
	2738	@chapter Author
	2739
	2740
	2741	@bye
