utt.texinfo @ 389de9a

Last change on this file since 389de9a was 246900a, checked in by pawelk <pawelk@…>, 18 years ago

Reviewed the program sources with respect to their use of configuration files.

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@10 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

Property mode set to 100644

File size: 79.3 KB

Rev	Line
[25ae32e]	1	\input texinfo @c --texinfo--
	2	@documentencoding ISO-8859-2
	3	@c @documentlanguage pl
	4
	5	@c %**start of header
	6	@setfilename utt.info
	7	@settitle UAM Text Tools v0.90
	8	@c %**end of header
	9
	10	@copying
	11	This manual is for UAM Text Tools (version 0.90, November, 2007)
	12
	13	Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
	14
	15	Permission is granted to copy, distribute and/or modify this document
	16	under the terms of the GNU Free Documentation License, Version 1.2
	17	or any later version published by the Free Software Foundation;
	18	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
	19	Texts. A copy of the license is included in the section entitled ``GNU Free Documentation License''.
	20
	21	@c @quotation
	22	@c Permission is granted to ...
	23	@c No permission is granted until the document is completed.
	24	@c @end quotation
	25	@end copying
	26
	27
	28	@titlepage
	29	@title UAM Text Tools 0.90 - User Manual
	30	@subtitle edition 0.01, @today
	31	@subtitle status: prescript
	32	@author by Justyna Walkowska, Tomasz Obr@,{}ebski and Micha@l{} Stolarski
	33	@page
	34	@vskip 0pt plus 1filll
	35	@insertcopying
	36	@end titlepage
	37
	38	@contents
	39
	40	@c @paragraphindent none
	41
	42	@iftex
	43	@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
	44	@end iftex
	45
	46	@c @headings off
	47	@c @everyheading LEM(1) @\| @\| LEM(1)
	48	@everyfooting @today @c @\| @thispage @\|
	49
	50	@ifnottex
	51
	52	@node Top
	53	@top UTT - UAM Text Tools
	54
	55	@insertcopying
	56
	57	@menu
	58	* General information::
	59	* UTT file format::
	60	* Configuration files::
	61	* UTT components::
	62	* Auxiliary tools::
	63	* Usage examples::
	64	* PMDBF dictionary::
	65	@c * Examples::
	66	@c * Copyright::
	67	* GNU Free Documentation License::
	68	* Reporting bugs::
	69	* Author::
	70	@end menu
	71	@end ifnottex
	72
	73
	74	@c ----------------------------------------------------------------------
	75
	76	@node General information
	77	@chapter General information
	78
	79	UAM Text Tools (UTT) is a package of language processing tools
	80	developed at Adam Mickiewicz University. Its functionality includes:
	81
	82	@itemize @bullet
	83
	84	@item
	85	tokenization
	86	@item
	87	dictionary-based morphological analysis
	88	@item
	89	heuristic morphological analysis of unknown words
	90	@item
	91	spelling correction
	92	@item
	93	pattern search
	94	@item
	95	sentence splitting
	96	@item
	97	generation of concordance tables
	98	@end itemize
	99
	100	The toolkit is intended for processing raw (unannotated),
	101	unrestricted text for any purpose.
	102
	103	The system is organized as a collection of command-line programs, each
	104	performing one operation, e.g. tokenization, lemmatization, spelling
	105	correction. The components are independent of one another; the
	106	unifying element is the uniform I/O file format.
	107
	108	The components may be combined in various ways to provide various text
	109	processing services. New components supplied by the user may also be
	110	easily incorporated into the system, provided that they respect the I/O
	111	file format conventions.
	112
	113	UTT component programs do not depend on any specific tagset or
	114	morphological description format.
	115
	116	UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
	117	the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
	118
	119	The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use.
	120
	121
	122	List of contributors:
	123
	124	@itemize
	125	@item Pawel Konieczka
	126	@item Tomasz Obrebski
	127	@item Michal Stolarski
	128	@item Marcin Walas
	129	@item Justyna Walkowska
	130	@end itemize
	131
	132	@c ----------------------------------------------------------------------
	133	@c ---------------------------------------------------------------------
	134
	135	@node UTT file format
	136	@chapter UTT file format
	137
	138	A UTT file contains annotation of a text. It consists of a sequence of
	139	segments. Each segment explicitly refers to a continuous piece of the
	140	text and provides some information on it.
	141
	142	@section Segment format
	143
	144	A segment occupies one line of a UTT file and consists of
	145	space-separated fields:
	146
	147
	148	@quotation
	149	@sp 1
	150	[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
	151	@sp 1
	152	@end quotation
	153
	154	@table @var
	155
	156	@item @var{start}
	157	Non-negative integer value indicating the position in the source text where the
	158	segment starts.
	159
	160	@item @var{length}
	161	Non-negative integer value indicating the length of the segment.
	162
	163	@item @var{type}
	164	A sequence of non-blank characters that cannot be read as a number (a purely numeric value could lead to @var{type} being misinterpreted as a @var{start} or @var{length} field).
	165	@var{type} reflects the main classification of segments
	166	into words, numbers, punctuation marks and meta-text markers.
	167	@xref{tok output,,tok output}, for a description of the automatically recognized type markers.
	168
	169	@item @var{form}
	170	This field contains the textual form of the segment or the special
	171	symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and has length 0).
	172
	173	The characters or character sequences that have special meaning in the
	174	@var{form} field are enumerated below.
	175
	176	Characters with special meaning:
	177
	178	@itemize
	179	@item @code{_} - space character
	180	@item @code{*} - undefined contents
	181	@end itemize
	182
	183	Escape sequences:
	184
	185	@itemize
	186	@item @code{\n} - new line
	187	@item @code{\t} - tabulation
	188	@item @code{\r} - carriage return
	189
	190	@item @code{\_} - the @code{_} character
	191	@item @code{\*} - the @code{*} character
	192	@item @code{\\} - the @code{\} character
	193
	194	@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
	195	@end itemize
	196
	197	@item @var{annotation1}
	198	@item @var{annotation2}
	199	@item ...
	200	Annotation fields have the following format:
	201
	202	@var{longname} @code{:} @var{value}
	203
	204	or
	205
	206	@var{shortname} @var{value}
	207
	208	where @var{longname} is a string of alphanumeric characters
	209	(@code{isalnum()} test), @var{shortname} is a single punctuation character
	210	(@code{ispunct()} test), and @var{value} is an arbitrary string of non-blank characters.
	211
	212	@end table
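
The special characters and escape sequences of the @var{form} field can be decoded mechanically. The following Python sketch is only an illustration and not part of UTT: the function name is hypothetical, it assumes that a backslash followed by an asterisk denotes a literal asterisk, and it keeps unknown escape sequences unchanged.

```python
# Sketch only: decode the special characters and escape sequences of the
# UTT form field. Names and the treatment of unknown escapes are assumptions.

ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "_": "_", "*": "*", "\\": "\\"}


def decode_form(form):
    """Return the text denoted by a form field, or None for a bare '*'."""
    if form == "*":
        return None  # the form is not given
    out = []
    i = 0
    while i < len(form):
        ch = form[i]
        if ch == "\\" and i + 1 < len(form):
            nxt = form[i + 1]
            # Unknown escapes are kept verbatim (assumption of this sketch).
            out.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        elif ch == "_":
            out.append(" ")  # an unescaped underscore stands for a space
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this sketch, a form consisting of a single underscore decodes to one space, while a backslash followed by an underscore decodes to a literal underscore.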
	213
	214
	215	Only two fields are mandatory: @var{type} and @var{form}. All other fields
	216	may be absent. When only one number precedes the
	217	@var{type} field, it is interpreted as the @var{start} position.
	218
	219	If the @var{length} field is omitted, the length of the segment is the
	220	length of the @var{form} field, except when the value of the
	221	@var{form} field is @code{*} -- in this case, the length is assumed to
	222	be 0.
	223
	224	If the @var{start} field is also absent, the segment is assumed to directly
	225	follow the preceding one.
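
The defaulting rules above can be illustrated with a short Python sketch. It is not the parser used by the UTT programs: the function name and the returned dictionary are hypothetical, and it assumes that an escape sequence in the @var{form} field counts as one character of text.

```python
import re

# Sketch only: split a UTT segment line into fields and fill in omitted
# start/length values. All names here are hypothetical.


def parse_segment(line, prev_end=0):
    """Parse one segment line; prev_end is where the preceding segment ended."""
    fields = line.split()
    numbers = []
    # Up to two leading all-digit fields are read as start and length.
    while len(numbers) < 2 and fields and fields[0].isdigit():
        numbers.append(int(fields.pop(0)))
    if len(fields) < 2:
        raise ValueError("type and form are mandatory: %r" % line)
    seg_type, form, annotations = fields[0], fields[1], fields[2:]
    # A missing start means the segment directly follows the preceding one.
    start = numbers[0] if numbers else prev_end
    if len(numbers) == 2:
        length = numbers[1]
    elif form == "*":
        length = 0
    else:
        # Count each escape sequence (backslash + character) as one character.
        length = len(re.sub(r"\\.", "x", form))
    return {"start": start, "length": length, "type": seg_type,
            "form": form, "annotations": annotations}
```

A caller would pass @code{start + length} of the previous segment as @code{prev_end} when reading a file line by line.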
	226
	227	@c Conventions:
	228
	229	@c Annotation fields with predefined meaning:
	230
	231	@c @itemize
	232	@c @item @code{!} - UTT components are allowed to modify the contents of
	233	@c the @var{form} field (e.g. spelling correction does this). If this happens the
	234	@c original form of the segment have to be placed in the @code{!}-field.
	235	@c @item @code{@@} - morphological description
	236	@c @item @code{=} - node identifier assignment (used in graph encoding)
	237	@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
	238	@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
	239	@c @end itemize
	240
	241	Segments of length 0 may be used to mark file positions with some
	242	information. See e.g. BOS and EOS (beginning/end of sentence) markers
	243	in the example below.
	244
	245	Example:
	246
	247	sentence: @samp{Piszemy dobre progrumy.}
	248
	249	@example
	250	0000 00 BOS *
	251	0000 07 W Piszemy lem:pisać,V
	252	0007 01 S _
	253	0008 05 W dobre lem:dobry,ADJ
	254	0013 01 S _
	255	0014 08 W progrumy cor:programy lem:program,N
	256	0022 01 P .
	257	0023 00 EOS *
	258	0023 01 S _
	259	0024 00 BOS *
	260	0024 11 W Warszawiacy lem:Warszawiak,N
	261	0035 01 S _
	262	0036 03 W też
	263	0039 01 P .
	264	0040 00 EOS *
	265
	266	@end example
	267
	268	@example
	269	0000 BOS *
	270	0000 W Piszemy lem:pisać,V
	271	0007 S _
	272	0008 W dobre lem:dobry,ADJ
	273	0013 S _
	274	0014 W progrumy cor:programy lem:program,N
	275	0022 P .
	276	0023 EOS *
	277	@end example
	278
	279	Position information may be provided only for some types of segments:
	280
	281	@example
	282	0000 BOS *
	283	W Piszemy lem:pisać,V
	284	S _
	285	W dobre lem:dobry,ADJ
	286	S _
	287	W progrumy cor:programy lem:program,N
	288	P .
	289	EOS *
	290	S _
	291	0024 BOS *
	292	W Warszawiacy lem:Warszawiak,N
	293	S _
	294	W też
	295	P .
	296	EOS *
	297	@end example
	298
	299	Position/length information may be provided only when necessary:
	300
	301	@example
	302	0000 04 N *
	303	0000 N 12
	304	P .
	305	N 5
	306	S _
	307	W km
	308	@end example
	309
	310	@section UTT File
	311
	312	A UTT file consists of a sequence of segments. The same text position
	313	may be covered by multiple segments. In consequence, ambiguous text
	314	segmentation and ambiguous annotation may be represented.
	315
	316	There are two structural requirements a valid UTT-formatted file
	317	has to meet:
	318
	319	@itemize @bullet
	320
	321	@item
	322	segments have to be sorted with respect to the @var{start} field,
	323
	324	@item
	325	for each
	326	segment ending at position @var{n}, either there must be a segment starting at
	327	position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
	328	for each segment starting at position @var{n}, either there must be a segment
	329	ending at position @var{n-1}, or the position @var{n-1} must not be covered
	330	by any segment.
	331
	332	@end itemize
	333
	334	A valid annotation for the text fragment
	335	@example
	336	12.5 km
	337	@end example
	338
	339	may be
	340
	341	@example
	342	0000 02 N 12
	343	0000 04 N 12.5
	344	0002 01 P .
	345	0003 01 N 5
	346	0004 01 S _
	347	0005 02 W km
	348	@end example
	349
	350	but not
	351
	352	@example
	353	0000 02 N 12
	354	0000 04 N 12.5
	355	0004 01 S _
	356	0005 02 W km
	357	@end example
	358
	359	because in the latter example the first segment (starting at position 0000, 2 characters long) ends at position @var{n}=0001, and position @var{n+1}=0002 is covered by the second segment although no segment starts there.
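
The two structural requirements can be checked mechanically. The Python sketch below is an illustration only, not a UTT tool: the function name is hypothetical, segments are given as pairs of start and length, and zero-length segments are assumed to be pure markers that do not take part in the coverage check.

```python
# Sketch only: check the two structural requirements on a list of
# (start, length) pairs given in file order. Names are hypothetical.


def check_utt_structure(segments):
    """Return a list of error messages; an empty list means the file is valid."""
    errors = []
    order = [start for start, _ in segments]
    if order != sorted(order):
        errors.append("segments are not sorted by start position")

    # Inclusive (first, last) positions; zero-length markers are skipped.
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}

    def covered(pos):
        return any(s <= pos <= e for s, e in spans)

    for s, e in spans:
        if covered(e + 1) and (e + 1) not in starts:
            errors.append("no segment starts at %d after one ends at %d"
                          % (e + 1, e))
        if covered(s - 1) and (s - 1) not in ends:
            errors.append("no segment ends at %d before one starts at %d"
                          % (s - 1, s))
    return errors
```

Applied to the two annotations of @samp{12.5 km} shown above, the sketch accepts the first one and reports a single problem at position 2 for the second.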
	360
	361	@section Character encoding
	362
	363	The UTT component programs accept only single-byte character encodings, such
	364	as the ISO 8859 family or the Windows (ANSI) and DOS code pages. UTF-8 input has not been tested yet.
	365
	366
	367	@c @section Formats
	368
	369	@c @unnumberedsubsubsec Basic format
	370
	371	@c While processing large amounts of the overhead related with explicit
	372	@c ... of the start position and segment length becomes ... . Therefore,
	373	@c for efficiency reasons certain shortcuts are possible:
	374
	375	@c @unnumberedsubsubsec Relative start position
	376
	377	@c Start position may be given as relative distance from the last
	378	@c absolut position.
	379
	380	@c @unnumberedsubsubsec Absent length
	381
	382	@c Segment length may by omitted. Normally it can be restored by counting
	383	@c the length of the @emph{form field}. For segments with the special value
	384	@c @code{*} in the @emph{form field} length 0 is assumed.
	385
	386	@c @unnumberedsubsubsec Absent length and start position
	387
	388	@c Both start position and segment length may be omitted. In this format
	389	@c each segment is assumed to follow the previous one. This format is,
	390	@c therefore, suitable only for unambiguously tagged text
	391	@c (0-length markers can be still used.)
	392
	393
	394	@c @table @code
	395	@c @item AL
	396	@c @code{1234 03 W kot}
	397	@c @item RL
	398	@c @code{+56 03 W kot}
	399	@c @item A
	400	@c @code{1234 W kot}
	401	@c @item R
	402	@c @code{+56 W kot}
	403	@c @item 0
	404	@c @code{W kot}
	405	@c @end table
	406
	407
	408	@c [HOW TO GET POLISH FONTS IN DVI???]
	409
	410	@macro parhelp
	411	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	412	Print help.
	413	@end macro
	414
	415
	416	@macro parversion
	417	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	418	Print version information.
	419	@end macro
	420
	421	@macro parinteractive
	422	@item @b{@minus{}@minus{}interactive, @minus{}i}
	423	This option toggles interactive mode, which is by default off. In the
	424	interactive mode the program does not buffer the output.
	425	@end macro
	426
	427
	428	@c @macro parfile
	429	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	430	@c Input file name.
	431	@c If this option is absent or equal to '@minus{}', the program
	432	@c reads from the standard input.
	433	@c @end macro
	434
	435
	436	@c @macro paroutput
	437	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	438	@c Regular output file name. To regular output the program sends segments
	439	@c which it successfully processed and copies those which were not
	440	@c subject to processing. If this option is absent or equal to
	441	@c '@minus{}', standard output is used.
	442	@c @end macro
	443
	444	@c @macro parfail
	445	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
	446	@c Fail output file name. To fail output the program copies the segments
	447	@c it failed to process. If this option is absent or equal to
	448	@c '@minus{}', standard output is used.
	449	@c @end macro
	450
	451
	452	@c @macro parcopy
	453	@c @item @b{@minus{}@minus{}copy, @minus{}c}
	454	@c Copy succesfully processed segments to regular output also in their
	455	@c original input form.
	456	@c @end macro
	457
	458
	459	@macro parinputfield
	460	@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	461	The field containing the input to the program. The default is the
	462	@var{form} field. The fields @var{position}, @var{length}, @var{type},
	463	and @var{form} are referred to as @code{1}, @code{2}, @code{3},
	464	@code{4}, respectively.
	465	@end macro
	466
	467
	468	@macro paroutputfield
	469	@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	470	The name of the field added by the program. The default is the name of the program.
	471	@end macro
	472
	473
	474	@macro pardictionary
	475	@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
	476	Dictionary file name.
	477	@end macro
	478
	479
	480	@macro parprocess
	481	@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
	482	Process segments with the specified value in the @var{type} field.
	483	Multiple occurrences of this option are allowed and are interpreted as
	484	disjunction. If this option is absent, all segments are processed.
	485	@end macro
	486
	487
	488	@macro parselect
	489	@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
	490	Select for processing only segments in which the field named
	491	@var{fieldname} is present. Multiple occurrences of this option are
	492	allowed and are interpreted as conjunction of conditions. If this
	493	option is absent, all segments are processed.
	494	@end macro
	495
	496
	497	@macro parunselect
	498	@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
	499	Select for processing only segments in which the field @var{fieldname}
	500	is absent. Multiple occurrences of this option are allowed and are
	501	interpreted as conjunction of conditions. If this option is absent,
	502	all segments are processed.
	503	@end macro
	504
	505
	506	@macro paroneline
	507	@item @b{@minus{}@minus{}one-line}
	508	This option makes the program print ambiguous annotation in one output
	509	line by generating multiple annotation fields. By default, when
	510	ambiguous annotation is produced for a segment, the segment is
	511	duplicated and each of the annotations is added to a separate copy of
	512	the segment.
	513	@end macro
	514
	515
	516	@macro paronefield
	517	@item @b{@minus{}@minus{}one-field, @minus{}1}
	518	This option makes the program print ambiguous annotation in one
	519	annotation field. By default, when ambiguous annotation is produced
	520	for a segment, the segment is duplicated and each of the
	521	annotations is added to a separate copy of the segment.
	522
	523	This option is useful when working with @command{kot} or @command{con}.
	524	@end macro
	525
	526
	527	@c ---------------------------------------------------------------------
	528	@c ---------------------------------------------------------------------
	529
	530	@c @node Common command line options
	531	@c @chapter Common command line options
	532
	533	@c @table @code
	534
	535	@c @parhelp
	536
	537	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
	538	@c Print help.
	539
	540	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	541	@c Print version information.
	542
	543	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	544	@c Input file name.
	545	@c If this option is absent or equal to '@minus{}', the program
	546	@c reads from the standard input.
	547
	548	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	549	@c Regular output file name. To regular output the program sends segments
	550	@c which it successfully processed and copies those which were not
	551	@c subject to processing. If this option is absent or equal to
	552	@c '@minus{}', standard output is used.
	553
	554	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
	555	@c Fail output file name. To fail output the program copies the segments
	556	@c it failed to process. If this option is absent or equal to
	557	@c '@minus{}', standard output is used.
	558
	559	@c @item @b{@minus{}@minus{}only-fail}
	560	@c Discard segments which would normally be sent to regular
	561	@c output. Print only segments the program failed to process.
	562
	563	@c @item @b{@minus{}@minus{}no-fail}
	564	@c Discard segments the program failed to process.
	565	@c (This and the previous option are functionally equivalent to,
	566	@c respectively, @option{-o /dev/null} and @option{-e /dev/null}, but
	567	@c make the programs run faster.)
	568
	569	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	570	@c The field containing the input to the program. The default is usually
	571	@c the @var{form} field (unless otherwise stated in the program
	572	@c description). The fields @var{position}, @var{length}, @var{tag}, and
	573	@c @var{form} are referred to as @code{1}, @code{2}, @code{3}, @code{4},
	574	@c respectively.
	575
	576	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	577	@c The name of the field added by the program. The default is the name of
	578	@c the program.
	579
	580	@c @c @item @b{@minus{}@minus{}copy, @minus{}c}
	581	@c @c Copy processed segments to regular output.
	582
	583	@c @item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
	584	@c Dictionary file name.
	585	@c (This option is used by programs which use dictionary data.)
	586
	587	@c @item @b{@minus{}@minus{}process=@var{tag}, @minus{}p @var{tag}}
	588	@c Process segments with the specified value in the @var{tag} field.
	589	@c Multiple occurences of this option are allowed and are interpreted as
	590	@c disjunction. If this option is absent, all segments are processed.
	591
	592	@c @item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
	593	@c Select for processing only segments in which the field named
	594	@c @var{fieldname} is present. Multiple occurences of this option are
	595	@c allowed and are interpreted as conjunction of conditions. If this
	596	@c option is absent, all segments are processed.
	597
	598	@c @item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
	599	@c Select for processing only segments in which the field @var{fieldname}
	600	@c is absent. Multiple occurences of this option are allowed and are
	601	@c interpreted as conjunction of conditions. If this option is absent,
	602	@c all segments are processed.
	603
	604	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	605	@c This option toggles interactive mode, which is by default off. In the
	606	@c interactive mode the program does not buffer the output.
	607
	608	@c @item @b{@minus{}@minus{}config=@var{filename}}
	609	@c Read configuration from file @file{@var{filename}}.
	610
	611	@c @item @b{@minus{}@minus{}one @minus{}1}
	612	@c This option makes the program print ambiguous annotation in one output
	613	@c segment. By default when
	614	@c ambiguous new annotation is being produced for a segment, the segment
	615	@c is multiplicated and each of the annotations is added to separate copy
	616	@c of the segment.
	617
	618	@c @end table
	619
	620	@c ---------------------------------------------------------------------
	621	@c CONFIGURATION FILES
	622	@c ---------------------------------------------------------------------
	623
	624	@node Configuration files
	625	@chapter Configuration files
	626
	627	Values for all command line options accepted by a component
	628	may be set in configuration files. The default locations of the
	629	configuration files for a component named @command{@var{program}} are
	630
	631	@example
[246900a]	632	@file{/usr/local/etc/utt/@var{program}.conf}
[25ae32e]	633	@end example
	634
	635	for the system-wide configuration file and
	636
	637	@example
[246900a]	638	@file{~/.utt/@var{program}.conf}
[25ae32e]	639	@end example
	640
	641	for the user configuration file.
	642
	643	@c The configuration file to load may be also specified with the
	644	@c @option{--config} option. Configuration file need not be provided.
	645
	646	For each option, the value is set according to the following priority:
	647
	648	@itemize
	649	@item command line
	650	@c @item configuration file indicated with @option{--config} option
	651	@item user configuration file (or configuration file indicated with the @option{--config} option)
	652	@item system-wide configuration file
	653	@end itemize
	654
	655	Parameter values are specified in the following format:
	656
	657	@var{parametername}=@var{value}
	658
	659	where @var{parametername} is the short or long name of an option accepted by
	660	the program, or
	661
	662	@var{parametername}
	663
	664	if the option does not need arguments.
	665
	666	Comments may be added to configuration files using the @code{#} sign.
	667
	668	If a program accepts multiple occurrences of an option (e.g. the @option{--select} option of @command{lem}), each occurrence can be given on a separate line of the program's configuration file.
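
The file format and the priority order described in this chapter can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used by the UTT programs: the function names are hypothetical, everything after a @code{#} sign is treated as a comment, and a source with higher priority is assumed to replace all values an option has in a lower-priority source.

```python
# Sketch only: read "name=value" / "name" lines and merge three sources
# by priority. Function names and the merge policy are assumptions.


def parse_conf(text):
    """Parse configuration text into {name: [values]}; bare names get [True]."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumes '#' never occurs in values
        if not line:
            continue
        name, sep, value = line.partition("=")
        options.setdefault(name.strip(), []).append(
            value.strip() if sep else True)
    return options


def effective_options(system, user, command_line):
    """Merge option dictionaries; later arguments take priority per option."""
    merged = {}
    for source in (system, user, command_line):
        merged.update(source)
    return merged
```

An option repeated on several lines, such as @code{select}, ends up as a list with one value per line.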
	669
	670	@c The equal sign may be omitted.
	671
	672
	673	@quotation Tip
	674	If you have two (or more) frequently used sets of options for the same
	675	program (e.g. @command{lem} with the PMDBF dictionary and @command{lem} with a user dictionary),
	676	a good solution is to create two soft links to @command{lem}, called
	677	e.g. @file{lemg} and @file{lemu}, and to specify their configuration in the files @file{lemg.conf}
	678	and @file{lemu.conf}, respectively.
	679	@end quotation
	680
	681	@c ---------------------------------------------------------------------
	682	@c COMPONENTS
	683	@c ---------------------------------------------------------------------
	684
	685	@node UTT components
	686	@chapter UTT components
	687
	688	UTT components are of three types:
	689
	690	@menu
	691	Sources: programs which read non-UTT data (e.g. raw text) and produce output
	692	in UTT format
	693	* tok:: a tokenizer
	694
	695	Filters: programs which read and produce UTT-formatted data
	696	@c * sen - the sentencizer::
	697	* lem:: a morphological analyzer
	698	* gue:: a morphological guesser
	699	* cor:: a spelling corrector
	700	* sen:: a sentencizer
	701	@c * gph - the graphizer::
	702	* ser:: a pattern search tool (marks matches)
	703	* grp:: a pattern search tool (selects sentences containing a match)
	704
	705	Sinks: programs which read UTT data and produce output in another format
	706	* kot:: an untokenizer
	707	* con:: a concordance table generator
	708	@end menu
	709
	710	@c ---------------------------------------------------------------------
	711	@c TOK
	712	@c ---------------------------------------------------------------------
	713
	714	@page
	715	@node tok
	716	@section tok - a tokenizer
	717
	718	@c ----------------------------------------
	719
	720	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	721	@item @strong{Authors:} @tab Tomasz Obrębski
	722	@item @strong{Component category:} @tab source
	723	@end multitable
	724
	725
	726	@menu
	727	* tok description::
	728	* tok input::
	729	* tok output::
	730	* tok command line options::
	731	* tok example::
	732	@end menu
	733
	734	@node tok description
	735	@subsection Description
	736
	737	@code{tok} is a simple program which reads a text file and identifies
	738	tokens on the basis of their orthographic form. The type of the token
	739	is printed as the @var{type} field.
	740
	741	@node tok input
	742	@subsection Input
	743
	744	Raw text.
	745
	746	@node tok output
	747	@subsection Output
	748
	749	A UTT file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field, five types of tokens are distinguished:
	750
	751	@itemize
	752
	753	@item @code{W}
	754	(word)
	755	- continuous sequence of letters
	756
	757	@item @code{N}
	758	(number)
	759	- continuous sequence of digits
	760
	761	@item @code{S}
	762	(space)
	763	- continuous sequence of space characters
	764
	765	@item @code{P}
	766	(punctuation mark)
	767	- single printable character not belonging to any of the other classes
	768
	769	@item @code{B}
	770	(unprintable character)
	771	- single unprintable character
	772
	773	@end itemize
	774
	775
	776
	777	@node tok command line options
	778	@subsection Command line options
	779
	780	@table @code
	781
	782	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	783	Print help.
	784
	785	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	786	Print version information.
	787
	788	@item @b{@minus{}@minus{}interactive, @minus{}i}
	789	This option toggles interactive mode, which is by default off. In the
	790	interactive mode the program does not buffer the output.
	791
	792	@end table
	793
	794	@node tok example
	795	@subsection Example
	796
	797	Input:
	798
	799	@example
	800	Piszemy dobre programy.
	801	@end example
	802
	803	Output:
	804
	805	@example
	806	0000 07 W Piszemy
	807	0007 01 S _
	808	0008 05 W dobre
	809	0013 01 S _
	810	0014 08 W programy
	811	0022 01 P .
	812	0023 01 S \n
	813	@end example
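
The classification of tokens can be approximated in a few lines of Python. The sketch below is not the implementation of @command{tok}: the names are hypothetical, character classes follow Python's notion of letters, digits and white space, and the real program may differ, for instance in how it treats underscores or control characters.

```python
import re

# Sketch only: a toy tokenizer that imitates the W/N/S/P/B classification.
# W: letters, N: digits, S: white space, X: any other single character.
TOKEN = re.compile(r"(?P<W>[^\W\d_]+)|(?P<N>\d+)|(?P<S>\s+)|(?P<X>.)",
                   re.DOTALL)

FORM_ESCAPES = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
                "_": "\\_", "*": "\\*", "\\": "\\\\"}


def encode_form(text):
    """Encode raw text as a UTT form field."""
    return "".join(FORM_ESCAPES.get(ch, ch) for ch in text)


def tokenize(text):
    """Return UTT segment lines with start, length, type and form fields."""
    lines = []
    for m in TOKEN.finditer(text):
        kind = m.lastgroup
        if kind == "X":
            # Printable leftovers are punctuation, the rest is unprintable.
            kind = "P" if m.group().isprintable() else "B"
        lines.append("%04d %02d %s %s"
                     % (m.start(), len(m.group()), kind,
                        encode_form(m.group())))
    return lines
```

Positions are counted in characters, which coincides with byte offsets for the single-byte encodings the UTT programs expect.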
	814
	815
	816	@c ---------------------------------------------------------------------
	817	@c SEN
	818	@c ---------------------------------------------------------------------
	819
	820	@c @node sen - sentencizer
	821	@c @chapter sen - sentencizer
	822
	823	@c Authors: Tomasz Obrębski
	824
	825	@c ---------------------------------------------------------------------
	826	@c LEM
	827	@c ---------------------------------------------------------------------
	828
	829	@page
	830	@node lem
	831	@section lem - morphological analyzer
	832
	833	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	834	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
	835	@item @strong{Component category:} @tab filter
	836	@end multitable
	837
	838	@menu
	839	* lem description::
	840	* lem command line options::
	841	* lem input::
	842	* lem output::
	843	* lem example::
	844	* lem dictionaries::
	845	* lem hints::
	846	@end menu
	847
	848	@node lem description
	849	@subsection Description
	850
	851	@command{lem} performs morphological analysis of a simple orthographic
	852	word, returning all its possible morphological annotations,
	853	disregarding the context.
	854
	855	@c ----------------------------------------
	856
	857	@node lem command line options
	858	@subsection Command line options
	859
	860	@table @code
	861	@parhelp
	862	@parversion
	863	@parinteractive
	864	@c @parfile
	865	@c @paroutput
	866	@c @parfail
	867	@c @parcopy
	868	@parinputfield
	869	@paroutputfield
	870	@pardictionary
	871	@parprocess
	872	@parselect
	873	@parunselect
	874	@paroneline
	875	@paronefield
	876	@end table
	877
	878	@c ----------------------------------------
	879
	880	@node lem input
	881	@subsection Input
	882
	883	@command{lem} reads a UTT file and processes the value of the @var{form} field
	884	(the input field may be changed with the @option{--input-field} option).
	885
	886	@node lem output
	887	@subsection Output
	888
	889	@command{lem} adds a new annotation field, whose default name is @code{lem}. In
	890	case of ambiguity, either the segment is duplicated (default),
	891	multiple @code{lem} fields are added (@option{--one-line} option), or the ambiguous
	892	annotation is produced as the value of a single @code{lem} field
	893	(@option{--one-field}, @option{-1} option):
	894
	895	@itemize @bullet
	896
	897	@item
	898	unambiguous value format:
	899
	900	@example
	901	<lemma>,<descr>
	902	@end example
	903
	904	@item
	905	ambiguous value format (@option{--one-field} option)
	906
	907
	908	@example
	909	<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
	910	@end example
	911
	912	(alternative descriptions for the same lemma are separated by commas,
	913	alternative lemmata are separated by semicolons.)
	914
	915	@end itemize
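
A program that consumes the output of @command{lem} has to take such values apart again. The following Python sketch shows one way to do so; the function name is hypothetical, and the sketch assumes that neither lemmata nor descriptions contain commas or semicolons.

```python
# Sketch only: split the value of a lem field into (lemma, description)
# pairs. The function name is hypothetical.


def parse_lem_value(value):
    """Return a list of (lemma, description) pairs from a lem field value."""
    readings = []
    for alternative in value.split(";"):      # alternative lemmata
        lemma, *descriptions = alternative.split(",")
        for descr in descriptions:            # descriptions of one lemma
            readings.append((lemma, descr))
    return readings
```

The same function handles the unambiguous format, which is simply the case of one lemma with one description.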
	916
	917	@node lem example
	918	@subsection Example
	919
	920	Input:
	921
	922	@example
	923	0000 07 W Piszemy
	924	0007 01 S _
	925	0008 05 W dobre
	926	0013 01 S _
	927	0014 08 W programy
	928	0022 01 P .
	929	0023 01 S \n
	930	@end example
	931
	932	Output (default):
	933
	934	@example
	935	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
	936	0007 01 S _
	937	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
	938	0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
	939	0013 01 S _
	940	0014 08 W programy lem:program,N/GiNpCa
	941	0014 08 W programy lem:program,N/GiNpCn
	942	0014 08 W programy lem:program,N/GiNpCv
	943	0022 01 P .
	944	0023 01 S \n
	945	@end example
	946
	947	Output (@option{--one-line} option):
	948
	949	@example
	950	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
	951	0007 01 S _
	952	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
	953	0013 01 S _
	954	0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
	955	0022 01 P .
	956	0023 01 S \n
	957	@end example
	958
	959	Output (@option{--one-field} option):
	960
	961	@example
	962	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
	963	0007 01 S _
	964	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
	965	0013 01 S _
	966	0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
	967	0022 01 P .
	968	0023 01 S \n
	969	@end example
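
The relation between the default output and the @option{--one-field} output can be made explicit with a small Python sketch that folds duplicated segments back into one line. It is an illustration only and not part of @command{lem}: the function name is hypothetical, every line is assumed to carry all four leading fields, and every @code{lem} value is assumed to contain a comma.

```python
from itertools import groupby

# Sketch only: merge consecutive copies of a segment (default lem output)
# into a single segment with one ambiguous annotation field.


def to_one_field(lines, field="lem"):
    """Fold duplicated segments into the one-field representation."""
    prefix = field + ":"
    merged = []
    # Copies of a segment share start, length, type and form.
    for key, group in groupby(lines, key=lambda l: tuple(l.split()[:4])):
        by_lemma = {}   # lemma -> list of descriptions, in first-seen order
        others = []     # annotations of other programs, kept once
        for line in group:
            for ann in line.split()[4:]:
                if ann.startswith(prefix):
                    lemma, descr = ann[len(prefix):].split(",", 1)
                    by_lemma.setdefault(lemma, []).append(descr)
                elif ann not in others:
                    others.append(ann)
        fields = list(key) + others
        if by_lemma:
            fields.append(prefix + ";".join(
                ",".join([lemma] + descrs)
                for lemma, descrs in by_lemma.items()))
        merged.append(" ".join(fields))
    return merged
```

Segments without a @code{lem} annotation, such as spaces and punctuation marks, pass through unchanged.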
	970
	971	@c ----------------------------------------
	972
	973	@node lem dictionaries
	974	@subsection Dictionaries
	975
	976	@command{lem} requires a dictionary. The dictionary may be provided in
	977	one of two formats: in text (source) format or in binary (fsa) format.
	978
	979	@subsubheading Text format
	980
	981	Dictionary entries have the following structure:
	982
	983	@example
	984	<form>;<lemma>,<descr>[;<lemma>,<descr>]
	985	@end example
	986
	987	@var{lemma} may be given explicitly or in the cut-add format:
	988
	989	@example
	990	@code{[<cut1><add1>-]<cut2><add2>}
	991	@end example
	992
meaning: replace the prefix of length @code{<cut1>} with the
string @code{<add1>}, and replace the suffix of length @code{<cut2>} with the string
@code{<add2>}. For example, @code{3t} transforms @samp{kocie} into
@samp{kot}, and @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
	997
Each dictionary entry must be written on a single line and must not contain blank characters.
	999
	1000	Examples:
	1001	@example
	1002	kot;0,N/GaNsCn
	1003	kota;1,N/GaNsCg;1,N/GaNsCa
	1004	kotu;1,N/GaNsCd
	1005	kotem;2,N/GaNsCi
	1006	kocie;3t,N/GaNsCl;3t,N/GaNsCv
najbielsi;3-4ały,ADJ/DsNpCnGp
najbielsze;3-5ały,ADJ/DsNpCnGaifn
	1009	najlepsi;dobry,ADJ/DsNpCnGp
	1010	najlepsze;dobry,ADJ/DsNpCnGaifn
	1011	@end example
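
The way an entry in text format is decoded can be sketched in Python. This is an illustration only, not the code used by @command{lem}; the names @code{apply_cut_add}, @code{parse_entry} and @code{lemmas} are hypothetical. The sketch assumes that a lemma starting with a digit is a cut-add code and that the added strings contain no digits.

```python
import re

# Optional "<cut1><add1>-" prefix part, then the "<cut2><add2>" suffix part.
CUT_ADD = re.compile(r"^(?:(\d+)([^-\d]*)-)?(\d+)(\D*)$")

def apply_cut_add(form, code):
    """Return the lemma of `form` for an explicit lemma or a cut-add code."""
    m = CUT_ADD.match(code)
    if m is None:
        return code                          # explicit lemma, e.g. "dobry"
    cut1, add1, cut2, add2 = m.groups()
    if cut1 is not None:                     # replace the prefix first
        form = add1 + form[int(cut1):]
    return form[:len(form) - int(cut2)] + add2

def parse_entry(line):
    """Split '<form>;<lemma>,<descr>[;<lemma>,<descr>]' into its parts."""
    form, *alternatives = line.strip().split(";")
    return form, [tuple(alt.split(",", 1)) for alt in alternatives]

def lemmas(line):
    """Return (lemma, descr) pairs with cut-add codes resolved."""
    form, alternatives = parse_entry(line)
    return [(apply_cut_add(form, lemma), descr) for lemma, descr in alternatives]
```

With these definitions, @code{lemmas("kocie;3t,N/GaNsCl;3t,N/GaNsCv")} yields the lemma @samp{kot} for both descriptions.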
	1012
	1013
	1014	The mandatory file name extension for a text dictionary is @code{dic}. For large
	1015	dictionaries it is preferable, however, to compile them into binary
	1016	(fsa) format.
	1017
	1018	@subsubheading Binary format
	1019
	1020	The mandatory file name extension for a binary dictionary is @code{bin}. To
	1021	compile a text dictionary into binary format, write:
	1022
	1023	@example
	1024	compiledic <dictionaryname>.dic
	1025	@end example
	1026
	1027	@subsubheading Polex/PMDBF dictionary
	1028
A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
the distribution as the default dictionary of @command{lem}. By default it is
located in:
	1032
	1033	@file{$HOME/.utt/pl/lem.bin}
	1034
	1035	@node lem hints
	1036	@subsection Hints
	1037
	1038	@c @subsubheading Combining data from multiple dictionaries
	1039
	1040	@c @itemize
	1041
	1042	@c @item Apply <dict1>, then apply <dict2> to words which were not annotatated.
	1043
	1044	@c @example
	1045	@c lem -d <dict1> \| lem -S lem -d <dict2>
	1046	@c @end example
	1047
	1048	@c @item Add annotations from two dictionaries <dict1> and <dict2>.
	1049
	1050	@c @example
	1051	@c lem -c -d <dict1> \| lem -S lem -d <dict2>
	1052	@c @end example
	1053
	1054	@c @end itemize
	1055
	1056
	1057	@c ---------------------------------------------------------------------
	1058	@c GUE
	1059	@c ---------------------------------------------------------------------
	1060
	1061	@page
	1062	@node gue
	1063	@section gue - morphological guesser
	1064
	1065	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1066
@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
	1068	@item @strong{Component category:} @tab filter
	1069
	1070	@end multitable
	1071
@command{gue} guesses morphological descriptions of the form contained
in the @var{form} field.
	1074
	1075	@menu
	1076	* gue command line options::
	1077	* gue example::
	1078	* gue dictionaries::
	1079	@end menu
	1080
	1081	@node gue command line options
	1082	@subsection Command line options
	1083
	1084	@table @code
	1085
	1086	@parhelp
	1087	@parversion
	1088	@parinteractive
	1089	@c @parfile
	1090	@c @paroutput
	1091	@c @parfail
	1092	@c @parcopy
	1093	@parinputfield
	1094	@paroutputfield
	1095	@pardictionary
	1096	@parprocess
	1097	@parselect
	1098	@parunselect
	1099	@paroneline
	1100	@paronefield
	1101
	1102	@item @b{@minus{}@minus{}delta=@var{n}}
Stop displaying answers after a fall in weight, that is, when the weight difference between two subsequent results exceeds the delta value (default=`0.2').
	1104
	1105
	1106	@item @b{@minus{}@minus{}cut-off=@var{n}}
Do not display answers whose weight is lower than the cut-off value (default=`200').
	1108
	1109
	1110	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
	1111	Guess up to n descriptions (default=`0', which means 'display all results').
	1112
	1113
	1114
	1115	@end table
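
The interplay of the three filtering options can be pictured with the following Python sketch. It is not the implementation used by @command{gue}: the function name @code{select_guesses} and the weights in the usage note are hypothetical, and the manual does not say whether the delta value is an absolute or a relative difference. The sketch assumes a relative drop with respect to the previous result.

```python
def select_guesses(weighted, delta=0.2, cut_off=200, guess_count=0):
    """Filter (description, weight) pairs the way the options are described.

    Assumption: `delta` is the allowed relative drop between two
    subsequent results, ordered by decreasing weight.
    """
    ranked = sorted(weighted, key=lambda g: g[1], reverse=True)
    selected = []
    previous = None
    for descr, weight in ranked:
        if weight < cut_off:
            break                            # below the cut-off value
        if previous is not None and (previous - weight) / previous > delta:
            break                            # weight fell by more than delta
        if guess_count and len(selected) >= guess_count:
            break                            # 0 means "no limit"
        selected.append((descr, weight))
        previous = weight
    return selected
```

For instance, with hypothetical weights 900, 850, 500 and 100, the default settings keep the first two results, because the drop from 850 to 500 exceeds 20 percent.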
	1116
	1117	@node gue example
	1118	@subsection Example
	1119
	1120	@example
	1121	command: gue -n 2
	1122
	1123	input:
	1124	0000 07 W smerfny
	1125
	1126	output:
	1127	0000 07 W smerfny gue:,ADJ/CaDpGiNs
	1128	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
	1129	@end example
	1130
	1131
	1132	@node gue dictionaries
	1133	@subsection Dictionaries
	1134
	1135	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
	1136	The fsa format is created by compiling text-format dictionaries.
	1137
	1138
	1139
	1140	@subsubheading Text format
	1141
	1142	Dictionary entries have the following structure:
	1143
	1144	@example
	1145	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
	1146	@end example
	1147
	1148	@var{lemma} must be given in the cut-add format:
	1149
	1150	@example
	1151	@code{[<cut1><add1>-]<cut2><add2>}
	1152	@end example
(no spaces in between): replace the prefix of length @var{cut1} with the
string @var{add1}, and replace the suffix of length @var{cut2} with the string
@var{add2}.
	1156
	1157
Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}.
	1159
	1160
@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
	1162
	1163	@var{weight} is an integer value between 1 and 999 indicating the
	1164	likelihood of the guess.
	1165
	1166	@example
*łkę;1a,N/GfNsCa
naj*elszy;3-4ały,ADJ/...:...
	1169	@end example
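
As an illustration of the entry structure, the following Python sketch splits an entry into its parts and checks whether a word form fits its @var{prefix}@code{*}@var{suffix} pattern. The names @code{GuessRule}, @code{parse_rule} and @code{rule_applies} are hypothetical, and the sketch tolerates a missing or non-numeric weight, as in the examples above; how @command{gue} itself treats such entries is not stated here.

```python
from typing import NamedTuple, Optional

class GuessRule(NamedTuple):
    prefix: str
    suffix: str
    lemma: str                  # cut-add code
    description: str
    weight: Optional[int]       # None when absent or not a number

def parse_rule(line):
    """Parse 'prefix*suffix;lemma,description:weight'."""
    pattern, rest = line.strip().split(";", 1)
    prefix, suffix = pattern.split("*", 1)
    lemma, tail = rest.split(",", 1)
    description, sep, weight = tail.rpartition(":")
    if not sep:                 # no ":weight" part present
        description, weight = tail, ""
    return GuessRule(prefix, suffix, lemma, description,
                     int(weight) if weight.isdigit() else None)

def rule_applies(rule, form):
    """True if `form` starts with the prefix and ends with the suffix."""
    return (len(form) >= len(rule.prefix) + len(rule.suffix)
            and form.startswith(rule.prefix)
            and form.endswith(rule.suffix))
```

For example, the rule parsed from @code{*łkę;1a,N/GfNsCa} applies to any form ending in @samp{łkę}.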
	1170
	1171
	1172	@c ---------------------------------------------------------------------
	1173	@c COR
	1174	@c ---------------------------------------------------------------------
	1175
	1176	@page
	1177	@node cor
	1178	@section cor - spelling corrector
	1179
	1180	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
	1182	@item @strong{Component category:} @tab filter
	1183	@end multitable
	1184
The spelling corrector applies Kemal Oflazer's dynamic programming
algorithm @cite{oflazer96} to the FSA representation of the set of
word forms of the Polex/PMDBF dictionary. Given an incorrect
word form, it returns all word forms present in the dictionary whose
edit distance from that form does not exceed the maximum given as a parameter.

By default, @code{cor} replaces the contents of the @var{form} field
with the corrected value and places the old contents in the @code{cor}
field.
	1194
	1195
	1196	@menu
	1197	* cor command line options::
	1198	* cor dictionaries::
	1199	@end menu
	1200
	1201
	1202	@node cor command line options
	1203	@subsection Command line options
	1204
	1205	@table @code
	1206
	1207	@parhelp
	1208	@parversion
	1209	@parinteractive
	1210	@c @parfile
	1211	@c @paroutput
	1212	@c @parfail
	1213	@c @parcopy
	1214	@parinputfield
	1215	@paroutputfield
	1216	@pardictionary
	1217	@parprocess
	1218	@parselect
	1219	@parunselect
	1220	@paroneline
	1221	@paronefield
	1222
	1223	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
	1224	Maximum edit distance (default='1').
	1225
	1226
	1227	@end table
	1228
	1229	@node cor dictionaries
	1230	@subsection Dictionaries
	1231
	1232	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
	1233	The fsa format is created by compiling text-format dictionaries.
	1234
	1235	@subsubheading Text format
	1236
	1237	The @command{cor} dictionary is a list of words:
	1238	@example
	1239	odlot
	1240	odlotowy
	1241	odludek
	1242	@end example
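
The effect of the maximum edit distance can be illustrated on such a word list. The Python sketch below is a brute-force stand-in, not Oflazer's FSA-based algorithm used by @command{cor}: it computes the plain Levenshtein distance (insertions, deletions, substitutions) against every word, which may differ from the distance measure @command{cor} applies. The names @code{edit_distance} and @code{corrections} are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # delete ca
                               current[j - 1] + 1,             # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]

def corrections(form, words, distance=1):
    """Return the words whose distance from `form` is at most `distance`."""
    return [w for w in words if edit_distance(form, w) <= distance]
```

With the three-word list above, @code{corrections("odlut", words)} returns only @samp{odlot}; raising @code{distance} to 3 also admits @samp{odludek}.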
	1243
	1244	@page
	1245	@node sen
	1246	@section sen - a sentensizer
	1247
	1248	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1249
@item @strong{Authors:} @tab Tomasz Obrębski
	1251	@item @strong{Component category:} @tab filter
	1252
	1253	@end multitable
	1254
	1255	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
	1256
	1257	@menu
	1258	@c * sen input::
	1259	@c * sen output::
	1260	* sen example::
	1261	@end menu
	1262
	1263	@node sen example
	1264	@subsection Example
	1265
	1266	@example
	1267	command: sen
	1268
	1269	input:
0000 05 W Cześć
	1271	0005 01 P !
	1272	0006 01 S _
	1273	0007 02 W To
	1274	0009 01 S _
	1275	0010 02 W ja
	1276	0012 01 P .
	1277	0013 01 S \n
	1278
	1279	output:
	1280	0000 00 BOS *
0000 05 W Cześć
	1282	0005 01 P !
	1283	0006 00 EOS *
	1284	0006 00 BOS *
	1285	0006 01 S _
	1286	0007 02 W To
	1287	0009 01 S _
	1288	0010 02 W ja
	1289	0012 01 P .
	1290	0013 01 S \n
	1291	0014 00 EOS *
	1292	@end example
	1293
	1294
	1295	@c ---------------------------------------------------------------------
	1296	@c GPH
	1297	@c ---------------------------------------------------------------------
	1298
	1299	@c @node gph - graphizer
	1300	@c @chapter gph - graphizer
	1301
	1302	@c Authors: Tomasz Obrêbski
	1303
	1304
	1305
	1306	@c SER
	1307	@c ---------------------------------------------------------------------
	1308	@c ---------------------------------------------------------------------
	1309
	1310	@page
	1311	@node ser
	1312	@section ser - pattern search tool
	1313
	1314	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
	1316	@item @strong{Component category:} @tab filter
	1317	@end multitable
	1318
	1319	@command{ser} looks for patterns in UTT-formatted texts.
	1320
	1321	@menu
	1322	* ser command line options::
	1323	* ser pattern::
	1324	* ser how ser works::
	1325	* ser customization::
	1326	* ser limitations::
	1327	* ser requirements::
	1328	@end menu
	1329
	1330
	1331	@c ---------------------------------------------------------------------
	1332	@node ser command line options
	1333	@subsection Command line options
	1334
	1335	@table @code
	1336
	1337	@parhelp
	1338	@parversion
	1339	@c @parfile
	1340	@c @paroutput
	1341	@c @parinputfield
	1342	@c @paroutputfield
	1343	@parprocess
	1344	@parinteractive
	1345
	1346	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1347	The search pattern.
	1348
	1349	@item @b{@minus{}@minus{}morph=@var{field}}
	1350	The name of the annotation field containing the morphological
	1351	description (default @code{lem}).
	1352
	1353	@item @b{@minus{}@minus{}flex}
	1354	Only print the generated flex source code.
	1355
	1356	@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from the file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.
	1359
	1360	@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from the file @var{filename}. This option
makes it possible to extend the set of terms.
	1363
	1364	@end table
	1365
	1366
	1367	@c ---------------------------------------------------------------------
	1368	@node ser pattern
	1369	@subsection Pattern
	1370
	1371	The @command{ser} pattern is a regular expression over terms corresponding
	1372	to text segments or segment sequences. Predefined terms are:
	1373
	1374	@table @code
	1375
	1376	@item seg(@var{t},@var{f},@var{a})
	1377	a segment of type @var{t}, containing form @var{f} and annotation
	1378	@var{a}
	1379
	1380	@item form(@var{f})
	1381	a segment containing form @var{f}
	1382
	1383	@item field(@var{f})
	1384	a segment containing annotation field @var{f}
	1385
	1386	@item space(@var{f})
	1387	a space segment of form @var{f}
	1388
	1389	@item word(@var{f})
	1390	a word segment of form @var{f}
	1391
	1392	@item punct(@var{f})
	1393	a punct segment of form @var{f}
	1394
	1395	@item number(@var{f})
	1396	a number segment of form @var{f}
	1397
	1398	@item lexeme(@var{f})
	1399	a word segment with lemma @var{f}
	1400
	1401	@item cat(@var{c})
	1402	a word segment of category @var{c}
	1403
	1404	@end table
	1405
All arguments are optional. If an argument is omitted, an arbitrary
string of non-blank characters is assumed as the argument value. Term
arguments may be arbitrary character-level regular expressions. The
following special symbols can be used:
	1410
	1411	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1412	@item @code{[@dots{}]} @tab a character class
	1413	@item @code{[^@dots{}]} @tab a negated character class
@item @code{|} @tab alternative
	1415	@item @code{*} @tab repetition, including zero times
	1416	@item @code{+} @tab repetition, at least one time
	1417	@item @code{?} @tab optionality
	1418	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
	1419	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
	1420	@item @code{@{@var{m}@}} @tab repetition @var{m} times
@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
	1422	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
	1423	@item @code{( )} @tab parentheses, used to override precedence
	1424	@c @end multitable
	1425
	1426	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1427	@item @code{.} @tab a non-blank character
	1428	@item @code{\w} @tab a letter
	1429	@item @code{\W} @tab a non-blank character other than a letter
	1430	@item @code{\d} @tab a digit
	1431	@item @code{\D} @tab a non-blank character other than a digit
	1432	@item @code{\s} @tab a space or tab character
	1433	@item @code{\S} @tab a non-blank character (the same as @code{.})
	1434	@item @code{\l} @tab a lowercase letter
	1435	@item @code{\L} @tab an uppercase letter
	1436	@end multitable
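
Several of these symbols differ from their meaning in common regular expression dialects. To make the differences concrete, the following Python sketch translates a term argument into the syntax of Python's @code{re} module. It is an illustration only and has nothing to do with how @command{ser} compiles patterns. The function name @code{to_python_regex} is hypothetical, letters are assumed to be the Polish alphabet, symbols are translated only outside character classes, and octal escapes are not handled.

```python
import re

LOWER = "a-ząćęłńóśźż"      # assumption: letters are those of the Polish alphabet
UPPER = "A-ZĄĆĘŁŃÓŚŹŻ"

ESCAPES = {
    "w": f"[{LOWER}{UPPER}]",        # a letter
    "W": f"[^{LOWER}{UPPER}\\s]",    # non-blank, not a letter
    "d": "[0-9]",                    # a digit
    "D": "[^0-9\\s]",                # non-blank, not a digit
    "s": "[ \\t]",                   # a space or tab
    "S": "\\S",                      # a non-blank character
    "l": f"[{LOWER}]",               # a lowercase letter
    "L": f"[{UPPER}]",               # an uppercase letter
}

def to_python_regex(arg):
    """Translate a ser term argument into Python `re` syntax."""
    out = []
    i = 0
    in_class = False
    while i < len(arg):
        ch = arg[i]
        if ch == "\\" and i + 1 < len(arg):
            nxt = arg[i + 1]
            if not in_class and nxt in ESCAPES:
                out.append(ESCAPES[nxt])
            else:
                out.append(ch + nxt)     # other escapes are kept as they are
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        # outside a character class, "." stands for a non-blank character
        out.append("\\S" if ch == "." and not in_class else ch)
        i += 1
    return "".join(out)
```

For example, the argument @code{\L\l+} is translated into a pattern that matches @samp{Kowalski} but not @samp{kowalski}.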
	1437
	1438
	1439	@noindent The following characters:
	1440	@example
@verb{% [ ] ^ | * + ? { } , . < > \ %}
	1442	@end example
	1443	must be escaped with a backslash, i.e. written as:
	1444	@example
@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
	1446	@end example
	1447
@quotation Note
The special symbols are borrowed from Perl, with minor modifications
made for convenience, so the meaning of certain special characters and
sequences differs slightly from their usual meaning.
In particular, the meaning of the @code{.} special character is modified because of
the special function of spaces in UTT files (they are field
separators). Use @code{\s} to match a space or tab character explicitly.
@end quotation
	1457
In the argument of the @code{cat} term, the special operator @code{<...>} may be
used. A category specification enclosed in angle brackets matches all
category descriptions which are consistent (non-contradictory) with the
specification. For example, @code{<N>} matches all noun descriptions, and
@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
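
One possible reading of ``consistent'' is sketched below in Python: the parts of speech must be equal, and every attribute present on both sides must share at least one value. This is an assumption made for illustration, not the matching code of @command{ser}; the names @code{parse_descr} and @code{consistent} are hypothetical, values are assumed to be single characters, and values written in angle brackets are ignored.

```python
import re

# An attribute is a run of uppercase letters followed by one-character values.
ATTR = re.compile(r"([A-Z]+)([a-z0-9?!*+-]+)")

def parse_descr(descr):
    """Split 'POS/AvBvv...' into (pos, {attribute: set of values})."""
    pos, _, attrs = descr.partition("/")
    return pos, {name: set(values) for name, values in ATTR.findall(attrs)}

def consistent(spec, descr):
    """True if `descr` does not contradict the specification `spec`."""
    spec_pos, spec_attrs = parse_descr(spec)
    pos, attrs = parse_descr(descr)
    if spec_pos != pos:
        return False
    # an attribute missing from the description is not a contradiction
    return all(name not in attrs or bool(values & attrs[name])
               for name, values in spec_attrs.items())
```

Under this reading, @code{consistent("ADJ/Can", "ADJ/DpNpCnavGaifn")} holds, because the case values @code{a} and @code{n} occur in the description.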
	1463
	1464
	1465	@*
	1466	@noindent @b{Examples of one-segment patterns:}
	1467
	1468	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1469	@item @code{seg} @tab any segment
	1470	@item @code{word} @tab any word-form
	1471	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
	1472	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
	1473	@item @code{word(\L\l+)} @tab a capitalized word-form
	1474	@item @code{punct} @tab a punctuation character
@item @code{space(.*\\n.*)} @tab a space segment containing a newline character
@item @code{lexeme(pomoc)} @tab any form of the lexeme @samp{pomoc}
@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
@item @code{cat(<N/Ca>)} @tab a word whose category matches @code{N/Ca}
	1479	@end multitable
	1480
	1481	@*
	1482	@noindent @b{Examples of multi-segment patterns:}
	1483
	1484	@table @code
	1485
	1486	@item (word(\L) punct(\.) space?)+ word(\L\l+)
	1487	a sequence of initials followed by a surname
	1488
@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
a text fragment between two punctuation characters, containing an
occurrence of a relative pronoun
	1492
	1493	@end table
	1494
	1495
	1496	@node ser how ser works
	1497	@subsection How ser works
	1498
	1499	@node ser customization
	1500	@subsection Customization
	1501
	1502	@c All predefined terms correspond to single segments,
	1503
	1504	@example
	1505	define(`verbseq', `(cat(V) (space cat(V)))')
	1506	@end example
	1507
	1508
	1509	the term @code{cat()} may not be used as a ... of
	1510
	1511	@c See @command{m4} manual for further details on macro definition format.
	1512
	1513	@node ser limitations
	1514	@subsection Limitations
	1515
Category specifications with more than 3 attributes in @code{<...>} are not supported.
	1517
	1518	@node ser requirements
	1519	@subsection Requirements
	1520
	1521	In order to run @command{ser}, the following programs must be
	1522	installed in the system:
	1523
	1524	@itemize
	1525
	1526	@item @command{m4}
	1527	@item @command{grep}
	1528	@item @command{flex}
	1529	@item @command{gcc}
	1530
	1531	@end itemize
	1532
	1533
	1534	@c GRP
	1535	@c ---------------------------------------------------------------------
	1536	@c ---------------------------------------------------------------------
	1537
	1538	@page
	1539	@node grp
	1540	@section grp - pattern search tool
	1541
	1542	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
	1544	@item @strong{Component category:} @tab filter
	1545	@end multitable
	1546
	1547
@command{grp} selects sentences containing an expression matching a
pattern. The pattern format is exactly the same as that accepted by
@command{ser}.

@command{grp} is intended mainly for speeding up corpus search.
It is extremely fast (the processing speed is usually higher than the speed
of reading the corpus file from disk).
	1555
	1556
	1557
	1558	@c @menu
	1559	@c * ser command line options::
	1560	@c * ser pattern::
	1561	@c * ser how ser works::
	1562	@c * ser customization::
	1563	@c * ser limitations::
	1564	@c * ser requirements::
	1565	@c @end menu
	1566	@menu
	1567	* grp command line options::
	1568	* grp pattern::
	1569	* grp hints::
	1570	@end menu
	1571
	1572	@node grp command line options
	1573	@subsection Command line options
	1574
	1575	@table @code
	1576
	1577	@parhelp
	1578	@parversion
	1579	@c @parfile
	1580	@c @paroutput
	1581	@c @parinputfield
	1582	@c @paroutputfield
	1583	@parprocess
	1584	@parinteractive
	1585
	1586	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1587	The search pattern.
	1588
	1589	@item @b{@minus{}@minus{}morph=@var{field}}
	1590	The name of the annotation field containing the morphological
	1591	description (default @code{lem}).
	1592
	1593	@item @b{@minus{}@minus{}command}
	1594	Only print the generated flex source code.
	1595
	1596	@item @b{@minus{}@minus{}macro=@var{filename}}
Read macro definitions from the file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.
	1599
	1600	@item @b{@minus{}@minus{}define=@var{filename}}
Append macro definitions from the file @var{filename}. This option
makes it possible to extend the set of terms.
	1603
	1604	@end table
	1605
	1606
	1607	@node grp pattern
	1608	@subsection Pattern
	1609
See @ref{ser pattern}.
	1611
	1612	@node grp hints
	1613	@subsection Hints
	1614
The corpus search speed may be increased by combining @command{grp} with the @command{lzop}
compression tool (@command{grp} usually processes data faster than it can be read from
disk, especially on slow laptop drives).
	1618
	1619	@example
cat corpus | tok | sen | lem | grp -a p | lzop -7 > corpus.grp.lzo
	1621	@end example
	1622
	1623	@example
lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR}
	1625	@end example
	1626
	1627
	1628	@c ---------------------------------------------------------------------
	1629	@c kot
	1630	@c ---------------------------------------------------------------------
	1631	@c ---------------------------------------------------------------------
	1632
	1633	@page
	1634	@node kot
	1635	@section kot - untokenizer
	1636
Authors: Tomasz Obrębski
	1638
	1639	@command{kot} is the opposite of @command{tok}. It changes UTT-formatted text into plain text.
	1640
	1641	@menu
	1642	* kot command line options::
	1643	* kot usage examples::
	1644	@end menu
	1645
	1646	@node kot command line options
	1647	@subsection Command line options
	1648
	1649	@table @code
	1650
	1651	@parhelp
	1652
	1653	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1654
	1655	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1656
	1657	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1658
	1659	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1660
	1661	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1662
	1663	@item
	1664
	1665	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
	1666	print @var{string} between nonadjacent segments of the input file
	1667
	1668	@item @b{@minus{}@minus{}spaces, @minus{}r}
	1669	retain the special characters @code{_}, @code{\t},
	1670	@code{\n}, @code{\r}, @code{\f} unexpanded in the output
	1671
	1672	@end table
	1673
	1674	@node kot usage examples
	1675	@subsection Usage examples
	1676
	1677	@example
cat legia.txt | tok | kot
	1679	@end example
	1680
	1681	@example
cat legia.txt | tok | lem -1 | kot
	1683	@end example
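
What @command{kot} does can be pictured with a short Python sketch. It is a simplified illustration, not the actual program: the names @code{expand} and @code{untokenize} are hypothetical, the first four fields of a line are assumed to be start, length, type and form, special characters are assumed to occur only in segments of type @code{S}, and repeated lines for one segment (alternative annotations) are printed once.

```python
import re

# Printable stand-ins used in UTT files and the characters they denote.
SPECIAL = {"_": " ", "\\t": "\t", "\\n": "\n", "\\r": "\r", "\\f": "\f"}

def expand(form):
    """Replace _, \\t, \\n, \\r and \\f stand-ins with the real characters."""
    return re.sub(r"\\[tnrf]|_", lambda m: SPECIAL[m.group()], form)

def untokenize(lines, gap_fill="", keep_spaces=False):
    """Rebuild plain text from UTT lines (simplified view of kot)."""
    text = []
    position = None                  # end offset of the last printed segment
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        start, length = int(fields[0]), int(fields[1])
        seg_type, form = fields[2], fields[3]
        if length == 0:
            continue                 # zero-length markers carry no text
        if position is not None and start < position:
            continue                 # same segment repeated with another annotation
        if position is not None and start > position:
            text.append(gap_fill)    # nonadjacent segments
        text.append(form if keep_spaces or seg_type != "S" else expand(form))
        position = start + length
    return "".join(text)
```

The @code{gap_fill} and @code{keep_spaces} parameters mirror the @option{--gap-fill} and @option{--spaces} options described above.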
	1684
	1685	@c CON............................................................
	1686	@c ...............................................................
	1687	@c ...............................................................
	1688
	1689	@page
	1690	@node con
	1691	@section con - concordance table generator
	1692
	1693	@command{con} generates a concordance table based on a pattern given to @command{ser}.
	1694
	1695	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1696	@item @strong{Authors:} @tab Justyna Walkowska
	1697	@item @strong{Component category:} @tab sink
	1698	@end multitable
	1699	@c
	1700
	1701	@menu
	1702	* con command line options::
	1703	* con usage example::
	1704	* con hints::
	1705	@end menu
	1706
	1707	@node con command line options
	1708	@subsection Command line options
	1709
	1710	@table @code
	1711
	1712	@parhelp
	1713
	1714	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
	1715	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1716	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1717	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1718	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
	1719	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
	1720	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	1721	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	1722	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
	1723	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1724	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1725	@c @item
	1726	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1727	@c search pattern
	1728	@c
	1729	@c @item @b{@minus{}@minus{}flex}
	1730	@c only print the generated flex source code
	1731	@c
	1732	@c @item @b{@minus{}@minus{}macro=@var{filename}}
	1733	@c read macrodefinitions from file @var{filename} rather than from
	1734	@c default location. This option allows to redefine the set of terms.
	1735	@c
	1736	@c @item @b{@minus{}@minus{}define=@var{filename}}
	1737	@c append macrodefinitions from file @var{filename}. This option
	1738	@c allows to extend the set of terms.
	1739
	1740	@item @b{@minus{}@minus{}left @minus{}l}
	1741	Left context info (default='30c'). Example:
	1742	@example
	1743	-l=5c: left context is 5 characters
	1744	-l=5w: left context is 5 words
	1745	-l=5s: left context is 5 non-empty input lines
	1746	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
	1747	@end example
	1748
	1749	@item @b{@minus{}@minus{}right @minus{}r}
	1750	Right context info (default='30c').
	1751	@item @b{@minus{}@minus{}trim @minus{}t}
	1752	Clear incomplete words from output.
	1753	@item @b{@minus{}@minus{}white @minus{}w}
Do not convert whitespace characters into spaces.
	1755	@item @b{@minus{}@minus{}column @minus{}c}
	1756	Left column minimal width in characters (default = 0).
	1757	@item @b{@minus{}@minus{}ignore @minus{}i}
	1758	Ignore segment inconsistency in the input.
@item @b{@minus{}@minus{}bom}
Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
@item @b{@minus{}@minus{}eom}
End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
	1763	@item @b{@minus{}@minus{}bod}
	1764	Selected segment beginning display string (default='[').
	1765	@item @b{@minus{}@minus{}eod}
	1766	Selected segment end display string (default=']').
	1767
	1768
	1769
	1770	@end table
	1771
	1772	@node con usage example
	1773	@subsection Usage example
	1774	@example
cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
	1776	@end example
	1777
	1778
	1779	@node con hints
	1780	@subsection Hints
	1781
	1782	@command{con} is a rather slow program. Do not pass large amounts of
	1783	redundant text through this program. @command{con} works fine in the following
	1784	sequence:
	1785
	1786	@example
... | grp -e EXPR | ser -e EXPR | con
	1788	@end example
	1789
	1790
	1791
	1792	@c ---------------------------------------------------------------------
	1793	@c ---------------------------------------------------------------------
	1794
	1795	@page
	1796	@node Auxiliary tools
	1797	@chapter Auxiliary tools
	1798
	1799	@menu
	1800	* compiledic:: dictionary compiler
	1801	* fla:: UTT file flattener
	1802	* unfla:: UTT file unflattener
	1803	@end menu
	1804
	1805
	1806	@page
	1807	@node compiledic
	1808	@section compiledic - the dictionary compiler
	1809
	1810	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
	1812	@item @strong{Component category:} @tab additional tool
	1813	@end multitable
	1814	@c
	1815
	1816	@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
	1817	(FSA) format (@code{.bin} extension).
	1818
The automaton representation of a dictionary is built using the AT&T tools:
@itemize
@item AT&T FSM Library,
@item AT&T Lextools.
@end itemize

For @command{compiledic} to work, the above-mentioned packages must be
installed on your system. They are freely available
for non-commercial use.
	1828
	1829	Usage:
	1830	@example
	1831	compiledic <dictionaryname>.dic
	1832	@end example
	1833
	1834	The file <dictionaryname>.bin will be generated.
	1835
Note: The program produces a lot of temporary files, which are
stored in the current directory. They are deleted after successful
termination of the program.
	1839
	1840	@c @menu
	1841	@c * con command line options::
	1842	@c * con usage example::
	1843	@c * con hints::
	1844	@c @end menu
	1845
	1846
	1847	@page
	1848	@node fla
	1849	@section fla - the UTT file flattener
	1850
	1851	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
	1853	@item @strong{Component category:} @tab filter
	1854	@end multitable
	1855	@c
	1856
@command{fla} ``flattens'' a UTT file by merging the segments belonging
to one sentence into one line. Technically, end-of-line characters
('\n', ASCII code 10) are replaced with form-feed characters ('\f',
ASCII code 12). Flattening makes it possible to process UTT files
sentence by sentence with tools such as @command{grep} or @command{sed}
(this is used in @command{grp} and @command{mar}).

Flattened files should have the suffix @code{.fla}, e.g. @file{thetext.utt.fla}.

Flattened files are still human-readable.
	1867
	1868	Usage:
	1869
	1870	@example
	1871	fla [<bosregex>]
	1872	@end example
	1873
The optional argument is a regular expression describing the segments
which should be treated as sentence beginnings (the test is: the
segment contains a fragment matching the @code{<bosregex>}). By
default, segments containing a @code{BOS} field are sought.
	1878	@c @menu
	1879	@c * con command line options::
	1880	@c * con usage example::
	1881	@c * con hints::
	1882	@c @end menu
	1883
	1884
	1885
	1886	@page
	1887	@node unfla
	1888	@section unfla - the UTT file unflattener
	1889
	1890	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
	1892	@item @strong{Component category:} @tab filter
	1893	@end multitable
	1894
	1895	@command{unfla} transforms a flattened UTT file, produced by
	1896	@command{fla}, into the regular format by restoring end-of-line
	1897	characters.
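
The round trip between the regular and the flattened format can be sketched in Python as follows. This is an illustration under stated assumptions, not the source of @command{fla} or @command{unfla}: the function names and the default regular expression are hypothetical, and a sentence is taken to run from one matching segment up to the next.

```python
import re

def fla(text, bos_regex=r"\sBOS(\s|$)"):
    """Join the lines of each sentence with form feeds, one sentence per line."""
    sentences = []
    for line in text.rstrip("\n").split("\n"):
        if not sentences or re.search(bos_regex, line):
            sentences.append([line])         # a new sentence starts here
        else:
            sentences[-1].append(line)
    return "".join("\f".join(s) + "\n" for s in sentences)

def unfla(text):
    """Restore the regular format by turning form feeds back into newlines."""
    return text.replace("\f", "\n")
```

Applied to the output of the @command{sen} example, @code{fla} produces two lines, one per sentence, and @code{unfla} restores the original text.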
	1898
	1899
	1900
	1901
	1902	@c ---------------------------------------------------------------------
	1903	@c USAGE EXAMPLES
	1904	@c ---------------------------------------------------------------------
	1905
	1906	@node Usage examples
	1907	@chapter Usage examples
	1908
	1909	@subsubheading Simple pipelines
	1910
	1911	@enumerate
	1912
	1913	@item tokenization
	1914
cat text | tok > output1
	1916
	1917	@item morphological annotation (1)
	1918
	1919	simple dictionary based lemmatization
	1920
cat text | tok | lem > output1
	1922
	1923	@item morphological annotation (2)
	1924
1) perform dictionary-based lemmatization
2) guess descriptions for words which have no annotation
	1927
	1928	@example
cat text | tok | lem | gue -S lem > output2
	1930	@end example
	1931
	1932	@item morphological annotation (3)
	1933
	1934	1) perform dictionary-based lemmatization
	1935	2) try to correct words with no annotation
	1936	3) perform dictionary-based lemmatization of corrected words
	1937	4) guess descriptions for words which still have no annotation
	1938
	1939	@example
cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
	1941	@end example
	1942	@item spelling correction
	1943
	1944
	1945
	1946	@example
cat text | tok | lem --only-fail | cor -1 > output3
	1948	@end example
	1949
	1950	@item Expression extraction
	1951
	1952	Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
	1953
	1954	@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
	1956	@end example
	1957
	1958	@item A word in context
	1959
Extraction of text fragments containing a form of the lexeme 'rozmowa' in
the context of 5 preceding and 5 following corpus segments.
	1962
	1963	@example
cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
	1965	@end example
	1966
	1967	@item generation of concordance table (1)
	1968
	1969	@example
cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
	1971	@end example
	1972
Running time: 10 seconds.
	1974
	1975	@item generation of concordance table (2)
	1976
	1977	The same as above but much faster
	1978
	1979	@example
cat text | tok | lem -1 | \
grp -e 'cat(<V>) space lexeme(rozmowa)' | \
ser -e 'cat(<V>) space lexeme(rozmowa)' | \
con
	1984	@end example
	1985
Running time: 2 seconds.
	1987
	1988	@item generation of concordance table (3)
	1989
Usually, one searches the same corpus repeatedly. In
such a case it is advisable to first transform the corpus data into the format
required by @command{grp}, and then use the preprocessed data.

As @command{grp} (@command{grep}) processes data faster than it is
read from the disk drive, the search time may be shortened further by
using file compression techniques. We suggest using @command{lzop}.
	1997
	1998	@item The fastest way to search a large corpus
	1999
	2000	step 1: preprocessing
	2001
	2002	@example
	2003	cat corpus | tok | sen | lem -1 \
	2004	| grp -a p | lzop -7 > corpus.grp.lzo
	2005	@end example
	2006
	2007	step 2: search
	2008
	2009	@example
	2010	lzop -cd corpus.grp.lzo | grp -a gP -e 'cat(<V>) space lexeme(rozmowa)' \
	2011	| ser -e 'cat(<V>) space lexeme(rozmowa)' | con
	2012	@end example
	2013
	2014	@end enumerate
	2015
	2016	@subsubheading More complicated configurations
	2017
	2018
	2019	@example
	2020	mknod fifo1 p
	2021	mknod fifo2 p
	2022	mknod fifo3 p
	2023	mknod fifo4 p
	2024	mknod fifo5 p
	2025
	2026	tok | lem -p W -e fifo1 > fifo2 &
	2027	cor -e fifo3 < fifo1 | lem > fifo4 &
	2028	gue < fifo3 > fifo5 &
	2029	sort -m fifo2 fifo4 fifo5
	2030
	2031	rm fifo?
	2032	@end example
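
The configuration above depends on two standard mechanisms: named pipes,
which let several processes run concurrently and exchange data without
temporary files, and @command{sort -m}, which merges inputs that are
already sorted. The sketch below illustrates only these mechanisms with
standard tools. The stream names and the numeric data are made up for the
illustration and no UTT program is involved.

```shell
# Work in a scratch directory so no existing files are touched.
dir=$(mktemp -d)
mkfifo "$dir/evens" "$dir/odds"

# Two background writers, each producing an already sorted stream.
printf '2\n4\n6\n' > "$dir/evens" &
printf '1\n3\n5\n' > "$dir/odds" &

# sort -m merges sorted inputs; it does not re-sort them.
merged=$(sort -m "$dir/odds" "$dir/evens")

# Wait for the writers to finish, then clean up the pipes.
wait
rm -r "$dir"

printf '%s\n' "$merged"
```

Each writer blocks until @command{sort} opens its pipe for reading, so the
order in which the background jobs are started does not matter.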
	2033
	2034
	2035	@c ---------------------------------------------------------------------
	2036	@c ---------------------------------------------------------------------
	2037
	2038	@c ---------------------------------------------------------------------
	2039	@c PMDBF DICTIONARY
	2040	@c ---------------------------------------------------------------------
	2041
	2042	@node PMDBF dictionary
	2043	@chapter PMDBF dictionary
	2044
	2045	UTT components come with lexical data derived from the Polish
	2046	Morphological Database (PMDB).
	2047
	2048	@menu
	2049	* PMDBF files::
	2050	* PMDBF tag structure::
	2051	* PMDBF parts of speech::
	2052	* PMDBF morphosyntactic attributes::
	2053	@end menu
	2054
	2055	@node PMDBF files
	2056	@section Files
	2057
	2058	@node PMDBF tag structure
	2059	@section Tag structure
	2060
	2061	pos = [[:upper:]]+
	2062
	2063	attr = [[:upper:]]+
	2064
	2065	val = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
	2066
	2067	descr = pos ( / ( attr val + ) + ) ?
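
As a rough check, the grammar above can be written as a single extended
regular expression and tried with @command{grep -E}. This is only a
sketch: the function name @code{check_descr} and the sample descriptions
are assumptions made for the illustration, not taken from the dictionary
files. The newline exclusion in @code{val} is dropped because
@command{grep} matches one line at a time.

```shell
# Extended regular expression for a whole description (descr):
# pos, optionally followed by "/" and one or more attr+values groups.
descr_re='^[[:upper:]]+(/([[:upper:]]+([[:lower:][:digit:]?!*+-]|<[^>]+>)+)+)?$'

# Print "ok" if the argument matches the grammar, "bad" otherwise.
check_descr() {
  if printf '%s\n' "$1" | grep -Eq "$descr_re"; then
    echo ok
  else
    echo bad
  fi
}

check_descr 'N'          # bare part of speech
check_descr 'N/GfNsCn'   # hypothetical noun description
check_descr 'n/Gf'       # lowercase part of speech does not match
check_descr 'N/'         # a slash must be followed by an attribute
```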
	2068
	2069	@node PMDBF parts of speech
	2070	@section Parts of speech
	2071
	2072	@multitable {ADJPRP} { adjectival-passive-participle }
	2073	@item @code{N} @tab noun
	2074	@item @code{NPRO} @tab nominal-pronoun
	2075	@item @code{NV} @tab deverbal-noun
	2076	@item @code{V} @tab verb
	2077	@item @code{BYC} @tab byc
	2078	@item @code{VNI} @tab non-inflected-verb
	2079	@item @code{ADJ} @tab adjective
	2080	@item @code{ADJPAP} @tab adjectival-passive-participle
	2081	@item @code{ADJPRP} @tab adjectival-present-participle
	2082	@item @code{ADJPP} @tab adjectival-past-participle
	2083	@item @code{ADJPRO} @tab adjectival-pronoun
	2084	@item @code{ADJNUM} @tab adjectival-numeral
	2085	@item @code{ADV} @tab adverb
	2086	@item @code{ADVANP} @tab adverbial-anterior-participle
	2087	@item @code{ADVPRP} @tab adverbial-present-participle
	2088	@item @code{ADVPRO} @tab adverbial-pronoun
	2089	@item @code{ADVNUM} @tab adverbial-numeral
	2090	@item @code{P} @tab preposition
	2091	@item @code{PPRO} @tab prep-noun-pronoun
	2092	@item @code{CONJ} @tab conjunction
	2093	@item @code{EXCL} @tab exclamation
	2094	@item @code{APP} @tab call
	2095	@item @code{ONO} @tab onomatopoeia
	2096	@item @code{PART} @tab particle
	2097	@item @code{NUMCRD} @tab cardinal-numeral
	2098	@item @code{NUMCOL} @tab collective-numeral
	2099	@item @code{NUMPAR} @tab partitive-numeral
	2100	@item @code{NUMORD} @tab ordinal-numeral
	2101	@end multitable
	2102
	2103	@node PMDBF morphosyntactic attributes
	2104	@section Morphosyntactic attributes
	2105
	2106	@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	2107	@c @headitem Attr @tab Val @tab Description
	2108	@item
	2109	@code{A} @tab @tab Aspect
	2110	@item
	2111	@tab @code{p} @tab perfective,
	2112	@item
	2113	@tab @code{i} @tab imperfective.
	2114	@item
	2115	@item
	2116	@code{V} @tab @tab Verb-Form
	2117	@item
	2118	@tab @code{b} @tab infinitive,
	2119	@item
	2120	@tab @code{p} @tab personal,
	2121	@item
	2122	@tab @code{i} @tab impersonal.
	2123	@item
	2124	@item
	2125	@code{M} @tab @tab Mood
	2126	@item
	2127	@tab @code{d} @tab declarative,
	2128	@item
	2129	@tab @code{c} @tab conditional,
	2130	@item
	2131	@tab @code{i} @tab imperative.
	2132	@item
	2133	@item
	2134	@code{T} @tab @tab Tense
	2135	@item
	2136	@tab @code{a} @tab past,
	2137	@item
	2138	@tab @code{r} @tab present,
	2139	@item
	2140	@tab @code{f} @tab future.
	2141	@item
	2142	@item
	2143	@code{P} @tab @tab Person
	2144	@item
	2145	@tab @code{1} @tab 1,
	2146	@item
	2147	@tab @code{2} @tab 2,
	2148	@item
	2149	@tab @code{3} @tab 3.
	2150	@item
	2151	@item
	2152	@code{D} @tab @tab Degree
	2153	@item
	2154	@tab @code{p} @tab positive,
	2155	@item
	2156	@tab @code{c} @tab comparative,
	2157	@item
	2158	@tab @code{s} @tab superlative.
	2159	@item
	2160	@item
	2161	@code{N} @tab @tab Number
	2162	@item
	2163	@tab @code{s} @tab singular,
	2164	@item
	2165	@tab @code{p} @tab plural.
	2166	@item
	2167	@item
	2168	@code{C} @tab @tab Case
	2169	@item
	2170	@tab @code{n} @tab nominative,
	2171	@item
	2172	@tab @code{g} @tab genitive,
	2173	@item
	2174	@tab @code{d} @tab dative,
	2175	@item
	2176	@tab @code{a} @tab accusative,
	2177	@item
	2178	@tab @code{i} @tab instrumental,
	2179	@item
	2180	@tab @code{l} @tab locative,
	2181	@item
	2182	@tab @code{v} @tab vocative.
	2183	@item
	2184	@item
	2185	@code{G} @tab @tab Gender
	2186	@item
	2187	@tab @code{p} @tab masculine-personal,
	2188	@item
	2189	@tab @code{a} @tab masculine-animal,
	2190	@item
	2191	@tab @code{i} @tab masculine-inanimate,
	2192	@item
	2193	@tab @code{f} @tab feminine,
	2194	@item
	2195	@tab @code{n} @tab neuter.
	2196	@end multitable
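
To show how the table is read, the following sketch splits a description
into attribute-value groups and prints the attribute names listed above.
The description @code{N/GfNsCn} and the helper name @code{attr_name} are
assumptions made for the illustration. Several value characters may
follow one attribute, so the values are printed unsplit.

```shell
# Hypothetical description used only for illustration.
descr='N/GfNsCn'

# Map an attribute letter to the name given in the table above.
attr_name() {
  case "$1" in
    A) echo Aspect ;;
    V) echo Verb-Form ;;
    M) echo Mood ;;
    T) echo Tense ;;
    P) echo Person ;;
    D) echo Degree ;;
    N) echo Number ;;
    C) echo Case ;;
    G) echo Gender ;;
    *) echo "$1" ;;   # unknown attribute: print it unchanged
  esac
}

# Take the part after the slash, cut it into attr+values groups,
# and print one "Name=values" line per group.
decoded=$(
  printf '%s\n' "$descr" | cut -s -d/ -f2- |
  grep -oE '[[:upper:]]+([[:lower:][:digit:]?!*+-]|<[^>]+>)+' |
  while read -r pair; do
    attr=$(printf '%s' "$pair" | sed -E 's/^([[:upper:]]+).*/\1/')
    val=$(printf '%s' "$pair" | sed -E 's/^[[:upper:]]+//')
    printf '%s=%s\n' "$(attr_name "$attr")" "$val"
  done
)
printf '%s\n' "$decoded"
```

For the sample description this prints the gender, number and case
groups, one per line.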
	2197
	2198
	2199	@c ---------------------------------------------------------------------
	2200	@c ---------------------------------------------------------------------
	2201	@c
	2202	@c @node Examples
	2203	@c @chapter Examples
	2204
	2205	@c ----------------------------------------------------------------------
	2206	@c ----------------------------------------------------------------------
	2207
	2208	@node GNU Free Documentation License
	2209	@chapter GNU Free Documentation License
	2210
	2211	@c The GNU Free Documentation License.
	2212	@center Version 1.2, November 2002
	2213
	2214	@c This file is intended to be included within another document,
	2215	@c hence no sectioning command or @node.
	2216
	2217	@display
	2218	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
	2219	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
	2220
	2221	Everyone is permitted to copy and distribute verbatim copies
	2222	of this license document, but changing it is not allowed.
	2223	@end display
	2224
	2225	@enumerate 0
	2226	@item
	2227	PREAMBLE
	2228
	2229	The purpose of this License is to make a manual, textbook, or other
	2230	functional and useful document @dfn{free} in the sense of freedom: to
	2231	assure everyone the effective freedom to copy and redistribute it,
	2232	with or without modifying it, either commercially or noncommercially.
	2233	Secondarily, this License preserves for the author and publisher a way
	2234	to get credit for their work, while not being considered responsible
	2235	for modifications made by others.
	2236
	2237	This License is a kind of ``copyleft'', which means that derivative
	2238	works of the document must themselves be free in the same sense. It
	2239	complements the GNU General Public License, which is a copyleft
	2240	license designed for free software.
	2241
	2242	We have designed this License in order to use it for manuals for free
	2243	software, because free software needs free documentation: a free
	2244	program should come with manuals providing the same freedoms that the
	2245	software does. But this License is not limited to software manuals;
	2246	it can be used for any textual work, regardless of subject matter or
	2247	whether it is published as a printed book. We recommend this License
	2248	principally for works whose purpose is instruction or reference.
	2249
	2250	@item
	2251	APPLICABILITY AND DEFINITIONS
	2252
	2253	This License applies to any manual or other work, in any medium, that
	2254	contains a notice placed by the copyright holder saying it can be
	2255	distributed under the terms of this License. Such a notice grants a
	2256	world-wide, royalty-free license, unlimited in duration, to use that
	2257	work under the conditions stated herein. The ``Document'', below,
	2258	refers to any such manual or work. Any member of the public is a
	2259	licensee, and is addressed as ``you''. You accept the license if you
	2260	copy, modify or distribute the work in a way requiring permission
	2261	under copyright law.
	2262
	2263	A ``Modified Version'' of the Document means any work containing the
	2264	Document or a portion of it, either copied verbatim, or with
	2265	modifications and/or translated into another language.
	2266
	2267	A ``Secondary Section'' is a named appendix or a front-matter section
	2268	of the Document that deals exclusively with the relationship of the
	2269	publishers or authors of the Document to the Document's overall
	2270	subject (or to related matters) and contains nothing that could fall
	2271	directly within that overall subject. (Thus, if the Document is in
	2272	part a textbook of mathematics, a Secondary Section may not explain
	2273	any mathematics.) The relationship could be a matter of historical
	2274	connection with the subject or with related matters, or of legal,
	2275	commercial, philosophical, ethical or political position regarding
	2276	them.
	2277
	2278	The ``Invariant Sections'' are certain Secondary Sections whose titles
	2279	are designated, as being those of Invariant Sections, in the notice
	2280	that says that the Document is released under this License. If a
	2281	section does not fit the above definition of Secondary then it is not
	2282	allowed to be designated as Invariant. The Document may contain zero
	2283	Invariant Sections. If the Document does not identify any Invariant
	2284	Sections then there are none.
	2285
	2286	The ``Cover Texts'' are certain short passages of text that are listed,
	2287	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
	2288	the Document is released under this License. A Front-Cover Text may
	2289	be at most 5 words, and a Back-Cover Text may be at most 25 words.
	2290
	2291	A ``Transparent'' copy of the Document means a machine-readable copy,
	2292	represented in a format whose specification is available to the
	2293	general public, that is suitable for revising the document
	2294	straightforwardly with generic text editors or (for images composed of
	2295	pixels) generic paint programs or (for drawings) some widely available
	2296	drawing editor, and that is suitable for input to text formatters or
	2297	for automatic translation to a variety of formats suitable for input
	2298	to text formatters. A copy made in an otherwise Transparent file
	2299	format whose markup, or absence of markup, has been arranged to thwart
	2300	or discourage subsequent modification by readers is not Transparent.
	2301	An image format is not Transparent if used for any substantial amount
	2302	of text. A copy that is not ``Transparent'' is called ``Opaque''.
	2303
	2304	Examples of suitable formats for Transparent copies include plain
	2305	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
	2306	format, @acronym{SGML} or @acronym{XML} using a publicly available
	2307	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
	2308	PostScript or @acronym{PDF} designed for human modification. Examples
	2309	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
	2310	@acronym{JPG}. Opaque formats include proprietary formats that can be
	2311	read and edited only by proprietary word processors, @acronym{SGML} or
	2312	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
	2313	not generally available, and the machine-generated @acronym{HTML},
	2314	PostScript or @acronym{PDF} produced by some word processors for
	2315	output purposes only.
	2316
	2317	The ``Title Page'' means, for a printed book, the title page itself,
	2318	plus such following pages as are needed to hold, legibly, the material
	2319	this License requires to appear in the title page. For works in
	2320	formats which do not have any title page as such, ``Title Page'' means
	2321	the text near the most prominent appearance of the work's title,
	2322	preceding the beginning of the body of the text.
	2323
	2324	A section ``Entitled XYZ'' means a named subunit of the Document whose
	2325	title either is precisely XYZ or contains XYZ in parentheses following
	2326	text that translates XYZ in another language. (Here XYZ stands for a
	2327	specific section name mentioned below, such as ``Acknowledgements'',
	2328	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
	2329	of such a section when you modify the Document means that it remains a
	2330	section ``Entitled XYZ'' according to this definition.
	2331
	2332	The Document may include Warranty Disclaimers next to the notice which
	2333	states that this License applies to the Document. These Warranty
	2334	Disclaimers are considered to be included by reference in this
	2335	License, but only as regards disclaiming warranties: any other
	2336	implication that these Warranty Disclaimers may have is void and has
	2337	no effect on the meaning of this License.
	2338
	2339	@item
	2340	VERBATIM COPYING
	2341
	2342	You may copy and distribute the Document in any medium, either
	2343	commercially or noncommercially, provided that this License, the
	2344	copyright notices, and the license notice saying this License applies
	2345	to the Document are reproduced in all copies, and that you add no other
	2346	conditions whatsoever to those of this License. You may not use
	2347	technical measures to obstruct or control the reading or further
	2348	copying of the copies you make or distribute. However, you may accept
	2349	compensation in exchange for copies. If you distribute a large enough
	2350	number of copies you must also follow the conditions in section 3.
	2351
	2352	You may also lend copies, under the same conditions stated above, and
	2353	you may publicly display copies.
	2354
	2355	@item
	2356	COPYING IN QUANTITY
	2357
	2358	If you publish printed copies (or copies in media that commonly have
	2359	printed covers) of the Document, numbering more than 100, and the
	2360	Document's license notice requires Cover Texts, you must enclose the
	2361	copies in covers that carry, clearly and legibly, all these Cover
	2362	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
	2363	the back cover. Both covers must also clearly and legibly identify
	2364	you as the publisher of these copies. The front cover must present
	2365	the full title with all words of the title equally prominent and
	2366	visible. You may add other material on the covers in addition.
	2367	Copying with changes limited to the covers, as long as they preserve
	2368	the title of the Document and satisfy these conditions, can be treated
	2369	as verbatim copying in other respects.
	2370
	2371	If the required texts for either cover are too voluminous to fit
	2372	legibly, you should put the first ones listed (as many as fit
	2373	reasonably) on the actual cover, and continue the rest onto adjacent
	2374	pages.
	2375
	2376	If you publish or distribute Opaque copies of the Document numbering
	2377	more than 100, you must either include a machine-readable Transparent
	2378	copy along with each Opaque copy, or state in or with each Opaque copy
	2379	a computer-network location from which the general network-using
	2380	public has access to download using public-standard network protocols
	2381	a complete Transparent copy of the Document, free of added material.
	2382	If you use the latter option, you must take reasonably prudent steps,
	2383	when you begin distribution of Opaque copies in quantity, to ensure
	2384	that this Transparent copy will remain thus accessible at the stated
	2385	location until at least one year after the last time you distribute an
	2386	Opaque copy (directly or through your agents or retailers) of that
	2387	edition to the public.
	2388
	2389	It is requested, but not required, that you contact the authors of the
	2390	Document well before redistributing any large number of copies, to give
	2391	them a chance to provide you with an updated version of the Document.
	2392
	2393	@item
	2394	MODIFICATIONS
	2395
	2396	You may copy and distribute a Modified Version of the Document under
	2397	the conditions of sections 2 and 3 above, provided that you release
	2398	the Modified Version under precisely this License, with the Modified
	2399	Version filling the role of the Document, thus licensing distribution
	2400	and modification of the Modified Version to whoever possesses a copy
	2401	of it. In addition, you must do these things in the Modified Version:
	2402
	2403	@enumerate A
	2404	@item
	2405	Use in the Title Page (and on the covers, if any) a title distinct
	2406	from that of the Document, and from those of previous versions
	2407	(which should, if there were any, be listed in the History section
	2408	of the Document). You may use the same title as a previous version
	2409	if the original publisher of that version gives permission.
	2410
	2411	@item
	2412	List on the Title Page, as authors, one or more persons or entities
	2413	responsible for authorship of the modifications in the Modified
	2414	Version, together with at least five of the principal authors of the
	2415	Document (all of its principal authors, if it has fewer than five),
	2416	unless they release you from this requirement.
	2417
	2418	@item
	2419	State on the Title page the name of the publisher of the
	2420	Modified Version, as the publisher.
	2421
	2422	@item
	2423	Preserve all the copyright notices of the Document.
	2424
	2425	@item
	2426	Add an appropriate copyright notice for your modifications
	2427	adjacent to the other copyright notices.
	2428
	2429	@item
	2430	Include, immediately after the copyright notices, a license notice
	2431	giving the public permission to use the Modified Version under the
	2432	terms of this License, in the form shown in the Addendum below.
	2433
	2434	@item
	2435	Preserve in that license notice the full lists of Invariant Sections
	2436	and required Cover Texts given in the Document's license notice.
	2437
	2438	@item
	2439	Include an unaltered copy of this License.
	2440
	2441	@item
	2442	Preserve the section Entitled ``History'', Preserve its Title, and add
	2443	to it an item stating at least the title, year, new authors, and
	2444	publisher of the Modified Version as given on the Title Page. If
	2445	there is no section Entitled ``History'' in the Document, create one
	2446	stating the title, year, authors, and publisher of the Document as
	2447	given on its Title Page, then add an item describing the Modified
	2448	Version as stated in the previous sentence.
	2449
	2450	@item
	2451	Preserve the network location, if any, given in the Document for
	2452	public access to a Transparent copy of the Document, and likewise
	2453	the network locations given in the Document for previous versions
	2454	it was based on. These may be placed in the ``History'' section.
	2455	You may omit a network location for a work that was published at
	2456	least four years before the Document itself, or if the original
	2457	publisher of the version it refers to gives permission.
	2458
	2459	@item
	2460	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
	2461	the Title of the section, and preserve in the section all the
	2462	substance and tone of each of the contributor acknowledgements and/or
	2463	dedications given therein.
	2464
	2465	@item
	2466	Preserve all the Invariant Sections of the Document,
	2467	unaltered in their text and in their titles. Section numbers
	2468	or the equivalent are not considered part of the section titles.
	2469
	2470	@item
	2471	Delete any section Entitled ``Endorsements''. Such a section
	2472	may not be included in the Modified Version.
	2473
	2474	@item
	2475	Do not retitle any existing section to be Entitled ``Endorsements'' or
	2476	to conflict in title with any Invariant Section.
	2477
	2478	@item
	2479	Preserve any Warranty Disclaimers.
	2480	@end enumerate
	2481
	2482	If the Modified Version includes new front-matter sections or
	2483	appendices that qualify as Secondary Sections and contain no material
	2484	copied from the Document, you may at your option designate some or all
	2485	of these sections as invariant. To do this, add their titles to the
	2486	list of Invariant Sections in the Modified Version's license notice.
	2487	These titles must be distinct from any other section titles.
	2488
	2489	You may add a section Entitled ``Endorsements'', provided it contains
	2490	nothing but endorsements of your Modified Version by various
	2491	parties---for example, statements of peer review or that the text has
	2492	been approved by an organization as the authoritative definition of a
	2493	standard.
	2494
	2495	You may add a passage of up to five words as a Front-Cover Text, and a
	2496	passage of up to 25 words as a Back-Cover Text, to the end of the list
	2497	of Cover Texts in the Modified Version. Only one passage of
	2498	Front-Cover Text and one of Back-Cover Text may be added by (or
	2499	through arrangements made by) any one entity. If the Document already
	2500	includes a cover text for the same cover, previously added by you or
	2501	by arrangement made by the same entity you are acting on behalf of,
	2502	you may not add another; but you may replace the old one, on explicit
	2503	permission from the previous publisher that added the old one.
	2504
	2505	The author(s) and publisher(s) of the Document do not by this License
	2506	give permission to use their names for publicity for or to assert or
	2507	imply endorsement of any Modified Version.
	2508
	2509	@item
	2510	COMBINING DOCUMENTS
	2511
	2512	You may combine the Document with other documents released under this
	2513	License, under the terms defined in section 4 above for modified
	2514	versions, provided that you include in the combination all of the
	2515	Invariant Sections of all of the original documents, unmodified, and
	2516	list them all as Invariant Sections of your combined work in its
	2517	license notice, and that you preserve all their Warranty Disclaimers.
	2518
	2519	The combined work need only contain one copy of this License, and
	2520	multiple identical Invariant Sections may be replaced with a single
	2521	copy. If there are multiple Invariant Sections with the same name but
	2522	different contents, make the title of each such section unique by
	2523	adding at the end of it, in parentheses, the name of the original
	2524	author or publisher of that section if known, or else a unique number.
	2525	Make the same adjustment to the section titles in the list of
	2526	Invariant Sections in the license notice of the combined work.
	2527
	2528	In the combination, you must combine any sections Entitled ``History''
	2529	in the various original documents, forming one section Entitled
	2530	``History''; likewise combine any sections Entitled ``Acknowledgements'',
	2531	and any sections Entitled ``Dedications''. You must delete all
	2532	sections Entitled ``Endorsements.''
	2533
	2534	@item
	2535	COLLECTIONS OF DOCUMENTS
	2536
	2537	You may make a collection consisting of the Document and other documents
	2538	released under this License, and replace the individual copies of this
	2539	License in the various documents with a single copy that is included in
	2540	the collection, provided that you follow the rules of this License for
	2541	verbatim copying of each of the documents in all other respects.
	2542
	2543	You may extract a single document from such a collection, and distribute
	2544	it individually under this License, provided you insert a copy of this
	2545	License into the extracted document, and follow this License in all
	2546	other respects regarding verbatim copying of that document.
	2547
	2548	@item
	2549	AGGREGATION WITH INDEPENDENT WORKS
	2550
	2551	A compilation of the Document or its derivatives with other separate
	2552	and independent documents or works, in or on a volume of a storage or
	2553	distribution medium, is called an ``aggregate'' if the copyright
	2554	resulting from the compilation is not used to limit the legal rights
	2555	of the compilation's users beyond what the individual works permit.
	2556	When the Document is included in an aggregate, this License does not
	2557	apply to the other works in the aggregate which are not themselves
	2558	derivative works of the Document.
	2559
	2560	If the Cover Text requirement of section 3 is applicable to these
	2561	copies of the Document, then if the Document is less than one half of
	2562	the entire aggregate, the Document's Cover Texts may be placed on
	2563	covers that bracket the Document within the aggregate, or the
	2564	electronic equivalent of covers if the Document is in electronic form.
	2565	Otherwise they must appear on printed covers that bracket the whole
	2566	aggregate.
	2567
	2568	@item
	2569	TRANSLATION
	2570
	2571	Translation is considered a kind of modification, so you may
	2572	distribute translations of the Document under the terms of section 4.
	2573	Replacing Invariant Sections with translations requires special
	2574	permission from their copyright holders, but you may include
	2575	translations of some or all Invariant Sections in addition to the
	2576	original versions of these Invariant Sections. You may include a
	2577	translation of this License, and all the license notices in the
	2578	Document, and any Warranty Disclaimers, provided that you also include
	2579	the original English version of this License and the original versions
	2580	of those notices and disclaimers. In case of a disagreement between
	2581	the translation and the original version of this License or a notice
	2582	or disclaimer, the original version will prevail.
	2583
	2584	If a section in the Document is Entitled ``Acknowledgements'',
	2585	``Dedications'', or ``History'', the requirement (section 4) to Preserve
	2586	its Title (section 1) will typically require changing the actual
	2587	title.
	2588
	2589	@item
	2590	TERMINATION
	2591
	2592	You may not copy, modify, sublicense, or distribute the Document except
	2593	as expressly provided for under this License. Any other attempt to
	2594	copy, modify, sublicense or distribute the Document is void, and will
	2595	automatically terminate your rights under this License. However,
	2596	parties who have received copies, or rights, from you under this
	2597	License will not have their licenses terminated so long as such
	2598	parties remain in full compliance.
	2599
	2600	@item
	2601	FUTURE REVISIONS OF THIS LICENSE
	2602
	2603	The Free Software Foundation may publish new, revised versions
	2604	of the GNU Free Documentation License from time to time. Such new
	2605	versions will be similar in spirit to the present version, but may
	2606	differ in detail to address new problems or concerns. See
	2607	@uref{http://www.gnu.org/copyleft/}.
	2608
	2609	Each version of the License is given a distinguishing version number.
	2610	If the Document specifies that a particular numbered version of this
	2611	License ``or any later version'' applies to it, you have the option of
	2612	following the terms and conditions either of that specified version or
	2613	of any later version that has been published (not as a draft) by the
	2614	Free Software Foundation. If the Document does not specify a version
	2615	number of this License, you may choose any version ever published (not
	2616	as a draft) by the Free Software Foundation.
	2617	@end enumerate
	2618
	2619	@page
	2620	@heading ADDENDUM: How to use this License for your documents
	2621
	2622	To use this License in a document you have written, include a copy of
	2623	the License in the document and put the following copyright and
	2624	license notices just after the title page:
	2625
	2626	@smallexample
	2627	@group
	2628	Copyright (C) @var{year} @var{your name}.
	2629	Permission is granted to copy, distribute and/or modify this document
	2630	under the terms of the GNU Free Documentation License, Version 1.2
	2631	or any later version published by the Free Software Foundation;
	2632	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
	2633	Texts. A copy of the license is included in the section entitled ``GNU
	2634	Free Documentation License''.
	2635	@end group
	2636	@end smallexample
	2637
	2638	If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
	2639	replace the ``with@dots{}Texts.'' line with this:
	2640
	2641	@smallexample
	2642	@group
	2643	with the Invariant Sections being @var{list their titles}, with
	2644	the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
	2645	being @var{list}.
	2646	@end group
	2647	@end smallexample
	2648
	2649	If you have Invariant Sections without Cover Texts, or some other
	2650	combination of the three, merge those two alternatives to suit the
	2651	situation.
	2652
	2653	If your document contains nontrivial examples of program code, we
	2654	recommend releasing these examples in parallel under your choice of
	2655	free software license, such as the GNU General Public License,
	2656	to permit their use in free software.
	2657
	2658	@c Local Variables:
	2659	@c ispell-local-pdict: "ispell-dict"
	2660	@c End:
	2661
	2662
	2663	@c ---------------------------------------------------------------------
	2664	@c ---------------------------------------------------------------------
	2665
	2666	@node Reporting bugs
	2667	@chapter Reporting bugs
	2668
	2669	Report bugs to <obrebski@@amu.edu.pl>.
	2670
	2671	@c ---------------------------------------------------------------------
	2672	@c ---------------------------------------------------------------------
	2673
	2674	@c @node Copyright
	2675	@c @chapter Copyright
	2676	@c
	2677	@c Copyright 2004 by Tomasz Obrebski
	2678	@c This software is free for research and educational use.
	2679
	2680	@c ---------------------------------------------------------------------
	2681	@c ---------------------------------------------------------------------
	2682
	2683	@node Author
	2684	@chapter Author
	2685
	2686
	2687	@bye
