utt.texinfo @ 9ace5d2

Last change on this file since 9ace5d2 was 9ace5d2, checked in by obrebski <obrebski@…>, 18 years ago

a few changes

M app/doc/utt.texinfo
M app/src/dgp/sgraph.hh
M app/src/dgp/const.hh
M app/src/dgp/grammar.hh
M app/src/dgp/thesymbols.hh
M app/src/dgp/dgc
M app/src/dgp/sgraph.cc
M app/src/dgp/grammar.cc

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@63 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

Property mode set to 100644

File size: 82.6 KB

Rev	Line
[9ace5d2]	1
[25ae32e]	2	\input texinfo @c -*-texinfo-*-
[9ace5d2]	3	@c @documentencoding ISO-8859-2
	4	@documentencoding UTF-8
[25ae32e]	5	@c @documentlanguage pl
	6
	7	@c %**start of header
	8	@setfilename utt.info
	9	@settitle UAM Text Tools v0.90
	10	@c %**end of header
	11
	12	@copying
[261bf62]	13	This manual is for UAM Text Tools (version 0.90, October, 2008)
[25ae32e]	14
[9ace5d2]	15	Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.
[25ae32e]	16
	17	Permission is granted to copy, distribute and/or modify this document
[261bf62]	18	under the terms of the GNU Free Documentation License, Version 1.2 or
	19	any later version published by the Free Software Foundation; with no
	20	Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
	21	copy of the license is included in the section entitled ``GNU Free
	22	Documentation License''.
[25ae32e]	23
	24	@c @quotation
	25	@c Permission is granted to ...
	26	@c No permission is granted until the document is completed.
	27	@c @end quotation
	28	@end copying
	29
	30
	31	@titlepage
	32	@title UAM Text Tools 0.90 - User Manual
	33	@subtitle edition 0.01, @today
	34	@subtitle status: prescript
[9ace5d2]	35	@author by Justyna Walkowska, Tomasz Obrębski and Michał Stolarski
[25ae32e]	36	@page
	37	@vskip 0pt plus 1filll
	38	@insertcopying
	39	@end titlepage
	40
	41	@contents
	42
	43	@c @paragraphindent none
	44
	45	@iftex
[9ace5d2]	46	@tex
	47	% \usepackage[T1]{fontenc}
	48	% \usepackage[utf8]{inputenc}
	49	% \usepackage{times}
	50	@end tex
	51
[25ae32e]	52	@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt
	53	@end iftex
	54	@c @headings off
	55	@c @everyheading LEM(1) @| @| LEM(1)
	56	@everyfooting @today @c @| @thispage @|
	57
	58	@ifnottex
	59
	60	@node Top
	61	@top UTT - UAM Text Tools
	62
	63	@insertcopying
	64
	65	@menu
	66	* General information::
	67	* UTT file format::
	68	* Configuration files::
	69	* UTT components::
	70	* Auxiliary tools::
	71	* Usage examples::
	72	* PMDBF dictionary::
	73	@c * Examples::
	74	@c * Copyright::
	75	* GNU Free Documentation License::
	76	* Reporting bugs::
	77	* Author::
	78	@end menu
	79	@end ifnottex
	80
	81
	82	@c ----------------------------------------------------------------------
	83
	84	@node General information
	85	@chapter General information
	86
	87	UAM Text Tools (UTT) is a package of language processing tools
	88	developed at Adam Mickiewicz University. Its functionality includes:
	89
	90	@itemize @bullet
	91
	92	@item
[9ace5d2]	93	tokenization
[25ae32e]	94	@item
	95	dictionary-based morphological analysis
	96	@item
	97	heuristic morphological analysis of unknown words
	98	@item
[9ace5d2]	99	spelling correction
[25ae32e]	100	@item
	101	pattern search
	102	@item
	103	sentence splitting
	104	@item
	105	generation of concordance tables
	106	@end itemize
	107
	108	The toolkit is intended for processing raw (unannotated),
	109	unrestricted text for any conceivable purpose.
	110
	111	The system is organized as a collection of command-line programs, each
	112	performing one operation, e.g. tokenization, lemmatization, spelling
	113	correction. The components are independent of one another, the
	114	unifying element being the uniform i/o file format.
	115
	116	The components may be combined in various ways to provide various text
	117	processing services. New components supplied by the user may also be
	118	easily incorporated into the system provided that they respect the i/o
	119	file format conventions.
	120
	121	UTT component programs do not depend on any specific tagset or
	122	morphological description format.
	123
	124	UTT is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
	125	the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
	126
	127	The Polex/PMDBF dictionary is licensed under the Creative Commons by-nc-sa License which prohibits commercial use.
	128
	129
	130	List of contributors:
	131
	132	@itemize
	133	@item Pawel Konieczka
[9ace5d2]	134	@item Tomasz Obrębski
	135	@item Michał Stolarski
[25ae32e]	136	@item Marcin Walas
	137	@item Justyna Walkowska
[9ace5d2]	138	@item Paweł Wereński
[25ae32e]	139	@end itemize
	140
	141	@c ----------------------------------------------------------------------
	142	@c ---------------------------------------------------------------------
	143
	144	@node UTT file format
	145	@chapter UTT file format
	146
	147	A UTT file contains annotation of a text. It consists of a sequence of
	148	segments. Each segment explicitly refers to a continuous piece of the
	149	text and provides some information on it.
	150
	151	@section Segment format
	152
	153	A segment occupies one line of a UTT file and consists of
	154	space-separated fields:
	155
	156
	157	@quotation
	158	@sp 1
	159	[@var{start} [@var{length}]] @var{type} @var{form} [@var{annotation1} [@var{annotation2} ...]]
	160	@sp 1
	161	@end quotation
	162
	163	@table @var
	164
	165	@item @var{start}
	166	Non-negative integer value indicating the position in the source text where the
	167	segment starts.
	168
	169	@item @var{length}
	170	Non-negative integer value indicating the length of the segment.
	171
	172	@item @var{type}
	173	A sequence of non-blank characters that is not a number (a purely numeric value could be misinterpreted as a @var{start} or @var{length} field).
	174	@var{type} reflects the main classification of segments -
	175	into words, numbers, punctuation marks, meta-text markers.
	176	@xref{tok output,,tok output}, for description of automatically recognized type markers.
	177
	178	@item @var{form}
	179	This field contains the textual form of the segment or the special
	180	symbol @code{*} indicating that the form is not given (e.g. when the segment has been created artificially to mark something and is of length 0).
	181
	182	The characters or character sequences that have special meaning in the
	183	@var{form} field are enumerated below.
	184
	185	Characters with special meaning:
	186
	187	@itemize
	188	@item @code{_} - space character
	189	@item @code{*} - undefined contents
	190	@end itemize
	191
	192	Escape sequences:
	193
	194	@itemize
	195	@item @code{\n} - new line
	196	@item @code{\t} - tabulation
	197	@item @code{\r} - carriage return
	198
	199	@item @code{\_} - the @code{_} character
	200	@item @code{\*} - the @code{*} character
	201	@item @code{\\} - the @code{\} character
	202
	203	@c @item @code{\hh} - a character with hexadecimal code @code{hh} (used for non-printable characters)
	204	@end itemize
	205
	206	@item @var{annotation1}
	207	@item @var{annotation2}
	208	@item ...
	209	Annotation fields have the following format:
	210
	211	@var{longname} @code{:} @var{value}
	212
	213	or
	214
	215	@var{shortname} @var{value}
	216
	217	where @var{longname} is a string of alphanumeric characters
	218	(isalnum() test), @var{shortname} - a single non-alphanumeric character
	219	(ispunct() test), and @var{value} is an arbitrary string of non-blank characters.
	220
	221	@end table
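The special characters and escape sequences of the form field can be illustrated with a short Python sketch. It is not taken from the UTT sources: the function names are hypothetical, and the `\*` escape for a literal asterisk is an assumption made by analogy with `\_`.

```python
# Escape sequences of the form field, keyed by the character after the backslash.
# The "*" entry is an assumption (see the note above).
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "_": "_", "*": "*", "\\": "\\"}
# Reverse mapping used when writing a form field.
ENCODINGS = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
             "_": "\\_", "*": "\\*", "\\": "\\\\"}


def decode_form(form):
    """Return the text denoted by a form field, or None for the bare '*' marker."""
    if form == "*":
        return None
    out = []
    i = 0
    while i < len(form):
        ch = form[i]
        if ch == "\\" and i + 1 < len(form) and form[i + 1] in ESCAPES:
            out.append(ESCAPES[form[i + 1]])  # two input characters, one output
            i += 2
        elif ch == "_":
            out.append(" ")                   # unescaped underscore means space
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def encode_form(text):
    """Return the form field for a piece of text; None becomes the '*' marker."""
    if text is None:
        return "*"
    return "".join(ENCODINGS.get(ch, ch) for ch in text)
```

For instance, under these assumptions `decode_form("a\\_b")` gives the three-character text `a_b`, while `decode_form("a_b")` gives `a b`.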
	222
	223
	224	Only two fields are mandatory: @var{type} and @var{form}. All other fields
	225	may be absent. In the case when only one number precedes the
	226	@var{type} field, it is interpreted as the @var{start} position.
	227
	228	If the @var{length} field is omitted, the length of the segment is the
	229	length of the @var{form} field, except when the value of the
	230	@var{form} field is @code{*} -- in this case, the length is assumed to
	231	be 0.
	232
	233	If the @var{start} field is also absent, the segment is assumed to directly
	234	follow the preceding one.
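The defaulting rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the parser used by the UTT components: the function name and the returned dictionary keys are hypothetical, and an escape sequence in the form field is assumed to count as one character of text.

```python
import re


def parse_segment(line, prev_end=0):
    """Parse one UTT segment line into a dict.

    prev_end is the position just after the preceding segment; it is used
    when the start field is absent.
    """
    fields = line.split()
    numbers = []
    # Up to two leading numeric fields: start, then length.
    while fields and fields[0].isdigit() and len(numbers) < 2:
        numbers.append(int(fields.pop(0)))
    if len(fields) < 2:
        raise ValueError("type and form are mandatory: %r" % line)
    seg_type, form, annotations = fields[0], fields[1], fields[2:]

    # A single number is interpreted as the start position.
    start = numbers[0] if numbers else prev_end
    if len(numbers) == 2:
        length = numbers[1]
    elif form == "*":
        length = 0                      # form not given: length 0
    else:
        # Assumption: a backslash escape stands for one character of text.
        length = len(re.sub(r"\\.", "x", form))
    return {"start": start, "length": length, "type": seg_type,
            "form": form, "annotations": annotations}
```

A caller reading a whole file would pass `start + length` of the previous segment as `prev_end` for the next line.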
	235
	236	@c Conventions:
	237
	238	@c Annotation fields with predefined meaning:
	239
	240	@c @itemize
	241	@c @item @code{!} - UTT components are allowed to modify the contents of
	242	@c the @var{form} field (e.g. spelling correction does this). If this happens the
	243	@c original form of the segment have to be placed in the @code{!}-field.
	244	@c @item @code{@@} - morphological description
	245	@c @item @code{=} - node identifier assignment (used in graph encoding)
	246	@c @item @code{<} - preceding/dominating node(s) (used in graph encoding)
	247	@c @item @code{>} - succeeding/subordinate node(s) (used in graph encoding)
	248	@c @end itemize
	249
	250	Segments of length 0 may be used to mark file positions with some
	251	information. See e.g. BOS and EOS (beginning/end of sentence) markers
	252	in the example below.
	253
	254	Example:
	255
	256	sentence: @samp{Piszemy dobre progrumy.}
	257
	258	@example
	259	0000 00 BOS *
[9ace5d2]	260	0000 07 W Piszemy lem:pisać,V
[25ae32e]	261	0007 01 S _
	262	0008 05 W dobre lem:dobry,ADJ
	263	0013 01 S _
	264	0014 08 W progrumy cor:programy lem:program,N
	265	0022 01 P .
	266	0023 00 EOS *
	267	0023 01 S _
	268	0024 00 BOS *
	269	0024 11 W Warszawiacy lem:Warszawiak,N
	270	0035 01 S _
[9ace5d2]	271	0036 03 W też
[25ae32e]	272	0039 01 P .
	273	0040 00 EOS *
	274
	275	@end example
	276
	277	@example
	278	0000 BOS *
[9ace5d2]	279	0000 W Piszemy lem:pisać,V
[25ae32e]	280	0007 S _
	281	0008 W dobre lem:dobry,ADJ
	282	0013 S _
	283	0014 W progrumy cor:programy lem:program,N
	284	0022 P .
	285	0023 EOS *
	286	@end example
	287
	288	Position information may be provided only for some types of segments:
	289
	290	@example
	291	0000 BOS *
[9ace5d2]	292	W Piszemy lem:pisać,V
[25ae32e]	293	S _
	294	W dobre lem:dobry,ADJ
	295	S _
	296	W progrumy cor:programy lem:program,N
	297	P .
	298	EOS *
	299	S _
	300	0024 BOS *
	301	W Warszawiacy lem:Warszawiak,N
	302	S _
[9ace5d2]	303	W też
[25ae32e]	304	P .
	305	EOS *
	306	@end example
	307
	308	Position/length information may be provided only when necessary:
	309
	310	@example
	311	0000 04 N *
	312	0000 N 12
	313	P .
	314	N 5
	315	S _
	316	W km
	317	@end example
	318
	319	@section UTT File
	320
	321	A UTT file consists of a sequence of segments. The same text position
	322	may be covered by multiple segments. In consequence, ambiguous text
	323	segmentation and ambiguous annotation may be represented.
	324
	325	There are two structural requirements a valid UTT-formatted file
	326	has to meet:
	327
	328	@itemize @bullet
	329
	330	@item
	331	segments have to be sorted with respect to the @var{start} field,
	332
	333	@item
	334	for each
	335	segment ending at position @var{n}, either there must be a segment starting at
	336	position @var{n+1}, or position @var{n+1} is not covered by any segment; similarly
	337	for each segment starting at position @var{n}, either there must be a segment
	338	ending at position @var{n-1}, or the position @var{n-1} must not be covered
	339	by any segment.
	340
	341	@end itemize
	342
	343	A valid annotation for the text fragment
	344	@example
	345	12.5 km
	346	@end example
	347
	348	may be
	349
	350	@example
	351	0000 02 N 12
	352	0000 04 N 12.5
	353	0002 01 P .
	354	0003 01 N 5
	355	0004 01 S _
	356	0005 02 W km
	357	@end example
	358
	359	but not
	360
	361	@example
	362	0000 02 N 12
	363	0000 04 N 12.5
	364	0004 01 S _
	365	0005 02 W km
	366	@end example
	367
[261bf62]	368	because in the latter example the first segment (starting at position
	369	0000, 2 characters long) ends at position @var{n}=0001 which is
	370	covered by the second segment and no segment starts at position
	371	@var{n+1}=0002.
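The two structural requirements can be checked mechanically. The following Python sketch is one possible reading of them, not code from UTT: the function name is hypothetical, segments are reduced to (start, length) pairs, and zero-length segments are assumed to cover no position.

```python
def is_valid_utt(segments):
    """Check the two structural requirements on a list of (start, length) pairs."""
    # Requirement 1: segments are sorted by start position.
    starts = [start for start, _ in segments]
    if starts != sorted(starts):
        return False

    # Inclusive (first, last) spans of the segments that cover some text.
    spans = [(s, s + n - 1) for s, n in segments if n > 0]
    covered = set()
    for first, last in spans:
        covered.update(range(first, last + 1))
    start_points = {first for first, _ in spans}
    end_points = {last for _, last in spans}

    # Requirement 2: segment boundaries must line up with each other.
    for first, last in spans:
        # The position after the end is either uncovered or starts a segment.
        if last + 1 in covered and last + 1 not in start_points:
            return False
        # The position before the start is either uncovered or ends a segment.
        if first - 1 in covered and first - 1 not in end_points:
            return False
    return True
```

Applied to the two annotations of the fragment 12.5 km given above, the check accepts the first one and rejects the second, because position 0002 is covered but no segment starts there.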
	372
	373
	374	@section Flattened UTT file
	375
[e28a625]	376	The UTT file format has two variants: regular and flattened. The regular
[261bf62]	377	format was described above. In the flattened format some of the
	378	end-of-line characters are replaced with line-feed characters.
	379
	380	The flattened format is mainly used to represent whole sentences as
	381	single lines of the input file (all intrasentential end-of-line
	382	characters are replaced with line-feed characters).
	383
	384	This technical trick makes it possible to perform certain text
	385	processing operations on entire sentences with the use of such tools as
	386	@command{grep} (see @command{grp} component) or @command{sed} (see @command{mar} component).
	387
	388	The conversion between the two formats is performed by the tools:
	389	@command{fla} and @command{unfla}.
[25ae32e]	390
	391	@section Character encoding
	392
	393	The UTT component programs accept only single-byte character encodings, such
[261bf62]	394	as ISO, ANSI, DOS.
[25ae32e]	395
	396
	397	@c @section Formats
	398
	399	@c @unnumberedsubsubsec Basic format
	400
	401	@c While processing large amounts of the overhead related with explicit
	402	@c ... of the start position and segment length becomes ... . Therefore,
	403	@c for efficiency reasons certain shortcuts are possible:
	404
	405	@c @unnumberedsubsubsec Relative start position
	406
	407	@c Start position may be given as relative distance from the last
	408	@c absolut position.
	409
	410	@c @unnumberedsubsubsec Absent length
	411
	412	@c Segment length may by omitted. Normally it can be restored by counting
	413	@c the length of the @emph{form field}. For segments with the special value
	414	@c @code{*} in the @emph{form field} length 0 is assumed.
	415
	416	@c @unnumberedsubsubsec Absent length and start position
	417
	418	@c Both start position and segment length may be omitted. In this format
	419	@c each segment is assumed to follow the previous one. This format is,
	420	@c therefore, suitable only for unambiguously tagged text
	421	@c (0-length markers can be still used.)
	422
	423
	424	@c @table @code
	425	@c @item AL
	426	@c @code{1234 03 W kot}
	427	@c @item RL
	428	@c @code{+56 03 W kot}
	429	@c @item A
	430	@c @code{1234 W kot}
	431	@c @item R
	432	@c @code{+56 W kot}
	433	@c @item 0
	434	@c @code{W kot}
	435	@c @end table
	436
	437
[9ace5d2]	438	@c [HOW TO GET POLISH FONTS IN DVI???]
[25ae32e]	439
	440	@macro parhelp
	441	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	442	Print help.
	443	@end macro
	444
	445
	446	@macro parversion
	447	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	448	Print version information.
	449	@end macro
	450
	451	@macro parinteractive
	452	@item @b{@minus{}@minus{}interactive, @minus{}i}
	453	This option toggles interactive mode, which is by default off. In the
	454	interactive mode the program does not buffer the output.
	455	@end macro
	456
	457
	458	@c @macro parfile
	459	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	460	@c Input file name.
	461	@c If this option is absent or equal to '@minus{}', the program
	462	@c reads from the standard input.
	463	@c @end macro
	464
	465
	466	@c @macro paroutput
	467	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	468	@c Regular output file name. To regular output the program sends segments
	469	@c which it successfully processed and copies those which were not
	470	@c subject to processing. If this option is absent or equal to
	471	@c '@minus{}', standard output is used.
	472	@c @end macro
	473
	474	@c @macro parfail
	475	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
	476	@c Fail output file name. To fail output the program copies the segments
	477	@c it failed to process. If this option is absent or equal to
	478	@c '@minus{}', standard output is used.
	479	@c @end macro
	480
	481
	482	@c @macro parcopy
	483	@c @item @b{@minus{}@minus{}copy, @minus{}c}
	484	@c Copy succesfully processed segments to regular output also in their
	485	@c original input form.
	486	@c @end macro
	487
	488
	489	@macro parinputfield
	490	@item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	491	The field containing the input to the program. The default is the
	492	@var{form} field. The fields @var{start}, @var{length}, @var{type},
	493	and @var{form} are referred to as @code{1}, @code{2}, @code{3},
	494	@code{4}, respectively.
	495	@end macro
	496
	497
	498	@macro paroutputfield
	499	@item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	500	The name of the field added by the program. The default is the name of the program.
	501	@end macro
	502
	503
	504	@macro pardictionary
	505	@item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
	506	Dictionary file name.
	507	@end macro
	508
	509
	510	@macro parprocess
	511	@item @b{@minus{}@minus{}process=@var{type}, @minus{}p @var{type}}
	512	Process segments with the specified value in the @var{type} field.
	513	Multiple occurrences of this option are allowed and are interpreted as
	514	disjunction. If this option is absent, all segments are processed.
	515	@end macro
	516
	517
	518	@macro parselect
	519	@item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
	520	Select for processing only segments in which the field named
	521	@var{fieldname} is present. Multiple occurrences of this option are
	522	allowed and are interpreted as conjunction of conditions. If this
	523	option is absent, all segments are processed.
	524	@end macro
	525
	526
	527	@macro parunselect
	528	@item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
	529	Select for processing only segments in which the field @var{fieldname}
	530	is absent. Multiple occurrences of this option are allowed and are
	531	interpreted as conjunction of conditions. If this option is absent,
	532	all segments are processed.
	533	@end macro
	534
	535
	536	@macro paroneline
	537	@item @b{@minus{}@minus{}one-line}
	538	This option makes the program print ambiguous annotation in one output
	539	line by generating multiple annotation fields. By default, when
	540	ambiguous annotation may be produced for a segment, the segment is
	541	replicated and each of the annotations is added to a separate copy of
	542	the segment.
	543	@end macro
	544
	545
	546	@macro paronefield
	547	@item @b{@minus{}@minus{}one-field, @minus{}1}
	548	This option makes the program print ambiguous annotation in one
	549	annotation field. By default, when ambiguous annotation may be produced
	550	for a segment, the segment is replicated and each of the
	551	annotations is added to a separate copy of the segment.
	552
	553	This option is useful when working with @command{kot} or @command{con}.
	554	@end macro
	555
	556
	557	@c ---------------------------------------------------------------------
	558	@c CONFIGURATION FILES
	559	@c ---------------------------------------------------------------------
	560
	561	@node Configuration files
	562	@chapter Configuration files
	563
	564	Values for all command line options accepted by a component
	565	may be set in configuration files. The default locations of the
	566	configuration files for a component named @command{@var{program}} are
	567
	568	@example
[246900a]	569	@file{/usr/local/etc/utt/@var{program}.conf}
[25ae32e]	570	@end example
	571
	572	for system-wide configuration file and
	573
	574	@example
[246900a]	575	@file{~/.utt/@var{program}.conf}
[25ae32e]	576	@end example
	577
	578	for user configuration file.
	579
	580	@c The configuration file to load may be also specified with the
	581	@c @option{--config} option. Configuration file need not be provided.
	582
	583	For each option, the value is set according to the following priority:
	584
	585	@itemize
	586	@item command line
	587	@c @item configuration file indicated with @option{--config} option
	588	@item user configuration file (or configuration file indicated with the @option{--config} option)
	589	@item system-wide configuration file
	590	@end itemize
	591
	592	Parameter values are specified in the following format:
	593
	594	@var{parametername}=@var{value}
	595
	596	where @var{parametername} is the short or long name of an option accepted by
	597	the program, or
	598
	599	@var{parametername}
	600
	601	if the option does not need arguments.
	602
	603	You can introduce comments to configuration files using the # sign.
	604
	605	If a program accepts multiple occurrences of an option (e.g. @command{lem}'s @option{--select} option), you can specify them on separate lines of the program's configuration file.
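The configuration file syntax and the priority order described in this chapter can be sketched in Python. This is a rough model rather than the loader used by the UTT programs: the function names are hypothetical, a `#` sign is assumed to start a comment anywhere on a line, and a higher-priority source is assumed to replace (not extend) the values of the same option from a lower-priority one.

```python
def parse_config(text):
    """Return a dict mapping each parameter name to the list of its values.

    Options written without '=' (flags) get the value None. An option that
    occurs on several lines collects all its values in order.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # assumption: '#' starts a comment
        if not line:
            continue
        name, sep, value = line.partition("=")
        options.setdefault(name.strip(), []).append(value.strip() if sep else None)
    return options


def effective_options(command_line, user_conf, system_conf):
    """Combine option sources: command line over user file over system file."""
    merged = dict(system_conf)
    merged.update(user_conf)      # user file overrides the system-wide file
    merged.update(command_line)   # command line overrides both
    return merged
```

For example, a hypothetical user file containing the lines `select=lem` and `select=cor` yields `{"select": ["lem", "cor"]}` from `parse_config`.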
	606
	607	@c The equal sign may be omitted.
	608
	609
	610	@quotation Tip
	611	If you have two (or more) frequently used sets of options for the same
	612	program (e.g. lem with the PMDBF dictionary and lem with a user dictionary),
	613	a good solution is to create two soft links to lem, called
	614	e.g. lemg and lemu, and specify their configuration in files lemg.conf
	615	and lemu.conf respectively.
	616	@end quotation
	617
	618	@c ---------------------------------------------------------------------
	619	@c COMPONENTS
	620	@c ---------------------------------------------------------------------
	621
	622	@node UTT components
	623	@chapter UTT components
	624
	625	UTT components are of three types:
	626
	627	@menu
	628	Sources: programs which read non-UTT data (e.g. raw text) and produce output
	629	in UTT format
	630	* tok:: a tokenizer
	631
	632	Filters: programs which read and produce UTT-formatted data
	633	* lem:: a morphological analyzer
	634	* gue:: a morphological guesser
[261bf62]	635	* cor:: a simple spelling corrector
	636	* kor:: a more elaborate spelling corrector
[25ae32e]	637	* sen:: a sentencizer
	638	* ser:: a pattern search tool (marks matches)
[261bf62]	639	* mar:: a pattern search tool (introduces arbitrary markers into the text)
[25ae32e]	640	* grp:: a pattern search tool (selects sentences containing a match)
[261bf62]	641	@c * gph:: a word-graph annotation tool::
	642	@c * dgp:: a dependency parser
[25ae32e]	643
	644	Sinks: programs which read UTT data and produce output in another format
	645	* kot:: an untokenizer
	646	* con:: a concordance table generator
	647	@end menu
	648
	649	@c ---------------------------------------------------------------------
	650	@c TOK
	651	@c ---------------------------------------------------------------------
	652
	653	@page
	654	@node tok
	655	@section tok - a tokenizer
	656
	657	@c ----------------------------------------
	658
	659	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	660	@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	661	@item @strong{Component category:} @tab source
[261bf62]	662	@item @strong{Input format:} @tab raw text file
	663	@item @strong{Output format:} @tab UTT regular
	664	@item @strong{Required annotation:} @tab -
[25ae32e]	665	@end multitable
	666
	667
	668	@menu
	669	* tok description::
	670	* tok input::
	671	* tok output::
	672	* tok command line options::
	673	* tok example::
	674	@end menu
	675
	676	@node tok description
	677	@subsection Description
	678
	679	@code{tok} is a simple program which reads a text file and identifies
	680	tokens on the basis of their orthographic form. The type of the token
	681	is printed as the @var{type} field.
	682
	683	@node tok input
	684	@subsection Input
	685
	686	Raw text.
	687
	688	@node tok output
	689	@subsection Output
	690
	691	UTT-file with four fields: @var{start}, @var{length}, @var{type}, and @var{form}. In the @var{type} field five types of tokens are distinguished:
	692
	693	@itemize
	694
	695	@item @code{W}
	696	(word)
	697	- continuous sequence of letters
	698
	699	@item @code{N}
	700	(number)
	701	- continuous sequence of digits
	702
	703	@item @code{S}
	704	(space)
	705	- continuous sequence of space characters
	706
	707	@item @code{P}
	708	(punctuation mark)
	709	- single printable character not belonging to any of the other classes
	710
	711	@item @code{B}
	712	(unprintable character)
	713	- single unprintable character
	714
	715	@end itemize
	716
	717
	718
	719	@node tok command line options
	720	@subsection Command line options
	721
	722	@table @code
	723
	724	@item @b{@minus{}@minus{}help}, @b{@minus{}h}
	725	Print help.
	726
	727	@item @b{@minus{}@minus{}version}, @b{@minus{}V}
	728	Print version information.
	729
	730	@item @b{@minus{}@minus{}interactive, @minus{}i}
	731	This option toggles interactive mode, which is by default off. In the
	732	interactive mode the program does not buffer the output.
	733
	734	@end table
	735
	736	@node tok example
	737	@subsection Example
	738
	739	Input:
	740
	741	@example
	742	Piszemy dobre programy.
	743	@end example
	744
	745	Output:
	746
	747	@example
	748	0000 07 W Piszemy
	749	0007 01 S _
	750	0008 05 W dobre
	751	0013 01 S _
	752	0014 08 W programy
	753	0022 01 P .
	754	0023 01 S \n
	755	@end example
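The token classes and the output shown above can be imitated with a few lines of Python. This is a simplified sketch, not the implementation of tok: the function names are hypothetical, character classes come from Python's Unicode-aware string methods instead of a single-byte locale, and the `\*` escape in the form field is an assumption.

```python
from itertools import groupby

# Characters that need a special representation in the form field.
FORM_ENCODINGS = {" ": "_", "\n": "\\n", "\t": "\\t", "\r": "\\r",
                  "_": "\\_", "*": "\\*", "\\": "\\\\"}


def char_class(ch):
    """Map a character to one of the five token types."""
    if ch.isalpha():
        return "W"   # word: letters
    if ch.isdigit():
        return "N"   # number: digits
    if ch.isspace():
        return "S"   # space characters, including newline
    if ch.isprintable():
        return "P"   # any other printable character
    return "B"       # unprintable character


def tokenize(text):
    """Return UTT segment lines (start, length, type, form) for raw text."""
    lines = []
    pos = 0
    for cls, group in groupby(text, key=char_class):
        run = "".join(group)
        # W, N and S tokens are continuous sequences; P and B are single characters.
        pieces = [run] if cls in ("W", "N", "S") else list(run)
        for piece in pieces:
            form = "".join(FORM_ENCODINGS.get(c, c) for c in piece)
            lines.append("%04d %02d %s %s" % (pos, len(piece), cls, form))
            pos += len(piece)
    return lines
```

Calling `tokenize` on the input line of this example followed by a newline character reproduces the seven output segments listed above.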
	756
	757
	758	@c ---------------------------------------------------------------------
	759	@c SEN
	760	@c ---------------------------------------------------------------------
	761
	762	@c @node sen - sentencizer
	763	@c @chapter sen - sentencizer
	764
[9ace5d2]	765	@c Authors: Tomasz Obrębski
[25ae32e]	766
	767	@c ---------------------------------------------------------------------
	768	@c LEM
	769	@c ---------------------------------------------------------------------
	770
	771	@page
	772	@node lem
	773	@section lem - morphological analyzer
	774
	775	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	776	@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
[25ae32e]	777	@item @strong{Component category:} @tab filter
[261bf62]	778	@item @strong{Input format:} @tab UTT regular
	779	@item @strong{Output format:} @tab UTT regular
	780	@item @strong{Required annotation:} @tab tok
[25ae32e]	781	@end multitable
	782
	783	@menu
	784	* lem description::
	785	* lem command line options::
	786	* lem input::
	787	* lem output::
	788	* lem example::
	789	* lem dictionaries::
	790	* lem hints::
	791	@end menu
	792
	793	@node lem description
	794	@subsection Description
	795
	796	@command{lem} performs morphological analysis of a simple orthographic
	797	word, returning all its possible morphological annotations,
	798	disregarding the context.
	799
	800	@c ----------------------------------------
	801
	802	@node lem command line options
	803	@subsection Command line options
	804
	805	@table @code
	806	@parhelp
	807	@parversion
	808	@parinteractive
	809	@c @parfile
	810	@c @paroutput
	811	@c @parfail
	812	@c @parcopy
	813	@parinputfield
	814	@paroutputfield
	815	@pardictionary
	816	@parprocess
	817	@parselect
	818	@parunselect
	819	@paroneline
	820	@paronefield
	821	@end table
	822
	823	@c ----------------------------------------
	824
	825	@node lem input
	826	@subsection Input
	827
	828	@command{lem} reads a UTT file and processes the value of the @var{form} field
	829	(the input field may be changed with @option{--input-field} option).
	830
	831	@node lem output
	832	@subsection Output
	833
	834	@command{lem} adds a new annotation field, whose default name is @code{lem}. In
	835	case of ambiguity, either the segment is replicated (default),
	836	multiple @code{lem} fields are added (@option{--one-line}), or the ambiguous
	837	annotation is produced as the value of a single @code{lem} field (option
	838	@option{--one-field,-1}):
	839
	840	@itemize @bullet
	841
	842	@item
	843	unambiguous value format:
	844
	845	@example
	846	<lemma>,<descr>
	847	@end example
	848
	849	@item
	850	ambiguous value format (@option{--one-field} option)
	851
	852
	853	@example
	854	<lemma>,<descr>[,<descr>][;<lemma>,<descr>[,<descr>]]
	855	@end example
	856
	857	(alternative descriptions for the same lemma are separated by commas,
	858	alternative lemmata are separated by semicolons.)
	859
	860	@end itemize
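A consumer of the one-field format has to split the value back into readings. The Python sketch below shows one way to do it; the function name is hypothetical, and it assumes that lemmata and descriptions contain neither commas nor semicolons.

```python
def parse_lem_value(value):
    """Split a lem field value into a list of (lemma, [descr, ...]) pairs."""
    readings = []
    for alternative in value.split(";"):      # semicolons separate lemmata
        lemma, *descrs = alternative.split(",")  # commas separate descriptions
        readings.append((lemma, descrs))
    return readings
```

An unambiguous value is simply the special case with one lemma and one description.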
	861
	862	@node lem example
	863	@subsection Example
	864
	865	Input:
	866
	867	@example
	868	0000 07 W Piszemy
	869	0007 01 S _
	870	0008 05 W dobre
	871	0013 01 S _
	872	0014 08 W programy
	873	0022 01 P .
	874	0023 01 S \n
	875	@end example
	876
	877	Output (default):
	878
	879	@example
[9ace5d2]	880	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	881	0007 01 S _
	882	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
	883	0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
	884	0013 01 S _
	885	0014 08 W programy lem:program,N/GiNpCa
	886	0014 08 W programy lem:program,N/GiNpCn
	887	0014 08 W programy lem:program,N/GiNpCv
	888	0022 01 P .
	889	0023 01 S \n
	890	@end example
	891
	892	Output (@option{--one-line} option):
	893
	894	@example
[9ace5d2]	895	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	896	0007 01 S _
	897	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
	898	0013 01 S _
	899	0014 08 W programy lem:program,N/GiNpCa lem:program,N/GiNpCn lem:program,N/GiNpCv
	900	0022 01 P .
	901	0023 01 S \n
	902	@end example
	903
	904	Output (@option{--one-field} option):
	905
	906	@example
[9ace5d2]	907	0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1
[25ae32e]	908	0007 01 S _
	909	0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
	910	0013 01 S _
	911	0014 08 W programy lem:program,N/GiNpCa,N/GiNpCn,N/GiNpCv
	912	0022 01 P .
	913	0023 01 S \n
	914	@end example
	915
	916	@c ----------------------------------------
	917
	918	@node lem dictionaries
	919	@subsection Dictionaries
	920
	921	@command{lem} requires a dictionary. The dictionary may be provided in
	922	one of two formats: in text (source) format or in binary (fsa) format.
	923
	924	@subsubheading Text format
	925
	926	Dictionary entries have the following structure:
	927
	928	@example
	929	<form>;<lemma>,<descr>[;<lemma>,<descr>]
	930	@end example
	931
	932	@var{lemma} may be given explicitly or in the cut-add format:
	933
	934	@example
	935	@code{[<cut1><add1>-]<cut2><add2>}
	936	@end example
	937
	938	meaning: replace prefix of length @code{<cut1>} with
	939	string @code{<add1>}, replace suffix of length @code{<cut2>} with string
	940	@code{<add2>}. For example @code{3t} transforms @samp{kocie} into
[9ace5d2]	941	@samp{kot}, @code{3-4ały} transforms @samp{najbielsi} into @samp{biały}.
[25ae32e]	942
	943	Each dictionary entry must be written in one line and must not contain blank characters.
	944
	945	Examples:
	946	@example
	947	kot;0,N/GaNsCn
	948	kota;1,N/GaNsCg;1,N/GaNsCa
	949	kotu;1,N/GaNsCd
	950	kotem;2,N/GaNsCi
	951	kocie;3t,N/GaNsCl;3t,N/GaNsCv
[9ace5d2]	952	najbielsi;3-4ały,ADJ/DsNpCnGp
	953	najbielsze;3-5ały,ADJ/DsNpCnGaifn
[25ae32e]	954	najlepsi;dobry,ADJ/DsNpCnGp
	955	najlepsze;dobry,ADJ/DsNpCnGaifn
	956	@end example
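The cut-add notation can be made concrete with a small Python sketch that expands the entries listed above. It is an illustration only, not the code of lem or compiledic: the function names are hypothetical, and a lemma field is assumed to be in cut-add format exactly when it begins with a digit.

```python
import re

# Optional "<cut1><add1>-" prefix part, then the "<cut2><add2>" suffix part.
CUT_ADD = re.compile(r"^(?:(\d+)([^-\d]*)-)?(\d+)(.*)$")


def apply_lemma_spec(form, spec):
    """Return the lemma of form, given an explicit lemma or a cut-add spec."""
    match = CUT_ADD.match(spec)
    if match is None:
        return spec                       # explicit lemma, e.g. "dobry"
    cut1, add1, cut2, add2 = match.groups()
    stem = form
    if cut1 is not None:
        stem = add1 + stem[int(cut1):]    # replace a prefix of length cut1
    if int(cut2) > 0:
        stem = stem[:-int(cut2)]          # drop a suffix of length cut2
    return stem + add2


def parse_entry(line):
    """Parse '<form>;<lemma>,<descr>[;<lemma>,<descr>]' into (form, readings)."""
    form, *alternatives = line.split(";")
    readings = []
    for alternative in alternatives:
        spec, descr = alternative.split(",", 1)
        readings.append((apply_lemma_spec(form, spec), descr))
    return form, readings
```

With these definitions the entry for kocie expands to the lemma kot with two descriptions, and the entry for najlepsi keeps its explicitly given lemma dobry.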
	957
	958
	959	The mandatory file name extension for a text dictionary is @code{dic}. For large
	960	dictionaries it is preferable, however, to compile them into binary
	961	(fsa) format.
	962
	963	@subsubheading Binary format
	964
	965	The mandatory file name extension for a binary dictionary is @code{bin}. To
	966	compile a text dictionary into binary format, write:
	967
	968	@example
	969	compiledic <dictionaryname>.dic
	970	@end example
	971
	972	@subsubheading Polex/PMDBF dictionary
	973
	974	A large-coverage morphological dictionary for the Polish language, Polex/PMDBF, is included in
	975	the distribution as the default dictionary of @command{lem}. It is
	976	located by default in:
	977
[261bf62]	978	@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}
	979
	980	in local installation or in
	981
	982	@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}
	983
	984	in system installation.
[25ae32e]	985
	986	@node lem hints
	987	@subsection Hints
	988
[261bf62]	989	@subsubheading Combining data from multiple dictionaries
[25ae32e]	990
[261bf62]	991	@itemize
[25ae32e]	992
[261bf62]	993	@item Apply <dict1>, then apply <dict2> to words which were not annotated.
[25ae32e]	994
[261bf62]	995	@example
	996	lem -d <dict1> | lem -S lem -d <dict2>
	997	@end example
[25ae32e]	998
[261bf62]	999	@item Add annotations from two dictionaries <dict1> and <dict2>.
[25ae32e]	1000
[261bf62]	1001	@example
lem -c -d <dict1> | lem -S lem -d <dict2>
	1003	@end example
[25ae32e]	1004
[261bf62]	1005	@end itemize
[25ae32e]	1006
	1007
	1008	@c ---------------------------------------------------------------------
	1009	@c GUE
	1010	@c ---------------------------------------------------------------------
	1011
	1012	@page
	1013	@node gue
	1014	@section gue - morphological guesser
	1015
	1016	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1017
@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
[25ae32e]	1019	@item @strong{Component category:} @tab filter
	1020
	1021	@end multitable
	1022
	1023	@menu
[261bf62]	1024	* gue description::
[25ae32e]	1025	* gue command line options::
	1026	* gue example::
	1027	* gue dictionaries::
	1028	@end menu
	1029
[261bf62]	1030
	1031	@node gue description
	1032	@subsection Description
	1033
@command{gue} guesses morphological descriptions of the form contained
in the @var{form} field.
	1036
	1037
[25ae32e]	1038	@node gue command line options
	1039	@subsection Command line options
	1040
	1041	@table @code
	1042
	1043	@parhelp
	1044	@parversion
	1045	@parinteractive
	1046	@c @parfile
	1047	@c @paroutput
	1048	@c @parfail
	1049	@c @parcopy
	1050	@parinputfield
	1051	@paroutputfield
	1052	@pardictionary
	1053	@parprocess
	1054	@parselect
	1055	@parunselect
	1056	@paroneline
	1057	@paronefield
	1058
	1059	@item @b{@minus{}@minus{}delta=@var{n}}
Stop displaying answers after a drop in weight, that is, when the weight difference between two consecutive results exceeds the delta value (default=`0.2').
	1061
	1062
	1063	@item @b{@minus{}@minus{}cut-off=@var{n}}
Do not display answers whose weight is lower than the cut-off value (default=`200').
	1065
	1066
	1067	@item @b{@minus{}@minus{}guess_count=@var{n}, @minus{}n @var{n}}
	1068	Guess up to n descriptions (default=`0', which means 'display all results').
	1069
	1070
	1071
	1072	@end table
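
The interplay of the three limiting options can be sketched in Python as follows. This is only an illustration, not the code of @command{gue}: the function name @code{filter_guesses} is hypothetical, the answers are assumed to be ordered by decreasing weight, and the delta value is interpreted as a drop relative to the previous weight, which is an assumption not confirmed by this manual.

```python
def filter_guesses(guesses, delta=0.2, cut_off=200, guess_count=0):
    """Limit a list of (description, weight) pairs the way the options suggest."""
    ranked = sorted(guesses, key=lambda guess: guess[1], reverse=True)
    kept = []
    for description, weight in ranked:
        if weight < cut_off:
            break  # cut-off: all remaining answers weigh even less
        if kept:
            previous = kept[-1][1]
            # ASSUMPTION: delta is the drop relative to the previous weight.
            if (previous - weight) / previous > delta:
                break
        kept.append((description, weight))
        if guess_count and len(kept) == guess_count:
            break  # guess_count: 0 means "no limit"
    return kept


# Hypothetical answers with invented weights.
answers = [("A", 900), ("B", 850), ("C", 500), ("D", 100)]
print(filter_guesses(answers))
```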
	1073
	1074	@node gue example
	1075	@subsection Example
	1076
	1077	@example
	1078	command: gue -n 2
	1079
	1080	input:
	1081	0000 07 W smerfny
	1082
	1083	output:
	1084	0000 07 W smerfny gue:,ADJ/CaDpGiNs
	1085	0000 07 W smerfny gue:,ADJ/CnvDpGaipNs
	1086	@end example
	1087
	1088
	1089	@node gue dictionaries
	1090	@subsection Dictionaries
	1091
	1092	@command{gue} requires a dictionary. For now, the dictionary must be provided in binary (fsa) format.
	1093	The fsa format is created by compiling text-format dictionaries.
	1094
	1095
	1096
	1097	@subsubheading Text format
	1098
	1099	Dictionary entries have the following structure:
	1100
	1101	@example
	1102	@var{prefix}@code{*}@var{suffix}@code{;}@var{lemma}@code{,}@var{description}@code{:}@var{weight}
	1103	@end example
	1104
	1105	@var{lemma} must be given in the cut-add format:
	1106
	1107	@example
	1108	@code{[<cut1><add1>-]<cut2><add2>}
	1109	@end example
(no spaces in between): replace the prefix of length @var{cut1} with the
string @var{add1}, and replace the suffix of length @var{cut2} with the string
@var{add2}.
	1113
	1114
Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały}.
[25ae32e]	1116
	1117
@var{description} contains the part of speech and morphosyntactic information (@pxref{PMDBF dictionary}).
	1119
	1120	@var{weight} is an integer value between 1 and 999 indicating the
	1121	likelihood of the guess.
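
The entry structure can be made concrete with a small Python sketch that splits one entry into its fields. The names @code{GueEntry}, @code{parse_gue_entry} and @code{entry_matches} are hypothetical, the sample entry and its weight are invented for illustration, and the reading of @code{*} as "any string between the prefix and the suffix" is an assumption.

```python
from typing import NamedTuple


class GueEntry(NamedTuple):
    prefix: str
    suffix: str
    lemma: str        # cut-add code
    description: str
    weight: int


def parse_gue_entry(line):
    """Split 'prefix*suffix;lemma,description:weight' into its fields."""
    pattern, rest = line.strip().split(";", 1)
    prefix, suffix = pattern.split("*", 1)
    lemma, rest = rest.split(",", 1)
    description, weight = rest.rsplit(":", 1)
    return GueEntry(prefix, suffix, lemma, description, int(weight))


def entry_matches(entry, form):
    """ASSUMPTION: '*' stands for any string between prefix and suffix."""
    return (len(form) >= len(entry.prefix) + len(entry.suffix)
            and form.startswith(entry.prefix)
            and form.endswith(entry.suffix))


# Invented entry; the weight 500 is not taken from any real dictionary.
entry = parse_gue_entry("naj*elsi;3-4ały,ADJ/DsNpCnGp:500")
print(entry, entry_matches(entry, "najbielsi"))
```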
	1122
[9ace5d2]	1123	@c @example
	1124	@c *ÅkÄ;1a,N/GfNsCa
	1125	@c naj*elszy;3-4aÅy,ADJ/...:...
	1126	@c @end example
[25ae32e]	1127
	1128
	1129	@c ---------------------------------------------------------------------
	1130	@c COR
	1131	@c ---------------------------------------------------------------------
	1132
	1133	@page
	1134	@node cor
	1135	@section cor - spelling corrector
	1136
	1137	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
[25ae32e]	1139	@item @strong{Component category:} @tab filter
[261bf62]	1140	@item @strong{Input format:} @tab UTT regular
	1141	@item @strong{Output format:} @tab UTT regular
	1142	@item @strong{Required annotation:} @tab tok
[25ae32e]	1143	@end multitable
	1144
[261bf62]	1145	@menu
	1146	* cor description::
	1147	* cor command line options::
	1148	* cor dictionaries::
	1149	@end menu
	1150
	1151
	1152	@node cor description
	1153	@subsection Description
	1154
[25ae32e]	1155	The spelling corrector applies Kemal Oflazer's dynamic programming
	1156	algorithm @cite{oflazer96} to the FSA representation of the set of
word forms of the Polex/PMDBF dictionary. Given an incorrect
word form, it returns all word forms present in the dictionary whose
edit distance from that form does not exceed the threshold given as the parameter.
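
The effect of the threshold can be illustrated with a naive Python sketch. It is not Oflazer's algorithm: instead of traversing an FSA it compares the input with every word of a plain list, using the textbook Levenshtein distance without transpositions. The function names are hypothetical, and the threshold is treated as an inclusive maximum.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def corrections(form, dictionary, distance=1):
    """Return the dictionary words within the given edit distance."""
    return [word for word in dictionary if edit_distance(form, word) <= distance]


words = ["odlot", "odlotowy", "odludek"]
print(corrections("odlut", words))  # ['odlot']
```

The exhaustive comparison is linear in the size of the word list, which is the cost that an FSA-based search is designed to avoid.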
	1160
	1161
	1162	@node cor command line options
	1163	@subsection Command line options
	1164
	1165	@table @code
	1166
	1167	@parhelp
	1168	@parversion
	1169	@parinteractive
	1170	@c @parfile
	1171	@c @paroutput
	1172	@c @parfail
	1173	@c @parcopy
	1174	@parinputfield
	1175	@paroutputfield
	1176	@pardictionary
	1177	@parprocess
	1178	@parselect
	1179	@parunselect
	1180	@paroneline
	1181	@paronefield
	1182
	1183	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
	1184	Maximum edit distance (default='1').
	1185
[261bf62]	1186	@c @item @b{@minus{}@minus{}replace, @minus{}r}
	1187	@c Replace original form with corrected form, place original form in the
	1188	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
	1189
[25ae32e]	1190
	1191	@end table
	1192
	1193	@node cor dictionaries
	1194	@subsection Dictionaries
	1195
	1196	@command{cor} requires a dictionary. The dictionary has to be provided in binary (fsa) format.
	1197	The fsa format is created by compiling text-format dictionaries.
	1198
	1199	@subsubheading Text format
	1200
	1201	The @command{cor} dictionary is a list of words:
	1202	@example
	1203	odlot
	1204	odlotowy
	1205	odludek
	1206	@end example
	1207
[261bf62]	1208	@subsubheading Binary format
	1209
	1210	The mandatory file name extension for a binary dictionary is @code{bin}. To
	1211	compile a text dictionary into binary format, write:
	1212
	1213	@example
	1214	compiledic <dictionaryname>.dic
	1215	@end example
	1216
	1217	@c ---------------------------------------------------------------------
	1218	@c KOR
	1219	@c ---------------------------------------------------------------------
	1220
	1221	@page
	1222	@node kor
	1223	@section kor - configurable spelling corrector
	1224
[9ace5d2]	1225	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Paweł Werenski, Tomasz Obrębski, Michał Stolarski
	1227	@item @strong{Component category:} @tab filter
	1228	@item @strong{Input format:} @tab UTT regular
	1229	@item @strong{Output format:} @tab UTT regular
	1230	@item @strong{Required annotation:} @tab tok
	1231	@end multitable
	1232
	1233	@menu
	1234	* kor description::
	1235	* kor command line options::
	1236	* kor weights definition file::
	1237	* kor dictionaries::
	1238	@end menu
	1239
	1240
	1241	@node kor description
	1242	@subsection Description
	1243
The spelling corrector applies Paweł Werenski's dynamic programming
algorithm to the FSA representation of the set of word forms of the
Polex/PMDBF dictionary. The algorithm is an extension of K. Oflazer's
algorithm used by @command{cor}. In the extended version it is
possible to assign weights to individual edit operations.

Given an incorrect word form, it returns all word forms
present in the dictionary whose edit distance from that form does not exceed the
threshold given as the parameter.
	1253
	1254
	1255	@node kor command line options
	1256	@subsection Command line options
	1257
	1258	@table @code
	1259
	1260	@parhelp
	1261	@parversion
	1262	@parinteractive
	1263	@c @parfile
	1264	@c @paroutput
	1265	@c @parfail
	1266	@c @parcopy
	1267	@parinputfield
	1268	@paroutputfield
	1269	@pardictionary
	1270	@parprocess
	1271	@parselect
	1272	@parunselect
	1273	@paroneline
	1274	@paronefield
	1275
	1276	@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
	1277	Maximum edit distance (default='1').
	1278
	1279	@item @b{@minus{}@minus{}weights=@var{filename}, @minus{}w @var{filename}}
	1280	Edit operations' weights file.
	1281
	1282	@c @item @b{@minus{}@minus{}replace, @minus{}r}
	1283	@c Replace original form with corrected form, place original form in the
	1284	@c cor field. This option has no effect in @option{--one-*} modes (default=off)
	1285
	1286
	1287	@end table
	1288
	1289
	1290	@node kor weights definition file
	1291	@subsection Weights definition file
	1292
	1293	Example:
	1294
	1295	@example
	1296
	1297	%stdcor 1
	1298	%xchg 1
ż rz 0.5
ch h 0.5
u ó 0.5
	1302
	1303	@end example
	1304
	1305
In this example the default weight is set to 1 (@code{%stdcor 1}), the weight of the exchange
operation is set to 1 (@code{%xchg 1}), and the three principal orthographic
errors are assigned the weight 0.5.
	1309
	1310	The edit operation weight declaration, such as
	1311
	1312	@example
ż rz 0.5
	1314	@end example
	1315
works in both directions, i.e. ż->rz and rz->ż.
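
How such declarations may influence the edit distance is sketched below in Python. This is not the algorithm implemented in @command{kor}: it is a plain dynamic-programming table over two strings, the function names are hypothetical, and the exchange operation (@code{%xchg}) is assumed to mean a transposition of two adjacent characters.

```python
WEIGHTS = """
%stdcor 1
%xchg 1
ż rz 0.5
ch h 0.5
u ó 0.5
"""


def parse_weights(text):
    default, exchange, pairs = 1.0, 1.0, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "%stdcor":
            default = float(fields[1])
        elif fields[0] == "%xchg":
            exchange = float(fields[1])
        else:
            left, right, weight = fields[0], fields[1], float(fields[2])
            pairs[(left, right)] = weight
            pairs[(right, left)] = weight  # a declaration works in both directions
    return default, exchange, pairs


def weighted_distance(src, dst, default, exchange, pairs):
    inf = float("inf")
    n, m = len(src), len(dst)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    def relax(i, j, value):
        if value < cost[i][j]:
            cost[i][j] = value

    for i in range(n + 1):
        for j in range(m + 1):
            here = cost[i][j]
            if i < n:
                relax(i + 1, j, here + default)      # deletion
            if j < m:
                relax(i, j + 1, here + default)      # insertion
            if i < n and j < m:                      # match or substitution
                relax(i + 1, j + 1, here + (0.0 if src[i] == dst[j] else default))
            # ASSUMPTION: %xchg is the transposition of adjacent characters.
            if i + 1 < n and j + 1 < m and src[i] == dst[j + 1] and src[i + 1] == dst[j]:
                relax(i + 2, j + 2, here + exchange)
            for (left, right), weight in pairs.items():  # declared replacements
                if src.startswith(left, i) and dst.startswith(right, j):
                    relax(i + len(left), j + len(right), here + weight)
    return cost[n][m]


default, exchange, pairs = parse_weights(WEIGHTS)
print(weighted_distance("morze", "może", default, exchange, pairs))  # 0.5
```

With these weights a typical orthographic error costs less than an arbitrary substitution, so it fits under a lower distance threshold.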
	1317
	1318	The default weights definition file for @code{kor} is:
	1319
	1320	@example
	1321	$HOME/.local/share/utt/weights.kor
	1322	@end example
	1323
	1324	or, if the above mentioned file is absent:
	1325
	1326	@example
	1327	/usr/local/share/utt/weights.kor
	1328	@end example
	1329
	1330
	1331	@node kor dictionaries
	1332	@subsection Dictionaries
	1333
The dictionary requirements are the same as for @command{cor} (@pxref{cor dictionaries}).
[261bf62]	1335
	1336	@c ---------------------------------------------------------------------
	1337	@c SEN
	1338	@c ---------------------------------------------------------------------
	1339
[25ae32e]	1340	@page
	1341	@node sen
@section sen - a sentencizer
	1343
	1344	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1345
@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1347	@item @strong{Component category:} @tab filter
[261bf62]	1348	@item @strong{Input format:} @tab UTT regular
	1349	@item @strong{Output format:} @tab UTT regular
	1350	@item @strong{Required annotation:} @tab tok
[25ae32e]	1351
	1352	@end multitable
	1353
	1354
	1355	@menu
[261bf62]	1356	* sen description::
[25ae32e]	1357	@c * sen input::
	1358	@c * sen output::
	1359	* sen example::
	1360	@end menu
	1361
[261bf62]	1362	@node sen description
	1363	@subsection Description
	1364
	1365	@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.
	1366
[25ae32e]	1367	@node sen example
	1368	@subsection Example
	1369
	1370	@example
	1371	command: sen
	1372
	1373	input:
0000 05 W Cześć
[25ae32e]	1375	0005 01 P !
	1376	0006 01 S _
	1377	0007 02 W To
	1378	0009 01 S _
	1379	0010 02 W ja
	1380	0012 01 P .
	1381	0013 01 S \n
	1382
	1383	output:
	1384	0000 00 BOS *
0000 05 W Cześć
[25ae32e]	1386	0005 01 P !
	1387	0006 00 EOS *
	1388	0006 00 BOS *
	1389	0006 01 S _
	1390	0007 02 W To
	1391	0009 01 S _
	1392	0010 02 W ja
	1393	0012 01 P .
	1394	0013 01 S \n
	1395	0014 00 EOS *
	1396	@end example
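
A downstream program can use the BOS and EOS segments to regroup the text into sentences. The Python sketch below shows one way to do it. The function name @code{split_sentences} is hypothetical, and the field layout (start, length, type, form) is inferred from the example, so it should be treated as an assumption.

```python
SEN_OUTPUT = [
    "0000 00 BOS *",
    "0000 05 W Cześć",
    "0005 01 P !",
    "0006 00 EOS *",
    "0006 00 BOS *",
    "0006 01 S _",
    "0007 02 W To",
    "0009 01 S _",
    "0010 02 W ja",
    "0012 01 P .",
    "0013 01 S \\n",
    "0014 00 EOS *",
]


def split_sentences(utt_lines):
    """Collect the forms found between each BOS segment and the next EOS."""
    sentences, current = [], None
    for line in utt_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        seg_type, form = fields[2], fields[3]
        if seg_type == "BOS":
            current = []
        elif seg_type == "EOS":
            if current is not None:
                sentences.append(current)
            current = None
        elif current is not None:
            current.append(form)
    return sentences


for sentence in split_sentences(SEN_OUTPUT):
    print(sentence)
```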
	1397
	1398
	1399	@c ---------------------------------------------------------------------
	1400	@c GPH
	1401	@c ---------------------------------------------------------------------
	1402
	1403	@c @node gph - graphizer
	1404	@c @chapter gph - graphizer
	1405
@c Authors: Tomasz Obrębski
[25ae32e]	1407
	1408
	1409
	1410	@c ---------------------------------------------------------------------
[261bf62]	1411	@c SER
[25ae32e]	1412	@c ---------------------------------------------------------------------
	1413
	1414	@page
	1415	@node ser
	1416	@section ser - pattern search tool
	1417
	1418	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1420	@item @strong{Component category:} @tab filter
[261bf62]	1421	@item @strong{Input format:} @tab UTT regular
	1422	@item @strong{Output format:} @tab UTT regular
	1423	@item @strong{Required annotation:} @tab tok, lem --one-field
[25ae32e]	1424	@end multitable
	1425
	1426	@menu
[261bf62]	1427	* ser description::
[25ae32e]	1428	* ser command line options::
	1429	* ser pattern::
	1430	* ser how ser works::
	1431	* ser customization::
	1432	* ser limitations::
	1433	* ser requirements::
	1434	@end menu
	1435
	1436
[261bf62]	1437	@node ser description
	1438	@subsection Description
	1439
	1440	@command{ser} looks for patterns in UTT-formatted texts.
	1441
	1442
[25ae32e]	1443	@c ---------------------------------------------------------------------
	1444	@node ser command line options
	1445	@subsection Command line options
	1446
	1447	@table @code
	1448
	1449	@parhelp
	1450	@parversion
	1451	@c @parfile
	1452	@c @paroutput
	1453	@c @parinputfield
	1454	@c @paroutputfield
	1455	@parprocess
	1456	@parinteractive
	1457
	1458	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1459	The search pattern.
	1460
	1461	@item @b{@minus{}@minus{}morph=@var{field}}
	1462	The name of the annotation field containing the morphological
	1463	description (default @code{lem}).
	1464
	1465	@item @b{@minus{}@minus{}flex}
	1466	Only print the generated flex source code.
	1467
	1468	@item @b{@minus{}@minus{}macro=@var{filename}}
Read macrodefinitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.
	1471
	1472	@item @b{@minus{}@minus{}define=@var{filename}}
Append macrodefinitions from file @var{filename}. This option
makes it possible to extend the set of terms.
	1475
	1476	@end table
	1477
	1478
	1479	@c ---------------------------------------------------------------------
	1480	@node ser pattern
	1481	@subsection Pattern
	1482
	1483	The @command{ser} pattern is a regular expression over terms corresponding
	1484	to text segments or segment sequences. Predefined terms are:
	1485
	1486	@table @code
	1487
	1488	@item seg(@var{t},@var{f},@var{a})
	1489	a segment of type @var{t}, containing form @var{f} and annotation
	1490	@var{a}
	1491
	1492	@item form(@var{f})
	1493	a segment containing form @var{f}
	1494
	1495	@item field(@var{f})
	1496	a segment containing annotation field @var{f}
	1497
	1498	@item space(@var{f})
	1499	a space segment of form @var{f}
	1500
	1501	@item word(@var{f})
	1502	a word segment of form @var{f}
	1503
	1504	@item punct(@var{f})
	1505	a punct segment of form @var{f}
	1506
	1507	@item number(@var{f})
	1508	a number segment of form @var{f}
	1509
	1510	@item lexeme(@var{f})
	1511	a word segment with lemma @var{f}
	1512
	1513	@item cat(@var{c})
	1514	a word segment of category @var{c}
	1515
	1516	@end table
	1517
	1518	All arguments are optional. If an argument is omitted, an arbitrary
	1519	string of non-blank characters is assumed as the argument value. Term
arguments may be arbitrary character-level regular expressions. The
following special symbols can be used:
	1522
	1523	@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1524	@item @code{[@dots{}]} @tab a character class
	1525	@item @code{[^@dots{}]} @tab a negated character class
@item @code{|} @tab alternative
	1527	@item @code{*} @tab repetition, including zero times
	1528	@item @code{+} @tab repetition, at least one time
	1529	@item @code{?} @tab optionality
	1530	@item @code{@{@var{m},@var{n}@}} @tab repetition from @var{m} to @var{n} times
	1531	@item @code{@{@var{m},@}} @tab repetition @var{m} or more times
	1532	@item @code{@{@var{m}@}} @tab repetition @var{m} times
@item @code{\@var{ddd}} @tab the character with octal value @var{ddd}
	1534	@item @code{\x@var{hh}} @tab the character with hexadecimal value @var{hh}
	1535	@item @code{( )} @tab parentheses, used to override precedence
	1536	@c @end multitable
	1537
	1538	@c @multitable {aaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1539	@item @code{.} @tab a non-blank character
	1540	@item @code{\w} @tab a letter
	1541	@item @code{\W} @tab a non-blank character other than a letter
	1542	@item @code{\d} @tab a digit
	1543	@item @code{\D} @tab a non-blank character other than a digit
	1544	@item @code{\s} @tab a space or tab character
	1545	@item @code{\S} @tab a non-blank character (the same as @code{.})
	1546	@item @code{\l} @tab a lowercase letter
	1547	@item @code{\L} @tab an uppercase letter
	1548	@end multitable
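
The non-standard meaning of some of these symbols can be illustrated by translating a term argument into a Python regular expression. This is only a sketch of the semantics, not of the way @command{ser} works (it generates flex code). The function name @code{to_python_regex} is hypothetical, the letter classes list only ASCII and Polish letters, other escapes are copied unchanged, and a dot inside a character class is not treated specially.

```python
import re

LOWER = "a-ząćęłńóśźż"
UPPER = "A-ZĄĆĘŁŃÓŚŹŻ"
SPECIAL = {
    "\\l": "[" + LOWER + "]",          # a lowercase letter
    "\\L": "[" + UPPER + "]",          # an uppercase letter
    "\\w": "[" + LOWER + UPPER + "]",  # a letter
    "\\s": "[ \\t]",                   # a space or tab character
}


def to_python_regex(argument):
    """Translate a term argument into an approximate Python pattern."""
    out, i = [], 0
    while i < len(argument):
        char = argument[i]
        if char == "\\" and i + 1 < len(argument):
            escape = argument[i:i + 2]
            out.append(SPECIAL.get(escape, escape))  # unknown escapes stay as-is
            i += 2
        elif char == ".":
            out.append("\\S")  # '.' never matches a blank character
            i += 1
        else:
            out.append(char)
            i += 1
    return "".join(out)


pattern = to_python_regex(r"\L\l+")
print(bool(re.fullmatch(pattern, "Poznań")))  # True
```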
	1549
	1550
	1551	@noindent The following characters:
	1552	@example
@verb{% [ ] ^ | * + ? { } , . < > \ %}
	1554	@end example
	1555	must be escaped with a backslash, i.e. written as:
	1556	@example
@verb{% \[ \] \^ \| \* \+ \? \{ \} \, \. \< \> \\ %}
	1558	@end example
	1559
	1560	@quotation Note
The special symbols are borrowed from Perl with minor modifications
made for convenience, so the meaning of certain special characters and
sequences differs slightly from their common usage.
The meaning of the @code{.} special character is modified due to
the special function of spaces in UTT files (they are field
separators). Use @code{\s} to explicitly match a space or tab character.
	1568	@end quotation
	1569
	1570	In the argument of the @code{cat} term a special operator <...> may be
	1571	used. A category specification enclosed in angle brackets matches all
	1572	category descriptions which are consistent (non-contradictory) with the
	1573	specification. For example @code{<N>} matches all noun descriptions,
@code{<ADJ/Can>} matches all adjectives in the accusative or nominative case.
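
The notion of consistency can be sketched in Python. The function names are hypothetical and the sketch rests on several assumptions: a category consists of a part-of-speech label, a slash, and attributes written as one uppercase letter followed by lowercase values. The part of speech must be identical, and an attribute missing from the description never causes a contradiction.

```python
import re


def parse_category(text):
    """Split e.g. 'ADJ/CaDpGiNs' into ('ADJ', {'C': {'a'}, 'D': {'p'}, ...})."""
    pos, _, attributes = text.partition("/")
    values = {name: set(vals)
              for name, vals in re.findall(r"([A-Z])([a-z0-9]*)", attributes)}
    return pos, values


def consistent(specification, description):
    """Check a specification (written without the angle brackets)."""
    spec_pos, spec_attrs = parse_category(specification)
    desc_pos, desc_attrs = parse_category(description)
    if spec_pos != desc_pos:
        return False
    for name, wanted in spec_attrs.items():
        present = desc_attrs.get(name)
        # A contradiction arises only when both sides give values
        # for the attribute and no value is shared.
        if present is not None and wanted and not (wanted & present):
            return False
    return True


print(consistent("ADJ/Can", "ADJ/CaDpGiNs"))  # True
print(consistent("ADJ/Can", "ADJ/CgDpGiNs"))  # False
```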
	1575
	1576
	1577	@*
	1578	@noindent @b{Examples of one-segment patterns:}
	1579
	1580	@multitable {aaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1581	@item @code{seg} @tab any segment
	1582	@item @code{word} @tab any word-form
	1583	@item @code{word(pomocy)} @tab the word-form @samp{pomocy}
	1584	@item @code{word(naj.+)} @tab a word-form beginning with @samp{naj}
	1585	@item @code{word(\L\l+)} @tab a capitalized word-form
	1586	@item @code{punct} @tab a punctuation character
	1587	@item @code{space(.\\n.)} @tab a space segment containing a newline character
	1588	@item @code{lexeme(pomoc)} @tab any form of the lexeme 'pomoc'
@item @code{cat(N/.*)} @tab a word whose category starts with @code{N/}
@item @code{cat(<N/Ca>)} @tab a word whose category matches @code{N/Ca}
	1591	@end multitable
	1592
	1593	@*
	1594	@noindent @b{Examples of multi-segment patterns:}
	1595
	1596	@table @code
	1597
	1598	@item (word(\L) punct(\.) space?)+ word(\L\l+)
	1599	a sequence of initials followed by a surname
	1600
@item punct seg(W|S|N)* cat(<NPRO/Sr>) seg(W|S|N)* punct
a text fragment between two punctuation characters, containing an
occurrence of a relative pronoun
	1604
	1605	@end table
	1606
	1607
	1608	@node ser how ser works
	1609	@subsection How ser works
	1610
	1611	@node ser customization
	1612	@subsection Customization
	1613
	1614	@c All predefined terms correspond to single segments,
	1615
	1616	@example
[261bf62]	1617	define(`verbseq', `(cat(<V>) (space cat(<V>)))')
[25ae32e]	1618	@end example
	1619
	1620
	1621	the term @code{cat()} may not be used as a ... of
	1622
	1623	@c See @command{m4} manual for further details on macro definition format.
	1624
	1625	@node ser limitations
	1626	@subsection Limitations
	1627
[261bf62]	1628	Do not use more than 3 attributes in <>.
[25ae32e]	1629
	1630	@node ser requirements
	1631	@subsection Requirements
	1632
	1633	In order to run @command{ser}, the following programs must be
	1634	installed in the system:
	1635
	1636	@itemize
	1637
	1638	@item @command{m4}
	1639	@item @command{grep}
	1640	@item @command{flex}
	1641	@item @command{gcc}
	1642
	1643	@end itemize
	1644
	1645
	1646	@c ---------------------------------------------------------------------
[261bf62]	1647	@c GRP
[25ae32e]	1648	@c ---------------------------------------------------------------------
	1649
	1650	@page
	1651	@node grp
	1652	@section grp - pattern search tool
	1653
	1654	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
[25ae32e]	1656	@item @strong{Component category:} @tab filter
[261bf62]	1657	@item @strong{Input format:} @tab UTT flattened
	1658	@item @strong{Output format:} @tab UTT flattened
	1659	@item @strong{Required annotation:} @tab tok, sen, lem --one-field
[25ae32e]	1660	@end multitable
	1661
	1662
[261bf62]	1663	@menu
	1664	* grp description::
	1665	* grp command line options::
	1666	* grp pattern::
	1667	* grp hints::
	1668	@end menu
	1669
	1670
	1671	@node grp description
	1672	@subsection Description
	1673
@code{grp} selects sentences containing an expression matching a
pattern. The pattern format is exactly the same as that accepted by
@code{ser}.

@code{grp} is intended mainly for speeding up the corpus search process.
It is extremely fast (processing speed is usually higher than the speed
of reading the corpus file from disk).
	1681
	1682	@node grp command line options
	1683	@subsection Command line options
	1684
	1685	@table @code
	1686
	1687	@parhelp
	1688	@parversion
	1689	@parprocess
	1690	@parinteractive
	1691
	1692	@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1693	The search pattern.
	1694
	1695	@item @b{@minus{}@minus{}morph=@var{field}}
	1696	The name of the annotation field containing the morphological
	1697	description (default @code{lem}).
	1698
	1699	@item @b{@minus{}@minus{}command}
	1700	Only print the generated flex source code.
	1701
	1702	@item @b{@minus{}@minus{}macro=@var{filename}}
Read macrodefinitions from file @var{filename} rather than from the
default location. This option makes it possible to redefine the set of terms.
	1705
	1706	@item @b{@minus{}@minus{}define=@var{filename}}
Append macrodefinitions from file @var{filename}. This option
makes it possible to extend the set of terms.
	1709
	1710	@end table
	1711
	1712
	1713	@node grp pattern
	1714	@subsection Pattern
	1715
	1716	(see @code{ser})
	1717
	1718	@node grp hints
	1719	@subsection Hints
	1720
The corpus search speed may be increased by combining @command{grp} with the @command{lzop}
compression tool (@command{grp} usually processes data faster than it is read from a
disk, especially for slow laptop drives).
	1724
	1725	@example
cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
[25ae32e]	1727	@end example
	1728
	1729	@example
lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
[25ae32e]	1731	@end example
	1732
	1733
[261bf62]	1734
[25ae32e]	1735	@c ---------------------------------------------------------------------
[261bf62]	1736	@c MAR
[25ae32e]	1737	@c ---------------------------------------------------------------------
[261bf62]	1738
	1739	@page
	1740	@node mar
	1741	@section mar
	1742
	1743	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Marcin Walas, Tomasz Obrębski
[e28a625]	1745	@item @strong{Input format:} @tab UTT flattened
	1746	@item @strong{Output format:} @tab UTT flattened
	1747	@item @strong{Required annotation:} @tab tok, sen, lem -1
[261bf62]	1748	@end multitable
	1749
	1750	[TODO]
	1751
[e28a625]	1752	(see mar's help 'mar -h' for some information)
	1753
[261bf62]	1754	@c ---------------------------------------------------------------------
	1755	@c KOT
[25ae32e]	1756	@c ---------------------------------------------------------------------
	1757
[261bf62]	1758
[25ae32e]	1759	@page
	1760	@node kot
	1761	@section kot - untokenizer
	1762
[261bf62]	1763	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
[261bf62]	1765	@item @strong{Component category:} @tab filter
	1766	@item @strong{Input format:} @tab UTT regular
	1767	@item @strong{Output format:} @tab text
	1768	@item @strong{Required annotation:} @tab tok
	1769	@end multitable
[25ae32e]	1770
	1771
	1772	@menu
[261bf62]	1773	* kot description::
[25ae32e]	1774	* kot command line options::
	1775	* kot usage examples::
	1776	@end menu
	1777
[261bf62]	1778	@node kot description
	1779	@subsection Description
	1780
	1781	@command{kot} transforms a UTT formatted file back into raw text format.
	1782
[25ae32e]	1783	@node kot command line options
	1784	@subsection Command line options
	1785
	1786	@table @code
	1787
	1788	@parhelp
	1789
	1790	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1791
	1792	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1793
	1794	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1795
	1796	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1797
	1798	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1799
	1800	@item
	1801
	1802	@item @b{@minus{}@minus{}gap-fill=@var{string}, @minus{}g @var{string}}
	1803	print @var{string} between nonadjacent segments of the input file
	1804
	1805	@item @b{@minus{}@minus{}spaces, @minus{}r}
	1806	retain the special characters @code{_}, @code{\t},
	1807	@code{\n}, @code{\r}, @code{\f} unexpanded in the output
	1808
	1809	@end table
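
The behaviour described by these options can be approximated with the Python sketch below. It is not the implementation of @command{kot}. The function name @code{untokenize} is hypothetical, and the sketch assumes that the special characters are expanded only in segments of type S and that zero-length segments contribute no text.

```python
ESCAPES = {"_": " ", "\\t": "\t", "\\n": "\n", "\\r": "\r", "\\f": "\f"}


def untokenize(utt_lines, gap_fill="", keep_spaces=False):
    """Rebuild raw text from UTT segments (start, length, type, form)."""
    text, expected = [], None
    for line in utt_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        start, length = int(fields[0]), int(fields[1])
        seg_type, form = fields[2], fields[3]
        if length == 0:
            continue  # ASSUMPTION: markers such as BOS/EOS carry no text
        if expected is not None and start != expected:
            text.append(gap_fill)  # nonadjacent segments
        if seg_type == "S" and not keep_spaces:
            for mark, char in ESCAPES.items():
                form = form.replace(mark, char)
        text.append(form)
        expected = start + length
    return "".join(text)


segments = [
    "0000 05 W Cześć", "0005 01 P !", "0006 01 S _", "0007 02 W To",
    "0009 01 S _", "0010 02 W ja", "0012 01 P .", "0013 01 S \\n",
]
print(untokenize(segments), end="")
```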
	1810
	1811	@node kot usage examples
	1812	@subsection Usage examples
	1813
	1814	@example
cat legia.txt | tok | kot
	1816	@end example
	1817
	1818	@example
cat legia.txt | tok | lem -1 | kot
	1820	@end example
	1821
[261bf62]	1822	@c ---------------------------------------------------------------
	1823	@c CON
	1824	@c ---------------------------------------------------------------
	1825
[25ae32e]	1826
	1827	@page
	1828	@node con
	1829	@section con - concordance table generator
	1830
	1831	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	1832	@item @strong{Authors:} @tab Justyna Walkowska
	1833	@item @strong{Component category:} @tab sink
[261bf62]	1834	@item @strong{Input format:} @tab UTT regular
	1835	@item @strong{Output format:} @tab text
	1836	@item @strong{Required annotation:} @tab ser or mar
[25ae32e]	1837	@end multitable
	1838	@c
	1839
	1840	@menu
[261bf62]	1841	* con description::
[25ae32e]	1842	* con command line options::
	1843	* con usage example::
	1844	* con hints::
	1845	@end menu
	1846
[261bf62]	1847
	1848	@node con description
	1849	@subsection Description
	1850
	1851	@command{con} generates a concordance table based on a pattern given to @command{ser}.
	1852
	1853
[25ae32e]	1854	@node con command line options
	1855	@subsection Command line options
	1856
	1857	@table @code
	1858
	1859	@parhelp
	1860
	1861	@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
	1862	@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
	1863	@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
	1864	@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
	1865	@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}} [???]
	1866	@c @item @b{@minus{}@minus{}copy, @minus{}c} [???]
	1867	@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
	1868	@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
	1869	@c @item @b{@minus{}@minus{}process=@var{class}, @minus{}p @var{class}}
	1870	@c @item @b{@minus{}@minus{}interactive @minus{}i}
	1871	@c @item @b{@minus{}@minus{}config=@var{filename}}
	1872	@c @item
	1873	@c @item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}}
	1874	@c search pattern
	1875	@c
	1876	@c @item @b{@minus{}@minus{}flex}
	1877	@c only print the generated flex source code
	1878	@c
	1879	@c @item @b{@minus{}@minus{}macro=@var{filename}}
	1880	@c read macrodefinitions from file @var{filename} rather than from
	1881	@c default location. This option allows to redefine the set of terms.
	1882	@c
	1883	@c @item @b{@minus{}@minus{}define=@var{filename}}
	1884	@c append macrodefinitions from file @var{filename}. This option
	1885	@c allows to extend the set of terms.
	1886
	1887	@item @b{@minus{}@minus{}left @minus{}l}
	1888	Left context info (default='30c'). Example:
	1889	@example
	1890	-l=5c: left context is 5 characters
	1891	-l=5w: left context is 5 words
	1892	-l=5s: left context is 5 non-empty input lines
	1893	-l='\s*\S+\sr\S+BOS': left context starts with the given regex
	1894	@end example
	1895
	1896	@item @b{@minus{}@minus{}right @minus{}r}
	1897	Right context info (default='30c').
	1898	@item @b{@minus{}@minus{}trim @minus{}t}
	1899	Clear incomplete words from output.
	1900	@item @b{@minus{}@minus{}white @minus{}w}
Do not change whitespace characters into spaces.
	1902	@item @b{@minus{}@minus{}column @minus{}c}
	1903	Left column minimal width in characters (default = 0).
	1904	@item @b{@minus{}@minus{}ignore @minus{}i}
	1905	Ignore segment inconsistency in the input.
[261bf62]	1906	@item @b{@minus{}@minus{}bom}
[25ae32e]	1907	Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
[261bf62]	1908	@item @b{@minus{}@minus{}eom}
[25ae32e]	1909	End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
	1910	@item @b{@minus{}@minus{}bod}
	1911	Selected segment beginning display string (default='[').
	1912	@item @b{@minus{}@minus{}eod}
	1913	Selected segment end display string (default=']').
	1914
	1915
	1916
	1917	@end table
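
The layout of a single concordance line can be sketched in Python. The sketch covers only contexts measured in characters (the @code{c} unit) and works on raw text rather than on UTT segments. The function name @code{concordance_line} is hypothetical, and padding the left context to the column width on its left side is an assumption.

```python
def concordance_line(text, start, end, left=30, right=30,
                     bod="[", eod="]", column=0):
    """Format text[start:end] with its left and right context."""
    left_context = text[max(0, start - left):start]
    right_context = text[end:end + right]
    # ASSUMPTION: the left column is padded on the left to `column` characters.
    return left_context.rjust(column) + bod + text[start:end] + eod + right_context


text = "Nasz dom stoi nad rzeką"
start = text.index("dom")
print(concordance_line(text, start, start + 3, left=5, right=5))
```

Aligning the left column keeps the selected segments in one vertical line, which is the usual layout of a concordance table.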
	1918
	1919	@node con usage example
	1920	@subsection Usage example
	1921	@example
cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
[25ae32e]	1923	@end example
	1924
	1925
	1926	@node con hints
	1927	@subsection Hints
	1928
	1929	@command{con} is a rather slow program. Do not pass large amounts of
	1930	redundant text through this program. @command{con} works fine in the following
	1931	sequence:
	1932
	1933	@example
... | grp -e EXPR | ser -e EXPR | con
	1935	@end example
	1936
	1937
	1938	@c ---------------------------------------------------------------------
	1939	@c ---------------------------------------------------------------------
	1940
	1941	@page
	1942	@node Auxiliary tools
	1943	@chapter Auxiliary tools
	1944
	1945	@menu
	1946	* compiledic:: dictionary compiler
	1947	* fla:: UTT file flattener
	1948	* unfla:: UTT file unflattener
	1949	@end menu
	1950
	1951
	1952	@page
	1953	@node compiledic
	1954	@section compiledic - the dictionary compiler
	1955
	1956	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski
[25ae32e]	1958	@item @strong{Component category:} @tab additional tool
	1959	@end multitable
	1960	@c
	1961
	1962	@command{compiledic} compiles dictionaries in text format (@code{.dic} extension) into binary
	1963	(FSA) format (@code{.bin} extension).
	1964
	1965	Automaton representation of a dictionary is built using the AT&T tools:
	1966	@itemize
	1967	@item AT&T FSM Library,
	1968	@item AT&T Lextools.
	1969	@end itemize
	1970
	1971	In order for the compiledic program to work you have to install the
	1972	above mentioned packages into your system. They are freely available
	1973	for non-commercial use.
	1974
	1975	Usage:
	1976	@example
	1977	compiledic <dictionaryname>.dic
	1978	@end example
	1979
	1980	The file <dictionaryname>.bin will be generated.
	1981
Remark: The program produces a lot of temporary files which are
stored in the current directory. They are deleted after successful
termination of the program.
	1985
	1986	@c @menu
	1987	@c * con command line options::
	1988	@c * con usage example::
	1989	@c * con hints::
	1990	@c @end menu
	1991
	1992
[e28a625]	1993	@c -------------------------------------------------------------------------------
	1994	@c FLA
	1995	@c -------------------------------------------------------------------------------
	1996
[25ae32e]	1997	@page
	1998	@node fla
	1999	@section fla - the UTT file flattener
	2000
	2001	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	2002	@item @strong{Authors:} @tab Tomasz Obrębski
[e28a625]	2003	@item @strong{Input format:} @tab UTT regular
	2004	@item @strong{Output format:} @tab UTT flattened
	2005	@item @strong{Required annotation:} @tab sen
[25ae32e]	2006	@end multitable
	2007	@c
	2008
[e28a625]	2009	@menu
	2010	* fla description::
	2011	@c * fla command line options::
	2012	@c * fla usage example::
	2013	@end menu
	2014
	2015
	2016	@node fla description
	2017	@subsection Description
	2018
[25ae32e]	2019	@command{fla} ``flattens'' a UTT file by merging the segments belonging
	2020	to one sentence into a single line. Technically, the end-of-line characters
	2021	('\n', ASCII code 10) inside a sentence are replaced with form-feed characters ('\f',
	2022	ASCII code 12). Flattening makes it possible to process UTT files
	2023	sentence by sentence with such tools as @command{grep} or @command{sed}
	2024	(this technique is used in @command{grp} and @command{mar}).
	2025
	2026	Flattened files should have the suffix @code{.fla}, for example @file{thetext.utt.fla}.
	2027
	2028	Flattened files are still human-readable.
	2029
	2030	Usage:
	2031
	2032	@example
	2033	fla [<bosregex>]
	2034	@end example
	2035
	2036	The optional argument is a regular expression describing the segments
	2037	that should be treated as sentence beginnings (a segment qualifies if it
	2038	contains a fragment matching @code{<bosregex>}). By
	2039	default, segments containing the field @code{BOS} are sought.
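
The behaviour described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the actual implementation of @command{fla}: the function name @code{flatten}, the sample segments, and the handling of segments that precede the first sentence marker are all hypothetical.

```python
import re


def flatten(lines, bos_regex="BOS"):
    """Join the segments of each sentence into one line, separated by form feeds.

    Hypothetical helper: it mimics the documented behaviour of fla,
    but the real program may differ in details.
    """
    bos = re.compile(bos_regex)
    sentences = []  # each entry collects the segments of one sentence
    for line in lines:
        segment = line.rstrip("\n")
        # A segment containing a match for bos_regex opens a new sentence.
        # The very first segment also opens one (an assumption of this sketch).
        if bos.search(segment) or not sentences:
            sentences.append([segment])
        else:
            sentences[-1].append(segment)
    # Within a sentence, '\n' is replaced by '\f' (ASCII code 12).
    return ["\f".join(sentence) for sentence in sentences]


# Made-up segments, used only to show the grouping.
sample = [
    "0000 00 BOS *",
    "0000 03 W Ala",
    "0003 01 S _",
    "0004 02 W ma",
    "0006 00 BOS *",
    "0006 04 W kota",
]
flat = flatten(sample)
print(len(flat))  # 2
```

Each element of the result holds one whole sentence, which is what lets line-oriented tools work sentence by sentence.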
	2040
[e28a625]	2041	@c -------------------------------------------------------------------------------
	2042	@c UNFLA
	2043	@c -------------------------------------------------------------------------------
[25ae32e]	2044
	2045	@page
	2046	@node unfla
	2047	@section unfla - the UTT file unflattener
	2048
	2049	@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[9ace5d2]	2050	@item @strong{Authors:} @tab Tomasz Obrębski
[e28a625]	2051	@item @strong{Input format:} @tab UTT flattened
	2052	@item @strong{Output format:} @tab UTT regular
	2053	@item @strong{Required annotation:} @tab -
[25ae32e]	2054	@end multitable
	2055
[e28a625]	2056	@menu
	2057	* unfla description::
	2058	@c * fla command line options::
	2059	@c * fla usage example::
	2060	@end menu
	2061
	2062	@node unfla description
	2063	@subsection Description
[25ae32e]	2064	@command{unfla} transforms a flattened UTT file, produced by
	2065	@command{fla}, into the regular format by restoring end-of-line
	2066	characters.
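
The inverse transformation is simple enough to sketch in a few lines of Python. This is an illustration only, assuming that form-feed characters occur in a flattened file solely as segment separators; the function name @code{unflatten} and the sample data are hypothetical.

```python
def unflatten(flat_lines):
    """Split flattened sentence lines back into one segment per line.

    Hypothetical helper: every form feed ('\f', ASCII code 12) is treated
    as a segment separator, which is an assumption of this sketch.
    """
    segments = []
    for line in flat_lines:
        segments.extend(line.rstrip("\n").split("\f"))
    return segments


# Made-up flattened input: two sentences, one per line.
flat = [
    "0000 00 BOS *\f0000 03 W Ala",
    "0003 00 BOS *\f0003 02 W ma",
]
restored = unflatten(flat)
print(len(restored))  # 4
```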
	2067
	2068
	2069
	2070
	2071	@c ---------------------------------------------------------------------
	2072	@c USAGE EXAMPLES
	2073	@c ---------------------------------------------------------------------
	2074
	2075	@node Usage examples
	2076	@chapter Usage examples
	2077
	2078	@subsubheading Simple pipelines
	2079
	2080	@enumerate
	2081
	2082	@item tokenization
	2083
	2084	cat text | tok > output1
	2085
	2086	@item morphological annotation (1)
	2087
	2088	simple dictionary-based lemmatization
	2089
	2090	cat text | tok | lem > output1
	2091
	2092	@item morphological annotation (2)
	2093
	2094	1) perform dictionary-based lemmatization
	2095	2) guess descriptions for words which have no annotation
	2096
	2097	@example
	2098	cat text | tok | lem | gue -S lem > output2
	2099	@end example
	2100
	2101	@item morphological annotation (3)
	2102
	2103	1) perform dictionary-based lemmatization
	2104	2) try to correct words with no annotation
	2105	3) perform dictionary-based lemmatization of corrected words
	2106	4) guess descriptions for words which still have no annotation
	2107
	2108	@example
	2109	cat text | tok | lem | cor -p W -S lem | lem -I cor | gue -p W -S lem
	2110	@end example
	2111	@item spelling correction
	2112
	2113
	2114
	2115	@example
[e28a625]	2116	cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
[25ae32e]	2117	@end example
	2118
	2119	@item Expression extraction
	2120
	2121	Extraction of all occurrences of a verb followed by a form of the noun 'rozmowa'.
	2122
	2123	@example
	2124	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' -m | kot > output4
	2125	@end example
	2126
	2127	@item A word in context
	2128
	2129	Extraction of text fragments containing a form of the lexeme 'rozmowa' in
	2130	the context of 5 preceding and 5 following corpus segments.
	2131
	2132	@example
	2133	cat text | tok | lem -1 | ser -e 'seg@{5@} lexeme(rozmowa) seg@{5@}' -m | kot > output
	2134	@end example
	2135
	2136	@item generation of concordance table (1)
	2137
	2138	@example
	2139	cat text | tok | lem -1 | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
	2140	@end example
	2141
	2142	Running time: 10 seconds.
	2143
	2144	@item generation of concordance table (2)
	2145
	2146	The same as above, but much faster.
	2147
	2148	@example
	2149	cat text | tok | lem -1 | \
	2150	grp -e 'cat(<V>) space lexeme(rozmowa)' | \
	2151	ser -e 'cat(<V>) space lexeme(rozmowa)' | \
	2152	con
	2153	@end example
	2154
	2155	Running time: 2 seconds.
	2156
	2157	@item generation of concordance table (3)
	2158
	2159	Usually, one searches the same corpus repeatedly. In
	2160	that case it is advisable to first transform the corpus data into the format
	2161	required by @command{grp}, and then search the preprocessed data.
	2162
	2163	As @command{grp} (@command{grep}) processes data faster than it can be
	2164	read from the disk drive, the search time can be shortened further by
[e28a625]	2165	compressing the preprocessed file. We suggest using the
	2166	@command{lzop} compressor/decompressor.
[25ae32e]	2167
	2168	@item the fastest way to search a large corpus
	2169
[e28a625]	2170	step 1: corpus preprocessing
[25ae32e]	2171
	2172	@example
	2173	cat corpus | tok | sen | lem -1 \
[e28a625]	2174	| fla | lzop -7 > corpus.grp.lzo
[25ae32e]	2175	@end example
	2176
	2177	step 2: search
	2178
	2179	@example
[e28a625]	2180	lzop -cd corpus.grp.lzo | unfla | grp -e 'cat(<V>) space lexeme(rozmowa)' \
[25ae32e]	2181	| ser -e 'cat(<V>) space lexeme(rozmowa)' | con
	2182	@end example
	2183
	2184	@end enumerate
	2185
[e28a625]	2186	@c @subsubheading More complicated configurations
[25ae32e]	2187
	2188
[e28a625]	2189	@c @example
	2190	@c mknod fifo1 p
	2191	@c mknod fifo2 p
	2192	@c mknod fifo3 p
	2193	@c mknod fifo4 p
	2194	@c mknod fifo5 p
	2195
	2196	@c tok | lem -p W -e fifo1 > fifo2 &
	2197	@c cor -e fifo3 < fifo1 | lem > fifo4 &
	2198	@c gue < fifo3 > fifo5 &
	2199	@c sort -m fifo2 fifo4 fifo5
	2200
	2201	@c rm fifo?
	2202	@c @end example
[25ae32e]	2203
	2204
	2205	@c ---------------------------------------------------------------------
	2206	@c ---------------------------------------------------------------------
	2207
	2208	@c ---------------------------------------------------------------------
	2209	@c PMDBF DICTIONARY
	2210	@c ---------------------------------------------------------------------
	2211
	2212	@node PMDBF dictionary
	2213	@chapter PMDBF dictionary
	2214
	2215	UTT components come with lexical data derived from the Polish
	2216	Morphological Database (PMDB).
	2217
	2218	@menu
	2219	* PMDBF files::
	2220	* PMDBF tag structure::
	2221	* PMDBF parts of speech::
	2222	* PMDBF morphosyntactic attributes::
	2223	@end menu
	2224
	2225	@node PMDBF files
	2226	@section Files
	2227
	2228	@node PMDBF tag structure
	2229	@section Tag structure
	2230
	2231	pos = [[:upper:]]+
	2232
	2233	attr = [[:upper:]]+
	2234
	2235	val = [[:lower:][:digit:]?!*+-] | <[^>\n]+>
	2236
	2237	descr = pos ( / ( attr val + ) + ) ?
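
The grammar above can be tried out with a small Python sketch. It is a hedged illustration, not part of UTT: the function name @code{parse_descr} and the sample tags are hypothetical, and the POSIX classes @code{[[:upper:]]} and @code{[[:lower:]]} are approximated by the ASCII ranges @code{A-Z} and @code{a-z}.

```python
import re

# Building blocks taken from the grammar; ASCII-only approximation
# of the POSIX character classes (an assumption of this sketch).
POS = r"[A-Z]+"
ATTR = r"[A-Z]+"
VAL = r"(?:[a-z0-9?!*+\-]|<[^>\n]+>)"

# descr = pos ( / ( attr val + ) + ) ?
DESCR = re.compile(rf"(?P<pos>{POS})(?:/(?P<attrs>(?:{ATTR}{VAL}+)+))?")
PAIR = re.compile(rf"({ATTR})((?:{VAL})+)")
ONE_VAL = re.compile(VAL)


def parse_descr(descr):
    """Split a description into (pos, {attr: [val, ...]})."""
    match = DESCR.fullmatch(descr)
    if match is None:
        raise ValueError(f"not a valid description: {descr!r}")
    attrs = {}
    if match.group("attrs"):
        for attr, values in PAIR.findall(match.group("attrs")):
            attrs[attr] = ONE_VAL.findall(values)
    return match.group("pos"), attrs


# Hypothetical tags, chosen only to exercise the grammar.
print(parse_descr("N/GfNsCn"))  # ('N', {'G': ['f'], 'N': ['s'], 'C': ['n']})
print(parse_descr("ADV"))       # ('ADV', {})
```

An attribute may carry several values, so each attribute maps to a list rather than to a single value.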
	2238
	2239	@node PMDBF parts of speech
	2240	@section Parts of speech
	2241
	2242	@multitable {ADJPRP} { adjectival-passive-participle }
	2243	@item @code{N} @tab noun
	2244	@item @code{NPRO} @tab nominal-pronoun
	2245	@item @code{NV} @tab deverbal-noun
	2246	@item @code{V} @tab verb
	2247	@item @code{BYC} @tab byc
	2248	@item @code{VNI} @tab non-inflected-verb
	2249	@item @code{ADJ} @tab adjective
	2250	@item @code{ADJPAP} @tab adjectival-passive-participle
	2251	@item @code{ADJPRP} @tab adjectival-present-participle
	2252	@item @code{ADJPP} @tab adjectival-past-participle
	2253	@item @code{ADJPRO} @tab adjectival-pronoun
	2254	@item @code{ADJNUM} @tab adjectival-numeral
	2255	@item @code{ADV} @tab adverb
	2256	@item @code{ADVANP} @tab adverbial-anterior-participle
	2257	@item @code{ADVPRP} @tab adverbial-present-participle
	2258	@item @code{ADVPRO} @tab adverbial-pronoun
	2259	@item @code{ADVNUM} @tab adverbial-numeral
	2260	@item @code{P} @tab preposition
	2261	@item @code{PPRO} @tab prep-noun-pronoun
	2262	@item @code{CONJ} @tab conjunction
	2263	@item @code{EXCL} @tab exclamation
	2264	@item @code{APP} @tab call
	2265	@item @code{ONO} @tab onomatopoeia
	2266	@item @code{PART} @tab particle
	2267	@item @code{NUMCRD} @tab cardinal-numeral
	2268	@item @code{NUMCOL} @tab collective-numeral
	2269	@item @code{NUMPAR} @tab partitive-numeral
	2270	@item @code{NUMORD} @tab ordinal-numeral
	2271	@end multitable
	2272
	2273	@node PMDBF morphosyntactic attributes
	2274	@section Morphosyntactic attributes
	2275
	2276	@multitable {Attr} {Val} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
	2277	@c @headitem Attr @tab Val @tab Description
	2278	@item
	2279	@code{A} @tab @tab Aspect
	2280	@item
	2281	@tab @code{p} @tab perfective,
	2282	@item
	2283	@tab @code{i} @tab imperfective.
	2284	@item
	2285	@item
	2286	@code{V} @tab @tab Verb-Form
	2287	@item
	2288	@tab @code{b} @tab infinitive,
	2289	@item
	2290	@tab @code{p} @tab personal,
	2291	@item
	2292	@tab @code{i} @tab impersonal.
	2293	@item
	2294	@item
	2295	@code{M} @tab @tab Mood
	2296	@item
	2297	@tab @code{d} @tab declarative,
	2298	@item
	2299	@tab @code{c} @tab conditional,
	2300	@item
	2301	@tab @code{i} @tab imperative.
	2302	@item
	2303	@item
	2304	@code{T} @tab @tab Tense
	2305	@item
	2306	@tab @code{a} @tab past,
	2307	@item
	2308	@tab @code{r} @tab present,
	2309	@item
	2310	@tab @code{f} @tab future.
	2311	@item
	2312	@item
	2313	@code{P} @tab @tab Person
	2314	@item
	2315	@tab @code{1} @tab 1,
	2316	@item
	2317	@tab @code{2} @tab 2,
	2318	@item
	2319	@tab @code{3} @tab 3.
	2320	@item
	2321	@item
	2322	@code{D} @tab @tab Degree
	2323	@item
	2324	@tab @code{p} @tab positive,
	2325	@item
	2326	@tab @code{c} @tab comparative,
	2327	@item
	2328	@tab @code{s} @tab superlative.
	2329	@item
	2330	@item
	2331	@code{N} @tab @tab Number
	2332	@item
	2333	@tab @code{s} @tab singular,
	2334	@item
	2335	@tab @code{p} @tab plural.
	2336	@item
	2337	@item
	2338	@code{C} @tab @tab Case
	2339	@item
	2340	@tab @code{n} @tab nominative,
	2341	@item
	2342	@tab @code{g} @tab genitive,
	2343	@item
	2344	@tab @code{d} @tab dative,
	2345	@item
	2346	@tab @code{a} @tab accusative,
	2347	@item
	2348	@tab @code{i} @tab instrumental,
	2349	@item
	2350	@tab @code{l} @tab locative,
	2351	@item
	2352	@tab @code{v} @tab vocative.
	2353	@item
	2354	@code{G} @tab @tab Gender
	2355	@item
	2356	@tab @code{p} @tab masculine-personal,
	2357	@item
	2358	@tab @code{a} @tab masculine-animal,
	2359	@item
	2360	@tab @code{i} @tab masculine-inanimate,
	2361	@item
	2362	@tab @code{f} @tab feminine,
	2363	@item
	2364	@tab @code{n} @tab neuter.
	2365	@end multitable
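
As an illustration of how the table can be used, the following Python sketch turns attribute codes into readable names. Only three attributes (Number, Case, Gender) are copied from the table; the dictionary layout and the function name @code{describe} are hypothetical and are not part of UTT.

```python
# A subset of the attribute table above: code -> (name, {value: meaning}).
ATTRIBUTES = {
    "N": ("Number", {"s": "singular", "p": "plural"}),
    "C": ("Case", {
        "n": "nominative", "g": "genitive", "d": "dative",
        "a": "accusative", "i": "instrumental", "l": "locative",
        "v": "vocative",
    }),
    "G": ("Gender", {
        "p": "masculine-personal", "a": "masculine-animal",
        "i": "masculine-inanimate", "f": "feminine", "n": "neuter",
    }),
}


def describe(attrs):
    """Map {attr: [val, ...]} to readable names; unknown codes pass through."""
    readable = {}
    for attr, values in attrs.items():
        name, meanings = ATTRIBUTES.get(attr, (attr, {}))
        readable[name] = [meanings.get(value, value) for value in values]
    return readable


result = describe({"G": ["f"], "N": ["s"], "C": ["n", "a"]})
print(result)
# {'Gender': ['feminine'], 'Number': ['singular'], 'Case': ['nominative', 'accusative']}
```

Codes missing from the dictionary are returned unchanged, so the sketch does not fail on attributes it does not know.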
	2366
	2367
	2368	@c ---------------------------------------------------------------------
	2369	@c ---------------------------------------------------------------------
	2370	@c
	2371	@c @node Examples
	2372	@c @chapter Examples
	2373
	2374	@c ----------------------------------------------------------------------
	2375	@c ----------------------------------------------------------------------
	2376
	2377	@node GNU Free Documentation License
	2378	@chapter GNU Free Documentation License
	2379
	2380	@c The GNU Free Documentation License.
	2381	@center Version 1.2, November 2002
	2382
	2383	@c This file is intended to be included within another document,
	2384	@c hence no sectioning command or @node.
	2385
	2386	@display
	2387	Copyright @copyright{} 2000,2001,2002 Free Software Foundation, Inc.
	2388	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
	2389
	2390	Everyone is permitted to copy and distribute verbatim copies
	2391	of this license document, but changing it is not allowed.
	2392	@end display
	2393
	2394	@enumerate 0
	2395	@item
	2396	PREAMBLE
	2397
	2398	The purpose of this License is to make a manual, textbook, or other
	2399	functional and useful document @dfn{free} in the sense of freedom: to
	2400	assure everyone the effective freedom to copy and redistribute it,
	2401	with or without modifying it, either commercially or noncommercially.
	2402	Secondarily, this License preserves for the author and publisher a way
	2403	to get credit for their work, while not being considered responsible
	2404	for modifications made by others.
	2405
	2406	This License is a kind of ``copyleft'', which means that derivative
	2407	works of the document must themselves be free in the same sense. It
	2408	complements the GNU General Public License, which is a copyleft
	2409	license designed for free software.
	2410
	2411	We have designed this License in order to use it for manuals for free
	2412	software, because free software needs free documentation: a free
	2413	program should come with manuals providing the same freedoms that the
	2414	software does. But this License is not limited to software manuals;
	2415	it can be used for any textual work, regardless of subject matter or
	2416	whether it is published as a printed book. We recommend this License
	2417	principally for works whose purpose is instruction or reference.
	2418
	2419	@item
	2420	APPLICABILITY AND DEFINITIONS
	2421
	2422	This License applies to any manual or other work, in any medium, that
	2423	contains a notice placed by the copyright holder saying it can be
	2424	distributed under the terms of this License. Such a notice grants a
	2425	world-wide, royalty-free license, unlimited in duration, to use that
	2426	work under the conditions stated herein. The ``Document'', below,
	2427	refers to any such manual or work. Any member of the public is a
	2428	licensee, and is addressed as ``you''. You accept the license if you
	2429	copy, modify or distribute the work in a way requiring permission
	2430	under copyright law.
	2431
	2432	A ``Modified Version'' of the Document means any work containing the
	2433	Document or a portion of it, either copied verbatim, or with
	2434	modifications and/or translated into another language.
	2435
	2436	A ``Secondary Section'' is a named appendix or a front-matter section
	2437	of the Document that deals exclusively with the relationship of the
	2438	publishers or authors of the Document to the Document's overall
	2439	subject (or to related matters) and contains nothing that could fall
	2440	directly within that overall subject. (Thus, if the Document is in
	2441	part a textbook of mathematics, a Secondary Section may not explain
	2442	any mathematics.) The relationship could be a matter of historical
	2443	connection with the subject or with related matters, or of legal,
	2444	commercial, philosophical, ethical or political position regarding
	2445	them.
	2446
	2447	The ``Invariant Sections'' are certain Secondary Sections whose titles
	2448	are designated, as being those of Invariant Sections, in the notice
	2449	that says that the Document is released under this License. If a
	2450	section does not fit the above definition of Secondary then it is not
	2451	allowed to be designated as Invariant. The Document may contain zero
	2452	Invariant Sections. If the Document does not identify any Invariant
	2453	Sections then there are none.
	2454
	2455	The ``Cover Texts'' are certain short passages of text that are listed,
	2456	as Front-Cover Texts or Back-Cover Texts, in the notice that says that
	2457	the Document is released under this License. A Front-Cover Text may
	2458	be at most 5 words, and a Back-Cover Text may be at most 25 words.
	2459
	2460	A ``Transparent'' copy of the Document means a machine-readable copy,
	2461	represented in a format whose specification is available to the
	2462	general public, that is suitable for revising the document
	2463	straightforwardly with generic text editors or (for images composed of
	2464	pixels) generic paint programs or (for drawings) some widely available
	2465	drawing editor, and that is suitable for input to text formatters or
	2466	for automatic translation to a variety of formats suitable for input
	2467	to text formatters. A copy made in an otherwise Transparent file
	2468	format whose markup, or absence of markup, has been arranged to thwart
	2469	or discourage subsequent modification by readers is not Transparent.
	2470	An image format is not Transparent if used for any substantial amount
	2471	of text. A copy that is not ``Transparent'' is called ``Opaque''.
	2472
	2473	Examples of suitable formats for Transparent copies include plain
	2474	@sc{ascii} without markup, Texinfo input format, La@TeX{} input
	2475	format, @acronym{SGML} or @acronym{XML} using a publicly available
	2476	@acronym{DTD}, and standard-conforming simple @acronym{HTML},
	2477	PostScript or @acronym{PDF} designed for human modification. Examples
	2478	of transparent image formats include @acronym{PNG}, @acronym{XCF} and
	2479	@acronym{JPG}. Opaque formats include proprietary formats that can be
	2480	read and edited only by proprietary word processors, @acronym{SGML} or
	2481	@acronym{XML} for which the @acronym{DTD} and/or processing tools are
	2482	not generally available, and the machine-generated @acronym{HTML},
	2483	PostScript or @acronym{PDF} produced by some word processors for
	2484	output purposes only.
	2485
	2486	The ``Title Page'' means, for a printed book, the title page itself,
	2487	plus such following pages as are needed to hold, legibly, the material
	2488	this License requires to appear in the title page. For works in
	2489	formats which do not have any title page as such, ``Title Page'' means
	2490	the text near the most prominent appearance of the work's title,
	2491	preceding the beginning of the body of the text.
	2492
	2493	A section ``Entitled XYZ'' means a named subunit of the Document whose
	2494	title either is precisely XYZ or contains XYZ in parentheses following
	2495	text that translates XYZ in another language. (Here XYZ stands for a
	2496	specific section name mentioned below, such as ``Acknowledgements'',
	2497	``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
	2498	of such a section when you modify the Document means that it remains a
	2499	section ``Entitled XYZ'' according to this definition.
	2500
	2501	The Document may include Warranty Disclaimers next to the notice which
	2502	states that this License applies to the Document. These Warranty
	2503	Disclaimers are considered to be included by reference in this
	2504	License, but only as regards disclaiming warranties: any other
	2505	implication that these Warranty Disclaimers may have is void and has
	2506	no effect on the meaning of this License.
	2507
	2508	@item
	2509	VERBATIM COPYING
	2510
	2511	You may copy and distribute the Document in any medium, either
	2512	commercially or noncommercially, provided that this License, the
	2513	copyright notices, and the license notice saying this License applies
	2514	to the Document are reproduced in all copies, and that you add no other
	2515	conditions whatsoever to those of this License. You may not use
	2516	technical measures to obstruct or control the reading or further
	2517	copying of the copies you make or distribute. However, you may accept
	2518	compensation in exchange for copies. If you distribute a large enough
	2519	number of copies you must also follow the conditions in section 3.
	2520
	2521	You may also lend copies, under the same conditions stated above, and
	2522	you may publicly display copies.
	2523
	2524	@item
	2525	COPYING IN QUANTITY
	2526
	2527	If you publish printed copies (or copies in media that commonly have
	2528	printed covers) of the Document, numbering more than 100, and the
	2529	Document's license notice requires Cover Texts, you must enclose the
	2530	copies in covers that carry, clearly and legibly, all these Cover
	2531	Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
	2532	the back cover. Both covers must also clearly and legibly identify
	2533	you as the publisher of these copies. The front cover must present
	2534	the full title with all words of the title equally prominent and
	2535	visible. You may add other material on the covers in addition.
	2536	Copying with changes limited to the covers, as long as they preserve
	2537	the title of the Document and satisfy these conditions, can be treated
	2538	as verbatim copying in other respects.
	2539
	2540	If the required texts for either cover are too voluminous to fit
	2541	legibly, you should put the first ones listed (as many as fit
	2542	reasonably) on the actual cover, and continue the rest onto adjacent
	2543	pages.
	2544
	2545	If you publish or distribute Opaque copies of the Document numbering
	2546	more than 100, you must either include a machine-readable Transparent
	2547	copy along with each Opaque copy, or state in or with each Opaque copy
	2548	a computer-network location from which the general network-using
	2549	public has access to download using public-standard network protocols
	2550	a complete Transparent copy of the Document, free of added material.
	2551	If you use the latter option, you must take reasonably prudent steps,
	2552	when you begin distribution of Opaque copies in quantity, to ensure
	2553	that this Transparent copy will remain thus accessible at the stated
	2554	location until at least one year after the last time you distribute an
	2555	Opaque copy (directly or through your agents or retailers) of that
	2556	edition to the public.
	2557
	2558	It is requested, but not required, that you contact the authors of the
	2559	Document well before redistributing any large number of copies, to give
	2560	them a chance to provide you with an updated version of the Document.
	2561
	2562	@item
	2563	MODIFICATIONS
	2564
	2565	You may copy and distribute a Modified Version of the Document under
	2566	the conditions of sections 2 and 3 above, provided that you release
	2567	the Modified Version under precisely this License, with the Modified
	2568	Version filling the role of the Document, thus licensing distribution
	2569	and modification of the Modified Version to whoever possesses a copy
	2570	of it. In addition, you must do these things in the Modified Version:
	2571
	2572	@enumerate A
	2573	@item
	2574	Use in the Title Page (and on the covers, if any) a title distinct
	2575	from that of the Document, and from those of previous versions
	2576	(which should, if there were any, be listed in the History section
	2577	of the Document). You may use the same title as a previous version
	2578	if the original publisher of that version gives permission.
	2579
	2580	@item
	2581	List on the Title Page, as authors, one or more persons or entities
	2582	responsible for authorship of the modifications in the Modified
	2583	Version, together with at least five of the principal authors of the
	2584	Document (all of its principal authors, if it has fewer than five),
	2585	unless they release you from this requirement.
	2586
	2587	@item
	2588	State on the Title page the name of the publisher of the
	2589	Modified Version, as the publisher.
	2590
	2591	@item
	2592	Preserve all the copyright notices of the Document.
	2593
	2594	@item
	2595	Add an appropriate copyright notice for your modifications
	2596	adjacent to the other copyright notices.
	2597
	2598	@item
	2599	Include, immediately after the copyright notices, a license notice
	2600	giving the public permission to use the Modified Version under the
	2601	terms of this License, in the form shown in the Addendum below.
	2602
	2603	@item
	2604	Preserve in that license notice the full lists of Invariant Sections
	2605	and required Cover Texts given in the Document's license notice.
	2606
	2607	@item
	2608	Include an unaltered copy of this License.
	2609
	2610	@item
	2611	Preserve the section Entitled ``History'', Preserve its Title, and add
	2612	to it an item stating at least the title, year, new authors, and
	2613	publisher of the Modified Version as given on the Title Page. If
	2614	there is no section Entitled ``History'' in the Document, create one
	2615	stating the title, year, authors, and publisher of the Document as
	2616	given on its Title Page, then add an item describing the Modified
	2617	Version as stated in the previous sentence.
	2618
	2619	@item
	2620	Preserve the network location, if any, given in the Document for
	2621	public access to a Transparent copy of the Document, and likewise
	2622	the network locations given in the Document for previous versions
	2623	it was based on. These may be placed in the ``History'' section.
	2624	You may omit a network location for a work that was published at
	2625	least four years before the Document itself, or if the original
	2626	publisher of the version it refers to gives permission.
	2627
	2628	@item
	2629	For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
	2630	the Title of the section, and preserve in the section all the
	2631	substance and tone of each of the contributor acknowledgements and/or
	2632	dedications given therein.
	2633
	2634	@item
	2635	Preserve all the Invariant Sections of the Document,
	2636	unaltered in their text and in their titles. Section numbers
	2637	or the equivalent are not considered part of the section titles.
	2638
	2639	@item
	2640	Delete any section Entitled ``Endorsements''. Such a section
	2641	may not be included in the Modified Version.
	2642
	2643	@item
	2644	Do not retitle any existing section to be Entitled ``Endorsements'' or
	2645	to conflict in title with any Invariant Section.
	2646
	2647	@item
	2648	Preserve any Warranty Disclaimers.
	2649	@end enumerate
	2650
	2651	If the Modified Version includes new front-matter sections or
	2652	appendices that qualify as Secondary Sections and contain no material
	2653	copied from the Document, you may at your option designate some or all
	2654	of these sections as invariant. To do this, add their titles to the
	2655	list of Invariant Sections in the Modified Version's license notice.
	2656	These titles must be distinct from any other section titles.
	2657
	2658	You may add a section Entitled ``Endorsements'', provided it contains
	2659	nothing but endorsements of your Modified Version by various
	2660	parties---for example, statements of peer review or that the text has
	2661	been approved by an organization as the authoritative definition of a
	2662	standard.
	2663
	2664	You may add a passage of up to five words as a Front-Cover Text, and a
	2665	passage of up to 25 words as a Back-Cover Text, to the end of the list
	2666	of Cover Texts in the Modified Version. Only one passage of
	2667	Front-Cover Text and one of Back-Cover Text may be added by (or
	2668	through arrangements made by) any one entity. If the Document already
	2669	includes a cover text for the same cover, previously added by you or
	2670	by arrangement made by the same entity you are acting on behalf of,
	2671	you may not add another; but you may replace the old one, on explicit
	2672	permission from the previous publisher that added the old one.
	2673
	2674	The author(s) and publisher(s) of the Document do not by this License
	2675	give permission to use their names for publicity for or to assert or
	2676	imply endorsement of any Modified Version.
	2677
	2678	@item
	2679	COMBINING DOCUMENTS
	2680
	2681	You may combine the Document with other documents released under this
	2682	License, under the terms defined in section 4 above for modified
	2683	versions, provided that you include in the combination all of the
	2684	Invariant Sections of all of the original documents, unmodified, and
	2685	list them all as Invariant Sections of your combined work in its
	2686	license notice, and that you preserve all their Warranty Disclaimers.
	2687
	2688	The combined work need only contain one copy of this License, and
	2689	multiple identical Invariant Sections may be replaced with a single
	2690	copy. If there are multiple Invariant Sections with the same name but
	2691	different contents, make the title of each such section unique by
	2692	adding at the end of it, in parentheses, the name of the original
	2693	author or publisher of that section if known, or else a unique number.
	2694	Make the same adjustment to the section titles in the list of
	2695	Invariant Sections in the license notice of the combined work.
	2696
	2697	In the combination, you must combine any sections Entitled ``History''
	2698	in the various original documents, forming one section Entitled
	2699	``History''; likewise combine any sections Entitled ``Acknowledgements'',
	2700	and any sections Entitled ``Dedications''. You must delete all
	2701	sections Entitled ``Endorsements.''
	2702
	2703	@item
	2704	COLLECTIONS OF DOCUMENTS
	2705
	2706	You may make a collection consisting of the Document and other documents
	2707	released under this License, and replace the individual copies of this
	2708	License in the various documents with a single copy that is included in
	2709	the collection, provided that you follow the rules of this License for
	2710	verbatim copying of each of the documents in all other respects.
	2711
	2712	You may extract a single document from such a collection, and distribute
	2713	it individually under this License, provided you insert a copy of this
	2714	License into the extracted document, and follow this License in all
	2715	other respects regarding verbatim copying of that document.
	2716
	2717	@item
	2718	AGGREGATION WITH INDEPENDENT WORKS
	2719
	2720	A compilation of the Document or its derivatives with other separate
	2721	and independent documents or works, in or on a volume of a storage or
	2722	distribution medium, is called an ``aggregate'' if the copyright
	2723	resulting from the compilation is not used to limit the legal rights
	2724	of the compilation's users beyond what the individual works permit.
	2725	When the Document is included in an aggregate, this License does not
	2726	apply to the other works in the aggregate which are not themselves
	2727	derivative works of the Document.
	2728
	2729	If the Cover Text requirement of section 3 is applicable to these
	2730	copies of the Document, then if the Document is less than one half of
	2731	the entire aggregate, the Document's Cover Texts may be placed on
	2732	covers that bracket the Document within the aggregate, or the
	2733	electronic equivalent of covers if the Document is in electronic form.
	2734	Otherwise they must appear on printed covers that bracket the whole
	2735	aggregate.
	2736
	2737	@item
	2738	TRANSLATION
	2739
	2740	Translation is considered a kind of modification, so you may
	2741	distribute translations of the Document under the terms of section 4.
	2742	Replacing Invariant Sections with translations requires special
	2743	permission from their copyright holders, but you may include
	2744	translations of some or all Invariant Sections in addition to the
	2745	original versions of these Invariant Sections. You may include a
	2746	translation of this License, and all the license notices in the
	2747	Document, and any Warranty Disclaimers, provided that you also include
	2748	the original English version of this License and the original versions
	2749	of those notices and disclaimers. In case of a disagreement between
	2750	the translation and the original version of this License or a notice
	2751	or disclaimer, the original version will prevail.
	2752
	2753	If a section in the Document is Entitled ``Acknowledgements'',
	2754	``Dedications'', or ``History'', the requirement (section 4) to Preserve
	2755	its Title (section 1) will typically require changing the actual
	2756	title.
	2757
	2758	@item
	2759	TERMINATION
	2760
	2761	You may not copy, modify, sublicense, or distribute the Document except
	2762	as expressly provided for under this License. Any other attempt to
	2763	copy, modify, sublicense or distribute the Document is void, and will
	2764	automatically terminate your rights under this License. However,
	2765	parties who have received copies, or rights, from you under this
	2766	License will not have their licenses terminated so long as such
	2767	parties remain in full compliance.
	2768
	2769	@item
	2770	FUTURE REVISIONS OF THIS LICENSE
	2771
	2772	The Free Software Foundation may publish new, revised versions
	2773	of the GNU Free Documentation License from time to time. Such new
	2774	versions will be similar in spirit to the present version, but may
	2775	differ in detail to address new problems or concerns. See
	2776	@uref{http://www.gnu.org/copyleft/}.
	2777
	2778	Each version of the License is given a distinguishing version number.
	2779	If the Document specifies that a particular numbered version of this
	2780	License ``or any later version'' applies to it, you have the option of
	2781	following the terms and conditions either of that specified version or
	2782	of any later version that has been published (not as a draft) by the
	2783	Free Software Foundation. If the Document does not specify a version
	2784	number of this License, you may choose any version ever published (not
	2785	as a draft) by the Free Software Foundation.
	2786	@end enumerate
	2787
	2788	@page
	2789	@heading ADDENDUM: How to use this License for your documents
	2790
	2791	To use this License in a document you have written, include a copy of
	2792	the License in the document and put the following copyright and
	2793	license notices just after the title page:
	2794
	2795	@smallexample
	2796	@group
	2797	Copyright (C) @var{year} @var{your name}.
	2798	Permission is granted to copy, distribute and/or modify this document
	2799	under the terms of the GNU Free Documentation License, Version 1.2
	2800	or any later version published by the Free Software Foundation;
	2801	with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
	2802	Texts. A copy of the license is included in the section entitled ``GNU
	2803	Free Documentation License''.
	2804	@end group
	2805	@end smallexample
	2806
	2807	If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
	2808	replace the ``with@dots{}Texts.'' line with this:
	2809
	2810	@smallexample
	2811	@group
	2812	with the Invariant Sections being @var{list their titles}, with
	2813	the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
	2814	being @var{list}.
	2815	@end group
	2816	@end smallexample
	2817
	2818	If you have Invariant Sections without Cover Texts, or some other
	2819	combination of the three, merge those two alternatives to suit the
	2820	situation.
	2821
	2822	If your document contains nontrivial examples of program code, we
	2823	recommend releasing these examples in parallel under your choice of
	2824	free software license, such as the GNU General Public License,
	2825	to permit their use in free software.
	2826
	2827	@c Local Variables:
	2828	@c ispell-local-pdict: "ispell-dict"
	2829	@c End:
	2830
	2831
	2832	@c ---------------------------------------------------------------------
	2833	@c ---------------------------------------------------------------------
	2834
	2835	@node Reporting bugs
	2836	@chapter Reporting bugs
	2837
	2838	Report bugs to <obrebski@@amu.edu.pl>.
	2839
	2840	@c ---------------------------------------------------------------------
	2841	@c ---------------------------------------------------------------------
	2842
	2843	@c @node Copyright
	2844	@c @chapter Copyright
	2845	@c
	2846	@c Copyright 2004 by Tomasz Obrębski
	2847	@c This software is free for research and educational use.
	2848
	2849	@c ---------------------------------------------------------------------
	2850	@c ---------------------------------------------------------------------
	2851
	2852	@node Author
	2853	@chapter Author
	2854
	2855
	2856	@bye
