Timestamp:
12/11/08 22:20:14
Author:
obrebski <obrebski@…>
Branches:
master, help
Children:
2d89d4b
Parents:
91ed676
git-author:
obrebski <obrebski@…> (12/11/08 22:20:14)
git-committer:
obrebski <obrebski@…> (12/11/08 22:20:14)
Message:

a few changes

M app/doc/utt.texinfo
M app/src/dgp/sgraph.hh
M app/src/dgp/const.hh
M app/src/dgp/grammar.hh
M app/src/dgp/thesymbols.hh
M app/src/dgp/dgc
M app/src/dgp/sgraph.cc
M app/src/dgp/grammar.cc

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@63 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

File:
1 edited

  • app/doc/utt.texinfo

    re28a625 r9ace5d2  
     1 
    12\input texinfo   @c -*-texinfo-*- 
    2 @documentencoding ISO-8859-2 
     3@c @documentencoding ISO-8859-2 
     4@documentencoding UTF-8 
    35@c @documentlanguage pl 
    46 
     
    1113This manual is for UAM Text Tools (version 0.90, October, 2008) 
    1214 
    13 Copyright @copyright{}  2005, 2007  Tomasz Obrêbski, Micha³ Stolarski, Justyna Walkowska, Pawe³ Konieczka. 
     15Copyright @copyright{}  2005, 2007  Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka. 
    1416 
    1517Permission is granted to copy, distribute and/or modify this document 
     
    3133@subtitle edition 0.01, @today 
    3234@subtitle status: prescript 
    33 @author by Justyna Walkowska, Tomasz Obr@,{}ebski and Micha@l{} Stolarski 
     35@author by Justyna Walkowska, Tomasz Obrębski and Michał Stolarski 
    3436@page 
    3537@vskip 0pt plus 1filll 
     
    4244 
    4345@iftex 
     46@tex 
     47% \usepackage[T1]{fontenc} 
     48% \usepackage[utf8]{inputenc} 
     49% \usepackage{times} 
     50@end tex 
     51 
    4452@parskip = 0.5@normalbaselineskip plus 3pt minus 1pt 
    4553@end iftex 
    46  
    4754@c @headings off 
    4855@c @everyheading LEM(1) @| @| LEM(1) 
     
    8491 
    8592@item 
    86 tokenization 
     93tokenization ółĠ
     94ÅŒ 
    8795@item 
    8896dictionary-based morphological analysis 
     
    9098heuristic morphological analysis of unknown words 
    9199@item 
    92 spelling correction 
     100spelling correction ółĠ
     101śćŌ 
    93102@item 
    94103pattern search 
     
    125134@itemize 
    126135@item Pawel Konieczka 
    127 @item Tomasz Obrebski 
    128 @item Michal Stolarski 
     136@item Tomasz Obrębski 
     137@item Michał Stolarski 
    129138@item Marcin Walas 
    130139@item Justyna Walkowska 
    131 @item Pawel Werenski 
     140@item Paweł Wereński 
    132141@end itemize 
    133142 
     
    251260@example 
    2522610000 00 BOS * 
    253 0000 07 W Piszemy lem:pisaÊ,V 
     2620000 07 W Piszemy lem:pisać,V 
    2542630007 01 S _ 
    2552640008 05 W dobre lem:dobry,ADJ 
     
    2622710024 11 W Warszawiacy lem:Warszawiak,N 
    2632720035 01 S _ 
    264 0036 03 W te¿ 
     2730036 03 W też 
    2652740039 01 P . 
    2662750040 00 EOS * 
     
    270279@example 
    2712800000 BOS * 
    272 0000 W Piszemy lem:pisaÊ,V 
     2810000 W Piszemy lem:pisać,V 
    2732820007 S _ 
    2742830008 W dobre lem:dobry,ADJ 
     
    283292@example 
    2842930000 BOS * 
    285 W Piszemy lem:pisaÊ,V 
     294W Piszemy lem:pisać,V 
    286295S _ 
    287296W dobre lem:dobry,ADJ 
     
    294303W Warszawiacy lem:Warszawiak,N 
    295304S _ 
    296 W te¿ 
     305W też 
    297306P . 
    298307EOS * 
     
    429438 
    430439 
    431 @c [JAK UZYSKAÆ POLSKIE CZCIONKI W DVI???] 
     440@c [JAK UZYSKAÆ POLSKIE CZCIONKI W DVI???] 
    432441 
    433442@macro parhelp 
     
    651660 
    652661@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    653 @item @strong{Authors:}                 @tab Tomasz Obrêbski 
     662@item @strong{Authors:}                 @tab Tomasz Obrębski 
    654663@item @strong{Component category:}      @tab source 
    655664@item @strong{Input format:}            @tab raw text file 
     
    756765@c @chapter sen - sentencizer 
    757766 
    758 @c Authors: Tomasz Obrêbski 
     767@c Authors: Tomasz Obrębski 
    759768 
    760769@c --------------------------------------------------------------------- 
     
    767776 
    768777@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    769 @item @strong{Authors:}                 @tab Tomasz Obrêbski, Micha³ Stolarski 
     778@item @strong{Authors:}                 @tab Tomasz Obrębski, Michał Stolarski 
    770779@item @strong{Component category:}      @tab filter 
    771780@item @strong{Input format:}            @tab UTT regular 
     
    871880 
    872881@example 
    873 0000 07 W Piszemy lem:pisaÊ,V/AiVpMdTrfNpP1 
     8820000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1 
    8748830007 01 B _ 
    8758840008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn 
     
    886895 
    887896@example 
    888 0000 07 W Piszemy lem:pisaÊ,V/AiVpMdTrfNpP1 
     8970000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1 
    8898980007 01 S _ 
    8908990008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn 
     
    898907 
    899908@example 
    900 0000 07 W Piszemy lem:pisaÊ,V/AiVpMdTrfNpP1 
     9090000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1 
    9019100007 01 S _ 
    9029110008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn 
     
    932941string @code{<add1>}, replace suffix of length @code{<cut2>} with string 
    933942@code{<add2>}. For example @code{3t} transforms @samp{kocie} into 
    934 @samp{kot}, @code{3-4a³y} transforms @samp{najbielsi} into @samp{bia³y} 
     943@samp{kot}, @code{3-4ały} transforms @samp{najbielsi} into @samp{biały} 
    935944 
    936945Each dictionary entry must be written in one line and must not contain blank characters. 
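
The transformation codes described above can be illustrated with a short Python sketch. It is not taken from the UTT sources: the function name `apply_code` is hypothetical, the prefix part of the code (`<cut1><add1>-`) is assumed to work like the suffix part, and treating a code without a leading digit (such as `dobry`) as a literal lemma is also an assumption.

```python
import re

# Optional prefix part "<cut1><add1>-", then the suffix part "<cut2><add2>".
_CODE = re.compile(r"^(?:(\d+)([^\d-]*)-)?(\d+)(.*)$")


def apply_code(form, code):
    """Apply a transformation code to a word form and return the result."""
    match = _CODE.match(code)
    if match is None:
        # Assumption: a code with no leading digit is the lemma itself.
        return code
    cut1, add1, cut2, add2 = match.groups()
    if cut1 is not None:
        # Replace a prefix of length cut1 with add1.
        form = add1 + form[int(cut1):]
    cut = int(cut2)
    # Replace a suffix of length cut2 with add2.
    stem = form[:len(form) - cut] if cut else form
    return stem + add2
```

With this sketch, `apply_code("kocie", "3t")` drops the last three characters and appends `t`, and `apply_code("najbielsi", "3-4ały")` first drops the three-character prefix and then rewrites the four-character suffix.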
     
    943952kotem;2,N/GaNsCi 
    944953kocie;3t,N/GaNsCl;3t,N/GaNsCv 
    945 najbielsi;3-4a³y,ADJ/DsNpCnGp 
    946 najbielsze;3-5a³y,ADJ/DsNpCnGaifn 
     954najbielsi;3-4ały,ADJ/DsNpCnGp 
     955najbielsze;3-5ały,ADJ/DsNpCnGaifn 
    947956najlepsi;dobry,ADJ/DsNpCnGp 
    948957najlepsze;dobry,ADJ/DsNpCnGaifn 
     
    10091018@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    10101019 
    1011 @item @strong{Authors:}                 @tab Micha³ Stolarski, Tomasz Obrêbski 
     1020@item @strong{Authors:}                 @tab Michał Stolarski, Tomasz Obrębski 
    10121021@item @strong{Component category:}      @tab filter 
    10131022 
     
    11061115 
    11071116 
    1108 Example: @code{3-4a³y} transforms @i{najbielsi} into @i{bia³y} 
     1117Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały} 
    11091118 
    11101119 
     
    11141123likelihood of the guess. 
    11151124 
    1116 @example 
    1117 *³kê;1a,N/GfNsCa 
    1118 naj*elszy;3-4a³y,ADJ/...:... 
    1119 @end example 
     1125@c @example 
     1126@c *łkę;1a,N/GfNsCa 
     1127@c naj*elszy;3-4ały,ADJ/...:... 
     1128@c @end example 
    11201129 
    11211130 
     
    11291138 
    11301139@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    1131 @item @strong{Authors:}                 @tab Tomasz Obrêbski, Micha³ Stolarski 
     1140@item @strong{Authors:}                 @tab Tomasz Obrębski, Michał Stolarski 
    11321141@item @strong{Component category:}      @tab filter 
    11331142@item @strong{Input format:}            @tab UTT regular 
     
    12161225@section kor - configurable spelling corrector 
    12171226 
    1218 [TODO] 
     1227@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
     1228@item @strong{Authors:}                 @tab Paweł Wereński, Tomasz Obrębski, Michał Stolarski 
     1229@item @strong{Component category:}      @tab filter 
     1230@item @strong{Input format:}            @tab UTT regular 
     1231@item @strong{Output format:}           @tab UTT regular 
     1232@item @strong{Required annotation:}     @tab tok 
     1233@end multitable 
     1234 
     1235@menu 
     1236* kor description:: 
     1237* kor command line options:: 
     1238* kor weights definition file::     
     1239* kor dictionaries::             
     1240@end menu 
     1241 
     1242 
     1243@node kor description 
     1244@subsection Description 
     1245 
     1246The spelling corrector applies Paweł Wereński's dynamic programming 
     1247algorithm to the FSA representation of the set of word forms of the 
     1248Polex/PMDBF dictionary. The algorithm extends K. Oflazer's 
     1249algorithm used by @command{cor}: the extended version makes it 
     1250possible to assign weights to individual edit operations. 
     1251 
     1252Given an incorrect word form, it returns all word forms 
     1253present in the dictionary whose edit distance from that form is smaller than the 
     1254threshold given as a parameter. 
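
As a rough illustration of what weighted edit operations mean, the following Python sketch computes a weighted edit distance between two plain strings. It is not Paweł Wereński's algorithm, which runs over the FSA representation of the dictionary; it is an ordinary dynamic-programming table. The names `weighted_distance`, `corrections` and `RULES`, and the inclusive distance bound, are assumptions made for the example.

```python
def weighted_distance(a, b, rules, stdcor=1.0, xchg=1.0):
    """Weighted edit distance between strings a and b.

    rules maps (source, target) substrings to a weight and must
    already contain both directions of every pair.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:                       # delete a[i-1]
                best = min(best, d[i - 1][j] + stdcor)
            if j > 0:                       # insert b[j-1]
                best = min(best, d[i][j - 1] + stdcor)
            if i > 0 and j > 0:             # match or substitute
                cost = 0.0 if a[i - 1] == b[j - 1] else stdcor
                best = min(best, d[i - 1][j - 1] + cost)
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):  # exchange neighbours
                best = min(best, d[i - 2][j - 2] + xchg)
            for (x, y), w in rules.items():     # weighted rewrite x -> y
                if a.endswith(x, 0, i) and b.endswith(y, 0, j):
                    best = min(best, d[i - len(x)][j - len(y)] + w)
            d[i][j] = best
    return d[n][m]


# The three orthographic pairs, entered in both directions.
RULES = {}
for x, y, w in [("ż", "rz", 0.5), ("ch", "h", 0.5), ("u", "ó", 0.5)]:
    RULES[(x, y)] = RULES[(y, x)] = w


def corrections(word, dictionary, rules, max_distance=1.0):
    """Return dictionary forms within max_distance (inclusive bound assumed)."""
    return [w for w in dictionary
            if weighted_distance(word, w, rules) <= max_distance]
```

In this sketch a misspelling such as `morze` for `może` costs only the weight of the `rz`/`ż` rule, while an unrelated substitution costs the default weight.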
     1255 
     1256 
     1257@node kor command line options 
     1258@subsection Command line options 
     1259 
     1260@table @code 
     1261 
     1262@parhelp 
     1263@parversion 
     1264@parinteractive 
     1265@c @parfile 
     1266@c @paroutput 
     1267@c @parfail 
     1268@c @parcopy 
     1269@parinputfield 
     1270@paroutputfield 
     1271@pardictionary 
     1272@parprocess 
     1273@parselect 
     1274@parunselect 
     1275@paroneline 
     1276@paronefield 
     1277 
     1278@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}} 
     1279Maximum edit distance (default='1'). 
     1280 
     1281@item @b{@minus{}@minus{}weights=@var{filename}, @minus{}w @var{filename}} 
     1282Edit operations' weights file. 
     1283 
     1284@c @item @b{@minus{}@minus{}replace, @minus{}r} 
     1285@c Replace original form with corrected form, place original form in the 
     1286@c cor field. This option has no effect in @option{--one-*} modes (default=off) 
     1287 
     1288 
     1289@end table 
     1290 
     1291 
     1292@node kor weights definition file 
     1293@subsection Weights definition file 
     1294 
     1295Example: 
     1296 
     1297@example 
     1298 
     1299%stdcor 1 
     1300%xchg   1 
     1301ż  rz 0.5 
     1302ch h  0.5 
     1303u  ó  0.5 
     1304 
     1305@end example 
     1306 
     1307 
     1308The default weight is set to 1 (@code{%stdcor 1}), the weight of the exchange 
     1309operation is set to 1 (@code{%xchg 1}), and the three principal orthographic 
     1310errors are each assigned the weight 0.5. 
     1311 
     1312An edit operation weight declaration, such as 
     1313 
     1314@example 
     1315ż  rz 0.5 
     1316@end example 
     1317 
     1318works in both directions, i.e. ż->rz and rz->ż. 
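
The file format shown above could be read along the following lines. This is a minimal Python sketch, not the parser used by `kor`: the function name `parse_weights` is hypothetical, fields are assumed to be separated by whitespace, and a missing `%stdcor` or `%xchg` directive is assumed to default to 1.

```python
def parse_weights(text):
    """Parse a weights definition into (stdcor, xchg, rules).

    rules maps (source, target) to a weight and holds both
    directions of every declared pair.
    """
    stdcor, xchg, rules = 1.0, 1.0, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if fields[0] == "%stdcor":
            stdcor = float(fields[1])
        elif fields[0] == "%xchg":
            xchg = float(fields[1])
        elif len(fields) == 3:
            x, y, weight = fields
            # A declaration applies in both directions.
            rules[(x, y)] = rules[(y, x)] = float(weight)
    return stdcor, xchg, rules
```

Storing each pair twice keeps the lookup simple at the cost of a slightly larger table.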
     1319 
     1320The default weights definition file for @code{kor} is: 
     1321 
     1322@example 
     1323$HOME/.local/share/utt/weights.kor 
     1324@end example 
     1325 
     1326or, if the above mentioned file is absent: 
     1327 
     1328@example 
     1329/usr/local/share/utt/weights.kor 
     1330@end example 
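
The lookup order described above amounts to a simple fallback. The Python sketch below shows the idea; the function name `default_weights_path` and its parameters are hypothetical and only serve the illustration.

```python
import os


def default_weights_path(home=None, exists=os.path.isfile):
    """Return the per-user weights file if present, else the system-wide one."""
    if home is None:
        home = os.path.expanduser("~")
    user_file = os.path.join(home, ".local/share/utt/weights.kor")
    system_file = "/usr/local/share/utt/weights.kor"
    return user_file if exists(user_file) else system_file
```

Passing `exists` as a parameter makes the fallback easy to exercise without touching the file system.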
     1331 
     1332 
     1333@node kor dictionaries 
     1334@subsection Dictionaries 
     1335 
     1336See @command{cor}. 
    12191337 
    12201338@c --------------------------------------------------------------------- 
     
    12281346@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    12291347 
    1230 @item @strong{Authors:}                 @tab Tomasz Obrêbski 
     1348@item @strong{Authors:}                 @tab Tomasz Obrębski 
    12311349@item @strong{Component category:}      @tab filter 
    12321350@item @strong{Input format:}            @tab UTT regular 
     
    12561374 
    12571375input: 
    1258 0000 05 W Cze¶Ê 
     13760000 05 W Cześć 
    125913770005 01 P ! 
    126013780006 01 S _ 
     
    12671385output: 
    126813860000 00 BOS * 
    1269 0000 05 W Cze¶Ê 
     13870000 05 W Cześć 
    127013880005 01 P ! 
    127113890006 00 EOS * 
     
    12881406@c @chapter gph - graphizer 
    12891407 
    1290 @c Authors: Tomasz Obrêbski 
     1408@c Authors: Tomasz Obrębski 
    12911409 
    12921410 
     
    13011419 
    13021420@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    1303 @item @strong{Authors:}                 @tab Tomasz Obrêbski 
     1421@item @strong{Authors:}                 @tab Tomasz Obrębski 
    13041422@item @strong{Component category:}      @tab filter 
    13051423@item @strong{Input format:}            @tab UTT regular 
     
    15371655 
    15381656@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    1539 @item @strong{Authors:}                 @tab Tomasz Obrêbski 
     1657@item @strong{Authors:}                 @tab Tomasz Obrębski 
    15401658@item @strong{Component category:}      @tab filter 
    15411659@item @strong{Input format:}            @tab UTT flattened 
     
    16261744 
    16271745@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    1628 @item @strong{Authors:}                 @tab Marcin Walas, Tomasz Obrêbski 
     1746@item @strong{Authors:}                 @tab Marcin Walas, Tomasz Obrębski 
    16291747@item @strong{Input format:}            @tab UTT flattened 
    16301748@item @strong{Output format:}           @tab UTT flattened 
     
    16461764 
    16471765@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    1648 @item @strong{Authors:}                 @tab Tomasz Obrêbski 
     1766@item @strong{Authors:}                 @tab Tomasz Obrębski 
    16491767@item @strong{Component category:}      @tab filter 
    16501768@item @strong{Input format:}            @tab UTT regular 
     
    18391957 
    18401958@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    1841 @item @strong{Authors:}                 @tab Michal Stolarski, Tomasz Obrebski 
     1959@item @strong{Authors:}                 @tab Michał Stolarski, Tomasz Obrębski 
    18421960@item @strong{Component category:}      @tab additional tool 
    18431961@end multitable 
     
    18842002 
    18852003@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    1886 @item @strong{Authors:}                 @tab Tomasz Obrêbski 
     2004@item @strong{Authors:}                 @tab Tomasz Obrębski 
    18872005@item @strong{Input format:}            @tab UTT regular 
    18882006@item @strong{Output format:}           @tab UTT flattened 
     
    19322050 
    19332051@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
    1934 @item @strong{Authors:}                 @tab Tomasz Obrêbski 
     2052@item @strong{Authors:}                 @tab Tomasz Obrębski 
    19352053@item @strong{Input format:}            @tab UTT flattened 
    19362054@item @strong{Output format:}           @tab UTT regular 
     
    22362354@tab @code{v} @tab vocative. 
    22372355@item 
    2238 @item 
    22392356@code{G} @tab @tab Gender 
    22402357@item 
     
    27292846@c @chapter Copyright 
    27302847@c  
    2731 @c Copyright 2004 by Tomasz Obrebski 
     2848@c Copyright 2004 by Tomasz Obrębski 
    27322849@c This software is free for research and educational use. 
    27332850 