Timestamp:
10/29/08 11:17:16 (16 years ago)
Author:
obrebski <obrebski@…>
Branches:
master, help
Children:
91ed676
Parents:
261bf62
git-author:
obrebski <obrebski@…> (10/29/08 11:17:16)
git-committer:
obrebski <obrebski@…> (10/29/08 11:17:16)
Message:

This line and the following ones will be ignored--

M app/dist/files/README

updated

M app/doc/utt.texinfo

additions

M app/src/gue/Makefile

static libraries

M app/src/cor/cmdline_cor.ggo

removed non-working parameters

M app/src/cor/Makefile

static libraries

M app/src/common/cmdline_common.ggo

?

M app/src/kor/Makefile

static libraries

M app/src/lem/Makefile

static libraries

M lang/dist/tarball/Makefile

package language modules individually

M lang/Makefile

-"-

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@61 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

File:
1 edited

  • app/doc/utt.texinfo

    r261bf62 re28a625  
    367   367     @section Flattened UTT file
    368   368
    369         - A UTT file format has two variants: regular and flattend. The regular
          369   + A UTT file format has two variants: regular and flattened. The regular
    370   370     format was described above.  In the flattened format some of the
    371   371     end-of-line characters are replaced with line-feed characters.
     
    1608  1608
    1609  1609    @example
    1610        - cat corpus | tok | sen | lem | grp -a p | lzop -7 > corpus.grp.lzo
    1611        - @end example
    1612        -
    1613        - @example
    1614        - lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR}
          1610  + cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
          1611  + @end example
          1612  +
          1613  + @example
          1614  + lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
    1615  1615    @end example
    1616  1616
     
    1627  1627    @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
    1628  1628    @item @strong{Authors:}                 @tab Marcin Walas, Tomasz Obrębski
    1629        - @item @strong{Component category:}      @tab filter
          1629  + @item @strong{Input format:}            @tab UTT flattened
          1630  + @item @strong{Output format:}           @tab UTT flattened
          1631  + @item @strong{Required annotation:}     @tab tok, sen, lem -1
    1630  1632    @end multitable
    1631  1633
    1632  1634    [TODO]
          1635  +
          1636  + (see mar's help 'mar -h' for some information)
    1633  1637
    1634  1638    @c ---------------------------------------------------------------------
     
    1871  1875
    1872  1876
          1877  + @c -------------------------------------------------------------------------------
          1878  + @c FLA
          1879  + @c -------------------------------------------------------------------------------
          1880  +
    1873  1881    @page
    1874  1882    @node fla
     
    1877  1885    @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
    1878  1886    @item @strong{Authors:}                 @tab Tomasz Obrębski
    1879        - @item @strong{Component category:}      @tab filter
          1887  + @item @strong{Input format:}            @tab UTT regular
          1888  + @item @strong{Output format:}           @tab UTT flattened
          1889  + @item @strong{Required annotation:}     @tab sen
    1880  1890    @end multitable
    1881  1891    @c
          1892  +
          1893  + @menu
          1894  + * fla description::
          1895  + @c * fla command line options::
          1896  + @c * fla usage example::
          1897  + @end menu
          1898  +
          1899  +
          1900  + @node fla description
          1901  + @subsection Description
    1882  1902
    1883  1903    @command{fla} ``flattens'' a utt file by merging segments belonging
     
    1902  1922    segment contains a fragment matching the @code{<bosregex>}). By
    1903  1923    default, segments containing a field @code{BOS} are seeked.
    1904        - @c @menu
    1905        - @c * con command line options::
    1906        - @c * con usage example::
    1907        - @c * con hints::
    1908        - @c @end menu
    1909        -
    1910        -
          1924  +
          1925  + @c -------------------------------------------------------------------------------
          1926  + @c UNFLA
          1927  + @c -------------------------------------------------------------------------------
    1911  1928
    1912  1929    @page
     
    1916  1933    @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
    1917  1934    @item @strong{Authors:}                 @tab Tomasz Obrębski
    1918        - @item @strong{Component category:}      @tab filter
          1935  + @item @strong{Input format:}            @tab UTT flattened
          1936  + @item @strong{Output format:}           @tab UTT regular
          1937  + @item @strong{Required annotation:}     @tab -
    1919  1938    @end multitable
    1920  1939
          1940  + @menu
          1941  + * unfla description::
          1942  + @c * fla command line options::
          1943  + @c * fla usage example::
          1944  + @end menu
          1945  +
          1946  + @node unfla description
          1947  + @subsection Description
    1921  1948    @command{unfla} transforms a flattened UTT file, produced by
    1922  1949    @command{fla}, into the regular format by restoring end-of-line
     
    1971  1998
    1972  1999    @example
    1973        - cat text | tok | lem --only-fail | cor -1 > output3
          2000  + cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
    1974  2001    @end example
    1975  2002
     
    2020  2047    As @command{grp} (@command{grep}) processes data faster then it is
    2021  2048    read from the disk drive, the search time may be still shortened by
    2022        - using file compression techniques.  We suggest usin @command{lzop}.
          2049  + using file compression techniques.  We suggest using the
          2050  + @command{lzop} compressor/decompressor.
    2023  2051
    2024  2052    @item the fastest way to search a large corpus
    2025  2053
    2026        - step 1: preprocessing
          2054  + step 1: corpus preprocessing
    2027  2055
    2028  2056    @example
    2029  2057    cat corpus | tok | sen | lem -1 \
    2030        - | grp -a p | lzop -7 > corpus.grp.lzo
          2058  + | fla | lzop -7 > corpus.grp.lzo
    2031  2059    @end example
    2032  2060
     
    2034  2062
    2035  2063    @example
    2036        - lzop -cd corpus.grp.lzo | grp -a gP -e 'cat(<V>) space
          2064  + lzop -cd corpus.grp.lzo | unfla | grp -e 'cat(<V>) space
    2037  2065    lexeme(rozmowa)' | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
    2038  2066    @end example
     
    2040  2068    @end enumerate
    2041  2069
    2042        - @subsubheading More complicated configurations
    2043        -
    2044        -
    2045        - @example
    2046        - mknod fifo1 p
    2047        - mknod fifo2 p
    2048        - mknod fifo3 p
    2049        - mknod fifo4 p
    2050        - mknod fifo5 p
    2051        -
    2052        - tok | lem -p W -e fifo1 > fifo2 &
    2053        - cor -e fifo3 < fifo1 | lem > fifo4 &
    2054        - gue < fifo3 > fifo5 &
    2055        - sort -m fifo2 fifo4 fifo5
    2056        -
    2057        - rm fifo?
    2058        - @end example
          2070  + @c @subsubheading More complicated configurations
          2071  +
          2072  +
          2073  + @c @example
          2074  + @c mknod fifo1 p
          2075  + @c mknod fifo2 p
          2076  + @c mknod fifo3 p
          2077  + @c mknod fifo4 p
          2078  + @c mknod fifo5 p
          2079  +
          2080  + @c tok | lem -p W -e fifo1 > fifo2 &
          2081  + @c cor -e fifo3 < fifo1 | lem > fifo4 &
          2082  + @c gue < fifo3 > fifo5 &
          2083  + @c sort -m fifo2 fifo4 fifo5
          2084  +
          2085  + @c rm fifo?
          2086  + @c @end example
    2059  2087
    2060  2088
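The flattened UTT format described in this change can be illustrated with a small Python sketch of what fla and unfla do conceptually: fla joins the segment lines of each sentence into one line (presumably so that line-oriented tools such as grp/grep see one sentence per line), and unfla splits them back. This is not the UTT implementation. Assumptions: a sentence starts at a segment line with a whitespace-separated field equal to BOS (a simplified reading of fla's documented default), the segment lines are made up and only loosely modelled on UTT's, and the in-sentence separator is a parameter of the sketch that defaults to the form-feed character; the character fla actually writes is not taken from the UTT sources.

```python
SEP = "\f"  # assumed in-sentence separator; NOT taken from the UTT sources


def is_bos(segment: str) -> bool:
    """Simplified reading of fla's default: the segment has a field BOS."""
    return "BOS" in segment.split()


def flatten(segments: list[str], sep: str = SEP) -> list[str]:
    """Join the segment lines of each sentence into a single line (cf. fla)."""
    flat: list[str] = []
    sentence: list[str] = []
    for seg in segments:
        # A BOS segment closes the sentence collected so far.
        if is_bos(seg) and sentence:
            flat.append(sep.join(sentence))
            sentence = []
        sentence.append(seg)
    if sentence:  # flush the last sentence
        flat.append(sep.join(sentence))
    return flat


def unflatten(flat_lines: list[str], sep: str = SEP) -> list[str]:
    """Split flattened lines back into one segment per line (cf. unfla).

    The round trip is exact only if no segment itself contains `sep`.
    """
    segments: list[str] = []
    for line in flat_lines:
        segments.extend(line.split(sep))
    return segments


if __name__ == "__main__":
    # Hypothetical segment lines (offset, length, type, form); layout assumed.
    regular = [
        "0000 00 BOS *", "0000 03 W Ala", "0003 01 S _", "0004 02 W ma",
        "0006 00 EOS *", "0006 00 BOS *", "0006 04 W kota", "0010 00 EOS *",
    ]
    flat = flatten(regular)
    print(len(flat), "flattened line(s)")
    assert unflatten(flat) == regular  # round trip restores the regular layout
```

Splitting on a BOS segment rather than on EOS keeps any segments that precede the first BOS attached to the first sentence instead of dropping them.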
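The two-step search recipe in the updated manual (preprocess and compress the corpus once, then decompress and filter it for every query) can be sketched in Python to show the shape of the pipeline. Substitutions, stated plainly: the standard-library gzip module stands in for lzop, a plain regular expression stands in for a grp query, and the helper names save_flat_corpus and search_flat_corpus are hypothetical. The separator inside a sentence is again assumed to be a form feed. Because the manual says grp processes data faster than it is read from disk, storing the corpus compressed shortens each search pass.

```python
import gzip
import re
import tempfile
from pathlib import Path

SEP = "\f"  # assumed in-sentence separator of a flattened file


def save_flat_corpus(path, flat_lines):
    """Step 1, run once: store a flattened corpus (one sentence per line).

    gzip is used here as a stand-in for lzop.
    """
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for line in flat_lines:
            f.write(line + "\n")


def search_flat_corpus(path, pattern):
    """Step 2, run per query: stream-decompress and yield matching sentences.

    A regular expression stands in for a grp query here.
    """
    rx = re.compile(pattern)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:  # splits on "\n" only, so SEP stays inside the line
            line = line.rstrip("\n")
            if rx.search(line):
                yield line


if __name__ == "__main__":
    # Hypothetical, already flattened sentences (segment layout assumed).
    flat = [
        SEP.join(["0000 03 W Ala", "0004 02 W ma", "0007 04 W kota"]),
        SEP.join(["0000 03 W Ola", "0004 02 W ma", "0007 03 W psa"]),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        corpus = Path(tmp) / "corpus.flat.gz"
        save_flat_corpus(corpus, flat)
        for sentence in search_flat_corpus(corpus, r"\bW kota\b"):
            # Like unfla: turn the matching sentence back into segment lines.
            print("\n".join(sentence.split(SEP)))
```

Filtering whole lines means every match is returned as a complete sentence, which is what the downstream steps (unfla, ser, con in the manual's example) then work on.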