Changeset e28a625


Timestamp: 10/29/08 11:17:16
Author: obrebski <obrebski@…>
Branches: master, help
Children: 91ed676
Parents: 261bf62
git-author: obrebski <obrebski@…> (10/29/08 11:17:16)
git-committer: obrebski <obrebski@…> (10/29/08 11:17:16)
Message:

This line, and those below, will be ignored--

M app/dist/files/README

updated

M app/doc/utt.texinfo

additions

M app/src/gue/Makefile

static libraries

M app/src/cor/cmdline_cor.ggo

removed options that do not work

M app/src/cor/Makefile

static libraries

M app/src/common/cmdline_common.ggo

?

M app/src/kor/Makefile

static libraries

M app/src/lem/Makefile

static libraries

M lang/dist/tarball/Makefile

language modules are packaged one at a time

M lang/Makefile

same as above

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@61 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

Files: 10 edited

  • app/dist/files/README

    ra4d0da5 re28a625  
    @@ -18,4 +18,34 @@
     Installation
     **************
    -Run utt_make_config.pl to create configuration files.
    -Configuration files will be created in ~/.utt/
    +
    +1) unpack the UTT tar archive
    +2) in the same directory, unpack the tar archives of all UTT dictionary modules you have
    +3) run
    +        make install
    +   in the root directory of the installation
    +4) add the bin directory to the PATH variable
    +
    +
    +Requirements
    +*************
    +
    +* File::HomeDir
    +
    +  the Perl package File::HomeDir must be installed
    +  (to install the package, run 'perl -MCPAN -e shell' and write
    +   'install File::HomeDir' after the 'cpan>' prompt appears)
    +
    +* flex
    +
    +  to run the ser component, flex must be installed in your system
    +
    +* ruby
    +
    +  to run the tre component, ruby must be installed in your system
    +
    +* locale pl_PL.iso-8852-2
    +
    +  the locales pl_PL.iso-8859-2 (pl_PL in short) must be installed
    +  and set while using UTT with the Polish module. The text you
    +  process with UTT must be encoded in iso-8859-2.
    +
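
The locale requirement added to the README can be checked before running UTT with the Polish module. The following is only a sketch: the function name `has_pl_latin2_locale` is hypothetical and not part of UTT, and it assumes that `locale -a` lists the Polish ISO-8859-2 locale under a name such as `pl_PL`, `pl_PL.iso88592` or `pl_PL.ISO-8859-2`, which may differ between systems.

```shell
# Hypothetical helper (not part of UTT): reads locale names on standard
# input, one per line, and succeeds if a Polish ISO-8859-2 locale is listed.
has_pl_latin2_locale() {
    grep -qiE '^pl_PL(\.iso-?8859-?2)?$'
}

# Typical use: feed it the locales installed on this system.
if locale -a 2>/dev/null | has_pl_latin2_locale; then
    echo "Polish ISO-8859-2 locale found"
else
    echo "Polish ISO-8859-2 locale not found" >&2
fi
```

Reading the list from standard input keeps the check independent of the machine it runs on, so the same function can be tried against a saved `locale -a` listing.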
  • app/doc/utt.texinfo

    r261bf62 re28a625  
    @@ -367,5 +367,5 @@
     @section Flattened UTT file
     
    -A UTT file format has two variants: regular and flattend. The regular
    +A UTT file format has two variants: regular and flattened. The regular
     format was described above.  In the flattened format some of the
     end-of-line characters are replaced with line-feed characters.
    @@ -1608,9 +1608,9 @@
     
     @example
    -cat corpus | tok | sen | lem | grp -a p | lzop -7 > corpus.grp.lzo
    -@end example
    -
    -@example
    -lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR}
    +cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
    +@end example
    +
    +@example
    +lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
     @end example
     
    @@ -1627,8 +1627,12 @@
     @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
     @item @strong{Authors:}                 @tab Marcin Walas, Tomasz Obrębski
    -@item @strong{Component category:}      @tab filter
    +@item @strong{Input format:}            @tab UTT flattened
    +@item @strong{Output format:}           @tab UTT flattened
    +@item @strong{Required annotation:}     @tab tok, sen, lem -1
     @end multitable
     
     [TODO]
    +
    +(see mar's help 'mar -h' for some information)
     
     @c ---------------------------------------------------------------------
    @@ -1871,4 +1875,8 @@
     
     
    +@c -------------------------------------------------------------------------------
    +@c FLA
    +@c -------------------------------------------------------------------------------
    +
     @page
     @node fla
    @@ -1877,7 +1885,19 @@
     @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
     @item @strong{Authors:}                 @tab Tomasz Obrębski
    -@item @strong{Component category:}      @tab filter
    +@item @strong{Input format:}            @tab UTT regular
    +@item @strong{Output format:}           @tab UTT flattened
    +@item @strong{Required annotation:}     @tab sen
     @end multitable
     @c
    +
    +@menu
    +* fla description::
    +@c * fla command line options::
    +@c * fla usage example::
    +@end menu
    +
    +
    +@node fla description
    +@subsection Description
     
     @command{fla} ``flattens'' a utt file by merging segments belonging
    @@ -1902,11 +1922,8 @@
     segment contains a fragment matching the @code{<bosregex>}). By
     default, segments containing a field @code{BOS} are seeked.
    -@c @menu
    -@c * con command line options::
    -@c * con usage example::
    -@c * con hints::
    -@c @end menu
    -
    -
    +
    +@c -------------------------------------------------------------------------------
    +@c UNFLA
    +@c -------------------------------------------------------------------------------
     
     @page
    @@ -1916,7 +1933,17 @@
     @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
     @item @strong{Authors:}                 @tab Tomasz Obrębski
    -@item @strong{Component category:}      @tab filter
    +@item @strong{Input format:}            @tab UTT flattened
    +@item @strong{Output format:}           @tab UTT regular
    +@item @strong{Required annotation:}     @tab -
     @end multitable
     
    +@menu
    +* unfla description::
    +@c * fla command line options::
    +@c * fla usage example::
    +@end menu
    +
    +@node unfla description
    +@subsection Description
     @command{unfla} transforms a flattened UTT file, produced by
     @command{fla}, into the regular format by restoring end-of-line
    @@ -1971,5 +1998,5 @@
     
     @example
    -cat text | tok | lem --only-fail | cor -1 > output3
    +cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
     @end example
     
    @@ -2020,13 +2047,14 @@
     As @command{grp} (@command{grep}) processes data faster then it is
     read from the disk drive, the search time may be still shortened by
    -using file compression techniques.  We suggest usin @command{lzop}.
    +using file compression techniques.  We suggest using the
    +@command{lzop} compressor/decompressor.
     
     @item the fastest way to search a large corpus
     
    -step 1: preprocessing
    +step 1: corpus preprocessing
     
     @example
     cat corpus | tok | sen | lem -1 \
    -| grp -a p | lzop -7 > corpus.grp.lzo
    +| fla | lzop -7 > corpus.grp.lzo
     @end example
     
    @@ -2034,5 +2062,5 @@
     
     @example
    -lzop -cd corpus.grp.lzo | grp -a gP -e 'cat(<V>) space
    +lzop -cd corpus.grp.lzo | unfla | grp -e 'cat(<V>) space
     lexeme(rozmowa)' | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
     @end example
    @@ -2040,21 +2068,21 @@
     @end enumerate
     
    -@subsubheading More complicated configurations
    -
    -
    -@example
    -mknod fifo1 p
    -mknod fifo2 p
    -mknod fifo3 p
    -mknod fifo4 p
    -mknod fifo5 p
    -
    -tok | lem -p W -e fifo1 > fifo2 &
    -cor -e fifo3 < fifo1 | lem > fifo4 &
    -gue < fifo3 > fifo5 &
    -sort -m fifo2 fifo4 fifo5
    -
    -rm fifo?
    -@end example
    +@c @subsubheading More complicated configurations
    +
    +
    +@c @example
    +@c mknod fifo1 p
    +@c mknod fifo2 p
    +@c mknod fifo3 p
    +@c mknod fifo4 p
    +@c mknod fifo5 p
    +
    +@c tok | lem -p W -e fifo1 > fifo2 &
    +@c cor -e fifo3 < fifo1 | lem > fifo4 &
    +@c gue < fifo3 > fifo5 &
    +@c sort -m fifo2 fifo4 fifo5
    +
    +@c rm fifo?
    +@c @end example
     
     
  • app/src/common/cmdline_common.ggo

    r25ae32e re28a625  
    @@ -2,9 +2,9 @@
     
     
    -option  "input"         f       "Input file" string no hidden
    +option  "input"         f       "Input file" string no
     
    -option  "output"        o       "Output file" string no hidden
    +option  "output"        o       "Output file for succesfully processed segments" string no
     
    -option  "fail"          e       "Output file for unsuccesfully processed segments " string no hidden
    +option  "fail"          e       "Output file for unsuccesfully processed segments " string no
     
     option  "only-fail"     -       "Print only segments the program failed to process" flag off hidden
    @@ -12,5 +12,5 @@
     option  "no-fail"       -       "Print only segments the program processed" flag off hidden
     
    -option  "copy"          c       "Copy succesfully processed segments to standard output" flag off hidden
    +option  "copy"          c       "Copy succesfully processed segments to standard output" flag off
     
     option  "process"       p       "Process segments with this tag" string no multiple
  • app/src/cor/Makefile

    r13a8a67 re28a625  
    @@ -1,4 +1,3 @@
    -PAR=-Wno-deprecated -m32 -fpermissive
    -# -static
    +PAR=-Wno-deprecated -m32 -fpermissive -static
     PAR2=-c -Wno-deprecated -m32 -fpermissive
     LIB_PATH=../lib
  • app/src/cor/cmdline_cor.ggo

    r25ae32e re28a625  
    @@ -5,4 +5,4 @@
     option "dictionary"             d       "Dictionary" string typestr="FILENAME" default="cor.bin" no
     option "distance"               n       "Maximal edit distance." int default="1" no
    -option "replace"                r       "Replace original form with corrected form, place original form in the cor field. This option has no effect in single mode" flag off
    +option "replace"                r       "Replace original form with corrected form, place original form in the cor field. This option has no effect in single mode" flag off hidden
     #option "single"                        -       "Place all alternatives in the same line" flag off
  • app/src/gue/Makefile

    r8d3e6ab re28a625  
    @@ -1,4 +1,3 @@
    -PAR=-Wno-deprecated -O3 -fpermissive -m32
    -#-static
    +PAR=-Wno-deprecated -O3 -fpermissive -m32 -static
     PAR2=-c -Wno-deprecated -O3 -fpermissive -m32
     LIB_PATH=../lib
  • app/src/kor/Makefile

    r13a8a67 re28a625  
    @@ -1,4 +1,3 @@
    -PAR=-Wno-deprecated -m32 -fpermissive
    -# -static
    +PAR=-Wno-deprecated -m32 -fpermissive -static
     PAR2=-c -Wno-deprecated -m32 -fpermissive
     LIB_PATH=../lib
  • app/src/lem/Makefile

    r13a8a67 re28a625  
    @@ -1,5 +1,4 @@
    -PAR=-Wno-deprecated -m32 -O3 -fpermissive
    -#-static
    -PAR2=-c -Wno-deprecated -m32 -O3 -fpermissive
    +PAR=-Wno-deprecated -m32 -O3 -fpermissive -static
    +PAR2=-c -Wno-deprecated -m32 -O3 -fpermissive -static
     LIB_PATH=../lib
     COMMON_PATH=../common
  • lang/Makefile

    ref85bd7 re28a625  
    @@ -11,4 +11,5 @@
     export UTT_DIC_OUTPUT=${CUR_DIR}
     
    +export LANG_MODULES=pl_PL.ISO-8852-2 pl_PL.UTF-8
     
     # path to dictionary compiler
    @@ -32,2 +33,8 @@
             cd dist && make tarball; cd ${CUR_DIR};
     
    +
    +.PHONY: dist_tarball_pl_PL.ISO-8859-2
    +dist_tarball:
    +        export DIC_LANG=pl_PL.ISO-8859-2 && \
    +        cd dist && make tarball; cd ${CUR_DIR};
    +
  • lang/dist/tarball/Makefile

    r9b57c4d re28a625  
    @@ -13,5 +13,6 @@
     _TARBALL_ROOT=$(DIR)/utt-$(_UTT_VER).$(_UTT_REL)
     _UTT_DIC_HOME=share/utt
    -_TAR_FILE_NAME=utt.dic.$(_UTT_VER)_$(_UTT_REL)
    +_TAR_FILE_NAME=utt.$(_UTT_VER)_$(_UTT_REL)
    +
     
     #defualt task
    @@ -21,5 +22,5 @@
             @echo Output directory for tarball: ${UTT_DIC_OUTPUT}
             mkdir -p ${_TARBALL_ROOT}/${_UTT_DIC_HOME}
    -        if test -n "${DIC_LANG}" -a -d ${UTT_DIC_BIN}/${DIC_LANG} ; \
    +        if [[ -n "${DIC_LANG}" && -d ${UTT_DIC_BIN}/${DIC_LANG} ]]; \
             then \
                 echo "Tworze dystrybucje ${DIC_LANG}"; \
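
The last hunk of lang/dist/tarball/Makefile replaces `test ... -a ...` with the `[[ ... ]]` construct. `[[` is a bash/ksh extension rather than POSIX sh, and make runs recipes with `/bin/sh` unless `SHELL` is set otherwise (this excerpt does not show whether it is), so the new line may depend on which shell `/bin/sh` is on the build machine. As a hedged sketch only, the same check can be written portably as below; the function name `lang_dir_ready` and the sample paths are hypothetical, not taken from the Makefile.

```shell
# Hypothetical helper: succeeds when a language name is given and the
# matching directory exists under the dictionary root.
#   $1 = dictionary root (plays the role of UTT_DIC_BIN)
#   $2 = language module name (plays the role of DIC_LANG)
lang_dir_ready() {
    [ -n "$2" ] && [ -d "$1/$2" ]
}

# Example call with made-up paths.
if lang_dir_ready "/tmp/utt-dic-bin" "pl_PL.ISO-8859-2"; then
    echo "building distribution for pl_PL.ISO-8859-2"
else
    echo "nothing to package"
fi
```

Two separate `[ ... ]` tests joined with `&&` avoid both the bash-only `[[` and the `-a` operator, which POSIX marks as obsolescent.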