utt.texinfo

Timestamp:

10/29/08 11:17:16

Author:

obrebski <obrebski@…>

Branches:

master, help

Children:

91ed676

Parents:

261bf62

git-author:

obrebski <obrebski@…> (10/29/08 11:17:16)

git-committer:

obrebski <obrebski@…> (10/29/08 11:17:16)

Message:

This line and those below will be ignored--

M app/dist/files/README

updated

M app/doc/utt.texinfo

additions

M app/src/gue/Makefile

static libraries

M app/src/cor/cmdline_cor.ggo

removal of non-working parameters

M app/src/cor/Makefile

static libraries

M app/src/common/cmdline_common.ggo

?

M app/src/kor/Makefile

static libraries

M app/src/lem/Makefile

static libraries

M lang/dist/tarball/Makefile

packaging language modules one at a time

M lang/Makefile

same as above

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@61 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

File:

: 1 edited

app/doc/utt.texinfo (modified) (11 diffs)

app/doc/utt.texinfo

-                      r261bf62
+                      re28a625
 @section Flattened UTT file
 A UTT file format has two variants: regular and flattend. The regular
+A UTT file format has two variants: regular and flattened. The regular
 format was described above.  In the flattened format some of the
 end-of-line characters are replaced with line-feed characters.
 …
 @example
 cat corpus | tok | sen | lem | grp -a p | lzop -7 > corpus.grp.lzo
 @end example
 @example
 lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR}
+cat corpus | tok | sen | lem -1 | fla | lzop -7 > corpus.grp.lzo
+@end example
+@example
+lzop -cd corpus.grp.lzo | grp -e @var{EXPR} | unfla | ser -e @var{EXPR}
 @end example
 …
 @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
 @item @strong{Authors:}                 @tab Marcin Walas, Tomasz Obrębski
+@item @strong{Component category:}      @tab filter
+@item @strong{Input format:}            @tab UTT flattened
+@item @strong{Output format:}           @tab UTT flattened
+@item @strong{Required annotation:}     @tab tok, sen, lem -1
 @end multitable
 [TODO]
+(see mar's help 'mar -h' for some information)
 @c ---------------------------------------------------------------------
 …
+@c -------------------------------------------------------------------------------
+@c FLA
+@c -------------------------------------------------------------------------------
 @page
 @node fla
 …
 @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
 @item @strong{Authors:}                 @tab Tomasz Obrębski
+@item @strong{Component category:}      @tab filter
+@item @strong{Input format:}            @tab UTT regular
+@item @strong{Output format:}           @tab UTT flattened
+@item @strong{Required annotation:}     @tab sen
 @end multitable
 @c
+@menu
+* fla description::
+@c * fla command line options::
+@c * fla usage example::
+@end menu
+@node fla description
+@subsection Description
 @command{fla} ``flattens'' a utt file by merging segments belonging
 …
 segment contains a fragment matching the @code{<bosregex>}). By
 default, segments containing a field @code{BOS} are sought.
+@c @menu
+@c * con command line options::
+@c * con usage example::
+@c * con hints::
+@c @end menu
+@c -------------------------------------------------------------------------------
+@c UNFLA
+@c -------------------------------------------------------------------------------
 @page
 …
 @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
 @item @strong{Authors:}                 @tab Tomasz Obrębski
+@item @strong{Component category:}      @tab filter
+@item @strong{Input format:}            @tab UTT flattened
+@item @strong{Output format:}           @tab UTT regular
+@item @strong{Required annotation:}     @tab -
 @end multitable
+@menu
+* unfla description::
+@c * fla command line options::
+@c * fla usage example::
+@end menu
+@node unfla description
+@subsection Description
 @command{unfla} transforms a flattened UTT file, produced by
 @command{fla}, into the regular format by restoring end-of-line
 …
 @example
 cat text | tok | lem --only-fail | cor -1 > output3
+cat text | tok | egrep ' W ' | lem | egrep -v 'lem:' | cor -1
 @end example
 …
 As @command{grp} (@command{grep}) processes data faster than it is
 read from the disk drive, the search time may be still shortened by
 using file compression techniques.  We suggest usin @command{lzop}.
+using file compression techniques.  We suggest using the
+@command{lzop} compressor/decompressor.
 @item the fastest way to search a large corpus
 step 1: preprocessing
+step 1: corpus preprocessing
 @example
 cat corpus | tok | sen | lem -1 \
 | grp -a p | lzop -7 > corpus.grp.lzo
+| fla | lzop -7 > corpus.grp.lzo
 @end example
 …
 @example
 lzop -cd corpus.grp.lzo | grp -a gP -e 'cat(<V>) space
+lzop -cd corpus.grp.lzo | unfla | grp -e 'cat(<V>) space
 lexeme(rozmowa)' | ser -e 'cat(<V>) space lexeme(rozmowa)' | con
 @end example
 …
 @end enumerate
 @subsubheading More complicated configurations
 @example
 mknod fifo1 p
 mknod fifo2 p
 mknod fifo3 p
 mknod fifo4 p
 mknod fifo5 p
 tok | lem -p W -e fifo1 > fifo2 &
 cor -e fifo3 < fifo1 | lem > fifo4 &
 gue < fifo3 > fifo5 &
 sort -m fifo2 fifo4 fifo5
 rm fifo?
 @end example
+@c @subsubheading More complicated configurations
+@c @example
+@c mknod fifo1 p
+@c mknod fifo2 p
+@c mknod fifo3 p
+@c mknod fifo4 p
+@c mknod fifo5 p
+@c tok | lem -p W -e fifo1 > fifo2 &
+@c cor -e fifo3 < fifo1 | lem > fifo4 &
+@c gue < fifo3 > fifo5 &
+@c sort -m fifo2 fifo4 fifo5
+@c rm fifo?
+@c @end example
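
The fla and unfla descriptions touched by this changeset can be illustrated with a small round-trip sketch. It is not the UTT implementation: the separator character SEP, the default BOS pattern, and the sample segments are assumptions made for illustration, and the real tools may differ in all three. The sketch starts a new output line at each segment matching the BOS pattern and joins the following segments of that sentence onto it.

```python
import re

# Assumption: the character that stands in for the removed line breaks.
# The real fla/unfla tools may use a different one.
SEP = "\v"


def flatten(lines, bos_regex=r"\bBOS\b"):
    """Merge segment lines into one output line per sentence.

    A new output line starts at every segment matching bos_regex.
    Segments seen before the first match stay together on the first line.
    """
    bos = re.compile(bos_regex)
    groups = []
    for line in lines:
        if bos.search(line) or not groups:
            groups.append([line])      # start a new sentence line
        else:
            groups[-1].append(line)    # continue the current sentence
    return [SEP.join(group) for group in groups]


def unflatten(flat_lines):
    """Restore one segment per line by splitting on the separator."""
    return [seg for line in flat_lines for seg in line.split(SEP)]


if __name__ == "__main__":
    # Hypothetical segments, invented for this sketch.
    sample = [
        "0000 00 BOS *",
        "0000 03 W Ala",
        "0003 00 EOS *",
        "0003 00 BOS *",
        "0003 02 W ma",
    ]
    flat = flatten(sample)
    print(len(flat), "flattened lines")
    print(unflatten(flat) == sample)
```

Because unflatten only splits on the separator, flatten followed by unflatten returns the original segment list. A line-oriented filter placed between the two steps therefore sees a whole sentence per line, which is the point of the fla and unfla stages in the pipelines in the diff.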

UAM Text Tools

Changeset e28a625 for app/doc/utt.texinfo
