Changeset 2d89d4b for app


Timestamp:
12/16/08 23:40:47 (16 years ago)
Author:
walas <walas@…>
Branches:
master, help
Children:
2969c84
Parents:
9ace5d2
git-author:
walas <walas@…> (12/16/08 23:40:47)
git-committer:
walas <walas@…> (12/16/08 23:40:47)
Message:

mar documentation + minor mar fixes (version)

git-svn-id: svn://atos.wmid.amu.edu.pl/utt@64 e293616e-ec6a-49c2-aa92-f4a8b91c5d16

Location:
app
Files:
2 edited

  • app/doc/utt.texinfo

    r9ace5d2 r2d89d4b  
    17501750@end multitable 
    17511751 
    1752 [TODO] 
    1753  
    1754 (see mar's help 'mar -h' for some information) 
     1752@subsection Description 
     1753@code{mar} is a Perl script that matches a given pattern against utt-formatted text 
     1754and marks the matching parts with any number of user-defined tags. 
     1755 
     1756@subsection Command line options 
     1757@table @code 
     1758@parhelp 
     1759@parversion 
     1760 
     1761@item @b{@minus{}@minus{}pattern=@var{pattern}, @minus{}e @var{pattern}} 
     1762The search pattern. 
     1763@item @b{@minus{}@minus{}action=@var{action}, @minus{}a @var{action} [p] [s] [P]} 
     1764Perform only indicated actions. Where: 
     1765@multitable {aaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} 
     1766@item @code{p}   @tab preprocess 
     1767@item @code{s}   @tab search 
     1768@item @code{P}   @tab postprocess 
     1769@end multitable 
     1770default: psP 
     1771 
     1772@item @b{@minus{}@minus{}command} 
     1773print generated sed command, then exit 
     1774 
     1775@item @b{@minus{}@minus{}help, @minus{}h} 
     1776print help, then exit 
     1777 
     1778@item @b{@minus{}@minus{}version, @minus{}v} 
     1779print version, then exit 
     1780@end table 
     1781@subsection Tokens in pattern 
     1782A @code{mar} pattern is based on @code{ser} patterns (@pxref{ser pattern}). It is a @code{ser} pattern 
     1783to which you can add any number of matching tags; each tag is printed in exactly the place where 
     1784it was placed in the pattern. A valid token starts with @@ followed by any number of alphanumeric 
     1785characters. For example, valid match tokens are: @@STARTMATCH @@ENDMATCH 
     1786 
     1787Matching tokens can be placed between, before, or after any @code{ser} pattern terms. They don't have 
     1788to be paired. There can be any number of them in the pattern (zero or more). They don't have to be unique. 
     1789They can be placed one after another. For example: 
     1790 
     1791@multitable {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaa} 
     1792@item @code{@@BOM lexeme(pomoc)}  @tab place tag @b{BOM} before any form of the lexeme 'pomoc' 
     1793@item @code{@@MATCH lexeme(pomoc) @@MATCH}      @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc' 
     1794@item @code{cat(<ADJ>) @@MATCH lexeme(pomoc) @@MATCH}      @tab place tag @b{MATCH} before and after any form of the lexeme 'pomoc' which is preceded by an adjective 
     1795@item @code{cat(<ADJ>) @@TAG @@BOM lexeme(pomoc) @@EOM}      @tab place tags @b{TAG} and @b{BOM} before any form of the lexeme 'pomoc' which is preceded by an adjective, and tag @b{EOM} after it 
     1796@end multitable 
     1797 
     1798(see mar's help 'mar -h' for some more information) 
     1799 
     1800@subsection How mar works 
     1801@code{mar} uses the @code{m4} macro processor to translate the given @code{ser} pattern into a regular expression. It then turns that expression into a @code{sed} command script, which it executes. 
     1802 
     1803You can see the generated sed script by using the @code{@minus{}@minus{}command} option. 
     1804@subsection Limitations 
     1805The complexity of the computations performed by @code{mar} increases linearly with the number of placed tokens, so it is highly recommended not to place too many tokens. 
     1806@subsection Requirements 
     1807In order to run @code{mar}, the following programs must be installed in the system: 
     1808 
     1809@itemize 
     1810 
     1811@item @command{m4} 
     1812@item @command{grep} 
     1813@item @command{sed} 
     1814 
     1815@end itemize 
     1816 
     1817 
    17551818 
    17561819@c --------------------------------------------------------------------- 
    17571820@c KOT 
    17581821@c --------------------------------------------------------------------- 
    1759  
    17601822 
    17611823@page 
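
The token rule documented in the "Tokens in pattern" subsection above can be illustrated with a small shell sketch. It is not part of `mar` and says nothing about how `mar` itself parses patterns. The function name `list_tokens` is hypothetical, and the sketch assumes that a token is a single `@` (written `@@` only in the Texinfo source) followed by at least one alphanumeric character. It relies on `grep -o`, which GNU and BSD grep support but POSIX does not require.

```shell
# Hypothetical helper, not part of mar: print each tag token found in a pattern.
# Assumption: a token is "@" followed by one or more alphanumeric characters.
list_tokens() {
    # -o prints only the matched parts, one per line; -E enables extended regexes.
    printf '%s\n' "$1" | grep -oE '@[[:alnum:]]+'
}

# Tokens do not have to be unique, so the same tag is listed twice here.
list_tokens '@MATCH lexeme(pomoc) @MATCH'
# prints:
# @MATCH
# @MATCH
```

Because tokens may be repeated or placed one after another, the helper reports every occurrence in order instead of a de-duplicated set.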
  • app/src/mar/mar

    radb4c8d r2d89d4b  
    1010#which is one of the parameters of the script 
    1111#contact: d287572@atos.wmid.amu.edu.pl, walasiek@gmail.com 
     12 
     13my $version = '1.0'; 
    1214 
    1315use lib "/usr/local/lib/utt"; 
     
    3739my $morfield='lem'; 
    3840my $tags=0; 
     41my $show_version = 0; 
    3942 
    4043#read configuration files########################### 
     
    9093           "action=s" => \$action, 
    9194           "help|h" => \$help, 
    92            "space|s" => \$explicit_space 
     95           "space|s" => \$explicit_space, 
     96       "version|v" => \$show_version, 
    9397   ); 
    9498 
    9599 
    96100 
     101if($show_version){ 
     102    print "Version: $version\n"; 
     103    exit 0; 
     104} 
    97105 
    98106if($help) 
     
    103111Options: 
    104112   --pattern -e PATTERN         Pattern. 
    105    --bos -E PATTERN             Segment serving as sentence beginning marker. [TODO] 
     113   --eos -E PATTERN             Segment serving as sentence beginning marker. [TODO] 
    106114   --macros=FILE                Read macrodefinitions from FILE. [TODO] 
    107115   --define=FILE                Add macrodefinitions from FILE. [TODO] 
     
    110118                                    s - search 
    111119                                    P - postprocess 
    112                                 (default pgP) 
     120                                (default psP) 
    113121   --command                    Print generated shell command and exit. 
    114122   --help -h                    Print help. 
     123   --version -v         Script version 
    115124 
    116125In pattern you can put any tag. Tags should begin with the @ character.