
Read a file and store arbitrary values into a variable in bash script

I'm new to bash scripting and I'm having a hard time figuring out this problem. I have about two hundred files that follow this pattern:

ANÁLISE DA GLOSA FUNGICIDA
A ANÁLISE RESULTA EM:
S='Glosa02626354' = "agente que destrói ou previne o crescimento de fungos"
    {antifúngico: O I]antifúngico clássico utilizado no tratamento não previne a disseminação típica da infecção.,
    agente antifúngico: Os resultados sugerem a utilização terapêutica do extrato do limão como I]agente antifúngico na Odontologia.,
    fungicida: A duração do ]fungicida no carpete tem garantia de cinco anos.,
    antimicótico: Os grupos nomearam o I]antimicótico e realizaram campanha de lançamento fictícia, com material técnico de divulgação e brindes.,
    agente antimicótico: Em caso de infecção, deverá ser instituído o uso de um I]agente antimicótico.}

Chave: FUNGICIDA <noun.artifact> 
ILI: 02626354
Sense 1
{02626354} <noun.artifact> antifungal, antifungal agent, fungicide, antimycotic, antimycotic agent -- (any agent that destroys or prevents the growth of fungi)
       => {13935705} <noun.substance> agent -- (a substance that exerts some force or effect)
           => {00005598} <noun.Tops> causal agent, cause, causal agency -- (any entity that causes events to happen)
               => {00001740} <noun.Tops> entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

In this case, I need to store the terms inside the braces (the text before each colon), namely ‘antifúngico’, ‘agente antifúngico’, ‘fungicida’, ‘antimicótico’ and ‘agente antimicótico’, in one variable. Those words will, of course, be different in every file. For comparison, here's another file:

ANÁLISE DA GLOSA VIA ÁPIA
A ANÁLISE RESULTA EM:
S='Glosa02634922' = "estrada da antiga Roma, na Itália, extendendo-se ao sul, de Roma a Brindisi; iniciada em 312 AC"
    {Via Ápia: Toda a I]Via Apia era conhecida quer pela sua extensão, quer pela sua extraordinária beleza.}

Chave: VIA ÁPIA <noun.artifact>
ILI: 02634922 
Sense 1
{02634922} <noun.artifact> Appian Way#1 -- (an ancient Roman road in Italy extending south from Rome to Brindisi; begun in 312 BC)
       => {03390668} <noun.artifact> highway#1, main road#1 -- (a major road for any form of motor transport)
           => {03941718} <noun.artifact> road#1, route#2 -- (an open way (generally public) for travel or transportation)
               => {04387207} <noun.artifact> way#6 -- (any artifact consisting of a road or path affording passage from one place to another; "he said he was looking for the way out")
                   => {00019244} <noun.Tops> artifact#1, artefact#1 -- (a man-made object taken as a whole)
                       => {00016236} <noun.Tops> object#1, physical object#1 -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects")
                           => {00001740} <noun.Tops> entity#1 -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))
                       => {00002645} <noun.Tops> whole#2, whole thing#1, unit#6 -- (an assemblage of parts that is regarded as a single entity; "how big is that part compared to the whole?"; "the team is a unit")
                           => {00016236} <noun.Tops> object#1, physical object#1 -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects")
                               => {00001740} <noun.Tops> entity#1 -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

Here, the variable will have just one value, the string ‘Via Ápia’.
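One way to do this is sketched below in bash with awk. It assumes every file has a single brace block like the two samples, with one ‘term: sentence’ entry per line and the block closed by ‘}’ at the end of its last entry. The function name extract_terms is hypothetical.

```bash
# extract_terms FILE
# Prints, one per line, the term before the first colon of each entry in
# the {...} block of FILE. Assumes (as in the two samples) that the block
# opens with "{" as the first non-blank character of a line, that each
# entry sits on its own line as "term: sentence", and that the last entry
# ends with "}".
extract_terms() {
    awk '
        # A line whose first non-blank character is "{" opens the block.
        /^[ \t]*[{]/ { inblock = 1 }
        inblock {
            line = $0
            sub(/^[ \t]*[{]?/, "", line)    # drop indentation and the "{"
            pos = index(line, ":")
            if (pos > 0)
                print substr(line, 1, pos - 1)
            # A line ending in "}" closes the block; skip the rest of the file.
            if ($0 ~ /[}][ \t]*$/)
                exit
        }
    ' "$1"
}
```

terms=$(extract_terms file_name.txt) keeps the terms in one newline-separated string. In bash 4 or newer, mapfile -t terms < <(extract_terms file_name.txt) stores them as an array instead, so ${terms[0]} is ‘antifúngico’ for the first sample. awk is used here mainly because it can stop at the closing brace, so the WordNet lines further down are never examined.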


Update: I found a way to single out the lines that are relevant using some regular expression wizardry:

grep ':*\.,' file_name.txt

The output of this command for the first example is:

    {antifúngico: O I]antifúngico clássico utilizado no tratamento não previne a disseminação típica da infecção.,
    agente antifúngico: Os resultados sugerem a utilização terapêutica do extrato do limão como I]agente antifúngico na Odontologia.,
    fungicida: A duração do ]fungicida no carpete tem garantia de cinco anos.,
    antimicótico: Os grupos nomearam o I]antimicótico e realizaram campanha de lançamento fictícia, com material técnico de divulgação e brindes.,
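In ':*\.,' the ':*' means "zero or more colons", so it is always satisfied and the pattern effectively only looks for '.,'. That leaves out any entry closed by '.}', which is the last entry of every block and the only entry in the Via Ápia file. The sketch below shows a pattern that accepts either ending, assuming every entry line ends in '.,' or '.}' as in the two samples. The function name select_entry_lines is hypothetical.

```bash
# select_entry_lines FILE
# Prints the lines of FILE that end in ".," or ".}" (ignoring trailing
# blanks), which in the sample files are exactly the "term: sentence" lines.
select_entry_lines() {
    grep -E '\.[,}][[:space:]]*$' "$1"
}
```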


If you just want to assign the result of your regex match to a variable in bash, then this should do it:

myVar=$(grep ':*\.,' file_name.txt)

EDIT:

This may get you a bit closer:

myVar=$(grep ':*\.,' file_name.txt | ./x.pl)

Where x.pl is:

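#!/usr/bin/perl
use strict;
use warnings;

while (<STDIN>) {
    # Break the line into comma-separated chunks.
    my @x = split /,/;

    foreach (@x) {
        # Print the text before the first colon, skipping leading blanks and
        # an optional "{". (\s* rather than \W*: on undecoded input Perl
        # treats the bytes of letters such as "Á" as non-word characters.)
        print "$1\n" if /^\s*\{?(.*?):/;
    }
}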

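That will extract the four words matched by your grep, separated by newlines. The fifth term, ‘agente antimicótico’, never reaches the script: its line ends in '.}' rather than '.,', so it does not match ':*\.,'. I'm still not entirely clear whether that's what you want, though.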
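The Perl step can also be written as a single sed substitution, which avoids a separate script. The sketch below assumes one ‘term: sentence’ entry per input line, as in the grep output above. The function name terms_from_entries is hypothetical.

```bash
# terms_from_entries
# Reads entry lines on stdin and prints the text before the first colon of
# each one, after stripping leading blanks and an optional opening "{".
# Lines without a colon print nothing: -n suppresses default output and
# only the p flag of a successful substitution prints.
terms_from_entries() {
    sed -n 's/^[[:space:]]*{\{0,1\}\([^:{[:space:]][^:]*\):.*/\1/p'
}
```

It fits into the same pipeline: myVar=$(grep ':*\.,' file_name.txt | terms_from_entries).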


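If you have GNU grep, you may have good luck with grep -Po '^\s+\{?\K[^:{}]+(?=:)' file_name.txt, which prints the text before the first colon on each indented entry line. (A lookbehind such as (?<={) finds only the first term, because only that one directly follows the opening brace.)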
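To cover all of the roughly two hundred files, a loop along the following lines could print each file name next to each of its terms. This is a sketch with several assumptions: GNU grep (for -P and \K), files ending in .txt, and entries being the only indented lines that contain a colon before any brace. The name collect_terms and the tab-separated output are hypothetical choices.

```bash
# collect_terms DIR
# For every *.txt file in DIR, prints "file<TAB>term" for each entry term,
# using GNU grep's PCRE mode (-P) to take the text before the first colon
# on each indented line; \K drops the indentation and "{" from the match.
collect_terms() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal
        grep -Po '^\s+\{?\K[^:{}]+(?=:)' "$f" | while IFS= read -r term; do
            printf '%s\t%s\n' "$f" "$term"
        done
    done
}
```

For example, collect_terms . lists the terms of every .txt file in the current directory.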
