print everything up to match in pattern
I have a data set that looks like the following:
movie (year) genre
For example:
some words (1934) action
My goal is to grab each "movie" field and then check a different file that also has a bunch of movies, deleting the lines from the second file that do not contain the movie. I have been trying to use awk to do this, but have only been able to match the year field. Is there a way that I can create a variable for the movie field? I feel like the easiest way to do this would be to match the year field and create a variable from everything that comes before it in each line. I have not been able to figure this out. Is there some way to do this that might be easier than my suggestion?
Assuming your dataset is in a file:
$ cat dataset
Terminator (19XX) action
The Ghostrider (2009) supernatural
$ awk -F"[()]" '{print $1}' dataset
Terminator
The Ghostrider
$ awk -F"[()]" '{print $1}' dataset > movie_names
$ grep -f movie_names secondfile
$ grep -f secondfile movie_names
Of course, you can do it with just awk as well
awk -F"[()]" 'FNR==NR { m[++d]=$1; next } { for(i=1;i<=d;i++){ if( $0 ~ m[i] ){ print; break } } }' dataset secondfile
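Here is a minimal, self-contained sketch of the same awk idea, in case you want to try it on sample data first. The contents of secondfile and the temporary directory are made up for illustration. It differs from the one-liner above in two ways that are my own assumptions about what you want: the separator also eats the spaces before the "(", so the stored title has no trailing space, and index() is used instead of ~ so that titles are compared as literal strings rather than regular expressions.

```shell
# Hypothetical sample data; the contents of secondfile are assumed for illustration.
tmp=$(mktemp -d)
printf '%s\n' 'Terminator (19XX) action' 'The Ghostrider (2009) supernatural' > "$tmp/dataset"
printf '%s\n' 'Terminator, 4 stars' 'Casablanca, 5 stars' 'The Ghostrider, 2 stars' > "$tmp/secondfile"

# First file: store each title (the text before " (") as an array key.
# Second file: print a line once if it contains any stored title as a literal string.
kept=$(awk -F' *[(]' '
    FNR == NR { titles[$1]; next }
    { for (t in titles) if (index($0, t)) { print; break } }
' "$tmp/dataset" "$tmp/secondfile")

printf '%s\n' "$kept"
rm -rf "$tmp"
```

With this sample data, the Casablanca line is dropped and the other two lines are kept in their original order.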
You can ask sed to remove the year field and everything that comes after it.
$ sed 's/([0-9]\+).*//' file
This will only return the name of the movie on each line. You can then pipe it into a while read loop.
If needed you can refine the regex so that it only matches on 4 digits (this one will match any number of digits between parens).
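As a rough sketch of how the four-digit refinement and the while read loop could fit together: the sample files below are hypothetical, and using grep -F inside the loop is just one way to do the lookup, not the only one. Note that with sed -n and the p flag, lines whose year is not exactly four digits (such as "19XX") are skipped entirely rather than passed through unchanged.

```shell
# Hypothetical sample data, assumed for illustration only.
tmp=$(mktemp -d)
printf '%s\n' 'Terminator (19XX) action' 'The Ghostrider (2009) supernatural' 'some words (1934) action' > "$tmp/file"
printf '%s\n' 'some words, 3 stars' 'Casablanca, 5 stars' 'The Ghostrider, 2 stars' > "$tmp/secondfile"

# Strip " (dddd)" and the rest of the line; -n plus the p flag prints
# only the lines where a four-digit year was actually found.
titles=$(sed -n 's/ *([0-9]\{4\}).*//p' "$tmp/file")

# For each title, print the lines of secondfile that contain it as a fixed string.
matches=$(printf '%s\n' "$titles" | while IFS= read -r title; do
    grep -F -- "$title" "$tmp/secondfile"
done)

printf '%s\n' "$matches"
rm -rf "$tmp"
```

The output follows the order of the titles, not the order of secondfile, and a line that contains two different titles would be printed twice. If that matters, the single grep -f or awk approaches avoid it.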