bash for semantic file structure creation

Update 2010-11-02 7p: Shortened description; posted initial bash solution.


Description

I'd like to create a semantic file structure to better organize my data. I don't want to go a route like recoll, strigi, or beagle; I want no gui and full control. The closest might be oyepa or even closer, Tagsistant.

Here's the idea: one maintains a "regular" tree of their files. For example, mine are organized in project folders like this:

 ,---
 | ~/proj1
 | ---- ../proj1_file1[tag1-tag2].ext
 | ---- ../proj1_file2[tag3]_yyyy-mm-dd.ext
 | ~/proj2
 | ---- ../proj2_file3[tag2-tag4].ext
 | ---- ../proj2_file4[tag1].ext
 `---

proj1, proj2 are very short abbreviations I have for my projects.

Then what I want to do is recursively go through the directory and get the following:

  • proj ID
  • tags
  • extension

Together, these form a complete "tag list" for each file.
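As a rough sketch of that extraction step: the function below uses a bash regex to split a name like the ones above into project ID, tags, and extension. The name parse_tagged_name is made up for illustration, and the regex assumes the project ID is everything before the first underscore and the extension is whatever follows the last dot.

```bash
#!/bin/bash
# Hypothetical helper (not from the original script): print the tag list of
# a file named like "proj1_file1[tag1-tag2].ext", one entry per line:
# project ID first, then each bracketed tag, then the extension.
parse_tagged_name() {
    local name=${1##*/}                      # drop any leading directories
    # Assumed layout: <proj>_<anything>[<tag>-<tag>...]<anything>.<ext>
    local re='^([^_]+)_[^[]*\[([^]]+)\].*\.([^.]+)$'
    [[ $name =~ $re ]] || return 1           # not a tagged file
    local proj=${BASH_REMATCH[1]} ext=${BASH_REMATCH[3]} parts
    IFS=- read -r -a parts <<< "${BASH_REMATCH[2]}"   # split tags on "-"
    printf '%s\n' "$proj" "${parts[@]}" "$ext"
}
```

For example, parse_tagged_name 'proj1_file2[tag3]_2010-11-02.ext' prints proj1, tag3, and ext on separate lines; the dashes in the date are left alone because they sit outside the brackets.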

Then in a user-defined directory, a "semantic hierarchy" will be created based on these tags. This gets a bit long, so just take a look at the directory structure created for all files containing tag2 in the name:

,---
| ~/tag2
| --- ../proj1_file1[tag1-tag2].ext -> ~/proj1/proj1_file1[tag1-tag2].ext
| --- ../proj2_file3[tag2-tag4].ext -> ~/proj2/proj2_file3[tag2-tag4].ext
| --- ../tag1
| ------- ../proj1_file1[tag1-tag2].ext -> ~/proj1/proj1_file1[tag1-tag2].ext
| --- ../tag4
| ------- ../proj2_file3[tag2-tag4].ext -> ~/proj2/proj2_file3[tag2-tag4].ext
| --- ../proj1
| ------- ../proj1_file1[tag1-tag2].ext -> ~/proj1/proj1_file1[tag1-tag2].ext
| --- ../proj2
| ------- ../proj2_file3[tag2-tag4].ext -> ~/proj2/proj2_file3[tag2-tag4].ext
`---

In other words, directories are created for every combination of a file's tags, and each directory contains symlinks to the actual files having those tags. I have omitted the file type directories, but these would also exist. It looks really messy written out like this, but I think the effect would be very cool. One could then find a given file along a number of "tag bread crumbs."
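One way to get "all combinations" without hard-coding the number of tags is recursion: pick each tag in turn, make its directory, link the file there, then repeat with the remaining tags. The sketch below does that; link_all_paths is a hypothetical name, and it assumes the target is given as an absolute path so the symlinks resolve from any depth.

```bash
#!/bin/bash
# Hypothetical helper: link_all_paths DIR TARGET TAG...
# For every ordering of every non-empty subset of the tags, create the
# nested directory under DIR and put a symlink to TARGET inside it.
# TARGET is assumed to be an absolute path.
link_all_paths() {
    local dir=$1 target=$2
    shift 2                                  # "$@" now holds only the tags
    local i sub
    for ((i = 1; i <= $#; i++)); do
        sub=$dir/${!i}                       # ${!i} is the i-th tag
        mkdir -p "$sub"
        ln -sf "$target" "$sub/${target##*/}"
        # Recurse with the chosen tag removed from the list.
        link_all_paths "$sub" "$target" "${@:1:i-1}" "${@:i+1}"
    done
}
```

Keep in mind that the tree grows quickly: three tags give 15 directories per file and four give 64, so it may be worth capping the depth for heavily tagged files.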

My thoughts so far:

  • ls -R in a top directory to get all the file names
  • identify those files with a [ and ] in the filename (tagged files)
  • with what's left, enter a loop:
    • strip out the proj ID, tags, and extension
    • create all the necessary dirs based on the tags
    • create symlinks to the file in all of the dirs created
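For the first two steps of that plan, find is probably a safer starting point than parsing ls -R output, since it recurses on its own and can filter by name. A minimal sketch, with made-up function names, that lists only files having a [ followed later by a ] in the name:

```bash
#!/bin/bash
# Hypothetical helper: print every regular file under ROOT whose name
# contains "[" followed later by "]". Output is NUL-terminated so names
# with spaces or newlines come through intact.
list_tagged_files() {
    find "$1" -type f -name '*\[*\]*' -print0
}

# Example consumer: read the NUL-separated paths one at a time.
show_tagged() {
    local path
    while IFS= read -r -d '' path; do
        printf 'tagged: %s\n' "$path"
    done < <(list_tagged_files "$1")
}
```

The body of the loop in show_tagged is where the stripping, directory creation, and symlinking would go.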

First Solution! 2010-11-3 7p

Here's my current working code. It only works on files in the top level directory, does not figure out extension types yet, and only works on 2 tags + the project ID for a total of 3 tags per file. It is a hacked manual chug solution but maybe it would help someone see what I'm doing and how this could be muuuuch better:

#!/bin/bash

########################
#### User Variables ####
########################

## set top directory for the semantic filer
## example: ~/semantic
## result will be ~/semantic/tag1, ~/semantic/tag2, etc.
top_dir=~/Desktop/semantic

## set document extensions, space separated
## example: "doc odt txt"
doc_ext="doc odt txt"

## set presentation extensions, space separated
pres_ext="ppt odp pptx"

## set image extensions, space separated
img_ext="jpg png gif"

#### End User Variables ####


#####################
#### Begin Script####
#####################

cd "$top_dir" || exit 1

ls -1 | (while read -r fname;
do
   if [[ $fname == *\[* ]]
   then

     ## turn "proj_name[tag1-tag2].ext" into "proj tag1 tag2"
     tag_names=$( echo "$fname" | sed -e 's/-/ /g' -e 's/_.*\[/ /' -e 's/\].*$//' )

     num_tags=$(echo $tag_names | wc -w)

     ## split the space-separated tags into an array
     current_tags=( $tag_names )
     echo "${current_tags[0]}"
     echo "${current_tags[1]}"
     echo "${current_tags[2]}"

     case $num_tags in
       3)

       mkdir -p "./${current_tags[0]}/${current_tags[1]}/${current_tags[2]}"
       mkdir -p "./${current_tags[0]}/${current_tags[2]}/${current_tags[1]}"
       mkdir -p "./${current_tags[1]}/${current_tags[0]}/${current_tags[2]}"
       mkdir -p "./${current_tags[1]}/${current_tags[2]}/${current_tags[0]}"
       mkdir -p "./${current_tags[2]}/${current_tags[0]}/${current_tags[1]}"
       mkdir -p "./${current_tags[2]}/${current_tags[1]}/${current_tags[0]}"

       ## quote every path: the brackets in $fname are glob characters
       cd "$top_dir/${current_tags[0]}"
       echo "$PWD"
       ln -s "$top_dir/$fname"
       ln -s "$top_dir/$fname" "./${current_tags[1]}/$fname"
       ln -s "$top_dir/$fname" "./${current_tags[2]}/$fname"

       cd "$top_dir/${current_tags[1]}"
       echo "$PWD"
       ln -s "$top_dir/$fname"
       ln -s "$top_dir/$fname" "./${current_tags[0]}/$fname"
       ln -s "$top_dir/$fname" "./${current_tags[2]}/$fname"

       cd "$top_dir/${current_tags[2]}"
       echo "$PWD"
       ln -s "$top_dir/$fname"
       ln -s "$top_dir/$fname" "./${current_tags[0]}/$fname"
       ln -s "$top_dir/$fname" "./${current_tags[1]}/$fname"

       cd "$top_dir"
       ;;

       esac

   fi

done
)

It's actually pretty neat. If you want to try it, do this:

  • create a dir somewhere
  • use touch to create a bunch of files with the format above: proj_name[tag1-tag2].ext
  • define the top_dir variable
  • run the script
  • play around!

ToDo

  • make this work using an "ls -R" in order to get into sub-dirs in my actual tree
  • robustness check
  • consider switching languages; hey, I've always wanted to learn perl and/or python!

Still open to any suggestions you have. Thanks!


Hmm, big problem, too big to do on a short break...

But I can give you an example of one of the various ways you could structure the script...

#!/bin/sh

ls -1 / | (while read fname; do
    echo "$fname"
    test=hello
    # example transformation...
    test2=$(echo "$fname" | tr a-z A-Z)
    echo "$test2"
  done
  echo post-loop processing here, $test
  # then finally close the subshell with a right paren
)


Maybe something like this for each tag?

find . -type f|grep -Z "[[-]$tag[]-]"| \
xargs -0 -I %%% ln -s "../../%%%" "tagfolder/$tag/"

Note: The second line doesn't really work, don't know why.
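A likely reason: find prints newline-separated paths, and grep -Z only puts a NUL after file names that grep itself prints (as with -l), so xargs -0 ends up treating the whole stream as one item. Below is a sketch of a variant that keeps the stream NUL-separated end to end. It assumes GNU grep (for -z); link_tag and its arguments are made-up names, and ROOT should be an absolute path so the links resolve.

```bash
#!/bin/bash
# Hypothetical helper: link_tag TAG ROOT DEST
# Symlink every file under ROOT whose bracketed tag list contains TAG into
# DEST/TAG. ROOT is assumed to be absolute; needs GNU grep for -z.
link_tag() {
    local tag=$1 root=$2 dest=$3
    mkdir -p "$dest/$tag"
    # -print0 and -z keep the records NUL-separated for xargs -0.
    find "$root" -type f -print0 |
        grep -z -- "[[-]$tag[]-]" |
        xargs -0 -I{} ln -sf {} "$dest/$tag/"
}
```

Called as link_tag tag2 ~/proj1 ~/semantic, this would fill ~/semantic/tag2 with links to the matching files. The regex is matched against the whole path, so a directory name that happens to contain the tag between brackets or dashes would match too.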
