bash for semantic file structure creation
Update 2010-11-02 7p: Shortened description; posted initial bash solution.
Description
I'd like to create a semantic file structure to better organize my data. I don't want to go a route like recoll, strigi, or beagle; I want no gui and full control. The closest might be oyepa or even closer, Tagsistant.
Here's the idea: one maintains a "regular" tree of their files. For example, mine are organized in project folders like this:
,---
| ~/proj1
| ---- ../proj1_file1[tag1-tag2].ext
| ---- ../proj1_file2[tag3]_yyyy-mm-dd.ext
| ~/proj2
| ---- ../proj2_file3[tag2-tag4].ext
| ---- ../proj1_file4[tag1].ext
`---
proj1, proj2 are very short abbreviations I have for my projects.
Then what I want to do is recursively go through the directory and get the following:
- proj ID
- tags
- extension
Each of these will be form a complete "tag list" for each file.
Then in a user-defined directory, a "semantic hierarchy" will be created based on these tags. This gets a bit long, so just take a look at the directory structure created for all files containing tag2 in the name:
,---
| ~/tag2
| --- ../proj1_file1[tag1-tag2].ext -> ~/proj1/proj1_file1[tag1-tag2].ext
| --- ../proj2_file3[tag2-tag4].ext -> ~/proj2/proj2_file3[tag2-tag4].ext
| ---../tag1
| ------- ../proj1_file1[tag1-tag2].ext -> ~/proj1/proj1_file1[tag1-tag2].ext
| --- ../tag4
| ------- ../proj2_file3[tag2-tag4].ext -> ~/proj2/proj2_file3[tag2-tag4].ext
| --- ../proj1
| ------- ../proj1_file1[tag1-tag2].ext -> ~/proj1/proj1_file1[tag1-tag2].ext
| --- ../proj2
| ------- ../proj2_file3[tag2-tag4].ext -> ~/proj2/proj2_file3[tag2-tag4].ext
`---
In other words, directories are created with all combinations of a file's tags, and each contains a symlink to the actual files having those tags. I have omitted the file type directories, but these would also exist. It looks really messy in type, but I think the effect would be very cool. One could then fine a given file along a number of "tag bread crumbs."
My thoughts so far:
- ls -R in a top directory to get all the file names
- identify those files with a [ and ] in the filename (tagged files)
- with what's left, enter a loop:
- strip out the proj ID, tags, and extension
- create all the necessary dirs based on the tags
- create symlinks to the file in all of the dirs created
First Solution! 2010-11-3 7p
Here's my current working code. It only works on files in the top level directory, does not figure out extension types yet, and only works on 2 tags + the project ID for a total of 3 tags per file. It is a hacked manual chug solution but maybe it would help someone see what I'm doing and how this could be muuuuch better:
#!/bin/bash
########################
#### User Variables ####
########################
## set top directory for the semantic filer
## example: ~/semantic
## result will be ~/semantic/tag1, ~/semantic/tag2, etc.
top_dir=~/Desktop/semantic
## set document extensions, space separated
## example: "doc odt txt"
doc_ext="doc odt txt"
## se开发者_高级运维t presentation extensions, space separated
pres_ext="ppt odp pptx"
## set image extensions, space separated
img_ext="jpg png gif"
#### End User Variables ####
#####################
#### Begin Script####
#####################
cd $top_dir
ls -1 | (while read fname;
do
if [[ $fname == *[* ]]
then
tag_names=$( echo $fname | sed -e 's/-/ /g' -e 's/_.*\[/ /' -e 's/\].*$//' )
num_tags=$(echo $tag_names | wc -w)
current_tags=( `echo $tag_names | sed -e 's/ /\n/g'` )
echo ${current_tags[0]}
echo ${current_tags[1]}
echo ${current_tags[2]}
case $num_tags in
3)
mkdir -p ./${current_tags[0]}/${current_tags[1]}/${current_tags[2]}
mkdir -p ./${current_tags[0]}/${current_tags[2]}/${current_tags[1]}
mkdir -p ./${current_tags[1]}/${current_tags[0]}/${current_tags[2]}
mkdir -p ./${current_tags[1]}/${current_tags[2]}/${current_tags[0]}
mkdir -p ./${current_tags[2]}/${current_tags[0]}/${current_tags[1]}
mkdir -p ./${current_tags[2]}/${current_tags[1]}/${current_tags[0]}
cd $top_dir/${current_tags[0]}
echo $PWD
ln -s $top_dir/$fname
ln -s $top_dir/$fname ./${current_tags[1]}/$fname
ln -s $top_dir/$fname ./${current_tags[2]}/$fname
cd $top_dir/${current_tags[1]}
echo $PWD
ln -s $top_dir/$fname
ln -s $top_dir/$fname ./${current_tags[0]}/$fname
ln -s $top_dir/$fname ./${current_tags[2]}/$fname
cd $top_dir/${current_tags[2]}
echo $PWD
ln -s $top_dir/$fname
ln -s $top_dir/$fname ./${current_tags[0]}/$fname
ln -s $top_dir/$fname ./${current_tags[1]}/$fname
cd $top_dir
;;
esac
fi
done
)
It's actually pretty neat. If you want to try it, do this:
- create a dir somewhere
- use touch to create a bunch of files with the format above: proj_name[tag1-tag2].ext
- define the top_dir variable
- run the script
- play around!
ToDo
- make this work using an "ls -R" in order to get into sub-dirs in my actual tree
- robustness check
- consider switching languages; hey, I've always wanted to learn perl and/or python!
Still open to any suggestions you have. Thanks!
Hmm, big problem, too big to do on a short break...
But I can give you an example of one of the various ways you could structure the script...
#!/bin/sh
ls -1 / | (while read fname; do
echo "$fname"
test=hello
# example transformation...
test2=`echo $fname | tr a-z A-Z`
echo "$test2"
done
echo post-loop processing here, $test
# then finally close the subshell with a right paren
)
Maybe something like this for each tag?
find . -type f|grep -Z "[[-]$tag[]-]"| \
xargs -0 -I %%% ln -s "../../%%%" "tagfolder/$tag/"
Note: The second line doesn't really work, don't know why.
精彩评论