How to optimize a bash script? (Find files, ignore those on whitelist, report rest)
I wrote this script to find all files/directories to which $WWWUSER has write permissions. At first I stored the remaining, matching items in a temporary file. I knew there must be a way to do it without using files, so this is my "solution". It works, but it's pretty slow. Any tips?
Update: On a directory structure containing about 7k directories and 30k files (~8k whitelistings) the script takes about 15 minutes... (ext3 filesystem, UW320 SCSI harddisk).
#!/usr/bin/env bash
# Checks the webroot for files owned by www daemon and
# writable at the same time. This is only needed by some files
# So we'll check with a whitelist
WWWROOT=/var/www
WWWUSER=www-data
WHITELIST=(/wp-content/uploads
/wp-content/cache
/sitemap.xml
)
OLDIFS=$IFS
IFS=$'\n'
LIST=($(find $WWWROOT -perm /u+w -user $WWWUSER -o -perm /g+w -group $WWWUSER))
IFS=$OLDIFS
arraycount=-1
whitelist_matches=0
for matchedentry in "${LIST[@]}"; do
arraycount=$(($arraycount+1))
for whitelistedentry in "${WHITELIST[@]}"; do
if [ $(echo $matchedentry | grep -c "$whitelistedentry") -gt 0 ]; then
unset LIST[$arraycount]
whitelist_matches=$(($whitelist_matches+1))
fi
done
LISTCOUNT=${#LIST[@]}
done
if [ $(echo $LISTCOUNT) -gt 0 ]; then
for item in "${LIST[@]}"; do
echo -e "$item\r"
done
echo "$LISTCOUNT items are writable by '$WWWUSER' ($whitelist_matches whitelisted)."
else
echo "No writable items found ($whitelist_matches whitelisted)."
fi
(I don't have a setup handy to test this on, but it should work...)
#!/usr/bin/env bash
# Checks the webroot for files owned by www daemon and
# writable at the same time. This is only needed by some files
# So we'll check with a whitelist
WWWROOT=/var/www
WWWUSER=www-data
WHITELIST="(/wp-content/uploads|/wp-content/cache|/sitemap.xml)"
listcount=0
whitelist_matches=0
while IFS="" read -r matchedentry; do
if [[ "$matchedentry" =~ $WHITELIST ]]; then
((whitelist_matches++))
else
echo -e "$matchedentry\r"
((listcount++))
fi
done < <(find "$WWWROOT" -perm /u+w -user $WWWUSER -o -perm /g+w -group $WWWUSER)
if (( $listcount > 0 )); then
echo "$listcount items are writable by '$WWWUSER' ($whitelist_matches whitelisted)."
else
echo "No writable items found ($whitelist_matches whitelisted)."
fi
Edit: I've incorporated Dennis Williamson's suggestions on the math; also, here's a way to build the WHITELIST pattern starting from an array:
WHITELIST_ARRAY=(/wp-content/uploads
/wp-content/cache
/sitemap.xml
)
WHITELIST=""
for entry in "${WHITELIST_ARRAY[@]}"; do
WHITELIST+="|$entry"
done
WHITELIST="(${WHITELIST#|})" # this removes the stray "|" from the front, and adds parens
Edit2: Sorpigal's comment about eliminating new processes got me thinking -- I suspect most of the speedup in this version comes from not running ~40 invocations of grep per scanned file, and just a little bit from removing the array manipulation, but it occurred to me that if you don't need the totals at the end, you could remove the main while loop and replace it with this:
find "$WWWROOT" -perm /u+w -user $WWWUSER -o -perm /g+w -group $WWWUSER | grep -Ev "$WHITELIST"
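...which does run grep, but only once (and runs the entire file list through that single instance), and once it's started grep will be able to scan the list of files faster than a bash loop. (The -E is needed so that grep treats the parentheses and | in $WHITELIST as alternation rather than literal characters.)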
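If you do still want the totals, one grep can cover that too. Here is a minimal, untested-on-a-real-webroot sketch: report_unlisted is a name I made up, it reads the path list on stdin, and it assumes no path contains a newline.

```shell
#!/usr/bin/env bash
# Sketch: filter a path list with a single grep process and still print totals.
# report_unlisted is a hypothetical helper; $1 is an extended-regex whitelist.
report_unlisted() {
    local pattern=$1 all kept total=0 shown=0
    all=$(cat)                                    # read the whole list once
    if [[ -n $all ]]; then
        total=$(grep -c '' <<<"$all")             # count all lines
        kept=$(grep -Ev -- "$pattern" <<<"$all")  # drop whitelisted lines
    fi
    if [[ -n $kept ]]; then
        shown=$(grep -c '' <<<"$kept")
        printf '%s\n' "$kept"
    fi
    echo "$shown items listed ($((total - shown)) whitelisted)."
}

# Usage (assumes WWWROOT, WWWUSER and WHITELIST are set as in the script above):
# find "$WWWROOT" -perm /u+w -user "$WWWUSER" -o -perm /g+w -group "$WWWUSER" \
#     | report_unlisted "$WHITELIST"
```

This trades the per-file bash loop for a few grep calls in total, at the cost of holding the whole list in memory.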
There is another possibility. If you change the whitelist to a regex pattern, you can use bash's =~ regex operator (version 3 and up) to match each found path quickly against the whole list: [[ $word =~ $pattern ]], where $pattern could be "^(whitelistentry1|whitelistentry2|whitelistentry3|...)$".
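As a rough sketch of that idea (the whitelist entries are copied from the question, is_whitelisted is a hypothetical helper name, and I'm assuming the entries contain no regex metacharacters that need escaping):

```shell
#!/usr/bin/env bash
# Whitelist entries, written relative to the web root (taken from the question).
whitelist=(/wp-content/uploads /wp-content/cache /sitemap.xml)

# Join the entries with "|" and anchor the alternation at both ends.
# Assumption: no entry needs regex escaping (the "." in sitemap.xml matches
# any single character, which is harmless for this list).
pattern=$(IFS='|'; echo "${whitelist[*]}")
pattern="^(${pattern})$"

# Hypothetical helper: succeeds if $1 is exactly one of the whitelist entries.
is_whitelisted() {
    # The right-hand side stays unquoted so bash treats it as a regex.
    [[ $1 =~ $pattern ]]
}
```

Because the pattern is anchored with ^ and $, it only matches an entry exactly. The paths from find would need the $WWWROOT prefix stripped first (for example with "${path#"$WWWROOT"}"), and files below a whitelisted directory would not match unless the pattern is loosened.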