How can I make file I/O more transactional?
I'm writing CGI scripts in Haskell. When the user hits ‘submit’, a Haskell program runs on the server, updating (i.e. reading in, processing, overwriting) a status file. Reading then overwriting sometimes causes issues with lazy IO, as we may be able to generate a large output prefix before we've finished reading the input. Worse, users sometimes bounce on the submit button and two instances of the process run concurrently, fighting over the same file!
What's a good way to implement
transactionalUpdate :: FilePath -> (String -> String) -> IO ()
where the function (‘update’) computes the new file contents from the old file contents? It is not safe to presume that ‘update’ is strict, but it may be presumed that it is total (robustness to partial update functions is a bonus). Transactions may be attempted concurrently, but no transaction should be able to update if the file has been written by anyone else since it was read. It's ok for a transaction to abort in case of competition for file access. We may assume a source of systemwide-unique temporary filenames.
My current attempt writes to a temporary file, then uses a system copy command to overwrite. That seems to deal with the lazy IO problems, but it doesn't strike me as safe from races. Is there a tried and tested formula that we could just bottle?
The most idiomatic unixy way to do this is with flock:
- http://hackage.haskell.org/package/flock
- http://swoolley.org/man.cgi/2/flock
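As a rough illustration of the locking idea, here is a minimal sketch that uses the advisory file locks exposed by base's GHC.IO.Handle.Lock (available since base 4.10) rather than the flock package linked above. The name lockedUpdate and the ".lock" and ".tmp" suffixes are assumptions made up for this example. It takes a non-blocking exclusive lock on a side file, reads the old contents strictly, writes the new contents to a temporary file, and renames it over the original.

```haskell
import Control.Exception (bracket, evaluate)
import GHC.IO.Handle.Lock (LockMode (ExclusiveLock), hTryLock)
import System.Directory (renameFile)
import System.IO

-- Hypothetical helper: returns False (abort) if another process holds the lock.
lockedUpdate :: FilePath -> (String -> String) -> IO Bool
lockedUpdate file upd =
  -- The lock lives on a side file; closing the handle releases the lock.
  bracket (openFile (file ++ ".lock") AppendMode) hClose $ \lockH -> do
    ok <- hTryLock lockH ExclusiveLock      -- non-blocking attempt
    if not ok
      then return False
      else do
        old <- withFile file ReadMode $ \h -> do
          s <- hGetContents h
          _ <- evaluate (length s)          -- force the whole read before closing
          return s
        let tmp = file ++ ".tmp"            -- assumed temporary name
        writeFile tmp (upd old)             -- if upd fails, the original is untouched
        renameFile tmp file                 -- replaces the old file in one step
        return True

main :: IO ()
main = do
  writeFile "status.txt" "old\n"            -- hypothetical status file
  ok <- lockedUpdate "status.txt" ("new\n" ++)
  putStrLn (if ok then "updated" else "aborted: lock held elsewhere")
```

The lock is advisory, so it only protects against other processes that take the same lock. Because the kernel drops the lock when the process exits, a crashed instance should not leave the file locked.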
Here is a rough first cut that relies on the atomicity of the underlying mkdir. It seems to fulfill the specification, but I'm not sure how robust or fast it is:
import Control.DeepSeq
import Control.Exception
import System.Directory
import System.IO

transactionalUpdate :: FilePath -> (String -> String) -> IO ()
transactionalUpdate file upd = bracket acquire release update
  where
    -- createDirectory throws if the lock directory already exists,
    -- so only one process at a time gets past acquire
    acquire = do
        let lockName = file ++ ".lock"
        createDirectory lockName
        return lockName
    release = removeDirectory
    update _ = nonTransactionalUpdate file upd

nonTransactionalUpdate :: FilePath -> (String -> String) -> IO ()
nonTransactionalUpdate file upd = do
    h <- openFile file ReadMode
    s <- upd `fmap` hGetContents h
    -- force the whole result before closing the read handle
    s `deepseq` hClose h
    h <- openFile file WriteMode
    hPutStr h s
    hClose h
I tested this by adding the following main and throwing a threadDelay in the middle of nonTransactionalUpdate:
-- also needs: import System.Environment (getArgs)
main = do
    [n] <- getArgs
    transactionalUpdate "foo.txt" ((show n ++ "\n") ++)
    putStrLn $ "successfully updated " ++ show n
Then I compiled and ran a bunch of instances with this script:
#!/bin/bash
rm foo.txt
touch foo.txt
for i in {1..50}
do
    ./SO $i &
done
A process printed a successful update message if and only if the corresponding number was in foo.txt; all the others printed the expected SO: foo.txt.notveryunique: createDirectory: already exists (File exists).
Update: You actually do not want to use unique names here; it must be a consistent name across the competing processes. I've updated the code accordingly.
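Since the question allows a transaction to abort under competition, the failure could be reported as a value instead of an exception. The following is only a sketch of that variation on the mkdir approach; the name withDirLock and the lock path are assumptions for illustration, not part of the code above.

```haskell
import Control.Exception (finally, mask, tryJust)
import Control.Monad (guard)
import System.Directory (createDirectory, removeDirectory)
import System.IO.Error (isAlreadyExistsError)

-- Hypothetical helper: run the action while holding a directory lock.
-- Returns Nothing if the lock directory already exists (someone else holds it).
withDirLock :: FilePath -> IO a -> IO (Maybe a)
withDirLock lockDir action = mask $ \restore -> do
  -- Only "already exists" is treated as losing the race; other errors propagate.
  r <- tryJust (guard . isAlreadyExistsError) (createDirectory lockDir)
  case r of
    Left ()  -> return Nothing
    Right () -> Just <$> (restore action `finally` removeDirectory lockDir)

main :: IO ()
main = do
  r <- withDirLock "foo.txt.lock" (putStrLn "holding the lock")
  putStrLn (maybe "aborted: lock already held" (const "done") r)
```

One caveat with any directory-based lock: if a process is killed before it can remove the directory, the stale lock stays behind and has to be cleaned up by hand.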