About
Filetracker is a module that provides shared storage for files, together with some extra metadata.
It was designed with the intent to be used along with a relational database in cases where large files need to be stored and accessed from multiple locations, but storing them as blobs in the database is not suitable.
Filetracker's base functionality includes caching of files downloaded from the remote master store.
The Filetracker API allows versioning of stored files, but implementing it is optional, and the default store classes do not provide it.
Files, names and versions
A file may contain arbitrary data. Each file has a name that looks like an absolute filesystem path: components are separated by slashes, and the name must start with a slash. Filetracker does not support folders explicitly; for now, a file is simply identified by a name that, by convention, looks like a filesystem path. Future versions may rely on this convention, so please follow it.
Many methods accept or return versioned names, which look like regular names with a version number appended, separated by @. For those methods, passing an unversioned name usually means “the latest version of that file”.
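As a rough illustration of the naming convention above, the sketch below splits and joins versioned names by hand. The helper names split_versioned and join_versioned are hypothetical, and the rule that the version is whatever follows the last @ is an assumption made for this example. The library's own helpers are filetracker.split_name and filetracker.versioned_name, whose exact behaviour may differ.

```python
def split_versioned(name):
    """Split '/path/file@version' into ('/path/file', 'version').

    Hypothetical helper: assumes the version is the text after the last '@'
    and returns None as the version when no '@' is present.
    """
    if not name.startswith('/'):
        raise ValueError('filetracker names must start with a slash: %r' % name)
    base, sep, version = name.rpartition('@')
    if not sep:
        return name, None
    return base, version


def join_versioned(unversioned_name, version):
    """Append a version to a name; a None version leaves the name unchanged."""
    if version is None:
        return unversioned_name
    return '%s@%s' % (unversioned_name, version)


print(split_versioned('/reports/2024/summary.pdf@17'))
print(split_versioned('/reports/2024/summary.pdf'))
print(join_versioned('/reports/2024/summary.pdf', 17))
```

In this sketch the version stays a string after splitting; whether the real helpers convert it to a number is not stated here.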
Configuration and usage
The only class you will probably need to know and use is Client.
class filetracker.Client(local_store='auto', remote_store='auto', lock_manager='auto', cache_dir=None, remote_url=None, locks_dir=None)
The main filetracker client class.
The client instance can be built in one of several ways. The easiest is to call the constructor without arguments. In this case the configuration is taken from the environment variables:
- FILETRACKER_DIR: the folder to use as the local cache; if not specified, ~/.filetracker-store is used.
- FILETRACKER_URL: the URL of the filetracker server; if not present, the constructed client is a stand-alone local client, which stores the files and metadata locally. Such a client can also be used safely by multiple processes on the same machine.
Another way to create a client is to pass these values as constructor arguments: remote_url and cache_dir.
If you are a power user, you may create the client by manually passing local_store and remote_store to the constructor (see Internal API Reference).
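The environment-based defaults described above can be mirrored in a few lines. The following is a minimal sketch, not the library's code: config_from_env is a hypothetical helper and the example URL is made up. It only shows how FILETRACKER_DIR and FILETRACKER_URL map onto the cache_dir and remote_url constructor arguments.

```python
import os


def config_from_env(environ=None):
    """Return the (cache_dir, remote_url) pair implied by the environment.

    Hypothetical helper, not part of filetracker: it only mirrors the
    documented defaults for FILETRACKER_DIR and FILETRACKER_URL.
    """
    if environ is None:
        environ = os.environ
    cache_dir = (environ.get('FILETRACKER_DIR')
                 or os.path.expanduser('~/.filetracker-store'))
    # No URL means a stand-alone local client.
    remote_url = environ.get('FILETRACKER_URL') or None
    return cache_dir, remote_url


cache_dir, remote_url = config_from_env(
    {'FILETRACKER_URL': 'http://filetracker.example:9999'})
print(cache_dir, remote_url)

# With the library installed, the explicit form documented above would be:
#   client = filetracker.Client(cache_dir=cache_dir, remote_url=remote_url)
```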
Client.delete_file(name)
Deletes the file identified by name along with its metadata.
The file is removed from both the local store and the remote store.
Client.file_size(name, force_refresh=False)
Returns the size of the file.
For efficiency this operation does not use locking, so it may return inconsistent data. Use it for informational purposes only.
Client.file_version(name)
Returns the newest available version number of the file.
If the remote store is configured, it is queried; otherwise the local version is returned. It is assumed that the remote store always has the newest version of the file.
If a version is part of name, it is ignored.
Client.get_file(name, save_to, add_to_cache=True, force_refresh=False, _lock_exclusive=False)
Retrieves the file identified by name.
The file is saved as save_to. If add_to_cache is True, the file is added to the local store. If force_refresh is True, the local cache is not examined if a remote store is configured.
If a remote store is configured but name does not contain a version, the local data store is not used, as we cannot guarantee that the version there is fresh.
The local data store implemented in LocalDataStore tries not to copy the entire file to save_to if possible, and uses hardlinking instead. Therefore you should not modify the retrieved file in place, as the change would also affect the cached copy.
This method returns the full versioned name of the retrieved file.
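The rules for when a cached copy may be used can be restated as a small predicate. This is a hypothetical sketch of the documented behaviour rather than the client's actual code; may_use_local_copy is an invented name, and "contains a version" is approximated here by the presence of @ in the name.

```python
def may_use_local_copy(name, remote_configured, force_refresh=False):
    """Return True if a locally cached copy may satisfy the request.

    Hypothetical restatement of the documented rules, not the library's code.
    """
    if not remote_configured:
        # A stand-alone local client has nothing else to consult.
        return True
    if force_refresh:
        # The caller asked to skip the local cache.
        return False
    # Without an explicit version the cached copy cannot be proven fresh.
    return '@' in name


print(may_use_local_copy('/data/in.txt@5', remote_configured=True))
print(may_use_local_copy('/data/in.txt', remote_configured=True))
```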
Client.get_stream(name, force_refresh=False)
Retrieves the file identified by name in streaming mode.
Works like get_file(), except that it returns a tuple (file-like object, versioned name).
Does not support adding to cache, although the file will be served locally if a full version is specified and exists in the cache.
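A caller typically unpacks the returned tuple and closes the stream when done. The sketch below uses StubClient, a hypothetical stand-in defined only so that the example runs without a filetracker installation; only the get_stream() call shape (a name in, a pair of file-like object and versioned name out) is taken from the description above.

```python
import io


class StubClient(object):
    """Stand-in for a filetracker client, so the example runs on its own."""

    def __init__(self, files):
        self._files = files  # maps versioned name -> bytes

    def get_stream(self, name, force_refresh=False):
        return io.BytesIO(self._files[name]), name


def read_bytes(client, name):
    """Read a whole file through get_stream(); return (data, versioned_name)."""
    stream, versioned_name = client.get_stream(name)
    try:
        return stream.read(), versioned_name
    finally:
        stream.close()


client = StubClient({'/data/input.txt@3': b'hello\n'})
data, versioned = read_bytes(client, '/data/input.txt@3')
print(versioned, len(data))
```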
Client.put_file(name, filename, to_local_store=True, to_remote_store=True)
Adds the file filename to the filetracker under the name name.
If the file already exists, a new version is created. In practice, if the store does not support versioning, the file is overwritten.
The file may be added to the local store only (if to_remote_store is False), to the remote store only (if to_local_store is False), or to both. If only one store is configured, the values of to_local_store and to_remote_store are ignored.
The local data store implemented in LocalDataStore tries not to copy the data directly to the final cache destination, and uses hardlinking instead. Therefore you should not modify the file in place later, as this would also change the stored copy.
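The hardlinking caveat is easy to reproduce with the standard library alone. The following sketch does not use filetracker at all; the file names are made up, and it assumes a filesystem that supports hard links. It shows that an in-place write through a hard link changes the other name too, while a real copy stays independent.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
cached = os.path.join(workdir, 'cached_copy')
linked = os.path.join(workdir, 'hardlinked')
copied = os.path.join(workdir, 'private_copy')

with open(cached, 'w') as f:
    f.write('original')

os.link(cached, linked)          # second name for the same inode
shutil.copyfile(cached, copied)  # independent data

# An in-place write through the hard link also changes the "cached" file.
with open(linked, 'w') as f:
    f.write('modified')

with open(cached) as f:
    cached_after = f.read()
with open(copied) as f:
    copy_after = f.read()

print(cached_after)
print(copy_after)
shutil.rmtree(workdir)
```

After the write, the file standing in for the cache reads "modified" while the private copy still reads "original". If a retrieved or stored file has to be edited, working on a copy made with something like shutil.copyfile avoids touching the store.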
If you write tests, you may also be interested in filetracker.dummy.DummyClient.
Filetracker server
At some point you will probably want to run a filetracker server, so that more than one machine can share the store. Just run:
$ filetracker-server --help
This script can be used to start the metadata and file servers with minimal effort.
Using filetracker from the shell
No programmer can live without a way to fiddle with filetracker from the shell:
$ filetracker --help
Internal API Reference
filetracker.split_name(name)
Splits a (possibly versioned) name into unversioned name and version.
Returns a tuple (unversioned_name, version), where version may be None.
filetracker.versioned_name(unversioned_name, version)
Joins an unversioned name with the specified version.
Returns a versioned path.
class filetracker.DataStore
An abstract base class giving access to storing and retrieving files’ content.
DataStore.add_file(name, filename)
Saves the actual file in the store.
Works like add_stream(), but filename is the name of an existing file in the filesystem.
DataStore.add_stream(name, file)
Saves the passed stream in the store.
file may be any file-like object, which will be saved under the name name. If name contains a version, the file is saved with this particular version. If the version already exists, this method silently succeeds without checking whether the content of the stream matches the already saved data.
Returns the version of the newly added file.
DataStore.delete_file(name)
Deletes the file under the name name and the metadata corresponding to it.
If name contains a version, the file is deleted only if this is the latest version of the file.
DataStore.exists(name)
Returns True if the file exists, False otherwise.
If name contains a version, the existence of this particular version is checked.
DataStore.file_size(name)
Returns the size of the file.
Raises an (unspecified) exception if the file is not found.
DataStore.file_version(name)
Returns the most recent version of the file.
If name has a version number, it is ignored.
Raises an (unspecified) exception if the file is not found.
DataStore.get_file(name, filename)
Saves the content of the file named name to filename.
Works like get_stream(), but filename is the name of a file which will be created (or overwritten).
Returns the full versioned name of the retrieved file.
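To make the versioning rules of the DataStore interface concrete, here is a toy in-memory store. It is only a sketch under stated assumptions: InMemoryStore is a hypothetical class that does not subclass the real DataStore, versions are assumed to be integers that grow with every unversioned add, and only the behaviours described above (silent success on an existing version, version ignored by file_version, delete only for the latest version) are modelled.

```python
import io


class InMemoryStore(object):
    """Toy store illustrating the DataStore semantics described above.

    Assumption: versions are integers that grow with every unversioned add.
    """

    def __init__(self):
        self._files = {}  # unversioned name -> {version: bytes}

    @staticmethod
    def _split(name):
        base, sep, version = name.rpartition('@')
        return (base, int(version)) if sep else (name, None)

    def add_stream(self, name, file):
        base, version = self._split(name)
        versions = self._files.setdefault(base, {})
        if version is None:
            version = max(versions, default=0) + 1
        if version not in versions:
            versions[version] = file.read()
        # An existing version is left untouched, without comparing content.
        return version

    def exists(self, name):
        base, version = self._split(name)
        versions = self._files.get(base, {})
        return bool(versions) if version is None else version in versions

    def file_version(self, name):
        base, _ = self._split(name)
        return max(self._files[base])  # KeyError if the file is unknown

    def delete_file(self, name):
        base, version = self._split(name)
        if base not in self._files:
            return
        # A versioned delete only takes effect for the latest version.
        if version is None or version == self.file_version(base):
            del self._files[base]


store = InMemoryStore()
v1 = store.add_stream('/a.txt', io.BytesIO(b'one'))
v2 = store.add_stream('/a.txt', io.BytesIO(b'two'))
same = store.add_stream('/a.txt@1', io.BytesIO(b'ignored'))
print(v1, v2, same, store.file_version('/a.txt@1'))
```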
class filetracker.LocalDataStore(dir)
Data store which uses the local filesystem.
The files are saved under <base_dir>/files, where base_dir can be passed to the constructor.
class filetracker.RemoteDataStore(base_url)
Data store which uses a remote HTTP server.
The server must support PUT requests which automatically create non-existent directories.
The server must return the Last-Modified header and must accept it in PUT and DELETE requests.
The files are saved under <base_url>/files, where base_url can be passed to the constructor.
class filetracker.LockManager
An abstract class representing a lock manager.
A lock manager is basically a factory of FileLock instances.
class filetracker.FcntlLockManager(dir)
A LockManager using fcntl.flock.
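For readers unfamiliar with fcntl.flock, the sketch below shows the underlying primitive on its own. It is not FcntlLockManager's implementation: try_lock and unlock are hypothetical helpers, the lock file is a temporary file created for the example, and the code is Unix-only and assumes a local filesystem.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive flock on path without blocking.

    Returns the open file object on success (keep it open to hold the lock)
    or None if another open file description already holds the lock.
    """
    f = open(path, 'a')
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


def unlock(f):
    """Release the lock and close the file that was holding it."""
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    f.close()


fd, lock_path = tempfile.mkstemp()
os.close(fd)

first = try_lock(lock_path)
second = try_lock(lock_path)   # contends with `first`
unlock(first)
third = try_lock(lock_path)    # the lock is free again
unlock(third)
os.remove(lock_path)
print(first is not None, second is None, third is not None)
```

A blocking variant would drop fcntl.LOCK_NB, in which case flock waits until the lock becomes available.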
class filetracker.NoOpLockManager
A no-op LockManager.
It may be used when no local store is configured, as we probably do not need concurrency control.
To-dos and ideas
- access control
- cache pruning
- support for “directories”: especially ls
- fuse client
- rm