About
Filetracker is a module that provides shared storage for files, together with some extra metadata.
It was designed with the intent to be used along with a relational database in cases where large files need to be stored and accessed from multiple locations, but storing them as blobs in the database is not suitable.
Filetracker base supports caching of files downloaded from the remote master store.
Filetracker API allows versioning of the stored files, but its implementation is optional and not provided by default store classes.
Files, names and versions
A file may contain arbitrary data. Each file has a name, which looks like an absolute filesystem path: components are separated by slashes, and the first character of the name must be a slash. Filetracker does not support folders explicitly. For now, you may assume that a file in filetracker is identified by a name which, by convention, looks like a filesystem path. Future versions may rely on this convention, so please follow it.
Many methods accept or return versioned names, which look like regular
names with a version number appended, separated by @. For those methods,
passing an unversioned name usually means “the latest version of that file”.
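To make the naming convention concrete, here is a minimal sketch of splitting and joining versioned names. The helpers `split_versioned` and `join_versioned` are hypothetical names used for illustration only and are not part of filetracker. The sketch also assumes that versions are plain integers, which the text above suggests but does not guarantee.

```python
def split_versioned(name):
    """Split '/dir/file@3' into ('/dir/file', 3).

    Returns (name, None) for an unversioned name. Hypothetical helper,
    not the filetracker implementation.
    """
    if not name.startswith('/'):
        raise ValueError('a name must start with a slash: %r' % (name,))
    base, sep, version = name.rpartition('@')
    if not sep:
        # No '@' found, so the name carries no version.
        return name, None
    # Assumption: versions are integers.
    return base, int(version)


def join_versioned(name, version):
    """Append a version to an unversioned name, separated by '@'."""
    return '%s@%d' % (name, version)
```

With these helpers, `split_versioned('/tasks/a.txt')` yields a version of `None`, which a caller would interpret as "the latest version".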
Configuration and usage
The only class you will probably need to know and use is Client.
class filetracker.Client(local_store='auto', remote_store='auto', lock_manager='auto', cache_dir=None, remote_url=None, locks_dir=None)
The main filetracker client class.
The client instance can be built in one of several ways. The easiest is to call the constructor without arguments. In this case the configuration is taken from environment variables:
- FILETRACKER_DIR: the folder to use as the local cache; if not specified, ~/.filetracker-store is used.
- FILETRACKER_URL: the URL of the filetracker server; if not present, the constructed client is a stand-alone local client, which stores the files and metadata locally. Such a client can also be used safely by multiple processes on the same machine.
Another way to create a client is to pass these values as the constructor arguments remote_url and cache_dir.
If you are a power user, you may create the client by manually passing local_store and remote_store to the constructor (see Internal API Reference).
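The lookup of the two environment variables described above can be sketched as follows. `resolve_client_config` is a hypothetical helper written for this illustration and is not part of filetracker. It assumes that explicitly passed values take precedence over the environment, which the text implies but does not state.

```python
import os


def resolve_client_config(environ=None, cache_dir=None, remote_url=None):
    """Mirror the documented defaults (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if cache_dir is None:
        # FILETRACKER_DIR, falling back to ~/.filetracker-store.
        cache_dir = (environ.get('FILETRACKER_DIR')
                     or os.path.expanduser('~/.filetracker-store'))
    if remote_url is None:
        # No FILETRACKER_URL means a stand-alone local client.
        remote_url = environ.get('FILETRACKER_URL')
    return {
        'cache_dir': cache_dir,
        'remote_url': remote_url,
        'standalone': remote_url is None,
    }
```

Passing a plain dictionary as `environ` makes the lookup easy to try out without touching the real process environment.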
delete_file(name)
Deletes the file identified by name along with its metadata.
The file is removed from both the local store and the remote store.
file_size(name, force_refresh=False)
Returns the size of the file.
For efficiency, this operation does not use locking, so it may return inconsistent data. Use it for informational purposes only.
file_version(name)
Returns the newest available version number of the file.
If the remote store is configured, it is queried; otherwise the local version is returned. It is assumed that the remote store always has the newest version of the file.
If a version is part of name, it is ignored.
get_file(name, save_to, add_to_cache=True, force_refresh=False, _lock_exclusive=False)
Retrieves the file identified by name.
The file is saved as save_to. If add_to_cache is True, the file is added to the local store. If force_refresh is True, the local cache is not examined if a remote store is configured.
If a remote store is configured, but name does not contain a version, the local data store is not used, as we cannot guarantee that the version there is fresh.
The local data store implemented in LocalDataStore tries not to copy the entire file to save_to if possible, and uses hardlinking instead. Therefore you should not modify the retrieved file in place, as doing so may corrupt the cached copy.
This method returns the full versioned name of the retrieved file.
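The rules above for when the local cache may be consulted can be summarized in a small decision function. This is only a sketch of the documented behaviour: `may_serve_from_local` is a hypothetical name, and testing for `@` is a simplification of "name contains a version".

```python
def may_serve_from_local(name, remote_configured, force_refresh=False):
    """Return True if the local store may satisfy a get_file() request."""
    if not remote_configured:
        # Stand-alone client: the local store is the only source.
        return True
    if force_refresh:
        # The caller asked to bypass the local cache.
        return False
    # Without an explicit version, freshness cannot be guaranteed,
    # so only versioned names may be served locally.
    return '@' in name
```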
get_stream(name, force_refresh=False)
Retrieves the file identified by name in streaming mode.
Works like get_file(), except that it returns a tuple (file-like object, versioned name).
Does not support adding to the cache, although the file will be served locally if a full version is specified and exists in the cache.
put_file(name, filename, to_local_store=True, to_remote_store=True)
Adds the file filename to the filetracker under the name name.
If the file already exists, a new version is created. In practice, if the store does not support versioning, the file is overwritten.
The file may be added to the local store only (if to_remote_store is False), to the remote store only (if to_local_store is False), or to both. If only one store is configured, the values of to_local_store and to_remote_store are ignored.
The local data store implemented in LocalDataStore tries not to copy the data directly to the final cache destination, and uses hardlinking instead. Therefore you should not modify the file in place later, as this would be disastrous.
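The reason in-place modification is dangerous is a general property of hard links, which the following sketch demonstrates with the standard library only. It does not use filetracker, and `demo_hardlink_aliasing` and the file names in it are purely illustrative. It requires a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def demo_hardlink_aliasing():
    """Show that writing through a hard link changes the original too."""
    workdir = tempfile.mkdtemp()
    try:
        cached = os.path.join(workdir, 'cached.bin')
        linked = os.path.join(workdir, 'linked.bin')
        private = os.path.join(workdir, 'private.bin')
        with open(cached, 'wb') as f:
            f.write(b'original')
        os.link(cached, linked)            # second name for the same data
        shutil.copyfile(cached, private)   # independent copy of the data
        with open(linked, 'r+b') as f:     # modify in place via the link
            f.write(b'MODIFIED')
        with open(cached, 'rb') as f:
            cached_after = f.read()
        with open(private, 'rb') as f:
            private_after = f.read()
        return cached_after, private_after
    finally:
        shutil.rmtree(workdir)
```

The first returned value shows the "cached" file changed by the write through the link, while the independent copy is untouched. If you need to edit a file you have stored or retrieved, work on a copy made with something like `shutil.copyfile`.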
If you write tests, you may also be interested in filetracker.dummy.DummyClient.
Filetracker server
At some point you will probably want to run a filetracker server, so that more than one machine can share the store. Just do:
$ filetracker-server --help
This script can be used to start the metadata and file servers with minimal effort.
Using filetracker from the shell
No programmer can live without a way to fiddle with filetracker from the shell:
$ filetracker --help
Internal API Reference
filetracker.split_name(name)
Splits a (possibly versioned) name into an unversioned name and a version.
Returns a tuple (unversioned_name, version), where version may be None.
filetracker.versioned_name(unversioned_name, version)
Joins an unversioned name with the specified version.
Returns a versioned path.
class filetracker.DataStore
An abstract base class giving access to storing and retrieving files’ content.
add_file(name, filename)
Saves the actual file in the store.
Works like add_stream(), but filename is the name of an existing file in the filesystem.
add_stream(name, file)
Saves the passed stream in the store.
file may be any file-like object, which will be saved under the name name. If name contains a version, the file is saved with this particular version. If the version already exists, this method silently succeeds without checking whether the content of the stream matches the already saved data.
Returns the version of the newly added file.
delete_file(name)
Deletes the file under the name name and the metadata corresponding to it.
If name contains a version, the file is deleted only if this is the latest version of the file.
exists(name)
Returns True if the file exists, False otherwise.
If name contains a version, the existence of this particular version is checked.
file_size(name)
Returns the size of the file.
Raises an (unspecified) exception if the file is not found.
file_version(name)
Returns the most recent version of the file.
If name has a version number, it is ignored.
Raises an (unspecified) exception if the file is not found.
get_file(name, filename)
Saves the content of the file named name to filename.
Works like get_stream(), but filename is the name of a file which will be created (or overwritten).
Returns the full versioned name of the retrieved file.
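To illustrate the contract described for the DataStore methods above, here is a toy in-memory store. It is a sketch for reading purposes only: `InMemoryDataStore` is a hypothetical class that does not subclass filetracker.DataStore, and it assumes that versions are integers counted up from 1 and that deleting a file removes all of its versions. Neither assumption comes from the text.

```python
import io


class InMemoryDataStore(object):
    """Toy store keeping every version of every file in a dict."""

    def __init__(self):
        self._files = {}  # unversioned name -> {version: content bytes}

    @staticmethod
    def _split(name):
        base, sep, version = name.rpartition('@')
        return (base, int(version)) if sep else (name, None)

    def add_stream(self, name, file):
        base, version = self._split(name)
        versions = self._files.setdefault(base, {})
        if version is None:
            # Assumption: a new version is the previous maximum plus one.
            version = max(versions) + 1 if versions else 1
        if version not in versions:
            versions[version] = file.read()
        # If the version already existed, succeed silently without
        # comparing the stream against the stored content.
        return version

    def exists(self, name):
        base, version = self._split(name)
        versions = self._files.get(base, {})
        return bool(versions) if version is None else version in versions

    def file_version(self, name):
        base, _ = self._split(name)   # any version in name is ignored
        return max(self._files[base])  # raises KeyError if not found

    def delete_file(self, name):
        base, version = self._split(name)
        # A versioned delete only takes effect for the latest version.
        if version is None or version == self.file_version(name):
            del self._files[base]
```

A stream can be supplied with `io.BytesIO(b'...')`, for example `store.add_stream('/a', io.BytesIO(b'data'))`.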
class filetracker.LocalDataStore(dir)
Data store which uses the local filesystem.
The files are saved under <base_dir>/files, where base_dir can be passed to the constructor.
class filetracker.RemoteDataStore(base_url)
Data store which uses a remote HTTP server.
The server must support PUT requests which automatically create non-existent directories.
The server must return the Last-Modified header and must accept it in PUT and DELETE requests.
The files are saved under <base_url>/files, where base_url can be passed to the constructor.
class filetracker.LockManager
An abstract class representing a lock manager.
A lock manager is basically a factory of FileLock instances.
class filetracker.FcntlLockManager(dir)
A LockManager using fcntl.flock.
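For readers unfamiliar with fcntl.flock, the sketch below shows the underlying locking primitive on its own. It is not the FcntlLockManager implementation: `flock_guard` is a hypothetical helper, and how filetracker maps file names to lock files is not covered here. The `fcntl` module is only available on Unix-like systems.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def flock_guard(lock_path, exclusive=True):
    """Hold an advisory flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_EX excludes all other holders; LOCK_SH allows other readers.
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A caller would wrap the critical section in `with flock_guard(path):`. The call blocks until the lock is available, and the lock is advisory, so it only protects against other code that also takes the lock.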
class filetracker.NoOpLockManager
A no-op LockManager.
It may be used when no local store is configured, as we probably do not need concurrency control.
To-dos and ideas
- access control
- cache pruning
- support for “directories”: especially ls
- fuse client
- rm