About

Filetracker is a module which provides shared storage for files together with some extra metadata.

It was designed with the intent to be used along with a relational database in cases where large files need to be stored and accessed from multiple locations, but storing them as blobs in the database is not suitable.

Filetracker base supports caching of files downloaded from the remote master store.

The Filetracker API allows versioning of the stored files, but implementing it is optional, and the default store classes do not provide it.

Files, names and versions

A file may contain arbitrary data. Each file has a name that looks like an absolute filesystem path: components are separated by slashes, and the first character of the name must be a slash. Filetracker does not support folders explicitly. For now, you may assume that a file in Filetracker is identified by a name which, by convention, looks like a filesystem path. We may rely on this convention in the future, so please follow it.

Many methods accept or return versioned names, which look like regular names with a version number appended, separated by @. For those methods, passing an unversioned name usually means “the latest version of that file”.
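
A minimal sketch of how an unversioned name could be resolved to “the latest version”. It assumes integer versions and uses a plain dict as a stand-in for a store; resolve_latest and the available mapping are hypothetical illustrations, not part of filetracker.

```python
def resolve_latest(name, available):
    """Map a possibly unversioned name to a versioned one.

    `available` is a hypothetical dict from unversioned names to the
    integer versions a store holds; it is not part of filetracker.
    """
    if not name.startswith('/'):
        raise ValueError('name must start with a slash: %r' % (name,))
    _, sep, _ = name.rpartition('@')
    if sep:
        # Already versioned: the caller asked for a specific version.
        return name
    # Unversioned: treat it as "the latest version of that file".
    return '%s@%d' % (name, max(available[name]))


available = {'/reports/summary.pdf': [1, 2, 5]}
print(resolve_latest('/reports/summary.pdf', available))    # /reports/summary.pdf@5
print(resolve_latest('/reports/summary.pdf@2', available))  # /reports/summary.pdf@2
```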

Configuration and usage

Probably the only class you’d like to know and use is Client.

class filetracker.Client(local_store='auto', remote_store='auto', lock_manager='auto', cache_dir=None, remote_url=None, locks_dir=None)

The main filetracker client class.

The client instance can be built in one of several ways. The easiest is to call the constructor without arguments. In this case, the configuration is taken from the following environment variables:

FILETRACKER_DIR
the folder to use as the local cache; if not specified, ~/.filetracker-store is used.
FILETRACKER_URL
the URL of the filetracker server; if not present, the constructed client is a stand-alone local client, which stores the files and metadata locally — this can be safely used by multiple processes on the same machine, too.
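
The environment-variable lookup described above can be sketched as follows. client_config_from_env is a hypothetical helper written only to illustrate the documented defaults; it is not part of filetracker, and the real client may resolve these values differently (for example, in how it treats an empty variable).

```python
import os


def client_config_from_env(environ=None):
    """Return (cache_dir, remote_url) following the documented defaults.

    Hypothetical helper: mirrors FILETRACKER_DIR / FILETRACKER_URL as
    described in this section, not the library's actual code.
    """
    if environ is None:
        environ = os.environ
    # Local cache directory, defaulting to ~/.filetracker-store.
    # Assumption: an empty value is treated like an unset one.
    cache_dir = environ.get('FILETRACKER_DIR') or os.path.expanduser(
        '~/.filetracker-store')
    # No URL means a stand-alone local client (remote_url stays None).
    remote_url = environ.get('FILETRACKER_URL')
    return cache_dir, remote_url
```

Passing the resulting pair explicitly, as in filetracker.Client(cache_dir=cache_dir, remote_url=remote_url), uses only parameters from the documented constructor signature.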

Alternatively, you can pass these values to the constructor as the remote_url and cache_dir arguments.

If you are a power user, you may create the client by passing local_store and remote_store to the constructor manually (see Internal API Reference).

delete_file(name)

Deletes the file identified by name along with its metadata.

The file is removed from both the local store and the remote store.

file_size(name, force_refresh=False)

Returns the size of the file.

For efficiency, this operation does not use locking, so it may return inconsistent data. Use it for informational purposes only.

file_version(name)

Returns the newest available version number of the file.

If the remote store is configured, it is queried, otherwise the local version is returned. It is assumed that the remote store always has the newest version of the file.

If name includes a version, the version is ignored.

get_file(name, save_to, add_to_cache=True, force_refresh=False, _lock_exclusive=False)

Retrieves file identified by name.

The file is saved as save_to. If add_to_cache is True, the file is added to the local store. If force_refresh is True and a remote store is configured, the local cache is not examined.

If a remote store is configured, but name does not contain a version, the local data store is not used, as we cannot guarantee that the version there is fresh.
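
The rules for when get_file() may consult the local cache can be summarized as a small decision function. may_use_local_cache is hypothetical and only restates the conditions documented above; the actual client may apply further checks (for example, whether the requested version is present locally), and detecting a version by the presence of @ is an assumption of this sketch.

```python
def may_use_local_cache(name, remote_configured, force_refresh=False):
    """Decide whether a cached local copy may satisfy a get_file() call.

    Hypothetical restatement of the documented rules, not filetracker code.
    """
    if not remote_configured:
        # Stand-alone local client: the local store is the only source.
        return True
    if force_refresh:
        # Caller explicitly asked to bypass the cache.
        return False
    # An unversioned name might refer to a newer version on the remote,
    # so only a fully versioned name (assumed to contain '@') is trusted.
    return '@' in name
```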

The local data store implemented in LocalDataStore tries not to copy the entire file to save_to if possible, and uses hardlinking instead. Therefore, do not modify the retrieved file in place, or you may corrupt the cached copy.
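
Why in-place modification is dangerous can be shown with plain os.link, independently of filetracker: a hard link is a second name for the same inode, so writing through one name changes the data seen through the other. The file names below are made up for the demonstration and merely stand in for the store's file and save_to.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    cached = os.path.join(tmp, 'cached_copy')  # stands in for the store's file
    exported = os.path.join(tmp, 'save_to')    # stands in for save_to
    with open(cached, 'w') as f:
        f.write('original contents')

    os.link(cached, exported)  # hard link: same inode, not a copy

    # Modifying the "exported" file in place (open with 'w' truncates the inode)...
    with open(exported, 'w') as f:
        f.write('edited!')

    # ...also changes what the "store" holds.
    with open(cached) as f:
        seen_by_store = f.read()

print(seen_by_store)  # edited!
```

If you need a writable version, copy the file first (for example with shutil.copyfile), or write a new file and os.replace() it over save_to; replacing the name rebinds save_to to a new inode and leaves the cached data untouched.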

This method returns the full versioned name of the retrieved file.

get_stream(name, force_refresh=False)

Retrieves file identified by name in streaming mode.

Works like get_file(), except that it returns a tuple (file-like object, versioned name).

Does not support adding to cache, although the file will be served locally if a full version is specified and exists in the cache.

put_file(name, filename, to_local_store=True, to_remote_store=True)

Adds file filename to the filetracker under the name name.

If the file already exists, a new version is created. In practice if the store does not support versioning, the file is overwritten.

The file may be added to local store only (if to_remote_store is False), to remote store only (if to_local_store is False) or both. If only one store is configured, the values of to_local_store and to_remote_store are ignored.
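
The store-selection rule for put_file() can be written down as a short function. target_stores is hypothetical; it only encodes the documented behavior, including that the flags are ignored when just one store is configured, and it assumes at least one store exists.

```python
def target_stores(has_local, has_remote, to_local_store=True, to_remote_store=True):
    """Return the stores ('local', 'remote') a put_file() call writes to.

    Hypothetical sketch of the documented rules, not filetracker code.
    Assumes at least one of has_local / has_remote is True.
    """
    if has_local and has_remote:
        # Both stores configured: honor the caller's flags.
        stores = []
        if to_local_store:
            stores.append('local')
        if to_remote_store:
            stores.append('remote')
        return stores
    # Only one store configured: the flags are ignored.
    return ['local'] if has_local else ['remote']
```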

The local data store implemented in LocalDataStore tries not to copy the data directly to the final cache destination, and uses hardlinking instead. Therefore, you should not modify the file in place later, as this would be disastrous.

If you write tests, you may be also interested in filetracker.dummy.DummyClient.

Filetracker server

At some point you will probably want to run a filetracker server, so that more than one machine can share the store. Just do:

$ filetracker-server --help

This script can be used to start the metadata and file servers with minimal effort.

Using filetracker from the shell

No programmer can live without a way to fiddle with filetracker from the shell:

$ filetracker --help

Internal API Reference

filetracker.split_name(name)

Splits a (possibly versioned) name into unversioned name and version.

Returns a tuple (unversioned_name, version), where version may be None.

filetracker.versioned_name(unversioned_name, version)

Joins an unversioned name with the specified version.

Returns a versioned path.
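
A minimal sketch of the two helpers, assuming versions are integers and that @ is always the last separator in a versioned name. split_name_sketch and versioned_name_sketch are illustrative reimplementations, not the library's own code, so edge cases (for example, how a malformed version is reported) may differ.

```python
def split_name_sketch(name):
    """Split '/path@version' into ('/path', version); version may be None."""
    base, sep, version = name.rpartition('@')
    if not sep:
        # No '@' at all: the name is unversioned.
        return name, None
    return base, int(version)  # assumes integer versions


def versioned_name_sketch(unversioned_name, version):
    """Join an unversioned name and a version with '@'."""
    return '%s@%s' % (unversioned_name, version)
```

The two functions are inverses for well-formed input: splitting the result of versioned_name_sketch('/x', 7) gives back ('/x', 7).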

class filetracker.DataStore

An abstract base class giving access to storing and retrieving files’ content.

add_file(name, filename)

Saves the actual file in the store.

Works like add_stream(), but filename is the name of an existing file in the filesystem.

add_stream(name, file)

Saves the passed stream in the store.

file may be any file-like object, which will be saved under name name. If name contains a version, the file is saved with this particular version. If the version exists, this method silently succeeds without checking if the content of the stream matches the already saved data.

Returns the version of the newly added file.

delete_file(name)

Deletes the file under the name name and the metadata corresponding to it.

If name contains a version, the file is deleted only if this is the latest version of the file.

exists(name)

Returns True if the file exists, False otherwise.

If name contains a version, the existence of this particular version is checked.

file_size(name)

Returns the size of the file.

Raises an (unspecified) exception if the file is not found.

file_version(name)

Returns the most recent version of the file.

If name has a version number, it is ignored.

Raises an (unspecified) exception if the file is not found.

get_file(name, filename)

Saves the content of file named name to filename.

Works like get_stream(), but filename is the name of a file which will be created (or overwritten).

Returns the full versioned name of the retrieved file.

get_stream(name)

Retrieves a file in streaming mode.

Returns a pair (file-like object, versioned name).

list_files()

Returns a list of all stored files, along with the dates of last modification.
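
To show how the DataStore contract fits together, here is a minimal in-memory sketch. MemoryStoreSketch is hypothetical and unrelated to filetracker.dummy.DummyDataStore; it implements only a subset of the methods and assumes integer versions that grow by one per unversioned add, and that deleting a file removes all of its stored versions.

```python
import io


def _split(name):
    """Split '/path@3' into ('/path', 3) and '/path' into ('/path', None)."""
    base, sep, version = name.rpartition('@')
    return (base, int(version)) if sep else (name, None)


class MemoryStoreSketch:
    """Hypothetical in-memory store following the DataStore docs above."""

    def __init__(self):
        self._files = {}  # unversioned name -> {version: bytes}

    def add_stream(self, name, file):
        base, version = _split(name)
        versions = self._files.setdefault(base, {})
        if version is None:
            # Assumption: a new version is one more than the newest.
            version = max(versions, default=0) + 1
        if version not in versions:
            versions[version] = file.read()
        # An existing version succeeds silently, without comparing data.
        return version

    def exists(self, name):
        base, version = _split(name)
        versions = self._files.get(base, {})
        return bool(versions) if version is None else version in versions

    def file_version(self, name):
        base, _ = _split(name)  # any version in name is ignored
        return max(self._files[base])  # KeyError if the file is missing

    def file_size(self, name):
        base, version = _split(name)
        if version is None:
            version = self.file_version(base)
        return len(self._files[base][version])

    def get_stream(self, name):
        base, version = _split(name)
        if version is None:
            version = self.file_version(base)
        data = self._files[base][version]
        return io.BytesIO(data), '%s@%d' % (base, version)

    def delete_file(self, name):
        base, version = _split(name)
        if base not in self._files:
            return
        if version is not None and version != max(self._files[base]):
            # A versioned delete only applies to the latest version.
            return
        # Assumption: deleting removes every stored version of the name.
        del self._files[base]
```

For example, adding two streams under /a.txt yields versions 1 and 2, and get_stream('/a.txt') then returns the second payload together with the name /a.txt@2.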

class filetracker.LocalDataStore(dir)

Data store which uses local filesystem.

The files are saved under <base_dir>/files, where <base_dir> is the dir argument passed to the constructor.

class filetracker.RemoteDataStore(base_url)

Data store which uses a remote HTTP server.

The server must support PUT requests which automatically create non-existent directories.

The server must return the Last-Modified header and must accept it in PUT and DELETE requests.

The files are saved under <base_url>/files, where base_url can be passed to the constructor.
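
As an illustration of the HTTP contract, the sketch below builds (but does not send) a PUT request with urllib.request. The build_put_request helper, the mapping of a name to <base_url>/files<name>, the example host, and the choice of Last-Modified value are assumptions made for the example; they do not describe what RemoteDataStore actually sends.

```python
import email.utils
import time
import urllib.request


def build_put_request(base_url, name, data, mtime=None):
    """Build (but do not send) a PUT request for a hypothetical file server."""
    # Hypothetical URL layout: names start with '/', so they append directly.
    url = base_url.rstrip('/') + '/files' + name
    if mtime is None:
        mtime = time.time()
    headers = {
        # HTTP-date format, e.g. 'Thu, 01 Jan 1970 00:00:00 GMT'.
        'Last-Modified': email.utils.formatdate(mtime, usegmt=True),
    }
    return urllib.request.Request(url, data=data, headers=headers, method='PUT')


req = build_put_request('http://ft.example:9999/', '/docs/a.txt', b'hello', mtime=0)
print(req.get_method(), req.full_url)  # PUT http://ft.example:9999/files/docs/a.txt
```

Actually sending the request would be urllib.request.urlopen(req), which of course needs a running server.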

class filetracker.LockManager

An abstract class representing a lock manager.

A lock manager is basically a factory of FileLock instances.

class Lock

An abstract class representing a lockable file descriptor.

close()

Unlocks the file and releases any system resources.

May be called more than once (it’s a no-op then).

lock_exclusive()

Locks the file in exclusive mode (upgrades an existing lock).

lock_shared()

Locks the file in shared mode (downgrades an existing lock).

unlock()

Unlocks the file (no-op if the file is not locked).

LockManager.lock_for(name)

Returns a FileLock bound to the passed file.

Locks are not versioned – there should be a single lock for all versions of the given name. The argument name may contain version specification, but it must be ignored.

class filetracker.FcntlLockManager(dir)

A LockManager using fcntl.flock.
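
A minimal sketch of a flock-based lock manager in the spirit of FcntlLockManager. It is Unix-only, and the class names, the one-lock-file-per-name layout (a SHA-1 hash of the unversioned name inside dir) and the version stripping via rpartition are assumptions made for illustration, not the library's implementation.

```python
import fcntl
import hashlib
import os


class FlockLockSketch:
    """Hypothetical lock object following the Lock interface above."""

    def __init__(self, path):
        self.path = path
        # Keep a descriptor open for the lifetime of the lock object.
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def lock_shared(self):
        # Downgrades an exclusive lock; flock conversions are not atomic.
        fcntl.flock(self._fd, fcntl.LOCK_SH)

    def lock_exclusive(self):
        # Upgrades a shared lock; flock conversions are not atomic.
        fcntl.flock(self._fd, fcntl.LOCK_EX)

    def unlock(self):
        # Unlocking a file that is not locked is harmless.
        fcntl.flock(self._fd, fcntl.LOCK_UN)

    def close(self):
        # Closing the only descriptor also releases the lock; idempotent.
        if self._fd is not None:
            os.close(self._fd)
            self._fd = None


class FlockLockManagerSketch:
    """Hypothetical factory: one lock file per unversioned name."""

    def __init__(self, dir):
        self.dir = dir
        os.makedirs(dir, exist_ok=True)

    def lock_for(self, name):
        base, sep, _ = name.rpartition('@')
        unversioned = base if sep else name  # all versions share one lock
        digest = hashlib.sha1(unversioned.encode('utf-8')).hexdigest()
        return FlockLockSketch(os.path.join(self.dir, digest))
```

Hashing the name keeps lock files flat in one directory, at the cost of lock file names that are not human-readable.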

class filetracker.NoOpLockManager

A no-op LockManager.

It may be used when no local store is configured, as we probably do not need concurrency control.

class filetracker.dummy.DummyDataStore

A dummy data store which uses memory to store files.

Cool for testing, but beware: do not try to store too much. Also, this class is not thread-safe.

class filetracker.dummy.DummyClient

Filetracker client which uses a dummy local data store.

To-dos and ideas

  • access control
  • cache pruning
  • support for “directories”: especially ls
  • fuse client
  • rm
