Docspell aims to be a simple yet effective document organizer that makes stowing documents away very quick and finding them later reliable (and also fast). It is a bit opinionated and more targeted for home use and small/medium organizations.
In contrast to many document management systems (DMS), the main focus is not to provide all kinds of features for manually creating organizational structures, like folder hierarchies, where you place the documents yourself. The approach is to leave everything as one big pile of documents, but to extract and attach metadata from each document. These are mainly properties that emerge from the document itself, because finding them can be automated. This makes it very simple to add documents, because no time is spent thinking about where to put them. It is also possible to apply different structures on top later, for example to first show all documents of a specific correspondent, then all with the tag 'invoice', etc. If these properties are attached to all documents, it is really easy to find a document. This can even be combined with fulltext search for the, hopefully rare, desperate cases.
Of course, it is also possible to add custom properties and arbitrary tags.
Docspell analyzes the text to find metadata automatically. It can learn from existing data and can apply NLP techniques to support this. This metadata must be maintained manually in the application. Docspell looks for candidates for:
- Correspondents
- Concerned person or things
- A date and due date
- Tags
For tags, it sets all that it thinks apply. For the others, it proposes a few candidates and sets the most likely one on your item.
This might be wrong, so it is recommended to curate the results. However, very often the correct one is either set or within the proposals, where you can fix it with a single click.
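The difference between tags and the other properties can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not Docspell's actual implementation: the `Candidate` type, the numeric scores, the `limit` of three proposals, and the tag `threshold` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical metadata candidate with a confidence score."""
    value: str
    score: float


def propose(candidates: list[Candidate], limit: int = 3) -> tuple[str | None, list[str]]:
    """Return (value set on the item, proposals kept for the user).

    The best-scoring candidate is set; the top `limit` candidates stay
    available as proposals so a wrong guess can be fixed with one click.
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    proposals = [c.value for c in ranked[:limit]]
    chosen = proposals[0] if proposals else None
    return chosen, proposals


def select_tags(scored_tags: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Tags are handled differently: every tag considered applicable is set."""
    return sorted(tag for tag, score in scored_tags.items() if score >= threshold)
```

In this sketch a correspondent would go through `propose`, while tags go through `select_tags`; either result may be wrong and should still be reviewed.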
Besides these properties, there is more metadata you can use to organize your files, for example custom fields, folders and notes.
Docspell is also for programmers. Everything is available via a REST or HTTP API and can be easily used within your own scripts and tools, for example using `curl`. There are also features for "advanced use" and many configuration options.
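As a rough sketch of what scripting against the API could look like, the following Python code only builds the HTTP requests and does not send them. Everything specific here is an assumption that this page does not state and that should be checked against the OpenAPI specification of your installation: the base URL, the paths `/api/v1/open/auth/login` and `/api/v1/sec/item/search`, the `X-Docspell-Auth` header, and the `tag:invoice` query.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a REST server reachable at this address; adjust as needed.
BASE_URL = "http://localhost:7880"


def login_request(account: str, password: str) -> urllib.request.Request:
    """Build a login request; the path and JSON field names are assumptions."""
    body = json.dumps({"account": account, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/open/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_request(token: str, query: str) -> urllib.request.Request:
    """Build an item search request; the path and header name are assumptions."""
    params = urllib.parse.urlencode({"q": query})
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/sec/item/search?{params}",
        headers={"X-Docspell-Auth": token},
        method="GET",
    )
```

Passing one of these objects to `urllib.request.urlopen` would send it; the token for the second request would come from the response to the first.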
Docspell consists of multiple components that run in separate processes:
- REST server
- JOEX, short for job executor
- Fulltext Search Index (optional, Apache SOLR or PostgreSQL)
The REST server provides the API and the web application. The web application is a SPA written in Elm and is a client to the REST API. All features are available via an HTTP/REST API.
The joex is the component that does the “heavy work”, executing long-running tasks, like processing files or importing your mails periodically. While the joex component also exposes a small REST API for controlling it, the main user interface is all inside the REST server API.
The REST server and the job executor can be started multiple times in order to scale out. All instances must connect to the same database. It is also recommended (though not strictly required) that all components can reach each other.
The fulltext search index is another separate component, where currently SOLR and PostgreSQL are supported. Fulltext search is optional; this component is not required if Docspell is run without fulltext search support.
In order to better understand the following pages, some terms are explained.
Item
An item is roughly your document, except that an item may span multiple files, which are called attachments. An item has the following metadata associated:
- a correspondent: the other side of the communication. It can be an organization or a person.
- a concerning person or equipment: a person or thing that this item is about. Maybe it is an insurance contract about your car.
- tag: an item can be tagged with one or more tags (or labels). A tag can have a category. This is intended for grouping tags, for example a category `doctype` could be used to group tags like `bill`, `contract`, `receipt` etc. Usually an item is not tagged with more than one tag of a category.
- a folder: a folder is similar to a tag, but an item can be in at most one folder (exactly one or none). Furthermore, folders allow associating users, so that items are only visible to the users who are members of a folder.
- an item date: this is the date of the document; if this is not set, the created date of the item is used.
- a due date: an optional date by which something has to be done about this item (e.g. paying a bill or submitting it)
- a direction: one of "incoming" or "outgoing"
- a name: some item name, defaults to the file name of the attachments
- some notes: arbitrary descriptive text. You can use markdown here, which is properly formatted in the web application.
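The properties listed above can be pictured as a simple record. The following Python dataclass is a minimal sketch for illustration only; the field names and types are assumptions and do not reflect Docspell's actual data model or API payloads.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    """Illustrative item record; all field names are hypothetical."""
    name: str                                   # defaults to the attachment file name
    created: date                               # when the item was created
    attachments: list[str] = field(default_factory=list)  # an item may span several files
    correspondent: str | None = None            # organization or person
    concerning: str | None = None               # person or equipment the item is about
    tags: dict[str, str | None] = field(default_factory=dict)  # tag name -> optional category
    folder: str | None = None                   # at most one folder
    item_date: date | None = None               # date of the document, if known
    due_date: date | None = None                # optional deadline
    direction: str | None = None                # "incoming" or "outgoing"
    notes: str = ""                             # free text, may contain markdown

    def effective_date(self) -> date:
        """Use the item date if set, otherwise fall back to the created date."""
        return self.item_date if self.item_date is not None else self.created
```

The `effective_date` method mirrors the fallback rule for the item date described in the list.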
Collective
The users of the application are part of a collective. A collective is a group of users that share access to the same items. The account name is therefore composed of a collective name and a user name.
All users of a collective are equal; they have the same permissions to access all items. The items don't belong to a user, but to the collective.
That means, to identify yourself when signing in, you have to give the collective name and your user name. By default they are separated by a slash `/`, for example `smith/john`. If your user name is the same as the collective name, you can omit one; so `smith/smith` can be abbreviated to just `smith`.
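The account name rule can be sketched as a small Python function. The function name `parse_account` is hypothetical and the code only illustrates the convention described above; it is not taken from Docspell.

```python
from __future__ import annotations


def parse_account(account: str, separator: str = "/") -> tuple[str, str]:
    """Split an account name into (collective, user).

    A name without the separator is treated as the abbreviated form,
    where the collective and the user share the same name.
    """
    collective, sep, user = account.partition(separator)
    if not sep:
        return account, account
    if not collective or not user:
        raise ValueError(f"invalid account name: {account!r}")
    return collective, user
```

With this sketch, `parse_account("smith")` and `parse_account("smith/smith")` yield the same pair.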
By default, all users can see all items of their collective. A folder can be used to implement other visibilities: Every user can create a folder and associate members. It is possible to put items in these folders, and Docspell shows only items that are either in no specific folder or in a folder where the current user is owner or member.
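The visibility rule can be written down as a short filter. This Python sketch uses trimmed-down, hypothetical `Folder` and `Item` types and is meant to illustrate the rule only, not how Docspell implements it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Folder:
    """Hypothetical folder with an owner and a set of member user names."""
    name: str
    owner: str
    members: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Item:
    """Trimmed-down item: only the name and its optional folder."""
    name: str
    folder: Folder | None = None


def visible_items(items: list[Item], user: str) -> list[Item]:
    """Keep items without a folder, or in a folder the user owns or is a member of."""
    return [
        item
        for item in items
        if item.folder is None
        or item.folder.owner == user
        or user in item.folder.members
    ]
```

All users are assumed to belong to the same collective here; items of other collectives would never be part of the input list.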