Compare commits

1 Commit

Author: Daniel Quinn
SHA1: 717b4a2b49
Message: Fixes #172
Introduce some creative code around setting of ALLOWED_HOSTS that defaults to ['*']. Also added PAPERLESS_ALLOWED_HOSTS to paperless.conf.example with an explanation as to what it's for
Date: 2017-01-03 09:52:31 +00:00
865 changed files with 32509 additions and 251930 deletions
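
The commit message above describes defaulting Django's ALLOWED_HOSTS to ['*'] unless PAPERLESS_ALLOWED_HOSTS is provided. As a rough illustration only, here is a minimal sketch of how such a setting could be wired into a settings module; the environment variable name comes from the commit message, while the comma-splitting and everything else below is an assumption, not the commit's actual code.

```python
# Minimal sketch (not the actual commit): default ALLOWED_HOSTS to every host,
# but honor PAPERLESS_ALLOWED_HOSTS from the environment when it is set.
import os

_allowed_hosts = os.getenv("PAPERLESS_ALLOWED_HOSTS", "*")

# A comma-separated value such as "example.com,localhost" becomes a list of hosts;
# the bare default "*" keeps Django's accept-everything behavior.
ALLOWED_HOSTS = [host.strip() for host in _allowed_hosts.split(",") if host.strip()]
```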

View File

@@ -1,9 +0,0 @@
{
"qpdf": {
"version": "11.1.1"
},
"jbig2enc": {
"version": "0.29",
"git_tag": "0.29"
}
}

View File

@@ -1,21 +0,0 @@
**/__pycache__
/src-ui/.vscode
/src-ui/node_modules
/src-ui/dist
.git
/export
/consume
/media
/data
/docs
.pytest_cache
/dist
/scripts
/resources
**/tests
**/*.spec.ts
**/htmlcov
/src/.pytest_cache
.idea
.venv/
.vscode/

View File

@@ -1,40 +0,0 @@
# EditorConfig: http://EditorConfig.org
root = true
[*]
indent_style = tab
indent_size = 2
insert_final_newline = true
trim_trailing_whitespace = true
end_of_line = lf
charset = utf-8
max_line_length = 79
[{*.html,*.css,*.js}]
max_line_length = off
[*.py]
indent_size = 4
indent_style = space
[*.{yml,yaml}]
indent_style = space
[*.rst]
indent_style = space
[*.md]
indent_style = space
[Pipfile.lock]
indent_style = space
# Tests don't get a line width restriction. It's still a good idea to follow
# the 79 character rule, but in the interests of clarity, tests often need to
# violate it.
[**/test_*.py]
max_line_length = off
[Dockerfile*]
indent_style = space

.env
View File

@@ -1,2 +0,0 @@
COMPOSE_PROJECT_NAME=paperless
export PROMPT="(pipenv-projectname)$P$G"

View File

@@ -1,97 +0,0 @@
name: Bug report
description: Something is not working
title: "[BUG] Concise description of the issue"
labels: ["bug", "unconfirmed"]
body:
- type: markdown
attributes:
value: |
Have a question? 👉 [Start a new discussion](https://github.com/paperless-ngx/paperless-ngx/discussions/new) or [ask in chat](https://matrix.to/#/#paperless:adnidor.de).
Before opening an issue, please double check:
- [The troubleshooting documentation](https://paperless-ngx.readthedocs.io/en/latest/troubleshooting.html).
- [The installation instructions](https://paperless-ngx.readthedocs.io/en/latest/setup.html#installation).
- [Existing issues and discussions](https://github.com/paperless-ngx/paperless-ngx/search?q=&type=issues).
- Disable any custom container initialization scripts, if using any
If you encounter issues while installing or configuring Paperless-ngx, please post in the ["Support" section of the discussions](https://github.com/paperless-ngx/paperless-ngx/discussions/new?category=support).
- type: textarea
id: description
attributes:
label: Description
description: A clear and concise description of what the bug is. If applicable, add screenshots to help explain your problem.
placeholder: |
Currently Paperless does not work when...
[Screenshot if applicable]
validations:
required: true
- type: textarea
id: reproduction
attributes:
label: Steps to reproduce
description: Steps to reproduce the behavior.
placeholder: |
1. Go to '...'
2. Click on '....'
3. See error
validations:
required: true
- type: textarea
id: logs
attributes:
label: Webserver logs
description: Logs from the web server related to your issue.
render: bash
validations:
required: true
- type: textarea
id: logs_browser
attributes:
label: Browser logs
description: Logs from the web browser related to your issue, if needed
render: bash
- type: input
id: version
attributes:
label: Paperless-ngx version
placeholder: e.g. 1.6.0
validations:
required: true
- type: input
id: host-os
attributes:
label: Host OS
description: Host OS of the machine running paperless-ngx. Please add the architecture (uname -m) if applicable.
placeholder: e.g. Archlinux / Ubuntu 20.04 / Raspberry Pi `arm64`
validations:
required: true
- type: dropdown
id: install-method
attributes:
label: Installation method
options:
- Docker - official image
- Docker - linuxserver.io image
- Bare metal
- Other (please describe above)
description: Note there are significant differences between the official image and the linuxserver.io image; please check whether your issue is specific to the third-party image.
validations:
required: true
- type: input
id: browser
attributes:
label: Browser
description: Which browser you are using, if relevant.
placeholder: e.g. Chrome, Safari
- type: input
id: config-changes
attributes:
label: Configuration changes
description: Any configuration changes you made in `docker-compose.yml`, `docker-compose.env` or `paperless.conf`.
- type: input
id: other
attributes:
label: Other
description: Any other relevant details.

View File

@@ -1,11 +0,0 @@
blank_issues_enabled: false
contact_links:
- name: 🤔 Questions and Help
url: https://github.com/paperless-ngx/paperless-ngx/discussions
about: This issue tracker is not for support questions. Please refer to our Discussions.
- name: 💬 Chat
url: https://matrix.to/#/#paperless:adnidor.de
about: Want to discuss Paperless-ngx with others? Check out our chat.
- name: 🚀 Feature Request
url: https://github.com/paperless-ngx/paperless-ngx/discussions/new?category=feature-requests
about: Remember to search for existing feature requests and "up-vote" any you like

View File

@@ -1,32 +0,0 @@
<!--
Note: All PRs with code changes should be targeted to the `dev` branch; pure documentation changes can target `main`
-->
## Proposed change
<!--
Please include a summary of the change and which issue is fixed (if any) and any relevant motivation / context. List any dependencies that are required for this change. If appropriate, please include an explanation of how your proposed change can be tested. Screenshots and / or videos can also be helpful if appropriate.
-->
Fixes # (issue)
## Type of change
<!--
What type of change does your PR introduce to Paperless-ngx?
NOTE: Please check only one box!
-->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Other (please explain)
## Checklist:
- [ ] I have read & agree with the [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md).
- [ ] If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
- [ ] If applicable, I have checked that all tests pass, see [documentation](https://paperless-ngx.readthedocs.io/en/latest/extending.html#back-end-development).
- [ ] I have run all `pre-commit` hooks, see [documentation](https://paperless-ngx.readthedocs.io/en/latest/extending.html#code-formatting-with-pre-commit-hooks).
- [ ] I have made corresponding changes to the documentation as needed.
- [ ] I have checked my modifications for any breaking changes.

View File

@@ -1,48 +0,0 @@
# https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configuration-options-for-dependency-updates#package-ecosystem
version: 2
updates:
# Enable version updates for npm
- package-ecosystem: "npm"
target-branch: "dev"
# Look for `package.json` and `lock` files in the `/src-ui` directory
directory: "/src-ui"
# Check the npm registry for updates every month
schedule:
interval: "monthly"
labels:
- "frontend"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/frontend"
# Enable version updates for Python
- package-ecosystem: "pip"
target-branch: "dev"
# Look for a `Pipfile` in the `root` directory
directory: "/"
# Check for updates once a week
schedule:
interval: "weekly"
labels:
- "backend"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/backend"
# Enable updates for Github Actions
- package-ecosystem: "github-actions"
target-branch: "dev"
directory: "/"
schedule:
# Check for updates to GitHub Actions every month
interval: "monthly"
labels:
- "ci-cd"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/ci-cd"

View File

@@ -1,55 +0,0 @@
autolabeler:
- label: "bug"
branch:
- '/^fix/'
title:
- "/^fix/i"
- label: "enhancement"
branch:
- '/^feature/'
title:
- "/^feature/i"
categories:
- title: 'Breaking Changes'
labels:
- 'breaking-change'
- title: 'Features'
labels:
- 'enhancement'
- title: 'Bug Fixes'
labels:
- 'bug'
- title: 'Documentation'
label: 'documentation'
- title: 'Maintenance'
labels:
- 'chore'
- 'deployment'
- 'translation'
- 'ci-cd'
- title: 'Dependencies'
collapse-after: 3
label: 'dependencies'
- title: 'All App Changes'
labels:
- 'frontend'
- 'backend'
collapse-after: 0
include-labels:
- 'enhancement'
- 'bug'
- 'chore'
- 'deployment'
- 'translation'
- 'dependencies'
- 'documentation'
- 'frontend'
- 'backend'
- 'ci-cd'
category-template: '### $TITLE'
change-template: '- $TITLE @$AUTHOR ([#$NUMBER]($URL))'
change-title-escapes: '\<*_&#@'
template: |
## paperless-ngx $RESOLVED_VERSION
$CHANGES

View File

@@ -1,402 +0,0 @@
#!/usr/bin/env python3
import json
import logging
import os
import shutil
import subprocess
from argparse import ArgumentParser
from typing import Dict
from typing import Final
from typing import List
from typing import Optional
from common import get_log_level
from github import ContainerPackage
from github import GithubBranchApi
from github import GithubContainerRegistryApi
logger = logging.getLogger("cleanup-tags")
class DockerManifest2:
"""
Data class wrapping the Docker Image Manifest Version 2.
See https://docs.docker.com/registry/spec/manifest-v2-2/
"""
def __init__(self, data: Dict) -> None:
self._data = data
# This is the sha256: digest string. Corresponds to GitHub API name
# if the package is an untagged package
self.digest = self._data["digest"]
platform_data_os = self._data["platform"]["os"]
platform_arch = self._data["platform"]["architecture"]
platform_variant = self._data["platform"].get(
"variant",
"",
)
self.platform = f"{platform_data_os}/{platform_arch}{platform_variant}"
class RegistryTagsCleaner:
"""
This is the base class for the image registry cleaning. Given a package
name, it will keep all images which are tagged and all untagged images
referred to by a manifest. This results in removing only images which have been untagged
and cannot be referenced except by their SHA. None of these
images should be referenced, so it is fine to delete them.
"""
def __init__(
self,
package_name: str,
repo_owner: str,
repo_name: str,
package_api: GithubContainerRegistryApi,
branch_api: Optional[GithubBranchApi],
):
self.actually_delete = False
self.package_api = package_api
self.branch_api = branch_api
self.package_name = package_name
self.repo_owner = repo_owner
self.repo_name = repo_name
self.tags_to_delete: List[str] = []
self.tags_to_keep: List[str] = []
# Get the information about all versions of the given package
# These are active, not deleted, the default returned from the API
self.all_package_versions = self.package_api.get_active_package_versions(
self.package_name,
)
# Get a mapping from a tag like "1.7.0" or "feature-xyz" to the ContainerPackage
# tagged with it. It makes certain lookups easy
self.all_pkgs_tags_to_version: Dict[str, ContainerPackage] = {}
for pkg in self.all_package_versions:
for tag in pkg.tags:
self.all_pkgs_tags_to_version[tag] = pkg
logger.info(
f"Located {len(self.all_package_versions)} versions of package {self.package_name}",
)
self.decide_what_tags_to_keep()
def clean(self):
"""
This method will delete image versions, based on the selected tags to delete
"""
for tag_to_delete in self.tags_to_delete:
package_version_info = self.all_pkgs_tags_to_version[tag_to_delete]
if self.actually_delete:
logger.info(
f"Deleting {tag_to_delete} (id {package_version_info.id})",
)
self.package_api.delete_package_version(
package_version_info,
)
else:
logger.info(
f"Would delete {tag_to_delete} (id {package_version_info.id})",
)
else:
logger.info("No tags to delete")
def clean_untagged(self, is_manifest_image: bool):
"""
This method will delete untagged images, that is, those which have no tags. It
handles the case where the image is actually a manifest, which points to images that otherwise look
untagged.
"""
def _clean_untagged_manifest():
"""
Handles the deletion of untagged images, but where the package is a manifest, i.e. a multi-arch
image, which means some "untagged" images still need to exist.
Ok, bear with me, these are annoying.
Our images are multi-arch, so the manifest is more like a pointer to a sha256 digest.
These images are untagged, but pointed to, and so should not be removed (or every pull fails).
So for each image getting kept, parse the manifest to find the digest(s) it points to. Then
remove those from the list of untagged images. The final result is the set of untagged,
not-pointed-to versions, which should be safe to remove.
Example:
Tag: ghcr.io/paperless-ngx/paperless-ngx:1.7.1 refers to
amd64: sha256:b9ed4f8753bbf5146547671052d7e91f68cdfc9ef049d06690b2bc866fec2690
armv7: sha256:81605222df4ba4605a2ba4893276e5d08c511231ead1d5da061410e1bbec05c3
arm64: sha256:374cd68db40734b844705bfc38faae84cc4182371de4bebd533a9a365d5e8f3b
each of which appears as an untagged image, but isn't really.
So from the list of untagged packages, remove those digests. Once all tags which
are being kept are checked, the remaining untagged packages are actually untagged,
with nothing in a manifest referring to them.
"""
# Simplify the untagged data, mapping name (which is a digest) to the version
# At the moment, these are the images which APPEAR untagged.
untagged_versions = {}
for x in self.all_package_versions:
if x.untagged:
untagged_versions[x.name] = x
skips = 0
# Parse manifests to locate digests pointed to
for tag in sorted(self.tags_to_keep):
full_name = f"ghcr.io/{self.repo_owner}/{self.package_name}:{tag}"
logger.info(f"Checking manifest for {full_name}")
try:
proc = subprocess.run(
[
shutil.which("docker"),
"manifest",
"inspect",
full_name,
],
capture_output=True,
)
manifest_list = json.loads(proc.stdout)
for manifest_data in manifest_list["manifests"]:
manifest = DockerManifest2(manifest_data)
if manifest.digest in untagged_versions:
logger.info(
f"Skipping deletion of {manifest.digest},"
f" referred to by {full_name}"
f" for {manifest.platform}",
)
del untagged_versions[manifest.digest]
skips += 1
except Exception as err:
self.actually_delete = False
logger.exception(err)
return
logger.info(
f"Skipping deletion of {skips} packages referred to by a manifest",
)
# Delete the untagged and not pointed at packages
logger.info(f"Deleting untagged packages of {self.package_name}")
for to_delete_name in untagged_versions:
to_delete_version = untagged_versions[to_delete_name]
if self.actually_delete:
logger.info(
f"Deleting id {to_delete_version.id} named {to_delete_version.name}",
)
self.package_api.delete_package_version(
to_delete_version,
)
else:
logger.info(
f"Would delete {to_delete_name} (id {to_delete_version.id})",
)
def _clean_untagged_non_manifest():
"""
If the package is not a multi-arch manifest, images without tags are safe to delete.
"""
for package in self.all_package_versions:
if package.untagged:
if self.actually_delete:
logger.info(
f"Deleting id {package.id} named {package.name}",
)
self.package_api.delete_package_version(
package,
)
else:
logger.info(
f"Would delete {package.name} (id {package.id})",
)
else:
logger.info(
f"Not deleting tag {package.tags[0]} of package {self.package_name}",
)
logger.info("Beginning untagged image cleaning")
if is_manifest_image:
_clean_untagged_manifest()
else:
_clean_untagged_non_manifest()
def decide_what_tags_to_keep(self):
"""
This method holds the logic to decide what tags to keep and therefore
what tags to delete.
By default, any image with at least 1 tag will be kept
"""
# By default, keep anything which is tagged
self.tags_to_keep = list(set(self.all_pkgs_tags_to_version.keys()))
class MainImageTagsCleaner(RegistryTagsCleaner):
def decide_what_tags_to_keep(self):
"""
Overrides the default logic for deciding what images to keep. Images tagged with "feature-"
will be removed if the corresponding branch no longer exists.
"""
# Default to everything gets kept still
super().decide_what_tags_to_keep()
# Locate the feature branches
feature_branches = {}
for branch in self.branch_api.get_branches(
repo=self.repo_name,
):
if branch.name.startswith("feature-"):
logger.debug(f"Found feature branch {branch.name}")
feature_branches[branch.name] = branch
logger.info(f"Located {len(feature_branches)} feature branches")
if not len(feature_branches):
# Our work here is done, delete nothing
return
# Filter to packages which are tagged with feature-*
packages_tagged_feature: List[ContainerPackage] = []
for package in self.all_package_versions:
if package.tag_matches("feature-"):
packages_tagged_feature.append(package)
# Map tags like "feature-xyz" to a ContainerPackage
feature_pkgs_tags_to_versions: Dict[str, ContainerPackage] = {}
for pkg in packages_tagged_feature:
for tag in pkg.tags:
feature_pkgs_tags_to_versions[tag] = pkg
logger.info(
f'Located {len(feature_pkgs_tags_to_versions)} versions of package {self.package_name} tagged "feature-"',
)
# All the feature tags minus all the feature branches leaves us feature tags
# with no corresponding branch
self.tags_to_delete = list(
set(feature_pkgs_tags_to_versions.keys()) - set(feature_branches.keys()),
)
# All the tags minus the set of going to be deleted tags leaves us the
# tags which will be kept around
self.tags_to_keep = list(
set(self.all_pkgs_tags_to_version.keys()) - set(self.tags_to_delete),
)
logger.info(
f"Located {len(self.tags_to_delete)} versions of package {self.package_name} to delete",
)
class LibraryTagsCleaner(RegistryTagsCleaner):
"""
Exists for the off chance that someday the installer library images
will need their own logic
"""
pass
def _main():
parser = ArgumentParser(
description="Using the GitHub API locate and optionally delete container"
" tags which no longer have an associated feature branch",
)
# Requires an affirmative command to actually do a delete
parser.add_argument(
"--delete",
action="store_true",
default=False,
help="If provided, actually delete the container tags",
)
# When a tagged image is updated, the previous version remains, but it is no longer tagged
# Add this option to remove them as well
parser.add_argument(
"--untagged",
action="store_true",
default=False,
help="If provided, delete untagged containers as well",
)
# If given, the package is assumed to be a multi-arch manifest. Cache packages are
# not multi-arch, all other types are
parser.add_argument(
"--is-manifest",
action="store_true",
default=False,
help="If provided, the package is assumed to be a multi-arch manifest following schema v2",
)
# Allows configuration of log level for debugging
parser.add_argument(
"--loglevel",
default="info",
help="Configures the logging level",
)
# Get the name of the package being processed this round
parser.add_argument(
"package",
help="The package to process",
)
args = parser.parse_args()
logging.basicConfig(
level=get_log_level(args),
datefmt="%Y-%m-%d %H:%M:%S",
format="%(asctime)s %(levelname)-8s %(message)s",
)
# Must be provided in the environment
repo_owner: Final[str] = os.environ["GITHUB_REPOSITORY_OWNER"]
repo: Final[str] = os.environ["GITHUB_REPOSITORY"]
gh_token: Final[str] = os.environ["TOKEN"]
# Find all branches named feature-*
# Note: Only relevant to the main application, but simpler to
# leave in for all packages
with GithubBranchApi(gh_token) as branch_api:
with GithubContainerRegistryApi(gh_token, repo_owner) as container_api:
if args.package in {"paperless-ngx", "paperless-ngx/builder/cache/app"}:
cleaner = MainImageTagsCleaner(
args.package,
repo_owner,
repo,
container_api,
branch_api,
)
else:
cleaner = LibraryTagsCleaner(
args.package,
repo_owner,
repo,
container_api,
None,
)
# Set if actually doing a delete vs dry run
cleaner.actually_delete = args.delete
# Clean images with tags
cleaner.clean()
# Clean images which are untagged
cleaner.clean_untagged(args.is_manifest)
if __name__ == "__main__":
_main()
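
The `_clean_untagged_manifest` docstring above walks through why per-architecture digests referenced by a kept multi-arch tag must not be deleted. The following is a condensed, hedged sketch of just that filtering step, pulled out of the class for readability; the image name is illustrative and the error handling of the original script is omitted.

```python
# Condensed sketch of the digest-filtering step described in the docstring above:
# for every tag being kept, inspect its manifest list and drop the digests it
# references from the set of apparently untagged versions. Whatever remains is
# both untagged and unreferenced, and therefore safe to delete.
import json
import shutil
import subprocess
from typing import Dict


def filter_referenced_digests(
    tags_to_keep: list,
    untagged_versions: Dict[str, object],
    image: str = "ghcr.io/paperless-ngx/paperless-ngx",  # illustrative image name
) -> Dict[str, object]:
    for tag in sorted(tags_to_keep):
        proc = subprocess.run(
            [shutil.which("docker"), "manifest", "inspect", f"{image}:{tag}"],
            capture_output=True,
        )
        manifest_list = json.loads(proc.stdout)
        for manifest in manifest_list["manifests"]:
            # Each entry points at one architecture's image by sha256 digest
            untagged_versions.pop(manifest["digest"], None)
    return untagged_versions
```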

View File

@@ -1,48 +0,0 @@
#!/usr/bin/env python3
import logging
def get_image_tag(
repo_name: str,
pkg_name: str,
pkg_version: str,
) -> str:
"""
Returns a string representing the normal image for a given package
"""
return f"ghcr.io/{repo_name.lower()}/builder/{pkg_name}:{pkg_version}"
def get_cache_image_tag(
repo_name: str,
pkg_name: str,
pkg_version: str,
branch_name: str,
) -> str:
"""
Returns a string representing the expected image cache tag for a given package.
Registry-type caching is utilized for the builder images, to allow fast
rebuilds, generally almost instant for the same version
"""
return f"ghcr.io/{repo_name.lower()}/builder/cache/{pkg_name}:{pkg_version}"
def get_log_level(args) -> int:
"""
Returns a logging level, based on the provided command line arguments
"""
levels = {
"critical": logging.CRITICAL,
"error": logging.ERROR,
"warn": logging.WARNING,
"warning": logging.WARNING,
"info": logging.INFO,
"debug": logging.DEBUG,
}
level = levels.get(args.loglevel.lower())
if level is None:
level = logging.INFO
return level
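
For reference, the two tag helpers above produce strings of the following shape; the repository name is illustrative, and the qpdf version is the one shown in the .build-config.json earlier in this diff.

```python
# Illustrative only: the shape of the tags the helpers above produce.
from common import get_cache_image_tag, get_image_tag

print(get_image_tag("paperless-ngx/paperless-ngx", "qpdf", "11.1.1"))
# ghcr.io/paperless-ngx/paperless-ngx/builder/qpdf:11.1.1

print(get_cache_image_tag("paperless-ngx/paperless-ngx", "qpdf", "11.1.1", "dev"))
# ghcr.io/paperless-ngx/paperless-ngx/builder/cache/qpdf:11.1.1
```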

View File

@@ -1,92 +0,0 @@
#!/usr/bin/env python3
"""
This is a helper script for the multi-stage Docker image builder.
It provides a single point of configuration for package version control.
The output JSON object is used by the CI workflow to determine what versions
to build and pull into the final Docker image.
Python package information is obtained from the Pipfile.lock. As this is
kept updated by dependabot, it usually will need no further configuration.
The sole exception currently is pikepdf, which has a dependency on qpdf,
and is configured here to use the latest version of qpdf built by the workflow.
Other package version information is configured directly below, generally by
setting the version and Git information, if any.
"""
import argparse
import json
import os
from pathlib import Path
from typing import Final
from common import get_cache_image_tag
from common import get_image_tag
def _main():
parser = argparse.ArgumentParser(
description="Generate a JSON object of information required to build the given package, based on the Pipfile.lock",
)
parser.add_argument(
"package",
help="The name of the package to generate JSON for",
)
PIPFILE_LOCK_PATH: Final[Path] = Path("Pipfile.lock")
BUILD_CONFIG_PATH: Final[Path] = Path(".build-config.json")
# Read the main config file
build_json: Final = json.loads(BUILD_CONFIG_PATH.read_text())
# Read Pipfile.lock file
pipfile_data: Final = json.loads(PIPFILE_LOCK_PATH.read_text())
args: Final = parser.parse_args()
# Read from environment variables set by GitHub Actions
repo_name: Final[str] = os.environ["GITHUB_REPOSITORY"]
branch_name: Final[str] = os.environ["GITHUB_REF_NAME"]
# Default output values
version = None
extra_config = {}
if args.package in pipfile_data["default"]:
# Read the version from Pipfile.lock
pkg_data = pipfile_data["default"][args.package]
pkg_version = pkg_data["version"].split("==")[-1]
version = pkg_version
# Any extra/special values needed
if args.package == "pikepdf":
extra_config["qpdf_version"] = build_json["qpdf"]["version"]
elif args.package in build_json:
version = build_json[args.package]["version"]
else:
raise NotImplementedError(args.package)
# The JSON object we'll output
output = {
"name": args.package,
"version": version,
"image_tag": get_image_tag(repo_name, args.package, version),
"cache_tag": get_cache_image_tag(
repo_name,
args.package,
version,
branch_name,
),
}
# Add anything special a package may need
output.update(extra_config)
# Output the JSON info to stdout
print(json.dumps(output))
if __name__ == "__main__":
_main()
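
As a rough sketch of what the script above emits: the version comes from the pinned "==" specifier in Pipfile.lock, and the printed JSON carries the image and cache tags built by common.py. The pikepdf version below is made up for illustration; the qpdf version is the one from .build-config.json shown earlier in this diff.

```python
# Rough sketch of the version extraction and output shape; values are illustrative.
import json

pipfile_entry = {"version": "==6.0.1"}               # hypothetical Pipfile.lock entry for pikepdf
version = pipfile_entry["version"].split("==")[-1]   # -> "6.0.1"

output = {
    "name": "pikepdf",
    "version": version,
    "image_tag": f"ghcr.io/paperless-ngx/paperless-ngx/builder/pikepdf:{version}",
    "cache_tag": f"ghcr.io/paperless-ngx/paperless-ngx/builder/cache/pikepdf:{version}",
    "qpdf_version": "11.1.1",  # extra key added only for pikepdf, from .build-config.json
}
print(json.dumps(output))
```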

View File

@@ -1,274 +0,0 @@
#!/usr/bin/env python3
"""
This module contains some useful classes for interacting with the Github API.
The full documentation for the API can be found here: https://docs.github.com/en/rest
Mostly, this focuses on two areas, repo branches and repo packages, as the use case
is cleaning up container images which are no longer referred to.
"""
import functools
import logging
import re
import urllib.parse
from typing import Dict
from typing import List
from typing import Optional
import httpx
logger = logging.getLogger("github-api")
class _GithubApiBase:
"""
A base class for interacting with the Github API. It
will handle the session and setting authorization headers.
"""
def __init__(self, token: str) -> None:
self._token = token
self._client: Optional[httpx.Client] = None
def __enter__(self) -> "_GithubApiBase":
"""
Sets up the required headers for auth and response
type from the API
"""
self._client = httpx.Client()
self._client.headers.update(
{
"Accept": "application/vnd.github.v3+json",
"Authorization": f"token {self._token}",
},
)
return self
def __exit__(self, exc_type, exc_val, exc_tb):
"""
Ensures the authorization token is cleaned up no matter
the reason for the exit
"""
if "Accept" in self._client.headers:
del self._client.headers["Accept"]
if "Authorization" in self._client.headers:
del self._client.headers["Authorization"]
# Close the session as well
self._client.close()
self._client = None
def _read_all_pages(self, endpoint):
"""
Helper function to read all pages of an endpoint, following the
"next" link URL until exhausted. Assumes the endpoint returns a list
"""
internal_data = []
while True:
resp = self._client.get(endpoint)
if resp.status_code == 200:
internal_data += resp.json()
if "next" in resp.links:
endpoint = resp.links["next"]["url"]
else:
logger.debug("Exiting pagination loop")
break
else:
logger.warning(f"Request to {endpoint} returned HTTP {resp.status_code}")
resp.raise_for_status()
return internal_data
class _EndpointResponse:
"""
For all endpoint JSON responses, store the full
response data, for ease of extending later, if need be.
"""
def __init__(self, data: Dict) -> None:
self._data = data
class GithubBranch(_EndpointResponse):
"""
Simple wrapper for a repository branch, only extracts name information
for now.
"""
def __init__(self, data: Dict) -> None:
super().__init__(data)
self.name = self._data["name"]
class GithubBranchApi(_GithubApiBase):
"""
Wrapper around branch API.
See https://docs.github.com/en/rest/branches/branches
"""
def __init__(self, token: str) -> None:
super().__init__(token)
self._ENDPOINT = "https://api.github.com/repos/{REPO}/branches"
def get_branches(self, repo: str) -> List[GithubBranch]:
"""
Returns all current branches of the given repository owned by the given
owner or organization.
"""
# The environment GITHUB_REPOSITORY already contains the owner in the correct location
endpoint = self._ENDPOINT.format(REPO=repo)
internal_data = self._read_all_pages(endpoint)
return [GithubBranch(branch) for branch in internal_data]
class ContainerPackage(_EndpointResponse):
"""
Data class wrapping the JSON response from the package related
endpoints
"""
def __init__(self, data: Dict):
super().__init__(data)
# This is a numerical ID, required for interactions with this
# specific package, including deletion of it or restoration
self.id: int = self._data["id"]
# A string name. This might be an actual name or it could be a
# digest string like "sha256:"
self.name: str = self._data["name"]
# URL to the package, including its ID, can be used for deletion
# or restoration without needing to build up a URL ourselves
self.url: str = self._data["url"]
# The list of tags applied to this image. May be an empty list
self.tags: List[str] = self._data["metadata"]["container"]["tags"]
@functools.cached_property
def untagged(self) -> bool:
"""
Returns True if the image has no tags applied to it, False otherwise
"""
return len(self.tags) == 0
@functools.cache
def tag_matches(self, pattern: str) -> bool:
"""
Returns True if the image has at least one tag which matches the given regex,
False otherwise
"""
for tag in self.tags:
if re.match(pattern, tag) is not None:
return True
return False
def __repr__(self):
return f"Package {self.name}"
class GithubContainerRegistryApi(_GithubApiBase):
"""
Class wrapper to deal with the Github packages API. This class only deals with
container type packages, the only type published by paperless-ngx.
"""
def __init__(self, token: str, owner_or_org: str) -> None:
super().__init__(token)
self._owner_or_org = owner_or_org
if self._owner_or_org == "paperless-ngx":
# https://docs.github.com/en/rest/packages#get-all-package-versions-for-a-package-owned-by-an-organization
self._PACKAGES_VERSIONS_ENDPOINT = "https://api.github.com/orgs/{ORG}/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions"
# https://docs.github.com/en/rest/packages#delete-package-version-for-an-organization
self._PACKAGE_VERSION_DELETE_ENDPOINT = "https://api.github.com/orgs/{ORG}/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions/{PACKAGE_VERSION_ID}"
else:
# https://docs.github.com/en/rest/packages#get-all-package-versions-for-a-package-owned-by-the-authenticated-user
self._PACKAGES_VERSIONS_ENDPOINT = "https://api.github.com/user/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions"
# https://docs.github.com/en/rest/packages#delete-a-package-version-for-the-authenticated-user
self._PACKAGE_VERSION_DELETE_ENDPOINT = "https://api.github.com/user/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions/{PACKAGE_VERSION_ID}"
self._PACKAGE_VERSION_RESTORE_ENDPOINT = (
f"{self._PACKAGE_VERSION_DELETE_ENDPOINT}/restore"
)
def get_active_package_versions(
self,
package_name: str,
) -> List[ContainerPackage]:
"""
Returns all the versions of a given package (container images) from
the API
"""
package_type: str = "container"
# Need to quote this for slashes in the name
package_name = urllib.parse.quote(package_name, safe="")
endpoint = self._PACKAGES_VERSIONS_ENDPOINT.format(
ORG=self._owner_or_org,
PACKAGE_TYPE=package_type,
PACKAGE_NAME=package_name,
)
pkgs = []
for data in self._read_all_pages(endpoint):
pkgs.append(ContainerPackage(data))
return pkgs
def get_deleted_package_versions(
self,
package_name: str,
) -> List[ContainerPackage]:
package_type: str = "container"
# Need to quote this for slashes in the name
package_name = urllib.parse.quote(package_name, safe="")
endpoint = (
self._PACKAGES_VERSIONS_ENDPOINT.format(
ORG=self._owner_or_org,
PACKAGE_TYPE=package_type,
PACKAGE_NAME=package_name,
)
+ "?state=deleted"
)
pkgs = []
for data in self._read_all_pages(endpoint):
pkgs.append(ContainerPackage(data))
return pkgs
def delete_package_version(self, package_data: ContainerPackage):
"""
Deletes the given package version from the GHCR
"""
resp = self._client.delete(package_data.url)
if resp.status_code != 204:
logger.warning(
f"Request to delete {package_data.url} returned HTTP {resp.status_code}",
)
def restore_package_version(
self,
package_name: str,
package_data: ContainerPackage,
):
package_type: str = "container"
endpoint = self._PACKAGE_VERSION_RESTORE_ENDPOINT.format(
ORG=self._owner_or_org,
PACKAGE_TYPE=package_type,
PACKAGE_NAME=package_name,
PACKAGE_VERSION_ID=package_data.id,
)
resp = self._client.post(endpoint)
if resp.status_code != 204:
logger.warning(
f"Request to restore {endpoint} returned HTTP {resp.status_code}",
)
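
`_read_all_pages` above follows GitHub's Link-header pagination via the `links` mapping that httpx parses on each response. A minimal standalone sketch of the same loop, without the class plumbing:

```python
# Standalone sketch of the pagination loop used by _read_all_pages above.
import httpx


def read_all_pages(client: httpx.Client, endpoint: str) -> list:
    items = []
    while True:
        resp = client.get(endpoint)
        resp.raise_for_status()
        items += resp.json()
        # httpx parses the Link header into resp.links; follow "next" until it is gone
        if "next" not in resp.links:
            return items
        endpoint = resp.links["next"]["url"]
```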

.github/stale.yml
View File

@@ -1,23 +0,0 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 30
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7
# Only issues or pull requests with all of these labels are checked if stale. Defaults to `[]` (disabled)
onlyLabels: [cant-reproduce]
# Label to use when marking an issue as stale
staleLabel: stale
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false
# See https://github.com/marketplace/stale for more info on the app
# and https://github.com/probot/stale for the configuration docs

View File

@@ -1,550 +0,0 @@
name: ci
on:
push:
tags:
# https://semver.org/#spec-item-2
- 'v[0-9]+.[0-9]+.[0-9]+'
# https://semver.org/#spec-item-9
- 'v[0-9]+.[0-9]+.[0-9]+-beta.rc[0-9]+'
branches-ignore:
- 'translations**'
pull_request:
branches-ignore:
- 'translations**'
jobs:
pre-commit:
name: Linting Checks
runs-on: ubuntu-latest
steps:
-
name: Checkout repository
uses: actions/checkout@v3
-
name: Install tools
uses: actions/setup-python@v4
with:
python-version: "3.9"
-
name: Check files
uses: pre-commit/action@v3.0.0
documentation:
name: "Build Documentation"
runs-on: ubuntu-20.04
needs:
- pre-commit
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Install pipenv
run: |
pipx install pipenv==2022.10.12
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.9
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Install dependencies
run: |
pipenv sync --dev
-
name: List installed Python dependencies
run: |
pipenv run pip list
-
name: Make documentation
run: |
cd docs/
pipenv run make html
-
name: Upload artifact
uses: actions/upload-artifact@v3
with:
name: documentation
path: docs/_build/html/
tests-backend:
name: "Tests (${{ matrix.python-version }})"
runs-on: ubuntu-20.04
needs:
- pre-commit
strategy:
matrix:
python-version: ['3.8', '3.9', '3.10']
fail-fast: false
services:
tika:
image: ghcr.io/paperless-ngx/tika:latest
ports:
- "9998:9998/tcp"
gotenberg:
image: docker.io/gotenberg/gotenberg:7.6
ports:
- "3000:3000/tcp"
env:
# Enable Tika end to end testing
TIKA_LIVE: 1
# Enable paperless_mail testing against real server
PAPERLESS_MAIL_TEST_HOST: ${{ secrets.TEST_MAIL_HOST }}
PAPERLESS_MAIL_TEST_USER: ${{ secrets.TEST_MAIL_USER }}
PAPERLESS_MAIL_TEST_PASSWD: ${{ secrets.TEST_MAIL_PASSWD }}
steps:
-
name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0
-
name: Install pipenv
run: |
pipx install pipenv==2022.10.12
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "${{ matrix.python-version }}"
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Install system dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends unpaper tesseract-ocr imagemagick ghostscript libzbar0 poppler-utils
-
name: Install Python dependencies
run: |
pipenv sync --dev
-
name: List installed Python dependencies
run: |
pipenv run pip list
-
name: Tests
run: |
cd src/
pipenv run pytest -rfEp
-
name: Get changed files
id: changed-files-specific
uses: tj-actions/changed-files@v34
with:
files: |
src/**
-
name: List all changed files
run: |
for file in ${{ steps.changed-files-specific.outputs.all_changed_files }}; do
echo "${file} was changed"
done
-
name: Publish coverage results
if: matrix.python-version == '3.9' && steps.changed-files-specific.outputs.any_changed == 'true'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# https://github.com/coveralls-clients/coveralls-python/issues/251
run: |
cd src/
pipenv run coveralls --service=github
tests-frontend:
name: "Tests Frontend"
runs-on: ubuntu-20.04
needs:
- pre-commit
strategy:
matrix:
node-version: [16.x]
steps:
- uses: actions/checkout@v3
-
name: Use Node.js ${{ matrix.node-version }}
uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
- run: cd src-ui && npm ci
- run: cd src-ui && npm run test
- run: cd src-ui && npm run e2e:ci
prepare-docker-build:
name: Prepare Docker Pipeline Data
if: github.event_name == 'push' && (startsWith(github.ref, 'refs/heads/feature-') || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/beta' || contains(github.ref, 'beta.rc') || startsWith(github.ref, 'refs/tags/v'))
runs-on: ubuntu-20.04
# If the push triggered the installer library workflow, wait for it to
# complete here. This ensures the required versions for the final
# image have been built, while not waiting at all if the versions haven't changed
concurrency:
group: build-installer-library
cancel-in-progress: false
needs:
- documentation
- tests-backend
- tests-frontend
steps:
-
name: Set ghcr repository name
id: set-ghcr-repository
run: |
ghcr_name=$(echo "${GITHUB_REPOSITORY}" | awk '{ print tolower($0) }')
echo "repository=${ghcr_name}" >> $GITHUB_OUTPUT
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
-
name: Setup qpdf image
id: qpdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py qpdf)
echo ${build_json}
echo "qpdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup psycopg2 image
id: psycopg2-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py psycopg2)
echo ${build_json}
echo "psycopg2-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup pikepdf image
id: pikepdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py pikepdf)
echo ${build_json}
echo "pikepdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup jbig2enc image
id: jbig2enc-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py jbig2enc)
echo ${build_json}
echo "jbig2enc-json=${build_json}" >> $GITHUB_OUTPUT
outputs:
ghcr-repository: ${{ steps.set-ghcr-repository.outputs.repository }}
qpdf-json: ${{ steps.qpdf-setup.outputs.qpdf-json }}
pikepdf-json: ${{ steps.pikepdf-setup.outputs.pikepdf-json }}
psycopg2-json: ${{ steps.psycopg2-setup.outputs.psycopg2-json }}
jbig2enc-json: ${{ steps.jbig2enc-setup.outputs.jbig2enc-json}}
# build and push image to docker hub.
build-docker-image:
runs-on: ubuntu-20.04
concurrency:
group: ${{ github.workflow }}-build-docker-image-${{ github.ref_name }}
cancel-in-progress: true
needs:
- prepare-docker-build
steps:
-
name: Check pushing to Docker Hub
id: docker-hub
# Only push to Dockerhub from the main repo AND the ref is either:
# main
# dev
# beta
# a tag
# Otherwise forks would require a Docker Hub account and secrets setup
run: |
if [[ ${{ needs.prepare-docker-build.outputs.ghcr-repository }} == "paperless-ngx/paperless-ngx" && ( ${{ github.ref_name }} == "main" || ${{ github.ref_name }} == "dev" || ${{ github.ref_name }} == "beta" || ${{ startsWith(github.ref, 'refs/tags/v') }} == "true" ) ]] ; then
echo "Enabling DockerHub image push"
echo "enable=true" >> $GITHUB_OUTPUT
else
echo "Not pushing to DockerHub"
echo "enable=false" >> $GITHUB_OUTPUT
fi
-
name: Gather Docker metadata
id: docker-meta
uses: docker/metadata-action@v4
with:
images: |
ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}
name=paperlessngx/paperless-ngx,enable=${{ steps.docker-hub.outputs.enable }}
tags: |
# Tag branches with branch name
type=ref,event=branch
# Process semver tags
# For a tag x.y.z or vX.Y.Z, output an x.y.z and x.y image tag
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
-
name: Set up QEMU
uses: docker/setup-qemu-action@v2
-
name: Login to Github Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Login to Docker Hub
uses: docker/login-action@v2
# Don't attempt to log in if not pushing to Docker Hub
if: steps.docker-hub.outputs.enable == 'true'
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Build and push
uses: docker/build-push-action@v3
with:
context: .
file: ./Dockerfile
platforms: linux/amd64,linux/arm/v7,linux/arm64
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.docker-meta.outputs.tags }}
labels: ${{ steps.docker-meta.outputs.labels }}
build-args: |
JBIG2ENC_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).version }}
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
PIKEPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).version }}
PSYCOPG2_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).version }}
# Get cache layers from this branch, then dev, then main
# This allows new branches to get at least some cache benefits, generally from dev
cache-from: |
type=registry,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:${{ github.ref_name }}
type=registry,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:dev
type=registry,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:main
cache-to: |
type=registry,mode=max,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:${{ github.ref_name }}
-
name: Inspect image
run: |
docker buildx imagetools inspect ${{ fromJSON(steps.docker-meta.outputs.json).tags[0] }}
-
name: Export frontend artifact from docker
run: |
docker create --name frontend-extract ${{ fromJSON(steps.docker-meta.outputs.json).tags[0] }}
docker cp frontend-extract:/usr/src/paperless/src/documents/static/frontend src/documents/static/frontend/
-
name: Upload frontend artifact
uses: actions/upload-artifact@v3
with:
name: frontend-compiled
path: src/documents/static/frontend/
build-release:
needs:
- build-docker-image
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Install pipenv
run: |
pip3 install --upgrade pip setuptools wheel pipx
pipx install pipenv
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.9
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Install Python dependencies
run: |
pipenv sync --dev
-
name: Install system dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends gettext liblept5
-
name: Download frontend artifact
uses: actions/download-artifact@v3
with:
name: frontend-compiled
path: src/documents/static/frontend/
-
name: Download documentation artifact
uses: actions/download-artifact@v3
with:
name: documentation
path: docs/_build/html/
-
name: Generate requirements file
run: |
pipenv requirements > requirements.txt
-
name: Compile messages
run: |
cd src/
pipenv run python3 manage.py compilemessages
-
name: Collect static files
run: |
cd src/
pipenv run python3 manage.py collectstatic --no-input
-
name: Move files
run: |
mkdir dist
mkdir dist/paperless-ngx
mkdir dist/paperless-ngx/scripts
cp .dockerignore .env Dockerfile Pipfile Pipfile.lock requirements.txt LICENSE README.md dist/paperless-ngx/
cp paperless.conf.example dist/paperless-ngx/paperless.conf
cp gunicorn.conf.py dist/paperless-ngx/gunicorn.conf.py
cp -r docker/ dist/paperless-ngx/docker
cp scripts/*.service scripts/*.sh dist/paperless-ngx/scripts/
cp -r src/ dist/paperless-ngx/src
cp -r docs/_build/html/ dist/paperless-ngx/docs
mv static dist/paperless-ngx
-
name: Make release package
run: |
cd dist
tar -cJf paperless-ngx.tar.xz paperless-ngx/
-
name: Upload release artifact
uses: actions/upload-artifact@v3
with:
name: release
path: dist/paperless-ngx.tar.xz
publish-release:
runs-on: ubuntu-20.04
outputs:
prerelease: ${{ steps.get_version.outputs.prerelease }}
changelog: ${{ steps.create-release.outputs.body }}
version: ${{ steps.get_version.outputs.version }}
needs:
- build-release
if: github.ref_type == 'tag' && (startsWith(github.ref_name, 'v') || contains(github.ref_name, '-beta.rc'))
steps:
-
name: Download release artifact
uses: actions/download-artifact@v3
with:
name: release
path: ./
-
name: Get version
id: get_version
run: |
echo "version=${{ github.ref_name }}" >> $GITHUB_OUTPUT
if [[ ${{ contains(github.ref_name, '-beta.rc') }} == 'true' ]]; then
echo "prerelease=true" >> $GITHUB_OUTPUT
else
echo "prerelease=false" >> $GITHUB_OUTPUT
fi
-
name: Create Release and Changelog
id: create-release
uses: paperless-ngx/release-drafter@master
with:
name: Paperless-ngx ${{ steps.get_version.outputs.version }}
tag: ${{ steps.get_version.outputs.version }}
version: ${{ steps.get_version.outputs.version }}
prerelease: ${{ steps.get_version.outputs.prerelease }}
publish: true # ensures release is not marked as draft
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-
name: Upload release archive
id: upload-release-asset
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ steps.create-release.outputs.upload_url }}
asset_path: ./paperless-ngx.tar.xz
asset_name: paperless-ngx-${{ steps.get_version.outputs.version }}.tar.xz
asset_content_type: application/x-xz
append-changelog:
runs-on: ubuntu-20.04
needs:
- publish-release
if: needs.publish-release.outputs.prerelease == 'false'
steps:
-
name: Checkout
uses: actions/checkout@v3
with:
ref: main
-
name: Install pipenv
run: |
pip3 install --upgrade pip setuptools wheel pipx
pipx install pipenv
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.9
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Append Changelog to docs
id: append-Changelog
working-directory: docs
run: |
git branch ${{ needs.publish-release.outputs.version }}-changelog
git checkout ${{ needs.publish-release.outputs.version }}-changelog
echo -e "# Changelog\n\n${{ needs.publish-release.outputs.changelog }}\n" > changelog-new.md
echo "Manually linking usernames"
sed -i -r 's|@(.+?) \(\[#|[@\1](https://github.com/\1) ([#|ig' changelog-new.md
CURRENT_CHANGELOG=`tail --lines +2 changelog.md`
echo -e "$CURRENT_CHANGELOG" >> changelog-new.md
mv changelog-new.md changelog.md
pipenv run pre-commit run --files changelog.md
git config --global user.name "github-actions"
git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"
git commit -am "Changelog ${{ needs.publish-release.outputs.version }} - GHA"
git push origin ${{ needs.publish-release.outputs.version }}-changelog
-
name: Create Pull Request
uses: actions/github-script@v6
with:
script: |
const { repo, owner } = context.repo;
const result = await github.rest.pulls.create({
title: '[Documentation] Add ${{ needs.publish-release.outputs.version }} changelog',
owner,
repo,
head: '${{ needs.publish-release.outputs.version }}-changelog',
base: 'main',
body: 'This PR is auto-generated by CI.'
});
github.rest.issues.addLabels({
owner,
repo,
issue_number: result.data.number,
labels: ['documentation']
});
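
The "Append Changelog to docs" step above uses a fairly dense sed expression to turn bare @username mentions into Markdown profile links. Below is a hedged Python equivalent, assuming changelog lines follow the '- $TITLE @$AUTHOR ([#$NUMBER]($URL))' change-template from the release-drafter configuration earlier in this diff; the sample line is made up.

```python
# Hedged Python equivalent of the sed username-linking step above; the sample
# changelog line is made up and follows the release-drafter change-template.
import re

line = "- Some change @some-user ([#172](https://github.com/paperless-ngx/paperless-ngx/pull/172))"
linked = re.sub(
    r"@(\S+) \(\[#",
    r"[@\1](https://github.com/\1) ([#",
    line,
)
# -> "- Some change [@some-user](https://github.com/some-user) ([#172](...))"
```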

View File

@@ -1,95 +0,0 @@
# This workflow runs on certain conditions to check for and potentially
# delete container images from the GHCR which no longer have an associated
# code branch.
# Requires a PAT with the correct scope set in the secrets
name: Cleanup Image Tags
on:
schedule:
- cron: '0 0 * * SAT'
delete:
pull_request:
types:
- closed
push:
paths:
- ".github/workflows/cleanup-tags.yml"
- ".github/scripts/cleanup-tags.py"
- ".github/scripts/github.py"
- ".github/scripts/common.py"
concurrency:
group: registry-tags-cleanup
cancel-in-progress: false
jobs:
cleanup-images:
name: Cleanup Image Tags for ${{ matrix.primary-name }}
runs-on: ubuntu-latest
strategy:
matrix:
include:
- primary-name: "paperless-ngx"
cache-name: "paperless-ngx/builder/cache/app"
- primary-name: "paperless-ngx/builder/qpdf"
cache-name: "paperless-ngx/builder/cache/qpdf"
- primary-name: "paperless-ngx/builder/pikepdf"
cache-name: "paperless-ngx/builder/cache/pikepdf"
- primary-name: "paperless-ngx/builder/jbig2enc"
cache-name: "paperless-ngx/builder/cache/jbig2enc"
- primary-name: "paperless-ngx/builder/psycopg2"
cache-name: "paperless-ngx/builder/cache/psycopg2"
env:
# Requires a personal access token with the OAuth scope delete:packages
TOKEN: ${{ secrets.GHA_CONTAINER_DELETE_TOKEN }}
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Login to Github Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
-
name: Install httpx
run: |
python -m pip install httpx
#
# Clean up primary package
#
-
name: Cleanup for package "${{ matrix.primary-name }}"
if: "${{ env.TOKEN != '' }}"
run: |
python ${GITHUB_WORKSPACE}/.github/scripts/cleanup-tags.py --untagged --is-manifest --delete "${{ matrix.primary-name }}"
#
# Clean up registry cache package
#
-
name: Cleanup for package "${{ matrix.cache-name }}"
if: "${{ env.TOKEN != '' }}"
run: |
python ${GITHUB_WORKSPACE}/.github/scripts/cleanup-tags.py --untagged --delete "${{ matrix.cache-name }}"
#
# Verify tags which are left still pull
#
-
name: Check all tags still pull
run: |
ghcr_name=$(echo "ghcr.io/${GITHUB_REPOSITORY_OWNER}/${{ matrix.primary-name }}" | awk '{ print tolower($0) }')
echo "Pulling all tags of ${ghcr_name}"
docker pull --quiet --all-tags ${ghcr_name}
docker image list

View File

@@ -1,54 +0,0 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"
on:
push:
branches: [ main, dev ]
pull_request:
# The branches below must be a subset of the branches above
branches: [ dev ]
schedule:
- cron: '28 13 * * 5'
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: [ 'javascript', 'python' ]
# CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
# Learn more about CodeQL language support at https://git.io/codeql-language-support
steps:
- name: Checkout repository
uses: actions/checkout@v3
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2

View File

@@ -1,170 +0,0 @@
# This workflow will run to update the installer library of
# Docker images. These are the images which provide updated wheels,
# .deb installation packages, or maybe just some compiled library
name: Build Image Library
on:
push:
# Must match one of these branches AND one of the paths
# to be triggered
branches:
- "main"
- "dev"
- "library-*"
- "feature-*"
paths:
# Trigger the workflow if a Dockerfile changed
- "docker-builders/**"
# Trigger if a package was updated
- ".build-config.json"
- "Pipfile.lock"
# Also trigger on workflow changes related to the library
- ".github/workflows/installer-library.yml"
- ".github/workflows/reusable-workflow-builder.yml"
- ".github/scripts/**"
# Set a workflow level concurrency group so primary workflow
# can wait for this to complete if needed
# DO NOT CHANGE without updating main workflow group
concurrency:
group: build-installer-library
cancel-in-progress: false
jobs:
prepare-docker-build:
name: Prepare Docker Image Version Data
runs-on: ubuntu-20.04
steps:
-
name: Set ghcr repository name
id: set-ghcr-repository
run: |
ghcr_name=$(echo "${GITHUB_REPOSITORY}" | awk '{ print tolower($0) }')
echo "repository=${ghcr_name}" >> $GITHUB_OUTPUT
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
-
name: Install jq
run: |
sudo apt-get update
sudo apt-get install jq
-
name: Setup qpdf image
id: qpdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py qpdf)
echo ${build_json}
echo "qpdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup psycopg2 image
id: psycopg2-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py psycopg2)
echo ${build_json}
echo "psycopg2-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup pikepdf image
id: pikepdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py pikepdf)
echo ${build_json}
echo "pikepdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup jbig2enc image
id: jbig2enc-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py jbig2enc)
echo ${build_json}
echo "jbig2enc-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup other versions
id: cache-bust-setup
run: |
pillow_version=$(jq ".default.pillow.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
lxml_version=$(jq ".default.lxml.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
echo "Pillow is ${pillow_version}"
echo "lxml is ${lxml_version}"
echo "pillow-version=${pillow_version}" >> $GITHUB_OUTPUT
echo "lxml-version=${lxml_version}" >> $GITHUB_OUTPUT
outputs:
ghcr-repository: ${{ steps.set-ghcr-repository.outputs.repository }}
qpdf-json: ${{ steps.qpdf-setup.outputs.qpdf-json }}
pikepdf-json: ${{ steps.pikepdf-setup.outputs.pikepdf-json }}
psycopg2-json: ${{ steps.psycopg2-setup.outputs.psycopg2-json }}
jbig2enc-json: ${{ steps.jbig2enc-setup.outputs.jbig2enc-json }}
pillow-version: ${{ steps.cache-bust-setup.outputs.pillow-version }}
lxml-version: ${{ steps.cache-bust-setup.outputs.lxml-version }}
build-qpdf-debs:
name: qpdf
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.qpdf
build-json: ${{ needs.prepare-docker-build.outputs.qpdf-json }}
build-args: |
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
build-jbig2enc:
name: jbig2enc
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.jbig2enc
build-json: ${{ needs.prepare-docker-build.outputs.jbig2enc-json }}
build-args: |
JBIG2ENC_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).version }}
build-psycopg2-wheel:
name: psycopg2
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.psycopg2
build-json: ${{ needs.prepare-docker-build.outputs.psycopg2-json }}
build-args: |
PSYCOPG2_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).version }}
build-pikepdf-wheel:
name: pikepdf
needs:
- prepare-docker-build
- build-qpdf-debs
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.pikepdf
build-json: ${{ needs.prepare-docker-build.outputs.pikepdf-json }}
build-args: |
REPO=${{ needs.prepare-docker-build.outputs.ghcr-repository }}
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
PIKEPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).version }}
PILLOW_VERSION=${{ needs.prepare-docker-build.outputs.pillow-version }}
LXML_VERSION=${{ needs.prepare-docker-build.outputs.lxml-version }}
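
The "Setup other versions" step above shells out to jq and sed to read the pinned Pillow and lxml versions for cache busting. A hedged Python equivalent of that extraction, assuming the same Pipfile.lock layout the jq filters rely on; the "==9.3.0" value in the comment is only an example.

```python
# Hedged Python equivalent of the jq/sed cache-bust step above, assuming the
# standard Pipfile.lock layout the jq filters rely on.
import json
from pathlib import Path

lock = json.loads(Path("Pipfile.lock").read_text())
pillow_version = lock["default"]["pillow"]["version"].lstrip("=")  # e.g. "==9.3.0" -> "9.3.0"
lxml_version = lock["default"]["lxml"]["version"].lstrip("=")
print(f"Pillow is {pillow_version}")
print(f"lxml is {lxml_version}")
```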

View File

@@ -1,57 +0,0 @@
name: Project Automations
on:
issues:
types:
- opened
- reopened
pull_request_target: #_target allows access to secrets
types:
- opened
- reopened
branches:
- main
- dev
permissions:
contents: read
env:
todo: Todo
done: Done
in_progress: In Progress
jobs:
issue_opened_or_reopened:
name: issue_opened_or_reopened
runs-on: ubuntu-latest
if: github.event_name == 'issues' && (github.event.action == 'opened' || github.event.action == 'reopened')
steps:
- name: Add issue to project and set status to ${{ env.todo }}
uses: leonsteinhaeuser/project-beta-automations@v2.0.1
with:
gh_token: ${{ secrets.GH_TOKEN }}
organization: paperless-ngx
project_id: 2
resource_node_id: ${{ github.event.issue.node_id }}
status_value: ${{ env.todo }} # Target status
pr_opened_or_reopened:
name: pr_opened_or_reopened
runs-on: ubuntu-latest
permissions:
# write permission is required for autolabeler
pull-requests: write
if: github.event_name == 'pull_request_target' && (github.event.action == 'opened' || github.event.action == 'reopened') && github.event.pull_request.user.login != 'dependabot'
steps:
- name: Add PR to project and set status to "Needs Review"
uses: leonsteinhaeuser/project-beta-automations@v2.0.1
with:
gh_token: ${{ secrets.GH_TOKEN }}
organization: paperless-ngx
project_id: 2
resource_node_id: ${{ github.event.pull_request.node_id }}
status_value: "Needs Review" # Target status
- name: Label PR with release-drafter
uses: release-drafter/release-drafter@v5
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -1,53 +0,0 @@
name: Reusable Image Builder
on:
workflow_call:
inputs:
dockerfile:
required: true
type: string
build-json:
required: true
type: string
build-args:
required: false
default: ""
type: string
concurrency:
group: ${{ github.workflow }}-${{ fromJSON(inputs.build-json).name }}-${{ fromJSON(inputs.build-json).version }}
cancel-in-progress: false
jobs:
build-image:
name: Build ${{ fromJSON(inputs.build-json).name }} @ ${{ fromJSON(inputs.build-json).version }}
runs-on: ubuntu-latest
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Login to Github Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
-
name: Set up QEMU
uses: docker/setup-qemu-action@v2
-
name: Build ${{ fromJSON(inputs.build-json).name }}
uses: docker/build-push-action@v3
with:
context: .
file: ${{ inputs.dockerfile }}
tags: ${{ fromJSON(inputs.build-json).image_tag }}
platforms: linux/amd64,linux/arm64,linux/arm/v7
build-args: ${{ inputs.build-args }}
push: true
cache-from: type=registry,ref=${{ fromJSON(inputs.build-json).cache_tag }}
cache-to: type=registry,mode=max,ref=${{ fromJSON(inputs.build-json).cache_tag }}

39
.gitignore vendored
View File

@@ -42,7 +42,6 @@ htmlcov/
nosetests.xml
coverage.xml
*,cover
.pytest_cache
# Translations
*.mo
@@ -57,42 +56,24 @@ docs/_build/
# PyBuilder
target/
# Stored PDFs
media/documents/*.gpg
media/documents/thumbnails/*.gpg
media/documents/originals/*.gpg
# Sqlite database
db.sqlite3
# PyCharm
.idea
# VS Code
.vscode
/src-ui/.vscode
/docs/.vscode
# Other stuff that doesn't belong
.virtualenv
virtualenv
/venv
.venv/
/docker-compose.env
/docker-compose.yml
.vagrant
docker-compose.yml
docker-compose.env
# Used for development
scripts/import-for-development
scripts/nuke
# Static files collected by the collectstatic command
/static/
# Stored PDFs
/media/
/data/
/paperless.conf
/consume/
/export/
# this is where the compiled frontend is moved to.
/src/documents/static/frontend/
# mac os
.DS_Store
# celery schedule file
celerybeat-schedule*

View File

@@ -1,8 +0,0 @@
failure-threshold: warning
ignored:
# https://github.com/hadolint/hadolint/wiki/DL3008
- DL3008
# https://github.com/hadolint/hadolint/wiki/DL3013
- DL3013
# https://github.com/hadolint/hadolint/wiki/DL3003
- DL3003

View File

@@ -1,87 +0,0 @@
# This file configures pre-commit hooks.
# See https://pre-commit.com/ for general information
# See https://pre-commit.com/hooks.html for a listing of possible hooks
repos:
# General hooks
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
hooks:
- id: check-docstring-first
- id: check-json
exclude: "tsconfig.*json"
- id: check-yaml
- id: check-toml
- id: check-executables-have-shebangs
- id: end-of-file-fixer
exclude_types:
- svg
- pofile
exclude: "(^LICENSE$)"
- id: mixed-line-ending
args:
- "--fix=lf"
- id: trailing-whitespace
exclude_types:
- svg
- id: check-case-conflict
- id: detect-private-key
- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v2.7.1"
hooks:
- id: prettier
types_or:
- javascript
- ts
- markdown
exclude: "(^Pipfile\\.lock$)"
# Python hooks
- repo: https://github.com/asottile/reorder_python_imports
rev: v3.9.0
hooks:
- id: reorder-python-imports
exclude: "(migrations)"
- repo: https://github.com/asottile/yesqa
rev: "v1.4.0"
hooks:
- id: yesqa
exclude: "(migrations)"
- repo: https://github.com/asottile/add-trailing-comma
rev: "v2.3.0"
hooks:
- id: add-trailing-comma
exclude: "(migrations)"
- repo: https://github.com/PyCQA/flake8
rev: 5.0.4
hooks:
- id: flake8
files: ^src/
args:
- "--config=./src/setup.cfg"
- repo: https://github.com/psf/black
rev: 22.10.0
hooks:
- id: black
- repo: https://github.com/asottile/pyupgrade
rev: v3.2.2
hooks:
- id: pyupgrade
exclude: "(migrations)"
args:
- "--py38-plus"
# Dockerfile hooks
- repo: https://github.com/AleksaC/hadolint-py
rev: v2.10.0
hooks:
- id: hadolint
# Shell script hooks
- repo: https://github.com/lovesegfault/beautysh
rev: v6.2.1
hooks:
- id: beautysh
args:
- "--tab"
- repo: https://github.com/shellcheck-py/shellcheck-py
rev: "v0.8.0.4"
hooks:
- id: shellcheck

View File

@@ -1,4 +0,0 @@
# https://prettier.io/docs/en/options.html#semicolons
semi: false
# https://prettier.io/docs/en/options.html#quotes
singleQuote: true

View File

@@ -1,16 +0,0 @@
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py
# Optionally set the version of Python and requirements required to build your docs
python:
version: "3.8"
install:
- requirements: docs/requirements.txt

18
.travis.yml Normal file
View File

@@ -0,0 +1,18 @@
language: python
sudo: false
matrix:
include:
- python: 3.4
env: TOXENV=py34
- python: 3.5
env: TOXENV=py35
- python: 3.5
env: TOXENV=pep8
install:
- pip install --requirement requirements.txt
- pip install tox
script: tox -c src/tox.ini

View File

@@ -1,9 +0,0 @@
/.github/workflows/ @paperless-ngx/ci-cd
/docker/ @paperless-ngx/ci-cd
/scripts/ @paperless-ngx/ci-cd
/src-ui/ @paperless-ngx/frontend
/src/ @paperless-ngx/backend
Pipfile* @paperless-ngx/backend
*.py @paperless-ngx/backend

View File

@@ -1,128 +0,0 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
- Demonstrating empathy and kindness toward other people
- Being respectful of differing opinions, viewpoints, and experiences
- Giving and gracefully accepting constructive feedback
- Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
- Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
- The use of sexualized language or imagery, and sexual attention or
advances of any kind
- Trolling, insulting or derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or email
address, without their explicit permission
- Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
hello@paperless-ngx.com.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.

View File

@@ -1,132 +0,0 @@
# Contributing
If you feel like contributing to the project, please do! Bug fixes and improvements are always welcome.
If you want to implement something big:
- Please start a discussion about that in the issues! Maybe something similar is already in development and we can make it happen together.
- When making additions to the project, consider if the majority of users will benefit from your change. If not, you're probably better off forking the project.
- Also consider if your change will get in the way of other users. A good change is a change that enhances the experience of some users who want that change and does not affect users who do not care about the change.
- Please see the [paperless-ngx merge process](#merging-prs) below.
## Python
Paperless supports Python 3.8 and 3.9. We format Python code with [Black](https://github.com/psf/black).
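As a rough sketch, formatting can be checked locally before opening a PR (this assumes the dev tooling from the Pipfile and the pre-commit configuration is installed):

```bash
# Install the repository's pre-commit hooks once
pre-commit install

# Run Black and the other configured hooks against the entire tree
pre-commit run --all-files
```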
## Branches
`main` always reflects the latest release. Apart from changes to the documentation or readme, absolutely no functional changes on this branch in between releases.
`dev` contains all changes that will be part of the next release. Use this branch to start making your changes.
`feature-X` branches are for experimental stuff that will eventually be merged into dev.
## Testing:
Please format and test your code! I know it's a hassle, but it makes sure that your code works now and will allow us to detect regressions easily.
To test your code, execute `pytest` in the src/ directory. This also generates an HTML coverage report, which you can use to see if you missed anything important during testing.
Before you can run `pytest`, ensure to [properly set up your local environment](https://paperless-ngx.readthedocs.io/en/latest/extending.html#initial-setup-and-first-start).
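As an example, a typical local run might look like this (the `htmlcov/` path is an assumption based on the usual pytest-cov HTML report location):

```bash
cd src
pytest                        # runs the test suite and writes the coverage report
xdg-open htmlcov/index.html   # inspect the HTML coverage report (use `open` on macOS)
```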
## More info:
... is available in the documentation. https://paperless-ngx.readthedocs.io/en/latest/extending.html
# Merging PRs
Once you have submitted a **P**ull **R**equest it will be reviewed, approved, and merged by one or more community members of any team. Automated code tests and formatting checks must be passed.
## Non-Trivial Requests
PRs deemed `non-trivial` will go through a stricter review process before being merged into `dev`. This is to ensure code quality and complete functionality (free of side effects).
Examples of `non-trivial` PRs might include:
- Additional features
- Large changes to many distinct files
- Breaking or deprecation of existing features
Our community review process for `non-trivial` PRs is the following:
1. Must pass usual automated code tests and formatting checks.
2. The PR will be assigned and pinged to the appropriately experienced team (i.e. @paperless-ngx/backend for backend changes).
3. Development team will check and test code manually (possibly over several days).
- You may be asked to make changes or rebase.
- The team may ask for additional testing done by @paperless-ngx/test
4. **At least two** members of the team will approve and finally merge the request into `dev` 🎉.
This process might be slow as community members have different schedules and time to dedicate to the Paperless project. However it ensures community code reviews are as brilliantly thorough as they once were with @jonaswinkler.
# Translating Paperless-ngx
Some notes about translation:
- There are two resources:
- `src-ui/messages.xlf` contains the translation strings for the front end. This is the most important.
- `django.po` contains strings for the administration section of paperless, which is nice to have translated.
- Most of the front-end strings are used on buttons, menu items, etc., so ideally the translated string should not be much longer than the English original.
- Translation units may contain placeholders. These usually mean that there's a name of a tag or document or something in the string. You can click on the placeholders to copy them.
- Translation units may contain plural expressions such as `{PLURAL_VAR, plural, =1 {one result} =0 {no results} other {<placeholder> results}}`. Copy these verbatim and translate only the content in the inner `{}` brackets. Example: `{PLURAL_VAR, plural, =1 {Ein Ergebnis} =0 {Keine Ergebnisse} other {<placeholder> Ergebnisse}}`
- Changes to translations on Crowdin will get pushed into the repository automatically.
## Adding new languages to the codebase
If a language has already been added, and you would like to contribute new translations or change existing translations, please read the "Translation" section in the README.md file for further details on that.
If you would like the project to be translated to another language, first head over to https://crwd.in/paperless-ngx to check if that language has already been enabled for translation.
If not, please request the language to be added by creating an issue on GitHub. The issue should contain:
- English name of the language (the localized name can be added on Crowdin).
- ISO language code. A list of those can be found here: https://support.crowdin.com/enterprise/language-codes/
- Date format commonly used for the language, e.g. dd/mm/yyyy, mm/dd/yyyy, etc.
After the language has been added and some translations have been made on Crowdin, the language needs to be enabled in the code.
Note that there is no need to manually add a .po or .xlf file as those will be automatically generated and imported from Crowdin.
The following files need to be changed:
- src-ui/angular.json (under the _projects/paperless-ui/i18n/locales_ JSON key)
- src/paperless/settings.py (in the _LANGUAGES_ array)
- src-ui/src/app/services/settings.service.ts (inside the _getLanguageOptions_ method)
- src-ui/src/app/app.module.ts (import locale from _angular/common/locales_ and call _registerLocaleData_)
Please add the language in the correct order, alphabetically by locale.
Note that _en-us_ needs to stay on top of the list, as it is the default project language.
If you are familiar with Git, feel free to send a Pull Request with those changes.
If not, let us know in the issue you created for the language, so that another developer can make these changes.
# Organization Structure & Membership
Paperless-ngx is a community project. We do our best to delegate permission and responsibility among a team of people to ensure the longevity of the project.
## Structure
As of writing, there are 21 members in paperless-ngx. 4 of these people have complete administrative privileges to the repo:
- [@shamoon](https://github.com/shamoon)
- [@bauerj](https://github.com/bauerj)
- [@qcasey](https://github.com/qcasey)
- [@FrankStrieter](https://github.com/FrankStrieter)
There are 5 teams collaborating on specific tasks within paperless-ngx:
- @paperless-ngx/backend (Python / django)
- @paperless-ngx/frontend (JavaScript / Typescript)
- @paperless-ngx/ci-cd (GitHub Actions / Deployment)
- @paperless-ngx/issues (Issue triage)
- @paperless-ngx/test (General testing for larger PRs)
## Permissions
All team members are notified when mentioned or assigned to a relevant issue or pull request. Additionally, each team has slightly different access to paperless-ngx:
- The **test** team has no special permissions.
- The **issues** team has `triage` access. This means they can organize issues and pull requests.
- The **backend**, **frontend**, and **ci-cd** teams have `write` access. This means they can approve PRs and push code, containers, releases, and more.
## Joining
We are not overly strict with inviting people to the organization. If you have read the [team permissions](#permissions) and think having additional access would enhance your contributions, please reach out to an [admin](#structure) of the team.
The admins occasionally invite contributors directly if we believe having them on a team will accelerate their work.

View File

@@ -1,255 +1,46 @@
# syntax=docker/dockerfile:1.4
FROM python:3.5
MAINTAINER Pit Kleyersburg <pitkley@googlemail.com>
# Pull the installer images from the library
# These are all built previously
# They provide either a .deb or .whl
# Install dependencies
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
sudo \
tesseract-ocr tesseract-ocr-eng imagemagick ghostscript unpaper \
&& rm -rf /var/lib/apt/lists/*
ARG JBIG2ENC_VERSION
ARG QPDF_VERSION
ARG PIKEPDF_VERSION
ARG PSYCOPG2_VERSION
# Install python dependencies
RUN mkdir -p /usr/src/paperless
WORKDIR /usr/src/paperless
COPY requirements.txt /usr/src/paperless/
RUN pip install --no-cache-dir -r requirements.txt
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/jbig2enc:${JBIG2ENC_VERSION} as jbig2enc-builder
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/qpdf:${QPDF_VERSION} as qpdf-builder
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/pikepdf:${PIKEPDF_VERSION} as pikepdf-builder
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/psycopg2:${PSYCOPG2_VERSION} as psycopg2-builder
# Copy application
RUN mkdir -p /usr/src/paperless/src
RUN mkdir -p /usr/src/paperless/data
RUN mkdir -p /usr/src/paperless/media
COPY src/ /usr/src/paperless/src/
COPY data/ /usr/src/paperless/data/
COPY media/ /usr/src/paperless/media/
FROM --platform=$BUILDPLATFORM node:16-bullseye-slim AS compile-frontend
# Set consumption directory
ENV PAPERLESS_CONSUMPTION_DIR /consume
RUN mkdir -p $PAPERLESS_CONSUMPTION_DIR
# This stage compiles the frontend
# This stage runs once for the native platform, as the outputs are not
# dependent on target arch
# Inputs: None
# Migrate database
WORKDIR /usr/src/paperless/src
RUN ./manage.py migrate
COPY ./src-ui /src/src-ui
# Create user
RUN groupadd -g 1000 paperless \
&& useradd -u 1000 -g 1000 -d /usr/src/paperless paperless \
&& chown -Rh paperless:paperless /usr/src/paperless
WORKDIR /src/src-ui
RUN set -eux \
&& npm update npm -g \
&& npm ci --omit=optional
RUN set -eux \
&& ./node_modules/.bin/ng build --configuration production
# Setup entrypoint
COPY scripts/docker-entrypoint.sh /sbin/docker-entrypoint.sh
RUN chmod 755 /sbin/docker-entrypoint.sh
FROM --platform=$BUILDPLATFORM python:3.9-slim-bullseye as pipenv-base
# This stage generates the requirements.txt file using pipenv
# This stage runs once for the native platform, as the outputs are not
# dependent on target arch
# This way, pipenv dependencies are not left in the final image
# nor can pipenv mess up the final image somehow
# Inputs: None
WORKDIR /usr/src/pipenv
COPY Pipfile* ./
RUN set -eux \
&& echo "Installing pipenv" \
&& python3 -m pip install --no-cache-dir --upgrade pipenv \
&& echo "Generating requirement.txt" \
&& pipenv requirements > requirements.txt
FROM python:3.9-slim-bullseye as main-app
LABEL org.opencontainers.image.authors="paperless-ngx team <hello@paperless-ngx.com>"
LABEL org.opencontainers.image.documentation="https://paperless-ngx.readthedocs.io/en/latest/"
LABEL org.opencontainers.image.source="https://github.com/paperless-ngx/paperless-ngx"
LABEL org.opencontainers.image.url="https://github.com/paperless-ngx/paperless-ngx"
LABEL org.opencontainers.image.licenses="GPL-3.0-only"
ARG DEBIAN_FRONTEND=noninteractive
#
# Begin installation and configuration
# Order the steps below from least often changed to most
#
# copy jbig2enc
# Basically will never change again
COPY --from=jbig2enc-builder /usr/src/jbig2enc/src/.libs/libjbig2enc* /usr/local/lib/
COPY --from=jbig2enc-builder /usr/src/jbig2enc/src/jbig2 /usr/local/bin/
COPY --from=jbig2enc-builder /usr/src/jbig2enc/src/*.h /usr/local/include/
# Packages needed for running
ARG RUNTIME_PACKAGES="\
curl \
file \
# fonts for text file thumbnail generation
fonts-liberation \
gettext \
ghostscript \
gnupg \
gosu \
icc-profiles-free \
imagemagick \
media-types \
liblept5 \
libpq5 \
libxml2 \
liblcms2-2 \
libtiff5 \
libxslt1.1 \
libfreetype6 \
libwebp6 \
libopenjp2-7 \
libimagequant0 \
libraqm0 \
libgnutls30 \
libjpeg62-turbo \
python3 \
python3-pip \
python3-setuptools \
postgresql-client \
mariadb-client \
# For Numpy
libatlas3-base \
# OCRmyPDF dependencies
tesseract-ocr \
tesseract-ocr-eng \
tesseract-ocr-deu \
tesseract-ocr-fra \
tesseract-ocr-ita \
tesseract-ocr-spa \
# Suggested for OCRmyPDF
pngquant \
# Suggested for pikepdf
jbig2dec \
tzdata \
unpaper \
# Mime type detection
zlib1g \
# Barcode splitter
libzbar0 \
poppler-utils"
# Install basic runtime packages.
# These change very infrequently
RUN set -eux \
echo "Installing system packages" \
&& apt-get update \
&& apt-get install --yes --quiet --no-install-recommends ${RUNTIME_PACKAGES} \
&& rm -rf /var/lib/apt/lists/* \
&& echo "Installing supervisor" \
&& python3 -m pip install --default-timeout=1000 --upgrade --no-cache-dir supervisor==4.2.4
# Copy gunicorn config
# Changes very infrequently
WORKDIR /usr/src/paperless/
COPY gunicorn.conf.py .
# setup docker-specific things
# Use mounts to avoid copying installer files into the image
# These change sometimes, but rarely
WORKDIR /usr/src/paperless/src/docker/
COPY [ \
"docker/imagemagick-policy.xml", \
"docker/supervisord.conf", \
"docker/docker-entrypoint.sh", \
"docker/docker-prepare.sh", \
"docker/paperless_cmd.sh", \
"docker/wait-for-redis.py", \
"docker/management_script.sh", \
"docker/flower-conditional.sh", \
"docker/install_management_commands.sh", \
"/usr/src/paperless/src/docker/" \
]
RUN set -eux \
&& echo "Configuring ImageMagick" \
&& mv imagemagick-policy.xml /etc/ImageMagick-6/policy.xml \
&& echo "Configuring supervisord" \
&& mkdir /var/log/supervisord /var/run/supervisord \
&& mv supervisord.conf /etc/supervisord.conf \
&& echo "Setting up Docker scripts" \
&& mv docker-entrypoint.sh /sbin/docker-entrypoint.sh \
&& chmod 755 /sbin/docker-entrypoint.sh \
&& mv docker-prepare.sh /sbin/docker-prepare.sh \
&& chmod 755 /sbin/docker-prepare.sh \
&& mv wait-for-redis.py /sbin/wait-for-redis.py \
&& chmod 755 /sbin/wait-for-redis.py \
&& mv paperless_cmd.sh /usr/local/bin/paperless_cmd.sh \
&& chmod 755 /usr/local/bin/paperless_cmd.sh \
&& mv flower-conditional.sh /usr/local/bin/flower-conditional.sh \
&& chmod 755 /usr/local/bin/flower-conditional.sh \
&& echo "Installing managment commands" \
&& chmod +x install_management_commands.sh \
&& ./install_management_commands.sh
# Install the built packages from the installer library images
# Use mounts to avoid copying installer files into the image
# These change sometimes
RUN --mount=type=bind,from=qpdf-builder,target=/qpdf \
--mount=type=bind,from=psycopg2-builder,target=/psycopg2 \
--mount=type=bind,from=pikepdf-builder,target=/pikepdf \
set -eux \
&& echo "Installing qpdf" \
&& apt-get install --yes --no-install-recommends /qpdf/usr/src/qpdf/libqpdf29_*.deb \
&& apt-get install --yes --no-install-recommends /qpdf/usr/src/qpdf/qpdf_*.deb \
&& echo "Installing pikepdf and dependencies" \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/pyparsing*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/packaging*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/lxml*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/Pillow*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/pikepdf*.whl \
&& python3 -m pip list \
&& echo "Installing psycopg2" \
&& python3 -m pip install --no-cache-dir /psycopg2/usr/src/wheels/psycopg2*.whl \
&& python3 -m pip list
WORKDIR /usr/src/paperless/src/
# Python dependencies
# Change pretty frequently
COPY --from=pipenv-base /usr/src/pipenv/requirements.txt ./
# Packages needed only for building a few quick Python
# dependencies
ARG BUILD_PACKAGES="\
build-essential \
git \
default-libmysqlclient-dev \
python3-dev"
RUN set -eux \
&& echo "Installing build system packages" \
&& apt-get update \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& python3 -m pip install --no-cache-dir --upgrade wheel \
&& echo "Installing Python requirements" \
&& python3 -m pip install --default-timeout=1000 --no-cache-dir --requirement requirements.txt \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& apt-get clean --yes \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /tmp/* \
&& rm -rf /var/tmp/* \
&& rm -rf /var/cache/apt/archives/* \
&& truncate -s 0 /var/log/*log
# copy backend
COPY ./src ./
# copy frontend
COPY --from=compile-frontend /src/src/documents/static/frontend/ ./documents/static/frontend/
# add users, setup scripts
RUN set -eux \
&& addgroup --gid 1000 paperless \
&& useradd --uid 1000 --gid paperless --home-dir /usr/src/paperless paperless \
&& chown -R paperless:paperless ../ \
&& gosu paperless python3 manage.py collectstatic --clear --no-input \
&& gosu paperless python3 manage.py compilemessages
VOLUME ["/usr/src/paperless/data", \
"/usr/src/paperless/media", \
"/usr/src/paperless/consume", \
"/usr/src/paperless/export"]
# Mount volumes
VOLUME ["/usr/src/paperless/data", "/usr/src/paperless/media", "/consume"]
ENTRYPOINT ["/sbin/docker-entrypoint.sh"]
EXPOSE 8000
CMD ["/usr/local/bin/paperless_cmd.sh"]
CMD ["--help"]

80
Pipfile
View File

@@ -1,80 +0,0 @@
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"
[[source]]
url = "https://www.piwheels.org/simple"
verify_ssl = true
name = "piwheels"
[packages]
dateparser = "~=1.1"
django = "~=4.1"
django-cors-headers = "*"
django-extensions = "*"
django-filter = "~=22.1"
djangorestframework = "~=3.14"
filelock = "*"
gunicorn = "*"
imap-tools = "*"
langdetect = "*"
pathvalidate = "*"
pillow = "~=9.3"
pikepdf = "*"
python-gnupg = "*"
python-dotenv = "*"
python-dateutil = "*"
python-magic = "*"
psycopg2 = "*"
rapidfuzz = "*"
redis = {extras = ["hiredis"], version = "*"}
scikit-learn = "~=1.1"
# Pin this until piwheels is building 1.9 (see https://www.piwheels.org/project/scipy/)
scipy = "==1.8.1"
numpy = "*"
whitenoise = "~=6.2"
watchdog = "~=2.1"
whoosh="~=2.7"
inotifyrecursive = "~=0.3"
ocrmypdf = "~=14.0"
tqdm = "*"
tika = "*"
# TODO: This will sadly also install daphne+dependencies,
# which is an ASGI server we don't need. Adds about 15MB image size.
channels = "~=3.0"
# Locked version until https://github.com/django/channels_redis/issues/332
# is resolved
channels-redis = "==3.4.1"
uvicorn = {extras = ["standard"], version = "*"}
concurrent-log-handler = "*"
"pdfminer.six" = "*"
"backports.zoneinfo" = {version = "*", markers = "python_version < '3.9'"}
"importlib-resources" = {version = "*", markers = "python_version < '3.9'"}
zipp = {version = "*", markers = "python_version < '3.9'"}
pyzbar = "*"
mysqlclient = "*"
celery = {extras = ["redis"], version = "*"}
django-celery-results = "*"
setproctitle = "*"
nltk = "*"
pdf2image = "*"
flower = "*"
[dev-packages]
coveralls = "*"
factory-boy = "*"
pycodestyle = "*"
pytest = "*"
pytest-cov = "*"
pytest-django = "*"
pytest-env = "*"
pytest-sugar = "*"
pytest-xdist = "*"
sphinx = "~=5.3"
sphinx_rtd_theme = "*"
tox = "*"
black = "*"
pre-commit = "*"
sphinx-autobuild = "*"
myst-parser = "*"

2696
Pipfile.lock generated

File diff suppressed because it is too large

121
README.md
View File

@@ -1,121 +0,0 @@
[![ci](https://github.com/paperless-ngx/paperless-ngx/workflows/ci/badge.svg)](https://github.com/paperless-ngx/paperless-ngx/actions)
[![Crowdin](https://badges.crowdin.net/paperless-ngx/localized.svg)](https://crowdin.com/project/paperless-ngx)
[![Documentation Status](https://readthedocs.org/projects/paperless-ngx/badge/?version=latest)](https://paperless-ngx.readthedocs.io/en/latest/?badge=latest)
[![Coverage Status](https://coveralls.io/repos/github/paperless-ngx/paperless-ngx/badge.svg?branch=master)](https://coveralls.io/github/paperless-ngx/paperless-ngx?branch=master)
[![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/%23paperlessngx%3Amatrix.org)
<p align="center">
<img src="https://github.com/paperless-ngx/paperless-ngx/raw/main/resources/logo/web/png/Black%20logo%20-%20no%20background.png#gh-light-mode-only" width="50%" />
<img src="https://github.com/paperless-ngx/paperless-ngx/raw/main/resources/logo/web/png/White%20logo%20-%20no%20background.png#gh-dark-mode-only" width="50%" />
</p>
<!-- omit in toc -->
# Paperless-ngx
Paperless-ngx is a document management system that transforms your physical documents into a searchable online archive so you can keep, well, _less paper_.
Paperless-ngx forked from [paperless-ng](https://github.com/jonaswinkler/paperless-ng) to continue the great work and distribute responsibility of supporting and advancing the project among a team of people. [Consider joining us!](#community-support) Discussion of this transition can be found in issues
[#1599](https://github.com/jonaswinkler/paperless-ng/issues/1599) and [#1632](https://github.com/jonaswinkler/paperless-ng/issues/1632).
A demo is available at [demo.paperless-ngx.com](https://demo.paperless-ngx.com) using login `demo` / `demo`. _Note: demo content is reset frequently and confidential information should not be uploaded._
- [Features](#features)
- [Getting started](#getting-started)
- [Contributing](#contributing)
- [Community Support](#community-support)
- [Translation](#translation)
- [Feature Requests](#feature-requests)
- [Bugs](#bugs)
- [Affiliated Projects](#affiliated-projects)
- [Important Note](#important-note)
# Features
![Dashboard](https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docs/_static/screenshots/documents-wchrome.png#gh-light-mode-only)
![Dashboard](https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docs/_static/screenshots/documents-wchrome-dark.png#gh-dark-mode-only)
- Organize and index your scanned documents with tags, correspondents, types, and more.
- Performs OCR on your documents, adds selectable text to image-only documents and adds tags, correspondents and document types to your documents.
- Supports PDF documents, images, plain text files, and Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents).
- Office document support is optional and provided by Apache Tika (see [configuration](https://paperless-ngx.readthedocs.io/en/latest/configuration.html#tika-settings))
- Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely.
- Single page application front end.
- Includes a dashboard that shows basic statistics and has document upload.
- Filtering by tags, correspondents, types, and more.
- Customizable views can be saved and displayed on the dashboard.
- Full text search helps you find what you need.
- Auto completion suggests relevant words from your documents.
- Results are sorted by relevance to your search query.
- Highlighting shows you which parts of the document matched the query.
- Searching for similar documents ("More like this")
- Email processing: Paperless adds documents from your email accounts.
- Configure multiple accounts and filters for each account.
- When adding documents from mail, paperless can move these emails to a new folder, mark them as read, flag them as important or delete them.
- Machine learning powered document matching.
- Paperless-ngx learns from your documents and will be able to automatically assign tags, correspondents and types to documents once you've stored a few documents in paperless.
- Optimized for multi core systems: Paperless-ngx consumes multiple documents in parallel.
- The integrated sanity checker makes sure that your document archive is in good health.
- [More screenshots are available in the documentation](https://paperless-ngx.readthedocs.io/en/latest/screenshots.html).
# Getting started
The easiest way to deploy paperless is docker-compose. The files in the [`/docker/compose` directory](https://github.com/paperless-ngx/paperless-ngx/tree/main/docker/compose) are configured to pull the image from Github Packages.
If you'd like to jump right in, you can configure a docker-compose environment with our install script:
```bash
bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
```
Alternatively, you can install the dependencies and set up Apache and a database server yourself. The [documentation](https://paperless-ngx.readthedocs.io/en/latest/setup.html#installation) has a step-by-step guide on how to do it.
Migrating from Paperless-ng is easy: just drop in the new Docker image! See the [documentation on migrating](https://paperless-ngx.readthedocs.io/en/latest/setup.html#migrating-from-paperless-ng) for more details.
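As a minimal sketch, an in-place upgrade with docker-compose usually comes down to pulling the new image and recreating the containers (service and volume names follow the compose files shipped in this repository):

```bash
docker-compose pull    # fetch the new paperless-ngx image
docker-compose up -d   # recreate the containers; data in named Docker volumes is preserved
```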
<!-- omit in toc -->
### Documentation
The documentation for Paperless-ngx is available on [ReadTheDocs](https://paperless-ngx.readthedocs.io/).
# Contributing
If you feel like contributing to the project, please do! Bug fixes, enhancements, visual fixes etc. are always welcome. If you want to implement something big: Please start a discussion about that! The [documentation](https://paperless-ngx.readthedocs.io/en/latest/extending.html) has some basic information on how to get started.
## Community Support
People interested in continuing the work on paperless-ngx are encouraged to reach out here on github and in the [Matrix Room](https://matrix.to/#/#paperless:adnidor.de). If you would like to contribute to the project on an ongoing basis there are multiple [teams](https://github.com/orgs/paperless-ngx/people) (frontend, ci/cd, etc) that could use your help so please reach out!
## Translation
Paperless-ngx is available in many languages that are coordinated on Crowdin. If you want to help out by translating paperless-ngx into your language, please head over to https://crwd.in/paperless-ngx, and thank you! More details can be found in [CONTRIBUTING.md](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md#translating-paperless-ngx).
## Feature Requests
Feature requests can be submitted via [GitHub Discussions](https://github.com/paperless-ngx/paperless-ngx/discussions/categories/feature-requests), you can search for existing ideas, add your own and vote for the ones you care about.
## Bugs
For bugs please [open an issue](https://github.com/paperless-ngx/paperless-ngx/issues) or [start a discussion](https://github.com/paperless-ngx/paperless-ngx/discussions) if you have questions.
# Affiliated Projects
Paperless has been around a while now, and people are starting to build stuff on top of it. If you're one of those people, we can add your project to this list:
- [Paperless App](https://github.com/bauerj/paperless_app): An Android/iOS app for Paperless-ngx. Also works with the original Paperless and Paperless-ng.
- [Paperless Share](https://github.com/qcasey/paperless_share). Share any files from your Android application with paperless. Very simple, but works with all of the mobile scanning apps out there that allow you to share scanned documents.
- [Scan to Paperless](https://github.com/sbrunner/scan-to-paperless): Scan and prepare (crop, deskew, OCR, ...) your documents for Paperless.
- [Paperless Mobile](https://github.com/astubenbord/paperless-mobile): A modern, feature rich mobile application for Paperless.
These projects also exist, but their status and compatibility with paperless-ngx is unknown.
- [paperless-cli](https://github.com/stgarf/paperless-cli): A golang command line binary to interact with a Paperless instance.
This project also exists, but needs updates to be compatible with paperless-ngx.
- [Paperless Desktop](https://github.com/thomasbrueggemann/paperless-desktop): A desktop UI for your Paperless installation. Runs on Mac, Linux, and Windows.
Known issues on Mac: (Could not load reminders and documents)
# Important Note
Document scanners are typically used to scan sensitive documents. Things like your social insurance number, tax records, invoices, etc. Everything is stored in the clear without encryption. This means that Paperless should never be run on an untrusted host. Instead, I recommend that if you do want to use it, run it locally on a server in your own home.

140
README.rst Normal file
View File

@@ -0,0 +1,140 @@
Paperless
#########
|Documentation|
|Chat|
|Travis|
|Dependencies|
Scan, index, and archive all of your paper documents
I hate paper. Environmental issues aside, it's a tech person's nightmare:
* There's no search feature
* It takes up physical space
* Backups mean more paper
In the past few months I've been bitten more than a few times by the problem
of not having the right document around. Sometimes I recycled a document I
needed (who keeps water bills for two years?) and other times I just lost
it... because paper. I wrote this to make my life easier.
How it Works
============
1. Buy a document scanner like `this one`_.
2. Set it up to "scan to FTP" or something similar. It should be able to push
scanned images to a server without you having to do anything. If your
scanner doesn't know how to automatically upload the file somewhere, you can
always do that manually. Paperless doesn't care how the documents get into
its local consumption directory (see the example after this list).
3. Have the target server run the Paperless consumption script to OCR the PDF
and index it into a local database.
4. Use the web frontend to sift through the database and find what you want.
5. Download the PDF you need/want via the web interface and do whatever you
like with it. You can even print it and send it as if it's the original.
In most cases, no one will care or notice.
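For example, manually dropping a scan into the consumption directory is enough
(the path below is a placeholder for whatever you configured as the
consumption directory)::

    cp ~/scans/water-bill.pdf /path/to/consume/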
Here's what you get:
.. image:: docs/_static/screenshot.png
:alt: The before and after
:target: docs/_static/screenshot.png
Stability
=========
Paperless is still under active development (just look at the git commit
history) so don't expect it to be 100% stable. I'm using it for my own
documents, but I'm crazy like that. If you use this and it breaks something,
you get to keep all the shiny pieces.
Requirements
============
This is all really quite a simple, shiny, user-friendly wrapper around some very
powerful tools.
* `ImageMagick`_ converts the images between colour and greyscale.
* `Tesseract`_ does the character recognition.
* `Unpaper`_ despeckles and deskews the scanned image.
* `GNU Privacy Guard`_ is used as the encryption backend.
* `Python 3`_ is the language of the project.
* `Pillow`_ loads the image data as a python object to be used with PyOCR.
* `PyOCR`_ is a slick programmatic wrapper around tesseract.
* `Django`_ is the framework this project is written against.
* `Python-GNUPG`_ decrypts the PDFs on-the-fly to allow you to download
unencrypted files, leaving the encrypted ones on-disk.
Documentation
=============
It's all available on `ReadTheDocs`_.
Similar Projects
================
There's another project out there called `Mayan EDMS`_ that has a surprising
amount of technical overlap with Paperless. Also based on Django and using
a consumer model with Tesseract and unpaper, Mayan EDMS is *much* more
featureful and comes with a slick UI as well. It may be that Paperless is
better suited for low-resource environments (like a Raspberry Pi), but to be
honest, this is just a guess as I haven't tested this myself. One thing's
for certain though, *Paperless* is a **much** better name.
Important Note
==============
Document scanners are typically used to scan sensitive documents. Things like
your social insurance number, tax records, invoices, etc. While paperless
encrypts the original PDFs via the consumption script, the OCR'd text is *not*
encrypted and is therefore stored in the clear (it needs to be searchable, so
if someone has ideas on how to do that on encrypted data, I'm all ears). This
means that paperless should never be run on an untrusted host. Instead, I
recommend that if you do want to use it, run it locally on a server in your own
home.
Donations
=========
As with all Free software, the power is less in the finances and more in the
collective efforts. I really appreciate every pull request and bug report
offered up by Paperless' users, so please keep that stuff coming. If however,
you're not one for coding/design/documentation, and would like to contribute
financially, I won't say no ;-)
The thing is, I'm doing ok for money, so I would instead ask you to donate to
the `United Nations High Commissioner for Refugees`_. They're doing important
work and they need the money a lot more than I do.
.. _this one: http://www.brother.ca/en-CA/Scanners/11/ProductDetail/ADS1500W?ProductDetail=productdetail
.. _ImageMagick: http://imagemagick.org/
.. _Tesseract: https://github.com/tesseract-ocr
.. _Unpaper: https://www.flameeyes.eu/projects/unpaper
.. _GNU Privacy Guard: https://gnupg.org/
.. _Python 3: https://python.org/
.. _Pillow: https://pypi.python.org/pypi/pillowfight/
.. _PyOCR: https://github.com/jflesch/pyocr
.. _Django: https://www.djangoproject.com/
.. _Python-GNUPG: http://pythonhosted.org/python-gnupg/
.. _ReadTheDocs: https://paperless.readthedocs.org/
.. _Mayan EDMS: https://mayan.readthedocs.org/en/latest/
.. _United Nations High Commissioner for Refugees: https://donate.unhcr.org/int-en/general
.. |Documentation| image:: https://readthedocs.org/projects/paperless/badge/?version=latest
:alt: Read the documentation at https://paperless.readthedocs.org/
:target: https://paperless.readthedocs.org/
.. |Chat| image:: https://badges.gitter.im/danielquinn/paperless.svg
:alt: Join the chat at https://gitter.im/danielquinn/paperless
:target: https://gitter.im/danielquinn/paperless?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
.. |Travis| image:: https://travis-ci.org/danielquinn/paperless.svg?branch=master
:target: https://travis-ci.org/danielquinn/paperless
.. |Dependencies| image:: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500/badge.svg?style=flat-square
:target: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500

15
Vagrantfile vendored Normal file
View File

@@ -0,0 +1,15 @@
# -*- mode: ruby -*-
# vi: set ft=ruby :
VAGRANT_API_VERSION = "2"
Vagrant.configure(VAGRANT_API_VERSION) do |config|
config.vm.box = "ubuntu/trusty64"
# Provision using shell
config.vm.host_name = "dev.paperless"
config.vm.synced_folder ".", "/opt/paperless"
config.vm.provision "shell", path: "scripts/vagrant-provision"
# Networking details
config.vm.network "private_network", ip: "172.28.128.4"
end

View File

@@ -1,47 +0,0 @@
#!/usr/bin/env bash
# Helper script for building the Docker image locally.
# Parses and provides the necessary versions of other images to Docker
# before passing in the rest of script args.
# First Argument: The Dockerfile to build
# Other Arguments: Additional arguments to docker build
# Example Usage:
# ./build-docker-image.sh Dockerfile -t paperless-ngx:my-awesome-feature
set -eux
if ! command -v jq; then
echo "jq required"
exit 1
elif [ ! -f "$1" ]; then
echo "$1 is not a file, please provide the Dockerfile"
exit 1
fi
# Parse what we can from Pipfile.lock
pikepdf_version=$(jq ".default.pikepdf.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
psycopg2_version=$(jq ".default.psycopg2.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
pillow_version=$(jq ".default.pillow.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
lxml_version=$(jq ".default.lxml.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
# Read this from the other config file
qpdf_version=$(jq ".qpdf.version" .build-config.json | sed 's/"//g')
jbig2enc_version=$(jq ".jbig2enc.version" .build-config.json | sed 's/"//g')
# Get the branch name (used for caching)
branch_name=$(git rev-parse --abbrev-ref HEAD)
# https://docs.docker.com/develop/develop-images/build_enhancements/
# Required to use cache-from
export DOCKER_BUILDKIT=1
docker build --file "$1" \
--progress=plain \
--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/app:"${branch_name}" \
--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/app:dev \
--build-arg JBIG2ENC_VERSION="${jbig2enc_version}" \
--build-arg QPDF_VERSION="${qpdf_version}" \
--build-arg PIKEPDF_VERSION="${pikepdf_version}" \
--build-arg PILLOW_VERSION="${pillow_version}" \
--build-arg LXML_VERSION="${lxml_version}" \
--build-arg PSYCOPG2_VERSION="${psycopg2_version}" "${@:2}" .

View File

@@ -1,6 +0,0 @@
commit_message: '[ci skip]'
files:
- source: /src/locale/en_US/LC_MESSAGES/django.po
translation: /src/locale/%locale_with_underscore%/LC_MESSAGES/django.po
- source: /src-ui/messages.xlf
translation: /src-ui/src/locale/messages.%locale_with_underscore%.xlf

View File

@@ -1,35 +0,0 @@
# This Dockerfile compiles the jbig2enc library
# Inputs:
# - JBIG2ENC_VERSION - the Git tag to checkout and build
FROM debian:bullseye-slim as main
LABEL org.opencontainers.image.description="An intermediate image with jbig2enc built"
ARG DEBIAN_FRONTEND=noninteractive
ARG JBIG2ENC_VERSION
ARG BUILD_PACKAGES="\
build-essential \
automake \
libtool \
libleptonica-dev \
zlib1g-dev \
git \
ca-certificates"
WORKDIR /usr/src/jbig2enc
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& echo "Building jbig2enc" \
&& git clone --quiet --branch $JBIG2ENC_VERSION https://github.com/agl/jbig2enc . \
&& ./autogen.sh \
&& ./configure \
&& make \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -1,92 +0,0 @@
# This Dockerfile builds the pikepdf wheel
# Inputs:
# - REPO - Docker repository to pull qpdf from
# - QPDF_VERSION - The image qpdf version to copy .deb files from
# - PIKEPDF_VERSION - Version of pikepdf to build wheel for
# Default to pulling from the main repo registry when manually building
ARG REPO="paperless-ngx/paperless-ngx"
ARG QPDF_VERSION
FROM ghcr.io/${REPO}/builder/qpdf:${QPDF_VERSION} as qpdf-builder
# This does nothing, except provide a name for a copy below
FROM python:3.9-slim-bullseye as main
LABEL org.opencontainers.image.description="An intermediate image with pikepdf wheel built"
ARG DEBIAN_FRONTEND=noninteractive
ARG PIKEPDF_VERSION
# These are not used, but will still bust the cache if one changes
# Otherwise, the main image will try to build them (and fail)
ARG PILLOW_VERSION
ARG LXML_VERSION
ARG BUILD_PACKAGES="\
build-essential \
python3-dev \
python3-pip \
# qpdf requirement - https://github.com/qpdf/qpdf#crypto-providers
libgnutls28-dev \
# lxml requirements - https://lxml.de/installation.html
libxml2-dev \
libxslt1-dev \
# Pillow requirements - https://pillow.readthedocs.io/en/stable/installation.html#external-libraries
# JPEG functionality
libjpeg62-turbo-dev \
# compressed PNG
zlib1g-dev \
# compressed TIFF
libtiff-dev \
# type related services
libfreetype-dev \
# color management
liblcms2-dev \
# WebP format
libwebp-dev \
# JPEG 2000
libopenjp2-7-dev \
# improved color quantization
libimagequant-dev \
# complex text layout support
libraqm-dev"
WORKDIR /usr/src
COPY --from=qpdf-builder /usr/src/qpdf/*.deb ./
# As this is a base image for a multi-stage final image
# the added size of the install is basically irrelevant
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& echo "Installing qpdf" \
&& dpkg --install libqpdf29_*.deb \
&& dpkg --install libqpdf-dev_*.deb \
&& echo "Installing Python tools" \
&& python3 -m pip install --no-cache-dir --upgrade \
pip \
wheel \
# https://pikepdf.readthedocs.io/en/latest/installation.html#requirements
pybind11 \
&& echo "Building pikepdf wheel ${PIKEPDF_VERSION}" \
&& mkdir wheels \
&& python3 -m pip wheel \
# Build the package at the required version
pikepdf==${PIKEPDF_VERSION} \
# Output the *.whl into this directory
--wheel-dir wheels \
# Do not use a binary package for the package being built
--no-binary=pikepdf \
# Do use binary packages for dependencies
--prefer-binary \
# Don't cache build files
--no-cache-dir \
&& ls -ahl wheels \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -1,48 +0,0 @@
# This Dockerfile builds the psycopg2 wheel
# Inputs:
# - PSYCOPG2_VERSION - Version to build
FROM python:3.9-slim-bullseye as main
LABEL org.opencontainers.image.description="An intermediate image with psycopg2 wheel built"
ARG PSYCOPG2_VERSION
ARG DEBIAN_FRONTEND=noninteractive
ARG BUILD_PACKAGES="\
build-essential \
python3-dev \
python3-pip \
# https://www.psycopg.org/docs/install.html#prerequisites
libpq-dev"
WORKDIR /usr/src
# As this is a base image for a multi-stage final image
# the added size of the install is basically irrelevant
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& echo "Installing Python tools" \
&& python3 -m pip install --no-cache-dir --upgrade pip wheel \
&& echo "Building psycopg2 wheel ${PSYCOPG2_VERSION}" \
&& cd /usr/src \
&& mkdir wheels \
&& python3 -m pip wheel \
# Build the package at the required version
psycopg2==${PSYCOPG2_VERSION} \
# Output the *.whl into this directory
--wheel-dir wheels \
# Do not use a binary package for the package being built
--no-binary=psycopg2 \
# Do use binary packages for dependencies
--prefer-binary \
# Don't cache build files
--no-cache-dir \
&& ls -ahl wheels/ \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -1,48 +0,0 @@
# This Dockerfile builds qpdf into Debian (.deb) packages
# Inputs:
# - QPDF_VERSION - the version of qpdf to build a .deb.
# Must be present as a deb-src in bookworm
FROM debian:bullseye-slim as main
LABEL org.opencontainers.image.description="An intermediate image with qpdf built"
ARG DEBIAN_FRONTEND=noninteractive
# This must match to pikepdf's minimum at least
ARG QPDF_VERSION
ARG BUILD_PACKAGES="\
build-essential \
debhelper \
debian-keyring \
devscripts \
equivs \
libtool \
# https://qpdf.readthedocs.io/en/stable/installation.html#system-requirements
libjpeg62-turbo-dev \
libgnutls28-dev \
packaging-dev \
cmake \
zlib1g-dev"
WORKDIR /usr/src
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends $BUILD_PACKAGES \
&& echo "Getting qpdf src" \
&& echo "deb-src http://deb.debian.org/debian/ bookworm main" > /etc/apt/sources.list.d/bookworm-src.list \
&& apt-get update \
&& mkdir qpdf \
&& cd qpdf \
&& apt-get source --yes --quiet qpdf=${QPDF_VERSION}-1/bookworm \
&& echo "Building qpdf" \
&& cd qpdf-$QPDF_VERSION \
&& export DEB_BUILD_OPTIONS="terse nocheck nodoc parallel=2" \
&& dpkg-buildpackage --build=binary --unsigned-source --unsigned-changes --post-clean \
&& ls -ahl ../*.deb \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -0,0 +1,15 @@
# Environment variables to set for Paperless
# Commented out variables will be replaced by a default within Paperless.
# Passphrase Paperless uses to encrypt and decrypt your documents
PAPERLESS_PASSPHRASE=CHANGE_ME
# The number of threads to use for text recognition
# PAPERLESS_OCR_THREADS=4
# Additional languages to install for text recognition
# PAPERLESS_OCR_LANGUAGES=deu ita
# You can change the default user and group id to a custom one
# USERMAP_UID=1000
# USERMAP_GID=1000

View File

@@ -0,0 +1,41 @@
version: '2'
services:
webserver:
image: pitkley/paperless
ports:
# You can adapt the port you want Paperless to listen on by
# modifying the part before the `:`.
- "8000:8000"
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
env_file: docker-compose.env
# This line is here so that the webserver, which doesn't do any text
# recognition, doesn't have to install unnecessary languages the user
# might have set in the env file; it overrides that value with nothing.
environment:
- PAPERLESS_OCR_LANGUAGES=
command: ["runserver", "0.0.0.0:8000"]
consumer:
image: pitkley/paperless
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
# You have to adapt the local path you want the consumption
# directory to mount to by modifying the part before the ':'.
- /path/to/arbitrary/place:/consume
# Likewise, you can add a local path to mount a directory for
# exporting. This is not strictly needed for paperless to
# function, only if you're exporting your files: uncomment
# it and fill in a local path if you know you're going to
# want to export your documents.
# - /path/to/another/arbitrary/place:/export
env_file: docker-compose.env
command: ["document_consumer"]
volumes:
data:
media:

View File

@@ -1 +0,0 @@
COMPOSE_PROJECT_NAME=paperless

View File

@@ -1,42 +0,0 @@
# The UID and GID of the user used to run paperless in the container. Set this
# to your UID and GID on the host so that you have write access to the
# consumption directory.
#USERMAP_UID=1000
#USERMAP_GID=1000
# Additional languages to install for text recognition, separated by
# whitespace. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# language used for OCR.
# The container installs English, German, Italian, Spanish and French by
# default.
# See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
# for available languages.
#PAPERLESS_OCR_LANGUAGES=tur ces
###############################################################################
# Paperless-specific settings #
###############################################################################
# All settings defined in the paperless.conf.example can be used here. The
# Docker setup does not use the configuration file.
# A few commonly adjusted settings are provided below.
# This is required if you will be exposing Paperless-ngx on a public domain
# (if doing so please consider security measures such as a reverse proxy)
#PAPERLESS_URL=https://paperless.example.com
# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
#PAPERLESS_SECRET_KEY=change-me
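# As a sketch, one way to generate such a value (any source of long random
# strings works just as well):
#   openssl rand -base64 64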
# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
#PAPERLESS_TIME_ZONE=America/Los_Angeles
# The default language to use for OCR. Set this to the language most of your
# documents are written in.
#PAPERLESS_OCR_LANGUAGE=eng
# Set if accessing paperless via a domain subpath e.g. https://domain.com/PATHPREFIX and using a reverse-proxy like traefik or nginx
#PAPERLESS_FORCE_SCRIPT_NAME=/PATHPREFIX
#PAPERLESS_STATIC_URL=/PATHPREFIX/static/ # trailing slash required

View File

@@ -1,102 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), MariaDB is used as the database server.
# - Apache Tika and Gotenberg servers are started with paperless and paperless
# is configured to use these services. These provide support for consuming
# Office documents (Word, Excel, PowerPoint and their LibreOffice
# counterparts).
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/mariadb:10
restart: unless-stopped
volumes:
- dbdata:/var/lib/mysql
environment:
MARIADB_HOST: paperless
MARIADB_DATABASE: paperless
MARIADB_USER: paperless
MARIADB_PASSWORD: paperless
MARIADB_ROOT_PASSWORD: paperless
ports:
- "3306:3306"
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
- gotenberg
- tika
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBENGINE: mariadb
PAPERLESS_DBHOST: db
PAPERLESS_DBUSER: paperless # only needed if non-default username
PAPERLESS_DBPASS: paperless # only needed if non-default password
PAPERLESS_DBPORT: 3306
PAPERLESS_TIKA_ENABLED: 1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: docker.io/gotenberg/gotenberg:7.6
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
restart: unless-stopped
volumes:
data:
media:
dbdata:
redisdata:
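The install and update steps listed in the header comment translate into the following shell session, run from the folder that holds docker-compose.yml, docker-compose.env and .env (a sketch of the documented steps, not additional required commands):
docker-compose pull                                 # fetch the images
docker-compose run --rm webserver createsuperuser   # create the first user
docker-compose up -d                                # start the stack in the background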

View File

@@ -1,83 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), MariaDB is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/mariadb:10
restart: unless-stopped
volumes:
- dbdata:/var/lib/mysql
environment:
MARIADB_HOST: paperless
MARIADB_DATABASE: paperless
MARIADB_USER: paperless
MARIADB_PASSWORD: paperless
MARIADB_ROOT_PASSWORD: paperless
ports:
- "3306:3306"
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBENGINE: mariadb
PAPERLESS_DBHOST: db
PAPERLESS_DBUSER: paperless # only needed if non-default username
PAPERLESS_DBPASS: paperless # only needed if non-default password
PAPERLESS_DBPORT: 3306
volumes:
data:
media:
dbdata:
redisdata:

View File

@@ -1,97 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8010.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Open portainer Stacks list and click 'Add stack'
# - Paste the contents of this file and assign a name, e.g. 'Paperless'
# - Click 'Deploy the stack' and wait for it to be deployed
# - Open the list of containers, select paperless_webserver_1
# - Click 'Console' and then 'Connect' to open the command line inside the container
# - Run 'python3 manage.py createsuperuser' to create a user
# - Exit the console
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_DB: paperless
POSTGRES_USER: paperless
POSTGRES_PASSWORD: paperless
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
ports:
- 8010:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBHOST: db
# The UID and GID of the user used to run paperless in the container. Set this
# to your UID and GID on the host so that you have write access to the
# consumption directory.
USERMAP_UID: 1000
USERMAP_GID: 100
# Additional languages to install for text recognition, separated by
# whitespace. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# language used for OCR.
# The container installs English, German, Italian, Spanish and French by
# default.
# See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
# for available languages.
#PAPERLESS_OCR_LANGUAGES: tur ces
# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
#PAPERLESS_SECRET_KEY: change-me
# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
#PAPERLESS_TIME_ZONE: America/Los_Angeles
# The default language to use for OCR. Set this to the language most of your
# documents are written in.
#PAPERLESS_OCR_LANGUAGE: eng
volumes:
data:
media:
pgdata:
redisdata:
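For the Portainer workflow described in the header, the console step can also be performed from any shell with access to the Docker host. Assuming the stack was deployed under the name 'paperless', so that the web container is called paperless_webserver_1 as in the comment above, a rough equivalent is:
# run the superuser creation interactively inside the running webserver container
docker exec -it paperless_webserver_1 python3 manage.py createsuperuser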

View File

@@ -1,94 +0,0 @@
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
# - Apache Tika and Gotenberg servers are started with paperless and paperless
# is configured to use these services. These provide support for consuming
# Office documents (Word, Excel, PowerPoint and their LibreOffice
# counterparts).
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_DB: paperless
POSTGRES_USER: paperless
POSTGRES_PASSWORD: paperless
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
- gotenberg
- tika
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBHOST: db
PAPERLESS_TIKA_ENABLED: 1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: docker.io/gotenberg/gotenberg:7.6
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
restart: unless-stopped
volumes:
data:
media:
pgdata:
redisdata:

View File

@@ -1,75 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_DB: paperless
POSTGRES_USER: paperless
POSTGRES_PASSWORD: paperless
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBHOST: db
volumes:
data:
media:
pgdata:
redisdata:

View File

@@ -1,81 +0,0 @@
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# SQLite is used as the database. The SQLite file is stored in the data volume.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Apache Tika and Gotenberg servers are started with paperless and paperless
# is configured to use these services. These provide support for consuming
# Office documents (Word, Excel, PowerPoint and their LibreOffice
# counterparts).
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- broker
- gotenberg
- tika
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_TIKA_ENABLED: 1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: docker.io/gotenberg/gotenberg:7.6
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
restart: unless-stopped
volumes:
data:
media:
redisdata:

View File

@@ -1,59 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# SQLite is used as the database. The SQLite file is stored in the data volume.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- broker
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
volumes:
data:
media:
redisdata:

View File

@@ -1,193 +0,0 @@
#!/usr/bin/env bash
set -e
# Adapted from:
# https://github.com/docker-library/postgres/blob/master/docker-entrypoint.sh
# usage: file_env VAR
# ie: file_env 'XYZ_DB_PASSWORD' will allow for "$XYZ_DB_PASSWORD_FILE" to
# fill in the value of "$XYZ_DB_PASSWORD" from a file, especially for Docker's
# secrets feature
file_env() {
local -r var="$1"
local -r fileVar="${var}_FILE"
# Basic validation
if [ "${!var:-}" ] && [ "${!fileVar:-}" ]; then
echo >&2 "error: both $var and $fileVar are set (but are exclusive)"
exit 1
fi
# Only export var if the _FILE exists
if [ "${!fileVar:-}" ]; then
# And the file exists
if [[ -f ${!fileVar} ]]; then
echo "Setting ${var} from file"
val="$(< "${!fileVar}")"
export "$var"="$val"
else
echo "File ${!fileVar} doesn't exist"
exit 1
fi
fi
}
# Source: https://github.com/sameersbn/docker-gitlab/
map_uidgid() {
local -r usermap_original_uid=$(id -u paperless)
local -r usermap_original_gid=$(id -g paperless)
local -r usermap_new_uid=${USERMAP_UID:-$usermap_original_uid}
local -r usermap_new_gid=${USERMAP_GID:-${usermap_original_gid:-$usermap_new_uid}}
if [[ ${usermap_new_uid} != "${usermap_original_uid}" || ${usermap_new_gid} != "${usermap_original_gid}" ]]; then
echo "Mapping UID and GID for paperless:paperless to $usermap_new_uid:$usermap_new_gid"
usermod -o -u "${usermap_new_uid}" paperless
groupmod -o -g "${usermap_new_gid}" paperless
fi
}
map_folders() {
# Export these so they can be used in docker-prepare.sh
export DATA_DIR="${PAPERLESS_DATA_DIR:-/usr/src/paperless/data}"
export MEDIA_ROOT_DIR="${PAPERLESS_MEDIA_ROOT:-/usr/src/paperless/media}"
export CONSUME_DIR="${PAPERLESS_CONSUMPTION_DIR:-/usr/src/paperless/consume}"
}
nltk_data () {
# Store the NLTK data outside the Docker container
local -r nltk_data_dir="${DATA_DIR}/nltk"
local -r truthy_things=("yes y 1 t true")
# If not set, or it looks truthy
if [[ -z "${PAPERLESS_ENABLE_NLTK}" ]] || [[ "${truthy_things[*]}" =~ ${PAPERLESS_ENABLE_NLTK,} ]]; then
# Download or update the snowball stemmer data
python3 -W ignore::RuntimeWarning -m nltk.downloader -d "${nltk_data_dir}" snowball_data
# Download or update the stopwords corpus
python3 -W ignore::RuntimeWarning -m nltk.downloader -d "${nltk_data_dir}" stopwords
# Download or update the punkt tokenizer data
python3 -W ignore::RuntimeWarning -m nltk.downloader -d "${nltk_data_dir}" punkt
else
echo "Skipping NLTK data download"
fi
}
initialize() {
# Setup environment from secrets before anything else
for env_var in \
PAPERLESS_DBUSER \
PAPERLESS_DBPASS \
PAPERLESS_SECRET_KEY \
PAPERLESS_AUTO_LOGIN_USERNAME \
PAPERLESS_ADMIN_USER \
PAPERLESS_ADMIN_MAIL \
PAPERLESS_ADMIN_PASSWORD \
PAPERLESS_REDIS; do
# Check for a version of this var with _FILE appended
# and convert the contents to the env var value
file_env ${env_var}
done
# Change the user and group IDs if needed
map_uidgid
# Check for overrides of certain folders
map_folders
local -r export_dir="/usr/src/paperless/export"
for dir in \
"${export_dir}" \
"${DATA_DIR}" "${DATA_DIR}/index" \
"${MEDIA_ROOT_DIR}" "${MEDIA_ROOT_DIR}/documents" "${MEDIA_ROOT_DIR}/documents/originals" "${MEDIA_ROOT_DIR}/documents/thumbnails" \
"${CONSUME_DIR}"; do
if [[ ! -d "${dir}" ]]; then
echo "Creating directory ${dir}"
mkdir "${dir}"
fi
done
local -r tmp_dir="/tmp/paperless"
echo "Creating directory ${tmp_dir}"
mkdir -p "${tmp_dir}"
nltk_data
set +e
echo "Adjusting permissions of paperless files. This may take a while."
chown -R paperless:paperless ${tmp_dir}
for dir in \
"${export_dir}" \
"${DATA_DIR}" \
"${MEDIA_ROOT_DIR}" \
"${CONSUME_DIR}"; do
find "${dir}" -not \( -user paperless -and -group paperless \) -exec chown paperless:paperless {} +
done
set -e
"${gosu_cmd[@]}" /sbin/docker-prepare.sh
}
install_languages() {
echo "Installing languages..."
read -ra langs <<<"$1"
# Check that it is not empty
if [ ${#langs[@]} -eq 0 ]; then
return
fi
apt-get update
for lang in "${langs[@]}"; do
pkg="tesseract-ocr-$lang"
# English is installed by default
#if [[ "$lang" == "eng" ]]; then
# continue
#fi
if dpkg -s "$pkg" &>/dev/null; then
echo "Package $pkg already installed!"
continue
fi
if ! apt-cache show "$pkg" &>/dev/null; then
echo "Package $pkg not found! :("
continue
fi
echo "Installing package $pkg..."
if ! apt-get -y install "$pkg" &>/dev/null; then
echo "Could not install $pkg"
exit 1
fi
done
}
echo "Paperless-ngx docker container starting..."
gosu_cmd=(gosu paperless)
if [ "$(id -u)" == "$(id -u paperless)" ]; then
gosu_cmd=()
fi
# Install additional languages if specified
if [[ -n "$PAPERLESS_OCR_LANGUAGES" ]]; then
install_languages "$PAPERLESS_OCR_LANGUAGES"
fi
initialize
if [[ "$1" != "/"* ]]; then
echo Executing management command "$@"
exec "${gosu_cmd[@]}" python3 manage.py "$@"
else
echo Executing "$@"
exec "$@"
fi
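The file_env() helper above means every variable in the initialize() list can also be supplied via a *_FILE companion. A hedged sketch, using a locally mounted file in place of a real Docker secret and an example password of 'supersecret' (the rest of the stack is omitted here):
echo 'supersecret' > ./dbpass.txt
docker run -d \
  -e PAPERLESS_DBPASS_FILE=/run/secrets/dbpass \
  -v "$(pwd)/dbpass.txt:/run/secrets/dbpass:ro" \
  ghcr.io/paperless-ngx/paperless-ngx:latest
# on startup the entrypoint exports PAPERLESS_DBPASS from /run/secrets/dbpass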

View File

@@ -1,152 +0,0 @@
#!/usr/bin/env bash
set -e
wait_for_postgres() {
local attempt_num=1
local -r max_attempts=5
echo "Waiting for PostgreSQL to start..."
local -r host="${PAPERLESS_DBHOST:-localhost}"
local -r port="${PAPERLESS_DBPORT:-5432}"
# Disable warning, host and port can't have spaces
# shellcheck disable=SC2086
while ! pg_isready -h ${host} -p ${port} > /dev/null; do
if [ $attempt_num -eq $max_attempts ]; then
echo "Unable to connect to database."
exit 1
else
echo "Attempt $attempt_num failed! Trying again in 5 seconds..."
fi
attempt_num=$(("$attempt_num" + 1))
sleep 5
done
}
wait_for_mariadb() {
echo "Waiting for MariaDB to start..."
local -r host="${PAPERLESS_DBHOST:=localhost}"
local -r port="${PAPERLESS_DBPORT:=3306}"
local attempt_num=1
local -r max_attempts=5
while ! true > /dev/tcp/$host/$port; do
if [ $attempt_num -eq $max_attempts ]; then
echo "Unable to connect to database."
exit 1
else
echo "Attempt $attempt_num failed! Trying again in 5 seconds..."
fi
attempt_num=$(("$attempt_num" + 1))
sleep 5
done
}
wait_for_redis() {
# We use a Python script to send the Redis ping
# instead of installing redis-tools just for 1 thing
if ! python3 /sbin/wait-for-redis.py; then
exit 1
fi
}
migrations() {
(
# flock is in place to prevent multiple containers from doing migrations
# simultaneously. This also ensures that the db is ready when the command
# of the current container starts.
flock 200
echo "Apply database migrations..."
python3 manage.py migrate
) 200>"${DATA_DIR}/migration_lock"
}
search_index() {
local -r index_version=1
local -r index_version_file=${DATA_DIR}/.index_version
if [[ (! -f "${index_version_file}") || $(<"${index_version_file}") != "$index_version" ]]; then
echo "Search index out of date. Updating..."
python3 manage.py document_index reindex --no-progress-bar
echo ${index_version} | tee "${index_version_file}" >/dev/null
fi
}
superuser() {
if [[ -n "${PAPERLESS_ADMIN_USER}" ]]; then
python3 manage.py manage_superuser
fi
}
custom_container_init() {
# Mostly borrowed from the LinuxServer.io base image
# https://github.com/linuxserver/docker-baseimage-ubuntu/tree/bionic/root/etc/cont-init.d
local -r custom_script_dir="/custom-cont-init.d"
# Tamper checking.
# Don't run files which are owned by anyone except root
# Don't run files which are writeable by others
if [ -d "${custom_script_dir}" ]; then
if [ -n "$(/usr/bin/find "${custom_script_dir}" -maxdepth 1 ! -user root)" ]; then
echo "**** Potential tampering with custom scripts detected ****"
echo "**** The folder '${custom_script_dir}' must be owned by root ****"
return 0
fi
if [ -n "$(/usr/bin/find "${custom_script_dir}" -maxdepth 1 -perm -o+w)" ]; then
echo "**** The folder '${custom_script_dir}' or some of contents have write permissions for others, which is a security risk. ****"
echo "**** Please review the permissions and their contents to make sure they are owned by root, and can only be modified by root. ****"
return 0
fi
# Make sure custom init directory has files in it
if [ -n "$(/bin/ls -A "${custom_script_dir}" 2>/dev/null)" ]; then
echo "[custom-init] files found in ${custom_script_dir} executing"
# Loop over files in the directory
for SCRIPT in "${custom_script_dir}"/*; do
NAME="$(basename "${SCRIPT}")"
if [ -f "${SCRIPT}" ]; then
echo "[custom-init] ${NAME}: executing..."
/bin/bash "${SCRIPT}"
echo "[custom-init] ${NAME}: exited $?"
elif [ ! -f "${SCRIPT}" ]; then
echo "[custom-init] ${NAME}: is not a file"
fi
done
else
echo "[custom-init] no custom files found exiting..."
fi
fi
}
do_work() {
if [[ "${PAPERLESS_DBENGINE}" == "mariadb" ]]; then
wait_for_mariadb
elif [[ -n "${PAPERLESS_DBHOST}" ]]; then
wait_for_postgres
fi
wait_for_redis
migrations
search_index
superuser
# Leave this last thing
custom_container_init
}
do_work
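To satisfy the tamper checks in custom_container_init() above, a custom script has to be owned by root and must not be writable by others before it is mounted at /custom-cont-init.d. A minimal sketch (the script name and its contents are examples only):
mkdir -p custom-cont-init.d
cat > custom-cont-init.d/10-log-start.sh <<'EOF'
#!/usr/bin/env bash
# runs once at container start, before the main services come up
echo "custom-init: container started at $(date)"
EOF
sudo chown root:root custom-cont-init.d/10-log-start.sh
sudo chmod 755 custom-cont-init.d/10-log-start.sh
# then mount the folder read-only, e.g. with
#   -v "$(pwd)/custom-cont-init.d:/custom-cont-init.d:ro"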

View File

@@ -1,7 +0,0 @@
#!/usr/bin/env bash
echo "Checking if we should start flower..."
if [[ -n "${PAPERLESS_ENABLE_FLOWER}" ]]; then
celery --app paperless flower
fi
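A sketch of how this hook is typically enabled: add the variable to docker-compose.env and recreate the webserver container. Flower's default listening port is 5555, which the stock compose files do not publish, so add a port mapping yourself if the dashboard should be reachable from the host.
echo 'PAPERLESS_ENABLE_FLOWER=1' >> docker-compose.env
docker-compose up -d --force-recreate webserver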

View File

@@ -1,96 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE policymap [
<!ELEMENT policymap (policy)+>
<!ATTLIST policymap xmlns CDATA #FIXED ''>
<!ELEMENT policy EMPTY>
<!ATTLIST policy xmlns CDATA #FIXED '' domain NMTOKEN #REQUIRED
name NMTOKEN #IMPLIED pattern CDATA #IMPLIED rights NMTOKEN #IMPLIED
stealth NMTOKEN #IMPLIED value CDATA #IMPLIED>
]>
<!--
Configure ImageMagick policies.
Domains include system, delegate, coder, filter, path, or resource.
Rights include none, read, write, execute and all. Use | to combine them,
for example: "read | write" to permit read from, or write to, a path.
Use a glob expression as a pattern.
Suppose we do not want users to process MPEG video images:
<policy domain="delegate" rights="none" pattern="mpeg:decode" />
Here we do not want users reading images from HTTP:
<policy domain="coder" rights="none" pattern="HTTP" />
The /repository file system is restricted to read only. We use a glob
expression to match all paths that start with /repository:
<policy domain="path" rights="read" pattern="/repository/*" />
Lets prevent users from executing any image filters:
<policy domain="filter" rights="none" pattern="*" />
Any large image is cached to disk rather than memory:
<policy domain="resource" name="area" value="1GP"/>
Define arguments for the memory, map, area, width, height and disk resources
with SI prefixes (e.g. 100MB). In addition, resource policies are maximums
for each instance of ImageMagick (e.g. policy memory limit 1GB, -limit 2GB
exceeds policy maximum so memory limit is 1GB).
Rules are processed in order. Here we want to restrict ImageMagick to only
read or write a small subset of proven web-safe image types:
<policy domain="delegate" rights="none" pattern="*" />
<policy domain="filter" rights="none" pattern="*" />
<policy domain="coder" rights="none" pattern="*" />
<policy domain="coder" rights="read|write" pattern="{GIF,JPEG,PNG,WEBP}" />
-->
<policymap>
<!-- <policy domain="system" name="shred" value="2"/> -->
<!-- <policy domain="system" name="precision" value="6"/> -->
<!-- <policy domain="system" name="memory-map" value="anonymous"/> -->
<!-- <policy domain="system" name="max-memory-request" value="256MiB"/> -->
<!-- <policy domain="resource" name="temporary-path" value="/tmp"/> -->
<policy domain="resource" name="memory" value="256MiB"/>
<policy domain="resource" name="map" value="512MiB"/>
<policy domain="resource" name="width" value="16KP"/>
<policy domain="resource" name="height" value="16KP"/>
<!-- <policy domain="resource" name="list-length" value="128"/> -->
<policy domain="resource" name="area" value="128MB"/>
<policy domain="resource" name="disk" value="1GiB"/>
<!-- <policy domain="resource" name="file" value="768"/> -->
<!-- <policy domain="resource" name="thread" value="4"/> -->
<!-- <policy domain="resource" name="throttle" value="0"/> -->
<!-- <policy domain="resource" name="time" value="3600"/> -->
<!-- <policy domain="coder" rights="none" pattern="MVG" /> -->
<!-- <policy domain="module" rights="none" pattern="{PS,PDF,XPS}" /> -->
<!-- <policy domain="delegate" rights="none" pattern="HTTPS" /> -->
<!-- <policy domain="path" rights="none" pattern="@*" /> -->
<!-- <policy domain="cache" name="memory-map" value="anonymous"/> -->
<!-- <policy domain="cache" name="synchronize" value="True"/> -->
<!-- <policy domain="cache" name="shared-secret" value="passphrase" stealth="true"/> -->
<!-- <policy domain="system" name="pixel-cache-memory" value="anonymous"/> -->
<!-- <policy domain="system" name="shred" value="2"/> -->
<!-- <policy domain="system" name="precision" value="6"/> -->
<!-- not needed, since MVG has to be requested explicitly: -->
<!-- <policy domain="delegate" rights="none" pattern="MVG" /> -->
<!-- use curl -->
<policy domain="delegate" rights="none" pattern="URL" />
<policy domain="delegate" rights="none" pattern="HTTPS" />
<policy domain="delegate" rights="none" pattern="HTTP" />
<!-- in order to avoid to get image with password text -->
<policy domain="path" rights="none" pattern="@*"/>
<!-- disable ghostscript format types -->
<policy domain="coder" rights="none" pattern="PS" />
<policy domain="coder" rights="none" pattern="PS2" />
<policy domain="coder" rights="none" pattern="PS3" />
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="read|write" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />
</policymap>

View File

@@ -1,21 +0,0 @@
#!/usr/bin/env bash
set -eu
for command in decrypt_documents \
document_archiver \
document_exporter \
document_importer \
mail_fetcher \
document_create_classifier \
document_index \
document_renamer \
document_retagger \
document_thumbnails \
document_sanity_checker \
manage_superuser;
do
echo "installing $command..."
sed "s/management_command/$command/g" management_script.sh > /usr/local/bin/$command
chmod +x /usr/local/bin/$command
done
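Each wrapper generated above simply forwards its arguments to python3 manage.py with the command name substituted in (see the template script in the next file), so once the stack is up they can be invoked directly; paths resolve relative to /usr/src/paperless/src. For example:
docker-compose exec webserver document_exporter ../export
docker-compose exec webserver document_sanity_checker --help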

View File

@@ -1,15 +0,0 @@
#!/usr/bin/env bash
set -e
cd /usr/src/paperless/src/
if [[ $(id -u) == 0 ]] ;
then
gosu paperless python3 manage.py management_command "$@"
elif [[ $(id -un) == "paperless" ]] ;
then
python3 manage.py management_command "$@"
else
echo "Unknown user."
fi

View File

@@ -1,15 +0,0 @@
#!/usr/bin/env bash
rootless_args=()
if [ "$(id -u)" == "$(id -u paperless)" ]; then
rootless_args=(
--user
paperless
--logfile
supervisord.log
--pidfile
supervisord.pid
)
fi
exec /usr/local/bin/supervisord -c /etc/supervisord.conf "${rootless_args[@]}"

View File

@@ -1,60 +0,0 @@
[supervisord]
nodaemon=true ; start in foreground if true; default false
logfile=/var/log/supervisord/supervisord.log ; main log file; default $CWD/supervisord.log
pidfile=/var/run/supervisord/supervisord.pid ; supervisord pidfile; default supervisord.pid
logfile_maxbytes=50MB ; max main logfile bytes b4 rotation; default 50MB
logfile_backups=10 ; # of main logfile backups; 0 means none, default 10
loglevel=info ; log level; default info; others: debug,warn,trace
user=root
[program:gunicorn]
command=gunicorn -c /usr/src/paperless/gunicorn.conf.py paperless.asgi:application
user=paperless
priority = 1
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:consumer]
command=python3 manage.py document_consumer
user=paperless
stopsignal=INT
priority = 20
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:celery]
command = celery --app paperless worker --loglevel INFO
user=paperless
stopasgroup = true
stopwaitsecs = 60
priority = 5
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:celery-beat]
command = celery --app paperless beat --loglevel INFO
user=paperless
stopasgroup = true
priority = 10
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:celery-flower]
command = /usr/local/bin/flower-conditional.sh
user = paperless
startsecs = 0
priority = 40
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

View File

@@ -1,44 +0,0 @@
#!/usr/bin/env python3
"""
Simple script which attempts to ping the Redis broker as set in the environment for
a certain number of times, waiting a little bit in between
"""
import os
import sys
import time
from typing import Final
from redis import Redis
if __name__ == "__main__":
MAX_RETRY_COUNT: Final[int] = 5
RETRY_SLEEP_SECONDS: Final[int] = 5
REDIS_URL: Final[str] = os.getenv("PAPERLESS_REDIS", "redis://localhost:6379")
print(f"Waiting for Redis...", flush=True)
attempt = 0
with Redis.from_url(url=REDIS_URL) as client:
while attempt < MAX_RETRY_COUNT:
try:
client.ping()
break
except Exception as e:
print(
f"Redis ping #{attempt} failed.\n"
f"Error: {str(e)}.\n"
f"Waiting {RETRY_SLEEP_SECONDS}s",
flush=True,
)
time.sleep(RETRY_SLEEP_SECONDS)
attempt += 1
if attempt >= MAX_RETRY_COUNT:
print(f"Failed to connect to redis using environment variable PAPERLESS_REDIS.")
sys.exit(os.EX_UNAVAILABLE)
else:
print(f"Connected to Redis broker.")
sys.exit(os.EX_OK)
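The script is driven entirely by PAPERLESS_REDIS and its exit code, which is how docker-prepare.sh consumes it; a stand-alone invocation would look roughly like this:
PAPERLESS_REDIS=redis://broker:6379 python3 /sbin/wait-for-redis.py \
  && echo "broker reachable" \
  || echo "broker not reachable"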

18
docs/Dockerfile Normal file
View File

@@ -0,0 +1,18 @@
FROM python:3.5.1
MAINTAINER Pit Kleyersburg <pitkley@googlemail.com>
# Install Sphinx and Pygments
RUN pip install Sphinx Pygments
# Setup directories, copy data
RUN mkdir /build
COPY . /build
WORKDIR /build/docs
# Build documentation
RUN make html
# Start webserver
WORKDIR /build/docs/_build/html
EXPOSE 8000/tcp
CMD ["python3", "-m", "http.server"]
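Because the Dockerfile copies the whole repository into /build, the image has to be built with the repository root as context; an illustrative build-and-serve sequence (the image tag is an example):
docker build -t paperless-docs -f docs/Dockerfile .
docker run --rm -p 8000:8000 paperless-docs   # python3 -m http.server serves on port 8000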

View File

@@ -24,7 +24,6 @@ I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " livehtml to preview changes with live reload in your browser"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@@ -55,9 +54,6 @@ html:
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
livehtml:
sphinx-autobuild "./" "$(BUILDDIR)" $(O)
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo

View File

@@ -1,605 +0,0 @@
/* Variables */
:root {
--color-text-body: #5c5962;
--color-text-body-light: #fcfcfc;
--color-text-anchor: #7253ed;
--color-text-alt: rgba(0, 0, 0, 0.3);
--color-text-title: #27262b;
--color-text-code-inline: #e74c3c;
--color-text-code-nt: #062873;
--color-text-selection: #b19eff;
--color-bg-body: #fcfcfc;
--color-bg-body-alt: #f3f6f6;
--color-bg-side-nav: #f5f6fa;
--color-bg-side-nav-hover: #ebedf5;
--color-bg-code-block: var(--color-bg-side-nav);
--color-border: #eeebee;
--color-btn-neutral-bg: #f3f6f6;
--color-btn-neutral-bg-hover: #e5ebeb;
--color-success-title: #1abc9c;
--color-success-body: #dbfaf4;
--color-warning-title: #f0b37e;
--color-warning-body: #ffedcc;
--color-danger-title: #f29f97;
--color-danger-body: #fdf3f2;
--color-info-title: #6ab0de;
--color-info-body: #e7f2fa;
}
.dark-mode {
--color-text-body: #abb2bf;
--color-text-body-light: #9499a2;
--color-text-alt: rgba(255, 255, 255, 0.5);
--color-text-title: var(--color-text-anchor);
--color-text-code-inline: #abb2bf;
--color-text-code-nt: #2063f3;
--color-text-selection: #030303;
--color-bg-body: #1d1d20 !important;
--color-bg-body-alt: #131315;
--color-bg-side-nav: #18181a;
--color-bg-side-nav-hover: #101216;
--color-bg-code-block: #101216;
--color-border: #47494f;
--color-btn-neutral-bg: #242529;
--color-btn-neutral-bg-hover: #101216;
--color-success-title: #02120f;
--color-success-body: #041b17;
--color-warning-title: #1b0e03;
--color-warning-body: #371d06;
--color-danger-title: #120902;
--color-danger-body: #1b0503;
--color-info-title: #020608;
--color-info-body: #06141e;
}
* {
transition: background-color 0.3s ease, border-color 0.3s ease;
}
/* Typography */
body {
font-family: system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif;
font-size: inherit;
line-height: 1.4;
color: var(--color-text-body);
}
.rst-content p {
word-break: break-word;
}
h1, h2, h3, h4, h5, h6 {
font-family: inherit;
}
.rst-content .toctree-wrapper>p.caption, .rst-content h1, .rst-content h2, .rst-content h3, .rst-content h4, .rst-content h5, .rst-content h6 {
padding-top: .5em;
}
p, .main-content-wrap, .rst-content .section ul, .rst-content .toctree-wrapper ul, .rst-content section ul, .wy-plain-list-disc, article ul {
line-height: 1.6;
}
pre, .code, .rst-content .linenodiv pre, .rst-content div[class^=highlight] pre, .rst-content pre.literal-block {
font-family: "SFMono-Regular", Menlo,Consolas, Monospace;
font-size: 0.75em;
line-height: 1.8;
}
.wy-menu-vertical li.toctree-l3,.wy-menu-vertical li.toctree-l4 {
font-size: 1rem
}
.rst-versions {
font-family: inherit;
line-height: 1;
}
footer, footer p {
font-size: .8rem;
}
footer .rst-footer-buttons {
font-size: 1rem;
}
@media (max-width: 400px) {
/* break code lines on mobile */
pre, code {
word-break: break-word;
}
}
/* Layout */
.wy-side-nav-search, .wy-menu-vertical {
width: auto;
}
.wy-nav-side {
z-index: 0;
display: flex;
flex-wrap: wrap;
background-color: var(--color-bg-side-nav);
}
.wy-side-scroll {
width: 100%;
overflow-y: auto;
}
@media (min-width: 66.5rem) {
.wy-side-scroll {
width:264px
}
}
@media (min-width: 50rem) {
.wy-nav-side {
flex-wrap: nowrap;
position: fixed;
width: 248px;
height: 100%;
flex-direction: column;
border-right: 1px solid var(--color-border);
align-items:flex-end
}
}
@media (min-width: 66.5rem) {
.wy-nav-side {
width: calc((100% - 1064px) / 2 + 264px);
min-width:264px
}
}
@media (min-width: 50rem) {
.wy-nav-content-wrap {
position: relative;
max-width: 800px;
margin-left:248px
}
}
@media (min-width: 66.5rem) {
.wy-nav-content-wrap {
margin-left:calc((100% - 1064px) / 2 + 264px)
}
}
/* Colors */
body.wy-body-for-nav,
.wy-nav-content {
background: var(--color-bg-body);
}
.wy-nav-side {
border-right: 1px solid var(--color-border);
}
.wy-side-nav-search, .wy-nav-top {
background: var(--color-bg-side-nav);
border-bottom: 1px solid var(--color-border);
}
.wy-nav-content-wrap {
background: inherit;
}
.wy-side-nav-search > a, .wy-nav-top a, .wy-nav-top i {
color: var(--color-text-title);
}
.wy-side-nav-search > a:hover, .wy-nav-top a:hover {
background: transparent;
}
.wy-side-nav-search > div.version {
color: var(--color-text-alt)
}
.wy-side-nav-search > div[role="search"] {
border-top: 1px solid var(--color-border);
}
.wy-menu-vertical li.toctree-l2.current>a, .wy-menu-vertical li.toctree-l2.current li.toctree-l3>a,
.wy-menu-vertical li.toctree-l3.current>a, .wy-menu-vertical li.toctree-l3.current li.toctree-l4>a {
background: var(--color-bg-side-nav);
}
.rst-content .highlighted {
background: #eedd85;
box-shadow: 0 0 0 2px #eedd85;
font-weight: 600;
}
.wy-side-nav-search input[type=text],
html.writer-html5 .rst-content table.docutils th {
color: var(--color-text-body);
}
.rst-content table.docutils:not(.field-list) tr:nth-child(2n-1) td,
.wy-table-backed,
.wy-table-odd td,
.wy-table-striped tr:nth-child(2n-1) td {
background-color: var(--color-bg-body-alt);
}
.rst-content table.docutils,
.wy-table-bordered-all,
html.writer-html5 .rst-content table.docutils th,
.rst-content table.docutils td,
.wy-table-bordered-all td,
hr {
border-color: var(--color-border) !important;
}
::selection {
background: var(--color-text-selection);
}
/* Ridiculous rules are taken from sphinx_rtd */
.rst-content .admonition-title,
.wy-alert-title {
color: var(--color-text-body-light);
}
.rst-content .hint,
.rst-content .important,
.rst-content .tip,
.rst-content .wy-alert-success,
.wy-alert.wy-alert-success {
background: var(--color-success-body);
}
.rst-content .hint .admonition-title,
.rst-content .hint .wy-alert-title,
.rst-content .important .admonition-title,
.rst-content .important .wy-alert-title,
.rst-content .tip .admonition-title,
.rst-content .tip .wy-alert-title,
.rst-content .wy-alert-success .admonition-title,
.rst-content .wy-alert-success .wy-alert-title,
.wy-alert.wy-alert-success .rst-content .admonition-title,
.wy-alert.wy-alert-success .wy-alert-title {
background-color: var(--color-success-title);
}
.rst-content .admonition-todo,
.rst-content .attention,
.rst-content .caution,
.rst-content .warning,
.rst-content .wy-alert-warning,
.wy-alert.wy-alert-warning {
background: var(--color-warning-body);
}
.rst-content .admonition-todo .admonition-title,
.rst-content .admonition-todo .wy-alert-title,
.rst-content .attention .admonition-title,
.rst-content .attention .wy-alert-title,
.rst-content .caution .admonition-title,
.rst-content .caution .wy-alert-title,
.rst-content .warning .admonition-title,
.rst-content .warning .wy-alert-title,
.rst-content .wy-alert-warning .admonition-title,
.rst-content .wy-alert-warning .wy-alert-title,
.rst-content .wy-alert.wy-alert-warning .admonition-title,
.wy-alert.wy-alert-warning .rst-content .admonition-title,
.wy-alert.wy-alert-warning .wy-alert-title {
background: var(--color-warning-title);
}
.rst-content .danger,
.rst-content .error,
.rst-content .wy-alert-danger,
.wy-alert.wy-alert-danger {
background: var(--color-danger-body);
}
.rst-content .danger .admonition-title,
.rst-content .danger .wy-alert-title,
.rst-content .error .admonition-title,
.rst-content .error .wy-alert-title,
.rst-content .wy-alert-danger .admonition-title,
.rst-content .wy-alert-danger .wy-alert-title,
.wy-alert.wy-alert-danger .rst-content .admonition-title,
.wy-alert.wy-alert-danger .wy-alert-title {
background: var(--color-danger-title);
}
.rst-content .note,
.rst-content .seealso,
.rst-content .wy-alert-info,
.wy-alert.wy-alert-info {
background: var(--color-info-body);
}
.rst-content .note .admonition-title,
.rst-content .note .wy-alert-title,
.rst-content .seealso .admonition-title,
.rst-content .seealso .wy-alert-title,
.rst-content .wy-alert-info .admonition-title,
.rst-content .wy-alert-info .wy-alert-title,
.wy-alert.wy-alert-info .rst-content .admonition-title,
.wy-alert.wy-alert-info .wy-alert-title {
background: var(--color-info-title);
}
/* Links */
a, a:visited,
.wy-menu-vertical a,
a.icon.icon-home,
.wy-menu-vertical li.toctree-l1.current > a.current {
color: var(--color-text-anchor);
text-decoration: none;
}
a:hover, .wy-breadcrumbs-aside a {
color: var(--color-text-anchor); /* reset */
}
.rst-versions a, .rst-versions .rst-current-version {
color: var(--color-text-anchor);
}
.wy-nav-content a.reference, .wy-nav-content a:not([class]) {
background-image: linear-gradient(var(--color-border) 0%, var(--color-border) 100%);
background-repeat: repeat-x;
background-position: 0 100%;
background-size: 1px 1px;
}
.wy-nav-content a.reference:hover, .wy-nav-content a:not([class]):hover {
background-image: linear-gradient(rgba(114,83,237,0.45) 0%, rgba(114,83,237,0.45) 100%);
background-size: 1px 1px;
}
.wy-menu-vertical a:hover,
.wy-menu-vertical li.current a:hover,
.wy-menu-vertical a:active {
background: var(--color-bg-side-nav-hover) !important;
color: var(--color-text-body);
}
.wy-menu-vertical li.toctree-l1.current>a,
.wy-menu-vertical li.current>a,
.wy-menu-vertical li.on a {
background-color: var(--color-bg-side-nav-hover);
border: none;
font-weight: normal;
}
.wy-menu-vertical li.current {
background-color: inherit;
}
.wy-menu-vertical li.current a {
border-right: none;
}
.wy-menu-vertical li.toctree-l2 a,
.wy-menu-vertical li.toctree-l3 a,
.wy-menu-vertical li.toctree-l4 a,
.wy-menu-vertical li.toctree-l5 a,
.wy-menu-vertical li.toctree-l6 a,
.wy-menu-vertical li.toctree-l7 a,
.wy-menu-vertical li.toctree-l8 a,
.wy-menu-vertical li.toctree-l9 a,
.wy-menu-vertical li.toctree-l10 a {
color: var(--color-text-body);
}
a.image-reference, a.image-reference:hover {
background: none !important;
}
a.image-reference img {
cursor: zoom-in;
}
/* Code blocks */
.rst-content code, .rst-content tt, code {
padding: 0.25em;
font-weight: 400;
background-color: var(--color-bg-code-block);
border: 1px solid var(--color-border);
border-radius: 4px;
}
.rst-content div[class^=highlight], .rst-content pre.literal-block {
padding: 0.7rem;
margin-top: 0;
margin-bottom: 0.75rem;
overflow-x: auto;
background-color: var(--color-bg-side-nav);
border-color: var(--color-border);
border-radius: 4px;
box-shadow: none;
}
.rst-content .admonition-title,
.rst-content div.admonition,
.wy-alert-title {
padding: 10px 12px;
border-top-left-radius: 4px;
border-top-right-radius: 4px;
}
.highlight .go {
color: inherit;
}
.highlight .nt {
color: var(--color-text-code-nt);
}
.rst-content code.literal,
.rst-content tt.literal,
html.writer-html5 .rst-content dl.footnote code {
border-color: var(--color-border);
background-color: var(--color-border);
color: var(--color-text-code-inline)
}
/* Search */
.wy-side-nav-search input[type=text] {
border: none;
border-radius: 0;
background-color: transparent;
font-family: inherit;
font-size: .85rem;
box-shadow: none;
padding: .7rem 1rem .7rem 2.8rem;
margin: 0;
}
#rtd-search-form {
position: relative;
}
#rtd-search-form:before {
font: normal normal normal 14px/1 FontAwesome;
font-size: inherit;
text-rendering: auto;
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
content: "\f002";
color: var(--color-text-alt);
position: absolute;
left: 1.5rem;
top: .7rem;
}
/* Side nav */
.wy-side-nav-search {
padding: 1rem 0 0 0;
}
.wy-menu-vertical li a button.toctree-expand {
float: right;
margin-right: -1.5em;
padding: 0 .5em;
}
.wy-menu-vertical a,
.wy-menu-vertical li.current>a,
.wy-menu-vertical li.current li>a {
padding-right: 1.5em !important;
}
.wy-menu-vertical li.current li>a.current {
font-weight: 600;
}
/* Misc spacing */
.rst-content .admonition-title, .wy-alert-title {
padding: 10px 12px;
}
/* Buttons */
.btn {
display: inline-block;
box-sizing: border-box;
padding: 0.3em 1em;
margin: 0;
font-family: inherit;
font-size: inherit;
font-weight: 500;
line-height: 1.5;
color: var(--color-text-anchor);
text-decoration: none;
vertical-align: baseline;
background-color: #f7f7f7;
border-width: 0;
border-radius: 4px;
box-shadow: 0 1px 2px rgba(0,0,0,0.12),0 3px 10px rgba(0,0,0,0.08);
appearance: none;
}
.btn:active {
padding: 0.3em 1em;
}
.rst-content .btn:focus {
outline: 1px solid #ccc;
}
.rst-content .btn-neutral, .rst-content .btn span.fa {
color: var(--color-text-body) !important;
}
.btn-neutral {
background-color: var(--color-btn-neutral-bg) !important;
color: var(--color-btn-neutral-text) !important;
border: 1px solid var(--color-btn-neutral-bg);
}
.btn:hover, .btn-neutral:hover {
background-color: var(--color-btn-neutral-bg-hover) !important;
}
/* Icon overrides */
.wy-side-nav-search a.icon-home:before {
display: none;
}
.fa-minus-square-o:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before {
content: "\f106"; /* fa-angle-up */
}
.fa-plus-square-o:before, .wy-menu-vertical li button.toctree-expand:before {
content: "\f107"; /* fa-angle-down */
}
/* Misc */
.wy-nav-top {
line-height: 36px;
}
.wy-nav-top > i {
font-size: 24px;
padding: 8px 0 0 2px;
color: var(--color-text-anchor);
}
.rst-content table.docutils td,
.rst-content table.docutils th,
.rst-content table.field-list td,
.rst-content table.field-list th,
.wy-table td,
.wy-table th {
padding: 8px 14px;
}
.dark-mode-toggle {
position: absolute;
top: 14px;
right: 12px;
height: 20px;
width: 24px;
z-index: 10;
border: none;
background-color: transparent;
color: inherit;
opacity: 0.7;
}
.wy-nav-content-wrap {
z-index: 20;
}
.rst-content .toctree-wrapper {
display: none;
}
.redirect-notice {
font-size: 2.5rem;
}

14
docs/_static/custom.css vendored Normal file
View File

@@ -0,0 +1,14 @@
/* override table width restrictions */
@media screen and (min-width: 767px) {
.wy-table-responsive table td {
/* !important prevents the common CSS stylesheets from
overriding this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}
.wy-table-responsive {
overflow: visible !important;
}
}

View File

@@ -1,47 +0,0 @@
let toggleButton
let icon
function load() {
'use strict'
toggleButton = document.createElement('button')
toggleButton.setAttribute('title', 'Toggle dark mode')
toggleButton.classList.add('dark-mode-toggle')
icon = document.createElement('i')
icon.classList.add('fa', darkModeState ? 'fa-sun-o' : 'fa-moon-o')
toggleButton.appendChild(icon)
document.body.prepend(toggleButton)
// Listen for changes in the OS settings
// addListener is used because older versions of Safari don't support addEventListener
// prefersDarkQuery set in <head>
if (prefersDarkQuery) {
prefersDarkQuery.addListener(function (evt) {
toggleDarkMode(evt.matches)
})
}
// Initial setting depending on the prefers-color-mode or localstorage
// darkModeState should be set in the document <head> to prevent flash
if (darkModeState == undefined) darkModeState = false
toggleDarkMode(darkModeState)
// Toggles the "dark-mode" class on click and sets localStorage state
toggleButton.addEventListener('click', () => {
darkModeState = !darkModeState
toggleDarkMode(darkModeState)
localStorage.setItem('dark-mode', darkModeState)
})
}
function toggleDarkMode(state) {
document.documentElement.classList.toggle('dark-mode', state)
document.documentElement.classList.toggle('light-mode', !state)
icon.classList.remove('fa-sun-o')
icon.classList.remove('fa-moon-o')
icon.classList.add(state ? 'fa-sun-o' : 'fa-moon-o')
darkModeState = state
}
document.addEventListener('DOMContentLoaded', load)

Binary file not shown.

Before

Width:  |  Height:  |  Size: 67 KiB

BIN
docs/_static/screenshot.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 445 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 661 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 457 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 436 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 462 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 608 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 698 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 706 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 480 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 680 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 686 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 848 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 703 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 96 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 388 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 26 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 54 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 517 KiB

View File

@@ -1,38 +0,0 @@
{% extends "!layout.html" %}
{% block extrahead %}
<script>
// MediaQueryList object
const prefersDarkQuery = window.matchMedia("(prefers-color-scheme: dark)");
const lsDark = localStorage.getItem("dark-mode");
let darkModeState = lsDark !== null ? lsDark == "true" : prefersDarkQuery.matches;
document.documentElement.classList.toggle("dark-mode", darkModeState);
document.documentElement.classList.toggle("light-mode", !darkModeState);
const RTD_TO_MKD = {
"index.html": "",
"setup.html": "setup",
"usage_overview.html": "usage",
"advanced_usage.html": "advanced_usage",
"administration.html": "administration",
"configuration.html": "configuration",
"api.html": "api",
"faq.html": "faq",
"troubleshooting.html": "troubleshooting",
"extending.html": "development",
"scanners.html": "",
"screenshots.html": "",
"changelog.html": "changelog",
}
const path = (RTD_TO_MKD[window.location.pathname.substring(window.location.pathname.lastIndexOf("/") + 1)] ?? "") + "/";
const hash = window.location.hash;
const redirectURL = new URL(path + hash, "https://docs.paperless-ngx.com/");
console.log(`Redirecting to ${redirectURL} in 3 seconds...`);
setTimeout(() => {
window.location.replace(redirectURL);
}, 3000);
</script>
{{ super() }}
{% endblock %}

View File

@@ -1,11 +0,0 @@
.. _administration:
**************
Administration
**************
.. cssclass:: redirect-notice
The Paperless-ngx documentation has permanently moved.
You will be redirected shortly...

View File

@@ -1,11 +0,0 @@
.. _advanced_usage:
***************
Advanced topics
***************
.. cssclass:: redirect-notice
The Paperless-ngx documentation has permanently moved.
You will be redirected shortly...

View File

@@ -1,12 +1,23 @@
.. _api:
************
The REST API
************
############
Paperless makes use of the `Django REST Framework`_ standard API interface
because of its inherent awesomeness. Conveniently, the system is also
self-documenting, so to learn more about the access points, schema, what's
accepted and what isn't, you need only visit ``/api`` on your local Paperless
installation.
.. _Django REST Framework: http://django-rest-framework.org/
.. cssclass:: redirect-notice
.. _api-uploading:
The Paperless-ngx documentation has permanently moved.
Uploading
---------
You will be redirected shortly...
File uploads in an API are hard and so far as I've been able to tell, there's
no standard way of accepting them, so rather than crowbar file uploads into the
REST API and endure that headache, I've left that process to a simple HTTP
POST, documented on the :ref:`consumption page <consumption-http>`.

View File

@@ -1,11 +1,159 @@
.. _changelog:
*********
Changelog
*********
#########
.. cssclass:: redirect-notice
* 0.3.2
* Fix for #172: defaulting ALLOWED_HOSTS to ``["*"]`` and allowing the user
to set her own value via ``PAPERLESS_ALLOWED_HOSTS`` should the need
arise.
The Paperless-ngx documentation has permanently moved.
* 0.3.1
* Added a default value for ``CONVERT_BINARY``
You will be redirected shortly...
* 0.3.0
* Updated to using django-filter 1.x
* Added some system checks so new users aren't confused by misconfigurations.
* Consumer loop time is now configurable for systems with slow writes. Just
set ``PAPERLESS_CONSUMER_LOOP_TIME`` to a number of seconds. The default
is 10.
* As per `#44`_, we've removed support for ``PAPERLESS_CONVERT``,
``PAPERLESS_CONSUME``, and ``PAPERLESS_SECRET``. Please use
``PAPERLESS_CONVERT_BINARY``, ``PAPERLESS_CONSUMPTION_DIR``, and
``PAPERLESS_SHARED_SECRET`` respectively instead.
* 0.2.0
* `#150`_: The media root is now a variable you can set in
``paperless.conf``.
* `#148`_: The database location (sqlite) is now a variable you can set in
``paperless.conf``.
* `#146`_: Fixed a bug that allowed unauthorised access to the `/fetch` URL.
* `#131`_: Document files are now automatically removed from disk when
they're deleted in Paperless.
* `#121`_: Fixed a bug where Paperless wasn't setting document creation time
based on the file naming scheme.
* `#81`_: Added a hook to run an arbitrary script after every document is
consumed.
* `#98`_: Added optional environment variables for ImageMagick so that it
doesn't explode when handling Very Large Documents or when it's just
running on a low-memory system. Thanks to `Florian Harr`_ for his help on
this one.
* `#89`_ Ported the auto-tagging code to correspondents as well. Thanks to
`Justin Snyman`_ for the pointers in the issue queue.
* Added support for guessing the date from the file name along with the
correspondent, title, and tags. Thanks to `Tikitu de Jager`_ for his pull
request that I took forever to merge and to `Pit`_ for his efforts on the
regex front.
* `#94`_: Restored support for changing the created date in the UI. Thanks
to `Martin Honermeyer`_ and `Tim White`_ for working with me on this.
* 0.1.1
* Potentially **Breaking Change**: All references to "sender" in the code
have been renamed to "correspondent" to better reflect the nature of the
property (one could quite reasonably scan a document before sending it to
someone.)
* `#67`_: Rewrote the document exporter and added a new importer that allows
for full metadata retention without depending on the file name and
modification time. A big thanks to `Tikitu de Jager`_, `Pit`_,
`Florian Jung`_, and `Christopher Luu`_ for their code snippets and
contributing conversation that led to this change.
* `#20`_: Added *unpaper* support to help in cleaning up the scanned image
before it's OCR'd. Thanks to `Pit`_ for this one.
* `#71`_ Added (encrypted) thumbnails in anticipation of a proper UI.
* `#68`_: Added support for using a proper config file at
``/etc/paperless.conf`` and modified the systemd unit files to use it.
* Refactored the Vagrant installation process to use environment variables
rather than asking the user to modify ``settings.py``.
* `#44`_: Harmonise environment variable names with constant names.
* `#60`_: Setup logging to actually use the Python native logging framework.
* `#53`_: Fixed an annoying bug that caused ``.jpeg`` and ``.JPG`` images
to be imported but made unavailable.
* 0.1.0
* Docker support! Big thanks to `Wayne Werner`_, `Brian Conn`_, and
`Tikitu de Jager`_ for this one, and especially to `Pit`_
who spearheaded this effort.
* A simple REST API is in place, but it should be considered unstable.
* Cleaned up the consumer to use temporary directories instead of a single
scratch space. (Thanks `Pit`_)
* Improved the efficiency of the consumer by parsing pages more intelligently
and introducing a threaded OCR process (thanks again `Pit`_).
* `#45`_: Cleaned up the logic for tag matching. Reported by `darkmatter`_.
* `#47`_: Auto-rotate landscape documents. Reported by `Paul`_ and fixed by
`Pit`_.
* `#48`_: Matching algorithms should do so on a word boundary (`darkmatter`_)
* `#54`_: Documented the re-tagger (`zedster`_)
* `#57`_: Make sure file is preserved on import failure (`darkmatter`_)
* Added tox with pep8 checking
* 0.0.6
* Added support for parallel OCR (significant work from `Pit`_)
* Sped up the language detection (significant work from `Pit`_)
* Added simple logging
* 0.0.5
* Added support for image files as documents (png, jpg, gif, tiff)
* Added a crude means of HTTP POST for document imports
* Added IMAP mail support
* Added a re-tagging utility
* Documentation for the above as well as data migration
* 0.0.4
* Added automated tagging based on keyword matching
* Cleaned up the document listing page
* Removed ``User`` and ``Group`` from the admin
* Added ``pytz`` to the list of requirements
* 0.0.3
* Added basic tagging
* 0.0.2
* Added language detection
* Added datestamps to ``document_exporter``.
* Changed ``settings.TESSERACT_LANGUAGE`` to ``settings.OCR_LANGUAGE``.
* 0.0.1
* Initial release
.. _Brian Conn: https://github.com/TheConnMan
.. _Christopher Luu: https://github.com/nuudles
.. _Florian Jung: https://github.com/the01
.. _Tikitu de Jager: https://github.com/tikitu
.. _Paul: https://github.com/polo2ro
.. _Pit: https://github.com/pitkley
.. _Wayne Werner: https://github.com/waynew
.. _darkmatter: https://github.com/darkmatter
.. _zedster: https://github.com/zedster
.. _Martin Honermeyer: https://github.com/djmaze
.. _Tim White: https://github.com/timwhite
.. _Florian Harr: https://github.com/evils
.. _Justin Snyman: https://github.com/stringlytyped
.. _#20: https://github.com/danielquinn/paperless/issues/20
.. _#44: https://github.com/danielquinn/paperless/issues/44
.. _#45: https://github.com/danielquinn/paperless/issues/45
.. _#47: https://github.com/danielquinn/paperless/issues/47
.. _#48: https://github.com/danielquinn/paperless/issues/48
.. _#53: https://github.com/danielquinn/paperless/issues/53
.. _#54: https://github.com/danielquinn/paperless/issues/54
.. _#57: https://github.com/danielquinn/paperless/issues/57
.. _#60: https://github.com/danielquinn/paperless/issues/60
.. _#67: https://github.com/danielquinn/paperless/issues/67
.. _#68: https://github.com/danielquinn/paperless/issues/68
.. _#71: https://github.com/danielquinn/paperless/issues/71
.. _#81: https://github.com/danielquinn/paperless/issues/81
.. _#89: https://github.com/danielquinn/paperless/issues/89
.. _#94: https://github.com/danielquinn/paperless/issues/94
.. _#98: https://github.com/danielquinn/paperless/issues/98
.. _#121: https://github.com/danielquinn/paperless/issues/121
.. _#131: https://github.com/danielquinn/paperless/issues/131
.. _#146: https://github.com/danielquinn/paperless/issues/146
.. _#148: https://github.com/danielquinn/paperless/pull/148
.. _#150: https://github.com/danielquinn/paperless/pull/150

View File

@@ -1,40 +1,64 @@
import sphinx_rtd_theme
# -*- coding: utf-8 -*-
#
# Paperless documentation build configuration file, created by
# sphinx-quickstart on Mon Oct 26 18:36:52 2015.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys
import os
__version__ = None
__full_version_str__ = None
__major_minor_version_str__ = None
exec(open("../src/paperless/version.py").read())
# Believe it or not, this is the officially sanctioned way to add custom CSS.
def setup(app):
app.add_stylesheet("custom.css")
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.intersphinx",
"sphinx.ext.todo",
"sphinx.ext.imgmath",
"sphinx.ext.viewcode",
"sphinx_rtd_theme",
"myst_parser",
'sphinx.ext.autodoc',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.pngmath',
'sphinx.ext.viewcode',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = {
".rst": "restructuredtext",
".md": "markdown",
}
source_suffix = '.rst'
# The encoding of source files.
# source_encoding = 'utf-8-sig'
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = "index"
master_doc = 'index'
# General information about the project.
project = "Paperless-ngx"
copyright = "2015-2022, Daniel Quinn, Jonas Winkler, and the paperless-ngx team"
project = u'Paperless'
copyright = u'2015, Daniel Quinn'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
@@ -47,190 +71,199 @@ copyright = "2015-2022, Daniel Quinn, Jonas Winkler, and the paperless-ngx team"
#
# The short X.Y version.
version = __major_minor_version_str__
version = ".".join([str(_) for _ in __version__[:2]])
# The full version, including alpha/beta/rc tags.
release = __full_version_str__
release = ".".join([str(_) for _ in __version__[:3]])
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
#today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ["_build"]
exclude_patterns = ['_build']
# The reST default role (used for this markup: `text`) to use for all
# documents.
# default_role = None
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []
#modindex_common_prefix = []
# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False
#keep_warnings = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = "sphinx_rtd_theme"
html_theme = 'default'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
# html_title = None
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
# html_short_title = None
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
# These paths are either relative to html_static_path
# or fully qualified paths (eg. https://...)
html_css_files = [
"css/custom.css",
]
html_js_files = [
"js/darkmode.js",
]
html_static_path = ['_static']
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
# html_extra_path = []
#html_extra_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# html_last_updated_fmt = '%b %d, %Y'
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}
#html_additional_pages = {}
# If false, no module index is generated.
# html_domain_indices = True
#html_domain_indices = True
# If false, no index is generated.
# html_use_index = True
#html_use_index = True
# If true, the index is split into individual pages for each letter.
# html_split_index = False
#html_split_index = False
# If true, links to the reST sources are added to the pages.
# html_show_sourcelink = True
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None
#html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = "paperless"
htmlhelp_basename = 'paperless'
#
# Attempt to use the ReadTheDocs theme. If it's not installed, fallback to
# the default.
#
try:
import sphinx_rtd_theme
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
except ImportError:
pass
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
("index", "paperless.tex", "Paperless Documentation", "Daniel Quinn", "manual"),
('index', 'paperless.tex', u'Paperless Documentation',
u'Daniel Quinn', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False
#latex_use_parts = False
# If true, show page references after internal links.
# latex_show_pagerefs = False
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
# latex_show_urls = False
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
# latex_appendices = []
#latex_appendices = []
# If false, no module index is generated.
# latex_domain_indices = True
#latex_domain_indices = True
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [("index", "paperless", "Paperless Documentation", ["Daniel Quinn"], 1)]
man_pages = [
('index', 'paperless', u'Paperless Documentation',
[u'Daniel Quinn'], 1)
]
# If true, show URL addresses after external links.
# man_show_urls = False
#man_show_urls = False
# -- Options for Texinfo output -------------------------------------------
@@ -239,99 +272,93 @@ man_pages = [("index", "paperless", "Paperless Documentation", ["Daniel Quinn"],
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(
"index",
"Paperless",
"Paperless Documentation",
"Daniel Quinn",
"paperless",
"Scan, index, and archive all of your paper documents.",
"Miscellaneous",
),
('index', 'Paperless', u'Paperless Documentation',
u'Daniel Quinn', 'paperless', 'Scan, index, and archive all of your paper documents.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
# texinfo_appendices = []
#texinfo_appendices = []
# If false, no module index is generated.
# texinfo_domain_indices = True
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'
#texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
# texinfo_no_detailmenu = False
#texinfo_no_detailmenu = False
# -- Options for Epub output ----------------------------------------------
# Bibliographic Dublin Core info.
epub_title = "Paperless"
epub_author = "Daniel Quinn"
epub_publisher = "Daniel Quinn"
epub_copyright = "2015, Daniel Quinn"
epub_title = u'Paperless'
epub_author = u'Daniel Quinn'
epub_publisher = u'Daniel Quinn'
epub_copyright = u'2015, Daniel Quinn'
# The basename for the epub file. It defaults to the project name.
# epub_basename = u'Paperless'
#epub_basename = u'Paperless'
# The HTML theme for the epub output. Since the default themes are not optimized
# for small screen space, using the same theme for HTML and epub output is
# usually not wise. This defaults to 'epub', a theme designed to save visual
# space.
# epub_theme = 'epub'
#epub_theme = 'epub'
# The language of the text. It defaults to the language option
# or en if the language is not set.
# epub_language = ''
#epub_language = ''
# The scheme of the identifier. Typical schemes are ISBN or URL.
# epub_scheme = ''
#epub_scheme = ''
# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
# epub_identifier = ''
#epub_identifier = ''
# A unique identification for the text.
# epub_uid = ''
#epub_uid = ''
# A tuple containing the cover image and cover page html template filenames.
# epub_cover = ()
#epub_cover = ()
# A sequence of (type, uri, title) tuples for the guide element of content.opf.
# epub_guide = ()
#epub_guide = ()
# HTML files that should be inserted before the pages created by sphinx.
# The format is a list of tuples containing the path and title.
# epub_pre_files = []
#epub_pre_files = []
# HTML files shat should be inserted after the pages created by sphinx.
# The format is a list of tuples containing the path and title.
# epub_post_files = []
#epub_post_files = []
# A list of files that should not be packed into the epub file.
epub_exclude_files = ["search.html"]
epub_exclude_files = ['search.html']
# The depth of the table of contents in toc.ncx.
# epub_tocdepth = 3
#epub_tocdepth = 3
# Allow duplicate toc entries.
# epub_tocdup = True
#epub_tocdup = True
# Choose between 'default' and 'includehidden'.
# epub_tocscope = 'default'
#epub_tocscope = 'default'
# Fix unsupported image types using the PIL.
# epub_fix_images = False
#epub_fix_images = False
# Scale large images.
# epub_max_image_width = 0
#epub_max_image_width = 0
# How to display URL addresses: 'footnote', 'no', or 'inline'.
# epub_show_urls = 'inline'
#epub_show_urls = 'inline'
# If false, no index is generated.
# epub_use_index = True
#epub_use_index = True
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {"http://docs.python.org/": None}
intersphinx_mapping = {'http://docs.python.org/': None}

View File

@@ -1,12 +0,0 @@
.. _configuration:
*************
Configuration
*************
.. cssclass:: redirect-notice
The Paperless-ngx documentation has permanently moved.
You will be redirected shortly...

189
docs/consumption.rst Normal file
View File

@@ -0,0 +1,189 @@
.. _consumption:
Consumption
###########
Once you've got Paperless setup, you need to start feeding documents into it.
Currently, there are three options: the consumption directory, IMAP (email), and
HTTP POST.
.. _consumption-directory:
The Consumption Directory
=========================
The primary method of getting documents into your database is by putting them in
the consumption directory. The ``document_consumer`` script runs in an infinite
loop looking for new additions to this directory and when it finds them, it goes
about the process of parsing them with the OCR, indexing what it finds, and
encrypting the PDF, storing it in the media directory.
Getting stuff into this directory is up to you. If you're running Paperless
on your local computer, you might just want to drag and drop files there, but if
you're running this on a server and want your scanner to automatically push
files to this directory, you'll need to setup some sort of service to accept the
files from the scanner. Typically, you're looking at an FTP server like
`Proftpd`_ or `Samba`_.
.. _Proftpd: http://www.proftpd.org/
.. _Samba: http://www.samba.org/
So where is this consumption directory? It's wherever you define it. Look for
the ``CONSUMPTION_DIR`` value in ``settings.py``. Set that to somewhere
appropriate for your use and put some documents in there. When you're ready,
follow the :ref:`consumer <utilities-consumer>` instructions to get it running.
.. _consumption-directory-hook:
Hooking into the Consumption Process
------------------------------------
Sometimes you may want to do something arbitrary whenever a document is
consumed. Rather than try to predict what you may want to do, Paperless lets
you execute scripts of your own choosing just before or after a document is
consumed using a couple simple hooks.
Just write a script, put it somewhere that Paperless can read & execute, and
then put the path to that script in ``paperless.conf`` with the variable name
of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
``PAPERLESS_POST_CONSUME_SCRIPT``. The script will be executed before or
after the document is consumed, respectively.
.. important::
These scripts are executed in a **blocking** process, which means that if
a script takes a long time to run, it can significantly slow down your
document consumption flow. If you want things to run asynchronously,
you'll have to fork the process in your script and exit.
.. _consumption-directory-hook-variables:
What Can These Scripts Do?
..........................
It's your script, so you're only limited by your imagination and the laws of
physics. However, the following values are passed to the scripts in order:
.. _consumption-director-hook-variables-pre:
Pre-consumption script
::::::::::::::::::::::
* Document file name
.. _consumption-director-hook-variables-post:
Post-consumption script
:::::::::::::::::::::::
* Document id
* Generated file name
* Source path
* Thumbnail path
* Download URL
* Thumbnail URL
* Correspondent
* Tags
The script can be in any language you like, but for a simple shell script
example, you can take a look at ``post-consumption-example.sh`` in the
``scripts`` directory in this project.
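To make that concrete, here is a minimal post-consumption hook written in
Python. It is only an illustrative sketch assuming the argument order listed
above; the log path is a made-up placeholder, and the bundled
``post-consumption-example.sh`` remains the reference example:

.. code:: python

    #!/usr/bin/env python3
    """Append a one-line record for every document Paperless consumes."""

    import sys
    from datetime import datetime

    # Arguments arrive in the order documented above; we only need the first two.
    doc_id, file_name = sys.argv[1:3]

    # Keep this fast: the hook runs in a blocking process, so long-running work
    # should be forked off instead.
    with open("/tmp/paperless-consumed.log", "a") as log:
        log.write("{} id={} file={}\n".format(
            datetime.now().isoformat(), doc_id, file_name))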
.. _consumption-imap:
IMAP (Email)
============
Another handy way to get documents into your database is to email them to
yourself. The typical use-case would be to be out for lunch and want to send a
copy of the receipt back to your system at home. Paperless can be taught to
pull emails down from an arbitrary account and dump them into the consumption
directory where the process :ref:`above <consumption-directory>` will follow the
usual pattern on consuming the document.
Some things you need to know about this feature:
* It's disabled by default. By setting the values below it will be enabled.
* It's been tested in a limited environment, so it may not work for you (please
submit a pull request if you can!)
* It's designed to **delete mail from the server once consumed**. So don't go
pointing this to your personal email account and wonder where all your stuff
went.
* Currently, only one photo (attachment) per email will work.
So, with all that in mind, here's what you do to get it running:
1. Set up a new email account somewhere, or if you're feeling daring, create a
folder in an existing email box and note the path to that folder.
2. In ``settings.py`` set all of the appropriate values in ``MAIL_CONSUMPTION``.
If you decided to use a subfolder of an existing account, then make sure you
set ``INBOX`` accordingly here. You also have to set the
``UPLOAD_SHARED_SECRET`` to something you can remember 'cause you'll have to
include that in every email you send.
3. Restart the :ref:`consumer <utilities-consumer>`. The consumer will check
the configured email account every 10 minutes for something new and pull down
whatever it finds.
4. Send yourself an email! Note that the subject is treated as the file name,
so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
get what you expect. Also, you must include the aforementioned secret
string in every email so the fetcher knows that it's safe to import.
5. After a few minutes, the consumer will poll your mailbox, pull down the
message, and place the attachment in the consumption directory with the
appropriate name. A few minutes later, the consumer will import it like any
other file.
.. _consumption-http:
HTTP POST
=========
You can also submit a document via HTTP POST. It doesn't do tags yet, and the
URL schema isn't concrete, but it's a start.
To push your document to Paperless, send an HTTP POST to the server with the
following name/value pairs:
* ``correspondent``: The name of the document's correspondent. Note that there
are restrictions on what characters you can use here. Specifically,
alphanumeric characters, `-`, `,`, `.`, and `'` are ok; everything else is
out. You also can't use the sequence ` - ` (space, dash, space).
* ``title``: The title of the document. The rules for characters are the same
here as for the correspondent.
* ``signature``: For security reasons, we have the correspondent send a
signature using a "shared secret" method to make sure that random strangers
don't start uploading stuff to your server. The means of generating this
signature is defined below.
Specify ``enctype="multipart/form-data"``, and then POST your file with::
Content-Disposition: form-data; name="document"; filename="whatever.pdf"
.. _consumption-http-signature:
Generating the Signature
------------------------
Generating a signature based on a shared secret is pretty simple: define a secret,
and store it on the server and the client. Then use that secret, along with
the text you want to verify to generate a string that you can use for
verification.
In the case of Paperless, you configure the server with the secret by setting
``UPLOAD_SHARED_SECRET``. Then on your client, you generate your signature by
concatenating the correspondent, title, and the secret, and then using sha256
to generate a hexdigest.
If you're using Python, this is what that looks like:
.. code:: python
from hashlib import sha256
# Encode to bytes first, as hashlib requires bytes under Python 3
signature = sha256((correspondent + title + secret).encode("utf-8")).hexdigest()
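Putting the pieces together, a complete upload from Python might look roughly
like the sketch below. Treat it as an illustration only: the server address
and the ``/push`` path are assumptions (the URL schema above is explicitly not
concrete yet), and it uses the third-party ``requests`` library rather than
anything bundled with Paperless.

.. code:: python

    from hashlib import sha256

    import requests  # third-party: pip install requests

    correspondent = "Some Company Name"
    title = "Invoice 2016-01-01"
    secret = "my-shared-secret"  # must match UPLOAD_SHARED_SECRET on the server

    # Same scheme as above: sha256 over correspondent + title + secret.
    signature = sha256((correspondent + title + secret).encode("utf-8")).hexdigest()

    with open("whatever.pdf", "rb") as document:
        response = requests.post(
            "http://localhost:8000/push",  # assumed address and path
            data={
                "correspondent": correspondent,
                "title": title,
                "signature": signature,
            },
            # Sending both `data` and `files` produces a multipart/form-data body.
            files={"document": ("whatever.pdf", document)},
        )

    print(response.status_code)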

View File

@@ -1,12 +0,0 @@
.. _extending:
*************************
Paperless-ngx Development
*************************
.. cssclass:: redirect-notice
The Paperless-ngx documentation has permanently moved.
You will be redirected shortly...

View File

@@ -1,12 +0,0 @@
.. _faq:
**************************
Frequently asked questions
**************************
.. cssclass:: redirect-notice
The Paperless-ngx documentation has permanently moved.
You will be redirected shortly...

85
docs/guesswork.rst Normal file
View File

@@ -0,0 +1,85 @@
.. _guesswork:
Guesswork
#########
During the consumption process, Paperless tries to guess some of the attributes
of the document it's looking at. To do this it uses two approaches:
.. _guesswork-naming:
File Naming
===========
Any document you put into the consumption directory will be consumed, but if
you name the file right, it'll automatically set some values in the database
for you. This is the logic the consumer follows:
1. Try to find the correspondent, title, and tags in the file name following
the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``. Note that
the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
``YYYYMMDDZ``. The ``Z`` refers to "Zulu time", AKA "UTC".
2. If that doesn't work, we skip the date and try this pattern:
``Correspondent - Title - tag,tag,tag.pdf``.
3. If that doesn't work, we try to find the correspondent and title in the file
name following the pattern: ``Correspondent - Title.pdf``.
4. If that doesn't work, just assume that the name of the file is the title.
So given the above, the following examples would work as you'd expect:
* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``Another Company - Letter of Reference.jpg``
* ``Dad's Recipe for Pancakes.png``
These however wouldn't work:
* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Another Company- Letter of Reference.jpg``
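To make the pattern concrete, here is a rough sketch of the kind of matching
described above. It is illustrative only and is not the consumer's actual
parsing code:

.. code:: python

    import re

    # Step 1 pattern: "Date - Correspondent - Title - tag,tag,tag.ext", with the
    # date rigidly formatted as YYYYMMDDHHMMSSZ or YYYYMMDDZ.
    PATTERN = re.compile(
        r"^(?P<date>\d{8}(\d{6})?Z) - "
        r"(?P<correspondent>.+?) - "
        r"(?P<title>.+?) - "
        r"(?P<tags>[a-z0-9,-]*)\.(pdf|png|jpe?g|gif|tiff)$",
        re.IGNORECASE,
    )

    match = PATTERN.match(
        "20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf"
    )
    if match:
        print(match.group("correspondent"))    # Some Company Name
        print(match.group("title"))            # Invoice 2016-01-01
        print(match.group("tags").split(","))  # ['money', 'invoices']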
.. _guesswork-content:
Reading the Document Contents
=============================
After the consumer has tried to figure out what it could from the file name,
it starts looking at the content of the document itself. It will compare the
matching algorithms defined by every tag and correspondent already set in your
database to see if they apply to the text in that document. In other words,
if you defined a tag called ``Home Utility`` that had a ``match`` property of
``bc hydro`` and a ``matching_algorithm`` of ``literal``, Paperless will
automatically tag your newly-consumed document with your ``Home Utility`` tag
so long as the text ``bc hydro`` appears in the body of the document somewhere.
The matching logic is quite powerful, and supports searching the text of your
document with different algorithms, and as such, some experimentation may be
necessary to get things Just Right.
.. _guesswork-content-howto:
How Do I Set Up These Matching Algorithms?
------------------------------------------
Setting up the algorithms is easily done through the admin interface. When
you create a new correspondent or tag, there are optional fields for matching
text and matching algorithm. From the help info there:
.. note::
Which algorithm you want to use when matching text to the OCR'd PDF. Here,
"any" looks for any occurrence of any word provided in the PDF, while "all"
requires that every word provided appear in the PDF, albeit not in the
order provided. A "literal" match means that the text you enter must
appear in the PDF exactly as you've entered it, and "regular expression"
uses a regex to match the PDF. If you don't know what a regex is, you
probably don't want this option.
Then just save your tag/correspondent and run another document through the
consumer. Once complete, you should see the newly-created document,
automatically tagged with the appropriate data.
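The following toy sketch shows roughly how "any", "all", and "literal" differ.
It is not Paperless's actual matching code (which, among other things, matches
on word boundaries); it is just a way to reason about the behaviour described
in the note above:

.. code:: python

    def matches(algorithm, match, text):
        """Return True if `match` applies to `text` under the given algorithm."""
        text = text.lower()
        words = match.lower().split()
        if algorithm == "any":
            return any(word in text for word in words)
        if algorithm == "all":
            return all(word in text for word in words)
        if algorithm == "literal":
            return match.lower() in text
        raise ValueError("unknown algorithm: {}".format(algorithm))

    document = "Your BC Hydro bill for March is attached."
    print(matches("literal", "bc hydro", document))   # True: the phrase appears verbatim
    print(matches("all", "hydro invoice", document))  # False: "invoice" is missing
    print(matches("any", "hydro invoice", document))  # True: "hydro" appears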

View File

@@ -1,25 +1,38 @@
*********
.. _index:
Paperless
*********
=========
Scan, index, and archive all of your paper documents. Say goodbye to paper.
.. cssclass:: redirect-notice
.. _index-why-this-exists:
The Paperless-ngx documentation has permanently moved.
Why This Exists
===============
You will be redirected shortly...
Paper is a nightmare. Environmental issues aside, there's no excuse for it in
the 21st century. It takes up space, collects dust, doesn't support any form of
search, indexing is tedious, and it's heavy and prone to damage & loss.
I wrote this to make "going paperless" easier. I wanted to be able to feed
documents right from the post box into the scanner and then shred them so I
never have to worry about finding stuff again. Perhaps you might find it useful
too.
Contents
========
.. toctree::
:maxdepth: 2
screenshots
scanners
administration
advanced_usage
usage_overview
setup
troubleshooting
changelog
configuration
extending
api
faq
requirements
setup
consumption
api
utilities
guesswork
migrating
troubleshooting
changelog

108
docs/migrating.rst Normal file
View File

@@ -0,0 +1,108 @@
.. _migrating:
Migrating, Updates, and Backups
===============================
As Paperless is still under active development, there's a lot that can change
as software updates roll out. You should backup often, so if anything goes
wrong during an update, you at least have a means of restoring to something
usable. Thankfully, there are automated ways of backing up, restoring, and
updating the software.
.. _migrating-backup:
Backing Up
----------
So you're bored of this whole project, or you want to make a remote backup of
the unencrypted files for whatever reason. This is easy to do, simply use the
:ref:`exporter <utilities-exporter>` to dump your documents and database out
into an arbitrary directory.
.. _migrating-restoring:
Restoring
---------
Restoring your data is just as easy, since nearly all of your data exists either
in the file names, or in the contents of the files themselves. You just need to
create an empty database (just follow the
:ref:`installation instructions <setup-installation>` again) and then import the
``tags.json`` file you created as part of your backup. Lastly, copy your
exported documents into the consumption directory and start up the consumer.
.. code-block:: shell-session
$ cd /path/to/project
$ rm data/db.sqlite3 # Delete the database
$ cd src
$ ./manage.py migrate # Create the database
$ ./manage.py createsuperuser
$ ./manage.py loaddata /path/to/arbitrary/place/tags.json
$ cp /path/to/exported/docs/* /path/to/consumption/dir/
$ ./manage.py document_consumer
Importing your data if you are :ref:`using Docker <setup-installation-docker>`
is almost as simple:
.. code-block:: shell-session
# Stop and remove your current containers
$ docker-compose stop
$ docker-compose rm -f
# Recreate them, add the superuser
$ docker-compose up -d
$ docker-compose run --rm webserver createsuperuser
# Load the tags
$ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
# Load your exported documents into the consumption directory
# (How you do this highly depends on how you have set this up)
$ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
After loading the documents into the consumption directory the consumer will
immediately start consuming the documents.
.. _migrating-updates:
Updates
-------
For the most part, all you have to do to update Paperless is run ``git pull``
on the directory containing the project files, and then use Django's
``migrate`` command to execute any database schema updates that might have been
rolled in as part of the update:
.. code-block:: shell-session
$ cd /path/to/project
$ git pull
$ cd src
$ ./manage.py migrate
Note that it's possible (even likely) that while ``git pull`` may update some
files, the ``migrate`` step may not update anything. This is totally normal.
Additionally, as new features are added, the ability to control those features
is typically added by way of an environment variable set in ``paperless.conf``.
You may want to take a look at the ``paperless.conf.example`` file to see if
there's anything new in there compared to what you've got in ``/etc``.
If you are :ref:`using Docker <setup-installation-docker>` the update process
requires only one additional step:
.. code-block:: shell-session
$ cd /path/to/project
$ git pull
$ docker build -t paperless .
$ docker-compose up -d
$ docker-compose run --rm webserver migrate
If ``git pull`` doesn't report any changes, there is no need to continue with
the remaining steps.

119
docs/requirements.rst Normal file
View File

@@ -0,0 +1,119 @@
.. _requirements:
Requirements
============
You need a Linux machine or Unix-like setup (theoretically an Apple machine
should work) that has the following software installed on it:
* `Python3`_ (with development libraries, pip and virtualenv)
* `GNU Privacy Guard`_
* `Tesseract`_, plus its language files matching your document base.
* `Imagemagick`_ version 6.7.5 or higher
* `unpaper`_
.. _Python3: https://python.org/
.. _GNU Privacy Guard: https://gnupg.org
.. _Tesseract: https://github.com/tesseract-ocr
.. _Imagemagick: http://imagemagick.org/
.. _unpaper: https://www.flameeyes.eu/projects/unpaper
Notably, you should confirm how you access your Python3 installation. Many
Linux distributions will install Python3 in parallel to Python2, using the names
``python3`` and ``python`` respectively. The same goes for ``pip3`` and
``pip``. Using Python2 will likely break things, so make sure that you're using
the right version.
For the purposes of simplicity, ``python`` and ``pip`` are used everywhere to
refer to their Python 3 versions.
In addition to the above, there are a number of Python requirements, all of
which are listed in a file called ``requirements.txt`` in the project root.
If you're not working on a virtual environment (like Vagrant or Docker), you
should probably be using a virtualenv, but that's your call. The reasons why
you might choose a virtualenv or not aren't really within the scope of this
document. Needless to say, if you don't know what a virtualenv is, you should
probably figure that out before continuing.
.. _requirements-apple:
Apple-tastic Complications
--------------------------
Some users have `run into problems`_ with installing ImageMagick on Apple
systems using HomeBrew. The solution appears to be to install ghostscript as
well as ImageMagick:
.. _run into problems: https://github.com/danielquinn/paperless/issues/25
.. code:: bash
$ brew install ghostscript
$ brew install imagemagick
$ brew install libmagic
.. _requirements-baremetal:
Python-specific Requirements: No Virtualenv
-------------------------------------------
If you don't care to use a virtual env, then installation of the Python
dependencies is easy:
.. code:: bash
$ pip install --user --requirement /path/to/paperless/requirements.txt
This should download and install all of the requirements into
``${HOME}/.local``. Remember that your distribution may be using ``pip3`` as
mentioned above.
.. _requirements-virtualenv:
Python-specific Requirements: Virtualenv
----------------------------------------
Using a virtualenv for this is pretty straightforward: create a virtualenv,
enter it, and install the requirements using the ``requirements.txt`` file:
.. code:: bash
$ virtualenv --python=/path/to/python3 /path/to/arbitrary/directory
$ . /path/to/arbitrary/directory/bin/activate
$ pip install --requirement /path/to/paperless/requirements.txt
Now you're ready to go. Just remember to enter your virtualenv whenever you
want to use Paperless.
.. _requirements-documentation:
Documentation
-------------
As generation of the documentation is not required for use of Paperless,
dependencies for this process are not included in ``requirements.txt``. If
you'd like to generate your own docs locally, you'll need to:
.. code:: bash
$ pip install sphinx
and then cd into the ``docs`` directory and type ``make html``.
If you are using Docker, you can use the following commands to build the
documentation and run a webserver serving it on `port 8001`_:
.. code:: bash
$ pwd
/path/to/paperless
$ docker build -t paperless:docs -f docs/Dockerfile .
$ docker run --rm -it -p "8001:8000" paperless:docs
.. _port 8001: http://127.0.0.1:8001

Some files were not shown because too many files have changed in this diff Show More