Add section for extending

No need to extend object
Conform to an 80-character limit
2018-02-03 15:31:20 +00:00 · 2018-02-03 15:26:28 +00:00 · 2018-02-03 15:26:09 +00:00 · 2018-02-03 14:49:48 +00:00 · 2018-02-03 14:49:17 +00:00 · 2018-02-03 14:49:01 +00:00
37 changed files with 972 additions and 480 deletions
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,5 +1,9 @@
 language: python
 before_install:
 - sudo apt-get update -qq
 - sudo apt-get install -qq libpoppler-cpp-dev unpaper tesseract-ocr tesseract-ocr-eng
 sudo: false
 matrix:
@@ -11,7 +15,7 @@ matrix:
        - python: 3.6
          env: TOXENV=py36
        - python: 3.6
-          env: TOXENV=pep8
+          env: TOXENV=pycodestyle
 install:
    - pip install --requirement requirements.txt
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,50 +1,48 @@
-FROM python:3.5
+FROM alpine:3.7
 MAINTAINER Pit Kleyersburg <pitkley@googlemail.com>
-# Install dependencies
+LABEL maintainer="The Paperless Project https://github.com/danielquinn/paperless" \
-RUN apt-get update \
+      contributors="Guy Addadi <addadi@gmail.com>, Pit Kleyersburg <pitkley@googlemail.com>, \
-    && apt-get install -y --no-install-recommends \
+        Sven Fischer <git-dev@linux4tw.de>"
        sudo \
        tesseract-ocr tesseract-ocr-eng imagemagick ghostscript unpaper \
    && rm -rf /var/lib/apt/lists/*
 # Install python dependencies
 RUN mkdir -p /usr/src/paperless
 WORKDIR /usr/src/paperless
 COPY requirements.txt /usr/src/paperless/
 RUN pip install --no-cache-dir -r requirements.txt
 # Copy application
-RUN mkdir -p /usr/src/paperless/src
+COPY requirements.txt /usr/src/paperless/
 RUN mkdir -p /usr/src/paperless/data
 RUN mkdir -p /usr/src/paperless/media
 COPY src/ /usr/src/paperless/src/
 COPY data/ /usr/src/paperless/data/
 COPY media/ /usr/src/paperless/media/
 # Set consumption directory
 ENV PAPERLESS_CONSUMPTION_DIR /consume
 RUN mkdir -p $PAPERLESS_CONSUMPTION_DIR
 # Migrate database
 WORKDIR /usr/src/paperless/src
 RUN ./manage.py migrate
 # Create user
 RUN groupadd -g 1000 paperless \
    && useradd -u 1000 -g 1000 -d /usr/src/paperless paperless \
    && chown -Rh paperless:paperless /usr/src/paperless
 # Set export directory
 ENV PAPERLESS_EXPORT_DIR /export
 RUN mkdir -p $PAPERLESS_EXPORT_DIR
 # Setup entrypoint
 COPY scripts/docker-entrypoint.sh /sbin/docker-entrypoint.sh
 RUN chmod 755 /sbin/docker-entrypoint.sh
-# Mount volumes
+# Set export and consumption directories
 ENV PAPERLESS_EXPORT_DIR=/export \
    PAPERLESS_CONSUMPTION_DIR=/consume
 # Install dependencies
 RUN apk --no-cache --update add \
        python3 gnupg libmagic bash \
        sudo poppler tesseract-ocr imagemagick ghostscript unpaper && \
    apk --no-cache add --virtual .build-dependencies \
        python3-dev poppler-dev gcc g++ musl-dev zlib-dev jpeg-dev && \
 # Install python dependencies
    python3 -m ensurepip && \
    rm -r /usr/lib/python*/ensurepip && \
    cd /usr/src/paperless && \
    pip3 install --no-cache-dir -r requirements.txt && \
 # Remove build dependencies
    apk del .build-dependencies && \
 # Create the consumption directory
    mkdir -p $PAPERLESS_CONSUMPTION_DIR && \
 # Migrate database
    ./src/manage.py migrate && \
 # Create user
    addgroup -g 1000 paperless && \
    adduser -D -u 1000 -G paperless -h /usr/src/paperless paperless && \
    chown -Rh paperless:paperless /usr/src/paperless && \
    mkdir -p $PAPERLESS_EXPORT_DIR && \
 # Setup entrypoint
    chmod 755 /sbin/docker-entrypoint.sh
 WORKDIR /usr/src/paperless/src
 # Mount volumes and set Entrypoint
 VOLUME ["/usr/src/paperless/data", "/usr/src/paperless/media", "/consume", "/export"]
 ENTRYPOINT ["/sbin/docker-entrypoint.sh"]
 CMD ["--help"]
--- a/README.md
+++ b/README.md
@@ -0,0 +1,70 @@
 # Paperless
 ![Documentation](https://readthedocs.org/projects/paperless/badge/?version=latest) ![Chat](https://badges.gitter.im/danielquinn/paperless.svg) ![Travis](https://travis-ci.org/danielquinn/paperless.svg?branch=master)
 Index and archive all of your scanned paper documents
 I hate paper.  Environmental issues aside, it's a tech person's nightmare:
 * There's no search feature
 * It takes up physical space
 * Backups mean more paper
 In the past few months I've been bitten more than a few times by the problem of not having the right document around.  Sometimes I recycled a document I needed (who keeps water bills for two years?) and other times I just lost it... because paper.  I wrote this to make my life easier.
 ## How it Works
 Paperless does not control your scanner; it only helps you deal with what your scanner produces.
 1. Buy a document scanner that can write to a place on your network.  If you need some inspiration, have a look at the [scanner recommendations](https://paperless.readthedocs.io/en/latest/scanners.html) page.
 2. Set it up to "scan to FTP" or something similar. It should be able to push scanned images to a server without you having to do anything.  Of course if your scanner doesn't know how to automatically upload the file somewhere, you can always do that manually.  Paperless doesn't care how the documents get into its local consumption directory.
 3. Have the target server run the Paperless consumption script to OCR the file and index it into a local database.
 4. Use the web frontend to sift through the database and find what you want.
 5. Download the PDF you need/want via the web interface and do whatever you like with it.  You can even print it and send it as if it's the original. In most cases, no one will care or notice.
 Here's what you get:
 ![The before and after](https://raw.githubusercontent.com/danielquinn/paperless/master/docs/_static/screenshot.png)
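As a rough illustration of step 3 above, here is a minimal sketch of a single pass over the consumption directory. It is not the actual Paperless consumer: `consume_once`, `handle`, and `seen` are hypothetical names, and `handle` only stands in for the OCR-and-index work.

```python
from pathlib import Path


def consume_once(consumption_dir, handle, seen):
    """Pass every not-yet-seen regular file in the directory to `handle`.

    `handle` is a hypothetical callback standing in for the OCR-and-index
    step; `seen` is a set of paths that were already processed.
    Returns the paths handled in this pass, sorted by name.
    """
    handled = []
    for path in sorted(Path(consumption_dir).iterdir()):
        # Skip sub-directories and anything handled in an earlier pass.
        if path.is_file() and path not in seen:
            handle(path)
            seen.add(path)
            handled.append(path)
    return handled
```

A long-running consumer would presumably call something like this in a loop, sleeping between passes, which is why it does not matter how the files arrive in the directory.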
 ## Documentation
 It's all available on [ReadTheDocs](https://paperless.readthedocs.org/).
 ## Requirements
 This is really just a simple, shiny, user-friendly wrapper around some very powerful tools.
 * [ImageMagick](http://imagemagick.org/) converts the images between colour and greyscale.
 * [Tesseract](https://github.com/tesseract-ocr) does the character recognition.
 * [Unpaper](https://www.flameeyes.eu/projects/unpaper) despeckles and deskews the scanned image.
 * [GNU Privacy Guard](https://gnupg.org/) is used as the encryption backend.
 * [Python 3](https://python.org/) is the language of the project.
  * [Pillow](https://pypi.python.org/pypi/Pillow/) loads the image data as a python object to be used with PyOCR.
  * [PyOCR](https://github.com/jflesch/pyocr) is a slick programmatic wrapper around tesseract.
  * [Django](https://www.djangoproject.com/) is the framework this project is written against.
  * [Python-GNUPG](http://pythonhosted.org/python-gnupg/) decrypts the PDFs on-the-fly to allow you to download unencrypted files, leaving the encrypted ones on-disk.
 ## Stability
 This project has been around since 2015, and there are lots of people using it. However, it's still under active development (just look at the git commit history), so don't expect it to be 100% stable.  You can back up the sqlite3 database, media directory and your configuration file to be on the safe side.
 ## Similar Projects
 There's another project out there called [Mayan EDMS](https://mayan.readthedocs.org/en/latest/) that has a surprising amount of technical overlap with Paperless.  Also based on Django and using a consumer model with Tesseract and Unpaper, Mayan EDMS is *much* more featureful and comes with a slick UI as well, but it is still written in Python 2. It may be that Paperless consumes fewer resources, but to be honest, this is just a guess as I haven't tested this myself.  One thing's for certain though: *Paperless* is a **way** better name.
 ## Important Note
 Document scanners are typically used to scan sensitive documents.  Things like your social insurance number, tax records, invoices, etc.  While Paperless encrypts the original files via the consumption script, the OCR'd text is *not* encrypted and is therefore stored in the clear (it needs to be searchable, so if someone has ideas on how to do that on encrypted data, I'm all ears).  This means that Paperless should never be run on an untrusted host.  Instead, I recommend that if you do want to use it, run it locally on a server in your own home.
 ## Donations
 As with all Free software, the power is less in the finances and more in the collective efforts.  I really appreciate every pull request and bug report offered up by Paperless' users, so please keep that stuff coming.  If, however, you're not one for coding/design/documentation, and would like to contribute financially, I won't say no ;-)
 The thing is, I'm doing ok for money, so I would instead ask you to donate to the [United Nations High Commissioner for Refugees](https://donate.unhcr.org/int-en/general). They're doing important work and they need the money a lot more than I do.
--- a/README.rst
+++ b/README.rst
@@ -1,144 +0,0 @@
 Paperless
 #########
 |Documentation|
 |Chat|
 |Travis|
 |Dependencies|
 Index and archive all of your scanned paper documents
 I hate paper.  Environmental issues aside, it's a tech person's nightmare:
 * There's no search feature
 * It takes up physical space
 * Backups mean more paper
 In the past few months I've been bitten more than a few times by the problem
 of not having the right document around.  Sometimes I recycled a document I
 needed (who keeps water bills for two years?) and other times I just lost
 it... because paper.  I wrote this to make my life easier.
 How it Works
 ============
 Paperless does not control your scanner, it only helps you deal with what your
 scanner produces
 1. Buy a document scanner that can write to a place on your network.  If you
   need some inspiration, have a look at the `scanner recommendations`_ page.
   recommended by another user.
 2. Set it up to "scan to FTP" or something similar. It should be able to push
   scanned images to a server without you having to do anything.  If your
   scanner doesn't know how to automatically upload the file somewhere, you can
   always do that manually.  Paperless doesn't care how the documents get into
   its local consumption directory.
 3. Have the target server run the Paperless consumption script to OCR the file
   and index it into a local database.
 4. Use the web frontend to sift through the database and find what you want.
 5. Download the PDF you need/want via the web interface and do whatever you
   like with it.  You can even print it and send it as if it's the original.
   In most cases, no one will care or notice.
 Here's what you get:
 .. image:: docs/_static/screenshot.png
   :alt: The before and after
   :target: docs/_static/screenshot.png
 Stability
 =========
 Paperless is still under active development (just look at the git commit
 history) so don't expect it to be 100% stable.  You can backup the sqlite3
 database, media directory and your configuration file to be on the safe side.
 Requirements
 ============
 This is all really a quite simple, shiny, user-friendly wrapper around some
 very powerful tools.
 * `ImageMagick`_ converts the images between colour and greyscale.
 * `Tesseract`_ does the character recognition.
 * `Unpaper`_ despeckles and deskews the scanned image.
 * `GNU Privacy Guard`_ is used as the encryption backend.
 * `Python 3`_ is the language of the project.
  * `Pillow`_ loads the image data as a python object to be used with PyOCR.
  * `PyOCR`_ is a slick programmatic wrapper around tesseract.
  * `Django`_ is the framework this project is written against.
  * `Python-GNUPG`_ decrypts the PDFs on-the-fly to allow you to download
    unencrypted files, leaving the encrypted ones on-disk.
 Documentation
 =============
 It's all available on `ReadTheDocs`_.
 Similar Projects
 ================
 There's another project out there called `Mayan EDMS`_ that has a surprising
 amount of technical overlap with Paperless.  Also based on Django and using
 a consumer model with Tesseract and Unpaper, Mayan EDMS is *much* more
 featureful and comes with a slick UI as well, but still in Python 2. It may be
 that Paperless consumes fewer resources, but to be honest, this is just a guess
 as I haven't tested this myself.  One thing's for certain though, *Paperless*
 is a **much** better name.
 Important Note
 ==============
 Document scanners are typically used to scan sensitive documents.  Things like
 your social insurance number, tax records, invoices, etc.  While Paperless
 encrypts the original files via the consumption script, the OCR'd text is *not*
 encrypted and is therefore stored in the clear (it needs to be searchable, so
 if someone has ideas on how to do that on encrypted data, I'm all ears).  This
 means that Paperless should never be run on an untrusted host.  Instead, I
 recommend that if you do want to use it, run it locally on a server in your own
 home.
 Donations
 =========
 As with all Free software, the power is less in the finances and more in the
 collective efforts.  I really appreciate every pull request and bug report
 offered up by Paperless' users, so please keep that stuff coming.  If however,
 you're not one for coding/design/documentation, and would like to contribute
 financially, I won't say no ;-)
 The thing is, I'm doing ok for money, so I would instead ask you to donate to
 the `United Nations High Commissioner for Refugees`_.  They're doing important
 work and they need the money a lot more than I do.
 .. _scanner recommendations: https://paperless.readthedocs.io/en/latest/scanners.html
 .. _ImageMagick: http://imagemagick.org/
 .. _Tesseract: https://github.com/tesseract-ocr
 .. _Unpaper: https://www.flameeyes.eu/projects/unpaper
 .. _GNU Privacy Guard: https://gnupg.org/
 .. _Python 3: https://python.org/
 .. _Pillow: https://pypi.python.org/pypi/pillowfight/
 .. _PyOCR: https://github.com/jflesch/pyocr
 .. _Django: https://www.djangoproject.com/
 .. _Python-GNUPG: http://pythonhosted.org/python-gnupg/
 .. _ReadTheDocs: https://paperless.readthedocs.org/
 .. _Mayan EDMS: https://mayan.readthedocs.org/en/latest/
 .. _United Nations High Commissioner for Refugees: https://donate.unhcr.org/int-en/general
 .. |Documentation| image:: https://readthedocs.org/projects/paperless/badge/?version=latest
   :alt: Read the documentation at https://paperless.readthedocs.org/
   :target: https://paperless.readthedocs.org/
 .. |Chat| image:: https://badges.gitter.im/danielquinn/paperless.svg
   :alt: Join the chat at https://gitter.im/danielquinn/paperless
   :target: https://gitter.im/danielquinn/paperless?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
 .. |Travis| image:: https://travis-ci.org/danielquinn/paperless.svg?branch=master
   :target: https://travis-ci.org/danielquinn/paperless
 .. |Dependencies| image:: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500/badge.svg
   :target: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500
--- a/docker-compose.yml.example
+++ b/docker-compose.yml.example
@@ -2,7 +2,7 @@ version: '2'
 services:
    webserver:
-        image: pitkley/paperless
+        build: ./
        ports:
            # You can adapt the port you want Paperless to listen on by
            # modifying the part before the `:`.
@@ -20,7 +20,7 @@ services:
        command: ["runserver", "--insecure", "0.0.0.0:8000"]
    consumer:
-        image: pitkley/paperless
+        build: ./
        volumes:
            - data:/usr/src/paperless/data
            - media:/usr/src/paperless/media
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -1,235 +1,302 @@
 Changelog
 #########
-* 1.0.0
+1.2.0
-  * Upgrade to Django 1.11.  **You'll need to run
+=====
-    ``pip install -r requirements.txt`` to after the usual ``git pull`` to
+
 * New Docker image, now based on Alpine, thanks to the efforts of `addadi`_
  and `Pit`_.  This new image is dramatically smaller than the Debian-based
  one, and it also has `a new home on Docker Hub`_.  A proper thank-you to
  `Pit`_ for hosting the image on his Docker account all this time, but after
  some discussion, we decided the image needed a more *official-looking* home.
 * `BastianPoe`_ has added the long-awaited feature to automatically skip the
  OCR step when the PDF already contains text. This can be overridden by
  setting ``PAPERLESS_OCR_ALWAYS=YES`` either in your ``paperless.conf`` or
  in the environment.  Note that this also means that Paperless now requires
  ``libpoppler-cpp-dev`` to be installed. **Important**: You'll need to run
  ``pip install -r requirements.txt`` after the usual ``git pull`` to
  properly update.
 * `BastianPoe`_ has also contributed a monumental amount of work (`#291`_) to
  solving `#158`_: setting the document creation date based on finding a date
  in the document text.
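The OCR-skipping behaviour described for 1.2.0 can be pictured roughly as follows. This is only a sketch under assumptions: `should_run_ocr` is a hypothetical helper, the text is assumed to have been pulled out of the PDF already, and treating the value case-insensitively is a guess rather than something the changelog states.

```python
import os


def should_run_ocr(embedded_text, environ=os.environ):
    """Decide whether a PDF still needs the OCR step.

    `embedded_text` is whatever text was extracted directly from the PDF;
    how that extraction happens is outside this sketch.
    """
    # Assumption: "YES" in any letter case forces OCR.
    if environ.get("PAPERLESS_OCR_ALWAYS", "").upper() == "YES":
        return True
    # Otherwise OCR only when the PDF carries no usable text of its own.
    return not embedded_text.strip()
```

Passing the environment in as a plain mapping keeps the decision easy to check without touching the real process environment.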
 1.1.0
 =====
 * Fix for `#283`_, a redirect bug which broke interactions with
  paperless-desktop.  Thanks to `chris-aeviator`_ for reporting it.
 * Addition of an optional new financial year filter, courtesy of
  `David Martin`_ `#256`_
 * Fixed a typo in how thumbnails were named in exports `#285`_, courtesy of
  `Dan Panzarella`_
 1.0.0
 =====
 * Upgrade to Django 1.11.  **You'll need to run
  ``pip install -r requirements.txt`` after the usual ``git pull`` to
  properly update**.
-  * Replace the templatetag-based hack we had for document listing in favour of
+* Replace the templatetag-based hack we had for document listing in favour of
  a slightly less ugly solution in the form of another template tag with less
  copypasta.
-  * Support for multi-word-matches for auto-tagging thanks to an excellent
+* Support for multi-word-matches for auto-tagging thanks to an excellent
  patch from `ishirav`_ `#277`_.
-  * Fixed a CSS bug reported by `Stefan Hagen`_ that caused an overlapping of
+* Fixed a CSS bug reported by `Stefan Hagen`_ that caused an overlapping of
  the text and checkboxes under some resolutions `#272`_.
-  * Patched the Docker config to force the serving of static files.  Credit for
+* Patched the Docker config to force the serving of static files.  Credit for
  this one goes to `dev-rke`_ via `#248`_.
-  * Fix file permissions during Docker start up thanks to `Pit`_ on `#268`_.
+* Fix file permissions during Docker start up thanks to `Pit`_ on `#268`_.
-  * Date fields in the admin are now expressed as HTML5 date fields thanks to
+* Date fields in the admin are now expressed as HTML5 date fields thanks to
  `Lukas Winkler`_'s issue `#278`_
-* 0.8.0
+0.8.0
-  * Paperless can now run in a subdirectory on a host (``/paperless``), rather
+=====
 * Paperless can now run in a subdirectory on a host (``/paperless``), rather
  than always running in the root (``/``) thanks to `maphy-psd`_'s work on
  `#255`_.
-* 0.7.0
+0.7.0
-  * **Potentially breaking change**: As per `#235`_, Paperless will no longer
+=====
 * **Potentially breaking change**: As per `#235`_, Paperless will no longer
  automatically delete documents attached to correspondents when those
  correspondents are themselves deleted.  This was Django's default
  behaviour, but didn't make much sense in Paperless' case.  Thanks to
  `Thomas Brueggemann`_ and `David Martin`_ for their input on this one.
-  * Fix for `#232`_ wherein Paperless wasn't recognising ``.tif`` files
+* Fix for `#232`_ wherein Paperless wasn't recognising ``.tif`` files
  properly.  Thanks to `ayounggun`_ for reporting this one and to
  `Kusti Skytén`_ for posting the correct solution in the GitHub issue.
-* 0.6.0
+0.6.0
-  * Abandon the shared-secret trick we were using for the POST API in favour
+=====
 * Abandon the shared-secret trick we were using for the POST API in favour
  of BasicAuth or Django session.
-  * Fix the POST API so it actually works.  `#236`_
+* Fix the POST API so it actually works.  `#236`_
-  * **Breaking change**: We've dropped the use of ``PAPERLESS_SHARED_SECRET``
+* **Breaking change**: We've dropped the use of ``PAPERLESS_SHARED_SECRET``
  as it was being used both for the API (now replaced with a normal auth)
  and for email polling.  Now that we're only using it for email, this
  variable has been renamed to ``PAPERLESS_EMAIL_SECRET``.  The old value
  will still work for a while, but you should change your config if you've
  been using the email polling feature.  Thanks to `Joshua Gilman`_ for all
  the help with this feature.
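The rename with a temporary fallback described for 0.6.0 could look something like the sketch below. The helper name `email_secret` is hypothetical, and the real settings code may well resolve the two variables differently.

```python
import os


def email_secret(environ=os.environ):
    """Prefer PAPERLESS_EMAIL_SECRET, falling back to the old variable name."""
    # The new name wins whenever it is set to a non-empty value.
    return (
        environ.get("PAPERLESS_EMAIL_SECRET")
        or environ.get("PAPERLESS_SHARED_SECRET")
    )
```

With neither variable set, the function returns `None`, so a caller can tell that email polling has no secret configured.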
-* 0.5.0
+
-  * Support for fuzzy matching in the auto-tagger & auto-correspondent systems
+0.5.0
 =====
 * Support for fuzzy matching in the auto-tagger & auto-correspondent systems
  thanks to `Jake Gysland`_'s patch `#220`_.
-  * Modified the Dockerfile to prepare an export directory (`#212`_).  Thanks
+* Modified the Dockerfile to prepare an export directory (`#212`_).  Thanks
  to combined efforts from `Pit`_ and `Strubbl`_ in working out the kinks on
  this one.
-  * Updated the import/export scripts to include support for thumbnails.  Big
+* Updated the import/export scripts to include support for thumbnails.  Big
  thanks to `CkuT`_ for finding this shortcoming and doing the work to get
  it fixed in `#224`_.
-  * All of the following changes are thanks to `David Martin`_:
+* All of the following changes are thanks to `David Martin`_:
  * Bumped the dependency on pyocr to 0.4.7 so new users can make use of
  Tesseract 4 if they so prefer (`#226`_).
  * Fixed a number of issues with the automated mail handler (`#227`_, `#228`_)
  * Amended the documentation for better handling of systemd service files (`#229`_)
  * Amended the Django Admin configuration to have nice headers (`#230`_)
-* 0.4.1
+0.4.1
-  * Fix for `#206`_ wherein the pluggable parser didn't recognise files with
+=====
 * Fix for `#206`_ wherein the pluggable parser didn't recognise files with
  all-caps suffixes like ``.PDF``
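A suffix check that is not tripped up by ``.PDF`` might be sketched as below. Both `SUPPORTED_SUFFIXES` and `is_supported` are hypothetical names, and the set of suffixes is illustrative rather than the parser's real list.

```python
from pathlib import Path

# Hypothetical set of supported suffixes, for illustration only.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def is_supported(filename):
    """True if the file's suffix is supported, regardless of letter case."""
    # Lower-casing the suffix makes ".PDF" and ".pdf" compare equal.
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES
```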
-* 0.4.0
+0.4.0
-  * Introducing reminders.  See `#199`_ for more information, but the short
+=====
 * Introducing reminders.  See `#199`_ for more information, but the short
  explanation is that you can now attach simple notes & times to documents
  which are made available via the API.  Currently, the default API
  (basically just the Django admin) doesn't really make use of this, but
  `Thomas Brueggemann`_ over at `Paperless Desktop`_ has said that he would
  like to make use of this feature in his project.
-* 0.3.6
+0.3.6
-  * Fix for `#200`_ (!!) where the API wasn't configured to allow updating the
+=====
 * Fix for `#200`_ (!!) where the API wasn't configured to allow updating the
  correspondent or the tags for a document.
-  * The ``content`` field is now optional, to allow for the edge case of a
+* The ``content`` field is now optional, to allow for the edge case of a
  purely graphical document.
-  * You can no longer add documents via the admin.  This never worked in the
+* You can no longer add documents via the admin.  This never worked in the
  first place, so all I've done here is remove the link to the broken form.
-  * The consumer code has been heavily refactored to support a pluggable
+* The consumer code has been heavily refactored to support a pluggable
  interface.  Install a paperless consumer via pip and tell paperless about
  it with an environment variable, and you're good to go.  Proper
  documentation is on its way.
-* 0.3.5
+0.3.5
-  * A serious facelift for the documents listing page wherein we drop the
+=====
 * A serious facelift for the documents listing page wherein we drop the
  tabular layout in favour of a tiled interface.
-  * Users can now configure the number of items per page.
+* Users can now configure the number of items per page.
-  * Fix for `#171`_: Allow users to specify their own ``SECRET_KEY`` value.
+* Fix for `#171`_: Allow users to specify their own ``SECRET_KEY`` value.
-  * Moved the dotenv loading to the top of settings.py
+* Moved the dotenv loading to the top of settings.py
-  * Fix for `#112`_: Added checks for binaries required for document
+* Fix for `#112`_: Added checks for binaries required for document
  consumption.
-* 0.3.4
+0.3.4
-  * Removal of django-suit due to a licensing conflict I bumped into in 0.3.3.
+=====
 * Removal of django-suit due to a licensing conflict I bumped into in 0.3.3.
  Note that you *can* use Django Suit with Paperless, but only in a
  non-profit situation as their free license prohibits for-profit use.  As a
  result, I can't bundle Suit with Paperless without conflicting with the
  GPL.  Further development will be done against the stock Django admin.
-  * I shrunk the thumbnails a little 'cause they were too big for me, even on
+* I shrunk the thumbnails a little 'cause they were too big for me, even on
  my high-DPI monitor.
-  * BasicAuth support for document and thumbnail downloads, as well as the Push
+* BasicAuth support for document and thumbnail downloads, as well as the Push
  API thanks to @thomasbrueggemann.  See `#179`_.
-* 0.3.3
+0.3.3
-  * Thumbnails in the UI and a Django-suit -based face-lift courtesy of @ekw!
+=====
-  * Timezone, items per page, and default language are now all configurable,
+
 * Thumbnails in the UI and a Django-suit -based face-lift courtesy of @ekw!
 * Timezone, items per page, and default language are now all configurable,
  also thanks to @ekw.
-* 0.3.2
+0.3.2
-  * Fix for `#172`_: defaulting ALLOWED_HOSTS to ``["*"]`` and allowing the
+=====
 * Fix for `#172`_: defaulting ALLOWED_HOSTS to ``["*"]`` and allowing the
  user to set her own value via ``PAPERLESS_ALLOWED_HOSTS`` should the need
  arise.
-* 0.3.1
+0.3.1
-  * Added a default value for ``CONVERT_BINARY``
+=====
-* 0.3.0
+* Added a default value for ``CONVERT_BINARY``
-  * Updated to using django-filter 1.x
+
-  * Added some system checks so new users aren't confused by misconfigurations.
+0.3.0
-  * Consumer loop time is now configurable for systems with slow writes.  Just
+=====
 * Updated to using django-filter 1.x
 * Added some system checks so new users aren't confused by misconfigurations.
 * Consumer loop time is now configurable for systems with slow writes.  Just
  set ``PAPERLESS_CONSUMER_LOOP_TIME`` to a number of seconds.  The default
  is 10.
-  * As per `#44`_, we've removed support for ``PAPERLESS_CONVERT``,
+* As per `#44`_, we've removed support for ``PAPERLESS_CONVERT``,
  ``PAPERLESS_CONSUME``, and ``PAPERLESS_SECRET``.  Please use
  ``PAPERLESS_CONVERT_BINARY``, ``PAPERLESS_CONSUMPTION_DIR``, and
  ``PAPERLESS_SHARED_SECRET`` respectively instead.
-* 0.2.0
+0.2.0
 =====
-  * `#150`_: The media root is now a variable you can set in
+* `#150`_: The media root is now a variable you can set in
  ``paperless.conf``.
-  * `#148`_: The database location (sqlite) is now a variable you can set in
+* `#148`_: The database location (sqlite) is now a variable you can set in
  ``paperless.conf``.
-  * `#146`_: Fixed a bug that allowed unauthorised access to the ``/fetch``
+* `#146`_: Fixed a bug that allowed unauthorised access to the ``/fetch``
  URL.
-  * `#131`_: Document files are now automatically removed from disk when
+* `#131`_: Document files are now automatically removed from disk when
  they're deleted in Paperless.
-  * `#121`_: Fixed a bug where Paperless wasn't setting document creation time
+* `#121`_: Fixed a bug where Paperless wasn't setting document creation time
  based on the file naming scheme.
-  * `#81`_: Added a hook to run an arbitrary script after every document is
+* `#81`_: Added a hook to run an arbitrary script after every document is
  consumed.
-  * `#98`_: Added optional environment variables for ImageMagick so that it
+* `#98`_: Added optional environment variables for ImageMagick so that it
  doesn't explode when handling Very Large Documents or when it's just
  running on a low-memory system.  Thanks to `Florian Harr`_ for his help on
  this one.
-  * `#89`_ Ported the auto-tagging code to correspondents as well.  Thanks to
+* `#89`_ Ported the auto-tagging code to correspondents as well.  Thanks to
  `Justin Snyman`_ for the pointers in the issue queue.
-  * Added support for guessing the date from the file name along with the
+* Added support for guessing the date from the file name along with the
  correspondent, title, and tags.  Thanks to `Tikitu de Jager`_ for his pull
  request that I took forever to merge and to `Pit`_ for his efforts on the
  regex front.
-  * `#94`_: Restored support for changing the created date in the UI.  Thanks
+* `#94`_: Restored support for changing the created date in the UI.  Thanks
  to `Martin Honermeyer`_ and `Tim White`_ for working with me on this.
-* 0.1.1
+0.1.1
 =====
-  * Potentially **Breaking Change**: All references to "sender" in the code
+* Potentially **Breaking Change**: All references to "sender" in the code
  have been renamed to "correspondent" to better reflect the nature of the
  property (one could quite reasonably scan a document before sending it to
  someone.)
-  * `#67`_: Rewrote the document exporter and added a new importer that allows
+* `#67`_: Rewrote the document exporter and added a new importer that allows
  for full metadata retention without depending on the file name and
  modification time.  A big thanks to `Tikitu de Jager`_, `Pit`_,
  `Florian Jung`_, and `Christopher Luu`_ for their code snippets and
  contributing conversation that led to this change.
-  * `#20`_: Added *unpaper* support to help in cleaning up the scanned image
+* `#20`_: Added *unpaper* support to help in cleaning up the scanned image
  before it's OCR'd.  Thanks to `Pit`_ for this one.
-  * `#71`_ Added (encrypted) thumbnails in anticipation of a proper UI.
+* `#71`_ Added (encrypted) thumbnails in anticipation of a proper UI.
-  * `#68`_: Added support for using a proper config file at
+* `#68`_: Added support for using a proper config file at
  ``/etc/paperless.conf`` and modified the systemd unit files to use it.
-  * Refactored the Vagrant installation process to use environment variables
+* Refactored the Vagrant installation process to use environment variables
  rather than asking the user to modify ``settings.py``.
-  * `#44`_: Harmonise environment variable names with constant names.
+* `#44`_: Harmonise environment variable names with constant names.
-  * `#60`_: Setup logging to actually use the Python native logging framework.
+* `#60`_: Setup logging to actually use the Python native logging framework.
-  * `#53`_: Fixed an annoying bug that caused ``.jpeg`` and ``.JPG`` images
+* `#53`_: Fixed an annoying bug that caused ``.jpeg`` and ``.JPG`` images
  to be imported but made unavailable.
-* 0.1.0
+0.1.0
 =====
-  * Docker support!  Big thanks to `Wayne Werner`_, `Brian Conn`_, and
+* Docker support!  Big thanks to `Wayne Werner`_, `Brian Conn`_, and
  `Tikitu de Jager`_ for this one, and especially to `Pit`_
  who spearheaded this effort.
-  * A simple REST API is in place, but it should be considered unstable.
+* A simple REST API is in place, but it should be considered unstable.
-  * Cleaned up the consumer to use temporary directories instead of a single
+* Cleaned up the consumer to use temporary directories instead of a single
  scratch space.  (Thanks `Pit`_)
-  * Improved the efficiency of the consumer by parsing pages more intelligently
+* Improved the efficiency of the consumer by parsing pages more intelligently
  and introducing a threaded OCR process (thanks again `Pit`_).
-  * `#45`_: Cleaned up the logic for tag matching.  Reported by `darkmatter`_.
+* `#45`_: Cleaned up the logic for tag matching.  Reported by `darkmatter`_.
-  * `#47`_: Auto-rotate landscape documents.  Reported by `Paul`_ and fixed by
+* `#47`_: Auto-rotate landscape documents.  Reported by `Paul`_ and fixed by
  `Pit`_.
-  * `#48`_: Matching algorithms should do so on a word boundary (`darkmatter`_)
+* `#48`_: Matching algorithms should do so on a word boundary (`darkmatter`_)
-  * `#54`_: Documented the re-tagger (`zedster`_)
+* `#54`_: Documented the re-tagger (`zedster`_)
-  * `#57`_: Make sure file is preserved on import failure (`darkmatter`_)
+* `#57`_: Make sure file is preserved on import failure (`darkmatter`_)
-  * Added tox with pep8 checking
+* Added tox with pep8 checking
-* 0.0.6
+0.0.6
 =====
-  * Added support for parallel OCR (significant work from `Pit`_)
+* Added support for parallel OCR (significant work from `Pit`_)
-  * Sped up the language detection (significant work from `Pit`_)
+* Sped up the language detection (significant work from `Pit`_)
-  * Added simple logging
+* Added simple logging
-* 0.0.5
+0.0.5
 =====
-  * Added support for image files as documents (png, jpg, gif, tiff)
+* Added support for image files as documents (png, jpg, gif, tiff)
-  * Added a crude means of HTTP POST for document imports
+* Added a crude means of HTTP POST for document imports
-  * Added IMAP mail support
+* Added IMAP mail support
-  * Added a re-tagging utility
+* Added a re-tagging utility
-  * Documentation for the above as well as data migration
+* Documentation for the above as well as data migration
-* 0.0.4
+0.0.4
 =====
-  * Added automated tagging basted on keyword matching
+* Added automated tagging based on keyword matching
-  * Cleaned up the document listing page
+* Cleaned up the document listing page
-  * Removed ``User`` and ``Group`` from the admin
+* Removed ``User`` and ``Group`` from the admin
-  * Added ``pytz`` to the list of requirements
+* Added ``pytz`` to the list of requirements
-* 0.0.3
+0.0.3
 =====
-  * Added basic tagging
+* Added basic tagging
-* 0.0.2
+0.0.2
 =====
-  * Added language detection
+* Added language detection
-  * Added datestamps to ``document_exporter``.
+* Added datestamps to ``document_exporter``.
-  * Changed ``settings.TESSERACT_LANGUAGE`` to ``settings.OCR_LANGUAGE``.
+* Changed ``settings.TESSERACT_LANGUAGE`` to ``settings.OCR_LANGUAGE``.
-* 0.0.1
+0.0.1
 =====
-  * Initial release
+* Initial release
 .. _Brian Conn: https://github.com/TheConnMan
 .. _Christopher Luu: https://github.com/nuudles
@@ -258,6 +325,10 @@ Changelog
 .. _Stefan Hagen: https://github.com/xkpd3
 .. _dev-rke: https://github.com/dev-rke
 .. _Lukas Winkler: https://github.com/Findus23
 .. _chris-aeviator: https://github.com/chris-aeviator
 .. _Dan Panzarella: https://github.com/pzl
 .. _addadi: https://github.com/addadi
 .. _BastianPoe: https://github.com/BastianPoe
 .. _#20: https://github.com/danielquinn/paperless/issues/20
 .. _#44: https://github.com/danielquinn/paperless/issues/44
@@ -281,6 +352,7 @@ Changelog
 .. _#146: https://github.com/danielquinn/paperless/issues/146
 .. _#148: https://github.com/danielquinn/paperless/pull/148
 .. _#150: https://github.com/danielquinn/paperless/pull/150
 .. _#158: https://github.com/danielquinn/paperless/issues/158
 .. _#171: https://github.com/danielquinn/paperless/issues/171
 .. _#172: https://github.com/danielquinn/paperless/issues/172
 .. _#179: https://github.com/danielquinn/paperless/pull/179
@@ -304,3 +376,10 @@ Changelog
 .. _#272: https://github.com/danielquinn/paperless/issues/272
 .. _#248: https://github.com/danielquinn/paperless/issues/248
 .. _#278: https://github.com/danielquinn/paperless/issues/278
 .. _#283: https://github.com/danielquinn/paperless/issues/283
 .. _#256: https://github.com/danielquinn/paperless/pull/256
 .. _#285: https://github.com/danielquinn/paperless/pull/285
 .. _#291: https://github.com/danielquinn/paperless/pull/291
 .. _pipenv: https://docs.pipenv.org/
 .. _a new home on Docker Hub: https://hub.docker.com/r/danielquinn/paperless/
--- a/docs/extending.rst
+++ b/docs/extending.rst
@@ -0,0 +1,104 @@
 .. _extending:
 Extending Paperless
 ===================
 For the most part, Paperless is monolithic, so extending it is often best
 managed by way of modifying the code directly and issuing a pull request on
 `GitHub`_.  However, over time the project has been evolving to be a little
 more "pluggable" so that users can write their own stuff that talks to it.
 .. _GitHub: https://github.com/danielquinn/paperless
 .. _extending-parsers:
 Parsers
 -------
 You can leverage Paperless' consumption model to have it consume files *other*
 than ones handled by default like ``.pdf``, ``.jpg``, and ``.tiff``.  To do so,
 you simply follow Django's convention of creating a new app, with a few key
 requirements.
 .. _extending-parsers-parserspy:
 parsers.py
 ..........
 In this file, you create a class that extends
 ``documents.parsers.DocumentParser`` and go about implementing the three
 required methods:
 * ``get_thumbnail()``: Returns the path to a file we can use as a thumbnail for
  this document.
 * ``get_text()``: Returns the text from the document and only the text.
 * ``get_date()``: If possible, this returns the date of the document, otherwise
  it should return ``None``.
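
As a rough illustration of the three methods above, here is a minimal sketch of
a parser for plain-text files.  ``PlainTextParser``, the ISO-date heuristic and
the blank placeholder thumbnail are all hypothetical, and the ``DocumentParser``
class below is only a stand-in that mimics the ``document_path`` and
``tempdir`` attributes so the sketch runs on its own.  In a real module you
would import ``documents.parsers.DocumentParser`` instead.

```python
import datetime
import os
import re
import struct
import tempfile
import zlib


class DocumentParser:
    """Stand-in for documents.parsers.DocumentParser (assumption: only the
    attributes used below are mimicked)."""

    def __init__(self, path):
        self.document_path = path
        self.tempdir = tempfile.mkdtemp(prefix="paperless-")


def _png_chunk(tag, data):
    """Build one PNG chunk: length, tag, data, CRC."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


class PlainTextParser(DocumentParser):
    """Hypothetical parser for .txt files."""

    ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

    def get_thumbnail(self):
        # Write a blank 1x1 greyscale PNG as a placeholder thumbnail.
        path = os.path.join(self.tempdir, "thumbnail.png")
        header = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
        pixels = zlib.compress(b"\x00\xff")  # filter byte + one white pixel
        with open(path, "wb") as f:
            f.write(b"\x89PNG\r\n\x1a\n")
            f.write(_png_chunk(b"IHDR", header))
            f.write(_png_chunk(b"IDAT", pixels))
            f.write(_png_chunk(b"IEND", b""))
        return path

    def get_text(self):
        with open(self.document_path, encoding="utf-8",
                  errors="replace") as f:
            return f.read()

    def get_date(self):
        # Use the first YYYY-MM-DD date found in the text, if any.
        match = self.ISO_DATE.search(self.get_text())
        if match is None:
            return None
        try:
            year, month, day = (int(part) for part in match.groups())
            return datetime.datetime(
                year, month, day, tzinfo=datetime.timezone.utc)
        except ValueError:  # digits that do not form a real date
            return None
```

The UTC timezone is a choice made for this sketch; pick whatever suits your
documents.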
 .. _extending-parsers-signalspy:
 signals.py
 ..........
 At consumption time, Paperless emits a ``document_consumer_declaration``
 signal which your module has to react to in order to let the consumer know
 whether or not it's capable of handling a particular file.  Think of it like
 this:
 1. Consumer finds a file in the consumption directory.
 2. It asks all the available parsers: *"Hey, can you handle this file?"*
 3. The first parser that says yes gets to handle the file.  Parsers are asked
   in the order in which their apps appear in ``INSTALLED_APPS`` in
   ``settings.py``.
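
The exchange above can be sketched roughly as follows.  The ``parser`` and
``weight`` keys and the highest-weight-wins sort mirror what the consumer code
does with the answers.  Everything else is an assumption made for illustration:
the ``TextConsumerDeclaration`` class, its ``handle`` and ``test`` method
names, the placeholder ``PlainTextParser`` and the ``pick_parser`` helper.
Check ``paperless_tesseract/signals.py`` for the real shape before copying
this.

```python
import re


class PlainTextParser:
    """Placeholder for the parser class defined in your parsers.py."""


class TextConsumerDeclaration:
    """Hypothetical declaration for .txt and .md files."""

    MATCHING_FILES = re.compile(r"^.*\.(txt|md)$")

    @classmethod
    def handle(cls, sender, **kwargs):
        # Signal receiver: hand back a callable the consumer can test
        # file names with (assumed protocol).
        return cls.test

    @classmethod
    def test(cls, doc):
        # "Can you handle this file?"  Answer with a parser, or with None.
        if cls.MATCHING_FILES.match(doc.lower()):
            return {"parser": PlainTextParser, "weight": 0}
        return None


def pick_parser(declarations, doc):
    """Simplified stand-in for how the consumer chooses among answers."""
    options = [declaration.test(doc) for declaration in declarations]
    options = [option for option in options if option is not None]
    if not options:
        return None
    best = sorted(options, key=lambda o: o["weight"], reverse=True)[0]
    return best["parser"]
```

Calling ``pick_parser([TextConsumerDeclaration], "/consume/notes.txt")``
returns the parser class, while a ``.pdf`` path returns ``None`` because this
declaration does not claim it.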
 .. _extending-parsers-appspy:
 apps.py
 .......
 This is a standard Django file, but you'll need to add some code to it to
 register your parser as being able to handle particular files.
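
A minimal sketch of what that registration might look like is shown here.  It
assumes that the signal can be imported from ``documents.signals`` and that
your own ``signals.py`` exposes a receiver called
``TextConsumerDeclaration.handle``; both names are assumptions, so compare
with ``paperless_tesseract/apps.py`` before relying on them.  The ``try`` /
``except`` fallback exists only so the sketch can be run without Django
installed and should be dropped in a real app.

```python
try:
    from django.apps import AppConfig
except ImportError:
    class AppConfig:
        """Fallback so the sketch can be read and run without Django."""

        def ready(self):
            pass


class MyModuleConfig(AppConfig):
    # Dotted path of the app, matching the INSTALLED_APPS entry.
    name = "my_module"

    def ready(self):
        # Imported here rather than at module level, because apps are not
        # fully loaded when this file is first imported.
        # Assumption: the signal lives in documents.signals and the
        # receiver is a ``handle`` classmethod in your own signals.py.
        from documents.signals import document_consumer_declaration
        from .signals import TextConsumerDeclaration

        document_consumer_declaration.connect(TextConsumerDeclaration.handle)
        super().ready()
```

Django calls ``ready()`` once at startup, so the receiver is connected before
the consumer emits its first signal.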
 .. _extending-parsers-finally:
 Finally
 .......
 The last step is to update ``settings.py`` to include your new module.
 Eventually, this will be dynamic, but at the moment, you have to edit the
 ``INSTALLED_APPS`` section manually.  Simply add the path to your AppConfig to
 the list like this:
 .. code:: python
    INSTALLED_APPS = [
        ...
        "my_module.apps.MyModuleConfig",
        "paperless_tesseract.apps.PaperlessTesseractConfig",
        ...
    ]
 Note that we're placing our module *above* ``PaperlessTesseractConfig``.  This
 is to ensure that if your module wants to handle any files typically handled by
 the default module, yours will win instead.  If there's no conflict between
 what your module does and the default, then order doesn't matter.
 .. _extending-parsers-example:
 An Example
 ..........
 The core Paperless functionality is based on this design, so if you want to see
 what a parser module should look like, have a look at `parsers.py`_,
 `signals.py`_, and `apps.py`_ in the `paperless_tesseract`_ module.
 .. _parsers.py: https://github.com/danielquinn/paperless/blob/master/src/paperless_tesseract/parsers.py
 .. _signals.py: https://github.com/danielquinn/paperless/blob/master/src/paperless_tesseract/signals.py
 .. _apps.py: https://github.com/danielquinn/paperless/blob/master/src/paperless_tesseract/apps.py
 .. _paperless_tesseract: https://github.com/danielquinn/paperless/blob/master/src/paperless_tesseract/
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -5,9 +5,9 @@ Paperless
 Paperless is a simple Django application running in two parts:
 a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
-the :ref:`webserver <utilities-webserver>` (the part that lets you search & download
+the :ref:`webserver <utilities-webserver>` (the part that lets you search &
-already-indexed documents). If you want to learn more about its functions keep on
+download already-indexed documents). If you want to learn more about its
-reading after the installation section.
+functions keep on reading after the installation section.
 .. _index-why-this-exists:
@@ -16,12 +16,13 @@ Why This Exists
 ===============
 Paper is a nightmare.  Environmental issues aside, there's no excuse for it in
-the 21st century.  It takes up space, collects dust, doesn't support any form of
+the 21st century.  It takes up space, collects dust, doesn't support any form
-a search feature, indexing is tedious, it's heavy and prone to damage & loss.
+of a search feature, indexing is tedious, it's heavy and prone to damage &
 loss.
 I wrote this to make "going paperless" easier.  I do not have to worry about
-finding stuff again. I feed documents right from the post box into the scanner and
+finding stuff again. I feed documents right from the post box into the scanner
-then shred them.  Perhaps you might find it useful too.
+and then shred them.  Perhaps you might find it useful too.
@@ -39,6 +40,7 @@ Contents
   utilities
   guesswork
   migrating
   extending
   troubleshooting
   scanners
   changelog
--- a/docs/requirements.rst
+++ b/docs/requirements.rst
@@ -11,24 +11,27 @@ should work) that has the following software installed:
 * `Tesseract`_, plus its language files matching your document base.
 * `Imagemagick`_ version 6.7.5 or higher
 * `unpaper`_
 * `libpoppler-cpp-dev`_ PDF rendering library
 .. _Python3: https://python.org/
 .. _GNU Privacy Guard: https://gnupg.org
 .. _Tesseract: https://github.com/tesseract-ocr
 .. _Imagemagick: http://imagemagick.org/
 .. _unpaper: https://www.flameeyes.eu/projects/unpaper
 .. _libpoppler-cpp-dev: https://poppler.freedesktop.org/
 Notably, you should confirm how you access your Python3 installation.  Many
-Linux distributions will install Python3 in parallel to Python2, using the names
+Linux distributions will install Python3 in parallel to Python2, using the
-``python3`` and ``python`` respectively.  The same goes for ``pip3`` and
+names ``python3`` and ``python`` respectively.  The same goes for ``pip3`` and
-``pip``.  Running Paperless with Python2 will likely break things, so make sure that 
+``pip``.  Running Paperless with Python2 will likely break things, so make sure
-you're using the right version.
+that you're using the right version.
 For the purposes of simplicity, ``python`` and ``pip`` are used everywhere to
 refer to their Python3 versions.
 In addition to the above, there are a number of Python requirements, all of
-which are listed in a file called ``requirements.txt`` in the project root directory.
+which are listed in a file called ``requirements.txt`` in the project root
 directory.
 If you're not working on a virtual environment (like Vagrant or Docker), you
 should probably be using a virtualenv, but that's your call.  The reasons why
@@ -39,12 +42,13 @@ probably figure that out before continuing.
 .. _requirements-apple:
-Apple-tastic Complications
+Problems with Imagemagick & PDFs
--------------------------
+--------------------------------
-Some users have `run into problems`_ with installing ImageMagick on Apple
+Some users have `run into problems`_ with getting ImageMagick to do its thing
-systems using HomeBrew.  The solution appears to be to install ghostscript as
+with PDFs.  Often this is the case with Apple systems using HomeBrew, but some
-well as ImageMagick:
+Linux systems have had problems as well.  The solution appears to be to install
 ghostscript as well as ImageMagick:
 .. _run into problems: https://github.com/danielquinn/paperless/issues/25
--- a/docs/setup.rst
+++ b/docs/setup.rst
@@ -95,48 +95,6 @@ Standard (Bare Metal)
 .. _Paperless webserver: http://127.0.0.1:8000
 .. _setup-installation-vagrant:
 Vagrant Method
 ..............
 1. Install `Vagrant`_.  How you do that is really between you and your OS.
 2. Run ``vagrant up``.  An instance will start up for you.  When it's ready and
   provisioned...
 3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
   ``/etc/paperless.conf`` and set the values for:
    * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
      dumped to be consumed by Paperless.
    * ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
      encrypt/decrypt the original document.
    * ``PAPERLESS_SHARED_SECRET``: this is the "magic word" used when consuming
      documents from mail or via the API.  If you don't use either, leaving it
      blank is just fine.
 4. Exit the vagrant box and re-enter it with ``vagrant ssh`` again.  This
   updates the environment to make use of the changes you made to the config
   file.
 5. Initialise the database with ``/opt/paperless/src/manage.py migrate``.
 6. Still inside your vagrant box, create a user for your Paperless instance
   with ``/opt/paperless/src/manage.py createsuperuser``. Follow the prompts to
   create your user.
 7. Start the webserver with
   ``/opt/paperless/src/manage.py runserver 0.0.0.0:8000``. You should now be
   able to visit your (empty) `Paperless webserver`_ at ``172.28.128.4:8000``.
   You can login with the user/pass you created in #6.
 8. In a separate window, run ``vagrant ssh`` again, but this time once inside
   your vagrant instance, you should start the consumer script with
   ``/opt/paperless/src/manage.py document_consumer``.
 9. Scan something.  Put it in the ``CONSUMPTION_DIR``.
 10. Wait a few minutes
 11. Visit the document list on your webserver, and it should be there, indexed
    and downloadable.
 .. _Vagrant: https://vagrantup.com/
 .. _Paperless server: http://172.28.128.4:8000
 .. _setup-installation-docker:
 Docker Method
@@ -175,7 +133,8 @@ Docker Method
   modified versions of the configuration files.
 4. Modify ``docker-compose.yml`` to your preferences, following the
   instructions in comments in the file. The only change that is a hard
-   requirement is to specify where the consumption directory should mount.
+   requirement is to specify where the consumption directory should
   mount. [#dockercomposeyml]_
 5. Modify ``docker-compose.env`` and adapt the following environment variables:
   ``PAPERLESS_PASSPHRASE``
@@ -192,7 +151,7 @@ Docker Method
     default English, set this parameter to a space separated list of
     three-letter language-codes after `ISO 639-2/T`_. For a list of available
     languages -- including their three letter codes -- see the
-     `Debian packagelist`_.
+     `Alpine packagelist`_.
   ``USERMAP_UID`` and ``USERMAP_GID``
     If you want to mount the consumption volume (directory ``/consume`` within
@@ -282,12 +241,60 @@ Docker Method
 .. _Docker: https://www.docker.com/
 .. _docker-compose: https://docs.docker.com/compose/install/
 .. _ISO 639-2/T: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
-.. _Debian packagelist: https://packages.debian.org/search?suite=jessie&searchon=names&keywords=tesseract-ocr-
+.. _Alpine packagelist: https://pkgs.alpinelinux.org/packages?name=tesseract-ocr-data*&arch=x86_64
 .. [#compose] You of course don't have to use docker-compose, but it
   simplifies deployment immensely. If you know your way around Docker, feel
   free to tinker around without using compose!
 .. [#dockercomposeyml] If you're upgrading your docker-compose images from
   version 1.1.0 or earlier, you might need to change the
   ``image: pitkley/paperless`` directive in both the ``webserver`` and
   ``consumer`` sections of ``docker-compose.yml`` to ``build: ./``, as per
   the newer ``docker-compose.yml.example`` file.
 .. _setup-installation-vagrant:
 Vagrant Method
 ..............
 1. Install `Vagrant`_.  How you do that is really between you and your OS.
 2. Run ``vagrant up``.  An instance will start up for you.  When it's ready and
   provisioned...
 3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
   ``/etc/paperless.conf`` and set the values for:
    * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
      dumped to be consumed by Paperless.
    * ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
      encrypt/decrypt the original document.
    * ``PAPERLESS_SHARED_SECRET``: this is the "magic word" used when consuming
      documents from mail or via the API.  If you don't use either, leaving it
      blank is just fine.
 4. Exit the vagrant box and re-enter it with ``vagrant ssh`` again.  This
   updates the environment to make use of the changes you made to the config
   file.
 5. Initialise the database with ``/opt/paperless/src/manage.py migrate``.
 6. Still inside your vagrant box, create a user for your Paperless instance
   with ``/opt/paperless/src/manage.py createsuperuser``. Follow the prompts to
   create your user.
 7. Start the webserver with
   ``/opt/paperless/src/manage.py runserver 0.0.0.0:8000``. You should now be
   able to visit your (empty) `Paperless webserver`_ at ``172.28.128.4:8000``.
   You can log in with the user/pass you created in #6.
 8. In a separate window, run ``vagrant ssh`` again, but this time once inside
   your vagrant instance, you should start the consumer script with
   ``/opt/paperless/src/manage.py document_consumer``.
 9. Scan something.  Put it in the ``CONSUMPTION_DIR``.
 10. Wait a few minutes
 11. Visit the document list on your webserver, and it should be there, indexed
    and downloadable.
 .. _Vagrant: https://vagrantup.com/
 .. _Paperless server: http://172.28.128.4:8000
 .. _setup-permanent:
@@ -563,7 +570,8 @@ your gunicorn instance.  This should do the trick:
 Vagrant
 .......
-You may use the Ubuntu explanation above. Replace ``(local-filesystems and net-device-up IFACE=eth0)`` with ``vagrant-mounted``.
+You may use the Ubuntu explanation above. Replace
 ``(local-filesystems and net-device-up IFACE=eth0)`` with ``vagrant-mounted``.
 .. _setup-permanent-docker:
@@ -577,7 +585,7 @@ Docker daemon.
 .. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart
-.. _setup-subdirectory
+.. _setup-subdirectory:
 Hosting Paperless in a Subdirectory
 -----------------------------------
--- a/paperless.conf.example
+++ b/paperless.conf.example
@@ -167,6 +167,12 @@ PAPERLESS_PASSPHRASE="secret"
 #PAPERLESS_TIME_ZONE=UTC
 # If set, Paperless will show document filters per financial year.
 # The dates must be in the format "mm-dd", for example "07-15" for July 15.
 #PAPERLESS_FINANCIAL_YEAR_START="mm-dd"
 #PAPERLESS_FINANCIAL_YEAR_END="mm-dd"
 # The number of items on each page in the web UI.  This value must be a
 # positive integer, but if you don't define one in paperless.conf, a default of
 # 100 will be used.
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,5 +1,6 @@
 Django>=1.11,<2.0
 Pillow>=3.1.1
 dateparser>=0.6.0
 django-crispy-forms>=1.6.1
 django-extensions>=1.7.6
 django-filter>=1.0
@@ -7,20 +8,21 @@ django-flat-responsive>=1.2.0
 djangorestframework>=3.5.3
 filemagic>=1.6
 fuzzywuzzy[speedup]==0.15.0
 gunicorn>=19.7.1
 langdetect>=1.0.7
 pdftotext>=2.0.1
 pyocr>=0.4.7
 python-dateutil>=2.6.0
 python-dotenv>=0.6.2
 python-gnupg>=0.3.9
 pytz>=2016.10
 gunicorn==19.7.1
 # For the tests
 factory-boy
-pytest
+flake8
 pytest==3.3.2  # Newer versions break with pytest-sugar
 pytest-django
 pytest-sugar
 pytest-env
 pycodestyle
 flake8
 tox
--- a/scripts/docker-entrypoint.sh
+++ b/scripts/docker-entrypoint.sh
@@ -9,7 +9,7 @@ map_uidgid() {
    USERMAP_UID=${USERMAP_UID:-$USERMAP_ORIG_UID}
    if [[ ${USERMAP_UID} != "${USERMAP_ORIG_UID}" || ${USERMAP_GID} != "${USERMAP_ORIG_GID}" ]]; then
        echo "Mapping UID and GID for paperless:paperless to $USERMAP_UID:$USERMAP_GID"
-        groupmod -g "${USERMAP_GID}" paperless
+        addgroup -g "${USERMAP_GID}" paperless
        sed -i -e "s|:${USERMAP_ORIG_UID}:${USERMAP_GID}:|:${USERMAP_UID}:${USERMAP_GID}:|" /etc/passwd
    fi
 }
@@ -56,25 +56,24 @@ install_languages() {
        return
    fi
    # Update apt-lists
    apt-get update
    # Loop over languages to be installed
    for lang in "${langs[@]}"; do
-        pkg="tesseract-ocr-$lang"
+        pkg="tesseract-ocr-data-$lang"
-        if dpkg -s "$pkg" > /dev/null 2>&1; then
+
        # English is installed by default
        if [ "$lang" ==  "eng" ]; then
            continue
        fi
-        if ! apt-cache show "$pkg" > /dev/null 2>&1; then
+        if apk info -e "$pkg" > /dev/null 2>&1; then
            continue
        fi
        if ! apk info "$pkg" > /dev/null 2>&1; then
            continue
        fi
-        apt-get install "$pkg"
+        apk --no-cache --update add "$pkg"
    done
    # Remove apt lists
    rm -rf /var/lib/apt/lists/*
 }
--- a/src/documents/admin.py
+++ b/src/documents/admin.py
@@ -1,3 +1,5 @@
 from datetime import datetime
 from django.conf import settings
 from django.contrib import admin
 from django.contrib.auth.models import User, Group
@@ -32,6 +34,71 @@ class MonthListFilter(admin.SimpleListFilter):
        return queryset.filter(created__year=year, created__month=month)
 class FinancialYearFilter(admin.SimpleListFilter):
    title = "Financial Year"
    parameter_name = "fy"
    _fy_wraps = None
    def _fy_start(self, year):
        """Return date of the start of financial year for the given year."""
        fy_start = "{}-{}".format(str(year), settings.FY_START)
        return datetime.strptime(fy_start, "%Y-%m-%d").date()
    def _fy_end(self, year):
        """Return date of the end of financial year for the given year."""
        fy_end = "{}-{}".format(str(year), settings.FY_END)
        return datetime.strptime(fy_end, "%Y-%m-%d").date()
    def _fy_does_wrap(self):
        """Return whether the financial year spans across two years."""
        if self._fy_wraps is None:
            start = "{}".format(settings.FY_START)
            start = datetime.strptime(start, "%m-%d").date()
            end = "{}".format(settings.FY_END)
            end = datetime.strptime(end, "%m-%d").date()
            self._fy_wraps = end < start
        return self._fy_wraps
    def _determine_fy(self, date):
        """Return a (query, display) financial year tuple of the given date."""
        if self._fy_does_wrap():
            fy_start = self._fy_start(date.year)
            if date.date() >= fy_start:
                query = "{}-{}".format(date.year, date.year + 1)
            else:
                query = "{}-{}".format(date.year - 1, date.year)
            # To keep it simple we use the same string for both
            # query parameter and the display.
            return (query, query)
        else:
            query = "{0}-{0}".format(date.year)
            display = "{}".format(date.year)
            return (query, display)
    def lookups(self, request, model_admin):
        if not settings.FY_START or not settings.FY_END:
            return None
        r = []
        for document in Document.objects.all():
            r.append(self._determine_fy(document.created))
        return sorted(set(r), key=lambda x: x[0], reverse=True)
    def queryset(self, request, queryset):
        if not self.value() or not settings.FY_START or not settings.FY_END:
            return None
        start, end = self.value().split("-")
        return queryset.filter(created__gte=self._fy_start(start),
                               created__lte=self._fy_end(end))
 class CommonAdmin(admin.ModelAdmin):
    list_per_page = settings.PAPERLESS_LIST_PER_PAGE
@@ -59,7 +126,9 @@ class DocumentAdmin(CommonAdmin):
    search_fields = ("correspondent__name", "title", "content")
    list_display = ("title", "created", "thumbnail", "correspondent", "tags_")
-    list_filter = ("tags", "correspondent", MonthListFilter)
+    list_filter = ("tags", "correspondent", FinancialYearFilter,
                   MonthListFilter)
    ordering = ["-created", "correspondent"]
    def has_add_permission(self, request):
--- a/src/documents/consumer.py
+++ b/src/documents/consumer.py
@@ -118,12 +118,14 @@ class Consumer(object):
            parsed_document = parser_class(doc)
            thumbnail = parsed_document.get_thumbnail()
            date = parsed_document.get_date()
            try:
                document = self._store(
                    parsed_document.get_text(),
                    doc,
-                    thumbnail
+                    thumbnail,
                    date
                )
            except ParseError as e:
@@ -174,7 +176,7 @@ class Consumer(object):
        return sorted(
            options, key=lambda _: _["weight"], reverse=True)[0]["parser"]
-    def _store(self, text, doc, thumbnail):
+    def _store(self, text, doc, thumbnail, date):
        file_info = FileInfo.from_path(doc)
@@ -182,7 +184,7 @@ class Consumer(object):
        self.log("debug", "Saving record to database")
-        created = file_info.created or timezone.make_aware(
+        created = file_info.created or date or timezone.make_aware(
                    datetime.datetime.fromtimestamp(stats.st_mtime))
        with open(doc, "rb") as f:
--- a/src/documents/management/commands/document_exporter.py
+++ b/src/documents/management/commands/document_exporter.py
@@ -64,7 +64,7 @@ class Command(Renderable, BaseCommand):
            file_target = os.path.join(self.target, document.file_name)
-            thumbnail_name = document.file_name + "-tumbnail.png"
+            thumbnail_name = document.file_name + "-thumbnail.png"
            thumbnail_target = os.path.join(self.target, thumbnail_name)
            document_dict[EXPORTER_FILE_NAME] = document.file_name
--- a/src/documents/models.py
+++ b/src/documents/models.py
@@ -135,8 +135,10 @@ class MatchingModel(models.Model):
        """
        findterms = re.compile(r'"([^"]+)"|(\S+)').findall
        normspace = re.compile(r"\s+").sub
-        return [normspace(r"\s+", (t[0] or t[1]).strip())
+        return [
-                for t in findterms(self.match)]
+            normspace(" ", (t[0] or t[1]).strip()).replace(" ", r"\s+")
            for t in findterms(self.match)
        ]
    def save(self, *args, **kwargs):
--- a/src/documents/parsers.py
+++ b/src/documents/parsers.py
@@ -9,7 +9,7 @@ class ParseError(Exception):
    pass
-class DocumentParser(object):
+class DocumentParser:
    """
    Subclass this to make your own parser.  Have a look at
    `paperless_tesseract.parsers` for inspiration.
@@ -19,7 +19,7 @@ class DocumentParser(object):
    def __init__(self, path):
        self.document_path = path
-        self.tempdir = tempfile.mkdtemp(prefix="paperless", dir=self.SCRATCH)
+        self.tempdir = tempfile.mkdtemp(prefix="paperless-", dir=self.SCRATCH)
        self.logger = logging.getLogger(__name__)
        self.logging_group = None
@@ -35,6 +35,12 @@ class DocumentParser(object):
        """
        raise NotImplementedError()
    def get_date(self):
        """
        Returns the date of the document.
        """
        raise NotImplementedError()
    def log(self, level, message):
        getattr(self.logger, level)(message, extra={
            "group": self.logging_group
--- a/src/paperless/settings.py
+++ b/src/paperless/settings.py
@@ -210,6 +210,9 @@ OCR_LANGUAGE = os.getenv("PAPERLESS_OCR_LANGUAGE", "eng")
 # The number of threads to use for OCR
 OCR_THREADS = os.getenv("PAPERLESS_OCR_THREADS")
 # OCR all documents?
 OCR_ALWAYS = bool(os.getenv("PAPERLESS_OCR_ALWAYS", "NO").lower() in ("yes", "y", "1", "t", "true"))
 # If this is true, any failed attempts to OCR a PDF will result in the PDF
 # being indexed anyway, with whatever we could get.  If it's False, the file
 # will simply be left in the CONSUMPTION_DIR.
@@ -255,3 +258,9 @@ POST_CONSUME_SCRIPT = os.getenv("PAPERLESS_POST_CONSUME_SCRIPT")
 # positive integer, but if you don't define one in paperless.conf, a default of
 # 100 will be used.
 PAPERLESS_LIST_PER_PAGE = int(os.getenv("PAPERLESS_LIST_PER_PAGE", 100))
 FY_START = os.getenv("PAPERLESS_FINANCIAL_YEAR_START")
 FY_END = os.getenv("PAPERLESS_FINANCIAL_YEAR_END")
 # Specify the default date order (for autodetected dates)
 DATE_ORDER = os.getenv("PAPERLESS_DATE_ORDER", "DMY")
--- a/src/paperless/urls.py
+++ b/src/paperless/urls.py
@@ -44,8 +44,8 @@ urlpatterns = [
    # The Django admin
    url(r"admin/", admin.site.urls),
-    # Catch all redirect back to /admin
+    # Redirect / to /admin
-    url(r"", RedirectView.as_view(permanent=True, url="/admin/")),
+    url(r"^$", RedirectView.as_view(permanent=True, url="/admin/")),
 ] + static.static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
--- a/src/paperless/version.py
+++ b/src/paperless/version.py
@@ -1 +1 @@
-__version__ = (1, 0, 0)
+__version__ = (1, 2, 0)
--- a/src/paperless_tesseract/parsers.py
+++ b/src/paperless_tesseract/parsers.py
@@ -3,6 +3,8 @@ import os
 import re
 import subprocess
 from multiprocessing.pool import Pool
 import dateparser
 import pdftotext
 import langdetect
 import pyocr
@@ -30,7 +32,10 @@ class RasterisedDocumentParser(DocumentParser):
    DENSITY = settings.CONVERT_DENSITY if settings.CONVERT_DENSITY else 300
    THREADS = int(settings.OCR_THREADS) if settings.OCR_THREADS else None
    UNPAPER = settings.UNPAPER_BINARY
    DATE_ORDER = settings.DATE_ORDER
    DEFAULT_OCR_LANGUAGE = settings.OCR_LANGUAGE
    OCR_ALWAYS = settings.OCR_ALWAYS
    TEXT_CACHE = None
    def get_thumbnail(self):
        """
@@ -46,13 +51,32 @@ class RasterisedDocumentParser(DocumentParser):
        return os.path.join(self.tempdir, "convert-0000.png")
    def _is_ocred(self):
        # Extract text from PDF using pdftotext
        text = get_text_from_pdf(self.document_path)
        # We assume that a PDF with more than 50 characters contains text
        # (so no OCR required)
        if len(text) > 50:
            return True
        return False
    def get_text(self):
        if self.TEXT_CACHE is not None:
            return self.TEXT_CACHE
        if not self.OCR_ALWAYS and self._is_ocred():
            self.log("info", "Skipping OCR, using Text from PDF")
            self.TEXT_CACHE = get_text_from_pdf(self.document_path)
            return self.TEXT_CACHE
        images = self._get_greyscale()
        try:
-            return self._get_ocr(images)
+            self.TEXT_CACHE = self._get_ocr(images)
            return self.TEXT_CACHE
        except OCRError as e:
            raise ParseError(e)
@@ -175,6 +199,29 @@ class RasterisedDocumentParser(DocumentParser):
        text += self._ocr(imgs[middle + 1:], self.DEFAULT_OCR_LANGUAGE)
        return text
    def get_date(self):
        text = self.get_text()
        # This regular expression will try to find dates in the document at
        # hand and will match the following formats:
        # - XX.YY.ZZZZ with XX + YY being 1 or 2 and ZZZZ being 2 or 4 digits
        # - XX/YY/ZZZZ with XX + YY being 1 or 2 and ZZZZ being 2 or 4 digits
        # - XX-YY-ZZZZ with XX + YY being 1 or 2 and ZZZZ being 2 or 4 digits
        # - XX. MONTH ZZZZ with XX being 1 or 2 and ZZZZ being 2 or 4 digits
        # - MONTH ZZZZ
        m = re.search(
            r'\b([0-9]{1,2})[\.\/-]([0-9]{1,2})[\.\/-]([0-9]{4}|[0-9]{2})\b|' +
            r'\b([0-9]{1,2}\. [^ ]{3,9} ([0-9]{4}|[0-9]{2}))\b|' +
            r'\b([^ ]{3,9} [0-9]{4})\b', text)
        if m is None:
            return None
        return dateparser.parse(m.group(0),
                                settings={'DATE_ORDER': self.DATE_ORDER,
                                          'PREFER_DAY_OF_MONTH': 'first',
                                          'RETURN_AS_TIMEZONE_AWARE': True})
 def run_convert(*args):
@@ -212,3 +259,13 @@ def image_to_string(args):
            except (TesseractError, OtherTesseractError):
                pass
        return ocr.image_to_string(f, lang=lang)
 def get_text_from_pdf(pdf_file):
    with open(pdf_file, "rb") as f:
        try:
            pdf = pdftotext.PDF(f)
        except pdftotext.Error:
            return ""
    return "\n".join(pdf)
--- a/src/paperless_tesseract/signals.py
+++ b/src/paperless_tesseract/signals.py
@@ -3,7 +3,7 @@ import re
 from .parsers import RasterisedDocumentParser
-class ConsumerDeclaration(object):
+class ConsumerDeclaration:
    MATCHING_FILES = re.compile(r"^.*\.(pdf|jpe?g|gif|png|tiff?|pnm|bmp)$")
--- a/src/paperless_tesseract/tests/samples/tests_date_1.pdf
+++ b/src/paperless_tesseract/tests/samples/tests_date_1.pdf
--- a/src/paperless_tesseract/tests/samples/tests_date_1.png
+++ b/src/paperless_tesseract/tests/samples/tests_date_1.png
--- a/src/paperless_tesseract/tests/samples/tests_date_2.pdf
+++ b/src/paperless_tesseract/tests/samples/tests_date_2.pdf
--- a/src/paperless_tesseract/tests/samples/tests_date_2.png
+++ b/src/paperless_tesseract/tests/samples/tests_date_2.png
--- a/src/paperless_tesseract/tests/samples/tests_date_3.pdf
+++ b/src/paperless_tesseract/tests/samples/tests_date_3.pdf
--- a/src/paperless_tesseract/tests/samples/tests_date_3.png
+++ b/src/paperless_tesseract/tests/samples/tests_date_3.png
--- a/src/paperless_tesseract/tests/samples/tests_date_4.pdf
+++ b/src/paperless_tesseract/tests/samples/tests_date_4.pdf
--- a/src/paperless_tesseract/tests/samples/tests_date_4.png
+++ b/src/paperless_tesseract/tests/samples/tests_date_4.png
--- a/src/paperless_tesseract/tests/samples/tests_date_5.pdf
+++ b/src/paperless_tesseract/tests/samples/tests_date_5.pdf
--- a/src/paperless_tesseract/tests/samples/tests_date_5.png
+++ b/src/paperless_tesseract/tests/samples/tests_date_5.png
--- a/src/paperless_tesseract/tests/samples/tests_date_6.pdf
+++ b/src/paperless_tesseract/tests/samples/tests_date_6.pdf
--- a/src/paperless_tesseract/tests/samples/tests_date_6.png
+++ b/src/paperless_tesseract/tests/samples/tests_date_6.png
--- a/src/paperless_tesseract/tests/samples/tests_date_7.pdf
+++ b/src/paperless_tesseract/tests/samples/tests_date_7.pdf
--- a/src/paperless_tesseract/tests/test_date.py
+++ b/src/paperless_tesseract/tests/test_date.py
@@ -0,0 +1,215 @@
 import datetime
 import os
 import shutil
 from unittest import mock
 from uuid import uuid4
 from dateutil import tz
 from django.test import TestCase
 from ..parsers import RasterisedDocumentParser
 class TestDate(TestCase):
    SAMPLE_FILES = os.path.join(os.path.dirname(__file__), "samples")
    SCRATCH = "/tmp/paperless-tests-{}".format(str(uuid4())[:8])
    def setUp(self):
        os.makedirs(self.SCRATCH, exist_ok=True)
    def tearDown(self):
        shutil.rmtree(self.SCRATCH)
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_1_pdf(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_1.pdf")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), True)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 4, 1, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_1_png(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_1.png")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), False)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 4, 1, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_2_pdf(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_2.pdf")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), True)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2013, 2, 1, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_2_png(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_2.png")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), False)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2013, 2, 1, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_3_pdf(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_3.pdf")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), True)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 10, 5, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_3_png(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_3.png")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), False)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 10, 5, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_4_pdf(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_4.pdf")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), True)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 10, 5, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_4_png(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_4.png")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), False)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 10, 5, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_5_pdf(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_5.pdf")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), True)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 12, 17, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_5_png(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_5.png")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), False)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 12, 17, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_6_pdf_us(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_6.pdf")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        document.DATE_ORDER = "MDY"
        self.assertEqual(document._is_ocred(), True)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 12, 17, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_6_png_us(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_6.png")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        document.DATE_ORDER = "MDY"
        self.assertEqual(document._is_ocred(), False)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 12, 17, 0, 0,
                                           tzinfo=tz.tzutc()))
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_6_pdf_eu(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_6.pdf")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), True)
        self.assertEqual(document.get_date(), None)
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_6_png_eu(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_6.png")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), False)
        self.assertEqual(document.get_date(), None)
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SCRATCH
    )
    def test_get_text_7_pdf(self):
        input_file = os.path.join(self.SAMPLE_FILES, "tests_date_7.pdf")
        document = RasterisedDocumentParser(input_file)
        document.get_text()
        self.assertEqual(document._is_ocred(), True)
        self.assertEqual(document.get_date(),
                         datetime.datetime(2018, 4, 1, 0, 0,
                                           tzinfo=tz.tzutc()))
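Every test above repeats the same arrangement: a per-run scratch directory created in `setUp`, removed in `tearDown`, and injected into the parser by patching its `SCRATCH` class attribute. The sketch below shows that arrangement in isolation. `FakeParser` and `scratch_file` are hypothetical stand-ins for `RasterisedDocumentParser`, and `tempfile.gettempdir()` is used instead of a hard-coded `/tmp` purely as an assumption for portability.

```python
import os
import shutil
import tempfile
import unittest
from unittest import mock
from uuid import uuid4


class FakeParser:
    """Hypothetical stand-in for RasterisedDocumentParser."""

    SCRATCH = "/var/tmp/never-used"

    def scratch_file(self, name):
        # Resolved at call time, so a patched SCRATCH is picked up.
        return os.path.join(self.SCRATCH, name)


class TestScratch(unittest.TestCase):

    SCRATCH = os.path.join(
        tempfile.gettempdir(),
        "paperless-tests-{}".format(str(uuid4())[:8])
    )

    def setUp(self):
        os.makedirs(self.SCRATCH, exist_ok=True)

    def tearDown(self):
        shutil.rmtree(self.SCRATCH)

    # Replaces the class attribute for the duration of this test only.
    @mock.patch.object(FakeParser, "SCRATCH", SCRATCH)
    def test_writes_to_scratch(self):
        path = FakeParser().scratch_file("page.txt")
        with open(path, "w") as f:
            f.write("hello")
        self.assertTrue(path.startswith(self.SCRATCH))
        self.assertTrue(os.path.exists(path))


if __name__ == "__main__":
    unittest.main()
```

Because the replacement value is passed explicitly, `mock.patch.object` does not inject an extra mock argument into the test method, which matches how the `@mock.patch(..., SCRATCH)` decorators are used in the diff.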
Author	SHA1	Message	Date
Daniel Quinn	5009bd022f	Add section for extending	2018-02-03 15:31:20 +00:00
Daniel Quinn	73163d893f	No need to extend object	2018-02-03 15:26:28 +00:00
Daniel Quinn	506af7c9c2	Conform to an 80-character limit	2018-02-03 15:26:09 +00:00
Daniel Quinn	c90ed2da1d	Rework tests to write to /tmp Originally the test wrote scratch data inside the repo dir, which meant manual cleanup. Now it writes to `/tmp/paperless-tests-<random-string>` and cleans up after itself.	2018-02-03 14:49:48 +00:00
Daniel Quinn	cebb8b9fa2	Use `paperless-` instead of `paperless` for tempdir name This is purely aesthetic.	2018-02-03 14:49:17 +00:00
Daniel Quinn	46aca10a72	No need to explicitly extend object	2018-02-03 14:49:01 +00:00
Daniel Quinn	6384c698ad	Fix DeprecationWarning as-per ishirav's advice	2018-02-03 14:48:14 +00:00
Daniel Quinn	abf01be889	Move the Docker method up	2018-02-03 14:22:10 +00:00
Daniel Quinn	1e4928d2a0	Update the release notes for 1.2.0	2018-02-03 14:21:55 +00:00
Daniel Quinn	503be90932	Reorganise the sections.	2018-02-03 14:11:43 +00:00
Daniel Quinn	b5d6a82cc3	Reformat README to be Docker Hub-friendly For some reason, Docker Hub doesn't follow the Markdown spec correctly, and inserts `<br />` tags on single newlines, meaning that this file can't use hard wraps.	2018-02-03 14:06:11 +00:00
Daniel Quinn	c073ba5272	Try to get Docker Hub liking the README	2018-02-03 14:02:21 +00:00
Daniel Quinn	d1e317ce21	Switch from README.rst to README.md This is to work around a shortcoming in Docker Hub that requires that we use markdown.	2018-02-03 13:40:27 +00:00
Daniel Quinn	d4abeafb34	Re-order requirements.txt	2018-02-03 13:28:22 +00:00
Daniel Quinn	4d96551619	Merge pull request #291 from BastianPoe/feature/heuristically-extract-date-from-document-text Add support for a heuristic that extracts the document date from its text	2018-02-03 14:13:14 +01:00
Wolf-Bastian Pöttner	178361b247	Add tesseract and unpaper to travis-ci for automated tests	2018-02-03 00:28:52 +01:00
Wolf-Bastian Pöttner	40f8ba23a4	Added a text cache to optimize performance of date detection	2018-02-03 00:28:52 +01:00
Wolf-Bastian Pöttner	bef2d94374	Add test cases for date parsing	2018-02-03 00:28:49 +01:00
Wolf-Bastian Pöttner	f39c7654a0	Merge branch 'master' of https://github.com/danielquinn/paperless into feature/heuristically-extract-date-from-document-text	2018-02-02 22:44:03 +01:00
Daniel Quinn	e9fff764cb	Fix text formatting	2018-02-02 22:38:07 +01:00
Wolf-Bastian Pöttner	87e466c47c	Add support for using pre-existing text from PDFs	2018-02-02 22:37:58 +01:00
Daniel Quinn	bd0b593c4a	Clean up grammar & remove VersionEye	2018-02-02 22:37:36 +01:00
Daniel Quinn	7a8142df2b	Update docs to reflect Docker changes	2018-02-02 22:37:36 +01:00
Guy	bbe3084eda	fixing typos and rst syntax	2018-02-02 22:37:36 +01:00
Guy	89d42bd078	Updated Dockerfile with maintainer and contributors Updated setup.rst with information on upgrade path if coming from an earlier version of docker-compose images	2018-02-02 22:37:36 +01:00
Guy	93efaf7a38	adapted docker-entrypoint script for alpine docker image (mainly how to install additional OCR languages)	2018-02-02 22:37:36 +01:00
Guy	398575c70c	changed docker-comppse.yml example to build the docker image instead of pull the previously used debian based image from docker hub	2018-02-02 22:37:36 +01:00
Guy Addadi	4e21fa4830	removed ENV WORKDIR layers, reorg the commands in groups with comments and black lines when possible. Removed redundant mkdir command	2018-02-02 22:37:36 +01:00
Guy Addadi	d2d2d9edaf	moved to alpine:3.7 removed RUN layers to save image space, removed redundant mkdir commands	2018-02-02 22:37:35 +01:00
Guy Addadi	771c8bbbe4	added bash and moved all dev packages to be with virtual alpine env that is removed after python libraries installation	2018-02-02 22:37:35 +01:00
Guy Addadi	20eeda19b8	adapted Dockerfile for alpine image	2018-02-02 22:37:35 +01:00
Daniel Quinn	5e40227bc3	Merge branch 'master' of github.com:danielquinn/paperless	2018-02-01 16:04:57 +00:00
Daniel Quinn	5479942fc0	Merge pull request #294 from matthewmoto/fix_pdftotext_errors Fixing error sentinel for pdftotext when the PDF has no text (scanned…	2018-02-01 16:20:42 +01:00
Matt	ce98019b49	Fixing error sentinel for pdftotext when the PDF has no text (scanned images). It was causing a crash previously.	2018-02-01 10:08:57 -05:00
Daniel Quinn	9470154df2	Fiddling to get Docker Hub to behave	2018-02-01 13:02:48 +00:00
Daniel Quinn	5c59120c57	Merge branch 'master' of github.com:danielquinn/paperless	2018-02-01 12:37:29 +00:00
Daniel Quinn	88736ff867	Version bump in anticipation of release later this week	2018-02-01 12:37:21 +00:00
Daniel Quinn	fd5b831979	Fix pytest to 3.3.2	2018-02-01 12:18:01 +00:00
Daniel Quinn	3fcd1e2d7e	Fix text formatting	2018-01-30 20:27:40 +00:00
Daniel Quinn	2c81648d59	Merge branch 'BastianPoe-feature/autodetect_if_ocr_is_required'	2018-01-30 20:14:53 +00:00
Daniel Quinn	cd92c005e3	Add support for using pre-existing text from PDFs	2018-01-30 20:13:35 +00:00
Daniel Quinn	31c8cf020e	Clean up grammar & remove VersionEye	2018-01-30 18:46:46 +00:00
Daniel Quinn	e900a38983	Update docs to reflect Docker changes	2018-01-30 17:19:18 +00:00
Daniel Quinn	7343a07ddd	Merge pull request #274 from addadi/master Adapt Dockerfile for alpine image	2018-01-30 18:11:19 +01:00
Guy	e20b4fb905	fixing typos and rst syntax	2018-01-29 23:41:52 +02:00
Guy	cbbc4d37d0	Updated Dockerfile with maintainer and contributors Updated setup.rst with information on upgrade path if coming from an earlier version of docker-compose images	2018-01-29 23:19:06 +02:00
Wolf-Bastian Pöttner	b140935843	Add support for a heuristic that extracts the document date from its text	2018-01-28 19:37:10 +01:00
Daniel Quinn	9faf0a102e	Update changelog & version bump	2018-01-21 17:39:00 +00:00
Daniel Quinn	b747dd58c3	Fix redirect bug #283	2018-01-21 17:33:04 +00:00
Daniel Quinn	09e1b505e1	Merge pull request #256 from ddddavidmartin/add_financial_year_filter Add financial year documents filter	2018-01-21 18:23:45 +01:00
Daniel Quinn	a6babffed8	Merge pull request #285 from pzl/master small typo in exporter thumbnail filename	2018-01-21 18:18:31 +01:00
pzl	0256e2dfbb	small typo in exporter thumbnail filename	2018-01-19 14:28:46 -05:00
Daniel Quinn	7afa90b769	Fix travis reference to pycodestyle	2018-01-06 19:32:51 +00:00
Daniel Quinn	5796956235	Fix typo	2018-01-06 19:30:08 +00:00
Guy	7e49d047b0	adapted docker-entrypoint script for alpine docker image (mainly how to install additional OCR languages)	2017-12-20 16:17:58 +02:00
Guy	68cdeb7b3d	changed docker-comppse.yml example to build the docker image instead of pull the previously used debian based image from docker hub	2017-12-19 22:34:22 +02:00
Guy Addadi	76293084a4	removed ENV WORKDIR layers, reorg the commands in groups with comments and black lines when possible. Removed redundant mkdir command	2017-12-12 23:12:34 +02:00
Guy Addadi	e1cf2117f5	moved to alpine:3.7 removed RUN layers to save image space, removed redundant mkdir commands	2017-12-11 22:03:51 +02:00
Guy Addadi	7d81de4edf	added bash and moved all dev packages to be with virtual alpine env that is removed after python libraries installation	2017-12-11 00:41:36 +02:00
Guy Addadi	37af5992c7	adapted Dockerfile for alpine image	2017-12-09 23:08:56 +02:00
David Martin	67b492bcb7	Determine the start of the financial only for wrapping years. If the financial year is from Jan to Dec there we do not need to determine the start to see which year it falls into.	2017-08-26 19:50:57 +10:00
David Martin	360d1e2802	Store whether financial year wraps instead of re-determining it. It either wraps or it does not depending on how it is set in the config. There is no point in determining it again for each document. Instead we simply store it as a member variable the first time we check.	2017-08-26 19:45:39 +10:00
David Martin	1cd76634a3	Take non-wrapping financial years into account. The German financial year for example goes from January to December. In those cases we simply only show the year in the overview.	2017-08-25 20:27:39 +10:00
David Martin	c65c5009e4	Return no filter results if financial year dates are not set. This is a lot cleaner than trying to hack around whether or not the FinancialYearFilter is part of the available filters. This way it will show up if there are result for it and the dates are set, and it will not if any of those conditions is not set.	2017-08-25 17:36:09 +10:00
David Martin	24fb6cefb9	Add config settings to set the start and the end of the financial year. Now we allow to filter for any financial year dates. Note that we also only show the financial year filter if the dates are actually set.	2017-08-24 20:51:09 +10:00
David Martin	d80e272b75	Add a basic financial year filter for the document overview. For now we simply hardcode the dates for the AU financial years. We simply show a list of financial years and filter the documents accordingly.	2017-08-24 20:20:00 +10:00
@@ -1 +1 @@
-__version__ = (1, 0, 0)
+__version__ = (1, 2, 0)