paperless-ngx

Author	SHA1	Message	Date
Erik Arvstedt	f56ec70aad	Ensure docs have been unmodified for some time before consuming Previously, the second mtime check for new files usually happened right after the first one, which could have caused consumption of docs that were still being modified. We're now waiting for at least FILES_MIN_UNMODIFIED_DURATION (0.5s). This also cleans up the logic by eliminating the consumer.stats attribute and the weird double call to consumer.run(). Additionally, this fixes a memory leak in consumer.stats where paths could be added but never removed if the corresponding files disappeared from the consumer dir before being considered ready.	2018-05-11 14:05:29 +02:00
Erik Arvstedt	0db6ed225b	Refactor: extract fn try_consume_file The main purpose of this change is to make the following commits more readable.	2018-05-11 14:05:28 +02:00
Erik Arvstedt	312a6a91b5	Use os.scandir instead of os.listdir It's simpler and better suited for use cases introduced in later commits.	2018-05-11 14:05:25 +02:00
Erik Arvstedt	2c64e70754	Consume documents in order of increasing mtime This increases overall usability, especially for multi-page scans. Previously, the consumption order was undefined (see os.listdir())	2018-05-11 14:04:37 +02:00
Erik Arvstedt	9320230100	Refactor: extract fn 'make_dirs'	2018-05-11 14:04:36 +02:00
Daniel Quinn	13452ba33b	Clean up docstring to be properly rst	2018-03-03 18:43:20 +00:00
Ovv	b10c2c770c	style & test	2018-03-03 18:43:20 +00:00
Ovv	d89dbbe537	Configuration cli argument for document_consumer	2018-03-03 18:43:20 +00:00
Daniel Quinn	4f726e1991	Monitor return codes of calls to `convert` and `unpaper` ...and handle the failures nicely. Addresses #303.	2018-02-18 16:02:27 +00:00
Daniel Quinn	caf44146db	Style and removal of Python 2.7 stuff	2018-02-18 15:55:55 +00:00
Wolf-Bastian Pöttner	21fc51c09a	Add support for a heuristic that extracts the document date from its text	2018-01-28 19:37:10 +01:00
Daniel Quinn	e7d4ca92ba	fix: allow for caps in file name suffixes #206 @schinkelg ran aground of this one and I took the opportunity to add a test to catch this sort of thing for next time.	2017-03-28 21:14:24 +00:00
Daniel Quinn	d2c283582b	feat: refactor for pluggable consumers I've broken out the OCR-specific code from the consumers and dumped it all into its own app, `paperless_tesseract`. This new app should serve as a sample of how to create one's own consumer for different file types. Documentation for how to do this isn't ready yet, but for the impatient: * Create a new app * containing a `parsers.py` for your parser modelled after `paperless_tesseract.parsers.RasterisedDocumentParser` * containing a `signals.py` with a handler modelled after `paperless_tesseract.signals.ConsumerDeclaration` * connect the signal handler to `documents.signals.document_consumer_declaration` in `your_app.apps` * Install the app into Paperless by declaring `PAPERLESS_INSTALLED_APPS=your_app`. Additional apps should be separated with commas. * Restart the consumer	2017-03-25 15:10:25 +00:00
Daniel Quinn	18495ce9da	Fix for #154 * Added a test with a faked pyocr and tesseract * Added a catch for pyocr's other TesseractError	2016-11-27 15:06:45 +00:00
Daniel Quinn	ca21929cee	Moved logging logic into the consumer	2016-10-26 09:52:09 +00:00
Daniel Quinn	8e58406881	pep8 corrections	2016-10-26 09:32:59 +00:00
Aleksandr Bogdanov	63de2ca1b0	Collapsing excess whitespace after OCR	2016-10-12 01:46:34 +02:00
Daniel Quinn	1ce76a5486	Actually write the date found in the file name	2016-08-20 18:11:51 +01:00
Lenz Weber	018efc576b	wait until file is completely transmitted negation was missing for feature to be active, see #128	2016-06-26 10:18:58 +02:00
Brian Martin	b6ae129ad1	Sample Config and Bug Fix Update sample config to reflect new setting variable. Change consumer to handle density setting as str instead of int.	2016-05-13 23:23:58 -04:00
Brian Martin	52c5aafb3f	Convert Density Add settings variable for the convert density setting. If no variable is set, default to 300.	2016-05-13 22:47:40 -04:00
Daniel Quinn	e96c7448bc	Fix for #107	2016-04-11 23:28:12 +01:00
Daniel Quinn	90939be6af	@Pitkley made a good suggestion in #98	2016-04-10 17:39:49 +01:00
Daniel Quinn	64b72d4337	Added test for duplicates	2016-04-03 18:44:00 +01:00
Daniel Quinn	bbe691f342	Merge pull request #101 from danielquinn/issue/89 Closes #89.	2016-03-28 14:25:56 +01:00
Daniel Quinn	b4e648e1e3	Test All The Things	2016-03-28 14:16:26 +01:00
Daniel Quinn	b92e007e15	Removed log components and introduced signals for tags & correspondents	2016-03-28 11:11:15 +01:00
Daniel Quinn	49b56425e8	Merge branch 'master' into issue/81	2016-03-25 20:56:30 +00:00
Daniel Quinn	b387be6f25	I didn't mean to explicitly set -limit	2016-03-25 20:33:00 +00:00
Daniel Quinn	9991f5a6b2	Introducing optional env vars for ImageMagick	2016-03-25 20:31:15 +00:00
Daniel Quinn	0aa0513004	Modifications for support for dates	2016-03-24 19:18:33 +00:00
Daniel Quinn	1170139127	Added a consume-start and consume-finish signal	2016-03-14 21:20:44 +00:00
Tikitu de Jager	95217e8e21	Use FileInfo directly instead of via indirection	2016-03-07 21:08:07 +02:00
Tikitu de Jager	1f75af0137	Extract filename parsing into testable class	2016-03-07 21:05:04 +02:00
Pit Kleyersburg	fb36a49c26	Add unpaper as another pre-processing step	2016-03-06 15:30:37 +01:00
Daniel Quinn	495ed1c36c	Added thumbnail generation to the consumer	2016-03-05 12:09:06 +00:00
Daniel Quinn	5d4587ef8b	Accounted for .sender in a few places	2016-03-04 09:14:50 +00:00
Daniel Quinn	070463b85a	s/Sender/Correspondent & reworked the (im\|ex)porter	2016-03-03 20:52:42 +00:00
Daniel Quinn	fad466477b	More verbose error logging	2016-03-03 18:18:48 +00:00
Daniel Quinn	631aa99d92	No need to pass verbosity around anymore	2016-02-28 00:39:40 +00:00
Daniel Quinn	2fe9b0cbc1	New logging appears to work	2016-02-27 20:18:50 +00:00
Daniel Quinn	1aecb1e63a	Compensate for case and format of jpg vs. jpeg	2016-02-23 20:15:13 +00:00
Daniel Quinn	3a7923e32d	Moved pyocr.get_available_tools() into a method	2016-02-21 02:24:05 +00:00
Daniel Quinn	422ae9303a	pep8	2016-02-21 00:14:50 +00:00
Daniel Quinn	51b19f4c19	Issue #57	2016-02-20 22:30:01 +00:00
Pit Kleyersburg	c45f951ca0	Ignore error if orientation detection fails Fixes an additional issue that came up in #48.	2016-02-19 09:52:32 +01:00
Pit Kleyersburg	c34d57a872	Detect image orientation if the OCR supports it Fixes issue #47.	2016-02-18 09:37:13 +01:00
Daniel Quinn	1e7ece81ee	Fixes #45	2016-02-17 23:07:54 +00:00
Daniel Quinn	6f95b05287	Support appropriate sorting for long documents	2016-02-17 00:10:05 +00:00
Pit Kleyersburg	46f8f492f5	Safely and non-randomly create scratch directory Creating the scratch-files in `_get_grayscale` using a random integer is for one inherently unsafe and can cause a collision. On the other hand, it should be unnecessary given that the files will be cleaned up after the OCR run. Since we don't know if OCR runs might be parallel in the future, this commit implements thread-safe and deterministic directory-creation. Additionally it fixes the call to `_cleanup` by `consume`. In the current implementation `_cleanup` will not be called if the last consumed document failed with an `OCRError`, this commit fixes this.	2016-02-16 12:15:57 +01:00
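
The 2018-05-11 commits above (312a6a91b5, 2c64e70754, f56ec70aad) describe scanning the consumption directory with os.scandir, consuming documents in order of increasing mtime, and only picking up files that have been unmodified for at least FILES_MIN_UNMODIFIED_DURATION. A minimal sketch of that idea follows. It is not the project's actual code: the function name `ready_files` and its parameters are hypothetical, and only the 0.5s value and the constant's name come from the commit message.

```python
import os
import time

# Value taken from the 0.5s mentioned in commit f56ec70aad.
FILES_MIN_UNMODIFIED_DURATION = 0.5


def ready_files(consume_dir, now=None, min_age=FILES_MIN_UNMODIFIED_DURATION):
    """Hypothetical helper: return the paths of regular files in
    consume_dir that have been unmodified for at least min_age seconds,
    ordered by increasing mtime."""
    now = time.time() if now is None else now
    candidates = []
    with os.scandir(consume_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            try:
                mtime = entry.stat().st_mtime
            except FileNotFoundError:
                # The file vanished between listing and stat; skip it.
                continue
            if now - mtime >= min_age:
                candidates.append((mtime, entry.path))
    # Sorting the (mtime, path) tuples yields oldest-first order.
    return [path for _, path in sorted(candidates)]
```

Because the helper keeps no state between calls, a file that disappears from the directory is simply absent from the next scan, so nothing accumulates the way the commit message describes for consumer.stats.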

109 Commits