==========================================================================================================
       Porta Fontium Page Segmentation Dataset
==========================================================================================================

BASIC INFORMATION
--------------------
This dataset was created from document images digitised within the Porta fontium project. We selected a newspaper called “Ascher Zeitung”, which dates back to the second half of the nineteenth century and is printed in German in the Fraktur font. We collected 25 pages in total. The pages were annotated using Aletheia and the layout description is saved in the PAGE XML format. The transcription was prepared using the tools presented in the paper “Tools for semi-automatic preparation of training data for OCR”. All pages are annotated at the paragraph level with bounding polygons. The test set also contains text lines with their corresponding baselines.
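
The annotations described above can be read with any XML parser. The following is a minimal Python sketch, not part of the dataset tooling. It assumes the files follow the common PAGE layout, in which TextRegion and TextLine elements carry a Coords child (and text lines a Baseline child) with a points="x1,y1 x2,y2 ..." attribute. The SAMPLE document, its namespace version and the helper names (local_name, parse_points, child_points, read_page) are hypothetical and serve only as illustration; the actual files may use a different schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature PAGE XML document (not taken from the dataset).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r0">
      <Coords points="10,10 90,10 90,40 10,40"/>
      <TextLine id="l0">
        <Coords points="10,10 90,10 90,25 10,25"/>
        <Baseline points="10,22 90,22"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the XML namespace so the code does not depend on the schema version."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(float(v)) for v in pair.split(",")) for pair in points.split()]


def child_points(element, name):
    """Return the points of the first direct child called `name`, or None."""
    for child in element:
        if local_name(child.tag) == name and child.get("points"):
            return parse_points(child.get("points"))
    return None


def read_page(xml_text):
    """Collect paragraph polygons and their text lines (polygon and baseline)."""
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        lines = [
            {"polygon": child_points(line, "Coords"),
             "baseline": child_points(line, "Baseline")}
            for line in region
            if local_name(line.tag) == "TextLine"
        ]
        regions.append({"id": region.get("id"),
                        "polygon": child_points(region, "Coords"),
                        "lines": lines})
    return regions


if __name__ == "__main__":
    for region in read_page(SAMPLE):
        print(region["id"], region["polygon"], len(region["lines"]), "line(s)")
```

To read a file from disk instead of the embedded string, the same traversal can be applied to the root returned by ET.parse(path).getroot(). Pages without text-line annotation simply yield an empty "lines" list for each region.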

This corpus is freely available for research purposes only. Commercial use in any form is strictly excluded. For further information, please see the paper below:

- L. Lenc, J. Martinek, P. Kral: Text Line Segmentation in Historical Newspapers, in 21st International Conference on Artificial Intelligence and Soft Computing (ICAISC 2022).

@ARTICLE{icaisc2022,
author={Lenc, L. and Mart\'inek, J. and Kr\'al, P.},
title={Text Line Segmentation in Historical Newspapers},
journal={21st International Conference on Artificial Intelligence and Soft Computing (ICAISC 2022)},
year={2022},
pages={},
doi={},
url={},
document_type={Article},
issn={},
publisher={Springer}
}

Please cite this paper if you use this dataset in your experiments.

AUTHORS
------------------------

- Ladislav Lenc llenc(at)kiv.zcu.cz
- Jiri Martinek jimar(at)kiv.zcu.cz
- Pavel Kral pkral(at)kiv.zcu.cz

Date: June, 2022
