OCR Studio

OCR Studio – Language Digitization and Audio Corpus

OCR Studio is a professional-grade Progressive Web Application designed for the comprehensive digitization of Bishnupriya Manipuri and regional literature. By combining a high-accuracy Tesseract.js engine with advanced image preprocessing like perspective correction and auto-deskewing, it transforms physical documents into verified digital archives. The platform bridges the gap between static preservation and modern accessibility, featuring integrated audio narration tools to secure both the linguistic and literary legacy of the community.

Salient Features:

Regional Language Support: Optimized for recognizing Bengali, Assamese, and English scripts.
Advanced Image Correction: Features manual homography-based perspective flattening and automatic skew detection for improved OCR accuracy.
PDF Management: Includes a full rendering engine with multi-page navigation and zoom capabilities for complex document processing.
Integrated Narration Tool: Allows users to record narration of the corresponding text with a running timer and export it as high-fidelity WAV files for audio corpus building.
Seamless PWA Architecture: Built for cross-platform accessibility with a responsive design, service worker support, and clipboard integration.

* Developed under the patronage of the BhoomiTech Heritage & Development Foundation

About LexiPic

OCR Studio is a sophisticated Progressive Web Application (PWA) specifically engineered to transform physical documents and multi-page PDFs into high-standard digital archives. At its core, the application utilizes the Tesseract.js engine to provide specialized support for regional scripts, including Bengali and Assamese, alongside English. To ensure the highest level of accuracy for hand-captured images, the platform incorporates advanced image preprocessing capabilities, such as an automated deskewing tool that corrects document tilt and a manual perspective correction feature that uses homography matrices to flatten warped or angled photographs into a rectangular view.

Beyond text extraction, the application serves as a comprehensive “living repository” by integrating multimedia tools for enhanced preservation. Users can manage complex documents through a dedicated PDF navigation interface that includes page jumping and dynamic zoom controls ranging from 50% to 300%. Furthermore, the system includes a built-in audio narration module, allowing users to record vocal performances, track duration in real-time, and export professional-grade WAV files. This combination of verified text recognition and synchronized audio capability ensures that the linguistic and literary heritage of communities is captured with academic-grade precision.

Document & Text Processing

Allows for the direct upload and processing of both standard image files and multi-page PDF documents.
Utilizes Tesseract.js to provide optimized OCR for Bengali, Assamese, and English scripts.
Features a dedicated control suite for multi-page documents, including page jumping, “Next/Previous” buttons, and a total page count display.
Displays a dynamic progress bar and percentage text to monitor the status of the OCR recognition engine.
Includes a one-touch copy function that moves extracted text from the output area to the user’s clipboard for external use.
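
The progress display described above can be sketched as a small formatter. This is an illustration only, not the application's actual code: it assumes the OCR engine reports progress as messages with a `status` string and a `progress` fraction between 0 and 1 (the shape Tesseract.js passes to its `logger` callback), and `formatProgress` is a hypothetical helper name.

```javascript
// Hypothetical helper: turn an OCR progress message into progress-bar values.
// Assumes message = { status: string, progress: number in [0, 1] }.
function formatProgress(message) {
  // Treat a missing or non-numeric progress value as 0.
  const fraction = Number.isFinite(message.progress) ? message.progress : 0;
  // Clamp to [0, 1] so the bar never under- or overflows.
  const clamped = Math.min(1, Math.max(0, fraction));
  const percent = Math.round(clamped * 100);
  return { percent, text: `${message.status}: ${percent}%` };
}

// Possible wiring (assumption; the exact call depends on the Tesseract.js version):
//   logger: (m) => { const p = formatProgress(m); bar.style.width = p.percent + "%"; }
```

Keeping the formatter free of DOM access makes it easy to reuse for both the bar width and the percentage text.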

Advanced Image Preprocessing

Automatically detects the skew angle of a document by calculating pixel variance and rotates the canvas to straighten the text.
Provides a “Perspective Mode” where users can manually drag four corner handles to the edges of a document to correct perspective distortion caused by camera tilt.
Computes a homography (projective transform) from the four corners to flatten warped or angled photographs into rectangular images.
Offers a range of scaling options from 50% to 300% to allow for detailed inspection of document clarity before processing.
Highlights document boundaries with color-coded labels (TL, TR, BR, BL) during manual adjustments for user accuracy.
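
One common way to detect skew from pixel variance is a projection-profile search: for each candidate angle, count the ink pixels that would fall on each row after rotation, and keep the angle where the row counts vary most (text lines and gaps are sharpest when the page is straight). The sketch below illustrates that idea under stated assumptions. It is not the application's implementation; the function names, the binarized `Uint8Array` input, and the ±10° search range are all hypothetical.

```javascript
// pixels: Uint8Array, row-major, 1 = ink, 0 = background (assumed pre-binarized).
// Returns the variance of per-row ink counts if the page were rotated by angleDeg.
function rowProfileVariance(pixels, width, height, angleDeg) {
  const theta = (angleDeg * Math.PI) / 180;
  const sin = Math.sin(theta);
  const cos = Math.cos(theta);
  const cx = width / 2;
  const cy = height / 2;
  const bins = new Float64Array(height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (!pixels[y * width + x]) continue;
      // Row this pixel lands on after rotating about the image center.
      const ry = Math.round((x - cx) * sin + (y - cy) * cos + cy);
      if (ry >= 0 && ry < height) bins[ry] += 1;
    }
  }
  let mean = 0;
  for (const b of bins) mean += b;
  mean /= height;
  let variance = 0;
  for (const b of bins) variance += (b - mean) ** 2;
  return variance / height;
}

// Brute-force search for the correction angle (degrees, y axis pointing down,
// same sign convention as a canvas rotate call) that maximizes the variance.
function estimateSkew(pixels, width, height, maxDeg = 10, stepDeg = 0.5) {
  let best = { angle: 0, score: -Infinity };
  for (let a = -maxDeg; a <= maxDeg; a += stepDeg) {
    const score = rowProfileVariance(pixels, width, height, a);
    if (score > best.score) best = { angle: a, score };
  }
  return best.angle;
}
```

The returned angle is the rotation to apply to the canvas, so a page tilted clockwise yields a negative value. A coarse-to-fine search would reduce the cost on large scans.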

Multimedia & Accessibility

Supports live audio recording through the Web Audio API, enabling users to create synchronized voice-overs for extracted text.
Features a native audio encoder that merges PCM buffers into high-fidelity 16-bit WAV files for local download.
Includes a real-time recording timer, a dedicated “Stop/Record” toggle, and visual status indicators.
Engineered as a Progressive Web Application with service worker support for offline capabilities and platform independence.
Features a retractable “Summary & Controls” header that prioritizes workspace on mobile and desktop screens.
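
Merging PCM buffers into a 16-bit WAV file can be sketched as follows. This is a minimal illustration of the standard 44-byte RIFF/WAVE header followed by little-endian 16-bit samples, not the application's own encoder. The names `mergeBuffers` and `encodeWav` are hypothetical, and the input is assumed to be `Float32Array` chunks with samples in the range -1 to 1, as the Web Audio API delivers them.

```javascript
// Concatenate recorded Float32Array chunks into one buffer.
function mergeBuffers(chunks) {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const merged = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    merged.set(c, offset);
    offset += c.length;
  }
  return merged;
}

// Encode float samples (interleaved if numChannels > 1) as a 16-bit PCM WAV.
function encodeWav(samples, sampleRate, numChannels = 1) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const view = new DataView(new ArrayBuffer(44 + dataSize));
  const writeString = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return view.buffer;
}
```

In a browser, the returned `ArrayBuffer` could be wrapped in a `Blob` with type `audio/wav` and offered as a download link.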

Highlights

A standout highlight of the application is its sophisticated image preprocessing suite, designed to handle the real-world challenges of document photography. Users can utilize an automated deskewing feature that detects text alignment and rotates the canvas to straighten tilted pages. For more complex distortions, the “Perspective Mode” allows for manual corner-handle manipulation, using homography matrices to flatten warped or angled images into a perfectly rectangular, high-precision view. This ensures that even hand-captured photos of heritage documents are optimized for maximum OCR accuracy.
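
To illustrate how four dragged corners determine a homography, the sketch below solves the standard eight-equation linear system for the 3x3 matrix (with the last entry fixed to 1) and applies it to a point. It is a minimal example under stated assumptions rather than the application's code: the function names are hypothetical, and corners are assumed to be `[x, y]` pairs in TL, TR, BR, BL order.

```javascript
// Solve A x = b by Gauss-Jordan elimination with partial pivoting.
function solveLinear(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) throw new Error("Degenerate corner layout");
    const tmp = M[col];
    M[col] = M[pivot];
    M[pivot] = tmp;
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  return M.map((row, i) => row[n] / row[i]);
}

// Homography mapping each src corner to the matching dst corner.
// Returns 9 numbers in row-major order.
function computeHomography(src, dst) {
  const A = [];
  const b = [];
  for (let i = 0; i < 4; i++) {
    const [x, y] = src[i];
    const [u, v] = dst[i];
    // From u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
    A.push([x, y, 1, 0, 0, 0, -u * x, -u * y]);
    b.push(u);
    A.push([0, 0, 0, x, y, 1, -v * x, -v * y]);
    b.push(v);
  }
  return [...solveLinear(A, b), 1];
}

// Map one point through the homography (includes the perspective divide).
function applyHomography(H, point) {
  const [x, y] = point;
  const w = H[6] * x + H[7] * y + H[8];
  return [(H[0] * x + H[1] * y + H[2]) / w, (H[3] * x + H[4] * y + H[5]) / w];
}
```

To flatten an image, one would typically compute the inverse mapping (rectangle to photographed quadrilateral) and sample the source pixel for each output pixel, which avoids holes in the result.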

Beyond simple text extraction, OCR Studio functions as a “living repository” by integrating high-fidelity audio narration tools directly into the workflow. The application allows users to record vocal performances or narrations using the Web Audio API, featuring a real-time timer and a dedicated recording interface. Once captured, the system merges audio buffers into a professional-grade 16-bit WAV file that can be exported and saved alongside the digitized text. This dual-layer preservation strategy ensures that both the visual linguistic data and the oral heritage of the community are secured in a single, accessible platform.

How To & FAQ

LexiPic is developed and distributed for FREE for sustainable development projects in any community. It is under a public license from BhoomiTech Foundation.

USB Web Server: Some of our applications, or certain functions within them, may require a web server installed on your PC. Here is a lightweight, FREE third-party web server that supports Apache, PHP (v8), and MySQL and works well with our apps.

Average Elevation:	32.8 m above mean sea level (MSL).
Elevation Range:	21.1 m to 44.5 m.
Terrain Type:	Riverine alluvial plains with a very gentle slope.
Geological System:	Quaternary system (Holocene and Pleistocene formations).
Lithology:	Unconsolidated silt, sand, clay, gravel, and concretions.

Northern Region:	Fine loamy to fine silty soils.
Southern & Western Regions:	Coarse loamy dystrochrept soils and cut-off meander deposits.

Aquifer Condition:	Shallow and unconfined.
Average Water Table:	Approximately 30 m MSL.
Pre-Monsoon Depth:	Generally 1-2 m; notably shallow (0-1 m) in western fringes and deeper (2-3 m) in southern areas.
Post-Monsoon Depth:	Ranges between 1-2 m below ground level.

Parameter	Pre-Monsoon Range	Post-Monsoon Range
pH (Acidity)	6.06 - 8.33	6.0 - 8.5
EC (Conductivity)	21.54 - 667.8 μS/cm	22.1 - 544.9 μS/cm
TDS (Solids)	14.22 - 440.75 mg/L	14.59 - 359.63 mg/L
SAR (Sodium Adsorption Ratio)	0.2 - 1.5	0.2 - 2.2
Sodium Concentration	20 - 60	20 - 40
Magnesium Ratio	< 50%	< 50%

Iron Concentration	> 1 ppm on the north-eastern fringes.
Arsenic	> 10 ppb on the north-eastern and south-western sides.

Year	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec	Total
2023	12	25	180	320	520	650	720	610	420	210	90	35	3792
2024	15	30	210	350	480	700	680	580	380	190	80	30	3725
2025	10	28	195	330	550	720	750	600	410	220	85	40	3938

Year	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
2023	18.5	20.0	24.5	26.5	27.5	28.0	28.2	28.0	27.8	26.5	23.5	20.0
2024	18.8	20.2	25.0	27.0	27.8	28.2	28.5	28.3	28.0	26.8	23.8	20.2
2025	18.6	20.1	24.8	26.8	27.6	28.1	28.3	28.2	27.9	26.6	23.6	20.1

2025 Central Floodplain (East on Top)	2014 Central Floodplain (East on Top)

2025 West Floodbasin (North on Top)	2014 West Floodbasin (North on Top)
