
OCR Studio – Language Digitization and Audio Corpus
OCR Studio is a Progressive Web Application for digitizing Bishnupriya Manipuri and regional literature. It combines the Tesseract.js OCR engine with image preprocessing, such as perspective correction and automatic deskewing, to turn physical documents into verified digital archives. The platform bridges static preservation and modern accessibility, with integrated audio narration tools that help secure both the linguistic and literary legacy of the community.
Salient Features:
- Regional Language Support: Optimized for recognition of Bengali, Assamese, and English scripts.
- Advanced Image Correction: Features manual homography-based perspective flattening and automatic skew detection for improved OCR accuracy.
- PDF Management: Includes a full rendering engine with multi-page navigation and zoom capabilities for complex document processing.
- Integrated Narration Tool: Allows users to record narration of the corresponding text, track its duration with a timer, and export high-fidelity WAV files for audio corpus building.
- Seamless PWA Architecture: Built for cross-platform accessibility with a responsive design, service worker support, and clipboard integration.
* Developed under BhoomiTech Heritage & Development Foundation patronage
About LexiPic
OCR Studio is a sophisticated Progressive Web Application (PWA) specifically engineered to transform physical documents and multi-page PDFs into high-standard digital archives. At its core, the application utilizes the Tesseract.js engine to provide specialized support for regional scripts, including Bengali and Assamese, alongside English. To ensure the highest level of accuracy for hand-captured images, the platform incorporates advanced image preprocessing capabilities, such as an automated deskewing tool that corrects document tilt and a manual perspective correction feature that uses homography matrices to flatten warped or angled photographs into a rectangular view.
Beyond text extraction, the application serves as a comprehensive “living repository” by integrating multimedia tools for enhanced preservation. Users can manage complex documents through a dedicated PDF navigation interface that includes page jumping and dynamic zoom controls ranging from 50% to 300%. Furthermore, the system includes a built-in audio narration module, allowing users to record vocal performances, track duration in real-time, and export professional-grade WAV files. This combination of verified text recognition and synchronized audio capability ensures that the linguistic and literary heritage of communities is captured with academic-grade precision.
Document & Text Processing
- Allows for the direct upload and processing of both standard image files and multi-page PDF documents.
- Utilizes Tesseract.js to provide optimized OCR for Bengali, Assamese, and English scripts.
- Features a dedicated control suite for multi-page documents, including page jumping, “Next/Previous” buttons, and a total page count display.
- Displays a dynamic progress bar and percentage text to monitor the status of the OCR recognition engine.
- Includes a one-touch copy function that moves extracted text from the output area to the user’s clipboard for external use.
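The progress display and page controls described above can be sketched as a few small helpers. This is a minimal illustration, not the application's actual code: the `OcrProgress` shape mirrors the `status` and `progress` (0 to 1) fields that Tesseract.js passes to its logger callback, and `OCR_LANGS`, `formatProgress`, and `clampPage` are hypothetical names introduced here.

```typescript
// Tesseract language codes for Bengali, Assamese, and English, joined with "+"
// as the engine expects for multi-language recognition.
const OCR_LANGS: string = ["ben", "asm", "eng"].join("+");

// Assumed subset of a Tesseract.js logger message (only the fields used here).
interface OcrProgress {
  status: string;
  progress: number; // fraction between 0 and 1
}

// Convert a logger message into a bar percentage and a text label.
function formatProgress(msg: OcrProgress): { percent: number; label: string } {
  const clamped = Math.min(1, Math.max(0, msg.progress));
  const percent = Math.round(clamped * 100);
  return { percent, label: `${msg.status}: ${percent}%` };
}

// Keep a requested page number inside [1, totalPages] for page jumping.
function clampPage(requested: number, totalPages: number): number {
  if (totalPages < 1) return 1;
  return Math.min(totalPages, Math.max(1, Math.floor(requested)));
}

// In a browser, the helpers could be wired to Tesseract.js roughly like this
// (left as a comment so the sketch runs without the library):
//
//   Tesseract.recognize(image, OCR_LANGS, {
//     logger: (m) => { progressText.textContent = formatProgress(m).label; },
//   });
```

Clamping both the progress fraction and the page number keeps the interface in a valid state even when the engine reports an out-of-range value or the user types a page that does not exist.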
Advanced Image Preprocessing
- Automatically detects the skew angle of a document by calculating pixel variance and rotates the canvas to straighten the text.
- Provides a “Perspective Mode” where users can manually drag four corner handles to the edges of a document to correct the perspective distortion caused by camera tilt.
- Applies a homography (projective transform) to flatten warped or angled photographs into high-precision rectangular images.
- Offers a range of scaling options from 50% to 300% to allow for detailed inspection of document clarity before processing.
- Highlights document boundaries with color-coded labels (TL, TR, BR, BL) during manual adjustments for user accuracy.
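One common way to detect skew from pixel variance is the projection-profile method: rotate the ink pixels through a range of candidate angles, count ink per row, and keep the angle where the row counts vary the most (text lines produce sharp peaks when they are horizontal). The sketch below assumes that approach; the source does not confirm which variant OCR Studio uses, and `BinaryImage`, `profileVariance`, and `estimateDeskewAngle` are hypothetical names.

```typescript
// Hypothetical input type: a binarized page, 1 = ink pixel, 0 = background.
interface BinaryImage {
  width: number;
  height: number;
  data: Uint8Array; // row-major, length = width * height
}

// Variance of the per-row ink counts after rotating ink pixels by angleDeg
// around the image centre.
function profileVariance(img: BinaryImage, angleDeg: number): number {
  const theta = (angleDeg * Math.PI) / 180;
  const sin = Math.sin(theta);
  const cos = Math.cos(theta);
  const cx = img.width / 2;
  const cy = img.height / 2;
  const pad = img.width; // rotated rows can land outside [0, height)
  const bins = new Float64Array(img.height + 2 * pad);
  for (let y = 0; y < img.height; y++) {
    for (let x = 0; x < img.width; x++) {
      if (img.data[y * img.width + x] === 0) continue;
      const row = Math.round((x - cx) * sin + (y - cy) * cos + cy) + pad;
      bins[row] += 1;
    }
  }
  let sum = 0;
  let sumSq = 0;
  for (let i = 0; i < bins.length; i++) {
    sum += bins[i];
    sumSq += bins[i] * bins[i];
  }
  const mean = sum / bins.length;
  return sumSq / bins.length - mean * mean;
}

// Search candidate angles and return the rotation (in degrees, using the same
// convention as profileVariance) that makes the text rows most horizontal.
function estimateDeskewAngle(img: BinaryImage, maxAngle = 10, step = 0.5): number {
  let best = 0;
  let bestScore = -Infinity;
  const steps = Math.round((2 * maxAngle) / step);
  for (let i = 0; i <= steps; i++) {
    const angle = -maxAngle + i * step;
    const score = profileVariance(img, angle);
    if (score > bestScore) {
      bestScore = score;
      best = angle;
    }
  }
  return best;
}
```

The returned angle is a correction, so a page tilted one way yields an angle of the opposite sign. Whether that maps directly onto a canvas `rotate` call depends on the canvas coordinate convention, which should be checked against a known test image.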
Multimedia & Accessibility
- Supports live audio recording through the Web Audio API, enabling users to create synchronized voice-overs for extracted text.
- Features a native audio encoder that merges PCM buffers into high-fidelity 16-bit WAV files for local download.
- Includes a real-time recording timer, a dedicated “Stop/Record” toggle, and visual status indicators.
- Engineered as a Progressive Web Application with service worker support for offline capabilities and platform independence.
- Features a retractable “Summary & Controls” header that prioritizes workspace on mobile and desktop screens.
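Merging PCM buffers into a 16-bit WAV file comes down to concatenating the recorded chunks, writing a 44-byte RIFF header, and converting each floating-point sample to a signed 16-bit integer. The following is a minimal mono sketch under those assumptions, not the application's encoder; `mergeBuffers`, `writeAscii`, and `encodeWav` are hypothetical names, and the input is assumed to be `Float32Array` chunks with samples in the range -1 to 1, as the Web Audio API delivers them.

```typescript
// Concatenate recorded chunks into one contiguous sample buffer.
function mergeBuffers(chunks: Float32Array[]): Float32Array {
  let total = 0;
  for (const c of chunks) total += c.length;
  const out = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Write an ASCII tag such as "RIFF" into the header.
function writeAscii(view: DataView, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) {
    view.setUint8(offset + i, text.charCodeAt(i));
  }
}

// Encode mono float samples as a 16-bit PCM WAV file (little-endian).
function encodeWav(chunks: Float32Array[], sampleRate: number): ArrayBuffer {
  const samples = mergeBuffers(chunks);
  const channels = 1;
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  writeAscii(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(view, 8, "WAVE");
  writeAscii(view, 12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(view, 36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the result can be wrapped as `new Blob([buffer], { type: "audio/wav" })` and offered for download through an object URL. Stereo recording would additionally require interleaving the left and right channels before the sample loop.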
Highlights
A standout highlight of the application is its sophisticated image preprocessing suite, designed to handle the real-world challenges of document photography. Users can utilize an automated deskewing feature that detects text alignment and rotates the canvas to straighten tilted pages. For more complex distortions, the “Perspective Mode” allows for manual corner-handle manipulation, using homography matrices to flatten warped or angled images into a perfectly rectangular, high-precision view. This ensures that even hand-captured photos of heritage documents are optimized for maximum OCR accuracy.
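To illustrate the perspective step, a homography can be computed from the four dragged corners (TL, TR, BR, BL) and the corners of the target rectangle by solving an 8 by 8 linear system. This is a sketch of the standard four-point method under the assumption that the last matrix entry is fixed to 1; it is not taken from the application's source, and `Point`, `solveLinear`, `computeHomography`, and `applyHomography` are hypothetical names.

```typescript
type Point = { x: number; y: number };

// Solve a·h = b by Gauss-Jordan elimination with partial pivoting.
function solveLinear(a: number[][], b: number[]): number[] {
  const n = b.length;
  const m = a.map((row, i) => row.concat([b[i]])); // augmented matrix
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
    }
    if (Math.abs(m[pivot][col]) < 1e-12) {
      throw new Error("Degenerate corner configuration");
    }
    const tmp = m[col];
    m[col] = m[pivot];
    m[pivot] = tmp;
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = m[r][col] / m[col][col];
      for (let c = col; c <= n; c++) m[r][c] -= f * m[col][c];
    }
  }
  return m.map((row, i) => row[n] / row[i]);
}

// Return the 3x3 homography (row-major, 9 values) mapping src[i] to dst[i].
function computeHomography(src: Point[], dst: Point[]): number[] {
  if (src.length !== 4 || dst.length !== 4) {
    throw new Error("Exactly four point pairs are required");
  }
  const a: number[][] = [];
  const b: number[] = [];
  for (let i = 0; i < 4; i++) {
    const x = src[i].x;
    const y = src[i].y;
    const u = dst[i].x;
    const v = dst[i].y;
    // u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
    a.push([x, y, 1, 0, 0, 0, -u * x, -u * y]);
    b.push(u);
    a.push([0, 0, 0, x, y, 1, -v * x, -v * y]);
    b.push(v);
  }
  return solveLinear(a, b).concat([1]);
}

// Map a single point through the homography.
function applyHomography(h: number[], p: Point): Point {
  const w = h[6] * p.x + h[7] * p.y + h[8];
  return {
    x: (h[0] * p.x + h[1] * p.y + h[2]) / w,
    y: (h[3] * p.x + h[4] * p.y + h[5]) / w,
  };
}
```

When producing the flattened image, it is usual to compute the mapping in the opposite direction (rectangle to photograph, by swapping `src` and `dst`) and sample the photograph for every output pixel, which avoids holes in the result.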
Beyond simple text extraction, OCR Studio functions as a “living repository” by integrating high-fidelity audio narration tools directly into the workflow. The application allows users to record vocal performances or narrations using the Web Audio API, featuring a real-time timer and a dedicated recording interface. Once captured, the system merges audio buffers into a professional-grade 16-bit WAV file that can be exported and saved alongside the digitized text. This dual-layer preservation strategy ensures that both the visual linguistic data and the oral heritage of the community are secured in a single, accessible platform.
How To & FAQ
LexiPic is an application developed and distributed for FREE for sustainable development projects in any community. It is released under a public license from BhoomiTech Foundation.
USB Web Server: Some of our applications, or some of their functions, may require a web server installed on your PC. Here is a lightweight, FREE third-party web server that supports Apache, PHP (v8), and MySQL and works well with our apps.














