This is an external open-source GitHub repository imported into the WOCSOL Marketplace for discovery. The original repository owner is the primary creator.
# OPTICAL-CHARACTER-RECOGNITION

Assignment by: MASTERS INDIA

Project: Extract the invoice number, invoice date, and line items from invoice images.

Project details:

I'm Aditya. After researching this assignment, I found that it is an OCR (Optical Character Recognition) problem. The job of OCR here is to extract data from semi-structured documents (bills, invoices) or unstructured documents (contracts, legal documents) and transform it into a structured format (CSV, Excel, XML, databases).

This project gave me a view of a real-life problem: data entry is a problem across the entire enterprise. It is pretty expensive and time-consuming, so OCR basically creates an environment without manual data entry. About 90% of the work can be handled by OCR, and humans step in only when the accuracy is not good enough or some intervention is required.

OCR tools are available on the market, such as ABBYY, Rossum, Automation Anywhere, and Xtracta. For the sake of this assignment, I built the project with cv2 (OpenCV, for computer vision) and pytesseract (a Python library for OCR).

The main problem with this project is that the invoices do not share a common format or structure. Two approaches are possible: regex or ROI (region of interest).

Initially I looked at regex to solve the problem, but because every format varies, regex can only fetch the invoice date automatically. Looking at the images, we can match every date with a regex and store the matches in a list. We know the invoice date appears first, so it sits at index 0. Hence we can fetch the element at index 0 every time and append it to the CSV.

The problem arises with the remaining two fields (invoice number and line items): they have no common starting or ending point, so regex cannot be applied there...
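The regex idea for the invoice date can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the date pattern, the function names (`find_dates`, `extract_invoice_date`, `append_to_csv`), and the sample text are all assumptions made for the example, and the pattern only covers numeric dates. It works on text that has already been produced by OCR, so it runs without cv2 or pytesseract installed.

```python
import csv
import re

# Hypothetical pattern: numeric dates such as 12/03/2020, 12-03-20 or 2020.03.12.
# Real invoices may use other layouts (e.g. "12 March 2020") that this misses.
DATE_PATTERN = re.compile(
    r"\b(?:\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}|\d{4}[/.-]\d{1,2}[/.-]\d{1,2})\b"
)


def find_dates(ocr_text):
    """Return every date-like string in the OCR text, in reading order."""
    return DATE_PATTERN.findall(ocr_text)


def extract_invoice_date(ocr_text):
    """Assume the first date found (index 0) is the invoice date."""
    dates = find_dates(ocr_text)
    return dates[0] if dates else None


def append_to_csv(csv_path, image_name, invoice_date):
    """Append one row (image name, invoice date) to the CSV file."""
    with open(csv_path, "a", newline="") as handle:
        csv.writer(handle).writerow([image_name, invoice_date or ""])


# In the real pipeline the text would come from OCR, for example:
#   text = pytesseract.image_to_string(cv2.imread("invoice.png"))
# Here a made-up sample string stands in for that output.
sample_text = "INVOICE\nInvoice Date: 12/03/2020\nDue Date: 11/04/2020\nTotal 450.00"

if __name__ == "__main__":
    print(find_dates(sample_text))
    print(extract_invoice_date(sample_text))
```

The "index 0" rule is only a heuristic: it holds when the invoice date is the first date in reading order, and it fails on layouts that print a due date or order date above it.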