Lecture on NLP post-processing of OCR output of business cards

 

The Icelandic Centre for Language Technology (Máltæknisetur) is hosting a lecture on Tuesday, 25 May, at 12:00 in the new Reykjavik University building in Nauthólsvík, room V.1.01.

The speaker is Dr. Bettina Harriehausen-Mühlbauer from the university in Darmstadt, Germany. The lecture is titled "NLP-post-processing of OCR-output of business cards – project description of ongoing project between Hochschule Darmstadt and Linguatec/Munich". An abstract of the lecture and an introduction of the speaker follow.

Abstract

------------------

Exchanging business cards is part of daily (business) life. But we spend (waste?) a lot of time post-processing the cards we receive so that the information on them can later be used easily in an address-management program.

The goal of the project is the design and development of an innovative and competitive "Business Card Organizer" that uses a camera phone or smartphone to combine images, audio files, geo-data, and textual data (from the business cards). The innovation lies in photographing the business card with a smartphone (or a similar device) and combining the image with the other data mentioned above (audio, …). All data have to be interpreted, analyzed, and stored in an address database.

Business cards come in many formats and cause the OCR software various problems, the worst being poor quality of the input picture. The project faces further challenges, the hardest being the extraction of information from the OCR output. These demand NLP post-processing of the data, e.g. the unambiguous identification and characterization of “atomic” units. For example, “Prof.” is a job title, but “Dr.” can stand for either “Doctor” or “Drive”.
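The “Dr.” ambiguity can be illustrated with a minimal Python sketch. This is not the project's method. The function name `classify_dr` and both context rules (a leading house number suggests a street, a following capitalized word suggests a name) are assumptions made purely for illustration.

```python
import re


def classify_dr(line: str) -> str:
    """Label the 'Dr.' token in one OCR text line.

    Returns 'street', 'title', or 'unknown'. Hypothetical heuristic for
    illustration only; a real system would need far richer context.
    """
    # Split on whitespace and drop trailing commas/semicolons from tokens.
    tokens = [t.rstrip(",;") for t in line.split()]
    if "Dr." not in tokens:
        return "unknown"
    i = tokens.index("Dr.")
    # Assumed address pattern: the line starts with a house number
    # and 'Dr.' appears later in the line, as in '1234 Sunset Dr.'.
    if i > 0 and re.fullmatch(r"\d+[A-Za-z]?", tokens[0]):
        return "street"
    # Assumed title pattern: 'Dr.' is followed by a capitalized word,
    # which is likely a personal name.
    if i + 1 < len(tokens) and tokens[i + 1][:1].isupper():
        return "title"
    return "unknown"


if __name__ == "__main__":
    for sample in ["Dr. Bettina Harriehausen-Mühlbauer", "1234 Sunset Dr., Suite 5"]:
        print(sample, "->", classify_dr(sample))
```

Simple rules like these break down quickly on real cards (noisy OCR, different national address formats), which is why the abstract describes this extraction step as the hardest challenge.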

In this talk, I will outline the project and discuss the challenges we are facing.

About the speaker

------------------

Dr. Bettina Harriehausen-Mühlbauer is a professor in the Department of Computer Science at the University of Applied Sciences, Darmstadt, Germany. She studied computer science and linguistics at the University of California, Berkeley, USA, and received her doctorate in Computational Linguistics under the supervision of Prof. Charles Fillmore (Linguistics) and Prof. Robert Wilensky (Computer Science).

Bettina worked at IBM's research labs in Yorktown Heights, New York, USA and Heidelberg, Germany, as well as the development lab in Bethesda, USA.

During her 13 years with IBM, she worked in the A.I. group, primarily developing NLP tools, such as an electronic grammar for text processing, a grammar for machine translation, and various e-learning applications.

Eight years ago, she left industry to accept a position as a tenured professor at the University of Applied Sciences, Darmstadt, Germany.

Her fields of focus in lectures and research are multimedia, e-learning, and natural language processing. She supervises students at the Bachelor, Master, and PhD levels. Alongside her post in Darmstadt, she regularly teaches summer schools at the University of Utah, Salt Lake City, and gives guest lectures in Oulu, Finland; Vellore, India; Xi'an, China; and Townsville, Australia.

Bettina is a consultant for the German Academic Exchange Service (DAAD) for its North America programme and an invited external examiner for the HETAC accreditation agency in Ireland. For more information, you can visit her homepage (www.fbi.h-da.de/~harriehausen).


 
