Seminar - Text Mining

Date and time: 20 April 2010, 10.00 (arrival and coffee 09.00-10.00)
Location: Jaarbeurs Utrecht
Maximum participants: 40

Registration via the WON secretary.
Registration for non-members opens on 7 April 2010.

The participant list will be filled in the following order:

Participation is free; however, registration commits you to attending. Number 41 on the list will be disappointed to hear that there were empty seats.

Agenda

  1. 10.00-11.00
    Text Mining: An introduction
    Text Mining in non-patent and patent literature - Wolfgang Thielemann
  2. 11.00-11.15
    Coffee Break
  3. 11.15-12.15
    How text mining is changing the way you search - Wolfgang Thielemann
  4. 12.15-13.15
    Lunch
  5. 13.15-14.00
    Linguamatics’ Text Mining solution I2E - Phil Hastings
  6. 14.00-15.15
    Text mining in Patent Analysis - Piet van Zanten
  7. 15.15-15.30
    Tea break
  8. 15.30-16.15
    Treparel’s Text Mining solution KMX - Anton Heijs
  9. 16.15-16.45
    Wrap-up and discussion
  10. 16.45
    End

Summaries

Text mining in Patent Analysis

This WON seminar presentation describes the role of text mining in patent analysis, where classification and clustering are important tasks.

For more than 10 years, Philips IP&S has been looking for text mining methods that can help with classification and clustering. A short overview will be given of several trials, together with some limitations of the methods used. Quality is especially important, and it will be discussed in terms of recall and precision. Measuring quality requires a defined set of training and test documents; the definition of such a set will be given.
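Recall and precision can be computed once a test set with known relevant documents is defined. A minimal sketch, assuming each document is identified by a hypothetical ID string and the set of documents a human expert judged relevant is known; the function name `precision_recall` and the IDs are illustrative only:

```python
def precision_recall(predicted: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of a predicted document set against a gold-standard set.

    precision = |predicted & relevant| / |predicted|  (how much of the result is correct)
    recall    = |predicted & relevant| / |relevant|   (how much of the correct set was found)
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical test set: IDs a classifier assigned to a class,
# versus the IDs an expert judged relevant for that class.
predicted = {"P1", "P2", "P3", "P4"}
relevant = {"P2", "P3", "P5"}
p, r = precision_recall(predicted, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Here two of the four returned documents are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67).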
Text mining methods are based on either statistical or linguistic principles. This presentation focuses on statistical methods, especially Support Vector Machines (SVM).
A live demonstration of the KMX analytics suite will show several practical aspects of working on classification and clustering.

TREPAREL - KMX

We will describe the ideas and concepts behind the text mining techniques of KMX, such as advanced document classification and clustering.
Patent classification determines, for every patent in a document set, whether it belongs to a certain class (binary classification) or the probability that it belongs to each of a set of classes (multi-class classification).
This is determined by an algorithm called the “classifier”, which is trained on a small set of positive and negative examples.
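A binary classifier trained on positive and negative examples can be illustrated with a linear SVM. This is a minimal sketch, not the KMX algorithm: it assumes simple bag-of-words features and trains with the Pegasos stochastic sub-gradient method for the hinge loss, without a bias term or projection step for brevity. The toy training texts and the functions `features`, `score`, `train_linear_svm` and `predict` are hypothetical.

```python
import random
from collections import Counter


def features(text: str) -> Counter:
    """Bag-of-words term counts (lower-cased, whitespace-tokenised)."""
    return Counter(text.lower().split())


def score(weights: dict[str, float], x: Counter) -> float:
    """Linear decision value w . x (no bias term in this sketch)."""
    return sum(weights.get(term, 0.0) * count for term, count in x.items())


def train_linear_svm(examples, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    examples: list of (text, label) pairs, label +1 (in class) or -1 (not in class).
    """
    rng = random.Random(seed)
    data = [(features(text), label) for text, label in examples]
    weights: dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                   # decreasing step size
            violated = y * score(weights, x) < 1.0  # is the hinge loss active?
            for term in weights:                    # shrink: gradient of (lam/2)*||w||^2
                weights[term] *= 1.0 - eta * lam
            if violated:                            # move w towards y * x
                for term, count in x.items():
                    weights[term] = weights.get(term, 0.0) + eta * y * count
    return weights


def predict(weights: dict[str, float], text: str) -> int:
    """+1 if the document is assigned to the class, otherwise -1."""
    return 1 if score(weights, features(text)) >= 0 else -1


# Hypothetical toy training set: +1 = "battery technology", -1 = other topics.
training = [
    ("battery electrode lithium", 1),
    ("lithium battery cathode", 1),
    ("electrode cathode anode battery", 1),
    ("antibody protein binding", -1),
    ("protein kinase inhibitor", -1),
    ("antibody inhibitor assay", -1),
]
weights = train_linear_svm(training)
print(predict(weights, "lithium anode battery"))  # 1
print(predict(weights, "kinase antibody assay"))  # -1
```

For multi-class classification, one common approach (again an assumption, not necessarily what KMX does) is to train one such binary classifier per class and compare their scores.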
If the number of classes in a document set is not known, clustering techniques are used to determine the groups in the set.
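Clustering groups documents without a predefined list of classes. As a minimal illustration, and not the KMX method, the sketch below uses single-pass “leader” clustering on bag-of-words cosine similarity: the number of groups is not chosen in advance but follows from a similarity threshold. The threshold value, the toy documents and the function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (lower-cased, whitespace-tokenised)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(texts: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Each document joins the cluster whose leader (first member) it is most
    similar to, if that similarity reaches `threshold`; otherwise it starts a
    new cluster. Returns clusters as lists of document indices."""
    leaders: list[Counter] = []
    clusters: list[list[int]] = []
    for i, text in enumerate(texts):
        v = vectorize(text)
        sims = [cosine(v, leader) for leader in leaders]
        if sims and max(sims) >= threshold:
            clusters[sims.index(max(sims))].append(i)
        else:
            leaders.append(v)
            clusters.append([i])
    return clusters


# Hypothetical patent titles; the groups emerge from the data.
docs = [
    "lithium battery electrode",
    "protein kinase inhibitor",
    "battery cathode lithium",
    "kinase inhibitor assay",
    "optical lens coating",
]
print(leader_cluster(docs))  # [[0, 2], [1, 3], [4]]
```

The result depends on the document order and the threshold; a higher threshold produces more, smaller clusters.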
We will discuss the advanced classification and clustering algorithms of KMX, which provide searchers with very accurate (high precision) and complete (high recall) results.

Speakers

Biography Wolfgang Thielemann

Wolfgang Thielemann is currently head of “Information Retrieval and Analysis” at Bayer Schering Pharma AG.
After receiving his Ph.D. in Organic Chemistry from the University of Münster, Germany, in 1997, he spent a year as a postdoctoral fellow at the University of California, Berkeley.
In 1999 he started as a medicinal chemist in the Chemical Research Department of Bayer HealthCare. After three years of chemical research he moved to the “Scientific Information and Documentation” department as head of the patent information group, and in 2005 he became head of the information retrieval group.
After the merger with Schering AG he became head of “Information Retrieval and Analysis” within the “Global Research & Development Information” department of Bayer Schering Pharma AG.
The main focus of his group is the provision of professional in-depth searches and analyses of patent, literature and pipeline information and the development and implementation of advanced post-processing and text mining technologies.

Biography Piet van Zanten

Piet van Zanten started his career in 1969 at Philips Research. After twelve years in research he held several positions within the Philips product divisions. In 1987 he joined Philips Intellectual Property & Standards (IP&S) as a patent searcher, and went on to hold various key positions in the search department, now the Business Intelligence group.
He retired on January 1, 2010, and on March 1, 2010 he founded “VZ Patent Intelligence”, a company active in the area of patent analysis.

Linguamatics

For researchers and information professionals who need answers from extensive literature resources, Linguamatics’ high-performance semantic knowledge discovery platform, I2E, rapidly reveals relevant facts and relationships from unstructured and semi-structured text. I2E’s agile text mining delivers value for applications ranging from target selection, biomarker discovery, safety-tox and drug repurposing to trend, sentiment, and competitor analysis.
I2E offers particular techniques relevant to patent mining, including use of linguistics for filtering noise and for terminology development, flexible categorization and the ability to plug in domain-specific thesauri. By adopting these techniques, users can accelerate the systematic and comprehensive analysis of patents, and combine the results with insights from other literature sources such as an organization’s internal documents or external scientific literature.
Linguamatics has a rapidly growing user community with I2E deployed at most of the world’s largest pharmaceutical companies. The company was founded in 2001, is headquartered in Cambridge, UK, and has US operations in Boston, MA.

Treparel

Treparel serves IP-intensive industries and the life science industry with text mining and visualization solutions, and with consultancy, for patent and non-patent literature.
The KMX analytics suite provides in-depth analysis of patent portfolios to optimize their value, and analysis of scientific literature to help R&D departments determine a research strategy.
Treparel offers state-of-the-art patent analytics solutions, including advanced classification, clustering and visualization, with its KMX Patent Analytics platform.
KMX Patent Analytics comes in a Standard Edition (SE) for smaller departments or individual professionals and an Enterprise Edition (EE) for major patent departments.

Biography Anton Heijs

Anton Heijs is CEO and CTO of Treparel. He has a PhD in physics and a strong background in scientific computing, data mining, text mining, and scientific and information visualization. He is the founder of Treparel and the creator of the KMX technology.