Data Science


Information Service Provider

Document Abstraction

This case study covers the collection of lease agreements from a company. A lease agreement is a contract comprising the terms and conditions under which one party agrees to rent property owned by another party. The data is extracted and abstracted clause by clause. The process handles the different document formats and is time-efficient, and the abstracted data has a high accuracy rate.

Client Profile

Our client is a leading global information service provider working for a company based in the USA. It is part of a global conglomerate that offers an extensive range of data, content and design services. The client processes lease agreements for landlords of commercial and residential properties across North America. It uses smart tools and technologies and offers customized business solutions that increase its clients' productivity.


The objective of this project was to abstract data from various documents with the help of neural networks, using a point generator technique. The abstracted data is stored in a database categorised by the various clauses and subclauses that exist in these agreements. Beyond optimising the time taken to digitise an agreement, the key objective is to let the respective stakeholders retrieve critical data.

Key Challenges

The primary challenge arose during abstraction: a few significant features, such as dates, amounts and time periods, were left out. This made it difficult for users to understand a document in depth. Processing took a long time because the collection of documents was huge and the documents came in different formats. Complex clauses were missed. The mix of indexed and non-indexed lease agreements made the data hard to abstract.

Our Approach

Our approach was to search the relevant clause for keywords, both objective and subjective. The matching content is then abstracted into a simple document. This process avoids errors and saves time.
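The keyword search can be illustrated with a minimal Python sketch. It is not the project's implementation: the keyword lists, the patterns and the function name abstract_clause are all hypothetical, chosen only to show how a clause might be labelled and how the features named under Key Challenges (dates, amounts, time periods) might be pulled out of it.

```python
import re

# Hypothetical keyword lists; the keywords used in the project are not given.
CLAUSE_KEYWORDS = {
    "rent": ["rent", "monthly payment"],
    "term": ["term", "commencement", "expiration"],
    "deposit": ["security deposit"],
}

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Illustrative patterns only; real lease documents need far broader coverage.
DATE_RE = re.compile(r"\b(?:" + "|".join(MONTHS) + r") \d{1,2}, \d{4}\b")
AMOUNT_RE = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")
PERIOD_RE = re.compile(r"\b\d+\s+(?:day|month|year)s?\b", re.IGNORECASE)


def contains_keyword(text, keyword):
    """Return True if the keyword occurs in text as a whole word or phrase."""
    return re.search(r"\b" + re.escape(keyword) + r"\b", text) is not None


def abstract_clause(text):
    """Label a clause by keyword and collect its dates, amounts and periods."""
    lowered = text.lower()
    labels = [
        label
        for label, keywords in CLAUSE_KEYWORDS.items()
        if any(contains_keyword(lowered, kw) for kw in keywords)
    ]
    return {
        "labels": labels,
        "dates": DATE_RE.findall(text),
        "amounts": AMOUNT_RE.findall(text),
        "periods": PERIOD_RE.findall(text),
    }


if __name__ == "__main__":
    clause = (
        "The Tenant shall pay rent of $2,500.00 per month for a term of "
        "12 months commencing on January 1, 2020."
    )
    print(abstract_clause(clause))
```

Matching on whole words keeps a keyword such as "term" from firing inside "determine", which is one simple way to reduce false hits across differently formatted documents.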

Our Solution

We use neural networks and point generators to fetch the keywords. Once the keywords are fetched, we extract the corresponding data and abstract it by means of a machine learning algorithm applied to the respective bag of words containing the relevant keywords. Training the system on multiple occurrences of these keywords across multiple documents increases the resulting efficiency.
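The bag-of-words step can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the case study does not name the machine learning algorithm, so cosine similarity against a couple of labelled example clauses stands in for it, and the training sentences, labels and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the documents to a vector index."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Turn text into a count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical labelled clauses standing in for the training documents.
TRAINING = [
    ("rent", "tenant shall pay monthly rent to the landlord"),
    ("term", "the lease term begins on the commencement date"),
]


def classify(text, training=TRAINING):
    """Return the label of the training clause most similar to text."""
    vocabulary = build_vocabulary([doc for _, doc in training])
    query = vectorize(text, vocabulary)
    scored = [
        (cosine(query, vectorize(doc, vocabulary)), label)
        for label, doc in training
    ]
    return max(scored)[1]


if __name__ == "__main__":
    print(classify("Rent is payable monthly by the tenant"))
```

Adding more labelled clauses to TRAINING mirrors the point made above: the more occurrences of the keywords the system sees across documents, the better the vectors separate one clause type from another.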

Technology Used

  • Deep learning
  • Machine learning
  • NLP
  • OCR
  • Bag of words (Vectorization)


We abstracted the data from the lease documents efficiently, with an accuracy of 75%. Data loss during abstraction was kept to 3%.
