
Text Analyser

Table of Contents

About

In today’s world, toxic content is increasingly common on social media and other online services. A text analyser can be used to analyse the sentiment of content, measure its toxicity and analyse its context.

This application is developed using Python and Flask.

Installation

Using Flask to build a text analyser with a RESTful API server.

Pre-requisites:

Installation

  1. Clone the repository

     git clone https://github.com/Subramaniam-dot/text_analyser.git
    
  2. Create a virtual environment, activate it and install the requirements with pip

     python3 -m venv venv
     source venv/bin/activate
     pip install -r requirements.txt
    
  3. Create a service account on the Google Cloud Console. Its JSON key is needed to set up and access Cloud Datastore. The account can be created by following Google Cloud's service-account documentation.

    After creating the service account, download its JSON key (sa.json) from the console and move it to the application root directory.

  4. Create an account on the Perspective platform, which is used to analyse the toxicity of the content. After the account is created, the Perspective API must be enabled in Google Cloud APIs.

    Set PERSPECTIVE_API_KEY in config.py under class Config:

     class Config:
         PERSPECTIVE_API_KEY = ""
    
  5. Enable the Google Natural Language API in the console. This service is used to analyse the context and sentiment of the given document.

  6. Export environment variables

     source local_env.sh
    
  7. Run the application locally

     python main.py
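Once the steps above are done, a quick preflight check can catch the most common misconfigurations before running main.py. This is a minimal sketch, not part of the repository: check_setup is a hypothetical helper, and it assumes that local_env.sh exports GOOGLE_APPLICATION_CREDENTIALS, the standard variable Google Cloud client libraries read to locate a service-account key.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def check_setup(perspective_api_key: str,
                env: Optional[Mapping[str, str]] = None,
                root: str = ".") -> List[str]:
    """Return human-readable setup problems; an empty list means none were found."""
    env = os.environ if env is None else env
    problems: List[str] = []

    # Step 3: the service-account key is expected in the application root.
    if not (Path(root) / "sa.json").is_file():
        problems.append("sa.json not found in the application root")

    # Assumption: local_env.sh exports this variable for the Google client libraries.
    if not env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")

    # Step 4: the Perspective key must not be left as the empty default.
    if not perspective_api_key.strip():
        problems.append("PERSPECTIVE_API_KEY in config.py is empty")

    return problems
```

For example, running `from config import Config` and then `print(check_setup(Config.PERSPECTIVE_API_KEY))` from the application root, after `source local_env.sh`, would list anything still missing.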
    

Flask Application Structure

.
└── text-analyser/
    ├── modules/
    │   └── text-analyser/
    │       ├── __init__.py
    │       ├── analyser.py
    │       ├── base.py
    │       └── views.py
    ├── models/
    │   ├── odb/
    │   │   ├── __init__.py
    │   │   └── ndb.py
    │   ├── text_analyser.py
    │   └── users.py
    ├── static/
    │   └── styles/
    │       ├── styles.css
    │       └── text-analyser.css
    ├── template/
    │   ├── text_analyser/
    │   │   ├── display.html
    │   │   └── document_render.html
    │   └── header.html
    ├── .gitignore
    ├── Dockerfile
    ├── LICENSE
    ├── README.md
    ├── config.py
    ├── deployment.yaml
    ├── local_env.sh
    ├── main.py
    ├── requirements.txt
    ├── service.yaml
    ├── wsgi.py
    └── sa.json

Run flask for development

$ python main.py

Run flask for production

$ gunicorn -w 4 -b 127.0.0.1:5000 wsgi:app
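The MODULE:VARIABLE argument tells gunicorn which module to import and which WSGI callable inside it to serve; a Flask application object is such a callable. The stdlib-only sketch below shows the WSGI interface gunicorn expects. It is illustrative only and is not the project's wsgi.py, which presumably exposes the Flask app instead.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A minimal WSGI callable: the kind of object gunicorn's MODULE:VARIABLE names."""
    body = f"Hello from {environ.get('PATH_INFO', '/')}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Development only; in production a WSGI server such as gunicorn imports `app`.
    with make_server("127.0.0.1", 5000, app) as server:
        server.serve_forever()
```

Saved as, say, hello.py, the same callable could be served with `gunicorn -w 4 -b 127.0.0.1:5000 hello:app`.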

Run with Docker

$ docker build -t text-analyser .

$ docker run -p 5000:5000 --name text-analyser text-analyser
 

System Overview

The front end is a web interface for users. The backend is a RESTful service that exposes CRUD operations (for example, querying stored text content); it is packaged as a Docker image and deployed on Google Cloud with Kubernetes for scalable performance. SSL deployments are facilitated through Google Cloud. The application uses the Google Natural Language API (v1) for sentiment and context analysis and the Perspective API for toxicity analysis. Google Cloud Datastore is used as the database for storing text content information.

Terms used for Text Analysis

System Architecture


Application Overview

Main Page

This is the initial dashboard page, where the user enters the document in the text box. Multiple analyses, such as context, sentiment and toxicity, can be generated by selecting the corresponding check boxes.


After the Analyse button is clicked, the selected report(s) are generated.

Context Analysis


Sentiment Analysis


Toxicity Analysis


Overall Result Page


Application Architecture

Flask App

Flask is a small and lightweight Python web framework that provides useful tools and features that make creating web applications in Python easier. It gives developers flexibility and is a more accessible framework for new developers since you can build a web application quickly using only a single Python file. Flask is also extensible and doesn’t force a particular directory structure or require complicated boilerplate code before getting started.

Flask uses the Jinja template engine to dynamically build HTML pages using familiar Python concepts such as variables, loops, lists, and so on.

RESTful-API

A REST API (also known as RESTful API) is an application programming interface (API or web API) that conforms to the constraints of REST architectural style and allows for interaction with RESTful web services. REST stands for representational state transfer and was created by computer scientist Roy Fielding.

Flask-RESTful is an extension for Flask that adds support for quickly building REST APIs. It is a lightweight abstraction that works with your existing ORM/libraries. Flask-RESTful encourages best practices with minimal setup. If you are familiar with Flask, Flask-RESTful should be easy to pick up.

CRUD Operations

CRUD stands for “Create, Read, Update, and Delete,” which are the four basic database operations. Many HTTP services also model CRUD operations through REST or REST-like APIs.

The four main HTTP methods (GET, PUT, POST, and DELETE) are conventionally mapped to CRUD operations as follows:

  POST: Create
  GET: Read
  PUT: Update
  DELETE: Delete
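The method-to-operation mapping can be illustrated with a toy dispatcher that routes each HTTP method to the matching CRUD operation on an in-memory store. This is a sketch under stated assumptions: InMemoryDocuments is hypothetical and stands in for Cloud Datastore, and the status codes (201, 204, 404, 405) follow common REST conventions rather than anything confirmed for this project.

```python
import itertools
from typing import Any, Dict, Optional, Tuple

# Conventional mapping of HTTP methods to CRUD operations.
HTTP_TO_CRUD = {
    "POST": "create",
    "GET": "read",
    "PUT": "update",
    "DELETE": "delete",
}


class InMemoryDocuments:
    """Hypothetical stand-in for the datastore; stores text documents by integer id."""

    def __init__(self) -> None:
        self._docs: Dict[int, str] = {}
        self._ids = itertools.count(1)

    def handle(self, method: str, doc_id: Optional[int] = None,
               text: Optional[str] = None) -> Tuple[int, Any]:
        """Dispatch an HTTP method to a CRUD operation and return (status, body)."""
        op = HTTP_TO_CRUD.get(method.upper())
        if op is None:
            return 405, "method not allowed"
        if op == "create":
            new_id = next(self._ids)
            self._docs[new_id] = text or ""
            return 201, {"id": new_id}
        if doc_id not in self._docs:
            return 404, "not found"
        if op == "read":
            return 200, {"id": doc_id, "text": self._docs[doc_id]}
        if op == "update":
            self._docs[doc_id] = text or ""
            return 200, {"id": doc_id, "text": self._docs[doc_id]}
        del self._docs[doc_id]  # the only remaining operation is "delete"
        return 204, None
```

In Flask-RESTful the same idea is expressed by a Resource subclass whose get, post, put and delete methods handle the corresponding HTTP verbs.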

External APIs Used

An API is a set of definitions and protocols for building and integrating application software. It’s sometimes referred to as a contract between an information provider and an information user—establishing the content required from the consumer (the call) and the content required by the producer (the response).

The APIs that are used in the project are described below:

Google Natural Language API

The Natural Language API provides a powerful set of tools for analysing and parsing text. Syntactic analysis is performed with the analyzeSyntax method; its operations include sentence extraction, which breaks up the stream of text into a series of sentences.
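According to the System Overview, this application uses the Natural Language API for sentiment analysis. The sketch below builds a request body for the v1 REST method documents:analyzeSentiment and condenses its JSON response. It assumes the documented response shape (documentSentiment plus per-sentence sentiment, each with a score in [-1.0, 1.0] and a non-negative magnitude). build_sentiment_request, summarise_sentiment and the 0.25 labelling threshold are hypothetical choices, not the project's code, and authentication is left out.

```python
from typing import Any, Dict, List

# REST endpoint of the Natural Language API v1 sentiment method.
ANALYZE_SENTIMENT_URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(text: str) -> Dict[str, Any]:
    """Request body for documents:analyzeSentiment on a plain-text document."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def label(score: float, threshold: float = 0.25) -> str:
    """Map a score in [-1.0, 1.0] to a coarse label (threshold is a hypothetical choice)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def summarise_sentiment(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract document-level and per-sentence sentiment from the JSON response."""
    doc = response["documentSentiment"]
    sentences: List[Dict[str, Any]] = [
        {
            "text": s["text"]["content"],
            "score": s["sentiment"]["score"],
            "label": label(s["sentiment"]["score"]),
        }
        for s in response.get("sentences", [])
    ]
    return {
        "score": doc["score"],
        "magnitude": doc["magnitude"],
        "label": label(doc["score"]),
        "sentences": sentences,
    }
```

Magnitude measures how much emotion the text carries regardless of sign, so a near-zero score with a high magnitude usually indicates mixed sentiment rather than neutral text.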

Perspective API

Perspective uses machine learning models to identify abusive comments. The models score a phrase based on the perceived impact the text may have in a conversation. Developers and publishers can use this score to give feedback to commenters, help moderators more easily review comments, or help readers filter out “toxic” language.

Perspective models provide scores for several different attributes. In addition to the flagship TOXICITY attribute, Perspective can provide scores for attributes such as SEVERE_TOXICITY, INSULT, PROFANITY, IDENTITY_ATTACK and THREAT.
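A minimal sketch of a Perspective AnalyzeComment call using only the standard library. The endpoint and JSON fields follow the public Perspective API documentation; the helper names are hypothetical, and the project's own analyser.py may structure this differently. The analyze function needs network access and a valid PERSPECTIVE_API_KEY.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Iterable

ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_perspective_request(text: str,
                              attributes: Iterable[str] = ("TOXICITY",),
                              languages: Iterable[str] = ("en",)) -> Dict[str, Any]:
    """Request body asking Perspective to score `text` for the given attributes."""
    return {
        "comment": {"text": text},
        "languages": list(languages),
        "requestedAttributes": {name: {} for name in attributes},
    }


def summary_scores(response: Dict[str, Any]) -> Dict[str, float]:
    """Map each requested attribute to its summary probability in [0, 1]."""
    return {
        name: data["summaryScore"]["value"]
        for name, data in response.get("attributeScores", {}).items()
    }


def analyze(text: str, api_key: str, **kwargs: Any) -> Dict[str, float]:
    """Send one AnalyzeComment request and return the summary scores."""
    payload = json.dumps(build_perspective_request(text, **kwargs)).encode("utf-8")
    query = urllib.parse.urlencode({"key": api_key})
    req = urllib.request.Request(
        f"{ANALYZE_URL}?{query}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return summary_scores(json.load(resp))
```

For example, `analyze("some comment", Config.PERSPECTIVE_API_KEY, attributes=["TOXICITY", "INSULT"])` would return a dictionary with one probability per attribute.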

Cloud Infrastructure

Cloud computing infrastructure is the collection of hardware and software elements needed to enable cloud computing. It includes computing power, networking, and storage, as well as an interface for users to access their virtualised resources. The virtual resources mirror a physical infrastructure, with components like servers, network switches, memory and storage clusters.

Why Cloud Computing Infrastructure?

Cloud infrastructure offers the same capabilities as physical infrastructure but can provide additional benefits like a lower cost of ownership, greater flexibility, and scalability.

Cloud computing infrastructure is available for private cloud, public cloud, and hybrid cloud systems. It’s also possible to rent cloud infrastructure components from a cloud provider, through cloud infrastructure as a service (IaaS). Cloud infrastructure systems allow for integrated hardware and software and can provide a single management platform for multiple clouds.

Google Cloud

Why google cloud?

Google Cloud Platform (GCP) offers a more limited range of services than AWS and Azure and does not have the same global spread of data centres. It does, however, give customers a highly specialised service in three main streams (big data, machine learning, and analytics), with good scale, stable load balancing and low response times. Google’s container offering gives users a significant advantage, as Google originally developed Kubernetes, which competitors AWS and Azure now also offer.

Customers tend to choose GCP as a secondary vendor in a hybrid solution, though it is becoming increasingly popular with organisations that compete directly with Amazon and therefore cannot use AWS. It’s important to note that GCP is strongly open-source and DevOps-centric and, as a result, does not integrate as well with Microsoft Azure.

Google Cloud resources are hosted in multiple locations worldwide, organised into regions and zones. This distribution of resources provides several benefits, including redundancy in case of failure and reduced latency by locating resources closer to clients. It also introduces some rules about how resources can be used together.

The backend is currently deployed as a Flask application on Google Cloud (Google Kubernetes Engine) and is accessible at:

http://34.105.204.16:8000/

This deployment is configured with a load balancer.

Cloud Datastore

Why Google Cloud Datastore?

Datastore is a highly scalable NoSQL database. Datastore automatically handles sharding and replication, providing the application with a highly available and durable database that scales automatically to handle incoming load. Datastore provides a myriad of capabilities such as ACID transactions, SQL-like queries, indexes, and much more.

Datastore features include:

Kubernetes

Kubernetes is a portable, extensible, open source platform for managing containerised workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.

Kubernetes, at its basic level, is a system for running and coordinating containerised applications across a cluster of machines. It is a platform designed to completely manage the life cycle of containerised applications and services using methods that provide predictability, scalability, and high availability.

Kubernetes allows users to run scalable, highly available containerised workloads on a highly abstracted platform. While Kubernetes’ architecture and set of internal components can at first seem daunting, their power, flexibility, and robust feature set are unparalleled in the open-source world.

service.yaml contains Kubernetes load-balancer configuration for the project.

apiVersion: v1
kind: Service
metadata:
  name: text-analyer
spec:
  type: LoadBalancer
  selector:
    name: text-analyer
  ports:
    - port: 80
      targetPort: 8082

deployment.yaml defines the Kubernetes Deployment for the project: the container image, the container port and the number of pod replicas.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: text-analyer
  labels:
    name: text-analyer
spec:
  replicas: 2
  selector:
    matchLabels:
      name: text-analyer
  template:
    metadata:
      name: text-analyer
      labels:
        name: text-analyer
    spec:
      containers:
        - name: text-analyer
          image: gcr.io/eminent-enigma-364712/text-analyser:v1.0
          ports:
            - containerPort: 8082
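A Kubernetes Service routes traffic only to pods whose labels contain every key/value pair in its selector, so the selector key in service.yaml must match the label key used in the Deployment's pod template (name: text-analyer). The sketch below models that equality-based matching in plain Python; selector_matches is a hypothetical helper, not a Kubernetes API. It shows that a selector keyed on app would not match pods labelled only with name.

```python
from typing import Mapping


def selector_matches(selector: Mapping[str, str], labels: Mapping[str, str]) -> bool:
    """True if every key/value pair of an equality-based selector appears in the labels."""
    # A Service without a selector does not select pods automatically.
    if not selector:
        return False
    return all(labels.get(key) == value for key, value in selector.items())


# Pod labels from deployment.yaml (spec.template.metadata.labels).
pod_labels = {"name": "text-analyer"}

print(selector_matches({"app": "text-analyer"}, pod_labels))   # False: key "app" is missing
print(selector_matches({"name": "text-analyer"}, pod_labels))  # True
```

The same rule explains why spec.selector.matchLabels in deployment.yaml must agree with the template labels: with replicas set to 2, the Deployment manages exactly the pods its selector matches, and the load balancer spreads traffic across those pods.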

Docker

Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.

Docker provides the ability to package and run an application in a loosely isolated environment called a container. The isolation and security allow you to run many containers simultaneously on a given host. Containers are lightweight and contain everything needed to run the application, so you do not need to rely on what is currently installed on the host. You can easily share containers while you work, and be sure that everyone you share with gets the same container that works in the same way.

Docker provides tooling and a platform to manage the lifecycle of your containers: