WebMIND™

WebMIND is a powerful data-harvesting tool used to generate open source web intelligence (WEBINT). Designed to benefit varied organizations, WebMIND collects, cleans and structures information from websites, forums, social-networking sites and other deep-web sources, making it easily available for third-party data analysis and processing engines.

 

 

Countless people search the vast quantity and variety of content that makes up the web for those few nuggets they are looking for. As the web has grown more social, the social networks, forums and micro-blogs known collectively as the deep web have become the source of most of the valuable data. Why is this important? Because the data hidden in the deep web is not accessible to standard search engines.

In addition, the information you’re searching for is often difficult and time-consuming to uncover, scattered across both the surface and deep web in multiple sources, formats and languages that are constantly being updated.

And when you finally find the data point you spent all that time looking for, you have to manually cut and paste it into a spreadsheet or some other structured format before you can analyze, process or otherwise manipulate it with standard or custom analysis solutions. So how do you find the data nuggets you’re looking for easily and in a reusable format?

WebMIND: Powerful web harvester

WebMIND addresses these and other WEBINT challenges by introducing a flexible and fast-adapting approach to information collection. Its Robot Studio application enables non-developers to build customized collection robots (crawlers) with an easy-to-use graphical interface. These robots can be quickly reused and repurposed for a number of different use cases. Operators, for example, can use WebMIND’s Robot Studio to add new collection capabilities, structuring information for optimal usability in minutes. WebMIND’s Harvesting Management module manages thousands of concurrent API- and robot-driven data collection tasks, ensuring continuous collection while maintaining anonymity, security and virtually unlimited scalability.

ROBOT STUDIO 

Using an embedded browser and a graphical robot-building interface, users can simply point and click to structure content on any page or element of interest with high granularity. Advanced options, such as looping over lists, navigating within websites and entering inputs, are also supported. After extracting the target content, users apply filtering to collect only the most relevant information. The robot generates an output file (JSON, CSV, etc.) for further processing by either a standard or custom analysis solution.
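To make the filter-and-export step concrete, here is a minimal Python sketch of what happens to extracted content after a robot has structured it. It is an illustration only and not WebMIND's actual output schema or API: the record fields (author, text, lang), the keyword filter and the helper names are all assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical records standing in for content a collection robot has structured.
records = [
    {"author": "alice", "text": "Quarterly results look strong", "lang": "en"},
    {"author": "bob", "text": "Buy cheap watches now", "lang": "en"},
    {"author": "carol", "text": "Results were delayed again", "lang": "en"},
]


def filter_records(rows, keyword):
    """Keep only rows whose text mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [row for row in rows if needle in row["text"].lower()]


def to_json(rows):
    """Serialize the rows as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["author", "text", "lang"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


relevant = filter_records(records, "results")
json_output = to_json(relevant)
csv_output = to_csv(relevant)
```

Either output can then be handed to a spreadsheet or an analysis tool. A real filter would likely be richer than a single keyword match.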

Features

  • Full human behavior and surfing imitation – mimics human surfing patterns and actions
  • Deep web access – extracts information from deep web sources (e.g., social networks, password-protected sites, online archives and databases)
  • Fully supports AJAX and other Web 2.0 technologies and data formats
  • Transforms unstructured data into structured format
  • Offers multiple language support
  • Downloads any type of file or attachment (text, media, PDF, Excel, etc.)
  • Supports multiple collection scenarios with pre-defined, out-of-the-box robot templates
ROBOT DISPATCHING & MANAGEMENT

WebMIND makes collection management easy for non-technical users with simple step-by-step wizards.

Its management console has a user-friendly dashboard for managing, scheduling and executing collection tasks, and its robot dispatching algorithms use built-in anonymity and human behavior imitation to avoid detection and blocking. When necessary, robots use a virtual agent to provide required credentials.

Features

  • Full robot dispatching scalability
  • User-friendly management console
  • Built-in harvesting robots
  • Full error reporting with notification of changes in source websites
  • Easy management of multiple proxies to ensure anonymity
  • Cookies management
  • Virtual-agent management
  • Web resources caching
DATA PROCESSING & DELIVERY

Harvested data is cleansed, normalized and exported for use as input in a wide range of analysis tools, including standard spreadsheets, off-the-shelf business intelligence (BI) applications, highly complex custom analysis software and other information systems.

WebMIND’s open APIs enable augmentation of existing information systems with web data, and fusion of content from multiple web sources into unified, more comprehensive databases.

Features

  • Capabilities
  • Removes data duplication
  • Cleanses data (e.g., ad removal)
  • APIs and multiple output delivery methods
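As a rough illustration of the normalization and de-duplication described above, the following Python sketch collapses whitespace and letter case before comparing records, so that near-identical copies harvested from different sources count as one. The field names, the sample data and the normalization rule are assumptions for this example. WebMIND's own cleansing logic is not documented here.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so near-identical copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(rows, key="text"):
    """Keep the first occurrence of each normalized value, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = normalize(row[key])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


# Hypothetical harvested rows; the first two differ only in case and spacing.
harvested = [
    {"source": "forum-a", "text": "New  product launch announced"},
    {"source": "blog-b", "text": "new product launch announced "},
    {"source": "forum-a", "text": "Pricing details to follow"},
]

unique_rows = deduplicate(harvested)
```

Keeping the first occurrence preserves the original collection order, which is a design choice of this sketch. A production pipeline might instead prefer the most recent or most complete copy.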
Delivers the right information for smart analysis  
  • Harvests web content automatically in real-time instead of manual cutting and pasting, freeing your team to focus on analysis.
  • Accesses data in the social networking sites, forums, blogs and micro-blogs of the deep web and prepares it for analysis at a level previously not feasible.
  • Enables creation of sophisticated collection robots with no need for coding or development expertise.
  • Extracts at frequent intervals to keep up with the dynamic nature of social networks.
  • Easily scales to extract more data from more websites as your business expands.
  • Provides an intuitive, easy-to-use platform that empowers your staff.
Enriches existing information systems with web data
  • Integrates seamlessly with existing, internal information systems through open Application Programming Interfaces (APIs).
  • Automatically cross-references extracted web content against your data for more complete coverage.
  • Enhances your organizational databases with relevant information from web knowledge repositories.
  • Fuses data from multiple web sources to create unified databases with more complete information.
Structures web content to be more easily useable
  • Formats web data and exports it for further processing by standard spreadsheet applications, off-the-shelf business intelligence (BI) applications, custom analysis software or other information system solutions.
  • Leverages our domain expertise through ready-made scraping templates that speed up the creation of web collection tasks.

 

Find out how our offerings can help you capitalize on the power of the deep web.
