WebMIND is a powerful data-harvesting tool used to generate open source web intelligence (WEBINT). Designed to benefit varied organizations, WebMIND collects, cleans and structures information from websites, forums, social-networking sites and other deep-web sources, making it easily available for third-party data analysis and processing engines.
Countless people search the vast quantity and variety of content that makes up the web for the few nuggets they are looking for. As the web has grown more social, the social networks, forums and micro-blogs known collectively as the deep web have become the source of most of the valuable data. Why is this important? Because the data hidden in the deep web is not accessible to standard search engines.
In addition, the information you’re searching for is often difficult and time-consuming to uncover, scattered across both the surface and deep web in multiple sources, formats and languages that are constantly being updated.
And when you finally find the data point you spent all that time looking for, you have to manually cut and paste it into a spreadsheet or some other structured format before you can analyze, process or otherwise manipulate it with standard or custom analysis solutions. So how do you find the data nuggets you’re looking for easily and in a reusable format?
WebMIND: Powerful web harvester
WebMIND addresses these and other WEBINT challenges by introducing a flexible and fast-adapting approach to information collection. Its Robot Studio application enables non-developers to build customized collection robots (crawlers) with an easy-to-use graphical interface. These robots can be quickly reused and repurposed for a number of different use cases. Operators, for example, can use WebMIND’s Robot Studio to add new collection capabilities, structuring information for optimal usability in minutes. WebMIND’s Harvesting Management module manages thousands of concurrent API- and robot-driven data collection tasks, ensuring continuous collection while maintaining anonymity, security and virtually unlimited scalability.
Using an embedded browser and a graphical robot-building interface, users can simply point and click to structure content on any page or element of interest with high granularity. Advanced options, such as looping over lists, navigating within websites and entering inputs, are also supported. After extracting the target content, users apply filtering to collect only the most relevant information. The robot generates an output file (JSON, CSV, etc.) for further processing by either a standard or custom analysis solution.
- Full human behavior and surfing imitation – mimics human surfing patterns and actions
- Deep web access – extracts information from deep web sources (e.g., social networks, password-protected sites, online archives and databases)
- Fully supports AJAX and other Web 2.0 technologies and data formats
- Transforms unstructured data into structured format
- Offers multiple language support
- Downloads any type of file or attachment (text, media, PDF, Excel, etc.)
- Supports multiple collection scenarios with pre-defined, out-of-the-box robot templates
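The filter-then-export step described above can be pictured with a short sketch. This is not WebMIND's own code or API: the record fields (`source`, `author`, `text`), the keyword filter and the helper names are all hypothetical, chosen only to show how extracted content might be narrowed down and written out as JSON or CSV.

```python
import csv
import io
import json

# Hypothetical records standing in for what a collection robot might extract.
records = [
    {"source": "forum", "author": "alice", "text": "Price update for product X"},
    {"source": "blog", "author": "bob", "text": "Unrelated holiday post"},
    {"source": "forum", "author": "carol", "text": "Review of product X"},
]


def filter_records(rows, keyword):
    """Keep only rows whose text mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [row for row in rows if needle in row["text"].lower()]


def to_json(rows):
    """Serialize rows as a JSON array."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows, fieldnames):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


relevant = filter_records(records, "product x")
json_output = to_json(relevant)
csv_output = to_csv(relevant, ["source", "author", "text"])
```

Either string can then be saved to a file and handed to a spreadsheet or analysis tool. A real robot would presumably apply richer filters than a single keyword.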
WebMIND makes collection management easy for non-technical users with simple step-by-step wizards.
Its management console has a user-friendly dashboard for managing, scheduling and executing collection tasks, and its robot dispatching algorithms use built-in anonymity and human behavior imitation to avoid detection and blocking. When necessary, robots use a virtual agent to provide required credentials.
- Full robot dispatching scalability
- User-friendly management console
- Built-in harvesting robots
- Full error reporting with notification of changes in source websites
- Easy management of multiple proxies to ensure anonymity
- Cookies management
- Virtual-agent management
- Web resources caching
Harvested data is cleansed, normalized and exported for use as input in a wide range of analysis tools, including standard spreadsheets, off-the-shelf business intelligence (BI) applications, highly complex custom analysis software and other information systems.
WebMIND’s open APIs enable augmentation of existing information systems with web data, and fusion of content from multiple web sources into unified, more comprehensive databases.
- Removes data duplication
- Cleanses data (e.g., ad removal)
- APIs and multiple output delivery methods
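As a rough illustration of the cleansing and de-duplication listed above, the sketch below normalizes whitespace, drops records that look like ads, and removes duplicates across sources. It is a minimal example under stated assumptions, not WebMIND's actual processing: the `AD_MARKERS` prefixes, the record fields and the rule that two records are duplicates when their normalized text matches are all hypothetical.

```python
import re

# Hypothetical prefixes used here to recognize ad content.
AD_MARKERS = ("sponsored", "advertisement")


def normalize(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def cleanse(records):
    """Normalize text, skip ad-like records, and drop duplicates.

    Assumption: records count as duplicates when their normalized,
    lower-cased text is identical. The first occurrence is kept.
    """
    seen = set()
    result = []
    for record in records:
        text = normalize(record["text"])
        key = text.lower()
        if key.startswith(AD_MARKERS):
            continue  # ad removal
        if key in seen:
            continue  # duplicate removal
        seen.add(key)
        result.append({**record, "text": text})
    return result


# Hypothetical input fused from several web sources.
raw = [
    {"source": "forum", "text": "Product X   launches\n today"},
    {"source": "blog", "text": "product x launches today"},
    {"source": "blog", "text": "Sponsored: buy now"},
    {"source": "news", "text": "Product Y delayed"},
]

cleaned = cleanse(raw)
```

Keeping the first occurrence is a simplification: the surviving record carries only one source label. A system that needs provenance could instead merge the source labels of matching records.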