Clusterpoint

Clusterpoint Ltd.
Type: Private
Industry: enterprise software, database software, cloud computing
Founded: August 21, 2006
Founders: Gints Ernestsons, Jurgis Orups, Oskars Viksna
Headquarters: London, United Kingdom
Products: Clusterpoint DBMS, Clusterpoint DBaaS, NTSS, GOL
Website: www.clusterpoint.com

Clusterpoint Database
Developer(s): Clusterpoint Ltd.
Initial release: 2006
Stable release: 4.0 / October 8, 2015
Available in: English
Type: distributed database, enterprise search, operational database, document-oriented, NoSQL, XML, JSON, SQL database, cloud DBaaS
Website: www.clusterpoint.com

Clusterpoint is a European software technology company that develops and supports the Clusterpoint database management system platform.[1][2][3]

The company was founded by software engineers[4] and is backed by venture capital.[5][6][7][8]

The Clusterpoint database is a schema-free document database designed to remove the complexity, scalability problems and performance limitations of relational database architecture.[9]

Clusterpoint database is intended to eliminate the effort customers spend integrating separate database, search and analytics platforms. It replaces integrated multi-platform solutions with a single platform and a single API, typically where a SQL RDBMS is used together with an enterprise search engine to meet the performance and scalability needs of web and mobile applications, or where Big data and analytics tools such as Hadoop might otherwise be needed because of the sheer volume of data or large computing workloads.[10]

The first version of the Clusterpoint database was released in 2006. The most recent version, Clusterpoint 4, was released in October 2015 and includes a JavaScript computing engine and the JS/SQL query language.[11]

Clusterpoint database is a document-oriented database server platform for storing and processing XML and JSON data in a distributed fashion on large clusters of commodity hardware. Its architecture combines ACID-compliant OLTP transactions, full-text search and analytics in the same code base, and provides high availability, fault tolerance, data replication and security.[12][13]

Clusterpoint database allows transactions to be performed in a distributed document database model in the same way as in a SQL database. Users can perform secure real-time updates, free text search, analytical SQL querying and reporting at high velocity in very large distributed databases containing XML or JSON documents. Transactions are implemented without the consistency issues that affect most NoSQL databases, and can run at speeds previously available only with relational databases.[14] Real-time Big data analytics, replication, load sharing and high availability are standard features of the Clusterpoint database software platform.[15]

Clusterpoint database supports web-style free text search with natural-language keywords and programmable relevance sorting of results. Predictable search response times, with latency in milliseconds, and high-quality search results are achieved using policy-based inverted indexing and the vendor's own relevance ranking method. Version 4 supports the JS/SQL query language: classic SQL queries can be combined with free text search and with custom distributed computing functions written in JavaScript, all executed in a single REST API call.[16]

For most of its history Clusterpoint served business customers as an enterprise software vendor.[17][18][19]

Use cases

Clusterpoint database provides real-time management of business information stored as XML or JSON documents. It can be used as a high-performance operational database for web and mobile services that require scalability, speed and strong security. The software can handle financial, billing, security, medical, travel, information-service, e-commerce, government and municipal open data, and other data stored in the industry-standard XML and JSON document formats.[20][21][22]

More generally, it suits use cases where a flexible XML or JSON document data model fits best: processing a mix of variable data, including structured data, unstructured (textual) data, semi-structured data and blobs such as images, voice and video files. The software can be used for computing tasks that require data processing with latency in the low milliseconds in distributed databases, for instance to feed data at high speed to interactive NoSQL visualizations, online Big data analytics and reporting in large databases.[23]

Distinctive technology

High-speed ACID-compliant Transactions in Distributed Document Database

Clusterpoint database provides distributed, ACID-compliant transactions, including basic SQL support, in a document database model that scales to Big data volumes. Distributed transactions, data storage, search and analytics can be performed with high performance and high availability while maintaining strong consistency and security. This gives Clusterpoint a performance and scalability advantage over other NoSQL document databases that compromise on the security and integrity of customer data, typically providing only eventual consistency at high availability.[24]

Programmable database ranking for search relevance in Big data

Another distinction is the programmable ranking index, which can be customized through relevance rules assigned in the Document Policy configuration file, a small XML configuration file that accompanies each Clusterpoint database. Search behavior can therefore be changed quickly by configuring ranking index rules instead of modifying software code. Ranking has become more important as the volume of data handled by current applications has exploded: users would be overwhelmed by too many unranked results, and the sheer amount of data makes it almost impossible to process queries with the traditional compute-then-sort approach. Application code can be simplified by delegating most indexing and sorting details, including ranking algorithms, to Document Policy attributes. When customized for a particular web or mobile application, the Document Policy determines how the ranking index is organized at the physical storage level, presorting the index data for custom relevance algorithms. Developers can thus avoid most of the complex SQL programming for sorting and grouping in their application code, and database hardware is relieved of sorting large data sets for every query. Instead, the ranking index delivers fast search and relevance sorting without the performance degradation characteristic of relational SQL databases.

Applied to the document database model, the ranking index method enables Clusterpoint to outperform SQL databases at search by several orders of magnitude. It addresses the information overload and latency problems of interactive web and mobile applications that process Big data. Small mobile screens and limited network bandwidth prevent users from requesting and processing large volumes of data per query, while database search and querying need to be interactive and transactional to satisfy Internet users. The Clusterpoint ranking index was designed for this computing model: it extracts the most relevant data first and returns information page by page in decreasing relevance. For instance, with free text search alone, latency in large databases containing billions of documents is in the millisecond range, while relevance ranking keeps the end user from being overwhelmed by too many low-quality results. This is also a crucial design element of the distributed document database architecture: it makes the index scalable, so that it can be shared across a large cluster of servers without significant performance loss during data ingestion, free text search and access.[25]
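The idea of presorting the index when data is loaded, rather than sorting results at query time, can be illustrated with a toy inverted index in Python. This is a minimal sketch under stated assumptions, not Clusterpoint's implementation: each document is assumed to carry a single static relevance score (`rank`), and the class and method names are hypothetical. Because every posting list is kept in the same decreasing-rank order, a multi-term query can walk the shortest list, produce hits already in relevance order, and stop as soon as the requested page is filled.

```python
import bisect
from collections import defaultdict


def _contains(sorted_list, item):
    """Binary-search membership test on a sorted posting list."""
    i = bisect.bisect_left(sorted_list, item)
    return i < len(sorted_list) and sorted_list[i] == item


class PresortedIndex:
    """Hypothetical toy ranking index: posting lists presorted by a static rank."""

    def __init__(self):
        # term -> list of (-rank, doc_id); ascending order = highest rank first
        self.postings = defaultdict(list)

    def add(self, doc_id, text, rank):
        # Assumption: each document is added once, with a fixed rank.
        for term in set(text.lower().split()):
            bisect.insort(self.postings[term], (-rank, doc_id))

    def search(self, query, page=0, page_size=10):
        lists = [self.postings.get(t, []) for t in query.lower().split()]
        if not lists or any(not pl for pl in lists):
            return []
        lists.sort(key=len)  # drive the scan from the rarest term
        wanted = (page + 1) * page_size
        hits = []
        for entry in lists[0]:  # already in decreasing-rank order
            if all(_contains(pl, entry) for pl in lists[1:]):
                hits.append(entry[1])
                if len(hits) == wanted:
                    break  # early stop: no compute-then-sort over all matches
        return hits[page * page_size:]
```

The cost of ordering is paid once per insert (`bisect.insort`), so a query for the first page touches only as many postings as it needs; a real ranking index would also have to handle rank changes and per-query boosting, which this sketch omits.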

In addition, developers can fine-tune the Clusterpoint ranking index to match natural-language query terms to the most relevant textual content in a customer database. When a distributed database is queried with free-text keywords or phrases in natural language, the ranking index selects the most relevant documents whose textual content matches the query, taking into account term density, word statistics and language-specific grammatical attributes (including stemming, spelling and collation), and performs automatic self merge-joins. Very few database products support similar self merge-joins.[26]
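Language-aware matching means that a query for "database query" should also find a document that says "databases" and "queries". The following sketch shows the principle with a deliberately crude, hypothetical suffix-stripping stemmer applied to both documents and queries; it is not Porter stemming and not Clusterpoint's language processing, and it ignores spelling and collation. Intersecting the per-term document sets stands in, in a very simplified way, for joining several terms against the same index.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude English suffix rules (not Clusterpoint's).
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def light_stem(word):
    """Reduce a word to a rough stem so inflected forms match each other."""
    w = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # keep at least 3 letters and leave words such as "class" alone
        if w.endswith(suffix) and not w.endswith("ss") and len(w) - len(suffix) >= 3:
            w = w[: len(w) - len(suffix)] + replacement
            break
    if w.endswith("e") and len(w) > 4:
        w = w[:-1]  # so "database" and "databases" share the stem "databas"
    return w


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def build_stem_index(docs):
    """Map each stem to the set of document ids containing a word with that stem."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokens(text):
            index[light_stem(token)].add(doc_id)
    return index


def stem_search(index, query):
    """Return ids of documents containing every query term in some inflected form."""
    stems = [light_stem(t) for t in tokens(query)]
    if not stems:
        return set()
    result = set(index.get(stems[0], set()))
    for stem in stems[1:]:
        result &= index.get(stem, set())  # intersect per-term document sets
    return result
```

Because the same function is applied at indexing and at query time, "Indexing" and "indexed" both reduce to "index"; a production stemmer needs far more rules and per-language data.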

By adjusting ranking rules, customers can configure various grouping, ordering and positioning algorithms for their search results through the ranking index, so that it delivers the best end-user search experience. Once established for a particular database, a set of ranking rules is applied and maintained automatically by Clusterpoint database whenever customer data is loaded or updated through the Clusterpoint database CRUD API commands.

Developers can use full-text search as the fastest way to access information in Clusterpoint databases, while still being able to query the database structure with standard SQL analytics. Both methods can be combined in a single query, enabling combined analytical and search queries over mixed structured and unstructured content.

Clusterpoint database deployments

Clusterpoint database has been used since 2006 in production deployments by enterprise customers running 24/7 web and mobile services. The vendor has built partnerships that provide solutions in different industry sectors, such as:

  • Governance, Risk Management and Regulatory Compliance[27]
  • Agile Web Software Development[28]
  • Online Business Intelligence in NoSQL and Big Data[29]
  • Cloud Computing Services
  • Web Site Design[30]
  • Cybersecurity and Lawful Intercept[31]

A public demonstration powered by Clusterpoint database, illustrating how the document data of the entire English Wikipedia and DBpedia corpus can be managed within a single consolidated database platform, is available on the website Wikisearch.net.

Competitors

Industry experts position Clusterpoint database technology among other emerging NoSQL and Big data technologies with a distributed data management architecture.[32]

Platform Components

The Clusterpoint database software is written in the C and C++ programming languages and supports multi-threading, multi-core CPUs and distributed computing. The primary way for developers to access the platform's capabilities is the REST API. Clusterpoint database software is managed across large clusters of commodity hardware with the Clusterpoint Console application, which provides centralized administration and control of all customer databases through a single web GUI. To access Clusterpoint Console, or to download it together with the Clusterpoint database software for on-premises use, customers must sign up for a Clusterpoint Cloud Database account on the vendor website. Sign-up is free and no credit card is required.

Architecture

Clusterpoint database has a multi-master, shared-nothing, distributed, document-oriented architecture that stores XML and JSON data.[33]

It works as a high-speed transactional OLTP database for XML and JSON data objects. Content can be added, updated and deleted in real time, and all changed data, including full text, dates, numbers and geospatial data, is indexed in real time. Index data can be read for search and analytics immediately after each document is inserted, updated or deleted, while ACID-compliant transactions provide security and consistency. The database API also supports storage and processing of binary data as part of the document data object model.
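The effect of indexing every change as it is written can be sketched with a small in-memory store whose inverted index is updated synchronously on each insert, update and delete, so a search issued right after a write already sees it. The class, its methods and the `field=value` term encoding for non-text fields are hypothetical; the sketch has no transactions, durability or distribution and is not how Clusterpoint implements real-time indexing.

```python
from collections import defaultdict


class LiveIndexedStore:
    """Hypothetical toy store that reindexes each document as it is written."""

    def __init__(self):
        self.docs = {}                  # doc_id -> document (dict)
        self.index = defaultdict(set)   # term -> set of doc_ids

    @staticmethod
    def _terms(doc):
        # String fields are indexed word by word; other values as "field=value".
        terms = set()
        for field, value in doc.items():
            if isinstance(value, str):
                terms.update(w.lower() for w in value.split())
            else:
                terms.add(f"{field}={value}")
        return terms

    def _unindex(self, doc_id):
        # Remove the stale postings of the currently stored version.
        for term in self._terms(self.docs[doc_id]):
            self.index[term].discard(doc_id)
            if not self.index[term]:
                del self.index[term]

    def put(self, doc_id, doc):
        """Insert or update: drop old postings first, then add the new ones."""
        if doc_id in self.docs:
            self._unindex(doc_id)
        self.docs[doc_id] = doc
        for term in self._terms(doc):
            self.index[term].add(doc_id)

    def delete(self, doc_id):
        if doc_id in self.docs:
            self._unindex(doc_id)
            del self.docs[doc_id]

    def search(self, term):
        return set(self.index.get(term.lower(), set()))
```

Removing the old version's postings before adding the new ones is what keeps an updated document from still matching its previous content.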

It supports fault-tolerant hardware setups with no single point of failure, and can replicate the entire distributed database cluster across multiple datacenters.

Query syntax

To query a database, customers can use free text queries, XML-based query syntax, JS/SQL queries or the Clusterpoint REST API, which supports JSON.

General features

  • Data is managed in open, cross-platform, industry-standard XML or JSON format using an open API, for instance the Python API[34][35] or the JavaScript Node.js API[36]
  • Data-structure-agnostic, type-rich database: handles XML or JSON documents of varying structure in a single database, and supports unstructured textual data, dates, numbers and metadata (all XML and JSON types)
  • Cross-platform support: binaries are available for Linux, FreeBSD, Mac OS X and Windows. Clusterpoint database software can be compiled on other operating systems.
  • Multi-master cluster software architecture: no single point of failure, any cluster node can serve as a master and run the management application
  • Horizontal database scalability: scales out from a single server to a few thousand servers networked into a cluster

Access features

  • REST API is used for XML and JSON document format management, search and data manipulation.
  • Consistent UTF-8 encoding. Non-UTF-8 data can be saved, queried, and retrieved with a special binary data type.
  • XML and JSON objects for API queries and responses: enable direct integration in other programming languages supporting XML or JSON parsing, no specific client software required

Search/query features

  • Built-in full-text search with free use of keywords and phrases, result snippets, highlighting, term proximity search and other full-text search options[37]
  • Querying with term stemming, term wildcards and character position patterns for inflected words and plural forms, using automatic self merge-joins[38]
  • SQL-like XML-structured (fielded) queries, as in SQL SELECT ... WHERE ... statements
  • Cluster-wide analytics aggregation with MIN, MAX, COUNT and AVG, as in SQL SELECT ... GROUP BY ... ORDER BY ... statements
  • JS/SQL query language, which combines the familiarity of SQL with ubiquitous JavaScript code and web programming skills
  • Sorting of results in alphabetic, numeric, date order or according to result relevance
  • Autocomplete (instant search as you type) using the actual index data
  • Spell-check of query terms with alternative spelling suggestions for "Did you mean that?" functionality
  • Boosting of search terms at query time, to increase, decrease or override through the API the relevance weights or sorting rules built into the ranking index
  • Dynamic per-query data classification by multi-level, customer-defined facets with exact hit counting (examples: categories, themes, product catalogs, geographic locations)
  • Text-analytics driven similar content search across the entire database
  • XML or JSON data structure relevance ranking by tag weighting and document relevance ranking by document rating
  • Textual relevance ranking for matching search query terms to context, taking into account frequency and density of natural language terms
  • Predictive calculation of expected number of results based on the actual index statistics in large size databases to optimize performance
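Multi-level faceting with exact hit counts can be sketched as follows: for the documents matching a query, every level of a hierarchical facet value is counted once per document. The slash-separated path format, the `category` field name and the function name are assumptions made for illustration; Clusterpoint derives facets from XML/JSON tags declared in its configuration rather than from a Python function.

```python
from collections import Counter


def facet_counts(docs, matching_ids, facet_field, sep="/"):
    """Exact per-level hit counts of a hierarchical facet over the matching docs.

    docs maps doc id -> dict; facet values are assumed to be paths such as
    "Books/Databases" (a hypothetical convention for this sketch).
    """
    counts = Counter()
    for doc_id in matching_ids:
        path = docs[doc_id].get(facet_field)
        if not path:
            continue  # documents without the facet are simply not counted
        parts = path.split(sep)
        # count every level of the hierarchy once for this document
        for depth in range(1, len(parts) + 1):
            counts[sep.join(parts[:depth])] += 1
    return counts
```

Counting over all matching documents, rather than over a sample or only the first page, is what makes the hit counts exact; the cost grows with the number of matches.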

Administration/production use features

  • Granular security partitioning: API users and their access rights are based on groups and permissions assigned per specific databases and API commands
  • Transaction journaling, redo logs, access logs, error logs and audit logs enabled by default
  • Document versioning enabled by default (preserving previous document versions for a certain time period)
  • Background reindexing with automatic switchover keeps the database available during reindexing
  • Online, offline and incremental database backup
  • Automatic or manual synchronization of database replicas
  • Multiple administrator accounts for secure multi-tenancy of different customer databases on the same hardware
  • Centralized web GUI based database administration Console, including one-click configuration of clustered and replicated databases across all nodes
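Permissions assigned to groups, per database and per API command, can be modeled as a simple lookup. The sketch below is hypothetical: the group, user, database and command names are made up, and it shows only the authorization check, not authentication or how Clusterpoint stores its access rights.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical group-based permissions per database and API command."""

    # group -> {database -> set of allowed API command names}
    grants: dict = field(default_factory=dict)
    # user -> set of groups the user belongs to
    membership: dict = field(default_factory=dict)

    def grant(self, group, database, *commands):
        self.grants.setdefault(group, {}).setdefault(database, set()).update(commands)

    def add_user(self, user, *groups):
        self.membership.setdefault(user, set()).update(groups)

    def is_allowed(self, user, database, command):
        # A request is allowed if any of the user's groups grants the command
        # on that specific database; unknown users and groups get nothing.
        return any(
            command in self.grants.get(group, {}).get(database, set())
            for group in self.membership.get(user, set())
        )
```

Denying by default, when no group grants the command, matches the granular partitioning described above: rights on one database never leak to another.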

Automatic full database content indexing

Clusterpoint software automatically builds and maintains an index of XML and JSON document content whenever data is loaded, updated or deleted. A single database index (the ranking index) is maintained to support these types of queries:

  • natural language based full text search indexing, including language-specific stemming and collation rules
  • XML or JSON data structure queries (with full-text, exact match and binary match options) or Essential SQL queries for analytics
  • search over virtual data structures created by aliasing the values of multiple real tags, to speed up Boolean OR queries
  • ad hoc search across all database content regardless of the database structure
  • numeric and date range search
  • geospatial search by range, distance or polygon coordinates and ordering by distance from a certain point
  • multi-level faceted search with automatic results classification by XML / JSON tags assigned as containing facets
  • combination of any of the above database search criteria into complex nested multi-part query expressions using Boolean AND, OR, NOT logic
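Combining several criteria into nested AND, OR and NOT expressions can be sketched as a recursive evaluator over sets of document ids. The tuple-based query syntax, the `RANGE` operator and the data layout are hypothetical and chosen only for illustration; they do not reproduce Clusterpoint's XML or JS/SQL query syntax.

```python
def evaluate(expr, index, docs):
    """Evaluate a nested Boolean query and return a set of matching doc ids.

    expr is a term (str), ("AND" | "OR", e1, e2, ...), ("NOT", e) or
    ("RANGE", field, low, high). index maps term -> set of doc ids and docs
    maps doc id -> dict of field values. The syntax is hypothetical.
    """
    if isinstance(expr, str):
        return set(index.get(expr, set()))
    op, *args = expr
    if op == "AND":
        result = evaluate(args[0], index, docs)
        for sub in args[1:]:
            result &= evaluate(sub, index, docs)
        return result
    if op == "OR":
        result = set()
        for sub in args:
            result |= evaluate(sub, index, docs)
        return result
    if op == "NOT":
        # complement relative to all documents in the database
        return set(docs) - evaluate(args[0], index, docs)
    if op == "RANGE":
        field, low, high = args
        return {d for d, doc in docs.items() if field in doc and low <= doc[field] <= high}
    raise ValueError(f"unknown operator: {op}")
```

Numeric ranges work directly; ISO 8601 date strings would also compare correctly in a `RANGE` because their lexicographic order matches chronological order.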

Database administration

Clusterpoint database can be controlled centrally through the Clusterpoint Console application, a web-GUI dashboard for controlling all database services enterprise-wide, including cluster database administration, configuration of indexing and ranking policy, secure user account management, audit and log file viewing, database backup and restore, database sharding and replication.

Each customer database is started and stopped as an isolated database server process, allowing controlled management of CPU resources, RAM and disk storage. All databases share a single networked computing and storage infrastructure.

Clusterpoint Console is used to manage the underlying hardware (cluster nodes) so that computing resources are shared among different databases running in parallel.

Process and storage architecture

Clusterpoint database processes are isolated from one another: each process runs only in its own memory address space and can access only its own local file system storage folder of the same name, which holds that database's XML or JSON documents, index, configuration and log files on the local cluster node (shard). This architecture provides elastic horizontal scale-out and cluster-wide control over the resource consumption of each customer database. It also prevents unauthorized access between tenant databases sharing the same computing hardware, with an option to fully encrypt sensitive data.

Multi-tenancy and virtualization

Clusterpoint supports secure multi-tenant database services. The platform handles safe partitioning of the runtime environment across all CPU nodes, RAM and storage resources of a larger cluster while operating multiple databases in parallel on the same hardware. This method is intended to make the best use of modern multi-core CPU hardware arranged in large distributed clusters.

Native multi-tenancy is the preferred method for high-performance database computing with Clusterpoint software, rather than operating-system-level virtualization or software containerization. OS-level virtualization may reduce the available network bandwidth and computing resources and create unexpected bottlenecks in storage I/O, which can increase application latency. Database virtualization is best used for prototyping and development, where operational performance guarantees and low latency are not the first priority.[39]

Clusterpoint Cloud Database as a Service (DBaaS) is a secure multi-tenant database platform, with isolated data for each customer account and encrypted access. Clusterpoint software does not need virtualization for safe and efficient multi-tenancy.

Multi-copy database replication

Automatic multi-copy replication of the entire database is built into the Clusterpoint database software. It is active replication, with workload sharing within a cluster. Clusterpoint supports high-performance, ACID-compliant OLTP transactions within a main cluster in a single data center, while providing fail-over to additional datacenters running replica clusters. Fail-over takes only a few seconds if communication latency between data centers is low.

Database replicas in the Clusterpoint architecture are used for automatic load balancing of search queries through the Clusterpoint API.
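Spreading read-only search queries across replicas can be illustrated with a round-robin dispatcher that skips replicas marked as unavailable. This is a hypothetical client-side sketch with made-up replica names; it says nothing about how Clusterpoint actually chooses replicas, and it deliberately handles only searches, since writes go through the main cluster.

```python
import itertools


class ReplicaBalancer:
    """Hypothetical round-robin dispatch of search queries across replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.down = set()
        self._cycle = itertools.cycle(self.replicas)

    def mark_down(self, replica):
        self.down.add(replica)

    def mark_up(self, replica):
        self.down.discard(replica)

    def pick(self):
        """Return the next available replica, trying each at most once per call."""
        for _ in range(len(self.replicas)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no replica available")
```

Skipping a failed replica instead of stopping keeps searches running during fail-over, as long as at least one replica remains reachable.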

In multi-datacenter deployments, network bandwidth between locations may become the critical issue for the Clusterpoint architecture, because of increased latency for database updates and synchronization delays between replicas, in particular when encrypted VPN networking over Internet links is used.

High-capacity bandwidth might be required for high-performance database replication between geographically separate datacenters.

Extendable server-side scripting with Lua

Lua extends Clusterpoint Server functionality with custom server-side scripts. Lua scripts can implement customer-specific functions such as data aggregation, ETL tasks, metadata markup, callbacks to external programming languages through web services, real-time alerting or asynchronous triggers. Scripts can be executed before, during or after the Clusterpoint API transactions of interest: built-in configurable server-side hooks activate Lua scripts at different stages of each transaction.

Custom Lua scripts can be stored in Clusterpoint Server to work as "stored procedures".
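The hook pattern can be sketched in Python, the language used for the other examples here, although in Clusterpoint the hooks are Lua scripts. The class, the stage names and the `insert` command below are hypothetical; the sketch shows only "before" and "after" stages, not the "during" stage, and has no transaction handling.

```python
from collections import defaultdict


class HookedServer:
    """Hypothetical stand-in for server-side hooks around an API command."""

    STAGES = ("before", "after")

    def __init__(self):
        self.hooks = defaultdict(list)  # (stage, command) -> list of callables
        self.store = {}

    def register(self, stage, command, fn):
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.hooks[(stage, command)].append(fn)

    def insert(self, doc_id, doc):
        # A 'before' hook may transform the document, e.g. add metadata markup.
        for fn in self.hooks[("before", "insert")]:
            doc = fn(doc)
        self.store[doc_id] = doc
        # An 'after' hook sees the stored document, e.g. to raise an alert.
        for fn in self.hooks[("after", "insert")]:
            fn(doc_id, doc)
```

Registering the hooks once and having the server invoke them on every call is what lets such scripts behave like stored procedures or triggers.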

Extendable server-side scripting with JavaScript computing engine

Starting with version 4, Clusterpoint database uses JS/SQL as its main scripting engine. JS/SQL is a SQL query language that can be extended with custom JavaScript code, which can be used within WHERE, GROUP BY, ORDER BY and other SQL statement clauses. This allows Clusterpoint database functionality to be extended beyond standard database and search features. For example, users can run highly parallel computing tasks inside the database, where local data storage provides the fastest possible access, using familiar SQL syntax extended with their own functions, all within a single JS/SQL query.
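The general idea of extending SQL clauses with user-written functions can be shown with a different, substituted technology: Python's standard `sqlite3` module, whose `create_function` registers a Python function callable from SQL. This is not JS/SQL and runs on a single machine; the table, the JSON documents and the functions `json_field` and `price_band` are hypothetical. It only illustrates custom code used inside WHERE and GROUP BY over JSON documents.

```python
import json
import sqlite3


def json_field(doc, key):
    # custom function: read a top-level field from a JSON document string
    return json.loads(doc).get(key)


def price_band(price):
    # custom function usable in GROUP BY: bucket prices into bands of 100
    return int(price // 100) * 100


conn = sqlite3.connect(":memory:")
conn.create_function("json_field", 2, json_field)
conn.create_function("price_band", 1, price_band)
conn.execute("CREATE TABLE products (doc TEXT)")
conn.executemany(
    "INSERT INTO products (doc) VALUES (?)",
    [(json.dumps(d),) for d in [
        {"name": "Laptop", "price": 950, "type": "hardware"},
        {"name": "Mouse", "price": 25, "type": "hardware"},
        {"name": "Keyboard", "price": 45, "type": "hardware"},
        {"name": "Database license", "price": 400, "type": "software"},
    ]],
)
# Custom functions appear in WHERE and (through the alias) in GROUP BY.
rows = conn.execute(
    """
    SELECT price_band(json_field(doc, 'price')) AS band, COUNT(*) AS n
    FROM products
    WHERE json_field(doc, 'type') = 'hardware'
    GROUP BY band
    ORDER BY band
    """
).fetchall()
```

Here `rows` holds the number of hardware products per price band; in a distributed engine the same user function would be evaluated on each node next to its local data.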

Programming language support

Clusterpoint database uses REST principles and HTTP/HTTPS messaging for client-server communications between customer software applications and Clusterpoint database server. Any client programming language or development environment, supporting HTTP POST/GET messaging, can connect to Clusterpoint Server directly and read, write, update, delete and search XML and JSON documents.

In versions 1.x, 2.x and 3.0, the REST API interface for JSON transforms customer data between JSON and XML, while only XML is used internally for server-side storage and processing by Clusterpoint Server.
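A JSON-to-XML transformation of the kind described for versions 1.x to 3.0 can be sketched with Python's standard `xml.etree.ElementTree`. The mapping rules here (object keys become element names, array items become `item` elements, booleans become `true`/`false`, `null` becomes an empty element) and the root tag `document` are assumptions for illustration, not Clusterpoint's documented conversion. JSON keys that are not valid XML names would produce invalid XML and are not handled.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, tag="document"):
    """Map a parsed JSON value onto an XML element (one assumed mapping)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(json_to_xml(child, key))
    elif isinstance(value, list):
        for item in value:
            elem.append(json_to_xml(item, "item"))
    elif value is None:
        pass  # JSON null becomes an empty element
    elif isinstance(value, bool):  # checked before numbers: bool is an int subtype
        elem.text = "true" if value else "false"
    else:
        elem.text = str(value)
    return elem


doc = json.loads('{"id": 7, "title": "Clusterpoint", "tags": ["nosql", "xml"], "active": true}')
xml_text = ET.tostring(json_to_xml(doc), encoding="unicode")
```

A mapping like this loses JSON type information (the number 7 and the string "7" serialize identically), which is one reason a converter needs extra conventions to round-trip data faithfully.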

Clusterpoint Server has native client API Libraries using HTTP and faster TCP/IP transport protocol for the following popular programming environments:

Please check the vendor web site for API support in other languages.

Licensing and support

Clusterpoint offers two database licensing options based on functionality and scalability:

- Clusterpoint Enterprise - The most comprehensive DBMS product solution, delivering unlimited scalability and the highest standards of enterprise grade functionality, fulfilling the most demanding of customer requirements.

- Clusterpoint Lite - The Clusterpoint DBMS solution for smaller organisations who require high standards in basic database functionality, supported by replication on 2 servers, but for whom scalability and sharding is not an immediate operational requirement.

There are four on-premises licensing models: perpetual licence, subscription licence, OEM licence and developer licence.

The vendor provides standard software maintenance and technical support on a subscription basis (on premises or in Clusterpoint Database Cloud), delivered over email, Skype or phone.[40]

Premium technical support for customers running the software in 24/7 production environments includes remote problem diagnostics and resolution under a service-level agreement. The vendor also provides installation support, a help desk, training and partnership programs.[41][42][43]

Clusterpoint Products

  • Clusterpoint DBMS - clustered NoSQL database that spreads load across multiple servers to increase performance. Clusterpoint database facilitates high parallelism of computing and distribution of data.
  • GOL: Big Data SIEM Analytics tool from Clusterpark - search and analytics of logs, events and security records.[44]
  • DigiBrowser: Quick SQL denormalization into NoSQL database - imports a multi-table SQL database into one Clusterpoint database using automatic denormalization.[45]
  • NTSS: Network Traffic Surveillance System for Lawful Intercept - high-speed capture, storage, search and analysis of all Internet traffic on a corporate network.[46][47]


References

  1. ^ "Clusterpoint Group Limited". Companies House (UK). Retrieved 2015. 
  2. ^ "Clusterpoint Development Center". Lursoft (LV). Retrieved 2015. 
  3. ^ "Clusterpoint Profile on Firmas.lv". Firmas.lv (LV). Retrieved 2015. 
  4. ^ "Bring the Power of Big Data to Small Businesses". Data-Informed.com (US). Retrieved 2015. 
  5. ^ "Imprimatur Capital About Clusterpoint". Imprimatur Capital. Retrieved 2015. 
  6. ^ "Clusterpoint Raises EUR1 Million From BaltCap". Privateequitywire. Retrieved 2013. 
  7. ^ "Clusterpoint Receives EUR1 Million From BaltCap". Arcticstartup.com. Retrieved 2013. 
  8. ^ "Latvian Database Platform Clusterpoint secures 1.25 million". Arcticstartup.com. Retrieved 2015. 
  9. ^ "Clusterpoint 4 Computing Engine Combines Instantly Scalable Database and Computational Power". InsideBigdata.com (US). Retrieved 2015. 
  10. ^ "A new document database emerges from the cloud". infoworld.com (US). Retrieved 2015. 
  11. ^ "Clusterpoint adds computation to NoSQL database engine". SiliconAngle.com (US). Retrieved 2015. 
  12. ^ "List of NOSQL Databases". Nosql-database.org. Retrieved 2015. 
  13. ^ "The NoSQL movement: document databases". Dataversity. Retrieved 2013. 
  14. ^ "Big data startups / document stores". Bigdata-startups.com. Retrieved 2013. 
  15. ^ "Technology Behind Clusterpoint Database". Gints Ernestsons, Founder. Retrieved 2015. 
  16. ^ "Fulltext search engines". Mediawiki.org. Retrieved 2013. 
  17. ^ "Bloomberg Company Research Profile". Bloomberg.com. Retrieved 2015. 
  18. ^ "Crunchbase Clusterpoint Profile". Crunchbase.com. Retrieved 2013. 
  19. ^ "BusinessWeek Clusterpoint Profile". Businessweek. Retrieved 2013. 
  20. ^ "Business Directory Use Case". Yellow Search Today. Retrieved 2015. 
  21. ^ "Clusterpoint Use Case In E-commerce". Exim.lv. Retrieved 2015. 
  22. ^ "Open Data and Public Services 2015". Garage48 Foundation. Retrieved 2015. 
  23. ^ "Clusterpoint and ZoomCharts". Zoomcharts.com. Retrieved 2015. 
  24. ^ "Developers Club NoSQL Meetup with Clusterpoint". Dev Club Riga. Retrieved 2015. 
  25. ^ "Top NOSQL document databases". Big Data Analytics Today. Retrieved 2015. 
  26. ^ "How to make a Google App Engine application searchable using self merge joins". Google, Inc. Retrieved 2015. 
  27. ^ "Infogov Proteus iGRC (Internet Governance and Regulatory Compliance)". Infogov Ltd (United Kingdom). Retrieved 2015. 
  28. ^ "Agile Web Software Development". Agile.org. Retrieved 2015. 
  29. ^ "Turbocharge HTML5 web applications". Ambienttech. Retrieved 2015. 
  30. ^ "Converting web sites to NoSQL". Rixtellab. Retrieved 2015. 
  31. ^ "Bit IT Solution for Network Traffic Control". Bit IT solutions. Retrieved 2015. 
  32. ^ "NoSQL Scaling Beyond Traditional SQL" (PDF). Intel Corp. Retrieved 2015. 
  33. ^ "HP Guide to NoSQL". Hewlett-Packard Corp. March 5, 2015. 
  34. ^ "Clusterpoint API on Github". Github.com. Retrieved 2015. 
  35. ^ "Python API for Clusterpoint Server". Python.org. Retrieved 2015. 
  36. ^ "Clusterpoint Node.js API". NPM, inc. Retrieved 2015. 
  37. ^ "Full Text Search Explained". Everything.Explained.At. Retrieved 2015. 
  38. ^ "Making you app searchable using self merge-joins". Google. Retrieved 2013. 
  39. ^ "The Do's and Don'ts of Virtualizing Database Servers". Network Computing. Retrieved 2015. 
  40. ^ "Clusterpoint DBaaS Cloud Service". Facebook. Retrieved 2015. 
  41. ^ "Clusterpoint DBMS by 1DataGroup". 1DataGroup. Retrieved 2015. 
  42. ^ "Knowledge Academy Training Course in Clusterpoint DBMS". Knowledge Academy. Retrieved 2015. 
  43. ^ "Big Data Meetup. Clusterpoint XML Database Engine". Meetup.com. Retrieved 2015. 
  44. ^ "Clusterpoint GOL - fast log data analytics & search application software". Clusterpoint. Retrieved 2016. 
  45. ^ "DigiBrowser: Quick SQL denormalization into NoSQL database". Datorikas Instituts DIVI. Retrieved 2015. 
  46. ^ "Clusterpoint NTSS Product Review". SpiceWorks, Inc. Retrieved 2015. 
  47. ^ "Clusterpoint Network Traffic Surveillance System". iiGrowth LLC. Retrieved 2015. 

  This article uses material from the Wikipedia page available here. It is released under the Creative Commons Attribution-Share-Alike License 3.0.

