Hadoop and its file system, HDFS, are open-source software (an Apache Software Foundation project) for the distributed storage and processing of Big Data.
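Hadoop's classic processing model is MapReduce: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. The sketch below runs that pipeline for a word count on a single machine in plain Python. It only mimics the data flow and uses none of Hadoop's actual APIs; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group all emitted values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data everywhere"]))
```

On a real cluster, the map and reduce calls would run in parallel on many machines, each close to its slice of the data in HDFS.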
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation.
Python has become a strong contender for the title of go-to language for all things data science.
Developed by the Google Brain team, TensorFlow is now released under the Apache 2.0 open-source license and is one of the most widely used libraries for neural networks. It's written in Python, C++ and CUDA.
Scala is a modern multi-paradigm programming language that runs on the Java Virtual Machine and interoperates with Java. It is designed to express common programming patterns in a concise, elegant, and type-safe way. Spark loves Scala (not the choir, though).
Part of the world of machine learning and neural network technologies, Torch is an open-source machine learning library based on Lua (using LuaJIT as its scripting language).
Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
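Kafka stores each topic as a set of partitions, each an ordered, append-only log; producers append records, and consumers read forward from an offset they keep track of. The toy model below illustrates that idea in memory. `ToyTopic` and its methods are hypothetical and are not Kafka's client API, and choosing the partition with `zlib.crc32` of the key is an assumption made for the sketch (Kafka's default partitioner uses a different hash).

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical in-memory stand-in for a topic: a fixed number of append-only logs."""

    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        # One empty log per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def produce(self, key, value):
        # Same key -> same partition, so records for one key stay in order.
        # crc32 is an assumption here; Kafka's default partitioner hashes differently.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1  # (partition, offset of the new record)

    def consume(self, partition, offset, max_records=10):
        # Read forward from the caller's offset; return the records and the next offset.
        records = self.partitions[partition][offset:offset + max_records]
        return records, offset + len(records)


if __name__ == "__main__":
    topic = ToyTopic(num_partitions=3)
    p, _ = topic.produce("sensor-1", "21.5")
    topic.produce("sensor-1", "21.7")
    records, next_offset = topic.consume(p, 0)
    print(records, next_offset)
```

Because reading never removes records, several independent consumers can replay the same feed from whatever offset they choose.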
R is the go-to tool of our Data Scientists. It's great for exploratory analysis and allows easy access to statistical, mathematical and machine learning functions.
The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.
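Hive's central idea is schema-on-read: raw files stay as they are, and a table definition tells the engine how to split and type each line when it reads them. The sketch below imitates that with Python's built-in `sqlite3` module and a hypothetical `query_raw` helper. Unlike Hive, which queries files in place using HiveQL, it copies the parsed rows into a throwaway in-memory SQLite table, and the sample data is made up.

```python
import sqlite3

# Raw, untyped text as it might sit in distributed storage (hypothetical sample data).
RAW_LINES = [
    "2024-01-01,berlin,120",
    "2024-01-01,paris,95",
    "2024-01-02,berlin,130",
]

# The projected structure: column names and converters applied only at read time.
SCHEMA = [("day", str), ("city", str), ("visits", int)]


def query_raw(lines, sql):
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(name for name, _ in SCHEMA)
        conn.execute(f"CREATE TABLE logs ({columns})")
        # Split each line and convert every field according to the schema.
        rows = (
            tuple(convert(raw) for (_, convert), raw in zip(SCHEMA, line.split(",")))
            for line in lines
        )
        placeholders = ", ".join("?" * len(SCHEMA))
        conn.executemany(f"INSERT INTO logs VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(query_raw(RAW_LINES, "SELECT city, SUM(visits) FROM logs GROUP BY city ORDER BY city"))
```

The raw lines are never rewritten; changing `SCHEMA` changes how the same text is interpreted on the next query.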
Spark can run on Hadoop 2's YARN cluster manager and can read any existing Hadoop data. It is designed to run programs faster by keeping as much data as possible in memory between processing steps instead of writing it to disk. Spark's developers claim that it runs programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.
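The in-memory advantage matters most for iterative workloads that pass over the same data many times: a chain of MapReduce jobs writes intermediate results to disk and reads them back at every step, whereas Spark can keep a dataset in memory across steps. The sketch below only counts how often a simulated expensive load runs in each style; `load_dataset` and the other names are hypothetical, and no actual Spark code is involved.

```python
def load_dataset(counter):
    # Stands in for an expensive read from disk or HDFS; records each call.
    counter["loads"] += 1
    return [x * 0.5 for x in range(1_000)]


def iterate_without_cache(iterations):
    counter = {"loads": 0}
    total = 0.0
    for _ in range(iterations):
        data = load_dataset(counter)  # re-read on every pass, like chained MapReduce jobs
        total += sum(data)
    return total, counter["loads"]


def iterate_with_cache(iterations):
    counter = {"loads": 0}
    data = load_dataset(counter)  # read once and keep in memory, analogous to Spark's cache()
    total = 0.0
    for _ in range(iterations):
        total += sum(data)
    return total, counter["loads"]


if __name__ == "__main__":
    print(iterate_without_cache(5))
    print(iterate_with_cache(5))
```

With five iterations, the first version performs five loads and the cached version one, while both compute the same total.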