ING is a data-driven, experimental enterprise that is investing heavily in big data, analytics, and stream processing. As in many other enterprises, we deal with a large variety of data sources. Some are responsible for primary processes, while others are used to improve the quality of service and to keep internal operations running smoothly. The amount of data to be handled goes beyond the computing performance of single machines, and vertical scalability is hardly an option. For this reason we are moving towards a scenario where machines are grouped into clusters and data is handled by distributed processing systems.

An important building block in ING's analytics journey is a state-of-the-art Data Lake. The data lake replaces several enterprise data warehouses and is the central repository for all types of data, supporting the various types of queries our stakeholders demand: batch and real-time, over large and small datasets. Key elements of ING's Data Lake are RESTful APIs, secured and managed access to big data storage and processing, and real-time streaming analytics. More often than not, data is handled as streams, and we are experimenting with Kafka and stream computing to provide faster, more reactive, and up-to-date user experiences and journeys. Finally, machine learning is complementing traditional SQL analytics to provide better insight into operational excellence, business processes, marketing, and security applications.

In this talk we will start from traditional batch processes, touching upon the latest developments in big data, data lakes, and Hadoop, and move further into the world of in-memory analytics. We will explore some of the fast and streaming data systems and tools in streaming analytics, such as Kafka and Akka, and describe some typical IT architectures and data processing patterns related to streaming data.
We will illustrate an ING open-source solution named "Coral", built with Akka, Scala, and Spray, for streaming event processing. We are working with Hadoop, Spark, Scala, R, and other technologies to set up a data lake where all data flows of the bank come together. Please join us if you want to find out what's involved in such a programme. We'll share lessons learned, best practices, and code!
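To give a flavour of what streaming event processing looks like, here is a conceptual sketch in plain Scala (deliberately without Akka or Kafka dependencies, and not Coral's actual API; the `Event` type, field names, and threshold are invented for illustration). It chains a filter stage with a tumbling, count-based window aggregation, the kind of pipeline the talk walks through:

```scala
// Conceptual sketch of a streaming event pipeline in plain Scala:
// events flow through a filter stage and a windowed-aggregation stage.
case class Event(user: String, action: String, amountCents: Long)

object StreamSketch {
  // Stage 1: keep only payment events at or above a threshold
  def highValue(events: Iterator[Event], thresholdCents: Long): Iterator[Event] =
    events.filter(e => e.action == "payment" && e.amountCents >= thresholdCents)

  // Stage 2: tumbling count-based windows of `size` events, summing amounts
  def windowedSums(events: Iterator[Event], size: Int): Iterator[Long] =
    events.grouped(size).map(_.map(_.amountCents).sum)

  def main(args: Array[String]): Unit = {
    val events = Iterator(
      Event("alice", "payment", 1200L),
      Event("bob",   "login",      0L),
      Event("carol", "payment", 5000L),
      Event("dave",  "payment",  300L),
      Event("erin",  "payment", 2500L)
    )
    // Three events survive the filter; windows of 2 yield two sums
    val sums = windowedSums(highValue(events, 1000L), 2).toList
    println(sums) // List(6200, 2500)
  }
}
```

In a production system, the `Iterator` would be replaced by an unbounded source such as a Kafka topic, and each stage would run as an Akka actor or stream stage with back-pressure, but the shape of the pipeline stays the same.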
Bios of Bas Geerdink & Natalino Busa

Bas is a programmer, scientist, and IT manager. At ING, he is responsible for setting up a data lake where data from all core systems of the bank is gathered and distributed. His academic background is in Artificial Intelligence and Informatics, and he has experience in software development, design, and architecture, with a broad technical view ranging from C++ to Prolog to Scala. He occasionally teaches programming courses and is a regular speaker at conferences and informal meetups.
Natalino is currently Senior Data Architect at ING in the Netherlands, where he leads the strategy, definition, design, and implementation of big/fast data solutions for data-driven applications in personalized marketing, predictive analytics, and fraud/security management. He is an all-round software architect, data technologist, and innovator with 15+ years of experience in the research, development, and management of distributed architectures and scalable services and applications. He previously served as a senior researcher at Philips Research Laboratories in the Netherlands, working on system-on-a-chip architectures, distributed computing, and parallelizing compilers. He blogs regularly about big data, analytics, data science, and Scala reactive programming at natalinobusa.com.