Big Data Challenge Hackathon
If you think a hackathon is an event lasting a couple of hours or days where people simply drink tea and have a nice time, you are wrong. NIX Solutions’ data analyst Mikhail tells the story of a real “hardcore” hackathon. At the end of 2017 in Kiev, Vodafone Ukraine held The Big Data Challenge Hackathon, an event unprecedented by Ukrainian standards, and opened its real telecom data to participants.
The event brought together in one place:
• organizers from Vodafone providing data and infrastructure;
• representatives of business and municipal authorities generating inquiries;
• analysts, data scientists and data engineers creating solutions;
• investors ready to finance the implementation of these solutions.
Big Internet companies and telecom operators have realized what a huge asset they hold: data about their clients. The main idea of the hackathon was to find solutions to a range of tasks, solutions which:
• are based on data about clients;
• can make our world a little better.
The city authorities submitted, for example, these inquiries:
• determine how and when public transport runs and build a model that makes it more comfortable to use and reduces lost time;
• optimize the work of the border guards: find out how many people arrive at the checkpoints and at what times;
• assess population decentralization: how much the city has grown, where new routes are needed and which routes are no longer used;
• promote tourism: how to build tourist routes;
• develop municipal infrastructure: how to plan services for new districts.
Inquiries from business were the following:
• placement of the shops, their format, range, and size;
• analysis of the available real estate – where to build;
• advertising agencies: clients’ profiles.
NIX Solutions’ data analyst Mikhail and data scientists Ivan and Dmitry took part in this event.
The hackathon consisted of several stages. To immerse you fully in the atmosphere, we will briefly go through each of them.
The first stage was introductory and was held in the format of a bootcamp. About 300 participants from all corners of Ukraine gathered in a conference room, and at least as many followed the online broadcast.
The bootcamp task was to understand the data structure and determine exactly how the data could be applied for maximum benefit. Vodafone managers of different levels discussed cases of successful data usage and introduced us to the kinds of business inquiries.
This stage was immediately followed by tests on mathematical statistics and data science, so that only participants with a suitable level of knowledge could take part in the rest of the competition.
Based on the test results, our team made it to the next stage, and we went home to prepare for the hackathon. For this purpose we were given a test data sample (250 MB, 10 thousand subscribers, 21 attributes, 1 million lines).
Back in Kharkiv, with everything we had heard in mind, we asked analysts from all of our departments to hold brainstorming sessions on ideas for our hackathon project :).
The second stage of the hackathon took place 2 weeks later. More than 30 teams were formed from the participants selected at the previous stage. This time they listened to representatives of real businesses (Ukrposhta, a TV provider, a retail company, a supermarket chain), and at exactly noon on Saturday the countdown timer started. Each team could choose technical mentors (mostly representatives of Vodafone, who told the teams “what to do”) and business coaches (experienced startup founders and businesspeople with backgrounds in fields such as medicine, construction, consulting, insurance and many others). Teams discussed their project ideas with them and looked for interesting business applications of their systems.
The data sample at the second stage was even bigger: 2 GB, 100 thousand subscribers, 26 attributes, 10 million lines.
We decided to name our team the X-Team (as you may have noticed, this is a play on NIX :).
At 9 pm the same day we had a checkpoint where teams presented their ideas to the judges. By this time we already had 4 working ideas, and we wanted to pursue two of them at once. Our business mentor came up with the idea of drawing a portrait of visitors to open-air events (concerts, festivals, meetings). This was plan A, and we spent all of Saturday on it. However, our “home brainstorm” had produced another worthy idea: identifying hidden and implicit risks in lending and insurance. After listening to everything, the mentors said that the risk identification project was the most interesting.
We had the whole night to develop the idea, to Google, to engineer features and to train the model. As for amenities, the organizers provided everything for a comfortable overnight hackathon: unlimited coffee and cookies, Internet access and comfortable padded stools. In the morning there was one more checkpoint where we had to tell the mentors about our progress. Then we had 3 hours to prepare the presentation and rehearse the talk. Presentations were strictly timed and limited to a single speech per team.
Based on the presentations, the judges selected 12 teams for the final. Our team received an AWS award, and as the prize we were given the option to work on Amazon servers in the final.
The third stage started right after the hackathon: an accelerator program with the possibility of online communication with mentors and representatives of businesses in different fields.
It was, perhaps, the most interesting and ambiguous stage. It lasted one and a half months, and during this time we held negotiations with various credit and insurance companies and credit bureaus.
We were searching for a model of cooperation that would be interesting and favorable to all sides, because you can’t solve a problem without data, data can’t be obtained without a contract, a contract can’t be signed without demonstrating working prototypes, and prototypes, in turn, are impossible without the data. It was a vicious circle, a real catch-22.
During the third stage we took part in real business negotiations, where each word has to be weighed and carefully considered. In the final phase, analyst Ira, manager Renata and the designers joined our team. An AWS instance with the largest data sample (21.5 GB, 30 attributes, 120 million lines) was set up for us, so we could start building models straight away.
The final, fourth stage was a pitch of the projects to investors. AVentures Capital, CYFRD, UAngel, Western NIS Enterprise Fund, Chernovetskyi Investment Group and the top management of Vodafone attended.
It may seem surprising that we managed to complete a project at all, especially during such a long, multi-stage and intense event. Here is what we built:
The aim of our project is a service that predicts a borrower’s credit rating from their telecom behavior (calls, their duration, time of day, frequency and amounts of top-ups, traffic consumption and many other parameters), even if the borrower has no credit history yet. This service is of interest to banks and other financial institutions.
With the help of machine learning we trace patterns in borrowers’ telecom behavior and measure how they relate to the borrowers’ financial behavior.
The advantages of this approach are:
- We know the borrower’s behavior long before they obtain their first loan.
- We react to changes in the borrower’s behavior more quickly and can notify the creditor about them.
- We can predict credit ratings for borrowers without a credit history.
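The idea can be illustrated with a minimal sketch. This is not the model our team built at the hackathon: the records, the labels, the three features (average call duration, share of night-time calls, average top-up amount) and the choice of a plain logistic regression are all assumptions made up for illustration. The sketch aggregates raw usage records into one feature vector per subscriber and fits a small classifier that returns a score between 0 and 1.

```python
import math
from collections import defaultdict

# Hypothetical usage records: (subscriber_id, call_minutes, hour_of_day, top_up_amount).
RECORDS = [
    ("a", 5.0, 10, 100.0), ("a", 7.0, 14, 120.0), ("a", 6.0, 18, 110.0),
    ("b", 1.0, 2, 10.0), ("b", 2.0, 3, 15.0), ("b", 1.5, 23, 5.0),
    ("c", 4.0, 11, 90.0), ("c", 6.0, 16, 95.0), ("c", 5.0, 1, 80.0),
    ("d", 1.0, 1, 20.0), ("d", 0.5, 4, 10.0), ("d", 2.0, 13, 15.0),
]
# Hypothetical labels: 1 = repaid a loan on time, 0 = defaulted.
LABELS = {"a": 1, "b": 0, "c": 1, "d": 0}


def build_features(records):
    """Aggregate raw records into one feature vector per subscriber."""
    grouped = defaultdict(list)
    for sub_id, minutes, hour, top_up in records:
        grouped[sub_id].append((minutes, hour, top_up))
    features = {}
    for sub_id, rows in grouped.items():
        n = len(rows)
        avg_minutes = sum(r[0] for r in rows) / n
        # Share of calls made between 22:00 and 06:00.
        night_share = sum(1 for r in rows if r[1] < 6 or r[1] >= 22) / n
        avg_top_up = sum(r[2] for r in rows) / n
        # Crude scaling so that all features are roughly in [0, 1].
        features[sub_id] = [avg_minutes / 10.0, night_share, avg_top_up / 100.0]
    return features


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability that the subscriber repays on time."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


features = build_features(RECORDS)
ids = sorted(features)
weights, bias = train_logistic([features[i] for i in ids], [LABELS[i] for i in ids])
scores = {i: predict_proba(weights, bias, features[i]) for i in ids}
```

Here `scores` maps each subscriber to a value that can be read as a rough rating. Because the features come only from telecom usage, the same function can score a subscriber who has never taken a loan, which is the main point of the approach. A real service would need far more data, proper validation and a richer feature set.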
After the pitches, all participants voted for the projects they liked, and the majority favored socially oriented projects. For example, “Service of search of fellow travelers with similar interests” and “Optimization of night routes of public transport” received many votes.
As for our team, the hackathon had a positive impact on us and reminded us that, with productive teamwork, it is possible to turn an idea into a project prototype in only 2 days. This project was useful for us as an opportunity:
- to quickly understand the sphere of analysis and processing of big telecommunication data;
- to obtain information about various opportunities for application of the telecommunication data;
- to develop practical models of machine learning on this basis.
It is worth noting that the models we developed can be used not only in lending but also for risk assessment in other business fields. This aspect of our project was noted by investors and mentors, and it potentially opens the way to further development and a wider scope for the model.
For the organizers, the event became a source of ideas on how to use telecom data in various projects for investors. It was a demonstration of what modern data science can do with telecom big data, and a chance to contribute to relevant technological projects.