Bit of Data Science, Bit of Data Engineering & a world of Computer Science

Data Revolution Continues: Data Science and the future…

Picking up from where I left off in the last post, let's now discuss why and how leveraging this mammoth amount of data can revolutionise the way a business organisation is run. This is the second part of a two-part blog post, and if you haven't read part 1 (Data Revolution: Big Data…), I highly recommend reading it before delving into this one.


Figure 1: An example of how the movie rental industry has evolved from a product-driven approach to a data-driven one over time

A use-case: Entertainment industry

From the example in figure 1, we can observe how the movie industry went from a product-oriented business model to a data-driven one. The earliest businesses focussed solely on selling VHS tapes and DVDs to consumers. This model is poor for the consumer, as there is little lasting value in owning a movie: once you have watched it, most of the value is gone. The industry then evolved into a service-oriented one. Instead of ownership of the goods being transferred to the consumer, the consumer would buy a subscription from a vendor, who would make a collection of media available to its members.

As the Internet boomed and hardware flourished, innovative companies such as Netflix and Spotify removed the whole physical chore of collecting a DVD or CD and returning it to a physical location. With improving Internet technologies, content is now delivered to consumers digitally. At this point, the service-oriented business had evolved into a technology-focussed model in which the business process and the customer experience were completely changed by new technology.

This technological trend has also enabled businesses to capture far more signals about user behaviour, and advances in software have let these companies analyse this digital footprint and make better-informed decisions that further improve the customer experience, transforming technology-focussed companies into data-driven business organisations. Using techniques such as Machine Learning, Data Science and Computational Statistics, these organisations have found new ways to improve components such as content search, information retrieval and personalisation, which improve the user experience drastically. The availability of big data, and of tools to manage it, is the main driver of this transformation. Methods such as Machine Learning and Data Mining complement this trend by offering better ways to create disproportionately large value from the data being captured.

So what???

What this use case suggests is that the business landscape is truly changing with the emergence of new technologies. This gives you an opportunity to use these methods to enhance your business organisation and reap the benefits of the technologies that have been developed in the last decade.

However, applying big data and machine learning naively to every business can greatly harm the return on investment (ROI). Jumping onto the big data and machine learning bandwagon too aggressively, without caution, carries real risks. Some obvious ones are:

  1. Big data and Machine Learning are not for every company. It doesn't make sense to invest big money in infrastructure and personnel if your business has very little value to extract from the data you accumulate.
  2. It helps greatly if the stakeholders of the organisation (such as directors, accountants and financial controllers) are aware of the end goal of these projects.
  3. If the endeavour is a long-term one, it is quite important that your superiors have the patience to see it through.
  4. Often, big data and machine learning projects have no direct monetary figure attached to them, although they bring in great value indirectly.

For the above reasons, it makes a lot of sense to follow a systematic approach that keeps these factors under control and reduces the risk of your vision going down in the books as one of those "unnecessary financial black holes".

DIKW Pyramid

One of the most popular approaches to bringing data intelligence to business organisations is the DIKW pyramid. DIKW stands for Data –> Information –> Knowledge –> Wisdom. The DIKW pyramid is a popular representation of how data can be turned into a valuable resource by adding more and more context. It is a systematic approach that starts with the raw form of data and uses a step-by-step strategy to develop features that add value by refining the useful signals in that data.

Before analysing how to use this tool, let's understand what Data, Information, Knowledge and Wisdom are.

DIKW Pyramid explained

Figure 2: DIKW pyramid

As you can see from figure 2, the left side represents the pyramid while the right side describes what the data looks like at each stage. Data can be harnessed at different levels to extract increasingly valuable information from it.

Data consists of values (letters, numbers, symbols, etc.) in their raw form. It is evident from figure 2 that the bottom data layer is a bunch of values that do not mean anything by themselves.

Information is data enriched with some context. At this stage, additional information has been added to the data to make more sense of it. The value "Red" now has some meaning, as there is an additional piece of information that "it relates to traffic lights". The two values in degrees are also meaningful now, as together they form a location.

Knowledge is understanding the patterns in the information. As the right side of the figure shows, there is now additional information about the traffic light and its relationship with traffic violations.

Wisdom is being able to act on the knowledge at hand. Based on the recognised pattern, i.e. "crossing a red traffic light triggers a traffic violation", it is now possible to react to that knowledge and actively avoid negative consequences.
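To make the four levels concrete, here is a toy Python sketch of the traffic-light example. The coordinates, the crossing records, the function names and the 0.5 threshold are all hypothetical illustrations, not values taken from figure 2.

```python
def to_information(values):
    """Information: attach context to raw values (assumed order: colour, latitude, longitude)."""
    colour, latitude, longitude = values
    return {"traffic_light": colour, "location": (float(latitude), float(longitude))}


def violation_rates(crossings):
    """Knowledge: the fraction of crossings at each light colour that led to a violation."""
    totals, violations = {}, {}
    for light, violated in crossings:
        totals[light] = totals.get(light, 0) + 1
        violations[light] = violations.get(light, 0) + int(violated)
    return {light: violations[light] / totals[light] for light in totals}


def decide(light, rates, threshold=0.5):
    """Wisdom: stop when crossing at this light has usually led to a violation."""
    return "stop" if rates.get(light, 0.0) >= threshold else "go"


# Data: raw values that mean nothing on their own (hypothetical).
raw = ["Red", "51.5074", "-0.1278"]
info = to_information(raw)

# Hypothetical observations of (light colour, whether a violation followed the crossing).
crossings = [("Red", True), ("Red", True), ("Red", False), ("Green", False), ("Green", False)]
rates = violation_rates(crossings)
print(info["traffic_light"], decide(info["traffic_light"], rates))  # Red stop
```

Each function only adds context on top of the output of the previous level, which mirrors how the pyramid is climbed one layer at a time.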

In essence, this process continuously enriches data with additional context, which gives us increasingly actionable information. Now let's see how this pyramid is applied in the real world.

DIKW --> Real World

Figure 3: How the DIKW maps to real world business functions

Different data-related operations fall into different stages of the DIKW pyramid. The base mainly consists of getting the right data into the system and unifying the incoming data. Work at the data layer mainly consists of building the right data sensors that can capture the right data at the right volume and velocity. In this phase, the enrichment of data is minimal; the main focus is turning raw data into accurate, useful representations. Main activities can include de-duplicating data, removing extreme values and making sure the data is captured and stored as reliably as required.
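A minimal sketch of two of these data-layer chores in plain Python: de-duplicating records by an identifier and removing extreme values with Tukey's interquartile-range fences. The record layout (`id`, `value`), the function names and the choice of IQR fences with k = 1.5 are assumptions for illustration; the post does not prescribe a particular outlier rule. It needs Python 3.8+ for `statistics.quantiles`.

```python
import statistics


def deduplicate(records, key="id"):
    """Keep only the first record seen for each key value."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


def remove_extreme_values(records, field="value", k=1.5):
    """Drop records whose field lies outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    values = [r[field] for r in records]
    if len(values) < 4:  # too few points to estimate quartiles meaningfully
        return list(records)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[field] <= high]


records = [
    {"id": 1, "value": 10},
    {"id": 2, "value": 12},
    {"id": 2, "value": 12},   # duplicate delivery of the same reading
    {"id": 3, "value": 11},
    {"id": 4, "value": 13},
    {"id": 5, "value": 12},
    {"id": 6, "value": 500},  # extreme value, e.g. from a faulty sensor
]
clean = remove_extreme_values(deduplicate(records))
print([r["value"] for r in clean])  # [10, 12, 11, 13, 12]
```

Quartile-based fences are used here because, unlike a mean and standard deviation, they are not dragged far off by the very outlier they are meant to catch.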

The information and knowledge layers of the pyramid are where data mining and machine learning are used to enrich the captured data with more context. This is the data mainly consumed by the reports and dashboards that give visibility into business processes. In this phase, a lot of pattern recognition, statistical inference and prediction is used to enrich the data.

At the top, we have the action phase, which uses wisdom to actively make decisions. Based on traffic prediction and pattern recognition on orders, for example, a distribution company can plan stocking, warehousing and distribution more effectively. A trading algorithm uses the knowledge it extracts from numerous data sources, including stock movements, weather data, news and social media, to execute trades.

In summary, the DIKW pyramid is a good way to introduce increasingly complex and valuable data-driven features into a business without forcing the organisation to take leaps of faith that are too large for it.

DIKW pyramid to everyday businesses…

As explained in the sections above, the data layer is mainly leveraged by building the right sensors and data ingestion infrastructure to store the data. Once the data layer holds usable data, it is possible to start building components that lead to the higher levels of the DIKW pyramid. Figure 4 below gives a concise summary of the different levels of complexity that can be introduced incrementally to transform your organisation from a data warehouse into a proactive, data-driven engine.

Incremental Improvement

Figure 4: Incrementally increasing the complexity of data-driven features in your business

The most important thing to bear in mind during this transformation is that a business organisation is a system that needs all its nuts and bolts to turn for the business to move forward. Data transformation cannot be achieved single-handedly by enhancing the technology that drives the business. The transformation is enabled by improving the technology, the people and the processes that make the business operate as one unit. If you fail to keep all these aspects in mind when introducing change, the consequences can be dire.


In conclusion,

  • With technological advances leading to cheaper yet more powerful infrastructure, data collection, processing and storage have become simple and feasible.
  • Seeing the opportunity, business organisations have adopted a culture of recording as much data as possible → Big Data
  • With the emergence of big data, methods and techniques to extract insight from this mammoth of data have emerged → Data Science and machine learning
  • With the proper use of Big Data and Data Science, a business can unlock vast opportunities to evolve into a data-centric organisational culture.



These two posts summarise an invited talk I gave at the Post Graduate Institute of Agriculture, University of Peradeniya, Sri Lanka, in January 2017. The slides from the presentation are found below:


Data Revolution: Big data…

Last December, I was back in my beautiful Sri Lanka to spend Christmas and New Year with my family and friends. During this time, I was invited to give a guest lecture about Data Science at my alma mater, the University of Peradeniya.


University of Peradeniya

The University of Peradeniya is one of the prettiest universities in the whole of South Asia, equipped with world-class faculty and resources that create the perfect environment for academic curiosity.

Having been lucky enough to work at the cutting edge of the changing data landscape for a few years, I felt passionate about sharing my views and opinions on a data-driven future with the academic community of Sri Lanka.

I decided this would be a great opportunity for me to share the knowledge I have acquired working in London and to inspire academics to take a greater interest in data science.

Because the audience of this lecture was mainly Economics, Business Studies and Statistics students, I titled the talk "Data Revolution, Big Data, Data Science and the future", focusing on the value and competitive edge data science provides to a business. I structured the talk to take the audience on a journey through the following questions:

  • What has changed in businesses?
  • Why have the businesses changed this way?
  • What methods have let us adapt this change?
    • What is big data?
    • What is data science?
  • How do we apply these methods for organizational growth?

In the following sections, I will walk through my talk along the lines of the questions above. Due to the quantity of content, I have split this article into two blog posts:

  1. Post 1: Motivation for Data Science and the emergence of Big Data
  2. Post 2: What is Data Science, and how to use it in your organization

Change of dynamics in how businesses run ….

With the advancement of technology and society, mankind has moved forward, adapting its way through the industrial revolution and the digital revolution. During the industrial revolution, humans found more efficient ways to produce goods. The process was very "product" focused.

Then came a series of improvements in computer science that brought more software into our processes. Analog machines became digital, and the ways of using digitization to improve quality of life multiplied. The Internet came into being!! Digital systems made it possible to provide services such as reservations, shopping and numerous others, enabling humans to create value in new ways without having to produce physical goods.

Over the past decade, business culture has gone through a drastic paradigm shift. With digitization, we have learned that we can create value by understanding the recurring patterns in data. We have learned that data is a good representation of the underlying process, and we have invested time and effort in finding innovative ways to create value from this understanding.


Figure 1: Growth of digital footprint (Source: Tony Pearson, IBM Edge 2015 : Las Vegas, NV, USA)

The digital data footprint has changed drastically in the last few decades. As figure 1 shows, the amount of digital data that has the "potential" to be processed by computers has grown rapidly.

This gives us a huge opportunity to use this data to deepen our understanding of the world around us and to use that understanding to create value.

Drivers of change

Now let us understand the drivers of this change. Over the last few decades, the cost of computation has dropped hugely.


Figure 2: Cost of computation over the last few decades

Figure 2, referenced from O'Reilly Radar, clearly shows how the cost of CPU, storage and network bandwidth has dropped over the last few decades. The top-right plot shows how the Internet has grown from 1 node to more than 1 billion nodes.

One of the main factors taken into account when deciding whether to invest in technology projects is financial feasibility. These plots give solid evidence that the financial viability of data processing has improved over the years, and it has now reached a stage where businesses are actively investing in this vertical.

In addition, recent technological advances in the following areas have also helped:

  • Internet
  • Better data sensors (logical and physical)
  • Easier ways to set up operational systems

The Internet Happened

Looking at figure 2, you can clearly see how the Internet has grown into a huge network over the last few decades. It has become such a vital part of our lives that we have even had several crises tied to our dependence on computers and the Internet in recent years (the Y2K scare, the dot-com bubble).

In its early years, ARPANET (Advanced Research Projects Agency NETwork), the precursor of the Internet, was built to share research between around 15 sites across the USA. It was one of the first networks to implement the TCP/IP communication protocol that we heavily rely on today.

A few decades later, we rely on the Internet to disseminate this very blog post. The world's tech giants build their businesses around the Internet. Trends such as mobile computing have positioned all sorts of hardware and software providers to create value in different layers of the Internet platform. Many people design very smart sensors (hardware and software) to capture valuable data around mobile computing and social networking. In 2016, 7 billion people (95% of the world's population) lived in areas with mobile network coverage, while 47.1% of individuals worldwide and 40.1% of people in the developing world had Internet access.

Infrastructure as a Service (IaaS) / Cloud services

As the number of users grows, businesses need to be able to serve users at that scale. In the early days of the Internet, the cost of entering the Internet market was extraordinary. I remember the days when one had to

  • Buy a good uplink Internet line from an ISP
  • Reserve a Static IP and a domain name through the ISP
  • Buy an expensive machine to run the server
  • Purchase all the software to run a web server
  • and … Maintain the server so that it won’t fail !!!

JUST to host a personal website…

Infrastructure-as-a-Service (IaaS) completely changed the game by creating a platform for setting up virtual servers on demand through a software interface. Amazon was one of the first companies to provide these services at large scale. The idea behind IaaS is that a large corporation invests billions of dollars to build multiple data centers, then builds a virtualization layer on top of the enterprise hardware so that it can create virtual infrastructure abstracted from the hardware running it. Users send commands through a user interface, and these create virtual systems that run on the data center hardware underneath without exposing that complexity to the user.

The main advantage of IaaS/Cloud to the user is the low risk, as there is no initial investment in purchasing hardware. Most of these systems are built for elasticity, so you pay for what you consume. The user also doesn't have to allocate resources to maintaining servers and data center space, and can hand much of the burden of concerns such as disaster recovery, replication and defending systems against cyber attacks to the provider. The IaaS provider takes care of these things at data center scale, with thousands of experts working on these problems. This is a creative way of enabling users to take advantage of economies of scale.


There are disadvantages, of course. Your systems can only use the features the IaaS provider offers, so the provider's capabilities might limit yours and force your business into technological compromises. Another major downside is the tendency for the business to bind itself heavily to provider-specific features, which makes you heavily dependent on the IaaS provider.

Big Data

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.

– Wikipedia

With the technological and social trends mentioned in the above sections, we have come to a stage where collecting data is both cheap and useful.


Figure 3: Big Data .. Biiiiig Data

The growth of the digital footprint shown in figure 1 and the falling cost of storage shown in figure 2 make a convincing case for storing as much data as we can. As figure 3 on the left shows, there are more than 280 exabytes (280 million TB, or 280 billion GB) of digital data in today's world. With this scale of data at hand, we have come up with a term for this type of data: Big Data.
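A quick check of the unit arithmetic behind the 280-exabyte figure, assuming decimal (SI) units where each step up is a factor of 1,000; binary units such as TiB would give slightly different numbers. The `convert` helper is purely illustrative.

```python
# Decimal (SI) storage units in bytes: each step is a factor of 1,000.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def convert(amount, from_unit, to_unit):
    """Convert a whole-number storage amount between decimal units using exact integer maths."""
    return amount * UNITS[from_unit] // UNITS[to_unit]


print(convert(280, "EB", "TB"))  # 280000000     (280 million TB)
print(convert(280, "EB", "GB"))  # 280000000000  (280 billion GB)
```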

But storing big data comes with big problems. No data is valuable if there is no way to extract the information it carries, and moving and processing data at massive scale is not easy. Although the cost of computation has dropped, that makes no difference if there is no way to process the data within acceptable timelines. Traditional data management technology companies such as Oracle, SAP and Sun Microsystems tried to solve this problem by increasing the capacity of a single data processing unit. But this approach became exponentially more expensive as the scale of data grew linearly. So companies like Google and Facebook, which had "web scale" data, set out to find more effective solutions, and their work led to technologies such as Hadoop. Dr. Ted Dunning from MapR gave an excellent explanation of this phenomenon at dotScale 2016 in Paris.

His talk explains very well how using a network of compute units, rather than a single big compute unit, lets us balance cost against return value. Google engineers came up with the core concept behind Hadoop, MapReduce; the research paper outlining the design behind MapReduce first appeared at the OSDI'04 conference. The MapReduce framework relies on processing data as (key, value) units.


Figure 4: MapReduce being used to solve a word count problem in parallel

The map phase runs computations on each line of the big file independently. The intermediate results are then grouped by key, and the reduce phase aggregates the results for each key. To better understand how the MapReduce framework works, I highly recommend Chapter 2: Map-Reduce and the New Software Stack of the book Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman and Jeff Ullman.
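The word-count flow in figure 4 can be imitated on a single machine in plain Python. This is a minimal sketch, not Hadoop code: it runs the three stages (map, shuffle/group by key, reduce) sequentially in one process, whereas a real MapReduce cluster would run map and reduce tasks on many machines in parallel. Lower-casing the words is an assumption of this sketch, and the example input is made up.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line, independently of all other lines."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all the values collected for a single key."""
    return key, sum(values)


def word_count(lines):
    # Each map_phase call only sees one line, so these calls could run on different machines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["the cat sat", "The cat ran"]))
# {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

Because the map step never looks beyond its own line and the reduce step never looks beyond its own key, both can be split across workers; only the shuffle needs to move data between them.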

Following the emergence of Map-Reduce, engineers at Yahoo developed Apache Hadoop, a Java implementation of the MapReduce framework. Many technologies, such as Apache HBase, Cassandra and Ignite, came into being inspired by key-value pairs and the Map-Reduce paradigm itself. A more recent project, Apache Spark, simplifies working with the MapReduce paradigm by hiding much of its complexity behind a more functional API for running massively parallel data pipelines. You can refer to my PySpark API tutorial to get a quick glimpse of how to use Map-Reduce in data processing.
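To show the flavour of that functional style without needing a cluster, here is a toy, single-process stand-in. The `LocalRDD` class is hypothetical and only borrows the names of a few real PySpark RDD methods (`flatMap`, `map`, `reduceByKey`, `collect`); with real PySpark the same chain would start from something like `sc.textFile(path)` instead, and Spark would distribute the work across the cluster.

```python
from functools import reduce


class LocalRDD:
    """Hypothetical single-machine stand-in that mimics a few RDD method names."""

    def __init__(self, items):
        self._items = list(items)

    def flatMap(self, func):
        # Apply func to each item and flatten the resulting iterables into one collection.
        return LocalRDD(out for item in self._items for out in func(item))

    def map(self, func):
        # Apply func to each item, producing exactly one output per input.
        return LocalRDD(func(item) for item in self._items)

    def reduceByKey(self, func):
        # Group (key, value) pairs by key, then fold each group's values with func.
        groups = {}
        for key, value in self._items:
            groups.setdefault(key, []).append(value)
        return LocalRDD((key, reduce(func, values)) for key, values in groups.items())

    def collect(self):
        # Return the items as a plain Python list.
        return list(self._items)


counts = (
    LocalRDD(["to be or", "not to be"])
    .flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
    .collect()
)
# Order here follows first appearance; real Spark makes no ordering guarantee.
print(counts)  # [('to', 2), ('be', 2), ('or', 1), ('not', 1)]
```

The whole word count reads as one chain of small functions, which is the main ergonomic gain over writing separate mapper and reducer programs.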

What Now..

We have now seen how businesses have changed, what this change is rooted in, and how technology has adapted to these changes and produced solutions that can tame big data.

In my next post, I will discuss the methods invented to handle this data, aka Data Science, and how to use them to create value and a competitive edge in a business organization.

Please subscribe to my blog to stay in the loop on my latest posts about Big Data, Data Science, Application of Machine Learning and building high scale data pipelines.

For part 2 of this post where applying Data Science and Machine Learning to business is discussed, please refer to my next post (Data Revolution Continues: Data Science and the future)

An introduction to Apache Spark (PySpark API)



I have been wanting to write this post since I first ran an Apache Spark workshop with Maria Mestre (her blog can be found here), and later with Erik Pazos. Finally, I have found some time to compile everything together and present it in one concise post.

Apache Spark


Apache Spark is a popular distributed computing engine that processes large datasets in parallel using a cluster of computers. The distributed computing paradigm Apache Spark implements is broadly known as the Map-Reduce framework. As the primary objective of this post is not to delve into the details of the Map-Reduce framework, please refer to this paper if you want to learn more about how Map-Reduce works.

The popularity of the Map-Reduce framework grew when the Apache Hadoop project (with its Hadoop MapReduce module) emerged. Hadoop MapReduce grew rapidly, with many players in the market adopting this technology to process TB-scale data. Spark is a more recent entrant to the market. Starting as an academic project at UC Berkeley, Spark solved some of the performance drawbacks of Apache Hadoop and showed that outstanding performance gains can be achieved with in-memory computing, in contrast to Hadoop, which wrote intermediate data to disk after every step. Spark eventually became an Apache open source project, Apache Spark. With constant improvement and very short release cycles, Apache Spark has improved rapidly ever since, while capturing a reasonable share of the distributed computing market.

The core Spark engine is mainly written in the Scala programming language, and the initial version provided three standard API interfaces to Apache Spark: Scala, Java and Python. Recent releases have expanded the API offering with an R API, making Spark more attractive for use cases that involve a lot of data science, statistics and big data.

What is PySpark?

PySpark is the Python interface to the Apache Spark distributed computing framework, which has been gaining a lot of traction lately due to the emergence of Big Data and of distributed computing frameworks that enable us to process and extract value from it.

Python has been a popular choice for data science and machine learning in recent years. This is mainly due to features of Python that make it very efficient and attractive for doing data science in industry. Some of the reasons are as follows:

  • Python is a production-ready programming language that has been developed and matured over many years
  • Python is a very expressive language, which lets you express programming logic clearly in fewer lines of code than many languages used in industry today.
  • Python has a rich ecosystem of feature-rich libraries for scientific computing (scipy, numpy), data science & statistics (statsmodels, matplotlib, pyMC, pandas) and machine learning (scikit-learn, scikit-image, nltk)
  • Python also has a rich library ecosystem for building industry-scale software products, such as web frameworks (Flask, Django), database drivers and ORMs (psycopg, SQLAlchemy) and utility libraries (json, ssl, regex), including standard APIs for popular cloud services such as Google Cloud and AWS.
  • The availability of a shell interface makes the development process very interactive.
  • The transition from research to production is very smooth, as the Python ecosystem provides tools and libraries for both ends of the spectrum.

As mentioned earlier, PySpark is the Python programming interface to the Apache Spark cluster computing engine.

About the tutorial

This tutorial has been prepared to help you take your first steps with the Apache Spark computing engine using its Python programming interface (PySpark).

This tutorial uses multiple resources to demonstrate different aspects of using Apache Spark.

  • Presentation slides that explain the theoretical aspects and logical functionality of Apache Spark
  • A Python notebook with executable code that lets you run code and experience how Apache Spark works. The notebook is also a playground for setting your imagination free and experimenting with your own additions to the existing code
  • A software environment with the data and libraries needed to run the tutorial code

Structure of the Tutorial

The Tutorial is structured into three main parts.

  1. Spark RDD API: the Spark RDD API is the core API that lets the user perform data transformation operations on the rows of a data set in their raw form.
    • The slides for this section explain how Apache Spark's computing engine works and how (key, value) records are manipulated using the RDD API.
  2. Spark DataFrame API and Spark SQL: the DataFrame API is an enhanced API developed as part of the Apache Spark project that uses the schema of structured data to process it more efficiently.
    • The slides for this section show how the DataFrame API works under the hood and why it is recommended to use the DataFrame/SQL interfaces to transform data where possible
  3. Final Remarks: this section outlines some of the other attractive features of Apache Spark.
    • Some complementary projects that make Apache Spark more attractive for data processing and machine learning.
    • Some features in newer Spark releases that help with manipulating data efficiently
    • References to papers and URLs that provide additional information to help you better understand how Apache Spark works.
    • Some pointers on where to go from here (Beginner to Expert 😀 )…

Presentation Slides

Presentation slides outline the theoretical aspects of the technology and logical explanations about how things work and why.

The presentation slides are found below:


Jupyter Notebook

The Jupyter notebook is a very useful tool for observing how PySpark syntax works. Furthermore, you can use it to experiment with the raw data, Spark DataFrames and RDDs in the notebook to get a better understanding of how Spark works.

The notebook with all the required data files can be found in the following github repository:

Click here to go to the git hub repository.

Directions for setting up the local environment to run the code from the git hub repository are given in the section below: How to set up the environment

How to set up the environment

You will need to set up the environment to run the tutorial on your local machine. To keep your local Python environment undisturbed, I highly recommend setting up a Python virtual environment to run this code.

There are a few steps to setting up the environment. They are:

  • Set up the virtual environment
  • Download the Git hub repository
  • Install the required libraries
  • Download and set up Apache Spark
  • FIRE AWAY !!!

Set up the virtual environment

  • Go to the local directory where you want to create the virtual environment
    • Eg: /home/in4maniac/path/to/the/desired/venv/directory
cd /home/in4maniac/path/to/the/desired/venv/directory
  • Create a virtual environment with the desired name (without sudo, so that the environment is owned by your user)
    • Eg: spark_tutorial_env
virtualenv spark_tutorial_env
  • Activate the virtual environment
source spark_tutorial_env/bin/activate

Once the virtual environment is activated, you should see its name in brackets, e.g. (spark_tutorial_env), at the start of your shell prompt.
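If you want to double-check from Python that the interpreter you are about to use really belongs to the activated environment, a small script along these lines should do (save it under any name, e.g. the hypothetical check_venv.py, and run it with python). It relies on older virtualenv releases marking environments with `sys.real_prefix`, while the standard `venv` module and newer virtualenv releases make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases (before 20.0) set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # The venv module and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```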


Download the Git hub repository

Now that you have set up the virtual environment, you should download the git hub repository.

  • Go to the local directory where you want to clone the git hub repository
    • Eg: /home/in4maniac/path/to/the/desired/github/clone/directory
cd /home/in4maniac/path/to/the/desired/github/clone/directory
  • Clone the git repository
    • mrm1001/spark_tutorial.git
git clone https://github.com/mrm1001/spark_tutorial.git


Install the required libraries

We need to install the Python libraries required to run the tutorial. The following libraries have to be installed:

  • numpy : is a prerequisite for scikit learn
  • scipy : is a prerequisite for scikit learn
  • scikit learn : required to run the machine learning classifier
  • ipython : required to power the notebook
  • jupyter : required to run the notebook

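The following commands install the library versions listed in the repository's requirements.txt. Run them with the virtual environment activated and without sudo, since sudo pip would install into the system Python instead of the virtual environment:

cd /home/in4maniac/path/to/the/desired/github/clone/directory/spark_tutorial
pip install -r requirements.txt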
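Once the installation finishes, one way to confirm the libraries are visible to the environment's interpreter is to look them up by their import names. Note that scikit learn is imported as `sklearn`, and using `notebook` as the check for the Jupyter installation is an assumption of this sketch; `importlib.util.find_spec` only locates the packages, it does not import them.

```python
import importlib.util

# Import names for the libraries listed above; "notebook" is assumed to stand in for jupyter.
REQUIRED = ["numpy", "scipy", "sklearn", "IPython", "notebook"]


def missing_libraries(names=REQUIRED):
    """Return the import names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    print("all libraries found" if not missing else "missing: " + ", ".join(missing))
```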

Download and set up Apache Spark

The final building block is Apache Spark. For this tutorial, we use Apache Spark version 1.6.2.

There are multiple ways to download and build Apache Spark (Refer here). For simplicity, we use a pre-built Spark package for this tutorial, as it is the easiest to set up: Apache Spark version 1.6.2 pre-built for Hadoop 2.4.

  • Go to the Apache Spark download page
  • Select the right download file from the User Interface
    • Choose a Spark release: 1.6.2 (Jun 25 2016)
    • Choose a package type: Pre-built for Hadoop 2.4
    • Choose a download type: Direct Download
    • Download Spark: spark-1.6.2-bin-hadoop2.4.tgz
  • You can download the file by clicking on the hyperlink to the .tgz file


  • Unzip the .tgz file to your desired directory
tar -xf spark-1.6.2-bin-hadoop2.4.tgz -C /home/in4maniac/path/to/the/desired/spark/directory/
  • As this package is pre-built, Spark is ready to use as soon as you unzip the package 😀
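Before launching anything, you can sanity-check the extracted package from Python. The helper below is hypothetical; it only assumes that the pre-built package contains the bin/pyspark launcher script, which is the script used in the launch step of this tutorial.

```python
import os
import sys


def pyspark_launcher(spark_dir):
    """Return the path of bin/pyspark inside an extracted Spark package, or None if it is missing."""
    launcher = os.path.join(spark_dir, "bin", "pyspark")
    return launcher if os.path.isfile(launcher) else None


if __name__ == "__main__":
    # Pass the extracted directory, e.g. .../spark-1.6.2-bin-hadoop2.4
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(pyspark_launcher(target) or "bin/pyspark not found in " + target)
```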


Now that the environment is set, you can run the tutorial in the virtual environment.

  • Go to the directory holding your local copy of the git hub repository (the download location from the section: Download the Git hub repository)
cd /home/in4maniac/path/to/the/desired/github/clone/directory/spark_tutorial
  • Run PySpark with the IPYTHON_OPTS option to launch it in a Jupyter Notebook (this option works with Spark 1.x; Spark 2.0 removed it in favour of PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS)
IPYTHON_OPTS='notebook' /home/in4maniac/path/to/the/desired/spark/directory/spark-1.6.2-bin-hadoop2.4/bin/pyspark
  • Jupyter Notebook will launch in your default browser once you run the above command.


  • Select “Spark_Tutorial.ipynb” and launch the notebook
  • Start playing with the notebook
    • You can use Cell >> All Output >> Clear to clear all the cell outputs.

Presentation Videos

Videos of the tutorial, presented in London a while ago, are hosted at the following YouTube URLs. Although some content may have changed since then, these links should still be useful.

Maria Mestre on Spark RDDs


Sahan Bulathwela on DataFrames and Spark SQL


Finally!!!


Hope you guys enjoy it!!

in4diary : Strata Conference (Day 1)

The Strata + Hadoop World conference was the first technology conference I attended in London since my return to this beautiful city. Time flies so fast that it feels like only yesterday that I attended Cisco Live in San Francisco. Overall, it was a mix of technical and business talks. Some were technically focussed while others were business oriented, which makes sense because the conference targeted both engineers and managers. It ran for 3 days at the Hilton Hotel near Edgware Road Station, London.

I had the privilege to attend two out of the three days. The first day was mainly focussed on training programmes, so we skipped it. In this blog post, I will outline some of the takeaways I personally grasped from the conference. The write-up comes in two parts: this article covers day 1 at the conference and the next one covers day 2.

Being an engineer, I found that most of the keynotes did not carry a lot of weight in terms of technical details. But I enjoyed some of the valuable messages delivered by heads of large corporations, and found them useful for learning how to manage innovation and for understanding the business perspective of things. Amongst day one’s keynotes, I loved how the speakers emphasised that today’s computer systems have transformed from methodical computer systems into complex human-machine systems. This is the same argument I tried to make in my last blog post about Complex Systems. The keynotes also talked about the new generation of startups and creative innovators that are revolutionizing both the Computer Science and Big Data landscapes. One of the most innovative companies I came across is Brytlyt, a GPU relational database company that uses GPUs to run relational database engines. I was also fascinated by Julie Mayer’s perspective on startup culture. The talk emphasised why new startups should think about how to coexist with giants like Google and Facebook rather than how to take them down.

Once the keynotes were finished, the sessions and tutorials started. Several sessions stimulated me a lot. The first one I was most excited about was the session on multi-model databases. In today’s IT world, data is king, and it comes in all sizes, shapes and formats. As a result, the full plate of data in any organization tends to be stored in multiple silos in different formats. Some data performs best in document format when it needs to be searchable and relatable. Some data has connections that represent a network or a graph. Some information needs fast access. ArangoDB is one of the new generation of NoSQL databases that allows document, key-value and graph data to be stored in one unified datastore.

Querying multi-model with ArangoDB

It also lets you mix and match query components, so that all of this data can be joined within one query language. This also enables polyglot persistence: the ability to store different data models in whichever storage performs best for each model. Putting the product itself aside, I believe the concept of housing multi-model data structures under one database engine is very important for data management and information enrichment. If scalability can be built into this approach, it could lead to a very strong data tool. Another talk that was very popular among the attendees was Martin Kleppmann’s (LinkedIn) talk on data agility. It was mainly about how LinkedIn uses Apache Kafka, a distributed commit log service, to keep its data agile. The three main points discussed were:

  1. Data should be accessible: all data should be accessible and available. LinkedIn facilitates this by providing all data (including databases) as streams.
  2. Data should be composable: data should consist of loosely coupled, atomic pieces that do not depend on others to be informative.
  3. Data should have the ability to rebuild state: by keeping logs, we preserve the source of truth. Historic logs can be replayed to rebuild stateful data.
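The third point is easy to illustrate without Kafka. The following is a minimal sketch in plain Python, not Kafka’s API: state is derived by replaying an append-only log from the beginning. The event shape (`op`/`key`/`value` dictionaries), the `replay` function and the sample events are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Iterable, Mapping


def replay(log: Iterable[Mapping[str, Any]]) -> Dict[Any, Any]:
    """Rebuild a key-value table by applying log events in order.

    Each event is assumed to be {"op": "put", "key": k, "value": v} or
    {"op": "delete", "key": k}. The log is the source of truth; the
    returned dict is derived state that can always be recomputed.
    """
    state: Dict[Any, Any] = {}
    for event in log:
        op = event["op"]
        if op == "put":
            state[event["key"]] = event["value"]
        elif op == "delete":
            state.pop(event["key"], None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return state


# A hypothetical append-only log of profile updates.
events = [
    {"op": "put", "key": "user:1", "value": "London"},
    {"op": "put", "key": "user:2", "value": "Colombo"},
    {"op": "put", "key": "user:1", "value": "San Francisco"},
    {"op": "delete", "key": "user:2"},
]

print(replay(events))      # {'user:1': 'San Francisco'}
print(replay(events[:2]))  # {'user:1': 'London', 'user:2': 'Colombo'}
```

The log itself is never mutated. Replaying a prefix therefore gives the state as it was at any earlier point, and that property is what makes rebuilding a derived view safe.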

The talk on monitoring and productionizing machine learning models was another of the most interesting talks from Day 1 of the conference. It had some key take-home lessons for us data scientists:

  • When choosing offline metrics to measure the performance of a model, the best metrics are the ones that correspond most closely to the business metrics of the organization
    • e.g. if you are evaluating an article recommender, even though you are predicting a score for each article, it is better to measure the model with a ranking metric than with a regression-type metric. This is because the end business goal of the engine is to surface relevant articles at the top.
  • When doing A/B testing, always do the math to work out how long an experiment should run before you can come to a conclusion
  • Also in A/B testing, validate whether the model assumptions are met and adapt your statistical tests to reality.
  • When you have rare classes in your multi-label classification problems, make sure you pick an appropriate averaging scheme for your evaluation measure (weighted, micro or macro)
  • Beware of the shock of newness: everyone hates change, so it is best to leave a burn-in period before you start measuring reactions to a change in the system.
  • Models go out of date: trends change, and models go stale with them.
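Three of these lessons are easy to make concrete. The sketch below is a minimal, hypothetical illustration in Python; none of it comes from the talk.

- `precision_at_k` scores a recommender by what lands at the top of its ranking, not by regression error.
- `samples_per_arm` applies the usual normal-approximation sample-size formula for a two-sided two-proportion test. It estimates how many users each arm of an A/B test needs.
- `days_to_run` turns that sample size into a duration, given your daily traffic.
- `micro_macro_recall` shows how micro and macro averaging react differently to a rare label.

The click-through rates, traffic and counts are made-up numbers.

```python
import math
from statistics import NormalDist
from typing import Dict, Set, Tuple


def precision_at_k(scores: Dict[str, float], relevant: Set[str], k: int) -> float:
    """Fraction of the k highest-scored items that are actually relevant."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(item in relevant for item in ranked[:k]) / k


def samples_per_arm(p_base: float, p_new: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in each arm of a two-sided two-proportion z-test
    (normal approximation) to detect a change from p_base to p_new."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_new) ** 2)


def days_to_run(n_per_arm: int, daily_users: int, arms: int = 2) -> int:
    """Whole days of traffic needed to fill every arm."""
    return math.ceil(n_per_arm * arms / daily_users)


def micro_macro_recall(counts: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """counts maps label -> (true positives, false negatives)."""
    tp = sum(t for t, _ in counts.values())
    fn = sum(f for _, f in counts.values())
    micro = tp / (tp + fn)                      # pooled over all labels
    macro = sum(t / (t + f) for t, f in counts.values()) / len(counts)
    return micro, macro


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
    print(precision_at_k(scores, {"a", "c"}, k=2))  # 0.5
    n = samples_per_arm(0.050, 0.055)  # detect a 5.0% -> 5.5% CTR lift
    print(n, days_to_run(n, daily_users=10_000))
    # A rare label with poor recall barely moves micro recall but drags macro down.
    print(micro_macro_recall({"common": (90, 10), "rare": (1, 9)}))
```

A small lift on a low base rate needs tens of thousands of users per arm. That is why the duration should be worked out before the test starts rather than by watching the dashboard.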

These were the most interesting sessions from the first day at Strata Conference London. The second day was also pretty exciting, with more scalable machine learning and a lot of Apache Spark, which I will cover in the next blog post. I hope you enjoyed reading this post. Do not hesitate to start a conversation around the post to refine and improve it. Follow me on WordPress to stay in the loop for my latest blog posts.



SKIMLINKS IS HIRING A NEW DATA SCIENTIST !! If you know machine learning and have a burning interest in working with distributed, large-scale data using Apache Spark, you might be working with me and my exciting team in the days to come. Get in touch with me with a CV on

Complex Systems and how they can fail !!


This blog post is inspired by a meetup (Papers We Love) that I attended a few weeks ago, where they discussed how complex systems behave and how they can fail. While listening to the presentation, I couldn’t help but realise how similar our experiences at work are to the points raised at the meetup. So I decided to write a post about our experiences of dealing with complex systems of this nature.

Complex Systems

Complex systems can be defined as a collection of components/parts, both physical and logical, working together to form a unified system. The individual components of such a system are tied together to deliver very powerful solutions. But the system is very delicate, and small changes can lead to catastrophic consequences. The term complex system is applied to systems that are difficult to fit into the conventional mechanistic concepts provided by science (Complex Systems). In computer science, present-day systems have swayed from conventional waterfall-modelled software projects to more resilient, reactive methodologies that enable solutions which adapt to rapidly changing environments. With the recent rise of new paradigms such as cloud computing, distributed computing, big data and various other technological vectors, systems are required to adapt and scale rapidly. These trends have introduced a whole new domain of failure threats that have to be addressed when building systems. Introducing micro-services, constantly changing software and elastic hardware into a system has led to complex systems in data centres.

Classic Examples

A classic example of a complex (mission-critical) system is a nuclear missile launch bunker. This system involves a series of software, hardware, physical and human components that have to work together smoothly to keep the system running. Although most of it is classified, this post gives the gist of the processes involved in handling a nuclear silo. Strict protocols and multiple measures are in place to make the system as fail-proof as possible. Another great (maybe not so great, after what we’ve been hearing lately…) example is the set of airline protocols enforced to ensure the technical and functional safety of a journey. The aircraft has to be checked several times and verified by multiple engineers before it is ready to take off. The communication and navigation channels are rigorously checked. On top of all this, there are strict codes of conduct for airline staff and passengers to ensure in-flight safety; having to turn off all electronic and communication devices during takeoff and landing is one of them. After the 9/11 attacks, it became essential for cockpit doors to have locks to prevent hijacking. There is also hardware such as the black box, which records numerous sensor readings and cockpit audio so that any accident can be investigated rigorously. You can refer to the IATA Operational Safety Audit (IOSA) certification for full details.

At work…

Recently, I’ve been involved in some exciting projects that were critical to our company. One of them was a piece of software that produced customer-facing data. The success of the system would decide whether the prototyped service continued or not. Although this service cannot be considered mission critical in comparison to the transportation, healthcare and defense systems around us, it is fair to treat a system of this nature as mission critical within the context of a business organization: the performance and delivery of this product have major consequences for how the business strategy will change in the days to come. Working in a Research and Development team, we strive hard to build a Minimum Viable Product and start customer trials as soon as possible. At this stage, the system should:

  1. be “early adopter ready”
  2. give a sufficient representation of the end product

As we were dealing with TBs of data, this system was built around Apache Spark. A preconfigured Linux cron job triggers the daily computation process at 2am every day. The system uses a series of configuration files and Python scripts to

  1. Set up the Spark cluster on Amazon
  2. Set up the environment
  3. Clone and configure the relevant git repositories
  4. Run a series of Spark jobs (a data pipeline)
  5. Do monitoring and reporting
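The five steps above can be sketched as a small driver script that the cron job invokes, for example through a crontab entry whose schedule field is `0 2 * * *` (02:00 every day). This is a minimal, hypothetical sketch. The step functions are placeholders rather than our actual scripts; real steps would call out to AWS, git and `spark-submit`. The step names, `run_pipeline` and `placeholder` are all illustrative.

```python
import logging
import time
from typing import Callable, List, Tuple

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("daily_pipeline")

Step = Tuple[str, Callable[[], None]]
Report = List[Tuple[str, str, float]]


def run_pipeline(steps: List[Step]) -> Report:
    """Run steps in order and stop at the first failure, so later steps
    never run against a half-built cluster or environment.

    Returns (step name, status, seconds taken) for every step attempted.
    """
    report: Report = []
    for name, step in steps:
        start = time.monotonic()
        try:
            step()
        except Exception:
            log.exception("step %s failed", name)
            report.append((name, "FAILED", time.monotonic() - start))
            break
        log.info("step %s finished", name)
        report.append((name, "OK", time.monotonic() - start))
    return report


def placeholder() -> None:
    """Hypothetical stand-in for a real step (cluster setup, Spark job, ...)."""


STEPS: List[Step] = [
    ("setup_cluster", placeholder),
    ("setup_environment", placeholder),
    ("clone_and_configure_repos", placeholder),
    ("run_spark_jobs", placeholder),
    ("monitor_and_report", placeholder),
]

if __name__ == "__main__":
    for name, status, seconds in run_pipeline(STEPS):
        print(f"{name:28s} {status:6s} {seconds:.1f}s")
```

Stopping at the first failure is a deliberate choice here. In practice you would probably still want the reporting step to send out the partial report when an earlier step fails, so the team hears about the failure before customers do.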

This system interacts with different subsystems such as relational database management systems, git, and Amazon Web Services such as Elastic Compute Cloud (EC2), Elastic MapReduce (EMR), Simple Storage Service (S3) and so forth. There were two main non-functional requirements for this system:

  1. We should be able to incrementally improve the algorithms we use to generate the information
  2. The system should deliver results on a daily basis

As we are changing the algorithms (code, obviously) systematically, we have to make sure the system will not fail due to this. Therefore, changing anything in the system should be done with extreme caution to avoid any catastrophic failures.

Avoiding Failures

There are numerous ways that complex systems can fail. A good list of reasons can be found in the paper “How Complex Systems Fail” by Dr. Richard I. Cook. In this section, I will outline some of the failure modes from this paper that we had to anticipate and how we addressed them.

Complex systems contain changing mixtures of failures as they are not perfect

Complex systems have multiple moving pieces. Due to the change in technology and processes, there is a possibility for latent failures to occur. Sometimes, analysing the error is challenging solely due to the fact that there are so many moving dependencies that keep the system together.

Change introduces new forms of failure

As I mentioned earlier, our system shouldn’t fail even though we change our algorithms from time to time. The problem in implementing a change is that so many pieces need different configs when testing and deploying the code: our cron jobs, cluster configurations and data source configs all differ between testing and production. This demands a strict protocol for change management. One approach is to share standard checklists within the team to ensure that a strict protocol is followed when deploying a new change. Some example items in these checklists could be:

  • Record the git hash of the last stable version of the code
  • Check the cluster configuration files (paying particular attention to specific parameters)
  • Check that the repository is on the correct branch
  • Verify the time and command parameters in crontab, and so on…

These checks allow us to make sure that vital areas of the unified system are validated before deploying the changes to production.
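Parts of such a checklist can be automated so that nobody skips an item at 2am. The sketch below is a hypothetical illustration, not our actual tooling.

- `DeployState`, `run_checklist` and `git_output` are made-up names.
- The branch name, cron line and `spark.executor.memory` value are example inputs.
- `git rev-parse HEAD` (current commit hash) and `git rev-parse --abbrev-ref HEAD` (current branch) are standard git commands you could use to fill the state in.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeployState:
    """What we observe on the production box just before deploying."""
    branch: str
    last_stable_hash: str              # recorded so we can roll back
    crontab: str                       # e.g. the output of `crontab -l`
    cluster_config: Dict[str, str] = field(default_factory=dict)


def git_output(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its trimmed stdout."""
    result = subprocess.run(["git", *args], cwd=repo,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_checklist(state: DeployState, expected_branch: str,
                  expected_cron_line: str,
                  required_config: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the checklist passed."""
    problems: List[str] = []
    if not state.last_stable_hash:
        problems.append("last stable git hash has not been recorded")
    if state.branch != expected_branch:
        problems.append(f"repository is on {state.branch!r}, expected {expected_branch!r}")
    for key, value in required_config.items():
        actual = state.cluster_config.get(key)
        if actual != value:
            problems.append(f"cluster config {key}={actual!r}, expected {value!r}")
    if expected_cron_line not in state.crontab.splitlines():
        problems.append("crontab entry missing or changed")
    return problems


if __name__ == "__main__":
    # Hypothetical values; in practice fill them from
    # git_output(repo, "rev-parse", "--abbrev-ref", "HEAD"),
    # git_output(repo, "rev-parse", "HEAD") and `crontab -l`.
    cron = "0 2 * * * /opt/pipeline/run_daily.sh"
    state = DeployState(branch="production", last_stable_hash="3f2c1ab",
                        crontab=cron,
                        cluster_config={"spark.executor.memory": "8g"})
    print(run_checklist(state, "production", cron,
                        {"spark.executor.memory": "8g"}))  # []
```

The automated checks complement the human checklist rather than replace it. Some items, such as judging whether a config change is intended, still need a person.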

Complex Systems are heavily defended against failures

Complex systems are inherently and unavoidably hazardous. Therefore, they are built to be as fail-proof as possible, and the high-impact consequences of failure demand this too. As a result, a lot of exception handling and good coding practice go into building such systems. In the case of our system, the project owner defines a common standard for how functions and data items should be introduced to the main system. There is a defined structure that should be used when introducing

  • new data sources
  • new lookup tables
  • new machine learning classifiers
  • new data fields, etc.

This allows different people in the team to understand and review code from other members of the team. This is very important when different people are pushing different features that are merged to the same code base. It also helps immensely when we have to deal with someone else’s code fragments when mitigating a failure quickly.
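One way to enforce such a defined structure is a registry that every new component must go through. This is a hypothetical sketch of the idea, not our project’s actual code: `REGISTRIES`, `register`, the component kinds and the `daily_clicks` source are illustrative names.

```python
from typing import Callable, Dict

# Hypothetical registries: one per kind of component the team may add.
REGISTRIES: Dict[str, Dict[str, Callable]] = {
    "data_source": {},
    "lookup_table": {},
    "classifier": {},
    "data_field": {},
}


def register(kind: str, name: str) -> Callable[[Callable], Callable]:
    """Decorator that adds a component to the registry for its kind.

    Unknown kinds and duplicate names are rejected, so every addition goes
    through the same, reviewable entry point.
    """
    if kind not in REGISTRIES:
        raise ValueError(f"unknown component kind {kind!r}")

    def decorator(fn: Callable) -> Callable:
        if name in REGISTRIES[kind]:
            raise ValueError(f"{kind} {name!r} is already registered")
        REGISTRIES[kind][name] = fn
        return fn

    return decorator


@register("data_source", "daily_clicks")
def load_daily_clicks(day: str) -> list:
    """Placeholder loader; a real one would read the day's data from S3."""
    return []


if __name__ == "__main__":
    for kind, components in REGISTRIES.items():
        print(kind, sorted(components))
```

A reviewer can then see every data source or classifier in one place. Someone debugging a failure knows exactly where a component was wired in.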

Catastrophe is always round the corner

In addition to coding practices, we have to keep backup scripts that run stable code and mitigate failures when they occur. For instance, we should keep backup scripts that let us resume the system with previous versions of stable code in case the newer version fails unexpectedly. We should keep them ready so that we can trigger them immediately, without having to spend time troubleshooting the existing system before reacting. This takes the pressure off while debugging the failure and lets us do a better job of fixing the actual problem. This approach may not work for every system. But in cases where multiple versions of your system can produce the desired output, it is a great way to mitigate failures without causing a chain reaction.
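Within a single job, the same idea can be sketched as a wrapper that falls back to the last stable implementation when the new one raises. This is a minimal, hypothetical sketch: `run_with_fallback` and the two `daily_stats_*` versions are illustrative names. As noted above, it is only appropriate when the stable version’s output is acceptable.

```python
import logging
from typing import Any, Callable, Tuple

log = logging.getLogger("fallback")


def run_with_fallback(candidate: Callable[..., Any], stable: Callable[..., Any],
                      *args: Any, **kwargs: Any) -> Tuple[Any, str]:
    """Try the new version; if it raises, log and rerun the last stable one.

    Returns (result, which_version_ran) so monitoring can report that a
    fallback happened instead of hiding it. If the stable version also
    fails, its exception propagates to the caller.
    """
    try:
        return candidate(*args, **kwargs), "candidate"
    except Exception:
        log.exception("candidate failed; falling back to stable version")
        return stable(*args, **kwargs), "stable"


# Hypothetical versions of one daily computation.
def daily_stats_v1(day: str) -> dict:
    return {"day": day, "version": 1}


def daily_stats_v2(day: str) -> dict:
    raise RuntimeError("new algorithm hit an unexpected input")


if __name__ == "__main__":
    print(run_with_fallback(daily_stats_v2, daily_stats_v1, "2016-07-01"))
    # ({'day': '2016-07-01', 'version': 1}, 'stable')
```

Returning which version ran is the important design choice. A silent fallback would hide the real failure, which is exactly the kind of temporary fix that should not become the stopping point.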

Multiple small failures lead to a catastrophic failure collectively

In the domain of complex systems, some failures are not large enough to be noticeable on their own. But when these small errors add up, they can collectively trigger a catastrophic outcome. Our systems compute daily statistics that we use to compute weekly and monthly statistics for our data services. A minor technical error that led to saving incomplete daily statistical records ultimately led to misleading weekly figures. Such errors do not noticeably affect the results at the daily level, but they lead to unacceptable errors once the days are aggregated. One way to mitigate this type of failure is to validate vital statistics that reveal such errors. For example, the number of records per day, the size of the output file, etc. can be checked for anomalies. This lets us dig deeper into the problem whenever we observe anomalous values that are unlikely to occur (very few records, fewer partitions in a file than usual, etc.).
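A simple form of this validation compares each day’s vital statistics with a trailing baseline before the day is rolled up into weekly figures. This is a hypothetical sketch, not our production check. The metric names, the median baseline and the 0.5x/2x thresholds are assumptions you would tune for your own data.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def check_metric(name: str, history: Sequence[float], today: float,
                 min_ratio: float = 0.5, max_ratio: float = 2.0) -> List[str]:
    """Flag today's value if it is far from the median of recent days."""
    if not history:
        return [f"{name}: no history to compare against"]
    baseline = median(history)
    if baseline == 0:
        return [] if today == 0 else [f"{name}: baseline is 0 but today is {today}"]
    ratio = today / baseline
    if ratio < min_ratio or ratio > max_ratio:
        return [f"{name}: today={today} is {ratio:.2f}x the trailing median {baseline}"]
    return []


def validate_day(metrics: Dict[str, Tuple[Sequence[float], float]]) -> List[str]:
    """metrics maps name -> (recent daily values, today's value)."""
    issues: List[str] = []
    for name, (history, today) in metrics.items():
        issues.extend(check_metric(name, history, today))
    return issues


if __name__ == "__main__":
    issues = validate_day({
        "records": ([1_000_000, 1_020_000, 990_000, 1_010_000], 310_000),
        "partitions": ([200, 200, 200], 200),
    })
    for issue in issues:
        print(issue)
    if issues:
        print("holding back weekly aggregation until this day is investigated")
```

Ratios against a median are used rather than fixed thresholds so that the check keeps working as traffic grows. The median also stops one bad day from distorting the baseline for the next.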

Post-accident attribution to a root-cause is fundamentally wrong

A common mistake most of us make after an accident is trying to isolate a single root cause. Although this works for simple systems with a fairly straightforward structure, it seldom works for complex systems. Most of the time, the outcome is a chain of failures that occur in different parts of the system. They can be either dependent on or independent of each other, and no individual incident is sufficient to break the system. Let’s look at an example. Imagine a system failing because it couldn’t reach one or several components of a geographically distributed DBMS. The natural instinct of the engineers would be to find the people running the DBMS and encourage them to improve availability. In doing so, we overlook the fact that our own system is not defended against DBMS service outages. In this case, there are multiple contributors to the failure, although there is a single starting point. Singling out one part of the system (and the person who built it) as the root cause only shows a lack of understanding of the system; it is mainly the human and cultural urge to pin blame on an individual entity. A better approach would be to also assess the in-house system and implement an approximate querying mechanism that uses whichever sites are available. Which brings us to the next point.
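The approximate querying idea can be sketched as a fan-out that tolerates unreachable sites and reports how much of the data it actually covered. This is a hypothetical illustration. `query_all_sites`, `PartialResult`, the site names and the 0.6 coverage threshold are made up, and each site is modelled as a plain function rather than a real DBMS client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PartialResult:
    rows: List[dict]
    answered: List[str]
    missing: List[str]

    @property
    def coverage(self) -> float:
        """Fraction of sites that answered."""
        total = len(self.answered) + len(self.missing)
        return len(self.answered) / total if total else 0.0


def query_all_sites(sites: Dict[str, Callable[[], List[dict]]]) -> PartialResult:
    """Query every site; an unreachable site is recorded, not fatal."""
    rows: List[dict] = []
    answered: List[str] = []
    missing: List[str] = []
    for name, fetch in sites.items():
        try:
            rows.extend(fetch())
            answered.append(name)
        except (ConnectionError, TimeoutError):
            missing.append(name)
    return PartialResult(rows, answered, missing)


if __name__ == "__main__":
    def eu() -> List[dict]:
        return [{"site": "eu", "count": 10}]

    def us() -> List[dict]:
        raise ConnectionError("us site unreachable")

    def asia() -> List[dict]:
        return [{"site": "asia", "count": 7}]

    result = query_all_sites({"eu": eu, "us": us, "asia": asia})
    print(f"coverage={result.coverage:.2f} missing={result.missing}")
    # The caller, not the DBMS, decides whether partial data is good enough.
    if result.coverage >= 0.6:
        print("serving approximate result from", result.answered)
```

The point is where the defence lives. The outage is handled inside our own system, and the result is clearly marked as partial, so we do not depend on someone else’s availability guarantee.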

Views of ’cause’ limit effectiveness of defenses against future events

Following on from the earlier point, I cannot overemphasize the importance of looking at the big picture when mitigating accidents in complex systems. I came across a perfect example a couple of days ago, when our Spark clusters started throwing a Java exception restricting SSL handshake renegotiation. This error had never been thrown before, and only started when we upgraded the Java version we were using with Apache Spark. When we figured out that this exception only gets thrown on long-running Spark jobs, the immediate remedy was to keep our data processes short. Alternatively, we could downgrade our systems. But once we dug into the problem, we realised that this error is a result of hardening the TLS protocol stack against the POODLE vulnerability, and that we could run our system without this error simply by downgrading the TLS version. In the context of this example, even downgrading the TLS version is not the wiser solution, because it makes the clusters vulnerable to cyber attacks.

Safety is a characteristic of the system; not of its components

The safety of a complex system is not determined individually by

  • the quality of defense in individual software components
  • the defense mechanisms used in integrating the components together
  • the level of expertise people bring into the system
  • process integrity

but by a mix of all these parts. The best approach to defending a stable complex system from accidents is to introduce changes in a structured way and stabilize the system as a whole.

Improve system components

  1. Having code reviews before pushing code to the repository
    1. Code review for logical correctness
    2. Peer review increases the possibility of seeing the bigger picture better
    3. Simple features should be written simply; complex features should be possible.
    4. Review for maintainability of source code is a must. If no-one wants to review your code, no-one would want to deal with it when it fails. Messy, unreadable code is unacceptable !!

Improve processes

  1. Having a source control strategy for R&D and production
    1. Put new features in new branches
    2. Review and merge them as soon as they are finished
    3. Having a separate Production branch avoids a lot of merging accidents

Improve people

  1. Train for hazardous situations
    1. Exposure to hazardous situations during drills helps a lot when a real one comes along
    2. Netflix Chaos Monkey is a great example
    3. Familiarise the team with different parts of the system by getting individuals to work on multiple components. This also gives the team multiple domain experts per component
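Chaos Monkey itself terminates real instances, and the code below is not its API. In the same spirit, though, a team can rehearse failures with a much smaller tool. This hypothetical sketch defines `inject_faults`, a decorator that makes a function fail with a given probability. A seeded random generator makes each drill replayable. The 0.3 probability and `fetch_lookup_table` are illustrative, and the wrapper belongs in a staging environment, not in production.

```python
import random
from functools import wraps
from typing import Any, Callable, Type


def inject_faults(probability: float, rng: random.Random,
                  exc_type: Type[Exception] = ConnectionError
                  ) -> Callable[[Callable], Callable]:
    """Make the wrapped call fail with the given probability.

    Meant for drills in a staging environment, never for production traffic.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")

    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if rng.random() < probability:   # rng.random() is in [0, 1)
                raise exc_type(f"injected fault in {fn.__name__}")
            return fn(*args, **kwargs)
        return wrapper

    return decorator


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a drill can be replayed exactly

    @inject_faults(0.3, rng)
    def fetch_lookup_table() -> dict:
        return {"ok": True}

    failures = 0
    for _ in range(100):
        try:
            fetch_lookup_table()
        except ConnectionError:
            failures += 1
    print(f"{failures} of 100 calls failed during the drill")
```

Running the pipeline against wrapped components shows whether the retries, fallbacks and alerts actually work before a real outage tests them.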


Complex systems may not be easy, but they are possible. By carefully investigating and analysing the complex system as a unified entity, a lot of the problems that arise from it can be managed. One of the primary lessons I learned from my experience is that complex systems can be done right. The key to success is to keep the focus on the long-term solution rather than hiding the issue with a temporary fix. A temporary fix that brings the system back up is very important, but it shouldn’t be your stopping point. The key is holding on to the issue until you have systematically mitigated its latent causes.

*** This is my first blog post in several years, so I am sure it is nowhere near perfect. I would really welcome your feedback or comments on this post so that I can shape it well and do a better job next time. Thanks a lot for taking the time to read it. I hope it was helpful. 🙂

The Apple and Its Creator

“The visionary in the black turtleneck”, a cliché familiar to most of the world, was a clear benchmark of innovation itself. He was famous in the industry as a person who would set new levels of standardization for every aspect of technological application while the rest were worried about uplifting the existing standards. He always dared to venture into stranger tides before the rest of his contenders, to be the change that would revolutionize the digital consumer industry. He constantly managed to foresee the future of the industry and be the first to adapt to it. Daring to be different always worked for him. It is amazing how he inspired a whole world to follow him. He was always the first to see an opportunity and streamline resources to realize it. From design to technical detail, he revolutionized the digital consumer industry. Steve always believed that the success of a product depended on its look and feel rather than raw technical perfection, an ideology he proved correct in every product he introduced to the world. He was one of the first people on earth to identify the usefulness of the Graphical User Interface (GUI) and the control potential of the mouse. But the most important reality behind this truth is that he was the first person to integrate these concepts, with the iconic “1984”, to redefine consumer expectations. His primary vision for Apple was always to achieve minimalist design. His contribution to music is also impeccable. iTunes revolutionized the very idea of purchasing music and other media content in the entertainment industry, and the tweets from various artists in response to Steve’s death prove its impact. Then comes the “i”conic family of digital consumer products that he proudly introduced to the global consumer himself: iPod (2001), MacBook (2006), iPhone (2007) and the iPad (2010).
These products redefined portable music players, laptops, smartphones and tablet PCs in their own domains. Apple’s contribution to device usability has been immense, and its popularity has triggered major usability changes in competing mainstream products and services. His quotes have always driven people to follow the example of his revolutionary, daring qualities. Some of the noteworthy quotes are as follows.


“When you first start off trying to solve a problem, the first solutions you come up with are very complex, and most people stop there, But if you keep going, and live with the problem and peel more layers of the onion off, you can often times arrive at some very elegant and simple solutions.”

“Sometimes when you innovate, you make mistakes. It is best to admit them quickly, and get on with improving your other innovations.”

“Innovation distinguishes between a leader and a follower.”

Steve Jobs was no saint, though. He wasn’t a god either. He was notorious among his colleagues for throwing temper tantrums at employees and firing them unnecessarily. He was also not known for philanthropic work: while billionaire competitors such as Bill Gates invited the wealthy to share half of their fortunes with the suffering and helpless, Steve never joined hands with them. He always focused his ideology on expanding Apple and bringing out more products. His first biography, which is due this November, offers an insight into some of his confessions about his early life. And despite his being gifted with such revolutionary skill in innovation and creativity, there is no indication of any instance where this marvelous ability was used for community service or social benefit. The luxuries and services of his wonderful mind were limited to those who could afford them, and they always came at a cost.

Apple and the global digital consumer community as a whole mourn in this dark hour of fate. The ship has lost a great captain. It can be seen all over the internet how hard this bad news has struck the world community. It has also been a massive blow for Apple Inc., as both the company and the majority of its clientele have placed more confidence in Steve Jobs than in the company itself. This can be analyzed as a consequence of the immense media attention given to Steve. The tendency is illustrated well by Wall Street reporting a drop in Apple shares of 0.2% on the day he stepped down and 0.7% on the day of his departure. These fluctuations show that a considerable number of shareholders had more confidence in Jobs’ skills and capabilities than in the Apple family as a whole. But it is very unlikely that Steve’s departure will lead to the decline of Apple Inc.

“I mean, some people say, ‘Oh, God, if [Jobs] got run over by a bus, Apple would be in trouble.’ And, you know, I think it wouldn’t be a party, but there are really capable people at Apple. My job is to make the whole executive team good enough to be successors, so that’s what I try to do.”

Steve Jobs, CNN Money

Apple has one of the most dynamic and energetic workforces in the whole wide world, and these experts have been molded into a highly intensive and innovative organizational culture that hones their creative skills every second. It is misleading for people to assume that a company like Apple would die off just because of the death of Steve Jobs. Chances are his legacy will continue for generations to come.

The way the world community took this message is quite interesting though.

“The popular reaction to his demise is an indicator of the obsessive materialism of the world we live in… I am disappointed that we have no better visionaries to admire… visionaries whose talents are focused more on social and cultural innovation than material innovations…”

-A very Understanding Friend-

This is a very interesting concept that is worth exploring. It is true that admiration of such materialistic perfection suggests a global obsession with material things, and this idea makes perfect sense. But it is also noteworthy that when taking Steve’s character as an example, one always has the choice to take what is right and useful and reject what is wrong. As mentioned above, Steve Jobs was neither a god nor a saint. He had his flaws. In terms of material innovation, he had a great ride and gave the community many lessons to draw on. It is the responsibility of the community to analyze them and use his lessons to realize creativity and innovation with social and cultural value. Those who learn from him should also take responsibility for using his skills and wisdom in favour of a global community that does not measure success in money. There may well be a handful of people who have been inspired by Jobs with perfect understanding and have used their knowledge and experience with more social responsibility. The bottom line is that sometimes you have to be a pirate to get a social cause up and running, and this wonderful idea of his is highly relevant to social service in these terms.

Steve’s apple will always shine bright before us, as shiny and tempting as it can get. But it is our actions that will define our role in this play. Whether we will be Adams and Eves who misuse this marvelous creation out of materialistic obsession, or a bunch of Newtons who look at it from an optimistic perspective to understand the “gravity of his lessons”, is our choice. His life and example leave before us numerous possibilities and opportunities. But it is we who will decide how the change affects the social system at the end of the day.

Unfortunately, That Day Has Come: The Life of a True Visionary

It was shocking news on Thursday morning to see my Facebook wall overflowing with a spree of status updates and comments that could be described with the keywords Steve Jobs, Apple, RIP and iSad. Indeed, half of the world’s tech community was in a very bad emotional state. Steve Jobs, the celebrity CEO who made many a dream a living reality, has indeed rested in peace. The hotshot technology change-maker passed away at 56, on October 5, 2011, after a long and bitter battle with pancreatic cancer that he had been fighting since 2004. It seemed to the whole world that he saw it coming when he retired from the executive direction of Apple Inc., a legendary success story he built himself from a garage in Silicon Valley into a global technological superpower that redefined the way of life for the 21st-century human being.


Steve was born on February 24th, 1955 to unmarried parents and was later adopted. He was raised in Cupertino, CA, Apple’s long-time home. He showed promising signs of a genius in the making from an early age. When he was just a teenager, he called Mr. William Hewlett to obtain computer parts for one of his school projects. Not only did he get the parts he needed, he also won William’s heart and a summer internship at Hewlett-Packard (HP). After being admitted to Reed College in Portland, Steve dropped out and set sail for India in search of spiritual exploration and psychedelic experiences.

While at HP, Jobs made friends with Steve Wozniak. Both of them then joined a Silicon Valley computer hobbyist club, where they met their other Apple cofounders. With the main contributions from Jobs and Wozniak, Apple was launched on April 1, 1976. The company thrived rapidly on their innovations, and Jobs soon recruited Michael Scott, an experienced CEO, to manage Apple’s rapid growth while steering it to greener pastures. With the launch of the Apple II, Apple revolutionized the personal computer market as one of the first popular and successful mass-market products. After Jobs was relieved of his duties at Apple following a conflict between him and John Sculley, Apple’s then CEO, Jobs went on to found NeXT Computer, where he focused more on the software industry. Around the same time, Jobs bought Pixar, formerly known as the Graphics Group, which he envisioned and boosted into a multimillion-dollar company that created hit 3D animated movies like the Toy Story series. In 1996, Apple bought NeXT and Jobs was back on home ground. But this time, he had the steering wheel.

In 2001, Apple entered the portable music player industry under Steve’s patronage, launching the revolutionary iPod MP3 player, which still sustains the highest sales under a single brand to this day. In 2007, Apple took a leap forward and entered the mobile handset market with the iPhone. Apple’s first venture into the mobile phone industry was a smash hit, selling more than 100 million handsets worldwide and making Apple the smartphone market leader. April 2010 marked the birth of the latest “i” in the Apple family, the iPad, which set a benchmark for competing products in the tablet PC industry. Throughout the evolution of Apple Inc. from its humble beginnings to the glorious present, Steve was always the leading decision maker who dared to stake everything on changing every core ideology behind digital life. And he took the trouble to introduce a handful of his little baby “i”s to the world himself.

Since 2004, Apple's future had seemed shaky because of Steve Jobs's on-and-off absences from the leadership chair. Concerns over his health attracted a lot of media attention, which he least fancied at such times. This man leaves a void in our hearts and souls after a well-structured departure: Steve stepped down as CEO, handing the role to Tim Cook, Apple's former Chief Operating Officer, and stayed on as chairman of the board. He passed away on October 5, 2011, leaving millions of fans, consumers, followers and disciples in a state of shock.

The Story of GoogleMoto: The Bright Side

The breaking news around town these days is search and advertising giant Google acquiring Motorola Mobility Inc. After almost a year of anticipation, Google finally announced last week, on the 15th of August, on its investor relations blog that it had agreed to acquire Motorola Mobility Inc. (listed as MMI on the New York Stock Exchange) for approximately US$12.5 billion. Google is paying $40.00 per share in cash, a 63% premium over Motorola Mobility's closing price on the previous Friday. Drawn from Google's 39 billion dollar cash reserve, this deal marks Google's biggest acquisition ever. It is quite interesting to see such a humongous alliance forming as little as six months after Google's present CEO, Larry Page, replaced its former CEO, Eric E. Schmidt, who now serves as Executive Chairman at Google.

“This deal also brings a lot of value and confidence to the Motorola Mobility stakeholders,” says Sanjay Jha, the Chief Executive Officer of Motorola Mobility Inc. Motorola Mobility houses top-notch computer hardware production lines around the globe, administered and managed by 19,000 expert employees. Motorola Mobility Inc. has been a pioneer of research and development in the mobile device industry for the last several decades. Since the earliest introduction of the portable cellular phone, Motorola's name has been attached to numerous inventions related to mobile communications, cable networks and in-home product lines. Motorola has been part of several firsts among the most popular technologies in the world. According to its official website, Motorola Mobility owns approximately 14,600 (yes, you read it right. It is Fourteen Thousand Six Hundred) granted patents out of the 24,500 patents owned by all segments of Motorola. That is approximately 60% of all the patents owned by the company. Furthermore, it has around 6,700 pending patent applications worldwide.

As I see it, the “GoogleMoto” alliance will have an enormous impact on the technology industry. Like every business deal in the technology arena, this one has its perks and drawbacks. In this article, I will discuss the bright side of the alliance.

Acquiring Motorola Mobility will give Google a marvellous opportunity to build a solid foundation for vertical integration of the Android ecosystem. Vertical integration is achieved when several segments of a supply chain unite under one owner in pursuit of a common goal. This means that Google can finally use the full potential of the Android ecosystem by planning Motorola's future products in a more Android-friendly manner. In the long run, this integration could even lead Google to a vertical monopoly if GoogleMoto's competitors, such as Microsoft, Apple, HP with WebOS (Google rivals) and HTC, Samsung, LG and Sony Ericsson (Motorola rivals), do not respond in time.

Throughout its history, Google has tried several times to enter the mobile hardware industry through products in the Nexus family. Even then, Google had to depend on mobile device manufacturers such as Samsung, because it did not have its own handset production unit. Not anymore: Google now controls one of the market-leading handset producers in the world and can finally break free from its dependence on manufacturers. reports Motorola as the 3rd leading mobile OEM, with a 16.5% share of subscribers in the US as of January 2011. In the same survey, Google leads the smartphone platforms category with a 31.2% share, an increase of 7.7 percentage points over the three months since October 2010. Gartner Inc., a leading research firm, also reports in its August 11, 2011 report that Google Android held a 43.4% worldwide market share in Q2 2011, followed by Symbian (Nokia) and iOS (Apple). These statistics show a great opportunity for Google to join with Motorola to build the next generation of smartphones in the near future.

The stupendous portfolio of granted and pending patents that Motorola Mobility possesses is the icing on the cake of this deal. As mentioned above, approximately 21,000 granted and pending patents in the mobile and in-home product segments will come under Google's ownership when this acquisition is completed. This will give Google enormous possibilities for dominating the mobile communications industry in the coming years. Now that Google has its own mobile hardware labs and factories, with state-of-the-art equipment and 19,000 of the best hardware experts in the world, we can look forward to an impressive range of smartphones in the years to come. This deal certainly puts Google on a par with its main competitor, Apple, which already has full control over both its iPhone and iOS divisions.

In his blog post, Larry Page further explores the opportunities of Motorola being the market leader in the home devices and video solutions business. This can be interpreted as one of Google's key reasons for such a humongous bid on Motorola, apart from its eager attempt to stabilize the patent war against Android. Motorola Mobility holds many patents related to H.264 Advanced Video Coding, MPEG-4 (a term we are all familiar with), 802.11 wireless local area network protocols, and Near Field Communication (NFC), which is used for smartphone payments, file sharing and more. In numbers, Motorola proudly boasts approximately 1,900 granted patents and 1,300 pending patent applications in its Home business segment.

“Motorola is also a market leader in the home devices and video solutions business. With the transition to Internet Protocol, we are excited to work together with Motorola and the industry to support our partners and cooperate with them to accelerate innovation in this space.”

Larry Page, August 15, 2011

We can clearly see Larry's vision of revolutionizing the home entertainment business with a GoogleMoto alliance. The union will facilitate the integration of the most advanced, yet patent-protected, technologies in home entertainment hardware, cable broadcasting, Internet Protocol, advertising, service provision, NFC and various other areas. The possible products and services in the home business segment are immense.

There is also a dark side to this alliance, which I will discuss in detail in my next article. For now, I hope that the brighter side turns out to be the reality. The deal is certainly big, and it will certainly mark a new chapter in the information technology industry. It could be for better or for worse, but its destiny lies in the future decisions of the GoogleMoto alliance.

Energy Conservation Starts with Your Computer's Wallpaper!!!

“It is an inescapable reality that fossil fuels will continue to be an important part of the energy mix for decades to come.”

UK government spokesperson, April 2008

“Energy crisis” is a term that is widely discussed in the world today. As the globe races forward with rapid industrial progress, the demand for energy rises every second. The sectoral petroleum demand information published by the Sri Lanka Sustainable Energy Authority clearly shows how the demand for energy has increased over the years. The main source of energy at present is fossil fuel, which is a limited resource, and the use of fuels for power generation has also increased immensely during the last few years. At the present rate of increase in energy demand, the figures and tables clearly indicate an energy crisis around 2030 if this demand problem is not properly addressed. Hence it is extremely important that energy is saved by every means possible.

The Sri Lanka Energy Balance 2007 report by the Sri Lanka Sustainable Energy Authority (SLSEA) states that Sri Lanka's energy consumption has increased rapidly since 1997 and continues to rise. Its tables and figures on gross electricity generation, hired and off-grid thermal generation, and fuel usage all show the high rate at which energy demand is increasing. The report also shows how much fuel has been consumed to meet electricity requirements.

The use of computers and digital media around the world has grown enormously within the last decade. Throughout this rapid expansion, the main interface between digital multimedia and humans has been the display monitor. Beyond computers, displays are used in televisions, mobile phone screens, tablet PC screens and various other devices. A study by researchers at the Ernest Orlando Lawrence Berkeley National Laboratory shows that the energy consumption of display monitors varies with the colours they display: the energy consumed to display a full black screen differs from that of a full white screen. Robertson et al. (2002) likewise show that the colour of a displayed image affects the power consumption of display monitors; the readings recorded by the group show a clear difference between the power dissipation of a full black screen and that of a full white screen.

During the last few years, display technology has shifted greatly from Cathode Ray Tube (CRT) to Liquid Crystal Display (LCD) and Light Emitting Diode (LED) technologies. Today the world's main display technologies are CRTs, LCDs and LED displays. Because these technologies use different mechanisms to generate colours, their rates of energy consumption differ from each other.

The United States Department of Energy has measured and published a set of readings on the energy required by a CRT monitor to display different colours. This data set was quoted and published exactly the same way by They have further referred to a study done in September 2007 and state that approximately 75% of developed economies use LCD monitors while the rest use CRT screens.

Researchers from the Simon Fraser University in Canada have developed two different color maps for organic LEDs that, thanks to an appropriate choice of colors and by exploiting characteristics of human perception, can consume up to 40 percent less power and could be used to increase battery life for a wide range of portable devices. The details of this research have been quoted by

As mentioned above, the amount of power consumed by any display technology varies with the colours it displays. Millions of visual images are generated in the world every day in the form of pictures, user interfaces, advertisements and so on. Because there is no proper metric or scale that categorizes colours by energy efficiency, these images are produced without regard to the energy efficiency of the colour combinations they use. They are viewed on display screens all around the world every second, costing a huge amount of energy that could have been saved if they had been created with proper concern for the energy consumed in displaying their colours.

Therefore, a scale of values that positions display colours according to their energy consumption could be used to compare the energy efficiency of different colour combinations, and hence to build eco-friendly screen images for display monitors. This could ultimately lead to enormous power savings.

However, it should also be mentioned that the eco-friendliness of colours does differ with the technology used to display them. For example, the mechanisms used in CRT monitors and LED monitors differ fundamentally from that of the LCD. While CRT and LED monitors consume more energy to display brighter colours, LCDs behave in the opposite way: it costs more energy to display a darker colour on an LCD. This is because the LCD generates a colour spectrum using its backlight and uses the screen to block the unnecessary wavelengths. A darker colour therefore requires blocking more wavelengths, which ultimately costs more energy.
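To make the idea of a colour energy scale a little more concrete, here is a minimal Python sketch that scores and ranks colours under two deliberately simple models. Every number in it is an assumption: the per-channel weights, the constant backlight cost and the darkness penalty are hypothetical placeholders, not measurements from the studies mentioned above. The label "emissive" is my own shorthand for the CRT and LED-style behaviour described above (brighter costs more), and the "lcd" model simply encodes the claim that darker colours cost more. A real scale would replace these numbers with measured power readings for each display technology.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# HYPOTHETICAL relative weights, for illustration only; real values would
# have to come from power measurements of actual monitors.
EMISSIVE_WEIGHTS = (1.0, 0.8, 1.5)  # assumed cost per unit of red, green, blue light
LCD_BACKLIGHT = 1.0                 # assumed constant backlight cost
LCD_DARK_PENALTY = 0.1              # assumed extra cost of blocking light


def relative_cost(rgb: RGB, technology: str) -> float:
    """Return a unitless relative energy score for displaying one colour."""
    r, g, b = (channel / 255 for channel in rgb)
    if technology == "emissive":
        # CRT/LED-style model: each channel emits light, so brighter costs more.
        wr, wg, wb = EMISSIVE_WEIGHTS
        return wr * r + wg * g + wb * b
    if technology == "lcd":
        # LCD-style model: the backlight is always on, and darker colours
        # require blocking more light, which adds a small extra cost.
        darkness = 1 - (r + g + b) / 3
        return LCD_BACKLIGHT + LCD_DARK_PENALTY * darkness
    raise ValueError(f"unknown technology: {technology}")


def rank_colours(colours: Dict[str, RGB], technology: str) -> List[str]:
    """Order colour names from most to least energy efficient."""
    return sorted(colours, key=lambda name: relative_cost(colours[name], technology))


palette = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}
print(rank_colours(palette, "emissive"))
print(rank_colours(palette, "lcd"))
```

With these placeholder numbers, black comes out as the cheapest colour under the emissive model and the most expensive under the LCD model, mirroring the reversal described above. Keeping the display technology as an explicit parameter reflects the point that a single colour scale cannot serve every kind of screen.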

By determining the eco-friendliness of the colours shown on the enormous number of display monitors around the globe, colour schemes can be chosen efficiently. This provides a way to assess the environmental impact of displaying different colours on display monitors. Such an assessment will help the global community choose colour schemes carefully when designing and developing screens, images or interfaces that will be displayed frequently throughout the world.

Awareness of eco-friendly colour schemes will ultimately lead to saving large amounts of energy that would otherwise be wasted on energy-inefficient colour schemes.

Technology’s Impact on Brain Utilization: Trojan Horse of The Tech Revolution

Technology can be defined as using peripherals and new techniques to enhance people's day-to-day lives. In today's context, numerous peripherals and techniques have uplifted the human lifestyle. Examples of this technological rise range from tiny nanoparticles to humongous skyscrapers. At present, technological advancement has created very user-friendly surroundings. Everything we seek is a mere fingertip away.

Are We Losing Touch?

With this massive boost in technology, there is much discussion about how these advances have affected human independence. In the present-day context, we must alarmingly admit that we are very dependent on technology and rely heavily on tiny gadgets labeled with catchy words such as automated, smart, sophisticated and intelligent. While we proudly boast that we are well supported by numerous “gadgemetics” built with top-notch technology, we simply ignore the basic argument that we have paid a huge price for this level of convenience: we are on the verge of abandoning a lengthy list of perks that only human beings are blessed with. We can collectively call these perks “Human Independence”.

Recent research suggests that human brain utilization has diminished drastically with the introduction of various technologies. Scientists have repeatedly argued that new advances relieve the brain of much of its work in areas like memory, stochastic thinking, simple arithmetic and so on.

These ample amenities have dealt a massive blow to human memory. It is an agonizing reality that we can simply reflect on our own way of life and find numerous examples of this obvious truth. We can no longer remember a simple mobile phone number. We cannot spell a word properly without a word processor. We hardly remember driving directions. This is how badly dependence on technology has downgraded our skills. Work by author Nicholas Carr, who wrote the essay “Is Google Making Us Stupid?”, and by Dr. Gary Small, Professor of Psychiatry & Biobehavioral Sciences and Director of the UCLA Center on Aging, suggests that new technology has a very serious impact, rewiring the neural networks of the human brain. Too much dependence on modern communication can result in weak memory. “People who use tools to store information on their behalf, would likely find it difficult to remember many things”, said Prof Dzulkifli Abdul Razak, vice-chancellor of Universiti Sains Malaysia (USM), at the opening of the ICT Inovate week 2008 in Penang.

We indulge in the bogus, brightly shining notion that “new is always better”. But haven't we evolved from hardworking, athletic people into figurative vegetables who spend most of our time stationary in front of a computer, staring at a flood of information feeds? The advancement of the service industry has lifted the world to a brand-new level of convenience and luxury, which comes at a terrible cost to human health.

In further discussing technology's impact on human activity, we cannot ignore the severe perceptual drift mankind is going through. It is surprising how human perceptions and values have evolved (or rather, been dealt a massive blow) with the enhancement of technology. Our human relationships and interdependencies have been drastically replaced by lonely, self-centered souls strongly entangled in self-contempt. It is also a pity that we have lost the Emotional Quotient of our brain, which helps us make very accurate decisions based on human instinct. With the advancement of technology and the rapid evolution of knowledge areas such as cognitive science, human perception, logic, philosophy and the reasoning sciences, the way we ourselves approach decisions has changed. The dark hour of reason has crushed the Emotional Intelligence that was naturally gifted to mankind.

Mankind is still charging forward rapidly, focused on only one aspect of development. It is high time that we understood the wider view of the world before it is too late.