Thursday, December 27, 2012

Minimize Regrets And Not Failures

While I ponder 2012 and plan for 2013, I always keep the regret minimization framework (watch the short video clip above) in the back of my mind. Of course luck plays a huge part in people's success, but we owe a lot to Jeff Bezos. We probably wouldn't have seen Amazon and we most certainly would not have seen EC2. No one predicted anything about Amazon being a key cloud player. A few years back Twitter didn't exist and Facebook was limited to college kids. I do make plans but I have stopped predicting since I will most certainly get it wrong.

"Plans are useless, but planning is indispensable." - Dwight Eisenhower

I use the regret minimization framework not only as a long-term thinking tool but also to make decisions in the short term. It helps me assess, prioritize, and focus on the right opportunities. While long-term thinking is a good thing, I strongly believe in setting short-term goals, meeting them, and more importantly cherishing them. If you're not minimizing regret you're minimizing fear of failures. I don't fear failures, I celebrate them; they're a learning opportunity. As Bill Cosby put it, "In order to succeed, your desire for success should be greater than your fear of failure."

All the best with your introspection and indispensable planning for 2013. Focus on the journey, the planning, and not the destination, the plan.

Tuesday, December 18, 2012

Objectively Inconsistent

During his recent visit to the office of 37 Signals, Jeff Bezos said, "to be consistently objective, one has to be objectively inconsistent." I find this perspective very refreshing, and it applies to all things and all disciplines in life beyond just product design. As a product designer you need to have a series of points of view (POVs) that would be inconsistent when seen together, but each POV at any given time will be consistently objective. This is what design thinking, especially prototyping, is all about. It shifts a subjective conversation between people to an objective conversation about a design artifact.

As I have blogged before, I see data scientists as design thinkers. Most data scientists that I know of have the knowledge-curse. I would like them to be consistently objective by going through the journey of analyzing data without any pre-conceived bias. The knowledge-curse makes people commit more mistakes. It also makes them defend their POV instead of looking for new information and having the courage to challenge and change it. I am a big fan of the work of Daniel Kahneman. I would argue that prototyping helps deal with what Kahneman describes as "cognitive sophistication."
The problem with this introspective approach is that the driving forces behind biases—the root causes of our irrationality—are largely unconscious, which means they remain invisible to self-analysis and impermeable to intelligence.
This very cognitive sophistication works against people who cannot analyze themselves and be critical of their own POV. Prototyping brings in objectivity and external validation to eliminate this unconscious-driven irrationality. It's fascinating what happens when you put prototypes in the hands of users. They interact with them in unanticipated ways. These discoveries are not feasible if you hold on to a single POV and defend it.

Let it go. Let the prototype speak your design—your product POV—and not your unconscious.

Photo courtesy: New Yorker

Friday, November 30, 2012

Enterprise Software Needs Flow And Not Gamification

I don't believe in gamifying enterprise applications. As I have argued before, the primary drivers behind revenue and valuation of consumer software companies are number of users, traffic (unique views), and engagement (average time spent + conversion). This is why gamification is critical to consumer applications: it is an effort to increase the adoption of an application amongst users and maintain stickiness so that users keep coming back and enjoy using the application. This isn't true for enterprise applications at all. Worse, gamifying enterprise applications is counterproductive: it makes existing tasks more complex and creates an artificial carrot that does not quite work.

A design philosophy that we really need for enterprise applications is flow. I am a big fan of Mihaly Csikszentmihalyi and his book "Flow: The Psychology of Optimal Experience." I highly recommend reading it. Mihaly describes flow as a series of autotelic experiences: activities that consume us and become intrinsically rewarding. The core intent of gamification is to make applications a pleasure to use. What people really want is enjoyment and not just pleasure. They are different. Enjoyment is about moving forward and accomplishing something. Enjoyment happens due to an unusual investment of attention. It comes from tasks that you have a chance to complete, that have clear goals, that provide feedback, and that make you lose your self-consciousness.

All the gamification efforts by new innovative entrants that I see seem to be disproportionately focused on "edge" applications, since it's relatively easy for an entrant to break into edge applications to beat an incumbent as opposed to redesigning a core application. But most users I know spend their lives using the core systems. They have no intrinsic or extrinsic motivation to use these systems. Integrate flow in these systems to create intrinsic rewards that create autotelic experiences. Application designers have traditionally ignored flow since it's a physical element that is external to an application, but life and social status extend beyond the digital life and enterprise applications. You get to be known as that finance guy or that marketing gal who is really awesome at work and helps people with their problems to get work done. Needless to say, helping people and getting work done are intrinsically rewarding. Help these people with their core activities and make non-core activities as minimal or transparent as possible. If I am hiking, make my drive to the trail head as easy as possible but make my hike as rewarding as possible. That should be the design principle of how you integrate flow into enterprise applications. Also, focus on perpetual intermediates; design applications to reduce or eliminate the learning curve, but introduce users to advanced features as they make progress to increase their productivity on repeated tasks. This helps create an intrinsic reward of having learned and mastered a system. As people learn new things they become more complex and unique human beings, and believe it or not, you can influence that in the design of the enterprise software that they spend their lives using.

Photo Courtesy: Mark Chadwick

Tuesday, November 20, 2012

5 Tips On How To Network Effectively At Conferences

I go to a lot of conferences and quite a few people, including the ones that I mentor, have asked me how they can effectively network at a conference. Here are five simple but effective tips. Start practicing them at local meetups and refine them for large conferences.

Connect before the conference: 

Your networking efforts should start as soon as you decide to go to a conference or even before that. Go through the speaker list and search Twitter exhaustively to find and follow these folks, either directly or via a list. Interact with these speakers on Twitter and ask them meaningful questions. Also ask them if you can have 5 minutes of their time at the conference. Look on LinkedIn and Plancast to identify who is going to be at the conference. Ask the organizer to send you a list of attendees. Some organizers will happily oblige. If any of these folks sound interesting, follow them on Twitter and reach out to them with a request to see them at the conference. Be specific about why you want to see them. Do your homework to get up to speed on the topics that you're interested in hearing more about at the conference. Use the conference sessions to enrich yourself and not to educate.

Be smart with your time:

Design your agenda upfront and put the sessions that you want to go to on your calendar. Spend your time wisely by not going to too many sessions. At the extreme, for certain conferences, I would suggest not going to any sessions at all. Differentiate between content and inspirational sessions, and ask yourself why you are there. Once you sit down, you're in zombie mode receiving content. Some speakers and panelists are good and some are not. Don't hesitate to leave or join a session in the middle. I closely monitor my Twitter stream in real time based on the conference hashtag. If I see tweets from people praising other sessions, I walk out and go there. For asking questions, the worst times to approach a speaker are right before and right after the session. You're competing for his/her attention. Find (don't stalk) the person later on during the conference and follow up with your questions. I have sent emails to speakers after their sessions and have received great responses.

Don't waste your time watching pitches of a vendor in the exhibit area or talking to a marketing guy/gal for the purposes of gathering information. You should research the products of vendors ahead of time and have a list of exhibits that you want to visit. Write down what you want to know and who you want to meet. Go to the booth and ask them those specific questions or demand to see a specific person. Even better, set up appointments ahead of time. If they can't answer your question or if you don't get to see the person you wanted to see, leave your business card and ask them to reach out to you. Don't become a victim of meaningless marketing and a sales pitch. Your time at a conference is far more valuable than that.

Don't miss coffee breaks and cocktail receptions:

Meet any and all people you can. Have meaningful conversations. Offer to help them and ask for help. The experts don't become experts merely based on what they think; they extensively collect information from other people and synthesize that to form a point of view. Ask yourself how you might be able to help them so that they can help you. Use your smartphone to send them a LinkedIn invitation while you are at the conference and take some notes of the conversation that you had. I typically use the back of the business card (that I receive) to take notes. Use Highlight to instrument and take advantage of serendipity.

Do not run out of business cards:

I have come across people during a conference telling me they don't have their business cards. If they are not lying, it's just ridiculous. You should never run out of business cards at a conference, ever. Keep them in your bag and keep them in your coat pocket. I even have a designated coat pocket for my business cards so that I don't have to shuffle things to look for one. I use another designated pocket to collect the business cards that I receive. I also keep a pen in my coat to take notes on the business cards. I keep two sets of business cards, one that has my cell phone number on it and the other that has my landline number on it. I never use my landline to take any incoming calls, only voicemail. If you want the person to call you, give them the card with the mobile number on it. If not, give them the other card. Instead of a landline number you can also use a Google Voice number. Print a small QR code on your business card that directs people to your website, which could be your LinkedIn page, page, or your blog. Make it easy for people to find you and know more about you. Needless to say, you should have a fairly detailed profile on the internet before you decide to go to a conference. If your company doesn't allow you to print your personal social media details on your company business cards, keep two sets of cards, one business and one personal.

Follow-up after the conference: 

Not following up is the biggest mistake that I always see people make. Once you're back from a conference, you have only accomplished 50% of your task. Follow up with all the people whom you met. Send them emails with enough relevant information to jog their memory. Influential people meet a lot of people during a conference. So, don't just say hi, but go back to your notes and refer to a very specific conversation you had with them. Ask them if it would be okay to follow up with them. Typically no one says no, but you should ask. This gives you the right to send them a second email. Do NOT call them even if you have their phone number. That's what sales people do. Some people scan the business cards they receive using cloud-based services such as Cloud Contacts. If it works for you, do it. I don't.

Read the analysis and coverage of the conference by thought leaders and bloggers, i.e. do read what I write :-). Compare and contrast your views. Comment on their blogs and tweets and continue interacting with them. Even better, create a Storify of what you liked the most. Use Delicious to tag all the research material that you went through. Share your Delicious tags on Twitter and let people add to them.

Wednesday, October 31, 2012

Building And Expanding Enterprise Software Business In Brazil

While in Brazil, describing his country, one of my friends said, "We have all the natural resources that we need to be a self-sufficient country and we have had no natural disasters such as earthquakes and hurricanes. The only disaster that we had: we lost the World Cup."

This pretty much summarizes Brazil.

A helipad in front of my hotel in São Paulo.

On one side, São Paulo, the seventh largest city in the world, has one of the highest numbers of helipads per capita in the world; the rich prefer not to drive around in traffic and, when they do drive, use cheap cars to avoid getting kidnapped at stop lights. On the other side, it is one hell of a city, just like Mumbai: large, organized chaos, and money. It is growing and it's growing fast. While income inequality has been on a steep rise in emerging economies as well as in the western world, it is declining in the Latin American countries, especially in Brazil.

If you're thinking of building or expanding enterprise software business in Brazil, now is the time. This is why:

Developing to a developed economy

Brazil has been boxed into the BRIC economies, but in reality it behaves more like a developed economy with the lingering effects of a developing economy. Even though corruption is rampant in Brazil, it exists at a much higher level and a common man typically doesn't suffer as miserably as he/she would suffer in other countries such as India. Being a resourceful country, it has all kinds of jobs. The bureaucracy will break and the infrastructure will also catch up very soon due to the soccer World Cup in 2014 followed by the Olympics in 2016. Don't apply your BRIC strategy to Brazil. Consider Brazil a developed nation and aggressively expand.

Courtesy: Economist

Stronger middle class

The middle class has money and is willing to spend. Brazilian tax laws are the most complex laws that I have ever seen. Even though the global retail brands are present in Brazil, they are outrageously expensive. Making a weekend trip to Miami to shop is quite common. Even after paying for a plane ticket and hotels it is cheaper to shop in the US. The retailers in Brazil are trying to better understand this behavior and the global brands are also looking at several different ways to market to this middle class. As an ISV this is a gold mine that you should not be ignoring.

Local to Global

Following the nation's growth many local companies in Brazil are aspiring to go global, establishing their business in developed economies. Local ISVs neither have scale nor features to support these efforts. These companies (typically mid to large) are looking at global ISVs for help, and yes, they are willing to spend.

Then you ask, if it is this obvious, why aren't global ISVs already doing this? They are. It's obvious, but it is not that easy. These are the challenges you would run into:

Complex localization

Many global ISVs have given up localizing their software for the Brazilian market. The tax laws are extremely complex and so are other processes. If you are truly interested in the Brazilian market you need to build from scratch in Brazil for Brazil. Hire local talent, empower them, and educate them on your global perspective. Linux and related open source software talent is plentiful in São Paulo. These developers are also excited about the cloud and are building some amazing stuff. I would also suggest either hiring or partnering with local domain experts as consultants, who can work with a product manager, to truly understand the nuts and bolts of local processes, laws, and regulations.

Rough sales cycles

Selling into large accounts is not easy. Work with partners for a joint go-to-market solution or have them lead or participate in your sales cycle. The sales cycle is not fair and square and the purchase decisions are not based just on the merits of your offering. Even if a customer likes a product, commercial discussions are a huge drag, from the sponsor, to the buyer, all the way up to purchasing. Be patient and take the help of local experts to navigate these roads.

My taxi driver watching a live soccer game while driving

Language and culture

Speaking Portuguese is pretty much a requirement to get anything done. But, if you speak Spanish, you can get around and also pick up a little bit of conversational Portuguese. An English-only approach won't work. Do not even attempt it. Also, Brazilians don't like to be called Latin Americans. They like to be called Brazilians; avoid any Latin American references. While you are there, learn a thing or two from an average Brazilian about fitness. Unlike Americans, the Brazilians are not into junk food. At a churrascaria, they eat salad followed by meat followed by some more meat. If you wonder why they are so fit, especially in Rio, this diet perhaps explains it. They do enjoy their lives and sip Cachaça at the beach, but they are damn serious about working out.

Tuesday, October 16, 2012

Analytics-first Enterprise Applications

This is the story of Tim Zimmer who has been working as a technician for one of the large appliance store chains. His job is to attend service calls for washers and dryers. He has seen a lot in his life; a lot has changed but a few things have stayed the same.

The 80's saw a rise of homegrown IT systems and the 90's was the decade of standardized backend automation, where a few large vendors as well as quite a few small vendors built and sold solutions to automate a whole bunch of backend processes. Tim experienced this firsthand. He started getting printed invoices that he could hand out to his customers. He also heard his buddies in finance talking about a week-long training class to learn "computers" and some tools to make journal entries. Tim's life didn't change much. He would still get a list of customers handed to him in the morning. He would go visit them. He would turn in a part-request form manually for the parts he didn't carry in his truck and life went on. Without knowing what a better way to work might be, Tim always knew there must be one. Automation did help the companies run their business faster and helped increase their revenue and margins, but the lives of their employees such as Tim didn't change much.

Mid to late 90's saw the rise of CRM and Self-Service HCM where vendors started referring to "resources" as "capital" without really changing the fundamental design of their products. Tim heard about some sales guys entering information into such systems after they had talked to their customers. They didn't quite like the system, but their supervisors and their supervisors' supervisors had asked them to do so. Tim thought somehow the company must benefit out of this but he didn't see his buddies' lives get any better. He did receive a rugged laptop to enter information about his tickets and resolutions. The tool still required him to enter a lot of data, screen by screen. He didn't really like the tool and the tool didn't make him any better or smarter, but he had no other choice but to use it.

Tim heard that the management gets weekly reports of all the service calls that he makes. He was told that the parts department uses this information to create a "part bucket" for each region. He thought, "It doesn't make any sense. By the time the management receives the part information, analyzes it, and gives me parts, I'm already on a few calls where I am running out of parts that I need." He also received an email from the "Center of Excellence" (he couldn't tell what it was, but guessed, "must be those IT guys") asking whether he would like to receive some reports. He inquired. The lead time for what he thought was a simple report, once he submitted a request, was 8-10 weeks and that "project" would require three levels of approval. He saw no value in it and decided not to pursue it. While watching a football game, over beer, his buddy in IT told him that the "management" had bought very expensive software to run these reports and was hiring a lot of people who would understand how to use it.

One day, he received a tablet. And he thought this must be yet another devious idea by his management to make him do more work that doesn't really help him or his customers. A fancy toy, he thought. For the first time in his life, the company positively surprised him. The tablet came with an app that did what he thought the tool should have done all along. As soon as he launched the app it showed him a graphical view of his service calls and the parts required for those calls based on the historic analysis of those appliances. It showed him which trucks have which parts and which of his team members are better off visiting which set of customers based on their skill-set and their demonstrated ability in having solved those problems in the past. Tim makes a couple of clicks to analyze that data, drills down into line-item detail in real time, and accepts recommendations with one click. He assigns the service calls to his team members and drives his truck to a customer that he assigned to himself. As soon as he is done he pulls out his tablet. He clicks a button to acknowledge the completion of a service call. He is presented with new analysis updated in real time with available parts in his truck as well as in his teammates' trucks. He clicks around, makes some decisions, cranks up the radio in his truck, and he is off to help the next customer. No more filling out any long meaningless screens. His view of his management has changed for good for the very first time.

As the world is moving towards building mobile-first or mobile-only applications, I am proposing to build analytics-first enterprise applications that are mobile-only. Finally, we have access to sophisticated big data products, frameworks, and solutions that can help analyze large volumes of data in real time. Large-scale hardware (commodity, specialized, or virtualized) is accessible to developers to do some amazing things. We are at an inflection point. There is no need to discriminate between transactional and analytic workloads. Navigating from aggregated results to line-item details should be just one click instead of punching out into a separate system. Many processes, if re-imagined without any pre-conceived bias, would start with an analysis at the very first click and guide the user to more fine-grained data-entry or decision-making screens. If mobile-first is the mindset to get right the 20% of the scenarios of your application that are used 80% of the time, analytics-first is a design that should strive to rework the 20% of the decision-making workflows, used 80% of the time, that currently throw end users into a maze of data entry and beautiful but completely isolated, outdated, and useless reports.

Let's rethink enterprise applications. Today's analytics is an end result of years of neglect to better understand human needs to analyze and decide as opposed to decide and analyze. Analytics should not be a category by itself disconnected from the workflows and processes that the applications have automated for years to make businesses better. Analytics should be an integral part of an application, not embedded, not contextual, but a lead-in.

Friday, September 28, 2012

A Lean Greentech Approach

I am a greentech enthusiast and I have been closely following the greentech VC investment landscape. VCs like Kleiner Perkins, who have had a large greentech portfolio including companies such as Bloom Energy, are scaling down their greentech investment. Their current investment is not likely to get any returns close to what a VC would expect. The fundamental challenge with such greentech (excluding software) investments is that they are open-ended and capital-intensive; you just don't know how much time it would take to build the technology/product, how much it would cost, and how much you would be able to sell it for. The market fluctuations make things even worse. This is true not only for start-ups but also for large companies; Applied Materials' grand plan to revolutionize the thin-film solar business ended up a bust.

There's a different way to approach this monumental challenge.

Just look at how open source has evolved. It started out as non-commercial academic projects where a few individuals challenged the way the existing systems behaved and created new systems. These open source projects found corporate sponsors who embraced them and helped them find a permanent home. This also resulted in a vibrant ecosystem around them to extend those projects. A few entrepreneurs looked at these open source projects and built companies to commercialize them with the help of VC funding. Time after time, this business model has worked. Technologists are great at building technology, companies are great at throwing money at people, entrepreneurs are great at extending and combining existing technology to create new products, and VCs are great at funding those companies to help entrepreneurs build businesses. What VCs are not good at is doling out very large sums of money to bet on technology that doesn't yet exist.

If we want to make it work, we need a three-way relationship. People in academia should work on capital-intensive greentech technology projects that are funded by corporations through traditional grants. These projects should become available in the public domain with an open-source-like license or even a commercial license. Entrepreneurs can license these technologies, open source or not, and raise venture money to build a profitable business. The companies that are constantly contributing their greentech initiatives to the public domain should continue to do so. Facebook's Open Compute project is gaining traction in its second year and Google continues to share its green data center design.

The important aspect is to differentiate technology from a product. The VCs are not that good at investing into (non-software) technology but are certainly good at investing into products. For many greentech companies, technology is a key piece such as a battery, a specific kind of a solar film, a fuel cell etc. Commercializing this technology is a completely different story. This requires setting up key partnerships such as eBay's new data center using Bloombox and Israeli government committing to a nationwide all-electric car infrastructure with Better Place.

Many large companies have set up their incubators or "labs" to find something fundamentally disruptive that could help their business. Lately, there have been very few success stories from these incubators or labs because the start-up world is way more efficient at doing what big companies want to do. These labs are also torn between technology and products. My suggestion to them would be to go back to what they were good at: hiring great scientists from academia and working with academia on next-generation technology to create a business model, either by using that technology in their own products or by licensing it to others who want to build a business. This shifts the investment from a few VCs to a relatively large number of corporations.

What we really need is a lean greentech approach.

Photo Courtesy: Kah Wai Lin

Wednesday, September 19, 2012

Role Of Analytics In Creating New Consumer Behaviors

I am in India visiting a large customer who has heavily invested in organized retail stores, a relatively new category for the Indian market. Their head of analytics shared some details of their last promotion with me. They ran an email promotion to send out coupons that were valid on one and only one day: 15th August, the Independence Day of India, which is a holiday in the country. They were really bold to take out a page-long ad in all large newspapers on 15th August highlighting this promotion.

Their sales, in all regions, soared on that day. They not only soared but broke all previous records. They registered the highest sales of that year, more than the Diwali sale. In American terms, they managed to sell more on 4th July than on Black Friday. This shocked me. I analyzed their efforts further to better understand this behavior.

Indians in India don't drink beer, barbecue, or watch fireworks on Independence Day. In fact they don't do anything. It's just another day except that you don't go to work and kids don't go to school. That was the key. Since they didn't have anything else to do they went to the store and shopped. They bought things they had been contemplating buying for some time. This is where coupons helped, and they also ended up buying things they didn't need. Yes, they are quickly learning from Americans.

What amazed me the most was that the company manufactured this behavior and that it was analytics-led. They studied all kinds of data, created a promotion, made sure that they could execute on it, and customers came. And, they are using this data to further refine their promotions and store inventory.

Big Data and analytics are not only useful to instrument existing customers' behavior but they could also help create new customer behaviors. This is especially powerful when the company is in high growth mode and has a bold vision to do whatever it takes to gain a top position in the market.

As I blog this, the Indian government just changed its policy to allow up to 51% foreign direct investment (FDI) in the multi-brand organized retail sector. India has miles to go before the organized retail sector shapes up; Indians still prefer to shop at mom and pop stores and not at a large organized Walmartish store. Due to the lack of a mature organized retail sector, the (Indian) companies don't have a pre-conceived bias on how to run a large brick and mortar store, and that's a good thing. They are not localizing a global brand. They are creating a new brand, and hence new consumer behavior, from the ground up. And, analytics is playing a bigger role than ever before.

Photo courtesy: McKay Savage

Friday, August 31, 2012

Designing The Next-generation Review And Recommendation System

It's unfortunate that despite the popularity of social networks and plenty of other services that leverage network effects, the review and recommendation systems that are supposed to help users make the right decisions haven't changed much.

Thumbs-up and thumbs-down, or likes and unlikes, signal two things: popularity and polarization. If a YouTube video has 400 thumbs-up and 500 thumbs-down it means that the video is popular as well as polarizing, but it doesn't tell me whether I will like it or not. The star review system also signals two things: on average how good something is and whether that average is significant or not. There are multiple problems with this approach. An item with 8 reviews, all 5 stars, could be really bad compared to an item that has 300 reviews averaging 3.5 stars. Star ratings alone, without associated descriptive reviews, don't make much sense if not enough people have reviewed the item. Relying on an average rating alone is also problematic since it lacks the polarization element. On top of that, reviews and likes can be gamed.
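To make the "8 reviews vs. 300 reviews" problem concrete, here is a minimal sketch of one common way to handle it: a damped (Bayesian) average that pulls a small sample toward a neutral prior, plus the spread of the ratings as a rough polarization signal. This is not what any particular site does; the function names, the prior of 3.0 stars, and the weight of 20 phantom reviews are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def bayesian_average(ratings, prior_mean=3.0, prior_weight=20):
    """Damped average: behaves as if prior_weight extra reviews at prior_mean
    had been cast, so a handful of ratings cannot dominate. Both defaults are
    illustrative assumptions, not values from any real system."""
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)

def polarization(ratings):
    """Population standard deviation of the ratings; 0 means full agreement."""
    return pstdev(ratings)

few_perfect = [5] * 8                 # 8 reviews, all 5 stars
many_mixed = [4] * 150 + [3] * 150    # 300 reviews averaging 3.5 stars

for name, ratings in [("8 reviews", few_perfect), ("300 reviews", many_mixed)]:
    print(name,
          "raw:", mean(ratings),
          "damped:", round(bayesian_average(ratings), 2),
          "spread:", round(polarization(ratings), 2))
```

With these assumed settings the 8-review item drops from a raw 5.0 to roughly 3.57, which puts it in the same range as the 300-review item at roughly 3.47. The damped score only addresses significance; it still says nothing about why people liked or disliked the item, which is where descriptive reviews come in.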

Pandora's as well as Netflix's recommendations are a good example of using collaborative filtering to fine tune recommendations based on user preferences. The system aggregates the overall likes and dislikes and combines that with your taste profile and a few killer algorithms to recommend what you might like. If designed well and if it has a large user population, it does work. But, the challenges with such a system are the missing descriptive reviews and the inability to perform any analysis on them. If I dislike a song on Pandora, it doesn't mean the song is bad in the absolute sense. It simply means it doesn't match my taste profile. This isn't entirely true if I dislike a blender. In this case, a descriptive context is more meaningful, such as "I don't like this blender because it doesn't crush spinach well." People who care to make smoothies and crush ice may not care about this issue. But, these consumers have to wade through a large number of reviews to determine the product fit.
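As a rough illustration of the collaborative filtering idea described above, the sketch below scores items I haven't rated by the votes of other users, weighted by how closely their likes and dislikes match mine. This is a toy user-based variant with made-up users and songs; it is not how Pandora or Netflix actually implement their systems.

```python
from math import sqrt

# Hypothetical taste profiles: +1 = liked, -1 = disliked, missing = not rated.
ratings = {
    "me":    {"song_a": 1, "song_b": -1, "song_c": 1},
    "alice": {"song_a": 1, "song_b": -1, "song_c": 1, "song_d": 1},
    "bob":   {"song_a": -1, "song_b": 1, "song_d": -1, "song_e": 1},
}

def similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Rank items the user hasn't rated by similarity-weighted votes of others."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = similarity(ratings[user], their_ratings)
        for item, vote in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * vote
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(recommend("me", ratings))
```

Here song_d comes out on top: alice, whose taste matches mine, liked it, and bob, whose taste is the opposite of mine, disliked it. Notice that nothing in this data says why anyone liked or disliked anything, which is exactly the gap that descriptive reviews fill.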

E-commerce sites' review systems use the same descriptive as well as non-descriptive review systems that are commonly used everywhere on the internet, without any significant modifications, even though the expected investment of a user is much higher on their sites. If I don't like a song, I can skip it. If I don't like a YouTube video, I can stop watching it, and now if I don't like a movie I can stop streaming it. This does not apply in the traditional world of e-commerce. I absolutely need to make sure that I buy something that I like. Returning an item is a far more involved process than stopping a movie. It's the exception, not the norm.

Word of mouth and passive buying

People shop in two ways: 1) they look for a specific product, research it, and buy it. 2) they come across a product while not looking for it, like it, and buy it.

The second way of shopping, passive buying, is as important as active buying. There are many companies with a business model built around this impulse or "serendipitous commerce", but they don't leverage collaborative filtering. I would happily read reviews of products written by my friends and people that I trust regardless of whether I'm looking for those products or not. Think of it as Disqus-style aggregated reviews by people that I trust in my social graph. This is like an online version of a cocktail party conversation where someone is raving about a new phone that he just bought. I'm not looking for a phone, but I might, in a few days. This could create new interest or expedite my decision process. This isn't done well in the online world.

Word of mouth is still by far the best system for following recommendations. I invariably watch movies that my brother recommends to me and one of my friends will read all the books that I recommend to her. I have a non-transactional relationship with my friends and family.

Contextualized long tail 

One of my favorite things, when I travel (leisure or business), is to try out at least one or two recommended Indian restaurants to see how Indian food compares from city to city and country to country (so far my vote for the best Indian food outside of India goes to London). While researching for a restaurant, I typically read all the reviews that I can find. Some reviewers are Indians and some are not. Also, for the reviews written by non-Indians, some are new to Indian food and some are not. In most cases people don't identify who they are and I end up guessing based on their username, description etc. These reviews, positive or negative, don't help me much to narrow down which restaurant I should try out.

I have always found the best food at the most unusual places. All sophisticated recommendation systems would fall short of helping me find such an unusual place. These places are not the hits. They are the long tail. Getting to this long tail isn't an easy process - a lot of asking around, digging for reviews, trying out a few awful places etc.

Privacy concerns and connected identities

As the debate between anonymity and identity continues, there has been little or no effort to get to the middle ground, a connected identity. As a marketer I don't care who Jane is in an absolute sense but I am interested in what she likes and dislikes based on her collective and aggregated behavior across the Internet and beyond. This is not an easy system to build and consumers won't sign up for this unless there's significant value for them. The popularity of social networks is an example where even if users are arguably upset about their privacy they still use them since the value that they receive far outweighs their concern. And remember that social networks follow power laws. As more and more people use a network it becomes more and more valuable to its users.

Why not design review and recommendation systems that are based on connected identities? Users don't want ads, the marketers do. If companies can focus on building good products, incentivize users to write reviews, and rely on great recommendation systems to connect the right users with the right products, they wouldn't need ads. The marketers are chasing the illusion of targeting the right users, but the inconvenient truth is that it's incredibly hard to find those users, and when marketers do find them, those users don't really want ads. What they really want is value for their money. That is the inherent conflict between the marketers and end users.

Using connected identities beyond reviews and recommendations

Connected identities are also useful beyond reviews and recommendation systems. Comcast support is one of those examples where using connected identities could greatly improve their customer support.

Comcast started using Twitter early on to respond to customers' support issues. It was a novel concept in the beginning and they really understood Twitter as an effective social media channel, but lately that model has turned out to be as bad as their phone customer support. When I tweet to @comcastcares someone gets back to me asking who I am and what issues I have. You follow me, I follow you, you DM me, I DM you my info, and after a few minutes, we are nowhere close to resolving the issue. What if Comcast allowed me to attach my Twitter account to my Comcast profile? I will OAuth that, for sure. When I tweet, they would know exactly who I am, what problem I am experiencing, and how they might be able to help me. This is an example of using a connected identity without compromising privacy. Comcast knows their customer's billing information; it's transactional information. But they attempt to use Twitter to communicate with you without connecting these two identities.

I don't want to "like" Comcast or "follow" Comcast to be a victim of their spam and indifference. Comcast is easy to pick on, but there are plenty of other examples where connected identities could be useful.

Users don't like to be sold at, but they do want to buy. Let's build the next-generation review and recommendation system to help them.

Monday, August 20, 2012

Applying Moneyball To Cricket

What if a cricket team gets two batsmen to replace Sachin Tendulkar and still collectively get 100 runs out of them or have two not-so-great bowlers to replace Shane Warne and still get the other side out? If an eventual goal is to score, say 300+ runs in an ODI match does it matter how the runs are scored? What if you could find four players scoring 50 runs each instead of counting on Sehwag and Tendulkar types to score a century and lose miserably when they don't?

This is not how people think when it comes to cricket. That's also not how people used to think when it came to baseball until Billy Beane applied radical thinking to baseball, sabermetrics, now popularly known as Moneyball. On Base Percentage (OBP) became one of the most important metrics since then.
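For readers who don't follow baseball, OBP is a simple ratio: times on base (hits, walks, hit-by-pitch) divided by plate appearances that count toward it (at-bats, walks, hit-by-pitch, sacrifice flies). The sketch below computes it next to the traditional batting average for two hypothetical hitters; the season lines are invented for illustration.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        return 0.0
    return (hits + walks + hit_by_pitch) / denominator

def batting_average(hits, at_bats):
    """Traditional batting average: hits per at-bat."""
    return hits / at_bats if at_bats else 0.0

# Hypothetical season lines, made up for this example.
patient = dict(hits=140, walks=90, hit_by_pitch=5, at_bats=500, sacrifice_flies=5)
free_swinger = dict(hits=160, walks=20, hit_by_pitch=2, at_bats=560, sacrifice_flies=3)

for name, line in [("patient", patient), ("free_swinger", free_swinger)]:
    print(name,
          round(batting_average(line["hits"], line["at_bats"]), 3),
          round(on_base_percentage(**line), 3))
```

The free swinger has the slightly better batting average, but the patient hitter reaches base far more often. That gap between the headline statistic and the one that correlates with scoring is the kind of thing a cricket equivalent would need to surface.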

As yet another provocative aspect of Moneyball suggests, the only thing that matters is whether a hitter puts a ball in play or not. Once the ball is in play the hitter does not control the outcome of that play. In cricket, when a fielder drops a catch, could it be because the ball came too quickly to him, he was at the wrong position, or he was just too lame to catch it? Is there a difference between a batsman getting caught near the boundary as opposed to getting bowled? Currently, none. But, based on Moneyball, if a batsman gets caught, at least that batsman put the "ball in play." A little more practice and precision and that could have been a four or a six.

I want the cricket team selectors and captains (an equivalent of baseball general managers) to apply some of the Moneyball concepts to cricket, a sport older and more popular than baseball. In cricket, even though it's a team that wins or loses, there's typically more emphasis on the ability of an individual as opposed to measuring individuals in the capacity of how they help the team.

Bowling and batting powerplays are a relatively new concept in cricket. Skippers on either side don't have access to deep analysis of the current situation and the performance of opposing players when deciding when to take a powerplay. They make such crucial decisions based on their gut feeling and the opinions of key players on the field. This is where data can do wonders. In baseball, managers keep tabs on an extensive set of data to make dynamic decisions, such as which bullpen pitcher has a better track record against the current hitter or how often a hitter gets walks as opposed to hits. The most recent example is the Tampa Bay Rays aggressively using field shifting against powerful lefties, a practice that most baseball franchises still don't use or approve of.

In cricket, right-handed bowlers switch from over the wicket to round the wicket mostly when whatever they are trying is not working. These decisions are not necessarily based on any historical data. In this case, it could be as simple as gathering and analyzing data about which batsmen perform poorly when bowled to round the wicket as opposed to over the wicket. In baseball, using a left-handed pitcher against a left-handed hitter and using a right-handed pitcher against a right-handed hitter have proven to work well in most cases (with some exceptions). That's why there are switch hitters in baseball to take this advantage away from a pitcher. Why are there no switch hitters in cricket?
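The over-versus-round-the-wicket analysis suggested above really can be that simple. Here is a minimal sketch that groups ball-by-ball records by batsman and bowling angle and compares strike rates and dismissals. The record layout, the player names, and every number are hypothetical; a real analysis would need thousands of deliveries per batsman before the splits mean anything.

```python
from collections import defaultdict

# Hypothetical ball-by-ball records:
# (batsman, bowling_angle, runs_scored, was_dismissed)
deliveries = [
    ("Batsman A", "over", 4, False),
    ("Batsman A", "over", 1, False),
    ("Batsman A", "round", 0, True),
    ("Batsman A", "round", 1, False),
    ("Batsman B", "over", 0, True),
    ("Batsman B", "round", 6, False),
    ("Batsman B", "round", 2, False),
]

def splits(deliveries):
    """Total balls, runs, and dismissals for each (batsman, angle) pair."""
    totals = defaultdict(lambda: {"balls": 0, "runs": 0, "dismissals": 0})
    for batsman, angle, runs, dismissed in deliveries:
        entry = totals[(batsman, angle)]
        entry["balls"] += 1
        entry["runs"] += runs
        entry["dismissals"] += int(dismissed)
    return dict(totals)

def strike_rate(split):
    """Runs scored per 100 balls faced."""
    return 100.0 * split["runs"] / split["balls"]

for (batsman, angle), split in sorted(splits(deliveries).items()):
    print(batsman, angle, strike_rate(split), split["dismissals"])
```

In this toy data Batsman A struggles against the round-the-wicket angle while Batsman B thrives on it, which is the kind of split a skipper could consult before asking a bowler to change sides.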

Why can't there be a dedicated bowler to finish the last over of the cricket match just like a closer in baseball? Imagine a precision bowler, a batsman who is trained as a "closer", whose job is to bowl six deliveries accurately at a spot, fast or slow. The regular bowlers are trained to bowl up to 10 overs, 6-8 at once, with a variety of deliveries (pitches) and a mission to stop batsmen from scoring runs and to get them out. A closer would only have one goal: stop batsmen from scoring. Historically, there have been very few good all-rounders in cricket. It's incredibly difficult to be a great batsman as well as a great bowler, but there's a middle ground - to be a great batsman and a closer. Some batsmen such as Sachin Tendulkar have been good at bowling off and on when the regular bowlers get in trouble (an equivalent of a reliever in baseball), but invariably their task becomes getting a wicket to break the partnership. Even if wickets are important, in most cases it's the ability to stop the opposing team from scoring in the last couple of overs that brings the team a victory.

There is just one baseball, but there's no one cricket. The game of cricket differs so much from a test match to one day international (ODI) to Twenty20. But, a fresh look at data and analysis on what really matters and courage to implement those changes could do wonders.

Tuesday, July 31, 2012

Data Scientists Should Be Design Thinkers

World Airline Routes

Every company is looking for that cool data scientist who will come equipped with all the knowledge of data, domain expertise, and algorithms to turn around their business. The inconvenient truth is there are no such data scientists. Mike Loukides discusses the overfocus on tech skills and cites DJ Patil:

But as DJ Patil said in “Building Data Science Teams,” the best data scientists are not statisticians; they come from a wide range of scientific disciplines, including (but not limited to) physics, biology, medicine, and meteorology. Data science teams are full of physicists. The chief scientist of Kaggle, Jeremy Howard, has a degree in philosophy. The key job requirement in data science (as it is in many technical fields) isn’t demonstrated expertise in some narrow set of tools, but curiousity, flexibility, and willingness to learn. And the key obligation of the employer is to give its new hires the tools they need to succeed.

I do agree there's a skill gap, but it is that of "data science" and not of "data scientists." What concerns me more about this skill gap is not the gap itself but the misunderstanding around how to fill it.

There will always be a skill gap when we encounter a new domain or rapidly changing technology that has a promise to help people do something radically different. You can't just create data scientists out of thin air, but if you look at the problem a little differently (perhaps by educating people on what data scientists are actually required to do and having them follow the data science behind it), the solution may not be as far-fetched as it appears to be.

The data scientists that I am proposing, the ones who would practice "data science", should be design thinkers, the ones who practice design thinking. This is why:

Multidisciplinary approach

Design thinking encourages people to work in a multidisciplinary team where each individual team member champions his or her domain to ensure a holistic approach to a solution. The philosophy behind this approach can be summarized as finding solutions that are economically viable, technologically feasible, and desirable to end users. Without effective participation from a broader set of disciplines, data scientists are not likely to be that effective at solving the problems they are hired and expected to solve.

Outside-in thinking and encouraging wild ideas

As I have argued before, the data external to a company is far more valuable than the data it has internally, since Big Data is an amalgamation of a few trends - data growth of an order of magnitude or two, external data being more valuable than internal data, and a shift in computing business models. Big Data is about redefining (yet another design thinking element, referred to as "reframing the problem") what data actually means to you, and its power resides in combining and correlating these two data sets.

In my experience working with customers, this is the biggest challenge. You can't solve a problem with a constrained, inside-out mindset. This is where we need to encourage wild ideas and help people stretch their imagination without worrying about the underlying technical constraints that have created data silos, invariably resulting in organizational silos. A multidisciplinary team, by virtue of having people from different domains, is well-suited for this purpose.

What do you do once you have plenty of ideas and a vision of where you want to go? That brings me to this last point.

Rapid prototyping

Rapid prototyping is at the heart of design thinking. One of the common beliefs I often challenge is the overemphasis on perfecting an algorithm. Data is more important than algorithms; the core focus should be on getting to an algorithm that works, not on fixating on finding the perfect algorithm. Using the power of technology and a design thinking mindset, iterating rapidly on multiple data sets, you are much more likely to discover insights based on a good-enough algorithm. This does sound counterintuitive to people who are trained in designing, perfecting, and practicing complex algorithms, but the underlying technology and tools have shifted the dynamics.

Wednesday, July 18, 2012

Learn To Fail And Fail To Learn

"I have never let my schooling interfere with my education" - Mark Twain

On a breezy Bay Area evening, during a casual conversation at a little league baseball game, the dad of an eight-year-old, who also happens to be an elementary school teacher, told me that teaching cursive writing to kids isn't a particularly bright idea. He said, "it's a dying skill." The only thing he cares about is teaching kids to write legibly. He even wonders whether kids would learn typing the same way some of us learned or whether they would learn tap-typing due to the growing popularity of tablets. He is right.

When kids still have to go to a "lab" to work on a "computer" while "buffering" is amongst the first ten words of a two-year-old's vocabulary, I conclude that the schools haven't managed to keep pace with today's reality.

I am a passionate educator. I teach graduate classes and I have worked very hard to ensure that my classes — the content as well as the delivery methods — are designed to prepare students for today's and tomorrow's world. At times, I feel ashamed we haven't managed to change our K-12 system, especially the elementary schools, to prepare kids for the world they would work in.

This is what I want the kids to learn in a school:  

Learn to look for signal in noise:

Today's digital world is full of noise with very little signal. It's almost an art to comb through this vast ocean of real-time information to make sense out of it. Despite the current generation being digital natives, kids are not trained to effectively look for signal in noise. While conceited pundits still debate whether multitasking is a good idea or not, in reality the only way to deal with an eternal digital workflow and the associated interactions is to multitask. I want the schools to teach kids to differentiate between the tasks that can be accomplished by multitasking and the ones that require their full attention. Telling them not to multitask is no longer an option.

I spend a good chunk of time reading books, blogs, magazines, papers, and a lot of other stuff. I personally taught myself when to scan and when to read. I also taught myself to read fast. The schools put a lot of emphasis on developing reading skills early on, but the schools don't teach the kids how to read fast. The schools also don't teach the kids how to scan - look for signal in noise. The reading skills developed by kids early on are solely based on print books. Most kids will stop reading print books as soon as they graduate, or even before that. Their reading skills won't necessarily translate well to the digital medium. I want schools to teach the kids when to scan and how to read fast, and most importantly to differentiate between these two based on the context and the content.

Learn to speak multiple languages:

I grew up learning to read, write, and speak three languages fluently. I cannot overemphasize how much it has helped me overall. One of the drawbacks of the US education system is that the emphasis on a second or a third language starts very late. I also can't believe it's optional to learn a second language. In this highly globalized economy, why would you settle for just one language? Can you imagine if a very large number of Americans were to speak Mandarin, Portuguese, Russian, or Hindi? Imagine the impact this country would have.

Recent research has shown that bilinguals have a heightened ability to monitor their environment and to switch contexts. A recent study also found that bilinguals are more resistant to dementia and other symptoms of Alzheimer's disease.

Learn to fail and fail to learn:

"For our children, everything they will 'know' is wrong – in the sense it won’t be the primary determinant of their success. Everything they can learn anew will matter – forever in their multiple and productive careers." - Rohit Sharma

As my friend Rohit says, you actually want to teach kids how to learn. The ability to learn is far more important than what you know because what you know is going to become irrelevant very soon. Our schools are not designed to deal with this. On top of that there is too much emphasis on incentivizing kids at every stage to become perfect. The teachers are not trained to provide constructive feedback to help kids fail fast, iterate, and get better.

An education system that measures students based on what and how much they know, as opposed to how quickly they can learn what they don't know, is counterproductive to its own purpose.

Learn to embrace unschooling:

Peter Thiel's 20 under 20 fellowship program has received a good deal of criticism from people who suggest that dropping out of college to pursue entrepreneurship is not a good idea. I really liked the response from one of the fellows of this program, Dale Stephens, where he discusses unschooling. He is also the founder of UnCollege. Unschooling is not about not going to school; it's about not accepting school as your only option. If you have looked at education startups lately, especially my favorite ones (Khan Academy, Coursera, and Codecademy), you would realize the impact of technology and social networks on radically changing the way people learn. Our schools are designed neither to comprehend this idea nor to embrace it. This is what disruption looks like when students find different ways to compensate for things that they can't get from a school. This trend will not only continue but is likely to accelerate. This is a leading indicator suggesting that we need a change. Education is what has made this country great and it is one of the main reasons why skilled immigrants are attracted to the US. Let's not take it for granted, and let's definitely not lose that advantage.

Originally, I had written this as a guest post for Vijay Vijayasankar's blog

Photo courtesy: BarbaraLN

Monday, June 25, 2012

With Yammer, Microsoft Begins Its Journey From Collaborative To Social

Confirming what we already knew, today Microsoft announced they are acquiring Yammer for $1.2 billion in cold cash. Here's a blog post by David Sacks, the CEO of Yammer.

Microsoft doesn't report a revenue breakdown for their individual products but SharePoint is believed to be one of the fastest growing products with annual revenue of more than $1 billion. Regardless of how Microsoft markets and positions SharePoint, it has always been collaboration software and not really social software. Microsoft does seem to understand the challenges it faces in moving their portfolio of products to the cloud, including SharePoint. Microsoft also understands the value of having end users on their side even though SharePoint is sold as enterprise software. Microsoft's challenges in transitioning to the cloud are similar to the ones faced by other on-premise enterprise software vendors.

But, I really admire Microsoft's commitment by not giving up on any of these things. Skype's acquisition was about reaching those millions of end users and they continue to do that with their acquisition of Yammer. Going from collaborative to social requires being able to play at the grassroots level in an organization as opposed to a top down push and more importantly being able to create and leverage network effects. It's incredibly difficult to lead in with an on-premise solution retrofitted for cloud to create network effects. Native cloud solutions do have this advantage. Yammer will do this really well while helping Microsoft to strengthen SharePoint as a product and maintain its revenue without compromising margins. If Microsoft executes this well, they might unlock a solution for their Innovator's Dilemma.

With Yammer, Microsoft does have an opportunity to fill in the missing half of social enterprise by transforming productivity silos into collaborative content curation. As a social enterprise software enthusiast, I would love to see it happen, sooner rather than later.

At a personal level, I am excited to see the push for social in enterprise software and a strong will and desire to cater to the end users and not just the decision makers. I hope that more entrepreneurs recognize that enterprise software could be social, cool, and lucrative. This also strengthens the market position of vendors such as Box and Asana.

It's impressive what an incumbent can do when they decide to execute on their strategy. Microsoft is fighting multiple battles. They do have the right cards. It's to be seen how they play the game.

Friday, June 15, 2012

Proxies Are As Useful As Real Data

Last year I ran a highly unscientific experiment. Every late Monday afternoon I would put a DVD in an open mail bin in my office to mail it back to Netflix. I would also count the total number of Netflix DVDs put inside that bin by other people. Over a period of time I observed a continuous and consistent decline in the number of DVDs. I compared my results with the numbers released by Netflix. They matched. I'm not surprised. Even though this was an unscientific experiment with a very small sample and many uncontrolled variables, it still gave me insights into the overall real data that I otherwise had no access to.

Proxies are as useful as real data.

When Uber decides to launch a service in a new city or when they are assessing demand in an existing city they use crime data as a surrogate to measure neighborhood activity. This measurement is a basic input in calculating the demand. There are many scenarios and applications where access to a real dataset is either prohibitively expensive or impossible. But, a proxy is almost always available and it is good enough in many cases to make certain decisions that eventually can be validated by real data. This approach, even though simple, is ignored by many product managers and designers. Big Data does not necessarily solve the problem of access to a certain data set that you may need to design your product or make decisions, but it certainly opens up an opportunity that didn't exist before: the ability to analyze proxy data and use algorithms to correlate it with your own domain.

As I have argued before, the data external to an organization is probably far more valuable than the data that it has internally. Until now organizations barely had the capability to analyze a subset of all their internal data. They could not even think of doing anything interesting with external data. This is rapidly going to change as more and more organizations dip their toes in Big Data. Don't discriminate against any data source, internal or external.

Probably the most popular proxy is per-capita GDP as a measure of the standard of living. The Hemline Index is yet another example, where it is believed that women's skirts become shorter (higher hemline) during good economic times and longer during not-so-good economic times.

Source: xkcd

Proxies are just the beginning of how you could correlate several data sources. But, be careful. As wise statisticians will tell you, correlation doesn't imply causation. One of my personal favorite examples is the correlation between the Yankees winning the World Series and a Democratic president in the Oval Office. Correlation doesn't guarantee causation, but it gives you insights into where to begin, what question to ask next, and which dataset might hold a key to that answer. This iterative approach simply wasn't feasible before. By the time people got an answer to their first question, it was too late to ask the second question. The ability to go after any dataset anytime you want opens up a lot more opportunities. At the same time, when Big Data tools, computing, and access to several external public data sources become a commodity, it will come down to human intelligence prioritizing the right questions to ask. As Peter Skomoroch, a principal data scientist at LinkedIn, puts it, "'Algorithmic Intuition' is going to be as important a skill as 'Product Sense' in the next decade."
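Checking whether a proxy tracks the real thing can start with something as small as a correlation coefficient. The sketch below computes Pearson's r between a proxy series and a reference series. Both series are invented placeholders (they are not my DVD counts or any figures Netflix published), and a high r only says the two move together, not that one causes the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)

# Hypothetical placeholder data: a cheap proxy you can observe yourself and
# the "real" metric you only get to see occasionally.
proxy_series = [12, 11, 9, 8, 6, 5]
real_series = [14.0, 13.5, 12.0, 11.2, 10.1, 9.2]

print(round(pearson(proxy_series, real_series), 3))
```

If the coefficient stays high whenever real numbers become available, the proxy has earned some trust for the periods in between. If it drifts, that is the cue to ask the next question.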

Thursday, May 31, 2012

I Want USPS To Think Outside The Box

Recently I had to go to a consulate to get a visa and the consulate would only accept a USPS money order and a USPS pre-paid envelope. I went to a post office to get those. That particular post office had decided to change its business hours that day to open late. I hurriedly drove to a different post office where two out of three clerks didn't know how to issue a pre-paid envelope! At a personal level I never look forward to going to a post office. It invariably delays my schedule. I am met with unpleasant customer service and inefficiency everywhere. This is also true of some of the other services that I use but there's one major difference. I cannot opt out of USPS.

USPS expects to lose about $7 billion during the fiscal year that ends in September. They even have their own conference called PostalVision 2020 where they have invited technology thought leaders such as Vint Cerf and many others to honestly and seriously look at the issues they have. The agenda, in their words:

"Postal Vision 2020/2.0 is as much a movement as it is a Conference.  It is a forum for an open and honest dialog to better understand the future of postal communications and shipping, and what this means to those who regulate, supply and use mail.  It’s about sharing ideas and knowledge with the hope of sparking innovation and the creation of new successful business models.  It’s about asking each other lots of difficult questions for which there may be many answers to consider before finding those that serve the long term health of the industry and any particular enterprise."

USPS is broken at so many levels; they have short term as well as long term issues to deal with and it is likely to get uglier before it gets better. Channeling Geoffrey Moore, USPS needs to retain their core and redefine the context. A massive fleet of trucks, logistics, and outlets in all foreseeable locations is their core strength. Postal mail and other related services are their context, where they are simply unable to compete because of a shrinking addressable market (due to digital communication) and poor service design that applies a legacy mindset to solve today's and tomorrow's problems.

USPS should think outside the box. No pun intended.  

Here are some ideas/suggestions:

Deliver groceries: Remember Webvan? I loved their service during the dot com boom. One of the main reasons they went out of business is that they had no expertise in logistics. Since then nothing much has changed in the home-delivered grocery business. What if USPS delivered groceries to your home? What if they partnered with a local supermarket and took over their logistics business? This is a complementary business model. The supermarkets are not in the delivery business and it's not economical for them to enter the logistics business. This is also a sustainable business that helps the environment. The USPS trucks are on the road no matter what, but now they can take a few cars off the road. This may sound crazy but times are changing and it's time for USPS to rethink what unfair advantage they have over others.

Re-think mail delivery: It's perfectly acceptable to me if I only receive my mail every other day. In many cases, I am fine if I don't get my mail for a week at a time. There's nothing time-sensitive about my mail. And with changing demographics, this is true of a lot of other people as well. Incentivize customers to skip mail by offering them a discount on other services and have fewer trucks and fewer people going around the neighborhoods. This brings the overall cost down and opens up new revenue opportunities.

Double down on self-service: I know USPS is trying hard to add more and more self-service kiosks but they're not enough. Think like Coinstar and Redbox. I should be able to do everything related to USPS at the places where I can get milk at the 11th hour, money from an ATM, and gas for my car. They really need to work hard to give people a reason to use USPS when people have much better alternatives to mail packages. Think of UPS, DHL, and FedEx as incumbents and leapfrog them, using the unfair advantage that USPS has, in places where they can't possibly compete.

Rethink the identity: USPS doesn't directly receive federal tax dollars and it is expected to meet expenses from the revenue it generates. But, it's not that black and white. Even though USPS doesn't get any tax money it receives plenty of other money via grants and other special funds. It's neither truly a government entity nor truly a business entity. If USPS is to be fixed it needs to rethink its identity and decide whether it's entirely a public-sector entity or a mix of private and public sector, and how. Once that identity is set they can follow through on their revenue sources, cost measures, and building an ecosystem of partners. A mixed and complicated business structure introduces complexity at all levels and prevents the organization from thinking and executing in a unified way.

Monday, May 21, 2012

Data Is More Important Than Algorithms

Netflix Similarity Map

In 2006 Netflix offered to pay a million dollars, popularly known as the Netflix Prize, to whoever could help Netflix improve their recommendation system by at least 10%. A year later the KorBell team won the Progress Prize by improving Netflix's recommendation system by 8.43%. They also gave Netflix the source code for their 107 algorithms, the product of 2,000 hours of work. Netflix looked at these algorithms and decided to implement the two main ones to improve their recommendation system. Netflix did face some challenges but they managed to deploy these algorithms into their production system.
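To make "improve by 10%" concrete: the Netflix Prize scored entries by root-mean-square error (RMSE) on held-out ratings, so the percentage is the relative drop in RMSE against a baseline. Here is a minimal Python sketch of that arithmetic. The ratings and predictions are made-up numbers for illustration, not contest data, and the function names are my own.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement_pct(baseline_rmse, new_rmse):
    """Relative RMSE reduction, as a percentage of the baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical star ratings (1-5) and two sets of predictions.
actual = [4, 3, 5, 2, 4]
baseline_pred = [3.0, 4.0, 4.0, 3.0, 3.0]    # every guess is off by 1 star
candidate_pred = [3.5, 3.5, 4.5, 2.5, 3.5]   # every guess is off by 0.5 star

base = rmse(baseline_pred, actual)
cand = rmse(candidate_pred, actual)
print(f"baseline RMSE {base:.3f}, candidate RMSE {cand:.3f}")
print(f"improvement: {improvement_pct(base, cand):.1f}%")
```

Halving every error, as in this toy data, is a 50% improvement; the contest asked for 10%, which gives a sense of how hard each extra point was.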

Two years later Netflix awarded the grand prize of $1 million for work that involved hundreds of predictive models and algorithms. They evaluated these new methods and decided not to implement them. This is what they had to say:
"We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment. Also, our focus on improving Netflix personalization had shifted to the next level by then."
This appears strange on the surface, but when you examine the details it makes total sense.

The cost of implementing algorithms to achieve incremental improvement simply isn't justifiable. While the researchers worked hard on innovating the algorithms, Netflix's business as well as their customers' behavior changed. Netflix saw more and more devices being used by their users to stream movies as opposed to getting a DVD in the mail. The main intent behind the million dollar prize was to perfect the recommendation system for the DVD subscription plan, since those subscribers carefully picked the DVDs recommended to them as it would take some time to receive those titles in the mail. Customers wanted to make sure that they didn't end up with lousy movies. Netflix didn't get any feedback regarding those titles until after their customers had viewed them and decided to share their ratings.

This customer behavior changed drastically when customers started following recommendations in real time for their streaming subscription. They could instantaneously try out the recommended movies and if they didn't like them they tried something else. The barrier to getting to the next movie that the customers might like went down significantly. Netflix also started to receive feedback in real time while customers watched the movies. This was a big shift in user behavior, and hence in the recommendation system, as customers moved from DVD to streaming.

What does this mean for companies venturing into Big Data?

Algorithms are certainly important but they only provide incremental value on your existing business model. They are very difficult to innovate and way more expensive to implement. Netflix had a million dollar prize to attract the best talent; your organization probably doesn't. Your organization is also less likely to open up your private data into the public domain to discover new algorithms. I do encourage you to be absolutely data-driven and do everything that you can to have data as your corporate strategy, including hiring a data scientist. But, most importantly, you should focus on your changing business (disruption and rapidly changing customer behavior) and on data, not on algorithms. One of the promises of Big Data is to leave no data source behind. Your data is your business and your business is your data. Don't lose sight of it. Invest in technology and more importantly in people who have the skills to stay on top of changing business models and unearth insights from data to strengthen and grow the business. Algorithms are cool but the data is much cooler.

Monday, April 30, 2012

Fixing Software Patents, One Hack At A Time

Software patents are broken and patent trolls are seriously hurting innovation. Companies are spending more money on buying patents to launch offensive strikes against other companies instead of competing by building great products. I could outline numerous horror stories where patents are being used for every purpose except innovation. In fact the software patent system as it stands today has nothing to do with innovation at all. This is the sad side of Silicon Valley. While most people are whining about how software patent trolls are killing innovation, some are trying to find creative ways to fix the problems. This is why it was refreshing to see Twitter announce their policy on patents, the Innovator's Patent Agreement, informally called IPA. As per IPA, patents can only be used in offensive litigation if the employees who were granted the patents consent to it. I have no legal expertise to comment on how well IPA itself might hold up in a patent litigation but I am thrilled to see companies like Twitter stepping up to challenge the status quo by doing something different about it. If you're an employee you want three things: innovate, get credit for your innovation, and avoid your patents being used as an offensive tool. IPA is also likely to serve as a hiring magnet for great talent. Many other companies are likely to follow suit. I also know of a couple of VCs that are aggressively pushing their portfolio companies to adopt IPA.

The other major challenge with software patents is the bogus patents granted based on obvious ideas. I really like the approach taken by Article One Partners to deal with such patent trolls. Article One Partners crowdsources the task of digging up prior art to identify bogus patents and subsequently forces the US patent office to invalidate them. It turns out that you don't have to be a lawyer to find prior art. Many amateurs who love to research this kind of stuff have jumped into this initiative and have managed to find prior art for many bogus patents. It's very hard to change the system but it's not too hard to find creative ways to fix parts of the system.

I would suggest going beyond the idea of crowdsourcing the task of finding prior art. We should build open tools to gather and catalog searchable prior art. If you have an idea, just enter it into that database and it becomes prior art. This would make it incredibly difficult for any company to patent an obvious idea since it would already be prior art. We should create prior art instead of reactively researching it. Open source has taught us many things and it's such a vibrant community. I can't imagine the state of our industry without open source. Why can't we do the same for patents? I want to see a Creative Commons of patents.
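To show how small the core of such a tool could be, here is a rough Python sketch of a prior art catalog: every idea is stored with a publication timestamp, and a search returns the entries that predate a given date. Everything in it (the PriorArtRegistry class, its methods, the sample entries) is hypothetical and only meant to illustrate the idea; a real system would need persistent storage, trusted timestamps, and far better search, and whether an entry counts as prior art is a legal question this code cannot answer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Disclosure:
    """One publicly disclosed idea with the time it was published."""
    title: str
    description: str
    published_at: datetime

class PriorArtRegistry:
    """Hypothetical in-memory catalog of timestamped idea disclosures."""

    def __init__(self):
        self._entries = []

    def publish(self, title, description, when=None):
        # Default to the current UTC time if no timestamp is supplied.
        entry = Disclosure(title, description, when or datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def search(self, query, before=None):
        """Entries containing every query word, optionally published before a date."""
        words = query.lower().split()
        hits = []
        for entry in self._entries:
            text = f"{entry.title} {entry.description}".lower()
            matches = all(word in text for word in words)
            early_enough = before is None or entry.published_at < before
            if matches and early_enough:
                hits.append(entry)
        return sorted(hits, key=lambda e: e.published_at)

# Made-up entries for illustration.
registry = PriorArtRegistry()
registry.publish("One-tap reorder", "Reorder a previous purchase with a single tap",
                 datetime(2011, 6, 1, tzinfo=timezone.utc))
registry.publish("Tap to pay", "Pay at a kiosk with a single tap of a phone",
                 datetime(2012, 9, 1, tzinfo=timezone.utc))

filing_date = datetime(2012, 4, 30, tzinfo=timezone.utc)
for hit in registry.search("single tap", before=filing_date):
    print(hit.published_at.date(), hit.title)
```

The date filter is the important design choice: an examiner or an amateur researcher cares only about what was published before a patent's filing date, so the catalog has to answer that question directly.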

The industry should also create tools to reverse translate patents by taking all the legal language out of them, to bring transparency and show what patents are actually being granted for.

I would also want to see an open-source-like movement where a ridiculously large set of patents belongs to one group, a GitHub of patents. And that group will go after anyone who attempts to impede innovation by launching an offensive strike. If you can't beat a troll then become one.

Silicon Valley is a hacker community and hackers should do what they are good at: hack the system, to fix it, using creative ways.


Wednesday, April 18, 2012

4 Big Data Myths - Part II

This is the second and last part of this two-post series on Big Data myths. If you haven't read the first part, check it out here.

Myth # 2: Big Data is old wine in a new bottle

I hear people say, "Oh, that Big Data, we used to call it BI." One of the main challenges with legacy BI has been that you pretty much have to know what you're looking for based on a limited set of data sources that are available to you. The so-called "intelligence" is people going around gathering, cleaning, staging, and analyzing data to create pre-canned "reports and dashboards" to answer a few very specific narrow questions. By the time the question is answered its value has been diluted. These restrictions stemmed from the fact that computational power was still scarce and the industry lacked sophisticated frameworks and algorithms to actually make sense out of data. Traditional BI introduced redundancies at many levels such as staging, cubes etc. This in turn reduced the actual data size available to analyze. On top of that there were no self-service tools to do anything meaningful with this data. IT has always been a gatekeeper and they were always resource-constrained. A lot of you can relate to this. If you asked IT to analyze traditional clickstream data you became a laughing stock.

What is different about Big Data is not only that there's no real need to throw away any kind of data, but that the "enterprise data", which always got VIP treatment in the old BI world while everyone else waited, has lost that elite status. In the world of Big Data, you don't know which data is valuable and which data is not until you actually look at it and do something about it. Every few years the industry reaches some sort of an inflection point. In this case, the inflection point is the combination of cheap computing (cloud as well as on-premise appliances) and the emergence of several open, data-centric software frameworks that can leverage this cheap computing.

Traditional BI is a symptom of all the hardware restrictions and of a legacy architecture unable to use relatively newer data frameworks such as Hadoop and plenty of others in the current landscape. Unfortunately, retrofitting an existing technology stack may not be that easy if an organization truly wants to reap the benefits of Big Data. In many cases, buying some disruptive technology is nothing more than a line item on many CIOs' wish lists. I would urge them to think differently. This is not BI 2.0. This is not BI at all as you have known it.

Myth # 1: Data scientist is a glorified data analyst

The role of a data scientist has grown rapidly in popularity. Recently, DJ Patil, a data scientist in residence at Greylock, was featured on Generation Flux by Fast Company. He is the kind of guy you want on your team. I know of quite a few companies that are unable to hire good data scientists despite their willingness to offer above-market compensation. This is also a controversial role where people argue that a data scientist is just a glorified data analyst. This is not true. The data scientist is the human side of Big Data and it's real.

If you closely examine the skill set of people in the traditional BI ecosystem you'll recognize that they fall into two main categories: database experts and reporting experts. Either people specialize in complicated ETL processes, database schemas, vendor-specific data warehousing tools, SQL etc. or people specialize in reporting tools, working with the "business" and delivering dashboards, reports etc. This is a broad generalization, but you get the point. There are two challenges with this set-up: a) the people are hired based on vendor-specific skills such as database, reporting tools etc. b) they have a shallow mandate of getting things done with the restrictions that typically lead to silos and lack of a bigger picture.

The role of a data scientist is not to replace any existing BI people but to complement them. You could expect the data scientists to have the following skills:

  • Deep understanding of data and data sources to explore and discover the patterns at which data is being generated. 
  • Theoretical as well as practical (tool) level understanding of advanced statistical algorithms and machine learning.
  • Strategically connected with the business at all levels to understand broader as well as deeper business challenges and able to translate them into designing experiments with data.
  • Design and instrument the environment and applications to generate and gather new data and establish an enterprise-wide data strategy since one of the promises of Big Data is to leave no data behind and not to have any silos.

I have seen some enterprises that have a few people with some of these skills but they are scattered around the company and typically lack high level visibility and an executive buy-in.

Whether data scientists should be domain experts or not is still being debated. I would strongly argue that the primary skill to look for while hiring a data scientist should be how they deal with data, with great curiosity and a lot of whys, and not what kind of data they are dealing with. In my opinion if you ask a domain expert to be a data expert, preconceived biases and assumptions (the curse of knowledge) would hinder the discovery. Being naive and curious about a specific domain actually works better since they have no preconceived biases and they are open to looking for insights in unusual places. Also, when they look at data in different domains it actually helps them to connect the dots and apply the insights gained in one domain to solve problems in a different domain.

No company would ever confess that their decisions are not based on hard facts derived from extensive data analysis and discovery. But, as I have often seen, most companies don't even know that many of their decisions could prove to be completely wrong had they had access to the right data and insights. It's scary, but that's the truth. You don't know what you don't know. BI never had one human face that we all could point to. Now, in the new world of Big Data, we can. And it's called a data scientist.

Photo courtesy: Flickr

Friday, March 30, 2012

4 Big Data Myths - Part I

It was cloud then and it's Big Data now. Every time there's a new disruptive category it creates a lot of confusion. These categories are not well-defined. They just catch on. What hurts the most are the myths. This is the first part of my two-part series to debunk Big Data myths.

Myth # 4: Big Data is about big data

It's a clear misnomer. "Big Data" is a name that sticks but it's not just about big data. Defining a category just based on the size of data appears to be quite primitive and rather silly. And, you could argue all day about what size of data qualifies as "big." But, the name sticks, and that counts. The insights could come from a very small dataset or a very large dataset. Big Data is, finally, a promise not to discriminate against any data, small or large.

It used to be prohibitively expensive and almost technologically impossible to analyze large volumes of data. Not any more. Today, technology (commodity hardware and sophisticated software to leverage this hardware) changes the way people think about small and large data. It's a data continuum. Big Data is not just about technology, either. Technology is just an enabler. It always has been. If you think Big Data is about adopting new shiny technology, that's very limiting. Big Data is an amalgamation of a few trends: data growth of an order of magnitude or two, external data becoming more valuable than internal data, and a shift in computing business models. Companies mainly looked at their operational data, invested in expensive BI solutions, and treated those systems as gold. Very few people in a company got very little value out of those systems.

Big Data is about redefining what data actually means to you. Examine the sources that you never cared to look at before, and instrument your systems to generate the kind of data that is valuable to you and not to your software vendor. This is not about technology. This is about a completely new way of doing business where data finally gets the driver's seat. The conversations about organizations' brands and their competitors' brands are happening in social media that they neither control nor have a good grasp of. At Uber, Bradley Voytek, a neuroscientist, is looking at interesting ways to analyze real-time data to improve the way Uber does business. Recently, Target came under fire for using data to predict future needs of a shopper. Opportunities are in abundance.

Myth # 3: Big Data is for expert users    

The last mile of Big Data is the tools. As technology evolves, the tools that allow people to interact with data have significantly improved as well. Without these tools the data is worth nothing. The tools have evolved in all categories, ranging from simple presentation charting frameworks to complex tools used for deep analysis. With the rising popularity and adoption of HTML 5 and people's desire to consume data on tablets, the investment in the presentation side of the tools has gone up. Popular JavaScript frameworks such as D3 have allowed people to do interesting things such as creating a personal annual report. The availability of various datasets published by several public sector agencies in the US has also spurred some creative analysis by data geeks such as this interactive report that tracks money as people move to different parts of the country.

The other exciting trend has been the self-service reporting in the cloud and better abstraction tools on top of complex frameworks such as Hadoop. Without self-service tools most people will likely be cut off from the data chain even if they have access to data they want to analyze. I cannot overemphasize how important the tools are in the Big Data value chain. They make it an inclusive system where more people can participate in data discovery, exploration, and analysis. Unusual insights rarely come from experts; they invariably come from people who were always fascinated by data but analyzing data was never part of their day-to-day job. Big Data is about enabling these people to participate - all information accessible to all people.

Coming soon in Part II: Myth # 2 and Myth # 1.