
Marketing data driven techniques

Published by atsalfattan, 2023-01-10 09:24:14

Description: Marketing data driven techniques


Part XI: Internet and Social Marketing
Chapter 42: Networks
Chapter 43: The Mathematics Behind The Tipping Point
Chapter 44: Viral Marketing
Chapter 45: Text Mining

42 Networks

You may be familiar with the movie The Social Network that describes the early days of Facebook. In general, a network consists of points (usually called nodes) that are connected by links (sometimes called arcs). You can easily associate a (huge!) network with Facebook in which the nodes are the members of Facebook and a link exists between two nodes if the people represented by the two nodes are friends. In this chapter you learn how marketing analysts describe networks and gain insight into how networks such as Facebook evolve. Applications of network theory to the spread of new products are also discussed. The chapter closes with a brief discussion of the well-known Klout score, which purports to measure an individual’s online influence.

Measuring the Importance of a Node

It is important to have a way to measure the importance of a node or link. For example, on the Internet, Google, Bing, and Amazon.com are clearly more important nodes than the author’s blog (www.waynewinston.com). Marketers would love to have a measure of influence so that they can reach the most influential people and have these people spread the word about their products. This section discusses three metrics (assuming that each link in the network is bidirectional) that you can use to measure the importance of a node:

■ Degree centrality
■ Closeness centrality
■ Betweenness centrality

Degree Centrality

For any node its degree centrality is simply the number of nodes connected to the given node by a link. Take a look at the network shown in Figure 42-1.

Figure 42-1: Example of a simple network (nodes: Paris, Blake, Lindsay, Blair, Helen, and Jessica)

Table 42-1 shows the degree centrality of each node.

Table 42-1: Degree Centrality

Node      Degree Centrality
Paris     3
Blake     2
Blair     4
Jessica   3
Helen     1
Lindsay   3

For example, Lindsay’s degree centrality is 3 because there are links connecting Lindsay to Paris, Blair, and Jessica. Helen’s degree centrality is 1 because her only link is to Blair. Degree centrality indicates that Blair is the most influential node but only slightly more important than Jessica and Lindsay. Referring to Figure 42-1, however, it appears that Blair is much more influential than, say, Lindsay. For example, removing Blair from the network would leave no path to Helen. You will soon see that betweenness centrality (see the section “Betweenness Centrality”) illuminates Blair’s importance to the network. The shortcoming of degree centrality, therefore, is that it does not measure the extent to which a node’s links help connect pairs of nodes.
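The degree counts in Table 42-1 can be reproduced with a few lines of Python. This is only a sketch: the edge list is an assumption inferred from the chapter's tables and listed shortest paths (the figure itself is not reproduced here), and the names `EDGES`, `build_graph`, and `degree_centrality` are illustrative.

```python
from collections import defaultdict

# Assumed edge list for Figure 42-1, inferred from the chapter's tables.
EDGES = [
    ("Paris", "Blake"), ("Paris", "Lindsay"), ("Paris", "Jessica"),
    ("Blake", "Blair"), ("Blair", "Lindsay"), ("Blair", "Jessica"),
    ("Blair", "Helen"), ("Jessica", "Lindsay"),
]


def build_graph(edges):
    """Return an adjacency map for a network with bidirectional links."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Degree centrality: the number of nodes directly linked to each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


degrees = degree_centrality(build_graph(EDGES))
for node, degree in sorted(degrees.items(), key=lambda item: -item[1]):
    print(f"{node:8s} {degree}")
```

Because the measure only counts direct links, it cannot distinguish a node that bridges otherwise separate parts of the network from one that does not.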

Closeness Centrality

One way to look at the importance of a node is to assume a node is more important if the node is close to other nodes. To implement this idea, pick a given node (say Paris) and find the shortest path from Paris to each other node. Averaging those path lengths gives you an idea of how far Paris is from the rest of the network. Taking the reciprocal of the average path length (that is, 1 / average path length) yields your measure of closeness centrality. This measure of closeness centrality is larger for nodes that tend to be closer to the other nodes in the network. Referring to the network in Figure 42-1, you can determine Paris’ closeness centrality as follows:

■ Shortest path from Paris to Blake has length 1.
■ Shortest path from Paris to Blair (Paris-Blake-Blair) has length 2.
■ Shortest path from Paris to Jessica has length 1.
■ Shortest path from Paris to Helen has length 3 (Paris-Lindsay-Blair-Helen).
■ Shortest path from Paris to Lindsay has length 1.
■ The average of the lengths of these shortest paths is (1 + 2 + 1 + 3 + 1) / 5 = 1.6, so Paris’ closeness centrality is 1 / 1.6 = 5 / 8.

In a similar fashion you can find the closeness centrality of all nodes in Figure 42-1, as shown in Table 42-2.

Table 42-2: Closeness Centralities for Figure 42-1

Node      Closeness Centrality
Paris     5 / 8 = 0.625
Blake     5 / 8 = 0.625
Blair     5 / 6 = 0.833
Jessica   5 / 8 = 0.625
Helen     5 / 10 = 0.50
Lindsay   5 / 8 = 0.625

Closeness centrality indicates that Lindsay and Blake are almost as important as Blair. Note, however, that the closeness centrality values of the nodes are close in value. This is typical, and changing even a large network by adding or deleting a few links can result in a large change in how the nodes rank for closeness centrality. Because a node with a large closeness centrality is close to the other nodes, such a node is in a good position to view what happens in the network.
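The Paris calculation above can be sketched in Python with a breadth-first search, which finds shortest path lengths when every link counts as one step. The adjacency map is an assumption inferred from the chapter's tables, and `shortest_path_lengths` and `closeness_centrality` are illustrative names rather than anything defined in the book.

```python
from collections import deque

# Assumed adjacency map for Figure 42-1, inferred from the chapter's tables.
GRAPH = {
    "Paris": {"Blake", "Lindsay", "Jessica"},
    "Blake": {"Paris", "Blair"},
    "Blair": {"Blake", "Lindsay", "Jessica", "Helen"},
    "Jessica": {"Paris", "Blair", "Lindsay"},
    "Lindsay": {"Paris", "Blair", "Jessica"},
    "Helen": {"Blair"},
}


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of links on a shortest path to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(graph, node):
    """Reciprocal of the average shortest path length to every other node."""
    dist = shortest_path_lengths(graph, node)
    lengths = [d for other, d in dist.items() if other != node]
    return len(lengths) / sum(lengths)


paris = closeness_centrality(GRAPH, "Paris")
print(f"Paris: {paris:.3f}")
```

The sketch assumes the network is connected; a node that cannot be reached would simply be left out of the average.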
Closeness centrality does not measure the power of a node to influence how information flows

through the network. To measure the power of a node to influence the flow of information through the network, you need the measure of betweenness centrality.

Betweenness Centrality

Suppose a marketer wants to spread knowledge of a new product throughout a network. She would then want to know which nodes have the most impact on the spread of information through the network. To illustrate how the concept of betweenness centrality measures a node’s impact on the spread of information, focus on Blair. To compute Blair’s betweenness centrality, follow these steps:

1. For each pair of nodes excluding Blair, find all shortest paths between the pair of nodes. For example, for the Blake-Lindsay pair, two shortest paths exist (Blake-Paris-Lindsay and Blake-Blair-Lindsay).
2. Determine the fraction of these shortest paths that include Blair. In this case 1/2 = 0.5 of the shortest paths include Blair, so Blair earns 0.5 points toward betweenness centrality.
3. Summing Blair’s points over the 10 pairs of nodes that exclude Blair yields Blair’s betweenness centrality.

All the computations needed to compute Blair’s and Paris’ betweenness centrality are summarized in Table 42-3 and Table 42-4.

Table 42-3: Blair’s Betweenness Centrality

Pair of Nodes     Shortest Paths                                          Points for Blair
Paris Blake       Paris-Blake                                             0/1 = 0
Paris Jessica     Paris-Jessica                                           0/1 = 0
Paris Helen       Paris-Blake-Blair-Helen and Paris-Lindsay-Blair-Helen   2/2 = 1
Paris Lindsay     Paris-Lindsay                                           0/1 = 0
Blake Jessica     Blake-Blair-Jessica                                     1/1 = 1
Blake Helen       Blake-Blair-Helen                                       1/1 = 1
Blake Lindsay     Blake-Paris-Lindsay and Blake-Blair-Lindsay             1/2 = 0.5
Jessica Helen     Jessica-Blair-Helen                                     1/1 = 1
Jessica Lindsay   Jessica-Lindsay                                         0/1 = 0
Helen Lindsay     Helen-Blair-Lindsay                                     1/1 = 1

Adding up the last column of this table, you can find that Blair’s betweenness centrality is 5.5.
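Steps 1 and 2 can be sketched for a single pair of nodes. The code below counts shortest paths with a breadth-first search and returns the fraction that pass through a chosen node. The adjacency map is an assumption inferred from the chapter's tables, and `count_shortest_paths` and `pair_points` are hypothetical helper names.

```python
from collections import deque

# Assumed adjacency map for Figure 42-1, inferred from the chapter's tables.
GRAPH = {
    "Paris": {"Blake", "Lindsay", "Jessica"},
    "Blake": {"Paris", "Blair"},
    "Blair": {"Blake", "Lindsay", "Jessica", "Helen"},
    "Jessica": {"Paris", "Blair", "Lindsay"},
    "Lindsay": {"Paris", "Blair", "Jessica"},
    "Helen": {"Blair"},
}


def count_shortest_paths(graph, source):
    """Return (distance, number of shortest paths) from source to each node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            # Every predecessor one step closer to the source adds its paths.
            if dist[neighbor] == dist[node] + 1:
                count[neighbor] += count[node]
    return dist, count


def pair_points(graph, s, t, v):
    """Fraction of shortest s-t paths passing through v (v is neither s nor t).

    Assumes a connected network, so every node is reachable from s and v.
    """
    dist_s, count_s = count_shortest_paths(graph, s)
    dist_v, count_v = count_shortest_paths(graph, v)
    if dist_s[v] + dist_v[t] != dist_s[t]:
        return 0.0  # v lies on no shortest path between s and t
    return count_s[v] * count_v[t] / count_s[t]


print(pair_points(GRAPH, "Blake", "Lindsay", "Blair"))
```

Summing `pair_points` over every pair that excludes the chosen node is step 3.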

Table 42-4 shows the calculations needed to compute Paris’ betweenness centrality.

Table 42-4: Paris’ Betweenness Centrality

Pair of Nodes     Shortest Paths                                 Points for Paris
Blake Blair       Blake-Blair                                    0/1 = 0
Blake Jessica     Blake-Blair-Jessica                            0/1 = 0
Blake Helen       Blake-Blair-Helen                              0/1 = 0
Blake Lindsay     Blake-Paris-Lindsay and Blake-Blair-Lindsay    1/2 = 0.5
Blair Jessica     Blair-Jessica                                  0/1 = 0
Blair Helen       Blair-Helen                                    0/1 = 0
Blair Lindsay     Blair-Lindsay                                  0/1 = 0
Jessica Helen     Jessica-Blair-Helen                            0/1 = 0
Jessica Lindsay   Jessica-Lindsay                                0/1 = 0
Helen Lindsay     Helen-Blair-Lindsay                            0/1 = 0

Adding up the last column of this table, you can see that Paris’ betweenness centrality measure is 0.5. This indicates that Paris is not important in passing information through the network. This is good because while in South Africa, Paris Hilton said, “I love Africa in general. South Africa and West Africa they are both great countries.” (http://www.foxnews.com/story/2008/03/25/paris-hilton-west-africa-is-great-country/)

Table 42-5 summarizes the betweenness centrality for each node in Figure 42-1.

Table 42-5: Betweenness Centrality for Figure 42-1

Node      Betweenness Centrality
Paris     0.5
Blake     2 / 3 = 0.67
Blair     5.5
Lindsay   2 / 3 = 0.67
Helen     0
Jessica   2 / 3 = 0.67

Table 42-5 makes it clear that Blair is the key to spreading information through the network.

Measuring the Importance of a Link

Analogous to betweenness centrality for a node, you can define link betweenness as a measure of a link’s importance in the network. To illustrate the concept, determine (see Table 42-6) the link betweenness for the Blake-Blair link.

1. For each pair of nodes, find all shortest paths between the pair of nodes. For example, for the Blake-Lindsay pair of nodes, two shortest paths of length 2 exist (Blake-Paris-Lindsay and Blake-Blair-Lindsay).
2. Determine the fraction of these shortest paths that include the Blake-Blair link. In this case 1/2 = 0.5 of the shortest paths include the Blake-Blair link, so the Blake-Lindsay pair of nodes contributes 0.5 points toward link betweenness for the Blake-Blair link.
3. Summing these points over all pairs of nodes yields the Blake-Blair link’s link betweenness.

Table 42-6: Link Betweenness for Blake-Blair Link

Node Pair         Shortest Paths      Shortest Paths Including   Points Contributed
                  Between Node Pair   the Blake-Blair Link       to Link Betweenness
Blake-Blair       1                   1                          1/1 = 1
Blake-Jessica     1                   1                          1/1 = 1
Blake-Helen       1                   1                          1/1 = 1
Blake-Lindsay     2                   1                          1/2 = 0.5
Blake-Paris       1                   0                          0/1 = 0
Blair-Jessica     1                   0                          0/1 = 0
Blair-Helen       1                   0                          0/1 = 0
Blair-Lindsay     1                   0                          0/1 = 0
Blair-Paris       2                   1                          1/2 = 0.5
Jessica-Helen     1                   0                          0/1 = 0
Jessica-Lindsay   1                   0                          0/1 = 0
Jessica-Paris     1                   0                          0/1 = 0

Table 42-6: Link Betweenness for Blake-Blair Link (continued)

Node Pair         Shortest Paths      Shortest Paths Including   Points Contributed
                  Between Node Pair   the Blake-Blair Link       to Link Betweenness
Helen-Lindsay     1                   0                          0/1 = 0
Helen-Paris       2                   1                          1/2 = 0.5
Lindsay-Paris     1                   0                          0/1 = 0

Adding up the last column of Table 42-6 shows the link betweenness for the Blake-Blair link is 4.5.

Table 42-7 shows the computation of the link betweenness for the Jessica-Blair link.

Table 42-7: Computation of Link Betweenness for Jessica-Blair Link

Node Pair         Shortest Paths      Shortest Paths Including   Points Contributed
                  Between Node Pair   the Jessica-Blair Link     to Link Betweenness
Blake-Blair       1                   0                          0/1 = 0
Blake-Jessica     1                   1                          1/1 = 1
Blake-Helen       1                   0                          0/1 = 0
Blake-Lindsay     2                   0                          0/2 = 0
Blake-Paris       1                   0                          0/1 = 0
Blair-Jessica     1                   1                          1/1 = 1
Blair-Helen       1                   0                          0/1 = 0
Blair-Lindsay     1                   0                          0/1 = 0
Blair-Paris       2                   0                          0/2 = 0
Jessica-Helen     1                   1                          1/1 = 1
Jessica-Lindsay   1                   0                          0/1 = 0
Jessica-Paris     1                   0                          0/1 = 0
Helen-Lindsay     1                   0                          0/1 = 0
Helen-Paris       2                   0                          0/2 = 0
Lindsay-Paris     1                   0                          0/1 = 0

Adding up the numbers in the last column, you find that the Jessica-Blair link has a link betweenness of 3.

Summarizing Network Structure

Because large networks are complex, you need simple metrics that can be used to summarize a network’s structure. In Chapter 3, “Using Excel Functions to Summarize Marketing Data,” you learned how a large data set could be summarized by two numbers: the mean or median as a measure of typical value and the standard deviation as a measure of spread about the mean. In this section you learn how the structure of a large complex network can be summarized by two numbers:

■ L = a measure of the average distance between network nodes.
■ C = a local cluster coefficient, which measures the extent to which your friends are friends of one another.

Six Degrees of Separation

In 1967, Harvard sociology professor Stanley Milgram performed an interesting experiment. He gave 296 residents of Omaha, Nebraska, a letter and the name and address of a stockbroker living in a Boston suburb. The goal was to get the letter to the stockbroker with the minimum number of mailings, but the rule of the game was that each time the letter was mailed, it had to be mailed to a friend of the person mailing the letter. Two hundred and seventeen of the Omaha residents mailed the letter, and 64 letters made it to the stockbroker. Each mailing of the letter was referred to as a “hop.” On average it took 5.2 hops to get the letter to the stockbroker (never more than 10 hops!) and the median number of hops was 6, hence the phrase “six degrees of separation.” Another example of six degrees of separation is the famous six degrees of the actor Kevin Bacon. Define a network in which the nodes are movie actors and actresses, and there is a link between two actors and/or actresses if they appeared in the same movie. Most actors can be linked to Kevin Bacon in six links or fewer.
The website http://oracleofbacon.org/ enables you to find the path linking an actor or actress to Bacon. For example, as shown in Figure 42-2, Cate Blanchett can be linked to Kevin Bacon through John Goodman. You can then say Cate Blanchett’s Bacon number is 2.

Definition and Computation of L

For any network define L = the average, over all pairs of nodes, of the length of the shortest path between the pair of nodes. Essentially a small value for L means that for a randomly chosen pair of nodes it is likely that there exists a fairly short path connecting the

nodes. On the other hand, a large value of L indicates that many pairs of nodes are not connected by short paths.

Figure 42-2: Cate Blanchett’s Bacon number (Cate Blanchett was in The Monuments Men (2013) with John Goodman, who was in Death Sentence (2007) with Kevin Bacon, so Cate Blanchett has a Bacon number of 2)

For the network in Figure 42-1, the computations for L are shown in Table 42-8.

Table 42-8: Computation of L for Figure 42-1

Node Pair         Length of Shortest Path Between Node Pair
Blake-Blair       1
Blake-Jessica     2
Blake-Helen       2
Blake-Lindsay     2
Blake-Paris       1
Blair-Jessica     1
Blair-Helen       1
Blair-Lindsay     1
Blair-Paris       2
Jessica-Helen     2
Jessica-Lindsay   2
Jessica-Paris     1
Helen-Lindsay     2
Helen-Paris       3
Lindsay-Paris     1

Adding the second column you get 24, so L = 24 / 15 = 1.6. For such a small network, the small value of L is not surprising. The film actor network amazingly has L = 3.65. Amazingly the Facebook friends network has L = 4.7 and for just U.S. Facebook users L = 4.3!

The Local Cluster Coefficient

A network’s local cluster coefficient is a number between 0 and 1 that measures the tendency of a person’s friends to be friends of one another. Because most of your friends know each other, you would expect most social networks to have a relatively large cluster coefficient. Before defining a network’s local cluster coefficient, you need a definition: A neighbor of node n is any node connected by a link to node n. To define a network’s local clustering coefficient C, you define for each node n a cluster coefficient Cn, which is the fraction of node n’s pairs of neighbors that are linked to each other. Then C is obtained by averaging Cn over all nodes. The following illustrates the determination of C for the network pictured in Figure 42-1:

■ Paris has three pairs of friends (Blake and Lindsay, Blake and Jessica, and Lindsay and Jessica) and one of these pairs (Lindsay and Jessica) is linked, so CParis = 1/3.
■ Blake has one pair of friends (Paris and Blair) and they are not linked, so CBlake = 0.
■ Blair has four friends (Blake, Jessica, Helen, and Lindsay), which results in six pairs of friends (Blake and Jessica, Blake and Helen, Blake and Lindsay, Jessica and Helen, Jessica and Lindsay, and Helen and Lindsay). Of the six pairs of friends, only Jessica and Lindsay are linked, so CBlair = 1/6.
■ Jessica has three pairs of friends (Blair and Lindsay, Blair and Paris, and Lindsay and Paris). Blair and Lindsay are linked, and Lindsay and Paris are linked, so CJessica = 2/3.
■ Helen has no pair of friends, so you can omit her when computing the average of the cluster coefficients for each node.
■ Lindsay has three pairs of friends (Blair and Jessica, Blair and Paris, and Paris and Jessica). Paris and Jessica are linked, and Blair and Jessica are linked, so CLindsay = 2/3.

You now find that the following is true:

C = (1/3 + 0 + 1/6 + 2/3 + 2/3) / 5 = 11/30 = 0.37

The film actor/actress network has C = 0.79.
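The node-by-node calculation above can be checked with a short sketch that uses exact fractions. The adjacency map is an assumption inferred from the chapter's tables, and the function names are illustrative. As in the text, a node with fewer than two neighbors (Helen) is left out of the average.

```python
from fractions import Fraction
from itertools import combinations

# Assumed adjacency map for Figure 42-1, inferred from the chapter's tables.
GRAPH = {
    "Paris": {"Blake", "Lindsay", "Jessica"},
    "Blake": {"Paris", "Blair"},
    "Blair": {"Blake", "Lindsay", "Jessica", "Helen"},
    "Jessica": {"Paris", "Blair", "Lindsay"},
    "Lindsay": {"Paris", "Blair", "Jessica"},
    "Helen": {"Blair"},
}


def node_cluster_coefficient(graph, node):
    """Fraction of the node's pairs of neighbors that are themselves linked.

    Returns None when the node has no pair of neighbors.
    """
    pairs = list(combinations(sorted(graph[node]), 2))
    if not pairs:
        return None
    linked = sum(1 for a, b in pairs if b in graph[a])
    return Fraction(linked, len(pairs))


def local_cluster_coefficient(graph):
    """Average the per-node coefficients, skipping nodes without a pair."""
    values = [node_cluster_coefficient(graph, node) for node in graph]
    values = [value for value in values if value is not None]
    return sum(values) / len(values)


C = local_cluster_coefficient(GRAPH)
print(C, float(C))
```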

The next two sections use your understanding of L and C to demonstrate how various real-life networks were created.

Random and Regular Networks

This section looks at random and regular networks and then discusses the seminal work of Steven Strogatz and Duncan Watts (then of Cornell) on small world networks (see “Collective Dynamics of ‘small-world’ networks,” Nature, June 4, 1998).

Random Networks

Consider a network with n nodes. Then there are n * (n − 1) / 2 possible links in this network. You can form a random network by choosing a probability p for each possible link to be included in the network. For example, choose n = 10 and p = 0.25. Then you might obtain the network shown in Figure 42-3.

Figure 42-3: Example of random graph

In general a random network has a low C and low L. For example recall that for the film actor network L = 3.65 and C = 0.79. Strogatz and Watts used simulation to repeatedly generate a random network with the same number of nodes and links as the film actor network. They found L = 2.99 and C = 0.08. As this example illustrates, randomly generated networks typically have a small L and a small C. Because most social networks (like the film actor network) have low L and high C, it is not likely that a social network was generated by successive random generation of links.

Regular Networks

Another type of network often studied is a regular network. A network is regular if every node is linked to the same number of nodes. Figure 42-4 shows an example of a regular network on a circle in which each node is linked to four neighboring nodes.
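The random construction described in this section can be sketched as follows: list all n * (n − 1) / 2 possible links and keep each one independently with probability p. The function name `random_network` and the `seed` parameter are illustrative choices, not part of the book. The network it draws will generally differ from Figure 42-3.

```python
import random
from itertools import combinations


def random_network(n, p, seed=None):
    """Return (all possible links, links kept) for nodes numbered 1..n.

    Each possible link is included independently with probability p.
    """
    rng = random.Random(seed)
    possible = list(combinations(range(1, n + 1), 2))
    links = [pair for pair in possible if rng.random() < p]
    return possible, links


possible, links = random_network(10, 0.25, seed=42)
print(len(possible), "possible links;", len(links), "included")
```

With n = 10 and p = 0.25 you would expect roughly a quarter of the 45 possible links to be kept on average, although any single draw can differ.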

It is easy to compute L and C for the network in Figure 42-4 because the network looks the same when viewed from any node. Therefore, you can choose node 1 and compute L for the network by computing (as shown in Table 42-9) the average length of the shortest paths from node 1 to nodes 2–12. You can view the top node in the middle as node 1 and then the nodes are numbered clockwise.

Figure 42-4: Regular network

Table 42-9: Lengths of Shortest Paths from Node 1

Node   Length of Shortest Path from Node 1
2      1 (1-2)
3      1 (1-3)
4      2 (1-3-4)
5      2 (1-3-5)
6      3 (1-3-5-6)
7      3 (1-3-5-7)
8      3 (1-11-10-8)
9      2 (1-11-9)
10     2 (1-11-10)
11     1 (1-11)
12     1 (1-12)
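The path lengths in Table 42-9 can be reproduced with a breadth-first search on a ring of 12 nodes in which each node is linked to the two nearest nodes on either side. This is a minimal sketch: the helper names are illustrative, and it relies on the symmetry argument above, so the average from node 1 stands in for the whole network. It also counts linked pairs among node 1's neighbors, the quantity behind the local cluster coefficient.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def ring_lattice(n, k):
    """Nodes 1..n on a circle, each linked to its k nearest neighbors (k even)."""
    graph = {node: set() for node in range(1, n + 1)}
    for node in graph:
        for step in range(1, k // 2 + 1):
            other = (node - 1 + step) % n + 1
            graph[node].add(other)
            graph[other].add(node)
    return graph


def distances_from(graph, source):
    """Breadth-first search: shortest path length from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_path_length_from(graph, source):
    """Average shortest path length from source to every other node."""
    dist = distances_from(graph, source)
    return Fraction(sum(dist.values()), len(dist) - 1)


def cluster_coefficient_of(graph, node):
    """Fraction of the node's pairs of neighbors that are linked."""
    pairs = list(combinations(sorted(graph[node]), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return Fraction(linked, len(pairs))


small = ring_lattice(12, 4)
print(average_path_length_from(small, 1), cluster_coefficient_of(small, 1))

large = ring_lattice(1000, 4)
print(round(float(average_path_length_from(large, 1)), 1))
```

Changing the arguments of `ring_lattice` lets you repeat the calculation for rings of other sizes.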

Averaging the lengths of the paths in the second column, you can find L = 21 / 11. By the symmetry of the network, you can compute C as the fraction of pairs of node 1’s friends that have links between them. Node 1 has six pairs of friends: 11-12, 11-2, 11-3, 12-2, 12-3, and 2-3. Of these pairs 11-12, 12-2, and 2-3 are linked, so C = 3 / 6 = 0.5. If you drew a regular network with more nodes (say 1,000) in which each node has four neighbors, then it is straightforward to show (see Problem 9) that C remains at 1/2 and L = 125.4. In general for a large regular graph both L and C are large. This implies that social networks such as the movie star network cannot be represented by a regular graph. Thirty-one years after Milgram coined six degrees of separation, Strogatz and Watts figured out a reasonable explanation for the prevalence of social networks having a small L and large C.

It’s a Small World After All!

Strogatz and Watts began with a regular network like the network shown in Figure 42-5. This network has 10 nodes and each node is linked to two neighboring nodes.

Figure 42-5: Regular network with 10 nodes

Strogatz and Watts’s brilliant insight was to define for each link a probability (call it PROB) that the link is deleted. Then the deleted link is replaced by a link joining a randomly chosen pair of nodes. Figure 42-6 shows an example of how the regular network might look after two links are deleted and then replaced.

Figure 42-6: New network after two links are replaced
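The delete-and-replace step can be sketched as follows. This follows the chapter's description (delete a link with probability PROB, then add a link between a randomly chosen pair of nodes), which may differ in detail from the procedure in the original Nature paper. The requirement that the replacement not duplicate an existing link is an added assumption, and `ring_links` and `rewire` are illustrative names.

```python
import random


def ring_links(n):
    """Links of a ring in which each node is linked to its two neighbors."""
    return {(min(i, i % n + 1), max(i, i % n + 1)) for i in range(1, n + 1)}


def rewire(links, nodes, prob, seed=None):
    """Delete each link with probability prob and replace it with a link
    joining a randomly chosen pair of nodes.

    Assumption: a replacement may not duplicate a link that is currently
    present, so the total number of links stays the same.
    """
    rng = random.Random(seed)
    current = set(links)
    for link in sorted(links):
        if rng.random() < prob:
            current.remove(link)
            while True:
                a, b = rng.sample(nodes, 2)
                candidate = (min(a, b), max(a, b))
                if candidate not in current:
                    break
            current.add(candidate)
    return current


nodes = list(range(1, 11))
original = ring_links(10)
rewired = rewire(original, nodes, prob=0.2, seed=1)
print(sorted(rewired - original))  # the newly created links, if any
```

Each run with a different seed gives a different network, so properties such as L are best averaged over many runs.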

In the original network node 10 was four links away from node 6. In Figure 42-6 the new link has reduced the distance from node 10 to node 6 to one link. Strogatz and Watts showed that even a small value of PROB can create a network with a much smaller L than the regular network and virtually the same C value as the regular network. Strogatz and Watts referred to the new arcs as weak ties. For most people the great majority of their friends live in their city of residence, but most have several “weaker” acquaintances in different cities. These weak ties provide a possible explanation for the creation of networks (like the movie star network) having a small L and large C. In their wonderful book Networks Illustrated (Edwiser Scholastic Press, 2013) Princeton graduate student Chris Brinton and Princeton professor Mung Chiang report simulations that start with a 600-node network having six links per node. Even if PROB is relatively small (say 0.1), L is reduced by 70 percent and C hardly changes.

The Rich Get Richer

Google, Facebook, YouTube, Yahoo, and Baidu are the five most-visited websites in the world. The Internet has evolved to the point at which a few sites handle lots of traffic, and many sites handle little traffic. Consider the Internet as a network with unidirectional links defined by hyperlinks. For example, your website may contain a link to Amazon.com but Amazon.com does not likely link to your website. Define a node for the Internet to simply be a URL. The in-degree of a node in this network is the number of websites linking to the given node. More specifically, let x = in-degree of a network URL and y(x) = the number of URLs having in-degree x. Then the graph of y(x) versus x follows a Power Law where y = c * x^(-a), where a is thought to be near 2.
Amazingly if you graph the relationship between x and y(x), you get a graph like Figure 42-7, which follows the Power Law (so called because x is raised to a power).

Figure 42-7: Power Law for networks
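A quick numerical check shows what a Power Law with a = 2 implies: multiplying the in-degree by k multiplies the number of URLs by k^(-a), whatever in-degree you start from. The sketch below is illustrative only. The constant c = 1 is an arbitrary assumption (it cancels in the ratio), and the function names are hypothetical.

```python
def power_law_count(x, c=1.0, a=2.0):
    """y(x) = c * x^(-a): number of URLs having in-degree x."""
    return c * x ** (-a)


def in_degree_ratio(k, x, c=1.0, a=2.0):
    """Ratio of URLs with in-degree k*x to URLs with in-degree x."""
    return power_law_count(k * x, c, a) / power_law_count(x, c, a)


# Doubling the in-degree, starting from several different values of x.
for x in (10, 500_000, 1_000_000):
    print(x, in_degree_ratio(2, x))
```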

Figure 42-7 illustrates the Power Law for the Internet viewed as a network. The many URLs with small x represent the many URLs with few URLs pointing to them. The URLs with large x represent the few URLs with many nodes pointing toward them. The Power Law with a = 2 implies, for example:

■ 1/4 as many sites have 1,000,000 nodes pointing to them as have 500,000 nodes pointing to them.
■ 1/4 as many sites have 2,000,000 nodes pointing to them as have 1,000,000 nodes pointing to them.

Consider for any integers k > 0 and x > 0 the following ratio:

(Number of nodes with in-degree kx) / (Number of nodes with in-degree x)

If a network follows a Power Law then this ratio is independent of x (see Exercise 11). Therefore, networks following the Power Law are also known as scale-free networks. Neither random networks, regular networks, nor small world networks yield a Power Law. In 1976, D.J. Price of Yale University provided an elegant explanation of network evolution that is consistent with Power Laws in his article “A general theory of bibliometric and other cumulative advantage processes” (Journal of American Society of Information Sciences). The following steps detail this explanation:

1. Begin with a network (see Figure 42-8) having two nodes and one link.

Figure 42-8: Rich Get Richer step 1 (Node 1 linked to Node 2)

2. At each step create a new node. The new node will be linked to an existing node, and the probability that an existing node is selected is proportional to the number of links possessed by the existing node. In Figure 42-9, node 3 is added. Because nodes 1 and 2 each have one link, node 3 is equally likely (50 percent each) to be linked to node 1 or to node 2. As shown in Figure 42-9, you can assume that the new link connects node 3 to node 2.

Figure 42-9: Rich Get Richer step 2 (Node 1, Node 2, and Node 3)

3. Now add another node (node 4). Because node 2 has two links and nodes 1 and 3 have one link each, there is a 50 percent chance that node 4 will link to node 2, a 25 percent chance node 4 will link to node 1, and a 25 percent chance node 4 will link to node 3. Now suppose that node 4 links to node 2. The resulting network is shown in Figure 42-10.

Figure 42-10: Step 3: Rich Get Richer (Node 1, Node 2, Node 3, and Node 4)

If you now add node 5, there would still be a 3 / 6 = 50 percent chance that node 5 links to node 2. Because nodes with more links are more likely to get the newest link, this view of network formation is called the Rich Get Richer or the method of preferential attachment. Price showed that networks formed by the Rich Get Richer mechanism follow the Power Law. The Rich Get Richer mechanism also provides a possible explanation for why high-tech product markets are sometimes dominated by a single player (such as Microsoft Office or the Google search engine). The more people who use Office, the more attractive Office becomes to prospective customers seeking a productivity suite. Similarly, more people using a search engine leads to better performance, making it more likely that people will use the dominant search engine.

Klout Score

Anyone who uses social media, and especially a marketing analyst, could benefit from knowing how their posts, tweets, and other contributions to the Internet move the opinions of others. The website Klout.com aids in this task by creating Klout scores based on Twitter, Facebook, Google+, Instagram, FourSquare, LinkedIn, and even YouTube, which purport to measure how Internet content created by a person moves the opinions of other Internet users. For example, on a scale of 0–100 in April 2013, Barack Obama had a Klout score of 99 and the author had a Klout score of 43.

While nobody outside of Klout knows how an individual’s Klout score is computed, the following are believed to be true:

■ Increasing the number of followers you have on Twitter, Facebook, or Instagram will (all other things equal) increase your Klout score.
■ A key to your Klout score is the likelihood that your activity will be acted upon. For example, increasing your Likes on Facebook or Instagram will increase your Klout score, and being retweeted more often will also increase your Klout score.
■ The influence of your engaged audience affects your Klout score. For example, being retweeted by one person with a Klout score of 95 might be more important than being retweeted by 40 people each having a Klout score of 2.
■ Sean Golliher (see www.seangolliher.com/2011/uncategorized/how-i-reversed-engineered-klout-score-to-an-r2-094/) cleverly attempted to “reverse engineer” the computation of Klout score. For 99 people Golliher attempted to predict their Klout scores using only each person’s number of Twitter followers and each person’s number of retweets. Golliher found the following simple equation explained 94 percent of the variation in Klout scores:

Klout Score = 23.474 − 0.109 * Log(TwitterFollowers) + 4.838 * Log(Retweets)

Summary

In this chapter you learned the following:

■ A network consists of nodes and links connecting the nodes.
■ The importance of a node can be evaluated by degree centrality, closeness centrality, or betweenness centrality.
■ The importance of a link can be evaluated by link betweenness.
■ The structure of a network can be characterized by L, a measure of the average distance between nodes, and C, the local clustering coefficient that measures the tendency of a person’s friends to be friends of one another.
■ Random networks have a small L and small C.
■ Regular networks have a large L and large C.
■ Most social networks have a small value of L (probably caused by weak ties that are the essence of the Strogatz-Watts model) and a large value of C.
■ The Rich Get Richer theory explains how few nodes with lots of traffic came about on the Internet.

■ If you let x = in-degree of a network URL and y(x) = the number of URLs having in-degree x, then the graph of y(x) versus x follows a Power Law where y = c * x^(-a), where a is thought to be near 2.
■ Klout score measures a person’s online influence across a variety of channels.

Exercises

For Exercises 1–3 use the network shown in Figure 42-11.

Figure 42-11: Network for exercises 1–3 (nodes 1 through 7)

1. For each node in the network, compute all three centrality measures.
2. Compute the link betweenness measure for nodes 1 and 2.
3. Compute L and C for the network in Figure 42-11.
4. Compute L and C for a regular network on a circle consisting of 12 nodes and 2 links per node. That is, node 1 links to nodes 12 and 2, and so on.
5. Consider the U.S. power grid network where nodes are generators, substations, and transformers and two nodes are linked if there is a transmission line joining them. Explain why you would expect this network to have a large L and small C.
6. The tributaries of the Mississippi River follow a Power Law. Can you explain which variable should go on each axis?
7. Zipf’s Law states that the number of times a word appears in the English language follows a Power Law. Can you explain which variable should go on each axis?
8. How is the Pareto principle (80-20 rule discussed in Chapter 1, “Slicing and Dicing Marketing Data with PivotTables”) related to the Power Law?

9. Consider a regular network on a circle with 1,000 nodes in which each node has links to its four neighbors. Show that L = 125.4.
10. Explain why Twitter is a unidirectional network and Facebook is not.
11. Show that for any network following a Power Law the ratio

(Number of nodes with in-degree kx) / (Number of nodes with in-degree x)

is independent of x.

43 The Mathematics Behind The Tipping Point

Malcolm Gladwell’s book The Tipping Point (Back Bay Books, 2000) has sold nearly 3 million copies. In his book Gladwell explains how little things can have a large effect on determining whether a new product succeeds or fails in the marketplace. This chapter builds on the discussion of networks in Chapter 42, “Networks,” and examines two mathematical models that illuminate some of Gladwell’s key ideas.

■ You begin with an explanation of the classical theory of network contagion, which enables you to determine whether all nodes in a network eventually get turned on. The contagion model enables you to see how little things do indeed make a difference in the spread of a new product.
■ You then modify the Bass model of product diffusion discussed in Chapter 27, “The Bass Diffusion Model,” to further illustrate some of Gladwell’s main ideas.

Network Contagion

Marketing analysts want to know how knowledge of networks can help spread knowledge of their product. Consider the metaphor that each person who might buy a product is a node in a network. When the product first comes out, all nodes are in the “off” position corresponding to nobody having knowledge of the product. If you define the “on” position for a node as denoting that a person has knowledge of the product, then the marketer’s goal is to turn all nodes on as quickly as possible. Now reconsider the 10-node ring network with two links per node discussed in Chapter 42. Figure 43-1 shows this network.
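The on/off metaphor can be made concrete with a short simulation on the 10-node ring. The sketch below assumes the threshold rule that the ring example in this chapter uses: in each round, an off node turns on when at least a fraction T of its neighbors are on. The names `ring_network` and `spread` are illustrative.

```python
def ring_network(n):
    """Nodes 1..n on a circle, each linked to its two adjacent nodes."""
    return {i: {(i - 2) % n + 1, i % n + 1} for i in range(1, n + 1)}


def spread(graph, initially_on, threshold):
    """Run the contagion rounds until no further node turns on.

    Returns the final set of on nodes and the nodes that turned on in each
    round. Nodes without neighbors never turn on.
    """
    on = set(initially_on)
    rounds = []
    while True:
        newly_on = {
            node
            for node, neighbors in graph.items()
            if node not in on
            and neighbors
            and len(neighbors & on) / len(neighbors) >= threshold
        }
        if not newly_on:
            return on, rounds
        on |= newly_on
        rounds.append(sorted(newly_on))


ring = ring_network(10)
for T in (0.5, 0.51):
    final_on, rounds = spread(ring, {1}, T)
    print(f"T = {T}: {len(final_on)} nodes on after {len(rounds)} rounds")
```

Raising the threshold from 0.5 to 0.51 is enough to stop the spread at the first node, because each neighbor of node 1 has exactly half of its neighbors on.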

Figure 43-1: 10-node network with two links per node

Suppose at present only Person 1 knows about your product. Call a person who knows about the product an on node and a person who does not know about the product an off node. To model the spread of a product (or disease!) the contagion model assumes there is a threshold level (call it T) between 0 and 1 such that an off node switches to on if at least a fraction T of its neighbors are on. Assume T = 0.5. Then the following sequence of events can ensue:

■ Round 1: Nodes 2 and 10 turn on.
■ Round 2: Nodes 3 and 9 turn on.
■ Round 3: Nodes 4 and 8 turn on.
■ Round 4: Nodes 5 and 7 turn on.
■ Round 5: Node 6 turns on.

Even though you began with only one person knowing about the product, quickly everyone learned about it.

Now assume instead that T = 0.51. In Round 1 Nodes 2 and 10 are the only candidates to turn on. Node 2 has only 50 percent of its neighbors on, so Node 2 does not turn on. The same is true of Node 10. Therefore none of the other nine nodes will ever turn on. This example shows how a small increase in the threshold can make a big difference in who knows about the product. As Gladwell says, “Little things can make a big difference.”

Now try to figure out who eventually will know about the product for the network in Figure 43-2 if T = 0.50 and originally only Node 2 is on. On nodes are shaded in subsequent figures.

■ Round 1: 50 percent of Node 1’s neighbors are on (Node 2 is on and Node 6 is off), so Node 1 turns on. Also 50 percent of Node 5’s neighbors (Node 2 is on and Node 6 is off) are on, so Node 5 turns on. Node 3 does not turn on

The Mathematics Behind The Tipping Point 643 because only one of ﬁve neighbors is on. Node 7 has one of ﬁve neighbors on, so it does not turn on. None of Node 4’s, Node 6’s, or Node 8’s neighbors are on, so none turn on. After Round 1 the network looks like Figure 43-3. (On nodes are shaded.) 1234 567 8 Figure 43-2: Node 2 is initially on 1234 5678 Figure 43-3: Round 1: Nodes 1, 2, and 5 are on ■ Round 2: Two out of four neighbors of Node 6 are on, so Node 6 turns on. Node 3 has one of ﬁve neighbors on, so Node 3 does not turn on. Nodes 4 and 8 do not have neighbors on, so neither turns on. Node 7 has one of ﬁve neighbors turned on, so Node 7 remains off. The network now looks like Figure 43-4. 1234 5678 Figure 43-4: Round 2: Nodes 1, 2, 5, and 6 are on

644 Part XI: Internet and Social Marketing ■ Round 3: Node 3 has two of ﬁve neighbors on, so Node 3 stays off. Nodes 4 and 8 have no neighbors on, so they remain off. For Node 7, two out of ﬁve neighbors (or 40 percent) are on, so Node 7 stays off. At this point Nodes 3, 4, 7, and 8 never turn on. This simple example can be tied to some of Gladwell’s key ideas: ■ Connectors (see pages 38–46 of The Tipping Point) who know a lot of people can be the key for a product managing to break out. For example, suppose Node 1 was more connected (say linked to Node 3 and Node 5) and Node 2 was also connected to Node 6. Also suppose T = 0.5 and Nodes 1 and 2 are initially on. You can verify (see Exercise 1) that all nodes in the network shown in Figure 43-5 eventually turn on due to the increased inﬂuence of the connectors: Nodes 1 and 2. Gladwell’s classic example of a connector was Paul Revere spreading the word that “The British are coming.” William Dawes also tried to spread the word, but Revere was better connected, so he was much more successful in spreading the word. The lesson for marketers is that well-connected customers (such as people with high betweenness central- ity) can make the difference between a successful and failed product rollout. 1234 567 8 Figure 43-5: All nodes turn on ■ Gladwell also discusses the importance (see pages 54–55) of weak ties in spreading knowledge of a product. Recall from Chapter 42 that in the Strogatz- Watts Small World model the average distance between nodes in a network can be greatly reduced by introducing a few weak ties that correspond to arcs that connect people who (without the weak ties) are far apart in the network. Many products (such as the Buick Rendezvous in the early 2000s) try hard to spread the word about products in Times Square because people

The Mathematics Behind The Tipping Point 645 in Times Square are often connectors who are not from New York City and have weak ties to people from far-ﬂung areas of the United States and the rest of the world. ■ Mavens (see pages 59–68) are people who are knowledgeable and highly persuasive about a product. In effect mavens reduce the threshold T. You can see (Exercise 2) that all nodes in the network of Figure 43–2 would turn on eventually if you could lower T to 0.4. ■ Great salespeople (see pages 78–87) can make the difference between a suc- cessful and failed product rollout. A great salesperson reduces T because she makes the potential customer less resistant to trying a new product. For an arbitrary 8-node network, the Contagion.xlsx ﬁle enables you to vary T, the links in the network, and the nodes that begin on and trace the path of nodes turning on. Deﬁne a node that is initially on as a seeded node. As shown in Figure 43-6, you enter a 1 in the range D5:K12 for each arc in the network. Also enter T in cell K2 and the initial on nodes are indicated by a 1 in the range D15:K15. As shown in K24 of the Initial Seeding worksheet, only four nodes eventually turn on. In the Seed 2 nodes worksheet (see Figure 43-7) you can see that if the ﬁrm seeded Node 7 as well as Node 2, you could eventually turn the whole market on. This example shows it might pay to give away your product to members of difficult- to-reach market segments. Figure 43-6: Only four nodes turn on when you start with Node 2
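The threshold rule traced by hand in this section can also be simulated in a few lines of code. The following Python sketch is not the Contagion.xlsx workbook. The function names `spread` and `from_edges` are hypothetical, and the link list for the 8-node network is an assumption inferred from the neighbor counts quoted in the rounds above, because the figure itself is not reproduced here. The text does not say whether Nodes 4 and 8 are linked to each other, so the sketch leaves that link out.

```python
def spread(neighbors, seeds, threshold):
    """Run the contagion model until no additional node turns on.

    neighbors: dict mapping each node to the set of nodes it is linked to.
    seeds: nodes that start in the on position.
    threshold: T, the fraction of on neighbors needed to switch a node on.
    Returns (set of on nodes, number of rounds in which some node turned on).
    """
    on = set(seeds)
    rounds = 0
    while True:
        # Synchronous update: each off node is judged on the state
        # of the network at the start of the round.
        newly_on = {
            node
            for node, nbrs in neighbors.items()
            if node not in on
            and nbrs
            and sum(n in on for n in nbrs) / len(nbrs) >= threshold
        }
        if not newly_on:
            return on, rounds
        on |= newly_on
        rounds += 1


def from_edges(edges):
    """Build a bidirectional neighbor map from a list of (a, b) links."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


# 10-node ring: node i is linked to the next node around the circle.
ring = from_edges([(i, i % 10 + 1) for i in range(1, 11)])

# 8-node network. ASSUMPTION: links inferred from the neighbor counts
# in the text (for example, Node 3 and Node 7 each have five neighbors).
net8 = from_edges([
    (1, 2), (1, 6), (2, 5), (5, 6),
    (2, 3), (2, 7), (3, 6), (6, 7),
    (3, 4), (3, 7), (3, 8), (4, 7), (7, 8),
])

for label, network, seeds, t in [
    ("ring, T=0.50", ring, {1}, 0.50),
    ("ring, T=0.51", ring, {1}, 0.51),
    ("8-node, seed 2, T=0.5", net8, {2}, 0.5),
    ("8-node, seeds 2 and 7, T=0.5", net8, {2, 7}, 0.5),
    ("8-node, seed 2, T=0.4", net8, {2}, 0.4),
]:
    on_nodes, rounds = spread(network, seeds, t)
    print(f"{label}: on nodes {sorted(on_nodes)} after {rounds} rounds")
```

Under these assumed links the sketch reproduces the outcomes described above: seeding only Node 2 at T = 0.5 stalls at four nodes, while seeding Node 7 as well, or lowering T to 0.4, turns the whole network on.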

Figure 43-7: Nodes 2 and 7 on cause all nodes to turn on

A Bass Version of the Tipping Point

On pages 12 and 13 of his book Gladwell gives several examples of the tipping point concept, including the following:

■ When the fraction of African-Americans in a neighborhood exceeds 20 percent, most remaining whites suddenly leave the neighborhood.
■ Teenage pregnancy rates in neighborhoods with between 5 and 40 percent professional workers are relatively constant, but in neighborhoods with 3.2 percent professionals, pregnancy rates double.

Essentially the central thesis of The Tipping Point is that in many situations involving social decision making there exists a threshold value (call the threshold p*) for a key parameter (call the parameter p) such that small movements of the parameter around p* can elicit a huge response. In the first example p = the fraction of African-Americans and when p > p* = 0.20 a huge social response (more whites moving out) is elicited. In the second example p = fraction of professional workers in the neighborhood and for p < p* = 0.05 a huge social response (more teenage pregnancies) is elicited.

The idea of a threshold is easily understood if you consider every individual in a population to be either sick or healthy. Let p = probability that a contact between a sick person and a healthy person infects the healthy person. In this context the tipping point corresponds to the existence of a threshold value p* such that a small increase of p above p* elicits a large increase in the number of people who eventually become sick.

You can use your knowledge of the Bass diffusion model (see Chapter 27) to analyze this situation. This model demonstrates that a small change in p can result in a large change in the number of people who eventually get infected. The model is in the basstippoint.xlsx file (see Figure 43-8). The evolution of the number of infected and healthy people at time t = 0, 1, 2, …, 100 is described here:

Figure 43-8: Bass Tipping Point model

1. Assume there is a total of 1,000 people (enter this in E3), and at Time 0 nobody has been sick.
2. In cell E1 enter the probability (0.0013) that a contact between a sick and healthy person will infect the healthy person.
3. In cell E2 enter the probability (0.2) that a sick person gets better during a period. This implies that a person is sick for an average of 1/0.2 = 5 periods. When a sick person gets better, he can never again infect anyone. This corresponds in the marketing context to a person “forgetting” about a product and not spreading the word about the product.
4. At Time 1 assume 1 person is sick. In cell D7 compute the number of people who will get better at Time 1 with the formula =C7*get_better.
5. Copy this formula to the range D8:D106 to compute the number of people who get better during each of the remaining periods.
6. In cell E7 use the formula =C7*G6 to compute the number of contacts for t = 1 between sick people and people who have never been sick. This formula is analogous to the Bass model formula that models the word-of-mouth term by multiplying those people who have purchased the product times those who have not.
7. Copy this formula to the range E8:E106 to compute the number of contacts between sick people and people who have never been sick during the remaining periods.
8. In cell F7 use the formula =infect*E7 to compute the number of contacts for t = 1 that result in infection.
9. Copy this formula to the range F8:F106 to determine the number of new infections during the remaining periods.
10. In cell G7 use the formula =MAX(G6-F7,0) to reduce the number of people who have never been sick by the number of infections at t = 1. This yields the number of people who have not been sick by the end of period 1.
11. Copy this formula to the range G8:G106 to compute for t = 2, 3, …, 100 the number of people who have never been sick by the end of period t. Using the MAX function ensures that the number of people who have never been sick stops at 0 when everyone has become sick.

Figure 43-9 shows a two-way data table with row input cell = probability of getting better; column input cell = probability of infection; and output cell =Total-G107, which measures the number of people who have become sick by t = 100. As expected, an increase in the chance of getting better decreases the number of people who eventually get sick. This is because an increase in the chance of getting better means a sick person has less time to infect healthy people. An increase in the chance of infection increases the number of people who eventually get sick. Also the infection probability needed to infect everyone increases as the chance of getting better increases. This is reasonable because if people are “carriers” for less time, you need a more virulent disease to ensure that everyone is infected.

Figure 43-10 summarizes the data table by graphing, for each probability of getting better, the dependence of the number of people who eventually fall ill on the chance of infection. For each curve there is a steep portion that indicates the tipping point for the infection probability. For example, if there is a 50-percent chance of a sick person getting better, the tipping point appears to occur when the probability of a contact between a healthy and sick person resulting in a new sick person reaches a number between 0.005 and 0.006.

Figure 43-9: Data table summarizing number of people who eventually get sick

Figure 43-10: Number infected as a function of infection probability
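The spreadsheet recursion described in the numbered steps can be approximated in code. The Python sketch below is not the basstippoint.xlsx workbook: the function name `ever_sick` is hypothetical, and two details are assumptions because the text does not spell them out. First, the number of people currently sick is assumed to be updated each period as sick minus recoveries plus new infections. Second, new infections are capped at the number of people who have never been sick, which plays the role of the MAX function in the workbook. The parameter names `infect` and `get_better` follow the range names used in the formulas.

```python
def ever_sick(total=1000, infect=0.0013, get_better=0.2, periods=100):
    """Return the number of people who have been sick after `periods` periods."""
    sick = 1.0                 # one person is sick at Time 1
    never_sick = float(total)  # ASSUMPTION: nobody has been sick at Time 0
    for _ in range(periods):
        recovered = sick * get_better          # people who get better this period
        contacts = sick * never_sick           # sick x never-sick contacts
        # ASSUMPTION: cap infections so never_sick cannot drop below 0.
        new_infections = min(infect * contacts, never_sick)
        never_sick -= new_infections
        # ASSUMPTION: update rule for the currently sick population.
        sick += new_infections - recovered
    return total - never_sick


# A small sweep in the spirit of the two-way data table.
for chance_better in (0.2, 0.5):
    row = [round(ever_sick(infect=p, get_better=chance_better))
           for p in (0.0001, 0.0005, 0.0013, 0.005)]
    print(f"get_better={chance_better}: {row}")
```

Sweeping `infect` over a finer grid for a fixed `get_better` is one way to look for the steep portion of the curve that the chapter calls the tipping point.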

The marketing analog of the epidemic model is clear: to be infected is to know about a product, and to become healthy means you are no longer discussing the product. After recognizing that the marketer’s goal is to “infect” everyone, this model provides two important marketing insights:

■ Lengthening the amount of time that people talk about your product (decreasing the chance of becoming healthy) can enhance the spread of your product.
■ Sometimes, a small increase in the persuasiveness of people who discuss your product or a small decrease in resistance to your product among noncustomers can greatly increase the eventual sales of your product.

Summary

In this chapter you learned the following:

■ The contagion model assumes that a node will turn on if at least a fraction T of the node’s neighbors is already on.
■ A small difference in T or the number of initial on nodes can make a huge difference in the eventual number of on nodes.
■ Connectors, mavens, and salespeople can provide the extra energy needed for a product to achieve 100-percent market penetration.
■ The Bass version of Gladwell’s tipping point model implies that lengthening the amount of time that people talk about your product (decreasing the chance of becoming healthy) can enhance the spread of your product. Also a small increase in the persuasiveness of people who discuss your product or a small decrease in resistance to your product among noncustomers may greatly increase the eventual sales of your product.

Exercises

1. Consider a network on a circle for which each node is linked to the closest four nodes. Suppose Node 1 is currently on. If T = 0.5, which nodes will eventually turn on? If T = 0.3, which nodes will eventually turn on?
2. For the network in Figure 43-2, assume that T = 0.4 and Node 2 is initially on. Show that all nodes will eventually turn on.
3. Modify the network in Figure 43-2 so that Node 1 is now also linked to Node 3 and Node 5, and Node 2 is now also linked to Node 6. Assume that T = 0.5 and Nodes 1 and 2 are initially on. Verify that all nodes will eventually turn on.

44 Viral Marketing

On July 14, 2010, Old Spice launched a viral video campaign (see www.youtube.com/watch?v=owGykVbfgUE) involving ex-San Francisco linebacker Isaiah Mustafa. This video received 6.7 million views after 24 hours and 23 million views after 36 hours. Likewise, the famous “Gangnam Style” video (www.youtube.com/watch?v=9bZkp7q19f0) has now received nearly 2 billion views! Because views of these videos spread quickly like an epidemic, the study of such successes is often referred to as viral marketing. Of course, many videos are posted to YouTube (like the author’s video on Monte Carlo simulation) and receive few views. This chapter discusses two mathematical models of viral marketing that attempt to model the dynamics that cause a video to either go viral or die a quick death.

For simplicity this chapter assumes that the viral campaign is a video and you want to describe the viewing history of the video. The two mathematical models attempt to explain how the number of people viewing a video grows over time. Assume at the beginning of the first period (t = 1), N people view the video.

■ The first model (Watts’ Model) is based on Duncan Watts’ 2007 article “The Accidental Influentials” (Harvard Business Review, Vol. 85, No. 2, 2007, pp. 22–23). This model provides a simple explanation for the spread of a video, but as you will see, Watts ignores the fact that several people may send the video on to the same person. Watts’ Model predicts the total views of a video based on two parameters: N = initial number of people who view the video and R = the expected number of new viewers generated by a person who has just seen the video.
■ The second model improves on Watts’ Model by including the fact that some of the videos sent on at a given time will be sent to the same person.

Watts’ Model

Watts assumes that at the beginning of the first period (t = 1) the maker of the video “seeds” the video by getting N people to view it. Then during each time period, each new viewer is assumed to pass the video on to R new viewers. This implies that at t = 2, NR new viewers are generated; at t = 3, (NR)R = NR^2 new viewers are generated; at t = 4, (NR^2)R = NR^3 new viewers are generated; and so on. This implies that there will be a total of S distinct viewers of the video where Equation 1 is true:

(1) S = N + NR + NR^2 + NR^3 + …

If R >= 1, S will be infinite, indicating a “viral” video. Of course, R cannot stay at or above 1 forever, so in all likelihood R will drop after a while. Assuming that R stays constant at a value less than 1, you may evaluate S by using an old trick from high school algebra. Simply multiply Equation 1 by R, obtaining Equation 2:

(2) RS = NR + NR^2 + NR^3 + …

Subtracting Equation 2 from Equation 1 yields S - RS = N. Solving for S you find Equation 3:

(3) S = N/(1 - R)

In many situations you know S and N, so you may use Equation 3 to solve for R and find R = (S - N)/S.

Watts’ Model can be used with many examples of viral marketing campaigns. Listed here are a few for which Watts listed the relevant model parameters:

■ Tom’s Petition was a 2004 petition for gun control. This petition had R = 0.58 and N = 22,582.
■ Procter and Gamble started a campaign to promote Tide Coldwater as an energy-efficient detergent. This campaign began with N near 900,000 and R = 0.041.
■ The Oxygen Network ran a campaign to raise money for Hurricane Katrina, which had N = 7,064, S = 30,608, and an amazingly large R = 0.769.

Watts’ Model shows that the initial seeding (N) and the number of new viewers generated per viewer (R) are both critical to determining the final number of video views. Watts’ Model assumes, however, that each person reached at, say, time t has never been reached before. This is unreasonable. For example, suppose there is a population of 1,000,000 people, and at the beginning of time t 800,000 people have seen the video. Then it seems highly unlikely that the NR^(t-1) new viewers that Watts’ Model generates at time t are all people who have not already seen the video. Also if R >= 1, Watts predicts an infinite number of people will see the video, and this does not make sense. In the next section you modify Watts’ Model in an attempt to resolve these issues.

A More Complex Viral Marketing Model

A revised version of Watts’ Model is in the worksheet basic of the workbook viral.xlsx (see Figure 44-1). The model requires the following inputs:

■ The population size (named pop and entered in C2). Assume that a maximum of 10 million people might see the video. Note that 1.00E+07 is scientific notation and is equivalent to 1 × 10^7 = 10,000,000.
■ The probability (given the range name prob and entered in C3) that a person who sees the video will send the video on to at least one person. Assume this probability is 0.1. Assume that everyone who is sent the video views the video. In Exercise 6, you modify this assumption.
■ If the video is sent on, the average number of people (given the range name people and entered in C4) to whom a person will send the video. Assume that on average a person will send the video to 20 people. Note that Watts’ R = prob * people. In this case R = (0.1) * 20 = 2, so Watts’ Model would predict that an infinite number of people see the video. As you will see, the revised model predicts that 7,965,382 of the 10,000,000 potential viewers will eventually see the video.
■ In cell E5 enter the number of people who are “seeded” as video viewers at the beginning of Period 1. Assume 10,000 viewers are seeded.
During each period t, the model tracks the following quantities:

■ At the start of period t, the number of people who have seen the video
■ The number of people who were newly introduced to the video during period t – 1 and are potential spreaders of the video during period t
■ The probability that a given person will receive the video during period t. Estimating this probability requires some discussion of the binomial and Poisson random variables.
■ The number of new viewers of the video who are created during period t
■ The number of people who have viewed the video by the end of period t

The model is run for 400 time periods.

Figure 44-1: Improved viral marketing model

Before explaining the formulas that underlie the model, you need to briefly consider the binomial and Poisson random variables.

The Binomial and Poisson Random Variables

The binomial random variable is used to compute probabilities in the following situation:

■ N repeated trials occur in which each trial results in success or failure.
■ The probability of success on each trial is P.
■ The trials are independent, that is, whether a given trial results in a success or failure has no effect on the result of the other N-1 trials.

The Excel BINOMDIST function can be used to compute binomial probabilities as follows:

■ Entering the formula =BINOMDIST(x, N, P, 1) in a cell computes the probability of <= x successes in N trials.
■ Entering the formula =BINOMDIST(x, N, P, 0) in a cell computes the probability of exactly x successes in N trials.
■ The mean of a binomial random variable is simply N*P.

The file BinomialandPoisson.xlsx (see Figure 44-2) illustrates the computation of binomial probabilities. Assume that 60 percent of all people are Coke drinkers and 40 percent are Pepsi drinkers. Define a success to mean that a person is a Coke drinker. In cell D4 you can use the formula =BINOMDIST(60,100,0.6,1) to compute the probability (53.8 percent) that <= 60 people in a group of 100 are Coke drinkers. In cell D5 you can use the formula =BINOMDIST(60,100,0.6,0) to compute the probability (8.1 percent) that exactly 60 people are Coke drinkers.

Figure 44-2: Illustration of binomial and Poisson probabilities

The Poisson random variable is a discrete random variable that can assume the values 0, 1, 2, …. To determine the probability that a Poisson random variable assumes a given value, all you need is the mean (call it M) of the Poisson random variable. Then the following Excel formulas can be used to compute Poisson probabilities:

■ POISSON(x, M, 1) gives the probability that the value of a Poisson random variable with mean M is ≤ x.
■ POISSON(x, M, 0) gives the probability that the value of a Poisson random variable with mean M equals x.
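For readers working outside Excel, the same probabilities can be computed directly from the binomial and Poisson formulas. The following Python sketch uses only the standard library; the function names (`binom_pmf`, `binom_cdf`, `poisson_pmf`, `poisson_cdf`) are hypothetical stand-ins chosen to mirror the exact and cumulative forms of BINOMDIST and POISSON, not Excel’s own implementation.

```python
from math import comb, exp, factorial


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Probability of at most x successes in n independent trials."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


def poisson_pmf(x, mean):
    """Probability that a Poisson random variable with the given mean equals x."""
    return exp(-mean) * mean**x / factorial(x)


def poisson_cdf(x, mean):
    """Probability that a Poisson random variable with the given mean is at most x."""
    return sum(poisson_pmf(k, mean) for k in range(x + 1))


# Coke example from the text: 100 people, 60 percent Coke drinkers.
print(f"P(at most 60 Coke drinkers) = {binom_cdf(60, 100, 0.6):.3f}")
print(f"P(exactly 60 Coke drinkers) = {binom_pmf(60, 100, 0.6):.3f}")
```

The two printed values correspond to the cumulative and exact forms of the BINOMDIST formulas in cells D4 and D5.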

The Poisson random variable is relevant in many interesting situations (particularly in queuing or waiting line models), but for your purposes, use the fact that when N is large and P is small, binomial probabilities can be well approximated by Poisson probabilities where M = NP. To illustrate this idea, assume that a teen driver has a 0.001 chance of having an accident each day. What is the chance the teen will have 0 accidents in a year? Here define a “success” on a day to be an accident. You have N = 365 and P = 0.001. In cell D9 the formula =BINOMDIST(0,365,0.001,0) computes the chance of 0 accidents (69.41 percent) in a year. Now the mean number of accidents in a year is 0.001(365) = 0.365, so using the Poisson approximation to the binomial, you can estimate the probability of 0 accidents in a year with the formula =POISSON(0,365*0.001,0). You obtain 69.42 percent, which is an accurate approximation.

Building the Model of Viral Marketing

Armed with your knowledge of the binomial and Poisson random variables, you are now ready to see how the model estimates the ultimate penetration level for a viral video. In Period 1, 10 percent of the 10,000 people will spread the video. This number is computed in cell F5 with the formula =prob*E5.

Now comes the hard part! Of the 10,000 people who have seen the video in Period 1, (.10)*10,000 = 1,000 of them will pass it on. Each of these 1,000 people sends the video to an average of 20 people, so 20,000 e-mails or text messages describing the video will be sent during Period 1. This does not mean (as Watts assumes) that 20,000 new people see the video. This is because it is possible that a single person will receive e-mails or texts about the video from several different people.

Now estimate the probability that a person will receive a video during Period 1. For a given person there is a chance 1/pop that each of the 20,000 e-mails or texts sent out during Period 1 will go to the person. Thus on average a person receives 20,000/pop e-mails during Period 1, and the chance that the person receives 0 e-mails can be approximated by =POISSON(0,F5*people/pop,TRUE). The probability that a person will receive at least 1 e-mail during Period 1 is computed in cell G5 with the formula =1-POISSON(0,F5*people/pop,TRUE).
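The Poisson approximation used here is easy to check numerically. The Python sketch below first compares the exact binomial probability of 0 accidents with its Poisson approximation for the teen driver example, and then applies the same idea to the chance of receiving at least one message. The function name `prob_receives_video` is hypothetical, and the inputs (1,000 senders, 20 messages each, a population of 10 million) are the Period 1 figures quoted in the text.

```python
from math import exp

# Teen driver: N = 365 days, P = 0.001 chance of an accident per day.
n_days, p_accident = 365, 0.001
binomial_zero = (1 - p_accident) ** n_days   # exact binomial P(0 accidents)
poisson_zero = exp(-n_days * p_accident)     # Poisson approximation, mean N*P
print(f"binomial: {binomial_zero:.4f}  Poisson: {poisson_zero:.4f}")


def prob_receives_video(senders, people, pop):
    """Chance that one person gets at least one message in a period.

    The number of messages a person receives is treated as Poisson
    with mean senders * people / pop, so P(at least 1) = 1 - P(0).
    """
    mean_messages = senders * people / pop
    return 1 - exp(-mean_messages)


# Period 1 figures from the text: 1,000 senders, 20 messages each.
p_period1 = prob_receives_video(senders=1_000, people=20, pop=10_000_000)
print(f"chance of at least one message: {p_period1:.6f}")
```

Because the mean number of messages per person is tiny, the chance of at least one message is very close to the mean itself.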

The following steps allow you to trace the evolution of the number of people who have seen the video:

1. Multiply the number of people who have not yet seen the video (pop – E5) by the probability in G5 (about 0.0002) to compute the number of new Period 1 video viewers. In cell H5 the formula =(pop-E5)*G5 computes the number (1,997.8) of new viewers of the video during Period 1.
2. In cell I5 use the formula =E5+H5 to add the 1,997.8 new viewers to the original 10,000 viewers to obtain the number of total viewers of the video (11,997.8) by the end of Period 1.
3. Copy the formula =I5 from E6 to E7:E404 to compute the number of total viewers at the beginning of each period by simply copying the ending viewers from the previous period.
4. Copy the formula =H5 from F6 to F7:F404 to list the number of people available to spread the video during each period. This number is simply the number of new viewers during the previous period.
5. Copy the formula =1-POISSON(0,F6*prob*people/pop,TRUE) from G6 to G7:G404 to apply the Poisson approximation to the binomial to compute the probability that a person who has not already seen the video will be sent the video during the current period.
6. Copy the formula =(pop-E5)*G5 from H5 to H6:H404 to compute for each period the number of new viewers of the video by multiplying the number of people who have not seen the video times the chance that each person sees the video.
7. Copy the formula =E5+H5 from I5 to I6:I404 to compute the total number of viewers to date of the video by adding previous viewers to the new viewers created during the current period.

It is estimated that 7,971,541 people will eventually see the video.

Using a Data Table to Vary R

In your new viral marketing model, prob and people impact the predicted spread of the video only through their product prob * people, which is the expected number of people to whom each new video viewer passes the video. Watts set prob * people = R. The worksheet data table of the workbook viral.xlsx varies R. Figure 44-3 shows the dependence of the final viewers on R. Note that until R exceeds 1 the video does not go viral. For example, when R = 0.8, only 49,413 people eventually see the video, while if R = 2 nearly 8 million people see the video.

Figure 44-3: Dependence of video spread on R

Summary

In this chapter you learned the following:

■ Let N = initial viewers of a video and R = new viewers generated per person by a video. Then for R >= 1, Watts’ Model predicts the eventual number of viewers will be infinite, and for R < 1, Watts’ Model predicts the video will eventually reach N/(1-R) viewers.
■ Because an infinite number of viewers is impossible, Watts’ Model is flawed. The major flaw is that it ignores the fact that many people may send the video to the same person. The more complex model (which utilizes the Poisson approximation to the binomial random variable) resolves this problem.
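The two models summarized above can be compared side by side in code. The Python sketch below is a rough stand-in for the data table in viral.xlsx, not a copy of it: the function names `watts_total` and `revised_total` are hypothetical, and the sketch assumes that all seeded viewers act as the first period’s potential spreaders, which may differ slightly from how the workbook treats Period 1. Default inputs are the values used in the chapter (pop = 10 million, prob = 0.1, people = 20, 10,000 seeds, 400 periods).

```python
from math import exp


def watts_total(seeds, r):
    """Watts' Model: total viewers S = N / (1 - R), infinite when R >= 1."""
    if r >= 1:
        return float("inf")
    return seeds / (1 - r)


def revised_total(pop=10_000_000, prob=0.1, people=20,
                  seeds=10_000, periods=400):
    """Revised model: repeat messages to the same person are not double counted."""
    seen = float(seeds)
    new_viewers = float(seeds)  # ASSUMPTION: seeds are the first spreaders
    for _ in range(periods):
        # Mean messages per person this period, then the Poisson
        # chance of receiving at least one of them.
        mean_messages = new_viewers * prob * people / pop
        p_receive = 1 - exp(-mean_messages)
        new_viewers = (pop - seen) * p_receive
        seen += new_viewers
    return seen


# Vary R = prob * people by changing `people` while holding prob at 0.1.
for r in (0.4, 0.8, 1.0, 1.5, 2.0):
    watts = watts_total(10_000, r)
    revised = revised_total(people=r / 0.1)
    print(f"R = {r}: Watts {watts:,.0f}   revised {revised:,.0f}")
```

For R below 1 the two models give similar totals because few people are reached twice; once R reaches 1 the Watts total becomes infinite while the revised total stays bounded by the population size.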

Exercises

1. Verify that for the Oxygen Network the values of N = 7,064 and S = 30,608 imply that R = 0.769.
2. Using Watts’ Model, estimate the number of new viewers generated by the Tide Coldwater campaign. Use N = 900,000 and R = 0.041.
3. Apply your new viral marketing model to the Tide Coldwater campaign.
4. Determine the dependence of the spread of the video on the population size as the population size varies between 10 and 100 million.
5. For Watts’ Model, estimate the final number of viewers for Tom’s Petition. This petition had R = 0.58 and N = 22,582.
6. Suppose a fraction F < 1 of people receiving your video view it, but a fraction 1-F do not look at the video. How does this modify Watts’ Model?

45 Text Mining

Every day Twitter handles more than 400 million tweets. Miley Cyrus’ 2013 VMA fiasco generated more than 17 million tweets! Many of these tweets comment on products, TV shows, or ads. These tweets contain a great deal of information that is valuable to marketers. For example, if you read every tweet on a Super Bowl ad, you could determine if the United States liked or hated the ad. Of course, it is impractical to read every tweet that discusses a Super Bowl ad. What is needed is a method to find all relevant tweets and then derive some marketing insights from them.

Text mining refers to the process of using statistical methods to glean useful information from unstructured text. In addition to analyzing tweets, you can use text mining to analyze Facebook and blog posts; movie, TV, and restaurant reviews; and newspaper articles. The data sets from which text mining can glean meaningful insights are virtually endless. In this chapter you gain some basic insights into the methods you can use to glean meaning from unstructured text. You also learn about some amazing applications of text mining.

In all of the book’s previous analysis of data, each data set was organized so that each row represented an observation (such as data on sales, price, and advertising during a month) and each column represented a variable of interest (sales each month, price each month, or advertising each month). One of the big challenges in text mining is to take an unstructured piece of text such as a tweet, newspaper article, or blog post and transform its contents into a spreadsheet-like format. This chapter begins by exploring some simple ways to transform text into a spreadsheet-like format. After the text has been given some structure, you may apply many techniques discussed earlier (such as Naive Bayes, neural networks, logistic regression, multiple regression, discriminant analysis, principal components, and cluster analysis) to analyze the text. The chapter concludes with a discussion of several important and interesting applications of text mining including the following:

■ Using text content of a review to predict whether a movie review was positive or negative
■ Using tweets to determine whether customers are happy with airline service
■ Using tweets to predict movie revenues
■ Using tweets to predict if the stock market will go up or down
■ Using tweets to evaluate viewer reaction to Super Bowl ads

Text Mining Definitions

Before you can understand any text mining studies, you need to master a few simple definitions:

■ A corpus is the relevant collection of documents. For example, if you want to evaluate the effectiveness of a Sofia Vergara Diet Pepsi ad, the corpus might consist of all tweets containing references to Sofia Vergara and Diet Pepsi.
■ A document consists of a list of individual words known as tokens. For example, the tweet “Love Sofia in that Diet Pepsi ad” contains seven tokens.
■ In a tweet about advertising, the words “ad” and “ads” should be treated as the same token. Stemming is the process of combining related tokens into a single token. Therefore “ad” and “ads” might be grouped together as one token: “ad.”
■ Words such as “the” appear often in text. These common words are referred to as stopwords. Stopwords give little insight into the meaning of a piece of text and slow down the processing time. Therefore, stopwords are removed. This process is known as stopping. In the Sofia Vergara tweet, stopping would remove the words “in” and “that.”
■ Sentiment analysis is an attempt to develop algorithms that can automatically classify the attitude of the text as pro or con with respect to a topic. For example, sentiment analysis has been used in attempts to mechanically classify movie and restaurant reviews as favorable or unfavorable.
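The definitions of tokens, stopping, and stemming can be made concrete with a short example. The Python sketch below is illustrative only: the stopword list is a tiny hypothetical one, the `stem` function is a crude rule that strips a plural “s” and is not a real stemming algorithm, and lowercasing the tokens is an added assumption. Statistical packages with text mining capabilities use far more complete versions of each step.

```python
# ASSUMPTION: a tiny illustrative stopword list, not a standard one.
STOPWORDS = {"in", "that", "the", "a", "is", "of", "and"}


def tokenize(text):
    """Split text into lowercase tokens, dropping surrounding punctuation."""
    tokens = (word.strip(".,!?\"'").lower() for word in text.split())
    return [t for t in tokens if t]


def stem(token):
    """Crude stemmer: treat a simple plural such as 'ads' as 'ad'."""
    if len(token) > 2 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text):
    """Apply stopping, then stemming, to the tokens of a document."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


tweet = "Love Sofia in that Diet Pepsi ad"
print(tokenize(tweet))    # the seven tokens of the tweet
print(preprocess(tweet))  # tokens left after stopping and stemming
print(stem("ads"))        # "ads" is grouped with "ad"
```

As in the chapter’s example, stopping removes “in” and “that” from the tweet, leaving five tokens.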
Giving Structure to Unstructured Text To illustrate how you can give structure to unstructured text, again consider the problem of analyzing tweets concerning Soﬁa Vergara in Diet Pepsi ads. To begin you need to use a statistical package with text mining capabilities (such as R, SAS,

Text Mining 665 SPSS, or STATISTICA) that can interface with Twitter and retrieve all tweets that are relevant. Pulling relevant tweets is not as easy as you might think. You might pull all tweets containing the tokens Soﬁa, Vergara, Diet, and Pepsi, but then you would be missing tweets such as those shown in Figure 45-1, which contain the token Soﬁa and not Vergara. Figure 45-1: Tweets on Soﬁa Vergara Diet Pepsi ad As you can see, extracting the relevant text documents is not a trivial matter. To illustrate how text mining can give structure to text, you can use the following guidelines to set the stage for an example: ■ Assume the corpus consists of N documents. ■ After all documents undergo stemming and stopping, assume a total of W words occur in the corpus. ■ After stemming and stopping is concluded for i = 1, 2, …,W and j = 1, 2, …, N, let Fij = number of times word i is included in document j. ■ For j = 1, 2, …, W deﬁne Dj = number of documents containing word j. After the corpus of relevant tweets has undergone stemming and stopping, you must create a vector representation for each document that associates a value with each of the W words occurring in the corpus. The three most common vector codings

666 Part XI: Internet and Social Marketing are binary coding, frequency coding, and the term frequency/inverse document frequency score (tf-idf for short). The three forms of coding are deﬁned as follows: NOTE Before coding each document, infrequently occurring words are often deleted. ■ The binary coding simply records whether a word is present in a document. Therefore, the binary representation for the ith word in the jth document is 1 if the ith word occurs in the jth document and is 0 if the ith word does not occur in the jth document. ■ The frequency coding simply counts for the ith word in the jth document the number of times (Fij) the ith word occurs in the jth document. ■ To motivate the term frequency/inverse document frequency score, suppose the words cat and dog each appear 10 times in a document. Assume the cor- pus consists of 100 documents, and 50 documents contain the word cat and only 10 documents contain the word dog. Despite each appearing 10 times in a document, the occurrences of dog in the document should be given more weight than the occurrences of cat because dog appears less frequently in the corpus. This is because the relative rarity of dog in the corpus makes each occurrence of dog more important than an occurrence of cat. After deﬁning Tj = number of tokens in document j, you can deﬁne the following equation: (1) tf − idf = (Fij / Tj) * log(N / Dj). The term Fij / Tj is the relative frequency of word i in document j, whereas the term log(N / Dj) is a decreasing function of the number of documents in which the word i occurs. These deﬁnitions can be illustrated by considering a corpus consisting of the following three statements about Peyton Manning (see Figure 45-2 and the Text mining coding.xlsx) ﬁle. ■ Document 1: Peyton Manning is a great quarterback. ■ Document 2: Peyton is a great passer and has a great offensive line. ■ Document 3: Peyton is the most overrated of all quarterbacks. 
After stemming and stopping is performed, the documents are transformed into the text shown in rows 10–12. Now you can work through the three vector codings of the text.
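The three codings can also be sketched in a few lines of Python, as a rough companion to the spreadsheet. This is a minimal sketch under stated assumptions, not the workbook's method: the stop list and the single stemming rule below are hypothetical choices made so that ten distinct words survive, and the names `tokenize`, `frequency`, `binary`, `doc_count`, `doc_length`, and `tf_idf` are illustrative. The logarithm is taken base 10 because Excel's LOG function defaults to base 10 when no base is given.

```python
import math

# The three Peyton Manning documents from the text.
documents = [
    "Peyton Manning is a great quarterback.",
    "Peyton is a great passer and has a great offensive line.",
    "Peyton is the most overrated of all quarterbacks.",
]

# ASSUMPTION: a hand-picked stop list and a one-entry stemming table,
# chosen for illustration only; the book's figure may use different rules.
STOP_WORDS = {"is", "a", "and", "has", "the", "of"}
STEMS = {"quarterbacks": "quarterback"}


def tokenize(text):
    """Lowercase, strip periods, apply stemming, then drop stop words."""
    words = text.lower().replace(".", "").split()
    stemmed = [STEMS.get(w, w) for w in words]
    return [w for w in stemmed if w not in STOP_WORDS]


tokens = [tokenize(d) for d in documents]
vocab = sorted({w for doc in tokens for w in doc})
N = len(documents)

# Frequency coding: Fij = times word i occurs in document j.
frequency = {w: [doc.count(w) for doc in tokens] for w in vocab}

# Binary coding: 1 if word i occurs in document j, else 0.
binary = {w: [1 if f > 0 else 0 for f in row] for w, row in frequency.items()}

# Di = number of documents containing word i; Tj = tokens in document j.
doc_count = {w: sum(binary[w]) for w in vocab}
doc_length = [len(doc) for doc in tokens]


def tf_idf(word, j):
    """Equation 1: (Fij / Tj) * log10(N / Di) for document index j (0-based)."""
    return (frequency[word][j] / doc_length[j]) * math.log10(N / doc_count[word])


if __name__ == "__main__":
    for w in vocab:
        scores = [round(tf_idf(w, j), 4) for j in range(N)]
        print(w, binary[w], frequency[w], scores)
```

Under these assumptions a word that appears in every document, such as Peyton, receives a tf-idf score of 0 because log(3/3) = 0, which is how the score discounts words that do not help distinguish one document from another.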

Figure 45-2: Examples of text mining coding

Binary Coding

Rows 15–17 show the binary coding of the three documents. You simply assign a 1 if a word occurs in a document and a 0 if a word does not appear in a document. For example, cell F16 contains a 1 because Document 2 contains the word great, and cell G16 contains a 0 because Document 2 does not contain the word most.

Frequency Coding

Rows 21–23 show the frequency coding of the three documents. You simply count the number of times the word appears in the document. For example, the word great appears twice in Document 2, so you can enter a 2 in cell F22. Because the word “most” does not occur in Document 2, you enter a 0 in cell G22.

Frequency/Inverse Document Frequency Score Coding

Copying the formula =(D21/$N21)*LOG(3/D$18) from D26 to the range D26:M28 implements Equation 1. In this formula D21 represents Fij; $N21 represents the number of words in the document; 3 is the number of documents in the corpus; and D$18 is the number of documents containing the relevant word. Note that the use of dollar signs ensures that when the formula is copied the number of words in each document and the number of documents containing the relevant word are pulled from the correct cell. The tf-idf coding attaches more significance to “overrated”

and “all” in Document 3 than to “great” in Document 2 because “great” appears in more documents than “overrated” and “all.”

Applying Text Mining in Real Life Scenarios

Now that you know how to give structure to a text document, you are ready to learn how to apply many tools developed earlier in the book to some exciting text mining applications. This section provides several examples of how text mining can be used to analyze documents and make predictions in a variety of situations.

Text Mining and Movie Reviews

In their article “Thumbs up? Sentiment Classification using Machine Learning Techniques” (Proceedings of EMNLP, 2002, pp. 79–86) Pang, Lee, and Vaithyanathan of Cornell and IBM applied Naive Bayes (see Chapter 39, “Classification Algorithms: Naive Bayes Classifier and Discriminant Analysis”) and other techniques to 2,053 movie reviews in an attempt to use the text content of a review as the input into mechanical classification of a review as positive or negative.

The authors went about this process by first converting the number of stars a reviewer gave the movie to a positive, negative, or neutral rating. They then applied frequency and binary coding to each review. You would expect that reviews containing words such as brilliant, dazzling, and excellent would be favorable, whereas reviews that contained words such as bad, stupid, and boring would be negative.

They also used Naive Bayes (see Chapter 39) based on binary coding in an attempt to classify each review. In terms of the Chapter 39 notation, C1 = Positive Review, C2 = Negative Review, and C3 = Neutral Review. The n attributes X1, X2, …, Xn correspond to whether a given word is present in the review. For example, if X1 represents the word brilliant, then you would expect P(C1 | X1 = 1) to be large and P(C2 | X1 = 1) to be small.
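The idea of Naive Bayes on binary-coded reviews can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the four training reviews and the four-word vocabulary are invented, only two classes are used for brevity, and add-one (Laplace) smoothing is an assumption that the text does not mention. The function names `binary_code`, `train`, and `posterior` are hypothetical.

```python
import math

# HYPOTHETICAL toy data, invented for illustration (not from the study).
VOCAB = ["brilliant", "excellent", "bad", "boring"]
TRAIN = [
    ("a brilliant and excellent film", "positive"),
    ("excellent cast and brilliant script", "positive"),
    ("bad plot and boring pacing", "negative"),
    ("a boring film", "negative"),
]


def binary_code(text):
    """Binary coding: Xk = 1 if vocabulary word k appears in the review."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in VOCAB]


def train(examples):
    """Estimate P(C) and P(Xk = 1 | C) from binary-coded reviews."""
    classes = sorted({label for _, label in examples})
    prior, cond = {}, {}
    for c in classes:
        rows = [binary_code(text) for text, label in examples if label == c]
        prior[c] = len(rows) / len(examples)
        # ASSUMPTION: add-one smoothing so no probability is exactly 0 or 1.
        cond[c] = [
            (sum(row[k] for row in rows) + 1) / (len(rows) + 2)
            for k in range(len(VOCAB))
        ]
    return prior, cond


def posterior(text, prior, cond):
    """Return P(C | X) for each class, treating the words as independent."""
    x = binary_code(text)
    log_score = {}
    for c in prior:
        score = math.log(prior[c])
        for k, present in enumerate(x):
            p = cond[c][k]
            score += math.log(p if present else 1 - p)
        log_score[c] = score
    # Normalize the scores so they sum to 1.
    top = max(log_score.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_score.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


PRIOR, COND = train(TRAIN)

if __name__ == "__main__":
    print(posterior("a brilliant story", PRIOR, COND))
```

With this toy data, a review containing brilliant and none of the other vocabulary words receives a higher posterior probability for the positive class than for the negative class, matching the expectation that P(C1 | X1 = 1) is large.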
The authors then used machine learning (a generalization of the neural networks used in Chapter 15, “Using Neural Networks to Forecast Sales”) to classify each review using the frequency coding. Surprisingly, the authors found that the simple Naive Bayes approach correctly classified 81 percent of all reviews, whereas the more sophisticated machine learning approach did only marginally better than Naive Bayes with an 82-percent correct classification rate.
