Data Mining

  • Most Topular Stories

  • FitBit: A great product with an even better website

    Data Mining: Text Mining, Visualization and Social Media
    Matthew Hurst
    29 Dec 2013 | 10:36 am
    Briefly - Wakako gave me (actually us) a FitBit for Christmas. This is a great product if you are (like me) motivated by data to take action. While I appreciate the device design (small but functional), I really like the thought that has gone into the data presentation in the dashboard. The displays of the key variables are clean and yet subtle enough to reward interaction by revealing additional dimensions.
  • GrubHub's Phasmid Websites

    Data Mining: Text Mining, Visualization and Social Media
    Matthew Hurst
    3 May 2014 | 9:49 pm
    The rationale behind mining business data directly from the business's own website is that the business has a clear economic motivation to ensure that the data is up to date. If you own a restaurant that changes location, and your website still publishes the former address, those potential customers who visit your site will not be enjoying your delicious offerings. For the web mining proposition to work, it is important to firstly know that you have in your hand a genuine business website and secondly, to have excellent extraction and inference technology to pull the required…
  • The truth about product registration cards

    Data mining News
    23 Jul 2014 | 2:24 pm
    … . It’s a tactic called data mining, the harvesting of personal information …
  • Bing hearts World Cup 2014, Google - not so much

    Data Mining: Text Mining, Visualization and Social Media
    Matthew Hurst
    12 Jul 2014 | 12:19 pm
    While Google has been doing a great job with their front page animations (today's is very nice, illustrating how Brazil and The Netherlands are on their way to Russia for 2018), Bing appears to be far more attentive to actually answering questions about the competition. For example, Google's answer brings up some interesting news articles, but Bing brings up stats on the teams and even a prediction of who will win (Cortana - which is driving these predictions - has been doing a perfect job of predicting game outcomes).
  • Apache Hive brings real-time queries to Hadoop

    Computerworld BI and Analytics News
    23 Jul 2014 | 6:23 am
    Hive's SQL-like query language and vastly improved speed on huge data sets make it the perfect partner for an enterprise data warehouse
 

    Data Mining: Text Mining, Visualization and Social Media

  • Bing hearts World Cup 2014, Google - not so much

    Matthew Hurst
    12 Jul 2014 | 12:19 pm
    While Google has been doing a great job with their front page animations (today's is very nice, illustrating how Brazil and The Netherlands are on their way to Russia for 2018), Bing appears to be far more attentive to actually answering questions about the competition. For example, Google's answer brings up some interesting news articles, but Bing brings up stats on the teams and even a prediction of who will win (Cortana - which is driving these predictions - has been doing a perfect job of predicting game outcomes).
  • GrubHub's Phasmid Websites

    Matthew Hurst
    3 May 2014 | 9:49 pm
    The rationale behind mining business data directly from the business's own website is that the business has a clear economic motivation to ensure that the data is up to date. If you own a restaurant that changes location, and your website still publishes the former address, those potential customers who visit your site will not be enjoying your delicious offerings. For the web mining proposition to work, it is important to firstly know that you have in your hand a genuine business website and secondly, to have excellent extraction and inference technology to pull the required…
  • Hopper - new in the travel space

    Matthew Hurst
    19 Jan 2014 | 11:24 am
    Briefly - Hopper is something new in the travel  / local space. In their own words: What if you could plan an amazing trip based on a vague idea — like “spring surfing in California” or “Mediterranean cruise”? What if logistical information popped up right when you needed it, so you wouldn't have to spend hours on research? This is our vision: to make planning a trip an effortless extension of discovering and exploring new places. We spent several years experimenting with different tools, technology and algorithms to collect, organize and manage massive amounts of…
  • FitBit: A great product with an even better website

    Matthew Hurst
    29 Dec 2013 | 10:36 am
    Briefly - Wakako gave me (actually us) a FitBit for Christmas. This is a great product if you are (like me) motivated by data to take action. While I appreciate the device design (small but functional), I really like the thought that has gone into the data presentation in the dashboard. The displays of the key variables are clean and yet subtle enough to reward interaction by revealing additional dimensions.
  • Review: Information is Beautiful by David McCandless

    Matthew Hurst
    28 Dec 2013 | 5:23 pm
    Information is Beautiful is a thought-provoking labour of love by one of the first true data journalists, David McCandless. It is a simply structured collection of graphical interpretations of a variety of interesting statistics, factoids and opinions. It is compelling in its ability to provoke exclamations of surprise at the relationships between facts (e.g. the financial crisis costing us almost four times more than the expected total cost of the west's adventurism in Iraq and Afghanistan) as well as generating respect for the creativity and design that has gone into presenting the…
 

    The Numbers

  • Restaurant Jobs Take Off, Even as Pay for Servers Stalls

    Rani Molla
    23 Jul 2014 | 8:50 am
    As states debate the minimum wage, tipped workers are going to be a big part of the conversation. That's because there are more of them than ever, and their pay isn't keeping up.
  • After Years of Falling, Highway Deaths Tick Up

    Jessica Sparks
    23 Jul 2014 | 7:07 am
    The number of registered vehicles in the U.S. is on the rise, and so are vehicle-related fatalities, according to data from the U.S. Department of Transportation's Federal Highway Administration.
  • The Foreclosure Fade, and What it Means for the Housing Market

    Nick Timiraos
    22 Jul 2014 | 2:30 pm
    The U.S. housing market appears to be finding its footing after a sharp rise in mortgage rates last summer, on top of some big price gains, deflated sales.
  • When Cancer Is More Common Than Experts Thought

    Jennifer Levitz
    22 Jul 2014 | 12:22 pm
    The chances of a presumed “fibroid” being cancerous had long been considered extremely remote. But in recent years, a growing number of doctors have suggested in studies and formal academic discussions that the medical community was using the wrong denominator to assess risk.
  • Breakfast Sandwiches: Balancing Protein and Calories

    Rani Molla
    22 Jul 2014 | 11:30 am
    As food companies look to cash in on the decline of cereal sales, expect protein-rich foods to headline their fare.

    natural language processing blog

  • My ACL 2014 picks...

    5 Jul 2014 | 9:22 am
    Usual caveats: didn't see all papers, blah blah blah. Also look for #acl14nlp on Twitter -- lots of papers were mentioned there too! A Tabular Method for Dynamic Oracles in Transition-Based Parsing; Yoav Goldberg, Francesco Sartorio, Giorgio Satta. Joakim Nivre, Ryan McDonald and I tried searnifying MaltParser back in 2007 and never got it to work. Perhaps this is because we didn't have dynamic oracles and we thought that a silly approximate oracle would be good enough. Guess not. Yoav, Francesco and Giorgio have a nice technique for efficiently computing the best possible-to-achieve dependency…
  • Divergences passed through Bayes' rule

    30 Jun 2014 | 8:30 am
    In a previous post's comments, we talked about Bayes' rule and things like that. This got me wondering about the following question: If we know p(A) and p(B|A), we can reconstruct p(A|B) perfectly by Bayes' rule. What if we only have estimates of p(A) and p(B|A)? How does the quality of the reconstruction of p(A|B) vary as a function of the quality of the estimates of the marginal and conditional? I feel like there have to be results along these lines, but I was unable to find them. My next attempt was to prove something, which failed miserably after a few hours. So, as a good empiricist…
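To make the question in the snippet above concrete, here is a minimal Python sketch of one way to probe it empirically. It is not the experiment from the original post (the snippet is truncated before any method is described). The noise model, the total-variation distance, and every function name below are assumptions chosen only for illustration.

```python
import random


def posterior(prior, likelihood):
    """Apply Bayes' rule for discrete A and B.

    prior[a] is p(A=a); likelihood[a][b] is p(B=b | A=a).
    Returns post, where post[b][a] is p(A=a | B=b).
    """
    n_a = len(prior)
    n_b = len(likelihood[0])
    post = []
    for b in range(n_b):
        joint = [prior[a] * likelihood[a][b] for a in range(n_a)]
        evidence = sum(joint)  # p(B=b)
        post.append([j / evidence for j in joint])
    return post


def perturb(dist, eps, rng):
    """Hypothetical noise model: add uniform noise in [-eps, eps], renormalize."""
    noisy = [max(p + rng.uniform(-eps, eps), 1e-12) for p in dist]
    total = sum(noisy)
    return [p / total for p in noisy]


def total_variation(p, q):
    """Total-variation distance between two discrete distributions."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def max_posterior_error(prior, likelihood, eps, seed=0):
    """Worst-case (over b) distance between true and estimated p(A | B=b)."""
    rng = random.Random(seed)
    est_prior = perturb(prior, eps, rng)
    est_lik = [perturb(row, eps, rng) for row in likelihood]
    true_post = posterior(prior, likelihood)
    est_post = posterior(est_prior, est_lik)
    return max(total_variation(t, e) for t, e in zip(true_post, est_post))


if __name__ == "__main__":
    prior = [0.3, 0.7]                     # made-up p(A)
    likelihood = [[0.9, 0.1], [0.2, 0.8]]  # made-up p(B | A)
    for eps in (0.0, 0.01, 0.05, 0.1):
        print(eps, max_posterior_error(prior, likelihood, eps))
```

Sweeping `eps` and averaging over many seeds would give a rough picture of how the error in the estimates carries over to the reconstructed p(A|B). A different divergence, such as KL, could be swapped in for `total_variation`.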
  • Role models

    2 Jun 2014 | 10:52 am
    During grad school, my advisor suggested I identify a recent grad who has been, to me, successful. I could then use him or her as a guide. I picked someone (he now knows who he is), and the exercise was useful: there are lots of ways to be successful in research land, and this helped me focus. RST-relation=Topic-Shift. I'm fairly serious about yoga. I've had a lot of instructors over the years and noticed a high correlation between InstructorILike and InstructorWhoIsMale. Initially I believed this was because male instructors pushed more, and that worked for me. Over time I realized that was…
  • Past tense is not past tense

    30 May 2014 | 12:31 pm
    I took part in a wonderful Dagstuhl workshop this past February on translating morphologically rich languages. (Yeah, I also don't really know why I was invited :P.) But many thanks to Alex, Kevin, Philipp, Helmut and Hans for inviting me. I had a realization during this workshop that I thought I'd share. It's obvious in retrospect, and perhaps in front-spect for many of you. Much of this came up in the discussion with Bonnie Webber, Marion Weller, Martin Volk, Marine Carpuat, Jörg Tiedemann and Maja Popovic, and Maja deserves much credit for her awesome error analysis tool that helped shed…
  • Perplexity versus error rate for language modeling

    16 May 2014 | 2:54 pm
    It's fair to say that perplexity is the de facto standard for evaluating language models. Perplexity comes under the usual attacks (what does it mean? does it correlate with something we care about? etc.), but here I want to attack it for a more pernicious reason: it locks us in to probabilistic models. Background: Language modeling---or more specifically, history-based language modeling (as opposed to full sentence models)---is the task of predicting the next word in a text given the previous words. For instance, given the history "Mary likes her coffee with milk and", a good language model…
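The two quantities in that post's title can be written down in a few lines. The Python sketch below is an illustration under stated assumptions, not the post's own setup: the add-alpha bigram model, the toy corpus, and all function names are hypothetical. Perplexity is computed as the exponential of the average negative log-probability of each next word. Error rate is the fraction of positions where the model's single best guess is wrong.

```python
import math
from collections import Counter


def train_bigram(tokens, vocab, alpha=1.0):
    """Return prob(word, prev) = p(word | prev) with add-alpha smoothing."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    v = len(vocab)

    def prob(word, prev):
        return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * v)

    return prob


def evaluate(prob, tokens, vocab):
    """Return (perplexity, error_rate) over next-word predictions in tokens."""
    log_prob = 0.0
    errors = 0
    n = len(tokens) - 1  # number of predicted positions
    ordered = sorted(vocab)  # fixed order so ties break deterministically
    for prev, word in zip(tokens, tokens[1:]):
        log_prob += math.log(prob(word, prev))
        guess = max(ordered, key=lambda w: prob(w, prev))
        errors += guess != word
    perplexity = math.exp(-log_prob / n)
    return perplexity, errors / n


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    train = "the cat sat on the mat the cat ate".split()
    test = "the cat sat on the mat".split()
    vocab = set(train) | set(test)
    model = train_bigram(train, vocab)
    print(evaluate(model, test, vocab))
```

Only the perplexity branch needs `prob` to return normalized probabilities. The error-rate branch needs only a ranking over candidate words, so it can also score models that are not probabilistic.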
 

    Kevin Hillstrom: MineThatData

  • What Has Changed Since "Hillstrom's Database Marketing"?

    Kevin Hillstrom
    22 Jul 2014 | 8:15 pm
    We're going to take a few days, and talk about how times have changed, through the eyes of books and booklets. This book was first, folks (click here). It took about six months to write the thing. Don Libey graciously published the book. The book was essentially a retrospective of the work I'd done in my last year at Lands' End, and covered a ton of the work I'd done at Eddie Bauer. To promote the book, I started a blog, a place where I'd write on a daily basis. In March of 2006, the first post was published. I was thrilled, months later, when a post would go viral and I'd see ten or fifteen…
  • Forrester Research Annual Report

    Kevin Hillstrom
    21 Jul 2014 | 8:15 pm
    If you're going to read retail annual reports, then you should read research brand annual reports as well. Click here to take a peek at the Forrester Research 2013 Annual Report (or click here). A few tidbits for you. Client retention dropped from 80% in 2011 to 77% in 2012 to 73% in 2013. Net income is at a five year low. Cash is at a five year low. It costs Forrester $39 to produce $100 of services. It costs Forrester $36 to sell and market products and services. Is your ad-to-sales ratio 36%? It's really tough to generate profit at a 36% ad-to-sales ratio. Pre-tax profit dropped from…
  • Elevating Bread And Butter

    Kevin Hillstrom
    20 Jul 2014 | 8:15 pm
    Have you had a chance to watch this video (click here)? What does bread and butter have to do with your business? EVERYTHING!! Do me a favor. Go take a walk through your merchandising department, and identify the one person who possesses this level of passion. Once identified, give that person an opportunity to represent your brand. Think carefully how the message in the video, about bread and butter for crying out loud, compares with the message you see below.
  • Maybe The Best Article I've Read In 2014

    Kevin Hillstrom
    17 Jul 2014 | 8:15 pm
    Click here to read this article about restaurants and mobile. Seriously. Do it now. Use the comments section to describe how you'd fix this challenge. Think carefully how this change in behavior impacts your business.
  • Creative

    Kevin Hillstrom
    16 Jul 2014 | 8:15 pm
    In an omnichannel world, you are supposed to align all channels with beautiful creative, impressive campaigns, and robust technology. You're supposed to tear down all silos (though the vendors who tell you to do this still seem to have siloed sales teams, don't they), and you're supposed to provide a 360 degree view of your business to the customer. Few, if anybody, talks about merchandise and creative. Your creative presentation says a lot about your business. If you were a 27 year old looking for a wig, and you end up on the Paula Young website, you're more than likely to be presented with a…

    TIBCO Spotfire's Trends and Outliers

  • How Big Data and Analytics Can Reshape Healthcare

    Spotfire Blogging Team
    22 Jul 2014 | 5:55 am
    The U.S. healthcare system continues to undergo dramatic changes when it comes to how consumers can obtain insurance as well as the growing role of individuals in plotting their own treatments. Big data is expected to play an increasingly significant role in treatment analysis as well as enable healthcare insurers to cater to individual consumers. For instance, the use of analytics can help healthcare providers and payers better predict who needs care and when, according to an article in The Washington Post. “The same way that shopping Web sites can predict what you want to buy, healthcare…
  • Using Analytics to Prioritize Spending on Cybersecurity

    Spotfire Blogging Team
    21 Jul 2014 | 5:55 am
    Spending on cybersecurity is expected to rise this year as a growing number of organizations shift spending from defensive-minded approaches to detection and mitigation of cyberattacks and data breaches. Nearly 70 percent of CIOs expect security spending to represent one of the top segments to gain share of overall IT spending as “security continues to take dollars from other categories,” according to a survey of 101 CIOs conducted by UBS AG. The Wall St. Journal reports that the perception among CIOs about security has changed over the last 12 to 18 months as corporate have…
  • Why Big Data is the New Black in Entertainment

    Spotfire Blogging Team
    17 Jul 2014 | 5:55 am
    Programs such as “Orange is the New Black” and “House of Cards” from Netflix have come together in large part because of the media company’s data-driven understanding of its viewers and its ability to determine the type of content they want. Netflix and other broadcasters are increasingly relying on big data and analytics to strengthen their programming development to ensure they’re creating the right content, and using the actors, writers, and directors to win large audiences. For its part, Netflix is making a major push into European markets through 2014.
  • Analytics to Drive Customer Satisfaction, Boost Revenue

    Spotfire Blogging Team
    16 Jul 2014 | 5:55 am
    Companies that use analytics to bolster their customer service operations realize better results when it comes to customer satisfaction, operational efficiency and financial performance compared to their counterparts that do not use analytics this way. That’s according to a recent research report from Aberdeen Group that surveyed 233 organizations about their customer experience management programs. Those companies that use customer service analytics are 39 percent more satisfied with their ability to make service decisions driven by data compared to those companies that do not use…
  • How Agile BI Improves Your Company’s Responsiveness to Change

    Spotfire Blogging Team
    15 Jul 2014 | 5:55 am
    Seventy percent of the companies that were listed on the Fortune 1000 just 10 years ago have disappeared. The reason? They’re victims of change – casualties of the digital disruption as well as their inability to anticipate and mitigate risk, says Forrester VP and Principal Analyst Craig Le Clair. But the fact is, these threats could impact any company in any industry. The antidote to these disruptions is business agility, according to Le Clair. There are different types of business agility that companies can achieve through the use of analytics. One is market agility – the use of…
 

    PolicyMap

  • Exploring Foreclosure data in Chicago

    Adam Kurstin
    23 Jul 2014 | 9:08 am
    Chicago’s Englewood neighborhood has been struggling with depopulation, violent crime, and a host of other urban ailments for decades. This summer, the City of Chicago is attempting to implement an intriguing new strategy to stem these long trends towards neglect by leveraging one of the neighborhood’s most valuable assets, homeowners. Chicago plans to sell city-owned properties for $1 to local homeowners who already have a stake in their communities. Homeowners who live in the community would presumably be willing to invest their money and efforts to improve their neighborhood. The…
  • PolicyMap Named Data Wizards/Ninjas/Unicorns/Whatevs by Wonkblog!

    Bernie Langer
    21 Jul 2014 | 12:07 pm
    It was a Friday afternoon like any other, until #NameThatData came along. Christopher Ingraham at The Washington Post’s Wonkblog posted a map of the United States with data, without saying what the data was. The contest was to see who could correctly name the data on the map. Spoiler alert: We won. When we saw the contest, we sprang into action. How could we not? We started with some quick guesses. The dense arc through the south suggested African American population. But then what’s going on in New England? Could be obesity. But then Colorado should look better. Interestingly, our…
  • Map NSP Target Areas On PolicyMap

    Katie Nelson
    21 Jul 2014 | 6:20 am
    The Neighborhood Stabilization Program (NSP) is a federal program that provides assistance to state and local governments to acquire and redevelop foreclosed and abandoned properties that might otherwise become sources of blight to their communities. As a part of the program, grantees picked target areas in which to focus their efforts. The criteria for identifying target areas were very specific and many grantees turned to PolicyMap for the data to complete their applications. NSP-approved target areas for communities throughout the country are now available on PolicyMap. This update may be…
  • Mapping SBA-Approved Microlenders

    Morgan Robinson
    17 Jul 2014 | 6:28 am
    Microlending is the practice of providing small loans to low-income people to start small businesses. Grameen Bank, founded in Bangladesh in 1983, was a pioneer in microfinance, generally providing small loans to the rural poor. While the practice has become extremely popular globally, microfinance has only recently joined the arsenal of financing options for small business owners and would-be entrepreneurs in the U.S. Microlending in the United States got an official boost in 2009, with the passage of the American Recovery and Reinvestment Act. Through ARRA, the Small Business Administration…
  • 2014 CRA Eligibility Status Updated on PMap!

    Kristin Crandall
    16 Jul 2014 | 5:51 am
    The Community Reinvestment Act (CRA) was passed by Congress in 1977 to encourage banks to extend credit to low- and moderate-income Americans. The Act was a response to redlining, a common practice involving systematically denying credit or increasing the costs of banking services to communities based on income, race or other discrimination. CRA requires that financial institutions undergo periodic evaluations to determine whether they are meeting the credit needs of the communities in which they operate, including low- and moderate-income neighborhoods. Tracts are CRA eligible if they are…

    Revolutions

  • magrittr: Simplifying R code with pipes

    David Smith
    23 Jul 2014 | 3:09 pm
    R is a functional language, which means that your code often contains a lot of parentheses. And complex code often means nesting those parentheses together, which makes code hard to read and understand. But there's a very handy R package, magrittr by Stefan Milton Bache, which lets you transform nested function calls into a simple pipeline of operations that's easier to write and understand. Hadley Wickham's dplyr package benefits from the %>% pipeline operator provided by magrittr. Hadley showed at useR! 2014 an example of a data transformation operation using…
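The contrast between nested calls and a pipeline can be sketched outside R as well. The Python below is only an analogy to the idea in the snippet above and is not magrittr's API. The `pipe` helper and the sample data are hypothetical and exist only to show the same computation written inside-out and then left to right.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed value through each function in steps, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


words = ["pipes", "make", "nested", "calls", "readable"]

# Nested form: has to be read from the innermost call outwards.
nested = sum(sorted(map(len, words), reverse=True)[:3])

# Pipelined form: each step reads in the order it runs.
piped = pipe(
    words,
    lambda ws: map(len, ws),              # word lengths
    lambda ls: sorted(ls, reverse=True),  # longest first
    lambda ls: ls[:3],                    # keep the top three
    sum,                                  # add them up
)

if __name__ == "__main__":
    print(nested, piped)
```

Both forms compute the same value. The pipelined version trades a little ceremony (the lambdas) for a top-to-bottom reading order, which is the readability gain the snippet attributes to the `%>%` operator.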
  • How to Choose an R-Trainer?

    Joseph Rickert
    22 Jul 2014 | 8:30 am
    by James Peruvankal There are plenty of options if you want to learn R and are looking for training: your college’s statistics department, massive open online courses like Coursera, Udacity, edX, Datacamp etc. SiliconANGLE recently published an article about top R-training companies. Let’s talk about how to choose a good R-trainer. First and foremost is technical competency in R - In addition to having done a significant amount of R programming, the instructor should have an education in a quantitative field. The idea behind this is that the instructor will have had experience…
  • There's no mistake in the barley data

    David Smith
    21 Jul 2014 | 12:09 pm
    Statistics has many canonical data sets. For classification statistics, we have Fisher's iris data. For Big Data statistics, the canonical data set used in many examples is the Airlines data. And for dotplots, we have the barley data, first popularized by Bill Cleveland in the landmark 1993 text Visualizing Data. Cleveland's innovations in data visualization were hugely influential in the S language and (later) R's lattice and ggplot2 packages, and the panel chart of the barley data shown below is one of the best known. The chart above shows the yields for several…
  • Because it's Friday: Word Crimes

    David Smith
    18 Jul 2014 | 11:12 am
    This blog has its fair share of typos and homophones, I know. There's always room for more proofreading. (And don't get me started on the inconsistent use of "favorite" and "favourite" — my spelling locus is drifting somewhere in the mid-Pacific these days.) But I am a bit of a grammar nerd, so I appreciate Weird Al Yankovic's attempt to get the social media set to use words proper-like (and also get off my lawn!). Any song that uses "conjugate" in the grammatical sense gets my endorsement. (And although I've spent more time than usual…
  • July 22: Applications in R Webinar

    David Smith
    18 Jul 2014 | 9:12 am
    Just a quick heads-up that I'll be presenting with Neera Talbert (VP Professional Services, Revolution Analytics) in a free webinar on Tuesday, July 22 on Applications in R: Success and Lessons Learned from the Marketplace. I'll describe several R applications from well-known companies (some of which can be seen in the presentation I gave at the China R User Conference), and Neera will present a few case studies of how the Revolution Analytics consulting group has helped companies using R in areas such as supply chain analytics, sensor data analysis, and R package validation and…

    iTrend Blog

  • 5 new Bitcoin facts that may surprise you

    Michael Alatortsev
    10 Jul 2014 | 10:50 am
    1. Russia had previously declared Bitcoin illegal. It has just recently softened its stance, and, judging from the prevalence of Russian language tweets in our Bitcoin data sets, the Russians are now all over the cryptocurrency. Based on volume alone, they are now dominating #bitcoin social media conversations. 2. New cryptocurrencies are continuing to emerge; the latest example is Latium - claiming to be the first and only cryptocurrency network (no mining required). 3. Dogecoin is dead. Wow, really. 4. Snoop Dogg’s comment about Bitcoin remains the highest retweeted comment…
  • sneak preview of iTrend 2.0 #analytics – new UI, new insights

    iTrend LLC
    8 Jul 2014 | 9:26 am
    We are testing the latest version of our social analytics platform. It offers tons of new functionality: multi-language support, with ability to split social data by language; global maps, with several different views; improved filtering; brand-new NLP capabilities (the system can understand what people are talking about); additional ways to combine social with other data sources. Plus, it is: super fast; more affordable than Salesforce Marketing Cloud, Sysomos, etc.; more flexible than any leading tool; customizable (talk to us about your specific requirements today). If you are interested in…
  • Comprehensive analysis of 273,000 #AmazonCart tweets

    iTrend LLC
    23 May 2014 | 8:43 am
    May 28 2014 update: 273,000 tweets were analyzed. Updated Top Selling items are shown below. Please note: we can only track products being added to cart, we don’t have access to actual checkout transactions (unless people choose to share their purchase on Twitter upon checkout – which some do).  Not all ‘sales’ mentioned below have been taken through checkout process.   Top #AmazonCart sellers, by number of items sold: Top #AmazonCart sellers, by total sales value:   We posted some preliminary data when the new feature went live on May 5 2014.  Two weeks…
  • iTrend Build 1984 Release Notes

    iTrend LLC
    15 May 2014 | 8:14 am
    You may have noticed the new build number at the bottom of iTrend’s login page. We’ve been implementing a number of enhancements based on feedback from our TechCrunch Disrupt NY 2014 presentations. What’s new in version 1984: improved algorithm for fetching and displaying ‘Software’ clients, you will see more product icons; FIXED width display bug in two types of reports: ‘Retweets’ and ‘Tweets’; improved PDF exporting/print capabilities in qualifying subscription plans; improved display/refresh UX in ‘Conversations’…
  • demonstrating iTrend at TechCrunch Disrupt NY 2014

    iTrend LLC
    12 May 2014 | 7:19 am
    iTrend Cinevent Interview – thanks to Stephanie Pelletier at SPELLNET for putting this together. Some highlights from Day 1:  
 