Data Mining

  • Most Topular Stories

  • Mining data to make government more efficient

    Data mining News
    30 Sep 2014 | 10:37 pm
    … one innovative approach to such data mining, Indiana Gov. Mike Pence’s …
  • The Longform Manifesto

    Data Mining: Text Mining, Visualization and Social Media
    Matthew Hurst
    25 Sep 2014 | 10:37 pm
    Sometimes a title for a blog post suggests itself to me which seems so self-contained that it takes real effort to actually write the post ('Machine Intelligence, not Machine Learning is the Next Big Thing' is another in this line). The idea behind the (or a) Longform Manifesto is as follows. I have become aware of late of the sense of deterioration that is associated with the mobile 'revolution' and the info snacking, casual gaming and interrupt-driven lifestyle that it has entailed. The behaviours are perfectly illustrated in this scene from Portlandia: With a daughter…
  • 5 Reasons New Houses Are Still Getting Bigger

    The Numbers
    Rani Molla
    30 Sep 2014 | 7:37 am
    With America's aging, "empty nester" population, increasing environmental concerns and smaller household sizes, you might think the U.S. would want to build smaller houses. We haven't. Here are five reasons houses are still getting bigger.
  • AMR: Not semantics, but close (? maybe ???)

    natural language processing blog
    27 Sep 2014 | 9:00 am
    Okay, necessary warning. I'm not a semanticist. I'm not even a linguist. Last time I took semantics was twelve years ago (sigh.) Like a lot of people, I've been excited about AMR (the "Abstract Meaning Representation") recently. It's hard not to get excited. Semantics is all the rage. And there are those crazy people out there who think you can cram the meaning of a sentence into a !#$* vector [1], so the part of me that likes Language likes anything that has interesting structure and calls itself "Meaning." I effluviated about AMR in the context of the (awesome) SemEval panel. There is an LREC…
  • Catalog vs. E-Commerce Annual Repurchase Rates

    Kevin Hillstrom: MineThatData
    Kevin Hillstrom
    30 Sep 2014 | 8:15 pm
    There are two issues that persist in the work I perform for clients. One, of course, is Merchandise ... a nearly forgotten concept in modern marketing. 24 of the past 29 Merchandise Forensics projects illustrated a merchandising challenge that is holding back the business. The other issue, of course, centers around a fundamental misunderstanding about customer purchasing habits. Look at this graph. Here, we measure the annual repurchase rate of customers as the customer moves from a first-to-second purchase, then a second-to-third purchase, and so forth. The blue bars represent total repurchase…
 

    Data Mining: Text Mining, Visualization and Social Media

  • The Longform Manifesto

    Matthew Hurst
    25 Sep 2014 | 10:37 pm
    Sometimes a title for a blog post suggests itself to me which seems so self-contained that it takes real effort to actually write the post ('Machine Intelligence, not Machine Learning is the Next Big Thing' is another in this line). The idea behind the (or a) Longform Manifesto is as follows. I have become aware of late of the sense of deterioration that is associated with the mobile 'revolution' and the info snacking, casual gaming and interrupt-driven lifestyle that it has entailed. The behaviours are perfectly illustrated in this scene from Portlandia: With a daughter…
  • Scottish Independence : Bing Predicts 'No'

    Matthew Hurst
    18 Sep 2014 | 9:58 am
    Bing's prediction team has a feature live on the site right now that predicts Scotland will not become an independent nation as a result of today's referendum.
  • Bing hearts World Cup 2014, Google - not so much

    Matthew Hurst
    12 Jul 2014 | 12:19 pm
    While Google has been doing a great job of their front page animations (today's is very nice, illustrating how Brazil and The Netherlands are on their way to Russia for 2018), Bing appears to be far more attentive to actually answering questions about the competition. For example: compared to Bing's, Google's answer brings up some interesting news articles, but Bing brings up stats on the teams and even a prediction of who will win (Cortana - which is driving these predictions - has been doing a perfect job of predicting game outcomes).
  • GrubHub's Phasmid Websites

    Matthew Hurst
    3 May 2014 | 9:49 pm
    The rationale behind mining business data directly from the business's own website is that the business has a clear economic motivation to ensure that the data is up to date. If you own a restaurant that changes location, and your website still publishes the former address, those potential customers who visit your site will not be enjoying your delicious offerings. For the web mining proposition to work, it is important to firstly know that you have in your hand a genuine business website and secondly, to have excellent extraction and inference technology to pull the required…
  • Hopper - new in the travel space

    Matthew Hurst
    19 Jan 2014 | 11:24 am
    Briefly - Hopper is something new in the travel  / local space. In their own words: What if you could plan an amazing trip based on a vague idea — like “spring surfing in California” or “Mediterranean cruise”? What if logistical information popped up right when you needed it, so you wouldn't have to spend hours on research? This is our vision: to make planning a trip an effortless extension of discovering and exploring new places. We spent several years experimenting with different tools, technology and algorithms to collect, organize and manage massive amounts of…

    The Numbers

  • 5 Reasons New Houses Are Still Getting Bigger

    Rani Molla
    30 Sep 2014 | 7:37 am
    With America's aging, "empty nester" population, increasing environmental concerns and smaller household sizes, you might think the U.S. would want to build smaller houses. We haven't. Here are five reasons houses are still getting bigger.
  • How Much of World’s Greenhouse-Gas Emissions Come From Agriculture?

    Rani Molla
    29 Sep 2014 | 10:41 am
    Agriculture might seem green by definition, but farming accounts for a lot of greenhouse-gas emissions when the entire food production system is taken into account.
  • How College Football Teams Choose Opponents

    Jo Craven McGinty
    26 Sep 2014 | 10:47 am
    College football teams, which are divided into conferences loosely by geographic region, play 12 games each season. The conferences arrange eight or nine games for each of their member teams, and those tend to be the tougher contests. The remaining three or four games (the split varies by conference) are set up by the individual teams, who select opponents at their own discretion.
  • Have College Football Games Become More Lopsided?

    Tom McGinty
    26 Sep 2014 | 10:24 am
    Analysis of 40 years of college football games in which top-ranked programs played teams outside their conferences shows that the elite teams have been winning those games, which they schedule themselves, by wider and wider margins.
  • Marriage and Divorce, Ebola and College Football (Statshot)

    David Goldenberg
    26 Sep 2014 | 7:39 am
    College graduates are more likely to get married--and stay that way, Ebola kills roughly half of the people diagnosed with it, and college quarterbacks throw many more passes now.
 

    natural language processing blog

  • AMR: Not semantics, but close (? maybe ???)

    27 Sep 2014 | 9:00 am
    Okay, necessary warning. I'm not a semanticist. I'm not even a linguist. Last time I took semantics was twelve years ago (sigh.) Like a lot of people, I've been excited about AMR (the "Abstract Meaning Representation") recently. It's hard not to get excited. Semantics is all the rage. And there are those crazy people out there who think you can cram the meaning of a sentence into a !#$* vector [1], so the part of me that likes Language likes anything that has interesting structure and calls itself "Meaning." I effluviated about AMR in the context of the (awesome) SemEval panel. There is an LREC…
  • Reading group notes: point/counter-point on "predict models"

    31 Jul 2014 | 6:26 am
    In our local summer reading group, I led the discussion of two papers that appeared in Baltimore last month: Marco Baroni, Georgiana Dinu & German Kruszewski, Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. ACL 2014. Omer Levy & Yoav Goldberg, Linguistic Regularities in Sparse and Explicit Word Representations. CoNLL 2014 (best paper award recipient). I love handouts, so I made a handout for this one too. I paste below the handout. All good ideas are those of the respective authors; all errors and bad ideas are probably due to…
  • Hello, World!

    27 Jul 2014 | 7:18 am
    Okay, usually Hello World is the first program you learn to write in a new programming language. For fun, I've been collecting how to say hello world in different human languages, something remarkably difficult to search for (because of the overloading of the word "language"). I have 28. I'd like to make it to 280 :). If you have one (or more) to contribute, email me, post a comment, or tweet to me @haldaume3. And of course if you think any of these is wrong, please let me know that too. 1 bar Servus Woid! 2 ca Hola Món! 3 de Hallo Welt! 4 en Hello World! 5 eo Saluton, Mondo! 6 es ¡Hola…
  • My ACL 2014 picks...

    5 Jul 2014 | 9:22 am
    Usual caveats: didn't see all papers, blah blah blah. Also look for #acl14nlp on twitter -- lots of papers were mentioned there too! A Tabular Method for Dynamic Oracles in Transition-Based Parsing; Yoav Goldberg, Francesco Sartorio, Giorgio Satta. Joakim Nivre, Ryan McDonald and I tried searnifying MaltParser back in 2007 and never got it to work. Perhaps this is because we didn't have dynamic oracles and we thought that a silly approximate oracle would be good enough. Guess not. Yoav, Francesco and Giorgio have a nice technique for efficiently computing the best possible-to-achieve dependency…
  • Divergences passed through Bayes' rule

    30 Jun 2014 | 8:30 am
    In a previous post's comments, we talked about Bayes' rule and things like that. This got me wondering about the following question: If we know p(A) and p(B|A), we can reconstruct p(A|B) perfectly by Bayes' rule. What if we only have estimates of p(A) and p(B|A)? How does the quality of the reconstruction of p(A|B) vary as a function of the quality of the estimates of the marginal and conditional? I feel like there have to be results along these lines, but I was unable to find them. My next attempt was to prove something, which failed miserably after a few hours. So, as a good empiricist…

    Kevin Hillstrom: MineThatData

  • Catalog vs. E-Commerce Annual Repurchase Rates

    Kevin Hillstrom
    30 Sep 2014 | 8:15 pm
    There are two issues that persist in the work I perform for clients. One, of course, is Merchandise ... a nearly forgotten concept in modern marketing. 24 of the past 29 Merchandise Forensics projects illustrated a merchandising challenge that is holding back the business. The other issue, of course, centers around a fundamental misunderstanding about customer purchasing habits. Look at this graph. Here, we measure the annual repurchase rate of customers as the customer moves from a first-to-second purchase, then a second-to-third purchase, and so forth. The blue bars represent total repurchase…
  • Diagnostics: How Management Responds To Changes

    Kevin Hillstrom
    29 Sep 2014 | 8:15 pm
    A good marketing/analytics system should be able to detect when the Management Team is actively trying to steer a business out of a problem. Look at this table. Here, new buyers decrease from 55,000 in 2012 to 52,000 in 2013 to 49,000 in 2014, similar to yesterday. And yet, sales do not decrease. Why? Look at $ per repurchaser - among 12-month buyers. This metric increases from $200 per customer to $220 per customer to $230 per customer, on an annual basis. When you see declining new buyer counts and increasing spend levels among existing customers, you realize that "management figured out…
  • Diagnostics: Canary In The Coal Mine

    Kevin Hillstrom
    29 Sep 2014 | 7:00 am
    Any marketing/analytics system should be able to quickly identify a business issue. One of the problems with modern analytics software is that the software tends to be campaign-centric - meaning that the software identifies problems with campaigns, not problems with businesses. In most of the businesses I analyze, there is a canary in the coal mine ... a metric that helps determine that something is wrong. That metric, of course, is the number of new customers. The example above is very, very, very common. Look at the annual repurchase rate ... it is essentially flat. This means that existing…
  • Shop.org - Seattle

    Kevin Hillstrom
    28 Sep 2014 | 8:15 pm
    I'll be in Seattle on Tuesday afternoon - slots are filling up fast, so if you want to meet, send me a quick message (kevinh@minethatdata.com). Thanks, Kevin. P.S.: Speaking of conferences, join me in New Hampshire in February ... click here! I have all sorts of goodies ready to share for that one.
  • Matthew McConaughey

    Kevin Hillstrom
    25 Sep 2014 | 9:37 pm
    Ok, click here to see the commercial (right here, folks). By the way, I'm sure the title of this blog post will really ramp-up the fabled "engagement" metric here at The MineThatData Blog. Sure, I could write about Merchandise Forensics and sell a bunch of consulting projects, but my goodness, Mr. McConaughey is going to blow up my engagement dashboard. Real-time alerts are already going off, folks. For a moment or two, I thought all that beeping meant that I had to replace a smoke detector battery, but then I realized it isn't time to go off Daylight Savings Time yet. This article (click here)…
 

    TIBCO Spotfire's Trends and Outliers

  • Predictive Analytics in Financial Services: Pinpointing Customer Opportunities

    Spotfire Blogging Team
    30 Sep 2014 | 5:55 am
    Rising regulatory pressures. A sluggish economy. Heightened competition from new market entrants. These are just a few of the challenges companies in the financial services industry are facing today. In response to these pressures, there are tremendous opportunities for banks, brokerages, and other players in financial services to use customer data and analytics to gain deep insights about customers’ needs, preferences, credit worthiness, product ownership, risk appetite, investing habits, lifecycle status, and other characteristics that they can then use to provide customers with targeted…
  • Maximizing Your Digital Marketing with Business Intelligence

    Spotfire Blogging Team
    29 Sep 2014 | 5:55 am
    Today there are so many channels available like Facebook, Twitter, and Pinterest that social media, along with SEO, have become the foundations of many companies’ online marketing campaigns. And that means more data to gather up, analyze and process to propel your business forward. Using business intelligence (BI) tools, your company can find and then share critical business insights about your digital marketing campaigns. “With low costs and the potential for high ROI, it’s easy to see why these schools of marketing are gaining popularity so quickly,” according to an…
  • Predictive Analytics to Boost Customer Acquisition, Retention for Financial Services

    Spotfire Blogging Team
    26 Sep 2014 | 5:55 am
    While predicting the ups and downs of capricious financial markets may be next to impossible, financial services companies can tap analytics to confidently predict customer behavior to outperform their competitors. Financial services firms that have adopted predictive analytics performed better in identifying new customer opportunities, increasing total numbers of customers and boosting cross-sell and upsell revenue, according to a new research report from Aberdeen Group. The companies that adopted predictive analytics realized a 10 percent increase in new customer opportunities over a year,…
  • Analytics to Create a Data-Driven Culture

    Spotfire Blogging Team
    25 Sep 2014 | 5:55 am
    While there’s been no lack of attention paid to the vast potential big data has for bolstering business operations, many of the challenges to exploiting big data sources lie in transforming the people and culture within companies. You Need More Than Good Data. That’s the assertion of McKinsey & Co. in a pair of videos detailing how companies need to adjust their talent pools and decision-making processes to avoid ending up with an avalanche of data but no actionable insights. “Analytics will define the difference between the losers and winners going forward,” notes Tim McGuire, a…
  • 10 Ways to Use Analytics to Drive Innovation

    Spotfire Blogging Team
    24 Sep 2014 | 5:55 am
    Much of the focus on the potential for big data has centered around using analytics to boost sales and marketing. While those areas are indeed ripe for innovation, companies can tap analytics for a slew of other novel improvements to outperform competitors, including identifying new profit models, designing new products and streamlining processes. Tom Davenport, a professor at Babson College and an independent senior advisor to Deloitte Analytics, outlines the 10 ways companies can foster innovation with analytics in an article in Deloitte University Press. The 10 ways companies can drive…

    PolicyMap

  • Print, Save, Email, and Embed your work on PolicyMap

    Phil V.
    30 Sep 2014 | 6:26 am
    You’ve made the perfect map; data layers are customized, map is zoomed to your location, and data points have been added and filtered. Now you want to share this wonderful map with others. The icons on the top left of the map will allow you to save and share your maps. All printed maps, tables, and reports will store a copy in your My PolicyMap, allowing you to log into your account at any time and download a copy or reopen a saved copy. Email – this feature will give users the ability to quickly share an interactive map with any user. Selecting the icon will open a window which you…
  • Home sale data for 2013 now available on PolicyMap!

    Katie Nelson
    25 Sep 2014 | 7:14 am
    PolicyMap recently updated our home sale data to include 2013 transactions. The data include information on how many home sales occurred in your neighborhood, city or county. It also includes the median sales price of those transactions, and the loan-to-value ratio, which compares the value of the first mortgage taken out on a home to the sales price. Our home sale data is one of the most popular datasets we have because these statistics are so important to understanding the real estate market in an area. Annual data are available from 2006 through 2013, but we also offer a 5-year trend…
  • Low Income Housing Tax Credit Data Updated on PolicyMap!

    Kristin Crandall
    24 Sep 2014 | 1:40 pm
    One of our most popular datasets, Low Income Housing Tax Credit (LIHTC) projects, has been updated on PolicyMap! The LIHTC program is an indirect federal subsidy used to finance the development of affordable rental housing for low-income households. It provides tax incentives to encourage individuals, corporate investors, and banks to invest in the development, acquisition, and rehabilitation of affordable rental housing. According to one estimate, the LIHTC program has helped to finance the development of more than 2.4 million affordable rental housing units since its inception in 1986. The…
  • Developing an Arts and Culture Hub, with Data

    Morgan Robinson
    23 Sep 2014 | 3:06 pm
    CultureBlocks profiled in Exploring Our Town, NEA’s showcase of best practices. The Philadelphia Inquirer recently showcased a study by Drexel researchers exploring cultural assets in the Mantua, Powelton and West Powelton neighborhoods in West Philadelphia. The report, A Fragile Ecosystem (pdf), describes a neighborhood with lots of resident artists, dense artistic clusters along major commercial corridors, and tons of regional cultural amenities close by. This report was of particular interest to me, as one of this area’s newest residents. (My roommate and I moved in on Sunday. Once I…
  • Detailed Cancer Rates by State and County from the CDC

    Morgan Robinson
    11 Sep 2014 | 9:11 am
    Cancer is one of the most common diseases in the United States: approximately 40 percent of all people will be diagnosed with some type of cancer during their lifetime. Many health agencies monitor cancer trends; among them is the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) program. SEER measures cancer incidence – the number of new cases diagnosed – as well as survival statistics and mortality. The newest data on PolicyMap is state and county cancer rates and cases, provided by SEER and CDC’s National Program of Cancer Registries (NPCR). The rates…
 

    Revolutions

  • Why are we still teaching T-tests?

    Joseph Rickert
    30 Sep 2014 | 8:30 am
    The following post by Norm Matloff originally appeared on his blog, Mad(Data)Scientist, on September 15th. We rarely republish posts that have appeared on other blogs; however, the questions that Norm raises with respect to the teaching of statistics, and his assertion that "R's statistical procedures are centered far too much on significance testing", both deserve a second look. Moreover, Norm's post elicited quite a few comments, many of which are at a high level of discourse. At the bottom of this post we have included excerpts from exchanges with statistician Mervin…
  • Video introduction to data manipulation with dplyr

    David Smith
    29 Sep 2014 | 2:24 pm
    Hadley Wickham's dplyr package is a great toolkit for getting data ready for analysis in R. If you haven't yet taken the plunge to using dplyr, Kevin Markham has put together a great hands-on video tutorial for his Data School blog, which you can see below. The video covers the five main data-manipulation "verbs" that dplyr provides: filter, select, arrange, mutate and summarise/group_by. (It also introduces the glimpse function, a handy alternative to str, that I had overlooked before.) The video also provides an introduction to the %>% ("then") operator from…
  • Because it's Friday: Why you shouldn't hate on ET the game

    David Smith
    26 Sep 2014 | 11:21 am
    I think I may be one of the few kids that actually liked the ET: The Extra-Terrestrial game for the Atari 2600. Sure it was frustrating, but so were most games of the era, and at least it wasn't a disappointing "recreation" of one of my arcade favourites. (Looking at you, Pac-Man 2600.) If you're not familiar with this part of gaming history, ET was famously so unpopular that Atari was blamed for the downfall of the entire home-console industry and subsequently buried thousands of unsold cartridges in the New Mexico desert. This (somewhat NSFW) Zero Punctuation video explains it all better…
  • Police militarization in the US, over time

    David Smith
    26 Sep 2014 | 10:45 am
    The militarization of local police departments here in the US has been much in the news lately, and the New York Times published in June an in-depth article on how materiel from wars has ended up in the hands of US counties. Besides the traditional reporting it's a fantastic piece of data journalism: the Times submitted a freedom-of-information request to the Defense Department for the items, value and the date they were provided to each county, and published the data on GitHub. Here's a small snippet of the data: The Times also published an interactive map of the data, aggregated…
  • DescTools: a new R "misc package"

    Joseph Rickert
    25 Sep 2014 | 8:30 am
    by Joseph Rickert. One of the most difficult things about R, a problem that is particularly vexing to beginners, is finding things. This is an unintended consequence of R's spectacular, but mostly uncoordinated, organic growth. The R core team does a superb job of maintaining the stability and growth of the R language itself, but the innovation engine for new functionality is largely in the hands of the global R community. Several structures have been put in place to address various aspects of the finding things problem. For example, Task Views represent a monumental effort to collect…