Data Mining

  • Most Topular Stories

  • Get your R education going with GitHub

    Revolutions
    Joseph Rickert
    2 Jul 2015 | 5:05 am
    by Joseph Rickert Last week, I was fortunate enough to attend the R Summit & Workshop, an invitation-only event held at the Copenhagen Business School. The abstracts for the public talks presented are online and well worth a look. Collectively they provide a snapshot of the state of development of R and the R Community, as well as some insight into the directions in which researchers are moving to expand the boundaries of R. Real highlights of the event were talks by Jennifer Bryan and Mine Çetinkaya-Rundel, two educators who are channeling enormous amounts of energy into teaching…
  • June Jobs Report – The Numbers

    The Numbers
    Kate Davidson
    2 Jul 2015 | 6:06 am
    Jobs, wages, labor-force participation and more.
  • We’re Hiring!

    PolicyMap
    Elizabeth Nash
    29 Jun 2015 | 10:05 am
    If processing data, blogging about mapping, strategizing about taxonomies and collaborating on data visualization tools sound up your alley, check out the Data Associate job opening on our new PolicyMap careers page.  We’d love to talk to you about why you love PolicyMap and want to be a part of our team! The post We’re Hiring! appeared first on PolicyMap.
  • Data Science with R

    MiningData, explotando los datos
    miningdataadm
    19 Jun 2015 | 9:44 am
    Since Data Science is extremely topical right now, thanks to the many applications of analytics in fields and settings such as medicine, politics, banking, insurance, finance, forecasting, smart cities, marketing, technology, science, business, social networks, and so on, largely because of the enormous amount of data proliferating around us, we are going to talk a little about the most popular language for doing Data Science: the R project. R is an Open Source software environment for data analytics,…
  • Harnessing big data to solve problems is goal of new ...

    Data mining News
    3 Jul 2015 | 2:41 am
    … part, Lodi says the old data mining that was done was static …
 

    MiningData, explotando los datos

  • Data Science with R

    miningdataadm
    19 Jun 2015 | 9:44 am
    Since Data Science is extremely topical right now, thanks to the many applications of analytics in fields and settings such as medicine, politics, banking, insurance, finance, forecasting, smart cities, marketing, technology, science, business, social networks, and so on, largely because of the enormous amount of data proliferating around us, we are going to talk a little about the most popular language for doing Data Science: the R project. R is an Open Source software environment for data analytics,…

    The Numbers

  • June Jobs Report – The Numbers

    Kate Davidson
    2 Jul 2015 | 6:06 am
    Jobs, wages, labor-force participation and more.
  • Number of the Day: 1

    Brian Hershberg
    30 Jun 2015 | 11:29 am
    Today's number, 1, comes courtesy of the leap second.
  • Train of Thought: Sometimes Numbers Speak for Themselves

    Brian Hershberg
    17 Jun 2015 | 7:49 am
    Last night in baseball--with home runs galore, a cycle and a run of shutouts seen just five times--was a wonderful reminder that numbers are sometimes just cool for the accomplishment they represent.
  • Number of the Day: $1.64 Billion

    Brian Hershberg
    16 Jun 2015 | 10:54 am
    "Jurassic World" set a box-office record for the best opening weekend of all-time. Even better? The inflation-adjusted returns of "Gone With the Wind" in 1939.
  • Behind The Numbers: American Pharoah and 37 Years

    Brian Hershberg
    8 Jun 2015 | 12:07 pm
    By winning horse racing's Triple Crown this past weekend, American Pharoah entered the conversation on where he stands among all-time great thoroughbreds. But ranking American Pharoah is a subjective game.
 

    natural language processing blog

  • Some NAACL 2013 statistics on author response, review quality, etc.

    9 Jun 2015 | 8:40 am
    NAACL 2015 has just passed; NAACL 2013 is long in the past. One bonus of being a program chair is that you get to have fun with data. In this post I'd like to review two pieces of data, one related to author response and one related to review quality assessment. tldr: Generally, I think author response is useless, except insofar as it can be cathartic to authors and thereby provide some small psychological benefit. And in general people don't seem that dissatisfied with their papers' reviews, and this is largely independent of the outcome of the paper (conditioned on selection bias of those who…
  • The myth of a strong baseline

    15 Nov 2014 | 7:00 am
    I can probably count on my fingers the number of papers I've submitted for which a reviewer hasn't complained about a baseline in some way. I don't mean to imply that all of those complaints are invalid: many of them were 100% right on in ways that either I was lazy about or ways that I hadn't seen a priori. In fact, I remember back in 2005 I visited MIT and gave a talk on what eventually became the BayeSum paper (incidentally, probably one of my favorite papers I've written, though according to friends not exactly the best written... drat). I was comparing in the talk against some baselines,…
  • EMNLP 2014 paper list (with mini-reviews)

    1 Nov 2014 | 9:40 am
    I'm going to try something new and daring this time. I will talk about papers I liked, but I will mention some things I think could be improved. I hope everyone finds this interesting and productive. As usual, I didn't see everything, didn't necessarily understand everything, and my errors are my fault. With that warning, here's my EMNLP 2014 list, sorted in anthology order. Identifying Argumentative Discourse Structures in Persuasive Essays (Christian Stab, Iryna Gurevych): Full-on discourse parsing of rhetorical structure (e.g., RST) is really hard. In previous work, these authors created a…
  • Hyperparameter search, Bayesian optimization and related topics

    10 Oct 2014 | 10:55 am
    In terms of (importance divided by glamour), hyperparameter (HP) search is probably pretty close to the top. We all hate finding hyperparameters. Default settings are usually good, but you're always left wondering: could I have done better? I like averaged perceptron for this reason (I believe Yoav Goldberg has also expressed this sentiment): no pesky hyperparameters. But I want to take a much broader perspective on hyperparameters. We typically think of HPs as { regularization constant, learning rate, architecture } (where "architecture" can mean something like neural network structure,…
  • Machine learning is the new algorithms

    3 Oct 2014 | 10:19 am
    When I was an undergrad, probably my favorite CS class I took was algorithms. I liked it (a) because my background was math so it was the closest match to what I knew and (b) because even though it was "theory," a lot of the stuff we learned was really relevant. Over time, it seemed like the area had distilled worthwhile algorithms from interesting-in-theory-but-you'll-never-actually-use algorithms. In fact, I think this is a large part of why most undergraduate CS degrees today require a course in algorithms. You have these very nice, clearly defined statements, and very elegant solutions to…
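The hyperparameter-search post above (10 Oct 2014) singles out the averaged perceptron as a learner with "no pesky hyperparameters." For readers who haven't met it, here is a minimal sketch of a binary averaged perceptron in Python. The function names and the toy data are hypothetical, not taken from the post, and the sketch still exposes one setting, the number of passes over the data (`epochs`), whose default of 10 is an assumption.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def train_averaged_perceptron(
    data: Sequence[Example], epochs: int = 10
) -> Tuple[List[float], float]:
    """Train a binary averaged perceptron; return (averaged weights, averaged bias)."""
    n_features = len(data[0][0])
    w = [0.0] * n_features      # current weights
    b = 0.0                     # current bias
    w_sum = [0.0] * n_features  # running sum of the weights after every example
    b_sum = 0.0
    steps = 0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                # Mistake (or exactly on the boundary): plain perceptron update.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
            # Accumulate after every example, not only after mistakes.
            w_sum = [ws + wi for ws, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [ws / steps for ws in w_sum], b_sum / steps


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical toy data: label +1 when the first feature exceeds the second.
data = [([2.0, 1.0], 1), ([3.0, 0.5], 1), ([1.0, 2.0], -1), ([0.5, 3.0], -1)]
w, b = train_averaged_perceptron(data)
preds = [predict(w, b, x) for x, _ in data]
print(preds)  # [1, 1, -1, -1]
```

Averaging the weights over every step, instead of keeping only the final vector, damps the effect of the last few updates; the perceptron update itself has no learning rate or regularization constant to tune.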

    Kevin Hillstrom: MineThatData

  • A Break

    Kevin Hillstrom
    30 Jun 2015 | 8:10 pm
    Let's take a few days off, and enjoy Canada Day and the 4th of July? I'll see you again on July 6!
  • Macy's / Trump

    Kevin Hillstrom
    30 Jun 2015 | 5:24 pm
    You read this one, right (click here)? As always ... Merchandise > Omnichannel. What good does a seamless customer experience do when there are merchandising issues? It's the same thing with DirecTV ... #1 in customer service ... but the picture goes out during the Women's World Cup game (hint: the picture = merchandise). You can have all the commercials with talking horses and people on beaches, but none of it matters when there isn't a picture. We're a half-generation into not focusing on what is most important to our businesses, coupled with an inability to think three steps ahead and…
  • The Secret

    Kevin Hillstrom
    29 Jun 2015 | 8:10 pm
    So now we know the secret. You saw the Disney example ... they leveraged an "omnichannel" approach in the 1950s ... THE 1950s!!!! The Disney secret was easy to identify ... use the "brand" as the connection between numerous mutually exclusive spending opportunities. The reason the omnichannel approach doesn't work for our businesses is also simple ... we use the "brand" as the connection between numerous activities that lead to (usually) just one spending opportunity. In other words, the Disney example has worked for more than a half-century because of a diversified portfolio. Meanwhile, we've…
  • Disney Magic

    Kevin Hillstrom
    28 Jun 2015 | 8:10 pm
    Read every single line connecting boxes in this image. Every. Single. One. In the current omnichannel thesis, this image would exist ... but all of the words that support the lines that connect the boxes would be empty. In other words, the current omnichannel thesis lacks any semblance of business understanding or vision or strategy. In this one image, you can clearly see that the author understands how business works. This truly is Disney Magic! Let's think about our world. What might our image look like? This is my opinion only (many of you will disagree), but our process has been created and…
  • Omnichannel Fans: Read This Immediately, You'll Love It!!

    Kevin Hillstrom
    26 Jun 2015 | 11:23 am
    Click on this story right now ... seriously, drop everything you are doing if you are an omnichannel fan, and see your dreams validated (click here). This strategy, of course, is modern and exciting and synergistic and ... oh ... wait ... it's from the 1950s? The 1950s? This is what the omnichannel community is arguing for, when it comes right down to it ... and it makes perfect sense! Here's the secret. Notice that the "brands" in the story are common across each box ... but the purpose of each box is fundamentally different. Comic Strips are different than Disneyland ... a fundamentally unique…
 

    TIBCO Spotfire's Trends and Outliers

  • Spotfire Tips & Tricks: Telling a Story Using Annotations

    Spotfire Blogging Team
    2 Jul 2015 | 5:55 am
    For a long time users have been able to use Spotfire to tell a compelling story with their data. Spotfire Cloud now offers an additional way to enhance the story through the use of Annotations. Bookmarks and text areas have been the “go to” way to add a narrative around your data. With Annotations, it’s possible to add a comment directly on the visualization itself. Just choose “Insert>Annotation” from the menu and a new note will be added to the canvas. The font and color of each annotation can be customized individually, or you can use the Theme editor to apply a global style.
  • Analytics Integration: Finding The Center

    Spotfire Blogging Team
    1 Jul 2015 | 5:55 am
    What’s holding back your analytics deployment? For many, it’s a problem of cross-functionality, in that solutions that work for one department may not work for another, and trying to convince business “silos” to give up a measure of control in favor of collaboration is often a losing battle. It’s no surprise, then, that according to the Harvard Business Review, just 33 percent of companies across the United States and Western Europe are “aggressively adopting analytics across the entire enterprise.” So how do companies effectively integrate analytics without alienating…
  • Getting the Most From Supply Chain Analytics

    Spotfire Blogging Team
    29 Jun 2015 | 5:55 am
    Because a company’s supply chain is typically complex—though able to significantly affect cost structures and profits—it is ripe for the insights that analytics can unearth. That’s according to a recent article from Deloitte Consulting that notes multiple industries can improve forecasts, demand planning, sourcing, and production by infusing analytics into the supply chain. “Companies with leading supply chain capabilities have typically made significant shifts in their use of advanced analytics to transform historical data captured in Enterprise Resource Planning systems into…
  • CargoSmart Uses Big Data to Transform Shipping

    Ann Scheuerell
    25 Jun 2015 | 5:55 am
    CargoSmart, a leading provider of global shipment management solutions, is in an expansive and volatile industry. According to CEO Steve Siu, using Big Data will provide a competitive advantage and lead to industry transformation. “We are managing Big Data from over 5,500 vessels, covering 90% of the world’s ocean container traffic,” he says. “Our mission is to derive insights from which our customers can optimize shipment routes, lower transportation and fuel costs, and reduce the risk of late arrivals.”  Click here to read how CargoSmart is achieving that mission while…
  • Forecasting Energy Use with Predictive Analytics

    Spotfire Blogging Team
    24 Jun 2015 | 5:55 am
    Predictive analytics can empower utility companies to deliver better customer service and become more profitable while enabling them to respond in real time to any number of issues including outages, thefts, and spikes in energy use. Utilities can analyze the massive amounts of data they capture via new smart meters to operate more efficiently, enhance customer experience, make better buying decisions, predict and detect outages, and protect their revenue by detecting thefts, notes Scott Zoldi, vice president of analytic science at FICO, in an article on Information Management. However,…

    PolicyMap

  • Credit Unions: Your Neighborhood Cooperative

    Morgan Robinson
    2 Jul 2015 | 11:32 am
    You’re probably getting your hot dogs, American flags, and sparklers ready for Independence Day, but did you know that Friday, July 3rd is the International Day of Cooperatives? According to the International Cooperative Alliance, cooperatives (co-ops) are businesses owned and run collectively by and for their members. One common example of a cooperative organization is a credit union. Credit unions are financial institutions where, unlike banks, all members own a share in the overall business. Credit unions do not have external stockholders, and are not accountable to creating…
  • We’re Hiring!

    Elizabeth Nash
    29 Jun 2015 | 10:05 am
    If processing data, blogging about mapping, strategizing about taxonomies and collaborating on data visualization tools sound up your alley, check out the Data Associate job opening on our new PolicyMap careers page.  We’d love to talk to you about why you love PolicyMap and want to be a part of our team! The post We’re Hiring! appeared first on PolicyMap.
  • We’re at ALA 2015 Annual Conference! Come visit us!

    Phil Vu
    28 Jun 2015 | 9:17 am
    The ALA 2015 Annual Conference is in fabulous San Francisco this year! Just in time for the festivities that are happening today. Very exciting. Come by our booth, #3807, and say hi! We’ve met a lot of great librarians from across the country, learned so much more about our tool, and are excited to continue our conversations when we get back. We’ve met Conference Elvis (Twitter handle to come…) and we were lucky enough to have a NASA superhero come see us too. We still have two more days, so come by booth #3807, enter to win the Chromebook and learn more about…
  • Data Updates! Predominant Race/Ethnicity and Diversity Index

    Lauren Parker
    22 Jun 2015 | 2:15 pm
    We’re excited to announce that two indicators have been updated to include the 2013 ACS 5-year estimates: Predominant Race/Ethnicity and the Diversity Index. You’ll find both of these indicators by going to the “Demographics” header above the map, and then scrolling down to the “Diversity” sub-header. Just to review, the Diversity Index reflects the probability that if two people were chosen at random in a given area, they would be of different races and ethnicities. In calculating the index, we used eight mutually exclusive racial and ethnic categories reported by the American…
  • Low-Mod Dataset Update!

    Lauren Parker
    18 Jun 2015 | 2:30 pm
    It’s the annual Low-Mod Blog Post! Not quite as fantastic a name as, perhaps, Bob Loblaw’s Law Blog, but arguably much more exciting. Recently, we released the American Community Survey’s 5-year estimates for 2009-2013 on PolicyMap. With this new release, we’ve updated the low- to moderate-income (“Low-Mod”) dataset to include 2013 values as well. Bob Loblaw’s Law Blog is in awe of the Low-Mod blog. (source: GIPHY, credit to Arrested Development) The Low-Mod dataset reflects the local median income as a share of area median income. For all tracts and block groups located within…
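The Diversity Index described in the PolicyMap data-update post above (the probability that two people chosen at random in an area are of different races and ethnicities) reduces to a simple formula if the two draws are made with replacement: one minus the sum of the squared category shares. A minimal Python sketch under that assumption follows. PolicyMap's exact method (for example, whether it samples without replacement) is not stated in the excerpt, and the function name, category labels, and counts below are hypothetical.

```python
from typing import Mapping


def diversity_index(counts: Mapping[str, int]) -> float:
    """Probability that two people drawn at random *with replacement* from an area
    fall in different categories: 1 - sum of squared category shares.

    `counts` maps each mutually exclusive category to its population count.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain a positive total population")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical tract with three of the categories populated.
tract = {"category_a": 600, "category_b": 300, "category_c": 100}
print(round(diversity_index(tract), 2))  # 0.54
```

With eight mutually exclusive categories, this version of the index peaks at 1 - 1/8 = 0.875 when every category has the same share, and it is 0 when everyone falls in a single category.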
 

    Revolutions

  • Get your R education going with GitHub

    Joseph Rickert
    2 Jul 2015 | 5:05 am
    by Joseph Rickert Last week, I was fortunate enough to attend the R Summit & Workshop, an invitation-only event held at the Copenhagen Business School. The abstracts for the public talks presented are online and well worth a look. Collectively they provide a snapshot of the state of development of R and the R Community, as well as some insight into the directions in which researchers are moving to expand the boundaries of R. Real highlights of the event were talks by Jennifer Bryan and Mine Çetinkaya-Rundel, two educators who are channeling enormous amounts of energy into teaching…
  • News from UseR!2015 - the RHadoop tutorial

    Andrie de Vries
    1 Jul 2015 | 8:00 am
    by Andrie de Vries Today is the first day of the UseR!2015 conference in Aalborg in Northern Denmark. But yesterday was a day packed with 16 tutorials on a range of interesting topics. I submitted a proposal many months ago to run a session on using R in Hadoop and was very happy to be selected to run a session in the morning. When we first started planning the session, we set a Big Hairy Audacious Goal to run the session using a Hortonworks Hadoop cluster hosted in the Microsoft Azure cloud. We trialled the session at the Birmingham R user group during May, and then again last week during a…
  • Announcing the R Consortium

    David Smith
    30 Jun 2015 | 5:17 am
    The R community has grown explosively over the past few years, both in terms of the number of R users and the number of companies that rely on R as their data science platform. To serve the needs of this rapidly growing community, and to continue the success of the R Project as a whole, representatives from the R Foundation and from industry have joined forces to create the R Consortium, a new collaborative project of the Linux Foundation. The R Consortium is a 501(c)(6) non-profit organization dedicated to the support and growth of the R user community. The R Consortium will work with and…
  • Generalized Linear Mixed Models: the FAQ

    David Smith
    29 Jun 2015 | 8:00 am
    Mixed models (which include random effects, essentially parameters drawn from a random distribution) are tricky beasts. Throw non-Normal distributions into the mix for Generalized Linear Mixed Models (GLMMs), or go non-linear, and things get trickier still. It was a new field of Statistics when I was working on the Oswald package for S-PLUS, and even 20 years later some major questions have yet to be fully answered (like, how do you calculate the degrees of freedom for a significance test?). These days lme4, nlme and MCMCglmm are the go-to R packages for mixed models, and if you're using…
  • Because it's Friday: Bad Flags

    David Smith
    26 Jun 2015 | 1:00 pm
    Here's one for all the vexillologists out there. The USA has 50 states, and each of them has a flag. Even if you live in the States, you probably haven't seen them all and many of them are ... surprising. Take the "typography" on Oregon's flag to start. The Washington Post has a hilarious tongue-in-cheek review of each of the 50 state flags, which had me laughing out loud. On the other hand, the flag of my birth state features a bird doing the YMCA perched on a rectal thermometer, so I probably shouldn't laugh too loudly. That's all for this week.

    Data Science Notes

  • Fitness Week Summary

    29 Jun 2015 | 9:11 pm
    One last post to wrap up Fitness Week. I realized I let myself get so distracted by tax policy and whether I could save money by driving to Missouri (read: me being cheap) that I forgot my last post for Fitness Week. That post was supposed to be on modeling fitness data, but first, how did I actually do during Fitness Week? FITNESS WEEK STATS: Fitness Week ended up being a pretty average week for me, in aggregate above average, as shown below. A few notes: Each weekday was close to its average over time, each day slightly exceeding the Garmin set goal, per my earlier analysis…
  • Exponential Growth!!! Maybe not...

    28 Jun 2015 | 9:51 pm
    It's much easier to sell a business plan based on the idea of exponential growth than one based on diminishing returns. A few years ago, I asked a junior team member for an update on a product he was working on. He was the assigned analytics resource on the product, and I was curious how the new product was proceeding. He laid out the way the product would work, followed by the general business plan. He capped his statement off with "which would lead to exponential growth....", which he had been told by the product manager. The last statement got my attention. So I asked a couple of…
  • Anatomy of an Analysis: How your Toolkit comes together.

    27 Jun 2015 | 10:25 pm
    This is a follow-up to my original post on my Data Science Toolkit, which received a hugely positive response. One of the questions I get from young analysts is "what tool should I use for this task?" Generally when they are asking this question, they are suggesting a couple of software products, both capable of completing the task, but one has a distinct advantage. So, the advice I give goes something like this: Use whatever tool will get you to an accurate answer fastest, while putting you in the best position to follow up with the data in the future. Rules like this are nice, but…
  • Kansas Sales Tax: Drive to Missouri?

    26 Jun 2015 | 10:43 am
    I have lived in Johnson County (JoCo for locals), Kansas for about a year now. For our out-of-state readers, Johnson County is considered the most affluent county in Kansas by a fairly wide margin, and is located on the Missouri border. It is home to many of the Kansas-side Kansas City suburbs, and some of the largest corporations in Kansas City (notably Sprint and Garmin). I live in the eastern half of Johnson County, about ten miles from Missouri. Since most of Kansas City is in Missouri, you might think that I cross the state line pretty often. But that's not true.
  • Fitness Week #4: Garmin Vivo Fit 2 Review

    25 Jun 2015 | 8:28 am
    A few months ago I started tracking my fitness using Google Fit on my phone. It wasn't ideal, but it was good for a while. Soon I bought a Garmin Vivo Fit 2, after reviewing several products and determining what I wanted. After about three months, I'm generally happy with the product. Here's my review. I'm not the average user. What I want in a fitness tracker is this: accurate, complete data. In long form: My fitness tracker would capture every step and moment of sleep, and have a "dump" button that dumps everything it tracks into a tabular form, where my ETLs…
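The question in the title of the Kansas sales tax post ("Drive to Missouri?") comes down to break-even arithmetic: a trip across the state line pays off only when the sales tax saved exceeds the cost of the drive. A minimal Python sketch follows. The function names are hypothetical, and every rate and cost below is a placeholder, not an actual Kansas or Missouri tax rate and not the author's figures. The round-trip distance loosely follows the post's "about ten miles," and the per-mile cost is an assumption.

```python
def tax_savings(purchase: float, home_rate: float, other_rate: float) -> float:
    """Sales tax saved by making `purchase` at `other_rate` instead of `home_rate`."""
    return purchase * (home_rate - other_rate)


def break_even_purchase(trip_cost: float, home_rate: float, other_rate: float) -> float:
    """Smallest purchase for which the tax saved covers the cost of the trip."""
    gap = home_rate - other_rate
    if gap <= 0:
        raise ValueError("no savings: the other side's rate is not lower")
    return trip_cost / gap


# Placeholder inputs, NOT actual Kansas or Missouri rates.
HOME_RATE = 0.090      # hypothetical combined rate at home
OTHER_RATE = 0.080     # hypothetical combined rate across the state line
TRIP_COST = 20 * 0.50  # ~20-mile round trip at an assumed $0.50 per mile

threshold = break_even_purchase(TRIP_COST, HOME_RATE, OTHER_RATE)
print(f"Break-even purchase: ${threshold:,.2f}")  # Break-even purchase: $1,000.00
```

With these placeholder numbers, a one-point rate gap covers a $10 trip only on purchases of roughly $1,000 or more; smaller shopping trips lose money once the drive is counted.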