Black Coffee & Blue Hours

Archive for the ‘data, analytics, stats’ Category

How can I learn A/B testing? I am into data science/analysis but can’t find good practical resources, just marketing fluff.

Posted by Matthias on November 2, 2019

It depends how deep you need or want to go.

‘A/B testing’ is the industry jargon for a technique that belongs to the broader statistical field of Design of Experiments (DoE).
Modern DoE methodology was first formalized by the genius R.A. Fisher in his 1935 book The Design of Experiments, which, among other things, coined the term “null hypothesis”.
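
As a concrete illustration of the null hypothesis in an A/B setting, here is a minimal sketch of a two-sided, two-proportion z-test comparing the conversion rates of variants A and B. It assumes independent visitors, a binary outcome (converted or not), and samples large enough for the normal approximation; the function name and the counts are hypothetical, chosen only for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test of H0: the conversion rates of A and B are equal.

    Uses the pooled proportion under H0 to estimate the standard error.
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both groups share one rate, so pool the data to estimate it
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 200/5000 conversions for A vs 250/5000 for B
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers z comes out around 2.4 and p around 0.016, so H0 would be rejected at the conventional 5% level. In a real experiment the sample size should be fixed before the test starts, rather than stopping as soon as p dips below 0.05.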

If you’re looking for a solid introduction written in a very accessible and applied format, I would recommend the book How to Measure Anything by Douglas W. Hubbard.
This book is highly regarded in the tech world, and rightly so. It’s easy to read, but at the same time it doesn’t contain any fluff.
The third section is specifically what you’re looking for, but, depending on your background, you may need to read the previous sections first.

For something that goes beyond business and is written for university students, the two most popular textbooks on the subject are Oehlert’s A First Course in Design and Analysis of Experiments, and Montgomery’s Design and Analysis of Experiments.

Posted in books, data, analytics, stats, marketing & growth | Leave a Comment »

If machine learning is so good at predictions, why don’t all data scientists model and predict the stock market and just become rich?

Posted by Matthias on October 19, 2019

Thanks for the A2A.
They already do.
It’s been happening for a while: very smart people applying science to build predictive models and getting spectacular ROI on Wall Street.
If you want to know the history and the details of this trend, I recommend the book The Physics of Wall Street by James Owen Weatherall.

Posted in business, data, analytics, stats | Leave a Comment »

Where are PPC careers headed in 2020?

Posted by Matthias on October 15, 2019

In my opinion, in the coming years we’re going to see more and more focus on:

1) Ability to work with large data sets (skills needed: data cleaning and data integration, practical coding for automation tasks)
2) Audience segmentation & targeting (skills needed: big picture thinking, strategic thinking, data analysis, audience analysis, marketing strategy)
3) Customer journey & attribution modeling (skills needed: statistics/analytics)
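
Point 3 mentions attribution modeling, i.e. deciding how to split the credit for a conversion across the touchpoints of a customer journey. As a minimal sketch (the post doesn’t prescribe any specific model), here are two of the simplest rule-based models, last-touch and linear; the channel names and journeys are hypothetical.

```python
from collections import defaultdict


def last_touch(journeys):
    """Give all the credit for each conversion to its final touchpoint."""
    credit = defaultdict(float)
    for path in journeys:
        credit[path[-1]] += 1.0
    return dict(credit)


def linear(journeys):
    """Split each conversion's credit equally across all of its touchpoints."""
    credit = defaultdict(float)
    for path in journeys:
        share = 1.0 / len(path)
        for channel in path:
            credit[channel] += share
    return dict(credit)


# Hypothetical converting journeys: ordered, non-empty lists of touchpoints
journeys = [
    ["display", "search", "email"],
    ["search"],
    ["social", "search"],
]
print(last_touch(journeys))
print(linear(journeys))
```

Both models hand out exactly one unit of credit per conversion; they only disagree on who gets it. Last-touch ignores `display` and `social` entirely, while linear gives them partial credit, so the choice of model can change which channels look worth the budget.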

To do well in PPC, you’ll be required from entry level onwards to have:
A) A practical understanding of experiment design
B) A solid grasp of descriptive statistics

Nowadays, PPC roles can no longer afford not to require A and B right away.
Then, as your career progresses, the more statistical expertise, strategic/holistic view of the customer journey, and data-wrangling ability you have, the better. Otherwise, you won’t be able to do much on the three points listed above.

The skills required for PPC increasingly overlap with those required in BI, Insights, and DS.
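
To make skill A a bit more concrete: a basic experiment-design question in PPC is how many users each ad or landing-page variant needs before a test can reliably detect a given lift. The sketch below uses the standard normal-approximation formula for comparing two proportions (unpooled variance); the function name and the 4% baseline and 5% target conversion rates are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per variant to detect p_base -> p_target.

    Two-sided test at significance level alpha with the given power,
    using the normal approximation for a difference of two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)  # about 0.84 for power = 0.8
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_base - p_target) ** 2
    return math.ceil(n)  # round up: you can't recruit a fraction of a user


# Hypothetical campaign: baseline 4% conversion, hoping to detect a lift to 5%
print(sample_size_per_variant(0.04, 0.05))
```

Under these assumptions the answer is roughly 6,700 users per variant; halving the detectable lift roughly quadruples it, since the difference is squared in the denominator.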

Posted in data, analytics, stats, marketing & growth | Leave a Comment »

What is the brutal truth about data scientists?

Posted by Matthias on October 12, 2019

Thanks for the A2A.

Kaggle published a survey asking data scientists what barriers they face at work.
Here are the results:

Notice what stands out here?
1) Dirty data is the #1 issue
2) Most of the others are organizational and managerial issues

Data professionals tend to underestimate these two brutal facts.
Even if you personally have all the talent and tools you need, there’s so much more that can sink you: working at a company that didn’t invest in building a good data pipeline before hiring analysts, or that can’t put good cross-team coordination in place for DS projects; or having managers who don’t speak your language, have different expectations, hired you to do awesome stuff and then gave you only data-cleaning work, or ignore your findings when making their decisions. These are all extremely common situations, and they can be very frustrating in the long run, as they’re outside your control.

Posted in data, analytics, stats | Leave a Comment »

Is the European Union economy stagnating?

Posted by Matthias on October 8, 2019

No, it’s not.

If we define stagnating as “growing at an average of 0% – 1% every year for several years in a row”, then the only stagnating economy in the EU since 2013 has been Italy. But Italy has been stagnating for around 30 years (I talked about this in more depth in a separate answer), so unfortunately there’s no surprise there.

Grouping the EU countries by average GDP growth rate for 2013–2018 (World Bank data):

High (2.7% – 9%)
Czech Republic

Average (1.3% – 2.1%)
UK (it tops the group thanks to the peak in 2014–15; after that it went back to 1.5%)

Low (0% – 1.3%)
Belgium (usually flat, but the last two years closed at a normal 1.73% and 1.5%)
France (flat for some years, but the last two closed at a normal 1.82% and 1.5%)
Finland (poor performance for several years until 2015, then it started growing back fast at 2% – 2.6%)
Cyprus (dragged down by the huge -6% in 2013; now it’s growing at 3%)
Italy (it has been flat for decades)
Greece (many negative or flat years, but the last two were normal at 1.5% – 2%)

You can see that the countries basically fall into four groups:
1) Eastern countries that have a lot of room for growth, and that’s why their rate is so high,
2) Outliers: more mature economies that have leveraged particular policies and have had a lot of growth as a result (Ireland, Luxembourg, Malta, Sweden),
3) Healthy mature economies, growing at a normal rate,
4) Bad performers (only the bottom 3, of which only Italy is still flat now in 2019)
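
The grouping above boils down to averaging each country’s annual growth rates for 2013–2018 and bucketing the result. The post doesn’t say whether the averages are arithmetic or compound, so this minimal sketch computes both. The band thresholds are the ones listed above (the 2.1% – 2.7% range isn’t covered by any listed band, so it returns `None`), and the growth series in the example is hypothetical, not World Bank data.

```python
import math
from statistics import mean

# (band name, lower bound inclusive, upper bound exclusive), in % per year
BANDS = [("Low", 0.0, 1.3), ("Average", 1.3, 2.1), ("High", 2.7, 9.0)]


def average_growth(rates):
    """Return (arithmetic mean, compound annual rate) of yearly % growth rates."""
    arithmetic = mean(rates)
    # Chain the yearly growth factors, then take their geometric mean
    factor = math.prod(1 + r / 100 for r in rates)
    compound = (factor ** (1 / len(rates)) - 1) * 100
    return arithmetic, compound


def growth_band(avg):
    """Map an average growth rate to one of the post's bands (None if uncovered)."""
    for name, low, high in BANDS:
        if low <= avg < high:
            return name
    return None


# Hypothetical country: one deep recession year, then a steady recovery
rates = [-6.0, 0.0, 2.0, 3.0, 3.0, 3.0]
arith, compound = average_growth(rates)
print(f"arithmetic {arith:.2f}%, compound {compound:.2f}%, band {growth_band(arith)}")
```

A single bad year like the -6% here drags the six-year average into the Low band even though the last four years look healthy, which is the same effect the post describes for Cyprus. The compound rate comes out slightly lower than the arithmetic one, as it always does when growth varies from year to year.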

For the same period, the US growth rate was 2.28%, slightly above the middle group but still below the high-growth group.
Mature EU economies are currently (2018–2019) growing at 1.5% – 2%, and the US is growing at 2% – 3%.
The EU could definitely do better, but it’s not far behind the US, and, given its very different social policies (healthcare, education, welfare) and internal obstacles (e.g. lower labor mobility due to language differences), that’s quite a good result.

Someone could then argue (and many do, like Tyler Cowen and Peter Thiel) that getting used to considering 1.5% – 2.5% a good performance actually means living in a complacent, stagnating economy. But if one adopts that benchmark, then the US has been complacent and stagnating for a long time as well.

Posted in data, analytics, stats, economics | Leave a Comment »

Is it true that the best data scientists are self-taught?

Posted by Matthias on October 3, 2019

Literally every good performer in any field is for the most part self-taught.

Think about it: you start to perform well once you develop expertise. You acquire expertise with experience. You acquire experience by working on specific problems in your field for hours every day.
There’s no shortcut for this.

Not only that: new tools and methods will come around with time, so you’re always required to keep up-to-date with new information (lifelong learning).
There’s no shortcut for this either.

High school and university give you invaluable fundamentals and methodology, and enable you to take an entry-level position in a given field.
They don’t guarantee anything beyond that if you then can’t use them to do deep work (as defined by Cal Newport) or aren’t willing to keep up with innovation.

Since the OP’s question focuses on DS, here is how François Chollet himself puts it:

There are no “online” and “offline” educations, nor “formal” and “informal”. The best people are 90%+ self-educated, whether they have a degree from Stanford or not. The value-add of degrees in CS is increasingly marginal.

Posted in data, analytics, stats, food for thought | Leave a Comment »

Is data scientist a new job or was it rebranded?

Posted by Matthias on October 3, 2019


In my opinion it’s more a new term (or, better, an acceptable new buzzword) than a rebranded one: there was no term good enough to describe, in a single, short way, the variety of roles in the contemporary world of predictive analytics. If it were just a rebranding, then “applied statistician”, the closest alternative, would describe those roles better, but it doesn’t.

This variety of roles was already large before the so-called big data explosion, and it will become even larger as the democratization of big data continues.
At the same time, it’s also too broad a category to mean anything substantial. It’s like saying “medicine”. You can say “I studied medicine” or “I work in the medical field”, which communicates the main point in an easy, short and universally understandable way, but your actual title and what you do daily are going to be much more specific and limited.

What makes sense is to say “I work with X methods/tools applied to Y domain/field/sector”.
Then anyone can give that combination whatever title they think will make it sound cooler or attract the best talent, but that’s not important.

Posted in business, data, analytics, stats | Leave a Comment »

Which is the best major for students who don’t know what career they want?

Posted by Matthias on October 3, 2019


  1. If you later decide on an academic career, no matter the field, from the sciences to the social sciences to, nowadays, even the humanities, you’ll use it
  2. If you decide to go into business, no matter the specific area (from the most obvious ones like marketing to less obvious ones like human resources), you’ll use it to get to advanced positions
  3. If you decide to specialize in any applied science, you’ll use it
  4. If you decide on tech/programming, either you’re going to use it or your thinking is going to benefit from having studied it
  5. Even if you aren’t interested in (or don’t manage to land) a role that uses it directly, your future company/industry/field is almost certainly going to have other people working directly with it, and your knowledge of the subject will let you communicate with them on a technical level, making you just as essential
  6. Lastly, even if you happen not to do anything directly related, it’s going to be useful in your daily life nonetheless – to think more clearly, make better decisions, and avoid being deceived by misinformation, media and politics

In the vast majority of cases, you’re not going to need advanced knowledge of the field, so a Bachelor’s would be optimal.
After such a Bachelor’s, you can literally specialize in anything you prefer, and, no matter your chosen path, you’re going to be ahead of the curve compared to your peers.

Posted in data, analytics, stats, life choices, strategy & decision-making | Leave a Comment »

Do many people applying to Data scientist positions end up either unemployed or in other positions because of the intense competition for most data scientist positions?

Posted by Matthias on October 6, 2018

If you’re good at automating stuff via programming you’re never going to be unemployed.
Economic growth nowadays comes in large part from increasing productivity, and productivity increases mostly through automation.

If you’re good at analyzing and interpreting data, same thing.
Data is the equivalent of oil in the digital economy.

What you’re probably referring to is a specific type of position, the one matching the classic description of a DS job that you find posted online everywhere.
Well, the problem with that type of role is not exactly the competition. You’re going to find it difficult to get a position like that even if your skill set is perfectly aligned. The problem is that companies that own very large data sets and also have a specific strategy for leveraging them are still few, and for the moment they’re either American or Chinese.
You need to get hired by one of those companies to gain the experience required to reach the type of DS role you have in mind, the one articles on the interwebz love to talk about.

But why obsess over that when the same skills can get you so many easily accessible alternatives?

Posted in data, analytics, stats, life choices | Leave a Comment »

Why is becoming a data scientist so difficult?

Posted by Matthias on October 6, 2018

It’s like asking why becoming [any other high-level profession] is difficult. DS is not an entry-level position, and I’m constantly surprised by how many people on the net seem to think it is.

Let’s leave the theory alone, since the typical answer focuses on that.

The real mountain to climb is not learning the theory, which is the starting point and can be done on your own, but developing as an applied programmer and developing domain knowledge. Having all three (theory, applied programming and domain knowledge) is the only way to reach a DS position, because otherwise you’re not going to be useful there. You’ll just sit on a mountain of data and stare into the abyss, but that’s what the folks who accumulated that data are already doing, and they don’t need you for that.

For the first part, there’s no shortcut: you just have to practice constantly and challenge yourself with progressively more complex problems to solve *in a real setting*. When you come out of academia you have no clue about this, and you won’t be able to learn it in your room. You’ll need a work environment where you’re given real datasets and real problems you’re expected to solve. This way you’ll develop an eye for practical solutions.

For the second part, forget about shortcuts too. Developing domain knowledge is fundamental to understanding what questions need to be asked, and the only way to reach that point of awareness is to work in a specific field and understand where exactly the well-known issues typically are, where the possible points of optimization are, and which ideas are actually still unexplored (and not just new and cool in your head, but old news outside of it).

These two points can easily take 10 years of work experience and, even with a fast progression in challenges and complexity, 5 years at the bare minimum.

Finding a place where that progression happens and you’re finally able to reach the point of being useful in DS is what’s really difficult.

Posted in business, data, analytics, stats, life choices | 1 Comment »

Why is Italy poorer and more underdeveloped than other European countries?

Posted by Matthias on December 18, 2017

First of all, Italy is quite wealthy: it’s one of the top 10 economies in the world.

And not only that, but the Italian super-rich cohort keeps growing. In fact, it has just pushed the country to the world’s #8 for financial wealth, says a report by The Boston Consulting Group.

The problem is where all that wealth gets produced, and where all that wealth goes.

That’s why the country falls down to somewhere around #25-36 once you look at the GDP (PPP) per capita.

The facts:

  1. A small part of the country (the North, and not even all of it) produces >60% of the national GDP
  2. Italy’s export volumes are huge, consistently within the world’s top 10

At the same time, internal inequality has been steadily increasing for decades, with fewer and fewer people actually seeing that wealth because of internal dysfunctions in both the public and private sectors.

The dysfunctions are concentrated in the large part of the economy that does not export and thus avoids facing international competition, and they are due to excessively high costs, excessively low productivity, family and personal relationships considered more important than skills when hiring and promoting, protectionist laws, and a distorted use of public resources. These problems found a strange, vicious equilibrium during the decades right after the economic boom, but first the 1992 crisis (both political and economic) and later the 2007–2008 financial crisis dealt a fatal blow to the absurd system that had been built up until then. Italians have yet to wake up to the fact that that system was absurd and that it needs to be left behind, though. Unfortunately, it also seems impossible to make most of them understand the basic point that a country grows if its productivity does. For some reason, the argument is constantly ignored or rejected by the public.

The 5 key reforms Italy needs ASAP are:
1. Making pensions sustainable (pensions/GDP is at 16%, and we’re projected to have a retired person for every working person by 2045)
2. Making the public sector efficient and accountable, and creating a federalist system (a real one, unlike the fake federal reforms that have blown up public debt for decades), to make regions (especially in the South, but also Rome) accountable and fiscally responsible
3. Cutting regressive taxes and bureaucracy (you can do this only after #1 and #2)
4. Increasing the private sector productivity by removing laws and subsidies that are artificially keeping unproductive firms alive and new, better competitors out of the market (you can do this only after #3)
5. Reforming the banking system, which is undercapitalized and unable to guarantee access to credit

The problem with productivity (#4) is concentrated in small firms, which unfortunately employ the vast majority of the Italian workforce (some sources say around 70%). It’s extremely difficult to do something about it, because too many voters are employed there, and too many of them prefer the status quo to a better possible future where they could be more productive and make more money.


Nobody is touching #5 because of rampant cronyism and corruption.
Some micro-reforms were made during Italy’s darkest hour, after the financial crisis nearly made us collapse (2011), especially on #1. But it wasn’t enough.
Renzi tried to do something about #2, but it wasn’t enough, and his main idea (the referendum that didn’t pass) was going in the wrong direction in more than one sense.
The perfect moment to do all that was the early 2000s, a stagnant but stable time when we had just joined the Euro and were enjoying low interest rates. But we lost a decade thanks to Berlusconi and his banana-republic antics on one hand, and to people obsessed with him 24/7 (rather than with those issues) on the other. Now it’s late. Anyone attempting to tackle those points will lose the elections, because most Italians live with the dream that things could just go back to Eldorado (basically the 1980s, when money grew on trees), and, if they don’t, it’s because there’s some conspiracy going on (blame the EU, the Euro, the immigrants, the Masons, the technocrats, Germany, the CIA, politicians’ salaries, you name it; you can even find several examples in the other answers to this question here on Quora). They just can’t accept that the world has changed and that we need to change as well, radically. Because we can no longer afford an inefficient public sector, we can no longer afford to compete on low labor costs, and we can no longer afford not to invest in innovation, not to move on to producing new things, and not to leave behind the old models (non-innovative, non-meritocratic, family-run or relationship-based businesses and protected professions).

Until then, we’ll just keep stagnating and electing worse and worse demagogues who promise to take us back to Eldorado with some silly or nasty slogan.
And, with stagnant growth and a broken public sector, wealth will be shared among fewer and fewer people.
And young people with college degrees and a work ethic will remain unemployed, or be forced to compete like crazy for the few good jobs in the few productive centers and to accept miserable wages to sustain the privileges accumulated by the Baby Boomers during the past crazy decades. Or be forced to move out of the country, somewhere they can be absorbed by the economy and be productive.

If you want to know more about the crazy policies that undermined the Italian public sector and distorted the social contract, I suggest two books: Il macigno and La lista della spesa.
If you want an overall analysis of the major problems of the Italian economy, then read I sette peccati capitali dell’economia italiana.
All three books were written by the economist Carlo Cottarelli, who was hired by the Government to help solve these problems, and then left the role once he realized nobody actually had any intention of applying his recommendations.

Two related videos, in English.

The first one focuses especially on the productivity problem of the Italian private sector and how it deliberately missed the IT train in the 1990s.

The second one explores three hypotheses for the question “why has Italy stopped growing?”: 1. the size of the public sector, 2. the Euro, 3. the failed transition from an imitation to an innovation economy. Then it shows how some popular explanations (joining the Euro and having the wrong sectoral allocation) don’t hold once you scrutinize the data. The data show instead that the gap in productivity and profits between exporting firms and firms oriented only to the local market has increased dramatically, with the former driving the economy up and the latter driving it down: they should close and disappear, but they don’t, because they’re protected by anti-competition laws and rules.
So the only possible solution would be to enable the reallocation of capital from the inefficient protected firms to the efficient and competitive ones, but this can’t be done because the large institutions (built during the imitation phase, useful back then but not now) have gained too much power to be dismantled, an excessive number of Italians are employed in the unproductive smaller firms and are unwilling to change anything about their situation, and, as a consequence, the culture of the political parties is generally anti-competition across the entire spectrum.
The other odd finding is that, unlike in other developed countries, management in the productive firms comes mostly from within the family that owns the business.

They’re both must-watches, and probably the best videos on YouTube on the subject, if you’re interested in understanding the Italian economy.

Posted in business, data, analytics, stats, economics, food for thought | 2 Comments »
