Automate the survey data pipeline with Python

Incorporate data from all the major survey data providers into your data pipeline with Tally, the API for survey data

Geir Freysson
Datasmoothie

--

At Datasmoothie, we recently launched a new product called Tally. Tally is a RESTful API for survey data that connects to the major enterprise survey software providers and formats, including Confirmit, Nebu, Dimensions (mdd/ddf files) and SPSS (sav files). It can also read CSV files.

Tally lets users integrate their survey data with the rest of their data pipeline from Python (or JavaScript, R, or any other programming language). It offers common data processing methods (e.g. recoding, weighting) as well as methods for producing deliverables such as Excel tables and PowerPoint decks.

Selecting your data provider

Tally comes with a Python client library that simplifies using the API. To get started, install the client with pip install datasmoothie-tally-client, then import tally and choose a data source. In this example, we use an SPSS file:

import tally
dataset = tally.DataSet(api_key='your_key')  # replace 'your_key' with your own Tally API key
dataset.use_spss('my_data.sav')

We could also have selected use_confirmit, use_nebu, and so on.

Data processing

Using the Tally API, we can derive new variables and apply RIM weighting so that our sample better reflects the population.

First, we group the codes of our locality variable into a new, derived variable called urban.

derive_conditions = [
    (1, "Urban", {'locality': [1, 4, 5]}),
    (2, "Rural", {'locality': [2, 3]})
]
column = dataset.derive(name='urban', label='Urban or rural', cond_map=derive_conditions, qtype="single")
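
To make the shape of cond_map concrete, here is a minimal plain-Python sketch of what such a mapping expresses. It is not Tally's implementation: apply_cond_map and the respondents list are hypothetical names made up for illustration, and the sketch assumes a single-choice derived variable takes the code of the first entry whose condition matches.

```python
# Illustration only: a plain-Python stand-in for what a cond_map expresses.
# apply_cond_map is a hypothetical helper, not part of the Tally client.
derive_conditions = [
    (1, "Urban", {'locality': [1, 4, 5]}),
    (2, "Rural", {'locality': [2, 3]}),
]


def apply_cond_map(rows, cond_map):
    """Return one derived code per row: the first matching entry, else None."""
    derived = []
    for row in rows:
        code_for_row = None
        for code, _label, condition in cond_map:
            # A condition matches when every listed variable holds an allowed code.
            if all(row.get(var) in allowed for var, allowed in condition.items()):
                code_for_row = code
                break
        derived.append(code_for_row)
    return derived


# Made-up respondents, one per locality code.
respondents = [{'locality': code} for code in [1, 2, 3, 4, 5]]
urban = apply_cond_map(respondents, derive_conditions)
print(urban)
```

Respondents whose locality is not covered by any condition are left without a code (None) in this sketch; how Tally treats such cases is not covered here.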

Then, we use the new variable in a weighting scheme:

scheme = {
    'urban': {1: 75.0, 2: 25.0},
    'gender': {1: 49.0, 2: 51.0}
}
result = dataset.weight(name='Gender and urban',
                        variable='weight_c',
                        unique_key='resp_id',
                        scheme=scheme)

The weight method both weights the dataset and generates a weighting report, with information such as how many iterations the RIM algorithm ran and the minimum and maximum weight factors. See the example notebook for more details.
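
For readers unfamiliar with RIM weighting (also known as raking), the idea can be sketched in a few lines of plain Python. This is a simplified illustration, not Tally's algorithm: the rake and weighted_share functions and the sample data are made up for this example, and the sketch assumes the target percentages sum to 100 and that every code in the scheme appears in the sample.

```python
def rake(rows, scheme, max_iterations=50, tolerance=1e-6):
    """Adjust weights until each variable's weighted shares match its targets.

    Returns the weights and the number of iterations that were run.
    """
    weights = [1.0] * len(rows)
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        largest_adjustment = 0.0
        for variable, targets in scheme.items():
            total = sum(weights)
            # Current weighted total for each code of this variable.
            current = {code: 0.0 for code in targets}
            for row, weight in zip(rows, weights):
                current[row[variable]] += weight
            # Factor that moves each code to its target share of the total.
            factors = {code: (targets[code] / 100.0 * total) / current[code]
                       for code in targets}
            weights = [weight * factors[row[variable]]
                       for row, weight in zip(rows, weights)]
            largest_adjustment = max(
                largest_adjustment,
                max(abs(factor - 1.0) for factor in factors.values()))
        # Stop once a full pass over all variables barely changes the weights.
        if largest_adjustment < tolerance:
            break
    return weights, iteration


def weighted_share(rows, weights, variable, code):
    """Weighted proportion of rows where variable equals code."""
    matching = sum(w for row, w in zip(rows, weights) if row[variable] == code)
    return matching / sum(weights)


# Made-up sample of 100 respondents that over-represents rural respondents.
sample = ([{'urban': 1, 'gender': 1}] * 30 + [{'urban': 1, 'gender': 2}] * 30
          + [{'urban': 2, 'gender': 1}] * 25 + [{'urban': 2, 'gender': 2}] * 15)
scheme = {'urban': {1: 75.0, 2: 25.0}, 'gender': {1: 49.0, 2: 51.0}}

weights, iterations = rake(sample, scheme)
print("iterations:", iterations)
print("min weight:", min(weights), "max weight:", max(weights))
print("urban share:", weighted_share(sample, weights, 'urban', 1))
```

The iteration count and the minimum and maximum weights printed here are the same kind of diagnostics the weighting report describes.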

Deliverables

Now that we’ve fetched our data and applied our data processing, it’s time to produce our deliverables.

In this case, the client has asked us for one PowerPoint deck for each locality. So instead of having to painstakingly produce five decks, we simply loop through the codes and use them as a filter when we produce our PowerPoint documents.
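
The loop over locality codes described above could look roughly like the sketch below. The Tally call that actually builds the PowerPoint file is not shown in this article, so build_deck is a hypothetical placeholder for it; build_decks_per_code, the filename pattern, and the filter format (which mirrors the condition dictionaries used when deriving variables) are likewise assumptions made for illustration.

```python
def build_decks_per_code(variable, codes, build_deck):
    """Call build_deck once per code, filtering the data to that code.

    build_deck is a placeholder for whatever call produces the PowerPoint file.
    """
    filenames = []
    for code in codes:
        filename = f"{variable}_{code}.pptx"
        # The filter keeps only respondents with this code for the variable.
        build_deck(filename=filename, data_filter={variable: [code]})
        filenames.append(filename)
    return filenames


# Hypothetical stand-in that records each request instead of calling an API.
requested = []


def fake_build_deck(filename, data_filter):
    requested.append((filename, data_filter))


decks = build_decks_per_code('locality', [1, 2, 3, 4, 5], fake_build_deck)
print(decks)
```

Passing the deck-building function in as an argument keeps the loop independent of any particular API call, so the real call can be swapped in without changing the loop.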

The end result is a script that takes the raw survey data, recodes and weights it, and finally generates PowerPoint documents.

--

Co-founder of Datasmoothie. I also maintain the open-source survey data library Quantipy and its enterprise equivalent, Tally.