Automate the survey data pipeline with Python
Incorporate data from all the major survey data providers into your data pipeline with Tally, the API for survey data
At Datasmoothie, we recently launched a new product called Tally. Tally is a RESTful API for survey data that can connect to all the major enterprise survey software providers, such as Confirmit, Nebu, Dimensions (reading `mdd`/`ddf` files), SPSS (`sav` files) and others. It can also read CSV files.
Tally enables users to use Python (or JavaScript, R, or any other programming language) to integrate their survey data with all the other data in their pipeline. It offers both common data processing methods (e.g. recoding, weighting) and methods to easily produce deliverables, such as Excel tables and PowerPoint decks.
Selecting your data provider
Tally comes with a Python client library that simplifies using the API. To get started, simply import `tally` and choose a dataset. In this example, we are using SPSS. After running `pip install datasmoothie-tally-client`, we run:
```python
import tally

dataset = tally.DataSet(api_key=[your_key])
dataset.use_spss('my_data.sav')
```
We could also have selected `use_confirmit`, `use_nebu`, and so on.
Data processing
Using the Tally API, we can both derive new variables and apply RIM weighting to correct our sample to better reflect the population.
First, we combine the codes of our `locality` variable into a new, derived variable called `urban`.
```python
derive_conditions = [
    (1, "Urban", {'locality': [1, 4, 5]}),
    (2, "Rural", {'locality': [2, 3]})
]

column = dataset.derive(name='urban', label='Urban or rural',
                        cond_map=derive_conditions, qtype="single")
```
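To see what the condition map expresses, here is a minimal plain-Python sketch (not Tally's implementation) of the same recode: each `locality` code listed in a condition is mapped to the corresponding code of the new single-choice variable. The `apply_derive` helper and the sample records are hypothetical, for illustration only.

```python
derive_conditions = [
    (1, "Urban", {'locality': [1, 4, 5]}),
    (2, "Rural", {'locality': [2, 3]})
]

def apply_derive(records, cond_map, name):
    """Add a derived single-choice column based on code membership."""
    # Invert the condition map: source code -> derived code
    code_for = {loc: code
                for code, _label, cond in cond_map
                for loc in cond['locality']}
    for rec in records:
        rec[name] = code_for.get(rec['locality'])
    return records

# Hypothetical sample: locality 4 is in the "Urban" list, 2 in "Rural"
sample = [{'locality': 4}, {'locality': 2}]
apply_derive(sample, derive_conditions, 'urban')
```

After the call, the first record carries `urban == 1` and the second `urban == 2`, mirroring what the derived variable holds in the dataset.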
Then, we use the new variable to apply a weighting scheme:
```python
scheme = {
    'urban': {1: 75.0, 2: 25.0},
    'gender': {1: 49.0, 2: 51.0}
}

result = dataset.weight(name='Gender and urban',
                        variable='weight_c',
                        unique_key='resp_id',
                        scheme=scheme)
```
The weight method both weights the dataset and generates a weighting report, with information about how often the RIM algorithm iterated, the minimum and maximum weight factors, and so on. See the example notebook for more details.
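For intuition about what the RIM algorithm iterates on, here is a minimal sketch of RIM (raking) weighting in plain Python. This is an assumption-laden illustration, not Tally's implementation: each pass scales the weights so one variable's weighted margins hit the target percentages, and the passes repeat until the scaling factors stop moving.

```python
def rim_weight(rows, scheme, max_iter=100, tol=1e-8):
    """Sketch of RIM (raking) weighting: iteratively rescale weights
    until each variable's weighted margins match the target percentages."""
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, targets in scheme.items():
            total = sum(weights)
            norm = sum(targets.values())  # targets may be percentages
            factors = {}
            for code, pct in targets.items():
                current = sum(w for w, r in zip(weights, rows) if r[var] == code)
                factors[code] = (pct / norm) * total / current if current else 1.0
            weights = [w * factors.get(r[var], 1.0)
                       for w, r in zip(weights, rows)]
            max_shift = max(max_shift, *(abs(f - 1.0) for f in factors.values()))
        if max_shift < tol:  # converged: no category needed rescaling
            break
    return weights

# Toy sample: one respondent per urban/gender cell, scheme as in the article
sample = [{'urban': 1, 'gender': 1}, {'urban': 1, 'gender': 2},
          {'urban': 2, 'gender': 1}, {'urban': 2, 'gender': 2}]
scheme = {'urban': {1: 75.0, 2: 25.0}, 'gender': {1: 49.0, 2: 51.0}}
weights = rim_weight(sample, scheme)
```

On this toy sample the weighted urban split lands on 75/25 and the gender split on 49/51; the per-variable scaling factors are exactly the "weight factors" the report summarises.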
Deliverables
Now that we’ve fetched our data and applied our data processing, it’s time to produce our deliverables.
In this case, the client has asked us for one PowerPoint deck for each locality. So instead of having to painstakingly produce five decks, we simply loop through the locality codes and use each one as a filter when we produce the PowerPoint documents.
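The loop itself can be sketched as below. The `build_decks` helper and its filename pattern are hypothetical stand-ins; the actual Tally call that renders a PowerPoint per filter is not shown in this post, so here the rendering step is reduced to collecting the filtered subset for each deck.

```python
def build_decks(records, locality_codes):
    """One deck per locality code: filter the records, then hand each
    subset to whatever renders the PowerPoint (stand-in here)."""
    decks = {}
    for code in locality_codes:
        subset = [r for r in records if r['locality'] == code]
        # In a real pipeline, the deck for this subset is rendered here
        decks[f'deck_locality_{code}.pptx'] = subset
    return decks

# Hypothetical records; locality codes 1-5 as in the derive step above
records = [{'locality': 1}, {'locality': 2}, {'locality': 1}]
decks = build_decks(records, [1, 2, 3, 4, 5])
```

The same pattern works for any categorical filter: one pass over the codes, one deliverable per code.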
The end result is a script that takes the raw survey data, recodes and weights it, and finally generates PowerPoint documents.