pycarol.query

Contains all the classes used to query data from the RT layer in Carol.

class pycarol.query.Query(carol, max_hits=inf, offset=0, page_size=100, sort_order='ASC', sort_by=None, scrollable=True, index_type='MASTER', only_hits=True, fields=None, get_aggs=False, save_results=False, filename='query_result.json', print_status=True, safe_check=False, get_errors=False, flush_result=False, use_stream=False, get_times=False, **kwargs)

Class to query data from Carol.

This class can be used to query data from data models and staging tables, to run named queries, and to delete records.

Parameters
  • carol (pycarol.carol.Carol) – Carol object.

  • max_hits (int, default float('inf')) – Number of records that will be downloaded.

  • offset (int, default 0) – Offset for pagination. Only used when scrollable=False.

  • page_size (int, default 100) – Number of records downloaded per page. The maximum value is 1000.

  • sort_order (str, default 'ASC') – Sort ascending ('ASC') or descending ('DESC').

  • sort_by (str, default None) – Name of the field to sort by.

  • scrollable (bool, default True) – Use scroll for pagination. This should be the default approach unless you are querying only a few records.

  • index_type (str, default 'MASTER') – Index to query: 'MASTER' or 'STAGING'.

  • only_hits (bool, default True) – Return only the results found in the response path $hits.mdmGoldenFieldAndValues.

  • fields (list, default None) – Fields to return in the response, e.g., ["mdmGoldenFieldAndValues.mdmtaxid", "mdmGoldenFieldAndValues.date"].

  • get_aggs (bool, default False) – To be used if the query/named query has aggregations.

  • save_results (bool, default False) – Whether to save the query result to the file specified in filename.

  • filename (str, default 'query_result.json') – File path to save the response.

  • print_status (bool, default True) – Print the number of records in each iteration.

  • safe_check (bool, default False) – To be used if there are repeated records (same mdmId).

  • get_errors (bool, default False) – Return the errors in the golden records, if any.

  • flush_result (bool, default False) – To be used with save_results; the result is written only to the file and is not kept in memory.

  • use_stream (bool, default False) – Use the stream of data.

  • get_times (bool, default False) – Build a list with the time each page request took.

  • kwargs (dict) – Extra parameters to be passed to Carol.call_api.
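To make the only_hits description concrete, the sketch below pulls the mdmGoldenFieldAndValues payload out of every hit in a response-shaped dictionary. It does not call pycarol: the helper name extract_golden_values and the sample response are assumptions made for illustration, shaped only after the path $hits.mdmGoldenFieldAndValues named above.

```python
from typing import Any


def extract_golden_values(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the mdmGoldenFieldAndValues payload of every hit.

    Hypothetical helper: it mirrors the path $hits.mdmGoldenFieldAndValues,
    not pycarol's internal implementation.
    """
    return [
        hit["mdmGoldenFieldAndValues"]
        for hit in response.get("hits", [])
        if "mdmGoldenFieldAndValues" in hit
    ]


# Made-up response; only the "hits" / "mdmGoldenFieldAndValues" nesting
# comes from the parameter description.
sample_response = {
    "hits": [
        {"mdmId": "a1", "mdmGoldenFieldAndValues": {"mdmtaxid": "111", "date": "2018-11-16"}},
        {"mdmId": "b2", "mdmGoldenFieldAndValues": {"mdmtaxid": "222", "date": "2018-11-17"}},
    ],
}

golden_records = extract_golden_values(sample_response)
```

With only_hits=False one would presumably receive the surrounding envelope as well, so a helper of this kind is only needed in that case.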

check_total_hits(json_query, index_type='MASTER')

Check the total hits for a given query.

Parameters
  • json_query – JSON object with the query to use.

  • index_type – Index type to query.

Returns

Number of records for this query.

go(callback: Optional[Callable] = None) → Query

Run the query.

Parameters

callback – Function that receives the current batch of records returned by the query.

Returns: self, the Query instance.
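A callback is useful when batches should be processed as they arrive rather than inspected only at the end. The sketch below shows one possible shape for such a callback. The driver run_in_batches is a hypothetical stand-in for the paging loop and is not part of pycarol; the assumption that the callback receives a list of records is taken from the description above.

```python
from typing import Callable, Iterable


class BatchCounter:
    """Callback that tallies records batch by batch instead of storing them."""

    def __init__(self) -> None:
        self.batches = 0
        self.records = 0

    def __call__(self, batch: list) -> None:
        self.batches += 1
        self.records += len(batch)


def run_in_batches(pages: Iterable[list], callback: Callable[[list], None]) -> None:
    """Hypothetical stand-in for the paging loop: hand each page to the callback."""
    for page in pages:
        callback(page)


counter = BatchCounter()
# Three made-up pages of records.
run_in_batches([[{"id": 1}, {"id": 2}], [{"id": 3}, {"id": 4}], [{"id": 5}]], counter)
```

An instance such as counter could then be passed as the callback argument of go, assuming the batch really is delivered as a list.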

page(offset=0)

Get only one page of the result using offset.

Parameters

offset (int, default 0) – Offset to get. To paginate manually, increase offset by page_size on each call.

Returns

Query JSON response.
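Manual pagination comes down to advancing the offset by page_size until a short or empty page comes back. The sketch below illustrates that loop with a hypothetical fetch_page function standing in for page; it returns a plain list of records, whereas the real method returns the JSON response, so the records would first have to be extracted from it. The max_hits cap follows the constructor parameter of the same name.

```python
from typing import Callable


def paginate(
    fetch_page: Callable[[int], list],
    page_size: int,
    max_hits: float = float("inf"),
) -> list:
    """Collect records page by page, advancing offset by page_size each time."""
    records: list = []
    offset = 0
    while len(records) < max_hits:
        batch = fetch_page(offset)
        if not batch:
            break  # nothing left to fetch
        records.extend(batch)
        if len(batch) < page_size:
            break  # a short page means this was the last one
        offset += page_size
    # Trim any overshoot from the final page; None means "no limit".
    limit = None if max_hits == float("inf") else int(max_hits)
    return records[:limit]


# Stand-in data source: 250 made-up records served 100 at a time.
DATA = list(range(250))


def fake_page(offset: int) -> list:
    return DATA[offset:offset + 100]


all_records = paginate(fake_page, page_size=100)
capped = paginate(fake_page, page_size=100, max_hits=120)
```

For anything beyond a few pages, the scrollable=True path described in the constructor is the recommended route; this loop only applies when scrollable=False.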

pycarol.query.delete_golden(carol, dm_name, now=None)

Delete Golden records.

Deletes all golden records of a given data model based on lastUpdate.

Parameters
  • carol (pycarol.carol.Carol) – Carol instance.

  • dm_name (str) – Data model name.

  • now (str) – Delete records whose last update is earlier than now. Any ISO date-time format is accepted.

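Usage:

from pycarol.carol import Carol
from pycarol.query import delete_golden

# Carol() takes no arguments here; credentials are assumed to be configured elsewhere.
login = Carol()
my_dm = 'my_data_model'  # placeholder: name of the data model to clear

# Delete the golden records of the data model.
delete_golden(login, dm_name=my_dm)

# To delete based on a date.
delete_golden(login, dm_name=my_dm, now='2018-11-16')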
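Since now accepts an ISO date-time string, a relative cutoff such as "everything last updated more than 30 days ago" can be built with the standard library. This is a minimal sketch; the helper name cutoff_iso is hypothetical, and whether Carol treats the timezone offset in the string as expected is an assumption that should be verified before deleting anything.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cutoff_iso(days_back: int, reference: Optional[datetime] = None) -> str:
    """Return an ISO 8601 timestamp days_back days before reference (default: now, UTC)."""
    reference = reference or datetime.now(timezone.utc)
    return (reference - timedelta(days=days_back)).isoformat()


# A fixed reference date keeps the example reproducible.
fixed_reference = datetime(2018, 12, 16, tzinfo=timezone.utc)
cutoff = cutoff_iso(30, fixed_reference)  # '2018-11-16T00:00:00+00:00'
```

The resulting string could then be passed as now=cutoff to delete_golden.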

Attention

This API will delete all data in the DataModel. If there is a DataModel View related to this DataModel, that view needs to be reprocessed afterwards.