pycarol.query
This submodule contains the classes used to query data from the RT layer in Carol.
class pycarol.query.Query(carol, max_hits=inf, offset=0, page_size=100, sort_order='ASC', sort_by=None, scrollable=True, index_type='MASTER', only_hits=True, fields=None, get_aggs=False, save_results=False, filename='query_result.json', print_status=True, safe_check=False, get_errors=False, flush_result=False, use_stream=False, get_times=False, **kwargs)
Class to query data from Carol.
This class can be used to query data from data models and staging tables, to run named queries, and to delete records.
Args:
- carol: pycarol.carol.Carol
- Carol instance.
- max_hits: int, default float(‘inf’)
- Maximum number of records to download.
- offset: int, default 0
- Offset for pagination. Only used when scrollable=False.
- page_size: int, default 100
- Number of records downloaded per page. The maximum value is 1000.
- sort_order: str, default ‘ASC’
- Sort ascending (‘ASC’) vs. descending (‘DESC’).
- sort_by: str, default None
- Name of the field to sort by.
- scrollable: bool, default True
- Use scroll for pagination. This should be the preferred approach unless you are querying only a small amount of data.
- index_type: str, default ‘MASTER’
- Index to query: ‘MASTER’ or ‘STAGING’.
- only_hits: bool, default True
- Return only the results found in the response path $hits.mdmGoldenFieldAndValues.
- fields: list, default None
- Fields to return in the response, e.g. [“mdmGoldenFieldAndValues.mdmtaxid”, “mdmGoldenFieldAndValues.date”].
- get_aggs: bool, default False
- To be used if the query/named query has aggregations.
- save_results: bool, default False
- Whether to save the result of the query to the file specified in filename.
- filename: str, default ‘query_result.json’
- File path to save the response.
- print_status: bool, default True
- Print the number of records retrieved in each iteration.
- safe_check: bool, default False
- To be used if there are repeated records (same mdmId).
- get_errors: bool, default False
- To get the errors in the goldenRecords, if any.
- flush_result: bool, default False
- To be used with save_results. The result is written only to the file and is not copied to memory.
- use_stream: bool, default False
- Use the stream of data.
- get_times: bool, default False
- Create a list with the time each page request took.
- kwargs: dict
- Extra parameters to be passed to Carol.call_api.
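The constraints in the argument list above can be checked before a Query is built. The sketch below is only an illustration: `build_query_kwargs` is a hypothetical helper, not part of pycarol, and raising on `offset` combined with `scrollable=True` is a design choice of this sketch (the documentation only says the offset is unused in that case). The returned dict is meant to be passed on as `Query(carol, **kwargs)`.

```python
import math

# Hypothetical helper, NOT part of pycarol. It only encodes the constraints
# stated in the Args list for pycarol.query.Query.
VALID_SORT_ORDERS = {"ASC", "DESC"}
VALID_INDEX_TYPES = {"MASTER", "STAGING"}
MAX_PAGE_SIZE = 1000  # documented upper bound for page_size


def build_query_kwargs(max_hits=math.inf, offset=0, page_size=100,
                       sort_order="ASC", scrollable=True,
                       index_type="MASTER", save_results=False,
                       flush_result=False, filename="query_result.json"):
    """Validate the arguments and return them as a keyword-argument dict."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    if sort_order not in VALID_SORT_ORDERS:
        raise ValueError("sort_order must be 'ASC' or 'DESC'")
    if index_type not in VALID_INDEX_TYPES:
        raise ValueError("index_type must be 'MASTER' or 'STAGING'")
    if offset and scrollable:
        # The offset is documented as used only when scrollable=False.
        raise ValueError("offset requires scrollable=False")
    if flush_result and not save_results:
        # flush_result is documented as a companion of save_results.
        raise ValueError("flush_result requires save_results=True")
    return {
        "max_hits": max_hits,
        "offset": offset,
        "page_size": page_size,
        "sort_order": sort_order,
        "scrollable": scrollable,
        "index_type": index_type,
        "save_results": save_results,
        "flush_result": flush_result,
        "filename": filename,
    }


# Offset-based paging over a small result set from the staging index.
kwargs = build_query_kwargs(page_size=500, offset=200, scrollable=False,
                            index_type="STAGING")
```

Failing early on the client side avoids sending a request whose paging options would be ignored or rejected.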
check_total_hits(json_query, index_type='MASTER')
Check the total hits for a given query.
Args:
- json_query: JSON object
- Query to use.
- index_type: str, default ‘MASTER’
- Index type to query.
Returns:
- Number of records for this query.
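Knowing the total number of hits makes it possible to estimate how many page requests a query will need. The following is a minimal sketch of that arithmetic only: `pages_needed` is a hypothetical helper, and the total of 2500 is an example value standing in for whatever `check_total_hits` would return.

```python
import math


def pages_needed(total_hits, page_size=100, max_hits=math.inf):
    """Estimate the number of page requests for a query.

    total_hits: record count, e.g. the value returned by check_total_hits.
    page_size:  records per page (Query default is 100).
    max_hits:   cap on downloaded records (Query default is inf).
    """
    to_fetch = min(total_hits, max_hits)  # never fetch more than max_hits
    return math.ceil(to_fetch / page_size)


# Example value in place of a real check_total_hits result.
total = 2500
all_pages = pages_needed(total)                  # full download
capped_pages = pages_needed(total, max_hits=250)  # download capped at 250
```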
pycarol.query.delete_golden(carol, dm_name, now=None)
Delete Golden records.
It will delete all golden records of a given data model based on lastUpdate.
Args:
- carol: pycarol.carol.Carol
- Carol instance
- dm_name: str
- Data model name
- now: str
- Delete records whose last update is earlier than now. Any ISO date-time format is accepted.
Usage:
    from pycarol.query import delete_golden
    from pycarol.auth.PwdAuth import PwdAuth
    from pycarol.carol import Carol

    login = Carol()
    my_dm = 'my_dm'  # placeholder: name of the data model

    # Delete golden records (now is left at its default, None).
    delete_golden(login, dm_name=my_dm)

    # To delete based on a date.
    delete_golden(login, dm_name=my_dm, now='2018-11-16')
Attention:
This API deletes all data in the DataModel. If a DataModel View is related to this DataModel, that view needs to be reprocessed.
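The effect of the `now` cutoff can be illustrated locally. This is a sketch of the comparison only, assuming the cutoff is strict ("earlier than"). The actual filtering is done by Carol, and the record layout used here (`id`, `lastUpdate` keys) is made up for the example.

```python
from datetime import datetime


def is_before_cutoff(last_update, now):
    """Return True when last_update is strictly earlier than the cutoff.

    Both arguments are ISO-format strings; a date without a time part
    is read as midnight of that day.
    """
    return datetime.fromisoformat(last_update) < datetime.fromisoformat(now)


# Hypothetical records, only to show which ones a cutoff would select.
records = [
    {"id": "a", "lastUpdate": "2018-11-15T23:59:59"},
    {"id": "b", "lastUpdate": "2018-11-16T08:30:00"},
]

to_delete = [r["id"] for r in records
             if is_before_cutoff(r["lastUpdate"], "2018-11-16")]
```

With the cutoff `'2018-11-16'`, only the record last updated on 15 November is selected; the one updated later that morning is kept.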