Dr Hina Khan

Advanced Queensland Research Fellow

School of Information Technology and Electrical Engineering
Faculty of Engineering, Architecture and Information Technology

Overview

Research Interests

  • Data Visualization, Exploratory Data Analysis, Query Optimization, Data Quality, Search Results Diversification

Qualifications

  • Doctor of Philosophy, The University of Queensland

Publications

  • Khan, Hina A. and Sharaf, Mohamed (2017) Model-based diversification for sequential exploratory queries. Data Science and Engineering, 2 151-168. doi:10.1007/s41019-017-0038-0

  • Khan, Hina Anwar (2016). Scalable diversification for data exploration platforms. PhD Thesis, School of Information Technology and Electrical Engineering, The University of Queensland. doi:10.14264/uql.2016.1073

Publications

Conference Publication

  • Hussain, Zaeem, Khan, Hina A. and Sharaf, Mohamed A. (2015). Diversifying with few regrets, but too few to mention. In: Georgia Koutrika, Laks V.S. Lakshmanan, Mirek Riedewald and Kostas Stefanidis, Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web: ExploreDB 2015. ExploreDB, Melbourne, VIC, Australia, (27-32). 31 May 2015. doi:10.1145/2795218.2795225

  • Khan, Hina A. and Sharaf, Mohamed A. (2015). Progressive diversification for column-based data exploration platforms. In: 2015 IEEE 31st International Conference on Data Engineering, ICDE 2015. IEEE International Conference on Data Engineering, Seoul, South Korea, (327-338). 13-17 April 2015. doi:10.1109/ICDE.2015.7113295

  • Khan, Hina A., Sharaf, Mohamed A. and Albarrak, Abdullah (2014). DivIDE: efficient diversification for interactive data exploration. In: Christian S. Jensen, Hua Lu, Torben Bach Pedersen, Christian Thomsen and Kristian Torp, SSDBM 2014 - Proceedings of the 26th International Conference on Scientific and Statistical Database Management. 26th International Conference on Scientific and Statistical Database Management, SSDBM 2014, Aalborg, Denmark. 30 June-2 July 2014. doi:10.1145/2618243.2618253

  • Albarrak, Abdullah, Noboa, Tatiana, Khan, Hina A., Sharaf, Mohamed A., Zhou, Xiaofang and Sadiq, Shazia (2014). ORange: Objective-aware Range Query Refinement. In: 2014 IEEE 15th International Conference on Mobile Data Management : IEEE MDM 2014. 15th IEEE International Conference on Mobile Data Management, IEEE MDM 2014, Brisbane, Australia, (333-336). 15-18 July 2014. doi:10.1109/MDM.2014.48

  • Khan, Hina A., Drosou, Marina and Sharaf, Mohamed A. (2013). DoS: an efficient scheme for the diversification of multiple search results. In: SSDBM 2013 - Proceedings of the 25th International Conference on Scientific and Statistical Database Management. 25th International Conference on Scientific and Statistical Database Management, SSDBM 2013, Baltimore, MD, United States, (1-4). 29-31 July 2013. doi:10.1145/2484838.2484858

  • Khan, Hina A., Drosou, Marina and Sharaf, Mohamed A. (2013). Scalable diversification of multiple search results. In: CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013, San Francisco, CA, United States, (775-780). 27 October - 1 November 2013. doi:10.1145/2505515.2505740

Possible Research Projects

Note for students: The possible research projects listed on this page may not be comprehensive or up to date. Always feel free to contact the staff for more information, and also with your own research ideas.

  • Collecting smart grid data accurately at fine granularity is a challenging and costly task. Data is often missing or noisy because of meter problems, communication failures, equipment outages, data loss, and other factors. It is therefore important to identify and correct noisy data, as poor-quality data can lead to misleading analysis and incorrect decision-making. In this project, we will develop novel data cleaning algorithms, based on outlier detection and data imputation techniques, that can clean meter stream data in real time. Besides data cleaning, automatic detection of data quality issues is an essential component of an efficient data quality management system. Hence, we will also focus on effective data visualization techniques for anomaly detection in smart energy grid data.

  • Data in the smart grid is generated from various sources, such as: phasor measurement data; energy consumption data measured by widespread smart meters; energy market pricing and bidding data collected by advanced metering infrastructure; management, control and maintenance data for devices and equipment in the power generation, transmission and distribution networks equipped with intelligent electronic devices; and operational data for running utilities. In addition to grid data, other very large data sets, such as weather data and geographic information system data, are widely used in decision making. In this project, we will therefore design an innovative, integrated data storage model that incorporates heterogeneous data from these sources, provides a transparent, high-level interface to users and applications, and maintains consistency, integrity and traceability across the data sources. Modern data storage models such as in-memory systems and column stores will be exploited to process data efficiently for real-time analytics. Besides the storage model and processing techniques, it is also important to decide which data actually needs to be stored and for how long, because the huge volumes of data generated by the smart grid incur high storage costs. Hence, we will develop techniques that examine the various data sources and evaluate how much of that data needs to be stored and for how long. These techniques will consider not only the amount of data but also how long that data remains useful. Accordingly, this project will define policies that allow historical data that is no longer useful to be removed from the data store.
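
As an illustration of the first project above, the following is a minimal sketch of cleaning a meter stream one reading at a time by combining outlier detection with imputation. It is not the project's algorithm: the sliding-window median test, the function name clean_stream, and the parameters window, threshold and min_spread are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import median
from typing import Deque, Iterable, Iterator, Optional, Tuple


def clean_stream(
    readings: Iterable[Optional[float]],
    window: int = 5,          # assumed history length
    threshold: float = 3.0,   # assumed outlier cut-off, in units of spread
    min_spread: float = 0.1,  # assumed floor on the spread estimate
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, was_repaired) for each reading as it arrives.

    A reading is repaired when it is missing (None), or when the window is
    full and the reading deviates from the window median by more than
    `threshold` times the median absolute deviation (MAD). Repaired
    readings are imputed with the window median.
    """
    recent: Deque[float] = deque(maxlen=window)
    for raw in readings:
        if not recent:
            if raw is None:
                continue  # nothing to impute from yet, so drop the reading
            recent.append(raw)
            yield raw, False
            continue

        center = median(recent)
        mad = median([abs(x - center) for x in recent])
        # Floor the spread so a flat history does not flag every small change.
        spread = max(mad, min_spread)

        is_outlier = (
            raw is not None
            and len(recent) == window
            and abs(raw - center) > threshold * spread
        )
        if raw is None or is_outlier:
            value, repaired = center, True
        else:
            value, repaired = raw, False

        # Store the cleaned value so one spike does not distort later checks.
        recent.append(value)
        yield value, repaired


if __name__ == "__main__":
    stream = [1.0, 1.1, 0.9, 1.0, 1.2, 50.0, None, 1.1]
    for value, repaired in clean_stream(stream):
        print(value, "repaired" if repaired else "ok")
```

With the sample stream, the spike of 50.0 and the missing reading are both replaced by the window median of 1.0, while the remaining readings pass through unchanged. Because only the last few cleaned values are kept, the cost per reading stays constant, which is what makes this kind of test usable on a live stream.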
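
The retention policies mentioned in the second project above could take many forms. The sketch below shows one simple possibility, a maximum age per data source, purely as an assumption for illustration: the Record type, the source labels and the apply_retention function are hypothetical and do not come from the project description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    source: str          # hypothetical label, e.g. "smart_meter" or "weather"
    timestamp: datetime  # when the measurement was taken
    value: float


def apply_retention(
    records: List[Record],
    max_age: Dict[str, timedelta],
    now: datetime,
) -> List[Record]:
    """Return the records that the per-source age policy still keeps.

    A record is dropped only when its source has a policy entry and the
    record is older than that limit. Sources without an entry are kept.
    """
    kept = []
    for record in records:
        limit = max_age.get(record.source)
        if limit is None or now - record.timestamp <= limit:
            kept.append(record)
    return kept


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    policy = {
        "smart_meter": timedelta(days=7),  # assumed: fine-grained data ages fast
        "weather": timedelta(days=365),    # assumed: useful for seasonal analysis
    }
    records = [
        Record("smart_meter", datetime(2024, 1, 30), 1.2),
        Record("smart_meter", datetime(2024, 1, 1), 0.9),
        Record("weather", datetime(2023, 6, 1), 18.5),
    ]
    for record in apply_retention(records, policy, now):
        print(record.source, record.timestamp.date())
```

A fixed age limit is the simplest possible rule. The project text describes policies that also weigh how much data there is and how long it remains useful, which would replace the single comparison in apply_retention with a richer test.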