Data Mining

Data mining can be used to extract useful and understandable patterns from big data streams, providing new knowledge to better understand occupant behavior in buildings.

Understanding and accurately quantifying the impact of energy-related occupant behavior (OB) in buildings is key to the energy-efficient design, operation, and retrofit of buildings. Nevertheless, the stochastic nature of OB, the number of people that occupy a space, the duration of the occupied period, and their collective impact on building energy use are non-trivial to characterize. We apply state-of-the-art data mining approaches as a powerful analysis technique to extract useful and understandable occupancy patterns from big streams of monitored building data.

Data Mining Process

Data mining, also referred to as "Knowledge Discovery in Databases (KDD)", combines advances in the fields of machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition, and data visualization. The extraction process follows six steps, as shown in the diagram below.

One of the objectives of our research, in conjunction with Annex 66, is to address the following fundamental research question: how can quantitative descriptions of OB be developed in order to analyze and evaluate the impact of OB on building energy consumption?

In one study [1], D'Oca and Hong (2015) applied advanced data mining techniques to improve the occupancy schedules used in building energy modeling (BEM). Traditionally, BEM programs represent OB through simplified deterministic schedules, in which the time schedules of occupants, lights, HVAC systems, and plug-load equipment are set independently of the indoor and outdoor conditions. This simplification decouples OB from the building simulation process at the cost of large deviations from actual behavior. To address this issue, a three-step data mining framework was used to identify actual occupancy schedules. First, a data set of 16 offices with 10-minute interval occupancy data over a two-year period was mined with a decision tree model to predict occupant presence. Then, a rule induction algorithm was used to learn a pruned set of rules from the results of the decision tree model. Finally, a cluster analysis was employed to obtain consistent patterns of occupancy schedules. The patterns established were:

Pattern A: Highest occupancy rate Monday to Friday. 

Pattern B: Medium occupancy rate Monday to Friday. 

Pattern C: Most variable occupancy rate. This includes a medium occupancy rate on Monday, Tuesday and Thursday and medium-high occupancy on Friday. On Wednesday, the user's vacancy/presence state in the office space varies with high frequency. 

Pattern D: Lowest occupancy rate Monday to Friday.

As a result, four archetypal user profiles were identified, each with different energy-saving strategies for users and different building design recommendations.
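To make the final clustering step more concrete, the sketch below groups offices by their weekday occupancy rates. It is only an illustration under stated assumptions: the source says a cluster analysis was used but does not name the algorithm here, so plain k-means is swapped in; the function names (`daily_occupancy_rates`, `kmeans`) and all data are hypothetical and synthetic, and the decision tree and rule induction steps are not reproduced.

```python
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def daily_occupancy_rates(samples):
    """Turn presence readings into one occupancy rate per weekday.

    `samples` is a list of (weekday_index, present) pairs, one per
    10-minute reading, with weekday_index 0..4 and present 0 or 1.
    """
    rates = []
    for d in range(len(DAYS)):
        readings = [present for day, present in samples if day == d]
        rates.append(sum(readings) / len(readings) if readings else 0.0)
    return rates


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=50, seed=0):
    """Plain k-means (an assumed stand-in for the study's cluster analysis)."""
    rng = random.Random(seed)
    centroids = rng.sample(profiles, k)  # start from k distinct profiles
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(col) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Synthetic offices only: each has an assumed presence probability per weekday.
    rng = random.Random(42)
    assumed_probabilities = [
        [0.8] * 5,  # hypothetical high-occupancy office
        [0.8] * 5,
        [0.2] * 5,  # hypothetical low-occupancy office
        [0.2] * 5,
    ]
    profiles = []
    for probs in assumed_probabilities:
        samples = [
            (day, 1 if rng.random() < probs[day] else 0)
            for day in range(len(DAYS))
            for _ in range(54)  # 9 working hours x 6 readings per hour
        ]
        profiles.append(daily_occupancy_rates(samples))
    labels, _ = kmeans(profiles, k=2)
    for office, (label, profile) in enumerate(zip(labels, profiles)):
        print(office, label, [round(r, 2) for r in profile])
```

With real monitored data, `samples` would come from the presence sensor log of each office, and the number of clusters `k` would be chosen by inspecting the resulting groups rather than fixed in advance.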

This work is one example of how data mining can quantitatively describe OB in order to analyze and evaluate its impact on building energy consumption, especially through simulation. Two other studies have applied data mining approaches: one to identify drivers and patterns of window opening and closing [3], and one to analyze the influence of OB on energy use in buildings [4].

More information about data mining can be found in:

  1. D'Oca S, Hong T. Occupancy schedules learning process through a data mining framework. Energy and Buildings, 2015.
  2. D'Oca S, Corgnati S, Hong T. Data mining of occupant behavior in office buildings. 6th International Building Physics Conference, IBPA, 2015.
  3. D'Oca S, Hong T. A data-mining approach to discover patterns of window opening and closing behavior in offices. Building and Environment, 2014.
  4. Yu Z, Fung BCM, Haghighat F, Yoshino H, Morofsky E. A systematic procedure to study the influence of occupant behavior on building energy consumption. Energy and Buildings, 2011.