Artificial Intelligence Assignment 1

Due 11:59pm Wednesday 3 May 2023

1 Wine Quality Prediction with 1NN (K-d Tree)

Wine experts evaluate the quality of wine based on sensory data. Features of a wine can also be collected from objective tests, so these objective features could be used to predict the expert’s judgement, that is, the quality rating of the wine. This can be framed as a supervised learning problem, with the objective features as the data features and the wine quality rating as the data labels. In this assignment, we provide objective features obtained from physicochemical statistics for each white wine sample, together with the corresponding rating provided by wine experts. You are expected to implement a k-d tree (KDT), build it from the training set, and then predict wine quality on the test set by searching the tree. Wine quality rating is measured in the range of 0-9. In our dataset, we only keep the samples with quality ratings 5, 6 and 7. The 11 objective features are listed as follows [1]:

• f_acid: fixed acidity
• v_acid: volatile acidity
• c_acid: citric acid
• res_sugar: residual sugar
• chlorides: chlorides
• fs_dioxide: free sulfur dioxide
• ts_dioxide: total sulfur dioxide
• density: density
• pH: pH
• sulphates: sulphates
• alcohol: alcohol

Explanation of the Data. train: The first 11 columns represent the 11 features and the 12th column is the wine quality. A sample is depicted as follows:

1.1 1NN (K-d Tree)

From the given training data, our goal is to learn a function that can predict the wine quality rating of a wine sample based on its objective features. In this assignment, the predictor function will be constructed as a k-d tree. Since the attributes (objective features) are continuously valued, you shall apply the k-d tree algorithm for continuous data, as outlined in Algorithm 1. It is the same as taught in the lecture. Once the tree is constructed, you will search the tree to find the 1-nearest neighbour of a query point and use that neighbour's rating to label the query point. Please refer to the search logic taught in the lecture when writing your 1NN search code.
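Algorithm 1 and the lecture's search logic are not reproduced in this text, so the following is only a minimal sketch of one common k-d tree variant, not the prescribed algorithm. It assumes that each split sends points with a value less than or equal to the median to the left and the rest to the right, that leaves store the training points, and that distance is Euclidean. The names KDNode, build_kdtree, sq_dist and nearest are hypothetical, and the demo points are made up.

```python
import statistics


class KDNode:
    """Internal nodes hold a split (dim, value); leaves hold a list of points."""

    def __init__(self, dim=None, value=None, left=None, right=None, points=None):
        self.dim = dim
        self.value = value
        self.left = left
        self.right = right
        self.points = points  # list of (features, label) pairs, leaves only


def build_kdtree(points, depth, n_dims):
    """Recursively split (features, label) pairs at the median of one dimension."""
    if not points:
        return None
    if len(points) == 1:
        return KDNode(points=points)
    dim = depth % n_dims  # cycle through the dimensions as the tree deepens
    value = statistics.median(features[dim] for features, _ in points)
    left = [p for p in points if p[0][dim] <= value]
    right = [p for p in points if p[0][dim] > value]
    if not right:
        # No point lies above the median, so this split makes no progress;
        # keep the points together in one leaf (an assumption of this sketch).
        return KDNode(points=points)
    return KDNode(dim, value,
                  build_kdtree(left, depth + 1, n_dims),
                  build_kdtree(right, depth + 1, n_dims))


def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best=None):
    """Return (squared distance, label) of the closest stored point."""
    if node is None:
        return best
    if node.points is not None:  # leaf: compare against every stored point
        for features, label in node.points:
            d = sq_dist(features, query)
            if best is None or d < best[0]:
                best = (d, label)
        return best
    diff = query[node.dim] - node.value
    near, far = (node.left, node.right) if diff <= 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is within the best distance.
    if best is None or diff ** 2 <= best[0]:
        best = nearest(far, query, best)
    return best


# Made-up 2-D points with made-up ratings, only to exercise the code.
train = [([1.0, 1.0], 5), ([2.0, 5.0], 6), ([6.0, 2.0], 7), ([7.0, 6.0], 6)]
tree = build_kdtree(train, 0, 2)
print(nearest(tree, [6.5, 2.5])[1])  # prints 7
```

Comparing squared distances avoids a square root without changing which point is nearest. If your Algorithm 1 stores a point at every internal node or handles ties differently, the build and search steps need to change accordingly.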

Note: Depending on your implementation, sorting may not be necessary. Please work out whether your code needs to sort the numbers first. Also, if you compute the median yourself and there is an even number of points, the median is the mean of the two middle values; for example, the median of [1,2,3,4] is 2.5.
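The median rule in the note can be sketched as follows. This is one possible hand-written version that sorts a copy of the values first; the function name median is hypothetical, and the standard library's statistics.median follows the same convention for even-length input.

```python
def median(values):
    """Median of a non-empty sequence; averages the two middle values if even."""
    ordered = sorted(values)  # sort a copy so the caller's list is untouched
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4]))  # prints 2.5
print(median([3, 1, 2]))     # prints 2
```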

1.2 Deliverable

Write your k-d tree program in Python 3.6.9 in a file called nn_kdtree.py. Your program must be able to run as follows:

$ python nn_kdtree.py [train] [test] [dimension]

The inputs/options to the program are as follows:

• [train] specifies the path to the training data file.
• [test] specifies the path to the testing data file.
• [dimension] decides which dimension the comparison starts from (Algorithm 1).

Given the inputs, your program must construct a k-d tree (following the prescribed algorithms) using the training data, then predict the quality rating of each wine sample in the testing data. Your program must then print to standard output (i.e., the command prompt) the list of predicted wine quality ratings, vertically (one rating per line), in the order in which the test cases appear in [test].
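The command-line handling and output format described above could be sketched as below. Several details are assumptions, because the file layout is not shown in this text: the sketch assumes whitespace-separated columns with one header line in both files, and it passes [dimension] through unchanged without deciding whether it is zero-based or one-based. The names read_table, main and predict_all are hypothetical; predict_all stands for your own k-d tree build and 1NN search.

```python
import sys


def read_table(path):
    """Read a whitespace-separated numeric table, skipping one header line.

    The header line and the separator are assumptions about the file layout.
    """
    with open(path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    return [[float(cell) for cell in row] for row in rows[1:]]


def main(argv, predict_all):
    """Parse [train] [test] [dimension] and print one predicted rating per line."""
    if len(argv) != 4:
        sys.exit("usage: python nn_kdtree.py [train] [test] [dimension]")
    train_rows = read_table(argv[1])
    test_rows = read_table(argv[2])
    start_dim = int(argv[3])
    # First 11 columns are the features; the 12th column is the quality rating.
    train = [(row[:11], int(row[11])) for row in train_rows]
    queries = [row[:11] for row in test_rows]
    # predict_all is a hypothetical callable wrapping the tree build and search.
    for rating in predict_all(train, queries, start_dim):
        print(rating)
```

In the real script, a call such as main(sys.argv, predict_all) would sit under the usual if __name__ == "__main__": guard, with predict_all replaced by your own implementation.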

1.3 Python libraries

You are allowed to use the Python standard library to write your k-d tree learning program (see https://docs.python.org/3/library/ for the components that make up the Python v3.6.9 standard library). In addition to the standard library, you are allowed to use NumPy and Pandas. Note that the marking program will not be able to run your program to completion if other third-party libraries are used. You are NOT allowed to use implemented tree structures from any Python package, otherwise the mark will be set to 0. ...