Which of the following tasks can you perform with the Cloud Pak for Data platform API?
Provide analysis results, including data quality scores, automatically assigned data classes and business terms, data types, formats, frequency distributions, and more.
You can run automated discovery for Db2, Db2 on Cloud, HDFS, Hive, Microsoft SQL Server, MongoDB, Oracle, PostgreSQL, and Teradata data sources connected through a JDBC connector. If metadata import is enabled, you can also run it for Amazon S3, Greenplum, Netezza, and Snowflake data sources that you connect to through connections created by metadata import.

DataStage Edition: Transform data to provide enriched and tailored information for your enterprise.

Data Refinery: Cleanse and shape tabular data with a graphical flow editor. You can also use dplyr R library operations, functions, and logical operators. When you cleanse data, you fix or remove data that is incorrect, incomplete, improperly formatted, or duplicated. When you shape data, you customize it by filtering, sorting, combining, or removing columns, and by performing operations.

IBM Watson Discovery

Introduction
IBM Watson™ Discovery for IBM® Cloud Pak is an AI-powered search and content analytics engine that finds answers and insights in complex business content with speed and accuracy. Answers can be surfaced to users through a conversational dialog driven by Watson Assistant, or embedded in your own user interface. With its Smart Document Understanding training interface, Watson Discovery can learn where answers live in complex business content based on a visual understanding of documents. You can further enhance Watson Discovery's ability to understand domain-specific language by teaching it with Watson Knowledge Studio. Watson Discovery brings together a functionally rich set of integrated and automated Watson APIs to:
Chart Details
This chart deploys a single Watson Discovery node with a default pattern. It includes the endpoints listed here.

Prerequisites
Before installing Watson Discovery, you must install:
IMPORTANT: Portworx must be installed before you install Watson Discovery. A Portworx license is included with IBM® Cloud Pak for Data 2.5.0.0.

Limitations
Resources Required
In addition to the System requirements for IBM Cloud Pak for Data, IBM Watson Discovery has the following requirements. For installation (at minimum):
For use:
Storage
Parenthetical numbers are the persistent volumes (PVs) required or created when deploying with the recommended high availability (HA) configuration. See High availability (Production) configuration for more information.
If the kubectl apply -f - <
Note: Gluster File System (GlusterFS) is not a supported storage option for Watson Discovery.

Documentation
IBM Watson Discovery documentation:
IBM® Cloud Pak for Data:
Pre-install steps

Setting up an OpenShift Environment
NOTE: Skip this section if you are deploying to IBM Cloud Private. See Setting up an IBM Cloud Private environment.
If you're deploying to an OpenShift cluster,
To make these changes, run

Setting up an IBM Cloud Private environment
NOTE: Skip this section if you are deploying to OpenShift. See Setting up an OpenShift Environment.
Configuring Firewall Rules
NOTE: If your cluster is running with a firewall between nodes, you must complete this step. Otherwise, skip to the next section. Perform the following tasks on every node of your cluster:
Loading Watson Discovery Docker images into your container registry
Before you can install Watson Discovery, you must load all of the Docker images distributed as part of the PPA archive into the cluster's internal container registry using the
Loading Watson Discovery Docker images into an OpenShift container registry
./bin/loadImages.sh --registry $(oc get routes docker-registry -n default -o template={{.spec.host}}) --namespace {namespace}

Loading Watson Discovery Docker images into an IBM Cloud Private container registry
./bin/loadImages.sh --registry {cluster_hostname}:8500 --namespace {namespace}

Installing Watson Discovery
Installing Watson Discovery deploys a single Watson Discovery application into an IBM Cloud Pak environment. You can deploy to a
To install Watson Discovery on your cluster, run the
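The two image-loading commands shown earlier differ only in how the registry host is obtained. On OpenShift it comes from the `docker-registry` route; on IBM Cloud Private it is the cluster hostname on port 8500. The following is a minimal sketch, not part of the PPA archive, that assembles the matching `./bin/loadImages.sh` command for either platform and prints it instead of running it. The function name `build_load_images_cmd` and its argument order are hypothetical. The `oc get routes` lookup and the `:8500` port are taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Watson Discovery): prints, but does
# not run, the loadImages.sh command for the target platform, reusing the
# registry lookups from the documented commands.
# Usage: build_load_images_cmd openshift <namespace>
#        build_load_images_cmd icp <namespace> <cluster_hostname>
build_load_images_cmd() {
  local platform="$1" namespace="$2" registry
  if [ -z "$namespace" ]; then
    echo "namespace is required" >&2
    return 1
  fi
  case "$platform" in
    openshift)
      # Host of the docker-registry route in the default project.
      registry="$(oc get routes docker-registry -n default -o 'template={{.spec.host}}')" || return 1
      ;;
    icp)
      # IBM Cloud Private registry on port 8500, as in the documented command.
      if [ -z "$3" ]; then
        echo "cluster hostname is required for icp" >&2
        return 1
      fi
      registry="$3:8500"
      ;;
    *)
      echo "unknown platform: $platform" >&2
      return 1
      ;;
  esac
  echo "./bin/loadImages.sh --registry $registry --namespace $namespace"
}
```

Because the helper only prints the command, it has no side effects. You can review the registry and namespace before running the script from the extracted archive directory.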
If you've purchased and downloaded Discovery for Content Intelligence, you must install it now:
Note: The
Tip: Run

High availability configuration
To deploy in High Availability (Production) mode,
Verifying the Watson Discovery installation
On Linux
From the
On macOS
From the
The installation is complete.

Installing the optional language pack
The following enrichments are supported in English only, unless you download and install the language extension pack
Prerequisite:
To install
$ mkdir ibm-watson-discovery-language-pack
$ cd ibm-watson-discovery-language-pack
$ mv ~/Downloads/ibm-wat-dis-pack1-prod-2.1.2.tar.xz .
$ tar xJf ibm-wat-dis-pack1-prod-2.1.2.tar.xz
$ ls .
bin lib LICENSE README.md RELEASENOTES.md
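The unpacking steps above can be collected into one function that fails early. This is a minimal sketch that assumes a `tar` with xz support (`-J`). The function name `unpack_language_pack` is hypothetical. Unlike the steps above, it extracts directly with `tar -C` instead of moving the archive first. It then checks for the `bin` and `lib` directories that the `ls` listing shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpacks the language pack archive into a directory
# and checks that the bin/ and lib/ directories from the listing exist.
# Usage: unpack_language_pack <archive.tar.xz> <destination_dir>
unpack_language_pack() {
  local archive="$1" dest="$2"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # -x extract, -J xz decompression, -f archive file, -C target directory
  tar -xJf "$archive" -C "$dest" || return 1
  local d
  for d in bin lib; do
    if [ ! -d "$dest/$d" ]; then
      echo "missing $d/ in $dest; is this the right archive?" >&2
      return 1
    fi
  done
}

# Example, using the file and directory names from the steps above:
# unpack_language_pack ~/Downloads/ibm-wat-dis-pack1-prod-2.1.2.tar.xz ibm-watson-discovery-language-pack
```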
NOTE: The copy of
./installLanguagePack.sh --cluster-pull-prefix {registry}/{namespace} --namespace {namespace}

Collecting OpenShift Support Logs
From the
For more information and options, run
This will produce a
You can follow the progress of the log collection by running a command like

Uninstalling Watson Discovery

Uninstalling Watson Discovery 2.1.2
To remove Watson Discovery from an OpenShift or IBM Cloud Private cluster, use the
./uninstallDiscovery.sh --namespace {namespace}

By default, this script does not remove persistent volume claims or the specific secrets required to access or retrieve any data stored in Watson Discovery. To delete all objects associated with this instance of Watson Discovery, including any and all ingested data, include the --force flag:
./uninstallDiscovery.sh --namespace {namespace} --force

Uninstalling Watson Discovery 2.1.1, 2.1.0, 2.0.1, or 2.0.0
If you are upgrading from Watson Discovery 2.1.1 or earlier, you must uninstall that version of Watson Discovery before you can install Watson Discovery 2.1.2. To delete the resources from a previous Watson Discovery installation named
kubectl delete --namespace=my-namespace all,configmaps,jobs,networkpolicies,persistentvolumeclaims,poddisruptionbudgets,roles,rolebindings,clusterroles,clusterrolebindings,secrets,serviceaccounts --selector=release=my-release
kubectl delete --namespace=my-namespace configmaps stolon-cluster-my-release-postgresql my-release.v1

NOTE: You should run the backup scripts to export your data before performing this uninstallation procedure. Any data in Watson Discovery 2.1.1 will be deleted and unreachable after the uninstall completes. See the documentation on backing up and restoring data for more information.

Security reference
NOTE: This information is provided for reference. The install will create the security requirements for you.

PodSecurityPolicy Requirements
This chart requires a PodSecurityPolicy to be bound to the target namespace prior to installation. The predefined PodSecurityPolicy name:
These PodSecurityPolicy resources can also be created manually. A cluster admin can save these templates to separate yaml files and run the command below for each file:
Template for a PodSecurityPolicy definition, currently equivalent
Custom PodSecurityPolicy definition:

apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    kubernetes.io/description: This policy is the most restrictive, requiring pods to run with a non-root UID, and preventing pods from accessing the host.
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
  name: ibm-discovery-custom-psp
spec:
  allowPrivilegeEscalation: false
  forbiddenSysctls:
  - '*'
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim

Template for a custom Role resource to replace the default privileged role:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: discovery-custom-priv-role
rules:
- apiGroups: ["", "batch", "extensions"]
  resources: ["jobs", "jobs/status", "secrets", "pods", "pods/exec", "configmaps"]
  verbs: ["get", "watch", "create", "apply", "list", "update", "patch", "delete"]
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: [ibm-discovery-custom-psp]
  verbs: ["use"]
- apiGroups: [""]
  resources: ["resourcequotas", "resourcequotas/status"]
  verbs: ["get", "list", "watch"]

To use a custom Role resource, you must also have a custom ServiceAccount resource.
NOTE: Replace the placeholder in imagePullSecrets with the name of the image pull secret in your cluster's namespace. For IBM Cloud Private Foundations, that secret name is sa-.
For Red Hat OpenShift, you can find the secret name by running:

oc get secrets \
  --namespace "${NAMESPACE}" \
  --output=jsonpath='{ range .items[*] }{@.metadata.name}{"\n"}{end}' \
  | grep default-dockercfg \
  | tr -d '[:space:]'

Template for a custom ServiceAccount resource to replace the default privileged service account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: discovery-custom-priv-service-account
imagePullSecrets:
- name:

To bind your service account to your role, you need to create a RoleBinding. Template for binding your custom privileged service account to your custom privileged role:

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: discovery-custom-priv-role-binding
subjects:
- kind: ServiceAccount
  name: discovery-custom-priv-service-account
roleRef:
  kind: Role
  name: discovery-custom-priv-role
  apiGroup: rbac.authorization.k8s.io

Finally, you can specify the name of the custom service account when installing the chart. For example, if you want to override the privileged service account with
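The secret lookup and the ServiceAccount template above can be combined, so the `imagePullSecrets` entry is filled automatically. The following minimal sketch reuses the `oc get secrets` pipeline shown above and writes the manifest with a heredoc. The function name `render_service_account` is hypothetical. It assumes exactly one `default-dockercfg` secret exists in the namespace, because the `tr -d '[:space:]'` step would otherwise join several names into one string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the custom ServiceAccount manifest with the
# namespace's default-dockercfg image pull secret filled in.
# Assumes a single default-dockercfg secret in the namespace.
# Usage: render_service_account <namespace>
render_service_account() {
  local namespace="$1" secret
  # Same lookup as the documented command: list secret names, keep the dockercfg one.
  secret="$(oc get secrets \
      --namespace "$namespace" \
      --output=jsonpath='{ range .items[*] }{@.metadata.name}{"\n"}{end}' \
    | grep default-dockercfg \
    | tr -d '[:space:]')"
  if [ -z "$secret" ]; then
    echo "no default-dockercfg secret found in $namespace" >&2
    return 1
  fi
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: discovery-custom-priv-service-account
imagePullSecrets:
- name: $secret
EOF
}

# Review the output, then apply it, for example:
# render_service_account "$NAMESPACE" | oc apply --namespace "$NAMESPACE" -f -
```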
Red Hat OpenShift SecurityContextConstraints Requirements
NOTE: This information is provided for reference. The install will create the security requirements for you.
If running in a Red Hat OpenShift cluster, this chart requires a
The SecurityContextConstraints resource can also be created manually. A cluster admin can save this template to a yaml file and run the command below:
Template for a SecurityContextConstraints definition, currently equivalent

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: "This policy is the most restrictive, requiring pods to run with a non-root UID, and preventing pods from accessing the host."
    cloudpak.ibm.com/version: "1.0.0"
  name: ibm-discovery-prod-scc
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegedContainer: false
allowPrivilegeEscalation: false
allowedCapabilities: []
allowedFlexVolumes: []
allowedUnsafeSysctls: []
defaultAddCapabilities: []
defaultPrivilegeEscalation: false
forbiddenSysctls:
  - "*"
fsGroup:
  type: MustRunAs
  ranges:
  - max: 65535
    min: 1
readOnlyRootFilesystem: false
requiredDropCapabilities:
  - ALL
runAsUser:
  type: MustRunAsNonRoot
seccompProfiles:
  - docker/default
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: MustRunAs
  ranges:
  - max: 65535
    min: 1
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - projected
  - secret
priority: 0

After creating the SCC, you can bind the SCC to the namespace with this command, replacing
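The binding command itself is cut off above, so the following is a hedged sketch only, not the command from the chart documentation. `oc create -f` and `oc adm policy add-scc-to-group` are standard `oc` commands. Granting the SCC to the group `system:serviceaccounts:<namespace>` is one common way to apply an SCC to every service account in a namespace, but whether it matches the intended command is an assumption. The function name `scc_commands` and the default file name `ibm-discovery-prod-scc.yaml` are hypothetical. The function prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the commands to create the SCC from a saved
# template and grant it to all service accounts in one namespace.
# Assumption: group-wide binding via system:serviceaccounts:<namespace>.
# Usage: scc_commands <namespace> [template_file]
scc_commands() {
  local namespace="$1" file="${2:-ibm-discovery-prod-scc.yaml}"
  if [ -z "$namespace" ]; then
    echo "namespace is required" >&2
    return 1
  fi
  echo "oc create -f $file"
  echo "oc adm policy add-scc-to-group ibm-discovery-prod-scc system:serviceaccounts:$namespace"
}
```

Binding at the group level also covers service accounts that the install creates later in that namespace. Binding a single account with `oc adm policy add-scc-to-user` is the narrower alternative.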
Configuration
Contact Support.

Backup and Restore
This chart currently does not support upgrades or rollbacks. See Backing up and restoring data for instructions.

Integration with other IBM Watson services
Watson Discovery is one of many IBM Watson services. Additional Watson services on IBM Cloud Pak for Data and the IBM Public Cloud allow you to bring Watson's AI platform to your business application, and to store, train, and manage your data in the most secure cloud. For the full list of available Watson services, see:
Watson services are currently organized into the following categories for different requirements and use cases:
Copyright © IBM Corporation 2020. All Rights Reserved.

Which of the following tasks can you perform in the Cloud Pak for Data web client?
From the IBM Cloud Pak for Data web client, you can monitor the services that are running on the platform, understand how you are using cluster resources, and be aware of issues as they arise. You can also set quotas on the platform and on individual services to help mitigate unexpected spikes in resource use.
Which services are included with the base Cloud Pak for Data?
Some services are included in your purchase of Cloud Pak for Data: Analytics, Dashboards, Data governance, Data sources, Developer tools, Industry solutions, and Storage.

What are the main tools available in Cloud Pak for Data for business analysts?
IBM Cloud Pak for Data comes with many preconfigured services for practical use, including AI, analytics, dashboards, data governance, industry solutions, data sources, and developer tools.
What are the three primary outcomes the Cloud Pak for Data platform delivers over alternatives?
The three main elements are Organize (business-ready analytics), Analyze (build and scale with confidence), and Infuse (harness the power of AI).